Diagnostics for regression models are tools that assess a model's compliance with its assumptions and investigate whether a single observation or a group of observations is not well represented by the model. These tools allow practitioners to evaluate whether a model appropriately represents the data of their study. Practitioners would typically run diagnostics as part of the process of selecting a model. In this article we separate diagnostics from the other parts of model selection to provide a focus on this important topic. This separation is not meant to imply that these tools are used separately from other regression modeling tools.

This article provides an overview of diagnostics for regression models. It also includes an example model that is used to demonstrate running these diagnostics. Both R and Stata code for the diagnostic examples are provided.

This article should not be taken as complete coverage of the theory of model diagnostics or as an exhaustive set of diagnostics for all models. The reader is responsible for learning the theory and gaining the experience needed to properly diagnose a regression model. The following topics are not covered:

- Corrective actions for issues identified by diagnostics. Concerns raised by diagnostics are resolved by model selection, with the additional information obtained through the diagnostics.
- Mixed models (models with random effects).
- Issues associated with decisions on which regressors to include in a model. These include but are not limited to:
  - Choosing between alternative sets of regressors that produce similar models.
  - Multiple correlation and other issues of redundancy of regressors.
  - Quality-of-model evaluations using measures such as AIC, BIC, and (adjusted) \(R^2\).

The diagnostics of this article apply to Ordinary Least Squares (OLS) models and Generalized Linear Models (GLMs). (Most of the diagnostics can be applied to other regression models, often with some modification.) The OLS model assumes the means are a linear function of the regressors, whereas GLMs assume that a transformation of the mean (the link function) is a linear function of the regressors.

The OLS model assumes that the residual part of the model, the part of the response which is unexplained by the regressors, has a constant variance and is normally distributed. The variance of the residuals of a GLM is based on the variance function \(v(\cdot)\). The GLM residuals may have a distributional assumption depending on the response variable being modeled. For example, a response variable that is the number of successes in a fixed number of trials would be expected to follow a binomial distribution. A separate distributional assumption for the errors is not always required for GLMs.

The model assumptions to be checked are organized as follows in this article.

- Correct functional form of the expected means.
  - Linearity of the predicted response (on the link scale for a GLM) to the response variable.
  - Linearity of each of the individual regressors.
  - For a GLM, the link function is what provides linearity in the regressors.
- Homoscedasticity (equal variance) of residuals (after accounting for the variance function) with respect to the predicted response.
- Independence of residuals. Mostly to be checked by understanding the data collection method.
  - Individual regressors are independent of residuals.
  - No autocorrelation or other sequencing issues, such as a series-type regressor, time, or any other order-inducing regressor.
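To make the two model families concrete, one common way to write them (the notation is an assumption here, not necessarily the article's own) is \(y_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \varepsilon_i\) with \(\varepsilon_i \sim N(0, \sigma^2)\) for OLS, and \(g(\mu_i) = \mathbf{x}_i^\top \boldsymbol{\beta}\) with \(\operatorname{Var}(y_i) = \phi\, v(\mu_i)\) for a GLM, where \(g\) is the link function and \(v\) the variance function. The R sketch below fits one model of each kind to simulated data and extracts the fitted values and residuals that most diagnostics are built from. It is not the article's example model; the data, the coefficients, and every variable name are hypothetical and chosen only for illustration.

```r
# Simulated data; all names and values are illustrative assumptions.
set.seed(42)
n  <- 200
x1 <- runif(n, 0, 10)
x2 <- rnorm(n)

# OLS: the mean is linear in the regressors and the errors are
# normal with constant variance.
y_ols   <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n, sd = 1)
fit_ols <- lm(y_ols ~ x1 + x2)

# GLM: number of successes in a fixed number of trials, modeled as
# binomial with a logit link.
trials    <- 20
p         <- plogis(-1 + 0.2 * x1 + 0.4 * x2)
successes <- rbinom(n, size = trials, prob = p)
fit_glm   <- glm(cbind(successes, trials - successes) ~ x1 + x2,
                 family = binomial(link = "logit"))

# Quantities used by most diagnostics.
# For the GLM, predictions are taken on the link scale and Pearson
# residuals are used because they are scaled by the variance function.
diag_ols <- data.frame(fitted = fitted(fit_ols),
                       resid  = residuals(fit_ols))
diag_glm <- data.frame(link_fitted = predict(fit_glm, type = "link"),
                       resid       = residuals(fit_glm, type = "pearson"))

# A crude numeric look at sequencing: correlation between each OLS
# residual and the one before it, in the order the rows are stored.
lag1_cor <- cor(diag_ols$resid[-1], diag_ols$resid[-n])
```

Plotting `resid` against `fitted` (or `link_fitted`) and against each regressor is the usual visual check of functional form and equal variance. The lag-1 correlation is only meaningful when the row order reflects the order in which the data were collected.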