I have examined the last five years of the Journal of Broadcasting and Electronic Media (Journal), specifically with an eye toward those articles using regression. (They all use "multiple regression," so I'll just call it "regression.") What's common about the articles? How are they different? What is done especially well? What can be improved? My own research is included in this set, so some of the comments -- especially the improvements -- apply to me as well. I will not cite any authors because we know who we are! The purpose of this article is not to point fingers at anyone: it is to provoke discussion so that we may agree on some common or minimal expectations for regression in Journal manuscripts.
Regression is a popular statistical technique. Depending on what is included (for example, whether to count discriminant analysis), there is an average of almost two articles that use regression per issue of the Journal, and as many as ten separate regression models in a single article. Various methods have been employed, hierarchical regression being the most popular. Yet, only one article during this time addressed any model assumptions.
Probably none of us fully understands regression. Our use of regression has evolved over the 25 years I've been reading the Journal. For instance, the use of hierarchical regression was almost unheard of twenty years ago.
Most research methods texts -- old and new, regardless of field (e.g., Stevens, 1996; Monge, 1980; Blalock, 1979) -- present the same major assumptions that are almost universally ignored by Journal authors (but also ignored by Kerlinger & Pedhazur, 1973, and Glass & Stanley, 1970). Perhaps these assumptions regarding biased estimators, variance, efficiency, and linearity are ignored because we social scientists use large samples, and as sample sizes get large -- "say into the hundreds" (Monge, 1980, p. 23) -- our estimates become less biased and less variable. (Note that the large-sample argument does not apply to the linearity assumption.) Perhaps.
Yet, no author has explicitly stated this. Monge (1980, pp. 50ff) recommends plotting residuals to test these assumptions. These plots can be produced easily, and can be useful in examining the linearity assumption as well.
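As a concrete illustration of the residual diagnostic Monge recommends, here is a minimal sketch in Python (the data and variable names are hypothetical, invented for the example): when a straight line is fit to genuinely linear data, the residuals show no systematic relation to a curved term such as x squared; when a straight line is fit to curved data, they do. A residual-versus-fitted plot makes the same point visually.

```python
import numpy as np

def linear_residuals(x, y):
    """OLS fit of y on x (with intercept); returns fitted values and residuals."""
    X = np.column_stack([np.ones_like(x), x])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coefs
    return fitted, y - fitted

# Hypothetical data for illustration only
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 500)

# Case 1: truly linear relationship -- residuals carry no leftover pattern
_, res_lin = linear_residuals(x, 2.0 + 0.5 * x + rng.normal(0, 1, 500))
# Case 2: curved relationship fit with a straight line -- residuals track the curvature
_, res_cur = linear_residuals(x, 2.0 + 0.3 * x**2 + rng.normal(0, 1, 500))

# Correlation of residuals with a curvature term: typically near zero in
# the linear case, and substantially larger when linearity is violated
print(round(abs(np.corrcoef(x**2, res_lin)[0, 1]), 3))
print(round(abs(np.corrcoef(x**2, res_cur)[0, 1]), 3))
```

The numerical check here simply stands in for eyeballing the plot: any systematic pattern left in the residuals signals a violated assumption.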
Multicollinearity -- high interrelations among the independent variables -- is another issue. High interrelations inflate the standard errors of the individual coefficient estimates, so genuinely important predictors can appear nonsignificant. At the same time, they make it impossible to attribute the explained variance cleanly to any one independent variable, which can give us a false sense of accomplishment about what each predictor contributes. This issue is easily examined with a correlation table. Monge (1980, p. 52) suggests that social scientists need worry only about high interrelations, "correlation among two or more predictors above .70" [emphasis added].
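The correlation-table check can be automated in a few lines. A minimal sketch in Python, using invented predictors (x2 is deliberately constructed to overlap heavily with x1), scans the table for pairs exceeding Monge's .70 rule of thumb:

```python
import numpy as np

# Hypothetical predictors for illustration only
rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)   # strongly collinear with x1
x3 = rng.normal(size=n)                    # essentially independent

predictors = {"x1": x1, "x2": x2, "x3": x3}
names = list(predictors)

# Correlation table among the predictors
R = np.corrcoef(np.column_stack(list(predictors.values())), rowvar=False)

# Flag predictor pairs whose correlation exceeds the .70 rule of thumb
flagged = [(names[i], names[j])
           for i in range(len(names)) for j in range(i + 1, len(names))
           if abs(R[i, j]) > 0.70]
print(flagged)  # → [('x1', 'x2')]
```

A reviewer could reasonably ask that a table like R (or at least the flagged pairs) accompany any manuscript reporting multiple regression.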
The Value of Regression
Clearly, regression is valuable to us. Studying social phenomena, we know that behavior is determined by many variables. Not only does regression allow us to examine the relation between a set of independent variables and a dependent variable, it also allows us to determine which of the independent variables are most important.
I was especially pleased to see that many articles used hierarchical regression...