Clark JS, Bell DM, et al. (2011). Individual‐scale variation, species‐scale differences: inference needed to understand diversity. Ecology Letters, 14(12), 1273-1287. http://dx.doi.org/10.1111/j.1461-0248.2011.01685.x ]]>

What do you mean by “most important”? A forward-selection-type procedure does nothing to address Simpson’s paradox (as Jared points out above, it can induce confounding), but maybe you have something else in mind.

]]>Regarding ordering the variables, all I had in mind was taking action if Simpson’s Paradox is observed. It should be a clue that some other variable should be brought into the analysis first, i.e. the existing ordering should be changed. That should be easy.

]]>right thanks

]]>ok thanks for the clarification

]]>A continuous version of Simpson’s paradox, sometimes called the ecological fallacy…

The ecological “fallacy” was noted by Goodman, who observed that when individual-level data are aggregated, the correlation between two variables typically differs from the correlation calculated from the individual-level data themselves. His solution was to use linear regression, which gives the same estimate of a proportion as the individual-level data do. In the classic case, the percent of one group supporting a candidate can be calculated from the individual data (it is then just the mean of the vote for that candidate within the group, if vote choice is coded 0-1) or estimated by regression. Essentially, in an election unit, let V be the vote for the candidate, X1 the number in group 1, p1 the proportion of group 1 voting for the candidate, X2 the number in group 2, and p2 the proportion of group 2 voting for the candidate. Then V = X1p1 + X2p2 = X1p1 + X2p1 – X2p1 + X2p2 = (X1 + X2)p1 + X2(p2 – p1). Dividing by X1 + X2 gives %V = a + b × %X2, where a = p1, b = p2 – p1, %V is the percent vote for the candidate and %X2 is the percent in group 2. The estimates of p1 and p2 from the regression will then be approximately the same as those from the individual data _except_ for the empirical fact that the consistency of p1 and p2 across units, implicitly assumed in the regression, rarely holds. The relationship between the correlation coefficient and violations of this consistency assumption, along with its implications for regression analysis (including a theory of how central limiting processes imply a distribution on the error in the regression), is discussed at some length in

Arthur Lupia and Kenneth F. McCue. 1990. “Why the 1980’s Measures of Racially Polarized Voting Are Inadequate for the 1990’s.” Law and Policy 12: 353-387.

but it is the violation of the consistency of the coefficients between electoral units that gives the relation to Simpson’s paradox, so the ecological fallacy is a specialized subset of the paradox. In particular, in the example you give above about a drug interaction and gender, there is no aggregate data involved, and hence the term ecological fallacy, as it is commonly used in the social science literature, would not be appropriate. It is fair to note, though, that Simpson’s paradox (which I would interpret as multiple regimes generating the data) has made it very difficult to do ecological analysis.
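
For anyone who wants to see the regression argument in numbers, here is a minimal Python sketch. Everything in it is my own assumption for illustration, not something taken from Goodman or from Lupia and McCue: the function names, the rates 0.3 and 0.7, the equal-sized election units, and the way p1 is made to drift with the group-2 share in the second run.

```python
import random


def goodman_regression(pct_group2, pct_vote):
    """Least-squares fit of %V = a + b * %X2; returns (p1_hat, p2_hat)."""
    n = len(pct_group2)
    mean_x = sum(pct_group2) / n
    mean_y = sum(pct_vote) / n
    sxx = sum((x - mean_x) ** 2 for x in pct_group2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pct_group2, pct_vote))
    b = sxy / sxx              # slope estimates p2 - p1
    a = mean_y - b * mean_x    # intercept estimates p1
    return a, a + b


def simulate(p1_of, p2_of, n_units=500, seed=1):
    """Hypothetical election units; p1_of and p2_of map %X2 to each group's rate."""
    rng = random.Random(seed)
    pct_group2 = [rng.uniform(0.05, 0.95) for _ in range(n_units)]
    # Aggregate vote share in each unit: (1 - %X2) * p1 + %X2 * p2
    pct_vote = [(1 - x) * p1_of(x) + x * p2_of(x) for x in pct_group2]
    # Group-1 support as the individual data would give it (equal-sized units assumed)
    true_p1 = (sum((1 - x) * p1_of(x) for x in pct_group2)
               / sum(1 - x for x in pct_group2))
    return pct_group2, pct_vote, true_p1


# Run 1: p1 and p2 are the same in every unit, as the regression assumes.
x, y, true_p1_const = simulate(lambda s: 0.3, lambda s: 0.7)
p1_const, p2_const = goodman_regression(x, y)

# Run 2: group 1 is more supportive where group 2 is larger (assumption violated).
x, y, true_p1_vary = simulate(lambda s: 0.3 + 0.4 * s, lambda s: 0.7)
p1_vary, p2_vary = goodman_regression(x, y)

print("constant rates:", p1_const, p2_const, "individual-level p1:", true_p1_const)
print("varying p1:    ", p1_vary, p2_vary, "individual-level p1:", true_p1_vary)
```

In the first run the regression reproduces p1 and p2, because %V is exactly linear in %X2. In the second run the intercept no longer matches the group-1 rate computed from the underlying data, which is the failure of consistency described above.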

]]>