*Geek Box: Heterogeneity

When you read a meta-analysis, you will inevitably come across the term ‘heterogeneity’, which refers to differences between the included studies and is usually reported alongside statistical tests for it. Heterogeneity between studies may relate to clinical factors, like participant characteristics or outcomes, methodological differences in study design, or variations in the results. These are all important because they can indicate that the effect of the same nominal exposure is not observed in all circumstances.

You can sometimes observe heterogeneity with your own eyes, if the forest plot of a meta-analysis shows studies on either side of the ‘null’ value (1.0 for ratio measures such as risk ratios or odds ratios) and with varying magnitudes of effect. But statistical tests provide more precision, and the two most common measures are the Chi-squared (χ²) test and the I² statistic. The Chi-squared test is a simple yes/no hypothesis test to determine whether heterogeneity is present. Its null hypothesis is that all studies are estimating the same effect: if the resulting p-value is significant (which for this test is often <0.1, not the customary <0.05), this suggests there is heterogeneity between studies.
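To make the Chi-squared test concrete, here is a minimal sketch of the statistic usually behind it in meta-analysis, Cochran's Q, using inverse-variance weights. The function name `cochran_q` and the study values are invented for illustration; they do not come from any real meta-analysis.

```python
import math


def cochran_q(effects, std_errors):
    """Return (Q, df, pooled) for study effect estimates and their standard errors.

    Q is the weighted sum of squared deviations of each study's effect
    from the pooled (inverse-variance weighted) effect.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies count more
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1, pooled


# Hypothetical data: risk ratios from four invented studies, analysed on the log scale
log_rr = [math.log(rr) for rr in (0.80, 1.25, 0.60, 1.10)]
se = [0.10, 0.12, 0.15, 0.20]

q, df, pooled = cochran_q(log_rr, se)
print(f"Q = {q:.2f} on {df} df, pooled RR = {math.exp(pooled):.2f}")
```

The p-value comes from comparing Q against a Chi-squared distribution with k-1 degrees of freedom, where k is the number of studies. If SciPy is available, `scipy.stats.chi2.sf(q, df)` gives that p-value.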

The I² statistic measures the extent of the heterogeneity, expressed as a percentage: 0%-40% may not be important, 30%-60% may represent moderate heterogeneity, 50%-90% substantial heterogeneity, and 75%-100% considerable heterogeneity. The bands overlap because they are rough guides rather than strict cut-offs.
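As a rough illustration, I² can be derived from the Chi-squared heterogeneity statistic (Q) and its degrees of freedom using the usual definition, I² = (Q - df) / Q × 100, floored at zero. The sketch below assumes that definition. The helper names `i_squared` and `describe` are hypothetical, and the bands are the ones quoted above.

```python
def i_squared(q, df):
    """I² as a percentage: the share of Q in excess of its degrees of freedom, floored at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def describe(i2):
    """Return every guideline band that contains i2 (the bands overlap)."""
    bands = [
        (0, 40, "may not be important"),
        (30, 60, "moderate"),
        (50, 90, "substantial"),
        (75, 100, "considerable"),
    ]
    return [label for low, high, label in bands if low <= i2 <= high]


# Hypothetical example: Q = 12.0 from four studies (df = 3)
i2 = i_squared(12.0, 3)
print(f"I² = {i2:.0f}%: {', '.join(describe(i2))}")
```

Because the bands overlap, a single value can fall into two of them. Choosing between the two labels depends on context, such as the size and direction of the effects.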

Heterogeneity is neither ‘good’ nor ‘bad’, as it may often be an inevitable result of differing methodology in trial designs. It can allow for important differences to be teased out in subgroup or sensitivity analyses. However, high heterogeneity is an indication that pooling the studies in a meta-analysis may not have been appropriate.