R-squared vs P-value in Regression — Which Statistic Answers Which Question
R-squared and p-value answer different questions in regression. R² (and adjusted R²) measures how much of the variation in the outcome is explained by the model — model fit. P-value measures whether a specific coefficient is significantly different from zero — coefficient significance. Marketers confuse these constantly. A high R² with insignificant coefficients means the model fits but you don't know why; significant coefficients with low R² means the predictors matter but most variation is unexplained.
Regression output displays both R-squared and p-values, and most marketers learn to nod at them without internalizing what each one actually measures. The confusion is costly — using R² to justify a model that has no significant coefficients, or dismissing a model with significant coefficients because R² seems low. The two statistics answer separate questions.
R-squared — how much variation does the model explain?
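R² is the proportion of variance in the dependent variable that the model explains: R² = 1 - (SS_residual / SS_total). For a least-squares model with an intercept, evaluated on the data it was fit to, it ranges from 0 (the model explains nothing) to 1 (the model explains everything). On held-out data it can fall below 0.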
Interpretation: R² = 0.65 means the model explains 65% of the variation in the outcome. The other 35% is unexplained — random variation, omitted variables, or noise.
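As a minimal sketch of the formula above, the following pure-Python snippet fits a one-predictor least-squares line and computes R² from the two sums of squares. The five data points and the helper names `fit_line` and `r_squared` are illustrative assumptions, not taken from any real dataset or library.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def r_squared(y, y_hat):
    """R² = 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


# Hypothetical toy data: weekly spend and sales in arbitrary units
spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

intercept, slope = fit_line(spend, sales)
fitted = [intercept + slope * s for s in spend]
r2 = r_squared(sales, fitted)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R²={r2:.2f}")
```

For these five points SS_residual is 2.4 and SS_total is 6, so R² comes out at 0.60: the fitted line accounts for 60% of the variation in the toy outcome.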
What R² doesn't tell you: whether the model is correct, whether individual predictors matter, whether the residuals are well-behaved, whether the model will generalize.
Adjusted R²: penalizes R² for each added predictor. Use adjusted R² when comparing models with different numbers of predictors. Plain R² never decreases when you add a predictor, even a useless one, so on its own it always favors the bigger model.
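The penalty can be sketched with the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The function name and the example numbers below are assumptions chosen for illustration.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical comparison: same plain R², different predictor counts
r2 = 0.65
adj_small = adjusted_r_squared(r2, n_obs=50, n_predictors=5)
adj_large = adjusted_r_squared(r2, n_obs=50, n_predictors=10)
print(f"5 predictors:  {adj_small:.3f}")
print(f"10 predictors: {adj_large:.3f}")
```

With 50 observations, a plain R² of 0.65 adjusts to about 0.610 with 5 predictors and about 0.560 with 10, so the larger model has to earn its extra predictors.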
P-value — is this coefficient significantly different from zero?
For each predictor's coefficient, the p-value tests the null hypothesis: 'This coefficient is zero (the predictor has no effect)'. A small p-value (typically <0.05) rejects the null — the coefficient is significantly different from zero.
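Interpretation: p-value = 0.03 on a coefficient means that, if the predictor truly had no effect, there would be a 3% chance of seeing a coefficient at least this far from zero through random variation alone. It is not the probability that the predictor has no effect.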
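One way to make that definition concrete is a permutation test, sketched below. This is a swapped-in technique for illustration: regression software normally reports p-values from a t-test, not from shuffling. The simulated data, the seeds, and the function names are all assumptions.

```python
import random


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Share of shuffled datasets whose slope is at least as far from zero
    as the observed slope. Shuffling y breaks any real link to x, so the
    shuffled slopes show what 'no effect' looks like."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(slope(x, shuffled)) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed data as one draw
    return (hits + 1) / (n_permutations + 1)


# Simulated data (hypothetical): the outcome depends on x_real only
rng = random.Random(42)
x_real = [rng.gauss(0, 1) for _ in range(80)]
x_noise = [rng.gauss(0, 1) for _ in range(80)]
y = [1.0 * xr + rng.gauss(0, 1) for xr in x_real]

p_real = permutation_p_value(x_real, y)
p_noise = permutation_p_value(x_noise, y)
print(f"p-value for real predictor:  {p_real:.4f}")
print(f"p-value for noise predictor: {p_noise:.4f}")
```

The real predictor should produce a very small p-value because almost no shuffled dataset reproduces a slope that large. The noise predictor's p-value can land anywhere between 0 and 1.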
What p-value doesn't tell you: the size of the effect, the practical importance, whether the coefficient is causally meaningful, whether the model is correct.
Multiple testing correction: when testing many coefficients simultaneously, the chance of at least one false positive inflates, even though each individual p-value is computed the same way. Use Bonferroni or false-discovery-rate adjustments for high-dimensional regressions.
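Both adjustments are short enough to sketch directly. The snippet below implements Bonferroni and the Benjamini-Hochberg procedure (one common false-discovery-rate method) on a hypothetical list of five raw p-values; the function names are illustrative.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five coefficients
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
print("Bonferroni:", [round(p, 4) for p in bonf])
print("BH (FDR):  ", [round(p, 4) for p in bh])
```

In this example the third and fourth coefficients (raw p-values 0.039 and 0.041) no longer clear 0.05 after either adjustment, while the first two survive both.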
The four combinations operators see
- High R² + significant coefficients — model fits well and you know which predictors matter. Best case.
- High R² + insignificant coefficients — model fits but you can't attribute the fit to specific predictors. Often caused by multicollinearity (correlated predictors). Action: refit with ridge regression or remove correlated predictors.
- Low R² + significant coefficients — predictors matter but most variation is unexplained. Common in marketing data where outcomes have huge noise. Action: trust the directional finding, don't over-claim the magnitude.
- Low R² + insignificant coefficients — model fits poorly and predictors don't matter. Time to rethink the model.
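The third combination, low R² with a significant coefficient, tends to surprise people, so here is a small simulation of it. Everything in it is an assumption for illustration: the effect size of 0.2, the noise level, the sample size, and the use of a normal approximation in place of the t-distribution (reasonable at n = 2000).

```python
import math
import random

rng = random.Random(7)
n = 2000
# Hypothetical setup: a real but small effect buried in noise
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.2 * xi + rng.gauss(0, 1) for xi in x]

mean_x, mean_y = sum(x) / n, sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

ss_residual = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
ss_total = sum((yi - mean_y) ** 2 for yi in y)
r2 = 1 - ss_residual / ss_total

# Standard error of the slope, then a two-sided p-value.
# Assumption: with n = 2000 the normal distribution stands in for the t.
std_error = math.sqrt(ss_residual / (n - 2) / s_xx)
z = slope / std_error
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"R² = {r2:.3f}, slope = {slope:.3f}, p-value = {p_value:.2e}")
```

The predictor explains only a few percent of the variation, yet its coefficient is clearly distinguishable from zero. The direction of the effect is trustworthy even though the model predicts individual outcomes poorly.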
When R² is misleading
R² can mislead in both directions: it can be high when the model is wrong, and low when a real relationship exists.
- Overfitting — too many predictors fit the training noise. Test on held-out data; the held-out R² is typically much lower.
- Time series with trend — both outcome and predictors trend over time, producing high R² with no real relationship (spurious regression). Check for stationarity; difference if needed.
- Influential outliers — a single observation can dominate R²; check Cook's distance.
- Functional form mismatch — a linear regression on a nonlinear relationship may have a low R² that understates how strongly the outcome depends on the predictor.
- Heteroscedastic residuals — variance differs across the range of predictors; R² is still computed but standard errors are wrong.
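The spurious-regression case from the list above can be sketched with simulated random walks, which are the textbook example of non-stationary series. The series lengths, the seed, and the helper names are assumptions; the two series in each pair are generated independently, so any R² on the levels reflects shared drift rather than a real relationship.

```python
import random


def r_squared_one_predictor(x, y):
    """For one predictor plus an intercept, R² equals the squared correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy ** 2 / (s_xx * s_yy)


def random_walk(rng, length):
    """Cumulative sum of independent standard-normal steps."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def difference(series):
    """Period-over-period changes."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(1)
levels_r2, diffs_r2 = [], []
for _ in range(200):
    # Two series generated independently: no real relationship exists
    a = random_walk(rng, 200)
    b = random_walk(rng, 200)
    levels_r2.append(r_squared_one_predictor(a, b))
    diffs_r2.append(r_squared_one_predictor(difference(a), difference(b)))

mean_levels = sum(levels_r2) / len(levels_r2)
mean_diffs = sum(diffs_r2) / len(diffs_r2)
print(f"average R² on levels:      {mean_levels:.3f}")
print(f"average R² on differences: {mean_diffs:.3f}")
```

Averaged over many pairs, the regressions on levels show a substantial R² despite there being nothing to find, while the same series after differencing show an R² close to zero.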
When p-value is misleading
P-values have well-known pathologies:
- P-hacking — running many tests, reporting only those with p<0.05; vastly inflates false positive rate
- Multiple comparisons — when testing 20 coefficients that all have no real effect, expect 1 false positive at α=0.05 by chance; adjust with Bonferroni or a false-discovery-rate method
- Large sample sizes — with millions of observations, every coefficient becomes 'significant' even if practically meaningless. Look at effect sizes, not just p-values.
- Confounded coefficients — significant p-value on a confounded predictor is not a causal claim
- Multicollinearity — correlated predictors give unstable coefficients with inflated standard errors; p-values become unreliable
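The large-sample point in the list above follows from simple arithmetic, sketched here without any simulation. The assumptions are stated in the comments: a hypothetical coefficient of 0.01, a predictor and noise term that both have standard deviation 1 (so the standard error is roughly 1/sqrt(n)), and a normal approximation for the p-value.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z-statistic under the normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Assumptions for illustration: predictor and noise both have standard
# deviation 1, so the slope's standard error is roughly 1 / sqrt(n).
tiny_effect = 0.01  # hypothetical coefficient, practically negligible

p_by_n = {}
for n in (100, 10_000, 1_000_000):
    std_error = 1 / math.sqrt(n)
    z = tiny_effect / std_error
    p_by_n[n] = two_sided_p(z)
    print(f"n = {n:>9,}: z = {z:5.1f}, p = {p_by_n[n]:.3g}")
```

The effect never changes, only the sample size does. At 100 observations it is nowhere near significant, and at a million observations it is overwhelmingly significant, which is why the effect size and its confidence interval deserve more attention than the p-value at scale.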
RGM Experts Say
The operating rule we use: R² tells you whether the model is useful for prediction. P-values tell you whether to trust the individual coefficients for explanation. MMM models often have R² of 0.7–0.9 with mixed coefficient significance — that means the overall fit is good but you should treat individual channel coefficients with skepticism. Use both statistics, but never use either one alone.
Other regression diagnostics that matter
- Residual plots — check for non-random patterns (curvature, fan shapes, outliers)
- VIF (Variance Inflation Factor) — measures multicollinearity; VIF > 5–10 indicates problematic correlation
- Durbin-Watson statistic — tests for autocorrelation in time-series residuals; should be near 2
- AIC / BIC — penalized model fit measures; lower is better; useful for model comparison
- F-statistic and overall p-value — tests whether the model as a whole has explanatory power
- Confidence intervals on coefficients — more informative than p-values for effect-size interpretation
- Cross-validation R² — out-of-sample R² is the honest measure of model quality
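Two of the diagnostics above reduce to short formulas, sketched below in plain Python. The residual sequences and channel values are hypothetical, and the VIF helper covers only the two-predictor case, where each predictor's VIF is 1/(1 - r²) with r the correlation between the two; with more predictors, each VIF comes from regressing that predictor on all the others.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals.
    Near 2: little autocorrelation; toward 0: positive; toward 4: negative."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def vif_two_predictors(x1, x2):
    """With exactly two predictors, each one's VIF is 1 / (1 - r²),
    where r is the correlation between them."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r_squared = s12 ** 2 / (s11 * s22)
    return 1 / (1 - r_squared)


# Hypothetical residuals: runs of same-signed errors vs. alternating signs
dw_runs = durbin_watson([1, 1, 1, -1, -1, -1])
dw_alternating = durbin_watson([1, -1, 1, -1, 1, -1])

# Hypothetical spend on two channels that move almost in lockstep
vif = vif_two_predictors([1, 2, 3, 4, 5], [1, 2, 3, 4, 6])

print(f"Durbin-Watson, runs:        {dw_runs:.2f}")
print(f"Durbin-Watson, alternating: {dw_alternating:.2f}")
print(f"VIF: {vif:.1f}")
```

The residuals that arrive in runs give a Durbin-Watson value of about 0.67 (positive autocorrelation), the alternating ones about 3.33 (negative autocorrelation), and the two near-identical channels produce a VIF of 37, far above the 5–10 range flagged above.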