R-squared vs P-value in Regression — Which Statistic Answers Which Question

R-squared and p-value answer different questions in regression. R² (and adjusted R²) measures how much of the variation in the outcome is explained by the model — model fit. P-value measures whether a specific coefficient is significantly different from zero — coefficient significance. Marketers confuse these constantly. A high R² with insignificant coefficients means the model fits but you don't know why; significant coefficients with low R² means the predictors matter but most variation is unexplained.

Regression output displays both R-squared and p-values, and most marketers learn to nod at them without internalizing what each one actually measures. The confusion is costly — using R² to justify a model that has no significant coefficients, or dismissing a model with significant coefficients because R² seems low. The two statistics answer separate questions.

R-squared — how much variation does the model explain?

R² is the proportion of variance in the dependent variable that the model explains: R² = 1 - (SS_residual / SS_total). For an ordinary least squares fit with an intercept, evaluated on the data it was fitted to, R² ranges from 0 (model explains nothing) to 1 (model explains everything).

Interpretation: R² = 0.65 means the model explains 65% of the variation in the outcome. The other 35% is unexplained — random variation, omitted variables, or noise.
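
The formula is easy to check by hand on a tiny dataset. Below is a minimal sketch in plain Python; the spend and sales numbers are made up for illustration, and `fit_line` and `r_squared` are hypothetical helper names rather than functions from any library.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y, fitted):
    """R² = 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


# Illustrative (made-up) data: weekly ad spend and weekly sales.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_line(spend, sales)
fitted = [intercept + slope * s for s in spend]
r2 = r_squared(sales, fitted)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R²={r2:.2f}")
```

On this toy data SS_residual is 2.4 and SS_total is 6, so R² comes out at 0.60: the fitted line accounts for 60% of the variation in sales and leaves 40% unexplained.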

What R² doesn't tell you: whether the model is correct, whether individual predictors matter, whether the residuals are well-behaved, whether the model will generalize.

Adjusted R²: penalizes R² for each added predictor. Use adjusted R² when comparing models with different numbers of predictors. Plain in-sample R² never decreases when you add a predictor, even a useless one, so it always favors the bigger model.
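
The standard adjustment is 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors excluding the intercept. A short sketch, with a hypothetical helper name and made-up R² values:

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1), p excluding the intercept."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical comparison: 7 extra predictors lift plain R² only from 0.650 to 0.655.
small = adjusted_r_squared(0.650, n_obs=50, n_predictors=3)
large = adjusted_r_squared(0.655, n_obs=50, n_predictors=10)
print(f"3 predictors: {small:.3f}   10 predictors: {large:.3f}")
```

Here plain R² nudges up with the larger model, but adjusted R² falls from about 0.627 to about 0.567, which is the signal that the extra predictors are not earning their place.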

P-value — is this coefficient significantly different from zero?

For each predictor's coefficient, the p-value tests the null hypothesis: 'This coefficient is zero (the predictor has no effect)'. A small p-value (typically <0.05) rejects the null — the coefficient is significantly different from zero.

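Interpretation: p-value = 0.03 on a coefficient means that, if the predictor truly had no effect, there would be a 3% chance of estimating a coefficient at least this far from zero (in either direction) through random variation alone. It does not mean there is a 3% chance that the predictor has no effect.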
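
One way to make that definition concrete is a permutation test: shuffle the outcome so that any link to the predictor is destroyed, refit, and count how often the shuffled slope is at least as far from zero as the real one. This is a swapped-in illustration, not what regression software normally reports (that is usually a t-test), and the data below are simulated.

```python
import random


def slope(x, y):
    """OLS slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Share of shuffled datasets whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        if abs(slope(x, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Simulated data with a real effect: true slope of 1.0 plus noise.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(60)]
y = [1.0 * xi + rng.gauss(0, 1) for xi in x]

p = permutation_p_value(x, y)
print(f"permutation p-value: {p:.4f}")
```

With an effect this strong, almost no shuffled dataset matches the observed slope, so the p-value is tiny. Note that the number says nothing about how large or how useful the effect is.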

What p-value doesn't tell you: the size of the effect, the practical importance, whether the coefficient is causally meaningful, whether the model is correct.

Multiple testing correction: when testing many coefficients simultaneously, the chance of at least one false positive inflates, even though each individual p-value is computed correctly. Use Bonferroni or false-discovery-rate adjustments for high-dimensional regressions.
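
Both adjustments are short enough to write out. The sketch below implements Bonferroni and the Benjamini-Hochberg false-discovery-rate procedure in plain Python; the function names are hypothetical and the p-values are made up.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for five coefficients in one model.
p_values = [0.001, 0.008, 0.012, 0.041, 0.20]
alpha = 0.05

raw_hits = sum(p < alpha for p in p_values)
bonf_hits = sum(p < alpha for p in bonferroni(p_values))
bh_hits = sum(p < alpha for p in benjamini_hochberg(p_values))
print(raw_hits, bonf_hits, bh_hits)
```

On these values four coefficients pass at 0.05 before adjustment, two after Bonferroni, and three after Benjamini-Hochberg. Bonferroni is the stricter of the two; the false-discovery-rate approach gives up some protection in exchange for keeping more true effects.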

The four combinations operators see

  • High R² + significant coefficients — model fits well and you know which predictors matter. Best case.
  • High R² + insignificant coefficients — model fits but you can't attribute the fit to specific predictors. Often caused by multicollinearity (correlated predictors). Action: refit with ridge regression or remove correlated predictors.
  • Low R² + significant coefficients — predictors matter but most variation is unexplained. Common in marketing data where outcomes have huge noise. Action: trust the directional finding, don't over-claim the magnitude.
  • Low R² + insignificant coefficients — model fits poorly and predictors don't matter. Time to rethink the model.
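
The third combination surprises people most, so here is a minimal simulation of it. Everything is assumed for illustration: the data are generated with a weak true slope of 0.2 and heavy noise, `simple_ols` is a hypothetical helper, and the p-value uses a normal approximation to the t distribution, which is reasonable at this sample size.

```python
import math
import random


def simple_ols(x, y):
    """Return slope, its standard error, and R² for a one-predictor OLS fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    std_err = math.sqrt(ss_res / (n - 2) / sxx)
    return slope, std_err, 1 - ss_res / ss_tot


rng = random.Random(7)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.2 * xi + rng.gauss(0, 1) for xi in x]  # weak effect, lots of noise

slope, std_err, r2 = simple_ols(x, y)
t_stat = slope / std_err
# Two-sided p-value via the normal approximation (assumption: large n).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))
print(f"R²={r2:.3f}  slope={slope:.3f}  t={t_stat:.1f}  p={p_value:.2g}")
```

The model explains only a few percent of the variation, yet the coefficient is clearly distinguishable from zero. Both statements are true at once: the predictor has a real effect, and it is a poor basis for predicting any single outcome.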

When R² is misleading

R² can mislead in either direction: it can be high when the model is wrong, and low when a real relationship exists:

  • Overfitting — too many predictors fit the training noise. Test on held-out data; the held-out R² is typically much lower.
  • Time series with trend — both outcome and predictors trend over time, producing high R² with no real relationship (spurious regression). Check for stationarity; difference if needed.
  • Influential outliers — a single observation can dominate R²; check Cook's distance.
  • Functional form mismatch — a linear fit to a nonlinear relationship can produce a low R² that understates how strongly the variables are related.
  • Heteroscedastic residuals — variance differs across the range of predictors; R² can still be computed, but the usual standard errors (and therefore the p-values) are unreliable. Use heteroscedasticity-robust standard errors.
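
The trend problem is the easiest of these to demonstrate. In the sketch below, two simulated series share nothing except that both drift upward over time; the names and numbers are assumptions for illustration. Regressing one on the other in levels gives a very high R², and differencing makes it collapse.

```python
import random


def r_squared(x, y):
    """R² of a one-predictor OLS fit, which equals the squared correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def difference(series):
    """Period-over-period change: removes a linear trend."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(1)
weeks = range(200)
# Independent noise around two unrelated upward trends (simulated).
spend = [2.0 * t + rng.gauss(0, 5) for t in weeks]
sales = [1.0 * t + rng.gauss(0, 5) for t in weeks]

r2_levels = r_squared(spend, sales)
r2_diffs = r_squared(difference(spend), difference(sales))
print(f"R² in levels: {r2_levels:.3f}   R² after differencing: {r2_diffs:.3f}")
```

The week-to-week changes in spend carry no information about the week-to-week changes in sales, which is the honest picture. The high R² in levels only reflects that both series depend on time.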

When p-value is misleading

P-values have well-known pathologies:

  • P-hacking — running many tests, reporting only those with p<0.05; vastly inflates false positive rate
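  • Multiple comparisons — if none of 20 tested coefficients has a real effect, expect about 1 false positive at α=0.05, with a roughly 64% chance of at least one (assuming independent tests); adjust with Bonferroni or a false-discovery-rate method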
  • Large sample sizes — with millions of observations, every coefficient becomes 'significant' even if practically meaningless. Look at effect sizes, not just p-values.
  • Confounded coefficients — significant p-value on a confounded predictor is not a causal claim
  • Multicollinearity — correlated predictors give unstable coefficients with inflated standard errors; p-values become unreliable
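
The large-sample point can be shown with arithmetic alone. For a single predictor, the t statistic is t = r·sqrt(n - 2)/sqrt(1 - r²), where r is the sample correlation, so a fixed, negligible correlation becomes "significant" once n is big enough. The sketch below uses a hypothetical helper and a normal approximation to the t distribution, which is an assumption that holds well at these sample sizes.

```python
import math


def p_value_for_correlation(r, n):
    """Two-sided p-value for a sample correlation r from n observations.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with a normal approximation
    to the t distribution (adequate for the large n used here).
    """
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return math.erfc(abs(t_stat) / math.sqrt(2))


r = 0.01  # the predictor explains 0.01% of the variance (r² = 0.0001)
for n in (1_000, 100_000, 10_000_000):
    print(f"n={n:>10,}  p={p_value_for_correlation(r, n):.3g}")
```

The effect size never changes, only the sample size. At a thousand rows the p-value is nowhere near 0.05; at a hundred thousand it is well under 0.01; at ten million it is effectively zero. That is why effect sizes and confidence intervals need to sit next to any p-value from a large dataset.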

RGM Experts Say

The operating rule we use: R² tells you whether the model is useful for prediction. P-values tell you whether to trust the individual coefficients for explanation. MMM models often have R² of 0.7–0.9 with mixed coefficient significance — that means the overall fit is good but you should treat individual channel coefficients with skepticism. Use both statistics, but never use either one alone.

Other regression diagnostics that matter

  • Residual plots — check for non-random patterns (curvature, fan shapes, outliers)
  • VIF (Variance Inflation Factor) — measures multicollinearity; VIF > 5–10 indicates problematic correlation
  • Durbin-Watson statistic — tests for autocorrelation in time-series residuals; should be near 2
  • AIC / BIC — penalized model fit measures; lower is better; useful for model comparison
  • F-statistic and overall p-value — tests whether the model as a whole has explanatory power
  • Confidence intervals on coefficients — more informative than p-values for effect-size interpretation
  • Cross-validation R² — out-of-sample R² is the honest measure of model quality
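
Two of these diagnostics are simple enough to compute by hand. The sketch below shows the Durbin-Watson statistic and, for the special case of a model with exactly two predictors, the VIF as 1/(1 - r²). The helper names and all data are made up for illustration; with more than two predictors the VIF requires regressing each predictor on all the others.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r²)."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r2 = s12 ** 2 / (s11 * s22)
    return 1 / (1 - r2)


# Made-up residuals: runs of same-sign errors vs. errors that flip every period.
persistent = [1.0, 1.2, 0.9, 1.1, -1.0, -1.1, -0.9, -1.2]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)

# Made-up channel spends that move almost in lockstep.
tv = [10, 12, 14, 16, 18, 20]
radio = [5, 6.2, 6.9, 8.1, 9, 10.1]
vif = vif_two_predictors(tv, radio)

print(f"DW persistent={dw_persistent:.2f}  DW alternating={dw_alternating:.2f}  VIF={vif:.0f}")
```

A Durbin-Watson value well below 2 points to positive autocorrelation (errors that persist), a value well above 2 to negative autocorrelation (errors that flip sign). The VIF here is far above the 5 to 10 range, so a regression would struggle to separate the two channels' effects.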

Sources

  1. Wasserman, All of Statistics; James, Witten, Hastie, and Tibshirani, An Introduction to Statistical Learning; Gelman and Hill, Data Analysis Using Regression