R-squared vs P-value in Regression — Which Statistic Answers Which Question

R-squared and p-value answer different questions in regression. R² (and adjusted R²) measures how much of the variation in the outcome is explained by the model — model fit. P-value measures whether a specific coefficient is significantly different from zero — coefficient significance. Marketers confuse these constantly. A high R² with insignificant coefficients means the model fits but you don't know why; significant coefficients with low R² means the predictors matter but most variation is unexplained.

Regression output displays both R-squared and p-values, and most marketers learn to nod at them without internalizing what each one actually measures. The confusion is costly — using R² to justify a model that has no significant coefficients, or dismissing a model with significant coefficients because R² seems low. The two statistics answer separate questions.

R-squared — how much variation does the model explain?

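R² is the proportion of variance in the dependent variable that the model explains: R² = 1 - (SS_residual / SS_total). For an ordinary least squares fit with an intercept, measured on the data it was fit to, it ranges from 0 (the model explains nothing beyond the mean) to 1 (the model explains everything).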
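The formula translates directly into code. This is a minimal sketch in plain Python that assumes you already have fitted values from some model. The `sales` and `fitted` numbers are made-up illustration data, not output from a real model.

```python
def r_squared(y, y_hat):
    """R² = 1 - SS_residual / SS_total for observed y and fitted y_hat."""
    if len(y) != len(y_hat) or len(y) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)  # variation around the mean
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # variation the model misses
    return 1 - ss_residual / ss_total


# Hypothetical observed sales and a hypothetical model's fitted values.
sales = [10.0, 12.0, 15.0, 11.0, 17.0]
fitted = [10.5, 12.5, 14.0, 11.5, 16.5]
print(round(r_squared(sales, fitted), 3))  # 0.941
```

Predicting the mean (13.0) for every observation gives exactly 0. On held-out data a model can do worse than the mean, and the same formula then returns a negative value, which is why the 0-to-1 range only holds in-sample for OLS with an intercept.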

Interpretation: R² = 0.65 means the model explains 65% of the variation in the outcome. The other 35% is unexplained — random variation, omitted variables, or noise.

What R² doesn't tell you: whether the model is correct, whether individual predictors matter, whether the residuals are well-behaved, whether the model will generalize.

Adjusted R²: penalizes R² for each additional predictor. Use adjusted R² when comparing models with different numbers of predictors, because plain R² never decreases when you add a predictor, even a useless one.
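A short sketch of the standard adjusted R² formula. Here `n_obs` is the number of observations and `n_predictors` is the number of predictors excluding the intercept. The 50-week scenario and the predictor counts are hypothetical.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), p excluding the intercept."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Same in-sample R² = 0.65 on 50 weeks of data, fit with 5 vs 20 predictors.
print(round(adjusted_r_squared(0.65, 50, 5), 3))   # 0.61
print(round(adjusted_r_squared(0.65, 50, 20), 3))  # 0.409
```

The same raw fit is worth much less when it took 20 predictors to reach it, which is exactly the comparison plain R² cannot make.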

P-value — is this coefficient significantly different from zero?

For each predictor's coefficient, the p-value tests the null hypothesis 'This coefficient is zero (the predictor has no effect)'. A small p-value (conventionally below 0.05) is grounds for rejecting the null and calling the coefficient significantly different from zero.

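Interpretation: a p-value of 0.03 on a coefficient means that if the predictor truly had no effect (a true coefficient of zero), there would be a 3% probability of getting an estimate at least this far from zero, in either direction for the usual two-sided test. It is not the probability that the predictor has no effect.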
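Regression software usually reports a t-test p-value, which relies on distributional assumptions about the residuals. A permutation test is a different technique, swapped in here because it implements the definition above almost literally: shuffle the outcome so the true coefficient is zero by construction, then count how often a shuffled slope is at least as large in absolute value as the real one. This is a sketch for simple (one-predictor) regression; the function names and the spend/sales data are hypothetical.

```python
import random


def slope(x, y):
    """OLS slope of y on x (simple regression with intercept)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided p-value: share of shuffles whose |slope| >= the observed |slope|.

    Shuffling y breaks any x-y link, so the shuffled slopes show what estimates
    look like when the true coefficient is zero (the null hypothesis).
    """
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(slope(x, y_shuffled)) >= observed:
            extreme += 1
    # +1 in numerator and denominator counts the observed data as one permutation.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical weekly data: spend (x) and sales (y) with a clear positive link.
spend = list(range(20))
sales = [2 * s + (1 if s % 2 else -1) for s in spend]
print(permutation_p_value(spend, sales))
```

With a strong relationship almost no shuffle matches the observed slope, so the p-value lands near its floor of 1 / 5001.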

What p-value doesn't tell you: the size of the effect, the practical importance, whether the coefficient is causally meaningful, whether the model is correct.

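Multiple testing correction: when you test many coefficients at once, the chance that at least one comes out 'significant' purely by chance grows with the number of tests. Use Bonferroni or false-discovery-rate adjustments (such as Benjamini-Hochberg) for high-dimensional regressions.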
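Both adjustments are short enough to write out. This sketch returns adjusted p-values in the input order: Bonferroni multiplies each p-value by the number of tests, and Benjamini-Hochberg (one common false-discovery-rate method) scales each sorted p-value by m / rank and then enforces monotonicity. The example p-values are invented.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.01, 0.02, 0.03, 0.20]  # hypothetical raw p-values for four coefficients
print([round(p, 3) for p in bonferroni(p_values)])          # [0.04, 0.08, 0.12, 0.8]
print([round(p, 3) for p in benjamini_hochberg(p_values)])  # [0.04, 0.04, 0.04, 0.2]
```

Bonferroni keeps one coefficient below 0.05 here, Benjamini-Hochberg keeps three. Bonferroni controls the chance of any false positive; Benjamini-Hochberg controls the expected share of false positives among the coefficients you call significant, so it is less conservative.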

The four combinations operators see

High R² + significant coefficients — model fits well and you know which predictors matter. Best case.
High R² + insignificant coefficients — model fits but you can't attribute the fit to specific predictors. Often caused by multicollinearity (correlated predictors). Action: refit with ridge regression or remove correlated predictors.
Low R² + significant coefficients — predictors matter but most variation is unexplained. Common in marketing data where outcomes have huge noise. Action: trust the directional finding, don't over-claim the magnitude.
Low R² + insignificant coefficients: the model fits poorly and there is no evidence that the predictors matter. Time to rethink the model.
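A small simulation of the low R² + significant coefficient case, with made-up parameters: a true slope of 0.3 buried in noise with standard deviation 1, over 2,000 observations. The p-value uses a normal approximation to the t distribution. That approximation is an assumption that only holds well for large samples like this one.

```python
import math
import random


def simple_ols(x, y):
    """Slope, its standard error, a normal-approximation two-sided p-value, and R²."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(ss_res / (n - 2) / sxx)
    t = b1 / se_b1
    # Two-sided normal tail: 2 * (1 - Phi(|t|)) == erfc(|t| / sqrt(2)).
    p = math.erfc(abs(t) / math.sqrt(2))
    return b1, se_b1, p, 1 - ss_res / syy


rng = random.Random(42)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
# True slope 0.3, but the outcome is dominated by noise with standard deviation 1.
y = [0.3 * xi + rng.gauss(0, 1) for xi in x]
b1, se, p, r2 = simple_ols(x, y)
print(f"slope={b1:.2f}  p={p:.1e}  R²={r2:.2f}")
```

Expect an R² near the theoretical 0.09 / 1.09 (about 0.08) alongside a p-value that is effectively zero: the direction of the effect is clear even though most of the variation is unexplained.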

When R² is misleading

R² can mislead about model quality, in either direction:

Overfitting — too many predictors fit the training noise. Test on held-out data; the held-out R² is typically much lower.
Time series with trend — both outcome and predictors trend over time, producing high R² with no real relationship (spurious regression). Check for stationarity; difference if needed.
Influential outliers — a single observation can dominate R²; check Cook's distance.
Functional form mismatch: a linear regression fit to a nonlinear relationship can show a low R² that understates how strongly the variables are actually related.
Heteroscedastic residuals — variance differs across the range of predictors; R² is still computed but standard errors are wrong.
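The influential-outlier case is easy to reproduce. This sketch uses hand-made numbers: nine weeks of sales with no trend, plus one hypothetical extreme week at (30, 30). It computes Cook's distance for a simple regression with the standard formula; the 4 / n cutoff for flagging points is a common rule of thumb, not a universal threshold.

```python
def fit_line(x, y):
    """Fit y = b0 + b1 * x by OLS; return residuals, R², mean of x, and Sxx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sum(e * e for e in residuals) / ss_total
    return residuals, r2, x_bar, sxx


def cooks_distance(x, y):
    """Cook's distance for each point in a simple regression (2 parameters)."""
    n, p = len(x), 2
    residuals, _, x_bar, sxx = fit_line(x, y)
    mse = sum(e * e for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage of this point
        distances.append(e * e / (p * mse) * h / (1 - h) ** 2)
    return distances


weeks = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sales = [5, 3, 6, 2, 5, 4, 6, 3, 5]          # hypothetical, no trend
weeks_out, sales_out = weeks + [30], sales + [30]  # add one extreme week

r2_without = fit_line(weeks, sales)[1]
r2_with = fit_line(weeks_out, sales_out)[1]
d = cooks_distance(weeks_out, sales_out)
flagged = [i for i, di in enumerate(d) if di > 4 / len(d)]
print(round(r2_without, 3), round(r2_with, 3), flagged)
```

The single extreme point lifts R² from under 0.01 to roughly 0.89, and it is the only point whose Cook's distance crosses the cutoff.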

When p-value is misleading

P-values have well-known pathologies:

P-hacking — running many tests, reporting only those with p<0.05; vastly inflates false positive rate
Multiple comparisons: if you test 20 coefficients whose true effects are all zero, expect about 1 false positive at α = 0.05 (20 × 0.05) by chance alone; adjust with Bonferroni or a false-discovery-rate method
Large sample sizes: with millions of observations, even tiny, practically meaningless effects become 'significant'. Look at effect sizes, not just p-values.
Confounded coefficients — significant p-value on a confounded predictor is not a causal claim
Multicollinearity — correlated predictors give unstable coefficients with inflated standard errors; p-values become unreliable
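Arithmetic makes the large-sample point concrete. This sketch uses the approximation SE ≈ residual SD / (predictor SD × √n) for a single slope, plus a normal approximation to the t distribution. The effect size of 0.01 and the unit standard deviations are assumed values chosen for illustration.

```python
import math


def approx_slope_test(beta, resid_sd, x_sd, n):
    """Approximate t-statistic, two-sided p-value, and R² for one slope.

    Uses SE(beta) ~= resid_sd / (x_sd * sqrt(n)) and a normal approximation
    to the t distribution, both reasonable only when n is large.
    """
    se = resid_sd / (x_sd * math.sqrt(n))
    t = beta / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal tail probability
    explained = (beta * x_sd) ** 2        # variance contributed by the predictor
    r2 = explained / (explained + resid_sd ** 2)
    return t, p, r2


for n in (1_000, 1_000_000):
    t, p, r2 = approx_slope_test(beta=0.01, resid_sd=1.0, x_sd=1.0, n=n)
    print(f"n={n:>9,}  t={t:5.2f}  p={p:.2g}  R²={r2:.4f}")
```

The same effect, explaining about 0.01% of the variance, goes from nowhere near significant at n = 1,000 (p about 0.75) to overwhelmingly significant at n = 1,000,000 (p on the order of 1e-23). Only the sample size changed.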

RGM Experts Say

The operating rule we use: R² tells you whether the model is useful for prediction. P-values tell you whether to trust the individual coefficients for explanation. MMM models often have R² of 0.7–0.9 with mixed coefficient significance — that means the overall fit is good but you should treat individual channel coefficients with skepticism. Use both statistics, but never use either one alone.

Other regression diagnostics that matter

Residual plots — check for non-random patterns (curvature, fan shapes, outliers)
VIF (Variance Inflation Factor) — measures multicollinearity; VIF > 5–10 indicates problematic correlation
Durbin-Watson statistic — tests for autocorrelation in time-series residuals; should be near 2
AIC / BIC — penalized model fit measures; lower is better; useful for model comparison
F-statistic and overall p-value — tests whether the model as a whole has explanatory power
Confidence intervals on coefficients — more informative than p-values for effect-size interpretation
Cross-validation R² — out-of-sample R² is the honest measure of model quality
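Two of these diagnostics are simple enough to sketch directly. The Durbin-Watson function below uses the standard formula. The VIF function covers only the two-predictor case, where each VIF reduces to 1 / (1 - r²) with r the correlation between the predictors; with more predictors, each VIF comes from regressing that predictor on all the others, which this sketch does not do. The input numbers are hypothetical.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); about 2 means no lag-1 autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def vif_two_predictors(x1, x2):
    """With exactly two predictors, each VIF equals 1 / (1 - r^2), r = corr(x1, x2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r = s12 / (s11 * s22) ** 0.5
    return 1 / (1 - r * r)


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))                      # 3.0
print(round(vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 2))  # 2.78
```

Alternating residuals push Durbin-Watson toward 4 (negative autocorrelation), while smoothly drifting residuals push it toward 0. The two example predictors have a correlation of 0.8, which gives a VIF of about 2.8, below the 5 to 10 range usually treated as problematic.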

