Visualizing R-Squared in Statistics
Prepared by Ian McLeod
R² is often referred to as the coefficient of determination, and its verbal interpretation is that it is the fraction of variation explained by the model. Our Demonstration illustrates this in the case of simple linear regression. A random sample of size n is generated from a bivariate normal distribution with correlation parameter ρ, means 0, and variances 1.

The first plot, graph 1, shows the data and the fitted regression line. Next, graph 2 shows the data and the fitted points. In graph 3, a rug is added on each of the y axes. The axis on the left, with the blue rug, shows the y values; the axis on the right, with the red rug, shows the fitted values ŷ. The plot label shows R² = r² = Sreg/Sy, where r is the correlation coefficient, Sy is the sum of the squared deviations of the y's from their mean, and Sreg is the sum of the squared deviations of the ŷ's from their mean. In graph 3, the rugs provide a visualization of the spread of the y's and ŷ's.

In statistics courses not based on calculus, it is often mentioned when discussing simple linear regression that R² = r² and that r² is the fraction of variation explained by the model. This Demonstration is especially useful for explaining the concept to these students. Notice how the plot changes dramatically as the parameter ρ is changed. You may also wish to experiment with different sample sizes by changing n, or with different random samples by changing the seed.
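The relation R² = r² = Sreg/Sy described above can also be checked numerically outside the Demonstration. The following is a minimal sketch in plain Python, not the Demonstration's own code: the function names `simulate` and `r_squared_two_ways` are hypothetical, and the sample is built from two independent standard normal draws, which is one standard way (assumed here) to obtain a bivariate normal pair with means 0, variances 1, and a given correlation.

```python
import math
import random


def simulate(n, rho, seed):
    """Draw n (x, y) pairs from a bivariate normal with means 0,
    variances 1, and correlation rho (hypothetical helper)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        # y has mean 0, variance 1, and correlation rho with x
        y = rho * x + math.sqrt(1.0 - rho * rho) * z
        xs.append(x)
        ys.append(y)
    return xs, ys


def r_squared_two_ways(xs, ys):
    """Return (Sreg/Sy, r^2) for the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sy: sum of squared deviations of the y's from their mean
    s_y = sum((y - y_bar) ** 2 for y in ys)

    # Least-squares slope and intercept, then the fitted values
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]

    # Sreg: sum of squared deviations of the fitted values from their mean
    fitted_bar = sum(fitted) / n
    s_reg = sum((f - fitted_bar) ** 2 for f in fitted)

    # Sample correlation coefficient r
    r = sxy / math.sqrt(sxx * s_y)
    return s_reg / s_y, r * r


xs, ys = simulate(n=50, rho=0.7, seed=1)
ratio, r2 = r_squared_two_ways(xs, ys)
print(f"Sreg/Sy = {ratio:.6f}, r^2 = {r2:.6f}")
```

The two printed values should agree up to floating-point rounding for any sample. Changing `rho`, `n`, or `seed` mirrors the controls described in the text.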

Download the CDF file to view the simulation with the free Wolfram CDF Player.