During my first few years of teaching, I treated the interpretation of r^2 as a fill-in-the-blank exercise:
"__________ percent of the variation in (response variable) can be accounted for by (can be explained by) the least squares regression line."
I told my students to memorize this interpretation and they would get it right on the test.
This is against everything I stand for in statistics education. I don’t want students to memorize and regurgitate; I want them to experience and think and reason and justify. I want them to know the “why”.
But... I continued to tell students to memorize.
And then I went to some workshops and found some activities that developed the conceptual idea behind interpreting r^2 (drawing squares on graph paper, the Starburst activity). And... I still continued to tell students to memorize. I couldn’t justify spending 60 minutes on one of these activities when only half of the students might internalize the process and we were covering only one small concept. It just wasn’t efficient enough.
Can You Guess My IQ?
So finally, after 13 years of teaching statistics, I tried to create an engaging, effective, and efficient context for getting students to understand the interpretation of r^2. The activity is called “Can You Guess My IQ?”
This activity takes my students about 25 minutes to complete. It is entirely student-directed, with a 5-minute wrap-up where I help students arrive at the specific words of the interpretation. I also show them this visual representation of the activity. Huge thank you to Steve Phelps (@giohio) for creating this incredible resource.
In addition to arriving at the proper interpretation, the activity also reviews:
How to use the LSRL to make predictions
How to calculate residuals
What makes the line of best fit the line of best fit
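For readers who want to see the arithmetic behind the interpretation, here is a minimal sketch in Python. The numbers are made up for illustration and are not the data from the activity. It assumes the usual comparison: how much smaller the sum of squared residuals gets when we predict with the LSRL instead of always guessing the mean of the response variable.

```python
# Made-up data for illustration only (NOT the data set used in the activity).
x = [1, 2, 3, 4, 5]   # hypothetical explanatory variable
y = [2, 4, 5, 4, 5]   # hypothetical response variable

def mean(values):
    return sum(values) / len(values)

def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares regression line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def r_squared(x, y):
    """Fraction of the variation in y accounted for by the LSRL."""
    intercept, slope = least_squares_line(x, y)
    y_bar = mean(y)
    # Residual = actual y minus the y predicted by the line.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse_line = sum(e ** 2 for e in residuals)        # squared error using the LSRL
    sse_mean = sum((yi - y_bar) ** 2 for yi in y)    # squared error guessing the mean
    return 1 - sse_line / sse_mean

print(f"r^2 = {r_squared(x, y):.2f}")
```

With these made-up numbers the squared error drops from 6 (guessing the mean every time) to 2.4 (using the line), so r^2 is 0.6: about 60 percent of the variation in the response is accounted for by the least squares regression line.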
A Connection to Multiple Regression
In real-world data analysis, it is very common for more than one explanatory variable to help predict a response variable.
Question: Which explanatory variable is the most helpful?
Answer: The one whose line of best fit explains the most variation in the response variable (the highest r^2 value).
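As a rough sketch of that comparison, the Python below computes r^2 separately for two candidate explanatory variables and picks the larger one. The variables x1 and x2 and all of the numbers are hypothetical, chosen only to make the arithmetic easy to follow. It relies on the fact that, with a single explanatory variable, r^2 is the square of the correlation.

```python
# Made-up data for illustration only; x1 and x2 are hypothetical explanatory variables.
y = [2, 4, 5, 4, 5]
candidates = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [1, 3, 2, 1, 3],
}

def centered(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def r_squared(x, y):
    """r^2 for a one-variable least squares line (the squared correlation)."""
    dx, dy = centered(x), centered(y)
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy ** 2 / (sxx * syy)

scores = {name: r_squared(x, y) for name, x in candidates.items()}
best = max(scores, key=scores.get)

for name, value in scores.items():
    print(f"{name}: r^2 = {value:.3f}")
print(f"Most helpful single explanatory variable: {best}")
```

Here x1 comes out ahead (r^2 of 0.6 versus 0.375 for x2), so on its own it accounts for more of the variation in y.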
It is also common to examine what happens to the r^2 value as additional explanatory variables are added to a model. Each time an explanatory variable is added, the increase in the r^2 value tells us how much more of the variability in the response variable the new model accounts for (see 2014 AP Exam FR#6).
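To make the idea of "how much more" concrete, here is a small sketch, again with made-up numbers and hypothetical variables x1 and x2. It fits a model with x1 alone, then a model with both x1 and x2, and reports the change in r^2. The two-variable fit solves the least squares equations by hand so the example needs nothing beyond plain Python; a statistics package would normally do this step.

```python
# Made-up data for illustration only; x1 and x2 are hypothetical explanatory variables.
x1 = [1, 2, 3, 4, 5]
x2 = [1, 3, 2, 1, 3]
y = [2, 4, 5, 4, 5]

def centered(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def r_squared_one(x, y):
    """r^2 for the model with a single explanatory variable."""
    dx, dy = centered(x), centered(y)
    return dot(dx, dy) ** 2 / (dot(dx, dx) * dot(dy, dy))

def r_squared_two(x1, x2, y):
    """r^2 for the least squares model with two explanatory variables."""
    d1, d2, dy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    s1y, s2y, syy = dot(d1, dy), dot(d2, dy), dot(dy, dy)
    det = s11 * s22 - s12 ** 2  # would be zero if x1 and x2 were perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det  # coefficient on x1
    b2 = (s11 * s2y - s12 * s1y) / det  # coefficient on x2
    explained = b1 * s1y + b2 * s2y     # variation accounted for by the model
    return explained / syy

r2_x1_only = r_squared_one(x1, y)
r2_both = r_squared_two(x1, x2, y)
print(f"x1 only:   r^2 = {r2_x1_only:.2f}")
print(f"x1 and x2: r^2 = {r2_both:.2f}")
print(f"increase:  {r2_both - r2_x1_only:.2f}")
```

For this made-up data, r^2 rises from 0.60 to 0.75, so adding x2 accounts for an additional 15 percent of the variability in y.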