Gail Burrill is a member of the Program for Mathematics Education at Michigan State University and a longtime statistics teacher at both the high school and college levels. She was President of the National Council of Teachers of Mathematics and of the International Association for Statistical Education and is currently President of the Council of the Presidential Awardees in Mathematics. Gail is a Fellow of the ASA and a T-Cubed instructor. She has published numerous articles and books on teaching and learning statistics.
Technology has been instrumental in shifting statistics from a formal mathematical approach to one driven by data. As technology continues to evolve (think Symbolab, Photomath, and ChatGPT), the question of what we should be teaching becomes increasingly relevant. What should students learn and understand about mathematics and statistics? How much of what we teach is still important?
A Look Back to the Past
My statistics teaching career goes back to the days when calculators performed only arithmetic operations. Back then, we taught statistics with an electronic calculating machine that divided by repeated subtraction; students were amazed that when you divided by zero, the machine kept subtracting (with a clunk for every subtraction) until it was unplugged. Measures of center and variability were primarily the mean and standard deviation, because the machines could not sort to find medians and the process was tedious.
Formulas were contrived to make computations easier. For example, the standard deviation by definition is

$$s = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n-1}},$$

but to make the work manageable, we used the formula

$$s = \sqrt{\frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}}.$$
(It was a nice algebraic exercise to prove these were equivalent.) Early statistical calculators such as the SHARP EL-520WB even had second register keys for Σxy, Σx, Σy, Σx², Σy² (figure 1) for the parts of the formulas for measures such as the standard deviation and the correlation coefficient. Needless to say, the calculations did not support sense-making, and many students never really understood the concepts.
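For anyone who wants to revisit that algebraic exercise, here is a sketch of the equivalence. Expanding the square and substituting $\bar{x} = \frac{1}{n}\sum x_i$ gives

$$\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n},$$

and dividing by $n-1$, then multiplying numerator and denominator by $n$, produces the computational form.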
Moving to the Future
Today the statistical measures you are likely to need are reported with a keystroke; many of us still have students work several examples from the definition to build conceptual understanding before turning to technology to do the work. The point of this reflection is that some things we taught only because they were the only way to get what we needed, and today technology does many of them quickly and easily. So the question is: what are we teaching today only because we had to teach it when the technology was not available? The question is now critical because we are being asked to include working with large data sets in what we teach. That means the curriculum has to expand to include data cleaning; data-management moves such as filtering, merging, and creating hierarchies; new techniques for interrogating and visualizing data, such as decision trees and choropleth maps; algorithmic predictive modeling; and computational thinking. Given the limited time we can claim for statistics in the curriculum, something has to give.
The following suggestions offer some candidates for possible retirement; feel free to think of your own, or to disagree.
Consider the formulas for the standard deviation of a sample and of a population. In the past, samples were small by necessity; today, much of our work is done with large samples, where the difference between dividing by n and by n − 1 is negligible. It would remove a lot of confusion if there were only one standard deviation key on the calculator (and the question of why we cared about small samples could be left for extension work).
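As a minimal sketch (in Python, with simulated data standing in for a real sample), here is how quickly the gap between the two divisors becomes negligible:

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

for n in (5, 50, 5000):
    data = [random.gauss(100, 15) for _ in range(n)]
    s_pop = statistics.pstdev(data)   # divides by n
    s_samp = statistics.stdev(data)   # divides by n - 1
    print(f"n = {n:4d}   divide by n: {s_pop:.4f}   "
          f"divide by n-1: {s_samp:.4f}   gap: {s_samp - s_pop:.4f}")
```

At n = 5 the gap is visible; by n = 5000 it has all but vanished.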
How about histograms? Some may not remember the tedious construction of histograms by hand, with "rules" for bin widths, before it was possible to change the bins by dragging or clicking. Given that it is now possible to create dot plots with thousands of values, do we need histograms? My students find it easier to think about a distribution when looking at the actual values rather than at bars that contain some indeterminate number of values.
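A sketch of the comparison, assuming simulated rounded values and matplotlib (any dot-plot-capable tool would do):

```python
import random
from collections import Counter
import matplotlib.pyplot as plt

random.seed(2)
data = [round(random.gauss(50, 5)) for _ in range(2000)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharex=True)

# Histogram: each bar hides an indeterminate number of values
ax1.hist(data, bins=15)
ax1.set_title("Histogram")

# Dot plot: one marker per observation, stacked by count
counts = Counter(data)
for value, count in counts.items():
    ax2.scatter([value] * count, range(1, count + 1), s=4, color="tab:blue")
ax2.set_title("Dot plot")

plt.show()
```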
Random number tables can help students understand the notion of randomness and that streaks happen, but why use tables when a few keystrokes will generate as many random numbers as one could want?
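For instance, in Python those few keystrokes look like this (the seed is there only so a class can reproduce the same stream):

```python
import random

random.seed(3)  # fix the stream so a class discussion is reproducible

# A few keystrokes replace the random number table:
digits = [random.randint(0, 9) for _ in range(100)]
print(digits)

# Streaks still happen: find the longest run of a repeated digit
longest = run = 1
for prev, cur in zip(digits, digits[1:]):
    run = run + 1 if cur == prev else 1
    longest = max(longest, run)
print("longest run:", longest)
```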
Do we really need all of the formal tests when it is so easy to use bootstrapping and randomization techniques?
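As one illustration, a percentile bootstrap confidence interval for a mean takes only a few lines in Python; the sample here is made up:

```python
import random

random.seed(4)

# Hypothetical sample; in practice this would be class data
sample = [4.1, 5.6, 3.8, 6.2, 5.0, 4.7, 5.9, 4.4, 5.3, 6.0]

# Bootstrap: resample with replacement, record the statistic each time
boot_means = []
for _ in range(10_000):
    resample = random.choices(sample, k=len(sample))
    boot_means.append(sum(resample) / len(resample))

# Percentile 95% confidence interval for the mean
boot_means.sort()
lo = boot_means[int(0.025 * len(boot_means))]
hi = boot_means[int(0.975 * len(boot_means))]
print(f"95% bootstrap CI for the mean: ({lo:.2f}, {hi:.2f})")
```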
In an era of large samples, should p-values less than 0.05 still be considered the gold standard for judging the merits of an experiment? And how do we deal with statistical versus practical significance?
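A simulation makes the tension concrete. In this sketch the two groups differ by a practically trivial 0.5 units against a standard deviation of 15, yet with a large enough sample the p-value of a two-sample z-test typically falls below 0.05 anyway (the data and effect size are invented for illustration):

```python
import math
import random
import statistics

random.seed(5)

def p_value_for(n):
    # Two groups whose true means differ by a trivial 0.5 units (SD = 15)
    a = [random.gauss(100.0, 15) for _ in range(n)]
    b = [random.gauss(100.5, 15) for _ in range(n)]
    # Two-sample z statistic (a fine approximation for large n)
    se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
    z = (statistics.mean(b) - statistics.mean(a)) / se
    # Two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:9,d}   p = {p_value_for(n):.4f}")
```

The effect never stops being trivial; only the p-value changes.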
As we think about what students of tomorrow will need to navigate a world of data, we have to make hard choices about what we teach. We no longer use the geometric methods of ancient cultures to solve linear and quadratic equations, and we no longer teach students to calculate with the characteristics and mantissas of logarithms. What statistical ideas should we be teaching today's students for tomorrow's world?