
Would This Get Credit? AP Statistics Exam 2024 #6

Writer: Kathy Petko




Kathy Petko was a Table Leader for this year’s Question #6, the Investigative Task. She teaches AP Statistics at Palatine High School in Palatine, Illinois, and General Education Statistics at McHenry County College in Crystal Lake, Illinois. Kathy has been an AP Statistics Reader since 2012 and a Table Leader since 2020. She has written multiple choice questions for the AP Statistics Exam and has also served on the AP Statistics Instructional Design Team, where she tagged items for use in AP Classroom.

 

I really like the Investigative Task because it begins by requiring the students to show what they know about a topic, then it holds their hand while teaching them something new, and finally asks them to apply their newfound knowledge to something they have not previously been taught. This question assesses the concepts of inference, describing the shape of a distribution, and identifying outliers of a distribution in the context of Julio and his whistles. It then goes on to ask the student to calculate a new statistic, plot it correctly on a graph, and interpret the skewness of a distribution based on their results. 


The Question - 2024 #6 and Rubric

A company sells a certain type of whistle. The price of the whistle varies from store to store. Julio, a statistician at the company, wants to estimate the mean price, in dollars ($), of this type of whistle at all stores that sell the whistle.

 

Part (a)

(a) (i) Identify the appropriate inference procedure for Julio to use.

 

WOULD THIS GET CREDIT?


When we name an inference procedure for proportion(s) or mean(s), we should mention the following four things:

 

1.     Proportions or means?

2.     z (for proportions) or t (for means)?

3.     One or two samples (groups)?

4.     Confidence interval or significance test?
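The four questions work like a small decision procedure. As a hypothetical illustration (the function name and argument values below are invented, not part of the rubric), this Python sketch builds a procedure name from the four answers; it covers only the one- and two-sample cases in the checklist and ignores paired data.

```python
def name_inference_procedure(parameter: str, samples: int, goal: str) -> str:
    """Name a procedure from the four checklist answers (hypothetical helper).

    parameter: "proportion" or "mean"   (question 1)
    samples:   1 or 2                   (question 3)
    goal:      "interval" or "test"     (question 4)
    """
    if parameter not in ("proportion", "mean"):
        raise ValueError("parameter must be 'proportion' or 'mean'")
    if samples not in (1, 2):
        raise ValueError("samples must be 1 or 2")
    if goal not in ("interval", "test"):
        raise ValueError("goal must be 'interval' or 'test'")

    # Question 2 follows from question 1: z for proportions, t for means.
    dist = "z" if parameter == "proportion" else "t"
    # Question 3: one sample targets a single parameter; two samples, a difference.
    size = "one-sample" if samples == 1 else "two-sample"
    target = f"a {parameter}" if samples == 1 else f"a difference in {parameter}s"
    return f"{size} {dist} {goal} for {target}"


print(name_inference_procedure("mean", 1, "interval"))
# one-sample t interval for a mean
```

Julio wants to *estimate* a single mean from one sample, so every one of the four answers shows up in the name: mean, t, one sample, interval.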

 

In this particular problem, the details of the parameter were requested in part (a-ii), so they were not required here in part (a-i).

 

This part of question 6 was missed quite frequently. By far the most common student error here was to identify a “test” instead of an “interval.” Julio intended to “estimate” the mean price of the whistles, so a confidence interval was needed. Some students also did not specify how many samples were used, or wrote the plural “means,” implying two samples rather than one. It was also common for students to omit the t or to mistakenly write z for the distribution.

 

Teaching tips:

  • Be sure that students know that confidence intervals and significance tests are types of inference procedures.  Many students do not know what is being asked when they are instructed to name an inference procedure. 

  • When learning the different inference procedures, require students to name the procedure every time they perform the procedure.

  • When naming an inference procedure, have students ask themselves four questions:

    1.     Proportions or means?

    2.     z (for proportions) or t (for means)?

    3.     One or two samples (groups)?

    4.     Confidence interval or significance test?


 (ii) Describe the parameter for the inference procedure you identified in part (a-i) in context.

 

WOULD THIS GET CREDIT?

Part (a-ii) requires the student to identify the mean as the correct parameter. They also need to provide sufficient context for the parameter by referencing the population (“all stores that sell the whistle”, or by using either the word “true” or the symbol μ), and the variable of interest (“the price of this type of whistle”).

 

Many students did not know what this prompt was asking for. Students often did not know what a parameter is and how to describe it in context. Students sometimes listed conditions for an inference procedure, or described the process of collecting data and calculating a mean.

 

Teaching tips:

  • Always require students to answer free response questions using the context of the question.

  • Have students use the words provided in the stem of the question “mean price, in dollars ($), of this type of whistle at all stores that sell the whistle” when describing a parameter.

  • Be sure that students know the difference between population parameters and sample statistics (as well as the corresponding symbols), and can clearly describe the correct one, in context, for a given situation.


Part (b)

Julio called the managers of 20 randomly selected stores that sell the whistle and recorded the price of the whistle at each store. Following is a dotplot of Julio’s data.              

 The summary statistics for Julio’s data are shown in the following table.

(b) Julio wants to examine some characteristics of the distribution of the sample of whistle prices.     

(i) Describe the shape of the distribution of the sample of whistle prices. Justify your response using appropriate values from the summary statistics table.


Response 1:

The distribution of the sample of whistle prices is left skewed.

 

Response 2:

The distribution of the sample of whistle prices is right skewed because there is more data on the left side of the dotplot and there is a long tail on the right.

 

Response 3:

The distribution of the sample of whistle prices appears skewed right, since the mean is higher than the median.

 

Response 4:

The distribution is right skewed because the difference between the maximum and the median is greater than the difference between the minimum and the median.

 

Response 5:

The distribution of the sample of whistle prices appears approximately symmetric, because the mean and median are relatively close together.


WOULD THIS GET CREDIT?

This part of the question was graded as two components:

(1)   Indicates the correct shape (skewed right or approximately symmetric)

(2)   Justifies the shape using summary statistics

 

Response 1 does not satisfy either component because it mistakenly describes the distribution as left skewed rather than right skewed. It also does not justify using values from the summary statistics table, as requested in the prompt.

 

Response 2 satisfies the first component, but not the second. It correctly describes the distribution as right skewed, but references the dotplot rather than using appropriate values from the summary statistics table, as requested in the prompt.

 

Response 3 satisfies both components. It correctly describes the distribution of the sample as skewed right and justifies this correctly by stating that the mean is higher than the median.

 

Response 4 satisfies both components because it correctly describes the distribution of the sample as right skewed, and gives an acceptable rationale for this conclusion based on values from the summary statistics table.

 

Response 5 is acceptable in this situation. Although the distribution is not described as right skewed, this response is adequate here because the values of the mean and median are relatively close to each other, leading to the description of approximately symmetric.
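The two justifications that earned the second component (Responses 3 and 4) are simple comparisons of summary statistics. Here is a minimal Python sketch of those two checks; the function name is hypothetical, and the numbers in the test are made up rather than taken from Julio’s table.

```python
def right_skew_evidence(minimum: float, median: float, maximum: float,
                        mean: float) -> dict[str, bool]:
    """Check the two summary-statistic justifications accepted in part (b-i)."""
    return {
        # Response 3: a long right tail pulls the mean above the median.
        "mean_above_median": mean > median,
        # Response 4: the maximum is farther from the median than the minimum is.
        "upper_spread_larger": (maximum - median) > (median - minimum),
    }
```

Note that the checks only report a direction. They do not judge how *close* the mean and median are, which is the judgment that made Response 5’s “approximately symmetric” acceptable.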


Teaching tips:        

  • It is important that students know the difference between a left skewed distribution and a right skewed distribution. I tell my students that a left skewed distribution looks like the toes on your left foot, and a right skewed distribution looks like the toes on your right foot. 

  • Calculating summary statistics for symmetric as well as skewed distributions helps students to learn the characteristics of distributions with different shapes.

  • Stress to students that they must read the directions in the prompt carefully. When a graph is given, this does not necessarily mean that they should describe the entire distribution (shape, center, variability, unusual points, etc.). This prompt only asked the student to describe the shape.


     (ii) Using the 1.5 x IQR rule, determine whether there are any outliers in the sample of whistle prices. Justify your response.


 WOULD THIS GET CREDIT?

Although the formula for outliers is shown in the full credit response above, it was not required to earn full credit for this part.

 

Using the median in place of the quartiles in the 1.5 x IQR calculation is incorrect and produces incorrect values for the fences, which should be Q1 - 1.5 x IQR (lower) and Q3 + 1.5 x IQR (upper). Some students miscalculated the outlier criteria in other ways as well, which led them to mistakenly identify one or more outliers.
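The fences are built from the quartiles, not the median. A short Python sketch, with hypothetical function names; the quartiles, minimum, and maximum would come from a summary statistics table, and the values in the test are made up. When only summary statistics are given, checking the minimum and maximum is enough: if neither lies beyond a fence, no other value can.

```python
def outlier_fences(q1: float, q3: float) -> tuple[float, float]:
    """Lower and upper fences from the 1.5 x IQR rule (quartiles, not the median)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def has_outliers(minimum: float, maximum: float, q1: float, q3: float) -> bool:
    """Summary-statistics version: only the extremes can fall beyond a fence."""
    low, high = outlier_fences(q1, q3)
    return minimum < low or maximum > high


def outliers_in(data: list[float], q1: float, q3: float) -> list[float]:
    """Full-data version: list every value strictly beyond a fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]
```

Passing the quartiles in, rather than computing them, is deliberate: calculators and software do not all use the same quartile method, so the values should match whatever the table reports.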


Teaching tips:        

  • It is important that students know the correct formula for the outlier criteria, and practice using it when all sample data is given, and also when only summary statistics are given. 

  • Remind students to double-check the values they plug into formulas, as they often miscopy a value or use an incorrect value from a table of summary statistics.


Part (c)

It can often be difficult to determine whether the distribution of sample data is skewed by looking at a graph of the data and the summary statistics, particularly when the sample size is small. Thus, statisticians sometimes measure how skewed a data set is. One such measure is Pearson’s coefficient of skewness, which is calculated using the following formula.

 (c) (i) Calculate Pearson’s coefficient of skewness for Julio’s sample of 20 whistle prices. Show your work.


WOULD THIS GET CREDIT?

Supporting work must be shown for this calculation to get credit.

 

While many students calculated Pearson’s coefficient of skewness correctly, some students miscalculated it, leading to an error in part (c-ii).
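Assuming the formula in the question is the median-based form of Pearson’s coefficient of skewness, 3(x̄ - median)/s, the calculation can be sketched in Python as follows. The function names are hypothetical, and the numbers in the test are made up rather than taken from Julio’s table.

```python
import statistics


def pearson_skewness(mean: float, median: float, sd: float) -> float:
    """Median-based Pearson coefficient of skewness: 3 * (mean - median) / sd.

    Assumes this is the form of the formula given in the question stem.
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / sd


def pearson_skewness_from_data(data: list[float]) -> float:
    """Same statistic from raw data, using the sample standard deviation (n - 1)."""
    return pearson_skewness(statistics.mean(data),
                            statistics.median(data),
                            statistics.stdev(data))
```

For example, `pearson_skewness(5.0, 4.0, 1.5)` returns 2.0. Substituting the sample size for the mean, as in Response 1 of part (c-ii), produces a value far outside the range of the graph, which is a useful signal to recheck the inputs.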


Teaching tips:        

  • Remind students to be careful when using their calculator to simplify an expression.  Sometimes they may type an incorrect value, miscopy a value from a table, or forget a parenthesis, resulting in an incorrect answer.

  • Students should always justify how they come up with their answers by showing their work.


The following graph shows conclusions that can be made about the shape of the distribution of sample data based on Pearson’s coefficient of skewness and sample size.

(ii) Indicate the value of the Pearson’s coefficient of skewness you calculated in part (c-i) for the appropriate sample size by marking it with an “X” on the preceding graph.


Response 1:

Response 2:

Response 3:

WOULD THIS GET CREDIT?

Response 1 places the X incorrectly because of an error in calculating Pearson’s coefficient of skewness. The student accidentally substitutes the sample size of 20 where the sample mean belongs (the two values are next to each other in the table). This results in a value of approximately 61.029, which the student then incorrectly plots as 0.61 at the correct sample size of 20. As a result, the student places the X in the shaded area of the graph rather than the unshaded area.

 

Response 2 places the X at the correct Pearson’s coefficient of skewness of 0.949, but the student thinks that the X needs to be placed on the curve rather than at a sample size of 20, resulting in an incorrect answer. Because the X is not in either of the labeled areas, the student is left unsure what the graph tells them.

 

Response 3 is a correct solution.  It correctly places the X at a Pearson’s coefficient of skewness of 0.949 and a sample size of 20, clearly putting the X into the unshaded area of the graph.


Teaching tips:        

  • Stress to students that carefully reading directions is very important, particularly in the Investigative Task where they will be asked to do something new. I suggest that they underline and/or circle important information as they read the prompt to make it easy to find later if needed.

  • Remind students to always look at the labels on the axes when presented with a graph.  Oftentimes, that gives valuable information about the graph (like the sample size on the vertical axis in this problem). Many students did not take the sample size into account when placing their X on the graph, resulting in an incorrect placement of their X.


Part (d)

(d) Consider your work in part (c).

 

(i) What should you conclude about the shape of the distribution of the sample of whistle prices? Justify your response.

 

WOULD THIS GET CREDIT?

To get credit for this part, the student needs to write a response that is consistent with the placement of their X in part (c-ii), referencing either the location of their X on the graph or their calculated value of Pearson’s coefficient of skewness.

  • If their X was placed in the unshaded region, they need to say that the shape of the distribution is “strongly skewed”. 

  • If their X was placed (incorrectly) inside the shaded region, they need to say that the shape of the distribution is “approximately symmetric”.

  • If their X was placed (incorrectly) on the curve, they need to say that the sample is skewed, but neither strongly skewed nor approximately symmetric, or indicate that it is not possible to choose between strongly skewed and approximately symmetric.
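The scoring logic above is essentially a lookup from where the X landed to the conclusion that is consistent with it. A hypothetical Python sketch (the enum and function names are invented for illustration; the conclusion wording follows the bullets above and the “skewed, but not strongly skewed” phrasing used in part (d-ii)):

```python
from enum import Enum


class Placement(Enum):
    SHADED = "shaded"        # region labeled approximately symmetric
    UNSHADED = "unshaded"    # region labeled strongly skewed
    ON_CURVE = "on curve"    # boundary between the two regions


def consistent_conclusion(placement: Placement) -> str:
    """Conclusion for (d-i) that is consistent with the X placed in (c-ii)."""
    conclusions = {
        Placement.UNSHADED: "strongly skewed",
        Placement.SHADED: "approximately symmetric",
        Placement.ON_CURVE: "skewed, but not strongly skewed",
    }
    return conclusions[placement]
```

Credit depends on this consistency, not on the X being in the right place: an incorrect placement followed by the matching conclusion can still earn this part.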


Teaching tips:        

  • We cannot “see” a normal distribution. Students may see the bell-shaped symmetric curve on the graph and incorrectly conclude that the distribution of the sample data is normal.

  • Students should use the words that are given on the graph. They shouldn’t change the words to other words with different meanings like “most likely skewed”.



Julio’s inference procedure in part (a-i) needs one of the following requirements to be satisfied to verify the normality condition.

  • The sample size is greater than or equal to 30.

  • If the sample size is less than 30, the distribution of the sample data is not strongly skewed and does not have outliers.

 (ii) Using your response to (d-i) and the preceding requirements, is the normality condition satisfied for Julio’s data? Explain your response.


The student’s response to this part of the question must pull together their answers from parts (b), (c) and (d-i) to form a correct interpretation of their results. Individual student responses may vary based on their answers to previous parts, but must agree with their description of the distribution in part (d-i).

 

If the X was placed inside the shaded area, and the distribution was identified in (d-i) as “approximately symmetric”:

Response 1:

The distribution of sample data is approximately symmetric (not strongly skewed), and does not have outliers.

 

Response 2:

Yes, the normality condition is met. Since the sample size is 20 which is less than 30, the distribution of the sample data is approximately symmetric (not strongly skewed), and does not have outliers.

 

If the X was placed on the curve, and the distribution was identified in (d-i) as “skewed, but not strongly skewed”:

Response 3:

No, the normality condition is not met. The sample size is 20 < 30, the distribution is strongly skewed, and does not have any outliers.

 

Response 4:

Yes, the normality condition is met. The sample size is 20, which is less than 30, the distribution is not very skewed and there are no outliers. 


If the X was placed in the unshaded area and the distribution was described in (d-i) as “strongly skewed”:  

Response 5:

No, the normality condition is not met. The sample size is 20, the distribution is strongly skewed, and there are no outliers.

 

Response 6:

No, the normality condition is not met. The sample size is 20 (and 20 is less than 30), there are no outliers, and the distribution is strongly skewed. 

 

WOULD THIS GET CREDIT?

Response 1 does not satisfy all components because it does not say if the normality condition is satisfied or not.

 

Response 2 satisfies all components: it places Julio’s data under the second requirement by noting that 20 < 30, and it states that the distribution is not strongly skewed and has no outliers.

 

Response 3 does not satisfy all components. It says “no”, and correctly says that the sample size is less than 30, but says in (d-ii) that the distribution is “strongly skewed”, when in part (d-i), they said that the distribution is “skewed, but not strongly skewed”.  This is incorrect because the two descriptions of the distribution are not consistent.

 

Response 4 satisfies all components. It says “yes,” states that the sample size of 20 is less than 30, restates its description of the distribution as “not very skewed,” and goes on to say that there are no outliers.

 

Response 5 correctly says that the normality condition is not met, but it states that the sample size is 20 without comparing it to the required sample size of 30, so it loses one component.

 

Response 6 would get full credit. It correctly states that the normality condition is not met.  It also correctly says that the sample size of 20 is less than 30, there are no outliers, and the distribution is “strongly skewed”, a statement consistent with their description of the distribution in part (d-i).
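The two requirements listed in part (d) combine into a single rule, and a full-credit answer states the verdict first and then each piece of evidence. A minimal Python sketch, with hypothetical function names, that mirrors the structure of Response 6:

```python
def normality_condition_met(n: int, strongly_skewed: bool, has_outliers: bool) -> bool:
    """Normality condition as stated in part (d).

    Met if n >= 30; otherwise the sample data must be not strongly skewed
    and have no outliers.
    """
    if n >= 30:
        return True
    return not strongly_skewed and not has_outliers


def normality_answer(n: int, strongly_skewed: bool, has_outliers: bool) -> str:
    """Start with Yes/No, then compare n to 30 explicitly and describe the shape."""
    met = normality_condition_met(n, strongly_skewed, has_outliers)
    verdict = ("Yes, the normality condition is met." if met
               else "No, the normality condition is not met.")
    size = f"The sample size is {n}, which is {'at least' if n >= 30 else 'less than'} 30"
    shape = "strongly skewed" if strongly_skewed else "not strongly skewed"
    outliers = "has outliers" if has_outliers else "has no outliers"
    return f"{verdict} {size}, the distribution is {shape}, and it {outliers}."


print(normality_answer(20, strongly_skewed=True, has_outliers=False))
# No, the normality condition is not met. The sample size is 20, which is less
# than 30, the distribution is strongly skewed, and it has no outliers.
```

The explicit “less than 30” comparison is the piece Response 5 left out.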

 

Teaching tips:

  • Although the investigative task is intended to have students do something they have not explicitly been taught before, not all parts of it are difficult! Usually the first part or two ask the student to do something that is familiar to them.

  • When students are presented with a yes/no question, the first word of their response should be “Yes” or “No”. Some students start trying to justify their response without answering this necessary part of the question.

  • The last part of the investigative task often asks students to pull together things they have done in previous parts of the question, or to interpret what they have done. Have students leave themselves enough time to make the necessary connections among the different parts of the question.

 

I really enjoyed scoring this question! It was so nice to see a good number of students succeed on the investigative parts! I plan to use this problem in class when I teach my students about free response question 6. I believe my students will consider it doable, and it is a great introduction to the nature of the investigative task.

 
