Today’s ridiculous project – a Google Sheet for AP-style computer output

Working through my fourth candy assignment today, we ran into a problem: we could not easily create AP Stat style “computer output” for regression with student data. We do not have licenses for Minitab or JMP, R is outside my sphere (and the output looks different anyway), and GeoGebra, StatKey, and other free options I have tried give output that is different, confusing, or incomplete. I thought NCTM’s Core Math Tools did what I needed, but I misread the output when planning.

I have since learned that StatCrunch can produce very similar output (and is nice enough that I may get subscriptions for it next year and use it more thoroughly with my students), but I did not know that tool this morning.

So what did I do? Made a Google Sheet that you can paste data into OR point at another Google Sheet (URL, tab name, etc.), and it will generate computer output for you. I may not ever use it again now that I have figured out StatCrunch, but for data already in a Google Sheet this is quite possibly easier. I put it here for anybody who may find it useful.
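For anyone who would rather script it than use a spreadsheet, here is a minimal Python sketch of the same idea. It is not the formulas behind the sheet: the function names, the column layout (modeled loosely on Minitab-style output), and the sample data are all assumptions made up for illustration. The p-value comes from a simple numerical integration of the t density, so treat the last digit with caution.

```python
import math

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for a t statistic, using Simpson's rule on the t density."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = pdf(0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                   # P(0 < T < |t|)
    return max(0.0, 1 - 2 * area)

def regression_output(x, y):
    """Least-squares fit of y on x, formatted like AP-style computer output."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    s = math.sqrt(sse / df)                # standard deviation of the residuals
    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(1 / n + mx ** 2 / sxx)
    r_sq = 1 - sse / syy
    r_sq_adj = 1 - (1 - r_sq) * (n - 1) / df
    lines = [f"{'Predictor':<10}{'Coef':>10}{'SE Coef':>10}{'T':>8}{'P':>8}"]
    for name, coef, se in [("Constant", intercept, se_intercept), ("x", slope, se_slope)]:
        t = coef / se
        lines.append(f"{name:<10}{coef:>10.4f}{se:>10.4f}{t:>8.2f}{t_two_sided_p(t, df):>8.3f}")
    lines.append(f"S = {s:.4f}   R-Sq = {r_sq:.1%}   R-Sq(adj) = {r_sq_adj:.1%}")
    return "\n".join(lines)

# Made-up data, purely for illustration.
print(regression_output([0, 1, 2, 3], [1, 3, 4, 6]))
```

The sketch assumes the points do not fall exactly on a line (otherwise the standard errors are zero and the t statistics are undefined).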


Should statistics be taught longitudinally?

I’m working my way through my AP Statistics Candy Review activities and I really, really like them. And they are making me think about how this class is structured in my text and, I assume, most texts.

Here is the basic structure of MANY of the chapters of my text:

  • Big Topic
    • A bit about it generally
    • How it applies to categorical data
    • How it applies to quantitative data

As specific examples, we have:

  1. Chapter 1 – Exploring Data
    1. Analyzing categorical data
    2. Analyzing quantitative data
    3. Describing quantitative data
  2. Chapter 7 – Sampling distributions
    1. What is it?
    2. Sample proportions
    3. Sample means
  3. Chapter 8 – Confidence intervals
    1. The Basics
    2. Proportion intervals
    3. Mean intervals
  4. Chapter 9 – Hypothesis tests
    1. Basics
    2. Proportions
    3. Means
  5. Chapter 10 – Two-sample tests
    1. Comparing two proportions
    2. Comparing two means

You get the picture.

I understand the appeal and purpose of setting things up this way: means and proportions are by far the most common things we do statistical studies and inference with, and the general process of, say, constructing a confidence interval is the same in both cases. But my students have struggled this entire semester with keeping straight the differences between them. This is partially my fault for failing to make the distinctions clear, but I have to wonder: would it be better to do everything to do with categorical data and proportions FIRST?

Here’s what I envision:

  1. Designing studies (chapter 4 in my book). Crucial to any longitudinal, semi-project-based approach, since I will want the students to design, or at least have input on the design of, our longitudinal projects.
  2. A categorical data project. Not dissimilar from my Skittles activity but broken into pieces and interspersed with additional practice. This project, which will get touched on every single day, will require them to learn about:
    1. Graphing and representing categorical data, including discussion of frequency tables, marginal distributions, conditional distributions, etc. (Chapter 1 in my book)
    2. Some aspects of Probability; using our sample data as the “true” value, imagine other future sampling options. (Chapter 5 in my book)
    3. Sampling distributions of proportions (Chapter 7 in my book) – this will be our first quantitative data, so…
    4. Displaying quantitative data with graphs (dotplots and histograms) and describing them with numbers (mean and standard deviation) (more chapter 1)
    5. Normal curves (Chapter 2)
    6. and now we have enough for… Confidence intervals and hypothesis tests with proportions (chapters 8 and 9)
    7. Comparing two proportions (chapter 10)
    8. chi-square goodness of fit and independence tests (Chapter 11)

Do you see how one giant, connected series of investigations, involving exclusively categorical data and quantitative data about that categorical data, could lead to all of these ideas?

Once that project is finished, we start fresh and do data that was quantitative from the start. Two quantitative variables that can be connected, so we analyze each variable separately, do inference on each variable separately, then combine them for regression and regression inference.

Finally, fill in any gaps: probability ideas that never came up seem like the most obvious ones.

The chapters felt very disconnected this year. A significant part of that was teaching (I was basically relearning the curriculum myself as I went, after all), but a big part is structural as well, and I wonder if this sort of linear, longitudinal structure would be helpful. What would I lose by doing this?

How accurate is the official M&M data really? Add your data!

Yesterday I did a chi-square goodness-of-fit test with my class comparing a large sample of M&Ms – over 800 of them – to the color distribution that Mars provides as the true one. We got a p-value of 0.0002, which seems crazy. So now I simply need to know how accurate their data actually is.

So here is my proposal: if you do an activity that involves counting the various colors of M&Ms in any random sample, any year, add your data to my collection using the form below. If you buy a bag of M&Ms and just feel like counting it, add data. If you want to put young children to work counting M&M colors, add their data too. If you have data from previous years, great! Add that too, and maybe I can add a time component to the analysis. If we get enough people on board, we should start to get an accurate picture of the true proportion of M&M colors, to see if Mars tells it true.

I have embedded the form and part of the analysis below, but you can also click here to access the full Google Sheet with the analysis of the total data, analysis by year (currently only 2015 makes sense, obviously), submission-by-submission analysis, and original results.

[Note: I also had to make my own chi-square cumulative distribution function for Google Sheets, borrowing some source code from this online calculator at UCLA. If you want to know how to use it, or make your own custom Google Sheets functions, e-mail me and I can advise.]
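For readers curious what such a function involves, here is a minimal Python sketch of a chi-square cumulative distribution function and the goodness-of-fit test built on it. This is not the Google Sheets code or the UCLA source mentioned above: it uses the standard series for the regularized incomplete gamma function, the function names are made up, and the counts and claimed proportions are hypothetical placeholders rather than class data or Mars's published figures.

```python
import math

def chi_square_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(df/2, x/2); intended for the modest statistics seen in class data.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def gof_test(observed, claimed_proportions):
    """Chi-square goodness-of-fit: returns (statistic, df, p-value)."""
    n = sum(observed)
    expected = [n * p for p in claimed_proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, 1 - chi_square_cdf(stat, df)

# Hypothetical counts for six colors and a hypothetical "all colors equal" claim.
counts = [120, 150, 130, 140, 135, 145]
claim = [1 / 6] * 6
stat, df, p_value = gof_test(counts, claim)
print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```

As usual, the test is only trustworthy when every expected count is at least 5, which the sketch does not check.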

AP Stat Candy Review so far – and a new one!

As of now, I have written 3 AP Statistics review activities centered around candy. You can access them all by clicking here. They are taking us around 2 class periods each, which is a full 160 minutes of class time; not a short commitment by any means, but I’m enjoying the long game connections a LOT.

Here’s a summary of topics I’ve covered so far:
Activity 1 – Skittles
This activity works itself up to chi-square goodness of fit tests (which, when I gave out this activity, we had not actually covered). Along the way we covered:
  • relative frequency tables
  • sample design, including
    • voluntary response sample
    • convenience sample
    • simple random sample
    • cluster sampling
    • stratified sampling
    • multistage sampling
  • collecting a sample and creating frequency and relative frequency tables
  • displaying categorical data with pie charts and bar graphs
  • two-way tables
  • marginal and conditional distributions
  • sampling distribution of sample proportions
  • confidence intervals for population proportions
  • 1 proportion z tests
  • Chi-square GOF tests
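As a compact companion to the proportion topics in this list, here is a minimal Python sketch of a one-proportion z interval and a one-proportion z-test. The function names and the sample numbers are assumptions for illustration only; they are not taken from the activity packet, and the sketch skips the condition checks (random sample, large counts) that the activity reviews.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_interval(successes, n, confidence=0.95):
    """Confidence interval for a population proportion (normal approximation)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def one_proportion_z_test(successes, n, p0):
    """Two-sided one-proportion z-test of H0: p = p0; returns (z, p-value)."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # the null value p0 sets the standard error
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical sample: 30 orange candies out of 100, tested against a claimed 20%.
low, high = one_proportion_interval(30, 100)
z, p_value = one_proportion_z_test(30, 100, 0.2)
print(f"95% CI: ({low:.3f}, {high:.3f}); z = {z:.2f}, p-value = {p_value:.4f}")
```

Note the one difference between the two procedures: the interval builds its standard error from the sample proportion, while the test builds it from the hypothesized proportion.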
Activity 2 – Starbursts
This activity centered around a claim that the proportion of orange Starbursts is the same as the proportion of orange Skittles. We had never done 2-proportion z-tests, so the activity spends a lot of time walking through the logic of them and reviewing symbols and their meaning.
  • power of a test
  • calculating power (as an exercise in reviewing tests and CIs)
  • 1-proportion z tests (again)
  • 2-proportion z-tests
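The logic of the 2-proportion z-test from this activity can be sketched in a few lines of Python. The function name and the counts are hypothetical, chosen only to illustrate the calculation; the sketch uses the usual pooled proportion for the standard error, since the null hypothesis says the two proportions are equal.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided test of H0: p1 = p2; returns (z, p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # best single estimate of p if H0 is true
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts: 30 orange out of 100 of one candy, 20 orange out of 100 of the other.
z, p_value = two_proportion_z_test(30, 100, 20, 100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```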
Activity 3 – M&Ms
In this one, we introduce quantitative data by measuring the mass of the M&Ms, while also reviewing some categorical data.
  • gathering data
  • review chi-square GOF with several possible distributions and sample sizes (exploring power, etc, along the way)
  • check percentage of actual rejected hypotheses against alpha
  • represent quantitative data visually and use SOCS
  • confidence interval for a population mean
  • effect of sample size and alpha on power
  • 2-sample mean t-test
I honestly don’t know if this sort of long-range, vertical review is better than more traditional review or learning, but the students seem more engaged than they were before, and I think they are grasping some of the big-picture conceptual things better than before, at least, which is satisfying for all of us. No idea if it will translate to AP scores.

The power of power

Today in AP Statistics we continued the Great Candy Review by comparing Starburst proportions to the Skittles proportions; specifically, we started trying to decide if the proportion of orange Starbursts could be equal to the proportion of orange Skittles.

The activity covers both 1-sample proportion tests (by assuming that 20% of Skittles are orange, as we surmised, and comparing our Starburst sample proportion to 0.2) and then 2-sample proportion tests by dropping that assumption and comparing our actual samples, but before I dove into the tests I decided to spend some time dwelling on power.

This is my first time teaching this course, and I haven’t always figured out until too late what aspects to prioritize. Power is hard, it comes near the end of a chapter, and I skimmed it.

Big. Mistake.

Really thinking about the power of a test, even calculating it, turns out to be an extremely good way to really think about the underlying concepts of statistical inference. It took us 30-45 minutes to really get through the first two pages of the packet, which I didn’t expect, but I saw light bulbs going on all over the room as we slowly grasped the big picture. When students really understood the power of the test – when they realized that even if our friend is wrong there is a 75% chance we won’t be able to “prove” it with these techniques and understood why… well, they were obviously annoyed, but they also clearly understood the limitations and execution of inference tests better than they have all year.

It was a good moment.
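For anyone who wants to see the power calculation spelled out, here is a minimal Python sketch for a two-sided 1-proportion z-test. The function name and the numbers (a null value of 0.2, an alternative of 0.25, and the sample sizes) are assumptions for illustration and are not the values from the packet, so the result will not necessarily match the 75% figure above.

```python
from math import sqrt
from statistics import NormalDist

def power_one_proportion(p0, p_true, n, alpha=0.05):
    """Chance a two-sided 1-proportion z-test rejects H0: p = p0 when the truth is p_true."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    se_null = sqrt(p0 * (1 - p0) / n)
    # Sample proportions outside (lower, upper) lead us to reject H0.
    lower, upper = p0 - z_star * se_null, p0 + z_star * se_null
    # Sampling distribution of the sample proportion if p_true is correct.
    sampling = NormalDist(p_true, sqrt(p_true * (1 - p_true) / n))
    return sampling.cdf(lower) + (1 - sampling.cdf(upper))

# Hypothetical scenario: the claim is 20% orange, the truth is 25%.
for n in (50, 100, 400):
    print(n, round(power_one_proportion(0.2, 0.25, n), 3))
```

When p_true equals p0, the function returns alpha, which is a handy check that "power" under a true null is just the Type I error rate.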

Next class we will actually take a sample of Starbursts and conduct the tests. I doubt we will be able to decide with high confidence that the proportions are different (even though they really ARE) and now, hopefully, students will understand better why.

See this folder for the candy activities we’ve done so far.

Spring break is over: time for… The Candy Strategy

Spring break officially ends in a little under 12 hours, when my first class starts tomorrow. Naturally, tomorrow is the day in our rotating schedule where I don’t have a single free period, so it will be quite a change of pace.

Having just finished up our giant soccer goal geometry project, I’ve given myself permission to work a little less hard on that class for a week or two – my students will actually enjoy a trip into traditional math land, I think, and that will free me up to focus on AP Statistics (well, and the grades and comments that are due on Wednesday).

AP Statistics has not always gone well for me this year, and I spent some time over spring break strategizing. I need to teach a few more concepts, help many of them fill in the gaping conceptual holes that our year has left behind, and give them plenty of practical test-taking strategies, all in 8 weeks. To do that, I need to re-establish trust and fun. Rapport. It needs to be a class they, and I, look forward to, and right now it just isn’t.

So, thus: the candy strategy.

I know I’m not the first to realize that statistics and candy go hand in hand – there are pages and pages of candy activities (with a special focus on M&Ms) on the internet. I read a lot of those today. I think that I can piece together those ideas, along with some of my own, and review every single major concept we have learned with candy. Specifically, for me, a combination of jelly beans, Starbursts, Skittles, and M&Ms.

Today I wrote a 10-page document (see link at the end) that reviews frequency tables, pie charts, bar graphs, two-way tables, sample design, proportion confidence intervals, 1-sample proportion z-tests, and introduces chi-square goodness of fit tests as a cap, using nothing but Skittles. We will work through the activity alone and in groups for the next two class days. Along the way we will review vocabulary, conditions, graphing techniques on calculator and by hand, and learn something new. Plus, eat some Skittles.

It feels a little cheap, but I hope that candy will improve the relationship with my students enough that we can tackle this final quarter together with positive attitudes. One more to go!

Click here to access my folder of candy activities, including the very first Skittles Proportions activity.