Bye bye, wordpress.com! I have purchased hosting and the domain name http://www.mtbos.org, and have put my blog at http://davidgriswoldhh.mtbos.org . Find me there. If you might be interested in putting a blog of your own at a subdomain of mtbos.org, we can talk about it! See details here: http://www.mtbos.org/2015/06/04/visions-and-opportunities-for-mtbos-org/

# What do we measure, how do we measure it, and why?

I had an interesting conversation with a coworker earlier. She teaches an honors geometry class, and they are engaged in a really cool project right now, rather than traditional class time. She also wants them to learn about volume – which is what they would be doing in class – and so she is assigning them reading and homework on volume. It is pretty straightforward material in some ways, so she is actually going to give them a take-home assessment on the volume material without ever covering it in class.

She was concerned about fairness. Some students who would do well if she did straightforward “lecture->3 examples->homework” classes on this material will likely do worse. As she phrased it “I’m grading their innate abilities since I’m never teaching it. Is that fair?”

What does it mean, fair? If she gives this take-home assessment, it will *measure something different from other tests*. Most tests in a traditional environment measure ability to memorize, ability to process lecture and notes, ability to apply class-discussed examples to new (but usually similar) situations. Good ones test some ability to apply techniques in a larger context, or with a new wrinkle. These are fine and good things to measure. But ability to read and learn information from a text *without explicit guidance* is also a reasonable thing to measure. It isn’t the *same*, but that doesn’t make it less *fair*. I told her she should do it if she wants to.

And this made me think: what do we **want** to measure in a student? If grades are part of the process – and there’s certainly no way we will be removing them systemically any time soon – what would we ideally like those grades to reflect, and how are we doing at making that so?

There are many things I like about traditional tests, which I admit to using almost exclusively (along with homework completion and occasional classwork problems and exercises) at the moment. I think they are a good way to assess fluency in vocabulary and basic procedures, and a good traditional test will include problem-solving questions that call for “putting it all together.” But mathematical modeling is usually tested in a limited way or not at all. Same for explanations and derivations, which I emphasize heavily in class discussion then barely use at all on assessments. I think I often struggle putting in questions that can be viewed as “subjective,” as if my tests need to be unimpeachably numeric, a “numbers don’t lie” attitude that is specious but hard to shake. I also think I struggle with the idea of making a test too “hard,” both so as not to crush their spirits (because they don’t like feeling confused and it’s easier to cave to that than teach them to be fine with it) and because I feel weirdly tied to the arbitrary 70=D, 80=B, 90=A system we have created for ourselves. But by keeping things simple, I am overemphasizing the skills of the good memorizers and example-appliers and underemphasizing the skills of the problem-solvers and explorers, who would be able to demonstrate their brilliant attempts given more space to try. All with the goal of making it so that ones that freeze and give up under pressure can still squeeze out an arbitrarily-defined 70%.

It is also a hard truth that straightforward tests are easy to make, grade, and find time for, while problem solving and modeling can be hard and time consuming. And students *expect* tests. They expect a life of mostly-coasting with occasional nights of insane study. So it is simply the path of least resistance to take that path; no complaints, except in my own mind. And I definitely overemphasize a life of no conflict.

I don’t know the solution. I do not have the time or energy to do the sort of constant assessing of thinking that I think I would do in my ideal world. With two young children at home, there is not the window to look through exit slips every day and leave detailed feedback. It isn’t maintainable. And I haven’t yet found a way to keep my workload bearable and also do an honest assessment of the skills I actually find most important. And so, when I realize that I don’t have “enough grades” I write another test with a sigh and continue in the status quo.

# Tweaking and playing with the learning goal assessment system

My experiment with learning goal grading continues, and I’m having fun with it. My students seem to like it, though some of them are a little confused on the process of it all – how do we retry? What things are graded? How do we… when do we… etc. To be fair, a lot of that confusion is because I’m still stabbing about with a stick to figure out the best way to implement this system.

While grading my third assessment in the system – and my third assessment that tested only one standard – I realized that I needed to separate out algebra and arithmetic errors from the content itself. The standard was “Can move back and forth among area, perimeter, radius, and diameter of a circle.” And on the assessment, almost every student did that perfectly; equations were perfectly set up, understanding of the basic concepts was clearly there. A few girls missed things under this standard – confused diameters and radii or circumferences with areas – but most of the mistakes on their papers were **procedural**, not **conceptual**. Algebra, arithmetic, units, or simply following directions. So I added two learning goals to my list called “Units, directions, rounding, completeness” and “Algebra and arithmetic”. I gave each student a perfect score of 100 on those topics, and they can *only lose points*. 1 point here, 2 points there, 3 or 4 points for egregious issues. So instead of giving a student who left units off of everything a 90 on the Circle Area and Circumference standard, I can give them a 100 on it but take 4 points off of this rolling standard that is constantly important.

At the end of the unit, I will give a mastery badge to anybody who has a 90 or above in those areas, since they will have shown consistency in these areas.
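For anyone who wants to see the bookkeeping spelled out, here is a minimal Python sketch of a "can only lose points" standard. It is an illustration, not the spreadsheet itself: the class name `RollingStandard`, its methods, and the choice to floor the score at 0 are all assumptions made for the example. Only the starting score of 100 and the 90-point badge line come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class RollingStandard:
    """A standard that starts at 100 and can only lose points (hypothetical helper)."""
    name: str
    score: int = 100
    deductions: list = field(default_factory=list)

    def deduct(self, points, reason):
        # Record why the points came off so the student can see the feedback.
        self.deductions.append((points, reason))
        # Assumption: the score never drops below 0.
        self.score = max(0, self.score - points)

    def earns_badge(self, threshold=90):
        # Badge rule from the post: 90 or above at the end of the unit.
        return self.score >= threshold


units = RollingStandard("Units, directions, rounding, completeness")
units.deduct(4, "left units off every answer")
print(units.score, units.earns_badge())
```

The content standard itself stays at 100 in this scheme; only the rolling standard absorbs the procedural slip.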

# A Google Sheets based standards gamification system

As described in a previous post, I recently decided to try a mastery-based grading system with gamification for my geometry classes, inspired by some amazing work from Kyle Pearce and John Orr. Implementation of the system is key, and they had together, with the help of Alice Keeler, created one of the most amazing pieces of Google Sheet work I have ever seen to help with this; you can see his post explaining it here.

Here is the highlight version:

The spreadsheet has one tab, called “Master,” which controls most of the system. On this tab you define your standards or learning goals for the unit/course, put your roster, and assess each standard. Students can earn from 0-4 stars on each standard (though you enter it as a standard 1-100 grade). You can also award Mastery Badges to students when they have, in your estimation, mastered a goal. You can also add feedback, links to assessments or resources, and notes either to the class, a student, or to a student in response to a particular standard.
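The grade-to-stars step is easy to picture as a lookup against cutoffs. The sketch below is Python rather than a spreadsheet formula, and the cutoffs in `HYPOTHETICAL_CUTOFFS` are invented for illustration; the cutoffs the sheet actually uses are not stated here, so treat the numbers as placeholders.

```python
from bisect import bisect_right

# Assumption: one star at each of these grades. These values are made up,
# not taken from the actual spreadsheet.
HYPOTHETICAL_CUTOFFS = (60, 70, 80, 90)


def stars_for_grade(grade, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Return 0-4 stars: the number of cutoffs the grade meets or exceeds."""
    return bisect_right(cutoffs, grade)


for grade in (55, 72, 80, 95):
    print(grade, "*" * stars_for_grade(grade))
```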

All of this is automatically imported by the students’ own personal tab. They can see their grade on every standard, which ones they have mastered, how many they’ve mastered, and what level this makes them. Their goal is to level up to the highest possible polygon! This can then be published to the web so they can see it as their own personal Mastery Portal, with links, feedback, and so forth that (should) automatically update. It looks like this:

All of this was awesome, but I immediately saw some things I could do to improve the sheet; my summer of working as a spreadsheet programming professional really came in handy here!

I made four major changes to Kyle’s sheet:

- I sped it up. The original student sheets relied extensively on HLookup and VLookup calls, which are amazing spreadsheet functions that also tend to be rather slow when used a lot. I was able to use some different commands (Index and Match) to speed up the calculation of student spreadsheets by limiting the number of times one sheet looks up data in another.
- I added some automation. Specifically, I added scripts to automatically create the student tabs from the roster, automatically get their URLs so you don’t have to copy and paste links one-by-one from a menu, and delete all of the student tabs if you need to start over. I also added a script to force the student tabs to update if for some reason they don’t change when you enter a score. Thanks to Alice Keeler for her TemplateTab script from which I started and got inspiration.
- I added a little bit of customization that was not in the original (though not as much as I’d like to add eventually).
- I added a tab with directions, so there’s no need to reference a blog post to remember how to work it. =)
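The speed-up in the first bullet comes down to searching once and then reading cells directly. Here is a rough Python analogy, not the sheet's formulas: the `MASTER` table and the helper names `vlookup`, `match`, and `index` are stand-ins invented for this example, loosely mirroring what the spreadsheet functions of the same names do.

```python
# Hypothetical miniature of the "Master" tab: header row, then one row per student.
MASTER = [
    ["Student", "Std 1", "Std 2", "Std 3"],
    ["Ana", 95, 80, 100],
    ["Ben", 70, 85, 90],
    ["Cy", 100, 100, 60],
]


def vlookup(table, key, col):
    """Search the first column for key on every call, like one VLOOKUP per cell."""
    for row in table:
        if row[0] == key:
            return row[col]
    raise KeyError(key)


def match(table, key):
    """Search the first column once and return the row position, like MATCH."""
    for i, row in enumerate(table):
        if row[0] == key:
            return i
    raise KeyError(key)


def index(table, row, col):
    """Read a cell directly with no searching, like INDEX."""
    return table[row][col]


# Lookup-per-cell: three separate searches of the roster for one student.
slow = [vlookup(MASTER, "Ben", c) for c in (1, 2, 3)]

# Match once, then index: one search, three direct reads.
ben_row = match(MASTER, "Ben")
fast = [index(MASTER, ben_row, c) for c in (1, 2, 3)]

print(slow == fast)
```

Both approaches return the same scores; the second just does the expensive search once per student instead of once per cell.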

I’m very excited to use this spreadsheet for this unit. Thanks so much to John Orr and Kyle Pearce and all of their inspirations for the brilliant idea and work – I think this could be a real game changer.

### Click Here to get your own copy of the Gamified Standards sheet

# I’m Gamifying Learning Goals with help from the MTBoS

On Thursday night I was up until about 11 pm – far past my normal weeknight bedtime – working on finishing some grading as midquarter grades were being posted the next day. As I worked my way through 40+ copies of a Big Unit Test, I realized that I was being surprised by them more often than I’d like.

I’ve never mastered formative assessment. I have a hard time putting emphasis on it and time into it for the same reason that my students do – I’m a procrastinator at heart and work better with a deadline and attached value. So we worked through a right triangle trig and area test in my geometry class, and some students *never really got it*, and I didn’t know that. Then there were some students who just screwed up on test day – as one student told me later, her test grade was collateral damage to a lab report. And there were just as many positive surprises – which is nice, but still tells me I didn’t know what I was doing.

I wrote this tweet:

And then, after finishing grading and writing necessary comments, I stayed up a little later, in a tired-but-annoyed fugue state. I stumbled upon this tweet by Kyle Pearce:

Go ahead and read his post. I’ll wait.

I followed the link, read the post, and realized that I needed to try it out. Immediately. And I couldn’t wait. I decided that I would try it starting the very next day, with the unit I had already been doing for two days with my geometry classes: circles.

The next morning, I had 80 minutes to prepare for my first geometry class. I was able to get their names entered on Kyle’s spreadsheet, create a sample web page to show them, get some preliminary standards written up, and make assessments for my first three standards – naming parts of a circle, sketching parts of a circle, and moving between area, circumference, and diameter of a circle. You can see the assessments I made here: **Circle Standard Assessments**. The assessments are not particularly clever or good – I made them fast – but it’s a start. I ended up doing standards 1 and 2 at the end of class with them, and assigned standard 3 as homework – they can either do it for practice and attempt it again later OR pledge not to use notes/books/others and do it for Mastery (we have an honor code that makes it reasonable for me to offer this option).

I’m really excited about this. I think it is going to be awesome. My students were excited as well.

If you want details on implementation, see this post on how exactly to use the spreadsheet to implement this system, with some modifications I added.

# Today’s ridiculous project – a google sheet for AP style computer output

Working through my fourth candy assignment today, we ran into a problem where we could not easily create AP Stat style “computer output” for regression with student data. We do not have licenses for Minitab or JMP, R is outside my sphere (and the output looks different anyway), and Geogebra, StatKey, and other free options I have tried either give different data, confusing data, or not enough data. I thought NCTM’s Core Math Tools did what I needed, but I misread the output when planning.

I have since learned that StatCrunch can output very similar data (and is nice enough that I may do subscriptions for it next year and use it more thoroughly with my students), but I did not know that tool this morning.

So what did I do? I made a Google Sheet that you can paste data into OR point at another Google Sheet (URL, tab name, etc.), and it will generate computer output for you. I may not ever use it again now that I have figured out StatCrunch, but for data already in a Google Sheet this is quite possibly easier. I put it here for anybody who may find it useful.
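For readers who would rather see the arithmetic behind that kind of output, here is a minimal Python sketch of least-squares regression with the usual summary numbers. It is not the Google Sheet's formulas; the function names `simple_regression` and `format_output` and the table layout are assumptions for illustration. P-values are left out because the standard library has no t distribution, so a t table or calculator is still needed for those.

```python
import math


def simple_regression(xs, ys):
    """Least-squares line plus the summary numbers found in typical computer output."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Sum of squared residuals and the standard deviation of the residuals.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / sxx)

    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        "t_intercept": intercept / se_intercept,
        "t_slope": slope / se_slope,
        "s": s,
        "r_sq": 1 - sse / syy,
        "df": n - 2,
    }


def format_output(fit, x_name="x"):
    """Lay the results out in a table loosely modeled on exam-style output."""
    rows = [
        f"{'Predictor':<12}{'Coef':>10}{'SE Coef':>10}{'T':>8}",
        f"{'Constant':<12}{fit['intercept']:>10.4f}{fit['se_intercept']:>10.4f}{fit['t_intercept']:>8.2f}",
        f"{x_name:<12}{fit['slope']:>10.4f}{fit['se_slope']:>10.4f}{fit['t_slope']:>8.2f}",
        f"S = {fit['s']:.4f}   R-Sq = {fit['r_sq']:.1%}   df = {fit['df']}",
    ]
    return "\n".join(rows)


# Made-up data, only to show the call.
fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(format_output(fit, x_name="hours"))
```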

# Should statistics be taught longitudinally?

I’m working my way through my AP Statistics Candy Review activities and I really, really like them. And they are making me think about how this class is structured in my text and, I assume, most texts.

Here is the basic structure of MANY of the chapters of my text:

- Big Topic
  - A bit about it generally
  - How it applies to categorical data
  - How it applies to quantitative data

As specific examples, we have:

- Chapter 1 – Exploring Data
  - Analyzing categorical data
  - Analyzing quantitative data
  - Describing quantitative data
- Chapter 7 – Sampling distributions
  - What is it?
  - Sample proportions
  - Sample means
- Chapter 8 – Confidence intervals
  - The Basics
  - Proportion intervals
  - Mean intervals
- Chapter 9 – Hypothesis tests
  - Basics
  - Proportions
  - Means
- Chapter 10 – Two sample tests
  - Comparing two proportions
  - Comparing two means

You get the picture.

I understand the appeal and purpose of setting things up this way; means and proportions are by far the most common things we do statistical studies and inference with, and the general process of, say, constructing a confidence interval is the same in both cases. But my students have struggled this entire semester with keeping straight the differences between them. This is partially my fault for failing to make the distinctions clear, but I have to wonder; would it be better to do everything to do with categorical data and proportions FIRST?
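To make the "same general process" point concrete, here is a small Python sketch in which one shared function does the interval arithmetic and only the standard error and critical value differ between proportions and means. The function names are invented for this illustration, and the t* value has to be supplied from a table because the standard library does not include a t distribution.

```python
import math
from statistics import NormalDist


def interval(estimate, standard_error, critical_value):
    """The shared recipe: estimate plus or minus critical value times standard error."""
    margin = critical_value * standard_error
    return estimate - margin, estimate + margin


def proportion_interval(successes, n, confidence=0.95):
    """One-sample z interval for a proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return interval(p_hat, se, z_star)


def mean_interval(x_bar, s, n, t_star):
    """One-sample t interval for a mean; t_star comes from a table for df = n - 1."""
    se = s / math.sqrt(n)
    return interval(x_bar, se, t_star)


# Made-up numbers, only to show the two calls side by side.
print(proportion_interval(50, 100))
print(mean_interval(10, 2, 25, t_star=2.064))  # 2.064 is t* for 95% with df = 24
```

Seen this way, the parts students mix up are exactly the parts that differ: which standard error formula applies and whether the critical value is z* or t*.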

Here’s what I envision:

- Designing studies (chapter 4 in my book). Crucial to any longitudinal, semi-project-based approach, since I will want the students to design or at least have input on the design of our longitudinal projects.
- A categorical data project. Not dissimilar from my Skittles activity but broken into pieces and interspersed with additional practice. This project, which will get touched on every single day, will require them to learn about:
  - Graphing and representing categorical data, including discussion of frequency tables, marginal distributions, conditional distributions, etc. (Chapter 1 in my book)
  - Some aspects of Probability; using our sample data as the “true” value, imagine other future sampling options. (Chapter 5 in my book)
  - Sampling distributions of proportions (Chapter 7 in my book) – this will be our first quantitative data, so…
  - Displaying quantitative data with graphs (dotplots and histograms) and describing them with numbers (mean and standard deviation) (more chapter 1)
  - Normal curves (Chapter 2)
  - and now we have enough for… Confidence intervals and hypothesis tests with proportions (chapters 8 and 9)
  - Comparing two proportions (chapter 10)
  - Chi-square goodness of fit and independence tests (Chapter 11)

Do you see how one giant, connected series of investigations involving *exclusively categorical data and quantitative data about that categorical data*, could lead to all of these ideas?

Once that project is finished, we start fresh and do data that was quantitative *from the start*. Two quantitative variables that can be connected, so we analyze each variable separately, do inference on each variable separately, then combine them for regression and regression inference.

Finally, fill in any gaps: probability ideas that never came up seem like the most obvious ones.

The chapters felt very disconnected this year. A significant part of that was teaching (I was basically relearning the curriculum myself as I went, after all), but a big part is structural as well, and I wonder if this sort of linear, longitudinal structure would be helpful. What would I lose by doing this?

# Multi-part area problem

I merged the study of triangle trigonometry and polygon area in my geometry class, since they go together very well. For their test, I created this multi-part area problem I like quite a bit. You can click the image to access the Geogebra sketch I used to make it on GeogebraTube if you’d like to download and modify it.

# Correlation with candy is hard

I had a hard time coming up with a candy activity for bivariate data that was really good. I ended up kind of cheating, and wrote an activity that has students gather *lots* of data points, one of which is “how many jelly beans can you pick up in one no-palm pinch?” and then letting them check the correlation of that with any of a wide variety of data points (height, hair length, finger length, age in days, etc.).

I don’t think it is as good, or will allow for quite as deep a discussion, as the earlier activities, but it is still decent I think.

I still haven’t actually taught regression inference, so I’m using this activity right after a brief mini-lesson on interpreting and using computer output for regression inference. I definitely will not be tackling this topic at the depth our book does; I will focus instead on practicality and the basic idea of extending our knowledge of t-tests and confidence intervals to a more uncharted area. Some of the data sets they decide to plot will probably NOT satisfy all of the conditions for regression inference, though, so that could lead to a good discussion.
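The correlation check students run for each pairing can be sketched in a few lines of Python. This is only an illustration of the calculation: the function name `pearson_r` is my own label, and the jelly bean and height numbers are invented, not class data.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Invented example: jelly beans pinched vs. height in inches.
beans = [12, 15, 9, 18, 14, 11]
height = [64, 68, 61, 72, 66, 65]
print(round(pearson_r(beans, height), 3))
```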

You can find all of my AP Stat candy activities, including this one (forgive any typos, it was made very late last night) here: https://drive.google.com/open?id=0B-C-lUvv4rQ4ZG9hcnh1azdHNjQ&authuser=0

# How accurate is the official M&M data really? Add your data!

Yesterday I did a chi-square goodness-of-fit test with my class comparing a large sample of M&Ms – over 800 of them – to the data that is provided by Mars for the true count of M&Ms. We got a p-value of 0.0002, which seems crazy. So now I simply **need** to know how accurate their data actually is.
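For anyone who wants to follow the calculation, here is a minimal Python sketch of the goodness-of-fit statistic. The counts and the claimed proportions below are placeholders invented for the example; they are neither my class's data nor the figures Mars publishes. The function name `chi_square_gof` is also just a label for this sketch.

```python
def chi_square_gof(observed, claimed_proportions):
    """Return the chi-square statistic and degrees of freedom for a goodness-of-fit test."""
    total = sum(observed)
    # Expected counts if the claimed proportions were exactly right.
    expected = [total * p for p in claimed_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Placeholder numbers only: six colors, a made-up sample, and an
# evenly split claim used purely for illustration.
counts = [150, 120, 140, 130, 160, 110]
claimed = [1 / 6] * 6
stat, df = chi_square_gof(counts, claimed)
print(round(stat, 2), df)
```

A large statistic relative to its degrees of freedom is what produces a tiny p-value like the one we saw.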

So here is my proposal: if you do an activity that involves counting the various colors of M&Ms in any random sample, any year, add your data to my collection using the form below. If you buy a bag of M&Ms and just feel like counting it, add data. If you want to put young children to work counting M&M colors, add their data too. If you have data from previous years, great! Add that too, and maybe I can add a time component to the analysis. If we get enough people on board, we should start to get an accurate picture of the true proportion of M&M colors, to see if Mars tells it true.

I have embedded the form and part of the analysis below, but you can also click here to access the full Google Sheet with the analysis of the total data, analysis by year (currently only 2015 makes sense, obviously), submission-by-submission analysis, and original results.

[Note: I also had to make my own chi-square cumulative distribution function for Google Sheets, borrowing some source code from this online calculator at UCLA. If you want to know how to use it, or make your own custom Google Sheets functions, e-mail me and I can advise.]
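As a rough idea of what a hand-rolled chi-square cumulative distribution function involves, here is a Python sketch using the standard series for the regularized lower incomplete gamma function. It is not the code in my sheet and not the UCLA calculator's source; the function name and the stopping tolerance are choices made for this example, and the series is best suited to moderate values of x.

```python
import math


def chi_square_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the series gamma(a, z) = z^a e^(-z) * sum_{n>=0} z^n / (a (a+1) ... (a+n)),
    with a = df / 2 and z = x / 2, divided by Gamma(a).
    """
    if x <= 0:
        return 0.0
    a = df / 2.0
    z = x / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)  # each term is the previous one times z / (a + n)
        total += term
        n += 1
    # Work with logs for the prefactor to avoid overflow in z^a or Gamma(a).
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi_square_p_value(statistic, df):
    """Upper-tail area, which is the p-value for a chi-square test."""
    return 1.0 - chi_square_cdf(statistic, df)


print(round(chi_square_cdf(3.841459, 1), 4))
```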