More things I’d like to play with, but haven’t had time:
Another cool dataset with birth records, implying that there is a “human mating season”. It would be fun to slice by age and see if this is for everyone, or just some subset of people. I’m guessing it could be just when people are inside more, like cold/flu season.
My kids have forced me to play a million hands of UNO. It would be good to check if there are actually any winning strategies for the game. Is it better to play reverses and draw 2/4 cards early or late, etc? It’s not always obvious, but a Monte Carlo could show some interesting results (or the null result that there’s no good strategy to be had).
I want to make a visualization of Disney/Pixar protagonist family mortality. It’s long been known that Disney moms are at a huge risk of dying, but now we’re seeing grandmothers and even great-grandmothers dying (thanks, Coco).
I’m convinced the reason the original Star Wars movies and The Force Awakens were so good is that the main characters spent a large amount of time on screen together. I’d like to compare to the amount of time the main characters are together in The Last Jedi.
So many things I’d like to be calculating and fiddling with, but haven’t had the time. Some of my recent ideas:
Look at birth records to see if the recent presidential election results in a “baby bust”, and compare to the baby bust caused by the great recession.
Is it possible to figure out who ghost wrote various celebrity books? Or figure out if multiple people wrote chapters?
Scrape WebMD and see how many clicks it takes to get to “cancer” from any given page.
Dump NFL player stats into various machine learning algos to see how they work (like, how well can ML predict a player’s position, or predict next year’s performance).
The AAS started holding meetings at Gaylord resorts a few years ago. I think it would be fun to scrape the attendee information to see if there’s been a significant drop in the number of attendees because of it.
Follow up on this article on a spike in US mortality. It’s hard to understand how extending health coverage would lead to increasing mortality. I suspect the big part is that they lumped the age range 18-64 into a single bin. This could be a statistical mirage where mortality is constant within every age group, but the US has an aging population, so it appears overall mortality is increasing. Looks like all the data to check is here for the scraping.
See if 2016 really was extra lethal in terms of celebrity deaths. Put together a list of “celebrities” (I dunno, top athletes, musicians, and actors), use actuarial tables to predict how many of them should have died in 2016, and compare to how many actually did. I suspect celebrities tend to live a little longer than the average person, so one would predict more deaths than actually happened.
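The “statistical mirage” in the mortality idea above is easy to show with a toy calculation. This is only a sketch: the two age groups, the rates, and the population counts are all invented for illustration and are not real US data. The point is just that the rate for a lumped bin can rise even when the rate inside every sub-group stays fixed.

```python
# Toy illustration only: every number here is made up, not real mortality data.

def overall_rate(population, rate_per_100k):
    """Population-weighted deaths per 100,000 for the whole lumped bin."""
    total_people = sum(population.values())
    total_deaths = sum(
        population[group] * rate_per_100k[group] / 100_000 for group in population
    )
    return 100_000 * total_deaths / total_people

# Hypothetical age-specific rates, held constant in both years.
rates = {"18-39": 100, "40-64": 500}

# Hypothetical populations: same total, but the later year skews older.
earlier = {"18-39": 60_000_000, "40-64": 40_000_000}
later = {"18-39": 50_000_000, "40-64": 50_000_000}

print(overall_rate(earlier, rates))  # 260.0
print(overall_rate(later, rates))    # 300.0
```

Nothing about any individual’s risk changed between the two years, yet the lumped 18-64 rate went up, which is why splitting the bin by age would be the first thing to check.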
The one thing I did manage to do was make a Venn Diagram for Adam West, the most recent victim of the Curse of the Simpsons. I figured he would be the only celebrity to have voiced roles on the Simpsons, Family Guy, and Super Friends. Turns out I was wrong: Frank Welker has also been on all 3!
I’ve been reading comic strips for as long as I can remember. As a kid, I had every Calvin and Hobbes book and every Far Side. These days I read plenty of web comics. I recently discovered MRA Dilbert, where the words on Dilbert comics are replaced by the bizarre blog rantings of Scott Adams. This made me realize what (for me) makes a comic strip great: the art has to be part of the joke. For many Dilbert strips, the art adds nothing; the joke would work just as well as an all-text tweet.
But look at this classic Calvin: it isn’t funny at all without the art!
It just seems odd to write a comic that doesn’t utilize the whole medium. It’s like reading text off a PowerPoint slide.
Anyway, here’s my favorite example of a text joke that’s kinda funny that becomes hilarious with the right art.
A while back, a study on grade bias based on student attractiveness made the rounds. See here (“For female students, an increase of one standard deviation in attractiveness was associated with a 0.024 increase in grade (on a 4.0 scale)”) and here (“But women in the ‘less attractive’ group showed a much larger gap, earning on average 0.067 grade points less than other students.”).
When I read that, I couldn’t help but think that those are awfully small changes: basically the difference between a 3.0 and a 2.933 GPA. How could such a small bias play out in reality? Let’s assume we have two students who are identical solid “B” students in their academic performance, but one is much more attractive than the other. Let’s see how a bias against one student could look.
##Scenario 1##
Everyone is a little biased, and sometimes that is enough to randomly round down a grade. That would result in report cards looking like:
| | Hot student | Not Hot |
|---|---|---|
| Prof 1 | B | B |
| Prof 2 | B | B- |
| Prof 3 | B | B |
| Prof 4 | B | B |
| Prof 5 | B | B |
| Prof 6 | B | B- |
| Prof 7 | B | B |
| Prof 8 | B | B |
| Prof 9 | B | B |
| Prof 10 | B | B |
| Prof 11 | B | B |
| Prof 12 | B | B- |
| GPA | 3.0 | 2.92 |

In this case, three different profs rounded the unattractive student down a little bit.
##Scenario 2##
A few teachers are a little biased and consistently round unattractive students down a little bit. Report card would look like:
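| | Hot student | Not Hot |
|---|---|---|
| Prof 1 | B | B |
| Prof 2 | B | B- |
| Prof 3 | B | B |
| Prof 4 | B | B |
| Prof 1 | B | B |
| Prof 2 | B | B- |
| Prof 3 | B | B |
| Prof 4 | B | B |
| Prof 1 | B | B |
| Prof 2 | B | B- |
| Prof 3 | B | B |
| Prof 4 | B | B |
| GPA | 3.0 | 2.93 |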
In this case, Prof 2 consistently gave a slightly lower grade to the not hot student.
##Scenario 3##
A very small minority of teachers are very biased against unattractive students. Report cards would look like:
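| | Hot student | Not Hot |
|---|---|---|
| Prof 1 | B | B |
| Prof 2 | B | B |
| Prof 3 | B | B |
| Prof 4 | B | B |
| Prof 5 | B | B |
| Prof 6 | B | B |
| Prof 7 | B | C |
| Prof 8 | B | B |
| Prof 9 | B | B |
| Prof 10 | B | B |
| Prof 11 | B | B |
| Prof 12 | B | B |
| GPA | 3.0 | 2.92 |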
Here, Prof 7 is a real jerk who lowers the not hot student by a full grade.
##Summary##
To summarize the possibilities:
100% of people are a tiny bit biased, and it affects the grades they give 25% of the time
25% of people are consistently biased (a little bit)
8% of people are very biased
or of course
some linear combination of the above (say, 4% of people are very biased and 12% of people are a little biased)
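Here is a rough sketch of the arithmetic behind the three scenarios and the “linear combination” case. The grade-point scale is my assumption (B = 3.0, B- = 2.67, C = 2.0); the post never states one, and a school that counts B- as 2.7 would put the first two scenarios at 2.925 instead. The function names are just illustrative.

```python
# Assumed grade points (not given in the post): B = 3.0, B- = 2.67, C = 2.0.
GRADE_POINTS = {"B": 3.0, "B-": 2.67, "C": 2.0}

def gpa(grades):
    """Unweighted average of grade points over a list of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)

# The "Not Hot" column from each scenario's report card.
scenario_1 = ["B", "B-", "B", "B", "B", "B-", "B", "B", "B", "B", "B", "B-"]
scenario_2 = ["B", "B-", "B", "B"] * 3
scenario_3 = ["B"] * 6 + ["C"] + ["B"] * 5

for name, grades in [("1", scenario_1), ("2", scenario_2), ("3", scenario_3)]:
    print("Scenario", name, round(gpa(grades), 2))

def expected_gpa(base, mix):
    """base GPA minus the sum of (fraction of grades affected * penalty)."""
    return base - sum(fraction * penalty for fraction, penalty in mix)

# Linear combination: 4% of grades lose a full point, 12% lose a third of a point.
combo = expected_gpa(3.0, [(0.04, 1.0), (0.12, 0.33)])
print("Combination", round(combo, 2))
```

Under this assumed scale every case lands at roughly 2.92, which is the point: very different patterns of bias produce the same average, so the GPA gap alone can’t tell you which one is happening.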
I think the big takeaway here is that, when it comes to giving out grades, it looks like the vast majority of teachers are fair most of the time. Even if you are attractive enough to be a model, you’re unlikely to get a grade rounded up.
Even with that, no reason to assume you have no biases, so why not grade “blind” (i.e., don’t look at the name on the paper before you grade it)? Costs virtually nothing, and could increase fairness of the course.
**Confession: I did not go and read the paper these articles were based on like I should have. I probably just rediscovered things the authors already knew.
Now, there’s an excellent point to be made that way too many people are getting murdered with guns in the US. I totally agree with that. But the above graphic is obviously cherry-picking countries to make a point rather than selecting countries that are reasonably comparable to the US. I mean, they have Luxembourg up there. Luxembourg has a population of a half-million people, so about 640 times smaller than the US.
What would the graphic look like if it wasn’t cherry-picked? Well, here are some countries off the top of my head that I think are comparable to the US in important ways.
Russia. The US and USSR were the only two superpowers for a long time, so that seems like a good comparison.
Canada. Large continent-size country that we border and have a lot in common with.
Mexico. If we have Canada, makes sense to include the other large country we border.
Australia. Big English speaking country. Another former British colony.
Germany. Big industrialized country (Or France, the UK, whatever)
And here’s what that looks like:
So, it’s fine if you want to say the US should strive to be more like Canada and Germany and less like Mexico and Russia, but don’t pretend the US is some bizarre violent outlier.
**I had to fudge to figure out the gun homicide rate in Russia. I just assumed the fraction of total homicides that were gun homicides was the same in the US and Russia. Sources here and here.
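For anyone who wants to redo the fudge, it amounts to a one-line scaling. The sketch below uses a hypothetical function name and takes the inputs as arguments rather than hard-coding any statistics; the numbers in the usage line are placeholders, not real figures for any country.

```python
def estimate_gun_homicide_rate(total_homicide_rate, ref_gun_homicides, ref_total_homicides):
    """Scale a country's total homicide rate by the gun share of a reference country.

    Assumes the fraction of homicides committed with guns is the same in both
    countries, which is a guess and not a measured quantity.
    """
    if ref_total_homicides <= 0:
        raise ValueError("reference total homicides must be positive")
    gun_fraction = ref_gun_homicides / ref_total_homicides
    return total_homicide_rate * gun_fraction

# Placeholder inputs only: a total rate of 10 per 100k and a reference gun share of 1/2.
print(estimate_gun_homicide_rate(10.0, 50, 100))  # 5.0
```

If the true gun share in the target country is lower than in the reference country, this overestimates the gun homicide rate, so the result is best read as a rough figure.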