There are some weeks when I’m stumped as to what I should write about in this weekly trip to Mad Moviedom. Sometimes I’m in the middle of an interesting study that isn’t quite ready for publication. Sometimes an idea isn’t quite fully developed. Sometimes I have an idea but I find myself blocked as to how to present it. When I find myself in this position, one avenue always open to me is to create a quick study that might be halfway interesting.
This is where I found myself this week. I had ideas that weren’t ready to publish yet, so my fallback was going to be a quick study of which movie decades offer the best “really like” viewing potential. Here are the results of my first pass at this:
“Really Like” Decades
Based on Number of “Really Like” Movies
As of April 6, 2017

| Decade | Really Liked (My Rating) | Didn’t Really Like (My Rating) | Total | “Really Like” Probability |
|---|---|---|---|---|
| All | 1,108 | 888 | 1,996 | 55.5% |
| 2010’s | 232 | 117 | 349 | 60.9% |
| 2000’s | 363 | 382 | 745 | 50.5% |
| 1990’s | 175 | 75 | 250 | 62.0% |
| 1980’s | 97 | 60 | 157 | 58.4% |
| 1970’s | 56 | 49 | 105 | 54.5% |
| 1960’s | 60 | 55 | 115 | 53.9% |
| 1950’s | 51 | 78 | 129 | 46.6% |
| 1940’s | 55 | 43 | 98 | 55.8% |
| 1930’s | 19 | 29 | 48 | 46.9% |
These results are mildly interesting. The 2010’s, 1990’s, 1980’s, and 1940’s are above-average decades for me, beating the overall 55.5% rate. There is also an unusually high number of movies in the sample that were released in the 2000’s. Remember that movies stay in my sample for 15 years from the year I last watched them. After 15 years they are removed from the sample and put back into the pool of movies available to watch again. The good movies get watched again, and the other movies, hopefully, are never seen again. Movies last seen after 2002 have not yet gone through this process of separating out the “really like” movies to be rewatched and permanently weeding the didn’t “really like” movies out of the sample. The contrast between the 2000’s and the 2010’s is a good measure of the impact of undisciplined versus disciplined movie selection.
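The 15-year rule can be sketched as a simple year check. This is a minimal illustration under stated assumptions: the window works on whole calendar years, and a movie leaves the sample once 15 or more years have passed since its last viewing, which matches the statement that only movies last seen after 2002 are still waiting to be weeded. The `in_sample` function and `SAMPLE_WINDOW_YEARS` constant are hypothetical names used only for illustration.

```python
# Hypothetical sketch of the 15-year sample window.
# Assumption: whole calendar years; a movie leaves the sample once
# 15 or more years have passed since it was last watched.

SAMPLE_WINDOW_YEARS = 15


def in_sample(last_watched_year: int, as_of_year: int = 2017) -> bool:
    """Return True if a movie last watched in `last_watched_year` is still in the sample."""
    return as_of_year - last_watched_year < SAMPLE_WINDOW_YEARS


if __name__ == "__main__":
    for year in (2001, 2002, 2003, 2016):
        status = "in sample" if in_sample(year) else "back in the re-watch pool"
        print(f"last watched {year}: {status}")
```

With an as-of year of 2017, a movie last watched in 2002 drops back into the re-watch pool, while one last watched in 2003 stays in the sample.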
As I’ve pointed out in recent posts, I’ve made some changes to my algorithm. One of the big changes is that I now count the number of ratings for “really like” movies instead of the number of “really like” movies. After doing my decade study based on number of movies, I realized I should have used the number-of-ratings method to be consistent with my new methodology. Here are the results based on the new methodology:
“Really Like” Decades
Based on Number of “Really Like” Ratings
As of April 6, 2017

| Decade | Really Liked (My Rating) | Didn’t Really Like (My Rating) | Total | “Really Like” Probability |
|---|---|---|---|---|
| All | 2,323,200,802 | 1,367,262,395 | 3,690,463,197 | 63.0% |
| 2010’s | 168,271,890 | 166,710,270 | 334,982,160 | 57.1% |
| 2000’s | 1,097,605,373 | 888,938,968 | 1,986,544,341 | 56.6% |
| 1990’s | 610,053,403 | 125,896,166 | 735,949,569 | 70.8% |
| 1980’s | 249,296,289 | 111,352,418 | 360,648,707 | 65.3% |
| 1970’s | 85,940,966 | 25,372,041 | 111,313,007 | 67.7% |
| 1960’s | 57,485,708 | 15,856,076 | 73,341,784 | 68.0% |
| 1950’s | 28,157,933 | 23,398,131 | 51,556,064 | 59.5% |
| 1940’s | 17,003,848 | 5,220,590 | 22,224,438 | 67.4% |
| 1930’s | 9,385,392 | 4,517,735 | 13,903,127 | 64.6% |
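The two headline shares can be recomputed directly from the “All” rows of the movie-count and rating-count tables. This sketch only covers those overall shares: the per-decade “Really Like” Probability figures are not plain ratios of the counts (for example, 232 of 349 is 66.5%, not the 60.9% shown), so it makes no attempt to reproduce them. The helper name `really_like_share` is illustrative.

```python
# Recompute the overall "really like" shares from the "All" rows
# (counts copied from the two tables).

def really_like_share(really_liked: int, didnt_really_like: int) -> float:
    """Fraction of the total that falls in the 'really liked' group."""
    return really_liked / (really_liked + didnt_really_like)


by_movies = really_like_share(1_108, 888)
by_ratings = really_like_share(2_323_200_802, 1_367_262_395)

print(f"Share of movies:  {by_movies:.1%}")   # prints 55.5%
print(f"Share of ratings: {by_ratings:.1%}")  # prints 63.0%
```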
While the results are different, the big reveal is that 63.0% of the ratings are for “really like” movies, while only 55.5% of the movies themselves are “really like” movies. This starkly reinforces the impact of the law of large numbers: movie website indicators of “really like” movies are more reliable when the number of ratings driving those indicators is larger. The following table illustrates this better:
“Really Like” Decades
Based on Average Number of “Really Like” Ratings per Movie
As of April 6, 2017

| Decade | Really Liked (My Rating) | Didn’t Really Like (My Rating) | All Movies | Really Liked vs. Didn’t Really Like (% Difference) |
|---|---|---|---|---|
| All | 2,096,751.63 | 1,539,709.90 | 1,848,929.46 | 36.2% |
| 2010’s | 725,309.87 | 1,424,874.10 | 959,834.27 | -49.1% |
| 2000’s | 3,023,706.26 | 2,327,065.36 | 2,666,502.47 | 29.9% |
| 1990’s | 3,486,019.45 | 1,678,615.55 | 2,943,798.28 | 107.7% |
| 1980’s | 2,570,064.84 | 1,855,873.63 | 2,297,125.52 | 38.5% |
| 1970’s | 1,534,660.11 | 517,796.76 | 1,060,123.88 | 196.4% |
| 1960’s | 958,095.13 | 288,292.29 | 637,754.64 | 232.3% |
| 1950’s | 552,116.33 | 299,976.04 | 399,659.41 | 84.1% |
| 1940’s | 309,160.87 | 121,409.07 | 226,779.98 | 154.6% |
| 1930’s | 493,968.00 | 155,783.97 | 289,648.48 | 217.1% |
With the exception of the 2010’s, the average number of ratings per movie is larger for the “really like” movies. In fact, the gap is dramatic for the decades prior to 2000. My educated guess is that the post-2000 years will end up fitting the pattern of the other decades once those years mature.
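The per-movie averages and the percentage difference follow directly from the rating counts and movie counts in the earlier tables: divide each group’s ratings by its number of movies, then compare the really-liked average with the didn’t-really-like average. The sketch below reproduces that arithmetic for three sample decades; `pct_difference` is an illustrative helper name.

```python
# Average ratings per movie for each group, and the percentage by which
# the really-liked average exceeds the didn't-really-like average.

def pct_difference(liked_ratings: int, liked_movies: int,
                   other_ratings: int, other_movies: int) -> float:
    liked_avg = liked_ratings / liked_movies
    other_avg = other_ratings / other_movies
    return liked_avg / other_avg - 1


rows = {
    # decade: (really-liked ratings, movies, didn't-really-like ratings, movies)
    "2010's": (168_271_890, 232, 166_710_270, 117),
    "1990's": (610_053_403, 175, 125_896_166, 75),
    "1960's": (57_485_708, 60, 15_856_076, 55),
}

for decade, counts in rows.items():
    print(f"{decade}: {pct_difference(*counts):+.1%}")
```

The signed format makes the 2010’s outlier stand out: it is the only decade where the really-liked movies average fewer ratings than the others.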
So what is the significance of this finding? It clearly suggests that waiting until a sufficient number of ratings come in before deciding whether to see a new movie will produce a more reliable result. The unanswered question is how many ratings are enough.
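One way to see why more ratings help, without answering how many are enough, is the standard error of a proportion, sqrt(p(1 - p) / n). This sketch rests on a loud simplifying assumption: each rating is treated as an independent yes/no vote with a fixed, hypothetical “really like” rate `p`, which real website ratings are not. Under that assumption, each tenfold increase in ratings cuts the uncertainty by a factor of about 3.2 (the square root of 10), so the gains shrink as the count grows.

```python
import math

# Illustration only: treat each rating as an independent vote with a
# fixed, hypothetical "really like" rate p. The standard error of the
# observed share then shrinks in proportion to 1 / sqrt(n).

def standard_error(p: float, n: int) -> float:
    """Standard error of an observed proportion from n independent votes."""
    return math.sqrt(p * (1 - p) / n)


P_HYPOTHETICAL = 0.6  # assumed rate, not taken from the tables

for n in (100, 1_000, 10_000, 100_000):
    se = standard_error(P_HYPOTHETICAL, n)
    print(f"n = {n:>7,}: +/- {se * 100:.1f} percentage points (1 standard error)")
```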
The finding also reinforces the need to have something like Oscar performance to act as a second measure of quality for movies that will never have “enough” ratings for a reliable result.
Finally, the path from “there to here” is not always found on a map.