Last week I stated in my article that, within six months of a movie’s release, I could pretty much identify whether it has a good chance of being a “really like” movie. If you need any further evidence, here are my top ten never-seen movies that are more than six months old.
My Top Ten Never Seen Movie Prospects
Never Seen Movies = > Release Date + 6 Months
Movie Title | Last Data Update | Release Date | Total # of Ratings | “Really Like” Probability |
Hey, Boo: Harper Lee and ‘To Kill a Mockingbird’ | 2/4/2017 | 5/13/2011 | 97,940 | 51.7% |
Incendies | 2/4/2017 | 4/22/2011 | 122,038 | 51.7% |
Conjuring, The | 2/4/2017 | 7/19/2013 | 241,546 | 51.7% |
Star Trek Beyond | 2/4/2017 | 7/22/2016 | 114,435 | 51.7% |
Pride | 2/4/2017 | 9/26/2014 | 84,214 | 44.6% |
Glen Campbell: I’ll Be Me | 2/9/2017 | 10/24/2014 | 105,751 | 44.6% |
Splendor in the Grass | 2/5/2017 | 10/10/1961 | 246,065 | 42.1% |
Father of the Bride | 2/5/2017 | 6/16/1950 | 467,569 | 42.1% |
Imagine: John Lennon | 2/5/2017 | 10/7/1998 | 153,399 | 42.1% |
Lorenzo’s Oil | 2/5/2017 | 1/29/1993 | 285,981 | 42.1% |
The movies with a high “really like” probability in this group have already been watched. Of the remaining movies, four are roughly 50/50 (51.7%) and the rest have the odds stacked against them. In other words, if I watch all ten movies I probably won’t “really like” half of them. The dilemma is that I would probably “really like” the other half if I did watch all ten. The reality is that I won’t watch any of these ten movies as long as there are movies that I’ve already seen with better odds. Is there a way to improve the odds for any of these ten movies?
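As a rough check on the "about half" estimate, here is a minimal sketch that sums the ten probabilities from the table above. By linearity of expectation, that sum is the expected number of “really like” movies if all ten were watched. It assumes the listed probabilities can be taken at face value; the variable names are illustrative only.

```python
# "Really like" probabilities copied from the top-ten table above.
probabilities = [0.517, 0.517, 0.517, 0.517,  # four movies at 51.7%
                 0.446, 0.446,                # two movies at 44.6%
                 0.421, 0.421, 0.421, 0.421]  # four movies at 42.1%

# The expected count of "really like" movies is the sum of the probabilities.
expected_likes = sum(probabilities)

print(f"Expected 'really like' movies out of {len(probabilities)}: "
      f"{expected_likes:.1f}")
```

Under that assumption the expected count comes out a little under five of the ten.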
You’ll note that all ten movies have probabilities based on fewer than 500,000 ratings. Will some of these movies improve their probabilities as they receive more ratings? Maybe. Maybe not. To explore this possibility further I divided my database into quintiles based on the total number of ratings. The quintile with the most ratings is the most credible one, and its results define the optimal performance of my algorithm.
Quintile 5
# Ratings Range > 2,872,053 |
Movie Group | # of Movies | # “Really Like” Movies | % “Really Like” Movies | Proj. Avg. Rating All Sites | My Avg Rating | My Rating to Proj. Rating Diff. |
Movies Seen More than Once | 152 | 134 | 88% | 8.6 | 8.5 | -0.1 |
Movies Seen Once | 246 | 119 | 48% | 7.5 | 6.9 | -0.7 |
All Movies in Range | 398 | 253 | 64% | 7.9 | 7.5 |
All of the movies in Quintile 5 have more than 2,872,053 ratings. My selection of movies that I had seen before is clearly better than my selection of movies I watched for the first time. The selection is better because the algorithm results led me to the better movies and my memory did some additional weeding. My takeaway is that, when considering movies I’ve never seen before, I should put my greatest trust in the algorithm when the movie falls in this quintile.
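For readers who want to try a similar split, here is a minimal sketch of dividing a list of movies into quintiles by total number of ratings. It assumes an equal-count split by rank, which is consistent with the roughly 398 movies per quintile reported here, though the exact method used for my database is not spelled out above. The function name and the sample counts are hypothetical.

```python
def split_into_quintiles(rating_counts):
    """Label each movie 1-5 by rank of its rating count (1 = fewest, 5 = most)."""
    n = len(rating_counts)
    # Indices of the movies ordered from fewest to most ratings.
    order = sorted(range(n), key=lambda i: rating_counts[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        # Integer arithmetic maps ranks 0..n-1 onto quintiles 1..5.
        labels[idx] = rank * 5 // n + 1
    return labels


# Hypothetical rating counts, for illustration only.
sample_counts = [50_000, 3_000_000, 200_000, 1_500_000, 600_000,
                 90_000, 4_000_000, 300_000, 2_000_000, 800_000]

for count, quintile in zip(sample_counts, split_into_quintiles(sample_counts)):
    print(f"{count:>9,} ratings -> Quintile {quintile}")
```

A rank-based split keeps the groups the same size, so the quintile boundaries fall wherever the data puts them rather than at round numbers.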
Let’s look at the next four quintiles.
Quintile 4
# Ratings Range 1,197,745 to 2,872,053 |
Movie Group | # of Movies | # “Really Like” Movies | % “Really Like” Movies | Proj. Avg. Rating All Sites | My Avg Rating | My Rating to Proj. Rating Diff. |
Movies Seen More than Once | 107 | 85 | 79% | 8.3 | 8.3 | 0.1 |
Movies Seen Once | 291 | 100 | 34% | 7.1 | 6.4 | -0.7 |
All Movies in Range | 398 | 185 | 46% | 7.4 | 6.9 |
Quintile 3
# Ratings Range 516,040 to 1,197,745 |
Movie Group | # of Movies | # “Really Like” Movies | % “Really Like” Movies | Proj. Avg. Rating All Sites | My Avg Rating | My Rating to Proj. Rating Diff. |
Movies Seen More than Once | 122 | 93 | 76% | 7.8 | 8.0 | 0.2 |
Movies Seen Once | 278 | 102 | 37% | 7.1 | 6.6 | -0.6 |
All Movies in Range | 400 | 195 | 49% | 7.3 | 7.0 | |
Quintile 2
# Ratings Range 179,456 to 516,040 |
Movie Group | # of Movies | # “Really Like” Movies | % “Really Like” Movies | Proj. Avg. Rating All Sites | My Avg Rating | My Rating to Proj. Rating Diff. |
Movies Seen More than Once | 66 | 46 | 70% | 7.4 | 7.5 | 0.2 |
Movies Seen Once | 332 | 134 | 40% | 7.0 | 6.4 | -0.6 |
All Movies in Range | 398 | 180 | 45% | 7.1 | 6.6 | |
Quintile 1
# Ratings Range < 179,456 |
Movie Group | # of Movies | # “Really Like” Movies | % “Really Like” Movies | Proj. Avg. Rating All Sites | My Avg Rating | My Rating to Proj. Rating Diff. |
Movies Seen More than Once | 43 | 31 | 72% | 7.0 | 7.5 | 0.5 |
Movies Seen Once | 355 | 136 | 38% | 6.9 | 6.2 | -0.7 |
All Movies in Range | 398 | 167 | 42% | 6.9 | 6.4 |
Look at the progression of the algorithm projections as the number of ratings gets smaller. The gap in projected rating between the movies seen more than once and those seen only once narrows, from 1.1 points in Quintile 5 to 0.1 in Quintile 1. Notice that the difference between my ratings and the projected ratings for Movies Seen Once is fairly constant for all quintiles, either -0.6 or -0.7. But for the Movies Seen More than Once, the difference grows from -0.1 in Quintile 5 to 0.5 in Quintile 1 as the number of ratings gets smaller. This suggests that, for Movies Seen More than Once, the higher than expected ratings I give movies in Quintiles 1 & 2 are primarily driven by my memory of the movies rather than the algorithm.
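The narrowing described above can be recomputed directly from the five quintile tables. This is a minimal sketch using only the rounded figures shown in those tables, so the results are approximate; the tuple layout and variable names are illustrative.

```python
# Figures copied from the quintile tables above:
# (quintile, projected rating seen more than once, projected rating seen once,
#  % "really like" seen more than once, % "really like" seen once)
rows = [
    (5, 8.6, 7.5, 88, 48),
    (4, 8.3, 7.1, 79, 34),
    (3, 7.8, 7.1, 76, 37),
    (2, 7.4, 7.0, 70, 40),
    (1, 7.0, 6.9, 72, 38),
]

# Gap in projected rating between the two groups, per quintile.
projected_gaps = {q: round(multi - once, 1) for q, multi, once, _, _ in rows}

# Gap in "really like" percentage between the two groups, per quintile.
pct_gaps = {q: pct_multi - pct_once for q, _, _, pct_multi, pct_once in rows}

for q in projected_gaps:
    print(f"Quintile {q}: projected rating gap {projected_gaps[q]:.1f}, "
          f"'really like' gap {pct_gaps[q]} points")
```

The trend is downward overall but not strictly so: Quintile 4 shows a slightly wider gap than Quintile 5 on both measures.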
What does this mean for my top ten never before seen movies listed above? All of the top ten are in either Quintile 1 or Quintile 2. As they grow into the higher quintiles some may emerge with higher “really like” probabilities. Certainly, Star Trek Beyond, which is only 7 months old, can be expected to grow into the higher quintiles. But what about Splendor in the Grass? It was released in 1961 and, at 55 years old, it might not move into Quintile 3 until another 55 years pass.
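The quintile placement of the top ten can be checked against the range boundaries from the tables. This is a minimal sketch: the boundaries and rating counts are copied from the tables above, while the helper name and the rule for a count that lands exactly on a boundary are assumptions.

```python
from bisect import bisect_right

# Upper boundaries of Quintiles 1-4, taken from the ranges in the tables above.
BOUNDARIES = [179_456, 516_040, 1_197_745, 2_872_053]


def quintile_for(total_ratings):
    """Return the quintile (1-5) for a total number of ratings.

    Assumption: a count exactly on a boundary goes to the higher quintile.
    """
    return bisect_right(BOUNDARIES, total_ratings) + 1


# Total ratings for the ten never-seen movies, from the first table.
top_ten = {
    "Hey, Boo: Harper Lee and ‘To Kill a Mockingbird’": 97_940,
    "Incendies": 122_038,
    "Conjuring, The": 241_546,
    "Star Trek Beyond": 114_435,
    "Pride": 84_214,
    "Glen Campbell: I’ll Be Me": 105_751,
    "Splendor in the Grass": 246_065,
    "Father of the Bride": 467_569,
    "Imagine: John Lennon": 153_399,
    "Lorenzo’s Oil": 285_981,
}

quintiles = {title: quintile_for(count) for title, count in top_ten.items()}

for title, quintile in quintiles.items():
    print(f"Quintile {quintile}: {title}")
```

Six of the ten land in Quintile 1 and four in Quintile 2, so none of them yet sits in the range where the algorithm has been most reliable.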
This suggests that I need a secondary movie quality indicator, one that is separate from the movie recommender sites already in use. It sounds like I’ve just added another project to my 2017 “really like” project list.