The Art of Selecting “Really Like” Movies: Older, Never Before Seen

Last week I stated in my article that I could pretty much identify whether a movie has a good chance of being a “really like” movie within six months of its release. If you need any further evidence, here are my top ten movies that I’ve never seen that are older than six months.

My Top Ten Never Seen Movie Prospects 
Never-Seen Movies = Released More Than 6 Months Ago
| Movie Title | Last Data Update | Release Date | Total # of Ratings | “Really Like” Probability |
| Hey, Boo: Harper Lee and ‘To Kill a Mockingbird’ | 2/4/2017 | 5/13/2011 | 97,940 | 51.7% |
| Incendies | 2/4/2017 | 4/22/2011 | 122,038 | 51.7% |
| Conjuring, The | 2/4/2017 | 7/19/2013 | 241,546 | 51.7% |
| Star Trek Beyond | 2/4/2017 | 7/22/2016 | 114,435 | 51.7% |
| Pride | 2/4/2017 | 9/26/2014 | 84,214 | 44.6% |
| Glen Campbell: I’ll Be Me | 2/9/2017 | 10/24/2014 | 105,751 | 44.6% |
| Splendor in the Grass | 2/5/2017 | 10/10/1961 | 246,065 | 42.1% |
| Father of the Bride | 2/5/2017 | 6/16/1950 | 467,569 | 42.1% |
| Imagine: John Lennon | 2/5/2017 | 10/7/1998 | 153,399 | 42.1% |
| Lorenzo’s Oil | 2/5/2017 | 1/29/1993 | 285,981 | 42.1% |

The older movies with a high “really like” probability have already been watched. Of the remaining movies in this group, three are 50/50 and the rest have the odds stacked against them. In other words, if I watch all ten movies, I probably won’t “really like” half of them. The dilemma is that I probably would “really like” the other half if I did watch all ten. The reality is that I won’t watch any of these ten movies as long as there are movies that I’ve already seen with better odds. Is there a way to improve the odds for any of these ten movies?

You’ll note that all ten movies have probabilities based on fewer than 500,000 ratings. Will some of these movies improve their probabilities as they receive more ratings? Maybe. Maybe not. To explore this possibility further, I divided my database into quintiles based on the total number of ratings. The quintile with the most ratings, the most credible quintile, provides results that define the optimal performance of my algorithm.
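For readers who like to see the mechanics, here is a minimal sketch of how a quintile split like this could be produced with pandas. The DataFrame and column names (total_ratings, seen_more_than_once, really_like, proj_rating, my_rating) are hypothetical stand-ins, not my actual database schema.

```python
import pandas as pd

# movies: one row per movie in the database, with hypothetical columns:
#   total_ratings        - combined number of ratings across all sites
#   seen_more_than_once  - True if I've watched the movie at least twice
#   really_like          - True if it turned out to be a "really like" movie
#   proj_rating          - projected average rating, all sites
#   my_rating            - my own rating

def quintile_summary(movies: pd.DataFrame) -> pd.DataFrame:
    # Split the database into five equal-sized groups by total number of ratings
    movies = movies.assign(
        quintile=pd.qcut(movies["total_ratings"], q=5, labels=[1, 2, 3, 4, 5])
    )
    summary = movies.groupby(["quintile", "seen_more_than_once"]).agg(
        n_movies=("really_like", "size"),
        n_really_like=("really_like", "sum"),
        pct_really_like=("really_like", "mean"),
        proj_avg=("proj_rating", "mean"),
        my_avg=("my_rating", "mean"),
    )
    # My rating vs. the projected rating, as in the quintile tables below
    summary["rating_diff"] = summary["my_avg"] - summary["proj_avg"]
    return summary
```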

Quintile 5

# Ratings Range > 2,872,053

| | # of Movies | # “Really Like” Movies | % “Really Like” Movies | Proj. Avg. Rating All Sites | My Avg Rating | My Rating to Proj. Rating Diff. |
| Movies Seen More than Once | 152 | 134 | 88% | 8.6 | 8.5 | -0.1 |
| Movies Seen Once | 246 | 119 | 48% | 7.5 | 6.9 | -0.7 |
| All Movies in Range | 398 | 253 | 64% | 7.9 | 7.5 | |

All of the movies in Quintile 5 have more than 2,872,053 ratings. My selection of movies that I had seen before is clearly better than my selection of movies I watched for the first time. That better selection is because the algorithm led me to the better movies and my memory did some additional weeding. My takeaway: when considering movies I’ve never seen before, I should put my greatest trust in the algorithm if the movie falls in this quintile.

Let’s look at the next four quintiles.

Quintile 4

# Ratings Range 1,197,745 to 2,872,053

| | # of Movies | # “Really Like” Movies | % “Really Like” Movies | Proj. Avg. Rating All Sites | My Avg Rating | My Rating to Proj. Rating Diff. |
| Movies Seen More than Once | 107 | 85 | 79% | 8.3 | 8.3 | 0.1 |
| Movies Seen Once | 291 | 100 | 34% | 7.1 | 6.4 | -0.7 |
| All Movies in Range | 398 | 185 | 46% | 7.4 | 6.9 | |

Quintile 3

# Ratings Range 516,040 to 1,197,745

| | # of Movies | # “Really Like” Movies | % “Really Like” Movies | Proj. Avg. Rating All Sites | My Avg Rating | My Rating to Proj. Rating Diff. |
| Movies Seen More than Once | 122 | 93 | 76% | 7.8 | 8.0 | 0.2 |
| Movies Seen Once | 278 | 102 | 37% | 7.1 | 6.6 | -0.6 |
| All Movies in Range | 400 | 195 | 49% | 7.3 | 7.0 | |

Quintile 2

# Ratings Range 179,456 to 516,040

| | # of Movies | # “Really Like” Movies | % “Really Like” Movies | Proj. Avg. Rating All Sites | My Avg Rating | My Rating to Proj. Rating Diff. |
| Movies Seen More than Once | 66 | 46 | 70% | 7.4 | 7.5 | 0.2 |
| Movies Seen Once | 332 | 134 | 40% | 7.0 | 6.4 | -0.6 |
| All Movies in Range | 398 | 180 | 45% | 7.1 | 6.6 | |

Quintile 1

# Ratings Range < 179,456

| | # of Movies | # “Really Like” Movies | % “Really Like” Movies | Proj. Avg. Rating All Sites | My Avg Rating | My Rating to Proj. Rating Diff. |
| Movies Seen More than Once | 43 | 31 | 72% | 7.0 | 7.5 | 0.5 |
| Movies Seen Once | 355 | 136 | 38% | 6.9 | 6.2 | -0.7 |
| All Movies in Range | 398 | 167 | 42% | 6.9 | 6.4 | |

Look at the progression of the algorithm projections as the quintiles get smaller. The gap between the movies seen more than once and those seen only once narrows as the number of ratings gets smaller. Notice that the difference between my ratings and the projected ratings for Movies Seen Once is fairly constant across all quintiles, either -0.6 or -0.7. But for Movies Seen More than Once, the difference grows more positive as the number of ratings gets smaller. This suggests that, for Movies Seen More than Once, the higher-than-expected ratings I give movies in Quintiles 1 and 2 are driven primarily by my memory of the movies rather than by the algorithm.

What does this mean for my top ten never-before-seen movies listed above? All ten are in Quintile 1 or 2. As they grow into the higher quintiles, some may emerge with higher “really like” probabilities. Certainly Star Trek Beyond, which is only 7 months old, can be expected to grow into the higher quintiles. But what about Splendor in the Grass, which was released in 1961 and, at 55 years old, might not move into Quintile 3 until another 55 years pass?

This suggests that a secondary movie quality indicator is needed, one separate from the movie recommender sites already in use. It sounds like I’ve just added another project to my 2017 “really like” project list.

The Art of Selecting “Really Like” Movies: New Movies

I watch a lot of movies, a fact that my wife, and occasionally my children, like to remind me of. Unlike the average, non-geeky movie fan, though, I am constantly analyzing the process I go through to determine which movies I watch. I don’t like to watch mediocre, or worse, movies. I’ve pretty much eliminated bad movies from my selections. But every now and then a movie I “like” rather than “really like” will get past my screen.

Over the next three weeks I’ll outline the steps I’m taking this year to improve my “really like” movie odds. Starting this week with New Movies, I’ll lay out a focused strategy for three different types of movie selection decisions.

The most challenging “really like” movie decision I make is which movies that I’ve never seen before are likely to be “really like” movies. There is only a 39.3% chance that watching a movie I’ve never seen before will result in a “really like” experience. My goal is to improve those odds by the end of the year.

The first step I’ve taken is to separate movies I’ve seen before from movies I’ve never seen when establishing my “really like” probabilities. As a frame of reference, there is a 79.5% chance that I will “really like” a movie I’ve seen before. By basing my probabilities for movies I’ve never seen on the 39.3% baseline, I have created a tighter screen for those movies. This should result in me watching fewer never-before-seen movies than I’ve typically watched in previous years. Of the 20 movies I’ve watched so far this year, only two were never-before-seen movies.
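As an illustration, the two baselines can be pulled straight out of the viewing history with a simple groupby. This is only a sketch; seen_before and really_like are assumed column names, not my actual schema.

```python
import pandas as pd

# history: one row per movie watched, with hypothetical columns:
#   seen_before  - True if I had seen the movie before
#   really_like  - True if the viewing was a "really like" experience

def baseline_probabilities(history: pd.DataFrame) -> pd.Series:
    # P("really like") conditioned on whether the movie had been seen before
    return history.groupby("seen_before")["really_like"].mean()

# Against my database, a split like this is what yields roughly:
#   seen_before = False  ->  0.393   (never-before-seen movies)
#   seen_before = True   ->  0.795   (movies I've seen before)
```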

The challenge in selecting never-before-seen movies is that, because I’ve watched close to 2,000 movies over the last 15 years, I’ve already watched the “cream of the crop” from those 15 years. From 2006 to 2015, there were 331 movies that I rated as “really like” movies; that is about 33 movies a year, or fewer than 3 a month. Last year I watched 109 movies that I had never seen before. So, setting aside the roughly 33 new movies that came out last year that, statistically, might be “really like” movies, I watched 76 movies that didn’t have a great chance of being “really like” movies.

Logically, the probability of selecting a “really like” movie that I’ve never seen before should be highest for new releases. I just haven’t seen that many of them. I’ve only seen 6 movies that were released in the last six months and I “really liked” 5 of them. If, on average, there are 33 “really like” movies released each year, then, statistically, there should be a dozen “really like” movies released in the last six months that I haven’t seen yet. I just have to discover them. Here is my list of the top ten new movie prospects that I haven’t seen yet.

My Top Ten New Movie Prospects 
New Movies = Released Within the Last 6 Months
| Movie Title | Release Date | Last Data Update | “Really Like” Probability |
| Hacksaw Ridge | 11/4/2016 | 2/4/2017 | 94.9% |
| Arrival | 11/11/2016 | 2/4/2017 | 94.9% |
| Doctor Strange | 11/4/2016 | 2/6/2017 | 78.9% |
| Hidden Figures | 1/6/2017 | 2/4/2017 | 78.7% |
| Beatles, The: Eight Days a Week | 9/16/2016 | 2/4/2017 | 78.7% |
| 13th | 10/7/2016 | 2/4/2017 | 78.7% |
| Before the Flood | 10/30/2016 | 2/4/2017 | 51.7% |
| Fantastic Beasts and Where to Find Them | 11/18/2016 | 2/4/2017 | 51.7% |
| Moana | 11/23/2016 | 2/4/2017 | 51.7% |
| Deepwater Horizon | 9/30/2016 | 2/4/2017 | 45.4% |
| Fences | 12/25/2016 | 2/4/2017 | 45.4% |

Based on my own experience, I believe you can identify most of the new movies that will be “really like” movies within 6 months of their release, which is how I’ve defined “new” for this list. I’m going to test this theory this year.

In case you are interested, here is the ratings data driving the probabilities.

My Top Ten New Movie Prospects
Movie Site Ratings Breakdown
| Movie Title | # of Ratings All Sites | Age 45+ IMDB * | Rotten Tomatoes ** | Criticker * | Movielens * | Netflix * |
| Hacksaw Ridge | 9,543 | 8.2 | CF 86% | 8.3 | 8.3 | 8.6 |
| Arrival | 24,048 | 7.7 | CF 94% | 8.8 | 8.1 | 9.0 |
| Doctor Strange | 16,844 | 7.7 | CF 90% | 8.2 | 8.3 | 7.8 |
| Hidden Figures | 7,258 | 8.2 | CF 92% | 7.7 | 7.3 | 8.2 |
| Beatles, The: Eight Days a Week | 1,689 | 8.2 | CF 95% | 8.0 | 7.3 | 8.0 |
| 13th | 295,462 | 8.1 | CF 97% | 8.3 | 7.5 | 8.0 |
| Before the Flood | 1,073 | 7.8 | F 70% | 7.6 | 8.2 | 7.8 |
| Fantastic Beasts and Where to Find Them | 14,307 | 7.5 | CF 73% | 7.3 | 6.9 | 7.6 |
| Moana | 5,967 | 7.7 | CF 95% | 8.4 | 8.0 | 7.0 |
| Deepwater Horizon | 40,866 | 7.1 | CF 83% | 7.8 | 7.6 | 7.6 |
| Fences | 4,418 | 7.6 | CF 95% | 7.7 | 7.1 | 7.2 |
* All ratings except Rotten Tomatoes calibrated to a 10.0 scale
** CF = Certified Fresh, F = Fresh

Two movies, Hacksaw Ridge and Arrival, already have strong “really like” probabilities and should be selected to watch when available. The # of Ratings All Sites is a key column. The Movielens and Netflix ratings need volume before they can credibly reach their true level. Until there is a credible amount of data, the rating you get is closer to what an average movie would get. A movie like Fences, at 4,418 ratings, hasn’t reached the critical mass needed to migrate to the higher ratings I would expect it to reach. Deepwater Horizon, on the other hand, with 40,866 ratings, has reached a fairly credible level and may not improve upon its current probability.
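One common way to model this “critical mass” effect is a credibility-weighted average that shrinks a movie’s site rating toward an average-movie prior until enough ratings accumulate. The sketch below is my illustration of that idea, not the actual formula used by any of these sites; the 7.0 prior and the constant k are made-up parameters.

```python
def credibility_weighted_rating(site_avg, n_ratings, prior=7.0, k=25_000):
    """Blend a movie's site rating with an average-movie prior.

    With few ratings the blend stays near the prior; as n_ratings grows,
    the movie's own average dominates. The prior and k are illustrative only.
    """
    weight = n_ratings / (n_ratings + k)
    return weight * site_avg + (1 - weight) * prior

# A Fences-like movie (4,418 ratings) barely moves off the prior,
# while a Deepwater Horizon-like movie (40,866 ratings) mostly speaks for itself.
print(credibility_weighted_rating(7.6, 4_418))    # ~7.09
print(credibility_weighted_rating(7.1, 40_866))   # ~7.06
```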

I’m replacing my monthly forecast on the sidebar of this website with the top ten new movie prospects exhibit displayed above. I think it is a better reflection of the movies that have the best chance of being “really like” movies. Feel free to share any comments you might have.

Create, Test, Analyze, and Recreate

Apple’s iPhone just turned 10 years old. Why has it been such a successful product? It might be because the product hasn’t stayed static. The latest version is the iPhone 7 Plus. As a product, it is constantly reinventing itself to improve its utility. It is always fresh. Apple, like most producers of successful products, probably follows a process whereby they:

  1. Create.
  2. Test what they’ve created.
  3. Analyze the results of their tests.
  4. Recreate.

They never dust off their hands and say, “My job is done.”

Now, I won’t be so presumptuous as to claim to have created something as revolutionary as the iPhone. But, regardless of how small your creation, its success requires you to follow the same steps outlined above.

My post last week outlined the testing process I put my algorithm through each year. This week I will provide some analysis and take some steps toward a recreation. The result of my test was that using my “really like” movie selection system significantly improved the overall quality of the movies I watch. On the negative side, the test showed that once you hit some optimal number of movies in a year, the additional movies you watch are of diminishing quality as the remaining pool of “really like” movies shrinks.

A deeper dive into these results begins to clarify the key issues. Separating movies that I’ve seen at least twice from those that were new to me is revealing.

| | Seen More than Once: 1999 to 2001 | Seen More than Once: 2014 to 2016 | Seen Once: 1999 to 2001 | Seen Once: 2014 to 2016 |
| # of Movies | 43 | 168 | 231 | 158 |
| % of Total Movies in Timeframe | 15.7% | 51.5% | 84.3% | 48.5% |
| IMDB Avg Rating | 7.6 | 7.6 | 6.9 | 7.5 |
| My Avg Rating | 8.0 | 8.4 | 6.1 | 7.7 |
| % Difference | 5.2% | 10.1% | -12.0% | 2.0% |

There is so much interesting data here I don’t know where to start. Let’s start with the notion that the best opportunity for a “really like” movie experience is the “really like” movie you’ve already seen. Look at the % Difference row, which shows how much My Avg Rating outperforms the IMDB Avg Rating for movies seen more than once in both timeframes. The fact that, from 1999 to 2001, I was able to watch movies that I “really liked” more than the average IMDB voter, without the assistance of any movie recommender website, suggests that memory of a “really like” movie is a pretty reliable “really like” indicator. The 2014 to 2016 results suggest that my “really like” system can help prioritize the movies that memory tells you that you will “really like” seeing again.

The Seen Once columns clearly display the advantages of the “really like” movie selection system. It’s for the movies you’ve never seen that movie recommender websites are worth their weight in gold. With limited availability of movie websites from 1999 to 2001, my selection of new movies underperformed the IMDB Avg Rating by 12%, and those movies represented 84.3% of all of the movies I watched during that timeframe. From 2014 to 2016, my “really like” movie selection system recognized that there is a limited supply of new “really like” movies. As a result, fewer than half of the movies watched from 2014 through 2016 were movies I’d never seen before. Of the new movies I did watch, there was a significant improvement over the 1999 to 2001 timeframe in terms of quality, as represented by the IMDB Avg Rating, and my enjoyment of the movies, as represented by My Avg Rating.

Still, while the 2014 to 2016 new movies were significantly better than the new movies watched from 1999 to 2001, is it unrealistic to expect My Ratings to be better than IMDB by more than 2%? To gain some perspective on this question, I profiled the new movies I “really liked” in the 2014 to 2016 timeframe and contrasted them with the movies I didn’t “really like”.

Movies Seen Once, 2014 to 2016
| | “Really Liked” | Didn’t “Really Like” |
| # of Movies | 116 | 42 |
| % of Total Movies in Timeframe | 73.4% | 26.6% |
| IMDB Avg Rating | 7.6 | 7.5 |
| My Avg Rating | 8.1 | 6.3 |
| “Really Like” Probability | 82.8% | 80.7% |

The probability results for these movies suggest that I should “really like” between 80.7% and 82.8% of the movies in the sample. I actually “really liked” 73.4%, not too far off the probability expectations. The IMDB Avg Rating for the movies I didn’t “really like” is only a tick lower than the rating for the “really liked” movies. Similarly, the “Really Like” Probability is only a tick lower for the Didn’t “Really Like” movies. My conclusion is that there is some, but not much, opportunity to improve selection of new movies through a more disciplined approach. The better approach would be to favor “really like” movies that I’ve seen before and give new movies more time for their data to mature.

Based on my analysis, here is my action plan:

  1. Set separate probability standards for movies I’ve seen before and movies I’ve never seen.
  2. Incorporate the probability revisions into the algorithm.
  3. Set a minimum probability threshold for movies I’ve never seen before.
  4. When the supply of “really like” movies gets thin, only stretch for movies I’ve already seen and memory tells me I “really liked”.

Create, test, analyze and recreate.

A New Year’s Ritual: Looking Back to Help Move Forward

I’m a big fan of the New Year’s ritual of taking stock of where you’ve been and resolving to make some adjustments to make the coming year better. This New Year marks the completion of my third year of working with an algorithm to help me select better movies to watch. Since establishing my database, I’ve used each New Year to take two snapshots of my viewing habits.

The first snapshot is of the movies that have met the fifteen-year limit that I’ve imposed on my database. This year it’s the year 2001 that is frozen in time. I became a user of IMDB in June 2000. That makes 2001 the first full year that I used a data-based resource to supplement my movie selection process, which, at the time, was still primarily guided by the weekly recommendations of Siskel & Ebert.

The second snapshot is of the data supporting the movie choices I made in 2016. By looking at a comparison of 2001 with 2016, I can gain an appreciation of how far I’ve come in effectively selecting movies. Since this is the third set of snapshots I’ve taken I can also compare 1999 with 2014 and 2000 with 2015, and all years with each other.

Here are the questions I had and the results of the analysis. In some instances the analysis suggests additional targets of research.

Am I more effective now than I was before in selecting movies to watch?

There is no question that the creation of online movie recommender websites, and the systematic use of them to select movies, improves overall selection. The comparison below of the two snapshots mentioned previously demonstrates significant improvement over the last three years.

| Year | # of Movies | My Avg Rating | Year | # of Movies | My Avg Rating | % Rating Diff. |
| 2001 | 109 | 6.0 | 2016 | 144 | 7.4 | 23.3% |
| 2000 | 106 | 6.9 | 2015 | 106 | 8.4 | 21.7% |
| 1999 | 59 | 6.4 | 2014 | 76 | 8.8 | 37.5% |
| 1999 – 2001 | 274 | 6.4 | 2014 – 2016 | 326 | 8.1 | 25.1% |

One area of concern is a pattern in the 2014 to 2016 data, though it could be random, suggesting a diminishing return in the overall quality of movies watched as the number of movies watched increases.

Am I more likely to watch movies I “really like”?

Again, the answer is a resounding “Yes”.

| Year | # of Movies | # “Really Liked” | % “Really Liked” |
| 1999 | 59 | 25 | 42.4% |
| 2000 | 106 | 50 | 47.2% |
| 2001 | 109 | 40 | 36.7% |
| 2014 | 76 | 76 | 100.0% |
| 2015 | 106 | 91 | 85.8% |
| 2016 | 144 | 100 | 69.4% |

The concern raised about diminishing returns from increasing the number of movies watched is in evidence here as well. In 2014 I “really liked” all 76 movies I watched. Is it worth my time to watch another 30 movies, as I did in 2015, if I will “really like” 15 of them? Maybe. Maybe not. Is it worth my while to watch an additional 68 movies, as I did in 2016, if I will “really like” only 24? Probably not.

How do I know that I am selecting better movies and not just rating them higher?

As a control, I’ve used the IMDB average rating as an objective measure of quality.

| Year | IMDB Avg Rating | My Avg Rating | Difference |
| 1999 | 7.0 | 6.4 | (0.6) |
| 2000 | 7.1 | 6.9 | (0.2) |
| 2001 | 6.9 | 6.0 | (0.9) |
| 2014 | 7.8 | 8.8 | 1.0 |
| 2015 | 7.6 | 8.4 | 0.8 |
| 2016 | 7.4 | 7.4 | – |

The average IMDB voter agrees that the movies I’ve watched from 2014 to 2016 are much better than the movies I watched from 1999 to 2001. What is particularly interesting is that the movies I chose to watch from 1999 to 2001, without the benefit of any website recommending movies I’d personally like, were movies I ended up liking less than the average IMDB voter. From 2014 to 2016, with the benefit of tools like Netflix, Movielens, and Criticker, I’ve selected movies that I’ve liked better than the average IMDB voter. The 2016 results feed the diminishing-returns narrative, suggesting that the more movies I watch, the more my overall ratings migrate toward the average.

My 2017 “Really Like” resolution.

My selection algorithm is working effectively. But, the combination of a diminishing number of “really like” movies that I haven’t seen in the last fifteen years, and my desire to grow the size of my database, may be causing me to reach for movies that are less likely to result in a “really like” movie experience. Therefore, I resolve to establish within the next month a minimum standard below which I will not reach.

Now that’s what New Year’s is all about, the promise of an even better “really like” movie year.

How Do You Know a Tarnished Penny Isn’t a Tarnished Quarter?

One of my first posts on this site was The Shiny Penny, in which I espoused the virtues of older movies. I still believe that, and yet here I am, almost eleven months later, wondering if my movie selection algorithm does a good enough job surfacing those “tarnished quarters”. A more accurate statement of the problem is that older movies generate less data for the movie websites I use in my algorithm, which in turn produces fewer recommended movies.

Let me explain the issue by using a comparison of IMDB voting with my own ratings for each movie decade. Since I began developing my algorithm around 2010, I’m using 2010 as the year that I began disciplining my movie choices with an algorithm. Also, you might recall from previous posts that my database consists of movies I’ve watched in the last fifteen years. Each month I remove movies from the database that go beyond the fifteen years and make them available for me to watch again. One other clarification: I use the IMDB ratings for voters age 45+ to better match my demographic.
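That monthly housekeeping step amounts to a rolling fifteen-year window. Here is a minimal sketch, assuming a watched_date column and nothing about how my database is actually stored:

```python
import pandas as pd

def roll_fifteen_year_window(db, today):
    """Split the database into movies still inside the fifteen-year window
    and movies that have aged out (and are available to watch again)."""
    cutoff = today - pd.DateOffset(years=15)
    still_in_window = db[db["watched_date"] >= cutoff]
    aged_out = db[db["watched_date"] < cutoff]
    return still_in_window, aged_out

# e.g. db, available_again = roll_fifteen_year_window(db, pd.Timestamp("2017-02-01"))
```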

To familiarize you with the format I’ll display for each decade here’s a look at the 2010’s:

| Database Movies Released in the 2010’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
| Viewed After Algorithm | 340 | 100.0% | 10,369 | 7.3 | 7.3 |
| Viewed Before Algorithm | 0 | 0.0% | | | |

The 340 movies that I’ve seen from the 2010’s are 17.2% of all of the movies I’ve seen in the last 15 years, and there are three more years in the decade to go. If the number of recommended movies were distributed evenly across all nine decades, this percentage would be closer to 11%. Because the “shiny pennies” are the most available to watch, there is a tendency to watch more of the newer movies. I also believe that many newer movies fit the selection screen before their data matures but might not fit the screen after the data matures. The Avg # of Voters column is an indicator of how mature the data is. Keep this in mind as we look at subsequent decades.

The 2000’s represent my least disciplined movie watching. 38.4% of all of the movies in the database come from this decade. The decision to watch specific movies was driven primarily by what was available rather than what was recommended.

| Database Movies Released in the 2000’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
| Viewed After Algorithm | 81 | 10.6% | 10,763 | 7.2 | 6.8 |
| Viewed Before Algorithm | 680 | 89.4% | 10,405 | 7.1 | 6.4 |

One thing to remember about movies in this decade is that only movies watched in 2000 and 2001 have dropped out of the database. As a result, only 10.6% of the movies were selected to watch with some version of the selection algorithm.

The next three decades represent the reliability peak in terms of the algorithm.

| Database Movies Released in the 1990’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
| Viewed After Algorithm | 115 | 46.7% | 18,179 | 7.4 | 8.1 |
| Viewed Before Algorithm | 131 | 53.3% | 11,557 | 7.2 | 7.0 |

| Database Movies Released in the 1980’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
| Viewed After Algorithm | 68 | 44.4% | 14,025 | 7.5 | 7.6 |
| Viewed Before Algorithm | 85 | 55.6% | 12,505 | 7.4 | 7.0 |

| Database Movies Released in the 1970’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
| Viewed After Algorithm | 38 | 38.0% | 18,365 | 7.8 | 7.6 |
| Viewed Before Algorithm | 62 | 62.0% | 9,846 | 7.5 | 6.5 |

Note that the average number of voters per movie is higher for these three decades than for the movies released after 2000. With each decade there is a growing gap between the number of voters per movie for movies recommended by the algorithm and for those seen before I used the algorithm. This may be indicative of the amount of data needed to produce a recommendation. You also see larger gaps between my enjoyment of the movies chosen through the disciplined selection process and those seen prior to the use of the algorithm. My theory is that younger movie viewers will only watch the classics, and as a result those are the movies that generate sufficient data for the algorithm to be effective.

When we get to the four oldest decades in the database, it becomes clear that the number of movies with enough data to fit the algorithm is minimal.

| Database Movies Released in the 1960’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
| Viewed After Algorithm | 23 | 20.0% | 14,597 | 8.0 | 8.3 |
| Viewed Before Algorithm | 92 | 80.0% | 6,652 | 7.7 | 6.6 |

| Database Movies Released in the 1950’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
| Viewed After Algorithm | 22 | 18.0% | 11,981 | 8.0 | 8.4 |
| Viewed Before Algorithm | 100 | 82.0% | 5,995 | 7.7 | 5.9 |

| Database Movies Released in the 1940’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
| Viewed After Algorithm | 21 | 22.1% | 8,021 | 8.0 | 7.9 |
| Viewed Before Algorithm | 74 | 77.9% | 4,843 | 7.8 | 6.5 |

| Database Movies Released Pre-1940 | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
| Viewed After Algorithm | 7 | 14.0% | 12,169 | 8.0 | 7.5 |
| Viewed Before Algorithm | 43 | 86.0% | 4,784 | 7.9 | 6.2 |

The results are even more stark. For these oldest decades, today’s movie viewers and critics are drawn to the classics but probably not much else. It is clear that the selection algorithm is effective for movies with enough data. The problem is that the “really like” movies from these decades that don’t generate data don’t get recommended.

Finding tarnished quarters with a tool that requires data is a problem when data diminishes as movies age. Another observation is that the algorithm works best for the movies released from the 1970’s to the 1990’s, probably because the data is mature and plentiful. Is there value in letting the shiny pennies that look like quarters get a little tarnished before watching them?

Merry Christmas to all and may all of your movies seen this season be “really like” movies.

Oh, What To Do About Those Tarnished Old Quarters.

In one of my early articles, I wrote about the benefits of including older movies in your catalogue of movies to watch. I used the metaphor of our preference for holding onto shiny new pennies rather than tarnished old quarters. One of the things that has been bothering me is that my movie selection system hasn’t been surfacing older movie gems that I haven’t seen. Take a look at the table below, based on the movies I’ve watched over the last 15 years:

| Movie Release Time Frame | # of Movies Seen | % of Total |
| 2007 to 2016 | 573 | 29% |
| 1997 to 2006 | 606 | 31% |
| 1987 to 1996 | 226 | 11% |
| 1977 to 1986 | 128 | 6% |
| 1967 to 1976 | 101 | 5% |
| 1957 to 1966 | 122 | 6% |
| 1947 to 1956 | 109 | 6% |
| 1937 to 1946 | 87 | 4% |
| 1920 to 1936 | 25 | 1% |

60% of the movies I’ve watched in the last 15 years were released in the last 20 years. That’s probably typical. In fact, watching movies more than 20 years old 40% of the time is probably unusual. Still, there are probably quality older movies out there that I’m not seeing.

My hypothesis has been that the databases for the movie websites that produce my recommendations are smaller for older movies. This results in recommendations that are based on less credible data. In the world of probabilities, if your data isn’t credible, your probability stays closer to the average probability for randomly selected movies.

I set out to test this hypothesis against the movies I’ve watched since I began to diligently screen my choices through my movie selection system. It was around 2010 that I began putting together my database and using it to select movies. Here is a profile of those movies.

Seen after 2010
| Movie Release Time Frame | My Average Rating | # of Movies Seen | % of Total Seen |
| 2007 to 2016 | 7.2 | 382 | 55% |
| 1997 to 2006 | 7.9 | 60 | 9% |
| 1987 to 1996 | 7.9 | 101 | 15% |
| 1977 to 1986 | 7.8 | 57 | 8% |
| 1967 to 1976 | 7.9 | 23 | 3% |
| 1957 to 1966 | 8.2 | 26 | 4% |
| 1947 to 1956 | 8.2 | 20 | 3% |
| 1937 to 1946 | 8.4 | 17 | 2% |
| 1920 to 1936 | 6.9 | 4 | 1% |

It seems that the shiniest pennies, which I watch most often, are the ones I’m least satisfied with. So again I have to ask, why aren’t my recommendations producing more older movies to watch?

It comes back to my original hypothesis. Netflix has the greatest influence on the movies that are recommended for me. So, I compared my ratings to Netflix’s Best Guess ratings for me and added the average number of ratings those “best guesses” were based on.

| Movie Release Time Frame | My Average Rating | Netflix Average Best Guess | Avg. # of Ratings per Movie | My Rating Difference from Netflix |
| 2007 to 2016 | 7.2 | 7.7 | 1,018,163 | -0.5 |
| 1997 to 2006 | 7.9 | 8.0 | 4,067,544 | -0.1 |
| 1987 to 1996 | 7.9 | 8.1 | 3,219,037 | -0.2 |
| 1977 to 1986 | 7.8 | 7.8 | 2,168,369 | 0 |
| 1967 to 1976 | 7.9 | 7.6 | 1,277,919 | 0.3 |
| 1957 to 1966 | 8.2 | 7.9 | 991,961 | 0.3 |
| 1947 to 1956 | 8.2 | 7.8 | 547,577 | 0.4 |
| 1937 to 1946 | 8.4 | 7.8 | 541,873 | 0.6 |
| 1920 to 1936 | 6.9 | 6.1 | 214,569 | 0.8 |

A couple of observations on this table:

  • Netflix pretty effectively predicts my rating for movies released between 1977 to 2006. The movies from this thirty year time frame base their Netflix best guesses on more than 2,000,000 ratings per movie.
  • Netflix overestimates my ratings for movies released from 2007 to today by a half point. It may be that the people who see newer movies first are those who are most likely to rate them higher. It might take twice as many ratings before the best guess finds its equilibrium, like the best guesses for the 1987 to 2006 releases.
  • Netflix consistently underestimates my ratings for movies released prior to 1977. And, the fewer ratings the Netflix best guess is based on, the greater Netflix underestimates my rating of the movies.

What have I learned? First, to improve the quality of the new movies I watch, I should wait until the number of ratings the recommendations are based on is greater. Exactly what the right number of ratings is, I have to explore further.

The second thing I’ve learned is that my original hypothesis is probably correct. The number of ratings Netflix can base its recommendations on for older movies is probably too small for those recommendations to be adequately responsive to my taste for older movies. The problem is that the answer to “Oh, what to do about those tarnished old quarters” isn’t readily apparent.

While I Was Away, I Had a Thought or Two

Last Friday my wife and I moved into our new place. Not all of my time this past week was spent wandering through the maze of boxes to be unpacked and wondering which one contained our toaster. Every now and then random ideas for movie studies and articles popped into my head and I’m back to share them with you.

For example, a couple of weeks ago I saw the movie Sing Street. This is the third movie directed by John Carney that I’ve seen, Once and Begin Again being the other two, and I’ve “really liked” all three. There is an identifiable DNA to the movies that certain directors make. In Carney’s case, all three movies are about making music and the not always easy interrelationship the process has with love. There is also a certain DNA to the movies we enjoy watching. I think sites like Netflix and Movielens do a pretty good job of linking our movie watching DNA with a director’s movie DNA. In the coming weeks I plan to explore Movie DNA further.

October is just around the corner and another awards season is upon us. Already, buzz about Oscar-worthy movies is coming out of the 2016 Toronto International Film Festival, where La La Land has been anointed a Best Picture front-runner. In the spirit of the season, I’ve begun a data-driven study of the top male and female actors of each decade for which Oscars have been awarded.

As I’ve begun to look at actors who’ve been nominated for awards in the earlier years of movie history, I’ve run across a number of movies that I’ve never seen before that pique my interest. Is it possible that movie sites like Netflix aren’t as effective in collecting movie DNA for vintage movies as they are for contemporary movies? Is it possible that fewer vintage movies get recommended by Netflix because there is less data in their database for movies that predate its existence as a movie recommender website? Would Netflix have surfaced John Carney’s three movies for me if they were made between 1947 and 1956 rather than 2007 to 2016?

As I mentioned in my last pre-sabbatical post, I’m reducing my posts to one post each Thursday. For me, this blog is all about sharing the results of the research ideas I’ve involved myself in. It seemed that with two posts a week I was spending too much time writing and not enough time generating the research that you might find interesting. So that’s my plan and I’m sticking to it, at least until I need another sabbatical.

The Mad Movie Man is back and there is much to do.

Until That Next Special Movie Comes Along

Happy 4th of July to all of my visitors from the States and, to my friends to the North, Happy Canada Day, which was celebrated this past Saturday. It is a good day to watch Yankee Doodle Dandy, one of those special movie experiences I’m fond of.

This past weekend I watched another patriotic movie, Courage Under Fire, with Denzel Washington, Meg Ryan, and a young Matt Damon among others in a terrific cast. It was one of those special movies that I yearned for in my last post on July movie prospects. It was a July 1996 release that wasn’t nominated for an Academy Award (how it didn’t get an acting nomination among several powerful performances astounds me). It earned a 94 out of 100 score from me. I loved this movie. The feeling I get after watching a movie this good is why I watch so many movies. It is the promise that there are more movies out there to see that I will love that feeds my passion for movies.

As I was thinking about special movies the last few days, a question occurred to me. Can I use my rating system to find movies I’ll “love” rather than just “really like”? Of course I can. Any movie that earns a rating of 85 out of 100 or higher meets my definition of a movie I will “love”. An 85 also converts to a five-star movie on Netflix. I can rank each of the movie rating websites that I use in my algorithm from highest rating to lowest. I can then take the top 10% of the rankings and calculate the probability that a movie in that top 10% would earn a score of 85 or higher. Regular readers of this blog shouldn’t be surprised by the results.

| Site | Top 10% Threshold | Actual % of My Database | Probability for “Love” Movie |
| Netflix | > 4.5 | 9.5% | 81.4% |
| Movielens | > 4.2 | 10.7% | 76.9% |
| Criticker | > 90 | 10.3% | 55.4% |
| IMDB | > 8.1 | 10.8% | 45.8% |
| Rotten Tomatoes | > Cert. Fresh 95% | 10.4% | 41.7% |
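Here is a sketch of the calculation behind each row of that table, with assumed column names (site_score for a site’s rating or predicted score for me, my_score for my 0–100 rating): find the cutoff that marks the top 10% of the database for that site, then measure how often movies above the cutoff earned an 85 or better from me.

```python
import pandas as pd

def love_probability_top_decile(db, site_score):
    """Return the site's top-10% cutoff and P(my_score >= 85) above that cutoff."""
    rated = db.dropna(subset=[site_score])
    threshold = rated[site_score].quantile(0.90)       # top 10% of the database
    top_decile = rated[rated[site_score] > threshold]
    return threshold, (top_decile["my_score"] >= 85).mean()

# e.g. love_probability_top_decile(db, "netflix_best_guess")
#      love_probability_top_decile(db, "movielens_predicted")
```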

High Netflix and Movielens scores are the most reliable indicators of “love” movies. Here’s my problem: there are no movies that I haven’t seen in the last fifteen years that have a Netflix Best Guess of 4.5 or higher. There are fewer than 10 movies that I haven’t seen in the last fifteen years with a Movielens predicted score greater than 4.2. Here’s the kicker: the probability that I will “love” a movie with a Movielens predicted score of 4.2 or better that doesn’t also have a Netflix Best Guess greater than 4.5 is only 62%. It seems the chances of finding movies to “love” are significantly diminished without the strong support of Netflix.

On the 1st of each month Netflix Streaming and Amazon Prime shake up the movies that are available in their inventory. The July 1 shakeup has resulted in a couple of new movies being added to my list of the Top Ten “Really Like” Movies Available on Netflix or Amazon Prime. This list is actually mistitled. It should be the Top Ten “Love” Movies Available. Take a look at the list. Perhaps you haven’t seen one of these movies, or haven’t seen it in a while. It is your good fortune to be able to watch one of these movies the next time you are in the mood for a special movie experience.

As for me, I’m still hoping that one of the movies released this year rises to the top of my watch list and is able to captivate me. If it were easy to find movies that I will “love”, I would have named this blog Will I “Love” This Movie?. For now, I will continue to watch movies that I will “really like” until that next special movie comes along.

If You Want to Watch “Really Like” Movies, Don’t Count on IMDB.

Today’s post is for those of you who want to get your “geek” on. As regular readers of these pages are aware, IMDB is the least reliable indicator of whether I will “really like” a given movie. As you might also be aware, I am constantly making adjustments to my forecasting algorithm for “really like” movies. I follow the practice of establishing probabilities for the movies in my database, measuring how effective those probabilities are at selecting “really like” movies, and revising the model to improve on the results. When that’s done, I start the process all over, which brings me back to IMDB, the focus of today’s study.

My first step in measuring the effectiveness of IMDB at selecting “really like” movies is to rank the movies in the database by IMDB average rating and then divide the movies into ten groups of the same size. Here are my results:

| IMDB Avg Rating Range | # of Movies | Probability I Will “Really Like” |
| > 8.1 | 198 | 64.6% |
| 7.8 to 8.1 | 198 | 60.6% |
| 7.7 to 7.8 | 198 | 64.6% |
| 7.5 to 7.7 | 198 | 58.6% |
| 7.4 to 7.5 | 198 | 55.1% |
| 7.2 to 7.4 | 198 | 52.5% |
| 7.0 to 7.2 | 198 | 42.4% |
| 6.8 to 7.0 | 198 | 39.4% |
| 6.5 to 6.8 | 198 | 35.4% |
| < 6.5 | 197 | 11.7% |
| All Movies | 1,979 | 48.5% |
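For anyone curious how a decile table like this can be built, here is a sketch assuming a DataFrame with hypothetical imdb_rating and really_like columns.

```python
import pandas as pd

def really_like_by_imdb_decile(db):
    # Rank by IMDB average rating, then cut the ranking into ten equal-sized groups
    decile = pd.qcut(db["imdb_rating"].rank(method="first"), q=10, labels=False)
    return (
        db.assign(decile=decile)
          .groupby("decile")
          .agg(
              rating_range=("imdb_rating", lambda r: f"{r.min():.1f} to {r.max():.1f}"),
              n_movies=("imdb_rating", "size"),
              p_really_like=("really_like", "mean"),
          )
    )
```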

There seems to be a correlation between IMDB rating and the probability of “really like” movies in each group. The problem is that the results suggest IMDB does a better job of identifying movies that you won’t “really like” than movies that you will. For example, when I’ve gone through the same exercise for Netflix and Movielens, the probabilities for the top 10% of the ratings have been over 90% for each site, compared to 64.6% for IMDB.

With the graph displayed here, you can begin to picture the problem.

[Figure: IMDB Rating Graph]

The curve peaks at 7.4. There are enough ratings on the low side of the curve to create significant probability differences among the groups. On the low side, it looks more like a classic bell curve. On the high side, the highest-rated movie, The Shawshank Redemption, has a 9.2 rating. The range between 7.4 and 9.2 is too narrow to create the kind of probability differences that would make IMDB a good predictor of “really like” movies. IMDB would probably work as a predictor of “really like” movies if IMDB voters rated average movies a 5.0. Instead, an average movie probably lands in the low 7s.

So, what is a good average IMDB rating to use for “really like” movies? Let’s simplify the data from above:

| IMDB Avg Rating Range | # of Movies | Probability I Will “Really Like” |
| > 7.7 | 636 | 62.7% |
| 7.3 to 7.6 | 502 | 55.4% |
| < 7.2 | 841 | 33.7% |
| All Movies | 1,979 | 48.5% |

If we want to incrementally improve IMDB as a predictor of “really like” movies, we might set the bar at movies rated 7.7 or higher. I’m inclined to go in the opposite direction and utilize what IMDB does best: identifying which movies have a high probability of not being “really like” movies. By setting the IMDB recommendation threshold at 7.3, we are identifying better-than-average movies and relying on the other recommender websites to identify the “really like” movies.

IMDB is one of the most utilized movie sites in the world. It has a tremendous amount of useful information. But if you want to select movies that you will “really like”, don’t count on IMDB.

Is Opening Weekend at the Movie Theaters a Flip of the Coin?

Last weekend, the top five movies at the Box Office all earned Rotten grades from Rotten Tomatoes. Two of the five have managed to receive favorable scores from IMDB, while the remaining three have received very mediocre feedback from their IMDB voters. Here are the five movies:

TOP FIVE MOVIES AT THE BOX OFFICE
WEEKEND OF 6/3 TO 6/5
| Movie | Box Office ($ millions) | Rotten Tomatoes | IMDB Avg. Rating |
| Teenage Mutant Ninja Turtles: Out of the Shadows | $35.30 | 36% Rotten | 6.6 |
| X-Men: Apocalypse | $22.80 | 48% Rotten | 7.5 |
| Me Before You | $18.70 | 56% Rotten | 8.1 |
| Alice Through the Looking Glass | $11.30 | 29% Rotten | 6.4 |
| The Angry Birds Movie | $10.20 | 43% Rotten | 6.4 |

These results raise the question: should we ever go to the movies when a movie first comes out? Without the benefit of feedback from actual moviegoers, our potential enjoyment of a movie during its early run in the theaters might be no better than the flip of a coin. Three of the five movies were released Memorial Day weekend, and their numbers are down significantly from their strong first-weekend numbers, possibly the influence of their adverse Rotten Tomatoes grades. All of the movies have a built-in audience because they are sequels, or, in the case of Me Before You, the audience read the book, or, in the case of The Angry Birds Movie, they play the phone app. Despite an audience that is predisposed to like each movie, only in the instances of X-Men: Apocalypse and Me Before You has the audience actually liked the movie, as evidenced by the IMDB ratings. Moviegoers spent $98.3 million last weekend expecting to be entertained by these five movies. Those who saw the TMNT movie, or Angry Birds, or the latest adventure of Alice, were a little disappointed. There has to be a better way.

I don’t know if it’s possible to improve the odds of selecting “really like” movies when they are first released. My efforts to forecast “really like” movies beginning in June will at least test whether I can do it. You may have noticed that I’ve made a notation in my June forecast that my forecast for Me Before You is final. In order to truly test the ability to project a movie before its opening weekend, all forecast adjustments have to be finalized before it opens in the theaters. After four to six months, I plan to go back and compare how the actual “really like” probabilities developed against what I projected. After all, a forecast doesn’t have much credibility unless you keep score and demonstrate a track record of success.
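One standard way to keep score on probability forecasts, offered here only as a suggestion rather than the method from the original post, is the Brier score: the average squared gap between the forecast probability and the 0-or-1 outcome, where lower is better and a constant 50/50 coin-flip forecast scores 0.25.

```python
def brier_score(forecast_probs, outcomes):
    """Average squared error between forecast probabilities (0-1) and outcomes
    (1 = "really liked" the movie, 0 = didn't). 0.0 is a perfect forecaster;
    always guessing 50/50 scores 0.25."""
    pairs = list(zip(forecast_probs, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# e.g. brier_score([0.45, 0.95, 0.52], [0, 1, 1])  ->  about 0.15
```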

I’ve been to movies on opening weekend where I felt pretty confident that I would “really like” the movie, Captain America: Civil War for example. In that instance there was a significant amount of data out there from its international run the week before the U.S. opening. For most other movies the data will be less robust, requiring more creativity.

I’d like to think I can do better than the flip of a coin.