What’s This Movie’s Demographic Footprint?

A Quiet Place has attracted a young audience of both males and females

You’re probably looking at this week’s title and saying to yourself, “What is the Mad Movie Man getting us into this week?” Well, thanks to the folks at IMDB, who provide demographic data for each movie based on its ratings, we can derive a demographic footprint for each movie.

Wait a minute, didn’t I tell you in these pages that IMDB ratings are skewed heavily towards men? I did. I’ll also tell you that the ratings are skewed towards IMDB voters older than 29. Based on my data sample, only 18.6% of all votes on IMDB are cast by women, and only 38.8% of all votes come from movie fans under 30. By scaling each movie’s demographic splits against these site-wide averages, we can begin to neutralize the biases in the data. From this data I can create the following scales:

Female % of IMDB Vote

| “Guy movie” | Gender Neutral | “Chick flick” |
|---|---|---|
| < 14% | 14% – 23.3% | > 23.3% |

Age < 30 % of IMDB Vote

| “Grown-up’s movie” | Age Neutral | “Young adult’s movie” |
|---|---|---|
| < 29.1% | 29.1% – 48.5% | > 48.5% |
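If you’re curious where those cutoffs come from, they appear to be nothing more than the site-wide averages plus or minus 25%. That’s my inference from the numbers, not a published IMDB formula, but a quick sketch reproduces them:

```python
# A minimal sketch of where the cutoffs above appear to come from: the
# site-wide averages plus or minus 25%. This is my inference from the
# numbers, not a published IMDB formula.

FEMALE_BASELINE = 0.186    # share of all IMDB votes cast by women (my sample)
UNDER_30_BASELINE = 0.388  # share of all IMDB votes cast by voters under 30

def bands(baseline, spread=0.25):
    """Return the (lower, upper) cutoffs around a baseline vote share."""
    return baseline * (1 - spread), baseline * (1 + spread)

print(bands(FEMALE_BASELINE))    # ~(0.140, 0.233) -> the 14% and 23.3% cutoffs
print(bands(UNDER_30_BASELINE))  # ~(0.291, 0.485) -> the 29.1% and 48.5% cutoffs
```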

Additionally, IMDB provides average ratings by demographic group. We can use this data to determine if any particular group tends to like a movie more than others.

So, how would this work? Let’s say you are going to the movies this weekend and you are trying to decide which movie to see. The three movies below are all quality movies.

| Movie | Gender Orientation | Gender Friendly | Age Orientation | Age Friendly |
|---|---|---|---|---|
| A Quiet Place | Neutral | Female | < 30 | 30+ |
| Ready Player One | Male | Neutral | Neutral | 30+ |
| Love, Simon | Female | Female | < 30 | Neutral |

Which movie you decide to see comes down to who is going with you. There really aren’t any bad choices here, just more informed ones. A Quiet Place has attracted a young audience of both males and females, but women and voters over 29 tend to like it more than the target audience. Ready Player One is a “guy” movie that women don’t seem to mind and that older audiences seem to like more than younger ones. And Love, Simon is geared towards a young female audience but seems to be liked across all ages. Whichever movie best fits your demographic footprint, and the footprint of whoever is joining you at the movies, should lead you to the best decision.
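For readers who like to see the mechanics, here is a minimal sketch of how a movie’s footprint might be labeled using the scales above. The cutoff functions simply restate those bands; the per-movie inputs at the bottom are hypothetical placeholders, not actual IMDB figures.

```python
# Minimal sketch of labeling a movie's demographic footprint using the band
# cutoffs above. The per-movie numbers at the bottom are hypothetical
# placeholders you would read off the movie's IMDB demographics page.

def gender_orientation(female_share):
    if female_share < 0.14:
        return "Guy movie"
    if female_share <= 0.233:
        return "Gender Neutral"
    return "Chick flick"

def age_orientation(under_30_share):
    if under_30_share < 0.291:
        return "Grown-up's movie"
    if under_30_share <= 0.485:
        return "Age Neutral"
    return "Young adult's movie"

def friendlier_group(avg_a, avg_b, label_a, label_b, tolerance=0.1):
    """Which group rates the movie noticeably higher? Within tolerance -> Neutral."""
    if avg_a - avg_b > tolerance:
        return label_a
    if avg_b - avg_a > tolerance:
        return label_b
    return "Neutral"

# Hypothetical inputs roughly in the spirit of the A Quiet Place row above.
print(gender_orientation(0.22))                       # Gender Neutral
print(age_orientation(0.55))                          # Young adult's movie
print(friendlier_group(7.2, 7.5, "Male", "Female"))   # Female
print(friendlier_group(7.3, 7.6, "< 30", "30+"))      # 30+
```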

I’m excited about sharing this perspective on a given movie. I’ve already incorporated it into my weekly movie watch list, and I’ll be incorporating it into other lists I’ll be creating in the near future. I hope you find the approach useful. If you do, let me know.


Before You See Mother! This Weekend, You Might Read This Article

As you might expect, I’m a big fan of Nate Silver’s FiveThirtyEight website. Last Thursday they published an interesting article on the impact of polarizing movies on IMDB ratings, using Al Gore’s An Inconvenient Sequel: Truth to Power as an example. This is not the first instance of this happening and it won’t be the last.

When the new Ghostbusters movie with the all-female cast came out in July 2016, there was a similar attempt to tank the IMDB ratings for that movie, by men who resented the all-female cast. At that time I posted this article. Has a year of new ratings done anything to smooth out the initial polarizing impact of the attempt to tank the ratings? Fortunately, IMDB has a nice little feature that allows you to look at the demographic distribution behind a movie’s rating. If you access IMDB on its website, clicking the number of votes that a rating is based on will take you to the demographics behind the rating.

Before looking at the distribution for Ghostbusters, let’s look at a movie that wasn’t polarizing. The 2016 movie Sully is such a movie according to the following demographics:

| Demographic | Votes | Average Rating |
|---|---|---|
| Males | 99301 | 7.4 |
| Females | 19115 | 7.6 |
| Aged under 18 | 675 | 7.7 |
| Males under 18 | 566 | 7.6 |
| Females under 18 | 102 | 7.8 |
| Aged 18-29 | 50050 | 7.5 |
| Males Aged 18-29 | 40830 | 7.5 |
| Females Aged 18-29 | 8718 | 7.6 |
| Aged 30-44 | 47382 | 7.4 |
| Males Aged 30-44 | 40321 | 7.4 |
| Females Aged 30-44 | 6386 | 7.5 |
| Aged 45+ | 12087 | 7.5 |
| Males Aged 45+ | 9871 | 7.5 |
| Females Aged 45+ | 1995 | 7.8 |
| IMDb staff | 17 | 7.7 |
| Top 1000 voters | 437 | 7.2 |
| US users | 17390 | 7.5 |
| Non-US users | 68746 | 7.4 |

There is very little difference in the average rating (the number to the far right) among all of the groups. When you have a movie that is not polarizing, like Sully, the distribution by rating should look something like this:

| Votes | Percentage | Rating |
|---|---|---|
| 12465 | 8.1% | 10 |
| 19080 | 12.4% | 9 |
| 52164 | 33.9% | 8 |
| 47887 | 31.1% | 7 |
| 15409 | 10.0% | 6 |
| 4296 | 2.8% | 5 |
| 1267 | 0.8% | 4 |
| 589 | 0.4% | 3 |
| 334 | 0.2% | 2 |
| 576 | 0.4% | 1 |

It takes the shape of a bell curve, with most ratings clustering around the movie’s average.

Here’s what the demographic breakdown for Ghostbusters looks like today:

| Demographic | Votes | Average Rating |
|---|---|---|
| Males | 87119 | 5.0 |
| Females | 27237 | 6.7 |
| Aged under 18 | 671 | 5.3 |
| Males under 18 | 479 | 4.9 |
| Females under 18 | 185 | 6.6 |
| Aged 18-29 | 36898 | 5.4 |
| Males Aged 18-29 | 25659 | 5.0 |
| Females Aged 18-29 | 10771 | 6.7 |
| Aged 30-44 | 54294 | 5.2 |
| Males Aged 30-44 | 43516 | 5.0 |
| Females Aged 30-44 | 9954 | 6.6 |
| Aged 45+ | 11422 | 5.3 |
| Males Aged 45+ | 9087 | 5.1 |
| Females Aged 45+ | 2130 | 6.3 |
| IMDb staff | 45 | 7.4 |
| Top 1000 voters | 482 | 4.9 |
| US users | 25462 | 5.5 |
| Non-US users | 54869 | 5.2 |

There is still a big gap in the ratings between men and women and it persists in all age groups. This polarizing effect produces a ratings distribution graph very different from the one for Sully.

| Votes | Percentage | Rating |
|---|---|---|
| 20038 | 12.8% | 10 |
| 6352 | 4.1% | 9 |
| 13504 | 8.6% | 8 |
| 20957 | 13.4% | 7 |
| 24206 | 15.5% | 6 |
| 18686 | 12.0% | 5 |
| 10868 | 7.0% | 4 |
| 7547 | 4.8% | 3 |
| 6665 | 4.3% | 2 |
| 27501 | 17.6% | 1 |

It looks like a bell curve sitting inside a football goal post. But it is still useful, because it suggests that, once you exclude the 1’s and the 10’s, the average IMDB rating for the movie is closer to 6 than to the published 5.3.
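To make that concrete, here is a small sketch that takes a rating histogram like the ones above and reports the share of 1’s and 10’s along with the average once those extremes are stripped out. The vote counts are the Ghostbusters figures from the table; note that IMDB’s published 5.3 is a weighted average, so the raw mean computed here lands a little higher.

```python
# Small sketch: measure polarization in an IMDB-style rating histogram and
# compute the average with the extreme 1s and 10s stripped out. Vote counts
# below are the Ghostbusters figures quoted above; IMDB's published average
# is a weighted figure, so the raw mean here won't match it exactly.

ghostbusters = {10: 20038, 9: 6352, 8: 13504, 7: 20957, 6: 24206,
                5: 18686, 4: 10868, 3: 7547, 2: 6665, 1: 27501}

def summarize(histogram):
    total = sum(histogram.values())
    raw_mean = sum(r * v for r, v in histogram.items()) / total
    extreme_share = (histogram.get(1, 0) + histogram.get(10, 0)) / total
    trimmed = {r: v for r, v in histogram.items() if r not in (1, 10)}
    trimmed_mean = sum(r * v for r, v in trimmed.items()) / sum(trimmed.values())
    return raw_mean, extreme_share, trimmed_mean

raw, share, trimmed = summarize(ghostbusters)
print(f"raw mean {raw:.1f}, share of 1s and 10s {share:.0%}, trimmed mean {trimmed:.1f}")
# -> roughly: raw mean 5.5, share of 1s and 10s 30%, trimmed mean 5.8
```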

You are probably thinking: interesting, but is this information useful? Does it help me decide whether to watch a movie or not? Well, here’s the payoff. The big movie opening this weekend that the industry will be watching closely is Mother!. The buzz coming out of the film festivals is that it is a brilliant but polarizing movie. All four of the main actors (Jennifer Lawrence, Javier Bardem, Michelle Pfeiffer, Ed Harris) are in the discussion for acting awards. I haven’t seen the movie, but I don’t sense that it is politically polarizing like An Inconvenient Sequel and Ghostbusters. I think it probably strikes the sensibilities of different demographics in different ways.

So, should you go see Mother! this weekend? Fortunately, its early screenings at the film festivals give us an early peek at the data trends. The IMDB demographics so far are revealing. First, by looking at the rating distribution, you can see the goal post shape of the graph, confirming that the film is polarizing moviegoers.

| Votes | Percentage | Rating |
|---|---|---|
| 486 | 36.0% | 10 |
| 108 | 8.0% | 9 |
| 112 | 8.3% | 8 |
| 92 | 6.8% | 7 |
| 77 | 5.7% | 6 |
| 44 | 3.3% | 5 |
| 49 | 3.6% | 4 |
| 40 | 3.0% | 3 |
| 52 | 3.8% | 2 |
| 291 | 21.5% | 1 |

57.5% of IMDB voters have rated it either a 10 or a 1. So are you likely to love it or hate it? Here’s what the demographics suggest:

| Demographic | Votes | Average Rating |
|---|---|---|
| Males | 717 | 6.1 |
| Females | 242 | 5.4 |
| Aged under 18 | 25 | 8.4 |
| Males under 18 | 18 | 8.2 |
| Females under 18 | 6 | 10.0 |
| Aged 18-29 | 404 | 7.3 |
| Males Aged 18-29 | 305 | 7.5 |
| Females Aged 18-29 | 98 | 6.1 |
| Aged 30-44 | 288 | 5.0 |
| Males Aged 30-44 | 215 | 5.0 |
| Females Aged 30-44 | 69 | 5.2 |
| Aged 45+ | 152 | 4.3 |
| Males Aged 45+ | 111 | 4.3 |
| Females Aged 45+ | 40 | 4.1 |
| Top 1000 voters | 48 | 4.6 |
| US users | 273 | 4.4 |
| Non-US users | 438 | 6.5 |

Men like the movie more than women overall, but among voters over 30, men and women dislike it almost equally. There is also a two-point gap between U.S. and non-U.S. voters. This is a small sample, but it shows a distinct trend. I’ll be interested to see if the trends hold up as the sample grows.

So, be forewarned. If you take your entire family to see Mother! this weekend, some of you will probably love the trip and some of you will probably wish you had stayed home.


What IMDB Ratings Give You the Best Chance for a “Really Like” Movie?

As I was browsing the IMDB ratings for the movies released in July, I wondered how the average user of IMDB knows what a good rating for a movie is. I’m sure the more than casual visitor to IMDB would see the 8.2 rating for Baby Driver and immediately recognize that only above-average movies receive ratings that high. Or, they might see the 1.5 rating for The Emoji Movie and fully understand that this is a really bad movie. But what about the 6.8 for Valerian and the City of a Thousand Planets, or the 7.2 for Atomic Blonde? They might have a number in their head for the tipping point between a good and a bad rating, but that number could only be a guess. To really know, you’d have to compile a list of all the movies you’ve seen and compare their IMDB ratings to how you’ve rated them. That would be crazy. Right? But, wait a minute. I’m that crazy! I’ve done that! Well, maybe not every movie I’ve ever seen. But every movie I’ve seen in the last fifteen years.

So, given that I’ve done what only a crazy man would do, what can I tell you about what makes a good IMDB rating? Here’s my breakdown:

| IMDB Avg. Rating | # I Really Liked | # I Didn’t Really Like | Really Like % |
|---|---|---|---|
| > 8.2 | 108 | 43 | 71.5% |
| 7.2 to 8.1 | 732 | 427 | 63.2% |
| 6.2 to 7.1 | 303 | 328 | 48.0% |
| < 6.2 | 6 | 71 | 7.8% |
| > 7.2 | 840 | 470 | 64.1% |
| < 7.2 | 309 | 399 | 43.6% |
| All | 1149 | 869 | 56.9% |

The data suggests that IMDB ratings of 7.2 or higher give me the best chance of choosing a “really like” movie.
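If you want to run the same check against your own viewing history, here is a rough sketch of the calculation. The log entries are made-up examples, and treating my own rating of 8 or higher as a “really like” is an assumption for illustration rather than the exact definition I use.

```python
# Rough sketch of the band analysis above: group a personal movie log by IMDB
# rating band and compute the share of movies "really liked" in each band.
# The log entries are made-up examples, and treating my own rating of 8+ as
# "really like" is an assumption for illustration.

from collections import defaultdict

watched = [  # (title, imdb_rating, my_rating) - hypothetical entries
    ("Baby Driver", 8.2, 9),
    ("Atomic Blonde", 7.2, 7),
    ("Valerian and the City of a Thousand Planets", 6.8, 6),
    ("Sully", 7.4, 8),
]

def band(imdb_rating):
    if imdb_rating >= 8.2: return "8.2+"
    if imdb_rating >= 7.2: return "7.2 to 8.1"
    if imdb_rating >= 6.2: return "6.2 to 7.1"
    return "below 6.2"

counts = defaultdict(lambda: [0, 0])  # band -> [really_liked, total]
for _, imdb_rating, my_rating in watched:
    b = band(imdb_rating)
    counts[b][0] += my_rating >= 8
    counts[b][1] += 1

for b, (liked, total) in counts.items():
    print(f"{b}: {liked}/{total} really liked ({liked / total:.0%})")
```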

I mentioned a few posts ago that my new long-range project is to develop a database that is totally objective, free from the biases of my movie tastes. I’m compiling data for the top 150 movies in box office receipts for each of the last 25 years. It’s a time-consuming project that should produce a more robust sample for analysis. One of my concerns has been that the database of movies I’ve seen doesn’t have a representative sample of bad movies. While the project is a long way from completion, I have completed the years 1992 and 1993, which are representative enough to make my point.

| IMDB Avg. Rating | % of All Movies in Objective Database (1992 & 1993) | % of All Movies in My Seen Movie Database |
|---|---|---|
| > 8.2 | 1% | 7% |
| 7.2 to 8.1 | 23% | 57% |
| 6.2 to 7.1 | 35% | 31% |
| < 6.2 | 41% | 4% |

Over the last six or seven years in particular, I have made a concerted effort to avoid watching bad movies. You can see this in the data. If 7.2 is the “really like” benchmark, then only 24% of the top 150 movies at the box office are typically “really like” movies. On the other hand, my selective database has generated 64% “really like” movies over the past 15 years. This is a big difference.

***

While no new movies broke into the Objective Top Fifteen this week, Megan Leavey, which was released around eight weeks ago, slipped into the list. This under-the-radar movie didn’t have enough critics’ reviews to be Certified Fresh on Rotten Tomatoes until recently.

As for this weekend, The Dark Tower could be a disappointment to everyone but the most die-hard Stephen King fans. Instead, I’m keeping an eye on Detroit. This urban drama, directed by Kathryn Bigelow, captures the chaos of Detroit in 1967. It will probably be surveyed by Cinemascore.

A third movie, one that probably won’t be surveyed by Cinemascore but that I’m watching nevertheless, is Wind River. It was written by Taylor Sheridan, who also wrote the acclaimed Hell or High Water and Sicario. Sheridan is a great young talent, and he steps behind the camera here for his directorial debut as well.


Leave Mummy Out of Your Father’s Day Plans

One of the goals of this blog is to make sure that you are aware of the internet tools that are out there to protect you from wasting your time on blockbusters like The Mummy. While it had a disappointing opening in the U.S., moviegoers still shelled out an estimated $32.2 million at the box office last weekend for this bad movie. Overseas it met its blockbuster expectations with a box office of $141.8 million. However, if you were really in the mood for a horror genre movie a better choice, but not a sure thing, might have been It Comes At Night which had a more modest U.S. box office of $6 million.

As a general rule, I won’t go to a movie on its opening weekend. I prefer to get at least a weekend’s worth of data. But if you just have to see a movie on its opening weekend, here are a couple of hints. First, if you are seeing the movie on its opening Friday, the most reliable indicator is Rotten Tomatoes. Most critics release their reviews before the day of the movie’s release, so the Rotten Tomatoes rating on release day is a statistically mature evaluation of the movie. It won’t change much after that day.

If you are going to the movies on the Saturday of opening weekend, you can add Cinemascore to the mix. I’ve blogged about this tool before. This grade is based on feedback moviegoers provide about the movie as they are leaving the theater. The grade is posted on the Saturday after the Friday release.

Finally, by Sunday IMDB will produce a pretty good, though slightly inflated, average rating for the movie.

The comparison of these three checkpoints for The Mummy and for It Comes At Night might’ve been helpful to those who thought they were in for a “really like” movie experience.

| Movie | Rotten Tomatoes | IMDB Avg. Rating | Cinemascore Grade |
|---|---|---|---|
| The Mummy | Rotten (17%) | 5.9 | B- |
| It Comes At Night | Certified Fresh (86%) | 7.2 | D |

While the Cinemascore grade of D for It Comes At Night would keep me away from opening weekend for both movies, if I had to see one, it wouldn’t be The Mummy.

Here’s the data behind my reasoning. For IMDB, the breakpoint between a movie with a good chance that I will “really like” it and one that I probably won’t is an average rating of 7.2. Movies with an IMDB average rating of 7.2 or higher I “really like” 63.3% of the time; movies below 7.2 I “really like” 43.3% of the time. Turning to Rotten Tomatoes, movies that are Certified Fresh I “really like” 68% of the time. That percentage drops to 49.6% for movies that are Fresh and 37.5% for movies that are Rotten. So, absent any information based on my own personal tastes, I won’t go to the movieplex for a movie unless it is graded Certified Fresh by Rotten Tomatoes and has an IMDB rating of 7.2 or higher. That doesn’t mean there aren’t movies that miss those criteria that I would “really like”. A movie may be in a genre that appeals to me, which provides some tolerance for a little less quality. That being said, the odds that I’ll “really like” a low-rated movie are less than 50/50.
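Stated as a rule, the opening-weekend screen looks like the little sketch below. The function is just my restatement of the two thresholds, and the inputs are the figures from the table above.

```python
# Sketch of the opening-weekend rule described above: only head to the theater
# if a movie is Certified Fresh on Rotten Tomatoes AND has an IMDB average
# rating of 7.2 or higher. Inputs below are the figures quoted in the table.

def worth_opening_weekend(certified_fresh: bool, imdb_rating: float) -> bool:
    return certified_fresh and imdb_rating >= 7.2

movies = {
    "The Mummy":         {"certified_fresh": False, "imdb_rating": 5.9},
    "It Comes At Night": {"certified_fresh": True,  "imdb_rating": 7.2},
}

for title, data in movies.items():
    verdict = "go" if worth_opening_weekend(**data) else "wait"
    print(f"{title}: {verdict}")
# The Mummy: wait
# It Comes At Night: go (though the Cinemascore D would still give me pause)
```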

I should probably explore adding Cinemascore to the objective factors I use in developing “really like” probabilities. To date, though, I don’t have any Cinemascore data, so I don’t yet have a feel for its “really like” reliability. For now, I just use it as another piece of information that might tip me one way or the other if I’m on the fence about a new movie.

Enjoy Father’s Day but stay away from Mummy.

For 1987 to 1996, the Actress of the Decade Comes Down to a Coin Toss?

Three months ago I began a series of articles on the best actors and actresses of each of the nine decades of Oscar. I was satisfied with the approach I was taking until…this month. My scoring system works great when the results come out like the 1987 to 1996 Actor of the Decade.

Top Actors of the Decade, 1987 to 1996

| Actor | Lead Actor Nominations | Lead Actor Wins | Supporting Actor Nominations | Supporting Actor Wins | Total Academy Award Points |
|---|---|---|---|---|---|
| Tom Hanks | 3 | 2 | 0 | 0 | 15 |
| Anthony Hopkins | 3 | 1 | 0 | 0 | 12 |
| Robin Williams | 3 | 0 | 0 | 0 | 9 |
| Daniel Day-Lewis | 2 | 1 | 0 | 0 | 9 |
| Al Pacino | 1 | 1 | 2 | 0 | 8 |

Clearly, Tom Hanks deserves that honor since he won Best Actor twice and Anthony Hopkins won only once. Both were nominated three times.

Now, let’s look at the Actresses of the decade.

Top Actresses of the Decade, 1987 to 1996

| Actress | Lead Actress Nominations | Lead Actress Wins | Supporting Actress Nominations | Supporting Actress Wins | Total Academy Award Points |
|---|---|---|---|---|---|
| Susan Sarandon | 4 | 1 | 0 | 0 | 15 |
| Jodie Foster | 3 | 2 | 0 | 0 | 15 |
| Emma Thompson | 3 | 1 | 1 | 0 | 13 |
| Meryl Streep | 4 | 0 | 0 | 0 | 12 |
| Holly Hunter | 2 | 1 | 1 | 0 | 10 |

It’s a tie…and it’s kind of a mess. Including Supporting Actress nominations, Susan Sarandon, Meryl Streep, and Emma Thompson all have one more nomination than Jodie Foster. Because Jodie Foster won twice, she passes everyone except Susan Sarandon. The two actresses tie because my scoring system values a Lead Actress win twice as much as a nomination. Previously I’ve handled ties by letting IMDB and Rotten Tomatoes results for nominated movies act as a tie breaker. In this case, it’s inconclusive.
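For reference, here is the point system as best I can reverse-engineer it from the two tables: a lead nomination appears to be worth 3 points, a supporting nomination 1 point, and a win appears to add the nomination’s value again (so a lead win totals 6). The supporting-win bonus is a guess on my part, since nobody in these tables has one.

```python
# Sketch of the decade scoring system as inferred from the tables above:
# lead nomination = 3 pts, supporting nomination = 1 pt, and a win adds the
# nomination's value again (so a lead win = 6 total). The supporting-win
# bonus is an assumption; no performer listed here has one.

def decade_points(lead_noms, lead_wins, supp_noms, supp_wins):
    return 3 * lead_noms + 3 * lead_wins + 1 * supp_noms + 1 * supp_wins

print(decade_points(3, 2, 0, 0))  # Tom Hanks / Jodie Foster -> 15
print(decade_points(4, 1, 0, 0))  # Susan Sarandon -> 15
print(decade_points(3, 1, 1, 0))  # Emma Thompson -> 13
print(decade_points(1, 1, 2, 0))  # Al Pacino -> 8
```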

Tie Breakers for Top Actresses of the Decade
Avg IMDB & Rotten Tomatoes Ratings for Nominated Movies Released from 1987 to 1996

| Actress | IMDB Avg Rating | # of Votes | Rotten Tomatoes % Fresh | How Fresh? | # of Critics Reviews |
|---|---|---|---|---|---|
| Susan Sarandon | 7.3 | 242,422 | 88% | Certified Fresh | 191 |
| Jodie Foster | 8.5 | 971,401 | 84% | Certified Fresh | 125 |

The critics like Susan Sarandon’s movies more, but Jodie Foster rides Silence of the Lambs to a decisive IMDB nod.

In trying to decipher an advantage in these tie-breaker results, I reached a very different conclusion: they’re probably not that relevant. Critics and viewers may like a movie because of an actor’s performance, or they may like it for an entirely different reason. It isn’t like Oscar voting, which is focused solely on the performance of a single actor. It would be better to use Golden Globe or Screen Actors Guild results as tie breakers or as supplements to the scoring system.

And, is an Oscar win twice as valuable an indicator of greatness as an Oscar nomination? No, it’s even more valuable.

For Best Actress in a Leading Role

| Number of Actresses Who Have: | Count | % of Total Nominated |
|---|---|---|
| Been Nominated | 219 | |
| Been Nominated More than Once | 85 | 38.8% |
| Won | 72 | 32.9% |
| Won More Than Once | 13 | 5.9% |

It is easier to be nominated twice than it is to win once. And it has been more than six times as hard to win twice as it has been to be nominated twice.

I’ve got to rework my scoring system. For now, with only two decades left to consider, we’ll keep it as it is. For Actress of this decade, it’s a coin toss, with the coin weighted towards Jodie Foster and her two wins.

Create, Test, Analyze, and Recreate

Apple’s iPhone just turned 10 years old. Why has it been such a successful product? It might be because the product hasn’t stayed static. The latest version is the iPhone 7 Plus. As a product, it is constantly reinventing itself to improve its utility. It is always fresh. Apple, like most producers of successful products, probably follows a process whereby they:

  1. Create.
  2. Test what they’ve created.
  3. Analyze the results of their tests.
  4. Recreate.

They never dust off their hands and say, “My job is done.”

Now, I won’t be so presumptuous as to claim to have created something as revolutionary as the iPhone. But regardless of how small your creation is, its success requires you to follow the same steps outlined above.

My post last week outlined the testing process I put my algorithm through each year. This week I will provide some analysis and take some steps towards a recreation. The result of my test was that using my “really like” movie selection system significantly improved the overall quality of the movies I watch. On the negative side, the test showed that once you pass some optimal number of movies in a year, the additional movies you watch decline in quality as the remaining pool of “really like” movies shrinks.

A deeper dive into these results begins to clarify the key issues. Separating movies that I’ve seen at least twice from those that were new to me is revealing.

| | Seen More than Once, 1999 to 2001 | Seen More than Once, 2014 to 2016 | Seen Once, 1999 to 2001 | Seen Once, 2014 to 2016 |
|---|---|---|---|---|
| # of Movies | 43 | 168 | 231 | 158 |
| % of Total Movies in Timeframe | 15.7% | 51.5% | 84.3% | 48.5% |
| IMDB Avg Rating | 7.6 | 7.6 | 6.9 | 7.5 |
| My Avg Rating | 8.0 | 8.4 | 6.1 | 7.7 |
| % Difference | 5.2% | 10.1% | -12.0% | 2.0% |

There is so much interesting data here I don’t know where to start. Let’s start with the notion that the best opportunity for a “really like” movie experience is the “really like” movie you’ve already seen. The % Difference row shows that, for movies seen more than once, My Avg Rating outperforms the IMDB Avg Rating in both timeframes. The fact that, from 1999 to 2001, I was able to watch movies that I “really liked” more than the average IMDB voter did, without the assistance of any movie recommender website, suggests that memory of a “really like” movie is a pretty reliable “really like” indicator. The 2014 to 2016 results suggest that my “really like” system can help prioritize the movies that memory tells you you will “really like” seeing again.

The “Seen Once” columns clearly display the advantages of the “really like” movie selection system. It’s for the movies you’ve never seen that movie recommender websites are worth their weight in gold. With limited availability of movie websites from 1999 to 2001, my selection of new movies underperformed the IMDB Avg Rating by 12%, and those movies represented 84.3% of everything I watched during that timeframe. From 2014 to 2016, my “really like” movie selection system recognized that there is a limited supply of new “really like” movies. As a result, fewer than half of the movies I watched from 2014 through 2016 were movies I’d never seen before. And among the new movies I did watch, there was a significant improvement over the 1999 to 2001 timeframe, both in quality, as represented by the IMDB Avg Rating, and in my enjoyment of the movies, as represented by My Avg Rating.
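For anyone who wants to reproduce this kind of comparison, here is a rough sketch of the grouping, assuming a viewing log of (timeframe, rewatch flag, IMDB rating, my rating) records. The sample rows are invented placeholders, not my actual data.

```python
# Rough sketch of the seen-before vs. seen-once comparison above: group a
# viewing log by timeframe and whether the movie was a rewatch, then compare
# my average rating to the IMDB average. The sample rows are hypothetical.

from statistics import mean

log = [  # (timeframe, rewatch, imdb_rating, my_rating) - hypothetical entries
    ("1999-2001", True, 7.6, 8.0),
    ("1999-2001", False, 6.9, 6.1),
    ("2014-2016", True, 7.7, 8.5),
    ("2014-2016", False, 7.5, 7.7),
]

groups = {}
for timeframe, rewatch, imdb_rating, my_rating in log:
    groups.setdefault((timeframe, rewatch), []).append((imdb_rating, my_rating))

for (timeframe, rewatch), rows in sorted(groups.items()):
    imdb_avg = mean(r[0] for r in rows)
    my_avg = mean(r[1] for r in rows)
    diff = (my_avg - imdb_avg) / imdb_avg
    label = "seen before" if rewatch else "seen once"
    print(f"{timeframe} {label}: IMDB {imdb_avg:.1f}, mine {my_avg:.1f}, diff {diff:+.1%}")
```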

Still, while the 2014 to 2016 new movies were significantly better than the new movies watched from 1999 to 2001, is it unrealistic to expect My Ratings to be better than IMDB by more than 2%? To gain some perspective on this question, I profiled the new movies I “really liked” in the 2014 to 2016 timeframe and contrasted them with the movies I didn’t “really like”.

Movies Seen Once, 2014 to 2016

| | “Really Liked” | Didn’t “Really Like” |
|---|---|---|
| # of Movies | 116 | 42 |
| % of Total Movies in Timeframe | 73.4% | 26.6% |
| IMDB Avg Rating | 7.6 | 7.5 |
| My Avg Rating | 8.1 | 6.3 |
| “Really Like” Probability | 82.8% | 80.7% |

The probability results for these movies suggest that I should “really like” between 80.7% and 82.8% of the movies in the sample. I actually “really liked” 73.4%, not too far off the probability expectations. The IMDB Avg Rating for the movies I didn’t “really like” is only a tick lower than the rating for the “really liked” movies. Similarly, the “Really Like” Probability is only a tick lower for the Didn’t “Really Like” movies. My conclusion is that there is some, but not much, opportunity to improve selection of new movies through a more disciplined approach. The better approach would be to favor “really like” movies that I’ve seen before and give new movies more time for their data to mature.

Based on my analysis, here is my action plan (a rough sketch of the resulting selection rule follows the list):

  1. Set separate probability standards for movies I’ve seen before and movies I’ve never seen.
  2. Incorporate the probability revisions into the algorithm.
  3. Set a minimum probability threshold for movies I’ve never seen before.
  4. When the supply of “really like” movies gets thin, stretch only for movies I’ve already seen and that memory tells me I “really liked”.
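Here is a minimal sketch of what items 1 through 4 might look like in practice. The threshold values are placeholders I made up for illustration, not numbers the analysis above has settled on.

```python
# Minimal sketch of the action plan above: separate "really like" probability
# thresholds for rewatches and for movies never seen, plus a hard floor for
# new movies. The threshold values are placeholders, not calibrated figures.

SEEN_BEFORE_THRESHOLD = 0.70   # placeholder probability cutoffs
NEW_MOVIE_THRESHOLD = 0.85
NEW_MOVIE_FLOOR = 0.80         # never stretch below this for an unseen movie

def should_watch(really_like_probability, seen_before, supply_is_thin=False):
    if seen_before:
        return really_like_probability >= SEEN_BEFORE_THRESHOLD
    if really_like_probability < NEW_MOVIE_FLOOR:
        return False
    # When the pool is thin, item 4 says stretch only for movies already seen.
    if supply_is_thin:
        return False
    return really_like_probability >= NEW_MOVIE_THRESHOLD

print(should_watch(0.75, seen_before=True))    # True
print(should_watch(0.82, seen_before=False))   # False (below the new-movie cutoff)
```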

Create, test, analyze and recreate.


How Do You Know a Tarnished Penny Isn’t a Tarnished Quarter?

One of my first posts on this site was The Shiny Penny in which I espoused the virtues of older movies. I still believe that and yet here I am, almost eleven months later, wondering if my movie selection algorithm does a good enough job surfacing those “tarnished quarters”. A more accurate statement of the problem is that older movies generate less data for the movie websites I use in my algorithm which in turn creates fewer recommended movies.

Let me explain the issue by comparing IMDB voting with my own ratings for each movie decade. Since I began developing my algorithm around 2010, I’ll use that year as the point when I began disciplining my movie choices with an algorithm. Also, you might recall from previous posts that my database consists of movies I’ve watched in the last fifteen years. Each month I remove movies from the database that fall beyond the fifteen years and make them available for me to watch again. One other clarification: I use the IMDB ratings for the 45+ age group to better match my demographic.

To familiarize you with the format I’ll display for each decade, here’s a look at the 2010’s:

| Database Movies Released in the 2010’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
|---|---|---|---|---|---|
| Viewed After Algorithm | 340 | 100.0% | 10,369 | 7.3 | 7.3 |
| Viewed Before Algorithm | 0 | 0.0% | | | |

The 340 movies that I’ve seen from the 2010’s are 17.2% of all of the movies I’ve seen in the last 15 years, and there are three more years of the decade still to go. If the movies were distributed evenly across all nine decades, this percentage would be closer to 11%. Because the “shiny pennies” are the most available to watch, there is a tendency to watch more of the newer movies. I also believe that many newer movies fit the selection screen before their data matures but would not fit it afterwards. The Avg # of Voters column is an indicator of how mature the data is. Keep this in mind as we look at subsequent decades.
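For anyone curious how these decade tables get assembled, here is a rough sketch of the grouping, assuming a database of records with a release year, a viewed-after-algorithm flag, a vote count, an IMDB rating, and my rating. The sample records are invented.

```python
# Rough sketch of how the decade tables above can be assembled from a movie
# database: group by release decade and by whether the movie was viewed after
# adopting the algorithm, then average the vote counts and ratings.
# The sample records are invented placeholders.

from statistics import mean

movies = [  # (release_year, viewed_after_algorithm, num_voters, imdb_rating, my_rating)
    (2012, True, 12000, 7.4, 7.5),
    (1994, True, 18000, 7.6, 8.2),
    (1994, False, 9000, 7.1, 6.8),
    (1955, False, 5500, 7.8, 6.0),
]

def decade(year):
    return f"{year // 10 * 10}'s"

groups = {}
for year, after, voters, imdb_rating, my_rating in movies:
    groups.setdefault((decade(year), after), []).append((voters, imdb_rating, my_rating))

for (dec, after), rows in sorted(groups.items()):
    label = "after algorithm" if after else "before algorithm"
    print(f"{dec} {label}: {len(rows)} movies, "
          f"avg voters {mean(r[0] for r in rows):,.0f}, "
          f"avg IMDB {mean(r[1] for r in rows):.1f}, "
          f"my avg {mean(r[2] for r in rows):.1f}")
```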

The 2000’s represent my least disciplined movie watching. 38.4% of all of the movies in the database come from this decade. The decision to watch specific movies was driven primarily by what was available rather than what was recommended.

| Database Movies Released in the 2000’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
|---|---|---|---|---|---|
| Viewed After Algorithm | 81 | 10.6% | 10,763 | 7.2 | 6.8 |
| Viewed Before Algorithm | 680 | 89.4% | 10,405 | 7.1 | 6.4 |

One thing to remember about movies in this decade is that only movies watched in 2000 and 2001 have dropped out of the database. As a result, only 10.6% of the movies were selected to watch with some version of the selection algorithm.

The next three decades represent the reliability peak in terms of the algorithm.

| Database Movies Released in the 1990’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
|---|---|---|---|---|---|
| Viewed After Algorithm | 115 | 46.7% | 18,179 | 7.4 | 8.1 |
| Viewed Before Algorithm | 131 | 53.3% | 11,557 | 7.2 | 7.0 |

| Database Movies Released in the 1980’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
|---|---|---|---|---|---|
| Viewed After Algorithm | 68 | 44.4% | 14,025 | 7.5 | 7.6 |
| Viewed Before Algorithm | 85 | 55.6% | 12,505 | 7.4 | 7.0 |

| Database Movies Released in the 1970’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
|---|---|---|---|---|---|
| Viewed After Algorithm | 38 | 38.0% | 18,365 | 7.8 | 7.6 |
| Viewed Before Algorithm | 62 | 62.0% | 9,846 | 7.5 | 6.5 |

Note that the average number of voters per movie is higher for these three decades than for the movies released after 2000. In each decade there is a growing gap in voters per movie between the movies recommended by the algorithm and those seen before I used it. This may be indicative of the amount of data needed to produce a recommendation. You also see larger gaps in my enjoyment of the movies chosen through the disciplined selection process compared with those seen before I adopted the algorithm. My theory is that younger movie viewers will only watch the classics, and as a result those are the movies that generate sufficient data for the algorithm to be effective.

When we get to the four oldest decades in the database, it becomes clear that the number of movies with enough data to fit the algorithm is minimal.

| Database Movies Released in the 1960’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
|---|---|---|---|---|---|
| Viewed After Algorithm | 23 | 20.0% | 14,597 | 8.0 | 8.3 |
| Viewed Before Algorithm | 92 | 80.0% | 6,652 | 7.7 | 6.6 |

| Database Movies Released in the 1950’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
|---|---|---|---|---|---|
| Viewed After Algorithm | 22 | 18.0% | 11,981 | 8.0 | 8.4 |
| Viewed Before Algorithm | 100 | 82.0% | 5,995 | 7.7 | 5.9 |

| Database Movies Released in the 1940’s | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
|---|---|---|---|---|---|
| Viewed After Algorithm | 21 | 22.1% | 8,021 | 8.0 | 7.9 |
| Viewed Before Algorithm | 74 | 77.9% | 4,843 | 7.8 | 6.5 |

| Database Movies Released Pre-1940 | # of Movies | % of Movies | Avg # of Voters | Avg. IMDB Rating | My Avg. Rating |
|---|---|---|---|---|---|
| Viewed After Algorithm | 7 | 14.0% | 12,169 | 8.0 | 7.5 |
| Viewed Before Algorithm | 43 | 86.0% | 4,784 | 7.9 | 6.2 |

The results are even more stark. For these oldest decades, today’s movie viewers and critics are drawn to the classics but probably not much else. It is clear that the selection algorithm is effective for movies with enough data. The problem is that the “really like” movies from these decades that don’t generate data don’t get recommended.

Finding tarnished quarters with a tool that requires data when data diminishes as movies age is a problem. Another observation is that the algorithm works best for the movies released from the 1970’s to the 1990’s probably because the data is mature and plentiful. Is there a value in letting the shiny pennies that look like quarters get a little tarnished before watching them?

Merry Christmas to all and may all of your movies seen this season be “really like” movies.