Degrees of Separation in Movies

Here’s a fun fact for you: 91% of all movies are within six degrees of separation from each other.

According to IMDb, anyway.

I recently wrote a Python script that searches a snapshot of IMDb to count how many movies are within six degrees of separation from “The Wizard of Oz”. Any movie that shares an actor with “The Wizard of Oz” is degree 1, any movie that shares an actor with a degree 1 movie is degree 2, and so on. The snapshot uses the following SQLite schema:

CREATE TABLE Actor (
    aid INTEGER PRIMARY KEY AUTOINCREMENT,
    first TEXT DEFAULT '',
    last TEXT DEFAULT '',
    dob DATE DEFAULT '',
    gender TEXT DEFAULT '');

CREATE TABLE Movie (
    mid INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT DEFAULT '',
    year INTEGER DEFAULT 0,
    rating TEXT DEFAULT '');

CREATE TABLE Role (
    aid INTEGER REFERENCES Actor(aid),
    mid INTEGER REFERENCES Movie(mid),
    role TEXT DEFAULT '',
    billing INTEGER DEFAULT 0);
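
For the curious, here is a minimal sketch of how a search like this could work against the schema above: a breadth-first search that expands one degree at a time by self-joining the Role table on the shared actor. The function name `movie_degrees` and the exact query are illustrative assumptions, not necessarily what the actual script does, and the sketch treats every movie matching the title as a starting point (degree 0).

```python
import sqlite3


def movie_degrees(conn, start_title, max_degree=6):
    """Return {mid: degree} for each movie reachable within max_degree.

    Hypothetical helper: assumes the Movie and Role tables from the schema above.
    """
    rows = conn.execute(
        "SELECT mid FROM Movie WHERE title = ?", (start_title,)
    ).fetchall()
    frontier = {mid for (mid,) in rows}
    degrees = {mid: 0 for mid in frontier}  # starting movie(s) are degree 0

    for degree in range(1, max_degree + 1):
        next_frontier = set()
        for mid in frontier:
            # Every movie that shares at least one actor with `mid`.
            neighbours = conn.execute(
                "SELECT DISTINCT r2.mid FROM Role r1 "
                "JOIN Role r2 ON r1.aid = r2.aid "
                "WHERE r1.mid = ?",
                (mid,),
            ).fetchall()
            for (other,) in neighbours:
                if other not in degrees:  # first visit gives the shortest degree
                    degrees[other] = degree
                    next_frontier.add(other)
        if not next_frontier:
            break  # nothing new was reached, so the search has plateaued
        frontier = next_frontier
    return degrees
```

On a real snapshot you would probably want an index on `Role(aid)` and `Role(mid)`, since the self-join runs once per movie in each frontier.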

Here’s the percent of all movies within 6 degrees:

When I saw this, I started to wonder what percentage of movies “The Wizard of Oz” is connected to at any degree. I ran the code again, this time for 15 degrees, and it looks like it tops out at about 91%…
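
To make the "tops out" idea concrete, here is a rough sketch of turning per-movie degrees into a cumulative percentage for each degree. The name `cumulative_percent` and the `{movie id: degree}` input shape are assumptions made for illustration, and the sketch counts the starting movie itself (degree 0) as reached. The numbers in the example are toy data, not IMDb results.

```python
from collections import Counter


def cumulative_percent(degrees, total_movies, max_degree):
    """Percent of all movies reached at or below each degree 1..max_degree."""
    per_degree = Counter(degrees.values())
    reached = per_degree.get(0, 0)  # assumption: the starting movie counts as reached
    result = []
    for degree in range(1, max_degree + 1):
        reached += per_degree.get(degree, 0)
        result.append(100.0 * reached / total_movies)
    return result


# Toy example: 4 movies in total, 3 of them connected to the start.
print(cumulative_percent({1: 0, 2: 1, 3: 2}, total_movies=4, max_degree=4))
# [50.0, 75.0, 75.0, 75.0]
```

Once the list stops growing, higher degrees add nothing: every movie that can be reached through shared actors has already been reached.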

Well that’s cool I guess, but let’s look at other movies! I’m sure you’re all wondering how well connected Nicolas Cage is – I have a bit of a soft spot for Con Air.

After looking around at a few other movies, it seems like all of them top out at 91% after degree 4 or so. I guess this means that the other 9% of the movies on IMDb share no chain of actors with the rest, so they star actors who are not even remotely famous. You can find my code here. Let me know if you find any other cool trends!
