We’re just over a week away from the debut of the eighth (and final) season of Game of Thrones. I thought I’d use some Data Science techniques to determine who, exactly, are the most important characters.
Note: For those of you who care about how the numbers were generated, scroll to the bottom of the article. For the rest of you, enjoy (or get frustrated by my choices).
Take a moment to notice the gap between Theon and Arya below. Those 65.79 points are about the same as the gap between Theon and Samwell, who started this list!
After Arya and Daenerys, there’s a sizable gap between the #6 and the #5 position, although there’s not much that separates #5 from #4 and #3.
Then a HUGE gap between Jaime Lannister and his brother Tyrion who ranks as #2.
Finally, one more large gap between Tyrion at #2 and our number one and my favorite for the character who winds up on the Iron Throne – Jon Snow.
To determine the rankings, I first found multiple episode-by-episode guides for the series. I then scanned each guide to see how many times each character was named per episode. Next, I added up the total number of character mentions in each episode and worked out each character’s percentage of that total, averaging the percentages across the guides. So, if Data Source #1 mentioned Character #1 six times, Character #2 three times and Character #3 once, while Data Source #2 mentioned Character #1 five times, Character #2 twice and Character #3 three times, then Character #1 would wind up with 55% of mentions for that episode: (6/10 + 5/10) / 2. Character #2 would wind up with 25% of mentions: (3/10 + 2/10) / 2. And Character #3 would wind up with 20% of mentions: (1/10 + 3/10) / 2.
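For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of that averaging step. It is not the code used for the article; the function names and the dictionary layout are assumptions made for illustration, and the counts are the ones from the example above.

```python
def mention_shares(counts):
    """Turn one guide's raw mention counts for an episode into shares of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def average_shares(sources):
    """Average each character's share across all guides (hypothetical helper)."""
    shares = [mention_shares(counts) for counts in sources]
    names = set().union(*sources)  # every character named in any guide
    return {
        name: sum(s.get(name, 0.0) for s in shares) / len(shares)
        for name in names
    }


# Counts from the worked example: two guides, one episode.
source_1 = {"Character #1": 6, "Character #2": 3, "Character #3": 1}
source_2 = {"Character #1": 5, "Character #2": 2, "Character #3": 3}

episode_shares = average_shares([source_1, source_2])
for name in sorted(episode_shares):
    print(f"{name}: {episode_shares[name]:.0%}")
```

A character missing from one guide is treated as having a 0% share in that guide, which is an assumption; the article does not say how that case was handled.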
The first statistic is the sum of a character’s mention percentages across all episodes. So if Character #1 was in four episodes, where he had 55%, 10%, 8% and 13% of mentions, he would have a total of 86 mention points. This is the number in the TotMenPct field (first yellow column). I then divide that value by the maximum value across all characters. So if the character with the highest TotMenPct value had 240 points, Character #1 would get 86/240, which is 35.83% of the highest total and therefore 35.83 points (TotMenPts column, first green column).
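The same two steps, summing the percentages and then scaling against the top character, can be sketched in a few lines of Python. The function name `scale_to_max` and the "Top character" entry are placeholders for illustration; the numbers come from the example above.

```python
def scale_to_max(values):
    """Rescale each value so the largest one becomes 100 points."""
    top = max(values.values())
    return {name: 100 * v / top for name, v in values.items()}


# Character #1's per-episode mention percentages from the example.
character_1_pcts = [55, 10, 8, 13]
tot_men_pct = sum(character_1_pcts)  # 86

# "Top character" stands in for whoever has the highest TotMenPct (240 in the example).
tot_men_pts = scale_to_max({"Character #1": tot_men_pct, "Top character": 240})
print(round(tot_men_pts["Character #1"], 2))  # 35.83
```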
The next four columns show the number of times a character reached 1%, 5%, 10% and 20% of mentions for a single episode, with the points generated the same way as above (the character’s value divided by the highest value any character has for that column). Therefore the most any character can score in a single column is 100. Add those five columns together, and the maximum possible score is 500 points.
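Putting the five columns together might look like the Python sketch below. It assumes "reached" means greater than or equal to the threshold, and the characters "A" and "B" and their percentages are made up purely to show the mechanics; none of this is the article's actual data or code.

```python
THRESHOLDS = (1, 5, 10, 20)  # percent of an episode's mentions


def threshold_counts(pcts):
    """Count the episodes in which a character reached each threshold (assumes >=)."""
    return [sum(1 for p in pcts if p >= t) for t in THRESHOLDS]


def total_points(pcts_by_character):
    """Scale all five columns to a 0-100 range and add them up (maximum 500)."""
    # Raw columns: summed mention percentage, then the four threshold counts.
    raw = {
        name: [sum(pcts)] + threshold_counts(pcts)
        for name, pcts in pcts_by_character.items()
    }
    maxima = [max(row[i] for row in raw.values()) for i in range(5)]
    return {
        name: sum(100 * v / m if m else 0.0 for v, m in zip(row, maxima))
        for name, row in raw.items()
    }


# Made-up per-episode mention percentages for two hypothetical characters.
example = {"A": [55, 10, 8, 13], "B": [30, 4, 0.5]}
scores = total_points(example)
for name, pts in scores.items():
    print(name, round(pts, 2))
```

A character who leads every column, like "A" here, lands on the full 500 points.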
Written by Dave Curewitz