We’re just over a week away from the debut of the eighth (and final) season of Game of Thrones, so I thought I’d use some Data Science techniques to determine who, exactly, the most important characters are.
Note: For those of you who care about how the numbers were generated, scroll to the bottom of the article. For the rest of you, enjoy (or get frustrated by my choices).
Take a moment to notice the gap between Theon and Arya below. Those 65.79 points are about the same as the difference between Theon and Samwell, who started this list!
After Arya and Daenerys, there’s a sizable gap between the #6 and #5 positions, although not much separates #5 from #4 and #3.
Then there’s a HUGE gap between Jaime Lannister and his brother Tyrion, who ranks #2.
Finally, there’s one more large gap between Tyrion at #2 and our number one, who is also my pick for the character who winds up on the Iron Throne: Jon Snow.
To determine the rankings, I first found multiple episode-by-episode guides for the series. I then scanned each guide to count how many times each character was named in each episode, totaled the character mentions for that episode, and converted each character’s count into a percentage of that total. Finally, I averaged those percentages across the guides. So, if Data Source #1 mentioned Character #1 six times, Character #2 three times and Character #3 once, but Data Source #2 mentioned Character #1 five times, Character #2 twice and Character #3 three times, then Character #1 would wind up with 55% of the mentions for that episode: (6/10 + 5/10) / 2. Character #2 would wind up with 25%: (3/10 + 2/10) / 2. That leaves Character #3 with 20%: (1/10 + 3/10) / 2.
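The averaging step above can be sketched in a few lines of Python. This is a minimal illustration, not the script behind the rankings: the function name `episode_mention_shares` and the use of `collections.Counter` for each guide’s counts are assumptions, and the data is the hypothetical two-source example from the paragraph.

```python
from collections import Counter

def episode_mention_shares(sources):
    """Average each character's share of one episode's mentions across guides.

    `sources` is a list of Counters (one per guide) mapping character -> mention
    count for a single episode. A character missing from a guide gets 0% there.
    """
    characters = set().union(*sources)
    totals = [sum(src.values()) for src in sources]
    return {
        name: sum(src[name] / total for src, total in zip(sources, totals)) / len(sources)
        for name in characters
    }

# The hypothetical example from the text: two guides describing the same episode
source_1 = Counter({"Character #1": 6, "Character #2": 3, "Character #3": 1})
source_2 = Counter({"Character #1": 5, "Character #2": 2, "Character #3": 3})
shares = episode_mention_shares([source_1, source_2])
for name in sorted(shares):
    print(name, round(shares[name] * 100, 2))
```

Averaging the per-guide percentages (rather than pooling raw counts) keeps a wordier guide from outweighing a terser one.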
The first statistic is the sum of a character’s per-episode mention percentages. So if Character #1 appeared in four episodes, with 55%, 10%, 8% and 13% of the mentions, he would have a total of 86 mention points. This is the number in the TotMenPct field (first yellow column). I then divide that value by the maximum value for all characters: if the character with the highest TotMenPct value had 240 points, then Character #1 would get 86/240, or 35.83% of the highest point total, which means 35.83 points (TotMenPts column, first green column).
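A rough sketch of the TotMenPct and TotMenPts calculation follows, assuming the per-episode percentages are stored as plain numbers. The function names are hypothetical, and the 240-point leader is the made-up figure from the example above.

```python
def total_mention_pct(episode_pcts):
    """TotMenPct: the sum of a character's per-episode mention percentages."""
    return sum(episode_pcts)

def mention_points(tot_men_pct):
    """TotMenPts: scale each TotMenPct so the column leader gets 100 points."""
    column_max = max(tot_men_pct.values())
    return {name: 100 * value / column_max for name, value in tot_men_pct.items()}

tot_men_pct = {
    "Character #1": total_mention_pct([55, 10, 8, 13]),  # 86
    "Top character (hypothetical)": 240,
}
points = mention_points(tot_men_pct)
print(round(points["Character #1"], 2))  # 35.83
```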
The next four columns show the number of episodes in which a character reached 1%, 5%, 10% and 20% of the mentions, with the points generated the same way as above (the character’s value divided by the highest value for any character in that column). Therefore, the most any character can have in a single column is 100, and adding those 5 columns together gives a maximum of 500 points.
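Putting the five columns together might look like the sketch below. It assumes that “reached” means greater than or equal to the threshold; the helper `total_points` and both characters’ percentages are hypothetical.

```python
THRESHOLDS = (1, 5, 10, 20)  # percent of an episode's character mentions

def total_points(per_character_pcts):
    """Sum the five 0-100 point columns (TotMenPts plus four threshold columns).

    per_character_pcts maps character -> list of per-episode mention percentages.
    A threshold counts as reached when the percentage is >= it (an assumption).
    """
    # Column 1: TotMenPct; columns 2-5: episodes at or above each threshold
    columns = [{name: sum(pcts) for name, pcts in per_character_pcts.items()}]
    for t in THRESHOLDS:
        columns.append({name: sum(1 for p in pcts if p >= t)
                        for name, pcts in per_character_pcts.items()})
    totals = dict.fromkeys(per_character_pcts, 0.0)
    for column in columns:
        column_max = max(column.values())
        for name, value in column.items():
            # Each column is scaled against its own leader, so none exceeds 100
            totals[name] += 100 * value / column_max if column_max else 0.0
    return totals

totals = total_points({
    "Character #1": [55, 10, 8, 13],  # hypothetical
    "Character #2": [30, 25, 2],      # hypothetical
})
print(totals["Character #1"])  # 450.0
```

Character #1 leads four columns but only has half of Character #2’s 20% count, so it finishes at 450 rather than the 500 maximum.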
The Simpsons’ second season ran for 22 episodes between October 1990 and May 1991 (with a final episode airing in July of that year) and featured classic episodes such as “Two Cars in Every Garage and Three Eyes on Every Fish,” featuring Blinky, the three-eyed fish; “The Way We Was,” which told how Marge and Homer met; and the first Treehouse of Horror episode. By the end of this season, the template of the Simpsons series had been set. The second season would also show the series at its ratings peak. So, with that quick intro out of the way, let’s look at the second season with the same metrics we used for the first (see the Season One breakdown).
Just as last time, I’d like to look at how the characters from this season are interconnected. As I did with Season 1, I required each character to have at least 5% of the character mentions in an episode and to meet that threshold in at least 2 episodes. I then compared characters who met that threshold and appeared in the same episode. In the following NetworkX diagram, the size of each circle represents the number of episodes in which the character met that 5% threshold, while a thicker line between two characters means they appeared together in more episodes.
The four primary Simpsons characters (all indicated with aqua circles) are the most prominent and have the strongest “ties” to each other. The curious thing here is that while Bart and Homer have strong ties with all the others, Marge and Lisa share only a moderate bond with each other. This suggests that while Lisa and Marge each appear often alongside Homer and Bart, they are rarely both prominent in the same episode.
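One way to derive the circle sizes and line strengths described above is sketched here using only the standard library. It assumes each episode is stored as a dict of character to mention percentage, and that a pair is linked only in episodes where both characters met the 5% threshold; the function name and the sample percentages are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(episodes, threshold=5.0, min_episodes=2):
    """Return (node_sizes, edge_weights) for the character network.

    node_sizes: character -> episodes in which it met the threshold
    edge_weights: (a, b) pair -> episodes in which both met the threshold
    Only characters meeting the threshold in >= min_episodes are kept.
    """
    prominent = [{n for n, pct in ep.items() if pct >= threshold} for ep in episodes]
    counts = Counter(n for ep in prominent for n in ep)
    keep = {n for n, c in counts.items() if c >= min_episodes}
    node_sizes = {n: counts[n] for n in keep}
    edge_weights = Counter()
    for ep in prominent:
        # Sorting makes each pair key order-independent, e.g. ("Bart", "Homer")
        for pair in combinations(sorted(ep & keep), 2):
            edge_weights[pair] += 1
    return node_sizes, edge_weights

episodes = [  # hypothetical mention percentages for three episodes
    {"Homer": 30.0, "Bart": 25.0, "Burns": 12.0, "Maggie": 3.0},
    {"Homer": 28.0, "Bart": 20.0, "Lisa": 15.0},
    {"Bart": 35.0, "Lisa": 10.0, "Burns": 6.0},
]
node_sizes, edge_weights = cooccurrence_network(episodes)
print(node_sizes)
print(edge_weights.most_common(3))
```

These counts could then be handed to NetworkX, for example as the `weight` attribute in `Graph.add_edge(a, b, weight=w)`.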
As for the secondary characters, Burns has ties with Homer, Marge (she was hired to paint a picture of him in “Brush with Greatness”) and his loyal flunky Smithers. I was also a bit surprised to see that Martin was more frequently portrayed than Milhouse, Nelson or any of the other kids at school.
I’d also like to briefly discuss the representation of women and minorities in the series. While Season 1 had an average of just over 25% female representation per episode, Season 2 went down ever so slightly to 24.83%; however, the median (middle) value went up from 19.52% to 25.48%, so I guess you could say it’s a wash. One important difference, however, is that in Season 2 no episode was over 50% female (although Episode #19, “Lisa’s Substitute,” came close with 49.63%). Unfortunately, the reviews of Episode #17 (“Old Money”) don’t mention any of the recurring female characters (although Bea Simmons, who dies and leaves Grampa everything, does play a role, albeit by dying).
As for the portrayal of minorities, the average representation per Season 2 episode was only 0.76%, which I guess is an improvement over Season 1’s 0.29%, although the median for each season was 0. Altogether, the storylines of 17 of the 22 episodes in the season didn’t warrant a mention of a minority character, while the high point was 5.81%, in Episode 10, “Bart Gets Hit by a Car,” which featured Dr. Hibbert and Dr. Nick Riviera.
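The mean/median comparison above can be reproduced with Python’s `statistics` module. This sketch assumes each character has been tagged by hand; the `FEMALE` set and the episode percentages are illustrative, not the real data, and swapping in a set of minority characters gives the minority numbers the same way.

```python
from statistics import mean, median

def group_share(episode_pcts, group):
    """Percentage of an episode's character mentions belonging to `group`."""
    return sum(pct for name, pct in episode_pcts.items() if name in group)

def summarize(shares):
    return {
        "mean": mean(shares),
        "median": median(shares),
        "episodes_over_50": sum(1 for s in shares if s > 50),
        "episodes_at_zero": sum(1 for s in shares if s == 0),
    }

# Illustrative tags and hypothetical per-episode mention percentages
FEMALE = {"Marge", "Lisa", "Maggie"}
season = [
    {"Homer": 40.0, "Marge": 20.0, "Bart": 30.0, "Lisa": 10.0},
    {"Homer": 50.0, "Bart": 45.0, "Maggie": 5.0},
    {"Lisa": 35.0, "Marge": 25.0, "Homer": 40.0},
]
shares = [group_share(ep, FEMALE) for ep in season]
print(shares)  # [30.0, 5.0, 60.0]
print(summarize(shares))
```

Because the median is just the middle episode, one or two lopsided episodes can pull the mean away from it, which is how the average can dip while the median rises.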
For those of you who read the writeup on Season 1, you’ll know about the color-coding of the following grid, but for the rest of you, here goes. The chart below shows each character’s mentions as a percentage of the total character mentions for that episode. It is color-coded: purple (less than 1% of an episode’s mentions), blue (1% to 5%), green (5% to 10%), gold (10% to 20%), orange (20% to 30%) and red (more than 30%). The actual percentage (rounded down to the nearest whole number) is placed in the block for that episode, with the total percentage (also rounded down) in the total column. This rounding down is why the episode numbers don’t always add up to the total (for example, in Season 1 Martin had 7.62% of Episode 2 and 2.38% of Episode 10, which totals 10%, but the cells show 7 and 2). If a character has a total of 10 points or more, they appear in this chart:
As opposed to Season 1, where Bart narrowly edged out his father, Season 2 has Homer with an overwhelming lead over his son, with 610 mention points versus Bart’s 293. We then get Marge with 254 and Lisa with 183. What surprises me the most is that Mr. Burns was only 14 points behind Lisa, with 169 points. Nobody else has over 100, but Grampa comes closest with 91, and then there’s a big dropoff to Flanders’s 51, followed by Smithers with 40, Martin with 34 and then a tie between Milhouse and Herb, Homer’s half-brother, each with 29, to close out the Top Ten.
For the overall ranking for Season 2, as opposed to Season One’s near tie between Bart and Homer (with Bart edging out his father), Season Two went overwhelmingly to Homer, 550.00 to 418.25 (even though Bart “won” the Titular episodes category). Homer dominated the “Most Prominent Characters” and “25% Episode” categories, limiting Bart and the rest of the pack to at most 44.44% and 36.36% of Homer’s counts, respectively.
Marge and Lisa are once again in 3rd and 4th place, with both making gains over the previous year. The only other character to pass the 100-point threshold was Mr. Burns, who racked up 153.39 points (a gain of over 100 points from Season One). Grampa, Flanders and Smithers made the Top 10 this year. Skinner dropped from 8th (46.82 points) to 9th (43.52 points) this season, while Maggie dropped from 5th place (116.13) to 10th (41.87). Krusty, Moe and Milhouse didn’t make the Top 10 this year.
Finally, now that I’ve processed two seasons, I’m going to rank the characters across seasons the same way that I do within one (using the same 6 categories). Across Seasons 1 and 2, Homer winds up being the most important character, with 566.67 out of a possible 600 points. In second is Bart with 284.37, followed by Marge with 267.15 and Lisa with 241.42. Mr. Burns is the first non-Simpson to appear, with 114.50 points, which actually beats Maggie and Grampa Simpson. Finally, Principal Skinner, Krusty the Klown and Moe round out the Top Ten.
At the time of this writing, The Simpsons has been on the air for over 30 years, with 654 episodes (and climbing). It’s amazing that any show can stay on the air this long, and while the viewership (and, some would say, the quality) of the series has gone down since its heyday, there’s no denying that there’s a lot of information that can be mined from it.
I’ve tried to come up with a few different ways of looking at the series that will hopefully give you an understanding of who was in the series and maybe what that says about our society at the time. From a processing standpoint, I decided to scan multiple wiki sites about each episode to see which characters diehard fans noticed.
Now the first season consisted of a scant 13 episodes, but it also laid the groundwork for the entire series and introduced most of the major players of the Simpsons-verse. As time goes by, I’ll continue to add more seasons and we can see how the series changes over time.
The first way I thought about visualizing this series is by seeing how the most prominent characters interacted. To be included, each character needed to have at least 5% of the character mentions in an episode and to meet that threshold in at least 2 episodes. I then compared characters who met that threshold and appeared in the same episode. In the following NetworkX diagram, the size of each circle represents the number of episodes in which the character met that 5% threshold, while a thicker line between two characters means they appeared together in more episodes.
So, you’ll see that the two largest circles, with the most and strongest connections, belong to Homer and Bart. Next come Lisa and Marge. Finally come Maggie, Mr. Burns and Principal Skinner. Curiously enough, when Skinner was prominently featured, Lisa wasn’t. As time goes on and more episodes are added, this diagram should become more complicated and more informative.
The second way that I’d like to look at the season is through the demographics of the series, starting with the gender of the characters portrayed or mentioned. As you can see in the graph below, despite women making up half of humanity, they were portrayed, on average, only slightly more than 25% of the time, with an even lower median (middle value) of 19.52%. The Lisa-featured “Moaning Lisa” (Episode 6) and the Marge-centric “Life on the Fast Lane” (Episode 9) were the most female-friendly, with representation over 50%, while the Bart-centered “The Telltale Head” (Episode 8) had a piddling 4.86%.
The other way of looking at the series is by the racial ethnicity of the characters being portrayed. Sure, the main family is Caucasian, but some of the series’ memorable characters are African-American (Carl, Dr. Hibbert, Lou), Asian (Apu, Akira) or Hispanic (Dr. Nick Riviera and Bumblebee Man). I was therefore shocked to find that minority characters were all but invisible during the first season, with an average of 0.29% of character mentions associated with minority characters and a median (middle) value of 0. Truth be told, in 11 of the 13 episodes, virtually no minority character did anything that warranted a mention in a wiki.
Now, let’s do a deeper dive into who was mentioned. The chart below shows each character’s mentions as a percentage of the total character mentions for that episode. It is color-coded: purple (less than 1% of an episode’s mentions), blue (1% to 5%), green (5% to 10%), gold (10% to 20%), orange (20% to 30%) and red (more than 30%). The actual percentage (rounded down to the nearest whole number) is placed in the block for that episode, with the total percentage (also rounded down) in the total column. This rounding down is why the episode numbers don’t always add up to the total (so Martin had 7.62% of Episode 2 and 2.38% of Episode 10, which totals 10%, but the cells show 7 and 2). If a character has a total of 10 points or more, they appear in this chart:
So, you can see that Bart and Homer are shown or discussed the most, followed by the rest of the Simpson family. You’ll then see recurring characters such as Krusty, Skinner, Nelson, Burns and Moe. Then there’s Jacques, the romantic bowler who tried to woo Marge away from Homer. He only appeared in a single episode, but he was fairly prominent in that episode. Rounding out the list are Milhouse and Martin from the kids’ table, with the distinction that Milhouse appeared in slivers of 5 episodes, while Martin appeared in only two but was fairly prominent in one of them.
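A possible NetworkX setup for a diagram like this is sketched below. It assumes the per-character episode counts and per-pair shared-episode counts have already been computed; the counts are hypothetical, the helper name is made up, and the size multiplier of 300 is an arbitrary display choice.

```python
import networkx as nx

def build_character_graph(node_sizes, edge_weights):
    """Build a weighted NetworkX graph from precomputed counts.

    node_sizes: character -> episodes at or above the 5% threshold
    edge_weights: (character_a, character_b) -> episodes where both met it
    Every character named in edge_weights should also appear in node_sizes.
    """
    G = nx.Graph()
    for name, episodes in node_sizes.items():
        G.add_node(name, episodes=episodes)
    for (a, b), shared in edge_weights.items():
        G.add_edge(a, b, weight=shared)
    return G

G = build_character_graph(  # hypothetical counts
    {"Homer": 5, "Bart": 5, "Lisa": 3, "Skinner": 2},
    {("Homer", "Bart"): 5, ("Homer", "Lisa"): 3, ("Bart", "Skinner"): 2},
)
# Circle size tracks episode count; line width tracks shared episodes
node_size = [300 * G.nodes[n]["episodes"] for n in G.nodes]
width = [G[u][v]["weight"] for u, v in G.edges]
# With matplotlib installed: nx.draw_networkx(G, node_size=node_size, width=width)
```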
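The color bands and the round-down behavior can be sketched as follows, using Martin’s two Season 1 values from the paragraph above. The text doesn’t say which band a boundary value such as exactly 5% belongs to, so assigning it to the higher band is an assumption, as are the helper names.

```python
import math

# Lower bound of each color band, in percent of an episode's mentions
# (boundary values are assigned to the higher band: an assumption)
BANDS = [(30, "red"), (20, "orange"), (10, "gold"), (5, "green"), (1, "blue"), (0, "purple")]

def color_for(pct):
    """Return the grid color for an episode mention percentage."""
    for lower, color in BANDS:
        if pct >= lower:
            return color
    return "purple"

def grid_cells(episode_pcts):
    """Round each episode cell and the total down, as the grid does."""
    cells = {ep: math.floor(pct) for ep, pct in episode_pcts.items()}
    # Round to 2 decimals first so float noise can't push e.g. 10.00 below 10
    total = math.floor(round(sum(episode_pcts.values()), 2))
    return cells, total

martin = {2: 7.62, 10: 2.38}  # Martin's Season 1 percentages quoted above
cells, total = grid_cells(martin)
print(cells, sum(cells.values()), total)  # {2: 7, 10: 2} 9 10
print({ep: color_for(pct) for ep, pct in martin.items()})  # {2: 'green', 10: 'blue'}
```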
The last thing that I’ll discuss is the overall ranking of characters. To do this, I considered 6 different factors:
(1) How many times the character was the titular character of an episode
(2) The number of times the character was the most prominent character in that episode
(3) The number of times that a character reached the 25% threshold
(4) The number of times that a character reached the 10% threshold
(5) The number of episode summaries in which the character was mentioned
(6) The total number of mentions (from the grid above)
I then divided each character’s value by the maximum value for that factor and multiplied it by 100, which means whoever did “best” in a category gets 100 points (yellow columns). Summing those six point totals gives a total value between 0 and 600 (blue column).
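The six-factor scoring can be sketched as below. The factor keys and `rank_characters` are hypothetical names; the only real inputs are Bart’s and Homer’s Season 1 counts quoted in the next paragraph. Since those two lead every category, scaling by their values alone gives the same maxima as the full table.

```python
FACTORS = ("titular", "most_prominent", "over_25", "over_10", "episodes", "mentions")

def rank_characters(stats):
    """Score characters on the six factors and sort by total (0-600).

    stats: character -> dict with one raw count per factor in FACTORS.
    Each factor is scaled so its leader gets 100 points.
    """
    totals = dict.fromkeys(stats, 0.0)
    for factor in FACTORS:
        best = max(s[factor] for s in stats.values())
        for name, s in stats.items():
            if best:
                totals[name] += 100 * s[factor] / best
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

season_1 = {
    "Bart": dict(titular=2, most_prominent=5, over_25=5, over_10=12, episodes=13, mentions=376),
    "Homer": dict(titular=2, most_prominent=5, over_25=5, over_10=12, episodes=13, mentions=350),
}
for name, points in rank_characters(season_1):
    print(f"{name}: {points:.2f}")
```

This gives Bart the full 600 points and Homer about 593.09, since the mention count is the only factor where they differ.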
Looking at these metrics, Bart and Homer are in a near tie: each had 2 titular roles, each was the most prominent character and crossed the 25% threshold in 5 episodes, each hit the 10% threshold in 12 episodes, and each appeared in all 13 episodes. The sole difference is that Bart was mentioned 376 times (see the description in the GRID section above), while Homer was slightly lower with 350 mentions, so Bart “wins” with 600 points, while Homer finishes with just over 593.
After that, it’s a long way down before we get to #3 Marge (224 points) and #4 Lisa (216 points). It’s then another leap down to Krusty with 116 points. Krusty appeared in only one episode (#12, “Krusty Gets Busted”), but he was the titular and most prominent character in that episode, which boosts his rating.
We then see Maggie (80 points total), followed by Moe, who never hit the 10% threshold in any episode but did appear in 6 different episodes. Next come Skinner and Burns (each hitting 10% in one episode and appearing in 4, although Skinner had more mentions than Burns and thus beat out the power plant owner). Finally, we get Milhouse, who appeared in 5 separate episodes but had a relatively low number of mentions (11).
So, that’s a breakdown of Season 1 for you. Stay tuned for next time, when I tackle Season 2.