Network Analysis with Gephi: An Application Case on “The Simpsons” Season 4
Mehul Ved, Jun 27, 2018
“The Simpsons” needs no introduction. At 29 seasons and counting, it’s the longest-running scripted series in the history of American television. This blog covers a network analysis of “The Simpsons” Season 4 using Gephi, an open-source graph visualization tool.
The show’s longevity, and the fact that it’s animated, provide a vast and relatively unchanging universe of characters to study. It’s easier for an animated show to scale to hundreds of recurring characters, since there are no live-action actors to grow old or move on to other projects.

Background to the Application Case and the Dataset for “The Simpsons” Season 4:
The data is derived from the co-appearances of the cast in Season 4 of the “The Simpsons” TV series. The dataset comprises two data files:
- nodes.csv (vertices)
- edges.csv (edges)
Each vertex or node represents a character
Edges connect the vertices of pairs of characters who appear together in an episode
The size of a vertex encodes the number of episodes in which a character appears in a given season
Task at Hand
- Build a network based on given data and filter it on degree of nodes, for top characters in the episode. Give reasoning for choosing the number of characters you chose and methodology used.
- Build an ego network for Homer Simpson
- Top insights/inferences from network analysis of the dataset
Basic Statistics of the Data Set

Cleaning up the Data
Two minor clean-up actions were applied to the nodes.csv file:
- Character names ending in Jr., Sr., etc. had to be realigned into the Label column
- The Label for ID no. 159 was enclosed in double quotes: “Just stamp the ticket man”


Task 1.a: Overview of the Entire Network
The image below is an overview of the entire network for the given data of “The Simpsons” Season 4.
The diameter of each node (vertex) and the font size of its label are both proportional to the degree of the node.

Key Highlights of the Network

213 characters (represented by nodes) appear in only 1 episode of Season 4.
A total of 20 characters appear in more than 11 of the 22 episodes.

If we keep only pairs of characters that co-appear in at least 11 episodes, only 99 co-appearances remain out of a total of 11,754.
Task 1.b: Degree Distribution — to select the Top Characters

The degree of a vertex of a graph is the number of edges incident to the vertex.
In this network, for a given vertex (= node = character), the degree of that node is the number of other characters with which they appear in an episode.
In terms of “The Simpsons”, the degree can be read as a sort of popularity rating: a character with a high degree shares episodes with many other characters.
Network representation of the top characters, based on degree distribution (degree higher than 200)
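The degree computation and the degree filter described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the edge list is made-up sample data rather than the contents of edges.csv, the function names `degrees` and `top_by_degree` are hypothetical, and the threshold is scaled down to fit the toy data. In the blog itself the filtering was done inside Gephi.

```python
from collections import defaultdict

# Hypothetical co-appearance pairs; NOT taken from edges.csv.
edges = [
    ("Homer", "Marge"),
    ("Homer", "Bart"),
    ("Homer", "Lisa"),
    ("Marge", "Bart"),
    ("Bart", "Milhouse"),
]

def degrees(edge_list):
    """Return {node: number of distinct neighbours} for an undirected graph."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(adj) for node, adj in neighbours.items()}

def top_by_degree(edge_list, threshold):
    """Keep only nodes whose degree is strictly higher than the threshold."""
    return {n: d for n, d in degrees(edge_list).items() if d > threshold}

deg = degrees(edges)
top = top_by_degree(edges, 2)  # toy threshold; the blog uses 200 on the real data
```

With the sample data, only Homer and Bart pass the filter, since each of them co-appears with three different characters.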

Task 1.b: Another Metric to select the Top Characters — Weighted Degree Distribution

The degree distribution can be taken further by taking into account the weight of the links at each node: instead of counting a node's edges, we sum their weights.
This gives the weighted degree distribution shown in the figure.
Network representation of the top characters, based on weighted degree distribution (weighted degree higher than 500)
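As a rough sketch of the weighted degree, the snippet below sums the edge weights at each node. It assumes that an edge weight counts the episodes two characters share, which is an interpretation of the dataset and not something the files are confirmed to state. The rows and the function name `weighted_degrees` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (source, target, weight) rows; weight is assumed to be the
# number of episodes in which the two characters appear together.
weighted_edges = [
    ("Homer", "Marge", 20),
    ("Homer", "Bart", 18),
    ("Marge", "Bart", 15),
    ("Bart", "Milhouse", 6),
]

def weighted_degrees(edge_list):
    """Return {node: sum of the weights of all edges touching the node}."""
    totals = defaultdict(int)
    for a, b, w in edge_list:
        totals[a] += w
        totals[b] += w
    return dict(totals)

wdeg = weighted_degrees(weighted_edges)
ranking = sorted(wdeg, key=wdeg.get, reverse=True)
```

The two measures can rank characters differently: a character with few but heavy edges may outrank one with many light edges, which is why the blog looks at both.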

Task 2: Build an Ego Network for Homer Simpson

Ego Network for Homer Simpson
Key Parameters:
ID: 13
Depth: 1
Self: True
Additional Filter:
Edge Weight: 11 or greater
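The parameters above can be mimicked outside Gephi with a small sketch. It assumes a depth-1 ego network means the ego plus its direct neighbours, and that the weight filter is applied to the edges of that sub-network. The sample rows and the function name `ego_network` are hypothetical, and character names stand in for node IDs such as 13.

```python
# Hypothetical (source, target, weight) rows; NOT taken from edges.csv.
sample_edges = [
    ("Homer", "Marge", 20),
    ("Homer", "Bart", 18),
    ("Homer", "Moe", 5),
    ("Marge", "Bart", 15),
    ("Bart", "Milhouse", 6),
    ("Moe", "Barney", 12),
]

def ego_network(edge_list, ego, min_weight=0):
    """Depth-1 ego network with the ego included (Self: True).

    Returns (members, kept_edges); kept_edges holds only edges between
    members whose weight is at least min_weight.
    """
    # Depth 1: the ego plus every node directly connected to it.
    members = {ego}
    for a, b, _ in edge_list:
        if a == ego:
            members.add(b)
        elif b == ego:
            members.add(a)
    # Keep edges inside the ego network that pass the weight filter.
    kept = [(a, b, w) for a, b, w in edge_list
            if a in members and b in members and w >= min_weight]
    return members, kept

members, kept = ego_network(sample_edges, "Homer", min_weight=11)
```

In this toy example Moe is still a member of the ego network, but his only edge to Homer has weight 5 and is dropped, so he would appear as an isolated node unless isolated nodes are filtered out as well.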
Task 3: Top insights/inferences from network analysis of the dataset
Insight 1: Top Co-appearances in “The Simpsons” Season 4

Insight 2: Top Non-Simpson Co-appearances in “The Simpsons” Season 4

Insight 3: Top 10 characters in “The Simpsons” Season 4, based on their PageRank Score
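For readers unfamiliar with PageRank, here is a minimal power-iteration sketch on an undirected, unweighted graph. It is not Gephi's implementation, and the post does not state which settings were used there. The damping factor of 0.85, the fixed iteration count, the sample edges and the function name `pagerank` are all assumptions made for illustration.

```python
from collections import defaultdict

def pagerank(edge_list, damping=0.85, iterations=100):
    """Power-iteration PageRank; every node is assumed to have an edge."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)
    nodes = list(neighbours)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform score
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour splits its current rank equally among its own neighbours.
            incoming = sum(rank[nb] / len(neighbours[nb]) for nb in neighbours[node])
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank

# Hypothetical star-shaped sample: Homer co-appears with everyone else.
ranks = pagerank([("Homer", "Marge"), ("Homer", "Bart"), ("Homer", "Lisa")])
top_character = max(ranks, key=ranks.get)
```

The scores sum to 1, and a character scores highly when it is connected to other well-connected characters, not merely when it has many edges.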

Insight 4: The most obvious property the network reveals is that the main Simpson family (Homer, Marge, Bart, Lisa) is very well connected and hence important in the show's universe. This is not surprising, as they are the characters that have been around the longest and are the main thread of most episodes.
Insight 5: Modularity of the Network
The modularity of a network is a measure of how well it can be split into communities. A high modularity value indicates that the network has many connections within communities and comparatively few between them, which points to a pronounced underlying community structure. These communities can sometimes have significant meaning in the network.
Analysing these groups is useful because it can reveal connections that the other “importance” measures, such as edge weight and node degree, may have missed.
As with class 1, this class is quite large, but this time it contains only a few influential nodes. These still correspond to regular characters such as Barney Gumble, Clancy Wiggum and Principal Skinner, but not the main family. This leads to the conclusion that this class corresponds to more peripheral characters.
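To make the modularity measure discussed in this insight concrete, the sketch below scores a given partition of an undirected, unweighted graph using the standard formula Q = Σ_c [ L_c / m − (d_c / 2m)² ]. Here L_c is the number of edges inside community c, d_c is the total degree of its nodes and m is the total number of edges. The sample graph, the community labels and the function name `modularity` are hypothetical. Finding a good partition in the first place is a separate step that is not shown.

```python
from collections import defaultdict

def modularity(edge_list, community_of):
    """Modularity Q of a given partition (undirected, unweighted, no duplicates)."""
    m = len(edge_list)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for a, b in edge_list:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] += 1
        degree_sum[cb] += 1
        if ca == cb:
            internal[ca] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Hypothetical graph: two triangles joined by a single bridge edge.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("D", "F"),
             ("C", "D")]
two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
```

Splitting the toy graph into its two triangles gives a clearly positive score, while lumping every node into one community gives zero, which matches the intuition that modularity rewards dense groups with few links between them.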
Summary
The analysis of the network was done using Gephi, an open-source graph visualization tool.
The report was produced for coursework and contains many screenshots of the network from Gephi.
Thanks for reading through this blog post. Any suggestions for improving it are welcome.
Source: Medium