Network Analysis with Gephi: An Application Case on “The Simpsons” Season 4

Mehul Ved, Jun 27, 2018

“The Simpsons” needs no introduction. At 29 seasons and counting, it’s the longest-running scripted series in the history of American television. This blog post covers a network analysis of “The Simpsons” Season 4 using Gephi, an open-source graph visualization tool.

The show’s longevity, and the fact that it’s animated, provide a vast and relatively unchanging universe of characters to study. It’s easier for an animated show to scale to hundreds of recurring characters, since there are no live-action actors to grow old or move on to other projects.


Background to the Application Case and the Dataset for “The Simpsons” Season 4:

The dataset records the co-appearances of the cast of Season 4 of the TV series “The Simpsons”. It comprises two data files:

  1. nodes.csv (vertices)
  2. edges.csv (edges)

Each vertex or node represents a character.

Edges connect the vertices of pairs of characters who appear together in an episode.

The size of a vertex encodes the number of episodes in which a character appears in a given season.
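To reproduce parts of the analysis outside Gephi, the two files can be read into a simple adjacency map. The sketch below uses only the Python standard library and assumes Gephi-style headers (`Id,Label` in nodes.csv and `Source,Target,Weight` in edges.csv, with Weight read as an integer). The real files' column names may differ, and the inline sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real files may use different column names.
NODES_CSV = """Id,Label
1,Homer Simpson
2,Marge Simpson
3,Moe Szyslak
"""

EDGES_CSV = """Source,Target,Weight
1,2,5
1,3,2
"""


def load_graph(nodes_file, edges_file):
    """Return (labels, adj), where adj[u][v] is the co-appearance weight of u and v."""
    labels = {row["Id"]: row["Label"] for row in csv.DictReader(nodes_file)}
    adj = defaultdict(dict)
    for row in csv.DictReader(edges_file):
        u, v, w = row["Source"], row["Target"], int(row["Weight"])
        # Co-appearance is symmetric, so store the edge in both directions.
        adj[u][v] = w
        adj[v][u] = w
    return labels, adj


labels, adj = load_graph(io.StringIO(NODES_CSV), io.StringIO(EDGES_CSV))
print(labels["1"], dict(adj["1"]))
```

With the actual files, the `io.StringIO` objects would be replaced by `open("nodes.csv", newline="")` and `open("edges.csv", newline="")`.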

Task at Hand

  1. Build a network from the given data and filter it on node degree to find the top characters in the season. Give the reasoning for the number of characters chosen and the methodology used.
  2. Build an ego network for Homer Simpson
  3. Top insights/inferences from network analysis of the dataset

Basic Statistics of the Data Set

Basic Statistics for “The Simpsons” Season 4 Dataset
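The statistics figure itself is not reproduced here. As a rough sketch of how such overview numbers are computed, the function below derives node count, edge count, average degree and density from an undirected edge list. The edges are made-up toy values, and nodes are taken from the edge list only, so isolated characters would be missed.

```python
from collections import defaultdict

# Made-up toy edges (u, v, weight); not the Season 4 data.
edges = [("A", "B", 3), ("A", "C", 1), ("B", "C", 2), ("C", "D", 1)]


def basic_stats(edges):
    """Node count, edge count, average degree and density of an undirected graph.

    Nodes are collected from the edge list, so isolated nodes are not counted.
    """
    degree = defaultdict(int)
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    n, m = len(degree), len(edges)
    return {
        "nodes": n,
        "edges": m,
        "avg_degree": 2 * m / n,           # every edge adds 1 to two nodes' degree
        "density": 2 * m / (n * (n - 1)),  # share of all possible pairs that are linked
    }


stats = basic_stats(edges)
print(stats)
```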

Cleaning up the Data

Two minor clean-up actions were performed on the nodes.csv file:

  1. All character names ending with Jr., Sr., etc. had to be realigned into the Label column.
  2. The label for ID 159 was enclosed in double quotes: “Just stamp the ticket man”.

Data Cleaning — Step 1
Data Cleaning — Step 2
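Both clean-up steps can be sketched in a few lines. This assumes nodes.csv has just `Id` and `Label` columns, and that the Jr./Sr. problem comes from an unquoted comma splitting the name into an extra column. That cause is an assumption, since the screenshots are not reproduced here. The sample rows, including the placeholder name, are hypothetical.

```python
import csv
import io

# Hypothetical raw nodes.csv showing both problems (not the real file):
#  - the unquoted comma in "..., Jr." spills the suffix into an extra column
#  - one label carries literal double quotes around it
RAW = 'Id,Label\n1,Homer Simpson\n2,Character Name, Jr.\n3,"""Just stamp the ticket man"""\n'


def clean_nodes(text):
    """Return a list of {"Id": ..., "Label": ...} rows with repaired labels."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    cleaned = []
    for row in reader:
        node_id = row[0]
        # Step 1: glue any spilled columns back into a single Label.
        label = ",".join(row[1:])
        # Step 2: drop literal quotes wrapped around the label.
        label = label.strip('"')
        cleaned.append({header[0]: node_id, header[1]: label})
    return cleaned


nodes = clean_nodes(RAW)
for node in nodes:
    print(node)
```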

Task 1.a: Overview of the Entire Network

The image below is an overview of the entire network for the Season 4 data of “The Simpsons”.

The diameter of each node (vertex), as well as the font size of its label, is proportional to the node’s degree.

Overview of the Entire Network for “The Simpsons” Season 4 Dataset
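Sizing nodes and labels by degree means mapping each degree onto a display range. The sketch below uses a linear min-max mapping, similar in spirit to Gephi's ranking-based sizing. The `min_size` and `max_size` defaults and the degrees are hypothetical.

```python
def scale_sizes(degrees, min_size=10.0, max_size=50.0):
    """Map each node's degree linearly onto [min_size, max_size]."""
    lo, hi = min(degrees.values()), max(degrees.values())
    span = hi - lo or 1  # avoid division by zero when all degrees are equal
    return {
        node: min_size + (d - lo) / span * (max_size - min_size)
        for node, d in degrees.items()
    }


# Made-up degrees for illustration.
sizes = scale_sizes({"A": 1, "B": 3, "C": 5})
print(sizes)
```

The same function can be reused with a different range for label font sizes.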

Key Highlights of the Network

213 characters (represented by nodes) appeared in only 1 episode in Season 4.

A total of 20 characters appeared in more than 11 of the 22 episodes.

Counting only pairs of characters who appear together in at least 11 episodes leaves just 99 co-appearances out of a total of 11,754.
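The three highlights above are simple counts. This sketch assumes that each character's episode count is available, since the post says vertex size encodes it. It also assumes that an edge's weight is the number of episodes a pair shares. The data below is made up and is not the Season 4 data.

```python
# Made-up toy data: episodes per character and co-appearance weight per pair.
episodes = {"A": 22, "B": 15, "C": 1, "D": 1, "E": 12}
edges = {("A", "B"): 14, ("A", "C"): 1, ("A", "E"): 11, ("B", "E"): 9, ("A", "D"): 1}

one_episode = [c for c, n in episodes.items() if n == 1]
regulars = [c for c, n in episodes.items() if n > 11]    # more than 11 of 22 episodes
strong_pairs = [p for p, w in edges.items() if w >= 11]  # pairs sharing 11+ episodes

print(len(one_episode), len(regulars), f"{len(strong_pairs)} of {len(edges)}")
```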

Task 1.b: Degree Distribution — to select the Top Characters

Degree Distribution — to select the Top Characters

The degree of a vertex of a graph is the number of edges incident to the vertex.

In this network, for a given vertex (= node = character), the degree of that node is the number of other characters with whom that character appears in at least one episode.

In terms of The Simpsons, degree can be read as a rough popularity rating for a given character: a character with a high degree appears alongside many others.
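To make the definition concrete, the sketch below builds co-appearance edges from hypothetical per-episode cast lists and then counts each character's distinct co-stars. It assumes that an edge's weight is the number of shared episodes, which the post does not state explicitly.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cast lists per episode (made up, not the Season 4 data).
episodes = [
    {"Homer", "Marge", "Bart"},
    {"Homer", "Moe"},
    {"Homer", "Marge", "Lisa"},
]

# Weight of a pair = number of episodes in which both characters appear.
weights = Counter()
for cast in episodes:
    for u, v in combinations(sorted(cast), 2):
        weights[(u, v)] += 1

# Degree = number of distinct co-stars, however often they meet.
neighbors = {}
for u, v in weights:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)
degree = {c: len(n) for c, n in neighbors.items()}

print(degree["Homer"], weights[("Homer", "Marge")])
```

Homer and Marge share two episodes, but that pair still adds only 1 to each character's degree.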

Network Representation for Top Characters — based on Degree Distribution. Degree Distribution: Higher than 200
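Keeping only characters with a degree above 200 is a node filter that also drops every edge leading to a removed node. The minimal sketch below works on a made-up adjacency map and uses a threshold of 1 in place of 200, so that the toy graph shows an effect. `degree_filter` is a hypothetical helper, not a Gephi API.

```python
def degree_filter(adj, min_degree):
    """Keep nodes whose degree exceeds min_degree, plus the edges among them."""
    keep = {n for n, nbrs in adj.items() if len(nbrs) > min_degree}
    return {n: {m: w for m, w in adj[n].items() if m in keep} for n in keep}


# Made-up toy adjacency map: adj[u][v] is the co-appearance weight.
adj = {
    "A": {"B": 3, "C": 1, "D": 2},
    "B": {"A": 3, "C": 4},
    "C": {"A": 1, "B": 4},
    "D": {"A": 2},
}

top = degree_filter(adj, min_degree=1)
print(sorted(top), top["A"])
```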


Task 1.b: Another Metric to select the Top Characters — Weighted Degree Distribution

Weighted Degree Distribution: Higher than 500

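The degree can be taken further by taking into account the weight of the links at each node. A node’s weighted degree is the sum of the weights of its incident edges (here, presumably the number of episodes each pair shares), so repeated co-appearances with the same characters count, not just the number of distinct co-stars.

This gives the weighted degree distribution shown in the figure.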
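Weighted degree sums edge weights instead of counting edges. In the made-up toy graph below, B has a lower degree than A but the highest weighted degree, which is the kind of reordering this metric introduces. The threshold of 5 stands in for the post's 500.

```python
def weighted_degree(adj):
    """Sum of the weights of all edges incident to each node."""
    return {n: sum(nbrs.values()) for n, nbrs in adj.items()}


# Made-up toy adjacency map: adj[u][v] is the co-appearance weight.
adj = {
    "A": {"B": 3, "C": 1, "D": 2},
    "B": {"A": 3, "C": 4},
    "C": {"A": 1, "B": 4},
    "D": {"A": 2},
}

degree = {n: len(nbrs) for n, nbrs in adj.items()}
wdeg = weighted_degree(adj)
# Threshold of 5 stands in for the post's 500; rank survivors by weighted degree.
top = sorted((n for n, w in wdeg.items() if w > 5), key=wdeg.get, reverse=True)
print(degree, wdeg, top)
```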

Network Representation for Top Characters — based on Weighted Degree Distribution. Weighted Degree Distribution: Higher than 500


Task 2: Build an Ego Network for Homer Simpson

Ego Network for Homer Simpson

 
Key Parameters:
  ID: 13
  Depth: 1
  Self: True

Additional Filter:
  Edge Weight: 11 or greater
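With depth 1 and Self enabled, an ego network roughly keeps the chosen node, its direct neighbours, and the edges among them. The additional filter then drops edges below weight 11. The plain-Python sketch below reproduces that combination on a made-up adjacency map in which ID 13 plays the centre. It returns only the surviving edges, so a neighbour whose links all fall below the threshold disappears. Gephi's handling of such nodes may differ.

```python
def ego_network(adj, center, min_weight=11, include_self=True):
    """Depth-1 ego network: the centre, its neighbours and the edges among them,
    keeping only edges whose weight is at least min_weight."""
    members = set(adj[center]) | {center}
    edges = {}
    for u in members:
        for v, w in adj[u].items():
            # u < v lists each undirected edge once; v must also be in the ego set.
            if v in members and u < v and w >= min_weight:
                edges[(u, v)] = w
    if not include_self:
        # Without "self", drop every edge that touches the centre.
        edges = {e: w for e, w in edges.items() if center not in e}
    return edges


# Made-up adjacency map with integer IDs; 13 plays the role of the centre node.
adj = {
    13: {1: 20, 2: 12, 3: 5},
    1: {13: 20, 2: 11},
    2: {13: 12, 1: 11, 4: 15},
    3: {13: 5},
    4: {2: 15},
}

ego = ego_network(adj, 13)
print(ego)
```

Node 3 drops out because its only link is below the threshold, and node 4 is never included because it is two steps away from the centre.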

Task 3: Top insights/inferences from network analysis of the dataset

Insight 1: Top Co-appearances in “The Simpsons” Season 4

Top Co-appearances in “The Simpsons” Season 4
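Ranking co-appearances is a sort over edge weights. The sketch below uses `heapq.nlargest` on made-up weights between placeholder characters A to E.

```python
import heapq

# Made-up co-appearance weights between placeholder characters.
weights = {
    ("A", "B"): 20,
    ("A", "C"): 19,
    ("B", "D"): 17,
    ("C", "E"): 6,
    ("A", "E"): 8,
}


def top_pairs(weights, k):
    """Return the k heaviest (pair, weight) items, heaviest first."""
    return heapq.nlargest(k, weights.items(), key=lambda item: item[1])


for (a, b), w in top_pairs(weights, 3):
    print(f"{a} & {b}: weight {w}")
```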

Insight 2: Top Non-Simpson Co-appearances in “The Simpsons” Season 4

Top Non-Simpson Co-appearances in “The Simpsons” Season 4
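For a non-Simpson ranking, pairs that involve any family member are removed before sorting. This sketch assumes family members can be recognised by the surname “Simpson” at the end of their label. The names are real characters, but the weights are invented for illustration.

```python
# Invented weights; only the character names are real.
weights = {
    ("Homer Simpson", "Marge Simpson"): 20,
    ("Barney Gumble", "Moe Szyslak"): 6,
    ("Clancy Wiggum", "Moe Szyslak"): 3,
    ("Homer Simpson", "Moe Szyslak"): 8,
}


def is_simpson(label):
    # Assumption: family members are identified by the surname in their label.
    return label.endswith("Simpson")


non_simpson = sorted(
    ((pair, w) for pair, w in weights.items() if not any(is_simpson(c) for c in pair)),
    key=lambda item: item[1],
    reverse=True,
)
print(non_simpson)
```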

Insight 3: Top 10 characters in “The Simpsons” Season 4, based on their PageRank Score

Top 10 characters in “The Simpsons” Season 4 based on their PageRank Score
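Gephi computes PageRank as a built-in statistic. To show what that computation does, here is a minimal power-iteration sketch of weighted PageRank on an undirected graph, using the conventional damping factor of 0.85. The star-shaped toy graph is made up. Results will not match Gephi's exactly unless its settings, such as edge-weight use and tolerance, are also matched.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=100):
    """Weighted PageRank by power iteration on an undirected graph.

    adj[u][v] is the weight of edge u-v, stored in both directions.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    strength = {v: sum(adj[v].values()) for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if strength[v] == 0)
        new = {}
        for v in nodes:
            # Each neighbour u passes on a share of its rank proportional to w / strength[u].
            incoming = sum(rank[u] * w / strength[u] for u, w in adj[v].items())
            new[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Made-up star graph: A co-appears with B, C and D, who never meet each other.
adj = {"A": {"B": 1, "C": 1, "D": 1}, "B": {"A": 1}, "C": {"A": 1}, "D": {"A": 1}}
ranks = pagerank(adj)
top10 = sorted(ranks, key=ranks.get, reverse=True)[:10]
print(top10, round(ranks["A"], 4))
```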

Insight 4: The most obvious property the network reveals is that the main Simpson family (Homer, Marge, Bart, Lisa) is very well connected and hence central to the show’s universe. This is not surprising, as they are the characters who have been around the longest and form the main thread of most episodes.

Insight 5: Modularity of the Network

The modularity of a network is a measure of how well it can be split into communities. A high modularity value is generally taken to mean that the network has a strong underlying community structure: nodes are densely connected within their communities and sparsely connected between them. These communities can sometimes have significant meaning in the network.

Modularity is a useful way to analyse this network because it can reveal connections that other “importance” measures, such as edge weight and node degree, may have missed.
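Gephi's Modularity statistic both searches for communities (using the Louvain method) and reports the resulting score. The sketch below covers only the scoring half. It computes the modularity Q of a partition you supply for a weighted undirected graph. The two-triangle toy graph and its partition are made up.

```python
def modularity(edges, community):
    """Modularity Q of a partition of a weighted undirected graph.

    edges: dict mapping (u, v) -> weight, each undirected edge listed once.
    community: dict mapping node -> community label.
    Q = sum over communities c of [L_c / m - (d_c / (2m))**2], where L_c is the
    total edge weight inside c, d_c the summed strength of c's nodes and m the
    total edge weight.
    """
    m = sum(edges.values())
    inside = {}
    strength = {}
    for (u, v), w in edges.items():
        cu, cv = community[u], community[v]
        # Each endpoint contributes the edge weight to its community's strength.
        strength[cu] = strength.get(cu, 0) + w
        strength[cv] = strength.get(cv, 0) + w
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + w
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in strength.items())


# Made-up graph: two triangles joined by the single edge C-D.
edges = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 1,
         ("D", "E"): 1, ("E", "F"): 1, ("D", "F"): 1, ("C", "D"): 1}
split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
q = modularity(edges, split)
print(round(q, 4))
```

Splitting along the two triangles scores well above putting every node in one community, which always gives Q = 0.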

As with class 1, this class is quite large, but this time it contains only a few influential nodes. These still correspond to regular characters such as Barney Gumble, Clancy Wiggum and Principal Skinner, but not to the main family. This suggests that this class corresponds to more peripheral characters.

Summary

The network analysis was done using Gephi, an open-source graph visualization tool.

The report was produced as coursework and contains many screenshots of the network from Gephi.

Thanks for reading through this blog post. Any suggestions for further improving it would be cheerfully welcomed.

Source: Medium

Judith Chao Andrade

Passionate about knowledge, about sharing it and about learning from everything around me, I enjoy learning and taking part in activities. I am currently learning programming, but I am fascinated by topics related to special materials, curiosities, humor, events, social networks... I could say my greatest interest is never to lose my curiosity, so if you have a plan in mind, just propose it!
