Six Degrees of Kevin Bacon

A fun application of graph theory comes up in the popular game ``Six Degrees of Kevin Bacon''. The idea of the game is to connect an actor to Kevin Bacon through a trail of actors who appeared together in movies, and do so in less than six steps. For example, Theodore Hesburgh (President Emeritus of the University of Notre Dame) was in the movie ``Rudy'' with the actor Gerry Becker, who was in the movie ``Sleepers'' with Kevin Bacon. Why Kevin Bacon? For some reason the three students who invented the game, Mike Ginelli, Craig Fass, and Brian Turtle decided that Kevin Bacon is the center of the entertainment world. The Kevin Bacon game is quite similar to the Erdös Number which has ``been a part of the folklore of mathematicians throughout the world for many years''.

The ``Six Degrees of Kevin Bacon'' game is really a graph problem. If the user assigns each actor to a vertex, and add an edge between two actors if they have appeared together in a movie, then the user has a graph that represents the data for this game. Then the problem of finding a trail of actors to Kevin Bacon becomes a traditional graph problem: that of finding a path between two vertices. Since we wish to find a path that is shorter than six steps, ideally we would find the shortest path between the vertices. One might think to apply Dijkstra's shortest path algorithm to this problem, but that would be overkill since Dijkstra's algorithm is meant for situations when each edge has an associated length (or weight) and the goal is to find the path with the shortest cumulative length. Since we are only concerned with finding the shortest paths in terms of the number of edges the breadth-first search algorithm will solve the problem (and use less time than Dijkstra's).


Input File and Graph Setup

For this example we will use a much smaller graph than all movies and all actors. The full source code for this example is in examples/kevin-bacon.cpp. We have supplied a file kevin-bacon.dat which contains a list of actor pairs who appeared in the same movie. Each line of the file contains an actor's name, a movie, and another actor that appeared in the movie. The ";'' character is used as a separator. Here is an excerpt.

Patrick Stewart;Prince of Egypt, The (1998);Steve Martin

The first task will be to read in the file and create a graph from it. Use  std::ifstream to input the file.

  std::ifstream datafile("./kevin-bacon.dat");
  if (!datafile) {
    std::cerr << "No ./kevin-bacon.dat file" << std::endl;
    return EXIT_FAILURE;
  }

We will want to attach the actor's names to the vertices in the graph, and the movie names to the edges. We use the property class to specify the addition of these vertex and edge properties to the graph. We choose to use an undirected graph, because the relationship of actors appearing together in a movie is symmetric.

  using namespace boost;
  typedef adjacency_list<vecS, vecS, undirectedS, 
    property<vertex_name_t, string>,
    property<edge_name_t, string > >
  > Graph;

To access the properties, obtain property map objects from the graph. The following code shows how this is done.

  property_map<Graph, vertex_name_t>::type actor_name = get(vertex_name, g);
  property_map<Graph, edge_name_t>::type connecting_movie = get(edge_name, g);

Next we can start to loop through the file, parsing each line into a sequence of tokens using the Boost Tokenizer Library.

  for (std::string line; std::getline(datafile, line);) {
    char_delimiters_separator<char> sep(false, "", ";");
    tokenizer<> line_toks(line, sep);
    ...
  }

Each line of the input corresponds to an edge in the graph, with the two actors as the vertices incident to the edge, and the movie name will be a property attached to the edge. One challenge in creating the graph from this format of input is that the edges are specified by a pair of actor names. As we add vertices to the graph, we'll need to create a map from actor names to their vertices so that later appearances of the same actor (on a different edge) can be linked with the correct vertex in the graph. The std::map can be used to implement this mapping.

  typedef graph_traits<Graph>::vertex_descriptor Vertex;
  typedef std::map<string, Vertex> NameVertexMap;
  NameVertexMap actors;

The first token will be an actor's name. If the actor is not already in our actor's map we will add a vertex to the graph, set the name property of the vertex, and record the vertex descriptor in the map. If the actor is already in the graph, we will retrieve the vertex descriptor from the map's pos iterator.

  NameVertexMap::iterator pos; 
  bool inserted;
  Vertex u, v;

  tokenizer<>::iterator i = line_toks.begin();
  std::string actors_name = *i++;

  tie(pos, inserted) = actors.insert(std::make_pair(actors_name, Vertex()));
  if (inserted) {
    u = add_vertex(g);
    actor_name[u] = actors_name;
    pos->second = u;
  } else
    u = pos->second;

The second token is the name of the movie, and the third token is the other actor. We use the same technique as above to insert the second actor into the graph.

  std::string movie_name = *i++;
      
  tie(pos, inserted) = actors.insert(std::make_pair(*i, Vertex()));
  if (inserted) {
    v = add_vertex(g);
    actor_name[v] = *i;
    pos->second = v;
  } else
    v = pos->second;

The final step is to add an edge connecting the two actors, and record the name of the connecting movie.

  graph_traits<Graph>::edge_descriptor e;
  tie(e, inserted) = add_edge(u, v, g);
  if (inserted)
    connecting_movie[e] = movie_name;


A Custom Visitor Class

We create a custom visitor class to record the Kevin Bacon numbers. The Kevin Bacon number will be the shortest distances from each actor to Kevin Bacon. This distance has also been referred to as the ``Bacon Number'' in memory of the ``Erdos Number'' used to rank mathematicians according to how many publications separate them from Paul Erdos. The shortest distance to from Kevin Bacon to each actor will follow the breadth-first tree. The BFS algorithm identifies edges that belong to the breadth-first tree and calls our visitor's tree_edge function. We record the distance to the target vertex as one plus the distance to the source vertex.

  template <typename DistanceMap>
  class bacon_number_recorder : public default_bfs_visitor
  {
  public:
    bacon_number_recorder(DistanceMap dist) : d(dist) { }

    template <typename Edge, typename Graph>
    void tree_edge(Edge e, const Graph& g) const
    {
      typename graph_traits<Graph>::vertex_descriptor
	u = source(e, g), v = target(e, g);
	d[v] = d[u] + 1;
    }
  private:
      DistanceMap d;
  };
  // Convenience function
  template <typename DistanceMap>
  bacon_number_recorder<DistanceMap>
  record_bacon_number(DistanceMap d)
  {
    return bacon_number_recorder<DistanceMap>(d);
  }

We will use a vector to store the bacon numbers.

  std::vector<int> bacon_number(num_vertices(g));

Apply the Breadth-First Search

The BFS algorithm requires a source vertex as input; of course this will be Kevin Bacon. We now call the BGL BFS algorithm, passing in the graph, source vertex, and the visitor.

  Vertex src = actors["Kevin Bacon"];
  bacon_number[src] = 0;

  breadth_first_search(g, src, visitor(record_bacon_number(&bacon_number[0])));

We can output the Bacon number for each actor simply by looping through all the vertices in the graph and use them to index into the bacon_number vector.

  graph_traits<Graph>::vertex_iterator i, end;
  for (boost::tie(i, end) = vertices(g); i != end; ++i)
    std::cout << actor_name[*i] << "'s bacon number is " 
              << bacon_number[*i] << std::endl;

Note that vertex descriptor objects can not always be used as indices into vectors or arrays such as bacon_number. This is valid with the adjacency_list class with VertexList=vecS, but not with other variations of adjacency_list. A more generic way to index based on vertices is to use the ID property map (vertex_index_t) in coordination with the iterator_property_map.

Here are some excepts from the output of the program.

William Shatner's bacon number is 2
Denise Richards's bacon number is 1
Kevin Bacon's bacon number is 0
Patrick Stewart's bacon number is 2
Steve Martin's bacon number is 1    
....

Copyright © 2000-2001br> Jeremy Siek, Indiana University ([email protected])