## 字幕列表 影片播放

• The following content is provided under a Creative

• Your support will help MIT OpenCourseWare

• To make a donation or to view additional materials

• from hundreds of MIT courses, visit MIT OpenCourseWare

• at ocw.mit.edu.

• JULIAN SHUN: Hi, good afternoon, everyone.

• So today, we're going to be talking

• And as a reminder, on Thursday, we're

• going to have a guest lecture by Professor Johnson of the MIT

• Math Department.

• And he'll be talking about performance

• of high-level languages.

• So please be sure to attend the guest lecture on Thursday.

• So here's an outline of what I'm going

• to be talking about today.

• So we're first going to remind ourselves what a graph is.

• And then we're going to talk about various ways

• to represent a graph in memory.

• And then we'll talk about how to implement

• an efficient breadth-first search algorithm, both serially

• and also in parallel.

• And then I'll talk about how to use graph compression and graph

• reordering to improve the locality of graph algorithms.

• So first of all, what is a graph?

• So a graph contains vertices and edges,

• where vertices represent certain objects of interest,

• and edges between objects model relationships between the two

• objects.

• For example, you can have a social network,

• where the people are represented as vertices

• and edges between people mean that they're

• friends with each other.

• The edges in this graph don't have to be bi-directional.

• So you could have a one-way relationship.

• For example, if you're looking at the Twitter network,

• Alice could follow Bob, but Bob doesn't necessarily

• have to follow Alice back.

• The graph also doesn't have to be connected.

• So here, this graph here is connected.

• But, for example, there could be some people

• who don't like to talk to other people.

• And then they're just off in their own component.

• You can also use graphs to model protein networks, where

• the vertices are proteins, and edges between vertices

• means that there's some sort of interaction

• between the proteins.

• So this is useful in computational biology.

• As I said, edges can be directed,

• so their relationship can go one way or both ways.

• In this graph here, we have some directed edges and then also

• some edges that are directed in both directions.

• So here, John follows Alice.

• Alice follows Peter.

• And then Alice follows Bob, and Bob also follows Alice.

• If you use a graph to represent the world wide web,

• then the vertices would be websites,

• and then the edges would denote that there is a hyperlink

• from one website to another.

• And again, the edges here don't have to be bi-directional

• because website A could have a link to website B.

• But website B doesn't necessarily

• have to have a link back.

• Edges can also be weighted.

• So you can have a weight on the edge that

• denotes the strength of the relationship

• or some sort of distance measure corresponding

• to that relationship.

• So here, I have an example where I am using a graph

• to represent cities.

• And the edges between cities have

• a weight that corresponds to the distance between the two

• cities.

• And if I want to find the quickest way to get from city A

• to city B, then I would be interested in finding

• the shortest path from A to B in this graph here.

• Here's another example, where the edge weights now

• are the costs of a direct flight from city A to city B.

• And here the edges are directed.

• So, for example, this says that there's

• a flight from San Francisco to LA for \$45.

• And if I want to find the cheapest

• way to get from one city to another city,

• then, again, I would try to find the shortest path in this graph

• from city A to city B.

• Vertices and edges can also have metadata on them,

• and they can also have types.

• So, for example, here's the Google Knowledge Graph,

• which represents all the knowledge on the internet

• And here, the nodes have metadata on them.

• So, for example, the node corresponding to da Vinci

• is labeled with his date of birth and date of death.

• And the vertices also have a color

• corresponding to the type of knowledge that they refer to.

• So you can see that some of these nodes are blue,

• some of them are red, some of them are green,

• and some of them have other things on them.

• So in general, graphs can have types and metadata

• on both the vertices as well as the edges.

• Let's look at some more applications of graphs.

• So graphs are very useful for implementing queries

• on social networks.

• So here are some examples of queries

• that you might want to ask on a social network.

• So, for example, you might be interested in finding

• all of your friends who went to the same high school as you

• So that can be implemented using a graph algorithm.

• You might also be interested in finding

• all of the common friends you have with somebody else--

• again, a graph algorithm.

• And a social network service might run a graph algorithm

• to recommend people that you might know and want

• to become friends with.

• And they might use a graph algorithm

• to recommend certain products that you

• might be interested in.

• So these are all examples of social network queries.

• And there are many other queries that you

• might be interested in running on a social network.

• And many of them can be implemented

• using graph algorithms.

• Another important application is clustering.

• So here, the goal is to find groups of vertices

• in a graph that are well-connected

• internally and poorly-connected externally.

• So in this image here, each blob of vertices of the same color

• corresponds to a cluster.

• And you can see that inside a cluster,

• there are a lot of edges going among the vertices.

• And between clusters, there are relatively fewer edges.

• And some applications of clustering

• include community detection and social networks.

• So here, you might be interested in finding

• groups of people with similar interests or hobbies.

• You can also use clustering to detect fraudulent websites

• on the internet.

• You can use it for clustering documents.

• So you would cluster documents that

• have similar text together.

• And clustering is often used for unsupervised learning

• and machine learning applications.

• Another application is connectomics.

• So connectomics is the study of the structure, the network

• structure of the brain.

• And here, the vertices correspond to neurons.

• And edges between two vertices means

• that there's some sort of interaction between the two

• neurons.

• And recently, there's been a lot of work

• on trying to do high-performance connectomics.

• And some of this work has been going on here at MIT

• by Professor Charles Leiserson and Professor Nir Shavit's

• research group.

• So recently, this has been a very hot area.

• Graphs are also used in computer vision--

• for example, in image segmentation.

• So here, you want to segment your image

• into the distinct objects that appear in the image.

• And you can construct a graph by representing the pixels

• as vertices.

• And then you would place an edge between every pair

• of neighboring pixels with a weight that corresponds

• to their similarity.

• And then you would run some sort of minimum cost cut algorithm

• to partition your graph into the different objects that

• appear in the image.

• So there are many other applications.

• And I'm not going to have time to go through all of them

• today.

• But here's just a flavor of some of the applications of graphs.

• So any questions so far?

• OK, so next, let's look at how we can

• represent a graph in memory.

• So for the rest of this lecture, I'm

• going to assume that my vertices are labeled in the range from 0

• to n minus 1.

• So they have an integer in this range.

• Sometimes, your graph might be given to you

• where the vertices are already labeled in this range,

• sometimes, not.

• But you can always get these labels

• by mapping each of the identifiers

• to a unique integer in this range.

• So for the rest of the lecture, I'm

• just going to assume that we have these labels from 0

• to n minus 1 for the vertices.

• One way to represent a graph is to use an adjacency matrix.

• So this is going to be n by n matrix.

• And there's a 1 bit in i-th row in j-th column

• if there's an edge that goes from vertex I to vertex J,

• and 0 otherwise.

• Another way to represent a graph is the edgeless representation,

• where we just store a list of the edges that

• appear in the graph.

• So we have one pair for each edge,

• where the pair contains the two coordinates of that edge.

• So what is the space requirement for each

• of these two representations in terms of the number of edges m

• and the number of vertices n in the graph?

• So it should be pretty easy.

• Yes.

• AUDIENCE: n squared for the [INAUDIBLE]

• and m for the [INAUDIBLE].

• JULIAN SHUN: Yes, so the space for the adjacency matrix

• is order n squared because you have n

• squared cells in this matrix.

• And you have 1 bit for each of the cells.

• For the edge list, it's going to be order m

• because you have m edges.

• And for each edge, you're storing a constant amount

• of data in the edge list.

• So here's another way to represent a graph.

• This is known as the adjacency list format.

• And idea here is that we're going

• to have an array of pointers, 1 per vertex.

• And each pointer points to a linked list storing

• the edges for that vertex.

• And the linked list is unordered in this example.

• So what's the space requirement of this representation?

• AUDIENCE: It's n plus m.

• JULIAN SHUN: Yeah, so it's going to be order n plus m.

• And this is because we have n pointers.

• And the number of entries across all of the linked lists

• is just equal to the number of edges in the graph, which is m.

• What's one potential issue with this sort of representation

• if you think in terms of cache performance?

• Does anyone see a potential performance issue here?

• Yeah.

• AUDIENCE: So it could be [INAUDIBLE]..

• JULIAN SHUN: Right.

• So the issue here is that if you're

• trying to loop over all of the neighbors of a vertex,

• you're going to have to dereference the pointer

• in every linked list node.

• Because these are not contiguous in memory.

• And every time you dereference linked lists node,

• that's going to be a random access into memory.

• So that can be bad for cache performance.

• One way you can improve cache performance

• is instead of using linked lists for each of these neighbor

• lists, you can use an array.

• So now you can store the neighbors just in this array,

• and they'll be contiguous in memory.

• One drawback of this approach is that it

• becomes more expensive if you're trying to update the graph.

• And we'll talk more about that later.

• So any questions so far?

• So what's another way to represent the graph that we've

• seen in a previous lecture?

• What's a more compressed or compact way

• to represent a graph, especially a sparse graph?

• So does anybody