Placeholder Image

字幕列表 影片播放

  • The following content is provided under a Creative

  • Commons license.

  • Your support will help MIT OpenCourseWare

  • continue to offer high-quality educational resources for free.

  • To make a donation or to view additional materials

  • from hundreds of MIT courses, visit MIT OpenCourseWare

  • at ocw.mit.edu.

  • JULIAN SHUN: Hi, good afternoon, everyone.

  • So today, we're going to be talking

  • about graph optimizations.

  • And as a reminder, on Thursday, we're

  • going to have a guest lecture by Professor Johnson of the MIT

  • Math Department.

  • And he'll be talking about performance

  • of high-level languages.

  • So please be sure to attend the guest lecture on Thursday.

  • So here's an outline of what I'm going

  • to be talking about today.

  • So we're first going to remind ourselves what a graph is.

  • And then we're going to talk about various ways

  • to represent a graph in memory.

  • And then we'll talk about how to implement

  • an efficient breadth-first search algorithm, both serially

  • and also in parallel.

  • And then I'll talk about how to use graph compression and graph

  • reordering to improve the locality of graph algorithms.

  • So first of all, what is a graph?

  • So a graph contains vertices and edges,

  • where vertices represent certain objects of interest,

  • and edges between objects model relationships between the two

  • objects.

  • For example, you can have a social network,

  • where the people are represented as vertices

  • and edges between people mean that they're

  • friends with each other.

  • The edges in this graph don't have to be bi-directional.

  • So you could have a one-way relationship.

  • For example, if you're looking at the Twitter network,

  • Alice could follow Bob, but Bob doesn't necessarily

  • have to follow Alice back.

  • The graph also doesn't have to be connected.

  • So here, this graph here is connected.

  • But, for example, there could be some people

  • who don't like to talk to other people.

  • And then they're just off in their own component.

  • You can also use graphs to model protein networks, where

  • the vertices are proteins, and edges between vertices

  • means that there's some sort of interaction

  • between the proteins.

  • So this is useful in computational biology.

  • As I said, edges can be directed,

  • so their relationship can go one way or both ways.

  • In this graph here, we have some directed edges and then also

  • some edges that are directed in both directions.

  • So here, John follows Alice.

  • Alice follows Peter.

  • And then Alice follows Bob, and Bob also follows Alice.

  • If you use a graph to represent the world wide web,

  • then the vertices would be websites,

  • and then the edges would denote that there is a hyperlink

  • from one website to another.

  • And again, the edges here don't have to be bi-directional

  • because website A could have a link to website B.

  • But website B doesn't necessarily

  • have to have a link back.

  • Edges can also be weighted.

  • So you can have a weight on the edge that

  • denotes the strength of the relationship

  • or some sort of distance measure corresponding

  • to that relationship.

  • So here, I have an example where I am using a graph

  • to represent cities.

  • And the edges between cities have

  • a weight that corresponds to the distance between the two

  • cities.

  • And if I want to find the quickest way to get from city A

  • to city B, then I would be interested in finding

  • the shortest path from A to B in this graph here.

  • Here's another example, where the edge weights now

  • are the costs of a direct flight from city A to city B.

  • And here the edges are directed.

  • So, for example, this says that there's

  • a flight from San Francisco to LA for $45.

  • And if I want to find the cheapest

  • way to get from one city to another city,

  • then, again, I would try to find the shortest path in this graph

  • from city A to city B.

  • Vertices and edges can also have metadata on them,

  • and they can also have types.

  • So, for example, here's the Google Knowledge Graph,

  • which represents all the knowledge on the internet

  • that Google knows about.

  • And here, the nodes have metadata on them.

  • So, for example, the node corresponding to da Vinci

  • is labeled with his date of birth and date of death.

  • And the vertices also have a color

  • corresponding to the type of knowledge that they refer to.

  • So you can see that some of these nodes are blue,

  • some of them are red, some of them are green,

  • and some of them have other things on them.

  • So in general, graphs can have types and metadata

  • on both the vertices as well as the edges.

  • Let's look at some more applications of graphs.

  • So graphs are very useful for implementing queries

  • on social networks.

  • So here are some examples of queries

  • that you might want to ask on a social network.

  • So, for example, you might be interested in finding

  • all of your friends who went to the same high school as you

  • on Facebook.

  • So that can be implemented using a graph algorithm.

  • You might also be interested in finding

  • all of the common friends you have with somebody else--

  • again, a graph algorithm.

  • And a social network service might run a graph algorithm

  • to recommend people that you might know and want

  • to become friends with.

  • And they might use a graph algorithm

  • to recommend certain products that you

  • might be interested in.

  • So these are all examples of social network queries.

  • And there are many other queries that you

  • might be interested in running on a social network.

  • And many of them can be implemented

  • using graph algorithms.

  • Another important application is clustering.

  • So here, the goal is to find groups of vertices

  • in a graph that are well-connected

  • internally and poorly-connected externally.

  • So in this image here, each blob of vertices of the same color

  • corresponds to a cluster.

  • And you can see that inside a cluster,

  • there are a lot of edges going among the vertices.

  • And between clusters, there are relatively fewer edges.

  • And some applications of clustering

  • include community detection and social networks.

  • So here, you might be interested in finding

  • groups of people with similar interests or hobbies.

  • You can also use clustering to detect fraudulent websites

  • on the internet.

  • You can use it for clustering documents.

  • So you would cluster documents that

  • have similar text together.

  • And clustering is often used for unsupervised learning

  • and machine learning applications.

  • Another application is connectomics.

  • So connectomics is the study of the structure, the network

  • structure of the brain.

  • And here, the vertices correspond to neurons.

  • And edges between two vertices means

  • that there's some sort of interaction between the two

  • neurons.

  • And recently, there's been a lot of work

  • on trying to do high-performance connectomics.

  • And some of this work has been going on here at MIT

  • by Professor Charles Leiserson and Professor Nir Shavit's

  • research group.

  • So recently, this has been a very hot area.

  • Graphs are also used in computer vision--

  • for example, in image segmentation.

  • So here, you want to segment your image

  • into the distinct objects that appear in the image.

  • And you can construct a graph by representing the pixels

  • as vertices.

  • And then you would place an edge between every pair

  • of neighboring pixels with a weight that corresponds

  • to their similarity.

  • And then you would run some sort of minimum cost cut algorithm

  • to partition your graph into the different objects that

  • appear in the image.

  • So there are many other applications.

  • And I'm not going to have time to go through all of them

  • today.

  • But here's just a flavor of some of the applications of graphs.

  • So any questions so far?

  • OK, so next, let's look at how we can

  • represent a graph in memory.

  • So for the rest of this lecture, I'm

  • going to assume that my vertices are labeled in the range from 0

  • to n minus 1.

  • So they have an integer in this range.

  • Sometimes, your graph might be given to you

  • where the vertices are already labeled in this range,

  • sometimes, not.

  • But you can always get these labels

  • by mapping each of the identifiers

  • to a unique integer in this range.

  • So for the rest of the lecture, I'm

  • just going to assume that we have these labels from 0

  • to n minus 1 for the vertices.

  • One way to represent a graph is to use an adjacency matrix.

  • So this is going to be n by n matrix.

  • And there's a 1 bit in i-th row in j-th column

  • if there's an edge that goes from vertex I to vertex J,

  • and 0 otherwise.

  • Another way to represent a graph is the edgeless representation,

  • where we just store a list of the edges that

  • appear in the graph.

  • So we have one pair for each edge,

  • where the pair contains the two coordinates of that edge.

  • So what is the space requirement for each

  • of these two representations in terms of the number of edges m

  • and the number of vertices n in the graph?

  • So it should be pretty easy.

  • Yes.

  • AUDIENCE: n squared for the [INAUDIBLE]

  • and m for the [INAUDIBLE].

  • JULIAN SHUN: Yes, so the space for the adjacency matrix

  • is order n squared because you have n

  • squared cells in this matrix.

  • And you have 1 bit for each of the cells.

  • For the edge list, it's going to be order m

  • because you have m edges.

  • And for each edge, you're storing a constant amount

  • of data in the edge list.

  • So here's another way to represent a graph.

  • This is known as the adjacency list format.

  • And idea here is that we're going

  • to have an array of pointers, 1 per vertex.

  • And each pointer points to a linked list storing

  • the edges for that vertex.

  • And the linked list is unordered in this example.

  • So what's the space requirement of this representation?

  • AUDIENCE: It's n plus m.

  • JULIAN SHUN: Yeah, so it's going to be order n plus m.

  • And this is because we have n pointers.

  • And the number of entries across all of the linked lists

  • is just equal to the number of edges in the graph, which is m.

  • What's one potential issue with this sort of representation

  • if you think in terms of cache performance?

  • Does anyone see a potential performance issue here?

  • Yeah.

  • AUDIENCE: So it could be [INAUDIBLE]..

  • JULIAN SHUN: Right.

  • So the issue here is that if you're

  • trying to loop over all of the neighbors of a vertex,

  • you're going to have to dereference the pointer

  • in every linked list node.

  • Because these are not contiguous in memory.

  • And every time you dereference linked lists node,

  • that's going to be a random access into memory.

  • So that can be bad for cache performance.

  • One way you can improve cache performance

  • is instead of using linked lists for each of these neighbor

  • lists, you can use an array.

  • So now you can store the neighbors just in this array,

  • and they'll be contiguous in memory.

  • One drawback of this approach is that it

  • becomes more expensive if you're trying to update the graph.

  • And we'll talk more about that later.

  • So any questions so far?

  • So what's another way to represent the graph that we've

  • seen in a previous lecture?

  • What's a more compressed or compact way

  • to represent a graph, especially a sparse graph?

  • So does anybody