So let me quickly summarize what we have learned so far. If we want to apply spectral graph partitioning to find two clusters in a given graph, there are three steps we have to go through. In the first step, we do what is called pre-processing, where we construct a matrix representation of the graph. In the second step, we compute the eigenvalue decomposition of this matrix, identifying its eigenvalues and eigenvectors. In particular, we are interested in the second smallest eigenvalue lambda 2 and the corresponding eigenvector x. And then, once we have the vector x, all we have to do is the grouping, where we look at the components of x and determine which nodes belong to set A and which nodes belong to set B, in other words, who belongs to the left partition and who belongs to the right partition.

So let me now give you an example. In the pre-processing step, I take our graph G and compute the Laplacian matrix L. In the second step, we do the eigenvalue decomposition, where we take L, find its set of eigenvalues and eigenvectors, and pick out the second smallest eigenvalue and the corresponding eigenvector. So here are the nodes 1 to 6, and these are the corresponding entries of the eigenvector. Now what we have to do is group these nodes into clusters according to the embedding, the values that the nodes receive in this eigenvector.

So how do we do this grouping? It is very simple: we sort the components of the vector x and identify clusters by splitting the vector in two. A naive approach in this case is to ask which of the values are negative and which are positive. So if this is our vector x, we would split it between nodes 3 and 4: these are all the nodes on the right-hand side of 0, and these are all the nodes on the left-hand side of 0. Notice that the sum of the node values equals 0, exactly as we said it should be. Also notice that the sum of the squares of the values equals 1, which is, again, exactly as it should be. So by splitting this vector into the positive part and the negative part, we have identified the two clusters: cluster A, composed of nodes 1 to 3, and cluster B, composed of nodes 4, 5, and 6. Basically, the components of the second smallest eigenvector embed the nodes on a line, assigning them positive and negative values. We take all the nodes with positive values and put them into cluster A, and all the nodes with negative values and put them into cluster B.
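To make the three steps concrete, here is a minimal sketch in Python with numpy. The 6-node edge list, two triangles joined by an edge, is only my stand-in for a graph like the lecture's example, not the actual one; note also that the overall sign of an eigenvector is arbitrary, so the labels A and B may come out swapped.

```python
import numpy as np

# Hypothetical 6-node example: two triangles {1,2,3} and {4,5,6}
# joined by the edge (3,4). Swap in your own edge list.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6

# Step 1: pre-processing -- build the graph Laplacian L = D - A.
A = np.zeros((n, n))
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Step 2: decomposition -- eigh returns eigenvalues in ascending order,
# so column 1 holds the eigenvector of the second smallest eigenvalue
# (the Fiedler vector). Its entries sum to 0 and their squares sum to 1,
# exactly the two properties noted above.
eigenvalues, eigenvectors = np.linalg.eigh(L)
lambda2, x2 = eigenvalues[1], eigenvectors[:, 1]

# Step 3: grouping -- split the nodes by the sign of their entry in x2.
cluster_A = [i + 1 for i in range(n) if x2[i] >= 0]
cluster_B = [i + 1 for i in range(n) if x2[i] < 0]
print("lambda2 =", lambda2)
print("A =", cluster_A, " B =", cluster_B)
```

Running this on the edge list above should put nodes 1, 2, 3 in one cluster and 4, 5, 6 in the other, possibly with the names swapped, since numpy is free to flip the eigenvector's sign.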
Now, to give you a more interesting example, here I have a graph G on the left which, as you can see, has two clusters. I computed the graph Laplacian, I computed lambda 2 and the corresponding eigenvector x2, and all I did here is sort the entries of x2, the eigenvector corresponding to the second smallest eigenvalue, by value. And what you see very nicely is that there is a set of components with negative values and a set of components with positive values. So the entries of the eigenvector with positive values correspond to one cluster, and the remaining ones correspond to the second cluster.

Of course, you can now start asking: is there anything interesting about the eigenvectors that correspond to, let's say, the third smallest eigenvalue, the fourth smallest eigenvalue, and so on? Just to show you what happens, the first case I'll show you is a graph that actually contains four small clusters. If the graph contains four small clusters and I still compute lambda 2 and the corresponding eigenvector x2, what I show you here are the components of this eigenvector, and you see how the entries are first split into two clusters, above zero and below zero. But then you also find these different steps within them. So first we have the two clusters, and then each of the two clusters has two smaller clusters embedded in it. And we see this very nicely from the structure of the components of the second eigenvector.

So the question is: if I have multiple clusters, how would I go about identifying them? Let me show you some examples of how to do this. For example, if this is my graph G, the same as I had before, I have my Laplacian matrix L on which I do the eigendecomposition. Here are the components of vector x1, and they all have exactly the same value, exactly as we said and as we have proven. But the components of the third smallest eigenvector, for example, are the following. And what you see here is that this is the bottom cluster, this is the second cluster, this would be the third cluster, and these components correspond to the fourth cluster.

So what does this mean? It gives us an idea of how to partition a graph not into two clusters but into k of them. There are two possible approaches. One approach is recursive bipartitioning: we take the full graph, split it in two, and then for each of the two pieces we again try to split it in two, applying this recursively until we get smaller and smaller pieces (a rough sketch of this follows below). The other idea is to cluster using multiple eigenvalues and multiple eigenvectors. The idea is that we take our Laplacian matrix, compute the eigenvalue decomposition, and take the second eigenvector, the third eigenvector, the fourth eigenvector, and so on, which means that every node of the graph is now described by a small vector of values. So, from the Laplacian matrix, for every node i we come up with its coordinates, in some sense: coordinate i of the second smallest eigenvector, coordinate i of the third smallest eigenvector, and so on and so forth. And now that every node is described by a set of coordinates, we can run k-means, which we have already discussed, to identify the k clusters. And it turns out that in practice this method works really well.
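Here is a rough sketch of the first approach, recursive bipartitioning, under the same numpy conventions as before. How to divide the remaining cluster budget between the two halves, and when to stop, are my own simple choices; practical implementations often stop based on cut quality instead.

```python
import numpy as np

def fiedler_split(nodes, adjacency):
    """Bipartition `nodes` by the sign of the Fiedler vector of their subgraph."""
    sub = adjacency[np.ix_(nodes, nodes)]
    L = np.diag(sub.sum(axis=1)) - sub
    x2 = np.linalg.eigh(L)[1][:, 1]          # second smallest eigenvector
    part_a = [nodes[i] for i in range(len(nodes)) if x2[i] >= 0]
    part_b = [nodes[i] for i in range(len(nodes)) if x2[i] < 0]
    return part_a, part_b

def recursive_bipartition(nodes, adjacency, k):
    """Split `nodes` into (roughly) k clusters by repeated bipartitioning."""
    if k == 1 or len(nodes) < 2:
        return [nodes]
    part_a, part_b = fiedler_split(nodes, adjacency)
    if not part_a or not part_b:             # degenerate split: stop here
        return [nodes]
    # Give each half roughly half of the remaining cluster budget.
    return (recursive_bipartition(part_a, adjacency, k // 2) +
            recursive_bipartition(part_b, adjacency, k - k // 2))
```

On a graph like the four-cluster example above, one would hope that recursive_bipartition(list(range(n)), A, 4) first cuts the graph into its two big halves and the recursive cuts then recover the four small clusters, mirroring the nested two-then-four structure visible in the second eigenvector.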
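And here is a sketch of the second approach: embed node i with the coordinates taken from the eigenvectors of the second through k-th smallest eigenvalues, then cluster those coordinates. I use scikit-learn's KMeans for the k-means step purely for convenience; any k-means implementation would do, and the function name spectral_kway is just mine.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_kway(adjacency, k):
    """Partition a graph into k clusters via its spectral embedding."""
    # Pre-processing: graph Laplacian L = D - A.
    L = np.diag(adjacency.sum(axis=1)) - adjacency
    # Decomposition: columns 1..k-1 of eigh's eigenvector matrix are the
    # eigenvectors of the 2nd through k-th smallest eigenvalues; row i is
    # node i's coordinate vector, the per-node embedding described above.
    embedding = np.linalg.eigh(L)[1][:, 1:k]
    # Grouping: run k-means on the embedded nodes; entry i of the result
    # is the cluster label of node i.
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```

The embedding skips the first eigenvector because, as shown earlier, all of its components are identical and so carry no information about the clusters.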
So with this, we have finished the discussion of spectral clustering: given a graph, how to find the individual clusters of nodes in it.