5 6 Dimensionality Reduction Introduction 12 01

  • So we are starting with a new topic.

  • The topic we will discuss today is called dimensionality reduction.

  • The idea here is basically that we will learn about techniques that will later become very handy when we talk about recommender systems, and in particular latent factor recommender systems.

  • So let me give you an idea of what the problem of dimensionality reduction is all about.

  • Our basic assumption is that we have a set of data points. Think of them as points in a plane or points in a three-dimensional space.

  • The idea is that these points are not just randomly scattered through the space, but lie in a subspace of it.

  • So for example, here I have two cases of this.

  • You could imagine that you have a set of data in a two-dimensional plane, but the data is not randomly scattered across this plane; it is only scattered across a small subspace of it.

  • So for example, in the first case, we have data points that are embedded on this particular line, so maybe a better representation of this data is not in this two-dimensional space, but simply where along the length of the line a given data point lies.

  • Or, for example, in the second case, we are drawing a case where we have points embedded in a three-dimensional space, but again, these points are not randomly scattered through space; they all lie on a single plane that is embedded in this space.

  • So basically the idea for us is: can we go and discover such a data representation?

  • If I give you a new set of data, can we identify the main axes along which the original data is represented or embedded?

  • In particular, in this second case, we have these two axes along which all the data lies.

  • So our goal, in some sense, will be to find a subspace that effectively represents all the data that we are given.

  • So, let me just give you a concrete example.

  • Our goal, in a sense, is to compress the data, or reduce the dimensionality or the size of the data representation.

  • The way we can think of this is that we are given a big table with a large number of rows, let's say millions of rows, and also a large number of columns.

  • What we can think of with this kind of table is that every row represents a different data point, and every column represents a different coordinate, or a different dimension.

  • Our goal is to take this set of data and identify a more compact, lower-dimensional representation.

  • In a sense, we would like to keep all the rows, but shrink the number of columns, while still preserving the richness of the data set.

  • So, for example, let's look at the table that I have here. I have a table where every row is a different customer and every column is a different day, and every entry stores how many transactions a particular customer made, or how many products they bought, on that day.

  • And what we see in this particular case is that even though we have five different days, so five different columns, our data is not really five-dimensional in some sense, but only two-dimensional.

  • What do I mean by this? For example, the first four rows, on the first three columns, are basically all multiples of one another, right?

  • We have one set of customers that all buy products on the first three days and do nothing on the last two, and then another set of customers that make transactions over the weekend and don't do anything during the week.

  • Right? So, in some sense, rather than representing every customer with a set of five values, I can simply represent this data with a set of two coordinate vectors, plus, for every customer, a value telling me which dimension, or which cluster, it belongs to.

  • So for example, this matrix that I showed you is really two-dimensional, where every row is simply a multiple of one of the two vectors of 1s and 0s.

  • So basically the idea for us will be: can we identify this kind of low-dimensional representation of the data?
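  • To make this concrete, here is a minimal Python sketch of the idea, assuming a made-up table with the same structure (the exact numbers are illustrative, not the ones from the slide):

```python
import numpy as np

# Hypothetical customer-by-day table: rows are customers, columns are days.
# One group of customers buys only on the first three days, the other only
# on the last two, so every row is a multiple of one of two 0/1 patterns.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [3, 3, 3, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
], dtype=float)

print(np.linalg.matrix_rank(A))  # 2: the table is really two-dimensional

# Store the same information as two basis rows plus one coefficient per
# customer (the group label is implicit in which coefficient is nonzero).
B = np.array([[1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1]], dtype=float)
coeff = np.array([[1, 0], [2, 0], [3, 0], [5, 0],
                  [0, 2], [0, 3]], dtype=float)
print(np.allclose(coeff @ B, A))  # True: the 6x5 table = (6x2) @ (2x5)
```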

  • So let me explain a concept that will be very important for us in thinking about this.

  • We are thinking that our data comes in the form of a matrix, right? We can think of the matrix as every row giving us the coordinates of a point in some d-dimensional space.

  • So we have some number of data points, and we have some number of columns, which corresponds to the dimensionality of the data.

  • And now the question is, what is the real, intrinsic dimensionality of that data set?

  • And the concept we need to explain is the concept of the rank of a matrix. We will say that the rank of a matrix A is simply the number of linearly independent columns of A.

  • So let me give you an example. Here is a matrix A that has three rows and three columns, and the rank of this matrix equals 2.

  • Why is the rank of this matrix equal to 2? Because it has 2 linearly independent rows in this case.

  • What we notice, for example, is that row number 3 is simply the sum of rows one and two. So the third row of this matrix can be represented as a linear combination of rows one and two.

  • So in this case our matrix is really two-dimensional. Even though I have data in three dimensions, with three columns, this matrix is really two-dimensional.
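  • As a quick numerical check, here is an illustrative matrix with the same property (I am making up the entries; the point is only that the third row is the sum of the first two):

```python
import numpy as np

# The third row is the sum of the first two, so only two rows are
# linearly independent and the rank is 2, not 3.
A = np.array([[1, 2, 1],
              [0, 1, 2],
              [1, 3, 3]], dtype=float)

print(np.linalg.matrix_rank(A))        # 2
print(np.allclose(A[2], A[0] + A[1]))  # True: row 3 = row 1 + row 2
```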

  • So how can we think about this? I can basically think that there are really two basis vectors, or two coordinate vectors, in my space: the first one corresponds to the first row, the second one corresponds to the second row. And what I can do now is represent every data point as a linear combination of these two vectors.

  • So for example, the first row can simply be represented as the vector [1, 0], which means that I take one of the first vector and zero of the second vector.

  • The second row of my matrix can be represented as the vector [0, 1], because I'm only taking the second of my two basis vectors.

  • And the last row, which is the sum of rows one and two, can simply be represented as the vector [1, 1].
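  • Continuing the illustrative matrix from above, we can compute those coefficient vectors explicitly: using the first two rows as the basis, every row is recovered from just two numbers.

```python
import numpy as np

A = np.array([[1, 2, 1],
              [0, 1, 2],
              [1, 3, 3]], dtype=float)
basis = A[:2]  # the two basis vectors spanning the data

# Coefficients of each row in the new basis, via least squares
# (exact here, since every row lies in the span of the basis).
coeffs, *_ = np.linalg.lstsq(basis.T, A.T, rcond=None)
print(np.round(coeffs.T))                # [[1, 0], [0, 1], [1, 1]]
print(np.allclose(coeffs.T @ basis, A))  # True: rows reconstructed exactly
```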

  • So why is this intuition interesting? This intuition is important because I can now think of data as being points in a high-dimensional space.

  • I can think of the data being represented as a matrix where, as I mentioned before, every data point is a row in this matrix, and every column is a separate dimension.

  • And what I can do now is think of this as doing dimensionality reduction, right?

  • So for example, if I'm given the matrix on the top, I can basically take the coordinates of these points and rewrite them using only two coordinates instead of three.

  • If I use my original coordinate space, I have axis-aligned vectors that describe the coordinates of my space: [1, 0, 0], [0, 1, 0], and [0, 0, 1], that is, the x, y, and z coordinates. In this coordinate system, every data point simply corresponds to a row of my matrix.

  • But what I can also do is come and invent a new coordinate system. Imagine I invent a second one, where I only have two vectors.

  • So basically, I want to represent every data point with two coordinates, and what this means is that I want to represent every data point as a linear combination of the two vectors.

  • And as I mentioned before, in this new coordinate space I can represent the coordinates of every point using only two values, and I can still reconstruct the original coordinate values.

  • So what does this mean? In some sense, we reduced the dimensionality, or we compressed the data, in the sense that I now need a smaller number of coordinates to describe the location of every point. And this is what the role of dimensionality reduction is.

  • So, really, the way we can think of dimensionality reduction is that we have a set of data points embedded in some high-dimensional space, but the data only spans a small-dimensional part of it.

  • As in this case: I have a set of points that I'm given in two-dimensional space, but in reality these points simply fall on a line. I would like to discover that these points are embedded in a small subspace, and I would like to represent, or compress, the coordinates of every point in this small coordinate subspace.

  • And what is important here, for example in this particular case, is that I can now think of representing the coordinates of every point using two different dimensions: the position along the red line, and a coordinate that tells me how far away from the red line a given data point is.

  • And what is interesting now is that instead of representing each point with two coordinates, I could represent it using only one coordinate.

  • Meaning, I would forget about how far from the red line a point is, and only care about the location on the red line onto which the point is projected.

  • This way I would be able to represent every point with a single number, basically its position along the red line, and I would incur a bit of an error.
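  • Here is a minimal sketch of that projection in Python, using synthetic points near a line through the origin (the line, the noise level, and the use of an SVD to fit the direction are my assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D points that lie almost on a line through the origin.
t = rng.uniform(-3, 3, size=100)                 # true position on the line
direction = np.array([2.0, 1.0]) / np.sqrt(5.0)  # the "red line"
X = np.outer(t, direction) + 0.05 * rng.normal(size=(100, 2))

# Best-fit direction: the top right-singular vector of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v = Vt[0]

pos = X @ v                     # ONE number per point: position on the line
X_hat = np.outer(pos, v)        # project back into 2-D
err = np.sum((X - X_hat) ** 2)  # the small error incurred by dropping
print(pos.shape, err)           # the off-line coordinate
```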

  • Right, so what we will be doing is, in some sense, trying to use as small a representation of our data as possible, so as few columns as possible, while also incurring as little error as possible. That is the game we will be playing: a smaller data representation versus as little error as possible.

  • So how will we do this, and why would we want to do it? Why would I want to do dimensionality reduction?

  • The first reason is that I may want, for example, to discover hidden correlations in my data. Sometimes I would like to discover the latent dimensions along which the data varies.

  • This is particularly useful if I think of my data points as documents. I can take every document and represent it as a very long vector, where this vector has only values zero and one: zero means the k-th word does not appear in the document, and one means the word appears in the document.

  • And my goal, for example, would be to identify the axes along which the documents are spread in this space of all possible words. What we would find is that documents basically align themselves along different axes that correspond to topics, like sports, politics, technology, and so on.
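  • As a toy sketch of this (the documents and vocabulary are invented), an SVD of a small binary document-word matrix recovers exactly such topic axes:

```python
import numpy as np

# Toy binary document-word matrix.
# Columns: [game, team, score, election, vote, senate]
D = np.array([
    [1, 1, 1, 0, 0, 0],  # sports document
    [1, 1, 0, 0, 0, 0],  # sports document
    [0, 0, 0, 1, 1, 1],  # politics document
    [0, 0, 0, 1, 0, 1],  # politics document
], dtype=float)

# The top singular vectors act as topic axes: each document is now
# described by two coordinates instead of six word indicators.
U, s, Vt = np.linalg.svd(D, full_matrices=False)
doc_coords = U[:, :2] * s[:2]
print(np.round(doc_coords, 2))  # sports docs load on one axis,
                                # politics docs on the other
```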

  • Another useful thing is that many times we can take a large data set and represent it as a much smaller data set, in the sense that we are able to remove, or get rid of, noisy features or noisy columns, because along them our data is not varying too much.

  • So we can get rid of that part of the data while still preserving most of it. The idea, in some sense, is to remove noise from the data, and to remove noisy and redundant features, or noisy and redundant columns.

  • Another reason why we may want to do this is that we want, for example, to be able to interpret or visualize the data.

  • What this means is that we can have very high-dimensional data and reduce its dimensionality to maybe just two or three dimensions. And plotting two or three dimensions is very easy, right? We can simply plot it on the screen. So that's another case.

  • And, of course, one important application is that many times we want to reduce the dimensionality of the data so that the data size also shrinks, which means it's easier to store, process, and analyze the data afterwards, right?

  • So these are all the reasons why we would want, in some sense, to find as low-dimensional a representation of a given set of data as possible.
