My first ever video was about digital images; really, what I was talking about was how we represent an image in memory. A little bit of that was about multidimensional arrays, because images are multidimensional arrays. Now, obviously, a lot of my research is around deep learning, and we actually see multidimensional arrays quite a lot in deep learning, and they can get to four, five and more dimensions, depending on what kind of data you're looking at.
So what I wanted to talk about today is the kind of nifty way we represent these in the computer, because, of course, memory isn't a multidimensional array; it's just a linear address space. So how do we map from five dimensions down to one very, very quickly, so that we can perform all this very fast deep learning? If you use any kind of numerical library, for example NumPy or MATLAB or R, or any of the tensor libraries like TensorFlow or PyTorch, they have multidimensional arrays in them. PyTorch calls them tensors; TensorFlow calls them tensors as well.
They are kind of the building block of your deep network. They represent the data you're passing around, and these tensors are often quite large, and they have multiple dimensions; maybe four or five dimensions is quite common. Because these are actually represented in memory as a linear array, we have to decide how to index into them quite quickly. The whole point of these libraries is that they're really, really fast. No one's going to want to train a neural network if it spends ages copying bits of arrays around and moving things around; it has to be super quick.
Let's start with a really basic definition of a two-dimensional array, which goes back to our first video, talking about an image. So I'm going to talk about images, because that's what I spend most of my time doing. Actually, it doesn't really matter what data you're representing; it's just a multidimensional array.
So let's imagine that we have an image which is grayscale, so it's a little box, and this is y and this is x. Let's say it's 100 pixels by 100 pixels. Quick sums in my head suggest that there are 10,000 pixels in this image, and each one has some value. Now, we don't actually store it in memory as 100 by 100; that doesn't make sense. We store a 1 by 10,000 array, and we have to make sure that when we want a specific pixel, we jump to the right point.
So we need to define a few items. First, we define a size for the array, and that's going to be 100 by 100. I'm starting on the right; we'll see why later as I add more dimensions, because otherwise I'm going to run out of space. Then we have an offset, which is going to be zero. The offset is where we start in our memory space. You might change your offset if you're doing certain operations; we're not going to talk about that at the moment, so we'll keep it at zero. And then we have our stride.
I mentioned this in my original video: what some people mean by stride is the width of a row of an image, bearing in mind any padding, and that depends on the format. Here, the stride represents how far we have to go in memory to get to the next element along a certain dimension. So in this case, to go from one x to the next x is just one pixel across, so that's going to be a stride of one. To go from one row to the next row, we have to go 100 pixels along and wrap around to the next one, and so that's going to be a stride of 100.
So suppose we want to find the fourth row in this image. What we would do is multiply four by the y stride, and that would give us the index of the start of the fourth row. In terms of memory, it's just going along one long list to the right point: this list is one long list, we jump to the correct position for that row, and then we can read the x values off.
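A minimal sketch of that arithmetic in Python; the names `size`, `stride` and `offset` are just illustrative, with the values from the example:

```python
# 100x100 grayscale image, stored as one long list of 10,000 values.
size = (100, 100)    # (y, x)
stride = (100, 1)    # next row is 100 elements away, next column is 1
offset = 0

def index_2d(y, x):
    """Map a 2-D (y, x) coordinate to a position in linear memory."""
    return offset + y * stride[0] + x * stride[1]

print(index_2d(4, 0))   # row index 4 starts at 4 * 100 = 400
```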
The way I've written this here is kind of general, so we can add a dimension to it. Let's imagine we no longer have just one grayscale channel; we have three channels, R, G and B. Now, I'm going to represent this as additional channels forming another dimension at the back here. So this is our R, G and B; we have three of these in this direction. This is our sort of channel dimension. You could call it z, but we're going to add more dimensions, so let's not run out of letters too quickly.
So to do this, we have to add another entry into our size and another entry into our stride. To go from one row to the next is moving 100 pixels, but to go from one channel to the next is going to be 100 times 100 pixels, which is 10,000. So that's our stride for this next dimension, and the size in this dimension is going to be three. That makes sense.
Obviously, because I've added the three channels behind like this, I've sort of messed it up; I'm going to have to reallocate memory, and this is going to get reset, as it were. So our offset is still zero, but our new origin point is this one here, and we're moving this way. I just think it's easier to see them coming this way, even though that means this arrow is pointing the wrong way; we won't dwell on it.
So we have y, x and then z, or channel, or whatever it is you want to call it.
So the offset is still zero, so this is going to be here. This is the very origin of our data, and as we move along in memory, we're going to go down through this first channel plane, then go to the next one and down through it, then the next, all the way to the end.
Suppose you want to find out what the index for some pixel in the middle channel is. This is the first index into the channel, so it's one times 10,000, plus your y times 100, plus your x times one, and that will give you the exact point in linear memory for the pixel on the second plane.
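Extending the same sketch with the channel dimension at the back; again, the names and values just mirror the example of three channels of 100×100:

```python
size = (3, 100, 100)        # (channel, y, x)
stride = (10_000, 100, 1)   # next channel plane is 100 * 100 elements away
offset = 0

def index_3d(c, y, x):
    """Map a (channel, y, x) coordinate to a position in linear memory."""
    return offset + c * stride[0] + y * stride[1] + x * stride[2]

print(index_3d(1, 0, 0))    # first pixel of the middle channel: 10000
```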
We're going to run out of space. I tried not to, but it's going to happen, right?
So let's imagine we want to add another dimension. Now we don't have one x-y image which has three channels; we have a number of x-y images that each have three channels, but they're all represented in the same piece of memory. They each have their own location, so we'll get some drawing going on. This is going to be the next one. Well, that's just bad drawing. And then it continues this way, like this, and it comes off the screen. You know what, we'll ignore that one, because it's off the page and we're probably not going to be able to draw on it.
So we're going to start our offset here. This is our new offset: it's the position in memory of the first pixel in all of our data. And this is the next dimension going this way, then you've got the channel dimension, and then you've got the x and y dimensions going this way. We need bigger paper for more dimensions.
I'm going to have to add another stride and another size in here. The size happens to be three as well, so we're going to add another size of three. To go through each of these channel planes is 10,000, and there are three of them per image, so it's going to be 10,000 times three, which is 30,000; that's how far we need to jump between indices in this dimension, so this stride is going to be 30,000. So now if we want to jump to one of these, we go 0, 1 or 2 in that dimension times 30,000, plus whichever channel we want times 10,000, plus the y and the x, and we can go right to exactly where we want.
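The same pattern again with the image dimension added, as a sketch; the index is just the sum of each coordinate times its stride:

```python
size = (3, 3, 100, 100)             # (image, channel, y, x)
stride = (30_000, 10_000, 100, 1)   # one whole image is 3 * 10,000 elements
offset = 0

def index_4d(i, c, y, x):
    """Map an (image, channel, y, x) coordinate to linear memory."""
    return offset + i * stride[0] + c * stride[1] + y * stride[2] + x * stride[3]

print(index_4d(2, 1, 0, 0))   # third image, middle channel: 70000
```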
But the thing to remember is that, when it's declared this way, this is always just one long line in memory. Now, we can fiddle about with the stride and the offset and do clever things, but that's perhaps one for another time. So we'll add another dimension and go to five dimensions.
Now, I don't like to pretend I can visualize five dimensions. I just think of it as groups of groups of groups, because that's how I see it in my mind, and that's how it is in memory.
So now we're going to add another one. I should have added fewer dimensions; it would have taken me less time to draw. Maybe I can get this one more square than the one before. So now this is our next dimension coming down this way, and this is now five-dimensional.
Now, I haven't got room for my stride, but this stride is going to be 30,000 times three, which is 90,000, so I'm going to put 90k in here. The size is going to be two in this dimension, and the offset is now this position here, which is the beginning of all of our data. I'm just going in this direction; you could add more dimensions in a different direction if you want.
So if I want to go down in this dimension, I'm going to have to do zero or one times 90,000, plus one times 30,000, and so on, and I can jump straight to the exact position I want in memory. Now, there are a few things about this that, perhaps surprisingly, make it very, very quick.
The first is that each plane of data, so each image channel, is contiguous in memory almost all the time. There are things you can do with offsets and strides and so on to mess around with that property, but for the sake of argument, if this is contiguous in memory, copying from it is very quick and doing operations on these channels is very quick. So you can say to, let's say, some layer in a convolutional network: do me a matrix multiplication on this little bit here through all these channels, and it will go off and do that, looking in the correct place in memory based on all of these values.
So we don't have to copy the data anywhere else to process it; we can just use this structure and look straight into it, which is a super neat way of accessing this kind of data. And you can have as many dimensions as you want; it really makes no difference. It's just about how many entries you have in the size and stride lists, and which one you multiply by which index when you compute the position.
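For any number of dimensions the recipe is the same, so here is a general sketch. For a freshly allocated, contiguous array, each stride is the product of the sizes of all the dimensions to its right, and the flat index is each coordinate multiplied by its stride, summed, plus the offset:

```python
def strides_for(size):
    """Contiguous strides for a size tuple (slowest-moving dimension first)."""
    strides = []
    acc = 1
    for s in reversed(size):
        strides.append(acc)
        acc *= s
    return tuple(reversed(strides))

def flat_index(coords, strides, offset=0):
    """Dot the coordinate with the strides and add the offset."""
    return offset + sum(c * s for c, s in zip(coords, strides))

size = (2, 3, 3, 100, 100)       # the five-dimensional example from the text
print(strides_for(size))         # (90000, 30000, 10000, 100, 1)
print(flat_index((1, 0, 0, 0, 0), strides_for(size)))  # 90000
```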
Yes, that's exactly it. It's unlikely that you would use it to show an even brighter red, because usually 255 would mean as red as you could get, let's say. So you would just have a finer range of colors in between.
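Going back to the earlier point about not having to copy data to work on it: NumPy exposes exactly this size/stride machinery, and slicing just produces a view with a different offset and strides over the same memory. A quick sketch (note NumPy reports strides in bytes rather than elements):

```python
import numpy as np

# Three 100x100 channel planes, filled 0..29999 so positions are visible.
image = np.arange(3 * 100 * 100, dtype=np.int32).reshape(3, 100, 100)

channel = image[1]                       # middle channel: a view, not a copy
print(np.shares_memory(channel, image))  # True: same underlying memory
print(int(channel[0, 0]))                # 10000: start of the second plane
print(tuple(s // image.itemsize for s in image.strides))  # (10000, 100, 1)
```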