Name: Augmenting Reality #1 - Optical Flow (C++)
Uploaded: 2021-01-14T10:13:54.000Z
Duration: 24 min 34 s

Hello, and welcome to part one of an infrequent series about augmenting reality. In this series,

I'm going to be learning about and exploring the algorithms that operate behind the scenes and enable augmented reality applications.

And I think a good place to start will be with one of the more enjoyable algorithms: optic flow.

Before we get stuck into the algorithm, I thought it might be worth having a look at the setup and what I've got.

Here is my webcam at a low resolution, displaying a grayscale image at the command prompt, and I've done a whole video about that already.

I'm not going to go into the details of the webcam application just here.

We can see I've got a red rectangle stuck to my head, and the idea of optic flow is that I can create a velocity map that allows me to interact with things in the scene, so you can see I can manipulate the rectangle with my hands.

I'm doing it rather gingerly and slowly, because if I hit the rectangle, it gets more velocity.

So there's some sort of physics involved with how the rectangle behaves, and if the rectangle goes off the screen, I've made it wrap around to the bottom.

And we can see now, rather unfortunately, it's got stuck in my microphone.

So if I move my hands around, you can see the microphone's in the way, so I need to just go in front of the microphone and lift up the object.

So this is a crude form of augmented reality.

But the algorithm behind it is quite nice, and it also covers some of the fundamentals of image processing.

What I'm trying to demonstrate is that, for the most part, the object behaves according to how I'm interacting with it.

So I'll bring the rectangle to the middle of the screen and just try and leave it there. You see, I can push it to the left,

and I can push it back to the right a little bit.

I try and grab hold of it from above and bring it back down.

Let's start by considering how motion is determined between two successive images.

So here I've got the two pictures on the left, which are frames taken from a camera, and T minus zero

here is the most recent frame, and T minus one is the frame before.

We can see that the little chappy inside the frame has waved his arm.

It's moved a little bit, and if we subtract the two images, what we actually get is not the whole body,

but just the two locations that have changed.

So we'll get the original arm plus the new arm, and

depending on how the image is encoded, the type of the movement and the luminance on the screen, one of these will be positive and one will be negative.

And if we take the absolute values of all of the pixel values, i.e. set them all to positive,

we can work out an area where motion has occurred.

But all this map will tell us is that motion has happened and whereabouts in the map.

We don't know what direction, and we don't know how much.

Because in the world of image processing, our motion is represented as just being the difference between two successive frames.
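The subtraction step described here can be sketched in a few lines of C++. This is only an illustration, not the code from the video: the function name `MotionImage` and the choice of row-major `float` frames in the range 0 to 1 are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the video): per-pixel absolute difference
// between two grayscale frames of equal size, stored row-major as floats.
std::vector<float> MotionImage(const std::vector<float>& newFrame,
                               const std::vector<float>& oldFrame)
{
    assert(newFrame.size() == oldFrame.size());

    std::vector<float> motion(newFrame.size());
    for (std::size_t i = 0; i < newFrame.size(); ++i)
    {
        // The sign of the raw difference depends on which frame was brighter
        // at this pixel; the absolute value keeps only "how much it changed".
        motion[i] = std::fabs(newFrame[i] - oldFrame[i]);
    }
    return motion;
}
```

A pixel that did not change between frames ends up as 0 in the result, so the non-zero regions mark where motion happened, but say nothing about its direction or size.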

Before we start working out the motion vector for individual pixels, let's consider how we would do it for the whole image.

So I want you to imagine a scenario where the camera is moving, but the scene that the camera's viewing remains mostly static.

In this case, I've got two successive frames shown by the black and the blue image, and our brains can work out that, even though they're similar, one of the frames has moved slightly.

In fact, the camera must have moved in this direction because we can see that the little man has moved towards the left of the frame.

If we overlay the frames precisely, we can see this more clearly, and the result we're interested in is how much movement has occurred.

And there's really only one way to test this. Now, because this is the first video in this series,

I'm trying to keep things conceptually simple, so I'm not going to be looking into the more advanced routines and feature extraction routines, which would typically be used to solve this problem.

Instead, we're going to brute force it, and so to calculate the motion vector,

the only thing we can do is to physically test the new frame in a variety of different positions to see where it lines up with the old frame.

And so we would just need to algorithmically overlay the two and see where they have the closest match,

i.e. where the difference between them is at a minimum.

And once we've found the closest match, we can observe the nature of the vector that we need to represent that motion.

And don't forget, this is for the whole image moving.

This is if the camera is moving, and because it's the whole image,

we can assume that the vector has its origin at the middle of that image, and we only have the one vector for the whole image, and this is our global optic flow. Your computer

mouse operates on a similar principle to this.

It rapidly takes images in succession and tries to work out how the image is transformed in the 2D plane from one location to another.

Once it's done this, that vector is then translated into coordinates that are sent to the computer to move the mouse cursor around.
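The brute-force global search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the video: the names `Shift` and `GlobalFlow`, the `maxShift` limit, and the use of a mean absolute difference over the overlapping region are all choices made for the example. The returned vector is the displacement of the image content from the old frame to the new one.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Shift { int dx; int dy; };

// Hypothetical helper: try every whole-pixel shift in [-maxShift, maxShift]
// and keep the one where the shifted old frame best lines up with the new one.
Shift GlobalFlow(const std::vector<float>& oldFrame,
                 const std::vector<float>& newFrame,
                 int width, int height, int maxShift)
{
    assert(oldFrame.size() == static_cast<std::size_t>(width) * height);
    assert(newFrame.size() == oldFrame.size());

    Shift best{0, 0};
    float bestError = std::numeric_limits<float>::max();

    for (int dy = -maxShift; dy <= maxShift; ++dy)
    {
        for (int dx = -maxShift; dx <= maxShift; ++dx)
        {
            float error = 0.0f;
            int count = 0;
            for (int y = 0; y < height; ++y)
            {
                for (int x = 0; x < width; ++x)
                {
                    const int nx = x + dx;
                    const int ny = y + dy;
                    // Only compare pixels where the two frames overlap.
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height)
                        continue;
                    error += std::fabs(newFrame[ny * width + nx] -
                                       oldFrame[y * width + x]);
                    ++count;
                }
            }
            if (count == 0)
                continue;
            // Use the mean so that small overlaps are not favoured
            // simply because they sum fewer pixels.
            error /= static_cast<float>(count);
            if (error < bestError)
            {
                bestError = error;
                best = Shift{dx, dy};
            }
        }
    }
    return best;
}
```

The cost grows with the number of pixels multiplied by the number of candidate shifts, which is why a search like this is only practical at low resolution or with a small `maxShift`.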

Now the type of optic flow we're interested in isn't global optic flow as such, it's local optic flow.

We want to work it out per pixel, so every pixel in our image gets associated with the vector that suggests how that pixel might have moved in the past.

And so things that remain static don't really have a vector.

In fact, the only thing that did change was the arm here.

Again, I've got our two successive frames, and if I overlay them, we can see the difference is really just the arm changing position.

I'm going to try and work out where that pixel has come from.

And to do this, I create a patch that surrounds that pixel,

and this patch will be a small fragment of the image.

It will have some statistical features that make it different to other parts of the image.

Now, of course, it might not, and that could lead to errors in this algorithm.

Once I've got a small patch, I'm going to test that patch against lots of different locations within a given search area, and we need to restrict the search area for two reasons.

One is if we searched the whole image, it would be tremendously slow.

It's not a very computationally efficient algorithm, but secondly, we can assume that the differences between frames on the whole are minimal, particularly if it's a human being.

I can't move my arm so fast that it makes a massive difference at, say, 30 frames per second. For each patch in the search area, I record the similarity with the base patch, and

in this instance, the best match would be somewhere like here.

They're not exactly the same, but they represent a similar part or similar feature of the arm, and therefore, for this patch

to get from the location in green to the location in red, it had to move with this vector.
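The patch search for a single pixel might look something like the sketch below. It is an illustration rather than the video's implementation: the names `Vec2i`, `Sample`, `PatchError` and `LocalFlow` are hypothetical, the similarity measure is assumed to be a sum of absolute differences, pixels outside the frame are assumed to read as black, and ties are resolved in favour of zero motion.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2i { int x; int y; };

// Read a pixel; anything outside the frame is treated as black
// (an assumption of this sketch).
float Sample(const std::vector<float>& img, int w, int h, int x, int y)
{
    if (x < 0 || x >= w || y < 0 || y >= h) return 0.0f;
    return img[y * w + x];
}

// Sum of absolute differences between the patch centred on (nx, ny) in the
// new frame and the patch centred on (ox, oy) in the old frame.
float PatchError(const std::vector<float>& oldFrame,
                 const std::vector<float>& newFrame,
                 int w, int h, int nx, int ny, int ox, int oy, int radius)
{
    float error = 0.0f;
    for (int j = -radius; j <= radius; ++j)
        for (int i = -radius; i <= radius; ++i)
            error += std::fabs(Sample(newFrame, w, h, nx + i, ny + j) -
                               Sample(oldFrame, w, h, ox + i, oy + j));
    return error;
}

// Estimate how the content now at (px, py) moved between the two frames.
Vec2i LocalFlow(const std::vector<float>& oldFrame,
                const std::vector<float>& newFrame,
                int w, int h, int px, int py,
                int patchRadius, int searchRadius)
{
    assert(oldFrame.size() == static_cast<std::size_t>(w) * h);
    assert(newFrame.size() == oldFrame.size());

    // Start from "no movement" so that ties favour a zero vector.
    Vec2i bestOffset{0, 0};
    float bestError = PatchError(oldFrame, newFrame, w, h,
                                 px, py, px, py, patchRadius);

    for (int sy = -searchRadius; sy <= searchRadius; ++sy)
    {
        for (int sx = -searchRadius; sx <= searchRadius; ++sx)
        {
            const float e = PatchError(oldFrame, newFrame, w, h,
                                       px, py, px + sx, py + sy, patchRadius);
            if (e < bestError)
            {
                bestError = e;
                bestOffset = Vec2i{sx, sy};
            }
        }
    }
    // The patch came from (px + offset), so it moved by the opposite vector.
    return Vec2i{-bestOffset.x, -bestOffset.y};
}
```

Running `LocalFlow` for every pixel gives the full vector field; the nested loops over pixels, search offsets and patch pixels are what make the unoptimized version so slow.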

Once we've found the best matching patch for every pixel in the image,

we can then modulate our vector field map with the original motion image, because a lot of the vectors we're not interested in: there hasn't been any motion.

And so they'll just be the results of noise and other unwanted estimations and byproducts of the searching algorithm.

So we restrict them to just the pixels that we're interested in, and

all of the other vectors effectively get set to zero.
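One way to restrict the vector field to the pixels that actually changed is sketched below. The name `MaskFlowByMotion`, the `FlowVector` type and the hard `threshold` are assumptions for the example; "modulate" could equally mean scaling each vector by the motion value rather than cutting it off.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FlowVector { float x; float y; };

// Hypothetical helper: zero every flow vector whose pixel shows less
// frame-to-frame difference than `threshold` in the motion image.
void MaskFlowByMotion(std::vector<FlowVector>& flow,
                      const std::vector<float>& motion,
                      float threshold)
{
    assert(flow.size() == motion.size());

    for (std::size_t i = 0; i < flow.size(); ++i)
    {
        if (motion[i] < threshold)
        {
            // No real motion here, so whatever the search found is noise.
            flow[i] = FlowVector{0.0f, 0.0f};
        }
    }
}
```

The threshold trades sensitivity against noise: a low value keeps slow movements but lets camera noise through, while a high value ignores everything except large brightness changes.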

Now let's start with the coding, but I want to emphasize that I'm not making any effort at all to try and optimize this algorithm.

In fact, it's going to be very inefficient and run quite poorly indeed.

And the reason for this is I want to keep it clear.

I want to keep the step-by-step nature of the algorithm very visible for viewers to study how the algorithm works.

But I do intend to follow up this video with another video that specifically talks about optimizing this algorithm using a technique called integral images.

But for this video, we're going to keep it simple and brute force, and we're going to pay the price for that simplicity.

As usual, I'm using the OLC console game engine, and

if you've seen the webcam video, we're going to use a library called ESCAPI to handle the webcam for us, and I've derived a class from the console game engine called Optic Flow. Because the performance is going to suffer in this algorithm, I'm using quite a low resolution console, 80 by 60 characters, where each character is 16 by 16 pixels, and I've taken the liberty of already entering the webcam capture code.

We don't need to see it again, and this involved a union.
