  • The idea is to take a single photograph of the face, and as output we get a 3D model in volumetric form.

  • When you start doing something like that... this kind of thing is becoming more common now with CNNs.

  • You can give them a problem and just let them try and solve it.

  • These are convolutional neural networks, right?

  • Yeah, that's right.

  • So the idea is that as the image comes in, we apply some filters.

  • The values of these filters are learned through gradient descent, and Mike has a video talking about that.
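To make "we apply some filters" concrete, here is a minimal sketch of sliding one small filter over an image. The image and the filter weights are made up for illustration; in a CNN the weights would be learned by gradient descent rather than picked by hand, and the function name `convolve2d` is just a label for this sketch.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    Like most CNN libraries, this does not flip the kernel, so strictly
    speaking it is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Illustrative 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright step from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

response = convolve2d(image, kernel)
for row in response:
    print(row)
```

The response is largest where the filter sits across the edge and zero over the flat regions, which is the sense in which a filter "detects" a feature.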

  • This is recent work, about eight months on, and we decided it would be quite fun if we released a kind of online demonstration, mainly so people didn't have to run the code on their own computer, because you have to install some dependencies and you need a GPU.

  • Yeah, this is a much easier way.

  • You can just upload a photograph, and as output you get a 3D mesh, which you can rotate and play with.

  • This is the object, which you can download and view in MeshLab.

  • So if I also disable the colors, this is the actual object without any texturing applied. The actual mesh which comes out of the network is slightly higher resolution than this.

  • But because of demand, we have to kind of compress it.

  • But in general it seems to capture things like the mouth and the nose.

  • You can try.

  • Yes.

  • It's running on one of the GPU machines that we have in the group, and I think at the moment it's processing maybe 80 photographs per... well, at times, maybe 3,000 per hour.

  • This is a photograph of your face, and we have... I'm going to draw a magic box, and I'll try and fill it in in a bit.

  • And so everything is going in this direction, and as output we produce a cube, and this is a 3D volume made up of much smaller cubes.

  • And in each one of these cubes, the network regresses a value of zero or one, and then we use an algorithm called marching cubes, which takes the surface of all of these ones.

  • We simply send that back to the website, and we use a framework, three.js, to render it.
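The step from "a volume of zeros and ones" to "a surface" can be sketched in a few lines. This is not marching cubes itself, which interpolates a smooth triangle mesh; it is a simpler stand-in that finds every face where an occupied voxel borders an empty one, which gives a blocky version of the same surface. The function name and the tiny test volume are made up for illustration.

```python
def exposed_faces(volume):
    """List (z, y, x, direction) for each face of a 1-voxel that touches a 0-voxel
    or the edge of the grid. Together these faces form a blocky surface."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    directions = [
        (1, 0, 0), (-1, 0, 0),
        (0, 1, 0), (0, -1, 0),
        (0, 0, 1), (0, 0, -1),
    ]

    def occupied(z, y, x):
        in_grid = 0 <= z < depth and 0 <= y < height and 0 <= x < width
        return in_grid and volume[z][y][x] == 1

    faces = []
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if volume[z][y][x] != 1:
                    continue
                for dz, dy, dx in directions:
                    if not occupied(z + dz, y + dy, x + dx):
                        faces.append((z, y, x, (dz, dy, dx)))
    return faces


# Illustrative 3x3x3 volume containing a solid 2x2x2 block of ones.
volume = [
    [[1 if z < 2 and y < 2 and x < 2 else 0 for x in range(3)] for y in range(3)]
    for z in range(3)
]

faces = exposed_faces(volume)
print(len(faces), "surface faces")
```

Faces between two occupied voxels are never emitted, so only the outer shell of the shape survives, which is what gets turned into a mesh for rendering.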

  • So the magic box turns this simple image somehow into this 3D shape, and then you basically color it in? Yes. So the texturing at the moment is ridiculously simple.

  • So if you upload a side profile, it will actually take the texture from the front of your face and the back, and you might also notice some distortion around the sides, where it's actually included the background behind the face.

  • But the problem was not to try and improve texturing.

  • It was to try and improve the quality of the 3D reconstruction and get good performance.

  • And have you taken some of these 3D volumes and compared them to actual 3D scans, just to see how close they are, or is it all by eye?

  • We did a lot of testing and calculating of error, which is probably one of the most challenging problems in doing this work.

  • The ground truth that we have doesn't have any details.

  • You know, it doesn't show wrinkles or little spots, but the fit is very good to the face, so it should try and match the shape of the face.

  • That's the magic question, which is what's going on in the box.

  • So inside we have an architecture which actually looks like an hourglass, and it's called the Stacked Hourglass.

  • So here we have the first part of the hourglass, and then we have it upsampling again.

  • And then we have a second hourglass in here.

  • The image comes in, and the spatial resolution, as it passes through the convolutions in the network, gets smaller until we have something maybe 10 pixels by 10 pixels.

  • So not the original resolution, which is close to 200 by 200.

  • We then upsample it, and while we're upsampling it, we actually have smaller hourglass networks, which are working at larger resolutions.

  • So these come from another earlier point in the network.

  • So as this is being upsampled, we're combining the results from these smaller CNNs.

  • This increases the resolution around the side of the face, for example, so you have more detail.

  • Otherwise you'd just get a blurry oval, basically? Yes.

  • So as you have these larger CNNs inside, these increase the global context.

  • So you know that the eyes are in the right place, and that the mouth is below the eyes, for example.

  • This then goes through another stacked hourglass network, which is identical.

  • So it also has these smaller hourglass networks, and as output we just take the 3D volume.
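The shape of the hourglass described above (shrink, then grow back while mixing in higher-resolution branches) can be sketched without any learned layers. This is only a structural illustration under stated assumptions: average pooling and nearest-neighbour upsampling stand in for the real convolutions, the "smaller network" on each branch is replaced by a plain copy of the input at that resolution, and all names are made up for this sketch.

```python
def downsample(grid):
    """Halve the resolution with 2x2 average pooling (expects even sizes)."""
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
            for c in range(0, len(grid[0]), 2)
        ]
        for r in range(0, len(grid), 2)
    ]


def upsample(grid):
    """Double the resolution by repeating every value (nearest neighbour)."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def hourglass(grid, depth, sizes):
    """Go down `depth` levels and back up, adding the skip branch at each level.

    `sizes` records the grid width at each step so the hourglass shape is visible.
    """
    sizes.append(len(grid))
    if depth == 0:
        return grid
    skip = grid  # stand-in for the small branch working at this resolution
    coarse = hourglass(downsample(grid), depth - 1, sizes)
    restored = upsample(coarse)
    sizes.append(len(restored))
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(restored, skip)]


# Illustrative 8x8 image with a single bright pixel as the only "detail".
image = [[0.0] * 8 for _ in range(8)]
image[3][3] = 1.0

sizes = []
output = hourglass(image, depth=2, sizes=sizes)
print(sizes)
```

The coarse path alone can only return a smeared version of the bright pixel; adding the branch from the higher resolution puts the sharp detail back. Stacking, as in the video, would amount to feeding `output` into a second call of `hourglass`.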

  • So the hourglass there is taking a 2D image, with these smaller but higher-resolution hourglasses besides. How is it extrapolating here?

  • So it can actually be thought of as a segmentation problem.

  • There's a lot of work on using convolutional networks for segmentation, so if you take a single image, it will give you a mask of pixels, and you give each pixel a number, with it being, like, a person or a dog, for example.

  • Instead of regressing a single class per channel as output (so if you're segmenting dogs and humans, you'd have two channels), in this case we regress 200 channels, but they're all the same class.

  • They're all the human face.

  • So any time that you have a set of ones, you know that it's part of the face.

  • If you imagine a cube drawn around my face... Because, you know, I have to animate it: your face is now inside?

  • Yeah, slices inside a cube.

  • If you chop my face up into smaller slices, you would see a set of ones any place that is part of the face. So in the front, you'd just see a few ones on the tip of my nose. As you get towards my ears, you'll see, you know, something like an oval shape.
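The "slices inside a cube" picture can be sketched as a stack of 2D masks, one per output channel. Everything here is illustrative: the scores are generated by a made-up dome shape rather than a network, the grid is 9 slices instead of the 200 channels mentioned in the video, and the 0.5 threshold is an assumption.

```python
SIZE = 9
CENTRE = SIZE // 2


def synthetic_scores(size=SIZE):
    """Fake per-voxel scores in [0, 1]: high inside a dome whose tip faces the camera.

    scores[z] is one slice (one output channel); z = 0 is the slice nearest the camera.
    """
    scores = []
    for z in range(size):
        radius_sq = 2 * z  # the dome widens as we move away from the camera
        scores.append([
            [
                0.9 if (x - CENTRE) ** 2 + (y - CENTRE) ** 2 <= radius_sq else 0.1
                for x in range(size)
            ]
            for y in range(size)
        ])
    return scores


def to_occupancy(scores, threshold=0.5):
    """Turn scores into a binary volume: 1 = part of the shape, 0 = empty."""
    return [
        [[1 if value >= threshold else 0 for value in row] for row in slice_]
        for slice_ in scores
    ]


def ones_per_slice(volume):
    """Count occupied voxels in each slice, front to back."""
    return [sum(sum(row) for row in slice_) for slice_ in volume]


volume = to_occupancy(synthetic_scores())
counts = ones_per_slice(volume)
print(counts)
```

The front slice holds a single one (the "tip of the nose"), and the count grows slice by slice as the cross-section widens, which mirrors the description of a few ones at the front and an oval further back.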

  • It's working out what the features of the image are, and then it's putting those in the right place in 3D space?

  • Exactly. So before we process the image, we first move the face so that it's in the same sort of spatial dimensions as the cube that we output, so the output, the 3D cube, should be spatially aligned perfectly with the face, which is why, when you see it on the website, the actual volume is perfectly in line, and if you have a slight rotation to your face, the mesh will also be aligned with it.

  • One thing I noticed was the hair. What's going on?

  • Yeah, so the training data doesn't contain any hair. It's just the 3D shape of the face.

  • And when we were producing the data that we train from, we actually didn't bother with the back of the head.

  • It would be quite nice to have the back of the head as well.

  • The problem is that, since this is a volumetric problem, we actually have to produce these volumes, which means checking whether each voxel is inside the mesh or not, and that can be quite time-consuming.
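One common way to do the inside-or-outside check is ray casting: shoot a ray from the voxel centre and count how many triangles it crosses, with an odd count meaning "inside". The sketch below uses the Möller-Trumbore ray/triangle test on a made-up tetrahedron. It is an assumption that the authors did anything like this; the video only says the check is needed and that it is slow.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def ray_hits_triangle(origin, direction, triangle, eps=1e-9):
    """Möller-Trumbore test: does the ray cross the triangle in front of `origin`?"""
    v0, v1, v2 = triangle
    edge1, edge2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, edge2)
    det = dot(edge1, p)
    if abs(det) < eps:
        return False  # ray is parallel to the triangle's plane
    inv = 1.0 / det
    to_origin = sub(origin, v0)
    u = dot(to_origin, p) * inv
    if u < 0 or u > 1:
        return False
    q = cross(to_origin, edge1)
    v = dot(direction, q) * inv
    if v < 0 or u + v > 1:
        return False
    return dot(edge2, q) * inv > eps  # distance along the ray must be positive


def is_inside(point, triangles, direction=(0.577, 0.211, 0.789)):
    """Odd number of crossings means the point is inside a closed mesh.

    The skewed direction is arbitrary; it just makes hitting an edge exactly unlikely.
    """
    hits = sum(ray_hits_triangle(point, direction, t) for t in triangles)
    return hits % 2 == 1


def voxelise(triangles, size):
    """Test the centre of every voxel in a size^3 grid against the mesh."""
    return [
        [
            [1 if is_inside((x + 0.5, y + 0.5, z + 0.5), triangles) else 0
             for x in range(size)]
            for y in range(size)
        ]
        for z in range(size)
    ]


# Illustrative closed mesh: a tetrahedron with corners on the axes.
A, B, C, D = (0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4)
tetrahedron = [(A, B, C), (A, B, D), (A, C, D), (B, C, D)]

volume = voxelise(tetrahedron, size=4)
filled = sum(v for slice_ in volume for row in slice_ for v in row)
print(filled, "voxels inside")
```

Every voxel is tested against every triangle, so the cost grows with the number of voxels times the number of triangles. If the grid were around 200 on each side, that would be 8 million voxels per training face, which gives a feel for why this step is time-consuming.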

  • We have some statistics which I've been watching continuously, in case something crashes in the back end.

  • We have six queues, and at the moment there are 67 images waiting across those queues.

  • You can see that we've had 181,000 photographs uploaded, most of those since the 12th of September, and today I think is the 19th.

  • So just in the past minute we've had 70 which have been uploaded.
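A quick back-of-the-envelope check puts these numbers side by side. It uses only figures quoted in the video (181,000 uploads, the 12th to the 19th, 70 uploads in one minute, and the rough 3,000 per hour processing rate mentioned earlier), and it assumes all uploads fall within that week and that the one-minute burst is sustained, neither of which the video claims.

```python
total_uploads = 181_000        # figure quoted in the video
days_online = 19 - 12          # 12th to 19th of September
uploads_last_minute = 70       # "in the past minute"
capacity_per_hour = 3_000      # rough processing rate quoted earlier

# Average arrival rate if every upload had come in during that week.
average_per_hour = total_uploads / (days_online * 24)

# Arrival rate if the last minute's burst continued for a whole hour.
burst_per_hour = uploads_last_minute * 60

# How fast the queues would grow during such a burst.
backlog_growth_per_hour = burst_per_hour - capacity_per_hour

print(f"average: {average_per_hour:.0f} uploads/hour")
print(f"burst:   {burst_per_hour} uploads/hour")
print(f"backlog: +{backlog_growth_per_hour} images/hour during a burst")
```

On these assumptions the weekly average sits well under the quoted processing rate, while a sustained burst at the last minute's pace would outrun it, which is consistent with images waiting in the queues.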

  • So that's why I feel sorry, or bad, for the technical services at the moment.

  • That's so much traffic.

  • I don't want to just switch it off, and I would like it to be available for the researchers who want to try and look at our work.

  • Um, we'll see what happens, but I hope that we can keep it running.

  • The reason why it doesn't work on the online demonstration when you use side poses is because the face detector we're using doesn't recognize these faces. So instead of producing horrible results, we throw those away.

  • ...we end up with a much smaller image and lots of features going all the way back.

  • So these are my different convolutions of convolutions of convolutions of convolutions.

Selfie to 3D Model - Computerphile

Posted by 林宜悉 on January 14, 2021