Name: Two Minute Papers - Hallucinating Images With Deep Learning
Uploaded: 2021-01-14T06:19:11.000Z
Duration: 3 min 14 s

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.

In an earlier episode, we showcased a technique for summarizing images not in a word, but in

an entire sentence that actually makes sense. If you were spellbound by those results, you'll

be out of your mind when you hear this one: let's turn it around, and ask the neural network

to have a sentence as an input, and ask it to generate images according to it. Not fetching

already existing images from somewhere, generating new images according to these sentences. Create

new images according to sentences. Is this for real?

This is an idea that is completely out of this world. A few years ago, if someone proposed

such an idea and hoped that any useful result could come out of this, that person would have

immediately been transported to an asylum.

An important keyword here is "zero shot" recognition. Before we go to the zero part, let's talk

about one shot learning. One shot learning means a class of techniques that can learn

something from one, or at most a handful of examples. Deep neural networks typically need

to see hundreds of thousands of mugs before they can learn the concept of a mug. However,

if I show one mug to any of you Fellow Scholars, you will, of course, immediately get the concept

of a mug. At this point, it is amazing what these deep neural networks can do, but with

the current progress in this area, I am convinced that in a few years, feeding millions of examples

to a deep neural network to learn such a simple concept will be considered a crime.

Onto zero shot recognition! The zero shot is pretty simple - it means zero training

samples. But this sounds preposterous! What it actually means is that we can train our

network to recognize birds, tiny things, what the concept of blue is, what a crown is, but

then we ask it to show us an image of "a tiny bird with a blue crown".

Essentially, the neural network learns to combine these concepts together and generate

new images leaning on these learned concepts.
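
To make the "zero training samples" idea concrete, here is a minimal Python sketch. It is not the paper's method and involves no neural network; the function name `is_zero_shot_query` and the toy caption sets are hypothetical, made up purely for illustration. It only checks the defining property described above: every concept in the request was seen during training, but never all together in a single example.

```python
def is_zero_shot_query(query_concepts, training_captions):
    """Return True if each concept was seen in training, but never this exact combination.

    Hypothetical helper for illustration only; captions are modeled as sets of concept words.
    """
    query = set(query_concepts)
    # All concepts that appear anywhere in the training captions.
    seen_concepts = set().union(*training_captions)
    all_concepts_known = query <= seen_concepts
    # Does any single training caption already contain the whole combination?
    combination_seen = any(query <= set(caption) for caption in training_captions)
    return all_concepts_known and not combination_seen


# Toy training data (an assumption, not taken from the paper).
training_captions = [
    {"tiny", "bird"},
    {"blue", "sky"},
    {"gold", "crown"},
]

# "a tiny bird with a blue crown": known concepts, unseen combination.
print(is_zero_shot_query({"tiny", "bird", "blue", "crown"}, training_captions))  # True
# "a tiny bird": this combination already appears in training.
print(is_zero_shot_query({"tiny", "bird"}, training_captions))  # False
# "a red bird": "red" was never seen at all.
print(is_zero_shot_query({"red", "bird"}, training_captions))  # False
```

The network in the paper has the much harder job of actually generating an image for such a request; this sketch only shows what makes the request "zero shot".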

I think this paper is a wonderful testament as to why Two Minute Papers is such a strident

advocate of deep learning and why more people should know about these extraordinary works.

About the paper - it is really well written, there are quite a few treats in there for

scientists: game theory and minimax optimization, among other things. Cupcakes for my brain.

We will definitely talk about these topics in later Two Minute Papers episodes, stay

tuned! But for now, you shouldn't only read the paper - you should devour it.

And before we go, let's address the elephant in the room: the output images are tiny because

this technique is very expensive to compute. Prediction: two papers down the line, it will

be done in a matter of seconds; two more papers down the line, it will do animations

in full HD. Until then, I'll sit here stunned by the results, and just frown and wonder.

Thanks for watching, and for your generous support, and I'll see you next time!