Why AI art struggles with hands

Your prompt: a post-apocalyptic giraffe astronaut.

Generated.

Genghis Khan playing a guitar solo, pixel art.

Generated.

A man holding a delicious apple...

Ah... What's with his hands?

Why can't AI art make hands?

It doesn't matter what AI art model you use.

If you ask for a man holding a delicious apple, his hands will look weird holding it.

Why is this so hard?

Seems easy enough, right?

We've got this weird situation where AI art can instantly make...

Abraham Lincoln dressed like glam David Bowie,

but struggles with a woman holding a cell phone.

This isn't just a weird glitch.

AI art's struggle with hands can actually teach you something bigger about how AI art works.

I mean, what is so hard about this?

I asked an artist who has taught thousands of people how to draw hands from imagination.

Before someone becomes an artist, or starts training to be one, like officially training,

it's pattern recognition.

You just grow up seeing a whole bunch of hands...

and you start knowing what hands look like.

You learn how things look by living in the world and recognizing patterns.

An AI is similar, but has key differences.

Imagine an AI is like you,

but trapped in a museum from birth.

All the machine has to learn from are the pictures...

and the little placards on the side.

Apple: A red apple on a brown table.

That's like the images it sees from the web and the descriptions that go with them.

It's similar to how you learn, but locked in that museum.

If you want to understand an apple, you can rotate it in your hand.

You can look at it whenever you want.

If AI wants to understand an apple,

it has to find another picture of an apple in the museum.

Pattern recognition has allowed AI and people to draw decent apples,

but the processes differ.

You start training to become an artist,

and now you're like, okay, now I have to learn the rules.

And that's where it becomes very different from how AI is learning.

Artists, in order to draw something complicated,

we tend to simplify things into basic forms.

And so when you look at a hand,

you pretty much have the big blocky part of the palm, right?

You have the front, you have the back,

and then you have the thickness.

So you can pretty much just make that into, like, a square with some thickness to it.

Then an artist can add all the style and texture and detail they want.

AI works differently.

Look at this hand.

The shapes are bizarre,

but the AI has done a great job showing the light and texture here.

Remember, the AI knows how things look,

but not how they work.

So these pixel patterns, like light and texture, are easy for it to pick up.

It never learned, however, that fingers don't really bend like this.

It doesn't simplify the forms.

Remember, it's trapped in the museum.

So it is just trying to guess where hand-like pixels should be

without knowing how hands work the way we do.

But listen, I find this kind of unsatisfying.

I mean, I'm basically just saying that AI can't draw hands because it's not a person.

But AI also doesn't know anything about construction,

and it can still make a beautiful skyscraper in New York City.

So to understand this better,

I spoke to two people who have worked with generative art models.

Yilun Du is a grad student whose heart is in robotics.

But, you know, AI art is a big deal now.

So, he got pulled into it.

Because of how popular these models have been in generative art...

I've also been working on that.

And I talked to Roy Shilkrot,

who has a super varied resume,

but has been teaching about generative art since 2018.

Good students come in trying to break those models and take them to the next level.

Talking to them helped me figure out three big reasons.

Not every reason,

but three big reasons that hands are tough for AI art models:

the data size and quality,

the way hands act,

and the low margin for error.

For the data size, let's go back to the museum idea.

The museum the robot hangs out in

has a ton of rooms dedicated to faces,

but not so many rooms for hands.

That means it has less to learn from.

Just as an example, an available dataset like Flickr-Faces-HQ has 70,000 faces.

70,000!

And this popular one annotates 200,000 pics of celebrity faces...

for lots of details, like eyeglasses or pointy noses.

There are a ton of great hand datasets that can really help a model understand hands,

like this one with 11,000 hands.

But these may not have been used to train the AI that makes art.
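A quick back-of-the-envelope comparison of the counts quoted above (70,000 faces, 200,000 celebrity faces, 11,000 hands). It only compares these named datasets; as noted, they may not be what art models were trained on, so the ratios are illustrative and say nothing exact about any model's training data. The helper `size_ratio` is a hypothetical name used only for this sketch.

```python
def size_ratio(larger: int, smaller: int) -> float:
    """How many times larger one image count is than another."""
    return larger / smaller


# Counts as quoted in the video.
FACE_SETS = {"Flickr-Faces-HQ": 70_000, "celebrity faces": 200_000}
HAND_SET = 11_000  # the hand dataset mentioned above

for name, count in FACE_SETS.items():
    # Format to one decimal place, e.g. "6.4x".
    print(f"{name}: {size_ratio(count, HAND_SET):.1f}x the hand images")
```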

That data scarcity combines with the quality and complexity of the data.

Hand images in the art museum aren't annotated to show how hands work,

the way the celebrities' pointy noses are.

What the captions say is...

there is an image, and there is a person in the image, and that person is holding an umbrella.

You don't give the machine a lot of clues,

like saying, this is a person holding the umbrella.

The thumb is going from one side of the handle and the fingers are curled,

and then the thumb is covering the index finger, but not the other ones.
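To make "more clues" concrete, here is a sketch of the gap between a one-line caption and per-finger labels, written as Python dataclasses. The schema (`HandAnnotation`, `FingerLabel`, and their fields) is hypothetical, invented for illustration from the umbrella description above; it is not the format of any real dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FingerLabel:
    name: str                        # "thumb", "index", "middle", "ring", "pinky"
    curled: bool                     # wrapped around the object?
    hidden_by: Optional[str] = None  # which finger covers this one, if any


@dataclass
class HandAnnotation:
    caption: str                     # the one-sentence label web images usually have
    holding: Optional[str] = None    # object in the hand
    fingers: List[FingerLabel] = field(default_factory=list)  # per-finger detail


# Typical web data: a caption and nothing about the hand itself.
typical = HandAnnotation(caption="A person holding an umbrella.")

# The extra detail from the quote, written out as hypothetical labels.
detailed = HandAnnotation(
    caption="A person holding an umbrella.",
    holding="umbrella handle",
    fingers=[
        FingerLabel("thumb", curled=False),  # runs along one side of the handle
        FingerLabel("index", curled=True, hidden_by="thumb"),
        FingerLabel("middle", curled=True),
        FingerLabel("ring", curled=True),
        FingerLabel("pinky", curled=True),
    ],
)

print(len(typical.fingers), "finger labels vs.", len(detailed.fingers))
```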

All that is made worse because hands do lots of things compared to, say... faces.

So there's a pretty common, like, portrait-photo face.

There are a lot of these photos online,

and the thing is everything is very well centered, right?

Like, eyes are always around here.

Like, there's always this order.

That's not true of hands,

which can do this and this and this.

I swear I'm sober right now.

Stan mentioned this, too.

How many fingers do you see right now?

Like... two or three.

Like, it doesn't know there are five,

because sometimes there's two, sometimes there's three,

sometimes four, sometimes five.
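A toy way to see the point above: suppose a model learned nothing but how often each finger count is *visible* in its training photos, with no rule that a hand has five fingers. Sampling from that distribution gives hands with varying counts. This is a deliberately simplified sketch, not how image generators actually work, and the frequencies in `VISIBLE_FINGER_COUNTS` are invented.

```python
import random
from collections import Counter

# Hypothetical share of training photos showing each number of *visible*
# fingers (pose and occlusion hide some). Invented numbers for illustration.
VISIBLE_FINGER_COUNTS = {2: 0.15, 3: 0.25, 4: 0.30, 5: 0.30}


def sample_finger_count(rng: random.Random) -> int:
    """Draw a count the way a pattern-matcher with no 'five fingers' rule might."""
    counts = list(VISIBLE_FINGER_COUNTS)
    weights = list(VISIBLE_FINGER_COUNTS.values())
    return rng.choices(counts, weights=weights, k=1)[0]


def fraction_with_five(n_hands: int, seed: int = 0) -> float:
    """Share of generated toy hands that happen to have exactly five fingers."""
    rng = random.Random(seed)
    drawn = Counter(sample_finger_count(rng) for _ in range(n_hands))
    return drawn[5] / n_hands


print(f"{fraction_with_five(10_000):.0%} of toy hands have five fingers")
```

With these made-up weights, only about 30% of the toy hands come out with five fingers. An artist's rule ("a hand has five fingers, some may be hidden") would never draw two or three.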

You can see these problems with AI hands,

but the jankiness is all over AI art.

Just look at horses.

You can also have, like, three legs, five legs, six legs.

The model does not learn to explain this because there's too much diversity

and it doesn't have as much bias as we do.

Okay. Did you hear that last part he said?

Good, because it's really important.

It doesn't have as much bias as we do.

We care a lot about hands and need them to be perfect.

There is a low margin for error.

But because the model doesn't understand hands,

hasn't seen many, and because hands act weird...

it makes pictures that are like hands it's seen in the museum,

but not an exact hand.

That's good enough for a ton of stuff, but not hands.

Here, let me give you some examples.

Come over here.

So, I typed "make me a person with exactly five freckles".

So this one's from DALL-E 2,

this one is from Stable Diffusion,

and this one is from Midjourney.

So it's like, you know, great job.

You've got, you know, a red-haired person.

They're more likely to have freckles.

But there are not exactly five freckles here.

Here that doesn't really matter because we see a freckly face.

But hands require higher standards.

Look at our apple-holding man again.

I made three other variations.

The hands are all weird, but don't look at them right now.

It changed the shirt stripes, the buttons, the apple style...

None of that matters because it's stripe-like

and button-like and apple-like.

But hand-like isn't good enough.

I came away from this thinking a couple of things.

A, AI art is basically bad at art.

We just see it most clearly with hands.

And B, it's never going to get any better.

But both of those things are a bit wrong.

I will say that the newest AI art generator to come out at the time of this video is Midjourney version 5,

and they made some progress with hands for sure,

but it's not totally fixed yet.

Don't ask the AI for anyone holding an umbrella.

I think they're, like, spending lots of time on some things that you appreciate,

which is why you like the images, and a lot of stuff that you don't actually even notice.

I think that for a lot of natural scenery or something like that,

I feel like the model might be better at that than people.

And they are working on two things.

First, they have the AI look at a ton more pictures,

which requires more computing power.

They're trying to solve that on a big scale,

because if you want to train on more than a handful of images...

if you want to train on more than 100 images,

it would take tremendous resources to retrain the model itself.

The other solution might be to invite more people into the museum.

There's an interesting analog.

So, like, have you heard of ChatGPT?

The big difference was that it basically used human feedback.

So, like, they generated many, many sentences

and asked people to rate which ones are good and which ones are not good.

They basically fine-tuned the model

so that it would generate sentences that are convincing to people.

I guess it would require a lot of engineering to get people to label so much data.

But I think if we could just get, like, people to rank how good the images generated by these models are,

then, like, a lot of these issues will go away, actually.

Because they're just training the models to do what people like.
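The feedback idea described here can be sketched in miniature. Reward models in human-feedback training are commonly fit to people's pairwise comparisons with a Bradley-Terry style loss: each output gets a score s, and the chance people prefer i over j is modeled as 1 / (1 + exp(-(s_i - s_j))). The sketch below fits such scores by gradient ascent on made-up comparisons between hypothetical generated images. It assumes a small fixed set of outputs; real systems train a neural reward model and then fine-tune the generator against it, which this does not show.

```python
import math


def fit_scores(items, preferences, lr=0.1, steps=500):
    """Fit one score per item from (winner, loser) pairs under a Bradley-Terry model."""
    scores = {item: 0.0 for item in items}
    for _ in range(steps):
        grads = {item: 0.0 for item in items}
        for winner, loser in preferences:
            # Current modeled probability that people prefer `winner` over `loser`.
            p = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # Gradient of log(p): raise the winner, lower the loser, by (1 - p).
            grads[winner] += 1.0 - p
            grads[loser] -= 1.0 - p
        for item in items:
            scores[item] += lr * grads[item]  # one gradient-ascent step
    return scores


# Hypothetical comparisons between three generated "man holding an apple" images.
images = ["five_fingers", "six_fingers", "fused_fingers"]
comparisons = [
    ("five_fingers", "six_fingers"),
    ("five_fingers", "fused_fingers"),
    ("six_fingers", "fused_fingers"),
    ("five_fingers", "six_fingers"),
]

scores = fit_scores(images, comparisons)
ranking = sorted(images, key=scores.get, reverse=True)
print(ranking)  # most-preferred image first
```

Because `five_fingers` never loses in this made-up data, its score has no finite optimum and keeps creeping up as `steps` grows; the fixed step count is what stops it here.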

It's not just the hand,

teeth and abs, too,

anything where there's, like, a pattern, a large number of something.

It doesn't know the rule of "there are this many"

because it's trained on different amounts.