
Automatically generated by AI
  • You've seen photos come to life before but not like this.

    你以前見過栩栩如生的照片,但沒有見過這樣的。

  • EMO is the new AI on the block, and it's revolutionizing the game, making every other attempt look like a mere prototype.

    EMO是AI領域的新寵,正在顛覆遊戲領域,使其他嘗試看起來只是一種原型。

  • With its ability to infuse any still image with voice and motion, EMO is setting a new standard for digital animation.

    以其能夠為任何靜止圖像注入聲音和動作的能力,EMO正在為數位動畫設立新的標準。

  • Prepare to be amazed as we dive into how EMO is reshaping our expectations for interactive media.

    準備好驚奇吧,讓我們深入了解EMO如何重新塑造我們對互動媒體的期望。

  • All right. So how does EMO turn a still picture into a moving, talking video that looks so real and keeps the person or character looking just like themselves over time?

    那麼EMO是如何將靜止的圖片轉換成一段看起來非常真實,並使人物或角色在長時間內保持原貌的動態影片呢?

  • That's what we're diving into today.

    這就是我們今天要深入探討的問題。

  • I'll break down what sets EMO apart, how it operates, its tricks, plus the good stuff and the not-so-good stuff about it.

    我將詳細介紹 EMO 的與眾不同之處、它的操作技巧,以及它的優點和缺點。

  • All right, let's break down what EMO is in simpler terms.

    讓我們用更簡單的語言來解釋一下什麼是 EMO。

  • EMO, which stands for Emote Portrait Alive, is this cool new AI system that can make pictures look like they're talking or singing just by using a single photo and some sound.

    EMO,全名為「表情畫像活現」,是一個很酷的新型AI系統,可以僅使用一張照片和一些聲音,使圖片看起來像在說話或唱歌。

  • It's really pushing the boundaries of how we can make videos that look super real and can mimic the way humans express themselves.

    它真的正在推動我們製作看起來超真實並能模仿人類表達方式的影片的極限。

  • Traditional ways of doing this often miss the mark, not quite capturing how unique everyone's face moves.

    傳統的做法通常無法完全捕捉到每個人臉部獨特動作的細節,往往會偏離目標。

  • EMO does something pretty smart to avoid these pitfalls.

    EMO採取一些相當聰明的方法來避免這些陷阱。

  • Instead of relying on complicated steps like making a 3D model of the face or trying to map out all the facial features exactly, it jumps straight from the sound to making the video.

    與其依賴複雜的步驟,如製作臉部的3D模型或試圖精確映射所有面部特徵,EMO直接從聲音轉到製作影片。

  • It uses something called a diffusion model, which is an AI method that's great at making images look lifelike and natural.

    它使用一種稱為擴散模型的方法,這是一種在使圖像看起來逼真和自然方面表現出色的AI方法。

  • This model listens to the audio and then figures out all the tiny movements your face would make to produce those sounds, and the results are amazing.

    這個模型聆聽音頻,然後計算出你的臉部會產生這些聲音所需的所有微小動作,其結果令人驚嘆。

  • Videos made by EMO look incredibly real and full of life, showing emotions and movements that feel just right.

    由EMO製作的視頻看起來令人難以置信地逼真,充滿生氣,展現出感人至深的情感和動作,讓人感覺非常自然。
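The video does not show how the diffusion model turns audio into motion. As a minimal sketch only, the loop below runs a standard DDPM-style reverse process on a single number standing in for "mouth openness". Everything here is an assumption made for illustration: the names `sample_mouth_openness` and `oracle_noise_prediction`, the linear noise schedule, the mapping from audio energy to a target value, and above all the "oracle" that replaces the trained neural network a real system would use.

```python
import math
import random


def make_schedule(steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used choice)."""
    betas = [beta_start + (beta_end - beta_start) * i / (steps - 1)
             for i in range(steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def oracle_noise_prediction(x_t, alpha_bar_t, target):
    """Hypothetical stand-in for a trained denoiser.

    It returns the noise that would explain x_t if the clean value were
    exactly `target`. A real model would predict this from data.
    """
    return (x_t - math.sqrt(alpha_bar_t) * target) / math.sqrt(1.0 - alpha_bar_t)


def sample_mouth_openness(audio_energy, steps=50, seed=0):
    """Start from pure noise and denoise step by step toward a value
    conditioned on the audio (here: energy clipped to [0, 1])."""
    rng = random.Random(seed)
    target = max(0.0, min(1.0, audio_energy))  # assumed audio-to-target mapping
    betas, alphas, alpha_bars = make_schedule(steps)
    x = rng.gauss(0.0, 1.0)  # pure noise
    for t in reversed(range(steps)):
        eps = oracle_noise_prediction(x, alpha_bars[t], target)
        mean = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise on the final step
        x = mean + math.sqrt(betas[t]) * noise
    return x
```

With a perfect noise prediction, the final step lands on the conditioned target, which is why the toy is easy to check. In a real system the quality of the result depends on how well the learned denoiser approximates that prediction for whole video frames rather than one number.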

  • So just how impressive is EMO? Let me break it down for you.

    那麼EMO到底有多令人印象深刻呢?讓我為你分析一下。

  • It is seriously cool.

    它真的很酷。

  • It's not just about making videos where people are talking.

    這不僅僅是製作人們在說話的影片。

  • Don't cry, you don't need to cry.

    不要哭,你不需要哭。

  • It can make them sing too, and in all sorts of styles.

    它還可以製作歌唱的影片,而且可以呈現各種風格。

  • Whether you need to bring to life a face with a full range of emotions or want someone to look around naturally, EMO has got you covered.

    無論您是想要賦予一張臉以豐富的情感表達,還是希望某人自然地四處張望,EMO都能滿足你的需求。

  • It keeps the same vibe of the person or character throughout the whole video, no matter how long it is.

    無論影片有多長,EMO都能保持整個影片中人物或角色的相同氛圍。

  • Plus, it isn't picky about who it animates.

    此外,它對於要製作動畫的對象並不挑剔。

  • It could be someone super realistic, a character from your favorite anime, or even a 3D model, and it works with any kind of voice input: actual speech, singing, or computer-generated voices.

    它可以是非常逼真的人物,來自你最喜歡的動畫的角色,甚至是一個3D模型,而且它適用於任何類型的語音輸入,包括實際的語音、唱歌或是由電腦生成的聲音。

  • The cool part is you only need one picture.

    最酷的是,你只需要一張圖片。

  • Forget about hunting down a bunch of photos or videos to make something awesome.

    不必再辛苦搜尋一堆照片或影片來創造出令人驚艷的東西。

  • One single image is enough for EMO to work its magic.

    對EMO來說,只需要一張照片就足夠發揮其魔力。

  • It actually nails the subtle details of how people talk and sing, bringing animation so close to real-life movements.

    它能捕捉到人們說話和唱歌的微妙細節,使動畫的運動非常接近真實生活。

  • It keeps the essence of the character consistent even when they move or change expressions in different ways.

    即使在人物移動或表情變化時,也能保持人物本質的一致性。

  • It's like you can recognize them instantly, even if it's your first time seeing them.

    彷彿你能立即辨認出他們,即使是第一次見到。

  • And the emotions, they come through loud and clear, making the voice feel genuine even if it's not originally theirs.

    而情感也清晰可見,使聲音感覺真實,即使不是他們原本的聲音。

  • In short, EMO is an incredibly flexible and potent tool for crafting videos where people talk or sing.

    簡而言之,EMO是一個極其靈活而強大的工具,可用於製作人們說話或唱歌的影片。

  • Now, let's delve into the technical components that contribute to EMO's success.

    現在,讓我們深入探討一下 EMO 成功的技術要素。

  • EMO is composed of various modules that synergize to produce fluid, stable and lifelike motions.

    EMO 由各種模塊組成,這些模塊可協同產生流暢、穩定和逼真的動作。

  • The process starts with the audio encoder, which extracts acoustic features from the input audio, such as pitch, energy, and emotion.

    這個過程始於音頻編碼器,它從輸入的音頻中提取聲學特徵,如音調、能量和情感。

  • These features are crucial for driving the generation of mouth shapes and head movements.

    這些特徵對於生成口型和頭部動作至關重要。
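To make "acoustic features" concrete, here is a small hand-rolled sketch of two of the features named above, energy and pitch. It is not EMO's audio encoder, which the video does not describe in detail: the function names, the autocorrelation method, and the 50 to 400 Hz search range are all assumptions chosen for illustration. Emotion is left out because it cannot be read off the waveform with a simple formula.

```python
import math


def rms_energy(samples):
    """Root-mean-square amplitude of one audio window."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples, sample_rate, min_hz=50.0, max_hz=400.0):
    """Estimate the fundamental frequency by picking the lag with the
    largest autocorrelation inside an assumed speech pitch range."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # No positive correlation found (for example, silence): report 0 Hz.
    return sample_rate / best_lag if best_lag else 0.0


# Usage: a 0.1 second window of a synthetic 200 Hz tone sampled at 8 kHz.
window = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
features = {
    "energy": rms_energy(window),
    "pitch_hz": estimate_pitch_hz(window, 8000),
}
```

Features like these, computed per short window, are the kind of signal that could drive mouth shape (louder audio, wider mouth) and head motion over time.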

  • Following this, the reference encoder comes into play, encoding the visual identity of the reference image including aspects like face shape, skin tone and hairstyle.

    隨後,參考編碼器開始發揮作用,對參考影像的視覺特徵進行編碼,包括臉型、膚色和髮型等方面。

  • This ensures that the character's appearance is consistently maintained throughout the video.

    這樣可以確保在整個影片中始終保持角色的外觀。

  • The core of EMO is the diffusion model.

    EMO的核心是擴散模型。

  • It is a pivotal module that synthesizes video frames from the audio and reference features through a reverse diffusion process.

    一個關鍵模組,透過反向擴散過程從音訊和參考特徵合成視訊幀。

  • This model, having been trained on a vast dataset of talking-head videos, is adept at creating realistic and expressive facial motions.

    該模型經過大量頭部說話影片資料集的訓練,擅長創建逼真且富有表現力的臉部動作。

  • To enhance the temporal coherence and stability of the video, the temporal module processes frames in groups, effectively smoothing out any potential jitter or flicker.

    為了增強影片的時間連貫性和穩定性,時間模組將幀分組處理,有效消除任何潛在的抖動或閃爍。

  • The facial region mask is another critical module.

    臉部區域遮罩是另一個關鍵模組。

  • It focuses the generation efforts on key facial regions such as the mouth, eyes, and nose, thereby improving the detail and quality of the video, especially for lip-syncing.

    將生成工作集中在嘴巴、眼睛和鼻子等關鍵臉部區域,從而提高影片的細節和質量,尤其是口型同步。

  • Lastly, the speed control layer adjusts the pace of head movements to match the audio input, preventing unnaturally fast or slow motions and ensuring a more natural and consistent movement.

    最後,速度控制層調整頭部運動的速度以配合音訊輸入,防止不自然的快或慢運動,並確保更自然和一致的運動。
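The temporal module and speed control layer described above are learned parts of the network, and the video gives no implementation detail. As a loose analogy only, the sketch below shows the two effects on a plain list of per-frame head angles: averaging each frame with its neighbours to reduce jitter, and capping the frame-to-frame change to avoid unnaturally fast motion. The function names, the moving-average window, and the `max_step` limit are hypothetical stand-ins, not EMO's actual layers.

```python
def smooth_in_groups(values, group_size=3):
    """Centered moving average: each frame is averaged with the frames
    around it (fewer at the edges), which damps jitter and flicker."""
    half = group_size // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def limit_speed(angles, max_step):
    """Clamp the change between consecutive frames to at most max_step,
    so the head never moves faster than the chosen limit."""
    if not angles:
        return []
    limited = [angles[0]]
    for angle in angles[1:]:
        delta = max(-max_step, min(max_step, angle - limited[-1]))
        limited.append(limited[-1] + delta)
    return limited


# Usage: jittery head-yaw angles in degrees, one value per frame.
raw_yaw = [0.0, 3.0, 0.0, 3.0, 0.0, 20.0, 20.0]
stable_yaw = limit_speed(smooth_in_groups(raw_yaw, group_size=3), max_step=4.0)
```

Smoothing first and limiting speed second keeps the cap meaningful: the clamp then acts on the underlying motion rather than on frame-level noise.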

  • Now, this AI model opens up a wide range of potential applications from entertainment and education to telepresence and beyond.

    這種AI模型開闢了從娛樂和教育到遠端呈現等廣泛的潛在應用。

  • You can make your photos talk or sing or even create your own vocal avatar.

    你可以讓照片說話或唱歌,甚至創造自己的語音虛擬化身。

  • You can also use EMO to enhance your communication and expression by adding facial animation and emotion to your voice or text messages.

    你還可以使用 EMO 透過在語音或簡訊中加入臉部動畫和情感來增強你的溝通和表達。

  • You can also use it to create immersive and interactive experiences by animating historical figures, celebrities or fictional characters.

    你還可以使用它透過動畫歷史人物、名人或虛構人物來創造身臨其境的互動體驗。

  • It can also be used for social good, such as preserving cultural heritage, promoting language learning, or raising awareness.

    它還可用於社會公益,如保護文化遺產、促進語言學習或提高認識。

  • EMO is a game changer for content creation and it has the potential to revolutionize the way we communicate and interact with each other.

    EMO 改變了內容創作的遊戲規則,有可能徹底改變我們的溝通和互動方式。

  • But is EMO really the best out there?

    但EMO真的是最好的嗎?

  • Well, according to the researchers, EMO is superior to the current state-of-the-art methods in terms of expressiveness, realism and character identity preservation.

    研究人員表示,EMO 在表現力、真實性和角色身份保存方面優於目前最先進的模型。

  • Unlike others that might give you something stiff or odd looking, EMO's got the skills to create a wide range of believable facial expressions.

    與其他可能給你一些僵硬或奇怪的東西不同,EMO 擁有創造各種可信面部表情的技能。

  • It also avoids the common pitfalls like weird glitches or changes in the video that can make it look fake or off.

    它還避免了常見的陷阱,例如奇怪的故障或影片中的變化,這些缺陷可能會使影片看起來很假或不真實。

  • Plus, EMO's really good at making sure the person or character you start with looks like the same one throughout the video, something other technologies struggle with.

    另外,EMO 非常擅長確保你開始的人物或角色在整個影片中看起來都是同一個人或角色,這是其他技術難以做到的。

  • The team didn't just make these claims without backing them up. They put EMO through its paces with tests and studies to see how it measures up.

    該團隊並非只是在沒有支持的情況下提出這些主張。 他們透過測試和研究對 Emo 進行了測試,看看它的表現如何。

  • They used a bunch of different ways to check its performance, including something called expression-FID.

    他們使用了多種不同的方法來檢查其效能,包括一種稱為 expression-FID 的方法。

  • This test looks at how closely the video's expressions match up with the emotions in the audio it's paired with.

    該測試著重於影片的表情與其所配對的音訊中的情緒的匹配程度。

  • EMO came out on top with the lowest expression-FID score, meaning it was the most on point with its expressions.

    EMO 以最低的 expression-FID 得分名列前茅,這意味著它的表情最為準確。
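FID-style scores are Fréchet distances between the statistics of two sets of feature vectors, which is why a lower score is better. The video does not say which features expression-FID uses or how they are extracted, so the sketch below only illustrates the distance itself, under a simplifying assumption that each feature dimension is independent (diagonal covariance). The function names and the toy feature vectors are hypothetical.

```python
import math


def mean_and_std(vectors):
    """Per-dimension mean and population standard deviation."""
    count = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / count for d in range(dims)]
    stds = [math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / count)
            for d in range(dims)]
    return means, stds


def frechet_distance_sq(features_a, features_b):
    """Squared Fréchet distance between two Gaussians fitted to the
    feature sets, assuming diagonal covariances:
    sum((mean_a - mean_b)^2) + sum((std_a - std_b)^2)."""
    mean_a, std_a = mean_and_std(features_a)
    mean_b, std_b = mean_and_std(features_b)
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    spread_term = sum((a - b) ** 2 for a, b in zip(std_a, std_b))
    return mean_term + spread_term


# Usage: toy 2-D "expression features" for real and generated frames.
real_features = [[0.0, 0.0], [2.0, 0.0]]
generated_features = [[1.0, 0.0], [3.0, 0.0]]
score = frechet_distance_sq(real_features, generated_features)
```

The score is zero when the two sets have identical statistics and grows as the generated features drift away from the reference set in either average value or spread.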

  • They also got people to watch the videos and give their thoughts on how natural they seemed, how well they conveyed emotion and how accurately they kept the identity of the characters.

    他們還讓人們觀看影片,並評價他們的表現如何自然、如何很好地表達情感以及如何準確地保持角色的身份。

  • Again, EMO won out, earning the highest marks for making users happy with what they saw.

    EMO 再次獲勝,因讓用戶對所看到的內容感到滿意而獲得最高分。

  • Now, is it flawless?

    它完美無瑕嗎?

  • No. There are a few bumps in the road for EMO.

    不。EMO 的發展之路有一些坎坷。

  • Sometimes the videos it creates might have some weird bits or glitches, especially if the picture or sound it's working with isn't super clear, and there are moments when it doesn't quite get those little details right.

    有時,它生成的影片可能會有一些奇怪的片段或小故障,特別是如果它正在處理的圖像或聲音不是非常清晰,並且有時它不能完全正確地處理這些小細節。

  • Like a quick wink or a smile.

    比如一個快速的眨眼或微笑。

  • If someone's turning their head a lot or wearing something like glasses, EMO might not handle that too well.

    如果有人經常轉頭或戴眼鏡,EMO 可能無法處理得很好。

  • These issues mostly come down to what the system has learned from and how it's built.

    這些問題主要歸結於系統的訓練資料和構建方式。

  • But the folks behind EMO are on it, trying to make it better.

    但 EMO 背後的人們正在努力讓它變得更好。

  • They're looking into ways to give users more say in how things turn out, add more types of characters and make it even more interactive.

    他們正在研究如何讓用戶對事情的結果有更多發言權,增加更多類型的角色,並使其更具互動性。

  • It's still a bit of a work in progress, but the future looks bright for EMO.

    EMO 還在不斷發展壯大,但前景一片光明。

  • Keep in mind, EMO is still evolving.

    請記住,EMO 還在不斷發展。

  • The brains behind it are working tirelessly to fix any flaws and expand its capabilities, ensuring it only gets better from here.

    它背後的大腦正在不懈地修復任何缺陷並擴展它的功能,確保它以後會越來越好。

  • And that wraps up our video for today. I really hope you found EMO as fascinating as I do.

    今天的影片到此結束。我真心希望你能像我一樣發現 EMO 的魅力。

  • It's seriously one of the most mind-blowing pieces of tech I've come across, and I'm eager to see where it goes from here.

    說真的,這是我見過的最震撼人心的技術之一,我很想知道它以後的發展方向。

  • What about you? Thinking about giving it a whirl? Drop your thoughts in the comments.

    你呢? 考慮嘗試嗎? 在評論中留下你的想法。

  • If you enjoyed this dive into EMO and want to keep up with all things AI and tech, smash that like button, hit subscribe and turn on notifications.

    如果你喜歡這部深入瞭解 EMO 的影片,並想跟上 AI 和技術的發展,請按讚,訂閱並打開通知。

  • Thanks for watching and I'll catch you in the next video.

    感謝觀看,下部影片再見。
