
  • Ten years ago, computer vision researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible, even with the significant advance in the state of artificial intelligence.

  • Now we can do it at a level greater than 99 percent accuracy.

  • This is called image classification (give it an image, put a label to that image), and computers know thousands of other categories as well.

  • I'm a graduate student at the University of Washington, and I work on a project called Darknet, which is a neural network framework for training and testing computer vision models.

  • So let's just see what Darknet thinks of this image that we have.

  • When we run our classifier on this image, we see we don't just get a prediction of dog or cat, we actually get specific breed predictions.

  • That's the level of granularity we have now.

  • And it's correct. My dog is in fact a malamute.

  • So we've made amazing strides in image classification, but what happens when we run our classifier on an image that looks like this?

  • Well ...

  • We see that the classifier comes back with a pretty similar prediction.

  • And it's correct, there is a malamute in the image, but just given this label, we don't actually know that much about what's going on in the image.

  • We need something more powerful.

  • I work on a problem called object detection, where we look at an image and try to find all of the objects, put bounding boxes around them and say what those objects are.

  • So here's what happens when we run a detector on this image.

  • Now, with this kind of result, we can do a lot more with our computer vision algorithms.

  • We see that it knows that there's a cat and a dog. It knows their relative locations, their size. It may even know some extra information. There's a book sitting in the background.

  • And if you want to build a system on top of computer vision, say a self-driving vehicle or a robotic system, this is the kind of information that you want. You want something so that you can interact with the physical world.

  • Now, when I started working on object detection, it took 20 seconds to process a single image.

  • And to get a feel for why speed is so important in this domain, here's an example of an object detector that takes two seconds to process an image.

  • So this is 10 times faster than the 20-seconds-per-image detector, and you can see that by the time it makes predictions, the entire state of the world has changed, and this wouldn't be very useful for an application.

  • If we speed this up by another factor of 10, this is a detector running at five frames per second.

  • This is a lot better, but for example, if there's any significant movement, I wouldn't want a system like this driving my car.

  • This is our detection system running in real time on my laptop.

  • So it smoothly tracks me as I move around the frame, and it's robust to a wide variety of changes in size, pose, forward, backward.

  • This is great. This is what we really need if we're going to build systems on top of computer vision.

  • (Applause)

  • So in just a few years, we've gone from 20 seconds per image to 20 milliseconds per image, a thousand times faster.
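
The speeds quoted so far can be put on one scale by converting per-image latency into frames per second. The sketch below is only arithmetic on the figures mentioned in the talk (20 seconds, 2 seconds, a further tenfold speedup, and 20 milliseconds); the function name `frames_per_second` and the dictionary labels are made up for illustration.

```python
def frames_per_second(latency_ms: float) -> float:
    """Convert per-image processing time in milliseconds to frames per second."""
    return 1000.0 / latency_ms


# Per-image latencies mentioned in the talk, in milliseconds.
latencies_ms = {
    "early detector": 20_000,  # 20 seconds per image
    "10x faster": 2_000,       # 2 seconds per image
    "another 10x": 200,        # the five-frames-per-second detector
    "real time": 20,           # 20 milliseconds per image
}

for name, ms in latencies_ms.items():
    print(f"{name}: {frames_per_second(ms):g} fps")

# Overall speedup from 20 seconds to 20 milliseconds per image.
speedup = latencies_ms["early detector"] / latencies_ms["real time"]
print(f"speedup: {speedup:g}x")
```

Working in integer milliseconds keeps the divisions exact, so the thousandfold figure falls out without rounding noise.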

  • How did we get there?

  • Well, in the past, object detection systems would take an image like this and split it into a bunch of regions and then run a classifier on each of these regions, and high scores for that classifier would be considered detections in the image.

  • But this involved running a classifier thousands of times over an image, thousands of neural network evaluations to produce detection.

  • Instead, we trained a single network to do all of detection for us.

  • It produces all of the bounding boxes and class probabilities simultaneously.

  • With our system, instead of looking at an image thousands of times to produce detection, you only look once, and that's why we call it the YOLO method of object detection.
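
To get a feel for where "thousands of times" comes from, the sketch below counts classifier evaluations for a region-based detector. The talk does not say how the regions were chosen, so everything here is an assumption for illustration: square sliding windows of three sizes, a fixed stride, and a 448 by 448 pixel image. The single-network approach is represented simply as one evaluation.

```python
def count_windows(image_size: int, window: int, stride: int) -> int:
    """Number of square windows of a given size that fit in a square image."""
    per_axis = (image_size - window) // stride + 1
    return per_axis * per_axis


IMAGE_SIZE = 448               # assumed square input, in pixels
WINDOW_SIZES = (64, 128, 256)  # assumed region sizes, in pixels
STRIDE = 8                     # assumed step between neighbouring windows

# Region-based detection: one classifier evaluation per window.
region_based_evals = sum(
    count_windows(IMAGE_SIZE, w, STRIDE) for w in WINDOW_SIZES
)

# Single-network detection: one forward pass over the whole image.
single_pass_evals = 1

print(f"region-based evaluations: {region_based_evals}")
print(f"single-pass evaluations: {single_pass_evals}")
```

Under these assumed settings the region-based count lands in the thousands, which is the cost the single-pass design avoids.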

  • So with this speed, we're not just limited to images; we can process video in real time.

  • And now, instead of just seeing that cat and dog, we can see them move around and interact with each other.

  • This is a detector that we trained on 80 different classes in Microsoft's COCO dataset.

  • It has all sorts of things like spoon and fork, bowl, common objects like that.

  • It has a variety of more exotic things: animals, cars, zebras, giraffes.

  • And now we're going to do something fun.

  • We're just going to go out into the audience and see what kind of things we can detect.

  • Does anyone want a stuffed animal?

  • There are some teddy bears out there.

  • And we can turn down our threshold for detection a little bit, so we can find more of you guys out in the audience.
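
Turning down the threshold can be pictured as a filter over scored detections: each detection carries a label, a confidence score and a bounding box, and only those at or above the threshold are shown. The sketch below is a minimal illustration, not Darknet's actual code; the `Detection` type, the `keep_confident` helper and all the numbers are made up.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    confidence: float               # score in [0, 1]
    box: tuple[int, int, int, int]  # (x, y, width, height) in pixels


def keep_confident(detections, threshold):
    """Return only detections whose confidence is at least the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up detections, for illustration only.
detections = [
    Detection("person", 0.92, (40, 30, 120, 260)),
    Detection("teddy bear", 0.61, (300, 180, 80, 90)),
    Detection("backpack", 0.34, (210, 220, 60, 70)),
    Detection("person", 0.27, (500, 60, 50, 110)),
]

strict = keep_confident(detections, 0.5)    # higher threshold, fewer boxes
relaxed = keep_confident(detections, 0.25)  # lower threshold, more boxes

print([d.label for d in strict])
print([d.label for d in relaxed])
```

A lower threshold surfaces more objects at the price of admitting less certain guesses, which is the trade-off made in the live demo.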

  • Let's see if we can get these stop signs.

  • We find some backpacks.

  • Let's just zoom in a little bit.

  • And this is great.

  • And all of the processing is happening in real time on the laptop.

  • And it's important to remember that this is a general purpose object detection system, so we can train this for any image domain.

  • The same code that we use to find stop signs or pedestrians, bicycles in a self-driving vehicle, can be used to find cancer cells in a tissue biopsy.

  • And there are researchers around the globe already using this technology for advances in things like medicine, robotics.

  • This morning, I read a paper where they were taking a census of animals in Nairobi National Park with YOLO as part of this detection system.

  • And that's because Darknet is open source and in the public domain, free for anyone to use.

  • (Applause)

  • But we wanted to make detection even more accessible and usable, so through a combination of model optimization, network binarization and approximation, we actually have object detection running on a phone.

  • (Applause)
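
The talk names network binarization without describing it. One common form of weight binarization, used in XNOR-Net-style approaches, replaces each weight with its sign multiplied by a shared scale factor equal to the mean absolute weight. The sketch below illustrates that idea on a single dot product; whether the phone demo works exactly this way is an assumption, and the weights and inputs are made up.

```python
def binarize(weights):
    """Approximate weights by alpha * sign(w), with alpha = mean(|w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


weights = [0.5, -0.25, 0.75, -0.5]  # made-up filter weights
inputs = [4.0, 1.0, 3.0, 2.0]       # made-up activations

alpha, signs = binarize(weights)

exact = dot(weights, inputs)         # full-precision result
approx = alpha * dot(signs, inputs)  # one scale factor, weights of +1 or -1

print(f"alpha={alpha}, signs={signs}")
print(f"exact={exact}, approx={approx}")
```

The binarized version stores one bit per weight plus a single scale factor and swaps most multiplications for additions and subtractions, trading some accuracy for a much smaller and cheaper model.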

  • And I'm really excited because now we have a pretty powerful solution to this low-level computer vision problem, and anyone can take it and build something with it.

  • So now the rest is up to all of you and people around the world with access to this software, and I can't wait to see what people will build with this technology.

  • Thank you.

  • (Applause)


[TED] Joseph Redmon: How computers learn to recognize objects instantly

Posted by Caurora on January 14, 2021