  • [Sebastian Thrun] So what's your take on how to build a search engine,

    建立搜尋引擎 (search engine) ,你有什麼收穫呢?

  • you've built one before, right?


  • [Sergey Brin - Co-Founder, Google] Yes. I think the most important thing


  • if you're going to build a search engine


  • is to have a really good corpus to start out with.

    是從一個非常好的語料庫 (corpus) 開始

  • In our case we used the world wide web, which at the time was certainly smaller than it is today.

    我們以前使用 WWW,它比今天的 WWW 小多了

  • But it was also very new and exciting.


  • There were all sorts of unexpected things there.


  • [David Evans] So the goal for the first three units for the course is to build that corpus.


  • And we want to build the corpus for our search engine


  • by crawling the web and that's what a web crawler does.

    爬行網頁是網頁蜘蛛 (web crawler) 的工作

  • What a web crawler is, it's a program that collects content from the web.


  • If you think of a web page that you see in your browser, you have a page like this.


  • And we'll use the udacity site as an example web page.

    我們將使用 udacity 的網站做為網頁的例子

  • It has lots of content, it has some images, it has some text.


  • All of this comes into your browser when you request the page.

    當您請求這個網頁時, 所有的內容都來到你的瀏覽器 (browser)

  • The important thing that it has is links.

    重要的是,網頁含有連結 (link)

  • And what a link is, is something that goes to another page.

    連結 (link) 是什麼? link 通往另一個網頁

  • So we have a link to the frequently asked questions,

    有一個通往「常見問題」的 link

  • we have a link to the CS 101 page.

    有一個通往 CS101 網頁的 link

  • There's some other links on the page.

    還有其他一些 link

  • And that link may show in your browser with an underscore,

    link 在瀏覽器中顯示的時候,可能帶有底線

  • it may not, depending on how your browser is set.


  • But the important thing that it does,


  • is it's a pointer to some other web page.

    link 是通往其他網頁的指引

  • And those other web pages may also have links

    而其他網頁也可能含有 link

  • so we have another link on this page.

    這個網頁上有另一個 link

  • Maybe it's to my name, you can follow to my home page.

    也許它通往我的名字,你可以跟隨它通往我的首頁 (homepage)

  • And all the pages that we can find with our web crawler


  • are found by following the links.

    都是跟隨 link 而找到的

  • So it won't necessarily find every page on the web.


  • If we start with a good seed page

    如果我們從一個好的種子網頁 (seed page) 開始

  • we'll find lots of pages, though.


  • And what the crawler's gonna do is start with one page,


  • find all the links on that page, follow them to find other pages

    找出網頁中所有的 link,跟隨它們,找到其他的網頁

  • and then on those other pages it will follow the links on those pages

    然後在這些網頁裡,繼續跟隨網頁中的 link

  • to find other pages and there will be lots more links on those pages.

    以找到其他網頁,那些網頁中有更多的 link

  • And eventually we'll have a collection of lots of pages on the web.


  • So that's what we want to do to build a web crawler.


  • We want to find some way to start from one seed page,

    我們希望找到方法,從一個 seed page 開始

  • extract the links on that page,

    擷取網頁上的 link

  • follow those links to other pages,

    跟隨這些 link 找到其他的網頁

  • then collect the links on those other pages,

    然後收集那些網頁中的 link

  • follow them, collect all that.

    跟隨它們,收集所有的 link
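The crawl described above (start from a seed page, extract its links, follow them, repeat) can be sketched roughly as follows. This is only an illustrative sketch, not the course's own code: the names `FAKE_WEB`, `get_page`, `get_all_links`, and `crawl_web` are hypothetical, and a small in-memory dictionary stands in for the real web so the example runs without a network connection.

```python
import re

# Hypothetical stand-in for the web: maps a URL to that page's HTML.
FAKE_WEB = {
    "seed": '<a href="a">A</a> <a href="b">B</a>',
    "a": '<a href="b">B</a> <a href="c">C</a>',
    "b": "no links here",
    "c": '<a href="seed">back to seed</a>',
}

def get_page(url):
    """Return the content of a page, or an empty string if it is unknown."""
    return FAKE_WEB.get(url, "")

def get_all_links(page):
    """Return every URL found in an <a href="..."> tag on the page."""
    return re.findall(r'<a href="([^"]*)"', page)

def crawl_web(seed):
    """Follow links outward from the seed page; return the pages visited."""
    to_crawl = [seed]   # pages we know about but have not visited yet
    crawled = []        # pages already visited, in visiting order
    while to_crawl:
        url = to_crawl.pop()
        if url in crawled:
            continue    # skip pages we have already seen
        crawled.append(url)
        for link in get_all_links(get_page(url)):
            if link not in crawled:
                to_crawl.append(link)
    return crawled

print(crawl_web("seed"))
```

Keeping a list of pages already crawled is what stops the loop when pages link back to each other, as "c" links back to "seed" here.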

  • So that sounds like a lot to do.


  • We're not going to do all that in this first class.


  • What we're going to do this first unit, is just extract a link.

    第一單元要做的,只是擷取一個 link

  • So we're going to start with a bunch of text.


  • It's going to have a link in it with a URL.

    其中帶有 URL 的 link

  • What we want to find is that URL,

    我們要找出那個 URL

  • so we can request the next page.
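One possible way to pull the first URL out of a piece of text is sketched below. It assumes the link is written as an HTML `<a href="...">` tag with double quotes; the function name `get_next_target` and the sample text are hypothetical, chosen only for illustration.

```python
def get_next_target(page):
    """Return (url, end_quote_index) for the first link, or (None, 0)."""
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0                        # no link tag in the text
    start_quote = page.find('"', start_link)  # opening quote of the URL
    if start_quote == -1:
        return None, 0
    end_quote = page.find('"', start_quote + 1)  # closing quote of the URL
    if end_quote == -1:
        return None, 0
    url = page[start_quote + 1:end_quote]
    return url, end_quote

page = 'Some text <a href="http://example.com/next">next page</a> more text'
url, end_pos = get_next_target(page)
print(url)  # prints http://example.com/next
```

Returning the position of the closing quote alongside the URL makes it possible to resume the search from that point later.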


  • The goal for the second unit


  • is to be able to keep going.


  • If there are many links on one page, you will want to be able to find them all.

    如果網頁中有很多 link,你要把它們全找出來

  • So that's what we'll do in unit 2,


  • is to figure out how to keep going to extract all those links.

    是要弄清楚如何持續的擷取所有的 link
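As a rough sketch of "keep going", the loop below repeats the search for a link tag until none are left, collecting each URL along the way. It assumes links appear as `<a href="...">` with double quotes, and the name `get_all_links` is hypothetical.

```python
def get_all_links(page):
    """Return a list of every URL found in <a href="..."> tags, in order."""
    marker = '<a href="'
    links = []
    pos = 0
    while True:
        start_link = page.find(marker, pos)
        if start_link == -1:
            break                       # no more links on this page
        start_url = start_link + len(marker)
        end_url = page.find('"', start_url)
        if end_url == -1:
            break                       # unterminated link; stop searching
        links.append(page[start_url:end_url])
        pos = end_url + 1               # continue after the link just found
    return links

page = '<a href="a.html">A</a> some text <a href="b.html">B</a>'
for link in get_all_links(page):
    print(link)
```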

  • In unit three, well, we want to go beyond just one page.


  • So by the end of unit two we can print out all the links on one page.

    第二單元結束時,我們能夠印出一個網頁中的所有 link

  • For unit 3 we want to collect all those links, so we can keep going,

    第三單元我們要收集所有的 link,才可以持續下去

  • and end up allowing our crawler to collect many, many pages.


  • So by the end of unit three we'll have built a web crawler.


  • We'll have a way of building our corpus.


  • Then the remaining three units will look at how to actually respond to queries.

    剩下三個單元重點在於,如何回應查詢 (queries)

  • So in unit four we'll figure out how to give a good response.


  • So if you search for a keyword, you want to get a response that's a list of the pages

    當搜索一個關鍵字 (keyword) 時, 我們要給出一個網頁列表 (list) 當作回應

  • where that keyword appears.
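Responding to a keyword with the list of pages where it appears could look roughly like the sketch below. It is a simplified illustration only: the names `build_index` and `lookup`, the sample pages, and the choice to split text on whitespace and ignore case are all assumptions, not details taken from the course.

```python
def build_index(pages):
    """Map each word to the list of URLs of the pages that contain it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            urls = index.setdefault(word, [])
            if url not in urls:         # record each page once per word
                urls.append(url)
    return index

def lookup(index, keyword):
    """Return the URLs where the keyword appears, or an empty list."""
    return index.get(keyword.lower(), [])

# Hypothetical corpus: URL -> page text.
pages = {
    "http://example.com/a": "web crawler basics",
    "http://example.com/b": "search engine and crawler",
}
index = build_index(pages)
print(lookup(index, "crawler"))
```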


  • And we'll figure out in unit five a way to do that, that scales, if we have a large corpus.

    第五單元我們將思考, 如果有一個很大的語料庫,如何擴展規模

  • And then in unit six what we want to do is, well, we don't just want to find a list,


  • we want to find the best one.


  • So we'll figure out how to rank all the pages where that keyword appears.


  • So we're getting a little ahead of ourselves now,


  • because all we're going to do for unit one,


  • is to figure out how to extract a link from the page.

    思考如何從網頁中擷取一個 link

  • And the search engine that we'll build at the end of this


  • will be a functional search engine.


  • It will have the main components that a search engine like Google has.

    它將擁有像 Google 這種搜尋引擎所具備的主要元件

  • It certainly won't be as powerful as Google will be,

    它當然不會像 Google 那麼強大

  • we want to keep things simple.


  • We want to have a small amount of code to write.


  • And we should remember that our real goal


  • is not as much to build a search engine,


  • but to use the goal of building a search engine as a vehicle


  • for learning about computer science


  • and learning about programming


  • so the things we learn by doing this


  • will allow us to solve lots and lots of other problems.

