  • A mildly fun thing to do when you're bored is start the beginning of a text message,

  • and then use only the suggested words to finish it.

  • "In five years I will see you in the morning and then you can get it."

  • The technology behind these text predictions is called a "language model": a computer

  • program that uses statistics to guess the next word in a sentence.
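As a rough illustration of "using statistics to guess the next word," here is a minimal sketch of a bigram counter in Python. It is not how any phone keyboard or the newer models discussed later actually work. The function names (`train_bigram_counts`, `suggest_next`, `autocomplete`) and the toy corpus are invented for this example. The sketch assumes the simplest possible statistic: count which word most often follows the current one, then always pick the top suggestion, as in the text-message game above.

```python
from collections import Counter, defaultdict

def train_bigram_counts(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following

def suggest_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

def autocomplete(following, start, max_words=8):
    """Extend a non-empty starting phrase by repeatedly taking the top suggestion."""
    message = start.lower().split()
    for _ in range(max_words):
        next_word = suggest_next(following, message[-1])
        if next_word is None:
            break
        message.append(next_word)
    return " ".join(message)

# Toy corpus, invented for illustration only.
corpus = "i will see you in the morning and then i will see you at night"
model = train_bigram_counts(corpus)

print(suggest_next(model, "will"))            # see
print(autocomplete(model, "i", max_words=3))  # i will see you
```

A model this small only ever looks one word back, which is one reason suggested-word messages drift into nonsense after a few words.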

  • And in the past year, other, newer language models have gotten really, weirdly good at

  • generating text that mimics human writing. "In five years, I will never return to this

  • place. He felt his eye sting and his throat tighten."

  • The program completely made this up. It's not taken from anywhere else and it's not

  • using a template made by humans. For the first time in history, computers can

  • write stories. The only problem is that it's easier for machines to write fiction than

  • to write facts.

  • Language models are useful for a lot of reasons.

  • They help recognize speech properly when sounds are ambiguous in speech-to-text

  • applications. And they can make translations more fluent

  • when a word in one language maps to multiple words in another.

  • But if you asked language models to simply generate passages of text, the results

  • never made much sense. SHANE: And so the kinds of things that made

  • sense to do were like generating single words or very short phrases.

  • For years, Janelle Shane has been experimenting with language generation for her blog AI Weirdness.

  • Her algorithms have generated paint colors, "Bull Cream,"

  • Halloween costumes, "Sexy Michael Cera,"

  • and pick-up lines, "You look like a thing and I love you."

  • But this is what she got in 2017 when she asked for longer passages, like the first

  • lines of a novel: SHANE: The year of the island is discovered

  • the Missouri of the galaxy like a teenage lying and always discovered the year of her

  • own class-writing bed ... It makes no sense. Compare that to this opening line from a newer

  • language model called GPT-2. SHANE: It was a rainy, drizzling day in the

  • summer of 1869. And the people of New York, who had become accustomed to the warm, kissable

  • air of the city, were having another bad one. JOSS: It's like it's getting better at bullsh*tting

  • us. SHANE: Yes, yes, it is very good at generating

  • scannable, readable bullsh*t. Going from word salad to pretty passable prose

  • took a new approach in the field of natural language processing.

  • Typically, language tasks have required carefully structured data. You need thousands of correct

  • examples to train the program. For translation you need a bunch of samples

  • of the same document in multiple languages. For spam filters, you need emails that humans

  • have labeled as spam. For summarization, you need full documents

  • plus their human-written summaries. Those data sources are limited and can take a lot

  • of work to collect. But if the task is to simply guess the next

  • word in a sentence, the problem comes with its own solution.
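To picture how the problem "comes with its own solution," the sketch below turns a plain sentence into training examples in which the answer is simply the word that actually comes next. This is an illustrative assumption about how such examples could be laid out, not the preprocessing used by any particular model. The function name `next_word_examples` and the `context_size` parameter are hypothetical.

```python
def next_word_examples(text, context_size=3):
    """Turn raw text into (context, target) pairs for next-word prediction.

    No human labeling is involved: the target for each example is just the
    word that follows the context in the original text.
    """
    words = text.split()
    examples = []
    for i in range(1, len(words)):
        # Keep at most `context_size` words immediately before position i.
        context = words[max(0, i - context_size):i]
        examples.append((context, words[i]))
    return examples

# Any human-written sentence works as input; this one is taken from the narration.
sentence = "the problem comes with its own solution"
for context, target in next_word_examples(sentence):
    print(" ".join(context), "->", target)
```

Every position in the text yields one example, so a seven-word sentence gives six pairs, and a large collection of books or web pages gives a correspondingly large number.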

  • So the training data can be any human-written text, no labeling required. This is called

  • “self-supervised learning.” That's what makes it easy and inexpensive to gather data,

  • which means you can use a LOT of it. Like all of Wikipedia, or 11,000 books, or

  • 8 million web sites. With that amount of data, plus serious

  • computing resources, and a few tweaks to the architecture and size of the algorithms, these

  • new language models build vast mathematical maps of how every word correlates with every

  • other word, all without being explicitly told any of the rules of grammar or syntax.

  • That gives them fluency with whatever language they're trained on, but it doesn't mean

  • they know what's true or false. To get language models to generate true stories,

  • like summarizing documents or answering questions accurately, it takes extra training.

  • The simplest thing to do without much more work is just generate passages of text, which

  • are both superficially coherent and also false. GEITGEY: So give me any headline that you

  • want a fake news story for. JOSS: Scientists discover Flying Horse.

  • Adam Geitgey is a software developer who created a fake news website populated entirely with

  • generated text. He used a language model called Grover, which

  • was trained on news articles from 5,000 publications. “More than 1,000 years ago, archaeologists

  • unearthed a mysterious flying animal in France and hailed it the 'Winged Horse of Afzel'

  • or 'Horse of Wisdom'” GEITGEY: This is amazing, right? Like this

  • is crazy. JOSS: So crazy.

  • GEITGEY: "The animal, which is the size of a horse, was not easy." If we just Google

  • that. Like there's nothing. JOSS: It doesn't exist anywhere.

  • GEITGEY: And I don't want to say this is perfect. But just from a longer term point of view

  • of what people were really excited about three years ago versus what people can do now, like

  • this is just like a huge, huge leap. If you read closely, you can see that the

  • model is describing a creature that is somehow both “mouse-like” and “the size of a

  • horse.” That's because it doesn't actually know

  • what it's talking about. It's simply mimicking the writing style of a news reporter.

  • These models can be trained to write in the voice of any source, like a Twitter feed,

  • “I'd like to be very clear about one thing. shrek is not based on any actual biblical

  • characters. not even close.” Or whole subreddits.

  • “I found a potato on my floor.” “A lot of people use the word 'potato'

  • as an insult to imply they are not really a potato, they just 'looked like' one.”

  • “I don't mean insult, I mean as in as in the definition of the word potato.”

  • “Fair enough. The potato has been used in various ways for a long time.”

  • But we may be entering a time when AI-generated text isn't so funny anymore.

  • “Islam has taken the place of Communism as the chief enemy of the West.”

  • Researchers have shown that these models can be used to flood government websites with

  • fake public comments about policy proposals, post tons of fake business reviews, argue

  • with people online, and generate extremist and racist posts that can make fringe opinions

  • seem more popular than they really are. GEITGEY: It's all about like taking something

  • you could do and then just increasing the scale of it, making it more scalable and cheaper.

  • The good news is that some of the developers who built these language models also built

  • ways to detect much of the text generated through their models. But it's not clear

  • who has the responsibility to fake-check the internet.

  • And as bots become even better mimics, with faces like ours, voices like ours, and now our language, those

  • of us made of flesh and blood may find ourselves increasingly burdened with not only detecting

  • what's fake, but also proving that we're real.

Computers just got a lot better at writing

  • Posted by 林宜悉 on January 14, 2021