How YouTube knows what you should watch: Crash Course AI #15

Jabril: So, John-Green-bot, you know when you let me use your computer the other day?
Well, I went on YouTube and it was like seeing a completely different website. There were
videos about restoring old VCRs and different kinds of cassette tapes, and ads for motor
oil?!
John-Green-bot: Yes, Jabril! I love learning about other machines.
Jabril: Okay, but do you even know what humans are watching these days? What about those
Boston Dynamics videos?
John-Green-bot: No. The humans in those videos are so mean to the robots! What about Epic
Computation Battles of History, or MKB-AI, or Robot Appétit?
Jabril: …… what?
INTRO
Hi, I'm Jabril and welcome to Crash Course AI! Recommender systems are a type of AI that
tries to understand our brains and make useful recommendations to us.
This kind of AI can guide the things we watch by recommending YouTube videos or shows on
Netflix for example.
On Amazon, it's recommending items to buy. When I search on Google, it's recommending
relevant and interesting links. And everywhere online, advertisement servers are trying to
recommend products and services.
Recommender systems combine supervised learning and unsupervised learning techniques to learn
about us.
And because we're so complicated, recommending stuff to us is a tough problem that can produce
lots of unexpected results.
Maybe we get caught in an online bubble and only see tweets from our friends and people
who think like us. Maybe we miss a new TV show because streaming sites don't think
we'd like it. Or maybe that creepy thing happens where you're talking to your friends
about supercomputers and then every single ad you see for the next day is for supercomputers?!?
AI that make recommendations can really change what version of the internet we all see. But
to understand the benefits and drawbacks of these algorithms, we have to understand where
they get their data and how they work.
As an example, let's focus on an algorithm that could recommend YouTube videos. Because
“The Algorithm” is a really big deal if YouTube is your job, and everyone's talking
about the mysterious changes behind the algorithm anyway.
Three common approaches are content-based recommendations, social recommendations, and
personalized recommendations.
Content-based recommendations look at the content of the videos, not the audience.
Like, for example, our algorithm may decide to recommend more recent videos, or videos
that are made by someone on a list of “quality creators.” But this means someone has to
decide who “quality creators” are, or program an AI that tries to predict creator
quality.
On the other hand, social recommendations pay attention to the audience.
YouTube is on the internet so we can use social ratings such as “likes” or “views”
or “watch time” to decide what people are watching and should be recommended. But
not everybody likes the same stuff, so maybe pure popularity isn't the way to go.
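Here's a minimal sketch of what a purely social recommender could look like, assuming
we only have a count for each video. The video titles, the numbers, and the
rank_by_popularity helper are all made up for illustration.

```python
# Hypothetical engagement counts; titles and numbers are invented.
videos = {
    "VCR restoration": {"views": 1200, "likes": 300},
    "Robot dance-off": {"views": 90000, "likes": 4000},
    "Cassette tape history": {"views": 800, "likes": 250},
}


def rank_by_popularity(stats, signal="views"):
    """Return titles sorted from most to least popular by one social signal."""
    return sorted(stats, key=lambda title: stats[title][signal], reverse=True)


top = rank_by_popularity(videos)
print(top)
```

Notice that the function never looks at who is asking, so every user gets the same
list. That's the weakness of pure popularity.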
Different people have different preferences, so our AI can incorporate that with personalized
recommendations.
If you like this Crash Course video, maybe we'd recommend other Crash Course videos
or videos from my channel. But the problem with personalized recommendations is that
it might be difficult to stumble onto new interesting stuff.
So, to get the best of all worlds, recommender systems generally use collaborative filtering,
which combines all three of these recommenders.
When we see a recommendation on YouTube, it could be because that video is similar to
other videos that we've watched and liked and other people who have similar tastes watched
and liked that video. Or (especially if you're new to YouTube) that video might be recommended
because it's popular and lots of people are watching and liking it.
Collaborative filtering combines several of the techniques we've already talked about
in Crash Course AI. It uses unsupervised learning to find similar people or content, and it
tries to use data from those things to predict how we would feel about something we haven't
even seen yet.
To see how collaborative filtering works, let's use a simple example.
In this table, YouTube channels are represented as columns. So, here, one column represents
CrashCourse, one is Jabrils, one is The Best of BattleBots, one is The Art Assignment,
and so on.
Specific users that watch YouTube videos are represented as rows. So this row is John-Green-bot,
this one is me, these two are a couple random folks, this one is our producer Brandon, and
so on.
Each cell in the table corresponds to whether the user subscribes to a specific channel
or not. 1 means they've watched at least one video and subscribed, 0 means they've
watched at least one video and didn't subscribe, and the cell is empty if they haven't seen
any videos.
If we look at John-Green-bot's row, he subscribes to Crash Course and Jabrils, so those cells
have a 1. He saw The Best of BattleBots and did not subscribe, because of all the robot-on-robot
violence, so that's a 0. And he's never seen The Art Assignment so there's no information
in that cell.
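One way to picture this table in code is a dictionary of rows, with None standing in
for an empty cell. This is just a sketch: John-Green-bot's row follows the example
above, but the other rows and the unseen_channels helper are made up.

```python
# Columns are channels; each row is one user.
# 1 = watched and subscribed, 0 = watched but didn't subscribe,
# None = hasn't seen any videos from that channel (an empty cell).
channels = ["CrashCourse", "Jabrils", "The Best of BattleBots", "The Art Assignment"]

ratings = {
    "John-Green-bot": [1, 1, 0, None],  # values from the example above
    "Jabril": [1, 1, 1, 1],             # made-up values
    "Brandon": [1, None, 0, 1],         # made-up values
}


def unseen_channels(user):
    """Channels with an empty cell for this user: the ones we need to predict."""
    return [name for name, cell in zip(channels, ratings[user]) if cell is None]


print(unseen_channels("John-Green-bot"))
```

Keeping "empty" separate from 0 matters here, because a 0 is real information about
what someone didn't like, and an empty cell is no information at all.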
To recommend new channels for John-Green-bot, our collaborative filtering AI needs to predict
how likely he is to subscribe to a channel he's never seen before. In this case, let's
see if The Art Assignment ends up in his recommendations.
To make a prediction, the algorithm needs to look at which other people have subscribed
to the Art Assignment. And because YouTube tastes are very personal, instead of looking
at all other users, our algorithm will focus on finding the users who are most similar
to John-Green-bot.
Finding similar things is a classic unsupervised learning problem. Our AI can look at all the
rows, cluster together similar users, and then pick some of those that are most similar
to John-Green-bot and who have seen The Art Assignment.
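Here's a rough sketch of the "find similar users" step. To keep it short, it skips a
real clustering algorithm and swaps in a simple agreement score plus a nearest-neighbor
pick, so treat it as a stand-in. The user names, the rows, and both helper functions are
assumptions made up for this example.

```python
def similarity(a, b):
    """Fraction of channels both users have seen on which their ratings agree."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0  # nothing in common to compare
    return sum(x == y for x, y in shared) / len(shared)


def most_similar(target, others, target_channel, k=2):
    """Names of the k users most like target who have seen target_channel."""
    candidates = [
        (similarity(target, row), name)
        for name, row in others.items()
        if row[target_channel] is not None
    ]
    candidates.sort(reverse=True)
    return [name for _, name in candidates[:k]]


# Columns: CrashCourse, Jabrils, The Best of BattleBots, The Art Assignment
john_green_bot = [1, 1, 0, None]
others = {
    "user_a": [1, 1, 0, 1],
    "user_b": [0, 0, 1, 0],
    "user_c": [1, 1, 1, 1],
    "user_d": [1, 1, 0, None],  # similar, but hasn't seen The Art Assignment
}

neighbors = most_similar(john_green_bot, others, target_channel=3)
print(neighbors)
```

Even though user_d matches John-Green-bot perfectly, they get skipped, because they
can't tell us anything about the channel we're trying to predict.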
Let's just say there are 1000 of these specific users, but there are other clusters with thousands
of users too that these recommender systems take into consideration.
Now, we have a classic supervised learning problem: training an AI to make predictions
based on past examples. In this case, we're training an AI to predict a 1 or 0 (subscribe
or not) for John-Green-bot based on other users.
We can re-adjust the results so that ratings from the cluster of 1000 most similar users
are given more weight in the final prediction, compared to those other clusters. And after
the predictions are sorted, our AI does predict that John-Green-bot would subscribe
to The Art Assignment, so it gets recommended to him… along with some other new channels.
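The weighting idea can be sketched as a similarity-weighted average of what other users
did. This is one simple way to do it, not necessarily what any real site uses, and the
evidence values and the predict_subscribe helper are made up for illustration.

```python
def predict_subscribe(evidence):
    """Similarity-weighted average of other users' 0/1 ratings.

    evidence is a list of (similarity, rating) pairs, where rating is
    1 (subscribed) or 0 (did not subscribe).
    """
    total_weight = sum(sim for sim, _ in evidence)
    if total_weight == 0:
        return None  # no usable evidence
    return sum(sim * rating for sim, rating in evidence) / total_weight


# Made-up evidence about The Art Assignment: the most similar users
# subscribed, while the less similar ones mostly did not.
evidence = [(1.0, 1), (0.9, 1), (0.8, 0), (0.2, 0), (0.1, 0)]

score = predict_subscribe(evidence)
plain_average = sum(rating for _, rating in evidence) / len(evidence)

print(round(score, 2), plain_average)
```

With these numbers the plain average says "probably not" (below 0.5), but once similar
users count for more, the score tips over 0.5 and the channel would get recommended.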
Recommender systems that use collaborative filtering AI can take in lots of different
data, not just a 1 or a 0, for whether a user subscribed to a YouTube channel or bought
a product. A movie rating site might use a one-to-five star rating system. Or a social
media AI could keep track of the number of milliseconds a user dwells on a post.
Regardless, the basic strategy is the same: use known information from users to predict
preferences. And this can get complicated on big websites that gather lots of user information
and combine several different algorithms.
The real world is full of a lot of data and there are three key problems that can lead
to recommender systems making small or big mistakes.
First, datasets that recommender system AIs get are usually very sparse. Most people don't
watch most shows or videos -- there just isn't enough time! And even fewer people give social
ratings such as “likes.”
Doing any kind of analysis with sparse datasets is very computationally intense, which gets
expensive, which means some companies are willing to lose some accuracy to reduce costs.
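To get a feel for what "sparse" means, here's a small sketch that measures how much of
a ratings table is empty. The table and the sparsity helper are made up, and real
datasets are usually far emptier than this.

```python
def sparsity(table):
    """Fraction of cells in the table that have no rating (None)."""
    cells = [cell for row in table for cell in row]
    return sum(cell is None for cell in cells) / len(cells)


# Three users, four channels, and only three known ratings.
table = [
    [1, None, None, None],
    [None, 0, None, None],
    [None, None, None, 1],
]

print(sparsity(table))
```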
Second, there's the cold start problem. When we go on a website for the first time,
for example, the AI doesn't know enough about us to provide good personalized recommendations.
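One common workaround, which matches the earlier point about new users seeing popular
videos, is to fall back to popularity until there's enough history. Here's a minimal
sketch. The min_history threshold and the recommend helper are assumptions for this
example.

```python
def recommend(user_history, personalized, popular, min_history=3):
    """Use popular items until we know enough about the user to personalize."""
    if len(user_history) < min_history:
        return popular
    return personalized


popular = ["Robot dance-off", "Top 10 music videos"]
personalized = ["VCR restoration", "Cassette tape history"]

print(recommend([], personalized, popular))                    # brand-new user
print(recommend(["v1", "v2", "v3"], personalized, popular))    # returning user
```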
And third, even if an AI makes statistically likely predictions, that doesn't mean those
recommendations are actually useful to us.
Online ads run into this failure a lot, where we'll be shown ads for sites we recently
visited, or something we just bought. Sure, that's probably something I'm interested
in, but I could've figured that out without a recommender system.
In a potentially more harmful way, recommender systems don't understand important social
context, so “statistically likely” recommendations can be worrying.
Recommendations may stereotype users in a socially uncomfortable way.
Like, for example, an AI might assume that because John-Green-bot is a robot, he really
wants to watch WALL-E and Robocop. Just because he's a robot doesn't mean he wants to
watch robot stuff.
Or recommendations might be inappropriate for certain users, like recommending a video
that a parent would consider too violent to children who had watched a bunch
of NERF War videos.
And, on social media, recommendations can trap us in ideological echo chambers, where
we tend to only see the opinions from people that agree with us, which can skew our knowledge
about the world.
This idea that we all see slightly different versions of the internet, and data is constantly
being collected about us, can be a little concerning. But understanding how recommender
systems work can help us live more knowledgeable lives and coexist with AI.
When we don't want data added to a recommender system's model of us, we can use a private
or incognito browser window and not log into sites. If we open a news homepage this way,
we might see what the average human (or robot!) is being recommended.
Of course, incognito browsers don't mean total privacy, but this strategy prevents
sites from connecting data -- like, for example, my Twitter account with my searches for tiny
polo shirts on Google (because I needed to get John-Green-bot a birthday present).
Plus, since we spend so much time online, we might want to make the most of it with
really personalized recommendations. So… seriously… “like, comment, and subscribe”
to your favorite creators because as we leave ratings, reviews, and other traces of online
activities, recommender systems can learn better models.
Recommender systems are a part of the internet as we know it, whether we like it or not.
And as AI becomes a bigger part of our lives, these kinds of recommendations will be too. So it's
on us to be aware of this technology, so that we know what kind of world we're living
in, and the ways AI might influence us every single day.
And if you're here to learn how to build recommender systems, my advice would be to
think explicitly about the trade-offs that are involved. Deciding how to define the clusters
of users or items can create more or less personalized spaces.
In our next episode, we'll work together on some code to build a recommender system,
and we'll get some hands-on experience with weighing some of these trade-offs. I'll
see ya then.
Speaking of recommendations, you should check out Sound Field. Sound Field is a new music
education show from PBS Digital Studios that explores the music theory, production, history
and culture behind our favorite songs and musical styles. Hosted by two supremely talented
musicians, Arthur “LA” Buckner and Nahre Sol, every episode is one part video essay
and one part musical performance.
So go subscribe to Sound Field! Link in the description below.
Crash Course AI is produced in association with PBS Digital Studios! If you want to help
keep all Crash Course free for everybody, forever, you can join our community on Patreon.
And if you want to check out Sound Field, click the link below.