Name: How Shazam Works
Uploaded: 2021-06-11T14:46:30.000Z
Duration: 10 min 25 s

This episode of Real Engineering is brought to you by Brilliant, a problem-solving website

that teaches you to think like an engineer.

Introduction: Opening, scene in a pub listening to a song and opening the Shazam app.

What you just witnessed was the Shazam app recognising a song in a noisy environment,

and proceeding to find a match for it among the millions of songs in its servers' database.

For most this probably seems like a trivial task.

Our brains can identify songs incredibly quickly from a young age, but the pathways in your

brain that allow you to identify a song quickly are incredibly complex.

Oftentimes you need to hear just a few chords to know exactly what song is about

to play, that jolt of excitement when you can hear a DJ fading in the bassline of your

A simple combination of tones in a specific order allows you to identify a song from the

thousands of other songs you have heard in your life in an instant, but coding a computer

to do the same thing is an incredible challenge.

A computer does not have an intuitive understanding of music.

A computer can only compare songs to other songs in its database, looking for a match

It is a problem akin to finding a needle in a haystack, where you can only find the needle

by looking at a picture of a needle and comparing it to each individual straw, comparing its

length and colour until you finally find the needle.

Creating software capable of doing this task quickly poses a very interesting coding

challenge and the solution the engineers at Shazam came up with gives us some interesting

A study by the Manchester Museum of Science and Industry tested 12,000 people's ability

They created an interactive game to search for the most recognisable songs, where they

would play the hooks of 1000 best-selling songs and record the time required to identify

Can you identify this song with just 2.3 seconds of the hook?

That was the Spice Girls song Wannabe, which ranks highest with a recognition time averaging

just 2.3 seconds, and that's including the reaction time required to hit the button.

Our brains are hardwired for this kind of pattern recognition.

In a world where recognising the sound of an approaching threat meant life or death,

we have evolved incredibly efficient ways of categorising and accessing historical data

Our brain does not take the sound and compare it to every sound we have ever heard like

a computer; the specific combination of chords in progression simply activates specific neurons

What if the chords were played by a different instrument?

Those same 2.3 seconds played on a guitar sound like this.

The notes are exactly the same, but they don't sound exactly the same.

We even know intuitively what instrument is playing.

This is called the timbre of a note and different instruments have different timbres.

Pianos and guitars are examples of harmonic instruments and when they produce a note,

they aren't just producing a pure note of a single frequency.

Each note is a combination of multiple frequencies all related to the base note, the fundamental

These are called overtones, and they are simply whole-number multiples of the base frequency.
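
To make the idea of overtones concrete, here is a minimal sketch in Python. The function names, the 220 Hz note, and the two sets of overtone weights are illustrative assumptions, not measurements of any real instrument; the sketch only shows that the same note with a different overtone mix yields a different waveform.

```python
import math

def harmonic_series(fundamental_hz, count):
    """Fundamental plus overtones: whole-number multiples of the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def synthesize(fundamental_hz, weights, sample_rate=8000, num_samples=80):
    """Sum one sine wave per harmonic, each scaled by its (made-up) weight."""
    freqs = harmonic_series(fundamental_hz, len(weights))
    return [
        sum(w * math.sin(2 * math.pi * f * i / sample_rate)
            for w, f in zip(weights, freqs))
        for i in range(num_samples)
    ]

# Same note (220 Hz), two hypothetical overtone mixes standing in for two instruments.
bright = synthesize(220.0, [1.0, 0.6, 0.4, 0.3])
mellow = synthesize(220.0, [1.0, 0.2, 0.05, 0.0])

print(harmonic_series(220.0, 4))  # [220.0, 440.0, 660.0, 880.0]
print(bright != mellow)           # True: same note, different waveform
```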

Each instrument has a unique combination and evolution of these overtones that give it

Again, it's quite easy for our brains to distinguish between a piano and a guitar,

but we need a way to quantify these characteristics for a computer to recognise, and this is where

A spectrogram is a visual representation of sound.

It's a 3D graph with time on the x-axis, frequency on the y-axis, and the amplitude

of the frequency, or in other words the loudness, on the z-axis, which is often represented

This 3D graph is something a computer can absolutely recognise and store as data, but

there is a huge amount of data within a spectrogram like this, and the more data there is the

more computation time is required to find a match.
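
A spectrogram like the one described can be sketched in a few lines of Python. This is a deliberately slow, standard-library-only illustration, not how Shazam computes it: the frame size, sample rate, test tones, and function names are all assumptions chosen so the result is easy to check. Each row is one time slice and each column is the loudness of one frequency bin.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Loudness of each frequency bin (0 .. N/2) for one frame of samples."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples, frame_size):
    """Cut the signal into non-overlapping frames; one row of magnitudes per frame."""
    return [
        dft_magnitudes(samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]

def loudest_bin(row):
    """Index of the strongest frequency bin in one time slice."""
    return max(range(len(row)), key=lambda k: row[k])

SAMPLE_RATE = 800                 # samples per second (assumed)
FRAME = 80                        # samples per time slice (assumed)
BIN_HZ = SAMPLE_RATE // FRAME     # each bin spans 10 Hz

# Half a second of a 100 Hz tone followed by half a second of a 200 Hz tone.
tone_a = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE) for t in range(400)]
tone_b = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(400)]
spec = spectrogram(tone_a + tone_b, FRAME)

print(len(spec), len(spec[0]))  # 10 41
print(loudest_bin(spec[0]) * BIN_HZ, loudest_bin(spec[-1]) * BIN_HZ)  # 100 200
```

Even this toy signal produces 10 x 41 values for one second of audio, which gives a feel for why the full spectrogram is too much data to search directly.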

So the first step in reducing computation time is reducing the data required to classify

Shazam uses something they call a fingerprint, where they transform these spectrograms into

[2] Here each star represents one of the strongest frequencies at a particular time.

Doing this, we have not only reduced our graph from 3 dimensions down to 2, but have drastically

reduced the number of data points on the graph.
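
The reduction from a full spectrogram to a handful of points can be sketched as follows. This is a simplified assumption about peak picking (keep the strongest bin or bins in each time slice); the function name, the `peaks_per_slice` and `min_magnitude` parameters, and the toy data are hypothetical, and the real selection rule is likely more elaborate.

```python
def constellation(spectrogram, peaks_per_slice=1, min_magnitude=0.0):
    """Keep only the strongest frequency bins in each time slice.

    Returns (time_index, frequency_bin) pairs: the loudness axis is dropped,
    so the 3D spectrogram becomes a sparse 2D scatter of points.
    """
    points = []
    for t, row in enumerate(spectrogram):
        ranked = sorted(range(len(row)), key=lambda f: row[f], reverse=True)
        for f in sorted(ranked[:peaks_per_slice]):
            if row[f] > min_magnitude:
                points.append((t, f))
    return points

# Toy spectrogram: 3 time slices x 4 frequency bins of loudness values.
toy = [
    [0.1, 0.9, 0.2, 0.1],
    [0.0, 0.1, 0.8, 0.3],
    [0.7, 0.1, 0.1, 0.2],
]

print(constellation(toy))  # [(0, 1), (1, 2), (2, 0)]
```

Twelve loudness values shrink to three points here. Raising `min_magnitude` in this sketch would also discard quiet slices entirely.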

This is the first vital part of Shazam's technology.

Every single song in Shazam's database is stored as a fingerprint like this.

When you open your phone and hit that Shazam button, the app accesses your microphone and

begins to create its own fingerprint of the sound waves it receives.

This ingenious method also helps the Shazam app to filter out noise because it only creates

Once the app has created a fingerprint of your audio, it then sends it to the Shazam

servers where the recognition part of the process begins.

Let's look at a simplified song fingerprint, and a recorded fingerprint to see why.

The recorded fingerprint is only a short recording of the song. In our example we have just 3

possible frequencies, and each recorded fingerprint will have just 3 time points.

If we want to check the first 3 time points in the song for a match we first check the

3 frequencies, then we move on to the next time point and check the 3 possible frequencies

again, and do the same for the final time point.

If we find a match, that is 9 operations required to find a match, but obviously that isn't

We then need to do those nine operations for every time point in the song, or perhaps every

time point in Shazam's massive music archive. This is obviously going to take a lot of computation

This is not how Shazam looks for a match.
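
For comparison, the brute-force search just described, the one Shazam avoids, can be sketched like this. The grid representation (one tuple of 3 on/off frequency flags per time point), the function name, and the toy song are assumptions made to mirror the 3 x 3 example; the counter shows the 9 checks spent at every starting position.

```python
def brute_force_match(song, clip):
    """Slide the clip along the song, checking every frequency at every time point.

    Returns (offset, comparisons): the first time index where the clip fits
    (or None), and how many single-cell checks were made along the way.
    """
    comparisons = 0
    for offset in range(len(song) - len(clip) + 1):
        matched = True
        for t in range(len(clip)):
            for f in range(len(clip[t])):
                comparisons += 1
                if song[offset + t][f] != clip[t][f]:
                    matched = False
        if matched:
            return offset, comparisons
    return None, comparisons

# Each time point lists which of the 3 possible frequencies is present (1) or not (0).
song = [
    (1, 0, 0),
    (0, 1, 0),
    (0, 0, 1),
    (1, 0, 0),
    (0, 0, 1),
    (0, 1, 0),
]
clip = song[3:6]  # a 3-time-point recording taken from the end of the song

print(brute_force_match(song, clip))  # (3, 36)
```

Four starting positions cost 36 checks for a six-point song; the cost grows with the length of every song in the archive.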

First, Shazam categorises fingerprints in a clever way.

We don't search to see if a note exists in a song, we search to see if several notes

exist separated by a particular time, just as the brain does.

This becomes our searchable address for a hash table.
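
The idea of using notes separated by a particular time as a hash-table address can be sketched as follows. The details are assumptions for illustration: pairing each point with the next few points (`fan_out`), the key layout `(f1, f2, dt)`, the voting step, and the song data are all hypothetical, not Shazam's actual scheme.

```python
from collections import Counter, defaultdict

def pair_hashes(points, fan_out=3):
    """Yield ((f1, f2, dt), t1): two frequencies and the time gap between them."""
    points = sorted(points)
    for i, (t1, f1) in enumerate(points):
        for t2, f2 in points[i + 1:i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1

def build_index(songs):
    """Hash table mapping each address to the (song, time) places it occurs."""
    index = defaultdict(list)
    for song_id, points in songs.items():
        for key, t1 in pair_hashes(points):
            index[key].append((song_id, t1))
    return index

def best_match(index, clip_points):
    """Look up every clip address and vote for a consistent (song, offset)."""
    votes = Counter()
    for key, clip_t in pair_hashes(clip_points):
        for song_id, song_t in index.get(key, ()):
            votes[(song_id, song_t - clip_t)] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Fingerprints as (time_index, frequency_bin) points; values are made up.
songs = {
    "song_a": [(0, 5), (1, 9), (2, 3), (3, 7), (4, 2), (5, 8)],
    "song_b": [(0, 4), (1, 4), (2, 6), (3, 1), (4, 9), (5, 5)],
}
index = build_index(songs)

clip = [(0, 7), (1, 2), (2, 8)]  # same notes as song_a starting at time 3
print(best_match(index, clip))   # ('song_a', 3)
```

Each lookup is a single dictionary access, so the cost depends on the length of the clip rather than on the size of the archive.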

Hashes and hash functions are an incredibly useful technique that appears everywhere in

Hash functions can be found in search algorithms used by Google, are used to make sure files are downloaded

correctly, and are the backbone of cryptocurrencies like Bitcoin.
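
As a small illustration of the download-checking use, here is a sketch using Python's standard `hashlib` module. The choice of SHA-256, the helper name, and the byte strings are assumptions; the point is only that changing a single byte changes the hash, so comparing hashes reveals a corrupted download.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest of the data; identical bytes always give the identical digest."""
    return hashlib.sha256(data).hexdigest()

original = b"pretend these bytes are a music file"
published_hash = sha256_hex(original)  # what the download page would list

good_download = b"pretend these bytes are a music file"
bad_download = b"pretend these bytes are a music fike"  # one byte corrupted

print(sha256_hex(good_download) == published_hash)  # True
print(sha256_hex(bad_download) == published_hash)   # False
```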