In the last video we saw how parity can be used to detect any single-bit error that occurs when transmitting from one system to another, and we can inject an error here to demonstrate that. But the limitation of parity is that if there's more than one error in your message, it may not detect it, and in this case that seems to be what has happened. For example, we have a comma in our "hello, world" over here, and no comma over there, so there was definitely an error; of course, I injected that error, you saw it. But the received parity is even, which indicates that we did not detect a parity error, even though there was one. That's just the limitation of parity: if there's one bit error, you're guaranteed to catch it, but if there's more than one, then depending on how many errors there are, you basically have a 50/50 chance of catching them.

At the end of the last video I showed some potential ways to work around this by adding more parity bits. So if we take our message here, "hello, world", and convert it to binary, we've got the binary representation of what we're sending, and with basic parity we have a single parity bit. If we count up all the ones in this message, there are 49 ones, and in order to send this with even parity we want to send an even number of ones, either 48 or 50. Since we have 49 ones, we need to add another one so that we send an even number. But if we change any two bits in here, for example if we change any two ones to zeros, then instead of having 50 ones we'll have 48 ones. That's still an even number, so we won't detect a parity error.

One way to work around that is to add more parity bits. Instead of having a single parity bit for the entire message, we can have a separate parity bit for each byte. Of course, the trade-off is that if we're sending a parity bit for each byte, we're sending more data overall, because we're adding about 12.5 percent overhead to the entire transmission. Maybe that's worth it because you want to catch errors, but even here you still may not catch every error, because if you have multiple errors in a single byte, the parity bit may not catch them. And you could keep going: you could send a parity bit for every four bits, or a parity bit for every two bits, or even a parity bit for every bit, which essentially just means sending the entire message twice. But all of that just adds more overhead, so it seems like there's a trade-off. In this video I want to talk about some ways to do better than this, to get better error detection with lower overhead; that's our goal here.
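As a quick recap of those two schemes, here's a minimal sketch in Python. It assumes the message is "Hello, world!" (which has the 49 one bits mentioned above); the function names are just for illustration.

```python
def count_ones(data: bytes) -> int:
    """Total number of 1 bits in the whole message."""
    return sum(bin(b).count("1") for b in data)

def even_parity_bit(data: bytes) -> int:
    """The single parity bit we'd append: 1 if the count of ones is odd, so the total becomes even."""
    return count_ones(data) & 1

def per_byte_parity(data: bytes) -> list:
    """One even-parity bit per byte instead of one for the whole message."""
    return [bin(b).count("1") & 1 for b in data]

msg = b"Hello, world!"
print(count_ones(msg))         # 49 ones, so the parity bit we send is a 1
print(even_parity_bit(msg))    # 1

# Flip two bits (turn the comma, 0x2C, into a space, 0x20): the single
# parity bit is unchanged, so the error slips through.
corrupted = msg.replace(b",", b" ")
print(even_parity_bit(corrupted))                          # still 1
# Per-byte parity misses it too, because both flipped bits are in the same byte.
print(per_byte_parity(corrupted) == per_byte_parity(msg))  # True
```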
One thing to keep in mind is the types of errors we might expect. It's actually not true that every bit is equally likely to randomly flip; errors tend to happen in certain patterns. For example, the error we had here, where we were supposed to receive "hello, world" but instead received "hello" followed by two spaces, happened because of two bit flips. If we look at the difference between a comma and a space, it's just these two bits: the upper four bits are 0010 in both cases, and then the lower four bits are either 1100 or 0000. What happened while we were transmitting is that I hit this button to drop those two bits, and that's what injected the error. Because it was two bits that flipped, we didn't catch it with the parity bit, but we wouldn't catch it with per-byte parity either, because if those two bits flip, the parity bit for that byte is still going to be a 1: we'd go from three ones in that byte to just one one, and that's still an odd number, so we'd still have a 1 out here as our parity bit. So we still wouldn't catch it even with this scheme.

In fact, this type of error, where you have multiple bit errors in a row, is called a burst error, and it's quite common, because you might imagine some sort of interference with your signal. Maybe it's not a button like this, but some interference that happens over a short period of time and corrupts a couple of bits in a row.

One thing we could do with parity that's actually much better at catching burst errors is, instead of using a parity bit to protect a sequence of bits in a row, have the parity bit protect an interleaved sequence of bits. One way to do that is something like this, where we have the same data, but instead of computing a parity bit for each byte, we compute a parity bit for each column, which is each bit position within the byte. For example, if we look at the first bit of every byte, in this case they're all zeros, so the parity bit is a 0, and we just do that for every bit position. At the end we'll have another byte of eight parity bits, so we send our message and then we send these eight parity bits. This is much less susceptible to burst errors: for example, if we change both of these bits, like we did in our message when we lost the comma, we would detect that, because both of these parity bits down here would be invalid.

Now, this still has limitations. Just like before, where changing two bits in a row wouldn't necessarily be detected, here if you change two bits in a column, you wouldn't detect that. So there's a trade-off between parity bits per row and parity bits per column, and which one you choose depends on the types of errors you expect. In most communication systems burst errors are pretty common, so something like a parity bit per column might be a better choice, but know that you may not catch multiple bit flips in a column. That's the trade-off you're making.
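Here's that column scheme as a sketch, again assuming the "Hello, world!" message: one even-parity bit per bit position, computed across all the bytes. The comma and the space differ in two different bit positions, so this scheme catches the burst error that per-byte parity missed.

```python
def column_parity(data: bytes) -> list:
    """One even-parity bit for each of the 8 bit positions, computed across every byte."""
    return [sum((b >> k) & 1 for b in data) & 1 for k in range(8)]

msg       = b"Hello, world!"
corrupted = msg.replace(b",", b" ")   # the same two-bit burst error as before

# The comma and the space differ in bit positions 2 and 3, so two of the
# column parity bits change and the burst error is detected.
print(column_parity(msg))
print(column_parity(corrupted))
print(column_parity(msg) == column_parity(corrupted))   # False: error caught
```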
But maybe we don't need to make that trade-off. A few of you suggested in the comments on the last video using a checksum. The term checksum can technically refer to any additional piece of information you attach to your message to validate that it's correct, but it can also literally mean a sum. If we take the characters in our message, get their ASCII numeric values, and add those up, in the case of "hello, world" we get 1161. So what if we just send our message and then send this number, 1161? You'd think that if anything in the message changes, the sum is probably going to change.

So is that better, as good, or worse than these parity schemes we've been looking at? To figure that out it actually helps to look at this addition in binary. We can do the addition in decimal like we did here, or we can do the same addition in binary and we should get the same answer; this answer here is 1161 in binary. If we look at the way the binary addition works, there are a lot of similarities to parity going on, so we can compare what's happening here to the parity scenario where we took a parity bit for every column.

It turns out that for this first column on the right, it's exactly the same. If we count up how many ones there are, which I guess is the same as adding them, we get 1, 2, 3, 4, 5, so the sum of that column is 5. Of course there's no digit 5 in binary; 5 is represented as 101, so we put a 1 down here and carry the 10 over into the next columns. And if we look at this 1 we're putting down at the bottom, it's identical to a parity bit for this column, because if you add up the column, or count the ones, however you want to think about it, and the answer is even, then it's a multiple of 2, and in binary a multiple of 2 has a 0 in the last place. Just like in decimal a number with a zero in the last place is a multiple of 10, in binary, base 2, if you have a 0 there, it's a multiple of 2; and if it's not a zero, which in binary means it's a one, then it's not a multiple of 2, which means it's odd. So this bit is essentially a parity bit for this column.

The next column works the same way: we add these up and there are 1, 2, 3, 4 ones, so you can think of the sum of that column as being 4. That's even, so you'd put a 0 there if we were doing parity; if we're adding, then 4 is 100, so you put the 0 there and carry the 10. What's different from parity shows up when we get to the third column. When we add up the third column we count 1, 2, 3, 4, 5, 6, 7, 8, 9 ones, but it's actually 10, because of the 1 that was carried over from the first column. And since 10 is 1010 in binary, we put a 0 down here and carry the 101. So what's interesting now is that this bit is not the parity bit for the third column from the right, as it was originally; it's the parity of the third column plus whatever was carried in from the additions over to its right.

So in some sense you can think of this addition as being very similar to computing a parity bit for each column, except we're holding on to some extra information when we do it, and that extra information percolates over into the columns to the left. In some sense we're getting the best of both worlds: if we have a burst error and several bits change within a single row, we'll catch that in our checksum down here, but also, if several bits change in a column, we may catch that as well.
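To see that connection concretely, here's a small check, again assuming "Hello, world!" (which sums to 1161): the byte sum equals the column counts weighted by their place values, and the low bit of each column count is exactly that column's even-parity bit.

```python
msg = b"Hello, world!"

total = sum(msg)                                                 # 1161
col_counts = [sum((b >> k) & 1 for b in msg) for k in range(8)]  # ones per bit position

# Adding the bytes is the same as adding each column's count of ones,
# weighted by that column's place value...
assert total == sum(count << k for k, count in enumerate(col_counts))

# ...and the low bit of each column count is that column's even-parity bit.
col_parity = [count & 1 for count in col_counts]
print(total, col_counts[0], col_counts[1], col_counts[2])   # 1161 5 4 9
print(col_parity)
```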
For example, if both of these ones were to change to 0, this column's parity bit wouldn't change; it would still be a 1 down here, because the sum of that column would be 3 instead of 5, and 3 is still odd. But while that bit doesn't change, what we carry does: instead of the column summing to 5, which is 101 in binary, it would sum to 3, which is 11, so the carry changes. That changes the parity, or the sum if you want to think of it that way, of the next couple of columns, and so we're able to catch that type of error here.

Now, in practice you may not actually send the entire checksum like this, because if you're sending 8-bit words, oftentimes what you'll do is chop off this first part and just send eight bits of the checksum. That's a pretty common thing to do, and you can think of it as equivalent to sending the remainder of a division problem: if you add it all up you get 1161, and if we divide by 256 we get 4, which is the part on the left, and the remainder, 137, is the part on the right, which is what we'd send as our checksum. That's a common way to do this, because if you're already sending 8-bit words, it may not be so easy to send an extra 11 bits. But the drawback of lopping off these leftmost bits is exactly what we just talked about: that information that percolates over to the left, you end up losing some of it.

So one thing that's done fairly commonly is an end-around carry, where you take this extra bit that carried out, lop it off, bring it around, and add it back into your checksum, and you end up with this number here. Essentially, when you're adding these leftmost columns, instead of carrying off to the left, you're carrying back around to the right and holding on to that extra information on the right side of your addition, so this ends up retaining some of it. Incidentally, this technique of doing an end-around carry when adding is referred to as one's complement addition, and it has a few other interesting properties; I won't go into all of that right now, but if you're interested, I'd encourage you to look up one's complement addition.

Now, if we do this end-around carry, it turns out that instead of dividing by 256, what we're essentially doing is dividing by 255, and it might be interesting for you to think through why that's the case. If we take our sum, 1161, and divide by 255, the remainder is 141, which is four more than 137, and that's what we end up with as our final checksum, if you will. But one technique that's kind of interesting is that rather than sending our message followed by a checksum of 141, we can send our message followed by the inverse of that: if we flip all of the bits of 141 we get 114, which is the same as 255 minus 141, and we send that as our checksum.
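Here's a sketch of that 8-bit end-around-carry checksum; this is just an illustration of the idea, not any particular standard. With "Hello, world!" the plain sum is 1161, folding the carries back in gives 141 (which is 1161 mod 255), and flipping the bits gives the 114 that gets sent.

```python
def ones_complement_sum8(data: bytes) -> int:
    """Add the bytes, folding any carry out of the top bit back into the bottom (end-around carry)."""
    total = 0
    for b in data:
        total += b
        total = (total & 0xFF) + (total >> 8)   # end-around carry
    return (total & 0xFF) + (total >> 8)        # one last fold, just in case

msg = b"Hello, world!"
s = ones_complement_sum8(msg)
print(sum(msg), s)          # 1161 141   (141 == 1161 % 255)
checksum = s ^ 0xFF         # flip the bits: same as 255 - 141
print(checksum)             # 114, which is what actually gets sent
```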
So, for example, if we send our message followed by this 114, then when we do the addition on the receiving end, we do it the same way: we add everything up, we end up with this extra bit over here, we add that back in, and when we add it all together we get all ones. And if we then flip all the bits, we get all zeros. That's a very quick way to check that we got the right checksum, and it's an interesting side effect of this one's complement addition. Essentially what we're doing is adding and taking the remainder when we divide by 255, and because we can just flip the bits to get 255 minus that number, when we add everything up including this checksum, assuming nothing has changed, we expect to get a multiple of 255, and that's where we end up. It's a very clever way to do a checksum.

In fact, this is the exact technique used in what's called the Internet checksum, which is a checksum performed on every IP packet on the Internet. The only difference is that for the Internet checksum the data is chunked up into 16-bit words instead of 8-bit words, but otherwise it's the same thing: you add it up, take the bit that carries out, wrap it back around and add it in, flip the result, and that becomes the checksum. Then you can validate it by doing that same addition, wrapping around, and when you flip the result you'll end up with all zeros, assuming it's correct. That's how the Internet checksum works, and it's obviously very widely used.

But the checksum still has limitations, and to think about what some of them are, consider that since we're doing addition, the order you add numbers in doesn't matter, so the order these bytes show up in also doesn't affect the final checksum validation. If we received this message with each of these letters shuffled into a completely different order, we'd get the same checksum, and we'd have no way of detecting that it had been reordered. That's one limitation of this type of checksum. Another limitation is data being inserted or removed. Obviously, if we inserted some random byte in here, we'd probably detect that with the checksum, and if we removed any of these bytes, we would certainly detect that. But if someone inserted a bunch of bytes that were all zeros, you wouldn't have any way of detecting it: if, in the middle of our message, this 13-character message turned into a 20-character message with a bunch of zeros inserted, we'd have received the wrong thing, but the checksum would be the same, because adding 0 doesn't change the result. Same thing if our message had some zeros in it; in this case none of our bytes are all zeros, but if we had some bytes that were all zeros and some of them got removed from our message in transit, again it wouldn't affect the checksum, and we wouldn't be able to detect that.
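Here's a sketch in the style of the Internet checksum, with 16-bit words; the padding and byte-packing details are my own simplification. The receiver sums everything including the checksum and expects all ones, and the block also demonstrates the limitations just described: reordering words or adding all-zero words leaves the checksum unchanged.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Sum big-endian 16-bit words with end-around carry (data must be an even number of bytes)."""
    assert len(data) % 2 == 0
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total

def checksum16(data: bytes) -> int:
    """Invert the one's complement sum; this is the value that gets sent."""
    return ones_complement_sum16(data) ^ 0xFFFF

data = b"Hello, world!\x00"        # padded up front to an even number of bytes
chk  = checksum16(data)

# Receiver: add everything up, including the two checksum bytes; a good
# message gives all ones (flip the bits and you get zero).
received = data + bytes([chk >> 8, chk & 0xFF])
print(hex(ones_complement_sum16(received)))      # 0xffff

# The limitations just described: swapping two words, or appending all-zero
# words, produces exactly the same checksum.
swapped = data[2:4] + data[0:2] + data[4:]
print(checksum16(swapped) == chk)                # True
print(checksum16(data + b"\x00\x00") == chk)     # True
```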
Now, you may be thinking that extra zero bytes getting inserted into our message, or bytes arriving out of order, is unlikely in our application here, and that's certainly true. We're definitely not going to get things arriving out of order, because our transmission here is just a wire: every bit that goes into the wire on this end is going to come out of the wire on that end in the same order; there's just nothing here that would cause them to be reordered. But in more complex networks, data arriving out of order is actually quite possible, because there might be multiple paths from point A to point B. Some data might go across one path and other data across another, and if those paths don't take quite the same amount of time, some of the data might show up before other data even though it was sent after it, and you get data out of order. You'd want to be able to detect that, and something like the checksum we just saw might not be able to catch it.

Now, there are ways to use checksums that can detect out-of-order data, and as an example, one common, sort of low-tech one is the ISBN numbers on the back of every book. There are a couple of different formats, a 13-digit format and a 10-digit format; let's look at the 10-digit one. In this 10-digit number there are nine digits, and then this last digit, in this case the 6, is actually a checksum of the other digits, and it's a checksum that is also able to detect if any of the digits get swapped. In this case that's important, because you might imagine somebody typing this number into a computer and mixing up a couple of digits: instead of 4, 6, 5 they type 4, 5, 6 or something like that. You want to be able to catch that and have the computer say, that's not a valid number.

So the way this works is, you've got the ISBN number here, at least the first nine digits of it, and the way we calculate that last digit, the 6, is that instead of doing a simple sum of the digits like we would for a normal checksum, first we multiply each one by a number that indicates the place it's in: we multiply the first digit by 10, the next digit by 9, then 8, 7, and so forth, and we get this sequence of numbers, and that's what we actually add. So instead of adding 4 plus 6 plus 5 and so forth, we end up adding 36 plus 48 plus 35, and adding it that way encodes the place each digit is in. We end up with our sum, in this case 192. Then, to get that final digit, we divide by 11: we divide 192 by 11 and we get 17,
but we actually don't care about that part; what we care about is the remainder, which is 5.

It's interesting that we're dividing by 11; that's sort of an interesting choice. The reason for it is that each of the numbers we're adding to get our sum is the product of two numbers, neither of which is a factor of 11. What that means is that if either of those numbers changes, either the digit or, essentially, the place of that digit, then the sum will change, and it will change by some value that's not a multiple of 11. Because of that, we know the remainder we end up with will be different if any of these digits changes, or if the place of any of these digits changes, which is pretty powerful; that's exactly what we want.

The other thing you'll notice is that we don't actually put the 5 in here as our check digit: we take 11 minus that 5 and put a 6 as our check digit, which is what we see on the book here. The reason for that is another pattern you might have started to notice with these error detection codes, which is that if we take the entire number, including the check digit, and compute a checksum of that, what we end up with is zero. So, for example, take the 6 times 1: we get 6, so instead of 192 the total is 198, and if we divide 11 into 198 we get 18, which goes in evenly; there's a remainder of 0. That's an important pattern to notice, and you've probably seen it popping up multiple times here. Instead of putting the 5 here, we put a 6, such that when we recompute the checksum including the 6, we end up with a remainder of 0.

The same thing happened with the Internet checksum: we get our answer when we do the one's complement addition, but then we invert it, such that when we do the same operation over the whole sequence, including that inverted checksum, the answer we get, once we invert it, also ends up being zero. We apply the same process to the original data plus the checksum and we get zero, and that's how we validate it. And in fact this is the same principle we saw with our parity bit: we send our message and then we send a parity bit, and in this case the parity bit we're going to send is a 1, and we send that 1 such that when we add the parity bit to our message, we end up with a parity of zero for the whole thing.
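Here's that ISBN-10 procedure as a sketch. The transcript doesn't show the actual digits on the book, so the number below is just a generic valid example, not the one from the video; the procedure is the same: weight the first nine digits by 10 down to 2, take the sum mod 11, and pick the check digit that makes the full weighted sum divisible by 11.

```python
def isbn10_check_digit(first_nine: str) -> str:
    """Weight the first nine digits by 10 down to 2, then pick the check digit
    that makes the full weighted sum a multiple of 11 ('X' stands for ten)."""
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

def isbn10_is_valid(isbn: str) -> bool:
    """Recompute the weighted sum over all ten digits; a valid ISBN gives remainder 0."""
    values = [10 if c == "X" else int(c) for c in isbn]
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0

print(isbn10_check_digit("030640615"))   # "2"
print(isbn10_is_valid("0306406152"))     # True
print(isbn10_is_valid("0306406125"))     # False: two digits transposed, and that's caught
```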
It's that same concept of crafting your check bit, or check digit, or checksum, whatever it is, in such a way that when you recompute it on the receiving end, including that check value, you end up with zero if you received everything correctly. That's a pattern we'll see over and over again. Incidentally, because we're dividing by 11, the remainder, and hence the check digit, can be any number from zero to ten, not just zero to nine, so in some cases you'll see a book that has an X in that last position instead of a digit, and that just means the check digit was a ten.

Now, we've talked about out-of-order errors and burst errors as potential problems, but of course you can also just have random bit errors. To think about how good a particular error detection scheme is at detecting random bit errors, there's a useful concept called Hamming distance. The Hamming distance is essentially the number of bits that differ between two messages. So, for example, this message, "hi", and this message, "ho", have a Hamming distance of two, because the only difference between them, including the parity bit, is that these two bits have been flipped. So we could say the Hamming distance between these two messages is two.

But when we think about error detection, what we want to think about is the Hamming distance between what are called valid code words. What counts as a valid code word depends on the error detection scheme you're using. If we're using parity like this, then we have to decide what type of parity; in this case we're using even parity, and a valid code word in even parity is any sequence of bits that, including the parity bit, has an even number of ones. Both of these here have an even number of ones, so both are valid code words. If you had a sequence of bits where only one bit had flipped, but the parity bit was still 0, for example, then it would have an odd number of ones overall, and it would not be a valid code word. The reason all this terminology is useful is that it gives us a way to quantify how good parity, in this case, is at detecting random bit errors: because the Hamming distance between valid code words is two bits, you have to change at least two bits in your message in order to fool the parity detection scheme.

Well, how about checksums? A checksum is actually the same. In this case I have a message, "hi", and then over here a message that got corrupted, "h" followed by a parenthesis. It got corrupted just by changing these two bits: I changed a 1 to a 0 and a 0 to a 1, so the sum ends up being the same, and you can see the checksum is the same in both messages. When we add everything up we get all ones, which is how we validate that this is a valid code word. So again, with a checksum you have this concept of a valid code word, and a valid code word is one where, when you add everything up including the checksum, you get all ones, or, if you prefer, you add it up, flip all the bits, and get all zeros, however you want to think about it. So here are two messages, both valid code words under the checksum scheme we've been talking about, and the difference between these two valid code words is just two bits. So the Hamming distance between valid code words for this checksum is also two bits.
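Here's that measurement as a tiny sketch: count the bit positions in which two equal-length messages differ.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length messages differ."""
    assert len(a) == len(b)
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

print(hamming_distance(b"hi", b"ho"))   # 2
# Both "hi" and "ho" have an odd number of ones, so the even-parity bit is a 1
# for both; even counting the parity bit, these valid code words are 2 bits apart.
```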
So what does that mean? It means that the minimum number of bits that have to be flipped before a checksum can miss an error is two, and in that sense a checksum is no better than parity. We've seen that a checksum is better than parity in a lot of ways, but at least in this dimension, this measurement of Hamming distance, both of them are susceptible to just two flipped bits; they can both be fooled.

Well, there are error detection schemes that have a higher Hamming distance. Just as an example, if we go back to the parity examples we were looking at earlier in the video: here we have "hello, world" with a single parity bit, and as you've seen many times, if you change any one bit in here, the entire thing becomes an invalid code word, but if we have two bit errors, we might not detect that at all. So, like I just said, parity has a Hamming distance of two. How about this scheme, where we have a parity bit for each byte? Well, it's the same thing: we could just change two bits in a single row, and we'd have what appears to be a valid message even though two bits changed. So again, this scheme has a Hamming distance of two. How about here? Same thing: change two bits in a column and you'd fool it as well. So the Hamming distance for parity, in all of these cases, is going to be two.

But check out what happens when we combine all of these: we send a parity bit for every byte, but then we also send a parity bit for each column, and then we also send the parity bit for the entire message, which, conveniently enough, turns out to also be the parity bit for this last column of parity bits and the parity bit for this last row of parity bits. If we send all of these parity bits, it turns out we're able to detect more random bit errors. Obviously, if we change any single bit in here, we'd detect that: if we change this bit here, for example, then the parity for this row would be wrong, the parity for this column would be wrong, and the overall parity would be wrong. So a one-bit error is easy to detect, just like in any other parity scenario. Now, what happens if we change two bits? If we change two bits in the same row, then the parity bit for that row isn't going to detect it, but the parity bits for the two columns will. Same thing if we change two bits in two different rows but in the same column: the column parity bit won't detect it, but the two row parity bits will. In fact, you can change any two bits anywhere in here, and you can try them all if you want, and you'd still be able to detect it.
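Here's a sketch of that combined scheme, assuming the same "Hello, world!" message and illustrative helper names: a parity bit per byte, a parity bit per column, and one overall parity bit, along with the two kinds of two-bit errors just discussed, to show that both get caught.

```python
def two_d_parity(data: bytes):
    """Row parity per byte, column parity per bit position, and one overall parity bit."""
    rows = [bin(b).count("1") & 1 for b in data]
    cols = [sum((b >> k) & 1 for b in data) & 1 for k in range(8)]
    overall = sum(rows) & 1   # also the parity of the row-parity bits and of the column-parity bits
    return rows, cols, overall

def looks_valid(data: bytes, expected) -> bool:
    return two_d_parity(data) == expected

msg = b"Hello, world!"
expected = two_d_parity(msg)

# Two bits flipped in one byte (the comma -> space burst error): the row
# parity is unchanged, but two column parities change, so it's caught.
print(looks_valid(msg.replace(b",", b" "), expected))                            # False

# Two bits flipped in the same column of two different bytes: the column
# parity is unchanged, but two row parities change, so it's caught too.
print(looks_valid(bytes([msg[0] ^ 0x01, msg[1] ^ 0x01]) + msg[2:], expected))    # False
```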
Well, how about three bits? If we change three bits, there are a couple of different options for how they're arranged. All three bits could be in the same row, in which case the parity for that row would catch it, and the parity for the columns would too, for that matter. Or you could have two bits change in one row and one bit in another row: this row here, the parity would check out, that would be fine, and this column here, the parity would check out as well, but this column wouldn't, because only one bit changed in it, so you'd still detect it. In fact, you can try any arrangement of three bit errors, and you'd still be able to detect it.

How about four bit errors? Well, with four bit errors we could change two bits in this row, and we would not be able to detect that; the parity for this row would still be correct. Then we could change two bits in this other row, and again the parity for that row would be correct. And because we can arrange the four bits so that there are two in each of these two columns, we won't be able to detect it with the column parities either. So finally, by changing four bits, we're able to transform this message in a way we can't detect. It turns out that with this scheme, the minimum number of bits you have to change to go from any one valid code word to another valid code word is four. They have to be the right four bits, of course, but the minimum is four, so we'd say this scheme has a Hamming distance of four, which is better than simple parity or even a checksum.

But of course there are drawbacks. The obvious one with this scheme is the amount of overhead it takes: we're adding an extra bit for every byte, which is something like twelve and a half percent overhead, plus essentially an extra byte for the overall message, so we're adding quite a bit of overhead with this scheme, and a longer message means more of these column parity bits too. So there's a trade-off again: we're sending more data to get a quote-unquote better scheme. And overhead and effectiveness at catching different types of errors aren't the only things we care about; we might also care about how easy an algorithm is to implement in hardware. Parity, for example, turned out to be very simple to implement with just a couple of chips, and we're actually computing the parity bit in hardware and sending it along. A checksum, you can imagine, would be much more complex to implement in hardware by ourselves. It might be fairly trivial to do in software, now that we have these nice little microprocessors that can do all sorts of fancy stuff, but if we really want to be efficient, we may want to compute a checksum in hardware, and so how easy these different algorithms are to implement purely in hardware is another consideration.
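To make that four-bit case concrete, here's a quick check along the same lines as the sketch above: flip the same two bit positions in two different bytes, and every row, column, and overall parity bit still checks out, which is why four bit flips is the minimum undetectable error for this scheme.

```python
def two_d_parity(data: bytes):
    rows = [bin(b).count("1") & 1 for b in data]                    # parity per byte
    cols = [sum((b >> k) & 1 for b in data) & 1 for k in range(8)]  # parity per bit position
    overall = sum(rows) & 1                                         # overall parity
    return rows, cols, overall

msg = b"Hello, world!"

# Flip bit positions 0 and 1 in byte 0 and the same two positions in byte 1:
# every affected row and column now has an even number of flips, and so does
# the message as a whole, so none of the parity bits change.
corrupted = bytes([msg[0] ^ 0b11, msg[1] ^ 0b11]) + msg[2:]
print(two_d_parity(corrupted) == two_d_parity(msg))   # True: four bit errors, undetected
```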
With all of that in mind, in the next video I'm going to talk about one more algorithm that turns out to be really powerful: the CRC, the cyclic redundancy check. It turns out you can devise CRC checks that detect random bit errors and in fact have a very high Hamming distance; they can also detect out-of-order data and are very good at detecting burst errors, so they're almost ideal for what we're talking about here. That's what we'll be covering in a lot of detail in the next video, where we'll walk through precisely how CRC works and how we can prove which types of errors it's able to detect, and then we'll get into how all of the math behind CRC can be implemented in just a few chips, in fact not much more than we have here, and we'll be able to do incredibly robust error detection completely in hardware.