字幕列表 影片播放
(smooth electronic music)
- Hello, everybody, thank you very much for coming.
Good morning.
Hope you are doing well.
I'd like to talk today about conflict resolution
in distributed systems, that is if several
people change some data independently of each other,
what happens, how do we resolve those conflicts
that'll occur.
My background is I'm a researcher at the University
of Cambridge.
I was previously in industry
and a bunch of internet start ups.
So I worked at Linkedin for a couple of years,
for example.
At the moment, I'm working on this research product
called Trve Data.
Spelled T-R-V-E
and what we're trying to do here
is to bring end to end encryption
to a larger range of applications.
So think something like Google Docs
where several people can edit a document
at the same time online
but without having to trust Google servers
because what we want to do is
to be able to put data on various servers in the cloud
but not have to worry about what happens
if they get compromised or so on.
So that's kind of the background of all of this.
I'm not talking about the encryption
and the security protocols today,
I'm only focusing on one little piece of that whole project
which is what happens if several people edit data
at the same time and how do we resolve that.
So I'd like to start with a scenario
that will probably be familiar with you,
which is you, a little blue stick figure here,
are hacking on some code on your computer
and at some point you decide that this code is done
and you commit it using your favorite
version of control system.
I'll just use git as an example here
and so at this point, you'd put the code
in the repository and then maybe you push it somewhere
so that other people can see that `code as well.
So in the case of Git, you maybe you'll push
it to a repository on Github
and this is now the communication mechanism
for people with your team.
So if there's somebody else, say,
this little red stick figure
who is also hacking on code,
then well you can synchronize up through
the central repository.
This is all very familiar, this is what
we do everyday
and so the little red person here
might independently at the same time
also be working on the same code base
and also do a commit and I'll,
what happens if you this person now fetches
from Github, well they'll have to either
do a merge or rebase if that's how your work flow goes
or something along those lines.
So somehow these changes are going
to have to be combined together
and as you've probably experienced,
if people change different files in the same repository,
that's no problem, they will just get merged cleanly.
If one person changes the beginning of a file
and another person changes the end of the file,
that's probably okay because the version control system
will merge them automatically.
If people change the same part of the same file,
then you're going to have to resolve the merge conflict
yourself and then we have these tools
for doing three way mergers for copying patches
from one side to another and figuring out
what the results should be.
So you've probably have to find with things like this.
This is exactly the kind of problem I'm talking about.
But this problem happens not only in software development.
It's a very general purpose problem.
So imagine you're a lawyer working in a law firm
and maybe there's a contract being negotiated
with, you've got a client on one side
and the other company's law firm on the other side
and everybody is sending these versions
of contracts back and forth
and the contracts are probably Microsoft Word documents
because that's how lawyers work
and they send these things by email
and so you've got one person making changes
to these Word documents and then hit save
and then at the same time, maybe somebody else,
maybe at the different company
is also updating the same document.
Changes it and now you email these changes to each other
and so this is actually very much the same data flow
and I actually just reused the same diagram
and changed the labels.
You've got the email as the communication path
and at some point these changes are going
to have to be merged together
and now before Microsoft Word,
I'm not sure there is even this kind
of nice user interface for three way mergers.
I know you can compare two documents
but I think what people end up doing
is manually copying the changes
from one version of the document to another
and so performing this merge really manually.
So that's kind of this oh crap situation here.
So in this case, it's kind of best to have like
an informal lock where one person says,
okay I'm going to be editing the document now.
Please don't change it for the next day.
I'll send it to you and then you can edit it.
So people try to sequence their updates
like this crude manual communication.
What happens in another?
Let's look at the third example.
Let's look at a to do list.
So this is maybe a shared
to do list where me and my wife together
have this shopping list where I can add stuff
and she can add stuff and then whoever next
goes to the shop can buy those things.
So here buying milk is added
to the to do list and let's say this to do list
is stored on a central server.
Now so thats what allows us to communicate
and so I add buy milk to the to do list
and press okay button and so it does a request,
like maybe an http post request
over the network, stores it there, it comes back
and says okay that was added to the to do list
and at the same time, maybe my wife goes,
oh you need to water the plants, so I just remembered.
So add stuff to the to do list
also does this post to the server
and comes back okay
and so in this case, actually what has happened
is if this central server stores its data
in a data base and it uses something like transactions.
If this is a relational database say,
then we actually have serialization going on
and that is these updates are actually applied
in a sequential order, in a serial order.
That's where serializable comes from
and database transactions.
Which means they're applied one at a time.
So you don't actually have the same concurrency problem
as we had with the code editing
and with the Word documents being sent back and forth.
Cause actually, there's one primary copy of the data
and that lives on the server
and that's being updated one transaction
at a time in sequence.
So in this case, you don't
get this conflict resolution problem.
So this seems nice but on the other hand
you have a problem which is well,
what if I don't have signal on my mobile phone right now
or if the network is interuppted for some other reason
or I can press the button to save
and I'll just a spinning, spinning wait indicator
and nothing will happen.
So here we have this, it's kind of obvious,
this problem if you don't have internet connection
then you can't reach the central server
so you can't store any data.
You can't edit the data in anyway.
So it doesn't work offline.
So these are the problems here.
We get the advantage from a central server
that we don't have to worry about
this concurrency and these different edits
happening at the same time
and having to merge those together
but at the cost of it only works online.
So we now we require constant connectivity to the server.
Which is not so great.
Especially if you're doing things on mobile devices.
Another problem is that these requests
are synchornous, so when I press the save button,
I have to wait until I get the okay back from the server
and only then I notice it is actually being saved.
So until I get the okay back,
I don't know whether the network request
actually made it through or not.
Maybe if the network is unrealiable.
So that also can be a timing problem.
Let's see in the document editing case,
say in the case of Google Docs,
you know you can press one letter
and then a second later
that letter will appear on the screen
for another person who's editing the same document.
So there the edits of, the units of editing
is a single key stroke.
Its not even like a commit or something like that.
It's just a single letter
and so in that case, if you wanted to send that
through a central server
than every time you press a letter,
you'd have to first send that to the server,
wait for it to be saved there,
wait for the server to come back to you,
say okay, and then you could
display the letter on the screen
and so you would have to wait that network round trip
for every single change to the document.
Which is actually what happens exactly
if you're editing a file over SSH.
So if you're logged into a Linux server
and you're using Vim or Emax or something
on the server and you type a letter
then that edit actually lives on the server.
So you've actually got that synchronus round trip.
So if you've got a slow one, reliable network,
and using SSH is pretty painful,
as you've probably experienced.
Our final problem with putting everything
on a central server is that now
there's this single point of failure
and people make fun of Github for example,
every time Github goes down,
people say, oh we've got this nice decentrilized
version control system and what do we do?
We put everything back in a centralized service.
Isn't that great?
So the sync can be disrupted.
If you worry about denial of surface attacks,
for example, or if you worry about blocking,
maybe in some countries that don't have such
free internet access, if you're critical
of the local regime or something like that,
actually this kind of blockability
is quite a problem.
So what we'd like to do is
to figure out how to not have these problems
of the single central server
but at the same time,
not have all of the problems of having
to do the merges by hand.
So coming back to the to do list example,
if we think about this having
to synchronously having to communicate
with a server, that's okay if the server is in the same town
but if the server is on the other side of the planet,
then this is quite slow
because it's simply speed of light
takes a while to go all the way round
to the other side of the world and back again.
So what if we just put data centers in several
different places?
So let's say when the blue person adds buy milk
to the to do list, that request just goes
to the blue person's local data center,
call it data center one and get saved there,
and when the red person adds water plants
to the to do list, then that goes
to the local data center, too.
Those are two different local data centers.
If the people are in different locations,
and so now that's okay.
So each person can get the response from their local
data center and now these changes will be
propagated asynchronously and so
the speed of light going all the way around
the planet is still as slow as it was before.
So it could happen that two people make these changes
without knowing about each other
and then you simply don't know which one of these
actually came first.
Because you know it's not really defined.
Did the blue come first or the red one.
From the point of view of data center one,
first the blue one came and then the red one came in
but from the point of view of data center two
it was first the red one then the blue one.
So if this adding an item to the to do list
means just appending it to the end of the list,
then which order should these items appear?
Now we certainly have to worry about the fact
that the data center one might
have buying milk first and watering the plants second
and the other data center might have them flipped
in the other order.
Oh dear.
Let's think this further.
Even without two data centers,
we can just make it really extreme
and say okay we're not even going to talk
to the data center, we're just going
to talk to our local storage on the device
and I'm going to treat the storage on my mobile phone
as a data center though it's the best data center ever
because I can't have a network interupption between
me and the storage on my phone.
So this communication is always going to work.
So I can just store something locally
and that will be nice and fast
and the red user can similarly store something
locally and then we can just kind of have
some kind of blah blah network stuff
to synchronize the changes around
and if you think about it,
this is exactly the same as the synchronization process
that has to happen with the two data centers,
exchanging data and this is exactly
the same synchronization process
that happened with the Git commits
being sent via Github or with the Word documents
being sent by email.
It's all exactly the same,
which is you've got some changes which are happening
concurrently, independently from each other
and those changes are being propagated asynchronously
and here, of course, now we're back
to this problem of conflict.
So take the example of adding two items
to the to do list.
You can kind of imagine that quite easy to resolve.
You just make sure that you somehow
decide on a consistent order.
Maybe using some Ids or something
or some user Ids.
You can order them arbtrially
but what if one item,
one change is deleted to do item
and the other actually edits that same to do item.
So the red one changes buying milk to buying soy milk
and the blue one just deletes the buying milk item
and these two changes happen without knowing about
each other, so how do we resolve this?
Because if you say that the deletion wins.
So, okay this item was deleted,
the to do list item was deleted
so that edit to change buy milk to buy soy milk
is just gone because the item that it refers to
is no longer there
but that means that we've forgotten about the fact
that soy milk was added in there
and so maybe that's an important fact
that we've now lost.
On the other hand, we can do it the way other way around.
We could say that this buying soy milk wins
but when the deletion has been lost
and so maybe the deletion is meaningful.
So how do we resolve these kind of conflicts?
In general, we have concurrent operations
that happen, concurrent doesn't mean necessarily
at the same moment in time.
It just means they happen without knowing about each other
and somehow we need to achieve convergence
where everybody agrees on the same data at the end,
and so people talk about eventual consistency
which is kind of the data base term
that's often used for describing these types of system
where you simply, you can allow different things
to happen concurrently and you just want
to end up in the same state at the end.
The problem with the term eventual consistency
is that it's actually very vaguely defined
and so people are very imprecise when talking about it
and I'd like to break it down into three points,
three properties that are a bit more precise.
So I'm going to firstly assume
eventual delivery and that is we sent messages
over the network and we're going to assume
that the network is not interuppted forever.
So we're going to assume that after some finite amount
of time, you get a network connection again
and so if you keep retrying then you can get
the message through eventually.
You kind of have to make this assumption
because if you assume that a device can be
offline forever, then it can never come in synch
with the other devices again.
Just by definition of offline.
So we're just going to assume
that eventually some messages go through
but we're not going to make any assumptions
about how quick that is.
So if I go to Iceland and there's
no mobile signal and I'm hiking
somewhere up on the top of a volcano
for two weeks, then I won't get any messages
for two weeks and when I get back
to Reykjavik, then I will get some messages
when I reach an internet connection again.
So that means my message delay is two weeks in that case.
That's okay, we're just going to assume that's alright.
Secondly, the second part of eventual consistency
is making sure that everybody ends up in the same state.
That is if we assume that eventually everybody
gets all the messages, then if two people
have received the same messages, then they should
be in the same state.
So that means even if they received the messages
in a different order, they should end up in the same state
and finally what we want is to not lose data.
Now this, it seems like a kind of ridiculous point
because of course nobody wants to lose data
but it is actually quite important to make this explicit.
So quite a few data base users use this mode
called last writer wins
where you say if two people change the data
at the same time, we're just going
to pick one of them arbitrally
based on the time stamp as being the winner
and the other ones were just going to throw away
and so this is like equivalent if you think
about the word document example of,
well we've got two people editing the Word document
and one person emailed their change to another
and the other person is just going to say,
well I made changes myself as well
and I'm going to declare that my changes are
newer than your changes
and so I'm just going to ignore your changes, sorry,
and so this is not very friendly to people
because changes get lost.
So let's require not losing data as well.
So this is what I mean with eventual consistency
and this is the kind of data structure
to which we might want to apply it.
So I'm going to model this to do list now
as a JSON document and you can just imagine
it's like a list of to do items
which is like each to do item has a title
and the flag done, whether it's done or not,
true or false and then maybe there's some settings
and stuff as well on the side,
and so the main data structures that we have here
are an ordered list.
That is you've got a list of to do items
and they have to appear in a certain order
and the user specifies this order
and also you have maps.
So you've got, maybe one adjacent object
inside another adjacent object or something like that.
Now once we've got these data structures
we can then make various changes to them.
So as the user interact with the application,
they will make changes.
For example, they might set watering plants to true.
So they say, okay this is now done,
I'm going to check the box on my phone
and what that does is set this done flag
from false to true.
So the operation here, the change operation,
the edit operation, is assigning a value
to a particular field in this JSON document.
Another thing that might happen
is somebody might edit a string.
So change milk, change buy milk to buy soy milk.
So by inserting a few letters into that string,
another thing that might, people might do
is insert a whole new list item
of phone mum between buy milk and before watering water
and plants.
So this is editing the ordered list object
to insert a new item.
Another thing that people might do
is add another key to this map here
or they might even delete an entire entry.
So the top level to do item, just delete the entire list.
Why not?
Maybe you have several lists or something like that.
So these are the kinds of changes that people
can make to these documents
and what we want to do is have some way of resolving
those changes.
So that even if people make these changes concurrently
to each other, we end up with eventual consistency.
So we can model this document as a tree
and we can have some data type annotations on it.
So say the top level document is a map
and inside it we have a list under the key to do
and so on.
Won't go into too much detail.
So this is actually an algorithm that I developed
with a colleague and we wrote this paper about it
a few months ago.
So if you're interested in that, you can find it online.
It's called a Conflict-Free Replicated JSON Datatype.
Has anyone here heard of CRDTS before?
Okay a few people, cool.
So this is an example of a CRDT.
If you don't know what a CRDT is, don't worry.
This paper is very theoretical .
It looks inside, it looks kind of like this.
So I'm not actually going to run through
all of the operational semantics today,
so don't worry about that.
I'm just going to kind give the intuition
about how the algorithm works
and just show kind of some of the curious edge cases
that occur there.
Some of the stuff that we have to think about
when trying to do this.
So the hope is that this algorithm
would allow people to concurrently edit
JSON documents and merge those edits
and end up with a sensible result at the end.
So one example of a document,
we saw the to do list example,
another one might be simply a text document.
So.
So a text document consists of the file name,
which is just a string.
Consists of characters or of the body,
of the actual text.
So each individual characters.
Like the smallest unit you can edit
and like maybe there might be additional stuff
for formatting, setting fonts and so on,
I'm just going to leave all of that out.
Another thing that's quite useful is cursor positions.
So if you use Google Docs, you can see where
in the document another user is editing right now.
That's quite handy to see so that you don't change
the same place at the same time while you're online.
So you could imagine implementing that as a map
from each client has a position.
A position is like a location in the document
and so that would allow you to keep track
of other people's cursors.
So I'll just focus on the body characters for now.
So the ordered list of characters
is what constitutes the content of the document.
You might wonder, why this is,
why this is a list of strings
and not a list of characters.
So begin footnotes.
It's a list of strings because
in Unicode, if you actually represent like one character,
the smallest editable unit of the document
is not necessarily a single Unicode code point.
Because you have things like combining,
combining marks, I think they're called for accents,
and diacritics and various things
or for emoji, I think the skin color annotation
is a combining mark and so you end up with a sequence
of several Unicode code points that constitutes
an character from user points of view.
If you're interested in this, look at the Unicode annex
number 29.
Anyway, end of footnote.
This is completely irrelevant.
I just wanted to mention that as well.
I'll give a little demo of the text editor
we implemented because its just kind of fun
to make this a bit more interactive.
So what I have here is something that doesn't work.
Sorry.
This is always the thing with demos.
Maybe it has to be on the wifi.
Okay, let's try again.
Okay, good.
So.
So this is a very basic text editor
that we implemented using this data structure,
this algorithm that we developed
and I've got actually two windows here, left and right,
which are both instances of this text editor running
and I can say hello to Berlin and it works.
So you can see I can type something on one side
and it appears on the other side
and these two editors are actually communicating
via a network connection.
So they are two separate processes
that are otherwise don't share anything.
So they could easily be on two separate computers
on other sides of the world, no problem
and what I can now do is can I kill this server.
So they use a web socket server here to communicate
and when I kill the server,
what I'm simulating is a network interupption
between the two.
So now both editors are offline.
So this could be because actually it's a server outage
or just because the clients have lost their internet
connection, it doesn't really matter
and so I can keep editing offline.
So lets say here.
So I'll say, Hello everyone at GOTO Berlin
and you see it doesn't appear on the right hand side,
because they're offline.
So we have these two now
and so I'm going to restart the server now
and what we want is for all of those changes
to be preserved.
So make sure, look at that everyone at is still there
and hope you're all doing well is still there.
So when I restart the server, the editors
in the background keep trying to reconnect
to the server automatically
and keep retrying in the background
and eventually they will mange to connect
and resychronize and now you see everyone at
has been copied from the left to the right
and the hope you're all doing well
has been copied from the right to the left
and we didn't have any like three way merge
user interface for this.
Like it just did this automatically.
Now, let's have a look at how this
algorithm actually works because
that's kind of interesting.
This is basically the same as what Google Docs does, right?
So, Google Docs is a bit fussy
with its offline support
but if you force it into actually doing it,
it essentially does the same kind of merge
and I can run a bit of how the algorithm
in Google Docs work.
So imagine you have a document consisting
of the letters H E L and O
and you label each letter with the index
of what position it is.
So 0, 1, 2, 3 and on the left hand side
we've got the green editor
and on the right hand side, we've got the purple editor
and each editor now makes an edit to this.
So the left hand side inserts a second L character
to change it to Hello
and the right hand side inserts and exclamation mark
at the end, making it H-E-L-O-exclamation mark
and so now we want to merge these two changes
that have happened concurrently
and so in order to do that,
we have the server to,
this is, in this case, run by Google
and these clients send essentially a dif
or like an operation recording what the change,
what change has been made by the user
and sent that operation over to the server
and so the left hand side says insert L
at position three.
So you see 0,1,2,3,4 so we inserted the L
so the O has moved from position three
to position four
and so we inserted that L there
and so this side, we have the inserting exclamation mark
at the position four
and so that describes the change that was made
and now server forwards those changes over
to the other client and so the green change,
inserting the L gets forward by the server over here.
Insert L at position three,
position three is the right position here.
So you end up with Hello exclamation mark.
Which is what we wanted.
However, if you think about what happens
in the other direction.
So this insert exclamation mark at position four,
if we simply send that through unchanged,
what we will get is Hell exclamation mark O
because over here, we inserted the letter L
at position three so the O moved along
from three to four.
So really we would need to change that
to insert exclamation mark at position five
and only then we would get the right outcome
of having Hello exclamation mark
and so what has to happen here is that
this position four needs to be changed to position five
because there was concurrently an insert
at position three.
So the server has to keep track
of all of these things going on simultaneously
and it has to transform the messages,
rewriting this three to four.
Some of the transformation happens on the client as well
but this algorithm does actually depend
on the server to do some work as well
and that works.
So this algorithm is called operational transformation
and it's been around for quite awhile.
So it was first discussed in the academic literature
back in the 1980s.
Although the first paper that presented
this algorithm was actually incorrect
and they said so in the paper.
They said, heres a case in which our algorithm fails.
Can someone help us fix it please?
And then several people, researchers came along
and proposed fixes to it, which worked,
and there were several different competing
algorithms there.
The one that most of the,
well done operational transformation based systems
are kind of inherit from is called Jupiter
from 1995 and that's what like Google Docs
and Ethopad is based off
and Google Wave which is now Apache Wave.
They all use this Jupiter design
which is to use a central server
that does some of the transformations.
Some of the others don't use a central server
but instead keep track of some kind
of multidimensonal hypercubes
of all of the edits happening simultaneously.
They start requiring quite a lot of memory,
some of those algorithms.
Which is why they're not used as often in practice
and so this kind of works fine for Google's purposes
but remember what we wanted is to be able
to work within enter and encryption
and so in that case we can't have a server
transforming our messages
because the server would then have to see
the content of the messages
and we want what we want is,
well we want to avoid one central server
because that's a single point of failure
and we want to avoid that server seeing the content
of our messages.
So we want to be able to just forward the messages
and this is where CRDT's come in.
So CRDT stands for communative,
no sorry, conflict free replicated data types
which is a bit of a mouthful
which is people just say CRDTS
and essentially this is a family of data structures
where several nodes can concurrently change
the data and they could automatically merge
and so they've got, as part of the definition
of the data structure, they've got merge functions
or functions which allow you to apply operations
in a different order and still get the same
outcome at the end and if we want
to model something like a text document
like what I had in our editor example,
in that case we have an ordered list of characters
and so the data type we want here
is an ordered list and several different
algorithms were, have been proposed to that (mumbles),
more recent about in the last 10 years
these things have come up
and the one that I'll describe now
and that our text editor has based off
is called RGA, replicated growable array.
Which came out of a Korean research group in 2011.
So let me show you how this one works,
in a nutshell.
We start off with the same example document
which is H-E-L-O
but now instead of giving each letter
just an index, 0,1,2,3
I'm going to give each a unique identifier.
Each letter has a unique identifier
which might be just 0a, 1a, 2a, 3a
and we now have the same edits happen on the left side,
we insert the letter L
and on the right hand side, we insert the exclamation mark
and every time we insert a new letter,
we have to make up a new identifier for that letter
and we're going to have a little rule
for how we create new identifiers.
So we have, they have to be globally unique
and they also have the certain ordering property
and so we're going to construct them as follows.
Each identifier is a number and a letter
and for the number, we chose one greater
than the highest number we have so far in the document
and so the document so far contains 0 to three as numbers
so the next number we're going to pick is four
and then we call the left hand one is node A,
the right hand one is Node B
and so that will be the letter that we use
for the identifier and so if we assume
that the names of the nodes are unique.
So there's no other node called A, there's only one node
called A and the, furthermore we assume
that these numbers are generated by taking
one plus the highest we have.
So the numbers per node will always be unique.
So that means we generate 4a on the left hand side
and 4b on the right hand side.
So it's the same four because they're both in the
same starting position but they have two different node Ids
so we get two different identifiers
and so that works
and now we can send these things via server
or it doesn't even have to be a server
that goes via P2P network or anything else you like as well.
So I'll just use a server here for simplicity
and instead of inserting at the particular position,
we're now going to say insert L
and we're going to say the idea of the new letter,
which is 4a and we're going to say
where to insert it by saying after position 2a
and so 2a is the first L and so the new L
with 4a get's inserted after 2a
and on the other side, we insert the exclamation mark
after 4, after 3a.
3a is the letter O and so inserting
the exclamation mark with ID for b after 3a
means put the exclamation mark after the O
and so, we can now forward these messages.
So insert L with 4a after 2a.
2a here still means the first L,
so it's going to put the second L in the right place
and give it ID 4a
and on the other side, we can take this message
and apply that.
Insert exclamation mark after 3a.
3a is still the letter O
even though we inserted the letter L before it.
It hasn't changed the fact that the ID
of the letter O is still 3a
and so we can put the exclamation mark
with 4b after the letter O,
after 3a and so we end up with the same document
in both places which is very nice.
We can now just have some kind of network between
these two.
It doesn't have to transform the messages in anyway,
it just needs to make sure it keeps retrying
and eventually the messages get through.
That's all we need here.
We can encrypt them and it's all very nice.
Now there's still a problem with this algorithm,
which I wonder if anyone can spot?
Yes?
Deletion is one thing.
Yes I, deletion is actually quite easy.
So what we do for deletion is just to set a flag
for every ID and say that if it's deleted
then the ID actually remains there, secretly hidden,
but we're just going to say okay, this O is gone now,
so don't show it in the user interface.
That's called a tombstone.
There's not.
Same position, exactly.
So what if two people insert at the same position?
So let's have a simpler example document here.
A,b,c, with Ids 1a, 2a, 3a,
and now let's say X and Y get inserted here
between A and B.
So the left hand inserts X and Y,
gives them two new Ids according to the rules
that I specified earlier.
So 4a and 5a and
and so what that means here is insert X after 1a,
insert Y after 4a and on the other side,
on the right hand side, I'm going to insert P and Q
and we'll have the same rule again for assigning Ids
so they'll get 4b and 5b as their Ids
and they also appear between A and B
and now we need to make sure that all of these letters
end up in some kind of consistent order
on both nodes afterwards
and so lets take those operations and apply them
to the other side.
So insert P with the new Id 4b after 1a.
1a is the first letter A, so we insert P there
and then insert Q with letter,
with id 5b after 4b.
4b was the letter Q, which we've already inserted,
so we can put the, with the letter P.
4b with the P.
So the Q with 5b goes after the 4b
so we put A, P, Q, X, Y, B, C.
So now, okay that's fine.
So far now, we need to make sure that on the other side,
we're actually going to end up with the same thing.
So we need to make sure the X ends up here
after the P Q but before the B
because we put the X here,
then the two would be inconsistent.
So, well how do we do this now?
Because the message is going to be the same.
The message is just going to be,
insert X with ID 4a after 1a
but if we put it straight after 1a,
then the X will go here
and we have an inconsistency.
So I'm gong to introduce a small algorithm
which uses these Ids
and that is, look at the Id of the incoming operation here.
When you're applying an operation that came in
from a different node, in this case that's 4a.
So 4a is the incoming operation
and we first find the position after which
we want to insert.
So 1a is here.
So we start here, after the A
and we look at the next, the ID of the next element
in that list.
Which is 4b and if that ID here, 4b,
is greater than the incoming element,
ID 4a, then we're going to skip over it
and if it's greater, the next one
is greater again, we'll again skip over that
and we keep skipping until we find an element
that is less than the ID of the incoming operation.
So 2a is less than 4a.
So in this ordering here, we look at first the number
than the letter.
So here 4b is greater than 4a because B is greater than A.
5b is definitely greater than 4a
because five is greater than four
but two is less than four
so we know now we have to insert
between the 5b and the 2a.
Whew and then for the final operation is actually easy.
So this insert Y with Id 5a after 4a
while 4a is here.
So 5a just goes after it, that works,
and if you're wondering the skipping here
doesn't apply on this side because here
the 4b is greater than 4a
so we inserted this 4b here, we didn't skip over 4a
because 4b is greater than 4a
and so this actually works
and you kind of have to actually prove it
mathematically to convince yourself
this really works in all cases
but we have proved it and so we actually believed it
and so hoorah, we now have an algorithm
which will allow us to make these changes
concurrently without any transformation,
without any coordination and just allow
us to end up in the same place.
So let's say our document is Hey guys
and we decided that is not gender neutral enough,
so one person is going to change it to hey everyone
and the other person is going to change it
to hey folks and what does our editor do in this case now?
So I described just now how deletion works
and so deletion just means setting a flag on these
so if two people are concurrently delete
the same letter, they both delete the G of guys,
well that deleting it once is the same as deleting twice,
it doesn't get anymore deleted
through being deleted several times.
So those letters just get deleted
and the insertion stay
because insertion doesn't replace another insertion.
So as these two merge, what you're going to end up with
is the two insertion concatinated,
so it will be hey everyone folks
or maybe it will be hey folks everyone.
So the order of those two is arbitrary,
that just depends on coincidence of how
the networking happens to work out
and you know there's no order that is any,
one order isn't any better than the other.
The problem is just that what we have here,
everyone folks is not an English word
and so you kind of end up with junk in the document.
Though this is actually what Google Docs does as well.
So at this point I'm willing to just say,
okay we'll tell the users that look here,
this doesn't pass the spell checker
and it maybe can pop up a warning
saying hey several people edited the document
at the same place.
Actually Google Docs doesn't even do this warning.
It just leaves humans to spot it by themselves
and it seems to work in practice
so I think at this point we just say okay,
it's good enough.
We're not going to try to do a grammatical
analyze of the sentences and do the merges automatically
so that they obey English grammar
because that's just not going to work
and people have tried that
and it simply doesn't work.
So we've got this CRDT, this data structure
for all the lists which several people can change
at the same time.
Described deletion and how that works.
There's still a lot of open questions there.
So the problem with deletion here is
we need to actually remember for a long time
afterwards where a particular ID is
because we use these Ids as a position in the list
like address like a pointer
and so we can't just delete,
if something gets deleted from the document,
we can't then just remove it and forget about it
entirely because then some operation would come in
and insert after 500 and 3a
and we go 500 and 3a, sorry that was deleted.
I don't know where we put that insertion.
So we have to keep those things
which are called tombstones.
Also haven't talked about how to do undo,
like you know, control zed type of undo
or how to reorder the elements in the list.
That's kind of interesting
or how to make all of these Ids efficient
so it doesn't take too much storage.
So still a lot of open questions there.
I think the basic algorithm is kind of fine
but actually putting it into practice
is still going to need a bit of work
but I want to talk about some other stuff as well, actually
so we were talking about these JSON documents earlier
and so interesting things happened
when several people concurrently change JSON documents
that don't happen just in the case of a text document.
So let's say we've got here,
it's a bit of a contrived example I'm afraid,
a map of colors and so this is,
imagine this is an actual structure.
So it's not like curly brace qoutation mark CO etc
as a text document but it's actually the JSON syntax tree.
Described here and so one person inserts a new color,
the color red, into this map
and so you would then end up with a map
containing blue and red
and another person concurrently decides
actually they want to wipe out the entire contents
of the map so it sets it to the empty map
and then adds green into the map
and so how do we now merge these concurrent changes
that have happened here?
What happens, what's the outcome,
what is even the right behavior here?
What do we expect?
So we can think about it systematically
and say, well okay blue, does it contain blue?
Well blue was deleted on the right hand side
by setting the whole map to empty
so I guess blue should not be in the final result
cause it was deleted and the left hand side didn't touch it.
What about red?
Red intially didn't exist but it was inserted on the right,
on the left hand side and the right hand side
didn't touch red so I guess we need red
to be in the final result.
What about green?
Green was inserted on the right hand side
and it was not touched on the left hand side,
so yes we want green to be in the final results.
So our final expected outcome is to contain red and green
but not blue.
That seem logical?
I guess so.
So we can do that and our algorithm does this.
What this means here is that we have
to keep track of what this setting the map to empty means
because if you just take this and say,
on the left hand client,
you first add the red in and then you take that change
that came from the right hand side
which is setting the map to empty
and if you just apply that naively
and say you've got an operation
that has sent the map to empty,
well then you're going to wipe out the red
because the red was added concurrently.
So we have to somehow remember
what the state of the map was
at the time when you set it to empty.
So we have here, the thing is known as a causal context
or something like that.
In databases like react or example,
has anyone used react?
They have this data types feature, okay.
It works very similair to this that you have
this little extra bit of information that you pass around
in a HTP header and that,
the purpose of that is to keep track of
what is the state, what was the state of the document
at this time when you saw it so that then later
you know what the changes need to apply to.
So let's look at a different example.
Here we've got again an to do list example
and we've got buying milk and it is not yet done
and so on the left hand side an edit happens
and it just deletes buy milk from the list
because maybe it's been done
or maybe we no longer need milk or whatever,
it's just removed from the list.
On the right hand side, somebody sets done to true.
So taps the button to set buying milk done to true.
What happens if we try to merge these two different updates?
We can apply the same logic as we have just now
with the colors with the nested maps.
So with the colors what we said is okay,
does it contain each of the different things
depending on who edited what?
Okay so the title of buying milk,
on the left hand side that was deleted
because we deleted the entire to do item.
So I guess the title should not be in the result.
On the right hand side, we didn't touch the title.
So that's got to be deleted.
What about done false?
Well done false was deleted on the left hand side
and over written on the right hand side with done true.
So I guess done false is out.
What about done true?
Done true did not appear on the left hand side.
On the right hand side, done true was added
so we want done true to be in the end result.
So if we apply exactly the same reasoning logic
what we end up with is a to do item
that contains the flag done true but no title
and this is kind of not what you expect
because like we've applied exactly the same logic
but what we've ended up with something is that looks weird
and it looks weird because we kind of have this implicit
scheme idea we expected to do item
to always have a title and the done field
but this merge here has essentially
tampered with our schema.
So what do we do here?
Maybe we need a schema to explicitly say exactly
what fields something must have
but in that case what do we do?
So if somebody changes, if one person changes a field
within an object that has a schema
and another person deletes that entire object,
well do we say the deletion wins?
So in that case, any changes
to stuff within the deleted item
are just going to be lost
but we said earlier we don't want to lose data
so how do we resolve this?
Well maybe we say, actually the,
if somebody concurrently changes this
and sets done to true
then we actually going to bring the title back as well.
So even though the title was actually deleted
on the left hand side, we kind of ressurect
this whole deleted item and say,
okay it's gonna both the title and one true
but then well we forgot about the fact
that somebody deleted this item.
So it seems like something has got to give here
and we don't really know what it is.
We could say that maybe overwriting something
with an empty map should not have the same semantics
as deleting it and re adding it again
or maybe it should have the same semantics.
So I'm afraid this is going to end
on a slightly depressing note which is,
like we simply don't know how to expose
these kind of concurrently editable data structures
to application in a way that is not horrendously computing
and so I think that, I think there's a lot of value
in having these kind of data structures
that you can just merge automatically
and not have to worry about
writing manual conflict resolution code
but at the same time, conccurenacy still is hard,
even if you abstract away the concurrent communication
and everything, you've still got the problem
that somehow you can sometimes end
up in these merge situations where's there no one right way
of doing it and so if any of you have ideas,
I'd love to hear them.
Otherwise, we're continuing to work on this
and just try different approaches,
see what kind of APIs make sense to developers.
We could go back to the old bad days
and say okay, we're just going to have this,
this oh crap scenario,
we're just going to let users resolve
all of the edits manually.
I don't think that's really friendly to users
and so I think that like the popularity of things
like Google Docs and indeed like,
Meld or things like that for conflict resolution
get, shows that I actually, people do need
a bit of tooling and help in order to,
in order to resolve conflict.
We could say, okay well just put everything
on a central server and serialize everything.
That's an option too but again,
we've got this problem that you require
a network communication all the time
and if stuff doesn't work offline,
which is also a shame.
So I think it is worth working on this problem
but it is an open problem.
If you're interested in more details on this,
I've put, tweeted the slides already
so you can find it there
and there are links to all of the papers.
Heres the second page and here is a third page
of references.
So our paper is one of those
and there are also several others
about different CRTDs and different operational
transformation functions.
So that goes into vast amounts of detail.
Here is a book that I've been writing
that I've just sent off to the publishers
just two days ago.
So you an get an early release of this online already.
This is, it's not specifically about merging,
it's very broad, kind of introduction
to the architecture of databases
and what do databases do under the hood, also.
So if you're interested in this sort of thing,
I'd love if you could check it out.
Maybe give me any feedback if you
have any thoughts about it.
That would be wonderful.
Thank you very much all for coming
and I hope you have a great rest of the conference.
(applause)
Do we have a minute or two for questions?
I'm a bit short of time.
We do?
Okay, does anyone have any questions
or comments or anything?
We have some on the app.
That's alright.
So the question is, "Apart from the operational
transformation algorithm,
is there any algorithm which uses weighted operators
to decide the last operation?
For instance, the delete operation
has the lowest impact for that loses.
In terms of weight maybe,
neural network probabilities making decision faster?"
So you can used weighting except
that doesn't fundamentally resolve the problem.
Which is that you can end up then
in some kind of scenarios where you've just
got an impasse and you've got two things
with the same weighting and one of them has got to win
and so you can then arbitraily decide
whether one thing should win over another
and so that's what I was getting at with
kind of these trade offs here of like,
you could specify maybe in a schema
maybe some kind of semantic annotation
saying we want delete to win
or we want update to win.
The problem there is that it's really hard, I think,
to communicate that to developers of like
what does that actually mean?
If you don't have PHD in distributed systems,
will you still be able to understand
what this flag in your schema actually means?
So I think just making stuff comphrensiable
in a way that like, somebody can just go
and build an app and they don't have
to know about all of the internal details
of how conflict resolved internally.
I think would be very good
and so maybe priorities is one way of doing that
but I'm not sure.
The other question just there was,
"How did I make my slides?"
Which is using an iPad, I draw them by hand
using a iPad app called Paper
by a company called 53.
- [Man] I think you should wrap up.
- Okay, thank you very much for coming.
(applause)