Placeholder Image

字幕列表 影片播放

  • (smooth electronic music)

  • - Hello, everybody, thank you very much for coming.

  • Good morning.

  • Hope you are doing well.

  • I'd like to talk today about conflict resolution

  • in distributed systems, that is if several

  • people change some data independently of each other,

  • what happens, how do we resolve those conflicts

  • that'll occur.

  • My background is I'm a researcher at the University

  • of Cambridge.

  • I was previously in industry

  • and a bunch of internet start ups.

  • So I worked at Linkedin for a couple of years,

  • for example.

  • At the moment, I'm working on this research product

  • called Trve Data.

  • Spelled T-R-V-E

  • and what we're trying to do here

  • is to bring end to end encryption

  • to a larger range of applications.

  • So think something like Google Docs

  • where several people can edit a document

  • at the same time online

  • but without having to trust Google servers

  • because what we want to do is

  • to be able to put data on various servers in the cloud

  • but not have to worry about what happens

  • if they get compromised or so on.

  • So that's kind of the background of all of this.

  • I'm not talking about the encryption

  • and the security protocols today,

  • I'm only focusing on one little piece of that whole project

  • which is what happens if several people edit data

  • at the same time and how do we resolve that.

  • So I'd like to start with a scenario

  • that will probably be familiar with you,

  • which is you, a little blue stick figure here,

  • are hacking on some code on your computer

  • and at some point you decide that this code is done

  • and you commit it using your favorite

  • version of control system.

  • I'll just use git as an example here

  • and so at this point, you'd put the code

  • in the repository and then maybe you push it somewhere

  • so that other people can see that `code as well.

  • So in the case of Git, you maybe you'll push

  • it to a repository on Github

  • and this is now the communication mechanism

  • for people with your team.

  • So if there's somebody else, say,

  • this little red stick figure

  • who is also hacking on code,

  • then well you can synchronize up through

  • the central repository.

  • This is all very familiar, this is what

  • we do everyday

  • and so the little red person here

  • might independently at the same time

  • also be working on the same code base

  • and also do a commit and I'll,

  • what happens if you this person now fetches

  • from Github, well they'll have to either

  • do a merge or rebase if that's how your work flow goes

  • or something along those lines.

  • So somehow these changes are going

  • to have to be combined together

  • and as you've probably experienced,

  • if people change different files in the same repository,

  • that's no problem, they will just get merged cleanly.

  • If one person changes the beginning of a file

  • and another person changes the end of the file,

  • that's probably okay because the version control system

  • will merge them automatically.

  • If people change the same part of the same file,

  • then you're going to have to resolve the merge conflict

  • yourself and then we have these tools

  • for doing three way mergers for copying patches

  • from one side to another and figuring out

  • what the results should be.

  • So you've probably have to find with things like this.

  • This is exactly the kind of problem I'm talking about.

  • But this problem happens not only in software development.

  • It's a very general purpose problem.

  • So imagine you're a lawyer working in a law firm

  • and maybe there's a contract being negotiated

  • with, you've got a client on one side

  • and the other company's law firm on the other side

  • and everybody is sending these versions

  • of contracts back and forth

  • and the contracts are probably Microsoft Word documents

  • because that's how lawyers work

  • and they send these things by email

  • and so you've got one person making changes

  • to these Word documents and then hit save

  • and then at the same time, maybe somebody else,

  • maybe at the different company

  • is also updating the same document.

  • Changes it and now you email these changes to each other

  • and so this is actually very much the same data flow

  • and I actually just reused the same diagram

  • and changed the labels.

  • You've got the email as the communication path

  • and at some point these changes are going

  • to have to be merged together

  • and now before Microsoft Word,

  • I'm not sure there is even this kind

  • of nice user interface for three way mergers.

  • I know you can compare two documents

  • but I think what people end up doing

  • is manually copying the changes

  • from one version of the document to another

  • and so performing this merge really manually.

  • So that's kind of this oh crap situation here.

  • So in this case, it's kind of best to have like

  • an informal lock where one person says,

  • okay I'm going to be editing the document now.

  • Please don't change it for the next day.

  • I'll send it to you and then you can edit it.

  • So people try to sequence their updates

  • like this crude manual communication.

  • What happens in another?

  • Let's look at the third example.

  • Let's look at a to do list.

  • So this is maybe a shared

  • to do list where me and my wife together

  • have this shopping list where I can add stuff

  • and she can add stuff and then whoever next

  • goes to the shop can buy those things.

  • So here buying milk is added

  • to the to do list and let's say this to do list

  • is stored on a central server.

  • Now so thats what allows us to communicate

  • and so I add buy milk to the to do list

  • and press okay button and so it does a request,

  • like maybe an http post request

  • over the network, stores it there, it comes back

  • and says okay that was added to the to do list

  • and at the same time, maybe my wife goes,

  • oh you need to water the plants, so I just remembered.

  • So add stuff to the to do list

  • also does this post to the server

  • and comes back okay

  • and so in this case, actually what has happened

  • is if this central server stores its data

  • in a data base and it uses something like transactions.

  • If this is a relational database say,

  • then we actually have serialization going on

  • and that is these updates are actually applied

  • in a sequential order, in a serial order.

  • That's where serializable comes from

  • and database transactions.

  • Which means they're applied one at a time.

  • So you don't actually have the same concurrency problem

  • as we had with the code editing

  • and with the Word documents being sent back and forth.

  • Cause actually, there's one primary copy of the data

  • and that lives on the server

  • and that's being updated one transaction

  • at a time in sequence.

  • So in this case, you don't

  • get this conflict resolution problem.

  • So this seems nice but on the other hand

  • you have a problem which is well,

  • what if I don't have signal on my mobile phone right now

  • or if the network is interuppted for some other reason

  • or I can press the button to save

  • and I'll just a spinning, spinning wait indicator

  • and nothing will happen.

  • So here we have this, it's kind of obvious,

  • this problem if you don't have internet connection

  • then you can't reach the central server

  • so you can't store any data.

  • You can't edit the data in anyway.

  • So it doesn't work offline.

  • So these are the problems here.

  • We get the advantage from a central server

  • that we don't have to worry about

  • this concurrency and these different edits

  • happening at the same time

  • and having to merge those together

  • but at the cost of it only works online.

  • So we now we require constant connectivity to the server.

  • Which is not so great.

  • Especially if you're doing things on mobile devices.

  • Another problem is that these requests

  • are synchornous, so when I press the save button,

  • I have to wait until I get the okay back from the server

  • and only then I notice it is actually being saved.

  • So until I get the okay back,

  • I don't know whether the network request

  • actually made it through or not.

  • Maybe if the network is unrealiable.

  • So that also can be a timing problem.

  • Let's see in the document editing case,

  • say in the case of Google Docs,

  • you know you can press one letter

  • and then a second later

  • that letter will appear on the screen

  • for another person who's editing the same document.

  • So there the edits of, the units of editing

  • is a single key stroke.

  • Its not even like a commit or something like that.

  • It's just a single letter

  • and so in that case, if you wanted to send that

  • through a central server

  • than every time you press a letter,

  • you'd have to first send that to the server,

  • wait for it to be saved there,

  • wait for the server to come back to you,

  • say okay, and then you could

  • display the letter on the screen

  • and so you would have to wait that network round trip

  • for every single change to the document.

  • Which is actually what happens exactly

  • if you're editing a file over SSH.

  • So if you're logged into a Linux server

  • and you're using Vim or Emax or something

  • on the server and you type a letter

  • then that edit actually lives on the server.

  • So you've actually got that synchronus round trip.

  • So if you've got a slow one, reliable network,

  • and using SSH is pretty painful,

  • as you've probably experienced.

  • Our final problem with putting everything

  • on a central server is that now

  • there's this single point of failure

  • and people make fun of Github for example,

  • every time Github goes down,

  • people say, oh we've got this nice decentrilized

  • version control system and what do we do?

  • We put everything back in a centralized service.

  • Isn't that great?

  • So the sync can be disrupted.

  • If you worry about denial of surface attacks,

  • for example, or if you worry about blocking,

  • maybe in some countries that don't have such

  • free internet access, if you're critical

  • of the local regime or something like that,

  • actually this kind of blockability

  • is quite a problem.

  • So what we'd like to do is

  • to figure out how to not have these problems

  • of the single central server

  • but at the same time,

  • not have all of the problems of having

  • to do the merges by hand.

  • So coming back to the to do list example,

  • if we think about this having

  • to synchronously having to communicate

  • with a server, that's okay if the server is in the same town

  • but if the server is on the other side of the planet,

  • then this is quite slow

  • because it's simply speed of light

  • takes a while to go all the way round

  • to the other side of the world and back again.

  • So what if we just put data centers in several

  • different places?

  • So let's say when the blue person adds buy milk

  • to the to do list, that request just goes

  • to the blue person's local data center,

  • call it data center one and get saved there,

  • and when the red person adds water plants

  • to the to do list, then that goes

  • to the local data center, too.

  • Those are two different local data centers.

  • If the people are in different locations,

  • and so now that's okay.

  • So each person can get the response from their local

  • data center and now these changes will be

  • propagated asynchronously and so

  • the speed of light going all the way around

  • the planet is still as slow as it was before.

  • So it could happen that two people make these changes

  • without knowing about each other

  • and then you simply don't know which one of these

  • actually came first.

  • Because you know it's not really defined.

  • Did the blue come first or the red one.

  • From the point of view of data center one,

  • first the blue one came and then the red one came in

  • but from the point of view of data center two

  • it was first the red one then the blue one.

  • So if this adding an item to the to do list

  • means just appending it to the end of the list,

  • then which order should these items appear?

  • Now we certainly have to worry about the fact

  • that the data center one might

  • have buying milk first and watering the plants second

  • and the other data center might have them flipped

  • in the other order.

  • Oh dear.

  • Let's think this further.

  • Even without two data centers,

  • we can just make it really extreme

  • and say okay we're not even going to talk

  • to the data center, we're just going

  • to talk to our local storage on the device

  • and I'm going to treat the storage on my mobile phone

  • as a data center though it's the best data center ever

  • because I can't have a network interupption between

  • me and the storage on my phone.

  • So this communication is always going to work.

  • So I can just store something locally

  • and that will be nice and fast

  • and the red user can similarly store something

  • locally and then we can just kind of have

  • some kind of blah blah network stuff

  • to synchronize the changes around

  • and if you think about it,

  • this is exactly the same as the synchronization process

  • that has to happen with the two data centers,

  • exchanging data and this is exactly

  • the same synchronization process

  • that happened with the Git commits

  • being sent via Github or with the Word documents

  • being sent by email.

  • It's all exactly the same,

  • which is you've got some changes which are happening

  • concurrently, independently from each other

  • and those changes are being propagated asynchronously

  • and here, of course, now we're back

  • to this problem of conflict.

  • So take the example of adding two items

  • to the to do list.

  • You can kind of imagine that quite easy to resolve.

  • You just make sure that you somehow

  • decide on a consistent order.

  • Maybe using some Ids or something

  • or some user Ids.

  • You can order them arbtrially

  • but what if one item,

  • one change is deleted to do item

  • and the other actually edits that same to do item.

  • So the red one changes buying milk to buying soy milk

  • and the blue one just deletes the buying milk item

  • and these two changes happen without knowing about

  • each other, so how do we resolve this?

  • Because if you say that the deletion wins.

  • So, okay this item was deleted,

  • the to do list item was deleted

  • so that edit to change buy milk to buy soy milk

  • is just gone because the item that it refers to

  • is no longer there

  • but that means that we've forgotten about the fact

  • that soy milk was added in there

  • and so maybe that's an important fact

  • that we've now lost.

  • On the other hand, we can do it the way other way around.

  • We could say that this buying soy milk wins

  • but when the deletion has been lost

  • and so maybe the deletion is meaningful.

  • So how do we resolve these kind of conflicts?

  • In general, we have concurrent operations

  • that happen, concurrent doesn't mean necessarily

  • at the same moment in time.

  • It just means they happen without knowing about each other

  • and somehow we need to achieve convergence

  • where everybody agrees on the same data at the end,

  • and so people talk about eventual consistency

  • which is kind of the data base term

  • that's often used for describing these types of system

  • where you simply, you can allow different things

  • to happen concurrently and you just want

  • to end up in the same state at the end.

  • The problem with the term eventual consistency

  • is that it's actually very vaguely defined

  • and so people are very imprecise when talking about it

  • and I'd like to break it down into three points,

  • three properties that are a bit more precise.

  • So I'm going to firstly assume

  • eventual delivery and that is we sent messages

  • over the network and we're going to assume

  • that the network is not interuppted forever.

  • So we're going to assume that after some finite amount

  • of time, you get a network connection again

  • and so if you keep retrying then you can get

  • the message through eventually.

  • You kind of have to make this assumption

  • because if you assume that a device can be

  • offline forever, then it can never come in synch

  • with the other devices again.

  • Just by definition of offline.

  • So we're just going to assume

  • that eventually some messages go through

  • but we're not going to make any assumptions

  • about how quick that is.

  • So if I go to Iceland and there's

  • no mobile signal and I'm hiking

  • somewhere up on the top of a volcano

  • for two weeks, then I won't get any messages

  • for two weeks and when I get back

  • to Reykjavik, then I will get some messages

  • when I reach an internet connection again.

  • So that means my message delay is two weeks in that case.

  • That's okay, we're just going to assume that's alright.

  • Secondly, the second part of eventual consistency

  • is making sure that everybody ends up in the same state.

  • That is if we assume that eventually everybody

  • gets all the messages, then if two people

  • have received the same messages, then they should

  • be in the same state.

  • So that means even if they received the messages

  • in a different order, they should end up in the same state

  • and finally what we want is to not lose data.

  • Now this, it seems like a kind of ridiculous point

  • because of course nobody wants to lose data

  • but it is actually quite important to make this explicit.

  • So quite a few data base users use this mode

  • called last writer wins

  • where you say if two people change the data

  • at the same time, we're just going

  • to pick one of them arbitrally

  • based on the time stamp as being the winner

  • and the other ones were just going to throw away

  • and so this is like equivalent if you think

  • about the word document example of,

  • well we've got two people editing the Word document

  • and one person emailed their change to another

  • and the other person is just going to say,

  • well I made changes myself as well

  • and I'm going to declare that my changes are

  • newer than your changes

  • and so I'm just going to ignore your changes, sorry,

  • and so this is not very friendly to people

  • because changes get lost.

  • So let's require not losing data as well.

  • So this is what I mean with eventual consistency

  • and this is the kind of data structure

  • to which we might want to apply it.

  • So I'm going to model this to do list now

  • as a JSON document and you can just imagine

  • it's like a list of to do items

  • which is like each to do item has a title

  • and the flag done, whether it's done or not,

  • true or false and then maybe there's some settings

  • and stuff as well on the side,

  • and so the main data structures that we have here

  • are an ordered list.

  • That is you've got a list of to do items

  • and they have to appear in a certain order

  • and the user specifies this order

  • and also you have maps.

  • So you've got, maybe one adjacent object

  • inside another adjacent object or something like that.

  • Now once we've got these data structures

  • we can then make various changes to them.

  • So as the user interact with the application,

  • they will make changes.

  • For example, they might set watering plants to true.

  • So they say, okay this is now done,

  • I'm going to check the box on my phone

  • and what that does is set this done flag

  • from false to true.

  • So the operation here, the change operation,

  • the edit operation, is assigning a value

  • to a particular field in this JSON document.

  • Another thing that might happen

  • is somebody might edit a string.

  • So change milk, change buy milk to buy soy milk.

  • So by inserting a few letters into that string,

  • another thing that might, people might do

  • is insert a whole new list item

  • of phone mum between buy milk and before watering water

  • and plants.

  • So this is editing the ordered list object

  • to insert a new item.

  • Another thing that people might do

  • is add another key to this map here

  • or they might even delete an entire entry.

  • So the top level to do item, just delete the entire list.

  • Why not?

  • Maybe you have several lists or something like that.

  • So these are the kinds of changes that people

  • can make to these documents

  • and what we want to do is have some way of resolving

  • those changes.

  • So that even if people make these changes concurrently

  • to each other, we end up with eventual consistency.

  • So we can model this document as a tree

  • and we can have some data type annotations on it.

  • So say the top level document is a map

  • and inside it we have a list under the key to do

  • and so on.

  • Won't go into too much detail.

  • So this is actually an algorithm that I developed

  • with a colleague and we wrote this paper about it

  • a few months ago.

  • So if you're interested in that, you can find it online.

  • It's called a Conflict-Free Replicated JSON Datatype.

  • Has anyone here heard of CRDTS before?

  • Okay a few people, cool.

  • So this is an example of a CRDT.

  • If you don't know what a CRDT is, don't worry.

  • This paper is very theoretical .

  • It looks inside, it looks kind of like this.

  • So I'm not actually going to run through

  • all of the operational semantics today,

  • so don't worry about that.

  • I'm just going to kind give the intuition

  • about how the algorithm works

  • and just show kind of some of the curious edge cases

  • that occur there.

  • Some of the stuff that we have to think about

  • when trying to do this.

  • So the hope is that this algorithm

  • would allow people to concurrently edit

  • JSON documents and merge those edits

  • and end up with a sensible result at the end.

  • So one example of a document,

  • we saw the to do list example,

  • another one might be simply a text document.

  • So.

  • So a text document consists of the file name,

  • which is just a string.

  • Consists of characters or of the body,

  • of the actual text.

  • So each individual characters.

  • Like the smallest unit you can edit

  • and like maybe there might be additional stuff

  • for formatting, setting fonts and so on,

  • I'm just going to leave all of that out.

  • Another thing that's quite useful is cursor positions.

  • So if you use Google Docs, you can see where

  • in the document another user is editing right now.

  • That's quite handy to see so that you don't change

  • the same place at the same time while you're online.

  • So you could imagine implementing that as a map

  • from each client has a position.

  • A position is like a location in the document

  • and so that would allow you to keep track

  • of other people's cursors.

  • So I'll just focus on the body characters for now.

  • So the ordered list of characters

  • is what constitutes the content of the document.

  • You might wonder, why this is,

  • why this is a list of strings

  • and not a list of characters.

  • So begin footnotes.

  • It's a list of strings because

  • in Unicode, if you actually represent like one character,

  • the smallest editable unit of the document

  • is not necessarily a single Unicode code point.

  • Because you have things like combining,

  • combining marks, I think they're called for accents,

  • and diacritics and various things

  • or for emoji, I think the skin color annotation

  • is a combining mark and so you end up with a sequence

  • of several Unicode code points that constitutes

  • an character from user points of view.

  • If you're interested in this, look at the Unicode annex

  • number 29.

  • Anyway, end of footnote.

  • This is completely irrelevant.

  • I just wanted to mention that as well.

  • I'll give a little demo of the text editor

  • we implemented because its just kind of fun

  • to make this a bit more interactive.

  • So what I have here is something that doesn't work.

  • Sorry.

  • This is always the thing with demos.

  • Maybe it has to be on the wifi.

  • Okay, let's try again.

  • Okay, good.

  • So.

  • So this is a very basic text editor

  • that we implemented using this data structure,

  • this algorithm that we developed

  • and I've got actually two windows here, left and right,

  • which are both instances of this text editor running

  • and I can say hello to Berlin and it works.

  • So you can see I can type something on one side

  • and it appears on the other side

  • and these two editors are actually communicating

  • via a network connection.

  • So they are two separate processes

  • that are otherwise don't share anything.

  • So they could easily be on two separate computers

  • on other sides of the world, no problem

  • and what I can now do is can I kill this server.

  • So they use a web socket server here to communicate

  • and when I kill the server,

  • what I'm simulating is a network interupption

  • between the two.

  • So now both editors are offline.

  • So this could be because actually it's a server outage

  • or just because the clients have lost their internet

  • connection, it doesn't really matter

  • and so I can keep editing offline.

  • So lets say here.

  • So I'll say, Hello everyone at GOTO Berlin

  • and you see it doesn't appear on the right hand side,

  • because they're offline.

  • So we have these two now

  • and so I'm going to restart the server now

  • and what we want is for all of those changes

  • to be preserved.

  • So make sure, look at that everyone at is still there

  • and hope you're all doing well is still there.

  • So when I restart the server, the editors

  • in the background keep trying to reconnect

  • to the server automatically

  • and keep retrying in the background

  • and eventually they will mange to connect

  • and resychronize and now you see everyone at

  • has been copied from the left to the right

  • and the hope you're all doing well

  • has been copied from the right to the left

  • and we didn't have any like three way merge

  • user interface for this.

  • Like it just did this automatically.

  • Now, let's have a look at how this

  • algorithm actually works because

  • that's kind of interesting.

  • This is basically the same as what Google Docs does, right?

  • So, Google Docs is a bit fussy

  • with its offline support

  • but if you force it into actually doing it,

  • it essentially does the same kind of merge

  • and I can run a bit of how the algorithm

  • in Google Docs work.

  • So imagine you have a document consisting

  • of the letters H E L and O

  • and you label each letter with the index

  • of what position it is.

  • So 0, 1, 2, 3 and on the left hand side

  • we've got the green editor

  • and on the right hand side, we've got the purple editor

  • and each editor now makes an edit to this.

  • So the left hand side inserts a second L character

  • to change it to Hello

  • and the right hand side inserts and exclamation mark

  • at the end, making it H-E-L-O-exclamation mark

  • and so now we want to merge these two changes

  • that have happened concurrently

  • and so in order to do that,

  • we have the server to,

  • this is, in this case, run by Google

  • and these clients send essentially a dif

  • or like an operation recording what the change,

  • what change has been made by the user

  • and sent that operation over to the server

  • and so the left hand side says insert L

  • at position three.

  • So you see 0,1,2,3,4 so we inserted the L

  • so the O has moved from position three

  • to position four

  • and so we inserted that L there

  • and so this side, we have the inserting exclamation mark

  • at the position four

  • and so that describes the change that was made

  • and now server forwards those changes over

  • to the other client and so the green change,

  • inserting the L gets forward by the server over here.

  • Insert L at position three,

  • position three is the right position here.

  • So you end up with Hello exclamation mark.

  • Which is what we wanted.

  • However, if you think about what happens

  • in the other direction.

  • So this insert exclamation mark at position four,

  • if we simply send that through unchanged,

  • what we will get is Hell exclamation mark O

  • because over here, we inserted the letter L

  • at position three so the O moved along

  • from three to four.

  • So really we would need to change that

  • to insert exclamation mark at position five

  • and only then we would get the right outcome

  • of having Hello exclamation mark

  • and so what has to happen here is that

  • this position four needs to be changed to position five

  • because there was concurrently an insert

  • at position three.

  • So the server has to keep track

  • of all of these things going on simultaneously

  • and it has to transform the messages,

  • rewriting this three to four.

  • Some of the transformation happens on the client as well

  • but this algorithm does actually depend

  • on the server to do some work as well

  • and that works.

  • So this algorithm is called operational transformation

  • and it's been around for quite awhile.

  • So it was first discussed in the academic literature

  • back in the 1980s.

  • Although the first paper that presented

  • this algorithm was actually incorrect

  • and they said so in the paper.

  • They said, heres a case in which our algorithm fails.

  • Can someone help us fix it please?

  • And then several people, researchers came along

  • and proposed fixes to it, which worked,

  • and there were several different competing

  • algorithms there.

  • The one that most of the,

  • well done operational transformation based systems

  • are kind of inherit from is called Jupiter

  • from 1995 and that's what like Google Docs

  • and Ethopad is based off

  • and Google Wave which is now Apache Wave.

  • They all use this Jupiter design

  • which is to use a central server

  • that does some of the transformations.

  • Some of the others don't use a central server

  • but instead keep track of some kind

  • of multidimensonal hypercubes

  • of all of the edits happening simultaneously.

  • They start requiring quite a lot of memory,

  • some of those algorithms.

  • Which is why they're not used as often in practice

  • and so this kind of works fine for Google's purposes

  • but remember what we wanted is to be able

  • to work within enter and encryption

  • and so in that case we can't have a server

  • transforming our messages

  • because the server would then have to see

  • the content of the messages

  • and we want what we want is,

  • well we want to avoid one central server

  • because that's a single point of failure

  • and we want to avoid that server seeing the content

  • of our messages.

  • So we want to be able to just forward the messages

  • and this is where CRDT's come in.

  • So CRDT stands for communative,

  • no sorry, conflict free replicated data types

  • which is a bit of a mouthful

  • which is people just say CRDTS

  • and essentially this is a family of data structures

  • where several nodes can concurrently change

  • the data and they could automatically merge

  • and so they've got, as part of the definition

  • of the data structure, they've got merge functions

  • or functions which allow you to apply operations

  • in a different order and still get the same

  • outcome at the end and if we want

  • to model something like a text document

  • like what I had in our editor example,

  • in that case we have an ordered list of characters

  • and so the data type we want here

  • is an ordered list and several different

  • algorithms were, have been proposed to that (mumbles),

  • more recent about in the last 10 years

  • these things have come up

  • and the one that I'll describe now

  • and that our text editor has based off

  • is called RGA, replicated growable array.

  • Which came out of a Korean research group in 2011.

  • So let me show you how this one works,

  • in a nutshell.

  • We start off with the same example document

  • which is H-E-L-O

  • but now instead of giving each letter

  • just an index, 0,1,2,3

  • I'm going to give each a unique identifier.

  • Each letter has a unique identifier

  • which might be just 0a, 1a, 2a, 3a

  • and we now have the same edits happen on the left side,

  • we insert the letter L

  • and on the right hand side, we insert the exclamation mark

  • and every time we insert a new letter,

  • we have to make up a new identifier for that letter

  • and we're going to have a little rule

  • for how we create new identifiers.

  • So we have, they have to be globally unique

  • and they also have the certain ordering property

  • and so we're going to construct them as follows.

  • Each identifier is a number and a letter

  • and for the number, we chose one greater

  • than the highest number we have so far in the document

  • and so the document so far contains 0 to three as numbers

  • so the next number we're going to pick is four

  • and then we call the left hand one is node A,

  • the right hand one is Node B

  • and so that will be the letter that we use

  • for the identifier and so if we assume

  • that the names of the nodes are unique.

  • So there's no other node called A, there's only one node

  • called A and the, furthermore we assume

  • that these numbers are generated by taking

  • one plus the highest we have.

  • So the numbers per node will always be unique.

  • So that means we generate 4a on the left hand side

  • and 4b on the right hand side.

  • So it's the same four because they're both in the

  • same starting position but they have two different node Ids

  • so we get two different identifiers

  • and so that works

  • and now we can send these things via server

  • or it doesn't even have to be a server

  • that goes via P2P network or anything else you like as well.

  • So I'll just use a server here for simplicity

  • and instead of inserting at the particular position,

  • we're now going to say insert L

  • and we're going to say the idea of the new letter,

  • which is 4a and we're going to say

  • where to insert it by saying after position 2a

  • and so 2a is the first L and so the new L

  • with 4a get's inserted after 2a

  • and on the other side, we insert the exclamation mark

  • after 4, after 3a.

  • 3a is the letter O and so inserting

  • the exclamation mark with ID for b after 3a

  • means put the exclamation mark after the O

  • and so, we can now forward these messages.

  • So insert L with 4a after 2a.

  • 2a here still means the first L,

  • so it's going to put the second L in the right place

  • and give it ID 4a

  • and on the other side, we can take this message

  • and apply that.

  • Insert exclamation mark after 3a.

  • 3a is still the letter O

  • even though we inserted the letter L before it.

  • It hasn't changed the fact that the ID

  • of the letter O is still 3a

  • and so we can put the exclamation mark

  • with 4b after the letter O,

  • after 3a and so we end up with the same document

  • in both places which is very nice.

  • We can now just have some kind of network between

  • these two.

  • It doesn't have to transform the messages in anyway,

  • it just needs to make sure it keeps retrying

  • and eventually the messages get through.

  • That's all we need here.

  • We can encrypt them and it's all very nice.

  • Now there's still a problem with this algorithm,

  • which I wonder if anyone can spot?

  • Yes?

  • Deletion is one thing.

  • Yes I, deletion is actually quite easy.

  • So what we do for deletion is just to set a flag

  • for every ID and say that if it's deleted

  • then the ID actually remains there, secretly hidden,

  • but we're just going to say okay, this O is gone now,

  • so don't show it in the user interface.

  • That's called a tombstone.

  • There's not.

  • Same position, exactly.

  • So what if two people insert at the same position?

  • So let's have a simpler example document here.

  • A,b,c, with Ids 1a, 2a, 3a,

  • and now let's say X and Y get inserted here

  • between A and B.

  • So the left hand inserts X and Y,

  • gives them two new Ids according to the rules

  • that I specified earlier.

  • So 4a and 5a and

  • and so what that means here is insert X after 1a,

  • insert Y after 4a and on the other side,

  • on the right hand side, I'm going to insert P and Q

  • and we'll have the same rule again for assigning Ids

  • so they'll get 4b and 5b as their Ids

  • and they also appear between A and B

  • and now we need to make sure that all of these letters

  • end up in some kind of consistent order

  • on both nodes afterwards

  • and so lets take those operations and apply them

  • to the other side.

  • So insert P with the new Id 4b after 1a.

  • 1a is the first letter A, so we insert P there

  • and then insert Q with letter,

  • with id 5b after 4b.

  • 4b was the letter Q, which we've already inserted,

  • so we can put the, with the letter P.

  • 4b with the P.

  • So the Q with 5b goes after the 4b

  • so we put A, P, Q, X, Y, B, C.

  • So now, okay that's fine.

  • So far now, we need to make sure that on the other side,

  • we're actually going to end up with the same thing.

  • So we need to make sure the X ends up here

  • after the P Q but before the B

  • because we put the X here,

  • then the two would be inconsistent.

  • So, well how do we do this now?

  • Because the message is going to be the same.

  • The message is just going to be,

  • insert X with ID 4a after 1a

  • but if we put it straight after 1a,

  • then the X will go here

  • and we have an inconsistency.

  • So I'm gong to introduce a small algorithm

  • which uses these Ids

  • and that is, look at the Id of the incoming operation here.

  • When you're applying an operation that came in

  • from a different node, in this case that's 4a.

  • So 4a is the incoming operation

  • and we first find the position after which

  • we want to insert.

  • So 1a is here.

  • So we start here, after the A

  • and we look at the next, the ID of the next element

  • in that list.

  • Which is 4b and if that ID here, 4b,

  • is greater than the incoming element,

  • ID 4a, then we're going to skip over it

  • and if it's greater, the next one

  • is greater again, we'll again skip over that

  • and we keep skipping until we find an element

  • that is less than the ID of the incoming operation.

  • So 2a is less than 4a.

  • So in this ordering here, we look at first the number

  • than the letter.

  • So here 4b is greater than 4a because B is greater than A.

  • 5b is definitely greater than 4a

  • because five is greater than four

  • but two is less than four

  • so we know now we have to insert

  • between the 5b and the 2a.

  • Whew and then for the final operation is actually easy.

  • So this insert Y with Id 5a after 4a

  • while 4a is here.

  • So 5a just goes after it, that works,

  • and if you're wondering the skipping here

  • doesn't apply on this side because here

  • the 4b is greater than 4a

  • so we inserted this 4b here, we didn't skip over 4a

  • because 4b is greater than 4a

  • and so this actually works

  • and you kind of have to actually prove it

  • mathematically to convince yourself

  • this really works in all cases

  • but we have proved it and so we actually believed it

  • and so hoorah, we now have an algorithm

  • which will allow us to make these changes

  • concurrently without any transformation,

  • without any coordination and just allow

  • us to end up in the same place.

  • So let's say our document is Hey guys

  • and we decided that is not gender neutral enough,

  • so one person is going to change it to hey everyone

  • and the other person is going to change it

  • to hey folks and what does our editor do in this case now?

  • So I described just now how deletion works

  • and so deletion just means setting a flag on these

  • so if two people are concurrently delete

  • the same letter, they both delete the G of guys,

  • well that deleting it once is the same as deleting twice,

  • it doesn't get anymore deleted

  • through being deleted several times.

  • So those letters just get deleted

  • and the insertion stay

  • because insertion doesn't replace another insertion.

  • So as these two merge, what you're going to end up with

  • is the two insertion concatinated,

  • so it will be hey everyone folks

  • or maybe it will be hey folks everyone.

  • So the order of those two is arbitrary,

  • that just depends on coincidence of how

  • the networking happens to work out

  • and you know there's no order that is any,

  • one order isn't any better than the other.

  • The problem is just that what we have here,

  • everyone folks is not an English word

  • and so you kind of end up with junk in the document.

  • Though this is actually what Google Docs does as well.

  • So at this point I'm willing to just say,

  • okay we'll tell the users that look here,

  • this doesn't pass the spell checker

  • and it maybe can pop up a warning

  • saying hey several people edited the document

  • at the same place.

  • Actually Google Docs doesn't even do this warning.

  • It just leaves humans to spot it by themselves

  • and it seems to work in practice

  • so I think at this point we just say okay,

  • it's good enough.

  • We're not going to try to do a grammatical

  • analyze of the sentences and do the merges automatically

  • so that they obey English grammar

  • because that's just not going to work

  • and people have tried that

  • and it simply doesn't work.

  • So we've got this CRDT, this data structure

  • for all the lists which several people can change

  • at the same time.

  • Described deletion and how that works.

  • There's still a lot of open questions there.

  • So the problem with deletion here is

  • we need to actually remember for a long time

  • afterwards where a particular ID is

  • because we use these Ids as a position in the list

  • like address like a pointer

  • and so we can't just delete,

  • if something gets deleted from the document,

  • we can't then just remove it and forget about it

  • entirely because then some operation would come in

  • and insert after 500 and 3a

  • and we go 500 and 3a, sorry that was deleted.

  • I don't know where we put that insertion.

  • So we have to keep those things

  • which are called tombstones.

  • Also haven't talked about how to do undo,

  • like you know, control zed type of undo

  • or how to reorder the elements in the list.

  • That's kind of interesting

  • or how to make all of these Ids efficient

  • so it doesn't take too much storage.

  • So still a lot of open questions there.

  • I think the basic algorithm is kind of fine

  • but actually putting it into practice

  • is still going to need a bit of work

  • but I want to talk about some other stuff as well, actually

  • so we were talking about these JSON documents earlier

  • and so interesting things happened

  • when several people concurrently change JSON documents

  • that don't happen just in the case of a text document.

  • So let's say we've got here,

  • it's a bit of a contrived example I'm afraid,

  • a map of colors and so this is,

  • imagine this is an actual structure.

  • So it's not like curly brace qoutation mark CO etc

  • as a text document but it's actually the JSON syntax tree.

  • Described here and so one person inserts a new color,

  • the color red, into this map

  • and so you would then end up with a map

  • containing blue and red

  • and another person concurrently decides

  • actually they want to wipe out the entire contents

  • of the map so it sets it to the empty map

  • and then adds green into the map

  • and so how do we now merge these concurrent changes

  • that have happened here?

  • What happens, what's the outcome,

  • what is even the right behavior here?

  • What do we expect?

  • So we can think about it systematically

  • and say, well okay blue, does it contain blue?

  • Well blue was deleted on the right hand side

  • by setting the whole map to empty

  • so I guess blue should not be in the final result

  • cause it was deleted and the left hand side didn't touch it.

  • What about red?

  • Red intially didn't exist but it was inserted on the right,

  • on the left hand side and the right hand side

  • didn't touch red so I guess we need red

  • to be in the final result.

  • What about green?

  • Green was inserted on the right hand side

  • and it was not touched on the left hand side,

  • so yes we want green to be in the final results.

  • So our final expected outcome is to contain red and green

  • but not blue.

  • That seem logical?

  • I guess so.

  • So we can do that and our algorithm does this.

  • What this means here is that we have

  • to keep track of what this setting the map to empty means

  • because if you just take this and say,

  • on the left hand client,

  • you first add the red in and then you take that change

  • that came from the right hand side

  • which is setting the map to empty

  • and if you just apply that naively

  • and say you've got an operation

  • that has sent the map to empty,

  • well then you're going to wipe out the red

  • because the red was added concurrently.

  • So we have to somehow remember

  • what the state of the map was

  • at the time when you set it to empty.

  • So we have here, the thing is known as a causal context

  • or something like that.

  • In databases like react or example,

  • has anyone used react?

  • They have this data types feature, okay.

  • It works very similair to this that you have

  • this little extra bit of information that you pass around

  • in a HTP header and that,

  • the purpose of that is to keep track of

  • what is the state, what was the state of the document

  • at this time when you saw it so that then later

  • you know what the changes need to apply to.

  • So let's look at a different example.

  • Here we've got again an to do list example

  • and we've got buying milk and it is not yet done

  • and so on the left hand side an edit happens

  • and it just deletes buy milk from the list

  • because maybe it's been done

  • or maybe we no longer need milk or whatever,

  • it's just removed from the list.

  • On the right hand side, somebody sets done to true.

  • So taps the button to set buying milk done to true.

  • What happens if we try to merge these two different updates?

  • We can apply the same logic as we have just now

  • with the colors with the nested maps.

  • So with the colors what we said is okay,

  • does it contain each of the different things

  • depending on who edited what?

  • Okay so the title of buying milk,

  • on the left hand side that was deleted

  • because we deleted the entire to do item.

  • So I guess the title should not be in the result.

  • On the right hand side, we didn't touch the title.

  • So that's got to be deleted.

  • What about done false?

  • Well done false was deleted on the left hand side

  • and over written on the right hand side with done true.

  • So I guess done false is out.

  • What about done true?

  • Done true did not appear on the left hand side.

  • On the right hand side, done true was added

  • so we want done true to be in the end result.

  • So if we apply exactly the same reasoning logic

  • what we end up with is a to do item

  • that contains the flag done true but no title

  • and this is kind of not what you expect

  • because like we've applied exactly the same logic

  • but what we've ended up with something is that looks weird

  • and it looks weird because we kind of have this implicit

  • scheme idea we expected to do item

  • to always have a title and the done field

  • but this merge here has essentially

  • tampered with our schema.

  • So what do we do here?

  • Maybe we need a schema to explicitly say exactly

  • what fields something must have

  • but in that case what do we do?

  • So if somebody changes, if one person changes a field

  • within an object that has a schema

  • and another person deletes that entire object,

  • well do we say the deletion wins?

  • So in that case, any changes

  • to stuff within the deleted item

  • are just going to be lost

  • but we said earlier we don't want to lose data

  • so how do we resolve this?

  • Well maybe we say, actually the,

  • if somebody concurrently changes this

  • and sets done to true

  • then we actually going to bring the title back as well.

  • So even though the title was actually deleted

  • on the left hand side, we kind of ressurect

  • this whole deleted item and say,

  • okay it's gonna both the title and one true

  • but then well we forgot about the fact

  • that somebody deleted this item.

  • So it seems like something has got to give here

  • and we don't really know what it is.

  • We could say that maybe overwriting something

  • with an empty map should not have the same semantics

  • as deleting it and re adding it again

  • or maybe it should have the same semantics.

  • So I'm afraid this is going to end

  • on a slightly depressing note which is,

  • like we simply don't know how to expose

  • these kind of concurrently editable data structures

  • to application in a way that is not horrendously computing

  • and so I think that, I think there's a lot of value

  • in having these kind of data structures

  • that you can just merge automatically

  • and not have to worry about

  • writing manual conflict resolution code

  • but at the same time, conccurenacy still is hard,

  • even if you abstract away the concurrent communication

  • and everything, you've still got the problem

  • that somehow you can sometimes end

  • up in these merge situations where's there no one right way

  • of doing it and so if any of you have ideas,

  • I'd love to hear them.

  • Otherwise, we're continuing to work on this

  • and just try different approaches,

  • see what kind of APIs make sense to developers.

  • We could go back to the old bad days

  • and say okay, we're just going to have this,

  • this oh crap scenario,

  • we're just going to let users resolve

  • all of the edits manually.

  • I don't think that's really friendly to users

  • and so I think that like the popularity of things

  • like Google Docs and indeed like,

  • Meld or things like that for conflict resolution

  • get, shows that I actually, people do need

  • a bit of tooling and help in order to,

  • in order to resolve conflict.

  • We could say, okay well just put everything

  • on a central server and serialize everything.

  • That's an option too but again,

  • we've got this problem that you require

  • a network communication all the time

  • and if stuff doesn't work offline,

  • which is also a shame.

  • So I think it is worth working on this problem

  • but it is an open problem.

  • If you're interested in more details on this,

  • I've put, tweeted the slides already

  • so you can find it there

  • and there are links to all of the papers.

  • Heres the second page and here is a third page

  • of references.

  • So our paper is one of those

  • and there are also several others

  • about different CRTDs and different operational

  • transformation functions.

  • So that goes into vast amounts of detail.

  • Here is a book that I've been writing

  • that I've just sent off to the publishers

  • just two days ago.

  • So you an get an early release of this online already.

  • This is, it's not specifically about merging,

  • it's very broad, kind of introduction

  • to the architecture of databases

  • and what do databases do under the hood, also.

  • So if you're interested in this sort of thing,

  • I'd love if you could check it out.

  • Maybe give me any feedback if you

  • have any thoughts about it.

  • That would be wonderful.

  • Thank you very much all for coming

  • and I hope you have a great rest of the conference.

  • (applause)

  • Do we have a minute or two for questions?

  • I'm a bit short of time.

  • We do?

  • Okay, does anyone have any questions

  • or comments or anything?

  • We have some on the app.

  • That's alright.

  • So the question is, "Apart from the operational

  • transformation algorithm,

  • is there any algorithm which uses weighted operators

  • to decide the last operation?

  • For instance, the delete operation

  • has the lowest impact for that loses.

  • In terms of weight maybe,

  • neural network probabilities making decision faster?"

  • So you can used weighting except

  • that doesn't fundamentally resolve the problem.

  • Which is that you can end up then

  • in some kind of scenarios where you've just

  • got an impasse and you've got two things

  • with the same weighting and one of them has got to win

  • and so you can then arbitraily decide

  • whether one thing should win over another

  • and so that's what I was getting at with

  • kind of these trade offs here of like,

  • you could specify maybe in a schema

  • maybe some kind of semantic annotation

  • saying we want delete to win

  • or we want update to win.

  • The problem there is that it's really hard, I think,

  • to communicate that to developers of like

  • what does that actually mean?

  • If you don't have PHD in distributed systems,

  • will you still be able to understand

  • what this flag in your schema actually means?

  • So I think just making stuff comphrensiable

  • in a way that like, somebody can just go

  • and build an app and they don't have

  • to know about all of the internal details

  • of how conflict resolved internally.

  • I think would be very good

  • and so maybe priorities is one way of doing that

  • but I'm not sure.

  • The other question just there was,

  • "How did I make my slides?"

  • Which is using an iPad, I draw them by hand

  • using a iPad app called Paper

  • by a company called 53.

  • - [Man] I think you should wrap up.

  • - Okay, thank you very much for coming.

  • (applause)

(smooth electronic music)

字幕與單字

單字即點即查 點擊單字可以查詢單字解釋

B1 中級

2016年GOTO--為最終的一致性而解決衝突--Martin Kleppmann。 (GOTO 2016 • Conflict Resolution for Eventual Consistency • Martin Kleppmann)

  • 41 12
    colin 發佈於 2021 年 01 月 14 日
影片單字