Placeholder Image

字幕列表 影片播放

  • Hi everyone!! Welcome to CodeKarle. My name is Sandeep, and in this video we'll

  • go over how do we design a navigation application like Google Maps.

  • Now let's go over some Functional and Non Functional Requirements(NFRs) that we want

  • this platform to support. So the very first thing is when a person wants to

  • move from point A to point B, this platform should be able to give them a

  • couple of information . 1) What is the route that they should follow

  • 2) Given that route how much distance they'll have to cover and how much time

  • it will take. As an enhanced variant we could also build a system in which we

  • show them 2-3 routes and then let them choose that they want to minimize the

  • distance or minimize the time. 3) The next thing is that it should have a very

  • pluggable model. So let's say we make a very basic design which just has no

  • traffic data or anything of that sort. Now let's say if you want to plug in

  • traffic information it should be easy enough to add it. Now post that let's say

  • if you want to add information about weather or accidents or construction

  • going on somewhere or road blockages or something of that sort, it should have a

  • very easy way to input this data and not make a change in whole of the architecture.

  • So that's the main idea of making a pluggable. 4) There's one more

  • thing that we'll cover in a very brief thing, that is basically how do you

  • identify roads. So there are two three ways in which we can identify new roads.

  • So one very basic or the most efficient way is to just ask government sources

  • to give us all the information about all the roads data that they have. Now

  • they might not have all the data but at least we'll have a good starting point.

  • Now beyond that we can capture a lot of organic data. So let's say if a lot of

  • users are going on to a place or a route kind of a thing which we don't identify

  • as a road, then we can easily say that it there is a road that we don't know of

  • and we can kind of onboard that Road onto our system. Now we can use some data

  • like number of people traveling on it, which sides are they traveling to, what

  • is the volume of traffic, what is the speed and correlate that with the

  • existing roads that we have and try to even come up with how wide the road is!!

  • Is it a single lane road or if it's a four lane/six lane kind of a road. All

  • of those information could be figured out. But we will still not go into this in much

  • depth over the video. Mainly what we'll focus on is

  • How do we efficiently find the route between two points?

  • Now coming to the non-functional requirements the very first thing is the system should be

  • always Available. This should not go down. The next thing is it should have a good

  • enough accuracy. Now let's say if you are not giving the very best route but if

  • even if you are giving a good enough route that should be okay, unless it's

  • not a terrible route, okay. And it should not be too slow so let's say if it takes

  • 1 second/2 seconds/3 seconds to do all the calculations and come up with

  • output that should be okay. It's not that it should respond back in a millisecond

  • with the best route between two points. That's a bit too much to ask for,

  • a couple of seconds one, two, three seconds should be okay. But it should not be too slow.

  • Like it should not take 15 seconds to come back with the response, okay!!

  • Coming to the scale, again I do not know if these numbers are authentic, but from

  • what I've heard and read, Google Maps roughly has a billion MAU (Monthly Active

  • Users) and these users generally access Google Maps a couple of times in a day

  • or maybe a couple of times in a week at least.

  • What that means is we'll probably have roughly five to ten billion

  • requests coming in a month for time, distance and route between two points from users.

  • Then there are also five million companies which use Google map so

  • companies could be companies like Uber who are using Google Maps to come up

  • with the Navigation and Pricing System, or it could be smaller companies which

  • have a very similar use case. But that means that we need to keep these things

  • in mind that we have a very large volume of users who are coming into system who

  • will be using this platform and design it accordingly.

  • now building this

  • Navigation App is a very hard problem to solve. Not because of the kind of

  • Algorithms that we'll use. But more because of the kind of requirements this has. So

  • for example, if we try to calculate the number of roads in the world there are

  • various theories, which you know come up with various numbers. In general people

  • cannot claim that there are somewhere close to 50-100 million roads in the world.

  • It could be you know even a much larger number. Now if we try to model it as a

  • graph it would probably have somewhere close to 50 million vertexes and maybe

  • you know hundreds of millions of edges if we divide the roads into multiple

  • chunks and all of that, right. So that is a very very massive data set.Now a lot of

  • companies don't even have access to this kind of a data, where in what road is

  • connecting from where to where. That kind of information is kind of non-existent

  • throughout the whole world and there is no single place where you can fetch

  • that, which makes it a very hard problem to build an navigation system like this.

  • The next thing is, it is very hard to even quantify a lot of

  • attributes that will impact the ETA. For example traffic and weather condition

  • and Road quality. Things like these are very hard to quantify so there is not a

  • very easy mathematical formula that you can come up with to calculate the ETA

  • that will be required to cover a particular distance. One more challenge

  • this is, a lot of things into the whole road systems are very unpredictable.

  • There could be and a random accident happening somewhere, there could be a

  • road closure due to any kind of reason and those things are not really

  • predictable and that's the main reason why you know a lot of companies have not

  • been able to build this kind of system successfully.

  • We will be using an

  • ideology of Dynamic Programming while we try to solve this problem. So basically

  • what we will do is, we will try it for a very small area, and then we will try to

  • solve it for a larger area, and then even larger area and then across cities and

  • countries and so on and so forth. So before we get started let me introduce

  • you a concept called "Segment". Now this is not an industry standard term I just

  • made up this name on my own, so just keep that in mind. A Segment is basically a

  • small area that is small enough that we can easily operate on to it. Think of it

  • like this we can have segments of one kilometer by one kilometer throughout

  • the globe. So let's just say if you have a city, something like this, what we'll do is

  • we'll divide the city into multiple segments. Now ideally there will be a lot

  • more segments than what we are drawing but for understanding this should be

  • good enough, okay. Now the idea behind these segments is that it will have four

  • coordinates which are the four corners of a Segment, through which we can

  • identify the segment boundaries and each segment would have an identifier. You can

  • say this segment s1, this is s2, something of that sort. Now the thing with segments

  • is whether a point lies within a particular segment? - is something we

  • should be easily be able to calculate. So for example think of it like a

  • coordinate system you have these x axis and y axis, this is your point (0,0),

  • this is your point (0,1), this is your point (1,0) and let's say this

  • is your point (1,1). This is your square of one each, right. Now whether a point

  • lies within this or outside this we can easily identify based on what its

  • coordinates are right and similarly for this based on its XY coordinates we

  • should be able to identify whether this lies in this particular segment or not.

  • So what we will try to do is each user will try to map it to some segments

  • basis their coordinates, right! Now one good thing about these segments is

  • because these are functions of Lat-Long, we should easily be able to

  • identify some approximate distance between two segments. So let's just say

  • if you want to identify this distance between these this segment and this

  • segment, right. So basically let's say if we come up with the center points of

  • both of these and these are two lat/longs we can easily find the

  • distance between (aerial distance) between these two points. We use this in the

  • later sections but it's a very easy mathematical calculation given

  • two lat/longs, find the aerial distance between these, right? Now what we'll

  • actually do is, let's say the whole globe for example, we will actually

  • divide the whole globe into multiple segments some of the segments would just

  • have water and nothing else but most of the segments would have some localities

  • and we will be solving for one segment each and then we will extend the

  • solution for multiple segments segments which will then cut across multiple

  • cities and countries and whatnot, okay!! Now the way we model the road network is

  • like a graph. So think of two points let's say this is point A and this is

  • point B, and then the road between these two points, okay. So this becomes one

  • vertex(A), this becomes another vertex(B) and the road between them can be represented

  • as an edge. Now each road would have a weight. Now will not have one weight

  • like you normally have in usual graphs, will have multiple weights. so let's say this

  • road is of length 2 kilometers so one of the weights on this road would be 2

  • We could have a bit more weights along with that. We would have a

  • traffic weight, we could have a time-based weight. So let's just say we

  • come up with the average number thing in general it takes 150 seconds to cross

  • this road, so 150 becomes another weight. okay! So this is how we represent a

  • particular road and similarly there could be other weights that we can add.

  • Now a road, even though it looks like a non-directional thing, but it is actually

  • are directed relationship. Normally how would you represent it? - you'll say that

  • this one Road from A to B which is 2 kilometers long, takes 150 seconds. There's

  • also another road which is from B to A, which is again two kilometers long,

  • but this one probably takes 200 seconds to cover, right! Why do we need a directed

  • relationship here? - Firstly in order to handle if there is a different ETA or

  • different traffic pattern or something of that sort. Plus more importantly, if

  • this says that it takes infinite amount of time to reach here that's basically a

  • way to say that it's a one-way Road. Only you can go from point A to point B but

  • not from B to A, right! So we can use all of these ways to track various things

  • like one ways or various different kinds of roads. Now even though if it's a

  • directed graph but for all the like talking within this video I will draw a

  • single line thinking of it as a non directed relationship, because it makes

  • the drawing easier and explanation easier, but when you are actually going

  • to implement it, you will implement it as a directed relationship.

  • Now let's look at what happens when you want to find the distance between two points within a

  • segment or even ETA or something of that sort? So let's say this is a graph of all

  • the roads, within a particular segment and there are these ABCDEFGH... are the

  • junctions or the points where you might want to start or end from, and

  • everything else is basically in edge. And edges could have weights, assuming that all the

  • edges have certain weight. Let's say this edge has a weight of, first attribute is the

  • length which could mean 2 kilometers the next attribute might say that how much

  • time it takes to cover that 2 kilometer. This would have a length of 5

  • kilometers and would take 700 seconds to cover that. Now let's say if you

  • want to go from point A to point C, how can we go? - There could be

  • multiple routes. So one of the routes could be from A to B to C. There could be

  • another route from A to B to E to C there could be another route from A to I

  • to H to G to B to C and there could be another, a lot more number of routes.

  • How do you find the best route from going from point A to point C?

  • This is a very standard problem in the Graph Data Structures, there are various

  • algorithms to solve it. We can use anything like a Dijkstra or a Bellman Ford Algorithm

  • to come up with the shortest path between point A to point C.

  • Once you have calculated this and we might as well decide to cache it.

  • Because within a segment, it's good to know how can you move from point A to

  • point B. Normally what you would do is you would run something like a

  • Floyd-Warshall Algorithm, which calculates the shortest path between all possible edges

  • in all possible vertexes within a particular segment and then store that

  • information so that you do not have to recalculate it. So what it will end

  • up being is, there are these all normal roads which are edges and then

  • there could be a calculated edge also which says that, for A to C, the shortest

  • path takes ten kilometers, it takes two thousand seconds and it is via Point A

  • then B then E then C, something of that sort could be a calculated edge as

  • an output of Floyd-Warshall Algorithm, which can be stored in another Data Store

  • which can then be looked up if you want to quickly look at how much time

  • does it take from reaching a point A to point C. Now what we looked at earlier

  • was given a junction A and given a junction C how do we calculate the

  • shortest distance between those two junctions. What if the point with what we

  • are looking at is not really a junction which is the meeting point of two roads

  • but any random point in the globe which just comes on a particular road. Let's

  • say that point is X, okay. And now we want to find the shortest distance between X

  • and C. What that translates is basically find this distance (X->A), lets you call it .

  • Find this distance (X->B) let's say we call it J. Basically find that two short... two or

  • three closest junctions from that point. Normally if that is on a road it'll just

  • be two junctions right, and find the shortest distance from those two

  • junctions. So whatever is the distance from A to C and B to C, add I and J

  • respectively into that and then whichever is the shortest one that

  • becomes your answer. Now if you want to extend it further and say that I want to

  • find out the distance between X and Y? - What you basically mean to solve is

  • A to C, the distance between A to C, the distance between B to E, and then

  • these two good respective distances add it. So this (C->Y) is let's say i`, this(E->Y) is j`

  • So what you want to do is a comparission between [ i + i` + distance(A,C)]

  • or this the distance of [ j + j` + distance (B,E) ]

  • So this is how you can handle any point even if it is not a junction.

  • Now whenever you're calculating remember

  • we looked at that we can use a Floyd-Warshall Algorithm and then

  • calculate all the possible distances within a segment, right. What we should

  • also do at that time is look at what all possible exits are there some a Segment.

  • Why that's important? we'll come to. Let's say there are these two roads which are

  • exiting a particular segment. Let's say this is a Exit 1(E1). Let's say this

  • is a Exit 2(E2). And the segment Id is let's say S1. So what we say is this

  • particular Junction is S1E1 and this particular Junction is S1E2. What we'll

  • also calculate is distance of each Junction not from just other junctions

  • but also from these two points. So we will also know that what is the shortest

  • route from reach from C to S1E1 and from C to S1E2. And we'll store

  • this also as a cached information. Now let's come to how do we solve it across

  • multiple segments. Now let's say if you have this point A. From now on we will

  • just look at junctions because that makes the calculation easier okay.

  • You want to calculate distance from point A to point B and all these rectangles that

  • you see are unique Segments and these two points are in different

  • Segments. Then how do you actually calculate the distance between the two

  • points and the same logic would be used to calculate the ETA and anything

  • between these two points okay. So what we are essentially saying is, we have this

  • point A and we have this point B. Now given the lat/longs we would be able to

  • identify an aerial distance between them. Let's just say that the aerial distance

  • between them is 10 kilometers, okay. We also know that each segment is one

  • kilometer by one kilometer in length and width right. So what we can say is we'll

  • at max go... so because this is 10 kilometers on aerial distance, what we

  • can say is.. adding some buffers we will not cross more than 20 Segments on each

  • directions either going from North to South or going from East to West. That 20

  • number is basically a buffer that you can come up with based on your

  • conversation with your interviewer. But the idea is this number(10 KMs) will give us

  • some input onto how many segments do I want to look at. So 20 segments on the

  • North-South side and 20 segments on East-West side is what we'll look at to come up with

  • the optimal route from point A to point B. Because normally what would happen is

  • there will be millions of roads into the world. And if you start doing a

  • Dijkstra across the whole globe just to calculate distance between these two

  • points, that will be very un-optimal. So we have to break that Dijkstra at some point in time.

  • So this number would be used to break that Dijkstra saying that we will not go

  • further that beyond this point. Now one obvious way to solve this is you have

  • all the roads connecting the whole city which cuts these two points as well.

  • Run a Dijkstra across the city and then come up with the shortest distance between

  • point A to point B. But if you look at it and if you look at it that way that

  • there are probably millions of users querying at the same time to find

  • distance between any two points, Running those many number of Dijkstra's Algorithms

  • would be heavily inefficient. So we'll do some slight optimization there.

  • What we'll do is, there are... we'll basically say that there are two exits

  • out of this block(Segment having A). Basically to reach this block(Segment having B) we need to exit this block

  • and there are these two ways. Let's say if we have calculated that these two are the only

  • exists within this segment. Let's say this a segment S1. So A belongs to S1 and

  • these two are the exits, this is S1E1 and this is S1E2 and they have

  • some great W 1 and W 2 let's say. Now what we'll do is, this segment...

  • this junction is basically having two points. One is S1E1.. two identifiers basically,

  • S1E1 and S2E1. this is segment ID S2 and this is exit number one of that

  • segment so this has another identifier for S2. What we are saying then is this

  • has another point on the same route which is S2E1 and weight between them zero.

  • similarly there'll be a lot of points. This graph will fan out.

  • On the other side for B, let's say B belongs to segment S4 there

  • are another two exits through which you can reach B. One is S4E2 and one

  • is this one which is S4E1 even.

  • Now if we just connect all the exits to exits

  • for ten blocks(Segments) on the North side, ten blocks(Segments) on the South side, ten blocks

  • on this side and ten blocks on that side, we'll basically be able to come up with a

  • crisscross of a lot of Road which

  • connects these four points right and

  • there'll be a lot of junctions within that, there'll be a lot of roads, but these

  • will be basically just entry points and exit points of the segments. We don't

  • really care what are... what is happening within the segment when we are cutting

  • across segments. Why? - because to reach this segment we anyway need to cross all

  • of the other segments right. Now we don't care if this route which we are calling

  • as let's say this is segment S6 and this point is S6E1, this point is S6E2

  • Now this is basically a calculated path from these two exit points. This

  • exit point to that exit point basically. This real path would look something like

  • this right but we don't really care what's happening within the segment.

  • All we care about is from this point(S6E1) to this point(S6E2), I know the distance, and I'll use that.

  • Then from this point to this point I know the distance I'll use that. So we'll

  • basically be using all the exit points distances and use them as weights,

  • and come up with this graph at runtime.

  • Now given like ten segments on each side, we'll probably have

  • this graph of a couple of hundreds of edges at max right. Now we can run a

  • Dijkstra on this graph staying with that what is the shortest path to go from

  • here(A) to here(B). This Dijkstra runs on top of the graph which was created by

  • just entry points and exit points in everything else, and entry point and

  • start point and entry point and end point in the starting and ending segments.

  • Everything else was just a connector from individual end point to individuals

  • basically exits of a particular segment. Now this is fine when we have to travel

  • for like ten odd kilometers we can still create a graph of these 100-200 edges

  • and calculate. What if you want to go inter-city? What if you want to

  • go from City 1 to City 2 and then you want to calculate it? Then it becomes a

  • big trouble to... because let's say the distance between those two city is

  • thousand kilometers then there are possibly thousands of segments

  • multiple thousands of segments that you'll look at, right. That would too complicated

  • to run a Dijkstra at runtime. Now like we extended the solution from one segment

  • to multiple segments, we can basically create a Mega-Segment, right? We can

  • say this whole thing that you see here is 1 Mega-Segment and there could be

  • this exit to this Mega Segment this exit to this Mega Segment and probably an

  • exit here to this Mega Segment, right? And then what we can say is, let's say you

  • have this whole map of a country. I know it looks terrible but let's say this is

  • one country, what you can say this is 1 mega segment you can basically divide

  • this country into mega segments. And then recursively solve for it even

  • further. Now when you were to if you are to solve in the mega segment world you

  • have basically converted this whole country into possibly 50 mega segments

  • and now if you want to go from here to here, you basically need to just connect

  • the exit points of individual Mega Segments. You don't really

  • care what is happening within a mega segment, you don't really care what is

  • happening within segments under those mega segments and roads under those

  • segments. All you care about it from my start point how do I get to the nearest

  • exit or all the exits of that mega segment and from these exits of mega

  • segments how do I get to further mega segments recursively and then somehow

  • reach this end point, right. Now you can have n levels of nesting. Ideally three

  • levels of nesting is more than enough. Now what is the ideal size of this mega

  • segment? - Again there is no right answer you can come up with based on some

  • calculation with your interview again on this. But the idea if you still want

  • to calculate this thing across cities or across states, this segment approach

  • would still not scale you need to do an abstraction on top of segments which can

  • be represented as a Mega-Segment and then solve it recursively.

  • All good so far but we have not really talked about how do we come up with the

  • weights on an edge and how do...what all weights do we even consider? So the

  • very first obvious weight is the amount of distance between two points right so

  • that is one thing that will definitely have. Along with that we will also keep

  • an ETA that normally under normal traffic conditions, in the normal weather

  • conditions, how much time does it take for a user to go from point A to point B.

  • Now when we have both these information we can obviously keep a next

  • obvious thing saying what is the average speed of vehicles.

  • Speed = (distance/time) so basis that will be able to come up with an average speed.

  • Now you might think that the next obvious you things are weights like

  • traffic and weather condition and all of that but I would definitely not

  • recommend that. Even... let's assume this is what we have build already and

  • now somebody comes up and asks you to implement traffic data into this okay

  • now if you want to add traffic as a weight onto your edges that'll be a

  • very very bad design. Why? - Because with an additional attribute that you are adding

  • you are basically changing the logic of your underlying Dijkstra which runs on a

  • segment, right. Now let's say tomorrow somebody comes up asking you for a

  • weather information you will again change the whole logic of your Dijkstra.

  • That piece is a bad design, okay. So ideally.. basically the set of things that a good

  • design should entail is basically that it should not restrict addition of new

  • features plus each change should be small enough and contained. It should not

  • basically impact the whole feature. So what we'll say is traffic and all all of

  • those in things are basically attributes which impact your speed okay and basis

  • whatever your average speed is you can calculate the ETA as a function of

  • distance and average speed, right? So all we'll say is traffic, weather, roadblocks all

  • of those they'll never be weights on your graph, okay. They would just be attributes

  • that impact your average speed. So average speed would be a function of

  • traffic, it could be a function of weather

  • it could be a function of any number of fields that you want. Let's say if you

  • want to add a accident, or construction work going on someplace or whatever you want to add.

  • So all of those would basically impact your average speed. Now normally

  • you don't have to do all these fancy things, never ever! Why? - Because it's for a

  • company like Google Maps they know... let's say there are hundreds of people

  • going from point A to point B. They anyway know how much time they took

  • right plus a very important feature about this kind of data is, this traffic

  • data would always be normally distributed. So if we want to plot, Let's say

  • this is basically any metrics let's say ETA of how much time it took for people

  • to go from point A to point B on a graph it would always look like a normal

  • distribution, okay. What that means is you do kwow that there is this point which

  • most of the users are hitting and as a... with some statistical calculations of

  • one standard deviation from each of these point you will be easily be able to

  • calculate this range that normally for most of your users at least 50-60-70%

  • of your users, this is the window of time that it will take. Let's say this is

  • 5 seconds this is 7 seconds and let's say average is 6 seconds

  • for example so statistically you can come up with this number as a function

  • of lot of users who are crossing the roads. So you don't really care how much

  • is the traffic, what is the weather, when you anyway have this real-time information

  • So for all the geographies where you have people using

  • your application, there you have organic data from real users at this point in

  • time so you don't really care about traffic and all of that. Traffic and all

  • other things will come into picture only in places where you can either not get

  • enough data from users or there is some other reason why you're maybe legally

  • not allowed to get information of users right because of which if you don't have

  • real data users, then is when you will bother about using traffic and weather

  • and all those attributes to come up with this average speed number basis

  • that you can even know the distance from point A to point B, you can come up

  • with a ETA, okay. Now how do we quantify traffic?

  • you cannot say that if there are 805 vehicles, it will take this much amount

  • of time, right. The number of vehicles on a road and things like that cannot be

  • calculated at all. No matter how good technology you build right. So when these

  • things cannot be quantified how do you use them as a measure? So what we

  • can say is basically create multiple "Tiers". Let's say this is traffic measurement

  • and this is Weather Management. All you can say is traffic is low or it is

  • medium or it is high and whether it is good or it is bad, right. So you will

  • have these kind of Tiers for each of the attributes that you have and all you can

  • say is every time traffic goes from here(Low) to here(Medium), it has a 20% increase onto

  • average speed or something of that sort. right every time traffic goes from low

  • to medium, average speed reduces by 20 percent. Basis these transitions of

  • traffic volume... now you can always say that traffic is low or high based on the

  • number of users that are probably using your application or from some other

  • sources, right. Let's say you have this kind of a measure, so there are companies

  • like Waze and all, which will give you the kind of information that how's the

  • weather, how is the traffic in a very abstracted form. Now once you have

  • this information you can calculate average speed as a relative number from

  • the original value calculated. so let's say with the good weather, with the

  • medium traffic you knew the average speed was later 20 kilometers per hour

  • Now if the traffic goes to high, you kind of reduce it by 20%, so it basically

  • becomes 16 kilometers per hour. Basis is this kind of a calculation, you will

  • calculate the ETA. Now as and when inputs are coming, you are making changes

  • into your weights on the graph. How does it impact the overall system? So

  • coming back to the previous drawing that we made remember we had something called

  • as a segment and we had lots of roads within the segment and let's say the

  • traffic increase we got a signal from some third party saying traffic has

  • increased on a particular Road. Now do whatever you want to. Now let's say there

  • was this one exit point and one exit point and distance between this was some

  • Let's say there are some X weight to this particular theoretically calculated road.

  • And let's say it was actually using this particular road on which the

  • traffic is increased, let's say it actually was following this kind of a

  • graph, something of that sort. Now similarly all other roads are getting

  • changed. So all you can say is remember in this all the theoretically calculated

  • roads that we have to be calculated we were caching that what all real roads

  • are a part of this theoretically calculated road. Which is... it is not a

  • real route, but a calculated edge. So each time this thing changes, weight

  • changes on any real road, you basically try to infer, what all actual cached

  • roads are impacted. And then you basically make a change into the cached roads

  • also. So you say that okay the time taken initially was 10 seconds, for this

  • path, now it has become 15. So let me increase the value of this particular

  • calculated road from point E1 to E2 by 5 seconds

  • Now let's say you do this for one segment, then what basically happens is

  • you recursively bubble it up over a lot of cached things. So let's say

  • if there was a mega segment who's... who had this kind of an entry-exit point

  • that this is ME1 and this was in ME2 and you know that ME1, the route from

  • every ME1 to ME2 uses this E1-E2. So what you will do is you will basically bubble

  • up the amount of time for this route by 5 seconds. So what we are essentially

  • saying is as then when you get traffic data you basically update the ETA on a

  • particular road. Then update the ETA on all the calculated routes that this road

  • is a part of. And update and then basically that becomes

  • at a segment level. And then do the same thing recursively for all the segments

  • that are a part of a mega segment. So basically what you are doing is you're

  • bubbling of all the calculated values and then updating whole of the graph.

  • Now these things are not happening at once not all the signals are coming

  • up at one, right. So as in when something changes you bubble up till the level where

  • it goes and then stop. So let's say this road was not a part of this ME1 ME2

  • for example, then it will stop at the segment level, right. But if it is a part

  • of a mega segment road also, then it will bubble up in the mega segment level.

  • Now, other than just traffic and weather, there's also one very important factor

  • that you can use. So you can also use historical ETAs to come up with a real or

  • a predicted ETA for a particular Road. And this historical ETA is

  • basically calculated in a way that you basically look at the day of week and

  • the hour of day so let's say Monday, 5 p.m. this road takes 20 seconds to

  • cover, but on Sundays 5 p.m. is this road takes just 10 seconds to cover so

  • something of that sort is what you can cache at a level of a day of week and hour of day.

  • And then use that historical data to infer. So let's say if you do not have traffic

  • information also, then you can use this kind of an information to somehow come

  • up with the predicted ETA. Now coming back to the segment, due to some

  • activities within the segment so let's just say there are a lot of roads on the

  • segment. There is there are these two exit points and there are these points

  • which have some roads now let's say we've calculated that the ETA of fastest

  • way to go from point SE1 to SE2, involves 500 units of time.

  • let's say 500 seconds, okay. And it uses the route SE1 to A to B to C to SE2, okay.

  • Now let's just say that there's a traffic increase on A to B. Now we could

  • identify it by a lot of attributes. Let's just say people moving between A to B

  • their average time increases right. Or we get a signal saying you know the

  • traffic is increasing significantly. For whatever reason

  • let's say we increase the time of basically the ETA for going from point A to

  • point B. So it would impact the ETA from SE11 to SE2. Now there is a

  • limit to which it can bubble up right so let's say it increases from 500 to let's

  • say 550 due to some increase in A to B so all that is okay. But let's just say

  • that we have real data of users, and for some reason it is taking a lot of time

  • for some people but we put to go from point A to point B and let's just say

  • this increases to let's say 850. Now a jump from 500 units of time to 850 units

  • of time is a very big jump. It is having a.. roughly a 70% right.

  • So now it is possible that the fastest route from SE 1 to SE2 does not

  • include this route ABC! It could be some other route. So what we will do is if ETA

  • increases or decreases, generally in case of increase only we'll do this.

  • if ETA increases by greater than some percentage. It's a configurable amount.

  • Let's just say we have it set that if the ETA increases by 30%

  • then we recalculate things within that segment. Because all the updates

  • that you are getting are against a segment only. So whenever a

  • particular road's time increases you can calculate everything within that

  • segment, right. Which means recalculating paths from SE1 to SE2.

  • Let's just say, that we figured out that now the new shortest path is not this

  • path and it is basically the pass from SE1 to A to H to G to F to E and to SE2.

  • So, this basically becomes your new shortest path, right. Now if it is the shortest

  • path, then we need to bubble up that information, right. First of all we need

  • to say that whoever was traveling from SE1 to SE2, now needs to follow a new

  • path right which is a path from SE1-A- H - G - F - E - SE2, right. This is one

  • part of the things. But what if there was a mega segment over here which

  • encompasses a lot of segments, whose exit points were also including this

  • particular path. This is let us say ME1 and let's say ME2.

  • Now if the exit point path of this Mega Segment also includes this part

  • then even that mega segments path would change so the part where they use ABC

  • now that would get converted into this, right. So these are the things that

  • you'll have to do each time a weight changes. so when the increases beyond the

  • threshold then you have to do. So let's just say increase happen from 500 to 510

  • then all of this calculation is probably not even worth it. Because the best you

  • can get is probably somewhere between 500 to 510. Now let's say 10 seconds

  • would not be too much of an enhancement right. But if it is such a big jump from

  • 500 to 850 for example then you might as well recalculate everything within the

  • segment and then bubble up whatever is required. Now you'll not just calculate

  • with respect to exit points, you'll also calculate the shortest path to reach

  • from let's say SE1 to A, or SE1 to B. All of those will also be calculated but

  • for representation this was a good enough way to explain you the concept.

  • Now let's look at the architecture of the whole system. So we will do it in two parts:

  • First we'll look at the flow wherein all

  • users are connected to the system wherein we are capturing their location

  • information from all the users even if they are not using our map service. This

  • is basically for improving our map service altogether and a bit of user profiling.

  • In the next section we'll look at how does the actual navigation flow works

  • for the user who want to find the shortest route between two points and

  • what are the systems required for that. And at the end we'll look at some of the analytics.

  • With that let's get started. A bit of a convention to start with:

  • Things in green are basically user devices could be mostly these will be mobile

  • phones, could be web browsers as well. Things in black are load balancers plus

  • reverse proxies plus an authentication authorization layer in between which

  • will authenticate all the external requests coming

  • to the system. Things in blue are the

  • things that we have built on our own. It

  • could be web services, Kafka consumers or anything of that sort.

  • Things in red are either some infrastructure component that we've used or some

  • databases or some clusters like Kafka, Hadoop and all of that, cool. So let's look at

  • how the user flow begins. The very first interaction with the user is when they

  • when they have the app installed on their phone, with their

  • location services turn on, so we get regular pings from a user's device every few seconds.

  • Now if the user is stationary, the device will have the

  • logic to not send the location ping every five seconds and maybe send it

  • every five minutes or ten minutes or something of that sort, because that will

  • reduce the load on our system if the user is anyway at the same point + Better battery life.

  • But while the user is in transit, we will increase the frequency, so as to make

  • our data more accurate. This talks to.. this basically

  • keeps the persistent connection with the device. The reason for a persistent

  • connection is, so that we don't have to create a new connection everytime. Also this would

  • be generally a Websocket connection, because sometimes we might need to send

  • some information on to the user devices as well. So this WebSocket handler is

  • basically the service that talks to all the user devices. Now there would normally

  • be thousands of such WebSocket handlers for talking to all the users that

  • are currently online. So we need somebody who can manage the fact that

  • which handler machine is talking to which user. So then comes something

  • called as a Web Socket Manager. It basically keeps a track of which machine(read user)

  • is connected to which WebSocket handler, so which user device is basically

  • connected to which machine of this WebSocket Handler Service.

  • It basically keeps that information in a Redis. If let's say it lost that

  • information is lost we can kind of pull that information again from the servers.

  • So we don't need to kind of store that information in a persistent Data Store.

  • The next thing what happens is, as in when users are sending their locations

  • the frequency of the location pings is handled by the device and then the

  • device location comes to us basically coming at the Location Service.

  • Now Location Service is basically a repository of all the location related information .

  • It basically has some endpoints through which you can ingest

  • some users information, something like a user ID with the timestamp with a

  • particular Lat/Long. That kind of a structure. It puts all of that

  • information into a Cassandra as a permanent data source to keep a track of

  • the fact that which user was at what location at what point in time.

  • Remember this is not just for users who are using our app right now for navigation, but for

  • all the users in the world. How this data would help us we'll come to in a while.

  • Now, as and when it's getting location pings, it is also putting all of those

  • pings into a Kafka. All the location pings that are

  • going into this Kafka, are basically read by this Spark Streaming Consumer.

  • This does a lot of calculations, some of them are here and some of them

  • we look at in the later sections. First of all what it does is, if let's say a

  • lot of users are traveling at a place where.. which we do not identify, okay. But

  • their movement pattern tells us it looks like a road. So this

  • basically is a job which would be a Streaming Job which would look at last

  • ten minutes of data and see how the users are moving and basically it'll be used

  • to add new roads into our system. Now as and when a new road is added each

  • time a new road is added that would basically impact the particular Segment

  • within which the roads are added and then basis that all the Segment

  • related information that we calculated earlier that we talked about earlier

  • that would be recalculated. So that might impact a lot of segments and Mega Segments recursively

  • if that's a very critical new road that

  • we've found. The next thing is basically an average speed job. What that

  • does is, it basically looks at again if we look at within a segment a lot of

  • people moving from point A to point B, it tries to figure out what is the average

  • speed that these users have over let's say last fifteen twenty minutes and use

  • that information to as a proxy for a real-time information and then suggest

  • that and basically it will then update the

  • weights of the graph that we talked about earlier. Again that would

  • bubble up to various Segments and Mega Segments if that's a part of a critical road.

  • Now how would that bubble of process happen? - So this Spark Streaming Job,

  • individual jobs would actually write back things into Kafka. So let's say if a

  • new road is identified, it would write that to a separate topic saying

  • I've identified a new road which connects point X to point Y. Now there

  • will be a map update service which listens to that particular topic and it

  • updates it in a Data Store. It updates it in a graphs Cassandra, which is managed

  • by this Graph Processing Service, so each time a new road is found,

  • Map Update Service would invoke Graph Processing Service to tell that I have

  • found a new road just store that in your Datastore which sits on top of a Cassandra.

  • Now what this service is, we'll look at in the next section in more detail.

  • Now, each time average speed job finds out there is some change that has happened

  • it would again put an event into Kafka in a separate topic now and there'll be

  • something called as a Traffic Update Service that listens to this topic which

  • would update the traffic related information again into the same

  • Cassandra through Graph Processing Service. There is something called as a

  • Hotspot Identifier. So let's just say there's a particular locality on which

  • there is some average number of people per given area kind of a metric that we

  • have. Now let's have suddenly a lot of people start gathering in that area

  • it gives us a signal that there is some activity happening at that point. It

  • could be that there is some social event, some sports thing happening or but

  • something is happening there so that could be able to identify hotspots

  • within the graph. Because eventually once we identify that hotspot we'll know that

  • after probably a while, there will be a lot of traffic in that area as well

  • right. So that could help us in certain ways. Now for certain things Spark

  • Streaming would itself insert a lot of information and put out

  • events but in general it will dump all of that data into Hadoop cluster, which

  • has location pings of all the users across a lot of time. Now we could do

  • some ML Jobs onto it which classify these certain other things. So

  • let's say if you've identified a new road but we do not know what kind of a

  • road that is. If it's a one-way, if it's a two-way Road, if it's a single lane road

  • or maybe a six-lane Road. So all of those

  • things would be identified by this Road

  • Classifier and this particular job could

  • now put that event into that road topic

  • and Map Update Service could now store that additional information saying now

  • that I have not just identified that road but now I also know that this is a

  • probably a six-lane Road or something of that sort

  • There could be something

  • called as a Vehicle Identifier.

  • While you are getting location pings, if you start getting those pings at a very

  • high frequency let's say every one second if you are getting that you can

  • input a lot of information about the vehicle in which the person is traveling.

  • So let's just say that the traffic is going normal at certain pace but one

  • particular person is stopping every few kilometres let's say, okay. Now what that

  • could mean that could mean that either there's a red light or there's a traffic

  • jam there, possibly. But we know what are the junctions you know where the red

  • lights can be we know how the traffic jam are because we know the

  • data for the user as well. Now if it's not any of these two criteria then that

  • means that the person is traveling in a public transport something like a bus

  • and then the bus is stopping at each bus stop, all right. Other kind of things that

  • we can infer is if let's say the ride is bumpy or if the person is going left and

  • right here and there very frequently or having a very fast acceleration and very

  • fast breaking it's probably a two wheeler. if not in all other scenarios if

  • it's a smooth enough right stable enough acceleration going straight in one road

  • it is very likely a four-wheeler. Now we do tell information to Google

  • specifically that we are actually traveling in a two wheeler or four

  • wheeler while we select a map but if let's say that option is not available

  • then this kind of data would be good enough to infer which kind of vehicle a

  • person is traveling in, which will help us in a lot of other attributes. So these

  • individual jobs what all they identify they could put that into back into the

  • same Kafka in probably a separate topic for each

  • job and then those information could be analysed even further. Now let's look at

  • the user journey when they want to actually get the navigation from point A

  • to point B. So normally what people do is they search for a particular location

  • which then gets converted into a particular lat/long right. So all of

  • those is being done by Area Search Service. This basically does two kind of

  • things. One is, it has some areas within Elastic Search on which it provides fuzzy

  • searching and once the person has search for a particular area or

  • locality or location anything. It then gets converted into a lat/long. So

  • basically it stores the lat/long along with the place name, description all of

  • that, in a document. The other thing is it tries to dynamically figure out: Given an

  • address kind of a thing which lat/long that address points to. So it does that

  • address resolution into a lat/long as well. Now let's say... so post the response

  • from this user will have a start point and end point as a latitude longitude

  • both of them and then they would want to request the path, okay. First let's

  • quickly go over what is this Navigation Tracking Service. So while the person is

  • actually started the navigation and they're going towards the destination

  • all of their location coordinates will be tracked as part of Navigation Tracking Service.

  • This does two main things: 1) If the user is deviating from the route

  • that we suggested, we will kind of inform the app back and then that will probably

  • show a pop-up kind of a thing, saying that you are probably going wrong, maybe you

  • want to course-correct to the right path. 2) It will also store all the

  • data so it will push all of the data into Kafka, while the person is traveling

  • then that data could be used for later analysis saying, at this point in time a

  • person searched for a route from point A to point B, and this was the route that

  • we recommended. It could be used for figuring out how good the recommendation

  • of the path is. That being said, let's jump to the main piece which actually

  • figures out how to connect point A to point B, okay. So all of those requests

  • are being handled by this Map Service. This Map Service over here is basically an

  • interface service which provides multiple interfaces for people to

  • request the directions from some point to some point. One of the interface is

  • given to the user devices. A similar interface could be given out to

  • companies where in all the rate-limiting and their throughput calculation against

  • their account information is being calculated and stored, okay.

  • It's not an intelligent service it just forwards the request to

  • something called as a Graph Processing Service. Graph Processing Service

  • basically does a lot of things. First of all it queries Segment Service which

  • sits on top of its own Database of Cassandra, in which it stores all the

  • segments and their corner coordinates along with a lot of information of those

  • segments. So let's see if they if the Graph Processing Service wants to figure

  • out that what are the segment's for the start point and the end point, it

  • will get that information on the Segment Service. Now if it so happens

  • that both the points are within the same segment then it's a short duration ride.

  • Then the Graph Processing Service might choose to do a couple of things

  • it might look up that, Do have this data already cached? if yes, it will

  • directly return to the Map Service saying there is a response. If not, it'll then

  • decide what do I do next? It can now if it is the same segment it

  • can probably say that I will run a proper Dijkstra's algorithm and then try

  • to figure out the shortest path between the two points. It can do that and then

  • return the response. Alternatively, if it's in different segments then it will

  • try to fan out into the whole basically it will try to create a pick the sub

  • graph of the main whole world's graph that we want to look up as part of the

  • Dijkstra and run it, okay. Now for doing that it will look up on a lot of things.

  • First of all it will look up for all the roads within those segments, the

  • entry-exit points of the segments, into its Cassandra. It will also have

  • live traffic information which would have updated the ETAs into the graph

  • again from the same Cassandra. Now it

  • would have also got some input from

  • something called as a Third Party

  • Data Manager. Now it doesn't query

  • Third Party Data Manager at runtime.

  • Third Party Data Manager in fact, pushes

  • data into Graph Service which then impacts the live traffic information so

  • let's say if it says that there is a huge amount of traffic at certain point

  • it'll update that. Let's say if it says that there is a massive kind of a

  • rain happening somewhere, it will update the live traffic information thing there is

  • some rain and that would probably cause some delays and the average speed would

  • slow down and ETA would increase, something of that sort. Now using all of

  • that information it basically creates the whole graph that we looked at in the

  • previous sections and then it either looks up in the cache and returns the

  • result or it might run the Dijkstra's Algorithm or some Algorithm to basically

  • figure out the shortest path between two points and then return the result.

  • let's say if it doesn't have any traffic information, any ETA information it might

  • also query Historical Data Service to

  • find out at this point in time how much

  • ETA can expect from point A to point B.

  • if it doesn't have that information for

  • some of the points. Now Historical Data Service

  • sits on top of its own Cassandra

  • which basically stores the information

  • of the form that at some day, day of the

  • week like Sunday, Monday, Tuesday in some

  • hour, let's say 4 PM, 5 PM, something of that sort,

  • from point A to point B, how much was the average speed of people normally

  • and how much ETA did it take. So all of

  • that kind of information would be stored

  • in Historical Data Service. Bases all of

  • that, Graph Processing Service will

  • return back the response to Map Service and then to the user going by the exact

  • same set of logic that we looked at in the previous sections. Now all of

  • this is done a lot of events have gone into Kafka. One of the common events was

  • that by the Area Search Service which we didn't go over. So each time a search is

  • happening the service will put an event into Kafka. Now what that would do is it

  • will basically give us a list of areas that are being searched most often. So we

  • can do some optimizations for them. Another thing

  • would happen is, so not we can do a lot of analytic on top of this data okay

  • that is coming into Kafka. One of the very first thing we need to do is to

  • figure out that out of all the third party that we are integrated with which

  • of them is actually giving right information and which of them is

  • actually given the wrong information. How do we do that?

  • So let's say some third party gave us some information saying from point A to

  • point B, there is a lot of traffic, and ETAs would increase. But our data if it

  • says that ETAs have not been impacted at all, that means that particular

  • information was incorrect, right. Now if that happens from the same Third Party a

  • lot of times, then we can safely say that, that third party has a wrong

  • information or it gives us wrong

  • information. right. Now this Third Party

  • Data Manager, abstracts out on a lot of

  • third party companies, so we could be

  • pulling data from a lot of companies all

  • of that is abstracted under this Third

  • Party Data Manager but the reporting would be done at an individual

  • organization level. That being said, there are a lot of other kinds of analytics

  • that we might want to do. First of all we might want to identify what is the road

  • that a particular person is on, right? so as part of navigation you want to

  • probably have a audio kind of a thing saying turn right in ninety meters to

  • get on Mahatma Gandhi Road,or something of that sort, right. All of that inferences

  • could be done by the Spark Streaming saying where exactly I am in how much

  • time am I about to reach a junction and what is the road name on

  • which I will get to next. So all of that could be done by this. There are a lot of

  • other analytics that we could also do on top of this data. Let us look at it next.

  • So out of all the events that are being put into this Kafka, there is a lot of

  • analytics that we can do and we should do. So one of the very important things

  • to analyze is how accurate our ETA predictions are? So let's just say we

  • predicted that it will take 40 minutes to cover a distance from point A to

  • point B and if it takes something around 38-39 minutes then that's fairly accurate.

  • But if in real world it takes somewhere around 20 minutes then that's a fairly

  • bad prediction. So we need to have this data on how accurate our predictions are

  • so that we can figure out that we need to invest more time into you know

  • upgrading that algorithm that comes up with the ETAs. Along with that it will

  • also give us some information about which of the routes are the ones that we

  • are recommending to people, but people are not taking. Maybe there is something

  • wrong with the routes that exist in our database, right. So all of those things

  • could also be inferred. Some common things that we can infer is what are some

  • common hotspots in the city that people gather at. These could be some social

  • gathering places or things of that sort. So that will kind of give us some points

  • of interest that we can build over time using this data to start with. Other than that

  • some obvious things that we can also figure out is, what are the home

  • locations and work locations of some of the people. We could also infer user

  • profile data. So for example if a person is going to a lot of pubs for example

  • then they are more of a social kind of a person. If a person is traveling

  • outside the city very frequently on weekends, the person likes travelling. So

  • things like that can also be inferred basis just a location data. Coming to the

  • last thing, now this normally wouldn't be a part of a design interview but there's

  • something you should be aware of because this is an important thing that

  • companies like Google Maps have to take care of. So how do you handle disputed

  • areas? So while in Google Maps there is a clear-cut boundary of states and

  • countries and all of that but if there is a disputed region how does Google Maps

  • decide where to make the boundary of the country or state right. So a classic

  • example of this would be a dispute between the countries of India, Pakistan

  • and China in the state of Jammu and Kashmir. So India claims that whole of

  • the land is under their territory. China claims that some of the parts of that

  • state belongs to China and Pakistan claims that some of the parts of that

  • state belongs to Pakistan. So now how does Google decide or any such mapping

  • provider decides, that how where do they define the country boundaries, right. Now

  • this becomes very tricky for Google because they cannot take anybody's side

  • and whichever decision they make they are going to kind of piss off some of

  • the countries. Plus that's even worse in this scenario because these three

  • countries combined together have more than one third of the world's population.

  • So that's a very rich source of revenue for Google. So how can you handle? - so what

  • Google does is something very smart and also very tricky. So what they do is, based

  • on the country where you are coming from they'll show you a different country

  • boundary. So for people who are coming from India, to all of those people they

  • show that the whole part belongs to India. Now to the people of Pakistan they

  • show that the part that they are claiming belongs to Pakistan and the

  • parts that are disputed between India and China are shown as the dotted line.

  • Similarly for people coming from China they show the part that China claims is

  • theirs they show under China's territory and remaining disputed area

  • between India and Pakistan is shown as a dotted line with no country boundary

  • specified. So this is something if if you come across this kind of the thing in an

  • interview could try to replicate but keep this out of the scope of the

  • interview at least, but still something good to know. So yeah I think this is it

  • for Google Maps kind of a system.

Hi everyone!! Welcome to CodeKarle. My name is Sandeep, and in this video we'll

字幕與單字

單字即點即查 點擊單字可以查詢單字解釋

B1 中級

設計google map系統(Google Maps System Design Interview Question)

  • 19 1
    meowu 發佈於 2023 年 10 月 17 日
影片單字