字幕列表 影片播放 列印英文字幕 When alex did a video on linked list there's a lot of comments on it it sort of thing I want to use a linked list arrays a factor What's the point of reading this and so I was sort of thinking about is that well actually "Are arrays faster than the linked list or are the linked lists faster than an array, how do we go about finding out?" Which one's faster, so let's have a shoot out arrays versus linked list First question to start is I should just remind ourselves what the difference is between a linked list and An array so let's start with the array so probably the simplest To understand so an array is just a way of storing several values in memory so we start off with the first value and for the purposes of this demonstration, I'm going to use a Structure containing multiple values. We did talk about objects a while ago structures are what you had before We saw this object which is a collection of values in memories and store to integer values So this is array element zero and it's got two numbers. We then got array element one And it's got two numbers Looks like I'm playing break out got a great element To and it's got two numbers and so we can refer to any of these by reference to the Index starts at 0 3 4 5 6 7 10 so you can have an array of any size we have to know how big it is if you wanted to be when You create so you allocate the memory for all the things and then? You can store some values so we can write 42 in this one. We can put 21 in here I'm going to put 3 in that one and we can store them or we can put different values in here as Well as we need them so that's how an array works. Which is one big block of memory each thing stored one after the other linked list works slightly different we start off at the beginning of the linked list and it points at nothing but then as we add things to the list we Allocate the memory for it. We're going to store two things so the first thing we'll label these things inside it p And Q so it's a lot p here And we've got q but we also have an extra thing that tells us where the next they will be and that's empty, too So it's all put 42 and 16 so we just allocate space for that one Structure that one thing that we're going to store and we connect it up So we point this at the first thing so we often call this something like first And we see Alex videos to see how that have done So when we want to store the next thing we allocate explai for another one And we can store the value, so we've still got 8 21 in it now the difference between this could be anywhere else in memory, and we connect to the next pointer to point in To the next thing so we have some known starting point which points to the first thing and the first thing 22 the second thing and the second thing would point to the third thing and So on and that would point down and these could be all over anywhere In the computer's memory and then at the end you normally point it at some known value Which is often referred to as null and is often the number zero? So you can then walk through your list And when you find at the next point of zero or null you know you've come to the end of it There are two different ways of storing data the best or the same amount of data now He's got some differences this one as I says it's fixed inside, and you create it you can store How many things you know I've actually filled in but as soon as you get more than eleven since final top array? You can't go past eleven if you can see yeah the numbers all go to 11 on this however if we wanted to put something else thing we can just get rid of this thing And update this to point To a new thing so we can make this grow Quite easily, and we can also be really clever We can add things in the middle and the beginning very very simply, but we're going to go into that today So creating them. We're probably only going to do once but which one is actually faster Well, let's think about what actually to do. Let's define a very very simple problem We've created the structure here, and it's got two things in it It's got an integer called p and an integer called Q so saving the linked list and in The array let's consider an operation where we're just going to go through all the elements in our array Or all the elements in our linked list and sum the value of p are going to add up all the value So we're going to add up. What's in array Element Zeroes P in array of them at once P and so on so we're going to effectively do Something's going to add them all up. We'll do the same with the linked list as well So we're going to set what the first thing is to point to p Now that one. No. There's no time out of its value of p the next one and so on and we'll add them all up Why are we using missiles they want to the unique thing about this is That it's going to visit every single element in either the array or every single element in the linked list so we're going to visit every single element if you only have to visit one element then the very nature of Random access into array means as we fast There's no way you can make any difference unless it's the first element if you only one to ever access the first element Then look at the same speed we fuel directly to random element Then the array or win hands down you can just think about how it works that will be the case. We're going to consider We want to access every single element and do something to it and in this case You're just going to sum up the values, but it could be if things were representing say windows on the screen We want to move them around we want to redraw them Whatever it is will then do something to every single wall so we sort of set out the problem? I'm going to go over an array of values and a linked list of values I'm going to in sum in the visits all and we do actually add up all the numbers Stored in the element P. Which is part of the structure so what I've done is I've written To see poems which are more or less identical I'm going to create 125,000 elements in my array or mailing list and in the linked list I've got a structure which contains an integer p and an integer Q and in the array version I've got a structure which contains an integer p and interested Q in the array item The only difference is that the linked list has one extra item which is the next pointer to the next thing? So very listen see so hopefully we'll go as fast as we possibly can most of the rest of the program is identical so I have my main Routine here which creates a List or an array of random number generator? 125,000 random numbers which is a slowest part of the program you'll see and then store them So allocate memory for each element in the linked list and then we set the value Everything else will just leave with 0 and we do the same with the array just as we did on the paper We then run the program 100 times and all going to here is time How long it takes to run a function which adds afforda numbers either in the list or in the array? I will do that 100 times And then we can work out the average time it takes to calculate one and we'll print out the values as we go So we're just using the built-in clock function in C. Which should be accurate accurate enough for what we want to do so the real meat the body of the program is this function here some list or Some array Let's start with the array one because possibly the simplest we said a variable to be zero that's going to be asked some You have another variable I which is going to be our index through the array and we go from I equals zero I first element until we get to the end the number of elements in there and we just say this sum equals the value that was already in sum plus the value in the ice element in the array a P element Remember we said the array had a p and a Q in it we want the value stored in the p space within the structure in our array, so I'm just going to add get them together the list version very very similar we set the someone to be 0 We say while p does not equal null So why we haven't reached the end of the list and we set p initially to be the first thing in the linked list? So pointing out that first thing sum equals sum plus P inside p. We using P inside p Confusing variable names, but then I'm asleep programmer, and then we set p ir pointer to point to the next thing So we follow the link to the next thing in the linked list and we keep doing that until we come to the end There are two programs as simple as you get in terms of implementing a linked list or on array so we're going to do is run them and We should get some values output But we're not going to run them on the I max we're not going to run them on the raspberry Pi we're actually going to run them on the atari behind me which means I need a very high Peak high-tech piece of Equipment, so I need a floppy disk to transfer other programs from the iMac Go topping them off the iMac, and then we can run them on the AtAri so go over to the Atari and Assurance that it's cameras and spot. We can see it. So what I'm gonna do at first is they got to run the list first also going to generate 125,000 linked list values to let it Run Right while it loads off the floppy disk that's what it used to be like back in the 80s and 90s you had to wait one you crave them ran a While a long While And this is just a simple program So there is good So it's not Initializing the link list it's allocating all the memory for the different elements and then putting a random number In the P value of each one, so we'll let that run In the words of the old Hobbit computer game time passes or in the words of every apple of this secret short How long does this take? A while lots of oil, I don't know another time this bit 125,000 times however long it takes to allocate a random number and allocate some memory On the clock speed of what this is actually an 8 MegAhertz cPU So it's a takes a while the computer loaded off all the code for the program and off the floppy disk into memory And it's now running it to generate the data set that we need then Gonna sum all the values So everything is happening now will happen off memory It's just taking a while to process it But we are recording it in 4k which is a slight overkill. I hope you're getting into that extra bit So the programs going through now and we'll let it run 400 times We'll get an average, but actually looking at the values of the coming up I think we can safely assume that the average is going to end up being 166 clock ticks now when I say qwop takes each of the machines. We're going to look at are going to have Different clocks perms that they use the time things so we can't compare the values directly what we could come read with the seconds But we can certainly see the relative patterns between it will call that 166 clock ticks over to run the linked list version of the program it takes 166 clock ticks So let's now run the array version of the program So now we're going to do exactly the same thing we're going to allocate an array every time tWenty-five thousand elements and populate them with random values because of the way I've written the program It will actually be the same pSeudo random number sequence that the sum should be the same and then we're going to fill each of them up and time how long that takes and we'll they'll see whether an array actually is faster than the linked list or Whether a linked list will be b. Array So that to me looks like that little spinny thing is going faster Yeah, well so and I haven't just upgraded the processor while they're off camera What's actually happening is? When we allocate the space for the linked list we have to allocate space for each element and then for the gym there We allocate space for the next element. We could do that in a different way and speed things up, but we did it That's the classic way with the array. We allocate the whole lot in one go, so we all take one Huge block of memory under 25,000 times eight is about one meg's worth of memory So allocate all that in one go, and then we're just going through it filling in all the values setting all the values So that should be quicker and it seems and spinning wheel is going slightly faster what why would that Spin wheel very fast can you just learn how that You can read it if you've heard this thing you it yeah? So to give some nice sort of visual Feedback while this is running Well, I'm just a plain white screen what I'm doing is I'm printing every hundred element ideals I update that thing and just printing either a dash a Vertical line or one of the two slashes to make it look like it's getting round and those are printing it it creates a little Effect which is see that something's happened and the computer hasn't crashed So this should be faster, and we should get a value so is now working and the immediate thing we see is That the array is taking about 179 or 178 ticks it's the same quad kick so we can compare these two values the Array is slower than a linked list I Know down, okay? All right now. So you can't argue in - okay, so the numbers have come in the computers around 170 angular call is to that 170 8.5. Always for the final average to come in but for that. So they're roughly comparable Basically there's not much it sort of thing, but there's something about 200 qwop ticks per second on this machine It's is probably a noticeable. It's about 10 percent slower Okay, so that's it. I mean we can stop here come a that's it. We know. What's what arrays are slow You wrinkly yep, so arrays are slower than ling Li Gotta get my vendor video yeah and a video you can now bring up the slash computer, so virtually There are two ways you can walk over the Ray We went up the array from the smaller numbers to the larger numbers But we could also start at the larger indexes and move back to the smaller indexes, and I thought well Let's try it both ways let's not have anyone posting some comments saying no, but if you went the other way So let's run it the other way, so I wrote a version which walks down and because we're adding the numbers together We'll get exactly the same answer a plus b equals b. Plus a and all that you can watch numberphile to find out More online. I'm sure You think this would take exactly the same amount of time, so just update our table To do this and we're going to have the reverse array here I'll just wait while it does that So going backwards through the array the only thing can we do the same thing with the linked list Well the way we've designed the linked list we Have a pointer to the next thing in each thing, but we don't have a pointer back to the previous thing It's a singly linked list so we can only move forwards along it We could design it to be a doubly linked list and have a backwards and forwards pointer but if you think about it if we started at the end And move backwards that's going to be exactly the same Operations as if we start at the beginning and before so there's actually no point in testing It'd be exactly the same code. We just be using different offsets into memory so taking exactly the same amount of time While we wait for that to do its thing let's run the same programs on The Imac and see how much faster it is so let's compile them up. I am turning on as I did before all the optimizations For the array and things that'll make it go as fast as poss but some of the comparison to the array Test and I going to compile with the speed test Done, and I'm going to do the same thing for the link Test I'm going to say something really obvious now But it's honestly a heck of a lot quicker this machine just to get compiled up and ready. Yeah Those machines much faster because while I'm compiling everything on here and then transferring it across Rather than trying to get my old C compiler going on the AtAri And also the chances are this will produce better code, so it'll take make the most benefit of the speed We can pile them up, so let's run the linked list test as we did before boom It's done Everything now you notice that took a lot quicker? The numbers are still roughly in the same order still about 100 and around 200 but remember this is a different Clock we cannot compare the ticks from that one - it's on the different numbers, but we get the average here two hundred and nine Point five four I'm acting this is 166 atari ticks They're much slower chicks than the I like ticks we can't compare that to that we can compare horizontally let's do the same now with the array test and Wow When we did the linked list test on the atari it was faster than the array test Roughly take you about ten percent faster Nothing really in it you could say look at the difference on The Imac the IMac takes forty three point four four clock ticks to do the array 209 that's five times Faster for the Array was on the Atari the linked list was faster So a reverse array is now going on the Atari 114 and That's quicker. That's quicker than the original array, and that's quicker than the linked list, so if we do everything backwards Doogie where it's very good if you do everything forwards It's much slower. I'm a bit confused Is there any possibility that's just a problem the code or something always are you going to reveal something to this year? We need to delve a bit deeper here to see what's really going on Remember we wrote these programs in C So we wrote these in the high-level language and then they got compiled down To the instructions that the machine can execute and what's actually happening here. Is that the research shows that the machine can execute? Favor walking backwards down something compared with going further upwards now Let's run that same array backwards Program on the IMac for completing this so let me compile that one up and again. I'm Optimizing everything to the hilt Test and so we'll run speed tests back And we'll run this one so before the arrays now much faster According to this a reversal rate should be faster than the forward array. So what would you expect the value to be shown for? Run on the reverse away about fifteen or twenty directly if it was the same as that did All right Would you like to stick with that answer or would you like to change that oh? nervous excited actually Because the Atari was faster to do a linked list in an array and the iMac was the other way around I'm going to say it's going to be slower to do it in reverse. So you reckon firm supposed to do it in reverse Oh, I do like an indecisive person there. We go average time forty seven point four six So marginally slow marginally slower about 10% slower but still an order of magnitude faster than this we could do the same on the raspberry Pi So again, we'll compile all of these up, so we'll do it array test Compile it up noticeably slower to compile will do the LL test. This is the list version And we'll do the reverse test So compile them all up on the last repair, so let's run these and get the numbers we've run the array test we now get nine 165 point seven five as an average for the array will run the link list and We get one eight five eight Point six one for the linked list test and now we run the reverse test and We get 101 9.5 five so we've run the test now. So we've we've assumed nothing We've ran some tests or to get some data so we can see which is faster an array or liquids with this operation We've been trying And we've got some pretty interesting results So we're running on the atari the linked list was faster than the array unless he ran the array backwards in which case the array was faster than the linked list and The array going forward we'll come back to that We ran it on the raspberry Pi and here the array was about twice as fast as the linked list Baby rarely backwards the Array was slower, and if you do it on the IMac view range height, I'm faster than the linked list Whichever way you went so what's going on here? Well, let's ignore the iMac for the minute the apple Haters will love that bit but let's ignore the Imac for though and let's just have a look at the raspBerry Pi and the attali so we've got the Atari and the Raspberry Pi and we'll just go with The array speed and the link to this so we've got about one hundred and seventy nine For that and one hundred and sixty-six clock ticks that now when we can't compare the clock ticks between the different machines because they're different different clocks used in the machines but we can compare them relatively between the same thing on the same machine for the raspberry Pi it was 966 and that was a 1859 Kish now what's going on here? Well the thing we need to remember if you look at the machine code It's roughly the same length the same number of operation now on the atari some of the instructions will take slightly longer than to execute But that's not what gets going on here We need to think back to the video on Cashiers that we did the difference between the raspberry Pi CPU would have been much faster and more modern and the Ataris Is that the AtAris? Doesn't have a cache So every instructionally needs to fetch every bit of data that needs to fetch has to be fetched from memory each time No, it's not cached broadly speaking we believe something going on there, but broadly speaking with no cache is about half the Prefetch buffer if you want to get into the details, but we can see there's no cash So if we think about the cPU in the atari, then it's having to access memory For everything so everything that the cPU needs to fetch on the Atari the instructions The data from the array or a linked list and of course the next pointer from the linked list has to come from memory So it takes the same amount of time we get are the two weather fetching data or fetching instructions on the raspBerry Pi however? We still got the main body the cPU which is going to execute things and it has memory as well but in between there we have a cache in fact we actually probably have two caches one for Data and one for instructions as I want to see if you activate as we looked at in the previous video is Access to it via the caches, and then they if they haven't got it to get it From memory, so why does this make a difference? Surely things will still wake up so well on the atari the fastest thing in the theory is basically the memory the memory is much faster than the cPU that's about twice as fast and So the memory can provide a data exactly when the cPU needs it. There's no real need for cash Move ahead to the raspberry Pi and the are make of course then the cPU is much faster than the memory So when cPU ask for something it has to wait while the memory? provides it Now let's figure out how what happens when we run our program? with the array on the raspberry Pi Every instruction after the first time will be accessed will be cached in the instruction cache So the first time we go through the loop all the instructions are going to be used in that loop will have been cached in The instruction cap so you can get these immediately the data particularly with the array would also collapse a bit with a linked list we Don't get just one or two bytes in each time I'll get what we call a cache line pickups 128 bytes in a go, so we'll get some of the data that we already need Into the cache as well, so some of the data will already be there in the array more so with your right so the reason of the array runs faster on the raspberry Pi is that all the instructions are coming straight out of the Institute basically giving us a fast lane for those instructions, so they get there immediately There's only the data that needs to get from main memory, which will get cached as well So the same happens with the route list program Except for one crucial difference with the linked list poem we have to make one memory access to get the data value that we're going to sum and Then we have to make another memory access which has to go through the cat into main memory for the address of the next thing we have to make to Memory request which may get satisfied by main Memory here Where's our the array one? We only have to make one for the value that we're interested in and because of the cache We get sort of a fast passing instructions, and so actually not having to do that second memory access to get the next pointer here Means that this the arraylist of the program ends are working about twice as fast on the atari because everything is coming from memory it doesn't matter whether you're reading an instruction or reading a bit of Data It's still going to come from memory each time so actually the front of the value is already pre-calculated in memory means that actually runs Slightly faster, I mean, we're talking perhaps 10% faster. Not really a billion so we get a slight speed benefit Now to show that this is the case I took the same version of the program on the atari and ran it on the atari router now the difference for the AtAri St. We use there and the Falcon is that the falcon has a slightly later version of the 68000 processor which has a cache In it both instructions and data cache and when I ran that on there the times that came out So the array was 46 clock ticks is much faster processor anyway, and the link list was 58.5 Clock ticks so on the Falcon because you've got the cache. They're just like on the raspberry Pi The array version ends up being traffic exactly the same program exactly the same machine code Because you've got the cache there the array version ends up being faster because the instructions of their Professional and that sort of fat packs from the cache rather having to go to main memory each time mean that the irradiation becomes faster than the linked list version Now tell them other things we haven't talked about yet. Why is the reverse array faster on the AtAri? Simple answer to that it Just uses slightly different instructions their instructions on the six 8000 that allow you to do a decrement Testing with zero branch which isn't all in one instruction So you can actually make the code slightly more compact and run slightly faster again. It's a small enough time Why then is the imac significantly faster? We're getting sort of five times faster? Well that's because I use a slightly different compiler that I use the clang C Compiler on the IMac rather than GCC for the other two and it cheats it It squats that you've got an array axises in the loop, and it says well Okay, rather than doing a loop around one array access I will do a loop around eight array accesses all in one go So it actually removes some of the tests for the loop and it makes the program much faster for the compilers being clever And optimizing the program So it runs faster the answer to whether the raise faster than the link list or not? Very much depends on how you cpu in the machine that you've built is configured to actually execute the code So you can't make assumptions? About how a cpu will execute code you really need to do the test to see what's happening even within the same architecture family the difference between the six 8006 8030 in the Falcon mean that Assumptions we make based on what they charity did where the Rally was slower than the link list aren't true on The Falcon and when you move it on to a completely different architecture like the raspberry Pi or the X86 chip in the iMac Then again you get different effects, so the best thing is when you're faced with a question question like this, and you're not sure Come up with some test and collect real data, and you'll be able to see what's going on could you
B1 中級 陣列與鏈接列表 - Computerphile (Arrays vs Linked Lists - Computerphile) 11 0 林宜悉 發佈於 2021 年 01 月 14 日 更多分享 分享 收藏 回報 影片單字