字幕列表 影片播放 列印英文字幕 Hi, everybody. How we doing? We got caffeinated? Feeling good? Nice. So I'm Anjana Vakil, hello. You can find me on Twitter at my name and today I'd like to talk to you about immutable data structures for functional programming in JavaScript. We're going to take a look at what immutable data structures are, why they're a really cool way to handle the immutability that we typically use when we're doing functional programming and how we can do that in JavaScript because I hear y'all like JavaScript! So a little about me. I'm probably the only not-web-developer in the room. I am an engineer for Uber Research. I work with them to develop a custom query language for data in the scientific research funding domain. I'm also an alum of the Recurse Center, which is a fantastic programming community in New York City, and I am an alum of the Outreach Program, which if you have haven't heard of it, it's getting women and more folks involved in these by giving them internships at Mozilla. So I'm really happy to chat about those things if you want to come grab me after the talk. But you might know that I like functional programming. I think it rocks. Anybody else agree with me that functional programming is cool? Yeah! Yeah, so functional programming is a pretty great way to avoid some of the headaches of like imperative and object-oriented programming. In functional programming, what we typically do is conceive of our programs as being just pure functions. That means their transform their inputs to outputs, and that's all they do. They don't have my side effects like changing things in the console, and my taking things in the global state are side effects. But our data becomes data in, data out, and transformers of data. And one thing that goes hand-in-hand with this, with avoiding side effects is immutable data. Immutable data meaning once we've created it, it never changes. So this is a really good way of changing something accidental outside of your function. If everything is immutable, you can't change anything. So immutability another thing that rocks and it rocks pretty hard for other reasons that we'll see in a moment. But speaking of rocks, let's talk about rocks. So this is a rock, and immutability rocks in the way that rocks rock. Now I don't know about you, but I've been going to a lot of tech conferences recently and I've been feeling like there has enough poetry. So I'd like to read you a poem: Nobody sits like this rock sits. You rock, rock. The rock just sits and is. You show us how to just sit here. And that's what we need. It's so true, so deep. This is from -- Don't thank me, thank I Heart Huckabees, that's a great movie. Check it out. So this is really how immutable data rocks. It just sits there. It just is. Once we've created it, it never changes and that's amazing because it can help us avoid some of the headaches of immutability. So with immutability, we have some things pretty easy, but other things become harder and we'll see how that looks. So let's say I have an array called foo and it's got some numbers in it. Hm, and I'm already bored. Let's make it more fun. Let's say I have a zoo with some animals -- more fun! And I decided that I want to change something up about my zoo. Maybe I want to replace that rabbit there with something a little more exotic. Like an alien! So this is cool. I'm happy because I wanted a more exotic zoo. I got an alien in my zoo now. I didn't have to change anything except for that one little cell in my array. That's pretty sweet but my co-worker over was expecting zoo to be filled with earth beings, earth animals, and wasn't accounting for there being an alien in it. Who put that in there? Now my program doesn't work anymore. Who did that? So immutability has a couple problems. We have to manage who's been changing what, when -- who's been putting which animals in the zoo. We have to have a lot of overhead to manage that state, and that gives us headaches as individuals, and as teams. We also get bugs in the code because maybe I was only planning -- or my co-worker was only planning -- to handle terrestrial beings and didn't have a case of aliens being accounted for, and that broke something. So these are some side effects of immutability that don't make us happy. Let's try doing things the immutable way. So in an immutable world, my array, my zoo, once I've created it, it just sits and is forever. I cannot change it. What I can do if I want a new zoo that's more exotic is I can make a copy that's the same size as my original array, and I can make the modification I want, so I can put my alien in there in place of the rabbit. And so this is pretty sweet because now my co-worker is maybe, and they're, like, whoo, nothing broke in my program, and it's all still animal creatures but I had to copy over that whole array. I had to allocate the space for that entire array, even all of the stuff that didn't change. I had to copy all of that over, as well. So this means that my code runs pretty slow. And it also takes up a lot of memory. It takes up a lot of space and time. The complexity on those things are bad because copying is a waste of both time and space. It makes us sad face! We don't want that. So if we want to do immutability, we must be able to find a better way of doing that. Luckily for us, a lot of very smart folks have been thinking very hard about this problem for a while, and they've come up with some really good solutions for how we can deal with immutability efficiently. immutable data structures! So immutable data structures is a term that you may have heard about, with functional programming, or also in terms of React where they come in handy. Technically, an immutable data structure is like the rock, it just sits, and is once you create it. It never changes. But also hear the term persistent data structures banged about. Sometimes these are used interchangeably, but they have slightly different meanings. So if immutable data is data that never changes, persistent data is data for which we have access to old versions. So as we've been creating new modified versions of our data structures, we keep the old versions around. You might hear about partially persistent data structures where we can look at the old versions, we can access them, but we can't go back and update any of them. All we can update is the most current version that we have. And then you might also hear about fully persistent data structures where we can actually time travel, we can go back and update any of our past versions. And if this is starting to ring a bell like it's version control like git, it's sort of the same idea. So we're going to talk about these as persistent immutable data structures, they're both persistent, and immutable. Let's see how this works. The key to all of this is we want the old versions of our data, like, my original zoo to stay put. We just want to to sit like the rock but we want new versions to be created efficiently. So what magical tricks do we have to use to, like, make this happen? Do we have to make invocations do dances to the gods of space and time complexity? No. It's very simple. Trees and sharing. Isn't that sweet? These two simple concepts will get us efficient immutable data. How? So let's talk about trees because trees rock pretty hard, as well, alternative, unfortunately I don't have a poem for that, sorry. Imagine that we could find a way to represent our zoo array as a tree. So one thing I could do is I could put all of my animals -- all of my values -- in the leaves of a tree, and I could make it so that each leaf holds one value, one animal. But they might get lonely, so let's put them with a buddy. Let's put them 2x2. So each of our leaves will have two values and we'll hope that the buddies get along and not each each other -- looking at you, tiger, number six, don't eat that koala, and we can go up to intermediate nodes up and up, until we get to the root node of the whole structure, and now that root is an array represented previously by a tree. So this is my tree now in this structure. So given this type of structure, how do we update something? Given that my data is immutable, and it can never change, how can I handle the fact that it has an alien in it. So here what I would do is I would take the node that contains the value that I want to change. So in this case it would be the 0/1 node that you see on the bottom of the screen. And so I make a new copy where I've still got my monkey but I've changed the rabbit to an alien. And then I need to copy any of the intermediate nodes in the tree that were pointing to the node that I changed. So I basically trace a path up towards the root of the tree, which, now, I've got a new root, which means another version of the data structure. So this technique of making this update by copying the path from the leaf I changed to the root is called path copying. That's pretty cool because now I didn't have to copy the entire array; I just had to copy the nodes on the way from the root to the leaf that I changed. So if we've turned in something linear and copying into something logarithm. That's pretty cool, that's more performant, and the data of this is that all of these nodes in yellow here, so most of the tree is shared between the two versions, between the old version and the new. And so this saves me a lot of space because I can actually reuse the parts of the original version that didn't change, whereas, before, I had to copy those over, as well. So this means that what was before, like, a lot of memory consumption becomes a lot smaller because you don't have to store as many copies of the things if they didn't change. And that's called structural changing because we're sharing the structure of the tree between the two versions. So we've been talking about updating things but how do we get at the values in our data structure? How do we access them? Well, it turns out this isn't just a tree, it's a special type of tree called a TRIE tree, which originally came from the world "retrieval," so people could, I guess, call it tree, which is funny because we also call TREE trees, so we can call them "tries" if we want. So a try is a type of tree, where the leaves represent the values, and the paths to the value are the keys that that data is associated with. So often you see TRIEs with values stored as keys. So, for example, if I have T stored as a key, what I do to get to the T is I trace the tree one letter at a time. Then I go to T, and then to E, and then to EA, is my key, and then my value there is three. Because everything at the end sounds like "ee" in this talk. So this is pretty cool, but in our data structure, we weren't using words, we just wanted an array-type thing, we wanted indeces, right? So the insight here is if we treat the index as a binary number, then we can pretend that that's kind of, like, our word and we can descend the tree, bit-by-bit as if each representation of our binary representation is a letter. So let's see how that works. If I'm trying to get at item five in my array, so the animal at index five, I'd convert that to binary, so that's one, zero, one, and then I step through that as if it was a word. I step through it letter-by-letter, bit-by-bit. So I go from the root to the branch. I have a choice of either zero or one. I go to branch one first. And then I go to branch zero, and then I take the thing on the one side. So I go one, zero, one, down my tree and then I end up at my frog at index five. So this is a pretty simple insight but it ends up being incredibly powerful because it allows us to quickly traverse this tree structure, which lets us use that structural sharing to more efficiently represent our new copies of our immutable data structure. And, importantly, we don't have to be using a binary tree, meaning we have two branches from each node. That fits pretty well on a slide, but actually what you mostly see is a 32-way branching. So in our trees that we've been looking at, we've kind of had one bit of information per level. And we've been descending bit-by-bit but if we had a 32-way branching tree, it would be five bits of information that we would be representing at each level. So that would look something like this. If we had a much bigger number, like, 18,977, in binary, that's that bunch of ones and zeros. This would be a really deep tree if I had to descend into it one at a time, it would be like 15 levels deep. Too much, too long. So if I'd make more branches at each level, then I can chunk this up into kind of 5-bit letters as it were, and descend the tree that it's now only three levels using the 32-way branching. So this is kind of a tradeoff between how deep your tree is going to be, and how big the nodes are going to be because if I have just one bit of information at each level then I have really small nodes. That's quick to copy over but I have to go very, very deep down in the tree for a larger array. And generally, research has found that 32 is a pretty good tradeoff between the depth of the tree. So what we've seen is a bitmap vector TRIE. That's just jargon. We don't need to care about that. But if you need something to Google, you can Google that. This is cool for array-type of things and we have an index we want to jump there, but what about objects? We also want to be able to associate objects with arbitrary keys, not just indeces, so we want non-integers as keys, how does that work? So if I want a version of my data structure where it's no longer an array but it's something like an object where I'm associated letters with each of my animals like M for monkey, and P for panda, et cetera, what I can do is I can take my keys, in this case, they're letters, and hash them to get a number that represents the key. So that each key will have its own number. They won't be in order necessarily, but that's okay. Objects don't have to be in order. And then we can use the hash of that number in binary to descend the tree as before. So if I wanted to look up the value associated with key "F," I could hash F, get some number, and let's say I get five, like, A, B, C, D, E, five. And that would be represented in binary as one, and I descend the tree as before, here, for simplicity, just using a one bit at a time, two-way branching tree. But typically we would be doing this with 32 branches per level. So, again, we just descend the tree using the binary representation of our key, in this case, we used a hash function to transform it from some arbitrary object into a number and we get the animal we want -- in this case, our frog. Cool. So that, if you want to Google it, the thing you could Google is a hash array mapped TRIE. And this was a data structure parented by Phil Bagwell, and Rich Hickey, kind of started using it, and a lot of these an implemented in languages like Clojure to implement the data efficiently. There's a ton of optimizations that are usually done on these data structures to make them super-duper fast and lots of details that we're not covering here but this is the basic idea. Trees to represent our data, structural sharing so that we can reuse as much information as possible between the old versions and the new versions. And this idea of using binary representations of our keys, whether indeces, or hashed keys to descend the tree to find the thing we're looking for. So to recap, mutability induces headaches. It is to be avoided especially if you're doing functional programming where the essential idea is to not have side effects and only be using pure functions that don't change anything except do the computation on the input and return the output. Immutability, on the other hand, is great because if I'm using immutable data, I can't mess up my co-worker's program by making the zoo she only thought was animals suddenly have an alien in it. But copying is a really bad way of handling data because it is not efficient neither with respect to time, nor space. And structural sharing, using these tree structures -- or TRIE structures, and sharing as much information from one version to the next is the really performant way to do this. And so you're probably thinking, okay, these data structures are pretty cool. But what am I supposed to do with them? I'm not going to be building boxes of emoji here, am I? No, you don't have to. In JavaScript, there are some really great libraries out there to help you use these right off the bat. There are various solutions but I'm going to talk about a couple of them. So one is called Mori. Mori is basically a port of Clojure script by David Nolan that allows you to leverage the implementations of these data structures from ClojureScript, which is the version of Clojure which targets JavaScript from the comfort of your vanilla JavaScript. And it's got a bit more of a Clojure feel to it. A bit more of a functional language feel. The API is functional and we're going to see what that looks like in a moment. But that's one thing that kind of sets this library apart. On the other hand, there's also Immutable.js. This is a library put out by Facebook. It was created by Lee Byron. And this is a JavaScript implementation of these data structures. So it has a bit more of that native JavaScript feel to it. It doesn't have kind of the Clojure background brought in. And that means it's got a more object-oriented style API, although it is still returning new versions of data structures instead of changing mutable structures in place. So let's see what those look like. This is how you might use Mori to create what's called a vector. A vector is the data structure from Mori that you'd probably be using as an array-type thing. So I've got a vector that I'm calling A because it's sort of array-ish. It's got one and two in it. And if I want to push something onto that, the function that I'd use is conj. This is from the Clojure called, Lisp-speak. And what I would put in is the original A, and then what I want, which is, in this case, three. And you'll see that this creates this new structure on the right. These vector, one, two, and one, two, three, they look different because they're not really JavaScript arrays although you can convert back and forth. But the point is this cong function returns a new value which I can catch as A2 and I can prove to myself that my original A didn't change by using the count function to see how many things are in it. And there's only two things in it. But I can prove that my version, A2, has the third thing by trying to access, by using the get function to trying to get two, which it tells me, it is indeed three. This is the same thing that you would use in Immutable.js. Here you would use Immutable.js.list.of, that's interesting syntax. But it creates something more like a JavaScript array. Although it is not an array, it is a JS list. That I'll call an array and if I want to add something onto a new version of A, I use this sort of dot-method notation that we're used to. I'd say a.push(3), but, importantly, this is not changing a. It's just returning a new value of a, which I'm going to capture as a2 and I can prove to myself that it didn't change. A.size tells me it's two, and if I try to get the item at index two, I find that it's three, as I expected. So, similarly, for what are called maps, which is kind of the key-value object that we might be using, if I create an object, o, which is going to be my Mori hashmap data structure, I'm associating a is one, b with two, again, we see that the syntax is a little different from our regular JavaScript beastlier not regular JavaScript objects. They're super special immutable data structures, they need special syntax. And so if I want to change the value of one of my keys, I can use this asoc function, and then change the value of three in my new version, o2, and then I can prove to myself that the original didn't change by using the get function to make sure that a in the original one -- o, is one, and the a in o2 is three, as I would expect. And it looks quite similar in Immutable.js except the structure is called map, not hashmap, and I can pass in a little JavaScript object, and it gives me a little o, a little more JavaScript syntax than we're used to. This has a bit more of a syntax and feel that you might be used to from JavaScript programming, I can use the set method on o to create a new version where a is now three, and I can use the get methods on my old version o, and my new-version o2 to prove to myself that the old one didn't change. So these are really immutable data structures. They look really weird if you try to look at them in the console just as JavaScript objects. They're really fun to kind of poke down into because they have this complicated tree structure. So I highly recommend that you try out these libraries and see what works for you. I can tell you really just briefly before I run out of time here, that how they compare is basically, again, Mori is from the Clojure world, it's ClojureScript. But the Immutable.js has more of the o.get() kind of feel to it, if you're comfortable writing JavaScript like that. However, for me, it gives me a little bit of a cognitive dissonance there because it looks like we're mutating things with those calls -- we're not -- but for me, to get more into the mindset of functional programming, I prefer the functional programming of Mori because it gets to the way that we conceive things as inputs and not just outs. We don't want to be in the mindset of making changes in place to objects. There's also some minor performance differences between the two, Mori is a bit faster, and Immutable.js is a bit smaller. But they're both great options, try them out, and I hope one of them works for you. So that's my talk. I hope it's been useful. Go forth and don't mutate your data! Here's some references for you.
B1 中級 美國腔 Anjana Vakil:功能性JS的不可改變的數據結構|JSConf EU (Anjana Vakil: Immutable data structures for functional JS | JSConf EU) 41 0 dongshufeng 發佈於 2021 年 01 月 14 日 更多分享 分享 收藏 回報 影片單字