[MUSIC PLAYING] DAVID MALAN: So today we're going to talk about challenges at this crucial intersection of law and technology. And the goal at the end of today is not to have provided you with more answers, but hopefully to have generated more questions about what this intersection is and where we're going to go forward. Because at this intersection lie a lot of really interesting and challenging problems that are at the forefront of what we're doing. And you, as a practitioner, may be someone who is asked to confront and contend with and provide resolutions for some of these problems. This lecture is going to be divided into roughly two parts. In the first part, we're going to discuss trust, whether we can trust the software that we receive and what implications that might have for software that's transmitted over the internet. And in the second part, we're going to talk about regulatory challenges that might be faced. As new emergent technologies come into play, how is the law prepared, or is the law prepared, to contend with those challenges? But let's start by talking about this idea of a trust model, trust model being a computational term for, basically, do we trust something that we're receiving over the internet? Do we trust that software is what it says it is? Do we trust that a provider is providing a service in the way they describe, or are they doing other things behind the scenes? Now, as part of this lecture, there are a lot of supplementary reading materials that we've incorporated and that we're going to draw on quite a bit throughout the course of today. And the first of those is a paper called "Reflections on Trusting Trust." This is arguably one of the most famous papers in computer science. It was written in 1984 by Ken Thompson. Ken Thompson was one of the inventors of the Unix operating system, on which Linux was modeled and from which macOS ultimately descends. And so he's quite a well-known figure in the computer science community. And he wrote this paper upon accepting an award called the Turing Award, again, one of the most famous awards in computer science. And in it, he's trying to highlight the problem of trust in software. And he begins by discussing a computer program that can reproduce itself. We typically refer to this as a quine in computer science. But the idea is, can you write a simple program that reproduces itself? We won't go through that exercise in detail here-- though a short sketch follows below-- but Thompson shows us that, yes, it is actually relatively trivial to write programs that do this. But what does this then lead to? So the next step of the process that Thompson discusses, stage two in this paper, is how do you teach a computer to teach itself something? And he uses the idea of a compiler. Recall that we use compilers in some programming languages to turn source code, the human-like syntax that we understand-- languages like C, for example, are written in source code-- into something machines can run. Source code needs to be compiled, or transformed, into zeros and ones, machine code, because computers only understand these zeros and ones. They don't understand the human-like syntax that we're familiar with as programmers when we are writing our code. And what Thompson suggests is that we can teach the compiler, the program that actually takes the source code and transforms it into zeros and ones, to compile itself. And he starts out by introducing a new character for the compiler to understand.
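As promised, here's what such a self-reproducing program can look like. This is a minimal quine in Python; Thompson's paper works in C, but the principle is the same: the program carries a representation of itself and uses that representation to print itself.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints exactly those two lines, character for character. With that flavor of self-reproduction in mind, back to the compiler and the new character it's being taught.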
The analogy drawn is to the newline character, which we type when we reach the end of a line: we want to go down and back to the beginning of a new one, so we enter the newline character. There are other characters that were not initially envisioned as part of the C compiler. And one of those is vertical tab, which basically allows you to jump down several lines without necessarily resetting back to the beginning of the line as newline would. And so Thompson goes through the process, which I won't expound on here because it's covered in the paper, of how to teach the compiler what this new character, this vertical tab, means. He shows us that we can write code in the C programming language and then have the compiler compile that code into zeros and ones that create something called a binary, a program that a computer can execute and understand. And then we can use that newly created compiler to compile other C programs. Which means that once we've taught the computer how to understand what this vertical tab character is, it then can propagate into any other C program that we write. The computer is learning, effectively, a new thing to interpret, and it can then interpret that in every other program. But then Thompson leads us into stage three, which is, what if that's not all the computer, or the compiler, does? What if, instead of just adding that vertical tab character whenever we encountered it, we also secretly, as part of the source code, insert a bug into the code? Such that now, whenever we compile the code and we encounter that backslash V, that vertical tab character, we're not only putting that into the code so that the computer can understand and parse this backslash V, the character that it never knew about before, but we've also sort of surreptitiously hidden a bug in the code. And again, Thompson goes into great detail about exactly how that can be done and exactly what steps we can then take to make it look like that was never there. We can change the source code, modify it, and make it look like we never had a bug in there, even though it is now propagating into all of the source code we ever write or ever compile going forward. We've created a way to surreptitiously hide bugs in our code. And the conclusion that Thompson draws is, is it ever possible to trust software that was written by anyone else? In this course we've talked about some of the tools that are available to programmers that would allow them to go back in time-- for example, we've discussed GitHub on several occasions to go back in time-- and see prior versions of code. In the 1980s, when this paper was written, that wasn't necessarily possible. It was relatively easy to hide source code changes so that the untrained eye wouldn't know about them. Code was not shared via the internet. Code was shared via floppy disks or hard disks that were being passed between people who needed them. And so there was no easy way to verify that code written by somebody else is actually trustworthy. Now, again, this paper came out 35-plus years ago now. And it came out around the time that the Computer Fraud and Abuse Act, which we've also previously discussed, was being drafted and run through Congress. Did lawmakers heed the advice of Ken Thompson? Do we still today trust that the programs we receive, or that we write, are free of bugs? Is there a way for us to verify that? What should happen if code is found to be buggy? What if it's unintentionally buggy? What if it's maliciously buggy?
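We'll come back to those questions in just a moment. First, to make stages two and three a bit more concrete, here is a hedged sketch-- in Python rather than C, and much simplified from any real compiler-- of the escape-sequence handling Thompson describes. To teach the compiler a brand-new escape like backslash V, the first version of its source has to spell out the raw byte value explicitly; once that version is compiled, the compiler's own source can be rewritten to use \v itself, and the definition then lives on only inside the binary.

```python
# Illustrative sketch of a compiler's escape-sequence table (not real
# compiler code). chr(10) is newline, chr(9) is tab, chr(11) is vertical tab.
ESCAPES = {"n": chr(10), "t": chr(9), "v": chr(11)}

def unescape(source_text: str) -> str:
    """Translate backslash escapes in source text into the real characters."""
    out, i = [], 0
    while i < len(source_text):
        if source_text[i] == "\\" and i + 1 < len(source_text):
            out.append(ESCAPES.get(source_text[i + 1], source_text[i + 1]))
            i += 2
        else:
            out.append(source_text[i])
            i += 1
    return "".join(out)

# The two-character sequence \v in the input becomes one vertical-tab byte:
print(repr(unescape(r"line one\nline two\vdown a few lines")))
```

Once a compiled compiler knows the mapping, the explicit chr(11) in the source can be replaced with "\v", and nothing left in the source reveals where the knowledge-- or a hidden bug-- came from. That disappearing act is exactly what makes Thompson's stage-three trick possible.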
Do we have a way to challenge things like that? Do we have a way to prosecute those kinds of cases if the bug creates some sort of catastrophic failure in some business? Not exactly. The challenge of figuring out whether or not we should trust software is something that we have to contend with every day. And there's no bright-line answer for exactly how to do so. Now let's turn to perhaps a more modern interpretation of this idea and take a look at the Samsung smart TV policy. So this was a bit of news a few years ago: Samsung was recording, or capturing, voice commands so people could make use of their television without needing a remote. You could say something like, television, please turn the volume up, or television, change the channel. But it turned out that when Samsung was collecting this information, they were transmitting it to a third party, a third-party language processor, who would ostensibly be taking the commands they hear and feeding them into their own database to improve the quality of understanding what these commands were. So let's say thousands of people use this brand of television. It would take the thousands of people's voices all making the same command, feed them into its algorithm for processing this command, and hopefully try and come up with a better or more comprehensive understanding of what that command meant, to avoid the mistake where I say one thing and the TV does something else because it misinterprets me. If you take a look at Samsung's policy, it says things like: the device will collect IP addresses, cookies, your hardware and software configuration-- so the settings that you have put onto your television-- your browser information. Some of these smart TVs have web browsers built into them. And so you may also be sharing information about your history and so on. Is this necessarily a bad thing? When it became a news story, it was mildly scandalous in the tech world because it was unexpected. No one thought that that was something a television should be doing. But is it really all that different from when you use your browser anyway? We've seen in this course that whenever we connect to a website, we need to provide our IP address so that the site that we're requesting, the server, knows where to send our data back to. And in addition, as part of those HTTP headers, we not only send our IP address, but we're usually sending information about what operating system we're running, what browser we're currently using, where geographically we might be located-- ways to help route traffic in the right direction. Are we leaking as much information when we use the internet to make a request as we are when our television is interpreting or understanding a command? Why is it that this particular action, this interpretation of sound, feels like so much more of a privacy violation than just accessing something on the internet, when we're voluntarily, sort of, revealing the same information? Are we not voluntarily relinquishing the same information to a company like Samsung, whose smart TVs sort of precipitated this? Moreover, is it technologically feasible for Samsung not to collect all of the sounds that it hears? One of the big concerns that came up with these smart TVs as well is, when does the recording and transmitting start? For those of you who maybe have seen old versions of Star Trek, you may recall that in order to activate the computers on that television show, someone would just say, computer.
And then the computer would sort of spring to life, and then they could have a normal English-language interaction with it. There's no need to program specific commands or click anything or have any other interaction other than voice. How would we technologically accomplish that now? How would a device know whether or not it should be listening unless it's listening for a specific word? Is there a way for the device to perhaps listen to everything that comes in but only start sending information when it hears a command? Is it impossible for it not to capture all of the information that it's hearing and send it somewhere, encrypted or not, and just transmit it somewhere else? It's kind of an interesting question. Samsung also allows not only voice controls, but gesture controls. This may help people who are visually impaired or help people who are unable to use a remote control device. They can wave or make certain gestures. And in so doing, the television is going to capture your face, perhaps, as part of this gesture. Or it may capture certain movements that you're making, or maybe even capture, depending on the quality of the camera built into the television, aspects of the room around you. Is this necessarily problematic? Is this something that we as users of this software need to accept as something that just is part of the deal-- in order to use this feature, we have to do it? Is there a necessary compromise? Is there a way to ensure that Samsung is properly interacting with our data? Should there be a way for us to verify this? Or is that proprietary to Samsung, the way that it handles that data? Again, these are all sorts of questions that we really want to know the answers to. We want to know whether or not what we are saying and doing is secure, is private. And we can read the policies of these organizations that are providing these tools for us to interact with. But is that enough? Do we have a way to verify? Is there anything we can do other than just trust that these companies are doing what they say they're doing, or that services or programmers are providing tools that do exactly what they say they do? Without some really advanced knowledge and skill in tech, the answer is no. And even if you have that advanced skill or knowledge, it's really hard to take a look at a binary-- zeros and ones, the actual executable program that is being run on these devices-- and look at it and say, yeah, I think that that does match the source code that they provided to me, so I can feel reasonably confident that, yeah, I trust this particular piece of software. As we've discussed in the context of security, trust is sort of something we have to deal with. We're constantly torn between this tension of not trusting other people, and so we encrypt everything, but needing to trust people in order for some things to work. It's a very delicate balancing act that we have to contend with every day. And again, I don't mean to pick on Samsung here. This is just one of many different examples that have sort of existed in popular culture. Let's consider another one, for example. Let's consider the Intel Management Engine-- a piece of hardware, firmware, or software, depending on what it is, because one of the open questions is, what exactly is the Intel Management Engine? What we do know about it is that it is usually part of the CPU itself. It's unclear.
It's not been publicly disclosed exactly whether it's built into the CPU or perhaps built into the CMOS or the BIOS, low-level parts of the motherboard itself. But it is a chip, or some software that runs on a computer, whose intended purpose is to help network administrators in the event that something has gone wrong with a computer. So recall that we previously discussed this idea that it's possible to encrypt your hard drive, and that there are also ramifications that can follow if you encrypt your hard drive and forget exactly how to decrypt it. What the Intel Management Engine would allow, as one of its several features, is for a network administrator-- perhaps, if you're in an enterprise setting, your IT professional, your head of IT-- to access your computer remotely by issuing commands, because the computer is able to listen on a specific port. It's like 16,000-something; I don't remember exactly the port number, and it's discussed, as well, in the article provided. But it allows the computer to be listening for a specific kind of request, which should only be coming from an administrator's computer, to be able to remotely access another computer. But the concern is, because it's listening on a specific port, how is it possible to ensure that the requests it's receiving on that port or via that IP address are legitimate? Because Intel has not disclosed the actual code that comprises this module of the IME. And then the question becomes, is that a problem? Should they be required to reveal that code? Some will certainly argue yes, it's really important for us as end users to understand what software is running on our devices; we have a right to know what programs are running on our computers. Others will say, no, we don't have a right to do that. This is Intel's intellectual property. It may contain trade secret information that allows its chips to work better. We don't, for example, argue that Coca-Cola should be required to reveal its secret formula to us because it may implicate certain allergies, or that Kentucky Fried Chicken needs to disclose its secret recipe to us. So why should Intel be required to tell us about the lines of code that comprise this part of its hardware or software or firmware, again depending on exactly what it is, because it's slightly unclear what this tool is? So the question again is, are they required to provide some degree of transparency? Do we have a right to know? Should we just trust that this software is indeed only being used to allow remote access only to authorized individuals? If Intel were to provide a tool to tell us whether our computer was vulnerable to attack from outside computers accessing our own personal computers, outside of the enterprise context, should we trust the result of the software that Intel provided that tells us whether or not it is vulnerable? As it turns out, Intel does provide software to tell you whether or not your IME chip is activated in such a way that yes, you are subject to potential remote access, or no, you are not. Does saying that you are or you aren't reveal potential trade secret-related information about Intel? Should we be concerned that Intel is the one providing us this information versus a third party providing it? Of course, Intel is the only organization that really can tell us whether we're vulnerable or not, because they're the only ones who know what is in this software.
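One modest check an end user can run without Intel's help, though, is simply to see whether a machine is accepting connections on the ports in question. Here's a minimal sketch in Python. The port numbers are the ones commonly reported for Intel's AMT component of the Management Engine, and the host address is a hypothetical machine on your own local network-- treat both as illustrative assumptions, and only probe machines you own.

```python
# Minimal sketch: probe whether a host accepts TCP connections on a port.
# Ports 16992-16995 are those commonly reported for Intel AMT; the host
# address below is a hypothetical machine on a local network.
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for port in (16992, 16993, 16994, 16995):
    status = "open" if port_open("192.168.1.10", port) else "closed"
    print(f"port {port}: {status}")
```

An open port alone doesn't prove anything malicious, of course; it's just the kind of independent verification the questions above are asking for.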
So again, not picking on any individual company here, just drawing from case studies that exist in popular culture and in tech circles about the kinds of questions that we need to start considering and wrestling with. Are they going to be required to disclose this information? Should Samsung be revealing information about what sorts of data it's collecting and how it's collecting it? Do we trust that our compilers, as Ken Thompson alluded to, actually compile our code the way that they say they do? This healthy skepticism is always at the forefront of our mind when we're considering programming- and technology-related questions. But how do we press on these issues further in a legal context? That's still to be determined. And that's going to be something that we're going to be grappling with for quite some time, I think. Another key issue that's likely to be faced by technologists and the lawyers who represent them, particularly startups working in a small environment with limited numbers of programmers that may be relying on material that's been open-sourced online, is this idea of open source software and licensing. Because the scheme that exists out there is quite complicated. There are many, many different licenses that have many, many different provisions associated with them. And each one will have different combinations of some of these things being permitted, some of them not, and potential ramifications of using some of these licenses. We're going to discuss three of the most popularly used licenses, particularly in the context of open source software, generally that which is released on GitHub. And the first of these is GPL version 3, GPL being the GNU General Public License. And one of the things that the GPL often gets criticism for is that it is known as a copyleft license. And copyleft is sort of designed to be the inverse of how copyright protection is usually thought of. Copyright protections give the owner, or the person who owns the copyright-- not necessarily the creator, but the person who owns the copyright-- the ability to restrict certain behaviors associated with that work or that material. The GPL sort of does the opposite. Instead of restricting the rights of others, it compels others who use code that has been licensed under the GPL to avoid imposing any restrictions at all, such that others can also benefit from using and modifying that same source code. The catch with the GPL is that any code that incorporates GPL-licensed code-- so say you incorporate some module written by somebody else, or your client incorporates something that they found on GitHub or found on the internet and wants to include in their own project-- if that code is licensed under the GPL, unfortunately one of the side effects of what your client or you have just done is that you have transformed your entire work into something that is GPL. Which means you are also then required to make the source code available to anybody, make the binary available to anybody, and also to allow anybody to have the same rights of modification and redistribution that you had as well. So think about some of the dangers that might introduce for a company that relies extensively on GPL-licensed code. They may not be able to profit as much from that code as they thought they would. Perhaps they thought they had this amazing disruptive idea that was going to transform the market.
And this particular piece of GPL code that they found online was the final piece of the puzzle that they needed. When they included it in their own source code, they transformed their entire project, according to the terms of the GPL license, into something that was also GPL-licensed. So they could still sell it, but their profitability may be diminished because the source code is freely available for anybody to access. Now, some people find this particularly restrictive. In fact, this is sometimes pejoratively referred to as the GNU virus, the General Public License virus, because it propagates so extensively. As soon as you touch code, or really use code, that is GPL-licensed, suddenly everything that it touches is also GPL-licensed. So, depending on your perspective on open source licensing, it's either a great thing, because it's making more material available, or it's a bad thing, because it's preventing people from using open source material to create further developments when they don't necessarily want to license the changes or modifications that they made. The Lesser General Public License, or LGPL, is basically the same idea, but it only applies to library code. So if code is LGPL-ed, what this basically means is that any modifications that you make to that code also need to be LGPL-ed, or released under the LGPL license. But other, ancillary things that you do in your program that overall incorporates this library code do not need to be LGPL-ed. So it would be possible to license them under other terms, including terms that are not open source at all. So changes that you make to the library need to be propagated down the line, so that other people can benefit from the changes specific to the library that you made. But it does not necessarily reach back into your own code; you don't have to necessarily make that publicly available. So this is considered slightly lesser in terms of its ability to propagate. And also, though, it's considered lesser in terms of its ability to grant rights to others. Then you have, at the other end of the extreme, the MIT license. The MIT license is considered one of the most permissive licenses available. It says: here's the software, do whatever you want with it. You can make changes to it. You don't have to re-license those changes to others. You can take this code and profit from it. You can take this code and re-license it under some other scheme if you want. So this is the other end of the extreme. Is this license copyleft? Well, no, it's not copyleft, because it doesn't require others to adhere to the same licensing terms. Again, you can do with it whatever you would like. Most of the code that is actually found on GitHub is MIT-licensed. So in that sense, using code that you find online is not necessarily problematic for an entrepreneur or a budding developer who wants to profit from some larger program that they write if it incorporates MIT-licensed code, whereas it might be an issue for those who are incorporating GPL-licensed code. What sorts of considerations, then, would go into deciding which license to use? And again, these are just three of many, many licenses that exist that pertain to software development. Then, of course, there are open source licenses that are not tied to software at all.
So for example, a lot of the material that we produce for CS50, the course at Harvard College on which this is based, is licensed under a Creative Commons license, which is similar in spirit to a GPL license, inasmuch as it oftentimes will require people to re-license the changes that they make to that material under Creative Commons as well. It will generally include a non-commercial requirement: it is not possible to profit from any changes that you make, and so on. And that's not a software license; that's more of a general media-related license. So these open source licenses exist in both contexts. But what sorts of considerations might go into choosing a license? Well, again, it really does depend on the organization itself. And so that's why understanding a bit about these licenses certainly comes into play. Do you want your changes to propagate and get out into the market more easily? That might be a reason to use the MIT license, which is a very permissive one. Do you just feel compelled to share code with others, and you want to insist that others share that code as well? Then you might want to use the GPL. Do you potentially want to use open source code but not release your own code freely to others-- the changes that you make to interact with that code? That might be cause for relying on the LGPL for the library code that you import and use, but licensing your own changes and modifications under some other scheme. Again, a very complex and open field that's going to require a lot of research for anyone who's going to be helping clients who are working with software development and deciding what they want to do with that code going forward. So let's turn our attention now from issues that have existed for a while and sort of been bubbling underneath the surface-- issues of trust and issues of software licensing; those have been around a lot longer-- and start to contend with new technologies and how the law keeps up with them. And so you'll hear these referred to as emergent technologies or new technologies. You'll sometimes see them referred to as disruptive technologies, because they are poised to materially affect the way that we interact with technology, particularly in terms of purchasing things through commerce, for example, as in the case of our first topic, 3D printing. So, how does 3D printing work? That's a good question to ask at the outset. It's similar in spirit to a 2D printer. With a 2D printer, you have a print head that spits out ink, typically some sort of toner. It moves left to right across a piece of paper. And the paper's also fed through some sort of feeder. So the left-to-right movement of the toner or ink head is the x-axis movement, and the paper rolling underneath provides the y-axis movement, such that when we're done, we may be able to get access to a piece of paper that has ink scattered across it, left to right, top to bottom. 3D printers work in very much the same way, except their medium, instead of being ink or toner, is typically some sort of filament that has conventionally, at least at the time of this recording, been generally plastic-based. And what basically happens is the plastic is melted to just above its melting point. And then it is deposited onto some surface. And that surface is moved over by a similar print head-- basically a nozzle, or an eyedropper of plastic. And it can move back and forth across a flat surface, similar to what the 2D printer would do.
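To make the axes concrete, here is a toy sketch in Python that traces the kind of back-and-forth sweep just described and prints it as G1 commands-- the linear-move instruction in G-code, the language most printers consume. It's illustrative only: real toolpaths come from slicing software, and the dimensions here are made up.

```python
# Toy sketch of x/y print-head motion: trace back-and-forth lines across
# a square, the way a print head sweeps a page.
def square_layer(size=20.0, step=1.0):
    """Yield (x, y) points for a back-and-forth raster over a square."""
    y, direction = 0.0, 1
    while y <= size:
        xs = (0.0, size) if direction > 0 else (size, 0.0)
        for x in xs:
            yield (x, y)
        y += step
        direction = -direction

for x, y in square_layer():
    print(f"G1 X{x:.1f} Y{y:.1f}")  # G1 = linear move in G-code
```

So far, that's just two axes-- the same movements a paper printer makes.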
But instead of just being flat, the arm can also move up and down. On some models of 3D printers, the table can move up and down, to allow it to print not only on the xy-plane, but also along the z-axis. So it can print in space and create three-dimensional objects-- 3D printing. Typically the material used, again, is melted plastic just above its melting point, so that by the time it's deposited onto the surface, or onto other existing plastic, it's already basically cooled enough that it's hardened again. So the idea is we want to melt it just enough so that by the time it's put onto some other surface, it re-hardens and becomes a rigid material once again. Now, 3D printing is usually considered to be a disruptive technology because it allows people to create items they may not otherwise have access to. And of course, the controversial one that is often spoken about, in terms of we need to ban things, or we need to ban certain 3D printers or certain 3D printing technologies, is guns. Because it's actually possible, using technology that exists right now, to 3D print a plastic gun that would evade any sort of metal detection usually used for detecting guns and that is fully functional. It can fire bullets, plastic bullets or real metal bullets. The article recommended to go with this part of the discussion proposes several different ways that we, or the law, may be able to keep up with 3D printing technologies. Because, again, the law typically lags behind technology, and so is there a way that the law can contend with this? And there are a couple of options that it proposes that I think are worthy of discussion. The first is to allow permissionless innovation. Should we just allow people to do whatever they want with 3D printing technology, and decide ex post facto that this, what you just did, is not OK, the rest of it's fine, and disallow that type of thing going forward? This approach is interesting because it allows people to be creative, and it allows potentially for things to be revealed about 3D printing technology that were not possible to forecast in advance. But is that reactive approach better? Or should we be proactive in trying to prevent the production of certain things that we don't want to be produced? Moreover, while plastic filament tends to be the most popular and common medium for 3D printing right now, 3D printers are being developed that are much more advanced than this. We are not necessarily restricted to plastic-based printing. We may have metal-based printing. And you may have even seen that there are 3D printers that can produce organic materials. They use human cells, basically, to create things like organs. Do we want people to be able to create these things? Is this the kind of thing that should be regulated beforehand, rather than regulated after we've already printed and exchanged copyrighted designs for what to build and construct? Is it too late, by the time we have regulated it, to prevent it from being reproduced in the future? Another thought that this article proposes is immunizing intermediaries. Should we allow people to do whatever they want with 3D printing-- or maybe not allow people to do whatever they want with 3D printing, but regardless, not punish the manufacturers of 3D printers, and not punish the designers of the CAD files, the computer-aided design files, that generally go into 3D printing? Is this a reasonable policy approach? It's not an unheard-of policy approach.
This is the approach that we typically have used with respect to gun manufacturers, for example. Gun manufacturers generally are not subject to prosecution for crimes that are committed using those guns. Should we apply something similar to 3D printers, for example, when the printer is used to manufacture a gun? Who should be punished in that case-- the person who designed the gun model, the person who actually printed the gun, the 3D printer manufacturer itself, any of those people? Again, an unanswered question that the law is going to have to contend with going forward. Another potential solution is to rely on existing common law. But the problem that typically arises there is that there is not a federal common law. And so this would potentially result in 50 different jurisdictions handling the same problem in different ways. Whether this is a good thing or a bad thing is, again, sort of dependent on how quickly these things move. Common law, as we've seen, certainly is capable of adapting to new technologies. Does it do it quickly enough for us? Finally, another example that is proposed is that we could just allow the 3D printing industry to self-regulate. After all, we, as attorneys, self-regulate, and that seems to work just fine. Now, granted, this may be because we are in an adversarial system, and so there are advantages and extra incentives for adversaries to insist that we are adhering to our ethical principles and doing the right thing. There's also the overhanging threat of outside regulation if we do not self-regulate. So adapting this model to 3D printing may work, because it seems to be working well for attorneys. But then you consider that social media companies are also self-regulating with respect to data protection and data privacy, and as we've seen, that's maybe not going so well. So how do we handle the regulation of 3D printing? Does it fall into the self-regulation category that succeeds? Does it fall into the self-regulation category that doesn't succeed? Does it require preemptive regulation to deal with? Now, 3D printing also has some other potential concerns. Very easily, by the nature of the technology itself, it's quite capable of violating copyrights, patents, trademarks, and potentially more, just by virtue of the fact that you can create things that may be copyrighted or patented or trademarked. And there's also prior case law that sort of informs potential consequences for using 3D printers-- the Napster case from several years ago. The Napster technology would allow peer-to-peer sharing of digital music files. That service was deemed to exist entirely for the purpose of violating copyright, and so Napster was basically shut down. Will 3D printers suffer the same fate? Because you could argue that 3D printers are generally used to recreate things that may be patented or may be subject to copyright. Or is it going to fall more into a category like Sony, which many years ago was part of a lawsuit involving VCRs and tape-delaying copyrighted material? Is that going to be more of a precedent for 3D printing, or is the Napster case going to be more of a precedent for 3D printing? Again, we don't really know. It's up to the future practitioners of technology law, who are forced to grapple with the challenges presented by 3D printing, to nudge us in that direction, one way or the other.
To dive a bit more deeply into this topic of 3D printing, I do recommend you take a look at this article, "Guns, Limbs and Toys-- What Future for 3D Printing?" And if you're particularly interested in 3D printing, some of its ramifications, and its technological underpinnings, I do encourage you to also take a look at "The Law and 3D Printing," a law review article from 2015 that is also periodically updated online. It's a wonderful bibliography of all the different issues that 3D printing raises. And it will presumably continue to be updated as cases and laws come into play that interact with 3D printing and start to define this relatively ambiguous space. Another particularly innovative space that really pushes the boundaries of what the law is capable of handling is the idea of augmented reality and virtual reality. And we'll consider them in that order. Let's define what augmented reality is. The most common example of this that you may be familiar with is a phenomenon from several years ago called Pokemon Go. It was a game that you played on your mobile phone. You would hold up your phone, and you would see, as if you were taking a picture, the real world through the lens of the camera. But superimposed onto that would be digital avatars of Pokemon, which is part of this game of collectible creatures that you're trying to walk around and find and capture, basically. So you would try and throw some fake ball at them to capture them. So augmented reality is some sort of graphical overlay on top of the real world. Contrast this with virtual reality, in which one typically wears a headset of some sort. It's usually proprietary. It's not generally available as an app, for example, like the augmented-reality game Pokemon Go was. It's usually tied to a specific brand of headset-- Oculus being one type of headset, for example. And it is an immersive alternate reality, basically. When you put the headset on, you don't see the world around you. You are transported into another space. And to make the experience even more immersive is the potential to wear headphones, for example, so that you are not only immersed in a visual space, but also immersed in a soundscape. Now, something that's particularly strange about these environments is that they are still interactive. It is still possible for multiple people, scattered in different parts of the world, to be involved in the same virtual reality experience, or the same augmented-reality experience. Let's now consider virtual reality experiences, where you are taken away from the real world. What should happen if someone were to commit a crime in a virtual reality space? Studies have shown that immersion in a virtual reality experience can have serious ramifications for people. They can have real feelings that last for a long time based on their experiences there. For example, there's been a study where people put on a virtual reality headset, and they were then immersed in a space where they were standing on a plank. And they were asked to step off the plank. Now, in the real world, this would be just like this room. I can see that everything around me is carpet. There's no giant pit for me to fall into. But when I have this headset on, I'm completely taken away from reality as we see it here. The experience is so convincing for some people that they walk to the edge of the plank, and they freeze in fear. They can't move.
There's a real physical manifestation in the real world of what they feel in this reality. And for those brave people who are able to take the step off the edge, many of them lean forward and try to fall into the space. And some of them may even get the experience, like when you're on a roller coaster, of feeling that tingle in your spine as you're falling. The sense that this actually is happening to you is so real in the virtual reality space that you can feel it. So what would be the case, then, if you are in a virtual reality space, and someone were to pull a virtual gun on you? Is that assault? Assault is a crime where your perception of harm is a material element. It's not actual harm; it's your perception of it. You can perceive, in the real world, when somebody points a gun at you, this fear of imminent bodily harm. Can you feel that same imminent bodily harm in a virtual world? That's not a question that's really been answered. Moreover, who has jurisdiction over a crime that is committed in virtual reality? It's possible that I, here in the United States, might be interacting with someone in France, who is maybe the perpetrator of this virtual assault that I'm describing. Is the crime committed in the United States? Is the crime committed in France? Do we have jurisdiction over the potential perpetrator, even though all I'm experiencing or seeing is that person's avatar as opposed to their real persona? Does anyone have jurisdiction over it? Does the jurisdiction only exist in the virtual world? Virtual reality introduces a lot of really interesting questions that are poised to redefine the way we think about jurisdiction in defining crimes and the prosecutability of crimes in a virtual space. Some other terms to bring up as well, which sort of tangentially relate to virtual and augmented reality so that you're familiar with them, are the very technologically driven real-world crimes of doxing and swatting. Doxing, if unfamiliar, is a crime involving revealing or exposing the personal information of someone on the internet with the intent to harass or embarrass or do some harm to them by having that exposed-- so, for example, revealing somebody's phone number such that it can be called incessantly by other people. As well as swatting, which is a, well, pretty horrible crime, whereby an individual calls the police and says, John Smith is committing a crime at this address, is holding me hostage, or something like that, with the intention that the police would then go to that location-- and a SWAT team would go, hence the term swatting-- and potentially cause serious injury or harm to the ostensibly innocent John Smith, who's just sitting at home doing nothing. These two crimes are generally interrelated, and they oftentimes come up in the technological context, usually as part of the same conversation, when we're thinking about virtual reality crimes. One of the potential upsides, though, if you want to think about it like that, of crimes that are committed in virtual or augmented reality is-- well, there are actually a few. First, because it is happening in a virtual space, and because generally in the virtual space all of our movements are tracked, and the identities of everybody who's entering and leaving that space are tracked by way of IP addresses, it may be easier for investigators to figure out who the perpetrators of those crimes are.
You know exactly the IP address of the person who apparently initiated this threat against you in the virtual space, which may perhaps make it easier to go and find that person in reality and question them about their involvement in this alleged crime. The other fortunately good thing about these crimes-- and this is not to minimize the effect that these crimes can have-- is that usually you can kind of mute them. If somebody is in a virtual space, and they're just screaming constantly, such that you might consider that to be disturbing the peace when you're in a virtual space trying to have some sort of pleasant experience, you usually have the capability of muting them. This is not a benefit that we have in real life. We generally can't stop crimes by just pretending they're not happening. But in a virtual space, we do have that luxury. That's, again, not to minimize some of the very unpleasant and unfortunate things that can happen in virtual reality that are just inappropriate. But being in that space does allow people the option to get away from the crime in a way that the confines of reality may not allow. But again, this is a very challenging area, because the law is not really equipped right now to handle what happens in an alternate reality, which is effectively what virtual reality is. And so, again, if you're considering trying to figure out the best way to prosecute or deal with these issues, you may be at the forefront of trying to define how crimes are dealt with in a virtual space. Or, if working with augmented reality, how do you prosecute crimes where malicious code is put up in front of you to simulate something that might be happening in the real world? You may be, for example, wearing a set of glasses running a GPS program designed to navigate you in one direction versus the other, so you don't have to keep looking at your phone to make sure that you're going the right way. What if somebody maliciously programs that augmented-reality program to route you off a cliff somewhere, right? How do we deal with that? Right now, again, augmented reality and virtual reality are a relatively untested space for lawyers and the law. In the second part of today's lecture, we're going to take a look at some potential regulatory challenges going forward, some issues at the forefront of law and technology generally related to privacy, and how the law is ill-equipped, or hopefully soon to be equipped, to handle the challenges that these issues present. And the first of these is your digital privacy-- in particular, the ability of organizations, companies, and mobile device manufacturers to track your whereabouts, whether that's your digital whereabouts, where you go on the internet, or your physical whereabouts. We'll start with the former, your digital whereabouts. So there's an article we provided on digital tracking technologies. This is designed to be a primer on the different types of things that companies, in particular their marketing teams, may do to track individuals online, with, again, relatively little recourse for the individuals to know what sorts of information are being gathered about them, at least in the US. Now, of course, we're familiar with this idea of a cookie from our discussion of interacting with websites. It's our shorthand way to bypass the login credentials and show sort of a virtual hand stamp saying, yes, I am who I say I am; I've already previously logged into your service.
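To picture that hand stamp mechanically, here is a hedged sketch using Python's standard library. The cookie name and value are made up; the point is just the round trip-- the server stamps once, and the browser presents the stamp on every later visit.

```python
from http.cookies import SimpleCookie

# The server's first response stamps the browser with an identifier.
# "session_id" and "abc123" are hypothetical, not any real site's values.
stamp = SimpleCookie()
stamp["session_id"] = "abc123"
print(stamp.output())                  # Set-Cookie: session_id=abc123

# On every later request, the browser sends the stamp back in a header...
request_headers = {"Cookie": "session_id=abc123"}

# ...and the server reads it to recognize the returning visitor.
incoming = SimpleCookie()
incoming.load(request_headers["Cookie"])
print(incoming["session_id"].value)    # abc123
```

Pair that identifier with a log of what the visitor clicks, and you get exactly the kind of profile described next.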
Cookies are certainly one way that a site can recognize a recurring user coming to the site over and over. Now, this article posits that most consumers have just come to accept that they're being tracked, like that's just part of the deal with the internet. Do you think that using cookies and being tracked is an essential requirement of what it means to use the internet today? And if you do think that, is that the way it should be? And if you don't think that, is that also the way it should be? Or should we be confronting the fact that tracking is happening? Is that an essential part of what it means to use the internet? We also need to be concerned about the types of data that companies are using or collecting about us. Certainly cookies are one way to identify who we are. But it's also possible for a cookie to be associated with what types of data an individual accesses while visiting a particular site. So for example, if I am on Facebook, and I'm using my cookie, and I'm looking up lots of pictures on Facebook-- I'm searching for all my friends' profiles and clicking on all the ones that have cats in them-- that might then give Facebook, or the administrator of that site, the ability to pair that cookie with a particular trend of things that that cookie likes. So in this case, it knows, OK, maybe the person who owns this cookie likes cats. And as such, it may then start to serve up advertisements related to cats to me. And then when I log into a site, it's going to get information about my IP address. And if I use that cookie, it has now mapped my IP address to the fact that I like cats. And then it could sell the information about me-- this particular IP address; I guess it's not necessarily me, because one IP address usually covers a whole house, but it gets you pretty close-- mapping this particular IP address to somebody who likes cats. So they may sell that to some other service. Now, it turns out that IP addresses are generally allocated in geographic blocks, which means that, again, just by virtue of the fact that I log into a particular site, my rough location can be inferred when I visit that site. They may not be able to isolate it exactly, but-- depending on how populated the area you are currently living in is-- they can possibly narrow it down to a city block: someone in this city block really likes cats. And then this data may feed into targeted physical mail advertising, snail-mail advertising, where some company that sells cat products, like a pet store or something, might target that particular block with advertising-- all in the hopes that the data collected about this particular cookie, which then logged in with a particular IP address, which has been zeroed in on a particular geographic location, will pay off. It's kind of feeling a little unsettling, right? Suddenly something that we do online is having a manifestation, again, in the real world, where we're getting targeted advertising not just on sites that we visit, but also in our mailbox at home. It's a little bit discomfiting. Should IP addresses be allocated in this way? Is this the kind of thing that technologically can be changed? The answer to the latter is yes, it is possible to allocate IP addresses in a different way than we typically do. Should we allocate IP addresses in a different way than we typically do?
Is the potential threat of receiving real-life advertisements related to your online activities enough to justify that? What would be enough to justify that kind of change? Then, of course, there's the question of tracking not in the digital world, but in the real world. This is usually done through mobile phone tracking. And so we provide an article from the Electronic Frontier Foundation. And full disclosure, some of the articles we've presented here do have a certain bias to them. The Electronic Frontier Foundation is well known as a privacy rights advocacy group. And so they're going to naturally be disinclined toward things that involve the tracking of data and so on. So just bear that in mind as some additional context when you're considering this article. But it does contain a lot of factual information, and not just pure opinion about things that should be changed, although it does advocate for certain policy changes. Now, why is it that tracking on a mobile device is oftentimes perceived as much worse than tracking on a laptop or desktop? Well, first of all, your mobile device is generally with you at all times. We've reached the point where our phones are generally carried in our pockets and with us wherever we go, which means that it's very easy to use the data collected from the mobile phone-- information that's given out by the phone, whether to cell phone towers or via GPS data and so on-- to pinpoint it to us. The other concern is that mobile phones are very, very quick to become obsolete. Oftentimes, within one or two releases of a new version of a phone-- whether that's a new Android phone or software release, or a new iPhone, and so on-- the version that came out two years ago is generally obsolete, which means it is no longer subject to firmware patches provided by the manufacturer or the developers of the operating systems that run on those phones. That could also mean those phones are much more susceptible to people figuring out how to break into them and using that tracking information against you. Laptops and desktops, by contrast, generally don't move that much. You may carry your laptop to and fro, but generally to just a couple of locations; it's usually set at a desk somewhere in between. Your desktop, of course, doesn't move at all. So the tracking potential there is pretty minimal. And also those devices tend to last quite a long time, and the lifecycle support for servicing and keeping those operating systems up to date is quite a bit longer, versus the mobile phone, where that window is much, much shorter. Now, contrary to most people's assumptions, phones do not actually track your information based on GPS data. The way GPS works is your phone just fires off a signal, and it gets a response back that is used to triangulate where exactly you are in space. But there's no information about what device requested that data or so on. And generally that data is not stored on the phone or in the GPS satellite in any way. It's just sort of an ask-and-answer type of inquiry. The real threat vector for phone tracking, if this is the kind of thing that you're concerned about, is actually cell phone towers, because cell phone towers do track this information. Different companies own different towers. They would like to know who is using each tower, and this may also involve billing-- say I'm using a Verizon phone, and I happen to be connected to an AT&T tower.
AT&T may wish to know that this tower is mostly being used by Verizon customers. And the only way they really know that is by mapping the individual device to the phone number, then checking that against Verizon's records. And so they are collecting all this information about every phone that connects to their tower, so they could potentially bill Verizon for the portion of their customers who were using their infrastructure. So these towers do track information. And towers also can be used to triangulate your location. Say I'm standing in the middle of an open field, and there's a tower over there and a tower maybe just beside me. My phone is emitting a signal constantly, radiating in all directions. If one signal is received by the far-away tower fairly weakly, and the tower right next to me is picking up the signal very strongly, then, extrapolating from those two points, you can place me in space: I'm most likely here. So even without having GPS turned on, just by trying to make a phone call or use a 2G, 3G, or 4G network, it's pretty easy to figure out where you are in space. And this is potentially a concern. This concern comes up sometimes in the context of whether the companies who provide operating systems or firmware for phones are at the behest of government agencies, who may request backdoors into the devices so that they can then spy on individuals. And certainly this might be something that comes up in a FISA court or the like, where they're trying to get phone records. And there's always this sort of unknown. Is it happening to all of our devices all the time? Is it happening right now to the phone in my pocket? Is the sound being captured in such a way that it can be transmitted, just because there happens to be a backdoor in the operating system or a backdoor in the firmware that allows anybody to listen, even if they're not supposed to be listening? It's really hard to pretend to be somebody that you're not with a phone. As you saw, it's pretty easy to pretend to be somebody that you're not with a computer: you can use a service like a VPN, which presents a different IP address. You connect to the VPN, and as long as you trust the VPN, the VPN ostensibly protects your identity. With mobile phones, every device has a unique ID. And it's really hard to change that ID. So one way around this is to use what are called burner phones, devices that are used once, twice, and then thrown away. Now, this again comes down to, how concerned are you about your privacy? How concerned should you be about your privacy? Are you concerned enough that you're willing to purchase these one-time, two-time use devices, which you then throw away, and constantly do that? And moreover, it's actually kind of interesting to know that burner phones are not shown to do much to protect one's identity or privacy, because it tends to be the case that we call the same people, even if we're using different phones. And so a pattern emerges by virtue of the fact that this number seems to be calling this number and this number all the time-- maybe it's my work line and my family's home number.
If I'm always calling those two numbers, even if my phone number changes, a pattern can still be established with the device IDs of all of the other phones-- maybe my regular phone plus all the burners that I've had-- where you can still craft a picture of who I am, even though I'm using different devices, based on the call patterns that I'm making. As usual, humans are the vulnerability here. Humans are going to call the same people and talk to the same people on their phones all the time. And so it's relatively easy for mobile devices to track our locations. Again, every device has a unique ID. You can't hide that ID. That ID is part of something that gets transmitted to cell towers. And potentially the threat exists that somebody is able to break into that phone, whether because of old, outdated firmware that's not been updated, or because of the potential that there is some sort of backdoor that would allow an agent, authorized or not, to access it. Again, this vulnerability exists. How does the law deal with questions like, do you own the information that is being tracked? Do you want that information to be available to other people? It's an open question. Another issue at the forefront of where we're going, especially when it comes to legal technology and law firms themselves availing themselves of technology, is artificial intelligence and machine learning. Both of these techniques are potentially incredibly useful to law firms that are trying to process large amounts of data relatively quickly-- the type of work that's generally been outsourced to contract attorneys or first-year associates or the like. First of all, we need to define what it means when we talk about artificial intelligence. Generally, when we think about that, it means something like pattern recognition. Can we teach a computer to recognize specific patterns? In the case of a law firm, for example, that might be: can it realize that something looks like a clause in a contract, whether a valid clause that we might want to see or a clause that we're hoping not to see in our contracts? We might want to flag that for further human review. Can the machine make a decision about something? Should it, in fact, flag that for review? Or is it just highlighting things that might be alarming or not? Can it mimic the operations of the human mind? If we can teach a computer to do those things-- and we've already seen that we can teach a computer to teach itself how to reproduce bugs; we saw that in Ken Thompson's compiler example-- if we can teach a computer to mimic the types of things that we would do as humans, that's when we've created an artificial intelligence. There are a lot of potential uses for artificial intelligence in the legal profession-- like I said, document review being one potential avenue. And there are a few different ways that artificial intelligences can learn-- two prevailing major ways, actually. The first is for humans to supply some sort of data and also supply the rules that map the data to some outcome. That's one way. The other way is something called neuroevolution, which is generally best exemplified by way of a genetic algorithm. In a moment, we'll take a look at a genetic algorithm literally written in Python, where a machine learns over time to try and generate the right result. In this model, we give the computer a target, something that it should try and achieve, and request that it generate data until it can match the target that we are looking for.
So by way of example, let's see if we can teach a computer to write Shakespeare. After all, the old theory goes that, given an infinite amount of time, enough monkeys at typewriters could write Shakespeare. Can we teach a computer to do the same? Let's have a look. It might be a big ask to get a computer to write all of Shakespeare, so let's see if we can get this computer to eventually arrive at the following line, the target, so to speak: "a rose by any other name." We want the computer to reach this phrase on its own, using some sort of algorithm. The algorithm we're going to use is called a genetic algorithm, so named because it's based on the theory of genetics: good traits propagate down through the generations and become part of the set of traits we usually encounter, while bad traits, things we don't necessarily want, get weeded out of the population. Over successive generations, hopefully only the good traits prevail. And just as in biological genetics, we need to account for mutation. We need to allow things to change randomly; otherwise we could end up with a population that's missing a trait it needs, with no other way to ever introduce it. So we do have to mutate some of our strings from time to time. How are we going to teach the computer to do this? We're not providing it with any data set to start with; the computer is going to generate its own data, trying to get at this target. The way we're going to do this is to create a bunch of DNA objects. DNA objects, in this example, are just strings, and each string starts out, as exemplified in this code, as a random set of characters. The target string is 24 characters long, so the computer randomly picks 24 characters: uppercase letters, lowercase letters, numbers, punctuation marks, any legitimate ASCII character, and adds that string to the list of potential candidates for the correct phrase. In effect, randomly slam on your keyboard and hit 24 keys; the computer starts with about 1,000 of those. Every one of those strings, every one of those DNA objects, also has the ability to determine how fit it is, fitness meaning: how likely is it to go on to the next generation? Does it have characteristics we might want to propagate down the line? The rudimentary way we assess the fitness of a string, how close it is to the target, is to go over every single character and ask: does this match what we expect in this spot? So if the candidate starts with an "a", as "a rose by any other name" does, that's one point of fitness. If the next character is a space, that's another point of fitness. A perfect string will have all of its characters in the correct positions, but as long as it has even one character in the correct position, it has some fitness. So we iterate over all of the characters in the string to score it. Now, much like successive biological generations, we need the ability to create new strings from the population we had before. This is the idea of crossover: we take two strings and, again, somewhat arbitrarily decide how to mash them together.
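To make that concrete before we get to crossover, here's a minimal sketch in Python of what such a DNA object might look like. It's modeled on the description above rather than on the actual course files, so names like DNA, GENES, and fitness are illustrative assumptions:

```python
import random
import string

TARGET = "a rose by any other name"

# Any legitimate ASCII character can serve as a "gene": letters,
# digits, punctuation marks, and the space character.
GENES = string.ascii_letters + string.digits + string.punctuation + " "


class DNA:
    """One candidate string in the population."""

    def __init__(self, genes=None):
        # Start as 24 random keystrokes, the same length as the target.
        self.genes = genes if genes is not None else [
            random.choice(GENES) for _ in range(len(TARGET))
        ]

    def fitness(self):
        # Fraction of characters already in the correct position:
        # 1/24 for a single match, 1.0 for a perfect string.
        matches = sum(1 for g, t in zip(self.genes, TARGET) if g == t)
        return matches / len(TARGET)
```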
For crossover, we're going to say the first half comes from the mother string, and the second half comes from the father string. That produces a child, which may have some positive characteristics from the mother and some positive characteristics from the father, and which may then bring us a little bit closer to the perfect string. Again, the idea is for the computer to evolve its way to the correct string rather than us just handing it a set of data and saying: do this. We want to let it figure things out on its own. That's the idea of the genetic algorithm. So we arbitrarily split the string in half. Half the characters, or genes, of the string come from the mother; the other half come from the father; and slammed together, they form the new DNA sequence of the child. Then, to account for mutation, some small percentage of the time, in this case less than 1% of the time, we'd like one of those characters to change randomly. It doesn't come from the mother or the father string; it just randomly changes into something else, in the hope that maybe the mutation will prove beneficial somewhere down the line. Now, in the other Python file, script.py, we actually take those randomly created strings, the DNA objects from the previous file, and start to evolve them over time. We start out with 1,000 of these random strings, and the best score so far, the closest match to "a rose by any other name," is currently zero; no string is there yet. We might randomly get it in the first generation, which would be a wonderful success, but it's pretty unlikely. The population here is just an array that stores all 1,000 of these strings. Then, as long as we have not yet found the perfect string, the one with 100% fitness, a score of exactly 1, we do the following: calculate the fitness score for every one of those 1,000 random strings. If what we just found is better than anything we've seen before (and since we start at zero, anything that matches even one character is better at the beginning), print out that string. This gives us a sense of progression: over time, we'll see the strings get better and better and better. Then we create what's called a mating pool. Again, this is the idea of two strings crossing over, breeding to try to create a better string in the next generation. Depending on how good a string is, we may want it to appear in the mating pool more often. If a string is a 20% match, that's pretty good, especially in an early generation, so we may want that string to appear in the mating pool 20% of the time; it has a better likelihood of being close to the right answer than a string that matches only 5% of the characters. A string that barely matches anything, sure, it should be in the pool, since maybe it has the one character we're looking for, but we only want it there 5% of the time, versus a string that matches 50% of the characters, which we probably want in the pool 50% of the time.
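Continuing the sketch above, the crossover and mutation steps just described might look like this; again, these mirror the behavior in the walkthrough rather than the course's actual files:

```python
MUTATION_RATE = 0.01  # "less than 1% of the time"


def crossover(mother, father):
    # The first half of the child's genes come from the mother string,
    # the second half from the father string.
    midpoint = len(mother.genes) // 2
    return DNA(mother.genes[:midpoint] + father.genes[midpoint:])


def mutate(child, rate=MUTATION_RATE):
    # Occasionally replace a gene with a random character, so a trait
    # missing from the entire population can still appear.
    child.genes = [
        random.choice(GENES) if random.random() < rate else g
        for g in child.genes
    ]
```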
The idea, again, is to take the best representatives of each generation and have the computer learn that those are good, building better and better strings from better and better representatives of the population, ever closer to the target string we're looking for, "a rose by any other name." Then all we do is pick two random items from that pool of the best possible candidates, mate them together, and continue this process, hopefully producing better and better approximations of the string. Each pairing creates a crossover; that crossover child DNA string may then mutate slightly into some new string; and we add it to the population to be considered in the next round. We just keep going over and over, generating hopefully better and better strings. So that's how these two files interact. The first file we looked at defines the properties of a string and, basically, how it can score itself; the process in script.py drives the evolution. These two files are based on a Medium post, which we've cited in the course materials, as well as on an exam question we've previously asked in the college version of CS50, for students to implement and solve on their own. Taken together, they go through the process of creating generation after generation. So let's see this in action. Let's see how, in each successive generation, the strings get closer and closer and closer to the target string. Again, we never gave the computer a set of starting data to work with, only an end goal; the computer needs to learn how to get closer and closer to the right string. So let's run our program and see whether we've actually taught the computer to genetically evolve its way to the target. We run script.py, the Python file where we described the process, and watch the generations evolve over time. We get started, and we have some pretty quick results. This first string here has a matching score of 0.042, or 4.2%, which is exactly one character: 1 out of 24 is about 0.042. If we scroll through and try to line it up against "a rose by any other name," I don't know exactly which character it is here, but this is basically saying that one of these characters matches; it's 4.2% of what we're hoping for. That means that in the next pool, the next iteration, this string will be included 4.2% of the time. And there may be other strings that also match. Remember, we only print when we find a better string. So this one is only going to be included 4.2% of the time, but there will be plenty of other strings that are also 4.2% matches, each of them probably matching one different character. Those will comprise part of the pool. Then we cross-pollinate: we take each of those strings that had a one-character match and mash them together. Now, if the first string we're considering has its matching character in the first half, and the second string has its matching character in the second half, we've created a new string that has two matches, right? One of them, in the first half, came from the mother string; the other, in the second half, came from the father string.
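Stepping back, that whole cycle, score the population, build a weighted mating pool, cross over random pairs, occasionally mutate, repeat, is the main loop of script.py. Here's a minimal sketch of that loop, continuing the code above; the population size of 1,000 matches the walkthrough, but the progress-printing details are assumptions, not the course's actual file:

```python
POPULATION_SIZE = 1000


def evolve():
    population = [DNA() for _ in range(POPULATION_SIZE)]
    best = 0.0  # best score so far; no string matches anything yet

    while best < 1.0:
        # Calculate the fitness of every string in this generation,
        # printing whenever we find something better than before.
        scores = [d.fitness() for d in population]
        top = max(scores)
        if top > best:
            best = top
            print("".join(population[scores.index(top)].genes),
                  f"(score: {best:.3f})")
        if best == 1.0:
            break  # perfect match found

        # Build the mating pool: a 20% match appears roughly 20 times
        # per hundred slots, a 5% match only 5 times.
        pool = []
        for dna, score in zip(population, scores):
            pool.extend([dna] * int(score * 100))
        if not pool:  # a first generation could conceivably match nothing
            pool = population

        # Breed the next generation: pick two random parents from the
        # pool, cross them over, and occasionally mutate the child.
        population = []
        for _ in range(POPULATION_SIZE):
            child = crossover(random.choice(pool), random.choice(pool))
            mutate(child)
            population.append(child)


evolve()
```

Run, this prints a progression of ever-better strings, which is exactly the output we're walking through here.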
And so, unless one of those matching characters happens to get mutated out, which is a possibility (we might actually take a good character and turn it into a bad one), the combined string should be twice as good: 2 out of 24, about 8.3%. And that's exactly what it is. This next string has two matches, and the next ones have three and four. As we scroll down, we see some patterns, like "A?QY" here. That's obviously not part of the correct answer, but it suggests that there's a parent in here containing this substring that tends to have really good fitness; that string probably has many other characters, outside of this box, that do match. And so that parent propagates down the line for a while, until eventually those characteristics, around the ninth generation or so, get wiped out. As we can see over time, what starts out as a jumbled mess gets closer and closer to something that, even at 58%, is starting to look pretty close to "a rose by any other name." And as we go on, the likelihood gets better and better, so that by the time we're at this line here, this string is going to appear in 87.5% of the next generation's mating pool. A lot of the characteristics of this string, close but not exactly right, will keep appearing, which makes it more and more likely that it will eventually pair up with another string that is a little bit better. And as you probably saw, towards the end, this process got slower, right? If all of the strings are that good, it can just take a while to find a combination where the match is better than the parents. It may be that we're creating combinations that are worse again, and we want to filter those back out. So it takes a while to find exactly what we're looking for. But from that random string at the very beginning, over time, the computer learns which parts are good. Here's "rose," for example, correct as part of the string; it actually got rooted out in the next generation, mutated out by accident, but mathematically what replaced it was a little bit better: there are more correct characters in the newer string than in the older one, even if the older one had some recognizable patterns. The computer has learned, evolved over time, what it means to match that particular string. This is the idea of neuroevolution: teaching a computer to recognize patterns without necessarily telling it what those patterns are, just what the target should be.

So that genetic algorithm is kind of a fun programming exercise, but the principles that underpin it apply in a legal context too. If you can teach a computer to recognize certain patterns in a contract, you can potentially teach a computer to write contracts that match those patterns, or to recognize those patterns and make decisions based on them. We were using neuroevolution to build or construct something, but you can also use it to isolate the sets of words or phrases that you're hoping to see in a contract, or that you might want to flag for additional review. So again, the types of legal work this can help automate are things like collation, analysis, large-scale document review, and predicting the potential outcome of litigation by having it review case precedents and outcomes and look for trends: cases X, Y, and Z all had this outcome.
Is there some other common thread in cases X, Y, and Z that might also apply to the case we're about to try? Or do we perhaps need to settle, because we can see the outcome is going to be unfavorable to us? But does this digital lawyering make you uncomfortable? Is it OK for legal decisions to be made by a computer? Is it more OK if those decisions are made because we've trained the computer with our own human instincts? There are services out there already. There's a famous example of a parking ticket clearing service called DoNotPay from several years ago, where a 19- or 20-year-old programmer basically taught a computer how to argue parking tickets on people's behalf, so that they wouldn't have to hire attorneys to do so. He wasn't a trained attorney himself. He talked to people and recognized some of the common threads among those who successfully challenge parking tickets versus those who don't, taught a computer to mimic those patterns, and had the computer send out notices and the like to defend parking ticket holders. I believe the result was several hundred thousand dollars in potential legal fees saved and several hundred thousand parking tickets successfully challenged, with the cases dropped and no payment required. So is it OK for computers to be making these decisions if humans teach them? Is it only OK if the humans teaching them have legal training at the outset? Or can we trust programmers to write these kinds of programs for us as well? Does lawyering rely on gut instinct? I'm sure that sometimes, in cases you've experienced in your own practice, the decision you make runs contrary to what you might think is the right thing to do, because you just feel that this other approach is going to work better in this case. And I'm sure that for many of you, that has paid off. Doing something in contravention of the accepted norm is something you may not be able to train a computer to do. You may not be able to train gut instinct, to challenge the rules, when this whole idea of neuroevolution and machine learning and AI is designed to have computers learn and enforce rules. Will the use of AI affect attorneys' bottom line? Hypothetically, it should make legal work cheaper, but that would then potentially reduce firm profits by taking attorneys, humans, out of the review process. This is in some ways a good thing: it makes things more affordable for our clients. It is in some ways a bad thing: we have entrenched expenses to pay that assume certain monies coming in from the hourly rates of our associates and partners. Does this change that up? And if it does, is that problematic? Is it better for us to provide the most competent representation that we can, even if that competent representation actually comes from a computer? Remember that as attorneys, we have an ethical obligation to stay on top of and understand technology. Sometimes using and working with that technology may force us to do something we might not want to do, because it doesn't feel like the right thing from a business perspective; nevertheless, our ethical obligations may compel us to do it.
So we've seen some of the good things that machine learning can do, but there are certainly also some bad things. There's an article that we provided about machine bias and a computer program that is ostensibly supposed to be used by prosecutors and judges when they are considering releasing somebody on bail or setting the conditions for parole: whether or not the person is likely to commit future crimes, what their likely recidivism rate is, what kind of additional support they might need upon release. But it turns out that the data we feed into these algorithms is provided by humans, and unfortunately these programs that are supposed to help judges make better decisions have a racial bias in them. The questions asked in figuring out whether a person is likely to commit a future crime never outright ask, what is your race, and base a score on that. But they ask other questions that are hints or indicators of what someone's race might be: questions about socioeconomic status, languages spoken, whether or not parents have ever been imprisoned, and so on. These programs stereotype people, in ways we might not deem to be OK at all, in order to make decisions. And those stereotypes are created by humans. So we're actually teaching the computer bias. We're supplying the data; we, as humans, are providing it; we're imparting our bias into the program; and the program is really just implementing exactly what we tell it to do. Computers, yes, are intelligent, and we can teach them to learn things on their own, but at the end of the day, that knowledge comes from us. We are either telling them to hit some target, or providing data and telling them these are the rules to match. So computers are only as intelligent as the humans who create and program them, and unfortunately that means they're also as affected by bias as the humans who create and program them. These particular programs have been found to be accurate only 20% of the time in predicting future violent crime, and only about 60% of the time in predicting any sort of future crime, misdemeanors and so on: little better than a 50/50 shot, based on the predictive questions asked of people during the intake process. Proponents of these scoring metrics say that they provide useful data. Opponents say that the data is being misused: it's being used as part of sentencing determinations rather than for its ostensible purpose, which is to set conditions for bail and release, whatever parole conditions might come into play. These calculations are also done by companies that are generally for-profit entities; they sell these programs to states and localities, typically for a fixed rate per year. Does that mean there's a financial incentive to make certain decisions? Would you feel differently about these programs if they were free rather than paid? And should computers be involved in making these decisions that humans would otherwise make anyway? Given the same questionnaire, would a human being reach the same conclusion? Ideally, yes: the program should be mimicking the human decision-making process.
Is it somehow less slimy feeling, for lack of a better phrase, if a human being, a judge or a court clerk, is making these determinations rather than a computer? Now, granted, the judge is still making the final call. But the computer is printing out likely recidivism scores and all of this data about somebody, which is surely going to influence the judge's decision, and in some localities perhaps over-influence it, taking the human element out of the equation entirely. Does it feel better if the computer is out of that equation entirely? Or is it better to have a computer make these decisions, potentially preventing mistakes, drawing attention to things that might otherwise be missed, or minimizing things that might otherwise have too much attention drawn to them? Again, a difficult question to answer: how much do we want technology to be involved in the legal decision-making process? But as we go forward, it's almost certainly true that more and more decisions in a legal context are going to be made by computers at the outset, with humans falling into the verification category rather than the active decision-maker category. Is this good? Is this bad? It's the future.

For entities based in the United States, or whose customers are solely in the United States, this next area may not be a concern right now, but it may well become one in the future. And that is what to do about the GDPR, the General Data Protection Regulation, which was promulgated by the European Union and came into effect in May of 2018. It basically establishes the right of people to know what kind of data is being collected about them. This is not a right that currently exists in the United States, and it will be really interesting to see whether the EU's experiment in revealing this kind of data, which has never been available to individuals before, becomes something that exists in the United States and that we have to deal with. If you're based in the United States and you do have customers in Europe, you may be subject to the GDPR. For example, we at CS50 have students who take the class through edX, or HarvardX, the online MOOC platform. When the GDPR took effect in May of 2018, we spoke with Harvard and figured out how we might need to interact with European users of our platform, despite being based in the United States, and what data implications that might have. Some of that may have been out of an abundance of caution, making sure we're on the right side of the line even if we're not necessarily subject to the GDPR, but it is certainly an area of evolving concern for international companies. The GDPR allows individuals to get their personal data. That means data that either does identify an individual, something like what we discussed earlier in terms of cookies and tracking, the things you search for being tied to your IP address, which might in turn be tied to your actual address and so on, or data that could identify an individual but doesn't necessarily do so just yet. The regulation imposes requirements on the controller, the party providing a service or holding all of that data, and lays out what the controller's responsibilities are for processing that data and what it has to reveal to users who request it.
So for example, on request by a user of a service, when that user and the controller are subject to the GDPR, the controller must identify themselves, who they are and the best way to contact them; tell the user what data they hold about them; and explain how that data is being processed and why, that is, what sorts of things they are trying to do with it. Are they trying to make longitudinal connections between different people? Are they collecting it to sell to marketers, and so on? They must also say whether the data will be passed to a third party, whether that means selling the data or using a third-party service to help interpret it. In the case of Samsung, for example, Samsung may be collecting your voice data but sharing all of it with a third party whose entire focus is processing that data, trying to improve voice commands by collecting the voices of hundreds of thousands of different people so they can better translate a particular sound they hear into a command. These same restrictions apply whether the data was collected or provided by the user, or merely inferred about the user; the controller would also need to reveal information gleaned about somebody that was never handed over directly by the person providing that personal data. The individual can also compel the controller to correct data about them that is inaccurate, once they get this report about what data is held on them, which brings up a really interesting question: what if something is accurate, but you, the person the data describes, don't like it? Can you challenge it as inaccurate? This, again, is something that has not been answered yet but is very likely to be answered at some point by somebody. What does it mean for data to be inaccurate? Moreover, is it a good thing to delete data about somebody? There are exceptions in the GDPR for preserving data, for not allowing it to be deleted, if it serves the public interest. The argument sometimes made in favor of the GDPR's approach is that someone who commits a minor crime, for example, might be haunted by that one mark on their record for years and years; they can never shake it. It was a minor crime; there was no recidivism; it wasn't violent in any way; and it has impacted their life: they can't get the kind of job they want, they can't get the kind of apartment they want. Shouldn't they be able to eliminate that data? Some would argue yes: the individual has already paid the price, and society is no longer harmed by this past event, so sure, delete it. Others would argue no: it's part of history, and we don't have a policy of erasing history; even though it's annoying to that individual, or has had a non-trivial impact on their life, we can't just get rid of data we don't like. And what about data that might be deemed inaccurate only in someone's own view? If a company gathers a lot of information about me because I'm doing a lot of online shopping, and their processed data says I'm a compulsive spender, can I challenge that as inaccurate because I don't think I'm a compulsive spender? I feel like I earn enough money and can spend it how I want, and yet the label negatively impacts my life.
But they think, well, you've spent $20,000 on pictures of cats; maybe you are kind of a compulsive spender, and that's something we've gleaned from this data, and it's part of your record. Can I challenge that? Open question. For those of you who may be contending with the GDPR in your future practice, we've excerpted the parts of it that are particularly relevant to the technological implications we've just discussed as part of the recommended reading for this module.

The last subject that we'd like to consider in this course is a political hot potato in the United States right now, and that is the idea of net neutrality. Before we get into the back and forth of it, it's important for us to properly define exactly what net neutrality is. At its fundamental core, the idea is that all traffic on the internet should be treated equally; we shouldn't prioritize some packets over others. So whether your service is Google, Facebook, Netflix, some huge data provider, or you're some mom-and-pop shop in Kansas with a few customers but a website and a web presence, the web traffic from either one, the small shop or the big data provider, should be treated equally. One should not be prioritized over the other. That's the basic idea underpinning the phrase: when you hear net neutrality, it means all traffic on the internet should be treated equally. The hot potato, of course, is whether that's the right thing to do. Let's try to visualize net neutrality in a way that shows how both sides might perceive it. It may help to think of it in terms of a road: much as a road has cars flowing over it, the internet has information flowing over it. So suppose we have a road, and then we build a second road parallel to the first, going to the same place, but better maintained, with a toll to use it. Proponents of net neutrality will say, hey, wait, this is unfair. Most traffic has to use the main road we've been using all along, but the people who can afford the new road, where traffic moves faster because of the toll, get their traffic prioritized; their packets are going to get there faster. That's not fundamentally fair, and it's not the way the internet was designed, where the free flow of information is the priority and every packet is treated equally. Opponents of net neutrality, people who feel you should be able to have traffic go faster on some roads than others, will say, no, no, no, this is the free market talking. The free market says: if I really want to make sure my service gets to people faster, I should have the right to do that. After all, that's how the market works for just about everything else; why should the internet be any different? And that's really the basic question. Should everybody use the same road, or should people who can afford a different road be permitted to use it? Proponents of net neutrality say no; opponents say yes, that's the way the free market works. From a technical perspective, how would we implement this? It's relatively easy if the service we want to prioritize has paid for premium treatment: their IP addresses are associated with their business.
And so the internet service provider, who owns the infrastructure on which the internet operates, who literally owns the fiber-optic cables along which the data travels, can just say: any data going to this IP address, we'll prioritize over other traffic. And there might be real reasons to want to prioritize some traffic. For example, if you're sending an email to somebody or trying to access a website, there's a lot of redundancy built in. We've talked about TCP, the Transmission Control Protocol, and how it has redundancy built in: if a packet is dropped, perhaps because everybody is flowing along that same road and there's so much congestion that the packet gets lost, TCP will re-send that packet. So for low-impact services, like accessing some company's website or sending an email, there's no real worry here. But now imagine a service like an international business video call over Skype or Google Hangouts, or streaming a movie on Netflix or some other internet video provider. Generally, those packets are not sent using TCP. They usually use a different protocol called UDP, whose purpose in life is just to get information there as quickly as possible, with no redundancy: if a packet gets dropped, so be it. Now imagine you're having an international business call with Asia, for example. There are a lot of packets moving, and between the United States and Asia they have to travel along a trans-Pacific cable that carries a great deal of other traffic. Wouldn't it be nice, advocates against net neutrality would say, if the company providing that service were able to pay to ensure that its packets had priority, reducing the likelihood of those packets being dropped, improving the quality of the video call, and generally providing, theoretically again, a better service for the people who use it? It might be the case that some services just need prioritization, and the internet is designed in such a way that we can't guarantee or give them that prioritization. Isn't that a reason in favor of repealing net neutrality, making it so that providers could pay to prioritize services that lack redundancy and just need their packets to get there quickly, and reliably, ahead of other traffic?
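For the technically curious, that TCP-versus-UDP distinction is visible even at the level of Python's socket module. A minimal sketch, where the addresses and ports are placeholders, and with the caveat that real video services layer additional protocols on top of UDP:

```python
import socket

# TCP (SOCK_STREAM): connection-oriented, with redundancy built in.
# The operating system retransmits any dropped packet, so every byte
# eventually arrives, in order -- fine for email or fetching a web page.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.connect(("example.com", 80))  # hypothetical low-impact web request
tcp.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
page = tcp.recv(4096)  # the stream is reliable; lost pieces were re-sent
tcp.close()

# UDP (SOCK_DGRAM): connectionless, fire-and-forget. If a datagram is
# dropped, nobody re-sends it; a stale video frame would be useless by
# the time a retransmission arrived anyway.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"one frame of video", ("203.0.113.5", 5004))  # hypothetical receiver
udp.close()
```

The trade-off in this sketch is the whole debate in miniature: TCP pays for reliability with retransmission delays, while UDP trades reliability for speed.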
In 2015, under the Obama administration, the Federal Communications Commission, then Democratically controlled, voted in favor of net neutrality, reclassifying the internet as a Title II communications service, meaning it could be much more tightly regulated by the FCC, and imposing this net neutrality requirement. Two years later, when the Trump administration came into office, President Trump appointed Ajit Pai, the current chairman of the FCC, who basically said he was going to repeal the net neutrality rules that had been set in place under the Obama administration. And he did; the repeal took effect in the summer of 2018. So we're now back in a kind of wild-west situation, where net neutrality is on the books in some places: there are even states with laws designed to enforce this idea, this theory of net neutrality, that now run into conflict with federal law. So there's this question of who wins out. Has Congress claimed this domain? Can states set rules different from those of Congress and of the regulators that Congress has appointed or delegated to make these decisions? It is probably one of the most hot-button issues in technology and the law right now. What is going to happen with respect to net neutrality? Is it a good thing? Is it a bad thing? Is it the right thing to do for the internet? To learn a bit more, we've supplied as an additional reading a con take on net neutrality. Generally you'd see pro takes on this in tech blogs, but we've deliberately included a piece on why net neutrality should not be the norm, which we really do encourage you to take a look at and consider as you dive into this topic.

But those are just some of the challenges that lie at the intersection of law and technology. We've certainly barely skimmed the surface. And my hope is that I've created far more questions than answers, because those are the kinds of questions that you are going to have to answer for us. Ultimately it is you, as practitioners, who will go out and face these challenges and figure out how we're going to deal with data breaches, how we're going to deal with AI in the law, how we're going to deal with net neutrality, how we're going to deal with issues of software and trust. Those are the questions for the future that lie at this intersection. And the future is in your hands. So help lead us in the right direction.