[MUSIC PLAYING] JEN GENNAI: I'm an operations manager, so my role is to ensure that we're making our considerations around ethical AI deliberate, actionable, and scalable across the whole organization in Google. So one of the first things to think about if you're a business leader or a developer is ensuring that people understand what you stand for. What does ethics mean to you? For us, that meant setting values-driven principles as a company. These values-driven principles, for us, are known as our AI Principles, and we announced them last June. So these are seven guidelines around AI development and deployment, which set out how we want to develop AI. We want to ensure that we're not creating or reinforcing bias. We want to make sure that we're building technology that's accountable to people. And we have five others here that you can read; they're available on our website. But at the same time that we announced these aspirational principles for the company, we also identified four areas that we consider our red lines. These are technologies that we will not pursue. They cover things like weapons technology: we will not build or deploy weapons. We will also not build or deploy technologies that we feel violate international human rights. So if you're a business leader or a developer, we'd also encourage you to understand what your aspirational goals are, but at the same time, what are your guardrails? What lines are you not going to cross? The most important thing to do is to know what your definition of ethical AI development is. After you've set your AI principles, the next thing is, how do you make them real? How do you make sure that you're aligning with those principles? Here, there are three main things I'd suggest keeping in mind. The first one is you need an accountable and authoritative body. For us in Google, this means that we have senior executives across the whole company who have the authority to approve or decline a launch. They have to wrestle with some of these very complex ethical questions to ensure that we are launching things that we believe will lead to fair and ethical outcomes. So they provide the authority and the accountability to make some really tough decisions. Secondly, you have to make sure that the decision-makers have the right information. This involves talking to diverse people within the company, but also listening to your external users and external stakeholders and feeding that into your decision-making criteria. Jamila will talk more about engaging with external communities in a moment. And then the third key part of building governance and accountability is having operations. Who's going to do the work? What are the structures and frameworks that are repeatable, that are transparent, and that are understood by the people who are making these decisions? For that, in Google, we've established a central team that's not based in our engineering and product teams, to ensure that there's a level of objectivity here. So the same people who are building the products are not the only people who are looking to make sure that those products are fair and ethical. So now you have your principles, which ensure that people understand what ethics means for you.
We've talked about establishing a governance structure to make sure that you're achieving those goals. The next thing to do is to ensure that everyone within your company, and the people that you work with and for, are aligned on those goals. So make sure, one, that you've set overall goals in alignment with ethical AI: how are you going to achieve ethical development and deployment of technology? Next, you want to make sure that you're training people to think about these issues from the start. You don't want to catch some ethical consideration late in the product development lifecycle; you want to start as early as possible, so get people trained to think about these types of issues. Then we have rewards. If you're holding people accountable for ethical development and deployment, you may have to accept that it might slow down some development in order to get to the right outcomes, so make sure people feel rewarded for thinking about ethical development and deployment. And then, finally, make sure that you're hiring and developing people who are helping you achieve those goals. Next, you've established your frameworks, you've hired the right people, you're rewarding them. How do you know you're achieving your goals? We think about this as validating and testing. An example here is replicating a user's experience. Who are your users? How do you make sure that you're thinking about a representative sample of your users? So you think about trying to test different experiences, mostly from your core subgroups. But you also want to be thinking about, who are your marginalized users? Who might be underrepresented in your workforce, and therefore might need additional attention to get it right? We also think about, what are the failure modes? And what we mean by that is, if people have been negatively affected by a product in the past, we want to make sure they won't be negatively affected in the future. So how do we learn from that and make sure that we're testing deliberately for it going forward? And then the final bit of testing and validation is introducing some of those failures into the product to make sure that you're stress testing, and, again, having some objectivity to stress test a product to make sure it's achieving your fair and ethical goals. And then we think about the fact that it's not just you; you're not alone. How do we ensure that we're all sharing information to make us more fair and ethical, and to make sure that the products we deliver are fair and ethical? We encourage the sharing of best practices and guidelines. We do that ourselves in Google by providing our research and best practices on the Google AI site. These best practices cover everything from ML fairness tools and research, which Margaret Mitchell will talk about in a moment, to best practices and guidelines that any developer or any business leader could follow themselves. So we try to both provide that ourselves and encourage other people to share their research and learnings as well. And with that, as we talk about sharing externally, it's also about bringing voices in. So I'll pass over to Jamila Smith-Loud to talk about understanding human impacts. JAMILA SMITH-LOUD: Thank you. [APPLAUSE] Hi, everyone.
I'm going to talk to you a little bit today about understanding, conceptualizing, and assessing human consequences and impacts on real people and communities through the use of tools like social and equity impact assessments. Social and equity impact assessments come primarily from the social science disciplines and give us a research-based method to assess these questions in a way that is broad enough to apply across products, but also specific enough for us to think about what tangible product changes and interventions we can make. So I'll start off with one of the questions that we often start with when thinking about these issues. I always like to say that when we're thinking about ethics, when we're thinking about fairness, and even when we're thinking about questions of bias, these are really social problems. And one major entry point into understanding social problems is thinking about the geographic context in which users live, and how that impacts their engagement with the product. So we're really asking: what experiences do people have that are based solely on where they live, and that may differ greatly for other people who live in different neighborhoods that are more resourced, more connected to the internet-- all of the different aspects that make regional differences so important? Secondly, we like to ask what happens to people when they're engaging with our products in their families and in their communities. We like to think about, what are the economic changes that may come as a part of engagement with this new technology? What are the social and cultural changes that really do impact how people view the technology and view their participation in the process? And so I'll talk a little bit about our approach. The good thing about utilizing existing frameworks of social and equity impact assessments is that-- if you think about when we do new land development projects, or even environmental assessments-- there's already a standard of considering social impacts as a part of that process. And so we really do think of employing new technologies in the same way. We should be asking similar questions about how communities are impacted, what their perceptions are, and how they're framing these engagements. And so one of the things we think about is, what is a principled approach to asking these questions? The first part really is around engaging in the hard questions. When we're talking about fairness, when we're talking about ethics, we're not talking about them separately from issues of racism, social class, homophobia, and all forms of cultural prejudice. We're talking about those issues as they overlay in those systems. And so it really requires us to be OK with those hard questions, to engage with them, and to realize that our technologies and our products don't exist separately from that world. The next approach is really about being anticipatory. What's different about social and equity impact assessments, compared with other social science research methods, is that the relationships between causal impacts and correlations are going to be a little bit different, and we really are trying to anticipate harms and consequences. And so it requires you to be OK with the fuzzy conversations, but also to realize that there's enough research, there's enough data, to give us an understanding of how history and context impact outcomes.
And so being anticipatory in your process is a really important part of it. And lastly, in terms of the principled approach, it's about centering the voices and experiences of those communities who often bear the burden of the negative impacts. And that requires understanding how those communities would even conceptualize these problems. I think sometimes we come from a technical standpoint, and we think about the communities as separate from the problem. But if we're ready to center those voices and engage throughout the whole process, I think it results in better outcomes. So to go a little bit deeper into engaging in the hard questions: what we're really trying to do is assess how a product will impact communities, particularly communities who have been historically and traditionally marginalized. So it requires us to really think about history and context. How are they shaping this issue, and what could we learn from that assessment? It also requires an intersectional approach. If we're thinking about gender equity, if we're thinking about racial equity, these are not issues that live separately. They really do intersect, and being OK with that intersectional approach allows for a much fuller assessment. And then, lastly, in thinking about new technologies and new products, how does power influence outcomes and the feasibility of interventions? I think the question of power and social impact go hand in hand, and it requires us to be OK with asking. Asking might not get us the best answer, but at least we're asking those hard questions. So our anticipatory process is part of a full process, right? It's not just us thinking about the social and equity impacts; it really is thinking about them within the context of the product-- so having a domain-specific application of these questions, and then having some assessment of the likelihood and severity of the risk. And then, lastly, thinking about what meaningful mitigations we need to develop for whatever impacts we identify. And so it's a full process. It requires work on our team in terms of understanding and conducting the assessment, but it also requires partnership with our product teams to really do that domain-specific analysis. Then there's centering the assessment. I talked a little bit about this before, but when we're centering this assessment, what we're really trying to ask is, who's impacted most? So if we're thinking about a problem that may have some economic impact, it would require us to disaggregate the data based on income to see what communities, what populations, are most impacted-- so being OK with working with very specific population data and understanding who is impacted the most. Another important part is validation. I think Jen mentioned that a lot, but it's really thinking about community-based research engagements, whether that's a participatory approach or focus groups. How do we validate our assessments by engaging communities directly and really centering their framing of the problem as part of our project? And then going through iteration, and realizing that it's not going to be perfect the first time, that it requires some pulling and tugging from both sides to really get the conversation right. So what types of social problems are we thinking of? We're thinking about income inequality, housing and displacement, health disparities, the digital divide, and food access.
We're thinking about these in all different types of ways, but I thought it might be helpful if we looked at a specific example. So let's look at one of the types of social problems that we want to understand in relation to our products and users: inequity related to food access, which this map shows. It's definitely a US context in which we're thinking about this question for now, though we're always also thinking about it in a global way, but I thought this map was a good way for us to look at it. As you can see, the areas that are shaded darker are the areas where users might have a significantly different experience when we're thinking about products that provide personalization and recommendations, maybe for something like restaurants. So we're thinking about how those users are either included in or excluded from the product experience, and then we're going even further and thinking about how small businesses and low-resource businesses also affect that type of product. It requires us to realize that there's a wealth of data that allows us to go as deep as the census tract level and understand that there are certain communities who have a significantly different experience than other communities. And so, like I said, this map is looking at communities at a census tract level where households have no car and no supermarket within a mile. And if we want to look even deeper, we can overlay this information with income. So thinking about food access and income disparity, which are often connected, gives us a better understanding of how different groups may engage with a product. And so when thinking about a hard social problem like this, it really requires us to think, what's the logical process for us to go from a big social problem to very specific outcomes and effects that are meaningful and are making a change? And it requires us to acknowledge that there's context that overlays all parts of this process, from the inputs that we have, to the activities that we do-- which may, in my case, be very much research-based activities-- to thinking about what the meaningful outputs are. And so to go a little deeper into this logic-model way of thinking about it: we have a purpose now, in the food access example, to reduce negative unintended consequences in areas where access to quality food is an issue. We're also very aware of the context. So we're thinking about the context of food access, but we're also thinking about questions of gentrification. We're thinking about displacement. We're thinking about community distrust. So we realize that this question has many other issues that inform the context, not just access to food. And as part of the process, we're identifying resources. We're asking, where are there multidisciplinary research teams that can help us think this through? Who are the external stakeholders that can help us frame the problem? And what are the cross-functional relationships that we need to build to really be able to solve this kind of problem, while acknowledging what our constraints are? Oftentimes, time is a huge constraint, and then there are gaps in knowledge and in comfort with talking about these hard problems.
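The talk doesn't show how that census-tract overlay is built, but a rough sketch of the kind of analysis described (joining tract-level food-access indicators with income data) might look like the following. The file names, column names, and thresholds are illustrative assumptions, not details from the talk.

```python
# Hypothetical sketch: overlay census-tract food-access indicators with income data.
# File names, column names, and the low-income cutoff are illustrative assumptions.
import pandas as pd

# Tract-level indicators, e.g. derived from public food-access data (hypothetical schema).
access = pd.read_csv("food_access_by_tract.csv")   # columns: tract_id, no_car_no_supermarket_1mi
income = pd.read_csv("income_by_tract.csv")        # columns: tract_id, median_household_income

tracts = access.merge(income, on="tract_id", how="inner")

# Flag tracts where limited food access and low income overlap.
low_income_cutoff = tracts["median_household_income"].quantile(0.25)
overlap = tracts[
    (tracts["no_car_no_supermarket_1mi"] > 0.2)           # >20% of households affected
    & (tracts["median_household_income"] < low_income_cutoff)
]
print(f"{len(overlap)} tracts show both limited food access and low income")
```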
Some of the activities and inputs that we think can help us get to some answers are case studies, surveys, and user research where we're asking about user perceptions of this issue: how does engagement differ based on geography, and are we able to do that analysis? And then we create tangible outputs, some of which are product interventions really focused on how we can make changes to the product, but also community-based mitigations: are there ways in which we're engaging with the community, and ways in which we're pulling in data, that we can use to create a fuller set of solutions? And really, it's always about aspiring to positive effects, in principle and in practice. So this is one of those areas where you can feel like you have a very principled approach, but it really is about being able to put it into practice. And so some of the things that I'll leave you with today in thinking about understanding these human impacts are really being able to apply them in specific technical applications; building trust through equitable collaboration-- so really thinking about, when you're engaging with external stakeholders, how do you make it feel equitable, with both sides sharing knowledge and experiences in ways that are meaningful-- and then validating the knowledge generation. When we're engaging with different communities, we really have to be OK with the fact that information, data, and the way we frame these problems can come from multiple different sources, and that's really important. And then really thinking about, within your organization, within your team, what are the change agents and the change instruments that make it a meaningful process? Thank you. Now Margaret will talk more about the machine learning pipeline. [APPLAUSE] MARGARET MITCHELL: Great. Thanks, Jamila. So I'll be talking a bit about fairness and transparency and some frameworks and approaches for developing ethical AI. In a typical machine learning development pipeline, the starting point for developers is often the data. Training data is first collected and annotated. From there, a model can be trained. The model can then be used to output content such as predictions or rankings, and then downstream users will see the output. And we often see this approach as if it's a relatively clean pipeline that provides objective information we can act on. However, from the beginning of this pipeline, human bias has already shaped the data that's collected. Human bias then further shapes what we collect and how we annotate it. Here are some of the human biases that commonly contribute to problematic biases in data and in the interpretation of model outputs: things like reporting bias-- where we tend to remark on things that are noticeable to us, as opposed to things that are typical; things like out-group homogeneity bias-- where we tend to see people outside of our social group as somehow less nuanced or less complex than people within our own group; and things like automation bias-- where we tend to favor the outputs of automated systems over what humans actually say, even when there's contradictory information. So rather than a straightforward, clean, end-to-end pipeline, we have human bias coming in at the start of the cycle and then being propagated throughout the rest of the system.
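As a minimal illustration of the pipeline stages Margaret describes, here is a toy sketch with comments marking where human bias can enter. The data, model, and library choices are placeholders, not a Google system.

```python
# Toy sketch of a typical ML development pipeline, annotated with where human
# bias can enter. The data here is random placeholder data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Data collection and annotation: reporting bias and annotator bias can
#    already shape what ends up in X and y.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

# 2. Model training: the model inherits whatever skew the data carries.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# 3. Model output (predictions or rankings) shown to downstream users;
#    automation bias can lead people to over-trust these outputs, and their
#    interactions can become new training data, carrying the bias forward.
predictions = model.predict(X_test)
```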
And this creates a feedback loop: as users see the output of biased systems and start to click on or interact with those outputs, this feeds data-- data that's already been biased in this way-- that models are further trained on, creating problematic feedback loops where biases can get worse and worse. We call this a sort of bias network effect, or bias "laundering." And a lot of our work seeks to disrupt this cycle so that we can bring about the best kind of output possible. So some of the questions we consider are: Who is at the table? What are the priorities in what we're working on? Should we be thinking about different aspects of the problem and different perspectives as we develop? How is the data that we're working with collected? What kinds of things does it represent? Are there problematic correlations in the data, or are some subgroups underrepresented in a way that will lead to disproportionate errors downstream? What are some foreseeable risks? This means actually thinking with foresight and anticipating possible negative consequences of everything we work on, in order to better understand how we should prioritize. What constraints and supplements should be in place? Beyond a basic machine learning system, what can we do to ensure that we account for the kinds of risks that we've anticipated and can foresee? And then, what can we share with you, the public, about this process? We aim to be as transparent as we can about this, in order to share how we're focusing on it and make it clear that this is part of our development lifecycle. I'm going to briefly talk about some technical approaches. This is in the research world; you can look at papers on this, if you're interested, for more details. So there are two sorts of ML-- machine learning-- techniques that we've found to be relatively useful. One is bias mitigation, and the other we've been broadly calling inclusion. Bias mitigation focuses on removing the signal for problematic variables. So for example, say you're working on a system that is supposed to predict whether or not someone should be promoted. You want to make sure that that system is not keying on something like gender, which we know is correlated with promotion decisions. In particular, women are less likely to be promoted, or are promoted less quickly than men, in a lot of places, including in tech. We can do this using an adversarial multi-task learning framework: while we predict something like getting promoted, we also try to predict the subgroup that we'd like to make sure isn't affecting the decision, and we discourage the model from being able to see that subgroup, removing its representation by basically reversing the gradient and backpropagating. When we work on inclusion, we're working on adding signal for something-- trying to make sure that there are subgroups that are accounted for, even if they're not well represented in the data. And one of the approaches that works really well for this is transfer learning. So we might take a pre-trained network with some understanding of gender, for example, or some understanding of skin tone, and use that to influence the decisions of another network, which is then able to key on these representations to better understand nuances in the world that it's looking at.
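The gradient-reversal idea Margaret mentions can be sketched as an adversarial multi-task model: a shared encoder feeds a task head and an adversary head, and a gradient reversal layer pushes the encoder to become uninformative about the protected attribute. The following is a minimal sketch assuming PyTorch; the layer sizes, loss weighting, and toy data are illustrative, and this is not Google's implementation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_ * grad_output, None

class DebiasedClassifier(nn.Module):
    def __init__(self, n_features, hidden=32, lambda_=1.0):
        super().__init__()
        self.lambda_ = lambda_
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.task_head = nn.Linear(hidden, 1)  # e.g., "promoted or not" (illustrative)
        self.adv_head = nn.Linear(hidden, 1)   # e.g., protected attribute (illustrative)

    def forward(self, x):
        z = self.encoder(x)
        task_logit = self.task_head(z)
        # The adversary sees a gradient-reversed view of the shared representation,
        # so training it pushes the encoder to hide the protected attribute.
        adv_logit = self.adv_head(GradReverse.apply(z, self.lambda_))
        return task_logit, adv_logit

# One toy training step on random data (placeholder for a real dataset).
model = DebiasedClassifier(n_features=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

x = torch.randn(64, 16)
y_task = torch.randint(0, 2, (64, 1)).float()
y_attr = torch.randint(0, 2, (64, 1)).float()

optimizer.zero_grad()
task_logit, adv_logit = model(x)
loss = bce(task_logit, y_task) + bce(adv_logit, y_attr)
loss.backward()
optimizer.step()
```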
To give an example of that transfer learning approach: on one of the projects I was working on, we were able to increase how well we could detect whether or not someone was smiling. We worked with consenting, gender-identified individuals, built representations of what these gender presentations looked like, and used that within the model that then predicted whether or not someone was smiling. Some of the transparency approaches that we've been working on help to further explain this to you, and also help keep us accountable for doing good work here. One of them is model cards. In model cards, we're focusing on reporting what model performance is, disaggregated across various subgroups; making it clear that we've taken ethical considerations into account; making it clear what the intended applications of the model or the API are; and sharing, generally, the kinds of considerations that developers should keep in mind as they work with the models. Another one is data cards. These provide information about the evaluation data: when we report numbers, what are they based on? Who is represented when we decide that a model can be used-- that it's safe for use? These kinds of things are useful for learners-- people who generally want to better understand how models work and what sorts of things affect model performance; for third-party users-- non-ML professionals who just want a better understanding of the data sets they're working with, or of the representation in the different data sets that machine learning models are trained or evaluated on; and for machine learning researchers-- people like me, who want to compare model performance, understand what needs to be improved and what is already doing well, and be able to benchmark and make progress in a way that's sensitive to the nuanced differences between different kinds of populations. Our commitment to you, working on fair and ethical artificial intelligence and machine learning, is to continue to measure, to improve, and to share real-world impact related to ethical AI development. Thanks. [APPLAUSE]
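To make the disaggregated reporting Margaret describes concrete, here is a rough, hypothetical illustration of the kind of information a model card might carry. The field names, numbers, and the release check are placeholders rather than the published model card template.

```python
# Rough illustration (not the official template) of the kind of information a
# model card reports, with performance disaggregated by subgroup.
# Field names and numbers are placeholders.
model_card = {
    "model_details": {"name": "smile-detector", "version": "0.1"},
    "intended_use": "Detect smiling faces in consenting users' photos; "
                    "not intended for surveillance or identity inference.",
    "evaluation_data": "Held-out set with self-reported subgroup labels (hypothetical).",
    "metrics": {
        # Disaggregated performance, so gaps between subgroups are visible.
        "accuracy_overall": 0.91,
        "accuracy_by_subgroup": {"group_a": 0.93, "group_b": 0.88, "group_c": 0.90},
    },
    "ethical_considerations": "Subgroup gaps above an agreed threshold block release "
                              "until mitigations are in place (illustrative policy).",
}

# A simple check a team might run before release (illustrative threshold).
gaps = {
    group: model_card["metrics"]["accuracy_overall"] - acc
    for group, acc in model_card["metrics"]["accuracy_by_subgroup"].items()
}
flagged = [group for group, gap in gaps.items() if gap > 0.03]
print("Subgroups needing attention:", flagged)
```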