[MUSIC PLAYING]
JEN GENNAI: I'm an operations manager,
so my role is to ensure that we're making our considerations
around ethical AI deliberate, actionable,
and scalable across the whole organization in Google.
So one of the first things to think about
if you're a business leader or a developer
is ensuring that people understand what you stand for.
What does ethics mean to you?
For us, that meant setting values-driven principles
as a company.
These values-driven principles, for us,
are known as our AI principles.
And last year, we announced them in June.
So these are seven guidelines around AI development
and deployment, which set out how
we want to develop AI.
We want to ensure that we're not creating or reinforcing bias.
We want to make sure that we're building technology
that's accountable to people.
And we have five others here that you can read.
It's available on our website.
But at the same time that we announced
these aspirational principles for the company,
we also identified four areas that we
consider our red lines.
So these are technologies that we will not pursue.
These cover things like weapons technology.
We will not build or deploy weapons.
We will also not build or deploy technologies
that we feel violate international human rights.
So if you're a business leader or a developer,
we'd also encourage you to understand what
are your aspirational goals.
But at the same time, what are your guardrails?
What points are you not going to cross?
The most important thing to do is to know what
your definition of ethical AI development is.
After you've set your AI principles,
the next thing is, how do you make them real?
How do you make sure that you're aligning with those principles?
So here, there are three main things
I'd suggest keeping in mind.
The first one is you need an accountable and authoritative
body.
So for us in Google, this means that we have senior executives
across the whole company who have the authority
to approve or decline a launch.
So they have to wrestle with some
of these very complex ethical questions
to ensure that we are launching things
that we do believe will lead to fair and ethical outcomes.
So they provide the authority and the accountability
to make some really tough decisions.
Secondly, you have to make sure that the decision-makers have
the right information.
This involves talking to diverse people within the company,
but also listening to your external users,
external stakeholders, and feeding that
into your decision-making criteria.
Jamila will talk more about engaging
with external communities in a moment.
And then the third key part of building governance
and accountability is having operations.
Who's going to do the work?
What are the structures and frameworks
that are repeatable, that are transparent,
and that are understood by the people who
are making these decisions?
So for that, in Google, we've established a central team
that's not based in our engineering and product teams
to ensure that there's a level of objectivity here.
So the same people who are building the products
are not the only people who are looking
to make sure that those products are fair and ethical.
So now you have your principles, so that people
understand what ethics means for you.
We've talked about establishing a governance structure
to make sure that you're achieving those goals,
and the next thing to do is to ensure that everyone
within your company, and the people that you work with
and for, are aligned on those goals.
So making sure, one, that you've set overall goals in alignment
with ethical AI--
so how are you going to achieve ethical development
and deployment of technology?
Next, you want to make sure that you're training people
to think about these issues from the start.
You don't want to catch some ethical consideration
late in the product development lifecycle.
You want to make sure that you're
starting that as early as possible-- so getting
people trained to think about these types of issues.
Then we have rewards.
If you're holding people
accountable for ethical development and deployment,
you may have to accept that that might slow down
some development in order to get to the right outcomes--
so make sure people feel rewarded for thinking
about ethical development and deployment.
And then, finally, making sure that you're hiring people
and developing people who are helping you
achieve those goals.
Next, you've established your frameworks,
you've hired the right people, you're rewarding them.
How do you know you're achieving your goals?
So we think about this as validating and testing.
So an example here is replicating
a user's experience.
Who are your users?
How do you make sure that you're thinking
about a representative sample of your users?
So you think about trying to test different experiences,
mostly from your core subgroups.
But you also want to be thinking about,
who are your marginalized users?
Who might be underrepresented in your workforce?
And therefore, you might have to pay additional attention
to get it right.
We also think about, what are the failure modes?
And what we mean by that is if people have been negatively
affected by a product in the past,
we want to make sure they won't be negatively affected
in the future.
So how do we learn from that and make sure
that we're testing deliberately for that in the future?
And then the final bit of testing and validation
is introducing some of those failures
into the product to make sure that you're stress testing,
and, again, have some objectivity
to stress test a product to make sure it's achieving
your fair and ethical goals.
And then we think about it's not just you.
You're not alone.
How do we ensure that we're all sharing information
to make us more fair and ethical and to make sure
that the products we deliver are fair and ethical?
So we encourage the sharing of best practices and guidelines.
We do that ourselves in Google by providing
our research and best practices on the Google AI site.
So these best practices cover everything
from ML fairness tools and research,
which Margaret Mitchell will talk about in a moment,
to best practices and guidelines
that any developer or any business leader
could follow themselves.
So we try to both provide that ourselves, as well
as encouraging other people to share their research
and learnings also.
So with that, as we talk about sharing externally,
it's also about bringing voices in.
So I'll pass over to Jamila Smith-Loud
to talk about understanding human impacts.
JAMILA SMITH-LOUD: Thank you.
[APPLAUSE]
Hi, everyone.
I'm going to talk to you a little bit
today about understanding, conceptualizing, and assessing
human consequences and impacts on real people and communities
through the use of tools like social equity impact
assessments.
Social and equity impact assessments
come primarily from the social science discipline
and give us a research-based method
to assess these questions in a way that is broad enough
to be able to apply across products,
but also specific enough for us to think about what
are tangible product changes and interventions that we can make.
So I'll start off with one of the questions
that we often start with when thinking about these issues.
I always like to say that when we're
thinking about ethics, when we're thinking about fairness,
and even thinking about questions of bias,
these are really social problems.
And one major entry point into understanding social problems
is really thinking about what's the geographic context in which
users live, and how does that impact their engagement
with the product?
So really asking, what experiences
do people have that are based solely on where they live
and that may differ greatly from other people who
live in different neighborhoods that are either
more resourced or more connected to the internet-- all
of these different aspects that make regional differences so
important?
Secondly, we like to ask what happens to people when they're
engaging with our products in their families
and in their communities.
We like to think about, what are economic changes that
may come as a part of engagement with this new technology?
What are social and cultural changes that really do impact
how people view the technology and view their participation
in the process?
And so I'll start by talking a little bit about our approach.
The good thing about utilizing
existing frameworks of social and equity impact assessments
is where they come from:
if you think about when we do new land development
projects or even environmental assessments,
there's already a standard of considering social impacts
as a part of that process.
And so we really do think of employing new technologies
in the same way.
We should be asking similar questions about how communities
are impacted, what are their perceptions,
and how are they framing these engagements?
And so one of the things that we think about
is: what is a principled approach to asking
these questions?
And the first one really is around
engaging in the hard questions.
When we're talking about fairness,
when we're talking about ethics, we're
not talking about them separately
from issues of racism, social class, homophobia,
and all forms of cultural prejudice.
We're talking about these issues as they
overlay those systems.
And so it really requires us to be
OK with those hard questions, and engaging with them,
and realizing that our technologies and our products
don't exist separately from that world.
The next approach is really about being anticipatory.
I think the different thing about thinking
about social and equity impact assessments
from other social science research methods
is that the relationships between causal impacts
and correlations are going to be a little bit different,
and we really are trying to anticipate
harms and consequences.
And so it requires you to be OK with the fuzzy conversations,
but also realize that there's enough research,
there's enough data that gives us
the understanding of how history and contexts impact outcomes.
And so being anticipatory in your process
is really, really an important part of it.
And lastly, the principled approach
is really about centering the voices and experiences
of those communities who often bear the burden
of the negative impacts.
And that requires understanding how
those communities would even conceptualize these problems.
I think sometimes we come from a technical standpoint,
and we think about the communities
as separate from the problem.
But if we're ready to center those voices and engage them
throughout the whole process, I think
it results in better outcomes.
So to go a little bit deeper into engaging
in the hard questions, what we're really trying to do
is be able to assess how a product will impact
communities, particularly communities
who have been historically and traditionally marginalized.
So it requires us to really think
about history and context.
How is that shaping this issue, and what could we
learn from that assessment?
It also requires an intersectional approach.
If we're thinking about gender equity,
if we're thinking about racial equity,
these are not issues that live separately.
They really do intersect, and being OK
with an understanding of that intersectional approach
allows for a much fuller assessment.
And then, lastly, in thinking about new technologies
and thinking about new products, how does
power influence outcomes and the feasibility of interventions?
I think that the question of power and social impact
go hand-in-hand, and it requires us
to be OK with asking. Asking might not
get us the best answer, but at least
we're asking those hard questions.
So our anticipatory process is part of a full process, right?
So it's not just us thinking about the social and equity
impacts, but it really is thinking about them
within the context of the product--
so really having domain-specific application of these questions,
and then having some assessment of the likelihood
of the severity of the risk.
And then, lastly, thinking about what are the meaningful
mitigations we have to develop for whatever impacts we identify.
And so it's a full process.
It requires work on our team in terms
of understanding and assessment,
but it also requires partnership with our product teams
to really do that domain-specific analysis.
Centering the assessment.
I talked a little bit about this before,
but when we're centering this assessment, really,
what we're trying to ask is, who's impacted most?
So if we're thinking about a problem that
may have some economic impact, it
would require us to disaggregate the data based
on income to see what communities, what populations,
are most impacted-- so being OK with working with very
specific population data and understanding
who is impacted the most.
Another important part is validation.
And I think Jen mentioned that a lot, but really
thinking about community-based research engagements,
whether that's a participatory approach,
whether that's focus groups.
But really, how do we validate our assessments
by engaging communities directly and really centering
their framing of the problem as part of our project?
And then going through iteration and realizing
that it's not going to be perfect the first time, that it
requires some pull and tugging from both sides
to really get the conversation right.
So what types of social problems are we thinking of?
We're thinking about income inequality, housing
and displacement, health disparities,
the digital divide, and food access.
We're thinking about these in all different types of ways,
but I thought it might be helpful
if we thought about a specific example.
So let's look at the example of one
of the types of social problems that we
want to understand in relation to our products and users:
the topic of inequity related to food access, which
this map shows you.
It's definitely a US context that we're
thinking about this question in for now,
but we're also always thinking about it in a global way.
But I thought that this map was a good way for us
to look at it.
As you can see, the areas that are shaded darker
are the areas where those users might have a significantly
different experience when we're thinking about products that
give personalization and recommendations maybe
for something like restaurants.
So we're thinking about questions
about how those users are either included or excluded
from the product experience, and then we're
thinking about going even further and thinking about how
small businesses and low-resource businesses
also impact that type of product.
So it requires us to realize that there's
a wealth of data that allows us to go as deep here as
the census-tract level, and to understand that there are
certain communities who have a significantly
different experience than other communities.
And so, like I said, this map is looking
at communities at the census-tract level
where there's no car access and no supermarket
within a mile.
And if we wanted to look even deeper,
we can overlay this information with income.
So thinking about food access and income disparity,
which are often connected, gives us
a better understanding of how different groups may
engage with a product.
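To make that kind of overlay concrete, here is a minimal sketch in pandas; the tract IDs, column names, and numbers are illustrative assumptions, not the data behind the map shown in the talk.

```python
import pandas as pd

# Illustrative census-tract data (assumed, not the data behind the map shown).
access = pd.DataFrame({
    "tract_id": ["06001A", "06001B", "06001C"],
    "no_car_no_supermarket_1mi": [1, 0, 1],   # 1 = no car and no supermarket within a mile
})
income = pd.DataFrame({
    "tract_id": ["06001A", "06001B", "06001C"],
    "median_household_income": [28000, 95000, 41000],
})

# Overlay food access with income at the census-tract level.
overlay = access.merge(income, on="tract_id", how="inner")

# Flag tracts that are both low access and low income, where users may have a
# very different experience with something like restaurant recommendations.
low_income = overlay["median_household_income"] < overlay["median_household_income"].median()
low_access = overlay["no_car_no_supermarket_1mi"] == 1
print(overlay.loc[low_income & low_access, "tract_id"].tolist())  # -> ['06001A']
```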
And so when thinking about a hard social problem like this,
it really requires us to think, what's
the logical process for us to get
towards a big social problem and have very specific outcomes
and effects that are meaningful and are making a change?
And it requires us to really acknowledge
that there's context that overlays
all parts of this process, from the inputs that we have,
from the activities that we do-- which may, in my case,
be very much research-based activities--
and then thinking about what are meaningful outputs.
And so to go in a little bit deeper
in kind of this logic model way of thinking about it,
we have a purpose now, in thinking about the food access
example, to reduce negative unintended consequences
in areas where access to quality food is an issue.
We're also very aware of the context.
So we're thinking about the context of food access,
but we're also thinking about questions of gentrification.
We're thinking about displacement.
We're thinking about community distrust.
So we realize that this question has
many other issues that inform the context, not just
access to food.
But as part of the process, we're identifying resources.
We're thinking, where are there multidisciplinary research
teams that can help us think this through?
Who are our external stakeholders that
can help us frame the problem?
And then, what are the cross-functional relationships
that we need to build to really be
able to solve this kind of problem,
while acknowledging what our constraints are?
Oftentimes, time is a huge constraint,
and then there are gaps in knowledge and comfort
in being able to talk about these hard problems.
Some of the activities and inputs
that we think can help
us get to some answers are
case studies, surveys,
and user research where we're asking about users'
perceptions of this issue--
how does engagement differ based on your geography--
and being able to do that analysis.
And then creating tangible outputs,
some that are product interventions, really
focused on how we can make changes to the product,
but also community-based mitigations: thinking about whether
there are ways in which we're engaging
with the community, and ways in which we're pulling data,
that we can really use to create a fuller set of solutions.
And really, it's always about aspiring to positive effects
in principle and in practice.
So this is one of those areas where
you can feel like you have a very principled approach,
but it really is about being able to put them into practice.
And so some of the things that I'll leave you
with today in thinking about understanding
these human impacts are really being able to apply them
and thinking about applying them in specific technical
applications, building trust through
equitable collaboration-- so really thinking about,
when you're engaging with external stakeholders,
how do you make it feel equitable
and that we're both sharing knowledge
and experiences in ways that are meaningful--
and then validating the knowledge generation.
When we're engaging with different communities,
we really have to be OK with the fact that information, data, and the way
that we frame this can come from multiple different sources,
and that's really important.
And then really thinking about, within your organization,
within your team, who are the change agents
and what are the change instruments that really
make it a meaningful process?
Thank you.
Now Margaret will talk more about the machine learning
pipeline.
[APPLAUSE]
MARGARET MITCHELL: Great.
Thanks, Jamila.
So I'll be talking a bit about fairness and transparency
and some frameworks and approaches for developing
ethical AI.
So in a typical machine learning development pipeline,
the starting point for developers is often the data.
Training data is first collected and annotated.
From there, a model can be trained.
The model can then be used to output content
such as predictions or rankings, and then downstream users
will see the output.
And we often see this approach as
if it's a relatively clean pipeline that
provides objective information that we can act on.
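As a rough sketch of that pipeline, assuming scikit-learn purely as a stand-in, the stages look something like this:

```python
# A minimal sketch of the pipeline just described: collect and annotate data,
# train a model, produce outputs such as predictions, and surface them to
# downstream users. scikit-learn is used only as a stand-in; the stages,
# not the library, are the point.
from sklearn.linear_model import LogisticRegression

def collect_and_annotate():
    # Stage 1: training data is collected and labeled (placeholder values).
    features = [[0.1, 1.0], [0.9, 0.2], [0.4, 0.8], [0.7, 0.3]]
    labels = [0, 1, 0, 1]
    return features, labels

features, labels = collect_and_annotate()

# Stage 2: a model is trained on the annotated data.
model = LogisticRegression().fit(features, labels)

# Stage 3: the model outputs content such as predictions or rankings.
prediction = model.predict([[0.5, 0.5]])

# Stage 4: downstream users see the output (here, simply printed).
print(prediction)
```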
However, from the beginning of this pipeline,
human bias has already shaped the data that's collected.
Human bias then further shapes what we collect
and how we annotate it.
Here are some of the human biases that commonly contribute
to problematic biases in data and in the interpretation
of model outputs.
Things like reporting bias-- where we tend to remark
on things that are noticeable to us,
as opposed to things that are typical--
things like out-group homogeneity bias--
where we tend to see people outside of our social group
as somehow being less nuanced or less
complex than people within the group that we work with--
and things like automation bias--
where we tend to favor the outputs of systems
that are automated over what humans actually
say, even when there's contradictory information.
So rather than this straightforward, clean,
end-to-end pipeline, we have human bias
coming in at the start of the cycle,
and then being propagated throughout the rest
of the system.
And this creates a feedback loop where,
as users see the output of biased systems and start
to click or start to interact with those outputs,
this then feeds data that is further trained
on-- that's already been biased in this way--
creating problematic feedback loops
where biases can get worse and worse.
We call this a sort of bias network effect,
or bias "laundering."
And a lot of our work seeks to disrupt this cycle
so that we can bring the best kind of output possible.
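A toy simulation, with entirely made-up numbers, can illustrate how a small initial skew gets amplified by this loop:

```python
# A toy simulation of the feedback loop just described: a slightly biased
# system shows group A more, users click what they are shown, the click logs
# are fed back as training data, and the skew grows. All numbers are
# illustrative assumptions, not measurements from any real product.
exposure = {"group_a": 0.55, "group_b": 0.45}

for step in range(5):
    # Users mostly click what they are exposed to, with a small
    # rich-get-richer boost for whatever is ranked first (group A here).
    clicks = {"group_a": exposure["group_a"] * 1.10,
              "group_b": exposure["group_b"] * 1.00}
    # "Retraining" on the click logs makes next round's exposure
    # proportional to past clicks, amplifying the initial skew.
    total = clicks["group_a"] + clicks["group_b"]
    exposure = {g: clicks[g] / total for g in clicks}
    print(step, round(exposure["group_a"], 3), round(exposure["group_b"], 3))
# group_a's share climbs each round while group_b's shrinks.
```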
So some of the questions we consider
are: who is at the table?
What are the priorities in what we're working on?
Should we be thinking about different aspects
of the problem and different perspectives as we develop?
How is the data that we're working with collected?
What kind of things does it represent?
Are there problematic correlations in the data?
Or are some kinds of subgroups underrepresented in a way
that will lead to disproportionate errors
downstream?
What are some foreseeable risks?
So actually thinking with foresight
and anticipating possible negative consequences
of everything that we work on in order to better understand
how we should prioritize.
What constraints and supplements should be in place?
Beyond a basic machine learning system,
what can we do to ensure that we can account
for the kinds of risks that we've anticipated
and can foresee?
And then what can we share with you, the public,
about this process?
We aim to be as transparent as we can about this
in order to share information about how we're
focusing on this and make it clear that this is part
of our development lifecycle.
I'm going to briefly talk about some technical approaches.
This is in the research world.
You can look at papers on this, if you're interested,
for more details.
So there are two sorts of ML--
Machine Learning-- techniques that we've
found to be relatively useful.
One is bias mitigation, and the other one we've
been broadly calling inclusion.
So bias mitigation focuses on removing a signal
for problematic variables.
So for example, say you're working
on a system that is supposed to predict whether or not
someone should be promoted.
You want to make sure that that system is not
keying on something like gender, which we know is correlated
with promotion decisions.
In particular, women are less likely to be promoted
or are promoted less quickly than men in a lot of places,
including in tech.
We can do this using an adversarial multi-task learning
framework where, while we predict something
like getting promoted, we also try and predict
the subgroup that we'd like to make sure isn't affecting
the decision and discourage the model
from being able to see that, removing the representation
by basically reversing the gradient and backpropagating.
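A minimal sketch of that gradient-reversal setup in PyTorch, using the promotion and subgroup framing from the example, might look like the following; the layer sizes, data, and names are assumptions for illustration, not the actual implementation.

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; reverses the gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class AdversarialPromotionModel(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.promotion_head = nn.Linear(hidden, 1)  # main task: predict promotion
        self.subgroup_head = nn.Linear(hidden, 1)   # adversary: predict the sensitive subgroup

    def forward(self, x):
        h = self.encoder(x)
        promo_logit = self.promotion_head(h)
        # The adversary sees the representation through a reversed gradient,
        # which discourages the encoder from encoding the subgroup.
        subgroup_logit = self.subgroup_head(GradientReversal.apply(h))
        return promo_logit, subgroup_logit

# One training step on random placeholder data.
model = AdversarialPromotionModel(n_features=10)
opt = torch.optim.Adam(model.parameters())
x = torch.randn(8, 10)
promo_y = torch.randint(0, 2, (8, 1)).float()
group_y = torch.randint(0, 2, (8, 1)).float()
promo_logit, group_logit = model(x)
loss = nn.functional.binary_cross_entropy_with_logits(promo_logit, promo_y) \
     + nn.functional.binary_cross_entropy_with_logits(group_logit, group_y)
opt.zero_grad(); loss.backward(); opt.step()
```

The key design choice is that the shared encoder receives a reversed gradient from the subgroup head, so minimizing both losses pushes it to discard subgroup information while still predicting promotion.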
When we work on inclusion, we're working
on adding signal for something-- trying to make sure
that there are subgroups that are accounted for,
even if they're not well-represented in the data.
And one of the approaches that works really well for this
is transfer learning.
So we might take a pre-trained network
with some understanding of gender,
for example, or some understanding of skin tone,
and use that in order to influence
the decisions of another network that
is able to key on these representations in order
to better understand nuances in the world that it's looking at.
This is a little bit of an example of one
of the projects I was working on, where we were able to increase
how well we could detect whether or not someone was smiling.
We worked with some consented, gender-identified
individuals, built representations of what
these gender presentations looked like, and used that
within the model that then predicted whether or not
someone was smiling.
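A rough sketch of that kind of transfer-learning setup, with an assumed pre-trained attribute encoder and made-up feature sizes rather than the actual project's model, could look like this:

```python
import torch
import torch.nn as nn

# A stand-in for a pre-trained encoder with some understanding of an attribute
# such as gender presentation or skin tone. In practice you would load real
# pre-trained weights, e.g.:
#   attribute_encoder.load_state_dict(torch.load("attribute_encoder.pt"))
attribute_encoder = nn.Sequential(nn.Linear(512, 128), nn.ReLU())
for p in attribute_encoder.parameters():
    p.requires_grad = False  # freeze the transferred representation

class SmileClassifier(nn.Module):
    """Combines its own image features with the transferred attribute
    representation, so the smile decision can account for subgroups that are
    underrepresented in the smile training data."""
    def __init__(self):
        super().__init__()
        self.image_branch = nn.Sequential(nn.Linear(512, 128), nn.ReLU())
        self.head = nn.Linear(128 + 128, 1)

    def forward(self, image_features):
        transferred = attribute_encoder(image_features)
        own = self.image_branch(image_features)
        return self.head(torch.cat([own, transferred], dim=1))

model = SmileClassifier()
smile_logits = model(torch.randn(4, 512))  # 4 images' precomputed 512-d features
```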
Some of the transparency approaches
that we've been working on help to further explain to you
and also help keep us accountable for doing
good work here.
So one of them is model cards.
In model cards, we're focusing on reporting
what model performance is, disaggregating
across various subgroups, and making it clear that we've
taken ethical considerations into account,
making it clear what the intended
applications of the model or the API are,
and sharing, generally, different kinds
of considerations that developers should keep in mind
as they work with the models.
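As a hedged sketch of the kind of disaggregated reporting a model card surfaces, with assumed fields and made-up evaluation numbers rather than the real model card schema:

```python
from collections import defaultdict

def disaggregated_accuracy(labels, predictions, subgroups):
    """Accuracy reported per subgroup, the kind of breakdown a model card surfaces."""
    correct, total = defaultdict(int), defaultdict(int)
    for y, y_hat, group in zip(labels, predictions, subgroups):
        total[group] += 1
        correct[group] += int(y == y_hat)
    return {group: correct[group] / total[group] for group in total}

# Illustrative evaluation results (assumed, not from a real benchmark).
labels      = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]
subgroups   = ["a", "a", "a", "b", "b", "b", "c", "c"]

model_card = {
    "intended_use": "illustrative example only",
    "ethical_considerations": "report performance gaps across subgroups",
    "quantitative_analysis": disaggregated_accuracy(labels, predictions, subgroups),
}
print(model_card["quantitative_analysis"])  # subgroup 'a': 2/3, 'b': 3/3, 'c': 1/2
```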
Another one is data cards.
And this provides information about the evaluation data:
when we report numbers, what are they based on?
Who is represented when we decide a model can be used--
that it's safe for use?
These kinds of things are useful for learners-- people who
generally want to better understand
how models are working and what sorts of things
are affecting model performance-- for third-party
users--
non-ML professionals who just want
to have a better understanding of the data
sets that they're working with, or what the representation
is in the different data sets that machine learning models are
based on or evaluated on-- as well as for machine
learning researchers.
So people like me, who want to compare model performance,
understand what needs to be improved
and what is already doing well, and
be able to sort of benchmark and make progress
in a way that's sensitive to the nuanced differences
in different kinds of populations.
Our commitment to you, working on
fair and ethical artificial intelligence and machine
learning, is to continue to measure, to improve,
and to share real-world impact related to ethical AI
development.
Thanks.
[APPLAUSE]