[applause]

>> Sarah Zampardo: Great. So my short talk today is going to focus on testing to learn. I'm going to start, as a Wisconsin native (and I'm sorry Tim, I should have given you warning about this, I don't know where you went), I'm a big Packers fan. I'm really looking forward to the beginning of the season in just two weeks against the Cardinals. And this year I have a new strategy for Coach McCarthy. Instead of playing all of your top players in every game, I recommend that you bench them and play your second string. See, I did a little bit of analysis, and I see that when you play the top players, you actually have a lower winning percentage than when your second-string players have more playing time. Not so fast, right? In fact, it could also be that you're only putting in your second-string players when you're far ahead in the game. So the more playing time your second-string players have, the more likely you are to win, but it's not because the second-string players are actually creating the win for the team in the end. And so one way you could actually take a look at this would be to test putting in the second-string players earlier in the game, and seeing if the Packers are more likely to win with the guy whose name you probably don't even know at the bottom of the roster, versus Aaron Rodgers, actually playing in the game. So, put more simply, it's not that correlation establishes causation; rather, experimentation enables you to establish causation.

So Clara and the Heron Foundation have asked us to consider what is a good deal, and how do we identify good deals? And that's defined as understanding, in the social and financial sectors, how we can find different enterprises that create social and financial performance and impact together. So identifying a good deal is predicated upon the ability to actually understand the impact of any given program, intervention or initiative. And the gold standard of doing that is experimentation. The whole methodology is centuries old, the scientific method, and it's been applied selectively and successfully in the social sector over many years. However, the potential is much greater. Why? And what can we do about it? I'm going to leave the topic of data infrastructure to other people who have far more experience to talk about that, and instead focus on using K-12 education as an example of a social enterprise, understanding the fundamental challenges that I often hear raised against experimentation in the education space. And then I'll share some of the ways, through examples and suggestions, that we have actually overcome those challenges, using examples from the private sector.

So, a little bit of background about myself first. I'm with Applied Predictive Technologies, as Hope was explaining, and we're the largest cloud-based predictive analytics software firm in the world. We work with a variety of consumer-facing firms across industries, including hospitality, telecoms, banking, and CPG, including firms such as the ones you see here, and we help them answer questions about the incremental impact of their programs, which surround issues such as pricing and promotions, marketing effectiveness, brand management, and capital investment and remodels, among many others.
In addition to my client-facing work, I also head our social sector practice, where we primarily partner with school districts and other educational institutions to similarly apply the experimental method to understand the incremental impact of their programs. As some additional background, my academic background is in experimental economics, specifically focused on the development space, and all my prior professional experience is in the nonprofit and public sectors. I'm really excited because I've seen that the tools I've learned in the private sector, which are much further along than what I've seen in the social sector, can actually be applied there successfully. And I hope to share some of those meaningful examples with you.

So the first place I'll start is actually with an example to help explain the experimental methodology. So this is the good deal: a different sort of good deal, a good sandwich deal. In 2004, there was a franchisee for Subway in Miami who saw that sales were declining on the weekends, so he wanted to offer a very simple and easy-to-understand promotion. He offered any footlong sandwich for $5. The $5 footlong! And on average that was $2 cheaper than your average footlong sandwich. He saw wildly successful sales, so other franchisees in the area also opted to try the promotion, and they saw similar results. Before long, franchisees across the country were clamoring for corporate headquarters to introduce the promotion nationally and put some big marketing support behind it. However, there was still a lot of skepticism at the national level. You can imagine the types of confounding events that could be creating the positive sales and profit results that those franchisees saw in Miami. One possibility is that Miami overall was just seeing an upward trend in the market, so people were more interested in Subway for whatever reason specific to that area. Another thing that the executives pointed out was that other areas of the country were also seeing positive change, so sales were up in Denver, or in Pittsburgh, and they had no $5 footlong, so how do we know that the $5 footlong was actually the idea that was creating the positive growth? So at that time they partnered with APT to test the idea. In order to get rid of all of the noise that's going on in the system, we selected a set of test franchisees across the country to actually introduce the $5 footlong, and compared their performance before and during the promotion, on sales and margin, to see how the $5 footlong actually affected those particular restaurants. And then, given the results of that, in 2008 it was launched nationally.

So how does this all relate to K-12 education? In a similar fashion, there are huge, very expensive decisions that are made every day across our schools. And they happen at many different levels of government, also in combination with foundations, tech startups and other entrepreneurs. The challenges these interventions are trying to solve are also great. You don't need me to tell you what the challenges are in education in this country today. The question, though, is how do we understand what works?
If there are a whole slew of new software platforms for teaching math more effectively to second graders, how do we know which one is the one we should actually get our teachers to learn, take them out of the schools for a day for a training session, and then bring them back to the classroom and take time away from what the teachers would normally be doing with the students, to instead be learning math on this new software program? How do we know which ones are more effective, and which ones are actually driving the results that we're looking for?

The answer in the education space is the same as it is in the private sector. We need to select a group of students or schools and call them "test", introduce the intervention into that specially selected set, and then compare them to a group of control students or schools, so we can measure the effect of the intervention above and beyond the other factors that are actually affecting student performance. So if this is such a great idea, as the economist in me asks, why aren't all school districts doing this? There are four fundamental challenges I hear in conversations with school districts that I would like to share with you, in order to show what those challenges to experimentation are, and then also suggest some ways that we have actually overcome them, through examples of what we've done in the education space.

So the first fundamental challenge that I hear is one about equity. We were actually talking about that this morning. "Experimentation is unethical and is unfair," and this is a quote: "How can you justify offering an intervention to only some students, when you know it could help the students that are held out as control?" And I think this comes up in a lot of different areas when you're thinking about using experimentation to measure social impact. The reality is that there are always a lot of different effects creating different behavior and impact, be it for students, schools, or stores. If you think about a store, for example, the traffic that's coming into the store is a function of some uncontrolled factors, such as the local economy, or maybe there's a store down the hall in the mall that has a really successful national promotion going on, so you have more traffic coming in, or maybe it's something like weather. And it's also some controlled factors, like how strong the store manager is, or what types of new products that store has introduced. The same thing when it comes to schools. It could be the fact that there's a really effective administrator or principal. It could also be that a school was selected by the superintendent to receive a special grant, and there are only six grade schools in the entire district who received that grant. And it's not that there's some reason the other schools in the district didn't receive it; some may be right on the margin, but some students are receiving additional resources simply because resources are limited, no matter what sector you're in. So the way to approach this, instead of saying that it isn't fair, is to actually use that variability, let it work for you, and use it as a way to do natural experiment analysis. An example: we partnered with a major school district, and we were taking a look at turnaround schools.
I'm not sure if you guys are familiar with the program or the intervention, but basically the district has identified schools that are severely underperforming, they determine that the principal and about half the staff have to be replaced, and then they invest additional resources in the school to bring the standardized test scores up to proficiency within three years. So no student chooses to attend a failing school, right? No one wants to attend a school that has to be turned around the year after they enter it, yet the experience of that student in that turnaround school gives us an opportunity to compare them to other students who are not in schools that are being turned around. And that analysis showed which characteristics lead to more successful turnarounds. So that's just one example of how natural experiment analysis takes areas of inequity, which is inevitable, and still lets us learn from them in very valuable ways.

The second fundamental challenge is experimental efficacy. To quote: "I'm concerned about measuring the impact of a new technology in classrooms, because I don't have confidence that all teachers will employ it correctly, or consistently. In this case the analysis isn't measuring anything useful." This is one that I hear most often from interventionists who are coming from outside of the education space, such as technology entrepreneurs. The reality is that you want experimentation, you want any impact analysis, to measure not just how something should perform in a clinical and perfect setting, but also how it performs in combination with the implementation of that idea. Let's stick with technology. If you have a new software program like the one I was talking about earlier, teaching math to second graders, the coders who created it could go into schools, work directly with the students on using it, and show that it was wildly successful at increasing the scores or math readiness of students. However, a better test of how that program would actually perform if launched across the district would be putting it in the hands of teachers, and then measuring how those teachers both used the idea and implemented it consistently in the classroom. So, the efficacy of how an idea is implemented is just as important as the idea itself.

You can also measure implementation explicitly, when that's possible. One example: we were taking a look at the impact of a tutoring program on reading readiness for younger students, and on average there was no impact we were able to measure. But then we could also look at the number of hours of after-school tutoring that students experienced, and we saw that as the number of hours increased, as expected, there was more of an impact from the program. So in cases where there is a range of efficacy, you can actually measure that, include it in the analysis that you're doing, and get closer to the specific incremental impact of every hour of tutoring that was available to these students.

The third one is data, which is apt given this session. "We don't have enough of the right data. Our data is not very high quality, and it's very infrequent. Our schools report standardized test scores only once per student per year, at the end of the school year." So, this is absolutely right.
The data that's available in the education space pales in comparison to what we work with in the private sector. To illustrate this very concretely: for the particular school district I was talking about, if you take all the data we have for them (ten years, hundreds of schools, and tens of thousands of students), it is less than the data of one store for one year in the same city. Not even six months. It doesn't compare. While that's true, we shouldn't let the perfect be the enemy of the good. There's actually incredible work and analysis and experimentation that can be done just using the data that's available today.

An example of this: for about three years now we've been partnering with a very innovative and leading charter school, helping them take a look at the incremental impact on academic performance for students who enter their school. And the nice thing with this particular charter school is that it's a very natural experiment, because some students are able to enter the charter school and others are naturally lotteried out of it. So using that natural variation, we're able to set up test and control, matching students between test and control based on academic characteristics and their middle school academic performance, as well as other student characteristics like free and reduced lunch, and other factors that we found to be predictive of their performance in high school. Doing this analysis, we were able to come to some very specific, actionable insights, even though all we had available for every student was their standardized test score at the end of every year for the specific subjects that were tested on the state exam.

Let me give you a few examples of the value-adds. First of all, being able to measure the incremental impact on standardized test scores for these students, and relating that to college readiness as well as to proficiency in the district, was enough for that school to be able to make a pitch to other foundations, donors and the school board, to get additional investment into the school and to build additional campuses, to have more seats for students in the area. Because they could show what they were doing above and beyond what the other highly acclaimed schools in their area were doing. Another was taking a look at the differences across subjects, between math, science, reading and writing, and also how that varied by ethnicity and gender. And they found that in a few areas they had adopted a curriculum that was more predisposed to some students learning better than others, which helped them reshape it and make sure they were pulling along all of the students to the academic standards they expected within their school. Another of those actionable insights was looking at the impact on freshmen versus sophomores and upperclassmen. They saw that the incremental impact grew over time, which helped them consider opening up a middle school and a grade school, so that the overall culture of the school could be a part of students' lives even earlier than high school. And finally, a really important issue to them was making sure that all students were impacted by the school, and that it wasn't the case that the poorest-performing students were left behind, or the best-performing students were brought down to the mean.
In fact, we saw that students across all different academic performance levels saw significant achievement gains. So again, not a lot of data, and the data is growing, but even the data we have can be really meaningful when we look at the impact it has on helping excellent education institutions get better, and on identifying interventions that are not doing what we would hope they do.

And the final one is resource constraints. "Internally we have limited staff dedicated to program evaluation, and they also play an operational role. Also, we can't afford to pay think tanks or consulting firms to conduct large-scale impact evaluations." I absolutely hear this, and resource constraints are a really important thing. I think there are two areas to talk about. The first one, quickly, is that, just as in the private sector, there is very specific knowledge and expertise you have to have in order to conduct the kinds of financial modeling or econometric techniques we've been talking about, and using software actually makes them accessible to a much broader population. That's the business model that we've used in the private sector, and have been using with the education institutions that we partner with as well. The second area is that experimentation is actually a resource-saving type of analysis. It's not enough just to say that we only have so many resources, that we're taking some from somewhere and that's a net negative; if we're identifying places where you could actually be saving time, or could be using money better, that's a net resource saving. That's what our private sector clients have testified to, and in the example of the charter school and the turnaround schools, we found similar opportunities.

In conclusion, it's a really exciting time right now, because we have more data, more interest and more ability than ever before to actually leverage the most powerful tool that we have to understand the incremental impact of any program. If we want to identify a good deal, we need to understand what works, and experimentation is the best way for us to do that. All of our social sector partners, like our private sector clients, have the capability to make this an ongoing practice rather than just a one-off exercise. We want to be able to learn, and to test to learn, in order to make this a part of what we're all able to do to identify good deals. Thank you. [applause]

>> Hope Neighbor: Sarah, thank you very much. I have a question to start off the discussion. This approach has a clear benefit in terms of being able to understand how we can revise the way we do education in order to achieve better outcomes. It also seems really heady and potentially difficult for a wide range of stakeholders to understand. As this approach has been implemented by different school districts, what are the greatest challenges that you've seen in implementation, and how would you suggest districts approach things differently when they implement your approach?

>> Sarah Zampardo: When I think about even having an opportunity or space within a district to actually use it, the challenges are really the four fundamental ones I went over. But what it comes down to even more concretely is incentives, and aligning incentives, which I know is a topic we've already talked about. In a school district you need to have someone who is a leader and comes to the front and wants to understand the impact of particular programs.
Because it's not expected within that community to necessarily know whether a particular program a principal is running to engage parents is successful, or more successful than another; if the principal is happy and the parents are happy, who am I to say that it's not working? I should just leave well enough alone. So there are different incentives that come together and are at play, and if there's not a leader within the group who steps forward and says, it's really important to understand how we're spending money, it's important to understand which of the various programs are doing the most for our standardized test scores, then it's not going to happen. It's not enough just to have someone at a lower level, or a principal, who's able to take it on, because it is so controversial within the school district.

>> Catherine Havasi: When you listed the different challenges that you run into, I think another one is actually longitudinal studies. A lot of different intervention programs in education have a short-term bump, maybe a school year or a couple of weeks, and in the long term sometimes it flattens out or even does damage. A lot of the things that you've talked about here are testing in a one-school-year sort of range. Have you guys done any longitudinal work? Also, how can we look at different features in the short-term work to see if we can try to predict longitudinal impact?

>> Sarah Zampardo: Sure. I think there are two ways to talk about that. One of them is that when you look at the past, doing natural experiment analysis or back-tests, there actually is still an incredible amount of data you can be using. It's still based on just looking at, for example, K-12 education; it's not looking at how students did in college, or how they did after college with getting jobs, which I think is the true longitudinal question you're talking about, but at least we can mobilize as much as we have within the sphere of K-12 education for types of analysis like that. When you talk about taking it out into further longitudinal work, we get into the issue of identifiers, and having some unique way of tracking a student from what they were in Chicago Public Schools to what they were when they went to Columbia College, etc. Finding that level of information is something I've only really seen done successfully when there's a whole other study, usually from an academic perspective or from a think tank, that tries to recreate all of the data over time. So it does not exist in databases today in a way that would enable you to do analysis like that.

>> Catherine Havasi: You can also do the middle school to high school sort of thing.

>> Sarah Zampardo: Yeah, so anything within the same sphere, we've looked at. The other thing I would say is that part of why we're so interested in longer studies is because we don't have enough readily available data to make changes more quickly, or to learn from it more quickly. So I think what's exciting is that that's actually changing today, as more technological innovation comes into classrooms.
One firm that comes to mind is Amplify. If you have a way for teachers at the end of every single lesson to monitor how well students are actually receiving that information, then instead of having one data point at the end of every year, you have a data point for every single lesson, for every single day, for every student, and then you can quickly make changes and actually learn, at a much more micro level, what would work better for teaching students a given lesson, or what kinds of classroom arrangements matter, or what class start times or end times, or what you should be doing after school. All of those different issues could actually be addressed if you also have lower-level data.

>> Tim Ferguson: So in the case of the charter school: you talked about the charter school itself, but in terms of the intervention, are there any other service providers, or nonprofits for example?

>> Sarah Zampardo: Have we worked with nonprofits, is your question?

>> Tim Ferguson: Vis-à-vis that particular instance, where you've seen the improvement and you're using the control group outside: are there other interventions taking place that are not what I think of as the more obvious ones, which are around the technology in the classroom and that sort of thing? For example, there are a number of programs being run now in this country where there are "teacher assistants" (I'm not talking about TFA, I'm talking about another particular program), where they can demonstrate very, very clearly that attendance is up, the length of the day has increased, and that the scores are improving as well. So it's not just the technology piece, and it's harder data—well, you can collect the data, and whether it's cause and effect or not, time will show—but in that charter school case, is that happening there as well?

>> Sarah Zampardo: Yes. I would call it a cocktail of different solutions that makes the school experience different for students in that school, or that network, than it is for another school. And so, except in cases where there's some discontinuity in time in the application of those different programs, it's not possible to disentangle one element from another. What I mean by that is that the charter school has a whole ethos: they all get together at the beginning of the day and talk about their values, and they also have different classroom setups, and so disentangling one from another with this type of analysis isn't possible. But getting to the other sorts of information, you could start looking at: if a student did or did not actually attend the early morning exercises, does that have an impact? But yes, there are a lot of different effects that are a part of what that charter school represents.

>> Mayur Patel: Randomized controlled trials and quasi-experimental designs have been around for a while. I'd be curious to know what you think, as the data analytics space gets more sophisticated and crosses over more, where it is going to disrupt those methodologies the most. Is it going to be on the segmentation, on the control groups, on the collection side... what do you see happening in that area?

>> Sarah Zampardo: Because you're asking how it changes as it develops, I think I can speak a little bit to what I've seen in the private sector, because that's actually a place where it is very developed.
So, what differentiates experimental design, and how does it really disrupt some of the other types of analysis we often see in our clients? There are a few different ways. One of them is how you're measuring the counterfactual, or what the baseline is. How you determine what the control should be, and how you measure how good the control is at actually predicting the performance of the test, is a very challenging and sophisticated question, and there are lots of exciting ways to do that really well. As you get more data and better data, you can even measure how much better you can get at that. So I think the industry is constantly learning how to do that better. I think you're right that segmentation is another important piece. The way that we think about the learnings that come from any test is: first of all, what did it do? What did the program drive on an average level? Second, are there different elements of the program that work better than others? Is it that high hours versus low hours is better, or a 50-cent price increase versus a 40-cent versus a 30-cent? And then the third question is about segmentation, which I think is where a lot of the action actually is. So instead of just saying on average it's effective, you're saying it's more effective with this type of student than with that type of student, or it's more effective with a school in this neighborhood than with a school in that neighborhood. Getting to that level of segmentation is really hard to do without a sophisticated data infrastructure, because it's that much more data you have to bring into your model after you've already gotten to the first question, which is really what people asked you for in the first place. So when you have it all collected together and you're able to get to the segmentations really quickly, that's when you can be very fine-tuned about how you're taking that intervention, or using those resources, and applying them in a way that you'll see the greatest increase for whatever you're actually using them for.

>> Annie Donovan: I wanted to ask a question because I—I think Fozzie Bear is really cool! [laughter]

>> Bill Wilkie: I've been trying to get this guy to clean my shoes all morning! [laughter]

>> Annie Donovan: I know, it's, like, so cool! So what do you see happening in the—you're doing this experimentation inside a government framework. And in education, maybe it's easier, maybe because for charter schools, like the network that you're working with, it's really critical to be able to show value-add and all that. But how do you see adoption happening more broadly in government? I'm just coming off the experience of working inside the federal government for a year, in the Office of Management and Budget, being obsessed with: how do we do more of this? But it's really hard to get it off the dime.

>> Sarah Zampardo: Yeah. It's a challenging question. I feel like I am equipped to speak to education, and not as equipped to speak to how it would apply to, for example, different programs in prisons that help reduce recidivism. I would say that there's a lot of exciting work that has been done in that space as well that uses experimental design.
The challenge is that in order to use experimental design, you have to have a sphere that you can separate from the other things that are going on, and even within that sphere it's going to be noisy, with a lot of spurious things going on. It means that using experimental design isn't necessarily always possible for a program that has to be the same for everyone on a national level. It's worked in places where you can control more fully the entire experience that someone has within your sphere, which is why experimentation in prisons or experimentation in schools is more possible than experimentation on the tax structure. Your question really gets to what the underlying requirements of actually using this methodology are. If we say that it's a really good methodology to use, what do we need to have in place in order to push it there? I also think it's still about incentives: whether everyone actually wants the data and analysis to be available for anyone to use. That's a challenge for the government when they're validating their different programs, to have it be known that maybe some of the things they're doing aren't as effective as others.

>> Annie Donovan: Just in terms of market demand for your services, do you see any pickup in the government sector, or...?

>> Sarah Zampardo: We haven't really been focusing there, for the reasons I'm talking about.

>> Richard Ling: Are you actually, I've been reading articles about a digital revolution happening in education—

>> Sarah Zampardo: Sorry, the what?

>> Richard Ling: I've been reading articles about, there's a lot of excitement around bringing tablets into schools and things like that, right?

>> Sarah Zampardo: Yes.

>> Richard Ling: Are you familiar with some of those projects? That's where I can see you can get real-time data on tremendous amounts of information, right?

>> Sarah Zampardo: Exactly.

>> Richard Ling: What students spend more time on, and translating that to more effective scores, etc.

>> Sarah Zampardo: Yeah. There's both a lot of data there and a lot of interest in making sure those tech companies can prove their value to districts, because it takes so much to actually get into districts. The more data they have to prove what they're doing and why other districts should bring them in, the better; it's a win-win when it comes to tech in education.

>> Richard Ling: But are the districts open to bringing these things in?

>> Sarah Zampardo: There are different groups of innovative schools, and superintendents choose to be a part of consortiums or conferences that really bring these folks together, and tech companies will pitch to those different groups of districts. And some districts are more respected for being open to different types of ideas. New York famously has a lot of different programs that they're trying. Another one that's interesting is the Recovery School District in Louisiana, which was created after Hurricane Katrina. Because they're so interested in finding out what works, and there are so many challenges across their schools, it's an opportunity to try a lot of different things, and they're open to things not working in a way that superintendents in other districts aren't. So when it comes to smaller programs, like just introducing a tablet, it's really for the tech company to prove to the district that they should spend money on that new idea.

>> Hope Neighbor: Great.
We have time—Paul had a question, and then we have time for one more after that.

>> Paul Light: I just had a quick observation that in looking at nonprofits and government, the big barrier is politics. There's only one person who mentioned it, and David kind of got to it by saying that Democrats and Republicans have different utility functions. [laughter] But Democrats and Republicans also have different data. It used to be, as Moynihan once said, that everybody's entitled to their own opinions, but not to their own data, but that is no longer the case. So I think that's something for us to be thinking about as the conversation rolls on: the ideologies, the politics, of getting good information that's valid, empirically grounded, that can be used and relied upon. And what might be the flags that would identify a bad piece of data? Now, Elizabeth, I think, talked about "creepy", which was the first time I heard someone talk about a negative externality, and I do think creepy is a negative externality of some of this, right? You know? So I think there's a lot going on underneath here, as we're talking about how wonderful this is, that we ought to grapple with. Politics is here, and that's why you [to Annie Donovan] had a tough time at OMB, you know? Why they're hammering, hammering, hammering, you know—there's a lot of politics here.

>> David Rabjohns: Yeah, I just had a piece of small data on the tablet issue: my sixteen-year-old told me yesterday that all the kids in the school have tablets, and all they do is play games on them and don't do any work. So I don't know if that's relevant, but— [laughter] From the sixteen-year-old in the school.

>> Sarah Zampardo: And that's experimental efficacy for you! [laughter]

>> Richard Ling: Are people playing more games doing better in school? Are people playing less?

>> David Rabjohns: Are they getting better at games?

>> Hope Neighbor: I think that's actually a great note on which to wrap up the conversation. Sarah, thank you very, very much. Thanks to all of our panelists. [applause]