Welcome, everybody. It's a great pleasure to welcome you to our C.C. Mei Distinguished Seminar Series. This is a series that is sponsored by the Department of Civil and Environmental Engineering and the C.C. Mei Fund, and this is our first Distinguished Seminar of the term. It's a great pleasure to see it's a full house. Hopefully the people that will be late will still find some seats. And so for today's inaugural talk of the term, we will be hearing from Professor George Sugihara. George Sugihara is a Professor of Biological Oceanography in the Physical Oceanography Research Division, Scripps Institution of Oceanography at UC San Diego. I'm co-hosting Professor George Sugihara with Professor Serguei Saavedra here in CEE. Professor Sugihara is a data-driven theoretician whose work focuses on developing minimalist inductive theory, extracting information from observational data with minimal assumptions. He has worked across many scientific domains, including ecology, finance, climate science, medicine, and fisheries. He is best known for topological models in ecology, empirical dynamic forecasting models, research on generic early warning signs of critical transitions, and methods for distinguishing correlation from causal interaction in time series, and he has championed the idea that causation can occur without correlation. He provided one of the earliest field demonstrations of chaos in ecology and biology. Professor Sugihara is the inaugural holder of the McQuown Chair of Natural Science at the Scripps Institution of Oceanography at UCSD. He has won many other awards and recognitions, including being a member of the National Academies' Board on Mathematical Sciences and Their Applications for a few years. And today, he will discuss understanding nature holistically and without equations. And that's extremely intriguing for all of us. So without further ado, please join me in welcoming Professor Sugihara. [APPLAUSE]

This is in my presenter notes, so I'm reading it off of the screen here. I want to make a disclaimer, however. In the abstract, it says that these ideas are intuitive. Are you good? Are we good? OK. So the abstract says that the ideas that I'm going to present are intuitive, but this is not entirely true. In fact, for whatever reason, at one point the playwright Tom Stoppard approached me, and he said that he was interested in writing something about these ideas and wondered if it would be possible to explain them to a theater audience. Just read the dark black text there. His response was that if he tried to explain it to a theater audience, they'd probably be in the lobby drinking before he got through the first sentence. So the ideas are in fact decidedly counter-intuitive. And this is a fact that in a sense goes against how we usually try to understand things. I'll explain what that means in a second.

So we're all familiar with Berkeley's famous dictum, but despite this warning, correlation is very much at the core of Western science. Untangling networks of cause and effect is really how we try to understand nature. It's essentially what the business of science is all about. And for the most part, and very much despite Berkeley's warning, correlation is very much at the core of how we try to get a grasp on this. It's an unspoken rule, in fact, within science and in how we normally operate, that correlation is a reasonable thing to do. It's innocent until proven guilty.
Thus, distinguishing this intuitive correlation from the somewhat counter-intuitive causation is at the crux, and it's the topic of this talk today. So I'm going to develop a discussion for making this distinction that hinges on two main elements. First, the fact that nature is dynamic and that the temporal sequence matters, meaning that nature is better understood as a movie than as snapshots, OK? And second, the fact that nature is nonlinear, that it consists of interdependent parts that are basically non-separable, that context really matters. Nature can't be understood as independent pieces; rather, each piece needs to be studied in the context surrounding it.

So let's start with a nice, simple example. All right. Consider these two time series. One might be a species, or these might be two species interacting, or one might be an environmental driver and a responding species, or a driver and a physiological response, or money supply and interest rates, something like that. So if you look at 10 years of data, your first hypothesis is that these things are positively correlated. You have this kind of working model for what's going on. If you roll forward another dozen years, you find your hypothesis holds, but then it falls apart a little bit here in the middle, right in here. And then it sort of flips back on here towards the end. So out of 18 years of observations, actually more like 22 years of observations, we find that our hypothesis that these things are correlated is a pretty good one. If this was a pattern from ecology, we'd say that this is a really good hypothesis. So we might make an adaptive caveat here, kind of an excuse for what happened when it became uncorrelated, but more or less, this looks like a pretty good hypothesis. This is, however, what we see if we roll forward another couple of decades. In fact, for very long periods of time, these two variables are uncorrelated. They're totally unrelated. They appear, in a statistical sense, to be unrelated, but they were actually generated from a coupled two-species difference equation. So this is a simple example of nonlinear dynamics. We see that two things can appear to be coupled for short periods of time, then uncoupled, and over very long periods of time there's absolutely no correlation. So not only does correlation not imply causation, but with simple nonlinear dynamics, lack of correlation does not imply lack of causation. That's actually something that I think is fairly important. In retrospect, what I just showed you might seem obvious, but apparently this is not well known, and it contradicts a currently held view that correlation is a necessary condition for causation. It was Edward Tufte who said that empirically observed covariation is a necessary condition for causation.

OK. So the appeal of correlation, I think, reflects the physiology of how we learn. And one can argue that it's almost wired into our cognitive apparatus. So the basic notion behind Hebbian learning is that cells that fire together wire together. So the mechanism of how we learn is really very supportive of the whole notion of correlation. I think it's very fundamental to how we perceive things as human beings. OK. The picture that emerges is not only that correlation does not necessarily imply causation, but that you can have causation without correlation. OK, and this is the realm of nonlinear systems.
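To make the mirage-correlation point concrete, here is a minimal simulation sketch. It is not the speaker's exact model; it assumes a coupled two-species logistic difference equation of the kind used in this literature, with illustrative parameter values, and simply compares the long-run correlation with short windowed correlations.

```python
# Minimal sketch (not the speaker's exact model): two coupled logistic maps,
# where each species weakly forces the other. Parameter values are
# illustrative assumptions of the kind used in the CCM literature.
import numpy as np

def coupled_logistic(n=1000, rx=3.8, ry=3.5, bxy=0.02, byx=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = rng.uniform(0.2, 0.8, size=2)
    for t in range(n - 1):
        x[t + 1] = x[t] * (rx - rx * x[t] - bxy * y[t])   # Y -> X coupling
        y[t + 1] = y[t] * (ry - ry * y[t] - byx * x[t])   # X -> Y coupling
    return x, y

x, y = coupled_logistic()
# Long-run correlation is near zero even though the variables are causally coupled...
print("overall r =", np.corrcoef(x, y)[0, 1])
# ...but short windows show "mirage" correlations that come, go, and change sign.
w = 25
rolling = [np.corrcoef(x[i:i + w], y[i:i + w])[0, 1] for i in range(0, len(x) - w, w)]
print("windowed r range:", round(min(rolling), 2), "to", round(max(rolling), 2))
```

Run over a long stretch, the overall correlation is close to zero even though the two variables are dynamically coupled, while individual windows can show strong positive or negative correlations that later vanish or flip sign.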
This realm of nonlinear systems is interesting because it is also the realm of biological systems. So within this realm, there's a further consequence of non-linearity that was demonstrated in the model example, and that's this phenomenon of mirage correlation: correlations that come and go and that even change sign. So here is a nice, simple example of mirage correlation. This is an example not from finance but from ecology. This is a study by John McGowan, and it was an attempt to try to explain harmful algal blooms at Scripps, these red tides. So these spikes here are spikes in chlorophyll found at Scripps Pier. And what we see in the blue at the bottom are sea surface temperature anomalies. And so the idea was that the spikes in chlorophyll were really caused by the sea surface temperature anomalies. This is about a decade's worth of observations. They were about to publish it, but they were kind of slow in doing so. And in the meantime, this correlation reversed itself. And not only did it reverse itself, it then became completely uncorrelated. So I think this is a classic example of a mirage correlation.

OK. So here's another example from Southern California. Using data up to 1991, there's a very significant relationship between sea surface temperature here and this measure of sardine production, so-called recruitment. This was reported in '94 and was subsequently written into state law for managing harvest. So if you are above 17 degrees, the harvest levels are higher. If you're below 17 degrees, they were lower. However, if you add data from '94 up to 2010 to this existing data set, this is what you find. The correlation seemed to disappear in both cases. So these are two different ways of measuring productivity, and the correlation disappeared in both of them. So this statute that was written into state law has now been suspended. And this is where it now stands. All right, so another famous example from fisheries was this meta-analysis of 74 environment-recruitment correlations that were reported in the literature. So these correlations were tested subsequent to the publication of each original paper by adding additional data to see if they were upheld. And only 28 out of the 74 were. And among the 28 that were upheld was the sardine, so we know what happened there. OK, so relationships that we thought we understood seemed to disappear. This sort of thing is familiar in finance, where relationships are uncovered but often disappear even before we try to exploit them.

OK. So how do we address this? The approach that I'm going to present today is based on nonlinear state space reconstruction, which I refer to here by a less technical but I think more descriptive name, empirical dynamics. So EDM, Empirical Dynamic Modeling, is basically a holistic data-driven approach for studying complex systems from their attractors. It's designed to address nonlinear issues such as mirage correlation. I'm now going to play a brief video that I think is going to explain it all. This is something that my son actually made for me when I tried to explain it to him. And he said, no, no, no, you can do this-- it doesn't take three hours to explain this to someone. You can do this in like two minutes with a reasonable video. So he made this nice video for me. The narration is by Robert May. [VIDEO PLAYBACK] - This animation illustrates the Lorenz attractor.
The Lorenz system is an example of a coupled dynamic system consisting of three differential equations, where each-- [END PLAYBACK] Oh, technical difficulties. Sorry. Let me start it again. Hold on. [VIDEO PLAYBACK] - This animation illustrates the Lorenz attractor. The Lorenz system is an example of a coupled dynamic system consisting of three differential equations where each component depends on the state and dynamics of the other two components. Think of each component, for example, as being a species-- foxes, rabbits, grasses. And each one changes depending on the state of the other two. So these components shown here as the axes are actually the state variables, or the Cartesian coordinates, that form the state space. Notice that when the system is in one lobe, X and Z are positively correlated. And when the system is in the other lobe, the other wing of the butterfly, X and Z are negatively correlated. We can view a time series thus as a projection from that manifold onto a coordinate axis of the state space. Here we see the projection onto axis X and the resulting time series recording displacement of X. This can be repeated on the other coordinate axes to generate other simultaneous time series. And so these time series are really just projections of the manifold dynamics onto the coordinate axes. Conversely, we can recreate the manifold by projecting the individual time series back into the state space to create the flow. On this panel, we can see the three time series, X, Y, and Z, each of which is really a projection of the motion on that manifold. And what we're doing is the opposite here. We are taking the time series and projecting them back into the original three-dimensional state space to recreate the manifold. It's a butterfly attractor. [END PLAYBACK]

OK. To summarize, these time series are really observations of motion on an attractor. Indeed, the jargon term in dynamical systems is to call a time series an observation function. Conversely, you can actually create attractors by taking the appropriate time series, plotting them in the right space, and generating some kind of a shape. OK, this is really the basis of this empirical dynamic approach. What is important, I think, to understand here is that the attractor and the equations are actually equivalent. Both contain identical information, and both represent the rules governing the relationships among variables. And depending on when they are viewed, these relationships can appear to change. And this is what can give rise to mirage correlations. So over the short term here, there might be correlations. But over a longer term-- so for example, if it's in this lobe-- I'm very bad with machines. All right. If it's in that lobe, you'll get a positive relationship. If it's in the lobe on this side, you'll get a negative correlation. If you sample the system sparsely over long periods of time, you'd find no apparent correlation at all, OK?

OK, let's look at another real example of this. So this is an application that I was initially skeptical about, mainly because I couldn't see how to get time series. But luckily, I was wrong here. These are experimental data obtained by Gerald Pao from the Salk Institute on expression levels of the transcription factor SWI4 and the cyclin CLN3. This is in yeast. If you view it statistically, there is absolutely no statistical relationship between these two variables. There's no cross-correlation.
However, if you connect these observations in time, they're clearly inter-related. So we see the skeleton of an attractor emerging. So the way that they generated this data, actually-- when I was originally approached about this, they said, well, we want to apply these methods to gene expression. And I said, but you can't make a time series for gene expression. And they said, oh, yes, we can. And what they did in this case, because it was yeast, is that they were able to shock the cells, which synchronizes them in their cell cycle, and then sample them every 30 minutes for two days. And so at each sample, they would sequence several thousands of genes, and do this every 30 minutes for two days. You can do a lot if you have post-docs and graduate students, all right? OK. So we were able to get this thing to actually reflect an attractor. Very interesting. Of course, if you randomize these observations in time, you get absolutely nothing. In two dimensions you still get singularities, these crossings. However, if you include the cyclin CLB2, the crossings disappear, OK? So we have this nice cluster of three things that, if you looked at them statistically, appear to be uncorrelated, essentially invisible to bioinformatics techniques, but that are, in fact, dynamically interacting.

So here is another short video clip that I think presents what I consider to be a really important basic theorem that supports a lot of this empirical dynamics work. [VIDEO PLAYBACK] - There's a very powerful theorem proven by Takens. It shows generically that one can reconstruct a shadow version of the original manifold simply by looking at one of its time series projections. For example, consider the three time series shown here. These are all copies of each other. They are all copies of variable X. Each is displaced by an amount tau. So the top one is unlagged, the second one is lagged by tau, and the blue one at the bottom is lagged by two tau. Takens' theorem then says that we should be able to use these three time series as new coordinates and reconstruct a shadow of the original butterfly manifold. This is the reconstructed manifold produced from lags of a single variable, and you can see that it actually does look very similar to the butterfly attractor. Each point in the three-dimensional reconstruction can be thought of as a time segment, with different points capturing different segments of the history of variable X. This method represents a one-to-one map between the original manifold, the butterfly attractor, and the reconstruction, allowing us to recover states of the original dynamic system by using lags of just a single time series. [END PLAYBACK]

OK. So to recap, the attractor really describes how the variables relate to each other through time. And Takens' theorem says quite powerfully that any one variable contains information about the others. This fact allows us to use a single variable to construct a shadow manifold, using time lags as proxy coordinates, that has a one-to-one relationship with the original manifold. So constructing attractors from time series data is the real basis of the empirical dynamic approach. And as we see, we can do this univariately by taking time lags of one variable. We can do this multivariately with a set of native coordinates, and we can also make mixed embeddings that have some time lags as well as some multivariate coordinates. Before the examples, here is a minimal sketch of what a lagged-coordinate reconstruction looks like in practice.
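As a concrete illustration of the lagged-coordinate idea (not code from the talk), here is a minimal sketch that reconstructs a shadow manifold from a single observed variable. The Lorenz system stands in for the butterfly attractor in the video, and the lag tau and embedding dimension E = 3 are illustrative choices.

```python
# Minimal sketch of a Takens-style lagged-coordinate ("shadow") reconstruction.
# The Lorenz system stands in for the butterfly attractor in the video; the
# lag tau and embedding dimension E = 3 are illustrative choices.
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

t = np.linspace(0, 100, 10000)
sol = solve_ivp(lorenz, (0, 100), [1.0, 1.0, 1.0], t_eval=t)
x = sol.y[0]                      # the observation function: just variable x

def shadow_manifold(series, E=3, tau=8):
    """Stack lagged copies (x_t, x_{t-tau}, ..., x_{t-(E-1)tau}) as rows."""
    rows = [series[(E - 1 - i) * tau: len(series) - i * tau] for i in range(E)]
    return np.column_stack(rows)

Mx = shadow_manifold(x)           # points on the reconstructed (shadow) attractor
print(Mx.shape)                   # (len(x) - (E-1)*tau, E)
```

Plotting the three columns of Mx against each other gives the "shadow butterfly" shown in the video; the same helper works for any scalar time series.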
So let's look at some examples. This first one uses lags with an expression time series. This is a mammalian model: mouse fibroblast production of an insulin-like growth factor binding protein. And again, this is the case of synchronizing and then sampling over a number of days. So clearly gene expression is a dynamic process, which is quite a radical departure, I think, from normal bioinformatics approaches, which are essentially static. OK. Here we have another ecological example. These are attractors constructed for sockeye salmon returns, and this is for the Fraser River in Canada, which is like the iconic salmon fishery. And you can see that for each one of these different spawning lakes, you get an attractor that looks relatively similar. They all look like Pringle chips, basically. And what's interesting about this-- and I'll talk about this a little bit more later-- is that you can use these attractors that you construct from data to make very good predictions. And the fact that you can make predictions, and make these predictions out of sample, should give you some confidence that this is reasonable.

So again, I'm talking about a kind of modeling where there really are almost no free parameters. There's one in this case, right? I'm assuming that I can't adjust the fact that I'm observing this once a year. So that's given. Tau is given. The time lag is given. The only variable that I'm using here that I need to estimate is the number of dimensions, the embedding dimension. In this case, I'm showing it in three dimensions. Not all of these attractors, of course, are going to be three-dimensional. The ones that I'll show you tend to be, only because you can see them and it's easy to understand what's going on. So the basic process really involves very few assumptions and only one fitted parameter, that fitted parameter being the embedding dimension. OK. So the fact that I'm able to get-- this is, again, just using lags-- something coherent in three dimensions means that I might be able to construct a mechanistic model that has three variables. So maybe sea surface temperature, river discharge, maybe spawning, smolts going into the ocean, something like that.

OK. So again, one of the most compelling features, I think, of this general set of techniques is that it can be used to forecast. And the fact that you could forecast was something that originally got me interested in this area, in this set of techniques. And it kind of led me into finance, so I worked for about half a decade as a managing director for Deutsche Bank. And things like this were used to manage on the order of $2 billion a day in notional risk. So it's very bottom line, it's very pragmatic, and it's verifiable with prediction, all of which I find-- plus it's extremely economical. There are very few moving parts.

OK. So I'm going to quickly show you two basic methods for forecasting. There are many other possibilities that exist, but these are just two very simple ones: simplex projection and S-maps. So simplex projection is basically a nearest neighbor forecasting technique. Now you could imagine having the number of nearest neighbors be a tunable parameter, but the idea here is to be minimal, and the number of nearest neighbors is essentially determined by the embedding dimension. So if you have an embedding dimension of E, a point in an E-dimensional space can be an interior point in E plus one dimensions, which means you just need E plus one neighbors. And so the number of neighbors is determined.
It's not a free variable in this, OK? So the idea then is to take these nearest neighbors in this space, which are analogs, project them forward, and see where they went, and that'll give you an idea of where the system is headed. OK. So again, each point on this attractor is a history vector, or a history fragment, basically. And so here is the point that I'm trying to predict from. I look at the nearest neighbors-- these are points in the past, right?-- and now I say, where did they go next? And so I get a spread of points going forward, and I take the center of mass of that spread, the exponentially weighted center of mass, and that gives me a prediction. So how do you predict the future? You do it by looking at similar points in the past. But what do you mean by similar? What you mean by similar is that the points have to be in the correct dimensionality. So for example, if I'm trying to predict the sea surface temperature at the end of Scripps Pier tomorrow, and it's a three-dimensional process, and let's say the right lag should be a week, then I'm not just going to look at temperatures that are similar to today's temperature. I'm going to look at times where today's temperature, the temperature a week ago, and the temperature two weeks ago are most similar, right? And so knowing the dimensionality is quite important for determining what the nearest neighbors are, all right? So you take the weighted average, and that becomes your prediction.

Here's an example. This looks like white noise. What I'm going to do is cut this data in half, use the first half to build a model, and predict on the second half. So I take time lag coordinates, and in this case, again, I'm choosing three dimensions on purpose, because they're easy to show. This is like taking a fork with three prongs, laying it down on the time series, and calling one x, the other one y, the other one z. So I plot all those points going forward, and this is the shape I get. So what looked like white noise and totally random actually was not. In fact, I generated it from first differences of [INAUDIBLE], OK? So if we now use this simple zeroth order technique and we try to predict that second half of the time series that looked totally noisy, you can do quite well. This is actually predicting two points into the future, two steps into the future.

OK. So again, how did I know to choose three dimensions? Basically you do this by trial and error. You try one, two, three, and it peaks. So this is, again, how well you can predict-- this is the Pearson correlation coefficient-- while trying different embedding dimensions, trying a two-pronged fork, a three-pronged fork, and so on. And so the embedding with the best predictability is the one that best unfolds the attractor, the one that best resolves the singularities. And this relies basically on the Whitney embedding theorem. So if the attractor actually was a ball of thread, OK, and I tried to embed this ball of thread in one dimension, that would be like shining a light down across a line. Then at any point, I could be going right or left. So there are singularities everywhere. If I shine it down on two dimensions, I now have a disk. At any point I can go right, left, up, down, and so forth. Everywhere is a singularity. But the thread is one-dimensional, right? If I now embed it in three dimensions, all of a sudden I can see that I have individual threads. And if you have these individual threads, that allows you to make better predictions, right? So this is how you can tell how well you've embedded the attractor: by how well you can predict with the attractor. A minimal sketch of simplex projection in code follows.
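This is a minimal sketch of the simplex-projection idea as described in the talk: nearest-neighbor forecasting with E+1 neighbors and exponential distance weighting. The details (Euclidean distance, one-step-ahead forecasts, a simple train/test split) are illustrative assumptions, not the rEDM package code.

```python
# Minimal simplex-projection sketch: forecast one step ahead by finding the
# E+1 nearest neighbors of the current lag vector among "library" vectors from
# the first half of the data, then taking their exponentially weighted average
# next value. Details (distance metric, weighting) are illustrative assumptions.
import numpy as np

def embed(series, E=3, tau=1):
    """Lag-vector matrix: row t is (x_t, x_{t-tau}, ..., x_{t-(E-1)tau})."""
    n = len(series)
    return np.column_stack([series[(E - 1) * tau - i * tau: n - i * tau]
                            for i in range(E)])

def simplex_forecast(train, test, E=3, tau=1):
    lib = embed(train, E, tau)[:-1]           # library vectors with a known "next" value
    lib_next = train[(E - 1) * tau + 1:]      # the value that followed each library vector
    # lag vectors over the test period (seeded with the last few training points)
    vecs = embed(np.concatenate([train[-(E - 1) * tau:], test]), E, tau)[:-1]
    preds = []
    for v in vecs:
        d = np.linalg.norm(lib - v, axis=1)
        idx = np.argsort(d)[:E + 1]           # the E+1 nearest neighbors (a simplex)
        w = np.exp(-d[idx] / max(d[idx][0], 1e-12))
        preds.append(np.sum(w * lib_next[idx]) / np.sum(w))
    return np.array(preds)                    # predictions for test[1], test[2], ...

# Usage: first differences of a chaotic logistic map look like white noise,
# yet simplex projection predicts them well out of sample.
x = np.empty(400); x[0] = 0.4
for t in range(399):
    x[t + 1] = 3.9 * x[t] * (1 - x[t])
s = np.diff(x)
train, test = s[:200], s[200:]
pred = simplex_forecast(train, test)
print("out-of-sample skill r =", round(np.corrcoef(pred, test[1:len(pred) + 1])[0, 1], 2))
```

Sweeping E from 1 upward and picking the value with the best out-of-sample skill is the trial-and-error embedding-dimension search the speaker describes.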
OK. All right. So the next order of complexity is basically a first-order map, which is a weighted autoregressive model where you're effectively computing a plane along the manifold, along this attractor, and using the coefficients of the Jacobian matrix that you compute for this hyperplane to give you predictions. But when you're computing this plane, there's a weighting function. It's this weighting function that we're calling theta here. And that weighting function determines how heavily you weight points that are nearby on the attractor versus points that are far away, OK? So if theta is equal to zero, then all points are equally weighted. That's just like fitting a standard AR model to a cloud of points, right? All points are equally valid. But if the attractor really matters, then points nearby should be weighted more heavily than points far away, OK? So if there's actual curvature in there, then by weighting more heavily, you're taking advantage of that information, OK? So this is if you crank theta up to 0.5: you're weighting points nearby more heavily, and so forth and so on.

OK. This is a really simple test for non-linearity. You can actually try increasing that theta, the tuning parameter. And if, as you increase it, the predictability goes up, then that's an indication that you get an advantage by acknowledging the fact that the function is different at different parts of the attractor, which is another way of saying the dynamics are state dependent, which is another way of saying the manifold has curvature to it, OK? So curvature is actually ubiquitous in nature. This is a study that my student [? Zach ?] [? Shee ?] did. And if you look at 20th century records for specific biological populations, you find all of them exhibit non-linearity. We didn't find non-linearity, actually, for some of the physical measurements. But again, we were just looking at the 20th century, and it might've been too short to pick that up. Other examples include other fish species, sheep, diatoms, and an assortment of many other kinds of phenomena. All show this kind of non-linearity. It seems to be ubiquitous. Wherever you look for it, it's actually rare that you don't find it, OK? So the fact that things are nonlinear is pretty important, I think. It affects the way that you should think about the problem and analyze it. And in fact, the non-linearity is a property that I believe can be exploited.

This is an example of doing just that. So this paper appeared last year in PRSB, and it used S-maps, this technique that we just saw, to show how species interactions vary in time depending on where on the attractor they are, OK? So it really showed how we can take real-time measurements of interactions that are state dependent, OK? And the basic idea is as follows. The S-map involves calculating a hyperplane, or a surface, at each point as the system travels along its attractor. So this involves calculating the Jacobian matrix, whose elements are partial derivatives that measure the effect of one species on another. A minimal S-map sketch in code is below.
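As a rough illustration of the S-map idea (a locally weighted linear fit whose coefficients play the role of Jacobian elements), here is a minimal sketch. The exponential weighting in theta follows the description in the talk; the specific formula and solver are assumptions, not the published implementation.

```python
# Minimal S-map sketch: at a target point on the (multivariate) attractor, fit
# a locally weighted linear map and read off its coefficients (a row of the
# Jacobian). theta controls how strongly nearby states are weighted; theta = 0
# reduces to an ordinary global linear (AR-like) fit. Illustrative only.
import numpy as np

def smap_coefficients(X, y, target, theta=2.0):
    """X: state vectors (rows), y: next value of the variable of interest,
    target: the state vector at which we want the local Jacobian row."""
    d = np.linalg.norm(X - target, axis=1)
    w = np.exp(-theta * d / d.mean())            # state-dependent weighting
    A = np.hstack([X, np.ones((len(X), 1))])     # affine fit: slopes + intercept
    Aw, yw = A * w[:, None], y * w               # weight the rows of the linear system
    coef, *_ = np.linalg.lstsq(Aw, yw, rcond=None)
    return coef[:-1], coef[-1]                   # partial derivatives, intercept
```

Increasing theta and checking whether out-of-sample forecast skill improves is the simple nonlinearity test described above, and evaluating the coefficients as the target point moves along the attractor gives the time-varying interaction strengths discussed next.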
Note that the embeddings here are multivariate. So these aren't lags of one variable; they're native variables, right? I want to know how each native variable affects the other variables and how that changes through time. So what I do is, at each point, I compute a Jacobian matrix. If this were an equilibrium system, there would just be one point, and I would be looking at-- it's like the standard linear stability analysis for an equilibrium system. But what I'm doing is taking that analysis and applying it at each point as the system travels successively along the attractor. So the coefficients are in effect fit sequentially as the system travels along its attractor. And they vary, therefore, according to the location on the attractor. So what's really nice about this is that it's something that you can actually accomplish very easily on real data.

And here's an example. This is data from a marine mesocosm that was collected by Huisman, and what you want to focus on is the competition between copepods and rotifers. These are the two main consumers in this system. So these are both zooplankton that eat phytoplankton. And this is basically the partial derivative of how the calanoids vary with the rotifers. And so you can see that the competition-- so this shows how the coefficients are changing as you compute them while the system is traveling along its attractor. What I think is interesting here is that I was totally surprised. Competition is not a fairly smooth, long-term relationship. In classical ecology, it's regarded as a constant: two species compete, you compute their alpha_ij, and that's the constant. In fact, it's very episodic. It seems to only occur in these little bottlenecks, which I think is-- I mean, this is nature. This is not my model. This is what nature is telling me, that you get competition in these little bottlenecks. So that fact I found fairly surprising. But what's even more interesting is to ask the question: what is it about the system, when competition does occur, that causes it? And it turns out that what you can do is make a graph basically of how that coefficient-- this is terrible. I think I got this when I talked at Stanford last fall. OK. All right. All right, it's broken. So you can make a plot of how the competition coefficient varies as a function of food abundance. And the obvious thing that you get here is: when do you get competition? When food is scarce. I mean, duh. That seems like it should be obvious. But what wasn't clear before is how episodic this all is. It's not a gradual, constant affair. It's something that happens in these sudden bottlenecks.

So what we have then is a pretty good tool for probing changing interactions. And I can see other potential for this, in terms of looking for-- you can compute the matrix and maybe compute something like an eigenvalue for the matrix as it changes, to look for instances where you were about to enter a critical transition. So this stuff really hasn't been written up yet. You should go ahead and do it. But I see a lot of potential for just using this fairly simple approach, which again is very empirical, and it allows the data to tell you what's actually happening.

OK. So let's see how EDM deals with causation. This is the formal statement of Granger causality. So basically he's saying, I'm going to try to predict Y2 from the universe of all possible variables. And this is the variance, my uncertainty in my prediction. Written out, the criterion looks like the statement below.
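For reference, here is one standard way of writing the criterion the speaker is describing; this is a reconstruction from the verbal description, not the slide itself.

```latex
% Y_1 Granger-causes Y_2 if removing Y_1 from the universe of information U
% increases the minimum prediction-error variance for Y_2:
\sigma^{2}\!\left(Y_2 \mid U\right) \;<\; \sigma^{2}\!\left(Y_2 \mid U \setminus \{Y_1\}\right)
```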
And it says that if, however, I remove Y1 and I'm trying to predict Y2, and this variance is greater, then I know that Y1 was causal. So it says: if I exclude a variable and I don't do as well at predicting, then that variable was causal. That's the formal definition of Granger causality. The problem, however, is that this seems to contradict Takens' theorem. Takens' theorem says that information about the other variables in the system is contained in each variable, OK? So how can you remove a variable if that variable's information is contained in the others? So there is a little bit of a problem. What's interesting is that if you look at Granger's '68 paper where he describes this, he says explicitly that this may not work for dynamic systems. So-- [LAUGHTER] He was covered. OK. So I think this is a useful criterion as a kind of practical rule of thumb. But it really is intended more for stochastic systems rather than dynamic systems.

OK. So in dynamic systems, time series variables are causally related if they're coupled and belong to the same dynamic system. If X causes Y, then information about X must be encoded in the shadow manifold of Y. And this is something that you can test with cross-mapping. This was the paper that was published at the end of 2012 that describes the idea. And I have one final video clip. It's not narrated by Bob May. I had my student [? Hal ?] [? Yee ?] do the narration on this one. But it'll explain it. [VIDEO PLAYBACK] - Takens' theorem gives us a one-to-one mapping between the original manifold and reconstructed shadow manifolds. Here we will explain how this important aspect of attractor reconstruction can be used to test whether two time series variables belong to the same dynamic system and are thus causally related. This particular reconstruction is based on lags of variable x. If we now do the same for variable y, we find something similar. Here we see the original manifold M, as well as the shadow manifolds, Mx and My, created from lags of x and y respectively. Because both Mx and My map one-to-one to the original manifold M, they also map one-to-one to each other. This implies that points that are nearby on the manifold My correspond to points that are also nearby on Mx. We can demonstrate this principle by finding the nearest neighbors in My and using their time indices to find the corresponding points in Mx. These points will be nearest neighbors on Mx only if x and y are causally related. Thus, we can use nearby points on My to identify nearby points on Mx. This allows us to use the historical record of y to estimate the states of x, and vice versa, a technique we call cross-mapping. With longer time series, the reconstructed manifolds are denser, nearest neighbors are closer, and the cross-map estimates increase in precision. We call this phenomenon convergent cross-mapping and use this convergence as a practical criterion for detecting causation. [END PLAYBACK]

OK. So with convergent cross-mapping, what we're trying to do is recover states of the causal variable from the affected variable. Let's see. The idea is that instead of looking specifically at the cause, we're looking at the effect to try to infer what the cause was. So basically from the victim, we can find something about the aggressor, the perpetrator, right? A minimal cross-mapping sketch in code is below.
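Here is a minimal sketch of the cross-mapping calculation: estimating the causal variable from the shadow manifold of the affected variable, with simplex-style neighbor weighting. The coupling equations, library sizes, and weighting details are illustrative assumptions; convergence is checked simply by repeating the estimate with a growing library.

```python
# Minimal convergent cross-mapping (CCM) sketch: use the shadow manifold of Y
# (the affected variable) to estimate contemporaneous values of X (the putative
# cause). If the estimate improves as the library of points grows, that
# convergence is taken as evidence that X causally influences Y.
import numpy as np

def embed(series, E=3, tau=1):
    n = len(series)
    return np.column_stack([series[(E - 1) * tau - i * tau: n - i * tau]
                            for i in range(E)])

def cross_map_skill(x, y, E=3, tau=1, lib_size=200):
    My = embed(y, E, tau)                 # shadow manifold of the affected variable
    x_true = x[(E - 1) * tau:]            # x values aligned with the points of My
    lib, lib_x = My[:lib_size], x_true[:lib_size]
    est = []
    for v in My[lib_size:]:               # estimate x at points outside the library
        d = np.linalg.norm(lib - v, axis=1)
        idx = np.argsort(d)[:E + 1]
        w = np.exp(-d[idx] / max(d[idx][0], 1e-12))
        est.append(np.sum(w * lib_x[idx]) / np.sum(w))
    return np.corrcoef(est, x_true[lib_size:])[0, 1]

# Usage: x forces y (but not the reverse), so cross-mapping from My recovers x,
# and the skill rises ("converges") as the library gets longer.
n = 1000
x, y = np.empty(n), np.empty(n)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])               # x evolves on its own
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.3 * x[t])  # x forces y
for L in (50, 200, 500):
    print(L, round(cross_map_skill(x, y, lib_size=L), 2))
```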
OK. This little piece, I think, will give you a little bit of intuition. These two time series are what you get if alpha is zero. So y is red and x is blue. And you can see that with alpha equal to zero, they're independent. If I crank up alpha, this is what I get. So again, you can see that the blue time series is not altered, but the red one, y, actually is. And it's in this alteration of the time series that I'm able, from the reconstructed manifold, to backtrack the values of the blue time series. And so that shows that x was causal on y. OK. A necessary condition for convergence is to show that the cross-map estimate improves with data length. And that's basically what we see here. So as points get closer on the attractor, your estimates should get better, and so predictions should get better.

So let's look at some examples. This is a classic predator-prey experiment that Gause made famous. So Didinium is the ciliate predator, Paramecium is the prey. And you can see you get cross-mapping in both directions, sure. The predator is affecting the prey, the prey is affecting the predator. This sort of looks like maybe the predator is affecting the prey more than the prey is affecting the predator. But if you look at this in a time-lagged way-- so this is looking at different prediction lags for doing the cross-mapping-- you find that the effect of the predator on the prey is almost instantaneous, which you kind of expect: these are ciliates eating paramecia. But the effect of the paramecia on the predator is delayed, and it looks like it's delayed by about a day or so. So you get a sensible time delay here.

OK. This is a field example. These are sardines and anchovies, which have been sort of a mystery for quite a while. They show reciprocal abundance patterns. And it was thought that maybe they compete. These are data for Southern California. It may well be that they are competitive in other areas, maybe the Sea of Japan, but not in Southern California. There's absolutely no evidence for a mutual effect of sardines on anchovies there. However, if you look at sea surface temperatures, you find that they're both influenced by sea surface temperature, but probably in slightly opposite ways. So that's kind of a nice result for that problem.

OK, now the final ecological example is these red tides. Episodic red tides are a classic example that no one has been able to predict. They've been thought to be regime-like, and the mechanism for this rapid transition has remained a mystery for over a century. So despite about a dozen or so Scripps theses all showing by experiment that certain factors should be important, none of them show a correlation. If you look at the field data, you actually don't see the correlation that you would expect if you had done the experiments. So this was exactly the case that we saw, for example, with the sea surface temperature anomaly and chlorophyll. You get these little temporal correlations that then disappear. So the absence of environmental correlations suggests that these events can't be explained by linear dynamics. And you can confirm this by doing an S-map test. You find, in fact, chlorophyll is very nonlinear. If you increase theta, prediction improves. But the most convincing thing is that you can actually find very good predictability using a simple manifold construction and forecasting with an S-map.
So the univariate construction, because you're just looking at the one variable, is really summarizing the internal dynamics, the intrinsic dynamics. And if you focus now on just the red tides, the correlation goes down. So we actually can't predict these red tides quite as well from just the simple internal dynamics, which suggests that there may be stochastic variables coming in to force the system, OK? So we then did the obvious thing, which was to apply CCM to these environmental variables that were thought to be important but that showed no correlation. And these candidate variables fall into two groups: those that describe nutrient history and those that describe stratification. If you look at correlation, you find very little correlation there at all. But if you do cross-mapping, just about all of them show a good cross-map skill, OK? So just about all of them contained some information. And so this was very encouraging. This was actually a class project, and there were eight of us involved. We did all this analysis, and we had data going up to 2010. The data from 2010 onward had not yet been analyzed. We had all the samples, but they hadn't been analyzed. And so we came up with our set of models, and then we were able to process the data, and we all sort of-- there were 16 fingers being crossed. And we did the out-of-sample test, and this is the result, which was very good. We actually found very good predictability, with a correlation coefficient of about 0.6. So this is a really nice example of out-of-sample forecasting using these methods. So we've learned something about the mechanism, that it has something to do with water stratification, the stability of the water column. And on top of that, we're actually able to forecast these red tides with some accuracy.

All right, so this is potentially the most exciting application of these ideas, and this is the last big piece that I want to talk about. This is experimental work being done in the Verma Lab at the Salk Institute. And this is the attractor that we saw earlier. Remember, these things were all mutually uncorrelated in a statistical sense, but we found they were causally related. And so if you make an attractor using all three, this is what you get. These things are also uncorrelated, but very strongly causally linked to the transcription regulator WHI5. So this suggested that one could do an experiment with WHI5 to see how well this method of CCM actually does at identifying uncorrelated causal links. So this is an example showing the uncorrelated linkage. You can see that WHI5 and SWI4 are completely uncorrelated. Those are the original time series. But if you do cross-mapping, you find that in fact there's a significant signal. You can recover values of WHI5 from values of SWI4 on the attractor. OK, so the experiment that this suggests is: if you alter the value of WHI5 artificially, if you experimentally enhance WHI5, then because it's causally related, it should produce an effect on the dynamics of these other genes. And so that's what we did. The black is wild type, and the purple dotted one is the manipulation. So the manipulation clearly deformed the attractor. And this is something that you can actually quantify pretty easily.
OK, so if you repeat this procedure for other genes showing a low correlation with WHI5-- this is the money panel right here-- you find that 82% of the genes that were identified by CCM to be causal-- these are all uncorrelated-- were actually verified by experiment to be causal, which is really good, because the industry standard for [INAUDIBLE] is 3%. So this is actually better. The other thing that I think makes this interesting is that these non-correlated genes that are also causal are thought to be signal integrators, and signal integrators may be really, really important for gene expression. So we'll see how this all goes. So I think that this could have immediate practical importance, because the networks that you generate this way can provide good guidance for the experiments that need to be done. You have 25,000 genes, and you could do 25,000 [INAUDIBLE] experiments, but that's just too much. And so you need something to narrow down what to focus on, and this may be a reasonable thing.

All right. So this is a mammalian example of the same sort of thing. This is a group of genes that Verma has studied for about 30 years-- so a very well studied group-- that have to do with the immune response. And this is the network that you would get if you just looked at cross-correlations. But this network turns out to be entirely wrong. There's a very well known bi-directional feedback between IκBα and RelA. And this is the network that you get with CCM. What's interesting is that this CCM network actually identifies another link, between RelA and Jun, that looks interesting and was not previously known. And this link, because you have this bi-directional feedback, should produce some kind of limit-cycle-like behavior. And so if you make a phase portrait of these two, you should see something that looks kind of limit-cycle-like. The same should be true here, OK? I'm almost done. All right. So if you do this, this is the known link, and we get something that looks kind of limit-cycle-like. This was the previously unknown link, OK, and you do get this behavior. And this was the incorrect link that was suggested by correlation. So kind of interesting.

All right. So there are a bunch of recent studies that have looked at this. I'll just go through them really fast. This one was focused on forecasting. OK, hold on. Let me go-- OK. So this one had to do with the incidence of cosmic rays in the 20th century, which has been used to suggest that climate warming is natural and not due to man. And what we did that was interesting is that we found that if you look over the 20th century, there is no causal relationship between cosmic rays and global warming. However, if you look at the year-to-year time scale, you find a causal signal. So in fact, it does have a very short-term effect on inter-year dynamics, but it doesn't explain the trend. OK. So this was a study on the Vostok ice core to see if we get direct observational evidence for the effect of greenhouse gases on warming. And we found it, of course. But the other thing that we found that was kind of interesting was that you actually have a link in the other direction as well, but it's delayed. So this is a more immediate effect; this one takes hundreds of years to occur, OK? And then this one focused on forecasting.
It was a great success story, because the models that we were able to produce got some interest in the Canadian press, and we made forecasts for 2014, 2015, and 2016 that were all pretty good. So, so far, so good. I don't know what's going to happen in 2017. So this is a nice example of out-of-sample forecasting. The classical models actually do worse if you try to include environmental variables. With these models, it does better. OK, and then this one appeared last fall. It was an application of these ideas to look at flu epidemics. And what's interesting here is that we were actually able to find a particular temperature threshold, 75 degrees, below which absolute humidity has a negative effect on flu incidence, and above which absolute humidity has a positive influence. And I think the hypothesized mechanism is that below 75 degrees, the main environmental stressor is viral envelope disruption due to excess water, right? Above 75 degrees, desiccation becomes the main environmental stressor. And so higher humidity actually helps flu incidence at higher temperatures, but it inhibits flu incidence at lower temperatures. And of course there are many other factors than absolute humidity, but this was one that came out. And it may actually be that the proximal driver is relative humidity, but relative humidity varies depending on whether you're inside or outside. Absolute humidity is much more robust; absolute humidity outside is going to be about the same as it is inside. So yeah, an interesting nuance. All right. This paper won the 2016 William James Prize in Consciousness. We'll stop at that.

All right. So I'm just going to stop there. And so these are my tasteless thematic closing slides. This is a little politically incorrect. In fact, all of them are politically incorrect. My wife told me this was cute, so you can blame her. All right. And this one is with particular reference to the fisheries models that are built on assumptions of equilibrium. There we go. Yeah. And then, as we all know, this is true. Thank you. [APPLAUSE]

All right, thank you very much for a great talk. Questions? Thank you for the nice talk. I would like to ask you, what is your feeling about the applicability of data-driven methods in general in systems with high intrinsic dimensionality? Let's say fluid flows, climate models. In this case, how do you choose the variables to model? And what is the effect of sparse data in phase space? Have you considered such issues? Yeah. Well, I think that-- well, I've chosen examples of opportunity. So the things that I've chosen have all shown attractors that are relatively low dimensional. They were taken from problems that you may not necessarily have thought in advance should be low-dimensional. Gene expression, for example, I figured should be very high-dimensional, but it turns out that there are facets of it that certainly look relatively low-dimensional. So this is kind of a copout answer, but really you just have to try. You have to see if you have data. So the place to start is just with data. You say, well, maybe I don't have the perfect observations. That's fine. But you need to start with some set of observations, and then you can build from there. And you might find that there are maybe two or three variables that are very tightly related, and something interesting might come out of that. So it really is kind of like following your nose.
I mean, you don't necessarily have the whole plan in advance, but what it requires is an initial set of data and the willingness to actually give it a try. So again, the dimensionality that we're getting in these models is not an absolute number. Surely any one of these things, even the fisheries examples, is probably really, really high-dimensional in principle. However, for all practical purposes, you can do quite well, and you can measure how much variance you can explain by how much predictability you have. You can do quite well with about four dimensions. Four is not a bad number of dimensions to use in some of these salmon returns models. So you can think of a problem hypothetically as being very high-dimensional, but if you have data, and that data actually shows that in maybe six or seven dimensions you can get some degree of predictability, then I think you have a little bit of insight into that problem. You've gained a little bit of advantage on that problem. Yeah.

OK. So it looks as if there is a bit of an assumption underlying some of this, where you kind of have to assume the underlying manifold remains stable while you're collecting these data to populate this shadow manifold. So are you working on methods for detecting shifts in that underlying attractor, whether you're in a non-stationary, nonlinear regime? Yeah, I mean, that's a great question. So whether something is stationary or not can depend on how long you observe it. For example, if you have motion in one lobe of the attractor and then it flips to the other lobe, do you say this is a non-stationary process? No. It just depends on how long you've looked at it. You're asking a really important question, and it's something that you can answer practically pretty much by windowing, using values in the past to see if you're getting essentially the same dynamics as you go forward. The danger with that, though, is that you can have problems where the essential dynamics-- let's say it's an annual cycle of an epidemic, for example, where the essential dynamics during the outbreak are five-dimensional, but as the thing recovers, it collapses down to zero-dimensional and becomes stable for a period of time. So what you're asking really is an important question, and I believe there are some practical ways of addressing it, but there is no simple universal way of doing it. So maybe windowing is one way of doing it. But again, you have to be careful that by windowing you haven't artificially created what looks like a non-stationary process but actually is stationary. And in the end, the way that you judge how well you've done this is how well you can predict. So if your predictions actually start to degrade as you're going forward, then you have some reason to be suspicious. Yeah, yeah.

Oh, thanks for a nice talk. I have a question. Have you ever found that the methods fail in some cases? Using some mathematical principles, can you say in what kind of system this method will be successful, and in what kind of system this method will not be successful? Can you describe it using some mathematical patterns? Yeah. So one kind of system where they may not be successful is a system that's really fundamentally stochastic, right, where, in fact, there are no deterministic rules of any kind. But those are systems that as scientists we like to stay away from, right, because what is there to explain?
So my answer to this would be that I tend to like problems where I start looking at them and they're giving me little sugar tablets of results. And so I keep going in that direction. And personally, that's how I operate. So I would stay away from such a problem. And maybe the reason I'm not encountering as many problems that are totally intractable is that they haven't-- I'm like an ant following a gradient of sugar. They haven't led me in that direction. But it's a good question. I don't think this is going to work for everything, obviously, right? But I've just had pretty good luck so far, sort of. But it's not just luck, because I'm actually following a gradient. So I'm attracted to the problems where it seems to be working. Yeah. So the gene expression problem I had basically written off. When I was initially approached, I thought, this could not possibly work; they walked away, and I thought, oh, OK, I won't see these people again. But then they came back with data, and they showed that it did work. And now we have this really good collaboration going. Yeah. Just along those lines, it's pretty obvious that it won't work in cases where your observations simply aren't sufficient to fully describe the system. So yeah.

It was also on the tip of my tongue to ask if you were working at Deutsche Bank in 2008, but I won't. What? If you were working at Deutsche Bank in 2008, but I won't ask that. Oh, no, no, no, no. No. No. So yeah. No, I was there from '96 to 2002. OK, that was safe, then. Got out in time. Last question then is, has any of this stuff been extended to [INAUDIBLE] categorical types of data? I think that it's possible. We are working on something right now that we're calling static cross-mapping, which is trying to do that sort of thing. We have some initial results that look pretty good. But no, I think that's a really important area. So we don't always have time series. And it's much harder to get that kind of data than it is to get-- [INAUDIBLE] Exactly. But there's another kind of ordering, of course, that you can put on this data. And I think that in ecology, for example, it's much harder to find time series than it is to find cross-sectional studies where you have lots of samples of lots of species.

And there is a method at the end of the talk that I had to flip through and not show that basically allows you to-- if you've observed a system for a short period of time, so you're limited by the length of your time series, but there are many interacting variables, you have an advantage, and that advantage grows factorially as you have more variables. Which is strange, because it goes counter to our idea that complex systems should be a problem. So the curse of complexity we can actually exploit. The fact that these things are interconnected basically means that each variable provides another view of the system. And so if you have many interconnected variables, you have many different views of the system. [INAUDIBLE] Well, yeah. What this is saying is that this is kind of a way to counter the problem of high dimensionality: if you have a lot of interacting dimensions, you have the potential for many alternative views of the problem. So if you did an embedding using-- [INAUDIBLE] an embedding using each dimension, each one gives you another view of the problem. You can then actually do these mixed embeddings that combine lags plus other dimensions, and you end up with a factorial number of representations of the system. (A small sketch of this counting is below.)
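Just to make the combinatorics concrete, here is a tiny sketch (illustrative numbers only) of how quickly the number of candidate embeddings grows when several variables and their lags can all serve as coordinates; the multiview approach mentioned next ranks these candidate embeddings by in-sample skill and averages the forecasts of the best ones.

```python
# Counting candidate embeddings: with n_vars observed variables, up to max_lag
# lags of each, and an embedding dimension E, every E-subset of the
# (variable, lag) coordinates is a possible reconstruction of the system.
from math import comb

n_vars, max_lag, E = 5, 3, 3
n_coords = n_vars * max_lag           # candidate (variable, lag) coordinates
print(comb(n_coords, E))              # 455 distinct 3-D embeddings from just 5 variables
```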
And so this is actually a good way to reduce noise in systems. There's a paper that came out last summer in Science, on multiview embedding, that tries to exploit this. Yeah. So a couple of times during your talk, you alluded to having found the right lag, i.e. the right value of tau. Are there values of tau that perform better than others in practice, and why is this? Yeah. So there no doubt are. In ecology, we rarely have the luxury of having oversampled data. And so by default, the lag is typically just one, whatever the sampling interval was. So in the limit of continuous data, the tau shouldn't matter? Oh, no, no, no, the tau will matter. So if you are a physicist recording something almost continuously in time, then the tau does matter. So now you have two free variables: you have to fit tau and you have to fit E. And what you're doing is you want to choose a tau that allows you to unfold the attractor maximally. And the way that you can determine that maximal unfolding is by prediction, simple prediction. Thank you. Yeah.

OK, one last question if anybody has one. No? So this might be a bit of a naive question, but where can one learn more about this? Because it seems like it's relatively new. There are advancements all the time in our understanding of the world with this tool. What realm is it under? Is it statistics or biology? Or what are the places that are doing research with this? Yeah, so it is relatively new. My lab has produced a bunch of papers dealing with this. There is a software package now called rEDM that's on CRAN; it has a tutorial, and it discusses some of this. But no, I need to write a review paper or a book or something that puts it all in one place, and that hasn't been done yet. But the software package is good. My student put it together. I had all this horrible research software that really was not intended for human consumption. It was like for me in my pajamas at home. But he rewrote it in R, and we put in a very nice tutorial with it to explain some of the basics. It's amazingly easy to use, and the ideas are actually quite intuitive. And it's something that I think is gaining; it seems to be accelerating in usefulness. And the citations, for example, are just doing that. So I think, again, having something that looks good, or that sounds good, or that seems interesting is very different from having something that actually works. My lab is pretty pragmatic. I say, these things actually have to work. To make it easy for people to understand how they work, we have to provide our markdown so that everything can be exactly reproduced, and the code has to be there. I would encourage you to check out the rEDM code. Yeah. All right. Thank you very much. Please join me in thanking our speaker. [APPLAUSE]