
  • Sunyaev Shamil: Well, it’s great to present for this group.

  • John, thank you for the introduction. And a lot of this work is in close collaboration

  • with John’s lab. So I was going to talk about how epigenetics actually may control

  • genetics. So throughout this meeting there are multiple talks pointing to the importance

  • of genetic variation in understanding of the epigenetic landscape. So there is a lot of justifiable

  • interest in how genetics controls epigenetics. So briefly all the studies can be summarized

  • as, pick a favorite epigenetic feature. QTL studies, right? So you take -- you have eQTLs,

  • methylation QTLs, chromatin accessibility QTLs; all types of QTLs. And of course we

  • believe that understanding the effect of genetic variation on epigenetic features can allow

  • us to go a long way in understanding the mechanism of the [unintelligible] association, the biology,

  • and so forth.

  • However, we’re interested in the inverse problem, which is: what is the effect of the epigenetic

  • landscape on genetics? And one of these effects is how the epigenomic landscape controls mutation,

  • right? Because the source of variation is mutation. So what we’ve been doing, we’ve

  • been looking at data on mutations, both in germ line context -- so we now have sequencing

  • data for multiple trios and quads; I won’t be talking about this today -- and somatic

  • mutations, cancer somatic mutations. So in the germ line these are differences

  • between parents and children, where changes are happening in the DNA, and here differences

  • are usually between blood, as a control cell type, and the cancer cell.

  • So this is the idea, and the idea is to see, what are the effects of epigenetic variables

  • on these changes in the DNA sequence? Why are we interested? So we’re interested for

  • multiple reasons. One interest is in statistical and medical genetics fields because understanding

  • of mutation rate models would inform methods for gene mapping, and I’ll talk about that

  • in a second. Another big interest of ours is evolutionary biology, and there are two

  • reasons we care from an evolutionary biology perspective. One is that mutation rate is

  • a key parameter in a lot of evolutionary models, right? If we want to infer selection, if we

  • want to understand differences between populations, differences between species, date speciation

  • events, we have to have some understanding of mutation rate. The other interest is evolution

  • of mutation rate itself, right? Because the cell controls mutational events, and mutation rate

  • is one phenotype which is under selection. So the question now is not only why mutation

  • rate is what it is, but -- so not only what is mutation rate but why is mutation rate

  • what it is? Also there is of course interest from biology perspective -- biology of DNA

  • repair and biology of DNA replication.

  • So for maybe a couple of minutes I’ll talk about the statistical genetics piece of the work. So there is a growing

  • interest in gene mapping using de novo mutations. There are two areas specifically: genetics

  • of neuropsychiatric diseases and cancer genomics. And the idea here is to map genes involved

  • in disease progression, or cancer driver genes, using recurrence. So this is not your classic

  • genetic mapping, for example LD-based association or linkage; this is mapping using mutations,

  • and this is the only mapping strategy which is possible in asexual systems.

  • The idea is very simple: you find different patients carrying mutations in the same gene,

  • collapse them by gene, and you can make an inference that this is a significantly mutated

  • gene, right? There are more mutations than you expect. Now the big question is, what

  • do you actually expect, right?

  • Because in these studies you cannot run case control. You cannot really look at how many

  • mutations in this gene happen in cases versus how many in controls because you would lack

  • the statistical power to do so. So the idea is to use some sort of model. So for example,

  • the simplest approach -- and this was used in early papers on the subject -- you

  • take some estimate of genomic mutation rate using independent samples, then you evaluate

  • the probability of observing recurrent events in a given gene, correct for multiple testing,

  • right? So why is this not the correct strategy? Because if you have heterogeneity among samples,

  • especially a problem in cancer genomics, where you have some samples basically filled

  • with mutations; others have much lower mutation densities -- you will make false inferences,

  • and this mapping will generate a lot of false positives.
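
The simplest approach described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the early papers the speaker mentions: mutation counts in a gene are treated as Poisson, the correction for multiple testing is Bonferroni, and every number (gene length, sample count, rates, number of genes) is hypothetical. The function names are made up for this sketch.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def recurrence_pvalue(observed, gene_length_bp, n_samples, rate_per_bp):
    """P-value for seeing at least `observed` mutations in one gene."""
    expected = gene_length_bp * n_samples * rate_per_bp
    return poisson_tail(observed, expected)


def is_significant(p_value, n_genes_tested, alpha=0.05):
    """Bonferroni correction for testing every gene in the genome."""
    return p_value < alpha / n_genes_tested


# Hypothetical numbers: a 10 kb gene, 100 samples,
# and a genome-wide estimate of 1e-6 mutations per bp per sample.
p_uniform = recurrence_pvalue(10, 10_000, 100, 1e-6)  # expected count = 1

# The same observation if the true rate for this gene is five times higher.
p_elevated = recurrence_pvalue(10, 10_000, 100, 5e-6)  # expected count = 5

print(p_uniform, is_significant(p_uniform, 20_000))
print(p_elevated, is_significant(p_elevated, 20_000))
```

With the genome-wide rate, ten mutations in this gene pass the corrected threshold. With a fivefold higher true rate the same ten mutations are unremarkable, which is the false-positive problem the speaker describes when the background rate is misjudged.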

  • So there is another strategy; another strategy is the following. So you look at your

  • real data and just permute data around. Look at the permutations, multiple permutations,

  • and you can evaluate how frequently you see these mutations independently in the same

  • gene. And the problem here, of course, is mutation rate variation, because if mutation

  • rate is heterogeneous along the genome, this may simply be a mutational hot spot which

  • you don’t know about. So what we need -- we need a careful model of local mutation rate.

  • And the problem in cancer is that, because of accessibility to specific mutagens or specific

  • genetic changes in repair systems (and I’ll be talking about that), you may have a situation

  • where this mutation rate heterogeneity is patient-specific, not just cancer type-specific

  • but specific to individual patients. Now, over five years ago, we -- again, in collaboration

  • with John’s lab -- made an observation that both the density of human SNPs and human-chimpanzee

  • divergence are increased in later replicating regions of the genome, compared to earlier

  • replicating regions of the genome. So we have certain epigenomic variables that control,

  • potentially, mutation rate, so this is stratification of S-phase of cell cycle into four regions.

  • And we’ve seen an increase in both divergence and polymorphism.

  • So this fueled our interest in the question. And it turned out that the same effect is

  • observed in cancer genomics, so this is in collaboration with [unintelligible]’s lab. We see that

  • there is an effect of replication timing in pretty much every single cancer type we analyzed,

  • so there is an increase of mutation density in later replication compared to earlier replication.

  • And some genes that are located in later-replicating regions are sort of usual false positives

  • of mutation mapping in cancers. There is another variable, which is level of gene

  • expression. Genes that are expressed at high levels have fewer mutations in cancer genomes.

  • And the standard idea is that the culprit is the transcription-coupled repair mechanism. And

  • I’ll show you the pathway because I’ll be -- and I’ll show it then again because

  • I’ll be talking about this pathway throughout the talk. So the idea is the following: if

  • there is a lesion in DNA, one of the mechanisms is nucleotide excision repair, which starts

  • with TFIIH, which is a helicase that unwinds DNA. There is an excision step in both directions,

  • there is a resynthesis using the other strand as the template. Now, this mechanism -- this

  • is a very accurate repair mechanism which can be recruited in two different ways. So

  • one way is stalled RNA polymerase, so if there is a lesion in DNA, transcription cannot proceed

  • forward, and polymerase recruits the nucleotide excision repair system downstream. The other

  • mechanism is what we call global genome repair, which is an active search by the XPC complex for lesions

  • in DNA. So first thing we decided to check is, “Okay, we think that this mechanism

  • leads to reduction of mutation density in actively transcribed genes. What happens in

  • active regulatory elements?” We decided to look within DNase1 hypersensitive

  • sites; I don’t have to introduce them for this audience. You’re all familiar with

  • that. My naïve expectation was that mutation density may be elevated because these sites

  • are not protected by nucleosomes; maybe they are more accessible to some sort of damage

  • and so forth. So when we looked at multiple cell types -- this was published last year

  • -- multiple myeloma, colon cancer, melanoma, lung cancer, CLL, and this scale depends on

  • number of samples we had -- we see reduction in every single cell type, reduction of mutation

  • density within regions of open chromatin. Now, what’s important -- the effect is very

  • well localized. I’m not talking mega-bases or hundreds of KB; this is one kilobase resolution,

  • right? And the reduction is compared to immediate flank, and I’m not going through the many regression

  • models, how to take into account the effect of location, the effect of nucleotide composition,

  • mutational spectrum in this cancer type, and so forth.
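
The comparison of open-chromatin sites with their immediate flanks can be illustrated with a small sketch. It is a simplification and not the analysis from the talk: it works on a single chromosome, ignores the regression corrections the speaker mentions (location, nucleotide composition, mutational spectrum), and does not check whether a flank overlaps a neighbouring site. The function names, coordinates, and the 1 kb flank width are all hypothetical.

```python
from bisect import bisect_left


def count_in(sorted_positions, start, end):
    """Number of positions p with start <= p < end."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)


def site_vs_flank_density(mutations, sites, flank_bp=1000):
    """Mutations per bp inside hypersensitive sites and in their flanks.

    mutations: iterable of positions on one chromosome.
    sites: list of (start, end) half-open intervals.
    """
    positions = sorted(mutations)
    site_count = site_len = flank_count = flank_len = 0
    for start, end in sites:
        site_count += count_in(positions, start, end)
        site_len += end - start
        left = max(0, start - flank_bp)  # do not run off the chromosome start
        flank_count += count_in(positions, left, start)
        flank_count += count_in(positions, end, end + flank_bp)
        flank_len += (start - left) + flank_bp
    return site_count / site_len, flank_count / flank_len


# Hypothetical toy data: one 1 kb site with 1 kb flanks on each side.
toy_mutations = [100, 900, 1500, 2100, 2900, 3500]
toy_sites = [(1000, 2000)]
inside, flank = site_vs_flank_density(toy_mutations, toy_sites)
print(inside, flank)
```

A ratio of the two densities below one would correspond to the localized reduction described above; on real data the flanks would also need to exclude other hypersensitive sites.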

  • Okay, so what can be behind this effect? So we decided to look at one system specifically

  • in melanoma, and there are several reasons. One is, there are multiple samples available;

  • it’s a high mutation-rate cancer; and, most importantly, we know the mutation source.

  • We have a signature, and we believe this signature corresponds to UV damage of DNA. And we know

  • that the major repair mechanism acting on this signature is nucleotide excision repair,

  • so we can make some biological hypotheses from looking at this system. Okay, so now

  • it’s a little more quantitative presentation of the same data. These are intergenic

  • regions; these are intronic regions; we have mutation density and we have chromatin accessibility

  • in quantitative fashion. This is just the number of mapped DNase1 cleavages. So what we think

  • is that the action of transcription-coupled repair is the difference between intergenic

  • and intronic regions. However, within each of those there is very strong dependency on

  • chromatin accessibility.

  • Okay, why is this happening? So there are many possibilities. One is that what we’re

  • seeing is purifying selection in regulatory elements, so maybe mutations are happening

  • but negative selection purges them, and we’re not seeing them. So I don’t have time to

  • discuss this in detail, but as somebody who unsuccessfully spent now almost three years

  • looking for signatures of purifying selection in cancers, I don’t believe in that, right?

  • So in order to assume that this is the case, selection must be dramatically stronger than

  • in coding regions of the genome; we never observe that. Another possibility is that this is association

  • with replication timing or another epigenetic feature, not necessarily specifically with

  • chromatin accessibility. So we tested it in two ways; you can run multivariate regression

  • models and see that this is not the case, and also the scale of the effect is very different,

  • right? So this is a very localized phenomenon. Okay, so another possibility is the accessibility

  • to DNA repair. And here, the hypothesis is that XPC in global genome repair is a large

  • bulky complex, like DNase1, right? With a footprint which is much larger than the distance between

  • nucleosomes. So it has to work with chromatinized DNA, and there is an active mechanism

  • to assist nucleotide excision repair to work on chromatinized DNA. And if you look through

  • experimental literature, the access of DNA repair to naked DNA is always much faster.

  • So again, the idea is that global genome repair may work more efficiently in open DNA compared

  • to chromatinized DNA and recruit the same nucleotide excision repair machinery downstream.

  • Now, even as bioinformaticists, we can test the hypothesis without running any experiments

  • because cancer genomic data -- when you look at mutations, you have phenotype and genotype

  • in the same dataset, right? So I have a phenotype, “What is the drop of mutation density in

  • DNase1 hypersensitive regions?” and I have genotype of a tumor. And I have a hypothesis

  • that nucleotide excision repair is involved. So we can stratify all our melanoma samples

  • into those where we do not see any change in nucleotide excision repair -- which are

  • marked green -- or samples where we do observe a potentially deactivating mutation anywhere

  • in the nucleotide excision repair pathway. And we see that there is statistically significant

  • enrichment of samples with potentially deactivated nucleotide excision repair among samples

  • where the drop in mutation density associated with chromatin accessibility is very small.

  • We can further exploit the structure of the pathway because, if mutations deactivating

  • nucleotide excision repair happen downstream, in the actual repair part of the pathway, then

  • we should [unintelligible] both effects: dependency of mutation density on transcription -- so

  • correlation with expression level -- and correlation with chromatin. So as we see here, these three

  • samples, for example, where mutations happen downstream, in these genes, in the core repair

  • part of the pathway, they have very small or no decrease in mutation density associated

  • with either transcription or chromatin accessibility.
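
The enrichment test on stratified samples can be sketched as a one-sided Fisher's exact test on a 2x2 table. The talk does not say which test was used, so Fisher's test is an assumption here, and the counts below are invented purely for illustration. Rows are samples with a small versus a large drop in mutation density at open chromatin; columns are samples with a potentially deactivating repair-pathway mutation versus an intact pathway.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric null,
    i.e. the probability of at least this much enrichment by chance.
    """
    row1 = a + b          # samples with a small drop
    col1 = a + c          # samples with a mutated repair pathway
    n = a + b + c + d     # all samples
    total = comb(n, col1)
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1)
    )
    return favourable / total


# Hypothetical counts, not data from the talk:
#                 pathway mutated   pathway intact
# small drop            6                 4
# large drop            2                18
p_enrichment = fisher_one_sided(6, 4, 2, 18)
print(p_enrichment)
```

A small p-value here would mean that samples with a weak chromatin-associated drop carry repair-pathway mutations more often than expected by chance, which is the direction of the association described above.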

  • Unfortunately, we had only one sample -- this is sample number four -- with mutations

  • upstream, specifically in global genome repair. And this fits the hypothesis, but

  • I probably wouldn’t really make very strong inference from a single sample. So, concluding

  • this part of the talk, we think that mutation density -- what we think, we know -- we observed

  • that mutation density is remarkably reduced in regulatory regions marked by DNase hypersensitive

  • sites. And the effect is likely mediated by global genome repair, as can be shown by association

  • of this effect with presence of an intact nucleotide excision repair pathway in the same sample.

  • Okay, so this is very focal. So what we learned so far -- we learned that mutation density

  • in cancers is shifted towards later replicating regions, regions cancer doesn’t

  • really need, because most of expressed genes and active elements are located in earlier

  • replicating domains. We observed that mutation density in cancers is reduced in actively

  • transcribed genes, in genes cancer needs, versus genes cancer doesn’t need. And we

  • also learned that mutation density is reduced in actively [unintelligible] regulatory elements,

  • right? So this is kind of the thing. So these are primarily observations especially on expression

  • and DNase1 accessibility, with -- specifically within functional -- potentially functional

  • elements. So what happens if we change resolution and look at the mega-base scale, and

  • we use the data collected by the Epigenome Roadmap Consortium from multiple cell types

  • and multiple epigenomic variables. So first, again, looking at -- looking at variation

  • in DNase1 hypersensitivity, just density of peaks per mega-base, versus number of mutations.

  • Again, I use melanoma as an example and I use classic UV-induced mutation density;

  • there’s pretty good correlation.

  • However, one interesting feature we noted is the following. So I can look at three different

  • skin cell types -- melanocytes, fibroblasts, and keratinocytes. And I see that there is

  • decrease in mutation density associated with density of open chromatin regions in each

  • of the three cell types. However, in melanocytes, this decrease is much more profound. Right?

  • The correlation coefficient -- negative correlation coefficient is much greater. The general phenomenon,

  • again, is that activating marks are negatively correlated with mutation density, and repressive

  • marks are positively correlated with mutation density; again, places where cancer

  • needs functional genes to work have reduced density of mutations. And I’ll come back

  • to that point. Now, back to specific cell types; so if I take, for example, mutations

  • in liver cancers and information about methylation marks in liver and information about melanocytes

  • -- and I would also look at melanoma mutations -- what I observe is that, if I condition

  • on the right cell type, the other cell type carries no information, right? So if I check

  • liver cancer and melanoma and I check data on methylation in liver cells, hepatocytes,

  • and melanocytes, if I know about melanocytes, liver cells add no information about mutation

  • density in melanoma; if I know about liver cells, melanocytes don’t add any

  • information about mutation density in liver cancer. Okay, so now these observations, they hint

  • at the importance of features; they hint at multiple features; they hint at the importance

  • of correct cell type. Now what are we going to do? We have a high-dimensional dataset.

  • Now for some reason, our projects involved in the study are like [unintelligible] progression,

  • and I know there are many methods probably bioinformaticists in the room who like other

  • methods, but I just follow projects in the study, so positive [unintelligible] selected

  • random forest regression for the analysis, a machine learning method. So what you do, you

  • throw everything into it and we show that we can actually predict the mutation density

  • per mega-base with fairly remarkable accuracy; not in every cancer, but over 80 percent

  • of variance can be explained in a whole bunch of cancer types.

  • Now, because it’s random forest, you can look at the features that contribute to the

  • classifier, and this is the pattern: so if we look at melanoma,

  • I see some with epithelial cells but most of the features come from melanocytes. If

  • I move to liver -- and this is of course a small chunk of a very large matrix like this, right?

  • So I would look at what features significantly contribute to the predictor for liver cancer

  • and which features come from liver cells. Then I would look at colon cancer and there

  • is the same match, multiple myeloma, and so forth. There is one cancer where it doesn’t

  • work, and I think we probably didn’t have the right cell type, and that is lung cancer. So for lung cancer

  • this trick didn’t work. Okay, now I can do the following trick: I can take all of

  • my features and cluster them by gene. And I can look at, for which of the tissues collectively,

  • what is the variance explained by the classifier if I take only the relevant cell type versus

  • all of the relevant tissues and cell types? And again, for melanoma, I see that I can

  • explain most of variation looking only at melanocytes. The effect is not as dramatic,

  • but also I can select the right cell type in liver cancer, and so on. So, looking at

  • this, what we decided to do, we decided to develop a simple classifier. So now we’re

  • turning this on its head. So what I told you so far is this: there are regions of the genome

  • where genes are expressed, where chromatin is active. These regions have fewer mutations

  • than regions which are heterochromatic; later in replication; not associated with active

  • chromatin and transcription. And I told you that, looking at epigenomic data, if you have

  • the right cell type, you can actually predict a mutation profile at the mega-base scale.

  • So now what we decided to do, we decided to turn it on its head because we can develop

  • a predictor of cell type of origin of cancer from mutational data. So I look at the genome

  • and I scan the database of Epigenome Roadmap, and I’m trying to predict, what is the cell

  • which is the cell of origin of this cancer, right? Again, we never ran the true experiment of

  • taking tumors of unknown primary, predicting and acting on them clinically; this wasn’t

  • done. So what we did, we did a very simple experiment. We took individual samples from our datasets

  • and we developed a classifier again looking at significant features that explain variation

  • of mutation density per mega-base. And what we see for most cancers, we predict with overall

  • accuracy of 88 percent what is the right cell type. We did not predict lung cancer, as I

  • mentioned; again, probably we don’t have the right epigenomic profile. There was almost

  • an anecdote with esophageal cancer because the original cell type which the algorithm

  • selected, we believed, is a false positive. But then, looking at the literature, we realized

  • that these are the exact cells that people believe give rise to esophageal cancer. So at least

  • -- with some reasonable accuracy, this trick works.
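
The idea of turning the prediction on its head can be illustrated with a much simpler stand-in for the classifier described above. The talk uses random forest models over many epigenomic features; the sketch below instead scores each candidate cell type by the fraction of variance in per-window mutation density explained by a single feature (squared Pearson correlation) and picks the best one. All names and numbers are hypothetical, and real data would have thousands of mega-base windows rather than six.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def predict_cell_of_origin(mutation_density, profiles):
    """Pick the cell type whose profile explains the most variance.

    mutation_density: mutations per window for one tumor.
    profiles: dict mapping cell type -> open-chromatin signal per window.
    """
    scores = {
        cell: pearson(signal, mutation_density) ** 2
        for cell, signal in profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data for six genomic windows.
tumor_density = [12, 9, 15, 4, 7, 14]
toy_profiles = {
    "melanocyte": [2, 5, 1, 9, 6, 1],   # high signal where mutations are few
    "hepatocyte": [5, 4, 6, 5, 3, 6],   # largely unrelated pattern
}
best_cell, scores = predict_cell_of_origin(tumor_density, toy_profiles)
print(best_cell, scores)
```

In this toy case the melanocyte profile explains most of the variance and is selected. A single-feature score like this ignores interactions between marks, which is one reason a method such as random forest regression over many features is a more natural choice on real data.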

  • Okay, so now there is an important question. The important question is, these are cells

  • of origin, and we heard today about epigenomic modification due to cancer progression. This

  • was my original thinking. This whole talk is about failures of my original hypothesis,

  • by the way. So my original thinking was the following: we observed that cancer avoids

  • mutations in the regions it needs. We know that this is determined by epigenomic

  • profile. Now we can think about evolution of mutation rate, and this is what we’re

  • doing on theoretical side of things, which I don’t have time to present. And you may

  • think about the following idea, “Okay, so cancer starts frequently at high mutation

  • rate background, then mutations keep happening, and of course many of these mutations may

  • potentially be deleterious for the tumor. There would be selection to suppress these

  • mutations. If you look at expression data, both the base excision repair system and the nucleotide

  • excision repair system are overexpressed in later melanoma compared to earlier melanoma.

  • So I thought that this is active selection on mutation rate, right? To eliminate mutations

  • where a tumor needs them.”

  • So then we asked the following question. And we didn’t have plenty of data, but there

  • are two cell types where we did have data. So we can see how mutation

  • density is predicted by epigenomic features of liver cells versus epigenomic features

  • of liver cancer cells, right? And what we see is that we can predict much better using

  • liver cells than liver cancer cells. In melanoma, there is an even more interesting experiment

  • because we take the same cell line, and we can see that all peaks in the cell line don’t

  • predict as well as all peaks within melanocytes. But if we take peaks specific to the cell line rather than

  • to melanocytes, these are pretty much non-predictive, and melanocyte peaks that are not observed

  • in cancer still predict mutation density. I found it very surprising. I think one possible

  • explanation is that a lot of mutations we observe in tumors actually arise very early, before

  • epigenomic changes associated with cancer. Okay, so I see John’s standing there, so

  • I’m going to my conclusion slide. Basically, again, mutation density at the one mega-base scale in

  • cancer is very strongly associated with chromatin organization. This association is very highly

  • specific with respect to cell of origin, and it looks like cancer genome has enough information

  • about the cell of origin, so you can actually predict what is the cell of origin based on cancer

  • genome. Thank you to my lab. So this is how seriously we think about our projects. Paz

  • Polak, who recently left the lab, contributed to most of this. So he’s here, listed with

  • the lab members, and of course thanks go to John Stamatoyannopoulos and Bob Thurman,

  • and to Rosa Karlic and Amnon Koren, who were all collaborators. Thank you.

  • [applause]

  • Male Speaker: Fabulous. Do you think that the tumors are actively

  • going at silencing some of these mutations in order to transit from a normal state to

  • a tumor state, if indeed the mutations are more likely to arise in the normal tissues

  • than in an active process?

  • Sunyaev Shamil: So I’m a little bit in disarray with my

  • thinking right now. So my original thinking was that, if you look at mathematical models

  • of evolution of mutation rate, you find that, in asexual systems, selection on mutation

  • rate is much more efficient than in sexual systems. So in principle, cancer would have

  • the ability to change mutation rate, especially if what we’re seeing is cell-type specific,

  • to silence mutation in regions where it needs. And I found this model intellectually pleasing;

  • I don’t think this is what we’re observing. I think what we’re observing possibly is

  • the very simple fact that most cancers are clonal and most of these mutations possibly

  • accumulated very early, like, before cancer progression. But to tell you the truth, by

  • now I don’t know. I don’t have any good model anymore.

  • Male Speaker: Fantastic. So I was wondering -- in

  • the later part of the talk, you showed the correlation when you get the chromatin states from

  • tissues versus cancers. The cancers that you show are cell-lines, so would

  • that be a factor, that cell-lines are very selective and they probably have very selective

  • chromatin states, very different from what the original cancer would be?

  • Sunyaev Shamil: Yeah, that’s --

  • Male Speaker: So the mutation rate would be better if you

  • take directly cancer tissues than cancer cell-lines?

  • Sunyaev Shamil: This may be the case. So in principle, if

  • there is epigenetic control of mutation rate, I would be surprised that it would be different

  • in cell-lines compared to cancers, but the observation is absolutely correct. So the

  • main results in the paper were obtained on primary tumors, and the last couple slides were comparison

  • with cell-line data. And we didn’t have matching datasets, so that’s of course a

  • deficiency, but I do not see an obvious hypothesis why there would be a substantial difference,

  • because cell-lines have been there for a reasonably long time and if mutations keep happening,

  • and are associated with epigenomics of cell-lines, we should observe it.

  • Male Speaker: I have another question. It’s a very general

  • question, so it’s been known in the field and very much propagated by followers for

  • many years that the mutation rate is constant between cancer cells and normal mutation rate.

  • So can you comment on that? What is it now, where does it stand?

  • Sunyaev Shamil: It’s an interesting -- it’s a very interesting

  • question. So I think there is disagreement within the field whether mutation rate is

  • elevated in cancer, or it’s not elevated. So people who believe that it is

  • elevated, they point to a lot of mutator genes associated with cancer, both germ-line

  • predisposition and earlier events in cancer. For example, we see a lot of samples

  • in melanoma with changes in nucleotide excision repair pathways. Theoretically, it fits very

  • well because you would have a mutator change that would hitchhike together with

  • cancer drivers. Now there are people who don’t really believe that there is substantial difference,

  • especially if you look at mutation density. If a lot of these events happen early, people

  • point to dependency on age of diagnosis and these types of observations. I don’t have

  • a strong opinion either way; I find arguments of increased mutation rate very logical, and

  • also I’m happy to live in the world where it’s grey: in some cases, especially where

  • you have mutator mutations, mutation rate may be elevated, and in other cases maybe

  • it’s the same. You just randomly hit a driver gene.

  • Male Speaker: Thanks.

  • [applause]

  • [end of transcript]


Epigenetic Control of Genetics: the Impact of Epigenome on Mutation - Shamil Sunyaev

  • Posted by Chou Jasper on January 14, 2021