Data science plays a key role in the selection of influenza vaccines. What may sound like an excerpt from a sci-fi novel is, in fact, a real-life application of modern data science techniques improving lives today.

In this video, we'll talk about viruses and vaccines. We'll explore machine learning's role in the preparation of influenza vaccines and the ways to visualize and analyze genome data using data science techniques. (These include ML and different substitution models.) We'll also mention platforms where you can store and analyze gene data, or even your own genome if you've had it sequenced. But first things first: let's see what viruses are and how they operate!

What are viruses? Viruses are tiny infectious agents (not cells themselves) that can cause illness in many different organisms, like birds, mammals, and humans. In the case of influenza, there are two distinct surface proteins, H and N. The virus uses the H protein (hemagglutinin) to attach to and enter host cells, and the N protein (neuraminidase) to release newly made copies of itself from an infected cell so they can go on to infect other cells. These proteins vary a bit in their structure, so different versions of them are identified by a number. An example is H3N2, which contains the third variant of the H protein and the second variant of the N protein. Both H3N2 and H1N1 are called subtypes of influenza (more precisely, of influenza A), and they're also the two subtypes that most commonly infect humans.

H3N2 is an important example. Also known as the Hong Kong flu, it caused a pandemic in 1968, resulting in over a million deaths worldwide. The virus was highly contagious and spread quickly through the population, starting in Asia and later reaching America via troops returning from Vietnam. By the end of 1969, the virus had reached parts of Africa and South America as well. And if you thought this was bad, hold on to your hats! There's an even more dangerous influenza subtype: H1N1. It was responsible for the swine flu pandemic of 2009, as well as the devastating Spanish flu of 1918.
The 1918 pandemic was extremely lethal, resulting in over 30 million deaths worldwide. The reasons behind its high mortality still remain a mystery. While some scientists suggest an unusually aggressive form of the virus was involved, others claim it was the circumstances surrounding the infection (overcrowded and unhygienic camps during the war) that drove the high death toll.

At this point, you're probably thinking: “If this virus can be so dangerous, or even lethal, how can we protect ourselves against it?” The short answer: influenza vaccines, commonly known as flu shots.

So, what is a vaccine and how does it work? Vaccines typically contain a weakened or inactivated form of a virus (or pieces of it), which trains our immune system to recognize and neutralize the real thing. The influenza vaccine includes forms of the H1N1 and H3N2 viruses we talked about earlier. Influenza vaccines are reformulated every year. But why do they need to change the vaccine each year? The answer lies in two phenomena in genetics: antigenic drift and antigenic shift.

Hold on, wait, what are those? Let's start with antigenic drift. Imagine a group of people stranded on a raft at sea. Over time, the people on the raft slowly change in appearance: they grow beards, their hair gets longer, they get more tanned. In essence, they remain the same people, just slightly changed. This is what antigenic drift means: slow changes over time.

And what about antigenic shift? Now, if two people on the raft mix their genomes (as none of the kids are calling it) and create a progeny, a.k.a. a child, it will contain a mixture of both their traits. So, antigenic shift is the exchange of genetic material and the creation of a new virus. In influenza, this happens when two different flu viruses infect the same cell and swap whole gene segments.

Because of antigenic drift, influenza mutates and changes quickly, making it difficult to find a vaccine against all possible mutated viruses. Antigenic shift also causes the emergence of new influenza subtypes, such as the H3N2 or H1N1 we talked about earlier.
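The raft analogy maps onto a small toy model. The Python sketch below is illustrative only and rests on loudly stated assumptions: a virus is reduced to two made-up gene segments, HA and NA, each tagged with its H or N variant number; drift is modeled as a few random point mutations in one segment, and shift as swapping whole segments between two viruses (reassortment). `Segment`, `drift`, `shift`, and `subtype` are hypothetical helpers, not part of any library.

```python
import random
from dataclasses import dataclass, replace

BASES = "ACGU"  # influenza's genome is RNA, so uracil (U) takes the place of thymine (T)

@dataclass(frozen=True)
class Segment:
    """One toy gene segment: which H or N variant it encodes, plus a made-up RNA sequence."""
    variant: int
    rna: str

def subtype(virus: dict[str, Segment]) -> str:
    """Subtype label read off the HA and NA segments, e.g. 'H3N2'."""
    return f"H{virus['HA'].variant}N{virus['NA'].variant}"

def drift(virus: dict[str, Segment], segment: str, n_mutations: int,
          rng: random.Random) -> dict[str, Segment]:
    """Antigenic drift (toy): a few point mutations in one segment; the subtype label is unchanged."""
    bases = list(virus[segment].rna)
    for pos in rng.sample(range(len(bases)), n_mutations):
        # pick a base different from the current one, so every mutation is a real change
        bases[pos] = rng.choice([b for b in BASES if b != bases[pos]])
    return {**virus, segment: replace(virus[segment], rna="".join(bases))}

def shift(parent_a: dict[str, Segment], parent_b: dict[str, Segment],
          from_b: set[str]) -> dict[str, Segment]:
    """Antigenic shift (toy reassortment): segments named in from_b come whole from parent_b."""
    return {name: (parent_b if name in from_b else parent_a)[name] for name in parent_a}

# Made-up toy sequences (hypothetical); real HA and NA segments are far longer.
h3n2 = {"HA": Segment(3, "AUGAAGACCAUC"), "NA": Segment(2, "AUGAAUCCAAAU")}
h1n1 = {"HA": Segment(1, "AUGAAGGCAAUA"), "NA": Segment(1, "AUGAAUCCAAAC")}

rng = random.Random(42)
drifted = drift(h3n2, "HA", n_mutations=2, rng=rng)
print(subtype(drifted))                           # H3N2: sequence changed, label did not
print(subtype(shift(h3n2, h1n1, from_b={"HA"})))  # H1N2: H from H1N1, N from H3N2
```

Drift changes the sequence but not the label, which is why a drifted H3N2 is still "H3N2" even though it may no longer match last year's vaccine; shift can produce a combination that neither parent had.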
So, when scientists decide which virus strains to include in the vaccine, they need to think about how to make it as effective as possible. That depends on how closely the vaccine resembles the influenza viruses that will dominate during the upcoming flu season. This is where data science comes into play. Based on existing data about past and current virus spread and variants, scientists try to model and predict the future behavior of viruses using machine learning algorithms. To do that, they first need an appropriate way to handle information about viruses, or more precisely, their genomes. This is done via analysis of genetic data.

But what's genetic data, exactly? Genetic data includes the genome of an organism, or some part of it. It usually consists of DNA, represented as strings of the letters A, C, G, and T. Influenza, however, carries RNA, which some viruses use as their genetic material instead (in RNA, U takes the place of T).

Alright! Once we have our genetic data, it's time to decide how best to visualize it. Though there are many options, we'll talk about one in particular: the staple of the field, the phylogenetic tree. Phylogenetic trees, also known as evolutionary trees, represent how closely related different species are in terms of their genetics. Basically, they are diagrams showing the evolutionary relationships between species. In the case of influenza, such trees can be used to visualize the relationships between different strains of the virus.

Let's put all of this together and get to the final point: prediction using data science. Using information obtained from phylogenetic trees combined with different machine learning techniques, you can model the future behavior or spread of the influenza virus. One method uses nonnegative least-squares optimization to assign distances to the branches of a phylogenetic tree, so that the path between two strains reflects how antigenically different they are. It uses a bidirectional weighted phylogenetic tree and determines sets of coding changes on the surface of the H protein. The model can then identify the antigenic impact of different influenza strains.
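The tree-based idea can be sketched with non-negative least squares. The Python example below is a simplified illustration under stated assumptions, not the published model: the four-strain tree ((A,B),(C,D)) and its pairwise antigenic distances are made up, and any extra terms or regularization a real model would include are left out. Each observed distance is modeled as the sum of the weights of the branches on the path between two strains, and `scipy.optimize.nnls` finds the branch weights that best fit the data while keeping every weight at or above zero.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical toy tree ((A,B),(C,D)): four leaf branches plus one internal branch.
BRANCHES = ["a", "b", "c", "d", "internal"]

# Which branches lie on the path between each pair of strains in that toy tree.
PATHS = {
    ("A", "B"): ["a", "b"],
    ("A", "C"): ["a", "internal", "c"],
    ("A", "D"): ["a", "internal", "d"],
    ("B", "C"): ["b", "internal", "c"],
    ("B", "D"): ["b", "internal", "d"],
    ("C", "D"): ["c", "d"],
}

def design_matrix(paths: dict, branches: list) -> np.ndarray:
    """One row per strain pair, one column per branch; 1.0 if the branch is on that pair's path."""
    index = {name: j for j, name in enumerate(branches)}
    matrix = np.zeros((len(paths), len(branches)))
    for i, path in enumerate(paths.values()):
        for name in path:
            matrix[i, index[name]] = 1.0
    return matrix

# Made-up pairwise antigenic distances, listed in the same order as PATHS.
observed = np.array([3.0, 2.5, 4.5, 3.5, 5.5, 4.0])

A = design_matrix(PATHS, BRANCHES)
weights, residual = nnls(A, observed)  # least-squares fit subject to weights >= 0

for name, w in zip(BRANCHES, weights):
    print(f"{name:>8}: {w:.2f}")
```

Because these toy distances were generated from an exact tree, the fit recovers the branch weights 1, 2, 1, 3 and 0.5 with essentially zero residual. With real, noisy measurements, the non-negativity constraint keeps the fit from producing negative branch weights, which would make no sense as an amount of antigenic change.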
Another way to perform phylogenetic analyses is to use the PAML package, which contains programs for phylogenetic analysis of genetic data using maximum likelihood (ML). How is it done? By taking a set of trees and evaluating their log-likelihood values under different models. These models estimate some parameters while allowing others to vary. This way, they can account for how the genes of influenza strains, including the gene for the surface H protein, vary.

Of course, there are other methods you can use to make predictions in biology. Our aim is to give you an overview of two main ones, and we trust you can dig into other methods on your own if you find this topic interesting.

And that pretty much brings us to a close. We went all the way from learning about the flu and how a virus works, through the history of the first vaccine and the biggest flu pandemics, to antigenic shifts and drifts. That was fun, right? We discussed different types of biological data and their visualization. Finally, we learned how to make predictions using different machine learning techniques.

But before we go, let's round off with something about data science and its diverse applications. Data science is not just a tool used in the IT domain or by large corporations. It plays an important role in the (life) sciences, and its medical and biological applications are becoming more and more widespread. In fact, big tech companies like Google and Amazon have recently started their own genome projects, allowing users to store and analyze their own genomes on their respective cloud platforms. Microsoft entered the field too, with the release of Microsoft Genomics on its Azure cloud. So, if the big players are on it, it's a safe bet that genomes and their analysis using machine learning are worth looking into.

Ok, guys and gals. I hope we managed to shed light on influenza vaccines and the data science behind them.
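Returning to the PAML approach described above, the core maximum-likelihood idea can be shown on the smallest possible case: two aligned sequences and a single distance between them. This is a minimal Python sketch, not PAML's code; PAML's programs (such as baseml and codeml) evaluate whole trees under richer models configured through control files. The sketch assumes the simplest substitution model, Jukes-Cantor (JC69), in which all four bases are equally frequent and every change is equally likely, and the two toy RNA sequences are made up.

```python
import math

def jc69_log_likelihood(n_sites: int, n_diff: int, d: float) -> float:
    """Log-likelihood of a pairwise alignment under JC69 at distance d (> 0).

    d is the expected number of substitutions per site.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e  # P(a site is unchanged after distance d)
    p_diff = 0.25 - 0.25 * e  # P(a site changed to one particular other base)
    # 0.25 is the (equal) frequency of the starting base at each site.
    return (n_sites * math.log(0.25)
            + (n_sites - n_diff) * math.log(p_same)
            + n_diff * math.log(p_diff))

def ml_distance(seq1: str, seq2: str, step: float = 1e-4, max_d: float = 3.0) -> float:
    """Return the distance on a simple grid with the highest JC69 log-likelihood."""
    n_sites = len(seq1)
    n_diff = sum(a != b for a, b in zip(seq1, seq2))
    grid = (i * step for i in range(1, int(max_d / step) + 1))
    return max(grid, key=lambda d: jc69_log_likelihood(n_sites, n_diff, d))

def jc69_closed_form(seq1: str, seq2: str) -> float:
    """Analytical JC69 distance estimate; only defined while the fraction of differences is < 0.75."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Made-up aligned toy RNA sequences: 20 sites, 4 of which differ.
seq_a = "ACGUACGUACGUACGUACGU"
seq_b = "GCGUAUGUACAUACGCACGU"

print(f"grid-search ML distance:   {ml_distance(seq_a, seq_b):.4f}")
print(f"closed-form JC69 distance: {jc69_closed_form(seq_a, seq_b):.4f}")
```

The grid search simply picks the distance with the highest log-likelihood, and for JC69 it agrees with the closed-form estimate d = -3/4 ln(1 - 4p/3), where p is the fraction of differing sites. Comparing the best log-likelihoods reached under different models, and across different trees, is the same idea applied at a much larger scale.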
If you enjoyed the content of our video, please click the like button and share the story with your friends! And, if you're curious to find out more on the topic, you can follow the link to the article in the description. Thanks for watching!
Influenza Vaccines and Data Science in Biology. Level: B2 (upper-intermediate). Posted by 林宜悉 on January 14, 2021.