Reading the primary literature, meaning research articles, is intimidating, confusing, and seems out of reach for most people who aren't trained scientists. But it doesn't have to be that way. Let's cover how to make reading research articles easy, fun, and approachable. Dr. Jubbal, MedSchoolInsiders.com.

As Richard Feynman once said, “The first principle is that you must not fool yourself — and you are the easiest person to fool.” We'll equip you with the tools and strategies to not be fooled with regard to scientific research moving forward.

As part of my neuroscience major in college, we were required to read dozens of research articles related to the field. We spent hours going over every single article, dissecting its strengths and weaknesses and working to accurately assess what value it provided to the scientific community. Yet despite reading dozens of these neuroscience papers, when I entered medical school, I still didn't enjoy reading the primary literature. In fact, I avoided doing so unless absolutely necessary. It wasn't until I began doing research of my own, read hundreds of papers, and published dozens myself that it all began to click. Being able to understand and assess the scientific literature is essential to parsing the noise from the truth, but it doesn't have to take you years like it did for me.

When it comes to scientific studies, there are different levels of evidence. Not all studies are created equal, and the study design is a big part of how strong the evidence is. At the top, randomized controlled trials are the gold standard, the cream of the crop. Below that are prospective cohort and case-control studies. Prospective means you follow the subjects over time to see the outcomes of interest. Third, we have retrospective cohort and case-control studies, meaning you already have the outcomes of interest but look back historically and make interpretations. Fourth, we have case series and case reports, which are investigations into individual patient cases. There are other levels, such as systematic reviews, meta-analyses, and expert opinion, but for simplicity we'll stick to these four. This ranking may not make sense just yet, and that's okay. We'll now cover the elements of research and how they apply to each type of study, and it will all begin to come together.

Epidemiology, coming from the Greek term epidēmia, translates to "prevalence of disease." It is the branch of medicine dealing with the incidence, distribution, and control of diseases. If the primary aim of science is discovering the truth and determining cause and effect, then it's important to note that most observational epidemiological studies cannot establish causality, and therefore they cannot soundly accept or reject a hypothesis. Strong correlations found in observational studies can be compelling enough to take seriously, but there are limitations.

When it comes to observational studies, as opposed to experimental studies, we have cohort, case-control, and cross-sectional designs. Without diving into the differences between each type, understand that they generally entail observing large groups of individuals and recording their exposure to risk factors to find associations with possible causes of disease. If they're retrospective, they're looking back in time to identify particular characteristics associated with the outcome of interest.
These types of studies are prone to confounding and other biases, which take us further from the truth. We'll cover this in more detail shortly. Prospective cohort studies recruit subjects and collect baseline information before the subjects have developed the outcome of interest. The advantage of prospective studies is that they reduce several types of biases which are commonplace in retrospective studies.

There are four steps to the scientific method: first, make an observation. Second, come up with a falsifiable hypothesis based on this observation. Next, test the hypothesis through an experiment. And last, accept or reject the hypothesis based on the experiment's results. To determine causality, meaning whether some cause results in an effect, like whether or not red meat causes cancer, the hypothesis must be adequately tested. This is the part that is most commonly overlooked, particularly in disciplines such as nutrition, because doing the experiments necessary to establish causality presents several obstacles. For this reason, many researchers turn to easier observational studies, and I'm guilty of this too, but the problem is that most of these don't get us closer to the truth.

The gold standard for determining causality is a well-designed randomized controlled trial, or RCT for short. The researchers create inclusion and exclusion criteria to gather a group of subjects qualified for the study. Then they randomize subjects to two groups. For example, one group receives drug A, and the other group receives placebo. By randomly allocating participants into the treatment or control groups, much of the bias inherent to observational studies is substantially reduced. In short, finding cause and effect becomes much easier.

If randomized controlled trials are so much better, then why aren't they always used? First, they can be very expensive. One report looking at all RCTs funded by the US National Institute of Neurological Disorders and Stroke found 28 trials with a total cost of $335 million. Second, RCTs take a long time. According to one study, the median time from start of enrollment to publication was five and a half years. Third, not all RCTs are created equal, and it's quite challenging to conduct a high-quality RCT. These studies must have adequate randomization, stratification, blinding, sample size, power, proper selection of endpoints, clearly defined selection criteria, and more. Fourth, ethical considerations. If you're assigning someone to the control or experimental group, you can assign them to something you think will be helpful, like a medication or other treatment, or to something you expect to have no effect, like a placebo. But you cannot assign someone to a group you would expect to harm them. Can you imagine assigning some teenagers to smoke cigarettes and some not to? This is a key distinction between RCTs and observational studies: while RCTs seek to establish cause-and-effect relationships that are beneficial, epidemiologists often seek to establish associations that are harmful.

To better understand the strengths and weaknesses of any particular research study, we'll need to explore statistics. Don't worry, we're going to keep this basic, nothing too crazy. Relative risk, in its simplest terms, is the relative difference in risk between two groups. If a certain drug decreases the risk of colon cancer from 0.2% to 0.1%, that's a 50% relative risk reduction: decreasing the initial risk, 0.2%, by half gives you a risk of 0.1%. The actual change in the rate of the event occurring is the absolute risk reduction, which in this instance would be 0.1%, because 0.2 - 0.1 = 0.1. The way most studies, and especially journalists, summarize and report results is through relative risk changes. This is much more headline-worthy, but it obscures the truth, where absolute risk would better communicate the true impact. But what's more likely to get clicks? “New drug reduces colon cancer risk by 50%!” Again, that would be the relative risk reduction. Alternatively, “New drug reduces colon cancer risk from 2 per 1,000 to 1 per 1,000.” That would be the absolute risk reduction.
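To make that arithmetic concrete, here is a minimal sketch in Python using the hypothetical drug numbers from above. The function name and structure are just for illustration, not from any particular study or library.

```python
# Minimal sketch: relative vs. absolute risk reduction, using the
# hypothetical colon cancer drug example (0.2% risk reduced to 0.1%).

def risk_reductions(control_risk: float, treatment_risk: float):
    """Return (relative risk reduction, absolute risk reduction)."""
    arr = control_risk - treatment_risk  # absolute risk reduction
    rrr = arr / control_risk             # relative risk reduction
    return rrr, arr

rrr, arr = risk_reductions(control_risk=0.002, treatment_risk=0.001)
print(f"Relative risk reduction: {rrr:.0%}")  # 50%, the headline number
print(f"Absolute risk reduction: {arr:.1%}")  # 0.1%, the true impact
```

Notice that the same pair of numbers produces both headlines; only the framing differs.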
In the world of research, a bias is anything that systematically distorts results and leads to false, potentially misleading conclusions. Let's start with one of the biggest offenders: confounding. A confounding variable is one that influences both the independent and dependent variables but wasn't accounted for in the study. For example, let's say we're studying the correlation between bicycling and the sale of ice cream. As the bicycling rate increases, so does the sale of ice cream. The researchers conclude that bicycling causes people to consume ice cream. The third variable, weather, confounds the relationship between bicycling and ice cream: when it's hot outside, people are more likely to bicycle and also more likely to eat ice cream.

Another bias that isn't properly appreciated, particularly in the world of nutrition, is the healthy user bias. Health-conscious people are more likely to do certain activities. For example, most health-conscious people have heard that red meat is bad, and therefore they're less likely to eat red meat. People who eat more red meat are usually less health-conscious, and therefore are also more likely to smoke, not exercise, and consume soft drinks. So when an observational study comes out comparing those who eat red meat to those who don't, we cannot actually conclude that any difference is due to the red meat and not these other factors. Even when researchers are aware of these factors, they are virtually impossible to fully account for.

Selection bias refers to the study population not being representative of the target population, usually due to errors in the selection of subjects into a study or in the likelihood of their staying in the study. With loss to follow-up, researchers are unable to follow up with certain subjects, so they don't know what happened to them, such as whether or not they developed the outcome of interest. This leads to selection bias when the loss to follow-up is not the same across the exposed and unexposed groups. There are many other biases, but we don't have time to explore each and every one here.

Good research minimizes the effects of confounding and biases. How do we do that? Randomization is a method whereby study participants are randomly assigned to a treatment or control group. Randomization is key to distinguishing cause and effect, as proper randomization eliminates confounding. You cannot do this in observational studies, as subjects self-select into their groups. When confounding variables are inevitably present, there are statistical methods to “control” or “adjust for” them. The two main approaches are stratification and multivariate models. Stratification fixes the level of the confounders and produces subgroups within which the confounder does not vary. This allows for evaluation of the exposure-outcome association within each stratum of the confounder. This works because, within each stratum, the confounder cannot vary, so it cannot distort the exposure-outcome association there.
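Before moving on to multivariate models, here is a toy simulation of the healthy user bias from earlier, showing why randomization neutralizes it. The numbers and the "health_conscious" trait are invented purely for illustration; this is a sketch of the idea, not any real study design.

```python
import random

random.seed(0)

# Toy simulation: a "healthy user" confounder. In the observational
# arm, health-conscious people preferentially choose the treatment,
# so the treated group starts out systematically different.
population = [{"health_conscious": random.random() < 0.5} for _ in range(10_000)]

# Observational study: subjects self-select into treatment.
obs_treated = [p for p in population
               if random.random() < (0.8 if p["health_conscious"] else 0.2)]

# RCT: a coin flip decides, independent of any subject trait.
rct_treated = [p for p in population if random.random() < 0.5]

def pct_health_conscious(group):
    return sum(p["health_conscious"] for p in group) / len(group)

print(f"Observational treated group: {pct_health_conscious(obs_treated):.0%} health-conscious")
print(f"RCT treated group:           {pct_health_conscious(rct_treated):.0%} health-conscious")
# Expect roughly 80% vs. 50%: the coin flip balances the confounder
# across groups, known and unknown alike.
```

Any comparison of outcomes in the observational arm would partly reflect health consciousness rather than the treatment itself, which is exactly what stratification and multivariate models try to repair after the fact.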
Multivariate models are better at controlling for a greater number of confounders. There are various types, one of the most common of which is linear regression. In its simplest terms, regression is fitting the best straight line to a dataset. Think back to algebra and y = mx + b. We're trying to find the equation that best predicts the linear relationship between the observed data, y, and the experimental variable, x. Logistic regression handles more complex relationships, such as when the outcome is binary, like developing a disease or not. The important thing to note is that confounding often persists even after adjustment. There is an almost infinite number of possibilities that can confound an observation, but researchers can only eliminate or control for the ones they are aware of. Alex Reinhart, author of Statistics Done Wrong, points out the problem with interpretations like “if weight increases by one pound, with all other variables held constant, then heart attack rates increase by X percent”: you can quote the numbers from the regression equation, but in the real world, the process of gaining a pound of weight also involves other changes. Nobody ever gains a pound with all other variables held constant, so the regression equation doesn't translate to reality.

Because confounding is such a central limitation of observational research, we must be careful when drawing conclusions from these types of studies. With observational epidemiology, it's incredibly difficult to prove an association right or wrong. While a small minority of these associations may be causal, the overwhelming majority are not, and therefore we should err on the side of skepticism.

When you propose a hypothesis in a research study, there are two forms: the null hypothesis, meaning there is no relationship between the two phenomena, and the alternative hypothesis, meaning there is a relationship. The study seeks to provide data to suggest one over the other. Note that science does not prove things the way you can in math; rather, it provides evidence for or against. The p-value is the scoring metric that makes the final call. It's the probability of obtaining results at least as extreme as these from chance alone, assuming the null hypothesis is correct. In other words, it tells you how likely these findings would be if no relationship actually existed. A smaller p-value provides stronger evidence against the null hypothesis. A larger p-value means the results are more consistent with chance alone, thus supporting the null hypothesis. Researchers assign a p-value cutoff at which statistical significance is achieved. We call this number α, and it is usually set to 0.05, meaning 5%, or sometimes lower. If the p-value is less than 0.05, we say the results are “statistically significant,” and the null hypothesis is rejected.

There is a chance we are wrong, and we have terms for this, too. When there's no true effect but we think there is, we call this a false positive, or a Type I error: we rejected the null hypothesis even though it was true. The opposite, where there is a true effect but we think there isn't, is called a false negative, or a Type II error: we failed to reject the null hypothesis when we should have. The chance of committing a Type II error is called β.
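One way to internalize what α actually controls is to simulate many experiments where the null hypothesis is true by construction. This is a toy sketch, assuming SciPy is available; by design, roughly 5% of these null experiments will come out "significant" purely by chance.

```python
import random
from scipy.stats import ttest_ind

random.seed(0)
ALPHA = 0.05
N_EXPERIMENTS = 1000
false_positives = 0

# Both groups are drawn from the SAME distribution, so the null
# hypothesis is true and every "significant" result is a Type I error.
for _ in range(N_EXPERIMENTS):
    group_a = [random.gauss(0, 1) for _ in range(50)]
    group_b = [random.gauss(0, 1) for _ in range(50)]
    if ttest_ind(group_a, group_b).pvalue < ALPHA:
        false_positives += 1

print(f"False positive rate: {false_positives / N_EXPERIMENTS:.1%}")
# Expect roughly 5%: by construction, alpha is the Type I error rate.
```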
Statistical power is the probability that a study will correctly find a real effect, meaning a true positive. This translates to Power = 1 - β. Power is influenced by four factors: the probability of a false positive, which is α, or the Type I error rate; the sample size, N; the effect size, meaning the magnitude of the difference between groups; and the probability of a false negative, which is β, or the Type II error rate. Keep this in mind, as we'll be coming back to it.

A corollary to p-values is the confidence interval. The confidence level is 1 - α, so if α is set to the usual 0.05, you get a 0.95, or 95%, confidence interval. When reading a study that reports a ratio, such as a relative risk, you can quickly determine whether statistical significance was achieved by whether or not the confidence interval includes the number 1.00. If the whole interval sits above 1.00, like 1.05 to 1.27, then a positive association is present with statistical significance, and if it sits entirely below 1.00, like 0.56 to 0.89, then a negative association is present with statistical significance.

Confidence intervals are commonly misunderstood. With a 95% confidence interval of 1.05 to 1.27, this does not mean that we are 95% confident that the true value is between those two numbers. Rather, if we were to take 100 different samples and compute a 95% confidence interval for each sample, then about 95 of the 100 confidence intervals would contain the true value. In other words, a 95% confidence interval means that 95% of experiments conducted in this exact manner will produce an interval that includes the true value, but 5% will not.

Lastly, let's clarify statistical significance versus practical significance. A study can find statistical significance but have no practical significance, and this is more common than you think. A common case where this happens is when the sample size is very large. The larger the sample size, the greater the probability that the study will reach statistical significance, and at these extremes, even minute differences in outcomes can be statistically significant. If a study finds that a new intervention reduces weight by 0.5 pounds, who cares? It's not clinically relevant. The reverse is also true, where a study demonstrates practical significance yet is unable to achieve statistical significance. If we revisit the four factors that influence power, we see that sample size is the most easily manipulated to over- or underpower a study. Oftentimes, observational studies are overpowered with thousands of subjects, such that any minute difference may yield a statistically significant result. Other studies experience the opposite, whereby they have a small number of subjects, and even if there is a real difference, statistical significance cannot be demonstrated.
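To see how sample size alone can manufacture statistical significance out of a practically meaningless effect, here is a toy simulation, again assuming SciPy is available. The 0.05-standard-deviation "effect" is invented for illustration.

```python
import random
from scipy.stats import ttest_ind

random.seed(0)

def fraction_significant(n_per_group, true_effect=0.05, n_experiments=200):
    """Empirical power: how often a tiny true effect reaches p < 0.05."""
    hits = 0
    for _ in range(n_experiments):
        control = [random.gauss(0, 1) for _ in range(n_per_group)]
        treated = [random.gauss(true_effect, 1) for _ in range(n_per_group)]
        if ttest_ind(control, treated).pvalue < 0.05:
            hits += 1
    return hits / n_experiments

print(f"n = 100 per group:    power ~ {fraction_significant(100):.0%}")
print(f"n = 10,000 per group: power ~ {fraction_significant(10_000):.0%}")
# Small samples almost never detect the effect; enormous samples almost
# always do, even though a 0.05-SD difference may be clinically trivial.
```

Statistical significance tells you an effect is probably not zero; it says nothing about whether the effect is big enough to matter.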
Each of these components in isolation isn't enough to make you an expert at deciphering research studies. However, when you put each piece into context and understand the why of how sound science is conducted, you'll become far better equipped to think critically and make sense of the primary literature yourself, without having to rely on lazy thinking and black-and-white summaries from journalists.

If you enjoyed this video, you'll love my weekly newsletter. It gets sent out once a week and is super short. Check it out at medschoolinsiders.com/newsletter. If you ever change your mind, it's one click to unsubscribe, and I promise I will never spam you.

Thank you all so much for watching. This was an incredibly challenging video to make, as there was so much to research, and fitting it all into a single video was no small task. Big shout-out to Peter Attia's Nerd Safari series, which was the inspiration and foundation for this video. If you liked this video, let us know with a thumbs up to keep the YouTube gods happy. Much love to you all, and I will see you in the next one.