[MUSIC PLAYING]
PING YEH: Hello.
I'm Ping Yeh of the Google AI Quantum team,
and I'm going to talk about the statistical significance
of the quantum supremacy experiment
with our Sycamore processor.
So a quick reminder on statistical significance,
you start with a null hypothesis, H0 or H
null, which means that there's nothing interesting.
And you have a statistic called F, and also
a probability distribution function of F given H0.
Then you go ahead and measure F in your data.
Let's say you come up with a value of F hat,
and the tail probability gives you the p-value.
And if the p-value is smaller than a predefined significance level
alpha, then we say it is statistically significant.
And we reject H0, OK?
So that's how it is.
And for major scientific claims, we usually
set alpha to a so-called 5 sigma level
for Gaussian, which corresponds to this value.
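As a rough illustration of how a Gaussian sigma level converts to a tail probability and back, here is a minimal Python sketch. It assumes the one-sided (upper-tail) convention, which the talk does not state explicitly, and the function names are illustrative only.

```python
import math
from statistics import NormalDist


def sigma_to_p(z: float) -> float:
    """One-sided Gaussian tail probability beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_to_sigma(p: float) -> float:
    """Inverse of sigma_to_p: the z whose upper-tail probability is p."""
    # Use the lower tail and flip the sign, which avoids computing 1 - p.
    return -NormalDist().inv_cdf(p)


alpha = sigma_to_p(5.0)  # significance level at "5 sigma"
print(f"5 sigma -> p = {alpha:.3g}")
print(f"p = 2e-10 -> {p_to_sigma(2e-10):.2f} sigma")
```

Under this convention, 5 sigma corresponds to a p-value of about 2.9 times 10 to the minus 7.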
So a question is which null hypothesis
are we talking about rejecting for a [INAUDIBLE] purpose?
OK, so before going to that, I have a null hypothesis
for my talk.
So hopefully, you can help me reject it at the end.
OK?
So the null hypothesis for the quantum supremacy--
the value F here is the fidelity of the Sycamore processor
for a circuit.
So the first one is that F is consistent with zero.
That means the processor has lost quantum coherence.
And the second one is that F is not zero,
but it's low enough so the classical simulation is easy.
So that means no supremacy.
OK, so we want to reject both of these, the no-quantum
and no-supremacy hypotheses.
What we do is, of course, we follow this p-value thing.
And here, apparently, if you can reject the second one,
the first one is rejected as well.
So we set our H0 to be the second one,
and we set this threshold to be 0.1%, which
comes from a complex analysis of classical simulations,
that is, at which value the simulation should already be hard enough.
OK?
And we want to reject it.
That means we are significantly above it.
All right, so the TL;DR is that with 53 qubits,
20 cycles of circuit, and 3 million samples
per circuit with 10 different random circuits,
we come up with an F hat of this value, which
corresponds to a p-value of about 6.4 sigma in Gaussian.
OK, so that, of course, is above 5 sigma, so that's good.
And of course, there is a systematic uncertainty
on the value of F hat.
So there's an uncertainty here.
So we estimated the uncertainty to be 4 times 10
to the minus 5.
And the p-value with that distribution here of F hat
is estimated to be 2 times 10 to the minus 10,
which corresponds to about 6.2 sigma in Gaussian.
So again, both null hypotheses are rejected.
OK?
So if you are interested in knowing
how we came up with those numbers and this function,
let's continue.
So there are a few factors in coming up with--
in getting this p-value.
First is the distribution function of the H0.
The second is the estimation of F hat.
And the third is the distribution around F hat.
So let's try to get those.
First of all, the data set I used for analysis can be
downloaded here, bit.ly/quantum supremacy dataset.
So Google's quantum supremacy experiment
is based on quantum circuit sampling.
So this is an illustration of a random circuit.
And at the end of the circuit, we
come up with a wave function, psi,
which is a linear combination of, of course,
2 to the n different computational basis states.
So you sample those bitstrings from this end state many times,
and the probability of sampling a particular bitstring
is basically just the corresponding amplitude squared.
OK, that's standard quantum mechanics.
And for a random circuit, those probabilities actually
follow a distribution, the so-called Porter-Thomas
distribution.
And here I'm using a variable called scaled probability,
which is the dimension of the Hilbert space times
the probability itself.
Then the distribution becomes a very simple
exponential distribution, which is
independent of the number of qubits.
OK?
It's easier to analyze.
All right.
So now we do sampling.
So typically we sample millions
of bitstrings for each random circuit.
And for 53 qubits, that's, of course, much, much smaller
than the Hilbert space.
So it's a tiny sample.
There are two different sampling strategies
that we are interested in.
The first one is a uniform random sampling.
That means each qubit gives you a 0 or 1 with a 50-50 chance.
And then the bitstring you sampled and the x value--
I mean, the scaled probability value you get--
will be distributed according to the population
distribution, which is the Porter-Thomas itself.
And this is what a fully decohered quantum
computer would give you.
And if it is a perfect quantum computer,
then the bitstrings with higher probability
will be sampled more often.
So the distribution becomes x times exponential.
OK?
So I call these two distributions P1 and P2.
And it so happens they look like this.
And the average value of P1 is 1, of P2 is 2.
And this comes in very handy when
we want to estimate fidelity.
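To see where the two mean values come from, one can draw samples from both distributions and average them. This is a minimal sketch, assuming P1(x) = exp(-x) and P2(x) = x exp(-x) as described above (P2 is a Gamma distribution with shape 2 and scale 1); the sample size and the seed are arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 200_000

# P1: scaled probabilities of uniformly sampled bitstrings, density exp(-x).
p1_samples = [rng.expovariate(1.0) for _ in range(n)]

# P2: scaled probabilities of bitstrings sampled by an ideal device,
# density x * exp(-x), i.e. Gamma(shape=2, scale=1).
p2_samples = [rng.gammavariate(2.0, 1.0) for _ in range(n)]

mean_p1 = sum(p1_samples) / n
mean_p2 = sum(p2_samples) / n
print(f"mean of P1 ~ {mean_p1:.3f}, mean of P2 ~ {mean_p2:.3f}")
```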
So we have an error model, which is a linear combination
of the perfect density matrix and a totally random matrix.
So the corresponding distribution of the scaled
probability goes like this.
It's also a linear combination of the two distributions.
And if we calculate the mean value
of this distribution, you can find out
it's actually very simple.
It's F plus 1.
So that means that the mean value of the measured x, minus 1,
is a fidelity estimator.
And this is our so-called linear cross-entropy fidelity formula.
OK?
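The estimator can be tried out on simulated data. The sketch below assumes the mixture model described above, P_F(x) = F x exp(-x) + (1 - F) exp(-x), so that the mean of x is 2F + (1 - F) = F + 1. The function names, the fidelity of 0.3, and the sample size are illustrative choices, not values from the experiment.

```python
import random


def sample_scaled_probs(fidelity, n, rng):
    """Draw n scaled probabilities from the mixture F*P2 + (1-F)*P1."""
    samples = []
    for _ in range(n):
        if rng.random() < fidelity:
            samples.append(rng.gammavariate(2.0, 1.0))  # ideal device, x*exp(-x)
        else:
            samples.append(rng.expovariate(1.0))  # uniform sampling, exp(-x)
    return samples


def linear_xeb_fidelity(scaled_probs):
    """Linear cross-entropy estimate: mean scaled probability minus 1."""
    return sum(scaled_probs) / len(scaled_probs) - 1.0


rng = random.Random(1)
xs = sample_scaled_probs(fidelity=0.3, n=200_000, rng=rng)
f_hat = linear_xeb_fidelity(xs)
print(f"estimated fidelity ~ {f_hat:.3f}")
```

In the real experiment the scaled probabilities are not simulated like this: each one is the classically computed ideal probability of a measured bitstring, times the Hilbert space dimension.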
And now we want to see what
this x distribution looks like.
So we took data from an elided circuit
with 53 qubits, 20 cycles, and 3 million measurements.
Here, elided circuit means we remove two-qubit gates
from this circuit to make the computation
on a classical computer possible.
And we estimate the fidelity to be that value, 0.18%.
OK, next we want to see whether that distribution looks
like what we predicted with the theory.
So we overlay them.
OK, just by eyeballing it, it looks similar.
We want to quantitatively measure how similar they
are with each other.
So we use the Kolmogorov-Smirnov test.
So it will give you a p-value of the kind
that you can interpret as a probability,
that the data is drawn from this distribution function.
So a p-value close to 1,
in this instance 0.98, means that it's very close.
We have high confidence that this is from that distribution.
And if we change the theoretical distribution
from the estimated fidelity to, for example, 0 fidelity,
the p-value drops to a very low value.
OK?
So we have confidence here that the model PDF is actually
a very good description of the data.
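A one-sample Kolmogorov-Smirnov comparison of this kind can be sketched with the Python standard library alone. The code below assumes the mixture model with CDF 1 - exp(-x) (1 + F x), uses the asymptotic Kolmogorov formula for the p-value, and runs on simulated data with an exaggerated fidelity of 0.5 so that a small sample is enough. None of these numbers come from the experiment, and the helper names are hypothetical.

```python
import math
import random


def model_cdf(x, fidelity):
    """CDF of F*x*exp(-x) + (1-F)*exp(-x): 1 - exp(-x) * (1 + F*x)."""
    return 1.0 - math.exp(-x) * (1.0 + fidelity * x)


def ks_statistic(samples, fidelity):
    """Largest gap between the empirical CDF and the model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        c = model_cdf(x, fidelity)
        d = max(d, (i + 1) / n - c, c - i / n)
    return d


def ks_p_value(d, n, terms=100):
    """Asymptotic Kolmogorov tail probability for statistic d and n samples."""
    lam = math.sqrt(n) * d
    if lam < 0.2:  # the series converges slowly here; the p-value is ~1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


rng = random.Random(2)
true_f, n = 0.5, 20_000  # exaggerated fidelity so a small sample suffices
data = [rng.gammavariate(2.0, 1.0) if rng.random() < true_f
        else rng.expovariate(1.0) for _ in range(n)]

d_true = ks_statistic(data, true_f)
d_zero = ks_statistic(data, 0.0)
print(f"F = {true_f}: D = {d_true:.4f}, p = {ks_p_value(d_true, n):.3g}")
print(f"F = 0.0: D = {d_zero:.4f}, p = {ks_p_value(d_zero, n):.3g}")
```

The p-value here is the probability, under the model, of seeing a gap at least as large as the observed one. With SciPy available, `scipy.stats.kstest` with a callable CDF would be the more usual route.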
All right, next we move on
to estimating the statistical uncertainty on the [INAUDIBLE]
fidelity.
And because it is a mean,
the error on the mean is the standard way
to do that.
So from data, we estimate it to be this value,
and from the theoretical PDF you can also estimate it.
And you find out that there is an excellent agreement
between the two.
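For the theoretical side of this comparison, the variance of x under the mixture model can be worked out from the second moments of the two components (2 for exp(-x) and 6 for x exp(-x)), which gives Var(x) = 1 + 2F - F^2. This is a minimal sketch under that assumption; the function name is illustrative, and the inputs are the sample counts and the 0.1% threshold mentioned in the talk.

```python
import math


def xeb_standard_error(fidelity: float, n_samples: int) -> float:
    """Standard error of the mean of x under F*P2 + (1-F)*P1."""
    second_moment = 6.0 * fidelity + 2.0 * (1.0 - fidelity)  # E[x^2]
    mean = 1.0 + fidelity                                     # E[x]
    variance = second_moment - mean ** 2                      # 1 + 2F - F^2
    return math.sqrt(variance / n_samples)


# One circuit (3 million samples) and ten circuits (30 million samples),
# evaluated at the 0.1% threshold fidelity.
print(f"{xeb_standard_error(0.001, 3_000_000):.3g}")
print(f"{xeb_standard_error(0.001, 30_000_000):.3g}")
```

Because F is tiny here, the variance is very close to 1, so the error is essentially one over the square root of the number of samples.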
So that means the theoretical prediction
can actually be used for our null hypothesis distribution.
And furthermore, we verify the statistical uncertainty
by bootstrapping, because we have the central limit
theorem: the distribution of the mean value
should approach a Gaussian
when the number of samples goes to infinity.
So we perform 10,000 bootstraps.
Each bootstrap sample contains 3 million samples.
And the mean value, or the fidelity from each bootstrap
sample, is plotted here.
And this is the histogram of them.
So it indeed looks like a very good fit to Gaussian,
and the width, the standard deviation,
is very close to the estimated one.
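The bootstrap check can be sketched as follows. To keep the run short, this example uses far smaller sizes than the talk (2,000 samples and 300 replicas instead of 3 million and 10,000) and zero-fidelity simulated data as a stand-in for the measured scaled probabilities; all of these are assumptions made for illustration.

```python
import math
import random
import statistics

rng = random.Random(3)
n = 2_000            # samples per bootstrap replica (3 million in the talk)
n_bootstraps = 300   # number of replicas (10,000 in the talk)

# Stand-in data: scaled probabilities at zero fidelity, density exp(-x).
data = [rng.expovariate(1.0) for _ in range(n)]

# Each replica resamples the data with replacement and re-estimates F.
replica_estimates = []
for _ in range(n_bootstraps):
    replica = rng.choices(data, k=n)
    replica_estimates.append(sum(replica) / n - 1.0)

bootstrap_sigma = statistics.stdev(replica_estimates)
analytic_sigma = statistics.stdev(data) / math.sqrt(n)  # error on the mean
print(f"bootstrap: {bootstrap_sigma:.4f}, analytic: {analytic_sigma:.4f}")
```

The spread of the replica estimates should land close to the error on the mean, which is the agreement described above.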
So here we know that, OK, the null hypothesis PDF
is a Gaussian centered at
the threshold fidelity of 0.1%,
with the standard deviation from the theoretical prediction.
All right.
Now we have more than one random circuit.
We have 10 of them.
So we can combine them.
And there we used two different ways of combining them,
and we get I think basically identical results.
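The talk does not name the two combination methods. Purely as an illustration, the sketch below compares two common choices on simulated data: pooling all samples into one estimate, and taking an inverse-variance weighted mean of the per-circuit estimates. Both choices, and every number in the code, are assumptions rather than the experiment's procedure.

```python
import random
import statistics

rng = random.Random(4)
n_circuits, n_per_circuit, true_f = 10, 20_000, 0.3  # illustrative values

per_circuit = []  # (fidelity estimate, variance of that estimate) per circuit
pooled = []       # all scaled probabilities from all circuits
for _ in range(n_circuits):
    xs = [rng.gammavariate(2.0, 1.0) if rng.random() < true_f
          else rng.expovariate(1.0) for _ in range(n_per_circuit)]
    pooled.extend(xs)
    per_circuit.append((statistics.fmean(xs) - 1.0,
                        statistics.variance(xs) / n_per_circuit))

# Option A (assumed): treat all samples as one big sample.
f_pooled = statistics.fmean(pooled) - 1.0

# Option B (assumed): inverse-variance weighted mean of per-circuit estimates.
weights = [1.0 / var for _, var in per_circuit]
f_weighted = sum(w * f for w, (f, _) in zip(weights, per_circuit)) / sum(weights)

print(f"pooled: {f_pooled:.4f}, weighted: {f_weighted:.4f}")
```

With the same number of samples per circuit, the per-circuit variances are nearly equal, so the two options give almost the same answer.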
And with a combined sample, we can again
test the agreement between theory and data.
And we can see that with this combined sample,
there are 30 million samples.
So the p-value is still reasonable, 66%.
But if you ask what's the p-value
for the threshold fidelity, it becomes very, very low.
And for 0 fidelity, it's even lower.
So this gives us more confidence that the combination process
makes sense.
All right.
So next we need to go into the supremacy region, where
the classical computation of those probabilities
is not possible.
But nonetheless, we need to estimate the significance
of the full circuit.
So we go to a lower number of qubits,
from 12 qubits to 38 qubits, and check
the ratio between the full circuit and the elided circuit fidelities,
in a similar way to the elided circuit
with 53 qubits.
And we found out the ratio of these two fidelities
is about 97%.
So that is a factor we apply to the combined
fidelity for our estimate of the full circuit fidelity, which
is this value.
And then the next is the systematic uncertainty.
So there are many, many sources of uncertainty here,
and they are captured in one big number, which is the drift,
so how fidelity drifts with time after a calibration.
And here we took data for 17 hours
on the same random circuit.
And we found out it drops down--
not too much, but kind of visibly.
Within this range, I think, the linear fit
seems to be working OK.
So we use a linear fit.
The data of the supremacy experiment was taken in the first hour.
So we use the variance of the residuals in the first hour,
plus the variance of the intercept, as the variance
of the fidelity itself.
And we treat that as a systematic uncertainty.
And we take that as a ratio and multiply it
by the estimated fidelity to get the final estimate
of the systematic uncertainty.
So we get that number.
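As a hedged illustration of this kind of estimate, the sketch below fits a straight line to synthetic fidelity-versus-time data and adds the variance of the first-hour residuals to the variance of the fitted intercept. The data, the noise level, and the variable names are made up for the example; only the overall recipe follows the description above.

```python
import math
import random

rng = random.Random(5)

# Synthetic stand-in for the 17-hour drift run: one point every 15 minutes.
hours = [0.25 * i for i in range(68)]
fidelity = [2.0e-3 - 1.0e-5 * t + rng.gauss(0.0, 4.0e-5) for t in hours]

# Ordinary least-squares straight line: fidelity ~ intercept + slope * t.
n = len(hours)
t_mean = sum(hours) / n
f_mean = sum(fidelity) / n
s_tt = sum((t - t_mean) ** 2 for t in hours)
slope = sum((t - t_mean) * (f - f_mean) for t, f in zip(hours, fidelity)) / s_tt
intercept = f_mean - slope * t_mean
residuals = [f - (intercept + slope * t) for t, f in zip(hours, fidelity)]

# Variance of the fitted intercept from the standard least-squares formula.
noise_var = sum(r * r for r in residuals) / (n - 2)
intercept_var = noise_var * (1.0 / n + t_mean ** 2 / s_tt)

# Variance of the residuals that fall within the first hour.
first_hour = [r for t, r in zip(hours, residuals) if t <= 1.0]
first_hour_var = sum(r * r for r in first_hour) / len(first_hour)

# Relative systematic uncertainty, to be multiplied by the estimated fidelity.
relative_sys = math.sqrt(first_hour_var + intercept_var) / intercept
print(f"slope = {slope:.3g} per hour, relative uncertainty = {relative_sys:.3g}")
```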
So now, coming back to the factors
for calculating the p-value,
we have all of them estimated.
So it's straightforward to plug them in to get a p-value.
So for this tail probability of F hat,
we get a p-value of this number, which corresponds
to about 6.4 sigma in Gaussian.
But then, this is for one F hat.
We do have a systematic uncertainty here.
So how do we deal with that?
So the way we deal with it is that OK, we can try.
For example, we subtract by 5 sigma
and see what's the p-value here.
Of course, the p-value is higher because you're
integrating over more area.
But then there is an infinite number
of possible choices of the value for checking the p-value.
So how do we do that?
We found out that actually we can take an expectation
value of the p-value by integrating the p-value
with this Gaussian distribution around F hat.
And after we do that, we get a p-value of 2 times 10
to the minus 10, which corresponds to about 6.2 sigma in Gaussian.
So that's our final p-value for the whole quantum supremacy
experiment.
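The averaging step can be sketched numerically. The code below integrates the Gaussian tail probability over a Gaussian of width sigma_sys around F hat and compares the result with the closed form for two Gaussians, where the widths add in quadrature. The statistical error of 1.83 times 10 to the minus 4 is an assumption (the theoretical value for 30 million samples near the 0.1% threshold), not a number quoted in the talk; the 6.4 sigma and the systematic uncertainty of 4 times 10 to the minus 5 are from the talk.

```python
import math


def gaussian_tail(z):
    """One-sided Gaussian tail probability beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_p_value(delta, sigma_stat, sigma_sys, half_width=8.0, steps=4000):
    """Average the p-value over a Gaussian of width sigma_sys around F hat.

    delta is F hat minus the threshold fidelity of the null hypothesis.
    """
    lo, hi = -half_width * sigma_sys, half_width * sigma_sys
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        shift = lo + i * h  # candidate offset of the true value from F hat
        weight = math.exp(-0.5 * (shift / sigma_sys) ** 2) / (
            sigma_sys * math.sqrt(2.0 * math.pi))
        p = gaussian_tail((delta + shift) / sigma_stat)
        total += (0.5 if i in (0, steps) else 1.0) * weight * p
    return total * h  # trapezoidal rule


sigma_stat = 1.83e-4      # assumed: theoretical error for 30 million samples
sigma_sys = 4.0e-5        # systematic uncertainty quoted in the talk
delta = 6.4 * sigma_stat  # F hat sits 6.4 statistical sigmas above threshold

p_numeric = expected_p_value(delta, sigma_stat, sigma_sys)
# Closed form for Gaussians: the two widths add in quadrature.
p_closed = gaussian_tail(delta / math.hypot(sigma_stat, sigma_sys))
print(f"numeric: {p_numeric:.3g}, closed form: {p_closed:.3g}")
```

Under these assumptions both numbers come out at roughly 2 times 10 to the minus 10, in line with the value quoted above.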
OK, so conclusion, both null hypotheses
are rejected, with more than 5 sigma
of statistical significance, along with [INAUDIBLE]
several checks that give us some confidence on those numbers.
And the dataset's available here.
OK?
And coming back to the null hypothesis on my talk--
so please help me reject this hypothesis
by leaving comments below.
Thank you very much.
[MUSIC PLAYING]