Strong Quantum Computational Advantage Using a Superconducting Quantum Processor

Yulin Wu,1,2,3 Wan-Su Bao,4 Sirui Cao,1,2,3 Fusheng Chen,1,2,3 Ming-Cheng Chen,1,2,3 Xiawei Chen,2 Tung-Hsun Chung,1,2,3 Hui Deng,1,2,3 Yajie Du,2 Daojin Fan,1,2,3 Ming Gong,1,2,3 Cheng Guo,1,2,3 Chu Guo,1,2,3 Shaojun Guo,1,2,3 Lianchen Han,1,2,3 Linyin Hong,5 He-Liang Huang,1,2,3,4 Yong-Heng Huo,1,2,3 Liping Li,2 Na Li,1,2,3 Shaowei Li,1,2,3 Yuan Li,1,2,3 Futian Liang,1,2,3 Chun Lin,6 Jin Lin,1,2,3 Haoran Qian,1,2,3 Dan Qiao,2 Hao Rong,1,2,3 Hong Su,1,2,3 Lihua Sun,1,2,3 Liangyuan Wang,1,2,3 Yu,1,2,3 Kai Yan,2 WeiFei Yang,5 Yang Yang,2 Yangsen Ye,1,2,3 Jianghan Yin,2 Chong Ying,1,2,3 Jiale Yu,1,2,3 Chen Zha,1,2,3 Cha Zhang,1,2,3 Haibin Zhang,2 Kaili Zhang,1,2,3 Yiming Zhang,1,2,3 Han Zhao,2 Youwei Zhao,1,2,3 Liang Zhou,5 Qingling Zhu,1,2,3 Chao-Yang Lu,1,2,3 Cheng-Zhi Peng,1,2,3 Xiaobo Zhu,1,2,3 and Jian-Wei Pan1,2,3

1Hefei National Laboratory for Physical Sciences at the Microscale and Department of Modern Physics, University of Science and Technology of China, Hefei 230026, China
2Shanghai Branch, CAS Center for Excellence in Quantum Information and Quantum Physics, University of Science and Technology of China, Shanghai 201315, China
3Shanghai Research Center for Quantum Sciences, Shanghai 201315, China
4Henan Key Laboratory of Quantum Information and Cryptography, Zhengzhou 450000, China
5QuantumCTek Co., Ltd., Hefei 230026, China
6Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China

(Received 1 July 2021; accepted 26 August 2021; published 25 October 2021)

Scaling up to a large number of qubits with high-precision control is essential in demonstrations of quantum computational advantage in order to exponentially outpace classical hardware and algorithmic improvements. Here, we develop a two-dimensional programmable superconducting quantum processor, Zuchongzhi, which is composed of 66 functional qubits in a tunable coupling architecture. To characterize the performance of the whole system, we perform random quantum circuit sampling for benchmarking, up to a system size of 56 qubits and 20 cycles. The computational cost of the classical simulation of this task is estimated to be 2–3 orders of magnitude higher than that of the previous work on the 53-qubit Sycamore processor [Nature 574, 505 (2019)]. We estimate that the sampling task finished by Zuchongzhi in about 1.2 h would take the most powerful supercomputer at least 8 yr. Our work establishes an unambiguous quantum computational advantage that is infeasible for classical computation in a reasonable amount of time. The high-precision and programmable quantum computing platform opens a new door to explore novel many-body phenomena and implement complex quantum algorithms.

DOI: 10.1103/PhysRevLett.127.180501

Introduction.—In the past years, encouraging progress has been made in the physical realizations of quantum computers [1–4], indicating a transition of quantum computing from a theoretical picture to a nascent technology. A major milestone along the way is the demonstration of quantum computational advantage, which is also known as quantum supremacy. It is defined by a quantum device that can implement a well-defined task overwhelmingly faster than any classical computer to an extent that no classical computer can complete the task within a reasonable amount of time.

To this end, recent experiments using 53 superconducting qubits and 76 photons have provided strong evidence to demonstrate the quantum computational advantage and subsequently disprove the extended Church-Turing thesis [5–10]. Because of continuous improvements in the classical algorithm and hardware [11–13] to compete with the quantum computers, the demonstration of a quantum computational advantage is not a single-shot achievement; the quantum hardware has to be upgraded. It should be emphasized that the increase of qubits is expected to exponentially outpace the classical performance.

Simultaneously increasing the number of qubits and the fidelity of quantum logic gates is also crucial for the rapid development of noisy intermediate-scale quantum (NISQ) technology [14] and the demonstration of a logical qubit through surface code error correction [15–20]. Indeed, a wide range of near-term applications are being investigated, including quantum chemistry [21–23], quantum many-body physics [24–31], and quantum machine learning [32–38].

Scaling up high-fidelity superconducting quantum processors faces major challenges in chip fabrication and qubit control. In this work, we make progress toward building a larger-scale and high-performance superconducting quantum computing system, named Zuchongzhi. The quantum processor is designed and fabricated with a two-dimensional and tunable coupling architecture, which contains a total of 66 qubits. High-fidelity single-qubit gates (average 99.86%) and two-qubit gates (average 99.41%), as well as readout (average 95.48%), are achieved with this processor while performing simultaneous gate operations on multiple qubits. We use random quantum circuit sampling [6] as a metric to evaluate the overall power of the quantum processor. Experimental results show that our processor is able to complete the sampling task with a system size of up to 56 qubits and 20 cycles. We estimate that the classical computational overhead to simulate Zuchongzhi is 2–3 orders of magnitude higher than that of the task implemented on Google’s 53-qubit Sycamore processor [9]. Therefore, our experiment unambiguously establishes a computational task that can be completed by a quantum computer in 1.2 h but would take an unreasonable amount of time on any supercomputer.

High-performance quantum processor.—The Zuchongzhi quantum processor consists of 66 qubits, arrayed in 11 rows and 6 columns forming a two-dimensional rectangular lattice pattern, as depicted in the device schematic in Fig. 1(a). The quantum processor uses transmon qubits [39], which are essentially nonlinear oscillators whose nonlinearity originates from the superconducting Josephson effect. The lowest two energy levels of the nonlinear oscillator are singled out to form the computational space of a qubit, encoded as $|0\rangle$ and $|1\rangle$. Each qubit has two control lines to provide full control of the qubit: a microwave drive line to drive excitations between $|0\rangle$ and $|1\rangle$, and a magnetic flux bias line to tune the qubit resonance frequency. Each qubit, except those at the boundaries, has four tunable couplers to couple to its nearest neighbors [40], with a tunable coupling that can be turned on and off with fast control. The tunable couplers are also transmon qubits [Fig. 1(b)], with frequencies several GHz higher than those of the data qubits, and they always remain in their ground states [41]. A magnetic flux bias line is provided for each coupler to rapidly tune the coupling strength $g$ between neighboring qubits continuously from $\sim -5$ MHz to $\sim -50$ MHz. Each qubit dispersively couples to a readout resonator, which couples to a Purcell filter shared between six qubits; frequency multiplexing [42,43] is used to measure the qubit states simultaneously.

All the quantum circuit components of our quantum processor are fabricated on two separate sapphire chips, which are then stacked together with the indium bump flip-chip technique. The quantum processor chip is wire bonded to a printed circuit board, mounted into a well shielded cryostat, and connected to room temperature control electronics through various microwave components in the wiring.

All 66 qubits and 110 couplers on the quantum processor function properly. Rough calibration results for all 66 qubits, including their decoherence time $T_1$ (average 30.6 $\mu$s at idle frequencies), single-qubit gate fidelity (average 99.86%), two-qubit gate fidelity (average 99.24%), and readout fidelity (average 95.23%), are provided in the Supplemental Material [48]. In this work, we select 56 qubits to demonstrate the random circuit sampling; the selection is chosen to maximize the computational complexity of the classical simulation.

FIG. 1. Device schematic of the Zuchongzhi quantum processor. (a) The Zuchongzhi quantum processor consists of two sapphire chips. One carries 66 qubits and 110 couplers, and each qubit couples to four neighboring qubits except those at the boundaries. The other hosts the readout components and control lines as well as wiring. These two chips are aligned and bonded together with indium bumps. See Supplemental Material for details about the quantum processor design and fabrication, which includes Refs. [44–47]. (b) Simplified circuit schematic of the qubit and coupler.

The quantum processor is controlled and calibrated with a dedicated software system; see the Supplemental Material for details, which includes Refs. [49–51]. We start by calibrating the single-qubit gates. Single-qubit gates are implemented with radio-frequency (RF) pulses, as the qubit frequencies are in the range of 4–6 GHz. Coherent RF pulses resonant with the qubit frequency are fed to the qubits through the microwave control lines to excite the qubits. Pulse shaping is calibrated to prevent leakage outside of the computational space [52]. To enable parallel execution of gates, all the couplers are turned off when single-qubit gates are applied to isolate each qubit. Single-qubit gate performance is susceptible to a number of conditions, such as coupling to a two-level system (TLS), coupling to microwave resonances, microwave cross talk, and residual coupling between qubits. These conditions are mostly qubit frequency dependent, so we use an error model to account for these gate error sources and learn an optimal qubit frequency configuration for all qubits through an optimization process. With the optimal qubit frequency configuration, we are able to obtain high-performance single-qubit gates for all qubits. We use parallel cross-entropy benchmarking (XEB) [6,53] to benchmark single-qubit gate performance. Results show an average single-qubit gate Pauli error $e_1$ of 0.14% when gates are applied simultaneously [Fig. 2(a)].
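As a rough illustration of what an XEB fidelity measures, the sketch below evaluates the standard linear XEB estimator of Ref. [9], $F = 2^n \langle P(x_i) \rangle - 1$, on simulated data. This formula is not spelled out in this section, and everything else here is an assumption made for illustration: the "circuit" is replaced by a random 8-qubit state with Gaussian amplitudes, and the noise model simply mixes ideal samples with uniformly random bit strings. It is not the calibration procedure used on the processor.

```python
import random


def random_state_probs(n_qubits, rng):
    """Output probabilities of a random state (normalized complex Gaussian amplitudes)."""
    dim = 2 ** n_qubits
    weights = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(dim)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_noisy(probs, fidelity, n_samples, rng):
    """Toy noise model (assumption): ideal sample with probability `fidelity`, else uniform."""
    dim = len(probs)
    ideal = rng.choices(range(dim), weights=probs, k=n_samples)
    return [x if rng.random() < fidelity else rng.randrange(dim) for x in ideal]


def linear_xeb(probs, samples):
    """Linear XEB estimator: dim * mean ideal probability of the observed bit strings, minus 1."""
    mean_p = sum(probs[x] for x in samples) / len(samples)
    return len(probs) * mean_p - 1.0


rng = random.Random(1234)
probs = random_state_probs(8, rng)
# Value the estimator would take for noiseless sampling from this particular state.
noiseless_value = len(probs) * sum(p * p for p in probs) - 1.0
samples = sample_noisy(probs, fidelity=0.5, n_samples=100_000, rng=rng)
f_est = linear_xeb(probs, samples)
print(f"estimated: {f_est:.3f}, expected: {0.5 * noiseless_value:.3f}")
```

Under this toy model the estimator averages to the mixing fidelity times the noiseless value, which approaches 1 for large random circuits, so the estimate tracks the fraction of samples drawn from the ideal distribution.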

For the random circuit sampling task, the iSWAP-like gate [9] is used as the two-qubit gate. We bias neighboring qubits into resonance and turn on a coupling of $g \sim 10$ MHz for a time duration of $\sim 32$ ns, which introduces a swap between the qubits, as well as a controlled-phase interaction and single-qubit phase accumulations. All these effects can be modeled as the following unitary matrix [9]:

$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & e^{i(\Delta_x + \Delta_y)} \cos \theta & -ie^{i(\Delta_x - \Delta_{off})} \sin \theta & 0 \\
0 & -ie^{i(\Delta_x + \Delta_{off})} \sin \theta & e^{i(\Delta_x - \Delta_y)} \cos \theta & 0 \\
0 & 0 & 0 & e^{i(2\Delta_{x} - \phi)}
\end{bmatrix}.
$$
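A quick numerical check can make the five-parameter gate model more concrete. The sketch below builds the matrix and verifies that it is unitary for an arbitrary parameter choice. It assumes that both off-diagonal phases carry $\Delta_x$, as in the parametrization of Ref. [9]; with any other common phase there, the two middle columns are not orthogonal for generic parameters. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import cmath
import math


def iswap_like(theta, phi, d_x, d_y, d_off):
    """4x4 iSWAP-like gate matrix in the basis |00>, |01>, |10>, |11>."""
    def phase(angle):
        return cmath.exp(1j * angle)

    c, s = math.cos(theta), math.sin(theta)
    return [
        [1 + 0j, 0j, 0j, 0j],
        [0j, phase(d_x + d_y) * c, -1j * phase(d_x - d_off) * s, 0j],
        [0j, -1j * phase(d_x + d_off) * s, phase(d_x - d_y) * c, 0j],
        [0j, 0j, 0j, phase(2 * d_x - phi)],
    ]


def is_unitary(u, tol=1e-12):
    """Check that the columns of u are orthonormal, i.e. U^dagger U = I."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            overlap = sum(u[k][i].conjugate() * u[k][j] for k in range(n))
            if abs(overlap - (1 if i == j else 0)) > tol:
                return False
    return True


# Arbitrary illustrative parameters (radians), not calibrated values.
gate = iswap_like(theta=1.3, phi=0.4, d_x=0.2, d_y=-0.7, d_off=0.5)
print(is_unitary(gate))
```

With $\theta = \pi/2$ and all phases set to zero, the middle block reduces to a pure swap with a factor of $-i$, which is why the gate is described as iSWAP-like.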

Parallel XEB is also employed to benchmark the iSWAP-like gate performance; an optimization process is used to learn the five parameters $\theta$, $\phi$, $\Delta_x$, $\Delta_y$, and $\Delta_{off}$ by maximizing the XEB fidelities. The length of the flux bias pulses is chosen to minimize leakage to higher energy levels, and pulse distortion and timing are carefully calibrated [54]. The qubit frequencies at which two-qubit gates are performed are also optimized, following a procedure similar to that used for setting the single-qubit operation frequencies, to mitigate the influences of TLS, cross talk, and pulse distortion on gate performance. The average two-qubit gate Pauli error $e_2$ of our processor is 0.59% when all gates are applied simultaneously [Fig. 2(c)].

To optimize readout fidelity and reduce readout cross talk, a different frequency setting for the qubits and couplers is used when performing readout. We calibrate the readout fidelities by preparing all qubits in $|0\rangle$ and in $|1\rangle$ and counting how often the readout result is correctly identified as $|0\rangle$ and $|1\rangle$, respectively. The average single-qubit state readout error of our processor is 4.52% [Fig. 2(b)]. As a sanity check, we also compare the fidelity result with that obtained from preparing the qubits in random bit strings; see the Supplemental Material for details, which includes Ref. [55].

Random quantum circuit benchmarking.—To characterize the overall performance of the quantum processor, we employ the task of random quantum circuit sampling for benchmarking. A random quantum circuit is an outstanding candidate to demonstrate quantum computational advantages, and has potential applications in certified random bits [56], error correction [57], and hydrodynamics simulation [58].

FIG. 2. Single-qubit gate, two-qubit gate, and readout performance of the selected 56 qubits. Single-qubit gate Pauli error $e_1$ (a), qubit state readout error $e_r$ (b), and two-qubit gate Pauli error $e_2$ (c) of the 56 qubits and the 94 couplers used in the random circuit sampling task. The values are provided for all qubits operating simultaneously. See Supplemental Material for the rough calibration results of all 66 qubits and 110 couplers.

Figure 3 shows the gate sequence of our random quantum circuit. Each random quantum circuit is composed of $m$ cycles, and each cycle is composed of a single-qubit gate layer and a two-qubit gate layer. In the single-qubit gate layer, single-qubit gates are applied on all qubits and chosen randomly from the set $\{\sqrt{X}, \sqrt{Y}, \sqrt{W}\}$, where $\sqrt{X} = R_X(\pi/2)$, $\sqrt{Y} = R_Y(\pi/2)$, and $\sqrt{W} = R_{X+Y}(\pi/2)$ are $\pi/2$ rotations about the corresponding axes. Each single-qubit gate on a qubit in a subsequent cycle is independently and randomly chosen from the subset of $\{\sqrt{X}, \sqrt{Y}, \sqrt{W}\}$ that excludes the single-qubit gate applied to this qubit in the preceding cycle. In the two-qubit gate layer, two-qubit gates are applied according to a specified pattern, labeled by $A, B, C,$ and $D$, in the sequence of ABCDCDAB. Finally, an additional single-qubit gate layer is applied after $m$ cycles and before measurement.

![Diagram](image.png)

FIG. 3. 56-qubit random quantum circuit operations. The circuit can be divided into $m$ cycles, and each cycle has a layer of single-qubit gates and two-qubit gates. The single-qubit gates are chosen randomly from the set $\{\sqrt{X}, \sqrt{Y}, \sqrt{W}\}$, while the two-qubit gates are chosen from the patterns $A, B, C,$ and $D$ in the sequence of ABCDCDAB. The circles in the upper left corner of the diagram represent qubits, and the discarded qubits are marked with a shaded color. The orange, blue, green, and red lines represent the two-qubit gates of the four patterns $A, B, C,$ and $D$, respectively.

With just a few cycles, the random quantum circuit could generate a highly entangled state. Two variant circuits, the patch circuit and the elided circuit, are utilized to estimate the XEB fidelity of quantum circuits within our classical computing capabilities. The “patch” circuits are designed by removing a slice of two-qubit gates, while the “elided” circuits only remove a fraction of the gates between the patches. In these two variant circuits, the amount of entanglement involved is reduced so that it is feasible to classically simulate the experiments and thus determine $F_{\text{XEB}}$. We test the linear XEB fidelities of these two variant circuits and the full version of the circuits ranging from 15 qubits to 56 qubits with 10 cycles [see Fig. 4(a)]. Over all of these circuits, the fidelities derived from patch and elided circuits are in good agreement with the fidelities obtained with the corresponding full circuits, with average deviations of ~5% and ~10%, respectively, dominated by system fluctuations. The achieved results indicate that patch circuits and elided circuits could be used as performance estimators for large systems.

We now turn to testing 56-qubit circuits with increasing numbers of cycles. The output bit strings of full, patch, and elided circuits from 12 to 20 cycles are all sampled in our experiments. However, the verification of the full circuit becomes challenging in this regime due to our limited classical computing resources. Therefore, we use the previously tested patch and elided circuits to assess performance. Figure 4(b) shows the linear XEB results for patch circuits and elided circuits. For each cycle, a total of ten randomly generated circuit instances are executed and sampled. We collect approximately $1.9 \times 10^7$ bit strings for each 56-qubit circuit with 20 cycles; the fidelities for these ten elided circuits are given in the inset of Fig. 4(b). Each individual circuit instance fidelity is nearly inside the ±σ statistical error band for a single instance, indicating the stability of the system and the unbiasedness of noise. We then apply inverse-variance weighting over these ten random circuits, yielding $F = (6.62 \pm 0.72) \times 10^{-4}$ for the combined linear XEB fidelity of the 56-qubit 20-cycle circuits. The null hypothesis of uniform sampling ($F = 0$) is thus rejected with a significance of $9\sigma$.

In addition, the observed fidelity of each circuit, as well as the decay of XEB fidelities with qubits $n$ and cycles $m$, match the predicted fidelity calculated from a simple multiplication of individual operations quite well. This result provides convincing evidence to confirm the low correlation of errors of each individual operation, including single- and two-qubit gates, as well as readout, which is a critical aspect for quantum error correction.

Computational cost estimation.—We finally estimate the classical computational cost of our hardest circuits, i.e., the 56-qubit random quantum circuit with 20 cycles. The estimation is based on two types of classical algorithms which are considered state of the art for classically simulating quantum circuits, namely, the tensor network algorithm and the Schrödinger-Feynman algorithm.

The tensor network algorithm reduces the problem of computing amplitudes to contracting tensor networks. It is a single-amplitude algorithm in that the complexity grows linearly with the number of amplitudes, and it has been shown to perform excellently for relatively shallow quantum circuits [12,13,59–63]. The computational cost of tensor network algorithms is determined by the tensor contraction path. To identify an optimal tensor contraction path, we use the PYTHON package COTENGRA [64], which has been shown to be capable of reproducing state-of-the-art results in Refs. [12,13]. The number of floating point operations to generate one perfect sample from the 53-qubit 20-cycle random circuit used in Ref. [9] and our 56-qubit 20-cycle random circuit is estimated as $1.63 \times 10^{18}$ and $1.65 \times 10^{20}$, respectively. In Ref. [9], $3 \times 10^6$ samples were collected over one circuit instance with 0.224% fidelity, while we have collected $1.9 \times 10^7$ samples with 0.0662% fidelity. Theoretically, it would therefore cost a total of $1.10 \times 10^{22}$ and $2.08 \times 10^{24}$ floating point operations, respectively, to reproduce the same results as Ref. [9] and our work using a classical computer (see Supplemental Material for details, which includes Refs. [65–69]).
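The quoted totals for the tensor network estimate are consistent with a simple product: the cost of one perfect sample, times the number of samples, times the XEB fidelity (a noisy sampler at fidelity $F$ only needs a fraction $F$ of the samples to be exact). The sketch below reproduces that arithmetic from the numbers in the text. The product form is our reading of those numbers rather than a formula stated in this section, and the function name is hypothetical.

```python
def total_flops(flops_per_sample, n_samples, fidelity):
    """Estimated floating point operations to spoof a sampling run at a given fidelity."""
    return flops_per_sample * n_samples * fidelity


# 53-qubit, 20-cycle task of Ref. [9]: 3e6 samples at 0.224% fidelity.
sycamore = total_flops(1.63e18, 3e6, 0.00224)
# 56-qubit, 20-cycle task of this work: 1.9e7 samples at 0.0662% fidelity.
zuchongzhi = total_flops(1.65e20, 1.9e7, 0.000662)

print(f"Ref. [9]:  {sycamore:.2e} FLOPs")
print(f"This work: {zuchongzhi:.2e} FLOPs")
print(f"Ratio:     {zuchongzhi / sycamore:.0f}")
```

The ratio of the two totals comes out a little under 200, which is the tensor network contribution to the stated gap of 2–3 orders of magnitude.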

In comparison, the Schrödinger-Feynman algorithm is a full-amplitude algorithm in that computing an arbitrarily chosen branch of amplitudes is almost as hard as computing a single amplitude. Similar to Ref. [9], we estimate that it would cost $5.76 \times 10^{17}$ core hours for the task of simulating a 56-qubit 20-cycle random quantum circuit sampling with 0.0662% fidelity using the Schrödinger-Feynman algorithm, while simulating the previous task on the 53-qubit 20-cycle circuit (0.224% fidelity [9]) would cost $8.90 \times 10^{13}$ core hours (see Supplemental Material for details). Thus, our 56-qubit 20-cycle random quantum circuit is about 6000 times harder to classically simulate using the Schrödinger-Feynman algorithm. The increased difficulty has two sources: the larger number of qubits and the larger number of cross gates when separating the circuit into two patches.

Therefore, using the tensor network algorithm or the Schrödinger-Feynman algorithm, the classical computational cost of our sampling task with a 56-qubit 20-cycle random circuit is about 2–3 orders of magnitude greater than that of the previous task with 53 qubits and 20 cycles [9]. This indicates that our work significantly enlarges the gap between the computational advantages of quantum devices and the classical simulations. In particular, as discussed in the Supplemental Material, it is estimated that it will take 15.9 days to simulate the previous sampling task in Ref. [9] using the tensor network algorithm on SUMMIT, whereas simulating our sampling task will take 8.2 yr. We anticipate the development of more efficient classical simulation approaches. On the one hand, the competition between quantum and classical computing will continue; on the other hand, more efficient classical simulation methods are necessary for large-scale quantum computing benchmarking.
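The runtime figures can be cross-checked against the cost estimates with a few lines of arithmetic. The sketch below assumes that the simulation time on a fixed machine scales linearly with the total floating point operation count (an assumption made here for illustration; the actual estimate is described in the Supplemental Material) and takes the totals and core-hour figures quoted in the text as inputs.

```python
import math

DAYS_PER_YEAR = 365.25

# Tensor network totals quoted in the text (floating point operations).
tn_previous, tn_this_work = 1.10e22, 2.08e24
# Schrödinger-Feynman estimates quoted in the text (core hours).
sfa_previous, sfa_this_work = 8.90e13, 5.76e17
# Quoted tensor network runtime for the task of Ref. [9].
days_previous = 15.9

tn_ratio = tn_this_work / tn_previous
sfa_ratio = sfa_this_work / sfa_previous

# Assumption: runtime is proportional to the operation count on the same machine.
years_this_work = days_previous * tn_ratio / DAYS_PER_YEAR

print(f"tensor network ratio:       {tn_ratio:.0f} ({math.log10(tn_ratio):.1f} orders of magnitude)")
print(f"Schrödinger-Feynman ratio:  {sfa_ratio:.0f} ({math.log10(sfa_ratio):.1f} orders of magnitude)")
print(f"scaled runtime, this work:  {years_this_work:.1f} yr")
```

Scaling 15.9 days by the tensor network ratio lands on roughly 8.2 yr, so the two quoted runtimes and the two quoted operation counts are mutually consistent under the linear-scaling assumption.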

Conclusion.—In conclusion, we have reported the design, fabrication, measurement, and benchmarking of a state-of-the-art 66-qubit superconducting quantum processor that is fully programmable through electric control. We are able to achieve high-fidelity logic operations of the full quantum circuit. Our experimental results of a random quantum circuit with 56 qubits and 20 cycles on the Zuchongzhi quantum processor established a new record to challenge the classical computing capability. We note that the performance of the whole system behaves as predicted when the system size grows from small to large, confirming our high-fidelity quantum operations and low correlated errors on the Zuchongzhi processor. The quantum processor has a scalable architecture that is compatible with surface-code error correction, which can act as the test bed for fault-tolerant quantum computing. We also expect that this large-scale, high-performance quantum processor could enable us to pursue valuable NISQ quantum applications beyond classical computers in the near future. A related experiment demonstrating quantum computational advantage with up to 113 photons is reported in [70].

We thank Run-Ze Liu, Wen Liu, Chenggang Zhou, Pan Zhang, and Junjie Wu for very helpful discussions and assistance. The classical calculations were performed on the supercomputing system in the Supercomputing Center of University of Science and Technology of China. The authors thank the USTC Center for Micro- and Nanoscale Research and Fabrication for supporting the sample fabrication. The authors also thank QuantumCTek Co., Ltd., for supporting the fabrication and the maintenance of room-temperature electronics. This research was supported by the National Key R&D Program of China (Grant No. 2017YFA0304300), the Chinese Academy of Sciences, Anhui Initiative in Quantum Information Technologies, Technology Committee of Shanghai Municipality, National Natural Science Foundation of China (Grants No. 11905217, No. 11744326, and No. 11905294), Shanghai Municipal Science and Technology Major Project (Grant No. 2019SHZDZX01), Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B0303030001), and the Youth Talent Lifting Project (Grant No. 2020-JCJQ-QT-030).


[56] S. Aaronson (private communication).