Enhanced Measurement of Neutral Atom Qubits with Machine Learning (2024)

L. Phuttitarn, B. M. Becker, R. Chinnarasu, T. M. Graham, and M. Saffman
Department of Physics, University of Wisconsin-Madison, 1150 University Avenue, Madison, WI, 53706, USA
M. Saffman is also with Infleqtion, Inc., Madison, WI, 53703, USA

(May 1, 2024)

Abstract

We demonstrate qubit state measurements assisted by a supervised convolutional neural network (CNN) in a neutral atom quantum processor. We present two CNN architectures for analyzing neutral atom qubit readout data: a compact 5-layer single-qubit CNN architecture and a 6-layer multi-qubit CNN architecture. We benchmark both architectures against a conventional Gaussian threshold analysis method. In a sparse array (9 μm atom separation), which experiences negligible crosstalk, we observed up to 32% and 56% error reduction for the multi-qubit and single-qubit architectures respectively, as compared to the benchmark. In a tightly spaced array (5 μm atom separation), which suffers from readout crosstalk, we observed up to 43% and 32% error reduction for the multi-qubit and single-qubit CNN architectures respectively, as compared to the benchmark. By examining the correlation between the predicted states of neighboring qubits, we found that the multi-qubit CNN architecture reduces the crosstalk correlation by up to 78.5%. This work is a proof of concept for a CNN to be implemented as a real-time readout processing method on a neutral atom quantum computer, enabling faster readout times and improved fidelity.

I Introduction

Scalable quantum computation requires precise, high-fidelity initialization, control, and measurement of the quantum state of a large number of qubits. The number of sequential operations that can be performed is limited by qubit decoherence, resulting from imperfect qubit control mechanisms and unintended interaction with the environment. The leading approach to overcoming this limitation is mid-circuit-measurement-based quantum error correction (see Acharya et al. (2023) and references therein). The ability to make high-fidelity measurements with minimal collateral disruption to the system is not only relevant to initialization and final readout; it is also essential to achieving quantum error correction.

Neutral atom quantum computing has matured remarkably in recent years. Single- and two-qubit gates have been demonstrated on neutral atoms with fidelity well above 99% Nikolov et al. (2023); Evered et al. (2023). Multi-qubit quantum circuits Graham et al. (2022); Bluvstein et al. (2022) and mid-circuit measurements have also been demonstrated Deist et al. (2022); Singh et al. (2023); Graham et al. (2023); Norcia et al. (2023); Ma et al. (2023); Lis et al. (2023). Qubit state measurements in a neutral atom array are achieved by probing the array with light detuned from a cycling transition. The resulting fluorescence is captured with a high-quantum-efficiency imaging device such as an EMCCD or sCMOS sensor, producing a greyscale image of the neutral atom array. Conventionally, the state of each qubit is then determined by integrating the photon counts over a region of interest (ROI) and applying a linear threshold that optimally separates the two states' probability distributions (see Fig. 3). The fidelity of the state detection depends on the separability of the two distributions, which in turn depends on the signal-to-noise ratio. To achieve a fidelity above 99% with this method, the typical probing period is tens of ms. This is a significant delay, given that the longest gate operation takes only several μs. Shortening the exposure time decreases the probing period, but also reduces the signal-to-noise ratio and the fidelity. One could increase the power of the probing laser to compensate, but too much power causes atom loss through heating. This compromise limits the signal-to-noise ratio and imposes a lower bound on the probing period, given a desired measurement fidelity and an acceptable rate of qubit loss.

It is possible to further reduce the probing period without loss of fidelity by using more efficient image analysis algorithms. Examples include Independent Component Analysis Xia et al. (2015), the Bayesian inference algorithm Martinez-Dorantes et al. (2017), and supervised deep neural networks (DNNs) Syberfeldt and Vuoluterä (2020). In this work, we present enhanced state detection using a convolutional neural network (CNN). CNNs are a sub-category of DNNs that are especially well suited to image classification. Supervised DNNs have already been demonstrated to improve readout fidelity on other leading quantum computing platforms, including trapped-ion Seif et al. (2018); Ding et al. (2019), superconducting Bravyi et al. (2021); Lienhard et al. (2022), and quantum dot qubits Darulová et al. (2021); Matsumoto et al. (2021). To the authors' knowledge, this is the first demonstration of the use of deep learning for neutral atom qubit state detection.

II CNN Designs

Since state detection for neutral atom qubits is performed by imaging the atoms, a CNN is ideal for the task. The principles of how CNNs work are well known, and we refer to the existing literature for more details Goodfellow et al. (2016). In what follows, we focus on the particulars of the CNNs used in our experiments.

We implemented and benchmarked two different CNN architectures for qubit state detection. The first architecture, CNN-site, is a single-qubit classifier. Analyzing one qubit at a time, it replaces the conventional linear threshold method for real-time applications, achieving superior detection fidelity. The second architecture, CNN-array, is a multi-qubit classifier. Because it analyzes the entire qubit array in parallel, it is crosstalk-aware. Compared to CNN-site, it boasts improved detection fidelity on large, tightly spaced arrays at the expense of real-time computational efficiency. Both architectures are implemented in Python using the Keras API Chollet et al. (2015) and the TensorFlow library Abadi et al. (2015) with GPU acceleration.

[Figure 1: CNN-site network architecture]
Table 1: CNN-site layer dimensions.

Layer   | Filter Shape | Output Shape | # Params
Input   | N/A          | 10×10×1      | 0
Conv2D  | 3×3×1×32     | 8×8×32       | 320
Conv2D  | 3×3×32×64    | 6×6×64       | 18,496
Conv2D  | 3×3×64×128   | 4×4×128      | 73,856
Flatten | N/A          | 2,048        | 0
Dense   | 2,048×128    | 128          | 262,272
Dense   | 128×2        | 2            | 258
Total trainable parameters: 355,202
[Figure 2: CNN-array network architecture]
Table 2: CNN-array layer dimensions (nine fully-connected instances, one per qubit).

Layer   | Filter Shape  | Output Shape | # Params
Input   | N/A           | 28×28×1      | 0
Conv2D  | 3×3×1×32      | 26×26×32     | 320
Conv2D  | 3×3×32×64     | 24×24×64     | 18,496
Conv2D  | 3×3×64×128    | 22×22×128    | 73,856
Flatten | N/A           | 61,952       | 0
Dense   | (2,048×128)×9 | 128×9        | 74,613,888
Dense   | (128×64)×9    | 64×9         | 74,304
Dense   | (64×2)×9      | 2×9          | 1,170
Total trainable parameters: 74,782,034

II.1 CNN-site

CNN-site is a compact, five-layer single-qubit state classifier, as shown in Fig. 1. It consists of three convolutional (CONV) layers followed by two fully-connected (FC) layers. The activation function is a rectified linear unit, g(z) = max(0, z), for all layers except the final FC layer, which uses a Softmax activation function. (The Softmax function is defined as g(z_c) = e^{z_c} / Σ_{d=1}^{C_out} e^{z_d}; it normalizes the outputs so that Σ_{c=1}^{C_out} g(z_c) = 1, allowing each output element to be interpreted as a weighted probability.) As input, CNN-site receives a cropped monochromatic image of a single site of the atomic array.
The input images are pre-processed by mean subtraction and normalization (see Appendix A for details). The output nodes represent the weighted probabilities of the atom being in a dark or bright state.
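As a concrete illustration, the CNN-site topology in Table 1 can be written in a few lines of Keras (the library the authors use). Layer sizes follow Table 1; the optimizer, loss, and lack of padding are our assumptions, not details given in the text.

```python
# Sketch of the CNN-site topology (Table 1): three 3x3 "valid" conv layers
# (32, 64, 128 filters, ReLU), then two dense layers, ending in a softmax
# over the two classes (dark, bright). Optimizer/loss are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn_site():
    return keras.Sequential([
        keras.Input(shape=(10, 10, 1)),            # cropped single-site image
        layers.Conv2D(32, 3, activation="relu"),   # -> 8x8x32
        layers.Conv2D(64, 3, activation="relu"),   # -> 6x6x64
        layers.Conv2D(128, 3, activation="relu"),  # -> 4x4x128
        layers.Flatten(),                          # -> 2,048
        layers.Dense(128, activation="relu"),
        layers.Dense(2, activation="softmax"),     # P(dark), P(bright)
    ])

model = build_cnn_site()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

With these shapes the parameter count reproduces the 355,202 total quoted in Table 1.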

The dimensions of the CNN layers were experimentally varied in order to discover the set of parameters that consistently converged to a high state detection fidelity during training. Appreciably shrinking the network resulted in sporadic convergence failures (as indicated by low fidelities corresponding to random guessing) due to nondeterministic initialization of the network weights. On the other hand, appreciably increasing the width or depth of the network did not yield further fidelity improvements – the fidelities were almost identical amongst networks that did converge. This reflects the relative simplicity of the input images; the task can be completed with only 3 convolutional layers because higher-complexity features, which additional convolutional layers could extract, are not present in the data. We believe that CNN-site could be further optimized without any detrimental impact to detection fidelity, but this would require further iterative fine-tuning and would not alter the conclusions of this work. The number of parameters is already very small in comparison to typical CNNs. Its low parameter count makes it quick enough for real-time applications, including atom rearrangement and mid-circuit measurement if accelerated using a neural network accelerator (NNA) or an FPGA.

II.2 CNN-array

CNN-array is a six-layer multi-qubit state classifier that improves state detection in closely-spaced arrays where crosstalk between neighboring sites is appreciable. As input, CNN-array accepts pre-processed images of the full atomic array. Its architecture, pictured in Fig. 2, is identical to that of CNN-site except that (a) the input image is larger, (b) there is an additional FC layer, and (c) there are multiple instances of each FC layer, one per qubit in the array. It produces N simultaneous outputs with the same format as CNN-site, indicating the two-state probabilities of each qubit in the array. Since each instance of the FC layers can "see" the features of all of the other qubits in the array, CNN-array can identify and accommodate correlations between neighboring qubits. Owing to the larger input image and the all-to-all fanout of the FC layers' connections, CNN-array contains ∼210 times more parameters than CNN-site, making real-time implementation challenging, even with an NNA or FPGA. Optimization is likely possible and is a subject for future work.
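A minimal functional-API sketch of this layout: one shared convolutional trunk over the full array image, then nine per-qubit FC heads, each fanning out from the same flattened feature vector so that every head "sees" all sites. Conv sizes follow Table 2; head names and all unstated details are illustrative assumptions, and the exact parameter total depends on choices the paper does not spell out.

```python
# Sketch of CNN-array: shared conv trunk + N_QUBITS independent FC heads,
# all connected to the full flattened feature vector (all-to-all fanout).
from tensorflow import keras
from tensorflow.keras import layers

N_QUBITS = 9  # 3x3 array

def build_cnn_array():
    inp = keras.Input(shape=(28, 28, 1))               # full-array image
    x = layers.Conv2D(32, 3, activation="relu")(inp)   # -> 26x26x32
    x = layers.Conv2D(64, 3, activation="relu")(x)     # -> 24x24x64
    x = layers.Conv2D(128, 3, activation="relu")(x)    # -> 22x22x128
    feats = layers.Flatten()(x)                        # -> 61,952
    outputs = []
    for q in range(N_QUBITS):
        # Each head sees the features of the whole array, which is what
        # lets it compensate for crosstalk from neighboring sites.
        h = layers.Dense(128, activation="relu")(feats)
        h = layers.Dense(64, activation="relu")(h)
        outputs.append(layers.Dense(2, activation="softmax",
                                    name=f"qubit_{q}")(h))
    return keras.Model(inp, outputs)

model = build_cnn_array()
```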

II.3 Gaussian-threshold & Square-threshold

For completeness, we present the conventional Gaussian threshold and square-threshold methods, which we used as our performance benchmark.

First, the region of interest of each atom (the location in the image where the atom is expected to be found) for each site is determined by averaging all the readout images in the entire dataset and applying a circular Gaussian fit to the result. This locates the center of the distribution, corresponding to the central position of the atom, and the standard deviation σ, corresponding to the size of the atom in pixels. For the Gaussian-threshold method, the resulting best-fit parameters are used to generate a 2D Gaussian mask for each site: a value in the range 0.0 to 1.0 for each pixel in the image. For the square-threshold method, we replace the Gaussian mask with a binary square mask with sides of length 2σ.
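The fitting and mask-generation steps can be sketched as follows; the function names, initial guesses, and image shapes are illustrative assumptions rather than the authors' code.

```python
# Fit a circular 2D Gaussian to the mean readout image to locate one site,
# then build either a Gaussian weight mask or a binary square mask (side
# 2*sigma) for that site.
import numpy as np
from scipy.optimize import curve_fit

def fit_site(mean_image, x0_guess, y0_guess):
    """Return (amplitude, x0, y0, sigma, offset) of a circular Gaussian."""
    ys, xs = np.mgrid[0:mean_image.shape[0], 0:mean_image.shape[1]]
    def gauss(coords, amp, x0, y0, sigma, offset):
        x, y = coords
        return amp * np.exp(-((x - x0)**2 + (y - y0)**2)
                            / (2 * sigma**2)) + offset
    p0 = [mean_image.max(), x0_guess, y0_guess, 2.0, mean_image.min()]
    popt, _ = curve_fit(gauss, (xs.ravel(), ys.ravel()),
                        mean_image.ravel(), p0=p0)
    return popt

def gaussian_mask(shape, x0, y0, sigma):
    """Per-pixel weights in [0, 1] for the Gaussian-threshold method."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((xs - x0)**2 + (ys - y0)**2) / (2 * sigma**2))

def square_mask(shape, x0, y0, sigma):
    """Binary mask with sides of length 2*sigma, centered on the site."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return ((np.abs(xs - x0) <= sigma)
            & (np.abs(ys - y0) <= sigma)).astype(float)
```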

The mask is multiplied element-wise with each readout image and then summed over all pixels, returning a single integrated value per atom. The quantum state of the atom is predicted to be a bright (dark) state if the integrated value is above (below) a pre-determined threshold. The threshold itself is determined by collating the integrated data of many images of randomly-loaded arrays. The histogram of the collated data is shown in Fig. 3 (lower-left, lower-right) along with the threshold (dotted line). The distribution resembles a mixture of two Gaussian distributions, one for each binary state (dark or bright). The parameters of the two-Gaussian mixture can be obtained by applying a curve fit to the histogram. The threshold that maximizes the separation between the two states is the intersection point of the two fitted Gaussian curves.
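The threshold determination described above can be sketched as: fit a two-Gaussian mixture to the histogram of integrated counts, then scan between the two fitted means for the intersection of the curves. The synthetic data in the usage example are purely illustrative.

```python
# Two-Gaussian mixture fit to the histogram of integrated counts; the
# discrimination threshold is where the two fitted curves intersect.
import numpy as np
from scipy.optimize import curve_fit

def two_gauss(x, a1, mu1, s1, a2, mu2, s2):
    g = lambda a, mu, s: a * np.exp(-(x - mu)**2 / (2 * s**2))
    return g(a1, mu1, s1) + g(a2, mu2, s2)

def find_threshold(counts, bins, p0):
    centers = 0.5 * (bins[:-1] + bins[1:])
    (a1, mu1, s1, a2, mu2, s2), _ = curve_fit(two_gauss, centers,
                                              counts, p0=p0)
    # Scan between the two means for the point where the fitted dark and
    # bright curves cross, i.e. where the two error rates balance.
    xs = np.linspace(min(mu1, mu2), max(mu1, mu2), 10000)
    g1 = a1 * np.exp(-(xs - mu1)**2 / (2 * s1**2))
    g2 = a2 * np.exp(-(xs - mu2)**2 / (2 * s2**2))
    return xs[np.argmin(np.abs(g1 - g2))]

# Illustrative usage on synthetic integrated counts (made-up parameters):
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(100, 30, 5000),   # dark state
                       rng.normal(300, 30, 5000)])  # bright state
counts, bins = np.histogram(data, bins=100)
threshold = find_threshold(counts, bins, p0=[200, 100, 30, 200, 300, 30])
```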

[Figure 3: Readout images and histograms of integrated counts (lower-left, lower-right) with the detection threshold shown as a dotted line]

III Experimental Methods

To collect data, Cs atoms are loaded into a 3×3 tweezer array generated by a two-dimensional acousto-optic deflector (AOD). The array is imaged with 852 nm light that is red-detuned from the 6s_{1/2}, f=4 → 6p_{3/2}, f=5 transition by 9γ (γ = 2π × 5.2 MHz). Fluorescence light is collected via two opposing 0.7 NA objective lenses. State measurements are performed destructively by first pushing out one of the states, followed by occupancy imaging with a hyperfine repump laser to prevent atoms from going dark due to Raman transitions. Although state measurements can be performed non-destructively Kwon et al. (2017), the destructive method used here provides the lowest possible uncertainty in state measurements. The two collection paths are projected onto two adjacent regions of the EMCCD sensor (Andor iXon 897).

In order to train and evaluate the performance of the two CNN networks, we acquired a dataset consisting of noisy input data paired with the corresponding "ground truth" labels. We collected data at site spacings of 5 μm and 9 μm, varying the readout time (secondary path state separation) from 10 ms (2.17σ) to 100 ms (66.6σ). The conventional linear threshold method is applied to the high-SNR images in order to generate the "ground truth" labels with >99.8% fidelity in all sets. Each dataset contains 3,000 single shots of the 3×3 array with ∼50% occupancy for each readout.

We do not expect the small labeling error in the dataset to create an upper bound on achievable fidelity. Convolutional neural networks and deep neural networks are known to be robust against small amounts of unbiased mislabelled data Fard et al. (2017); Rolnick et al. (2018). The fraction of labelling errors in our dataset (<0.2%) is small enough not to impose an upper limit on the achievable fidelity with a supervised learning method. For a system with lower readout fidelity, such that there is a significant amount of mislabelled data, conventional training methods can be supplemented with transfer learning Oquab et al. (2014); Azizpour et al. (2015), semi-supervised learning Weston et al. (2012), or unsupervised learning Dosovitskiy et al. (2014).

Each dataset is then randomly divided into three subsets with the distribution: 60% training, 20% validation, and 20% test. The training subset is used as input for all iterations of CNN training. The validation subset is used between training epochs to monitor the performance of the partially-trained CNN and detect overfitting. The test subset is reserved for final performance evaluation, to ensure that the CNNs are benchmarked on data that the CNNs have no prior access to. For specifics about the CNN training procedure, see Appendix A. All quoted fidelities were evaluated on the test subset.
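The 60/20/20 split described above can be implemented with a single shuffled permutation of shot indices; the function name and seed are illustrative assumptions.

```python
# Randomly partition shot indices into 60% training, 20% validation,
# and 20% test subsets.
import numpy as np

def split_dataset(n_shots, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_shots)
    n_train = int(0.6 * n_shots)
    n_val = int(0.2 * n_shots)
    return (idx[:n_train],                  # training
            idx[n_train:n_train + n_val],   # validation
            idx[n_train + n_val:])          # test

train_idx, val_idx, test_idx = split_dataset(3000)
```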

IV Results

[Figure 4: State detection performance of all four methods versus readout time for the 9 μm spaced array]
Table 3: Cross-fidelities F^CF_ij for the 9 μm spaced array.

Central to Nearest Neighbor
            | Gaussian | Square  | CNN-site | CNN-array
F^CF_52     | 0.0218   | 0.0072  | 0.0048   | 0.0043
F^CF_54     | 0.0300   | 0.0267  | 0.0236   | 0.0238
F^CF_56     | 0.0075   | 0.0025  | -0.0090  | -0.0048
F^CF_58     | 0.0065   | 0.0186  | 0.0046   | 0.0095
⟨|F^CF_5j|⟩ | 0.0165   | 0.0138  | 0.0060   | 0.0082

Edge-to-Edge
F^CF_13     | -0.0086  | -0.0131 | -0.0059  | -0.0180
F^CF_79     | -0.0134  | -0.0172 | -0.0108  | -0.0118
F^CF_17     | 0.0024   | 0.0151  | -0.0004  | 0.0069
F^CF_39     | 0.0095   | 0.0040  | 0.0007   | 0.0004
F^CF_19     | -0.0098  | -0.0108 | -0.0163  | -0.0160
F^CF_37     | -0.0030  | -0.0006 | 0.0007   | -0.0055
⟨|F^CF_ij|⟩ | 0.0101   | 0.0078  | 0.0058   | 0.0096

In this section, we present the CNN classification performance compared to conventional methods. We first consider the performance under low-crosstalk conditions (9 μm array spacing), then examine the case with increased crosstalk (5 μm array spacing). Crosstalk occurs when a percentage of photons emitted by an atom strike the EMCCD array at the location corresponding to a different site in the atomic array. This increases the photon count for the "wrong" atom, making correct state detection more challenging.

Performance is measured in terms of classification fidelity Lienhard et al. (2022), defined as

F = 1 − [P(B_p|D) + P(D_p|B)] / 2        (1)

where D and B are the true dark and bright qubit states respectively, and D_p and B_p are the predicted dark and bright states. P(B_p|D) and P(D_p|B) represent the false-bright and false-dark prediction probabilities respectively. To quantify the prediction correlation between two sites due to crosstalk, we define the cross-fidelity Chen et al. (2021) as

F^CF_ij = 1 − [P(D_i|B_j) + P(B_i|D_j)]        (2)

where D_i and B_i are the dark and bright states of qubit i, and D_j and B_j are the dark and bright states of qubit j. In the absence of crosstalk, the state of qubit i is independent of that of qubit j, and each conditional probability, e.g. P(D_i|B_j), is 0.5. This yields F^CF_ij = 0 for the crosstalk-free case. Positive (negative) F^CF_ij values indicate correlation (anti-correlation) between sites i and j.
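Both figures of merit are straightforward to estimate from arrays of binary state labels (0 = dark, 1 = bright); the functions below are a direct transcription of Eqs. (1) and (2) with illustrative names, where the cross-fidelity is computed between the predicted states of two sites as in the text.

```python
# Estimators for classification fidelity, Eq. (1), and cross-fidelity,
# Eq. (2). States are encoded 0 = dark (D), 1 = bright (B).
import numpy as np

def classification_fidelity(true_states, predicted):
    true_states = np.asarray(true_states)
    predicted = np.asarray(predicted)
    p_false_bright = np.mean(predicted[true_states == 0] == 1)  # P(B_p|D)
    p_false_dark = np.mean(predicted[true_states == 1] == 0)    # P(D_p|B)
    return 1.0 - 0.5 * (p_false_bright + p_false_dark)

def cross_fidelity(states_i, states_j):
    states_i = np.asarray(states_i)
    states_j = np.asarray(states_j)
    p_dark_given_bright = np.mean(states_i[states_j == 1] == 0)  # P(D_i|B_j)
    p_bright_given_dark = np.mean(states_i[states_j == 0] == 1)  # P(B_i|D_j)
    return 1.0 - (p_dark_given_bright + p_bright_given_dark)
```

For two sites with independent, 50% occupied states, each conditional probability converges to 0.5 and the cross-fidelity to 0, as stated above.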

We define the relative infidelity factor η in terms of the CNN infidelity (1 − F_CNN) and the Gaussian mask infidelity (1 − F_σ) as

η = [(1 − F_σ) − (1 − F_CNN)] / (1 − F_σ)        (3)

This measures the infidelity reduction, i.e. the percent reduction in incorrect state predictions, that the CNN achieves compared to the conventional method.
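Eq. (3) in code, with a made-up numerical example: a method that raises the fidelity from 99.0% to 99.56% has removed 56% of the errors (these fidelity values are for illustration only, not measured results).

```python
# Relative infidelity factor, Eq. (3): the fraction of classification
# errors removed relative to the Gaussian-mask baseline.
def relative_infidelity_reduction(f_gauss, f_cnn):
    return ((1.0 - f_gauss) - (1.0 - f_cnn)) / (1.0 - f_gauss)

eta = relative_infidelity_reduction(0.990, 0.9956)  # illustrative values
```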

Figure 4 shows the performance of all four state detection methods for the 9 μm spaced array as a function of readout time. At this spacing, the effects of crosstalk are relatively small, which we confirmed by comparing the cross-fidelity of neighboring sites versus non-neighboring sites for the Square mask in Table 3. (We use the Square mask statistics as a reference because that method has no built-in mechanism to mitigate crosstalk.) Both CNN architectures showed a reduction in measurement infidelity relative to the conventional methods at readout times up to 90 ms. CNN-site achieved up to a 56% reduction in infidelity (vertical dotted line), or up to a 29% reduction in readout time (horizontal dotted line), when compared to the Gaussian method. CNN-array achieved up to a 35% reduction in infidelity and a 14% reduction in readout time. This demonstrates that the CNNs outperform the conventional method even in a low-crosstalk configuration. CNN-site's superior performance, despite its low parameter count, may indicate that it is better optimized for this particular task than CNN-array. In this nearly crosstalk-free configuration, the large number of trainable parameters in CNN-array makes it more susceptible to overfitting. We also observe no significant difference in performance between the Square threshold and the Gaussian threshold.

Table 4: Cross-fidelities F^CF_ij for the 5 μm spaced array.

Central to Nearest Neighbor
            | Gaussian | Square  | CNN-site | CNN-array
F^CF_52     | 0.0298   | 0.0524  | 0.0151   | 0.0099
F^CF_54     | 0.0211   | 0.0298  | 0.0146   | 0.0099
F^CF_56     | 0.0218   | 0.0284  | 0.0120   | 0.0022
F^CF_58     | 0.0392   | 0.0607  | 0.0120   | 0.0020
⟨|F^CF_5j|⟩ | 0.0280   | 0.0428  | 0.0134   | 0.0060

Between Non-nearest Neighbors
F^CF_13     | 0.0110   | 0.0139  | 0.0205   | 0.0110
F^CF_79     | -0.0036  | -0.0005 | -0.0094  | -0.0036
F^CF_17     | 0.0038   | 0.0091  | -0.0016  | 0.0038
F^CF_39     | 0.0028   | 0.0090  | 0.0031   | 0.0028
F^CF_19     | -0.0068  | -0.0043 | -0.0098  | -0.0068
F^CF_37     | 0.0070   | 0.0078  | 0.0080   | 0.0070
⟨|F^CF_ij|⟩ | 0.0058   | 0.0074  | 0.0087   | 0.0058

At 5 μm separation, the effects of crosstalk become apparent. This is evident if we compare the Square mask cross-fidelities in Table 4 to those in Table 3. While the edge-to-edge baseline values are similar, the average nearest-neighbor cross-fidelity increased from 0.0138 to 0.0428 when we decreased the spacing from 9 μm to 5 μm.

As anticipated, the crosstalk significantly impacts state detection performance and further differentiates the four methods. Figure 5a) shows the performance of all four state detection methods for the 5 μm spaced array. CNN-array achieved an infidelity reduction of up to 43% (vertical dotted line) and a readout time reduction of up to 25% (horizontal dotted line) compared to the Gaussian mask. CNN-site achieved a more modest 32% infidelity reduction and 20% readout time reduction. Unlike the 9 μm case, the Gaussian mask performs significantly better than the square mask.

Analyzing the fidelity of individual sites in Fig. 5b), we observe higher infidelity on the central site compared to the four corner sites for all methods. The corresponding cross-fidelities in Table 4 support this observation, showing a positive correlation between the central site and its four nearest neighbors. CNN-array was the most effective at mitigating crosstalk, showing almost no difference between the nearest-neighbor and edge-to-edge cross-fidelities (0.0060 and 0.0058, respectively). That corresponds to a 78% cross-fidelity reduction compared to the Gaussian mask. We attribute CNN-array's superior performance to its simultaneous awareness of all sites in the array. Unexpectedly, CNN-site was also successful at mitigating some of the crosstalk, achieving a 50% cross-fidelity improvement. We attribute this to CNN-site's spatially-aware utilization of features at the edges of the 10×10 pixel input to detect subtle correlations.

To better understand the effect of crosstalk on state detection performance, we compare in Fig. 6 the histograms of the central site with and without neighbors for the Gaussian and Square threshold methods. When neighboring sites were occupied, we observed broadening and upward-biasing of the distributions due to bleed-over of fluorescence from neighboring sites, making the two states more difficult to separate.

[Figure 5: State detection performance for the 5 μm spaced array: a) all four methods versus readout time; b) fidelity of individual sites]
[Figure 6: Histograms of the central site with and without occupied neighbors for the Gaussian and Square threshold methods]

Finally, in order to gauge the advantages of our method under normal operating conditions, we apply the CNN detection method to measurements taken on the non-attenuated primary path at 5 μm spacing. We obtain the training data and "ground truth" labels by performing a sequence of three measurements on each randomly loaded atom array, as described in Appendix B. The first and last high-fidelity measurements are 20 ms long with >4.5σ separation. We observed up to an 83% reduction in infidelity. This improvement is consistent with the results from the dual-path setup, and makes it possible to reduce the probing time from 15 ms to 9.8 ms while keeping the readout fidelity above 99.5%.

We also characterized the processing time needed by the CNN compared to the conventional method. On a PC (Ryzen 1700 CPU and GTX 1080 GPU), we found average single-site inference times of 97 ± 1 μs for CNN-site and 303 ± 23 μs for CNN-array, compared to the conventional processing time of 11 ± 1 μs. Taking into account the increased inference time together with the reduced probing time, our method reduces the total measurement time by 50%. The processing delay can be further reduced by optimizing the CNN network and using hardware acceleration.


V Conclusion

We have presented two neural network architectures for state detection of neutral atoms. Both architectures outperformed conventional state detection methods by a statistically significant margin. We observed a 43% reduction in infidelity and a 78% cross-fidelity reduction at 5 μm site-to-site spacing with CNN-array, demonstrating a superior ability to mitigate inter-site crosstalk. CNN-site also performed better than conventional methods, with a 32% reduction in infidelity and a 50% cross-fidelity improvement. To confirm that the improvement is not attributable only to the mitigation of crosstalk, we also evaluated the performance of the neural network at 9 μm site spacing, where the effects of crosstalk are small. In this configuration, CNN-site was able to reduce the infidelity by up to 57% compared to the Gaussian mask method. Based on these results, we conclude that a neural network can reduce the readout time by up to 29% while maintaining the same fidelity, opening the possibility of faster high-fidelity measurements on neutral atom processors. In future work we plan to accelerate the CNNs on an FPGA as a proof-of-concept demonstration of fast real-time state detection, which is a necessary ingredient for measurement-based quantum error correction Graham et al. (2023).

Acknowledgements.

This material is based upon work supported by the U.S. Department of Energy Office of Science National Quantum Information Science Research Centers as part of the Q-NEXT center, NSF Award 2016136 for the QLCI center Hybrid Quantum Architectures and Networks, and NSF award No. 2210437.

References

  • Acharya et al. (2023) R. Acharya et al., “Suppressing quantum errors by scaling a surface code logical qubit,” Nature 614, 676–681 (2023).
  • Nikolov et al. (2023) B. Nikolov, E. Diamond-Hitchcock, J. Bass, N. L. R. Spong, and J. D. Pritchard, “Randomized benchmarking using nondestructive readout in a two-dimensional atom array,” Phys. Rev. Lett. 131, 030602 (2023).
  • Evered et al. (2023) S. J. Evered, D. Bluvstein, M. Kalinowski, S. Ebadi, T. Manovitz, H. Zhou, S. H. Li, A. A. Geim, T. T. Wang, N. Maskara, H. Levine, G. Semeghini, M. Greiner, V. Vuletić, and M. D. Lukin, “High-fidelity parallel entangling gates on a neutral-atom quantum computer,” Nature 622, 268 (2023).
  • Graham et al. (2022) T. M. Graham, Y. Song, J. Scott, C. Poole, L. Phuttitarn, K. Jooya, P. Eichler, X. Jiang, A. Marra, B. Grinkemeyer, M. Kwon, M. Ebert, J. Cherek, M. T. Lichtman, M. Gillette, J. Gilbert, D. Bowman, T. Ballance, C. Campbell, E. D. Dahl, O. Crawford, N. S. Blunt, B. Rogers, T. Noel, and M. Saffman, “Multi-qubit entanglement and algorithms on a neutral-atom quantum computer,” Nature 604, 457–462 (2022).
  • Bluvstein et al. (2022) D. Bluvstein, H. Levine, G. Semeghini, T. T. Wang, S. Ebadi, M. Kalinowski, A. Keesling, N. Maskara, H. Pichler, M. Greiner, V. Vuletić, and M. D. Lukin, “A quantum processor based on coherent transport of entangled atom arrays,” Nature 604, 451–456 (2022).
  • Deist et al. (2022) E. Deist, Y.-H. Lu, J. Ho, M. K. Pasha, J. Zeiher, Z. Yan, and D. M. Stamper-Kurn, “Mid-circuit cavity measurement in a neutral atom array,” Phys. Rev. Lett. 129, 203602 (2022).
  • Singh et al. (2023) K. Singh, C. E. Bradley, S. Anand, V. Ramesh, R. White, and H. Bernien, “Mid-circuit correction of correlated phase errors using an array of spectator qubits,” Science 380, 1265 (2023).
  • Graham et al. (2023) T. M. Graham, L. Phuttitarn, R. Chinnarasu, Y. Song, C. Poole, K. Jooya, J. Scott, A. Scott, P. Eichler, and M. Saffman, “Mid-circuit measurements on a neutral atom quantum processor,” Phys. Rev. X 13, 041051 (2023).
  • Norcia et al. (2023) M. A. Norcia, W. B. Cairncross, K. Barnes, P. Battaglino, A. Brown, M. O. Brown, K. Cassella, C.-A. Chen, R. Coxe, D. Crow, J. Epstein, C. Griger, A. M. W. Jones, H. Kim, J. M. Kindem, J. King, S. S. Kondov, K. Kotru, J. Lauigan, M. Li, M. Lu, E. Megidish, J. Marjanovic, M. McDonald, T. Mittiga, J. A. Muniz, S. Narayanaswami, C. Nishiguchi, R. Notermans, T. Paule, K. Pawlak, L. Peng, A. Ryou, A. Smull, D. Stack, M. Stone, A. Sucich, M. Urbanek, R. van de Veerdonk, Z. Vendeiro, T. Wilkason, T.-Y. Wu, X. Xie, X. Zhang, and B. J. Bloom, “Mid-circuit qubit measurement and rearrangement in a 171Yb atomic array,” Phys. Rev. X 13, 041034 (2023).
  • Ma et al. (2023) S. Ma, G. Liu, P. Peng, B. Zhang, S. Jandura, J. Claes, A. P. Burgers, G. Pupillo, S. Puri, and J. D. Thompson, “High-fidelity gates and mid-circuit erasure conversion in an atomic qubit,” Nature 622, 279 (2023).
  • Lis et al. (2023) J. W. Lis, A. Senoo, W. F. McGrew, F. Rönchen, A. Jenkins, and A. M. Kaufman, “Mid-circuit operations using the omg-architecture in neutral atom arrays,” Phys. Rev. X 13, 041035 (2023).
  • Xia et al. (2015) T. Xia, M. Lichtman, K. Maller, A. W. Carr, M. J. Piotrowicz, L. Isenhower, and M. Saffman, “Randomized benchmarking of single-qubit gates in a 2D array of neutral-atom qubits,” Phys. Rev. Lett. 114, 100503 (2015).
  • Martinez-Dorantes et al. (2017) M. Martinez-Dorantes, W. Alt, J. Gallego, S. Ghosh, L. Ratschbacher, Y. Völzke, and D. Meschede, “Fast nondestructive parallel readout of neutral atom registers in optical potentials,” Phys. Rev. Lett. 119, 180503 (2017).
  • Syberfeldt and Vuoluterä (2020) A. Syberfeldt and F. Vuoluterä, “Image processing based on deep neural networks for detecting quality problems in paper bag production,” Procedia CIRP 93, 1224–1229 (2020), 53rd CIRP Conference on Manufacturing Systems 2020.
  • Seif et al. (2018) A. Seif, K. A. Landsman, N. M. Linke, C. Figgatt, C. Monroe, and M. Hafezi, “Machine learning assisted readout of trapped-ion qubits,” J. Phys. B: At. Mol. Opt. Phys. 51, 174006 (2018).
  • Ding et al. (2019) Z.-H. Ding, J.-M. Cui, Y.-F. Huang, C.-F. Li, T. Tu, and G.-C. Guo, “Fast high-fidelity readout of a single trapped-ion qubit via machine-learning methods,” Phys. Rev. Appl. 12, 014038 (2019).
  • Bravyi et al. (2021) S. Bravyi, S. Sheldon, A. Kandala, D. C. Mckay, and J. M. Gambetta, “Mitigating measurement errors in multiqubit experiments,” Phys. Rev. A 103, 042605 (2021).
  • Lienhard et al. (2022) B. Lienhard, A. Vepsäläinen, L. C. G. Govia, C. R. Hoffer, J. Y. Qiu, D. Ristè, M. Ware, D. Kim, R. Winik, A. Melville, B. Niedzielski, J. Yoder, G. J. Ribeill, T. A. Ohki, H. K. Krovi, T. P. Orlando, S. Gustavsson, and W. D. Oliver, “Deep-neural-network discrimination of multiplexed superconducting-qubit states,” Phys. Rev. Appl. 17, 014024 (2022).
  • Darulová et al. (2021) J. Darulová, M. Troyer, and M. C. Cassidy, “Evaluation of synthetic and experimental training data in supervised machine learning applied to charge-state detection of quantum dots,” Mach. Learn.: Sci. Technol. 2, 045023 (2021).
  • Matsumoto et al. (2021) Y. Matsumoto, T. Fujita, A. Ludwig, A. D. Wieck, K. Komatani, and A. Oiwa, “Noise-robust classification of single-shot electron spin readouts using a deep neural network,” npj Quantum Inf. 7, 136 (2021).
  • Goodfellow et al. (2016) I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016), www.deeplearningbook.org.
  • Chollet et al. (2015) F. Chollet et al., “Keras,” https://github.com/fchollet/keras (2015).
  • Abadi et al. (2015) M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” (2015), software available from tensorflow.org.
  • Note (1) The Softmax activation function is defined as $g(z_{c'}) = e^{z_{c'}} / \sum_{d=1}^{C_{\rm out}} e^{z_d}$. This function normalizes the outputs such that $\sum_{c'=1}^{C_{\rm out}} g(z_{c'}) = 1$; each element value is then a weighted probability.
  • Kwon et al. (2017) M. Kwon, M. F. Ebert, T. G. Walker, and M. Saffman, “Parallel low-loss measurement of multiple atomic qubits,” Phys. Rev. Lett. 119, 180504 (2017).
  • Fard et al. (2017) F. S. Fard, P. Hollensen, S. Mcilory, and T. Trappenberg, “Impact of biased mislabeling on learning with deep networks,” in 2017 International Joint Conference on Neural Networks (IJCNN) (2017), pp. 2652–2657.
  • Rolnick et al. (2018) D. Rolnick, A. Veit, S. Belongie, and N. Shavit, “Deep learning is robust to massive label noise,” arXiv:1705.10694 (2018).
  • Oquab et al. (2014) M. Oquab, L. Bottou, I. Laptev, and J. Sivic, “Learning and transferring mid-level image representations using convolutional neural networks,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 1717–1724.
  • Azizpour et al. (2015) H. Azizpour, A. S. Razavian, J. Sullivan, A. Maki, and S. Carlsson, “From generic to specific deep representations for visual recognition,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2015), pp. 36–45.
  • Weston et al. (2012) J. Weston, F. Ratle, H. Mobahi, and R. Collobert, “Deep learning via semi-supervised embedding,” in Neural Networks: Tricks of the Trade, 2nd ed. (Springer Berlin Heidelberg, 2012), pp. 639–655.
  • Dosovitskiy et al. (2014) A. Dosovitskiy, J. T. Springenberg, M. Riedmiller, and T. Brox, “Discriminative unsupervised feature learning with convolutional neural networks,” in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, NIPS’14 (MIT Press, Cambridge, MA, USA, 2014), pp. 766–774.
  • Chen et al. (2021) Z. Chen et al., “Exponential suppression of bit or phase errors with cyclic error correction,” Nature 595, 383–387 (2021).
  • Kingma and Ba (2017) D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 (2017).

Appendix A Methods

This section describes the step-by-step procedure used to initialize and train the neural networks.

A.1 Preprocessing

The primary- and secondary-path images are acquired from the camera as raw greyscale images. The primary-path images are fed into the conventional Gaussian mask analysis method (see Sec. II.3) in order to extract the “ground truth” labels: a binary value for each site in the array, where ‘1’ corresponds to the bright state and ‘0’ to the dark state. This results in a binary array of size (n_images, n_sites) encompassing all the labels for the entire dataset. Prior to being sent to the CNN for processing, the secondary-path images are first normalized and zero-centered: $I_{ij} = (I_{ij} - \mu)/\alpha$, where $\mu$ is the average pixel intensity over the entire training subset and $\alpha = I_{\rm max} - I_{\rm min}$, with $I_{\rm max}$ and $I_{\rm min}$ the maximum and minimum pixel intensities in the training subset. The indices $i$ and $j$ refer to the coordinates of each pixel in the image.
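The normalization step can be sketched in a few lines of NumPy. This is a minimal illustrative version; the function name is ours, and note that the statistics ($\mu$, $\alpha$) come from the training subset only, so the same transform can be applied consistently to the test and validation subsets:

```python
import numpy as np

def preprocess(images, train_subset):
    """Zero-center and normalize images using training-subset statistics.

    images:       array of raw greyscale frames to transform.
    train_subset: frames used to compute mu (mean pixel intensity)
                  and alpha (dynamic range I_max - I_min).
    """
    mu = train_subset.mean()
    alpha = train_subset.max() - train_subset.min()
    return (images - mu) / alpha
```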


A.2 Image Partitioning (only for CNN-site)

The input images for CNN-site are generated by cropping the array image into individual 10×10 pixel images at predefined site locations. For each site, the site center is determined by performing a Gaussian fit over the averaged training image. Averaging the training images improves the signal-to-noise ratio and ensures that all sites contain the (averaged) image of an atom. Once the pixel coordinates of a site’s center have been identified, a 10×10 crop centered on those coordinates is extracted. This is repeated for all sites in the array. All of the extracted site images together form an array of shape (n_image, n_site, 10, 10), where n_image refers to the number of original (un-cropped) images. This array is reshaped to a 4-D matrix of size (n_image × n_site, 10, 10, 1), which is passed to the CNN as input data (TensorFlow processes the images in batches, not one at a time). A separate array of this type is produced for each of the training, test, and validation subsets. The array of “ground truth” labels is transformed to a 1-D vector of size (n_image × n_site). The CNN does not receive these labels as input; they are used by TensorFlow to evaluate the CNN and manage the training process.
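The cropping and reshaping described above can be sketched as follows. This is an illustrative NumPy version (function and variable names are ours); it assumes site centers have already been located by the Gaussian fit and lie far enough from the image edge for a full 10×10 crop:

```python
import numpy as np

def crop_sites(images, centers, crop=10):
    """Crop a (crop x crop) window around each site center.

    images:  (n_image, H, W) stack of array images.
    centers: list of (row, col) pixel coordinates, one per site.
    Returns a 4-D batch of shape (n_image * n_site, crop, crop, 1),
    the input format expected by the CNN.
    """
    half = crop // 2
    sites = [images[:, r - half:r + half, c - half:c + half]
             for r, c in centers]
    # (n_site, n_image, crop, crop) -> (n_image, n_site, crop, crop)
    stacked = np.stack(sites, axis=1)
    n_image, n_site = stacked.shape[:2]
    return stacked.reshape(n_image * n_site, crop, crop, 1)
```

The reshape interleaves sites within each image, so the 1-D label vector must be flattened in the same (image-major, site-minor) order.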

A.3 Training

The network is trained using the TensorFlow library, via the Keras API. The network parameters are randomly initialized, then trained using the pre-processed secondary-path images and corresponding “ground truth” labels. The hyper-parameters used for training are listed in Table 5. The training progress is supervised by monitoring the loss over the validation set at the end of every training epoch and saving a snapshot of the best-so-far network parameters. At the end of the training process, only the set of parameters corresponding to the lowest-achieved validation loss is kept. Validation loss refers to the loss metric evaluated on the validation subset; a lower validation loss corresponds to better performance at the task. The maximum epoch number is chosen to be large enough to avoid underfitting the data. In other words, training continues until overfitting occurs, such that the training accuracy approaches 100% by the end of the training process while the validation accuracy begins to drop. Training to the point of overfitting does not harm the final result, because of this monitoring process. Roughly speaking, the optimal network parameters are achieved at the point where the training and validation accuracy (loss) curves diverge, as seen in Fig. 8. This process of monitoring and selection ensures that the network parameters are a sufficiently good fit to the classification task and that the resulting network still generalizes well to data it has not seen before.

Table 5: Training hyper-parameters.

Parameter               CNN-site    CNN-array
Optimizer               Adam        Adam
Initial learning rate   1×10^-4     5×10^-4
Max epoch               40          30
Batch size              64          16
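In the actual pipeline this monitoring is done by Keras (e.g. a checkpoint callback that saves only on validation-loss improvement). The selection logic itself is simple enough to state as a framework-free sketch; the function and argument names below are ours, not the paper's code:

```python
def train_with_best_snapshot(init_params, train_step, val_loss, max_epochs):
    """Train to max_epochs, but return the parameter snapshot that
    achieved the lowest validation loss (not the final-epoch one).

    train_step(params, epoch) -> new params after one epoch of training.
    val_loss(params)          -> loss evaluated on the validation subset.
    """
    params = init_params
    best_params, best_loss = params, float("inf")
    for epoch in range(max_epochs):
        params = train_step(params, epoch)
        loss = val_loss(params)
        if loss < best_loss:          # new best-so-far snapshot
            best_loss, best_params = loss, params
    return best_params, best_loss
```

Even if later epochs overfit (validation loss rising again), the returned snapshot is the one from the divergence point.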

Note that the CNN needs to be re-trained from scratch for every new dataset acquired. For example, CNN-site has a completely different set of training parameters for the “9 μm, 40 ms readout time” dataset as compared to the “9 μm, 20 ms readout time” dataset or the “5 μm, 20 ms readout time” dataset. We found that it is not difficult to modify the CNN architecture (namely, by incorporating an “embedding” layer and providing a dataset identifier as an additional input) such that a single parameter set can handle all datasets equally, without significant loss of accuracy. This could be an important enhancement for real-time applications in which it is desirable to change the parameters of the experiment without acquiring a new dataset and re-training the CNN. We consider this to be a topic for future research.

Appendix B Alternative dataset generation method

For a system with only a single imaging path, a training dataset can be generated by performing a sequence of A-B-A measurements on each randomly loaded array, where A is a high-fidelity measurement and B is a noisy measurement. The data are then post-selected to keep only the A-B-A sequences in which the first and last A measurements show identical array occupation (no atom loss), according to the conventional analysis method. Post-selection discards any data that are mislabeled due to atom loss during or after the first measurement. The post-selected A measurements are then used to generate the “ground truth” labels, while the corresponding B measurements constitute the inputs to the CNN.
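The post-selection step above can be sketched as a simple filter over shots. This is an illustrative NumPy version with hypothetical names; it assumes the two A measurements have already been reduced to binary occupation arrays by the conventional analysis:

```python
import numpy as np

def postselect_aba(a1, b, a2):
    """Post-select A-B-A measurement triples.

    a1, a2: (n_shots, n_sites) binary occupation from the first and last
            high-fidelity A measurements (conventional analysis).
    b:      the noisy B measurement data, indexed by shot.
    Keeps only shots where both A measurements agree on every site
    (no atom loss); returns (labels, inputs) for CNN training.
    """
    keep = np.all(a1 == a2, axis=1)   # identical array occupation
    return a1[keep], b[keep]
```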
