Quantum Classifier in Quantum Machine Learning
A simple Quantum Machine Learning model
"Before diving deep, it's better to get an overview of something."
Index
Framework for Classical Machine Learning
Near Term vs Fault Tolerant
Variational Circuits
Variational Circuit as Classifier
Data Encoding
Basis Encoding
Amplitude Encoding
Angle Encoding
Higher Order Encoding
Applying a Variational Model
Extracting Labels
Parity Post-Processing
Measuring First Qubit
Optimization: Parameter Shift Rule
Do we have any advantage of using the Variational Circuit?
Conclusion
Credit for the above blog
Quantum Machine Learning is often argued to have a potential advantage over Classical Machine Learning because quantum states live in an exponentially large state space, a Hilbert space whose dimension grows as 2^n with the number of qubits n.
To understand it, read this blog on 'Importance of Quantum Machine Learning': https://medium.com/@aanshsavla2453/importance-of-quantum-machine-learning-f00b01926ddf
Besides having access to this exponential state space, quantum computing also exhibits interesting quantum properties like interference, where probability amplitudes can add up or cancel each other out, which creates a different way to think about probability. We can also think of quantum mechanics as a generalized probability theory. In classical machine learning, probability theory is the backbone of inference and statistical modeling, so replacing classical probability theory with a more general one might give us a different way to think about machine learning models.
If we talk about quantum machine learning, there are 4 types or paradigms of machine learning in general: CC, CQ, QC, and QQ, where the first letter refers to the type of data and the second to the type of device that processes it. The majority of research focuses on the CQ paradigm, where classical data is modeled, enhanced, or processed with a quantum computer.
To understand other paradigms read this blog on 'Introduction to Quantum Machine Learning': https://medium.com/@aanshsavla2453/introduction-to-quantum-machine-learning-1c68d375fe23
Framework for Classical Machine Learning:
For now, don't concentrate on the information given in each block. The general idea is that we always start with some data, which is given as input to the machine learning model. The model gives us a prediction, which is scored with a cost function, and finally we update the model parameters using some gradient-based technique. The CQ paradigm of quantum machine learning asks how we can take the model (highlighted in green) and replace it with some computation on a quantum computer: can we replace the whole model with a quantum model, or replace certain parts of it with quantum computation in a way that is beneficial or advantageous?
Near Term vs Fault Tolerant:
Near-term devices are the noisy quantum devices available today; they suffer from a lot of errors. Fault-tolerant devices are idealized devices whose errors are corrected away. Hence there are 2 types of quantum machine learning algorithms: those that are designed, run, and optimized for near-term devices, and those that are more theoretical, which we expect to be advantageous or to run efficiently once fault-tolerant quantum computers exist. For now, we focus on near-term devices, which brings us to the idea of Variational Models.
Variational Circuits:
These models can simply be thought of as quantum circuits with some parameters in them that we want to train, optimize, and tweak.
Below is a general quantum circuit:
|ψ> is the initial state. U is the unitary operation or unitary evolution. M is the measurement of the qubit.
Below is a variational circuit:
The difference is that the unitary operations we apply can depend on some parameters such as θ. The output of a measurement is stochastic, i.e. random, so we repeat the measurement multiple times to estimate the expected value of the output (<Z>). In this way we obtain a probability distribution over the possible basis states. If we change the parameters of the variational circuit, we also change the statistics, i.e. the output distribution, of the circuit.
The above circuit can have different names:
Parameterized Quantum Circuit
Variational Circuit
Ansatz: A circuit that acts as a template. This circuit has some parameters in it.
Variational Circuit as Classifier:
The linear model of the classifier is replaced by a variational circuit. The task is to train a quantum circuit on labeled samples (supervised learning) to predict labels for new data. The steps for this task are as follows (a toy end-to-end sketch follows these steps):
Step 1: Encode the classical data into a quantum state.
Step 2: Apply a parameterized model.
Step 3: Measure the circuit to extract labels.
Step 4: Use optimization techniques (like gradient descent) to update model parameters.
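To make these four steps concrete, here is a minimal, self-contained NumPy sketch of my own (a toy single-qubit classifier, not the circuits discussed below): it angle-encodes one feature, applies a trainable Ry rotation, reads out <Z>, and trains the single parameter with the parameter-shift rule covered later in the Optimization section.

```python
# Toy single-qubit variational classifier (illustrative sketch only).
import numpy as np

def ry(angle):
    """Matrix of a Y-rotation gate."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

def expectation_z(x, theta):
    """Steps 1-3: encode feature x, apply the model Ry(theta), return <Z>."""
    state = ry(theta) @ ry(x) @ np.array([1.0, 0.0])    # |0> -> encoded, modelled state
    probs = np.abs(state) ** 2
    return probs[0] - probs[1]                           # <Z> = P(0) - P(1)

def predict(x, theta):
    return 1 if expectation_z(x, theta) >= 0 else -1

# Step 4: gradient descent on a squared-error cost, using the parameter-shift rule.
data = [(0.3, 1), (2.8, -1)]                             # toy (feature, label) pairs
theta = 0.1
for _ in range(100):
    grad = 0.0
    for x, y in data:
        d_exp = 0.5 * (expectation_z(x, theta + np.pi / 2)
                       - expectation_z(x, theta - np.pi / 2))
        grad += 2 * (expectation_z(x, theta) - y) * d_exp
    theta -= 0.1 * grad

print([predict(x, theta) for x, _ in data])              # expected: [1, -1]
```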
Data Encoding:
How do we encode data into a quantum circuit? This is still an open question; there is no hard and fast rule for how to do it. We discuss only a few methods that have been proposed in the literature. No method is proven to be the best for all problems, and different encoding techniques are used for different problems.
- Basis Encoding :
We input the classical data into the basis states. For a single qubit the basis states are |0> and |1>; for 2 qubits they are |00>, |01>, |10>, |11>. Suppose we have two data points x1 and x2 and we convert them into binary representations such that x1 becomes 01 and x2 becomes 11. Basis encoding then sets up a circuit that prepares the basis states these bit strings represent, so x1 and x2 are each associated with their corresponding basis state.
Conversion of x1 and x2:
Encoding of x1 and x2 into 2-qubit basis states:
Hence classical data is converted to a binary representation and then encoded into the basis states of a quantum computer.
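As an illustration (my own sketch, not from the original blog), the following Qiskit snippet prepares the basis states |01> and |11> corresponding to the bit strings above by applying X gates wherever a bit is 1:

```python
# Basis encoding of the bit strings 01 (x1) and 11 (x2).
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

def basis_encode(bitstring):
    """Prepare |bitstring>; Qiskit labels qubits right-to-left, hence the reversal."""
    qc = QuantumCircuit(len(bitstring))
    for qubit, bit in enumerate(reversed(bitstring)):
        if bit == "1":
            qc.x(qubit)
    return qc

for bits in ["01", "11"]:
    state = Statevector.from_instruction(basis_encode(bits))
    print(bits, state.probabilities_dict())   # {'01': 1.0} and {'11': 1.0}
```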
- Amplitude Encoding:
Instead of encoding our classical data into the basis states, we encode it into the amplitude vector of the quantum state.
Consider a 2-qubit circuit:
The amplitude vector of the initial state |00> is (1, 0, 0, 0).
Now suppose we want to encode a (normalized) data vector x1 with four entries. We apply rotations to the quantum circuit such that the initial state of the circuit is transformed into the data vector x1.
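A minimal Qiskit sketch of this idea, using a hypothetical 4-entry data vector of my own choosing: the vector is normalized (quantum amplitudes must have unit norm), and Qiskit's initialize instruction synthesizes the rotations that turn |00> into that state.

```python
# Amplitude encoding of a 4-entry data vector into a 2-qubit state.
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

x1 = np.array([1.5, 0.5, -2.0, 1.0])        # hypothetical data vector
amplitudes = x1 / np.linalg.norm(x1)        # amplitudes must have norm 1

qc = QuantumCircuit(2)
qc.initialize(amplitudes, [0, 1])           # rotations that prepare the state from |00>

print(Statevector(amplitudes))              # the target 2-qubit state, amplitudes = x1 / ||x1||
```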
- Angle Encoding:
We have a data point with two features.
In angle encoding, the number of qubits equals the number of features in the data. Each qubit is associated with one feature, and each feature is encoded by rotating its qubit by an angle that depends on the feature value. Let's consider Rz rotations, so the circuit is given by:
Now if we increase the number of features to 3, then we will need 3 qubits to encode these features.
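A minimal Qiskit sketch of angle encoding with Rz rotations, with placeholder feature values of my own choosing:

```python
# Angle encoding: one qubit per feature, each rotated by its feature value.
from qiskit import QuantumCircuit

features = [0.8, 2.1]                 # two features -> two qubits

qc = QuantumCircuit(len(features))
for qubit, value in enumerate(features):
    qc.rz(value, qubit)               # encode each feature as a Z-rotation angle

print(qc.draw())
```

Note that an Rz rotation applied directly to |0> only changes a global phase, which is why encoding circuits often start with a Hadamard on each qubit (or use Ry/Rx rotations instead), exactly as in the higher-order encoding described next.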
- Higher-Order Encoding:
Consider the same data point with two features as above. The circuit of higher-order encoding can be described in 4 steps:
Apply a Hadamard gate to each qubit to bring all qubits into superposition.
Apply angle encoding to each qubit.
Apply an entangling gate such as a CNOT gate.
Apply a rotation gate with the product of the feature values as the rotation angle.
Because of this product of the feature values, we call the technique higher-order encoding. Sometimes the highlighted encoding block (shown in green) is repeated multiple times; stacking these encoding blocks on top of each other gives the feature map its depth.
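A minimal Qiskit sketch of the four steps above for a two-feature data point (the feature values are placeholders of my own choosing):

```python
# Higher-order encoding of a two-feature data point.
from qiskit import QuantumCircuit

x1, x2 = 0.8, 2.1                     # hypothetical feature values

qc = QuantumCircuit(2)
qc.h(0)                               # step 1: superposition on all qubits
qc.h(1)
qc.rz(x1, 0)                          # step 2: angle-encode each feature
qc.rz(x2, 1)
qc.cx(0, 1)                           # step 3: entangling gate
qc.rz(x1 * x2, 1)                     # step 4: rotation by the product of the features

print(qc.draw())
```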
Applying a Variational Model:
How to build a variational circuit is an open question. There is a lot of research trying to figure out exactly how we should design circuits with certain benefits and trade-offs, but this is still largely being investigated. One research paper, 'Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms', helps in answering this question. Link to the paper: https://arxiv.org/pdf/1905.10876.pdf
According to this paper, expressibility is defined as a circuit's ability to generate pure states that are well representative of the Hilbert space. In quantum computing, the Bloch sphere is a geometrical representation of the pure-state space of a two-level quantum system (a single qubit), so it can be viewed as the simplest special case of such a state space. The results for the expressibility of different single-qubit circuits are as follows:
In the paper, the authors try to understand how much of the full Hilbert space a variational circuit with a given structure can access, i.e. given a circuit with certain variational parameters and operations, how much of the Hilbert space can we sample from with that model.
Let's consider the above circuits:
1) Single qubit with Identity gate: a single qubit is represented by the Bloch sphere. With only an identity gate there is no variational parameter, so we can access only the initial state of the qubit; no other state on the Bloch sphere can be reached. This circuit has low expressibility.
2) Single qubit with an H gate followed by an Rz gate: we put the qubit into superposition with the H gate and then rotate it about the Z-axis, sampling different values of the rotation angle. As we vary θ, the Bloch vector rotates in a circle around the Z-axis. Hence the states we can reach are exactly those lying on that circle perpendicular to the Z-axis.
3) Single qubit with an H gate followed by an Rz gate followed by an Rx gate: thanks to the Rx gate, we can now also move above and below the circle covered in the second case, so we can reach more of the Hilbert space.
4) Arbitrary unitary gate: by applying an arbitrary unitary we can reach the full Hilbert space, so this circuit has high expressibility. While designing a variational circuit we might think that such a highly expressible circuit should always be used, but high expressibility is not always advantageous.
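As a quick check of case 2 (my own sketch), the following snippet shows that for any θ the state prepared by H followed by Rz(θ) always has equal probabilities for |0> and |1>, i.e. the Bloch vector stays on the circle perpendicular to the Z-axis:

```python
# The H + Rz(theta) circuit only explores the equator of the Bloch sphere.
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

for theta in np.linspace(0, 2 * np.pi, 5):
    qc = QuantumCircuit(1)
    qc.h(0)
    qc.rz(theta, 0)
    state = Statevector.from_instruction(qc)
    print(np.round(np.abs(state.data) ** 2, 3))   # always [0.5, 0.5]
```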
Below are some examples of the variational circuit given in the paper:
Let's see one typical example of a variational circuit that's used in a variational classifier:
The initial block is the data-encoding block. Then we apply a rotation gate to each qubit; these rotations can be about any axis, and for this example we use the Y-axis, i.e. Ry gates. The rotation gates are parameterized by trainable parameters. After them we apply some entangling gates, either pairwise entanglement or all-to-all entanglement where each qubit is entangled with every other qubit. If we want to increase the number of parameters, we repeat the highlighted green block multiple times (a sketch of such a layer follows below).
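A minimal Qiskit sketch of such a layer (my own illustration, with an arbitrary choice of 3 qubits and 2 repetitions): trainable Ry rotations on every qubit followed by pairwise CNOT entanglement.

```python
# A simple layered variational circuit: Ry rotations + pairwise CNOTs.
from qiskit import QuantumCircuit
from qiskit.circuit import Parameter

num_qubits, num_layers = 3, 2
thetas = [Parameter(f"theta_{i}") for i in range(num_qubits * num_layers)]

qc = QuantumCircuit(num_qubits)
for layer in range(num_layers):
    for q in range(num_qubits):                 # trainable single-qubit rotations
        qc.ry(thetas[layer * num_qubits + q], q)
    for q in range(num_qubits - 1):             # pairwise entangling gates
        qc.cx(q, q + 1)

print(qc.draw())
```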
Extracting Labels:
How do we measure the quantum circuit so that its output can be interpreted as a label for our machine-learning task? We discuss some techniques from the literature:
1) Parity Post-Processing:
Consider a 2-qubit circuit with a data-encoding circuit, a variational circuit, and a measurement in some basis. Since there are 2 qubits, there are 4 possible basis states: 00, 01, 10, 11. Since quantum measurements are stochastic, we need to repeat the measurement; these repetitions are called shots.
This shot-based measurement gives us a probability distribution, i.e. an estimate of the probability of measuring each of the possible states, so each basis state has a probability associated with it. Let's say that on running the experiment multiple times the probability distribution is as follows:
In parity post-processing, the parity of each basis state is mapped to a class, giving a binary classification: even parity corresponds to class +1 and odd parity corresponds to class -1. The parities of the basis states are: 00 and 11 have even parity, while 01 and 10 have odd parity.
Now we sum the probabilities of the even-parity basis states, 0.8 + 0.1 = 0.9, which gives the probability of our data point being in class +1. Similarly, summing the probabilities of the odd-parity basis states gives 0.1 + 0 = 0.1. Hence the class distribution is: P(+1) = 0.9 and P(-1) = 0.1.
Hence our Variational circuit predicts the output as +1.
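A minimal sketch of parity post-processing in code; the assignment of the quoted probabilities to specific basis states is my own assumption, chosen to be consistent with the sums above:

```python
# Parity post-processing: even parity -> class +1, odd parity -> class -1.
probs = {"00": 0.8, "01": 0.1, "10": 0.0, "11": 0.1}   # assumed measurement distribution

p_plus = sum(p for state, p in probs.items() if state.count("1") % 2 == 0)
p_minus = sum(p for state, p in probs.items() if state.count("1") % 2 == 1)

print(p_plus, p_minus)                                  # 0.9 and 0.1
predicted_label = +1 if p_plus >= p_minus else -1
print(predicted_label)                                  # the circuit predicts +1
```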
For more information on parity post-processing read the paper: 'Supervised learning with quantum-enhanced feature spaces'.
Link to the paper: arxiv.org/pdf/1804.11326.pdf
2) Measuring the first qubit:
Here we measure only the first qubit and interpret its expectation value as the prediction for our label. Consider the same binary classification problem, where data must be mapped to either +1 or -1, and the circuit below:
The expectation value <Z> of the first qubit lies between -1 and +1. We can then decide on a simple rule: if <Z> < 0, we map the output to class -1, and if <Z> >= 0, we map it to class +1.
'M' here is called the Measurement Strategy.
Let's consider a more complicated measurement strategy, call it Mc. Any complicated measurement strategy can be decomposed into some additional rotations followed by a simple measurement strategy.
Since we have a variational circuit, the additional rotations needed for Mc can be absorbed into the variational circuit itself, i.e. they become part of the trainable circuit and are not treated as a separate measurement step. So in the end, the observable that remains to be measured is just the simple measurement strategy M.
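A minimal sketch of the decision rule for the first-qubit measurement, with a placeholder outcome distribution of my own choosing:

```python
# Mapping the first-qubit expectation value <Z> to a binary label.
p0, p1 = 0.35, 0.65          # assumed P(first qubit = 0) and P(first qubit = 1)

expectation_z = p0 - p1      # <Z> = P(0) - P(1), lies between -1 and +1
label = +1 if expectation_z >= 0 else -1
print(expectation_z, label)  # -0.3 -> class -1
```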
Optimization: Parameter Shift Rule:
The first question is: can we do optimization in quantum models? In classical machine learning, we compute the gradients of the cost function by computing the gradients of the model. For a long time there was no efficient way to compute these gradients for neural networks, which contributed to the AI winters, until backpropagation was developed and showed how to compute gradients with respect to the parameters. Fortunately, we can also compute the gradient of a quantum circuit, using the Parameter-Shift Rule.
Consider a quantum circuit with n qubits. The initial state is |0...0>, with every qubit in |0>.
Circuit for which gradient is to be computed:
The first step is to shift a parameter upward by an amount s; call the resulting expectation value A. The second step is to shift the same parameter downward by s; call the resulting expectation value B. Unlike a finite-difference approximation, the shift s does not have to be small; for standard Pauli-rotation gates one takes s = π/2.
The gradient with respect to that parameter is then (A - B) / (2 sin s), which for s = π/2 is simply (A - B) / 2.
Hence, by following this method we can compute the gradient of a quantum model and optimize variational circuits.
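A minimal sketch of the parameter-shift rule as a reusable function; the expectation value f below is a stand-in sinusoidal function of my own (of the same functional form produced by Pauli-rotation gates), not an actual quantum circuit:

```python
# Parameter-shift gradients for a callable expectation value f(thetas).
import numpy as np

def f(thetas):
    """Stand-in for the measured expectation value of a variational circuit."""
    return np.cos(thetas[0]) * np.cos(thetas[1])

def parameter_shift_gradient(f, thetas, s=np.pi / 2):
    """Shift each parameter up (A) and down (B) by s and combine the results."""
    grad = np.zeros_like(thetas)
    for i in range(len(thetas)):
        shift = np.zeros_like(thetas)
        shift[i] = s
        a, b = f(thetas + shift), f(thetas - shift)      # the two shifted evaluations
        grad[i] = (a - b) / (2 * np.sin(s))              # exact for Pauli-rotation gates
    return grad

thetas = np.array([0.4, 1.2])
print(parameter_shift_gradient(f, thetas))               # matches [-sin(t0)cos(t1), -cos(t0)sin(t1)]
```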
Do we have any advantage in using variational circuits?
Let's consider data encoding again. We have a classical data vector x.
We have n qubits to encode this data vector. We encode it into a new state φ(x), which is a vector with 2^n entries.
If n = 2, the classical data is mapped to a 4-dimensional vector.
So we are mapping our original data to a higher-dimensional space, which is exactly the idea of a feature map or a kernel. Encoding data into a quantum circuit is therefore analogous to applying a feature map. The whole quantum classifier can then be seen as a linear classifier in a feature space, where that feature space corresponds to the map that embeds our classical data into a quantum state. Viewed this way, this attempt at quantum classification does not automatically provide an advantage, since we are just doing linear classification on feature-mapped data. The key question is therefore how to embed data into a quantum state, and more research is needed to identify the practical advantages of such quantum classifiers.
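A minimal Qiskit sketch of this feature-map view, reusing the higher-order encoding from earlier with placeholder data points of my own choosing: a 2-feature point is mapped to a 4-dimensional state φ(x), and the overlap |<φ(x)|φ(x')>|² between two encoded points plays the role of a kernel.

```python
# Quantum feature map and kernel value between two encoded data points.
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

def feature_map(x):
    """Higher-order encoding of a 2-feature point into a 2-qubit state."""
    qc = QuantumCircuit(2)
    qc.h(0); qc.h(1)
    qc.rz(x[0], 0); qc.rz(x[1], 1)
    qc.cx(0, 1)
    qc.rz(x[0] * x[1], 1)
    return Statevector.from_instruction(qc)

phi_a = feature_map([0.8, 2.1])
phi_b = feature_map([0.5, 1.0])
print(len(phi_a.data))                               # 4 = 2**n entries for n = 2 qubits
print(np.abs(np.vdot(phi_a.data, phi_b.data)) ** 2)  # kernel value between the two points
```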
Conclusion: We have discussed different building blocks of a quantum classifier like Data Encoding, Variational Circuits, Extracting Labels and Optimization techniques. This was just an introduction to a very simple Quantum Machine Learning Model.
Credit for the above blog:
I gained this information from a video on Quantum Machine Learning by Qiskit, made at the Qiskit Global Summer School 2021. Thank you, Amira Abbas, for such a wonderful explanation of these concepts.
Check out the video: Building a Quantum Classifier https://www.youtube.com/watch?v=-sxlXNz7ZxU&list=PLOFEBzvs-VvqJwybFxkTiDzhf5E11p8BI&index=10