Release Catalyst v0.4.0 · PennyLaneAI/catalyst

New features

Catalyst is now accessible directly within the PennyLane user interface, once Catalyst is installed, allowing easy access to Catalyst just-in-time functionality.

Through the use of the qml.qjit decorator, entire workflows can be JIT compiled down to a machine binary on first-function execution, including both quantum and classical processing. Subsequent calls to the compiled function will execute the previously-compiled binary, resulting in significant performance improvements.
```
import pennylane as qml

dev = qml.device("lightning.qubit", wires=2)

@qml.qjit
@qml.qnode(dev)
def circuit(theta):
    qml.Hadamard(wires=0)
    qml.RX(theta, wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(wires=1))
```
```
>>> circuit(0.5)  # the first call, compilation occurs here
array(0.)
>>> circuit(0.5)  # the precompiled quantum function is called
array(0.)
```
Currently, PennyLane supports the Catalyst hybrid compiler with the qml.qjit decorator, which directly aliases Catalyst's catalyst.qjit.

In addition to the above qml.qjit integration, the following native PennyLane functions can now be used with the qjit decorator: qml.adjoint, qml.ctrl, qml.grad, qml.jacobian, qml.vjp, qml.jvp, and qml.adjoint, qml.while_loop, qml.for_loop, qml.cond. These will alias to the corresponding Catalyst functions when used within a qjit context.

For more details on these functions, please refer to the PennyLane compiler documentation and compiler module documentation.
Just-in-time compiled functions now support asynchronous execution of QNodes. (#374) (#381) (#420) (#424) (#433)

Simply specify async_qnodes=True when using the @qjit decorator to enable the async execution of QNodes. Currently, asynchronous execution is only supported by lightning.qubit and lightning.kokkos.

Asynchronous execution will be most beneficial for just-in-time compiled functions that contain --- or generate --- multiple QNodes.

For example,
```
dev = qml.device("lightning.qubit", wires=2)

@qml.qnode(device=dev)
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(wires=0))

@qjit(async_qnodes=True)
def multiple_qnodes(params):
    x = jnp.sin(params)
    y = jnp.cos(params)
    z = jnp.array([circuit(x), circuit(y)]) # will be executed in parallel
    return circuit(z)
```
```
>>> func(jnp.array([1.0, 2.0]))
1.0
```
Here, the first two circuit executions will occur in parallel across multiple threads, as their execution can occur independently.
Preliminary support for PennyLane transforms has been added. (#280)
```
@qjit
@qml.transforms.split_non_commuting
@qml.qnode(dev)
def circuit(x):
    qml.RX(x,wires=0)
    return [qml.expval(qml.PauliY(0)), qml.expval(qml.PauliZ(0))]
```
```
>>> circuit(0.4)
[array(-0.51413599), array(0.85770868)]
```
Currently, most PennyLane transforms will work with Catalyst as long as:
- The circuit does not include any Catalyst-specific features, such
  as Catalyst control flow or measurement,
- The QNode returns only lists of measurement processes,
- AutoGraph is disabled, and
- The transformation does not require or depend on the numeric value of
  dynamic variables.
Catalyst now supports just-in-time compilation of dynamically-shaped arrays. (#366) (#386) (#390) (#411)

The @qjit decorator can now be used to compile functions that accepts or contain tensors whose dimensions are not known at compile time; runtime execution with different shapes is supported without recompilation.

In addition, standard tensor initialization functions jax.numpy.ones, jnp.zeros, and jnp.empty now accept dynamic variables (where the value is only known at runtime).
```
@qjit
def func(size: int):
    return jax.numpy.ones([size, size], dtype=float)
```
```
>>> func(3)
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
```
When passing tensors as arguments to compiled functions, the abstracted_axes keyword argument to the @qjit decorator can be used to specify which axes of the input arguments should be treated as abstract (and thus avoid recompilation).

For example, without specifying abstracted_axes, the following sum function would recompile each time an array of different size is passed as an argument:
```
>>> @qjit
>>> def sum_fn(x):
>>>     return jnp.sum(x)
>>> sum_fn(jnp.array([1]))     # Compilation happens here.
>>> sum_fn(jnp.array([1, 1]))  # And here!
```
By passing abstracted_axes, we can specify that the first axes of the first argument is to be treated as dynamic during initial compilation:
```
>>> @qjit(abstracted_axes={0: "n"})
>>> def sum_fn(x):
>>>     return jnp.sum(x)
>>> sum_fn(jnp.array([1]))     # Compilation happens here.
>>> sum_fn(jnp.array([1, 1]))  # No need to recompile.
```
Note that support for dynamic arrays in control-flow primitives (such as loops), is not yet supported.

Error mitigation using the zero-noise extrapolation method is now available through the catalyst.mitigate_with_zne transform. (#324) (#414)

For example, given a noisy device (such as noisy hardware available through Amazon Braket):

dev = qml.device("noisy.device", wires=2)

@qml.qnode(device=dev)
def circuit(x, n):

    @for_loop(0, n, 1)
    def loop_rx(i):
        qml.RX(x, wires=0)

    loop_rx()

    qml.Hadamard(wires=0)
    qml.RZ(x, wires=0)
    loop_rx()
    qml.RZ(x, wires=0)
    qml.CNOT(wires=[1, 0])
    qml.Hadamard(wires=1)
    return qml.expval(qml.PauliY(wires=0))

@qjit
def mitigated_circuit(args, n):
    s = jax.numpy.array([1, 2, 3])
    return mitigate_with_zne(circuit, scale_factors=s)(args, n)

>>> mitigated_circuit(0.2, 5)
0.5655341100116512

In addition, a mitigation dialect has been added to the MLIR layer of Catalyst. It contains a Zero Noise Extrapolation (ZNE) operation, with a lowering to a global folded circuit.

Improvements

The three backend devices provided with Catalyst, lightning.qubit, lightning.kokkos, and braket.aws, are now dynamically loaded at runtime. (#343) (#400)

This takes advantage of the new backend plugin system provided in Catalyst v0.3.2, and allows the devices to be packaged separately from the runtime CAPI. Provided backend devices are now loaded at runtime, instead of being linked at compile time.

For more details on the backend plugin system, see the custom devices documentation.

Finite-shot measurement statistics (expval, var, and probs) are now supported for the lightning.qubit and lightning.kokkos devices. Previously, exact statistics were returned even when finite shots were specified. (#392) (#410)

>>> dev = qml.device("lightning.qubit", wires=2, shots=100)
>>> @qjit
>>> @qml.qnode(dev)
>>> def circuit(x):
>>>     qml.RX(x, wires=0)
>>>     return qml.probs(wires=0)
>>> circuit(0.54)
array([0.94, 0.06])
>>> circuit(0.54)
array([0.93, 0.07])

Catalyst gradient functions grad, jacobian, jvp, and vjp can now be invoked from outside a @qjit context. (#375)

This simplifies the process of writing functions where compilation can be turned on and off easily by adding or removing the decorator. The functions dispatch to their JAX equivalents when the compilation is turned off.

dev = qml.device("lightning.qubit", wires=2)

@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

>>> grad(circuit)(0.54)  # dispatches to jax.grad
Array(-0.51413599, dtype=float64, weak_type=True)
>>> qjit(grad(circuit))(0.54). # differentiates using Catalyst
array(-0.51413599)

New lightning.qubit configuration options are now supported via the qml.device loader, including Markov Chain Monte Carlo sampling support. (#369)

dev = qml.device("lightning.qubit", wires=2, shots=1000, mcmc=True)

@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

>>> circuit(0.54)
array(0.856)

Improvements have been made to the runtime and quantum MLIR dialect in order to support asynchronous execution.
- The runtime now supports multiple active devices managed via a device pool. The new RTDevice data-class and RTDeviceStatus along with the thread_local device instance pointer enable the runtime to better scope the lifetime of device instances concurrently. With these changes, one can create multiple active devices and execute multiple programs in a multithreaded environment. (#381)
- The ability to dynamically release devices has been added via DeviceReleaseOp in the Quantum MLIR dialect. This is lowered to the __quantum__rt__device_release() runtime instruction, which updates the status of the device instance from Active to Inactive. The runtime will reuse this deactivated instance instead of creating a new one automatically at runtime in a multi-QNode workflow when another device with identical specifications is requested. (#381)
- The DeviceOp definition in the Quantum MLIR dialect has been updated to lower a tuple of device information ('lib', 'name', 'kwargs') to a single device initialization call __quantum__rt__device_init(int8_t *, int8_t *, int8_t *). This allows the runtime to initialize device instances without keeping partial information of the device. (#396)
The quantum adjoint compiler routine has been extended to support function calls that affect the quantum state within an adjoint region. Note that the function may only provide a single result consisting of the quantum register. By itself this provides no user-facing changes, but compiler pass developers may now generate quantum adjoint operations around a block of code containing function calls as well as quantum operations and control flow operations. (#353)
The allocation and deallocation operations in MLIR (AllocOp, DeallocOp) now follow simple value semantics for qubit register values, instead of modelling memory in the MLIR trait system. Similarly, the frontend generates proper value semantics by deallocating the final register value.

The change enables functions at the MLIR level to accept and return quantum register values, which would otherwise not be correctly identified as aliases of existing register values by the bufferization system. (#360)

Breaking changes

Third party devices must now provide a configuration TOML file, in order to specify their supported operations, measurements, and features for Catalyst compatibility. For more information please visit the Custom Devices section in our documentation. (#369)

Bug fixes

Resolves a bug in the compiler's differentiation engine that results in a segmentation fault when attempting to differentiate non-differentiable quantum operations. The fix ensures that all existing quantum operation types are removed during gradient passes that extract classical code from a QNode function. It also adds a verification step that will raise an error if a gradient pass cannot successfully eliminate all quantum operations for such functions. (#397)
Resolves a bug that caused unpredictable behaviour when printing string values with the debug.print function. The issue was caused by non-null-terminated strings. (#418)

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi,
David Ittah,
Romain Moyard,
Sergei Mironov,
Erick Ochoa Lopez,
Shuli Shu.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Catalyst v0.4.0

New features

Improvements

Breaking changes

Bug fixes

Contributors