An implementation of cond_rnn for the prediction of pulse propagation in nonlinear waveguides.
This project is a companion to an Optics Express publication and, more generally, to my PhD work on pulse propagation modelling with deep learning techniques.
Pulse propagation modelling in nonlinear optical materials is traditionally performed via numerical solutions of propagation equations (NLSE, NEE, UPPE). Recently, however, data-driven recurrent neural networks have shown potential in reducing the long running times required by legacy methods. This project builds on the state of the art to provide a more generalised ML-based approach. By combining existing RNNs for pulse propagation with the ConditionalRecurrent() wrapper for Keras, the prediction is informed by auxiliary data, which allows for a single neural network capable of predicting the evolution of optical pulses through different structures.
For more background on the research and the theory, please refer to the paper.
First, you will need to install both TensorFlow and the ConditionalRecurrent wrapper.
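Both are available from PyPI; the ConditionalRecurrent wrapper ships in the cond-rnn package (the install line below assumes a standard pip setup):

pip install tensorflow cond-rnn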
The model can then be built using the Keras Functional API, as shown in the Jupyter notebooks in the CODE folder.
Two example notebooks for creating models are provided, for use with one or two conditions.
In short, the idea is to define the inputs (sequential and conditional) as nodes, and to pass them as inputs to the LSTM layers, which are wrapped with ConditionalRecurrent.
# Imports (TensorFlow 2.x; ConditionalRecurrent ships with the cond-rnn package)
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model
from cond_rnn import ConditionalRecurrent
# sequence_length, n_features, cond_features and a_func are set per use case
# Sequence
i = Input(shape=[sequence_length, n_features])
# Conditional parameter
c = Input(shape=[cond_features])
# Combine into the first conditional LSTM layer
x = ConditionalRecurrent(LSTM(n_features, return_sequences=True))([i, c])
# Combine again into the second conditional LSTM layer
x = ConditionalRecurrent(LSTM(n_features, return_sequences=False))([x, c])
# Finally, the output Dense layer
x = Dense(units=n_features, activation=a_func)(x)
# Build the model from the input and output tensors
model = Model(inputs=[i, c], outputs=[x])
# Compile the model (optimiser and loss here are illustrative placeholders)
model.compile(optimizer="adam", loss="mse")
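The two-condition notebook follows the same pattern; below is a minimal sketch, assuming the wrapper accepts a list with multiple condition tensors (the input names c1, c2 and their sizes are hypothetical):

# Each conditional parameter gets its own input node
c1 = Input(shape=[cond_features_1])
c2 = Input(shape=[cond_features_2])
# Both conditions are passed alongside the sequence
x = ConditionalRecurrent(LSTM(n_features, return_sequences=True))([i, c1, c2])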
The model takes as inputs a sequence, like any LSTM model, and a condition. This means that each individual prediction is made from a sequence fed together with a condition, which allows different values of the condition to be used within a single structure. For instance, this is how non-uniformly poled structures can be modelled with this architecture (see paper).
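As an illustration of how the condition can vary within a single structure, here is a sketch of an autoregressive roll-out where each longitudinal step is predicted with that step's poling period (variable names such as poling_periods and x0 are hypothetical):

import numpy as np

window = x0  # current input window, shape (1, sequence_length, n_features)
spectra = []
for period in poling_periods:  # one condition value per longitudinal step
    cond = np.array([[period]])  # shape (1, cond_features)
    y_next = model.predict([window, cond], verbose=0)  # shape (1, n_features)
    spectra.append(y_next[0])
    # Slide the window forward by one step
    window = np.concatenate([window[:, 1:], y_next[:, None, :]], axis=1)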
As with any non-sequential Keras model, the training and testing datasets include multiple inputs, as shown here:
history = model.fit(
    x=[train_x, train_c], y=train_y,
    batch_size=140,
    validation_data=([test_x, test_c], test_y),
    epochs=200,
    verbose=1,
)
Since each individual sequence is fed with a condition, the train_* and test_* arrays have the following shapes:
x = [None, sequence_length, n_features]
c = [None, cond_features]
y = [None, n_features]
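As a quick sanity check of these shapes, dummy arrays can be generated with random data (the values below, including n_samples, are illustrative only):

import numpy as np

n_samples, sequence_length, n_features, cond_features = 1000, 10, 500, 1
train_x = np.random.rand(n_samples, sequence_length, n_features)
train_c = np.random.rand(n_samples, cond_features)
train_y = np.random.rand(n_samples, n_features)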
See CODE/TrainModel.ipynb for a more detailed guide on how to train the model.
Although each use case is different and the hyperparameters should be determined heuristically, there are a few things I have found useful.
As suggested by @philipperemy here, complex conditions should be normalised to zero mean and unit variance.
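A minimal sketch of this normalisation, using statistics computed on the training set only:

# Standardise the conditions to zero mean and unit variance
c_mean = train_c.mean(axis=0)
c_std = train_c.std(axis=0)
train_c = (train_c - c_mean) / c_std
test_c = (test_c - c_mean) / c_std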
Initialization of hidden states
When modelling complex dynamics like the evolution of optical pulses, I have found that feeding each condition twice works best, as shown in the example notebooks.
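A minimal sketch of one reading of this, assuming the wrapper accepts the same condition tensor repeated in the input list (see the notebooks for the exact setup):

# The same condition tensor is passed twice to the wrapped layer
x = ConditionalRecurrent(LSTM(n_features, return_sequences=True))([i, c, c])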
Here are a couple of examples of what the trained networks can do. In these examples, the networks have been trained on simulations computed via a numerical solution of the UPPE.
This network was built with a sequence length of 10, for the prediction of a spectrum made of 500 points along the frequency axis (n_features=500). The condition fed with each sequence is the poling period of the second-order nonlinear coefficient at that longitudinal step.
Below is a visual comparison of the RNN prediction and the numerical solution.
This model produced the above result in 5 seconds.
For more results and a detailed discussion of the performance, please see the paper.
Since ConditionalRecurrent is compatible with all recurrent layers in Keras, it is possible to use SimpleRNN and GRU layers instead of the LSTM. In particular, GRU layers could be used to obtain a leaner and quicker network.
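For instance, a GRU-based variant of the first recurrent layer (a sketch, reusing the tensors defined above):

from tensorflow.keras.layers import GRU

x = ConditionalRecurrent(GRU(n_features, return_sequences=True))([i, c])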
It should also be possible to extend this work to include three or more conditional parameters.
I am also working on a more generalised version of this repo, since this method can ideally be applied to other nonlinear systems.
Special thanks to @philipperemy for making this layer, and for being so helpful as I was struggling to apply this architecture to my nonlinear optics research!