
DySymNet


(Figure: method overview)

This repository contains the official PyTorch implementation of the paper A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data, accepted at ICML 2024.


🔥 News

[2024/10/12] DySymNet can now be installed via `pip install DySymNet`. A single command is all you need to start exploring expressions!

🚀 Highlights

  • DySymNet is a new search paradigm for symbolic regression (SR) that searches for symbolic networks with various architectures instead of searching expressions directly in the large functional space.
  • DySymNet possesses promising capabilities in solving high-dimensional problems and optimizing coefficients, which current SR methods lack.
  • DySymNet outperforms state-of-the-art baselines across a variety of standard SR benchmark datasets and the well-known SRBench, including problems with more variables.

📦 Install

Create a conda environment and install DySymNet:

conda create -n dysymnet python=3.8
conda activate dysymnet
pip install DySymNet
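
If the installation succeeded, the imports used in the quick-start example below should resolve:

# check_install.py
from DySymNet import SymbolicRegression
from DySymNet.scripts.params import Params
from DySymNet.scripts.functions import *
print("DySymNet imported successfully")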

🤗 Quick start

You can create and run the following script in any directory:

# Demo.py
import numpy as np
from DySymNet import SymbolicRegression
from DySymNet.scripts.params import Params
from DySymNet.scripts.functions import *

# You can customize some hyperparameters according to parameter configuration
config = Params()

# such as operators 
funcs = [Identity(), Sin(), Cos(), Square(), Plus(), Sub(), Product()]
config.funcs_avail = funcs

# Example 1: Input ground truth expression
SR = SymbolicRegression.SymboliRegression(config=config, func="x_1**3 + x_1**2 + x_1", func_name="Nguyen-1")
eq, R2, error, relative_error = SR.solve_environment()
print('Expression: ', eq)
print('R2: ', R2)
print('error: ', error)
print('relative_error: ', relative_error)
print('log(1 + MSE): ', np.log(1 + error))

A folder named "results" will then be created in the current directory; it contains a subfolder named after func_name that records the logs of the run.

βš™οΈ Parameter configuration

The main running script is SymbolicRegression.py, and runs are configured via params.py, which contains the hyperparameters of the controller RNN and the symbolic network. You can configure the following hyperparameters as required:

Parameters for the symbolic network structure

| Parameter | Description | Example Values |
| --- | --- | --- |
| funcs_avail | Operator library | See params.py |
| n_layers | Range of the number of symbolic network layers | [2, 3, 4, 5] |
| num_func_layer | Range of the number of neurons per layer of the symbolic network | [2, 3, 4, 5, 6] |

Note: you can add additional operators in functions.py by following the existing operator definitions, and include them in funcs_avail if you want to use them.
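
For illustration only, a custom operator might look like the following sketch; the actual base class and method names are whatever the built-in operators in functions.py implement, so mirror an existing operator such as Sin() rather than this assumed interface:

# Hypothetical sketch of a custom operator -- the real interface lives in
# DySymNet/scripts/functions.py; this assumes each operator pairs a PyTorch
# forward pass with a sympy counterpart, as the built-in operators appear to.
import torch
import sympy as sp

class Tanh:
    def __call__(self, x):
        return torch.tanh(x)  # numeric evaluation inside the symbolic network

    def sp(self, x):
        return sp.tanh(x)  # symbolic form used when recovering the expression

Once defined alongside the built-in operators, it would be enabled the same way as in the quick-start example, e.g. config.funcs_avail = [Identity(), Sin(), Tanh()].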

Parameters for the controller RNN

| Parameter | Description | Example Values |
| --- | --- | --- |
| num_epochs | Epochs for sampling | 500 |
| batch_size | Size of a sampled batch | 10 |
| optimizer | Optimizer for training the RNN | Adam |
| hidden_size | Hidden dim. of the RNN layer | 32 |
| embedding_size | Embedding dim. | 16 |
| learning_rate1 | Learning rate for training the RNN | 0.0006 |
| risk_seeking | Whether to use the risk-seeking policy gradient | True |
| risk_factor | Risk factor | 0.5 |
| entropy_weight | Entropy weight | 0.005 |
| reward_type | Loss type for computing the reward | mse |
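
For context, the risk-seeking policy gradient listed above is a known technique from the symbolic regression literature (Petersen et al., deep symbolic regression): instead of maximizing the average reward, the controller is updated only on the samples in each batch whose reward exceeds a quantile threshold set by risk_factor. Whether DySymNet implements exactly this estimator is an assumption; the standard form is

$$\nabla_\theta J_{\text{risk}}(\theta) \approx \frac{1}{N}\sum_{i=1}^{N}\big(R(\tau_i)-\tilde{R}_\varepsilon\big)\,\nabla_\theta \log p(\tau_i\mid\theta)\,\mathbf{1}\big[R(\tau_i)\ge\tilde{R}_\varepsilon\big],$$

where $\tau_i$ are the sampled architectures, $R$ is the reward, and $\tilde{R}_\varepsilon$ is the empirical batch-reward quantile determined by the risk factor.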

Parameters for symbolic network training

| Parameter | Description | Example Values |
| --- | --- | --- |
| learning_rate2 | Learning rate for training the symbolic network | 0.01 |
| reg_weight | Regularization weight | 5e-3 |
| threshold | Pruning threshold | 0.05 |
| trials | Number of training trials for the symbolic network | 1 |
| n_epochs1 | Epochs for the first training stage | 10001 |
| n_epochs2 | Epochs for the second training stage | 10001 |
| summary_step | Write a summary every n training steps | 1000 |
| clip_grad | Whether to use adaptive gradient clipping | True |
| max_norm | Norm threshold for gradient clipping | 1.0 |
| window_size | Window size for adaptive gradient clipping | 50 |
| refine_constants | Whether to refine constants | True |
| n_restarts | Number of restarts for BFGS optimization | 1 |
| add_bias | Whether to add a bias term | False |
| verbose | Whether to print the training process | True |
| use_gpu | Whether to use CUDA | False |
| plot_reward | Whether to plot the reward curve | False |

Note: threshold controls the complexity of the final expression and is a trade-off between complexity and precision; you can customize it according to your actual requirements.

Parameters for generating input data

| Parameter | Description | Example Values |
| --- | --- | --- |
| N_TRAIN | Size of the training dataset | 100 |
| N_VAL | Size of the validation dataset | 100 |
| NOISE | Standard deviation of the noise added to the input data | 0 |
| DOMAIN | Domain of the input data | (-1, 1) |
| N_TEST | Size of the test dataset | 100 |
| DOMAIN_TEST | Domain of the test dataset | (-1, 1) |

Additional parameters

results_dir configures the save path for all results
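
The quick-start example sets config.funcs_avail by plain attribute assignment. Assuming the remaining hyperparameters are exposed the same way on Params (worth verifying against params.py), a customized configuration could look like this:

# config_example.py -- attribute names are taken from the tables above;
# that each is a plain attribute on Params is assumed from how the
# quick-start example sets funcs_avail.
from DySymNet.scripts.params import Params

config = Params()
config.n_layers = [2, 3, 4]  # candidate symbolic network depths
config.num_func_layer = [4, 5, 6]  # candidate numbers of neurons per layer
config.num_epochs = 200  # controller sampling epochs
config.threshold = 0.05  # pruning threshold: complexity vs. precision trade-off
config.N_TRAIN = 100  # size of the generated training set
config.DOMAIN = (-1, 1)  # sampling domain for the input data
config.results_dir = './results'  # save path for all results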

🤖 Symbolic Regression

We provide two ways to perform symbolic regression tasks.

Option 1: Input a ground-truth expression

When you want to discover an expression whose ground truth is known, for example to test a standard benchmark, you can edit the script SymbolicRegression.py as follows:

# SymbolicRegression.py
params = Params()  # configuration for a specific task
ground_truth_eq = "x_1 + x_2"  # variable names should be written as x_i, where i >= 1
eq_name = "x_1+x_2"
SR = SymbolicRegression(config=params, func=ground_truth_eq, func_name=eq_name)  # a new folder named after func_name will be created to store the result files
eq, R2, error, relative_error = SR.solve_environment()  # returns the results

In this case, the function generate_data automatically generates the corresponding dataset $\mathcal{D}(X, y)$ for inference, so you do not need to provide the data yourself.
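
The exact implementation of generate_data lives in the package; conceptually it is governed by the data-generation parameters above (N_TRAIN, DOMAIN, NOISE). A minimal stand-in, assuming uniform sampling over the domain, might look like this:

# Illustrative stand-in for the package's generate_data (not its actual
# implementation): sample X uniformly, evaluate the expression, add noise.
import numpy as np

def generate_data_sketch(func, n_vars, n_train=100, domain=(-1, 1), noise=0.0):
    X = np.random.uniform(domain[0], domain[1], size=(n_train, n_vars))
    variables = {f"x_{i + 1}": X[:, i] for i in range(n_vars)}  # bind x_1, x_2, ...
    y = eval(func, {"__builtins__": {}}, variables)
    return X, y + noise * np.random.randn(n_train)

X, y = generate_data_sketch("x_1 + x_2", n_vars=2)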

Then, you can run SymbolicRegression.py directly, or you can run it in the terminal as follows:

python SymbolicRegression.py

After running this script, the results will be stored under ./results/test/func_name.

Option 2: Load a data file

When you only have observed data and do not know the ground truth, you can perform symbolic regression by providing the path to a CSV data file:

# SymbolicRegression.py
params = Params()  # configuration for a specific task
data_path = './data/Nguyen-1.csv'  # the data file should be in CSV format
SR = SymbolicRegression(config=params, func_name='Nguyen-1', data_path=data_path)  # you can set func_name to any name you like
eq, R2, error, relative_error = SR.solve_environment()  # returns the results

Note: the data file should contain $X_{dim} + 1$ columns, where $X_{dim}$ is the number of independent variables and the last column holds the corresponding $y$ values.
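
For example, a conforming file for the Nguyen-1 target used above could be written with numpy; whether a header row is expected is not stated here, so this sketch writes plain numbers only:

# make_data.py -- builds a 2-column CSV for Nguyen-1: x_1, then y
import os
import numpy as np

x = np.random.uniform(-1, 1, size=(100, 1))  # one independent variable
y = x[:, 0] ** 3 + x[:, 0] ** 2 + x[:, 0]  # Nguyen-1 ground truth
os.makedirs('./data', exist_ok=True)
np.savetxt('./data/Nguyen-1.csv', np.column_stack([x, y]), delimiter=',')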

Then, you can run SymbolicRegression.py directly, or you can run it in the terminal as follows:

python SymbolicRegression.py

After running this script, the results will be stored under ./results/test/func_name.

Output

Once the script stops early or finishes running, you will get output like the following:

Expression: x_1 + x_2
R2: 1.0
error: 4.3591795754679974e-13
relative_error:  2.036015757767018e-06
log(1 + MSE):  4.3587355946774144e-13

🌟 Citing this work

If you find our work and this codebase helpful, please consider starring this repo and citing our paper:

@inproceedings{li2024a,
  title={A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data},
  author={Wenqiang Li and Weijun Li and Lina Yu and Min Wu and Linjun Sun and Jingyi Liu and Yanjie Li and Shu Wei and Deng Yusong and Meilan Hao},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=IejxxE9DO2}
}
