Qlib is an AI-oriented quantitative investment platform, which aims to realize the potential, empower the research, and create the value of AI technologies in quantitative investment.
It contains the full ML pipeline of data processing, model training, and back-testing, and covers the entire chain of quantitative investment: alpha seeking, risk modeling, portfolio optimization, and order execution.
With Qlib, users can easily try ideas to create better Quant investment strategies.
For more details, please refer to our paper "Qlib: An AI-oriented Quantitative Investment Platform".
- News and Plans
- Framework of Qlib
- Quick Start
- Quant Model Zoo
- Quant Dataset Zoo
- More About Qlib
- Offline Mode and Online Mode
- Related Reports
- Contact Us
- Contributing
New features under development (ordered by estimated release time). Your feedback about the features is very important.
Feature | Status |
---|---|
Online serving and automatic model rolling | Under review: microsoft#290 |
Planning-based portfolio optimization | Under review: microsoft#280 |
Fund data supporting and analysis | Under review: microsoft#292 |
Point-in-Time database | Under review: microsoft#343 |
High-frequency trading | Initial open-source version under development |
Meta-Learning-based data selection | Initial open-source version under development |
Recently released features
Feature | Status |
---|---|
DoubleEnsemble Model | Released microsoft#286 |
High-frequency data processing example | Released microsoft#257 |
High-frequency trading example | Part of code released microsoft#227 |
High-frequency data(1min) | Released microsoft#221 |
Tabnet Model | Released microsoft#205 |
Features released before 2021 are not listed here.
At the module level, Qlib is a platform that consists of the above components. The components are designed as loosely coupled modules, and each component can be used stand-alone.
Name | Description |
---|---|
Infrastructure layer | Infrastructure layer provides underlying support for Quant research. DataServer provides a high-performance infrastructure for users to manage and retrieve raw data. Trainer provides a flexible interface to control the training process of models, which enables algorithms to control the training process. |
Workflow layer | Workflow layer covers the whole workflow of quantitative investment. Information Extractor extracts data for models. Forecast Model focuses on producing all kinds of forecast signals (e.g. alpha, risk) for other modules. With these signals, Portfolio Generator will generate the target portfolio and produce orders to be executed by Order Executor. |
Interface layer | Interface layer tries to present a user-friendly interface for the underlying system. Analyser module will provide users detailed analysis reports of forecasting signals, portfolios and execution results. |
- The modules with hand-drawn style are under development and will be released in the future.
- The modules with dashed borders are highly user-customizable and extendible.
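To make the data flow through the Workflow layer more concrete (forecast signals feeding a Portfolio Generator, which produces orders for an Order Executor), here is a toy sketch in plain Python. It is not Qlib's API: the function names, the equal-weight top-k rule, and expressing orders as weight changes are all hypothetical simplifications chosen for illustration.

```python
from typing import Dict, List, Tuple

def generate_target_portfolio(scores: Dict[str, float], topk: int) -> Dict[str, float]:
    """Hypothetical Portfolio Generator: equal-weight the top-k stocks by forecast score."""
    chosen = sorted(scores, key=scores.get, reverse=True)[:topk]
    if not chosen:
        return {}
    weight = 1.0 / len(chosen)
    return {stock: weight for stock in chosen}

def generate_orders(current: Dict[str, float],
                    target: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """Hypothetical order step: turn weight differences into (stock, side, weight) orders."""
    orders = []
    for stock in sorted(set(current) | set(target)):
        delta = target.get(stock, 0.0) - current.get(stock, 0.0)
        if abs(delta) > 1e-12:  # skip stocks whose weight does not change
            orders.append((stock, "buy" if delta > 0 else "sell", abs(delta)))
    return orders

# Forecast Model output (made-up scores) -> target portfolio -> orders for an executor
scores = {"SH600000": 0.8, "SH600519": 1.2, "SZ000001": -0.3, "SZ000002": 0.5}
target = generate_target_portfolio(scores, topk=2)
orders = generate_orders({"SH600000": 0.5, "SZ000002": 0.5}, target)
print(orders)  # [('SH600519', 'buy', 0.5), ('SZ000002', 'sell', 0.5)]
```

Keeping the two steps as separate functions mirrors the loose coupling described above: either step could be swapped out (say, for an optimizer-based portfolio) without touching the other.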
This quick start guide tries to demonstrate that:
- It's very easy to build a complete Quant research workflow and try your ideas with Qlib.
- Even with public data and simple models, machine learning technologies work very well in practical Quant investment.
Here is a quick demo that shows how to install Qlib and run LightGBM with qrun. But please make sure you have already prepared the data following the instructions.
This table demonstrates the supported Python versions of Qlib:
Python version | install with pip | install from source | plot |
---|---|---|---|
Python 3.6 | ✔️ | ✔️ (only with Anaconda) | ✔️ |
Python 3.7 | ✔️ | ✔️ | ✔️ |
Python 3.8 | ✔️ | ✔️ | ✔️ |
Python 3.9 | ❌ | ✔️ | ❌ |
Note:
- Installing cython in Python 3.6 will raise some errors when installing Qlib from source. If users use Python 3.6 on their machines, it is recommended to upgrade Python to version 3.7 or to use conda's Python to install Qlib from source.
- For Python 3.9, Qlib supports running workflows such as training models, doing backtests and plotting most of the related figures (those included in the notebook). However, plotting for the model performance is not supported for now, and we will fix this when the dependent packages are upgraded in the future.
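If a setup script should warn early about an unsupported interpreter, the support table above can be restated as a lookup. This is a minimal sketch: `check_qlib_support` is a hypothetical helper, not part of Qlib, and versions the table does not list are conservatively reported as unsupported.

```python
import sys
from typing import Dict, Tuple

# The support table above, restated as (pip install, install from source, plot).
# Python 3.6 source installs are only supported with Anaconda's Python.
QLIB_PYTHON_SUPPORT: Dict[Tuple[int, int], Tuple[bool, bool, bool]] = {
    (3, 6): (True, True, True),
    (3, 7): (True, True, True),
    (3, 8): (True, True, True),
    (3, 9): (False, True, False),
}

def check_qlib_support(version: Tuple[int, int] = tuple(sys.version_info[:2])) -> Dict[str, bool]:
    """Hypothetical helper: look up a (major, minor) version in the table.

    Versions the table does not cover are treated conservatively as unsupported.
    """
    pip_ok, source_ok, plot_ok = QLIB_PYTHON_SUPPORT.get(tuple(version), (False, False, False))
    return {"pip": pip_ok, "source": source_ok, "plot": plot_ok}

print(check_qlib_support())  # result depends on the running interpreter
```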
Users can easily install Qlib by pip according to the following command.
pip install pyqlib
Note: pip will install the latest stable qlib. However, the main branch of qlib is in active development. If you want to test the latest scripts or functions in the main branch, please install qlib with the methods below.
Also, users can install the latest dev version of Qlib from the source code according to the following steps:
- Before installing Qlib from source, users need to install some dependencies:
  pip install numpy
  pip install --upgrade cython
- Clone the repository and install Qlib as follows.
  - If you haven't installed qlib by the command pip install pyqlib before:
    git clone https://github.com/microsoft/qlib.git && cd qlib
    python setup.py install
  - If you have already installed the stable version by the command pip install pyqlib:
    git clone https://github.com/microsoft/qlib.git && cd qlib
    pip install .
  Note: Only the command pip install . can overwrite the stable version installed by pip install pyqlib, while the command python setup.py install can't.
Tips: If you fail to install Qlib or run the examples in your environment, comparing your steps with the CI workflow may help you find the problem.
Load and prepare data by running the following code:
# get 1d data
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn
# get 1min data
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data_1min --region cn --interval 1min
This dataset is created from public data collected by crawler scripts, which have been released in the same repository. Users can create the same dataset with them.
Please pay ATTENTION that the data is collected from Yahoo Finance, and the data might not be perfect. We recommend that users prepare their own data if they have a high-quality dataset. For more information, users can refer to the related document.
Qlib provides a tool named qrun to run the whole workflow automatically (including building the dataset, training models, backtesting and evaluation). You can start an auto quant research workflow and get a graphical reports analysis according to the following steps:
- Quant Research Workflow: Run qrun with the LightGBM workflow config (workflow_config_lightgbm_Alpha158.yaml) as follows.
  cd examples  # Avoid running the program under a directory that contains `qlib`
  qrun benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml
  If users want to use qrun in debug mode, please use the following command:
  python -m pdb qlib/workflow/cli.py examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml
  The result of qrun is as follows; please refer to Intraday Trading for more details about the result.
  'The following are analysis results of the excess return without cost.'
                         risk
  mean               0.000708
  std                0.005626
  annualized_return  0.178316
  information_ratio  1.996555
  max_drawdown      -0.081806
  'The following are analysis results of the excess return with cost.'
                         risk
  mean               0.000512
  std                0.005626
  annualized_return  0.128982
  information_ratio  1.444287
  max_drawdown      -0.091078
  Here are detailed documents for qrun and workflow.
- Graphical Reports Analysis: Run examples/workflow_by_code.ipynb with jupyter notebook to get graphical reports.
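The risk metrics printed by qrun can be reproduced from a series of daily excess returns. The sketch below is not Qlib's own code: it assumes additive (non-compounded) aggregation and an annualization factor of 252 trading days. Those assumptions are at least consistent with the printed numbers, e.g. 0.000708 × 252 ≈ 0.178 and 0.000708 / 0.005626 × √252 ≈ 2.0.

```python
import math
from itertools import accumulate
from typing import Dict, List, Tuple

N_PERIODS = 252  # assumed trading days per year; consistent with the figures above

def annualize(mean: float, std: float, n: int = N_PERIODS) -> Tuple[float, float]:
    """Simple (non-compounded) annualized return and information ratio."""
    return mean * n, mean / std * math.sqrt(n)

def risk_analysis(daily_excess: List[float], n: int = N_PERIODS) -> Dict[str, float]:
    """Sketch of a qrun-style risk table for a series of daily excess returns."""
    count = len(daily_excess)
    mean = sum(daily_excess) / count
    # sample standard deviation (ddof=1)
    std = math.sqrt(sum((r - mean) ** 2 for r in daily_excess) / (count - 1))
    annualized_return, information_ratio = annualize(mean, std, n)
    cumulative = list(accumulate(daily_excess))       # additive cumulative excess return
    running_peak = list(accumulate(cumulative, max))  # best level reached so far
    max_drawdown = min(c - p for c, p in zip(cumulative, running_peak))
    return {
        "mean": mean,
        "std": std,
        "annualized_return": annualized_return,
        "information_ratio": information_ratio,
        "max_drawdown": max_drawdown,
    }

# Plugging in the rounded "without cost" mean and std printed above:
print(annualize(0.000708, 0.005626))  # roughly (0.178, 2.0)
```

The small gaps against the printed table come from the mean and std being rounded to six decimals before they are annualized here.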
The automatic workflow may not suit the research workflow of all Quant researchers. To support a flexible Quant research workflow, Qlib also provides a modularized interface that allows researchers to build their own workflow by code. Here is a demo of a customized Quant research workflow built by code.
Here is a list of models built on Qlib.
- GBDT based on XGBoost (Tianqi Chen, et al. 2016)
- GBDT based on LightGBM (Guolin Ke, et al. 2017)
- GBDT based on CatBoost (Liudmila Prokhorenkova, et al. 2017)
- MLP based on PyTorch
- LSTM based on PyTorch (Sepp Hochreiter, et al. 1997)
- GRU based on PyTorch (Kyunghyun Cho, et al. 2014)
- ALSTM based on PyTorch (Yao Qin, et al. 2017)
- GATs based on PyTorch (Petar Velickovic, et al. 2017)
- SFM based on PyTorch (Liheng Zhang, et al. 2017)
- TFT based on TensorFlow (Bryan Lim, et al. 2019)
- TabNet based on PyTorch (Sercan O. Arik, et al. 2019)
- DoubleEnsemble based on LightGBM (Chuheng Zhang, et al. 2020)
PRs for new Quant models are highly welcome.
The performance of each model on the Alpha158 and Alpha360 datasets can be found here.
All the models listed above are runnable with Qlib. Users can find the config files we provide and some details about each model in the benchmarks folder. More information can be found in the model files listed above.
Qlib provides three different ways to run a single model; users can pick the one that best fits their case:
- Users can use the tool qrun mentioned above to run a model's workflow from a config file.
- Users can create a workflow_by_code Python script based on the one listed in the examples folder.
- Users can use the script run_all_model.py listed in the examples folder to run a model. Here is an example of the specific shell command to be used:
  python run_all_model.py --models=lightgbm
  The --models argument can take any number of the models listed above (the available models can be found in benchmarks). For more use cases, please refer to the file's docstrings.
Qlib also provides a script run_all_model.py, which can run multiple models for several iterations. (Note: the script only supports Linux for now; other operating systems will be supported in the future. Besides, it doesn't support running the same model multiple times in parallel, and this will be fixed in future development too.)
The script will create a unique virtual environment for each model, and delete the environments after training. Thus, only experiment results such as IC and backtest results will be generated and stored.
Here is an example of running all the models for 10 iterations:
python run_all_model.py 10
It also provides the API to run specific models at once. For more use cases, please refer to the file's docstrings.
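IC, one of the experiment results kept by run_all_model.py, usually denotes the information coefficient: on each trading day, the correlation across stocks between the model's predicted scores and the realized labels (e.g. future returns). Rank IC uses the Spearman (rank) correlation instead. The plain-Python sketch below computes both for a single day; the function names are hypothetical, and a full evaluation would typically average the per-day values over the test period.

```python
import math
from typing import List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either side has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def ic(pred: Sequence[float], label: Sequence[float]) -> float:
    """Cross-sectional IC for one day: Pearson(pred, label)."""
    return pearson(pred, label)

def rank_ic(pred: Sequence[float], label: Sequence[float]) -> float:
    """Cross-sectional Rank IC for one day: Spearman, i.e. Pearson on ranks."""
    return pearson(average_ranks(pred), average_ranks(label))

# One trading day, four stocks: predicted scores vs. realized returns (made-up numbers)
pred = [0.1, 0.4, 0.2, 0.3]
label = [0.01, 0.05, 0.02, 0.03]
print(round(ic(pred, label), 4), rank_ic(pred, label))
```

Rank IC is insensitive to outliers in the label, which is why it is often reported alongside IC.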
Dataset plays a very important role in Quant. Here is a list of the datasets built on Qlib:
Dataset | US Market | China Market |
---|---|---|
Alpha360 | √ | √ |
Alpha158 | √ | √ |
Here is a tutorial to build a dataset with Qlib.
PRs that build new Quant datasets are highly welcome.
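As a rough illustration of how such a dataset is derived from raw prices, Alpha360 is commonly described as the past 60 days of six raw fields (close, open, high, low, VWAP, volume), with price fields divided by the latest close and volume divided by the latest volume, giving 6 × 60 = 360 features. The plain-Python sketch below follows that description; the feature names and the small epsilon are assumptions, so consult the handler definition in the Qlib source for the authoritative version.

```python
from typing import Dict, List

FIELDS = ["close", "open", "high", "low", "vwap", "volume"]
WINDOW = 60   # days of history per field -> 6 * 60 = 360 features
EPS = 1e-12   # assumed guard against zero volume

def alpha360_like(history: Dict[str, List[float]]) -> Dict[str, float]:
    """history[field] holds daily values, oldest first; the last entry is the latest day."""
    latest_close = history["close"][-1]
    latest_volume = history["volume"][-1]
    features: Dict[str, float] = {}
    for field in FIELDS:
        series = history[field]
        if len(series) < WINDOW:
            raise ValueError(f"need at least {WINDOW} days of {field}, got {len(series)}")
        for lag in range(WINDOW):  # lag 0 = latest day, lag 59 = 59 trading days earlier
            value = series[-1 - lag]
            if field == "volume":
                features[f"VOLUME{lag}"] = value / (latest_volume + EPS)
            else:
                features[f"{field.upper()}{lag}"] = value / latest_close
    return features

# 60 days of synthetic data: prices rising 1..60, constant volume
days = [float(d) for d in range(1, 61)]
history = {f: list(days) for f in ["close", "open", "high", "low", "vwap"]}
history["volume"] = [100.0] * 60
features = alpha360_like(history)
print(len(features), features["CLOSE0"], features["CLOSE59"])
```

Normalizing by the latest values makes the features scale-free, so stocks with very different price levels can share one model.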
The detailed documents are organized in docs. Sphinx and the readthedocs theme are required to build the documentation in HTML format.
cd docs/
conda install sphinx sphinx_rtd_theme -y
# Otherwise, you can install them with pip
# pip install sphinx sphinx_rtd_theme
make html
You can also view the latest document online directly.
Qlib is in active and continuing development. Our plan is in the roadmap, which is managed as a GitHub project.
The data server of Qlib can be deployed in either Offline mode or Online mode. The default mode is Offline mode.
Under Offline mode, the data will be deployed locally.
Under Online mode, the data will be deployed as a shared data service. The data and their cache will be shared by all the clients. The data retrieval performance is expected to improve due to a higher rate of cache hits. It will consume less disk space, too. The documents of the online mode can be found in Qlib-Server. The online mode can be deployed automatically with Azure CLI based scripts. The source code of the online data server can be found in the Qlib-Server repository.
The performance of data processing is important to data-driven methods like AI technologies. As an AI-oriented platform, Qlib provides a solution for data storage and data processing. To demonstrate the performance of Qlib data server, we compare it with several other data storage solutions.
We evaluate the performance of several storage solutions by finishing the same task, which creates a dataset (14 features/factors) from the basic OHLCV daily data of a stock market (800 stocks each day from 2007 to 2020). The task involves data queries and processing.
Metric | HDF5 | MySQL | MongoDB | InfluxDB | Qlib -E -D | Qlib +E -D | Qlib +E +D |
---|---|---|---|---|---|---|---|
Total (1CPU) (seconds) | 184.4±3.7 | 365.3±7.5 | 253.6±6.7 | 368.2±3.6 | 147.0±8.8 | 47.6±1.0 | 7.4±0.3 |
Total (64CPU) (seconds) |  |  |  |  | 8.8±0.6 | 4.2±0.2 |  |
- +(-)E indicates with (out) ExpressionCache
- +(-)D indicates with (out) DatasetCache
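To see why the two caches make such a difference, here is a toy sketch of the idea: an expression-level cache memoizes each computed feature per instrument and date range, and a dataset-level cache memoizes the assembled dataset. The class name, cache keys, and the `compute` callback are hypothetical, and Qlib's actual caches are considerably more involved; the sketch only shows why repeated or overlapping requests skip the expensive path.

```python
from typing import Callable, Dict, List, Sequence, Tuple

ExprKey = Tuple[str, str, str, str]  # (instrument, expression, start, end)

class ToyTwoLevelCache:
    """Hypothetical illustration of an expression cache (+E) and a dataset cache (+D)."""

    def __init__(self, compute: Callable[[str, str, str, str], List[float]]):
        self.compute = compute                    # the expensive path: load data + evaluate
        self.expression_cache: Dict[ExprKey, List[float]] = {}
        self.dataset_cache: Dict[tuple, Dict[str, Dict[str, List[float]]]] = {}
        self.computations = 0                     # how often the expensive path ran

    def expression(self, inst: str, expr: str, start: str, end: str) -> List[float]:
        key = (inst, expr, start, end)
        if key not in self.expression_cache:      # miss: compute once and remember
            self.computations += 1
            self.expression_cache[key] = self.compute(inst, expr, start, end)
        return self.expression_cache[key]

    def dataset(self, instruments: Sequence[str], expressions: Sequence[str],
                start: str, end: str) -> Dict[str, Dict[str, List[float]]]:
        key = (tuple(instruments), tuple(expressions), start, end)
        if key not in self.dataset_cache:         # miss: assemble from (cached) expressions
            self.dataset_cache[key] = {
                inst: {expr: self.expression(inst, expr, start, end) for expr in expressions}
                for inst in instruments
            }
        return self.dataset_cache[key]

# Stand-in for real data loading: returns a dummy one-element series
cache = ToyTwoLevelCache(lambda inst, expr, start, end: [float(len(expr))])
fields = ["$close", "Mean($close, 5)"]
first = cache.dataset(["SH600000", "SH600519"], fields, "2020-01-01", "2020-12-31")
again = cache.dataset(["SH600000", "SH600519"], fields, "2020-01-01", "2020-12-31")
subset = cache.dataset(["SH600000"], ["$close"], "2020-01-01", "2020-12-31")
print(cache.computations)  # 4: the second and third requests reuse cached results
```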
Most general-purpose databases take too much time to load data. After looking into the underlying implementation, we find that data go through too many layers of interfaces and unnecessary format transformations in general-purpose database solutions. Such overheads greatly slow down the data loading process. Qlib data are stored in a compact format, which can be efficiently combined into arrays for scientific computation.
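A minimal sketch of what such a compact format can look like: one flat binary file per instrument and field, holding little-endian float32 values, where the first value stores the index of the series' first date in a shared trading calendar. This follows the layout Qlib's data-dumping scripts appear to use, but treat it as an assumption; `encode_feature`, `decode_feature`, and the other helpers are hypothetical. Because the payload is one contiguous float array, decoding is a single unpack call with no per-row parsing or type conversion.

```python
import struct
from typing import List, Sequence, Tuple

def encode_feature(start_index: int, values: Sequence[float]) -> bytes:
    """Pack [start_index, v0, v1, ...] as little-endian float32 (assumed layout)."""
    payload = [float(start_index), *values]
    return struct.pack(f"<{len(payload)}f", *payload)

def decode_feature(raw: bytes) -> Tuple[int, List[float]]:
    """Unpack the whole buffer in one call; the first float is the calendar index."""
    count = len(raw) // 4
    data = struct.unpack(f"<{count}f", raw[: count * 4])
    return int(data[0]), list(data[1:])

def align_to_calendar(start_index: int, values: List[float],
                      calendar: List[str]) -> List[Tuple[str, float]]:
    """Attach trading dates from a shared calendar to the decoded values."""
    return list(zip(calendar[start_index:start_index + len(values)], values))

def write_feature(path: str, start_index: int, values: Sequence[float]) -> None:
    with open(path, "wb") as f:
        f.write(encode_feature(start_index, values))

def read_feature(path: str) -> Tuple[int, List[float]]:
    with open(path, "rb") as f:
        return decode_feature(f.read())

calendar = ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]
raw = encode_feature(2, [10.5, 11.25])  # series starts on the 3rd calendar day
print(len(raw), align_to_calendar(*decode_feature(raw), calendar))
```

Storing only a start index instead of a date per row keeps each file a pure numeric array; the dates come from the calendar, which is shared by every instrument.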
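- [Huatai Securities Financial Engineering, Lin Xiaoming's team] Stock Selection with Graph Neural Networks and Qlib in Practice: Huatai Artificial Intelligence Series No. 42
- Guide To Qlib: Microsoft’s AI Investment Platform
- [Huatai Securities Financial Engineering, Lin Xiaoming's team] Hands-on with Microsoft's AI Quantitative Investment Platform Qlib: Huatai Artificial Intelligence Series No. 40
- Microsoft Is Building an AI Quant Platform Too? And It's Open Source!
- 微矿 Qlib: The Industry's First Open-Source AI Quantitative Investment Platform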
- If you have any issues, please create an issue here or send a message on Gitter.
- If you want to make contributions to Qlib, please create pull requests.
- For other reasons, you are welcome to contact us by email ([email protected]).
- We are recruiting new members (both FTEs and interns). Your resumes are welcome!
Join IM discussion groups: Gitter.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the right to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.