Indicators and dataframes #5
First off, thank you for your thoughtful input, @rokups; it is much appreciated. My thoughts are below:
PyBroker computes indicators in parallel using a process pool. To keep this simple, the indicators are distributed across multiple processes, one task per ticker and indicator function pair. This means there are no dependencies between indicators, which makes their computation easily parallelizable. If you need to share custom data between indicators, you can register a custom data column with PyBroker and then either create your own DataSource class or pass your own DataFrame to PyBroker. The Creating a Custom DataSource notebook shows how to do this. In your example, you would calculate the spread ahead of time and register it as a custom column.
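A rough sketch of that flow (the dataframe contents, symbols, dates, and the 'spread' values are all illustrative):

```python
import pandas as pd
import pybroker
from pybroker import Strategy

# Illustrative dataframe with the columns PyBroker expects plus a
# precomputed custom 'spread' column (values are made up).
dates = pd.date_range('2021-01-04', periods=5)
df = pd.concat([
    pd.DataFrame({
        'symbol': sym,
        'date': dates,
        'open': 10.0, 'high': 11.0, 'low': 9.0, 'close': 10.5,
        'volume': 1_000,
        'spread': [0.10, 0.20, 0.15, 0.12, 0.18],
    })
    for sym in ('AAA', 'BBB')
])

# Register the custom column so PyBroker carries it through to indicator
# functions and execution.
pybroker.register_columns('spread')

# A DataFrame can be passed to Strategy in place of a DataSource.
strategy = Strategy(df, start_date='1/4/2021', end_date='1/8/2021')
```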
I am considering creating a wrapper around ta-lib. You should already be able to use pandas_ta by using a custom data source and registering custom columns, as explained previously. Perhaps I can add an example of pandas_ta to the custom DataSources notebook.
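For instance, something along these lines (a sketch; the pandas_ta indicator, the synthetic data, and the column name are illustrative):

```python
import numpy as np
import pandas as pd
import pandas_ta as ta
import pybroker

# Compute an indicator column with pandas_ta on the dataframe returned by a
# custom DataSource, then register it so PyBroker keeps the column.
close = pd.Series(100 + np.random.default_rng(0).standard_normal(60).cumsum())
df = pd.DataFrame({'close': close})
df['rsi_14'] = ta.rsi(df['close'], length=14)

pybroker.register_columns('rsi_14')
```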
Creating multiple DataFrames would introduce extra overhead and complexity. External APIs for historical data are designed to return a single DataFrame to maintain simplicity and performance. However, a bigger concern is that having multiple DataFrames may not parallelize efficiently across multiple processes due to memory limitations and would also severely slow down serialization given PyBroker's current implementation. On the other hand, NumPy arrays can be mem-mapped across processes with ease and can be accelerated using Numba.
You can retrieve the indicator of another symbol using ExecContext#indicator(), as well as OHLCV + custom column data with ExecContext#foreign(). I agree that support for multi-symbol indicators would make sense. It is something that I considered during the design phase, but I limited the implementation to single-symbol indicators for the sake of simplicity in the initial release (V1). I need to give this more thought, but my plan would be to add support for multi-symbol indicators as a configuration option that groups data for all symbols per indicator. If you have any suggestions, please let me know. In the meantime, you can calculate the multi-symbol indicator outside of PyBroker, save it to a DataFrame column, and then register the custom column with PyBroker.
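For example, inside the function passed to Strategy.add_execution (a sketch; 'SPY' and 'rsi_20' are placeholders, and the exact parameter names/order are documented on ExecContext):

```python
def exec_fn(ctx):
    # Indicator values that were registered for another symbol.
    spy_rsi = ctx.indicator('rsi_20', 'SPY')
    # OHLCV bars plus any registered custom columns for another symbol.
    spy_bars = ctx.foreign('SPY')
```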
Hmm what you say does make sense...
Here is a little help on that: talibgen.py.txt. This is an updated and fixed script from TA-Lib/ta-lib-python#212; it should simplify the process.
Great, thank you!
After reviewing TA-Lib again, I am unsure if creating a wrapper for it adds significant value. It's already fairly straightforward to integrate TA-Lib with PyBroker by using lambdas as shown in the following example:
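A minimal sketch of that kind of lambda wrapper (the indicator name and period are illustrative):

```python
import pybroker
import talib

# Wrap a TA-Lib function as a PyBroker indicator: the lambda receives the
# symbol's bar data and returns an array of indicator values.
rsi_20 = pybroker.indicator(
    'rsi_20', lambda data: talib.RSI(data.close, timeperiod=20))
```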
I added this example to the Writing Indicators notebook.
Hmm, I did not think about using lambdas. Thank you for the example. I suppose this is solved then?
I need to use a lot of indicators in pybroker, which are taken directly from the datasource. Is there a way to quickly register these indicators? Or can I get the whole dataframe from the datasource directly in the context without registering them?
Hi @JevonYang, you can register the indicator columns in your dataframe using pybroker.register_columns.
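For example (the column names are illustrative):

```python
import pybroker

# Register each precomputed dataframe column so it is carried through to
# indicator functions and the execution context.
for col in ('rsi_14', 'macd', 'atr_10'):
    pybroker.register_columns(col)
```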
I figured I would share some thoughts I have after trying PyBroker a bit.
Writing indicators could be more convenient, I think. For example, in freqtrade it works like this: the bot calls a `populate_indicators()` function that we implement and passes the entire dataframe to it. There we can do things like this:
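(A minimal sketch of the idea; the column names, the spread definition, and the use of TA-Lib's `SMA` are illustrative.)

```python
import numpy as np
import pandas as pd
import talib

# Stand-in for the dataframe that freqtrade passes to populate_indicators();
# the two synthetic close columns are only for illustration.
rng = np.random.default_rng(0)
dataframe = pd.DataFrame({
    'close_a': 100 + rng.standard_normal(100).cumsum(),
    'close_b': 100 + rng.standard_normal(100).cumsum(),
})

# Derive a spread column once, then build further indicators on top of it.
dataframe['spread'] = dataframe['close_a'] - dataframe['close_b']
dataframe['spread_sma20'] = talib.SMA(dataframe['spread'].to_numpy(), timeperiod=20)
dataframe['spread_sma40'] = talib.SMA(dataframe['spread'].to_numpy(), timeperiod=40)
```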
This looks trivial on the surface, and of course it is nothing PyBroker cannot do, but it is actually very powerful.
To achieve something like this in PyBroker, we would have to create custom indicator functions for `spread_sma20` and `spread_sma40`. But here we waste the calculation of the spread column, as it is now done twice. It is also rather cumbersome to use indicator libraries like ta-lib or pandas_ta: these libraries already provide one-function-call indicators that we now must wrap in another function to acquaint them with PyBroker.
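For contrast, a rough sketch of what those two indicators might look like as separate PyBroker indicator functions (the spread definition is just a stand-in, and `talib.SMA` is assumed for the moving average); the spread ends up recomputed inside each one:

```python
import pybroker
import talib

def make_spread_sma(period):
    # Each PyBroker indicator function only sees this symbol's bar data,
    # so the (stand-in) spread is recomputed per indicator.
    def fn(data):
        spread = data.close - data.open
        return talib.SMA(spread, timeperiod=period)
    return pybroker.indicator(f'spread_sma{period}', fn)

spread_sma20 = make_spread_sma(20)
spread_sma40 = make_spread_sma(40)
```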
Normally I would just say "whatever, I'll do it on the symbol dataframe", however `datasource.query()` merges all symbols into one dataframe, and that is the only place where it seems to make sense to insert custom indicators for backtesting.

What would be convenient
First of all, it seems to me it would make more sense if `data_source.query()` returned a list of dataframes instead of one dataframe with all symbols. The combined dataframe needs to be split anyway; besides, merging the dataframes of different symbols puts a burden on the user to make sure that the dataframes of all queried symbols are of equal length, and the user must merge them properly in case there are missing candles. If everyone has to do it, we might as well do it in the library.

Then, if the dataframes were separate, we could also have a user-implemented `indicators_fn(df)` in the same spirit as `exec_fn`, which would allow massaging the dataframe in any way we see necessary and utilizing the full power of pandas.

This approach should be future-proof as well, since adding support for multiple timeframes could be implemented by specifying an `indicator_fn` per timeframe. It should also play well with live trading, since `indicator_fn` could be called once every time a new bar comes in.
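Purely to illustrate the shape of the idea (this is not an existing PyBroker API; `indicators_fn` and the per-symbol dataframe it receives are hypothetical):

```python
import pandas as pd

# Hypothetical user-implemented hook in the spirit of exec_fn: it would
# receive a single symbol's dataframe and return it with indicator
# columns added, using plain pandas.
def indicators_fn(df: pd.DataFrame) -> pd.DataFrame:
    df['sma20'] = df['close'].rolling(20).mean()
    df['sma40'] = df['close'].rolling(40).mean()
    return df
```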
Multi-symbol indicators
There is one special case where my proposed approach is not good enough: pairs trading. We need the price data of two symbols in order to calculate the necessary metrics. Maybe a way to get a raw (just OHLC data, no indicators) symbol dataframe in `indicator_fn` could be an option. In the same vein, order entry for pairs trading is also a bit unintuitive, as the entire process is split over two `execute_fn` iterations, but that is another topic.

Anyhow, this is by no means a request, just some food for thought and discussion. My proposition may have shortcomings that are unobvious to me.