Pdstools V4 alpha 1
Pre-release
StijnKas released this 30 Oct 17:17 · 33 commits to master since this release
V4 brings some pretty major (and necessary) changes. A lot of them are, unfortunately, breaking - but it's for the best. pdstools is now much easier to maintain and keep consistent, and new functionality now has a much more logical place to go.
The goal is for the initial V4 release to contain most of the breaking (API-centric) changes we foresee in a long time. Then, we can of course still change the inner functionality and/or add new functions - but hopefully the most important function schemas/API don't need more changes anytime soon.
✨Highlights
- Farewell R - you've served us well, but pdstools is now Python only
- Introducing the Pega DX API Client
  - Starting out with support for the 24.2 Prediction Studio and Knowledge Buddy APIs
- Major refactor of the entire codebase: consistent Python naming, optional dependency groups, well-defined typehints
❌Deprecations/removals
- The R version of pdstools has been removed. In case you still want to use the R tools, you should manually clone the repo at the V3.x tag.
- The legacy IH utilities have been dropped. These were old parts of the codebase and untested/unused. New IH utilities are on their way!
🔨Changes
- Consistent pythonic casing, meaning `PascalCase` for classes & `snake_case` for methods, variables & arguments
- Much improved typehints, so it's much more obvious what the response of a given function will be
- Fewer 'base' dependencies; different functionality is split up into 'namespaces' that all have their own set of requirements
- The first time you invoke a method in a 'namespace', it verifies the dependencies and gives a clear warning if any are missing
- To expand on the previous point: functionality is split up much more logically. Taking the ADMDatamart class as an example:
  - Plotting functionality is part of `ADMDatamart.plot.bubble_chart()` (or any other plot of course)
  - The health check and other reports are part of `ADMDatamart.generate.health_check()` (for instance)
  - The intermediate aggregations needed are part of `ADMDatamart.aggregations.pivot()` (for instance)
- Using `classmethod`s, we can initialize the ADMDatamart class in particular in a much more flexible way.
  - The main `__init__` method of the ADMDatamart class is very simple: it expects two `polars.LazyFrame`s; one for `model_data` and one for `predictor_data`. If you've already read in your data, simply use this
  - If, instead, you want to use the previous functionality which automatically found the most recent file in a folder, you should initialize the datamart class like `ADMDatamart.from_ds_export()`
  - Or if, instead, you are consuming the results of a data flow (including the OOTB Prediction Studio export), you can simply initialize the datamart class like `ADMDatamart.from_dataflow_export(model_data="pattern_for_model_files*.json", predictor_data="pattern_for_predictor_files*.json")`. We can also cache the files we've read in before by writing to a 'cache' file automatically, which makes repeated loads much quicker. This closes #205 as well.
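
To make the structure described above a bit more concrete, here is a minimal, standard-library-only sketch of the two patterns: namespaces that check their optional dependencies on first use, and a `classmethod` acting as an alternative constructor. It is not the pdstools implementation. `Datamart`, `Namespace`, `from_folder` and the package name `hypothetical_report_engine` are made up for illustration, and plain lists stand in for `polars.LazyFrame`s.

```python
import importlib.util
import json
import warnings
from pathlib import Path


class Namespace:
    """Hypothetical base class for a group of related methods."""

    requires = ()  # top-level module names this namespace depends on

    def __init__(self, parent):
        self.parent = parent
        self._checked = False

    def _check_dependencies(self):
        # Verify dependencies only once, the first time the namespace is used.
        if self._checked:
            return
        self._checked = True
        missing = [m for m in self.requires if importlib.util.find_spec(m) is None]
        if missing:
            warnings.warn(
                f"{type(self).__name__} is missing optional packages: {', '.join(missing)}"
            )


class Plots(Namespace):
    requires = ("json",)  # stdlib stand-in for a real plotting dependency

    def bubble_chart(self):
        self._check_dependencies()
        return f"bubble chart over {len(self.parent.model_data)} models"


class Reports(Namespace):
    requires = ("hypothetical_report_engine",)  # assumed not to be installed

    def health_check(self):
        self._check_dependencies()
        return "health check report"


class Datamart:
    """Toy stand-in for ADMDatamart (assumption: lists instead of LazyFrames)."""

    def __init__(self, model_data, predictor_data):
        # The plain constructor only accepts data that is already loaded.
        self.model_data = model_data
        self.predictor_data = predictor_data
        self.plot = Plots(self)
        self.generate = Reports(self)

    @classmethod
    def from_folder(cls, folder, model_pattern="model*.json",
                    predictor_pattern="predictor*.json"):
        """Alternative constructor: load the most recently modified matches."""

        def _latest(pattern):
            files = sorted(Path(folder).glob(pattern), key=lambda p: p.stat().st_mtime)
            if not files:
                raise FileNotFoundError(f"no file matching {pattern!r} in {folder}")
            return json.loads(files[-1].read_text())

        return cls(_latest(model_pattern), _latest(predictor_pattern))
```

With this shape, `Datamart(models, predictors)` covers the "data already read in" case, while `Datamart.from_folder("exports/")` handles file discovery; calling `dm.generate.health_check()` would emit the missing-dependency warning once, on first use only.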
Todo before release:
- Update Pega Academy article https://academy.pega.com/topic/data-scientist-tools-customer-decision-hub/v1
- Further improve test coverage
- Complete missing docstrings
- Perform further internal testing
- Ensure all linked issues are fixed
- Improve the handling of optional dependencies that are currently still imported when the library itself is imported