Pdstools V4 beta 1
Pre-release
V4 brings some pretty major (and necessary) changes. A lot of them are, unfortunately, breaking - but it's for the best. pdstools is now much easier to maintain and keep consistent, and new functionality now has a much more logical place to go.
The goal is for the initial V4 release to contain most of the breaking (API-centric) changes we foresee for a long time. After that, we can of course still change the inner functionality and/or add new functions, but hopefully the most important function schemas/API won't need more changes anytime soon.
✨Highlights
- Farewell R - you've served us well, but pdstools is now Python only
- Introducing the Pega DX API Client
- Starting out with support for the 24.2 Prediction Studio and Knowledge Buddy APIs
- Major refactor of the entire codebase: consistent Python naming, optional dependency groups, well-defined type hints
❌Deprecations/removals
- The R version of pdstools has been removed. In case you still want to use the R tools, you should manually clone the repo at the V3.x tag.
- The legacy IH utilities have been dropped. These were old parts of the codebase and untested/unused. New IH utilities are on their way!
- The Wiki documentation has been ported to the (tracked) Python documentation. We'll deprecate the wiki but keep it live for a while, so that external links have time to be updated to point to the documentation instead.
🔨Changes
- Consistent pythonic casing, meaning `PascalCase` for classes & `snake_case` for methods, variables & arguments
- Much improved typehints, so it's much more obvious what the response of a given function will be
- Fewer 'base' dependencies; different functionality is split up into 'namespaces' that all have their own set of requirements
- The first time you invoke a method in a 'namespace', it verifies the dependencies and gives a clear warning if any are missing
- To expand on the previous point: functionality is split up much more logically. Taking the ADMDatamart class as an example:
  - Plotting functionality is part of `ADMDatamart.plot.bubble_chart()` (or any other plot of course)
  - The health check and other reports are part of `ADMDatamart.generate.health_check()` (for instance)
  - The intermediate aggregations needed are part of `ADMDatamart.aggregations.pivot()` (for instance)
- Using `classmethod`s, we can initialize the ADMDatamart class in particular in a much more flexible way.
  - The main `__init__` method of the ADMDatamart class is very simple: it expects two `polars.LazyFrame`s; one for `model_data` and one for `prediction_data`. If you've already read in your data, simply use this.
  - If, instead, you want to use the previous functionality which automatically found the most recent file in a folder, you should initialize the datamart class like `ADMDatamart.from_ds_export()`.
  - Or if, instead, you are consuming the results of a data flow (including the OOTB Prediction Studio export), you can simply initialize the datamart class like `ADMDatamart.from_dataflow_export(model_data="pattern_for_model_files*.json", predictor_data="pattern_for_predictor_files*.json")`. We can also cache the files we've read in before by writing to a 'cache' file automatically - this makes things move quickly. This closes #205 as well.
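
To make the namespace and `classmethod` structure above a bit more concrete, here is a minimal, self-contained sketch of the general pattern. It is not the pdstools implementation: `ToyDatamart`, `Plots`, `Aggregations`, `from_patterns` and the file names are hypothetical, and plain lists of dicts stand in for the `polars.LazyFrame`s so the example runs with the standard library only.

```python
import glob
import json
import os
import tempfile


class Plots:
    """Plotting namespace, reached as `datamart.plot` (hypothetical)."""

    def __init__(self, datamart):
        self.datamart = datamart

    def bubble_chart(self):
        # A real implementation would return a figure; a string keeps this runnable.
        return f"bubble chart of {len(self.datamart.model_data)} models"


class Aggregations:
    """Aggregation namespace, reached as `datamart.aggregations` (hypothetical)."""

    def __init__(self, datamart):
        self.datamart = datamart

    def pivot(self):
        # Count the predictor rows per model ID.
        counts = {}
        for row in self.datamart.predictor_data:
            counts[row["ModelID"]] = counts.get(row["ModelID"], 0) + 1
        return counts


class ToyDatamart:
    def __init__(self, model_data, predictor_data):
        # The plain constructor only accepts data that is already loaded.
        self.model_data = model_data
        self.predictor_data = predictor_data
        # Each group of functionality lives in its own namespace object.
        self.plot = Plots(self)
        self.aggregations = Aggregations(self)

    @staticmethod
    def _read_latest(pattern):
        # Pick the most recently modified file matching the glob pattern
        # and parse it as newline-delimited JSON.
        latest = max(glob.glob(pattern), key=os.path.getmtime)
        with open(latest) as f:
            return [json.loads(line) for line in f if line.strip()]

    @classmethod
    def from_patterns(cls, model_data, predictor_data):
        """Alternative constructor: find and read the files, then delegate to __init__."""
        return cls(cls._read_latest(model_data), cls._read_latest(predictor_data))


# Usage: write two tiny export files, then build the datamart from glob patterns.
with tempfile.TemporaryDirectory() as folder:
    files = {
        "models_1.json": [{"ModelID": "a"}, {"ModelID": "b"}],
        "predictors_1.json": [{"ModelID": "a"}, {"ModelID": "a"}, {"ModelID": "b"}],
    }
    for name, rows in files.items():
        with open(os.path.join(folder, name), "w") as f:
            f.write("\n".join(json.dumps(r) for r in rows))
    dm = ToyDatamart.from_patterns(
        model_data=os.path.join(folder, "models_*.json"),
        predictor_data=os.path.join(folder, "predictors_*.json"),
    )

print(dm.plot.bubble_chart())
print(dm.aggregations.pivot())
```

Keeping `__init__` limited to already-loaded data means all file discovery and parsing lives in the alternative constructors, so each input source can get its own entry point without complicating the main class.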
Todo before release:
- Update Pega Academy article https://academy.pega.com/topic/data-scientist-tools-customer-decision-hub/v1
- Further improve test coverage
- Complete missing docstrings
- Perform further internal testing
- Ensure all linked issues are fixed
- Improve some of the optional imports that are imported on library import
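
The per-namespace dependency check described under Changes (verify the requirements the first time a method in a namespace is invoked, and warn clearly if any are missing) is one way to keep optional packages out of the library import. Below is a minimal sketch of that general pattern, not pdstools' actual code: `Namespace`, `Reports` and the missing package name are hypothetical, and only top-level module names are checked.

```python
import importlib.util
import warnings


class Namespace:
    """A group of methods sharing one set of optional requirements (hypothetical)."""

    requirements = ()  # top-level module names this namespace needs

    def __init__(self):
        self._checked = False

    def missing_dependencies(self):
        # find_spec returns None when a top-level module cannot be found,
        # and it does not import the module itself.
        return [m for m in self.requirements if importlib.util.find_spec(m) is None]

    def check_dependencies(self):
        # Run the check only once, on the first method call in this namespace.
        if self._checked:
            return
        self._checked = True
        missing = self.missing_dependencies()
        if missing:
            warnings.warn(
                f"{type(self).__name__} is missing optional packages: "
                + ", ".join(missing)
            )


class Reports(Namespace):
    # "json" ships with Python; the second name is deliberately not installed.
    requirements = ("json", "hypothetical_missing_package_xyz")

    def health_check(self):
        self.check_dependencies()
        return "report"


# Usage: the warning is raised on the first call only.
reports = Reports()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    reports.health_check()  # first call: the check runs and warns
    reports.health_check()  # second call: the check is skipped

print(len(caught))
print(caught[0].message)
```

Because nothing is checked or imported until a namespace method is called, importing the library itself stays cheap even when optional packages are absent.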
Full Changelog: V4.0.0-alpha.1...V4.0.0-beta.1