Release Pdstools V4 beta 1 · pegasystems/pega-datascientist-tools

V4 brings some major (and necessary) changes. Unfortunately, many of them are breaking, but it's for the best: pdstools is now much easier to maintain and keep consistent, and new functionality has a much more logical place to go.

The goal is for the initial V4 release to contain most of the breaking (API-centric) changes we foresee for a long time. After that, we can of course still change the internals and/or add new functions, but hopefully the most important function signatures and APIs won't need to change again anytime soon.

✨Highlights

Farewell R - you've served us well, but pdstools is now Python only
Introducing the Pega DX API Client
- Starting out with support for the 24.2 Prediction Studio and Knowledge Buddy APIs
Major refactor of the entire codebase: consistent Python naming, optional dependency groups, well-defined type hints

❌Deprecations/removals

The R version of pdstools has been removed. In case you still want to use the R tools, you should manually clone the repo at the V3.x tag.
The legacy IH utilities have been dropped. These were old, untested and unused parts of the codebase. New IH utilities are on their way!
The Wiki documentation has been ported to the (tracked) Python documentation. We'll deprecate the wiki, but keep it live for a while so that external links have time to be updated to point to the documentation instead.

🔨Changes

Consistent Pythonic casing, meaning PascalCase for classes & snake_case for methods, variables & arguments
Much improved type hints, so it's much more obvious what a given function will return
Fewer 'base' dependencies; different functionality is split up into 'namespaces' that all have their own set of requirements
- The first time you invoke a method in a 'namespace', it verifies the dependencies and gives a clear warning if any are missing
To expand on the previous point: functionality is split up much more logically. Taking the ADMDatamart class as an example:
- Plotting functionality is part of ADMDatamart.plot.bubble_chart() (or any other plot of course)
- The health check and other reports are part of ADMDatamart.generate.health_check() (for instance)
- The intermediate aggregations needed are part of ADMDatamart.aggregations.pivot() (for instance)
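To make the namespace idea concrete, here is a minimal sketch of how a class can expose grouped methods that check their own optional dependencies on first use. It is not the actual pdstools implementation: Datamart, LazyNamespace, Plots, Reports and the package name hypothetical_report_engine are all made up for illustration, and raising an ImportError (rather than emitting a warning) is an assumption.

```python
import importlib.util


class LazyNamespace:
    """A group of related methods that shares one set of optional dependencies."""

    dependencies = ()  # top-level module names this namespace needs

    def __init__(self, datamart):
        self.datamart = datamart
        self._verified = False

    def _verify_dependencies(self):
        """Check the optional dependencies once, on first use of the namespace."""
        if self._verified:
            return
        missing = [
            name
            for name in self.dependencies
            if importlib.util.find_spec(name) is None
        ]
        if missing:
            raise ImportError(
                f"{type(self).__name__} needs these missing packages: "
                + ", ".join(missing)
            )
        self._verified = True


class Plots(LazyNamespace):
    dependencies = ("json",)  # stdlib stand-in for a real plotting library

    def bubble_chart(self):
        self._verify_dependencies()
        return f"bubble chart of {len(self.datamart.model_data)} models"


class Reports(LazyNamespace):
    dependencies = ("hypothetical_report_engine",)  # deliberately not installed

    def health_check(self):
        self._verify_dependencies()
        return "health check report"


class Datamart:
    """Toy stand-in for ADMDatamart; only the namespace layout is the point."""

    def __init__(self, model_data):
        self.model_data = model_data
        self.plot = Plots(self)
        self.generate = Reports(self)


datamart = Datamart(model_data=[{"name": "A"}, {"name": "B"}])
print(datamart.plot.bubble_chart())

try:
    datamart.generate.health_check()
except ImportError as error:
    print(error)
```

Because the check only runs inside the namespace methods, creating the Datamart never touches the optional packages; only calling a method in a namespace with missing requirements fails, and the message names what is missing.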
Using classmethods, we can initialize the ADMDatamart class in particular in a much more flexible way.
- The main __init__ method of the ADMDatamart class is very simple: it expects two polars.LazyFrames; one for model_data and one for predictor_data. If you've already read in your data, simply use this.
- If, instead, you want the previous behaviour of automatically finding the most recent file in a folder, you should initialize the datamart class with ADMDatamart.from_ds_export()
- Or, if you are consuming the results of a data flow (including the OOTB Prediction Studio export), you can simply initialize the datamart class with ADMDatamart.from_dataflow_export(model_data="pattern_for_model_files*.json", predictor_data="pattern_for_predictor_files*.json"). Files that have been read in before can also be cached automatically by writing them to a 'cache' file, which makes subsequent reads faster. This closes #205 as well.
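The classmethod pattern described above can be sketched with the standard library alone. This is an illustration, not the pdstools code: the Datamart class, the from_latest_files and from_file_patterns names, and the use of plain lists instead of polars.LazyFrames are all assumptions made to keep the example self-contained.

```python
import json
import os
import tempfile
from pathlib import Path


def read_json_lines(path):
    """Read one newline-delimited JSON file into a list of dicts."""
    with open(path) as handle:
        return [json.loads(line) for line in handle if line.strip()]


class Datamart:
    """Toy stand-in for ADMDatamart, with plain lists instead of LazyFrames."""

    def __init__(self, model_data, predictor_data):
        # The plain constructor only stores data that is already loaded.
        self.model_data = model_data
        self.predictor_data = predictor_data

    @classmethod
    def from_latest_files(cls, folder, model_glob, predictor_glob):
        """Load the most recently modified file matching each pattern."""

        def latest(pattern):
            matches = list(Path(folder).glob(pattern))
            if not matches:
                raise FileNotFoundError(f"no files match {pattern!r} in {folder}")
            return max(matches, key=lambda path: path.stat().st_mtime)

        return cls(
            read_json_lines(latest(model_glob)),
            read_json_lines(latest(predictor_glob)),
        )

    @classmethod
    def from_file_patterns(cls, folder, model_glob, predictor_glob):
        """Concatenate every file matching each pattern, in name order."""

        def read_all(pattern):
            rows = []
            for path in sorted(Path(folder).glob(pattern)):
                rows.extend(read_json_lines(path))
            return rows

        return cls(read_all(model_glob), read_all(predictor_glob))


# Demo with two small files per kind, written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for index in (1, 2):
        for kind in ("models", "predictors"):
            path = Path(folder) / f"{kind}_{index}.json"
            path.write_text(json.dumps({kind: index}) + "\n")
            os.utime(path, (index, index))  # explicit mtimes: file 2 is newest

    latest = Datamart.from_latest_files(folder, "models_*.json", "predictors_*.json")
    combined = Datamart.from_file_patterns(folder, "models_*.json", "predictors_*.json")

print(latest.model_data)    # [{'models': 2}]
print(combined.model_data)  # [{'models': 1}, {'models': 2}]
```

Keeping __init__ limited to already-loaded data means every file-finding strategy lives in its own named constructor, so adding a new source does not change the signature that existing callers rely on.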

Todo before release:

Update Pega Academy article https://academy.pega.com/topic/data-scientist-tools-customer-decision-hub/v1
Further improve test coverage
Complete missing docstrings
Perform further internal testing
Ensure all linked issues are fixed
Improve the handling of optional dependencies that are still imported when the library itself is imported

Full Changelog: V4.0.0-alpha.1...V4.0.0-beta.1
