Releases: NVIDIA-Merlin/core
v23.08.00
v23.05.00
What’s Changed
⚠ Breaking Changes
- Adjust the `DaskExecutor` API methods to take `Dataset`s instead of ddfs @karlhigley (#299)
🐜 Bug Fixes
- Add some additional mutually exclusive tags to the collisions list @karlhigley (#316)
- Fix Pandas extension dtype mapping for newer versions of Pandas @karlhigley (#314)
- Provide better string alias support for dtypes, allow external types to resolve to unknown, fix cuDF struct support @karlhigley (#313)
- Make `ColumnSelector.all` a property instead of a manually set attribute @karlhigley (#296)
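The collisions list mentioned in #316 guards against giving one column two tags that cannot both apply. The sketch below shows the general idea in plain Python. The `COLLISIONS` pairs and the `check_tag_collisions` helper are hypothetical illustrations, not Merlin's actual list or API.

```python
from itertools import combinations

# Hypothetical pairs of tags that should never appear on the same column.
# These are illustrative only; Merlin maintains its own list.
COLLISIONS = [
    {"categorical", "continuous"},
    {"user", "item"},
]


def check_tag_collisions(tags):
    """Raise ValueError if any two string tags in `tags` are mutually exclusive."""
    tag_set = {str(t).lower() for t in tags}
    for a, b in combinations(sorted(tag_set), 2):
        if {a, b} in COLLISIONS:
            raise ValueError(f"Tags {a!r} and {b!r} are mutually exclusive")
    return tag_set
```

Adding a new mutually exclusive pair in a design like this is a one-line change to the list, which is roughly what "adding tags to the collisions list" amounts to.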
🚀 Features
- Add `Rename` to the set of core DAG ops that work in all DAGs @karlhigley (#312)
- Enable Compound Tag Selection and Removal to work with atomic tags and strings @oliverholworthy (#317)
- Add optional `schema` parameter to `from_df` method on `TensorTable` @oliverholworthy (#286)
- Add `as_tensor_type` method to `TensorTable` for framework column conversion @oliverholworthy (#285)
- Add support for cuDF's struct dtype @karlhigley (#309)
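#317 makes compound tags usable in selection and removal alongside atomic tags and plain strings. Here is a rough sketch of one plausible matching rule, assuming a compound tag such as `user_id` expands to its atomic parts and a column matches only when it carries all of them. The `COMPOUND_TAGS` table and this `select_by_tag` function are hypothetical stand-ins, not `merlin.schema` code.

```python
# Hypothetical expansion table from compound tags to atomic tags.
COMPOUND_TAGS = {
    "user_id": {"user", "id"},
    "item_id": {"item", "id"},
}


def normalize(tag):
    """Expand a tag (enum-like value or string) into a set of atomic tags."""
    name = str(getattr(tag, "value", tag)).lower()
    return COMPOUND_TAGS.get(name, {name})


def select_by_tag(columns, tag):
    """Return names of columns whose tags include every atomic part of `tag`."""
    wanted = normalize(tag)
    selected = []
    for name, tags in columns.items():
        atomic = set()
        for t in tags:
            atomic |= normalize(t)
        if wanted <= atomic:
            selected.append(name)
    return selected


columns = {
    "user_id": ["user", "id", "categorical"],
    "item_id": ["item_id"],  # stored as a compound tag
    "age": ["user", "continuous"],
}
```

Normalizing both the query and the stored tags to atomic sets means a compound tag, its atomic parts, and their string forms all compare the same way.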
📄 Documentation
- Skip errors if branch tracking fails in `docs-sched-rebuild` @oliverholworthy (#327)
- Pin numpy version for docs build to ensure we can build the API docs for recent versions @oliverholworthy (#326)
- Create stable branch locally in `docs-sched-rebuild` to enable stable docs build @oliverholworthy (#325)
- Build docs for stable branch and make default @oliverholworthy (#322)
🔧 Maintenance
- remove cupy-cuda11x from tox test environment @nv-alaiacano (#323)
- Handle schema inference in Dataset with empty list col @oliverholworthy (#319)
- Convert data formats before executing each op in `LocalExecutor` @karlhigley (#280)
- Add problem matcher for actionlint to annotate errors @oliverholworthy (#315)
- Add `actionlint` to pre-commit-config to check for valid GitHub Workflow config @oliverholworthy (#290)
- Remove optional dependencies from Conda Recipe @oliverholworthy (#298)
- Remove warning about compound tags deprecation @oliverholworthy (#256)
- Add workflows to check base branch and set stable branch @oliverholworthy (#310)
- Update tag pattern in GitHub Workflows @oliverholworthy (#311)
- Skip package release jobs for dev tags @oliverholworthy (#305)
- Remove use of deprecated numpy aliases of builtin types @oliverholworthy (#308)
- don't re-run tests on closed PR @nv-alaiacano (#307)
- Revert "Adjust the `DaskExecutor` API methods to take `Dataset`s inst… @karlhigley (#306)
- Update packages workflow, separating PyPI from conda build @oliverholworthy (#300)
- Move build-docs to separate job in packages workflow with Python 3.9 @oliverholworthy (#302)
- Add Workflow to update the stable branch ref to the latest tag @oliverholworthy (#303)
- Adjust the `DaskExecutor` API methods to take `Dataset`s instead of ddfs @karlhigley (#299)
- CI: add quotes to workflow name @nv-alaiacano (#295)
v23.04.00
What’s Changed
⚠️ Breaking Changes
- Preserve original Dask partitions by default in `Dataset.to_parquet` @rjzamora (#254)
- Change the location and filename of schema.pbtxt to .merlin/schema.json @edknv (#249)
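With #249 the schema file moves from `schema.pbtxt` to `.merlin/schema.json`, and #267 (under Maintenance) keeps saving the old location for backwards compatibility. A reader that must handle datasets written before and after the change might look for the new file first and fall back to the old one. This is a hypothetical sketch of that lookup: `find_schema_file` is not a Merlin function, and it assumes `schema.pbtxt` sits at the top level of the dataset directory.

```python
from pathlib import Path

NEW_SCHEMA = Path(".merlin") / "schema.json"  # location introduced in #249
OLD_SCHEMA = Path("schema.pbtxt")  # assumed legacy location (top level)


def find_schema_file(dataset_dir):
    """Return the preferred schema path in `dataset_dir`, or None if absent."""
    root = Path(dataset_dir)
    # Prefer the new location; fall back to the legacy file for older datasets.
    for candidate in (root / NEW_SCHEMA, root / OLD_SCHEMA):
        if candidate.is_file():
            return candidate
    return None
```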
🐜 Bug Fixes
- Return a dataframe type that matches `reader` passed to `fetch_table_data` @oliverholworthy (#287)
- add hack to handle tf not recognizing bool dtype in dlpack @jperez999 (#276)
- update numpy version to handle dlpack @jperez999 (#275)
- fix cuda import logic from numba and device memsize @jperez999 (#274)
- change cpu conversion for tf to convert-to-tensor @jperez999 (#271)
- fix gpu numpy conversion offsets @jperez999 (#269)
- Disable strict dtype checking by default @karlhigley (#268)
- Propagate `_unsafe` flag through column constructors properly @karlhigley (#264)
- Propagate the `_unsafe` mode flag from `TensorTable` to `TensorColumn` @karlhigley (#260)
- add import pytest to file @jperez999 (#229)
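Two fixes here (#260, #264) concern passing the internal `_unsafe` flag from `TensorTable` down to its `TensorColumn`s. Assuming, for illustration, that unsafe mode means skipping constructor-time validation, the toy classes below model only that propagation. `Column` and `Table` are hypothetical stand-ins, not the real `merlin.table` constructors.

```python
class Column:
    """Toy column that validates its values unless constructed unsafely."""

    def __init__(self, values, dtype, _unsafe=False):
        if not _unsafe:
            self._validate(values, dtype)
        self.values = values
        self.dtype = dtype

    @staticmethod
    def _validate(values, dtype):
        if not all(isinstance(v, dtype) for v in values):
            raise TypeError(f"all values must be {dtype.__name__}")


class Table:
    """Toy table that forwards its `_unsafe` flag to every column it builds."""

    def __init__(self, data, dtypes, _unsafe=False):
        self.columns = {
            name: Column(values, dtypes[name], _unsafe=_unsafe)
            for name, values in data.items()
        }
```

In a sketch like this, forgetting to forward the flag in any one constructor would make validation run again at the column level.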
🚀 Features
- Add `column_type` property to `TensorTable` @karlhigley (#283)
- Extend mapping of nullable types for pandas @oliverholworthy (#278)
- add 3d tensor support to creating tensor columns @jperez999 (#246)
- Run with import without gpu @jperez999 (#261)
- Check environment supports target device in Dataset constructor @oliverholworthy (#243)
- Support `Dataset` cpu-mode in environment with GPUs that have not been detected @oliverholworthy (#236)
- Allow casting a `Dimension` to an integer when min and max are the same @karlhigley (#252)
- Add predicate function argument to `select_by_tag` @oliverholworthy (#94)
- Add row_group_size argument to Dataset.to_parquet @rjzamora (#218)
- Enable Schema selection using `select_by_tag` with string representation of `Tags` enum @oliverholworthy (#242)
- Add Schema `copy` method @oliverholworthy (#240)
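#94 adds a predicate argument to `select_by_tag`, and #242 lets tags be given by their string form. The sketch below imitates that behaviour with a small `Tags` enum and a hypothetical `select_by_tag(schema, tags, pred=any)` function. The real `Schema.select_by_tag` signature and enum members may differ, so treat these names as assumptions.

```python
from enum import Enum


class Tags(Enum):
    # Illustrative subset; the real Tags enum has more members.
    CATEGORICAL = "categorical"
    CONTINUOUS = "continuous"
    USER = "user"
    ITEM = "item"


def to_tag(tag):
    """Accept either a Tags member or its string value."""
    return tag if isinstance(tag, Tags) else Tags(str(tag).lower())


def select_by_tag(schema, tags, pred=any):
    """Return column names whose tags satisfy `pred` over the requested tags.

    `schema` maps column name -> set of Tags. With pred=any a column needs at
    least one requested tag; with pred=all it needs every requested tag.
    """
    if isinstance(tags, (str, Tags)):
        tags = [tags]
    wanted = [to_tag(t) for t in tags]
    return [
        name
        for name, col_tags in schema.items()
        if pred(t in col_tags for t in wanted)
    ]


schema = {
    "user_id": {Tags.USER, Tags.CATEGORICAL},
    "item_id": {Tags.ITEM, Tags.CATEGORICAL},
    "user_age": {Tags.USER, Tags.CONTINUOUS},
}
```

Passing the predicate as a plain callable keeps the "any" and "all" cases in one code path instead of two separate methods.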
🔧 Maintenance
- Update `pull_apart_list` to use `pd.concat` instead of deprecated `Series.append` @oliverholworthy (#291)
- Install protobuf version compatible with tensorflow 2.9 for Merlin Models tests @oliverholworthy (#289)
- Add support for from_dlpack with numpy 1.23.0 @oliverholworthy (#284)
- Save schema in old location for backwards compatibility @oliverholworthy (#267)
- Refactor `LocalExecutor` into more discrete steps that can be overridden @karlhigley (#279)
- Preserve type of shape dims as ints when re-loading schema from disk @oliverholworthy (#281)
- uses compat everywhere to allow container bypass when gpus not present @jperez999 (#277)
- update numpy version to handle dlpack @jperez999 (#275)
- fix cuda import logic from numba and device memsize @jperez999 (#274)
- migrate compat into a separate folder and separate tf and torch import @jperez999 (#272)
- change cpu conversion for tf to convert-to-tensor @jperez999 (#271)
- compat imports update @jperez999 (#270)
- fix gpu numpy conversion offsets @jperez999 (#269)
- fix configure tf function to id all gpus available @jperez999 (#266)
- migrate configure tensorflow to core, separate has_gpu from compat @jperez999 (#265)
- add 3d tensor support to creating tensor columns @jperez999 (#246)
- Revert #261 and #262 (`merlin.core.compat` changes) @karlhigley (#263)
- Run with import without gpu @jperez999 (#261)
- Update `merlin.core.compat` to use `HAS_GPU` and add add'l libraries @karlhigley (#262)
- Rework DLpack conversion dispatching to allow caching dispatched methods @karlhigley (#259)
- Add an `unsafe` mode to `TensorTable`/`TensorColumn` (for internal use) @karlhigley (#258)
- Make `TensorColumn` shape and dtype properties lazy but memoized @karlhigley (#257)
- Bump `dask`, `distributed`, `fsspec` versions @karlhigley (#201)
- Move common steps to run tox env into reusable workflow @oliverholworthy (#247)
- Improve check for array types in `is_list_dtype` @oliverholworthy (#253)
- Support cupy and numpy array types in `flatten_list_column_values` @oliverholworthy (#251)
- Update `is_list_dtype` to handle additional types @oliverholworthy (#250)
- Remove use of HAS_GPU from `dispatch` functions @oliverholworthy (#244)
- Change the location and filename of schema.pbtxt to .merlin/schema.json @edknv (#249)
- Add workflow for testing dataloader @oliverholworthy (#186)
- add import pytest to file @jperez999 (#229)
- Add correct job dependency for release in `cpu-packages` @oliverholworthy (#241)
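#257 makes the `shape` and `dtype` properties of `TensorColumn` lazy but memoized: they are computed on first access and then reused. In plain Python that pattern maps naturally onto `functools.cached_property`. The class below is a hypothetical illustration of the pattern, not the actual `TensorColumn` implementation, which may cache differently.

```python
from functools import cached_property


class LazyColumn:
    """Toy column whose shape is derived from its values only when needed."""

    def __init__(self, values):
        self.values = values
        self.shape_computations = 0  # counts how often the shape was derived

    @cached_property
    def shape(self):
        self.shape_computations += 1
        rows = len(self.values)
        if rows and all(isinstance(v, list) for v in self.values):
            lengths = {len(v) for v in self.values}
            # Equal-length lists give a fixed inner dim; ragged lists give None.
            return (rows, lengths.pop() if len(lengths) == 1 else None)
        return (rows,)
```

Constructing a column costs nothing extra here; the scan over the values happens once, on the first `.shape` access, and later accesses read the cached result.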
v23.02.01
What's Changed
Patch release on top of v23.02.00
🔧 Maintenance
- Add pynvml dependency @oliverholworthy (#237)
Full Changelog: v23.02.00...v23.02.01
v23.02.00
What’s Changed
⚠ Breaking Changes
- Remove use of `is_list`/`is_ragged` and replace with setting shapes @karlhigley (#215)
- Add a new `shape` field to `ColumnSchema` @karlhigley (#195)
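#195 adds a `shape` field to `ColumnSchema`, and #215 replaces direct use of `is_list`/`is_ragged` with setting shapes. The sketch below shows one way such flags can be derived from a shape, assuming each dimension is a (min, max) pair with `max=None` meaning unbounded. `Dim` and `Shape` are hypothetical classes, not the Merlin types; the `__int__` method mirrors the later ability to cast a fixed-size dimension to an integer (#252).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dim:
    """A dimension with bounds; max=None means unbounded."""
    min: int = 0
    max: Optional[int] = None

    @property
    def is_fixed(self):
        return self.max is not None and self.min == self.max

    def __int__(self):
        if not self.is_fixed:
            raise ValueError("only a fixed-size dimension can be cast to int")
        return self.min


@dataclass(frozen=True)
class Shape:
    dims: Tuple[Dim, ...]

    @property
    def is_list(self):
        # Anything beyond the batch (row) dimension makes the column a list.
        return len(self.dims) > 1

    @property
    def is_ragged(self):
        # A list is ragged if any inner dimension has variable length.
        return self.is_list and not all(d.is_fixed for d in self.dims[1:])
```

Deriving the flags from the shape means they can no longer disagree with it, which is the main argument for storing the shape instead of the flags.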
🐜 Bug Fixes
- Save schema with consistent dtype when `dtypes` is used @oliverholworthy (#182)
🚀 Features
- Update HAS_GPU variable to account for `CUDA_VISIBLE_DEVICES` @oliverholworthy (#221)
- Clean up of make_df function @jperez999 (#205)
- separate cupy import from rapids @jperez999 (#211)
- Support partially specified value_count when used with `is_ragged=False` @oliverholworthy (#213)
- Fix for updated versions of cudf to parquet @jperez999 (#204)
- Create standard Merlin dtypes in the `merlin.dtypes` module @karlhigley (#170)
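#221 makes `HAS_GPU` account for `CUDA_VISIBLE_DEVICES`. With the CUDA runtime, setting that variable to an empty string hides every device, and an invalid index such as `-1` hides that entry and everything after it. A rough, hypothetical sketch of such a visibility check follows; `visible_gpu_count` and `has_gpu` are not Merlin's code, and UUID-style device identifiers are deliberately not handled.

```python
import os


def visible_gpu_count(physical_count, env=None):
    """Estimate how many GPUs CUDA exposes given CUDA_VISIBLE_DEVICES.

    `physical_count` is the number of devices the driver reports (for example
    via NVML). Only integer device indices are handled in this sketch.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return physical_count  # variable unset: every device is visible
    visible = 0
    for item in value.split(","):
        try:
            index = int(item)
        except ValueError:
            break  # empty or non-integer entry ends enumeration
        if not 0 <= index < physical_count:
            break  # invalid index: it and everything after it are hidden
        visible += 1
    return visible


def has_gpu(physical_count, env=None):
    return visible_gpu_count(physical_count, env) > 0
```

Taking `env` as a parameter keeps the check testable without mutating the real process environment.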
🔧 Maintenance
- Remove use of `is_list`/`is_ragged` and replace with setting shapes @karlhigley (#215)
- Reduce the overhead of using `LocalExecutor` (esp. dtype validation) @karlhigley (#219)
- Clean up of make_df function @jperez999 (#205)
- Add util functions for un/grouping column values/offsets in dicts @karlhigley (#216)
- Fill in some missing docstrings @karlhigley (#217)
- Serialize shapes to and from Merlin schema files @karlhigley (#214)
- Fix for updated versions of cudf to parquet @jperez999 (#204)
- add gcp label to jenkinsfile @AyodeAwe (#181)
- Add a new `shape` field to `ColumnSchema` @karlhigley (#195)
- Increase upper bound of `pandas` version from 1.4 to 1.6 @oliverholworthy (#210)
- Update pre-commit config with latest versions of repos @oliverholworthy (#208)
- Install latest version of NVTabular/dataloader with systems tests @oliverholworthy (#209)
- Add note on why we're using `device_get_count` instead of `cuda.gpus` @oliverholworthy (#207)
- Add Formatter (Prettier) for YAML and Markdown files @karlhigley (#199)
- Change the name of the package building action @karlhigley (#198)
- Split CPU tests and building packages for release into separate actions @karlhigley (#197)
- Simplify `ColumnSchema.with` methods using `dataclasses.replace()` @karlhigley (#194)
- Handle executor transform case when parent node provides no new columns @oliverholworthy (#226)
- Update Models/NVTabular test config @oliverholworthy (#185)
- skip notebook tests in models test @edknv (#193)
- add a build pandas column api for easier multihot column creation @jperez999 (#183)
- Use pre-commit for linting in GitHub Actions Workflow @oliverholworthy (#184)
- Convert to cudf.Series in create_multihot_col @oliverholworthy (#187)
- adding workflow for GPU CI on gha @jperez999 (#191)
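#194 simplifies the `ColumnSchema.with_*` methods using `dataclasses.replace()`, which builds a copy of a dataclass instance with selected fields changed. Below is a minimal sketch assuming a frozen dataclass with only a few fields. `ColumnSchemaSketch` is hypothetical; the real `ColumnSchema` has more fields, and its method names and merge behaviour may differ.

```python
from dataclasses import dataclass, field, replace
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ColumnSchemaSketch:
    """Hypothetical, trimmed-down stand-in for a column schema."""
    name: str
    tags: FrozenSet[str] = field(default_factory=frozenset)
    dtype: Optional[str] = None

    def with_name(self, name):
        return replace(self, name=name)

    def with_tags(self, tags):
        # In this sketch, new tags are merged with existing ones.
        return replace(self, tags=self.tags | frozenset(tags))

    def with_dtype(self, dtype):
        return replace(self, dtype=dtype)
```

Because the instance is frozen, each call returns a new object and the original is left untouched, and `replace` carries over every field that is not named without listing them by hand.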
v0.10.0 (22.12)
What’s Changed
🐜 Bug Fixes
- Fix file-count warning in Dataset.to_parquet @rjzamora (#159)
- Remove the `@property` annotation from Transformable.columns @karlhigley (#166)
- Update value_count serialization/deserialization to be consistent with original schema @oliverholworthy (#111)
- Fix feature.shape attribute in from_merlin_schema @rjzamora (#169)
- Add the schema to the output of the .repartition() method @sararb (#192)
🚀 Features
- Read parquet statistics to optimize len when they are missing @rjzamora (#178)
- Change is_ragged property based on value_count in with_properties @oliverholworthy (#172)
- add is_list detection for merlin columns @jperez999 (#180)
- Enable partial value count to be specified @oliverholworthy (#171)
📄 Documentation
- docs: Add temp semver to calver banner @mikemckiernan (#161)
🔧 Maintenance
- Remove specifying is_ragged in LocalExecutor _transform_data @oliverholworthy (#173)
- Add Jenkinsfile @AyodeAwe (#167)
- Fix concat_columns for DataFrames with list features @oliverholworthy (#165)
- update drafter to work on tags & update cpu ci to target branches @jperez999 (#174)
- Remove explicit DictArray reference from merlin.core.dispatch @karlhigley (#163)
v0.9.0
What’s Changed
🐜 Bug Fixes
- Update `with_properties` to enable changing existing properties on `ColumnSchema` @oliverholworthy (#157)
- Patch `is_list_dtype`/`list_val_dtype` to work with Numpy ndarrays @karlhigley (#153)
- Fix dtype inference from pandas list column @rjzamora (#154)
🚀 Features
- necessary changes to allow graph execution in dataloader @jperez999 (#152)
📄 Documentation
- docs: Add basic SEO configuration @mikemckiernan (#160)
🔧 Maintenance
- Rework executor `transform` methods to accept a `Graph` @karlhigley (#158)
- Update `with_properties` to enable changing existing properties on `ColumnSchema` @oliverholworthy (#157)
- Serialize/Deserialize `ColumnSchema` consistently when the domain name matches the feature name @oliverholworthy (#155)
v0.8.0
What’s Changed
🚀 Features
- Add wildcard selector for cases where you'd like to select all columns by @karlhigley in #143
🐜 Bug Fixes
- Avoid using numba to set device context in import by @rjzamora in #145
- Fix ambiguous statement when names is a list by @jperez999 in #147
- Resolve wildcard selectors in `BaseOperator.compute_selector()` by @karlhigley in #146
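v0.8.0 introduces a wildcard selector for selecting all columns (#143) and resolves it in `BaseOperator.compute_selector()` (#146). Resolution here means replacing the wildcard with the concrete column names available at that point in the graph. The `"*"` marker and the `resolve_selector` helper below are hypothetical illustrations of that step, not the `merlin.dag` API.

```python
WILDCARD = "*"  # hypothetical marker meaning "all columns"


def resolve_selector(selector, available_columns):
    """Expand a wildcard into concrete names and check explicit names exist."""
    if WILDCARD in selector:
        return list(available_columns)
    missing = [name for name in selector if name not in available_columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return list(selector)
```

Resolving the wildcard at selector-computation time means downstream code only ever sees concrete column names.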
🔧 Maintenance
- Break `LocalExecutor.transform()` down into smaller methods by @karlhigley in #140
- Add and apply `DictArray` wrapper class and corresponding `Protocol` definitions by @karlhigley in #141
- Specify minimum Python version as 3.8 in `setup.py` by @oliverholworthy in #151
- Add a `validate_schemas` hook to clean up downstream validation code by @karlhigley in #76
- Add XGBoost to `merlin-models` CPU tests by @karlhigley in #131
v0.7.0
What’s Changed
🔧 Maintenance
- Switch downstream repo tests from `build` to `check` (to make optional) @karlhigley (#137)
v0.6.0
What’s Changed
⚠ Breaking Changes
- fix pull apart list for newer cudf versions @jperez999 (#122)
🐜 Bug Fixes
- ensure that combinations of nodes can be used as subgraphs @nv-alaiacano (#130)
- Set `HAS_GPU = False` in `dispatch` if relevant packages fail to import @oliverholworthy (#112)
- remove upstream dependencies that have no outputs @nv-alaiacano (#107)
🚀 Features
- add subgraph feature of a Graph @nv-alaiacano (#128)
- Split compound tags (like `USER_ID`) into atomic tags (like `USER`, `ID`) @karlhigley (#119)
- Add a `quantity` attribute to `ColumnSchema` @karlhigley (#118)
🔧 Maintenance
- Fix versioneer to get accurate version numbers @benfred (#132)
- Combine changes that address downstream failures @karlhigley (#136)
- Expand `models` testing in PR checks to include TF and other frameworks @karlhigley (#129)
- Migrate Merlin DAG executors from NVTabular @karlhigley (#125)
- Improve the organization of the schema tests (column vs schema vs io) @karlhigley (#124)
- Split the downstream repo tests into separate Tox environments and Github actions @karlhigley (#127)
- Migrate test environment to `tox` @karlhigley (#126)
- fix pull apart list for newer cudf versions @jperez999 (#122)
- Use mambabuild for generating conda package in github actions @benfred (#116)
- Split compound tags (like `USER_ID`) into atomic tags (like `USER`, `ID`) @karlhigley (#119)
- Auto-update pre-commit hook packages @karlhigley (#117)
- Update `versioneer` from 0.21 to 0.23 @oliverholworthy (#114)
- Pin `fsspec==2022.5.0` @karlhigley (#113)