Releases: modin-project/modin
Modin 0.15.1
This release pins Ray < 1.13.0 to avoid a deserialization race condition.
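To check an existing environment against this pin, a version comparison along the following lines may help. This is a minimal sketch and not part of Modin: the helper names `parse_release`, `satisfies_ray_pin`, and `installed_ray_version` are hypothetical, and the comparison assumes plain `X.Y.Z` release strings (pre-release suffixes are not handled).

```python
from importlib import metadata


def parse_release(version):
    """Convert a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def satisfies_ray_pin(installed, upper_bound="1.13.0"):
    """Return True when `installed` is strictly below the pinned upper bound."""
    # Tuples compare numerically, so 1.9.2 sorts below 1.13.0
    # (a plain string comparison would get this wrong).
    return parse_release(installed) < parse_release(upper_bound)


def installed_ray_version():
    """Return the installed Ray version string, or None if Ray is absent."""
    try:
        return metadata.version("ray")
    except metadata.PackageNotFoundError:
        return None
```

Calling `satisfies_ray_pin(installed_ray_version())` after confirming the version is not `None` reports whether the environment falls under the pin. For real dependency resolution, `pip` itself or the `packaging` library is the more robust choice.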
Key Features and Updates
Contributors
Modin 0.15.0
This release includes updated support for pandas 1.4.2, new Batch and Logging APIs, and a plethora
of bug fixes and documentation improvements.
Key Features and Updates
- Stability and Bugfixes
- FIX-#4376: Upgrade pandas to 1.4.2 (#4377)
- FIX-#3615: Relax some deps in development env (#4365)
- FIX-#4370: Fix broken docstring links (#4375)
- FIX-#4392: Align Modin XGBoost with xgb>=1.6 (#4393)
- FIX-#4385: Get rid of `use-deprecated` option in `pip` (#4386)
- FIX-#3527: Fix parquet partitioning issue causing negative row length partitions (#4368)
- FIX-#4330: Override the memory limit to start ray 1.11.0 on Macs (#4335)
- FIX-#4407: Align `insert` function with pandas in case of numpy array with several columns (#4408)
- FIX-#4373: Fix invalid file path when trying `read_csv_glob` with `usecols` parameter (#4405)
- FIX-#4394: Fix issue with multiindex metadata desync (#4395)
- FIX-#4438: Fix `reindex` function that doesn't preserve initial index metadata (#4442)
- FIX-#4425: Add parameters to groupby pct_change (#4429)
- FIX-#4457: Fix `loc` in case when need reindex item (#4457)
- FIX-#4414: Add missing f prefix on f-strings found at https://codereview.doctor/ (#4415)
- FIX-#4461: Fix S3 CSV data path (#4462)
- FIX-#4467: `drop_duplicates` no longer removes items based on index values (#4468)
- FIX-#4449: Drain the call queue before waiting on result in benchmark mode (#4472)
- FIX-#4518: Fix Modin Logging to report specific Modin warnings/errors (#4519)
- FIX-#4481: Allow clipping with a Modin Series of bounds (#4486)
- FIX-#4504: Support na_action in applymap (#4505)
- FIX-#4503: Stop the memory logging thread after session exit (#4515)
- FIX-#4531: Fix a makedirs race condition in to_parquet (#4533)
- FIX-#4464: Refactor Ray utils and quick fix groupby.count failing on virtual partitions (#4490)
- FIX-#4436: Fix to_pydatetime dtype for timezone None (#4437)
- FIX-#4541: Fix merge_asof with non-unique right index (#4542)
- Performance enhancements
- Benchmarking enhancements
- Refactor Codebase
- REFACTOR-#4284: use variable length unpacking when getting results from `deploy` function (#4285)
- REFACTOR-#3642: Move PyArrow storage format usage from main feature to experimental ones (#4374)
- REFACTOR-#4003: Delete the deprecated cloud mortgage example (#4406)
- REFACTOR-#4513: Fix spelling mistakes in docs and docstrings (#4514)
- REFACTOR-#4510: Align experimental and regular IO modules initializations (#4511)
- Developer API enhancements
- Update testing suite
- TEST-#4363: Use Ray from pypi in CI (#4364)
- FIX-#4422: get rid of case sensitivity for `warns_that_defaulting_to_pandas` (#4423)
- TEST-#4426: Stop passing is_default kwarg to Modin and pandas (#4428)
- FIX-#4439: Fix flake8 CI fail (#4440)
- FIX-#4409: Fix `eval_insert` utility that doesn't actually check results of `insert` function (#4410)
- TEST-#4482: Fix getitem and loc with series of bools (#4483).
- Documentation improvements
- DOCS-#4296: Fix docs warnings (#4297)
- DOCS-#4388: Turn off fail_on_warning option for docs build (#4389)
- DOCS-#4469: Say that commit messages can start with PERF (#4470).
- DOCS-#4466: Recommend GitHub issues over [email protected] (#4474).
- DOCS-#4487: Recommend GitHub issues over [email protected] (#4489).
- Dependencies
- New Features
Contributors
@YarShev
@Garra1980
@prutskov
@alexander3774
@amyskov
@wangxiaoying
@jeffreykennethli
@mvashishtha
@anmyachev
@dchigarev
@devin-petersohn
@jrsacher
@orcahmlee
@naren-ponder
@RehanSD
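One entry in this release, FIX-#4531, concerns a `makedirs` race condition in `to_parquet`. The notes do not describe how it was fixed. As a general illustration of this class of bug, and not a reproduction of Modin's change, the sketch below shows a common Python pattern: several workers create the same output directory, and `exist_ok=True` lets the worker that loses the race succeed instead of raising `FileExistsError`. The helper names `ensure_dir` and `create_concurrently` are hypothetical.

```python
import os
import tempfile
import threading


def ensure_dir(path):
    """Create `path` (and parents) without failing if it already exists."""
    # A separate "check, then create" sequence can fail when another
    # worker creates the directory between the two steps.
    os.makedirs(path, exist_ok=True)


def create_concurrently(path, workers=8):
    """Call ensure_dir from several threads and collect any OSError raised."""
    errors = []

    def worker():
        try:
            ensure_dir(path)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "dataset", "partitions")
    failures = create_concurrently(target)
    print("errors:", len(failures), "directory exists:", os.path.isdir(target))
```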
Modin 0.14.1
This release contains a few key bugfixes and a pandas version update.
Key Features and Updates
- FIX-#4376: Upgrade pandas to 1.4.2 (#4377)
- FIX-#4390: Add redis to Modin dependencies (#4396)
- FIX-#3527: Fix parquet partitioning issue causing negative row length partitions (#4368)
- FIX-#4330: Override the memory limit to start ray 1.11.0 on Macs. (#4335)
- FIX-#4394: Fix issue with multiindex metadata desync (#4395)
- FIX-#4373: fix usage of 'read_csv_glob' with 'usecols' parameter (#4405)
- FIX-#4425: Add parameters to groupby pct_change. (#4429)
Contributors
@Garra1980, @devin-petersohn, @dchigarev, @jeffreykennethli, @mvashishtha, @YarShev, @anmyachev
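The entries in these notes follow a recurring shape: a change type, an issue number, a title, and a pull request number, as in `FIX-#4390: Add redis to Modin dependencies (#4396)`. The following sketch shows one way to split such a line into its parts. The regular expression, the `Entry` type, and `parse_entry` are assumptions made for illustration and are not a Modin tool. Older entries that reference a commit hash instead of a pull request are deliberately not matched.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    kind: str                 # e.g. "FIX", "REFACTOR", "DOCS"
    issues: Tuple[int, ...]   # one or more issue numbers
    title: str
    pull_request: int


# One or more "KIND-#NNNN" tags, a colon, the title, then "(#NNNN)".
ENTRY = re.compile(
    r"^(?P<tags>[A-Z]+-#\d+(?:, [A-Z]+-#\d+)*): "
    r"(?P<title>.+?) \(#(?P<pr>\d+)\)\.?$"
)


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one release-note bullet; return None if it does not match."""
    match = ENTRY.match(line.lstrip("- ").strip())
    if match is None:
        return None
    tags = match["tags"].split(", ")
    kind = tags[0].split("-#")[0]
    issues = tuple(int(tag.split("-#")[1]) for tag in tags)
    return Entry(kind, issues, match["title"], int(match["pr"]))


print(parse_entry("- FIX-#4390: Add redis to Modin dependencies (#4396)"))
```

Entries that list several issues, such as `FIX-#3981, FIX-#3801, FIX-#4149: ...`, yield one `Entry` whose `issues` tuple holds all of the numbers and whose `kind` is taken from the first tag.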
Modin 0.14.0
This release contains significant upgrades to the Developer API and to Modin's documentation,
some codebase refactoring and performance enhancements, and multiple bugfixes.
Key Features and Updates
- Stability and Bugfixes
- FIX-#4058: Allow pickling empty dataframes and series (#4095)
- FIX-#4136: Fix exercise_3.ipynb example notebook (#4137)
- FIX-#4105: Fix names of pandas options to avoid `OptionError` (#4109)
- FIX-#3417: Fix read_csv with skiprows and header parameters (#3419)
- FIX-#4142: Fix OmniSci enabling (#4146)
- FIX-#4162: Use `skipif` instead of `skip` for compatibility with pytest 7.0 (#4163)
- FIX-#4158: Do not print OmniSci logs to stdout by default (#4159)
- FIX-#4177: Support read_feather from pathlike objects (#4177)
- FIX-#4234: Upgrade pandas to 1.4.1 (#4235)
- FIX-#3368: support unsigned integers in OmniSci backend (#4256)
- FIX-#4057: Allow reading an empty parquet file (#4075)
- FIX-#3884: Fix read_excel() dropping empty rows (#4161)
- FIX-#4257: Fix Categorical() for scalar categories (#4258)
- FIX-#4300: Fix Modin Categorical column dtype categories (#4276)
- FIX-#4208: Fix lazy metadata update for `PandasDataFrame.from_labels` (#4209)
- FIX-#3981, FIX-#3801, FIX-#4149: Stop broadcasting scalars to set items (#4160)
- FIX-#4185: Fix rolling across column partitions (#4262)
- FIX-#4303: Fix the syntax error in reading from postgres (#4304)
- FIX-#4308: Add proper error handling in df.set_index (#4309)
- FIX-#4056: Allow an empty parse_date list in `read_csv_glob` (#4074)
- FIX-#4312: Fix constructing categorical frame with duplicate column names (#4313).
- FIX-#4314: Allow passing a series of dtypes to astype (#4318)
- FIX-#4310: Handle lists of lists of ints in read_csv_glob (#4319)
- FIX-#4138, FIX-#4009: remove redundant sorting in the internal
- Performance enhancements
- Benchmarking enhancements
- Refactor Codebase
- REFACTOR-#3990: remove code duplication in `PandasDataframePartition` hierarchy (#3991)
- REFACTOR-#4229: remove unused `dask_client` global variable in `modin\pandas\__init__.py` (#4230)
- REFACTOR-#3997: remove code duplication for `broadcast_apply` method (#3996)
- REFACTOR-#3994: remove code duplication for `get_indices` function (#3995)
- REFACTOR-#4331: remove code duplication for `to_pandas`, `to_numpy` functions in `QueryCompiler` hierarchy (#4332)
- REFACTOR-#4213: Refactor `modin/examples/tutorial/` directory (#4214)
- REFACTOR-#4206: add assert check into `__init__` method of `PandasOnDaskDataframePartition` class (#4207)
- REFACTOR-#3900: add flake8-no-implicit-concat plugin and refactor flake8 error codes (#3901)
- REFACTOR-#4093: Refactor base to be smaller (#4220)
- REFACTOR-#4047: Rename `cluster` directory to `cloud` in examples (#4212)
- REFACTOR-#3853: interacting with Dask interface through `DaskWrapper` class (#3854)
- REFACTOR-#4322: Move is_reduce_fn outside of groupby_agg (#4323)
- Pandas API implementations and improvements
- Developer API enhancements
- FEAT-#4245: Define base interface for dataframe exchange protocol (#4246)
- FEAT-#4244: Implement dataframe exchange protocol for OmnisciOnNative execution (#4269)
- FEAT-#4144: Implement dataframe exchange protocol for pandas storage format (#4150)
- FEAT-#4342: Support `from_dataframe` for pandas storage format (#4343)
- Update testing suite
- Documentation improvements
- DOCS-#4077: Add release notes template to docs folder (#4078)
- DOCS-#4082: Add pdf/epub/htmlzip formats for doc builds (#4083)
- DOCS-#4168: Fix rendering the examples on troubleshooting page (#4169)
- DOCS-#4151: Add info in troubleshooting page related to Dask engine usage (#4152)
- DOCS-#4172: Refresh Intel Distribution of Modin paragraph (#4175)
- DOCS-#4173: Mention strict channel priority in conda install section (#4178)
- DOCS-#4176: Update OmniSci usage section (#4192)
- DOCS-#4027: Add GIF images and chart to Modin README demonstrating speedups (https://github.com/modin-project/m...
Modin 0.13.3
This release contains a few key bugfixes and a pandas version update.
Key Features and Updates
- Stability and Bugfixes
- Stop shallow dataframe copies from creating global shared state (#4184)
- Make PandasOnRayDataframeColumnPartition conformant to partition interface (#4231)
- Fix lazy metadata update for PandasDataFrame.from_labels (#4209)
- Fix Categorical() for scalar categories (#4258)
- Fix some cases when assigning a scalar to a subset of dataframe or series. (#4160)
- Align read_excel() behaviour on empty rows with pandas 1.3+ (#4161)
- Allow reading an empty parquet file. (#4075)
- Pin Dask<2022.2.0 as a temporary fix. (#4218)
- Add proper error handling in df.set_index. (#4309)
- Documentation improvements
- Clarify OmniSci activation in its usage section. (#4192)
- Upgrade pandas to 1.4.1 (#4235)
Contributors
@mvashishtha @anmyachev @prutskov @devin-petersohn @naren-ponder @YarShev @Garra1980
Modin 0.13.2
This release contains documentation polishing and small user experience
improvements.
Key Features and Updates
- Mention strict channel priority in conda install section (#4178)
- Refresh Intel Distribution of Modin paragraph (#4175)
- Add info in troubleshooting page related to Dask engine usage (#4152)
- Do not print OmniSci logs to stdout by default (#4159)
- Fix rendering the examples on troubleshooting page (#4169)
- Use skipif instead of skip for compatibility with pytest 7.0 (#4163)
Contributors
Modin 0.13.1
This release contains a few key bugfixes and updates to the documentation.
Key Features and Updates
- Stability and Bugfixes
- Documentation improvements
Contributors
@prutskov, @paulovn, @YarShev, @RehanSD, @devin-petersohn,
@mvashishtha
Modin 0.13.0
This release contains significant upgrades to Modin's documentation,
support for pandas 1.4, new algebra and partitioning layer APIs, and some bugfixes.
Key Features and Updates
- Stability and bugfixes
- Support for subscripting Resampler (1a1edfd)
- Fix groupby with column name for `by` (a04d7b7)
- Workaround for groupby with `sort=False` with categorical keys (c67a7c5)
- Align default value of `REDIS_PASSWORD` with Ray's `DEFAULT_REDIS_PASSWORD` (f79cb85)
- Fix groupby dictionary aggregation when `by` and columns to aggregate overlap (d42c070)
- Fix `read_csv` when callables are provided for `skip_rows` parameter (7c84758)
- Ensure address is not passed to `ray.init` when running Ray in local mode (02a23d4)
- Ensure that `groupby.indices` returns positional indices (e9c06f2)
- Fix setting of categorical values (0e36e22)
- Ensure `df.__getitem__` respects step attribute of slice (7e85c5d)
- Ensure data argument is delivered to the Dataframe in experimental cloud mode (2f7da1f)
- Fix assigning to a Series with a single item (0d9d14e)
- Fix the default to pandas in pd.DataFrame.sparse.from_spmatrix (ab2855b)
- Fix `apply` result type inference (ac17ca1)
- Exclude "scripts" from setup package (6224aba)
- Fix assigning a Categorical to a column (cb4e727)
- Ensure `df.to_csv` propagates metadata (e.g. index) (154697b)
- Update `pyarrow` requirement in environment files (b55b08d)
- Performance enhancements
- Benchmarking enhancements
- Update benchmarks for groupby that are more representative (0582aa2)
- Refactor Codebase
- Pandas API implementations and improvements
- Add support for `storage_options` argument to `read_csv_glob` (7c33afe)
- Add support for `dropna` argument for `groupby.indices` and `groupby.groups` (144a613)
- Ensure relabeling Modin Frame does not lose partition shape (3c740db)
- Update `Series.values` to default to `to_numpy()` (67228ef)
- Add support for `modin.pandas.show_versions` and `python -m modin --versions` (efe717f)
- Upgrade pandas support to 1.4 (39fbc57)
- OmniSci enhancements
- Update benchmarks for groupby that are more representative (9396f23)
- Update documentation on Native + OmniSci (edc1608)
- Add support for `getArrowTable()` (6882ec2)
- Fix segfault during `init` when only OmniSci is present (8c8a6a3)
- Optimize `append` with default arguments (67013f9)
- Fix OmniSci engine enabling for IO functions (9d1a334)
- XGBoost enhancements
- Developer API enhancements
- Update testing suite
- Documentation improvements
- Improve documentation on pandas on Ray execution (b76dc57)
- Reformat documentation to match pandas documentation theme (cc96f5d)
- Improve documentation on pandas on Python execution (d590de0)
- Improve System view in architecture documentation (6d51921)
- Improve documentation on using pandas on Dask (003f338)
- Improve documentation on pandas on Dask execution (61bf043)
- Add documentation on using pandas on Python (195b668)
- Improve Modin Out of Core documentation (cf426c4)
- Improve documentation on OmniSci on native execution (689faee)
- Improve documentation on IO (ffa67c7)
- Add documentation on factories and parsers (6ca66db)
- Improve documentation for experimental pandas on Ray execution (20abddd)
- Improve documentation for `modin.core.dataframe.base` and `modin.core.dataframe.pandas` (cf1e541)
- Update troubleshooting documentation and add FAQs (cc95ae2)
- Improve README introduction and installation sections (a632d1f)
- Update copyright year (7da1dc8)
- Update a link to `pandas.read_json` (0315823)
- Improve documentation for Modin vs. Dask (34732cb)
- Fix links to the contributing page (81a06d6)
- Remove broken links from supported apis (c04502d)
- Change docs copyright statement to 'Modin Developers' (ed2a7a4)
- Rename Developer page to Development in docs (406af7c)
- Improve "Getting Started" section (4a62bba)
- Update Modin tutorials (76707bf)
- Add back quickstart notebook (4dd97ab)
- Fix links in README and update README and FAQs (5d84042)
- Update Modin module layout in architecture docs (7fcafa7)
- Update documentation with new algebra operators and `ModinDataframe` (4b70725)
- Add usage guide to documentation (4511566)
- Build docs with Python 3.8 (01c1876)
- Dependencies
- Update PyArrow to 6.0 and OmniSci to 5.10.1 (018515f)
Contributors
@anmyachev, @prutskov, @Rubtsowa, @vnlitvinov, @dchigarev, @YarShev, @amyskov,
@mvashishtha, @dorisjlee, @devin-petersohn, @jeffreykennethli, @RehanSD,
@novichkovg, @Lozovskii-Aleksandr, @naren-ponder, @ahallermed, @fexolm,
@adityagp, @susmitpy, @ienkovich
Modin 0.12.1
This release contains an update to the pandas version and a few bugfixes.
Key Features and Updates
- Update supported pandas version to 1.3.5 (b79989a)
- Improvements to groupby
- Fix `groupby` for case `by` is `None` (40d45c8)
- Fix handling of dictionary aggregation (29f927b)
- Return positional indices for Groupby property (c66324d)
- Fix slicing dataframes with `step` property (5651844)
- Fix assignment of data to category column (23dd3f8)
Contributors this release
@Rubtsowa, @prutskov, @dchigarev, @amyskov, @vnlitvinov, @mvashishtha, @YarShev, @devin-petersohn
Modin 0.12.0
This release contains a refactor of the codebase that significantly improves the maintainability of the code, along with a plethora of bugfixes. This release also introduces a Slack community for Modin users to interact with Modin developers. Please join us at our [Slack](https://modin.org/slack.html) to continue the conversation!
Key Features and Updates
- Stability and bugfixes
- Support allowing callables and scalars together in .loc/.iloc (25ea7fd)
- Ensure .loc with slice and scalar column returns Series (9492878)
- Fix Modin OmniSci Docker example (b853c51)
- Ensure Modin OmniSci + Modin Ray Docker containers install packages from conda-forge (032afd6)
- Determine return type (Series or DataFrame) from one element Series (17ad1f0)
- Update cloud examples (648b6a0)
- Fix Modin OmniSci memory leak during `read_csv` (8581ba1)
- Use `floor` for casting `float` to `int` for OmniSci 5.8.0 (c67a936)
- Fix .loc on empty DataFrame (2260431)
- Ensure Modin on Ray does not duplicate writes to disk on `to_csv` when workers die (6178a57)
- Add support for `storage_options` argument in `read_*` functions except `read_excel` (77a00cc)
- Ensure Modin Ray correctly raises exceptions when `to_parquet` or `to_csv` fail (8d67cd3)
- Ensure Modin Ray does not hang when workers crash on `to_csv` (73bf061)
- Remove platform specific code from `setup.py` to ensure distributions are pure Python (b186e40)
- Refactor Codebase
- Update import of public index classes to import from `pandas.core.indexes.api` module (488357a)
- Replace `try...finally` with pytest fixtures (c349a94)
- Restructure project files (b37bcf8)
- Use `fsspec` to open files (b8a9c07)
- Add LGTM Service to CI (b193fef)
- Remove extraneous `*NUM_THREADS` environment variables from CI (b925625)
- Update documentation + code + comment language to reflect new project structure (7a81588)
- Update language to reflect new project structure and add implementation to BaseDataframeAxisPartition (7ab2d90)
- De-dupe `read_fwf` and `read_csv` code (2f824f8)
- Reformat entire codebase with `black` and `flake8` (75f698c)
- Pandas API implementations and improvements
- Add support for `{true|false}_values` for `read_csv` for Modin OmniSci (9cd93f2)
- Implement `explode` for Series and DataFrame (ddd4afe)
- Support reading gzipped fwf (a80cb3b)
- Add support for `to_parquet` Modin Ray (643596d)
- Add support for creating an `sqlalchemy` connection with arbitrary arguments (ece98a6, 4a42e04)
- Add support for `set_index` with different input types (cab37f2)
- XGBoost enhancements
- Support new DMatrix parameters (4d7f6d4)
- Developer API enhancements
- Throw custom errors when optional dependencies are missing (53bb047)
- Improve Modin OmniSci quickstart (167957b)
- Update testing suite
- Documentation improvements
- Dependencies
- Add fsspec (dependency for IO) to dependencies (44e3f10)
- Make `botocore` import optional (adc15c6)
- Pin minimum `s3fs` dependency to fix `aiobotocore` issue (8acad95)
- Update PyArrow to 5.0 and OmniSci to 5.8 (4121358)
Contributors
@ienkovich, @vnlitvinov, @mvashishtha, @devin-petersohn, @dchigarev, @prutskov, @amyskov, @gshimansky, @anmyachev, @YarShev, @Garra1980, @Rubtsowa, @jeffreykennethli, @RehanSD, @dorisjlee, @naren-ponder
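The 0.12.0 notes mention throwing custom errors when optional dependencies are missing and making the `botocore` import optional, without showing how. The general pattern can be sketched as follows. This is an illustration under stated assumptions and not Modin's implementation: the exception class `MissingOptionalDependency` and the helper `import_optional` are hypothetical names.

```python
import importlib


class MissingOptionalDependency(ImportError):
    """Hypothetical error raised when an optional package is not installed."""


def import_optional(module_name, feature):
    """Import `module_name`, or raise an error naming the feature that needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that tells the user what to install,
        # keeping the original exception as the cause for debugging.
        raise MissingOptionalDependency(
            f"{feature} requires the optional package '{module_name}'; "
            "install it to use this feature."
        ) from exc


# The import is deferred until the feature is used, so the base package
# still imports cleanly when the optional dependency is absent.
json_module = import_optional("json", "JSON export")
print(json_module.dumps({"ok": True}))
```

Deferring the import to the call site keeps the optional package out of the import path for users who never touch the feature.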