Releases: modin-project/modin

Modin 0.6.2

17 Oct 19:14
cf7be8c

Modin 0.6.2 release notes

This release contains a large amount of new functionality for our internal Modin DataFrame abstraction, as well as many backend enhancements that dramatically improve performance.

We attempted to update to the latest version of Ray (0.7.5), but API and behavior changes forced us to revert that update until we can verify that this version of Ray works with the large-memory workloads of Modin.

Bugfixes + Pandas Concordance (🐛 + 🐼)

  • Empty call queue after draining (#808)

New Functionality ✨

  • Add MapReduce to the functions and register (#812)
  • Initial addition of Fold Function in Pandas Query Compiler (#827)
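
As a rough illustration of the two operator styles named above, here is a minimal pure-Python sketch. It is not Modin's internal API: `partitions`, `map_reduce`, and `fold` are hypothetical names, and plain lists stand in for partitions. The working assumption is that a MapReduce-style function combines independent per-partition results, while a fold-style function needs the whole axis and keeps its shape.

```python
from functools import reduce
from itertools import accumulate

# Hypothetical stand-in for one partitioned column: each inner list is a row partition.
partitions = [[1, 2, 3], [4, 5], [6]]


def map_reduce(parts, map_fn, reduce_fn):
    """Apply map_fn to each partition independently, then combine the partial results."""
    partials = [map_fn(part) for part in parts]
    return reduce(reduce_fn, partials)


def fold(parts, full_axis_fn):
    """Run full_axis_fn over the whole axis, then split the result back
    into pieces with the original partition lengths."""
    flat = [value for part in parts for value in part]
    result = list(full_axis_fn(flat))
    out, start = [], 0
    for part in parts:
        out.append(result[start:start + len(part)])
        start += len(part)
    return out


# A sum only needs per-partition partial sums.
total = map_reduce(partitions, sum, lambda a, b: a + b)

# A cumulative sum needs every earlier value, so it sees the full axis.
running = fold(partitions, accumulate)
```

Here `total` is a single value, while `running` keeps the same partition layout as the input.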

Backend enhancements + Performance 🚀

  • Create modin_frame.filter_full_axis and update query and dropna to use (#815)
  • Add fast path for concat when the axes and partitioning are aligned (#818)
  • Remove dtype requirement from head, tail, etc. (#819)
  • Add mean to the Reductions list (#825)
  • Adding quantile to list of Reductions (#826)
  • Adding hooks to be able to custom distribute non-pandas objects (#813)

Documentation 📃

  • Fix build issue for documentation (#811)

Dependencies 🔗

  • Update Ray version to 0.7.5 (#822)
  • Revert "Update Ray version to 0.7.5 (#822)" (#831)

Contributors this release

The following users contributed code to Modin since the last release.

@williamma12 (Maintainer)
@devin-petersohn (Maintainer)

🎉🎉 Thank you! 🎉🎉

Modin 0.6.1

21 Sep 01:42
3caf783

Modin 0.6.1 release notes

This release includes several bugfixes and improvements to the backend. It also fixes support for Windows users and adds new install targets. See the README for more information.

Bugfixes + Pandas Concordance (🐛 + 🐼)

  • Update components of groupby when level is being used (#779)
  • Support multiple names values for groupby on MultiLevelIndex (#788)
  • Update Series.map to support dictionary of operations and support (#787)
  • Fix groupby + reduce for partitions with only 1 row (#795)
  • Fix DataFrame.apply when the result of the apply is a Series (#796)
  • Update any() in groupby.py to accept arguments (#802)

New Functionality ✨

  • Support Windows with the proper requirements (#780)

Backend enhancements + Performance 🚀

  • Improve and correct metadata management internally (#792)

Documentation 📃

  • Update README to include more information (#781)
  • Fixup a broken docs link (#799)

Dependencies 🔗

  • Move pyarrow import after Ray (#794)
  • Fix import (#804)
  • Update io.py (#784)

Contributors this release

The following users contributed code to Modin since the last release.

@sbrugman (First time contributor) 🌟
@xrmx (First time contributor) 🌟
@rliu4439 (First time contributor) 🌟
@williamma12 (Maintainer)
@devin-petersohn (Maintainer)

🎉🎉 Thank you! 🎉🎉

Modin 0.6.0

09 Sep 16:01
ac2c9da

Modin 0.6.0 release notes

This release contains a large number of internal changes and some new functionality. Notably, a Dask backend was added for Windows support, and the pandas version was updated to 0.25.1. There were a number of minor bugfixes as well. The entire backend was refactored in #721 to make future additions and query planning easier to support.

We also dropped Python 2 support while updating to the newest pandas version. pandas no longer supports Python 2, so we will not either.

Bugfixes + Pandas Concordance (🐛 + 🐼)

  • Fix reshape so that it succeeds on larger tables as well (#731)
  • count with level defaults to pandas (#748)
  • Apply groupby function to every element (#757)
  • Fix issue where assigning a column value to overwrite a column (#770)
  • Add a warning message when passing a SQLAlchemy connection object to read_sql (#771)
  • Add better defaulting to pandas message for groupby (#773)

New Functionality ✨

  • Preliminary read_json implementation (#715)
  • count with level (#761)
  • Compatibility Changes for Pandas 0.25.1 (#755)
  • Add Dask futures implementation (#732)

Backend enhancements + Performance 🚀

  • More efficiently manage metadata internally (#721)
  • Reduce the amount of data we deserialize to match pandas (#774)

Code Quality 💯

  • Rename map_full_axis style functions to fold (#769)

Documentation 📃

  • Link to Pandas Docs in modin_dataframe_supported for usage (#749)
  • Link to Pandas Docs in UsingPandasonRay (#751)

Testing 📈

  • Patch s3fs unclosed socket warning issue (#742)

Dependencies 🔗

  • Bump Ray version (#750)
  • Resolves Typing in Python3.7 Issue (#765)

Contributors this release

The following users contributed code to Modin since the last release.

@loopyme (First time contributor) 🌟
@agardelein (First time contributor) 🌟
@anthonyhsyu (First time contributor) 🌟
@dulinda (Returning contributor)
@RehanSD (Returning contributor)
@simon-mo (Maintainer)
@williamma12 (Maintainer)
@devin-petersohn (Maintainer)

🎉🎉 Thank you! 🎉🎉

Modin 0.5.4

17 Jul 05:49
a351907

Modin 0.5.4 release notes

This release contains many performance enhancements and minor bugfixes. Several new features were also added this release.

Bugfixes + Pandas Concordance (🐛 + 🐼)

  • Fix sorting for MultiIndex (#705)
  • Fix parallel read_sql on Postgres tables (#707)
  • Add fix for axis_partition operations after a join/concat (#704)
  • Fix case where SQLAlchemy connections are passed in to read_sql (#712)
  • Fix index mismatch for add (#710)
  • Fixes indexing with boolean list in loc (#735)
  • Default to pandas on MultiLevelIndex reset_index (#739)

New Functionality ✨

  • reading gzipped csv files (#682)
  • Add support for text manipulation operations (#713)
  • Add support for dot (#719)
  • Add to_numpy method (#718)
  • Read_csv with compression "bz2", "zip", and "xz" (#722)
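
From the user's side, the compressed-file support listed above is a matter of passing a compression name to `read_csv`. The sketch below uses only the standard library to illustrate what a compression-aware CSV reader has to do: pick a decompressor by name and parse the decompressed text. It is not Modin's reader. `OPENERS`, `read_compressed_csv`, and `write_compressed_csv` are hypothetical names, and "zip" is left out to keep the example short.

```python
import bz2
import csv
import gzip
import lzma
import os
import tempfile

# Hypothetical lookup table: compression name -> standard-library opener.
OPENERS = {"gzip": gzip.open, "bz2": bz2.open, "xz": lzma.open}


def read_compressed_csv(path, compression):
    """Decompress `path` on the fly and return its rows as lists of strings."""
    opener = OPENERS[compression]
    with opener(path, mode="rt", newline="") as handle:
        return list(csv.reader(handle))


def write_compressed_csv(path, compression, rows):
    """Demo helper: write rows through the matching compressor."""
    opener = OPENERS[compression]
    with opener(path, mode="wt", newline="") as handle:
        csv.writer(handle).writerows(rows)


# Round-trip the same small table through each codec.
rows = [["a", "b"], ["1", "2"], ["3", "4"]]
tmp_dir = tempfile.mkdtemp()
results = {}
for name in OPENERS:
    path = os.path.join(tmp_dir, f"demo.csv.{name}")
    write_compressed_csv(path, name, rows)
    results[name] = read_compressed_csv(path, name)
```

Each entry in `results` holds the rows read back from one compressed file.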

Backend enhancements + Performance 🚀

  • Improve performance of __setitem__ on new column (#701)
  • Improve fillna performance depending on arguments passed in (#709)
  • Add metadata to be returned by operations along entire axis (#699)
  • Add a way to pre-compute metadata on read_csv (#714)
  • Add metadata to object when reading from parquet (#716)
  • Improve performance of dtypes computation by collecting at data ingest (#717)
  • Update partition width calculation for read csv with a ray engine (#728)
  • Add to_numpy to Frame Manager, Query Compiler, and BaseDataFrame (#726)

Code Quality 💯

  • Change datamanager to querycompiler (#702)
  • Code quality/reorganize query compiler (#706)

Dependencies 🔗

  • Update ray version to 0.7.1 (#697)

Contributors this release

The following users contributed code to Modin since the last release.

@dulinda (First time contributor) 🌟
@RehanSD (First time contributor) 🌟
@williamma12 (Maintainer)
@devin-petersohn (Maintainer)

🎉🎉 Thank you! 🎉🎉

Modin 0.5.3

19 Jun 05:05
588272b

Modin 0.5.3 release notes

This release includes several fixes to regressions and documentation. We also have preliminary support for an autoscaling cluster (#661). Performance of groupby + sum, count, and other dimension-reducing operations improved by up to 10x over the previous implementation (#659).

Bugfixes + Pandas Concordance (🐛 + 🐼)

  • Fix usecols when the string name of the column is provided (#652)
  • Fixes full axis reduce functions with empty row and/or column partitions (#663)
  • Fix memory_usage() for transposed dataframes (#662)
  • Default to pandas when trying to get tuple from Series (#689)
  • Fix dtypes when calling to_pandas on empty dataframes (#688)

New Functionality ✨

  • Initial ray autoscaler support (#661) 🎉

Backend enhancements + Performance 🚀

  • Improve performance of Groupby (#659) 🎉
  • Fix internal indices calculation for non-compute partitions (#691)
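
These notes do not describe how #659 speeds up groupby, so the following is only a general illustration of why groupby followed by a sum or count parallelizes well: each partition can be aggregated on its own, and the small partial results are then merged. The data and the names `partial_sums` and `groupby_sum` are hypothetical and are not taken from Modin.

```python
from collections import Counter

# Hypothetical row partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]


def partial_sums(rows):
    """Aggregate one partition without looking at any other partition."""
    sums = Counter()
    for key, value in rows:
        sums[key] += value
    return sums


def groupby_sum(parts):
    """Merge the per-partition aggregates into one result per key."""
    total = Counter()
    for partial in map(partial_sums, parts):
        total.update(partial)  # Counter.update adds values for matching keys
    return dict(total)


result = groupby_sum(partitions)
```

Only the per-key partial sums cross partition boundaries, not the raw rows.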

Documentation 📑

  • Update documentation for DataFrame methods (#643)
  • Update Documentation structure (#665, #666, #667, #668)
  • Update utilities documentation for pandas on ray (#669)

Testing 📈

  • Remove extraneous teardown module for parquet (#648)

Regressions ↩️

  • Fix binary operations after transpose (#676)
  • Fix issue with sort_values after transpose (#679)
  • Fix concat when QueryCompiler is transposed (#681)
  • Fix concat with all Series and axis=1 (#684)
  • Fix how we compute the block_widths/lengths after single update (#693)

Contributors this release

The following users contributed code to Modin since the last release.

@williamma12 (Committer)
@devin-petersohn (Admin)

🎉🎉 Thank you! 🎉🎉

Modin 0.5.2

31 May 15:58
c1c985a

Modin 0.5.2 release notes

This release is a hotfix for a bug/regression introduced in 0.5.1.

Bugfixes + Pandas Concordance (🐛 + 🐼)

  • Fix Parquet reader for partitioned files (#644)

Modin 0.5.1

28 May 16:25
2ab1170

Modin 0.5.1 release notes

This release includes performance improvements for indexing (loc, iloc, etc.) and some minor bugfixes.

Bugfixes + Pandas Concordance (🐛 + 🐼)

  • Fix usecols when header=None (#622)
  • Fix shallow copy (#631)
  • Replace read_pandas with ParquetDataset to support predicate pushdown (#638)
  • Support boolean indexers and other properties in loc (#635)
  • Fix hdf error checking (#639)
  • Read partitioned parquet files (#632)

Backend enhancements + Performance 🚀

  • Add fast track for slices when step=None (#614)
  • Make indexing faster (#613)

Dependencies 🔗

  • Bump ray version to 0.7.0 (#623)

Regressions ↩️

  • Fix Series.getitem with a slice (#615)
  • Fix apply error checking for functions that require certain types (#617)

Contributors this release

The following users contributed code to Modin since the last release.

@ddutt (First time contributor) 🌟
@williamma12 (Committer)
@devin-petersohn (Admin)

🎉🎉 Thank you! 🎉🎉

Modin 0.5.0

06 May 04:51
09ff0c2

Modin 0.5.0 release notes

This release includes many major new features and updates.

Bugfixes + Pandas Concordance (🐛 + 🐼)

  • Fix loc with MultiIndex (#508)
  • #516 Fix duplicated index in concat (#521)
  • read_excel(sheet_name=None) not working (#512) (#532)
  • Change how describe excludes columns (#535)
  • Add "options" support #291 (#538)
  • Correct behavior for read_table when sep=False (#547)
  • Fix read_csv when parse_dates and index_col are the same (#548)
  • Fix issue where repr was not correct after mapreduce operation (#552)
  • Fix reset_index when name field of the index is set (#553)
  • Support for arguments not explicitly in the signature for read_fwf (#561)
  • Add datetime to top level API. issue: #542 (#564)
  • Allow concat to accept non-subscriptable objects as keys parameter (#568)
  • Fix support for level parameter in groupby (#575)
  • Fix numeric_only parameter (#578)
  • Set series to dataframe (#545)
  • Fix astype with "category" as the type passed (#587)

User experience 👤

  • Remove typing dependency (#571)
  • Add warning when using the constructor of DataFrame and Series (#572)
  • Fix compatibility for Python2 (#606)

New functionality ⭐️

  • Add Gandiva as a partition engine for the Ray backend (#489)
  • parallel read_sql() using limit and offset (#499)
  • Integrate pyarrow's CSV reader into modin (#511)
  • Added read_csv support for S3 (#505, #543)
  • Distributed Series 🎉 (#522)
  • Add parallelism parameter to read_sql() #455 (#594)

Backend enhancements + Performance 🚀

  • Add fast track for empty mask computation (#565)
  • Change QueryCompiler.view to use index-based lookup (#566)

Dependencies 🔗

  • Move sqlalchemy import statement in experimental io (#498)
  • pin pytables version in tests to avoid dependency mismatches (#500)
  • Update numpy version to 1.16 (#506)
  • Bump pandas version to 0.24.2 (#509)
  • Update Ray version to 0.6.6 (#567)

Testing and Code Quality (📈 + 💯)

  • Refactor and rename files to be more descriptive (#496)
  • Add stress tests for modin (#481)
  • Refactor to move all ci related files to ci/ (#479)
  • Refactor the QueryCompiler module to separate backends (#510)
  • Formatting with black (#527)
  • Fix Travis incompatibility (#534)
  • Dtype cleanup (#570)

Regressions ↩️

  • Fix single-column DataFrame index on MapReduce operations (#580)
  • Fix drop after transpose (#582)
  • Add support for concat with empty DataFrames and new Series (#584)
  • Fix Series.getitem for bool indexers and slices (#591)
  • Fix binary operation after transpose (#589)
  • Fix indexing on empty_series (#596)
  • Correctly compute reindex after a transpose (#600)
  • Correctly apply Series functions element-wise for correct cases (#598)
  • Fix regression in constructor for lists/dicts of Series (#602)
  • Fix dtype checking if other is a scalar (#604)

Contributors this release

The following users contributed code to Modin since the last release.

@ipacheco-uy (First time contributor) 🌟
@pcmoritz (First time contributor) 🌟
@wuisawesome (First time contributor) 🌟
@williamma12 (Committer)
@eavidan (Committer)
@devin-petersohn (Admin)

🎉🎉 Thank you! 🎉🎉

Modin 0.4.0

07 Mar 03:05
7ee7a98

Modin 0.4.0 release notes

This release includes many minor bugfixes and an update to pandas 0.24.

Bugfixes + Pandas Concordance (🐛 + 🐼)

  • Correct support for drop_duplicates (#466)
  • Fix issue by properly handling parse_dates (#473)
  • Correctly match pandas behavior when iteratively updating columns (#488)
  • Add correct support for include="all" (#486)

User experience 👤

  • Update pandas version to 0.24 (#451) 🎉
  • add dockerfile (#487)

New functionality ⭐️

  • parallel to_sql() (#461)

Backend enhancements + Performance 🚀

  • Remove arguments causing errors with Ray 0.7.0 (#472)
  • Create Base Query Compiler object for query compilers (#448)
  • Fix inheritance of constructor for PandasQueryCompilerView (#478)

Testing and Code Quality (📈 + 💯)

  • Add additional testing for API compatibility. (#480)
  • Remove dead code (#493)
  • Improve test coverage (#491)

Contributors this release

The following users contributed code to Modin since the last release.

@aglove2189 (First time contributor) 🌟
@williamma12 (Committer)
@eavidan (Committer)
@devin-petersohn (Admin)

🎉🎉 Thank you! 🎉🎉

Modin 0.4.0rc1

26 Feb 03:47
50a7a27
Pre-release

Modin 0.4.0rc1 release notes

This is a release candidate. Please test with existing workflows to ensure correctness.

Bugfixes + Pandas Concordance (🐛 + 🐼)

  • Correct support for drop_duplicates (#466)
  • Fix issue by properly handling parse_dates (#473)

User experience 👤

  • Update pandas version to 0.24 (#451) 🎉

New functionality ⭐️

  • parallel to_sql() (#461)

Backend enhancements + Performance 🚀

  • Remove arguments causing errors with Ray 0.7.0 (#472)
  • Create Base Query Compiler object for query compilers (#448)
  • Fix inheritance of constructor for PandasQueryCompilerView (#478)

Testing 📈

  • Add additional testing for API compatibility. (#480)

Contributors this release

The following users contributed code to Modin since the last release.

@williamma12 (Committer)
@eavidan (Committer)
@devin-petersohn (Admin)

🎉🎉 Thank you! 🎉🎉