Releases: modin-project/modin
Modin 0.6.2
Modin 0.6.2 release notes
This release contains a large amount of new functionality for our internal Modin DataFrame abstraction, as well as many backend enhancements that dramatically improve performance.
We attempted to update to the latest version of Ray, but due to some API changes and changes in behavior, we had to revert that change until we can verify that this version of Ray will work with the large memory workloads of Modin.
Bugfixes + Pandas Concordance (🐛 + 🐼)
- Empty call queue after draining (#808)
New Functionality ✨
- Add MapReduce to the functions and register (#812)
- Initial addition of Fold Function in Pandas Query Compiler (#827)
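As a rough illustration of the MapReduce pattern named above, the sketch below computes a mean over row partitions in plain Python. It is a conceptual example only, not Modin's internal implementation: the function names (`map_partition`, `combine`, `map_reduce_mean`) and the list-of-lists partition layout are assumptions made for this illustration.

```python
from functools import reduce


def map_partition(partition):
    """Map step: summarize one partition as a (sum, count) pair."""
    return (sum(partition), len(partition))


def combine(left, right):
    """Reduce step: merge two partial (sum, count) summaries."""
    return (left[0] + right[0], left[1] + right[1])


def map_reduce_mean(partitions):
    """Compute the mean of all values without concatenating the partitions."""
    # Each partition is summarized independently, so this step could run in parallel.
    partials = [map_partition(p) for p in partitions]
    total, count = reduce(combine, partials, (0, 0))
    if count == 0:
        raise ValueError("cannot take the mean of no values")
    return total / count


# Hypothetical data: four row partitions, one of them empty.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [], [6.0]]
mean = map_reduce_mean(partitions)
```

Only the small `(sum, count)` pairs travel between the map and reduce steps, which is why reductions of this shape tend to be cheap to distribute.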
Backend enhancements + Performance 🚀
- Create `modin_frame.filter_full_axis` and update `query` and `dropna` to use it (#815)
- Add fast path for `concat` when the axes and partitioning are aligned (#818)
- Remove dtype requirement from `head`, `tail`, etc. (#819)
- Add mean to the Reductions list (#825)
- Adding `quantile` to list of Reductions (#826)
- Adding hooks to be able to custom distribute non-pandas objects (#813)
Documentation 📃
- Fix build issue for documentation (#811)
Dependencies 🔗
Contributors this release
The following users contributed code to Modin since the last release.
@williamma12 (Maintainer)
@devin-petersohn (Maintainer)
🎉🎉 Thank you! 🎉🎉
Modin 0.6.1
Modin 0.6.1 release notes
This release includes several bugfixes and improvements to the backend. It also fixes support for Windows users and adds new install targets. See the README for more information.
Bugfixes + Pandas Concordance (🐛 + 🐼)
- Update components of groupby when level is being used (#779)
- Support multiple `names` values for groupby on MultiLevelIndex (#788)
- Update `Series.map` to support dictionary of operations and support (#787)
- Fix `groupby` + `reduce` for partitions with only 1 row (#795)
- Fix `DataFrame.apply` when the result of the `apply` is a `Series` (#796)
- Update `any()` in groupby.py to accept arguments (#802)
New Functionality ✨
- Support Windows with the proper requirements (#780)
Backend enhancements + Performance 🚀
- Improve and correct metadata management internally (#792)
Documentation 📃
Dependencies 🔗
Contributors this release
The following users contributed code to Modin since the last release.
@sbrugman (First time contributor) 🌟
@xrmx (First time contributor) 🌟
@rliu4439 (First time contributor) 🌟
@williamma12 (Maintainer)
@devin-petersohn (Maintainer)
🎉🎉 Thank you! 🎉🎉
Modin 0.6.0
Modin 0.6.0 release notes
This release contains a large number of internal changes and some new functionality. Notably, a Dask backend for Windows support was added, and the pandas version was updated to 0.25.1. There were a number of minor bugfixes as well. The entire backend was refactored in #721 to make future additions and query planning easier to support.
We also dropped Python 2 support while updating to the newest pandas version. pandas no longer supports Python 2, so Modin will not either.
Bugfixes + Pandas Concordance (🐛 + 🐼)
- Fix reshape so that it succeeds on larger tables as well (#731)
- count with level defaults to pandas (#748)
- Apply groupby function to every element (#757)
- Fix issue when assigning a column value to overwrite a column (#770)
- Add a warning message when passing a SQLAlchemy connection object to read_sql (#771)
- Add better defaulting to pandas message for `groupby` (#773)
New Functionality ✨
- Preliminary read_json implementation (#715)
- count with level (#761)
- Compatibility Changes for Pandas 0.25.1 (#755)
- Add Dask futures implementation (#732)
Backend enhancements + Performance 🚀
- More efficiently manage metadata internally (#721)
- Reduce the amount of data we deserialize to match pandas (#774)
Code Quality 💯
- Rename `map_full_axis` style functions to `fold` (#769)
Documentation 📃
- Link to Pandas Docs in modin_dataframe_supported for usage (#749)
- Link to Pandas Docs in UsingPandasonRay (#751)
Testing 📈
- Patch s3fs unclosed socket warning issue (#742)
Dependencies 🔗
Contributors this release
The following users contributed code to Modin since the last release.
@loopyme (First time contributor) 🌟
@agardelein (First time contributor) 🌟
@anthonyhsyu (First time contributor) 🌟
@dulinda (Returning contributor)
@RehanSD (Returning contributor)
@simon-mo (Maintainer)
@williamma12 (Maintainer)
@devin-petersohn (Maintainer)
🎉🎉 Thank you! 🎉🎉
Modin 0.5.4
Modin 0.5.4 release notes
This release contains many performance enhancements and minor bugfixes. Several new features were also added this release.
Bugfixes + Pandas Concordance (🐛 + 🐼)
- Fix sorting for MultiIndex (#705)
- Fix parallel read_sql on Postgres tables (#707)
- Add fix for axis_partition operations after a join/concat (#704)
- Fix case where SQLAlchemy connections are passed in to read_sql (#712)
- Fix index mismatch for add (#710)
- Fixes indexing with boolean list in loc (#735)
- Default to pandas on MultiLevelIndex `reset_index` (#739)
New Functionality ✨
- reading gzipped csv files (#682)
- Add support for text manipulation operations (#713)
- Add support for `dot` (#719)
- Add to_numpy method (#718)
- Read_csv with compression "bz2", "zip", and "xz" (#722)
Backend enhancements + Performance 🚀
- Improve performance of `__setitem__` on new column (#701)
- Improve fillna performance depending on arguments passed in (#709)
- Add metadata to be returned by operations along entire axis (#699)
- Add a way to pre-compute metadata on read_csv (#714)
- Add metadata to object when reading from parquet (#716)
- Improve performance of dtypes computation by collecting at data ingest (#717)
- Update partition width calculation for read csv with a ray engine (#728)
- Add to_numpy to Frame Manager, Query Compiler, and BaseDataFrame (#726)
Code Quality 💯
Dependencies 🔗
- Update ray version to 0.7.1 (#697)
Contributors this release
The following users contributed code to Modin since the last release.
@dulinda (First time contributor) 🌟
@RehanSD (First time contributor) 🌟
@williamma12 (Maintainer)
@devin-petersohn (Maintainer)
🎉🎉 Thank you! 🎉🎉
Modin 0.5.3
Modin 0.5.3 release notes
This release includes several fixes to regressions and documentation. We also have preliminary support for an autoscaling cluster (#661). Performance of `groupby` + `sum`, `count`, and other dimension-reducing operations was increased by up to 10x over the previous implementation (#659).
Bugfixes + Pandas Concordance (🐛 + 🐼)
- Fix `usecols` when the string name of the column is provided (#652)
- Fixes full axis reduce functions with empty row and/or column partitions (#663)
- Fix memory_usage() for transposed dataframes (#662)
- Default to pandas when trying to get tuple from Series (#689)
- Fix dtypes on empty dataframes calls to to_pandas (#688)
New Functionality ✨
- Initial ray autoscaler support (#661) 🎉
Backend enhancements + Performance 🚀
- Improve performance of Groupby (#659) 🎉
- Fix internal indices calculation for non-compute partitions (#691)
Documentation 📑
- Update documentation for DataFrame methods (#643)
- Update Documentation structure (#665, #666, #667, #668)
- Update utilities documentation for pandas on ray (#669)
Testing 📈
- Remove extraneous teardown module for parquet (#648)
Regressions ↩️
- Fix binary operations after transpose (#676)
- Fix issue with `sort_values` after transpose (#679)
- Fix `concat` when QueryCompiler is transposed (#681)
- Fix `concat` with all Series and axis=1 (#684)
- Fix how we compute the block_widths/lengths after single update (#693)
Contributors this release
The following users contributed code to Modin since the last release.
@williamma12 (Committer)
@devin-petersohn (Admin)
🎉🎉 Thank you! 🎉🎉
Modin 0.5.2
Modin 0.5.2 release notes
This release is a hotfix for a bug/regression introduced in 0.5.1.
Bugfixes + Pandas Concordance (🐛 + 🐼)
- Fix Parquet reader for partitioned files (#644)
Modin 0.5.1
Modin 0.5.1 release notes
This release includes performance improvements for indexing (`loc`, `iloc`, etc.) and some minor bugfixes.
Bugfixes + Pandas Concordance (🐛 + 🐼)
- Fix `usecols` when `header=None` (#622)
- Fix shallow copy (#631)
- Replace read_pandas with ParquetDataset to support predicate pushdown (#638)
- Support boolean indexers and other properties in `loc` (#635)
- Fix hdf error checking (#639)
- Read partitioned parquet files (#632)
Backend enhancements + Performance 🚀
Dependencies 🔗
- Bump ray version to 0.7.0 (#623)
Regressions ↩️
- Fix Series.getitem with a slice (#615)
- Fix `apply` error checking for functions that require certain types (#617)
Contributors this release
The following users contributed code to Modin since the last release.
@ddutt (First time contributor) 🌟
@williamma12 (Committer)
@devin-petersohn (Admin)
🎉🎉 Thank you! 🎉🎉
Modin 0.5.0
Modin 0.5.0 release notes
This release includes many major new features and updates.
Bugfixes + Pandas Concordance (🐛 + 🐼)
- Fix loc with MultiIndex (#508)
- #516 Fix duplicated index in concat (#521)
- read_excel(sheet_name=None) not working (#512) (#532)
- Change how describe excludes columns (#535)
- Add "options" support #291 (#538)
- Correct behavior for `read_table` when `sep=False` (#547)
- Fix `read_csv` when `parse_dates` and `index_col` are the same (#548)
- Fix issue where `repr` was not correct after mapreduce operation (#552)
- Fix `reset_index` when `name` field of the index is set (#553)
- Support for arguments not explicitly in the signature for `read_fwf` (#561)
- Add `datetime` to top level API. issue: #542 (#564)
- Allow `concat` to accept non-subscriptable objects as `keys` parameter (#568)
- Fix support for `level` parameter in groupby (#575)
- Fix numeric_only parameter (#578)
- Set series to dataframe (#545)
- Fix `astype` with `"category"` as the type passed (#587)
User experience 👤
- Remove typing dependency (#571)
- Add warning when using the constructor of DataFrame and Series (#572)
- Fix compatibility for Python2 (#606)
New functionality ⭐️
- Add Gandiva as a partition engine for the Ray backend (#489)
- parallel read_sql() using limit and offset (#499)
- Integrate pyarrow's CSV reader into modin (#511)
- Added read_csv support for S3 (#505, #543)
- Distributed Series 🎉 (#522)
- Add parallelism parameter to read_sql() #455 (#594)
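The general idea behind a parallel read that uses limit and offset can be sketched with the standard-library `sqlite3` module: split one ordered query into several non-overlapping `LIMIT`/`OFFSET` queries, one per partition. This is a minimal sketch under stated assumptions, not Modin's implementation; `partition_queries`, the `items` table, and the partition count are hypothetical names and values chosen for the example.

```python
import sqlite3


def partition_queries(base_query, total_rows, num_partitions):
    """Build one LIMIT/OFFSET query per partition so every row is read exactly once.

    `base_query` should end with an ORDER BY clause; without a stable order,
    LIMIT/OFFSET windows are not guaranteed to be disjoint.
    """
    if total_rows == 0:
        return []
    chunk = -(-total_rows // num_partitions)  # ceiling division
    return [
        f"{base_query} LIMIT {chunk} OFFSET {offset}"
        for offset in range(0, total_rows, chunk)
    ]


# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"v{i}") for i in range(10)])

total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
queries = partition_queries("SELECT id, value FROM items ORDER BY id", total, 3)

# The queries run sequentially here; a parallel reader would hand each one to a worker.
chunks = [conn.execute(q).fetchall() for q in queries]
rows = [row for chunk in chunks for row in chunk]
```

Because each query covers a distinct window of the ordered result, the chunks can be fetched independently and concatenated in query order to rebuild the full table.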
Backend enhancements + Performance 🚀
- Add fast track for empty mask computation (#565)
- Change `QueryCompiler.view` to use index-based lookup (#566)
Dependencies 🔗
- Move `sqlalchemy` import statement in experimental io (#498)
- pin pytables version in tests to avoid dependency mismatches (#500)
- Update numpy version to 1.16 (#506)
- Bump pandas version to 0.24.2 (#509)
- Update Ray version to 0.6.6 (#567)
Testing and Code Quality (📈 + 💯)
- Refactor and rename files to be more descriptive (#496)
- Add stress tests for modin (#481)
- Refactor to move all ci related files to ci/ (#479)
- Refactor the QueryCompiler module to separate backends (#510)
- Formatting with black (#527)
- Fix Travis incompatibility (#534)
- Dtype cleanup (#570)
Regressions ↩️
- Fix single-column DataFrame index on MapReduce operations (#580)
- Fix drop after transpose (#582)
- Add support for concat with empty DataFrames and new Series (#584)
- Fix Series.getitem for bool indexers and slices (#591)
- Fix binary operation after transpose (#589)
- Fix indexing on empty_series (#596)
- Correctly compute `reindex` after a transpose (#600)
- Correctly apply Series functions element-wise for correct cases (#598)
- Fix regression in constructor for lists/dicts of Series (#602)
- Fix dtype checking if other is a scalar (#604)
Contributors this release
The following users contributed code to Modin since the last release.
@ipacheco-uy (First time contributor) 🌟
@pcmoritz (First time contributor) 🌟
@wuisawesome (First time contributor) 🌟
@williamma12 (Committer)
@eavidan (Committer)
@devin-petersohn (Admin)
🎉🎉 Thank you! 🎉🎉
Modin 0.4.0
Modin 0.4.0 release notes
This release includes many minor bugfixes and an update to pandas 0.24.
Bugfixes + Pandas Concordance (🐛 + 🐼)
- Correct support for `drop_duplicates` (#466)
- Fix issue by properly handling parse_dates (#473)
- Correctly match pandas behavior when iteratively updating columns (#488)
- Add correct support for `include="all"` (#486)
User experience 👤
New functionality ⭐️
- parallel `to_sql()` (#461)
Backend enhancements + Performance 🚀
- Remove arguments causing errors with Ray 0.7.0 (#472)
- Create Base Query Compiler object for query compilers (#448)
- Fix inheritance of constructor for PandasQueryCompilerView (#478)
Testing and Code Quality (📈 + 💯)
- Add additional testing for API compatibility. (#480)
- Remove dead code (#493)
- Improve test coverage (#491)
Contributors this release
The following users contributed code to Modin since the last release.
@aglove2189 (First time contributor) 🌟
@williamma12 (Committer)
@eavidan (Committer)
@devin-petersohn (Admin)
🎉🎉 Thank you! 🎉🎉
Modin 0.4.0rc1
Modin 0.4.0rc1 release notes
This is a release candidate. Please test with existing workflows to ensure correctness.
Bugfixes + Pandas Concordance (🐛 + 🐼)
User experience 👤
- Update pandas version to 0.24 (#451) 🎉
New functionality ⭐️
- parallel `to_sql()` (#461)
Backend enhancements + Performance 🚀
- Remove arguments causing errors with Ray 0.7.0 (#472)
- Create Base Query Compiler object for query compilers (#448)
- Fix inheritance of constructor for PandasQueryCompilerView (#478)
Testing 📈
- Add additional testing for API compatibility. (#480)
Contributors this release
The following users contributed code to Modin since the last release.
@williamma12 (Committer)
@eavidan (Committer)
@devin-petersohn (Admin)
🎉🎉 Thank you! 🎉🎉