Modin 0.8.1

@devin-petersohn released this 30 Sep 23:46
0.8.1
5b6f73a
Modin 0.8.1 release notes

The Modin 0.8.1 release contains a large amount of new functionality and
many bugfixes. Additionally, much of the effort in this release went into
improving code quality and the testing infrastructure for Modin
developers. This is the first release that can be used with OmniSci as a
compute backend (experimentally).

Bugfixes + Pandas Concordance (🐛 + 🐼)
----------------------------------------
* FIX-#1647: Support repr() on empty Series. (#1859)
* Fix recursion in experimental mode in some cases (#1874)
* FIX-#1674: Series.apply and DataFrame.apply (#1718)
* FIX-#1869: index sort for count(level=...) (#1870)
* FIX-#1497: Don't sort in concat() when sort=False (#1889)
* FIX-#1854: groupby() with arbitrary series (#1886)
* FIX-#1959 #1987: Fix `duplicated` and `drop_duplicates` functions (#1994)
* FEAT-#1285: Add `sem` implementation for `Series` and `DataFrame` (#2048)
* FIX-#2054: Moved non-dependent on modin.DataFrame utils to modin/utils.py (#2055)
* FIX-#2052: fix spawning of remote cluster (#2053)
* FIX-#1918: fix core dumped issue (#2000)
* FIX-#1386: Fix `read_csv` for incorrect csv data (#2076)
* FIX-#1997: Fix `unstack` for MultiIndex with different inner lvl-nodes (#2012)
* FIX-#2080: engine dispatching moved to a separate folder (#2081)
* FEAT-#1957: abstract methods in BaseQueryCompiler replaced to defaults (#2047)
* FIX-#1997 #2084: Fix unstack for case when columns have (#2086)
* FIX-#2069: Add workaround for Python issubclass() quirk (#2070)
* REFACTOR-#2101: avoid unconditional index access in DataFrame.rename (#2102)
* FIX-#2110: get rid of 'NotImplementedError' at OmniSci query compiler (#2112)
* FIX-#1900: Fix bug in groupby when index name is passed by string (#2125)
* BUG-#2127: fix delimiter param for pyarrow based read_csv (#2129)
* FIX-#2145: add cloud dependencies to conda dev environment (#2143)
* FIX-#2148: add note about braceexpand for cloud examples (#2149)
* FIX-#2151: add add_conda_packages for remote omnisci (#2152)
* FIX-#2147: return interval for python micro version - *.*.X (#2146)
* FIX-#2134: Fix mismatch partitioning insertions with same index (#2140)
* FEAT-#2154: generate MultiIndex for columns in groupby.agg (#2155)
* FIX-#1921: Fix `read_excel` when sheet names are non-default (#2159)
* FIX-#2156: improve index name mangling (#2158)
* FIX-#2172: support float32 in calcite serializer (#2173)
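
As an illustration of FEAT-#1285 above, the following is a minimal sketch of the quantity `sem` reports, assuming the pandas default of `ddof=1` (sample standard deviation divided by the square root of the sample size). The helper `standard_error` and the sample data are hypothetical and use only the standard library; this is not Modin's implementation.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean, assuming ddof=1 (the pandas default)."""
    n = len(values)
    # statistics.stdev computes the sample standard deviation (ddof=1).
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical sample data, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
result = standard_error(data)
print(result)
```

With Modin imported as `import modin.pandas as pd`, the equivalent call would be expected to look like `pd.Series(data).sem()`, mirroring the pandas API.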

New Functionality ✨
--------------------
* FEAT-#1881: add scale-out feature dependencies (#1892)
* FEAT-#2058: Improve how remote factories are defined (#2060)
* FEAT-#1871: introduce OmniSci based experimental engine (#2079)
* FEAT-#2108: Save rpyc server output if rpyc logging is on (#2109)
* FEAT-#2085: sync python version between both contexts (#2107)
* FEAT-#1991: Enable OmniSci on cloud (#2119)
* FIX-#1144: Fix `read_parquet` for working with HDFS (#2120)
* FEAT-#1992: enable ETL part of LoanPD bench in cloud (#2106)
* FEAT-#2089: add ability to install additional conda packages (#2117)
* FEAT-#1200: pivot_table implementation (#1669)
* FEAT-#2141: support skew aggregate in omnisci backend (#2142)
* FEAT-#1219, FEAT-#2135: Add `corr` and `cov` (#2130)
* FEAT-#1847: High performance, no shuffle train_test_split (#1848)
* FEAT-#2138: sync modin version between local and remote contexts (#2153)
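
To make the idea behind a "no shuffle" split (FEAT-#1847 above) concrete, here is a conceptual sketch in plain Python. It assumes the usual convention that the test set is the trailing `ceil(test_size * n)` rows and the training set is everything before it. The function name `split_without_shuffle` is hypothetical, and this is not the code Modin ships.

```python
import math


def split_without_shuffle(rows, test_size=0.25):
    """Split rows into (train, test) by position, preserving order.

    No permutation is generated, so no data has to be moved around:
    the split is just two contiguous slices.
    """
    n_test = math.ceil(len(rows) * test_size)
    split_at = len(rows) - n_test
    return rows[:split_at], rows[split_at:]


# Hypothetical data, for illustration only.
rows = list(range(8))
train, test = split_without_shuffle(rows, test_size=0.25)
print(train, test)
```

Because both halves are contiguous slices, a partitioned frame can in principle be split along existing partition boundaries instead of redistributing rows, which is where the performance benefit would come from.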

Code Quality + Testing 💯
-------------------------
* TEST-#1876: Add tests running under experimental (#1877)
* FIX-#1867: establish CI (#1868)
* FIX-#1887: fix versions (#1888)
* TEST-#1865: Add RPyC library in requirements (#1866)
* TEST-2022: speed up prepare-cache job (#2023)
* TEST-#2024: remove test_dataframe.py (#2025)
* TEST-#2020: decrease parallel tests on Ubuntu (#2021)
* TEST-#2030: speed up cache; decrease parallel jobs in push.yml (#2031)
* TEST-#2028: speed up window tests (#2029)
* TEST-#2026: speed up test_join_sort.py (#2027)
* TEST-#2037: speed up test_binary with refactor dataframe.py (#2038)
* TEST-#2044: speed up iter tests (#2045)
* TEST-#2042: speed up udf tests (#2043)
* TEST-#2033: speed up test_series.py (#2034)
* TEST-#2039: speed up default tests (#2040)
* TEST-#2050: decrease number of parallel jobs on Windows CI (#2051)
* TEST-#1891: use conda instead of pip (#2056)
* REFACTOR-#2035: move getitem_array to the backend (#2036)
* REFACTOR-#2083: Rename LISCENSE_HEADER to LICENSE_HEADER. (#2082)
* REFACTOR-#1839: Update pandas dependency and pandas APIs to match (#1840)
* Simulate cluster for testing remote context (#1982)
* Fix test_from_csv for simulated remote case (#2111)
* TEST-#2123: testing of OmniSci added at CI (#2124)
* FEAT-#2087: Added benchmarks test suite (#2103)
* FEAT-#2136: Added benchmarks for mask generation and indexing (#2137)
* FIX-#2162: exclude test folders from coverage report (#2160)
* REFACTOR-#2170: simplify concat of a single frame (#2171)
* FEAT-#2166: Added benchmarks for DataFrame.merge (#2167)

Backend enhancements + Performance 🚀
-------------------------------------
* FEAT-#1861: Use cloudpickle library for experimental.cloud features (#1862)
* Fix access to special attributes in experimental mode (#1875)
* REFACTOR-#2011: move default_to_pandas in groupby to backend (#2041)
* FIX-#2115: Use `seek` when we don't need to check quotes for CSV (#2116)

Documentation 📃
----------------
* Conda recipe for Modin (#1986)
* FIX-#2131: Add note on `value_counts` for `DataFrame` in the doc (#2132)
* DOC-REFACTOR: 1/n Refactoring the documentation (#2095)
* DOCS-#2068: Provide Jupyter notebooks showing running NYC Taxi (#2168)
* DOCS-#2176: Add plantuml and issues to doc dependencies (#2177)

Dependencies
------------
* FIX-#911: Pin Dask Dependency for Python 3.8 compatibility (#1846)
* FIX-#2090: Do not always require pyarrow (#2126)

Contributors this release
-------------------------

The following users contributed code to Modin since the last release.

@abykovsk (First Time contributor) ⭐️
@anton-malakhov (First Time contributor) ⭐️
@heuermh (First Time contributor) ⭐️
@ienkovich (First Time contributor) ⭐️
@itamarst (Returning contributor) 🌟
@prutskov (Returning contributor) 🌟
@amyskov (Returning contributor) 🌟
@vnlitvinov (Returning contributor) 🌟
@dchigarev (Returning contributor) 🌟
@YarShev (Returning contributor) 🌟
@anmyachev (Returning contributor) 🌟
@gshimansky (Returning contributor) 🌟
@devin-petersohn (Maintainer)