Modin 0.8.1
devin-petersohn released this 30 Sep 23:46
Modin 0.8.1 release notes

The Modin 0.8.1 release contains a large amount of new functionality and many bugfixes. A large share of the effort in this release also went into improving code quality and the testing infrastructure for Modin developers. This is the first release that can be used, experimentally, with OmniSci as a compute backend.

Bugfixes + Pandas Concordance (🐛 + 🐼)
----------------------------------------

* FIX-#1647: Support repr() on empty Series. (#1859)
* Fix recursion in experimental mode in some cases (#1874)
* FIX-#1674: Series.apply and DataFrame.apply (#1718)
* FIX-#1869: index sort for count(level=...) (#1870)
* FIX-#1497: Don't sort in concat() when sort=False (#1889)
* FIX-#1854: groupby() with arbitrary series (#1886)
* FIX-#1959 #1987: Fix `duplicated` and `drop_duplicates` functions (#1994)
* FEAT-#1285: Add `sem` implementation for `Series` and `DataFrame` (#2048)
* FIX-#2054: Moved non-dependent on modin.DataFrame utils to modin/utils.py (#2055)
* FIX-#2052: fix spawning of remote cluster (#2053)
* FIX-#1918: fix core dumped issue (#2000)
* FIX-#1386: Fix `read_csv` for incorrect csv data (#2076)
* FIX-#1997: Fix `unstack` for MultiIndex with different inner lvl-nodes (#2012)
* FIX-#2080: engine dispatching moved to a separate folder (#2081)
* FEAT-#1957: abstract methods in BaseQueryCompiler replaced to defaults (#2047)
* FIX-#1997 #2084: Fix unstack for case when columns have (#2086)
* FIX-#2069: Add workaround for Python issubclass() quirk (#2070)
* REFACTOR-#2101: avoid unconditional index access in DataFrame.rename (#2102)
* FIX-#2110: get rid of 'NotImplementedError' at OmniSci query compiler (#2112)
* FIX-#1900: Fix bug in groupby when index name is passed by string (#2125)
* BUG-#2127: fix delimiter param for pyarrow based read_csv (#2129)
* FIX-#2145: add cloud dependencies to conda dev environment (#2143)
* FIX-#2148: add note about braceexpand for cloud examples (#2149)
* FIX-#2151: add add_conda_packages for remote omnisci (#2152)
* FIX-#2147: return interval for python micro version - *.*.X (#2146)
* FIX-#2134: Fix mismatch partitioning insertions with same index (#2140)
* FEAT-#2154: generate MultiIndex for columns in groupby.agg (#2155)
* FIX-#1921: Fix `read_excel` when sheet names are non-default (#2159)
* FIX-#2156: improve index name mangling (#2158)
* FIX-#2172: support float32 in calcite serializer (#2173)

New Functionality ✨
--------------------

* FEAT-#1881: add scale-out feature dependencies (#1892)
* FEAT-#2058: Improve how remote factories are defined (#2060)
* FEAT-#1871: introduce OmniSci based experimental engine (#2079)
* FEAT-#2108: Save rpyc server output if rpyc logging is on (#2109)
* FEAT-#2085: sync python version between both contexts (#2107)
* FEAT-#1991: Enable OmniSci on cloud (#2119)
* FIX-#1144: Fix `read_parquet` for working with HDFS (#2120)
* FEAT-#1992: enable ETL part of LoanPD bench in cloud (#2106)
* FEAT-#2089: add ability to install additional conda packages (#2117)
* FEAT-#1200: pivot_table implementation (#1669)
* FEAT-#2141: support skew aggregate in omnisci backend (#2142)
* FEAT-#1219, FEAT-#2135: Add `corr` and `cov` (#2130)
* FEAT-#1847: High performance, no shuffle train_test_split (#1848)
* FEAT-#2138: sync modin version between local and remote contexts (#2153)

Code Quality + Testing 💯
-------------------------

* TEST-#1876: Add tests running under experimental (#1877)
* FIX-#1867: establish CI (#1868)
* FIX-#1887: fix versions (#1888)
* TEST-#1865: Add RPyC library in requirements (#1866)
* TEST-2022: speed up prepare-cache job (#2023)
* TEST-#2024: remove test_dataframe.py (#2025)
* TEST-#2020: decrease parallel tests on Ubuntu (#2021)
* TEST-#2030: speed up cache; decrease parallel jobs in push.yml (#2031)
* TEST-#2028: speed up window tests (#2029)
* TEST-#2026: speed up test_join_sort.py (#2027)
* TEST-#2037: speed up test_binary with refactor dataframe.py (#2038)
* TEST-#2044: speed up iter tests (#2045)
* TEST-#2042: speed up udf tests (#2043)
* TEST-#2033: speed up test_series.py (#2034)
* TEST-#2039: speed up default tests (#2040)
* TEST-#2050: decrease number of parallel jobs on windows Ci (#2051)
* TEST-#1891: use conda instead of pip (#2056)
* REFACTOR-#2035: move getitem_array to the backend (#2036)
* REFACTOR-#2083: Rename LISCENSE_HEADER to LICENSE_HEADER. (#2082)
* REFACTOR-#1839: Update pandas dependency and pandas APIs to match (#1840)
* Simulate cluster for testing remote context (#1982)
* Fix test_from_csv for simulated remote case (#2111)
* TEST-#2123: testing of OmniSci added at CI (#2124)
* FEAT-#2087: Added benchmarks test suite (#2103)
* FEAT-#2136: Added benchmarks for mask generation and indexing (#2137)
* FIX-#2162: exclude test folders from coverage report (#2160)
* REFACTOR-#2170: simplify concat of a single frame (#2171)
* FEAT-#2166: Added benchmarks for DataFrame.merge (#2167)

Backend enhancements + Performance 🚀
-------------------------------------

* FEAT-#1861: Use cloudpickle library for experimental.cloud features (#1862)
* Fix access to special attributes in experimental mode (#1875)
* REFACTOR-#2011: move default_to_pandas in groupby to backend (#2041)
* FIX-#2115: Use `seek` when we don't need to check quotes for CSV (#2116)

Documentation 📃
----------------

* Conda recipe for Modin (#1986)
* FIX-#2131: Add note on `value_counts` for `DataFrame` in the doc (#2132)
* DOC-REFACTOR: 1/n Refactoring the documentation (#2095)
* DOCS-#2068: Provide Jupyter notebooks showing running NYC Taxi (#2168)
* DOCS-#2176: Add plantuml and issues to doc dependencies (#2177)

Dependencies
------------

* FIX-#911: Pin Dask Dependency for Python 3.8 compatibility (#1846)
* FIX-#2090: Do not always require pyarrow (#2126)

Contributors this release
-------------------------

The following users contributed code to Modin since the last release.

* @abykovsk (First Time contributor) ⭐️
* @anton-malakhov (First Time contributor) ⭐️
* @heuermh (First Time contributor) ⭐️
* @ienkovich (First Time contributor) ⭐️
* @itamarst (Returning contributor) 🌟
* @prutskov (Returning contributor) 🌟
* @amyskov (Returning contributor) 🌟
* @vnlitvinov (Returning contributor) 🌟
* @dchigarev (Returning contributor) 🌟
* @YarShev (Returning contributor) 🌟
* @anmyachev (Returning contributor) 🌟
* @gshimansky (Returning contributor) 🌟
* @devin-petersohn (Maintainer)