Modin 0.8.0
devin-petersohn released this 29 Jul 23:16
Modin 0.8.0 release notes
=========================

The Modin 0.8.0 release is one of the biggest releases yet, and includes several bugfixes and new functionality, highlighted below.

One of the key new features is the ability to spawn and run Modin code on a cluster via a new experimental cloud API. This API allows you to switch between running on your laptop and running in the cloud, across multiple clusters. The API is as simple as:

```
import modin.pandas as pd
from modin.experimental.cloud import cluster

example_cluster = cluster.create("aws", "aws_credentials")
with example_cluster:
    remote_df = pd.DataFrame([1, 2, 3, 4])
    print(len(remote_df))  # len() is executed remotely

local_df = pd.read_csv("my.csv")
print(len(local_df))
```

With this simple API, data scientists have more power at their fingertips.

A high-level overview of the major bugfixes and new functionality can be found below.

Bugfixes + Pandas Concordance (🐛 + 🐼)
----------------------------------------

* Level parameter for kurt function implementation (#1567)
* Fix of issue #1462: groupby_agg ignores exceptions (#1703)
* Fix AttributeError: module 'numpy.random' has no attribute 'randomState' (#1707)
* Correctly handle mismatched quotes and csv.QUOTE_NONE flag. (#1555)
* Fix for Series.attrs and Series.array (#1717)
* Fix #1683 - losing index names in pd.concat (#1684)
* Use low-level api for kurt function implementation with defined level parameter (#1719)
* Fix of inconsistent indices (#1727)
* Fix support for callable in loc/iloc (#1776)
* Fix support for nested assignment with `loc`/`iloc` (#1788)
* Fix support for `loc` with MultiIndex parameter (#1789)
* Fix metadata for concat and mask when `axis=1` (#1797)
* Fix unlimited column printing for smaller dataframes (#1799)
* Fix visual bug with repr on smaller dataframes (#1798)
* Fix support for cummax and cummin across int and float (#1800)
* Fix support for dictionary in `pd.concat` (#1795)
* Series.reset_index considering 'name' fix (#1820)
* `to_pandas` of nested objects added (#1828)
* Don't sort indexes in Series functions with level parameter (#1830)
* Fix result of `Series.dt.components/freq/tz` (#1730)
* Groupby on categories fixed (#1802)
* product/sum incorrect behavior of 'min_count' fixed (#1827)
* Support for groupby() with original Series in by list. (#1842)
* make 'sort_index' consider axis parameter (#1858)
* properly process UDFs (#1845)

New Functionality ✨
--------------------

* Add implementation of `resample` for Series and DataFrame (#1625)
* Add `merge` implementation for `DataFrame` and as free function (#1695)
* melt implementation (#1689)
* Enable running Modin via remote Ray on spawned cluster (#1818) 🎉

Code Quality + Testing 💯
-------------------------

* Move logic of `sort_values` into the query compiler (#1754)
* Add commitlint check on pull requests (#1760)
* REFACTOR-#1763: Move logic of `merge` (#1764)
* Limit object store to 1GB during CI tests (#1744)

Backend enhancements + Performance 🚀
-------------------------------------

* Improve performance of slice indexing (#1753)
* Update iterator implementation to `iloc` (#1599)
* Speed up RPyC connection (#1833)

Documentation 📃
----------------

* Fix missing links in the architecture page (#1810)
* add runner of taxi benchmark as example (#1836)
* Add notes about using MODIN_SOCKS_PROXY variable (#1817)
* add runner of h2o benchmark as example (#1856)

Contributors this release
-------------------------

The following users contributed code to Modin since the last release.

* @hwsamuel (First Time contributor) ⭐️
* @ikedaosushi (First Time contributor) ⭐️
* @itamarst (First Time contributor) ⭐️
* @pratheekrebala (First Time contributor) ⭐️
* @prutskov (Returning contributor) 🌟
* @amyskov (Returning contributor) 🌟
* @vnlitvinov (Returning contributor) 🌟
* @dchigarev (Returning contributor) 🌟
* @YarShev (Returning contributor) 🌟
* @anmyachev (Returning contributor) 🌟
* @gshimansky (Returning contributor) 🌟
* @devin-petersohn (Maintainer)

🎉🎉 Thank you! 🎉🎉