Merge pull request #6 from pwwang/dev
0.0.4
pwwang authored May 7, 2021
2 parents b714ab2 + 8a73146 commit 269af11
Showing 23 changed files with 2,287 additions and 885 deletions.
4 changes: 3 additions & 1 deletion .pre-commit-config.yaml
@@ -4,9 +4,11 @@ repos:
rev: 5df1a4bf6f04a1ed3a643167b38d502575e29aef
hooks:
- id: trailing-whitespace
exclude: 'docs/'
exclude: 'docs/|README\.rst'
- id: end-of-file-fixer
exclude: 'docs/|README\.rst'
- id: check-yaml
exclude: 'mkdocs.yml'
- repo: local
hooks:
- id: masterpylintrc
19 changes: 17 additions & 2 deletions README.md
@@ -1,19 +1,22 @@
# datar

Port of R data packages (especially from tidyverse): [tidyr][1], [dplyr][2], [tibble][4] and so on in python, using [pipda][3].
Port of [dplyr][2] and other related R packages in python, using [pipda][3].

Unlike other similar packages in python that just mimic the piping sign, `datar` follows the API designs from the original packages as possible. So that nearly no extra effort is needed for those who are familar with those R packages to transition to python.

<!-- badges -->
[![Pypi][6]][7] [![Github][8]][9] ![Building][10] [![Docs and API][11]][5] [![Codacy][12]][13] [![Codacy coverage][14]][13]

[Documentation][5]
[Documentation][5] | [Reference Maps][15] | [Notebook Examples][16] | [API][17]

## Installtion

```shell
pip install -U datar
```

`datar` requires python 3.7.1+ and is backended by `pandas (1.2+)`.

## Example usage

```python
@@ -92,3 +95,15 @@ iris >> pull(f.Sepal_Length) >> dist_plot()
[3]: https://github.com/pwwang/pipda
[4]: https://tibble.tidyverse.org/index.html
[5]: https://pwwang.github.io/datar/
[6]: https://img.shields.io/pypi/v/datar?style=flat-square
[7]: https://pypi.org/project/datar/
[8]: https://img.shields.io/github/v/tag/pwwang/datar?style=flat-square
[9]: https://github.com/pwwang/datar
[10]: https://img.shields.io/github/workflow/status/pwwang/datar/Build%20and%20Deploy?style=flat-square
[11]: https://img.shields.io/github/workflow/status/pwwang/datar/Build%20Docs?label=Docs&style=flat-square
[12]: https://img.shields.io/codacy/grade/3d9bdff4d7a34bdfb9cd9e254184cb35?style=flat-square
[13]: https://app.codacy.com/gh/pwwang/datar
[14]: https://img.shields.io/codacy/coverage/3d9bdff4d7a34bdfb9cd9e254184cb35?style=flat-square
[15]: https://pwwang.github.io/datar/reference_maps/ALL/
[16]: https://pwwang.github.io/datar/notebooks/across/
[17]: https://pwwang.github.io/datar/api/datar/
130 changes: 130 additions & 0 deletions README.rst
@@ -0,0 +1,130 @@
.. role:: raw-html-m2r(raw)
:format: html


datar
=====

Port of `dplyr <https://dplyr.tidyverse.org/index.html>`_ and other related R packages in python, using `pipda <https://github.com/pwwang/pipda>`_.

Unlike other similar packages in python that just mimic the piping sign, ``datar`` follows the API designs from the original packages as possible. So that nearly no extra effort is needed for those who are familar with those R packages to transition to python.

:raw-html-m2r:`<!-- badges -->`
`
.. image:: https://img.shields.io/pypi/v/datar?style=flat-square
:target: https://img.shields.io/pypi/v/datar?style=flat-square
:alt: Pypi
<https://pypi.org/project/datar/>`_ `
.. image:: https://img.shields.io/github/v/tag/pwwang/datar?style=flat-square
:target: https://img.shields.io/github/v/tag/pwwang/datar?style=flat-square
:alt: Github
<https://github.com/pwwang/datar>`_
.. image:: https://img.shields.io/github/workflow/status/pwwang/datar/Build%20and%20Deploy?style=flat-square
:target: https://img.shields.io/github/workflow/status/pwwang/datar/Build%20and%20Deploy?style=flat-square
:alt: Building
`
.. image:: https://img.shields.io/github/workflow/status/pwwang/datar/Build%20Docs?label=Docs&style=flat-square
:target: https://img.shields.io/github/workflow/status/pwwang/datar/Build%20Docs?label=Docs&style=flat-square
:alt: Docs and API
<https://pwwang.github.io/datar/>`_ `
.. image:: https://img.shields.io/codacy/grade/3d9bdff4d7a34bdfb9cd9e254184cb35?style=flat-square
:target: https://img.shields.io/codacy/grade/3d9bdff4d7a34bdfb9cd9e254184cb35?style=flat-square
:alt: Codacy
<https://app.codacy.com/gh/pwwang/datar>`_ `
.. image:: https://img.shields.io/codacy/coverage/3d9bdff4d7a34bdfb9cd9e254184cb35?style=flat-square
:target: https://img.shields.io/codacy/coverage/3d9bdff4d7a34bdfb9cd9e254184cb35?style=flat-square
:alt: Codacy coverage
<https://app.codacy.com/gh/pwwang/datar>`_

`Documentation <https://pwwang.github.io/datar/>`_ | `Reference Maps <https://pwwang.github.io/datar/reference_maps/ALL/>`_ | `Notebook Examples <https://pwwang.github.io/datar/notebooks/across/>`_ | `API <https://pwwang.github.io/datar/api/datar/>`_

Installtion
-----------

.. code-block:: shell

   pip install -U datar

``datar`` requires python 3.7.1+ and is backended by ``pandas (1.2+)``.

Example usage
-------------

.. code-block:: python

   from datar import f
   from datar.dplyr import mutate, filter, if_else
   from datar.tibble import tibble

   df = tibble(
       x=range(4),
       y=['zero', 'one', 'two', 'three']
   )
   df >> mutate(z=f.x)
   """# output
      x      y  z
   0  0   zero  0
   1  1    one  1
   2  2    two  2
   3  3  three  3
   """

   df >> mutate(z=if_else(f.x>1, 1, 0))
   """# output:
      x      y  z
   0  0   zero  0
   1  1    one  0
   2  2    two  1
   3  3  three  1
   """

   df >> filter(f.x>1)
   """# output:
      x      y
   0  2    two
   1  3  three
   """

   df >> mutate(z=if_else(f.x>1, 1, 0)) >> filter(f.z==1)
   """# output:
      x      y  z
   0  2    two  1
   1  3  three  1
   """

.. code-block:: python

   # works with plotnine
   import numpy
   from datar.base import sin, pi
   from plotnine import ggplot, aes, geom_line, theme_classic

   df = tibble(x=numpy.linspace(0, 2*pi, 500))
   (df >>
     mutate(y=sin(f.x), sign=if_else(f.y>=0, "positive", "negative")) >>
     ggplot(aes(x='x', y='y')) + theme_classic()
   ) + geom_line(aes(color='sign'), size=1.2)

.. image:: ./example.png
   :target: ./example.png
   :alt: example


.. code-block:: python

   # very easy to integrate with other libraries
   # for example: klib
   import klib
   from pipda import register_verb
   from datar.datasets import iris
   from datar.dplyr import pull

   dist_plot = register_verb(func=klib.dist_plot)
   iris >> pull(f.Sepal_Length) >> dist_plot()

.. image:: ./example2.png
   :target: ./example2.png
   :alt: example

2 changes: 1 addition & 1 deletion datar/__init__.py
@@ -3,4 +3,4 @@
from .core import operator as _datar_operator
from .core.defaults import f

__version__ = '0.0.3'
__version__ = '0.0.4'
31 changes: 19 additions & 12 deletions datar/base/funcs.py
@@ -397,22 +397,25 @@ def c(*elems: Any) -> Collection:
    return Collection(*elems)

@register_func(None, context=Context.EVAL)
def seq_along(along_with: Iterable[Any]) -> SeriesLikeType:
def seq_along(
        along_with: Iterable[Any],
        _base0: bool = False
) -> SeriesLikeType:
    """Generate sequences along an iterable"""
    return numpy.array(range(len(along_with))) + 1
    return numpy.array(range(len(along_with))) + int(not _base0)

@register_func(None, context=Context.EVAL)
def seq_len(length_out: IntOrIter) -> SeriesLikeType:
def seq_len(length_out: IntOrIter, _base0: bool = False) -> SeriesLikeType:
    """Generate sequences with the length"""
    if is_scalar(length_out):
        return numpy.array(range(int(length_out)))
        return numpy.array(range(int(length_out))) + int(not _base0)
    if len(length_out) > 1:
        logger.warning(
            "In seq_len(%r) : first element used of 'length_out' argument",
            length_out
        )
    length_out = int(list(length_out)[0])
    return numpy.array(range(length_out))
    return numpy.array(range(length_out)) + int(not _base0)


@register_func(None, context=Context.EVAL)
@@ -421,7 +424,8 @@ def seq(
        to: IntType = None,
        by: IntType = None,
        length_out: IntType = None,
        along_with: IntType = None
        along_with: IntType = None,
        _base0: bool = False
) -> SeriesLikeType:
    """Generate a sequence
@@ -431,24 +435,27 @@ def seq(
    This API is consistent with r-base's seq. 1-based and inclusive.
    """
    if along_with is not None:
        return seq_along(along_with)
        return seq_along(along_with, _base0)
    if from_ is not None and not is_scalar(from_):
        return seq_along(from_)
        return seq_along(from_, _base0)
    if length_out is not None and from_ is None and to is None:
        return seq_len(length_out)

    base = int(not _base0)

    if from_ is None:
        from_ = 1
        from_ = base
    elif to is None:
        from_, to = 1, from_
        from_, to = base, from_

    if length_out is not None:
        by = (float(to) - float(from_)) / float(length_out - 1)

    elif by is None:
        by = 1 if to > from_ else -1
        length_out = to - from_ + 1 if to > from_ else from_ - to + 1
        length_out = to - from_ + base if to > from_ else from_ - to + base
    else:
        length_out = (to - from_ + 1.1 * by) // by
        length_out = (to - from_ + .1 * by + float(base) * by) // by
    return numpy.array([from_ + n * by for n in range(int(length_out))])

@register_func(None, context=Context.EVAL)
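
The effect of the new `_base0` argument in this file can be sketched outside of `datar`. The helpers below are hypothetical stand-ins (the `*_sketch` names are assumptions, and plain Python lists replace the `numpy` arrays used in the diff). They only mirror the offset arithmetic visible in the changed lines: `int(not _base0)` is 1 for R-style 1-based sequences and 0 for 0-based ones, and in the `by` branch the same offset decides whether the end point is included.

```python
def seq_len_sketch(length_out, base0=False):
    """Return `length_out` consecutive integers starting at 1 (or 0 if base0)."""
    offset = int(not base0)  # 1 for 1-based (R-style), 0 for 0-based
    return [i + offset for i in range(int(length_out))]


def seq_along_sketch(along_with, base0=False):
    """Return one index per element of `along_with`."""
    return seq_len_sketch(len(along_with), base0)


def seq_by_sketch(from_, to, by, base0=False):
    """Mirror the `by` branch of seq() as shown in the diff (illustrative only)."""
    base = int(not base0)
    # Same length formula as the added line in the diff.
    length_out = (to - from_ + 0.1 * by + float(base) * by) // by
    return [from_ + n * by for n in range(int(length_out))]


print(seq_len_sketch(3))                   # → [1, 2, 3]
print(seq_len_sketch(3, base0=True))       # → [0, 1, 2]
print(seq_along_sketch("abc"))             # → [1, 2, 3]
print(seq_by_sketch(1, 9, 2))              # → [1, 3, 5, 7, 9]
print(seq_by_sketch(0, 8, 2, base0=True))  # → [0, 2, 4, 6]
```

Under these assumptions, the 1-based form includes the end point while the 0-based form stops before it, which matches Python's usual `range` convention.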
3 changes: 2 additions & 1 deletion datar/dplyr/bind.py
@@ -77,7 +77,7 @@ def data_to_df(data):
if col in dat and not dat[col].isna().all()
]
all_categorical = [
is_categorical(ser) for ser in all_series
is_categorical(ser) or all(pandas.isna(ser)) for ser in all_series
]
if all(all_categorical):
union_cat = union_categoricals(all_series)
@@ -98,6 +98,7 @@ def data_to_df(data):
keys=key_data.keys(),
names=[_id, None]
).reset_index(level=0).reset_index(drop=True)

return pandas.concat(key_data.values()).reset_index(drop=True)

@bind_rows.register(DataFrameGroupBy, context=Context.PENDING)
9 changes: 8 additions & 1 deletion datar/tibble/__init__.py
@@ -1,3 +1,10 @@
"""APIs for R-tibble"""

from .funcs import tibble, tribble, fibble
from .funcs import (
    tibble, tibble_row, tribble, fibble,
    enframe, deframe,
    add_row, add_case, add_column,
    has_rownames, has_index, remove_index, remove_rownames, drop_index,
    rownames_to_column, index_to_column, rowid_to_column,
    column_to_rownames, column_to_index,
)