VS Code extension to transform table from clipboard to R, Python or Julia dataframe

atsyplenkov/pastum

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

pastum allows you to quickly transform any text/HTML table from your clipboard into a dataframe object in your favorite language — R, Python, Julia or JavaScript. Almost all popular frameworks are supported; if something is missing, don't hesitate to raise an issue.

Example usage

Text table to polars (Python)

Using the command palette, insert the copied text table as a Python, R, or Julia object. Select the framework on the go. Just press Ctrl/Cmd+Shift+P, type pastum, and select the preferred option:

Text table to tibble (R)

Alternatively, set the pastum.defaultDataframeR/pastum.defaultDataframePython setting in the VS Code settings and insert the table from the right-click context menu by selecting Pastum: paste as default dataframe. The inserted language-framework pair depends on the editor language (for example, you cannot paste a pandas dataframe into an R file using this command):

Try it Yourself

The table below presents a deliberately awkward case: a mixture of empty cells, strings, integers, and floats. Select it, copy it, and paste it into the IDE. pastum will recognize all types correctly and fill empty cells with the corresponding NA/missing/None/null values.

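| Integer ID | Strings with missing values | Only float | Int and Float |
|------------|-----------------------------|------------|---------------|
| 1          | Javascript                  | 1.43       | 1             |
| 2          | Rust                        | 123,456.78 | 2             |
| 3          |                             | -45        | 3             |
| 4          | Clojure                     | 123456.78  | 4             |
| 5          | Basic                       | -45.65     | 5.5           |
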
# paste it as a tribble object in R
tibble::tribble(
  ~IntegerID, ~StringsWithMissingValues, ~OnlyFloat, ~IntAndFloat,
  1L,         "Javascript",              1.43,       1.0,         
  2L,         "Rust",                    123456.78,  2.0,         
  3L,         NA,                        -45.0,      3.0,         
  4L,         "Clojure",                 123456.78,  4.0,         
  5L,         "Basic",                   -45.65,     5.5
)
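
The type detection shown in this example can be approximated in a few lines of Python. This is only a sketch of the rules described in this README, not pastum's actual implementation: the helper names parse_cell and infer_column are hypothetical, and the default separators (a dot for decimals, a comma for digit groups) are assumed.

```python
import re

# A number is an optional sign, then either plain digits or comma-grouped
# digits (assumed default group separator), then an optional decimal part.
_NUMBER = re.compile(r"^[+-]?(\d+|\d{1,3}(,\d{3})+)(\.\d+)?$")


def parse_cell(text):
    """Return None for an empty cell, an int or float for numbers, else the text."""
    text = text.strip()
    if text == "":
        return None
    if _NUMBER.match(text):
        plain = text.replace(",", "")
        return float(plain) if "." in plain else int(plain)
    return text


def infer_column(cells):
    """Parse one column; promote it to float if any value is a float."""
    values = [parse_cell(c) for c in cells]
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, (int, float)) for v in present):
        if any(isinstance(v, float) for v in present):
            return [None if v is None else float(v) for v in values]
        return values  # every non-empty value is an integer
    # Textual or mixed column: keep the strings, None for empty cells.
    return [None if c.strip() == "" else c.strip() for c in cells]


# The sample table from this section, column by column, as raw clipboard text.
table = {
    "IntegerID": ["1", "2", "3", "4", "5"],
    "StringsWithMissingValues": ["Javascript", "Rust", "", "Clojure", "Basic"],
    "OnlyFloat": ["1.43", "123,456.78", "-45", "123456.78", "-45.65"],
    "IntAndFloat": ["1", "2", "3", "4", "5.5"],
}
columns = {name: infer_column(cells) for name, cells in table.items()}
```

A dictionary of this shape can be handed to a dataframe constructor such as polars.DataFrame(columns) or pandas.DataFrame(columns).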

Installation

The extension is published on both the VS Code Marketplace and the Open VSX Registry. Click Install on either page, or install it from within the IDE:

  1. Start VS Code (or any other Code OSS-based IDE, such as Positron).

  2. Inside VS Code, go to the extensions view either by executing the View: Show Extensions command (click View -> Command Palette...) or by clicking on the extension icon on the left side of the VS Code window.

  3. In the extensions view, simply search for the term pastum in the marketplace search box, then select the extension named Pastum and click the install button.

Alternatively, you can install the latest version from the Releases page: download the latest .vsix file and install it with the Extensions: Install from VSIX... command in the Command Palette.

Features

  • For a complete list of features and example usage, see pastum.anatolii.nz

  • You can use the extension through the command palette (Ctrl/Cmd+Shift+P) or via the right-click context menu. If you rarely switch frameworks, you can specify your favorite one in the settings and always use the Pastum: paste as default dataframe command.

  • The extension mimics the behavior of the {datapasta} R package and detects the main types: strings (character vectors in R), integers, and floats. A numeric column is treated as float if at least one of its values is a float; otherwise, the entire column is treated as integer. By default, a trailing .0 is appended to whole-number values in float columns to comply with polars rules (e.g., numeric values c(1, 2, 3, 4.5) are transformed to c(1.0, 2.0, 3.0, 4.5)).

  • Empty table cells will be replaced with NA, None, or missing values depending on the preferred programming language.

  • By default, column names are converted to PascalCase (e.g., a machine-unfriendly name like 'Long & Ugly column💥' becomes 'LongUglyColumn'). You can choose a different naming convention with the pastum.defaultConvention setting.

  • Since v0.2.0, users can control the decimal separator (e.g., '.' in 12.45) and the digit group separator (e.g., ',' in 1,234) through the pastum.decimalPoint config. By default, a dot (.) is the decimal separator and a comma (,) is the group separator.
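
Two of the conventions in this list, column renaming and configurable separators, can be illustrated with a short Python sketch. The functions to_pascal_case and parse_number and their parameters are hypothetical and written for illustration only; pastum's own implementation, and the exact values accepted by pastum.defaultConvention and pastum.decimalPoint, may differ.

```python
import re


def to_pascal_case(name):
    """Keep only ASCII alphanumeric runs and capitalise the first letter of each."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "".join(w[:1].upper() + w[1:] for w in words)


def parse_number(text, decimal_point=".", group_sep=","):
    """Convert text such as '123,456.78' to a number using the given separators.

    Group separators are stripped first, then the decimal separator is
    normalised to a dot. Raises ValueError if the result is not numeric.
    """
    plain = text.strip().replace(group_sep, "").replace(decimal_point, ".")
    return float(plain) if "." in plain else int(plain)


# Column names from the README examples.
clean = to_pascal_case("Long & Ugly column💥")
header = [to_pascal_case(h) for h in ("Integer ID", "Only float", "Int and Float")]

# Default separators, then the reverse convention (comma decimal, dot grouping).
default_style = parse_number("123,456.78")
comma_decimal_style = parse_number("123.456,78", decimal_point=",", group_sep=".")
```

Stripping the group separator before normalising the decimal separator matters: doing it the other way round would delete the decimal point whenever the two characters are swapped.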

IDE support

The extension has almost zero dependencies and is expected to work with any Code OSS-based IDE. It was tested with VS Code 1.94.2 (the latest release at the time of testing) and a pre-release version of Positron IDE (2024.11.0-69). If you are using VS Code, install it from the VS Code Marketplace; otherwise, use the Open VSX Registry.

Questions and Feature Requests

There's a lot going on with the development of new features in Pastum. If you have any questions or something is not working, feel free to open an issue or start a conversation on BlueSky.

Contributions

Contributions are welcome! If you'd like to contribute, please fork the repository and submit a PR, and I'll merge it.

Acknowledgements

This extension was inspired by the {datapasta} R package created by @MilesMcBain and contributors. However, the implementation in the Code OSS environment was influenced by @coatless and his web app.