Migrate to async requests and update visual response #6

Merged · 20 commits · Jan 9, 2024
30 changes: 0 additions & 30 deletions .github/workflows/build_cli.yml

This file was deleted.

17 changes: 17 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,17 @@
name: Build and publish

on:
  release:
    types: [released]

jobs:
  deploy:

    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - name: "Build and publish to PyPi"
        uses: JRubics/[email protected]
        with:
          pypi_token: ${{ secrets.PYPI_TOKEN }}
36 changes: 36 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,36 @@
name: Integration Tests

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 4
      matrix:
        python-version: ['3.8', '3.9', '3.10', '3.11']

    env:
      PORT: 8080
    steps:
      - name: "Checkout"
        uses: "actions/checkout@v4"
      - name: Run Dataverse Action
        id: dataverse
        uses: gdcc/dataverse-action@main
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python3 -m pip install --upgrade pip
          python3 -m pip install poetry
          poetry install --with test
      - name: Test with pytest
        env:
          API_TOKEN: ${{ steps.dataverse.outputs.api_token }}
          BASE_URL: ${{ steps.dataverse.outputs.base_url }}
          DVUPLOADER_TESTING: "true"
        run: |
          python3 -m poetry run pytest
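
The final step hands the dataverse-action outputs to pytest through the `API_TOKEN` and `BASE_URL` environment variables. As a hedged sketch (not the repository's actual test code), a fixture could consume them like so:

```python
# Hypothetical conftest.py sketch: reads the credentials the workflow
# exports from the dataverse-action outputs. Fixture name is illustrative.
import os

import pytest


@pytest.fixture
def dataverse_credentials() -> dict:
    return {
        "base_url": os.environ["BASE_URL"],
        "api_token": os.environ["API_TOKEN"],
    }
```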
31 changes: 16 additions & 15 deletions README.md
@@ -1,6 +1,9 @@
-<p align="center">
-  <h1 align="center">Python DVUploader</h1>
-</p>
+<h1 align="center">
+  Dataverse Uploader<br/>
+  <a href="https://badge.fury.io/py/dvuploader"><img src="https://badge.fury.io/py/dvuploader.svg" alt="PyPI version" height="18"></a>
+  <img src="https://img.shields.io/badge/python-3.8|3.9|3.10|3.11-blue.svg" alt="Python Badge">
+  <img src="https://github.com/gdcc/python-dvuploader/actions/workflows/test.yml/badge.svg" alt="Build Badge">
+</h1>

Python equivalent to the [DVUploader](https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader) written in Java. Complements other libraries written in Python and facilitates the upload of files to a Dataverse instance via [Direct Upload](https://guides.dataverse.org/en/latest/developers/s3-direct-upload-api.html).

@@ -12,9 +15,7 @@ Python equivalent to the [DVUploader](https://github.com/GlobalDataverseCommunit

-----

-<p align="center">
-  <img src="./static/demo.gif" width="600"/>
-</p>
+https://github.com/gdcc/python-dvuploader/assets/30547301/671131b1-d188-4433-9f77-9ec0ed2af36e

-----

@@ -41,23 +42,27 @@ python3 -m pip install .
In order to perform a direct upload, you need to have a Dataverse instance running and a cloud storage provider. The following example shows how to upload files to a Dataverse instance. Simply provide the files of interest and utilize the `upload` method of a `DVUploader` instance.

```python
-from dvuploader import DVUploader, File
+import dvuploader as dv


# Add file individually
files = [
-    File(filepath="./small.txt"),
-    File(directoryLabel="some/dir", filepath="./medium.txt"),
-    File(directoryLabel="some/dir", filepath="./big.txt"),
+    dv.File(filepath="./small.txt"),
+    dv.File(directoryLabel="some/dir", filepath="./medium.txt"),
+    dv.File(directoryLabel="some/dir", filepath="./big.txt"),
+    *dv.add_directory("./data"),  # Add an entire directory
]

DV_URL = "https://demo.dataverse.org/"
API_TOKEN = "XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
PID = "doi:10.70122/XXX/XXXXX"

-dvuploader = DVUploader(files=files)
+dvuploader = dv.DVUploader(files=files)
dvuploader.upload(
    api_token=API_TOKEN,
    dataverse_url=DV_URL,
    persistent_id=PID,
+    n_parallel_uploads=2,  # Whatever your instance can handle
)
```

@@ -109,7 +114,3 @@ The `config` file can then be used as follows:
```bash
dvuploader --config-path config.yml
```

-#### CLI Binaries
-
-DVUploader ships with binaries for Linux, MacOS and Windows. You can download the binaries from the `bin` [directory](./bin) and use them in a similar fashion as described above.
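
For reference, a minimal `config.yml` for the command above might look like this — a sketch based only on the `CliInput` model defined in `dvuploader/cli.py` below; every value is a placeholder:

```yaml
# Hypothetical example config; replace the values with your own.
api_token: XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
dataverse_url: https://demo.dataverse.org/
persistent_id: doi:10.70122/XXX/XXXXX
files:
  - filepath: ./small.txt
  - filepath: ./medium.txt
    directoryLabel: some/dir
```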
Binary file removed bin/dvuploader-macos-latest
Binary file not shown.
Binary file removed bin/dvuploader-ubuntu-latest
Binary file not shown.
Binary file removed bin/dvuploader-windows-latest.exe
Binary file not shown.
5 changes: 5 additions & 0 deletions dvuploader/__init__.py
@@ -1,2 +1,7 @@
from .dvuploader import DVUploader
from .file import File
+from .utils import add_directory
+
+import nest_asyncio
+
+nest_asyncio.apply()
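
`nest_asyncio.apply()` is what lets the now-async upload machinery be driven from environments that already run an event loop, such as Jupyter. A minimal sketch of the effect, independent of this package:

```python
# Without nest_asyncio.apply(), the inner asyncio.run() would raise
# "RuntimeError: asyncio.run() cannot be called from a running event loop".
import asyncio

import nest_asyncio

nest_asyncio.apply()


async def inner() -> str:
    return "ok"


async def outer() -> str:
    # Re-entering the loop only works because nest_asyncio patched it.
    return asyncio.run(inner())


print(asyncio.run(outer()))  # -> "ok"
```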
19 changes: 14 additions & 5 deletions dvuploader/checksum.py
@@ -3,11 +3,22 @@
import os
from typing import Callable

-from pydantic import BaseModel, Field
+from pydantic import BaseModel, ConfigDict, Field


from enum import Enum
import hashlib


class ChecksumTypes(Enum):
-    """Enum class representing different types of checksums."""
+    """Enum class representing different types of checksums.
+
+    Attributes:
+        SHA1: Represents the SHA-1 checksum algorithm.
+        MD5: Represents the MD5 checksum algorithm.
+        SHA256: Represents the SHA-256 checksum algorithm.
+        SHA512: Represents the SHA-512 checksum algorithm.
+    """

    SHA1 = ("SHA-1", hashlib.sha1)
    MD5 = ("MD5", hashlib.md5)
@@ -23,8 +34,7 @@ class Checksum(BaseModel):
        value (str): The value of the checksum.
    """

-    class Config:
-        allow_population_by_field_name = True
+    model_config = ConfigDict(populate_by_name=True)

    type: str = Field(..., alias="@type")
    value: str = Field(..., alias="@value")
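
`model_config = ConfigDict(populate_by_name=True)` is the pydantic v2 replacement for the removed `class Config: allow_population_by_field_name = True`. A self-contained sketch of what it allows, mirroring the two fields above (the checksum value is a placeholder):

```python
# populate_by_name=True accepts both the field names and their
# "@"-prefixed aliases when constructing the model (pydantic v2).
from pydantic import BaseModel, ConfigDict, Field


class Checksum(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    type: str = Field(..., alias="@type")
    value: str = Field(..., alias="@value")


by_name = Checksum(type="MD5", value="d41d8cd9...")
by_alias = Checksum(**{"@type": "MD5", "@value": "d41d8cd9..."})
assert by_name == by_alias
print(by_name.model_dump(by_alias=True))  # {'@type': 'MD5', '@value': 'd41d8cd9...'}
```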
@@ -62,7 +72,6 @@ def _chunk_checksum(fpath: str, hash_fun: Callable, blocksize=2**20) -> str:
    Returns:
        str: A string representing the checksum of the file.
    """
-
    m = hash_fun()
    with open(fpath, "rb") as f:
        while True:
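
The hunk is truncated inside the read loop; for reference, a self-contained sketch of the chunked-hashing pattern that `_chunk_checksum` implements (the loop body past the truncation point is assumed, not quoted from the repository):

```python
import hashlib
from typing import Callable


def chunk_checksum(fpath: str, hash_fun: Callable = hashlib.md5, blocksize: int = 2**20) -> str:
    """Hash a file in fixed-size blocks so large files never load fully into memory."""
    m = hash_fun()
    with open(fpath, "rb") as f:
        while True:
            buf = f.read(blocksize)
            if not buf:  # EOF reached
                break
            m.update(buf)
    return m.hexdigest()
```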
40 changes: 0 additions & 40 deletions dvuploader/chunkstream.py

This file was deleted.

45 changes: 36 additions & 9 deletions dvuploader/cli.py
@@ -2,7 +2,7 @@
import typer

from pydantic import BaseModel
-from typing import List, Tuple
+from typing import List, Optional
from dvuploader import DVUploader, File


@@ -11,12 +11,13 @@ class CliInput(BaseModel):
    dataverse_url: str
    persistent_id: str
    files: List[File]
+    n_jobs: int = 1


app = typer.Typer()


-def _parse_yaml_config(path: str) -> Tuple[List[File], str, str, str]:
+def _parse_yaml_config(path: str) -> CliInput:
    """
    Parses a configuration file and returns a Class instance
    containing a list of File objects, a persistent ID, a Dataverse URL,
@@ -32,17 +33,31 @@ def _parse_yaml_config(path: str) -> Tuple[List[File], str, str, str]:
    Raises:
        ValueError: If the configuration file is invalid.
    """
-    return CliInput(**yaml.safe_load(open(path)))
+    return CliInput(**yaml.safe_load(open(path)))  # type: ignore


def _validate_inputs(
    filepaths: List[str],
    pid: str,
    dataverse_url: str,
    api_token: str,
-    config_path: str,
+    config_path: Optional[str],
) -> None:
-    if config_path and len(filepaths) > 0:
+    """
+    Validates the inputs for the dvuploader command.
+
+    Args:
+        filepaths (List[str]): List of filepaths to be uploaded.
+        pid (str): Persistent identifier of the dataset.
+        dataverse_url (str): URL of the Dataverse instance.
+        api_token (str): API token for authentication.
+        config_path (Optional[str]): Path to the configuration file.
+
+    Raises:
+        typer.BadParameter: If both a configuration file and a list of filepaths are specified.
+        typer.BadParameter: If neither a configuration file nor metadata parameters are specified.
+    """
+    if config_path is not None and len(filepaths) > 0:
        raise typer.BadParameter(
            "Cannot specify both a JSON/YAML file and a list of filepaths."
        )
@@ -80,15 +95,27 @@ def main(
        default=None,
        help="The URL of the Dataverse repository.",
    ),
-    config_path: str = typer.Option(
+    config_path: Optional[str] = typer.Option(
        default=None,
        help="Path to a JSON/YAML file containing specifications for the files to upload. Defaults to None.",
    ),
    n_jobs: int = typer.Option(
-        default=-1,
+        default=1,
        help="The number of parallel jobs to run. Defaults to 1.",
    ),
):
+    """
+    Uploads files to a Dataverse repository.
+
+    Args:
+        filepaths (List[str]): A list of filepaths to upload.
+        pid (str): The persistent identifier of the Dataverse dataset.
+        api_token (str): The API token for the Dataverse repository.
+        dataverse_url (str): The URL of the Dataverse repository.
+        config_path (Optional[str]): Path to a JSON/YAML file containing specifications for the files to upload. Defaults to None.
+        n_jobs (int): The number of parallel jobs to run. Defaults to 1.
+    """

    _validate_inputs(
        filepaths=filepaths,
        pid=pid,
@@ -113,9 +140,9 @@ def main(
        persistent_id=cli_input.persistent_id,
        dataverse_url=cli_input.dataverse_url,
        api_token=cli_input.api_token,
-        n_jobs=n_jobs,
+        n_parallel_uploads=n_jobs,
    )


if __name__ == "__main__":
-    typer.run(main)
+    app()
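
Given the options defined in `main`, a direct invocation would look roughly like this (hedged: option names are derived from the signature above via typer's usual kebab-case mapping, and all values are placeholders):

```bash
dvuploader ./small.txt ./big.txt \
  --pid doi:10.70122/XXX/XXXXX \
  --api-token XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX \
  --dataverse-url https://demo.dataverse.org/ \
  --n-jobs 2
```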