Add RESTClient #1141

Merged
merged 15 commits into from
Mar 25, 2024

Conversation

burnash
Collaborator

@burnash burnash commented Mar 24, 2024

Description

This pull request introduces a RESTClient class along with supporting components to interact with RESTful APIs. The RESTClient provides:

  • an interface for making HTTP requests,
  • pagination handling (via a set of paginator classes, e.g. HeaderLinkPaginator, JSONResponsePaginator),
  • authentication management (via BearerTokenAuth, HttpBasicAuth, etc.).

Users can customize request handling, response parsing, and data extraction according to the specifics of the target API. RESTClient also features automatic paginator detection via the PaginatorFactory.

Example usage:

import dlt
from dlt.sources.helpers.rest_client import RESTClient

github_client = RESTClient(base_url="https://api.github.com")

@dlt.resource(
    table_name="issues",
    write_disposition="merge",
    primary_key="id",
)
def get_issues(
    updated_at=dlt.sources.incremental(
        "updated_at", initial_value="1970-01-01T00:00:00Z"
    )
):
    for page in github_client.paginate(
        "/repos/dlt-hub/dlt/issues",
        params={
            "since": updated_at.last_value,
            "per_page": 100,
            "sort": "updated",
            "direction": "desc",
            "state": "open",
        },
    ):
        yield page


# The rest of the pipeline remains the same
pipeline = dlt.pipeline(
    pipeline_name="github_issues_merge",
    destination="duckdb",
    dataset_name="github_data_merge",
)
load_info = pipeline.run(get_issues)
print(load_info)
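
To illustrate the pagination idea behind the paginator classes named in the description, here is a minimal standard-library sketch. It is not dlt's implementation: the function names, the `(headers, body)` response shape, the `next` JSON key, and the in-memory `FAKE_API` are all assumptions made for this example. It looks for a `rel="next"` URL in a `Link` header first, falls back to a next-page URL in the JSON body, and follows links until none is found.

```python
import re
from typing import Any, Callable, Dict, Iterator, Optional, Tuple

# Hypothetical response shape for this sketch: (headers, parsed JSON body).
Response = Tuple[Dict[str, str], Any]

# Matches entries such as: <https://host/items?page=2>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_from_link_header(headers: Dict[str, str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header, if present."""
    for url, rel in _LINK_RE.findall(headers.get("Link", "")):
        if rel == "next":
            return url
    return None


def next_from_json_body(body: Any, key: str = "next") -> Optional[str]:
    """Return the next-page URL stored under `key` in a JSON object body."""
    if isinstance(body, dict):
        value = body.get(key)
        return value if isinstance(value, str) else None
    return None


def detect_next_url(response: Response) -> Optional[str]:
    """Try the header strategy first, then the JSON body strategy."""
    headers, body = response
    return next_from_link_header(headers) or next_from_json_body(body)


def paginate(fetch: Callable[[str], Response], url: str) -> Iterator[Any]:
    """Yield each page body, following next links until none is found."""
    next_url: Optional[str] = url
    while next_url is not None:
        response = fetch(next_url)
        yield response[1]
        next_url = detect_next_url(response)


# In-memory stand-in for an HTTP API, so the sketch runs offline.
FAKE_API: Dict[str, Response] = {
    "/issues?page=1": (
        {"Link": '</issues?page=2>; rel="next"'},
        [{"id": 1}, {"id": 2}],
    ),
    "/issues?page=2": ({}, [{"id": 3}]),
}

pages = list(paginate(FAKE_API.__getitem__, "/issues?page=1"))
print(pages)
```

Passing the transport in as a `fetch` callable keeps the link-following logic independent of any HTTP library, which is also what makes the loop easy to test without network access.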

Related Issues

@burnash burnash self-assigned this Mar 24, 2024

netlify bot commented Mar 24, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit b05b20c
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/6601c12cc8e5b100080976d6

@burnash
Collaborator Author

burnash commented Mar 24, 2024

@rudolfix I've put it in dlt.sources.helpers.rest_client module. Open for suggestions.

@burnash burnash added the enhancement New feature or request label Mar 24, 2024
@rudolfix
Collaborator

@burnash what I think is missing for internal testing is a simplified requests-like interface to access the REST client. the idea is to expose it from the requests helper.

from dlt.sources.helpers import requests

requests.paginate(url, method, auth=str | AuthBase | AuthBase(), paginator=str | Paginator | Paginator(), ...)

auth and paginators follow the same convention as destinations, progress, and naming conventions. each can be given as:

  • str: a shorthand alias or type
  • a type or an instance of a type

types have aliases identical to the shorthand, i.e. basic is both the basic auth shorthand alias and a type. basic(user, password) creates a BasicAuthenticator
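
The convention described above (accept a string alias, a type, or an instance) could be resolved roughly as in the sketch below. This is only an illustration of the proposal, not code from the PR: `AuthBase`, `BasicAuth`, `BearerAuth`, `AUTH_ALIASES`, and `resolve_auth` are hypothetical names defined here for the example.

```python
from typing import Any, Dict, Type, Union


class AuthBase:
    """Hypothetical base class for this sketch (not dlt's actual class)."""


class BasicAuth(AuthBase):
    def __init__(self, user: str = "", password: str = "") -> None:
        self.user = user
        self.password = password


class BearerAuth(AuthBase):
    def __init__(self, token: str = "") -> None:
        self.token = token


# Shorthand aliases mapped to the types they stand for (assumed names).
AUTH_ALIASES: Dict[str, Type[AuthBase]] = {
    "basic": BasicAuth,
    "bearer": BearerAuth,
}

AuthSpec = Union[str, Type[AuthBase], AuthBase]


def resolve_auth(spec: AuthSpec, **config: Any) -> AuthBase:
    """Turn an alias, a type, or an instance into an auth instance."""
    # An instance is already configured, so return it unchanged.
    if isinstance(spec, AuthBase):
        return spec
    # A string is looked up in the alias table to obtain a type.
    if isinstance(spec, str):
        if spec not in AUTH_ALIASES:
            raise ValueError(f"unknown auth alias: {spec!r}")
        spec = AUTH_ALIASES[spec]
    # A type is instantiated with the supplied configuration values.
    if isinstance(spec, type) and issubclass(spec, AuthBase):
        return spec(**config)
    raise TypeError(f"unsupported auth spec: {spec!r}")


auth = resolve_auth("basic", user="alice", password="secret")
print(type(auth).__name__, auth.user)
```

The keyword arguments stand in for values that would come from configuration sections, so the same resolver could serve both explicit calls and config-driven setup.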

auth and paginators can be configured

[source.chargebee.paginator]
cursor_field="..."
cursor_param="cursor"

[source.chargebee.auth]
api_key="...."

where to put REST client: current location is good. let's write some code and maybe we'll change it
I'll review the actual code tomorrow

Collaborator

@rudolfix rudolfix left a comment


this is good! we are missing

  1. tests for requests.paginated shorthand
  2. some initial documentation (maybe page in docs but not linked to index yet)

dlt/sources/helpers/rest_client/utils.py (outdated, resolved)
rudolfix previously approved these changes Mar 25, 2024
Collaborator

@rudolfix rudolfix left a comment


LGTM! let's work on it in followup PRs

@burnash
Collaborator Author

burnash commented Mar 25, 2024

@rudolfix

  1. tests for requests.paginated shorthand

There are tests for that in the test_client.py but I think it's best to have a separate file for that.

@burnash
Collaborator Author

burnash commented Mar 25, 2024

@rudolfix

tests for requests.paginated shorthand

Refactored from test_client.py in b1ee49a

Collaborator

@rudolfix rudolfix left a comment


pls see one comment

dlt/sources/helpers/rest_client/auth.py (outdated, resolved)
Collaborator

@rudolfix rudolfix left a comment


LGTM!

@rudolfix rudolfix merged commit b8fb7fd into devel Mar 25, 2024
39 of 47 checks passed
@rudolfix rudolfix deleted the enh/rest_api_client branch March 25, 2024 19:48
Labels
enhancement New feature or request
Projects
Status: Done

2 participants