This library helps you read and write data from most common data sources. It accelerates ML and ETL workflows by removing the need to manage multiple data connectors.
pip install -U dataligo
Install from source
Alternatively, you can clone the latest version from the repository and install it directly from the source code:
pip install -e .
>>> from dataligo import Ligo
>>> from transformers import pipeline
>>> ligo = Ligo('./ligo_config.yaml') # Check the sample_ligo_config.yaml for reference
>>> print(ligo.get_supported_data_sources_list())
['s3',
'gcs',
'azureblob',
'bigquery',
'snowflake',
'redshift',
'starrocks',
'postgresql',
'mysql',
'oracle',
'mssql',
'mariadb',
'sqlite',
'elasticsearch',
'mongodb',
'dynamodb',
'redis']
>>> mongodb = ligo.connect('mongodb')
>>> df = mongodb.read_as_dataframe(database='reviewdb', collection='reviews', return_type='pandas') # The default return_type is pandas
>>> df.head()
_id Review
0 64272bb06a14f52787e0a09e good and interesting
1 64272bb06a14f52787e0a09f This class is very helpful to me. Currently, I...
2 64272bb06a14f52787e0a0a0 like!Prof and TAs are helpful and the discussi...
3 64272bb06a14f52787e0a0a1 Easy to follow and includes a lot basic and im...
4 64272bb06a14f52787e0a0a2 Really nice teacher!I could got the point eazl...
>>> classifier = pipeline("sentiment-analysis")
>>> reviews = df.Review.tolist()
>>> results = classifier(reviews, truncation=True)
>>> for result in results:
...     print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9999
label: POSITIVE, with score: 0.9997
label: POSITIVE, with score: 0.9999
label: POSITIVE, with score: 0.999
label: POSITIVE, with score: 0.9967
>>> df['predicted_label'] = [result['label'] for result in results]
>>> df['predicted_score'] = [round(result['score'], 4) for result in results]
>>> # Write the results back to MongoDB
>>> mongodb.write_dataframe(df, 'reviewdb', 'review_sentiments')
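
For copy-paste convenience, here is the same quickstart as a plain script; it uses only the calls shown in the transcript above.

```python
from dataligo import Ligo
from transformers import pipeline

ligo = Ligo('./ligo_config.yaml')
mongodb = ligo.connect('mongodb')

# Read the reviews into a pandas DataFrame (pandas is the default return_type)
df = mongodb.read_as_dataframe(database='reviewdb', collection='reviews')

# Score each review with an off-the-shelf sentiment-analysis pipeline
classifier = pipeline('sentiment-analysis')
results = classifier(df.Review.tolist(), truncation=True)
df['predicted_label'] = [result['label'] for result in results]
df['predicted_score'] = [round(result['score'], 4) for result in results]

# Write the scored DataFrame back to MongoDB
mongodb.write_dataframe(df, 'reviewdb', 'review_sentiments')
```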
Data Sources | Type | pandas | polars | dask |
---|---|---|---|---|
S3 | datalake | ✅ | ✅ | ✅ |
GCS | datalake | ✅ | ✅ | ✅ |
Azure Blob Storage | datalake | ✅ | ✅ | ✅ |
Snowflake | datawarehouse | ✅ | ✅ | ✅ |
BigQuery | datawarehouse | ✅ | ✅ | ✅ |
StarRocks | datawarehouse | ✅ | ✅ | ✅ |
Redshift | datawarehouse | ✅ | ✅ | ✅ |
PostgreSQL | database | ✅ | ✅ | ✅ |
MySQL | database | ✅ | ✅ | ✅ |
MariaDB | database | ✅ | ✅ | ✅ |
MsSQL | database | ✅ | ✅ | ✅ |
Oracle | database | ✅ | ✅ | ✅ |
SQLite | database | ✅ | ✅ | ✅ |
MongoDB | nosql | ✅ | ✅ | ✅ |
ElasticSearch | nosql | ✅ | ✅ | ✅ |
DynamoDB | nosql | ✅ | ✅ | ✅ |
Redis (beta) | nosql | ✅ | ✅ | ✅ |
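
The three dataframe columns in the table correspond to the values the `return_type` parameter accepts. A minimal sketch, reusing the MongoDB connector from the quickstart and assuming `'polars'` and `'dask'` are passed the same way as `'pandas'`:

```python
# Same read as the quickstart, varying only return_type.
# Assumes 'polars' and 'dask' are accepted exactly as the table above suggests.
pandas_df = mongodb.read_as_dataframe(database='reviewdb', collection='reviews', return_type='pandas')
polars_df = mongodb.read_as_dataframe(database='reviewdb', collection='reviews', return_type='polars')
dask_df = mongodb.read_as_dataframe(database='reviewdb', collection='reviews', return_type='dask')
```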
Some functionalities of DataLigo are inspired by the following packages.

- DataLigo uses ConnectorX to read data from most RDBMS databases, both for its performance benefits and as the inspiration for the `return_type` parameter (see the sketch after this list).
- DataLigo uses dynamo-pandas to read and write data from DynamoDB.
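
A hedged sketch of a SQL read that would go through the ConnectorX-backed path. It assumes a `'postgresql'` connector follows the same `connect()` pattern as the MongoDB example; the `query` parameter name and the table name are illustrative assumptions, not documented API.

```python
from dataligo import Ligo

ligo = Ligo('./ligo_config.yaml')
postgres = ligo.connect('postgresql')  # assumes the same connect() pattern as the quickstart

# 'query' and the table name are assumptions for illustration only
df = postgres.read_as_dataframe(query='SELECT * FROM reviews', return_type='polars')
```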