Release 0.4.21 PR #799

Merged · 69 commits · Oct 26, 2023
Commits (69)
9eb2e96
🚀 Bumped version after release
Rafalz13 Oct 12, 2023
e2d3cc6
✨ Added `validate_df` task
Rafalz13 Oct 13, 2023
5ec1956
📝 Updated Changelog
Rafalz13 Oct 13, 2023
d9bab5f
Update task_utils.py
m-paz Oct 19, 2023
0cc1750
Merge pull request #766 from dyvenia/add_validation_on_df_level
m-paz Oct 23, 2023
c4a661c
🎨 added new field extraction
burzec-dyv Oct 23, 2023
ec87569
Changed CHANGELOG.md
burzec-dyv Oct 23, 2023
1e2d708
Merge pull request #769 from burzekj/genesys_webmsg_fix
m-paz Oct 23, 2023
1e93ea8
✨ Added test for column sum+ tests for validate_df
m-paz Oct 23, 2023
6b87f52
⚡️ Removed unused imports and changed logger
m-paz Oct 24, 2023
c3dd445
⚡️ Cleaned logger
m-paz Oct 24, 2023
5379468
Merge pull request #770 from dyvenia/df_tests_sv_tests
m-paz Oct 24, 2023
2118b6d
🎨 Added `validation_df_dict` param
angelika233 Oct 24, 2023
6979def
🔥 removed unused import
angelika233 Oct 24, 2023
6f6581b
added validate_df parameter to EurostatToADLS
gwieloch Oct 24, 2023
8f5348a
added df validation to aselite
dominikjedlinski Oct 24, 2023
176f46a
added `validate_df` task to mysql flow
gwieloch Oct 24, 2023
60148cb
🎨 changed parameter name to `validate_df_dict`
angelika233 Oct 25, 2023
ff50849
add df validation to BigQueryToADLS
jkrobicki Oct 25, 2023
d6f4e6c
✨ Added validate_df_dict parameter
angelika233 Oct 25, 2023
7db5cbd
Merge pull request #772 from dyvenia/c4c_validate_df
m-paz Oct 25, 2023
62d906b
added df_validation
dominikjedlinski Oct 25, 2023
318a8fa
Merge pull request #775 from dyvenia/mediatool_validate_df
m-paz Oct 25, 2023
2ca685c
✨ Added `validate_df_dict` parameter
angelika233 Oct 25, 2023
9cead23
added validate_df task to VidClubToADLS flow
gwieloch Oct 25, 2023
65c7553
added validate_df task to VidClubToADLS
gwieloch Oct 25, 2023
99cbcec
added validate_df
dominikjedlinski Oct 25, 2023
666e711
Merge branch 'vid_club_validate_df' of https://github.com/gwieloch/vi…
gwieloch Oct 25, 2023
c9cc200
updated validation_task for aselite
dominikjedlinski Oct 25, 2023
25e66b2
✨ Added validate_df_dict param tests
jkrobicki Oct 25, 2023
6dce986
✨ Added df validation task to the flow
jkrobicki Oct 25, 2023
46db7a8
added validate_df task to SAP BW class
gwieloch Oct 25, 2023
f41f523
✨ Added validate_df_dict parameter
angelika233 Oct 25, 2023
cecf39b
Merge pull request #773 from gwieloch/eurostat_validate_df
m-paz Oct 25, 2023
beb6b17
Merge pull request #774 from gwieloch/mysql_validate_df
m-paz Oct 25, 2023
4f482a7
🎨 changed import
angelika233 Oct 25, 2023
ef6002c
added `validate_df` task to `SupermetricsToADLS` flow
gwieloch Oct 25, 2023
971ea8e
Merge pull request #777 from gwieloch/vid_club_validate_df
m-paz Oct 25, 2023
eab1c93
Merge pull request #776 from dyvenia/sap_rfc_validate_df
m-paz Oct 25, 2023
101efa6
added `validate_df` task to `CustomerGaugeToADLS` class
gwieloch Oct 25, 2023
d025ed8
Merge pull request #785 from gwieloch/supermetrics_validate_df
m-paz Oct 25, 2023
a8de722
🎨 Updated validate_df
angelika233 Oct 26, 2023
5e49816
🎨 Added run to task
angelika233 Oct 26, 2023
eb86bdf
Merge pull request #786 from gwieloch/cgauge_validate_df
m-paz Oct 26, 2023
b7f9f56
✅ Added tests for outlook_to_adls flow
jkrobicki Oct 26, 2023
bed6995
Cleaned imports and secrets
jkrobicki Oct 26, 2023
5a5b153
Corrected parameter to validate_df_dict
jkrobicki Oct 26, 2023
d18fabe
Corrected parameter to validate_df_dict
jkrobicki Oct 26, 2023
ad6432f
Corrected validation if statement
jkrobicki Oct 26, 2023
6e396c4
Corrected validation if statement
jkrobicki Oct 26, 2023
80c9689
Merge pull request #778 from dominikjedlinski/hubspot_validate_df
m-paz Oct 26, 2023
15fabd7
Merge pull request #779 from jkrobicki/add_validate_df_to_bidgquery
m-paz Oct 26, 2023
a293f0f
Merge pull request #781 from gwieloch/sapbw_validate_df
m-paz Oct 26, 2023
34e40cb
Delete tests/integration/flows/test_outlook_to_adls.py
m-paz Oct 26, 2023
ee2071d
reverted test_aselite_to_adls
dominikjedlinski Oct 26, 2023
6d857b5
Merge pull request #788 from jkrobicki/outlook_validate_df
m-paz Oct 26, 2023
20db130
Merge pull request #791 from dominikjedlinski/salesforce_validate_df
m-paz Oct 26, 2023
d6590ae
Sharepoint list connector
marcinpurtak Oct 26, 2023
96066e4
Merge pull request #782 from dyvenia/genesys_validate_df
m-paz Oct 26, 2023
7282d4a
Merge pull request #793 from dominikjedlinski/aselite_validate_df
m-paz Oct 26, 2023
fda3c81
Merge branch 'dev' of https://github.com/dyvenia/viadot into sharepoi…
angelika233 Oct 26, 2023
dceee62
✨ Added fetch all
angelika233 Oct 26, 2023
44dc50f
🎨 formatted code
angelika233 Oct 26, 2023
61dc36f
🎨 formatted
angelika233 Oct 26, 2023
fa9b814
🔥 removed connection test
angelika233 Oct 26, 2023
ef1900e
Merge pull request #796 from angelika233/sharepoint_list_formatted
m-paz Oct 26, 2023
4338f55
added order for validation_task for viadot flows
gwieloch Oct 26, 2023
5117a6e
Merge pull request #798 from gwieloch/validate_df_fix
m-paz Oct 26, 2023
6be1ad8
📝 Updated changelog before release
m-paz Oct 26, 2023
12 changes: 10 additions & 2 deletions CHANGELOG.md
@@ -5,12 +5,20 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.4.21] - 2023-10-26
### Added
- Added `validate_df` task to task_utils.
- Added `SharepointList` source class.
- Added `SharepointListToDF` task class.
- Added `SharepointListToADLS` flow class.
- Added tests for `SharepointList`.
- Added `get_nested_dict` to utils.py.

### Fixed

### Changed

- Changed `GenesysToCSV` logic for end_point == "conversations". Added new fields to extraction.

## [0.4.20] - 2023-10-12
### Added
@@ -618,4 +626,4 @@ specified in the `SUPERMETRICS_DEFAULT_USER` secret
- Moved from poetry to pip

### Fixed
- Fix `AzureBlobStorage`'s `to_storage()` method is missing the final upload blob part
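
A note on the headline feature: across the test diffs below, three rule keys appear in `validate_df_dict`: `column_list_to_match`, `column_size`, and `column_unique_values`. The sketch below applies an illustrative reading of those rules to a toy frame; the assertions are assumptions drawn from the tests, not viadot's implementation, whose `validate_df` task lives in task_utils and (as the tests imply) raises `ValidationError` on failure.

import pandas as pd

# Toy frame mirroring the DATA dict from the BigQuery tests below.
df = pd.DataFrame({"type": ["banner", "banner"], "country": ["PL", "DE"]})

rules = {
    "column_list_to_match": ["type", "country"],  # expected column set
    "column_size": {"country": 2},                # per-value string length
    "column_unique_values": ["country"],          # values must be unique
}

# Illustrative checks only; in viadot a failed rule raises ValidationError.
assert list(df.columns) == rules["column_list_to_match"]
assert (df["country"].astype(str).str.len() == rules["column_size"]["country"]).all()
assert all(df[col].is_unique for col in rules["column_unique_values"])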
69 changes: 68 additions & 1 deletion tests/integration/flows/test_bigquery_to_adls.py
@@ -1,11 +1,16 @@
import os

import pendulum
import pytest
from unittest import mock
import pandas as pd

from prefect.tasks.secrets import PrefectSecret
from viadot.flows import BigQueryToADLS
from viadot.tasks import AzureDataLakeRemove

from viadot.exceptions import ValidationError

ADLS_DIR_PATH = "raw/tests/"
ADLS_FILE_NAME = str(pendulum.now("utc")) + ".parquet"
BIGQ_CREDENTIAL_KEY = "BIGQUERY-TESTS"
@@ -72,6 +77,68 @@ def test_bigquery_to_adls_false():
    assert result.is_failed()
    os.remove("test_bigquery_to_adls_overwrite_false.parquet")
    os.remove("test_bigquery_to_adls_overwrite_false.json")


DATA = {
    "type": ["banner", "banner"],
    "country": ["PL", "DE"],
}


@mock.patch(
    "viadot.tasks.BigQueryToDF.run",
    return_value=pd.DataFrame(data=DATA),
)
@pytest.mark.run
def test_bigquery_to_adls_validate_df_fail(mocked_data):
    flow_bigquery = BigQueryToADLS(
        name="Test BigQuery to ADLS validate df fail",
        dataset_name="official_empty",
        table_name="space",
        credentials_key=BIGQ_CREDENTIAL_KEY,
        adls_file_name=ADLS_FILE_NAME,
        overwrite_adls=True,
        adls_dir_path=ADLS_DIR_PATH,
        adls_sp_credentials_secret=ADLS_CREDENTIAL_SECRET,
        validate_df_dict={"column_list_to_match": ["type", "country", "test"]},
    )
    try:
        result = flow_bigquery.run()
    except ValidationError:
        pass

    os.remove("test_bigquery_to_adls_validate_df_fail.parquet")
    os.remove("test_bigquery_to_adls_validate_df_fail.json")


@mock.patch(
    "viadot.tasks.BigQueryToDF.run",
    return_value=pd.DataFrame(data=DATA),
)
@pytest.mark.run
def test_bigquery_to_adls_validate_df_success(mocked_data):
    flow_bigquery = BigQueryToADLS(
        name="Test BigQuery to ADLS validate df success",
        dataset_name="official_empty",
        table_name="space",
        credentials_key=BIGQ_CREDENTIAL_KEY,
        adls_file_name=ADLS_FILE_NAME,
        overwrite_adls=True,
        adls_dir_path=ADLS_DIR_PATH,
        adls_sp_credentials_secret=ADLS_CREDENTIAL_SECRET,
        validate_df_dict={"column_list_to_match": ["type", "country"]},
    )
    result = flow_bigquery.run()
    assert result.is_successful()

    task_results = result.result.values()
    assert all([task_result.is_successful() for task_result in task_results])

    os.remove("test_bigquery_to_adls_validate_df_success.parquet")
    os.remove("test_bigquery_to_adls_validate_df_success.json")

    rm = AzureDataLakeRemove(
        path=ADLS_DIR_PATH + ADLS_FILE_NAME, vault_name="azuwevelcrkeyv001s"
    )
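
One pattern worth flagging in the failure-path tests: `try: flow.run()` / `except ValidationError: pass` also passes silently when no error is raised at all. A stricter equivalent, assuming (as these blocks imply) that a failed rule propagates `ValidationError` out of `flow.run()`, is a sketch like:

import pytest
from viadot.exceptions import ValidationError

def run_expecting_validation_error(flow):
    # Fails the test unless flow.run() raises ValidationError.
    with pytest.raises(ValidationError):
        flow.run()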
59 changes: 59 additions & 0 deletions tests/integration/flows/test_cloud_for_customers_report_to_adls.py
@@ -1,5 +1,6 @@
from viadot.config import local_config
from viadot.flows import CloudForCustomersReportToADLS
from viadot.exceptions import ValidationError


def test_cloud_for_customers_report_to_adls():
@@ -27,3 +28,61 @@ def test_cloud_for_customers_report_to_adls():

    task_results = result.result.values()
    assert all([task_result.is_successful() for task_result in task_results])

    assert len(flow.tasks) == 6


def test_cloud_for_customers_report_to_adls_validation_fail(caplog):
    credentials = local_config.get("CLOUD_FOR_CUSTOMERS")
    credentials_prod = credentials["Prod"]
    channels = ["VEL_B_AFS", "VEL_B_ASA"]
    month = ["01"]
    year = ["2021"]
    flow = CloudForCustomersReportToADLS(
        report_url=credentials_prod["server"],
        env="Prod",
        channels=channels,
        months=month,
        years=year,
        name="test_c4c_report_to_adls",
        local_file_path="test_c4c_report_to_adls.csv",
        adls_sp_credentials_secret=credentials["adls_sp_credentials_secret"],
        adls_dir_path=credentials["adls_dir_path"],
        validate_df_dict={"column_size": {"ChannelName ID": 10}},
    )
    try:
        result = flow.run()
    except ValidationError:
        pass


def test_cloud_for_customers_report_to_adls_validation_success():
    credentials = local_config.get("CLOUD_FOR_CUSTOMERS")
    credentials_prod = credentials["Prod"]
    channels = ["VEL_B_AFS", "VEL_B_ASA"]
    month = ["01"]
    year = ["2021"]
    flow = CloudForCustomersReportToADLS(
        report_url=credentials_prod["server"],
        env="Prod",
        channels=channels,
        months=month,
        years=year,
        name="test_c4c_report_to_adls",
        local_file_path="test_c4c_report_to_adls.csv",
        adls_sp_credentials_secret=credentials["adls_sp_credentials_secret"],
        adls_dir_path=credentials["adls_dir_path"],
        validate_df_dict={"column_size": {"ChannelName ID": 13}},
    )

    try:
        result = flow.run()
    except ValidationError:
        assert False, "Validation failed but was expected to pass"

    assert result.is_successful()

    task_results = result.result.values()
    assert all([task_result.is_successful() for task_result in task_results])

    assert len(flow.tasks) == 7
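
Read together, the two validation tests pin down the `column_size` semantics: 10 fails and 13 passes for `ChannelName ID`, which suggests the rule compares the string length of every value in the column against the expected size. A hedged re-implementation of that reading (not viadot's code):

import pandas as pd

def column_size_ok(df: pd.DataFrame, sizes: dict) -> bool:
    # True only if every value in each listed column has the expected length.
    return all(
        (df[col].astype(str).str.len() == size).all()
        for col, size in sizes.items()
    )

df = pd.DataFrame({"code": ["ABCDEFGHIJKLM", "NOPQRSTUVWXYZ"]})  # 13-char values
assert column_size_ok(df, {"code": 13})
assert not column_size_ok(df, {"code": 10})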
53 changes: 53 additions & 0 deletions tests/integration/flows/test_customer_gauge_to_adls.py
@@ -5,6 +5,7 @@
import pytest

from viadot.flows import CustomerGaugeToADLS
from viadot.exceptions import ValidationError

DATA = {
    "user_name": ["Jane", "Bob"],
@@ -15,6 +16,7 @@
    "user_address_country_name": "United States",
    "user_address_country_code": "US",
}

COLUMNS = ["user_name", "user_address_street"]
ADLS_FILE_NAME = "test_customer_gauge.parquet"
ADLS_DIR_PATH = "raw/tests/"
Expand All @@ -40,3 +42,54 @@ def test_customer_gauge_to_adls_run_flow(mocked_class):
    assert result.is_successful()
    os.remove("test_customer_gauge_to_adls_flow_run.parquet")
    os.remove("test_customer_gauge_to_adls_flow_run.json")


@mock.patch(
    "viadot.tasks.CustomerGaugeToDF.run",
    return_value=pd.DataFrame(data=DATA),
)
@pytest.mark.run
def test_customer_gauge_to_adls_run_flow_validation_success(mocked_class):
    flow = CustomerGaugeToADLS(
        "test_customer_gauge_to_adls_run_flow_validation_success",
        endpoint="responses",
        total_load=False,
        anonymize=True,
        columns_to_anonymize=COLUMNS,
        adls_dir_path=ADLS_DIR_PATH,
        adls_file_name=ADLS_FILE_NAME,
        overwrite_adls=True,
        validate_df_dict={"column_size": {"user_address_state": 2}},
    )
    result = flow.run()
    assert result.is_successful()
    assert len(flow.tasks) == 11

    os.remove("test_customer_gauge_to_adls_run_flow_validation_success.parquet")
    os.remove("test_customer_gauge_to_adls_run_flow_validation_success.json")


@mock.patch(
    "viadot.tasks.CustomerGaugeToDF.run",
    return_value=pd.DataFrame(data=DATA),
)
@pytest.mark.run
def test_customer_gauge_to_adls_run_flow_validation_failure(mocked_class):
    flow = CustomerGaugeToADLS(
        "test_customer_gauge_to_adls_run_flow_validation_failure",
        endpoint="responses",
        total_load=False,
        anonymize=True,
        columns_to_anonymize=COLUMNS,
        adls_dir_path=ADLS_DIR_PATH,
        adls_file_name=ADLS_FILE_NAME,
        overwrite_adls=True,
        validate_df_dict={"column_size": {"user_name": 5}},
    )
    try:
        flow.run()
    except ValidationError:
        pass

    os.remove("test_customer_gauge_to_adls_run_flow_validation_failure.parquet")
    os.remove("test_customer_gauge_to_adls_run_flow_validation_failure.json")
26 changes: 25 additions & 1 deletion tests/integration/flows/test_eurostat_to_adls.py
@@ -6,7 +6,11 @@

from viadot.flows import EurostatToADLS

DATA = {"geo": ["PL", "DE", "NL"], "indicator": [35, 55, 77]}
DATA = {
"geo": ["PL", "DE", "NL"],
"indicator": [35, 55, 77],
"time": ["2023-01", "2023-51", "2023-07"],
}
ADLS_FILE_NAME = "test_eurostat.parquet"
ADLS_DIR_PATH = "raw/tests/"

Expand All @@ -28,3 +32,23 @@ def test_eurostat_to_adls_run_flow(mocked_class):
    assert result.is_successful()
    os.remove("test_eurostat_to_adls_flow_run.parquet")
    os.remove("test_eurostat_to_adls_flow_run.json")


@mock.patch(
    "viadot.tasks.EurostatToDF.run",
    return_value=pd.DataFrame(data=DATA),
)
@pytest.mark.run
def test_validate_df(mocked_class):
    flow = EurostatToADLS(
        "test_validate_df",
        dataset_code="ILC_DI04",
        overwrite_adls=True,
        validate_df_dict={"column_size": {"time": 7}},
        adls_dir_path=ADLS_DIR_PATH,
        adls_file_name=ADLS_FILE_NAME,
    )
    result = flow.run()
    assert result.is_successful()
    os.remove("test_validate_df.parquet")
    os.remove("test_validate_df.json")
79 changes: 79 additions & 0 deletions tests/integration/flows/test_hubspot_to_adls.py
@@ -6,6 +6,7 @@
import pytest

from viadot.flows import HubspotToADLS
from viadot.exceptions import ValidationError

DATA = {
"id": {"0": "820306930"},
@@ -60,3 +61,81 @@ def test_hubspot_to_adls_flow_run(mocked_class):
    assert result.is_successful()
    os.remove("test_hubspot_to_adls_flow_run.parquet")
    os.remove("test_hubspot_to_adls_flow_run.json")


@mock.patch(
    "viadot.tasks.HubspotToDF.run",
    return_value=pd.DataFrame(data=DATA),
)
@pytest.mark.run
def test_hubspot_to_adls_flow_run_validate_fail(mocked_class):
    flow = HubspotToADLS(
        "test_hubspot_to_adls_flow_run",
        hubspot_credentials_key="HUBSPOT",
        endpoint="line_items",
        filters=[
            {
                "filters": [
                    {
                        "propertyName": "createdate",
                        "operator": "BETWEEN",
                        "highValue": "2021-01-01",
                        "value": "2021-01-01",
                    },
                    {"propertyName": "quantity", "operator": "EQ", "value": "2"},
                ]
            },
            {
                "filters": [
                    {"propertyName": "amount", "operator": "EQ", "value": "3744.000"}
                ]
            },
        ],
        overwrite_adls=True,
        adls_dir_path=ADLS_DIR_PATH,
        adls_file_name=ADLS_FILE_NAME,
        validate_df_dict={"column_size": {"id": 0}},
    )
    try:
        flow.run()
    except ValidationError:
        pass


@mock.patch(
    "viadot.tasks.HubspotToDF.run",
    return_value=pd.DataFrame(data=DATA),
)
@pytest.mark.run
def test_hubspot_to_adls_flow_run_validate_success(mocked_class):
    flow = HubspotToADLS(
        "test_hubspot_to_adls_flow_run",
        hubspot_credentials_key="HUBSPOT",
        endpoint="line_items",
        filters=[
            {
                "filters": [
                    {
                        "propertyName": "createdate",
                        "operator": "BETWEEN",
                        "highValue": "2021-01-01",
                        "value": "2021-01-01",
                    },
                    {"propertyName": "quantity", "operator": "EQ", "value": "2"},
                ]
            },
            {
                "filters": [
                    {"propertyName": "amount", "operator": "EQ", "value": "3744.000"}
                ]
            },
        ],
        overwrite_adls=True,
        adls_dir_path=ADLS_DIR_PATH,
        adls_file_name=ADLS_FILE_NAME,
        validate_df_dict={"column_unique_values": ["id"]},
    )
    result = flow.run()
    assert result.is_successful()
    os.remove("test_hubspot_to_adls_flow_run.parquet")
    os.remove("test_hubspot_to_adls_flow_run.json")