-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Salesforce migration #629
Salesforce migration #629
Conversation
I made some minor corrections, the code is now ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor adjustments and missing docstrings
@djagoda881 random tip, you can just do |
This reverts commit d0f0416.
tests/unit/test_salesforce.py
Outdated
"LastName": ["LastName"], | ||
} | ||
df_restored = pd.DataFrame(data=data_restored) | ||
salesforce.upsert(df=df_restored, table=TABLE_TO_UPSERT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second point remains, I think it may be confusing because you're mixing 2 things together in these fixtures, you're creating a test df but also randomly upserting records. It's actually not easy to understand the tests since you already upsert records in the fixture, I don't think this should be happening?
Perhaps you meant to insert some records before running upsert()
test in order to validate that they are correctly updated in the Salesforce.upsert()
method (to validate the "update" part of upsert()
's functionality)? But in that case, I don't think this needs to be in a fixture, you can do this (note also there is a cleanup at the end, pls also add it - records should be created and removed in each test).
TEST_DF_EXTERNAL = pd.DataFrame({"LastName": [TEST_LAST_NAME], "SAPContactId__c": ["111"]})
def test_upsert(salesforce):
# Setup.
sf = salesforce.salesforce
test_record = TEST_DF_EXTERNAL.to_records() # not sure if this is the data structure required by insert(), pls check
sf.Contact.insert(test_record)
# Test.
salesforce.upsert(
df=TEST_DF_EXTERNAL, table=TABLE_TO_UPSERT, external_id="SAPContactId__c"
)
result = sf.query(
f"SELECT ID, LastName FROM {TABLE_TO_UPSERT} WHERE ID='{ID_TO_UPSERT}'"
)
assert result["records"][0]["LastName"] == TEST_LAST_NAME
# Cleanup
sf.delete(test_record)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a couple of comments
I decided to change the test structure by looking at the advice from 1:1 on Thursday.
|
tests/unit/test_salesforce.py
Outdated
@@ -28,7 +28,7 @@ def test_row_creation(salesforce): | |||
sf.Contact.delete(result["records"][0]["Id"]) | |||
|
|||
|
|||
def test_upsert_external_id_correct(salesforce, test_row_creation): | |||
def test_upsert_external_id_column(salesforce, test_row_creation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just test_upsert
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the name of the test after the latest changes
tests/unit/test_salesforce.py
Outdated
try: | ||
salesforce.upsert(df=df, table="Contact", external_id_column="SAPContactId__c") | ||
except Exception as exception: | ||
raise exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this try/except here? Can you add a docstring explaining?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the all try except blocks from the tests. Inside the function in upsert
and bulk upsert
the commands are executed in the try except block so if they don't work correctly an error will appear.
tests/unit/test_salesforce.py
Outdated
assert created_rows["records"][0]["LastName"] == "viadot-insert-1" | ||
assert created_rows["records"][1]["LastName"] == "viadot-insert-2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we do result_df.equals(expected_df)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unfortunately not possible to compare in this way. Simple salesforce returns data in OrderedDict
form, it can be converted into a data frame but another problem is the Id. The ID is given on insertion and takes the form of a unique hash e.g. ('Id', '0035E000031FfsMQAS')
. I would have to add another query that returns the name itself, but this would involve further complications in the code
tests/unit/test_salesforce.py
Outdated
sf.Contact.delete(created_rows["records"][1]["Id"]) | ||
|
||
|
||
def test_upsert_external_id_column_wrong(salesforce, test_row_creation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pls change the name to test_upsert_incorrect_external_id_column
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
viadot/sources/salesforce.py
Outdated
|
||
Args: | ||
df (pd.DataFrame): The DataFrame to upsert. | ||
df (pd.DataFrame): Dataframe containing the rows to Bulk upsert. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bulk -> bulk, given -> provided/specified, Dataframe -> DataFrame/pandas DataFrame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
viadot/sources/salesforce.py
Outdated
Args: | ||
domain (str, optional): Domain of a connection. Defaults to 'test' (sandbox). | ||
Can be added only if built-in username/password/security token is provided. | ||
client_id (str, optional): Client id to keep the track of API calls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id -> id, keep the track -> keep track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
viadot/sources/salesforce.py
Outdated
Using an upsert operation gives you more control over logging and error handling than using Bulk upsert. | ||
|
||
Args: | ||
df (pd.DataFrame): Dataframe containing the rows to upsert. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas DataFrame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
viadot/sources/salesforce.py
Outdated
Download all data from the indicated table or the result of the specified query. | ||
|
||
Args: | ||
query (str, optional): Query for download the specific data. Defaults to None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query for download -> The query to be used to download the data. (appears 2x)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
viadot/sources/salesforce.py
Outdated
Args: | ||
query (str, optional): Query for download the specific data. Defaults to None. | ||
table (str, optional): Table name. Defaults to None. | ||
columns (List[str], optional): List of columns which are needed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"List of columns which are needed, requires table argument"-> "list of required columns; requires table
to be specified" (appears 2x)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
viadot/sources/salesforce.py
Outdated
columns: List[str] = None, | ||
) -> pd.DataFrame: | ||
""" | ||
Downloads the indicated data and returns the Dataframe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Downloads the indicated data returns it as a pandas DataFrame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DE team has created a new version of this PR. |
Link pls @djagoda881 |
New PR: #708 |
Summary
The Salesforce source was moved from viadot version 1. The source was moved along with the unit tests. The structure of the source has undergone the following changes compared to viadot 1:
Importance
Marge It is needed to use source in prefect-viadot container
Checklist
This PR:
CHANGELOG.md
with a summary of the changes