Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Improvement: removed DB Schema validation from update_if_valid() #361

Closed
wants to merge 1 commit into from

Conversation

nh916
Copy link
Contributor

@nh916 nh916 commented Sep 20, 2023

Description

removed DB Schema validation from update_if_valid() to improve performance

Issue Link

https://trello.com/c/1wBnrbBw

Changes

  • removed DB Schema validation within update_if_valid method
    • originally the tests took 54 minutes to run, after the change the tests took 13 minutes
  • changed the docstrings and code to reflect the new updates

Screenshots

click to see test performance before

image

click to see test performance after

image

Notes

  • was thinking of knocking out the DB schema checking through deserialization and serialization, but I don't think I have to because the performance is good where it is
  • I think this is good, but if we decide we need more or less validation later as we go, we can add it in

Checklist

  • My name is on the list of contributors (CONTRIBUTORS.md) in the pull request source branch.
  • I have updated the documentation to reflect my changes.

… A LOT

* changed the docstrings and code to reflect the new updates
@trunk-io
Copy link

trunk-io bot commented Sep 20, 2023

Merging to develop in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@nh916 nh916 requested a review from InnocentBug September 20, 2023 22:39
Copy link
Collaborator

@InnocentBug InnocentBug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely not a fan of this.
I won't approve.

Human time, working with the SDK is always more valuable then computer time.
Debugging any script with the SDK without this check is an absolute nightmare.
DB Schema error hard to understand in the first place for users, but if the error now pops up at an unexpected location and at a place where you can't figure out its origin is an absolute no go.

If you are really so concerned about CPU performance, make this a flag in the API class.
api.disable_db_schema_check and api.enable_db_schema_check.
And default is checks on.
If a user is done debugging their script and are concerned about performance they can disable the checks.

I also think that this is the wrong place to disable the checks, in my opinion the disabling should happen in api._is_node_schema_valid, hiding it in a node function that is called upon updating seems cryptic to me.

@InnocentBug
Copy link
Collaborator

Also this check should 100% be on for our unit tests.
Otherwise we don't even have a chance to catch issues with the DB schema changes.

@nh916
Copy link
Contributor Author

nh916 commented Sep 21, 2023

I also think that this is the wrong place to disable the checks, in my opinion the disabling should happen in api._is_node_schema_valid, hiding it in a node function that is called upon updating seems cryptic to me.

I do agree that the function looks a bit cryptic because we are saying "update if valid" and then we are just updating it regardless. I was thinking of first putting this idea out and seeing which parts we like and dislike and after we come to agreement we can figure out the rest of refactoring/renaming to make it more readable. I figured we'd probably disagree about this and need to find a middle ground.

I can do a flag, but then we won't have any db schema validation instead of having it in fewer places, which is not ideal.


Human time, working with the SDK is always more valuable then computer time.

I agree, but that is the issues because as the user is sitting trying to run their program, it takes forever for it to run and give back the error. If only uploads were slow then it would make sense, but human time is wasted here waiting for the computer to run and after a long time give an error or success.

I bet the errors will be equally as helpful in debugging if we have it in one place instead of many places.

We could also try to do an experiment or user testing for this as well to be more sure.


Debugging any script with the SDK without this check is an absolute nightmare.

I think it would actually be okay because we have some pretty good validation with beartype and our setters and getters are written nicely and the db schema gives pretty good error are pretty easy and understand and debug from.

Tell me more about your thoughts on what would be hard. Was there a time that you were struggling with the error and had a hard time debugging this SDK?

@nh916
Copy link
Contributor Author

nh916 commented Sep 21, 2023

Also this check should 100% be on for our unit tests. Otherwise we don't even have a chance to catch issues with the DB schema changes.

I think the API also runs the DB schema checks so we have another safety there and gives back the db schema errors.

We could also try to have the db schema run at the end of every unit tests, not sure how to do that yet, but if we like that idea maybe we can research more into it and go from there

@InnocentBug
Copy link
Collaborator

I can do a flag, but then we won't have any db schema validation instead of having it in fewer places, which is not ideal.

A flag is good.
We can write the validate function in a way that allows overwriting the flag.
And we would use that on SAVE.

It will be validated by API anyways, so forceful validation here should be fine.

@InnocentBug
Copy link
Collaborator

I think the API also runs the DB schema checks so we have another safety there and gives back the db schema errors.

Yes, but the feedback from API on DB schema error is really cryptic and I don't think it is effort well spend that we decode this message for the users.

@InnocentBug
Copy link
Collaborator

We could also try to have the db schema run at the end of every unit tests, not sure how to do that yet, but if we like that idea maybe we can research more into it and go from there

In the flag solution, I would just not set the flag for our unit tests.
(Except of course to test the flag itself.)

@nh916
Copy link
Contributor Author

nh916 commented Sep 21, 2023

I can do a flag, but then we won't have any db schema validation instead of having it in fewer places, which is not ideal.

A flag is good. We can write the validate function in a way that allows overwriting the flag. And we would use that on SAVE.

It will be validated by API anyways, so forceful validation here should be fine.

I can write the flag in and I'm thinking of doing it how I wrote the verbose property so it can be changed at any point throughout the script, but not sure right now how to only have db schema work on save and be absent everywhere else. Would you want to do that part after I put in the flag?

@nh916
Copy link
Contributor Author

nh916 commented Sep 21, 2023

We could also try to have the db schema run at the end of every unit tests, not sure how to do that yet, but if we like that idea maybe we can research more into it and go from there

In the flag solution, I would just not set the flag for our unit tests. (Except of course to test the flag itself.)

not sure if I'd want to write a test for this because it is too small and simple and a test for it might just be overkill

@InnocentBug InnocentBug deleted the performance-improvement branch March 27, 2024 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants