-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validation Extension Support #200
base: master
Are you sure you want to change the base?
Validation Extension Support #200
Conversation
- Add config option for resources to require validation. - Only submit to xloader if validation is successful.
- Added debug logging.
- Do not use 2.10+ helper.
- Check toolkit action instead of plugin name.
- Added method docs to explain new method.
# Conflicts: # ckanext/xloader/plugin.py ### RESOLVED.
- Fixed typo. - Fixed try/catch.
- Prevent POST to upload to DS if validation is required. - flake8 if wrap indents.
@ThrawnCA I added the flash message to the |
- Syntax fixes from flake8.
@ThrawnCA fixed flake8 check here now as well. |
- Moved logic into util method. - Renamed config options. - Added new config option to enforce validation schema existance.
# Conflicts: # ckanext/xloader/plugin.py # ckanext/xloader/utils.py ### RESOLVED.
- Added automated tests. - Changed the logic to be better and more clear.
- Fixed a logic case.
Love the work you have done on this @JVickery-TBS . One thing that I think would make this perfect is to see if xloader can hook into validation to run the xloader load in sync mode right after we have a success, or maybe another priority queue so its not delayed again. Since validation success ensure you don't have blank lines or duplicate headers, the fast load should always run and not defer back to messytables slow python orm style commits. Just a bit of background: Something to think about with this integration: Possible other extension enhancement for performance: |
# Conflicts: # ckanext/xloader/config_declaration.yaml # ckanext/xloader/plugin.py # ckanext/xloader/tests/test_plugin.py ### RESOLVED.
- Better conditional syntax.
@duttonw Yeah we do the same thing with our setup as well in which we have to do async mode for validation and xloader as most of the CSVs that get upload are gigantic. As for the UX for users waiting for these things, I am currently working on some UI things (open-data/ckanext-canada#1430) to show the status of Xloader for a Resource, e.g. "Pending" or "Running" etc. Still kind of a work in progress, but let me know if this would be a good thing to put back into the Xloader repo. Regarding the archiver, we actually use the cloud storage plugin, so things have to be gathered from there every time. But with the loop that happens, we kind of actually found out that this was an issue with the Core CKAN code and how the Resource plugin hooks work. Things tend to always get run multiple times because the Resource actions always call the Package actions. Still discussions to go on how to fix that. open-data/ckanext-validation#9 <- I put in a temporary fix on our Validation fork to prevent the looping. Along with a temporary fix in our Xloader one (open-data@df5a4c0). These two things prevent any job looping. This was for sure a major issue as the resource_patch from Xloader would cause another Validation run, which would then cause another Xloader run and stop because file hash did not change, moreover, it would do that to ALL the resources in a dataset. So yeah, updating a single Resource should have enqueued 1 or 2 jobs, but could end up enqueuing like 70. YES the validation and xloader were slow, so because of this loop, some datasets with 200+ resources in them backed up the job queue to a couple thousand. It was terrible and that is when I looked to fix this job looping thing. I will try and get my job looping prevention into the ckanext-validation repo this week. Even if this gets fixed sometime in upstream, should not really have a major impact with the work-around in Validation. But as for your request to have validation do resource_patch and have Xloader ALWAYS do sync instead of async in that specific case, I shall look into that this week as well. Because, yeah, I feel like it could be annoying for a user to have to wait for Validation, and then after that passes, wait again for their Xloader job (that is now at the end of the queue) to run. Especially on busier sites or at busier times when the queue might be larger. (Especially with the job loop stuff happening, this is basically always hahaha) |
- Fixed inline comments to make more sense.
Another issue with asynchronous processing is that the logic in |
Please do create a new pr for the inclusion of the badge. Place it behind a config option to disable badge (default is on) When enabled:
Have another flag to enable all (debug level) messages:
My reasoning for this is to reduce the clutter on resources. You only want to know when things are changing most of the time. Once its done its purpose is served. I'd also only include the url for people who have author or above. Reason being is that we allow non-authors to register to provide 'data requests' as well as commenting and email notification on dataset/resources/groups etc (atm this is paused due to very bad sql performance on dashboard viewing which seem to be a recurring problem ckan/ckan#6028 and ckan/ckan#7878 )
|
@duttonw okay sounds great I will make time in these coming weeks to do the above. (and yeaahhh the Following and the Dashboard stuff gets pretty slow, I think we have disabled following things and the dashboard as well). @ThrawnCA Yeah, I was thinking of how to do the Validation Pass -> Sync Mode Xloader. I am just trying to figure out how to do it all in this plugin. As we have the button to "Upload to Datastore". So I don't know the logic to determine if the caller is from the View or from the Validation Job yet. (maybe I can see if we are currently in a job and see what the job callback is from the |
- Started doing sync mode for xloader right after validation.
Hi @JVickery-TBS , @ThrawnCA did something similar in our branch for the CLI click command for sys admins to force a datastore when under the crunch, you can see in in this pr qld-gov-au#82 Regards, |
That could be easily fixed by putting a hidden input flag in the view. It wouldn't even be a problem if someone spotted it and chose to insert it into an API call. |
- Continued sync mode chaining from validation.
@ThrawnCA @duttonw thanks for sending over that commit, I will check it out to see if I can use some of the code. The issue that I am seeing here is now how RQ runs the jobs in workers. Firstly, you would have to have a worker running for that queue, and have it in the top priority. I would like to not have users need to run a separate queue. Have not read Thrawn's commit yet, but maybe it just executes the burst of that custom queue right away? I would also like to support other devs in which they call the submit to xloader action, it should probably always enqueue by default. I am currently having difficulties preventing loops when running the xloader directly from the validation job. So just trying to figure that one out. |
- Continued sync mode chaining from validation.
@ThrawnCA @duttonw its not exactly where I want it, but am at a bit of a block right now. I got it to a point in which if it is chaining from the Validation job, then it will enqueue a job and push it to the start of the queue. Bad things with this are:
So I am not really sure at this point how to move forward, I feel like the best/easiest thing would be the add some feature to ckanext-validation. The extension DOES have a context variable called |
@JVickery-TBS Have you looked at the interaction between the archiver and QA plugins? Archiver defines an A similar pattern could be used here, but would require changes to the Validation plugin to add callback support. |
# Conflicts: # ckanext/xloader/action.py ### RESOLVED.
- Implement experimental `IPipeValidation` implement.
requires: ckan/ckanext-validation#97 |
- Cannot do tests without `IPipeValidation`.
- Clearer comments. - Clearer log messages.
- Added `ignore_not_sysadmin` validator to the sync key.
if sync: | ||
log.exception('Unable to xloader res_id=%s', res_id) | ||
else: | ||
log.exception('Unable to enqueued xloader res_id=%s', res_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Unable to enqueued" -> "Unable to enqueue"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than @ThrawnCA debug wording tweak. It looks good.
Is these commits used anywhere else (i.e. is it a candidate for squashing for clean history).
Can you also update the README.md and maybe the Change log of what you added.
Since I've recently become a mantainer of the pypi module for this so I'd like to push this out to that soon after. (Doing it correctly with a github action pipeline)
Adds support for the
ckanext-validation
plugin. Behind the new config optionckanext.xloader.requires_validation
, Resources will be required to be validated before being xloadered.