This repository has been archived by the owner on Nov 12, 2024. It is now read-only.

Invalid index data #371

Closed
winwiz1 opened this issue Dec 19, 2020 · 2 comments

winwiz1 commented Dec 19, 2020

The index DE_BE_11001 has data that appears to be invalid:

subregion1_code = BE
subregion1_name = 11001

subregion2_code = Berlin Mitte
subregion2_name = null

The Crisp CSV utility flags other similar indices as invalid:

DE_BE_11002
DE_BE_11003
DE_BE_11004
DE_BE_11005
DE_BE_11006
DE_BE_11007
DE_BE_11008
DE_BE_11009
DE_BE_11010
DE_BE_11011
DE_BE_11012
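
For illustration, the kind of rule that would flag the record above can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the Crisp CSV implementation and not the repository's own test code: the `key` and `country_code` column names, the convention that a key is the non-empty codes joined by underscores, and the "corrected" record are all hypothetical.

```python
def check_index_record(record):
    """Return a list of problems found in one index record (a dict)."""
    problems = []

    # Assumption: the key is the non-empty codes joined by "_".
    codes = [
        record.get("country_code"),
        record.get("subregion1_code"),
        record.get("subregion2_code"),
    ]
    expected_key = "_".join(code for code in codes if code)
    if record.get("key") != expected_key:
        problems.append(
            f"key {record.get('key')!r} does not match codes {expected_key!r}"
        )

    # Assumption: a subregion code and its name are both set or both empty.
    for level in (1, 2):
        code = record.get(f"subregion{level}_code")
        name = record.get(f"subregion{level}_name")
        if bool(code) != bool(name):
            problems.append(f"subregion{level} has a code or a name, but not both")

    return problems


# The record as reported in this issue (key and country_code are assumed).
reported = {
    "key": "DE_BE_11001",
    "country_code": "DE",
    "subregion1_code": "BE",
    "subregion1_name": "11001",
    "subregion2_code": "Berlin Mitte",
    "subregion2_name": None,
}

# A hypothetical corrected record, shown only to exercise the passing case.
corrected = dict(
    reported,
    subregion1_name="Berlin",
    subregion2_code="11001",
    subregion2_name="Berlin Mitte",
)

for problem in check_index_record(reported):
    print(problem)
```

With these assumptions the reported record fails both rules, while the hypothetical corrected record passes.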

winwiz1 commented Dec 20, 2020

Thank you for starting to run the index validation checks introduced today (issue #372, commit 48638ea, file src/test/test_data.py).

You are welcome to use additional constraints and data integrity rules from Crisp CSV, and to run this or a similar utility, as suggested in #186, on the final production data before copying it to the production bucket. This could be treated as part of the integration tests, which complement the unit tests introduced today. It would be consistent with the common practice of maintaining validation and integrity rules for any significant amount of data meant to be used by consumers.
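
A post-processing check of this kind could look roughly like the sketch below, which scans the final CSV output and reports rows that break a rule. It is a sketch only: the `key` and `country_code` column names, the rule that a key equals the non-empty codes joined by underscores, and the sample rows are assumptions made for the example, and the function name `validate_index_csv` is hypothetical.

```python
import csv
import io


def validate_index_csv(stream):
    """Yield (line_number, key, message) for each row that breaks a rule."""
    seen = set()
    # Line 1 is the header; this numbering assumes no multi-line fields.
    for line_number, row in enumerate(csv.DictReader(stream), start=2):
        key = row.get("key") or ""
        if not key:
            yield line_number, key, "empty key"
            continue
        if key in seen:
            yield line_number, key, "duplicate key"
        seen.add(key)

        # Assumption: the key is the non-empty codes joined by "_".
        columns = ("country_code", "subregion1_code", "subregion2_code")
        codes = [row.get(column) or "" for column in columns]
        if key != "_".join(code for code in codes if code):
            yield line_number, key, "key does not match its code columns"


# Made-up sample data standing in for the final production file.
SAMPLE = (
    "key,country_code,subregion1_code,subregion2_code\n"
    "DE_BE,DE,BE,\n"
    "DE_BE_11001,DE,BE,Berlin Mitte\n"
)

issues = list(validate_index_csv(io.StringIO(SAMPLE)))
for issue in issues:
    print(issue)
```

Run against the real output file (for example with `open(path, newline="")`), a non-empty result could block the copy to the production bucket.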

owahltinez (Contributor) commented

Thanks for flagging this issue; it should now be resolved.

We are open to adding post-processing validation of the data, but we currently have limited resources. We'll evaluate your suggestion. So far, we have only added pre-processing validation in the form of unit tests.
