CM-420: developed new upload flow with validation #45

Merged
merged 32 commits into from
Jan 23, 2024
32 commits
9467fb9
CM-378: initial algo
Dec 27, 2023
e492c9e
CM-378: update algorithm
Dec 28, 2023
08483c0
CM-378: update algo
Dec 29, 2023
5412a4c
handle_uniqueness
Dec 29, 2023
403f66f
CM-378: check columns of th df
Jan 3, 2024
4539def
CM-378: handle uniqueness
Jan 3, 2024
837c3e3
CM-378: added adjustments after calling whole new validation endpoint
sniedzielski Jan 3, 2024
f88016d
CM-378: fixed code
sniedzielski Jan 3, 2024
bce4990
CM-378: fixing response output for endpoint with validations
sniedzielski Jan 3, 2024
e9faa45
CM-378: added to README information about endpoint
sniedzielski Jan 3, 2024
93a9f16
CM-378: improved README section regarding validations
sniedzielski Jan 3, 2024
d03590e
CM-378: improved README section
sniedzielski Jan 3, 2024
5a4d966
CM-378: improved readme
sniedzielski Jan 4, 2024
4eb329f
validation workflow
sniedzielski Jan 4, 2024
2a7c669
CM-420: remove unused files
sniedzielski Jan 9, 2024
b90467e
CM-420: removed unused line of codes
sniedzielski Jan 9, 2024
fb06b75
CM-420: updated readme to add how validation works in upload workflow
sniedzielski Jan 9, 2024
6000c15
CM-420: updated readme to add how validation works in upload workflow
sniedzielski Jan 9, 2024
960da0a
Merge branch 'develop' of https://github.com/openimis/openimis-be-soc…
sniedzielski Jan 12, 2024
de28489
Merge branch 'develop' of https://github.com/openimis/openimis-be-soc…
sniedzielski Jan 15, 2024
57a3725
CM-420: added information about % of invalid items to display on task…
sniedzielski Jan 19, 2024
501a7a8
CM-420: added check why task is failed - temporary
sniedzielski Jan 19, 2024
c491123
CM-420: removed redundat paranthesises from json
sniedzielski Jan 19, 2024
1e17a4d
CM-420: removed loggers
sniedzielski Jan 19, 2024
b4c6053
CM-420: fixed calculate percentage of invalid items
sniedzielski Jan 19, 2024
517ce03
CM-420: removed line
sniedzielski Jan 19, 2024
a2247fa
CM-420: added tests for services method
sniedzielski Jan 19, 2024
93f9586
CM-420: fixed tests
sniedzielski Jan 19, 2024
afcc1e2
CM-420: fixed tests 2
sniedzielski Jan 19, 2024
292c96f
CM-420: fixed tests 3
sniedzielski Jan 19, 2024
074fa6f
CM-420: fixed tests 4
sniedzielski Jan 19, 2024
05600d0
CM-420: fixed tests 5
sniedzielski Jan 19, 2024
254 changes: 254 additions & 0 deletions README.md
@@ -107,3 +107,257 @@ include related objects, and then click export all.
* Save file in the business model for initialization after deployment in
`openimis-be_social_protection/import_data`.
* Rename filename into `opensearch_beneficiary_dashboard.ndjson`

## Validations and deduplication detection

### Validations endpoint
* Validation is handled by the POST endpoint `api/social_protection/validate_import_beneficiaries`.
* The upload workflow calls this endpoint when a user uploads beneficiaries into the system.
* The required input is identical to that of the POST endpoint `api/social_protection/import_beneficiaries` (a CSV file).
* The endpoint relies heavily on schema properties. For instance, `validationCalculation` on a schema property triggers the named validation strategy. Similarly, setting `uniqueness: true` on a property requests a duplication check based on that field's value in each record.
* In the schema below (from `programme/benefit plan`), validation runs for the `email` field (`validationCalculation`), and a duplication check is performed for `national_id` (`uniqueness: true`):
```
{
"$id":"https://example.com/beneficiares.schema.json",
"type":"object",
"title":"Record of beneficiares",
"$schema":"http://json-schema.org/draft-04/schema#",
"properties":{
"email":{
"type":"string",
"description":"email address to contact with beneficiary",
"validationCalculation":{
"name":"EmailValidationStrategy"
}
},
"able_bodied":{
"type":"boolean",
"description":"Flag determining whether someone is able bodied or not"
},
"national_id":{
"type":"string",
"uniqueness":true,
"description":"national id"
},
"educated_level":{
"type":"string",
"description":"The level of person when it comes to the school/education/studies"
},
"chronic_illness":{
"type":"boolean",
"description":"Flag determining whether someone has such kind of illness or not"
},
"national_id_type":{
"type":"string",
"description":"A type of national id"
},
"number_of_elderly":{
"type":"integer",
"description":"Number of elderly"
},
"number_of_children":{
"type":"integer",
"description":"Number of children"
},
"beneficiary_data_source":{
"type":"string",
"description":"The source from where such beneficiary comes"
}
},
"description":"This document records the details beneficiares"
}
```
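The way these two schema properties select checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `collect_validation_rules` is a hypothetical helper, not a function from this module, and the schema is trimmed to three of the properties shown above.

```python
# Hypothetical helper (not part of the module): list the checks a schema requests.
def collect_validation_rules(schema: dict) -> dict:
    """Map each property name to the checks its schema annotations ask for."""
    rules = {}
    for field, spec in schema.get("properties", {}).items():
        checks = {}
        calculation = spec.get("validationCalculation")
        if calculation:
            # The strategy is referenced by name, e.g. "EmailValidationStrategy".
            checks["strategy"] = calculation.get("name")
        if spec.get("uniqueness") is True:
            checks["uniqueness"] = True
        if checks:
            rules[field] = checks
    return rules


# Trimmed copy of the schema above.
schema = {
    "type": "object",
    "properties": {
        "email": {
            "type": "string",
            "validationCalculation": {"name": "EmailValidationStrategy"},
        },
        "able_bodied": {"type": "boolean"},
        "national_id": {"type": "string", "uniqueness": True},
    },
}

print(collect_validation_rules(schema))
```

Properties without either annotation, such as `able_bodied`, do not appear in the result, which matches the statement that only annotated fields are validated.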
* An example response after calling the endpoint looks like this:
```
{
"success":true,
"data":[
{
"row":{
"first_name":"Rick",
"last_name":"Scott",
"dob":"2023-07-13",
"email":"[email protected]",
"able_bodied":false,
"national_id":"1345320000AN",
"educated_level":"higher education",
"national_id_type":"National ID Card",
"number_of_elderly":1,
"number_of_children":2,
"beneficiary_data_source":"BENEFICIARY_ETL"
},
"validations":{
"email":{
"success":true,
"field_name":"email",
"note":"Ok",
"duplications":null
},
"national_id_uniqueness":{
"success":false,
"field_name":"national_id",
"note":"'national_id' Field value '1345320000AN' is duplicated",
"duplications":{
"duplicated":true,
"duplicates_amoung_database":[
{
"id":"dbe11b3d-c6db-4912-bc84-c8e3d57afdb7",
"first_name":"TestFN",
"last_name":"TestLN",
"dob":"2023-07-13",
"email":"[email protected]",
"able_bodied":false,
"national_id":"1345320000AN",
"educated_level":"higher education",
"national_id_type":"National ID Card",
"number_of_elderly":1,
"number_of_children":2,
"beneficiary_data_source":"BENEFICIARY_ETL"
},
{
"id":"8321950f-a017-4940-a7ac-977714b685ec",
"first_name":"Lewis",
"last_name":"Test",
"dob":"1998-06-04",
"able_bodied":true,
"national_id":"1345320000AN",
"educated_level":"higher education",
"national_id_type":"passport",
"number_of_elderly":0,
"number_of_children":0,
"beneficiary_data_source":"BENEFICIARY_ETL"
},
{
"id":"bc0c2772-fcfa-46a4-9b42-894707db2c37",
"first_name":"Jacob",
"last_name":"Open",
"dob":"1995-06-01",
"able_bodied":false,
"national_id":"1345320000AN",
"educated_level":"higher education",
"national_id_type":"passport",
"number_of_elderly":0,
"number_of_children":2,
"beneficiary_data_source":"BENEFICIARY_ETL"
},
{
"id":"ea21f84c-28db-4039-96d4-460a96bb2278",
"first_name":"Jacob",
"last_name":"Open",
"dob":"1995-06-01",
"able_bodied":true,
"national_id":"1345320000AN",
"educated_level":"higher education",
"national_id_type":"passport",
"number_of_elderly":0,
"number_of_children":4,
"beneficiary_data_source":"BENEFICIARY_ETL"
}
],
"incoming_duplicates":[
{
"first_name":"Eva",
"last_name":"Jacob",
"dob":"1995-06-01",
"email":"[email protected]",
"able_bodied":true,
"national_id":"1345320000AN",
"educated_level":"secondary education",
"national_id_type":"National ID Card",
"number_of_elderly":1,
"number_of_children":2,
"beneficiary_data_source":"BENEFICIARY_ETL"
}
]
}
}
}
},
{
"row":{
"first_name":"Frank",
"last_name":"Mood",
"dob":"1995-06-01",
"email":"[email protected]",
"able_bodied":true,
"national_id":"134532022LKSD",
"educated_level":"medium education",
"national_id_type":"National ID Card",
"number_of_elderly":0,
"number_of_children":1,
"beneficiary_data_source":"BENEFICIARY_ETL"
},
"validations":{
"email":{
"success":true,
"field_name":"email",
"note":"Ok",
"duplications":null
},
"national_id_uniqueness":{
"success":true,
"field_name":"national_id",
"note":"'national_id' Field value '134532022LKSD' is not duplicated",
"duplications":null
}
}
},
{
"row":{
"first_name":"Jan",
"last_name":"White",
"dob":"1995-06-01",
"email":"janwhitetest.com",
"able_bodied":true,
"national_id":"1345320000ANER",
"educated_level":"higher education",
"national_id_type":"National ID Card",
"number_of_elderly":0,
"number_of_children":4,
"beneficiary_data_source":"BENEFICIARY_ETL"
},
"validations":{
"email":{
"success":false,
"field_name":"email",
"note":"Invalid email format",
"duplications":null
},
"national_id_uniqueness":{
"success":true,
"field_name":"national_id",
"note":"'national_id' Field value '1345320000ANER' is not duplicated",
"duplications":null
}
}
}
]
}
```
* In the example response, `data` is an array with one element per row of the input CSV file.
* Each element is a dictionary. Its `row` key holds the values provided for the new individual/beneficiary entering the system.
* Under `validations`, you'll find a result for each field whose schema property is marked with a validation class, plus a potential-duplicates result for each field that has the `uniqueness` property set. The key of a uniqueness result is the field name with the suffix `_uniqueness` (for example, `national_id_uniqueness`).
* The `duplications` section shows potential duplicates among incoming rows (`incoming_duplicates`) and existing records (`duplicates_amoung_database`).
* An empty `validations` property indicates that no validations need processing based on the schema properties.
* Alongside the `data` and `success` properties, there is a `summary_invalid_items` field (not shown in the example above) containing a list of UUIDs of the individual data sources that are invalid.
This list is needed in the Benefit Update workflow to flag those records in the `IndividualDataSource`.
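As a rough illustration of how a client might read the response described above, the following Python sketch lists the failed checks and computes the share of rows with at least one failure. Both helper functions are hypothetical and not part of the module, and the sample response is a trimmed version of the example, keeping only the fields the helpers read.

```python
# Hypothetical helpers for reading a validation response; names are illustrative.
def failed_checks(response: dict) -> list:
    """Return (row_index, validation_key, note) for every failed check."""
    failures = []
    for index, item in enumerate(response.get("data", [])):
        for key, result in (item.get("validations") or {}).items():
            if not result.get("success"):
                failures.append((index, key, result.get("note")))
    return failures


def invalid_row_percentage(response: dict) -> float:
    """Percentage of rows that have at least one failed check."""
    rows = response.get("data", [])
    if not rows:
        return 0.0
    invalid = sum(
        1
        for item in rows
        if any(not r.get("success") for r in (item.get("validations") or {}).values())
    )
    return 100.0 * invalid / len(rows)


# Trimmed version of the example response.
response = {
    "success": True,
    "data": [
        {"validations": {
            "email": {"success": True, "note": "Ok"},
            "national_id_uniqueness": {"success": False, "note": "duplicated"},
        }},
        {"validations": {
            "email": {"success": True, "note": "Ok"},
            "national_id_uniqueness": {"success": True, "note": "not duplicated"},
        }},
        {"validations": {
            "email": {"success": False, "note": "Invalid email format"},
            "national_id_uniqueness": {"success": True, "note": "not duplicated"},
        }},
    ],
}

for row_index, key, note in failed_checks(response):
    print(row_index, key, note)
```

Note that the top-level `success` flag in the example is `true` even though individual checks fail, so a client has to inspect each entry under `validations` rather than rely on that flag alone.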

### Validations in upload workflow
* The [openimis-lightning_dkr](https://github.com/openimis/openimis-lightning_dkr/tree/develop) repository contains two workflows responsible for uploading and validating data: `BenefitPlanUpdate` and `beneficiary-import-valid-items`.
* The workflow `BenefitPlanUpdate` is used when a file is uploaded through the form on BenefitPlanPage.
* The workflow `beneficiary-import-valid-items` is activated to confirm valid items after the validation process. It is activated when a task linked to that action is initiated.
* Validation is driven by a calculation rule, which defines the validation strategy to apply.
* For more about validation strategies in calculation rules, see the [README of the calculation validation strategy module](https://github.com/openimis/openimis-be-calcrule_validations_py/tree/develop).
* The upload process involves two stages: first, a validation process verifies the data, and upon successful validation,
the data is uploaded. In case of any invalid items, there's an additional step where the user can review and download a
report containing the invalid items. After reviewing the report, the user can proceed to import the valid items through task management.
* If all records are valid, the status is marked as `SUCCESS`. No maker-checker validation (task) is required, since the process wasn't halted.
* If one or more records are invalid, the status is `WAITING_FOR_VERIFICATION`. This indicates that a task exists in the maker-checker view for verifying the upload of the valid items.
The report containing the invalid items can be downloaded from the upload history on the benefit plan page.
* When a user accepts the valid items from an import that had some invalid items, the `beneficiary-import-valid-items` workflow is triggered.
If there are no errors in this workflow, the status of the import is marked as `PARTIAL_SUCCESS`.
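The status transitions described in this section can be summarized in a short Python sketch. It is a simplified model, not the module's implementation: the enum and both functions are hypothetical names, and only the three statuses named above are covered. What happens when the `beneficiary-import-valid-items` workflow fails is not described here, so the sketch does not model it.

```python
from enum import Enum


# Hypothetical model of the statuses named in this section.
class UploadStatus(str, Enum):
    SUCCESS = "SUCCESS"
    WAITING_FOR_VERIFICATION = "WAITING_FOR_VERIFICATION"
    PARTIAL_SUCCESS = "PARTIAL_SUCCESS"


def status_after_validation(invalid_count: int) -> UploadStatus:
    """Status right after the validation stage of an upload."""
    if invalid_count < 0:
        raise ValueError("invalid_count must not be negative")
    if invalid_count == 0:
        # Nothing was rejected, so no maker-checker task is created.
        return UploadStatus.SUCCESS
    # A task waits for a user to accept the import of the valid items.
    return UploadStatus.WAITING_FOR_VERIFICATION


def status_after_acceptance(current: UploadStatus) -> UploadStatus:
    """Status after the user accepts the valid items and the import runs without errors."""
    if current is not UploadStatus.WAITING_FOR_VERIFICATION:
        raise ValueError("only uploads waiting for verification can be accepted")
    return UploadStatus.PARTIAL_SUCCESS


print(status_after_validation(0).value)
print(status_after_acceptance(status_after_validation(3)).value)
```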
10 changes: 10 additions & 0 deletions social_protection/apps.py
@@ -20,6 +20,11 @@
    "gql_check_benefit_plan_update": True,
    "gql_check_beneficiary_crud": True,
    "gql_check_group_beneficiary_crud": True,
    "unique_class_validation": "DeduplicationValidationStrategy",
    "validation_calculation_uuid": "4362f958-5894-435b-9bda-df6cadf88352",
    "validation_import_valid_items": "validation.import_valid_items",
    "validation_download_invalid_items": "validation.download_invalid_items",
    "validation_import_valid_items_workflow": "beneficiary-import-valid-items"
}


@@ -43,6 +48,11 @@ class SocialProtectionConfig(AppConfig):
    gql_check_benefit_plan_update = None
    gql_check_beneficiary_crud = None
    gql_check_group_beneficiary_crud = None
    unique_class_validation = None
    validation_calculation_uuid = None
    validation_import_valid_items = None
    validation_download_invalid_items = None
    validation_import_valid_items_workflow = None

    def ready(self):
        from core.models import ModuleConfiguration
4 changes: 4 additions & 0 deletions social_protection/models.py
@@ -35,6 +35,7 @@ class BenefitPlanType(models.TextChoices):
    def __str__(self):
        return f'Benefit Plan {self.code}'


class Beneficiary(core_models.HistoryBusinessModel):
    individual = models.ForeignKey(Individual, models.DO_NOTHING, null=False)
    benefit_plan = models.ForeignKey(BenefitPlan, models.DO_NOTHING, null=False)
@@ -56,6 +57,9 @@ class BenefitPlanDataUploadRecords(core_models.HistoryModel):
    benefit_plan = models.ForeignKey(BenefitPlan, models.DO_NOTHING, null=False)
    workflow = models.CharField(max_length=50)

    def __str__(self):
        return f"{self.benefit_plan.code} {self.data_upload.source_name} {self.workflow} {self.date_created}"


class GroupBeneficiary(core_models.HistoryBusinessModel):
    group = models.ForeignKey(Group, models.DO_NOTHING, null=False)