Commit
feat: Add option to enable/disable regex matching for expected fields in file blueprint settings.
Showing 12 changed files with 96 additions and 102 deletions.
ddm/migrations/0046_donationblueprint_expected_fields_regex_matching.py (18 changes: 18 additions & 0 deletions)
@@ -0,0 +1,18 @@

```python
# Generated by Django 3.2.13 on 2023-12-03 13:31

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('ddm', '0045_alter_processingrule_comparison_operator'),
    ]

    operations = [
        migrations.AddField(
            model_name='donationblueprint',
            name='expected_fields_regex_matching',
            field=models.BooleanField(default=False, help_text='Select if you use regex expressions in the "Excpected fields".'),
        ),
    ]
```
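The new `expected_fields_regex_matching` flag changes how the "Expected fields" setting is interpreted. The validation code that reads this flag is not rendered on this page, so the following is only a minimal sketch. It assumes that, when the flag is set, each expected entry must fully match at least one field name in the uploaded file; the helper `missing_expected_fields` is hypothetical and not part of ddm.

```python
import re
from typing import Iterable, List


def missing_expected_fields(
    expected_fields: Iterable[str],
    actual_fields: Iterable[str],
    regex_matching: bool = False,
) -> List[str]:
    """Return the expected entries that no field in the uploaded file satisfies.

    Hypothetical helper, not part of ddm. With regex_matching=False an entry
    must equal a field name exactly; with regex_matching=True it is compiled
    as a regular expression that must fully match at least one field name.
    """
    actual = list(actual_fields)
    missing = []
    for expected in expected_fields:
        if regex_matching:
            pattern = re.compile(expected)
            found = any(pattern.fullmatch(name) for name in actual)
        else:
            found = expected in actual
        if not found:
            missing.append(expected)
    return missing


header = ["timestamp", "title_en", "url"]
print(missing_expected_fields(["timestamp", "title_.*"], header))  # ['title_.*']
print(missing_expected_fields(["timestamp", "title_.*"], header, regex_matching=True))  # []
```

Whether ddm uses a full match or a search (`re.search`) is not visible here; a full match is the stricter choice, since a pattern like `title` would otherwise also accept `subtitle`.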
Some generated files and large diffs in this commit are not rendered.
ddm/templates/ddm/admin/data_donation/donation_blueprint/block_data_extraction.html (15 changes: 15 additions & 0 deletions)
@@ -0,0 +1,15 @@

```html
<h5 class="pt-4"><b>Data Extraction</b></h5>
<ul>
<li>The data extraction follows <i>extraction rules</i>, which can be configured below. These rules are applied
consecutively in the defined order.
</li>
<li><b>Keep data:</b> For every field/column/variable that you want to
keep in the donated data, you first have to define an extraction rule with the "Keep field" operator.
</li>
<li><b>Filter and alter data:</b> Next, you can add rules to filter (i.e., delete) or alter entries in the
uploaded data
(e.g., to delete all entries where the date is &lt; 01.01.2020, or to replace
e-mail addresses with "ANONYMIZED EMAIL"). For this, several comparison and regex operations are
available.
</li>
</ul>
```
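The rule semantics described in this template can be illustrated with a small in-memory sketch. Everything in it is hypothetical: the `Rule` dataclass, the operator names `"keep"`, `"delete_if"` and `"replace"`, and `apply_rules` stand in for ddm's actual extraction-rule model and processing code, which this page does not show. It assumes, as the removed help text in this commit states, that a matching comparison deletes the entry, and that rules run in their defined order.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Rule:
    """Hypothetical stand-in for an extraction rule (not ddm's model)."""
    field: str
    operator: str  # assumed operator names: "keep", "delete_if", "replace"
    predicate: Optional[Callable[[Any], bool]] = None  # used by "delete_if"
    pattern: str = ""  # used by "replace"
    replacement: str = ""  # used by "replace"


def apply_rules(entries: List[Dict[str, Any]], rules: List[Rule]) -> List[Dict[str, Any]]:
    result = [dict(entry) for entry in entries]  # work on copies of the entries
    for rule in rules:  # rules are applied consecutively, in the defined order
        if rule.operator == "delete_if":
            # A matching comparison deletes the whole entry (row).
            result = [e for e in result if not rule.predicate(e.get(rule.field))]
        elif rule.operator == "replace":
            for e in result:
                value = e.get(rule.field)
                if isinstance(value, str):
                    e[rule.field] = re.sub(rule.pattern, rule.replacement, value)
    # Data sparsity: only fields named in a "keep" rule survive.
    keep = {rule.field for rule in rules if rule.operator == "keep"}
    return [{k: v for k, v in e.items() if k in keep} for e in result]


entries = [
    {"date": "2019-05-01", "email": "alice@example.org", "text": "hi"},
    {"date": "2021-02-03", "email": "bob@example.org", "text": "hey"},
]
rules = [
    Rule("date", "keep"),
    Rule("email", "keep"),
    Rule("date", "delete_if", predicate=lambda d: d < "2020-01-01"),  # ISO dates compare as strings
    Rule("email", "replace", pattern=r"\S+@\S+", replacement="ANONYMIZED EMAIL"),
]
print(apply_rules(entries, rules))
# [{'date': '2021-02-03', 'email': 'ANONYMIZED EMAIL'}]
```

Running filters and replacements before dropping unkept fields is a choice of this sketch, so that a rule can filter on a field that is not itself kept; ddm may order these steps differently.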
ddm/templates/ddm/admin/data_donation/donation_blueprint/block_file_validation.html (17 changes: 17 additions & 0 deletions)
@@ -0,0 +1,17 @@

```html
<div class="ddm-admin-form pt-5">
<h4>Data Extraction Settings</h4>

<p>Data extraction is a two-step process: first the <b>file validation</b>, then the <b>data extraction</b>.</p>

<h5><b>File Validation</b></h5>
<ul>
<li>First, it is checked whether the expected file is included in the uploaded data.
If the associated File Uploader expects a ZIP upload, the correct file is identified
using the provided <code>file path</code> (this step is skipped for single-file uploads).</li>
<li>Second, it is checked whether the uploaded file is in the <code>expected file format</code>.</li>
<li>Third, it is checked whether the identified file contains <b>all</b> <code>expected fields</code>.</li>
<li>
If any of these validation steps fails, the participant is shown an
error message explaining what went wrong, and no data is extracted.</li>
</ul>
</div>
```
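The three validation steps listed in this template could look roughly like this for a ZIP upload. This is a minimal sketch under stated assumptions, not ddm's implementation: `validate_zip_upload`, `_read_fields` and `ValidationError` are hypothetical names, the file path is treated as a regular expression (the blueprint form exposes a field named `regex_path`), only UTF-8 CSV and JSON (a list of objects) are handled, and settings such as `csv_delimiter` and `json_extraction_root` are ignored.

```python
import csv
import io
import json
import re
import zipfile
from typing import List, Set


class ValidationError(Exception):
    """Hypothetical error whose message would be shown to the participant."""


def _read_fields(raw: str, expected_format: str) -> Set[str]:
    # Step 2: parse the file in the expected format and collect its field names.
    if expected_format == "csv":
        return set(next(csv.reader(io.StringIO(raw)), []))
    if expected_format == "json":
        try:
            records = json.loads(raw)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            raise ValidationError("File is not valid JSON.") from exc
        if not (isinstance(records, list) and records and isinstance(records[0], dict)):
            raise ValidationError("Expected a JSON list of objects.")
        return set(records[0])
    raise ValidationError(f"Unsupported file format: {expected_format}")


def validate_zip_upload(zip_bytes: bytes, path_regex: str, expected_format: str,
                        expected_fields: List[str]) -> str:
    """Run the three checks and return the name of the validated file."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Step 1: find the expected file inside the ZIP via its path pattern.
        matches = [n for n in archive.namelist() if re.fullmatch(path_regex, n)]
        if not matches:
            raise ValidationError("The expected file was not found in the upload.")
        name = matches[0]
        raw = archive.read(name).decode("utf-8")
    fields = _read_fields(raw, expected_format)
    # Step 3: every expected field must be present (exact matching here).
    missing = [f for f in expected_fields if f not in fields]
    if missing:
        raise ValidationError(f"Missing expected fields: {', '.join(missing)}")
    return name


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data/history.json", json.dumps([{"time": 1, "url": "x"}]))
print(validate_zip_upload(buffer.getvalue(), r"data/.*\.json", "json", ["time", "url"]))
# data/history.json
```

Raising on the first failed check mirrors the template's statement that no data is extracted once any validation step fails.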
```diff
@@ -23,7 +23,7 @@

 <div class="ddm-admin-form">
 {% for field in form %}
-{% if field.name not in "expected_fields,regex_path,exp_file_format,csv_delimiter,json_extraction_root" %}
+{% if field.name not in "expected_fields,expected_fields_regex_matching,regex_path,exp_file_format,csv_delimiter,json_extraction_root" %}
 <p>
 {{ field.label_tag }}
 {{ field.errors }}
@@ -34,34 +34,11 @@
 {% endfor %}
 </div>

-<div class="ddm-admin-form">
-<h5>Data Extraction Settings</h5>
-</div>
-
-<div>
-<p>Data Extraction is a two-step process:</p>
-
-<h6><b>1. File Validation</b></h6>
-<p>
-First, it is checked whether the file that you expect is included in the download.
-This means that if the associated File Uploader expects a ZIP Upload, it tries to find the correct
-file according to the <i>file path</i> you defined (this is skipped for single file uploads).
-</p>
-<p>
-Next, it is checked whether the uploaded file has the expected format defined in the <i>Expected File Format</i>
-setting (and other settings, depending on the file format).
-</p>
-<p>
-Lastly, it is checked whether the identified file contains the expected fields
-defined in the <i>Expected Fields</i> setting.<br>
-If any of these validation steps fail, the participant will be shown an
-exception message explaining what went wrong and the file upload and extraction is aborted.
-</p>
-</div>
+{% include "ddm/admin/data_donation/donation_blueprint/block_file_validation.html" %}

 <div class="ddm-admin-form">
 {% for field in form %}
-{% if field.name in "expected_fields,regex_path,exp_file_format,csv_delimiter,json_extraction_root" %}
+{% if field.name in "expected_fields,expected_fields_regex_matching,regex_path,exp_file_format,csv_delimiter,json_extraction_root" %}
 <p>
 {{ field.label_tag }}
 {{ field.errors }}
@@ -72,29 +49,7 @@ <h6><b>1. File Validation</b></h6>
 {% endfor %}
 </div>

-<div>
-<h6><b>2. Data Extraction</b></h6>
-<p>
-For the data extraction, the Data Donation Module follows the data sparsity paradigm.
-This means that the base assumption is, that you do not want any data from your participants,
-and you have to explicitly indicate which data fields you want to have included.
-</p>
-<p>
-To keep data in the data donation, you must define <i>Extraction Rules</i>.<br>
-An Extraction Rule is always related to one field/column in the uploaded data file
-and a data field will only be kept in a participant's donation if it is explicitly
-mentioned in at least one of the extraction rules.
-</p>
-<p>
-An extraction rule can either indicate to just keep a field in the donation
-(by mentioning the field/column in an extraction rule without defining any concrete comparison operator),
-use data contained in a field to delete data entries (i.e., rows) from the donation
-(e.g., to delete all entries where the date is < 01.01.2020) or
-alter the data contained in a field (e.g., anonymize an e-mail address by replacing "[email protected]" with "EMAIL").<br>
-For this, there are several comparison and regex operations available. For the comparison operations, a match
-means that a data entry will be deleted. The rules are applied to the uploaded file in the indicated order.
-</p>
-</div>
+{% include "ddm/admin/data_donation/donation_blueprint/block_data_extraction.html" %}

 <div class="pb-3"><i>You will be able to define the extraction rules once you have saved and created the File blueprint.</i></div>

```
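Both template hunks decide where a form field is rendered with `field.name in "…"` against a single comma-joined string. Django's template `in` operator behaves like Python's `in`, so against a string it performs a substring test rather than list membership. For the field names listed this makes no difference, but a future field whose name happens to be a substring of that string (the names `path` and `name` below are hypothetical) would silently be moved into the extraction-settings block. The Python sketch below only mimics the check; it does not run Django's template engine.

```python
# Hypothetical reproduction of the template test `field.name in "..."`.
# Python's `in` on a str is a substring check; on a list it is exact membership.
FIELDS = (
    "expected_fields,expected_fields_regex_matching,regex_path,"
    "exp_file_format,csv_delimiter,json_extraction_root"
)


def in_comma_string(name: str) -> bool:
    """What the template condition effectively does: substring containment."""
    return name in FIELDS


def in_field_list(name: str) -> bool:
    """Exact membership after splitting the string on commas."""
    return name in FIELDS.split(",")


for name in ["expected_fields_regex_matching", "path", "name"]:
    print(name, in_comma_string(name), in_field_list(name))
# expected_fields_regex_matching True True
# path True False
# name False False
```

One way to get exact matching would be to compare against a list supplied by the view or form instead of a string literal; this is a suggestion, not something this commit does.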