Patterns and Regular Expressions
Dimitris Katsiros edited this page Aug 25, 2019 · 1 revision
The anonymizer service uses regular expressions to identify sensitive information in the text.
Each regular expression is considered a pattern.
All patterns are stored in the patterns.json file.
The anonymizer already ships with tested patterns that try to identify the following entities:
- Phone Numbers
- Vehicle Numbers
- Identity Cards
- IBANs
- AFM
- AMKA
- Brands
- Addresses
- Known Addresses
- Names and Surnames
- Places
- Decision Numbers
You can turn off a pattern by setting "active": "False".
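As a rough illustration of how the "active" flag could be honored, here is a minimal sketch. It assumes every top-level entry in patterns.json has the same shape as the custom_regex entry described on this page (an "active" string and a "pattern" object). The function name active_patterns, the entry name phone_numbers, and the sample regexes are hypothetical and are not taken from the service's code.

```python
import json

# Hypothetical sample; the real patterns.json may use different names and keys.
SAMPLE = r'''
{
  "phone_numbers": {"active": "False", "pattern": {"phone": "\\d{10}"}},
  "custom_regex": {"active": "True", "pattern": {"digits": "\\d+"}}
}
'''


def active_patterns(config):
    """Collect {name: regex} from every entry whose "active" flag is "True"."""
    result = {}
    for entry in config.values():
        # The flag is the string "True" or "False", not a JSON boolean.
        if entry.get("active") == "True":
            result.update(entry.get("pattern", {}))
    return result


config = json.loads(SAMPLE)
enabled = active_patterns(config)
print(enabled)
```

Note that the flag is compared as a string, so a JSON boolean such as true would not enable an entry in this sketch.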
You can always add your own regular expressions by turning on custom_regex.
After you set "active": "True", the parser will search for every pattern listed inside "pattern": {}.
For example, to anonymize email addresses, you can add:
},
"custom_regex": {
    "active": "True",
    "pattern": {
        "my_custom_email_pattern": "(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$)"
    }
}
When the service runs in verbose mode, every entity matched by this pattern is reported under the name my_custom_email_pattern and is anonymized accordingly.
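Before adding a custom pattern, it can help to try it out on its own. The sketch below uses only Python's standard json and re modules and does not call the anonymizer itself. It illustrates two points: a backslash inside a JSON string has to be doubled (\\.) for the file to parse, and the ^ and $ anchors make the email pattern match only when the whole string is an address. How the service compiles patterns (for example, with which flags) is not stated here, so the unanchored variant is a hypothetical alternative and not the project's recommendation.

```python
import json
import re

# JSON fragment with the backslash doubled, as JSON string escaping requires.
fragment = r'''
{
  "custom_regex": {
    "active": "True",
    "pattern": {
      "my_custom_email_pattern": "(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$)"
    }
  }
}
'''
config = json.loads(fragment)
anchored = config["custom_regex"]["pattern"]["my_custom_email_pattern"]

# With ^ and $ (and no re.MULTILINE), only a string that is entirely
# an email address matches.
whole = re.search(anchored, "jane.doe@example.com")
embedded = re.search(anchored, "Contact jane.doe@example.com today")

# Hypothetical unanchored variant for finding addresses inside longer text.
unanchored = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
found = re.findall(unanchored, "Contact jane.doe@example.com today")

print(whole is not None, embedded is not None, found)
```

If your input contains addresses embedded in running text, checking the pattern this way first shows whether the anchors need to be dropped.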