How does the anonymizer work
The main function that implements the whole service's functionality is find_entities().
You can import find_entities() by typing:
from anonymizer.anonymize import find_entities
The function takes the following arguments:
find_entities(ifile,
              ofile=None,
              method=['strict', "*", "True"],
              patterns_file='patterns.json',
              verbose=False,
              words_array=[],
              quick=False)
- ifile: The path of the input file.
- ofile: The path of the anonymized output file. By default, ofile = ifile + '_anonymized' + (extension_name_of_the_original_file).
- method: The method used for anonymization. It is an array of 3 items:
  - The method name, 'strict'.
  - The character that replaces each anonymized word.
  - Either a number, if the replacement should have a specific length in symbols, or 'True' if the replacement should have the same length as the original word.
- verbose: If True, the service prints all identified entities on the terminal.
- words_array: An array of words to anonymize in the document. The user supplies these words manually, which lets the service cover words it would not identify on its own.
- quick: If True, the service searches only for the words in words_array.
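The default output name and the symbol/length rule described above can be sketched in a few lines. This is only an illustration and not the package's actual code: default_output_path and replacement_for are hypothetical helper names, and the sketch assumes that '_anonymized' is placed between the file name and its extension (testfile.odt becomes testfile_anonymized.odt).

```python
import os


def default_output_path(ifile):
    """Hypothetical helper: default output path for a given input path."""
    # Assumption: the suffix goes between the stem and the extension.
    stem, extension = os.path.splitext(ifile)
    return stem + "_anonymized" + extension


def replacement_for(entity, symbol="*", length="True"):
    """Hypothetical helper: the string that would replace one entity."""
    # 'True' (or the boolean True) keeps the length of the original word.
    if str(length) == "True":
        return symbol * len(entity)
    # Otherwise the symbol is repeated a fixed number of times.
    return symbol * int(length)


print(default_output_path("testfile.odt"))     # testfile_anonymized.odt
print(replacement_for("Athens", "*", "True"))  # ******
print(replacement_for("Athens", "#", 3))       # ###
```

The sketch ignores the alignment and formatting concerns a real document format such as .odt would raise.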
python3 -m anonymizer
-i <inputfile>
-o <outputfile>
-f <folder>
-m <method_used(s,strict)/symbol/(length==length_of_word)>
-p <patterns.json>
-v <verbose_mode>
-w <string of words separated by commas>
-q <quick_mode>
python3 -m anonymizer -i testfile.odt -o testfile_anonymized.odt -m s/*/True -p anonymizer/patterns.json
- i: Specify the input's file path.
- o: Specify the output's file path.
- f: Specify a folder's path. If set, the module will anonymize all .txt and .odt files in the folder.
- m: Specify the anonymization method as three values separated by slashes (for example s/*/True):
  - method: Strict method.
  - symbol: Specify the symbol that will replace sensitive information.
  - length: If True, the length is set to len(entity). If a number n is given instead, each entity is replaced with the symbol repeated n times, always respecting the original alignment/format of the text.
- p: Specify the patterns file. A default patterns file is provided in anonymizer/patterns.json.
- v: If typed, verbose mode is on. In verbose mode all identified entities are printed on the console.
- w: If typed, the parser expects a string of words separated by commas (,). Each word is anonymized in the text, so the user can anonymize words that were not identified by default.
- q: If typed, quick mode is on. In quick mode the service searches only for the entities given by the user through the -w input. This is useful when a text has already been parsed and the user wants to anonymize additional entities.
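As a rough sketch of how the -m and -w values could map onto the method and words_array arguments of find_entities(), the snippet below splits both strings by hand. parse_method and parse_words are hypothetical names used only for illustration; the module's real argument parsing may differ, and the sketch assumes the replacement symbol is not itself a slash.

```python
def parse_method(value):
    """Hypothetical helper: turn 's/*/True' into ['strict', '*', 'True']."""
    # Assumption: exactly three slash-separated parts, symbol is not '/'.
    name, symbol, length = value.split("/")
    # The usage string accepts both 's' and 'strict' for the method name.
    if name == "s":
        name = "strict"
    return [name, symbol, length]


def parse_words(value):
    """Hypothetical helper: turn 'Alice, Bob' into ['Alice', 'Bob']."""
    # Strip surrounding spaces and drop empty items (e.g. a trailing comma).
    return [word.strip() for word in value.split(",") if word.strip()]


method = parse_method("s/*/True")
words = parse_words("Alice, Bob,")
print(method)  # ['strict', '*', 'True']
print(words)   # ['Alice', 'Bob']

# With the package installed, these values would be passed along as:
# find_entities("testfile.odt", method=method, words_array=words, quick=True)
```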