Fast EML Parser

Purpose: Parse EML files and extract attachments How to run:

Go to Gmail and download the emails you want to parse - in mbox format
Update the constants in mail_parse[fuzzy].py with the path to the mbox file and the client dictionary
Run the python script with the command python mail_parse[fuzzy].py

you can run the script with fuzzy matching by using the mail_parse_fuzzy.py script
this will increase the number of false positives, but will also increase the number of matches

Features

Include OCR support for documents (PDF and JPEG)
Iterate over emails once and check for inclusion in the client dictionary by key
Include search functionality by ID number
Create a client list dictionary with names and IDs (as identifiers)
Include IDs in output folder names

Server setup

ssh-keygen -t rsa
# upload the key to GitHub deploy keys
git clone git@REPO_URL
apt update
apt install -y python3-pip tesseract-ocr tesseract-ocr-heb poppler-utils

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh

conda create -n eml_parse python=3.10.3
conda activate eml_parse

pip3 install pytesseract pdf2image pillow fast-mail-parser fuzzywuzzy futures tqdm

mkdir -p "/tmp/output/"
mkdir -p "/tmp/fast_eml_parst/test/"

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
client_list.py		client_list.py
eml_name_change.py		eml_name_change.py
fuzzy.py		fuzzy.py
mail_parse.py		mail_parse.py
mail_parse_fuzzy.py		mail_parse_fuzzy.py
mbox_split.py		mbox_split.py
ocr_test.py		ocr_test.py
remove_duplicates.py		remove_duplicates.py
rent_agreement_find.py		rent_agreement_find.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fast EML Parser

Features

Server setup

About

Releases

Packages

Languages

dvir1994/fast_eml_parser

Folders and files

Latest commit

History

Repository files navigation

Fast EML Parser

Features

Server setup

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages