This repo also hold the Official code of Alleviating pseudo-touching in attention U-Net-based binarization approach for the historical Tibetan document images(deprecated) and order prediction in IACC-DAR-AlphX-Code.
This project aims to provide a solution for image segmentation that can be used in many fields, e.g. document binarization. This repo is simple, efficient and flexiable, you can modify anything you want.
Ensure that the following Python packages have been installed:
pip install numpy
pip install torch
pip install torchvision
pip install opencv-python
pip install tensorboard
pip install tqdm
or just pip install the missing package is more than enough.
Set imgs_dir
and masks_dir
to your path.
Here, --input
is the path to the input image and --output
is the path to where the model will write the output.
If you wish to train the model or use your own dataset, follow these steps:
-
Prepare your data as requested.
-
Navigate to the base directory in the terminal and run the following command:
python train.py
Args :
--imgs_dir: Directory of input images
--masks_dir: Directory of GT masks
--dir_checkpoint: Directory to save the checkpoints.
--input_size: Size of input images
--epoch: Number of epochs for training
--batch_size: Batch size for training
--val_percent: Percentage of validation data
--lr: Learning rate for training
--weight_decay: Weight decay factor for training
--momentum: Momentum factor for the optimizer
You can add those args if needed.
python infer.py
Args :
--imgs_dir: The directory where input images are located.
--out_dir: The directory to save the output images after processing.
--model_pth: The path to the trained network model.
--batch_size: The number of images to process in each batch.
--patch_size: The size of each image patch to be processed, recommand 256.
--bitwise_img_size: The size of the images after bitwise operations. We recommend setting this value to as large as possible.
You can add those args if needed.