dusevitch/DL_for_searching_Russian_Records

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 

Repository files navigation

DL_for_searching_Russian_Records

This repo is dedicated to building a deep learning system (still under development) that can identify and search through Russian birth/marriage/death records for names of individuals, and that can serve other translation purposes.

Vision and Background

The vision of this project is to use deep learning and computer vision methods to create a workflow that lets English speakers search Russian family history records for names more easily. After digging through many of these records, I have personally found it difficult to distinguish handwritten characters, and I spend a lot of time going back and forth between handwritten character guides and the record documents. I felt there had to be a faster, more efficient way to search through more records.

Here are some examples of typical birth/marriage/death records, along with the title pages and indices that are sometimes included:

[insert pictures here]

Methods

The records vary widely because there are so many of them, but most use a table structure of bounding boxes to separate the recorded data, and the titles are often in printed text. Detecting table regions and title text is an easier first step, which can lead to further separation of the data for detecting handwritten information.
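As a rough illustration of the table-region step, here is a minimal sketch that finds cells with projection profiles: rows and columns that are almost entirely ink are treated as ruling lines, and the gaps between them become cell boxes. This is not the method used in this repo. The function names (`find_rules`, `detect_cells`), the `min_fill` threshold, and the assumption of an already binarized, deskewed scan are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Image = List[List[int]]          # binary image: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def find_rules(profile: List[int], length: int, min_fill: float) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges of consecutive rows/columns whose
    ink count covers at least `min_fill` of `length` (assumed ruling lines)."""
    runs = []
    start = None
    for index, count in enumerate(profile):
        is_rule = count >= min_fill * length
        if is_rule and start is None:
            start = index
        elif not is_rule and start is not None:
            runs.append((start, index - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


def detect_cells(image: Image, min_fill: float = 0.8) -> List[Box]:
    """Return the boxes enclosed between neighbouring ruling lines."""
    height, width = len(image), len(image[0])
    row_profile = [sum(row) for row in image]
    col_profile = [sum(image[y][x] for y in range(height)) for x in range(width)]
    h_rules = find_rules(row_profile, width, min_fill)
    v_rules = find_rules(col_profile, height, min_fill)

    cells = []
    # Pair each ruling line with the next one; the cell is the gap between them.
    for (_, top_end), (bottom_start, _) in zip(h_rules, h_rules[1:]):
        for (_, left_end), (right_start, _) in zip(v_rules, v_rules[1:]):
            cells.append((top_end + 1, left_end + 1, bottom_start - 1, right_start - 1))
    return cells
```

Real scans are skewed, faded, and have broken ruling lines, so a practical version would likely need deskewing and a more tolerant line detector before a profile like this is useful.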

As of July 2020, recognizing handwritten text is still an open problem in the computer vision/deep learning community, but I believe current methods can still find individual handwritten characters. My starting point is to search for names using the first (typically capital) letter of the surname, based on its placement in or near the appropriate box.
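To make the first-letter idea concrete, here is a minimal sketch that crops a candidate region for the first letter inside a known cell: it takes the leftmost unbroken run of inked columns and tightens the box vertically around it. The function name `first_letter_box`, the `max_width` cap, and the binary-image representation are assumptions for illustration only. Cursive handwriting often joins letters, which is why the sketch allows a width cap rather than trusting the gap between letters.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # binary image: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def first_letter_box(image: Image, cell: Box, max_width: Optional[int] = None) -> Optional[Box]:
    """Return a box around the leftmost run of inked columns inside `cell`,
    or None if the cell contains no ink."""
    top, left, bottom, right = cell

    # Columns inside the cell that contain at least one ink pixel.
    inked = [
        x for x in range(left, right + 1)
        if any(image[y][x] for y in range(top, bottom + 1))
    ]
    if not inked:
        return None

    # Extend the run until the first column gap.
    start = end = inked[0]
    for x in inked[1:]:
        if x != end + 1:
            break
        end = x

    # Optionally cap the width, since joined cursive letters leave no gap.
    if max_width is not None:
        end = min(end, start + max_width - 1)

    # Tighten the box vertically around the ink in the chosen columns.
    rows = [
        y for y in range(top, bottom + 1)
        if any(image[y][x] for x in range(start, end + 1))
    ]
    return (rows[0], start, rows[-1], end)
```

The returned crop would then be passed to whatever character classifier is eventually trained; the sketch only covers locating the candidate region.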

The smaller incremental goals associated with this vision are as follows (in no particular order, although some depend on others):

  • Detecting different regions according to bounding boxes
  • Using OCR to convert printed title text into Russian words, then into English phrases (direct translations, and adjusted translations)
  • Detecting Russian handwritten letters (capital vs lowercase)
  • Using deep learning to classify pages (or regions) as birth, marriage, death, title, or index
  • Detecting words that extend outside their boxes
  • Detecting Russian handwritten capital letters inside of a box (from a search, either name or attending, etc.)
  • Separating different regions of birth/Marriage/Death/title/index parts of pages for analysis?
  • Building an online JavaScript front end to run this system so that it can be integrated with FamilySearch sites

Future work/ideas that may be particularly difficult or time-consuming:

  • Handwritten word detection in Russian
  • Text translation overlay from handwritten text (dependent on above)
  • Extensions to other languages (speakers of other languages searching Russian documents, English speakers searching documents in other languages)
  • Using Russian name prediction software (see other Russian family history sites [insert link here]) to generate the Russian version of English/Polish/other names and then search for it in the documents
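As a simple stand-in for the name prediction idea in the last item, here is a minimal sketch of a greedy Latin-to-Cyrillic transliteration that could produce the initial letter to search for. It is not the name prediction software referred to above. The mapping table and the function name `to_cyrillic` are assumptions for illustration, ambiguous letters such as "y" are deliberately simplified, and historical spelling conventions in older records are not handled.

```python
# Hypothetical, simplified romanization table (longest keys are matched first).
LATIN_TO_CYRILLIC = {
    "shch": "щ",
    "zh": "ж", "kh": "х", "ts": "ц", "ch": "ч", "sh": "ш",
    "yu": "ю", "ya": "я", "yo": "ё", "ye": "е",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "k": "к", "l": "л", "m": "м", "n": "н",
    "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "u": "у",
    "f": "ф", "h": "х", "y": "ы",
}


def to_cyrillic(name: str) -> str:
    """Transliterate a romanized name with greedy longest-match lookup.
    Characters without a mapping are passed through unchanged."""
    text = name.lower()
    result = []
    i = 0
    while i < len(text):
        for size in (4, 2, 1):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in LATIN_TO_CYRILLIC:
                result.append(LATIN_TO_CYRILLIC[chunk])
                i += size
                break
        else:
            # No mapping for this character: keep it as-is.
            result.append(text[i])
            i += 1
    return "".join(result).capitalize()


def search_initial(name: str) -> str:
    """Return the Cyrillic capital letter to look for in a surname box."""
    return to_cyrillic(name)[:1]
```

For example, `search_initial("Zhukov")` gives the capital letter a first-letter detector would need to match in the surname box. A real system would have to generate several candidate spellings per name rather than a single transliteration.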

File Organization

Files are currently organized by the subproject goals listed above. Projects that combine several of them will be added in the future.

Running individual projects

See the individual .readme file in each project for instructions on running it independently.
