A PySpark-based codebase for fetching and formatting metadata from a LIMS database for IGF

imperial-genomics-facility/LimsMetadataParsing

LimsMetadataParsing

A PySpark-based codebase for fetching and formatting metadata from a LIMS database for the Imperial Genomics Facility (IGF)

Set up environment

  • Step 1: Get Miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
  • Step 2: Clone git repo
git clone https://github.com/imperial-genomics-facility/LimsMetadataParsing.git
  • Step 3: Install conda env from the environment.yml file
conda env create -n ENV_NAME --file environment.yml
  • Step 4: Create egg file for LimsMetadataParsing repo (run from the repository root, with the conda env activated)
python setup.py bdist_egg

Get UCanAccess

Download the UCanAccess binary distribution and unzip the contents. The path to the unzipped directory is the value passed to the -j option described under Usage.
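
How the script consumes the unzipped directory is not documented here, so the following is only a sketch of one common pattern: collect every jar under the directory into the comma-separated form that Spark's `spark.jars` setting expects, and build JDBC options for the Access file. The helper names `find_jars` and `ucanaccess_jdbc_options` are hypothetical and not part of this repository; the driver class and URL prefix are the ones UCanAccess itself publishes.

```python
import tempfile
from pathlib import Path


def find_jars(ucanaccess_dir):
    """Return a sorted list of all .jar files under the unzipped directory."""
    return sorted(str(p) for p in Path(ucanaccess_dir).rglob("*.jar"))


def ucanaccess_jdbc_options(access_db_path):
    """Build JDBC options for an Access file (hypothetical helper)."""
    return {
        "url": f"jdbc:ucanaccess://{access_db_path}",
        "driver": "net.ucanaccess.jdbc.UcanaccessDriver",
    }


# Demonstration with placeholder files; real jar names will differ.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lib").mkdir()
    (root / "main.jar").touch()
    (root / "lib" / "dependency.jar").touch()
    jars = find_jars(root)
    # A comma-separated string is the format taken by spark.jars.
    print(",".join(jars))

print(ucanaccess_jdbc_options("/path/Database.accdb"))
```

Searching recursively matters because the dependency jars may sit in a subdirectory rather than next to the main jar.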

Usage

parseAccessDbForMetadata.py [-h] -a ACCESS_DB_PATH -q QUOTE_FILE_PATH
                            -o OUTPUT_PATH -k KNOWN_PROJECTS_LIST
                            -j UCANACCESS_JAR_PATH

optional arguments:
  -h, --help                show this help message and exit
  -a ACCESS_DB_PATH, --access_db_path ACCESS_DB_PATH
                            Path to Access LIMS db
  -q QUOTE_FILE_PATH, --quote_file_path QUOTE_FILE_PATH
                            Path to quote xls file
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                            Output dir path for metadata files
  -k KNOWN_PROJECTS_LIST, --known_projects_list KNOWN_PROJECTS_LIST
                            File containing list of known projects
  -j UCANACCESS_JAR_PATH, --ucanaccess_jar_path UCANACCESS_JAR_PATH
                            Path to ucanaccess jar files
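
The help text above has the shape produced by Python's `argparse`, so the interface can be sketched as follows. This is a reconstruction from the usage text, assuming `argparse` and that every option except `-h` is required; it is not a copy of the script's actual code, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the usage text."""
    parser = argparse.ArgumentParser(prog="parseAccessDbForMetadata.py")
    parser.add_argument("-a", "--access_db_path", required=True,
                        help="Path to Access LIMS db")
    parser.add_argument("-q", "--quote_file_path", required=True,
                        help="Path to quote xls file")
    parser.add_argument("-o", "--output_path", required=True,
                        help="Output dir path for metadata files")
    parser.add_argument("-k", "--known_projects_list", required=True,
                        help="File containing list of known projects")
    parser.add_argument("-j", "--ucanaccess_jar_path", required=True,
                        help="Path to ucanaccess jar files")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args([
    "-a", "/path/Database.accdb",
    "-q", "/path/Quotes.xlsx",
    "-o", "/path/csv_dir",
    "-k", "/path/project_list.csv",
    "-j", "/path/UCanAccess-4.0.4-bin",
])
print(args.access_db_path, args.output_path)
```

Although the help output groups these under "optional arguments", that heading is just how older `argparse` versions label any flag-style option; omitting one that is marked required makes the parser exit with an error.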

Run spark code

spark-submit \
--master local[NUMBER_OF_CPUS] \
--py-files /path/igfLimsParsing-0.0.1-py3.6.egg \
/path/LimsMetadataParsing/scripts/parseAccessDbForMetadata.py \
-a /path/Database.accdb \
-q /path/Quotes.xlsx \
-o /path/csv_dir \
-k /path/project_list.csv \
-j /path/UCanAccess-4.0.4-bin
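
When the job is launched from another Python program, the same command can be assembled as an argument list. The sketch below only rebuilds the command shown above; `build_spark_submit_command` and its parameter names are hypothetical, and the `/path/...` values are the same placeholders used in this README.

```python
import shlex


def build_spark_submit_command(cpus, egg_path, script_path, access_db,
                               quote_file, output_dir, known_projects,
                               ucanaccess_dir):
    """Return the spark-submit invocation as a list of arguments."""
    return [
        "spark-submit",
        "--master", f"local[{cpus}]",   # local mode with a fixed CPU count
        "--py-files", egg_path,         # egg built with setup.py bdist_egg
        script_path,
        "-a", access_db,
        "-q", quote_file,
        "-o", output_dir,
        "-k", known_projects,
        "-j", ucanaccess_dir,
    ]


command = build_spark_submit_command(
    cpus=4,
    egg_path="/path/igfLimsParsing-0.0.1-py3.6.egg",
    script_path="/path/LimsMetadataParsing/scripts/parseAccessDbForMetadata.py",
    access_db="/path/Database.accdb",
    quote_file="/path/Quotes.xlsx",
    output_dir="/path/csv_dir",
    known_projects="/path/project_list.csv",
    ucanaccess_dir="/path/UCanAccess-4.0.4-bin",
)

# Print a shell-safe rendering; the brackets in local[4] get quoted.
print(" ".join(shlex.quote(part) for part in command))
# To actually launch the job: subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell interpretation of the square brackets in `local[4]`, which some shells would otherwise treat as a glob pattern.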
