This repository has been archived by the owner on Jul 16, 2024. It is now read-only.

Remove obsolete scripts - backup runfolder now in automated_scripts, … #42

Merged: 3 commits, Jun 7, 2024
2 changes: 0 additions & 2 deletions .gitignore
@@ -1,4 +1,2 @@
*.pyc
wscleaner/wscleaner/config.json
wscleaner/test/test_dir*.txt
wscleaner/test/data
107 changes: 51 additions & 56 deletions README.md
@@ -1,75 +1,70 @@
# Workstation Housekeeping v1.11

Scripts to manage data on the NGS workstation

---

## backup_runfolder.py

Uploads an Illumina runfolder to DNAnexus.

### Usage

```bash
backup_runfolder.py [-h] -i RUNFOLDER [-a AUTH_TOKEN] [--ignore IGNORE] [-p PROJECT] [--logpath LOGPATH]
```

### What are the dependencies for this script?
## Workstation Cleaner (wscleaner)

This tool requires the DNAnexus utilities `ua` (upload agent) and `dx` (DNAnexus toolkit) to be available in the system PATH. Python3 is required, and this tool uses packages from the standard library.
Workstation Cleaner (wscleaner) deletes local directories that have been uploaded to the DNAnexus cloud storage service.

### How does this tool work?
When executed, wscleaner deletes runfolders in the input (root) directory that meet the following criteria:

* The script parses the input parameters, asserting that the given runfolder exists.
* If the `-p` option is given, the script attempts to find a matching DNAnexus project. Otherwise, it looks for a single project matching the runfolder name. If zero or more than one project matches, the script logs an error and exits.
* The runfolder is traversed and a list of files in each folder is obtained. If any of the comma-separated strings passed to the `--ignore` argument is present within the filepath or filename, the file is excluded.
* A single DNAnexus project is found matching the runfolder name
* All local FASTQ files are uploaded and in a 'closed' state
* X logfiles are present in the DNAnexus project's /Logfiles directory (X can be set with the `--logfile-count` command line argument; the default is 5)

* The DNAnexus `ua` utility is used to upload files in batches of 100 at a time. The number of upload tries is set to 100 with the `--tries` flag.
* Orthogonal tests are performed, based on:
  * A count of files that should be uploaded (using the ignore terms if provided)
  * A count of files in the DNAnexus project
  * (If relevant) A count of files in the DNAnexus project containing a pattern to be ignored. NB this may not be accurate if the ignore term is found in the result of `dx find data` (e.g. present in the project name)
* Logs from this and the script are written to a logfile, named "runfolder_backup_runfolder.log". A destination for this file can be passed to the `--logpath` flag.
Alternatively, a runfolder is deleted if the run is identified as a TSO500 run, based on:
* The bcl2fastq2_output.log file created by the automated scripts, AND
* The presence of `_TSO` in the human-readable DNAnexus project name
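
The deletion criteria above can be sketched as a small decision function. This is an illustration only, not the wscleaner implementation: `RunFolderStatus`, its field names, `is_tso500` and `can_delete` are all hypothetical, and the sketch assumes the checks against DNAnexus have already been made and summarised. It also assumes "X logfiles are present" means "at least X", and that the TSO500 route needs no further checks, neither of which the README states explicitly.

```python
from dataclasses import dataclass


@dataclass
class RunFolderStatus:
    """Hypothetical summary of one runfolder; every field name is illustrative."""
    matching_projects: int       # DNAnexus projects matching the runfolder name
    local_fastqs: int            # FASTQ files found in the local runfolder
    closed_remote_fastqs: int    # local FASTQs that are uploaded and 'closed'
    remote_logfiles: int         # files in the project's /Logfiles directory
    has_tso_bcl2fastq_log: bool  # bcl2fastq2_output.log marks the run as TSO500
    project_name: str            # human-readable DNAnexus project name


def is_tso500(status: RunFolderStatus) -> bool:
    """Both TSO500 conditions must hold (the README joins them with AND)."""
    return status.has_tso_bcl2fastq_log and "_TSO" in status.project_name


def can_delete(status: RunFolderStatus, logfile_count: int = 5) -> bool:
    """Return True if the runfolder meets the documented deletion criteria."""
    if is_tso500(status):
        return True
    return (
        status.matching_projects == 1
        and status.local_fastqs == status.closed_remote_fastqs
        # Assumption: "X logfiles are present" is read as "at least X".
        and status.remote_logfiles >= logfile_count
    )
```

Keeping the decision separate from the DNAnexus queries makes it possible to test the rules without network access or an API key.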

---
A DNAnexus API key must be cached locally using the `--set-key` option.

## findfastqs.sh
## Workstation Environment
The directory `env/` in this repository contains conda environment scripts for the workstation. These remove conflicts in the PYTHONPATH environment variable by editing the variable when conda is activated. The conda documentation describes where to place these scripts under ['saving environment variables'](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#macos-and-linux).

Report the number of gzipped fastq files in an Illumina runfolder.
## Install
As described above, two environments exist on the workstation: wscleaner and wscleaner_test (for development work).
Activate the relevant environment before installing with pip (as below).

### Usage

```bash
$ findfastqs.sh RUNFOLDER
>>> RUNFOLDER has 156 demultiplexed fastq files with 2 undetermined. Total: 158
git clone https://github.com/moka-guys/workstation_housekeeping.git
pip install workstation_housekeeping/wscleaner
wscleaner --version # Print version number
```

---

## Workstation Cleaner (wscleaner)

Delete local directories that have been uploaded to the DNAnexus cloud storage service.
See wscleaner readme for more info

## ngrok_start.sh

Allow SSH access to the system by running ngrok as a background process. As of v1.11 supports dockerised ngrok instance.

### Installation

See knowledge base article for ngrok installation.
## Automated usage
The script `wscleaner_command.sh` is called by the crontab. It activates the environment and passes the logfile path (and any other non-default arguments).
A development command script, `wscleaner_command_dev.sh`, can be used to call the test environment and provide testing arguments, e.g. `--dry-run`.

### Usage

Non-dockerised ngrok:
## Manual Usage

`sudo bash ngrok_start.sh`

Dockerised ngrok:
```
usage: wscleaner [-h] [--auth AUTH] [--dry-run] [--logfile LOGFILE]
[--min-age MIN_AGE] [--logfile-count LOGFILE_COUNT]
[--version]
root

positional arguments:
root A directory containing runfolders to process

optional arguments:
-h, --help show this help message and exit
--auth AUTH A text file containing the DNANexus authentication
token
--dry-run Perform a dry run without deleting files
--logfile LOGFILE A path for the application logfile
--min-age MIN_AGE The age (days) a runfolder must be to be deleted
--logfile-count LOGFILE_COUNT
The number of logfiles a runfolder must have in
/Logfiles
--version Print version
```
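
For readers who want to see how a command line like the one above is typically declared, here is a minimal `argparse` sketch that produces the same set of options. It is not taken from the wscleaner source: the function name `build_parser`, the `int` types, the absence of a default for `--min-age` and the version string are assumptions. The default of 5 for `--logfile-count` comes from the README text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented wscleaner usage (illustrative)."""
    parser = argparse.ArgumentParser(prog="wscleaner")
    parser.add_argument("root", help="A directory containing runfolders to process")
    parser.add_argument(
        "--auth", help="A text file containing the DNAnexus authentication token"
    )
    parser.add_argument(
        "--dry-run", action="store_true", help="Perform a dry run without deleting files"
    )
    parser.add_argument("--logfile", help="A path for the application logfile")
    # Assumption: ages are whole days; the real default is not given in the README.
    parser.add_argument(
        "--min-age", type=int, help="The age (days) a runfolder must be to be deleted"
    )
    parser.add_argument(
        "--logfile-count",
        type=int,
        default=5,  # default stated in the README
        help="The number of logfiles a runfolder must have in /Logfiles",
    )
    # Placeholder version string; the real value is not shown in the usage text.
    parser.add_argument("--version", action="version", version="%(prog)s (placeholder)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, `--dry-run` becomes the attribute `dry_run` and `--logfile-count` becomes `logfile_count`, because `argparse` converts hyphens in option names to underscores.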

`sudo bash ngrok_start.sh docker`
## Test

### output
```bash
# Run from the cloned repo directory after installation
pytest . --auth_token DNA_NEXUS_KEY
```

The script will output the ngrok connection details
## License

Developed by Synnovis Genome Informatics