Merge pull request #385 from ENCODE-DCC/PIPE-77_shorten-conda-env-name
Pipe 77 shorten conda env name
leepc12 authored Jun 13, 2022
2 parents 3260451 + f70500e commit b3c6564
Showing 8 changed files with 70 additions and 78 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -18,7 +18,7 @@ make_tag: &make_tag
 commands:
   install_python3_caper_gcs:
     description: "Install py3, caper and gcs. Set py3 as default python."
-    steps:
+    steps:
       - run:
           command: |
             sudo apt-get update && sudo apt-get install software-properties-common git wget curl -y
81 changes: 38 additions & 43 deletions README.md
@@ -3,36 +3,17 @@
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.156534.svg)](https://doi.org/10.5281/zenodo.156534)[![CircleCI](https://circleci.com/gh/ENCODE-DCC/atac-seq-pipeline/tree/master.svg?style=svg)](https://circleci.com/gh/ENCODE-DCC/atac-seq-pipeline/tree/master)
 
 
-## Updated genome TSV files (v3 -> v4)
-
-## Download new Caper>=2.1
-
-New Caper is out. You need to update your Caper to work with the latest ENCODE ATAC-seq pipeline.
-```bash
-$ pip install caper --upgrade
-```
-
-## Local/HPC users and new Caper>=2.1
-
-There are tons of changes for local/HPC backends: `local`, `slurm`, `sge`, `pbs` and `lsf`(added). Make a backup of your current Caper configuration file `~/.caper/default.conf` and run `caper init`. Local/HPC users need to reset/initialize Caper's configuration file according to your chosen backend. Edit the configuration file and follow instructions in there.
-```bash
-$ cd ~/.caper
-$ cp default.conf default.conf.bak
-$ caper init [YOUR_BACKEND]
-```
-
-In order to run a pipeline, you need to add one of the following flags to specify the environment to run each task within. i.e. `--conda`, `--singularity` and `--docker`. These flags are not required for cloud backend users (`aws` and `gcp`).
-```bash
-# for example
-$ caper run ... --singularity
-```
-
-For Conda users, **RE-INSTALL PIPELINE'S CONDA ENVIRONMENT AND DO NOT ACTIVATE CONDA ENVIRONMENT BEFORE RUNNING PIPELINES**. Caper will internally call `conda run -n ENV_NAME CROMWELL_JOB_SCRIPT`. Just make sure that pipeline's new Conda environments are correctly installed.
+## Conda environment name change (since v2.2.0 or 6/13/2022)
+
+The pipeline's Conda environment names have been shortened to work around the following error:
+```
+PaddingError: Placeholder of length '80' too short in package /XXXXXXXXXXX/miniconda3/envs/
+```
+
+You need to reinstall the pipeline's Conda environments. It's recommended to do this for every version update.
 ```bash
-$ scripts/uninstall_conda_env.sh
-$ scripts/install_conda_env.sh
+$ bash scripts/uninstall_conda_env.sh
+$ bash scripts/install_conda_env.sh
 ```
 
 ## Introduction
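After reinstalling, a quick way to confirm that the renamed environments are in place is to list them. This is a minimal check, assuming a standard Miniconda3 setup; the paths in the sample output are illustrative:

```bash
# list the pipeline's renamed Conda environments
$ conda env list | grep encd-atac
encd-atac                /home/user/miniconda3/envs/encd-atac
encd-atac-macs2          /home/user/miniconda3/envs/encd-atac-macs2
encd-atac-py2            /home/user/miniconda3/envs/encd-atac-py2
encd-atac-spp            /home/user/miniconda3/envs/encd-atac-spp
```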
@@ -51,31 +32,44 @@ The ATAC-seq pipeline protocol specification is [here](https://docs.google.com/d
 
 1) Make sure that you have Python>=3.6. Caper does not work with Python2. Install Caper and check its version >=2.0.
 ```bash
 $ python --version
 $ pip install caper
+
+# use caper version >= 2.3.0 for a new HPC feature (caper hpc submit/list/abort).
 $ caper -v
 ```
-2) Make a backup of your Caper configuration file `~/.caper/default.conf` if you are upgrading from old Caper(<2.0.0). Reset/initialize Caper's configuration file. Read Caper's [README](https://github.com/ENCODE-DCC/caper/blob/master/README.md) carefully to choose a backend for your system. Follow the instruction in the configuration file.
+2) Read Caper's [README](https://github.com/ENCODE-DCC/caper/blob/master/README.md) carefully to choose a backend for your system. Follow the instructions in the configuration file.
 ```bash
-# make a backup of ~/.caper/default.conf if you already have it
+# this will overwrite the existing conf file ~/.caper/default.conf
+# make a backup of it first if needed
 $ caper init [YOUR_BACKEND]
 
-# then edit ~/.caper/default.conf
+# edit the conf file
 $ vi ~/.caper/default.conf
 ```
 
 3) Git clone this pipeline.
 > **IMPORTANT**: use `~/atac-seq-pipeline/atac.wdl` as `[WDL]` in Caper's documentation.
 
 ```bash
 $ cd
 $ git clone https://github.com/ENCODE-DCC/atac-seq-pipeline
 ```
 
-4) (Optional for Conda users) Install pipeline's Conda environments if you don't have Singularity or Docker installed on your system. We recommend to use Singularity instead of Conda. If you don't have Conda on your system, install [Miniconda3](https://docs.conda.io/en/latest/miniconda.html).
+4) (Optional for Conda) **DO NOT USE A SHARED CONDA. INSTALL YOUR OWN [MINICONDA3](https://docs.conda.io/en/latest/miniconda.html) AND USE IT.** Install the pipeline's Conda environments if you don't have Singularity or Docker installed on your system. We recommend using Singularity instead of Conda.
 ```bash
+# check if you have Singularity on your system, if so then it's not recommended to use Conda
+$ singularity --version
+
+# check if you are not using a shared conda, if so then delete it or remove it from your PATH
+$ which conda
+
 # change directory to pipeline's git repo
 $ cd atac-seq-pipeline
-# uninstall old environments (<2.0.0)
+
+# uninstall old environments
 $ bash scripts/uninstall_conda_env.sh
+
+# install new envs, you need to run this for every pipeline version update.
+# it may be killed if you run this command line on a login node.
+# it's recommended to make an interactive node and run it there.
 $ bash scripts/install_conda_env.sh
 ```
 
@@ -96,22 +90,23 @@ You can use URIs(`s3://`, `gs://` and `http(s)://`) in Caper's command lines and
 
 According to your chosen platform of Caper, run Caper or submit Caper command line to the cluster. You can choose other environments like `--singularity` or `--docker` instead of `--conda`. But you must define one of the environments.
 
-The followings are just examples. Please read [Caper's README](https://github.com/ENCODE-DCC/caper) very carefully to find an actual working command line for your chosen platform.
+PLEASE READ [CAPER'S README](https://github.com/ENCODE-DCC/caper) VERY CAREFULLY BEFORE RUNNING ANY PIPELINES. YOU WILL NEED TO CORRECTLY CONFIGURE CAPER FIRST. These are just example command lines.
 
 ```bash
-# Run it locally with Conda (You don't need to activate it, make sure to install Conda envs first)
+# Run it locally with Conda (DO NOT ACTIVATE PIPELINE'S CONDA ENVIRONMENT)
 $ caper run atac.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json --conda
 
-# Or submit it as a leader job (with long/enough resources) to SLURM (Stanford Sherlock) with Singularity
-# It will fail if you directly run the leader job on login nodes
-$ sbatch -p [SLURM_PARTITON] -J [WORKFLOW_NAME] --export=ALL --mem 4G -t 4-0 --wrap "caper run atac.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json --singularity"
+# On HPC, submit it as a leader job to SLURM with Singularity
+$ caper hpc submit atac.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME
 
-# Check status of your leader job
-$ squeue -u $USER | grep [WORKFLOW_NAME]
+# Check job ID and status of your leader jobs
+$ caper hpc list
 
 # Cancel the leader node to close all of its children jobs
-$ scancel -j [JOB_ID]
+# If you directly use cluster command like scancel or qdel then
+# child jobs will not be terminated
+$ caper hpc abort [JOB_ID]
 ```
 
 ## Running and sharing on Truwl
 
30 changes: 15 additions & 15 deletions atac.wdl
@@ -7,10 +7,10 @@ struct RuntimeEnvironment {
 }
 
 workflow atac {
-    String pipeline_ver = 'v2.1.3'
+    String pipeline_ver = 'v2.2.0'
 
     meta {
-        version: 'v2.1.3'
+        version: 'v2.2.0'
 
         author: 'Jin wook Lee'
         email: '[email protected]'
@@ -19,9 +19,9 @@ workflow atac {
 
         specification_document: 'https://docs.google.com/document/d/1f0Cm4vRyDQDu0bMehHD7P7KOMxTOP-HiNoIvL1VcBt8/edit?usp=sharing'
 
-        default_docker: 'encodedcc/atac-seq-pipeline:v2.1.3'
-        default_singularity: 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/atac-seq-pipeline_v2.1.3.sif'
-        default_conda: 'encode-atac-seq-pipeline'
+        default_docker: 'encodedcc/atac-seq-pipeline:v2.2.0'
+        default_singularity: 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/atac-seq-pipeline_v2.2.0.sif'
+        default_conda: 'encd-atac'
         croo_out_def: 'https://storage.googleapis.com/encode-pipeline-output-definition/atac.croo.v5.json'
 
         parameter_group: {
@@ -72,12 +72,12 @@ workflow atac {
     }
     input {
         # group: runtime_environment
-        String docker = 'encodedcc/atac-seq-pipeline:v2.1.3'
-        String singularity = 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/atac-seq-pipeline_v2.1.3.sif'
-        String conda = 'encode-atac-seq-pipeline'
-        String conda_macs2 = 'encode-atac-seq-pipeline-macs2'
-        String conda_spp = 'encode-atac-seq-pipeline-spp'
-        String conda_python2 = 'encode-atac-seq-pipeline-python2'
+        String docker = 'encodedcc/atac-seq-pipeline:v2.2.0'
+        String singularity = 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/atac-seq-pipeline_v2.2.0.sif'
+        String conda = 'encd-atac'
+        String conda_macs2 = 'encd-atac-macs2'
+        String conda_spp = 'encd-atac-spp'
+        String conda_python2 = 'encd-atac-py2'
 
         # group: pipeline_metadata
         String title = 'Untitled'
@@ -255,22 +255,22 @@ workflow atac {
         conda: {
             description: 'Default Conda environment name to run WDL tasks. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline'
+            example: 'encd-atac'
         }
         conda_macs2: {
             description: 'Conda environment name for task macs2. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline-macs2'
+            example: 'encd-atac-macs2'
         }
         conda_spp: {
             description: 'Conda environment name for tasks spp/xcor. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline-spp'
+            example: 'encd-atac-spp'
         }
         conda_python2: {
             description: 'Conda environment name for tasks with python2 wrappers (tss_enrich). For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline-python2'
+            example: 'encd-atac-py2'
         }
         title: {
             description: 'Experiment title.',
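Because these environment names are ordinary workflow inputs, they can also be overridden from the input JSON instead of editing `atac.wdl`. A hedged sketch: the `atac.conda*` keys follow from the `input {}` block above, while the file name is illustrative and a real input JSON would also carry the usual sample definitions:

```bash
# add these keys to your input JSON to point at custom Conda env names
$ cat input.json
{
    "atac.conda": "encd-atac",
    "atac.conda_macs2": "encd-atac-macs2",
    "atac.conda_spp": "encd-atac-spp",
    "atac.conda_python2": "encd-atac-py2"
}
$ caper run atac.wdl -i input.json --conda
```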
8 changes: 2 additions & 6 deletions docs/build_genome_database.md
@@ -8,11 +8,7 @@
 
 # How to build genome database
 
-1. [Install Conda](https://conda.io/miniconda.html). Skip this if you already have equivalent Conda alternatives (Anaconda Python). Download and run the [installer](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh). Agree to the license term by typing `yes`. It will ask you about the installation location. On Stanford clusters (Sherlock and SCG4), we recommend to install it outside of your `$HOME` directory since its filesystem is slow and has very limited space. At the end of the installation, choose `yes` to add Miniconda's binary to `$PATH` in your BASH startup script.
-```bash
-$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
-$ bash Miniconda3-latest-Linux-x86_64.sh
-```
+1. [Install Conda](https://conda.io/miniconda.html).
 
 2. Install pipeline's Conda environment.
 ```bash
@@ -22,7 +18,7 @@
 3. Choose `GENOME` from `hg19`, `hg38`, `mm9` and `mm10` and specify a destination directory. This will take several hours. We recommend not to run this installer on a login node of your cluster. It will take >8GB memory and >2h time.
 ```bash
-$ conda activate encode-atac-seq-pipeline
+$ conda activate encd-atac
 $ bash scripts/build_genome_data.sh [GENOME] [DESTINATION_DIR]
 ```
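For example, building the `hg38` database into a scratch directory looks like the following; the destination path is illustrative, and `hg38` is one of the supported `GENOME` values listed in step 3:

```bash
$ conda activate encd-atac
$ bash scripts/build_genome_data.sh hg38 /path/to/genome_data
```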
10 changes: 5 additions & 5 deletions scripts/install_conda_env.sh
@@ -5,23 +5,23 @@ SH_SCRIPT_DIR=$(cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd)
 
 echo "$(date): Installing pipeline's Conda environments..."
 
-conda create -n encode-atac-seq-pipeline --file ${SH_SCRIPT_DIR}/requirements.txt \
+conda create -n encd-atac --file ${SH_SCRIPT_DIR}/requirements.txt \
   --override-channels -c bioconda -c defaults -y
 
-conda create -n encode-atac-seq-pipeline-macs2 --file ${SH_SCRIPT_DIR}/requirements.macs2.txt \
+conda create -n encd-atac-macs2 --file ${SH_SCRIPT_DIR}/requirements.macs2.txt \
   --override-channels -c bioconda -c defaults -y
 
-conda create -n encode-atac-seq-pipeline-spp --file ${SH_SCRIPT_DIR}/requirements.spp.txt \
+conda create -n encd-atac-spp --file ${SH_SCRIPT_DIR}/requirements.spp.txt \
   --override-channels -c r -c bioconda -c defaults -y
 
 # adhoc fix for the following issues:
 # - https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/259
 # - https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/265
 # force-install readline 6.2, ncurses 5.9 from conda-forge (ignoring dependencies)
-conda install -n encode-atac-seq-pipeline-spp --no-deps --no-update-deps -y \
+conda install -n encd-atac-spp --no-deps --no-update-deps -y \
   readline==6.2 ncurses==5.9 -c conda-forge
 
-conda create -n encode-atac-seq-pipeline-python2 --file ${SH_SCRIPT_DIR}/requirements.python2.txt \
+conda create -n encd-atac-py2 --file ${SH_SCRIPT_DIR}/requirements.python2.txt \
   --override-channels -c conda-forge -c bioconda -c defaults -y
 
 echo "$(date): Done successfully."
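The README recommends running this installer on an interactive node rather than a login node, since the Conda solves can be killed there. On a SLURM cluster that could look like the sketch below; partition name, memory, and time limit are site-specific placeholders:

```bash
# request an interactive shell with enough memory for the Conda installs
$ srun -p YOUR_PARTITION --mem 8G -t 2:00:00 --pty bash

# then run the installer from the pipeline's git repo
$ cd atac-seq-pipeline
$ bash scripts/install_conda_env.sh
```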
1 change: 1 addition & 0 deletions scripts/requirements.python2.txt
@@ -6,6 +6,7 @@ python ==2.7.16
 biopython ==1.76
 metaseq ==0.5.6
 samtools ==1.9
+gffutils ==0.10.1 # 0.11.0 is not py2 compatible
 
 python-dateutil ==2.8.0
 grep
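Since the py2 environment now pins `gffutils ==0.10.1`, the pin can be verified after installation with `conda run`, the same mechanism Caper uses to invoke tasks. A minimal check, assuming the environments were installed with the script above:

```bash
# should print 0.10.1; a newer version would indicate a non-py2-compatible resolution
$ conda run -n encd-atac-py2 python -c "import gffutils; print(gffutils.__version__)"
0.10.1
```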
8 changes: 4 additions & 4 deletions scripts/uninstall_conda_env.sh
@@ -1,10 +1,10 @@
 #!/bin/bash
 
 PIPELINE_CONDA_ENVS=(
-  encode-atac-seq-pipeline
-  encode-atac-seq-pipeline-macs2
-  encode-atac-seq-pipeline-spp
-  encode-atac-seq-pipeline-python2
+  encd-atac
+  encd-atac-macs2
+  encd-atac-spp
+  encd-atac-py2
 )
 for PIPELINE_CONDA_ENV in "${PIPELINE_CONDA_ENVS[@]}"
 do
8 changes: 4 additions & 4 deletions scripts/update_conda_env.sh
@@ -5,10 +5,10 @@ SH_SCRIPT_DIR=$(cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd)
 SRC_DIR=${SH_SCRIPT_DIR}/../src
 
 PIPELINE_CONDA_ENVS=(
-  encode-atac-seq-pipeline
-  encode-atac-seq-pipeline-macs2
-  encode-atac-seq-pipeline-spp
-  encode-atac-seq-pipeline-python2
+  encd-atac
+  encd-atac-macs2
+  encd-atac-spp
+  encd-atac-py2
 )
 chmod u+rx ${SRC_DIR}/*.py
