Project plan

Paper

Multi-omics reveal the lifestyle of the acidophilic, mineral-oxidizing model species Leptospirillum ferriphilum T

A new sequenced genome of L.ferriphilum type strain (species described for the first time) was presented
RNA sequencing in continuous culture and batch bioleaching culture elucidated adaptations to mineral environment
- Bioleaching: extraction of metals from ores through organisms, techniques that present sustainable and environmentally friendly alternatives to traditional methods, recovery of copper from CuFeS2

Plan

Overview

Goals

Reproduce the work of the paper, and answer additional questions

De novo assembly of long reads obtained by PacBio SMRT long-read sequencing
Analysis of the functional omics, identification of genes for various capabilities
Differential expression analyses of cells grown in different cultures
Comparison with other species

Hypothesis

Similar results to the original paper will be obtained

Sample

Leptospirillum ferriphilum type strain, an iron oxidizing bacteria

Data

PacBio DNA long-read sequencing data, in fastq format
Illumina RNA paired end sequencing data, from 5 samples, each with 2 technical replicates, in fastq format
- 3 samples from continuous cultures using ferrous iron as the substrate
- 2 samples from batch, mineral, bioleaching cultures containing chalcopyrite (CuFeS2)

Analyses and softwares

Basic

Genome assembly of PacBio reads
- Canu ~ 11.5 hours
Assembly quality assessment
- Quast
- MUMmerplot
Structural and functional annotation
- Prokka
- eggNOGmapper
Synteny comparison with a closely related genome
- blast
- Artemis
RNA-seq reads preprocessing: trimming + quality check (before and after)
- FastQC
- Trimmomatic ~ 15min per file
Mapping and counting RNA-seq reads, and analysing differential expression
- BWA ~ 5 hours
- IGV
- HTSeq ~ 8 hours
- DESeq2
Biological interpretation of the results

Extra

Analysis of metabolic and other functional capabilities
Comparative genomics: comparison of genes in common with 1 or more species
Identification of other sequences within the genome (promoters, repeats, mobile elements, …)
Deeper analysis of the differential expression results: e.g. thorough evaluation of systems and genes that are differentially expressed, comparison with the results in the published paper

Time frame

4/4 Project planning
4/12 Genome assembly and genome annotation
4/25 Comparative genomics
5/7 RNA mapping
5/27 Final deadline

Data organization

Seperate data and code
Big files are compressed, symbolic links are used
Working directory is as follow

genome_analysis/
├── analyses
│   ├── 01_genome_assembly
│   │   └── 01-lfts-pacbio
│   ├── 02_assembly_quality_assessment
│   │   ├── 01-quast
│   │   └── 02-mummer
│   ├── 03_genome_annotation
│   │   ├── 01-prokka
│   │   └── 02-emapper
│   ├── 04_rna_preprocessing
│   │   ├── 01-fastqc
│   │   ├── 02-trimmomatic
│   │   └── 03-fastqc
│   ├── 05_rna_mapping
│   │   ├── 01-bwa
│   │   ├── 02-htseq
│   │   └── 03-deseq2
│   └── 06_comparison
│       └── 01-blastn
├── code
│   ├── blast.sh
│   ├── bwa.sh
│   ├── canu.sh
│   ├── deseq2.R
│   ├── fastqc1.sh
│   ├── fastqc2.sh
│   ├── htseq.sh
│   ├── mummer.sh
│   ├── parse.py
│   ├── plot.R
│   ├── prokka.sh
│   ├── quast.sh
│   └── trimmometric.sh
├── data
│   ├── DNA_raw_data -> /proj/g2019003/nobackup/private/3_Christel_2017/DNA_raw_data
│   ├── metadata
│   │   └── SraRunTable.txt
│   ├── reference -> /proj/g2019003/nobackup/private/3_Christel_2017/reference
│   ├── RNA_raw_data -> /proj/g2019003/nobackup/private/3_Christel_2017/RNA_raw_data
│   └── RNA_trimmed_reads -> /proj/g2019003/nobackup/private/3_Christel_2017/RNA_trimmed_reads
└── README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Project-plan.md

Project-plan.md

Project plan

Paper

Plan

Overview

Goals

Hypothesis

Sample

Data

Analyses and softwares

Basic

Extra

Time frame

Data organization

Files

Project-plan.md

Latest commit

History

Project-plan.md

File metadata and controls

Project plan

Paper

Plan

Overview

Goals

Hypothesis

Sample

Data

Analyses and softwares

Basic

Extra

Time frame

Data organization