- The paper presents a newly sequenced genome of the L. ferriphilum type strain (the strain with which the species was first described)
- RNA sequencing of continuous cultures and batch bioleaching cultures elucidated adaptations to the mineral environment
- Bioleaching: the extraction of metals from ores using microorganisms, a sustainable and environmentally friendly alternative to traditional methods; here, the recovery of copper from chalcopyrite (CuFeS2)
Reproduce the work of the paper and answer additional questions:
- De novo assembly of long reads obtained by PacBio SMRT long-read sequencing
- Analysis of the functional omics, identification of genes for various capabilities
- Differential expression analyses of cells grown in different cultures
- Comparison with other species
The results are expected to be similar to those of the original paper
Leptospirillum ferriphilum type strain, an iron-oxidizing bacterium
- PacBio DNA long-read sequencing data, in FASTQ format
- Illumina paired-end RNA sequencing data from 5 samples, each with 2 technical replicates, in FASTQ format
  - 3 samples from continuous cultures using ferrous iron as the substrate
  - 2 samples from batch mineral bioleaching cultures containing chalcopyrite (CuFeS2)
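The sample layout above can be written down as a small design table, similar to the column data that a differential expression analysis needs. This is a minimal sketch, assuming each technical replicate is a separate library; the labels such as `continuous_1_rep1` and the function names are hypothetical, and the real run accessions (listed in `data/metadata/SraRunTable.txt`) are not reproduced here.

```python
import csv
import io

# Counts taken from the data description above.
N_CONTINUOUS = 3  # continuous cultures, ferrous iron as substrate
N_BATCH = 2       # batch bioleaching cultures with chalcopyrite
N_TECH_REPS = 2   # technical replicates per sample


def build_design():
    """Return one row per library (sample x technical replicate)."""
    rows = []
    for condition, n_samples in (("continuous", N_CONTINUOUS), ("batch", N_BATCH)):
        for sample_idx in range(1, n_samples + 1):
            for rep in range(1, N_TECH_REPS + 1):
                rows.append({
                    # Hypothetical labels, not the real run accessions.
                    "library": f"{condition}_{sample_idx}_rep{rep}",
                    "sample": f"{condition}_{sample_idx}",
                    "condition": condition,
                })
    return rows


def design_to_tsv(rows):
    """Serialize the design rows as tab-separated text with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["library", "sample", "condition"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(design_to_tsv(build_design()), end="")
```

The `sample` column records which libraries are technical replicates of the same culture, so their counts can be summed before testing if desired.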
- Genome assembly of PacBio reads
  - Canu (~11.5 hours)
- Assembly quality assessment
  - Quast
  - MUMmerplot
- Structural and functional annotation
  - Prokka
  - eggNOGmapper
- Synteny comparison with a closely related genome
  - BLAST
  - Artemis
- RNA-seq read preprocessing: trimming and quality check (before and after trimming)
  - FastQC
  - Trimmomatic (~15 min per file)
- Mapping and counting RNA-seq reads, and analysing differential expression
  - BWA (~5 hours)
  - IGV
  - HTSeq (~8 hours)
  - DESeq2
- Biological interpretation of the results
- Analysis of metabolic and other functional capabilities
- Comparative genomics: identification of genes shared with one or more other species
- Identification of other sequences within the genome (promoters, repeats, mobile elements, …)
- Deeper analysis of the differential expression results: e.g. thorough evaluation of systems and genes that are differentially expressed, comparison with the results in the published paper
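Between the HTSeq and DESeq2 steps listed above, the per-library count files have to be combined into a single count matrix. The following is a minimal sketch of that step, not the project's actual script. It assumes the default htseq-count output (two tab-separated columns, with summary lines whose names start with `__`, such as `__no_feature`). The function names and the `*.txt` file naming are hypothetical.

```python
import csv
import sys
from pathlib import Path


def read_htseq_counts(path):
    """Read one htseq-count output file into {feature_id: count}."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            feature, value = row[0], row[1]
            if feature.startswith("__"):
                continue  # summary counters such as __no_feature
            counts[feature] = int(value)
    return counts


def merge_counts(count_files):
    """Combine per-library counts into a table (list of rows).

    count_files maps a library label to the path of its count file.
    Features missing from a file are filled with 0.
    """
    labels = list(count_files)
    per_library = {label: read_htseq_counts(count_files[label]) for label in labels}
    features = sorted(set().union(*per_library.values()))
    table = [["gene"] + labels]
    for feature in features:
        table.append([feature] + [per_library[label].get(feature, 0) for label in labels])
    return table


if __name__ == "__main__":
    # Hypothetical usage: python merge_counts.py <directory with *.txt count files>
    count_dir = Path(sys.argv[1])
    files = {p.stem: p for p in sorted(count_dir.glob("*.txt"))}
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerows(merge_counts(files))
```

The resulting tab-separated table has one row per gene and one column per library, which is the shape a count-based tool such as DESeq2 takes as input.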
- 4/4 Project planning
- 4/12 Genome assembly and genome annotation
- 4/25 Comparative genomics
- 5/7 RNA mapping
- 5/27 Final deadline
- Separate data and code
- Large files are compressed, and raw data is accessed through symbolic links
- The working directory is organized as follows
genome_analysis/
├── analyses
│   ├── 01_genome_assembly
│   │   └── 01-lfts-pacbio
│   ├── 02_assembly_quality_assessment
│   │   ├── 01-quast
│   │   └── 02-mummer
│   ├── 03_genome_annotation
│   │   ├── 01-prokka
│   │   └── 02-emapper
│   ├── 04_rna_preprocessing
│   │   ├── 01-fastqc
│   │   ├── 02-trimmomatic
│   │   └── 03-fastqc
│   ├── 05_rna_mapping
│   │   ├── 01-bwa
│   │   ├── 02-htseq
│   │   └── 03-deseq2
│   └── 06_comparison
│       └── 01-blastn
├── code
│   ├── blast.sh
│   ├── bwa.sh
│   ├── canu.sh
│   ├── deseq2.R
│   ├── fastqc1.sh
│   ├── fastqc2.sh
│   ├── htseq.sh
│   ├── mummer.sh
│   ├── parse.py
│   ├── plot.R
│   ├── prokka.sh
│   ├── quast.sh
│   └── trimmometric.sh
├── data
│   ├── DNA_raw_data -> /proj/g2019003/nobackup/private/3_Christel_2017/DNA_raw_data
│   ├── metadata
│   │   └── SraRunTable.txt
│   ├── reference -> /proj/g2019003/nobackup/private/3_Christel_2017/reference
│   ├── RNA_raw_data -> /proj/g2019003/nobackup/private/3_Christel_2017/RNA_raw_data
│   └── RNA_trimmed_reads -> /proj/g2019003/nobackup/private/3_Christel_2017/RNA_trimmed_reads
└── README.md
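The symbolic links under `data/` in the tree above could be created with a few lines of Python rather than by hand. This is a minimal sketch, assuming the shared project directory shown in the tree is readable from where the script runs; the function name `link_data_dirs` is hypothetical and is not one of the scripts in `code/`.

```python
from pathlib import Path


def link_data_dirs(project_root, shared_root, names):
    """Create data/<name> -> <shared_root>/<name> symlinks.

    Existing entries are left untouched, so the function can be re-run safely.
    Returns the list of links that were newly created.
    """
    data_dir = Path(project_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in names:
        link = data_dir / name
        if link.is_symlink() or link.exists():
            continue  # do not overwrite anything already in place
        link.symlink_to(Path(shared_root) / name, target_is_directory=True)
        created.append(link)
    return created


if __name__ == "__main__":
    # Paths and directory names as they appear in the tree above.
    shared = "/proj/g2019003/nobackup/private/3_Christel_2017"
    names = ["DNA_raw_data", "reference", "RNA_raw_data", "RNA_trimmed_reads"]
    for link in link_data_dirs("genome_analysis", shared, names):
        print(f"created {link}")
```

Linking instead of copying keeps the large raw files in one place while still letting the analysis scripts refer to paths inside the project directory.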