This repository has been archived by the owner on May 22, 2023. It is now read-only.
!! Linking in fastq and bam files is done in a very wasteful manner. (This was done to mimic
IRODS downloading, but is not necessary). Ideally just make a channel that bypasses
linking completely.
! does bracer have loci like tracer does?
! index naming scheme; different versions of aligner (salmon 9/11/13) per genome version (GRCh38)
! versions of tools used.
use mode: 'link' in publishDir (hardlink)
name checks on columns, rather than using 'last column',
see e.g. salmon; should use 'NumReads', not -1.
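A minimal sketch of selecting by column name instead of by position, assuming the usual salmon quant.sf layout (tab-separated, header Name/Length/EffectiveLength/TPM/NumReads). The function name read_column_by_name and the example row are made up for illustration; this is not the code the pipeline currently uses.

```python
import csv
import io


def read_column_by_name(handle, column="NumReads", key="Name"):
    """Return {key: value} from a tab-separated table, picking the column by header name.

    Raises ValueError if the requested column is absent, rather than
    silently falling back to 'the last column'.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in header {reader.fieldnames}")
    return {row[key]: float(row[column]) for row in reader}


# Made-up single-transcript example in the quant.sf layout.
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "ENST0001\t1000\t800.0\t12.5\t40.0\n"
)
counts = read_column_by_name(io.StringIO(example))
```

With this approach a change in column order between tool versions fails loudly or keeps working, instead of silently reading the wrong column.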
naming of columns:
(merge feature counts used for both hisat2 and star; have to do it in script)
==> study4294-tic218-hisat2-fc-genecounts.txt <==
ENSEMBL_ID MICA_collection-17817.hisat2.gene.fc.txt MICA_collection-17818.hisat2
==> study4294-tic218-star-fc-genecounts.txt <==
ENSEMBL_ID MICA_collection-17817.star.gene.fc.txt MICA_collection-17818.star.gen
==> study4294-tic218-star-genecounts.txt <==
ENSEMBL_ID MICA_collection-17817 MICA_collection-17818 MICA_collection-17819 MIC
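One possible way to make the three headers agree, sketched under the assumption that sample names never contain a dot, so that everything from the first dot onward (e.g. '.hisat2.gene.fc.txt') is a tool suffix. The helper names sample_name and normalise_header are hypothetical.

```python
def sample_name(column_header):
    """Drop everything from the first dot onward.

    Assumption: sample names themselves contain no dots.
    """
    return column_header.split(".", 1)[0]


def normalise_header(fields, id_column="ENSEMBL_ID"):
    """Return header fields with tool suffixes removed; the ID column is left alone."""
    return [f if f == id_column else sample_name(f) for f in fields]


header = ["ENSEMBL_ID", "MICA_collection-17817.hisat2.gene.fc.txt", "MICA_collection-17818"]
clean = normalise_header(header)
```

Doing this inside the merge script would give the hisat2-fc, star-fc and star matrices identical column names, which also makes them directly comparable.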
====== general
- produce correlation plots between count matrices. (star vs fc)
- lostcause [ text: 'lowmapping' ] has issues with -resume on k8s
- irods.sh has diverged from guitar/bin/irods.sh. The latter has more features
and is the one svd uses interactively;
This irods.sh can use iget threads and uses the 'green' irods hack.
Probably best to merge them at some point.
Green hack has now been removed.
? add ps to bracer docker image
- unexpected end of file for bracer gunzip
- (rare lesser spotted error) Zero length biginteger; mmm. This is resume again. Why is NF sending
empty files?
(!) restore irods resumption in case of no files.
(!) submodule cellgeni/utils
! Featurecounts more memory for bulk data; try dynamic nextflow directive
how did that work again?
memory { reads.size() < 70.KB ? 1.GB : 5.GB }
- dump everything in https://www.nextflow.io/docs/latest/metadata.html
- -m hisat2 for multiQC?
- samtools collate has -f option for fast. Can we use this?
- star overhang 74: best to encode it somewhere?
- software versions; move conda file to github.
- docker container for mixcr: https://hub.docker.com/r/mgibio/mixcr/
- NOTE mixcr fastqc
mixcr was installed by me, not in conda environment.
Should install it as cellgeni-su.
fastqc was installed not as part of conda
? enable fastqdir simultaneous with IRODS. this will help testing.
so --fastqdirPE (stick flag in option), --samplefile_dir
- check places where I built indexes for STAR and hisat2; save scripts/settings
Done:
rsync -avm --del --include='*RESUME*' -f 'hide,! */' DATA-indexes/ ~/DATA-indexes
rsync -avm --del --include='*.config' -f 'hide,! */' DATA-indexes/ ~/DATA-indexes
(!) nf-core uses csvtk, a tool set by Shen Wei in Golang.
# Indexbam: compute total amount of giga-bases.
# star collectFile
# check [text: "a b c\n"] idiom, mixed with collectFile on it.text, used downstream
of ch_star_reject and ch_hisat2_reject.
! However, it is perhaps neater to do this in the script, and generate a file
(i.e. pick the approach taken in irods rather than star/hisat2).
# ignore all columns from featurecounts except 1,7; file size is 30M per sample.
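A rough sketch of the column trimming, assuming single-BAM featureCounts output: a leading '#' comment line, then a tab-separated table whose column 1 is Geneid and column 7 is the count. The function name slim_featurecounts and the example rows are made up.

```python
import io


def slim_featurecounts(lines):
    """Yield (column 1, column 7) pairs, skipping '#' comment lines and blank lines.

    The header row is passed through as well, so the first pair is
    ('Geneid', <name of the count column>).
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[6]


# Made-up two-line table in the featureCounts layout.
example = (
    "# Program:featureCounts (example comment line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "ENSG01\t1\t100\t200\t+\t101\t42\n"
)
pairs = list(slim_featurecounts(io.StringIO(example)))
```

Writing only these two columns per sample should shrink the per-sample file considerably, since the Chr/Start/End/Strand columns carry the bulk of the text.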
# check_logs equivalent for hisat2, filter on overall alignment rate
HISAT2 summary stats:
Total pairs: 10000
Aligned concordantly or discordantly 0 time: 1784 (17.84%)
Aligned concordantly 1 time: 7343 (73.43%)
Aligned concordantly >1 times: 873 (8.73%)
Aligned discordantly 1 time: 0 (0.00%)
Total unpaired reads: 3568
Aligned 0 time: 3568 (100.00%)
Aligned 1 time: 0 (0.00%)
Aligned >1 times: 0 (0.00%)
Overall alignment rate: 82.16%
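A possible shape for the hisat2 log check, assuming the summary format shown above (a line reading 'Overall alignment rate: NN.NN%'). The function names and the 50% default threshold are placeholders, not values taken from the pipeline.

```python
import re

# Matches the last line of the summary format shown above.
RATE_RE = re.compile(r"Overall alignment rate:\s*([0-9.]+)%")


def overall_alignment_rate(summary_text):
    """Return the overall alignment rate as a float percentage."""
    match = RATE_RE.search(summary_text)
    if match is None:
        raise ValueError("no 'Overall alignment rate' line found")
    return float(match.group(1))


def passes(summary_text, min_rate=50.0):
    """True if the sample meets the (hypothetical) minimum alignment rate."""
    return overall_alignment_rate(summary_text) >= min_rate


summary = "Total pairs: 10000\nOverall alignment rate: 82.16%\n"
rate = overall_alignment_rate(summary)
```

Samples for which passes() is False could then be routed to lostcause in the same way as low-mapping STAR samples.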
# collectFile pattern for lostcause (and any other merge process)
# lostcause: add file size check in crams_to_fastq:
or catch the STAR error if it happens. Test on small files, found in tic-148 / ijn
Perhaps move indexbam functionality before HISAT2 and STAR,
have a filter there. Anyway, make sure it is in lostcause.
# use channel instead of publishbam process, subscribe
# can I do fc for both hisat2 and STAR? with transpose yes.
# merge_featurecounts.py: enable reading input files from -I metafile option
# enable simultaneous hisat2 run with star (similar salmon)
====== lostcause
- hopefully use future NF functionality in
https://github.com/nextflow-io/nextflow/issues/903