TODO



!! Linking in fastq and bam files is done in a very wasteful manner. (This was done to mimic
  IRODS downloading, but is not necessary). Ideally just make a channel that bypasses
  linking completely.


! does bracer have loci like tracer does?

! index naming scheme; different versions of aligner (salmon 9/11/13) per genome version (GRCh38)

! versions of tools used.

use mode: 'link' in publishDir (hardlink)


check columns by name, rather than using 'last column';
see e.g. salmon: should select 'NumReads', not index -1.
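
A minimal sketch of the name-based lookup, in Python. It assumes a
tab-separated table with a header row, as in salmon's quant.sf; the function
name read_column and the example rows are made up for illustration.

```python
import csv
import io

def read_column(handle, column="NumReads", key="Name"):
    """Return {key: value} for one named column of a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    fields = reader.fieldnames or []
    for name in (key, column):
        if name not in fields:
            # Fail loudly instead of silently taking whatever is in the last column
            raise KeyError(f"column {name!r} not found in header {fields}")
    return {row[key]: float(row[column]) for row in reader}

# Made-up rows using the salmon quant.sf column layout
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800.0\t12.5\t40\n"
    "tx2\t500\t300.0\t0.0\t0\n"
)
counts = read_column(example)
```

A missing or renamed column then raises an error rather than yielding the
wrong numbers.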


naming of columns:

(merge feature counts used for both hisat2 and star; have to do it in script)

==> study4294-tic218-hisat2-fc-genecounts.txt <==
ENSEMBL_ID  MICA_collection-17817.hisat2.gene.fc.txt  MICA_collection-17818.hisat2

==> study4294-tic218-star-fc-genecounts.txt <==
ENSEMBL_ID  MICA_collection-17817.star.gene.fc.txt  MICA_collection-17818.star.gen

==> study4294-tic218-star-genecounts.txt <==
ENSEMBL_ID  MICA_collection-17817 MICA_collection-17818 MICA_collection-17819 MIC
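
One way the merge script could make these headers consistent, sketched in
Python. Assumptions: the goal is bare sample names as in the star genecounts
header, the files are tab-separated, and the suffix always has the form
.<aligner>.gene.fc.txt. The helper names are hypothetical.

```python
import re

# Assumed suffix pattern, taken from the fc headers shown above
SUFFIX = re.compile(r"\.(hisat2|star)\.gene\.fc\.txt$")

def sample_name(column):
    """Strip the aligner/featureCounts suffix from a column name, if present."""
    return SUFFIX.sub("", column)

def clean_header(header_line):
    """Rewrite a tab-separated header line, leaving the first column untouched."""
    first, *samples = header_line.rstrip("\n").split("\t")
    return "\t".join([first] + [sample_name(s) for s in samples])
```

Columns that already carry a bare sample name pass through unchanged.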


====== general

- produce correlation plots between count matrices. (star vs fc)
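
  Rough sketch of the numbers behind such a plot (Python, plotting left out).
  Assumes both matrices are already loaded as {sample: {gene: count}} and that
  Pearson correlation on log1p counts is what we want; both are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if one is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def per_sample_correlation(a, b):
    """Correlate log1p counts per sample over the genes both matrices share."""
    result = {}
    for sample in sorted(set(a) & set(b)):
        genes = sorted(set(a[sample]) & set(b[sample]))
        xs = [math.log1p(a[sample][g]) for g in genes]
        ys = [math.log1p(b[sample][g]) for g in genes]
        result[sample] = pearson(xs, ys)
    return result
```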

- lostcause [ text: 'lowmapping' ] has issues with -resume on k8s

- irods.sh has diverged from guitar/bin/irods.sh. The latter has more features
  and is the one svd uses interactively.
  This irods.sh can use iget threads; it used the 'green' irods hack,
  but that has now been removed.
  Probably best to merge them at some point.

? add ps to bracer docker image

- unexpected end of file for bracer gunzip

- (rare lesser spotted error) 'Zero length BigInteger'. Hmm, this is -resume again. Why is NF sending
  empty files?

(!) restore irods resumption in case of no files.

(!) submodule cellgeni/utils

!  featureCounts needs more memory for bulk data; try a dynamic Nextflow directive.
   How did that work again?
   memory { reads.size() < 70.KB ? 1.GB : 5.GB }

-  dump everything in https://www.nextflow.io/docs/latest/metadata.html

- -m hisat2 for multiQC?

-  samtools collate has -f option for fast. Can we use this?

-  star overhang 74: best to encode it somewhere?

-  software versions; move conda file to github.

-  docker container for mixcr: https://hub.docker.com/r/mgibio/mixcr/

-  NOTE mixcr fastqc
   mixcr was installed by me, not in the conda environment.
   Should install it as cellgeni-su.
   fastqc was also installed outside conda.

?  enable fastqdir simultaneous with IRODS. this will help testing.
   so --fastqdirPE (stick flag in option), --samplefile_dir

-  check places where I built indexes for STAR and hisat2; save scripts/settings
      Done:
      rsync -avm --del --include='*RESUME*' -f 'hide,! */' DATA-indexes/ ~/DATA-indexes
      rsync -avm --del --include='*.config' -f 'hide,! */' DATA-indexes/ ~/DATA-indexes

(!) nf-core uses csvtk, a toolkit by Shen Wei written in Go.

#  Indexbam: compute total amount of giga-bases.

#  star collectFile

#  check [text: "a b c\n"] idiom, mixed with collectFile on it.text, used downstream
   of ch_star_reject and ch_hisat2_reject.
!  However, it is perhaps neater to do this in the script, and generate a file
   (i.e. pick the approach taken in irods rather than star/hisat2).

#  ignore all columns from featurecounts except 1,7; file size is 30M per sample.
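
   Sketch of the column filter in Python. Assumes the usual single-sample
   featureCounts layout: '#' comment line, then a header line
   (Geneid, Chr, Start, End, Strand, Length, <bam>), then one row per gene.
   The function name is hypothetical.

```python
def geneid_and_count(lines):
    """Yield (Geneid, count) string pairs, i.e. columns 1 and 7, per gene row."""
    header_seen = False
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the program/command comment line and blank lines
        fields = line.rstrip("\n").split("\t")
        if not header_seen:
            header_seen = True  # first non-comment line is the column header
            continue
        # Keep the count as text so fractional counts survive unchanged
        yield fields[0], fields[6]
```

   Feeding it an open file handle keeps memory flat, since rows are streamed.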

#  check_logs equivalent for hisat2, filter on overall alignment rate

  HISAT2 summary stats:
    Total pairs: 10000
      Aligned concordantly or discordantly 0 time: 1784 (17.84%)
      Aligned concordantly 1 time: 7343 (73.43%)
      Aligned concordantly >1 times: 873 (8.73%)
      Aligned discordantly 1 time: 0 (0.00%)
    Total unpaired reads: 3568
      Aligned 0 time: 3568 (100.00%)
      Aligned 1 time: 0 (0.00%)
      Aligned >1 times: 0 (0.00%)
    Overall alignment rate: 82.16%
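
   Sketch of such a filter in Python, keyed on the summary format quoted
   above. The 50% default threshold is a placeholder, not a decided value.

```python
import re

# Matches the 'Overall alignment rate: 82.16%' line of the summary above
RATE = re.compile(r"Overall alignment rate:\s*([0-9.]+)%")

def overall_alignment_rate(log_text):
    """Return the overall alignment rate as a float, or None if not found."""
    match = RATE.search(log_text)
    return float(match.group(1)) if match else None

def passes(log_text, min_rate=50.0):
    """True if the log reports a rate of at least min_rate (hypothetical cutoff)."""
    rate = overall_alignment_rate(log_text)
    return rate is not None and rate >= min_rate
```

   A log without the rate line counts as a failure, so truncated logs end up
   rejected rather than passed.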

#  collectFile pattern for lostcause (and any other merge process)

#  lostcause: add a file size check in crams_to_fastq,
   or catch the STAR error if it happens. Test on the small files found in tic-148 / ijn.
   Perhaps move the indexbam functionality before HISAT2 and STAR and
   put a filter there. Either way, make sure it ends up in lostcause.

#  use channel instead of publishbam process, subscribe

#  can I do fc for both hisat2 and STAR? with transpose yes.

#  merge_featurecounts.py: enable reading input files from -I metafile option
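
   Possible shape of the option, sketched with argparse. The current
   interface of merge_featurecounts.py is not reproduced here; the long
   option name, the one-path-per-line metafile format and the '#' comment
   handling are all assumptions.

```python
import argparse

parser = argparse.ArgumentParser(description="merge featureCounts outputs (sketch)")
parser.add_argument("-I", "--metafile",
                    help="file listing input paths, one per line (assumed format)")
parser.add_argument("inputs", nargs="*", help="input files given directly")

def paths_from_metafile(lines):
    """Return non-empty, non-comment lines, stripped of surrounding whitespace."""
    return [l.strip() for l in lines if l.strip() and not l.lstrip().startswith("#")]

def collect_inputs(argv):
    """Combine positional inputs with any paths listed in the -I metafile."""
    args = parser.parse_args(argv)
    files = list(args.inputs)
    if args.metafile:
        with open(args.metafile) as handle:
            files.extend(paths_from_metafile(handle))
    return files
```

   This avoids overly long command lines when many samples are merged.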

#  enable simultaneous hisat2 run with star (similarly for salmon)

====== lostcause

-  hopefully use future NF functionality in
      https://github.com/nextflow-io/nextflow/issues/903