Extremely high percentage of reads too short to map... #28

Open
GoogleCodeExporter opened this issue Jan 26, 2016 · 1 comment

@GoogleCodeExporter

I have been using STAR with Illumina 101 bp paired-end reads. The first set of 
libraries I sequenced went through the pipeline without problems, but the most 
recent libraries show a very strange problem.

I run STAR with the following command:

Star_Directory/STAR --genomeDir Star_Directory/STAR_2.3.0/Genome --readFilesIn 
$f $f2 --outSAMstrandField intronMotif --runThreadN 3

where f and f2 are the paired end reads:
1-Nq-C96_S94_L001_R1_001_val_1.fq 
1-Nq-C96_S94_L001_R2_001_val_2.fq

which have been trimmed by trim_galore with the call:
trim_galore -q 15 --phred33 --paired --length 50 -a CTGTCTCTTATACACATCT 
--stringency 3 $f $f2

where f and f2 are the untrimmed fastq files:
1-Nq-C96_S94_L001_R2_001.fastq 
1-Nq-C96_S94_L001_R1_001.fastq 
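
Before aligning, it can help to confirm that the two trimmed files are still in sync and that the mates have the lengths expected from `--length 50`. The following is a minimal Python sketch, not part of STAR or trim_galore; the function names `read_fastq` and `summarize_pairs` are hypothetical, and it assumes well-formed, uncompressed FASTQ with four lines per record and mates listed in the same order in both files.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (read_id, sequence) from an iterable of FASTQ lines (4 per record)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        # Drop the leading '@' and anything after the first whitespace.
        read_id = record[0].strip()[1:].split()[0]
        # Older Illumina headers mark mates with a trailing /1 or /2.
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id, record[1].strip()


def summarize_pairs(lines1, lines2, min_length=50):
    """Compare mates record by record; zip stops at the shorter input."""
    pairs = name_mismatches = below_min = total_bases = 0
    for (id1, seq1), (id2, seq2) in zip(read_fastq(lines1), read_fastq(lines2)):
        pairs += 1
        if id1 != id2:
            name_mismatches += 1  # mates out of order or from different reads
        if min(len(seq1), len(seq2)) < min_length:
            below_min += 1
        total_bases += len(seq1) + len(seq2)
    return {
        "pairs": pairs,
        "name_mismatches": name_mismatches,
        "pairs_below_min_length": below_min,
        "mean_pair_length": total_bases / pairs if pairs else 0.0,
    }


# Tiny made-up example records, for illustration only.
r1 = ["@read1 1:N:0:94\n", "ACGT" * 15 + "\n", "+\n", "I" * 60 + "\n",
      "@read2 1:N:0:94\n", "ACGT" * 10 + "\n", "+\n", "I" * 40 + "\n"]
r2 = ["@read1 2:N:0:94\n", "TTGCA" * 12 + "\n", "+\n", "I" * 60 + "\n",
      "@readX 2:N:0:94\n", "ACGT" * 15 + "\n", "+\n", "I" * 60 + "\n"]
summary = summarize_pairs(r1, r2)
print(summary)
```

With real data the two inputs would be open file handles, for example `summarize_pairs(open(f), open(f2))`. A nonzero `name_mismatches` would point to files that are out of sync, which is a different problem from reads that simply do not match the genome.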

For these runs the log.out file shows something like this:

                                  Started job on |  Sep 17 13:16:13
                             Started mapping on |   Sep 17 13:17:17
                                    Finished on |   Sep 17 13:17:47
       Mapping speed, Million of reads per hour |   21.76

                          Number of input reads |   181350
                      Average input read length |   179
                                    UNIQUE READS:
                   Uniquely mapped reads number |   1973
                        Uniquely mapped reads % |   1.09%
                          Average mapped length |   176.75
                       Number of splices: Total |   24
            Number of splices: Annotated (sjdb) |   0
                       Number of splices: GT/AG |   23
                       Number of splices: GC/AG |   1
                       Number of splices: AT/AC |   0
               Number of splices: Non-canonical |   0
                      Mismatch rate per base, % |   0.39%
                         Deletion rate per base |   0.04%
                        Deletion average length |   2.22
                        Insertion rate per base |   0.00%
                       Insertion average length |   1.50
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   948
             % of reads mapped to multiple loci |   0.52%
        Number of reads mapped to too many loci |   22
             % of reads mapped to too many loci |   0.01%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |   0.00%
                 % of reads unmapped: too short |   98.37%
                     % of reads unmapped: other |   0.01%
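
The summary above uses a `label | value` layout, so the unmapped categories can be extracted programmatically when comparing many libraries. The following is a minimal Python sketch, not part of STAR; the function names `parse_star_summary` and `unmapped_breakdown` are hypothetical, and the only assumption is the layout shown in the log above.

```python
def parse_star_summary(text):
    """Parse 'label | value' lines into a dict of stripped strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' carry no value
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats


def unmapped_breakdown(stats):
    """Return {reason: percent} for the '% of reads unmapped: ...' rows."""
    prefix = "% of reads unmapped: "
    return {
        label[len(prefix):]: float(value.rstrip("%"))
        for label, value in stats.items()
        if label.startswith(prefix)
    }


# A few rows copied from the summary above.
sample = """\
                          Number of input reads |   181350
                                    UNIQUE READS:
                        Uniquely mapped reads % |   1.09%
       % of reads unmapped: too many mismatches |   0.00%
                 % of reads unmapped: too short |   98.37%
                     % of reads unmapped: other |   0.01%
"""
stats = parse_star_summary(sample)
breakdown = unmapped_breakdown(stats)
print(breakdown)
```

Running the same parser on the summary from a library that mapped well gives a direct side-by-side comparison of the two sets of figures.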

However, looking at the FASTQ files, the reads appear for the most part 
adequate.
I've attached abbreviated versions of the two paired-end read FASTQ files.

I've also attached abbreviated versions of two paired-end FASTQ files that 
mapped with a unique mapping percentage of approximately 90% (called 
read1/2_goodMappers.fq).

I am new to RNA-seq analysis, so this may be a trivial issue. Any help would be 
appreciated.

I am using STAR 2.3.0 on Mac OS X.

Thanks so much.



Original issue reported on code.google.com by [email protected] on 18 Sep 2014 at 4:17


@GoogleCodeExporter

It turns out my reads were just bad and they were not mapping to the genome...

Sorry for the trouble, back to making libraries!

Original comment by [email protected] on 13 Oct 2014 at 12:34
