forked from s-andrews/FastQC
-
Notifications
You must be signed in to change notification settings - Fork 0
/
RELEASE_NOTES.txt
924 lines (667 loc) · 36.5 KB
/
RELEASE_NOTES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
RELEASE NOTES FOR FastQC v0.11.8
--------------------------------
This release works around some edge cases in unusual sequence libraries
and changes the behaviour of the read length module when run with the
--nogroup option. Other minor fixes are also present.
RELEASE NOTES FOR FastQC v0.11.7
--------------------------------
This is a bugfix release for a bug introduced in 0.11.6. Specifically
this version would crash if the first sequence in a file was <12bp
(or less than the length of the longest adapter if a custom adapters
file was being used).
RELEASE NOTES FOR FastQC v0.11.6
--------------------------------
This update fixes some bugs and updates some of the functionality to
accommodate changes in some of the sequencing platforms.
There is one major change which is that by default we now disable the
kmer module. With the inclusion of the adapter plot the value of the
information in the Kmer plot is often not great, and it is easy to
confound it if there are any over-represented sequences, or primer
compositional bias. Overall therefore we consider it best to not
routinely include this module.
If you want to turn this module back on, then simply edit the
limits.txt file in the Configuration folder of the FastQC installation
and change the line near the top which says:
kmer ignore 1
..to..
kmer ignore 0
..and the module will be re-enabled.
Other changes in this release are:
1) Fixed a bug which prematurely abandoned the adapter content plot
when long custom adapters were being used.
2) Changed the cutoff for the maximum number of tiles to allow for
the novaseq which has lots of them.
3) Fixed a bug in the parsing of tile numbers on some illumina
sequencers
4) Added some new Clontech sequences to the contaminants list.
5) Made the --nanopore option work with the new multi-folder ONT
folder structure
6) Added an option to specify a file name when streaming data into
FastQC
7) Added new RDF paths to check for fastq data in nanopore fast5 files
8) Fix parsing of newer nanopore base names to correctly collate sequences
9) Fixed a typo in the documentation for the per tile plot documentation
10) Added a --min-length option to ignore short sequences making it easier
to generate directly comparable statistics between runs.
RELEASE NOTES FOR FastQC v0.11.5
--------------------------------
This is a minor bug-fix release which addresses issues found in
the previous release.
1) Fixed a bug in the per-base sequence content module where the warn
error flags were incorrectly being calculated on G-A and C-T instead
of G-C and T-A.
2) Fixed the small-RNA adapter molecule sequence so that all possible
adapter variants would be found - the previous version would
under-estimate the amount of small RNA adapter which was present in
a sample.
3) Fixed a typo in the documentation for the duplication plot.
RELEASE NOTES FOR FastQC v0.11.4
--------------------------------
This is a minor bug-fix release which addresses issues found in
the previous release.
1) Changed the OSX launcher to not rely on the Mac specific JVM
framework, but to use any command line java which is found. This
will require the user to install the JDK on newer OSX releases
rather than just the JVM. See the updated version of the INSTALL.txt
document if you have problems with this on OSX.
2) Fixed a typo in one of the included adapter sequences in the search
set.
3) Fixed a bug which failed to remove an fq.gz extension from the
name used in the report when running in offline mode.
4) Made the per-tile quality module not collect any stats if it's
been disabled in limits.txt, rather than just not being included
in the output.
5) Fixed a bug in the calculation of duplication levels which only
affected highly duplicated ordered libraries where the count limit
was not reached.
6) Fixed a bug which triggered an error flag on the per-base quality
plot for positions with fewer than 100 observations, where we can't
calculate a sensible percentile value. Changed the text report to
assign NaN to these positions rather than 0.
RELEASE NOTES FOR FastQC v0.11.3
--------------------------------
This is a minor bug-fix release which addresses some issues
reported with the v0.11.2 release.
1) Fixed a bug when using the limits.txt file to disable the per
tile analysis module.
2) Fixed a documentation error in the duplicated sequences plot.
3) Fixed a thread safety bug when processing multiple files in a
single session which caused the program not to exit when all
processing had in fact completed.
4) Fixed a bug which meant that forced formats in the interactive
application weren't being honoured.
5) Fixed a bug in the way soft clipping was applied when we were
analysing only mapped data.
6) Fix a memory issue when trying to parse tile names in cases where
we mistakenly think we're identify tile numbers, but we aren't
7) Fix a bug in the text reporting of per-tile quality scores
8) Add the SOLID smallRNA adapter to the default adapter search set
9) Fix a bug in casava mode when using uncompressed fastq files
10) Increase the number of sampled sequences in the duplicate and
overrepresented module to 100,000.
11) Add a clean up of data structures for the Kmer module so that
the interactive mode can process more files without dying.
RELEASE NOTES FOR FastQC v0.11.2
--------------------------------
This is a minor bug-fix release which addresses some issues
reported with the v0.11.1 release.
1) Added a proper implementation of a --limits command line
option to allow users to specify a custom limits file for an
individual run. This also fixed a bug seen if the user used the
--adapter option.
2) Fixed an error in the naming of the folder inside the zip file
such that it couldn't be extracted into the same folder as the
main HTML file.
3) Fixed an overly large data structure which was causing some
runs to terminate due to a lack of memory.
4) Fixed a poor implementation in the Kmer module which was
causing unusually high memory usage.
5) Fixed incorrect defaults for the warn/fail values in the per
sequence quality module.
RELEASE NOTES FOR FastQC V0.11.1
--------------------------------
FastQC v0.11.1 is the first release for a long time and is a
major update to the package. This release adds some significant
new features to the program which will hopefully make it more
useful.
The major new features in this release are:
1) Configurable thresholds for modules. For all modules you can
now alter a configuration file to set the thresholds used by the
program for warnings and errors so that you can flag up only the
types of problem which you are concerned by.
2) Optional modules. The same configuration file used to set
the warn / error thresholds can also be used to selectively
disable modules you don't want to see at all.
3) New per-tile quality analysis. If you are running Illumina
libraries through FastQC it will now analyse the quality calls
on a per-tile basis and will flag up points in the run where the
quality in individual tiles fell below the average quality for
that cycle. This can help to spot technical problems during the
run.
4) New adapter content module. A new module has been added to
specifically search for the presence of adapters in your library.
This operates in a similar way to the existing Kmer analysis but
allows you to specify individual adapter sequences to screen and
will always show the results for each adapter so you can easily
see what you might gain if you chose to adapter trim your library.
5) Improved duplication plot. The duplication plot has been given
an overhaul so that it now reports values which are real read
numbers rather than always giving relative values. This is thanks
to work done by Brian Howard of sciome.com who worked out the right
way to extrapolate from the data we sample to the full library. It
also shows how the level of duplication would affect the library
both before and after deduplication, and the headline figure is now
much more useful as it shows the percentage of the library which
would remain if you chose to deduplicate.
6) Improved Kmer module. The Kmer module has been changed so that
instead of trying to search for individual Kmers which are present
at higher than expected frequency (which actually happens all the
time in real libraries), it now looks for Kmers which are present
in significantly different amounts at different starting positions
within the library. This has allowed the use of longer Kmer
sequences to give a more useful result.
7) Since file reports. The default output format for the program
is now a single HTML file with all of the various graphs embedded
into it. The .zip file output with the individual graphs is still
produced as are the associated data files, but you can just
distribute the one HTML file alone - the other data is no longer
required.
8) Ability to read from stdin. If you want to pipe a stream of
data into fastqc rather than using a real file then you can just
use 'stdin' as the filename to process and then stream uncompressed
fastq data on stdin.
9) Changed base groupings. For long reads we used to use an
exponential series to group bases together to summarise the sequence
content and qualities. We've now switched the default to be that
for grouped plots the first 9 individual bases will always be shown
(since this often roots out problems in the libraries), after that
there will be a series of evenly sized windows so that the same number
of bases fall into each window. You can bring back the old behaviour
with the new --expgroup option, and you can remove grouping all together
with the --nogroup option (although this will do bad things to your
plots if you have really long reads).
10) Dropped support for the Solaxa64 (but NOT Phred64) encoding. We
have removed the ability of the program to reliably detect the original
Solexa64 encoding which was used in the GA pipeline prior to v1.3.
This was a 64 offset encoding but which allowed scores which ranged
down to -5. Supporting this encoding meant that we would incorrectly
guess the encoding on Phred33 files which had no bases with quality
scores below 26, which could happen if you aggressively trimmed your
data. Supporting just Phred33 and Phred64 now means that we wouldn't
mis-detect unless there were no bases with qualities below 31, which
is much less likely, even in trimmed data. Since no Solexa64 data
will have been produced since early 2009 it is unlikely that removing
support for this format will adversely affect users of the program.
RELEASE NOTES FOR FastQC V0.10.1
--------------------------------
FastQC v0.10.1 is a bugfix release which works around two problems
people have encountered with previous releases:
1) A work-round has been put into place for a limitation in the
java gzip decompressor, where it would read only the first
compressed block in a file created by concatenating multiple
gzipped files directly, rather than decompressing and recompressing
them.
2) Users who had installed the program in directories containing
characters required to be encoded in URLs (= & ? etc) were finding
that the report generation generated an error. This encoding has
now been fixed and the program should now have no limits in the
name of the directory in which it can be installed.
One additional feature is that in the fastqc wrapper you can now
specify the location of the java interpreter on the command line
using the --java parameter, rather than having to have it included
in the path.
One other change in this release is that the package names for all
of the java classes have been changed to reflect a change in the
official project URL. This means that the launchers for the
program have had to be updated to use these new names. If you have
created your own launcher, or had copied any of the old ones you'll
need to update this to use the launchers included in this version
of the program.
The new URL for the project is:
www.bioinformatics.babraham.ac.uk/projects/fastqc/
..and if you want to report bugs the bugzilla URL is now:
www.bioinformatics.babraham.ac.uk/bugzilla/
RELEASE NOTES FOR FastQC V0.10.0
--------------------------------
The major feature of FastQC v0.10.0 is the addition of support
for fastq files generated directly by the latest version of the
illumina pipeline (Casava v1.8). In this version the pipeline
generates gzipped fastq files by default rather than using qseq
files which can then be converted to fastq. However the fastq
files generated by casava are unusual in two ways:
1) A single sample produces a set of fastq files with a common
name, but an incrementing number at the end.
2) Casava FastQ files contain sequences from clusters which have
failed the internal QC, and been flagged to be filtered.
FastQC v0.10.0 introduces a Casava mode which will merge together
fastq files from the same sample group and produce a single report.
It will also exclude any flagged entries from the analysis. You
would therefore run FastQC as normal but selecting all of the
fastq files from Casava and using casava mode for your analysis.
Casava mode is activated from the command line by adding the
--casava option to the launch command. From the interactive
application you need to select 'Casava FastQ Files' from the drop
down file selector filter options.
If you want to analyse casava fastq files without these extra
options then you can use treat them as normal fastq files with
no problems.
In addition to this change there have also been changes to allow
the wrapper script to work properly under windows, and a bug was
fixed which missed of the last possible Kmer from every sequence
in a library.
RELEASE NOTES FOR FastQC V0.9.6
-------------------------------
FastQC v0.9.6 fixes a couple of bugs which aren't likely to have
affected the majority of fastqc users:
- Fixed a crash in the Kmer module when analysing a sequence where
every sequence in a library contained a poly-N stretch at its
3' end.
- Fixed the wrapper script so that OSX users launching fastqc
through the script rather than the Mac application bundle get
their classpath set correctly, and can therefore analyse bam/sam
files.
RELEASE NOTES FOR FastQC V0.9.5
-------------------------------
FastQC v0.9.5 fixes some bugs in the programs text output and
improves a few things in the graphical interface. Main changes
are:
- Progress calculations are now exact and not approximate
- The UI now has a welcome screen so you're not just presented
with a blank screen when the program starts
- The wrapper script now sets the classpath correctly in windows
as well as linux.
- The text report for per-base sequence content now reports
correct values for grouped bases
- The HTML report uses a custom stylesheet for print output so
graphs aren't cut off when reports are printed.
- Fixed a bug in testing for a warning in the per-base sequence
content module.
- Alt text in HTML reports now matches the graphic it describes
RELEASE NOTES FOR FastQC V0.9.4
-------------------------------
FastQC v0.9.4 is a minor bugfix release which changes the offline
version of the program so that if a file fails to be processed a
full backtrace of the error is produced, rather than just a
simple generic error message.
RELEASE NOTES FOR FastQC V0.9.3
-------------------------------
FastQC v0.9.3 adds support for fastq files compressed with
bzip2 in addition to its existing support for gzip compressed
files. It's worth noting that although bzip2 offers a reduction
in the file size of the compressed files (about a 5-fold size
reduction compared to raw fastq. Gzip is a 4-fold decrease),
there is a significant penalty in terms of the speed of
decompression of these files. In our tests gzipped files were
actually processed slightly faster than raw FastQ files, presumably
due to the lower amount of data transfer from disk required,
however bzip2 compressed files took around 6X as long to
process as gzipped files.
The other big change in this release is an update to the default
CSS layout such that viewing the HTML reports doesn't require
lots of scrolling up and down the page. As before the CSS can
be edited and customised by editing the templates shipped with
the program. Many thanks to Phil Ewels who did much of the
work on the new layout.
RELEASE NOTES FOR FastQC V0.9.2
-------------------------------
FastQC v0.9.2 fixes two bugs which were identified in the
previous release.
1) In the text output for the per-base quality module the
correct base numbers weren't being included for files which
used grouped base ranges.
2) The Kmer analysis module could crash when analysing very
small files, such that no position in the file had more than
1000 instances of an enriched Kmer.
Both of these issues should now be resolved.
RELEASE NOTES FOR FastQC V0.9.1
-------------------------------
FastQC v0.9.1 adds some new command line options and fixes a
couple of usability issues.
The new command line options are:
--quiet Will suppress all progress messages and ensure that only
warnings or errors are written to stderr. This might be useful
for people running fastqc as part of a pipeline.
--nogroup Will turn off the dynamic grouping of bases in the
various per-base plots. This would allow you to see a result
for each base of a 100bp run for example. This option should
not be used for really long reads (454, PacBio etc) since it
has the potential to crash the program or generate very wide
output plots
In addition to these the following changes have been made:
The basic stats module now includes a line to say which type
of quality encoding was found so this information isn't just
present in the header of the per-base quality plot, and will
appear in the text based output.
We now distinguish between Illumina <1.3, 1.3 and 1.5 encodings
rather than just <1.3 and >=1.3. The Sanger encoding is now
labelled as Sanger / Illumina 1.9+ to allow for the change in
encoding in the latest illumina pipeline.
A bug was fixed which caused the program to crash when encountering
a zero length colorspace file (which shouldn't happen anyway, but
crashing wasn't the correct response).
The interactive over-represented sequence table now allows you
to copy out each cell individually which makes it easy to copy
any unknown sequences into other programs.
RELEASE NOTES FOR FastQC V0.9.0
-------------------------------
FastQC v0.9.0 makes a number of changes primarily targetted at
datasets containing longer reads. It allows all of the analyses
within FastQC to be completed sensibly and without excessive
resource usage on runs containing reads up to tens of kilobases
in length.
For long reads the program now converts many of its plots to
variably sized bins so that, for example you see every base for
the first 10 bases, then every 5 bases for the next 40 bases,
then every 50, then 100 then 1kb per bin, until the end of the
sequence is reached. For sequences below 75bp the reports will
look exactly the same as before. For groups running slightly
longer Illumina or ABI runs you'll see some compression at the
end of your reads, and people using 454 or PacBio will get
sensible results for all analyses for the first time.
One additional change which will impact anyone using reads over
75bp is that the duplicate sequence and overrepresented sequence
plots now only use the first 50bp of each read if the total read
length is over 75bp. This is because these plots work on the
basis of an exact sequence match, and longer reads tend to show
more errors at the end which makes them look like different
sequences when they're actually the same. 50bp should be enough
that you won't see exact matches by chance.
RELEASE NOTES FOR FastQC V0.8.0
-------------------------------
FastQC v0.8.0 adds some new options for parsing BAM/SAM files and
makes the graphs in the report easier to interpret.
All graphs in v0.8.0 now have marker lines going across them to make
it easier to relate the data in the graph to the y-axis. The per
base quality boxplots have shading behind the graph to indicate ranges
of good, medium and bad quality sequence.
For BAM/SAM files you can now specify that you wish to analyse only
the mapped sequences in the file. This is particularly of use to
people working on colorspace data where the mapped data should
produce reliable sequence, whereas the current raw conversion to
base space may overrepresent any errors which are present. The
option to use only mapped data is set by using the
-f bam_mapped or -f sam_mapped
option on the command line, or by specifying Mapped BAM/SAM files
from the drop down file filter in the file chooser in the interactive
version of the program.
For FastQ files the parser has been updated to not treat blank lines
between entries or at the end of the file as a format violation since
many sequences in the public repositories can have this problem.
From the command line we now offer the option to process multiple
files in parallel by setting the -t/--threads option. Please note that
using more than 1 thread will increase the amount of allocated memory
(250MB per thread), so you need to be sure you have enough memory (and
disk bandwidth) to be able to process more than one file. The
interactive application still defaults to a single processing thread
but you could theoretically change this by passing the correct java
properties in the startup command.
RELEASE NOTES FOR FastQC V0.7.2
-------------------------------
FastQC v0.7.2 fixes a bug in libraries where no unique sequences
were observed. It also improves the collection of duplicate
statistics on libraries with very low diversity.
A new command line option has been added to allow the user to
manually specify a contaminant's file rather than using the sitewide
default. This would be useful if you have different sets of
contaminants to screen against for different libraries, or if you
wanted to make a custom set of contaminants, but didn't have
sufficient privileges to modify the sitewide contaminants file.
RELEASE NOTES FOR FastQC V0.7.1
-------------------------------
FastQC v0.7.1 makes some significant enhancements to the fastqc
wrapper script which make it easier to use as part of a pipeline.
You can now use normal unix options to create your fastqc
command rather than having to pass java system properties. The
old options will continue to work though, so the updated
wrapper is still compatible with any previous commands you may
have in place. Full details of the new options can be found
by running
fastqc --help
One new option is the --format option which allows you to manually
specify the input format of a file, rather than having FastQC
guess the format from the file name. This would allow you to have
a BAM file called test.dat and process it using:
fastqc --format bam test.dat
RELEASE NOTES FOR FastQC V0.7.0
-------------------------------
FastQC v0.7.0 introduces a new analysis module which looks at the
enrichment of short sequences within the library. It is possible
to get enrichment of unaligned subsequences to quite a high degree
without this being apparent in any of the existing modules. This
new module should find problems such as read through into the
adapters at the other end of libraries, which other analyses would
miss.
Other changes in this release include:
* Altering the fastqc wrapper script so it can identify when it's
being used on the source distribution of the software so it can
issue an appropriate warning.
* Tidied up all of the y-axes on the graphs so that scaling should
now always be perfect in all graphs.
RELEASE NOTES FOR FastQC V0.6.1
-------------------------------
FastQC v0.6.1 is a bugfix release which fixes a problem with sequences
from BAM/SAM files which map to the reverse strand of the reference.
In these cases the sequence contained in the BAM/SAM file is reverse
complemented and the qualities are reversed relative to the original
sequence which came off the sequencer. In the previous release this
meant that the plots were incorrectly showing a mix of forward and
reversed sequences.
In v0.6.1 any sequences mapping to the reverse strand of the reference
are converted back to their original state before being analysed which
should give a clearer view of the overall qualities and sequence biases
within the run.
RELEASE NOTES FOR FastQC V0.6.0
-------------------------------
FastQC v0.6 adds support for reading SAM/BAM files as well as still
supporting fastq files. File type detection is based on the filename
so SAM/BAM files need to be named .sam or .bam. Any other form of
filename is assumed to be a fastq file.
SAM/BAM reading was added through the use of the picard libraries, which
means that the launcher for FastQC has had to be modified to include
the picard libraries into the classpath. If you use the bat file launcher,
the wrapper script or the Mac application bundle then you won't need to
do anything to get the new version to work, but if you've created your
own launcher then you will need to modify your classpath statement to
include the sam-1.32.jar file into the classpath.
The only other change in this release is that the line graphs have been
improved to use smoother lines for the graphs.
RELEASE NOTES FOR FastQC V0.5.1
-------------------------------
Release 0.5.1 fixes a couple of bugs and makes some improvements to
existing functions.
* A bug was fixed which caused the headers of the overrepresented
sequences results to not be separated in the text output.
* A bug was fixed which caused spikes to appear in the %GC profile
when using read lengths >100bp
* The fitting of the theoretical distribution to the %GC profile
was improved
* Some new entries were added to the contaminants file to cover
Illumina oligos for multiplexing, tag expression and small RNA
protocols (thanks to Aaron Statham for providing these)
RELEASE NOTES FOR FastQC V0.5.0
-------------------------------
FastQC v0.4.4 makes a number of changes to the previous
release which hopefully improve the relevance of the output.
* The fitting of the normal curve to the %GC distribution has
been improved for shorter sequences
* Each section of the HTML report output now has a pass/warn/fail
icon next to it, rather than just having them at the top
* The duplicate level analysis now estimates the total percentage
of sequences which are not unique, reports this on the graph and
uses this as the basis for the pass/warn/fail filtering
* The structure of the HTML output folder has been changed so that
icons and images are put into subfolders so the only files at
the top level are ones which the user is intended to open directly.
RELEASE NOTES FOR FastQC V0.4.3
-------------------------------
FastQC v0.4.3 is a bugfix release which fixes a bug and adds
an extra check to the interactive program.
In versions of the program since v0.2 the total sequence count
in the Basic Stats module may have been incorrect by a few percent
(either high or low). This is because instead of using the real
sequence count the module was using an estimated count which should
only be used for updating the progress indicators. The module
has now been fixed to report the actual sequence count.
In the interactive application there was no warning given if you
chose to save a report and opted to overwrite an existing report. You
will now be warned in this case an offered the chance to use a
different filename. This change won't affect the non-interactive
mode of the program where report files will be overwritten if the
program is run more than once on the same set of data.
RELEASE NOTES FOR FastQC V0.4.2
-------------------------------
FastQC v0.4.2 is a minor release which fixes some bugs and improves
the warnings on some of the analysis modules.
Bugs which were fixed were:
* The per-base quality plot showed an incorrect scale on the
y-axis. This was usually off by about 2, but could be more
depending on the data. The relative relationships shown
were correct, it was just the scale bar which was wrong.
* The sequence parser incorrectly identified some base called
files as colorspace files where they contained dots in place
of N base calls. This has now been improved but will still
fail for the base where a base called file starts with an
entry which is a single called base followed by all dots
since this is indistinguishable from a colorspace fastq file.
* Some JREs (notably OpenJDK) can't cope with a headless environment
being set from within the program. I've therefore added code
to the fastqc wrapper script to externally set the headless
environment up to bypass this problem.
Other improvements which have been made are:
* When writing out graphs for the HTML report we now scale some
graphs by the length of the sequences analysed so we don't get
really squashed graphs. The interactive application will still
scale the graphs to the size of the window
* More checks have been added to the parsing of FastQ files to
ensure we're looking at the correct file format. A better reporting
mechanism has been added to allow the program to cope better with
problems during parsing.
* The per-sequence GC plot has been modified to add in a modelled
normal distribtion over the observed distribution so you can see
how well the two fit.
* All modules (apart from the general stats) now have valid checks
in place, and all can now produce warnings and failures. Some of
the checks in the existing modules have been changed to better cope
with libraries with very low or high GC content.
RELEASE NOTES FOR FastQC V0.4.1
-------------------------------
FastQC v0.4.1 is a bugfix release which changes the operation
of the duplicate sequence and overrepresented sequence modules.
Both of these modules now only track sequences which appear in
the first 200,000 sequences of a file. This change was made
because people using longer read lengths found that tracking
the first million sequences led to the program exhausting its
allocated memory.
The other change is that the duplicate sequence module now
tracks sequence counts to the end of the file rather than
stopping after a million sequences. This means the final
duplication counts seen are much more realistic and should
match with what you would see in a genome browser. Because
of this change the cutoffs to flag a file with a warning or
an error have been increased. You now get a warning when
the sum of duplicated sequences is more than 20% of the
unique sequences. You get an error when there are more
duplicated sequences than unique sequences.
RELEASE NOTES FOR FastQC V0.4
-----------------------------
FastQC v0.4 introduces a new analysis module, and easier way
to launch the program from the command line and a new output
file, as well as fixing a few minor bugs.
The new analysis module is the sequence duplication level
module. This is a complement to the existing overrepresented
sequences module in that it looks at sequences which occur
more than once in your data. The new module takes a more
global view and says what proportion of all of your sequences
occur once, twice, three times etc. In a diverse library most
sequences should occur only once. A highly enriched library
may have some duplication, but higher levels of duplication
may indicate a problem, such as a PCR overamplification.
In response to several requests we've also now introduced a
new output file into the report. This is a text based, tab
delimited file which includes all of the data show in the
graphs in the graphical report. This would allow people
running pipelines to store the data generated by fastQC and
analyse it systematically rather than just taking the
pass/fail/warn summary, or reviewing the reports manually.
Finally, if you're running fastqc from the command line we've
now included a 'fastqc' wrapper script which you can launch
directly rather than having to construct a java launch
command. You can still pass -Dxxx options through to the
program, but for simple analyses you can now simply run:
fastqc [some files]
..once you have included the FastQC install directory into your
path. More details are in the install document.
Other minor changes:
- The over-represented sequences module now scans the first
million sequences to decide which sequences to track to the
end of the file.
- The labeling on the per-base N content graph is now correct
RELEASE NOTES FOR FastQC V0.3.1
-------------------------------
V0.3.1 is a bugfix release which fixes a few minor problems.
1) The template system now correctly checks that it only imports
graphics files from the templates directory into the report
files and doesn't break when it encounters unexpected files.
2) The offline mode now correctly reports the progress of all
processed files rather than just the first one.
3) The documentation has been updated to include information on
the blue mean line in the per base quality plot.
RELEASE NOTES FOR FastQC V0.3
-----------------------------
Major new additions to v0.3 are listed below:
1) Support for gzip compressed fastq files. You can now
load gzip compressed fastq files into FastQC in the same
way as uncompressed files. The files will be decompressed
interactively and can be viewed in the same way as for
uncompressed files.
2) Contaminant identification. If your library is found to
have overrepresented sequences in it these are now scanned
against a database of common contaminants (primers, adapters etc)
to see if the source of the contamination can be identified.
The database is stored in a new Contaminants folder in the
main installation directory, and can be updated with your
own protocol specific sequences.
3) When in non-interactive mode you can now pass a
preference -Dfastqc.output_dir to provide an alternative location
to save reports to, rather than having them in the same
directory as the source fastq files.
Some improvements have also been made to the colorspace support
so this version should support a wider range of colorspace
fastq files.
RELEASE NOTES FOR FastQC V0.2
-----------------------------
There are a few new additions in v0.2 of FastQC
1) Colorspace support: We now have rudimentary support for the
analysis of fastq files in colorspace. The analysis is done by
converting the colorspace calls to basecalls, which isn't ideal
but is hopefully more useful than nothing.
2) Option to unzip reports. The default action in non-interactive
mode is now to both create a zip file containing the FastQC report
and to unzip this into a folder of the same name. If you just want
to generate the zip files you can add -Dfastqc.unzip=false to the
command line to suppress this new behaviour
3) It is now possible to customise the HTML report for your site.
There is an HTML template which can be edited to add your own
branding to the reports you generate.
4) In addition to the human readable HTML report we now also
generate a tab delimited summary file which should make it
easier to integrate FastQC into an automated reporting system
which spots any potential problems with the data.
RELEASE NOTES FOR FastQC V0.1.1
-------------------------------
This point release fixes a problem when running FastQC in a
non-interactive mode on a system without a graphical display.
The program should now operate correctly on headless systems
as long as the filename(s) to process are specified on the
command line.
RELEASE NOTES FOR FastQC V0.1
-------------------------------
FastQC v0.1 is a beta release it should work in its present state but
we are keen to get feedback on the program. In particular we are
interested to hear if anyone has:
1) Suggestions for other checks we could be performing.
2) Comments about the criteria we set for issuing warnings or errors and
suggestions for how these could be improved.
You can report feedback either though our bug reporting tool at:
www.bioinformatics.bbsrc.ac.uk/bugzilla/
...or directly to [email protected]