ExtractIlluminaBarcodes opening a lot of files, error "Too many open files" #1801

Open
GATKSupportTeam opened this issue Apr 25, 2022 · 2 comments

Comments

@GATKSupportTeam
Collaborator

This request was created from a contribution made by Robert Altwasser on April 19, 2022 10:09 UTC.

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/5461192217627-Picard-Too-many-open-files-

--

I am demultiplexing an S4 sequencing run, and Picard ExtractIlluminaBarcodes opens too many files, which crashes the run. It's dual-index data with UMIs, and I need unmapped BAM files with the UMI sequence. I checked the MD5 sums of the raw data several times and also ran a check on the BaseCalls directory.

I monitored the open files of the process with 'lsof', and the count quickly exceeds 120000, which is the maximum that I can set with 'ulimit -n'.
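For anyone reproducing this, here is a minimal sketch for comparing a process's descriptor count with its limits. It assumes Linux, where /proc/<pid>/fd holds one entry per open descriptor. The helper name open_fd_count and the PID placeholder are hypothetical and not part of Picard. Note that 'lsof' also lists entries that are not descriptors (current directory, memory-mapped files), so its line count can be higher than the number that counts against 'ulimit -n'.

```shell
# Count the file descriptors a process currently holds.
# Assumes Linux, where /proc/<pid>/fd has one entry per open descriptor.
open_fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Hypothetical usage: replace $$ (this shell) with the PID of the Picard JVM.
pid=$$
echo "open descriptors: $(open_fd_count "$pid")"
echo "soft limit:       $(ulimit -Sn)"
echo "hard limit:       $(ulimit -Hn)"
```

Running this periodically while the tool works shows whether the descriptor count climbs steadily toward the soft limit or jumps at a particular tile.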

Here is the RunInfo:

<Read Number="1" NumCycles="148" IsIndexedRead="N"/>
<Read Number="2" NumCycles="17" IsIndexedRead="Y"/>
<Read Number="3" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="4" NumCycles="148" IsIndexedRead="N"/>

a) Versions:

The Genome Analysis Toolkit (GATK) v4.2.5.0

HTSJDK Version: 2.24.1

Picard Version: 2.25.4

Java: openjdk version "1.8.0_312"

b) Exact command used:

(bash) $ ulimit -n 100000
picard -Xmx110g -Djava.io.tmpdir=/data/gpfs-1/users/altwassr_c/scratch/tmp/ -Xms110g \
ExtractIlluminaBarcodes \
-B /data/gpfs-1/users/altwassr_c/scratch/data/220325_A00643/Data/Intensities/BaseCalls/ \
-L 1 \
--NUM_PROCESSORS 1 \
-M metrices/barcode_metrices1.txt \
-BARCODE_FILE /data/gpfs-1/users/altwassr_c/work/projekte/barcode1.csv \
-RS 148T8B9M8B148T \
--MAX_RECORDS_IN_RAM 1000000000 \
--TMP_DIR /data/gpfs-1/users/altwassr_c/scratch/tmp/
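
One quick sanity check on the command is that the cycles described by -RS add up to the cycles listed in RunInfo. The sketch below is only illustrative: the helper name read_structure_cycles is hypothetical and not a Picard utility. It sums the numeric segments of the read structure and compares them with the NumCycles values quoted above. Here the 17-cycle index read is split as 8B9M, so both totals agree.

```shell
# Sum the cycle counts in a read structure string such as 148T8B9M8B148T.
# The function body runs in a subshell, so its variables do not leak out.
read_structure_cycles() (
    total=0
    # Turn every non-digit into a space, leaving only the segment lengths.
    for n in $(printf '%s' "$1" | tr -c '0-9' ' '); do
        total=$((total + n))
    done
    printf '%s\n' "$total"
)

read_structure_cycles 148T8B9M8B148T   # cycles described by -RS: 321
echo $((148 + 17 + 8 + 148))           # cycles listed in RunInfo: 321
```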

c) Log:

ERROR   2022-04-19 04:41:06     ExtractIlluminaBarcodes Error processing tile 2140
picard.PicardException: File not found: (/data/gpfs-1/users/altwassr_c/scratch/data/220325_A00643_0438_BH22YTDSX2/Data/Intensities/BaseCalls/L002/C237.1/L002_1.cbcl)
        at picard.illumina.parser.readers.BaseBclReader.open(BaseBclReader.java:93)
        at picard.illumina.parser.readers.CbclReader.readHeader(CbclReader.java:127)
        at picard.illumina.parser.readers.CbclReader.readTileData(CbclReader.java:200)
        at picard.illumina.parser.readers.CbclReader.advance(CbclReader.java:275)
        at picard.illumina.parser.readers.CbclReader.hasNext(CbclReader.java:252)
        at picard.illumina.parser.NewIlluminaDataProvider.hasNext(NewIlluminaDataProvider.java:125)
        at picard.illumina.ExtractIlluminaBarcodes$PerTileBarcodeExtractor.run(ExtractIlluminaBarcodes.java:363)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: /data/gpfs-1/users/altwassr_c/scratch/data/220325_A00643_0438_BH22YTDSX2/Data/Intensities/BaseCalls/L002/C237.1/L002_1.cbcl (Too many open files)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at picard.illumina.parser.readers.BaseBclReader.open(BaseBclReader.java:90)
        ... 11 more
INFO    2022-04-19 04:41:06     ExtractIlluminaBarcodes Extracting barcodes for tile 2141
ERROR   2022-04-19 04:41:06     ExtractIlluminaBarcodes Error processing tile 2141
picard.PicardException: Unrecognized data type(Cbcl) found by IlluminaDataProviderFactory!
        at picard.illumina.parser.IlluminaDataProviderFactory.makeParser(IlluminaDataProviderFactory.java:400)
        at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:249)
        at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:228)
        at picard.illumina.ExtractIlluminaBarcodes$PerTileBarcodeExtractor.run(ExtractIlluminaBarcodes.java:355)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

(created from Zendesk ticket #281653)

@gbrandt6

@gbggrant this error came up on the GATK Forum. Is something going wrong in ExtractIlluminaBarcodes that makes it open more than 120000 files? The user's command sets a limit of 100000 with ulimit -n, and they have already tried increasing --MAX_RECORDS_IN_RAM.

@gbggrant
Contributor

We've seen some reports of this. I believe that Fulcrum Genomics (who submitted some recent changes to this code) are looking into it.
