Individual sample stayed at 'Finding clipped reads' #15

Open
YaliHao opened this issue Nov 7, 2024 · 0 comments

YaliHao commented Nov 7, 2024

Hi, Chrisjrt!
I ran the test files successfully. However, when I used hafeZ to process 40 samples in a batch with 4 threads, it stayed on one sample for 6 hours without making any progress (it did not exit, and there were no error messages). The stalled sample was GCA_003573835.1 (DRR160049). I then ran that sample on its own with the command below (using 64 threads this time, in case the 4-thread setting was the cause):

hafeZ.py -f /dssg/home/acct-clsjhh/clsjhh/hyl/SPI/test_provirus_del/ESKAPE/ESKAPE_fna/fna/genbank/bacteria/GCA_003573835.1/GCA_003573835.1_ASM357383v1_genomic.fna -r1 /dssg/home/acct-clsjhh/clsjhh/hyl/SPI/test_provirus_del/ESKAPE/ESKAPE_fastq/Staphylococcus_aureu/DRR160049/DRR160049_1.fastq -r2 /dssg/home/acct-clsjhh/clsjhh/hyl/SPI/test_provirus_del/ESKAPE/ESKAPE_fastq/Staphylococcus_aureu/DRR160049/DRR160049_2.fastq -o GCA_003573835.1 -D /dssg/home/acct-clsjhh/clsjhh/hyl/software/hafeZ/hafeZ_db -T phrogs -t 64

and log.out showed:

[M::mm_idx_gen::0.063*1.06] collected minimizers
[M::mm_idx_gen::0.076*5.15] sorted minimizers
[M::main::0.076*5.15] loaded/built the index for 2 target sequence(s)
[M::mm_mapopt_update::0.076*5.15] mid_occ = 1000
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 2
[M::mm_idx_stat::0.082*4.85] distinct minimizers: 471983 (99.02% are singletons); average occurrences: 1.020; average spacing: 5.994; total length: 2885609
[M::worker_pipeline::2.239*12.31] mapped 180580 sequences
[M::worker_pipeline::2.640*15.04] mapped 180630 sequences
[M::worker_pipeline::3.004*16.56] mapped 180878 sequences
[M::worker_pipeline::3.397*17.63] mapped 180928 sequences
[M::worker_pipeline::4.030*17.36] mapped 180836 sequences
[M::worker_pipeline::4.551*17.63] mapped 180910 sequences
[M::worker_pipeline::4.976*18.13] mapped 181050 sequences
[M::worker_pipeline::5.423*18.50] mapped 181212 sequences
[M::worker_pipeline::5.898*18.73] mapped 181028 sequences
[M::worker_pipeline::6.325*19.12] mapped 181200 sequences
[M::worker_pipeline::6.825*19.16] mapped 181404 sequences
[M::worker_pipeline::7.296*19.30] mapped 181418 sequences
[M::worker_pipeline::7.772*19.40] mapped 179944 sequences
[M::worker_pipeline::8.215*19.55] mapped 179966 sequences
[M::worker_pipeline::8.650*19.73] mapped 180182 sequences
[M::worker_pipeline::9.574*18.85] mapped 180244 sequences
[M::worker_pipeline::10.047*18.94] mapped 180366 sequences
[M::worker_pipeline::10.546*18.98] mapped 180312 sequences
[M::worker_pipeline::11.038*19.03] mapped 180376 sequences
[M::worker_pipeline::11.560*19.02] mapped 180266 sequences
[M::worker_pipeline::12.011*19.12] mapped 180410 sequences
[M::worker_pipeline::12.369*18.83] mapped 180652 sequences
[M::worker_pipeline::12.701*18.37] mapped 180446 sequences
[M::worker_pipeline::12.812*18.22] mapped 53438 sequences
[M::main] Version: 2.28-r1209
[M::main] CMD: minimap2 -ax sr -t 64 -o /dssg/home/acct-clsjhh/clsjhh/hyl/SPI/test_provirus_del/ESKAPE/ESKAPE_hafez/GCA_003573835.1/temp_minimap.sam /dssg/home/acct-clsjhh/clsjhh/hyl/SPI/test_provirus_del/ESKAPE/ESKAPE_hafez/GCA_003573835.1/temp_genome.fasta /dssg/home/acct-clsjhh/clsjhh/hyl/SPI/test_provirus_del/ESKAPE/ESKAPE_fastq/Staphylococcus_aureu/DRR160049/DRR160049_1.fastq /dssg/home/acct-clsjhh/clsjhh/hyl/SPI/test_provirus_del/ESKAPE/ESKAPE_fastq/Staphylococcus_aureu/DRR160049/DRR160049_2.fastq
[M::main] Real time: 12.837 sec; CPU: 233.411 sec; Peak RSS: 0.696 GB
[bam_sort_core] merging from 0 files and 64 in-memory blocks...

Running hafeZ version 1.0.4 with the following settings:
hafeZ.py -f /dssg/home/acct-clsjhh/clsjhh/hyl/SPI/test_provirus_del/ESKAPE/ESKAPE_fna/fna/genbank/bacteria/GCA_003573835.1/GCA_003573835.1_ASM357383v1_genomic.fna -r1 /dssg/home/acct-clsjhh/clsjhh/hyl/SPI/test_provirus_del/ESKAPE/ESKAPE_fastq/Staphylococcus_aureu/DRR160049/DRR160049_1.fastq -r2 /dssg/home/acct-clsjhh/clsjhh/hyl/SPI/test_provirus_del/ESKAPE/ESKAPE_fastq/Staphylococcus_aureu/DRR160049/DRR160049_2.fastq -o GCA_003573835.1 -D /dssg/home/acct-clsjhh/clsjhh/hyl/software/hafeZ/hafeZ_db -c 3.5 -b 3001 -w 4000 -m 6 -t 64 -p 0.1 -z 3.5

##################################################
################# Running hafeZ ##################
##################################################

################ Processing fasta ################
############### Done: 0.96 seconds ###############

########### Calculating genome length ############
############# Done: 0.00000 seconds ##############

########### Mapping reads to assembly ############
############## Done: 12.94 seconds ###############

############## Generating BAM file ###############
Running samtools
finished running samtools
############## Done: 35.61 seconds ###############

########### Generating coverage depths ###########
Running mosdepth
finished running mosdepth
############### Done: 6.92 seconds ###############

################ Smoothing signal ################
############### Done: 8.38 seconds ###############

################ Getting Z-scores ################
############### Done: 0.63 seconds ###############

##### Finding potential regions of interest ######
############### Done: 0.01 seconds ###############

####### Merging close regions of interest ########
############### Done: 0.16 seconds ###############

####### Removing small regions of interest #######
############## Done: 0.0002 seconds ##############

######## Parsing and processing sam file #########
############### Done: 2.89 seconds ###############

####### Checking for rois near contig ends #######
############### Done: 0.00 seconds ###############

############# Finding clipped reads ##############

It remained in this state for nearly two hours. The contents of the corresponding output directory are shown in the attached image:
[image]
The same thing happened with GCA_019042295.1 (SRR14879700).
Could you please give me some suggestions?
Thank you!
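
In the meantime, one possible workaround for the batch run is to put a time limit on each sample, so that a single stalled sample cannot hold up the other 39. The following is only a sketch under stated assumptions: it uses just the hafeZ flags from the command above, the function names and the idea of a per-sample time limit are hypothetical and not part of hafeZ, and it does not address why 'Finding clipped reads' stalls.

```python
import subprocess
import sys


def build_hafez_command(fasta, reads1, reads2, outdir, db, threads=4):
    """Assemble a hafeZ.py call using only the flags shown in the command above."""
    return [
        "hafeZ.py",
        "-f", fasta,
        "-r1", reads1,
        "-r2", reads2,
        "-o", outdir,
        "-D", db,
        "-T", "phrogs",
        "-t", str(threads),
    ]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return 'ok', 'failed', or 'timeout' instead of blocking forever.

    subprocess.run kills the child process when the timeout expires.
    Assumption: any helper processes started by the child have already
    finished by the time it stalls; they are not tracked here.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


def run_batch(commands, timeout_s):
    """Run each named command in turn and record its status by sample name."""
    return {name: run_with_timeout(cmd, timeout_s) for name, cmd in commands.items()}
```

With this, each sample's command would be built with `build_hafez_command(...)`, collected in a dict keyed by accession, and passed to `run_batch` with a limit such as one hour. Samples reported as `timeout` (here, GCA_003573835.1 and GCA_019042295.1) could then be set aside for separate investigation.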
