segmentation fault (core dumped) #17

Open
nliorni opened this issue Feb 23, 2023 · 17 comments
@nliorni

nliorni commented Feb 23, 2023

Greetings,
I cloned the latest version of circminer, completed the installation (make), and built the index for the hg38 reference genome.
I keep getting the same error after the circRNA detection is completed:

/usr/bin/bash: line 1: 305631 Segmentation fault (core dumped)

The "Segmentation fault (core dumped)" message also appears when calling "circminer --help", like:

... For more details and command line options run "circminer --help" Segmentation fault (core dumped)

How can I solve the issue? Thanks in advance for help

@fhach
Collaborator

fhach commented Feb 26, 2023

seems like an installation error. @yenyilin, what do you think?

@nliorni
Author

nliorni commented Feb 27, 2023

I get the same error with the bioconda installation.

@yenyilin

yenyilin commented Feb 27, 2023

Regarding circminer --help: this has happened since 1de79ac. It seems to come from the destructor of FASTQParser.

@yenyilin

(1) Can you confirm that the 1600 pairs of reads in the test sample finish properly?
(2) If so, do you mind sharing the fastq file with us?
(2.1) Otherwise, can you share your commands (number of reads, and whether the input is a fastq file or a device used for piping)?
(3) While it should not affect your results, we understand the segmentation fault is annoying. The easiest workaround is to run lines 16-23 of fastq_parser.cpp only when current_record != NULL.

@yenyilin

@nliorni: We cannot reproduce the segmentation fault after the detection step yet, and we would appreciate any further information about this scenario. The current workaround is to add
if ( NULL != current_record){
line 16-25
}

in fastq_parser.cpp so that circminer does not try to free an unallocated current_record.

We will post an update once we have reproduced and fixed the segmentation fault after the detection step. Thank you.
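
A minimal, self-contained sketch of this kind of guard, under simplifying assumptions. `Record`, `ParserSketch`, and `allocate` are hypothetical names used only for illustration; only `current_record`, `threadCount`, and the freed field names come from this thread, and the real FASTQParser may look different.

```cpp
#include <cstdlib>

// Hypothetical, simplified stand-in for the per-thread record.
struct Record {
    char* rname = nullptr;
    char* seq = nullptr;
};

// Hypothetical parser: current_record stays null until allocate() is called,
// mimicking a code path (such as printing help) where no input is opened.
class ParserSketch {
public:
    explicit ParserSketch(int threads) : threadCount(threads) {}

    // Copying would lead to a double free, so it is disabled in this sketch.
    ParserSketch(const ParserSketch&) = delete;
    ParserSketch& operator=(const ParserSketch&) = delete;

    void allocate() {
        if (current_record != nullptr) return;  // already allocated
        current_record = new Record[threadCount];
        for (int i = 0; i < threadCount; ++i) {
            current_record[i].rname = static_cast<char*>(std::malloc(16));
            current_record[i].seq = static_cast<char*>(std::malloc(16));
        }
    }

    ~ParserSketch() {
        // The guard: skip per-record cleanup when nothing was allocated.
        if (current_record != nullptr) {
            for (int i = 0; i < threadCount; ++i) {
                std::free(current_record[i].rname);  // free(nullptr) is a no-op
                std::free(current_record[i].seq);
            }
            delete[] current_record;
        }
    }

    bool allocated() const { return current_record != nullptr; }

private:
    int threadCount;
    Record* current_record = nullptr;  // must start as null for the guard to work
};
```

Note that a guard like this only helps if the pointer is initialized to null on every construction path; an uninitialized pointer would pass the check and still be dereferenced.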

@nliorni
Author

nliorni commented Mar 2, 2023

Good morning! Thank you for the kind answer. The command I ran is:

circminer --verbosity 1 --thread 20 -r hg38.fa -g gencode36.gtf -1 sample_1.fastq -2 sample_2.fastq --output /path/to/output

How can I share the fastq files with you? They are gigabytes in size.

The segmentation fault error is the following:

Tue Feb 28 09:57:48 2023 [INFO] Number of threads: 1
Tue Feb 28 09:57:48 2023 [INFO] Input file type: Paired-end
Tue Feb 28 09:57:49 2023 [INFO] Kmer size obtained from index: 20
Tue Feb 28 09:57:49 2023 [INFO] Loading GTF file...
Tue Feb 28 09:58:24 2023 [INFO] Completed! (CPU time: 35.62s; Real time: 35.85s)
Tue Feb 28 09:58:24 2023 [INFO] Genome index type: Full
Tue Feb 28 09:58:24 2023 [INFO] Starting read extraction
Tue Feb 28 09:58:24 2023 [INFO] + Loading genome index...
Tue Feb 28 09:58:33 2023 [INFO] + Completed! (CPU time: 8.51s; Real time: 8.63s)
Tue Feb 28 09:58:33 2023 [INFO] + Loading genome sequence...
Tue Feb 28 09:58:36 2023 [INFO] + Completed! (CPU time: 3.22s; Real time: 3.24s)
Tue Feb 28 09:58:36 2023 [INFO] + Starting pseudo-alignment (Round 1)
Tue Feb 28 11:21:25 2023 [INFO] + Completed round 1! (CPU time: 3855.33s; Real time: 4968.88s)
Tue Feb 28 11:21:25 2023 [INFO] + Loading genome index...
Tue Feb 28 11:21:32 2023 [INFO] + Completed! (CPU time: 6.97s; Real time: 7.03s)
Tue Feb 28 11:21:32 2023 [INFO] + Loading genome sequence...
Tue Feb 28 11:21:34 2023 [INFO] + Completed! (CPU time: 2.47s; Real time: 2.49s)
Tue Feb 28 11:21:34 2023 [INFO] + Starting pseudo-alignment (Round 2)
Tue Feb 28 12:18:26 2023 [INFO] + Completed round 2! (CPU time: 3367.59s; Real time: 3411.76s)
Tue Feb 28 12:18:26 2023 [INFO] + Loading genome index...
Tue Feb 28 12:18:33 2023 [INFO] + Completed! (CPU time: 6.47s; Real time: 6.53s)
Tue Feb 28 12:18:33 2023 [INFO] + Loading genome sequence...
Tue Feb 28 12:18:35 2023 [INFO] + Completed! (CPU time: 2.46s; Real time: 2.48s)
Tue Feb 28 12:18:35 2023 [INFO] + Starting pseudo-alignment (Round 3)
Tue Feb 28 13:26:56 2023 [INFO] + Completed round 3! (CPU time: 4078.03s; Real time: 4100.55s)
Tue Feb 28 13:26:56 2023 [INFO] Starting circRNA detection
Tue Feb 28 13:26:56 2023 [INFO] + Sorting remaining read mappings using GNU sort...
Tue Feb 28 13:27:18 2023 [INFO] + Completed! (CPU time: 0.00s; Real time: 22.18s)
Tue Feb 28 13:27:18 2023 [INFO] + Loading genome sequence...
Tue Feb 28 13:27:22 2023 [INFO] + Completed! (CPU time: 3.82s; Real time: 3.84s)
Tue Feb 28 13:29:12 2023 [INFO] + Loading genome sequence...
Tue Feb 28 13:29:15 2023 [INFO] + Completed! (CPU time: 3.39s; Real time: 3.41s)
Tue Feb 28 13:30:34 2023 [INFO] + Loading genome sequence...
Tue Feb 28 13:30:37 2023 [INFO] + Completed! (CPU time: 3.39s; Real time: 3.41s)
/usr/bin/bash: line 1: 558958 Segmentation fault (core dumped) circminer --verbosity 1 -r /data/reference_data/hg38/genome/hg38_ucsc_filtered.fa -g /data/reference_data/hg38/annotations/gencode.v36.basic.annotation.gtf -1 /data2/analysis2/storlazzi/fastq/sratools/SRX669021/SRX669021_1.fastq -2 /data2/analysis2/storlazzi/fastq/sratools/SRX669021/SRX669021_2.fastq --output /data3/pipeline_data/output/circminer_GEO/results/SRX669021/circminer/SRX669021

So I don't know whether the detection actually ran to completion.

I will attach the fastqc report for this fastq pair.

Sorry to bother you, and thanks again for the help.

nl

SRX669021_1_fastqc.zip

SRX669021_2_fastqc.zip

@yenyilin

yenyilin commented Mar 2, 2023

Thank you for the information. We will start testing using SRR1797219 from SRX669021 and keep you updated.

@yenyilin

yenyilin commented Mar 2, 2023

In the meantime I will assume you use
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_36/gencode.v36.basic.annotation.gtf.gz
as your GTF (Basic gene annotation, CHR) and a corresponding fasta file as your hg38_ucsc_filtered.fa (autosomes plus X, Y, and MT). Let me know if you use an alternative GTF so I can better reproduce your scenario.

@yenyilin

yenyilin commented Mar 3, 2023

We ran the current release of circminer on 47,209,075 paired-end reads of SRR1797219 and it finished properly. Can you remind us of the gcc version on your system?

@nliorni
Author

nliorni commented Mar 3, 2023

Good morning @yenyilin. Yes, I used that reference annotation file, as you can see from the command I ran. The version of gcc I have is 8.5.0. Ok, I understand that you ran circminer on the same sample I have (SRX669021) and it does work. I don't know what the problem might be. Thank you again for your help.
nl

@yenyilin

yenyilin commented Mar 3, 2023

@nliorni I will try it again on 8.5.0.

In the meantime I hope you don't mind me sharing some suggestions (which mean more work on your side). In process_circ.cpp, your run already finished line 306 but could not proceed to line 325.
(1) Do you mind applying our suggested patch to lines 16 to 25 of fastq_parser.cpp and telling us the results?

// Only clean up when current_record was actually allocated.
if (NULL != current_record) {
    for (int i = 0; i < threadCount; ++i) {
        free(current_record[i].rname);
        free(current_record[i].seq);
        free(current_record[i].rcseq);
        free(current_record[i].comment);
        free(current_record[i].qual);
        free(current_record[i].rqual);
    }
    // free(current_record);
    delete[] current_record;
}

(2) If (1) still fails, could you take tail -n 1600 of both read files and run circminer again using these 1,600 pairs?
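
One thing worth checking before rerunning on a truncated file: FASTQ files normally store each read as four lines, so 1600 lines would correspond to 400 records per file, and the cut only lands on a record boundary if the file's total line count is a multiple of four. A small sketch for sanity-checking a truncated file, assuming plain, unwrapped 4-line records; `FastqCount` and `count_fastq_records` are hypothetical helpers, not part of circminer.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Result of scanning a FASTQ stream under the 4-lines-per-record assumption.
struct FastqCount {
    std::size_t records = 0;  // number of complete 4-line records
    bool well_formed = true;  // false on a bad marker line or leftover lines
};

// Counts complete records and checks that line 1 of each record starts
// with '@' and line 3 starts with '+'.
FastqCount count_fastq_records(std::istream& in) {
    FastqCount result;
    std::string line;
    std::size_t line_no = 0;
    while (std::getline(in, line)) {
        const std::size_t pos = line_no % 4;
        if (pos == 0 && (line.empty() || line[0] != '@')) result.well_formed = false;
        if (pos == 2 && (line.empty() || line[0] != '+')) result.well_formed = false;
        ++line_no;
    }
    result.records = line_no / 4;
    if (line_no % 4 != 0) result.well_formed = false;  // truncated last record
    return result;
}
```

To use it on a real file, open the file with `std::ifstream` (from `<fstream>`) and pass the stream to `count_fastq_records`; both mates of a pair should report the same record count.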

@nliorni
Author

nliorni commented Mar 14, 2023

@yenyilin Good morning, sorry for the late response. I will try the suggested patch and let you know. I am currently running circminer on a full dataset, and some samples seem to present this problem, as you can see from the attached log. Also, it now keeps loading the genome sequence. I'll wait and keep this updated. Thanks so much again,
nl
nohup.out.txt

@fhach
Collaborator

fhach commented Mar 14, 2023

@nliorni Can you tell us what resources your system has (RAM, cores)? Are you running under slurm or any other queue management system where you can restrict the resources?

@nliorni
Author

nliorni commented Mar 15, 2023

Greetings @fhach, I am running the analysis without a queue management system, on a workstation with 252 GB of RAM and 64 cores.

@nliorni
Author

nliorni commented Mar 28, 2023

Hello @yenyilin, sorry for the delayed response. I applied the suggested patch and more samples work this time, but 44 of the 122 samples in the dataset still hit the same error presented in this issue.
We also tried the tail -n 1600 suggestion on the first failing sample:

tail -n 1600 /data2/analysis2/fastq/sratools/SRX669025/SRX669025_1.fastq > /data2/analysis2/fastq/sratools/SRX669025/1600tail_SRX669025_1.fastq
1600_fix_log_669025.txt

It seemed to work without throwing any error, as you can see from the attached log.

Thank you again for all the help, by the way.

@yenyilin

@nliorni I only have GCC 8.4.0 and 9.3.0, and both worked. Given that, I suspect a system-level issue like the one Faraz mentioned.
(1) From your log it seems that you ran multiple jobs sequentially rather than simultaneously (for example with parallel or xargs). Am I correct?
(2) Can you confirm that the segmentation fault is reproducible for the same dataset, that is, that the same dataset always crashes circminer? This information is important for us to rule out multiple jobs competing for memory on the node.

@nliorni
Author

nliorni commented Apr 7, 2023

@yenyilin

  1. Yes, I am running the jobs sequentially through a bash script. I first tried the Snakemake WMS, but I switched to a bash script as soon as I got this error, to rule out a Snakemake problem.
  2. Yes, the segmentation fault is reproducible for the same dataset: running the bash script again on the same dataset results in the same number of samples throwing an error.

Thank you again for your help.
nl
