microbial genome download using repophlan_get_microbes.py #1

Open
bioinfonext opened this issue Sep 21, 2020 · 2 comments

bioinfonext commented Sep 21, 2020

Hi,

I am trying to download the microbial genomes using the repophlan_get_microbes.py script on an iMac. It has been running for the last 2 days and is still going. Could you please tell me how many sequences there are in total?
Here are the head and tail of the log file:


Admins-iMac-3 $ 
faa	ffn	fna	frn

Admins-iMac-3$ cd fna

Admins-iMac-3:fna$ ls | wc -l
  186733
$ head repophlan_microbes.log
2020-09-19 08:17:41,693 repophlan_get_microbes.py INFO     Reading the taxonomy from taxonomy_reduced.txt... 
2020-09-19 08:17:45,182 repophlan_get_microbes.py INFO     Done.
2020-09-19 08:18:09,562 repophlan_get_microbes.py WARNING  GCF_000001215.4 [Drosophila melanogaster ] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,562 repophlan_get_microbes.py WARNING  GCF_000001405.39 [Homo sapiens ] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING  GCF_000001635.26 [Mus musculus ] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING  GCF_000001735.4 [Arabidopsis thaliana ecotype=Columbia] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING  GCF_000001895.5 [Rattus norvegicus strain=mixed] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING  GCF_000001905.1 [Loxodonta africana ] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING  GCF_000002035.6 [Danio rerio ] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING  GCF_000002075.1 [Aplysia californica ] excluded from download because of uninteresting phyla!

$ tail  repophlan_microbes.log
2020-09-21 20:42:26,871 repophlan_get_microbes.py INFO     Parsing of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/635/075/GCA_001635075.1_ASM163507v1/GCA_001635075.1_ASM163507v1_genomic.gbff.gz
2020-09-21 20:42:27,198 repophlan_get_microbes.py INFO     Parsing of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/319/175/GCA_010319175.1_PDT000567389.1/GCA_010319175.1_PDT000567389.1_genomic.gbff.gz
2020-09-21 20:42:27,482 repophlan_get_microbes.py INFO     Parsing of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/518/025/GCA_003518025.1_ASM351802v1/GCA_003518025.1_ASM351802v1_protein.faa.gz
2020-09-21 20:42:27,608 repophlan_get_microbes.py INFO     Parsing of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/788/535/GCA_001788535.1_ASM178853v1/GCA_001788535.1_ASM178853v1_protein.faa.gz
2020-09-21 20:42:27,628 repophlan_get_microbes.py WARNING  No remote file found (some ffn and faa are know to be missing remotely). Aborting the download of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/997/445/GCA_010997445.1_ASM1099744v1/GCA_010997445.1_ASM1099744v1_protein.faa.gz. <urlopen error ftp error: [Errno ftp error] 550 GCA_010997445.1_ASM1099744v1_protein.faa.gz: No such file or directory>
2020-09-21 20:42:27,628 repophlan_get_microbes.py ERROR    Error in downloading of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/997/445/GCA_010997445.1_ASM1099744v1/GCA_010997445.1_ASM1099744v1_protein.faa.gz <urlopen error ftp error: [Errno ftp error] 550 GCA_010997445.1_ASM1099744v1_protein.faa.gz: No such file or directory>
2020-09-21 20:42:27,628 repophlan_get_microbes.py ERROR    Error in downloading of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/997/445/GCA_010997445.1_ASM1099744v1/GCA_010997445.1_ASM1099744v1_protein.faa.gz: empty fna file!
2020-09-21 20:42:28,239 repophlan_get_microbes.py WARNING  No remote file found (some ffn and faa are know to be missing remotely). Aborting the download of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/902/778/895/GCA_902778895.1_Rumen_uncultured_genome_RUG12414/GCA_902778895.1_Rumen_uncultured_genome_RUG12414_protein.faa.gz. <urlopen error ftp error: [Errno ftp error] 550 GCA_902778895.1_Rumen_uncultured_genome_RUG12414_protein.faa.gz: No such file or directory>
2020-09-21 20:42:28,239 repophlan_get_microbes.py ERROR    Error in downloading of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/902/778/895/GCA_902778895.1_Rumen_uncultured_genome_RUG12414/GCA_902778895.1_Rumen_uncultured_genome_RUG12414_protein.faa.gz <urlopen error ftp error: [Errno ftp error] 550 GCA_902778895.1_Rumen_uncultured_genome_RUG12414_protein.faa.gz: No such file or directory>
2020-09-21 20:42:28,239 repophlan_get_microbes.py ERROR    Error in downloading of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/902/778/895/GCA_902778895.1_Rumen_uncultured_genome_RUG12414/GCA_902778895.1_Rumen_uncultured_genome_RUG12414_protein.faa.gz: empty fna file!
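
A log of this size is easier to judge when summarized. The following is a minimal sketch, not part of RepoPhlAn, that counts log lines per level and collects the distinct URLs that ended in an ERROR. The function name `summarize_log` and both regular expressions are assumptions, based only on the line layout visible in the excerpt above.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# <date> <time> <script name> <LEVEL> <message>
LOG_LINE = re.compile(r"^\S+ \S+ \S+ (INFO|WARNING|ERROR)\s+(.*)$")
# The remote files in the excerpt all end in ".gz".
FTP_URL = re.compile(r"ftp://\S+?\.gz")


def summarize_log(lines):
    """Return (line counts per level, set of URLs mentioned in ERROR lines)."""
    levels = Counter()
    failed_urls = set()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip anything that is not a log record
        level, message = match.groups()
        levels[level] += 1
        if level == "ERROR":
            url = FTP_URL.search(message)
            if url:
                failed_urls.add(url.group(0))
    return levels, failed_urls


if __name__ == "__main__":
    # Two records copied from the excerpt above (the second one shortened).
    sample = [
        "2020-09-19 08:17:45,182 repophlan_get_microbes.py INFO     Done.",
        "2020-09-21 20:42:27,628 repophlan_get_microbes.py ERROR    "
        "Error in downloading of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/"
        "010/997/445/GCA_010997445.1_ASM1099744v1/"
        "GCA_010997445.1_ASM1099744v1_protein.faa.gz: empty fna file!",
    ]
    levels, failed = summarize_log(sample)
    print(dict(levels), len(failed))
```

To run it on the real log, pass an open file handle for repophlan_microbes.log to `summarize_log`. Note that the same missing file produces more than one ERROR line in the excerpt, which is why the sketch reports distinct URLs rather than raw error counts.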
fbeghini (Member) commented Dec 7, 2020

Hi,
Sorry for the late reply. You can estimate the total number of assemblies to be downloaded from the assembly_summary_genbank.txt and assembly_summary_refseq.txt files in ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS.
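
As a rough illustration of that estimate, the sketch below counts the data rows in a locally downloaded copy of one of those summary files. It assumes only that the files are tab-separated with comment lines starting with `#`. The function name `count_assemblies` and the shortened sample rows are hypothetical.

```python
import io


def count_assemblies(handle):
    """Count data rows: non-empty lines that do not start with '#'."""
    return sum(
        1 for line in handle
        if line.strip() and not line.startswith("#")
    )


if __name__ == "__main__":
    # Shortened, illustrative rows; the real files have many more columns.
    sample = io.StringIO(
        "# header comment\n"
        "# assembly_accession\torganism_name\n"
        "GCF_000001215.4\tDrosophila melanogaster\n"
        "GCF_000001405.39\tHomo sapiens\n"
    )
    print(count_assemblies(sample))
```

The result is only an upper bound for this script: the log above shows that non-microbial assemblies are excluded from the download, and each remaining assembly can contribute files to several of the output folders (faa, ffn, fna, frn).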

The download sometimes fails when a high number of CPUs is used, since NCBI's FTP server allows only a limited number of simultaneous connections.
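
One general way to stay under such a connection limit is to cap the number of parallel workers and retry transient failures with a growing delay. The sketch below shows that pattern in isolation; it is not how repophlan_get_microbes.py is implemented. The names `fetch_with_retry` and `download_all`, and the defaults of 3 workers and 3 attempts, are assumptions (the actual NCBI limit is not stated here). The `fetch` argument is any caller-supplied function that downloads one URL and raises `OSError` on failure (`urllib.error.URLError` is a subclass of `OSError`).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_retry(fetch, url, retries=3, base_delay=1.0):
    """Call fetch(url), retrying on OSError with exponential backoff."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)


def download_all(urls, fetch, max_workers=3, retries=3, base_delay=1.0):
    """Download urls with at most max_workers simultaneous connections.

    Returns (results, failures): two dicts keyed by URL.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            url: pool.submit(fetch_with_retry, fetch, url, retries, base_delay)
            for url in urls
        }
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except OSError as exc:
                failures[url] = exc
    return results, failures
```

Retrying helps only with temporary errors. A file that is missing on the server, like the `550 ... No such file or directory` cases in the log above, will fail on every attempt and end up in `failures`.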

bioinfonext (Author) commented

Thanks for your response. I was looking at the run.sh script, and I am not able to understand how screen.py works or on what basis it assigns the quality score to a genome assembly. Do we also need to use the Pfam database here?
Could you please share the correct commands to get the quality scores for the genomes?

Many thanks
