I am trying to download the microbial genomes using the repophlan_get_microbes.py script on an iMac. It has been running for the last 2 days and is still going. Could you please tell me how many sequences there are in total?
Here are the head and tail of the log file:
Admins-iMac-3 $
faa ffn fna frn
Admins-iMac-3$ cd fna
Admins-iMac-3:fna$ ls | wc -l
186733
$ head repophlan_microbes.log
2020-09-19 08:17:41,693 repophlan_get_microbes.py INFO Reading the taxonomy from taxonomy_reduced.txt...
2020-09-19 08:17:45,182 repophlan_get_microbes.py INFO Done.
2020-09-19 08:18:09,562 repophlan_get_microbes.py WARNING GCF_000001215.4 [Drosophila melanogaster ] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,562 repophlan_get_microbes.py WARNING GCF_000001405.39 [Homo sapiens ] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING GCF_000001635.26 [Mus musculus ] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING GCF_000001735.4 [Arabidopsis thaliana ecotype=Columbia] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING GCF_000001895.5 [Rattus norvegicus strain=mixed] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING GCF_000001905.1 [Loxodonta africana ] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING GCF_000002035.6 [Danio rerio ] excluded from download because of uninteresting phyla!
2020-09-19 08:18:09,563 repophlan_get_microbes.py WARNING GCF_000002075.1 [Aplysia californica ] excluded from download because of uninteresting phyla!
$ tail repophlan_microbes.log
2020-09-21 20:42:26,871 repophlan_get_microbes.py INFO Parsing of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/635/075/GCA_001635075.1_ASM163507v1/GCA_001635075.1_ASM163507v1_genomic.gbff.gz
2020-09-21 20:42:27,198 repophlan_get_microbes.py INFO Parsing of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/319/175/GCA_010319175.1_PDT000567389.1/GCA_010319175.1_PDT000567389.1_genomic.gbff.gz
2020-09-21 20:42:27,482 repophlan_get_microbes.py INFO Parsing of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/518/025/GCA_003518025.1_ASM351802v1/GCA_003518025.1_ASM351802v1_protein.faa.gz
2020-09-21 20:42:27,608 repophlan_get_microbes.py INFO Parsing of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/788/535/GCA_001788535.1_ASM178853v1/GCA_001788535.1_ASM178853v1_protein.faa.gz
2020-09-21 20:42:27,628 repophlan_get_microbes.py WARNING No remote file found (some ffn and faa are know to be missing remotely). Aborting the download of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/997/445/GCA_010997445.1_ASM1099744v1/GCA_010997445.1_ASM1099744v1_protein.faa.gz. <urlopen error ftp error: [Errno ftp error] 550 GCA_010997445.1_ASM1099744v1_protein.faa.gz: No such file or directory>
2020-09-21 20:42:27,628 repophlan_get_microbes.py ERROR Error in downloading of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/997/445/GCA_010997445.1_ASM1099744v1/GCA_010997445.1_ASM1099744v1_protein.faa.gz <urlopen error ftp error: [Errno ftp error] 550 GCA_010997445.1_ASM1099744v1_protein.faa.gz: No such file or directory>
2020-09-21 20:42:27,628 repophlan_get_microbes.py ERROR Error in downloading of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/997/445/GCA_010997445.1_ASM1099744v1/GCA_010997445.1_ASM1099744v1_protein.faa.gz: empty fna file!
2020-09-21 20:42:28,239 repophlan_get_microbes.py WARNING No remote file found (some ffn and faa are know to be missing remotely). Aborting the download of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/902/778/895/GCA_902778895.1_Rumen_uncultured_genome_RUG12414/GCA_902778895.1_Rumen_uncultured_genome_RUG12414_protein.faa.gz. <urlopen error ftp error: [Errno ftp error] 550 GCA_902778895.1_Rumen_uncultured_genome_RUG12414_protein.faa.gz: No such file or directory>
2020-09-21 20:42:28,239 repophlan_get_microbes.py ERROR Error in downloading of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/902/778/895/GCA_902778895.1_Rumen_uncultured_genome_RUG12414/GCA_902778895.1_Rumen_uncultured_genome_RUG12414_protein.faa.gz <urlopen error ftp error: [Errno ftp error] 550 GCA_902778895.1_Rumen_uncultured_genome_RUG12414_protein.faa.gz: No such file or directory>
2020-09-21 20:42:28,239 repophlan_get_microbes.py ERROR Error in downloading of ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/902/778/895/GCA_902778895.1_Rumen_uncultured_genome_RUG12414/GCA_902778895.1_Rumen_uncultured_genome_RUG12414_protein.faa.gz: empty fna file!
Hi,
Sorry for the late reply. You can estimate the total number of assemblies to be downloaded from the assembly_summary_genbank.txt and assembly_summary_refseq.txt files in ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS.
The download sometimes fails when a high number of CPUs is used, since NCBI's FTP server allows only a limited number of connections.
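As a rough illustration of the estimate, the sketch below counts the data rows in locally downloaded copies of the two summary files. It assumes the files are tab-separated text with header lines that start with `#`; the function names and local file paths are hypothetical, not part of RepoPhlAn. The count is only an upper bound: as the log above shows, the script skips assemblies from excluded phyla, and each remaining assembly can produce up to four files (fna, ffn, faa, frn).

```python
import sys
from typing import Iterable


def count_assemblies(lines: Iterable[str]) -> int:
    """Count data rows, skipping blank lines and '#' header lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def count_assemblies_in_file(path: str) -> int:
    """Count data rows in a local copy of an assembly summary file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_assemblies(handle)


if __name__ == "__main__":
    # Paths are passed on the command line; the file names are up to the user.
    for path in sys.argv[1:]:
        print(path, count_assemblies_in_file(path))
```

For example, after fetching both files, `python count_assemblies.py assembly_summary_genbank.txt assembly_summary_refseq.txt` prints one count per file. Comparing those numbers with `ls | wc -l` in the fna directory gives a sense of how far the download has progressed.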
Thanks for your response. I was looking at the run.sh script, and I cannot work out how screen.py works or on what basis it assigns a quality score to a genome assembly. Do we also need the Pfam database here?
Could you please share the correct commands to get the quality scores for the genomes?