I'm trying to download all FASTA files associated with one organism from NCBI.
I tried wget -r -l3 -A "*.fna.gz" ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Microcystis_aeruginosa/
to get all files ending in .fna.gz, recursing up to three levels deep, but wget just rejects everything with the following output:
Removed “ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Microcystis_aeruginosa/latest_assembly_versions/.listing”.
Rejecting “GCF_000010625.1_ASM1062v1”.
Rejecting “GCF_000307995.1_ASM30799v2”.
Rejecting “GCF_000312165.1_ASM31216v1”.
Rejecting “GCF_000312185.1_ASM31218v1”.
Rejecting “GCF_000312205.1_ASM31220v1”.
Rejecting “GCF_000312225.1_ASM31222v1”.
Rejecting “GCF_000312245.1_ASM31224v1”.
Rejecting “GCF_000312265.1_ASM31226v1”.
Rejecting “GCF_000312285.1_ASM31228v1”.
Rejecting “GCF_000312725.1_ASM31272v1”.
Rejecting “GCF_000330925.1_MicAerT1.0”.
Rejecting “GCF_000332585.1_MicAerD1.0”.
Rejecting “GCF_000412595.1_spc777-v1”.
Rejecting “GCF_000599945.1_Mic70051.0”.
Rejecting “GCF_000787675.1_ASM78767v1”.
Rejecting “GCF_000981785.1_ASM98178v1”.
Any ideas on why it's rejecting these directories? Thanks for your help.
I'm not sure exactly why it's rejecting your request, but when I was still doing this kind of thing, I found that unless I downloaded in smaller batches, the NCBI server would time me out and block my IP for a while before I could download again. This doesn't seem to be the same problem that you're seeing, but maybe this script will get the same thing done. Let me know if this helps.
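To illustrate the "smaller batches" idea, here is a minimal Python sketch using only the standard library. It is not necessarily the script mentioned above. The function names (select_fasta, batches, download_in_batches), the batch size, and the pause length are all hypothetical choices for illustration, and it assumes you already have a list of full file URLs to fetch.

```python
import fnmatch
import time
import urllib.request
from pathlib import Path


def select_fasta(names, pattern="*.fna.gz"):
    """Keep only the names that match the shell-style pattern."""
    return [n for n in names if fnmatch.fnmatch(n, pattern)]


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def download_in_batches(urls, dest_dir, batch_size=5, pause_seconds=10):
    """Fetch each URL, sleeping between batches to go easy on the server.

    batch_size and pause_seconds are assumed values, not NCBI-documented limits.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for i, batch in enumerate(batches(list(urls), batch_size)):
        if i > 0:
            time.sleep(pause_seconds)  # pause before every batch except the first
        for url in batch:
            # Save under the last path component of the URL.
            target = dest / url.rsplit("/", 1)[-1]
            urllib.request.urlretrieve(url, str(target))
```

You would pass the result of select_fasta (applied to whatever URL list you collect) to download_in_batches, and tune the batch size and pause until the server stops timing you out.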