I'm trying to reproduce the dendrogram results of this paper, which come from a specific 16S rRNA analysis.
I don't know whether there is a standard method for the data management or analysis, so I have been trying on my own. A summary follows.
The methods section says: "The resulting FASTQ files were deposited at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA386442. MiSeq paired-end raw sequence forward and reverse reads were subsequently merged using ea-utils v1.1.2 with standard settings, followed by a split library step from QIIME v1.9.1 and removal sequence reads shorter than 200 nucleotides, reads that contained ambiguous bases, or reads with an average quality score of less than 30."
So I downloaded the SRA files with the SRA Toolkit, running this in the terminal:
for n in {141..188}; do prefetch "SRR5577$n"; done
Then I converted them to FASTQ files using:
for n in {141..188}; do fastq-dump "SRR5577$n"; done
But for the merge step I can't use fastq-join, or any other tool in the ea-utils package on GitHub. It seems the data are not in the correct format.
Did I do this correctly? Where can I learn more about this kind of analysis?
I would suggest using --split-files in fastq-dump, as it appears that the data are paired-end (otherwise you wouldn't need to merge them). It will give you separate forward and reverse read files, which presumably you then input to ea-utils.
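For example, here is a minimal sketch of the conversion and merge steps for the accession range in the question. It assumes fastq-dump (SRA Toolkit) and fastq-join (ea-utils) are on your PATH, and that --split-files names its output SRR..._1.fastq and SRR..._2.fastq. The helpers run and process_run and the DRY_RUN switch are my own illustrative names, not part of either tool. By default the script only prints the commands so you can check them first.

```shell
#!/bin/sh
# Sketch only: split paired-end reads during conversion, then merge each pair.
# Assumption: fastq-dump (SRA Toolkit) and fastq-join (ea-utils) are installed.
# DRY_RUN defaults to 1, so commands are printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: convert one accession and merge its read pairs.
process_run() {
    acc="$1"
    # --split-files writes forward reads to ${acc}_1.fastq
    # and reverse reads to ${acc}_2.fastq
    run fastq-dump --split-files "$acc"
    # In the -o pattern, fastq-join replaces % with un1, un2 and join
    run fastq-join "${acc}_1.fastq" "${acc}_2.fastq" -o "${acc}.%.fastq"
}

# Same accession range as in the question: SRR5577141 to SRR5577188
n=141
while [ "$n" -le 188 ]; do
    process_run "SRR5577$n"
    n=$((n + 1))
done
```

Once the printed commands look right, rerun with DRY_RUN=0. The merged reads should then be in the SRR....join.fastq files, which would be the input for the QIIME split library step mentioned in the methods.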