I've spent hours trying to get AddOrReplaceReadGroups to work, but I keep getting the same error.
I am using GATK4.
I have a file called plate1_rg_fields.txt, which looks like this:
UZH-CO-v001_CGTCTAAT 1 lib1 ILLUMINA unit1 1
UZH-CO-v001_AGACTCGT 1 lib1 ILLUMINA unit1 1
UZH-CO-v001_GCACGTCA 1 lib1 ILLUMINA unit1 1
and so on. It's 384 lines in total, one per well of a 384-well plate, and it's tab-delimited.
I then run:
cat plate1_rg_fields.txt | while read SAMPLE ID LB PL PU SM
do
gatk AddOrReplaceReadGroups --INPUT "$SAMPLE".bam --OUTPUT "$SAMPLE".rg.bam --RGID "$ID" --RGLB "$LB" --RGPL "$PL" --RGPU "$PU" --RGSM "$SM"
done
The tool starts, and my log.out file begins like this:
Using GATK jar /data/colpe/conda/envs/gatk_env/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /data/colpe/conda/envs/gatk_env/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar AddOrReplaceReadGroups --INPUT UZH-CO-v001_AGACTCGT.bam --OUTPUT UZH-CO-v001_AGACTCGT.rg.bam --RGID 1 --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM 1
So it is reading my file and picking up the correct BAM files.
But then the error I get is:
', doesn't.of tags in a SAM header must adhere to the regular expression '^[ -~]+$',but the value provided for RGSM, '1
Tool returned:
1
I don't understand this: I specified the number 1 as my RGSM, so that should adhere to the required regex, no? The odd thing is that when I try just one sample (i.e. the first line of my tab-delimited file) it works, but when I run it with all samples I get this error and no files are written.
OK, so I worked it out. The issue was something to do with how my file was being read. What I had to do was add an extra column of dummy values to the file; after that the SM value was read cleanly and everything worked.
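
A likely cause of this symptom, though it is not confirmed in this thread, is Windows-style (CRLF) line endings in the fields file. In that case the last field on each line is really `1` followed by a carriage return, which falls outside the `^[ -~]+$` range, and printing that value would also explain why the error message looks overwritten. The sketch below is only an illustration under that assumption: `demo_rg_fields.txt` and `demo_out.txt` are made-up file names, the demo file stands in for plate1_rg_fields.txt, and the `printf` inside the loop marks where the gatk call would go.

```shell
# Build a small hypothetical fields file with CRLF line endings
# (a stand-in for plate1_rg_fields.txt; not the real data).
printf 'sampleA\t1\tlib1\tILLUMINA\tunit1\t1\r\nsampleB\t1\tlib1\tILLUMINA\tunit1\t1\r\n' > demo_rg_fields.txt

# Portable ways to hold a literal carriage return and a literal tab.
cr=$(printf '\r')
tab=$(printf '\t')

# Count the lines that contain a carriage return; anything above 0 is suspicious.
cr_lines=$(grep -c "$cr" demo_rg_fields.txt)
echo "lines with CR: $cr_lines"

# Strip carriage returns before the loop, and split on tabs only.
tr -d '\r' < demo_rg_fields.txt | while IFS="$tab" read -r SAMPLE ID LB PL PU SM
do
    # Placeholder for the real command (gatk AddOrReplaceReadGroups ... --RGSM "$SM").
    # The brackets make any stray trailing character visible.
    printf '%s RGSM=[%s]\n' "$SAMPLE" "$SM"
done > demo_out.txt

cat demo_out.txt
```

If the count is 0 for the real file, line endings are not the problem and this does not apply. With GNU coreutils, `cat -A plate1_rg_fields.txt | head` is another quick check: CRLF lines end in `^M$` rather than just `$`.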