I'm new to programming and am having trouble working out what I need to do in the first place. Any help would be fantastic.
The problem: I have several files in which I need to count how many "C"s there are in a given line, and then print the result so that for each line I have the number of C's and the total number of characters. Each file contains several million lines that need to be analysed. The data is grouped into records of four lines, and only the second line of each record contains the data I need.
Example from file:
@M00859:19:000000000-A60W6:1:1101:17503:1628 1:N:0:1
TTATGTATTAAAATTAAGTTTTTTATAAAGTTATTTATTTTGGTTTGATTGGAACGACGAAGAAGTTGTTATATTTTTAAATTGGGAAATTGGAATTATTTGATTAGAAAGTGGGATAATTTTTTTATTTTAATTTTTATTAGATTTATTTAAGTTTTTGGTGTTTTTATAATTTTTTATGTATTTAAATTAAGTTTTTTATGAAGTGATTTAT
+
GGBGBGFHHG3A1DGDEDHGHHGGAG22FBGGFGHHFHHHHG?GGH?FGHB0DGHFCG???//CCHGFHHEGEHHHHHECBGGG1?EFGGH1EF1GHBHFGBFDHEB1GBED11//GB1FFGHHGGHHHHHB1FHFHHEHHE11GHHHHHHFFFHHHHG?CHGHGHHGHHFBHHHHHGHGGHFHHHHBFHHHHEHHHHGGGGFGFBFBFFGGGG
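To make the counting step concrete, here is a minimal sketch of what I think is needed, assuming every record is exactly four lines long and that only uppercase "C" should be counted. The function name `count_c_per_read` and the tiny in-memory sample are made up for illustration; they are not from any library.

```python
import io

def count_c_per_read(handle):
    """Yield (number_of_Cs, total_characters) for each four-line record.

    Assumes records are exactly four lines, with the data on the second line.
    """
    for line_number, line in enumerate(handle):
        # Lines 0, 4, 8, ... are headers; lines 1, 5, 9, ... hold the data.
        if line_number % 4 == 1:
            sequence = line.strip()  # drop the trailing newline
            yield sequence.count("C"), len(sequence)

# Tiny in-memory sample standing in for a real file (hypothetical data).
example = io.StringIO(
    "@read1\n"
    "TTACGC\n"
    "+\n"
    "GGBGBG\n"
    "@read2\n"
    "CCCC\n"
    "+\n"
    "HHHH\n"
)
results = list(count_c_per_read(example))
print(results)  # [(2, 6), (4, 4)]
```

For a real file the same function should work on an open file handle, for example `with open("reads.fastq") as handle:` (the filename is a placeholder). Reading line by line like this avoids loading several million lines into memory at once.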
The final aim is to create a scatter plot of the number of C's versus the total number of characters for each file so we can compare the results between files.
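A rough sketch of the plotting step, assuming the third-party matplotlib library is installed and that the counts are already available as (number of C's, total characters) pairs. Because millions of lines will produce many identical pairs, this version tallies duplicates first and scales each point by how often it occurs; that is one possible design choice, not the only one. The names `tally_pairs`, `plot_files`, and the sample data are hypothetical.

```python
from collections import Counter

def tally_pairs(pairs):
    """Count how often each (c_count, total_characters) pair occurs."""
    return Counter(pairs)

def plot_files(pairs_by_file, out_path):
    """Draw one scatter series per file and save the figure to out_path.

    pairs_by_file maps a label (e.g. a filename) to an iterable of
    (c_count, total_characters) pairs. Requires matplotlib (assumption).
    """
    import matplotlib.pyplot as plt  # imported here so tallying works without it

    fig, ax = plt.subplots()
    for label, pairs in pairs_by_file.items():
        tally = tally_pairs(pairs)
        lengths = [length for (_, length) in tally]
        c_counts = [c for (c, _) in tally]
        sizes = list(tally.values())  # marker area grows with frequency
        ax.scatter(lengths, c_counts, s=sizes, alpha=0.5, label=label)
    ax.set_xlabel("Total number of characters")
    ax.set_ylabel("Number of C's")
    ax.legend()
    fig.savefig(out_path)

# Hypothetical sample data for two files.
sample = {
    "file_a": [(2, 6), (2, 6), (4, 4)],
    "file_b": [(0, 5), (1, 5)],
}
tally_a = tally_pairs(sample["file_a"])
print(tally_a[(2, 6)])  # 2
# plot_files(sample, "c_counts.png")  # uncomment to write the plot
```

Putting all files on the same axes with a legend should make it possible to compare them directly; separate subplots per file would be an alternative if the points overlap too much.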
Any help would be fantastic!
Cheers, Justin