How to use regexp to identify the number of hydrogens in a chemical formula?

Which expression should I use to identify the number of hydrogen atoms in a chemical formula?

For example:

C40H51N11O19 - 51 hydrogens

C2HO - 1 hydrogen

CO2 - no hydrogens (empty)

Any suggestions?

Thanks!

Cheers!

There are 2 best solutions below

2
On BEST ANSWER

You can start with this regex:

H\d*

H -> matches the character H literally
\d* -> matches a digit zero or more times

See the example, and try other regexes yourself, at: https://regex101.com/r/vdvH8S/2

But the regex won't convert the result for you; a regex only finds matches.

You need to process the match yourself:

  • H with a number: extract the number
  • only H: 1
  • no match: 0
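
The three cases above can be sketched in Python as follows. This is a minimal sketch, not part of the original answer: the function name count_hydrogens is hypothetical, and the digits are wrapped in a capture group, H(\d*), so they can be read back without the H. Like the pattern itself, it only looks at the first H and would also match the H in two-letter symbols such as He or Hg.

```python
import re

# H followed by zero or more digits; the digits are captured in group 1.
# (The capture group is an addition to the plain H\d* pattern.)
HYDROGEN = re.compile(r"H(\d*)")


def count_hydrogens(formula: str) -> int:
    """Hypothetical helper: apply the three post-processing cases."""
    match = HYDROGEN.search(formula)
    if match is None:
        return 0            # no match: no hydrogens
    digits = match.group(1)
    if digits == "":
        return 1            # only H: a single hydrogen
    return int(digits)      # H with a number: extract the number


for formula in ("C40H51N11O19", "C2HO", "CO2"):
    print(formula, count_hydrogens(formula))
# C40H51N11O19 51
# C2HO 1
# CO2 0
```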
3
On

A regular expression that matches H followed by digits would be:

/H(\d+)/g
  • The 'H' is a literal character match for the H in the given chemical formula
  • () declares a capture group, so you can then grab the captured digits without the H in whatever programming language you are using
  • \d matches any digit, and the + quantifier requires one or more of them, so an H with no digits after it (as in C2HO) is not matched
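
As a rough illustration of the capture group, here is how it might look in Python. The language is an assumption, since the answer does not name one, and the /.../g delimiters are not Python syntax. re.findall returns the captured digits for every match, which covers what the g flag does. The helper name hydrogen_digits is hypothetical.

```python
import re

# Same pattern as above, without the /.../g delimiters.
PATTERN = re.compile(r"H(\d+)")


def hydrogen_digits(formula: str) -> list:
    """Hypothetical helper: return the captured digit strings, one per match."""
    return PATTERN.findall(formula)


print(hydrogen_digits("C40H51N11O19"))  # ['51']
print(hydrogen_digits("C2HO"))          # [] because \d+ needs at least one digit
print(hydrogen_digits("CO2"))           # []
```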

There is no catch-all solution here; you might be better off using something other than a single regex.
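
One possible alternative, sketched here under stated assumptions rather than taken from the answer, is to split the formula into element tokens first and then look up H. It still uses a regex, but only as a tokenizer, so the H in Hg or He is not mistaken for hydrogen and repeated H groups are summed. The names element_counts and count_hydrogens are hypothetical, and the sketch assumes a flat formula with no parentheses, charges, or hydrate notation.

```python
import re

# One token per element: an uppercase letter, an optional lowercase letter,
# then an optional count. Assumes a flat formula (no parentheses or charges).
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula: str) -> dict:
    """Hypothetical helper: map each element symbol to its total count."""
    counts = {}
    for symbol, digits in TOKEN.findall(formula):
        # A missing count means a single atom.
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def count_hydrogens(formula: str) -> int:
    """Hypothetical helper: hydrogen count, 0 if the formula has no H."""
    return element_counts(formula).get("H", 0)


print(count_hydrogens("C40H51N11O19"))  # 51
print(count_hydrogens("CH3CH2OH"))      # 6
print(count_hydrogens("HgCl2"))         # 0
```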