I am currently scraping Gmail data using the Gmail API. Some of the emails I am scraping contain vulgar fractions as seen below:
8⅜
6⅞
7¾
7⅞
In the HTML returned by the Gmail API, the above vulgar fractions are represented as follows:
8=E2=85=9C
6=E2=85=9E
7=C2=BE
7=E2=85=9E
How may I convert these back to strings such as '8 3/8', for processing in Python?
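The strings are encoded using quoted-printable encoding, a method of representing non-ASCII bytes in ASCII. You can decode them to `str` by undoing the quoted-printable layer and then decoding the resulting bytes as UTF-8. For the first string this gives `8⅜`, which is composed of `str(8)` plus the Unicode character VULGAR FRACTION THREE EIGHTHS.
We can decompose the string further using Unicode normalisation: the NFKD form of `8⅜` is the four characters `8`, `3`, FRACTION SLASH (U+2044) and `8`.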
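The two steps above can be sketched as follows. This is a minimal example that assumes the standard library's `quopri` module for the quoted-printable layer and UTF-8 as the underlying charset (the byte sequences in the question are consistent with that); the variable names are only illustrative.

```python
import quopri
import unicodedata

encoded = "8=E2=85=9C"

# Undo the quoted-printable layer (bytes in, bytes out),
# then decode the raw bytes as UTF-8 (assumed charset).
decoded = quopri.decodestring(encoded.encode("ascii")).decode("utf-8")
print(decoded)  # 8⅜

# The second character is a single code point for the fraction.
print(unicodedata.name(decoded[1]))  # VULGAR FRACTION THREE EIGHTHS

# NFKD (compatibility decomposition) splits the fraction into
# numerator, FRACTION SLASH (U+2044) and denominator.
parts = unicodedata.normalize("NFKD", decoded)
print([unicodedata.name(c) for c in parts])
# ['DIGIT EIGHT', 'DIGIT THREE', 'FRACTION SLASH', 'DIGIT EIGHT']
```

Note that the slash produced by normalisation is U+2044, not the ASCII `/`, so any later split has to use that character.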
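We can combine the approaches to get all the parts of each string and cast them to ints or fractions: the whole-number part becomes an `int`, and the numerator and denominator on either side of the fraction slash become a `fractions.Fraction`. For the four example strings this gives 8 and 3/8, 6 and 7/8, 7 and 3/4, and 7 and 7/8.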
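The combination described above might look like the sketch below. `parse_mixed` is a hypothetical helper name, and it assumes each value ends in exactly one vulgar-fraction character with only the whole-number digits in front of it; input that does not match that shape would need extra handling.

```python
import quopri
import unicodedata
from fractions import Fraction

FRACTION_SLASH = "\u2044"  # the slash that NFKD normalisation produces


def parse_mixed(encoded):
    """Return (whole, fraction) for a string such as '8=E2=85=9C'."""
    text = quopri.decodestring(encoded.encode("ascii")).decode("utf-8")
    # Assumption: the last character is the vulgar fraction and
    # everything before it is the whole-number part.
    whole, vulgar = text[:-1], text[-1]
    numerator, denominator = unicodedata.normalize("NFKD", vulgar).split(FRACTION_SLASH)
    return (int(whole) if whole else 0), Fraction(int(numerator), int(denominator))


for s in ["8=E2=85=9C", "6=E2=85=9E", "7=C2=BE", "7=E2=85=9E"]:
    print(parse_mixed(s))
# (8, Fraction(3, 8))
# (6, Fraction(7, 8))
# (7, Fraction(3, 4))
# (7, Fraction(7, 8))
```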
`Fraction` instances may be converted to `float` or `str` in the usual way: `float(Fraction(3, 8))` is `0.375` and `str(Fraction(3, 8))` is `'3/8'`.
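As a short illustration of those conversions, the snippet below starts from an already parsed pair (the values `8` and `Fraction(3, 8)` are taken from the first example) and builds the `'8 3/8'` form asked for in the question, plus a single numeric value.

```python
from fractions import Fraction

whole, frac = 8, Fraction(3, 8)

print(float(frac))  # 0.375
print(str(frac))    # 3/8

# Text form requested in the question.
mixed = f"{whole} {str(frac)}"
print(mixed)  # 8 3/8

# Adding int and Fraction stays exact until converted to float.
total = whole + frac
print(total)         # 67/8
print(float(total))  # 8.375
```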