I'm trying to parse mailto URLs into a nice object or dictionary which includes subject, body, etc. I can't seem to find a library or class that achieves this. Do you know of any?
mailto:[email protected]?subject=mysubject&body=mybody
The core urlparse lib does less than a stellar job on mailtos, but gets you halfway there:
In [3]: from urlparse import urlparse
In [4]: urlparse("mailto:[email protected]?subject=mysubject&body=mybody")
Out[4]: ParseResult(scheme='mailto', netloc='', path='[email protected]?subject=mysubject&body=mybody', params='', query='', fragment='')
EDIT
A little research unearths this thread. Bottom line: Python URL parsing sucks.
import urllib
query = 'mailto:[email protected]?subject=mysubject&body=mybody'.partition('?')[2]
print dict((urllib.unquote(s).decode('utf-8') for s in pair.partition('=')[::2])
           for pair in query.split('&'))
# -> {u'body': u'mybody', u'subject': u'mysubject'}
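For anyone on Python 3, the same partition-and-unquote idea might look roughly like the sketch below. The address and the percent-encoded values are made up for illustration, and unquote() now lives in urllib.parse and returns str directly, so the .decode('utf-8') step is no longer needed.

```python
from urllib.parse import unquote

# Hypothetical example URL, not taken from the question.
url = "mailto:user@example.com?subject=my%20subject&body=my%20body"

# Everything after the first '?' is the header-field part.
query = url.partition("?")[2]

fields = {}
for pair in query.split("&"):
    if not pair:
        continue  # skip empty segments, e.g. when there is no query at all
    name, _, value = pair.partition("=")
    fields[unquote(name)] = unquote(value)

print(fields)
# -> {'subject': 'my subject', 'body': 'my body'}
```

Note that a repeated field name simply overwrites the earlier value here; that is a simplification of this sketch.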
Here is a solution using the re module...
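import re

def parse_mailto(a):
    # Expects exactly: mailto:<address>?subject=<subject>&body=<body>
    # Capture groups replace the hard-coded slice offsets, which only
    # worked for one particular address length.
    m = re.match(r'mailto:([^?]+)\?subject=(.*?)&body=(.*)', a)
    if m is None:
        return None  # not in the expected format
    email, subject, body = m.groups()
    return {'email': email, 'subject': subject, 'body': body}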
This assumes it is in the same format as you posted. You may need to make modifications to better fit your needs.
You can use urlparse and parse_qs to parse URLs with mailto as the scheme. Be aware, though, that according to the scheme definition:
mailto:[email protected],[email protected]?subject=mysubject
is identical to
mailto:[email protected]&[email protected]&subject=mysubject
Here's an example:
from urlparse import urlparse, parse_qs
from email.message import Message
url = 'mailto:[email protected]?subject=mysubject&body=mybody&[email protected]'
msg = Message()
parsed_url = urlparse(url)
header = parse_qs(parsed_url.query)
header['to'] = header.get('to', []) + parsed_url.path.split(',')
for k,v in header.iteritems():
    msg[k] = ', '.join(v)
print msg.as_string()
# Will print:
# body: mybody
# to: [email protected], [email protected]
# subject: mysubject
I like Alexander's answer but it is in Python 2! We now get urlparse() and parse_qs() from urllib.parse. Also note that sorting the header in reverse puts it in the order: to, from, body.
from email.message import Message
from pathlib import Path
from urllib.parse import parse_qs, urlparse
url = Path("link.txt").read_text()
msg = Message()
parsed_url = urlparse(url)
header = parse_qs(parsed_url.query)
header["to"] = header.get("to", []) + parsed_url.path.split(",")
for k, v in sorted(header.items(), reverse=True):
    print(f"{k}:", v[0])
I am just using this as a one-off. When I used msg.as_string() I got some strange results, though, so I just went with the string. The values are lists of one value, so I access the 0th entry to make it a string.
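If you want a plain dictionary rather than printed lines, something along these lines might work. This is only a sketch: parse_mailto is a hypothetical helper name, the addresses are invented, and it assumes a Python 3 version where urlsplit() separates the query for mailto URLs. It merges recipients from the path with any "to" fields in the query and joins repeated fields with ", ".

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_mailto(url):
    """Return a dict mapping lower-cased field names to strings."""
    parts = urlsplit(url)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URL: %r" % url)

    # parse_qs percent-decodes and maps each name to a list of values.
    fields = {}
    for name, values in parse_qs(parts.query).items():
        fields.setdefault(name.lower(), []).extend(values)

    # Recipients may appear in the path, in "to" query fields, or both.
    path_recipients = [unquote(a) for a in parts.path.split(",") if a]
    recipients = path_recipients + fields.pop("to", [])

    result = {name: ", ".join(values) for name, values in fields.items()}
    result["to"] = ", ".join(recipients)
    return result


# Hypothetical example URL.
example = ("mailto:alice@example.com"
           "?subject=Hello%20there&body=Line%20one&to=bob@example.com")
print(parse_mailto(example))
# -> {'subject': 'Hello there', 'body': 'Line one', 'to': 'alice@example.com, bob@example.com'}
```

One caveat: parse_qs() follows form-encoding rules and turns a literal "+" into a space, which may not be what you want for a mailto body.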
You should use a dedicated library like this one:
https://pypi.python.org/pypi/urlinfo
and contribute or file issues to make Python better ;)
P.S. Don't use Robbert Peters' solution, because it is a hack and does not work properly. Also, using a regular expression here is like using a BFG to shoot a small bird.
Seems like you might just want to write your own function to do this.
Edit: Here is a sample function (written by a Python noob).
Edit 2: cleanup due to feedback: