Regex to Extract #hashtags from MMD metadata in Python

398 Views Asked by other_other At 30 December 2013 at 16:31

I'm trying to extract all the #hashtags from the "Tags: #tag1 #tag2" line of a multimarkdown plaintext file. (I'm in Python multiline mode.)

I've tried using lookaheads:

^(?=Tags:\s.*)#(\w+)\b

and lookbehinds:

#(\w+)\b(?<=Tags:^\s)

Plain vanilla #(\w+)\b works, except it picks up any #hashtag that might appear later in the document.

Any hints, help, instruction appreciated.


There are 2 best solutions below

blaze On 30 December 2013 at 19:00 BEST ANSWER

import re

text = "\n\n#bogus\nTags: #foo #bar\n"

First, you need to get the line:

line = re.findall(r'Tags:.+\n', text)
# line = ['Tags: #foo #bar\n']

Then get the tags from that line. Use a capture group to drop the '#', or no group to keep it:

tags = re.findall(r'#(\w+)', line[0])
# tags = ['foo', 'bar']
tags = re.findall(r'#\w+', line[0])
# tags = ['#foo', '#bar']
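
The two steps above can be folded into one helper. This is only a sketch: extract_tags and sample are names made up for illustration, and it assumes the metadata line starts with "Tags:" at the beginning of a line. It uses re.search with re.MULTILINE instead of indexing line[0], so a document with no Tags line gives an empty list rather than an IndexError.

```python
import re

def extract_tags(text):
    """Return the hashtags on the first 'Tags:' line, without the '#'."""
    # re.MULTILINE lets ^ and $ match at the start and end of each line,
    # not only at the start and end of the whole string.
    line = re.search(r'^Tags:(.*)$', text, re.MULTILINE)
    if line is None:
        return []  # no metadata line found
    return re.findall(r'#(\w+)', line.group(1))

# Hypothetical sample document: one Tags line, one unrelated hashtag later.
sample = "Title: Demo\nTags: #foo #bar\n\nBody text with a #bogus tag.\n"
print(extract_tags(sample))  # ['foo', 'bar']
```

Because only the captured remainder of the Tags line is passed to re.findall, hashtags elsewhere in the document are never seen by the second pattern.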

A lookbehind won't work here: Python's re module requires lookbehind patterns to have a fixed width, and matching everything between "Tags:" and a given hashtag needs a variable-width pattern.
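
To see the restriction in practice, here is a small check. The pattern is an illustrative guess at what a lookbehind-based attempt might look like, not something from the question; the point is only that the standard re module refuses to compile it.

```python
import re

# The .* inside the lookbehind can match any number of characters,
# so the lookbehind has no fixed width and compilation fails.
try:
    re.compile(r'(?<=^Tags:.*)#(\w+)', re.MULTILINE)
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)
```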

P̲̳x͓L̳ On 30 December 2013 at 21:36

First find the index where the first hash is located in the input text, then use re.findall on the rest of the string to get the repeated captures. The following example prints ['#tag1', '#tag2']:

import re

text = "Tags: #tag1 #tag2"

matched = re.search(r'^Tags([^#]+)', text)
if matched:
    tag_text = text[matched.end():]
    hash_tags = re.findall(r'(#(?:[^#\s]+(?:\s*?)))', tag_text)
    print(hash_tags)
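
One caveat with slicing from the match to the end of the string: in a longer document the slice would also contain any hashtags that appear after the Tags line. A possible variation, assuming the Tags line ends at the next newline (the multi-line sample text here is invented for the example), stops the slice at the end of that line:

```python
import re

# Hypothetical document with an unrelated hashtag after the metadata line.
text = "Tags: #tag1 #tag2\n\nBody with a #later tag.\n"

# Match the "Tags:" prefix up to the first hash, staying on one line.
matched = re.search(r'^Tags:[^#\n]*', text, re.MULTILINE)
hash_tags = []
if matched:
    # Slice from the end of the prefix up to the end of that line only.
    line_end = text.find('\n', matched.end())
    if line_end == -1:
        line_end = len(text)  # Tags line is the last line, no newline
    hash_tags = re.findall(r'#[^#\s]+', text[matched.end():line_end])
print(hash_tags)  # ['#tag1', '#tag2']
```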
