I have a textarea that contains markdown. I don't want users to post HTML in it unless the HTML is inside a markdown code block, like this:
````markdown
``` someLanguageCode
<span>some html inside markdown code block</span>
```
````
I don't want to allow any HTML outside a code block, so this input should be rejected:
````markdown
<span>some html tag outside code block</span>
<div>some more multiline html code outside
</div>
``` someLanguageCode
<span>some html inside markdown code block</span>
```
````
I was able to write a regex for single-line HTML tags: `<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)<\/\1>`
What I can't work out is how to:
- write a regex that also matches HTML tags spanning multiple lines, and
- check whether that HTML is outside a markdown code block.
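On the first point, a likely culprit is that `.` in a JavaScript regular expression does not match newline characters, so the lazy `(.*?)` group in the pattern above can never reach a closing tag on a later line. The minimal sketch below swaps it for `[\s\S]*?`; the variable names are illustrative only. Note that it still only matches a tag with a matching closing tag, so an unpaired tag such as `<img ...>` is not detected.

```javascript
// Pattern from the question: `.` does not match newline characters,
// so the lazy `(.*?)` group cannot reach a closing tag on a later line.
const singleLine = /<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)<\/\1>/;

// Same pattern with `[\s\S]*?` instead of `.*?`, so the content between
// the opening and closing tag may contain line breaks. Adding the ES2018
// `s` (dotAll) flag to the original pattern would have the same effect.
const multiLine = /<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>([\s\S]*?)<\/\1>/;

const input = "<div>some more multiline html code outside\n</div>";

console.log(singleLine.test(input)); // false
console.log(multiLine.test(input)); // true
```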
I've made a jsfiddle for experimenting with this problem; it shows which inputs should match and which should be rejected.
I'm doing this to block obvious XSS injection attempts.
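For the second point, one approach that avoids handling HTML and code fences in a single regex is to remove the fenced blocks first and then look for tags in whatever remains. This is a minimal sketch, assuming a fence opens with a line starting with three backticks (optionally followed by a language name) and closes with a line containing only three backticks; `isValidMarkdown`, `FENCED_BLOCK` and `TAG_START` are hypothetical names. The tag test is deliberately looser than the paired-tag regex above: it flags anything that looks like the start of a tag, because a single unpaired tag is enough for an XSS payload.

```javascript
// Hypothetical sketch: reject input that contains HTML tags outside
// fenced code blocks.
// Assumes a fence opens with a line starting with three backticks
// (optionally followed by a language name) and closes with a line
// containing only three backticks.
const FENCED_BLOCK = /^```[^\n]*\n[\s\S]*?^```[ \t]*$/gm;

// Deliberately loose: anything that looks like the start of an opening
// or closing tag counts, whether or not it is paired.
const TAG_START = /<\/?[a-zA-Z]/;

function isValidMarkdown(markdown) {
  // Remove every complete fenced block, then search what is left.
  const outside = markdown.replace(FENCED_BLOCK, "");
  return !TAG_START.test(outside);
}

const allowed =
  "``` someLanguageCode\n<span>some html inside markdown code block</span>\n```";
const rejected =
  "<div>some more multiline html code outside\n</div>\n" + allowed;

console.log(isValidMarkdown(allowed)); // true
console.log(isValidMarkdown(rejected)); // false
```

An unclosed fence is not removed, so any tags after it count as outside a code block and the input is rejected; this errs on the side of caution.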
As was already mentioned in a comment, you shouldn't try to parse the whole HTML with a regex. I think what you ultimately want is to strip the tags and mark the input as invalid. I created a jsfiddle with code that parses the structure and lets you apply your own logic either inside the markdown code blocks or outside them:
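The fiddle's code isn't reproduced in this text, so what follows is only a minimal sketch of the same idea, not the original code: scan the input line by line, group the lines into code and non-code segments, then run a different callback on each kind. `splitSegments`, `transformSegments`, `stripTags` and `hasHtmlOutsideCode` are hypothetical names, and the fence handling is a simplification of the CommonMark rules (any line starting with three backticks toggles between code and non-code).

```javascript
// Hypothetical sketch: split markdown into code and non-code segments
// line by line. Simplification: any line starting with three backticks
// opens a fence, and the next such line closes it.
function splitSegments(markdown) {
  const segments = [];
  let inCode = false;
  let buffer = [];

  // Push the buffered lines as one segment, tagged with the current state.
  const flush = () => {
    if (buffer.length > 0) {
      segments.push({ inCode, text: buffer.join("\n") });
      buffer = [];
    }
  };

  for (const line of markdown.split("\n")) {
    if (line.startsWith("```")) {
      if (inCode) {
        buffer.push(line); // closing fence belongs to the code segment
        flush();
        inCode = false;
      } else {
        flush(); // end the preceding non-code segment
        buffer.push(line); // opening fence belongs to the code segment
        inCode = true;
      }
    } else {
      buffer.push(line);
    }
  }
  flush(); // an unclosed fence keeps the rest of the input as code
  return segments;
}

// Run `outside` on non-code text and `inside` on code blocks, then
// reassemble; with identity callbacks the input comes back unchanged.
function transformSegments(markdown, outside, inside = (text) => text) {
  return splitSegments(markdown)
    .map((seg) => (seg.inCode ? inside(seg.text) : outside(seg.text)))
    .join("\n");
}

// Example policies for the non-code parts: strip tags, or flag the input.
const stripTags = (text) => text.replace(/<\/?[a-zA-Z][^>]*>/g, "");
const hasHtmlOutsideCode = (markdown) =>
  splitSegments(markdown).some(
    (seg) => !seg.inCode && /<\/?[a-zA-Z]/.test(seg.text)
  );

const input = [
  "<span>some html tag outside code block</span>",
  "``` someLanguageCode",
  "<span>some html inside markdown code block</span>",
  "```",
].join("\n");

console.log(hasHtmlOutsideCode(input)); // true
console.log(transformSegments(input, stripTags));
```

Here the tags on the first line are stripped while the code block is passed through untouched. As in CommonMark, an unclosed fence is treated as running to the end of the input. Regex-based stripping is not a complete sanitizer, so it works best as a first filter rather than the only XSS defence.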