I'm in the middle of integrating MarkdownSharp, a server-side Markdown compilation library. I have that working, but now I need to sanitize the generated HTML.
I took a look at the Stack Exchange Data Explorer source code to see how they sanitize their HTML, and saw that they use the following regex to sanitize images post-conversion:
    private static readonly Regex _whitelist_img =
        new Regex(
            @"
            ^<img\s
            src=""https?://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+""
            (\swidth=""\d{1,3}"")?
            (\sheight=""\d{1,3}"")?
            (\salt=""[^""<>]*"")?
            (\stitle=""[^""<>]*"")?
            \s?/?>$",
            RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled |
            RegexOptions.IgnorePatternWhitespace);
I've been wrestling with how to write an analogous regex, whitelist_iframe, that ensures the iframe embeds a video from YouTube or Vimeo. The following are examples of what I'd like to embed:
<iframe width="560" height="315" src="//www.youtube.com/embed/IZ_ScEebDOM?rel=0" frameborder="0" allowfullscreen></iframe>
<iframe src="//player.vimeo.com/video/80825843?title=0&byline=0&portrait=0" width="500" height="281" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
So I believe the above needs to be modified to:
- Account for // instead of http or https
- Account for the </iframe> closing tag
- Account for //www.youtube.com or //player.vimeo.com being required at the beginning of the src attribute.
I'm in the middle of butchering this up as my first regex... any help with this would be much appreciated.
Note that I am not looking to introduce additional libraries or complexity here with a better overall approach; I just want to augment code that's already working, with regex.
Account for // instead of http or https
Remove the "https?:" from the existing regex:
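A minimal sketch of that change, isolated to the src attribute so it can be tried on its own. The class and field names are hypothetical. RegexOptions.IgnoreCase is an addition that is not in the question's options: the character class only lists a-z, while the sample YouTube ID (IZ_ScEebDOM) contains uppercase letters, so it would not match without it.

```csharp
using System.Text.RegularExpressions;

public static class SrcPatternSketch
{
    // Hypothetical name. Same character class as the img regex, but the
    // src value must now start with "//" rather than "http://" or "https://".
    public static readonly Regex ProtocolRelativeSrc = new Regex(
        @"^src=""//[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+""$",
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
}
```

With this pattern, src="//www.youtube.com/embed/IZ_ScEebDOM?rel=0" matches, while a value that begins with http:// does not.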
Account for closing tag
Add the closing tag at the end of the pattern, just before the $ anchor:
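As a sketch of one way to read this: the img pattern ends with \s?/?>$, which permits a self-closing slash. An iframe is not self-closing, so the tail could become \s?></iframe>$ instead. The names below are hypothetical and only the tail of the tag is checked.

```csharp
using System.Text.RegularExpressions;

public static class IframeTailSketch
{
    // Hypothetical name. Requires the tag to end with "></iframe>",
    // optionally preceded by one whitespace character.
    public static readonly Regex ClosingTail = new Regex(
        @"\s?></iframe>$",
        RegexOptions.ExplicitCapture);
}
```

Because of the $ anchor, anything that follows the closing tag (for example a trailing script element) causes the match to fail.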
Account for //www.youtube.com or //player.vimeo.com being required in the beginning of the src tag.
Add the desired domains in a selection list:
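Putting the three changes together, a hedged sketch of what whitelist_iframe might look like. The class name, the IsAllowed helper, and the handling of frameborder and the allowfullscreen variants are assumptions drawn from the two sample embeds, not from the Data Explorer source. The width and height groups appear on both sides of src because the YouTube sample places them before src and the Vimeo sample after. RegexOptions.IgnoreCase is added so the uppercase letters in the YouTube ID match the a-z character class.

```csharp
using System.Text.RegularExpressions;

public static class IframeWhitelistSketch
{
    // Hypothetical counterpart to _whitelist_img. Attribute order and the
    // optional attributes are assumptions based on the two sample embeds.
    private static readonly Regex _whitelist_iframe =
        new Regex(
            @"
            ^<iframe
            (\swidth=""\d{1,3}"")?
            (\sheight=""\d{1,3}"")?
            \ssrc=""//(www\.youtube\.com|player\.vimeo\.com)/[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+""
            (\swidth=""\d{1,3}"")?
            (\sheight=""\d{1,3}"")?
            (\sframeborder=""\d"")?
            (\s(webkit|moz)?allowfullscreen)*
            \s?></iframe>$",
            RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled |
            RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

    // Returns true only when the whole string is a single whitelisted iframe tag.
    public static bool IsAllowed(string tag)
    {
        return _whitelist_iframe.IsMatch(tag);
    }
}
```

The slash required right after the domain alternation means a host such as www.youtube.com.example.org is rejected, since the next character after the whitelisted domain must be / and not another dot.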