Given an HTML page fetched with
val html = scala.io.Source.fromURL("http://example.org/aPage.html").mkString
how can I extract the contents wrapped within a given tag? To illustrate, consider this HTML fragment and the tag <textarea>:
val html = """<p>Marginalia</p>
<textarea rows="3" cols="10">Contents of interest</textarea>
<p>More marginalia</p>"""
How can I obtain "Contents of interest"?
There are two easy ways to do this:
Scala XML
Add the Scala XML dependency to your project:
libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "1.0.3"
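A minimal sketch of the parse-and-select step, assuming the dependency above is on the classpath. The object and method names (`XmlExtract`, `extract`) are hypothetical. The fragment is wrapped in a made-up `<root>` element because `XML.loadString` expects a single root element; HTML that is not well-formed XML will make it throw a parse exception.

```scala
import scala.xml.XML

object XmlExtract {
  // The fragment from the question; triple quotes avoid escaping the attribute quotes.
  val html: String =
    """<p>Marginalia</p>
      |<textarea rows="3" cols="10">Contents of interest</textarea>
      |<p>More marginalia</p>""".stripMargin

  // Returns the text content of every element named `tag`, at any depth.
  // Assumption: the markup is well-formed XML once wrapped in a single root.
  def extract(markup: String, tag: String): List[String] = {
    val doc = XML.loadString(s"<root>$markup</root>") // hypothetical wrapper element
    (doc \\ tag).map(_.text).toList                    // `\\` searches all descendants
  }

  def main(args: Array[String]): Unit =
    extract(html, "textarea").foreach(println)
}
```

Because `\\` searches the whole tree, the same `doc` can be queried again for other tags (for example `extract(html, "p")`) without re-fetching the page.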
Now you can parse your HTML code and select all textarea tags. If your HTML is valid (that is, well-formed XML) and you want to run multiple XPath-like queries, this may be the better way. Also check out this blogpost for more info on what \\ means or how to use the Scala-XML library.
Regexp
Another simple way to do this is to define a regular expression and find the matches:
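A minimal sketch of that approach, with hypothetical names `RegexExtract` and `extract`. It assumes simple, predictable markup: no nested elements with the same tag name, no comments around the tag, and no `>` inside attribute values. A regular expression is not an HTML parser, so treat this as a quick solution for cases like the fragment in the question.

```scala
import scala.util.matching.Regex

object RegexExtract {
  val html: String =
    """<p>Marginalia</p>
      |<textarea rows="3" cols="10">Contents of interest</textarea>
      |<p>More marginalia</p>""".stripMargin

  // Builds a pattern for <tag ...>...</tag>:
  //   (?i)  tag names match case-insensitively, as in HTML
  //   (?s)  `.` also matches newlines, so multi-line contents are captured
  //   \b    stops "p" from matching "<pre>"
  //   .*?   is reluctant, so each match ends at the first closing tag
  def extract(markup: String, tag: String): List[String] = {
    val t = Regex.quote(tag) // treat the tag name literally
    val pattern = s"(?is)<$t\\b[^>]*>(.*?)</$t>".r
    pattern.findAllMatchIn(markup).map(_.group(1)).toList
  }

  def main(args: Array[String]): Unit =
    extract(html, "textarea").foreach(println)
}
```

Unlike the XML approach, this works on markup that is not well-formed XML, at the cost of silently returning wrong results when the assumptions above do not hold.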