So I am loading some remote content and need to use regex to isolate the content of some tags.
set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
xmlhttp.open "GET", url, false
xmlhttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
xmlhttp.setRequestHeader "Accept-Language", "en-us"
xmlhttp.send "x=hello"
status = xmlhttp.status
' Anything other than a clean 200 is reported as an error
if err.number <> 0 or status <> 200 then
    if status = 404 then
        Response.Write "[EFERROR]Page does not exist (404)."
    elseif status = 401 then
        Response.Write "[EFERROR]Access denied (401)."
    elseif status >= 500 and status < 600 then
        Response.Write "[EFERROR]500 Internal Server Error on remote site."
    else
        Response.Write "[EFERROR]Server is down or does not exist."
    end if
else
    ' Success: keep the raw HTML for parsing
    data = xmlhttp.responseText
end if
I basically need to get the content of the <title>Here is the title</title> tag,
as well as the meta description, keywords and some selected Open Graph metadata.
And finally I need to get the content of the first <h1>Heading</h1>
and <p>Paragraph</p>.
How can I parse the HTML data to get these things? Should I use regex?
Use the Mid function combined with the InStr function. I built a function which uses the Mid function to extract the tag-wrapped text by finding the position of each tag using the InStr function. When you run this function on a sample title tag, it will return
My Title
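A helper of the kind described above could be sketched as follows. The name GetTagText and its two parameters are assumptions made for illustration, not the original function from this answer. The sketch finds the opening tag with InStr, skips past any attributes, finds the closing tag, and cuts out the text in between with Mid.

```vbscript
' Hypothetical helper: returns the text between the first <tagName ...> and
' its closing </tagName>, or "" when no such tag pair is found.
Function GetTagText(html, tagName)
    Dim openPos, startPos, closePos, nextChar
    GetTagText = ""

    ' Find the opening tag, case-insensitively. The character after the tag
    ' name must be ">" or a space, so that "p" does not match "<pre>".
    ' (A tab or newline after the tag name is not handled in this sketch.)
    openPos = 0
    Do
        openPos = InStr(openPos + 1, html, "<" & tagName, vbTextCompare)
        If openPos = 0 Then Exit Function
        nextChar = Mid(html, openPos + Len(tagName) + 1, 1)
    Loop Until nextChar = ">" Or nextChar = " "

    ' Skip to the end of the opening tag, which may carry attributes
    startPos = InStr(openPos, html, ">")
    If startPos = 0 Then Exit Function
    startPos = startPos + 1

    ' Find the closing tag that follows
    closePos = InStr(startPos, html, "</" & tagName & ">", vbTextCompare)
    If closePos = 0 Then Exit Function

    ' Cut out everything between the two tags
    GetTagText = Mid(html, startPos, closePos - startPos)
End Function
```

With this sketch, GetTagText("<title>My Title</title>", "title") evaluates to My Title, and GetTagText(data, "h1") would return the content of the first h1 element in the downloaded page. Nested tags of the same name are not handled.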
In your case, you would run it against your response data: one call will get the content in your <title> tag, and another will get the content in your <h1> tag. The InStr function returns the position of the first occurrence found in the data, so this function will do exactly what you need.
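For the meta description, keywords and Open Graph values mentioned in the question, the value sits in an attribute rather than between two tags, so the Mid/InStr approach does not apply directly. One possible alternative is the built-in VBScript RegExp object. The sketch below is only an illustration: GetMetaContent is a hypothetical name, and it assumes double-quoted attribute values, that the name or property attribute appears before content, and that metaName contains no regex metacharacters.

```vbscript
' Hypothetical helper: returns the content attribute of the first <meta> tag
' whose name or property attribute equals metaName, or "" if none matches.
Function GetMetaContent(html, metaName)
    Dim re, matches
    GetMetaContent = ""

    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = False   ' only the first match is needed

    ' Assumes: name/property before content, double-quoted values,
    ' and a metaName without regex special characters
    re.Pattern = "<meta[^>]*(?:name|property)\s*=\s*""" & metaName & _
                 """[^>]*content\s*=\s*""([^""]*)"""

    Set matches = re.Execute(html)
    If matches.Count > 0 Then
        GetMetaContent = matches(0).SubMatches(0)
    End If
End Function
```

Under those assumptions, GetMetaContent(data, "description") would return the meta description and GetMetaContent(data, "og:title") the Open Graph title. Pages that put content first or use single quotes would need a looser pattern or a real HTML parser.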