Rebol parsing HTML: error "title has no value"

I'm trying to parse an HTML page:

url: https://dzone.com/articles/2-entity-framework-alternatives-or-give-me-data
html: read url

parse html [
    to {<h1 class="article-title" itemprop="headline">}
    thru {<h1 class="article-title" itemprop="headline">} copy title to {</h1>}
]

probe title

I can't see why it doesn't work: I get the error "title has no value".

There are 2 best solutions below

Best answer

I assume that you're using Rebol/View, since the free versions don't support HTTPS, though Rebol 3 does.

If you want to check whether a rule is working, look at the return value of parse. Here it is false, which means there's a problem with your parse rule. Anyway, the following works for me. Note that the braces around the tags are not necessary, since < and > already delimit a tag! value, which parse matches like a string.

>> parse html [
    thru <h1 class="article-title" itemprop="headline">
    thru <h1 class="article-title" itemprop="headline">
    copy title to </h1> to end
]
== true

>> trim/head/tail title
== "2 Entity Framework Alternatives (or Give Me Data!)"
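
As a small offline illustration of checking the return value, here is a minimal sketch that branches on what parse returns instead of probing the word afterwards. The sample string and the word heading are made up for the example; they stand in for the downloaded page and for title.

```rebol
; Hypothetical stand-in for the downloaded page (not the real dzone.com markup)
sample: {<p>intro</p><h1 class="t">Hello</h1><p>rest</p>}

; parse returns true only if the whole rule matches and the end of input is reached,
; so heading is used only in the branch where it is known to be set
either parse sample [thru <h1 class="t"> copy heading to </h1> to end] [
    print ["found:" heading]
][
    print "rule did not match"
]
```

With this sample the first branch runs and prints found: Hello. If the tag in the rule is changed to one that does not occur in the string, the second branch runs and heading is never touched.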

Another answer

It most probably does not work because the first to stops before the matched string, so the thru that follows starts at the beginning of the first occurrence of <h1 ...>, not at the second as you might have expected. You need to skip the first occurrence before searching for the second one. You can achieve that with two thru rules, as shown in the other answer, or repeat the rule twice to avoid duplicating it:

parse html [
    2 thru <h1 class="article-title" itemprop="headline">
    copy title to </h1> to end
]
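
The difference between to and thru is easier to see on a tiny string. This is only a sketch: sample and rest are hypothetical names, and "X" plays the role of the <h1 ...> tag.

```rebol
sample: "aXbXc"

; to stops just before the match, so the match itself is still ahead
parse sample [to "X" copy rest to end]
probe rest    ; "XbXc"

; thru stops just after the match
parse sample [thru "X" copy rest to end]
probe rest    ; "bXc"

; to followed by thru consumes only the first occurrence
parse sample [to "X" thru "X" copy rest to end]
probe rest    ; "bXc"

; 2 thru skips two occurrences
parse sample [2 thru "X" copy rest to end]
probe rest    ; "c"
```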

Notice the final to end rule, which makes parse return true if your rules succeed in reaching the end of the input. It is a placeholder rule: you do not care about what follows </h1>, but you still want to reach the end of the input.
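
To show what to end changes, here is a minimal sketch on a made-up string (sample and heading are hypothetical names). The copy happens in both cases; only the value returned by parse differs.

```rebol
sample: {<h1>Hi</h1> trailing text}

; Without to end the rule stops after the copy, before the end of the input
probe parse sample [thru <h1> copy heading to </h1>]           ; false
probe heading                                                  ; "Hi"

; With to end the rest of the input is consumed as well
probe parse sample [thru <h1> copy heading to </h1> to end]    ; true
```

So a false result does not always mean that nothing was captured, which is why it helps to end the rule with to end when you rely on the return value.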

EDIT: The code you submitted works fine here, unchanged. The edit to your question has actually fixed the cause of the error; I can reproduce your issue with your original code.