Yahoo Pipes and Website Name


How do I fetch the page name with Yahoo Pipes?

I'm making a news/blog aggregator and need to know the name of the site each item comes from (BBC, CNN, Fox, etc.).

Do I need to use regex for this?

Can anyone help?


I found this sample pipe: http://pipes.yahoo.com/pipes/pipe.info?_id=69b5dce1c59501a0c64a660c1cfdb856. The page title it produces includes the name of the site too. I'm not sure if this is what you are looking for.


You can fetch the page using the XPath Fetch Page or Fetch Feed modules in the Sources menu, and possibly with other modules as well.

After that, you can extract the page name itself with the various operators (Regex, for example), depending on the source page you are using and the output you want to get.
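Pipes is a visual editor, so there is no code to show for a Regex step, but the idea can be sketched in Python. This is a minimal sketch, assuming the short site name you want (bbc, cnn, ...) is the first label of each item's link hostname after an optional "www." prefix. The pattern and the function name `site_name_from_link` are illustrative only, not part of Yahoo Pipes.

```python
import re
from typing import Optional

# Captures the first hostname label after an optional "www." prefix.
# Heuristic assumption: that label is a usable short site name
# (e.g. "bbc" for www.bbc.co.uk). A host like feeds.example.com would yield "feeds".
SITE_RE = re.compile(r"^https?://(?:www\.)?([^./:]+)", re.IGNORECASE)


def site_name_from_link(link: str) -> Optional[str]:
    """Return a lowercase short site name, or None if link is not an http(s) URL."""
    match = SITE_RE.match(link)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    for link in [
        "https://www.bbc.co.uk/news/world",
        "https://www.cnn.com/world",
        "https://www.foxnews.com/politics",
    ]:
        print(link, "->", site_name_from_link(link))
```

Because feed and article hosts vary (subdomains such as "edition." or "feeds."), a small lookup table from hostname to display name may be more robust than a single pattern.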

In general, your question is too broad and difficult to answer. To get you started, I created an example pipe that extracts the title of your question from this post, which is basically the "page name" of the current page.

http://pipes.yahoo.com/pipes/pipe.info?_id=668acf3f807c30d7b75f12459edd3252

I used the XPath Fetch Page module with these parameters:

  • URL = this page
  • Extract using XPath = //div[@id="question-header"]
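The same extraction can be approximated with Python's standard-library ElementTree, which supports a limited XPath subset including `[@id='...']` predicates. This is a sketch under stated assumptions: the `PAGE` markup below is a hypothetical, simplified stand-in for the real page, and it must be well-formed XML. Real-world HTML usually is not, so a lenient HTML parser would be needed in practice.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, simplified stand-in for the fetched page (well-formed XHTML).
PAGE = """
<html>
  <body>
    <div id="question-header">
      <h1><a href="https://example.com/questions/yahoo-pipes-and-website-name">Yahoo Pipes and Website Name</a></h1>
    </div>
    <div id="content">...</div>
  </body>
</html>
"""


def extract_container(page: str, element_id: str) -> Optional[ET.Element]:
    """Return the first div whose id attribute equals element_id, or None."""
    root = ET.fromstring(page)
    # ElementTree's equivalent of //div[@id="..."]: search all descendants.
    return root.find(f".//div[@id='{element_id}']")


if __name__ == "__main__":
    container = extract_container(PAGE, "question-header")
    if container is not None:
        print(ET.tostring(container, encoding="unicode"))
```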

I got that div path by inspecting the source code of this page, where I saw that div#question-header is the container of a question. I could have selected a deeper, inner container or a higher-level one. It all depends on how much other information you need: the more information you want to use from the page, the higher-level the container you should select.

Next, I used the Create RSS operator to create a proper RSS feed, with parameters:

  • Title = h1.a
  • Link = h1.a.href

I chose these elements because, in the container I extracted with XPath, the page name is inside h1 a (the a element within the h1). In Yahoo Pipes, a dot is used as the path separator.
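To illustrate how dot paths like h1.a and h1.a.href map onto the extracted container, here is a rough Python sketch of the Create RSS step. The `resolve` and `build_rss` helpers are hypothetical. In particular, treating a final segment that is not a child element as an attribute name is an assumption chosen to mirror how h1.a.href is used above, not a documented Pipes rule. The channel values are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical container, shaped like the div#question-header extracted via XPath.
CONTAINER = ET.fromstring(
    '<div id="question-header">'
    '<h1><a href="https://example.com/questions/yahoo-pipes-and-website-name">'
    "Yahoo Pipes and Website Name</a></h1>"
    "</div>"
)


def resolve(element: ET.Element, dot_path: str) -> Optional[str]:
    """Follow a dot path such as "h1.a" (element text) or "h1.a.href" (attribute).

    Each segment names a direct child element. If the last segment is not a
    child element, it is treated as an attribute name (assumption).
    """
    current = element
    segments = dot_path.split(".")
    for i, segment in enumerate(segments):
        child = current.find(segment)
        if child is not None:
            current = child
        elif i == len(segments) - 1:
            return current.get(segment)  # None if the attribute is absent
        else:
            return None  # an intermediate element is missing
    return current.text


def build_rss(title: str, link: str) -> str:
    """Wrap a single item in a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel; placeholders here.
    ET.SubElement(channel, "title").text = "Extracted page names"
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Sketch of the Create RSS step"
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    title = resolve(CONTAINER, "h1.a")
    link = resolve(CONTAINER, "h1.a.href")
    if title and link:
        print(build_rss(title, link))
```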