Extract a content of a html page in php

8.7k Views Asked by At

There is any way to extract the content of a HTML page that starts from <body> and ends with </body> in php. If there can anyone post some sample code.

3

There are 3 best solutions below

0
On

Try PHP Simple HTML DOM Parser

$html = file_get_html('http://www.example.com/');
$body = $html->find('body');
0
On

You should have a look at the DOMDocument reference.

This example reads a html document, creates a DOMDocument and gets the body tag:

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://example.com');
libxml_use_internal_errors(false);

$body = $dom->getElementsByTagName('body')->item(0);

echo $body->textContent; // print all the text content in the body

You should also check out the following resources:

DOM API Documentation
XPATH language specification

0
On

You can also try to use non-DOM solution based on strpos function:

$html = file_get_contents($url);
$html = substr($html,stripos($html,'<body>')+6);
$html = substr($html,0,strripos($html,'</body>'));

stripos is case insensitive version of strpos, strripos is case insensitive 'rightmost position' version of strpos.

Hope that it will help you!