Trying to crawl a website with some redirects


I want to scrape information from a website, but I can't access its content until I check a checkbox.

So in order to access my data I have to:

  • connect to the website URL
  • find the form and check a checkbox
  • submit the form and go to a new link
  • click on a new button
  • access my data.

I don't know whether this is possible or easy to do, because I have never scraped anything. (To be clear, this is totally legal and I am not trying to access confidential data.)

Here is my PHP script. I'm using Symfony DomCrawler and GuzzleHttp.

// Imports and display errors etc...
use Symfony\Component\DomCrawler\Crawler;

$client = new \GuzzleHttp\Client();
$response = $client->get("website.com");
$htmlString = (string) $response->getBody(); // getBody() returns a stream, so cast it to a string

$crawler = new Crawler($htmlString,'website.com');
// I'm passing the website address a second time because, when I only give it to Guzzle, the program displays an error about a relative URL or something like that.


// Select the input checkbox
$checkbox = $crawler->filter('#condition')->first();
// I tried $checkbox->attr('checked','checked'); here, as ChatGPT suggested, but it didn't work
var_dump($checkbox->attr('checked')); // Here the value is NULL
// So I think I'm making a mistake here, because the checkbox's 'checked' attribute is NULL


$form = $crawler->filter('form')->last()->form(); // Select the form

$actionUri = $form->getUri();
echo $actionUri; // This is the next URL
$client->post($actionUri, [
    'form_params' => $form->getValues(),
    'allow_redirects' => [
        'max' => 10, // maximum number of redirects to follow
        'strict' => true, // whether to apply strict RFC 2616 protocol redirect rules
        'referer' => true, // whether to add a Referer header
        'protocols' => ['http', 'https'], // allowed redirect protocols
        'track_redirects' => true // whether to record each redirect URI in the X-Guzzle-Redirect-History response header
    ]
]);
// After this request, I don't know how I am supposed to continue on to the next page

I also tried requesting the second URL directly, like so:

// the script above, plus:
$url = 'SecondStep.com';
$nextCrawler = new Crawler('',$url);
// but this URL seems to redirect me back to the first URL

So I don't know what I'm supposed to do.

Sorry for my terrible English.

Conclusion: I want to check a checkbox input and go to the next URL after clicking the submit button.
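
For reference, here is a minimal sketch of one way this flow might be approached. It is not a confirmed solution. It assumes the checkbox has a name attribute (the hypothetical field name "condition" is a guess based on the #condition id), that the site remembers the accepted form through a session cookie, and that Composer's autoloader sits next to the script. The function names buildAcceptedForm and fetchAfterAccepting and the example.com URLs are made up for illustration. The idea is to tick the checkbox through the form object rather than through attr(), and to reuse one Guzzle client with a cookie jar so that the second request belongs to the same session.

```php
<?php
// Sketch only: the field name, the URLs and the autoload path are assumptions.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use Symfony\Component\DomCrawler\Crawler;

/**
 * Ticks the named checkbox on the last form of the page and returns
 * [HTTP method, absolute action URI, values to submit].
 * $pageUri must be an absolute URL (with scheme) so the action can be resolved.
 */
function buildAcceptedForm(string $html, string $pageUri, string $checkboxName): array
{
    $crawler = new Crawler($html, $pageUri);
    $form = $crawler->filter('form')->last()->form();

    // Form fields are addressed by their name attribute, not by their id.
    $form[$checkboxName]->tick();

    return [$form->getMethod(), $form->getUri(), $form->getPhpValues()];
}

/**
 * Loads the first page, submits the form with the checkbox ticked,
 * then requests the second page with the same client (and cookies).
 */
function fetchAfterAccepting(Client $client, string $firstUrl, string $secondUrl, string $checkboxName): string
{
    $html = (string) $client->get($firstUrl)->getBody();
    [$method, $action, $values] = buildAcceptedForm($html, $firstUrl, $checkboxName);

    // For GET forms, getUri() already carries the values in the query string.
    $options = $method === 'GET' ? [] : ['form_params' => $values];
    $client->request($method, $action, $options);

    return (string) $client->get($secondUrl)->getBody();
}
```

It could be called with a client created as new Client(['cookies' => true]), for example fetchAfterAccepting($client, 'https://example.com/', 'https://example.com/second-step', 'condition'). If the second page is only reachable through a button on the page returned by the form submission, the same pattern (build a Crawler from the response body, select the form, submit it) would presumably have to be repeated on that response.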
