I want to scrape information from a website, but I can't access it until I check a checkbox.
So to get to my data I have to:
- connect to the website URL
- find the form and check a checkbox
- submit the form and follow it to a new page
- click on a new button
- access my data.
I don't know whether this is possible or easy, because I have never scraped anything. (To be clear, this is completely legal and I am not trying to access confidential data.)
Here is my PHP script. I'm using Symfony DomCrawler and GuzzleHttp.
// Imports and display errors etc...
use Symfony\Component\DomCrawler\Crawler;
$client = new \GuzzleHttp\Client();
$response = $client->get("website.com");
$htmlString = (string) $response->getBody(); // getBody() returns a stream object, so cast it to a string
$crawler = new Crawler($htmlString,'website.com');
// I pass the website address a second time because, when I only give it to Guzzle, the program shows an error about a relative URL (or something like that).
// Select the input checkbox
$checkbox = $crawler->filter('#condition')->first();
// I tried $checkbox->attr('checked', 'checked') here, as ChatGPT suggested, but it didn't work
var_dump($checkbox->attr('checked')); // the value here is NULL
// So I think my mistake is here, because the checkbox's 'checked' attribute is NULL
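A minimal sketch of one way the box could be ticked, under a few assumptions. `Crawler::attr()` only reads an attribute, so it cannot change the checkbox, and a `NULL` result simply means the box is unchecked in the HTML the server sent. DomCrawler's `Form` object exposes each field by its `name`, and a checkbox field has a `tick()` method. The helper name `tickedFormValues`, the sample HTML, and the field name `condition` are hypothetical: the real page only shows that the checkbox's id is `condition`, and its `name` attribute may be different.

```php
<?php
require 'vendor/autoload.php'; // Composer autoloader (symfony/dom-crawler, symfony/css-selector)

use Symfony\Component\DomCrawler\Crawler;

/**
 * Tick a checkbox in the last form of a page and return the values
 * that the form would submit. Hypothetical helper, for illustration only.
 */
function tickedFormValues(string $html, string $pageUri, string $checkboxName): array
{
    $crawler = new Crawler($html, $pageUri);
    $form = $crawler->filter('form')->last()->form();

    // Fields are addressed by their "name" attribute, not by their id.
    $form[$checkboxName]->tick();

    return $form->getValues();
}

// Stand-in HTML; the real page's markup is not known.
$html = '<html><body><form action="/next" method="post">'
      . '<input type="checkbox" id="condition" name="condition" value="1">'
      . '<input type="submit" value="OK">'
      . '</form></body></html>';

$values = tickedFormValues($html, 'https://website.com/', 'condition');
var_dump($values);
```

The returned array is what would then go into Guzzle's `form_params` option, so the ticked box is sent along with the rest of the form.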
$form = $crawler->filter('form')->last()->form(); // Select the form
$actionUri = $form->getUri();
echo $actionUri; // this is the next URL
$client->post($actionUri, [
    'form_params' => $form->getValues(),
    'allow_redirects' => [
        'max' => 10, // maximum number of redirects to follow
        'strict' => true, // strict RFC redirects: a redirected POST is sent again as a POST
        'referer' => true, // whether to add a Referer header
        'protocols' => ['http', 'https'], // allowed redirect protocols
        'track_redirects' => true, // record each redirect URI in the X-Guzzle-Redirect-History header
    ],
]);
// After this script, I don't know how I am supposed to continue on to the next page
I also tried loading the second page directly, like an ordinary URL:
// the script above, plus:
$url = 'SecondStep.com';
$nextCrawler = new Crawler('',$url);
// but this URL seems to redirect me back to the first URL
So I don't know what I'm supposed to do next.
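A hedged sketch of how the next step might look, not a confirmed fix for this site. Two things are certain: `new Crawler('', $url)` does not download anything, it only builds an empty crawler, and the page returned by the POST is available from the response that `$client->post()` returns. The rest is assumption: the redirect back to the first page may happen because the site remembers the accepted checkbox in a cookie, in which case a client created with `'cookies' => true` (one shared cookie jar) would carry it over. The helper name `submitForm` is hypothetical, and the form is assumed to use POST. The sketch also leaves `strict` at its default, so a redirected POST is followed with a GET, as browsers do.

```php
<?php
require 'vendor/autoload.php'; // Composer autoloader (guzzlehttp/guzzle, symfony/dom-crawler)

use GuzzleHttp\Client;
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\DomCrawler\Form;

/**
 * POST an already prepared form and return a Crawler for the page the
 * server answers with. Hypothetical helper; assumes the form uses POST.
 */
function submitForm(Client $client, Form $form): Crawler
{
    $response = $client->post($form->getUri(), [
        'form_params' => $form->getPhpValues(),
        'allow_redirects' => ['track_redirects' => true],
    ]);

    // With track_redirects, Guzzle lists every followed redirect in this header.
    $history = $response->getHeader('X-Guzzle-Redirect-History');
    $finalUri = $history ? end($history) : $form->getUri();

    // Build the next crawler from the response body, not from an empty string.
    return new Crawler((string) $response->getBody(), $finalUri);
}

// Usage outline (placeholder URL; 'condition' is an assumed field name):
// $client = new Client(['cookies' => true]); // one cookie jar for every request
// $first  = new Crawler((string) $client->get('https://website.com/')->getBody(), 'https://website.com/');
// $form   = $first->filter('form')->last()->form();
// $form['condition']->tick();
// $second = submitForm($client, $form);
```

The returned crawler can then be searched for the next button or link (for example with `selectButton()` or `selectLink()`), and the same client reused for that request so any cookies stay in place.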
Sorry for my terrible English.
In short: I want to check a checkbox input and then move on to the next URL after clicking the submit button.