C# web scraping with HtmlAgilityPack: can't move to the next page after logging in?

I have a program that can successfully log in and receive cookies. However, when I try to move to the next page after logging in, the HTML doesn't change, so I assume I haven't actually reached the next page.

I have pasted the code below. Because I never reach the next HTML page, I never hit the select element with id "ProviderId_hidden" that my program needs in order to fill out the input box. Is there something semantically wrong with my code? (Yes, I know this code is basically spaghetti at the moment.) The problem happens right before or during my ParseAndSelectFirstOption method.
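A side note on ParseAndSelectFirstOption: calling SetAttributeValue("selected", "selected") only changes HtmlAgilityPack's in-memory copy of the page; nothing is sent back to the server. A browser submits a select element's chosen option value as a form field, so a scraper would normally read that value and include it in its next POST. The sketch below only extracts the first option's value attribute, assuming the drop-down keeps the id "ProviderId_hidden" used in the question's XPath; the class and method names are hypothetical, and the form field name to post the value under is not known here because the third page's HTML is only available as an image.

```csharp
using HtmlAgilityPack;

#nullable enable

static class ProviderSelect
{
    // Hypothetical helper: returns the value attribute of the first <option> in
    // <select id="ProviderId_hidden">, or null if the select or option is missing.
    // It only reads the page; submitting the choice still requires a POST.
    public static string? GetFirstOptionValue(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode? select = doc.DocumentNode.SelectSingleNode("//select[@id='ProviderId_hidden']");
        HtmlNode? firstOption = select?.SelectSingleNode("option");

        // Browsers fall back to the option's text when it has no value attribute;
        // this sketch does not handle that case and returns null instead.
        return firstOption?.Attributes["value"]?.Value;
    }
}
```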

I log the HTML after logging in and "moving" to the next page, and the content is exactly the same. Is something blocking me?
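Comparing logged HTML by eye is easy to get wrong. A login form that re-renders itself usually comes back with a 2xx status too, so IsSuccessStatusCode and the presence of cookies do not by themselves prove the credentials were accepted. A minimal sketch of a more explicit check, assuming the login page is the only page containing the username text box named ctl00$cphMain$UserIdTextBox (taken from the login HTML in this question); LoginCheck and LooksLikeLoginPage are hypothetical names.

```csharp
using HtmlAgilityPack;

static class LoginCheck
{
    // Hypothetical helper: true if the HTML still contains the username text box
    // from the myHFS login form, i.e. the server sent the login page back.
    public static bool LooksLikeLoginPage(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode
                  .SelectSingleNode("//input[@name='ctl00$cphMain$UserIdTextBox']") != null;
    }
}
```

For example, `LoginCheck.LooksLikeLoginPage(await loginResponse.Content.ReadAsStringAsync())` returning true right after the POST would suggest the login itself did not go through, independent of the later GET.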

Code:

using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;

class Program
{
    static void ParseAndSelectFirstOption(string htmlContent)
    {
        // Create a new HtmlDocument and load the HTML content
        HtmlDocument document = new HtmlDocument();
        Console.WriteLine(htmlContent);
        document.LoadHtml(htmlContent);

        // Find the select element representing the drop-down menu
        HtmlNode selectElement = document.DocumentNode.SelectSingleNode("//select[@id='ProviderId_hidden']");

        if (selectElement != null)
        {
            // Find the first option within the select element
            HtmlNode firstOption = selectElement.SelectSingleNode("option");

            if (firstOption != null)
            {
                // Set the 'selected' attribute of the first option to 'selected'
                firstOption.SetAttributeValue("selected", "selected");
                Console.WriteLine("First option selected");
            }
            else
            {
                Console.WriteLine("No options found in the drop-down menu.");
            }
        }
        else
        {
            Console.WriteLine("Drop-down menu not found.");
        }

    }
    static async Task Main(string[] args)
    {
        // URL of the login page
        string loginUrl = "https://medi.hfs.illinois.gov/IdentityGuardAuth/IdentityGuardLogin.aspx?IGDest=https://medi.hfs.illinois.gov/medi/mlogin.do";

        // Credentials for logging in
        string username = "xxx";
        string password = "xxx";

        // Create an instance of HttpClientHandler to manage cookies
        var handler = new HttpClientHandler
        {
            CookieContainer = new CookieContainer(),
            UseCookies = true
        };

        // Create an instance of HttpClient with the handler
        using HttpClient httpClient = new HttpClient(handler);

        // Prepare form data for login
        var formData = new Dictionary<string, string>
        {
            { "ctl00$cphMain$UserIdTextBox", username },
            { "ctl00$cphMain$PasswordTextBox", password }
        };

        // Encode form data
        var encodedFormData = new FormUrlEncodedContent(formData);

        // Send POST request to login endpoint
        HttpResponseMessage loginResponse = await httpClient.PostAsync(loginUrl, encodedFormData);

        // IsSuccessStatusCode only means a 2xx status (redirects are followed automatically); it does not prove the credentials were accepted
        if (loginResponse.IsSuccessStatusCode)
        {
            var responseCookies = handler.CookieContainer.GetCookies(new Uri(loginUrl));
            if (responseCookies.Count > 0)
            {
                Console.WriteLine("Received cookies:");
                foreach (Cookie cookie in responseCookies)
                {
                    Console.WriteLine($"{cookie.Name}: {cookie.Value}");
                }
                Console.WriteLine("Login Success");
                // URL of the page you want to navigate to after logging in
                string nextPageUrl = "https://medi.hfs.illinois.gov/iec/login.do";
                // Redundant: with UseCookies = true the handler already attaches the cookies stored in CookieContainer to each request
                foreach (Cookie cookie in responseCookies)
                {
                    httpClient.DefaultRequestHeaders.Add("Cookie", $"{cookie.Name}={cookie.Value}");
                }
                foreach (var header in httpClient.DefaultRequestHeaders)
                {
                    Console.WriteLine($"{header.Key}: {string.Join(", ", header.Value)}");
                }
                // Send a GET request to the next page URL
                HttpResponseMessage nextPageResponse = await httpClient.GetAsync(nextPageUrl);
                // Check if navigation was successful (status code 200)
                if (nextPageResponse.IsSuccessStatusCode)
                {
                    // Process the content of the next page if needed
                    string nextPageContent = await nextPageResponse.Content.ReadAsStringAsync();
                    Console.WriteLine("Navigation to the next page succeeded");
                    // nextPageContent holds the HTML of the page that was actually returned
                    Console.WriteLine(nextPageContent);

                    // Parse the HTML content and interact with the drop-down menu
                    ParseAndSelectFirstOption(nextPageContent);
                }
                else
                {
                    Console.WriteLine("Navigation to the next page failed. Status code: " + nextPageResponse.StatusCode);
                }
            }
            else
            {
                Console.WriteLine("No cookies received. Login may have failed.");
            }
        }
        else
        {
            Console.WriteLine($"Login failed. Status code: {loginResponse.StatusCode}");
        }
    }
}
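The login page HTML in this question looks like an ASP.NET WebForms page: it contains __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION hidden inputs and a submit button named ctl00$cphMain$LoginButton, while the POST in the code above sends only the username and password. WebForms pages typically expect those hidden fields, plus the clicked button's name and value, before running the server-side click handler; without them the server may simply render the login page again, which would match the unchanged HTML described above. The following is a minimal sketch under that assumption, not a confirmed fix: it GETs the login page first to obtain fresh hidden-field values, posts them back to loginUrl as the original code does, and relies on the CookieContainer instead of a manual Cookie header. MyHfsLoginSketch, BuildLoginForm and LoginAndFetchAsync are hypothetical names; the field names come from the pasted HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

static class MyHfsLoginSketch
{
    // Copies every <input type="hidden"> on the login page (e.g. __VIEWSTATE,
    // __VIEWSTATEGENERATOR, __EVENTVALIDATION), then adds the credentials and the
    // Login button's name/value, roughly what a browser posts on "Login".
    public static Dictionary<string, string> BuildLoginForm(string loginPageHtml, string username, string password)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(loginPageHtml);

        var fields = new Dictionary<string, string>();
        var hiddenInputs = doc.DocumentNode.SelectNodes("//input[@type='hidden']");
        if (hiddenInputs != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode input in hiddenInputs)
            {
                string name = input.GetAttributeValue("name", "");
                if (name.Length > 0)
                {
                    fields[name] = HtmlEntity.DeEntitize(input.GetAttributeValue("value", ""));
                }
            }
        }

        fields["ctl00$cphMain$UserIdTextBox"] = username;
        fields["ctl00$cphMain$PasswordTextBox"] = password;
        fields["ctl00$cphMain$LoginButton"] = "Login";
        return fields;
    }

    // GET the login page, POST it back with the hidden fields, then request the
    // next page with the same HttpClient so the same CookieContainer is reused.
    public static async Task<string> LoginAndFetchAsync(string loginUrl, string nextPageUrl, string username, string password)
    {
        // UseCookies + CookieContainer store and resend cookies automatically;
        // no manual "Cookie" header is added.
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer(), UseCookies = true };
        using var httpClient = new HttpClient(handler);

        string loginPageHtml = await httpClient.GetStringAsync(loginUrl);

        using var content = new FormUrlEncodedContent(BuildLoginForm(loginPageHtml, username, password));
        using HttpResponseMessage loginResponse = await httpClient.PostAsync(loginUrl, content);
        loginResponse.EnsureSuccessStatusCode(); // throws on non-2xx; a 2xx may still be the login page again

        return await httpClient.GetStringAsync(nextPageUrl);
    }
}
```

Called as `string html = await MyHfsLoginSketch.LoginAndFetchAsync(loginUrl, nextPageUrl, username, password);`, the returned HTML could then be passed to ParseAndSelectFirstOption. Keeping BuildLoginForm separate from the network code makes it possible to check the posted fields offline against a saved copy of the login page.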

The HTML of the first page is below (the second page's HTML is very similar). The third page is the one I can't reach.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head><title>
        myHFS Login
</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <script LANGUAGE="JavaScript" src="includes/scripts/menu.js"></script>
    <script LANGUAGE="JavaScript" src="includes/scripts/reload.js"></script>
    <link rel="stylesheet" href="styles/style_myhfs.css" type="text/css" />
                <link rel="stylesheet" href="./css/passwordvalidation.css" type="text/css"  />   
                
        
                <script type="text/javascript" src="https://webservices.illinois.gov/CDN/jQuery/3.6.0/jquery-3.6.0.slim.min.js"></script>
                <script type="text/javascript" src="./Scripts/js/passwordvalidation.js"></script>
                
        

    <script language="Javascript" type="text/javascript" src="Scripts/utilities.js"></script>
</head>

<body id="ctl00_pageBody" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0" onunload="initFocus();" onload="initFocus();">

<table width="100%" border="0" cellpadding="0" cellspacing="0">
        <tr> 
                <td class="tmpl_headerbackgroundcolor"  height="60" rowspan="2" align="left" valign="bottom" width="275">
                        <img src="images/hd_montage_hfs.gif" width="275" height="60" border=0 alt="Illinois Department of Healthcare and Family Services">
                </td>
                <td class="tmpl_headerbackgroundcolor"  height="60" rowspan="2" align="left" valign="bottom" width="275"> 
                        <!-- Change the url to point to your site homepage Example URL Format: http://www.state.il.us -->
                        <img src="images/hd_title_hfs.gif" alt="Illinois Department of Healthcare and Family Services" width="275" height="60" border=0>
                </td>
                <td class="tmpl_headerbackgroundcolor"  height="60" rowspan="2" align="left" valign="middle" width="100%">&nbsp;</td>
                <td class="tmpl_headerbackgroundcolor" height="40" width="255" colspan="2" align="right" valign="middle">
                        <img src="images/blank.gif" width="225" height="1" alt="">
                        <br />                  
                        <a href="http://www.myhfs.illinois.gov" target="_parent">
                                <img src="images/hd_url_myhfs.gif" alt="www.myhfs.illinois.gov" border="0" width="205" height="20">
                        </a> 
                </td>
        </tr>
        <tr> 
                <td class="tmpl_headerbackgroundcolor" align="right" valign="bottom" height="20" width="20">
                        <img src="images/hd_tabtriangle_hfs.gif" border="0" width="20" height="20" alt="">
                </td>
                <td class="tmpl_tabbackgroundcolor" align="center" valign="bottom" width="205" height="1">
                        <img src="images/blank.gif" alt="" width="205" height="2" border="0" />
                        <a href="http://www.illinois.gov/gov/" class="tmpl_Governor">JB Pritzker, Governor</a>
                </td>
        </tr>
</table>
 
<table width="100%" border="0" cellspacing="0" cellpadding="0" >
  <tr> 
    <td class="tmpl_tabbackgroundcolor"><img src="images/blank.gif" width="5" height="8" ></td>
  </tr>
</table>

<table class="tmpl_brdrclr" width="100%" border="0" cellspacing="1" cellpadding="1" height="85%">
  <tr> 
    <td height="398" width="125" class="tmpl_sidebackground" valign="top">      
        <script language="Javascript"> 
 function openwindow() 
 { 
      win = window.open("https://autora01.illinois.gov/sub_agree.htm","PKI_Registration","toolbar=no,location=no,directories=no,status=no,menubar=no,scrollbars=yes,resizable=no,copyhistory=yes,width=600,height=450,left=20,top=10");
      win.focus();
 } 
 
function openrecovery()
{
       win1 = window.open("http://www2.illinois.gov/PKI/Pages/forgotpassword.html","PKI_Recovery","toolbar=no,location=no,directories=no,status=no,menubar=no,scrollbars=yes,resizable=no,copyhistory=yes,width=600,height=450,left=20,top=10");
         win1.focus();
}
        </script>
 
<table width="100%" border="0" cellspacing="1" cellpadding="1" class="tmpl_brdrclr">
  <tr> 
    <td class="tmpl_tabbackgroundcolor"> 
      <h1 class="tmpl_h1">
                <img src="images/blank.gif" alt="" border="0" width="1">
                <img src="images/blank.gif" alt="" border="0" width="1">
                        myHFS
                <!-- skip to content link for accessibility purposes -->
                <a href="#content"><img src="images/blank.gif" alt="Skip to Content" border="0" width="1"></a>
                <!-- skip to "state links" link for accessibility purposes -->
                <a href="#state"><img src="images/blank.gif" alt="Skip to State Links" border="0" width="1" height="1"></a>
                </h1> 
    </td>
  </tr>
 
  <!-- Start myHFS Links -->
 
  <tr> 
    <td class="tmpl_menu" onMouseOver="this.className='tmpl_menuover';" onClick="self.location='/medi/mlogin.do'" onMouseOut="this.className='tmpl_menu';" width="100%" height="10"> 
      <a href="/medi/mlogin.do" class="tmpl_menulinks" target="_parent">Login</a> 
    </td>
  </tr>
 
    <tr> 
    <td class="tmpl_menu" onMouseOver="this.className='tmpl_menuover';" onClick="self.location='http://www.myhfs.illinois.gov/gettingstarted.html'" onMouseOut="this.className='tmpl_menu';" width="100%" height="10"> 
      <a href="http://www.myhfs.illinois.gov/gettingstarted.html" class="tmpl_menulinks" target="_parent">Getting Started</a> 
    </td>
   </tr>
 
    <tr> 
    <td class="tmpl_menu" onMouseOver="this.className='tmpl_menuover';" onClick="self.location='http://www.myhfs.illinois.gov/browserdetection/BrowserCheck.html'" onMouseOut="this.className='tmpl_menu';" width="100%" height="10"> 
      <a href="http://www.myhfs.illinois.gov/browserdetection/BrowserCheck.html" class="tmpl_menulinks" target="_parent">Check Browser</a> 
    </td>
   </tr>
 
   <tr> 
    <td class="tmpl_menu" onMouseOver="this.className='tmpl_menuover';" onClick="self.location='http://www2.illinois.gov/PKI/Pages/newuser.html'" onMouseOut="this.className='tmpl_menu';" width="100%" height="10"> 
      <a href="http://www2.illinois.gov/PKI/Pages/newuser.html" class="tmpl_menulinks" target="_parent">Register</a> 
    </td>
  </tr>
   
   <tr> 
    <td class="tmpl_menu" onMouseOver="this.className='tmpl_menuover';" onClick="self.location='http://www.myhfs.illinois.gov/contactform.html'" onMouseOut="this.className='tmpl_menu';" width="100%" height="10"> 
      <a href="http://www.myhfs.illinois.gov/contactform.html" class="tmpl_menulinks" target="_parent">Contact Us</a> 
    </td>
   </tr>
 
   <tr> 
    <td class="tmpl_menu" onMouseOver="this.className='tmpl_menuover';" onClick="self.location='/TruePassSample/Logout.html'" onMouseOut="this.className='tmpl_menu';" width="100%" height="10"> 
      <a href="/TruePassSample/Logout.html" class="tmpl_menulinks" target="_parent">Logout</a> 
    </td>
   </tr>
 
   <tr> 
    <td class="tmpl_menu" onMouseOver="this.className='tmpl_menuover';" onClick="self.location='http://www.myhfs.illinois.gov/index.html'" onMouseOut="this.className='tmpl_menu';" width="100%" height="10"> 
      <a href="http://www.myhfs.illinois.gov/index.html" class="tmpl_menulinks" target="_parent">myHFS Index</a> 
    </td>
   </tr>
 
  <!-- End myHFS Links -->

  <!-- End State Mini Feature(s) --> 
 
</table>
<!-- end of left navigation set -->

<a name="content"></a>
   <table width="100%" border="0" cellspacing="0" cellpadding="0">
        <tr> 
          <td><img src="images/blank.gif" width="100" height="1" alt=""></td>
        </tr>
      </table>
    </td>
    <td height="400" valign="top">
      <table width="100%" border="0" cellspacing="0" cellpadding="2">
        <tr>
          <td class="tmpl_headerbackgroundcolor" align="left" valign="top">
            <!-- divider start --><table border="0" cellspacing="0" cellpadding="0">
              <tr> 
                <td class="tmpl_tabbackgroundcolor" valign="bottom" height="16"> <h1 class="tmpl_h1"><a class="tmpl_governor">
                   &nbsp;&nbsp;myHFS Login&nbsp; </a></h1></td>
                <td align="left" valign="top" class="tmpl_tabbackgroundcolor" height="16"><img src="images/bd_tabtriangle_hfs.gif" width="15" height="15" alt=""></td>
              </tr>
            </table>
            <table width="100%" border="0" cellspacing="0" cellpadding="0" >
              <tr>
                <td class="tmpl_tabbackgroundcolor"><img src="images/blank.gif" width="1" height="3" alt=""></td>
              </tr>
            </table>
            <!-- divider end -->
          </td>
        </tr>
      </table>
      <table width="82%" border="0" cellpadding="2">
        <tr>
          <td width="2%" height="261">&nbsp;</td>
          <td width="98%"><p align="center">
                                                                        <!-- <hr /><div class="alert alert-info"><big><strong>Sample Notice</strong><br />
                                                                                                                                                                Sample big text<br />
                                                                                                                                                                <strong>Sample big and strong text</strong></big></div><hr />-->


<!-- page content starts here -->
    <form name="aspnetForm" method="post" action="./IdentityGuardLogin.aspx?IGDest=https%3a%2f%2fmedi.hfs.illinois.gov%2fiec%2flogin.do" id="aspnetForm" autocomplete="off">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTc2NTA4MjYyMGRk4yXCrZ24+RINZ65oTLr4m9DYPn2lQe0ZcFMMIKn8EKs=" />

<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="0D89400B" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdAATF1HbZhB+93/4iavo4aV0sIM4agJ1XYo5kfqR6EKXk8sBCb+I2JiifcGTdjuBfK/gQbMfQdanc+55VabuY26WKyqqAXVOsTMaAJggziEs8ynDLl3P2RxligTjYyeL1xac=" />

                                                                                                                <div id="main">
                                                                                                                        
   <div id="content">
                <p style="margin-top:25px;"><img width="100" height="35" src="images/digitallogo.gif">&nbsp;Please enter your User Name and Password from your state of Illinois Digital ID.</p>
        <p>
                        
                </p>
        <div id="ctl00_cphMain_ucShowForm">
        
            <table class="inputform">
                <tr>
                    <td>
                        <span id="ctl00_cphMain_lblUserID" class="lblDef">Username:</span>
                    </td>
                    <td>
                        <input name="ctl00$cphMain$UserIdTextBox" type="text" id="ctl00_cphMain_UserIdTextBox" class="inputDef" />
                    </td>
                </tr>
                <tr>
                    <td>
                        <span id="ctl00_cphMain_lblPassword" class="lblDef">Password:</span>
                    </td>
                    <td>
                        <input name="ctl00$cphMain$PasswordTextBox" type="password" id="ctl00_cphMain_PasswordTextBox" class="inputDef" />
                    </td>
                </tr>
                                <tr>
                                        <td>&nbsp;</td>
                                        <td>
                                                <input id="soirememberusername" type="checkbox"> Remember user name
                                        </td>
                                </tr>
                <tr>
                    <td class="lblDef">
                    </td>
                    <td>
                        <input type="submit" name="ctl00$cphMain$LoginButton" value="Login" id="ctl00_cphMain_LoginButton" class="continuebutton" />
                    </td>
                </tr>
                <tr>
                    <td>
                    </td>
                    <td>
                        
                    </td>
                </tr>
                                <tr>
                                        <td>&nbsp;</td>
                                        <td>&nbsp;</td>                                 
                </tr>
            </table>
                        <p>If you have forgotten your password or need to change your password, then choose 'Forgot Password'. You may also use this option to recover your password if you have exceeded your login limit.</p>
                        <input type="button" onclick="window.open('https://enroll.pki.illinois.gov/UserRegistration/en_US/Homepage.html','_blank');" value="Forgot Password" />
                        <br /><br />
                        <p>If you do not have a State of Illinois Digital ID and would like to register for one, then choose 'Get a Digital ID'.</p>
                        <p><em>If you are registering for a digital ID for use with the Healthcare and Family Services MEDI website, for security reasons HFS official policy/stance is to not allow access to MEDI for those located outside the United States of America.</em></p>
                        <input type="button" onclick="window.open('https://enroll.pki.illinois.gov/UserRegistration/en_US/Homepage.html','_blank');" value="Get a Digital ID" />
                        <br /><br />
                        <p>If you need to update the information associated with your State of Illinois Digital ID, then choose 'Manage Digital ID'.</p>
                        <input type="button" onclick="window.open('https://accounts.pki.illinois.gov/cms/UserSelfManagement','_blank');" value="Manage Digital ID" />

        
</div>
        
                
        
      
      
   </div>

                                                                                                                </div>
                                                                                                                <div>
                                                                                                                        
    <table class="lbllegal">
        <tr>
            <td class="lbllegal">
                
            </td>
        </tr>
    </table>

                                                                                                                </div>
                                                    </form>
                                                        
<!-- page content ends here -->
        <td height="4"></tr>
      </table>      
    </td>
  </tr>
</table>
<table width="100%" border="0" cellspacing="0" cellpadding="3" >
        <tr> 
                <td width="29%" valign="top"  class="tmpl_headerbackgroundcolor"><strong>Copyright &copy; 2024
                        
                        <a href="http://www.myhfs.illinois.gov/" class="tmpl_footerlink" target="_parent">myHFS</a></strong>
                </td>
 
                <td width="71%"  align="right" class="tmpl_headerbackgroundcolor">
                        <a href="http://www.myhfs.illinois.gov/privacy_policy_myhfs.html" class="tmpl_footerlink">Privacy Information</a>
                        |  
                        <a href="https://www.illinois.gov/iwas/" class="tmpl_footerlink">Web Accessibility</a>
                        <a href="https://www.illinois.gov/" class="tmpl_footerlink" target="_parent"></a>
                        | 
                        <a href="http://www.myhfs.illinois.gov/contactform.html" class="tmpl_footerlink" target="_parent"></a> 
                        <a href="http://www.myhfs.illinois.gov/contactform.html" class="tmpl_footerlink" target="_parent">Webmaster</a> <!-- <small><sub>1</sub></small> -->
                </td>
        </tr>
 
</table>
<!-- Support for remembering usernames -->
<script language="Javascript" src="./Scripts/js/js.cookie.min.js"></script>
<script defer language="Javascript" type="text/javascript" src="./Scripts/js/soi-remember-username.js"></script>
</body>
</html>
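The form in this login HTML declares action="./IdentityGuardLogin.aspx?IGDest=https%3a%2f%2fmedi.hfs.illinois.gov%2fiec%2flogin.do", whereas loginUrl in the code carries IGDest=https://medi.hfs.illinois.gov/medi/mlogin.do. Posting to the URL the form itself declares is closer to what a browser does. A minimal sketch of resolving that relative action against the page URL, assuming the form keeps the id aspnetForm; FormAction and ResolveFormAction are hypothetical names.

```csharp
using System;
using HtmlAgilityPack;

#nullable enable

static class FormAction
{
    // Returns the absolute URL the form posts to, resolving its (possibly relative)
    // action attribute against the URL the page was fetched from. Falls back to the
    // page URL itself when the form or its action attribute is missing.
    public static Uri ResolveFormAction(string html, Uri pageUri, string formId = "aspnetForm")
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode? form = doc.DocumentNode.SelectSingleNode($"//form[@id='{formId}']");
        string action = form == null ? "" : HtmlEntity.DeEntitize(form.GetAttributeValue("action", ""));

        return action.Length == 0 ? pageUri : new Uri(pageUri, action);
    }
}
```

For example, `await httpClient.PostAsync(FormAction.ResolveFormAction(loginPageHtml, new Uri(loginUrl)), encodedFormData)`, where loginPageHtml is the HTML returned by a prior GET of loginUrl.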

The third page's HTML (the one I can't seem to reach) was posted as an image.
