How to use HttpWebRequest in .NET to POST in one message?


I'm trying to crawl a website but can't get past the login using the .NET HttpWebRequest and HttpWebResponse classes. In a network monitor capture, the key difference seems to be that the login from a browser includes the payload in the POST message, whereas HttpWebRequest sends the payload in a separate message, which gets a 301 response. Is there a way to make it send a single message? Or is there something else I'm missing? I've used this code for another website, where it worked:

// Set GET to logon site.
SiteRequest = (HttpWebRequest)WebRequest.Create(logonUrl);

SiteRequest.Method = "GET";
SiteRequest.AllowAutoRedirect = AllowRedirect;
SiteRequest.CookieContainer = SiteCookieContainer;
SiteRequest.Referer = logonUrl;

SiteResponse = (HttpWebResponse)SiteRequest.GetResponse();
mainStream = SiteResponse.GetResponseStream();
ReadAndIgnoreAllStreamBytes(mainStream);
mainStream.Close();

// Send POST to logon site.
SiteRequest = (HttpWebRequest)WebRequest.Create(postUrl);
SiteRequest.Method = "POST";
SiteRequest.AllowAutoRedirect = AllowRedirect;
SiteRequest.ContentType = "application/x-www-form-urlencoded";
SiteRequest.CookieContainer = SiteCookieContainer;
SiteRequest.CookieContainer.Add(SiteResponse.Cookies);
SiteRequest.Referer = postUrl;
SiteRequest.Timeout = TimeoutMsec;

buffer = Encoding.UTF8.GetBytes(logonPostData);
SiteRequest.ContentLength = buffer.Length;

postStream = SiteRequest.GetRequestStream();
postStream.Write(buffer, 0, buffer.Length);
postStream.Flush();
postStream.Close();

SiteResponse = (HttpWebResponse)SiteRequest.GetResponse();

Using the HtmlWeb class in HtmlAgilityPack has the same issue.
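
One possible explanation for the separate payload message: by default HttpWebRequest adds an `Expect: 100-continue` header to a POST and briefly holds the body back while it waits for the server to answer, which a capture shows as headers first and payload second. The following is a minimal sketch, assuming that is the cause here and that the .NET Framework classes from the question are in use. `LoginPostSketch`, `BuildLoginPost`, and its parameters are hypothetical names, not part of the original code.

```csharp
using System;
using System.Net;

static class LoginPostSketch
{
    // Hypothetical helper: configures the POST but sends nothing yet.
    public static HttpWebRequest BuildLoginPost(
        string postUrl, CookieContainer cookies, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(postUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;   // same container as the earlier GET
        request.UserAgent = userAgent;       // the browser capture sends one
        // Assumption: the split capture is caused by Expect: 100-continue.
        request.ServicePoint.Expect100Continue = false;
        return request;
    }
}
```

The body would then be written through `GetRequestStream()` exactly as in the code above. Even with the header disabled, a capture may still show the headers and the body in two TCP segments; a server reads them as a single request, so the split alone should not cause the failure.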

Thanks.

Update:

Turns out I was using the "www.example.com" form of the address, and not "example.com", hence the redirect. With the correct address, though, I now get a 404 (page not found) error.
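
The redirect can be avoided by making every request use the host form the site redirects to. This matters for a POST because a followed 301 is typically reissued as a GET without the form body. Here is a small sketch, assuming the site's canonical host is the bare domain as described above; `HostSketch` and `StripWww` are hypothetical names.

```csharp
using System;

static class HostSketch
{
    // Hypothetical helper: returns the same URL without a leading "www." label.
    public static Uri StripWww(Uri url)
    {
        const string prefix = "www.";
        if (!url.Host.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return url; // already the bare host, leave untouched
        }
        var builder = new UriBuilder(url)
        {
            Host = url.Host.Substring(prefix.Length)
        };
        return builder.Uri;
    }
}
```

Applying the helper to both `logonUrl` and `postUrl` keeps the GET and the POST on the same host, so the session cookie set by the GET matches the host of the POST.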

Here's what the browser is sending for the post:

- Http: Request, POST /accounts/signin 
    Command: POST
  + URI: /accounts/signin
    ProtocolVersion: HTTP/1.1
    Accept:  text/html, application/xhtml+xml, */*
    Referer:  http://***.com/accounts/signin
    Accept-Language:  en-US,en;q=0.8,zh-Hans-CN;q=0.7,zh-Hans;q=0.5,zh-Hant-TW;q=0.3,zh-Hant;q=0.2
    UserAgent:  Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; Touch)
  + ContentType:  application/x-www-form-urlencoded
    Accept-Encoding:  gzip, deflate
    Host:  example.com
    ContentLength:  67
    DNT:  1
    Connection:  Keep-Alive
    Cache-Control:  no-cache
  - Cookie:  PHPSESSID=169***efe; lang=en_US; cart=eyJ***wfQ%3D%3D; cartitems=W10%3D; __utma=***; __utmb=***; __utmc=**; __utmz=**
      PHPSESSID: 169***efe
      lang: en_US
      cart: eyJ***wfQ%3D%3D
      cartitems: W10%3D
      __utma: ***
      __utmb: ***
      __utmc: ***
      __utmz: ***

    HeaderEnd: CRLF
  - payload: HttpContentType =  application/x-www-form-urlencoded
     url: 
     email: ***
     password: ***

Here's what I'm sending:

(POST:)

- Http: Request, POST /accounts/signin 
    Command: POST
  + URI: /accounts/signin
    ProtocolVersion: HTTP/1.1
  + ContentType:  application/x-www-form-urlencoded
    Accept:  text/html, application/xhtml+xml, */*
    Accept-Language:  en-US,en;q=0.8,zh-Hans-CN;q=0.7,zh-Hans;q=0.5,zh-Hant-TW;q=0.3,zh-Hant;q=0.2
    Accept-Encoding:  gzip, deflate
    DNT:  1
    Cache-Control:  no-cache
    Referer:  http://***.com/accounts/signin
    Host:  chinesepod.com
  - Cookie:  lang=en_US; cart=eyJ***jowfQ%3D%3D; cartitems=W10%3D; PHPSESSID=944***3e7
      lang: en_US
      cart: eyJ***wfQ%3D%3D
      cartitems: W10%3D
      PHPSESSID: 944***3e7

    ContentLength:  61
    HeaderEnd: CRLF

(separate payload:)

- Http: HTTP Payload, URL: /accounts/signin 
  - payload: HttpContentType =  application/x-www-form-urlencoded
     url: 
     email: ***
     password: ***

The browser version has these __utXX cookies, which I'm assuming the browser adds for some kind of tagging, right? Otherwise the key difference, assuming cookie ordering doesn't matter, is that the payload is sent separately. See anything else amiss?
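
Two other differences are visible in the captures above: the browser sends a UserAgent (and Connection: Keep-Alive) header that the program's request lacks, and the ContentLength values differ (67 from the browser, 61 from the program), so the two bodies are not byte-for-byte identical. One possible cause of a length difference is a value that was not URL-encoded; an `@` written as `%40` adds two bytes, for example. The sketch below builds the form body with explicit encoding so that its byte count can be compared against the capture. `FormBodySketch` and `Encode` are hypothetical names, and the field values in the usage note are placeholders.

```csharp
using System;
using System.Collections.Generic;

static class FormBodySketch
{
    // Hypothetical helper: encodes name/value pairs as
    // application/x-www-form-urlencoded text.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            // EscapeDataString emits %20 for a space; form encoding uses '+'.
            string name = Uri.EscapeDataString(field.Key).Replace("%20", "+");
            string value = Uri.EscapeDataString(field.Value).Replace("%20", "+");
            parts.Add(name + "=" + value);
        }
        return string.Join("&", parts);
    }
}
```

For instance, encoding the fields `url` (empty), `email` (`a@b.c`), and `password` (`p w`) in that order gives `url=&email=a%40b.c&password=p+w`. Passing the result to `Encoding.UTF8.GetBytes` yields the buffer whose length goes into `ContentLength`.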

Thanks.

-John
