Processing the USPTO website using Mechanize and Python

All I want is to process the USPTO trademark search site for a simple pattern.

#!/usr/bin/python

import mechanize
import cookielib
br=mechanize.Browser()
cg = cookielib.LWPCookieJar()
br.set_cookiejar(cg);

#br.set_all_readonly(False)
br.set_handle_robots(False)
br.set_handle_refresh(False)
br.addheaders=[('User-agent', 'Firefox')]

response=br.open("http://uspto.gov/trademarks-application-process/search-trademark-database")

tess = 'TESS'
start_search = 'Basic Word Mark Search (New User)'

assert br.viewing_html()
print br.title()

for l in br.links(url_regex='tmsearch'):
        if l.text == tess:
                print l.url;
                break

br.follow_link(l)
newlink=br.geturl()
print newlink

br.open(newlink)
for link in br.links():
        if link.text == start_search:
                print "Found Basic Search"
                print link.text
                print link.url
                break;
# Why do we need the concatenation? Without it, this doesn't generate a full URL.

newurl="http://tmsearch.uspto.gov" + link.url
print newurl
response1 = br.open(newurl);

print response1.read()

#for form in br.forms():
        #print "Form Name" form.name

Three questions.

  1. Without manually concatenating the prefix, I don't get a full URL in this step.
  2. At the end of the program, I get a warning at the line that says for form in br.forms().
  3. Finally, I want to enter some search text in "Search Term" (I'm assuming this is a form, but I'm unable to get to it) and then submit it. The next step is to follow up on the table that gets displayed afterwards.

There is 1 best solution below

Well;

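  1. link.url holds the href exactly as it is written in the page, and here that is a relative path, so the host has to be added. Store the base URL in a variable and build the full URL as newurl = oldurl + link.url, or do it directly in the call: br.open(oldurl + "whatever goes here").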

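  2. The commented-out print statement is missing a comma between the string and form.name. Also note that forms() is a method of the browser, not of the response: for form in br.forms(): print "Form name:", form.name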

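  3. You need to select the form, fill in the text, then submit. Here are some tips (check the page source for the real form id and field name; 'search' is used here only as an example):

    for form in br.forms():
        # attrs.get avoids a KeyError on forms that have no id attribute
        if form.attrs.get('id') == 'search':
            br.form = form
            break
    br["search"] = "text_search"
    br.submit()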
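
On point 1, a slightly more robust alternative to string concatenation is to resolve the link against the URL of the page it was found on. This is only a minimal sketch: absolute_link is a hypothetical helper name, and the paths in the usage lines are made up for illustration rather than taken from the real USPTO pages. It relies on the standard library's urljoin, and the import is written so it works on both Python 2 (as in the question) and Python 3.

```python
try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2, as used in the question


def absolute_link(page_url, href):
    """Resolve an href found on page_url into a full URL.

    Hypothetical helper: page_url would be br.geturl() and href would be
    link.url in the question's script.
    """
    return urljoin(page_url, href)


# Illustrative values only; the real paths on tmsearch.uspto.gov may differ.
page_url = "http://tmsearch.uspto.gov/bin/gate.exe?f=login"

# A host-relative href keeps the scheme and host of the current page.
full = absolute_link(page_url, "/bin/gate.exe?f=searchss")

# An href that is already absolute is returned unchanged.
same = absolute_link(page_url, "http://uspto.gov/trademarks")
```

In the script above this would replace the hard-coded prefix with something like newurl = absolute_link(br.geturl(), link.url). As far as I know, mechanize Link objects also carry base_url and absolute_url attributes that hold the same information, but verify that against the version you have installed before relying on it.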