This is a general question about whether it is possible, and if so how, to automate the download of a scribd.com search result document.
Scenario:
I have a Scribd account and find a document I want. Normally, I then have to click the download button to start the download.
Any ideas for automating this? I'm using the Scribd API and Python to automatically extract document IDs based on automated queries, but once I get the doc_ids I have to manually go to each doc page and click the download button to get the actual txt/pdf file. I want to automate this step as well.
Any ideas?
                        
Looking at the `python-scribd` documentation or the `scribd` API reference, any object that can give you a document ID or website URL can also give you a download URL. Or, if you already have a document ID, you can just call `get` to get an object that can give you a download URL.

Most likely, you've got a `Document` object, which has a `get_download_url` method. So, wherever you're calling `get_scribd_url`, just call `get_download_url`.

And then, to download the result, Python has `urllib2` (2.x) or `urllib.request` (3.x) built into the standard library, or you can use `requests` or any other third-party library instead.

Putting it all together as an example:
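A minimal sketch of those steps for Python 3, assuming each document object exposes `get_download_url()` as described above. The helper name `download_documents`, the `dest_dir` parameter, and the trivial body of `is_document_i_want` are illustrative assumptions, not part of `python-scribd`; `user` stands for whatever `User` object you have already set up.

```python
import os
import shutil
import urllib.parse
import urllib.request


def is_document_i_want(document):
    """Hypothetical post-filter; replace the body with your own criteria."""
    return True


def download_documents(documents, dest_dir="."):
    """Download every wanted document into dest_dir; return the saved paths.

    Assumes each item has a get_download_url() method returning a URL string.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for document in documents:
        if not is_document_i_want(document):
            continue
        url = document.get_download_url()
        # Derive a local file name from the last segment of the URL path.
        name = os.path.basename(urllib.parse.urlparse(url).path) or "document"
        path = os.path.join(dest_dir, name)
        # Stream the response to disk in binary mode.
        with urllib.request.urlopen(url) as response, open(path, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(path)
    return saved


# Assumed usage, with `user` obtained elsewhere:
#   download_documents(user.all(), "downloads")
```

Taking the file name from the URL path is only a convenience; if the download links do not end in a usable name, you may prefer to build one from the document ID instead.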
Presumably you're going to want to use something like `user.find` instead of `user.all`. Or, if you've already written the code that gets the document IDs and don't want to change it, you can use `user.get` with each one.

And if you want to post-filter the results, you probably want to use attributes beyond the basic ones (or you would have just passed them to the query), which means you need to call `load` on each document before you can access them (so add `document.load()` at the top of the `is_document_i_want` function). But really, there's nothing complicated here.
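If you already have the document IDs, the `user.get` plus `load` variant might look roughly like this. It is a sketch under the assumptions stated in the answer: `user.get(doc_id)` returns a document object, and that object has `load()` and `get_download_url()`. The function name `download_urls_for_ids` and the `wanted` predicate are hypothetical.

```python
def download_urls_for_ids(user, doc_ids, wanted=lambda document: True):
    """Return download URLs for the given document IDs.

    `wanted` is a hypothetical post-filter that may inspect extended attributes.
    """
    urls = []
    for doc_id in doc_ids:
        document = user.get(doc_id)
        # Fetch the extended attributes before the filter looks at them.
        document.load()
        if wanted(document):
            urls.append(document.get_download_url())
    return urls
```

Each returned URL can then be fetched with `urllib.request` or `requests`, as described above.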