Convert Grobid curl command to requests in Python


I'm trying to convert a curl command that sends a PDF file to a GROBID server for parsing into Python code that uses requests.

If I start the GROBID server as follows,

./gradlew run 

I can use the following curl command to get the parsed XML output for an academic paper, example.pdf:

curl -v --form input=@example.pdf localhost:8070/api/processHeaderDocument

However, I don't know how to convert this command into Python. Here is my attempt using requests:

GROBID_URL = 'http://localhost:8070'
url = '%s/processHeaderDocument' % GROBID_URL
pdf = 'example.pdf'
xml = requests.post(url, files=[pdf]).text

There are 2 best solutions below

Answered by titipata

I found the answer. I had left out /api in the URL, and the files argument should be a dictionary keyed by the form field name (input) rather than a list of file names.

import requests

GROBID_URL = 'http://localhost:8070'
url = '%s/api/processHeaderDocument' % GROBID_URL
pdf = 'example.pdf'
# 'input' is the form field name that the curl command uses
with open(pdf, 'rb') as f:
    xml = requests.post(url, files={'input': f}).text
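To see why the dictionary key matters, it can help to look at the multipart body that requests builds, without sending anything. The following is only a sketch: the placeholder bytes stand in for the real contents of example.pdf (an assumption for illustration), and the request is prepared but never sent, so no GROBID server is needed.

```python
import requests

GROBID_URL = 'http://localhost:8070'
url = '%s/api/processHeaderDocument' % GROBID_URL

# Placeholder bytes stand in for the contents of example.pdf (an assumption
# for illustration; no file is read and nothing is sent to the server).
fake_pdf = b'%PDF-1.4 placeholder'

# Build the same multipart request that requests.post would build,
# but stop before sending it.
prepared = requests.Request(
    'POST', url, files={'input': ('example.pdf', fake_pdf)}
).prepare()

# The dictionary key becomes the multipart form field name, which plays
# the same role as the field name in curl's --form option.
print(prepared.headers['Content-Type'])
print(b'name="input"' in prepared.body)
```

The Content-Type header should start with multipart/form-data, and the body should contain a part named input. A plain list of file names, as in the question, gives requests no field name to attach to the file.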
Answered by Wolfgang Fahl

Here is an example bash script from http://ceur-ws.bitplan.com/index.php/Grobid. Note that there is also a ready-to-use Python client available; see https://github.com/kermitt2/grobid_client_python

#!/bin/bash
# WF 2020-08-04
# call grobid service with paper from ceur-ws
v=2644
p=44
vol=Vol-$v
pdf=paper$p.pdf
if [ ! -f "$pdf" ]
then
  wget "http://ceur-ws.org/$vol/$pdf"
else
  echo "paper $p from volume $v already downloaded"
fi
# post the PDF as form field "input" and save the TEI XML response
curl -v --form "input=@./$pdf" http://grobid.bitplan.com/api/processFulltextDocument > "$p.tei"
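For comparison, roughly the same steps can be written in Python with requests. This is a sketch, not part of the linked page: the function names paper_names and fetch_and_parse are hypothetical, the timeout values are assumptions, and the URLs are the ones used in the script above. Unlike the curl call, it raises an exception on an HTTP error instead of writing the error response to the .tei file.

```python
import os
import requests


def paper_names(volume, paper):
    """Return (download URL, local PDF name, TEI output name) as in the script."""
    pdf = 'paper%d.pdf' % paper
    return 'http://ceur-ws.org/Vol-%d/%s' % (volume, pdf), pdf, '%d.tei' % paper


def fetch_and_parse(volume, paper, grobid_url='http://grobid.bitplan.com'):
    """Download the paper if it is missing, send it to GROBID, save the TEI."""
    pdf_url, pdf, tei = paper_names(volume, paper)
    if not os.path.isfile(pdf):
        r = requests.get(pdf_url, timeout=60)  # timeout is an assumption
        r.raise_for_status()
        with open(pdf, 'wb') as f:
            f.write(r.content)
    else:
        print('paper %d from volume %d already downloaded' % (paper, volume))

    # Post the PDF as the multipart form field "input", as curl does.
    with open(pdf, 'rb') as f:
        resp = requests.post(grobid_url + '/api/processFulltextDocument',
                             files={'input': f}, timeout=300)
    resp.raise_for_status()

    # Write the raw response bytes, like the shell redirection "> $p.tei".
    with open(tei, 'wb') as out:
        out.write(resp.content)
    return tei
```

Calling fetch_and_parse(2644, 44) would then correspond to running the script with v=2644 and p=44.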