Getting all records in a set using the Sickle package


How can I access all the records in each set using Sickle?

I can access the sets like this, but I don't know how to go on from here and download each record from every set:

from sickle import Sickle

sickle = Sickle('http://www.duo.uio.no/oai/request')
sets = sickle.ListSets()
for s in sets:
    print(s)

This prints every set like this:

<set xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><setSpec>com_10852_1</setSpec><setName>Det matematisk-naturvitenskapelige fakultet</setName></set>
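For reference, the two fields in that XML can be pulled out with the standard library alone. This is only a sketch based on the element printed above; `parse_set` is a hypothetical helper name, not part of Sickle. The `setSpec` value is what the OAI-PMH `set` argument expects when a request is restricted to one set.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses (taken from the XML above).
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# The <set> element exactly as printed in the question.
SET_XML = (
    '<set xmlns="http://www.openarchives.org/OAI/2.0/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<setSpec>com_10852_1</setSpec>'
    '<setName>Det matematisk-naturvitenskapelige fakultet</setName>'
    '</set>'
)


def parse_set(xml_text):
    """Return (setSpec, setName) from one OAI-PMH <set> element."""
    root = ET.fromstring(xml_text)
    spec = root.findtext("oai:setSpec", namespaces=OAI_NS)
    name = root.findtext("oai:setName", namespaces=OAI_NS)
    return spec, name


spec, name = parse_set(SET_XML)
print(spec)
print(name)
```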

I can also iterate through the sets to go deeper:

for s in sets:
    for rec in sets:
        print(rec)

This prints all the sub-sets, so this is probably where I can get access to the individual records, but the API is hard to understand, and I have not been able to access the records.

Best answer:

Be sure to read the short and sweet Tutorial.

For harvesting an entire OAI-PMH repository, you do not need to iterate over sets. Here is the complete code:

from sickle import Sickle

sickle = Sickle('http://www.duo.uio.no/oai/request')
recs = sickle.ListRecords(metadataPrefix="oai_dc")
for r in recs:
    print(r)
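To make it clearer what that call does: at the protocol level, ListRecords is an HTTP request to the base URL with `verb` and `metadataPrefix` parameters, plus an optional `set`. Sickle sends it for you and also follows resumption tokens across result pages. Below is a rough sketch of how the first request URL could be built by hand; `list_records_url` is a made-up helper for illustration, not a Sickle function.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of an initial OAI-PMH ListRecords request.

    Hypothetical helper: it only illustrates the query parameters
    and does not send anything over the network.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Restrict the harvest to a single set, identified by its setSpec.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


print(list_records_url("http://www.duo.uio.no/oai/request"))
print(list_records_url("http://www.duo.uio.no/oai/request", set_spec="com_10852_1"))
```

Leaving out `set` asks for every record in the repository, which is why the loop over sets is not needed for a full harvest.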

If for some reason you really wish to harvest records set by set, you can certainly do so. Here is the complete code again:

from sickle import Sickle

sickle = Sickle('http://www.duo.uio.no/oai/request')
sets = sickle.ListSets()
for s in sets:
    recs = sickle.ListRecords(metadataPrefix="oai_dc", set=s.setSpec)
    for r in recs:
        print(r)
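If you go set by set, two things are worth keeping in mind: a set may contain no records, and a record may belong to more than one set, so you can see duplicates. The sketch below wraps the loop above in a generator that tags each record with its set and skips sets that raise a "no records" error. The names `harvest_by_set`, `FakeSet` and `FakeClient` are hypothetical; the fake client only stands in for a Sickle instance so the example runs offline.

```python
def harvest_by_set(client, metadata_prefix="oai_dc", skip_errors=()):
    """Yield (setSpec, record) pairs, one set at a time.

    `client` is anything with ListSets() and ListRecords(...) methods,
    such as a Sickle instance. `skip_errors` is a tuple of exception
    classes that mean "this set has no records" and should be skipped.
    """
    for s in client.ListSets():
        try:
            records = client.ListRecords(metadataPrefix=metadata_prefix,
                                         set=s.setSpec)
            for rec in records:
                yield s.setSpec, rec
        except skip_errors:
            continue


class FakeSet:
    """Minimal stand-in for a set object with a setSpec attribute."""
    def __init__(self, spec):
        self.setSpec = spec


class FakeClient:
    """Offline stand-in for a Sickle instance (assumption for the demo)."""
    def __init__(self, data):
        self._data = data

    def ListSets(self):
        return [FakeSet(spec) for spec in self._data]

    def ListRecords(self, metadataPrefix, set):
        if not self._data[set]:
            raise LookupError("no records in " + set)
        return iter(self._data[set])


client = FakeClient({"com_1": ["rec-a", "rec-b"], "com_2": [], "com_3": ["rec-c"]})
pairs = list(harvest_by_set(client, skip_errors=(LookupError,)))
print(pairs)
```

With a real Sickle instance you would pass it as `client`; as far as I know the matching exception there is `NoRecordsMatch` from `sickle.oaiexceptions`, so check the Sickle docs before relying on that.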