Can I monitor progress of spacy parsing?


I have a simple program that processes English text with spaCy and outputs some information about the tokens. For a large text, spaCy takes a long time to process it. Is there a way to see how far the processing has progressed, ideally as a percentage? I'm not using my own models, just the ones provided by spaCy.

import spacy

# load big text file into `text` variable

nlp = spacy.load("en_core_web_sm")
# raise the length limit so the whole text is accepted as a single document
nlp.max_length = len(text) + 1
doc = nlp(text)

# output info

There is 1 best solution below

ewz93 On

In general, I would not advise parsing the entire text as one big blob; instead, try to split it into smaller paragraphs first.

For example, you can split at every \n\n first.

Then you can pass multiple documents to spaCy at once using nlp.pipe() and wrap that call in a tqdm progress bar.
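A minimal sketch of the split-then-pipe idea, assuming the third-party tqdm package is installed. The helper names split_paragraphs and parse_with_progress are made up for illustration, and the batch size of 50 is an arbitrary choice, not a recommendation.

```python
from tqdm import tqdm


def split_paragraphs(text):
    """Split on blank lines and drop chunks that are empty after stripping."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def parse_with_progress(nlp, text, batch_size=50):
    """Parse each paragraph via nlp.pipe() while tqdm shows a progress bar.

    `nlp` is a loaded spaCy pipeline, e.g. spacy.load("en_core_web_sm").
    """
    paragraphs = split_paragraphs(text)
    # nlp.pipe() returns a generator, so tqdm needs total= to show a percentage.
    docs = nlp.pipe(paragraphs, batch_size=batch_size)
    return list(tqdm(docs, total=len(paragraphs)))
```

With the setup from the question, this could be called as docs = parse_with_progress(nlp, text). The bar counts paragraphs rather than characters, so the percentage is only approximate when paragraph lengths vary a lot.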

Alternatively, you can create batches within the document and then concatenate the results.
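One possible way to do the batching and concatenation, sketched under the assumption of spaCy v3, where Doc.from_docs() merges several Doc objects into one. The function name parse_in_chunks is hypothetical, and the percentage is printed by hand instead of using tqdm.

```python
from spacy.language import Language
from spacy.tokens import Doc


def parse_in_chunks(nlp: Language, chunks: list) -> Doc:
    """Parse text chunks one by one, print a percentage, and merge the results."""
    if not chunks:
        raise ValueError("chunks must contain at least one text")
    docs = []
    for done, doc in enumerate(nlp.pipe(chunks), start=1):
        docs.append(doc)
        # Overwrite the same console line with the share of chunks finished.
        print(f"{100 * done / len(chunks):.0f}% parsed", end="\r")
    print()
    # Concatenate the per-chunk Docs into a single Doc.
    return Doc.from_docs(docs)
```

The merged Doc holds the tokens of all chunks in order, so token-level output can be written the same way as for a single nlp(text) call. Whitespace between chunks may differ from the original text, because the chunks were separated before parsing.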