I have a simple program that processes English text with spaCy and outputs some information about the tokens. For a large text, spaCy takes a long time to process it. Is there a way to see how far the processing has progressed, ideally as a percentage? I'm not using my own models, just the ones provided by spaCy.
import spacy
# load big text file into `text` variable
nlp = spacy.load("en_core_web_sm")
nlp.max_length = len(text)+1
doc = nlp(text)
# output info
In general, I would not advise parsing the entire text as one big blob. Instead, try to split it into smaller paragraphs first.
For example, you can split at every `\n\n`. Then you can hand multiple documents to spaCy at once using `nlp.pipe()`, whose output you can wrap in a tqdm progress bar.
Alternatively, you can create batches within the document and then concatenate the results.
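A minimal sketch of the paragraph approach, assuming blank lines are a reasonable place to split your text. The helper names `split_paragraphs` and `pipe_with_progress` are made up for illustration and are not part of spaCy. To stay dependency-free, the sketch prints a character-weighted percentage by hand instead of using tqdm:

```python
import sys


def split_paragraphs(text):
    """Split on blank lines and drop empty chunks (hypothetical helper)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def pipe_with_progress(nlp, paragraphs, out=sys.stderr):
    """Yield results from nlp.pipe() while reporting the percentage done.

    `nlp` is assumed to be any object with a spaCy-style pipe() method
    that yields one result per input text, in input order.
    """
    total_chars = sum(len(p) for p in paragraphs)
    done_chars = 0
    for paragraph, doc in zip(paragraphs, nlp.pipe(paragraphs)):
        done_chars += len(paragraph)
        percent = 100 * done_chars / total_chars
        # "\r" rewrites the same terminal line on every update.
        print(f"\r{percent:5.1f}% processed", end="", file=out)
        yield doc
    print(file=out)


# Usage with a real pipeline (assumes spaCy and the model are installed):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   docs = list(pipe_with_progress(nlp, split_paragraphs(text)))
```

Because `nlp.pipe()` processes texts in batches, the percentage will likely advance in jumps rather than smoothly. If tqdm is installed, `tqdm(nlp.pipe(paragraphs), total=len(paragraphs))` should give a comparable bar that counts paragraphs instead of characters. If you need a single document at the end, spaCy v3 provides `Doc.from_docs` for concatenating the pieces.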