Is there a way to convert multiple spacy docs to one conllu file in Python?

I want to parse sentences with a spacy pipeline and then convert the docs into a single conllu file. But with

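from spacy_conll import init_parser

texts = ["First sentence.", "Second sentence.", "Third sentence."]

# language and parser are defined elsewhere (a model/language name and a parser name)
nlp = init_parser(language,
                  parser,
                  include_headers=True)

docs = list(nlp.pipe(texts))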

I get multiple docs which I could convert to multiple conllu files with

for doc in docs:
    conll = doc._.conll_str

But I want a single file.

If I merge the docs to one doc with

from spacy.tokens import Doc

concat_doc = Doc.from_docs(docs)
conll = concat_doc._.conll_str

I get the following warning: UserWarning: [W101] Skipping Doc custom extension 'conll_str' while merging docs.

The extension is therefore not set on the merged doc, which results in TypeError: write() argument must be str, not None when I try to write conll to a file.

If I loop through the docs and append each one to a file, every sentence gets 1 as its sent_id, because the numbering restarts for each doc. I don't want that either.

Does anyone have an idea how I could manage to parse the sentences and write them to a single conllu file? Thank you very much.
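
One possible workaround, sketched here under assumptions rather than taken from the spacy_conll documentation: keep the docs separate, collect each doc._.conll_str, and renumber the sent_id headers while joining the strings. The helper name merge_conll is hypothetical, and the sketch assumes each header line looks exactly like "# sent_id = 1", which is what include_headers=True appears to produce.

```python
import re

# Assumption: each sentence header has the exact form "# sent_id = <number>".
SENT_ID = re.compile(r"^# sent_id = \d+$", flags=re.MULTILINE)


def merge_conll(conll_strings):
    """Join per-doc CoNLL-U strings and number sent_id consecutively.

    merge_conll is a hypothetical helper, not part of spaCy or spacy_conll.
    """
    counter = 0

    def renumber(match):
        # Replace every sent_id header with the next global sentence number.
        nonlocal counter
        counter += 1
        return f"# sent_id = {counter}"

    parts = [SENT_ID.sub(renumber, s.strip("\n")) for s in conll_strings]
    # CoNLL-U puts a blank line after every sentence, including the last one.
    return "\n\n".join(parts) + "\n\n"


# Intended use with the docs from the question (not executed here):
# merged = merge_conll(doc._.conll_str for doc in docs)
# with open("output.conllu", "w", encoding="utf-8") as f:
#     f.write(merged)
```

Because the renumbering works on the strings, it avoids Doc.from_docs and the W101 warning altogether. The file name output.conllu is only a placeholder.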