Let's say I have a chunked corpus like the one below, saved in a file called test.txt:
[Rapunzel/NNP] let/VBD down/RP [her/PP$ long/JJ golden/JJ hair/NN]
Then I can load it with ChunkedCorpusReader:
>>> from nltk.corpus.reader import ChunkedCorpusReader
>>> reader = ChunkedCorpusReader('.','test.txt')
>>> reader.chunked_sents()[0]
Tree('S', [Tree('NP', [('Rapunzel', 'NNP')]), ('let', 'VBD'), ('down', 'RP'), Tree('NP', [('her', 'PP$'), ('long', 'JJ'), ('golden', 'JJ'), ('hair', 'NN')])])
>>> print(reader.chunked_sents()[0])
(S
  (NP Rapunzel/NNP)
  let/VBD
  down/RP
  (NP her/PP$ long/JJ golden/JJ hair/NN))
Then I made some changes to the Tree object (say, I switched the chunk label from NP to NPP) and stored the result in a variable called new:
>>> print(new)
(S
  (NPP Rapunzel/NNP)
  let/VBD
  down/RP
  (NPP her/PP$ long/JJ golden/JJ hair/NN))
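The question doesn't show how new was produced. One way to make that kind of change, sketched under the assumption that every NP chunk should become NPP, is to copy the tree and relabel the matching subtrees with Tree.subtrees() and Tree.set_label(). The helper name relabel_chunks is hypothetical, and tagstr2tree is used here only to rebuild the sample sentence the way ChunkedCorpusReader parses it by default.

```python
from nltk.chunk.util import tagstr2tree
from nltk.tree import Tree


def relabel_chunks(tree: Tree, old: str, new_label: str) -> Tree:
    """Hypothetical helper: return a copy of `tree` with every `old` chunk relabelled."""
    result = tree.copy(deep=True)  # leave the original tree untouched
    # subtrees() walks the whole tree; the filter keeps only chunks labelled `old`.
    for subtree in result.subtrees(filter=lambda t: t.label() == old):
        subtree.set_label(new_label)
    return result


# The sentence from test.txt, parsed with ChunkedCorpusReader's default chunk parser.
sent = tagstr2tree("[Rapunzel/NNP] let/VBD down/RP [her/PP$ long/JJ golden/JJ hair/NN]")
new = relabel_chunks(sent, "NP", "NPP")
print(new)
```

Copying first keeps the tree returned by the reader unchanged, so both versions can be compared or saved side by side.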
Now what I want to do is save this new Tree to a file and load it with ChunkedCorpusReader or any other reader, as I did with test.txt. However, I couldn't find a way to save an NLTK Tree object to a file, let alone read it back from one. Can anyone help?
The default conversion to string, which print gave you, is not bad: it merges words with their POS tags and indents new lines properly. Since file.write() doesn't automatically convert its argument to a string, you must pass str(newtree) to the file's write method.

For more control over the appearance of the tree's string representation, use the tree method pformat(). Note that Tree.pformat() was called Tree.pprint() in earlier versions of NLTK; in the latest version, Tree.pformat() returns a string, while Tree.pprint() writes to stdout.

If you want your tree to be delimited by square brackets, add the option parens="[]" to pformat().
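The answer covers writing; reading the text back in is a separate step. The pformat() output is not the plain [...] chunk format that ChunkedCorpusReader parses by default (its default parser gives every bracketed chunk the label NP), so this minimal sketch reads the file with Tree.fromstring instead, passing the same bracket pair and read_leaf=str2tuple to turn each word/TAG leaf back into a tuple. It assumes one tree per blank-line-separated block; save_trees and load_trees are hypothetical helper names, and the sample tree is a stand-in for the question's new.

```python
import os
import tempfile

from nltk.chunk.util import tagstr2tree
from nltk.tag import str2tuple
from nltk.tree import Tree


def save_trees(trees, path, parens="()"):
    """Hypothetical helper: write each tree with pformat(), separated by a blank line."""
    with open(path, "w", encoding="utf-8") as f:
        for tree in trees:
            # write() only accepts strings, so format the tree explicitly.
            f.write(tree.pformat(parens=parens))
            f.write("\n\n")


def load_trees(path, parens="()"):
    """Hypothetical helper: parse a file written by save_trees back into Tree objects."""
    with open(path, encoding="utf-8") as f:
        blocks = f.read().split("\n\n")
    # read_leaf=str2tuple turns each "word/TAG" leaf back into a (word, tag) tuple.
    return [
        Tree.fromstring(block, brackets=parens, read_leaf=str2tuple)
        for block in blocks
        if block.strip()
    ]


# Stand-in for the question's `new` tree: the same sentence with NPP chunks.
new = tagstr2tree(
    "[Rapunzel/NNP] let/VBD down/RP [her/PP$ long/JJ golden/JJ hair/NN]",
    chunk_label="NPP",
)

path = os.path.join(tempfile.mkdtemp(), "new_trees.txt")
save_trees([new], path, parens="[]")  # square brackets, as suggested above
loaded = load_trees(path, parens="[]")
print(loaded[0])
print(loaded[0] == new)
```

This simple format only round-trips cleanly if words and tags contain no whitespace or bracket characters, and str2tuple upper-cases tags when reading them back.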