How to process text from Project Gutenberg?

38 views. Asked by FrostyGuy on 31 July 2025 at 03:10.

I'm using C#. I was given a task to process .txt files of books from Project Gutenberg. Here is an excerpt from the task:

"Each file should be parsed into: sentences; words; punctuation. For each file you should generate a new file. The name of that file is the name of the book. Each of those files should contain: the longest sentence by number of characters; the shortest sentence by number of words; the longest word; the most common letter; the words sorted by number of uses in descending order."

How do I omit tables of contents, chapter titles, and other non-sentence elements? The solution uses Stanford NLP to separate sentences into words.
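For the per-book statistics listed in the task excerpt, here is a minimal C# sketch using LINQ. It assumes the sentences have already been produced by a sentence splitter and that non-sentence elements were filtered out beforehand. The class name `BookStats`, its method names, and the definition of a "word" (a run of letters, optionally joined by apostrophes or hyphens) are assumptions made for illustration and are not part of the task.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: computes the statistics named in the task from a
// non-empty list of sentences that already contain at least one word each.
public static class BookStats
{
    // Assumption: a word is a run of letters, optionally joined by ' or -.
    private static readonly Regex WordPattern =
        new Regex(@"\p{L}+(?:['\u2019-]\p{L}+)*");

    public static List<string> Words(string sentence) =>
        WordPattern.Matches(sentence).Cast<Match>().Select(m => m.Value).ToList();

    // Longest sentence by number of characters (first one wins on ties).
    public static string LongestSentence(IEnumerable<string> sentences) =>
        sentences.OrderByDescending(s => s.Length).First();

    // Shortest sentence by number of words (first one wins on ties).
    public static string ShortestSentenceByWords(IEnumerable<string> sentences) =>
        sentences.OrderBy(s => Words(s).Count).First();

    public static string LongestWord(IEnumerable<string> sentences) =>
        sentences.SelectMany(Words).OrderByDescending(w => w.Length).First();

    // Most common letter, ignoring case, digits, and punctuation.
    public static char MostCommonLetter(IEnumerable<string> sentences) =>
        sentences.SelectMany(s => s)
                 .Where(c => char.IsLetter(c))
                 .Select(c => char.ToLowerInvariant(c))
                 .GroupBy(c => c)
                 .OrderByDescending(g => g.Count())
                 .First().Key;

    // Words (lower-cased) with their counts, most frequent first;
    // ties are broken alphabetically so the order is deterministic.
    public static List<KeyValuePair<string, int>> WordFrequencies(IEnumerable<string> sentences) =>
        sentences.SelectMany(Words)
                 .Select(w => w.ToLowerInvariant())
                 .GroupBy(w => w)
                 .OrderByDescending(g => g.Count())
                 .ThenBy(g => g.Key, StringComparer.Ordinal)
                 .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
                 .ToList();
}
```

Each method throws on an empty input, so a real program would want to check for that before writing the output file named after the book.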

I installed Stanford NLP, but it often treats tables of contents, chapter titles, and other phrases that are not actual sentences as sentences.
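One possible approach is to filter the raw text before it reaches the sentence splitter. The sketch below is a heuristic, not a definitive solution. It assumes the file contains the usual Project Gutenberg `*** START OF ...` and `*** END OF ...` marker lines (the exact wording varies between files), that paragraphs are separated by blank lines, and that real prose has at least four words, contains lower-case letters, and ends in sentence punctuation. The class `GutenbergCleaner`, its method names, and the thresholds are all hypothetical and would need tuning per book.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical pre-filter: keeps only paragraphs that look like prose.
public static class GutenbergCleaner
{
    // Drops everything before the "*** START OF" line and after "*** END OF".
    // If a marker is missing, that side of the text is left untouched.
    public static string StripLicense(string text)
    {
        int start = text.IndexOf("*** START OF", StringComparison.OrdinalIgnoreCase);
        if (start >= 0)
        {
            int lineEnd = text.IndexOf('\n', start);
            text = lineEnd >= 0 ? text.Substring(lineEnd + 1) : string.Empty;
        }
        int end = text.IndexOf("*** END OF", StringComparison.OrdinalIgnoreCase);
        return end >= 0 ? text.Substring(0, end) : text;
    }

    // Heuristic test for a prose paragraph (assumed thresholds, tune as needed).
    public static bool LooksLikeProse(string block)
    {
        string s = Regex.Replace(block, @"\s+", " ").Trim();
        if (s.Length == 0) return false;

        // Too short to be a paragraph: "CONTENTS", "THE END", "CHAPTER I.".
        if (s.Split(' ').Length < 4) return false;

        // Starts like a heading: "Chapter 3", "BOOK II", "Part IV".
        if (Regex.IsMatch(s, @"^(chapter|book|part|volume)\s+([ivxlcdm]+|\d+)\b",
                          RegexOptions.IgnoreCase)) return false;

        // All upper case usually means a title, not a sentence.
        if (!s.Any(c => char.IsLower(c))) return false;

        // Must end in . ! or ?, optionally followed by closing quotes or brackets.
        return Regex.IsMatch(s, @"[.!?][""'\u201D\u2019)\]]*$");
    }

    // Splits the body on blank lines and returns the paragraphs that pass the
    // test, with internal line breaks collapsed to single spaces.
    public static IEnumerable<string> ProseParagraphs(string rawText)
    {
        string body = StripLicense(rawText).Replace("\r\n", "\n");
        return Regex.Split(body, @"\n\s*\n")
                    .Where(LooksLikeProse)
                    .Select(p => Regex.Replace(p, @"\s+", " ").Trim());
    }
}
```

The paragraphs returned by `ProseParagraphs` can then be passed to Stanford NLP for sentence and word splitting. The heuristic will misclassify some text, for example a paragraph that really begins with "Part I" or a table of contents whose last entry ends in a period, so it is worth spot-checking the output on each book.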
