TEXT - How to break a paragraph into smaller sentences (no indication of end of sentence)


I am using both R and Python and am trying to learn text-based analytics and NLP to some extent.

Question: How do I split a paragraph that consists of several sentences run together with no delimiters, like the one below?

Sentence = I like the application i like the system i do not like the process being followed.

I want to split this sentence into

  1. I like the application
  2. i like the system
  3. i do not like the process being followed

Note: I am able to split a paragraph like the one below, because it has a "." to indicate the end of each sentence.

Sentence = I like the application. I like the system. I do not like the process being followed.

Vj


There is 1 best solution below


I can propose an approach that may help. Since you don't have a sentence delimiter, you can proceed as follows:

  • Apply syntactic analysis to extract the syntactic category (part of speech) of each word in the paragraph.

    Example: I like the application i like the system i do not like the process being followed

    will produce: PP VB DT NN...

    To perform the syntactic analysis, I recommend using the Stanford Parser.

    PP: Personal Pronoun

    VB: VerB

    DT: DeTerminer

    NN: NouN

    You can see that a sentence has a syntactic pattern that can be used to split a paragraph into sentences.

  • Build a model of the possible syntactic structures of a sentence. By a model I mean a file or database that contains the syntactic patterns of sentences.

    Example: a model can contain the following lines:

    PP VB DT NN --> (I eat an apple)

    VB ADJ NN --> (create new methods)

    To construct your model, you can analyze many sentences (the larger your set of sentences, the more accurate your system will be). You can use a corpus that you build yourself.

  • Once you have built your model, you can start writing your program. The main steps of your algorithm will be:

    1- Receive the input paragraph (as an input or file).

    2- Apply Stanford Parser to produce the syntactic tree of the paragraph.

    3- Split the paragraph by comparing parts of it with the previously constructed syntactic patterns (your sentence model).

    You will need to measure the similarity of a part of the paragraph with a sentence-model.
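The three steps above can be sketched roughly as follows. This is only a minimal illustration under several assumptions: the `LEXICON` dictionary is a hypothetical hand-written stand-in for a real tagger such as the Stanford Parser, the `MODEL` list is a tiny made-up sentence model, the `RB` (adverb) tag is added for "not", and similarity is measured with `difflib.SequenceMatcher` from the Python standard library, which is just one possible choice of measure.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon standing in for a real tagger.
# Tags follow the answer's notation; RB (adverb) is an added assumption.
LEXICON = {
    "i": "PP", "like": "VB", "do": "VB", "not": "RB", "the": "DT",
    "application": "NN", "system": "NN", "process": "NN",
    "being": "VB", "followed": "VB",
}

# Hypothetical sentence model: each entry is the tag pattern of one sentence type.
MODEL = [
    ("PP", "VB", "DT", "NN"),
    ("PP", "VB", "RB", "VB", "DT", "NN", "VB", "VB"),
]


def tag(words):
    """Look up a tag for each word; unknown words default to NN (an assumption)."""
    return [LEXICON.get(word.lower(), "NN") for word in words]


def similarity(tags, pattern):
    """Similarity in [0, 1] between a window of tags and a model pattern."""
    return SequenceMatcher(None, list(tags), list(pattern)).ratio()


def split_paragraph(paragraph, model=MODEL, min_similarity=0.75):
    """Greedily cut the paragraph where the tags best match a model pattern."""
    words = paragraph.rstrip(".").split()
    tags = tag(words)
    sentences = []
    start = 0
    while start < len(words):
        # Score each pattern against the window of the same length.
        scored = [
            (similarity(tags[start:start + len(pattern)], pattern), len(pattern))
            for pattern in model
        ]
        # Highest score wins; on a tie the longer pattern is preferred.
        score, length = max(scored)
        if score < min_similarity:
            # No pattern fits well enough: keep the remainder as one piece.
            length = len(words) - start
        sentences.append(" ".join(words[start:start + length]))
        start += length
    return sentences


text = ("I like the application i like the system "
        "i do not like the process being followed.")
for sentence in split_paragraph(text):
    print(sentence)
```

With a real tagger and a model learned from a large corpus, the greedy choice and the `min_similarity` threshold would both need tuning; they are placeholders here.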

I hope this gives you an idea of how to approach the problem.

In Python, you will probably need to work with NLTK (Natural Language Toolkit).