Remove stop words from a string in ASP.NET C#

I am having trouble creating code which removes stop words from a string. Here is my code:

String Review="The portfolio is fine except for the fact that the last movement of sonata #6 is missing. What should one expect?";

string[] arrStopword = new string[] {"a", "i", "it", "am", "at", "on", "in", "to", "too", "very","of", "from", "here", "even", "the", "but", "and", "is","my","them", "then", "this", "that", "than", "though", "so", "are"};
StringBuilder sbReview = new StringBuilder(Review);
foreach (string word in arrStopword)
{
    sbReview.Replace(word, "");
}
Label1.Text = sbReview.ToString();

When I run it, Label1.Text is "The portfolo s fne except for fct tht lst movement st #6 s mssng. Wht should e expect? "

I expect it to return "portfolio fine except for fact last movement sonata #6 is missing. what should one expect?"

Does anybody know how to solve this?

There are 4 answers below

0
BEST ANSWER

You could use " a ", " i ", etc. to make sure the program only removes those entries when they appear as whole words (that is, with spaces around them). Replace each match with a single space so the remaining words stay separated.
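Padding with spaces misses stop words at the very start or end of the string, next to punctuation, or in a different case ("The" vs "the"). A minimal sketch of the same whole-word idea using regex word boundaries (`\b`) instead of literal spaces follows. This swaps in `Regex` for `StringBuilder.Replace`, and the class and method names are hypothetical, not from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWordRemover
{
    // Hypothetical helper; only the stop-word array and the review text come from the question.
    public static string RemoveStopWords(string text, string[] stopWords)
    {
        // Build one alternation such as \b(?:a|i|it|...)\b. Regex.Escape guards
        // against stop words that contain regex metacharacters.
        string pattern = @"\b(?:" + string.Join("|", stopWords.Select(w => Regex.Escape(w))) + @")\b";

        // Remove every whole-word match, ignoring case so that "The" matches "the".
        string stripped = Regex.Replace(text, pattern, "", RegexOptions.IgnoreCase);

        // Collapse the runs of whitespace left behind by the removed words.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Called as `WholeWordRemover.RemoveStopWords(Review, arrStopword)` with the values from the question, this returns "portfolio fine except for fact last movement sonata #6 missing. What should one expect?". Note that `\b` also treats an apostrophe as a boundary, so "I'm" would lose its "I".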

0

The problem is that you are comparing sub strings, not words. You need to split the original text, remove the items and then join it again.

try this

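// Requires: using System.Collections.Generic; using System.Linq;
// Split on spaces so that whole words, not substrings, are compared.
List<string> words = Review.Split(' ').ToList();
foreach (string stopWord in arrStopword)
    // RemoveAll drops every occurrence; Remove would only drop the first one.
    words.RemoveAll(w => w.Equals(stopWord, StringComparison.OrdinalIgnoreCase));
string result = String.Join(" ", words);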

The only issue I can see with this is that it doesn't handle punctuation well (a token such as "it," will not match the stop word "it"), but you get the general idea.
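One possible way to deal with the punctuation issue is to compare each token with its leading and trailing punctuation trimmed off, while keeping the original token in the output. This is only a sketch under that assumption; `TokenFilter` and `TrimPunctuation` are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TokenFilter
{
    // Hypothetical helper; only Review and arrStopword come from the question.
    public static string RemoveStopWords(string text, IEnumerable<string> stopWords)
    {
        // Case-insensitive lookup so that "The" is treated like "the".
        var stopSet = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);

        var kept = text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            // Compare the token without surrounding punctuation, but keep the
            // original token (punctuation included) when it is not a stop word.
            .Where(token => !stopSet.Contains(TrimPunctuation(token)));

        return string.Join(" ", kept);
    }

    // Strips punctuation characters from both ends of a token.
    private static string TrimPunctuation(string token)
    {
        int start = 0, end = token.Length;
        while (start < end && char.IsPunctuation(token[start])) start++;
        while (end > start && char.IsPunctuation(token[end - 1])) end--;
        return token.Substring(start, end - start);
    }
}
```

The trade-off is that punctuation attached to a removed stop word disappears with it: "I like it, but the ending is weak." becomes "like ending weak." with the question's stop-word list.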

2

You can use LINQ to solve this problem. First split the string on spaces with Split, then use Except to drop the stop words, and finally rebuild the string with string.Join. Note that Except is a set operation: it also removes repeated words from the result, and it compares case-sensitively by default, so "The" is not matched by "the".

var newString = string.Join(" ", Review.Split(' ').Except(arrStopword));
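
Because `Except` computes a set difference, a repeated non-stop word appears only once in its output. If duplicates and word order should be preserved, filtering with `Where` is one alternative. The sketch below puts the two side by side; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqFilter
{
    // The approach from the answer: set difference, so repeated words collapse.
    public static string WithExcept(string text, string[] stopWords)
    {
        return string.Join(" ", text.Split(' ').Except(stopWords));
    }

    // Alternative sketch: keep every token that is not a stop word,
    // duplicates included, and ignore case when comparing.
    public static string WithWhere(string text, string[] stopWords)
    {
        var stopSet = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);
        return string.Join(" ", text.Split(' ').Where(w => !stopSet.Contains(w)));
    }
}
```

For the input "very good very good movie" with the single stop word "very", `WithExcept` returns "good movie" while `WithWhere` returns "good good movie".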

0

Alternatively, you can use the dotnet-stop-words package and simply call its RemoveStopWords method:

(yourString).RemoveStopWords("en");