I'm trying to find similar articles in database via correlation.
So i split text in array of words, then delete frequently used words (articles,pronouns and so on), then compare two text with pearson coefficient function. For some text it's works but for other it's not so good(texts with large text have higher coefficient).
Can somebody advice a good method to find related texts?
Some of the problems you mention boild down to normalizing over document length and overall word frequency. Try tf-idf.