Examples of words:
- ball
- encyclopedia
- tableau
Examples of random strings:
- qxbogsac
- jgaynj
- rnnfdwpm
Of course, a random string may happen to be a real word in some language, or look like one. But a human can usually tell whether something looks 'random' or not, mostly by checking whether it can be pronounced.
I tried calculating entropy to distinguish the two, but it's far from perfect. Do you have any other ideas or algorithms that work?
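To illustrate what I mean by entropy, here is a minimal sketch. It assumes per-character Shannon entropy over the letters of the string; the function name `char_entropy` is just for illustration.

```python
import math
from collections import Counter


def char_entropy(s):
    """Per-character Shannon entropy of s, in bits (0.0 for an empty string)."""
    if not s:
        return 0.0
    counts = Counter(s)
    total = len(s)
    # Sum -p * log2(p) over the relative frequency p of each distinct character.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


for s in ["ball", "encyclopedia", "tableau", "qxbogsac", "jgaynj", "rnnfdwpm"]:
    print(s, round(char_entropy(s), 3))
```

With this measure "encyclopedia" (about 3.25 bits) scores higher than "qxbogsac" (3.0 bits), since the result depends mostly on length and on how many letters repeat, not on whether the string is pronounceable.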
There is one important requirement, though: I can't use heavyweight libraries like nltk, and I can't use dictionaries. What I need is a simple, quick heuristic that works in most cases.
Caveat: I am not a natural language expert.
Assuming that what is described in the link "If You Can Raed Tihs, You Msut Be Raelly Smrat" is authentic, a simple approach would be:
- Create a Python dict of known words, keyed by the first and last character of each word.
- For any given word, look it up in the dict (remember, the key is the first and last character of the word).
- Check whether the characters of the entries stored under that key match the characters of your needle.
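The steps above could be sketched roughly as follows. This is only one reading of the idea: it assumes that "characters match" means the needle uses the same multiset of letters as a known word with the same first and last character. The names `build_index` and `looks_like_word` and the tiny word list are made up for illustration.

```python
from collections import defaultdict


def build_index(words):
    """Map (first char, last char) -> set of sorted-letter signatures."""
    index = defaultdict(set)
    for word in words:
        word = word.lower()
        if word:
            index[(word[0], word[-1])].add("".join(sorted(word)))
    return index


def looks_like_word(needle, index):
    """True if needle has the same first/last character and the same
    letters as some indexed word (inner letters may be scrambled)."""
    needle = needle.lower()
    if not needle:
        return False
    signature = "".join(sorted(needle))
    # .get() avoids creating empty entries in the defaultdict on a miss.
    return signature in index.get((needle[0], needle[-1]), ())


# Hypothetical word list; a real one would be much larger.
index = build_index(["ball", "encyclopedia", "tableau"])
print(looks_like_word("tbaelau", index))   # scrambled "tableau"
print(looks_like_word("qxbogsac", index))  # random string
```

Lookups are cheap because each needle is compared only against the signatures in its own first/last-character bucket.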
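A comparatively slower approach would be to use difflib.get_close_matches(word, possibilities[, n][, cutoff]) from the standard library, which returns the entries of possibilities that are most similar to word.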
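A minimal sketch of the difflib variant, assuming you already have some list of known words to compare against. The wrapper `closest_words` and the sample list are illustrative only; `n=3` and `cutoff=0.6` are difflib's documented defaults.

```python
import difflib


def closest_words(candidate, possibilities, n=3, cutoff=0.6):
    """Return up to n known words whose similarity to candidate is >= cutoff."""
    return difflib.get_close_matches(candidate.lower(), possibilities, n=n, cutoff=cutoff)


# Hypothetical word list; a real one would be much larger.
known = ["ball", "encyclopedia", "tableau"]
print(closest_words("bal", known))       # near miss of "ball"
print(closest_words("qxbogsac", known))  # random string
```

An empty result suggests the string is not close to any known word. Each call compares the candidate against every entry in the list, which is why this is slower than the dict lookup.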