I have a big problem with replacing some characters in Java. I would like to remove all characters that are not letters, digits or Polish national characters such as "ę" and "ą". When I use replaceAll("\\W", " "), these national characters are removed as well.
Example string: "Jest źle, ale będzie lepiej."
What I get: "Jest le ale b dzie lepiej "
What I want: "Jest źle ale będzie lepiej "
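A minimal reproduction of the behaviour described above; the class name WordCharDemo is made up for illustration. Inside a Java string literal the backslash has to be doubled, so the regex \W is written "\\W".

```java
public class WordCharDemo {
    // Replaces every non-word character with a space, as in the question.
    // By default Java's \W is the complement of [a-zA-Z_0-9], so Polish
    // letters such as ź and ę are replaced along with the punctuation.
    public static String replaceNonWord(String s) {
        return s.replaceAll("\\W", " ");
    }

    public static void main(String[] args) {
        String input = "Jest źle, ale będzie lepiej.";
        // Each matched character becomes its own space:
        // prints [Jest  le  ale b dzie lepiej ]
        System.out.println("[" + replaceNonWord(input) + "]");
    }
}
```

Every match is replaced one-for-one, so "źle," leaves two spaces before "le", one for the original space and one for the "ź".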
Sorry for my not-so-good English :)
Your English is better than Java's Polish. Java's regex does not speak Polish: by default \w covers only the ASCII letters a..z and A..Z (plus digits and the underscore; GREP was obviously designed by programmers), and \W matches everything else, so ź and ę count as "non-word" characters. That's fair, actually: the "normal" characters of one language are "weird" for another.
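A small check of which characters \w accepts, as a sketch; the class and method names are illustrative. The second pattern uses Pattern.UNICODE_CHARACTER_CLASS (available since Java 7, also spelled as the embedded flag (?U)), which switches \w to a Unicode-aware definition; the default, as described above, stays ASCII-only.

```java
import java.util.regex.Pattern;

public class WordCharCheck {
    // Default (ASCII-only) meaning of \w: [a-zA-Z_0-9].
    private static final Pattern ASCII_WORD = Pattern.compile("\\w");
    // Opt-in Unicode meaning of \w (Java 7+); letters like ź and Ą match.
    private static final Pattern UNICODE_WORD =
            Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS);

    public static boolean isAsciiWordChar(char c) {
        return ASCII_WORD.matcher(String.valueOf(c)).matches();
    }

    public static boolean isUnicodeWordChar(char c) {
        return UNICODE_WORD.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // a, z, _ and 9 are word chars either way; ź, ę and Ą only with the flag.
        for (char c : "az_9źęĄ".toCharArray()) {
            System.out.println(c + "  default: " + isAsciiWordChar(c)
                    + "  unicode: " + isUnicodeWordChar(c));
        }
    }
}
```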
You can list the few extra non-ASCII letters in a custom negated character class:
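A minimal sketch of such a class, assuming the only extra letters to keep are the nine Polish diacritic letters (ą ć ę ł ń ó ś ź ż) in both cases; the class name PolishCleaner is made up for the example. Unlike \W, this class also replaces the underscore, which fits the "letters and numbers only" requirement. The source file has to be compiled as UTF-8 (the default since JDK 18; on older JDKs pass -encoding UTF-8 to javac).

```java
public class PolishCleaner {
    // Anything that is NOT an ASCII letter, an ASCII digit or one of the
    // nine Polish diacritic letters (lower and upper case) matches.
    // Assumption: Polish is the only language whose letters must survive.
    private static final String NOT_POLISH_WORD_CHAR =
            "[^a-zA-Z0-9ąćęłńóśźżĄĆĘŁŃÓŚŹŻ]";

    // Replaces each matching character with a single space.
    public static String clean(String s) {
        return s.replaceAll(NOT_POLISH_WORD_CHAR, " ");
    }

    public static void main(String[] args) {
        // prints [Jest źle  ale będzie lepiej ]
        System.out.println("[" + clean("Jest źle, ale będzie lepiej.") + "]");
    }
}
```

The comma and the space after it each become a space, hence the double space. Adding a + after the class ("[^a-zA-Z0-9ąćęłńóśźżĄĆĘŁŃÓŚŹŻ]+") would replace each run of unwanted characters with one space, giving "Jest źle ale będzie lepiej ".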
(you should add the other accented characters as well, and perhaps remove non-Polish characters such as Q and X).
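A different technique, if listing accented letters by hand gets tedious, is to use Unicode property classes: \p{L} matches a letter from any script and \p{N} any numeric character, so [^\p{L}\p{N}] keeps ź and ę without naming them. The trade-off is that it also keeps Q, X and the letters of every other alphabet, so it cannot restrict the text to Polish. The class name UnicodeCleaner is illustrative.

```java
public class UnicodeCleaner {
    // \p{L}: any Unicode letter (Latin, Polish, Greek, Cyrillic, ...).
    // \p{N}: any Unicode numeric character.
    // Everything else, including the underscore, becomes a space.
    public static String clean(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}]", " ");
    }

    public static void main(String[] args) {
        // prints [Jest źle  ale będzie lepiej ]
        System.out.println("[" + clean("Jest źle, ale będzie lepiej.") + "]");
    }
}
```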