I'm using Kotest data generators for tests, which are pretty flexible and can do almost everything. However, the String generators are very technical, and it's tough to generate real-world text with them.
For example, generating Strings with printable ASCII characters (space to ~) is pretty far from real-world use cases, even from real-world ASCII input, since no newlines or tabs are included. In the real real world, all sorts of UTF-8 characters can be entered in browsers with various language settings.
There's the stringPattern generator in Kotest, but it uses RgxGen 1.4, which does not yet support generation based on character classes (release 1.5 is pending). Otherwise I'd try something like [\p{Punct}]|[\p{Graph}]|[\p{Print}]|[\p{Blank}], but I know little about Unicode character classes, and I feel that an existing solution to the problem would be far better than figuring this out myself.
I'm using Kotest 5.8.0 in a Kotlin 1.9 project.
If Lorem Ipsum can pass as real-world text for you, the lorem library (linked below) is easy to use.
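As a rough sketch of how this could plug into Kotest, the library's output can be wrapped in a custom Arb. This assumes the com.thedeanda.lorem:lorem and io.kotest:kotest-property artifacts are on the test classpath. The loremText helper and its maxParagraphs parameter are made-up names for illustration, not part of either library.

```kotlin
import com.thedeanda.lorem.LoremIpsum
import io.kotest.property.Arb
import io.kotest.property.arbitrary.arbitrary
import io.kotest.property.arbitrary.take

// Shared generator instance from the lorem library.
private val lorem = LoremIpsum.getInstance()

// Hypothetical helper (not a Kotest or lorem API): yields between one and
// maxParagraphs paragraphs of Lorem Ipsum text per sample.
fun Arb.Companion.loremText(maxParagraphs: Int = 3): Arb<String> = arbitrary { rs ->
    // The paragraph count comes from Kotest's random source; the words
    // themselves are picked by the lorem library's own randomness.
    val count = rs.random.nextInt(1, maxParagraphs + 1)
    lorem.getParagraphs(count, count)
}

fun main() {
    // Print a few samples to see what the generator produces.
    Arb.loremText().take(3).forEach(::println)
}
```

In a test this could be passed to checkAll like any other Arb. One caveat: since the words are chosen by the library rather than by Kotest's RandomSource, re-running with the same Kotest seed will probably not reproduce the exact same text. It also only covers Latin-looking filler text, so it does not address the non-ASCII part of the question.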
Source: https://github.com/mdeanda/lorem