How to generate test data based on a Java object model?


Let's assume that there is a Java class object model in which all fields are annotated with JSR-303 constraints. Something like this:

@SuperBuilder
@ToString
@EqualsAndHashCode
public class Address {
    @Size(min = 1, max = 20)
    @NotNull
    public final String line1;
    @Size(min = 1, max = 20)
    public final String line2;
    @Size(min = 1, max = 20)
    public final String city;
    @Size(min = 1, max = 20)
    public final String country;
    @Size(min = 1, max = 20)
    public final String zipCode;
}

and maybe a few dozen more like this:

@SuperBuilder
@ToString
@EqualsAndHashCode
public class Person {
    @NotNull
    @Valid
    public final Phone phone;
    @NotNull
    @Valid
    public final Address address;
}

Manually writing and maintaining test data for these is extremely tedious.

I believe that this manual process can be automated, let's say:

  • minimum values: null for nullable fields, minimum value for numeric fields, minimum length for string/collection fields, and false for booleans
  • maximum values: not null for nullable fields, maximum value for numeric fields, maximum length for string/collection fields, and true for booleans
  • below minimum values: null for nullable fields... you get the rest :)
  • above maximum values

Then the programmer should be able to call:

SomeMagicalGenerator.forClass(Person.class) // "for" is a reserved word in Java
  .generateMaximum() // returns a Stream<Person>
  .distinct()
  .limit(10);

and use it as TestNG DataProvider for example.
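To make the four categories above concrete, here is a minimal sketch of how boundary lengths could be derived from the annotations by reflection. It is not an existing library: BoundaryValues, Boundary and stringsFor are hypothetical names, the nested @Size is a local stand-in for the Bean Validation annotation so that the sketch compiles without dependencies, and Address is trimmed to two non-final fields. It only covers String fields and assumes an explicit max on every @Size.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundaryValues {

    // Local stand-in for the Bean Validation @Size annotation (assumption:
    // real code would read the annotation from the validation API instead).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Size {
        int min() default 0;
        int max() default Integer.MAX_VALUE;
    }

    // Trimmed-down version of the Address class from the question.
    public static class Address {
        @Size(min = 1, max = 20) public String line1;
        @Size(min = 1, max = 20) public String city;
    }

    public enum Boundary { MINIMUM, MAXIMUM, BELOW_MINIMUM, ABOVE_MAXIMUM }

    // Maps each @Size-annotated String field to a string of the requested
    // boundary length. Assumes max is set explicitly; MAXIMUM with the
    // default max would try to build an enormous string.
    public static Map<String, String> stringsFor(Class<?> type, Boundary boundary) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            Size size = field.getAnnotation(Size.class);
            if (size == null || field.getType() != String.class) {
                continue;
            }
            int length;
            switch (boundary) {
                case MINIMUM: length = size.min(); break;
                case MAXIMUM: length = size.max(); break;
                case BELOW_MINIMUM: length = size.min() - 1; break;
                default: length = size.max() + 1; break; // ABOVE_MAXIMUM
            }
            // Skip lengths that cannot exist: below a minimum of 0, or an
            // int overflow when max is Integer.MAX_VALUE.
            if (length < 0) {
                continue;
            }
            values.put(field.getName(), "x".repeat(length)); // Java 11+
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(stringsFor(Address.class, Boundary.ABOVE_MAXIMUM));
    }
}
```

Turning such a map into real Person or Address instances (via the Lombok builder, nested @Valid fields, numeric constraints, null handling) is the larger part of the job and is left out here.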

At first Instancio looked promising :) Checked it here: https://github.com/adrian-herscu/instancio-experiment

But... I failed to find a way to make it generate minimum and maximum values according to the annotations. Ideally, if that worked, making it generate out-of-range values should be workable too.

Any ideas? Suggestions? Am I missing something?


There are 2 best solutions below

2
On

If you enable the Keys.BEAN_VALIDATION_ENABLED setting, then Instancio will generate valid objects using the constraints.

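import org.instancio.Instancio;
import org.instancio.settings.Keys;
import org.instancio.settings.Settings;
Person person = Instancio.of(Person.class)
    .withSettings(Settings.create().set(Keys.BEAN_VALIDATION_ENABLED, true))
    .create();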

See: https://www.instancio.org/user-guide/#bean-validation

5
On

I hope I understand your intention correctly... two options for interpretation:

To me this sounds dangerously close to testing the implementation of the annotations, rather than your own code.

I don't expect to learn much from this kind of test. If the annotations were implemented manually: Sure - validate that the validation is correct. But this way, personally, I'd skip writing those tests.

What would happen when such a test fails? My expectation is: If a test fails, the fix is to correct the annotation, which risks breaking any external component that relies on the original annotation. But in general, you're just testing that you have used the "correct" annotation - this is something that a simple review can do (after all, you review your test for the correct assumptions as well, right? This way you just have the same review to do twice: once for the annotation and once for the test).

Another option:

If you want to generate random objects: Are those repeatable for each test? Is the data expected to be completely random? You use the example of an address: Certainly, "aiwjelfkmjlsk" does not sound like a description for any point on the planet, though "krk" actually is. If you really just want to validate static criteria like length and min/max, you're right. But if there's any semantics to the data, you haven't won anything.

If the random values are not repeatable, you might have random test failures, with the next run succeeding due to other values being generated.
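A minimal sketch of the repeatable variant, using only java.util.Random: when every random value is drawn from a generator created with a known seed, a failing run can be replayed by reusing that seed. SeededStrings and randomString are hypothetical names chosen for illustration, and the length range 1 to 20 mirrors the @Size constraints in the question.

```java
import java.util.Random;

public class SeededStrings {

    // Builds a lowercase string whose length lies in [min, max], drawing all
    // randomness from the supplied generator.
    public static String randomString(Random random, int min, int max) {
        int length = min + random.nextInt(max - min + 1);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long seed = 42L; // log this value so a failing run can be replayed
        String first = randomString(new Random(seed), 1, 20);
        String second = randomString(new Random(seed), 1, 20);
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

The seed makes a failure reproducible, but it does nothing for the semantic problem above: the values are still meaningless as addresses.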

Also there's value in explicit input values for tests. I recall writing a test feeding random data into a "credit card transfer money" web application (20 years ago), literally by "sitting on the keyboard" to generate data. Imagine my surprise to discover that the application was happy to pay me "3e3" money (aka 3000) instead of rejecting this as not a number. Having such a value explicitly in a test has more value than a stream of random nonreproducible data. And if it's reproducible randomness, you might never reach such values, which provide tremendous documentation value for your application.
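The technology behind that application is not stated, but the same surprise is easy to reproduce in Java, where Double.parseDouble accepts scientific notation. The sketch below pins "3e3" down as an explicit case; ScientificNotationAmount, parseLenient and isPlainAmount are hypothetical names, and the "digits with at most two decimals" rule is an assumed requirement, not something from the original story.

```java
public class ScientificNotationAmount {

    // Lenient parsing: accepts everything Double.parseDouble accepts,
    // including scientific notation such as "3e3".
    public static double parseLenient(String input) {
        return Double.parseDouble(input);
    }

    // Stricter check (assumed rule): plain digits, optionally followed by
    // a dot and one or two decimal digits.
    public static boolean isPlainAmount(String input) {
        return input.matches("\\d+(\\.\\d{1,2})?");
    }

    public static void main(String[] args) {
        System.out.println(parseLenient("3e3"));  // prints 3000.0
        System.out.println(isPlainAmount("3e3")); // prints false
    }
}
```

An explicit "3e3" case in the test suite documents the intended behaviour in a way that a stream of generated values does not.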