I'm working on a Transparent Dirty Detection Agent (tdd-agent). It works really well: it redefines the target classes to implement setDirty()/isDirty() and sets the flag whenever it detects a putfield. Now I want to extend it to detect calls such as add() on collections. How can I detect that call?
When the call takes a complex argument, the generated bytecode separates the getfield from the final method invocation, and I can't figure out how to detect and handle that.
For example:
private List<Foo> listaFoo = new ArrayList<>();
public void testNativeCollections() {
    this.listaFoo.add(new Foo("text").setS("otro text"));
}
This block generates this bytecode:
aload 0 // reference to self
getfield test/OuterTarget.listaFoo:java.util.List
new test/Foo
dup
ldc "text" (java.lang.String)
invokespecial test/Foo.<init>(Ljava/lang/String;)V
ldc "otro text" (java.lang.String)
invokevirtual test/Foo.setS(Ljava/lang/String;)Ltest/Foo;
invokeinterface java/util/List.add(Ljava/lang/Object;)Z
pop
I need to pair the first getfield with the final invokeinterface to detect that this call modifies the object, and then insert a call to setDirty() after it. The main issue is that the code between the getfield and the invoke can be arbitrarily long and complex.
You'd have to build your own version of java's internal static stack-walk analysis code (internal as in, its private API; but java is open source, so there's that), which is extremely complicated. One upside is that bytecode generation has been changed to be more palatable to it (e.g. RET is no longer generated by javac, mostly for this reason).
You need to map each and every opcode that exists (and there are a lot) to the effect it has on the stack. For example, ALOAD_0 (which ASM at least helpfully flattens; ALOAD_0, ALOAD_1, ALOAD_2, ALOAD_3, and ALOAD constant are all flattened into just aload, which helps a bit) has the effect of popping nothing and pushing 1 object onto the stack.
When doing that analysis, all the flavours of invoke require analysing the signature to know what impact they have on the stack: -1 stack for every argument, an additional -1 stack for the implicit receiver argument for all but invokestatic and invokedynamic, then +1 stack if the return type is anything but V.
Here your analyser should be capable of registering that that last invokeinterface is the one that pops the field you are interested in off the stack as receiver of an add method, thus qualifying it for a 'dirty' flag.
I don't think a library exists that is public and maintained in a way that it is intended for analysis like this, but I haven't looked all that much; perhaps now you know what to look for.
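To make the signature rule concrete, here is a minimal sketch in plain Java, not tied to ASM or any other library. The class name InvokeStackEffect and its methods are hypothetical names made up for this illustration, and opcodes are passed as lowercase strings purely for readability. It counts stack values rather than slots, so it deliberately ignores that long and double occupy two slots.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: net operand-stack effect of an invoke instruction, counted in values, not slots. */
final class InvokeStackEffect {

    /** Splits the parameter part of a JVM method descriptor into one entry per argument. */
    static List<String> argumentTypes(String descriptor) {
        List<String> args = new ArrayList<>();
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            int start = i;
            while (descriptor.charAt(i) == '[') {
                i++;                                   // skip array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i);        // object types end at ';'
            }
            i++;                                       // step past the primitive char or the ';'
            args.add(descriptor.substring(start, i));
        }
        return args;
    }

    /** invokestatic and invokedynamic take no implicit receiver; the other invokes do. */
    static boolean hasReceiver(String opcode) {
        return !opcode.equals("invokestatic") && !opcode.equals("invokedynamic");
    }

    /** Values pushed minus values popped for one invoke instruction. */
    static int delta(String opcode, String descriptor) {
        int popped = argumentTypes(descriptor).size() + (hasReceiver(opcode) ? 1 : 0);
        int pushed = descriptor.endsWith(")V") ? 0 : 1;
        return pushed - popped;
    }

    public static void main(String[] args) {
        // The three invokes from the question's bytecode.
        System.out.println(delta("invokespecial", "(Ljava/lang/String;)V"));            // -2
        System.out.println(delta("invokevirtual", "(Ljava/lang/String;)Ltest/Foo;"));   // -1
        System.out.println(delta("invokeinterface", "(Ljava/lang/Object;)Z"));          // -1
    }
}
```

For List.add the receiver and the single argument are popped and the boolean result is pushed, which is why the net effect is -1 and why a pop follows it in the question's bytecode.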
The JVM does this kind of analysis itself when loading a class. For example, if you have this bytecode:
ALOAD_0
DUP
GETFIELD someIntField
IADD
The verifier will abort with a VerifyError and this entire class will never even be loaded. It analysed that IADD instruction, which pops 2 things off the stack and either [A] adds them if they are both ints and pushes the result back on, or [B] all hell breaks loose if they aren't both int values. It determined that this is a B situation, because what's on the stack is this and an int, not 2 ints.
How does it do that? By applying the same principle: analyse every instruction and keep track of what it does to the stack. In its case, it registers the type of each thing on the stack (ALOAD_0 - okay, there is an object in stack[0]. DUP - okay, there is an object in stack[1]. GETFIELD someIntField - okay, there is now an int in stack[1]. IADD - okay, check the types of stack[current] and stack[current-1]. Uh-oh, one of them isn't int - VerifyError!).
You'd do the same thing, except instead of tracking 'int', 'object', etcetera, you track specifically: "Ah, field this-or-that", and once you hit an INVOKEVIRTUAL or INVOKEINTERFACE to a method sig you know 'dirties' its receiver (such as j/u/List.add(...)Z), you can use your stack analysis to know exactly what it is dirtying.
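As a rough illustration of that idea, the sketch below runs a tiny symbolic stack over the bytecode from the question. It is plain Java and does not use ASM: Insn, DirtyReceiverFinder and the DIRTYING set are hypothetical names invented for this example, and only the handful of opcodes that appear in the question are modelled. It assumes straight-line code (no branches, no exception handlers) and counts values rather than slots, so a real agent would need a lot more than this.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

/** Sketch: track where each stack value came from and report fields that receive a dirtying call. */
final class DirtyReceiverFinder {

    /** Hypothetical, simplified instruction: an opcode name plus one textual operand. */
    static final class Insn {
        final String opcode;
        final String operand;
        Insn(String opcode, String operand) { this.opcode = opcode; this.operand = operand; }
    }

    static final String UNKNOWN = "?";
    /** Assumed list of methods that modify their receiver; extend as needed. */
    static final Set<String> DIRTYING = Set.of("java/util/List.add(Ljava/lang/Object;)Z");

    /** Counts arguments in a descriptor that starts at '('. */
    static int argCount(String desc) {
        int count = 0;
        int i = 1;
        while (desc.charAt(i) != ')') {
            while (desc.charAt(i) == '[') i++;                    // array dimensions
            if (desc.charAt(i) == 'L') i = desc.indexOf(';', i);  // object type
            i++;
            count++;
        }
        return count;
    }

    static List<String> dirtiedFields(List<Insn> code) {
        Deque<String> stack = new ArrayDeque<>();
        List<String> dirtied = new ArrayList<>();
        for (Insn in : code) {
            switch (in.opcode) {
                case "aload": stack.push(in.operand.equals("0") ? "this" : UNKNOWN); break;
                case "getfield": {
                    String owner = stack.pop();                   // only fields read from 'this' are tracked
                    stack.push(owner.equals("this") ? "field:" + in.operand : UNKNOWN);
                    break;
                }
                case "new": case "ldc": stack.push(UNKNOWN); break;
                case "dup": stack.push(stack.peek()); break;
                case "pop": stack.pop(); break;
                case "invokespecial": case "invokevirtual": case "invokeinterface": {
                    String desc = in.operand.substring(in.operand.indexOf('('));
                    int n = argCount(desc);
                    for (int i = 0; i < n; i++) stack.pop();      // arguments
                    String receiver = stack.pop();                // implicit receiver
                    if (DIRTYING.contains(in.operand) && receiver.startsWith("field:")) {
                        dirtied.add(receiver.substring("field:".length()));
                    }
                    if (!desc.endsWith(")V")) stack.push(UNKNOWN); // return value
                    break;
                }
                default: throw new IllegalArgumentException("unmodelled opcode: " + in.opcode);
            }
        }
        return dirtied;
    }

    /** The instruction sequence from the question. */
    static List<Insn> questionExample() {
        return List.of(
            new Insn("aload", "0"),
            new Insn("getfield", "test/OuterTarget.listaFoo"),
            new Insn("new", "test/Foo"),
            new Insn("dup", ""),
            new Insn("ldc", "text"),
            new Insn("invokespecial", "test/Foo.<init>(Ljava/lang/String;)V"),
            new Insn("ldc", "otro text"),
            new Insn("invokevirtual", "test/Foo.setS(Ljava/lang/String;)Ltest/Foo;"),
            new Insn("invokeinterface", "java/util/List.add(Ljava/lang/Object;)Z"),
            new Insn("pop", ""));
    }

    public static void main(String[] args) {
        System.out.println(dirtiedFields(questionExample())); // [test/OuterTarget.listaFoo]
    }
}
```

The point of labelling stack entries with their origin instead of their type is that the arbitrary code between the getfield and the invokeinterface no longer matters: whatever it pushes and pops, the label for listaFoo is still sitting in the receiver position when add is reached.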