With a Java ASM agent, how can I detect a collection modification in an object?

Question

Asked by Marcelo D. Ré on 01 March 2024 at 15:43

I'm working on a Transparent Dirty Detection Agent (tdd-agent). It works really well: it redefines the target classes to implement setDirty()/isDirty() and sets the flag whenever it detects a putfield. Now I want to extend it to detect calls to add() on collections, for example. How could I detect such a call?

When the call is made with complex arguments, the generated bytecode separates the field access from the final method invocation, and I can't figure out how to detect and handle it.

For example:

private List<Foo> listaFoo = new ArrayList<>();
public void testNativeCollections(){

        this.listaFoo.add(new Foo("text").setS("otro text"));
        
}

This block generates the following bytecode:

 aload 0 // reference to self
 getfield test/OuterTarget.listaFoo:java.util.List
 new test/Foo
 dup
 ldc "text" (java.lang.String)
 invokespecial test/Foo.<init>(Ljava/lang/String;)V
 ldc "otro text" (java.lang.String)
 invokevirtual test/Foo.setS(Ljava/lang/String;)Ltest/Foo;
 invokeinterface java/util/List.add(Ljava/lang/Object;)Z
 pop

I need to pair the first getfield with the last invokeinterface to detect that this call will modify the object, and then insert a call to setDirty() after it. The main issue is that the code between the getfield and the invoke can be arbitrarily long and complex.


**rzwitserloot** · Answer 1 · 2024-03-01T15:57:02.923000

You'd have to build the equivalent of Java's own internal static stack analysis code (internal as in private API, though Java is open source, so there's that), which is extremely complicated. One upside is that bytecode generation has changed over the years to be more amenable to this kind of analysis (e.g. javac no longer generates JSR/RET, mostly for this reason).

You need to map every single opcode that exists (and there are a lot) to the effect it has on the stack. For example, ALOAD_0 pops nothing and pushes 1 object onto the stack. (ASM at least helpfully flattens ALOAD_0, ALOAD_1, ALOAD_2, ALOAD_3 and ALOAD with an explicit index into a single aload, which helps a bit.)
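
A minimal sketch of such a table for the fixed-effect opcodes that appear in this thread. It assumes instructions are identified by lowercase mnemonics (a simplification; ASM-based code would switch on the Opcodes constants), and StackEffects/Effect are hypothetical names. Invokes are deliberately left out, since their effect depends on the method descriptor.

```java
import java.util.List;
import java.util.Map;

// Sketch: fixed stack effects for a few opcodes, keyed by lowercase mnemonic
// (hypothetical encoding; real ASM code would use Opcodes constants).
final class StackEffects {

    /** How many values an instruction pops and then pushes. */
    record Effect(int pops, int pushes) {}

    static final Map<String, Effect> FIXED = Map.of(
            "aload", new Effect(0, 1),     // push a local variable's reference
            "ldc", new Effect(0, 1),       // push a single-slot constant
            "new", new Effect(0, 1),       // push an uninitialized reference
            "dup", new Effect(1, 2),       // pop v, push v v
            "getfield", new Effect(1, 1),  // pop the owner, push the field value
            "getstatic", new Effect(0, 1), // static field: no owner on the stack
            "putfield", new Effect(2, 0),  // pop the owner and the new value
            "iadd", new Effect(2, 1),      // pop two ints, push their sum
            "pop", new Effect(1, 0));      // discard the top value

    /** Stack depth after each instruction, starting from an empty stack. */
    static int[] depths(List<String> mnemonics) {
        int[] out = new int[mnemonics.size()];
        int depth = 0;
        for (int i = 0; i < mnemonics.size(); i++) {
            String op = mnemonics.get(i);
            Effect e = FIXED.get(op);
            if (e == null) {
                throw new IllegalArgumentException("no fixed effect for " + op);
            }
            if (depth < e.pops()) {
                throw new IllegalStateException("stack underflow at instruction " + i);
            }
            depth += e.pushes() - e.pops();
            out[i] = depth;
        }
        return out;
    }
}
```

For the first five instructions of the question's listing (aload, getfield, new, dup, ldc) this yields the depths 1, 1, 2, 3, 4 shown in the analysis.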

Doing that analysis on your listing:

          // stack at 0
aload 0   // stack at 1
getfield  // pops 1, then pushes value. stack at 1, remember: stack[0] = listaFoo
new test/Foo // stack at 2
dup          // stack at 3
ldc "text" (java.lang.String) //stack at 4
invokespecial test/Foo.<init>(Ljava/lang/String;)V // stack at 2
ldc "otro text" (java.lang.String) // stack at 3
invokevirtual test/Foo.setS(Ljava/lang/String;)Ltest/Foo; // stack at 2
invokeinterface java/util/List.add(Ljava/lang/Object;)Z // stack at 1
pop          // stack at 0

Every flavour of invoke requires analysing the method signature to know its impact on the stack: -1 for every argument, an additional -1 for the implicit receiver argument on all but invokestatic and invokedynamic, then +1 if the return type is anything but V.
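
The descriptor rule can be sketched as follows. It counts one stack value per argument, as this answer does (the JVM's own slot accounting counts long and double twice); InvokeEffect and the string kind values are illustrative, not ASM API.

```java
// Sketch of the descriptor rule: one stack value per argument (the answer's model;
// the JVM's own slot accounting counts long and double twice).
final class InvokeEffect {

    /** Counts the arguments of a descriptor such as "(ILjava/lang/String;[J)V". */
    static int argumentCount(String descriptor) {
        int count = 0;
        int i = 1;                                  // skip '('
        while (descriptor.charAt(i) != ')') {
            while (descriptor.charAt(i) == '[') {   // array dimensions belong to one argument
                i++;
            }
            if (descriptor.charAt(i) == 'L') {      // class type: jump to its ';'
                i = descriptor.indexOf(';', i);
            }
            i++;                                    // past the primitive letter or the ';'
            count++;
        }
        return count;
    }

    /** Net stack change of an invoke; kind is the lowercase mnemonic. */
    static int stackDelta(String kind, String descriptor) {
        int delta = -argumentCount(descriptor);     // every argument is popped
        boolean hasReceiver = !kind.equals("invokestatic") && !kind.equals("invokedynamic");
        if (hasReceiver) {
            delta -= 1;                             // the implicit receiver is popped too
        }
        if (!descriptor.endsWith(")V")) {
            delta += 1;                             // a non-void result is pushed
        }
        return delta;
    }
}
```

For the question's listing this gives -2 for Foo.<init>, -1 for setS and -1 for List.add, matching the annotated depths (4 to 2, 3 to 2, and 2 to 1).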

Here your analyser should register that the last invokeinterface is the one that pops the field you are interested in off the stack, as the receiver of an add method, which qualifies it for a 'dirty' flag.

I don't think there is a public, maintained library intended for analysis like this, but I haven't looked all that much; perhaps now you at least know what to look for.

The JVM does this kind of analysis itself when loading a class. For example, if you have this bytecode:

ALOAD_0
DUP
GETFIELD someIntField
IADD

The verifier will fail with a VerifyError and the class will never even be loaded. It analyses that IADD instruction, which pops 2 values off the stack and either [A] adds them and pushes the result if they are both ints, or [B] lets all hell break loose if they aren't both ints, and it determines that this is a B situation, because what's on the stack is this (an object) and an int, not 2 ints.

How does it do that? By applying the same principle: analyse every instruction and keep track of what it does to the stack. In its case, it records the type of each value on the stack. ALOAD_0: okay, there is an object in stack[0]. DUP: okay, there is an object in stack[1]. GETFIELD someIntField: okay, stack[1] is now an int. IADD: check the types of stack[current] and stack[current-1]. Uh-oh, one of them isn't an int: VerifyError!
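
The verifier's reasoning for this example can be mimicked with a toy type tracker. This is only an illustration: the instruction encoding ("getfield:int" carries the field's type) and the ToyVerifier class are made up, and the real verifier works from class-file types and stack map frames.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy illustration only: "getfield:int" is a made-up encoding that carries the
// field's type; the real verifier uses class-file types and stack map frames.
final class ToyVerifier {

    enum Kind { INT, OBJECT }

    /** Returns null if the code type-checks, otherwise a VerifyError-style message. */
    static String verify(List<String> code) {
        Deque<Kind> stack = new ArrayDeque<>();     // push/pop work on the top
        for (String insn : code) {
            switch (insn) {
                case "aload_0" -> stack.push(Kind.OBJECT);   // 'this' is an object
                case "dup" -> stack.push(stack.peek());      // copy the top type
                case "getfield:int" -> {
                    if (stack.pop() != Kind.OBJECT) {
                        return "VerifyError: getfield needs an object receiver";
                    }
                    stack.push(Kind.INT);                    // the field value is an int
                }
                case "iadd" -> {
                    Kind top = stack.pop();
                    Kind below = stack.pop();
                    if (top != Kind.INT || below != Kind.INT) {
                        return "VerifyError: iadd expects two ints, found " + below + " and " + top;
                    }
                    stack.push(Kind.INT);
                }
                default -> throw new IllegalArgumentException("unsupported: " + insn);
            }
        }
        return null;
    }
}
```

For ALOAD_0, DUP, GETFIELD, IADD it returns "VerifyError: iadd expects two ints, found OBJECT and INT", while loading this and reading the field twice before IADD type-checks.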

You'd do the same thing, except that instead of tracking 'int', 'object' and so on, you track specifically "ah, field this-or-that". Once you hit an INVOKEVIRTUAL or INVOKEINTERFACE to a method signature you know 'dirties' its receiver (such as java/util/List.add(Ljava/lang/Object;)Z, which is invoked with INVOKEINTERFACE), your stack analysis tells you exactly what it is dirtying.
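
Putting the two ideas together, here is a sketch of the field-tracking variant run on the question's listing. Each shadow-stack entry holds the name of the field that produced the value (or null), and a call found in a hypothetical DIRTYING set reports its receiver's field. The Insn record, the set contents and the small opcode subset are assumptions for illustration, not ASM API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch only: Insn, DIRTYING and the opcode subset are hypothetical, not ASM API.
final class FieldProvenance {

    /** One instruction: mnemonic plus either a field name or "owner.name(descriptor)". */
    record Insn(String op, String arg) {}

    /** Calls assumed to modify their receiver (illustrative, far from complete). */
    static final Set<String> DIRTYING = Set.of(
            "java/util/List.add(Ljava/lang/Object;)Z",
            "java/util/List.clear()V");

    /** Returns the fields that end up as the receiver of a dirtying call. */
    static List<String> dirtiedFields(List<Insn> code) {
        List<String> shadow = new ArrayList<>(); // one entry per stack value; top = last
        List<String> dirtied = new ArrayList<>();
        for (Insn in : code) {
            switch (in.op()) {
                case "aload", "ldc", "new" -> shadow.add(null);   // not a field value
                case "dup" -> shadow.add(shadow.get(shadow.size() - 1));
                case "pop" -> pop(shadow, 1);
                case "getfield" -> {                              // owner out, field value in
                    pop(shadow, 1);
                    shadow.add(in.arg());
                }
                case "invokespecial", "invokevirtual", "invokeinterface" -> {
                    String desc = in.arg().substring(in.arg().indexOf('('));
                    int args = argCount(desc);
                    String receiver = shadow.get(shadow.size() - 1 - args);
                    if (receiver != null && DIRTYING.contains(in.arg())) {
                        dirtied.add(receiver);
                    }
                    pop(shadow, args + 1);                        // receiver + arguments
                    if (!desc.endsWith(")V")) {
                        shadow.add(null);                         // treat the result as "no field"
                    }
                }
                default -> throw new IllegalArgumentException("unsupported: " + in.op());
            }
        }
        return dirtied;
    }

    /** One value per argument, as in the answer's model. */
    private static int argCount(String desc) {
        int count = 0;
        for (int i = 1; desc.charAt(i) != ')'; i++, count++) {
            while (desc.charAt(i) == '[') {
                i++;                                              // array prefix
            }
            if (desc.charAt(i) == 'L') {
                i = desc.indexOf(';', i);                         // skip the class name
            }
        }
        return count;
    }

    private static void pop(List<String> shadow, int n) {
        for (int i = 0; i < n; i++) {
            shadow.remove(shadow.size() - 1);
        }
    }
}
```

Run on the question's listing, it reports listaFoo. A List.add on a local variable (aload 1 instead of aload 0 plus getfield) reports nothing, because the receiver's shadow entry is null.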

**Marcelo D. Ré** · Answer 2 · 2024-03-22T10:47:01.370000

And finally I got it! As I said, one step at a time.

The trick was to use ASM's AnalyzerAdapter class and keep track of the name of the field associated with each stack position, so that at any time you know which field a stack entry refers to.

Here is the main code:

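import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.objectweb.asm.Handle;
import org.objectweb.asm.Label;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.commons.AnalyzerAdapter;
import org.objectweb.asm.util.Printer;

// ITransparentDirtyDetectorDef, IJavaCollections and LogginProperties are tdd-agent's own types (not shown here).

/**
 *
 * @author Marcelo D. Ré
 */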
public class WriteAccessActivatorAdapter extends AnalyzerAdapter implements ITransparentDirtyDetectorDef, IJavaCollections {

    private final static Logger LOGGER = Logger.getLogger(WriteAccessActivatorAdapter.class.getName());
    private boolean activate = false;
    private String owner;
    private List<String> ignoreFields;
    private List<String> collectionFields;
    
    HashSet<String> lastCollectionModifiedFields = new HashSet<>();
    
    // maps a stack position to the name of the associated field
    private Map<String,String> stackToField = new HashMap<>(); 
    
    static {
        if (LOGGER.getLevel() == null) {
            LOGGER.setLevel(LogginProperties.WriteAccessActivatorAdapter);
        }
    }
    
    public WriteAccessActivatorAdapter(int api,
                                       String owner, 
                                       int access, 
                                       String name, 
                                       String descriptor, 
                                       MethodVisitor methodVisitor, 
                                       List<String> ignoreFields, 
                                       List<String> collectionFields
                                      ) {
        super(api, owner, access, name, descriptor, methodVisitor);
        this.ignoreFields = ignoreFields;
        this.collectionFields = collectionFields;
        this.owner = owner;
    }

    
    
    /**
     * Add a call to setDirty in every method that has a PUTFIELD in its code.
     * @param opcode opcode to analyze
     */
    @Override
    public synchronized void visitInsn(int opcode) {
        LOGGER.log(Level.FINEST, "Activate: {0} - opcode: {1} ", new Object[]{this.activate,Printer.OPCODES[opcode]});
        // analyze the lists
        if ((this.activate)&&((opcode >= Opcodes.IRETURN && opcode <= Opcodes.RETURN) 
                               || opcode == Opcodes.ATHROW 
                               )) {
            // if any collections were modified, mark them as dirty before returning.
            if (lastCollectionModifiedFields.size()>0) {
                insertDirtyCollectionsFields();
                lastCollectionModifiedFields.clear();
            }
            
            LOGGER.log(Level.FINEST, "Adding call to setDirty...");
            mv.visitVarInsn(Opcodes.ALOAD, 0);
//            mv.visitInsn(Opcodes.ICONST_1);
            mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL, owner, SETDIRTY, "()V", false);
            //mv.visitFieldInsn(Opcodes.PUTFIELD, owner, "__ogm__dirtyMark", "Z");
        } 
        super.visitInsn(opcode);

        LOGGER.log(Level.FINEST, "fin --------------------------------------------------");
    }

    @Override
    public synchronized void visitFieldInsn(int opcode, String owner, String name, String desc) {
        super.visitFieldInsn(opcode, owner, name, desc);
        
        LOGGER.log(Level.FINEST, "opcode: {0} - owner: {1} - name: {2} - desc: {3} - transient: {4}", new Object[]{Printer.OPCODES[opcode], owner, name, desc, ignoreFields.contains(name)});
        printStack(); 
        if ((opcode ==Opcodes.GETFIELD)||(opcode == Opcodes.GETSTATIC)) {
            // a field is being read: remember its name for later reference.
            this.stackToField.put(""+(this.stack==null? 0:(this.stack.size()-1)),name);
//            this.owner = owner;
        }
        if ((opcode == Opcodes.PUTFIELD) && (!ignoreFields.contains(name))) {
            LOGGER.log(Level.FINEST, "Modification detected!! Adding field \"{0}\" to the list.", new Object[]{name});
            
            this.activate = true;
//            this.owner = owner;
            
            printStack(); 
            
            insertDirtyField(name);
            
        } 
        LOGGER.log(Level.FINEST, "fin --------------------------------------------------");
    }

    @Override
    public void visitMethodInsn(int opcode, String owner, String name, String descriptor, boolean isInterface) {
        LOGGER.log(Level.FINEST, "opcode: {0} - owner: {1} - name: {2} - desc: {3} - isInterface: {4}", new Object[]{Printer.OPCODES[opcode], owner, name, descriptor, isInterface});
        printStack();
        // if the method matches one of the monitored classes and methods, inspect the stack
        // to check whether the receiver is a tracked field.
        LOGGER.log(Level.FINEST, "activable object?: "+getJavaCollections().contains("L"+owner+";")
                                     +" - method: "+ name + "> activable? : " +getJavaCollectionsDirtyMethods().contains(name) );
        if ((getJavaCollections().contains("L"+owner+";")) && (getJavaCollectionsDirtyMethods().contains(name))) {
            // compute the stack position of the receiver by skipping the argument slots
            // (long/double arguments occupy two entries in AnalyzerAdapter's stack)
            int argSlots = (org.objectweb.asm.Type.getArgumentsAndReturnSizes(descriptor) >> 2) - 1; // -1: implicit 'this'
            int stackIdx = this.stack == null ? 0 : this.stack.size() - 1 - argSlots;
            String field = this.stackToField.get(""+stackIdx);
            
            LOGGER.log(Level.FINEST, "Collection modification detected! stack idx: "+stackIdx+" field: "+field);
            if (this.collectionFields.contains(field)) {
                lastCollectionModifiedFields.add(field);
                this.activate = true;
//                this.owner = owner;
            } 
        }
        super.visitMethodInsn(opcode, owner, name, descriptor, isInterface);
        
    }

    @Override
    public void visitInvokeDynamicInsn(String name, String descriptor, Handle bootstrapMethodHandle, Object... bootstrapMethodArguments) {
        super.visitInvokeDynamicInsn(name, descriptor, bootstrapMethodHandle, bootstrapMethodArguments); 
        LOGGER.log(Level.FINEST, "\n\n\n\n\nname: "+name+" - desc: "+descriptor+"   bs: "+ Arrays.toString(bootstrapMethodArguments));
        printStack();
        
        for (Object bsMthArg : bootstrapMethodArguments) {
            String bsMth = bsMthArg.toString();
            int dot = bsMth.indexOf('.');
            int bracket = bsMth.indexOf("(");
            if (dot > 0 && bracket > 0) {
                String cls = "L"+bsMth.substring(0, dot)+";";
                String mth = bsMth.substring(dot+1, bracket);
                LOGGER.log(Level.FINEST, "cls: "+cls + "   -   method: "+mth);

                if (getJavaCollections().contains(cls) && getJavaCollectionsDirtyMethods().contains(mth)) {
                    int stackIdx = this.stack == null ? 0 : this.stack.size() - 1;
                    String field = this.stackToField.get(""+stackIdx);
                    LOGGER.log(Level.FINEST, "Collection modification detected! stack idx: "+stackIdx+" field: "+field);
                    if (this.collectionFields.contains(field)) {
                        lastCollectionModifiedFields.add(field);
                        this.activate = true;
                    } 
                }
            }
        }
        
        LOGGER.log(Level.FINEST, "\n\n\n\n\n");
        
    }

    @Override
    public void visitLabel(Label label) {
        LOGGER.log(Level.FINEST, "Label: "+label);
        if (lastCollectionModifiedFields.size()>0){
            // if a collection field was modified, instrument adding its name to the modified-fields set
            LOGGER.log(Level.FINEST, "Modifications detected!! Adding the fields to the list.");
            printStack();
            insertDirtyCollectionsFields();
            
            // reset the set of modified collection fields
            lastCollectionModifiedFields.clear();
            LOGGER.log(Level.FINEST, " --------------------------------------------------");
        }
        super.visitLabel(label); 
    }

    @Override
    public void visitJumpInsn(int opcode, Label label) {
        if (this.activate && opcode == Opcodes.GOTO) {
            // if any collections were modified, mark them as dirty before jumping.
            if (lastCollectionModifiedFields.size()>0) {
                insertDirtyCollectionsFields();
                lastCollectionModifiedFields.clear();
            }
        } 
        super.visitJumpInsn(opcode, label); 
    }
    
    
    
    
    @Override
    public void visitEnd() {
        LOGGER.log(Level.FINEST, "fin MethodVisitor -------------------------------------");
//        mv.visitMaxs(0, 0);
        super.visitEnd();
    }
 
    private void printStack() {
        if (LOGGER.isLoggable(Level.FINEST)) {
            if (this.stack != null) {
                System.out.println("stack size:"+this.stack.size());

                for (int i = 0; i < this.stack.size(); i++) {
                    Object o = this.stack.get(i);
                    System.out.println(""+o.getClass()+" :  "+o + " --> "+ this.stackToField.get(""+i));
                }
                System.out.println("--------------");
            } else {
                System.out.println("stack size: NULL <<<<<<<<<<<<<<<<<<<<<<< ");
            }
        }
    }
    
    /**
     * Insert all field registered in the lastCollectionFields hashset.
     */
    private void insertDirtyCollectionsFields() {
        for (String lastCollectionModifiedField : lastCollectionModifiedFields) {
            mv.visitVarInsn(Opcodes.ALOAD, 0);
            mv.visitFieldInsn(Opcodes.GETFIELD, owner, MODIFIEDFIELDS, "Ljava/util/Set;");
            mv.visitLdcInsn(lastCollectionModifiedField);
            mv.visitMethodInsn(Opcodes.INVOKEINTERFACE, "java/util/Set", "add", "(Ljava/lang/Object;)Z", true);
            mv.visitInsn(Opcodes.POP); // discard the boolean result of add
        }
    }
    
    private void insertDirtyField(String name) {
        mv.visitVarInsn(Opcodes.ALOAD, 0);
        mv.visitFieldInsn(Opcodes.GETFIELD, owner, MODIFIEDFIELDS, "Ljava/util/Set;");
        mv.visitLdcInsn(name);
        mv.visitMethodInsn(Opcodes.INVOKEINTERFACE, "java/util/Set", "add", "(Ljava/lang/Object;)Z", true);
        mv.visitInsn(Opcodes.POP); // discard the boolean result of add
    }
}
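
One detail in the receiver lookup is worth checking carefully. AnalyzerAdapter's stack list stores long and double values as two entries (the value followed by TOP), and primitive argument types have no ';' terminator, so counting arguments by splitting the descriptor on ';' can pick the wrong stack index: java/util/List.add(ILjava/lang/Object;)V yields 1 piece but occupies 2 slots, and an empty "()" still yields 1 piece because "".split(";") returns one empty string. This stdlib-only sketch (ArgSlots is a hypothetical helper) contrasts the two counts. Inside the agent itself, ASM's Type.getArgumentsAndReturnSizes(descriptor) >> 2, minus one for the implicit this, should give the same slot count without hand-parsing.

```java
// Hypothetical helper contrasting a ';'-split count with a slot count.
final class ArgSlots {

    /** Naive count: number of ';'-separated pieces between the parentheses. */
    static int naiveCount(String descriptor) {
        return descriptor.substring(1, descriptor.indexOf(')')).split(";").length;
    }

    /** Slot count: 2 for long (J) and double (D), 1 for everything else, arrays included. */
    static int slotCount(String descriptor) {
        int slots = 0;
        int i = 1;                                   // skip '('
        while (descriptor.charAt(i) != ')') {
            char c = descriptor.charAt(i);
            boolean isArray = false;
            while (c == '[') {                       // an array is one reference slot
                isArray = true;
                c = descriptor.charAt(++i);
            }
            if (c == 'L') {
                i = descriptor.indexOf(';', i);      // skip the class name
            }
            slots += (!isArray && (c == 'J' || c == 'D')) ? 2 : 1;
            i++;
        }
        return slots;
    }

    /** Index of the receiver in a stack of the given size just before the call. */
    static int receiverIndex(int stackSize, String descriptor) {
        return stackSize - 1 - slotCount(descriptor);
    }
}
```

For List.add(int, Object) called on a field at stack[0], the stack holds 3 entries before the call; receiverIndex(3, "(ILjava/lang/Object;)V") gives 0, whereas subtracting the naive count would point at stack[1].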

Of course, there are many situations where it will not work. You must respect the "Tell, Don't Ask" rule and the "Law of Demeter"!

The full code is on GitHub.
