Java agents, Javassist and Byte Buddy

November 04, 2017

The Java Virtual Machine (JVM) is a great platform: mature and well-established. Apart from the many everyday features used by all developers, it has some lower-level ones designed for more “system” or “tooling” purposes. One example is sun.misc.Unsafe, which provides, among other things, low-level access to memory. Another such feature is agents: the JVM mechanism that enables external code to integrate with a running JVM, including access to class bytecode before it’s loaded and executed.

In this post:

The basics of Java agents and Instrumentation API.
An example agent for metrics collection.
Libraries for bytecode manipulation: Javassist and Byte Buddy.

Agents were introduced in Java 1.5, but programming them is rarely part of a JVM programmer’s everyday job. However, if you run JVM code in production, you are quite likely using some agents. Here are several widely used kinds:

JMX-HTTP bridges, e.g. Jolokia, which gives access to JMX MBeans over HTTP (very useful for monitoring);
profilers, e.g. YourKit or JProfiler;
debuggers, namely the Java Debug Wire Protocol (JDWP) agent;
aspect-oriented programming toolkits, AspectJ in particular;
hot code reloading tools like JRebel, which are especially useful in Java EE environments.

There are two types of agents in the JVM: Java agents and native agents. Java agents consist of JVM bytecode (i.e. they are written in one of the JVM languages, most commonly Java), are packed into JARs, and follow a special code organisation convention so that the JVM can use them. Native agents are different: they are native code (most commonly compiled from C or C++) packed into dynamic libraries, they use the JVM Tool Interface (JVMTI), and they operate at a lower level than Java agents. In particular, they can affect the garbage collector, thread management, locking and synchronisation, etc. Profilers and debuggers use native agents.

In this post, we’re focusing solely on Java agents.

The basics of Java agents

Java agents themselves are very simple. Let’s consider this piece of code:

package me.ivanyu.javaagentsdemo;

import java.lang.instrument.Instrumentation;

public class SimplestAgent {
    /**
     * If the agent is attached to a JVM on the start,
     * this method is invoked before {@code main} method is called.
     *
     * @param agentArgs Agent command line arguments.
     * @param inst      An object to access the JVM instrumentation mechanism.
     */
    public static void premain(final String agentArgs,
                               final Instrumentation inst) {
        System.out.println(
            "Hey, look: I'm instrumenting a freshly started JVM!");
    }

    /**
     * If the agent is attached to an already running JVM,
     * this method is invoked.
     *
     * @param agentArgs Agent command line arguments.
     * @param inst      An object to access the JVM instrumentation mechanism.
     */
    public static void agentmain(final String agentArgs,
                                 final Instrumentation inst) {
        System.out.println("Hey, look: I'm instrumenting a running JVM!");
    }
}

The agent class doesn’t extend any class or implement any interface: everything works by convention. There are two static methods, premain and agentmain; their difference is explained in the Javadoc comments. An agent can have both of them or only one. Only one of them is executed, depending on when the agent is attached to the JVM.

The Instrumentation object gives the agent the means to instrument programs running on the JVM: access to the bytecode of classes and a few other features. We will cover it in more depth later.

Let’s briefly describe the main operations an agent can perform:

Transformation of classes’ bytecode as they are being loaded. This happens when the agent registers one or more ClassFileTransformers.
Redefinition: complete replacement of the bytecode of particular classes that are already loaded in the JVM.
Retransformation of the bytecode of particular already loaded classes (again, by ClassFileTransformers), either when the agent explicitly triggers it or when Redefinition happens.
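To make retransformation a bit more tangible, here is a minimal sketch of an agentmain that registers a retransformation-capable transformer and then retransforms classes that were loaded before the agent arrived. The class name RetransformingAgent and the target package prefix are hypothetical; the Instrumentation calls (addTransformer with canRetransform = true, getAllLoadedClasses, isModifiableClass, retransformClasses) are the standard java.lang.instrument API. This assumes the agent JAR declares Can-Retransform-Classes: true.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class RetransformingAgent {
    // Hypothetical package whose already loaded classes we want to retransform.
    private static final String TARGET_PREFIX = "me.ivanyu.javaagentsdemo.client.";

    public static void agentmain(final String agentArgs, final Instrumentation inst)
            throws UnmodifiableClassException {
        // False unless the manifest says Can-Retransform-Classes: true (and the JVM supports it).
        if (!inst.isRetransformClassesSupported()) {
            System.err.println("Retransformation is not supported");
            return;
        }
        // canRetransform = true: this transformer is also called on retransformation.
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(final ClassLoader loader, final String className,
                                    final Class<?> classBeingRedefined,
                                    final ProtectionDomain protectionDomain,
                                    final byte[] classfileBuffer) {
                if (classBeingRedefined != null) {
                    System.out.println("Retransforming " + className);
                }
                return null; // keep the bytecode unchanged
            }
        }, true);

        final List<Class<?>> targets = selectTargets(
                inst.getAllLoadedClasses(), TARGET_PREFIX, inst::isModifiableClass);
        if (!targets.isEmpty()) {
            // Runs the registered transformers again for classes that are already loaded.
            inst.retransformClasses(targets.toArray(new Class<?>[0]));
        }
    }

    /** Picks loaded classes from the given package that the JVM allows to modify. */
    public static List<Class<?>> selectTargets(final Class<?>[] loaded, final String prefix,
                                               final Predicate<Class<?>> isModifiable) {
        final List<Class<?>> result = new ArrayList<>();
        for (final Class<?> c : loaded) {
            if (c.getName().startsWith(prefix) && isModifiable.test(c)) {
                result.add(c);
            }
        }
        return result;
    }
}
```

Keeping the class selection in a separate static method makes it easy to test without a real Instrumentation instance.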

To be used as a Java agent, SimplestAgent needs to be packed into a JAR with some specific attributes in its manifest. This can be done with maven-jar-plugin (assuming we’re using Maven here):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <version>2.4</version>
  <configuration>
    <archive>
      <manifestEntries>
        <Premain-Class>me.ivanyu.javaagentsdemo.SimplestAgent</Premain-Class>
        <Agent-Class>me.ivanyu.javaagentsdemo.SimplestAgent</Agent-Class>
        <Can-Redefine-Classes>false</Can-Redefine-Classes>
        <Can-Retransform-Classes>false</Can-Retransform-Classes>
        <Can-Set-Native-Method-Prefix>false</Can-Set-Native-Method-Prefix>
      </manifestEntries>
    </archive>
  </configuration>
</plugin>

Let me explain these manifest options.

Premain-Class sets the class that contains the premain method, pretty much like Main-Class in normal applications.
Agent-Class sets the class that contains the agentmain method.
Can-Redefine-Classes indicates whether the agent can do Redefinition (false by default).
Can-Retransform-Classes indicates whether the agent can do Retransformation (false by default).
Can-Set-Native-Method-Prefix indicates whether the agent is able to instrument native methods (false by default). We aren’t covering native method instrumentation in this post; see the Javadoc of Instrumentation.setNativeMethodPrefix for details.
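Maven is only one way to produce these entries. As a minimal sketch, assuming you want to build the agent JAR from Java code (for example, in a test), the same manifest can be written with java.util.jar. The class AgentManifest and its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class AgentManifest {
    /** Builds a manifest with the same entries as the maven-jar-plugin configuration. */
    public static Manifest agentManifest(final String agentClass) {
        final Manifest manifest = new Manifest();
        final Attributes attrs = manifest.getMainAttributes();
        // java.util.jar requires Manifest-Version to be set before the manifest is written.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.putValue("Premain-Class", agentClass);
        attrs.putValue("Agent-Class", agentClass);
        attrs.putValue("Can-Redefine-Classes", "false");
        attrs.putValue("Can-Retransform-Classes", "false");
        attrs.putValue("Can-Set-Native-Method-Prefix", "false");
        return manifest;
    }

    /** Writes a JAR that contains only the manifest (META-INF/MANIFEST.MF). */
    public static byte[] writeJar(final Manifest manifest) throws IOException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // The agent's .class files would be added here with jar.putNextEntry(...).
        }
        return bytes.toByteArray();
    }
}
```

Writing the resulting bytes to a file (e.g. with Files.write) gives a JAR whose path can be passed to -javaagent, once the agent classes are added to it.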

Packaging the project with mvn package gives us agent-1.0.0-SNAPSHOT.jar. If we now run a Java application with the additional -javaagent option, the agent is attached to the newly started JVM:

$ java -javaagent:some/path/agent-1.0.0-SNAPSHOT.jar -jar the_application.jar

Hey, look: I'm instrumenting a freshly started JVM!
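The -javaagent option covers premain (options for it can be appended as -javaagent:path/to/agent.jar=options and arrive as agentArgs). For agentmain, the agent has to be loaded into a JVM that is already running, which the JDK’s Attach API (com.sun.tools.attach, in the jdk.attach module since Java 9 and in tools.jar on Java 8) can do. Here is a minimal sketch; AgentAttacher is a hypothetical helper, and the target’s PID can be looked up with jps.

```java
import com.sun.tools.attach.VirtualMachine;

/** Loads an agent JAR into an already running JVM, which then calls agentmain. */
public class AgentAttacher {
    public static void attach(final String pid, final String agentJarPath,
                              final String agentArgs) throws Exception {
        // Throws (e.g. AttachNotSupportedException or IOException) if the JVM can't be attached to.
        final VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            // agentArgs becomes the first argument of agentmain; it may be null.
            vm.loadAgent(agentJarPath, agentArgs);
        } finally {
            vm.detach();
        }
    }

    public static void main(final String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: AgentAttacher <pid> <path/to/agent.jar> [agentArgs]");
            return;
        }
        attach(args[0], args[1], args.length > 2 ? args[2] : null);
        System.out.println("Agent loaded into JVM " + args[0]);
    }
}
```

Running it against a JVM with SimplestAgent’s JAR should make the target JVM (not the attacher) print “Hey, look: I'm instrumenting a running JVM!”.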

Instrumentation API

Of course, we would like to do something more interesting than printing a line of text at application startup, and Instrumentation offers plenty of features for that. Check its documentation for an overview; here we will focus on code transformations.

Here’s an example of a very simple ClassFileTransformer:

package me.ivanyu.javaagentsdemo;

import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

public class ClassListingTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(final ClassLoader loader,
                            final String className,
                            final Class<?> classBeingRedefined,
                            final ProtectionDomain protectionDomain,
                            final byte[] classfileBuffer) {
        System.out.println(className);

        // null means "use the bytecode without modifications".
        return null;
    }
}

ClassFileTransformers work on the bytecode level, literally with byte arrays containing the class file, opcodes included. Among other things, className and classfileBuffer are passed to the transform method. In this simple example, className is printed and the bytecode is left untouched (we indicate this to the instrumentation mechanism by returning null).
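To get a feel for what working “literally with byte arrays” means, here is a small sketch that reads the class file format version straight from classfileBuffer. Per the JVM specification, every class file starts with the magic number 0xCAFEBABE, followed by the minor and major versions as two-byte big-endian values. ClassFileVersion is a hypothetical helper name.

```java
import java.nio.ByteBuffer;

/** Reads the class file format version from the raw bytes a transformer receives. */
public class ClassFileVersion {
    /** Returns the major version (e.g. 52 for Java 8), or -1 if the bytes aren't a class file. */
    public static int majorVersion(final byte[] classfileBuffer) {
        // magic (4 bytes) + minor_version (2 bytes) + major_version (2 bytes)
        if (classfileBuffer == null || classfileBuffer.length < 8) {
            return -1;
        }
        final ByteBuffer buf = ByteBuffer.wrap(classfileBuffer); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            return -1;
        }
        buf.getShort();                 // skip minor_version
        return buf.getShort() & 0xFFFF; // major_version as an unsigned value
    }
}
```

Calling ClassFileVersion.majorVersion(classfileBuffer) inside ClassListingTransformer.transform would report, for instance, 52 for classes compiled for Java 8.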

Here’s the code to attach ClassListingTransformer (e.g. in premain):

inst.addTransformer(new ClassListingTransformer());

When we run an application with this agent attached, we will see:

com/intellij/rt/execution/application/AppMainV2$Agent
java/util/concurrent/ConcurrentHashMap$ForwardingNode
com/intellij/rt/execution/application/AppMainV2
com/intellij/rt/execution/application/AppMainV2$1
java/lang/reflect/InvocationTargetException
java/lang/NoSuchMethodException
java/lang/invoke/MethodHandleImpl
java/lang/invoke/MethodHandleImpl$1
java/lang/invoke/MethodHandleImpl$2
...
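As the listing shows, a transformer sees every class the JVM loads, including JDK and IDE launcher classes. A real transformer usually skips everything outside the packages it cares about before doing any work. A minimal sketch, where PackageFilteringTransformer and its seenClasses method are hypothetical names; it could be registered with inst.addTransformer(new PackageFilteringTransformer("me.ivanyu.javaagentsdemo")).

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Lists only classes from one package (and its subpackages) as they are loaded. */
public class PackageFilteringTransformer implements ClassFileTransformer {
    // className arrives in internal form ("/" as separator), so convert the prefix once.
    private final String internalPrefix;
    // transform may be called concurrently from several class-loading threads.
    private final List<String> seen = new CopyOnWriteArrayList<>();

    public PackageFilteringTransformer(final String packageName) {
        this.internalPrefix = packageName.replace('.', '/') + "/";
    }

    @Override
    public byte[] transform(final ClassLoader loader,
                            final String className,
                            final Class<?> classBeingRedefined,
                            final ProtectionDomain protectionDomain,
                            final byte[] classfileBuffer) {
        // className can be null; such classes are skipped like everything outside the package.
        if (className != null && className.startsWith(internalPrefix)) {
            seen.add(className);
            System.out.println(className);
        }
        return null; // null: keep the original bytecode
    }

    /** Names of the classes that matched the filter so far. */
    public List<String> seenClasses() {
        return seen;
    }
}
```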

We can also modify the bytecode directly and return the new version, which will then be loaded under the given class name.

However, modifying bytecode directly in byte arrays is clumsy and bug-prone (and out of the scope of this article). Several libraries provide a higher-level interface to bytecode. ASM is a powerful, but still quite low-level library for bytecode manipulation. Some higher-level libraries, such as cglib and Byte Buddy, are built on top of ASM, while others, like Javassist, have their own bytecode engine (see this great overview on StackOverflow). We will try Javassist and Byte Buddy.

Metrics collection example

Let’s make bytecode manipulation more interesting by making it concrete. Suppose we want to collect two metrics for calls of particular methods in our code: the average execution time and the total number of calls. (This sounds made up, but it’s fine for a demonstration: the post is not about metrics collection after all :)

The complete code of the example is in this repository. It’s separated into three subprojects: agent contains agents’ code, client contains the client’s code, and common contains the code shared between agent and client.

First, we need a way to mark methods we’re interested in. For example, with annotations:

package me.ivanyu.javaagentsdemo.common;

import java.lang.annotation.*;

@Target({ElementType.METHOD, ElementType.CONSTRUCTOR})
@Retention(RetentionPolicy.RUNTIME)
public @interface CollectMetrics {
}

Now, the class that collects metrics:

package me.ivanyu.javaagentsdemo.common;

import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Collects metrics on method calls.
 *
 * Counts the total number of method calls and
 * the average duration of method execution (with Exponential moving average
 * https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average).
 */
public class MetricsCollector {
  private static final
  ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();

  private static final double alpha = 0.015;

  /**
   * Report method call metrics.
   */
  public static void report(final String methodName,
                final long duration) {

    entries.compute(methodName,
        (final String key,
         final Entry curr) -> {
          if (curr == null) {
            return new Entry(1L, duration);
          }

          final long newAvgDuration = Math.round(
              curr.getAvgDuration() * (1 - alpha) + duration * alpha);
          return new Entry(
              curr.getCallCounts() + 1, newAvgDuration);
        });
  }

  public static class Entry {
    private final long callCounts;
    private final long avgDuration;

    private Entry(final long callCounts, final long avgDuration) {
      this.callCounts = callCounts;
      this.avgDuration = avgDuration;
    }

    public long getCallCounts() {
      return callCounts;
    }

    public long getAvgDuration() {
      return avgDuration;
    }
  }

  public static Map<String, Entry> getEntries() {
    return Collections.unmodifiableMap(entries);
  }
}

All members of the class are static to keep sharing the collector between the agent and the client simple.
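Note that getAvgDuration() returns an exponential moving average seeded with the duration of the first call, not an arithmetic mean, and with alpha = 0.015 it adapts slowly. The following standalone sketch (EmaDemo is a hypothetical name) extracts the update formula from report, without the rounding to long that report performs, to show the effect on a handful of samples.

```java
/** Standalone illustration of the update step in MetricsCollector.report. */
public class EmaDemo {
    /** Same smoothing factor as MetricsCollector.alpha. */
    public static final double ALPHA = 0.015;

    /** New average = old average * (1 - alpha) + new sample * alpha. */
    public static double update(final double avg, final double sample, final double alpha) {
        return avg * (1 - alpha) + sample * alpha;
    }

    public static void main(final String[] args) {
        // Like MetricsCollector, seed the average with the first sample.
        double ema = 100;
        double sum = 100;
        // Nine more calls, each taking 200 time units.
        for (int call = 2; call <= 10; call++) {
            ema = update(ema, 200, ALPHA);
            sum += 200;
        }
        System.out.printf("EMA: %.1f, arithmetic mean: %.1f%n", ema, sum / 10);
    }
}
```

After nine updates the EMA equals 200 - 100 * 0.985^9, roughly 113, whereas the arithmetic mean is 190. With only ten calls, the reported average is therefore still dominated by the first measurement; for the steady sleeps in ClientMain this makes little difference.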

Now we can annotate the methods we’re interested in with @CollectMetrics and later read the collected metrics using MetricsCollector.getEntries():

package me.ivanyu.javaagentsdemo.client;

import me.ivanyu.javaagentsdemo.common.CollectMetrics;
import me.ivanyu.javaagentsdemo.common.MetricsCollector;

public class ClientMain {
  public static void main(String[] args) throws InterruptedException {
    final Adder adder = new Adder();
    for (int i = 0; i < 10; i++) {
      helloWorld();
      adder.add(1, 30);
    }

    try {
      withException();
    } catch (Exception e) {
      // do nothing
    }

    MetricsCollector.getEntries().forEach((key, entry) -> {
      System.out.printf("%s\t%d calls\t%d ns avg\n",
          key, entry.getCallCounts(), entry.getAvgDuration());
    });
  }

  @CollectMetrics
  private static void helloWorld() throws InterruptedException {
    System.out.println("Hello world");
    Thread.sleep(124L);
  }

  @CollectMetrics
  private static void withException() throws Exception {
    throw new Exception();
  }

  private static class Adder {
    @CollectMetrics // applicable to constructor as well
    Adder() throws InterruptedException {
      Thread.sleep(1234L);
    }

    @CollectMetrics
    public int add(final int a, final int b) {
      return a + b;
    }
  }
}

Finally, it’s time to instrument the bytecode of these methods with agents. Let’s start with Javassist.

The Javassist agent

The Javassist library offers a convenient way to manipulate bytecode. Unlike many similar libraries, it allows doing this not only on the bytecode level, but also on the source code level, using its own compiler: it literally compiles strings of Java code at run time. Unfortunately, this compiler lacks some basic features like auto-boxing, generics or even type checks, but this won’t be a problem in our case. We will do everything on the source code level.

Javassist comes into play inside a ClassFileTransformer. Here is a transformer that adds metric collection code to methods annotated with @CollectMetrics:

package me.ivanyu.javaagentsdemo;

import javassist.*;
import javassist.bytecode.AnnotationsAttribute;
import javassist.bytecode.MethodInfo;
import javassist.bytecode.annotation.Annotation;
import me.ivanyu.javaagentsdemo.common.CollectMetrics;
import me.ivanyu.javaagentsdemo.common.MetricsCollector;

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;

public class MetricsCollectionTransformer implements ClassFileTransformer {

  private final ClassPool classPool = ClassPool.getDefault();

  @Override
  public byte[] transform(final ClassLoader loader,
                          final String className,
                          final Class<?> classBeingRedefined,
                          final ProtectionDomain protectionDomain,
                          final byte[] classfileBuffer)
      throws IllegalClassFormatException {

    // className can be null, ignoring such classes.
    if (className == null) {
      return null;
    }

    // Javassist uses "." as a separator in class/package names.
    final String classNameDots = className.replaceAll("/", ".");
    final CtClass ctClass = classPool.getOrNull(classNameDots);

    // Won't find some classes from java.lang.invoke,
    // but we're not interested in them anyway.
    if (ctClass == null) {
      return null;
    }

    // A frozen CtClass is a CtClass
    // that was already converted to Java class.
    if (ctClass.isFrozen()) {
      // No longer need to keep the CtClass object in memory.
      ctClass.detach();
      return null;
    }

    try {
      boolean anyMethodInstrumented = false;

      // Behaviors == methods and constructors.
      for (final CtBehavior behavior : ctClass.getDeclaredBehaviors()) {
        if (isAnnotatedAsCollectMetrics(behavior)) {
          System.out.printf("%s - will collect metrics\n",
              behavior.getLongName());
          instrument(behavior);
          anyMethodInstrumented = true;
        }
      }

      if (anyMethodInstrumented) {
        return ctClass.toBytecode();
      }
    } catch (Exception e) {
      e.printStackTrace(System.err);
    } finally {
      // No longer need to keep the CtClass object in memory.
      ctClass.detach();
    }

    return null;
  }

  /**
   * Checks if the behavior is annotated with @CollectMetrics.
   */
  private boolean isAnnotatedAsCollectMetrics(final CtBehavior behavior) {
    final MethodInfo methodInfo = behavior.getMethodInfo();

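    for (final Object attrInfo : methodInfo.getAttributes()) {
      // A method can have several annotation attributes (visible and invisible),
      // so keep looking instead of returning after the first one.
      if (attrInfo instanceof AnnotationsAttribute) {
        final Annotation annotation = ((AnnotationsAttribute) attrInfo)
            .getAnnotation(CollectMetrics.class.getName());
        if (annotation != null) {
          return true;
        }
      }
    }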

    return false;
  }

  /**
   * Instruments the behavior with metric reporting code.
   */
  private void instrument(final CtBehavior behavior)
      throws CannotCompileException, NotFoundException {

    behavior.addLocalVariable("$_traceTimeStart", CtClass.longType);
    behavior.insertBefore("$_traceTimeStart = System.nanoTime();");

    // Add reporting of the call, e.g.:
    // MetricsCollector.report("<full method name>", 1256);
    // Won't work in case of exception.
    final String reportCode = MetricsCollector.class.getName() +
        ".report(" +
        "\"" + behavior.getLongName() + "\", " +
        "System.nanoTime() - $_traceTimeStart" +
        ");";
    behavior.insertAfter(reportCode);
  }
}

The key idea is the following: for every class being loaded, find the methods and constructors annotated with @CollectMetrics. Then modify them as follows:

At the beginning, record the current time in the local variable $_traceTimeStart.
At the end, call MetricsCollector.report with the full method name and the elapsed time (the difference between the current time and $_traceTimeStart).

We construct the source code for this as strings, and Javassist does the rest. However, there is a problem: the reporting code won’t be executed if the method throws an exception. This is difficult to overcome on the source code level in Javassist (unfortunately, Javassist can’t always magically weave additional code into bytecode). Luckily, Byte Buddy doesn’t have this problem.
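To see why, here is a hand-written approximation of what an instrumented method looks like: the insertBefore code runs first, and the insertAfter code runs only when the body completes normally. This is an illustrative sketch, not actual Javassist output; mayFail and the reports map (a stand-in for MetricsCollector) are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative approximation of a method instrumented with insertBefore/insertAfter. */
public class JavassistStyleTiming {
    // Hypothetical stand-in for MetricsCollector: method name -> number of reports.
    public static final Map<String, Integer> reports = new ConcurrentHashMap<>();

    static void report(final String methodName, final long duration) {
        reports.merge(methodName, 1, Integer::sum);
    }

    /** Hypothetical method; "fail" decides whether the original body throws. */
    public static void mayFail(final boolean fail) throws Exception {
        final long traceTimeStart = System.nanoTime();         // added by insertBefore
        if (fail) {
            throw new Exception("boom");                       // original body
        }
        report("mayFail", System.nanoTime() - traceTimeStart); // added by insertAfter
    }

    public static void main(final String[] args) {
        try {
            mayFail(false);
            mayFail(true);
        } catch (Exception e) {
            // The second call threw before reaching the reporting code.
        }
        System.out.println(reports); // only one report for two calls
    }
}
```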

When we run ClientMain with this agent attached, we will see:

Starting MetricsCollectionJavassistAgent
me.ivanyu.javaagentsdemo.client.ClientMain.helloWorld() - will collect metrics
me.ivanyu.javaagentsdemo.client.ClientMain.withException() - will collect metrics
me.ivanyu.javaagentsdemo.client.ClientMain$Adder() - will collect metrics
me.ivanyu.javaagentsdemo.client.ClientMain$Adder.add(int,int) - will collect metrics
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
me.ivanyu.javaagentsdemo.client.ClientMain.helloWorld()	10 calls	123814838 ns avg
me.ivanyu.javaagentsdemo.client.ClientMain$Adder()	1 calls	1233897983 ns avg
me.ivanyu.javaagentsdemo.client.ClientMain$Adder.add(int,int)	10 calls	4558 ns avg

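Note that withException() is missing from the report: it exits with an exception, so its reporting code never runs. (In a “normal” application, such metrics would of course be exposed via JMX or HTTP.)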

The Byte Buddy agent

In contrast to Javassist, Byte Buddy doesn’t work on the source code level; it works with bytecode directly. However, it provides a convenient DSL for this.

Let’s consider the same metric collecting agent made with Byte Buddy:

package me.ivanyu.javaagentsdemo;

import me.ivanyu.javaagentsdemo.common.CollectMetrics;
import me.ivanyu.javaagentsdemo.common.MetricsCollector;
import net.bytebuddy.agent.builder.AgentBuilder;
import net.bytebuddy.agent.builder.AgentBuilder.Transformer;
import net.bytebuddy.asm.Advice;
import net.bytebuddy.asm.AsmVisitorWrapper;
import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.dynamic.DynamicType;
import net.bytebuddy.matcher.ElementMatchers;
import net.bytebuddy.utility.JavaModule;

import java.lang.instrument.Instrumentation;
import java.lang.reflect.Executable;

public class MetricsCollectionByteBuddyAgent {
  public static void premain(final String agentArgs,
                 final Instrumentation inst) throws Exception {
    System.out.printf("Starting %s\n",
        MetricsCollectionByteBuddyAgent.class.getSimpleName());

    new AgentBuilder.Default()
        .type(ElementMatchers.any())
        .transform(new MetricsTransformer())
        .with(AgentBuilder.Listener.StreamWriting.toSystemOut())
        .with(AgentBuilder.TypeStrategy.Default.REDEFINE)
        .installOn(inst);
  }

  private static class MetricsTransformer implements Transformer {
    @Override
    public DynamicType.Builder<?> transform(
        final DynamicType.Builder<?> builder,
        final TypeDescription typeDescription,
        final ClassLoader classLoader,
        final JavaModule module) {

      final AsmVisitorWrapper methodsVisitor =
          Advice.to(EnterAdvice.class, ExitAdviceMethods.class)
              .on(ElementMatchers.isAnnotatedWith(CollectMetrics.class)
              .and(ElementMatchers.isMethod()));

      final AsmVisitorWrapper constructorsVisitor =
          Advice.to(EnterAdvice.class, ExitAdviceConstructors.class)
              .on(ElementMatchers.isAnnotatedWith(CollectMetrics.class)
                  .and(ElementMatchers.isConstructor()));

      return builder.visit(methodsVisitor).visit(constructorsVisitor);
    }

    private static class EnterAdvice {
      @Advice.OnMethodEnter
      static long enter() {
        return System.nanoTime();
      }
    }

    private static class ExitAdviceMethods {
      @Advice.OnMethodExit(onThrowable = Throwable.class)
      static void exit(@Advice.Origin final Executable executable,
                       @Advice.Enter final long startTime,
                       @Advice.Thrown final Throwable throwable) {
        final long duration = System.nanoTime() - startTime;
        MetricsCollector.report(executable.toGenericString(), duration);
      }
    }

    private static class ExitAdviceConstructors {
      @Advice.OnMethodExit
      static void exit(@Advice.Origin final Executable executable,
                       @Advice.Enter final long startTime) {
        final long duration = System.nanoTime() - startTime;
        MetricsCollector.report(executable.toGenericString(), duration);
      }
    }
  }
}

The first thing to notice is that we don’t write a ClassFileTransformer directly. Instead, Byte Buddy’s API creates one for us from a declarative DSL.

The code means “use MetricsTransformer to transform any class you can find” (by default some bootstrap classes are filtered out).

MetricsTransformer is also rather simple. In its only method, transform, we first create two visitors that can look into the bytecode and transform it. These visitors are based on a mechanism called advice, whose details you can find in the documentation. Simply put, we provide classes with static methods annotated with @Advice.OnMethodEnter and @Advice.OnMethodExit. These methods serve as the source of bytecode that is embedded before and after the bytecode of the instrumented method. We can also pass a value from the enter method to the exit method using the @Advice.Enter annotation, which we use to pass the method’s start time.

The first visitor is applied only to methods annotated with @CollectMetrics, the second only to constructors annotated with @CollectMetrics. We need to separate them because Byte Buddy’s advice can’t catch exceptions thrown from constructors, so the exit advice with onThrowable = Throwable.class (and the @Advice.Thrown parameter that goes with it) can only be applied to methods.

Finally, we tell DynamicType.Builder to visit the class bytecode with both visitors; each of them only touches the methods and constructors matched by its matcher.
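Conceptually, an advised method ends up behaving roughly like the hand-written version below: the enter advice runs first, its return value is handed to the exit advice, and, because of onThrowable = Throwable.class, the exit advice runs even when the body throws, after which the exception propagates as usual. This is only an approximation of the bytecode Byte Buddy inlines, and mayFail and the reports map (a stand-in for MetricsCollector) are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative approximation of a method with enter/exit advice (onThrowable = Throwable.class). */
public class AdviceStyleTiming {
    // Hypothetical stand-in for MetricsCollector: method name -> number of reports.
    public static final Map<String, Integer> reports = new ConcurrentHashMap<>();

    static void report(final String methodName, final long duration) {
        reports.merge(methodName, 1, Integer::sum);
    }

    /** Hypothetical method; "fail" decides whether the original body throws. */
    public static void mayFail(final boolean fail) throws Exception {
        final long startTime = System.nanoTime();   // body of the @Advice.OnMethodEnter method
        try {
            if (fail) {
                throw new Exception("boom");        // original method body
            }
        } finally {
            // Body of the @Advice.OnMethodExit method: runs on normal return and on exceptions.
            report("mayFail", System.nanoTime() - startTime);
        }
    }

    public static void main(final String[] args) {
        try {
            mayFail(false);
            mayFail(true);
        } catch (Exception e) {
            // The exception still reaches the caller after the exit code has run.
        }
        System.out.println(reports); // one report per call, including the failing one
    }
}
```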

If we run an application with this agent attached, we will see:

Starting MetricsCollectionByteBuddyAgent
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
public int me.ivanyu.javaagentsdemo.client.ClientMain$Adder.add(int,int)	10 calls	620 ns avg
private static void me.ivanyu.javaagentsdemo.client.ClientMain.withException() throws java.lang.Exception	1 calls	36224 ns avg
me.ivanyu.javaagentsdemo.client.ClientMain$Adder() throws java.lang.InterruptedException	1 calls	1233520342 ns avg
private static void me.ivanyu.javaagentsdemo.client.ClientMain.helloWorld() throws java.lang.InterruptedException	10 calls	123624230 ns avg

(The string representation of methods in the report differs from the Javassist agent’s, but they refer to the same methods. Note also that, unlike with Javassist, withException() is now reported too.)

Conclusion

This was a brief introduction to creating Java agents and modifying bytecode. As you can see, you can do practically anything with bytecode. Some agents do really advanced bytecode modifications (e.g. JRebel, according to this blog post by ZeroTurnaround). Also, there are native agents (mentioned at the beginning), which have control over the JVM itself, not only over the bytecode of classes.

Do you program or use Java agents? Let me know in the comments!

Thank you.