Java agents, Javassist and Byte Buddy

The Java Virtual Machine (JVM) is a great platform: mature and well established. Apart from the many features used by all developers, it has some lower-level ones designed to serve “system” or “tooling” purposes. One example is sun.misc.Unsafe, which gives, e.g., low-level access to memory. Another such feature is agents. Agents are the JVM mechanism that lets external code integrate with a running JVM, including access to bytecode before it’s loaded and executed.

In this post:

  1. The basics of Java agents and Instrumentation API.
  2. An example agent for metrics collection.
  3. Libraries for bytecode manipulation: Javassist and Byte Buddy.

Agents were introduced in Java 1.5, but programming them is rarely part of a JVM programmer’s everyday job. However, if you run JVM code in production, you are very likely using some agents already. Here are several widely used kinds:

  • JMX-HTTP bridges, e.g. Jolokia, which gives access to JMX MBeans over HTTP (very useful for monitoring);
  • profilers, e.g. YourKit or JProfiler;
  • debuggers, namely Java Debug Wire Protocol (JDWP) agent;
  • aspect-oriented programming toolkits, AspectJ in particular;
  • hot code reloading tools like JRebel, which are especially useful in Java EE environments.

There are two types of agents in the JVM: Java agents and native agents. Java agents consist of JVM bytecode (i.e. they are written in one of the JVM languages, most commonly Java). They are packed in JARs and follow a special code organisation convention so that the JVM can use them. Native agents are different: they consist of native code (most commonly compiled from C++) packed in dynamic libraries, use the JVM Tool Interface, and operate at a lower level than Java agents. In particular, they can affect the garbage collector, thread management, locking and synchronisation, etc. Profilers and debuggers use native agents.

In this post, we’re focusing solely on Java agents.

The basics of Java agents

Java agents themselves are very simple. Let’s consider this piece of code:
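Here is a minimal sketch of such an agent. The class name SimpleAgent and the printed messages are our own illustrative choices. The premain and agentmain signatures are the ones the JVM looks for. The Javadoc comments explain when each method runs:

```java
import java.lang.instrument.Instrumentation;

/**
 * A minimal Java agent. It extends no class and implements no interface:
 * the JVM finds the entry points purely by their names and signatures.
 */
public class SimpleAgent {

    /**
     * Called when the agent is specified on the command line with
     * -javaagent, before the application's main method runs.
     */
    public static void premain(String agentArgs, Instrumentation inst) {
        System.out.println("Agent started before main, args: " + agentArgs);
    }

    /**
     * Called when the agent is loaded into an already running JVM
     * (for example, via the Attach API).
     */
    public static void agentmain(String agentArgs, Instrumentation inst) {
        System.out.println("Agent attached to a running JVM, args: " + agentArgs);
    }
}
```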

The agent class doesn’t extend any class or implement any interface; everything works by convention. There are two static methods, premain and agentmain, and the difference between them is explained in the Javadoc comments. An agent can have both of them or only one. Only one of them is executed, depending on when the agent is attached to the JVM. premain runs when the agent is specified at JVM startup, and agentmain runs when the agent is attached to an already running JVM.

The Instrumentation object allows the agent to instrument programs running on the JVM. In particular, it gives access to class bytecode and some other features. We will cover it in more depth later.

Let’s briefly describe the operations an agent can perform:

  • Transformation of class bytecode while the classes are being loaded. This happens when the agent registers one or more ClassFileTransformers.
  • Redefinition: complete replacement of the bytecode of particular classes that are already loaded in the JVM.
  • Retransformation of the bytecode of particular classes (again, by ClassFileTransformers), either when the agent explicitly triggers it or when Redefinition happens.
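These operations map onto methods of Instrumentation. The sketch below assumes the agent is attached at run time and treats agentArgs as a class-name prefix; that convention, like the class name RetransformingAgent, is hypothetical. It registers a no-op transformer that can take part in retransformation, then retransforms the already loaded classes matching the prefix. Redefinition is only mentioned in a comment.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.List;

public class RetransformingAgent {

    public static void agentmain(String agentArgs, Instrumentation inst) {
        // Transformation: from now on, every loaded class passes through this
        // transformer; "true" also lets it take part in retransformation
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                return null; // null = keep the bytecode unchanged
            }
        }, true);

        // Redefinition would instead hand complete new bytecode to
        // inst.redefineClasses(new ClassDefinition(clazz, newBytes)); not shown here

        // Retransformation of already loaded classes; the manifest must
        // declare Can-Retransform-Classes: true
        if (!inst.isRetransformClassesSupported()) {
            return;
        }
        // Hypothetical convention: agentArgs holds a class-name prefix
        String prefix = agentArgs == null ? "" : agentArgs;
        List<Class<?>> targets = new ArrayList<>();
        for (Class<?> clazz : inst.getAllLoadedClasses()) {
            if (clazz.getName().startsWith(prefix) && inst.isModifiableClass(clazz)) {
                targets.add(clazz);
            }
        }
        if (targets.isEmpty()) {
            return;
        }
        try {
            // Runs the retransformation-capable transformers on these classes again
            inst.retransformClasses(targets.toArray(new Class<?>[0]));
        } catch (UnmodifiableClassException e) {
            System.err.println("Cannot retransform: " + e.getMessage());
        }
    }
}
```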

To be a proper Java agent, this class needs to be packed into a JAR with specific attributes in its manifest. This can be done with maven-jar-plugin (assuming we’re using Maven here):

Let me explain these manifest options.

  • Premain-Class sets the class that contains the premain method, much like Main-Class in normal applications.
  • Agent-Class sets the class that contains the agentmain method.
  • Can-Redefine-Classes indicates whether the agent can do Redefinition (false by default).
  • Can-Retransform-Classes indicates whether the agent can do Retransformation (false by default).
  • Can-Set-Native-Method-Prefix indicates whether the agent can set a native method prefix, which is needed to instrument native methods (false by default). We aren’t covering native method instrumentation in this post; see this piece of documentation.
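Maven writes these attributes into META-INF/MANIFEST.MF. To make the format concrete, here is a sketch that builds an equivalent manifest with the standard java.util.jar API. The agent class name com.example.agent.SimpleAgent is a placeholder, and enabling both Can-* flags is our choice for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class AgentManifest {

    /** Builds a manifest carrying the agent attributes listed above. */
    public static Manifest build(String agentClass) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version is required by the JAR specification
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.putValue("Premain-Class", agentClass);
        attrs.putValue("Agent-Class", agentClass);
        attrs.putValue("Can-Redefine-Classes", "true");
        attrs.putValue("Can-Retransform-Classes", "true");
        return manifest;
    }

    /** Renders the manifest in the text format stored in META-INF/MANIFEST.MF. */
    public static String render(Manifest manifest) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(render(build("com.example.agent.SimpleAgent")));
    }
}
```

In a real build, maven-jar-plugin generates this file. The sketch only shows which attributes end up in it.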

Now we package this with mvn package and get agent-1.0.0-SNAPSHOT.jar. If we run a Java application with the additional -javaagent option pointing to this JAR, the agent is attached to the newly started JVM:

Instrumentation API

Of course, we would like to do something more interesting than printing a line of text at application start, and Instrumentation offers much more. Check its documentation for an overview of the features; we will focus on code transformations.

Here’s an example of a very simple ClassFileTransformer:

ClassFileTransformers work at the bytecode level, literally with byte arrays containing class files. Among other arguments, className and classfileBuffer are passed to the transform method. In this simple example, className is printed and no transformations are applied to the bytecode (we indicate this to the instrumentation mechanism by returning null).
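A transformer of this kind can be sketched as follows. The class name ClassNamePrintingTransformer and the message format are illustrative; the transform signature is the one defined by java.lang.instrument.ClassFileTransformer:

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

public class ClassNamePrintingTransformer implements ClassFileTransformer {

    @Override
    public byte[] transform(ClassLoader loader,
                            String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // className uses the internal JVM form, e.g. "java/lang/String"
        System.out.println("Loading class: " + className);
        // Returning null tells the JVM to keep the original bytecode
        return null;
    }
}
```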

Here’s the code to attach this transformer:
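A minimal sketch of that registration in premain follows. To keep the block self-contained, it carries its own small transformer that prints each class name and keeps the bytecode unchanged. The names PrintingAgent and LoggingTransformer are illustrative:

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class PrintingAgent {

    /** Prints the name of each class being loaded and leaves its bytecode alone. */
    static final class LoggingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            System.out.println(className);
            return null;
        }
    }

    public static void premain(String agentArgs, Instrumentation inst) {
        // From now on, every class the JVM loads passes through the transformer
        inst.addTransformer(new LoggingTransformer());
    }
}
```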

When we run an application with this agent attached, we will see:

We can also modify the bytecode directly and return a new version of it, which will then be loaded under the given class name.

However, direct modification of bytecode in byte arrays is clumsy and bug-prone (and out of the scope of this article). Several libraries exist that give a higher-level interface to bytecode. ASM is a powerful but still quite low-level library for bytecode manipulation. More high-level libraries are usually based on ASM, among them cglib, Javassist, Byte Buddy and others (see this great overview on Stack Overflow). We will try Javassist and Byte Buddy.

Metrics collection example

Let’s make bytecode manipulation more interesting by making it concrete. Suppose we want to collect two metrics for particular methods in our code: the average execution time and the total number of calls. (It sounds made up, but it’s fine for demonstration; the post is not about metric collection after all :)

The complete code of the example is in this repository on GitHub. It’s separated into three subprojects: agent contains the agents’ code, client contains the client’s code, and common contains the code shared between agent and client.

First, we need a way to mark methods we’re interested in. For example, with annotations:
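A sketch of such an annotation follows. The name CollectMetrics comes from the text. The RUNTIME retention and the METHOD and CONSTRUCTOR targets are our assumptions, chosen because both methods and constructors are instrumented later:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Marks methods and constructors whose calls should be timed and counted.
 * RUNTIME retention keeps the annotation in the class file and also makes
 * it visible to reflection.
 */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.CONSTRUCTOR})
public @interface CollectMetrics {
}
```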

Now, the class that collects metrics:
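Here is a minimal sketch of such a collector. The names MetricsCollector, report and getEntries appear in the text. Several details are our own assumptions:
  • the Entry type;
  • the parameters of report (a method name plus the elapsed time in nanoseconds);
  • the use of ConcurrentHashMap and LongAdder.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public final class MetricsCollector {

    /** Per-method statistics: number of calls and accumulated time. */
    public static final class Entry {
        private final LongAdder calls = new LongAdder();
        private final LongAdder totalNanos = new LongAdder();

        public long getCalls() {
            return calls.sum();
        }

        /** Average time per call in nanoseconds (approximate under concurrent updates). */
        public double getAverageNanos() {
            long n = calls.sum();
            return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
        }
    }

    // Static state, so the agent's injected code and the client share one collector
    private static final Map<String, Entry> ENTRIES = new ConcurrentHashMap<>();

    private MetricsCollector() {
    }

    /** Records one call of the given method that took elapsedNanos. */
    public static void report(String methodName, long elapsedNanos) {
        Entry entry = ENTRIES.computeIfAbsent(methodName, name -> new Entry());
        entry.calls.increment();
        entry.totalNanos.add(elapsedNanos);
    }

    /** Read-only view of the statistics collected so far, keyed by method name. */
    public static Map<String, Entry> getEntries() {
        return Collections.unmodifiableMap(ENTRIES);
    }
}
```

Keeping report’s parameters to a String and a primitive long suits the Javassist variant, whose compiler does not auto-box.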

All the class internals are static for the sake of simplicity of sharing the collector between the agent and the client.

Now we can annotate the methods we’re interested in with the @CollectMetrics annotation and access their metrics later using MetricsCollector.getEntries():

Finally, it’s time to instrument these methods’ bytecode with agents. Let’s start with Javassist.

The Javassist agent

The Javassist library gives a convenient way to manipulate bytecode. Unlike many other libraries of its sort, it allows you to do this not only at the bytecode level but also at the source code level, by leveraging its own compiler: it literally compiles strings of Java code at run time. Unfortunately, this compiler lacks some basic features like auto-boxing, generics or even type checks, but this won’t be a problem in this case. We will work entirely at the source code level.

Javassist comes into play inside a ClassFileTransformer. Here is a transformer that instruments methods annotated with @CollectMetrics with metric collection code:

The key idea is the following: for every class being loaded, find those of its methods and constructors that are annotated with @CollectMetrics. Then modify them:

  1. In the beginning, record the current time in the local variable $_traceTimeStart.
  2. In the end, call MetricsCollector.report with the current function name and time elapsed (the difference between the current time and $_traceTimeStart).

We construct the source code for this as strings; Javassist does the rest. However, there is a problem: the reporting code won’t be executed if an exception is thrown. This is difficult to overcome at the source code level in Javassist (unfortunately, Javassist can’t always magically weave additional code into bytecode). Luckily, this problem doesn’t exist with Byte Buddy.
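To see why, it helps to write out by hand roughly how a method behaves once timing code is inserted at its beginning and reporting code at its end. The sketch below is a conceptual equivalent in plain Java, not actual Javassist output. The class name AppendedReportDemo is hypothetical, and report is a stand-in for MetricsCollector.report:

```java
import java.util.ArrayList;
import java.util.List;

public class AppendedReportDemo {

    // Hypothetical stand-in for MetricsCollector.report: just records the call
    static final List<String> REPORTS = new ArrayList<>();

    static void report(String method, long elapsedNanos) {
        REPORTS.add(method);
    }

    static int doubleIt(int x) {
        long traceTimeStart = System.nanoTime();                 // inserted at the beginning
        if (x < 0) {
            throw new IllegalArgumentException("negative input"); // original body
        }
        int result = x * 2;                                       // original body
        report("doubleIt(int)", System.nanoTime() - traceTimeStart); // inserted at the end
        return result;
    }
}
```

Calling doubleIt(2) and then doubleIt(-1) leaves a single entry in REPORTS. The call that throws jumps past the appended reporting code.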

When we run ClientMain with this agent attached, we will see:

(In a “normal” application, such metrics would of course be reported via JMX or HTTP.)

The Byte Buddy agent

In contrast to Javassist, Byte Buddy doesn’t work at the source code level; it works with bytecode directly. However, it provides a convenient DSL for this.

Let’s consider the same metric collecting agent made with Byte Buddy:

The first thing you notice is that we don’t write a ClassFileTransformer directly. Instead, Byte Buddy’s API lets us create one using its declarative DSL.

The code means “use MetricsTransformer to transform any class you can find” (by default some bootstrap classes are filtered out).

MetricsTransformer is also rather simple. In its only method, transform, we first create two visitors that can look into the bytecode and transform it. These visitors are based on a mechanism called advice, whose details you can find in the documentation. To put it simply, we give Byte Buddy classes with static methods annotated with @Advice.OnMethodEnter and @Advice.OnMethodExit. These methods serve as the source of bytecode that is embedded before and after the bytecode of the instrumented method. Plus, we can “pass” values from the enter method to the exit method using the @Advice.Enter annotation, which we use for passing the method enter time.

The first visitor is applied only to methods (annotated with @CollectMetrics), the second only to constructors (again, annotated with @CollectMetrics). We need to separate them because Byte Buddy’s exit advice cannot catch exceptions thrown from a constructor. The only difference between the two visitors is (onThrowable = Throwable.class), which works for methods but not for constructors.
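The effect of onThrowable = Throwable.class can be pictured in plain Java: the exit code behaves as if the original body were wrapped in try/finally. The sketch below only illustrates these semantics; it is not Byte Buddy code. The class WrappedReportDemo and the names enter, exit and REPORTS are hypothetical stand-ins for the advice methods and MetricsCollector:

```java
import java.util.ArrayList;
import java.util.List;

public class WrappedReportDemo {

    static final List<String> REPORTS = new ArrayList<>();

    // Plays the role of the @Advice.OnMethodEnter method: its return value
    // is what the exit method receives through @Advice.Enter
    static long enter() {
        return System.nanoTime();
    }

    // Plays the role of the @Advice.OnMethodExit(onThrowable = Throwable.class) method
    static void exit(String method, long enterTime) {
        REPORTS.add(method + " took " + (System.nanoTime() - enterTime) + " ns");
    }

    static int doubleIt(int x) {
        long enterTime = enter();
        try {
            if (x < 0) {
                throw new IllegalArgumentException("negative input");
            }
            return x * 2; // original body
        } finally {
            exit("doubleIt(int)", enterTime); // runs on normal return and on exception
        }
    }
}
```

After doubleIt(2) and a failing doubleIt(-1), REPORTS holds two entries. In Java source, a constructor cannot wrap its super(...) call in try/finally like this, which is consistent with the restriction on constructor advice.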

Then, we declare that DynamicType.Builder should visit every method of the class bytecode using the two visitors.

If we run an application with this agent attached, we will see:

(The string representation of methods in the report differs from the Javassist agent’s, but essentially they’re the same.)

Conclusion

This was a brief introduction to creating Java agents and modifying bytecode. As you can see, you can do practically anything with bytecode. Some agents perform really advanced bytecode modifications (e.g. JRebel, according to this blog post by ZeroTurnaround). There are also native agents (mentioned at the beginning), which have control over the JVM itself, not only over the bytecode of classes.

Do you program or use Java agents? Let me know in the comments!

Thank you.