Friday, January 9, 2015

Make agents, not frameworks

Ever since their introduction, Java annotations have become an integral part of the APIs of larger application frameworks. Good examples of such APIs are those of Spring or Hibernate, where adding a few lines of annotation code implements quite complex program logic. And while one can argue about the drawbacks of these particular APIs, most developers would agree that this form of declarative programming is quite expressive when used right. However, only a few developers choose to implement annotation-based APIs for their own frameworks or application middleware, mainly because such APIs are regarded as difficult to realize. In the following article, I want to convince you that, on the contrary, such APIs are quite easy to implement and, with the right tools, do not require any special knowledge of Java internals.

One problem that becomes obvious when implementing an annotation-based API is that the Java runtime does not act on annotations by itself. As a consequence, a user-defined annotation carries no behavior merely by being declared. For example, consider that we wanted to define a @Log annotation that logs each invocation of an annotated method:

class Service {
  @Log
  void doSomething() {
    // do something ...
  }
}

As the @Log annotation is not capable of executing program logic by its mere existence, it would be up to the annotation's user to perform the requested logging. Obviously, this renders the annotation almost useless as we cannot invoke the doSomething method and expect to observe a corresponding statement in our log. So far, the annotation only serves as a marker without contributing any program logic.
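To make this limitation concrete, the following is a minimal sketch of how such an annotation could be declared and why it stays inert. The retention policy, the target, the collecting Logger stand-in and the InertAnnotationDemo class are assumptions made for illustration; they are not taken from a real framework.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class InertAnnotationDemo {
  // True if a method of Service with the given name carries @Log.
  static boolean isMarked(String methodName) {
    for (Method method : Service.class.getDeclaredMethods()) {
      if (method.getName().equals(methodName)) {
        return method.isAnnotationPresent(Log.class);
      }
    }
    return false;
  }

  public static void main(String[] args) {
    new Service().doSomething();
    // The marker is visible to reflection ...
    System.out.println("annotated: " + isMarked("doSomething"));
    // ... but invoking the method did not log anything.
    System.out.println("messages logged: " + Logger.MESSAGES.size());
  }
}

// Assumed declaration: retained at runtime so that tools can discover it reflectively.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Log { }

// Hypothetical stand-in for a real logging facility; it only collects messages.
class Logger {
  static final List<String> MESSAGES = new ArrayList<>();
  static void log(String message) { MESSAGES.add(message); }
}

class Service {
  @Log
  void doSomething() {
    // do something ...
  }
}
```

Running the sketch reports that the method is annotated while the list of logged messages stays empty: something else has to read the marker and act on it.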

Bridging the gap

In order to overcome this glaring limitation, many annotation-driven frameworks use subclassing in combination with method overriding to implement the logic that is associated with a particular annotation. This is commonly referred to as subclass instrumentation. For the proposed @Log annotation, subclass instrumentation would result in creating a class similar to the following LoggingService:

class LoggingService extends Service {
  void doSomething() {
    Logger.log("doSomething() was called");
    super.doSomething(); // then run the original logic
  }
}

Of course, the above class does not normally need to be implemented explicitly. Instead, it is a popular approach to generate such classes only at runtime using a code generation library such as cglib or Javassist. Both of these libraries offer simple APIs for creating program-enhancing subclasses. As a nice side effect of delaying the class's creation until runtime, the proposed logging framework would be usable without any specific preparation and would always stay in sync with the user's code. Neither would be the case if the class were created in a more explicit manner, for example by writing a Java source file during a build process.

But, does it scale?

However, this solution brings along another drawback. By placing the annotation's logic into the generated subclass, one can no longer instantiate the example Service class by its constructor. Otherwise, invocations of annotated methods would still not be logged, because calling the constructor does not create an instance of the required subclass. And to make things worse, when using the suggested approach of runtime generation, the LoggingService cannot be instantiated directly either, as the Java compiler does not know about the runtime-generated class.

For this reason, frameworks such as Spring or Hibernate use object factories and do not allow for direct instantiation of objects that are considered to be a part of their framework logic. With Spring, creating objects by a factory comes naturally, as all of Spring's objects are already managed beans which are to be created by the framework in the first place. Similarly, most Hibernate entities are created as a result of a query and are thus not instantiated explicitly. However, when for example saving an entity instance that is not yet represented in the database, a user of Hibernate needs to substitute the recently saved instance with the instance that Hibernate returns after storage. Judging from the questions asked about Hibernate, ignoring this substitution is a common beginner's mistake. Other than that, with these factories in place, subclass instrumentation happens mostly transparently to a framework user because Java's type system implies that a subclass can substitute any of its superclasses. Hence, an instance of LoggingService can be used everywhere a user would expect an instance of the user-defined Service class.
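The interplay of factory and subclass can be sketched in plain Java. This is only an illustration under stated assumptions: ServiceFactory, FactoryDemo and the collecting Logger are hypothetical names, and LoggingService is written by hand here, whereas a framework would generate it at runtime.

```java
import java.util.ArrayList;
import java.util.List;

class FactoryDemo {
  public static void main(String[] args) {
    Service viaFactory = ServiceFactory.create(); // typed as Service, but logs
    Service viaConstructor = new Service();       // bypasses the subclass, stays silent
    viaFactory.doSomething();
    viaConstructor.doSomething();
    System.out.println(Logger.MESSAGES);          // only one of the two calls was logged
  }
}

// Hypothetical stand-in for a real logging facility; it only collects messages.
class Logger {
  static final List<String> MESSAGES = new ArrayList<>();
  static void log(String message) { MESSAGES.add(message); }
}

class Service {
  void doSomething() {
    // do something ...
  }
}

// Written by hand for this sketch; a framework would create this class at runtime.
class LoggingService extends Service {
  @Override
  void doSomething() {
    Logger.log("doSomething() was called");
    super.doSomething();
  }
}

// Hypothetical factory: the only place that knows about the instrumented subclass.
class ServiceFactory {
  static Service create() {
    return new LoggingService();
  }
}
```

The caller only ever sees the Service type, which is what makes the substitution transparent, but any instance created with new Service() silently loses the logging behavior.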

Unfortunately, this proven method of instance factories proves difficult for implementing the proposed @Log annotation, as it would entail using a factory for every single instance of a potentially annotated class. Obviously, this would add a tremendous amount of boilerplate code. We would probably even create more boilerplate than we avoid by not hard-coding the logging instruction into the methods. Also, the accidental use of a constructor would introduce subtle bugs into a Java program because the annotations on such instances would no longer be treated as we expect them to be. As another problem, factories are not easily composable. What if we wanted to add a @Log annotation to a class that already is a Hibernate bean? This sounds trivial but would require extensive configuration to merge both frameworks' factories. And finally, the resulting factory-bloated code would not be pretty to read, and migrating to the framework would be costly. This is where instrumentation with Java agents comes into play. This underestimated form of instrumentation offers a great alternative to the discussed subclass instrumentation.

A simple agent

A Java agent is represented by a simple jar file. Similarly to normal Java programs, Java agents define some class as an entry point. This class is then expected to define a static method which is invoked before the actual Java program's main method is called:

class MyAgent {
  public static void premain(String args, Instrumentation inst) {
    // implement agent here ...
  }
}

The most interesting part when dealing with Java agents is the premain method's second argument which represents an instance of the Instrumentation interface. This interface offers a way of hooking into Java's class loading process by defining a ClassFileTransformer. With such transformers, we are able to enhance any class of a Java program before its first use.

While using this API might sound straightforward at first, it imposes a new challenge. Class file transformations are executed by altering compiled Java classes, which are represented as Java byte code. As a matter of fact, the Java virtual machine has no notion of what Java, the programming language, is. Instead, it only deals with this byte code. And it is also thanks to this byte code abstraction that the JVM is easily capable of running other languages such as Scala or Groovy. As a consequence, a registered class file transformer only offers to transform a given byte (code) array into another one.
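As a rough illustration of this byte-array-in, byte-array-out contract, here is a minimal sketch that uses only the JDK's java.lang.instrument package. The class names ObservingTransformer, ObservingAgent and TransformerDemo are made up for this example, and the transformer deliberately changes nothing: it only records which classes pass through it.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class TransformerDemo {
  public static void main(String[] args) {
    // Outside of an agent, the transformer can be exercised by calling it directly.
    ObservingTransformer transformer = new ObservingTransformer();
    byte[] result = transformer.transform(null, "com/example/Service", null, null, new byte[0]);
    System.out.println(transformer.seen + ", changed: " + (result != null));
  }
}

// Observes every class that is offered for transformation but leaves its byte code untouched.
class ObservingTransformer implements ClassFileTransformer {
  final List<String> seen = new CopyOnWriteArrayList<>();

  @Override
  public byte[] transform(ClassLoader loader,
                          String className, // internal form, e.g. "com/example/Service"
                          Class<?> classBeingRedefined,
                          ProtectionDomain protectionDomain,
                          byte[] classfileBuffer) {
    seen.add(className);
    return null; // null tells the JVM that the class was not changed
  }
}

class ObservingAgent {
  // Invoked by the JVM before main when started with -javaagent:<path to agent jar>;
  // the jar's manifest must name this class in its Premain-Class attribute.
  public static void premain(String args, Instrumentation inst) {
    inst.addTransformer(new ObservingTransformer());
  }
}
```

A transformer that actually adds behavior would have to return a modified copy of classfileBuffer, which is exactly the part that requires working with byte code.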

Even though libraries such as ASM or BCEL offer an easy API for manipulating compiled Java classes, few developers are experienced in working with raw byte code. To make things worse, getting byte code manipulation right is often cumbersome, and even small mistakes are punished by the virtual machine with a nasty and unrecoverable VerifyError. Fortunately, there are better, easier ways to manipulate byte code.

Byte Buddy, a library that I wrote and maintain, provides a simple API both for manipulating compiled Java classes and for creating Java agents. In some aspects, Byte Buddy is a code generation library similar to cglib and Javassist. However, unlike those libraries, Byte Buddy offers a unified API for implementing subclasses and for redefining existing classes. For this article, however, we only want to look into redefining a class using a Java agent. Curious readers are referred to Byte Buddy's webpage, which offers a detailed tutorial on its full feature set.

Using Byte Buddy for a simple agent

One way that Byte Buddy offers for defining an instrumentation is dependency injection. Here, an interceptor class, which can be any plain old Java object, simply requests any required information by annotations on its parameters. For example, from Byte Buddy's @Origin annotation on a parameter of the Method type, Byte Buddy deduces that the interceptor wants to know about the method that is being intercepted. This way, we can define a generic interceptor that is always aware of the method that is being intercepted:

class LogInterceptor {
  static void log(@Origin Method method) {
    Logger.log(method + " was called");
  }
}

Of course, Byte Buddy ships with many more annotations.

But how does this interceptor represent the logic that we intended for the proposed logging framework? So far, we have only defined an interceptor that logs the method call. What is missing is the subsequent invocation of the method's original code. Fortunately, Byte Buddy's instrumentations are composable. First, we define a MethodDelegation to the recently defined LogInterceptor, which by default invokes the interceptor's static method on every call of a method. Starting from this, we can then compose the delegation with a subsequent call of the original method's code, which is represented by SuperMethodCall:

MethodDelegation.to(LogInterceptor.class)
  .andThen(SuperMethodCall.INSTANCE)

Finally, we need to tell Byte Buddy which methods are to be intercepted by the specified instrumentation. As we explained before, we want this instrumentation to apply to any method that is annotated with @Log. Within Byte Buddy, such a property of a method can be identified using an ElementMatcher, which is similar to a Java 8 predicate. In the static utility class ElementMatchers, we can already find a suitable matcher for identifying methods with a given annotation: ElementMatchers.isAnnotatedWith(Log.class).
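To illustrate the comparison with a Java 8 predicate, the following sketch expresses the same selection with nothing but java.util.function.Predicate and reflection. It is an analogy only and does not use Byte Buddy's API; MatcherAnalogyDemo, IS_LOGGED, matching and the assumed declaration of @Log are names introduced for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class MatcherAnalogyDemo {
  // Plain-Java counterpart of "is annotated with @Log"; not Byte Buddy's API.
  static final Predicate<Method> IS_LOGGED = method -> method.isAnnotationPresent(Log.class);

  // Sorted names of all methods declared by the given type that satisfy the predicate.
  static List<String> matching(Class<?> type, Predicate<Method> matcher) {
    return Arrays.stream(type.getDeclaredMethods())
        .filter(matcher)
        .map(Method::getName)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(matching(Service.class, IS_LOGGED));
  }
}

// Assumed declaration: runtime retention is required for the reflective check above.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Log { }

class Service {
  @Log void doSomething() { }
  void doSomethingElse() { }
}
```

Only doSomething is selected. Like predicates, such conditions can be combined, for example with and(...) or negate(), to narrow down the set of methods further.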

With all this, we can now define an agent that implements the suggested logging framework. For Java agents, Byte Buddy provides a utility API that builds on the class modification API that we just discussed. Similarly to this latter API, it is designed as a domain specific language such that its meaning should be easily understood only by looking at the implementation. As we can see, defining such an agent only requires a few lines of code:

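class LogAgent {
  public static void premain(String args, Instrumentation inst) {
    new AgentBuilder.Default()
      .rebase(ElementMatchers.any()) // called type(...) in later Byte Buddy releases
      .transform(builder -> builder
        .method(ElementMatchers.isAnnotatedWith(Log.class))
        .intercept(MethodDelegation.to(LogInterceptor.class).andThen(SuperMethodCall.INSTANCE)))
      .installOn(inst);
  }
}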

Note that this minimal Java agent would not interfere with the remainder of the application as any executing code observes the instrumented Java classes just as if the logging statement was hard-coded into any annotated method.

What about real life?

Of course, the presented agent-based logger is a trivial example. And often, broadly scoped frameworks that offer similar features out of the box, such as Spring or Dropwizard, are great. However, such frameworks are equally often opinionated about how to approach programming problems. For a large number of software applications, this might not be a problem. And yet, sometimes these opinions get in the way of something bigger. Then, working around a framework's assumptions on how to do things can cause more than just a few problems: it often produces leaky abstractions and might result in exploding costs for software maintenance. This is especially true when applications grow and change over time and diverge in their needs from what an underlying framework offers.

In contrast, when composing more specialized frameworks or libraries in a pick-and-mix fashion, one simply replaces a problematic component with another one. And if this does not work either, one can even implement a custom solution without interfering with the rest of the application. As we learned, this seems difficult to realize on the JVM, mainly as a consequence of Java's strict type system. Using Java agents, it is however very much possible to overcome these typing constraints.

I have come to the point where I believe that at least any cross-cutting concern should be covered by an agent-driven, specialized library instead of by a built-in module of a monolithic framework. And I really wish more applications would consider this approach. In the most trivial case, it is enough to use an agent to register listeners on methods of interest and to take it from there. This indirect approach of composing code modules avoids the tight coupling that I observe in a large fraction of the Java applications I come across. As a nice side effect, it also makes testing very easy. And, just as when running tests, starting an application without the agent makes it possible to selectively disable a certain application feature such as logging. All this works without changing a line of code and without crashing the application, as the JVM simply ignores annotations that it cannot resolve at runtime. Security, logging, caching: there are many reasons why these topics and more should be handled in the suggested manner. Therefore, sometimes, make agents, not frameworks.

5 comments:

  1. Thanks, this is very interesting and sounds useful, especially for DSL builders. I'll make a mental bookmark for future reference.

  2. Great topic. I'm just getting into writing some class manipulation code. I have a few questions:

    1) When using AgentBuilder, is Byte-Buddy replacing the Service class bytecode with proxy class bytecode, or is it injecting the logger code directly into the method?

    2) How does this affect debugging? I could see manipulation of bytecode interfering with the bytecode->source line mappings.

    3) How do you register the agents when running unit tests? Do you hook into a specific Maven lifecycle? I could see this being tricky when running JUnit tests directly from Eclipse. You'd have to register the agent jar in the Run Configurations > Arguments, but if you're modifying agent code you'd have to rebuild the jar.

    I'll poke around the Byte-Buddy code base, and maybe I can answer some of these questions myself.

    Overall great post. I initially was steering away from using Agents in my current project, but now I may take another look.

    1. Thanks!

      Byte Buddy renames the original method and also copies the debugging information. The user therefore does not notice the instrumentation when setting a breakpoint. The only trace is left by an additional synthetic method frame.

      By this approach it is possible to invoke the original method conditionally or multiple times.

      For using an agent in a unit test, Byte Buddy ships with a programmatic attacher: simply call ByteBuddyAgent.installOnOpenJDK().

  3. Awesome article! I really enjoyed reading it.

    Do you know if it is possible to use the same concept (ByteBuddyAgent.installOnOpenJDK) to intercept all new object creations? This way, I could do dependency injection in beans which are not managed by Spring.

    I know this can also be done using Spring and AOP, but then you always have to declare an agent, which makes development, unit testing, etc. much harder. (I know you can specify an agent by default in IntelliJ, but still...) I really like the ByteBuddyAgent where you just say: ByteBuddyAgent.installOnOpenJDK

    Congrats and keep up the good work!

    1. Thank you, I appreciate the feedback.

      Yes, you can intercept constructor calls just the same way as method calls. However, make sure to invoke the super constructor every time BEFORE you intercept the call. You can also do this afterwards, but then you cannot access the "this" instance in your interceptor.