Mittwoch, 10. Mai 2017

Yet another Jigsaw opinion piece

In the past weeks there has been a heated debate around the imminent release of Java 9 and its most famous feature: the Java platform module system – the JPMS which is better known under its project umbrella‘s name Jigsaw. The module system is introduced into the Java ecosystem in form of a formal specification process a JSR which needs to be approved in its final form by its expert group. Among other members of this expert group, representatives of Red Hat and IBM have now voted to reject Java‘s module system in the first ballot which they believe is not yet ready for production.  

What‘s the fuzz all about?

Even today, Java developers are widely familiar with modularity. Build systems like Maven organize code as modules that are compiled against a declared set of dependencies. Only at runtime, these modules are put together on the class path where these compile-time module boundaries vanish. With Jigsaw, the module path is offered as an alternative to this class path for which the JVM retains such compile-time boundaries at runtime. By not using this module path, applications should function just as before. But this comes with the exception of applications that rely on APIs internal to the JVM. The Java standard library is always loaded as a collection of modules, even if the class path is used exclusively such that internal Java APIs are no longer accessible.

This latter limitation in compatibility has raised some concerns among the maintainers of both libraries and end-user applications. And in this context it can be a bit surprising that the recent objections do not relate too much to these concerns. While mentioning issues around compatibility, both Red Hat and IBM predominantly argue that the JPMS requires further extension to allow for a better integration with existing module systems such as JBoss modules and OSGi.

What problem does still need solving?

By jar hell, developers typically describe a situation where a Java application would require two different versions of a library to satisfy different transitive dependencies. Using the class path, this is impossible as one version of a library shadows a second copy. If a class of a given name is loaded for the first time, the system class loader scans jar files in their command line order and loads the first class file it discovers. In the worst case, this can result in Frankenstein functionality if the shadowed jar file contains some exclusive classes that link with classes of the shadowing jar. But more typically, it results in a runtime failure once a feature dependent on a specific version is triggered.

With OSGi and JBoss modules, this problem can be partially resolved. The latter module systems allow to load a library by each its own class loader, thus avoiding the system class loader that is responsible for the class path. With this approach, multiple versions of the same class can coexist by isolation within separate class loaders. Doing so, it is for example possible for two libraries to both depend on their specific version of the commonly breaking Guava API. With class loader isolation, any library would delegate calls to its required version when loading dependent classes.

When using the module path, the JPMS does not (currently) apply such class loader isolation. This means that jar hell is not solved by Java 9. In contrast to using the class path, the JVM does however detect the described version conflict and fails the application at startup, rather than speculating on accidental compatibility. To enforce this constraint, every Java package name is now exclusive to a specific module or the class path. Therefore, it is not possible for two modules to share a package. This restriction also holds if a private package is not meant to be exposed what is considered to be another flaw of the current module design by Jigsaw‘s critics.

A missed chance to escape jar hell?

For class loader isolation to work, it is necessary that versions of the same module never interact. And while two such versions would of course never interact directly, it is unfortunately more than common that two versions are part of the public API of different modules. For example, if two libraries return instances of Guava‘s Function type, a version conflict between each module‘s Guava version can no longer be solved using class loader isolation, even if the Function type did not change between those versions. At runtime, any loaded class is described as a tuple of its name and class loader but since two class loaders do now offer the Function type, which one should be resolved?

This described problem, as a matter of fact, cannot be solved by a module system. Instead, a module system can discover this conflict and inform the user of the need of an explicit resolution. This is accomplished by the current implementation of the JPMS and of course both OSGi and JBoss modules. At the end of the day, version conflicts can only be avoided by evolving APIs in a compatible manner.

Is Jigsaw too simple?

Despite the remaining limitations of a class loader-isolating module system, the current argument against Jigsaw is mainly revolving around this item. Additionally, the expert group members that reject Jigsaw point out the lacking support for circular module dependencies ("module A depends on B depends on C depends on A") and the inability to alter the module graph after its creation.

From a technical perspective, it would of course be possible to add these features. As a matter of fact, Java 9 already ships with a module builder API that allows for loading modules with exclusive class loaders. There is no technical limitation in choosing to retain a single class loader for the module path; rather this decision is considered to be the responsible choice for the JVM by Oracle. And before diving deeper into the arguments, I want to state that I fully agree with the company’s reasoning.

What is wrong with class loader isolation?

As mentioned before, even with class loader isolation, manual version management can often not be avoided. Also, library authors that rely on common APIs with version incompatibilities such as Guava do increasingly shade such dependencies. When shading, the code of a library is copied into a separate name space, thus allowing an application to refer to “its version“ by different names instead of by different class loaders. This approach has of course flaws of its own, especially when a shaded dependency uses JNI. On the other hand, this approach overcomes the just mentioned shortcoming of class loader isolation when using libraries with conflicting shared dependencies. Also, by shading a common dependency, a library author relieves its users from potential conflicts independently of a deployment method.

Allowing for circular dependencies would neither impose a big technical challenge. However, cyclic dependencies are rather uncommon and many build systems like Maven do neither support them. Typically, cyclic dependencies can be refactored to non-cyclic ones by splitting up at least one module into implementation and API. In this context, if a feature seems to be of such little common concern, I do not think corner cases justify its addition, especially when the class path still serves as a backup. And if this decision turns out to be wrong, cyclic dependencies can always be enabled in a future release. Taking this feature away would however not be possible.

Finally, dynamic modules render a feature that could be useful to more than a few applications. When you require the dynamic redeployment of modules with an active life cycle, from my experience in my last project, OSGi is a very good choice. That said, most applications are static and do not have a good reason for its use. But by adding support for a dynamic module graph, the complexity of this feature would translate into the JPMS. Therefore, I think it is the right decision to leave this feature out for now and wait until its use is better understood. Naturally, an approachable module system increases adoption.

Compatibility first.

Does this incompatibility mean the end for OSGi and JBoss modules? Of course not. Quite to the contrary, the introduction of standardized module descriptors gives opportunity to existing module systems. Missing manifest headers to describe bundles is one of the major pain points when using OSGi due to a significant number of libraries that do not consider the proprietary module descriptor. With the introduction of a standardized module descriptor, existing module systems can ease this limitation by using the latter descriptor as a secondary source for a module’s description.

I do not doubt for a second that Red Hat and IBM rejected the JSR with their best intentions. At the same time, I cannot agree with the criticism on the lacking reach of the module system. In my opinion, the existing changes are sufficiently challenging for the Java ecosystem to adopt and especially a last minute introduction of class loader isolation bears the potential of unwanted surprise. In this light, I find the arguments made against the current state of Jigsaw inconsistent as it criticizes the complexity of transition to modules but also demands its extension.

There is no perfect module system

Personally, I think that the current proposal for the JPMS bears two big challenges. Unfortunately enough, they got in the background due to the recent discussion.

Automatic modules

Without a module descriptor, modular code can only refer to a non-modular jar file in form of a so-called automatic module. Automatic modules do not impose any restrictions and are named by their jar file. This works well for developers of end user applications which never release their code for use by another application. Library developers do however lack a stable module name to refer to their dependant automatic modules. If released, they would rely on stable file names for their dependencies which are difficult to assume.

For the adoption of Jigsaw, this would imply a bottom-up approach where any library author can only modularize their software after all dependant code was already modularized. To ease the transition, a manifest entry was added which allows to publish a jar with a stable automatic module name without the need to modularize code or even migrate to Java 9. This allows other libraries users that depend on this first library with a stable name to modularize their code, thus breaking through the bottom-up requirement.

I think it is essential to allow library maintainers to state an explicit module name before their code is migrated to fully use the JPMS and I consider this a more than adequate way to deal with this problem which does unlikely offer a better solution.

Reflection and accessibility

With Jigsaw, it is no longer allowed to access non-public, non-exported members using reflection what is an opportunity that many frameworks currently assume. Of course, with a security manager being set, such access can be impossible even in today’s Java releases but since security managers are used so seldom, this is not thought of much. With Jigsaw, this default is reversed where one needs to explicitly open packages for such reflective access, therefore affecting many Java applications.

Generally, I think that Jigsaw’s encapsulation is a better default than the current general openness. If I want to give Hibernate access to my beans, the JPMS allows me to open my beans to Hibernate only by a qualified export. With a security manager, controlling such fine grained access was difficult if not impossible to implement. However, this transition will induce a lot of growing pain and many libraries are not maintained actively enough to survive adopting these new requirements. Thus, adding this restriction will definitely kill off some libraries that would otherwise still provide a value.

Also, there are use cases of reflection that are still uncovered. For the mocking library Mockito (that I help maintain) we do for example need a way of defining classes in any class loader. This was and still is only possible by the use of internal Java APIs for which no alternative is yet offered. As Mockito is only used in test environments, security should not be of concern in this context. But thanks to the retained openness of sun.misc.Unsafe on which we already rely for instantiating mock classes without constructor calls, we can simply open these APIs by changing their accessibility using its direct memory APIs.

This is of course no good enough solution for the years to come but I am convinced that those concerns can be addressed before removing the Unsafe class entirely. As one possibility, the JVM could be extended with a test module that needs to be resolved explicitly on the command line and which allows such extended access. Another option would be to require the attachment of a Java agent by any test runner because of their ability to break through module barriers. But as for now, any maintained software has an opportunity to resolve its non-standard Java use and continue the discussion on missing APIs in the upcoming years.

Finding consensus.

Considering the stereotype of the socially anxious computer nerd, software development can be a rather emotional business. Oracle has always been a company that Java developers love to hate and the current discussion partly jumps on this bandwagon. Looking at the success of Java as a language and a platform, I do however think that Oracle deserves credit for its objectively good job in its stewardship. Breaking software today with future success in mind is a delicate and grateless task. Anybody who refactored correct but complex code should be sympathetic to this challenge.

Project Jigsaw has often been criticized for being an unnecessary effort and I admit that this thought had crossed my own mind. Yet, it is thanks to the module systems that dead weight like CORBA or RMI can finally be removed from the JVM. With the implied reduction in size of modular Java applications, the JVM has become more attractive for the use within containerized applications and cloud computing what is surely no coincidence given Oracle’s market strategy. And while it would of course be possible to further postpone this effort to a later Java release, the JVM must address the removal of functionality at some point. Now is as good a time as any.

To ease the upcoming transition, it is important to keep the breaking changes down to a minimum. Therefore, I am convinced that extending the scope of Jigsaw is not in the best interest of the broader Java community. Many of the rejecting votes of the recent ballot asked for the involved parties to find consensus on the outstanding issues. Unfortunately, the features in question can either be implemented or discarded where consensus can only be reached by one party giving up their position.

With the typical Java application in mind, I do hope that Oracle does not answer to the demands with a scope extension only to secure a successful vote on the Jigsaw JSR. Rather, I want to appeal to the expert group members who were voting against the JSR to reconsider their vote with the needs of the entire Java ecosystem in mind where the requirements of existing enterprise module solutions are only one factor among many. With the broad usage of Java, ranging from business applications to low-latency systems, it is only natural that different parties identify different priorities for the evolution of the platform. I am convinced that Oracle has found a common denominator for a module system that serves most users.

4 Kommentare:

  1. As a member of the EG, I would point out that we do not have a vote, or any actual power. That lies with the Spec Lead and the Executive Committee.

    Rest assured though, I am using my voice in the EG to further the needs of the entire Java community rather than any narrow constituency -- I believe the same is true of Red Hat and IBM. However I don't agree that pushing forward with the JSR in its current state actually serves the needs of the Java community.

    1. This blog posting was written to explain why I disagree. There are few arguments that do not relate to integration with existing module systems and I do not think this should be the priority, given the wide range of Java usages and how this integration could interfere with this usage.

  2. Good article! I just shared it in Linked-In

  3. I reviewed a lot of Jigsaw posts and mails in the mailing list, still haven't found a single good argument against module isolation.

    Arguments I saw:
    It is hard
    We don't have time
    It won't work for 100% cases
    Classloaders will be different

    So far, only second argument looks like a true one. But having it wrong just to be on time is worse.