Software interoperability: boosting high-level productive programming languages with efficient low-level ones (Java-C/C++)



When developing and maintaining software, we frequently face two types of problems, both of which can be severe. They arise whether we are implementing a command-line (CLI) processing program, a typical desktop application with a GUI, or a web application acting as a service server.

The first is a lack of expressiveness in a specific problem domain: the functionality we need is inconvenient to use, either because we have to implement it from scratch or because our language or framework is not the best fit for that kind of use. This is very common. We develop an application in one language and eventually need to implement business logic strongly driven by, for example, machine learning and artificial intelligence, only to find that our language lacks the libraries and capabilities to build all the necessary machinery, or requires great effort to achieve something minimally decent. Following this example, other languages with a strong influence in data science, such as Python and its community packages, offer plenty of facilities to build such applications, saving enormous amounts of engineering time. The same happens with every language: none covers all domains of knowledge, because each ends up specializing in specific areas. Those that provide countless libraries for graphics generation lack good mathematical and physics tools; those that make it easy to build a distributed system based on the actor model lack good tools to process system streams and inodes; those that make probabilistic programming easy and provide inference and logic mechanisms greatly complicate exposing services and communication protocols; and so on with any language and framework.

The second type occurs when we need low-level functionality: accessing hardware devices, manipulating operating system drivers directly or, most commonly, obtaining higher performance or energy efficiency. The latter need usually arises in applications implemented in a high-level language, and the optimization effort concentrates on their bottleneck regions.

In this article we focus on the latter case: accelerating our application by accessing a language that provides higher performance than the current implementation of the complete system. However, all the cases mentioned above are solved by the same mechanism, interoperability between languages, that is, connecting one language with another through what is usually called a foreign function interface (FFI) or language bindings. Each language offers one or more ways to do this (the more sophisticated ones offer several), and they can become really complex: some allow calling objects (under the Object-Oriented Programming paradigm, OOP), exchanging non-trivial structures or types, and even exposing and using functionality of the language runtime. Here we present a case where the whole application has been implemented in Java, and we want to give some modules more functionality and efficiency by accessing C/C++ and OpenMP, the portable API for shared-memory parallel programming.

The following diagram shows the Java and native layers, how they relate, and how they enable interoperability through the Java Native Interface (JNI). The figure is composed of three regions: the upper one shows the compilation process; the middle one, the Java Virtual Machine (JVM) process space; and the lower one, the operating system.

Diagram: Interoperability overview between the JVM and native execution through JNI, showing the Java and Native layers.

First, the user compiles the source code. On the one hand, javac compiles the Java sources into bytecode (.class files). On the other hand, the C and C++ sources are compiled with any of the available compilers, such as icc, g++ or clang++, and linked into dynamic libraries (.so/.dll). The code that serves as a bridge to the native side is built as a separate compilation unit (bridge-object file, .so/.dll). Since previously built external libraries, called group4layers libs, are needed, they are also linked during this process (libraries-object files, .so/.dll).

Once compiled, the bytecode is loaded by the class loader subsystem, which stores it in memory so that it can be interpreted and processed together with the rest of the parts provided by the Java runtime. As shown in the runtime data region, there are multiple areas to store variables, structures and objects, as in any system process, but the JVM itself controls their location. The loaded code also requests and releases memory, interacting with these areas. Finally, the execution engine interprets the bytecode and compiles it to machine instructions, applying sophisticated optimization strategies. One of the advantages of Java and the JVM runtime is that it allows us to create new classes dynamically, compile them and execute them on the fly, increasing flexibility. The execution engine is also where the JNI connection comes into play: thanks to this interoperability mechanism, it can offload execution to the native interface (Native Interface) through the previously compiled bridge (bridge-object file). The bridge, in turn, uses functionality from the external native libraries (Native Library), linked and provided together with it (Library). As we can see, all these interfaces and native libraries are part of the JVM process.

Finally, the lower layer represents different components of the operating system, from device drivers and system services to memory management and the kernel's process and thread scheduler. Since in this article we focus on improving performance by natively managing multiple system threads to take advantage of the available cores, both the execution engine and the native library make direct use of functionality provided by the operating system itself (Processes & Threads Implementation). On the one hand, the JVM provides its own concurrency mechanisms to scale the application and use the system resources transparently for the programmer. On the other hand, the compiled native libraries make explicit use of the available cores and processors, apply thread and core affinity strategies, and exploit caches and the memory hierarchy, interacting directly with the hardware and avoiding the restrictions imposed by the JVM.

Below is a diagram of the software interoperability implemented in this article for a specific example, considering the Java and native realms. As mentioned before, we use JNI to connect Java modules with C/C++ compilation units.

Diagram: Software interoperability carried out in this post between the Java (JVM) and Native (C/C++) realms through JNI, highlighting their modules and compilation units.

The user starts by running the main class of the system, called Entrypoint as an intuitive and simplified example. The Entrypoint class represents the start of an application composed of hundreds or thousands of classes, one of which, Core, contains the functionality to be optimized. As a result of years of work, this class is properly implemented in Java, but it is still a bottleneck, so a more efficient implementation is needed. For this purpose, the functionality is implemented natively, using JNI as the bridging mechanism.

The native implementation is reached through the Bridge class, which acts as a connector to the native realm. It forwards calls and parameters, and can also receive results and callbacks from the native part. Its native method declarations must match the native interface exactly (signature match), so that calls can be translated between Java and C/C++.

Within the native realm there is a native library region composed of two pieces: the dynamic bridging library (libbridge), built from Bridge.cpp; and the set of previously compiled and linked Group4Layers libraries, called libNativeCore. The latter implements the Core functionality in a highly optimized form, based on accelerated native execution without any dependency on the JVM or Java.

The Bridge class facilitates communication with the native modules and is split into a Java part and a native part, the latter materialized in the libbridge library. Depending on the architectural decisions, Core can use either the Java-based or the optimized native side, since it contains both implementations. The advantage of this mechanism is that it allows an incremental port, providing the native functionality independently and optionally.

In the following paragraphs we present the files mentioned above, simplified to the minimum needed to understand how they work and relate to each other.

The Entrypoint.java file is a simplification of the actual Java application, whether it is a server, a daemon or a desktop application. At some point in the software architecture, the application decides whether to run the Java Core or the native Core, represented here as Core.compute and Core.efficient_compute, respectively.

The Core could actually be divided among several components, but to keep the example simple it is presented as a single unit. Since we are going to extend its behavior, it would be better to use appropriate design patterns that allow specializing it, such as Strategy/Policy or even Template Method.

public class Entrypoint {
   // Exposing a simple example, but the Core functionality would be better as
   // part of a Behavioral Pattern, like Strategy/Policy or Template method.
   public static void main(String[] args) {
     // Java impl. computation
     Core.compute(/* send args */);
     // rest of the high-level software until we need the efficient computation:
     Core.efficient_compute(/* send args */);
   }
}

The Core.java class has both methods implementing the bottleneck functionality: the Java one and the efficient one. The second needs the Bridge class, to which it propagates its arguments. As mentioned, it would be better to decouple them so that classes and behaviors can be composed.

The advantage of this interoperability and incremental porting is that the native Core could reuse functions from the Java Core itself. For example, if 5 methods implement the compute functionality and only 2 are bottlenecks, the other 3 could be reused from the Java implementation. Of course, profiling is necessary to measure the overhead of the calls and conversions between Java and C++ and to evaluate the real impact, because it may turn out to be more convenient to implement everything in C++ (all 5 methods), avoiding the penalties of converting between types and runtimes.

public class Core {
  // rest of the Core functionality which can also be reused in the NativeCore
  public static void compute(/* args */){ /* java implementation */ }
  public static void efficient_compute(/* args */){
     (new Bridge()).compute(/* propagate args */);
  }
}

The Bridge.java class is the Java side of the bridge implemented in Bridge.cpp: it declares the native method (public native void compute) and delegates the arguments to C/C++. It is important to specify the dynamic library that implements this functionality. To do so, the class loads it in a static initializer block, which runs once when the class is initialized. This library provides the native functions and is compiled as position-independent code (PIC), so its machine code can execute regardless of the memory address where it is loaded. This will be reflected later in the compilation steps.

public class Bridge {
  static {
    // Load native library libbridge.so (Unix) or bridge.dll (Windows)
    System.loadLibrary("bridge");
  }

  // delegated (to native) functionality
  public native void compute(/* java to c/c++ args */);
}

The library that implements the native functionality is encapsulated in another compilation unit, with NativeCore.cpp as its entry point, with its own dependencies and modular code. As it is not important for the purpose of this article, only a rough sketch of a more sophisticated and efficient implementation is given. In this case, we have chosen to use a group4layers library for sequential code acceleration based on vector operations (SIMD) using intrinsics, which provides macros, packed types and functions to squeeze the most out of the vector units of modern processors across architectures. In addition, this library makes use of OpenMP to exploit the shared-memory paradigm to the extreme, taking advantage not only of the vector units but also of all the processor cores.

As seen in the previous diagrams, with these mechanisms we interact directly with the operating system, so our functionality bypasses the JVM: it spawns and manages additional threads, all of them independent of the JVM runtime and its (potentially) concurrent processing. We explicitly use a flag as a heuristic either to fix the number of threads created or to let the runtime adjust it dynamically based on the observed requirements. In any case, in the native module we are free to do anything, from low-level drivers for network interfaces, peripherals and I/O devices to all kinds of acceleration and processing strategies that take advantage of the physical resources of the system, usually exposed by the operating system itself.

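#include <omp.h>
#include <g4l_simd_acc.h>
#include "NativeCore.h" /* my custom lib to implement the core */

extern "C" {
  void NativeCore_compute(/* args: heuristic_cond, Am, Tx, R, r, nthreads, ... */){
     // Heuristic flag: fix the team size, or let the OpenMP runtime adjust it
     if (heuristic_cond) omp_set_num_threads(nthreads);
     else                omp_set_dynamic(true);
     // the schedule chunk size must be positive (dimX < 2*dimY would give 0)
     int chunk = dimY > 0 ? dimX / (dimY * 2) : 1;
     if (chunk < 1) chunk = 1;
     // default(none): every variable used in the construct needs an explicit
     // data-sharing attribute; the collapsed loop indices are private anyway
     #pragma omp parallel for collapse(2) schedule(guided, chunk) \
                 default(none) shared(Am, Tx, R, r) \
                 firstprivate(dimX, dimY, j_sep, chunk)
     for (int i=0; i<dimX; ++i) {
       for (int j=0; j<dimY; j+=(j_sep + G4L_PREF_STRIDE_SIMD)) {
         // group4layers libs: SIMD intrinsics - convenient helpers
         // ... continue impl.
       }
     }
  }
}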
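Since the kernel body above is elided, here is a minimal, self-contained sketch of the same pattern (heuristic thread count, OpenMP work sharing plus SIMD vectorization) applied to a simple hypothetical operation, y = a * x + y. The function name axpy_kernel and its parameters are assumptions made for illustration; it does not use the group4layers library, and it still compiles and runs sequentially when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for a NativeCore kernel: y[i] = a * x[i] + y[i].
// use_fixed_threads mirrors the heuristic flag of NativeCore_compute:
//   true  -> request exactly nthreads threads,
//   false -> allow the OpenMP runtime to adjust the team size dynamically.
// Note: these OpenMP settings persist for later parallel regions started by
// the calling thread.
void axpy_kernel(double a, const double* x, double* y, std::size_t n,
                 bool use_fixed_threads, int nthreads) {
    assert(n == 0 || (x != nullptr && y != nullptr));
#ifdef _OPENMP
    if (use_fixed_threads && nthreads > 0) {
        omp_set_dynamic(0);
        omp_set_num_threads(nthreads);
    } else {
        omp_set_dynamic(1);
    }
#else
    // Built without OpenMP: the flags have no effect and the loop is sequential.
    (void)use_fixed_threads;
    (void)nthreads;
#endif
    const std::ptrdiff_t len = static_cast<std::ptrdiff_t>(n);
    // Iterations are split across threads (parallel for) and each thread's
    // chunk is vectorized (simd); iterations are independent, so no races.
    #pragma omp parallel for simd schedule(static)
    for (std::ptrdiff_t i = 0; i < len; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With g++, compiling with -fopenmp enables both the threading and the simd directive, while -fopenmp-simd enables only the latter; without either flag the pragma is ignored and the result is the same, just sequential.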

The last file needed to connect the languages is Bridge.cpp, the C/C++ side of the JNI bridge. As can be seen, it requires the macros, attributes and types provided by jni.h, the header of the bridge itself (Bridge.h), generated from Bridge.java, and finally the header of the native functionality described previously (NativeCore.h).

This file must define the Java_Bridge_compute function, whose name and parameters follow the JNI conventions: the first two arguments (the JNIEnv pointer and the object on which the method was invoked) are mandatory, and any others are at our discretion. In this example we have not delegated any, but they would go after the two mandatory ones.
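The link between the Java declaration and the C function is name-based: the JVM first looks for the JNI "short name", built from the class and method names, and then for the "long name" that appends the argument signature. The following sketch of the short-name rule helps check, for instance, why Bridge.compute maps to Java_Bridge_compute. The helpers jni_mangle and jni_symbol_name are hypothetical; the sketch assumes ASCII identifiers and does not build long names for overloaded native methods.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escapes one identifier following the JNI name-mangling rules.
// Assumes ASCII input; non-ASCII (UTF-16) characters are not handled here.
std::string jni_mangle(const std::string& name) {
    std::string out;
    for (unsigned char c : name) {
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
            out += static_cast<char>(c);    // alphanumerics are kept as-is
        } else if (c == '.' || c == '/') {
            out += '_';                     // package separator
        } else if (c == '_') {
            out += "_1";
        } else if (c == ';') {
            out += "_2";
        } else if (c == '[') {
            out += "_3";
        } else {
            char buf[8];                    // "_0" + 4 hex digits + '\0'
            std::snprintf(buf, sizeof buf, "_0%04x", static_cast<unsigned>(c));
            out += buf;                     // e.g. '$' -> "_00024"
        }
    }
    return out;
}

// Short JNI name: Java_ + mangled fully qualified class name + _ + mangled method.
std::string jni_symbol_name(const std::string& qualified_class, const std::string& method) {
    assert(!qualified_class.empty() && !method.empty());
    return "Java_" + jni_mangle(qualified_class) + "_" + jni_mangle(method);
}
```

For example, if Core.efficient_compute were itself declared native, the expected symbol would be Java_Core_efficient_1compute, because each underscore in a Java name is escaped as _1; javac -h generates these names automatically, which is why Step 1 below is the safest way to keep both sides in sync.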

The c_cpp_entrypoint_efficient_functionality function has been set up to match arguments and types between languages (typically called conversions and marshalling/unmarshalling operations), as well as to do other preparatory and organizational work before deploying all the efficient computational machinery. Our native libraries and implementations could also have been called directly from the JNI function (Java_Bridge_compute). This is a minimal example to understand how it works, but the idea is to encapsulate the native behavior so it can be reused from other places; therefore, it is recommended to isolate any conversion between Java and C/C++, as shown here.

#include <jni.h>
#include "NativeCore.h" /* my custom lib to implement the core */
#include "Bridge.h"

void c_cpp_entrypoint_efficient_functionality(/* raw native args */) {
  // call any c/c++ libs/funcs...
  // adapting the Java arguments to our custom native core implementation
  // initializing data structures, setting up variables, ...
  NativeCore_compute(/* passing prepared args */);
}

JNIEXPORT void JNICALL Java_Bridge_compute(JNIEnv *env, jobject thisObj,
                                           /* rest of the arguments from Java */) {
  // (un)marshalling and transforming of args to native types
  c_cpp_entrypoint_efficient_functionality(/* raw native args */);
  return;
}
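As an illustration of keeping conversions isolated, the sketch below shows a JNI-free helper that the c_cpp_entrypoint layer could use to validate its raw arguments. It assumes, hypothetically, that the Java side passes a matrix as a flattened, row-major double[] plus its dimensions (a Java double[][] is an array of separate array objects, so flattening avoids converting one row at a time). The names MatrixView, Status and make_matrix_view are illustrative, not part of the article's code, and the JNI calls that would obtain the buffer are only mentioned in comments.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Read-only, JNI-free view over a row-major matrix stored in a flat buffer.
struct MatrixView {
    const double* data = nullptr;
    std::size_t rows = 0;
    std::size_t cols = 0;

    double at(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};

enum class Status { Ok, NullData, BadShape };

// Validates the raw arguments before any native computation runs. In the JNI
// function, data and len would typically come from GetDoubleArrayElements and
// GetArrayLength, and the buffer would be released afterwards with
// ReleaseDoubleArrayElements (mode JNI_ABORT when it is only read).
Status make_matrix_view(const double* data, std::size_t len,
                        std::size_t rows, std::size_t cols, MatrixView& out) {
    if (data == nullptr && len != 0) return Status::NullData;
    // rows * cols must not overflow and must match the buffer length exactly.
    if (cols != 0 && rows > std::numeric_limits<std::size_t>::max() / cols) {
        return Status::BadShape;
    }
    if (rows * cols != len) return Status::BadShape;
    out.data = data;
    out.rows = rows;
    out.cols = cols;
    return Status::Ok;
}
```

Returning a status instead of reading out of bounds lets the JNI function report invalid input back to Java (for example, as an exception) while keeping NativeCore itself free of any JNI types.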

Finally, we show the steps to compile and run this example. We start with the group4layers native libraries already compiled, including all the low-level functionality and code acceleration mechanisms. Thus, g4l_custom_libs contains the compiled dynamic libraries and their header files.

The first step generates the C header for Bridge.java, which the third step needs so that the function signatures match between Java and C. The second step compiles the Java code to bytecode. Third, the bridge is built as the dynamic library libbridge.so, which requires the JNI headers, the NativeCore dependencies and the group4layers libraries. After this step, the library is ready to be loaded by the Java bridge (System.loadLibrary("bridge")). Finally, we launch the Java application from its entry point, indicating to the JVM the path where the native libraries are located.

With these simple steps, the whole system takes advantage of the native functionality, increasing the flexibility and efficiency of our Java application. In addition, it is a dynamic and optional connection mechanism, so with a good software architecture design we can alternate between pure Java code, native-only code, or even a hybrid approach with the best of both.

# Precondition) group4layers' native core library (libNativeCore) is built
# and located in `g4l_custom_libs`: headers in include/, libraries in lib/

# Step 1) generate the JNI header Bridge.h from Bridge.java
javac -h . Bridge.java

# Step 2) compile the Java code to bytecode
javac Bridge.java Core.java Entrypoint.java

# Step 3) compile Bridge.cpp (which includes jni.h) into the dynamic library;
# libraries go after the sources that use them so the linker keeps them
$CXX_COMPILER -fPIC -shared -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" \
              -I"g4l_custom_libs/include" -o libbridge.so Bridge.cpp \
              -L"g4l_custom_libs/lib" -lNativeCore

# Step 4) run java + native libs: java.library.path locates libbridge.so, while
# the OS dynamic linker needs LD_LIBRARY_PATH (or an rpath) for libNativeCore.so
LD_LIBRARY_PATH=g4l_custom_libs/lib java -Djava.library.path=. Entrypoint

We have seen the importance of being able to connect different languages through the FFI interoperability mechanisms they offer (JNI in Java), which give applications great versatility. Access to more efficient languages has allowed us to accelerate the bottlenecks found in Java code regions by offloading that functionality to C and C++ code backed by powerful tools, such as OpenMP directives and vector acceleration libraries. This is key to facilitating porting and achieving parallel and efficient execution. However, there is a drawback. As the first diagram in the article shows, our C/C++ code resides within the Java Virtual Machine (JVM) process space. Thus, a critical bug in the native functionality (for example, an invalid memory access) can bring the whole system down. Similarly, the JVM's safety mechanisms do not protect us, since the native code runs inside the JVM process without being isolated in a sandbox, so we must be especially careful when developing this kind of functionality. This problem appears in many FFI mechanisms, although some languages offer several connection mechanisms, including ones that run outside the runtime process, as in the case of Erlang (ports, as opposed to NIFs). In any case, the more guarantees we can offer about our native code, the better: for example, by applying defensive programming and a good system of error control and assertions, performing all kinds of tests, or even using static analysis and formal verification. Another alternative is to use stricter languages that provide greater guarantees about the correctness of native functionality, through strong typing and overflow checks, or through memory management and synchronization primitives, as found in languages such as Rust or Ada.
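One concrete defensive measure along these lines is an "exception firewall" at the native boundary: a C++ exception should not propagate out of a JNI function, since unwinding it through JVM frames is not supported. The sketch below shows one possible shape for such a wrapper; NativeResult and run_guarded are hypothetical names, and reporting the error to Java is left to the JNI function and only described in a comment.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Outcome of a guarded native call: ok == false carries a readable message.
struct NativeResult {
    bool ok;
    std::string error;
};

// Runs the native work and converts any C++ exception into a NativeResult, so
// nothing unwinds into the JVM. In Bridge.cpp, a failed result could then be
// reported to Java, e.g. env->ThrowNew(env->FindClass("java/lang/RuntimeException"),
// result.error.c_str()), before returning from the Java_* function.
template <typename Work>
NativeResult run_guarded(Work&& work) {
    try {
        std::forward<Work>(work)();
        return {true, std::string()};
    } catch (const std::exception& e) {
        return {false, e.what()};
    } catch (...) {
        return {false, "unknown native error"};
    }
}
```

This does not help with crashes such as invalid memory accesses, which no C++ exception mechanism can catch; those still call for the testing, assertions and analysis practices mentioned above.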

Summing up, the possibility of exploiting interoperability mechanisms, access to specialized libraries, extra functionality and more efficient languages is really important to provide sufficient flexibility and adaptability to applications. However, we must be aware of how this interoperability between languages is accomplished, its fundamentals and mechanisms, in order to provide the best guarantees to achieve improved performance or functionality without compromising the whole system.
