March 27, 2026

Bridging MLIR C++ to Java with JavaCPP

ProofSouq uses MLIR under the hood to represent Lean and Rocq operations. This allows us to take advantage of MLIR’s excellent facilities for conversion and rewriting.

However, we prefer to keep as much of the application layer in Java as possible, so we must introduce Java/C++ interop. Thankfully, it’s possible to forgo writing JNI glue code by hand and instead use JavaCPP: you annotate Java classes to describe your C++ API, and JavaCPP generates the JNI glue at build time. The result is a workflow where adding a new C++ function to your Java API is as simple as adding a method declaration.

This post explains how ProofSouq uses JavaCPP to expose a custom MLIR C++ library to the Java areas of its codebase.

API Shape

ProofSouq defines a CMake project that builds a shared library called ps-mlir-api. This library provides a set of namespaced C++ functions organized by concern: MLIR context and module lifecycle, MLIR operation building, dialect-specific types and operations, logging, and initialization. Some representative slices of the header files defining this API are:

// include/API/CoreAPI.h
namespace ps::core {
    MLIRContext* createMlirContext();
    void destroyMlirContext(MLIRContext* context);
    ModuleOp createModuleOp(MLIRContext* context);
    bool verifyModule(const ModuleOp module);
    auto createBlockInRegion(mlir::Operation* op, unsigned regionIdx) -> mlir::Block*;
    auto saveInsertionPoint(const mlir::OpBuilder* builder) -> mlir::OpBuilder::InsertPoint*;
    void setInsertionPointToBlockStart(mlir::OpBuilder* builder, mlir::Block* block);
    void restoreInsertionPoint(mlir::OpBuilder* builder, const mlir::OpBuilder::InsertPoint* insertionPoint);
    // ...
}

// include/API/SearchAPI.h
namespace ps::search {
    auto convertRocqToSearch(mlir::ModuleOp module) -> std::string;
    auto convertLeanToSearch(mlir::ModuleOp module) -> std::string;
    auto canonicalizeSearch(mlir::ModuleOp module) -> std::string;
    auto buildSearchIndex(...) -> std::string;
}  // ...

// ...

The goal is to call these functions from Java without writing any JNI code by hand.

Bridge Components

The Java-layer bridge consists of three separate Gradle subprojects:

ps-mlir-bindings: the bridge classes themselves, plus a Gradle task that invokes JavaCPP’s code generator
ps-mlir-config: a single JavaCPP configuration class that tells JavaCPP what library to link against and which headers to parse
ps-mlir-types: various implementations of JavaCPP’s Pointer class so that we can pass typed pointers around

This separation keeps concerns clean. The configuration class is published as its own artifact so that bridge classes and their consumers can both depend on it without circular references.

Building the C++ Library

Before JavaCPP can generate anything, you need a shared library to link against. The CMakeLists for ps-mlir-api is straightforward but has a few details worth noting:

add_llvm_library(ps-mlir-api
    SHARED
    BaseAPI.cpp CoreAPI.cpp InitAPI.cpp LoggingAPI.cpp
    LeanAPI.cpp RocqAPI.cpp SearchAPI.cpp

    LINK_LIBS
    PUBLIC
        MLIRInferTypeOpInterface MLIRIR MLIRParser
        MLIRSupport MLIRTransforms
    PRIVATE
        MLIRRocqDialect MLIRLeanDialect MLIRSearchDialect MLIRBaseDialect
        MLIRRocqToLeanConversion MLIRLeanToRocqConversion
        MLIRRocqToSearchConversion MLIRLeanToSearchConversion
)

set_target_properties(ps-mlir-api PROPERTIES
    CXX_VISIBILITY_PRESET default
    VISIBILITY_INLINES_HIDDEN OFF
    BUILD_RPATH "${LLVM_LIBRARY_DIRS};${MLIR_LIBRARY_DIRS}"
    INSTALL_RPATH "$ORIGIN;${LLVM_LIBRARY_DIRS}"
    NO_SONAME ON
)

A few things to note:

CXX_VISIBILITY_PRESET default and VISIBILITY_INLINES_HIDDEN OFF ensure that the library’s symbols are exported, as JavaCPP’s generated JNI code needs to find them at runtime.
NO_SONAME ON gives the library a simple filename without a version suffix, so that the ultimate path to the library within the JAR’s resource folder is deterministic.
The RPATH settings ensure the library can find LLVM and MLIR at runtime regardless of where the JAR is unpacked.
LINK_LIBS PUBLIC contains the libraries for all core MLIR types/interfaces used in the API header file function signatures; everything else is an implementation detail that belongs in LINK_LIBS PRIVATE.

Bridge Configuration

JavaCPP’s configuration mechanism is a class annotated with @Properties that describes the target platform, the headers to parse, and the libraries to link against. ProofSouq’s looks like this:

// MlirConfig.java
@Properties(
  value = @Platform(
    library = "jniMlirBridge",
    include = {
      "<unistd.h>",
      "<stdio.h>",
      "mlir/IR/BuiltinOps.h",
      "mlir/IR/MLIRContext.h",
      "mlir/IR/Operation.h",
      "API/BaseAPI.h",
      "API/InitAPI.h",
      "API/CoreAPI.h",
      "API/LeanAPI.h",
      "API/LoggingAPI.h",
      "API/RocqAPI.h",
      "API/SearchAPI.h"
    },
    preload = "ps-mlir-api",
    link = "ps-mlir-api"
  )
)
public class MlirConfig {
}

library names the JNI library that JavaCPP will produce (libjniMlirBridge.so on Linux).
preload names a native library that must be loaded before the JNI library; in this case, the shared library described above.
link tells the linker what to link against when generating libjniMlirBridge.so.
The include list names the C++ headers that JavaCPP will parse; the paths are relative to whatever -I flags you pass to the build task. Note that headers for any MLIR types appearing in the API’s function signatures are also included.

Writing Bridge Classes

Each bridge class maps to a C++ namespace, specified with the @Namespace annotation, and each method is public static native. JavaCPP generates JNI wrappers from these declarations.

The start of MlirCoreBridge looks like:

// MlirCoreBridge.java
@Properties(inherit = MlirConfig.class)
@Namespace("ps::core")
public class MlirCoreBridge {

  static {
    Loader.load();
  }

  public static native MlirContext createMlirContext();

  public static native void destroyMlirContext(MlirContext context);

  public static native @ByVal MlirModuleOp createModuleOp(MlirContext context);

  public static native void destroyModuleOp(@ByVal MlirModuleOp moduleOp);

  public static native boolean verifyModule(@ByVal @Const MlirModuleOp module);

  // ...
}

@ByVal tells JavaCPP that the C++ function returns or takes a value type: MLIR’s ModuleOp, Type, Value, Attribute, and Location are all value types, thin wrappers around a pointer with value semantics. Without @ByVal, JavaCPP assumes a pointer return by default.
@Const maps to C++’s const.
@StdString tells JavaCPP that the C++ function returns a std::string, which it will automatically convert to a Java String, no manual cleanup required.

Anything not annotated with @ByVal is implicitly @ByPtr, i.e. passed as a pointer (@ByRef, by contrast, denotes a C++ reference). Heavyweight MLIR data structures such as MLIRContext*, OpBuilder*, and Operation* are passed around by pointer, and ultimately must be deallocated to avoid memory leaks.

The Wrapper Types

MLIR’s types (mlir::MLIRContext, mlir::Operation, etc.) need Java counterparts; without them, the Java codebase would be littered with generic Pointers everywhere with no static indication as to what they actually point to. These can be created by subclassing Pointer, e.g.:

// MlirContext.java

/** Always passed around by reference, because mlir::MLIRContext is a heavyweight struct. */
@Properties(inherit = MlirConfig.class)
@Namespace("mlir")
@Name("MLIRContext")
@Opaque
public class MlirContext extends Pointer {
  public MlirContext(Pointer p) {
    super(p);
  }
}

Defining these in a dedicated Gradle module lets you depend on the types without pulling in the full bridge implementations, which is useful when writing classes/interfaces elsewhere in the codebase that need to work with them, but which don’t need to call out to the shared library themselves.

Handling Callbacks

One of the more interesting problems is wiring up C++ callback registrations. The initialization API accepts a function pointer from callers which want to receive MLIR diagnostic/error messages:

// include/API/InitAPI.h
typedef void (*DiagnosticCallback)(int severity, const char* message);
namespace ps::init {
  void setupCrashHandler();
  void registerLlvmFatalErrorHandler(DiagnosticCallback callback);
  void registerMlirContextErrorHandler(mlir::MLIRContext* ctx, DiagnosticCallback callback);
}

JavaCPP handles this via FunctionPointer. You subclass it and implement call with the matching signature:

// MlirInitBridge.java
@Properties(inherit = MlirConfig.class)
@Namespace("ps::init")
public class MlirInitBridge {

  private static final Logger LOG = LoggerFactory.getLogger(MlirInitBridge.class);
  // Hold a static reference to prevent GC from eating our listener
  private static DiagnosticCallback activeCallback;

  static {
    Loader.load();
  }

  public static class DiagnosticCallback extends FunctionPointer {
    static {
      Loader.load();
    }

    protected DiagnosticCallback() {
      allocate();
    }

    private native void allocate();

    public void call(int severity, @Cast("const char*") BytePointer message) {
      String msg = message.getString();
      switch (severity) {
        case 1:
          LOG.warn("MLIR Warning: {}", msg);
          break;
        case 2:
          LOG.error("MLIR Error: {}", msg);
          break;
        case 3:
          LOG.debug("MLIR Remark: {}", msg);
          break;
        case 4:
          LOG.error("!!! LLVM FATAL !!!: {}", msg);
          break;
        default:
          LOG.info("MLIR Info: {}", msg);
          break;
      }
    }
  }

  /** Performs process-related setup (log handling, signal handling) at most once. */
  public static synchronized void safeGlobalInitialize() {
    if (activeCallback != null) {
      LOG.info("LLVM/MLIR process-wide log/signal handling already initialized. Skipping.");
      return;
    }
    activeCallback = new DiagnosticCallback();
    setupCrashHandler();
    registerLlvmFatalErrorHandler(activeCallback);
    LOG.info("Initialized crash handling and logging for MLIR/LLVM.");
  }

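  public static synchronized void registerMlirContextErrorHandler(MlirContext mlirContext) {
    // activeCallback is only set by safeGlobalInitialize(); never hand C++ a null callback
    if (activeCallback == null) {
      throw new IllegalStateException("Call safeGlobalInitialize() first");
    }
    registerMlirContextErrorHandler(mlirContext, activeCallback);
  }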

  private static native void setupCrashHandler();

  public static native void registerLlvmFatalErrorHandler(DiagnosticCallback callback);

  private static native void registerMlirContextErrorHandler(
    MlirContext mlirContext, DiagnosticCallback callback);
}

The activeCallback static field is noteworthy: C++ doesn’t know anything about Java’s garbage collector, so if the DiagnosticCallback object gets GC’d while the shared library still holds the function pointer, the next dereference of it will segfault. Keeping a static reference in Java ensures the object lives as long as the JVM.
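The same idea extends to more than one callback. The following is a minimal sketch, using only the JDK, of a registry that pins callbacks for as long as some owner (for example, a context handle) is registered. CallbackRetainer and its methods are hypothetical names, not part of JavaCPP or of ProofSouq’s bridge.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical registry that keeps callbacks strongly reachable while native code may call them. */
final class CallbackRetainer {
  // Identity-based keys: two distinct owners are never merged, even if they are equals().
  private final Map<Object, Object> retained =
      Collections.synchronizedMap(new IdentityHashMap<>());

  /** Pins the callback for as long as the owner stays registered, and returns the callback. */
  <C> C retain(Object owner, C callback) {
    retained.put(owner, callback);
    return callback;
  }

  /** Unpins the owner's callback; call only after native code has dropped the function pointer. */
  boolean release(Object owner) {
    return retained.remove(owner) != null;
  }

  /** Number of owners whose callbacks are currently pinned. */
  int size() {
    return retained.size();
  }
}
```

A single static field is simpler when one process-wide callback suffices. A registry like this only becomes worthwhile if callbacks differ per context and should become collectable once their context is destroyed.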

setupCrashHandler() installs handlers for SIGABRT and SIGSEGV that translate them into recoverable JVM exceptions rather than aborting the process. This is necessary because LLVM uses these signals internally for assertion failures, which would otherwise kill the JVM.
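The integer severity codes handled in DiagnosticCallback.call can also be given names. The sketch below is one hedged way to do so: MlirSeverity is a hypothetical enum, codes 1 through 4 are taken from the switch above, and treating code 0 as INFO is an assumption (the original only routes unknown codes to the info branch).

```java
/** Hypothetical named form of the severity codes used by the diagnostic callback. */
enum MlirSeverity {
  INFO(0),        // assumption: the switch above only says "anything else is info"
  WARNING(1),
  ERROR(2),
  REMARK(3),
  LLVM_FATAL(4);

  private final int code;

  MlirSeverity(int code) {
    this.code = code;
  }

  int code() {
    return code;
  }

  /** Maps a raw code from C++ to a constant; unknown codes fall back to INFO, like the default branch. */
  static MlirSeverity fromCode(int code) {
    for (MlirSeverity severity : values()) {
      if (severity.code == code) {
        return severity;
      }
    }
    return INFO;
  }
}

final class MlirSeverityDemo {
  public static void main(String[] args) {
    for (int code : new int[] {1, 2, 3, 4, 99}) {
      System.out.println(code + " -> " + MlirSeverity.fromCode(code));
    }
  }
}
```

With this in place, the callback body could switch on MlirSeverity.fromCode(severity) instead of bare integers, which keeps the mapping to log levels in one reviewable spot.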

Raw Buffer Returns

Most of the API returns @StdString String, which JavaCPP converts from a std::string to a Java String automatically. But one such annotated method returns a protobuf-encoded binary blob as a BytePointer instead:

public static native @StdString BytePointer buildSearchIndex(
  @ByVal MlirModuleOp preCanonicalModule,
  @ByVal MlirModuleOp postCanonicalModule);

Here, @StdString BytePointer tells JavaCPP that the C++ function returns a std::string, but to give the Java caller a raw BytePointer rather than converting it to a Java String. This is necessary when the return value is binary data: String conversion decodes the bytes as text, which would corrupt arbitrary binary content. The caller reads the BytePointer’s backing buffer directly and passes it to the #parseFrom() deserializer method on the appropriate protobuf message type.
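To make the corruption risk concrete, here is a small JDK-only sketch that pushes bytes through a String and back, using UTF-8 as the example encoding. BinaryRoundTrip and the sample payload are made up for illustration; nothing here touches JavaCPP or the real search index.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: shows why binary payloads must not pass through a String. */
final class BinaryRoundTrip {

  /** Decodes bytes as UTF-8 text and encodes the result again, as a String-based API would. */
  static byte[] throughString(byte[] payload) {
    String asText = new String(payload, StandardCharsets.UTF_8);
    return asText.getBytes(StandardCharsets.UTF_8);
  }

  /** Returns true when the payload survives the String round trip unchanged. */
  static boolean survivesString(byte[] payload) {
    return Arrays.equals(payload, throughString(payload));
  }

  public static void main(String[] args) {
    byte[] text = "module {}".getBytes(StandardCharsets.UTF_8);
    // A made-up binary payload: 0x96, 0xFF and 0xFE are malformed as UTF-8 in these positions,
    // so decoding replaces each of them with U+FFFD.
    byte[] binary = {0x08, (byte) 0x96, 0x01, (byte) 0xFF, (byte) 0xFE};
    System.out.println("text survives:   " + survivesString(text));
    System.out.println("binary survives: " + survivesString(binary));
  }
}
```

Plain text comes back intact, while the binary payload does not, because malformed sequences are silently replaced during decoding. Keeping the data in a byte-oriented container end to end avoids that step entirely.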

Building the Bridge

The actual bridge generation happens in a Gradle JavaExec task:

dependencies {
  implementation project(":ps-mlir-config")
  implementation project(":ps-mlir-types")

  implementation libs.javacpp
}

sourceSets {
  // create a dedicated source set to avoid cyclic dependency when sources live in main
  jni {
    java {
      srcDir 'src/main/java'
      include '**/*.java'
    }
  }
}

configurations {
  jniImplementation.extendsFrom(implementation)
  jniRuntimeOnly.extendsFrom(runtimeOnly)
}

def generatedJavacppDir = layout.buildDirectory.dir("generated/javacpp")
def cmakeBuildDir = "${project(":ps-mlir-cpp").projectDir}/build/cmake/lib/API"
def platformSubpath = "com/proofsouq/mlir/linux-x86_64" // the exact subpath at which JavaCPP will look for the .so
def platformSubdir = generatedJavacppDir.get().dir(platformSubpath).asFile
def sharedLibFilename = "libps-mlir-api.so" // relative to cmakeBuildDir
def bridgeLibName = "jniMlirBridge"

task generateJavacppBridge(type: JavaExec) {
  group = "build"
  description = "Generates JNI wrappers and compiles the native bridge."
  dependsOn compileJava

  // Ensure the C++ library is built first
  dependsOn ":ps-mlir-cpp:cmakeBuild"

  inputs.dir "${projectDir}/src" // if the bridge source changes, recompile
  inputs.file "${cmakeBuildDir}/${sharedLibFilename}" // if the library binary changes, recompile
  outputs.file("${platformSubdir}/lib${bridgeLibName}.so")

  // Use the jni sourceSet outputs to break the cycle with main
  classpath = sourceSets.jni.runtimeClasspath

  mainClass = "org.bytedeco.javacpp.tools.Builder"

  args "-d", platformSubdir,
    "-Xcompiler", "-L${cmakeBuildDir}",  // path to .so
    "-Xcompiler", "-I${project(":ps-mlir-cpp").projectDir}/include",
    "-Xcompiler", "-Wl,-rpath,\$ORIGIN", // Important: tells the .so to look in its own folder
    "-o", bridgeLibName,
    "com.proofsouq.mlir.bridge.MlirBaseBridge",
    "com.proofsouq.mlir.bridge.MlirInitBridge",
    "com.proofsouq.mlir.bridge.MlirCoreBridge",
    "com.proofsouq.mlir.bridge.MlirSearchBridge",
    "com.proofsouq.mlir.bridge.MlirRocqBridge",
    "com.proofsouq.mlir.bridge.MlirLeanBridge",
    "com.proofsouq.mlir.bridge.MlirLogBridge",
    "com.proofsouq.mlir.types.MlirModuleOp" // for native #getOperation()

  doLast {
    // I tried making this a dedicated Copy task, but ran into some odd/transient
    // issues where the lib was not copied as expected
    copy {
      // manually copies the dependent C++ library into the resource folder
      // in which JavaCPP will look for it at runtime: com/proofsouq/mlir/linux-x86_64/
      from cmakeBuildDir
      include sharedLibFilename
      into platformSubdir
    }
  }
}

sourceSets.main.resources {
  // Include the built shared library in JARs
  srcDir generatedJavacppDir
}
// Ensure the shared library is built and copied BEFORE resources are collected
processResources.dependsOn generateJavacppBridge

This turned out to be more complicated than I expected, mainly due to the need to introduce a dedicated source set to eliminate a cycle in the build graph for the compileJava target. Other than that, the situation is mostly straightforward:

-L points to the directory containing libps-mlir-api.so so the linker can find it.
-I adds the include directory of the ps-mlir-cpp project (where the API headers live) so the generated code compiles.
-Wl,-rpath,$ORIGIN sets the runtime library search path to the directory containing libjniMlirBridge.so itself, so the library can find libps-mlir-api.so when both are unpacked from the JAR at the same location.
Any Pointer subclasses which themselves contain native methods, e.g. here there is one for mlir::ModuleOp#getOperation(), must be included in the argument list.

After code generation, the task copies the library built by CMake (libps-mlir-api.so) into com/proofsouq/mlir/linux-x86_64/. JavaCPP expects its preloaded native libraries to live in a platform-specific subdirectory of the classpath. Both libjniMlirBridge.so (generated, built by JavaCPP) and libps-mlir-api.so (copied, built by CMake) end up here, and both are packaged into the JAR as resources. At runtime, JavaCPP extracts them and loads them in dependency order.
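When a library fails to load, a quick check is whether both files really landed at the expected resource path. The following JDK-only sketch builds that path and probes the classpath for it. NativeResourcePaths is a hypothetical helper, the Linux-only "lib" + name + ".so" naming is hardcoded, and the package and platform strings are simply the ones used in the Gradle script above.

```java
/** Illustrative helper for locating bundled native libraries on the classpath. */
final class NativeResourcePaths {

  /** Builds "<package as path>/<platform>/lib<name>.so" (Linux naming only). */
  static String linuxResourcePath(String packageName, String platform, String libraryName) {
    return packageName.replace('.', '/') + "/" + platform + "/lib" + libraryName + ".so";
  }

  /** Returns true if the resource is visible to the given class loader. */
  static boolean isBundled(ClassLoader loader, String resourcePath) {
    return loader.getResource(resourcePath) != null;
  }

  public static void main(String[] args) {
    ClassLoader loader = NativeResourcePaths.class.getClassLoader();
    for (String lib : new String[] {"ps-mlir-api", "jniMlirBridge"}) {
      String path = linuxResourcePath("com.proofsouq.mlir", "linux-x86_64", lib);
      System.out.println(path + " bundled: " + isBundled(loader, path));
    }
  }
}
```

Run from a classpath that includes the bridge JAR, both lines should report true. A false for either one points at the copy step or the resources source set rather than at JavaCPP itself.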

Driving Over the Bridge

With all this in place, call sites using the bridge are clean and simple:

public void example() {
  MlirContext ctx = MlirCoreBridge.createMlirContext();
  MlirModuleOp module = MlirCoreBridge.createModuleOp(ctx);
  // ...
  // The unqualified calls below assume static imports of the corresponding bridge methods
  String mlir = printModuleToString(module, true);
  boolean isValid = verifyModule(module);
  String details = verifyModuleWithDetails(module);
  // ...
  MlirCoreBridge.destroyModuleOp(module);
  MlirCoreBridge.destroyMlirContext(ctx);
}

The bridge reads like a normal Java API, without any JNI or C/C++ glue code required. The annotations in the bridge classes encode the full contract with C++, and any (well, most) expectation mismatches will manifest as compile-time errors when the JavaCPP generator runs.
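One caveat with the example above is that an exception thrown between creation and the destroy calls would skip the cleanup. A hedged sketch of one way to guard against that with try-with-resources follows. NativeScope is a hypothetical helper written against plain JDK types, and the demo uses strings as stand-ins for real handles so that it runs without the native library.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** Hypothetical helper: releases registered native handles in reverse order on close(). */
final class NativeScope implements AutoCloseable {
  private final Deque<Runnable> cleanups = new ArrayDeque<>();

  /** Registers a handle together with the function that destroys it, and returns the handle. */
  <T> T own(T handle, Consumer<? super T> destroyer) {
    cleanups.push(() -> destroyer.accept(handle));
    return handle;
  }

  /** Runs destroyers last-in, first-out, so a module is destroyed before its context. */
  @Override
  public void close() {
    while (!cleanups.isEmpty()) {
      cleanups.pop().run();
    }
  }
}

final class NativeScopeDemo {
  public static void main(String[] args) {
    // Strings stand in for MlirContext / MlirModuleOp handles in this demo.
    try (NativeScope scope = new NativeScope()) {
      String ctx = scope.own("ctx", h -> System.out.println("destroy " + h));
      String module = scope.own("module", h -> System.out.println("destroy " + h));
      System.out.println("using " + module + " in " + ctx);
    }
  }
}
```

With the real bridge, the registrations would presumably look like scope.own(MlirCoreBridge.createMlirContext(), MlirCoreBridge::destroyMlirContext), followed by the same pattern for the module. The reverse-order teardown mirrors the manual destroy calls in the example.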

Parting Notes

A simple API surface is better. JavaCPP’s annotation set covers the common cases well, but C++ APIs with complex overloading, templates, unusual calling conventions, etc. require extra work (or may be impossible) to represent in JavaCPP. The more your C++ API looks like a C API with thin C++ types, the smoother the experience. MLIR plays very well here, as most of the types one would be interested in passing around are thin pointer wrappers with value semantics.

The generated JNI library bundles its dependencies. This is convenient for deployment but can result in a large JAR, and rebuilding the bridge requires rerunning the full C++ compilation and code generation pipeline. This approach is workable as a starting point, but may need to be reevaluated as the codebase grows.

C++/Java lifetime management is still required. The JVM and C++ memory spaces are distinct, and the JVM’s GC adds another variable into the mix. Once again, MLIR is a great fit: an mlir::MLIRContext owns practically every allocation made while building modules, operations, types, etc., so with few exceptions (pass managers, rewrite patterns), deallocation is as simple as destroying the mlir::MLIRContext.