Bridging MLIR C++ to Java with JavaCPP
ProofSouq uses MLIR under the hood to represent Lean and Rocq operations. This allows us to take advantage of MLIR’s excellent facilities for conversion and rewriting.
However, we prefer to keep as much of the application layer in Java as possible, so we need Java/C++ interop. Thankfully, rather than writing JNI glue code by hand, we can use JavaCPP: you annotate Java classes to describe your C++ API, and JavaCPP generates the JNI glue at build time. The result is a workflow where exposing a new C++ function to Java is as simple as adding a method declaration.
This post explains how ProofSouq uses JavaCPP to expose a custom MLIR C++ library to the Java areas of its codebase.
API Shape
ProofSouq defines a CMake project that builds a shared library called ps-mlir-api. This library provides a set of namespaced C++ functions organized by concern: MLIR context and module lifecycle, MLIR operation building, dialect-specific types and operations, logging, and initialization. Some representative slices of the header files defining this API are:
// include/API/CoreAPI.h
#pragma once

#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/MLIRContext.h"

namespace ps::core {
mlir::MLIRContext* createMlirContext();
void destroyMlirContext(mlir::MLIRContext* context);
// MLIRContext is not copyable, so it is always passed by pointer
mlir::ModuleOp createModuleOp(mlir::MLIRContext* context);
bool verifyModule(const mlir::ModuleOp module);
auto createBlockInRegion(mlir::Operation* op, unsigned regionIdx) -> mlir::Block*;
auto saveInsertionPoint(const mlir::OpBuilder* builder) -> mlir::OpBuilder::InsertPoint*;
void setInsertionPointToBlockStart(mlir::OpBuilder* builder, mlir::Block* block);
void restoreInsertionPoint(mlir::OpBuilder* builder, const mlir::OpBuilder::InsertPoint* insertionPoint);
// ...
}

// include/API/SearchAPI.h
#pragma once

#include <string>

#include "mlir/IR/BuiltinOps.h"

namespace ps::search {
auto convertRocqToSearch(mlir::ModuleOp module) -> std::string;
auto convertLeanToSearch(mlir::ModuleOp module) -> std::string;
auto canonicalizeSearch(mlir::ModuleOp module) -> std::string;
auto buildSearchIndex(...) -> std::string;
} // ...
// ...
The goal is to call these functions from Java without writing any JNI code by hand.
Bridge Components
The Java-layer bridge consists of three separate Gradle subprojects:
- ps-mlir-bindings: the bridge classes themselves, plus a Gradle task that invokes JavaCPP’s code generator
- ps-mlir-config: a single JavaCPP configuration class that tells JavaCPP what library to link against and which headers to include
- ps-mlir-types: various subclasses of JavaCPP’s Pointer class so that we can pass typed pointers around
This separation keeps concerns clean. The configuration class is published as its own artifact so that bridge classes and their consumers can both depend on it without circular references.
Building the C++ Library
Before JavaCPP can generate anything, you need a shared library to link against. The CMakeLists for ps-mlir-api is straightforward but has a few details worth noting:
add_llvm_library(ps-mlir-api
  SHARED
    BaseAPI.cpp CoreAPI.cpp InitAPI.cpp LoggingAPI.cpp
    LeanAPI.cpp RocqAPI.cpp SearchAPI.cpp

  LINK_LIBS
  PUBLIC
    MLIRInferTypeOpInterface MLIRIR MLIRParser
    MLIRSupport MLIRTransforms
  PRIVATE
    MLIRRocqDialect MLIRLeanDialect MLIRSearchDialect MLIRBaseDialect
    MLIRRocqToLeanConversion MLIRLeanToRocqConversion
    MLIRRocqToSearchConversion MLIRLeanToSearchConversion
)

set_target_properties(ps-mlir-api PROPERTIES
  CXX_VISIBILITY_PRESET default
  VISIBILITY_INLINES_HIDDEN OFF
  BUILD_RPATH "${LLVM_LIBRARY_DIRS};${MLIR_LIBRARY_DIRS}"
  INSTALL_RPATH "$ORIGIN;${LLVM_LIBRARY_DIRS}"
  NO_SONAME ON
)
A few things to note:
- CXX_VISIBILITY_PRESET default and VISIBILITY_INLINES_HIDDEN OFF ensure that the library’s symbols are exported, since the JNI library generated by JavaCPP links against them and needs to resolve them at runtime.
- NO_SONAME ON gives the library a simple filename without a version suffix, so that the ultimate path to the library within the JAR’s resource folder is deterministic.
- The RPATH settings let the library find LLVM and MLIR at runtime; $ORIGIN in particular resolves relative to the library’s own location, so it keeps working regardless of where the JAR is unpacked.
- LINK_LIBS PUBLIC contains the libraries for all core MLIR types/interfaces used in the API header file function signatures; everything else is an implementation detail that belongs in LINK_LIBS PRIVATE.
Bridge Configuration
JavaCPP’s configuration mechanism is a class annotated with @Properties that describes the target platform, the headers to include, and the libraries to link against. ProofSouq’s looks like this:
// MlirConfig.java
import org.bytedeco.javacpp.annotation.Platform;
import org.bytedeco.javacpp.annotation.Properties;

@Properties(
    value = @Platform(
        library = "jniMlirBridge",
        include = {
            "<unistd.h>",
            "<stdio.h>",
            "mlir/IR/BuiltinOps.h",
            "mlir/IR/MLIRContext.h",
            "mlir/IR/Operation.h",
            "API/BaseAPI.h",
            "API/InitAPI.h",
            "API/CoreAPI.h",
            "API/LeanAPI.h",
            "API/LoggingAPI.h",
            "API/RocqAPI.h",
            "API/SearchAPI.h"
        },
        preload = "ps-mlir-api",
        link = "ps-mlir-api"
    )
)
public class MlirConfig {
}
- library names the JNI library that JavaCPP will produce (libjniMlirBridge.so on Linux).
- preload names a native library that must be loaded before the JNI library; in this case, the shared library described above.
- link tells the linker what to link against when building libjniMlirBridge.so.
- The include list names the C++ headers that JavaCPP’s generated code will #include; the paths are relative to whatever -I flags you pass to the build task. Note that headers for any MLIR types appearing in the API’s function signatures are also included.
Writing Bridge Classes
Each bridge class maps to a C++ namespace, specified with the @Namespace annotation, and each method is public static native. JavaCPP generates JNI wrappers from these declarations.
The start of MlirCoreBridge looks like:
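// MlirCoreBridge.java
package com.proofsouq.mlir.bridge;

import org.bytedeco.javacpp.Loader;
import org.bytedeco.javacpp.annotation.ByVal;
import org.bytedeco.javacpp.annotation.Const;
import org.bytedeco.javacpp.annotation.Namespace;
import org.bytedeco.javacpp.annotation.Properties;
// MlirConfig, MlirContext and MlirModuleOp are imported from ps-mlir-config and ps-mlir-types

@Properties(inherit = MlirConfig.class)
@Namespace("ps::core")
public class MlirCoreBridge {
    static {
        // Loads the preloaded libps-mlir-api, then the JNI library
        Loader.load();
    }

    public static native MlirContext createMlirContext();
    public static native void destroyMlirContext(MlirContext context);
    public static native @ByVal MlirModuleOp createModuleOp(MlirContext context);
    public static native void destroyModuleOp(@ByVal MlirModuleOp moduleOp);
    public static native boolean verifyModule(@ByVal @Const MlirModuleOp module);
    // ...
}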
- @ByVal tells JavaCPP that the C++ function returns or takes a value type: MLIR’s ModuleOp, Type, Value, Attribute, and Location are all value types, thin wrappers around a pointer with value semantics. Without @ByVal, JavaCPP treats a Pointer subclass as a C++ pointer, for parameters and return values alike.
- @Const maps to C++’s const.
- @StdString tells JavaCPP that the C++ function returns a std::string, which JavaCPP automatically converts to a Java String, with no manual cleanup required.
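A Pointer-typed parameter or return value without @ByVal is treated as a plain C++ pointer (JavaCPP’s default, which @ByPtr makes explicit; @ByRef would instead mean a C++ reference). Heavyweight MLIR data structures are passed around by pointer, e.g. MLIRContext*, OpBuilder*, Operation*, and any of them that the Java side owns must eventually be deallocated to avoid memory leaks.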
The Wrapper Types
MLIR’s C++ classes (mlir::MLIRContext, mlir::Operation, etc.) need Java counterparts; without them, the Java codebase would be littered with generic Pointers that give no static indication of what they actually point to. These counterparts are created by subclassing Pointer, e.g.:
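// MlirContext.java
import org.bytedeco.javacpp.Pointer;
import org.bytedeco.javacpp.annotation.Name;
import org.bytedeco.javacpp.annotation.Namespace;
import org.bytedeco.javacpp.annotation.Opaque;
import org.bytedeco.javacpp.annotation.Properties;

/** Always passed around by pointer, because mlir::MLIRContext is a heavyweight, non-copyable object. */
@Properties(inherit = MlirConfig.class)
@Namespace("mlir")
@Name("MLIRContext")
@Opaque
public class MlirContext extends Pointer {
    /** Wraps an existing native pointer (the usual JavaCPP convention for Pointer subclasses). */
    public MlirContext(Pointer p) {
        super(p);
    }
}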
Defining these in a dedicated Gradle module lets you depend on the types without pulling in the full bridge implementations, which is useful when writing classes/interfaces elsewhere in the codebase that need to work with them, but which don’t need to call out to the shared library themselves.
Handling Callbacks
One of the more interesting problems is wiring up C++ callback registrations. The initialization API accepts a function pointer from callers that want to receive MLIR diagnostic and error messages:
// include/API/InitAPI.h
#pragma once

#include "mlir/IR/MLIRContext.h"

typedef void (*DiagnosticCallback)(int severity, const char* message);

namespace ps::init {
void setupCrashHandler();
void registerLlvmFatalErrorHandler(DiagnosticCallback callback);
void registerMlirContextErrorHandler(mlir::MLIRContext* ctx, DiagnosticCallback callback);
}
JavaCPP handles this via FunctionPointer. You subclass it and implement call with the matching signature:
// MlirInitBridge.java
import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.javacpp.FunctionPointer;
import org.bytedeco.javacpp.Loader;
import org.bytedeco.javacpp.annotation.Cast;
import org.bytedeco.javacpp.annotation.Namespace;
import org.bytedeco.javacpp.annotation.Properties;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Properties(inherit = MlirConfig.class)
@Namespace("ps::init")
public class MlirInitBridge {
    private static final Logger LOG = LoggerFactory.getLogger(MlirInitBridge.class);

    // Hold a static reference to prevent GC from eating our listener
    private static DiagnosticCallback activeCallback;

    static {
        Loader.load();
    }

    public static class DiagnosticCallback extends FunctionPointer {
        static {
            Loader.load();
        }

        protected DiagnosticCallback() {
            allocate();
        }

        private native void allocate();

        public void call(int severity, @Cast("const char*") BytePointer message) {
            String msg = message == null ? "<null>" : message.getString();
            switch (severity) {
                case 1:
                    LOG.warn("MLIR Warning: {}", msg);
                    break;
                case 2:
                    LOG.error("MLIR Error: {}", msg);
                    break;
                case 3:
                    LOG.debug("MLIR Remark: {}", msg);
                    break;
                case 4:
                    LOG.error("!!! LLVM FATAL !!!: {}", msg);
                    break;
                default:
                    LOG.info("MLIR Info: {}", msg);
                    break;
            }
        }
    }

    /** Performs process-related setup (log handling, signal handling) at most once. */
    public static synchronized void safeGlobalInitialize() {
        if (activeCallback != null) {
            LOG.info("LLVM/MLIR process-wide log/signal handling already initialized. Skipping.");
            return;
        }
        activeCallback = new DiagnosticCallback();
        setupCrashHandler();
        registerLlvmFatalErrorHandler(activeCallback);
        LOG.info("Initialized crash handling and logging for MLIR/LLVM.");
    }

    /** Requires safeGlobalInitialize(); handing C++ a null callback would likely crash on the first diagnostic. */
    public static synchronized void registerMlirContextErrorHandler(MlirContext mlirContext) {
        if (activeCallback == null) {
            throw new IllegalStateException("Call safeGlobalInitialize() before registering context error handlers.");
        }
        registerMlirContextErrorHandler(mlirContext, activeCallback);
    }

    private static native void setupCrashHandler();

    public static native void registerLlvmFatalErrorHandler(DiagnosticCallback callback);

    private static native void registerMlirContextErrorHandler(
            MlirContext mlirContext, DiagnosticCallback callback);
}
The activeCallback static field is noteworthy: C++ doesn’t know anything about Java’s garbage collector. If the DiagnosticCallback object is garbage-collected, JavaCPP releases the native memory behind the function pointer while the shared library still holds it, and the next invocation will likely crash the process. Keeping a static reference in Java ensures the object lives as long as the JVM.
setupCrashHandler() installs handlers for SIGABRT and SIGSEGV that translate them into recoverable JVM exceptions rather than aborting the process. This is necessary because a failed LLVM assertion calls abort(), which raises SIGABRT and would otherwise kill the JVM.
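The integer codes in call() are easy to misread. Codes 0 to 3 line up with mlir::DiagnosticSeverity (Note, Warning, Error, Remark), assuming the C++ side forwards that enum unchanged, and 4 is the value the switch treats as an LLVM fatal error. A small sketch that gives the codes names; DiagnosticLevel is a hypothetical helper, not part of the bridge:

```java
/**
 * Hypothetical names for the integer severities passed to DiagnosticCallback#call.
 * Codes 0-3 assume the C++ side forwards mlir::DiagnosticSeverity unchanged;
 * 4 is the code the bridge treats as an LLVM fatal error.
 */
enum DiagnosticLevel {
    NOTE(0),
    WARNING(1),
    ERROR(2),
    REMARK(3),
    LLVM_FATAL(4);

    private final int code;

    DiagnosticLevel(int code) {
        this.code = code;
    }

    int code() {
        return code;
    }

    /** Maps a raw code to a level; unknown codes fall back to NOTE, like the switch's default branch. */
    static DiagnosticLevel fromCode(int code) {
        for (DiagnosticLevel level : values()) {
            if (level.code == code) {
                return level;
            }
        }
        return NOTE;
    }
}
```

With this in place, call() could switch on DiagnosticLevel.fromCode(severity) instead of bare integers while keeping the same fallback to the info branch.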
Raw Buffer Returns
Most of the API returns @StdString String, which JavaCPP automatically converts from a std::string to a Java String. One method, however, returns a protobuf-encoded binary blob, so it is declared to return a BytePointer instead:
public static native @StdString BytePointer buildSearchIndex(
@ByVal MlirModuleOp preCanonicalModule,
@ByVal MlirModuleOp postCanonicalModule);
Here, @StdString BytePointer tells JavaCPP that the C++ function returns a std::string, but to hand the Java caller a raw BytePointer rather than converting it to a Java String. This is necessary when the return value is binary data: String conversion decodes the bytes as text (typically UTF-8), which corrupts arbitrary bytes. The caller reads the BytePointer’s backing buffer directly and passes it to the #parseFrom() deserializer method on the appropriate protobuf message type.
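To see why the blob cannot go through String, the sketch below mimics that conversion with plain JDK calls: it decodes bytes as UTF-8 and encodes them back. It does not reproduce JavaCPP’s exact charset handling (an assumption here); it only shows that text decoding is lossy for binary data. Utf8RoundTrip is a hypothetical demo class, and the bytes 08 96 01 are the protobuf wire encoding of field 1 set to the varint 150.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo: routing a binary payload through java.lang.String is lossy. */
final class Utf8RoundTrip {
    private Utf8RoundTrip() {}

    /** Decodes the payload as UTF-8 text and encodes it back to bytes. */
    static byte[] viaString(byte[] payload) {
        // Malformed input (e.g. a lone 0x96 continuation byte) decodes to U+FFFD
        String text = new String(payload, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** True if the payload comes back byte-for-byte identical. */
    static boolean survives(byte[] payload) {
        return Arrays.equals(payload, viaString(payload));
    }

    /** Protobuf wire bytes for field 1 = 150: tag 0x08, then varint 0x96 0x01. */
    static byte[] sampleProtobuf() {
        return new byte[] {0x08, (byte) 0x96, 0x01};
    }
}
```

Plain ASCII survives the round trip, but the three protobuf bytes come back as five (0x96 turns into the three-byte encoding of U+FFFD), which is exactly the corruption that keeping the result as raw bytes avoids.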
Building the Bridge
The actual bridge generation happens in a Gradle JavaExec task:
dependencies {
    implementation project(":ps-mlir-config")
    implementation project(":ps-mlir-types")
    implementation libs.javacpp
}

sourceSets {
    // create a dedicated source set to avoid cyclic dependency when sources live in main
    jni {
        java {
            srcDir 'src/main/java'
            include '**/*.java'
        }
    }
}

configurations {
    jniImplementation.extendsFrom(implementation)
    jniRuntimeOnly.extendsFrom(runtimeOnly)
}

def generatedJavacppDir = layout.buildDirectory.dir("generated/javacpp")
def cmakeBuildDir = "${project(":ps-mlir-cpp").projectDir}/build/cmake/lib/API"
def platformSubpath = "com/proofsouq/mlir/linux-x86_64" // the exact subpath at which JavaCPP will look for the .so
def platformSubdir = generatedJavacppDir.get().dir(platformSubpath).asFile
def sharedLibFilename = "libps-mlir-api.so" // relative to cmakeBuildDir
def bridgeLibName = "jniMlirBridge"

task generateJavacppBridge(type: JavaExec) {
    group = "build"
    description = "Generates JNI wrappers and compiles the native bridge."
    dependsOn compileJava
    // Ensure the C++ library is built first
    dependsOn ":ps-mlir-cpp:cmakeBuild"
    inputs.dir "${projectDir}/src" // if the bridge sources change, regenerate
    inputs.file "${cmakeBuildDir}/${sharedLibFilename}" // if the library binary changes, regenerate
    outputs.file("${platformSubdir}/lib${bridgeLibName}.so")
    outputs.file("${platformSubdir}/${sharedLibFilename}") // copied in doLast; declared so up-to-date checks see it
    // Use the jni sourceSet outputs to break the cycle with main
    classpath = sourceSets.jni.runtimeClasspath
    mainClass = "org.bytedeco.javacpp.tools.Builder"
    args "-d", platformSubdir,
        "-Xcompiler", "-L${cmakeBuildDir}", // path to .so
        "-Xcompiler", "-I${project(":ps-mlir-cpp").projectDir}/include",
        "-Xcompiler", "-Wl,-rpath,\$ORIGIN", // Important: tells the .so to look in its own folder
        "-o", bridgeLibName,
        "com.proofsouq.mlir.bridge.MlirBaseBridge",
        "com.proofsouq.mlir.bridge.MlirInitBridge",
        "com.proofsouq.mlir.bridge.MlirCoreBridge",
        "com.proofsouq.mlir.bridge.MlirSearchBridge",
        "com.proofsouq.mlir.bridge.MlirRocqBridge",
        "com.proofsouq.mlir.bridge.MlirLeanBridge",
        "com.proofsouq.mlir.bridge.MlirLogBridge",
        "com.proofsouq.mlir.types.MlirModuleOp" // for native #getOperation()
    doLast {
        // I tried making this a dedicated Copy task, but ran into some odd/transient
        // issues where the lib was not copied as expected
        copy {
            // manually copies the dependent C++ library into the resource folder
            // in which JavaCPP will look for it at runtime: com/proofsouq/mlir/linux-x86_64/
            from cmakeBuildDir
            include sharedLibFilename
            into platformSubdir
        }
    }
}

sourceSets.main.resources {
    // Include the built shared library in JARs
    srcDir generatedJavacppDir
}

// Ensure the shared library is built and copied BEFORE resources are collected
processResources.dependsOn generateJavacppBridge
This turned out to be more complicated than I expected, mainly because a dedicated source set was needed to break a cycle in the build graph involving the compileJava task. Other than that, the setup is mostly straightforward:
- -L points to the directory containing libps-mlir-api.so so the linker can find it.
- -I adds ps-mlir-cpp’s include directory, where the API/*.h headers live, so the generated code compiles.
- -Wl,-rpath,$ORIGIN sets the runtime library search path to the directory containing libjniMlirBridge.so itself, so the library can find libps-mlir-api.so when both are unpacked from the JAR to the same location.
- Any Pointer subclasses which themselves contain native methods must be included in the argument list; here, MlirModuleOp declares one for mlir::ModuleOp#getOperation().
After code generation, the task copies the CMake-built library (libps-mlir-api.so) into com/proofsouq/mlir/linux-x86_64/. JavaCPP expects the native libraries it loads, including preloaded ones, to live in a platform-specific subdirectory of the classpath. Both libjniMlirBridge.so (generated and compiled by JavaCPP) and libps-mlir-api.so (built by CMake, then copied) end up here, and both are packaged into the JAR as resources. At runtime, JavaCPP extracts them and loads them in dependency order, the preloaded libps-mlir-api.so first.
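Because everything hinges on the libraries sitting at exactly that resource path, a cheap startup check can turn an opaque UnsatisfiedLinkError into a clear message. The sketch below is hypothetical (NativeResourceCheck is not part of ProofSouq). Its approximatePlatform() is a simplified stand-in for JavaCPP’s own Loader.getPlatform(), which real code should prefer, and it relies on System.mapLibraryName to turn jniMlirBridge into libjniMlirBridge.so on Linux.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical startup check for the native libraries JavaCPP expects on the classpath. */
final class NativeResourceCheck {
    private NativeResourceCheck() {}

    /** Simplified approximation of JavaCPP's platform names, e.g. "linux-x86_64". */
    static String approximatePlatform() {
        String os = System.getProperty("os.name", "").toLowerCase(Locale.ROOT);
        String arch = System.getProperty("os.arch", "").toLowerCase(Locale.ROOT);
        String osPart;
        if (os.startsWith("linux")) {
            osPart = "linux";
        } else if (os.startsWith("mac")) {
            osPart = "macosx";
        } else if (os.startsWith("windows")) {
            osPart = "windows";
        } else {
            osPart = os;
        }
        String archPart;
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            archPart = "x86_64";
        } else if (arch.equals("aarch64") || arch.equals("arm64")) {
            archPart = "arm64";
        } else {
            archPart = arch;
        }
        return osPart + "-" + archPart;
    }

    /** Returns the classpath resource paths that are missing, e.g. ".../linux-x86_64/libps-mlir-api.so". */
    static List<String> missingLibraries(String packagePath, String platform, String... libNames) {
        ClassLoader loader = NativeResourceCheck.class.getClassLoader();
        List<String> missing = new ArrayList<>();
        for (String name : libNames) {
            // mapLibraryName adds the OS-specific prefix and suffix ("lib" + name + ".so" on Linux)
            String path = packagePath + "/" + platform + "/" + System.mapLibraryName(name);
            if (loader.getResource(path) == null) {
                missing.add(path);
            }
        }
        return missing;
    }
}
```

Calling missingLibraries("com/proofsouq/mlir", approximatePlatform(), "ps-mlir-api", "jniMlirBridge") before the first Loader.load() should return an empty list when the JAR was assembled correctly.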
Driving Over the Bridge
With all this in place, call sites using the bridge are clean and simple:
public void example() {
    MlirContext ctx = MlirCoreBridge.createMlirContext();
    MlirModuleOp module = MlirCoreBridge.createModuleOp(ctx);
    // ...
    String mlir = printModuleToString(module, true);
    boolean isValid = verifyModule(module);
    String details = verifyModuleWithDetails(module);
    // ...
    MlirCoreBridge.destroyModuleOp(module);
    MlirCoreBridge.destroyMlirContext(ctx);
}
The bridge reads like a normal Java API, without any JNI or C/C++ glue code required. The annotations in the bridge classes encode the full contract with C++, and any (well, most) expectation mismatches will manifest as compile-time errors when the JavaCPP generator runs.
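The example destroys the module and context by hand, so an exception thrown at one of the elided steps would leak both. One way to make cleanup exception-safe is a small try-with-resources helper that runs destroy actions in reverse order of registration, so the module always goes before its context. NativeScope and its own() method are hypothetical, not part of the ProofSouq bridge; the sketch only assumes that each native handle has a matching destroy function.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/**
 * Hypothetical try-with-resources helper: records a destroy action per native
 * handle and runs them in reverse registration order on close().
 */
final class NativeScope implements AutoCloseable {
    private final Deque<Runnable> cleanups = new ArrayDeque<>();

    /** Registers destroy(handle) to run on close() and returns the handle unchanged. */
    <T> T own(T handle, Consumer<? super T> destroy) {
        cleanups.push(() -> destroy.accept(handle));
        return handle;
    }

    /** Runs every cleanup, newest first, even if one of them throws. */
    @Override
    public void close() {
        RuntimeException failure = null;
        while (!cleanups.isEmpty()) {
            try {
                cleanups.pop().run();
            } catch (RuntimeException e) {
                if (failure == null) {
                    failure = e;
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}
```

At the call site, registering scope.own(MlirCoreBridge.createMlirContext(), MlirCoreBridge::destroyMlirContext) first and scope.own(MlirCoreBridge.createModuleOp(ctx), MlirCoreBridge::destroyModuleOp) second, inside a try (NativeScope scope = new NativeScope()) block, reproduces the example’s teardown order even when an exception escapes.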
Parting Notes
A simple API surface is better. JavaCPP’s annotation set covers the common cases well, but C++ APIs with complex overloading, templates, unusual calling conventions, etc. require extra work (or may be impossible) to represent in JavaCPP. The more your C++ API looks like a C API with thin C++ types, the smoother the experience. MLIR plays very well here, as most of the types one would be interested in passing around are thin pointer wrappers with value semantics.
The JAR bundles the native libraries. This is convenient for deployment but can result in a large JAR, and rebuilding the bridge requires rerunning the full C++ compilation and code generation pipeline. This approach is workable as a starting point but may need to be reevaluated as the project grows.
C++/Java lifetime management is still required. The JVM heap and native memory are managed separately, and the JVM’s GC adds another variable into the mix. Once again, MLIR is a great fit: an mlir::MLIRContext owns the uniqued types and attributes created while building modules, and every nested operation is owned by its parent block, so with few exceptions (top-level modules, pass managers, rewrite patterns), deallocation amounts to destroying the top-level module and then the mlir::MLIRContext.