Sidra's integration runtimes using OpenJDK

This page clarifies some concepts about integration runtimes and gives a quick guide to which cost-free Java runtime environments can be used with them, and how.

Java Runtime is needed

The target physical format for data ingested with Sidra is Parquet, which is optimal for Databricks. The IR agent has no native ability to serialize or deserialize Parquet; it relies on Java libraries for this, and the IR agent installation kit does not include a Java Runtime. Hence, a Java Runtime must be installed manually on the node after installing the IR agent.

This can be easily done using the Java installer, which sets everything necessary.

Mind the licensing

Recently, Oracle changed the licensing terms for the Java Runtime Environment (JRE): in production environments, fees may apply, so using the JRE may not be free of charge.

Fortunately, Oracle is contributing to an open-source version - OpenJDK - that can be used. See the install instructions.

Microsoft, with its commitment to protect its customers from licensing claims, also publishes its own build of the open-source OpenJDK, which comes with an .MSI installer too: the Microsoft Build of OpenJDK.

PATH and Registry keys

Unfortunately, the OpenJDK builds above, including the installer of the Microsoft Build of OpenJDK, do not set the Registry keys that the Integration Runtime agent needs in order to call the JRE when processing Parquet files.

According to the troubleshooting article here, the IR agent:

  1. Needs to have the same bitness (64-bit) as the JRE; hence, both will be installed under C:\Program Files\ (not C:\Program Files (x86)\).
  2. Checks for the installed version of JRE under HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment, value CurrentVersion.
  3. Retrieves location of the JRE from HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment\<versionNumber>, value JavaHome.
  4. Locates the bin\server folder in the path retrieved.
  5. Loads the Java Virtual Machine library, jvm.dll; if it is not present, the bin\client is probed for the same DLL.
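
The lookup steps above can be sketched as a small function. This is only an illustration of the described order of checks, not the IR agent's actual code: the function name `resolve_jvm_dll` and the two injected callables (`read_value` for reading a Registry value under HKEY_LOCAL_MACHINE, `file_exists` for probing the disk) are hypothetical names chosen for this example.

```python
import ntpath
from typing import Callable, Optional

# Registry key (under HKEY_LOCAL_MACHINE) named in the steps above.
JRE_KEY = r"SOFTWARE\JavaSoft\Java Runtime Environment"


def resolve_jvm_dll(read_value: Callable[[str, str], Optional[str]],
                    file_exists: Callable[[str], bool]) -> Optional[str]:
    """Return the jvm.dll path found by the documented lookup order, or None.

    read_value(key, name) and file_exists(path) are hypothetical callables
    supplied by the caller, so the logic can be exercised without a Registry.
    """
    # Step 2: installed version, from the CurrentVersion value.
    version = read_value(JRE_KEY, "CurrentVersion")
    if not version:
        return None
    # Step 3: JRE location, from the JavaHome value of the version subkey.
    java_home = read_value(JRE_KEY + "\\" + version, "JavaHome")
    if not java_home:
        return None
    # Steps 4-5: probe the server folder first, then fall back to client.
    for folder in ("server", "client"):
        candidate = ntpath.join(java_home, "bin", folder, "jvm.dll")
        if file_exists(candidate):
            return candidate
    return None
```

On a Windows node, `read_value` could be backed by the standard `winreg` module and `file_exists` by `os.path.isfile`; a `None` result points at the step that is missing (no `CurrentVersion`, no `JavaHome`, or no `jvm.dll`).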

In addition, it is advisable to set the environment variables for the JRE too:

  • The variable JAVA_HOME should point to the installation folder; for example:
    C:\Program Files\Microsoft\jdk-\
  • The variable PATH should include the path to the binaries of the JRE; for example:
    C:\Program Files\Microsoft\jdk-\bin
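
A quick consistency check of these two variables might look like the sketch below. It takes the environment as a plain dictionary so it can be tried on any machine; the function name `check_java_env` and the wording of the returned messages are made up for this example.

```python
import ntpath


def _norm(path: str) -> str:
    # Normalise case, separators and trailing backslashes, Windows-style.
    return ntpath.normcase(ntpath.normpath(path))


def check_java_env(env: dict) -> list:
    """Return a list of problems found in JAVA_HOME / PATH (empty if none)."""
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        return ["JAVA_HOME is not set"]
    problems = []
    expected_bin = ntpath.join(java_home, "bin")
    # PATH entries are separated by semicolons on Windows.
    entries = [_norm(p) for p in env.get("PATH", "").split(";") if p.strip()]
    if _norm(expected_bin) not in entries:
        problems.append("PATH does not include " + expected_bin)
    return problems
```

On the node itself it could be called as `check_java_env(dict(os.environ))` after `import os`; an empty list means both variables agree.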


When Oracle's Java 8 is installed, its installer adds Registry keys under:

Exporting that key produces a text file with the .REG extension, similar to the one below. Oracle's Java 8 may then be uninstalled.

The default Java location when we install using the Microsoft's build of OpenJDK, on Windows, is:
C:\Program Files\Microsoft\jdk-\

After adapting the corresponding keys in the Registry file, that is, updating the version and location of the JRE and disregarding the browser plugin entries, the file looks like this:

Windows Registry Editor Version 5.00



[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Plug-in\11.333.2]
"JavaHome"="C:\\Program Files\\Microsoft\\jdk-"

[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment]

[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment\1.17]
"RuntimeLib"="C:\\Program Files\\Microsoft\\jdk-\\bin\\server\\jvm.dll"
"JavaHome"="C:\\Program Files\\Microsoft\\jdk-"

[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment\]
"JavaHome"="C:\\Program Files\\Microsoft\\jdk-"
"RuntimeLib"="C:\\Program Files\\Microsoft\\jdk-\\bin\\server\\jvm.dll"

[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment\\MSI]
"INSTALLDIR"="C:\\Program Files\\Microsoft\\jdk-\\"




[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\1.0.1]
"Home"="C:\\Program Files\\Microsoft\\jdk-\\bin"

[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\1.0.1_02]
"Home"="C:\\Program Files\\Microsoft\\jdk-\\bin"

[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\1.0.1_03]
"Home"="C:\\Program Files\\Microsoft\\jdk-\\bin"

[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\1.0.1_04]
"Home"="C:\\Program Files\\Microsoft\\jdk-\\bin"

"Home"="C:\\Program Files\\Microsoft\\jdk-\\bin"

[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\1.2.0_01]
"Home"="C:\\Program Files\\Microsoft\\jdk-\\bin"

[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\11.333.2]
"Home"="C:\\Program Files\\Microsoft\\jdk-\\bin"


[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start Caps\11.333.2]
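
Rather than editing an exported file by hand, the Java Runtime Environment part of such a file could also be generated. The sketch below is an assumption-laden illustration: the function name `build_jre_reg` is hypothetical, the caller supplies both the installation folder and the version label, and only the values named in this page (CurrentVersion, JavaHome, RuntimeLib) are written.

```python
def build_jre_reg(java_home: str, version: str) -> str:
    """Build .REG text for the Java Runtime Environment keys.

    java_home and version are caller-supplied; nothing here detects them.
    """
    def esc(path: str) -> str:
        # Backslashes inside quoted .REG values must be doubled.
        return path.replace("\\", "\\\\")

    home = java_home.rstrip("\\")
    base = r"HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment"
    jvm = home + r"\bin\server\jvm.dll"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{base}]",
        f'"CurrentVersion"="{version}"',
        "",
        f"[{base}\\{version}]",
        f'"JavaHome"="{esc(home)}"',
        f'"RuntimeLib"="{esc(jvm)}"',
        "",
    ]
    # .REG files use Windows line endings.
    return "\r\n".join(lines)
```

The returned text can be written to a file with the .REG extension and reviewed before importing it; the version label passed in must match between `CurrentVersion` and the subkey name, which the function guarantees by using the same argument for both.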

Visual C++ runtime

The Java Virtual Machine of the JRE - <jre-path>\bin\server\jvm.dll - takes a dependency on Visual C++ runtime libraries, as illustrated below:

# Using Visual Studio Build Tools, location:
# C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64\
.\dumpbin.exe /DEPENDENTS "C:\Program Files\Microsoft\jdk-\bin\server\jvm.dll"
 Image has the following dependencies:

Process Monitor traces show that C:\Windows\System32\vcruntime140_1.dll is being loaded (among others); if Visual C++ runtime libs are missing, then jvm.dll of the JRE can't be loaded either.

For the above version of the Microsoft Build of OpenJDK, installing version 17 from the Microsoft Visual C++ Redistributable downloads page and then applying the above Registry keys resulted in a working Data Factory Integration Runtime node, able to process Parquet files.

Avoid OutOfMemoryError

When handling large amounts of data with Parquet files, the Java-based libraries may cause high fragmentation of the free memory space in the Java heaps. Ultimately, this causes failures in the Data Factory activities involving Parquet files; the error message of such a failure contains something like:

java.lang.OutOfMemoryError: Java heap space

To help avoid such occurrences, we can tell the Java runtime to use a larger memory space for its heaps. Do so by adding a system-wide environment variable with the minimum and maximum heap sizes.

_JAVA_OPTIONS = -Xms512m -Xmx16g
JAVA_TOOL_OPTIONS = -Xms512m -Xmx16g

According to Microsoft documentation, the _JAVA_OPTIONS variable should be added, and it worked in the tests we performed. According to comments in this post, however, the supported environment variable name is JAVA_TOOL_OPTIONS.

The -Xms flag specifies the initial size of the memory allocation pool for a Java Virtual Machine (JVM), while -Xmx specifies its maximum size. The JVM therefore starts with the -Xms amount of memory and can use at most the -Xmx amount. By default, the JRE uses a minimum of 64 MB and a maximum of 1 GB.
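
To make the sizes concrete, the sketch below converts an options string such as the one above into byte counts. It is a simplified illustration, not the JVM's own parser: the function name `parse_heap_options` is hypothetical, and only the k, m and g suffixes are handled.

```python
import re

# Multipliers for the size suffixes handled by this sketch.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_heap_options(options: str) -> dict:
    """Map 'Xms' / 'Xmx' to their sizes in bytes, for the flags present."""
    sizes = {}
    for flag, number, unit in re.findall(r"-(Xms|Xmx)(\d+)([kKmMgG]?)", options):
        sizes[flag] = int(number) * _UNITS[unit.lower()]
    return sizes
```

For the value used on this page, `parse_heap_options("-Xms512m -Xmx16g")` yields 536870912 bytes for `Xms` and 17179869184 bytes for `Xmx`, so it is worth confirming that the node actually has more than 16 GB of memory available before using that maximum.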

After setting Java heap memory limits, reboot the machine. We may check memory settings by looking for the values MinHeapSize/InitialHeapSize and MaxHeapSize/SoftMaxHeapSize in the output of:

java -XX:+PrintFlagsFinal -version
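
The output of that command is long, so picking out the heap values can be scripted. The following is a minimal sketch that assumes each flag is printed on its own line as a type, a name, `=` or `:=`, and a numeric value; the function name `read_heap_flags` is hypothetical.

```python
import re

# Flag names mentioned in the paragraph above.
_WANTED = {"MinHeapSize", "InitialHeapSize", "MaxHeapSize", "SoftMaxHeapSize"}


def read_heap_flags(output: str) -> dict:
    """Extract the heap-size flags (in bytes) from PrintFlagsFinal output."""
    flags = {}
    for line in output.splitlines():
        # Expected shape: "<type> <name> = <value> ..." (or ":=" instead of "=").
        match = re.match(r"\s*\S+\s+(\w+)\s+:?=\s+(\d+)", line)
        if match and match.group(1) in _WANTED:
            flags[match.group(1)] = int(match.group(2))
    return flags
```

The text to parse can be captured with `subprocess.run(["java", "-XX:+PrintFlagsFinal", "-version"], capture_output=True, text=True)` and read from its `stdout` attribute; which of the four flags appear depends on the Java version in use.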

Last update: 2023-07-19