Sidra's integration runtimes using OpenJDK¶
This page clarifies some integration runtime concepts and gives a quick guide to choosing and configuring a cost-free Java runtime environment for it.
Creation of an Integration Runtime¶
Step 1: Create a Virtual Machine¶
The first step is to create a new virtual machine that will work as the integration runtime. Check the requirements that the VM needs here.
You may also use an existing VM, provided it meets the requirements.
Step 2: Set up a self-hosted integration runtime on ADF¶
1. Create a self-hosted IR¶
Check instructions here.
2. Configure the self-hosted IR¶
Check instructions here. During the step that installs the self-hosted integration runtime on the VM, install the integration runtime software under the path C:\Program Files\ (not C:\Program Files (x86)\).
Step 3: Set the VM ready for Sidra¶
1. Install Java Runtime (JRE) and OpenJDK¶
The target physical format for the data ingested with Sidra is .parquet files, which are optimal for Databricks. The IR agent cannot serialize or deserialize Parquet natively; it needs Java libraries for that, and the IR agent installation kit does not include a Java Runtime. A Java Runtime must therefore be installed manually on the node after installing the IR agent.
This can be easily done using the Java installer, which sets up everything necessary:
- Installs the 64-bit (x64) version (the installer should do this automatically if the VM has an x64 image)
- Installs it under the path C:\Program Files\ (not C:\Program Files (x86)\)
Recently, Oracle changed the licensing terms for the Java Runtime Environment (JRE), and fees may apply in production environments. To avoid these extra costs, we replace the JRE with OpenJDK (an open-source implementation of Java). You need to install both JRE and OpenJDK, starting with JRE. OpenJDK is the runtime that will actually be used; JRE is installed only because its installer creates some Windows Registry entries that we need.
After installing JRE, install OpenJDK using the .MSI installer that Microsoft, with its commitment to protect its customers from licensing claims, provides here. During the installation process, select the option to set the JAVA_HOME variable.
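As a quick sanity check after the installation, the following minimal sketch verifies that JAVA_HOME is set and points to a folder that looks like a Java installation outside C:\Program Files (x86)\. The function name `check_java_home` and the rule of looking for bin\java.exe are assumptions made for this illustration; they are not part of the installer or of the IR agent.

```python
import os
from pathlib import Path
from typing import Mapping


def check_java_home(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems found with JAVA_HOME (empty list = looks fine)."""
    home = env.get("JAVA_HOME", "").strip()
    if not home:
        return ["JAVA_HOME is not set"]

    problems = []
    # The guide asks for an installation under C:\Program Files\, not the x86 folder.
    if "program files (x86)" in home.lower():
        problems.append("JAVA_HOME points under Program Files (x86)")
    # Assumption of this sketch: a usable installation has bin\java.exe.
    if not (Path(home) / "bin" / "java.exe").is_file():
        problems.append("bin\\java.exe not found under JAVA_HOME")
    return problems
```

Calling `check_java_home()` with no arguments inspects the environment of the current process, so run it from a console opened after the installer has finished.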
2. PATH and Registry keys¶
Unfortunately, because of their commitment to respecting licenses, the OpenJDK builds do not set the Registry keys that the Integration Runtime agent needs for calling the JRE when processing Parquet files.
You can check here for more information.
1. Configure registry keys¶
The next step is to change the registry entries of the JRE to use OpenJDK.
When Oracle's Java 8 is installed, the installer adds Registry keys under HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft or HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\JavaSoft (depending on OS version).
Export that key; you will end up with a text file with the .REG extension. To export it, right-click the key folder -> Export.
Now, you may uninstall JRE if you want.
Open the exported file with a text editor and replace the current JRE location path with the OpenJDK location path. The default location for the Microsoft build of OpenJDK on Windows is C:\Program Files\Microsoft\jdk-21.0.2.13-hotspot\ (the name will change depending on the installed version). Check the folder to see whether it has been correctly installed.
- If the folder bin\server does not exist after the installation, the folder bin\client has the same DLL and works the same way.
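The path replacement can also be scripted. The sketch below is one possible way to do it, not the only one: it swaps every occurrence of the old JRE folder for the OpenJDK folder, taking into account that .reg files store backslashes doubled. The old path in `OLD_HOME` is a hypothetical example of where the Oracle installer may have placed the JRE; use the path that actually appears in your exported file. Version-named keys (for example the subkey under Java Runtime Environment) are not renamed by this sketch and still need to be adjusted by hand.

```python
from pathlib import Path

# Hypothetical old location; replace it with the path found in your export.
OLD_HOME = r"C:\Program Files\Java\jre1.8.0_333"
# Default location of the Microsoft build of OpenJDK used in this guide.
NEW_HOME = r"C:\Program Files\Microsoft\jdk-21.0.2.13-hotspot"


def reg_escape(path: str) -> str:
    """Write a Windows path the way .reg files store it (doubled backslashes)."""
    return path.rstrip("\\").replace("\\", "\\\\")


def repoint_java_home(reg_text: str, old_home: str, new_home: str) -> str:
    """Replace every occurrence of the old JRE folder with the OpenJDK folder."""
    return reg_text.replace(reg_escape(old_home), reg_escape(new_home))


def repoint_file(src: Path, dst: Path, old_home: str, new_home: str) -> None:
    """Read an exported .reg file and write an edited copy next to it."""
    # Files exported by the Registry Editor are normally UTF-16 with a BOM;
    # newline="" keeps the original line endings untouched.
    with open(src, encoding="utf-16", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-16", newline="") as f:
        f.write(repoint_java_home(text, old_home, new_home))
```

For example, `repoint_file(Path("javasoft.reg"), Path("javasoft-openjdk.reg"), OLD_HOME, NEW_HOME)` leaves the original export intact and writes the edited copy to a new file (both file names are placeholders).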
After adapting the corresponding keys in the Registry file (updating the version and location of the JRE, and disregarding the browser plug-in entries), you should end up with something like the following.
1.1 Registry file example¶
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft]
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Plug-in]
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Plug-in\11.333.2]
"JavaHome"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot"
"UseJava2IExplorer"=dword:00000001
"UseNewJavaPlugin"=dword:00000001
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment]
"CurrentVersion"="1.17"
"BrowserJavaVersion"="11.333.2"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment\1.17]
"RuntimeLib"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot\\bin\\server\\jvm.dll"
"JavaHome"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot"
"MicroVersion"="0"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment\1.17.0.3.7-hotspot]
"JavaHome"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot"
"MicroVersion"="0"
"RuntimeLib"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot\\bin\\server\\jvm.dll"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment\1.17.0.3.7-hotspot\MSI]
"INSTALLDIR"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot\\"
"JU"=""
"OEMUPDATE"=""
"FROMVERSION"="NA"
"FROMVERSIONFULL"=""
"PRODUCTVERSION"="17.0.3.7-hotspot"
"EULA"=""
"JAVAUPDATE"="1"
"AUTOUPDATECHECK"="1"
"AUTOUPDATEDELAY"=""
"FullVersion"="17.0.3.7-hotspot"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Update]
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Update\Policy]
"Country"="ES"
"PostStatusUrl"="https://sjremetrics.java.com/b/ss//6"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start]
"CurrentVersion"="11.333.2"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\1.0.1]
"Home"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot\\bin"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\1.0.1_02]
"Home"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot\\bin"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\1.0.1_03]
"Home"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot\\bin"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\1.0.1_04]
"Home"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot\\bin"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\1.2]
"Home"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot\\bin"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\1.2.0_01]
"Home"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot\\bin"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start\11.333.2]
"Home"="C:\\Program Files\\Microsoft\\jdk-21.0.2.13-hotspot\\bin"
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start Caps]
[HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Web Start Caps\11.333.2]
"JNLPProtocol"=dword:00000001
"JNLPAssociation2"=dword:00000001
1.2 Apply registry file¶
Double-click the exported and edited registry file to apply the new values.
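Before relying on the imported values, it can help to confirm that the edited file is internally consistent. The sketch below parses the text of a .reg file and checks that CurrentVersion under Java Runtime Environment names a subkey that exists and that this subkey has JavaHome and RuntimeLib values. It assumes the agent resolves the runtime through that chain and that the keys live under HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft (not the WOW6432Node variant); the function names are made up for this illustration.

```python
import re

SECTION = re.compile(r"^\[(?P<key>[^\]]+)\]\s*$")
STRING_VALUE = re.compile(r'^"(?P<name>[^"]+)"="(?P<data>.*)"\s*$')
JRE_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment"


def parse_reg(text: str) -> dict[str, dict[str, str]]:
    """Collect quoted string values per key; other value types are ignored."""
    keys: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        section = SECTION.match(line)
        if section:
            current = keys.setdefault(section["key"], {})
            continue
        value = STRING_VALUE.match(line)
        if value and current is not None:
            # Undo the doubled backslashes used inside .reg string values.
            current[value["name"]] = value["data"].replace("\\\\", "\\")
    return keys


def check_jre_keys(text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic chain is present."""
    keys = parse_reg(text)
    jre = keys.get(JRE_KEY)
    if jre is None:
        return ["missing key: " + JRE_KEY]
    version = jre.get("CurrentVersion")
    if not version:
        return ["missing value: CurrentVersion"]
    subkey = keys.get(JRE_KEY + "\\" + version)
    if subkey is None:
        return [f"CurrentVersion is {version!r} but that subkey is absent"]
    return [f"missing value: {name}" for name in ("JavaHome", "RuntimeLib") if name not in subkey]
```

Read the file with `encoding="utf-16"` before passing its text to `check_jre_keys`, since Registry Editor exports are normally UTF-16.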
Visual C++ runtime¶
The Java Virtual Machine of the JRE (<jre-path>\bin\server\jvm.dll) depends on the Visual C++ runtime libraries, as illustrated below:
# Using Visual Studio Build Tools, location:
# C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64\
.\dumpbin.exe /DEPENDENTS "C:\Program Files\Microsoft\jdk-21.0.2.13-hotspot\bin\server\jvm.dll"
...
Image has the following dependencies:
...
VCRUNTIME140.dll
VCRUNTIME140_1.dll
...
Process Monitor traces show that C:\Windows\System32\vcruntime140_1.dll is being loaded (among others); if the Visual C++ runtime libraries are missing, the jvm.dll of the JRE cannot be loaded.
If it is missing, install the Visual C++ Redistributable (https://aka.ms/vs/17/release/vc_redist.x64.exe) from the Microsoft Visual C++ Redistributable downloads page, choosing the version that works with your installed version of OpenJDK.
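To find out whether the redistributable is needed, a small check such as the following can list which of the two DLLs reported by dumpbin are absent from a folder. This is a minimal sketch: the names `missing_vc_runtime` and `default_system32` are illustrative, and it only looks at file presence in one directory, not at how Windows actually resolves DLLs.

```python
import os
from pathlib import Path

# The two Visual C++ runtime DLLs listed in the dumpbin output above.
REQUIRED_DLLS = ("vcruntime140.dll", "vcruntime140_1.dll")


def missing_vc_runtime(system_dir: Path) -> list[str]:
    """Return the required DLLs that are not present in system_dir."""
    # Compare in lower case, since Windows file names are case-insensitive.
    present = {entry.name.lower() for entry in system_dir.iterdir()}
    return [dll for dll in REQUIRED_DLLS if dll not in present]


def default_system32() -> Path:
    """Locate System32 from the SystemRoot variable, falling back to C:\\Windows."""
    return Path(os.environ.get("SystemRoot", r"C:\Windows")) / "System32"
```

On the VM, `missing_vc_runtime(default_system32())` returning an empty list suggests the libraries are already installed.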
Avoid OutOfMemoryError¶
When handling large amounts of data with Parquet files, the Java-based libraries may cause high fragmentation of the free memory space in the Java heaps. This will ultimately cause failures in the Data Factory activities involving Parquet files; the error message of such failures will contain something like:
java.lang.OutOfMemoryError:Java heap space
To help avoid such occurrences, we can tell the Java runtime to use a larger memory space for its heaps. Do so by adding a system-wide environment variable with the minimum and maximum heap sizes.
_JAVA_OPTIONS = -Xms512m -Xmx16g
JAVA_TOOL_OPTIONS = -Xms512m -Xmx16g
According to Microsoft documentation, the variable _JAVA_OPTIONS should be added, and it worked in the tests we performed. However, according to comments in this post, the supported environment variable name is JAVA_TOOL_OPTIONS.
The flag Xms specifies the initial memory allocation pool for a Java Virtual Machine (JVM), while Xmx specifies the maximum memory allocation pool. This means that the JVM will start with Xms amount of memory and will be able to use at most Xmx amount of memory. By default, the JRE uses a minimum of 64 MB and a maximum of 1 GB.
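Before committing a value to the environment variable, the option string can be checked for obvious mistakes. The sketch below converts the Xms and Xmx values to bytes and flags two conditions: Xms larger than Xmx, and Xmx larger than the RAM of the VM. The second rule is only a rule of thumb chosen for this example, and the function names are hypothetical; the sketch handles the k, m and g suffixes only.

```python
import re

UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
HEAP_FLAG = re.compile(r"-(Xms|Xmx)(\d+)([kKmMgG]?)(?=\s|$)")


def heap_sizes(options: str) -> dict[str, int]:
    """Extract Xms/Xmx from an option string and convert them to bytes."""
    sizes = {}
    for flag, number, unit in HEAP_FLAG.findall(options):
        sizes[flag] = int(number) * UNITS[unit.lower()]
    return sizes


def check_heap_options(options: str, physical_ram_bytes: int) -> list[str]:
    """Return warnings for an option string such as '-Xms512m -Xmx16g'."""
    sizes = heap_sizes(options)
    warnings = []
    if "Xms" in sizes and "Xmx" in sizes and sizes["Xms"] > sizes["Xmx"]:
        warnings.append("Xms is larger than Xmx")
    # Rule of thumb used by this sketch, not a JVM requirement.
    if sizes.get("Xmx", 0) > physical_ram_bytes:
        warnings.append("Xmx exceeds the physical RAM of the machine")
    return warnings
```

For instance, `check_heap_options("-Xms512m -Xmx16g", 8 * 1024**3)` reports that a 16 GB maximum heap does not fit an 8 GB VM.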
After setting the Java heap memory limits, reboot the machine. You can check the memory settings by looking for the values MinHeapSize/InitialHeapSize and MaxHeapSize/SoftMaxHeapSize in the output of:
java -XX:+PrintFlagsFinal -version
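Since that output contains several hundred flags, the relevant values can be filtered out programmatically. The following sketch extracts the four heap flags named above from the command output; the regular expression assumes the usual "type name = value" (or ":=") line layout, which may vary between JVM versions, and the function names are illustrative.

```python
import re
import subprocess

# Matches lines such as "   size_t MaxHeapSize    = 17179869184    {product} ..."
FLAG_LINE = re.compile(r"^\s*\S+\s+(?P<name>\w+)\s+:?=\s+(?P<value>\d+)\b")
HEAP_FLAGS = ("InitialHeapSize", "MinHeapSize", "MaxHeapSize", "SoftMaxHeapSize")


def heap_flags(output: str) -> dict[str, int]:
    """Pick the heap-related flags (values in bytes) out of PrintFlagsFinal output."""
    found = {}
    for line in output.splitlines():
        match = FLAG_LINE.match(line)
        if match and match["name"] in HEAP_FLAGS:
            found[match["name"]] = int(match["value"])
    return found


def java_heap_flags(java: str = "java") -> dict[str, int]:
    """Run the command from this section and return the heap flags it reports."""
    result = subprocess.run(
        [java, "-XX:+PrintFlagsFinal", "-version"],
        capture_output=True, text=True, check=True,
    )
    return heap_flags(result.stdout)
```

With the variable set to -Xms512m -Xmx16g, InitialHeapSize would be expected to be 536870912 (512 MB) and MaxHeapSize 17179869184 (16 GB); which of the four flags appear depends on the JVM version.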