Eclipse Notes, Java's Vector API, and JITWatch

This blog post is a collection of notes on how I like to set up the Eclipse IDE, and a starting point for how to use Java's new Vector API. I'll also show how to use JITWatch to see how Java source code transforms into Java bytecode and Intel assembly instructions. That tool is particularly helpful when trying to figure out performance issues with vectorized code.

Part 1: Installing the Eclipse IDE and a Couple Plug-Ins

  1. Download "Eclipse IDE for Java Developers" and extract the archive. You may also want to place a link to "eclipse.exe" on your desktop or taskbar.
     https://www.eclipse.org/downloads/packages/
  2. Open Eclipse. On the first run it will ask you where to create a workspace; the default location is fine. After the IDE appears you can check for updates and optionally install a couple of plug-ins that I find very helpful: "Jeeeyul's Eclipse Themes" improves the appearance of the GUI, and "Launch Configuration View" makes it easier to manage projects with several run configurations (as we'll see later).
     Open Eclipse
     Check "Use this as the default and do not ask again" > Launch
     Close the Welcome tab
     Close the Donate tab
     Help > Check for Updates
     Help > Eclipse Marketplace
     Search for "jeeeyul" > Jeeeyul's Eclipse Themes > Install > Confirm > Accept > Finish > Check the box > Trust Selected > Restart Now
     Help > Eclipse Marketplace
     Search for "launch" > Launch Configuration View Latest > Install > Finish > Install Anyway > Restart Now

Note: It looks like the upcoming 2021-12 release of Eclipse will come with Launch Configuration View already included.

Part 2: Eclipse GUI Tips

  1. Enable Jeeeyul's Theme and configure it as desired. You can adjust one of the included themes to your taste, or download my theme, which is a slightly modified version of the default theme.
     Window > Preferences > General > Appearance > Theme = Jeeeyul's themes - Custom Theme > Apply and Close > Restart
     Window > Preferences > General > Appearance > Jeeeyul's Themes > Presets > Import > Select "Eclipse Theme" > Apply > Apply and Close
  2. Add the "Tasks" tab to your current perspective. It lists the TODOs and FIXMEs in your code, which is particularly helpful when working on large projects or collaborating with other people.
     Window > Show View > Other > General > Tasks > Open
  3. Add the "Breakpoints" and "Debug" tabs to your current perspective. This makes it easier to debug code without having to switch to the Debug Perspective.
     Window > Show View > Other > Debug > Breakpoints > Open
     Window > Show View > Other > Debug > Debug > Open
  4. Add the "Launch Configurations" tab to your current perspective. It lists all of your Run Configurations and External Tool Configurations, which is a little more convenient than accessing them through menus.
     Window > Show View > Other > Debug > Launch Configurations > Open
  5. Add the "Terminal" tab to your current perspective. This is a quick way to get to the command line (cmd.exe) from within the IDE.
     Window > Show View > Other > Terminal > Terminal > Open
     Click the "Open a Terminal" toolbar icon inside that tab to obtain a terminal.
  6. Keep the Project Explorer in sync with the currently active editor tab.
     Project Explorer > Link with Editor (it's a toolbar button)
  7. Simplify the GUI by removing unwanted toolbar buttons.
     Window > Perspective > Customize Perspective > Toolbar Visibility
     uncheck "Terminal"
     uncheck "Jeeeyul's Eclipse Themes"
     Launch > uncheck "Coverage"
     uncheck "Java Element Creation"
     uncheck "Search"
     uncheck "Navigate"
     uncheck "Help"

Part 3: Eclipse Preferences Tips

  1. Often it will look like Eclipse has frozen, but if you look in the lower-right corner you'll see a small progress bar. Instead of doing things "in the background" I prefer it to be more obvious:
     Window > Preferences > General > uncheck "Always run in background"
  2. If you mouse over a JavaDoc pop-up, it will wait a few seconds before showing more details. I prefer not to wait:
     Window > Preferences > General > Editors > Text Editors > "when mouse moved into hover = enrich immediately"
  3. Several plug-ins load at startup, but you can disable the ones you don't care about:
     Window > Preferences > General > Startup and Shutdown > uncheck "buildship...", "equinox...", "language server..." and "oomph..."
  4. The workspace name is shown in the title bar, but if you only use one workspace you probably don't need to see it:
     Window > Preferences > General > Workspace > uncheck "show workspace name"
  5. Incubating features (like the Vector API) are located inside the jdk.* packages. Content Assist will not recommend anything from those packages because most developers don't use them. But we'll be trying out the Vector API, so we actually want those recommendations:
     Window > Preferences > Java > Appearance > Type Filters > uncheck "jdk.*"
  6. When debugging multithreaded code, a breakpoint can pause one thread or all threads. The default of pausing one thread is fine, but you might want to pause all threads in some situations:
     Window > Preferences > Java > Debug > "default suspend policy for new breakpoint"
  7. Auto-completion can either replace existing code or simply insert the rest of a proposed identifier. The default of replacing code can be helpful, but I find it causes more problems than it solves. I also prefer auto-completion to kick in only when I press Enter, not when I press Space:
     Window > Preferences > Java > Editor > Content Assist > "completion inserts"
     Window > Preferences > Java > Editor > Content Assist > check "disable insertion triggers except enter"
  8. Unlimited scrollback in the console is very helpful:
     Window > Preferences > Run/Debug > Console > uncheck "limit console output"
  9. If you use Eclipse's Git features then you probably want to specify your name and e-mail address:
     Window > Preferences > Version Control > Git > Configuration > User Settings > Add Entry > "user.name = Your Name" and "user.email = youremail@example.com"

The above preferences affect all projects. Changes that only affect the current project can be made with:

Project > Properties

For more tips and tricks, check out Noopur Gupta's "Mastering Your Eclipse IDE" talk at EclipseCon 2019:
Video: https://www.youtube.com/watch?v=8WcntACvfl4
Slides: https://www.eclipsecon.org/sites/default/files/slides/Mastering%20your%20Eclipse%20IDE%20-%20ECE%202019.pdf

Part 4: Installing Several JDKs

The Vector API is still "incubating" and undergoing lots of development. Performance differences between JDK versions can be drastic so I'll be testing my code with multiple JDKs on multiple OS's on multiple architectures. Development will be done with Windows, but I'll also test on a Linux VM, and on a Raspberry Pi 4 (using two versions of Raspberry Pi OS: the default Arm32 version and a beta AArch64 version.)

The OpenJDK project provides builds for a few operating systems and architectures:
https://jdk.java.net/archive/

An alternative source for builds is the Adoptium project. They support a wider variety of OS's and architectures. They also provide convenient installers for Windows but I'll be using their ZIP files because I want to have multiple JDKs available on the same machine.
https://adoptium.net/releases.html

Java 18 is still under development at the time of writing. There are Early Access builds on the OpenJDK website but I'll be trying some "nightly" builds from Shipilev's web site instead. The "server-release" archives provide what we need:
https://builds.shipilev.net/openjdk-jdk/

I downloaded Java 16, Java 17, and a Java 18 nightly build, then made a "java_projects" folder on my Desktop and extracted the JDKs there. The Eclipse IDE includes the JustJ distribution of Java 16, but it doesn't seem to include the incubating Vector API so we must switch to one of the downloaded JDKs. Let's tell the Eclipse IDE about the new JDKs and change the default one to Adoptium Java 16:

Window > Preferences > Java > Installed JREs
Add > Next > Directory > go to Desktop/java_projects/jdk-16.0.2+7 > Select Folder > set "JRE name" to "jdk-16" > Finish
Add > Next > Directory > go to Desktop/java_projects/jdk-17.0.1+12 > Select Folder > set "JRE name" to "jdk-17" > Finish
Add > Next > Directory > go to Desktop/java_projects/jdk > Select Folder > set "JRE name" to "jdk-18-nightly" > Finish
Check the box next to "jdk-16" to make it the default.
Apply and Close
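If you're not sure whether a particular runtime (such as the bundled JustJ Java 16) includes the incubating Vector API, one quick check is to ask that JDK's run-time image whether it contains the jdk.incubator.vector module. This is a minimal sketch; the class and method names are hypothetical, and it only reports whether the module exists in the image, not whether it has been enabled for your program:

```java
import java.lang.module.ModuleFinder;

// Hypothetical helper (not part of the benchmark): asks the running JDK's run-time image
// whether it contains a module, regardless of which modules were resolved at startup.
class IncubatorCheck {

    static boolean jdkShips(String moduleName) {
        // ModuleFinder.ofSystem() locates every system module in the run-time image.
        return ModuleFinder.ofSystem().find(moduleName).isPresent();
    }

    public static void main(String[] args) {
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println("version = " + Runtime.version());
        System.out.println("ships jdk.incubator.vector = " + jdkShips("jdk.incubator.vector"));
    }
}
```

Running it once with each Installed JRE shows which of them can be used for the Vector API code.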

Part 5: First Steps with Java's Vector API

If you're new to Java's Vector API, the following resources may be helpful:

My curiosity in the Vector API comes from wanting to improve performance in Telemetry Viewer. One of my bottlenecks is in verifying the checksums of binary packets. My laptop can currently process approximately 20Gbps of telemetry. That's faster than I have a need for, but it would still be nice to improve things if that results in reduced power consumption.

Let's start by creating a new project and giving it a Main class:

File > New > Java Project > Project name = "Vector API Test" > Finish > Don't Create
File > New > Class > Name = "Main", and check "public static void main(String[] args)" > Finish

Here's some code I wrote that demonstrates a scalar way of testing checksums, and four attempts at vectorizing it:

import java.net.InetAddress;
import java.nio.ByteOrder;

import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.ShortVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorShuffle;
import jdk.incubator.vector.VectorSpecies;

public class Main {

	// simulating checksum verification of binary packets
	// each packet contains 1 sync byte, then 8 payload bytes, then a 2 byte checksum:
	// AA 01 02 03 04 05 06 07 08 10 14
	// (0xAA is the sync byte, then 4 little-endian int16's: 0x0201, 0x0403, 0x0605, 0x0807, then a little-endian int16 checksum: 0x1410)
	final static int packetByteCount = 11;
	final static byte[] buffer = new byte[3 * 1048576 * packetByteCount]; // 3M packets
	static {
		for(int i = 0; i < buffer.length; i += packetByteCount) {
			buffer[i   ] = (byte) 0xAA;
			buffer[i+ 1] = (byte) 0x01;
			buffer[i+ 2] = (byte) 0x02;
			buffer[i+ 3] = (byte) 0x03;
			buffer[i+ 4] = (byte) 0x04;
			buffer[i+ 5] = (byte) 0x05;
			buffer[i+ 6] = (byte) 0x06;
			buffer[i+ 7] = (byte) 0x07;
			buffer[i+ 8] = (byte) 0x08;
			buffer[i+ 9] = (byte) 0x10;
			buffer[i+10] = (byte) 0x14;
		}
	}

	/**
	 * Prints out some information about the computer and JRE, then benchmarks the code.
	 * 
	 * @param args    Not used.
	 */
	public static void main(String[] args) {

		System.out.println("====================================================================================");
		try { System.out.println("hostname = " + InetAddress.getLocalHost().getHostName()); } catch(Exception e) {}
		System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
		System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));
		System.out.println("java.vendor.version = " + System.getProperty("java.vendor.version"));
		System.out.println("os.name = " + System.getProperty("os.name"));
		System.out.println("os.version = " + System.getProperty("os.version"));
		System.out.println("os.arch = " + System.getProperty("os.arch"));
		System.out.println("java.home = " + System.getProperty("java.home"));
		System.out.println("user.dir = " + System.getProperty("user.dir"));
		System.out.println("====================================================================================");
		System.out.println();

		System.out.print("Verifying checksums, scalar code... ");
		long start = System.nanoTime();
		for(int repeat = 0; repeat < 500; repeat++)
			verifyChecksumsScalar();
		long end = System.nanoTime();
		double scalarMilliseconds = (end - start) / 1000000.0;
		System.out.println(String.format("took %9.3f ms", scalarMilliseconds));

		System.out.print("Verifying checksums, vectorA code... ");
		start = System.nanoTime();
		for(int repeat = 0; repeat < 500; repeat++)
			verifyChecksumsVectorA();
		end = System.nanoTime();
		double milliseconds = (end - start) / 1000000.0;
		System.out.println(String.format("took %9.3f ms >>> %6.1f%% faster than scalar <<<", milliseconds, (1.0 - milliseconds / scalarMilliseconds) * 100));

		System.out.print("Verifying checksums, vectorB code... ");
		start = System.nanoTime();
		for(int repeat = 0; repeat < 500; repeat++)
			verifyChecksumsVectorB();
		end = System.nanoTime();
		milliseconds = (end - start) / 1000000.0;
		System.out.println(String.format("took %9.3f ms >>> %6.1f%% faster than scalar <<<", milliseconds, (1.0 - milliseconds / scalarMilliseconds) * 100));

		System.out.print("Verifying checksums, vectorC code... ");
		start = System.nanoTime();
		for(int repeat = 0; repeat < 500; repeat++)
			verifyChecksumsVectorC();
		end = System.nanoTime();
		milliseconds = (end - start) / 1000000.0;
		System.out.println(String.format("took %9.3f ms >>> %6.1f%% faster than scalar <<<", milliseconds, (1.0 - milliseconds / scalarMilliseconds) * 100));

		System.out.print("Verifying checksums, vectorD code... ");
		start = System.nanoTime();
		for(int repeat = 0; repeat < 500; repeat++)
			verifyChecksumsVectorD();
		end = System.nanoTime();
		milliseconds = (end - start) / 1000000.0;
		System.out.println(String.format("took %9.3f ms >>> %6.1f%% faster than scalar <<<", milliseconds, (1.0 - milliseconds / scalarMilliseconds) * 100));

	}

	/**
	 * A scalar way of verifying the packet checksums:
	 * 
	 * Interpret bytes 1 and 2 as a little-endian integer, then add it to an accumulator.
	 * Interpret bytes 3 and 4 as a little-endian integer, then add it to an accumulator.
	 * Interpret bytes 5 and 6 as a little-endian integer, then add it to an accumulator.
	 * Interpret bytes 7 and 8 as a little-endian integer, then add it to an accumulator.
	 * The lower 16 bits of the accumulator now contain the sum of the payload region.
	 * Interpret bytes 9 and 10 as a little-endian integer, then compare that to the accumulator. If they're not equal, the packet is corrupt.
	 */
	public static void verifyChecksumsScalar() {

		for(int offset = 0; offset < buffer.length; offset += packetByteCount) {
			int sum = 0;
			int lsb = 0;
			int msb = 0;
			lsb = 0xFF & buffer[offset+1];
			msb = 0xFF & buffer[offset+2];
			sum += (msb << 8 | lsb);
			lsb = 0xFF & buffer[offset+3];
			msb = 0xFF & buffer[offset+4];
			sum += (msb << 8 | lsb);
			lsb = 0xFF & buffer[offset+5];
			msb = 0xFF & buffer[offset+6];
			sum += (msb << 8 | lsb);
			lsb = 0xFF & buffer[offset+7];
			msb = 0xFF & buffer[offset+8];
			sum += (msb << 8 | lsb);
			sum %= 65536;
			lsb = 0xFF & buffer[offset+9];
			msb = 0xFF & buffer[offset+10];
			int reportedSum = (msb << 8 | lsb);
			if(reportedSum != sum)
				System.out.println("corrupt");
		}

	}

	/**
	 * Perhaps the simplest way to vectorize this algorithm:
	 * 
	 * The payload region is 8 bytes, which is 64 bits, which is a commonly supported SIMD register size.
	 * Copy those 8 bytes into a SIMD register, treating the bytes as little-endian shorts.
	 * Calculate the sum of those little-endian shorts with a reduce operation.
	 * Finally, calculate the reported sum manually. If they do not match, the packet is corrupt.
	 */
	public static void verifyChecksumsVectorA() {

		VectorSpecies<Short> species = ShortVector.SPECIES_64;

		for(int i = 0; i < buffer.length; i += packetByteCount) {
			ShortVector vec = ShortVector.fromByteArray(species, buffer, i+1, ByteOrder.LITTLE_ENDIAN);
			short sum = vec.reduceLanes(VectorOperators.ADD);
			int lsb = 0xFF & buffer[i+9];
			int msb = 0xFF & buffer[i+10];
			int reportedSum = (msb << 8 | lsb);
			if(reportedSum != Short.toUnsignedInt(sum)) // compare as unsigned 16-bit values
				System.out.println("corrupt");
		}

	}

	/**
	 * It might be more efficient to use a wider SIMD register, since modern processors support 256 bit (or bigger) registers.
	 * So let's try processing 3 packets inside one register:
	 * 
	 * Copy 32 bytes into a 256 bit SIMD register, starting at the payload region of the first packet.
	 * Use 2 blend operations to remove the non-payload bytes (checksums and sync bytes) that exist between the payload regions of the three packets.
	 * Use 3 reduce operations (with masks) to individually calculate the sums of the 3 packets.
	 * Finally, calculate the 3 reported sums manually. If they do not match, the packet is corrupt.
	 */
	public static void verifyChecksumsVectorB() {

		VectorSpecies<Byte> byteSpecies = ByteVector.SPECIES_256;
		VectorMask<Byte> firstMask  = VectorMask.fromLong(byteSpecies, 0b11111111111111111111111100000000);
		VectorMask<Byte> secondMask = VectorMask.fromLong(byteSpecies, 0b00000000111111110000000000000000);

		VectorSpecies<Short> packetSpecies = ShortVector.SPECIES_256;
		VectorMask<Short> packet1Mask = VectorMask.fromLong(packetSpecies, 0b000000001111);
		VectorMask<Short> packet2Mask = VectorMask.fromLong(packetSpecies, 0b000011110000);
		VectorMask<Short> packet3Mask = VectorMask.fromLong(packetSpecies, 0b111100000000);

		for(int offset = 0; offset < buffer.length; offset += packetByteCount*3) {
			ByteVector bvec = ByteVector.fromArray(byteSpecies, buffer, offset + 1);
			ByteVector bvec2 = bvec.blend(bvec.slice(3), firstMask);
			ByteVector bvec3 = bvec2.blend(bvec2.slice(3), secondMask);
			ShortVector svec = bvec3.reinterpretAsShorts();
			short sum1 = svec.reduceLanes(VectorOperators.ADD, packet1Mask);
			short sum2 = svec.reduceLanes(VectorOperators.ADD, packet2Mask);
			short sum3 = svec.reduceLanes(VectorOperators.ADD, packet3Mask);
			int lsb = 0xFF & buffer[offset+9];
			int msb = 0xFF & buffer[offset+10];
			int reportedSum = (msb << 8 | lsb);
			if(reportedSum != Short.toUnsignedInt(sum1)) // compare as unsigned 16-bit values
				System.out.println("corrupt");
			lsb = 0xFF & buffer[offset+20];
			msb = 0xFF & buffer[offset+21];
			reportedSum = (msb << 8 | lsb);
			if(reportedSum != Short.toUnsignedInt(sum2))
				System.out.println("corrupt");
			lsb = 0xFF & buffer[offset+31];
			msb = 0xFF & buffer[offset+32];
			reportedSum = (msb << 8 | lsb);
			if(reportedSum != Short.toUnsignedInt(sum3))
				System.out.println("corrupt");
		}

	}

	/**
	 * The previous attempt was slow.
	 * Let's try 1 rearrange and 1 blend operation, instead of 2 blend operations.
	 * Let's also try 1 reduce operation, instead of 3. This will not catch all checksum failures, but this is just a test.
	 */
	public static void verifyChecksumsVectorC() {

		VectorSpecies<Byte> byteSpecies = ByteVector.SPECIES_256;
		VectorShuffle<Byte> byteShuffle = VectorShuffle.fromArray(byteSpecies, new int[] {  0, 1, 2, 3, 4, 5, 6, 7,
		                                                                                   11,12,13,14,15,16,17,18,
		                                                                                   22,23,24,25,26,27,28,29,
		                                                                                    0, 0, 0, 0, 0, 0, 0, 0}, 0);
		VectorMask<Byte> unusedBytesMask = VectorMask.fromLong(byteSpecies, 0b11111111_00000000_00000000_00000000);

		for(int offset = 0; offset < buffer.length; offset += packetByteCount*3) {
			ByteVector bvec = ByteVector.fromArray(byteSpecies, buffer, offset + 1);
			bvec = bvec.rearrange(byteShuffle);
			bvec = bvec.blend(0, unusedBytesMask);
			ShortVector svec = bvec.reinterpretAsShorts();
			short sum = svec.reduceLanes(VectorOperators.ADD);
			int lsb = 0xFF & buffer[offset+9];
			int msb = 0xFF & buffer[offset+10];
			int reportedSum = (msb << 8 | lsb);
			lsb = 0xFF & buffer[offset+20];
			msb = 0xFF & buffer[offset+21];
			reportedSum += (msb << 8 | lsb);
			lsb = 0xFF & buffer[offset+31];
			msb = 0xFF & buffer[offset+32];
			reportedSum += (msb << 8 | lsb);
			if((reportedSum & 0xFFFF) != Short.toUnsignedInt(sum)) // the short sum wraps, so compare only the low 16 bits
				System.out.println("corrupt");
		}

	}

	/**
	 * It looks like there may be a cleaner way to remove the non-payload bytes.
	 * One of the methods for filling a SIMD register accepts an array of indices.
	 * Like before, let's also try 1 reduce operation, instead of 3. This will not catch all checksum failures, but this is just a test.
	 */
	public static void verifyChecksumsVectorD() {

		VectorSpecies<Byte> byteSpecies = ByteVector.SPECIES_256;
		int[] indices = new int[] {  1, 2, 3, 4, 5, 6, 7, 8,
		                            12,13,14,15,16,17,18,19,
		                            23,24,25,26,27,28,29,30,
		                            34,35,36,37,38,39,40,41};

		for(int offset = 0; offset < buffer.length; offset += packetByteCount*4) {
			ByteVector bvec = ByteVector.fromArray(byteSpecies, buffer, offset, indices, 0);
			ShortVector svec = bvec.reinterpretAsShorts();
			short sum = svec.reduceLanes(VectorOperators.ADD);
			int lsb = 0xFF & buffer[offset+9];
			int msb = 0xFF & buffer[offset+10];
			int reportedSum = (msb << 8 | lsb);
			lsb = 0xFF & buffer[offset+20];
			msb = 0xFF & buffer[offset+21];
			reportedSum += (msb << 8 | lsb);
			lsb = 0xFF & buffer[offset+31];
			msb = 0xFF & buffer[offset+32];
			reportedSum += (msb << 8 | lsb);
			lsb = 0xFF & buffer[offset+42];
			msb = 0xFF & buffer[offset+43];
			reportedSum += (msb << 8 | lsb);
			if((reportedSum & 0xFFFF) != Short.toUnsignedInt(sum)) // the short sum wraps, so compare only the low 16 bits
				System.out.println("corrupt");
		}

	}

}

Lots of errors will appear because Eclipse is still trying to use its bundled JRE instead of Adoptium Java 16. This can be fixed by changing the project's JRE System Library:

Project > Properties > Java Build Path > Libraries > JRE System Library > Edit > "Alternate JRE = jdk-16" > Finish > Apply and Close
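For reference, the packet layout described in the comments at the top of Main can also be decoded with java.nio.ByteBuffer instead of manual shifts and masks. This is a hypothetical scalar sketch for checking the format by hand, not one of the benchmarked versions (it wraps the array on every call, which is fine for a test but wasteful in a loop):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical scalar reference: one 11-byte packet = sync byte, four little-endian int16
// payload words (bytes 1-8), then a little-endian int16 checksum (bytes 9-10).
class PacketChecksum {

    static boolean isValid(byte[] buffer, int offset) {
        ByteBuffer bb = ByteBuffer.wrap(buffer).order(ByteOrder.LITTLE_ENDIAN);
        int sum = 0;
        for (int i = 0; i < 4; i++)
            sum += Short.toUnsignedInt(bb.getShort(offset + 1 + 2 * i)); // payload words
        int reported = Short.toUnsignedInt(bb.getShort(offset + 9));     // checksum word
        return (sum & 0xFFFF) == reported;                               // compare the low 16 bits
    }

    public static void main(String[] args) {
        byte[] packet = { (byte) 0xAA, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x10, 0x14 };
        System.out.println("valid = " + isValid(packet, 0));
        packet[5] ^= 0x01; // corrupt one payload byte
        System.out.println("valid = " + isValid(packet, 0));
    }
}
```

Reading the words with Short.toUnsignedInt keeps checksums of 0x8000 and above from being treated as negative numbers.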

Some important notes:

  • My use case of vectorizing checksums is not ideal. The data is not aligned on word or cache line boundaries, and there is little vectorized work to do. That means the cost of setting things up may eat away at most of the SIMD performance gains.
  • This code is not an example of how to expertly vectorize your algorithms. I'm a beginner at it, and my troubles along the way inspired me to write this article.
  • As we'll also see, some of the vectorizing attempts resulted in slower code -- sometimes massively slower! Part of this is due to the incomplete state of Java's Vector API, and part of it is due to my inexperience.
  • Some of my vectorizing attempts are incomplete and won't catch all checksum failures. I stopped working on some attempts when it became obvious they were slow.
  • A tool like JMH could be used to benchmark the code, but I decided to keep it simple and use timestamps instead. As we will see later on, I verified that the JIT was kicking in, and that it was not optimizing away my code, so I'm not worried about measurement inaccuracy. My measurements also correlate well with real-world observations while developing Telemetry Viewer.
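The two blend steps in verifyChecksumsVectorB are easier to follow with a concrete example. The sketch below models a 32-lane byte vector as a plain byte[] and mimics the documented lane semantics: slice(origin) moves lanes toward lane 0 and zero-fills the end, and blend(v, mask) takes lane i from v where mask lane i is set. It is not Vector API code; the helper names and demo packet contents are hypothetical:

```java
import java.util.Arrays;

class SliceBlendModel {

    // Models slice(origin): lane i receives lane i + origin; the vacated trailing lanes become 0.
    static byte[] slice(byte[] v, int origin) {
        byte[] r = new byte[v.length];
        for (int i = 0; i + origin < v.length; i++)
            r[i] = v[i + origin];
        return r;
    }

    // Models blend(other, mask): lane i comes from 'other' where bit i of maskBits is set,
    // otherwise from 'v' (bit i controls lane i, as with VectorMask.fromLong).
    static byte[] blend(byte[] v, byte[] other, long maskBits) {
        byte[] r = new byte[v.length];
        for (int i = 0; i < v.length; i++)
            r[i] = ((maskBits >>> i) & 1) != 0 ? other[i] : v[i];
        return r;
    }

    // Mirrors the load and the two blend steps at the top of the verifyChecksumsVectorB loop.
    static byte[] compactPayloads(byte[] buffer, int offset) {
        byte[] v = Arrays.copyOfRange(buffer, offset + 1, offset + 33);          // 32 lanes, like fromArray(SPECIES_256, buffer, offset + 1)
        byte[] v2 = blend(v, slice(v, 3), 0b11111111111111111111111100000000L); // firstMask: lanes 8-31
        return blend(v2, slice(v2, 3), 0b00000000111111110000000000000000L);    // secondMask: lanes 16-23
    }

    // Three back-to-back 11-byte packets with easy-to-spot payloads 1-8, 11-18 and 21-28.
    static byte[] demoPackets() {
        byte[] b = new byte[33];
        for (int p = 0; p < 3; p++) {
            b[11 * p] = (byte) 0xAA;                 // sync byte
            for (int k = 1; k <= 8; k++)
                b[11 * p + k] = (byte) (10 * p + k); // payload
            b[11 * p + 9] = (byte) 0xEE;             // placeholder checksum bytes
            b[11 * p + 10] = (byte) 0xEE;
        }
        return b;
    }

    public static void main(String[] args) {
        byte[] lanes = compactPayloads(demoPackets(), 0);
        // Lanes 0-23 now hold the three payloads back to back; lanes 24-31 are leftovers
        System.out.println(Arrays.toString(Arrays.copyOf(lanes, 24)));
    }
}
```

Each slice(3) closes the 3-byte gap (two checksum bytes plus one sync byte) between consecutive payloads, and each mask limits the shift to the lanes that still need it.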

Part 6: Benchmarking the Code on Windows (x86_64)

Let's compile and run the code. We'll create three Run Configurations (for Java 16, Java 17, and a Java 18 nightly). We must also pass a flag to the JRE to enable the Vector API, because incubating features are disabled by default:

Run > Run Configurations
Select "Java Application" then click the "New Launch Configuration" toolbar icon.
Name = "Vector API Test (This PC, Java 16)"
Arguments tab > VM argument = --add-modules=jdk.incubator.vector
JRE tab > Alternate JRE = jdk-16
Apply
With the current run configuration selected, click the "Duplicate" toolbar icon.
Name = "Vector API Test (This PC, Java 17)"
JRE tab > Alternate JRE = jdk-17
Apply
With the current run configuration selected, click the "Duplicate" toolbar icon.
Name = "Vector API Test (This PC, Java 18 Nightly)"
JRE tab > Alternate JRE = jdk-18-nightly
Apply
Close
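Incubator modules are not resolved into the boot layer unless they are requested, which is why the --add-modules flag is needed. If you'd like a program to report a missing flag clearly, a startup check along these lines could run before any Vector API class is touched. This is a minimal sketch; the class and method names are hypothetical:

```java
// Hypothetical helper (not part of the benchmark): reports whether a module was resolved
// into the boot layer at startup. jdk.incubator.vector normally shows up here only when the
// JVM was started with something like --add-modules=jdk.incubator.vector.
class AddModulesCheck {

    static boolean isResolved(String moduleName) {
        return ModuleLayer.boot().findModule(moduleName).isPresent();
    }

    public static void main(String[] args) {
        if (isResolved("jdk.incubator.vector"))
            System.out.println("jdk.incubator.vector is available");
        else
            System.out.println("jdk.incubator.vector is not resolved; add --add-modules=jdk.incubator.vector to the VM arguments");
    }
}
```

Unlike checking the run-time image, this reflects the modules actually available to the running program, so it distinguishes "the JDK has it" from "the flag was passed".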

Expanding the "Java Application" tree in the Launch Configurations tab reveals the three launch configurations. Double-click on each one to run them. On my laptop I get the following results:

Windows 10, x86_64, Adoptium Java 16:
Verifying checksums, scalar code... took 2393.890 ms
Verifying checksums, vectorA code... took 2310.526 ms >>> 3.5% faster than scalar <<<
Verifying checksums, vectorB code... took 11543.454 ms >>> -382.2% faster than scalar <<<
Verifying checksums, vectorC code... took 4459.361 ms >>> -86.3% faster than scalar <<<
Verifying checksums, vectorD code... took 8766.583 ms >>> -266.2% faster than scalar <<<

Windows 10, x86_64, Adoptium Java 17:
Verifying checksums, scalar code... took 2587.480 ms
Verifying checksums, vectorA code... took 2175.599 ms >>> 15.9% faster than scalar <<<
Verifying checksums, vectorB code... took 4009.761 ms >>> -55.0% faster than scalar <<<
Verifying checksums, vectorC code... took 1704.891 ms >>> 34.1% faster than scalar <<<
Verifying checksums, vectorD code... took 8657.405 ms >>> -234.6% faster than scalar <<<

Windows 10, x86_64, Shipilev Java 18 Nightly:
Verifying checksums, scalar code... took 2597.357 ms
Verifying checksums, vectorA code... took 2054.242 ms >>> 20.9% faster than scalar <<<
Verifying checksums, vectorB code... took 4061.849 ms >>> -56.4% faster than scalar <<<
Verifying checksums, vectorC code... took 1716.538 ms >>> 33.9% faster than scalar <<<
Verifying checksums, vectorD code... took 8719.769 ms >>> -235.7% faster than scalar <<<

Testing on only one OS and one architecture has already revealed a lot:

  1. Newer JDK releases have made significant performance improvements.
  2. Curiously, Java 17 and 18 seem to be a little slower when running my scalar code.
  3. Some of my vectorized attempts are still much slower than the scalar code.
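The "% faster than scalar" column in these results is computed as (1 - vectorMs / scalarMs) * 100, so it approaches 100% for very fast code but has no lower bound, which is why slow attempts show up as values like -382.2%. A speedup ratio (scalarMs / vectorMs) may be easier to read: above 1.0 is faster, below 1.0 is slower. A small sketch of both metrics; the class and method names are hypothetical:

```java
class BenchmarkMath {

    // The metric printed by Main: (1 - vectorMs / scalarMs) * 100
    static double percentFaster(double scalarMs, double vectorMs) {
        return (1.0 - vectorMs / scalarMs) * 100;
    }

    // Alternative metric: how many times faster than scalar (1.0 = same speed, below 1.0 = slower)
    static double speedup(double scalarMs, double vectorMs) {
        return scalarMs / vectorMs;
    }

    public static void main(String[] args) {
        // Windows 10, Adoptium Java 16: scalar and vectorB timings from the results above
        double scalarMs = 2393.890, vectorBMs = 11543.454;
        System.out.printf("%.1f%% faster than scalar, speedup = %.2fx%n",
                percentFaster(scalarMs, vectorBMs), speedup(scalarMs, vectorBMs));
    }
}
```

For that vectorB run, the same measurement reads as roughly a 0.21x speedup, i.e. about 4.8 times slower than scalar.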

While trying to figure out my performance issues I found it helpful to skim through the JEPs. JEP 417 (targeted for Java 18) indicates that support for masks will be added soon. My "vectorB" code used masks and ran very slowly, so that would explain it. The code for JEP 417 has not been merged in yet, so the Java 18 nightly build I tried probably doesn't have those improvements. I'll be keeping an eye on this pull request: https://github.com/openjdk/jdk/pull/5873.
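The masked reductions in verifyChecksumsVectorB select lanes by bit position: with VectorMask.fromLong, bit i controls lane i, so 0b000000001111 picks lanes 0-3 (the first packet's four payload shorts). The sketch below models ShortVector.reduceLanes(VectorOperators.ADD, mask) on a plain short[] to show what each mask sums. It is a plain-Java model, not Vector API code; the names and the second and third packets' values are hypothetical:

```java
class MaskedReduceModel {

    // Sums only the lanes whose mask bit is set; the result wraps to 16 bits like a short.
    static short reduceAdd(short[] lanes, long maskBits) {
        short sum = 0;
        for (int i = 0; i < lanes.length; i++)
            if (((maskBits >>> i) & 1) != 0)
                sum += lanes[i]; // compound assignment narrows back to short, i.e. wraps mod 65536
        return sum;
    }

    public static void main(String[] args) {
        // 16 short lanes, like ShortVector.SPECIES_256: three payloads in lanes 0-11, lanes 12-15 unused
        short[] lanes = {
            0x0201, 0x0403, 0x0605, 0x0807, // packet 1 (the example packet from Main)
            1, 2, 3, 4,                     // packet 2 (made-up values)
            100, 200, 300, 400,             // packet 3 (made-up values)
            0, 0, 0, 0
        };
        short sum1 = reduceAdd(lanes, 0b000000001111L);
        short sum2 = reduceAdd(lanes, 0b000011110000L);
        short sum3 = reduceAdd(lanes, 0b111100000000L);
        // Widen as unsigned before comparing with a checksum read from the packet bytes
        System.out.printf("0x%04X %d %d%n", Short.toUnsignedInt(sum1), sum2, sum3);
    }
}
```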

I'm still not sure why "vectorD" was so slow. I'm guessing it would be faster if my data was nicely aligned.
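One thing that sets vectorD apart is that it uses a gathering load: ByteVector.fromArray(species, a, offset, indexMap, mapOffset) fills lane N with a[offset + indexMap[mapOffset + N]], so every lane is fetched from its own computed address instead of one contiguous block. The plain-Java model below shows that behavior. One plausible, unverified explanation for the slowdown is that x86 SIMD instruction sets only provide hardware gather instructions for 32-bit and 64-bit elements, so a byte gather like this may end up as many individual loads; JITWatch is the tool for confirming what the JIT actually emitted. The class and method names are hypothetical:

```java
class GatherModel {

    // Same index map as verifyChecksumsVectorD: the payload bytes of four consecutive 11-byte packets
    static final int[] INDICES = {
         1,  2,  3,  4,  5,  6,  7,  8,
        12, 13, 14, 15, 16, 17, 18, 19,
        23, 24, 25, 26, 27, 28, 29, 30,
        34, 35, 36, 37, 38, 39, 40, 41 };

    // Models fromArray(species, a, offset, indexMap, mapOffset) for a vector with laneCount lanes.
    static byte[] gather(byte[] a, int offset, int[] indexMap, int mapOffset, int laneCount) {
        byte[] lanes = new byte[laneCount];
        for (int n = 0; n < laneCount; n++)
            lanes[n] = a[offset + indexMap[mapOffset + n]]; // one independent indexed load per lane
        return lanes;
    }

    public static void main(String[] args) {
        byte[] a = new byte[88];                      // eight 11-byte packets
        for (int i = 0; i < a.length; i++)
            a[i] = (byte) i;                          // each byte holds its own position
        byte[] lanes = gather(a, 44, INDICES, 0, 32); // second group of four packets
        System.out.println(java.util.Arrays.toString(lanes));
    }
}
```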

Part 7: Benchmarking the Code on a Linux VM (x86_64)

Start by SSH'ing into a Linux VM and downloading the JDKs into ~/java_projects/. The Terminal tab in Eclipse can be used for this:

ssh farrellf@FarrellF-UbuntuVM -i Desktop/id_rsa
$ mkdir java_projects
$ cd java_projects
$ wget https://github.com/adoptium/temurin16-binaries/releases/download/jdk-16.0.2%2B7/OpenJDK16U-jdk_x64_linux_hotspot_16.0.2_7.tar.gz
$ wget https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.1%2B12/OpenJDK17U-jdk_x64_linux_hotspot_17.0.1_12.tar.gz
$ wget https://builds.shipilev.net/openjdk-jdk/openjdk-jdk-linux-x86_64-server-release.tar.xz
$ tar -xvf OpenJDK16U-jdk_x64_linux_hotspot_16.0.2_7.tar.gz
$ tar -xvf OpenJDK17U-jdk_x64_linux_hotspot_17.0.1_12.tar.gz
$ tar -xvf openjdk-jdk-linux-x86_64-server-release.tar.xz
$ exit

Use SCP to copy the code to the VM, then use SSH to run that code on the VM with various JDKs:

scp -i "Desktop/id_rsa" "C:\Users\FarrellF\eclipse-workspace\Vector API Test\src\Main.java" farrellf@FarrellF-UbuntuVM:~/java_projects/
ssh -i "Desktop/id_rsa" farrellf@FarrellF-UbuntuVM "~/java_projects/jdk-16.0.2+7/bin/java --add-modules=jdk.incubator.vector ~/java_projects/Main.java"
ssh -i "Desktop/id_rsa" farrellf@FarrellF-UbuntuVM "~/java_projects/jdk-17.0.1+12/bin/java --add-modules=jdk.incubator.vector ~/java_projects/Main.java"
ssh -i "Desktop/id_rsa" farrellf@FarrellF-UbuntuVM "~/java_projects/jdk/bin/java --add-modules=jdk.incubator.vector ~/java_projects/Main.java"

Copying and pasting the SCP and SSH commands every time you make a change and want to run another test gets annoying. Eclipse's "External Tools Configurations" feature makes it easy to invoke tools outside the IDE. We can use the command line (cmd.exe) as an external tool and have it run SCP and SSH for us:

Run > External Tools > External Tools Configurations
Select "Program" then click the "New Launch Configuration" toolbar icon.
Name = Vector API Test (Linux VM, Java 16)
Location = C:\Windows\System32\cmd.exe
Arguments = /c scp -i "C:/Users/FarrellF/Desktop/id_rsa" "C:/Users/FarrellF/eclipse-workspace/Vector API Test/src/Main.java" farrellf@FarrellF-UbuntuVM:~/java_projects/ && ssh -i "C:/Users/FarrellF/Desktop/id_rsa" farrellf@FarrellF-UbuntuVM "~/java_projects/jdk-16.0.2+7/bin/java --add-modules=jdk.incubator.vector ~/java_projects/Main.java"
Apply
With the current run configuration selected, click the "Duplicate" toolbar icon.
Name = Vector API Test (Linux VM, Java 17)
Arguments = /c scp -i "C:/Users/FarrellF/Desktop/id_rsa" "C:/Users/FarrellF/eclipse-workspace/Vector API Test/src/Main.java" farrellf@FarrellF-UbuntuVM:~/java_projects/ && ssh -i "C:/Users/FarrellF/Desktop/id_rsa" farrellf@FarrellF-UbuntuVM "~/java_projects/jdk-17.0.1+12/bin/java --add-modules=jdk.incubator.vector ~/java_projects/Main.java"
Apply
With the current run configuration selected, click the "Duplicate" toolbar icon.
Name = Vector API Test (Linux VM, Java 18 Nightly)
Arguments = /c scp -i "C:/Users/FarrellF/Desktop/id_rsa" "C:/Users/FarrellF/eclipse-workspace/Vector API Test/src/Main.java" farrellf@FarrellF-UbuntuVM:~/java_projects/ && ssh -i "C:/Users/FarrellF/Desktop/id_rsa" farrellf@FarrellF-UbuntuVM "~/java_projects/jdk/bin/java --add-modules=jdk.incubator.vector ~/java_projects/Main.java"
Apply
Close

Expanding the "Program" tree in the Launch Configurations tab reveals the three external tool configurations. Double-click on each one to run them. With my VM I got the following results:

Linux VM, x86_64, Adoptium Java 16:
Verifying checksums, scalar code... took 2904.334 ms
Verifying checksums, vectorA code... took 2722.829 ms >>> 6.2% faster than scalar <<<
Verifying checksums, vectorB code... took 837828.999 ms >>> -28747.5% faster than scalar <<<
Verifying checksums, vectorC code... took 408426.271 ms >>> -13962.6% faster than scalar <<<
Verifying checksums, vectorD code... took 9764.034 ms >>> -236.2% faster than scalar <<<

Linux VM, x86_64, Adoptium Java 17:
Verifying checksums, scalar code... took 3094.979 ms
Verifying checksums, vectorA code... took 2653.237 ms >>> 14.3% faster than scalar <<<
Verifying checksums, vectorB code... took 5224.525 ms >>> -68.8% faster than scalar <<<
Verifying checksums, vectorC code... took 2106.555 ms >>> 31.9% faster than scalar <<<
Verifying checksums, vectorD code... took 9288.912 ms >>> -200.1% faster than scalar <<<

Linux VM, x86_64, Shipilev Java 18 Nightly:
Verifying checksums, scalar code... took 2979.614 ms
Verifying checksums, vectorA code... took 2355.044 ms >>> 21.0% faster than scalar <<<
Verifying checksums, vectorB code... took 5235.825 ms >>> -75.7% faster than scalar <<<
Verifying checksums, vectorC code... took 2095.325 ms >>> 29.7% faster than scalar <<<
Verifying checksums, vectorD code... took 9172.082 ms >>> -207.8% faster than scalar <<<

As we can see, Java 16 seems to have a bug where some vectorized code is RIDICULOUSLY slow when running in a VM. This also happens when Windows is running in a VM, so it's not specific to Linux VMs.

Part 8: Benchmarking the Code on a Raspberry Pi 4 (Arm32)

Before getting started, I like to change the username on my Pi, the hostname of my Pi, and configure SSH to require key authentication. This is all optional, but here's how to do it if you want to:

ssh pi@raspberrypi
$ sudo adduser farrellf
$ sudo usermod -a -G adm,dialout,cdrom,sudo,audio,video,plugdev,games,users,input,netdev,gpio,i2c,spi farrellf
$ sudo su - farrellf
$ sudo raspi-config
    1 System Options > S4 Hostname > Ok > "FarrellF-Pi4" > Ok
    1 System Options > S5 Boot / Auto Login > B4 Desktop Autologin > Finish > Yes

After the Pi reboots:
ssh farrellf@FarrellF-Pi4
$ sudo deluser --remove-home pi
$ mkdir ~/.ssh
$ exit
scp "C:/Users/FarrellF/Desktop/id_rsa.pub" farrellf@FarrellF-Pi4:~/.ssh/authorized_keys
ssh farrellf@FarrellF-Pi4
$ chmod 700 ~/.ssh/authorized_keys
$ sudo nano /etc/ssh/sshd_config
    Uncomment and edit these lines:
    PubkeyAuthentication yes
    PasswordAuthentication no
    Save the file and exit: Ctrl+O > Enter > Ctrl+X
$ sudo systemctl restart ssh
$ exit

Test SSH login with keys:
ssh farrellf@FarrellF-Pi4 -i Desktop/id_rsa
$ exit

Note that the above commands replaced the "authorized_keys" file, which is fine for a new user. You may want to append to that file instead if your Pi user already has an authorized_keys file.

Downloading and extracting the JDKs is identical to what we did for the Linux VM, but we need to download 32-bit ARM builds instead:

ssh farrellf@FarrellF-Pi4 -i Desktop/id_rsa
$ mkdir java_projects
$ cd java_projects
$ wget https://github.com/adoptium/temurin16-binaries/releases/download/jdk-16.0.2%2B7/OpenJDK16U-jdk_arm_linux_hotspot_16.0.2_7.tar.gz
$ wget https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.1%2B12/OpenJDK17U-jdk_arm_linux_hotspot_17.0.1_12.tar.gz
$ wget https://builds.shipilev.net/openjdk-jdk/openjdk-jdk-linux-arm32-hflt-server-release.tar.xz
$ tar -xvf OpenJDK16U-jdk_arm_linux_hotspot_16.0.2_7.tar.gz
$ tar -xvf OpenJDK17U-jdk_arm_linux_hotspot_17.0.1_12.tar.gz
$ tar -xvf openjdk-jdk-linux-arm32-hflt-server-release.tar.xz
$ exit

Add three more External Tools Configurations, like before:

Run > External Tools > External Tools Configurations
With one of the existing configurations selected, click the "Duplicate" toolbar icon
    Name = Vector API Test (Pi 4, Java 16)
    Arguments = /c scp -i "C:/Users/FarrellF/Desktop/id_rsa" "C:/Users/FarrellF/eclipse-workspace/Vector API Test/src/Main.java" farrellf@FarrellF-Pi4:~/java_projects/ && ssh -i "C:/Users/FarrellF/Desktop/id_rsa" farrellf@FarrellF-Pi4 "~/java_projects/jdk-16.0.2+7/bin/java --add-modules=jdk.incubator.vector ~/java_projects/Main.java"
    Apply
With the current configuration selected, click the "Duplicate" toolbar icon
    Name = Vector API Test (Pi 4, Java 17)
    Arguments = /c scp -i "C:/Users/FarrellF/Desktop/id_rsa" "C:/Users/FarrellF/eclipse-workspace/Vector API Test/src/Main.java" farrellf@FarrellF-Pi4:~/java_projects/ && ssh -i "C:/Users/FarrellF/Desktop/id_rsa" farrellf@FarrellF-Pi4 "~/java_projects/jdk-17.0.1+12/bin/java --add-modules=jdk.incubator.vector ~/java_projects/Main.java"
    Apply
With the current configuration selected, click the "Duplicate" toolbar icon
    Name = Vector API Test (Pi 4, Java 18 Nightly)
    Arguments = /c scp -i "C:/Users/FarrellF/Desktop/id_rsa" "C:/Users/FarrellF/eclipse-workspace/Vector API Test/src/Main.java" farrellf@FarrellF-Pi4:~/java_projects/ && ssh -i "C:/Users/FarrellF/Desktop/id_rsa" farrellf@FarrellF-Pi4 "~/java_projects/jdk/bin/java --add-modules=jdk.incubator.vector ~/java_projects/Main.java"
    Apply
Close

The "Program" tree in the Launch Configurations tab reveals the three additional external tool configurations. Double-click on each one to run them. I got the following results:

Pi 4, Arm32, Adoptium Java 16:
Verifying checksums, scalar code... took 11881.300 ms
Verifying checksums, vectorA code... took 221895.978 ms >>> -1767.6% faster than scalar <<<
Verifying checksums, vectorB code...
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (arm.ad:1028), pid=5840, tid=6345
# Error: ShouldNotReachHere()
#
# JRE version: OpenJDK Runtime Environment Temurin-16.0.2+7 (16.0.2+7) (build 16.0.2+7)
# Java VM: OpenJDK Server VM Temurin-16.0.2+7 (16.0.2+7, mixed mode, g1 gc, linux-arm)
# Problematic frame:
# V [libjvm.so+0xd341c] Matcher::vector_ideal_reg(int)+0x44
...

Pi 4, Arm32, Adoptium Java 17:
Verifying checksums, scalar code... took 11505.220 ms
Verifying checksums, vectorA code...
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0xb3e8b4dc, pid=6606, tid=6607
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.1+12 (17.0.1+12) (build 17.0.1+12)
# Java VM: OpenJDK Server VM Temurin-17.0.1+12 (17.0.1+12, mixed mode, sharing, g1 gc, linux-arm)
# Problematic frame:
# J 582 c2 jdk.incubator.vector.Short64Vector.fromByteArray0([BI)Ljdk/incubator/vector/ShortVector; jdk.incubator.vector@17.0.1 (7 bytes) @ 0xb3e8b4dc [0xb3e8b490+0x0000004c]
...

Pi 4, Arm32-HFLT, Shipilev Java 18 Nightly:
Error: dl failure on line 542
Error: failed /home/farrellf/java_projects/jdk/lib/server/libjvm.so, because /lib/arm-linux-gnueabihf/libm.so.6: version `GLIBC_2.29' not found (required by /home/farrellf/java_projects/jdk/lib/server/libjvm.so)

Well... that was a letdown. Java 16 and 17 crashed, and the Java 18 Nightly build needs a newer version of GLIBC than Raspberry Pi OS comes with. I didn't expect these tests to perform well because the JEPs specifically say they only target x86_64 and AArch64, but I was curious to see how the fallback implementations would perform on Arm32.

Part 8: Benchmarking the Code on a Raspberry Pi 4 (AArch64)

The official Raspberry Pi OS is 32-bit, but they have started to offer a beta AArch64 version. Let's try it out:
https://downloads.raspberrypi.org/raspios_arm64/images/

Like before, I changed my username / hostname / SSH configuration as described in Part 7.

Downloading and extracting the JDKs is identical to what we did in Part 7, but we need to download 64-bit ARM ("AArch64") builds instead:

ssh farrellf@FarrellF-Pi4 -i Desktop/id_rsa
$ mkdir java_projects
$ cd java_projects
$ wget https://github.com/adoptium/temurin16-binaries/releases/download/jdk-16.0.2%2B7/OpenJDK16U-jdk_aarch64_linux_hotspot_16.0.2_7.tar.gz
$ wget https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.1%2B12/OpenJDK17U-jdk_aarch64_linux_hotspot_17.0.1_12.tar.gz
$ wget https://builds.shipilev.net/openjdk-jdk/openjdk-jdk-linux-aarch64-server-release.tar.xz
$ tar -xvf OpenJDK16U-jdk_aarch64_linux_hotspot_16.0.2_7.tar.gz
$ tar -xvf OpenJDK17U-jdk_aarch64_linux_hotspot_17.0.1_12.tar.gz
$ tar -xvf openjdk-jdk-linux-aarch64-server-release.tar.xz
$ exit

I'm using the same Pi as before, just booted from another disk, so there is no need to create more External Tool Configurations. Double-click on each of the existing Pi configurations to run them. I got the following results:

Pi 4, AArch64, Adoptium Java 16:
Verifying checksums, scalar code... took 11517.057 ms
Verifying checksums, vectorA code... took 9384.111 ms >>> 18.5% faster than scalar <<<
Verifying checksums, vectorB code... took 7495015.143 ms >>> -64977.5% faster than scalar <<<
Verifying checksums, vectorC code... took 3282422.142 ms >>> -28400.5% faster than scalar <<<
Verifying checksums, vectorD code... took 273615.500 ms >>> -2275.7% faster than scalar <<<

Pi 4, AArch64, Adoptium Java 17:
Verifying checksums, scalar code... took 11575.545 ms
Verifying checksums, vectorA code... took 9377.791 ms >>> 19.0% faster than scalar <<<
Verifying checksums, vectorB code... took 8032002.942 ms >>> -69287.7% faster than scalar <<<
Verifying checksums, vectorC code... took 3451573.463 ms >>> -29717.8% faster than scalar <<<
Verifying checksums, vectorD code... took 249912.099 ms >>> -2059.0% faster than scalar <<<

Pi 4, AArch64, Shipilev Java 18 Nightly:
Error: dl failure on line 542
Error: failed /home/farrellf/java_projects/jdk/lib/server/libjvm.so, because /lib/aarch64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /home/farrellf/java_projects/jdk/lib/server/libjvm.so)

The GLIBC error is because the Shipilev binaries were built against a newer version of GLIBC than what's used in Raspberry Pi OS. A quick test revealed that the JDK 18 Early Access builds from the OpenJDK project work. But the performance is still horrible:

Pi 4, AArch64, OpenJDK Java 18 EA:
Verifying checksums, scalar code... took 11759.950 ms
Verifying checksums, vectorA code... took 9568.449 ms >>> 18.6% faster than scalar <<<
Verifying checksums, vectorB code... took 8132770.026 ms >>> -69056.5% faster than scalar <<<
Verifying checksums, vectorC code... took 3548230.455 ms >>> -30072.2% faster than scalar <<<
Verifying checksums, vectorD code... took 245422.380 ms >>> -1986.9% faster than scalar <<<
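The "% faster than scalar" figures in these results are consistent with (scalar - vector) / scalar * 100: for example, (11517.057 - 9384.111) / 11517.057 * 100 is about 18.5, and the same formula reproduces the -1767.6% from the Arm32 run. Because a slow run can push that number arbitrarily far below zero, a speedup ratio (scalar / vector) can be easier to read; the 7495015 ms vectorB run took about 651 times as long as the scalar code. A small sketch of both calculations (the formula is inferred from the printed numbers, not copied from the benchmark's source, and `BenchmarkMath` is a hypothetical name):

```java
import java.util.Locale;

// Reproduces the benchmark's "% faster than scalar" figure (formula inferred from the printed
// results) and adds a plain speedup ratio for comparison.
class BenchmarkMath {

    /** Percentage by which the vector run beat the scalar run; negative means it was slower. */
    static double percentFaster(double scalarMs, double vectorMs) {
        return (scalarMs - vectorMs) / scalarMs * 100.0;
    }

    /** How many times faster the vector run was; values below 1.0 mean it was slower. */
    static double speedup(double scalarMs, double vectorMs) {
        return scalarMs / vectorMs;
    }

    /** Both numbers in one line; Locale.ROOT keeps the decimal point a '.' everywhere. */
    static String describe(double scalarMs, double vectorMs) {
        return String.format(Locale.ROOT, "%.1f%% faster than scalar (%.2fx speedup)",
                             percentFaster(scalarMs, vectorMs), speedup(scalarMs, vectorMs));
    }
}
```

For the Java 16 vectorA run, `describe(11517.057, 9384.111)` gives "18.5% faster than scalar (1.23x speedup)".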

The Pi 4's Cortex-A72 cores have 128-bit NEON SIMD registers, which explains why my code that requested 256-bit vectors performed so poorly. This is why the API lets you obtain a "preferred" vector size instead of hardcoding it. I'm still surprised at how poorly the API's fallback implementations perform.
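With the real API, the usual pattern is to take the preferred species (for example `ShortVector.SPECIES_PREFERRED`), step through the array `species.length()` elements at a time up to `species.loopBound(array.length)`, and finish the leftover elements with scalar code. Because the incubator module needs `--add-modules` just to compile, here is a plain-Java model of that loop shape rather than real Vector API code. The lane count comes from an assumed register width (128 bits for the Pi 4's NEON registers, 256 bits for AVX2), the inner loop only stands in for a single vector instruction, and `StripMinedSum` is a hypothetical name:

```java
// Plain-Java model of a Vector API loop; no jdk.incubator.vector types are used here.
class StripMinedSum {

    /** Lanes that fit in one register, e.g. a 128-bit register holds 8 16-bit shorts. */
    static int lanes(int registerBits, int elementBits) {
        return registerBits / elementBits;
    }

    /** Largest multiple of laneCount that is <= length, mirroring what VectorSpecies.loopBound() returns. */
    static int loopBound(int length, int laneCount) {
        return length - (length % laneCount);
    }

    /** Sums the shorts one "vector" (laneCount elements) at a time, then handles the tail with scalar code. */
    static long sum(short[] data, int laneCount) {
        long[] laneSums = new long[laneCount]; // models a per-lane accumulator register
        int bound = loopBound(data.length, laneCount);
        int i = 0;
        for (; i < bound; i += laneCount)
            for (int lane = 0; lane < laneCount; lane++) // stands in for one vector add across all lanes
                laneSums[lane] += data[i + lane];
        long total = 0;
        for (long laneSum : laneSums) // models reduceLanes(VectorOperators.ADD)
            total += laneSum;
        for (; i < data.length; i++) // scalar tail for the elements that don't fill a whole vector
            total += data[i];
        return total;
    }
}
```

The point of deriving the lane count instead of fixing it is that the same loop works whether the hardware offers 8 or 16 short lanes; only the number of iterations in the tail changes.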

Part 9: Crude CI/CD with Launch Groups

Now that I have some ideas of where to change my code, I'm ready to run more experiments. I could make changes, then double-click each of the nine run configurations to test how they perform... but that would get annoying pretty quickly. For a complex project, you might set up a CI/CD pipeline to automate all of this. For a simple project, Eclipse's "Launch Group" feature keeps things simple: it runs multiple run configurations and external tool configurations for you, either in parallel or sequentially. I'm trying to test performance, so I'll run them sequentially:

Run > Run Configurations > Launch Group > click the "New Launch Configuration" toolbar icon
    Name = "Vector API Test (Run All)"
    Add > Java Application > Vector API Test (This PC, Java 16) > Post Launch Action = Wait until terminated > OK
    Add > Java Application > Vector API Test (This PC, Java 17) > Post Launch Action = Wait until terminated > OK
    Add > Java Application > Vector API Test (This PC, Java 18 Nightly) > Post Launch Action = Wait until terminated > OK
    Add > Program > Vector API Test (Linux VM, Java 16) > Post Launch Action = Wait until terminated > OK
    Add > Program > Vector API Test (Linux VM, Java 17) > Post Launch Action = Wait until terminated > OK
    Add > Program > Vector API Test (Linux VM, Java 18 Nightly) > Post Launch Action = Wait until terminated > OK
    Add > Program > Vector API Test (Pi 4, Java 16) > Post Launch Action = Wait until terminated > OK
    Add > Program > Vector API Test (Pi 4, Java 17) > Post Launch Action = Wait until terminated > OK
    Add > Program > Vector API Test (Pi 4, Java 18 Nightly) > Post Launch Action = Wait until terminated > OK
    Apply
    Close

Double-clicking the newly created Launch Group in the Launch Configurations tab will kick off the whole process. We'll end up with nine consoles, which can be accessed by clicking the console tab's "Display Selected Console" toolbar icon.
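If you ever want the same sequence outside Eclipse, the Launch Group's "Wait until terminated" behaviour boils down to starting one process, waiting for it to exit, and only then starting the next. A minimal sketch with `ProcessBuilder`; the `SequentialRunner` class is hypothetical, and the command lists you pass in would be the java/ssh commands from the configurations above:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Runs each command to completion before starting the next one, similar to a Launch Group
// whose entries all use "Post Launch Action = Wait until terminated".
class SequentialRunner {

    /** Runs the commands one after another and returns their exit codes in order. */
    static List<Integer> runAll(List<List<String>> commands) throws IOException, InterruptedException {
        List<Integer> exitCodes = new ArrayList<>();
        for (List<String> command : commands) {
            Process process = new ProcessBuilder(command)
                    .inheritIO() // print to this console instead of a separate console per run
                    .start();
            exitCodes.add(process.waitFor()); // block until this run has finished
        }
        return exitCodes;
    }
}
```

Unlike Eclipse, which gives every run its own console, `inheritIO()` sends each run's output to the same console, one after another, which is fine for sequential benchmarking.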

Part 10: Looking Under the Hood with JITWatch

It would be nice to confirm whether our code is getting compiled by the JIT. The PrintCompilation JVM flag can be used to see which methods get JIT-compiled:

-XX:+PrintCompilation
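The flag prints a line for each method it compiles (plus lines for some other events, such as deoptimizations), naming the method as `Class::method`, for example `Main::verifyChecksumsVectorA`. If you redirect that output to a file, a few lines of Java can pull out the entries for one method. A minimal sketch; it only does a substring match, because the other columns (timestamp, compile ID, flags, tier) are not guaranteed to keep the same layout across JDK versions, and `CompilationFilter` is a hypothetical name:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.List;
import java.util.stream.Collectors;

// Filters saved -XX:+PrintCompilation output down to the lines that mention one method.
class CompilationFilter {

    static List<String> linesFor(Reader printCompilationOutput, String className, String methodName) throws IOException {
        String needle = className + "::" + methodName;
        try (BufferedReader reader = new BufferedReader(printCompilationOutput)) {
            return reader.lines()
                         // require a space (or end of line) after the name, so "VectorA" doesn't also match "VectorAB"
                         .filter(line -> line.contains(needle + " ") || line.endsWith(needle))
                         .collect(Collectors.toList());
        }
    }
}
```

For example, `linesFor(new FileReader("compilation.txt"), "Main", "verifyChecksumsVectorA")` would list every compilation of that method, including on-stack-replacement compiles, which print an `@ bci` after the name.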

PrintCompilation is useful for a quick check, but often it's more helpful to see the actual disassembly. This is particularly useful when trying out the Vector API, because we can see whether the generated code matches the SIMD instructions we were hoping to invoke. A handful of JVM flags can be used for this:

-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:+PrintAssembly -XX:PrintAssemblyOptions=intel -XX:LogFile=hotspot.log

If you run your code with those flags, you'll end up with lots of text printed to the console and also a log file. Looking carefully reveals that it only printed the raw machine code, not the corresponding assembly. This is because the JDK needs the HSDIS library to disassemble the code, and that library can't be bundled with the JDK due to license conflicts. I was unable to find a precompiled HSDIS DLL for x86_64, but found instructions on how to compile it at https://dropzone.nfshost.com/hsdis/. We need to install Cygwin, download the JDK and Binutils source code, and then compile HSDIS with a special make command. I had problems with Binutils 2.37, but version 2.36.1 worked perfectly:

https://www.cygwin.com/setup-x86_64.exe
    Next > Next > Next > Next > Next > Select a Download Site > Next
    All > Devel > gcc-core > Select the newest version
    All > Devel > make > Select the newest version
    All > Devel > mingw64-x86_64-gcc-core > Select the newest version
    All > Web > wget > Select the newest version
    Next > Next > Finish
Cygwin64 Terminal
$ cd C:/Users/FarrellF/Desktop
$ wget https://ftp.gnu.org/gnu/binutils/binutils-2.36.1.tar.xz
$ tar -xvf binutils-2.36.1.tar.xz
$ wget https://github.com/openjdk/jdk/archive/refs/tags/jdk-17-ga.tar.gz
$ tar -xvf jdk-17-ga.tar.gz
$ cd jdk-jdk-17-ga/src/utils/hsdis/
$ make OS=Linux MINGW=x86_64-w64-mingw32 BINUTILS=../../../../binutils-2.36.1
$ cp build/Linux-amd64/hsdis-amd64.dll ../../../../java_projects/jdk-16.0.2+7/bin/
$ cp build/Linux-amd64/hsdis-amd64.dll ../../../../java_projects/jdk-17.0.1+12/bin/
$ cp build/Linux-amd64/hsdis-amd64.dll ../../../../java_projects/jdk/bin/
$ cd ../../../..
$ rm jdk-17-ga.tar.gz
$ rm jdk-jdk-17-ga/ -rf
$ rm binutils-2.36.1.tar.xz
$ rm binutils-2.36.1/ -rf
$ exit
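To double-check that a JDK will find the library, you can derive the expected file name and look in the folder the `cp` commands above copy it into. A small sketch, assuming the `hsdis-<arch>` naming used above (`hsdis-amd64.dll` on 64-bit Windows) carries over to other platforms with their usual library extensions; HotSpot also searches other folders, and which ones varies between JDK versions, so a `false` result is a hint rather than proof. `HsdisCheck` is a hypothetical name:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper: reports whether an hsdis library sits in a JDK's bin folder.
class HsdisCheck {

    /** e.g. "hsdis-amd64.dll" for ("Windows 10", "amd64") or "hsdis-aarch64.so" for ("Linux", "aarch64"). */
    static String libraryName(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String extension = os.startsWith("windows") ? ".dll" : os.contains("mac") ? ".dylib" : ".so";
        return "hsdis-" + osArch + extension;
    }

    /** Looks for the library (named for the JVM running this code) in jdkHome/bin. */
    static boolean isInstalled(Path jdkHome) {
        String name = libraryName(System.getProperty("os.name"), System.getProperty("os.arch"));
        return Files.exists(jdkHome.resolve("bin").resolve(name));
    }
}
```

Run from a 64-bit Windows JVM, `HsdisCheck.isInstalled(Paths.get("C:/Users/FarrellF/Desktop/java_projects/jdk-17.0.1+12"))` should return true once the `cp` steps have been done.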

Let's create another Run Configuration for collecting that log:

Run > Run Configurations
With the "This PC, Java 17" run configuration selected, click the "Duplicate" toolbar icon
    Name = Vector API Test (This PC, Java 17, Collect JITWatch Log)
    Arguments > VM Arguments = --add-modules=jdk.incubator.vector -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:+PrintAssembly -XX:PrintAssemblyOptions=intel -XX:LogFile=hotspot.log
    Apply
    Close

If you run it, you'll see a massive amount of data printed to the console and a "hotspot.log" file created in the project folder.

JITWatch can be used to process that log file and make it easier to find the information we care about. Download JITWatch and save it in the project folder. Normally you can just double-click the .jar file to run it, but since we are using the incubating Vector API we also have to enable that feature when running JITWatch. Let's create an External Tool Configuration to make it easy:

Download https://github.com/AdoptOpenJDK/jitwatch/releases/download/1.4.2/jitwatch-ui-1.4.2-shaded-win.jar
Save it in the project folder.
In Eclipse, right-click the project > Refresh
Run > External Tools > External Tools Configurations > Click the "New Launch Configuration" toolbar icon
    Name = Run JITWatch
    Location = Browse Filesystem > C:\Users\FarrellF\Desktop\java_projects\jdk-17.0.1+12\bin\java.exe
    Working Directory = Browse Workspace > Select the "Vector API Test" project
    Arguments = --add-modules=jdk.incubator.vector -jar jitwatch-ui-1.4.2-shaded-win.jar
    Apply
    Close

Double-click "Run JITWatch" in the Launch Configurations tab to run the program. After it opens we can select the log file and tell it about our source code. It will parse everything and let us see how our source code corresponds to bytecode and assembly:

Run JITWatch
Open Log > "hotspot.log"
Config
    Source Locations > Add Folder > Go to the "src" project subfolder > Select Folder
    Source Locations > Add JDK Src
    Class Locations > Add Folder > Go to the "bin" project subfolder > Select Folder
    Save
Start
After a few seconds the log will be parsed.
Expand the "(default package)" tree > Main > verifyChecksumsScalar() > check "Mouseover"
Expand the "(default package)" tree > Main > verifyChecksumsVectorA() > check "Mouseover"
Expand the "(default package)" tree > Main > verifyChecksumsVectorB() > check "Mouseover"
Expand the "(default package)" tree > Main > verifyChecksumsVectorC() > check "Mouseover"
Expand the "(default package)" tree > Main > verifyChecksumsVectorD() > check "Mouseover"

The left pane contains Java source code, the center pane contains Java bytecode, and the right pane contains the actual assembly instructions. Hovering over a line of bytecode will reveal a little more information about it. For example, with the vectorized methods we see several green lines of bytecode that were inlined by the VM.

Further reading:

Youtube Video

Telemetry Viewer v0.8

Telemetry Viewer v0.8 Changelog (2021-07-24)

  • Multiple telemetry connections are now possible.
  • Basic triggering was added for time domain charts. It works like an oscilloscope: trigger on a rising edge, falling edge, or both edges. The usual trigger modes are supported: auto, normal and single.
  • The timeline now has a full set of playback controls. You can jump to the beginning, jump to the end, play, pause, and rewind. Playback and rewinding speed can be adjusted from 1x to 8x.
  • Cameras are now managed like regular connections, and exporting them creates standard MKV files. The MKV files can be played back in common movie players like VLC, or they can be imported back into TelemetryViewer. The benefit of playing them with TelemetryViewer is that the timestamps for each frame are displayed on screen.
  • Exporting is much faster now and the exporting process can be canceled.
  • Added support for the RDTech TC66/TC66C USB-C power meters. They are available here: https://amzn.to/3l6QFYD
  • Added a "Statistics Chart" which can calculate and display the minimum/maximum/mean/median/standard deviation/90th percentile. The chart can also be used as a simple numeric display (showing just the current value of a dataset.)
  • Transmitting to UARTs is now supported. Data can be specified in text/hex/binary forms. Data can be sent once or repeatedly, and the data can be bookmarked for later use.
  • "Test Mode" has been renamed to "Demo Mode" to make what it does more obvious. New waveforms were added to help demonstrate trigger functionality.
  • Massive speed improvements were made in the data processing logic, and a "Stress Test Mode" was added to benchmark it. A modern laptop can process and visualize telemetry at speeds over 5Gbps.
  • For binary mode, the sync word is now optional and its value can be specified. Example Java code is also provided for binary mode UDP connections.
  • Notifications are now drawn with OpenGL, resulting in much smoother animations. They now slide into and out of existence. The different notification categories can be enabled or disabled, and their colors can be changed.
  • Replaced the color picker with an easier and simpler design.
  • Benchmarking now profiles every chart on screen instead of just one.
  • Added support for uint32 binary datasets. Note that samples are processed and stored as float32s, so the full range of uint32 samples cannot be perfectly represented.
  • Lots of minor changes to improve the user experience. Some of the textboxes now show units to make things more obvious, and some dropdown boxes were replaced with button groups to save the user a click.
  • Various small bug fixes. See the git commit log for more details.
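The uint32 caveat follows from float32 having a 24-bit significand: every integer up to 2^24 (16,777,216) is stored exactly, but above that, neighbouring integers start to collapse onto the same float value. A short Java illustration (Java has no unsigned int, so `Integer.toUnsignedLong` reads the raw 32 bits as a uint32); this is a sketch of the rounding behaviour, not Telemetry Viewer's parsing code:

```java
// Shows which uint32 values survive a round trip through float32 unchanged.
class Uint32AsFloat {

    /** Interprets the raw 32 bits as an unsigned value and stores it the way a float32 dataset would. */
    static float toFloat32(int rawBits) {
        return (float) Integer.toUnsignedLong(rawBits);
    }

    /** True if converting the uint32 to float32 and back gives the original value. */
    static boolean survivesRoundTrip(int rawBits) {
        long original = Integer.toUnsignedLong(rawBits);
        return (long) toFloat32(rawBits) == original;
    }
}
```

`survivesRoundTrip(16_777_216)` is true, `survivesRoundTrip(16_777_217)` is false (the value becomes 16,777,216), and the largest uint32, 4,294,967,295 (raw bits `0xFFFFFFFF`), comes back as 4,294,967,296.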

Java 16 Notes

Java 16 was recently released, and it now strongly encapsulates the JDK's internal APIs by default (JEP 396). The OpenGL library that I use interacts with some of those internal APIs, and an updated version that is compatible with Java 16 has not been released yet. As a workaround, if you use Java 16 you must run the .jar file from the command line with a special flag:

java --illegal-access=permit -jar TelemetryViewer_v0.8.jar

This workaround also applies to older versions of Telemetry Viewer when using Java 16.

Telemetry Viewer v0.8 Demo Video



Download

Executables (.jar) and source code (.zip) can be downloaded at http://www.farrellf.com/TelemetryViewer/ or the project can be viewed at https://github.com/farrellf/TelemetryViewer

Telemetry Viewer v0.7

Telemetry Viewer v0.7 Changelog (2020-07-17)

  • Webcams and network cameras (MJPEG over HTTP) are now supported.
  • Initial support for the Raspberry Pi 4 (currently does not support antialiasing or cameras.)
  • A new "timeline" feature makes it easy to jump or scrub through lots of data.
  • Time domain charts can now show timestamps (date and time) along the x-axis.
  • Bitfield (boolean and enum) "levels" can now be visualized as bars drawn on top of the charts (similar to a logic analyzer.)
  • Timestamps can be shown in any of the common formats: YYYY-MM-DD, MM-DD-YYYY and DD-MM-YYYY.
  • Most of the OpenGL and chart code has been rewritten, resulting in massive speed improvements. CPU and GPU usage is often cut in half. When using Nvidia GPUs the GPU usage has been cut down by almost 80%!
  • Progress bars are now displayed when importing and exporting data.
  • Added support for Java 9+ (still works with Java 8.)
  • Various small bug fixes. See the git commit log for more details.

Raspberry Pi Notes

Telemetry Viewer will only work on the Pi 4. Older Pis don't support some of the OpenGL ES features that are required, and implementing those features on the CPU would be slow.

The Pi 4 GPU is supposedly capable of OpenGL ES 3.2, but the drivers only fully support ES 3.1 and partially support ES 3.2. Telemetry Viewer requires "geometry shaders" which are part of ES 3.2.

As of today, the version of Mesa included in "Ubuntu MATE 20.04 Raspberry Pi 32-bit" supports geometry shaders, but the version of Mesa in "Raspberry Pi OS" does not. If you use Ubuntu, all you need to do is install Java ($ sudo apt install default-jre) and you are ready to use Telemetry Viewer.

If you want to use Raspberry Pi OS, try running Telemetry Viewer. Maybe you'll get lucky and by the time you read this an updated Mesa will already be in Raspberry Pi OS.

If you get GLSL errors (like the screenshot above), you will need to download Mesa from its git repo, then compile and install it. You will also need to set an environment variable every time you want to run Telemetry Viewer. I do not recommend this for beginners, but here is how I got it working:

$ sudo pip3 install meson mako
$ sudo apt install libdrm-dev llvm bison flex libxext-dev libxdamage-dev libxcb-glx0-dev libx11-xcb-dev libxcb-dri2-0-dev libxcb-dri3-dev libxcb-present-dev libxshmfence-dev libxxf86vm-dev libxrandr-dev ninja-build
$ git clone https://gitlab.freedesktop.org/mesa/mesa.git
$ cd mesa
$ nano meson_options.txt
    set platforms to ['drm', 'x11', 'surfaceless']
    and set gallium-drivers to ['kmsro', 'v3d', 'vc4', 'swrast']
$ mkdir build
$ cd build
$ meson ..
$ sudo ninja install

To run Telemetry Viewer you will need to set an environment variable to select the new Mesa you just installed:

$ LD_LIBRARY_PATH="/usr/local/lib/arm-linux-gnueabihf" java -jar /path/to/TelemetryViewer_v0.7.jar

Telemetry Viewer v0.7 Demo Video



Download

Executables (.jar) and source code (.zip) can be downloaded at http://www.farrellf.com/TelemetryViewer/ or the project can be viewed at https://github.com/farrellf/TelemetryViewer

Introduction to JNI with Eclipse, GCC and MSYS2

One of the most common uses for the Java Native Interface (JNI) is to allow Java code to interact with C libraries. In this guide I will show how to write a small Java library that uses JNI to read data from a USB device. I will use the "D2XX" API from FTDI to communicate with a USB device that implements their "Synchronous 245 FIFO" protocol. D2XX is a relatively simple C library, so it makes for a nice introduction to JNI.

Readers not familiar with D2XX might want to skim through my previous blog post: http://www.farrellf.com/projects/software/2020-04-18_FTDI_Sync_245_FIFO_Tutorial__D2XX_with_Visual_Studio_2019/.

Some Useful Links

https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/jniTOC.html
Official JNI specification. This explains how the API works and why it was designed the way it is.

https://www3.ntu.edu.sg/home/ehchua/programming/java/JavaNativeInterface.html
An excellent tutorial on JNI.

Install the IDE, C Compiler and Related Tools

While Visual Studio might be the most popular IDE for Windows, I will be using Eclipse and GCC instead. I'd like to make my library cross-platform in the future, and that will be easier to do with GCC. There are a variety of ways to install GCC and other common UNIX utilities (make, etc.) on Windows; I've settled on MSYS2. For the IDE, I use Eclipse with the JDT (Java Development Tools), CDT (C/C++ Development Tooling), and TM Terminal plug-ins.

  1. Install Java 8.
    https://adoptopenjdk.net/
    The default options in the installer are fine: Next > Accept > Next > Next > Install > Finish
  2. Install Eclipse for Java Developers, CDT and TM Terminal.
    https://www.eclipse.org/downloads/packages/installer
    Select "Eclipse IDE for Java Developers" > Install > Accept Now > Launch
    In Eclipse:
    Check "Use this as the default and do not ask again" > Launch
    Help > Check for Updates
    Help > Eclipse Marketplace > Search for "CDT" > Install > Confirm > Accept > Finish > Restart Now
    Help > Eclipse Marketplace > Search for "TM Terminal" > Install > Confirm > Yes > Accept > Finish > Restart Now
    File > Exit
  3. Install MSYS2 and add it to the PATH environment variable.
    https://www.msys2.org/
    The default options in the installer are fine: Next > Next > Next > Finish
    In MSYS:
    $ pacman -Syu (and close the window when prompted to)
    Start > MSYS2 64bit > MSYS2 MSYS
    $ pacman -Syu
    $ pacman -S base-devel mingw-w64-x86_64-toolchain
    Start > "path" > Edit the System Environment Variables > Environment Variables > Path > Edit > New > "C:\msys64\mingw64\bin" > New > "C:\msys64\usr\bin" > OK > OK > OK

Test Everything with a Hello World

  1. Open Eclipse and Create a New C Project.
    File > New > Project > C/C++ > C Project > Next > Project Name = "HelloWorld", Toolchain = "MinGW GCC" > Next > Finish > No
    File > New > Other > C/C++ > Source File > Next > Source File = "main.c" > Finish
  2. Write the hello world code.
    #include <stdio.h>

    int main(int argc, char** argv) {
        printf("hello world.\r\n");
        return 0;
    }
  3. Verify that it compiles and runs without error.
    Project > Build All
    Run > Run

Create the Java Project

  1. If the C toolchain works, we can proceed to write the Java portion of the project.
    File > New > Java Project > Project Name = "EasyD2XX" > Finish
    File > New > Class > Package = "com.example.easyd2xx", Name = "EasyD2XX" > Finish
  2. Write the Java portion of the JNI code.
    package com.example.easyd2xx;

    import java.nio.ByteBuffer;
    import java.util.List;

    public class EasyD2XX {

        public String name;
        public String chipName;
        public String serialNumber;
        public int location;
        private long handle;

        private EasyD2XX(String name, String chipName, String serialNumber, int location) {
            this.name = name;
            this.chipName = chipName;
            this.serialNumber = serialNumber;
            this.location = location;
        }

        /**
         * Automatically load the EasyD2XX.dll file before any methods of this class are called.
         */
        static {
            try {
                System.loadLibrary("EasyD2XX");
            } catch(UnsatisfiedLinkError e) {
                // do nothing, to allow graceful degradation if the FTDI driver or the EasyD2XX.dll file could not be found.
                // native function calls will now throw an UnsatisfiedLinkError when they are called.
            }
        }

        /**
         * Gets a list of devices.
         *
         * @return    A list of the attached FTDI D2XX devices.
         */
        public static native List<EasyD2XX> getDevices();

        /**
         * Opens and configures the device for Synchronous 245 FIFO mode.
         *
         * @param readTimeoutMilliseconds     Maximum amount of time to wait when reading.
         * @param writeTimeoutMilliseconds    Maximum amount of time to wait when writing.
         * @throws Exception                  If the device could not be opened, or
         *                                    if the device does not support the Synchronous 245 FIFO mode.
         */
        public native void openAsFifo(int readTimeoutMilliseconds, int writeTimeoutMilliseconds) throws Exception;

        /**
         * Reads a series of bytes from the device into a byte[].
         *
         * @param byteCount     How many bytes to read.
         * @return              The received bytes, as a byte[].
         * @throws Exception    If the read timed out, or
         *                      if the device is no longer available.
         */
        public native byte[] read(int byteCount) throws Exception;

        /**
         * Reads a series of bytes from the device into a ByteBuffer.
         *
         * @param buffer        Location to store the read bytes.
         * @param byteCount     How many bytes to read.
         * @throws Exception    If the read times out, or
         *                      if the device is no longer available.
         */
        public native void read(ByteBuffer buffer, int byteCount) throws Exception;

        /**
         * Closes the device.
         *
         * @throws Exception    If the device was not already open.
         */
        public native void close() throws Exception;

    }
  3. Create a test class that can be used as a demo and as verification of proper functionality.
    File > New > Class > Name = Test > Finish
  4. Write the test code.
    package com.example.easyd2xx;

    import java.nio.ByteBuffer;
    import java.util.List;
    import java.util.Scanner;

    public class Test {

        /**
         * A simple test for the EasyD2XX class.
         *
         * @param args    Not currently used.
         */
        public static void main(String[] args) {

            try {

                // get a list of devices
                List<EasyD2XX> devices = EasyD2XX.getDevices();
                if(devices.isEmpty()) {
                    System.out.println("No devices were detected. Exiting.");
                    return;
                }
                System.out.println("Select a device to read from:");
                System.out.println();
                for(int i = 0; i < devices.size(); i++) {
                    System.out.println("Device " + i + ":");
                    EasyD2XX device = devices.get(i);
                    System.out.println("Name: " + device.name);
                    System.out.println("Chip: " + device.chipName);
                    System.out.println("SN: " + device.serialNumber);
                    System.out.println("Location: " + device.location);
                    System.out.println();
                }

                // let the user pick a device
                Scanner stdin = new Scanner(System.in);
                int deviceIndex = stdin.nextInt();
                stdin.close();

                // connect
                EasyD2XX device = devices.get(deviceIndex);
                device.openAsFifo(1000, 1000);

                // read 1GB into a byte[]
                long start = System.currentTimeMillis();
                @SuppressWarnings("unused")
                byte[] oneGbArray = device.read(1073741824);
                long stop = System.currentTimeMillis();
                System.out.println("Read 1GB into a byte[] in " + (stop - start) + "ms.");

                // also read 1GB into a ByteBuffer
                start = System.currentTimeMillis();
                ByteBuffer buffer = ByteBuffer.allocateDirect(1073741824);
                device.read(buffer, 1073741824);
                stop = System.currentTimeMillis();
                System.out.println("Read 1GB into a ByteBuffer in " + (stop - start) + "ms.");

                // disconnect
                device.close();
                System.out.println("Done. Exiting.");

            } catch(Exception | UnsatisfiedLinkError e) {

                System.out.println(e.getMessage() + " Exiting.");
                e.printStackTrace();

            }

        }

    }

At this point the Java code is done and will compile just fine, but at run time you'll get an UnsatisfiedLinkError when any of the "native" methods are called. That is because the native (JNI) code still needs to be written and compiled into a .dll file.
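When that happens, it helps to know exactly which file `System.loadLibrary("EasyD2XX")` is looking for: it maps the name to a platform-specific file name with `System.mapLibraryName` (`EasyD2XX.dll` on Windows, `libEasyD2XX.so` on Linux) and searches the folders listed in the `java.library.path` system property. A small diagnostic sketch that lists those candidate paths; the `NativeLibraryCheck` class is hypothetical and not part of EasyD2XX. Keep in mind that even when EasyD2XX.dll itself is found, loading can still fail if the operating system cannot find the FTDI DLL it depends on, because that lookup is done by Windows rather than by Java:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: where would System.loadLibrary() look for a native library?
class NativeLibraryCheck {

    /** Platform-specific file name, e.g. "EasyD2XX.dll" on Windows or "libEasyD2XX.so" on Linux. */
    static String fileName(String libraryName) {
        return System.mapLibraryName(libraryName);
    }

    /** One full path per java.library.path entry, with "(found)" appended where the file exists. */
    static List<String> candidates(String libraryName) {
        List<String> result = new ArrayList<>();
        String searchPath = System.getProperty("java.library.path", "");
        for (String folder : searchPath.split(File.pathSeparator)) {
            if (folder.isEmpty()) continue;
            File candidate = new File(folder, fileName(libraryName));
            result.add(candidate.getPath() + (candidate.isFile() ? "  (found)" : ""));
        }
        return result;
    }
}
```

Printing `NativeLibraryCheck.candidates("EasyD2XX")` from the Test class, or adding `-Djava.library.path=...` to the run configuration's VM arguments and checking again, shows whether the .dll is where the JVM expects it.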

Add the C Portion of JNI Code to the Project

  1. Add C support to the Java project.
    File > New > Other > C/C++ > Convert to a C/C++ Project (Adds C/C++ Nature) > Next > check the project, choose "C Project", choose "Makefile Project" and "MinGW GCC" > Finish > No
  2. Add a Makefile.
    File > New > File > choose the project root directory, Filename = "makefile" > Finish
  3. Write the makefile with three targets. The first target, "make header", will be used to generate the JNI stubs. The next two targets are just helpers that print out signatures for your Java code ("make signatures") and for any other Java class ("make sig"). Those two targets are optional; I just include them so I don't have to remember the commands to type in. Note that make requires each command line under a target to start with a tab character.
    # "make header" to generate the .h file
    header:
    	mkdir -p jni
    	javac -h jni src/com/example/easyd2xx/EasyD2XX.java
    	rm src/com/example/easyd2xx/EasyD2XX.class

    # "make sig" to ask the user for a class name, then print the field and method signatures for that class
    sig:
    	@bash -c 'read -p "Fully-qualified class name (example: java.util.List) ? " CLASSNAME && javap -s $$CLASSNAME';

    # "make signatures" to print the field and method signatures for the EasyD2XX class
    signatures:
    	javac src/com/example/easyd2xx/EasyD2XX.java -d bin
    	javap -s -p bin/com/example/easyd2xx/EasyD2XX.class
  4. Show the "Build Targets" tab in Eclipse, then add targets for "header" and "signatures".
    Window > Show View > Other > Make > Build Targets > Open
    Select the project folder
    New Build Target > Target Name = "header" > OK
    New Build Target > Target Name = "signatures" > OK
  5. Run the "make header" target to generate the .h file.
    Double-click the "header" target to run it.

There will now be a "jni" subfolder in the project, containing a header file with stubs for each native method.
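The function names in that header follow the JNI specification's naming rules: "Java_", the mangled fully-qualified class name, "_", and the mangled method name, with "__" plus the mangled argument signature appended when a native method is overloaded (as `read` is here). Mangling turns `/` into `_`, `_` into `_1`, `;` into `_2`, `[` into `_3`, and any other non-alphanumeric character into `_0` followed by four lowercase hex digits. A sketch of those rules, handy for checking a hand-written function name; `JniNames` is a hypothetical name:

```java
import java.util.Locale;

// Sketch of the JNI specification's native method name mangling rules.
class JniNames {

    /** Escapes one class name, method name, or descriptor fragment; '.' and '/' both act as package separators. */
    static String mangle(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '.' || c == '/') out.append('_');
            else if (c == '_') out.append("_1");
            else if (c == ';') out.append("_2");
            else if (c == '[') out.append("_3");
            else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) out.append(c);
            else out.append(String.format(Locale.ROOT, "_0%04x", (int) c)); // e.g. '$' becomes _00024
        }
        return out.toString();
    }

    /** Short name, used when the native method is not overloaded. */
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    /** Long name for overloaded methods; argumentDescriptor is the part between the parentheses, e.g. "I". */
    static String longName(String className, String methodName, String argumentDescriptor) {
        return shortName(className, methodName) + "__" + mangle(argumentDescriptor);
    }
}
```

By these rules the two `read` methods should appear in the header as `Java_com_example_easyd2xx_EasyD2XX_read__I` and `Java_com_example_easyd2xx_EasyD2XX_read__Ljava_nio_ByteBuffer_2I`, while `getDevices` keeps the short form used in the C code.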

Unfortunately "make sig" can not be run from the Build Targets panel because Eclipse does not connect stdin when running it. Instead, you can run "make sig" from a terminal. I use the "TM Terminal" plug-in for Eclipse. With that plug-in, just select the project folder, then Ctrl-Alt-T to open a terminal in that folder.
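The signatures printed by `javap -s` are the same JVM type descriptors that the C code passes to `GetMethodID` and `GetFieldID`. If you would rather build one from Java, `java.lang.invoke.MethodType` can produce them (it has been available since Java 7, so it works on the Java 8 used here). A short sketch reproducing descriptors that the C code below uses; `Descriptors` is a hypothetical name:

```java
import java.lang.invoke.MethodType;
import java.nio.ByteBuffer;

// Builds JNI/JVM method descriptors from Java types instead of typing them by hand.
class Descriptors {

    /** Descriptor of the private EasyD2XX(String, String, String, int) constructor (constructors return void). */
    static String easyD2xxConstructor() {
        return MethodType.methodType(void.class, String.class, String.class, String.class, int.class)
                         .toMethodDescriptorString();
    }

    /** Descriptor of ArrayList.add(Object), which returns a boolean ("Z"). */
    static String listAdd() {
        return MethodType.methodType(boolean.class, Object.class).toMethodDescriptorString();
    }

    /** Descriptor of the native read(ByteBuffer, int) method. */
    static String readIntoBuffer() {
        return MethodType.methodType(void.class, ByteBuffer.class, int.class).toMethodDescriptorString();
    }
}
```

The first two return `(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;I)V` and `(Ljava/lang/Object;)Z`, exactly the strings passed to `GetMethodID` in `getDevices`.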

If you open the com_example_easyd2xx_EasyD2XX.h file in Eclipse, you'll get lots of warnings and errors, because Eclipse doesn't know where to find the JNI headers (the "#include <jni.h>" line.)

  1. Add the JNI header folders to the Preprocessor Include Path.
    Project > Properties > C/C++ General > Preprocessor Include Paths, Macros etc. > Entries > GNU C > CDT User Setting Entries > Add
    Select "File System Path"
    Path = C:\Program Files\AdoptOpenJDK\jdk-8.0.252.09-hotspot\include
    Check "treat as built-in"
    Check "contains system headers"
    OK
    Repeat that again, but for path C:\Program Files\AdoptOpenJDK\jdk-8.0.252.09-hotspot\include\win32
    Apply and Close
  2. Copy the D2XX .h and .lib files into the "jni" folder.
    Copy "ftd2xx.h" and "amd64\ftd2xx.lib" from the FTDI ZIP file into the "jni" folder.
    Then in Eclipse: right-click the jni folder > Refresh
  3. Duplicate the com_example_easyd2xx_EasyD2XX.h file, rename the copy to .c, and write the C portion of the JNI code.
    #include "com_example_easyd2xx_EasyD2XX.h"
    #include "ftd2xx.h"
    #include <stdint.h>  // uintptr_t
    #include <stdio.h>   // printf
    #include <stdlib.h>  // malloc, free
    #include <string.h>  // strcmp

    JNIEXPORT jobject JNICALL Java_com_example_easyd2xx_EasyD2XX_getDevices(JNIEnv* env, jclass thisClass) {

        // create an empty ArrayList object
        jclass listClass = (*env)->FindClass(env, "java/util/ArrayList");
        if(listClass == NULL) return NULL;
        jmethodID listConstructor = (*env)->GetMethodID(env, listClass, "<init>", "()V");
        if(listConstructor == NULL) return NULL;
        jobject list = (*env)->NewObject(env, listClass, listConstructor);
        if(list == NULL) return NULL;

        // get a handle for "list.add(object)"
        jmethodID listAddMethodHandle = (*env)->GetMethodID(env, listClass, "add", "(Ljava/lang/Object;)Z");
        if(listAddMethodHandle == NULL) return NULL;

        // get a handle for "new EasyD2XX(name, chipName, serialNumber, location)"
        jmethodID easyD2xxConstructor = (*env)->GetMethodID(env, thisClass, "<init>", "(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;I)V");
        if(easyD2xxConstructor == NULL) return NULL;

        // get the number of devices and build (but not read) the devices info list
        DWORD deviceCount = 0;
        if(FT_CreateDeviceInfoList(&deviceCount) != FT_OK)
            printf("Unable to get the FTDI devices count.\r\n");

        // get the devices info list
        FT_DEVICE_LIST_INFO_NODE *devices = (FT_DEVICE_LIST_INFO_NODE*) malloc(sizeof(FT_DEVICE_LIST_INFO_NODE) * deviceCount);
        if(devices == NULL && deviceCount > 0) return NULL; // out of memory
        if(FT_GetDeviceInfoList(devices, &deviceCount) != FT_OK)
            printf("Unable to get the device info list.\r\n");

        // for each FTDI device, create an EasyD2XX object and add it to the list
        for(DWORD i = 0; i < deviceCount; i++) {
            jstring name = (*env)->NewStringUTF(env, devices[i].Description);
            jstring chipName = devices[i].Type == FT_DEVICE_BM          ? (*env)->NewStringUTF(env, "FT232BM") :
                               devices[i].Type == FT_DEVICE_AM          ? (*env)->NewStringUTF(env, "FT232AM") :
                               devices[i].Type == FT_DEVICE_100AX       ? (*env)->NewStringUTF(env, "100AX") :
                               devices[i].Type == FT_DEVICE_UNKNOWN     ? (*env)->NewStringUTF(env, "[Unknown Device]") :
                               devices[i].Type == FT_DEVICE_2232C       ? (*env)->NewStringUTF(env, "FT2232C") :
                               devices[i].Type == FT_DEVICE_232R        ? (*env)->NewStringUTF(env, "FT232R") :
                               devices[i].Type == FT_DEVICE_2232H       ? (*env)->NewStringUTF(env, "FT2232H") :
                               devices[i].Type == FT_DEVICE_4232H       ? (*env)->NewStringUTF(env, "FT4232H") :
                               devices[i].Type == FT_DEVICE_232H        ? (*env)->NewStringUTF(env, "FT232H") :
                               devices[i].Type == FT_DEVICE_X_SERIES    ? (*env)->NewStringUTF(env, "X Series") :
                               devices[i].Type == FT_DEVICE_4222H_0     ? (*env)->NewStringUTF(env, "FT4222H, 0") :
                               devices[i].Type == FT_DEVICE_4222H_1_2   ? (*env)->NewStringUTF(env, "FT4222H, 1-2") :
                               devices[i].Type == FT_DEVICE_4222H_3     ? (*env)->NewStringUTF(env, "FT4222H, 3") :
                               devices[i].Type == FT_DEVICE_4222_PROG   ? (*env)->NewStringUTF(env, "FT4222, Prog") :
                               devices[i].Type == FT_DEVICE_900         ? (*env)->NewStringUTF(env, "FT900 Series") :
                               devices[i].Type == FT_DEVICE_930         ? (*env)->NewStringUTF(env, "FT930 Series") :
                               devices[i].Type == FT_DEVICE_UMFTPD3A    ? (*env)->NewStringUTF(env, "UMFTPD3A") :
                                                                          (*env)->NewStringUTF(env, "[Unknown Device]");
            jstring serialNumber = (*env)->NewStringUTF(env, devices[i].SerialNumber);
            jint location = devices[i].LocId;
            jobject newEasyD2XXobject = (*env)->NewObject(env, thisClass, easyD2xxConstructor, name, chipName, serialNumber, location);
            if(newEasyD2XXobject == NULL) {
                free(devices);
                return NULL;
            }

            // list.add(newObject)
            (*env)->CallBooleanMethod(env, list, listAddMethodHandle, newEasyD2XXobject);
            if((*env)->ExceptionCheck(env)) {
                free(devices);
                return NULL;
            }

            // release this device's local references so a long device list can't exhaust the local reference table
            (*env)->DeleteLocalRef(env, name);
            (*env)->DeleteLocalRef(env, chipName);
            (*env)->DeleteLocalRef(env, serialNumber);
            (*env)->DeleteLocalRef(env, newEasyD2XXobject);
        }

        // done
        free(devices);
        return list;

    }

    JNIEXPORT void JNICALL Java_com_example_easyd2xx_EasyD2XX_openAsFifo(JNIEnv* env, jobject this, jint readTimeoutMilliseconds, jint writeTimeoutMilliseconds) {

        // get a handle for this class
        jclass thisClass = (*env)->GetObjectClass(env, this);
        if(thisClass == NULL) return;

        // get a handle for the Exception class, in case we need to throw an Exception
        jclass exception = (*env)->FindClass(env, "java/lang/Exception");
        if(exception == NULL) return;

        // check if the device is a FT2232H or FT232H, because only those devices support FIFO mode
        jfieldID chipNameHandle = (*env)->GetFieldID(env, thisClass, "chipName", "Ljava/lang/String;");
        if(chipNameHandle == NULL) return;
        jstring chipName = (*env)->GetObjectField(env, this, chipNameHandle);
        if(chipName == NULL) return;
        const char* chipNameCstring = (*env)->GetStringUTFChars(env, chipName, NULL);
        if(strcmp(chipNameCstring, "FT2232H") != 0 && strcmp(chipNameCstring, "FT232H") != 0) {
            (*env)->ReleaseStringUTFChars(env, chipName, chipNameCstring);
            (*env)->ThrowNew(env, exception, "Device does not support Synchronous 245 FIFO mode.");
            return;
        }
        (*env)->ReleaseStringUTFChars(env, chipName, chipNameCstring);

        FT_HANDLE ftdiHandle = 0;

        // open by location if possible (not possible on linux)
        jfieldID locationHandle = (*env)->GetFieldID(env, thisClass, "location", "I");
        if(locationHandle == NULL) return;
        jint location = (*env)->GetIntField(env, this, locationHandle);
        if(FT_OpenEx((void*)(uintptr_t)location, FT_OPEN_BY_LOCATION, &ftdiHandle) != FT_OK) {

            // open by name if open by location failed
            jfieldID nameHandle = (*env)->GetFieldID(env, thisClass, "name", "Ljava/lang/String;");
            if(nameHandle == NULL) return;
            jstring name = (*env)->GetObjectField(env, this, nameHandle);
            if(name == NULL) return;
            const char* nameCstring = (*env)->GetStringUTFChars(env, name, NULL);
            if(FT_OpenEx((void*)nameCstring, FT_OPEN_BY_DESCRIPTION, &ftdiHandle) != FT_OK) {
                (*env)->ReleaseStringUTFChars(env, name, nameCstring);
                (*env)->ThrowNew(env, exception, "Unable to open the device.");
                return;
            }
            (*env)->ReleaseStringUTFChars(env, name, nameCstring);

        }

        // configure the device
        if(FT_SetBitMode(ftdiHandle, 0xFF, 0x40) != FT_OK ||                 // sync 245 FIFO mode
           FT_SetLatencyTimer(ftdiHandle, 2) != FT_OK ||                     // minimum latency
           FT_SetUSBParameters(ftdiHandle, 65536, 65536) != FT_OK ||         // 64K buffers
           FT_SetFlowControl(ftdiHandle, FT_FLOW_RTS_CTS, 0, 0) != FT_OK ||  // flow control
           FT_Purge(ftdiHandle, FT_PURGE_RX | FT_PURGE_TX) 
!= FT_OK || // flush FIFOs FT_SetTimeouts(ftdiHandle, readTimeoutMilliseconds, writeTimeoutMilliseconds) != FT_OK) { // timeouts // failure (*env)->ThrowNew(env, exception, "Unable to configure the device."); return; } else { // success jfieldID ftdiHandleHandle = (*env)->GetFieldID(env, thisClass, "handle", "J"); (*env)->SetLongField(env, this, ftdiHandleHandle, (uintptr_t) ftdiHandle); } } JNIEXPORT jbyteArray JNICALL Java_com_example_easyd2xx_EasyD2XX_read__I(JNIEnv* env, jobject this, jint byteCount) { // get a handle for this class jclass thisClass = (*env)->GetObjectClass(env, this); if(thisClass == NULL) return NULL; // get a handle for the Exception class, in case we need to throw an Exception jclass exception = (*env)->FindClass(env, "java/lang/Exception"); if(exception == NULL) return NULL; // get the value of "this.handle" jfieldID ftdiHandleHandle = (*env)->GetFieldID(env, thisClass, "handle", "J"); if(ftdiHandleHandle == NULL) return NULL; FT_HANDLE ftdiHandle = (FT_HANDLE) (uintptr_t) (*env)->GetLongField(env, this, ftdiHandleHandle); // create a new byte[] jbyteArray array = (*env)->NewByteArray(env, byteCount); if(array == NULL) return NULL; jbyte* buffer = (*env)->GetByteArrayElements(env, array, NULL); // read into the byte[] jint bytesRead = 0; while(byteCount > 0) { jint amount = (byteCount < 65536) ? 
byteCount : 65536; DWORD readAmount = 0; if(FT_Read(ftdiHandle, &buffer[bytesRead], amount, &readAmount) != FT_OK) { (*env)->ReleaseByteArrayElements(env, array, buffer, 0); (*env)->ThrowNew(env, exception, "Unable to read from the device."); return array; } bytesRead += readAmount; byteCount -= readAmount; } // done (*env)->ReleaseByteArrayElements(env, array, buffer, 0); return array; } JNIEXPORT void JNICALL Java_com_example_easyd2xx_EasyD2XX_read__Ljava_nio_ByteBuffer_2I(JNIEnv* env, jobject this, jobject buffer, jint byteCount) { // get a handle for this class jclass thisClass = (*env)->GetObjectClass(env, this); if(thisClass == NULL) return; // get a handle for the Exception class, in case we need to throw an Exception jclass exception = (*env)->FindClass(env, "java/lang/Exception"); if(exception == NULL) return; // get the value of "this.handle" jfieldID ftdiHandleHandle = (*env)->GetFieldID(env, thisClass, "handle", "J"); if(ftdiHandleHandle == NULL) return; FT_HANDLE ftdiHandle = (FT_HANDLE) (uintptr_t) (*env)->GetLongField(env, this, ftdiHandleHandle); // get the buffer and ensure it is big enough void* bufferPtr = (*env)->GetDirectBufferAddress(env, buffer); if(bufferPtr == NULL) return; jlong bufferSize = (*env)->GetDirectBufferCapacity(env, buffer); if(bufferSize < byteCount) { (*env)->ThrowNew(env, exception, "The buffer does not have enough space."); return; } // read into the ByteBuffer jint bytesRead = 0; while(byteCount > 0) { jint amount = (byteCount < 65536) ? 
byteCount : 65536; DWORD readAmount = 0; if(FT_Read(ftdiHandle, &((char*)bufferPtr)[bytesRead], amount, &readAmount) != FT_OK) { (*env)->ThrowNew(env, exception, "Unable to read from the device."); return; } bytesRead += readAmount; byteCount -= readAmount; } } JNIEXPORT void JNICALL Java_com_example_easyd2xx_EasyD2XX_close(JNIEnv* env, jobject this) { // get a handle for this class jclass thisClass = (*env)->GetObjectClass(env, this); if(thisClass == NULL) return; // get a handle for the Exception class, in case we need to throw an Exception jclass exception = (*env)->FindClass(env, "java/lang/Exception"); if(exception == NULL) return; // get the value of "this.handle" jfieldID ftdiHandleHandle = (*env)->GetFieldID(env, thisClass, "handle", "J"); if(ftdiHandleHandle == NULL) return; // close the device FT_HANDLE ftdiHandle = (FT_HANDLE) (uintptr_t) (*env)->GetLongField(env, this, ftdiHandleHandle); if(FT_Close(ftdiHandle) != FT_OK) { (*env)->ThrowNew(env, exception, "Unable to close the device."); return; } }
  4. Add a "dll" target to the makefile. It will be used to compile the JNI portion of the project into a .dll file. You can also add an "all" target that depends on "dll". Because make builds the first target in the makefile when no target is specified, put "all" at the top if you want a plain "make" to build the DLL. Keep in mind that each command under a target must be indented with a tab character, not spaces.
    all: dll

    dll:
    	gcc jni/com_example_easyd2xx_EasyD2XX.c jni/ftd2xx.lib -I"C:/Program Files/AdoptOpenJDK/jdk-8.0.242.08-hotspot/include" -I"C:/Program Files/AdoptOpenJDK/jdk-8.0.242.08-hotspot/include/win32" -shared -o EasyD2XX.dll
    	file EasyD2XX.dll
    	nm EasyD2XX.dll | grep Java
    	ldd EasyD2XX.dll
  5. In the "Build Targets" tab, add another build target like before, but for "dll". Double-click the target to run it.
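The `nm EasyD2XX.dll | grep Java` command in the dll target is a quick way to confirm that the exported names match what the JVM will look up. Those names follow the JNI mangling rules: "Java_", the class name with '/' turned into '_', a '_' separator, the method name, and, for overloaded methods such as read(), "__" followed by the mangled argument descriptor, where '_' becomes "_1", ';' becomes "_2", and '[' becomes "_3". That is why the two read() functions above are named read__I and read__Ljava_nio_ByteBuffer_2I. Below is a minimal C sketch that builds those names so they can be compared against the nm output; `jni_symbol_name` and its helpers are hypothetical names of mine, and the sketch assumes ASCII identifiers (it skips the "_0xxxx" escape JNI uses for other characters).

```c
#include <stddef.h>
#include <string.h>

/* Appends s verbatim. Returns the new length, or (size_t)-1 on overflow.
   A (size_t)-1 input is passed through so calls can be chained. */
static size_t append_raw(char *out, size_t len, size_t cap, const char *s)
{
    size_t n = strlen(s);
    if (len == (size_t)-1 || len + n + 1 > cap)
        return (size_t)-1;
    memcpy(out + len, s, n + 1); /* includes the terminator */
    return len + n;
}

/* Appends the JNI-mangled form of s (ASCII only, no "_0xxxx" escapes). */
static size_t append_mangled(char *out, size_t len, size_t cap, const char *s)
{
    if (len == (size_t)-1)
        return len;
    for (; *s != '\0'; s++) {
        char single[2] = { *s, '\0' };
        const char *piece = single;    /* letters and digits pass through */
        switch (*s) {
        case '/':
        case '.': piece = "_";  break; /* package separator */
        case '_': piece = "_1"; break;
        case ';': piece = "_2"; break;
        case '[': piece = "_3"; break;
        default:  break;
        }
        size_t n = strlen(piece);
        if (len + n + 1 > cap)
            return (size_t)-1;
        memcpy(out + len, piece, n);
        len += n;
    }
    out[len] = '\0';
    return len;
}

/* Builds the symbol name the JVM looks up for a native method.
   className:     e.g. "com/example/easyd2xx/EasyD2XX" (dots also accepted)
   argDescriptor: text between the parentheses of the method descriptor,
                  e.g. "Ljava/nio/ByteBuffer;I", or NULL if not overloaded.
   Returns 0 on success, -1 if out is too small (contents then unspecified). */
int jni_symbol_name(char *out, size_t cap, const char *className,
                    const char *methodName, const char *argDescriptor)
{
    size_t len = append_raw(out, 0, cap, "Java_");
    len = append_mangled(out, len, cap, className);
    len = append_raw(out, len, cap, "_");
    len = append_mangled(out, len, cap, methodName);
    if (argDescriptor != NULL) {
        len = append_raw(out, len, cap, "__");
        len = append_mangled(out, len, cap, argDescriptor);
    }
    return len == (size_t)-1 ? -1 : 0;
}
```

For example, jni_symbol_name(buf, sizeof buf, "com/example/easyd2xx/EasyD2XX", "read", "I") produces Java_com_example_easyd2xx_EasyD2XX_read__I, which should appear in the nm output.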

Keep in mind that you can compile this without the D2XX drivers installed, and there will be no error messages. But if you run it, Java will not be able to find the D2XX DLL at run time, and you will get an UnsatisfiedLinkError even if the EasyD2XX.dll file is fine. That is part of why I call "ldd" while compiling the DLL. If ldd prints out some lines with "???" then you will have problems at run time. For example, before installing the D2XX driver, I get this:

ldd EasyD2XX.dll
    ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffb6e4a0000)
    KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ffb6cfd0000)
    KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ffb6bf10000)
    ??? => ??? (0x6af80000)
    ??? => ??? (0x7ffb6cdb0000)

After installing the D2XX driver, I get this:

ldd EasyD2XX.dll
    ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffb6e4a0000)
    KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ffb6cfd0000)
    KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ffb6bf10000)
    msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7ffb6cdb0000)
    FTD2XX.dll => /c/WINDOWS/SYSTEM32/FTD2XX.dll (0x180000000)
    SETUPAPI.dll => /c/WINDOWS/System32/SETUPAPI.dll (0x7ffb6c810000)
    cfgmgr32.dll => /c/WINDOWS/System32/cfgmgr32.dll (0x7ffb6c1c0000)
    ucrtbase.dll => /c/WINDOWS/System32/ucrtbase.dll (0x7ffb6c210000)
    RPCRT4.dll => /c/WINDOWS/System32/RPCRT4.dll (0x7ffb6d090000)
    bcrypt.dll => /c/WINDOWS/System32/bcrypt.dll (0x7ffb6bc30000)
    USER32.dll => /c/WINDOWS/System32/USER32.dll (0x7ffb6e2c0000)
    win32u.dll => /c/WINDOWS/System32/win32u.dll (0x7ffb6bd10000)
    GDI32.dll => /c/WINDOWS/System32/GDI32.dll (0x7ffb6e290000)
    gdi32full.dll => /c/WINDOWS/System32/gdi32full.dll (0x7ffb6c310000)
    msvcp_win.dll => /c/WINDOWS/System32/msvcp_win.dll (0x7ffb6c4b0000)
    ADVAPI32.dll => /c/WINDOWS/System32/ADVAPI32.dll (0x7ffb6d230000)
    sechost.dll => /c/WINDOWS/System32/sechost.dll (0x7ffb6c560000)
    IMM32.DLL => /c/WINDOWS/System32/IMM32.DLL (0x7ffb6d480000)

You can now run the Java code with Run > Run, and it should print out a list of attached FTDI devices. That's all there is to a basic JNI project. You could merge the code into an existing project, or export this project as a Jar file so it can be used as a library for other projects.

Unlike regular Java code, JNI code is platform-dependent. In other words, you have to compile a .dll/.so/.dylib file for each operating system and each architecture that you wish to support. This guide only covered 64-bit Windows, but in my next post I will cover other platforms and show how to bundle everything into a single Jar file.
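For reference, when Java code loads the library with System.loadLibrary("EasyD2XX"), the JVM looks for a platform-specific file name: EasyD2XX.dll on Windows, libEasyD2XX.so on Linux, and libEasyD2XX.dylib on macOS with current JDKs (System.mapLibraryName performs the same mapping). The C sketch below mirrors that naming rule so each per-platform build can be checked against it. The `platform` enum, `native_library_name`, and `current_platform` are hypothetical names of mine, not part of JNI, and treating every non-Windows, non-Apple target as Linux is a simplifying assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tags for the three platforms mentioned above. */
enum platform { PLATFORM_WINDOWS, PLATFORM_LINUX, PLATFORM_MACOS };

/* Writes the file name that System.loadLibrary(base) searches for on
   platform p. Returns the name's length, or -1 if base contains a path
   separator (loadLibrary expects a bare name) or out is too small. */
int native_library_name(char *out, size_t cap, const char *base, enum platform p)
{
    int n;
    if (strchr(base, '/') != NULL || strchr(base, '\\') != NULL)
        return -1;
    switch (p) {
    case PLATFORM_WINDOWS: n = snprintf(out, cap, "%s.dll", base);      break;
    case PLATFORM_LINUX:   n = snprintf(out, cap, "lib%s.so", base);    break;
    case PLATFORM_MACOS:   n = snprintf(out, cap, "lib%s.dylib", base); break;
    default:               return -1;
    }
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* The platform this file is being compiled for, from predefined macros. */
enum platform current_platform(void)
{
#if defined(_WIN32)
    return PLATFORM_WINDOWS;
#elif defined(__APPLE__)
    return PLATFORM_MACOS;
#else
    return PLATFORM_LINUX; /* assumption: any other OS is treated like Linux */
#endif
}
```

The file name alone does not cover the architecture dimension: a 64-bit JVM can only load a 64-bit library, so each OS/architecture pair still needs its own build.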

Finally, a small note for anyone familiar with the D2XX library: FTDI provides their library in both static and dynamic forms. I used the dynamic form because the static version for Windows will not link properly when using GCC. It seems like FTDI only intends for it to be used with the Visual C++ compiler.

YouTube Video

FTDI Synchronous 245 Tutorial: D2XX with Visual Studio 2019

Perhaps the easiest way to get data between an FPGA (or microcontroller) and a PC is with a UART. It works great up to a few megabits per second, but often becomes unreliable if you push it much past that. One easy way to transfer hundreds of megabits per second is to use an FTDI chip that supports their "Synchronous 245 FIFO" protocol. It is very easy for an FPGA to implement, and I have been able to reliably transfer data to my PC at just over 350Mbps.

I used an FT232H on a "UM232H" development board from FTDI. DigiKey sells them for around $22: https://www.digikey.com/product-detail/en/ftdi-future-technology-devices-international-ltd/UM232H/768-1103-ND/

I replaced its headers with some regular 0.1" pin headers because the factory-installed pins do not mate well with jumper wires. Here's what my setup looks like when wired to a Lattice MachXO2 FPGA development board:

Some Useful Links

https://www.ftdichip.com/Support/Documents/DataSheets/Modules/DS_UM232H.pdf
Datasheet for the FTDI development board. It summarizes the features, lists the pinout, specifies how to configure the power pins, and contains the schematic.

https://www.ftdichip.com/Drivers/D2XX.htm
To use the Synchronous 245 FIFO mode, you will need the "D2XX" driver. This might have been installed automatically when you plugged an FTDI device into your PC, but you should download it anyway, because the ZIP file contains the .lib and .h files needed when writing the software.

https://www.ftdichip.com/Support/Documents/ProgramGuides/D2XX_Programmer's_Guide(FT_000071).pdf
The D2XX Programmer's Guide explains how to use their API. It covers all of the data structures and functions that will be used.

https://www.ftdichip.com/Support/Documents/AppNotes/AN232B-03_D2XXDataThroughput.pdf
https://www.ftdichip.com/Support/Documents/AppNotes/AN232B-04_DataLatencyFlow.pdf
Application notes explaining how the buffers and latency timer interact, which you need to understand in order to get the best performance from the system.

https://www.ftdichip.com/Support/Documents/TechnicalNotes/TN_153%20Instructions%20on%20Including%20the%20D2XX%20Driver%20in%20a%20VS%20Express%202013%20Project.pdf

Explains how to set up a Visual Studio project so it can use the D2XX API. This is basically the "hello world" tutorial.

https://www.ftdichip.com/Support/Documents/AppNotes/AN_130_FT2232H_Used_In_FT245%20Synchronous%20FIFO%20Mode.pdf

Explains how to use the Synchronous 245 FIFO mode, from both a hardware and software perspective. It contains timing diagrams, demo code, some advice, etc.

Prepare a Visual Studio 2019 Project

  1. Open Visual Studio 2019, then create a new project:
    File > New > Project > Empty C++ Project > Next > Project Name = "d2xx_test", Location = Desktop > Create
  2. Create the main.cpp file:
    Right-click the project > Add > New Item > C++ File, Name = "main.cpp" > Add
  3. Copy the header file and library file from the D2XX Driver zip file ("CDM v2.12.28 WHQL Certified.zip" or similar) into the project's source code folder:
    From the ZIP file: copy "/Static/amd64/ftd2xx.lib" and "/ftd2xx.h" into "Desktop/d2xx_test/d2xx_test/"
  4. Update the IDE so it knows about those files:
    Drag-n-drop the LIB file onto the Resource Files folder in the Visual Studio Solution Explorer.
    Drag-n-drop the H file onto the Header Files folder in the Visual Studio Solution Explorer.
    Right-click the project > Properties > Configuration = "All Configurations", Platform = "All Platforms" > Configuration Properties >
        C / C++ > Preprocessor > Preprocessor Definitions > click the "V" icon > Edit > type "FTD2XX_STATIC" > OK
        Linker > Input > Additional Dependencies > click the "V" icon > Edit > type "ftd2xx.lib" > OK > OK

Demo Program

By now the Visual Studio project is fully set up, so you can start using the API. Below is a simple demo program I wrote. It reads 1GB of data from the Lattice MachXO2 FPGA.

  1. The program starts by displaying information about each FTDI device that is attached. Keep in mind that not all FTDI devices support the Synchronous 245 FIFO protocol.
  2. If a device with a certain serial number is found, the program will attempt to connect, configure for FIFO mode, and read 1GB of data.
  3. An error message will be displayed if there are any problems, and the program will exit.

Software Source Code (C++)

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "ftd2xx.h"

int main(int argc, char** argv) {

    // check how many FTDI devices are attached to this PC
    DWORD deviceCount = 0;
    if(FT_CreateDeviceInfoList(&deviceCount) != FT_OK) {
        printf("Unable to query devices. Exiting.\r\n");
        return 1;
    }
    if(deviceCount == 0) {
        printf("No FTDI devices found.\r\n");
        return 0;
    }

    // get a list of information about each FTDI device
    FT_DEVICE_LIST_INFO_NODE* deviceInfo = (FT_DEVICE_LIST_INFO_NODE*) malloc(sizeof(FT_DEVICE_LIST_INFO_NODE) * deviceCount);
    if(deviceInfo == NULL || FT_GetDeviceInfoList(deviceInfo, &deviceCount) != FT_OK) {
        printf("Unable to get the list of info. Exiting.\r\n");
        free(deviceInfo);
        return 1;
    }

    // print the list of information
    for(DWORD i = 0; i < deviceCount; i++) {
        printf("Device = %lu\r\n", i);
        printf("Flags = 0x%lX\r\n", deviceInfo[i].Flags);
        printf("Type = 0x%lX\r\n", deviceInfo[i].Type);
        printf("ID = 0x%lX\r\n", deviceInfo[i].ID);
        printf("LocId = 0x%lX\r\n", deviceInfo[i].LocId);
        printf("SN = %s\r\n", deviceInfo[i].SerialNumber);
        printf("Description = %s\r\n", deviceInfo[i].Description);
        printf("Handle = 0x%llX\r\n", (unsigned long long)(uintptr_t) deviceInfo[i].ftHandle);
        printf("\r\n");

        // connect to the device with SN "FT3SSN2O"
        if(strcmp(deviceInfo[i].SerialNumber, "FT3SSN2O") == 0) {
            FT_HANDLE handle;
            if(FT_OpenEx(deviceInfo[i].SerialNumber, FT_OPEN_BY_SERIAL_NUMBER, &handle) != FT_OK) {
                printf("Unable to connect to the device. Exiting.\r\n");
                free(deviceInfo);
                return 1;
            }
            if(FT_SetBitMode(handle, 0xFF, 0x40) != FT_OK ||                 // sync 245 FIFO mode
               FT_SetLatencyTimer(handle, 2) != FT_OK ||                     // minimum latency
               FT_SetUSBParameters(handle, 65536, 65536) != FT_OK ||         // 64K buffers
               FT_SetFlowControl(handle, FT_FLOW_RTS_CTS, 0, 0) != FT_OK ||  // flow control
               FT_Purge(handle, FT_PURGE_RX | FT_PURGE_TX) != FT_OK ||       // flush FIFOs
               FT_SetTimeouts(handle, 1000, 1000) != FT_OK) {                // timeouts
                printf("Unable to configure the device. Exiting.\r\n");
                FT_Close(handle);
                free(deviceInfo);
                return 1;
            }

            // connected and configured successfully
            // read 1GB (16384 x 64KB) of data from the FTDI/FPGA
            char rxBuffer[65536] = { 0 };
            DWORD byteCount = 0;
            clock_t startTime = clock();
            for(int chunk = 0; chunk < 16384; chunk++) {
                if(FT_Read(handle, rxBuffer, 65536, &byteCount) != FT_OK || byteCount != 65536) {
                    printf("Error while reading from the device. Exiting.\r\n");
                    FT_Close(handle);
                    free(deviceInfo);
                    return 1;
                }
            }
            clock_t stopTime = clock();
            double secondsElapsed = (double)(stopTime - startTime) / CLOCKS_PER_SEC;
            double mbps = 8589.934592 / secondsElapsed; // 2^30 bytes = 8589.934592 megabits
            printf("Read 1GB from the FTDI in %0.1f seconds.\r\n", secondsElapsed);
            printf("Average read speed: %0.1f Mbps.\r\n", mbps);

            FT_Close(handle);
            free(deviceInfo);
            return 0;
        }
    }

    free(deviceInfo);
    return 0;
}
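The demo treats any FT_Read call that returns fewer than 65536 bytes as an error. FT_Read can legitimately return fewer bytes than requested (while still returning FT_OK) when the read timeout set with FT_SetTimeouts expires, so the EasyD2XX read() methods in the JNI code above take a different approach and keep calling FT_Read until the requested byte count has arrived. The C sketch below isolates that accumulate-short-reads loop so it can be tested without hardware. `read_chunk_fn`, `read_exactly`, and the fake device are hypothetical stand-ins; in the real program the callback would wrap FT_Read(handle, dst, (DWORD)amount, &bytesReturned) and return nonzero when it does not return FT_OK.

```c
#include <stddef.h>

#define CHUNK_SIZE 65536 /* same 64 KB limit as the read loops above */

/* Hypothetical stand-in for FT_Read: reads up to `amount` bytes into dst,
   stores the number actually read in *got, returns 0 on success or nonzero
   on a hard error. A read timeout is a success with *got < amount. */
typedef int (*read_chunk_fn)(void *ctx, unsigned char *dst, size_t amount, size_t *got);

/* Reads `total` bytes, at most CHUNK_SIZE per call, accumulating short reads.
   Returns the number of bytes actually read, which is less than `total` only
   if the reader reports an error or a call returns no data at all. */
size_t read_exactly(read_chunk_fn read_chunk, void *ctx, unsigned char *dst, size_t total)
{
    size_t done = 0;
    while (done < total) {
        size_t remaining = total - done;
        size_t amount = remaining < CHUNK_SIZE ? remaining : CHUNK_SIZE;
        size_t got = 0;
        if (read_chunk(ctx, dst + done, amount, &got) != 0 || got == 0)
            break; /* hard error, or a timeout that delivered nothing */
        done += got;
    }
    return done;
}

/* Fake device for testing: produces bytes 0, 1, 2, ... (mod 256), at most
   max_per_call per call, until `available` bytes have been sent. */
struct fake_device {
    size_t available;
    size_t max_per_call;
    size_t sent;
    unsigned calls;
};

int fake_read(void *ctx, unsigned char *dst, size_t amount, size_t *got)
{
    struct fake_device *dev = ctx;
    size_t n = amount < dev->max_per_call ? amount : dev->max_per_call;
    if (n > dev->available - dev->sent)
        n = dev->available - dev->sent;
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)(dev->sent + i);
    dev->sent += n;
    dev->calls++;
    *got = n;
    return 0;
}
```

One design difference from the JNI version: this sketch stops after a call that returns no data, so a stalled device cannot keep the caller looping forever; the caller can compare the return value against `total` to detect that case.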

Firmware Source Code (Verilog)

`default_nettype none

module top (
    // ftdi 245 fifo signals
    output reg  [7:0] data,               // [7:0] = pins 1,2,3,4,9,10,11,12
    input  wire       rx_empty,           // pin 13
    input  wire       tx_full,            // pin 14
    output reg        read_n,             // pin 19
    output reg        write_n,            // pin 20
    output reg        send_immediately_n, // pin 21
    input  wire       clock_60mhz,        // pin 27
    output reg        output_enable_n,    // pin 28

    // status leds
    output reg        power_led_n,        // pin 97
    output reg        tx_active_led_n     // pin 107
);

    reg [7:0] counter;

    always @(posedge clock_60mhz) begin
        power_led_n        <= 0;
        output_enable_n    <= 1; // the FTDI never drives the data bus
        send_immediately_n <= 1;
        read_n             <= 1; // this design never reads from the FTDI

        if(!tx_full) begin
            // the FTDI's transmit FIFO has room: send the next counter value
            write_n         <= 0;
            data            <= counter;
            tx_active_led_n <= 0;
            counter         <= counter + 1;
        end else begin
            write_n         <= 1;
            tx_active_led_n <= 1;
        end
    end

endmodule
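Because the firmware sends an 8-bit counter that advances once per byte it writes, the PC should receive 0, 1, 2, ..., 255, 0, 1, ... if no bytes are lost or duplicated along the way. The demo program only measures speed, but checking that sequence is a cheap way to find out whether the link is actually delivering every byte. The C sketch below is a hypothetical helper (`check_counter_stream` and `struct counter_check` are my names) that could be called on each 64KB chunk inside the read loop; it carries the expected next value across calls, and it seeds itself from the first byte because the counter register in the firmware has no reset value.

```c
#include <stddef.h>

/* Checker state carried across chunks; zero-initialize before first use. */
struct counter_check {
    int started;            /* 0 until the first byte has been seen */
    unsigned char expected; /* value the firmware's counter should send next */
    size_t errors;          /* bytes that did not continue the sequence */
};

/* Checks that buf continues the wrapping 8-bit counter sequence.
   After a mismatch the checker resynchronizes on the received byte,
   so a single gap in the stream counts as one error. */
void check_counter_stream(struct counter_check *state,
                          const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (state->started && buf[i] != state->expected)
            state->errors++;
        state->started = 1;
        state->expected = (unsigned char)(buf[i] + 1); /* 255 wraps to 0 */
    }
}
```

In the demo it could be called as check_counter_stream(&state, (const unsigned char*)rxBuffer, byteCount) after each successful FT_Read, with the error count printed at the end.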

Output of Test Run

Device = 0
Flags = 0x2
Type = 0x6
ID = 0x4036010
LocId = 0x262
SN = B
Description = Lattice FTUSB Interface Cable B
Handle = 0x0

Device = 1
Flags = 0x2
Type = 0x6
ID = 0x4036010
LocId = 0x261
SN = A
Description = Lattice FTUSB Interface Cable A
Handle = 0x0

Device = 2
Flags = 0x2
Type = 0x8
ID = 0x4036014
LocId = 0x281
SN = FT3SSN2O
Description = UM232H
Handle = 0x0

Read 1GB from the FTDI in 24.1 seconds.
Average read speed: 356.5 Mbps.
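As a sanity check on the numbers above: the demo reads 16384 chunks of 65536 bytes, which is 2^30 bytes (1 GiB) or 8,589,934,592 bits, so the Mbps figure is 8589.934592 divided by the elapsed seconds. The small C sketch below does the same arithmetic; `demo_total_bytes` and `megabits_per_second` are hypothetical helper names. The elapsed time is printed rounded to 0.1 s, so plugging 24.1 back in gives about 356.4 Mbps rather than exactly the 356.5 that was printed.

```c
#include <stdint.h>

/* Bytes transferred by the demo: 16384 reads of 65536 bytes (2^30 bytes). */
uint64_t demo_total_bytes(void)
{
    return (uint64_t)16384 * 65536;
}

/* Throughput in megabits per second, using 1 Mbit = 1,000,000 bits. */
double megabits_per_second(uint64_t bytes, double seconds)
{
    return (double)bytes * 8.0 / 1e6 / seconds;
}
```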

YouTube Video
