
Nov 09 2020

Human Friendly Error Handling in the IVAAP Data Backend

As the use cases of IVAAP grow, the implementation of the data backend evolves. Past releases of IVAAP have been focused on providing data portals to our customers. Since then, a new use case has appeared where IVAAP is used to validate the injection of data in the cloud. Both use cases have a lot in common, but they differ in the way errors should be handled.

In a portal, when a dataset fails to load, the reason why needs to stay “hidden” to end-users. The inner workings of the portal and its data storage mechanisms should not be exposed as they are irrelevant to the user trying to open a new dataset. When IVAAP is used to validate the results of an injection workflow, many more details about where the data is and how it failed to load need to be communicated. And these details should be expressed in a human friendly way.

To illustrate the difference between a human-friendly message and a non-human-friendly message, let’s take the hypothetical case where a fault file should have been posted as an object in Amazon S3, but the upload part of the ingestion workflow failed for some reason. When trying to open that dataset, the Amazon SDK would report this low-level error: “The specified key does not exist. (Service S3, Status Code: 404, Request ID: XXXXXX)”. In the context of an ingestion workflow, a more human-friendly message would be “This fault dataset is backed by a file that is either missing or inaccessible.”

The IVAAP Data Backend is written in Java. This language has a built-in way to handle errors, so a developer’s first instinct is to use this mechanism to pass human friendly messages back to end users. However, this approach is not as practical as it seems. The Java language doesn’t make a distinction between human-friendly error messages and low-level error messages such as the one sent by the Amazon SDK, meant to be read only by developers. Essentially, to differentiate them, we would need to create a HumanFriendlyException class, and use this class in all places where an error with a human-friendly explanation is available.

This approach is difficult to scale to a large body of code like IVAAP’s. And the IVAAP Data Backend is not just code; it also comes with a large set of third-party libraries that have their own idea of how to communicate errors. To make matters worse, it’s very common for developers to do this:

    try {
        // do something here
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }

This handling wraps the exception, making it difficult for the caller to catch. A “better” implementation would be:

    try {
        // do something here
    } catch (HumanFriendlyException ex) {
        throw ex;
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }

While it is possible to enforce this style for the entirety of IVAAP’s code, you can’t do this for third-party libraries calling IVAAP’s code.
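To make the difference between the two styles concrete, here is a minimal sketch. The HumanFriendlyException name comes from the discussion above, but its shape, the WrappingDemo class, and its methods are assumptions made for illustration; none of this is IVAAP code.

```java
// Hypothetical exception type carrying a message meant for end users.
class HumanFriendlyException extends RuntimeException {
    HumanFriendlyException(String message) {
        super(message);
    }
}

class WrappingDemo {
    // Simulates low-level code that has a human-friendly explanation available.
    static void loadDataset() {
        throw new HumanFriendlyException(
                "This fault dataset is backed by a file that is either missing or inaccessible.");
    }

    // Common style: every exception is wrapped, so the original type is hidden.
    static void wrapEverything() {
        try {
            loadDataset();
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    // "Better" style: human-friendly exceptions are rethrown untouched.
    static void rethrowHumanFriendly() {
        try {
            loadDataset();
        } catch (HumanFriendlyException ex) {
            throw ex;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    // Describes what a caller catching HumanFriendlyException ends up with.
    static String callerSees(Runnable service) {
        try {
            service.run();
            return "no error";
        } catch (HumanFriendlyException ex) {
            return "human friendly: " + ex.getMessage();
        } catch (RuntimeException ex) {
            return "opaque: " + ex.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(callerSees(WrappingDemo::wrapEverything));
        System.out.println(callerSees(WrappingDemo::rethrowHumanFriendly));
    }
}
```

With the first style, the caller only sees a generic RuntimeException and would have to dig through the cause chain to find the human-friendly message. A single library in the call stack that wraps exceptions this way is enough to defeat the scheme.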

Another issue with Java exceptions is that they tend to occur at a low-level, where very little context is known. If a service needs to read a local file, a message “Can’t read file abc.txt” will only be relevant to end users if the primary function of the service call was to read that file. If reading this file was only accessory to the service completion, bubbling up an exception about this file all the way to the end-user will not help.

To provide human-friendly error messages, IVAAP uses a layered approach instead:

High level code that catches exceptions reports these exceptions with a human friendly message to a specific logging system
When exceptions are thrown in low level code, issues that can be expressed in a human friendly way are also reported to that same logging system

With this layered approach where there is a high-level “catch all”, IVAAP is likely to return relevant human friendly errors for most service calls. And the quality of the message improves as more low-level logging is added. This continuous improvement effort is more practical than a pure exception-based architecture because it can be done without having to refactor how/when Java exceptions are thrown or caught.
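The two layers can be sketched as follows. This is only an illustration of the idea: HumanFriendlyLog, readFaultFile, and openDataset are hypothetical names, and the real IVAAP logging system is not shown in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-request collector of human-friendly messages (not the IVAAP API).
class HumanFriendlyLog {
    private final List<String> messages = new ArrayList<>();

    void report(String message) {
        messages.add(message);
    }

    List<String> messages() {
        return new ArrayList<>(messages);
    }
}

class LayeredReportingDemo {
    // Low-level code: it knows the precise cause, reports it, then fails as usual.
    static byte[] readFaultFile(HumanFriendlyLog log) {
        log.report("This fault dataset is backed by a file that is either missing or inaccessible.");
        throw new IllegalStateException("The specified key does not exist.");
    }

    // High-level code: the "catch all" that guarantees at least a generic message.
    static List<String> openDataset(HumanFriendlyLog log) {
        try {
            readFaultFile(log);
        } catch (RuntimeException ex) {
            log.report("The dataset could not be opened.");
        }
        return log.messages();
    }

    public static void main(String[] args) {
        openDataset(new HumanFriendlyLog()).forEach(System.out::println);
    }
}
```

The exception itself still travels through the code unchanged. The human-friendly messages take a separate path, so adding a more precise message later only requires one extra report call in the low-level code.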

To summarize, the architecture of IVAAP avoids using Java exceptions when human-friendly error messages can be communicated. But this is not just an architecture where human-friendly errors use an alternate path to bubble up all the way to the user. It has some reactive elements to it.

For example, if a user calls a backend service to access a dataset, and this dataset fails to load, a 404/Not Found HTTP status code is sent by default with no further details. However, if a human friendly error was issued during the execution of this service, the status code changes to 500/Internal Server Error, and the content of the human friendly message is included in the JSON output of this service. This content is then picked up by the HTML5 client to show to the user. I call this approach “reactive” because unlike a classic logging system, the presence of logs modifies the visible behavior of the service.

With the 2.7 release of IVAAP, we created two categories of human friendly logs. One is connectivity. When a human friendly connectivity log is present, 404/Not Found errors and empty collections are reported with a 500/Internal Server Error HTTP status code. The other is entitlement. When a human friendly entitlement log is present, 404/Not Found errors and empty collections are reported with a 403/Forbidden HTTP status code.
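As a rough sketch of this “reactive” rule, the status code of a response that would otherwise be 404/Not Found could be derived from the categories of human friendly logs issued during the call. The LogCategory and StatusResolver names are hypothetical, and the precedence given to entitlement when both categories are present is an assumption; the article does not say which one wins.

```java
import java.util.EnumSet;
import java.util.Set;

// The two categories of human friendly logs described above.
enum LogCategory { CONNECTIVITY, ENTITLEMENT }

class StatusResolver {
    // Picks the HTTP status for a response that would otherwise be 404/Not Found.
    static int resolveNotFound(Set<LogCategory> categoriesLogged) {
        if (categoriesLogged.contains(LogCategory.ENTITLEMENT)) {
            return 403; // assumption: entitlement takes precedence when both are present
        }
        if (categoriesLogged.contains(LogCategory.CONNECTIVITY)) {
            return 500;
        }
        return 404; // no human friendly log: default behavior
    }

    public static void main(String[] args) {
        System.out.println(resolveNotFound(EnumSet.noneOf(LogCategory.class)));
        System.out.println(resolveNotFound(EnumSet.of(LogCategory.CONNECTIVITY)));
        System.out.println(resolveNotFound(EnumSet.of(LogCategory.ENTITLEMENT)));
    }
}
```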

The overall decision on which error message to show to users belongs to the front-end. Only the front-end knows the full context of the task a user is performing. The error handling in the IVAAP Data Backend provides a sane default that the viewer can elect to use, depending on context. OSDU is one of the deployments where the error handling of the data backend is key to the user experience. The OSDU platform has ingestion workflows outside of IVAAP, and with the error reporting capabilities introduced in 2.7, IVAAP becomes a much more effective tool to QA the results of these workflows.

For more information on INT’s newest platform, IVAAP, please visit int.flywheelstaging.com/products/ivaap/

Nov 06 2020

How to Get the Best Performance out of Your Seismic Web Applications

One of the most challenging data management problems faced in the industry is with seismic files. Some oil and gas companies estimate that they acquire a petabyte of data per day or more. Domain knowledge and specific approaches are required to move, access, and visualize that data.

In this blog post, we will dive deep into the details of modern technology that can be used to achieve a speedup. We will also cover: common challenges around seismic visualization, how INT helps solve these challenges with advanced compression and decompression techniques, how INT uses vectorization to speed up compression, and more.

What Is IVAAP?

IVAAP is a data visualization platform that accelerates the delivery of cloud-enabled geoscience, drilling, and production solutions.

IVAAP Client offers flexible dashboards, 2D & 3D widgets, sessions, and templates
IVAAP Server side connects to multiple data sources, integrates with your workflows, and offers real-time services
IVAAP Admin client manages user access and projects

Server – Client Interaction

The interaction works as follows: the client requests a file list from the server, the server returns the list, the user chooses a file to display, and the server then sends chunks of data that the client displays as they arrive.

Some issues encountered with this scheme include:

Seismic data files are huge in size — they can be hundreds of gigabytes or even terabytes.
Because of the file size, it takes too much time to transfer files via network.
The network can have limited bandwidth.

The goals of this scheme are to:

Speed up file transfer time
Reduce data size for transfer
Add user controls for different network bandwidth

And the solution:

We decided to implement server-side compression and client-side decompression. We also decided to give the client a parameter that we call the acceptable error level, which controls how much error the seismic data compression/decompression process is allowed to introduce.

Taking a closer look at compression and decompression, the original seismic data goes through a sequence of five transformations: AGC, Normalization, Haar Wavelets, Quantization, and Huffman. The result is a compressed file that can be sent to clients over the network. On the client’s side, decompression runs in the opposite direction, from inverse Huffman to inverse AGC. The client does not get the precise, original data; it gets the data as it is after compression and decompression. That’s why we added an acceptable error level for this process: clients don’t always require the original data at full precision. For example, sometimes the client only needs to review the seismic data. Using this acceptable error level, they can control how much data is passed over the network and, of course, speed up the process.
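To give an intuition of how an acceptable error level can trade precision for size, here is a minimal sketch of uniform quantization. It is not INT’s implementation: the article does not describe how the error level is applied, so tying the quantization step to twice the error level is an assumption, and the QuantizationSketch class is hypothetical.

```java
class QuantizationSketch {
    // Maps each sample to the index of the nearest multiple of the step.
    // Assumption: the step is twice the acceptable error.
    static int[] quantize(double[] samples, double maxError) {
        double step = 2.0 * maxError;
        int[] indices = new int[samples.length];
        for (int i = 0; i < samples.length; i++) {
            indices[i] = (int) Math.round(samples[i] / step);
        }
        return indices;
    }

    // Rebuilds approximate samples from the quantization indices.
    static double[] dequantize(int[] indices, double maxError) {
        double step = 2.0 * maxError;
        double[] samples = new double[indices.length];
        for (int i = 0; i < indices.length; i++) {
            samples[i] = indices[i] * step;
        }
        return samples;
    }

    // Largest absolute difference between two arrays of the same length.
    static double worstError(double[] original, double[] restored) {
        double worst = 0.0;
        for (int i = 0; i < original.length; i++) {
            worst = Math.max(worst, Math.abs(original[i] - restored[i]));
        }
        return worst;
    }

    public static void main(String[] args) {
        double[] samples = {0.12, -0.57, 0.98, 0.33};
        double maxError = 0.05;
        double[] restored = dequantize(quantize(samples, maxError), maxError);
        System.out.println("worst error: " + worstError(samples, restored));
    }
}
```

Rounding to the nearest multiple of the step keeps each restored sample within half a step of the original, which is the chosen error level. A larger error level produces fewer distinct indices, which an entropy coder such as Huffman can then encode in fewer bits.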

The resulting scheme looks like this:

The client requests a file list from the server, and the user chooses a file to display. The server then compresses the data and sends it to the client; the client decompresses it and, finally, displays it. This is repeated for each tile to display.

So why not use an existing compression scheme, like GZIP, LZ Deflate, etc.? We tried them, but we found that this type of compression is not as effective as we’d like it to be on our seismic data.

Server-Side Interaction

The primary objective was to speed up the current implementation of compression and decompression on both the server and client side.

The proposal:

Server-side compression is implemented in Java, so we decided to create a C++ implementation of the compression sequence and use a JNI layer to call the native methods. Client-side decompression is implemented in JavaScript, so we decided to create a C++ implementation of decompression and use WebAssembly (WASM) to integrate the C++ code into JS.
We implemented both compression and decompression algorithms in C++, but after comparing the results and performance of C++ and Java, we discovered that C++ was just 1.5 times faster than “warmed up JVM”. That’s why we decided to move on and apply SIMD instructions for further speedup.

Single Instruction Multiple Data (SIMD)

SIMD architecture performs the same operation on multiple data elements in parallel. With scalar operations, processing four values takes four separate calculations. With SIMD operations, a single vector calculation produces the same result.

SIMD benefits:

Allows processing of several data values with one single instruction.
Much faster computation on predefined computation patterns.

SIMD drawbacks:

SIMD operations cannot be used to process multiple data in different ways.
SIMD operations can only be applied to predefined processing patterns with independent data handling.

Normalization: C++ scalar implementation

Normalization: C++ SIMD SSE implementation

Server-Side Speedup Results

There are different types of speedup for different algorithms:

Normalization is 9 times faster than the scalar C++ version
Haar Wavelets is 6 times faster than the scalar C++ version
Huffman has no performance increase (not vectorizable algorithm)

Overall, with the SIMD C++ code, server-side compression is around 3 times faster than the Java version. This was good enough for us, so we decided to move on to the client-side speedup.

Client-Side Speedup

For the client-side speedup, we implemented decompression algorithms in C++ and used WASM to integrate the C++ code in JavaScript.

WebAssembly

WASM is:

A binary executable format that can run in browsers
A low-level virtual machine
A compilation target for high-level languages

WASM is not:

A programming language
Tied to the web (it can also be run outside the web)

Steps to get WASM working:

Compile C/C++ code with Emscripten to obtain a WASM binary
Bind WASM binary to the page using a JavaScript “glue code”
Run app and let the browser instantiate the WASM module, the memory, and the table of references. Once that is done, the WebApp is fully operative.

C++ Code to Integrate (TaperFilter.h/cpp)

Emscripten Bindings

WebAssembly Integration Example

Client-Side Speedup Takeaways:

Emscripten supports the WebAssembly SIMD proposal
Vectorized code will be executed by browsers
The results of vectorization for decompression algorithm are:
- Inv Normalization: 6 times speedup
- Inv Haar Wavelets: 10 times speedup
- Inv Huffman: no performance improvement (not vectorizable)

Overall, client-side decompression with vectorized C++ code was around 6 times faster than the JavaScript version.

For more information on GeoToolkit, please visit int.com/geotoolkit/ or check out our webinar, “How to Get the Best Performance of Your Seismic Web Applications.”

Apr 23 2020

Opening IVAAP to Your Proprietary Data Through the Backend SDK

When doing demos of IVAAP, the wow factor is undeniably its user interface, built on top of GeoToolkit.JS. What users of IVAAP typically don’t see is the part accessing the data itself, the IVAAP backend. When we designed the IVAAP backend, we wanted our customers to be able to extend its functionalities. This is one of the reasons we chose Java for its programming language—customers typically have access to Java programmers.

Java is a well-known, general-purpose programming language, but the IVAAP Backend Software Development Kit (SDK) is typically only discovered during an IVAAP evaluation. In previous articles, I described the Lookup API (How to Empower Developers with à la Carte Deployment in IVAAP Upstream Data Visualization Platform) and the use of scopes (Using Scopes in IVAAP: Smart Caching and Other Benefits for Developers). As the SDK has grown, I thought it would be a good time to review what else this SDK provides.

One Optimized Use Case: Plugging Your Own Data

The most common question that I get is: “I see that you can access a WITSML datasource, a PPDM database. I have my own proprietary store for geoscience data, what do I need to do to make IVAAP visualize the data for my data store?” This is where the SDK comes into play. You do not need to modify IVAAP backend’s code to add your own data. In a nutshell, you just need to write a few Java classes, compile them, and add them to your IVAAP deployment.

The Java classes you write need to meet the Application Programming Interface (API) that the SDK defines. If you are a developer, this answer is not enough: it is the textbook definition of an SDK. What makes the IVAAP Backend SDK efficient for our use case is that you only need to implement the API for the data you have. IVAAP’s built-in data model allows the visualization of maybe 30 different aspects of a well (log curves, deviations, tubing sets, mud logs, raster logs, etc.), but you only need to write classes for the aspects your data store contains. For example, to visualize log curves, regardless of how these curves are stored, you only need to write about a dozen classes for a complete implementation.

The next question I get at this point is: “How do I know what to write?” There is a large amount of documentation available. During the evaluation process, you are granted access to our developers site. This site is a reference used by all INT developers working on the IVAAP backend, whether they are developing IVAAP itself, or creating plugins for customers. It’s a Wiki and gets updated regularly. When I get support questions about the SDK, I typically will write an article in that Wiki and share the link. This is not the only piece of documentation available. There is a classic JavaDoc documentation that details the API in a formal manner. And there is also sample code. We created a sample connector to a SQL database storing well curves, trajectories, well locations and schematics as a practical example of how to use the SDK.

An Extensive Geoscience Data Model to Leverage

Lots of work has been done in IVAAP to facilitate workflows associated with wells, whether they are drilling workflows, production monitoring workflows, or just to manage an inventory. Specifically, IVAAP has a data model to expose the location of wells, log curves, deviation curves, mud logs, schematics, fracking, core images, raster logs, tops and any type of well documentation. Wells are not the only data models that IVAAP includes. Other models exist for seismic data and reservoirs. Several types of surfaces are also supported such as faults, grid surfaces, triangle meshes and seismic horizons.

These data models were built over time based upon the common denominator between models coming from different systems. For example, if you are familiar with WITSML, you will find that the definition of a well log resembles what WITSML provides, but is flexible enough to also support LAS and DLIS files. From a developer perspective, the data model is exposed through the SDK’s API, without making any assumption on how this data is stored. The data model works for data stored in the cloud, on a file system, in a SQL database, and even data exposed only through a web service. While most of IVAAP’s connectors access one form of data store at a time, some connectors mix storage types to combine data from web services and cloud storage. IVAAP’s data model is storage-agnostic, and the services to expose this data model to the HTML5 client are storage-agnostic as well.

IVAAP covers the most common data types found in geoscience. It provides the services to access this data, and the UI to visualize it. When starting an IVAAP development project, most developers should only have to focus on plugging their data, expressing through the SDK’s API how to retrieve this data.

An API to Customize Entitlements

There is one more way that the IVAAP SDK makes the developer experience seamless when plugging a proprietary datastore. Not only does no code have to be written to expose this data to the viewer, but no code has to be written to control who has access to which data. Both aspects are built into the code that will call your implementation. You only have to write the data access layer, and not worry about entitlements or web services. By default, entitlements are based upon the information entered in the IVAAP Administration application.

This separation between data access and entitlements saves development time, but there are cases when a data store controls both data and access to this data. When IVAAP needs to access such an integrated system, the entitlement checks need to be performed by the data access code. The entitlement API allows these checks to be performed at the data level.

The entitlement API is actually very fine-grained. You can customize the behavior of each service to limit access to specific data points. For example, the default behavior of IVAAP is to grant access to all curves of a well when you have been granted access to that well. Depending on your business rules, you might elect to restrict access to specific log curves. The SDK doesn’t force you into an “all or nothing” decision.

An API to Implement Your Own REST Services

Another typical use case is when you need to give access to data that doesn’t belong to the IVAAP built-in data model. In this particular situation, you need to extend IVAAP by adding custom widgets, and ad-hoc web services are needed to expose the relevant data to this widget. There is of course an API for this. External developers use the same API as INT developers to implement web services. INT has developed more than 500 REST services using this API, and external developers benefit from this experience.

Most services are JSON-based, and IVAAP uses the Jackson libraries to create JSON content. To advertise capabilities to the HTML5 client, the IVAAP backend uses HATEOAS links. For example, if the JSON description of a well has a link to the mud logs services, then this well has mud logs. If this link is not present, the HTML5 client understands that this well doesn’t contain mud logs, and will adapt its UI accordingly. If you were to add your own service exposing more data associated with a well, you would typically want to add your own HATEOAS links to the description of wells. Adding HATEOAS links to existing services is possible by plugging so-called Entity classes. You do not need to modify the code of a service to modify its behavior.

IVAAP’s REST services follow the OpenAPI specifications. There is actually a built-in web service whose only purpose is to expose the available services in the classic Swagger format. IVAAP’s SDK uses annotations similar to the Swagger Annotations API. If you are familiar with this API, documenting your own REST services should be a breeze.

Most of the REST services are JSON-based, but sometimes binary streams are used instead for performance reasons. Binary streams are typically used in IVAAP to expose seismic data, but also surfaces. The SDK uses events to implement such streaming services.

An API to Implement Your Own Real Time Feeds

The service API is not limited to REST services. An API is also available to communicate with the IVAAP HTML5 client through websockets. The WebSockets API is typically used to implement real time communications between the client and the server. For example, when a user opens a well, the user interface uses websockets to send a subscription message to the backend, requesting to be notified if this well changes. This enables a whole set of capabilities, such as real time monitoring. This is the API we use to monitor wells from WITSML datasources. The SDK includes an entire set of hooks so that customers can write their own feeds, including subscription, unsubscription and broadcast of messages.

When you write REST services, the container details are abstracted away and you only need to worry about implementing domain-related code. A REST service working in a Tomcat-based development environment will work without any modification in a Play cluster. Likewise, feeds developed with the SDK work seamlessly in both Tomcat and Play. On a developer station, the SDK will use end points from the Servlet API to carry messages. In a Play cluster, the SDK will use ActiveMQ. ActiveMQ provides scalability and reliability features that servlets lack, such as a high rate of messages and reliable delivery of messages. The use of ActiveMQ is transparent to the developers of feeds.

Utilitarian APIs

There is more to the IVAAP SDK than its APIs to access data, write services or customize entitlements. There are a few other APIs worth mentioning. One of them is the API to perform CRS conversions. Its default implementation uses Apache SIS, but the API itself is generic in nature. CRS conversions are often needed in geoscience, for example to visualize datasets on a map, on top of satellite imagery. Years of work have gone into the Apache SIS library, and virtually no work is needed by IVAAP developers to leverage this library when the SDK is used.

There are also APIs to execute code at startup and to query the environment that IVAAP is running on. The Lookup API gives access to the features that are plugged. The DataSource API indicates which data sources are configured to run in the JVM. The Hosted Services API provides an inventory of the external services that an IVAAP instance needs to interact with. A hosted service could be the REST service that evaluates formulas, or the machine learning system that IVAAP feeds its data to.

A “Developer-Friendly” Development Environment

We put a lot of effort into making the development process as simple as possible. Developers with experience with Java Servlets will be at ease with their IVAAP development environment. They will use tools they are familiar with such as Eclipse and Tomcat. A production instance of IVAAP doesn’t use servlets; it uses the Play framework. By following the SDK’s API, it is virtually transparent to developers that their code will be deployed in a cluster.

There are a few instances where awareness of the cluster environment is needed. For example, when caching is involved, you want to make sure that all caches are cleared across all JVMs when data gets updated. The IVAAP SDK includes an API to send and receive cluster events, and to create your own events. Since events are serialized from/to JSON, instances in the cluster do not need to share the same build version to interact with each other. This was a deliberate design choice so that you can upgrade your cluster while it’s running, without service interruption.

Caching is a large topic, outside of the scope of this article. IVAAP’s SDK provides a “DistributedStore” API that hides the complexity of sharing state across JVMs. As long as you use this API, code that caches data will work without any modification in both a single-JVM development environment and a multiple-JVM production environment.

Finally, the SDK’s API is designed to allow fast iterative development. For example, once you have implemented the two classes that define how to list wells in your datastore, you can test them right away with Postman. Earlier I wrote that plugging your own log curves requires about a dozen classes. There is no need to write all twelve to start seeing results. Actually, you do not even need to launch Postman to test your web services: a REST service written with the SDK can be tested with JUnit. This saves time by eliminating the need to launch Tomcat.

When you evaluate IVAAP, you might not have enough time to grasp the depth of the IVAAP SDK. Hopefully, this guide will help you get started.

May 15 2019

5 Simple Techniques to Avoid Bugs While Programming

Programming is an activity that requires a special set of cognitive skills. While the industry has developed processes and tools to ensure the quality of software artifacts, the act of writing code is a craft in itself. Developers pride themselves on the “big picture” results they achieve, but the activity of programming is definitely a humbling experience: It’s easy to introduce bugs, and regardless of whether I catch them right away or later in the pipeline, I hate to be reminded I am inherently flawed and have introduced a defect. For this article, I will focus on simple methods to avoid bugs, not before or after you write code, but while you write that code.

Leverage the Hints from the IDE

This one should be obvious to most programmers, but the reality is that the tools we use are a bit smarter than what we give them credit for. It’s easy to miss these hints. They are sometimes buried in build logs, drowned in a sea of flags, visually too subtle to attract our much-needed attention. Listen to your Integrated Development Environment (IDE)—it is telling you something.

One of the most obvious signs that something is up with your code is when the IDE tells you that you are not using a variable you introduced earlier in your code. You had some intent for this variable, but that intent got lost while working on some other aspect. There is certainly a bug lurking there.

Another useful alert is when the IDE detects that there is a logical path for a NullPointerException. Maybe the most common case is well taken care of, but there seems to be a road less-travelled to this ubiquitous exception.

The activity of writing code forces you to concentrate on the text you are typing while keeping a stack of other concerns in mind. This cognitive load is already heavy; I like to reduce it by removing ambiguities. The IDE will tell you when you are using the same variable name for an instance variable and a local variable. I don’t need to risk being confused about which one I am manipulating. I’ll follow my digital assistant’s advice and rename the local variable in question.
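Here is a minimal, made-up example of the kind of bug this hint prevents. The Counter class and its method names are hypothetical; the point is only that the shadowed version compiles and silently does nothing useful.

```java
class Counter {
    private int count;

    // Bug: the local variable shadows the field, so the field is never updated.
    void incrementShadowed() {
        int count = this.count;
        count++;
    }

    // Fix: the local variable has its own name, and the field is written back.
    void increment() {
        int next = count + 1;
        count = next;
    }

    int current() {
        return count;
    }

    public static void main(String[] args) {
        Counter counter = new Counter();
        counter.incrementShadowed();
        System.out.println("after shadowed increment: " + counter.current());
        counter.increment();
        System.out.println("after fixed increment: " + counter.current());
    }
}
```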

Break up Your Content

The brain can only keep so much information. And each coding window only shows you about 50 lines. When the code grows, so does the likelihood of bugs, just from not remembering details implemented a few invisible lines away. One simple technique to avoid such troubles is to break up the code. Keep the methods short. Ideally, a well-focused class shouldn’t have more than 200 lines. It’s not always possible, but where it is, you can avoid bugs when each class has only one responsibility. The class’ code itself is easy to review, and its function is easy to remember in other contexts. If you just wrote a large body of code in one class, it’s time to break it up into smaller pieces.

This is nothing new. When the metric of cyclomatic complexity was developed in 1976, its first applications tried to limit the complexity of each module, splitting them into smaller modules. These modules became easier to write, and easier to test, too. If the most likely place for defects is where complexity lies, reducing complexity automatically reduces the rate of defects.

It’s not just the height of your screen that matters when having “too much to code” might be “too much to cope.” The width matters, too. This is actually a classic source of bugs: a line goes beyond what’s visible without scrolling, meaning it’s not being looked at as often as other lines. Code reviewed less is more likely to be incorrect. Long lines are also bug candidates when overly confident IDEs autofill their content over the developer’s watchful eyes. The simple solution is to break up these long lines, for example, inserting carriage returns when they go over 120 characters. The IDE will often show you this limit graphically.

Use the Java Type System

Object-oriented programming is a great way to abstract sequential instructions into relationships and behaviors. The Java type system is quite powerful. When two pieces of code interact, the parameters of this interaction typically need to meet specific type requirements, so the compiler will prevent code being typed from using the wrong object type in the wrong context.

You can use this type checking to your advantage to reduce opportunities for bugs. Writing a program often requires code where mundane objects are being passed around—a collection of names, an Integer identifying a record, etc. If you inadvertently pass the wrong list of Strings, the compiler won’t be any help. Instead of passing List<String> or Integer, create your own objects, such as “NamesCollection” and “RecordId.” By forcing the passing of functional objects instead of generic ones, you will know immediately, as you type, that you just picked up the wrong collection or the wrong number.
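A minimal sketch of this idea, using the two names suggested above. The RecordService class and its describe method are hypothetical and only exist to show a call site.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Wraps an Integer so it cannot be confused with any other number.
final class RecordId {
    private final int value;

    RecordId(int value) {
        this.value = value;
    }

    int value() {
        return value;
    }
}

// Wraps a List<String> so it cannot be confused with any other list of Strings.
final class NamesCollection {
    private final List<String> names;

    NamesCollection(List<String> names) {
        this.names = Collections.unmodifiableList(new ArrayList<>(names));
    }

    List<String> names() {
        return names;
    }
}

class RecordService {
    // Passing a raw Integer or List<String> here is a compile-time error.
    static String describe(RecordId id, NamesCollection owners) {
        return "record " + id.value() + " owned by " + String.join(", ", owners.names());
    }

    public static void main(String[] args) {
        NamesCollection owners = new NamesCollection(Arrays.asList("Ada", "Grace"));
        System.out.println(describe(new RecordId(42), owners));
    }
}
```

The wrappers cost a few lines each, but a call such as describe(42, someOtherListOfStrings) no longer compiles, so the mistake is caught as you type rather than at runtime.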

Don’t Leave Your Code to Chance

I have seen many cases where two (or more) classes have the exact same name. They have different package names, but package names are hidden away at the very beginning of each class that uses another class. It’s a good idea to give classes unique names across your code base. This will make sure there is no ambiguity around which one a developer is using. Murphy’s law is not your friend—you will inevitably pick the wrong one otherwise.

The behavior of HashMaps may also be a source of hard-to-troubleshoot bugs. A HashMap makes no guarantee about the order in which it returns its entries, and this order can change as the map grows. What this means for developers is that you may observe one behavior during development, and a different behavior in production. To avoid this kind of head-scratching riddle, I have learned to use LinkedHashMaps even when not strictly needed, as they always return entries in a predictable order (insertion order by default). This is a trick to use wisely: the performance cost of using LinkedHashMaps should be weighed against the risk of having to solve a problem that only happens in production.
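The difference in ordering guarantees can be seen in a small sketch. The IterationOrderDemo class and its keys are made up for illustration; only the HashMap and LinkedHashMap behavior comes from the Java standard library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IterationOrderDemo {
    // Inserts the same three entries, in the same order, into the given map.
    static Map<String, Integer> fill(Map<String, Integer> map) {
        map.put("well", 1);
        map.put("curve", 2);
        map.put("fault", 3);
        return map;
    }

    // Returns the keys in the order the map iterates over them.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // HashMap: the order is unspecified and may differ from the insertion order.
        System.out.println(keysInIterationOrder(fill(new HashMap<>())));
        // LinkedHashMap: the keys come back in insertion order.
        System.out.println(keysInIterationOrder(fill(new LinkedHashMap<>())));
    }
}
```

Code that accidentally depends on the first ordering can pass every test on a developer station and still break later. The LinkedHashMap version removes that source of variation.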

Reuse Existing Code

At the end of the day, there is so much to keep track of that just writing code is doomed to produce bugs. The simplest way to avoid writing bugs is to not write any code at all, and leverage some other battle-tested component instead.

Reusing code is the ultimate simplification technique, but I am not just talking about integrating other people’s work into your own software. The code you already have is the result of many refactorings. It has real-world experience you are missing (or that you forgot). Use the code you already have as your template. If the quality of your template is not what you need, refactor this code, then reuse it. Your program shouldn’t have “two best patterns” to do the same thing.

As I implement a new feature, I typically look at similar code doing similar things: how classes, methods, and variables are (meaningfully) named, which properties and behaviors are exposed, how objects interact with each other, etc. In all the bodies of code I have maintained for several years, I have found that consistency is the easiest constraint to follow, to teach, and the one that brings the most benefits in terms of overall quality.