The Trading Mesh

In-Memory We Trust?

Fri, 31 Aug 2012 03:16:53 GMT           

Tips for Picking the Right IMDS for your Trading Application

 

In-memory database systems (IMDSs) have taken off in recent years, driven by declining RAM costs and the need for speed across many applications and industries.

 

While it might be assumed that all solutions in this IT category are cut from the same cloth, in truth there are differences, and degrees of “in-memory-ness.” If you are considering an IMDS, it is important to carefully assess your requirements and match these against the capabilities of today’s offerings. 

 

The good news is that there are many fine products, from a range of vendors, to choose from. However, not all may be ideal for your application needs. This article explores IMDSs for high frequency trading, where it is critical to get the lowest latency, along with a high level of determinism (i.e. the database system must be predictably fast).  It discusses what to look for, and suggests the minimum requirements that a true IMDS – one that performs best for trading – should have.

 

Will the Real IMDS Please Stand Up?

 

In-memory database systems can be a key tool for minimizing latency while harnessing the growing volumes of data flowing through capital markets IT. IMDSs offer the features of traditional (i.e., file system-based) DBMSs—including transactions, multi-user concurrency control, and high level data definition and querying languages—but with a key difference: IMDSs store records in main memory, eliminating latency-inducing file I/O.

 

Disk-based DBMSs impose additional overhead including cache management, data transfer, and more. Working with data in main memory eliminates these bottlenecks, and IMDSs’ streamlined design can reduce system hardware requirements.

 

Vendors claiming to offer IMDSs have proliferated. The IMDS page on Wikipedia now lists about 45 such products; in July 2007 that number was eight!

 

However, many of these entries are simply on-disk DBMSs equipped with memory-based analogs of features that would otherwise exist in a file system. And in tweaking their technology to better wear the IMDS label, some vendors have eliminated important capabilities. The result is often a stripped down database system, one that has limited functionality when used in-memory, and with at least some of the latency-inducing characteristics remaining.

 

The problem is, while they may exploit cheap and abundant memory, these retrofits are not true in-memory database systems. Understanding the distinction is critical as it can affect the hardware requirements (and therefore total cost of ownership), performance, time-to-revenue, and ultimately the success or failure of a solution.  

 

In order to tell the difference between real and retrofit IMDSs, two key areas to examine are origins and wholeness.

 

Origins of the IMDS Species: Retrofit or the Real Thing?

 

Absent a complete redesign, when a DBMS designed for disk storage is recast as an IMDS, artifacts of its origins remain. These can inhibit performance and waste system resources.

 

For example, what justification is there for an IMDS to maintain a cache? Traditional DBMSs keep recently used records in RAM, so they can be accessed without I/O. But managing this cache is itself a process that requires substantial memory and CPU cycles, so even a “cache hit” underperforms a true in-memory database.  When you lift the hood of “retrofits,” a surprising number of them still include caching logic that is fully operational despite the entire database system now being in RAM.
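To make the cache overhead concrete, here is a minimal sketch of a least-recently-used cache in Python. It is not any vendor's implementation: the LRUCache class, its bookkeeping_ops counter and the tiny table are all hypothetical, chosen only to show that every lookup, hit or miss, performs housekeeping that a direct in-memory access would not need.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical LRU cache that counts its own housekeeping steps."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.bookkeeping_ops = 0

    def get(self, key, load):
        if key in self.entries:
            # Even a cache hit must update the recency order.
            self.entries.move_to_end(key)
            self.bookkeeping_ops += 1
            return self.entries[key]
        value = load(key)                     # cache miss: fetch from the backing store
        self.entries[key] = value
        self.bookkeeping_ops += 1
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
            self.bookkeeping_ops += 1
        return value


# The whole "database" is already in RAM, so the cache adds work and saves nothing.
table = {i: f"record-{i}" for i in range(4)}
cache = LRUCache(capacity=2)
for key in [0, 1, 0, 2, 0]:
    assert cache.get(key, table.__getitem__) == table[key]
```

Five lookups cost six housekeeping steps in this sketch, while reading table[key] directly costs none. A production cache does considerably more (latching, dirty-page tracking), so this is likely an understatement.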

 

Another such artifact is on-disk database system architectures’ requirement that data be transferred numerous times as it is used. Figure 1 (below) shows the handoffs required for an application to read a piece of data from an on-disk DBMS, modify it and write that record back to the database. These steps, which require time and CPU cycles, are still present in nearly all the DBMSs that have been repackaged as IMDSs.

 

 

In contrast, a true IMDS transfers data just once (in each direction). Data is copied directly from the IMDS to the application, and back from the application to the database, as shown in Figure 2 (below). There are no intermediate copies in a database cache or file system cache.
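The difference in handoffs can be approximated with a short Python sketch that simply counts copies. The hop names below are assumptions made for illustration (they are not taken from the figures), and the record format is invented.

```python
# Hypothetical hop lists; the layer names are assumptions, not taken from the figures.
ON_DISK_READ = ["file system cache", "database cache", "application buffer"]
ON_DISK_WRITE = ["database cache", "file system cache", "storage device"]
IN_MEMORY_READ = ["application buffer"]
IN_MEMORY_WRITE = ["in-memory database"]


def transfer(record, hops):
    """Copy the record once per hop and return (final copy, number of copies)."""
    data = record
    for _hop in hops:
        data = bytearray(data)  # bytearray(...) always allocates a fresh copy
    return data, len(hops)


def round_trip(read_hops, write_hops):
    """Read a record, change its quantity field, write it back; count every copy."""
    stored = bytearray(b"AAPL,500")
    in_app, read_copies = transfer(stored, read_hops)
    in_app = in_app.replace(b"500", b"600")  # the application's modification
    written, write_copies = transfer(in_app, write_hops)
    return bytes(written), read_copies + write_copies


on_disk_result, on_disk_copies = round_trip(ON_DISK_READ, ON_DISK_WRITE)
in_memory_result, in_memory_copies = round_trip(IN_MEMORY_READ, IN_MEMORY_WRITE)
```

Under these assumed paths the on-disk round trip makes six copies and the in-memory round trip makes two, one in each direction, with the same final record in both cases.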

In another example, traditional DBMSs store redundant data (that is, data that is already stored in tables) in their indexes. This is useful for on-disk databases: if sought-after data resides in the index, there is no need to retrieve it from the data file, and I/O is prevented.

 

But when the vendor later deploys this database in RAM and declares it to be an IMDS, the redundant data is typically still present in the indexes, consuming storage space even though the entire table is now in memory. There is no longer any performance advantage from the redundant data—it just wastes memory.

 

In contrast, a database designed from the ground up as an IMDS does not store redundant data in its indexes.
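A rough Python sketch of the two index layouts follows. The rows, field names and values are made up for illustration, and the structures are simplified to sorted lists rather than real B-trees. Note that Python itself stores references rather than copies, so the sketch shows the shape of each layout, not actual memory savings.

```python
# Made-up rows; in a real system these would be records in a table.
rows = [
    {"symbol": "MSFT", "qty": 300},
    {"symbol": "AAPL", "qty": 500},
    {"symbol": "IBM", "qty": 100},
]

# Disk-style index: every entry carries its own copy of the key plus the row position,
# so a lookup can be answered without touching the data file.
copy_index = sorted((row["symbol"], pos) for pos, row in enumerate(rows))

# In-memory-style index: entries hold only row positions; the key is read from the row.
ref_index = sorted(range(len(rows)), key=lambda pos: rows[pos]["symbol"])


def lookup_ref(symbol):
    """Binary search over ref_index, comparing against keys stored in the rows."""
    lo, hi = 0, len(ref_index)
    while lo < hi:
        mid = (lo + hi) // 2
        if rows[ref_index[mid]]["symbol"] < symbol:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(ref_index) and rows[ref_index[lo]]["symbol"] == symbol:
        return rows[ref_index[lo]]
    return None
```

Because the row is already in memory, following the position in ref_index costs a pointer dereference rather than an I/O, which is why the duplicated key in copy_index buys nothing here.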

 

With regard to wholeness, an IMDS is a type of database system, and it should be a complete DBMS. It should not sacrifice functionality just to accommodate in-memory storage of tables or of the entire DBMS. Shortcomings like the inability to support concurrent access, or limiting applications to a single process, are giveaways that an old-style DBMS is being recast as an IMDS.

 

In another example, a complete database system implements transactions that support the ACID (Atomic, Consistent, Isolated and Durable) properties or, in the case of certain so-called “NoSQL” systems, BASE (Basically Available, Soft state, Eventual consistency) properties. This ensures integrity of data – providing anything less makes the product a record manager, not a DBMS. On the other hand, transaction logging, which most DBMSs implement to support database recovery, does require writes to storage (disk or SSD). Some applications can tolerate this overhead, in order to gain recoverability. But an IMDS should also provide the ability to turn off transaction logging and run flat-out in "pure" in-memory mode.
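The trade-off between logged and "pure" in-memory operation can be sketched in a few lines of Python. TinyStore is a hypothetical toy, not a real product's API: it is single-threaded, logs whole transactions as JSON lines, and ignores concurrency and log truncation.

```python
import json
import os
import tempfile


class TinyStore:
    """Hypothetical in-memory key-value store with optional transaction logging."""

    def __init__(self, log_path=None):
        self.data = {}
        # No log path means "pure" in-memory mode: commits never touch storage.
        self._log = open(log_path, "a", encoding="utf-8") if log_path else None

    def commit(self, changes):
        if self._log is not None:
            # Write the transaction to the log and force it to storage first.
            self._log.write(json.dumps(changes) + "\n")
            self._log.flush()
            os.fsync(self._log.fileno())
        self.data.update(changes)

    def close(self):
        if self._log is not None:
            self._log.close()

    @classmethod
    def recover(cls, log_path):
        """Rebuild the in-memory state by replaying the log from the start."""
        store = cls()
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                store.data.update(json.loads(line))
        return store


with tempfile.TemporaryDirectory() as tmp:
    log_file = os.path.join(tmp, "txn.log")
    durable = TinyStore(log_path=log_file)
    durable.commit({"AAPL": 500})
    durable.commit({"AAPL": 600, "IBM": 100})
    durable.close()
    recovered = TinyStore.recover(log_file)

fast = TinyStore()          # logging off: nothing to replay after a crash
fast.commit({"AAPL": 500})
```

The fsync call in commit is where the storage latency enters; the store created without a log path skips it entirely and gives up recoverability in exchange.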

 

The performance gap between these knockoffs and built-from-the-ground-up IMDSs can be significant. You can take an on-disk DBMS and throw it into memory using a RAM-disk for storage, but an IMDS will outperform it, given the same computing task. McObject’s benchmark shows this, documenting an IMDS outperforming a RAM-disk database by 4x for database reads and by an impressive 420x for database writes.

 

Conclusion

 

While retrofit IMDS approaches can be effective for a range of application areas, the high-stakes world of real-time capital markets leaves little room for compromise. When considering an IMDS for latency reduction in trading, look at its roots and confirm that full DBMS functionality is offered. Carefully consider available options, and be wary of vendors’ claims.

 

The degree to which a solution eliminates caching, and redundant data in indexes, for example, will directly affect performance and efficient use of memory. Avoid artificial restrictions that are by-products of an incomplete implementation of in-memory tables. Demand that any solution provide all standard database product features (transactions, replication, BLOB and TEXT columns, etc.), and refuse to tolerate artificial limits on database or transaction size. Test different solutions. Results will differ widely between purported “in-memory database systems,” because some of these solutions were designed to run in-memory, while others were not.
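When testing candidates, it helps to look at the spread of latencies and not just the average, since determinism matters as much as raw speed. The Python harness below is one possible starting point, not a rigorous benchmark: the function names are hypothetical, the percentile is a simple rank lookup, and the operation passed in would be a call into whatever database API is under evaluation.

```python
import time


def measure_latencies(operation, iterations=10_000):
    """Time each call to operation() and return the samples in nanoseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        operation()
        samples.append(time.perf_counter_ns() - start)
    return samples


def summarize(samples):
    """Report median, 99th percentile and worst case from a list of samples."""
    ordered = sorted(samples)

    def percentile(p):
        # Integer arithmetic avoids floating-point rounding in the rank.
        index = min(len(ordered) - 1, (p * len(ordered)) // 100)
        return ordered[index]

    return {
        "median_ns": percentile(50),
        "p99_ns": percentile(99),
        "max_ns": ordered[-1],
    }


# Placeholder workload: a dictionary read stands in for a database lookup.
table = {"AAPL": 500}
report = summarize(measure_latencies(lambda: table["AAPL"], iterations=1_000))
```

A large gap between the median and the 99th percentile or maximum is a sign of unpredictable latency, whatever the average looks like.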

 

 
