Kx Systems and High Frequency Trading
Fri, 10 Jun 2011 09:17:00 GMT
Recently I had the opportunity to meet and chat with Simon Garland, Chief Strategist of Kx Systems, who was over in London on a visit from Switzerland.
The company’s kdb+ database is used extensively in high frequency trading to store, analyze, process and retrieve large data sets at high speed, so I wanted to find out more about the company and its product set.
HFT Review: Simon, can you explain what sets Kx Systems and kdb+ apart and why you guys are experiencing such growth in the HFT space?
Simon Garland: Well, our big claim to fame is that we’re enormously fast, which is why people use us. Another thing that’s not appreciated as much is that we’re “one stop shopping”. As long as the data gets into kdb+ really quickly, it doesn’t matter whether you use our feed handlers or feed handlers from other vendors. The data comes off the feed from the exchange as a little table, so you can run SQL queries on it straight away. This, combined with historical data, gives you CEP (complex event processing) on streaming and historical data. You can run complex queries on the data, for example, “is this different from what it was a second ago, a minute ago, a week ago, or a month ago?”
After we’ve analyzed that little table and done the CEP, which can include streaming analysis, the table is catenated on the bottom of the intraday database. It then becomes part of a bigger table and you can run SQL queries on it again, in your in-memory database, on disk, or a combination of both.
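As a rough illustration of the flow Simon describes (query the incoming little table against what is already stored, then append it to the intraday table), here is a minimal Python sketch. It is not kdb+ or q code; the table layout, the `avg_price` and `process_batch` helpers, the symbol `ABC` and the 60-second lookback are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical intraday table: a list of row dicts, oldest first.
intraday = [
    {"time": 0.0, "sym": "ABC", "price": 100.0},
    {"time": 30.0, "sym": "ABC", "price": 100.2},
    {"time": 59.0, "sym": "ABC", "price": 100.1},
]

def avg_price(table, sym, start, end):
    """Average price for sym over rows with start <= time < end, or None if no rows match."""
    prices = [r["price"] for r in table
              if r["sym"] == sym and start <= r["time"] < end]
    return mean(prices) if prices else None

def process_batch(batch, table, lookback=60.0):
    """Compare each incoming row with the recent average, then append the batch."""
    alerts = []
    for row in batch:
        ref = avg_price(table, row["sym"], row["time"] - lookback, row["time"])
        if ref is not None:
            # "Is this different from what it was a minute ago?"
            alerts.append((row["sym"], row["price"] - ref))
    table.extend(batch)  # the little table joins the bottom of the big one
    return alerts

# Hypothetical batch arriving from a feed handler.
batch = [{"time": 60.0, "sym": "ABC", "price": 101.1}]
alerts = process_batch(batch, intraday)
```

After the call, `alerts` holds the deviation of each new price from the previous minute's average, and the new rows can be queried as part of `intraday` like any others.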
With the new machines that are now coming on to the market, where high-end commodity models offer terabytes of memory, an in-memory database on just one machine can, for example, contain all the intra-day Level 2 market data you need. You don’t have to store sections or subsets of data on different servers and try to keep them all working together. It’s just “the table” and you can easily run queries over all of it.
HFTR: Can you tell me more about q?
SG: q is Kx’s proprietary language. It’s a powerful, concise and elegant array language, which means that a production system could just be a single page of code, not pages and pages of code and a nightmare to maintain. Clearly there is an initial investment in learning it, but the power it gives you to manipulate streaming, real-time and historical data makes that initial investment really worthwhile.
HFTR: What are some of the practical applications of being able to combine both streaming and real-time data together with historical, all in the same database?
SG: The most important thing is the simplicity, which translates into speed and ease of doing analysis. For example, it allows you to do complicated, time-critical analysis, such as pre-trade risk. This means that you are likely to see interesting trading opportunities before those who are using the same off-the-shelf solution as everybody else. So you’re there first and you’re there so early, you can afford to do comprehensive pre-trade risk analysis, and you’re able to look for patterns you’ve seen in the past.
HFTR: More and more firms are talking about putting the pre-trade risk directly on the chip. They’re actually programming, direct onto FPGAs or other chips, the ability to run pre-trade risk checks at the silicon level. Are you guys doing anything specifically chip-related in that way?
SG: Yes and no. We think it’s a little too early to build it in as standard. We’ve provided hooks and some of our more sophisticated users are already using them, but we haven’t built it right in so far because there are no common standards yet. It’s only with the latest generation of GPUs that you’ve been able to do 64-bit floating point calculation. It’s still a bit “Wild West”!
HFTR: That’s an interesting response, because Kx were ahead of the curve with 64-bit, with multithreading and with various other things, so could this be next?
SG: Yes, we’ve made it possible for advanced users to go in and do it themselves, and we do have clients already making use of this in production. And there are so many other places to be improving performance at the moment.
HFTR: Such as?
SG: On the latest machines, for example, the number of available cores has increased dramatically, so there is a lot to gain from controlling where code runs. If you’ve got your code running across the cores at random, the overall performance will be considerably worse than if you were to lock the code onto particular cores and sockets. For example, “this process can have eight cores, we’ll force them onto socket number two.”
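To make the idea of locking a process onto particular cores concrete, here is a minimal Python sketch using the Linux scheduler-affinity calls in the standard `os` module. It is only an illustration, not how kdb+ itself is configured. The `pin_to_cores` helper is hypothetical, and the assumption that cores 8 to 15 sit on the second socket is made up for the example; the real mapping depends on the machine.

```python
import os

def pin_to_cores(wanted):
    """Restrict the current process to the requested cores, where supported.

    Returns the set of cores actually applied, or None if the platform
    lacks os.sched_setaffinity (it is Linux-specific) or none of the
    requested cores are available to this process.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # cores this process may use now
    target = set(wanted) & allowed      # drop cores we cannot use
    if not target:
        return None
    os.sched_setaffinity(0, target)     # pid 0 means the calling process
    return target

# Assumed layout for illustration only: cores 8-15 are on socket two.
SOCKET_TWO_CORES = range(8, 16)
applied = pin_to_cores(SOCKET_TWO_CORES)
```

On Linux the same effect is commonly achieved from the command line with tools such as `taskset` or `numactl` when the process is started.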
HFTR: Are you seeing any kind of interesting, innovative and unusual uses of your technology in the HFT space which you can share?
SG: Yes. It’s interesting how quickly things are changing, but our clients tend not to share this information as, at the moment, it’s very much their competitive advantage. There are people offloading calculations onto FPGAs and GPUs far sooner than we expected. Although it’s a lot of work to make sure it’s ready for production, there are thousands of GPUs in production systems being driven from kdb+. In this sort of application the business logic is managed from kdb+ using q, an interpreted language, which makes it easy to inspect what’s going on. You don’t have to go digging into memory on a GPU, for example. It makes systems much more maintainable and much faster to modify.
HFTR: But how easy is it to find kdb+ programmers with this expertise?
SG: I think the problem isn’t really getting kdb+ programmers, it’s the usual challenge of finding good people. Say you’re a bank and you need Java programmers who can write multithreaded, production-quality applications using GPUs. You’re probably only looking at a handful of potential candidates in the bank, and they are in such demand they’re going to be booked out for projects way in advance. And how easy do you think it is to get people who really know how to handle these GPUs and who have been keeping up on the latest SDKs? People who know what’s cool, which machine is good, which models have serious flaws?
HFTR: Are there any particular communities where such people gather online, to discuss these kinds of technologies, particularly in the finance sector?
SG: There are a few, but it’s still a case of, “let’s meet up for a beer this evening”. The community has not got big enough yet to be really established online. People who are good are very good and they know who else is good in that space. The people who are “famous” are typically those who’ve been in and out of the various manufacturers, they know what’s there, what’s in the pipeline, what was good, and what didn’t quite work out.
HFTR: Are you seeing any interesting applications of your technology in specific asset classes, like FX for example?
SG: The world of FX is certainly changing. There’s a significant rise in volumes, so there is much more data to deal with. Also, because of the upheavals in 2008 there was an influx of people into FX, who brought with them very interesting experience from working in other asset classes. For example, they know how to handle huge data volumes, and understand different ways of analyzing data at real-time speeds to produce a significant competitive advantage. There is a lot of potential for expansion, but this is a sensitive area, so our clients tend to keep it to themselves.
HFTR: Where would Kx either compete or sit alongside other CEP solutions?
SG: Well, that’s the thing with this “one stop shopping” idea. CEP packages are often used for data which has just come off the exchange. But they are not complete solutions, because you need to have other applications in the mix, e.g. other databases, which implies that you will have to deal with other data representations and SQL dialects. This approach is fine if you’re handling up to, say, 100 million records or so. But if you’re dealing with billions of records, you don’t want to be taking the data from one application and sending it to another; it would just be too slow. This volume of data takes time to move around, whereas once it’s in kdb+, it’s immediately available to you. Another advantage of kdb+ being so fast is that you can back-test years’ worth of data in just a few minutes; you don’t have to run tests overnight and wait for the results the following morning. What we sometimes see is CEP packages using kdb+ in the background.
HFTR: I understand kdb+ hooks in well with languages like Matlab as well?
SG: Yes, things like Matlab are very useful. We don’t have their breadth of statistical routines, but equally, Matlab couldn’t handle 500 billion records at the sort of speeds our customers require. So you need kdb+ for the part where you need things like, “give me this and this and this, and while you’re at it, can you aggregate it up to one-minute bars, and for these stocks please just get me the thousand records with the highest values.” Then you export the data to Matlab or a visualization package like Panopticon and you can graph it and analyze it, but there’s no way those packages alone can do the heavy lifting.
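For readers unfamiliar with the kind of request Simon describes, the following Python sketch shows the two operations on a tiny, made-up trade table: aggregating to one-minute bars and picking the records with the highest values. It is not kdb+ or q code and says nothing about performance at scale; the `minute_bars` and `top_n` helpers, the column names and the sample rows are all assumptions for illustration.

```python
import heapq

# Hypothetical trades, in time order; "time" is seconds since the open.
trades = [
    {"time": 5, "sym": "ABC", "price": 10.0, "size": 100},
    {"time": 42, "sym": "ABC", "price": 10.5, "size": 300},
    {"time": 61, "sym": "ABC", "price": 10.2, "size": 200},
    {"time": 75, "sym": "XYZ", "price": 20.0, "size": 50},
]

def minute_bars(rows):
    """Aggregate time-ordered rows into one-minute bars keyed by (sym, minute)."""
    bars = {}
    for r in rows:
        key = (r["sym"], r["time"] // 60)
        bar = bars.get(key)
        if bar is None:
            # First trade of the minute opens the bar.
            bars[key] = {"open": r["price"], "high": r["price"],
                         "low": r["price"], "close": r["price"],
                         "volume": r["size"]}
        else:
            bar["high"] = max(bar["high"], r["price"])
            bar["low"] = min(bar["low"], r["price"])
            bar["close"] = r["price"]   # latest trade so far closes the bar
            bar["volume"] += r["size"]
    return bars

def top_n(rows, n, field="size"):
    """Return the n rows with the largest value in the given field."""
    return heapq.nlargest(n, rows, key=lambda r: r[field])

bars = minute_bars(trades)
largest = top_n(trades, 2)
```

The reduced results, `bars` and `largest`, are the sort of output that would then be handed to a statistics or visualization package.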
HFTR: So what’s next for Kx?
SG: Faster! But it’s not only being faster, it’s enhancing kdb+ based on customer feedback, market and technology developments. We realize how core our systems are to our customers’ businesses, so we’re focused on making sure our solutions are reliable and stable.
HFTR: Thank you Simon.