Beyond Latency: Utilizing Ultra Accurate Network Timing in HFT Systems
Wed, 22 Feb 2012 01:26:00 GMT
An interview with Paul Rushton of Korusys
Sponsored by Korusys
In this interview for the High Frequency Trading Review, Mike O’Hara talks to Paul Rushton, Managing Director at Korusys (www.beyondlatency.com and www.korusys.com), about delivering highly accurate timing and synchronization to High Frequency Trading systems and how that time accuracy can be realized and utilized to improve both trading performance and latency measurement.
HFTR: Paul, welcome to the HFT Review. Your background at Korusys is in FPGAs, network timing and clock synchronization, so why the interest in high frequency trading (HFT)?
PR: The main reason we got into HFT is because we came across a number of synergies between the HFT world and the telecoms world.
On the FPGA side we’ve done a lot of FPGA-based network monitoring, packet sniffing and similar things that we’ve been seeing increasingly in high frequency trading systems, so there’s certainly crossover there. And on the synchronization side, with latency monitoring and the speed of networks getting so much faster in the HFT world, the granularity of timing required to be able to measure those speeds is becoming more and more of an issue.
We had some interest from various customers so we decided to apply some of the synchronization technology we’ve developed for telecoms into these new areas of HFT, where it’s a little fresher. There’s more education to be done and there are myths to be expunged but it’s proving quite fruitful and interesting for us.
HFTR: Can you give us an overview of where things currently stand with network timing? What’s the current state of the market and the technology?
PR: There are many ways of getting time into a system over the network. Traditionally, NTP (Network Time Protocol) has been the standard bearer for timing, and it’s served a lot of applications very well over the years. But we’ve got to the point now where NTP is running out of steam. Latencies across networks have been reducing massively and with the speed of the actual servers and processors increasing too, the timing granularity you’re able to extract from NTP is just not going to cut it if you want to measure those things accurately.
We’re timing experts, so we’re very interested in the performance of NTP, PTP (Precision Time Protocol) and all other timing systems. And it’s interesting to note that most people, in order to see how accurate NTP is, look at the log files in their server system. This involves quite a leap of faith, because what NTP actually records is estimated time error. Those values may give you a nice, warm, fuzzy feeling as to how accurate your clock could be, but if you go to the trouble we’ve gone to in developing hardware to accurately measure NTP’s performance from a server, you’ll see that the actual figures are an order of magnitude away from the estimated values, so you are getting a pretty poor clock into your system.
That’s where PTP comes in as the next step up in network based timing, and it’s come about through the networking industry realizing the inaccuracies of NTP and coming up with a different protocol to account for it.
However, there are some really critical things that those involved in network timing for HFT systems need to know about PTP. The key thing is that unlike NTP, which specifies a time recovery algorithm in its standard, PTP simply specifies the protocol only.
So the time recovery algorithm, which is one of the crucial things that you need to get right in order to get really accurate time, is vendor dependent. That means that not all PTP systems are the same.
That’s a critical thing to realize. You can’t just make a decision to go from NTP to PTP and hope it will be better. It probably will be -- but vendor A, vendor B, and vendor C will all perform differently depending on various aspects of their system, including how good their algorithm is. And this is where you need to be a little careful with your choice and match the performance you need to the sort of systems that people are offering.
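To make the split between protocol and algorithm concrete, here is a minimal Python sketch of the basic arithmetic that a PTP delay request-response exchange makes possible. The four timestamps are what the protocol delivers; the function name and the example values are assumptions made for illustration, and what happens to a stream of these raw estimates afterwards (filtering, clock steering) is the vendor-dependent part.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Raw offset and mean path delay from one PTP exchange (times in ns).

    t1: master sends Sync           (master clock)
    t2: slave receives Sync         (slave clock)
    t3: slave sends Delay_Req       (slave clock)
    t4: master receives Delay_Req   (master clock)
    Assumes the forward and reverse path delays are equal.
    """
    master_to_slave = t2 - t1   # path delay + slave offset
    slave_to_master = t4 - t3   # path delay - slave offset
    offset = (master_to_slave - slave_to_master) / 2
    mean_path_delay = (master_to_slave + slave_to_master) / 2
    return offset, mean_path_delay


# Illustrative values: slave runs 500 ns ahead, each direction takes 2000 ns.
offset, delay = ptp_offset_and_delay(t1=0, t2=2_500, t3=10_000, t4=11_500)
print(offset, delay)  # 500.0 2000.0
```

A single exchange like this is only as good as the network it crossed, which is why the quality of the recovery algorithm applied on top matters so much.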
HFTR: So when looking at the various different PTP systems, what kind of variables are there?
PR: There are really two key ones. The first is the algorithm itself, which is critical because it determines how well the system copes with network disturbances. The second is the hardware. For example, you need a really good oscillator because it gives you a stable time base between packet arrivals, enabling you to recover a clean clock from your algorithm. Hardware time stamping is also critical, because if you’re relying on software time stamping, you can lose all the performance that you were looking to gain from PTP. So it’s a combination of those variables, in particular the algorithm. If you can get both of those right, it can have a big impact on how accurately you deliver your time.
As you know, we’ve been working in telecoms for years where we’ve had to deliver sub-microsecond time accuracy to mobile phone base stations across standard metropolitan area networks.
These telecom networks are typically fairly highly loaded and have high jitter in packet transit times. So if you’ve got a system and algorithm that can deliver that sort of accuracy over those types of networks, when you implement this technology in financial networks (which are, by comparison, extremely fast and relatively much cleaner in terms of their loading) this becomes a much easier problem to solve.
That means you can just run your system over your standard network, you don’t necessarily need to implement things like transparent or boundary clock switches, or build out a clean timing network.
Of course if you want to implement a separate network for timing or use transparent clock switches that’s all fine, but it’s not strictly necessary if you have a high performance PTP system.
It all depends on your choice of how much CapEx you want to spend versus the performance you need. Some systems -- because they haven’t got a very good algorithm -- will absolutely require those boundary clocks, transparent clocks and/or separate networks just for timing. But really good timing systems don’t need all that.
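One widely known way for a recovery algorithm to cope with jittered packet transit times is to trust only the fastest packets in a window, on the assumption that they met the least queuing. The Python sketch below is a generic illustration of that idea, not a description of any vendor's algorithm; the function name, the window contents and the numbers are all made up for the example.

```python
def min_delay_offset(samples):
    """Pick the offset estimate from the exchange with the smallest delay.

    samples: list of (offset_ns, mean_path_delay_ns) tuples, one per
    timing exchange in the current window. Packets that queued in the
    network show a larger delay and are ignored.
    """
    if not samples:
        raise ValueError("need at least one sample")
    best_offset, _ = min(samples, key=lambda s: s[1])
    return best_offset


# Illustrative window: true offset is 500 ns, minimum path delay 2000 ns.
# Queuing of q ns on the forward path inflates both estimates by q / 2.
window = [(500 + q / 2, 2000 + q / 2) for q in (0, 40_000, 3_000, 120_000, 800)]
naive_mean = sum(o for o, _ in window) / len(window)
print(min_delay_offset(window), naive_mean)  # 500.0 16880.0
```

In this made-up window a plain average is wrong by more than 16 microseconds, while the minimum-delay pick recovers the true offset. Real algorithms are considerably more involved, but the contrast shows why two systems speaking the same protocol can perform very differently.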
HFTR: Before we look at some of the best practices for implementing PTP within HFT systems and networks, can you give me an idea of the level of accuracy that we’re talking about here? Are we talking microseconds, nanoseconds or what?
PR: There’s a whole range and it depends what system you’re looking at and what packet rate you’re running. But you can certainly get down to low tens of nanoseconds if you’re in a nice clean network and you’ve got a really good algorithm, a good oscillator and good time stamping. That’s absolutely achievable. In fact, over properly contested highly loaded metropolitan area networks, even with just a best effort packet service, sub-microsecond time performance is achievable. That’s out there today in mobile phone base stations.
But you get a whole range. To get really accurate performance you have to be a little bit careful about your network engineering. Typically on financial networks, because they tend to be switched networks, the paths forward and backward are the same and very lightly loaded, so with the right system implemented you can get well under a microsecond, certainly down to hundreds or tens of nanoseconds.
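The reason matching forward and backward paths matter can be shown with a little arithmetic: two-way time transfer assumes the delay is the same in each direction, so any asymmetry appears as a time error of half the difference. The Python sketch below uses made-up delay figures purely for illustration.

```python
def asymmetry_error_ns(forward_delay_ns, reverse_delay_ns):
    """Time error caused by assuming equal path delays in each direction."""
    return (forward_delay_ns - reverse_delay_ns) / 2


print(asymmetry_error_ns(2_000, 2_000))  # symmetric path: 0.0
print(asymmetry_error_ns(2_300, 2_000))  # 300 ns of asymmetry: 150.0
```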
HFTR: If we look at the actual process of delivering accurate time that “last mile” into HFT systems, what are the key issues there?
PR: First of all, you want specific hardware to recover the time. You can implement PTP in your server and use software time stamping, and that’s all fine; that should give you an uptick in performance. But we make no apologies for targeting extremely high performance, where time is really valued. And for that you need a PCIE card or some other network timing card, where the time resides on the card.
Then you need to solve the problem of getting the time from the card to where it can really be consumed, to where it’s actually useful. There are various different ways of implementing this. Some vendors will leave the time on the card and instruct the applications to read the time off the card. That’s all very well but that means you have to go to an external bridge chip set across the PCIE bus into the card and back again, so by the time you’ve actually got the time, it’s out of date and you’ve lost all that accuracy that you were trying to get by implementing a PTP solution in the first place.
What you have to do is to solve the problem in a similar way to how it’s solved in network timing, but on a smaller scale. In network timing you have a highly referenced master in your network and a slave residing in your server. The slave gets packets from the master across the network and recovers time from those packets, which are jittered so there’s an algorithm running that recovers the accurate time. And you can run a very similar system inside your server, where the PCIE card becomes essentially the master (because that’s where time has been recovered) and you need to read the time across the PCIE bus. You can perform similar sorts of techniques to remove delay and jitter from that PCIE read and create an accurate representation of what the time is, in the software itself. That solves the first part of that last mile problem.
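A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the Korusys implementation: `read_card_time_ns` is a hypothetical stand-in for a read across the bus (simulated here with a random delay), `CARD_OFFSET_NS` is an invented difference between the card and host time bases, and the technique shown is the generic one of bracketing each read with the host counter and keeping the read with the shortest round trip.

```python
import random
import time

CARD_OFFSET_NS = 1_000_000_000  # invented card-vs-host time base difference


def read_card_time_ns():
    """Hypothetical card read, simulated with a jittery bus delay."""
    delay = random.uniform(0, 0.0002)
    time.sleep(delay / 2)                       # request crosses the bus
    latched = time.monotonic_ns() + CARD_OFFSET_NS
    time.sleep(delay / 2)                       # reply crosses the bus
    return latched


def sample_card_clock(reads=16):
    """Return (host_counter_ns, card_time_ns, round_trip_ns) for the best read."""
    best = None
    for _ in range(reads):
        before = time.monotonic_ns()
        card_time = read_card_time_ns()
        after = time.monotonic_ns()
        round_trip = after - before
        if best is None or round_trip < best[2]:
            # Assume the card latched its time halfway through the read.
            best = ((before + after) // 2, card_time, round_trip)
    return best


host_ns, card_ns, rtt_ns = sample_card_clock()
print(f"card - host = {card_ns - host_ns} ns (round trip {rtt_ns} ns)")
```

The shortest round trip bounds the uncertainty of the pairing, which mirrors in miniature how a network slave discards heavily delayed packets.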
The second part of the question though, is once you’ve delivered that time into the software, how do you make that available to the software in a really lightweight manner? It’s probably no secret amongst HFT practitioners that reading time using the “get time of day” call in Linux is not the way to go. One, it’s pretty inaccurate, but two (more critically) it’s a really heavyweight process, so by doing that call you’re actually affecting the performance of your algorithms because of how long it takes to read the time.
So we’ve written applications that pull the time off our card and deliver it into the software as a really lightweight call, so that the software can get access to it in just a few processor clock cycles rather than a few microseconds, which is what “get time of day” would take. At that level, inside the software, time is really just a value in memory, so it’s a case of how do you make that value accurate and how do you make it accessible when someone wants to read it? That’s the sort of problem that you have to solve if you want to make highly granular, highly accurate time available for use within the system.
We’ve got what we call our “timing tool kit” that allows you to do this, to pull time off the card and deliver it to the software. In the first instance a customer might be happy to just use their standard Linux calls to get time, in which case we can discipline the Linux system clock. But for critical applications they would probably want to use our application clock with the really accurate lightweight access.
It basically comes down to picking the pieces of the jigsaw that you want to implement, the pieces that are right for your system, and then plugging them into place.
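The idea that time inside the software is "just a value in memory" can be sketched as an anchor that the synchronization side updates and that readers extrapolate from with a cheap counter. The Python class below is a hypothetical illustration of that structure only: the class and method names are invented, `time.perf_counter_ns` stands in for a hardware cycle counter, and Python itself is of course not how a few-clock-cycle read would be built.

```python
import time


class ApplicationClock:
    """Time kept as a small anchor in memory, extrapolated from a counter."""

    def __init__(self, counter=time.perf_counter_ns):
        self._counter = counter
        self._anchor_counter = counter()
        self._anchor_time_ns = 0
        self._rate = 1.0  # disciplined nanoseconds per counter tick

    def discipline(self, reference_time_ns, rate=1.0):
        """Called by the synchronization side with a fresh reference time."""
        self._anchor_counter = self._counter()
        self._anchor_time_ns = reference_time_ns
        self._rate = rate

    def now_ns(self):
        """Cheap read: one counter read, one multiply, one add."""
        elapsed = self._counter() - self._anchor_counter
        return self._anchor_time_ns + int(elapsed * self._rate)


# Deterministic demo with a fake counter: ticks at creation, discipline, read.
fake_ticks = iter([0, 100, 350])
clock = ApplicationClock(counter=lambda: next(fake_ticks))
clock.discipline(reference_time_ns=1_000_000)
print(clock.now_ns())  # 1000250
```

The design choice being illustrated is that the expensive work (recovering accurate time) happens off the critical path, while the reader pays only for a counter read and a little arithmetic.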
HFTR: How are you typically seeing such accurate time being used in practice? From a business perspective, how is nanosecond-accurate time actually benefiting the firms who have it?
PR: What it gives you is observability of things, because you’ve got highly granular time inside your system. We’re timing experts not high frequency traders, but three of the key areas we’ve seen are back testing, software metrics and time as a variable in algorithms and applications.
If we take the areas of software metrics and back testing first, the key thing to understand is that there are network latency monitoring boxes and systems around today that work very well and can give you a good sense of latency across your networks. But the real essence of what latency is useful for is not when a packet hits a NIC card or some arbitrary network point, it’s when it actually hits a library point in your system or a decision point in your software, that’s when you can really utilize the knowledge of how long it has taken to get here. Networks are becoming faster and faster and some of the bottlenecks are now more prevalent inside the server systems than on the networks, so it’s becoming more and more critical to be able to see, from a timing point of view, what’s going on inside your system.
To be able to do that you need to see the time in the software. With this kind of accurate time, right where your software is using it, you can get an absolute knowledge of how long it’s taken for that packet or information to get to that point in your system. This allows you to send logs out, recording and playing back for back testing.
One important feature of our system is that we provide such a light way of sending these time stamped points out to a network monitoring system, through a simple lightweight call by reference. This gives the time stamp, sends a packet out, formatted however you want, all done by the card. So you can look at that at a later date and examine the performance of your system, see micro bursts on the network and inside your system, identify bottlenecks and tune the performance of your algorithm. Sending these accurately timestamped performance logs from points inside your software can be achieved with virtually zero overhead on the performance of your system.
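As a rough illustration of what such time stamped log points allow once they have been collected, the Python sketch below computes the latency between consecutive points inside the software for each message. The record layout, the point names and the sample values are all hypothetical, and the sketch covers only the offline analysis, not the sending of the packets by the card.

```python
from collections import namedtuple

LogPoint = namedtuple("LogPoint", "message_id point time_ns")


def stage_latencies(log_points):
    """Per-message latency between consecutive software log points."""
    by_message = {}
    for lp in sorted(log_points, key=lambda lp: lp.time_ns):
        by_message.setdefault(lp.message_id, []).append(lp)
    result = {}
    for message_id, points in by_message.items():
        result[message_id] = [
            (a.point, b.point, b.time_ns - a.time_ns)
            for a, b in zip(points, points[1:])
        ]
    return result


# Made-up log for one market data message passing through three points.
log = [
    LogPoint(7, "nic_rx", 1_000),
    LogPoint(7, "feed_decoded", 3_400),
    LogPoint(7, "strategy_decision", 9_100),
]
print(stage_latencies(log)[7])
# [('nic_rx', 'feed_decoded', 2400), ('feed_decoded', 'strategy_decision', 5700)]
```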
The other point to understand is how timing might be important as a parameter inside an algorithm. If you can get accurate time inside your applications, what you can then do is use that time as a live parameter in your algorithm’s decisions or your application mechanics.
If you have accurate timestamps from where and when the packet was sent and you know not just when it hit your NIC card, but when it actually hit a decision point in your algorithm that could affect a trading decision or a risk decision you might make, like needing to send a cancel of an order for example, suddenly time becomes something that you can use in real time. You can see micro bursts in real time as part of your algorithm execution rather than as something that you’ve recorded just for playing back and using to bash your network provider for service level agreements. It can actually be part of the core of what your application is trying to do and can potentially affect how good it is at doing it.
Critically, solving the ‘last mile’ problem we just discussed means your logs are based on the same timebase as your live algorithm, correlating the logging and algorithm execution together. If the time on the card is not directly related to the time in the system, then these two timebases can deviate from each other significantly, ruining the accuracy of your logs.
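Using time as a live parameter, as described above, might look something like the Python sketch below: a check at the decision point on how old the information is. The function name and the 50 microsecond threshold are arbitrary assumptions for illustration, and the check is only meaningful if the sender's and receiver's clocks share an accurate common reference.

```python
def should_act(send_time_ns, now_ns, max_age_ns=50_000):
    """Only act on information that is fresh enough at the decision point.

    A negative age means the two clocks disagree, so the data is rejected.
    """
    age_ns = now_ns - send_time_ns
    return 0 <= age_ns <= max_age_ns


print(should_act(send_time_ns=1_000_000, now_ns=1_020_000))  # True
print(should_act(send_time_ns=1_000_000, now_ns=1_400_000))  # False
```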
HFTR: If we look at the use of accurate time synchronization across longer distances, for example between different trading venues, why is it so difficult to have consistent, accurate timestamps?
PR: That’s the “timing everywhere” paradigm that you see across networks. There’s not a great benefit in having accurate timing in one part of a network but not in another, because if you’re relying on inconsistent time stamps between the send side of a network and the receive side, then obviously your latency measurement is going to be adversely affected.
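A small worked example shows how directly clock error at either end feeds into a one-way latency figure. The Python sketch below uses invented numbers: a true latency of 5 microseconds measured first with both clocks within tens of nanoseconds, then with the receiving clock 20 microseconds slow.

```python
def measured_one_way_latency_ns(true_latency_ns, sender_clock_error_ns,
                                receiver_clock_error_ns):
    """What a one-way latency measurement reports when both clocks are off."""
    send_stamp = 0 + sender_clock_error_ns          # true send time taken as 0
    receive_stamp = true_latency_ns + receiver_clock_error_ns
    return receive_stamp - send_stamp


print(measured_one_way_latency_ns(5_000, 50, 30))       # 4980
print(measured_one_way_latency_ns(5_000, 50, -20_000))  # -15050
```

In the second case the measurement is not just inaccurate but negative: the packet appears to arrive before it was sent.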
The question is how to distribute accurate timing to all points in a system, and that’s where network timing comes in. Obviously you can have GPS feeds to various places, but what people want is kind of a plug and play system that’s really accurate. One way is to put all your engineering into where the masters are, spreading those masters out into your best fit locations, getting your GPS feeds into there and then spreading the rest of the timing through the network using PTP. What this means is at your end points, you really need to recover your PTP timing as accurately as you possibly can at all points, because the worst link in the chain is going to be where your worst timing is recovered.
But as I said earlier, not all PTP systems are equal. You can get PTP systems that are going to give you nanosecond accuracy and some that are going to give you tens of microseconds accuracy, so this is where you need to make your own decisions and choices into how you implement your system based upon where timing is really important to you. And that’s really what network timing is all about, being able to deliver this time flooding across your network to all end points, all having the same timing reference.
The best way to do that is to have plenty of timing references around, i.e. one in each centre so you’re not going across long distances. But if you’ve got a good enough system, the actual distance away from your master shouldn’t make too much of a difference. You can deliver really good timing across some fairly hairy networks with a good “whole system” implementation.
Network engineering, for example, can make life very much easier for timing algorithms. If you want to invest your capital in just a few critical master points where you’ve got good GPS feeds and want to have lots of far flung locations that are recovering the time from these masters, then maybe you want to put some investment into quality of service for your timing packets across there. That’s one potential solution.
Another potential solution is to have more masters in more of your locations so you’re not trying to take timing from quite so far away at all these points. It’s really network engineering decisions based on cost, application and performance.
HFTR: And what kind of investment are we typically talking about for PTP-based solutions?
PR: That’s a tough one to answer. I would say that a PTP master is more expensive than an NTP server, maybe double the cost. PTP slaves range in price. PTPd is actually free on Linux, so if you just want a software solution you can stop using the NTP daemon and use the PTP daemon instead and that will give you an uptick in performance. But at the points where you really value highly accurate time, you’re talking about the cost of timing cards, which will be the price of a GPS card, or a standard NIC or something like that. Still, given the sort of investment that financial firms put into HFT infrastructure, those figures get lost in the noise. It comes down to firms deciding what they need where and how to implement it, but the costs are negligible in comparison with the other costs that these networks have in place.
HFTR: Finally, where are you seeing most interest coming from? Are you doing more business with infrastructure providers, exchanges, sell-side firms, prop trading firms or quant hedge funds for example? And are you seeing any kind of trend developing?
PR: I wouldn’t say any type of firm in particular or any particular set of customers, we are seeing interest across the board, from network infrastructure providers to banks themselves and proprietary trading firms, it’s anyone and everyone really. I think this is an interesting time for timing. We’ve been doing this in telecoms for years but with HFT, there’s a whole new level of high visibility so it’s an interesting time to see which firms are popping up, what they’re doing and why. It really is of interest across the board. People realize that the timing they’ve got currently just isn’t good enough and that they need to do something about it.
HFTR: Thank you Paul.
- o -
Paul Rushton BEng, MSc, MBA
Paul Rushton is Managing Director at Korusys with over twenty years’ experience developing Systems and Hardware designs for the telecommunications and networking markets.
Part of the original IEEE1588 working group who developed the PTP synchronization protocol, he is now focused on developing and implementing synchronization solutions specifically for financial technologies.
Korusys is a leading expert in packet based synchronization techniques, providing both consultancy services and synchronization products to various market segments.
Korusys is also a trusted provider of Electronics Design Services, focused primarily on FPGA and Embedded Software design and development.
Korusys has earned a reputation for high quality, right first time development with a wide variety of clients.