Latency Monitoring and Clock Synchronisation
Fri, 25 Mar 2011 05:00:00 GMT
Citihub Conversations, Part One
This interview is the first in our series of Citihub Conversations, where we will discuss a range of topics on the intersection of technology with the financial markets – particularly in the high frequency trading space – with associates from Citihub, the specialist IT consultancy focused on the financial services industry.
In this interview, Mike O’Hara talks to Donovan Ransome, Associate Partner and Head of eTrading & Market Data Practice at Citihub, about the challenges of accurately monitoring and measuring latency in a high performance trading environment.
High Frequency Trading Review: Donovan, you recently wrote a blog post about precision time synchronisation. Can you give us a brief introduction on why accurate clock synchronisation and latency measurement are so important?
Donovan Ransome: Yes, latency monitoring is actually a very interesting topic. We do a lot of work around time synchronisation and accurate time stamping and we help clients with it quite extensively.
Optimised latency ensures that our clients’ competitive edge is maintained, so measuring it ensures you stay in the game. It can also be a great indicator that something is going wrong. If latency increases, it means there’s a problem. Obviously you’ve also got other indicators like fill rates falling off, but latency is a great metric. However, in order to monitor or measure latency, you’ve first got to have some sort of accurate timing, a baseline that everything fits with. And once you’ve got an accurate time network, it’s all about when you timestamp and how you correlate.
The industry at the moment has a lot of technologies that can measure different points on the wire, inferring that if you’re measuring point 1 and point 2, the latency between those two points is x. Therefore if you’ve got application A that sits between point 1 and point 2 and latency increases, then it implies the problem has to be with that application.
Latency might be a great metric, but it doesn’t necessarily tell you exactly what is going wrong; it just gives you a starting point for taking corrective action. You still have to understand what is causing the latency, and take the necessary corrective action, both in your infrastructure and in your trading model.
HFTR: The FPL (FIX Protocol Limited) Inter-Party Latency Working Group is evolving a set of standards to measure latency at specific points along the chain. Would you like to see firms adopting these standards?
DR: FPL is good, it is the industry’s own body and therefore should be the route to the market’s commonly agreed standard. We would support use of all the output from these working parties and if we can use their design patterns to test service levels or vendor propositions we will.
I think standards make sense if you’ve got multiple venues and you’re looking for some sort of solution that will allow you to measure the latency from an exchange.
But what a lot of people are looking for is quite frankly an accurately timestamped execution venue’s market data or transaction message. If the exchange was properly synchronised to a grandmaster clock and it was done extremely accurately using PTP (Precision Time Protocol) with some sort of hardware-assisted time stamping on the exchange message, then it isn’t difficult to correlate this with a timestamp inserted in the client’s feed handler or exchange gateways.
HFTR: Don’t most exchanges already give you a decent time stamp?
DR: Well, most of the exchanges do timestamp their messages, but with different levels of accuracy. And there are still some execution venues that don’t give you anything, so you don’t have any visibility of the latency.
Still, most of them do have something and generally there are clever ways for firms to work out the latency. For example, if you’re constantly pinging the order book you can measure the round-trip time for those orders to appear on the book and with executed orders you can measure latency through the matching engine. Plus if you are sourcing your own connectivity, you will know your fibre length and your carrier’s latency, which can be provided when first installed. This, combined with comparing exchange timestamps to timestamps on your handlers and gateways, will let you build up a complete picture of what “good” looks like and you can immediately see where deviations occur.
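As a rough illustration of combining fibre length with timestamps, the sketch below subtracts the expected propagation delay over the fibre from the difference between an exchange timestamp and a local handler timestamp. The function names and the refractive index are assumptions made for illustration only, and the calculation presumes both clocks are synchronised to a common time source.

```python
# Hypothetical helpers; the constants below are illustrative assumptions,
# not figures quoted in the interview.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBRE_REFRACTIVE_INDEX = 1.468  # assumed typical value for single-mode fibre


def fibre_delay_ns(length_km: float) -> float:
    """Expected one-way propagation delay over the fibre, in nanoseconds."""
    speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S / FIBRE_REFRACTIVE_INDEX
    return length_km / speed_km_per_s * 1e9


def residual_latency_ns(exchange_ts_ns: int, handler_ts_ns: int,
                        fibre_km: float) -> float:
    """Observed one-way latency minus the expected fibre propagation delay.

    Assumes both timestamps come from clocks synchronised to the same source.
    """
    observed_ns = handler_ts_ns - exchange_ts_ns
    return observed_ns - fibre_delay_ns(fibre_km)
```

A residual that drifts away from its usual value would point at something other than the fibre itself, such as the carrier's equipment, the exchange's clock, or the local handler.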
Also with all this information, you can actually get a pretty good idea as to how accurate the exchange’s timestamps and clocks are.
HFTR: How do you achieve an accurate timestamp?
DR: There are various ways. As I explained in my blog, this comes down to time synchronisation between grandmaster clocks and the server or device performing the timestamp, and how you actually perform the timestamp.
NTP (Network Time Protocol) is being replaced with PTP, which ensures oscillators are synchronised to the time source with sub-millisecond accuracy. A well designed PTP solution using good quality oscillators and hardware-assisted timestamping will get you down to nanosecond accuracy. But even using software like PTPd will give you a relatively accurate level of timestamping. That’s very often good enough compared to time synchronised through NTP servers located in distant data centres.
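For readers unfamiliar with how PTP arrives at a correction, the sketch below shows the standard delay request-response arithmetic from IEEE 1588: four timestamps yield an estimate of the local clock's offset from the master and of the one-way path delay. The function name is hypothetical, and the calculation assumes the network path is symmetric, which is one reason hardware timestamping matters in practice.

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Estimate clock offset and path delay from one PTP exchange.

    t1: master sends Sync             (master clock)
    t2: local clock receives Sync     (local clock)
    t3: local clock sends Delay_Req   (local clock)
    t4: master receives Delay_Req     (master clock)

    Assumes the path delay is the same in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2  # local clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2   # one-way path delay
    return offset, delay


# Worked example in nanoseconds: local clock runs 500 ns ahead of the
# master, and the true one-way path delay is 1000 ns.
offset_ns, delay_ns = ptp_offset_and_delay(t1=0, t2=1500, t3=2000, t4=2500)
```

In this worked example the estimate recovers the 500 ns offset and 1000 ns delay exactly; any asymmetry between the two directions would show up as an error in the offset.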
It’s all about what works, what is really required by the business, what gives you all the metrics, having good timestamps and good time sync across all the application processes and the order flow and then pulling all of that data into a database that allows you to work with it.
HFTR: So, once you have the timestamp data in the database, what then?
DR: First and foremost, you want to get a view of what your latency curve is. What does the distribution look like? What is the probability of a message being at one microsecond or ten microseconds versus 100 microseconds or even 100 milliseconds? Pulling the timestamping and measurement figures into a database over time gives you those distribution curves and lets you visualise your picture of latency. It lets you understand what components are causing you problems. Where do you focus? Is it your FIX engine software choice or is it the pre-trade credit check that’s causing the largest fluctuations in your latency, for example?
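A minimal sketch of what building such a distribution might look like, assuming the latency samples have already been pulled from the database into a list of microsecond values. The function names are hypothetical and the choice of percentiles is illustrative.

```python
import statistics


def latency_summary(latencies_us):
    """Summarise a list of latency samples (microseconds) by percentile."""
    ordered = sorted(latencies_us)
    # quantiles(n=100) returns 99 cut points; index 49 is the median.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "min": ordered[0],
        "p50": cuts[49],
        "p99": cuts[98],
        "max": ordered[-1],
    }


def fraction_above(latencies_us, threshold_us):
    """Empirical probability that a message exceeds the given latency."""
    slow = sum(1 for value in latencies_us if value > threshold_us)
    return slow / len(latencies_us)
```

Running the same summary per component, for example separately for the FIX engine and the pre-trade credit check, is one way to see which stage contributes most to the tail.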
Then once you’ve started addressing that, you can improve it on a number of levels. For example, you might get to the point where you want to measure and react to latency in real-time. So it’s a journey, but very often people try to reach that nirvana without going through the journey first, and they end up in a 6-8 month deployment program, where really what they needed was something that gave them the focus and pointed to quick wins. Experience with all the tools available allows Citihub to deliver a phased roll-out of progressively tighter latency measurements. This helps you stay in the game.
HFTR: Can the act of monitoring and measuring latency, either in the application or the hardware, actually impact the latency itself?
DR: There are various opinions on how intrusive it is. I feel that some of the intrusiveness that people talk about has been overplayed. The additional latency created is no more than getting time and inserting a timestamp, and by doing this in hardware the additional latency is negligible. It comes down to the level of the latency game you are in. If you’re in a trading business concerned with nanoseconds and if you are putting even your analytics into FPGA then yes, it could be a concern. But remember latency is important in many other business lines, where the business is looking to measure and improve milliseconds of latency.
HFTR: I assume that at Citihub, you’ve started to build up a view of what the best practices are at different levels?
DR: Absolutely. In fact, a colleague of mine has just published a blog post on this, covering the different time calls you can make within an application. There are many different system calls, Java, C and scripting calls, each with different levels of accuracy.
For example, do you need to use time synchronised with a master clock to do timing in an application? No. You’ve just got to look at the number of clock cycles it takes.
Or if you do need to get time for a timestamp, some calls are impacted by time adjustments or corrections. And if you’re constantly trying to measure the time in an application and all of a sudden your time adjusts, due to it going out of sync with your Stratum server, then suddenly your timing has gone out of the window because your clocks have just moved.
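The distinction between timing an interval and taking a timestamp can be sketched in Python, used here purely as an example language. The standard library's `time.monotonic_ns()` is not affected by clock adjustments, so it suits interval measurement, while `time.time_ns()` reads the wall clock and can jump when the clock is corrected. The wrapper names are hypothetical.

```python
import time


def timed_ns(func, *args):
    """Run func and return (result, elapsed nanoseconds).

    Uses the monotonic clock, so a wall-clock adjustment during the call
    does not distort the measured interval.
    """
    start = time.monotonic_ns()
    result = func(*args)
    elapsed = time.monotonic_ns() - start
    return result, elapsed


def wall_clock_timestamp_ns():
    """Timestamp for correlating with other systems.

    Reads the wall clock, which may step forwards or backwards when the
    host is resynchronised to its time source.
    """
    return time.time_ns()
```

Intervals measured with `timed_ns` are only meaningful within one process on one host; correlating events across machines still needs the synchronised wall clock.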
The latency monitoring programs that have worked really well are the ones where firms have looked at latency end-to-end. They’ve looked at the entire message flow, they’ve looked at all the points and decided where they want to measure. Do they want to do it at the network level on the wire? How much do they want to do in the application? What is the right point to correlate all this data? How do you correlate all the data?
And then the really sophisticated stuff is where you start correlating some of the latency metrics – which are an indicator – with infrastructure health, buffer sizes or fill rates. Because there are correlations, but it’s about joining it all together. This then gives your business and IT teams the visibility and intelligence they need to be at the forefront of their game. We like to call it Intelligent Monitoring.
HFTR: Presumably that’s where consultancy firms like Citihub come in?
DR: As a consultancy, we do latency assessments. We generally get brought in to do everything from assessing a client’s infrastructure and where their latency is, baselining it, and making recommendations to remediate and improve at a tactical and strategic level.
Part of our latency assessment actually includes a monitoring assessment too, so we look at what monitoring a client has in place today as well as looking at the operational framework around it. So we don’t just go in and measure latency, we look at what best practices are being used, what’s in their process for managing the infrastructure, what is the resiliency across the network, everything, end-to-end.
From there, we could go into an engagement, where we help the client to devise and implement an ongoing strategy, which could be a monitoring strategy, for example. Then once a high level architecture is agreed, we help clients with their technology choices and vendor selection.
HFTR: Thank you Donovan.
In future interviews in this series of Citihub Conversations, we’ll be covering topics such as developing a co-location/proximity hosting strategy, the “build versus buy” debate, the challenges of keeping up with exchange connectivity in a multi-venue world, and other related subjects.