The Trading Mesh

Why Nanosecond Precision is Needed in High Performance Trading Environments

An Interview with Dan Joe Barry, Napatech

24th October 2012

 

Sponsored by Napatech (http://www.napatech.com/)

 

In this interview for HFT Review, Mike O’Hara talks to Dan Joe Barry, Vice President of Marketing at Napatech, a firm that develops and markets highly advanced programmable network adapters for network traffic analysis and application off-loading.

 

HFT Review: Dan Joe, welcome to the HFT Review. Napatech may not be a familiar name to many of our readers, so can you provide a brief introduction to who you are and what you do?

 

Dan Joe Barry: Yes, at Napatech we develop intelligent adapters for network traffic analysis and application off-loading. Our customers are typically firms that build various types of network appliances for testing, measurement, monitoring and analysis of data, basically anything beyond routing and switching.

 

The nature of IP networks is such that they need a lot of analysis in real time to determine what’s going on, and a key part of that is that you need to capture the data in real time. Unlike with telecoms protocols for example, there’s no built-in management mechanism in Ethernet or IP, so you need to look at every frame, every packet, to determine what kind of packet it is, where it’s going, what it’s doing, and so on, in order to make decisions based upon that data.

 

The actual process of capturing those packets off the network and delivering them for analysis quickly and efficiently, without losing any of them, is actually quite tricky. That’s where we come in: we have the technology to do that, and we provide it in a way that allows customers to use standard hardware. The product we make is designed to be installed in standard servers and in that regard is similar to a standard network interface card: it has the same Ethernet interfaces and fits into the same PCIe slot on your server that other cards would, so we’ve made it really easy to use.

 

What distinguishes us from a standard NIC is that we’re not using it for communication or to have a protocol stack or anything like that. The card is purely focused on getting data off the line and into the application as quickly as possible with guaranteed zero packet loss. That’s the value we provide.

 

HFTR: How does that apply to financial trading networks?

 

DJB: We have a lot of customers who use our technology to build probes for performance monitoring - or even monitoring in general - to see what’s going on in the network. Then there are some specialised solutions in the financial networks for latency measurement, either within the network itself or from the trading firm’s network to the exchange’s for example, which use our technology.

 

HFTR: As we move towards higher data rates such as 40 Gbps & 100 Gbps, what are the key issues you see around the areas of accurate time synchronization and precision time-stamping?

 

DJB: As the data rates go up, obviously things are happening faster and you have to be able to react more quickly.

 

We recently published a white paper (available at http://www.thetradingmesh.com/pg/blog/napatech/read/64475) where we illustrate how, if you need to ensure you’re time-stamping each and every frame - without losing any frames and making sure they all have a unique time stamp - then on a 10 Gbps network you have to be able to react within 67 nanoseconds between frames. At 40 Gbps, that figure comes down to 17 ns and at 100 Gbps it’s just 6.7 ns. That is a significant challenge.
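
As a quick sanity check on those figures: a minimum-size Ethernet frame occupies 84 bytes on the wire once the preamble and inter-frame gap are counted, and dividing that by the line rate reproduces the budgets above. A minimal back-of-the-envelope sketch (illustrative only, not Napatech code):

    # Back-of-the-envelope check of the inter-frame budget quoted above.
    # Assumes minimum-size Ethernet frames: 64 B frame + 8 B preamble/SFD
    # + 12 B inter-frame gap = 84 B = 672 bits occupied on the wire.

    BITS_PER_MIN_FRAME = 84 * 8  # 672 bits

    def min_frame_interval_ns(line_rate_gbps: float) -> float:
        """Time between successive minimum-size frames at a given line rate."""
        return BITS_PER_MIN_FRAME / (line_rate_gbps * 1e9) * 1e9  # seconds -> ns

    for rate in (10, 40, 100):
        print(f"{rate:>3} Gbps: {min_frame_interval_ns(rate):.1f} ns per frame")

    # Output:
    #  10 Gbps: 67.2 ns per frame
    #  40 Gbps: 16.8 ns per frame
    # 100 Gbps: 6.7 ns per frame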

 

HFTR: How are you addressing that challenge at Napatech?

 

DJB: We do all of the time stamping in hardware on the card, as close to the wire as we possibly can, with a precision of 10 ns and a resolution of 5 ns. That is the kind of technology that is going to be needed in the future and we have the ability to do it today.

 

However, the key issue is having solutions that can react in nanosecond time across the whole chain. That is one of the reasons we published our white paper. What we’re trying to illustrate is that as well as precision time stamping, you also have to take time synchronisation into consideration, because there are a number of different applications that will not only require you to have synchronisation of clocks, but also synchronisation of time itself, i.e. having an absolutely unique time stamp even when measurements are performed over distances and across networks. When you’re measuring latency between a trading firm and an exchange for example, you’re trying to compare two completely different environments. Small errors come into play, so the accuracy of the systems you’re using all along the chain is important.
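
To make that point concrete, here is a small illustrative sketch with made-up numbers (it is not drawn from the white paper) of why a one-way latency measurement between two sites is only as good as the absolute time synchronisation behind it: any offset between the two clocks lands directly in the measured figure.

    # Illustrative only: why absolute time synchronisation matters when
    # measuring one-way latency between two sites with independent clocks.
    # All numbers are hypothetical.

    true_latency_us = 4.0   # actual one-way wire latency, in microseconds
    clock_offset_us = 2.5   # site B's clock runs 2.5 us ahead of site A's

    t_send_at_A = 1_000_000.0                                      # timestamp at site A (us)
    t_recv_at_B = t_send_at_A + true_latency_us + clock_offset_us  # timestamp at site B (us)

    measured = t_recv_at_B - t_send_at_A
    print(f"measured one-way latency: {measured:.1f} us "
          f"(true {true_latency_us} us, error {measured - true_latency_us:+.1f} us)")
    # The clock offset shows up one-for-one in the result, so the measurement
    # can never be more accurate than the synchronisation behind it.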

 

We want to map this out for people so they understand what’s going on. Time synchronisation and time stamping are very specialist domains and the knowledge certainly isn’t widespread. So we’re trying to help the appliance vendors to understand more about this, to become au fait with the terminology and the meanings of the different technology options. Firms are going to have to know this stuff. It’s becoming increasingly important, not just to network engineers but also to their managers, the guys who don’t normally get involved at the technology level.

 

HFTR: If we look at the methods currently being used by exchanges and trading venues around the world for synchronization & time-stamping, where are they along the technology curve? And are there any particular problems resulting from their current practices?

 

DJB: There’s a broad range of different solutions out there.

 

The best option currently is to use Global Positioning System (GPS) antennas, and most exchange data centres do now have at least one installation of GPS. GPS is great because it gives you accuracy to about 30 ns, but the downside is that you have to have an antenna and you have to connect that antenna to the point where you want to monitor. You obviously can’t do that for every point in the data centre because you might have 100,000 servers. So you need to think about how you’re going to distribute the signal, which you can do in various ways.

 

Most data centres would make use of the GPS pulse-per-second signal, which is just a pulse to make sure your clock is synchronised with the reference. There are also other methods such as CDMA (Code Division Multiple Access), which is used in North America and has the advantage of being able to operate over long distances via cell towers. This gives accuracy to about 10 microseconds, but it comes with its own problems. If for example you are on a boundary between cells, you can get unpredictable switchover between cells, resulting in jumps in time. So it’s not particularly stable.

 

IRIG-B, which most people would agree is legacy now (it was invented in the 1950s and developed in the 1960s), is still in use in some data centres. It’s better than Network Time Protocol (NTP) but it’s still only accurate to about 100 microseconds, which doesn’t even cut it on 10 Gbps networks, never mind 40 Gbps or 100 Gbps.

 

Some venues are still using NTP in software. The problem with software of course is that it runs on an operating system, so it has to wait, like everything else, for its time to run. Although there are some applications that can live with that level of accuracy and precision, NTP isn’t good enough for financial networks and high-end trading applications where accurate time measurement is critical; there you really have to use hardware time synchronisation.

 

What is being increasingly adopted now is IEEE 1588, the Precision Time Protocol (PTP) standard. That offers a good compromise between accuracy (around 100 nanoseconds) and practicality because you can run it over an Ethernet network. This is the direction that the industry is moving towards.
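
For readers new to PTP, the heart of the protocol is a four-timestamp exchange between master and slave; assuming the network path is symmetric (the protocol’s core assumption), the slave’s clock offset and the mean path delay fall out of simple arithmetic. The sketch below is a generic, simplified illustration of that standard calculation with hypothetical timestamps, not any particular vendor’s implementation.

    # Simplified sketch of the core IEEE 1588 two-step calculation,
    # assuming a symmetric network path. The timestamps are hypothetical.

    def ptp_offset_and_delay(t1, t2, t3, t4):
        """t1: master sends Sync, t2: slave receives Sync,
           t3: slave sends Delay_Req, t4: master receives Delay_Req."""
        mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
        slave_offset    = ((t2 - t1) - (t4 - t3)) / 2
        return slave_offset, mean_path_delay

    # Example: slave clock 150 ns ahead of the master, one-way path delay 500 ns.
    t1 = 0
    t2 = t1 + 500 + 150   # Sync arrives: path delay plus slave offset
    t3 = t2 + 1_000       # slave sends Delay_Req a moment later
    t4 = t3 + 500 - 150   # arrives at master: path delay minus slave offset

    print(ptp_offset_and_delay(t1, t2, t3, t4))   # -> (150.0, 500.0)
    # Any asymmetry between the two directions goes straight into the offset
    # estimate, which is one reason timestamping close to the wire helps.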

 

HFTR: What are some of the security issues around GPS and PTP? Isn’t GPS particularly susceptible to jamming, for example?

 

DJB: Yes, and there are also some security issues with PTP because it is just a protocol, so it could potentially be attacked by hackers. But there’s ongoing work with the IEEE standards body to figure out how best to protect the protocol from such attacks.

 

As for GPS, I’ve heard of hypothetical scenarios where a hacker using a GPS jammer could change the GPS time reference by jamming the signal, inserting a different signal to establish some sort of “gap” and then creating havoc on the back of that. But I think that would be a lot of work; it wouldn’t be particularly easy to do successfully.

 

Having said that, GPS is sensitive to other things, such as weather conditions, or even a bird sitting on the antenna! These things do happen and do cause problems, which is why many systems that are dependent upon GPS for timing will have a backup via atomic clocks, whose time will vary from GPS by about 10 microseconds per day.

 

HFTR: What happens if someone loses their timing signal in your environment?

 

DJB: If you lose the reference signal, you’re backed up by the local oscillator on the adapter itself. This is not as high precision as the reference signal, but in the very worst case it keeps time within a skew of 50 microseconds per second, and more typically 10 microseconds per second.
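
Rough arithmetic shows how quickly those skews add up: at the figures quoted, even a short outage accumulates an error far larger than the nanosecond-level precision of the reference. An illustrative calculation only, not a product specification:

    # How much timing error accumulates if the reference is lost and the
    # card free-runs on its local oscillator at the skews quoted above.

    worst_case_skew = 50e-6   # 50 microseconds of drift per second
    typical_skew    = 10e-6   # 10 microseconds of drift per second

    for outage_s in (1, 10, 60):
        worst   = worst_case_skew * outage_s * 1e6   # accumulated error in us
        typical = typical_skew * outage_s * 1e6
        print(f"{outage_s:>3} s outage: ~{typical:.0f} us typical, "
              f"up to ~{worst:.0f} us worst case")
    # After even a brief outage the accumulated error dwarfs nanosecond
    # precision, which is why the recovery step described next matters.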

 

The key thing is how you recover from the loss and how you deal with that skew. It’s important that when you reconnect you don’t just jump from the skewed signal to the accurate signal. So we have a mechanism for enabling you to smoothly get back to the accurate time. And that’s all configurable, depending upon the application.
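
The usual way to avoid such a jump is to slew the clock rather than step it, spreading the correction over time so that timestamps stay monotonic. The sketch below illustrates the general idea with made-up parameters; it is not Napatech’s actual recovery algorithm.

    # Generic illustration of slewing back to reference time instead of
    # stepping: the accumulated error is removed gradually at a bounded rate.

    def slew_schedule(error_us: float, max_slew_us_per_s: float = 5.0, step_s: float = 1.0):
        """Yield (elapsed_seconds, remaining_error_us) while correcting gradually."""
        elapsed, remaining = 0.0, error_us
        while abs(remaining) > 1e-9:
            correction = min(abs(remaining), max_slew_us_per_s * step_s)
            remaining -= correction if remaining > 0 else -correction
            elapsed += step_s
            yield elapsed, remaining

    # Example: 30 us of accumulated skew corrected at no more than 5 us per second.
    for t, err in slew_schedule(30.0):
        print(f"t={t:4.0f} s  remaining error = {err:5.1f} us")
    # Time stays monotonic and latency measurements never see a sudden jump.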

 

HFTR: You have your own proprietary protocol, Napatech Time Synchronization (NT-TS), which – according to your white paper – is accurate down to 10 nanoseconds. How did that come about and how does that work?

 

DJB: Well, we had the challenge of distributing time signals ourselves, before IEEE 1588 started to become popular. We needed a way to make sure that, when multiple adapters are used in a system, they could be synced very closely, because one of the things we do is merge data from multiple points into a single analysis stream. We needed something like a PPS signal, but with a time stamp associated with it. So we came up with a protocol that would allow us to do that and synchronise our adapters, either within the box or between boxes, with an accuracy of 10 ns.
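
The merge is exactly why the synchronisation has to be so tight: captures from different adapters are interleaved purely on their hardware timestamps, so any skew between adapters reorders the merged stream. A minimal illustration with hypothetical data, not Napatech’s actual API:

    # Merging captures from two points into one stream, ordered purely by
    # hardware timestamp. The events and timestamps are made up.
    import heapq

    # (timestamp_ns, capture_point, description) tuples, each list time-ordered
    point_a = [(1000, "A", "order sent"), (5000, "A", "cancel sent")]
    point_b = [(1350, "B", "order seen at far end"), (5320, "B", "cancel seen at far end")]

    for ts, point, what in heapq.merge(point_a, point_b):
        print(f"{ts:>6} ns  [{point}] {what}")

    # If point B's clock were off by even a microsecond relative to point A's,
    # the merged sequence, and any latency derived from it, would be wrong.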

 

We’ve built on that concept so we now have a distribution box and a daisy-chain that we use to propagate the signal between adapters. We also recently introduced the NTTSE “end-point” that can take an IEEE 1588 signal and convert it into three of our NT-TS time signals. This allows us to bring IEEE 1588 into existing installations with our cards.

 

The key advantage with NT-TS is that it is very, very accurate. But obviously from an end-to-end point of view, we are dependent upon getting an accurate signal in the first place.

 

HFTR: Any final thoughts?

 

DJB: Coming back to our earlier point, we’re really trying to raise awareness of the need for nanosecond precision. A lot of people talk about what is “good enough” in these situations, but we would like to highlight that you have to raise the bar of what’s good enough when you go from 1 Gbps to 10, 40 and 100 Gbps. Microsecond accuracy is not going to get you there any more.

 

Going forward, we’re going to be working with other parties in the industry to find the best way to tackle 100 Gbps while keeping the accuracy, resolution and precision that are required. And as we continue to learn and understand more, we’ll share that with the community, because everyone needs to understand this at a fundamental level.

 

HFTR: Thank you, Dan Joe.

 
