
The Trading Mesh

FPGA & Hardware Accelerated Trading, Part Five: The View from Intel

Thu, 23 Aug 2012 05:07:00 GMT           

This is part five of a six-part series by Mike O'Hara, looking at the usage of FPGAs and other types of hardware acceleration in the financial trading ecosystem.

 

Previous articles in this series have focused mainly on Field Programmable Gate Array (FPGA) technology. However, a number of firms are now looking at how the Intel Sandy Bridge architecture can help them accelerate their trading infrastructures, rather than following the FPGA route.

 

I spoke with David Barrand, EMEA Head of Financial Services and David O’Shea, Relationship Manager at Intel, to gain a better understanding of how Intel’s latest technology is helping firms to compete in ever-faster trading environments.

 

HFT Review: What trends are you currently seeing in the way trading technology is evolving?

 

David Barrand: If you look back a year or 18 months, there was a lot of talk about GPGPUs & FPGAs cutting a swathe through the businesses that we deal with. But what we’ve actually seen is that the rate at which Intel’s technology is progressing is such that the advantages that perhaps were there from those alternative technologies are now falling away.

 

Intel can afford to invest and maintain that pace of advancement because the volume at which we sell our technologies is so large. What that means to the market - in terms of being able to conduct high frequency trading - is lower costs for all. It’s almost a democratisation that lowers the barriers to entry and allows far smaller firms, or firms that don’t have such a broad cost base, to compete.

 

Companies for whom HFT technology is their core business are now selecting Intel to run that business across the board. So not only has the gap to FPGAs closed, but firms can see it progressing such that this is the right technology to bet on in the future.

 

HFTR: What kind of technological advancements are we talking about here?

 

David O’Shea: There’s an old saying, “With time, truth and technology change”. What was true in 2005 and what is true in 2012 are two different things.

 

In 2005, when direct feeds started hitting the market and people started using FPGAs for algo trading, we were struggling with the physics of a 150W single-core chip. We also had some big challenges around manufacturing at the time.

 

But with the Intel “tick-tock” model, approximately every two years we bring a new architecture to market, which in the next year we shrink in order to get more performance and more cores.

 

When we introduced the Nehalem architecture in 2008, it made a big difference in the number of applications that people would consider putting on the CPU rather than the FPGA.

 

HFTR: In what way?

 

DOS: FPGAs did a good job of bringing in data, normalising it and then distributing it to the CPU for higher-level functions. But you have to consider the entire process. At one time people envisioned doing the entire process on an FPGA, but that just has not happened, for many reasons.

 

FPGA-based feed handlers are very common, but when people want to do more, they end up sending the data to a CPU socket. There were visions at one time of doing a lot of the calculation on the FPGA, but programming FPGAs is fairly challenging. It’s certainly not as easy as writing performant code in higher-level languages.

 

We found that people started looking at using CUDA as an alternative for the maths side of things, but this too can be a more challenging development environment. So people take applications that were targeted at an FPGA for maths and attempt to run those computations on GPGPUs. But then they lose the benefit of doing everything on the FPGA card and sending the data straight out, because once you leave the card, you incur a latency hit going across the bus.

 

Fragmentation of the code base is a bad thing, and this type of GPU + FPGA + CPU solution makes it worse.

 

HFTR: What sort of latency hit are we talking about here?

 

DOS: It depends on how frequently people refresh these cards; some of the FPGA cards are still PCI Gen1. But once you have to leave the FPGA card, you have to handle the throughput and actually process the work that allows you to make a trade.

 

All of that work tends to happen today on the CPU, and our latest CPU platform, Sandy Bridge, is really another game changer.

 

HFTR: What are the key elements of Sandy Bridge from a trading architecture perspective?

 

DOS: With Sandy Bridge, we’ve introduced an embedded I/O controller on the socket, and Data Direct I/O. These two enhancements mean we can move the data directly from the NIC to the socket and into L3 cache, and on to L2 cache, where it can be used by the application. This eliminates an I/O controller hop and the round-trip latency of accessing memory. That provides a big improvement in latency, and is why SolarFlare and Mellanox have been able to publish incredible reductions in their latency using this technology. Mellanox claims a sub-1-microsecond time.
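The data path O’Shea describes can be sketched as a latency budget. The figures below are purely illustrative round numbers chosen for the arithmetic, not Intel-published measurements; the point is only that removing the discrete I/O controller hop and the DRAM round trip shortens the path.

```python
# Illustrative (not measured) latency budget comparing a classic
# NIC -> I/O controller -> DRAM -> CPU path against a DDIO-style
# NIC -> socket -> L3 cache path. All numbers are hypothetical
# round figures in nanoseconds, for the sake of the comparison only.

classic = {
    "NIC to discrete I/O controller": 200,  # ns, assumed
    "I/O controller to DRAM":         150,  # ns, assumed
    "CPU fetch from DRAM":            100,  # ns, assumed
}

ddio = {
    "NIC to socket (integrated I/O)": 200,  # ns, assumed
    "CPU fetch from L3 cache":         20,  # ns, assumed
}

print("classic path:", sum(classic.values()), "ns")
print("DDIO path:   ", sum(ddio.values()), "ns")
```

Whatever the exact constants, the structural saving is the same: two of the three hops in the classic path disappear entirely.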

 

Then you have PCI Gen3, which will let you support a 40Gig connection. You can meet the data throughput requirements and reduce latency. Most of the 40Gig cards are quad-port 10Gig cards, but the fact is you’re able to support that level of bandwidth, which is a significant amount of throughput.
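The bandwidth claim checks out with simple arithmetic. PCIe Gen3 runs at 8 GT/s per lane with 128b/130b encoding (both from the PCIe 3.0 specification); the x8 lane count below is just an illustrative slot width for a NIC.

```python
# Back-of-envelope check that a PCIe Gen3 slot can feed a 40Gb/s NIC.
# 8 GT/s and 128b/130b encoding are PCIe 3.0 spec figures;
# the x8 lane count is an illustrative choice of slot width.

GT_PER_S = 8.0          # PCIe Gen3 raw transfer rate per lane
ENCODING = 128 / 130    # 128b/130b line-code efficiency
LANES = 8               # a typical x8 NIC slot

per_lane_gbps = GT_PER_S * ENCODING   # ~7.88 Gb/s usable per lane
slot_gbps = per_lane_gbps * LANES     # ~63 Gb/s for the whole slot

print(f"x{LANES} Gen3 slot: {slot_gbps:.1f} Gb/s usable")
print("covers 40GbE:", slot_gbps > 40)
```

So even after encoding overhead, an x8 Gen3 slot leaves comfortable headroom over a 40Gig connection, which Gen1 and Gen2 slots of the same width would not.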

 

So now you have data being taken directly from the NIC card into L3, you have kernel bypass, you have incredible bandwidth to the sockets, you have very low latency into L2 being used by the application, and you’ve got 8 very powerful cores per socket. And that number is only going to go up.

 

HFTR: How does all of this stack up against an FPGA-based trading architecture?

 

DOS: We’ve been running tests with various HFT firms and a number of companies that write software for our platform, and one of these firms has already found that a Sandy Bridge server with the necessary kernel-bypass NICs outperforms an FPGA. And this was only released in March.

 

The important thing is that you have to think about the entire process. If the FPGA accelerates one piece, and you write that in Verilog or VHDL, and then you take your maths section and write it in CUDA or OpenCL, and then you write all of your transaction-type logic in C or C++ on x86, you’ll need expertise in at least three different languages before doing anything else.

 

DB: Just to re-emphasise, Intel knows how expensive it is to maintain an architecture and continue to develop it. We’ve run many architectures over the years. It is hard work and it takes a lot of effort and money to maintain this level of development. If you’re trying to do that with FPGAs and GPGPUs, which have a far more limited market, then the commercials actually limit what you’re able to develop profitably. So people may look back in a few years, point at this period we’re in at the moment and say, “that was the era of the GPGPUs and FPGAs, but now we’re on general-purpose CPUs again”.

 

HFTR: Gentlemen, thank you for your insight.

 

 

The final part of this series, which will be published at www.thetradingmesh.com/pg/fpga in early September, will have some predictions from various experts on the future of FPGA and hardware-accelerated trading.

 

 
