Where is Server I/O Headed and is it Keeping Up with Compute and Storage?
Tue, 06 Mar 2012 03:33:00 GMT
An Interview with Yaron Haviv of Mellanox Technologies
Sponsored by Mellanox Technologies
In this interview for the High Frequency Trading Review, Mike O’Hara talks to Yaron Haviv, Vice President of Data Center Solutions at Mellanox Technologies (www.mellanox.com), about some of the latest technological developments impacting trading systems architectures. The company recently announced the World’s Lowest Latency for High Frequency Trading.
HFT Review: Yaron, welcome to the HFT Review. What is the business justification for buying higher speed networks?
Yaron Haviv: It’s all about being faster than your competition, and for different sides of the trade cycle that means different things. Exchanges can gain larger market share by providing more liquidity. Buy-side firms can be the first to respond to new market data and arbitrage opportunities, and so make a larger profit than their slower competitors. Sell-side firms can provide faster access and pre-trade validations, enabling their customers to trade faster.
For an algorithmic trader, profitability is typically a function of the quality of the algorithm, the quality/timeliness of the data used and the execution speed. Timeliness of data and execution speed are a direct function of I/O and network latency, hence the move to low latency networks such as 10GbE and InfiniBand, and to efficient server I/O leveraging OS bypass and RDMA.
HFTR: What is the benefit of using the new PCIe 3.0 support, available on new server platforms, for trading applications?
YH: The new PCIe 3.0 bus is twice as fast as PCIe 2.0. It can deliver more than 6GB/s of sustained data rate to a single adapter. It’s not just about bandwidth. Now every PCIe message takes half the time and it can cut the overall transaction latency by hundreds of nanoseconds.
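As a rough check on the figures above, the sketch below derives raw link bandwidth from the published PCIe signalling rates and line encodings. The x8 lane count is an assumption (the interview does not state the adapter width), and the result is raw bandwidth before packet and protocol overhead, so sustained throughput will be somewhat lower.

```python
# Published signalling parameters per generation:
# (transfer rate in GT/s per lane, payload bits, total encoded bits)
PCIE_GENERATIONS = {
    "2.0": (5.0, 8, 10),     # 8b/10b line encoding
    "3.0": (8.0, 128, 130),  # 128b/130b line encoding
}

def link_bandwidth_gbytes(generation, lanes):
    """Raw one-direction bandwidth in GB/s, before protocol overhead."""
    gt_per_s, payload_bits, total_bits = PCIE_GENERATIONS[generation]
    usable_gbits_per_lane = gt_per_s * payload_bits / total_bits
    return usable_gbits_per_lane * lanes / 8  # bits -> bytes

# Assumption: an x8 adapter slot.
gen2 = link_bandwidth_gbytes("2.0", lanes=8)
gen3 = link_bandwidth_gbytes("3.0", lanes=8)
print(f"PCIe 2.0 x8: {gen2:.2f} GB/s")
print(f"PCIe 3.0 x8: {gen3:.2f} GB/s")
print(f"ratio: {gen3 / gen2:.2f}x")
```

Under that assumption the raw figures come to 4.00 GB/s and about 7.88 GB/s, which is consistent with a sustained rate above 6GB/s once overhead is taken into account.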
In addition to reducing latency for a single message, the PCIe bus speed has a significant impact on multicast performance in multi-core environments. If there are multiple processes listening on the same multicast stream (a very typical scenario for market data consumption), each additional process suffers the added latency penalty of an additional replication on the PCIe bus. The move to PCIe 3.0 cuts this penalty in half, enabling better predictability and scalability for multi-core environments. Traders are buying new servers with many cores and need to ensure that their algorithms will be just as effective no matter which core they run on.
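The replication penalty can be illustrated with a deliberately simple model. It assumes the copies of a packet cross the bus strictly one after another and that each copy costs only its transfer time; the packet size, the bus rates (nominal raw x8 figures) and the function name are all illustrative assumptions, not measurements.

```python
def replication_penalty_ns(packet_bytes, bus_gbytes_per_s, listener_index):
    """Added delay for the copy delivered to listener `listener_index`
    (0-based), assuming copies are sent over the bus back to back."""
    copy_time_ns = packet_bytes / bus_gbytes_per_s  # bytes / (GB/s) = ns
    return listener_index * copy_time_ns

PACKET_BYTES = 1500       # assumption: one full-size Ethernet payload
BUS_RATES = {             # assumption: nominal raw x8 link rates in GB/s
    "PCIe 2.0": 4.0,
    "PCIe 3.0": 7.88,
}

for listener in range(4):
    row = ", ".join(
        f"{name}: {replication_penalty_ns(PACKET_BYTES, rate, listener):6.1f} ns"
        for name, rate in BUS_RATES.items()
    )
    print(f"listener {listener}: {row}")
```

In this model the first listener pays no penalty and each later listener pays one more copy time, so the gap between the first and last process shrinks by roughly half on the faster bus.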
Another benefit is that systems now have a better match between memory speed and the I/O subsystem, leading to greater CPU and memory efficiencies.
HFTR: If a user only cares about latency, should s/he bother with 40GbE or InfiniBand?
YH: In many cases, higher bandwidth has a direct impact on latency. The most notable ones would be larger message sizes/rates, congested networks and bursty traffic, which is very common in trading environments as market data is neither constant nor predictable. In essence, the bandwidth helps traders guarantee that they will get the same latency, as measured in their lab conditions, on the live system, where traffic is bursty and unpredictable and message sizes and rates vary. Exchanges definitely need higher bandwidth to be able to deal with increasing message rates, while sustaining the low latency traders expect.
In 2012, we will probably see some of the exchanges begin to offer direct connectivity over either 40GbE or InfiniBand, in which case any lower speed interconnect in the middle would add a terrible queuing latency penalty.
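The queuing penalty of a slower link in the middle can be estimated with a simple fluid model: a burst arrives at the ingress rate and drains at the egress rate, and the last byte waits for the difference. The 100 KB burst size is an assumption chosen for illustration, and the model assumes an empty queue and no competing traffic.

```python
def last_byte_queuing_delay_us(burst_bytes, ingress_gbps, egress_gbps):
    """Extra delay seen by the last byte of a burst that arrives at
    `ingress_gbps` and is forwarded at `egress_gbps`."""
    burst_bits = burst_bytes * 8
    arrival_time_us = burst_bits / (ingress_gbps * 1e3)  # 1 Gb/s = 1e3 bits/us
    drain_time_us = burst_bits / (egress_gbps * 1e3)
    return max(0.0, drain_time_us - arrival_time_us)

BURST_BYTES = 100_000  # assumption: a 100 KB market data burst

slow_middle = last_byte_queuing_delay_us(BURST_BYTES, ingress_gbps=40, egress_gbps=10)
matched = last_byte_queuing_delay_us(BURST_BYTES, ingress_gbps=40, egress_gbps=40)
print(f"40Gb/s feed into a 10Gb/s link: {slow_middle:.1f} us of queuing")
print(f"40Gb/s feed into a 40Gb/s link: {matched:.1f} us of queuing")
```

With these assumed numbers the last byte of the burst is held up for 60 microseconds behind the 10Gb/s link and not at all when the speeds match.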
HFTR: With new data center Ethernet, is there a benefit in using lossless networks for HFT?
YH: Yes, lossless Ethernet doesn’t drop packets like traditional Ethernet whenever oversubscription or congestion occurs; rather, it pauses incoming traffic for a very brief period. The immediate impact is that the network is more predictable, and has less jitter from the drops and retransmissions. Low jitter is no less important than low latency. Predictability is key for having an algorithm effectively executed, monitored and improved. If you can’t predict how much time it will take your order to hit the exchange, you won’t be able to accurately predict your chance of making or losing money.
Many lossless Ethernet switches also support a cut-through mode of operation which can reduce single-hop latency from microseconds to a couple of hundred nanoseconds.
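The gap between the two modes follows from serialization time: a store-and-forward switch must receive an entire frame before it can start sending it, whereas a cut-through switch begins forwarding once the header has been read. The sketch below computes only that per-hop serialization component; it ignores the switch's internal pipeline latency, and the frame sizes are illustrative.

```python
def store_and_forward_delay_us(frame_bytes, link_gbps):
    """Minimum per-hop delay added by receiving the whole frame
    before forwarding it (serialization time only)."""
    return frame_bytes * 8 / (link_gbps * 1e3)  # 1 Gb/s = 1e3 bits/us

for frame_bytes in (64, 512, 1500):
    for link_gbps in (10, 40):
        delay = store_and_forward_delay_us(frame_bytes, link_gbps)
        print(f"{frame_bytes:5d} bytes at {link_gbps}Gb/s: {delay:.3f} us")
```

A full 1500-byte frame at 10Gb/s costs 1.2 microseconds per store-and-forward hop on serialization alone, which is the component cut-through switching avoids.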
HFTR: How does networking keep up with Moore’s Law and CPU speeds?
YH: In reality it doesn’t. If you look at the latest Intel platforms using Sandy Bridge chips, a dual socket system has 100GB/s of memory bandwidth. This is double the memory speed of the previous generation (Nehalem) and about 10x faster than the older Xeon systems.
In comparison, a 10GbE link provides only about 1.2GB/s of bandwidth, which is roughly 1/80 of the system capacity. This is one of the main reasons that have led companies to adopt InfiniBand and more recently to adopt 40GbE, which provides a much better ratio between the system capacity and its I/O.
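The ratio can be reproduced with a few lines of arithmetic. The memory bandwidth is the figure quoted in the interview; the link figures are raw line rates (bits divided by eight), so usable payload throughput is slightly lower.

```python
MEMORY_BANDWIDTH_GBYTES = 100.0  # dual-socket figure quoted in the interview

def link_gbytes_per_s(link_gbps):
    """Raw line rate converted from Gb/s to GB/s."""
    return link_gbps / 8

ratios = {}
for name, gbps in (("10GbE", 10), ("40GbE", 40)):
    link = link_gbytes_per_s(gbps)
    ratios[name] = MEMORY_BANDWIDTH_GBYTES / link
    print(f"{name}: {link:.2f} GB/s, about 1/{ratios[name]:.0f} of memory bandwidth")
```

At raw line rate, 10GbE works out to 1/80 of the quoted memory bandwidth and 40GbE to 1/20.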
HFTR: What can one do to reduce system costs or increase efficiency in the data center?
YH: Servers nowadays are quite similar, so you can’t save much on those. There are two domains that you can improve on. First, you can create much more cost and power efficient networks and second, you can increase the application efficiency.
Users can design scale-out network fabrics today at a fraction of the cost and power of traditional network gear by leveraging a new generation of switching products built on high-density, low-power silicon. High-density silicon components not only allow for improved networking speeds; they also significantly reduce the hardware required for network switches and cables, and require fewer networking tiers.
On the application side, in order to deliver higher performance on the same hardware, and to utilize the hardware capabilities to their limit, we typically need to eliminate bottlenecks and inefficiencies in software, mainly in the OS. OS involvement adds latency, jitter and CPU cycles to every I/O transaction, due to queuing, buffering, copying and context-switching, and prevents the application from reaching maximum efficiency.
By utilizing technologies such as OS bypass and RDMA, these I/O bottlenecks can be removed and the maximum capabilities of the hardware can be utilized, in terms of latency, jitter and throughput. These types of acceleration technologies/products exist today not just for market data and trading, but also for various other I/O intensive applications, such as Memcached, Hadoop and even storage.
Application efficiency can help run jobs faster or alternatively run the same job on fewer servers. This is obviously a direct and substantial saving on costs, cooling, space, and maintenance.
HFTR: Can people use server virtualization and still keep up with the performance requirements of HFT?
YH: Virtualization is not just about running more Virtual Machines (VMs) on one box. It also brings a new level of administration efficiency to the data center. However, most people today would not consider putting any performance sensitive applications on their virtualized environment, not to mention trading applications, so the “private clouds” of the banks end up running mostly virtual desktops and back office applications.
Infrastructure managers and even business divisions continue to be under pressure to move more applications to the virtualized environment in order to get the cost and flexibility benefits, and the barriers to doing that are becoming more psychological than actual.
Today, we have the technology to allow guest VMs to have direct access to the network adapter, without having to go through layers of hypervisors and virtual switches, which add latency, jitter and CPU cycles. These VMs can reside on the same physical server as less performance sensitive VMs, each with its own QoS configuration. Add to that the ability to have user space access to that NIC from inside a VMware or KVM virtual machine using OS bypass or RDMA, and you can get I/O performance faster than most traditional physical machines.
Now customers can have the benefits of virtualization without compromising performance, making it applicable to HFT customers. We do not foresee virtualization in the co-lo any time soon, but various database applications that do have specific performance requirements could be the first ones to make the move.
HFTR: Can high-speed SSD and Flash technologies be incorporated into the network and provide application benefits?
YH: One very visible IT trend is the move to SSDs, as they are hundreds of times faster than standard drives. For example, an SSD can serve data in 20 microseconds versus a disk that takes 5 milliseconds. This means that in some cases, we can now store and retrieve data from SSDs in place of memory. We also see some vendors delivering “memory” appliances with a lot of battery backed or replicated RAM.
The problem is that traditional storage networks and storage access protocols haven’t yet aligned with those speeds, so some storage products will still take more than a millisecond to read the data even from an SSD. This is why various vendors have developed new direct storage access protocols that leverage RDMA, and allow users remote storage access at the speeds of SSDs or even RAM. As an example, using iSCSI over RDMA achieves more than 95% bandwidth efficiency, while using it over TCP/IP achieves only around 50% efficiency, and this can be seen on both InfiniBand and 10 or 40GbE, using RoCE.
Business intelligence applications, risk analysis and back-testing can all benefit from these types of speeds from storage, and that benefit can have a very high return-on-investment. For example, if a new and improved algorithm goes live in a day instead of a week, there’s a lot of additional money to be made, and those algorithms are typically tested against days, weeks or months of historical tick data.
HFTR: What kind of innovations are expected in the networking space in the next 2 years?
YH: One obvious direction is the increase in speed. We will see 100Gb/s InfiniBand and Ethernet coming to market in the next couple of years. With the innovations in speed, there will be greater emphasis on the role of the interconnect in driving business results and lowering data center CapEx and OpEx. The data center depends on the interconnect for efficient connectivity between servers and between servers and storage.
Another hot area is virtualization, with native virtualization support embedded in the switches and NICs. On the physical level we may see lower cost optical solutions that will target the 100Gb/s transition.
Another interesting trend is Software Defined Networking or OpenFlow, where switches provide low-level APIs and controller software can program switch forwarding and policies. This trend will commoditize some of the switching software as well as allow better user control over the network. Some large-scale service providers already use this type of technology to lower the cost of larger network fabrics.
HFTR: Thank you, Yaron.
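The split between controller and switch can be pictured with a toy match-action table. This is a conceptual sketch only: it is not the OpenFlow wire protocol or any real controller API, and the class names, field names and action strings are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class FlowRule:
    match: dict        # header fields that must all equal the packet's values
    action: str        # hypothetical action string, e.g. "output:2"
    priority: int = 0

@dataclass
class Switch:
    flow_table: list = field(default_factory=list)

    def install(self, rule):
        """Called by the controller to program forwarding behaviour."""
        self.flow_table.append(rule)
        self.flow_table.sort(key=lambda r: r.priority, reverse=True)

    def forward(self, packet):
        """Return the action of the highest-priority matching rule."""
        for rule in self.flow_table:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "send_to_controller"  # table miss: ask the controller

# The "controller" installs a rule steering one UDP feed to port 2.
switch = Switch()
switch.install(FlowRule({"ip_dst": "10.0.0.5", "udp_dst": 5000}, "output:2", priority=10))

print(switch.forward({"ip_src": "10.0.0.1", "ip_dst": "10.0.0.5", "udp_dst": 5000}))
print(switch.forward({"ip_src": "10.0.0.1", "ip_dst": "10.0.0.9", "udp_dst": 5000}))
```

The switch holds only the table and the lookup logic, while the forwarding policy lives in whatever software calls `install`; that separation is what lets the switching software become a commodity.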
Mellanox Technologies Inc.
350 Oakmead Parkway, Suite 100
Sunnyvale, CA 94085
Tel: (408) 970-3400
Fax: (408) 970-3403
Yaron Haviv Bio:
Haviv has served as Mellanox’s Vice President of Data Center Solutions since February 2011. Prior to Mellanox, Mr. Haviv was Chief Technology Officer at Voltaire from 2001 to 2011, vice president of research and development and chief designer from 1999 to 2001, and was the chief designer responsible for the system architecture of Voltaire’s InfiniBand solutions from 1997 to 1999. Previously, Mr. Haviv served as a hardware and chip designer at Scitex Corporation Ltd., an Israel-based developer, manufacturer, marketer and servicer of interactive computerized prepress systems for the graphic design, printing, and publishing markets. Mr. Haviv holds a B.Sc. in Electrical Engineering from Tel-Aviv University, Israel.
Mellanox Technologies (NASDAQ: MLNX, TASE: MLNX) is a leading supplier of end-to-end InfiniBand and Ethernet connectivity solutions and services for servers and storage. Mellanox products optimize data center performance and deliver industry-leading bandwidth, scalability, power conservation and cost-effectiveness while converging multiple legacy network technologies into one future-proof architecture. The company offers innovative solutions that address a wide range of markets including HPC, enterprise, mega warehouse data centers, cloud computing, Internet and Web 2.0.