The Growing Demands for Historical Tick Data in the HFT and Systematic Hedge Fund Space
Wed, 27 Jul 2011 07:08:00 GMT
An Interview with Gordon Bloor
In this interview for the High Frequency Trading Review, Mike O’Hara talks to Gordon Bloor, Chief Executive of Morningstar Real-Time Data, the division of Morningstar that specialises in delivery of consolidated real time and historical market data.
Firms that run algorithmic, systematic and high frequency trading models cannot exist without historical data for back-testing their strategies. Data requirements are becoming increasingly broad (in terms of cross-market and cross-asset class) and deep (in terms of the granularity, i.e. full depth-of-book on a tick by tick basis). Sourcing such data cost-effectively, in a consolidated, normalised format, can be a real challenge to such firms.
Morningstar may not be as much of a household name as incumbents such as Thomson Reuters and Bloomberg, but a growing list of clients in the high frequency trading and systematic hedge fund space is testament to their ability to satisfy these demands.
HFT Review: Gordon, can you give us some quick background on who you are and what you do, maybe starting with the Tenfore connection?
Gordon Bloor: Certainly. At Tenfore we were a small, independent, London-based but globally operating vendor owned by a UK private equity firm. During Q2 2008, they took the decision to sell Tenfore and we put a process in place. Morningstar came into the frame and acquired us for a couple of reasons. The first was that they were already a large consumer of real-time feeds from different vendors globally. They wanted to leverage that real-time piece across lots of different platforms, but were limited by vendors’ commercial restrictions. So owning a vendor, or becoming a primary vendor directly connected to the exchanges, was a key strategic step for them to leverage that data across multiple platforms.
Having worked with many of the larger global vendors, Morningstar also realized that the global market isn’t serviced particularly well and they felt that as a company, with significant resource and with senior relationships across the whole of the investment spectrum, there was an opportunity for them to make an impact in the real-time space. They had the financial muscle and the relationships to take on the “big boys” and be a bit disruptive. So they saw this as an opportunity to enter a new sphere of the investment market business.
HFTR: And the business group that you run is Morningstar Real Time Data, which is very much focused on delivering not just real time data but historical data too, correct?
GB: Yes. So although we’re called “real time”, it’s a bit of a misnomer because what we’re finding is that clients increasingly want their vendor to do more of the ‘heavy lifting’ in terms of the processing of the data. Whether that’s storage and delivery of tick history or intra-day processed data like one or five-minute bars, etc., we’re seeing a big demand for people to take the results of that processing rather than taking the raw material and doing the processing on-site themselves.
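To make the "one or five-minute bars" mentioned above concrete, here is a minimal sketch of how raw ticks could be rolled up into open/high/low/close/volume bars. It is not Morningstar's processing: the function names `bucket_start` and `build_bars`, the `(timestamp, price, size)` tick layout and the sample ticks are all assumptions made for illustration. The sketch also assumes the ticks arrive sorted by time and that the bar length divides evenly into an hour.

```python
from collections import OrderedDict
from datetime import datetime


def bucket_start(ts, minutes):
    """Floor a timestamp to the start of its N-minute bar (N must divide 60)."""
    floored_minute = ts.minute - ts.minute % minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def build_bars(ticks, minutes=1):
    """Aggregate time-sorted (timestamp, price, size) ticks into OHLCV bars."""
    bars = OrderedDict()
    for ts, price, size in ticks:
        key = bucket_start(ts, minutes)
        bar = bars.get(key)
        if bar is None:
            # First tick in this interval sets every price field.
            bars[key] = {"open": price, "high": price, "low": price,
                         "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # the latest tick seen becomes the close
            bar["volume"] += size
    return bars


# Hypothetical sample ticks, invented purely for illustration.
sample_ticks = [
    (datetime(2011, 7, 27, 8, 0, 5), 100.0, 200),
    (datetime(2011, 7, 27, 8, 0, 40), 100.5, 100),
    (datetime(2011, 7, 27, 8, 1, 10), 100.2, 300),
    (datetime(2011, 7, 27, 8, 4, 59), 99.8, 50),
    (datetime(2011, 7, 27, 8, 5, 0), 100.1, 400),
]

one_minute_bars = build_bars(sample_ticks, minutes=1)
five_minute_bars = build_bars(sample_ticks, minutes=5)
```

Intervals with no ticks simply produce no bar in this sketch; whether to emit empty or carried-forward bars is a design choice a real pipeline would have to make.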
HFTR: If we look at the increasing numbers of quant funds, systematic hedge funds, HFT firms who require historical tick data to develop and test their strategies, what kinds of demands are you typically seeing from this community?
GB: The demands generally fall into two categories. There are the tactical demands, where someone has written a new trading strategy and wants to back test that against certain market conditions. To do that they need a limited amount of data to allow them to scan the markets that particular trading strategy will be active across and look for correlations that will prove or disprove the efficiency of the model. It’s tactical because it’s immediate and specific to a trading strategy.
The other demand is more strategic; people are building repositories of tick history for their own continued usage. We have a number of organisations for example, that have asked for copies of our entire historical tick database, which spans up to 90 months for each of the 80-odd exchanges that we cover. They want to take the full global historical tick database, which is almost a hundred terabytes compressed, and then take continual updates of that data on a daily basis. That gives them the freedom to run different strategies for different parts of their business, to analyze that data as much as they want on an ongoing basis.
HFTR: So how do your clients typically take delivery of these terabytes of data?
GB: Generally the history is delivered on physical media. Clients will have infrastructure on their site and we’ll seed that infrastructure by delivering tranches of data. Some people might ask us to give them the latest year first and then work their way back through the history. Other people might want to do it exchange by exchange, so they’ll ask us to send all the LSE history that we’ve got and then work on to NASDAQ, NYSE, and so on.
For ongoing tick files, Morningstar will provide dedicated infrastructure for delivery via FTP at the required intervals, either intraday or end of day (EOD).
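A rough back-of-the-envelope calculation helps show why a history of this size is seeded on physical media rather than over a network link. The sketch below is only an illustration: the 100 TB figure comes from the "almost a hundred terabytes compressed" mentioned earlier in the interview, while the link speeds, the 80 percent efficiency factor and the function name `transfer_days` are assumptions, not figures from Morningstar.

```python
def transfer_days(size_terabytes, link_megabits_per_second, efficiency=0.8):
    """Estimate the days needed to move a dataset over a network link.

    Uses decimal units (1 TB = 10**12 bytes). `efficiency` is an assumed
    fraction of the nominal link speed that is actually achieved.
    """
    size_bits = size_terabytes * 1e12 * 8
    effective_bits_per_second = link_megabits_per_second * 1e6 * efficiency
    seconds = size_bits / effective_bits_per_second
    return seconds / 86400  # 86,400 seconds in a day


# Assumed link speeds, chosen only for illustration.
days_at_100_mbps = transfer_days(100, 100)
days_at_1_gbps = transfer_days(100, 1000)
```

Under these assumptions the full history would take roughly 116 days over a 100 Mbit/s link and about 12 days over a 1 Gbit/s link, before any daily updates are considered.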
HFTR: What about the format of the data itself?
GB: We deliver it in a normal but highly compressed flat file format. Internally we store data in our own proprietary database format that’s been developed over time and designed for efficiency in both storing and retrieving the data. But delivery is very configurable so you can choose whichever instruments or markets you want, whichever fields you want, as well as the format you want the file to be in. There are lots of bespoke products out in the marketplace for manipulating tick data on the clients’ sites, like OneTick. We’ll load our data into whichever of those a client has.
HFTR: What about the granularity of the data in terms of top of book, level 2, full depth, etc?
GB: Our default structure, which is valid for 95 percent of the markets we take, is to take the vendor feed to its full depth. So for example with Euronext we take Cash Premium, which is full order book. However, historically Euronext had ten levels of depth, so the depth of our history has increased as Euronext has added services to those vendor feeds. Generally our policy wherever we collect data is that we’ll take the greatest level of depth available.
HFTR: I imagine you probably don’t know exactly what HFT firms and systematic hedge funds are doing with this data when they get it other than feeding it into their black boxes and doing all sorts of clever “stuff”. But are you seeing client requirements evolve in any way? Are there any requests coming up now that you didn’t see a year ago?
GB: I think the awareness of the availability of this data is increasing, and there’s a broader population of firms now looking at consolidated historical data. But you’re right, we very rarely get involved in the detailed analysis of what a client needs to do with the data because generally, HFT firms are relatively private about the way they conduct their business. They also have a full understanding of exactly what data they need in order to run those trading strategies, so they will generally come with a fairly evolved requirement for the depth and the breadth of the data they require.
Historically firms have struggled to find full depth data from a consolidated source. A user can go to lots of individual exchanges and take their full book data but if you’re taking files from separate sources, actually knitting it all together into a single format, single database, single synchronized feed is very difficult. Whereas if you take a file from a single source, which has already consolidated in real time, the synchronization is there and it’s time-stamped so people can really see the activity across multiple venues in a synchronized fashion.
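As a simple illustration of what "knitting together" separate files involves, the sketch below merges per-venue tick streams into a single time-ordered stream using Python's standard `heapq.merge`. The tick layout `(timestamp, venue, symbol, price)`, the function name `merge_venues` and the sample data are hypothetical. The sketch also assumes each file is already sorted and that timestamps from different sources are directly comparable, which is exactly the part that is hard in practice when the data has not been consolidated and time-stamped at a single point.

```python
import heapq
from operator import itemgetter


def merge_venues(*venue_streams):
    """Lazily merge per-venue tick streams into one time-ordered stream.

    Each stream must already be sorted by timestamp (the first field).
    Ticks with equal timestamps keep the order in which the streams
    were passed in.
    """
    return heapq.merge(*venue_streams, key=itemgetter(0))


# Hypothetical ticks: (seconds since midnight, venue, symbol, price).
venue_a = [
    (28800.105, "LSE", "VOD", 165.10),
    (28800.310, "LSE", "VOD", 165.15),
    (28801.002, "LSE", "VOD", 165.05),
]
venue_b = [
    (28800.200, "CHIX", "VOD", 165.12),
    (28800.950, "CHIX", "VOD", 165.08),
]

merged = list(merge_venues(venue_a, venue_b))
```

Because `heapq.merge` is lazy, the same approach works on file iterators without loading whole days of ticks into memory; it does nothing, however, to correct clock differences between sources.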
But this data is not available everywhere; there’s a limited choice. You can get it direct from the sources (i.e. the exchanges) or you can get it from one or two global vendors. With the large incumbent vendors, their commercial model really impacts how you can use the data internally. One of the unique things about Morningstar is that we offer an enterprise-wide agreement to use our data, so you just source it from us once and you can use it multiple times, in multiple geographies, across different teams, etc.
This is quite different to the model with most incumbent vendors, where there are normally very tight limitations on licensed use. I came across a scenario recently where a large Japanese bank was licensing tick data for a seven-figure fee annually. The bank wanted to provide the same data to different teams in the same office locations, but they were quoted an additional seven-figure fee! Partnering with Morningstar, that wouldn’t occur; you would have an enterprise-wide license and could re-use the service across the organization, saving repeat charges for the same data.
HFTR: What about the issue of integrating historical and real-time data? I imagine that the firms who run these models need to have a consistent format between what they’re doing on a daily and intraday basis, versus what they’re back testing with their models.
GB: The quality of Morningstar’s historical tick data stands testament to the quality of its consolidated data feeds; someone can look over a historical capture of real time feeds and assess the quality for their trading applications. We would always advise that, where the feeds are of the right nature for trading activities, there is an advantage in having consistency between the historical data you’re analysing to implement the trading strategy and the real time data you’re using for the trading strategy itself.
That said, in most environments, the tick data analysis is being done on exchange-traded data and it’s therefore very possible to analyze data from one vendor and trade using a direct feed or even a feed from another vendor.
HFTR: Looking at things from a slightly different perspective, moving away from the pure HFT side of things and towards firms who are servicing customers, one of the ways that historical data is being used more and more is on the compliance side, having to prove best execution through transaction cost analysis (TCA). Are you seeing a growth in clients using it this way?
GB: Yes, we’ve worked with a couple of third parties providing TCA products into the marketplace. In terms of compliance, the use of this kind of tick data is probably not as widespread as it is for testing strategies but obviously, under MiFID’s best execution requirements, historical book data is enormously valuable. So although our primary area of interest is still definitely from the trading side, we are seeing more demand from the post-transaction side as well.
HFTR: Can you tell us a little bit about your commercial model? If a client wants to test out some sample data, what is involved?
GB: Firstly, we provide samples based on the client’s requirements. The only real way of analyzing any source is to take relevant data for the markets and of the granularity that the strategy requires. Within reason, we’ll allow the client to select relevant data for the trial so they can be sure they are gaining a true representation of the data they will receive. If a client is interested in particular markets, they select either a short period of time for that entire market or group of instruments or a smaller data set over a longer period of time. They then specify their bespoke file format, before we create the file.
HFTR: Finally, I was wondering if you had any insights on what some of the “best practices” might be in terms of handling these very large datasets, and typical pitfalls or mistakes to avoid?
GB: Not understanding the logistics up front is a typical pitfall. It’s not complicated, but large tick data extracts will be many terabytes and are not something that will generally be processed on regular servers or in spreadsheet applications. Typically a high performance SAN (a large amount of specialized disks) is required alongside specialist applications. While these are readily available, the speed at which you can process the data will be directly proportional to the level of investment in the hardware and/or the choice of specialist application used to process the data.
HFTR: Thanks Gordon.