High frequency data


High frequency data refers to time-series data collected at an extremely fine scale. Advances in computational power over recent decades have made it possible to collect such data accurately and efficiently for analysis. [1] Used largely in finance, high frequency data provides observations at very frequent intervals that can be used to understand market behavior, dynamics, and microstructure. [2]


High frequency data collections were originally built by amassing tick-by-tick market data, in which each single 'event' (transaction, quote, price movement, etc.) is characterized by a 'tick', or one logical unit of information. Because a single day can generate a very large number of ticks, high frequency data collections are typically large, allowing high statistical precision. [3] The observations recorded across one day of a liquid market can equal the amount of daily data collected over 30 years. [3]
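As a rough, illustrative calculation (assuming roughly 252 trading days per year, a common convention not stated in the cited sources), 30 years of one-observation-per-day data amounts to only a few thousand points, a total that a liquid market can match in ticks within a single trading day:

```latex
30 \text{ years} \times 252 \ \tfrac{\text{trading days}}{\text{year}} \approx 7{,}560 \text{ daily observations}
```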

Use

[Image: Hang Seng stock index display board, Des Voeux Road Central, Hong Kong, October 2017. Data collected at high frequencies inform and update stock statistics in real time.]

With the introduction of electronic trading and Internet-based data providers, high frequency data has become much more accessible and allows price formation to be followed in real time. This has opened a large new area of research in which academics and researchers use the characteristics of high frequency data to develop models for predicting future market movements and risks. [3] Model predictions cover a wide range of market behaviors, including volume, volatility, price movement, and placement optimization. [4]

There is ongoing interest among both regulatory agencies and academics in transaction data and limit order book data, from which the broader implications of trading behavior, market outcomes, and market dynamics can be assessed using high frequency data models. Regulators take a particular interest in these models because the liquidity and price risks posed by newer forms of automated trading are not yet fully understood. [4]

High frequency data studies are valuable for their ability to trace irregular market activity over time, giving a better understanding of prices and trading behavior. Because the timing of market events matters, high frequency data are analyzed using point processes, which use the history of observed events to characterize the random occurrence of new ones. [4] This approach was pioneered by Robert Fry Engle III, winner of the 2003 Nobel Memorial Prize in Economic Sciences, who specializes in financial econometric methods built on financial data and point processes. [4]
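In standard point-process notation (generic textbook notation, not a specification taken from the cited sources), trades arrive at random times and the quantities of interest are the waiting times between them and the intensity of arrivals given past history:

```latex
0 < t_1 < t_2 < \dots \quad\text{(event times)}, \qquad
x_i = t_i - t_{i-1} \quad\text{(durations)}, \\[4pt]
N(t) = \#\{\, i : t_i \le t \,\}, \qquad
\lambda\!\left(t \mid \mathcal{F}_t\right)
  = \lim_{\Delta \downarrow 0}
    \frac{\Pr\!\left(N(t+\Delta) - N(t) > 0 \mid \mathcal{F}_t\right)}{\Delta}.
```

Because the conditional intensity depends on the history of past events, the framework accommodates irregularly spaced market data directly.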

High frequency data forms

High frequency data are primarily used in financial research and stock market analysis. Whenever a trade, quote, or electronic order is processed, the related data are collected and entered in a time-series format. As such, high frequency data are often referred to as transaction data. [4]

There are five broad levels of high frequency data that are obtained and used in market research and analysis:

Trade data

Individual trade data collected over an interval of a time series. [4] Two main variables describe a single trade observation: the time of the transaction, and a vector known as a 'mark', which characterizes the details of the transaction event, such as price and volume. [5]
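A minimal sketch of how such a record might be represented in code (the field names are illustrative, not a standard from the cited sources): each observation carries a time stamp plus a mark holding the transaction details.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Trade:
    """One tick of trade data: a time stamp plus its 'mark'."""
    timestamp: datetime   # time of the transaction
    price: float          # mark component: transaction price
    volume: int           # mark component: traded quantity

tick = Trade(datetime(2024, 1, 2, 9, 30, 0, 120000), price=101.25, volume=200)
print(tick)
```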

Trade and quote data

Data covering both trades and quotes, including price changes and direction, time stamps, and volume. Such information can be found in the TAQ (Trade and Quote) database operated by the NYSE. [4] Where trade data describe the executed transaction itself, quote data describe the best available trading conditions for a given exchange. This information can also indicate trading halts as well as opening and closing quotes. [6]

Fixed level order book data

In fully computerized trading systems, the depth of the market can be assessed from the limit order activity taking place in the background of a given market, recorded at a fixed number of price levels. [4]

Messages on all limit order activities

This data level contains the full information on limit order activity and allows the order flow to be reproduced at any given time using time stamps, cancellations, and buyer/seller identification. [4]
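As a simplified illustration of what such message data make possible (the message format below is hypothetical, not an exchange's actual protocol), the state of a limit order book can be reconstructed at any point by replaying add/cancel/execute messages in time-stamp order:

```python
from collections import defaultdict

# Hypothetical message stream: (timestamp, action, order_id, side, price, qty)
messages = [
    (1.000, "add",    1, "buy",  100.0, 300),
    (1.002, "add",    2, "sell", 100.2, 500),
    (1.005, "cancel", 1, "buy",  100.0, 300),
    (1.010, "add",    3, "buy",  100.1, 200),
    (1.014, "exec",   2, "sell", 100.2, 150),
]

orders = {}                  # order_id -> remaining resting quantity
book = defaultdict(float)    # (side, price) -> total resting quantity

for ts, action, oid, side, price, qty in sorted(messages):
    if action == "add":
        orders[oid] = qty
        book[(side, price)] += qty
    elif action == "cancel":
        book[(side, price)] -= orders.pop(oid, 0)
    elif action == "exec":   # partial or full execution against a resting order
        filled = min(qty, orders.get(oid, 0))
        orders[oid] = orders.get(oid, 0) - filled
        book[(side, price)] -= filled

# Book state after replaying all messages up to the last time stamp.
for (side, price), qty in sorted(book.items()):
    if qty > 0:
        print(side, price, qty)
```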

Data on order book snapshots

Snapshots of the order book can be recorded on an equidistant time grid, avoiding the need to reproduce the book from the full message stream. This, however, limits the ability to analyze individual trades, so snapshot data are more useful for studying order book dynamics than the interaction between the book and trading. [4]
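A minimal sketch of recording book state on an equidistant grid (the quotes and the one-second grid are illustrative assumptions): at each grid point the most recent snapshot is kept, and intermediate updates within an interval are discarded.

```python
import pandas as pd

# Hypothetical irregularly timed best bid/ask observations.
quotes = pd.DataFrame(
    {"bid": [100.1, 100.2, 100.2, 100.3], "ask": [100.3, 100.3, 100.4, 100.4]},
    index=pd.to_datetime([
        "2024-01-02 09:30:00.120",
        "2024-01-02 09:30:00.480",
        "2024-01-02 09:30:02.910",
        "2024-01-02 09:30:03.050",
    ]),
)

# Snapshot on an equidistant one-second grid: take the last state seen in each
# interval and carry it forward over intervals with no updates.
grid = quotes.resample("1s").last().ffill()
print(grid)
```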

Properties in financial analysis

In financial analysis, high frequency data can be organized on differing time scales, from minutes to years. [3] Because high frequency data arrive in a largely disaggregated form over a time series, compared with lower frequency methods of data collection, they have several distinctive characteristics that change how the data are understood and analyzed. Robert Fry Engle III categorizes these characteristics as irregular temporal spacing, discreteness, diurnal patterns, and temporal dependence. [7]

[Image: FTSE 100 index chart since 1984, showing high frequency data plotted over time.]

Irregular temporal spacing

High frequency data involve collecting a very large number of observations over a time series, and the individual observations tend to be spaced irregularly in time. This is especially clear in financial markets, where transactions may occur in rapid sequence or only after a prolonged period of inactivity. [7]

Discreteness

High frequency data largely record prices and transactions, which institutional rules prevent from rising or falling drastically within a short period of time. As a result, price changes are measured in multiples of one tick. [7] This restricted ability to fluctuate makes the data discrete; in stock exchanges, for example, price changes for actively traded stocks tend to stay within about 5 ticks. Because of this discreteness, high frequency data sets tend to exhibit a high level of kurtosis. [7]
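A small illustration of this discreteness (the tick size and the simulated price path are assumptions made for the example, not data from the cited sources): price changes expressed in ticks take only a handful of integer values, and their sharply peaked distribution shows up as high kurtosis.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tick = 0.01  # assumed minimum price increment

# Simulate transaction prices that move by a small whole number of ticks,
# mostly not moving at all between consecutive trades.
steps_in_ticks = rng.choice([-2, -1, 0, 1, 2], size=10_000,
                            p=[0.05, 0.10, 0.70, 0.10, 0.05])
prices = 100.0 + tick * np.cumsum(steps_in_ticks)

changes_in_ticks = pd.Series(np.diff(prices) / tick).round()
print(changes_in_ticks.value_counts().sort_index())  # only a few discrete values
print("excess kurtosis:", changes_in_ticks.kurt())   # well above 0, the normal benchmark
```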

Diurnal patterns

Analysis first carried out by Engle and Russell in 1998 notes that high frequency data follow a diurnal pattern, with the durations between trades being smallest near the open and the close of the market. Some markets that operate 24 hours a day, such as foreign exchange, still display a diurnal pattern based on the time of day. [7]

Temporal dependence

Due largely to the discreteness of prices, high frequency data are temporally dependent. The spread between buying and selling prices, held apart by the tick size, creates systematic patterns in successive transaction prices. Similarly, durations between trades and transaction rates tend to cluster over time, indicating temporal dependence in the data. [7]

Ultra-high frequency data

As Robert Fry Engle III observed, the growing availability of data at higher frequencies drove a movement from yearly, to monthly, to very frequently sampled collections of financial data. This movement toward higher frequencies is not unlimited, however: it reaches a natural limit once every transaction is recorded. [5] Engle coined the term ultra-high frequency data for this limiting frequency. A defining feature of data at this maximum frequency is extreme irregular spacing, since fully disaggregated collection preserves the uneven timing of individual events. [5] Rather than aggregating ultra-high frequency data into fixed time intervals, which would discard information and reduce the set to a lower frequency, models such as the autoregressive conditional duration (ACD) model can be used to account for the varying waiting times between observations. [5] Effective handling of ultra-high frequency data can increase the accuracy of econometric analyses, and involves two processes: data cleaning and data management. [6]
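As a sketch of the basic idea (the ACD(1,1) form shown here is the standard textbook specification; error distributions and parameter restrictions vary across applications), the ACD model treats each duration as a conditionally expected duration scaled by a positive i.i.d. error:

```latex
x_i = \psi_i\,\varepsilon_i, \qquad
\psi_i = \mathbb{E}\!\left[x_i \mid x_{i-1}, x_{i-2}, \dots\right]
       = \omega + \alpha\, x_{i-1} + \beta\, \psi_{i-1},
```

where $x_i = t_i - t_{i-1}$ is the duration between consecutive events, $\varepsilon_i > 0$ has unit mean (for example, exponentially distributed), and $\omega > 0$, $\alpha, \beta \ge 0$.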

Data cleaning

Data cleaning, or data cleansing, is the process of using algorithmic routines to remove unnecessary, irrelevant, and incorrect records from high frequency data sets. [6] Ultra-high frequency data analysis requires a clean sample of records to be useful for study, and as the velocity of ultra-high frequency collection increases, more errors and irrelevant records are likely to appear in the collection. [6] Errors can be attributed to human error, whether intentional (e.g. 'dummy' quotes) or unintentional (e.g. typing mistakes), or to computer error arising from technical failures. [8]
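A minimal sketch of the kind of rule-based filter involved (the rules and thresholds below are illustrative assumptions, not the algorithms proposed in the cited papers): discard records with impossible values, then drop prices that deviate too far from a rolling neighborhood of surrounding trades.

```python
import pandas as pd

def clean_trades(trades: pd.DataFrame, window: int = 25, k: float = 5.0) -> pd.DataFrame:
    """Remove obviously invalid ticks and local price outliers.

    trades: DataFrame with columns 'price' and 'volume', indexed by time stamp.
    window: number of neighboring trades used as the local reference.
    k: how many local standard deviations a price may deviate before removal.
    """
    # Rule 1: drop records that cannot be valid trades.
    out = trades[(trades["price"] > 0) & (trades["volume"] > 0)].copy()

    # Rule 2: drop prices far from a centered rolling median of their neighbors.
    med = out["price"].rolling(window, center=True, min_periods=5).median()
    std = out["price"].rolling(window, center=True, min_periods=5).std()
    keep = ((out["price"] - med).abs() <= k * std.clip(lower=1e-8)) | med.isna()
    return out[keep]
```

Published cleaning procedures, such as those discussed by Brownlees and Gallo [6] and Verousis and ap Gwilym [8], use more refined neighborhood definitions and trimming rules than this sketch.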

Data management

Data management refers to the process of selecting a specific time series of interest within a set of ultra-high frequency data and organizing it for analysis. Several transactions may be reported at the same time stamp and at different price levels, while econometric models generally require one observation per time stamp, so some form of data aggregation is needed for proper analysis. [6] Data management can also help address characteristics of ultra-high frequency data such as irregular spacing, bid-ask bounce, and market opening and closing effects. [6]
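A minimal sketch of this kind of aggregation (the choice of volume-weighted average price and summed volume per time stamp is one common convention, assumed here for illustration rather than prescribed by the cited sources):

```python
import pandas as pd

# Hypothetical ticks, with two transactions sharing the same time stamp.
ticks = pd.DataFrame(
    {
        "timestamp": pd.to_datetime([
            "2024-01-02 09:30:00.250",
            "2024-01-02 09:30:00.250",
            "2024-01-02 09:30:01.100",
        ]),
        "price": [100.10, 100.12, 100.15],
        "volume": [200, 300, 100],
    }
)

# Collapse simultaneous trades into one observation per time stamp:
# volume-weighted average price and total traded volume.
agg = (
    ticks.assign(pv=ticks["price"] * ticks["volume"])
    .groupby("timestamp")
    .agg(volume=("volume", "sum"), pv=("pv", "sum"))
    .assign(vwap=lambda d: d["pv"] / d["volume"])
    .drop(columns="pv")
)
print(agg)
```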

Alternate uses outside of financial trading

A study published in the journal Freshwater Biology on episodic weather effects on lakes highlights the use of high frequency data to better understand meteorological drivers and the consequences of "events", or sudden changes to the physical, chemical, and biological parameters of a lake. [9] Thanks to advances in data collection technology and human networks, coupled with the placement of high frequency monitoring stations at a variety of lake types, these events can be explored more effectively. The use of high frequency data in these studies is noted to be an important factor in enabling analysis of rapidly occurring weather changes at lakes, such as wind speed and rainfall, improving understanding of lakes' capacity to handle events amid increasing storm severity and climate change. [9]

High frequency data has also been found to be useful in forecasting inflation. A study by Michele Modugno in the International Journal of Forecasting indicates that using daily and monthly data at high frequency generally improved the forecast accuracy of total CPI inflation in the United States. [10] The study compared lower frequency models with one that considered all variables at high frequency, and ultimately found that more accurate tracking of the highly volatile transport and energy components of prices in the high frequency model led to better performance and more accurate results. [10]

The use of half-life estimation to evaluate the speed of mean reversion in economic and financial variables has faced sampling problems: according to early AR process models, a half-life of about 13.53 years would require 147 years of annual data to estimate. [11] As a result, some scholars have used high frequency data to estimate the half-life of annual data. While the use of high frequency data faces some limitations in recovering the true half-life, mainly due to estimator bias, a high frequency ARMA model has been found to estimate the half-life consistently and effectively without requiring such long spans of annual data. [11]
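For context, in an AR(1) process $y_t = \rho\, y_{t-1} + \epsilon_t$ the half-life of a shock follows the standard formula below; the figure of roughly 13.5 years quoted above corresponds, under this formula, to an annual autoregressive coefficient of about 0.95 (this back-of-the-envelope check is illustrative, not taken from the cited study):

```latex
\text{half-life} = \frac{\ln(0.5)}{\ln(\rho)},
\qquad \rho = 0.95 \;\Rightarrow\; \frac{\ln 0.5}{\ln 0.95} \approx 13.5 \text{ years}.
```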

References

  1. Tsay, R. S. (2000). Editor's introduction to panel discussion on analysis of high-frequency data. Journal of Business & Economic Statistics, 18(2), 139. doi: 10.1080/07350015.2000.10524855
  2. Andersen, T. G. (2000). Some reflections on analysis of high-frequency data. Journal of Business & Economic Statistics, 18(2), 146–153. doi: 10.1080/07350015.2000.10524857
  3. Dacorogna, M. M. (2001). An Introduction to High-Frequency Finance. San Diego: Academic Press.
  4. Hautsch, N. (2012). Econometrics of Financial High-Frequency Data. Heidelberg; Berlin: Springer. doi: 10.1007/978-3-642-21925-2
  5. Engle, R. F. (2000). The econometrics of ultra-high-frequency data. Econometrica, 68(1), 1–22. doi: 10.1111/1468-0262.00091
  6. Brownlees, C. T., & Gallo, G. M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics and Data Analysis, 51(4), 2232–2245. doi: 10.1016/j.csda.2006.09.030
  7. Russell, J. R., & Engle, R. F. (2010). Analysis of high-frequency data. Handbook of Financial Econometrics, Vol. 1, 383–426. doi: 10.1016/B978-0-444-50897-3.50010-9
  8. Verousis, T., & ap Gwilym, O. (2010). An improved algorithm for cleaning ultra high-frequency data. Journal of Derivatives & Hedge Funds, 15(4), 323–340. doi: 10.1057/jdhf.2009.16
  9. Jennings, E., Jones, S., Arvola, L., Staehr, P. A., Gaiser, E., Jones, I. D., et al. (2012). Effects of weather-related episodic events in lakes: An analysis based on high-frequency data. Freshwater Biology, 57(3), 589–601. doi: 10.1111/j.1365-2427.2011.02729.x
  10. Modugno, M. (2013). Now-casting inflation using high frequency data. International Journal of Forecasting, 29(4), 664–675. doi: 10.1016/j.ijforecast.2012.12.003
  11. Huang, M., Liao, S., & Lin, K. (2015). Augmented half-life estimation based on high-frequency data. Journal of Forecasting, 34(7), 523–532. doi: 10.1002/for.2342