Stock correlation network


A stock correlation network is a type of financial network based on stock price correlation, used for observing, analyzing, and predicting stock market dynamics.


Background

In the last decade, financial networks have attracted growing attention from the research community. A study of a company-ownership network showed a power-law distribution, with the majority of companies controlled by a small number of people. Another study focused on boards of directors, where two companies were linked if they shared a board member. The resulting board-membership network likewise followed a power law, with a small number of board members representing a large number of companies. Several studies have proposed network-based models for studying the stock correlation network. [1] [2] [3] [4] The stock correlation network has proven effective in predicting market movements. Onnela and Chakraborti showed that the average distance between stocks can be a significant indicator of market dynamics. [5] Their work covered the stock market of 1985–1990, a period that included the stock market crash of 1987 (Black Monday). Khandani and Lo studied a network of different hedge funds and observed distinctive patterns before the August 2007 stock market turbulence. [6]

Methods

The basic approach to building a stock correlation network involves two steps. The first step finds the correlation between each pair of stocks from their corresponding time series. The second step applies a criterion to connect the stocks based on their correlation. The most popular method for connecting two correlated stocks is the minimum spanning tree; the other methods are the planar maximally filtered graph and the winner-take-all method. In all three methods, the procedure for finding the correlation between stocks is the same.

Step 1: Select the desired time series data. The time series data can be daily closing prices, daily trading volumes, daily opening prices, or daily price returns.

Step 2: For the time series selected in step 1, compute the cross correlation for each pair of stocks using the cross correlation formula.

Step 3: Compute the cross correlation for all pairs of stocks and assemble the cross correlation matrix C. The entry c_ij is the cross correlation between stock i and stock j, computed with no time delay between their time series.

Step 4: In the minimum spanning tree method, a metric distance is calculated from the cross correlation matrix:

d_ij = sqrt(2(1 − c_ij))

where d_ij is the edge distance between stock i and stock j. The minimum spanning tree and the planar maximally filtered graph may cause loss of information, i.e., some high-correlation edges are discarded and some low-correlation edges are retained because of the topological reduction criteria. [7] Tse et al. introduced the winner-take-all connection criterion, in which this drawback of the minimum spanning tree and the planar maximally filtered graph is eliminated. [7] In the winner-take-all method, steps 1–3 are retained; however, in step 4 the nodes are linked based on a threshold.
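The steps above can be sketched in Python. This is a minimal sketch, not the procedure of any cited study: the returns are synthetic placeholder data, and scipy's generic minimum_spanning_tree routine stands in for the tree-construction step.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)

# Step 1 (toy data): daily price returns for 5 hypothetical stocks
# over 250 trading days; real studies would use actual price series.
returns = rng.normal(size=(250, 5))

# Steps 2-3: cross correlation matrix C; entry c_ij is the
# correlation between stock i and stock j.
C = np.corrcoef(returns, rowvar=False)

# Step 4: convert correlations to metric distances
# d_ij = sqrt(2 * (1 - c_ij)); high correlation -> short distance.
D = np.sqrt(2.0 * (1.0 - C))

# Minimum spanning tree over the distance matrix: a tree on n stocks
# keeps exactly n - 1 of the n*(n-1)/2 possible edges.
mst = minimum_spanning_tree(D)
print(mst.nnz)  # 4 edges for 5 stocks
```

Note that scipy treats zero entries (here, the zero diagonal) as absent edges, so only the positive inter-stock distances are considered.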

The threshold λ can be set between 0 and 1: stocks i and j are connected if their cross correlation exceeds λ. Tse et al. showed that for large threshold values (0.7, 0.8, and 0.9) the stock correlation networks are scale free, with the nodes linked in such a way that their degree distribution follows a power law. [7] For small threshold values, the network tends to be fully connected and does not exhibit a scale-free degree distribution.
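The winner-take-all criterion itself reduces to a single comparison per pair of stocks. A minimal sketch follows; the one-factor return model and the value λ = 0.5 are arbitrary choices for illustration, not taken from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy returns for 6 hypothetical stocks driven by a common "market"
# factor, so that some pairwise correlations exceed the threshold.
market = rng.normal(size=500)
returns = 0.8 * market[:, None] + 0.6 * rng.normal(size=(500, 6))
C = np.corrcoef(returns, rowvar=False)

# Winner-take-all: link stocks i and j only if c_ij >= lambda.
lam = 0.5
A = (C >= lam) & ~np.eye(6, dtype=bool)  # adjacency matrix, no self-loops

degrees = A.sum(axis=1)  # node degrees; their distribution is what
print(degrees)           # Tse et al. examine for power-law behavior
```

Raising λ prunes weaker edges, which is how the sparse, scale-free regime reported for λ = 0.7–0.9 emerges from an otherwise densely connected network.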


References

  1. Mantegna, R.N. (1999). "Hierarchical structure in financial markets". The European Physical Journal B. Springer Science and Business Media LLC. 11 (1): 193–197. arXiv: cond-mat/9802256 . Bibcode:1999EPJB...11..193M. doi:10.1007/s100510050929. ISSN   1434-6028. S2CID   16976422.
  2. Vandewalle, N.; Brisbois, F.; Tordoir, X. (2001). "Self-organized critical topology of stock markets". Quantitative Finance. 1: 372–375.
  3. Bonanno, Giovanni; Caldarelli, Guido; Lillo, Fabrizio; Mantegna, Rosario N. (2003-10-28). "Topology of correlation-based minimal spanning trees in real and model markets". Physical Review E. American Physical Society (APS). 68 (4): 046130. arXiv: cond-mat/0211546 . Bibcode:2003PhRvE..68d6130B. doi:10.1103/physreve.68.046130. ISSN   1063-651X. PMID   14683025. S2CID   16150661.
  4. Onnela, J.-P.; Chakraborti, A.; Kaski, K.; Kertész, J.; Kanto, A. (2003-11-13). "Dynamics of market correlations: Taxonomy and portfolio analysis". Physical Review E. 68 (5): 056110. arXiv: cond-mat/0302546 . Bibcode:2003PhRvE..68e6110O. doi:10.1103/physreve.68.056110. ISSN   1063-651X. PMID   14682849. S2CID   9619753.
  5. Onnela, J.-P.; Chakraborti, A.; Kaski, K.; Kertész, J. (2003). "Dynamic asset trees and Black Monday". Physica A: Statistical Mechanics and Its Applications. 324 (1–2): 247–252. arXiv: cond-mat/0212037 . Bibcode:2003PhyA..324..247O. doi:10.1016/s0378-4371(02)01882-4. ISSN   0378-4371. S2CID   11555914.
  6. Khandani, Amir E.; Lo, Andrew W. (2007). "What Happened to the Quants in August 2007?". Preprint.
  7. Tse, Chi K.; Liu, Jing; Lau, Francis C.M. (2010). "A network perspective of the stock market". Journal of Empirical Finance. Elsevier BV. 17 (4): 659–667. doi:10.1016/j.jempfin.2010.04.008. ISSN   0927-5398.