Network Science Based Basketball Analytics

Network science based basketball analytics comprise various recent attempts to apply the perspective of network science to the analysis of basketball.

Overview

Traditional basketball statistics analyze individuals independently of their teammates or competitors, and traditional player positions are determined by individual attributes. In contrast, network-based analytics are obtained by constructing team- or league-level player networks, where individual players are nodes connected by ball movement or by some measure of statistical similarity. Metrics are then obtained by calculating network properties such as degree, density, centrality, clustering, and distance. This approach enriches the analysis of basketball with new individual- and team-level statistics and offers a new way of assigning positions to players.
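
As an illustration of the general approach, the following sketch (not taken from any of the cited papers; the players and pass counts are invented) builds a small weighted passing network and reads off a few of the standard properties mentioned above, using the networkx library:

```python
# A toy passing network: (passer, receiver, number of passes); all invented.
import networkx as nx

passes = [
    ("PG", "SG", 42), ("PG", "SF", 31), ("SG", "PG", 25),
    ("SF", "PF", 18), ("PF", "C", 22), ("C", "PG", 12),
]

G = nx.DiGraph()
G.add_weighted_edges_from(passes)

print(nx.density(G))                             # share of possible pass links that exist
print(nx.degree_centrality(G))                   # each player's involvement in ball movement
print(nx.average_clustering(G.to_undirected()))  # how interconnected the players are
```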

Team-level statistics

The biggest contribution to team-level metrics came from Arizona State University researchers led by Jennifer H. Fewell. Using data from the first round of the 2010 NBA playoffs, they constructed a network for each team, with players as nodes and ball movement between them as links. They examined the trade-off between two not necessarily mutually exclusive properties, division of labor and a team's unpredictability, measured by uphill downhill flux and team entropy respectively. [1]

Team entropy - A measure of unpredictability and variation in a team's offense, with higher entropy meaning more variation. It is calculated by aggregating individual Shannon entropies, where unpredictability is measured as the uncertainty of ball movement between any two nodes. [1]
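
A minimal sketch of the idea, assuming the aggregation is a simple sum of per-player Shannon entropies over each player's outgoing pass distribution (Fewell et al.'s exact weighting may differ); the pass counts are invented:

```python
import math

# Outgoing pass counts per player (player -> {receiver: count}); invented.
transitions = {
    "PG": {"SG": 40, "SF": 30, "PF": 20, "C": 10},
    "SG": {"PG": 50, "SF": 25, "C": 25},
    "C":  {"PG": 60, "PF": 40},
}

def shannon_entropy(counts):
    """Entropy (bits) of the empirical distribution given by the counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)

# Aggregate the per-player entropies; higher values = less predictable offense.
team_entropy = sum(shannon_entropy(c) for c in transitions.values())
print(team_entropy)
```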

Uphill downhill flux - Measures the division of labor, or the team's skill in moving the ball to the player with the best shooting percentage. According to Fewell et al., it can be interpreted as the average change in potential shooting percentage per pass. [1] The metric is calculated as the sum of the differences between the shooting percentages of the nodes at the ends of each edge, weighted by the probability of that edge:

$$\Phi = \sum_{i,j} p_{ij}\,(x_j - x_i),$$

where $p_{ij}$ is the probability of the link between players $i$ and $j$, and $x_i$ and $x_j$ are their shooting percentages. [1]
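
A direct computation of this sum might look as follows; the shooting percentages and link probabilities are invented for illustration:

```python
# Shooting percentages per player and pass probabilities per link; all invented.
shooting_pct = {"PG": 0.45, "SG": 0.48, "SF": 0.52, "PF": 0.55, "C": 0.60}
p = {("PG", "SG"): 0.15, ("PG", "C"): 0.10, ("C", "PG"): 0.05, ("SG", "SF"): 0.12}

# Phi = sum over links of p_ij * (x_j - x_i): the probability-weighted change
# in potential shooting percentage per pass.
flux = sum(prob * (shooting_pct[j] - shooting_pct[i]) for (i, j), prob in p.items())
print(flux)  # positive: on average passes move the ball toward better shooters
```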

Other measures include:

Team clustering coefficient - A direct application of the clustering coefficient. It measures how interconnected the players are: whether the ball moves through a single node or along many paths between all the players. [1]

Team degree centrality - Like the previous metric, it measures whether there is one dominant player on the team. It is calculated by the formula

$$C_D = \frac{\sum_{v \in V} \left[\deg(v^*) - \deg(v)\right]}{(|V| - 1)(|V| - 2)},$$

where $\deg(v)$ is the degree of node $v$, $v^*$ is the node with the highest degree, and $|V|$ is the number of nodes.

Combined low clustering and high degree centrality mean that the defense can double-team the dominant player, since without him the team has trouble moving the ball. [1]
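
The centralization formula above can be computed directly; a toy sketch, assuming the standard Freeman-style normalization:

```python
import networkx as nx

# A toy team network in which PG is the dominant hub; edges are invented.
G = nx.Graph([("PG", "SG"), ("PG", "SF"), ("PG", "PF"), ("PG", "C"), ("SG", "SF")])

degrees = dict(G.degree())
max_deg = max(degrees.values())
n = G.number_of_nodes()

# 1.0 for a pure star (one dominant hub), 0.0 when all degrees are equal.
centralization = sum(max_deg - d for d in degrees.values()) / ((n - 1) * (n - 2))
print(centralization)
```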

Average path length - Number of passes per play. [1]

Path flow rate - Number of passes per unit time. It measures how quickly the team moves the ball. [1]

Deviation from maximum operating potential - Treating players as nodes, ball movement as links, and true shooting percentage as efficiency, an analogy can be made to a traffic network. Each player $i$ is assumed to have a skill curve $f_i(x_i)$, which declines in the number of shots taken $x_i$. Individual maximization of efficiency yields the condition $f_1(x_1) = f_2(x_2) = \dots = f_n(x_n)$, whereas maximum efficiency is achieved by solving $\frac{dF_1}{dx_1} = \frac{dF_2}{dx_2} = \dots = \frac{dF_n}{dx_n}$, where $F_i(x_i) = x_i f_i(x_i)$.

The difference between these two constitutes the team's deviation from its maximum potential. [2]
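
A toy illustration of the gap between the selfish and the optimal shot allocation, assuming two players who share all shots and linear skill curves (the curves and numbers are invented, not Skinner's data):

```python
# Two players share all shots (x1 + x2 = 1) with skill curves f_i(x) = a_i - b_i * x.
a1, b1 = 0.60, 0.30   # player 1: better shooter, efficiency declines faster with usage
a2, b2 = 0.50, 0.20   # player 2

def team_eff(x1):
    """Team efficiency F1(x1) + F2(x2) for a given shot split."""
    x2 = 1 - x1
    return x1 * (a1 - b1 * x1) + x2 * (a2 - b2 * x2)

# Individual maximization: shots flow to whoever currently has the higher f,
# so f1(x1) = f2(x2), i.e. a1 - b1*x1 = a2 - b2*(1 - x1).
x1_nash = (a1 - a2 + b2) / (b1 + b2)

# Global optimum: equalize marginal efficiency d(x*f)/dx, i.e.
# a1 - 2*b1*x1 = a2 - 2*b2*(1 - x1).
x1_opt = (a1 - a2 + 2 * b2) / (2 * (b1 + b2))

print(team_eff(x1_opt) - team_eff(x1_nash))  # the team's efficiency shortfall
```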

Individual statistics

Success/Failure Ratio - The number of times the player (node) was involved in a successful play divided by the number of times the player was involved in an unsuccessful play. The metric is obtained from the team's play-by-play network. [1]

Under/over-performance - The metric is calculated from a projection of the bipartite player–team network: players are connected if they were part of the same team, and the links are weighted by how successful the team was while the players played together. Node centrality measures are then compared to reference centrality distributions for each node, obtained by bootstrap-based randomization procedures, and p-values are calculated. For example, the p-value of player $i$ is given by

$$p_i = \frac{1}{J} \sum_{j=1}^{J} I\!\left(\pi_i^{*(j)} \geq \pi_i^{0}\right),$$

where $\pi_i^{*(j)}$ is the reference centrality score from randomization $j$, $\pi_i^{0}$ is the calculated centrality score, and $J$ is the number of iterations. High p-values indicate under-performance, low p-values over-performance. [3]
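
A sketch of the bootstrap comparison, assuming the reference scores come from some randomization of the network (here they are simply simulated, and all numbers are invented):

```python
import random

def p_value(observed, reference_scores):
    """Fraction of reference (randomized) scores at least as large as the observed one."""
    return sum(ref >= observed for ref in reference_scores) / len(reference_scores)

# Toy example: an observed centrality of 0.42 against J = 1000 simulated
# reference draws standing in for the bootstrap-based randomization.
random.seed(0)
reference = [random.gauss(0.30, 0.08) for _ in range(1000)]
print(p_value(0.42, reference))  # a low p-value suggests over-performance
```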

Under-utilization - A player is under-utilized by his team if he has low degree centrality but is over-performing. [3]

Player positions

New basketball positions were classified by Stanford University student Muthu Alagappan, who, while working for the data visualization company Ayasdi, mapped a network of NBA players from a single season, linking them by the similarity of their statistics. Based on the node clusters, players were then grouped into 13 positions (a sketch of one such clustering approach follows the list below). [4]

Offensive Ball-Handler - A player who specializes in scoring and ball handling but has low averages of steals and blocks.

Defensive Ball-Handler - A player who specializes in assisting and stealing the ball but is average in scoring and shooting.

Combo Ball-Handler - A player who is above average in both offense and defense but doesn't excel in either.

Shooting Ball-Handler - A player who is above average in shot attempts and points scored per game.

Role-Playing Ball-Handler - Those who play few minutes and don't have a large impact on the team.

3-Point Rebounder - A big man and ball handler with above-average rebounds and three-point shots attempted and made.

Scoring Rebounder - A player with high scoring and rebounding averages.

Paint Protector - Those valued for blocking and rebounding, but with low average points scored.

Scoring Paint Protector - Players who are good at both offense and defense in the paint.

NBA 1st-Team - Those who are above average in most statistical categories.

NBA 2nd-Team - Similar to, but a bit worse than, NBA 1st-Team players.

Role Player - Similar to, but worse than, NBA 2nd-Team players.

One-of-a-Kind - Players so good and exceptional that they could not be categorized. [4]
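
The clustering approach mentioned above can be sketched as follows. Note that Ayasdi's actual tool relies on topological data analysis, which this simple similarity-threshold graph does not reproduce; the players and stat lines are invented:

```python
# Link players whose standardized season statistics are similar, then read
# candidate "positions" off the connected clusters of the resulting graph.
import networkx as nx
import numpy as np

players = ["A", "B", "C", "D"]
stats = np.array([   # [points, rebounds, assists, steals, blocks] per game
    [25.0,  4.0, 7.0, 1.5, 0.3],
    [24.0,  5.0, 6.5, 1.2, 0.4],
    [10.0, 11.0, 1.0, 0.5, 2.1],
    [ 9.0, 12.0, 1.5, 0.6, 1.8],
])
z = (stats - stats.mean(axis=0)) / stats.std(axis=0)  # standardize each stat

G = nx.Graph()
G.add_nodes_from(players)
for i in range(len(players)):
    for j in range(i + 1, len(players)):
        if np.linalg.norm(z[i] - z[j]) < 2.0:  # similarity threshold (arbitrary)
            G.add_edge(players[i], players[j])

# Each cluster of statistically similar players becomes a candidate position.
print(list(nx.connected_components(G)))  # e.g. [{'A', 'B'}, {'C', 'D'}]
```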

See also

Network science

Graph theory

Muthu Alagappan

APBRmetrics

References

  1. Fewell, J. H., Armbruster, D., Ingraham, J., Petersen, A., Waters, J. S. (2012). "Basketball Teams as Strategic Networks". PLoS ONE 7(11): e47445. doi:10.1371/journal.pone.0047445. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0047445
  2. Skinner, B. (2010). "The Price of Anarchy in Basketball". Journal of Quantitative Analysis in Sports 6(1), Article 3. https://arxiv.org/abs/0908.1801v4
  3. Piette, J., Pham, L., Anand, S. (2011). "Evaluating Basketball Player Performance via Statistical Network Modeling". MIT Sloan Sports Analytics Conference, Boston, U.S.A. http://www.sloansportsconference.com/wp-content/uploads/2011/08/Evaluating-Basketball-Player-Performance-via-Statistical-Network-Modeling.pdf
  4. Beckham, J. (2012). "Analytics Reveal 13 New Basketball Positions". Wired, April 30, 2012. https://www.wired.com