Urban scaling [1] is an area of research within the study of cities as complex systems. It examines how various urban indicators change systematically with city size.
The literature on urban scaling was motivated by the success of scaling theory in biology, itself motivated in turn by the success of scaling in physics. [2] [3] Crucial insights from a scaling analysis of a system can emerge from finding power-law functional relationships between variables of interest and the size of the system (as opposed to finding power-law probability distributions). Power laws have an implicit self-similarity that suggests universal mechanisms at work, which in turn supports the search for fundamental laws. [3] The study of power laws is closely linked to the study of critical phenomena in physics, in which emergent properties and scale invariance are central and organizing concepts. These concepts resurface in the study of complex systems, [4] [5] and are of particular importance in the urban scaling framework.
The phenomenon of scaling in biology is often referred to as allometric scaling. Some of these relationships were noted by Galileo (e.g., the cross-sectional area of animals' legs as a function of their body size) and then studied a century ago by Max Kleiber (see Kleiber's law) in terms of the relationship between basal metabolic rate and body mass. A theoretical explanation of allometric scaling laws in biology was provided by metabolic scaling theory. [2]
The application of scaling in the context of cities is inspired by the idea that urban activities are emergent phenomena arising from the interactions of many individuals in close physical proximity. This is in contrast to applying scaling to countries or other social delineations, which are more ad hoc sociological constructions. The expectation is that collective effects in cities should manifest as large-scale quantitative urban regularities that hold across cultures, countries, and history. If such regularities are observed, they would support the search for a general mathematical theory of cities. [6]
Indeed, Luis Bettencourt, Geoffrey West, and Jose Lobo's seminal work [7] demonstrated that many urban indicators are associated with population size through a power-law relationship, in which socio-economic quantities tend to scale superlinearly, [8] while measures of infrastructure (such as the number of gas stations) scale sublinearly with population size. [9] They argue for a quantitative, predictive framework to understand cities as collective wholes, guiding urban policy, improving sustainability, and managing urban growth. [1]
The literature has grown, with many theoretical explanations proposed for these emergent power laws. Ribeiro and Rybski summarized them in their paper "Mathematical models to explain the origin of urban scaling laws". [10] Examples include Arbesman et al.'s 2009 model, [11] Bettencourt's 2013 model, [12] Gomez-Lievano et al.'s 2017 model, [13] and Yang et al.'s 2019 model, [14] among others (see [10] for a more thorough review of the models). The ultimate explanation of the scaling laws observed in cities is still debated. [15] [16]
Urban scaling relationships take the power-law form

$$Y = Y_0 N^{\beta},$$

where $Y$ is the urban indicator, $Y_0$ is a constant, $N$ is the population size, and $\beta$ is the scaling exponent.
The key focus of urban scaling as a field (in contrast with other fields; see the "Economics" and "Sociology" sections below) is the emphasis on studying the origin and explanation of the particular values taken by the scaling exponents. While other fields have recognized a relationship between size and urban metrics, it is mainly researchers in the field of urban scaling who have been interested in the fact that, out of all the functional forms by which two variables could be related, and all the coefficients that could mediate the strength of the relationship, urban metrics and population size are related through power laws whose exponents fall slightly below or slightly above 1.
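As an illustration of how the exponent $\beta$ is typically estimated, the following minimal sketch fits an ordinary least-squares regression of $\ln Y$ on $\ln N$ across cities; the city data here are synthetic and purely hypothetical, not taken from any of the cited studies.

```python
import numpy as np

# Synthetic (hypothetical) cross-section of cities: populations and an output
# generated with a "true" exponent of 1.15 plus log-normal noise.
rng = np.random.default_rng(0)
N = rng.uniform(1e4, 1e7, size=500)                         # city population sizes
true_beta, Y0 = 1.15, 2.0
Y = Y0 * N**true_beta * rng.lognormal(0.0, 0.3, size=500)   # urban indicator

# Estimate beta as the slope of ln(Y) on ln(N) via ordinary least squares,
# the most common estimation strategy in the urban scaling literature.
beta_hat, ln_Y0_hat = np.polyfit(np.log(N), np.log(Y), deg=1)
print(f"estimated beta = {beta_hat:.3f}, estimated ln(Y0) = {ln_Y0_hat:.3f}")
```

The estimated slope recovers a value close to the exponent built into the synthetic data, which is how superlinear or sublinear scaling is diagnosed in practice.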
The urban scaling framework mostly focuses on cross-sectional relationships. That is, it describes the power-law relationship between urban metrics across many cities for a particular point in time.
The framework can be extended to understand whether a given city will follow or will deviate from the power-law relationship describing the whole urban system.
Assume $n$ cities in an urban system, and assume their populations grow exponentially with a fixed and constant rate $r_i$, so that $N_i(t) = N_i(0)\,e^{r_i t}$. Assume cities generate some type of output $Y_i$, which also grows exponentially, but with another rate $g_i$, such that $Y_i(t) = Y_i(0)\,e^{g_i t}$. Here, $N_i(t)$ and $Y_i(t)$ represent the population and the output of city $i$ at time $t$, respectively. Together, these two assumptions imply that

$$\frac{1}{r_i}\ln\frac{N_i(t)}{N_i(0)} = t = \frac{1}{g_i}\ln\frac{Y_i(t)}{Y_i(0)}.$$
In turn, this yields the following implicit power-law relationship between output and population:

$$Y_i(t) = \frac{Y_i(0)}{N_i(0)^{\beta_i}}\,N_i(t)^{\beta_i},$$

where $\beta_i = g_i / r_i$.
That is, if population size and output grow exponentially at different rates, they will be longitudinally related through a power law for any single city $i$. Furthermore, if the ratio between the initial output and the initial population size raised to the exponent is a constant independent of the city, $Y_i(0)/N_i(0)^{\beta_i} = Y_0$ with a common exponent $\beta_i = \beta$, then the same power law will describe the cross-sectional data, since the proportionality factor and the exponent in the power law will not depend on the subscript $i$. This is a very simple example in which urban scaling would arise both in time and in space, with a scaling exponent equal to the ratio of the growth rates. The relationship between temporal and cross-sectional scaling can be made more general.
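As a concrete numerical illustration of this result (with made-up growth rates), suppose a city's output grows at $g = 3\%$ per year while its population grows at $r = 2\%$ per year. Then

$$\beta = \frac{g}{r} = \frac{0.03}{0.02} = 1.5,$$

so along this growth path a doubling of the population multiplies the output by $2^{1.5} \approx 2.83$.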
The question of whether there is a relationship between temporal scaling and cross-sectional scaling is addressed by noting that the outcome variable is a function of population size and time (with perhaps some random noise), $Y_i(t) = F(N_i(t), t)$. There is some debate in the published literature on this topic, due to a lack of explicit definitions of what scaling means in time and in space.
Here, the following three relationships and definitions are assumed:

- Cross-sectional scaling: across cities at a fixed point in time, $Y \propto N^{\beta_{\text{cross}}}$, with $\beta_{\text{cross}} \equiv \left.\dfrac{\partial \ln Y}{\partial \ln N}\right|_{t}$.
- Longitudinal (temporal) scaling: for a single city followed over time, $\beta_{\text{long}} \equiv \dfrac{\left.\partial \ln Y/\partial t\right|_{N}}{\partial \ln N/\partial t}$.
- Total scaling: $\beta_{\text{total}} \equiv \dfrac{d\ln Y/dt}{d\ln N/dt}$.
Note that the longitudinal scaling exponent is the ratio of two derivatives with respect to time (with size held constant in the numerator, $\left.\partial \ln Y/\partial t\right|_{N}$), while the cross-sectional scaling exponent is the ratio of two partial derivatives with respect to size (i.e., holding time constant).
For clarity and convenience, let $y = \ln Y$ and $x = \ln N$, and drop the city-specific subscript $i$. Using the above conventions, the total derivative of $y$ with respect to time is

$$\frac{dy}{dt} = \frac{\partial y}{\partial x}\frac{dx}{dt} + \frac{\partial y}{\partial t}.$$
Since $x$ is a function of time only, $\dfrac{dx}{dt} = \dfrac{\partial x}{\partial t}$. Hence, dividing both sides by $\dfrac{dx}{dt}$, we conclude that

$$\frac{dy/dt}{dx/dt} = \frac{\partial y}{\partial x} + \frac{\partial y/\partial t}{\partial x/\partial t}.$$
Based on this, one can interpret $\dfrac{dy/dt}{dx/dt}$ to be a "total" urban scaling exponent, and thus

$$\beta_{\text{total}} = \beta_{\text{cross}} + \beta_{\text{long}}.$$
However, since $x$ is a function of time only ($x = x(t)$), both $x$ and $t$ change simultaneously over time. This interdependence makes it impossible to hold $x$ constant while observing changes in $t$, which is what would be needed to estimate the longitudinal exponent directly from empirical data. Consequently, only the total scaling exponent and the cross-sectional exponent can be estimated empirically, while the longitudinal exponent remains unobservable in practice due to the confounding effect of $x$'s dependence on time.
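A minimal sketch of how the two observable exponents are estimated from panel data, using entirely synthetic cities built under the exponential-growth assumptions above (common hypothetical growth rates and a made-up exponent in the initial conditions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical panel: 200 cities observed over 30 years.
n_cities, n_years = 200, 30
t = np.arange(n_years)

r, g = 0.02, 0.03            # common population and output growth rates (assumed)
beta_cross_true = 1.10       # exponent built into the initial conditions (assumed)
N0 = rng.uniform(1e4, 1e6, n_cities)
Y0 = 2.0 * N0**beta_cross_true

# N[i, k] and Y[i, k]: population and output of city i in year k.
N = N0[:, None] * np.exp(r * t[None, :])
Y = Y0[:, None] * np.exp(g * t[None, :])

# Cross-sectional exponent: slope of ln Y on ln N across cities in a single year.
beta_cross = np.polyfit(np.log(N[:, 0]), np.log(Y[:, 0]), 1)[0]

# "Total" exponent: slope of ln Y on ln N within a single city followed over time.
beta_total = np.polyfit(np.log(N[0, :]), np.log(Y[0, :]), 1)[0]

print(f"beta_cross ~ {beta_cross:.2f}")   # ~1.10, set by the initial conditions
print(f"beta_total ~ {beta_total:.2f}")   # ~1.50, the ratio g/r
# The longitudinal exponent is not estimated directly; under the decomposition
# above it would correspond to beta_total - beta_cross.
```

The sketch illustrates that the cross-sectional and total exponents need not coincide, and that only these two can be read off the data directly.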
Some early studies in economics can be seen as having contributed (unintentionally) to the early stages of the urban scaling literature through their analyses of how economic outcomes change with population size. One such study is Sveikauskas' 1975 "The Productivity of Cities", [21] which reports a positive association between the average productivity of workers and city population size.
Today, the field of urban economics is focused on understanding the causal underpinnings of the benefits that accrue when people come together in physical space. Hence, a large body of literature has focused on understanding the so-called "urban wage premium": the fact that nominal wages tend to be higher in larger cities.
The field of sociology has also investigated the relationship between socioeconomic variables and the size and density of populations.
For example, Émile Durkheim, a French sociologist, highlighted the sociological impacts of population density and growth in his 1893 dissertation, "The Division of Labour in Society." In his work, Durkheim emphasized the collective social effects of population. He proposed that an increase in population leads to more social interactions, resulting in competition, specialization, and eventually conflict, which then necessitates the development of social norms and integration. This concept, known as "dynamic density," was later expanded by American sociologist Louis Wirth, particularly in the context of urban settings. However, it wasn't until the 1970s that these ideas were translated into (sociological) mathematical models, sparking debates among sociologists about the complexities of urban agglomeration. [22] [23] [24]
Critics like Claude S. Fischer argued that mathematical models oversimplified the reality of social interactions in cities. Fischer contended that these models assumed urbanites interact randomly, akin to marbles in a jar, which fails to capture the nuanced and localized nature of city life. He pointed out that most city dwellers have limited interactions within their neighborhoods and rarely venture into other parts of the city, contradicting the notion that social interactions scale uniformly with population size. Fischer’s criticism emphasized the need for a deeper understanding of social systems, beyond mere quantitative models. [25]
Since the formulation of the urban scaling hypothesis, several researchers from the complexity field have criticized the framework and its approach. These criticisms often target the statistical methods used, suggesting that the relationship between economic output and city size may not be a power law. For instance, Shalizi (2011) [26] argues that other functions could fit the relationship between urban characteristics and population equally well, challenging the notion of scale invariance. Bettencourt et al. (2013) [27] responded that while other models might fit the data, the power-law hypothesis remains robust without a better theoretical alternative.
Other critiques by Leitão et al. (2016) [28] and Altmann (2020) [29] pointed out potential misspecifications in the statistical analysis, such as incorrect distribution assumptions and the independence of observations. These concerns highlight the need for theory to guide the choice of statistical methods. Additionally, the issue of defining city boundaries raises conceptual challenges. Arcaute et al. (2015) [30] and subsequent studies showed that different boundary definitions yield different scaling exponents, questioning the premise of agglomeration economies. They suggest that models should consider the intra-city composition of economic and social activities rather than relying solely on aggregate measures.
Another criticism of the urban scaling approach relates to the over-reliance on averages when measuring individual-level quantities such as average wages or the average number of patents produced. Complex systems, such as cities, exhibit distributions of their individual components that are often heavy-tailed. Heavy-tailed distributions differ sharply from normal distributions and tend to generate extremely large values. The presence of extreme outliers can undermine the Law of Large Numbers, making averages unreliable. Gomez-Lievano et al. (2021) [31] showed that for log-normally distributed urban quantities (such as wages), averages are only meaningful for sufficiently large cities. Otherwise, spurious correlations between city size and productivity can emerge, misleadingly suggesting urban scaling.
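A minimal simulation sketch of this concern, using entirely synthetic, hypothetical data: every individual's output is drawn from the same heavy-tailed log-normal distribution regardless of city size, so the true scaling exponent is exactly 1, yet the standard log-log regression can return an exponent above 1 because small cities rarely sample the extreme values that dominate the mean.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: each individual's output (e.g., a wage) is drawn from the
# SAME heavy-tailed log-normal distribution in every city, so the per-capita
# mean does not truly depend on population size.
mu, sigma = 0.0, 2.5
n_cities = 500
pops = np.logspace(2, 5, n_cities).astype(int)   # city sizes from 1e2 to 1e5

# Total city output: sum of individual draws.
total_output = np.array([rng.lognormal(mu, sigma, size=n).sum() for n in pops])

# Regressing log(total output) on log(population) the usual way tends to yield
# an exponent above 1 even though no agglomeration effect was built in: small
# cities' (log) totals are pulled down by under-sampling of the extreme values.
beta_hat = np.polyfit(np.log(pops), np.log(total_output), 1)[0]
print(f"estimated exponent with no true size effect: beta ~ {beta_hat:.2f}")
```

The spurious exponent arises purely from the finite-sample behavior of heavy-tailed averages, which is the statistical pitfall the criticism points to.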