SaTScan is a software tool that employs scan statistics for the spatial and temporal analysis of clusters of events. [1] [2] [3] [4] The software is trademarked by Martin Kulldorff and was originally designed for public health and epidemiology: it identifies clusters of cases in space (geographical location) and time, and tests whether these clusters differ significantly from what would be expected by chance. [1] [5] [6] The software provides a user-friendly interface and a range of statistical methods, making it accessible to researchers and practitioners. [1] [7] While SaTScan is not a full geographic information system, its outputs can be imported into software such as ArcGIS or QGIS to visualize and analyze spatial data and to map the distribution of various phenomena.
SaTScan employs scan statistics to identify clusters of phenomena in space and time. [1] Scan statistics move regular shapes (usually circles) of varying sizes across a study area. [8] [9] For each circle, the software tests whether the occurrence of the phenomenon inside the circle differs significantly from what would be expected, compared with the area outside the circle. [8] [9]
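The circular scan described above can be illustrated with a short Python sketch. It assumes a Poisson model in which expected cases are proportional to population, and it scores each circle with the standard Poisson log-likelihood ratio. The function names, the `(x, y, cases, population)` tuple layout, and the sample data are hypothetical, chosen for illustration; this is not SaTScan's own code or interface.

```python
import math

def poisson_llr(c, e, total):
    """Log-likelihood ratio for a window with c observed and e expected cases.

    Assumes 0 < e < total. Windows without an excess of cases score 0.
    """
    if e <= 0 or c <= e:
        return 0.0
    inside = c * math.log(c / e)
    rest = total - c  # cases observed outside the window
    outside = rest * math.log(rest / (total - e)) if rest > 0 else 0.0
    return inside + outside

def best_circle(points, radii):
    """Scan circles centred on each location and return the highest-scoring one.

    points: list of (x, y, cases, population) tuples (hypothetical layout).
    Returns (llr, centre, radius).
    """
    total_cases = sum(p[2] for p in points)
    total_pop = sum(p[3] for p in points)
    best = (0.0, None, None)
    for cx, cy, _, _ in points:
        for r in radii:
            inside = [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= r]
            pop = sum(p[3] for p in inside)
            if pop == total_pop:
                continue  # a window covering the whole area has no "outside"
            c = sum(p[2] for p in inside)
            e = total_cases * pop / total_pop  # expected cases under the null
            llr = poisson_llr(c, e, total_cases)
            if llr > best[0]:
                best = (llr, (cx, cy), r)
    return best

# Four locations with equal populations; cases concentrate near the origin.
sample = [(0, 0, 10, 100), (1, 0, 9, 100), (10, 10, 1, 100), (11, 10, 0, 100)]
llr, centre, radius = best_circle(sample, radii=[1.5])
```

In this sample the two locations near the origin hold 19 of the 20 cases but only half the population, so the circle around them receives the highest score. A raw score like this is not yet a significance statement; that requires comparing it with scores obtained under the null hypothesis.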
SaTScan can analyze data retrospectively or prospectively. It can examine the data spatially, temporally, or in space and time simultaneously. [1] SaTScan supports several probability models, including Poisson, Bernoulli, and multinomial models, and it uses the Monte Carlo method to assess statistical significance. [1] [2] [9] Using these, it can look for areas where phenomena occur more or less often than expected. [1]
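The role of Monte Carlo simulation in a cluster test can be sketched in Python as follows. The sketch assumes a rank-based p-value: the observed test statistic is compared with the same statistic computed on data sets simulated under the null hypothesis. The `max_excess` statistic, the function names, and the sample counts are hypothetical stand-ins used for illustration, not SaTScan's actual statistic or API.

```python
import random

def monte_carlo_p_value(observed_stat, simulated_stats):
    """Rank-based p-value: share of replicates at least as extreme as observed."""
    exceed = sum(1 for s in simulated_stats if s >= observed_stat)
    return (exceed + 1) / (len(simulated_stats) + 1)

def simulate_null_counts(populations, total_cases, rng):
    """Scatter total_cases over locations with probability proportional to population."""
    counts = [0] * len(populations)
    for idx in rng.choices(range(len(populations)), weights=populations, k=total_cases):
        counts[idx] += 1
    return counts

def max_excess(counts, populations):
    """Toy test statistic (hypothetical): largest observed-minus-expected count."""
    total = sum(counts)
    pop = sum(populations)
    return max(c - total * p / pop for c, p in zip(counts, populations))

rng = random.Random(0)  # fixed seed so the run is repeatable
populations = [100, 100, 100, 100]
observed = [14, 2, 2, 2]  # one location holds most of the 20 cases
obs_stat = max_excess(observed, populations)
sims = [
    max_excess(simulate_null_counts(populations, sum(observed), rng), populations)
    for _ in range(999)
]
p = monte_carlo_p_value(obs_stat, sims)
```

With 999 replicates the smallest attainable p-value is 1/1000, which is why replicate counts of the form 10^k - 1 are a common choice for this kind of test.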
Results are output into a variety of formats, including ESRI Shapefile, HTML, and KML. [1]
SaTScan was developed by a group of epidemiologists and statisticians led by Martin Kulldorff, a Swedish biostatistician who was a professor of medicine at Harvard Medical School. [10] Version 1.0 of the software was released in 1997, and SaTScan has since become a widely used tool in public health research and practice. [11]
SaTScan was developed in response to a growing need for sophisticated tools to analyze disease outbreaks. [2] Before the development of SaTScan, few tools were available that could effectively analyze the spatial and temporal patterns of disease, making it difficult for public health authorities to respond effectively to outbreaks.
Since its release, SaTScan has been used in many public health research studies of infectious diseases, cancers, and other conditions. [1] Public health authorities and disease surveillance systems in many countries have also adopted the software, and it has broad applications for other types of data. [2]
SaTScan was used extensively by researchers during the COVID-19 pandemic. [12]
SaTScan was originally developed for epidemiology and public health. Since its release, it has been used in many public health research studies involving GIS, covering infectious diseases, cancers, and other conditions. [1] Public health authorities and disease surveillance systems in many countries have also adopted the software. [1]
SaTScan can identify areas of high pest or disease risk, informing crop and livestock management and disease control efforts. [13]
SaTScan can also be adapted and applied to certain astronomical studies, particularly those that involve analyzing spatial and temporal patterns in astronomical data. [2] [14] For example, SaTScan could identify clustering patterns in the distribution of galaxies or other astronomical objects, such as stars. [14]
SaTScan can identify hot spots and patterns in crime data, which can assist law enforcement agencies in allocating resources and developing crime reduction strategies. [2] [15]
SaTScan can identify areas of environmental concern, such as high levels of air pollution or water contamination. [16]
SaTScan can identify areas of high risk for wildlife diseases, which can inform disease management and conservation efforts. [17]
A geographic information system (GIS) consists of integrated computer hardware and software that store, manage, analyze, edit, output, and visualize geographic data. Much of this often happens within a spatial database; however, this is not essential to meet the definition of a GIS. In a broader sense, one may consider such a system also to include human users and support staff, procedures and workflows, the body of knowledge of relevant concepts and methods, and institutional organizations.
Epidemiology is the study and analysis of the distribution, patterns and determinants of health and disease conditions in a defined population.
Global Infectious Diseases and Epidemiology Online Network (GIDEON) is a web-based program for decision support and informatics in the fields of Infectious Diseases and Geographic Medicine. Due to the advancement of both disease research and digital media, print media can no longer follow the dynamics of outbreaks and epidemics as they emerge in "real time." As of 2005, more than 300 generic infectious diseases occur haphazardly in time and space and are challenged by over 250 drugs and vaccines. 1,500 species of pathogenic bacteria, viruses, parasites and fungi have been described. GIDEON works to combat this by creating a diagnosis through geographical indicators, a map of the status of the disease in history, a detailed list of potential vaccines and treatments, and finally listing all the potential species of the disease or outbreak such as bacterial classifications.
Health geography is the application of geographical information, perspectives, and methods to the study of health, disease, and health care. Medical geography, a sub-discipline of, or sister field of health geography, focuses on understanding spatial patterns of health and disease in relation to the natural and social environment. Conventionally, there are two primary areas of research within medical geography: the first deals with the spatial distribution and determinants of morbidity and mortality, while the second deals with health planning, help-seeking behavior, and the provision of health services.
Spatial analysis is any of the formal techniques which studies entities using their topological, geometric, or geographic properties. Spatial analysis includes a variety of techniques using different analytic approaches, especially spatial statistics. It may be applied in fields as diverse as astronomy, with its studies of the placement of galaxies in the cosmos, or to chip fabrication engineering, with its use of "place and route" algorithms to build complex wiring structures. In a more restricted sense, spatial analysis is geospatial analysis, the technique applied to structures at the human scale, most notably in the analysis of geographic data. It may also be applied to genomics, as in transcriptomics data.
GIS or Geographic Information Systems has been an important tool in archaeology since the early 1990s. Indeed, archaeologists were early adopters, users, and developers of GIS and GIScience, Geographic Information Science. The combination of GIS and archaeology has been considered a perfect match, since archaeology often involves the study of the spatial dimension of human behavior over time, and all archaeology carries a spatial component.
Spatial epidemiology is a subfield of epidemiology focused on the study of the spatial distribution of health outcomes; it is closely related to health geography.
Disease Informatics (also infectious disease informatics) studies the knowledge production, sharing, modeling, and management of infectious diseases. It became a more studied field as a by-product of the rapid increases in the amount of biomedical and clinical data widely available, and to meet the demands for useful data analyses of such data.
Disease diffusion occurs when a disease is transmitted to a new location. It implies that a disease spreads, or pours out, from a central source. The idea of showing the spread of disease using a diffusion pattern is relatively modern, compared to earlier methods of mapping disease, which are still used today. According to Rytokonen, the goals of disease mapping are: 1) to describe the spatial variation in disease incidence to formulate an etiological hypothesis; 2) to identify areas of high risk in order to increase prevention; and 3) to provide a map of disease risk for a region for better risk preparedness.
Geographic information systems (GISs) and geographic information science (GIScience) combine computer-mapping capabilities with additional database management and data analysis tools. Commercial GIS systems are very powerful and have touched many applications and industries, including environmental science, urban planning, agricultural applications, and others.
A boundary problem in analysis is a phenomenon in which geographical patterns are differentiated by the shape and arrangement of boundaries that are drawn for administrative or measurement purposes. The boundary problem occurs because of the loss of neighbors in analyses that depend on the values of the neighbors. While geographic phenomena are measured and analyzed within a specific unit, identical spatial data can appear either dispersed or clustered depending on the boundary placed around the data. In analysis with point data, the measured dispersion depends on the boundary. In analysis with areal data, statistics should be interpreted with the boundary in mind.
In statistics, a scan statistic or window statistic is a problem relating to the clustering of randomly positioned points. An example of a typical problem is the maximum size of a cluster of points on a line or the longest series of successes recorded by a moving window of fixed length.
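The first example problem, the largest cluster of points on a line, can be sketched in Python as below. The sketch assumes a closed window of fixed length that slides along the line and reports the largest number of points it can cover at once; the function name and sample values are hypothetical.

```python
def scan_statistic(points, window):
    """Largest number of points covered by any closed interval of length `window`."""
    pts = sorted(points)
    best = 0
    left = 0  # index of the leftmost point still inside the current window
    for right, x in enumerate(pts):
        # Shrink from the left until the window [pts[left], x] fits.
        while x - pts[left] > window:
            left += 1
        best = max(best, right - left + 1)
    return best

# Three points fall within a window of length 2 (at positions 1, 2 and 3).
largest_cluster = scan_statistic([1, 2, 3, 10, 11, 20], window=2)
```

Sorting first lets the two indices sweep the data once, so the cost is dominated by the sort rather than by checking every possible window position.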
Head/tail breaks is a clustering algorithm for data with a heavy-tailed distribution, such as power-law and lognormal distributions. A heavy-tailed distribution can be described simply as a scaling pattern of far more small things than large ones: numerous small values, very few large ones, and some in between. The classification divides the data around the arithmetic mean into large things (the head) and small things (the tail), and then recursively repeats the division on the head until the notion of far more small things than large ones no longer holds, or until only more or less similar things remain. Head/tail breaks is used not just for classification but also for visualization of big data by keeping the head, since the head is self-similar to the whole. It can be applied not only to vector data such as points, lines, and polygons, but also to raster data such as a digital elevation model (DEM).
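The recursive division around the mean can be sketched in Python as follows. The stopping rule is an assumption made for illustration: the head must contain no more than 40% of the values for the split to count, a threshold often used with this method but not stated in the text above. The function name, the `head_limit` parameter, and the sample data are hypothetical.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Return the class breaks (successive means) found by head/tail division.

    head_limit is an assumed threshold: splitting stops once the head holds
    more than this share of the remaining values.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]
        # Stop when there is no head, or the head is no longer a clear minority.
        if not head or len(head) / len(data) > head_limit:
            break
        breaks.append(mean)
        data = head  # recurse on the large values only
    return breaks

# Heavy-tailed sample: many small values, a few medium ones, one large one.
sample = [1] * 16 + [5] * 4 + [50]
class_breaks = head_tail_breaks(sample)
```

For this sample the first mean separates the sixteen small values from the rest, and the second mean separates the single largest value, giving two breaks and three classes. Evenly spread data such as `[1, 2, 3, 4]` yields no breaks, because half of the values lie above the mean.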
Spatiotemporal patterns are patterns that occur in a wide range of natural phenomena and are characterized by spatial and temporal patterning. The general rules of pattern formation hold. In contrast to "static", purely spatial patterns, the full complexity of spatiotemporal patterns can only be recognized over time. Any kind of traveling wave is a good example of a spatiotemporal pattern. Besides the shape and amplitude of the wave, its time-varying position in space is an essential part of the entire pattern.
Data mining, the process of discovering patterns in large data sets, has been used in many applications.
A land use regression model is an algorithm often used for analyzing pollution, particularly in densely populated areas.
Martin Kulldorff is a Swedish biostatistician. He was a professor of medicine at Harvard Medical School from 2003 until his dismissal in 2024. He is a member of the US Food and Drug Administration's Drug Safety and Risk Management Advisory Committee and a former member of the Vaccine Safety Subgroup of the Advisory Committee on Immunization Practices at the Centers for Disease Control and Prevention.
Acoustic epidemiology refers to the study of the determinants and distribution of disease. It also refers to the analysis of sounds produced by the body through a single tool or a combination of diagnostic tools.
The uncertain geographic context problem or UGCoP is a source of statistical bias that can significantly impact the results of spatial analysis when dealing with aggregate data. The UGCoP is very closely related to the Modifiable areal unit problem (MAUP), and like the MAUP, arises from how we divide the land into areal units. It is caused by the difficulty, or impossibility, of understanding how phenomena under investigation in different enumeration units interact between enumeration units, and outside of a study area over time. It is particularly important to consider the UGCoP within the discipline of time geography, where phenomena under investigation can move between spatial enumeration units during the study period. Examples of research that needs to consider the UGCoP include food access and human mobility.
The modifiable temporal unit problem (MTUP) is a source of statistical bias that occurs in time series and spatial analysis when using temporal data that has been aggregated into temporal units. In such cases, the choice of temporal unit can affect the analysis results and lead to inconsistencies or errors in statistical hypothesis testing.