SaTScan

SaTScan is a software tool that employs scan statistics for the spatial and temporal analysis of clusters of events. [1] [2] [3] [4] The software is trademarked by Martin Kulldorff. It was originally designed for public health and epidemiology, to identify clusters of cases in both space (geographical location) and time and to test statistically whether these clusters differ significantly from what would be expected by chance. [1] [5] [6] The software provides a user-friendly interface and a range of statistical methods, making it accessible to researchers and practitioners. [1] [7] While SaTScan is not a full geographic information system, its outputs can be integrated with software such as ArcGIS or QGIS to visualize and analyze spatial data and to map the distribution of various phenomena.

Analysis

SaTScan employs scan statistics to identify clusters of phenomena in space and time. [1] Scan statistics evaluate a study area using windows of a regular shape (usually circles) of varying sizes. [8] [9] For each circle, the software tests whether the occurrence of the phenomenon inside the circle differs significantly from what would be expected, compared with the area outside the circle. [8] [9]
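The circular scan described above can be sketched in a few lines of Python. This is a minimal illustration, not SaTScan's actual implementation: it assumes a Poisson model with the log likelihood ratio from Kulldorff's 1997 paper [8], looks only for high-rate clusters, and centres windows on the data locations themselves. The function names, the `(x, y, cases, population)` tuple layout, and the `max_pop_fraction` parameter are hypothetical choices made for this example.

```python
import math


def poisson_llr(cases_in, expected_in, total_cases):
    """Log likelihood ratio for one window under a Poisson model.

    Returns 0 unless the window holds more cases than expected,
    so only high-rate clusters are scored (an assumption of this sketch).
    """
    if cases_in <= expected_in:
        return 0.0
    llr = cases_in * math.log(cases_in / expected_in)
    cases_out = total_cases - cases_in
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / (total_cases - expected_in))
    return llr


def scan_circles(locations, max_pop_fraction=0.5):
    """Find the circular window with the largest likelihood ratio.

    locations: list of (x, y, cases, population) tuples (hypothetical layout).
    Returns (best_llr, index_of_centre_location, radius).
    """
    total_cases = sum(loc[2] for loc in locations)
    total_pop = sum(loc[3] for loc in locations)
    best = (0.0, None, 0.0)
    for i, (cx, cy, _, _) in enumerate(locations):
        # Growing the circle is equivalent to adding locations in order
        # of their distance from the candidate centre.
        by_distance = sorted(
            locations, key=lambda loc: math.hypot(loc[0] - cx, loc[1] - cy)
        )
        cases_in = pop_in = 0
        for x, y, cases, pop in by_distance:
            cases_in += cases
            pop_in += pop
            if pop_in > max_pop_fraction * total_pop:
                break  # stop before the window covers too much of the population
            expected = total_cases * pop_in / total_pop
            llr = poisson_llr(cases_in, expected, total_cases)
            if llr > best[0]:
                best = (llr, i, math.hypot(x - cx, y - cy))
    return best
```

Calling `scan_circles` on a small list of locations returns the most unusual window. Whether that window is statistically significant is a separate question, because the maximum over many overlapping circles must be compared with its distribution under the null hypothesis.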

SaTScan can analyze data retrospectively or prospectively. It can examine the data spatially, temporally, or in space and time simultaneously. [1] SaTScan supports numerous probability models, including the Poisson, Bernoulli, and multinomial models, and uses Monte Carlo simulation to assess statistical significance. [1] [2] [9] Using these, it can look for areas where phenomena occur more or less often than expected. [1]
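The role of the Monte Carlo method can be illustrated with a short Python sketch. It is a generic Monte Carlo hypothesis test under stated assumptions, not SaTScan's own code: cases are redistributed at random in proportion to population (the null hypothesis of no clustering), a test statistic is recomputed for each random data set, and the p-value is the rank of the observed statistic among the simulated ones. The function names and the default of 999 replications are choices made for this example.

```python
import random


def random_case_counts(populations, total_cases, rng):
    """Scatter cases over locations with probability proportional to population."""
    counts = [0] * len(populations)
    picks = rng.choices(range(len(populations)), weights=populations, k=total_cases)
    for idx in picks:
        counts[idx] += 1
    return counts


def monte_carlo_p_value(observed_stat, simulate_stat, n_replications=999, seed=0):
    """Rank-based p-value: share of data sets at least as extreme as the observed one.

    simulate_stat is a callable taking a random.Random instance and returning
    the test statistic for one data set generated under the null hypothesis.
    """
    rng = random.Random(seed)
    at_least_as_extreme = sum(
        1 for _ in range(n_replications) if simulate_stat(rng) >= observed_stat
    )
    # The observed data set counts as one of the (n_replications + 1) data sets.
    return (at_least_as_extreme + 1) / (n_replications + 1)
```

As a toy usage, with six equally populated locations and 23 cases in total, `monte_carlo_p_value(10, lambda rng: max(random_case_counts([100] * 6, 23, rng)))` estimates how often a single location would receive 10 or more cases by chance alone. In a real scan the statistic would be the maximum likelihood ratio over all windows rather than a simple maximum count.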

Results are output into a variety of formats, including ESRI Shapefile, HTML, and KML. [1]

History

SaTScan was developed by a group of epidemiologists and statisticians led by Martin Kulldorff, a Swedish biostatistician and professor of medicine at Harvard Medical School. [10] Version 1.0 of the software was released in 1997, and SaTScan has since become a widely used tool in public health research and practice. [11]

SaTScan was developed in response to a growing need for sophisticated tools to analyze disease outbreaks. [2] Before the development of SaTScan, few tools were available that could effectively analyze the spatial and temporal patterns of disease, making it difficult for public health authorities to respond effectively to outbreaks.

Since its release, SaTScan has been used in many public health research studies, covering infectious diseases, cancers, and other conditions. [1] Public health authorities and disease surveillance systems in many countries have also adopted the software, and it has broad applications for other types of data. [2]

SaTScan was used extensively by researchers during the COVID-19 pandemic. [12]

Applications

Epidemiology

SaTScan was originally developed for epidemiology and public health, and it has been used in many public health research studies involving GIS, covering infectious diseases, cancers, and other conditions. [1] Public health authorities and disease surveillance systems in many countries have also adopted the software. [1]

Agriculture

SaTScan can identify areas of high pest or disease risk, informing crop and livestock management and disease control efforts. [13]

Astronomy

SaTScan can also be adapted and applied to certain astronomical studies, particularly those that involve analyzing spatial and temporal patterns in astronomical data. [2] [14] For example, SaTScan could identify clustering patterns in the distribution of galaxies or other astronomical objects, such as stars. [14]

Criminology

SaTScan can identify hot spots and patterns in crime data, which can assist law enforcement agencies in allocating resources and developing crime reduction strategies. [2] [15]

Environmental monitoring

SaTScan can identify areas of environmental concern, such as high levels of air pollution or water contamination. [16]

Wildlife surveillance

SaTScan can identify areas of high risk for wildlife diseases, which can inform disease management and conservation efforts. [17]

References

  1. Kulldorff, Martin (2022). "SaTScanJ User Guide" (PDF). SaTScan. SaTScan™. Retrieved 11 February 2023.
  2. "SaTScan™ - Spatial and Space-Time Scan Statistics". National Cancer Institute: The Division of Cancer Control and Population Sciences (DCCPS). Retrieved 11 February 2023.
  3. Blair, Kimberly (26 October 2014). "UWF students turn quality-of-life data detectives". Pensacola News Journal. Retrieved 11 February 2023.
  4. Glaz, J.; Naus, J.; Wallenstein, S. (2001). "Introduction". Scan Statistics. Springer Series in Statistics. pp. 3–9. doi:10.1007/978-1-4757-3460-7_1. ISBN 978-1-4419-3167-2.
  5. Elias, Johannes; Harmsen, Dag; Claus, Heike; Hellenbrand, Wiebke; Frosch, Matthias; Vogel, Ulrich (2006). "Spatiotemporal Analysis of Invasive Meningococcal Disease, Germany". Emerging Infectious Diseases. 12 (11): 1689–1695. PMC 3372358. PMID 17283618.
  6. Yang, Shu-qin; Fang, Zheng-gang; Lv, Cai-xia; An, Shu-yi; Guan, Peng; Huang, De-sheng; Wu, Wei (February 2022). "Spatiotemporal cluster analysis of COVID-19 and its relationship with environmental factors at the city level in mainland China". Environmental Science and Pollution Research. 29 (9): 13386–13395. doi:10.1007/s11356-021-16600-9. PMC 8483427. PMID 34595708.
  7. "SaTScan" (PDF). SaTScan License Agreement. SaTScan™. Retrieved 11 February 2023.
  8. Kulldorff, Martin (1997). "A spatial scan statistic" (PDF). Communications in Statistics – Theory and Methods. 26 (6): 1481–1496. doi:10.1080/03610929708831995.
  9. Cromley, Ellen K.; McLafferty, Sara L. (2002). GIS and Public Health. The Guilford Press. ISBN 1-57230-707-2.
  10. "Martin Kulldorff, Ph.D." Hillsdale College: Washington DC Campus. Retrieved 11 February 2023.
  11. "SaTScan Version History" (PDF). SaTScan. SaTScan™. Retrieved 11 February 2023.
  12. Desjardins, M.R.; Hohl, A.; Delmelle, E.M. (2020). "Rapid surveillance of COVID-19 in the United States using a prospective space-time scan statistic: Detecting and evaluating emerging clusters". Applied Geography. 118: 102202. doi:10.1016/j.apgeog.2020.102202. PMC 7139246. PMID 32287518.
  13. Frössling, Jenny; Nødtvedt, Ane; Lindberg, Ann; Björkman, Camilla (2008). "Spatial analysis of Neospora caninum distribution in dairy cattle from Sweden". Geospatial Health. 3 (1): 39–45. doi:10.4081/gh.2008.230. PMID 19021107. Retrieved 11 February 2023.
  14. de la Fuente Marcos, R.; de la Fuente Marcos, C. (January 2008). "From Star Complexes to the Field: Open Cluster Families". The Astrophysical Journal. 672 (1): 342–351. Bibcode:2008ApJ...672..342D. doi:10.1086/524028. S2CID 250775794. Retrieved 11 February 2023.
  15. Zeoli, April M.; Pizarro, Jesenia M.; Grady, Sue C.; Melde, Christopher (12 October 2012). "Homicide as Infectious Disease: Using Public Health Methods to Investigate the Diffusion of Homicide". Justice Quarterly. 31 (3): 609–632. doi:10.1080/07418825.2012.732100. S2CID 70487308.
  16. Gao, Jie; Zhang, Zhijie; Hu, Yi; Jianchao, Bian; Jiang, Wen; Xiaoming, Wang; Liqian, Sun; Qingwu, Jiang (2014). "Geographical Distribution Patterns of Iodine in Drinking-Water and Its Associations with Geological Factors in Shandong Province, China". International Journal of Environmental Research and Public Health. 11 (5): 5431–5444. doi:10.3390/ijerph110505431. PMC 4053898. PMID 24852390.
  17. Carricondo-Sanchez, David; Odden, Morten; Linnell, John D. C.; Odden, John (19 April 2017). "The range of the mange: Spatiotemporal patterns of sarcoptic mange in red foxes (Vulpes vulpes) as revealed by camera trapping". PLOS ONE. 12 (4): e0176200. doi:10.1371/journal.pone.0176200. PMC 5397041. PMID 28423011.

SaTScan official website