Statistical literacy is the ability to understand and reason with statistics and data. The abilities to understand and reason with data, or arguments that use data, are necessary for citizens to understand material presented in media such as newspapers, television, and the Internet. However, scientists also need to develop statistical literacy so that they can both produce and consume rigorous and reproducible research. Numeracy is an element of being statistically literate and, in some models of statistical literacy or for some populations (e.g., students in kindergarten through 12th grade/end of secondary school), it is a prerequisite skill. Being statistically literate is sometimes taken to include having the abilities to both critically evaluate statistical material and appreciate the relevance of statistically-based approaches to all aspects of life in general [1] [2] [3] or to the evaluation, design, and/or production of scientific work. [4]
Each day people are inundated with statistical information from advertisements ("4 out of 5 dentists recommend"), news reports ("opinion polls show the incumbent leading by four points"), and even general conversation ("half the time I don't know what you're talking about"). Experts and advocates often use numerical claims to bolster their arguments, and statistical literacy is a necessary skill to help one decide what experts mean and which advocates to believe. This is important because statistics can be used to produce misrepresentations of data that may seem valid. The aim of statistical literacy proponents is to improve the public understanding of numbers and figures.
Health decisions often manifest as statistical decision problems, but few doctors or patients are well equipped to engage with these data. [5]
Results of opinion polling are often cited by news organizations, but the quality of such polls varies considerably. Some understanding of the statistical technique of sampling is necessary in order to be able to correctly interpret polling results. Sample sizes may be too small to draw meaningful conclusions, and samples may be biased. The wording of a poll question may introduce a bias, and thus can even be used intentionally to produce a biased result. Good polls use unbiased techniques, with much time and effort being spent in the design of the questions and polling strategy. Statistical literacy is necessary to understand what makes a poll trustworthy and to properly weigh the value of poll results and conclusions.
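The effect of sample size on a poll's precision can be sketched numerically. The following minimal Python example assumes a simple random sample and the usual normal approximation for a proportion. The function name `margin_of_error` and the sample sizes are chosen here for illustration and do not come from any particular poll. The calculation covers random sampling error only; it says nothing about bias from question wording or unrepresentative samples, which a larger sample does not fix.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample and the normal approximation.
    proportion=0.5 is the worst case (largest margin).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Illustrative sample sizes (assumed, not from a real poll).
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

Under these assumptions the margin shrinks with the square root of the sample size, so a hundredfold increase in respondents narrows it only tenfold.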
For these reasons, and others, many programs around the world have been created to promote or improve statistical literacy. For example, many official statistical agencies such as Statistics Canada and the Australian Bureau of Statistics have programs to educate students in schools about the nature of statistics. A project [6] of the International Statistical Institute is the only international effort whose focus is to promote national programs and drives to increase the statistical literacy of all members of society. Numerous resources and activities, as well as a body of international experts, help maintain a very successful campaign across the continents. The United Nations Economic Commission for Europe has taken the notion of statistical literacy as the subject for its fourth guide to making data meaningful. Recognising the obligation of its royal charter to promote the public understanding of statistics, in 2010 the Royal Statistical Society launched a ten-year statistical literacy campaign. [7]
Scientific experiments, business models, and reports all use statistics. People involved in these fields have generally studied the meaning of statistical quantities such as averages and standard deviations. Many colleges and universities require an introductory course in statistics as part of a professional program.
Data visualization can contribute to either understanding or misunderstanding of the data or of the argument being made with the data. [8] [9] [10] [11] [12]
Studies have shown that human beings’ estimations of probabilities are heavily influenced by context and wording. Statistical reasoning may be difficult to develop and refine, which has led to labeling this type of reasoning as not intuitive. For example, people typically underestimate the probability of being involved in a car accident because their everyday interaction with vehicles gives the impression that they are safer than they actually are. Likewise, they tend to overestimate the probability of being attacked by a shark because of media or other influences. [13]
Gambling is one setting in which a lack of statistical literacy can be costly. Simple probability theory helps the individual either estimate or calculate the probabilities involved in games of chance. However, most individuals fail to approximate, for example, the probability of being dealt a full house in a game of poker. Not understanding these probabilities causes the individual to wager more or less than they would if they knew at least an estimate of the probability. Increasing individuals’ statistical literacy and knowledge of probability through classroom applications, textbook examples, and other methods could lead to better-informed citizens capable of making better-informed decisions, although this outcome is not guaranteed. [13]
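The full-house probability mentioned above can be worked out by counting. The sketch below assumes a standard 52-card deck and a five-card hand dealt without replacement; the function name `full_house_probability` is introduced here purely for illustration.

```python
from math import comb


def full_house_probability():
    """Probability that a 5-card hand from a 52-card deck is a full house."""
    # Choose the rank of the three-of-a-kind (13 ways) and its 3 suits,
    # then the rank of the pair (12 remaining ranks) and its 2 suits.
    favourable = 13 * comb(4, 3) * 12 * comb(4, 2)
    total = comb(52, 5)  # all possible 5-card hands
    return favourable / total


p = full_house_probability()
print(f"{p:.5f} (about 1 in {round(1 / p)})")
```

There are 3,744 full houses among 2,598,960 possible hands, so the probability is roughly 0.144%, or about 1 in 694.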
The definition of statistical literacy and opinions about it have been somewhat historically variable. Before 1940 some statistical skills passed to the sciences. Some statistics was then taught in grade school, "So a degree of statistical literacy will be universal in the future...". [14] More recently, expectations have been higher. "'Statistical Literacy' is the ability to understand and critically evaluate statistical results that permeate our lives...". [2] Those statistical results often originate from inferential methods which reached college statistics textbooks in about 1940. Statistics continues to advance. A lack of statistical literacy has long been condemned under many labels. [15] [16] [17] [18] H.G. Wells has been cited as saying that statistical understanding will one day be as important as being able to read or write [2] but he may have been referring more to the older idea of political arithmetic than modern statistics.
In statistics, sampling bias is a bias in which a sample is collected in such a way that some members of the intended population have a lower or higher sampling probability than others. It results in a biased sample of a population in which all individuals, or instances, were not equally likely to have been selected. If this is not accounted for, results can be erroneously attributed to the phenomenon under study rather than to the method of sampling.
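A small simulation can make the effect of unequal sampling probabilities concrete. The sketch below is a hypothetical setup, not a model of any real survey: the population values, the inclusion rule (probability proportional to the value), and all variable names are assumptions made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values spread uniformly between 0 and 100.
population = [random.uniform(0, 100) for _ in range(10_000)]

# Simple random sample: every member is equally likely to be selected.
fair_sample = random.sample(population, 1_000)

# Biased sample: a member with value v is included with probability v / 100,
# so members with high values are over-represented.
biased_sample = [v for v in population if random.random() < v / 100]

population_mean = statistics.mean(population)
fair_mean = statistics.mean(fair_sample)
biased_mean = statistics.mean(biased_sample)

print(f"population mean:    {population_mean:.1f}")
print(f"fair sample mean:   {fair_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
```

Under these assumptions the fair sample should land close to the population mean of about 50, while the biased sample's mean is expected to be near 67. That gap is a property of the sampling method, not of the population itself.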
Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.
Simpson's paradox is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined. This result is often encountered in social-science and medical-science statistics, and is particularly problematic when frequency data are unduly given causal interpretations. The paradox can be resolved when confounding variables and causal relations are appropriately addressed in the statistical modeling.
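The reversal can be reproduced with a small table of counts. The sketch below uses illustrative success counts for two treatments, A and B, split by case severity. The numbers and names are assumptions chosen for this example and are not drawn from the text above.

```python
# (successes, trials) per treatment within each severity group.
# Illustrative counts only.
groups = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    """Success rate as a fraction."""
    return successes / trials


def pooled(treatment):
    """Combine the counts for one treatment across all groups."""
    successes = sum(groups[g][treatment][0] for g in groups)
    trials = sum(groups[g][treatment][1] for g in groups)
    return successes, trials


for name, arms in groups.items():
    print(f"{name:>8}: A = {rate(*arms['A']):.3f}, B = {rate(*arms['B']):.3f}")

print(f"combined: A = {rate(*pooled('A')):.3f}, B = {rate(*pooled('B')):.3f}")
```

With these counts, A has the higher success rate within each group, yet B has the higher rate once the groups are combined. The reversal arises because A is applied mostly to severe cases and B mostly to mild ones, so severity acts as a confounding variable.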
In statistics, quality assurance, and survey methodology, sampling is the selection of a subset or a statistical sample of individuals from within a statistical population to estimate characteristics of the whole population. The subset is meant to reflect the whole population and statisticians attempt to collect samples that are representative of the population. Sampling has lower costs and faster data collection compared to recording data from the entire population, and thus, it can provide insights in cases where it is infeasible to measure an entire population.
Information design is the practice of presenting information in a way that fosters an efficient and effective understanding of the information. The term has come to be used for a specific area of graphic design related to displaying information effectively, rather than just attractively or for artistic expression. Information design is closely related to the field of data visualization and is often taught as part of graphic design courses. The broad applications of information design along with its close connections to other fields of design and communication practices have created some overlap in the definitions of communication design, data visualization, and information architecture.
Edward Rolf Tufte, sometimes known as "ET", is an American statistician and professor emeritus of political science, statistics, and computer science at Yale University. He is noted for his writings on information design and as a pioneer in the field of data visualization.
An opinion poll, often simply referred to as a survey or a poll, is a human research survey of public opinion from a particular sample. Opinion polls are usually designed to represent the opinions of a population by asking a series of questions and then extrapolating generalities in ratio or within confidence intervals. A person who conducts polls is referred to as a pollster.
Numeracy is the ability to understand, reason with, and apply simple numerical concepts. The charity National Numeracy states: "Numeracy means understanding how mathematics is used in the real world and being able to apply it to make the best possible decisions...It's as much about thinking and reasoning as about 'doing sums'". Basic numeracy skills consist of comprehending fundamental arithmetical operations like addition, subtraction, multiplication, and division. For example, if one can understand simple mathematical equations such as 2 + 2 = 4, then one would be considered to possess at least basic numeric knowledge. Substantial aspects of numeracy also include number sense, operation sense, computation, measurement, geometry, probability and statistics. A numerically literate person can manage and respond to the mathematical demands of life.
John Wilder Tukey was an American mathematician and statistician, best known for the development of the fast Fourier transform (FFT) algorithm and the box plot. The Tukey range test, the Tukey lambda distribution, the Tukey test of additivity, and the Teichmüller–Tukey lemma all bear his name. He is also credited with coining the term bit and with the first published use of the word software.
Chartjunk consists of all visual elements in charts and graphs that are not necessary to comprehend the information represented on the graph, or that distract the viewer from this information.
Infographics are graphic visual representations of information, data, or knowledge intended to present information quickly and clearly. They can improve cognition by using graphics to enhance the human visual system's ability to see patterns and trends. Similar pursuits are information visualization, data visualization, statistical graphics, information design, or information architecture. Infographics have evolved in recent years to serve mass communication, and thus are designed with fewer assumptions about the readers' knowledge base than other types of visualizations. Isotypes are an early example of infographics conveying information quickly and easily to the masses.
Statistics, when used in a misleading fashion, can trick the casual observer into believing something other than what the data shows. That is, a misuse of statistics occurs when a statistical argument asserts a falsehood. In some cases, the misuse may be accidental. In others, it is purposeful and for the gain of the perpetrator. When the statistical reasoning involved is false or misapplied, this constitutes a statistical fallacy.
This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.
Data and information visualization is the practice of designing and creating easy-to-communicate and easy-to-understand graphic or visual representations of a large amount of complex quantitative and qualitative data and information with the help of static, dynamic or interactive visual items. Typically based on data and information collected from a certain domain of expertise, these visualizations are intended for a broader audience to help them visually explore and discover, quickly understand, interpret and gain important insights into otherwise difficult-to-identify structures, relationships, correlations, local and global patterns, trends, variations, constancy, clusters, outliers and unusual groupings within data. When intended for the general public to convey a concise version of known, specific information in a clear and engaging manner, it is typically called information graphics.
Innumeracy: Mathematical Illiteracy and its Consequences is a 1988 book by mathematician John Allen Paulos about innumeracy as the mathematical equivalent of illiteracy: incompetence with numbers rather than words. Innumeracy is a problem with many otherwise educated and knowledgeable people. While many people would be ashamed to admit they are illiterate, there is very little shame in admitting innumeracy by saying things like "I'm a people person, not a numbers person", or "I always hated math", but Paulos challenges whether that widespread cultural excusing of innumeracy is truly acceptable.
Statistical graphics, also known as statistical graphical techniques, are graphics used in the field of statistics for data visualization.
Statistics education is the practice of teaching and learning of statistics, along with the associated scholarly research.
Data literacy is the ability to read, understand, create, and communicate data as information. Much like literacy as a general concept, data literacy focuses on the competencies involved in working with data. It is, however, not similar to the ability to read text since it requires certain skills involving reading and understanding data.
Intuitive statistics, or folk statistics, is the cognitive phenomenon where organisms use data to make generalizations and predictions about the world. This can be a small amount of sample data or training instances, which in turn contribute to inductive inferences about either population-level properties, future data, or both. Inferences can involve revising hypotheses, or beliefs, in light of probabilistic data that inform and motivate future predictions. The informal tendency for cognitive animals to intuitively generate statistical inferences, when formalized with certain axioms of probability theory, constitutes statistics as an academic discipline.
Graphical perception is the human capacity for visually interpreting information on graphs and charts. Both quantitative and qualitative information can be said to be encoded into the image, and the human capacity to interpret it is sometimes called decoding. The importance of human graphical perception, what we discern easily versus what our brains have more difficulty decoding, is fundamental to good statistical graphics design, where clarity, transparency, accuracy and precision in data display and interpretation are essential for understanding the translation of data in a graph to clarify and interpret the science.