Boolean analysis

Boolean analysis was introduced by Flament (1976).[1] The goal of a Boolean analysis is to detect deterministic dependencies between the items of a questionnaire or similar data structures in observed response patterns. These deterministic dependencies have the form of logical formulas connecting the items. Assume, for example, that a questionnaire contains items i, j, and k. Examples of such deterministic dependencies are then i → j, i ∧ j → k, and i ∨ j → k.

Since the basic work of Flament (1976), a number of different methods for Boolean analysis have been developed. See, for example, Buggenhaut and Degreef (1987), Duquenne (1987), item tree analysis (Leeuwe, 1974), Schrepp (1999), or Theuns (1998). These methods share the goal of deriving deterministic dependencies between the items of a questionnaire from data, but differ in the algorithms used to reach this goal.

Item tree analysis

Item tree analysis (ITA) is a data-analytical method for constructing a hierarchical structure on the items of a questionnaire or test from observed response patterns.
Assume that we have a questionnaire with m items and that subjects can answer each of these items positively (1) or negatively (0), i.e. the items are dichotomous. If n subjects answer the items, this results in a binary data matrix D with m columns and n rows. Typical examples of this data format are test items which can be solved (1) or failed (0) by subjects. Other typical examples are questionnaires whose items are statements to which subjects can agree (1) or disagree (0).
Depending on the content of the items, it is possible that the response of a subject to an item j determines his or her responses to other items. It is, for example, possible that every subject who agrees to item j will also agree to item i. In this case we say that item j implies item i. The goal of an ITA is to uncover such deterministic implications from the data set D.
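
The core idea can be illustrated with a minimal Python sketch. The response matrix D, the function name candidate_implications, and the simple violation-counting criterion are illustrative assumptions, not a published ITA algorithm; actual ITA methods (e.g. Leeuwe, 1974; Schrepp, 1999) use more elaborate criteria that account for random response errors.

```python
import numpy as np

# Illustrative response matrix D: rows = subjects, columns = items
# (1 = agreed/solved, 0 = disagreed/failed). The data are made up.
D = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
])

def candidate_implications(D, tolerance=0):
    """Return pairs (j, i) for which 'item j implies item i' is violated by at
    most `tolerance` subjects, i.e. rows with D[:, j] == 1 but D[:, i] == 0.
    With tolerance=0 this is a strictly deterministic check."""
    n_subjects, n_items = D.shape
    implications = []
    for j in range(n_items):
        for i in range(n_items):
            if i == j:
                continue
            violations = np.sum((D[:, j] == 1) & (D[:, i] == 0))
            if violations <= tolerance:
                implications.append((j, i))
    return implications

print(candidate_implications(D))  # [(1, 0), (2, 0), (2, 1)]
```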

Boolean analysis is an explorative method to detect deterministic dependencies between items. The detected dependencies must be confirmed in subsequent research. Methods of Boolean analysis do not assume that the detected dependencies describe the data completely; there may be other, probabilistic dependencies as well. Thus, a Boolean analysis tries to detect interesting deterministic structures in the data, but it does not aim to uncover all structural aspects of the data set. It therefore makes sense to use other methods, such as latent class analysis, together with a Boolean analysis.

Application areas

The investigation of deterministic dependencies has some tradition in educational psychology. In this area, the items usually represent skills or cognitive abilities of subjects. Bart and Airasian (1974) use Boolean analysis to establish logical implications on a set of Piagetian tasks. Other examples in this tradition are the learning hierarchies of Gagné (1968) or the theory of structural learning of Scandura (1971).

There have been several attempts to use Boolean analysis, especially item tree analysis, to construct knowledge spaces from data. Examples can be found in Held and Korossy (1998) or Schrepp (2002).

Methods of Boolean analysis are used in a number of social science studies to gain insight into the structure of dichotomous data. Bart and Krus (1973), for example, use Boolean analysis to establish a hierarchical order on items that describe socially unaccepted behavior. Janssens (1999) used a method of Boolean analysis to investigate the integration process of minorities into the value system of the dominant culture. Romme (1995a) introduced Boolean comparative analysis to the management sciences and applied it in a study of self-organizing processes in management teams (Romme 1995b).

Relations to other areas

Boolean analysis has some relations to other research areas. There is a close connection between Boolean analysis and knowledge spaces. The theory of knowledge spaces provides a theoretical framework for the formal description of human knowledge. In this approach, a knowledge domain is represented by a set Q of problems. The knowledge of a subject in the domain is then described by the subset of problems from Q he or she is able to solve. This set is called the knowledge state of the subject. Because of dependencies between the items (for example, if solving item j implies solving item i), not all elements of the power set of Q will, in general, be possible knowledge states. The set of all possible knowledge states is called the knowledge structure. Methods of Boolean analysis can be used to construct a knowledge structure from data (for example, Theuns, 1998, or Schrepp, 1999). The main difference between the two research areas is that Boolean analysis concentrates on the extraction of structures from data, while knowledge space theory focuses on the structural properties of the relation between a knowledge structure and the logical formulas which describe it.
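
As an illustration of how implications between items restrict the set of possible knowledge states, the following Python sketch (the item names and the helper function knowledge_structure are hypothetical, chosen only for this example) enumerates all subsets of Q = {i, j, k} that are compatible with the single implication "solving j implies solving i":

```python
from itertools import combinations

# Hypothetical domain Q of three problems with one dependency:
# solving item j implies solving item i (j -> i).
Q = {"i", "j", "k"}
implications = [("j", "i")]  # (premise, conclusion) pairs

def knowledge_structure(Q, implications):
    """All subsets of Q (knowledge states) that respect every implication:
    whenever the premise item is in a state, the conclusion item is too."""
    items = sorted(Q)
    states = []
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            state = set(subset)
            if all(conclusion in state
                   for premise, conclusion in implications if premise in state):
                states.append(state)
    return states

for state in knowledge_structure(Q, implications):
    print(sorted(state))
# Of the 8 subsets of Q, the two that contain j but not i are excluded,
# leaving 6 feasible knowledge states.
```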

Closely related to knowledge space theory is formal concept analysis (Ganter and Wille, 1996). Similar to knowledge space theory, this approach concentrates on the formal description and visualization of existing dependencies. Formal concept analysis offers very effective ways to construct such dependencies from data, with a focus on if-then expressions ("implications"). There is even a method, called attribute exploration,[2] for extracting all implications from hard-to-access data.

Another related field is data mining. Data mining deals with the extraction of knowledge from large databases. Several data mining algorithms extract dependencies of the form j → i (called association rules) from the database.

The main difference between Boolean analysis and the extraction of association rules in data mining is the interpretation of the extracted implications. The goal of a Boolean analysis is to extract implications from the data which are (with the exception of random errors in the response behavior) true for all rows in the data set. For data mining applications it is sufficient to detect implications which fulfill a predefined level of accuracy.

In a marketing scenario, for example, it is of interest to find implications that hold for more than x% of the rows in the data set. An online bookshop may, for example, search for implications of the form "If a customer orders book A, he or she also orders book B" that are fulfilled by more than 10% of the available customer data.
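
The contrast between the two interpretations can be made concrete with a small Python sketch; the order matrix and the thresholds below are illustrative assumptions, not taken from the literature cited here:

```python
import numpy as np

# Hypothetical customer-order matrix: rows = customers,
# columns = book A and book B (1 = ordered, 0 = not ordered).
orders = np.array([
    [1, 1],
    [1, 1],
    [1, 0],
    [0, 1],
    [1, 1],
])
A, B = 0, 1

n_rows = orders.shape[0]
violations = np.sum((orders[:, A] == 1) & (orders[:, B] == 0))
support = np.sum((orders[:, A] == 1) & (orders[:, B] == 1)) / n_rows
confidence = 1 - violations / np.sum(orders[:, A] == 1)

# Boolean analysis accepts "A implies B" only if no row violates it
# (apart from random response errors); association-rule mining accepts it
# once a predefined level is reached, e.g. more than 10% of all customers
# ordering both books, as in the bookshop example above.
print(f"violations = {violations}, support = {support:.2f}, confidence = {confidence:.2f}")
print("deterministic implication:", violations == 0)
print("association rule (support > 10%):", support > 0.10)
```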

References

  1. Flament, C. (1976). L'analyse booléenne de questionnaire. Paris: Mouton.
  2. Ganter, Bernhard and Obiedkov, Sergei (2016). Conceptual Exploration. Springer. ISBN 978-3-662-49290-1.