In the social sciences, coding is an analytical process in which data, in either quantitative form (such as questionnaire results) or qualitative form (such as interview transcripts), are categorized to facilitate analysis.
One purpose of coding is to transform the data into a form suitable for computer-aided analysis. This categorization of information is an important step, for example, in preparing data for computer processing with statistical software. Prior to coding, an annotation scheme is defined. It consists of codes or tags. During coding, coders manually add codes into data where required features are identified. The coding scheme ensures that the codes are added consistently across the data set and allows for verification of previously tagged data. [1]
Some studies employ multiple coders working independently on the same data. This minimizes the chance of coding errors and is believed to increase the reliability of the data.
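Agreement between two independent coders is often summarized with a chance-corrected statistic. The text does not name a specific measure, so the use of Cohen's kappa here is an assumption, and the code labels are made up for illustration. A minimal sketch:

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who labelled the same items in the same order."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(codes_a)
    # Observed agreement: share of items that received the same code from both coders.
    p_observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected chance agreement, from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes assigned by two coders to the same six interview segments.
coder_1 = ["work", "family", "work", "health", "family", "work"]
coder_2 = ["work", "family", "health", "health", "family", "work"]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

Here the coders agree on five of six segments, and kappa discounts the agreement that would be expected by chance alone. Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance.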
One code should apply to only one category, and categories should be comprehensive. There should be clear guidelines for coders (the individuals who do the coding) so that coding is consistent.
For quantitative analysis, data is usually coded and recorded as nominal or ordinal variables.
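The difference between nominal and ordinal coding can be sketched with a small codebook. Every label, numeric code, and the missing-value marker below is hypothetical and chosen only for illustration:

```python
# Hypothetical codebook; none of these labels or numbers come from a real study.
# Nominal codes are arbitrary identifiers: their numeric order carries no meaning.
NOMINAL_CODES = {"employed": 1, "unemployed": 2, "student": 3, "retired": 4}
# Ordinal codes preserve the rank order of the answer categories.
ORDINAL_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING = -9  # assumed convention for blank or unrecognized answers


def encode(answer, codebook):
    """Return the numeric code for an answer, or MISSING if it is not in the codebook."""
    key = answer.strip().lower() if answer else ""
    return codebook.get(key, MISSING)


responses = [
    {"status": "Employed", "satisfaction": "Agree"},
    {"status": "student", "satisfaction": "strongly disagree"},
    {"status": "", "satisfaction": "Neutral"},
]
coded = [
    (encode(r["status"], NOMINAL_CODES), encode(r["satisfaction"], ORDINAL_CODES))
    for r in responses
]
print(coded)
```

Comparisons such as "greater than" are meaningful for the ordinal column only. For the nominal column, the numbers serve purely as labels, so statistics like the mean should not be computed on them.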
Questionnaire data can be pre-coded (the process of assigning codes to expected answers on a designed questionnaire), field-coded (the process of assigning codes as soon as data is available, usually during fieldwork), post-coded (the coding of open questions on completed questionnaires) or office-coded (done after fieldwork). Note that some of the above are not mutually exclusive.
In the social sciences, spreadsheets such as Excel and more advanced software packages such as R, MATLAB, PSPP/SPSS, DAP/SAS, Minitab and Stata are often used.
For disciplines in which a qualitative format is preferred, including ethnography, humanistic geography or phenomenological psychology, a varied approach to coding can be applied. Iain Hay (2005) outlines a two-step process beginning with basic coding in order to distinguish overall themes, followed by a more in-depth, interpretive code in which more specific trends and patterns can be interpreted. [2]
Much of qualitative coding can be attributed to either grounded or a priori coding. [3] Grounded coding refers to allowing notable themes and patterns to emerge from the documents themselves, whereas a priori coding requires the researcher to apply pre-existing theoretical frameworks to analyze the documents. As coding methods are applied across various texts, the researcher is able to apply axial coding, which is the process of selecting core thematic categories present in several documents to discover common patterns and relations. [4]
Coding is considered a process of discovery and is done in cycles. Prior to constructing categories, a researcher might apply first and second cycle coding methods. [3] There is a multitude of methods available, and a researcher will want to pick one that is suited to the format and nature of their documents. Not all methods can be applied to every type of document. Some examples of first cycle coding methods include:
The process can be done manually, which can be as simple as highlighting different concepts with different colours, or fed into a software package. Some examples of qualitative software packages include Atlas.ti, MAXQDA, NVivo, QDA Miner, and RQDA.
After assembling codes, the next step is to organize them into broader themes and categories. The process generally involves identifying themes from the existing codes, reducing the themes to a manageable number, creating hierarchies within the themes and then linking themes together through theoretical modeling. [6]
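The grouping of codes into themes and broader categories can be represented as a nested structure. The following is a minimal sketch, not a prescribed procedure: the segments, codes, themes, and category are hypothetical, and the code-to-theme mapping stands in for decisions a researcher would make interpretively.

```python
from collections import Counter, defaultdict

# Hypothetical coded segments: (segment id, code assigned by the researcher).
coded_segments = [
    ("s1", "long hours"), ("s2", "childcare"), ("s3", "long hours"),
    ("s4", "commute"), ("s5", "childcare"), ("s6", "burnout"),
]
# Hypothetical researcher decisions: which theme each code belongs to,
# and which broader category each theme belongs to.
code_to_theme = {
    "long hours": "workload",
    "burnout": "workload",
    "commute": "workload",
    "childcare": "family duties",
}
theme_to_category = {"workload": "work-life balance", "family duties": "work-life balance"}


def build_hierarchy(segments, code_to_theme, theme_to_category):
    """Nest code frequencies under their theme, and themes under their category."""
    tree = defaultdict(lambda: defaultdict(Counter))
    for _segment_id, code in segments:
        theme = code_to_theme[code]
        category = theme_to_category[theme]
        tree[category][theme][code] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {
        category: {theme: dict(codes) for theme, codes in themes.items()}
        for category, themes in tree.items()
    }


hierarchy = build_hierarchy(coded_segments, code_to_theme, theme_to_category)
print(hierarchy)
```

Keeping the mappings as explicit dictionaries makes it easy to merge or rename themes when reducing them to a manageable number, without touching the coded segments themselves.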
Creating memos during the coding process is integral to both grounded and a priori coding approaches. Qualitative research is inherently reflexive; as the researcher delves deeper into their subject, it is important to chronicle their own thought processes through reflective or methodological memos, as doing so may highlight their own subjective interpretations of data. [7] It is crucial to begin memoing at the onset of research. Regardless of the type of memo produced, what is important is that the process initiates critical thinking and productivity in the research. Doing so will facilitate easier and more coherent analyses as the project progresses. [8] Memos can be used to map research activities, uncover meaning from data, maintain research momentum and engagement, and open communication. [9]
Questionnaire construction refers to the design of a questionnaire to gather statistically useful information about a given topic. When properly constructed and responsibly administered, questionnaires can provide valuable data about any given subject.
Participant observation is one type of data collection method typically used by practitioner-scholars in qualitative research and ethnography. This type of methodology is employed in many disciplines, particularly anthropology, sociology, communication studies, human geography, and social psychology. Its aim is to gain a close and intimate familiarity with a given group of individuals and their practices through an intensive involvement with people in their cultural environment, usually over an extended period of time.
Qualitative research is a type of research that aims to gather and analyse non-numerical (descriptive) data in order to gain an understanding of individuals' social reality, including understanding their attitudes, beliefs, and motivation. This type of research typically involves in-depth interviews, focus groups, or field observations in order to collect data that is rich in detail and context. Qualitative research is often used to explore complex phenomena or to gain insight into people's experiences and perspectives on a particular topic. It is particularly useful when researchers want to understand the meaning that people attach to their experiences or when they want to uncover the underlying reasons for people's behavior. Qualitative methods include ethnography, grounded theory, discourse analysis, and interpretative phenomenological analysis. Qualitative research methods have been used in sociology, anthropology, political science, psychology, communication studies, social work, folklore, educational research, information science and software engineering research.
Content analysis is the study of documents and communication artifacts, which might be texts of various formats, pictures, audio or video. Social scientists use content analysis to examine patterns in communication in a replicable and systematic manner. One of the key advantages of using content analysis to analyse social phenomena is its non-invasive nature, in contrast to simulating social experiences or collecting survey answers.
In its most common sense, methodology is the study of research methods. However, the term can also refer to the methods themselves or to the philosophical discussion of associated background assumptions. A method is a structured procedure for bringing about a certain goal, like acquiring knowledge or verifying knowledge claims. This normally involves various steps, like choosing a sample, collecting data from this sample, and interpreting the data. The study of methods concerns a detailed description and analysis of these processes. It includes evaluative aspects by comparing different methods. This way, it is assessed what advantages and disadvantages they have and for what research goals they may be used. These descriptions and evaluations depend on philosophical background assumptions. Examples are how to conceptualize the studied phenomena and what constitutes evidence for or against them. When understood in the widest sense, methodology also includes the discussion of these more abstract issues.
In statistics, a categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property. In computer science and some branches of mathematics, categorical variables are referred to as enumerations or enumerated types. Commonly, each of the possible values of a categorical variable is referred to as a level. The probability distribution associated with a random categorical variable is called a categorical distribution.
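As a small illustration of levels and of a categorical distribution, the relative frequency of each level can be estimated from observed values. The variable and its levels below are hypothetical:

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(observations):
    """Estimate a categorical distribution as the relative frequency of each level."""
    counts = Counter(observations)
    n = len(observations)
    return {level: Fraction(count, n) for level, count in counts.items()}


# Hypothetical observations of a categorical variable "housing tenure".
observations = ["renter", "owner", "renter", "other", "renter", "owner"]
distribution = empirical_distribution(observations)
levels = sorted(distribution)  # the distinct values the variable takes in this sample

for level in levels:
    print(level, distribution[level])
```

Exact fractions are used so that the probabilities visibly sum to one; floating-point division would work equally well for most practical purposes.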
Grounded theory is a systematic methodology that has been largely applied to qualitative research conducted by social scientists. The methodology involves the construction of hypotheses and theories through the collecting and analysis of data. Grounded theory involves the application of inductive reasoning. The methodology contrasts with the hypothetico-deductive model used in traditional scientific research.
Narrative inquiry or narrative analysis emerged as a discipline from within the broader field of qualitative research in the early 20th century, as evidence exists that this method was used in psychology and sociology. Narrative inquiry uses field texts, such as stories, autobiography, journals, field notes, letters, conversations, interviews, family stories, photos, and life experience, as the units of analysis to research and understand the way people create meaning in their lives as narratives.
Theoretical sampling is a process of data collection for generating theory whereby the analyst jointly collects, codes, and analyses data and decides what data to collect next and where to find them, in order to develop a theory as it emerges. The initial stage of data collection depends largely on a general subject or problem area, which is based on the analyst's general perspective of the subject area. The initial decisions are not based on a preconceived theoretical framework. The researcher begins by identifying some key concepts and features which they will research. This gives a foundation for the research. A researcher must be theoretically sensitive so that a theory can be conceptualized and formulated as it emerges from the data being collected. Caution must be taken so as to not limit oneself to specific aspects of a theory; this will make a researcher blind towards other concepts and aspects of the theory. The main question in this method of sampling is this: what groups should the researcher turn to next in the data collection process, and why?
Interpretative phenomenological analysis (IPA) is a qualitative form of psychology research. IPA has an idiographic focus, which means that instead of producing generalizable findings, it aims to offer insights into how a given person, in a given context, makes sense of a given situation. Usually, these situations are of personal significance; examples might include a major life event, or the development of an important relationship. IPA has its theoretical origins in phenomenology and hermeneutics, and many of its key ideas are inspired by the work of Edmund Husserl, Martin Heidegger, and Maurice Merleau-Ponty. IPA's tendency to combine psychological, interpretative, and idiographic elements is what distinguishes it from other approaches to qualitative, phenomenological psychology.
RQDA was an R package for computer-assisted qualitative data analysis (CAQDAS). It was installable from, and ran within, the R statistical software, but had a separate window running a graphical user interface. RQDA's approach allowed for tight integration of the constructivist approach of qualitative research with quantitative data analysis, which can increase the rigor, transparency and validity of qualitative research.
Netnography is a "form of qualitative research that seeks to understand the cultural experiences that encompass and are reflected within the traces, practices, networks and systems of social media". It is a specific set of research practices related to data collection, analysis, research ethics, and representation, rooted in participant observation that can be conceptualized into three key stages: investigation, interaction, and immersion. In netnography, a significant amount of the data originates in and manifests through the digital traces of naturally occurring public conversations recorded by contemporary communications networks. Netnography uses these conversations as data. It is an interpretive research method that adapts the traditional, in-person participant observation techniques of anthropology to the study of interactions and experiences manifesting through digital communications.
MAXQDA is a software program designed for computer-assisted qualitative and mixed methods data, text and multimedia analysis in academic, scientific, and business institutions. It is being developed and distributed by VERBI Software based in Berlin, Germany.
Aquad is open source computer-assisted qualitative data analysis software (CAQDAS). It supports analysis of text, audio, video, and graphical data.
Thematic analysis is one of the most common forms of analysis within qualitative research. It emphasizes identifying, analysing and interpreting patterns of meaning within qualitative data. Thematic analysis is often understood as a method or technique in contrast to most other qualitative analytic approaches – such as grounded theory, discourse analysis, narrative analysis and interpretative phenomenological analysis – which can be described as methodologies or theoretically informed frameworks for research. Thematic analysis is best thought of as an umbrella term for a variety of different approaches, rather than a singular method. Different versions of thematic analysis are underpinned by different philosophical and conceptual assumptions and are divergent in terms of procedure. Leading thematic analysis proponents, psychologists Virginia Braun and Victoria Clarke, distinguish between three main types of thematic analysis: coding reliability approaches, code book approaches and reflexive approaches. They first described their own widely used approach in 2006 in the journal Qualitative Research in Psychology as reflexive thematic analysis. This paper has over 120,000 Google Scholar citations and according to Google Scholar is the most cited academic paper published in 2006. The popularity of this paper exemplifies the growing interest in thematic analysis as a distinct method.
Online content analysis or online textual analysis refers to a collection of research techniques used to describe and make inferences about online material through systematic coding and interpretation. Online content analysis is a form of content analysis for analysis of Internet-based communication.
Based in grounded theory, open coding is the analytic process through which concepts (codes) are attached to observed data and phenomena during qualitative data analysis. It is one of the techniques described by Strauss (1987) and Strauss and Corbin (1990) for working with text. Open coding attempts to codify, name or classify the observed phenomenon and is achieved by segmenting data into meaningful expressions and describing that data with a single word or short sequence of words. Relevant annotations and concepts are then attached to these expressions.
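The bookkeeping behind open coding (segments, short codes, and attached annotations) can be sketched as a simple data structure. This is an illustrative sketch only: the `Segment` class, the helper functions, and the transcript are hypothetical, and splitting on periods is a naive stand-in for the researcher's interpretive choice of meaningful expressions.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One meaningful expression from the data, with its codes and annotations."""
    text: str
    codes: list = field(default_factory=list)
    notes: list = field(default_factory=list)


def segment_transcript(transcript):
    """Naively split a transcript into sentence-like segments on periods."""
    return [Segment(part.strip()) for part in transcript.split(".") if part.strip()]


def attach_code(segment, code, note=None):
    """Attach a short code to a segment, optionally with an annotation."""
    if code not in segment.codes:
        segment.codes.append(code)
    if note:
        segment.notes.append(note)


# Hypothetical interview excerpt.
transcript = (
    "I never see my kids during the week. "
    "The overtime is not optional. "
    "Weekends are for recovering."
)
segments = segment_transcript(transcript)
attach_code(segments[0], "missing family time", note="links work schedule to family")
attach_code(segments[1], "compulsory overtime")
attach_code(segments[2], "exhaustion")

for seg in segments:
    print(seg.codes, "<-", seg.text)
```

In practice the codes are not known in advance; they are named as the researcher reads each segment, which is why the structure stores free-text codes rather than values from a fixed list.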
Cassandre is free, open-source software for computer-assisted qualitative data analysis and interpretation in the humanities and social sciences. Although it refers, like other CAQDAS software, to Grounded Theory Method, it also allows users to conduct discourse analysis or quantitative content analysis. The software is designed as a server to support collaborative work. Formerly focused on semi-automatic coding, it now provides diaries assisting qualitative analysis.
KH Coder is open-source software for computer-assisted qualitative data analysis, particularly quantitative content analysis and text mining. It can also be used for computational linguistics. It supports the processing and linguistic analysis of text in several languages, such as Japanese, English, French, German, Italian, Portuguese and Spanish. Specifically, it supports statistical analyses such as co-occurrence networks, self-organizing maps, multidimensional scaling and similar calculations. Word frequency statistics, part-of-speech analysis, grouping, correlation analysis, and visualization are among the features offered by KH Coder.
Quirkos is a CAQDAS software package for the qualitative analysis of text data, commonly used in social science. It provides a graphical interface in which the nodes or themes of analysis are represented by bubbles. It is designed primarily for new and non-academic users of qualitative data, to allow them to quickly learn the basics of qualitative data analysis. Although simpler to use, it lacks some of the features present in other commercial CAQDAS packages, such as multimedia support. However, it has been proposed as a useful tool for lay and participant-led analysis and is comparatively affordable. It is developed by Edinburgh, UK-based Quirkos Software, and was first released in October 2014.