Data collection

Example of data collection in the biological sciences: Adélie penguins are identified and weighed each time they cross the automated weighbridge on their way to or from the sea. [1]

Data collection is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes. Data collection is a research component in all study fields, including physical and social sciences, humanities, [2] and business. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same. The goal for all data collection is to capture quality evidence that allows analysis to lead to the formulation of convincing and credible answers to the questions that have been posed. Data collection and validation consists of four steps when it involves taking a census and seven steps when it involves sampling. [3]

Regardless of the field of study or preference for defining data (quantitative or qualitative), accurate data collection is essential to maintain research integrity. The selection of appropriate data collection instruments (existing, modified, or newly developed) and delineated instructions for their correct use reduce the likelihood of errors.

A formal data collection process is necessary as it ensures that the data gathered are both defined and accurate. This way, subsequent decisions based on arguments embodied in the findings are made using valid data. [4] The process provides both a baseline from which to measure and in certain cases an indication of what to improve.

There are five common data collection methods (a minimal code sketch of the first two follows the list):

  1. closed-ended surveys and quizzes,
  2. open-ended surveys and questionnaires,
  3. 1-on-1 interviews,
  4. focus groups, and
  5. direct observation. [5]
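
To make the distinction between the first two methods concrete, here is a minimal Python sketch of a closed-ended versus an open-ended survey item. The class and field names are illustrative, not drawn from any particular survey library.

```python
from dataclasses import dataclass, field

@dataclass
class ClosedEndedQuestion:
    """A question with a fixed set of answer options (e.g., a quiz item)."""
    prompt: str
    options: list[str]
    responses: list[int] = field(default_factory=list)  # indices into options

    def record(self, choice: int) -> None:
        if not 0 <= choice < len(self.options):
            raise ValueError("choice must reference one of the fixed options")
        self.responses.append(choice)

@dataclass
class OpenEndedQuestion:
    """A question that collects free-text answers (e.g., a questionnaire item)."""
    prompt: str
    responses: list[str] = field(default_factory=list)

    def record(self, answer: str) -> None:
        self.responses.append(answer.strip())

# Closed-ended answers are constrained and easy to quantify;
# open-ended answers capture qualitative detail.
q1 = ClosedEndedQuestion("How often do you exercise?", ["Never", "Weekly", "Daily"])
q1.record(2)
q2 = OpenEndedQuestion("What motivates you to exercise?")
q2.record("Feeling energetic in the mornings.")
```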

DMPs and data collection

DMP is the abbreviation for data management platform, a centralized storage and analytics system for data. Used mainly by marketers, DMPs exist to compile and transform large amounts of data into discernible information. [6] Marketers may want to receive and utilize first-, second-, and third-party data. DMPs enable this because they aggregate data across DSPs (demand-side platforms) and SSPs (supply-side platforms). When it comes to advertising, DMPs are integral for optimizing and guiding marketers in future campaigns. The effectiveness of these systems is evidence that categorized, analyzed, and compiled data is far more useful than raw data.
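
Conceptually, the aggregation a DMP performs can be pictured as merging records from first-, second-, and third-party sources into a single profile per user. The following Python sketch illustrates only that idea; the field names, sources, and values are hypothetical.

```python
# Merge user records from first-, second- and third-party sources into one
# profile keyed by user ID. All data shown here is made up for illustration.
from collections import defaultdict

first_party = [{"user": "u1", "pages_viewed": 12}]
second_party = [{"user": "u1", "partner_segment": "frequent_flyer"}]
third_party = [{"user": "u1", "demographic": "25-34"}]

profiles: dict[str, dict] = defaultdict(dict)
for source in (first_party, second_party, third_party):
    for record in source:
        user = record["user"]
        profiles[user].update({k: v for k, v in record.items() if k != "user"})

print(profiles["u1"])
# {'pages_viewed': 12, 'partner_segment': 'frequent_flyer', 'demographic': '25-34'}
```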

Data collection and analysis on z/OS

z/OS is a widely used operating system for IBM mainframes. It is designed to offer a stable, secure, and continuously available environment for applications running on the mainframe. Operational data is the data that a z/OS system produces as it runs. This data indicates the health of the system and can be used to identify sources of performance and availability issues. The analysis of operational data by analytics platforms provides insights and recommended actions that make the system work more efficiently and help resolve or prevent problems. IBM Z Operational Log and Data Analytics collects IT operational data from z/OS systems, transforms it into a consumable format, and streams it to third-party enterprise analytics platforms such as the Elastic Stack and Splunk, or to the included operational data analysis platform. [7]
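
The collect-transform-stream pattern described above can be sketched generically. The example below parses a hypothetical log line into JSON and posts it to Elasticsearch's standard document-indexing endpoint; it is not the product's actual implementation, and the index name and field layout are assumptions.

```python
# Generic sketch: transform a raw operational log line into JSON and stream it
# to an analytics platform. POST /<index>/_doc is the standard Elasticsearch
# document-indexing API; the index name and record fields are hypothetical.
import json
import urllib.request

def to_document(raw_line: str) -> dict:
    """Parse a (hypothetical) 'timestamp system message' log line into JSON."""
    timestamp, system, message = raw_line.split(" ", 2)
    return {"@timestamp": timestamp, "system": system, "message": message}

def stream_to_elastic(doc: dict, url: str = "http://localhost:9200/zos-ops/_doc") -> int:
    req = urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # fails if no cluster is reachable
        return resp.status

doc = to_document("2024-05-01T12:00:00Z SYS1 IEF404I JOB ENDED")
# stream_to_elastic(doc)  # uncomment with a running Elasticsearch cluster
```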

IBM Z Operational Log and Data Analytics supports the collection and analysis of several types of z/OS operational data, such as log data and System Management Facility (SMF) records. [8]

Data integrity issues [9]

The main reason for maintaining data integrity is to enable the detection of errors in the data collection process. Those errors may be made intentionally (deliberate falsification) or unintentionally (random or systematic errors).

Craddick, Crawford, Rhodes, Redican, Rukenbrod and Laws (2003) describe two approaches that may protect data integrity and secure the scientific validity of study results:

Quality assurance

Its main focus is prevention, a proactive and cost-effective activity for protecting the integrity of data collection. This activity is best demonstrated by the standardization of protocol, set out in a comprehensive and detailed procedures manual for data collection. Poorly written guidelines increase the risk of failing to identify problems and errors in the research process.
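
One way to picture prevention through standardization: encode the rules from the procedures manual as a schema and reject nonconforming records before they enter the dataset. The sketch below is illustrative only; the field names and rules are hypothetical.

```python
# Encode a (hypothetical) procedures manual's rules as a schema and reject
# nonconforming records at the point of collection.
SCHEMA = {
    "subject_id": str,
    "collected_at": str,   # ISO 8601 timestamp, per the assumed manual
    "weight_kg": float,
}

def validate(record: dict) -> None:
    missing = SCHEMA.keys() - record.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    for field_name, expected_type in SCHEMA.items():
        if not isinstance(record[field_name], expected_type):
            raise TypeError(f"{field_name} must be {expected_type.__name__}")

validate({"subject_id": "A42", "collected_at": "2024-05-01T12:00:00Z", "weight_kg": 4.3})
```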

Quality control

Since quality control actions occur during or after data collection, all details should be carefully documented. A clearly defined communication structure is a precondition for establishing monitoring systems: uncertainty about the flow of information leads to lax monitoring and limits the opportunities for detecting errors. Quality control is also responsible for identifying the actions necessary to correct faulty data collection practices and to minimize such future occurrences. A team is less likely to realize the necessity of performing these actions if its procedures are written vaguely and are not based on feedback or education.

Certain data collection problems, such as deliberate falsification or systematic measurement error, necessitate prompt corrective action; a minimal detection sketch follows.
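
For example, the following Python sketch flags individual out-of-range values (possible random errors) and a large shift in the batch mean (a possible systematic error, such as a miscalibrated instrument), using the penguin-weighing scenario from the introduction. The thresholds and reference values are hypothetical.

```python
# Minimal quality-control check run over a batch of collected measurements.
# All thresholds and reference values here are illustrative assumptions.
from statistics import mean

PLAUSIBLE_RANGE = (2.5, 8.5)   # assumed plausible adult penguin mass in kg
EXPECTED_BATCH_MEAN = 4.7      # assumed mean from a reference dataset

def check_batch(weights_kg: list[float]) -> list[str]:
    findings = []
    for i, w in enumerate(weights_kg):
        if not PLAUSIBLE_RANGE[0] <= w <= PLAUSIBLE_RANGE[1]:
            findings.append(f"record {i}: {w} kg outside plausible range")
    drift = mean(weights_kg) - EXPECTED_BATCH_MEAN
    if abs(drift) > 0.5:  # a large mean shift hints at a miscalibrated instrument
        findings.append(f"batch mean drifts by {drift:+.2f} kg (systematic error?)")
    return findings

for finding in check_batch([4.1, 4.9, 15.2, 5.3, 5.0]):
    print(finding)  # each finding is documented, per quality-control practice
```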

Related Research Articles

Analysis: Process of understanding a complex topic or substance

Analysis is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it. The technique has been applied in the study of mathematics and logic since before Aristotle, though analysis as a formal concept is a relatively recent development.

Research: Systematic study undertaken to increase knowledge

Research is "creative and systematic work undertaken to increase the stock of knowledge". It involves the collection, organization and analysis of information to increase understanding of a topic or issue. A research project may be an expansion on past work in the field. To test the validity of instruments, procedures, or experiments, research may replicate elements of prior projects or the project as a whole.

z/OS: 64-bit operating system for IBM mainframes

z/OS is a 64-bit operating system for IBM z/Architecture mainframes, introduced by IBM in October 2000. It derives from and is the successor to OS/390, which in turn followed a string of MVS versions. Like OS/390, z/OS combines a number of formerly separate, related products, some of which are still optional. z/OS has the attributes of modern operating systems, but also retains much of the older functionality originated in the 1960s and still in regular use—z/OS is designed for backward compatibility.

IBM Db2 Family: Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. They initially supported the relational model, but were extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB/2, then DB2 until 2017 and finally changed to its present form.

SPSS: Statistical analysis software

SPSS Statistics is a statistical software suite developed by IBM for data management, advanced analytics, multivariate analysis, business intelligence, and criminal investigation. Long produced by SPSS Inc., it was acquired by IBM in 2009. Current versions are branded IBM SPSS Statistics.

Marketing research is the systematic gathering, recording, and analysis of qualitative and quantitative data about issues relating to marketing products and services. The goal is to identify and assess how changing elements of the marketing mix impacts customer behavior.

Multimethodology

Multimethodology or multimethod research includes the use of more than one method of data collection or research in a research study or set of related studies. Mixed methods research is more specific in that it includes the mixing of qualitative and quantitative data, methods, methodologies, and/or paradigms in a research study or set of related studies. One could argue that mixed methods research is a special case of multimethod research. Another applicable but less often used label for multimethod or mixed methods research is methodological pluralism. All of these approaches to professional and academic research emphasize that monomethod research can be improved through the use of multiple data sources, methods, research methodologies, perspectives, standpoints, and paradigms.

Quantitative research: All procedures for the numerical representation of empirical facts

Quantitative research is a research strategy that focuses on quantifying the collection and analysis of data. It is formed from a deductive approach where emphasis is placed on the testing of theory, shaped by empiricist and positivist philosophies.

Analytics is the systematic computational analysis of data or statistics. It is used for the discovery, interpretation, and communication of meaningful patterns in data. It also entails applying data patterns towards effective decision-making. It can be valuable in areas rich with recorded information; analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance.

An assay is an investigative (analytic) procedure in laboratory medicine, mining, pharmacology, environmental biology and molecular biology for qualitatively assessing or quantitatively measuring the presence, amount, or functional activity of a target entity. The analyte can be a drug, biochemical substance, chemical element or compound, or cell in an organism or organic sample. The measured entity is often called the analyte, the measurand, or the target of the assay. An assay usually aims to measure an analyte's intensive property and express it in the relevant measurement unit.

Reliability, availability and serviceability (RAS), also known as reliability, availability, and maintainability (RAM), is a computer hardware engineering term involving reliability engineering, high availability, and serviceability design. The phrase was originally used by International Business Machines (IBM) as a term to describe the robustness of their mainframe computers.

Research design: Overall strategy utilized to carry out research

Research design refers to the overall strategy utilized to carry out research that defines a succinct and logical plan to tackle established research question(s) through the collection, interpretation, analysis, and discussion of data.

IBM System Management Facility (SMF) is a component of IBM's z/OS for mainframe computers, providing a standardised method for writing out records of activity to a file. SMF provides full "instrumentation" of all baseline activities running on that IBM mainframe operating system, including I/O, network activity, software usage, error conditions, processor utilization, etc.

MAXQDA is a software program designed for computer-assisted qualitative and mixed methods data, text and multimedia analysis in academic, scientific, and business institutions. It is being developed and distributed by VERBI Software based in Berlin, Germany.

Analytical quality control, commonly shortened to AQC, refers to all those processes and procedures designed to ensure that the results of laboratory analysis are consistent, comparable, accurate and within specified limits of precision. Constituents submitted to the analytical laboratory must be accurately described to avoid faulty interpretations, approximations, or incorrect results. The qualitative and quantitative data generated from the laboratory can then be used for decision making. In the chemical sense, quantitative analysis refers to the measurement of the amount or concentration of an element or chemical compound in a matrix that differs from the element or compound. Fields such as industry, medicine, and law enforcement can make use of AQC.

Linux on IBM Z

Linux on IBM Z is the collective term for the Linux operating system compiled to run on IBM mainframes, especially IBM Z and IBM LinuxONE servers. Similar terms which imply the same meaning are Linux on zEnterprise, Linux on zSeries, Linux/390, Linux/390x, etc. The three Linux distributions certified for usage on the IBM Z hardware platform are Red Hat Enterprise Linux, SUSE Linux Enterprise, and Ubuntu.

Thematic analysis is one of the most common forms of analysis within qualitative research. It emphasizes identifying, analysing and interpreting patterns of meaning within qualitative data. Thematic analysis is often understood as a method or technique, in contrast to most other qualitative analytic approaches (such as grounded theory, discourse analysis, narrative analysis and interpretative phenomenological analysis), which can be described as methodologies or theoretically informed frameworks for research. Thematic analysis is best thought of as an umbrella term for a variety of different approaches, rather than a singular method. Different versions of thematic analysis are underpinned by different philosophical and conceptual assumptions and are divergent in terms of procedure. Leading thematic analysis proponents, psychologists Virginia Braun and Victoria Clarke, distinguish between three main types of thematic analysis: coding reliability approaches, codebook approaches and reflexive approaches. They describe their own widely used approach, first outlined in 2006 in the journal Qualitative Research in Psychology, as reflexive thematic analysis. Their 2006 paper has over 90,000 Google Scholar citations and, according to Google Scholar, is the most cited academic paper published in 2006. The popularity of this paper exemplifies the growing interest in thematic analysis as a distinct method.

Behavioral analytics is a recent advancement in business analytics that reveals new insights into the behavior of consumers on eCommerce platforms, online games, web and mobile applications, and IoT. The rapid increase in the volume of raw event data generated by the digital world enables methods that go beyond typical analysis by demographics and other traditional metrics that tell us what kind of people took what actions in the past. Behavioral analysis focuses on understanding how consumers act and why, enabling accurate predictions about how they are likely to act in the future. It enables marketers to make the right offers to the right consumer segments at the right time.

A customer data platform (CDP) is a collection of software which creates a persistent, unified customer database that is accessible to other systems. Data is pulled from multiple sources, cleaned and combined to create a single customer profile. This structured data is then made available to other marketing systems. According to Gartner, customer data platforms have evolved from a variety of mature markets, "including multichannel campaign management, tag management and data integration."

A data management platform (DMP) is a software platform used for collecting and managing data. They allow businesses to identify audience segments, which can be used to target specific users and contexts in online advertising campaigns. DMPs may use big data and artificial intelligence algorithms to process and analyze large data sets about users from various sources. Some advantages of using DMPs include data organization, increased insight on audiences and markets, and effective advertisement budgeting. On the other hand, DMPs often have to deal with privacy concerns due to the integration of third-party software with private data. This technology is continuously being developed by global entities such as Nielsen and Oracle.

References

  1. Lescroël, A. L.; Ballard, G.; Grémillet, D.; Authier, M.; Ainley, D. G. (2014). Descamps, Sébastien (ed.). "Antarctic Climate Change: Extreme Events Disrupt Plastic Phenotypic Response in Adélie Penguins". PLOS ONE. 9 (1): e85291. doi:10.1371/journal.pone.0085291. PMC 3906005. PMID 24489657.
  2. Vuong, Quan-Hoang; La, Viet-Phuong; Vuong, Thu-Trang; Ho, Manh-Toan; Nguyen, Hong-Kong T.; Nguyen, Viet-Ha; Pham, Hiep-Hung; Ho, Manh-Tung (September 25, 2018). "An open database of productivity in Vietnam's social sciences and humanities for public use". Scientific Data. 5: 180188. doi:10.1038/sdata.2018.188. PMC 6154282. PMID 30251992.
  3. Ziafati Bafarasat, A. (2021). Collecting and validating data: A simple guide for researchers. Advance (preprint). https://doi.org/10.31124/advance.13637864.v1.
  4. Sapsford, Roger; Jupp, Victor. Data Collection and Analysis. ISBN 0-7619-5046-X.
  5. Jovancic, Nemanja. "5 Data Collection Methods for Obtaining Quantitative and Qualitative Data". LeadQuizzes. Retrieved 23 February 2020.
  6. Collin, E. M. (2020-11-04). "Data Collection: The Complete Guide". Easy Earned Money. Retrieved 2020-11-05.
  7. IBM: IBM Z Operational Log and Data Analytics Product Page
  8. IBM: IBM Z Operational Log and Data Analytics documentation
  9. Northern Illinois University (2005). "Data Collection". Responsible Conduct in Data Management. Retrieved June 8, 2019.