Data collection system

A data collection system (DCS) is a computer application that facilitates the process of data collection, allowing specific, structured information to be gathered in a systematic fashion so that data analysis can subsequently be performed on it. [1] [2] [3] Typically, a DCS displays a form that accepts input from a user and validates that input before committing the data to persistent storage such as a database.
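That validate-then-commit flow can be sketched in a few lines; the field names and validation rules below are purely illustrative, and an in-memory SQLite table stands in for the persistent store:

```python
import sqlite3

# Validation rules for a hypothetical two-field form (names are illustrative).
FIELDS = {
    "name": lambda v: isinstance(v, str) and 0 < len(v) <= 100,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 150,
}

def submit(conn, record):
    """Validate form input, then commit it to persistent storage."""
    bad = [f for f, ok in FIELDS.items() if f not in record or not ok(record[f])]
    if bad:
        return bad                      # reject: nothing is written
    conn.execute("INSERT INTO responses (name, age) VALUES (?, ?)",
                 (record["name"], record["age"]))
    conn.commit()
    return []

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (name TEXT, age INTEGER)")
print(submit(conn, {"name": "Ada", "age": 36}))   # accepted: []
print(submit(conn, {"name": "Bob", "age": -5}))   # rejected: ['age']
```

The essential property is that invalid input is rejected before anything reaches storage, which is what distinguishes a DCS form from a bare data-entry screen.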

Many computer systems implement data entry forms, but data collection systems tend to be more complex, with possibly many related forms containing detailed user input fields, data validations, and navigation links among the forms.

DCSs can be considered a specialized form of content management system (CMS), particularly when they allow the information being gathered to be published, edited, modified, deleted, and maintained. Some general-purpose CMSs include features of DCSs. [4] [5]

Importance

Accurate data collection is essential to many business processes, [6] [7] [8] to the enforcement of many government regulations, [9] and to maintaining the integrity of scientific research. [10]

Data collection systems are an end product of software development. Identifying and categorizing software, or a software sub-system, as having aspects of (or actually being) a data collection system allows encyclopedic knowledge to be gathered and applied in the design and implementation of future systems. In software design, it is important to identify generalizations and patterns and to re-use existing knowledge whenever possible. [11]

Types

Generally, the computer software used for data collection falls into one of several categories of practical application. [12]

Vocabulary

There is a taxonomic scheme associated with data collection systems, with readily identifiable synonyms used by different industries and organizations. [23] [24] [25] Cataloging the most commonly used and widely accepted vocabulary improves efficiency, helps reduce variation, and improves data quality. [26] [27] [28]

The vocabulary of data collection systems stems from the fact that these systems are often a software representation of what would otherwise be a paper data collection form with a complex internal structure of sections and sub-sections. Modeling these structures and relationships in software yields technical terms describing the hierarchy of data containers, along with a set of industry-specific synonyms. [29] [30]
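One way to picture this hierarchy of data containers is as nested types in software. The sketch below is illustrative only: each class is named after one common synonym for its level, with a few alternatives noted in comments:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataElement:        # a.k.a. question, variable
    name: str

@dataclass
class Section:            # a.k.a. block, module, sub-document
    title: str
    elements: List[DataElement] = field(default_factory=list)

@dataclass
class DataModel:          # a.k.a. form, schema, questionnaire
    name: str
    sections: List[Section] = field(default_factory=list)

@dataclass
class Collection:         # a.k.a. project, registry, study
    name: str
    models: List[DataModel] = field(default_factory=list)

# A hypothetical study: one form with one section holding two elements.
study = Collection("Example Study", [
    DataModel("Intake Form", [
        Section("Demographics", [DataElement("name"), DataElement("age")]),
    ]),
])
```

Each level of nesting corresponds to one rung of the vocabulary ladder described in the sections that follow.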

Collection synonyms

A collection (used as a noun) is the topmost container for grouping related documents, data models, and datasets. Typical vocabulary at this level includes the terms: [29]

  • Project
  • Registry
  • Repository
  • System
  • Top-level Container
  • Library
  • Study
  • Organization
  • Party
  • Site

Data model synonyms

Each document or dataset within a collection is modeled in software. Constructing these models is part of designing or "authoring" the expected data to be collected. The terminology for these data models includes: [29]

  • Datamodel
  • Data dictionary
  • Schema
  • Form
  • Document
  • Survey
  • Instrument
  • Questionnaire
  • Data Sheet
  • Expected Measurements
  • Expected Observations
  • Encounter Form
  • Study Visit Form

Sub-collection or master-detail synonyms

Data models are often hierarchical, containing sub-collections or master–detail structures described with terms such as: [29]

  • Section, Sub-section
  • Block
  • Module
  • Sub-document
  • Roster
  • Parent-Child [31]
  • Dynamic List [31]
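A master–detail (parent–child) structure can be sketched as a parent record holding a dynamic list of detail rows. The household/roster example below is purely illustrative:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RosterRow:                 # the "detail" (child) record
    member_name: str
    age: int

@dataclass
class HouseholdRecord:           # the "master" (parent) record
    address: str
    roster: List[RosterRow] = field(default_factory=list)  # dynamic list

household = HouseholdRecord("12 Example St")
household.roster.append(RosterRow("Ada", 36))  # rows can grow at entry time
household.roster.append(RosterRow("Sam", 4))
```

The defining feature is that the number of detail rows is not fixed by the data model: each parent record carries however many child entries were collected.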

Data element synonyms

At the lowest level of the data model are the data elements that describe individual pieces of data. Synonyms include: [29] [32]

Data point synonyms

Moving from abstract domain modelling to the concrete, actual data, the lowest level is the data point within a dataset. Synonyms for data point include: [29]

  • Value
  • Input
  • Answer
  • Response
  • Observation
  • Measurement
  • Parameter Value
  • Column Value

Dataset synonyms

Finally, the synonyms for dataset include: [29]

  • Row
  • Record
  • Occurrence
  • Instance
  • (Document) Filing
  • Episode
  • Submission
  • Observation Point
  • Case
  • Test
  • (Individual) Sample
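In these terms, one dataset row groups the data points captured for a single record. A minimal illustration, with made-up element names and values:

```python
# Each dict is one dataset row (record, instance, case); each key-value pair
# is a data point: the key names the data element, the value is the datum.
dataset = [
    {"name": "Ada", "age": 36},
    {"name": "Sam", "age": 4},
]

# Pulling one column yields every data point recorded for a single element.
ages = [row["age"] for row in dataset]
print(ages)  # [36, 4]
```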


References

  1. "What is a Data Collection System (DCS)? - Definition from Techopedia". Techopedia.com. Retrieved 2016-10-14.
  2. "Planning and Design of Data Collection Systems". U.S. Department of Transportation (US DOT). 2005-08-15. Retrieved 2016-10-14.
  3. "Surveys and Data Collection Systems". U.S. Department of Health & Human Services. 2016-04-16. Retrieved 2016-10-14.
  4. "Using SharePoint Forms for Data Collection". Microsoft Corporation. Retrieved 2016-10-14.
  5. "Using Drupal for Multi-Page Collection of Data from Users". The Drupal Association. 2009-07-03. Retrieved 2016-10-14.
  6. "Data collection". SearchCIO. TechTarget. Retrieved 20 December 2016.
  7. "Which Data Collection Method Should I Choose?". B2B International. Retrieved 20 December 2016.
  8. "How and Why Data Will Save Small Business". Small Business Trends. Small Business Trends LLC. 2015-03-20. Retrieved 20 December 2016.
  9. "FAQ: Data Collection Requirements for Broker-Dealers". FINRA.org. Financial Industry Regulatory Authority, Inc. on behalf of the U.S. Securities and Exchange Commission (SEC). Retrieved 4 February 2017.
  10. Sapsford, Roger; Jupp, Victor. Data Collection and Analysis. ISBN 0-7619-5046-X.
  11. Sen, A. (1997). "The role of opportunism in the software design reuse process". IEEE Transactions on Software Engineering. 23 (7): 418–436. doi:10.1109/32.605760.
  12. "Data Collection Software". GetApp. Nubera eBusiness S.L. Retrieved 20 December 2016.
  13. "Survey Data Collection". NORC at the University of Chicago. 2016. Retrieved 2016-10-14.
  14. "Using the Data Collection System". U.S. Department of Education. 2016. Retrieved 2016-10-14.
  15. "How to Collect Data". American College of Cardiology. 2016. Retrieved 2016-10-14.
  16. Frøen, J. F.; Myhre, S. L.; Frost, M. J.; Chou, D.; Mehl, G.; Say, L.; Cheng, S.; Fjeldheim, I.; Friberg, I. K.; French, S.; Jani, J. V.; Kaye, J.; Lewis, J.; Lunde, A.; Mørkrid, K.; Nankabirwa, V.; Nyanchoka, L.; Stone, H.; Venkateswaran, M.; Wojcieszek, A. M.; Temmerman, M.; Flenady, V. J. (2016). "eRegistries: Electronic registries for maternal and child health". BMC Pregnancy and Childbirth. 16: 11. doi:10.1186/s12884-016-0801-7. PMC 4721069. PMID 26791790.
  17. Pace, W. D.; Staton, E. W. (2005). "Electronic Data Collection Options for Practice-Based Research Networks". Annals of Family Medicine. 3 (Suppl 1): s21–s29. doi:10.1370/afm.270. PMC 1466955. PMID 15928215.
  18. "Managing Data for Performance Improvement" (PDF). U.S. Department of Health and Human Services, Health Resources and Services Administration.
  19. "Collecting and Reporting Data for Performance Measurement: Moving Toward Alignment". Proceedings of the AHRQ Conference on Health Care Data Collection and Reporting. AHRQ Publication No. 07-0033-EF (March 2007). November 8–9, 2006. Retrieved 4 February 2017.
  20. "Quiz - Drupal.org". Drupal.org. Dries Buytaert. 6 July 2005. Retrieved 20 December 2016.
  21. "Online QuizBuilder web app built with Laravel". Webxity. Webxity Technologies.
  22. "Regulatory Filing". FINRA.org. Financial Industry Regulatory Authority, Inc. on behalf of the U.S. Securities and Exchange Commission (SEC). Retrieved 4 February 2017.
  23. Hay, David C. (2006). Data Model Patterns: A Metadata Map (Repr. ed.). Amsterdam: Elsevier Morgan Kaufmann. p. 40. ISBN 978-0120887989. Retrieved 5 February 2017.
  24. "Classification, Taxonomies and You" (PDF). Verity. Verity, Inc. Retrieved 6 February 2017.
  25. Bayona-Oré, Sussy; Calvo-Manzano, Jose A.; Cuevas, Gonzalo; San-Feliu, Tomas (21 December 2012). "Critical success factors taxonomy for software process deployment". Software Quality Journal. 22 (1): 21–48. doi:10.1007/s11219-012-9190-y. S2CID   18047921.
  26. "Collecting and Reporting Data for Performance Measurement: Moving Toward Alignment". Proceedings of the AHRQ Conference on Health Care Data Collection and Reporting. AHRQ Publication No. 07-0033-EF (March 2007): 13 of 50. November 8–9, 2006. Retrieved 4 February 2017.
  27. Busch, Joseph. "Conducting Taxonomy Validation: Healthcare Example" (PDF). Taxonomy Strategies. Taxonomy Strategies LLC. Retrieved 7 February 2017.
  28. "6 Challenges: Performance Measurement Data Collection & Reporting". Extract Systems. 15 December 2016. Retrieved 7 February 2017.
  29. Hay, David C. (1996). Data Model Patterns: Conventions of Thought. New York: Dorset House Pub. p. 218ff. ISBN 978-0932633293. Retrieved 6 February 2017.
  30. Wendicke, Annemarie (March 2016). "What Makes Data Meaningful? The Important Role of Data Structures". Journal of AHIMA. 87 (3): 34–36. PMID 27039625. Retrieved 7 February 2017.
  31. "NCDR® AFib Ablation Registry™ v1.0 - Data Dictionary - Full Specifications" (PDF). ACC Quality Improvement for Institutions. American College of Cardiology. p. 36 of 143. Retrieved 9 February 2017.
  32. "Data Element: Federal Standard 1037C: Glossary of Telecommunications Terms". www.its.bldrdoc.gov. U.S. Dept. of Commerce, Institute for Telecommunication Sciences. Archived from the original on 1 March 2011. Retrieved 7 February 2017.