In computing, data validation or input validation is the process of ensuring data has undergone data cleansing to confirm it has data quality, that is, that it is both correct and useful. It uses routines, often called "validation rules", "validation constraints", or "check routines", that check for correctness, meaningfulness, and security of data that are input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by explicit validation logic written into the application program.
This is distinct from formal verification, which attempts to prove or disprove the correctness of algorithms for implementing a specification or property.
Data validation is intended to provide certain well-defined guarantees for fitness and consistency of data in an application or automated system. Data validation rules can be defined and designed using various methodologies, and be deployed in various contexts.[1] Their implementation can use declarative data integrity rules, or procedure-based business rules.[2]
The guarantees of data validation do not necessarily include accuracy, and it is possible for data entry errors such as misspellings to be accepted as valid. Other clerical and/or computer controls may be applied to reduce inaccuracy within a system.
In evaluating the basics of data validation, generalizations can be made regarding the different kinds of validation according to their scope, complexity, and purpose.
For example:
Data type validation is customarily carried out on one or more simple data fields.
The simplest kind of data type validation verifies that the individual characters provided through user input are consistent with the expected characters of one or more known primitive data types as defined in a programming language or data storage and retrieval mechanism.
For example, an integer field may require input to use only characters 0 through 9.
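As an illustration, a minimal data type check of this kind might look as follows in Python (the field values are hypothetical; str.isdigit also accepts some non-ASCII digit characters, so a stricter check could compare each character against '0' through '9'):

```python
def is_integer_field(raw: str) -> bool:
    # Data type check: accept the field only if every character is a digit.
    return raw.isdigit()

print(is_integer_field("12345"))  # True
print(is_integer_field("12a45"))  # False: 'a' is not an allowed character
```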
Simple range and constraint validation may examine input for consistency with a minimum/maximum range, or consistency with a test for evaluating a sequence of characters, such as one or more tests against regular expressions. For example, a counter value may be required to be a non-negative integer, and a password may be required to meet a minimum length and contain characters from multiple categories.
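A sketch of such range and constraint checks in Python; the particular password categories and minimum length below are illustrative assumptions, not requirements stated above:

```python
import re

def validate_counter(value: int) -> bool:
    # Range check: a counter must be a non-negative integer.
    return isinstance(value, int) and value >= 0

# Constraint checks expressed as regular expressions: at least 8 characters,
# with at least one lowercase letter, one uppercase letter, and one digit.
PASSWORD_RULES = [r"[a-z]", r"[A-Z]", r"[0-9]", r".{8,}"]

def validate_password(candidate: str) -> bool:
    return all(re.search(rule, candidate) for rule in PASSWORD_RULES)

print(validate_counter(-1))             # False: negative values fail the range check
print(validate_password("Secr3tPass"))  # True: satisfies every rule
```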
Code and cross-reference validation includes operations to verify that data is consistent with one or more possibly-external rules, requirements, or collections relevant to a particular organization, context or set of underlying assumptions. These additional validity constraints may involve cross-referencing supplied data with a known look-up table or directory information service such as LDAP.
For example, a user-provided country code might be required to identify a current geopolitical region.
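A minimal cross-reference check against a look-up table might be sketched as below; the small allow-list is a stand-in for an external source such as an LDAP directory or a maintained ISO 3166 table:

```python
# Illustrative allow-list; a real system would typically consult an external,
# maintained source (for example an LDAP directory or an ISO 3166 table).
KNOWN_COUNTRY_CODES = {"DE", "FR", "LV", "US"}

def validate_country_code(code: str) -> bool:
    # Cross-reference check: the code must identify a known region.
    return code.strip().upper() in KNOWN_COUNTRY_CODES

print(validate_country_code("lv"))  # True
print(validate_country_code("ZZ"))  # False: not in the look-up table
```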
Structured validation allows for the combination of other kinds of validation, along with more complex processing. Such complex processing may include the testing of conditional constraints for an entire complex data object or set of process operations within a system.
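A structured validation of a complex data object could combine simpler field-level checks with a conditional constraint, as in the sketch below (the order fields and rules are hypothetical):

```python
def validate_order(order: dict) -> list[str]:
    # Structured validation: combine field-level checks with a conditional
    # constraint spanning the whole object; return all error messages found.
    errors = []
    if not str(order.get("quantity", "")).isdigit():
        errors.append("quantity must be a non-negative integer")
    if order.get("payment_method") == "card" and not order.get("card_number"):
        errors.append("card payments require a card number")
    return errors

print(validate_order({"quantity": "3", "payment_method": "card"}))
# ['card payments require a card number']
```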
Consistency validation ensures that data is logical. For example, the delivery date of an order can be prohibited from preceding its shipment date.
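The shipment/delivery example can be written as a simple cross-field consistency check:

```python
from datetime import date

def delivery_is_consistent(ship_date: date, delivery_date: date) -> bool:
    # Consistency check: an order cannot be delivered before it is shipped.
    return delivery_date >= ship_date

print(delivery_is_consistent(date(2024, 5, 2), date(2024, 5, 1)))  # False
```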
Multiple kinds of data validation are relevant to 10-digit pre-2007 ISBNs (the 2005 edition of ISO 2108 required ISBNs to have 13 digits from 2007 onwards[3]).
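As a sketch of how those kinds of validation combine for an ISBN-10, the function below applies a length check, a character (data type) check, and the standard modulo-11 check-digit rule (a trailing 'X' stands for the value 10):

```python
def validate_isbn10(isbn: str) -> bool:
    isbn = isbn.replace("-", "").replace(" ", "")
    if len(isbn) != 10:                                   # length check
        return False
    if not (isbn[:9].isdigit() and (isbn[9].isdigit() or isbn[9] == "X")):
        return False                                      # character / data type check
    digits = [10 if c == "X" else int(c) for c in isbn]
    weighted = sum(w * d for w, d in zip(range(10, 0, -1), digits))
    return weighted % 11 == 0                             # check-digit validation

print(validate_isbn10("0-306-40615-2"))  # True
```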
Failures or omissions in data validation can lead to data corruption or a security vulnerability.[4] Data validation checks that data are fit for purpose,[5] valid, sensible, reasonable and secure before they are processed.
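As one hedged illustration of the security aspect, the sketch below validates a user-supplied sort column against an allow-list before it is placed in a query, instead of trusting the raw input (the table and column names are hypothetical):

```python
ALLOWED_SORT_COLUMNS = {"name", "created_at"}  # hypothetical allow-list

def build_query(sort_column: str) -> str:
    if sort_column not in ALLOWED_SORT_COLUMNS:
        # Reject untrusted input rather than interpolating it into the query.
        raise ValueError("invalid sort column")
    return f"SELECT * FROM users ORDER BY {sort_column}"

print(build_query("name"))  # SELECT * FROM users ORDER BY name
```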
The International Bank Account Number (IBAN), for example LV30RIKO0000083232646, is an internationally agreed-upon system of identifying bank accounts across national borders to facilitate the communication and processing of cross-border transactions with a reduced risk of transcription errors. An IBAN uniquely identifies the account of a customer at a financial institution. It was originally adopted by the European Committee for Banking Standards (ECBS) and has been the international standard ISO 13616 under the International Organization for Standardization (ISO) since 1997. The current version is ISO 13616:2020, which indicates the Society for Worldwide Interbank Financial Telecommunication (SWIFT) as the formal registrar. Initially developed to facilitate payments within the European Union, it has been implemented by most European countries and numerous countries in other parts of the world, mainly in the Middle East and the Caribbean. By July 2024, 88 countries were using the IBAN numbering system.
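The IBAN check digits can be validated with the ISO 7064 MOD-97-10 rule: move the first four characters to the end, convert letters to numbers (A=10 ... Z=35), and the resulting integer must leave a remainder of 1 when divided by 97. The sketch below implements only this check-digit rule, not the country-specific length and format rules:

```python
def validate_iban(iban: str) -> bool:
    iban = iban.replace(" ", "").upper()
    if not iban.isalnum() or len(iban) < 5:
        return False
    rearranged = iban[4:] + iban[:4]                          # move first four characters to the end
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)  # A=10 ... Z=35, digits unchanged
    return int(numeric) % 97 == 1

print(validate_iban("LV30RIKO0000083232646"))  # True
```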
Software testing is the act of checking whether software satisfies expectations.
Data integrity is the maintenance of, and the assurance of, data accuracy and consistency over its entire life-cycle. It is a critical aspect to the design, implementation, and usage of any system that stores, processes, or retrieves data. The term is broad in scope and may have widely different meanings depending on the specific context even under the same general umbrella of computing. It is at times used as a proxy term for data quality, while data validation is a prerequisite for data integrity.
In computer science, ACID is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequence of database operations that satisfies the ACID properties is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.
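A minimal sketch of the fund-transfer example as a single transaction, using Python's built-in sqlite3 module (the table and account names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

try:
    conn.execute("UPDATE accounts SET balance = balance - 40 WHERE name = ?", ("alice",))
    conn.execute("UPDATE accounts SET balance = balance + 40 WHERE name = ?", ("bob",))
    conn.commit()    # both changes become visible together
except sqlite3.Error:
    conn.rollback()  # on failure, neither change is applied

print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 60), ('bob', 40)]
```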
In computer software, business logic or domain logic is the part of the program that encodes the real-world business rules that determine how data can be created, stored, and changed. It is contrasted with the remainder of the software that might be concerned with lower-level details of managing a database or displaying the user interface, system infrastructure, or generally connecting various parts of the program.
In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every term. Usually the terms are various language constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term.
Extract, transform, load (ETL) is a three-phase computing process where data is extracted from an input source, transformed, and loaded into an output data container. The data can be collected from one or more sources and it can also be output to one or more destinations. ETL processing is typically executed using software applications but it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on recurring schedules either as single jobs or aggregated into a batch of jobs.
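A toy ETL sketch in Python: rows are extracted from a CSV source, transformed (with a simple validation step that drops negative amounts), and loaded as JSON; the field names and the rejection rule are illustrative assumptions:

```python
import csv
import io
import json

source = io.StringIO("id,amount\n1,10\n2,-3\n3,25\n")        # stand-in for an input source

extracted = list(csv.DictReader(source))                      # extract
transformed = [
    {"id": int(row["id"]), "amount": int(row["amount"])}      # transform
    for row in extracted
    if int(row["amount"]) >= 0                                # drop rows failing validation
]
print(json.dumps(transformed))                                # load (here: to a JSON string)
# [{"id": 1, "amount": 10}, {"id": 3, "amount": 25}]
```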
An email address identifies an email box to which messages are delivered. While early messaging systems used a variety of formats for addressing, today, email addresses follow a set of specific rules originally standardized by the Internet Engineering Task Force (IETF) in the 1980s, and updated by RFC 5322 and RFC 6854. The term email address here refers to just the addr-spec in Section 3.4 of RFC 5322. The RFC defines address more broadly as either a mailbox or group. A mailbox value can be either a name-addr, which contains a display-name and addr-spec, or the more common addr-spec alone.
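A deliberately simplified syntactic check for the local-part "@" domain shape of an addr-spec is sketched below; RFC 5322 permits considerably more (quoted local parts, comments, and so on), so a pattern like this only serves as a rough first-pass filter:

```python
import re

ADDR_SPEC = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def looks_like_email(address: str) -> bool:
    # Accept only strings whose entire content matches the simplified pattern.
    return ADDR_SPEC.fullmatch(address) is not None

print(looks_like_email("user@example.com"))  # True
print(looks_like_email("user@@example"))     # False
```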
A product key, also known as a software key, serial key or activation key, is a specific software-based key for a computer program. It certifies that the copy of the program is original.
In engineering, a requirement is a condition that must be satisfied for the output of a work effort to be acceptable. It is an explicit, objective, clear and often quantitative description of a condition to be satisfied by a material, design, product, or service.
In software project management, software testing, and software engineering, verification and validation is the process of checking that a software system meets specifications and requirements so that it fulfills its intended purpose. It may also be referred to as software quality control. It is normally the responsibility of software testers as part of the software development lifecycle. In simple terms, software verification is: "Assuming we should build X, does our software achieve its goals without any bugs or gaps?" On the other hand, software validation is: "Was X what we should have built? Does X meet the high-level requirements?"
In electronic design automation, a design rule is a geometric constraint imposed on circuit board, semiconductor device, and integrated circuit (IC) designers to ensure their designs function properly, reliably, and can be produced with acceptable yield. Design rules for production are developed by process engineers based on the capability of their processes to realize design intent. Electronic design automation is used extensively to ensure that designers do not violate design rules; a process called design rule checking (DRC). DRC is a major step during physical verification signoff on the design, which also involves LVS checks, XOR checks, ERC, and antenna checks. The importance of design rules and DRC is greatest for ICs, which have micro- or nano-scale geometries; for advanced processes, some fabs also insist upon the use of more restricted rules to improve yield.
Data cleansing or data cleaning is the process of identifying and correcting corrupt, inaccurate, or irrelevant records from a dataset, table, or database. It involves detecting incomplete, incorrect, or inaccurate parts of the data and then replacing, modifying, or deleting the affected data. Data cleansing can be performed interactively using data wrangling tools, or through batch processing often via scripts or a data quality firewall.
Executable UML is both a software development method and a highly abstract software language. It was described for the first time in 2002 in the book "Executable UML: A Foundation for Model-Driven Architecture". The language "combines a subset of the UML graphical notation with executable semantics and timing rules." The Executable UML method is the successor to the Shlaer–Mellor method.
An entity–attribute–value model (EAV) is a data model optimized for the space-efficient storage of sparse—or ad-hoc—property or data values, intended for situations where runtime usage patterns are arbitrary, subject to user variation, or otherwise unforeseeable using a fixed design. The use-case targets applications which offer a large or rich system of defined property types, which are in turn appropriate to a wide set of entities, but where typically only a small, specific selection of these are instantiated for a given entity. Therefore, this type of data model relates to the mathematical notion of a sparse matrix. EAV is also known as object–attribute–value model, vertical database model, and open schema.
In computer programming, a magic string is an input that a programmer believes will never come externally and which activates otherwise hidden functionality. A user of this program would likely provide input that gives an expected response in most situations. However, if the user does in fact innocently provide the pre-defined input, invoking the internal functionality, the program response is often quite unexpected to the user.
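A contrived sketch of the idea (the coupon codes are invented for illustration): a fixed input the programmer assumes no ordinary user will type switches on hidden behaviour, which surprises a user who happens to enter it:

```python
def lookup_discount(coupon_code: str) -> float:
    if coupon_code == "xyzzy-debug":   # the "magic" string
        return 1.0                     # hidden branch: everything becomes free
    return 0.1 if coupon_code == "SPRING" else 0.0

print(lookup_discount("SPRING"))       # 0.1, the expected behaviour
print(lookup_discount("xyzzy-debug"))  # 1.0, unexpected if entered innocently
```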
Industrial process data validation and reconciliation, or more briefly, process data reconciliation (PDR), is a technology that uses process information and mathematical methods in order to automatically ensure data validation and reconciliation by correcting measurements in industrial processes. The use of PDR allows for extracting accurate and reliable information about the state of industry processes from raw measurement data and produces a single consistent set of data representing the most likely process operation.
In information technology a reasoning system is a software system that generates conclusions from available knowledge using logical techniques such as deduction and induction. Reasoning systems play an important role in the implementation of artificial intelligence and knowledge-based systems.
Clinical data management (CDM) is a critical process in clinical research, which leads to generation of high-quality, reliable, and statistically sound data from clinical trials. Clinical data management ensures collection, integration and availability of data at appropriate quality and cost. It also supports the conduct, management and analysis of studies across the spectrum of clinical research as defined by the National Institutes of Health (NIH). The ultimate goal of CDM is to ensure that conclusions drawn from research are well supported by the data. Achieving this goal protects public health and increases confidence in marketed therapeutics.
In business analysis, the Decision Model and Notation (DMN) is a standard published by the Object Management Group. It is a standard approach for describing and modeling repeatable decisions within organizations to ensure that decision models are interchangeable across organizations.