Data masking or data obfuscation is the process of modifying sensitive data in such a way that it is of no or little value to unauthorized intruders while still being usable by software or authorized personnel. Data masking can also be referred as anonymization, or tokenization, depending on different context.
The main reason to mask data is to protect information that is classified as personally identifiable information, or mission critical data. However, the data must remain usable for the purposes of undertaking valid test cycles. It must also look real and appear consistent. It is more common to have masking applied to data that is represented outside of a corporate production system. In other words, where data is needed for the purpose of application development, building program extensions and conducting various test cycles. It is common practice in enterprise computing to take data from the production systems to fill the data component, required for these non-production environments. However, this practice is not always restricted to non-production environments. In some organizations, data that appears on terminal screens to call center operators may have masking dynamically applied based on user security permissions (e.g. preventing call center operators from viewing credit card numbers in billing systems).
The primary concern from a corporate governance perspective [1] is that personnel conducting work in these non-production environments are not always security cleared to operate with the information contained in the production data. This practice represents a security hole where data can be copied by unauthorized personnel, and security measures associated with standard production level controls can be easily bypassed. This represents an access point for a data security breach.
Data involved in any data masking or obfuscation must remain meaningful at several levels:
Substitution is one of the most effective methods of applying data masking and being able to preserve the authentic look and feel of the data records.
It allows the masking to be performed in such a manner that another authentic-looking value can be substituted for the existing value. [2] There are several data field types where this approach provides optimal benefit in disguising the overall data subset as to whether or not it is a masked data set. For example, if dealing with source data which contains customer records, real life surname or first name can be randomly substituted from a supplied or customised look up file. If the first pass of the substitution allows for applying a male first name to all first names, then the second pass would need to allow for applying a female first name to all first names where gender equals "F." Using this approach we could easily maintain the gender mix within the data structure, apply anonymity to the data records but also maintain a realistic looking database, which could not easily be identified as a database consisting of masked data.
This substitution method needs to be applied for many of the fields that are in database structures across the world, such as telephone numbers, zip codes and postcodes, as well as credit card numbers and other card type numbers like Social Security numbers and Medicare numbers where these numbers actually need to conform to a checksum test of the Luhn algorithm.
In most cases, the substitution files will need to be fairly extensive so having large substitution datasets as well the ability to apply customized data substitution sets should be a key element of the evaluation criteria for any data masking solution.
The shuffling method is a very common form of data obfuscation. It is similar to the substitution method but it derives the substitution set from the same column of data that is being masked. In very simple terms, the data is randomly shuffled within the column. [3] However, if used in isolation, anyone with any knowledge of the original data can then apply a "what if" scenario to the data set and then piece back together a real identity. The shuffling method is also open to being reversed if the shuffling algorithm can be deciphered.[ citation needed ]
Data shuffling overcomes reservations about using perturbed or modified confidential data because it retains all the desirable properties of perturbation while performing better than other masking techniques in both data utility and disclosure risk. [3]
Shuffling, however, has some real strengths in certain areas. If for instance, the end of year figures for financial information in a test data base, one can mask the names of the suppliers and then shuffle the value of the accounts throughout the masked database. It is highly unlikely that anyone, even someone with intimate knowledge of the original data could derive a true data record back to its original values.
The numeric variance method is very useful for applying to financial and date driven information fields. Effectively, a method utilising this manner of masking can still leave a meaningful range in a financial data set such as payroll. If the variance applied is around +/- 10% then it is still a very meaningful data set in terms of the ranges of salaries that are paid to the recipients.
The same also applies to the date information. If the overall data set needs to retain demographic and actuarial data integrity, then applying a random numeric variance of +/- 120 days to date fields would preserve the date distribution, but it would still prevent traceability back to a known entity based on their known actual date or birth or a known date value for whatever record is being masked.
Encryption is often the most complex approach to solving the data masking problem. The encryption algorithm often requires that a "key" be applied to view the data based on user rights. This often sounds like the best solution, but in practice the key may then be given out to personnel without the proper rights to view the data. This then defeats the purpose of the masking exercise. Old databases may then get copied with the original credentials of the supplied key and the same uncontrolled problem lives on.
Recently, the problem of encrypting data while preserving the properties of the entities got recognition and a newly acquired interest among the vendors and academia. New challenge gave birth to algorithms performing format-preserving encryption. These are based on the accepted Advanced Encryption Standard (AES) algorithmic mode recognized by NIST. [4]
Sometimes a very simplistic approach to masking is adopted through applying a null value to a particular field. The null value approach is really only useful to prevent visibility of the data element.
In almost all cases, it lessens the degree of data integrity that is maintained in the masked data set. It is not a realistic value and will then fail any application logic validation that may have been applied in the front end software that is in the system under test. It also highlights to anyone that wishes to reverse engineer any of the identity data that data masking has been applied to some degree on the data set.
Character scrambling or masking out of certain fields is also another simplistic yet very effective method of preventing sensitive information to be viewed. It is really an extension of the previous method of nulling out, but there is a greater emphasis on keeping the data real and not fully masked all together.
This is commonly applied to credit card data in production systems. For instance, an operator at a call centre might bill an item to a customer's credit card. They then quote a billing reference to the card with the last 4 digits of XXXX XXXX xxxx 6789. As an operator they can only see the last 4 digits of the card number, but once the billing system passes the customer's details for charging, the full number is revealed to the payment gateway systems.
This system is not very effective for test systems, but it is very useful for the billing scenario detailed above. It is also commonly known as a dynamic data masking method. [5] [6]
Additional rules can also be factored into any masking solution regardless of how the masking methods are constructed. Product agnostic white papers [7] are a good source of information for exploring some of the more common complex requirements for enterprise masking solutions, which include row internal synchronization rules, table internal synchronization rules and table [8] to Table Synchronization Rules.
Data masking is tightly coupled with building test data. Two major types of data masking are static and on-the-fly data masking.
Static data masking is usually performed on the golden copy of the database, but can also be applied to values in other sources, including files. In DB environments, production database administrators will typically load table backups to a separate environment, reduce the dataset to a subset that holds the data necessary for a particular round of testing (a technique called "subsetting"), apply data masking rules while data is in stasis, apply necessary code changes from source control, and/or and push data to desired environment. [9]
Deterministic masking is the process of replacing a value in a column with the same value whether in the same row, the same table, the same database/schema and between instances/servers/database types. Example: A database has multiple tables, each with a column that has first names. With deterministic masking the first name will always be replaced with the same value – “Lynne” will always become “Denise” – wherever “Lynne” may be in the database. [10]
There are also alternatives to the static data masking that rely on stochastic perturbations of the data that preserve some of the statistical properties of the original data. Examples of statistical data obfuscation methods include differential privacy [11] and the DataSifter method. [12]
On-the-fly data masking [13] happens in the process of transferring data from environment to environment without data touching the disk on its way. The same technique is applied to "Dynamic Data Masking" but one record at a time. This type of data masking is most useful for environments that do continuous deployments as well as for heavily integrated applications. Organizations that employ continuous deployment or continuous delivery practices do not have the time necessary to create a backup and load it to the golden copy of the database. Thus, continuously sending smaller subsets (deltas) of masked testing data from production is important. In heavily integrated applications, developers get feeds from other production systems at the very onset of development and masking of these feeds is either overlooked and not budgeted until later, making organizations non-compliant. Having on-the-fly data masking in place becomes essential.
Dynamic data masking is similar to on-the-fly data masking, but it differs in the sense that on-the-fly data masking is about copying data from one source to another source so that the latter can be shared. Dynamic data masking happens at runtime, dynamically, and on-demand so that there doesn't need to be a second data source where to store the masked data dynamically.
Dynamic data masking enables several scenarios, many of which revolve around strict privacy regulations e.g. the Singapore Monetary Authority or the Privacy regulations in Europe.
Dynamic data masking is attribute-based and policy-driven. Policies include:
Dynamic data masking can also be used to encrypt or decrypt values on the fly especially when using format-preserving encryption.
Several standards have emerged in recent years to implement dynamic data filtering and masking. For instance, XACML policies can be used to mask data inside databases.
There are six possible technologies to apply Dynamic data masking:
In latest years, organizations develop their new applications in the cloud more and more often, regardless of whether final applications will be hosted in the cloud or on- premises. The cloud solutions as of now allow organizations to use infrastructure as a service, platform as a service, and software as a service. There are various modes of creating test data and moving it from on-premises databases to the cloud, or between different environments within the cloud. Dynamic Data Masking becomes even more critical in cloud when customers need to protecting PII data while relying on cloud providers to administer their databases. Data masking invariably becomes the part of these processes in the systems development life cycle (SDLC) as the development environments' service-level agreements (SLAs) are usually not as stringent as the production environments' SLAs regardless of whether application is hosted in the cloud or on-premises.
In cryptography, a brute-force attack consists of an attacker submitting many passwords or passphrases with the hope of eventually guessing correctly. The attacker systematically checks all possible passwords and passphrases until the correct one is found. Alternatively, the attacker can attempt to guess the key which is typically created from the password using a key derivation function. This is known as an exhaustive key search.
Tokenization, when applied to data security, is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no intrinsic or exploitable meaning or value. The token is a reference that maps back to the sensitive data through a tokenization system. The mapping from original data to a token uses methods that render tokens infeasible to reverse in the absence of the tokenization system, for example using tokens created from random numbers. A one-way cryptographic function is used to convert the original data into tokens, making it difficult to recreate the original data without obtaining entry to the tokenization system's resources. To deliver such services, the system maintains a vault database of tokens that are connected to the corresponding sensitive data. Protecting the system vault is vital to the system, and improved processes must be put in place to offer database integrity and physical security.
In computer networking, a proxy server is a server application that acts as an intermediary between a client requesting a resource and the server providing that resource. It improves privacy, security, and performance in the process.
Key management refers to management of cryptographic keys in a cryptosystem. This includes dealing with the generation, exchange, storage, use, crypto-shredding (destruction) and replacement of keys. It includes cryptographic protocol design, key servers, user procedures, and other relevant protocols.
Internet security is a branch of computer security. It encompasses the Internet, browser security, web site security, and network security as it applies to other applications or operating systems as a whole. Its objective is to establish rules and measures to use against attacks over the Internet. The Internet is an inherently insecure channel for information exchange, with high risk of intrusion or fraud, such as phishing, online viruses, trojans, ransomware and worms.
Secure communication is when two entities are communicating and do not want a third party to listen in. For this to be the case, the entities need to communicate in a way that is unsusceptible to eavesdropping or interception. Secure communication includes means by which people can share information with varying degrees of certainty that third parties cannot intercept what is said. Other than spoken face-to-face communication with no possible eavesdropper, it is probable that no communication is guaranteed to be secure in this sense, although practical obstacles such as legislation, resources, technical issues, and the sheer volume of communication serve to limit surveillance.
Proxy re-encryption (PRE) schemes are cryptosystems which allow third parties (proxies) to alter a ciphertext which has been encrypted for one party, so that it may be decrypted by another.
Mental poker is the common name for a set of cryptographic problems that concerns playing a fair game over distance without the need for a trusted third party. The term is also applied to the theories surrounding these problems and their possible solutions. The name comes from the card game poker which is one of the games to which this kind of problem applies. Similar problems described as two party games are Blum's flipping a coin over a distance, Yao's Millionaires' Problem, and Rabin's oblivious transfer.
Attribute-based access control (ABAC), also known as policy-based access control for IAM, defines an access control paradigm whereby a subject's authorization to perform a set of operations is determined by evaluating attributes associated with the subject, object, requested operations, and, in some cases, environment attributes.
Privacy-enhancing technologies (PET) are technologies that embody fundamental data protection principles by minimizing personal data use, maximizing data security, and empowering individuals. PETs allow online users to protect the privacy of their personally identifiable information (PII), which is often provided to and handled by services or applications. PETs use techniques to minimize an information system's possession of personal data without losing functionality. Generally speaking, PETs can be categorized as either hard or soft privacy technologies.
In cryptography, format-preserving encryption (FPE), refers to encrypting in such a way that the output is in the same format as the input. The meaning of "format" varies. Typically only finite sets of characters are used; numeric, alphabetic or alphanumeric. For example:
Linoma Software was a developer of secure managed file transfer and IBM i software solutions. The company was acquired by HelpSystems in June 2016. Mid-sized companies, large enterprises and government entities use Linoma's software products to protect sensitive data and comply with data security regulations such as PCI DSS, HIPAA/HITECH, SOX, GLBA and state privacy laws. Linoma's software runs on a variety of platforms including Windows, Linux, UNIX, IBM i, AIX, Solaris, HP-UX and Mac OS X.
Database activity monitoring is a database security technology for monitoring and analyzing database activity. DAM may combine data from network-based monitoring and native audit information to provide a comprehensive picture of database activity. The data gathered by DAM is used to analyze and report on database activity, support breach investigations, and alert on anomalies. DAM is typically performed continuously and in real-time.
Cloud computing security or, more simply, cloud security, refers to a broad set of policies, technologies, applications, and controls utilized to protect virtualized IP, data, applications, services, and the associated infrastructure of cloud computing. It is a sub-domain of computer security, network security, and, more broadly, information security.
In Electronic Health Records (EHR's) data masking, or controlled access, is the process of concealing patient health data from certain healthcare providers. Patients have the right to request the masking of their personal information, making it inaccessible to any physician, or a particular physician, unless a specific reason is provided. Data masking is also performed by healthcare agencies to restrict the amount of information that can be accessed by external bodies such as researchers, health insurance agencies and unauthorised individuals. It is a method used to protect patients’ sensitive information so that privacy and confidentiality are less of a concern. Techniques used to alter information within a patient's EHR include data encryption, obfuscation, hashing, exclusion and perturbation.
WebLOAD is load testing tool, performance testing, stress test web applications. This web and mobile load testing and analysis tool is from RadView Software. Load testing tool WebLOAD combines performance, scalability, and integrity as a single process for the verification of web and mobile applications. It can simulate hundreds of thousands of concurrent users making it possible to test large loads and report bottlenecks, constraints, and weak points within an application.
Identity-based conditional proxy re-encryption (IBCPRE) is a type of proxy re-encryption (PRE) scheme in the identity-based public key cryptographic setting. An IBCPRE scheme is a natural extension of proxy re-encryption on two aspects. The first aspect is to extend the proxy re-encryption notion to the identity-based public key cryptographic setting. The second aspect is to extend the feature set of proxy re-encryption to support conditional proxy re-encryption. By conditional proxy re-encryption, a proxy can use an IBCPRE scheme to re-encrypt a ciphertext but the ciphertext would only be well-formed for decryption if a condition applied onto the ciphertext together with the re-encryption key is satisfied. This allows fine-grained proxy re-encryption and can be useful for applications such as secure sharing over encrypted cloud data storage.
Data-centric security is an approach to security that emphasizes the dependability of the data itself rather than the security of networks, servers, or applications. Data-centric security is evolving rapidly as enterprises increasingly rely on digital information to run their business and big data projects become mainstream. It involves the separation of data and digital rights management that assign encrypted files to pre-defined access control lists, ensuring access rights to critical and confidential data are aligned with documented business needs and job requirements that are attached to user identities.
Ride sharing networks face issues of user privacy like other online platforms do. Concerns surrounding the apps include the security of financial details, and privacy of personal details and location. Privacy concerns can also rise during the ride as some drivers choose to use passenger facing cameras for their own security. As the use of ride sharing services become more widespread so do the privacy issues associated with them.
Identity replacement technology is any technology that is used to cover up all or parts of a person's identity, either in real life or virtually. This can include face masks, face authentication technology, and deepfakes on the Internet that spread fake editing of videos and images. Face replacement and identity masking are used by either criminals or law-abiding citizens. Identity replacement tech, when operated on by criminals, leads to heists or robbery activities. Law-abiding citizens utilize identity replacement technology to prevent government or various entities from tracking private information such as locations, social connections, and daily behaviors.