Identity correlation

Last updated July 09, 2024

Identity correlation is, in information systems, a process that reconciles and validates the ownership of disparate user account login IDs (user names) residing on systems and applications throughout an organization. It can permanently link those login IDs to particular individuals by assigning a unique identifier (also called a primary or common key) to all validated account login IDs.^[1]

The process of identity correlation validates that individuals only have account login IDs for the appropriate systems and applications a user should have access to according to the organization's business policies, access control policies, and various application requirements.

In the context of identity correlation, a unique identifier is one that is guaranteed to be unique among those used for a group and for a specific purpose. There are three main types, each corresponding to a different generation strategy:

Serial numbers, assigned incrementally
Random numbers, selected from a number space much larger than the maximum (or expected) number of objects to be identified. Although not strictly guaranteed to be unique, some identifiers of this type may be appropriate for identifying objects in many practical applications and so are referred to as “unique” within this context
Names or codes allocated by choice but forced to be unique by keeping a central registry such as the EPC Information Services of the EPCglobal Network

For identity correlation, a unique identifier is typically a serial or random number, usually represented as an additional attribute in the directory associated with each particular data source. However, adding an attribute to each system-specific directory may conflict with application or business requirements, depending on the organization. Under these circumstances, unique identifiers may not be an acceptable addition.
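The serial and random strategies described above can be sketched in a few lines of Python. This is only an illustration: the `UID-` prefix, the eight-digit padding, and the function names are assumptions made for the example, not part of any particular identity correlation product.

```python
import itertools
import uuid

# Serial strategy: a counter hands out the next unused integer.
# The "UID-" prefix and zero padding are illustrative choices only.
_serial = itertools.count(start=1)

def next_serial_id() -> str:
    """Return the next identifier in an incrementing sequence."""
    return f"UID-{next(_serial):08d}"

def random_id() -> str:
    """Return a random identifier (UUID version 4).

    The number space is so large that collisions are improbable,
    although uniqueness is not strictly guaranteed.
    """
    return str(uuid.uuid4())

first = next_serial_id()
second = next_serial_id()
rand_a = random_id()
rand_b = random_id()
print(first, second)  # UID-00000001 UID-00000002
```

A serial scheme needs a single authority that owns the counter, whereas random identifiers can be generated independently on each system.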

Basic Requirements of Identity Correlation

Identity Correlation involves several factors:

Linking Disparate Account IDs Across Multiple Systems or Applications

Many organizations must find a method to comply with audits that require them to link disparate application user identities with the actual people who are associated with those user identities.

Some individuals may have a fairly common first and/or last name, which makes it difficult to link the right individual to the appropriate account login ID, especially when those account login IDs are not linked to enough specific identity data to remain unique.

A typical construct of the login ID, for example, can be the first character of the given name plus the next seven characters of the surname, with an incrementing number appended for uniqueness. This would produce login IDs like jsmith12, jsmith13, jsmith14, etc. for users John Smith, James Smith, and Jack Smith, respectively.
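A minimal sketch of such a construct follows, assuming the base is the first initial plus up to seven characters of the surname and that a numeric suffix is incremented for each collision. The class name, the parameters, and the starting suffix of 12 are hypothetical and were chosen only to reproduce the sample IDs.

```python
from collections import defaultdict

class LoginIdAllocator:
    """Builds login IDs as <first initial><surname prefix><counter>."""

    def __init__(self, surname_chars: int = 7, first_suffix: int = 12):
        self.surname_chars = surname_chars
        self.first_suffix = first_suffix      # assumed starting number
        self._issued = defaultdict(int)       # base -> IDs already issued

    def allocate(self, given_name: str, surname: str) -> str:
        base = (given_name[:1] + surname[: self.surname_chars]).lower()
        suffix = self.first_suffix + self._issued[base]
        self._issued[base] += 1
        return f"{base}{suffix}"

allocator = LoginIdAllocator()
ids = [
    allocator.allocate(given, family)
    for given, family in [("John", "Smith"), ("James", "Smith"), ("Jack", "Smith")]
]
print(ids)  # ['jsmith12', 'jsmith13', 'jsmith14']
```

The three resulting IDs differ only in their suffix, which illustrates why a login ID alone is often not enough to tell which individual owns an account.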

Conversely, one individual might undergo a name change, either formally or informally, which can cause new account login IDs that the individual acquires to look drastically different from the account login IDs that the individual held before the change.

For example, a woman could get married and decide to use her new surname professionally. If her name was originally Mary Jones but she is now Mary Smith, she could call HR and ask them to update her contact information and email address with her new surname. This request would update her Microsoft Exchange login ID to mary.smith to reflect that surname change, but it might not actually update her information or login credentials in any other system she has access to. In this example, she could still be mjones in Active Directory and mj5678 in RACF.

Identity correlation should link the appropriate account login IDs to the right individual, both when the login IDs of different individuals are nearly indistinguishable and when the login IDs of a single individual look drastically different from system to system.

Discovering Intentional and Unintentional Inconsistencies in Identity Data

Inconsistencies in identity data typically develop over time in organizations as applications are added, removed, or changed and as individuals gain or retain an ever-changing stream of access rights as they join and leave the organization.

Application user login IDs do not always have a consistent syntax across different applications or systems. Many user login IDs are not specific enough to directly correlate back to one particular individual within an organization.

User data inconsistencies can also occur due to manual input errors, non-standard nomenclature, or name changes that might not be identically updated across all systems.

The identity correlation process should consider these inconsistencies to link up identity data that might seem unrelated upon initial investigation.
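One way to account for such inconsistencies is to reduce each value to a comparison key before matching. The sketch below uses only the Python standard library; the function names and the sample name variants are assumptions made for illustration.

```python
import unicodedata

def normalize_name(value: str) -> str:
    """Strip accents, punctuation, case differences, and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    cleaned = "".join(ch if ch.isalnum() else " " for ch in without_accents)
    return " ".join(cleaned.casefold().split())

def comparison_key(value: str) -> str:
    """Sort the name parts so that 'Smith, Mary' and 'Mary Smith' agree."""
    return " ".join(sorted(normalize_name(value).split()))

# Hypothetical spellings of one person's name on different systems.
variants = ["Mary  Smith", "SMITH, Mary", "mary.smith", "Márý Smith"]
keys = {comparison_key(v) for v in variants}
print(keys)  # {'mary smith'}
```

Normalization of this kind only removes formatting differences. It does not resolve name changes or genuinely different spellings, which still require matching rules or manual validation.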

Identifying Orphan or Defunct Account Login IDs

Organizations can expand and consolidate through mergers and acquisitions, which increases the complexity of business processes, policies, and procedures.

As an outcome of these events, users may move to different parts of the organization, take on a new position within the organization, or leave the organization altogether. At the same time, each new application that is added has the potential to produce a new, completely unique user ID.

Some identities may become redundant, others may violate application-specific or more widespread departmental policies, others could be related to non-human or system account IDs, and still others may no longer be applicable to a particular user environment.

Projects that span different parts of the organization or focus on more than one application become difficult to implement because user identities are often not properly organized or recognized as defunct due to business process changes.

An identity correlation process must identify all orphan or defunct account identities that no longer belong after such drastic shifts in an organization's infrastructure.
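As a rough sketch, orphan detection can be expressed as a comparison between the accounts found on each system and an authoritative list of active individuals. The `Account` record, the employee numbers, and the sample data below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Account:
    system: str
    login_id: str
    owner_id: Optional[str]  # hypothetical employee number; None if no owner is recorded

def find_orphans(accounts, active_employee_ids):
    """Return accounts whose recorded owner is missing or no longer active."""
    return [
        account for account in accounts
        if account.owner_id is None or account.owner_id not in active_employee_ids
    ]

# Assumed authoritative list, for example exported from an HR system.
active_employee_ids = {"E100", "E200"}
accounts = [
    Account("Active Directory", "mjones", "E100"),
    Account("RACF", "mj5678", "E100"),
    Account("Active Directory", "bkent", "E300"),  # owner has left
    Account("Exchange", "svc_backup", None),       # non-human account
]
orphans = find_orphans(accounts, active_employee_ids)
print([a.login_id for a in orphans])  # ['bkent', 'svc_backup']
```

Accounts flagged this way are candidates for review rather than automatic removal, since system accounts may legitimately have no individual owner.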

Validating Individuals to their Appropriate Account IDs

Regulations such as the Sarbanes-Oxley Act and the Gramm-Leach-Bliley Act require organizations to ensure the integrity of each user across all systems and to account for all access a user has to the various back-end systems and applications in an organization.

If implemented correctly, identity correlation will expose compliance issues. Auditors frequently ask organizations to account for who has access to what resources. For companies that have not already fully implemented an enterprise identity management solution, identity correlation and validation are required to adequately attest to the true state of an organization's user base.

This validation process typically requires interaction with individuals who are familiar with the organization's user base from an enterprise-wide perspective, as well as with those who are responsible for and knowledgeable about each system-specific or application-specific user base.

In addition, much of the validation process might ultimately involve direct communication with the individual in question to confirm particular identity data that is associated with that specific individual.

Assigning a unique primary or common key for every system or application Account ID that is attached to each individual

In response to various compliance pressures, organizations can introduce unique identifiers for their entire user base to validate that each user belongs in each specific system or application in which he/she has login capabilities.

To put such a policy into effect, individuals familiar with the organization's entire user base and with each system-specific user base must be responsible for validating which identities should be linked together and which should be disassociated from each other.

Once the validation process is complete, a unique identifier can be assigned to that individual and his or her associated system-specific account login IDs.
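The assignment step can be sketched as follows, assuming validation has already produced groups of accounts that belong to the same individual. The function name, the (system, login ID) tuples, and the use of a random UUID as the common key are illustrative assumptions.

```python
import uuid

def assign_common_key(validated_groups):
    """Map each (system, login_id) pair to the common key of its owner.

    Every group is assumed to contain the accounts of exactly one
    individual, as confirmed by a prior validation step.
    """
    key_by_account = {}
    for group in validated_groups:
        common_key = str(uuid.uuid4())  # one new key per individual
        for account in group:
            key_by_account[account] = common_key
    return key_by_account

# Hypothetical validated groups: one list of accounts per individual.
validated_groups = [
    [("Exchange", "mary.smith"), ("Active Directory", "mjones"), ("RACF", "mj5678")],
    [("Active Directory", "jsmith12")],
]
keys = assign_common_key(validated_groups)
mary_keys = {keys[account] for account in validated_groups[0]}
print(len(mary_keys))  # 1
```

After this step, three login IDs that look unrelated by name share one key, while the fourth account carries a different key.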

Approaches to Linking Disparate Account IDs

As mentioned above, in many organizations, users may sign into different systems and applications using different login IDs. There are many reasons to link these into 'enterprise-wide' user profiles.

There are a number of basic strategies to perform this correlation, or "ID mapping":

Assume that account IDs are the same:
- In this case, mapping is trivial.
- This actually works in many organizations, in cases where a rigorous and standardized process has been used to assign IDs to new users for a long time.
Import mapping data from an existing system:
- If an organization has implemented a robust process for mapping IDs to users over a long period, this data is already available and can be imported into any new Identity management system.
Exact matching on attribute values:
- Find one identity attribute or a combination of attributes on one system that correlate to one or more attributes on another system.
- Connect IDs on the two systems by finding users whose attribute(s) are the same.
Approximate matching on attribute values:
- The same as above, but instead of requiring attributes or expressions to match exactly, tolerate some differences.
- This allows for misspelled, inconsistently capitalized, and otherwise somewhat diverse names and similar identity values.
- The risk here is that accounts that should not be connected will accidentally be matched by this process.
Self-service login ID reconciliation:
- Invite users to fill in a form and indicate which IDs, on which systems, they own.
- Users might lie or make mistakes—so it's important to validate user input, for example by asking users also to provide passwords and to check those passwords.
- Users might not recognize system names—so it's important to offer alternatives or ask users for IDs+passwords in general, rather than asking them to specify which system those IDs are for.
Hire a consultant and/or do it manually:
- This still leaves open the question of where the data comes from—perhaps by interviewing every user in question?
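The exact and approximate matching strategies listed above can be sketched with the Python standard library. The record layout, the sample data, and the 0.85 similarity threshold are assumptions chosen for illustration; `difflib.SequenceMatcher` is used here as one possible similarity measure, not as the method of any particular product.

```python
from difflib import SequenceMatcher

# Hypothetical records from an HR system and from one target system.
hr_records = [
    {"emp_id": "E100", "name": "Mary Smith", "email": "mary.smith@example.com"},
    {"emp_id": "E200", "name": "John Smith", "email": "john.smith@example.com"},
]
system_accounts = [
    {"login": "msmith", "name": "Mary Smyth", "email": "mary.smith@example.com"},
    {"login": "jsmith12", "name": "Jon Smith", "email": ""},
]

def exact_match(people, accounts, attribute):
    """Pair records whose attribute values are identical, ignoring case."""
    index = {
        acct[attribute].lower(): acct["login"]
        for acct in accounts if acct[attribute]   # skip empty values
    }
    return [
        (person["emp_id"], index[person[attribute].lower()])
        for person in people if person[attribute].lower() in index
    ]

def approximate_match(people, accounts, attribute, threshold=0.85):
    """Pair records whose attribute values are similar enough."""
    pairs = []
    for person in people:
        for acct in accounts:
            score = SequenceMatcher(
                None, person[attribute].lower(), acct[attribute].lower()
            ).ratio()
            if score >= threshold:
                pairs.append((person["emp_id"], acct["login"]))
    return pairs

exact_pairs = exact_match(hr_records, system_accounts, "email")
fuzzy_pairs = approximate_match(hr_records, system_accounts, "name")
print(exact_pairs)  # [('E100', 'msmith')]
print(fuzzy_pairs)  # [('E100', 'msmith'), ('E200', 'jsmith12')]
```

Exact matching on email finds only one pair because the second account has no email address. Approximate matching on names recovers both, but a lower threshold would also begin to pair different people with similar names, which is the risk noted above.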

Common Barriers to Performing Identity Correlation

Privacy Concerns

Often, any process that requires an in-depth look into identity data raises privacy and disclosure concerns. Part of the identity correlation process implies that each data source will need to be compared against an authoritative data source to ensure consistency and validity against relevant corporate policies and access controls.

Any such comparison that involves exposure of enterprise-wide, authoritative, HR-related identity data will require various non-disclosure agreements either internally or externally, depending on how an organization decides to undergo an identity correlation exercise.

Because authoritative data is frequently highly confidential and restricted, such concerns may prevent an organization from performing an identity correlation activity thoroughly and sufficiently.

Extensive Time and Effort Requirements

Most organizations experience difficulties understanding the inconsistencies and complexities within their identity data across all their data sources. Typically, the process cannot be completed accurately or sufficiently by manually comparing two lists of identity data or by executing simple scripts to find matches between two different data sets. Even if an organization can dedicate full-time individuals to such an effort, these methods usually do not expose an adequate percentage of defunct identities, validate an adequate percentage of matched identities, or identify system (non-person) account IDs well enough to pass the typical requirements of an identity-related audit.

Manual efforts to accomplish identity correlation require a great deal of time and staff effort and do not guarantee that the work will be completed successfully or in a compliant fashion.

Because of this, automated identity correlation solutions have entered the marketplace to provide less labor-intensive ways of handling identity correlation exercises.

Typical automated identity correlation solution functionality includes the following characteristics:

Analysis and comparison of identities within multiple data sources
Flexible match criteria definitions and assignments for any combination of data elements between any two data sources
Easy connectivity either directly or indirectly to all permissible sources of data
Out-of-the-box reports and/or summaries of data match results
Ability to manually override matched or unmatched data combinations
Ability to view data results on a fine-grained level
Assignment of unique identifiers to pre-approved or manually validated matched data
Export abilities to send verified user lists back to source systems and/or provisioning solutions
Ability to customize data mapping techniques to refine data matches
Role-based access controls built into the solution to regulate identity data exposures as data is loaded, analyzed, and validated by various individuals both inside and outside of the organization
Ability to validate identity data against end-users more quickly or efficiently than through manual methodologies
Collection of identity attributes from personal mobile devices through partial identity extraction^[2]
Profiling of web surfing and social media behavior through tracking mechanisms^[3]
Correlation of identities across systems through biometric measurements of the users in question^[4]
Centralized identity brokerage systems that administer and access identity attributes across identity silos^[5]

Three Methods of Identity Correlation Project Delivery

Identity correlation solutions can be implemented under three distinct delivery models. These delivery models are intended to accommodate different budget and staffing requirements and to meet both short-term and long-term project goals and initiatives.

Software Purchase – This is the classic Software Purchase model where an organization purchases a software license and runs the software within its own hardware infrastructure.

Training is available and recommended
Installation Services are optional

Identity Correlation as a Service (ICAS) – ICAS is a subscription-based service where a client connects to a secure infrastructure to load and run correlation activities. This offering provides full functionality offered by the identity correlation solution without owning and maintaining hardware and related support staff.

Turn-Key Identity Correlation – A Turn-key methodology requires a client to contract with and provide data to a solutions vendor to perform the required identity correlation activities. Once completed, the solutions vendor will return correlated data, identify mismatches, and provide data integrity reports.

Validation activities will still require some direct feedback from individuals within the organization who understand the state of the organizational user base from an enterprise-wide viewpoint and those within the organization who are familiar with each system-specific user base. In addition, some validation activities might require direct feedback from individuals within the user base itself.

A Turn-Key solution can be performed as a single one-time activity, monthly, quarterly, or even as part of an organization's annual validation activities. Additional services are available, such as:

Email Campaigns to help resolve data discrepancies
Consolidated or merged list generation

References

[1] Harris, Shon (November 9, 2007). CISSP Certification All-In-One Exam Guide (4th ed.). McGraw-Hill Osborne Media.
[2] Fritsch, Lothar; Momen, Nurul (2017). "Derived Partial Identities Generated from App Permissions". Gesellschaft für Informatik: 117–130.
[3] US 8566866, Fleischman, Michael Ben, "Web identity to social media identity correlation", published 2013-10-22, assigned to Bluefin Labs Inc.
[4] Ng-Kruelle, Grace; Swatman, Paul A.; Hampe, J. Felix; Rebne, Douglas S. (2006). "Biometrics and e-Identity (e-Passport) in the European Union: End-user perspectives on the adoption of a controversial innovation". Journal of Theoretical and Applied Electronic Commerce Research. 1 (2): 12–35. doi:10.3390/jtaer1020010. ISSN 0718-1876.
[5] Bruegger, Bud P.; Roßnagel, Heiko (2016). Towards a decentralized identity management ecosystem for Europe and beyond. Gesellschaft für Informatik e.V. ISBN 978-3-88579-658-9.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
