Data steward

A data steward is an oversight or data governance role within an organization, and is responsible for ensuring the quality and fitness for purpose of the organization's data assets, including the metadata for those data assets. A data steward may share some responsibilities with a data custodian, such as the awareness, accessibility, release, appropriate use, security and management of data. [1] A data steward would also participate in the development and implementation of data assets. A data steward may seek to improve the quality and fitness for purpose of other data assets their organization depends upon but is not responsible for.

Data stewards have a specialist role that uses an organization's data governance processes, policies, guidelines and responsibilities to administer all of the organization's data in compliance with policy and/or regulatory obligations. The overall objective of a data steward is the quality of the data assets, datasets, data records and data elements. [1] [2] This includes documenting metainformation for the data, such as definitions, related rules/governance, physical manifestation, and related data models (most of these properties being specific to an attribute/concept relationship); identifying the responsibilities of owners and custodians; providing relations insight [ definition needed ] pertaining to attribute quality; helping projects with their data requirements; and documenting capture rules.

Data stewards begin the stewarding process by identifying the data assets and elements they will steward; the ultimate results are standards, controls and data entry.[ citation needed ] The steward works closely with business glossary standards analysts (for standards), with data architects and modelers (for standards), with data quality (DQ) analysts (for controls) and with operations team members (so that good-quality data is entered in line with business rules).

Data stewardship roles are common when organizations attempt to exchange data precisely and consistently between computer systems and to reuse data-related resources.[ citation needed ] Master data management often[ quantify ] refers to the need for data stewardship for its implementation to succeed. Data stewardship must have a precise purpose: ensuring that data is fit for purpose.

Data steward responsibilities

A data steward ensures that each assigned data element:

  1. Has a clear and unambiguous data element definition
  2. Does not conflict with other data elements in the metadata registry (duplicates, overlaps, etc. are removed)
  3. Has clear enumerated value definitions if it is of type Code
  4. Is still being used (unused data elements are removed)
  5. Is being used consistently in various computer systems
  6. Is fit for the purpose for which it is used (data fitness)
  7. Has adequate documentation on appropriate usage and notes
  8. Has its origin and sources of authority documented
  9. Is protected against unauthorised access or change
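
Several of the checks above can be automated once data elements are held in a registry. The following Python sketch is illustrative only: the `DataElement` fields, the `audit_element` function and the exact-match duplicate test are assumptions made for this example, not part of any particular metadata registry product or standard. It covers only the checks that can be read from simple registry fields (definition, duplicates, enumerated values, usage, source of authority and assigned steward).

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """Hypothetical registry entry; the field names are illustrative only."""
    name: str
    definition: str = ""
    data_type: str = "Text"
    enumerated_values: dict[str, str] = field(default_factory=dict)  # code -> meaning
    used_by_systems: list[str] = field(default_factory=list)
    source_of_authority: str = ""
    steward: str = ""


def audit_element(element: DataElement, registry: list[DataElement]) -> list[str]:
    """Return a list of stewardship issues found for one data element."""
    issues = []
    definition = element.definition.strip().lower()

    # A clear definition must exist.
    if not definition:
        issues.append("missing definition")

    # Crude duplicate test (an assumption): another element with identical definition text.
    duplicates = sorted(
        other.name
        for other in registry
        if other is not element and definition
        and other.definition.strip().lower() == definition
    )
    if duplicates:
        issues.append("definition duplicates: " + ", ".join(duplicates))

    # Elements of type Code need a definition for every enumerated value.
    if element.data_type == "Code":
        undefined = sorted(
            code for code, meaning in element.enumerated_values.items()
            if not meaning.strip()
        )
        if not element.enumerated_values:
            issues.append("code element has no enumerated values")
        elif undefined:
            issues.append("enumerated values lack definitions: " + ", ".join(undefined))

    # Unused elements are candidates for removal.
    if not element.used_by_systems:
        issues.append("not used by any system")

    # Origin and source of authority should be documented.
    if not element.source_of_authority.strip():
        issues.append("missing source of authority")

    # Every element should have a person to contact.
    if not element.steward.strip():
        issues.append("no steward assigned")

    return issues
```

Calling `audit_element` for every entry in a registry gives the steward a worklist of elements needing attention. Checks such as consistent use across systems or protection against unauthorised change depend on information held outside the registry and are deliberately left out of this sketch.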

Responsibilities of data stewards vary between different organisations and institutions. For example, at Delft University of Technology, data stewards are perceived as the first contact point for any questions related to research data. They also have subject-specific background allowing them to easily connect with researchers and to contextualise data management problems to take into account disciplinary practices. [3]

Types of data stewards

Depending on the set of data stewardship responsibilities assigned to an individual, there are four types (or dimensions of responsibility) of data stewards typically found within an organization:

  1. Data object data steward - responsible for managing reference data and attributes of one business data entity
  2. Business data steward - responsible for managing critical data, both reference and transactional, created or used by one business function. The data steward may also serve as a liaison between the organization's data users and technical teams, helping to bridge the gap between business needs and technical requirements. They may also play a role in educating others within the organization about best practices for data management, and advocating for data-driven decision-making.
  3. Process data steward - responsible for managing data across one business process
  4. System data steward - responsible for managing data for at least one IT system [4]

Benefits of data stewardship

Systematic data stewardship can foster:

  1. Faster analysis
  2. Consistent use of data management resources
  3. Easy mapping of data between computer systems and exchange documents
  4. Lower costs associated with migration to (for example) Service Oriented Architecture (SOA)
  5. Mitigation of data risk
  6. Better control of privacy, legal and error-related risks

Assigning each data element to a person can seem like an unimportant step, but many groups[ which? ] have found that users place greater trust in, and make greater use of, systems where they can contact a person with questions about each data element.

Examples

Delft University of Technology (TU Delft) offers an example of data stewardship implementation at a research institution. In 2017 the Data Stewardship Project was initiated at TU Delft to address research data management needs in a disciplinary manner across the whole campus. [5] Dedicated data stewards with subject-specific background were appointed at every TU Delft faculty to support researchers with data management questions and to act as a linking point with the other institutional support services. The project is coordinated centrally by TU Delft Library, and it has its own website, [6] blog [7] and a YouTube channel. [8]

The EPA metadata registry furnishes an example of data stewardship. Note that each data element therein has a "POC" (point of contact).

In 2023, ETH Zurich launched the Data Stewardship Network (DSN) to facilitate collaboration among employees engaged in data management, analysis, and code development across research groups. The DSN serves as a platform for networking and knowledge exchange, aiming to professionalize the role of data stewards who support research data management and reproducible workflows. Established by the team for Research Data Management and Digital Curation at the ETH Library, the DSN collaborates with Scientific IT Services to provide expertise in areas such as storage infrastructure and reproducible workflows. [9]

Data stewardship applications

A new market for data governance applications is emerging, one in which both technical and business staff, acting as stewards, manage policies. Like previous generations, these new applications deliver a strong business glossary capability, but they do not stop there. Vendors are introducing additional features that address the concerns of business stewards as well as those of technical stewards. [10]

Information stewardship applications are business solutions used by business users acting in the role of information steward (interpreting and enforcing information governance policy, for example). These developing solutions represent, for the most part, an amalgam of a number of disparate, previously IT-centric tools already on the market, but are organized and presented in such a way that information stewards (a business role) can support the work of information policy enforcement as part of their normal, business-centric, day-to-day work in a range of use cases.

The initial push for the formation of this new category of packaged software came from operational use cases, that is, the use of business data in and between transactional and operational business applications. This is where most master data management efforts are undertaken in organizations. However, interest is now growing faster in data lakes for more analytical use cases. [11]

Some metadata management vendors, such as Alation, have started highlighting the importance of data stewards to employees interested in using data to make business decisions. [12]

References

  1. Cramer, Jonathan James (March 5, 2019). "6 Key Responsibility of the Invaluable Data Steward". DNB. Archived from the original on March 28, 2019. Retrieved November 11, 2022.
  2. "What is Data Stewardship? Its Importance, Benefits, Programs and more". Simplilearn. November 30, 2021. Archived from the original on January 21, 2022. Retrieved November 11, 2022.
  3. NewMedia Centre (2018-05-16), 1 Data Stewardship at the TU Delft V2, archived from the original on 2021-12-19, retrieved 2018-06-12
  4. "Understanding the different types of a data steward - LightsOnData". LightsOnData. 2018-06-13. Retrieved 2018-06-20.
  5. Teperek, Marta; Cruz, Maria J.; Verbakel, Ellen; Böhmer, Jasmin K.; Dunning, Alastair (2018-01-22). "Data Stewardship – addressing disciplinary data management needs". Open Science Framework. doi:10.17605/OSF.IO/MJK9T. S2CID 59344239.
  6. "Data Stewardship". TU Delft. Archived from the original on 2018-06-12. Retrieved 2018-06-12.
  7. "Data Stewardship". Open Working. 2018-02-13. Retrieved 2018-06-12.
  8. "Data Stewardship TU Delft". YouTube. Retrieved 2018-06-12.
  9. "Launch of the Data Stewardship Network at ETH Zurich". ethz.ch. 2023-01-18. Retrieved 2024-04-15.
  10. "The Forrester Wave™: Data Governance Stewardship Applications, Q1 2016". www.forrester.com. Retrieved 2016-12-20.
  11. De Simoni, Guido (15 April 2016). "Market Guide for Information Stewardship Applications". Gartner.
  12. "Magic Quadrant for Metadata Management Solutions". Gartner. 9 August 2018.