Data migration

Data migration is the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one computer storage system to another. Additionally, the validation of migrated data for completeness and the decommissioning of legacy data storage are considered part of the entire data migration process. [1] [2] Data migration is a key consideration for any system implementation, upgrade, or consolidation, and it is typically performed in such a way as to be as automated as possible, freeing up human resources from tedious tasks. Data migration occurs for a variety of reasons, including server or storage equipment replacements, maintenance or upgrades, application migration, website consolidation, disaster recovery, and data center relocation. [2]

The standard phases

As of 2011, "nearly 40 percent of data migration projects were over time, over budget, or failed entirely." [1] [3] Proper planning is therefore critical to an effective data migration. While the specifics of a data migration plan may vary, sometimes significantly, from project to project, IBM suggests that most data migration projects share three main phases: planning, migration, and post-migration. [2] Each of these phases has its own steps. During planning, dependencies and requirements are analyzed, migration scenarios are developed and tested, and a project plan incorporating this information is created. During the migration phase the plan is enacted, and during post-migration the completeness and thoroughness of the migration are validated, documented, and closed out, including any necessary decommissioning of legacy systems. [2] For applications of moderate to high complexity, these phases may be repeated several times before the new system is considered fully validated and deployed.

Planning: The data and applications to be migrated are selected based on business, project, and technical requirements and dependencies. Hardware and bandwidth requirements are analyzed. Feasible migration and back-out scenarios are developed, along with the associated tests, automation scripts, mappings, and procedures. Data cleansing and transformation requirements are also assessed, so that data quality can be improved and redundant or obsolete information eliminated. The migration architecture is decided on and developed, any necessary software licenses are obtained, and change management processes are started. [1] [2]
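
Such a plan often captures field-level mappings and cleansing rules in a machine-readable form. The following is a minimal sketch, in Python, of what such a mapping specification might look like; the legacy column names, target fields, and transformations are hypothetical and not drawn from any particular system.

```python
# Hypothetical field-mapping and cleansing specification for a migration plan.
# Each legacy column maps to a target column plus a cleansing transformation.
FIELD_MAPPINGS = {
    "CUST_NM":  ("customer_name", str.strip),                      # trim stray whitespace
    "CUST_PHN": ("phone",         lambda v: v.replace("-", "")),   # normalize phone format
    "CRT_DT":   ("created_at",    lambda v: v[:10]),               # keep the ISO date portion
}

def transform_row(legacy_row: dict) -> dict:
    """Apply the mapping to one legacy record, dropping unmapped (obsolete) fields."""
    return {
        target: transform(legacy_row[source])
        for source, (target, transform) in FIELD_MAPPINGS.items()
        if source in legacy_row
    }
```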

Migration: Hardware and software requirements are validated, and migration procedures are customized as needed. Pre-validation testing may also occur to ensure that requirements and customized settings function as expected. If all is deemed well, migration begins, including the primary acts of data extraction, where data is read from the old system, and data loading, where data is written to the new system. Additional verification steps ensure that the developed migration plan was enacted in full. [1] [2]
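
As a rough illustration of the extraction and loading steps, the sketch below uses Python's standard sqlite3 module to read rows from a source database in batches and write them to a target database. The file paths, table, and columns are placeholders, and it assumes the target schema already exists.

```python
import sqlite3

def migrate_customers(source_path: str, target_path: str, batch_size: int = 1000) -> int:
    """Extract rows from the old system and load them into the new one in batches."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(target_path)
    moved = 0
    try:
        cursor = src.execute("SELECT id, name, email FROM customers")       # extraction
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            dst.executemany(
                "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",  # loading
                batch,
            )
            moved += len(batch)
        dst.commit()
    finally:
        src.close()
        dst.close()
    return moved
```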

Post-migration: After data migration, results are subjected to data verification to determine whether data was accurately translated, is complete, and supports processes in the new system. During verification, there may be a need for a parallel run of both systems to identify areas of disparity and forestall erroneous data loss. Additional documentation and reporting of the migration project is conducted, and once the migration is validated as complete, legacy systems may be decommissioned. Migration close-out meetings officially end the migration process. [1] [2]
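
A simple form of such verification is to compare row counts and per-row fingerprints between the two systems. The sketch below assumes both systems are reachable through SQL (sqlite3 is used here purely as a stand-in) and that the table and columns shown exist.

```python
import hashlib
import sqlite3

def row_fingerprints(conn, query):
    """Hash each row so the two systems can be compared without exporting all data."""
    return {hashlib.sha256(repr(row).encode()).hexdigest() for row in conn.execute(query)}

def verify_migration(source_path, target_path):
    query = "SELECT id, name, email FROM customers ORDER BY id"
    with sqlite3.connect(source_path) as src, sqlite3.connect(target_path) as dst:
        src_fp = row_fingerprints(src, query)
        dst_fp = row_fingerprints(dst, query)
    return {
        "source_rows": len(src_fp),
        "target_rows": len(dst_fp),
        "missing_in_target": len(src_fp - dst_fp),
        "unexpected_in_target": len(dst_fp - src_fp),
    }
```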

Project versus process

There is a difference between data migration and data integration activities. Data migration is a project through which data will be moved or copied from one environment to another, and removed or decommissioned in the source. During the migration (which can take place over months or even years), data can flow in multiple directions, and there may be multiple simultaneous migrations. ETL (extract, transform, load) actions will be necessary, although the means of achieving these may not be those traditionally associated with the ETL acronym.

Data integration, by contrast, is a permanent part of the IT architecture, and is responsible for the way data flows between the various applications and data stores—and is a process rather than a project activity. Standard ETL technologies designed to supply data from operational systems to data warehouses would fit within the latter category. [4]

Categories

Data is stored on various media in files or databases, and is generated and consumed by software applications, which in turn support business processes. The need to transfer and convert data can be driven by multiple business requirements, and the approach taken to the migration depends on those requirements. Four major migration categories are proposed on this basis.

Storage migration

A business may choose to rationalize its physical media to take advantage of more efficient storage technologies. [2] This will result in having to move physical blocks of data from one tape or disk to another, often using virtualization techniques. The data format and content itself will not usually be changed in the process, and the migration can normally be achieved with minimal or no impact to the layers above. [5]
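
In practice this is handled by array, operating-system, or virtualization tooling, but the underlying idea can be sketched as a block-level copy with a checksum to confirm the data was carried over unchanged. The paths below are placeholders.

```python
import hashlib

def copy_blocks(src_path: str, dst_path: str, block_size: int = 1 << 20) -> str:
    """Copy fixed-size blocks from the old medium (or image file) to the new one."""
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            digest.update(block)
    # The same hash computed over the target should match, confirming an unchanged copy.
    return digest.hexdigest()
```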

Database migration

Similarly, it may be necessary to move from one database vendor to another, or to upgrade the database software being used. The latter case is less likely to require a physical data migration, but this can happen with major upgrades. In these cases a physical transformation process may be required since the underlying data format can change significantly. This may or may not affect behavior in the applications layer, depending largely on whether the data manipulation language or protocol has changed. [6] However, some modern applications are written to be almost entirely agnostic to the database technology, [7] so a change from Sybase, MySQL, IBM Db2 or SQL Server to Oracle should only require a testing cycle to be confident that both functional and non-functional performance has not been adversely affected.
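
The database-agnostic pattern mentioned above can be sketched as follows: application code depends only on a connection factory and standard SQL, so switching vendors is mainly a matter of changing the factory (and retesting). The vendor names, driver, and table are illustrative assumptions.

```python
import sqlite3

def get_connection(vendor: str = "sqlite"):
    """Only this factory knows which database product is in use."""
    if vendor == "sqlite":
        return sqlite3.connect("app.db")
    # Hypothetical branches for other vendors would go here; each driver exposes
    # the same DB-API 2.0 interface, which is what keeps the code below unchanged.
    raise ValueError(f"unsupported vendor: {vendor}")

def list_customers():
    conn = get_connection()
    try:
        cur = conn.cursor()
        cur.execute("SELECT id, name FROM customers")   # standard SQL only
        return cur.fetchall()
    finally:
        conn.close()
```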

Application migration

Changing application vendor—for instance a new CRM or ERP platform—will inevitably involve substantial transformation as almost every application or suite operates on its own specific data model and also interacts with other applications and systems within the enterprise application integration environment. [8] Furthermore, to allow the application to be sold to the widest possible market, commercial off-the-shelf packages are generally configured for each customer using metadata. Application programming interfaces (APIs) may be supplied by vendors to protect the integrity of the data they must handle.
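
Loading through such an API, rather than writing into the new application's database directly, might look like the sketch below. The endpoint, token, and field names are hypothetical; a real vendor API would define its own contract.

```python
import json
import urllib.request

def load_via_api(records, endpoint="https://crm.example.com/api/contacts", token="PLACEHOLDER"):
    """Push transformed records through a (hypothetical) vendor API that validates them."""
    for record in records:
        payload = {
            "name": record["legacy_name"],     # fields remapped to the new data model
            "email": record["legacy_email"],
        }
        request = urllib.request.Request(
            endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",
            },
            method="POST",
        )
        with urllib.request.urlopen(request) as response:  # vendor checks integrity before persisting
            response.read()
```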

Business process migration

Business processes operate through a combination of human and application systems actions, often orchestrated by business process management tools. When these change, they can require the movement of data from one store, database, or application to another to reflect the changes to the organization and to information about customers, products, and operations. Examples of such migration drivers are mergers and acquisitions, business optimization, and reorganization to attack new markets or respond to competitive threats. [9]

The first two categories of migration are usually routine operational activities that the IT department takes care of without the involvement of the rest of the business. The last two categories directly affect the operational users of processes and applications, are necessarily complex, and delivering them without significant business downtime can be challenging. A highly adaptive approach, concurrent synchronization, a business-oriented audit capability, and clear visibility of the migration for stakeholders—through a project management office or data governance team—are likely to be key requirements in such migrations. [9]
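
Concurrent synchronization, for example, can be approximated by periodically copying records changed in the legacy system since the last pass, keyed on a last-modified timestamp. The sketch below assumes such a column exists and again uses sqlite3 purely as a stand-in for the systems involved.

```python
import sqlite3
import time

def sync_changes(src_path, dst_path, last_sync):
    """Copy rows updated in the source since the previous synchronization pass."""
    with sqlite3.connect(src_path) as src, sqlite3.connect(dst_path) as dst:
        changed = src.execute(
            "SELECT id, name, email, updated_at FROM customers WHERE updated_at > ?",
            (last_sync,),
        ).fetchall()
        dst.executemany(
            "INSERT OR REPLACE INTO customers (id, name, email, updated_at) VALUES (?, ?, ?, ?)",
            changed,
        )
    return max((row[3] for row in changed), default=last_sync)

def run_sync_loop(src_path, dst_path, interval_seconds=300):
    last_sync = "1970-01-01T00:00:00"
    while True:   # keep both systems aligned during the cut-over window
        last_sync = sync_changes(src_path, dst_path, last_sync)
        time.sleep(interval_seconds)
```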

Migration as a form of digital preservation

Migration, which focuses on the digital object itself, is the act of transferring, or rewriting, data from an out-of-date medium to a current one, and has for many years been considered the only viable approach to the long-term preservation of digital objects. [10] Reproducing brittle newspapers onto microfilm is an example of such migration.
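
A small example of such format migration, assuming a folder of text files in an aging encoding (Latin-1 here) that are rewritten as UTF-8 copies while the originals are kept untouched; the directory names are placeholders.

```python
from pathlib import Path

def migrate_text_files(source_dir="legacy_docs", target_dir="preserved_docs"):
    """Rewrite files from an out-of-date encoding into a current one, keeping the originals."""
    target = Path(target_dir)
    target.mkdir(exist_ok=True)
    for old_file in Path(source_dir).glob("*.txt"):
        text = old_file.read_text(encoding="latin-1")                  # read the old format
        (target / old_file.name).write_text(text, encoding="utf-8")    # write the current one
```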

References

  1. Morris, J. (2012). "Chapter 1: Data Migration: What's All the Fuss?". Practical Data Migration (2nd ed.). BCS Learning & Development Ltd. pp. 7–15. ISBN 9781906124847.
  2. Dufrasne, B.; Warmuth, A.; Appel, J.; et al. (2017). "Chapter 1: Introducing disk data migration". DS8870 Data Migration Techniques. IBM Redbooks. pp. 1–16. ISBN 9780738440606.
  3. Howard, P. (23 August 2011). "Data Migration Report - 2011". Bloor Research International Limited. Retrieved 20 July 2018.
  4. King, T. (17 August 2016). "Data Integration vs. Data Migration; What's the Difference?". Solutions Review - Data Integration. LeadSpark, Inc. Retrieved 20 July 2018.
  5. Seiwert, C.; Klee, P.; Marinez, L.; et al. (2012). "Chapter 2: Migration techniques and processes". Data Migration to IBM Disk Storage Systems. IBM Redbooks. pp. 7–30. ISBN 9780738436289.
  6. Fowler, M.; Beck, K.; Brant, J.; et al. (2012). Refactoring: Improving the Design of Existing Code. Addison-Wesley. pp. 63–64. ISBN 9780133065268.
  7. Fronc, A. (1 March 2015). "Database-agnostic applications". DBA Presents. Retrieved 20 July 2018.
  8. Plivna, G. (1 July 2006). "Data migration from old to new application: An experience". gplivna.eu. Retrieved 20 July 2018.
  9. Allen, M.; Cervo, D. (2015). Multi-Domain Master Data Management: Advanced MDM and Data Governance in Practice. Morgan Kaufmann. pp. 61–62. ISBN 9780128011478.
  10. van der Hoeven, J.; Lohman, B.; Verdegem, R. (2007). "Emulation for Digital Preservation in Practice: The Results". The International Journal of Digital Curation. 2 (2): 123–132. doi:10.2218/ijdc.v2i2.35.
  11. Muira, G. (2007). "Pushing the Boundaries of Traditional Heritage Policy: Maintaining long-term access to multimedia content" (PDF). IFLA Journal. 33 (4): 323–326. doi:10.1177/0340035207086058. S2CID 110505620.