Database administration

Last updated

Database administration is the function of managing and maintaining database management systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM Db2 and Microsoft SQL Server need ongoing management. As such, corporations that use DBMS software often hire specialized information technology personnel called database administrators or DBAs.

Contents

Responsibilities

Types

There are three types of DBAs:

  1. Systems DBAs (also referred to as physical DBAs, operations DBAs or production Support DBAs): focus on the physical aspects of database administration such as DBMS installation, configuration, patching, upgrades, backups, restores, refreshes, performance optimization, maintenance and disaster recovery.
  2. Development DBAs: focus on the logical and development aspects of database administration such as data model design and maintenance, DDL (data definition language) generation, SQL writing and tuning, coding stored procedures, collaborating with developers to help choose the most appropriate DBMS feature/functionality and other pre-production activities.
  3. Application DBAs: usually found in organizations that have purchased 3rd party application software such as ERP (enterprise resource planning) and CRM (customer relationship management) systems. Examples of such application software includes Oracle Applications, Siebel and PeopleSoft (both now part of Oracle Corp.) and SAP. Application DBAs straddle the fence between the DBMS and the application software and are responsible for ensuring that the application is fully optimized for the database and vice versa. They usually manage all the application components that interact with the database and carry out activities such as application installation and patching, application upgrades,database cloning, building and running data cleanup routines, data load process management, etc.

In larger organizations, individuals typically specialize in a specific type of database administration. However, in smaller organizations, it's common for a single person or team to handle multiple database administration roles.

Automation of database administration

The degree to which the administration of a database is automated dictates the skills and personnel required to manage databases. On one end of the spectrum, a system with minimal automation will require significant experienced resources to manage; perhaps 5-10 databases per DBA. Alternatively an organization might choose to automate a significant amount of the work that could be done manually therefore reducing the skills required to perform tasks. As automation increases, the personnel needs of the organization splits into highly skilled workers to create and manage the automation and a group of lower skilled "line" DBAs who simply execute the automation.

Database administration work is complex, repetitive, time-consuming and requires significant training. Since databases hold valuable and mission-critical data, companies usually look for candidates with multiple years of experience. Database administration often requires DBAs to put in work during off-hours (for example, for planned after hours downtime, in the event of a database-related outage or if performance has been severely degraded). DBAs are commonly well compensated for the long hours.

One key skill required and often overlooked when selecting a DBA is database recovery (a part of disaster recovery). It is not a case of “if” but a case of “when” a database suffers a failure, ranging from a simple failure to a full catastrophic failure. The failure may be data corruption, media failure, or user induced errors. In either situation the DBA must have the skills to recover the database to a given point in time to prevent a loss of data.

Database administration tools

Often, the DBMS software comes with certain tools to help DBAs manage the DBMS. Such tools are called native tools. For example, Microsoft SQL Server comes with SQL Server Management Studio and Oracle has tools such as SQL*Plus and Oracle Enterprise Manager/Grid Control. In addition, 3rd parties such as BMC, Quest Software, Embarcadero Technologies, patchVantage and SQL Maestro Group offer GUI tools to monitor the DBMS and help DBAs carry out certain functions inside the database more easily.

Another kind of database software exists to manage the provisioning of new databases and the management of existing databases and their related resources. The process of creating a new database can consist of hundreds or thousands of unique steps from satisfying prerequisites to configuring backups where each step must be successful before the next can start. A human cannot be expected to complete this procedure in the same exact way time after time - exactly the goal when multiple databases exist.

As the number of DBAs grows, without automation the number of unique configurations frequently grows to be costly/difficult to support. All of these complicated procedures can be modeled by the best DBAs into database automation software and executed by the standard DBAs. Software has been created specifically to improve the reliability and repeatability of these procedures such as Stratavia's Data Palette and GridApp Systems Clarity.

The impact of IT and cloud automation

Automated Database operations has grown since 2009, following Amazon Web Services introduction of AWS RDS, providing automated and managed database as a service. [1] Microsoft Azure launched a similar automated database as a service in 2010, with SQL Azure, providing automated backups, with geo-replication and high availability. The introduction of docker (software) containers has enhanced support for fast delivery of containerized database instances, and both Amazon Web Services and Microsoft Azure have enhanced automated support for containers in their respective services.

Third party support for database container images has grown, including MongoDB, PostgreSQL, MySQL from Oracle, and Microsoft SQL Server from Microsoft, and from independent port's of docker (software) from Windocks [2] Kubernetes, and the development of the Kubernetes Operator pattern by CoreOS, further extended the ability to orchestrate database container. [3] Kubernetes Operators have been used by third parties to enable the ability to automate database administration, including deployment of instances of a database, upgrade database versions, or perform backups. [4]

Newer technologies such as Stratavia's Data Palette suite and GridApp Systems Clarity have begun to increase the automation of databases causing the reduction of database related tasks. However at best this only reduces the amount of mundane, repetitive activities and does not eliminate the need for DBAs. The intention of DBA automation is to enable DBAs to focus on more proactive activities around database architecture, deployment, performance and service level management.

Every database requires a database owner account that can perform all schema management operations. This account is specific to the database and cannot log into Data Director. You can add database owner accounts after database creation. Data Director users must log in with their database-specific credentials to view the database, its entities, and its data or to perform database management tasks. Database administrators and application developers can manage databases only if they have appropriate permissions and roles granted to them by the organization administrator. The permissions and roles must be granted on the database group or on the database, and they only apply within the organization in which they are granted.

Learning database administration

There are several education institutes that offer professional courses, including late-night programs, to allow candidates to learn database administration. DBMS vendors such as Oracle, Microsoft and IBM offer certification programs to help companies to hire qualified DBA practitioners. College degree in Computer Science or related field is helpful but not necessarily a prerequisite.

See also

Related Research Articles

<span class="mw-page-title-main">Database</span> Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

<span class="mw-page-title-main">Ingres (database)</span>

Ingres Database is a proprietary SQL relational database management system intended to support large commercial and government applications.

<span class="mw-page-title-main">IBM Db2</span> Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB/2, then DB2 until 2017 and finally changed to its present form.

Database administrators (DBAs) use specialized software to store and organize data. The role may include capacity planning, installation, configuration, database design, migration, performance monitoring, security, troubleshooting, as well as backup and data recovery.

<span class="mw-page-title-main">Data dictionary</span> Set of metadata that contains definitions and representations of data elements

A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format". Oracle defines it as a collection of tables with metadata. The term can have one of several closely related meanings pertaining to databases and database management systems (DBMS):

NetApp, Inc. is an intelligent data infrastructure company that provides unified data storage, integrated data services, and cloud operations (CloudOps) solutions to enterprise customers. The company is based in San Jose, California. It has ranked in the Fortune 500 from 2012 to 2021. Founded in 1992 with an initial public offering in 1995, NetApp offers cloud data services for management of applications and data both online and physically.

Dataphor is an open-source truly-relational database management system (RDBMS) and its accompanying user interface technologies, which together are designed to provide highly declarative software application development. The Dataphor Server has its own storage engine or it can be a virtual, or federated, DBMS, meaning that it can utilize other database engines for storage.

ArcSDE is a server-software sub-system that aims to enable the usage of Relational Database Management Systems for spatial data. The spatial data may then be used as part of a geodatabase.

IBM Storage Protect is a data protection platform that gives enterprises a single point of control and administration for backup and recovery. It is the flagship product in the IBM Spectrum Protect family.

<span class="mw-page-title-main">EnterpriseDB</span> American software company

EnterpriseDB (EDB), a privately held company based in Massachusetts, provides software and services based on the open-source database PostgreSQL, and is one of the largest contributors to Postgres. EDB develops and integrates performance, security, and manageability enhancements into Postgres to support enterprise-class workloads. EDB has also developed database compatibility for Oracle to facilitate the migration of workloads from Oracle to EDB Postgres and to support the operation of many Oracle workloads on EDB Postgres.

Catalogic DPX is an enterprise-level data protection tool that backs up and restores data and applications for a variety of operating systems. It has data protection, disaster recovery and business continuity planning capabilities. Catalogic DPX protects physical servers or virtual machines on VMWare vSphere and Microsoft Hyper-V hypervisors, supports many database applications, including Oracle, SQL Server, SharePoint, Exchange, and SAP HANA. DPX supports agent-based or agent-less backups. Users can map to and use a backed up version of the database if something goes wrong with the primary version. DPX is managed from a single console and catalog. This allows for centralized control of both tape-based and disk-based data protection jobs across heterogeneous operating systems. DPX can protect data centers, remote sites and supports recovery from DR. DPX can protect data to disk, tape or cloud. It is used for various recovery use cases including file, application, BMR, VM or DR. DPX can spin up VMs from backup images, recover physical servers, bring up applications online from snapshot based backups, it can be used to recover from Ransomware.

An embedded database system is a database management system (DBMS) which is tightly integrated with an application software; it is embedded in the application. It is a broad technology category that includes:

Vinzant Software is a privately held company that is based in Hobart, IN. Vinzant Software develops and markets enterprise job scheduling products for platforms including Windows, Unix, Linux, IBM i, and MPE/ix. It was founded in 1988 by David Vinzant and has focused solely on job scheduling since 1995.

<span class="mw-page-title-main">WaveMaker</span> Low-code programming platform

WaveMaker is a Java-based low-code development platform designed for building software applications and platforms. The company, WaveMaker Inc., is based in Mountain View, California. The platform is intended to assist enterprises in speeding up their application development and IT modernization initiatives through low-code capabilities. Additionally, for independent software vendors (ISVs), WaveMaker serves as a customizable low-code component that seamlessly integrates into their products.

<span class="mw-page-title-main">Vertica</span> Software company

Vertica is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as CEOs later on.

Sentrigo was a privately held software company located in Santa Clara, California, USA, until its acquisition in April, 2011 by McAfee.

<span class="mw-page-title-main">OpenShift</span> Cloud computing software

OpenShift is a family of containerization software products developed by Red Hat. Its flagship product is the OpenShift Container Platform — a hybrid cloud platform as a service built around Linux containers orchestrated and managed by Kubernetes on a foundation of Red Hat Enterprise Linux. The family's other products provide this platform through different environments: OKD serves as the community-driven upstream, Several deployment methods are available including self-managed, cloud native under ROSA, ARO and RHOIC on AWS, Azure, and IBM Cloud respectively, OpenShift Online as software as a service, and OpenShift Dedicated as a managed service.

<span class="mw-page-title-main">SingleStore</span> Database management system

SingleStore is a proprietary, cloud-native database designed for data-intensive applications. A distributed, relational, SQL database management system (RDBMS) that features ANSI SQL support, it is known for speed in data ingest, transaction processing, and query processing.

Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by the Cloud Native Computing Foundation.

LPAR2RRD is an open-source software tool that is used for monitoring and reporting performance of servers, clouds and databases. It is developed by the Czech company XoruX.

References

  1. Perry, Yifat (2020-05-18). "AWS Database". NetApp BlueXP.
  2. Windocks does what Docker and Microsoft can't do
  3. CoreOS Introduces "Operators" to Simplify Configuration and Management of Modern Distributed Applications
  4. Kubernetes Operators in Depth