Evolutionary database design

Last updated September 27, 2021

Evolutionary database design involves incremental improvements to the database schema so that it can be continuously updated with changes, reflecting the customer's requirements. People across the globe work on the same piece of software at the same time hence, there is a need for techniques that allow a smooth evolution of database as the design develops. Such methods utilize automated refactoring and continuous integration so that it supports agile methodologies for software development. These development techniques are applied on systems that are in pre-production stage as well on systems that have already been released. These techniques not only cover relevant changes in the database schema according to customer's changing needs, but also migration of modified data into the database and also customizing the database access code accordingly without changing the data semantics.^[1]

History

After using the waterfall model for a long time, the software industry has witnessed a rise in adoption of agile methods for software development. Agile methodologies don’t assume requirements to be permanent at any stage of the software life cycle. These methods are designed to support sporadic changes in contrast to waterfall design technique. An important part of this approach is iterative development, where the entire software life-cycle is run multiple times during the life of a project. Every iteration witnesses the complete software development life cycle despite the iterations being of short duration that can vary between weeks to a few months.^[1]

Before the adoption of these methodologies, the entire system was designed before starting to develop the code. The same principle was applied to the database schema as well where it was considered to be derived out of the software requirements which were in turn developed by collaboration between the customer, end-users, business analysts, etc. and these requirements were not expected to change with the progress in the software development. This approach proved to be cumbersome because as time progressed, the redundancies in the existing database schema in the form of unused rows or columns were evident. This redundancy along with data quality problems went on to become a costly affair. It was concluded that the practice of not having design interleaved with construction and testing was highly inefficient.^[1]

Techniques

As mentioned in the previous section evolutionary methods are iterative in nature and these methods have become immensely popular over last two decades. Evolutionary database design aims to construct the database schema over the course of the project instead of building the entire database schema at the beginning of the project. This method of database design can capture and deal effectively with the changing requirements of projects.

There are five evolutionary database design techniques that can aid developers in building their database in an iterative fashion. A brief overview about the five techniques are provided below.

Database refactoring

Refactoring is the process of making changes to the program without affecting the functionality of the program. Database refactoring is the technique of implementing small changes to the database schema without affecting the functionality and information stored in the database.^[2] The main purpose of database refactoring is to improve the database design so that the database is more in-sync with the changing requirements. The user can modify tables, views, stored procedures and triggers. Dependency between the database and external applications make database refactoring a challenge.

Evolutionary data modeling

Data modeling is the technique of identifying entities, associating attributes to the entities and deciding the data structure to represent the attributes.^[3] In the traditional database scenario, a logical data model is created at the beginning to represent the entities and their associated attributes. In evolutionary data modeling the technique of data modeling is performed in an iterative manner, that is multiple data models are developed, each model representing a different aspect of the database. This kind of data modeling technique is practiced in an agile environment and it is one of the main principles of agile development.^[4]

Database regression testing

Whenever a new functionality is added to a system, it is essential to verify that the update does not corrupt or render the system unusable. In a database, the business logic is implemented in stored procedures, data validation rules and referential integrity and they have to be tested thoroughly when any change is implemented in the system. Regression testing is the process of executing all the test cases whenever a new feature is added to the system. test-first development (TFD) is a form of regression testing followed in evolutionary database design. The steps involved in TFD approach are,^[3]

Before adding a new function to the system, add a test to the test case suite such that the system fails the test
Run the tests, either the entire set of test cases or just a subset and ensure that the newly added test does indeed fail
Update the function such that the test passes
Run the tests again to ensure that all they succeed and that the system is not broken

Configuration management of database artifacts

Configuration management is a detailed recording of versions and updates that have been applied to any system. Configuration management is useful in rolling back updates and changes which have impacted the system in a negative manner. To ensure that any updates made in database refactoring can be rolled back, it is important to maintain database artifacts like data definition language scripts, data model files, reference data, stored procedures, etc. in a configuration management system.^[5]

Developer sandboxes

A sandbox is a fully functional environment in which the system can be built, tested and executed. In order to make changes to the database schema in an evolutionary manner it is ideal for every developer to have his/her own physical sandbox, copy of source code and a copy of database. In a sandbox environment the developer can make changes to the database schema and run tests without affecting the work of other developers and other environments. Once the change has been implemented successfully, it is promoted to pre-production environment where in acceptance testing is performed and after the acceptance tests succeed it is deployed into production.

Advantages and disadvantages

Advantages

High quality of database design: In evolutionary database design, the developer makes small changes to the database schema in an incremental manner and this achieves a highly optimized database schema.
Handling change: In a traditional database approach, a lot of time is spent in remodeling and restructuring the database when the requirements change. In evolutionary database technique, the schema of the database is adjusted periodically to keep up with the changing requirements. Hence, evolutionary database design technique is better suited in handling the changing requirements.
Guaranteed working of system at all times: The evolutionary database design approach follows test-first development model, in which the complete working of a system is tested before and after implementing an update. Hence, it is guaranteed that the system always works.
Compatible with software development: The IT industry is progressing towards agile method of software development and evolutionary database design ensures that data development is in sync with software development.
Reduced overall effort: In an evolutionary environment only the functionality that is required at that moment is implemented and no more.

Disadvantages

Cultural impediments: Evolutionary database design approach is relatively a newer concept and many well qualified data professionals still advocate the traditional approach. Therefore, most of the databases are still being designed in a serial fashion and evolutionary database design is yet to gain support and traction among experienced data professionals.
Requires a learning curve: Most of the developers are more familiar with the traditional approach and it takes time to learn evolutionary design as it is not intuitive.
Complex: When the database has many external dependencies, making changes to the schema becomes all the more complicated as the external dependencies should also be updated to cope up with the changes made in the database schema. With the increase in number of dependencies, Evolutionary Database Design approach becomes extremely complex.

Comparison with traditional database design

Traditional database design technique does not support changes like evolutionary database design technique.'Unfortunately, the traditional data community assumed that evolving database schema is a hard thing to do and as a result never thought through how to do it.'^[1] In a way, the evolutionary design is better for application developers and traditional design is better for data professionals.^[6]

Properties	Traditional Database Design	Evolutionary Database Design
Design	Traditional databases were developed by collaboration between business analysts and users.	Evolutionary Databases were designed by software developers and data professionals.
Design Issues	They demonstrate some design issues in databases. Commercially available databases were developed by experienced individuals, but are now being serviced by the database and not data professionals.^[6]	They are developed by a close alliance of software developers and data professionals. The database evolves with the development and hence is processed by the same individuals who were responsible for development.
Approach towards change	Any change requested by the user is incorporated in the logical model followed by the physical model and then tested to ensure perfect functionality.^[6]	Change is an integral part of evolutionary database design. Any change requested by the user is immediately implemented in the database as well as the code. The data migration scripts need to be updated as well.
Dependency on ER diagram	Traditional design is methodological and due to its dependency on ER diagram and its detailed design phases such as user, logic, and physical we can track data as well as its meaning.^[6]	Design is interleaved between stages for evolutionary database design. Thus, the relationship between entities can change over software development cycle and so does the ER diagram.

Tools

Given below are a list of tools that provide the functionality of designing and developing a database in an evolutionary manner.

Related Research Articles

In computer programming and software design, code refactoring is the process of restructuring existing computer code—changing the factoring—without changing its external behavior. Refactoring is intended to improve the design, structure, and/or implementation of the software, while preserving its functionality. Potential advantages of refactoring may include improved code readability and reduced complexity; these can improve the source code's maintainability and create a simpler, cleaner, or more expressive internal architecture or object model to improve extensibility. Another potential goal for refactoring is improved performance; software engineers face an ongoing challenge to write programs that perform faster or use less memory.

Iterative and incremental development is any combination of both iterative design or iterative method and incremental build model for development.

Rapid-application development (RAD), also called rapid-application building (RAB), is both a general term for adaptive software development approaches, and the name for James Martin's method of rapid development. In general, RAD approaches to software development put less emphasis on planning and more emphasis on an adaptive process. Prototypes are often used in addition to or sometimes even instead of design specifications.

Test-driven development (TDD) is a software development process relying on software requirements being converted to test cases before software is fully developed, and tracking all software development by repeatedly testing the software against all test cases. This is as opposed to software being developed first and test cases created later.

In systems engineering and software engineering, requirements analysis focuses on the tasks that determine the needs or conditions to meet the new or altered product or project, taking account of the possibly conflicting requirements of the various stakeholders, analyzing, documenting, validating and managing software or system requirements.

Systems development life cycle Systems engineering term

In systems engineering, information systems and software engineering, the systems development life cycle (SDLC), also referred to as the application development life-cycle, is a process for planning, creating, testing, and deploying an information system. The systems development life cycle concept applies to a range of hardware and software configurations, as a system can be composed of hardware only, software only, or a combination of both. There are usually six stages in this cycle: requirement analysis, design, development and testing, implementation, documentation, and evaluation.

In software development, agile is a set of practices intended to improve the effectiveness of software development professionals, teams, and organizations. It involves discovering requirements and developing solutions through the collaborative effort of self-organizing and cross-functional teams and their customer(s)/end user(s). It advocates adaptive planning, evolutionary development, early delivery, and continual improvement, and it encourages flexible responses to changes in requirements, resource availability, and understanding of the problems to be solved.

Data modeling in software engineering is the process of creating a data model for an information system by applying certain formal techniques.

Software prototyping is the activity of creating prototypes of software applications, i.e., incomplete versions of the software program being developed. It is an activity that can occur in software development and is comparable to prototyping as known from other fields, such as mechanical engineering or manufacturing.

Lean software development is a translation of lean manufacturing principles and practices to the software development domain. Adapted from the Toyota Production System, it is emerging with the support of a pro-lean subculture within the Agile community. Lean offers a solid conceptual framework, values and principles, as well as good practices, derived from experience, that support agile organizations.

The incremental build model is a method of software development where the product is designed, implemented and tested incrementally until the product is finished. It involves both development and maintenance. The product is defined as finished when it satisfies all of its requirements. This model combines the elements of the waterfall model with the iterative philosophy of prototyping.

Extreme programming (XP) is an agile software development methodology used to implement software projects. This article details the practices used in this methodology. Extreme programming has 12 practices, grouped into four areas, derived from the best practices of software engineering.

Internet-Speed Development is an Agile Software Development development method using a combined spiral model/waterfall model with daily builds aimed at developing a product with high speed.

A database refactoring is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics. Database refactoring does not change the way data is interpreted or used and does not fix bugs or add new functionality. Every refactoring to a database leaves the system in a working state, thus not causing maintenance lags, provided the meaningful data exists in the production environment.

In software development, the V-model represents a development process that may be considered an extension of the waterfall model, and is an example of the more general V-model. Instead of moving down in a linear way, the process steps are bent upwards after the coding phase, to form the typical V shape. The V-Model demonstrates the relationships between each phase of the development life cycle and its associated phase of testing. The horizontal and vertical axes represent time or project completeness (left-to-right) and level of abstraction, respectively.

In software engineering, a software development process is the process of dividing software development work into smaller, parallel or sequential steps or subprocesses to improve design, product management, and project management. It is also known as a software development life cycle (SDLC). The methodology may include the pre-definition of specific deliverables and artifacts that are created and completed by a project team to develop or maintain an application.

Database testing usually consists of a layered process, including the user interface (UI) layer, the business layer, the data access layer and the database itself. The UI layer deals with the interface design of the database, while the business layer includes databases supporting business strategies.

In software engineering, schema migration refers to the management of incremental, reversible changes and version control to relational database schemas. A schema migration is performed on a database whenever it is necessary to update or revert that database's schema to some newer or older version.

Agile Business Intelligence (BI) refers to the use of Agile software development for BI projects to reduce the time it takes for traditional BI to show value to the organization, and to help in quickly adapting to changing business needs. Agile BI enables the BI team and managers to make better business decisions, and to start doing this more quickly.

Extreme programming (XP) is a software development methodology which is intended to improve software quality and responsiveness to changing customer requirements. As a type of agile software development, it advocates frequent "releases" in short development cycles, which is intended to improve productivity and introduce checkpoints at which new customer requirements can be adopted.

References

1 2 3 4 "Evolutionary Database Design" . Retrieved 2016-09-14.
↑ Vial, G. (2015-11-01). "Database Refactoring: Lessons from the Trenches". IEEE Software. 32 (6): 71–79. doi:10.1109/MS.2015.131. ISSN 0740-7459. S2CID 15349486.
1 2 Ambler, Scott; Sadalage, Pramod J (2006). Refactoring Database: Evolutionary Database Design. Addison Wesley Professional. ISBN 978-0-321-29353-4.
↑ "The Principles of Agile Modeling (AM)". www.agilemodeling.com. Retrieved 2016-09-22.
↑ "Evolutionary/Agile Database Best Practices". agiledata.org. Retrieved 2016-09-14.
1 2 3 4 Data, Big; data, Thank heavens for the silicon chip: A. BRIEF history of; motor, Car hacker secrets revealed: Clutching up a tank engine in a classic; Beyond the genome: YOU'VE BEEN DECODED, again. "Evolutionary vs. traditional database design" . Retrieved 2016-09-14.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[:1-1] 1 2 3 4 "Evolutionary Database Design" . Retrieved 2016-09-14.

[2] Vial, G. (2015-11-01). "Database Refactoring: Lessons from the Trenches". IEEE Software. 32 (6): 71–79. doi:10.1109/MS.2015.131. ISSN 0740-7459. S2CID 15349486.

[:0-3] 1 2 Ambler, Scott; Sadalage, Pramod J (2006). Refactoring Database: Evolutionary Database Design. Addison Wesley Professional. ISBN 978-0-321-29353-4.

[4] "The Principles of Agile Modeling (AM)". www.agilemodeling.com. Retrieved 2016-09-22.

[5] "Evolutionary/Agile Database Best Practices". agiledata.org. Retrieved 2016-09-14.

[:2-6] 1 2 3 4 Data, Big; data, Thank heavens for the silicon chip: A. BRIEF history of; motor, Car hacker secrets revealed: Clutching up a tank engine in a classic; Beyond the genome: YOU'VE BEEN DECODED, again. "Evolutionary vs. traditional database design" . Retrieved 2016-09-14.

[1]

[2]

[3]

[4]

[5]

[6]