Real-time database


The term "real-time database" has two meanings. The most common use of the term refers to a database system that uses streaming technologies to handle workloads whose state is constantly changing. [1] This differs from traditional databases, which contain persistent data that is mostly unaffected by time. In the context of streaming technologies, real-time processing means that a transaction is processed fast enough for its result to come back and be acted on right away. [2] Such real-time databases are useful, for example, in helping social media platforms remove fake news and in enabling in-store surveillance cameras to identify potential shoplifters by their behavior and movements.


The second meaning of the term “real-time database” adheres to a stricter definition of real-time, consistent with real-time computing. Hard real-time database systems work with a real-time operating system to ensure the temporal validity of data by enforcing database transaction deadlines, and they include a mechanism (such as transaction scheduling policies) to maximize the number of successfully committed transactions and minimize the number of rolled-back transactions. While the performance metric for most database systems is throughput, or transactions per second, the performance metric of a hard real-time database system is the ratio of committed to aborted transactions. This ratio indicates how effective the transaction scheduling policy is, the ultimate goal being to meet deadlines 100% of the time. By enforcing deadlines, hard real-time databases do not allow transactions to be late (that is, to overrun their deadlines). [3]
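As a small illustration of the metric described above, the following Python sketch computes the committed-to-aborted ratio from a log of transaction outcomes, along with the fraction of transactions that met their deadline. The log format (a list of "commit" and "abort" strings) and the function names are hypothetical, chosen only for this example.

```python
def committed_to_aborted_ratio(outcomes):
    """Ratio of committed to aborted transactions.

    `outcomes` is a list of strings, each "commit" or "abort"
    (a hypothetical log format used only for this sketch).
    """
    committed = sum(1 for o in outcomes if o == "commit")
    aborted = sum(1 for o in outcomes if o == "abort")
    if aborted == 0:
        return float("inf")  # no aborts: every transaction met its deadline
    return committed / aborted


def deadline_hit_rate(outcomes):
    """Fraction of transactions that committed, i.e. met their deadline."""
    if not outcomes:
        return 1.0
    return sum(1 for o in outcomes if o == "commit") / len(outcomes)


log = ["commit"] * 95 + ["abort"] * 5
print(committed_to_aborted_ratio(log))  # 19.0
print(deadline_hit_rate(log))           # 0.95
```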

Overview

Real-time databases are traditional databases extended so that they can give reliable, timely responses. They use timing constraints that specify the interval over which data values remain valid; this interval is called temporal validity. A conventional database cannot work under these circumstances because the inconsistencies between real-world objects and the data that represents them are too severe to fix with simple modifications. An effective system needs to handle time-sensitive queries, return only temporally valid data, and support priority scheduling. To enter data into the records, a sensor or input device often monitors the state of the physical system and updates the database with new information so that it reflects the physical system more accurately. [4] When designing a real-time database system, one should consider how to represent valid time and how facts are associated with the real-time system. One should also consider how to represent attribute values in the database so that transaction processing does not violate data consistency.

When designing a system, it is important to consider what the system should do when deadlines are not met. [5] For example, an air-traffic control system constantly monitors hundreds of aircraft, makes decisions about incoming flight paths, and determines the order in which aircraft should land based on data such as fuel, altitude, and speed. If any of this information is late, the result could be devastating. To address the problem of obsolete data, timestamps can support transactions by providing clear time references.

Preserving data consistency

Although a real-time database system may seem simple, problems arise during overload when two or more database transactions require access to the same portion of the database. A transaction is usually the result of executing a program that accesses or changes the contents of a database. [6] A transaction differs from a stream in that a stream allows only read operations, whereas a transaction can both read and write. This means that in a stream, multiple users can read the same piece of data, but none of them can modify it. [4] To preserve data consistency, a database must let only one transaction at a time modify a given piece of data. For example, if two students try to take the remaining spot in a class section and press submit at the same time, only one of them should be able to register for it. [4]
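The registration example can be sketched in Python with a mutual-exclusion lock, one of the simplest concurrency control mechanisms. This is not how any particular real-time database implements concurrency control; the Section class, its register method, and the student names are hypothetical. The point is that the check for a free seat and the update that takes it must happen as one indivisible step.

```python
import threading


class Section:
    """A class section with a fixed number of seats (illustrative model)."""

    def __init__(self, seats):
        self.seats = seats
        self.enrolled = []
        self._lock = threading.Lock()  # serializes access to this data item

    def register(self, student):
        # The read ("is a seat left?") and the write ("take it") happen
        # inside one critical section, so two concurrent requests cannot
        # both see the last seat as free.
        with self._lock:
            if self.seats > 0:
                self.seats -= 1
                self.enrolled.append(student)
                return True
            return False


section = Section(seats=1)
results = {}


def submit(name):
    results[name] = section.register(name)


# Two students press submit at the same time.
threads = [threading.Thread(target=submit, args=(n,)) for n in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)           # exactly one of the two values is True
print(section.enrolled)  # a single student
```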

Real-time databases can process these requests by using scheduling algorithms for concurrency control that prioritize the students' requests in some way. Throughout this article, we assume that the system has a single processor, a disk-based database, and a main memory pool. [7]

In real-time databases, deadlines are assigned to transactions, and different kinds of systems respond in different ways to transactions that miss their deadlines. In a real-time system, each transaction carries a timestamp that is used to schedule it. [4] When a transaction arrives in the database system, a priority mapper unit assigns it a level of importance that depends on how the system weighs time and other priorities. The timestamp method relies on the time a transaction arrives in the system. Researchers note that in most studies, transactions are sporadic, with unpredictable arrival times. For example, the system may give a request with an earlier deadline a higher priority and a request with a later deadline a lower priority. [7] Below is a comparison of different scheduling algorithms, in which PT denotes the priority of transaction T, DT its deadline, and VT its value; the transaction with the smallest PT is scheduled first.

Earliest Deadline
PT = DT. The value of a transaction is not considered; only its deadline matters. An example is a group of people calling to order a product.
Highest Value
PT = 1/VT. The deadline is not considered. Some transactions should get the CPU based on criticality, not fairness; these are the transactions that can afford to wait the least. For example, if telephone switchboards were overloaded, people calling 911 should get priority. [8]
Value inflated deadline
PT = DT/VT. Gives equal weight to deadline and value when scheduling. An example is class registration, where a student selects a block of classes to take and presses submit. In this scenario, higher-priority requests take precedence. A school registration system might use this technique when the server receives two registration transactions at once: if one student had 22 credits and the other had 100 credits, the student with 100 credits would take priority (value-based scheduling).
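The three priority formulas above can be compared with a short Python sketch. It assumes each transaction carries a numeric deadline DT and a positive value VT and, as in the formulas, that the transaction with the smallest PT runs first. The Txn class, the policy names, and the sample numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    deadline: float  # DT: absolute deadline, in seconds
    value: float     # VT: importance of the transaction (must be > 0)


# Each policy maps a transaction to its priority PT; the smallest PT runs first.
POLICIES = {
    "earliest_deadline": lambda t: t.deadline,                  # PT = DT
    "highest_value": lambda t: 1.0 / t.value,                   # PT = 1/VT
    "value_inflated_deadline": lambda t: t.deadline / t.value,  # PT = DT/VT
}


def schedule(txns, policy):
    """Return transaction names in the order the given policy would run them."""
    priority = POLICIES[policy]
    return [t.name for t in sorted(txns, key=priority)]


txns = [
    Txn("T1", deadline=10.0, value=2.0),
    Txn("T2", deadline=20.0, value=10.0),
    Txn("T3", deadline=90.0, value=30.0),
]
for policy in POLICIES:
    print(policy, schedule(txns, policy))
# earliest_deadline ['T1', 'T2', 'T3']
# highest_value ['T3', 'T2', 'T1']
# value_inflated_deadline ['T2', 'T3', 'T1']
```

With these sample numbers the three policies produce three different orders: earliest deadline runs T1 first, highest value runs T3 first, and value inflated deadline runs T2 first, because T2 has the smallest ratio of deadline to value (20/10 = 2).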

Timing constraints and deadlines

A system that correctly recognizes the serialization and timing constraints associated with transactions that have soft or firm deadlines can take advantage of absolute consistency, which requires each data item to be recent enough to reflect the current state of the real world. [9] Another way of ensuring that data is valid is to use relative constraints. Relative constraints ensure that data items used together were recorded at approximately the same time as the rest of the group they belong to. Together, absolute and relative constraints greatly improve the accuracy of data.
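Absolute and relative consistency are often formalized with validity intervals: a reading is absolutely consistent if it is no older than its absolute validity interval, and a group of readings is relatively consistent if they were all sampled within a relative validity interval of one another. The following Python sketch assumes that formulation; the Reading class, the parameter names avi and rvi, and the sample aircraft readings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    name: str
    timestamp: float  # when the value was sampled from the physical world (s)


def absolutely_consistent(reading, now, avi):
    """True if the reading is no older than its absolute validity interval `avi`."""
    return now - reading.timestamp <= avi


def relatively_consistent(readings, rvi):
    """True if all readings were sampled within `rvi` seconds of one another."""
    stamps = [r.timestamp for r in readings]
    return max(stamps) - min(stamps) <= rvi


altitude = Reading("altitude", timestamp=100.0)
speed = Reading("speed", timestamp=101.5)
now = 102.0
print(absolutely_consistent(altitude, now, avi=3.0))      # True: 2.0 s old
print(absolutely_consistent(altitude, now, avi=1.0))      # False: too stale
print(relatively_consistent([altitude, speed], rvi=1.0))  # False: 1.5 s apart
print(relatively_consistent([altitude, speed], rvi=2.0))  # True
```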

Besides deadlines, another way of resolving conflicts in a real-time database system is a wait policy. This approach helps ensure that time-critical systems work with the latest information. The policy avoids conflicts by making non-requesting transactions wait until the most essential block of data has been processed. [4] While laboratory studies have found that data-deadline-based policies do not improve performance significantly, the forced wait policy can improve performance by 50 percent. [10] The forced wait policy may involve waiting for higher-priority transactions to finish in order to prevent deadlock. Processing can also be delayed when a block of data is about to expire: the forced wait policy then holds the transaction until the data has been updated with new input. This latter method helps increase the accuracy of the system and can reduce the number of transactions that have to be aborted. Generally, however, relying on wait policies is not optimal. [11]
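One way to read the forced wait policy for data that is about to expire is as a three-way decision, sketched below in Python. It assumes the system knows the transaction's estimated execution time, its deadline, when its input data expires, and when the next update of that data will arrive, and that the update keeps the data valid for a full run. These assumptions and the function name are hypothetical, not taken from a particular system.

```python
def forced_wait_decision(now, exec_time, deadline, data_expires_at, next_update_at):
    """Decide whether a transaction should run now, wait for fresh data, or abort.

    All arguments are times in seconds. Hypothetical model: the transaction
    needs its input data to stay valid until it commits, and the next update
    is assumed to keep the data valid long enough for a full run.
    """
    finish_if_run_now = now + exec_time
    if finish_if_run_now > deadline:
        return "abort"  # cannot meet the deadline even without waiting
    if finish_if_run_now <= data_expires_at:
        return "run"    # the data stays valid through commit
    start_after_update = max(now, next_update_at)
    if start_after_update + exec_time <= deadline:
        return "wait"   # forced wait: run once the fresh value arrives
    return "abort"      # fresh data would arrive too late to meet the deadline


print(forced_wait_decision(0, 2, 10, data_expires_at=5, next_update_at=3))  # run
print(forced_wait_decision(0, 2, 10, data_expires_at=1, next_update_at=3))  # wait
print(forced_wait_decision(0, 2, 10, data_expires_at=1, next_update_at=9))  # abort
```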

It is also necessary to discuss how deadlines are formed. Deadlines are constraints derived from the soon-to-be-replaced data accessed by a transaction. Deadline handling can be either observant or predictive. [11] In an observant system, all unfinished transactions are examined and the processor determines whether any has missed its deadline. [4] This method suffers from variations caused by seek time, buffer management, and page faults. [12] A more stable way of handling deadlines is the predictive method, which builds a candidate schedule and determines whether a transaction would miss its deadline under that schedule. [4]
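The difference between the observant and predictive methods can be shown with a single-processor Python sketch. The observant check only looks at whether the clock has already passed each deadline; the predictive check builds a candidate schedule (earliest deadline first here, which is an assumption) from estimated remaining execution times and flags transactions that would miss their deadline. The Txn fields and sample values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    deadline: float        # absolute deadline (s)
    remaining_time: float  # estimated remaining execution time (s)


def observant_check(txns, now):
    """Observant: report unfinished transactions already past their deadline."""
    return [t.name for t in txns if now > t.deadline]


def predictive_check(txns, now):
    """Predictive: build a candidate schedule (earliest deadline first) and
    report transactions that would miss their deadline under it."""
    misses = []
    clock = now
    for t in sorted(txns, key=lambda t: t.deadline):
        clock += t.remaining_time  # single processor: run one after another
        if clock > t.deadline:
            misses.append(t.name)  # in this simple sketch it still consumes time
    return misses


pending = [
    Txn("A", deadline=5.0, remaining_time=3.0),
    Txn("B", deadline=6.0, remaining_time=3.0),
    Txn("C", deadline=9.0, remaining_time=2.0),
]
print(observant_check(pending, now=1.0))   # []
print(predictive_check(pending, now=1.0))  # ['B']
```

At time 1.0 no deadline has passed yet, so the observant check finds nothing, while the predictive check already sees that B would finish at 7.0, after its deadline of 6.0.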

The response to a missed deadline depends on whether the deadline is hard, soft, or firm. A hard deadline requires that each data packet reach its destination before it expires; otherwise the process could be lost, causing serious problems. Hard-deadline systems are uncommon because they require complete knowledge of the system's worst-case behavior before deadlines can be assigned. This is very hard to achieve, and something unexpected, such as a minor hardware glitch, could throw the data off. For soft or firm deadlines, missing a deadline can degrade performance but does not cause a catastrophe. [7] A system with soft deadlines meets as many deadlines as possible, but there is no guarantee that it can meet all of them. Should a transaction miss its deadline, the system has more flexibility, and the transaction may increase in importance. The three kinds of deadline are described below:

Hard deadline
If missing a deadline creates serious problems, a hard deadline is best. Hard-deadline transactions are periodic, meaning that they enter the database at regular intervals; an example is data gathered by a sensor. They are often used in life-critical systems. [13]
Firm deadline
Firm deadlines appear similar to hard deadlines, but they differ in that firm deadlines measure how important it is to complete the transaction at some point after it arrives. Sometimes completing a transaction after its deadline has expired is harmful or of no use, and both firm and hard deadlines take this into account. An example of a firm deadline is an autopilot system. [8]
Soft deadline
If meeting time constraints is desirable but missing deadlines does not cause serious damage, a soft deadline may be best. Soft-deadline transactions arrive aperiodically, on an irregular schedule, and the arrival time of each task is not known in advance. An example is a telephone operator switchboard. [13]
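A common way to compare the three kinds of deadline is to describe the value a transaction contributes as a function of when it completes. The Python sketch below assumes a simple model: a late hard transaction is a catastrophe (negative infinity), a late firm transaction is worth nothing, and a late soft transaction loses value linearly at a hypothetical decay rate. The function name and the default values are illustrative.

```python
import math


def transaction_value(kind, completion_time, deadline, value=1.0, decay=0.25):
    """Value gained by finishing a transaction at `completion_time`.

    Hypothetical value-function model:
      hard: late completion is a catastrophe (negative infinity)
      firm: a late result is worthless (0) and is discarded
      soft: a late result loses `decay` units of value per second of lateness
    """
    if completion_time <= deadline:
        return value
    lateness = completion_time - deadline
    if kind == "hard":
        return -math.inf
    if kind == "firm":
        return 0.0
    if kind == "soft":
        return max(0.0, value - decay * lateness)
    raise ValueError(f"unknown deadline kind: {kind}")


for kind in ("hard", "firm", "soft"):
    print(kind, transaction_value(kind, completion_time=12.0, deadline=10.0))
# hard -inf
# firm 0.0
# soft 0.5
```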

Hard deadline processes abort transactions that have passed their deadline, improving the system by clearing out a backlog of work that would otherwise need processing. Processes can clear out not only transactions with expired deadlines but also those with the longest deadlines, on the assumption that they would be obsolete by the time they reach the processor. This leaves the remaining transactions with higher priority. In addition, a system can remove the least critical transactions. For example, during a high-traffic course pre-registration period, a field in a registration database can become so busy with requests that it is unavailable for a while; a student submitting a request may then see the SQL query that was sent, along with a message that the data is currently unavailable. This error is caused by the checker, a mechanism that checks the condition of the rules, and by the rule that was evaluated before it. [14]
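The load-shedding steps in the paragraph above (abort transactions whose deadline has passed, then remove the least critical ones) can be sketched in Python as follows. The capacity limit, the integer criticality field, and the tie-breaking rule that drops later deadlines first are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    deadline: float   # absolute deadline (s)
    criticality: int  # higher means more important


def shed_load(queue, now, capacity):
    """Trim the pending queue during overload.

    1. Abort every transaction whose deadline has already passed.
    2. If more than `capacity` remain, abort the least critical ones
       (among equally critical ones, the later deadline goes first).
    Returns (kept, aborted) as lists of names.
    """
    aborted = [t for t in queue if t.deadline < now]
    live = [t for t in queue if t.deadline >= now]
    # Most critical first; among equals, the earlier deadline first.
    live.sort(key=lambda t: (-t.criticality, t.deadline))
    kept, dropped = live[:capacity], live[capacity:]
    return [t.name for t in kept], [t.name for t in aborted + dropped]


queue = [
    Txn("expired", deadline=4.0, criticality=5),
    Txn("critical", deadline=20.0, criticality=9),
    Txn("routine", deadline=15.0, criticality=1),
    Txn("normal", deadline=12.0, criticality=5),
]
kept, aborted = shed_load(queue, now=5.0, capacity=2)
print(kept)     # ['critical', 'normal']
print(aborted)  # ['expired', 'routine']
```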

The goal of scheduling periods and deadlines is to guarantee that update transactions complete before their deadlines while keeping the workload minimal. In large real-time databases, buffering can improve performance tremendously. A buffer is the part of the database that is kept in main memory to reduce transaction response time. To reduce disk input and output, a certain number of buffers should be allocated. [15] Sometimes multiple versions of a data block are stored in buffers when the block a transaction needs is currently in use; later, the data is appended to the database. Buffer allocation strategies must balance between using an excessive amount of memory and putting everything in one buffer that must then be searched. The goal is to eliminate search time and distribute resources among buffer frames so that data can be accessed quickly. A buffer manager can allocate more memory, if necessary, to improve response time, and can even decide whether a transaction it holds should proceed. Buffering can thus improve speed in real-time systems. [15]
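To illustrate why the number of buffers allocated matters, the Python sketch below models a buffer pool with a fixed number of frames and counts disk reads. It uses least-recently-used (LRU) replacement, a generic policy swapped in for illustration rather than one the sources prescribe; the BufferPool class, the run helper, and the stand-in disk-read function are hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    """A fixed number of in-memory frames caching disk pages (LRU replacement).

    `read_page_from_disk` is a stand-in for real disk I/O.
    """

    def __init__(self, frames, read_page_from_disk):
        self.frames = frames
        self.read_page_from_disk = read_page_from_disk
        self.pages = OrderedDict()  # page_id -> data, least recently used first
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # hit: mark as most recently used
            return self.pages[page_id]
        self.disk_reads += 1                 # miss: the page must come from disk
        data = self.read_page_from_disk(page_id)
        if len(self.pages) >= self.frames:
            self.pages.popitem(last=False)   # evict the least recently used page
        self.pages[page_id] = data
        return data


def run(frames, accesses):
    """Replay an access pattern and return how many disk reads it caused."""
    pool = BufferPool(frames, read_page_from_disk=lambda pid: f"page-{pid}")
    for pid in accesses:
        pool.get(pid)
    return pool.disk_reads


accesses = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(run(2, accesses))  # 9
print(run(3, accesses))  # 3
```

With a cyclic access pattern over three pages, two frames cause every access to miss under LRU, while three frames reduce disk reads to the three initial loads.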

Future database systems

Traditional databases are persistent but cannot deal with dynamic data that constantly changes, so another kind of system is needed. Real-time databases may be modified to improve accuracy and efficiency and to avoid conflicts by providing deadlines and wait periods that ensure temporal consistency. Real-time database systems offer a way of monitoring a physical system and representing it in data streams to a database. A data stream, like memory, fades over time. To guarantee that the freshest and most accurate information is recorded, there are a number of ways of checking transactions to make sure they are executed in the proper order. An online auction house provides an example of a rapidly changing database.

Database systems are faster now than they were in the past, and future systems are likely to be faster still. Even so, efforts to reduce deadline misses and tardiness will remain beneficial. The ability to process results in a timely and predictable manner will always be more important than fast processing; fast processing that is misapplied does not help real-time database systems. Transactions that run faster can still block in such a way that they have to be aborted and restarted. In fact, faster processing hurts some real-time applications, because increased speed brings more complexity and more chances for problems caused by variations in speed. Faster processing also makes it harder to determine which deadlines have been met. As future database systems run faster than ever, more studies are needed to keep these systems efficient. [16]

The amount of research on real-time database systems will increase because of commercial applications such as web-based auction houses like eBay. More developing countries are expanding their phone systems, and the number of people with cell phones in the United States and elsewhere in the world continues to grow. The exponentially increasing speed of microprocessors is also likely to spur real-time research. It also enables new technologies, such as web video conferencing and instant messaging with sound and high-resolution video, which rely on real-time database systems. Studies of temporal consistency result in new protocols and timing constraints with the goal of handling real-time transactions more effectively. [7]

References

  1. Buchmann, A. "Real Time Database Systems." Encyclopedia of Database Technologies and Applications. Ed. Laura C. Rivero, Jorge H. Doorn, and Viviana E. Ferraggine. Idea Group, 2005.
  2. Carpron, H. L., and J. A. Johnson. Computers: Tools for the Information Age. 5th ed. Prentice Hall, 1998.
  3. "What is and what isn't a hard real-time database system?". db-engines.com. Retrieved 2023-03-17.
  4. Abbot, Robert K., and Hector Garcia-Molina (1992). "Scheduling Real-Time Transactions: a Performance Evaluation". ACM Transactions on Database Systems. 17 (3): 513–560. Stanford University and Digital Equipment Corp. doi:10.1145/132271.132276. S2CID 28960. Retrieved 13 December 2006.
  5. "Real-Time Database Systems Aren't Actually Real-Time. Unless They Are". www.electronicdesign.com. Retrieved 2023-01-21.
  6. Singhal, Mukesh. "Approaches to Design of Real-Time Database Systems". SIGMOD Record, 17 (1), March 1988.
  7. Haritsa, J., J. Stankovic, and M. Xiong. "A State-Conscious Concurrency Control Protocol for Replicated Real-Time Databases". IEEE Real-Time Applications Symposium. University of Virginia. Retrieved 13 December 2006.
  8. (Snodgrass)
  9. Lee, Juhnyoung (1994). "Concurrency Control Algorithms for Real-Time Database Systems". Dissertation, University of Virginia. Retrieved 13 December 2006.
  10. (Porkka)
  11. Kang, K. D., S. Son, and J. Stankovic. "Specifying and Managing Quality of Real-Time Data Services". IEEE TKDE, 2004. University of Virginia.
  12. Kao & Garcia-Molina 1994, pp. 261–282.
  13. Stankovic, John A., Marco Spuri, Krithi Ramamritham, and Giorgio C. Buttazzo. Deadline Scheduling for Real-Time Systems: EDF and Related Algorithms. Springer, 1998.
  14. (Ramamritham)
  15. (O'Neil)
  16. Lam, Kam-Yiu, and Tei-Wei Kuo. Real-Time Database Systems: Architecture and Techniques. Springer, 2001.

Further reading