A document management system (DMS) is usually a computerized system used to store, share, track and manage files or documents. Some systems include history tracking where a log of the various versions created and modified by different users is recorded. The term has some overlap with the concepts of content management systems. It is often viewed as a component of enterprise content management (ECM) systems and related to digital asset management, document imaging, workflow systems and records management systems.
While many electronic document management systems store documents in their native file format (Microsoft Word or Excel, PDF), some web-based document management systems are beginning to store content in the form of HTML. These HTML-based document management systems can act as publishing systems or policy management systems. [1] Content is captured either by using browser-based editors or by importing and converting non-HTML content. Storing documents as HTML enables a simpler full-text workflow, as most search engines deal with HTML natively. A DMS without an HTML storage format must first extract the text from the proprietary format, making the full-text search workflow slightly more complicated.
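As a rough illustration of why HTML storage keeps the full-text step simple, the following minimal sketch pulls searchable text out of an HTML document using only Python's standard `html.parser` module. The names `TextExtractor` and `html_to_text` are hypothetical and chosen for this example; a real DMS would likely use a more robust extraction pipeline.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._chunks = []      # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the plain text of an HTML document, ready for indexing."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{}</style></head><body>"
    "<h1>Leave policy</h1><p>Staff may <b>carry over</b> five days.</p>"
    "</body></html>"
)
print(html_to_text(sample))
```

For a proprietary format, the equivalent step would need a format-specific converter before any text reaches the search index.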
Search capabilities including Boolean queries, cluster analysis, and stemming [2] have become critical components of DMS as users have grown used to internet searching and spend less time organizing their content.
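Boolean queries are commonly answered from a word index built ahead of time. The sketch below is one simple way to do this, assuming an in-memory inverted index that maps each word to the set of document identifiers containing it. The function names and the sample documents are hypothetical, and no stemming is applied, so "supplier" and "suppliers" are treated as different words.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document identifiers that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_all(index, *words):
    """Boolean AND: documents that contain every given word."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()


def search_any(index, *words):
    """Boolean OR: documents that contain at least one given word."""
    result = set()
    for w in words:
        result |= index.get(w.lower(), set())
    return result


# Hypothetical document collection keyed by unique document identifier.
documents = {
    "DOC-001": "Invoice approval procedure",
    "DOC-002": "Invoice template for suppliers",
    "DOC-003": "Supplier onboarding procedure",
}
index = build_index(documents)
print(search_all(index, "invoice", "procedure"))
print(sorted(search_any(index, "template", "onboarding")))
```

Answering the query from the index avoids scanning every document's contents at search time, at the cost of keeping the index up to date as documents change.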
Document management systems commonly provide storage, versioning, metadata, security, as well as indexing and retrieval capabilities. Here is a description of these components:
Topic | Description |
---|---|
Metadata | Metadata is typically stored for each document. Metadata may, for example, include the date the document was stored and the identity of the user storing it. The DMS may also extract metadata from the document automatically or prompt the user to add metadata. Some systems also use optical character recognition on scanned images, or perform text extraction on electronic documents. The resulting extracted text can be used to assist users in locating documents by identifying probable keywords or providing for full text search capability, or can be used on its own. Extracted text can also be stored as a component of metadata, stored with the document, or separately from the document as a source for searching document collections. [3] |
Integration | Many document management systems attempt to provide document management functionality directly to other applications, so that users may retrieve existing documents directly from the document management system repository, make changes, and save the changed document back to the repository as a new version, all without leaving the application. Such integration is commonly available for a variety of software tools such as workflow management and content management systems, typically through an application programming interface (API) using open standards such as ODMA, LDAP, WebDAV, and SOAP or RESTful web services. [4] [5] |
Capture | Capture primarily involves accepting and processing images of paper documents from scanners or multifunction printers. Optical character recognition (OCR) software is often used, whether integrated into the hardware or as stand-alone software, in order to convert digital images into machine readable text. Optical mark recognition (OMR) software is sometimes used to extract values of check-boxes or bubbles. Capture may also involve accepting electronic documents and other computer-based files. [6] |
Data validation | Data validation rules can check for document failures, missing signatures, misspelled names, and other issues, recommending real-time correction options before importing data into the DMS. Additional processing in the form of harmonization and data format changes may also be applied as part of data validation. [7] [8] |
Indexing | Indexing tracks electronic documents. Indexing may be as simple as keeping track of unique document identifiers; but often it takes a more complex form, providing classification through the documents' metadata or even through word indexes extracted from the documents' contents. Indexing exists mainly to support information query and retrieval. One area of critical importance for rapid retrieval is the creation of an index topology or scheme. [9] |
Storage | Storage holds the electronic documents. It often includes management of those same documents: where they are stored, for how long, migration of the documents from one storage medium to another (hierarchical storage management), and eventual document destruction. |
Retrieval | Retrieval fetches electronic documents from storage. Although the notion of retrieving a particular document is simple, retrieval in the electronic context can be quite complex and powerful. Simple retrieval of individual documents can be supported by allowing the user to specify the unique document identifier, and having the system use the basic index (or a non-indexed query on its data store) to retrieve the document. [9] More flexible retrieval allows the user to specify partial search terms involving the document identifier and/or parts of the expected metadata. This would typically return a list of documents which match the user's search terms. Some systems provide the capability to specify a Boolean expression containing multiple keywords or example phrases expected to exist within the documents' contents. The retrieval for this kind of query may be supported by previously built indexes, [9] or may perform more time-consuming searches through the documents' contents to return a list of the potentially relevant documents. See also Document retrieval. |
Distribution | A document ready for distribution has to be in a format that cannot be easily altered. An original master copy of the document is usually never used for distribution; rather, an electronic link to the document itself is more common. If a document is to be distributed electronically in a regulatory environment, then additional criteria must be met, including assurances of traceability and versioning, even across other systems. [10] This approach applies to both of the systems by which the document is to be inter-exchanged, if the integrity of the document is imperative. |
Security | Document security is vital in many document management applications. Compliance requirements for certain documents can be quite complex depending on the type of documents. For instance, international standards such as ISO 9001 and ISO 13485, as well as U.S. Food and Drug Administration regulations in the United States, dictate how the document control process should be addressed. [11] Document management systems may have a rights management module that allows an administrator to restrict access to documents, based on type, to certain people or groups of people. Document marking at the time of printing or PDF creation is an essential element to preclude alteration or unintended use. |
Workflow | Workflow is a complex process, and some document management systems have either a built-in workflow module [12] or can integrate with workflow management tools. [5] There are different types of workflow. Usage depends on the environment to which the electronic document management system (EDMS) is applied. Manual workflow requires a user to view the document and decide whom to send it to. Rules-based workflow allows an administrator to create a rule that dictates the flow of the document through an organization: for instance, an invoice passes through an approval process and then is routed to the accounts-payable department. Dynamic rules allow for branches to be created in a workflow process. A simple example would be to enter an invoice amount and if the amount is lower than a certain set amount, it follows different routes through the organization. Advanced workflow mechanisms can manipulate content or signal external processes while these rules are in effect. |
Collaboration | Collaboration should be inherent in an EDMS. In its basic form, a collaborative EDMS should allow documents to be retrieved and worked on by an authorized user. Access should be blocked to other users while work is being performed on the document. Other advanced forms of collaboration act in real time, allowing multiple users to view and modify (or mark up) documents at the same time. The resulting document is comprehensive, including all users' additions. Collaboration within document management systems means that the various markups by each individual user during the collaboration session are recorded, allowing document history to be monitored. [13] |
Versioning | Versioning is a process by which documents are checked in or out of the document management system, allowing users to retrieve previous versions and to continue work from a selected point. Versioning is useful for documents that change over time and require updating, but it may be necessary to go back to or reference a previous copy. [13] |
Searching | Searching finds documents and folders using template attributes or full text search. Documents can be searched using various attributes and document content. |
Federated search | This refers to the capability to extend search capabilities to draw results from multiple sources, or from multiple DMSes within an enterprise. [14] |
Publishing | Publishing a document involves procedures such as proofreading, peer or public review, authorization, printing, and approval. These steps help ensure care and sound reasoning. Careless handling may introduce inaccuracies into the document and therefore mislead or upset its users and readers. In legally regulated industries, some of the procedures have to be completed as evidenced by their corresponding signatures and the date(s) on which the document was signed. Refer to the ISO divisions of ICS 01.140.40 and 35.240.30 for further information. [15] [16] The published document should be in a format that is not easily altered without specific knowledge or tools, yet is read-only or portable. [17] |
Hard copy reproduction | Document/image reproduction is often necessary within a document management system, and its supported output devices and reproduction capabilities should be considered. [18] |
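
The rules-based and dynamic workflow described in the table, where an invoice takes a different route depending on its amount, can be sketched as a small routing function. This is only an illustration: the `Invoice` type, the step names, and the threshold value are assumptions made for the example, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    number: str
    amount: float


# Hypothetical limit set by an administrator; not taken from any real system.
APPROVAL_THRESHOLD = 1000.00


def route(invoice, threshold=APPROVAL_THRESHOLD):
    """Return the ordered list of steps an invoice passes through.

    Invoices below the threshold go straight to accounts payable;
    all others pass through an approval step first.
    """
    if invoice.amount < threshold:
        return ["accounts-payable"]
    return ["manager-approval", "accounts-payable"]


print(route(Invoice("INV-1", 250.00)))
print(route(Invoice("INV-2", 5000.00)))
```

Keeping the rule in data (here, a single threshold parameter) rather than in scattered code is what lets an administrator change the routing without modifying the system itself.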
Many industry associations publish their own lists of particular document control standards that are used in their particular field. Some of the relevant ISO documents fall under ICS divisions 01.140.10 and 01.140.20. [19] [20] The ISO has also published a series of standards regarding technical documentation, covered by division 01.110. [21]
Government regulations typically require that companies working in certain industries control their documents. A document controller is responsible for strictly controlling these documents. These industries include accounting (for example, the 8th EU Directive and the Sarbanes–Oxley Act), food safety (for example, the Food Safety Modernization Act in the US), ISO (mentioned above), medical device manufacturing (FDA), manufacture of blood, human cells, and tissue products (FDA), healthcare (JCAHO), and information technology (ITIL). [22] Some industries work under stricter document control requirements due to the type of information they retain for privacy, warranty, or other highly regulated purposes. Examples include protected health information (PHI) as required by HIPAA or construction project documents required for warranty periods. An information systems strategy plan (ISSP) can shape organisational information systems over medium to long-term periods. [23]
Documents stored in a document management system, such as procedures, work instructions, and policy statements, provide evidence of documents under control. Failing to comply can result in fines, the loss of business, or damage to a business's reputation.
Document control includes: [24]
These document control requirements form part of an organisation's compliance costs alongside related functions such as a data protection officer and internal audit.
Integrated document management comprises the technologies, tools, and methods used to capture, manage, store, preserve, deliver and dispose of 'documents' across an enterprise. In this context 'documents' are any of a myriad of information assets including images, office documents, graphics, and drawings as well as the new electronic objects such as Web pages, email, instant messages, and video.
Paper documents have long been used in storing information. However, paper can be costly and, if used excessively, wasteful. Document management software is more than a simple tool: it lets a user manage access to, track, and edit stored information. It acts as an electronic cabinet that can be used to organize all paper and digital files. [25] The software helps businesses combine paper and digital files and store them in a single hub once paper has been scanned and digital formats have been imported. [26] One of the most important benefits of digital document management is a “fail-safe” environment for safeguarding all documents and data. [27] In the heavy construction industry specifically, document management software allows team members to securely view and upload documents for projects they are assigned to from anywhere and at any time to help streamline day-to-day operations. [28]
Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images and other information needed to display it. PDF has its roots in "The Camelot Project" initiated by Adobe co-founder John Warnock in 1991. PDF was standardized as ISO 32000 in 2008. The last edition as ISO 32000-2:2020 was published in December 2020.
Wiki software is collaborative software that runs a wiki, which allows the users to create and collaboratively edit pages or entries via a web browser. A wiki system is usually a web application that runs on one or more web servers. The content, including previous revisions, is usually stored in either a file system or a database. Wikis are a type of web content management system, and the most commonly supported off-the-shelf software that web hosting facilities offer.
In communications and computing, a machine-readable medium is a medium capable of storing data in a format easily readable by a digital computer or a sensor. It contrasts with human-readable medium and data.
A content management system (CMS) is computer software used to manage the creation and modification of digital content. A CMS is typically used for enterprise content management (ECM) and web content management (WCM). ECM typically supports multiple users in a collaborative environment, by integrating document management, digital asset management, and record retention. Alternatively, WCM is the collaborative authoring for websites and may include text and embed graphics, photos, video, audio, maps, and program code that display content and interact with the user. ECM typically includes a WCM function.
Workflow is a generic term for orchestrated and repeatable patterns of activity, enabled by the systematic organization of resources into processes that transform materials, provide services, or process information. It can be depicted as a sequence of operations, the work of a person or group, the work of an organization of staff, or one or more simple or complex mechanisms.
Documentation is any communicable material that is used to describe, explain or instruct regarding some attributes of an object, system or procedure, such as its parts, assembly, installation, maintenance, and use. As a form of knowledge management and knowledge organization, documentation can be provided on paper, online, or on digital or analog media, such as audio tape or CDs. Examples are user guides, white papers, online help, and quick-reference guides. Paper or hard-copy documentation has become less common. Documentation is often distributed via websites, software products, and other online applications.
The Organization for the Advancement of Structured Information Standards is a nonprofit consortium that works on the development, convergence, and adoption of projects, both open standards and open source, for computer security, blockchain, Internet of things (IoT), emergency management, cloud computing, legal data exchange, energy, content technologies, and other areas.
An open file format is a file format for storing digital data, defined by an openly published specification usually maintained by a standards organization, and which can be used and implemented by anyone. An open file format is licensed with an open license. For example, an open format can be implemented by both proprietary and free and open-source software, using the typical software licenses used by each. In contrast to open file formats, closed file formats are considered trade secrets.
Application software is any computer program that is intended for end-user use – not operating, administering or programming the computer. An application is any program that can be categorized as application software. Common types of applications include word processor, media player and accounting software.
Digital asset management (DAM) and the implementation of its use as a computer application is required in the collection of digital assets to ensure that the owner, and possibly their delegates, can perform operations on the data files.
Records management, also known as records and information management, is an organizational function devoted to the management of information in an organization throughout its life cycle, from the time of creation or receipt to its eventual disposition. This includes identifying, classifying, storing, securing, retrieving, tracking and destroying or permanently preserving records. The ISO 15489-1: 2001 standard defines records management as "[the] field of management responsible for the efficient and systematic control of the creation, receipt, maintenance, use and disposition of records, including the processes for capturing and maintaining evidence of and information about business activities and transactions in the form of records".
Enterprise content management (ECM) extends the concept of content management by adding a timeline for each content item and, possibly, enforcing processes for its creation, approval, and distribution. Systems using ECM generally provide a secure repository for managed items, analog or digital. They also include one or more methods for importing content to manage new items, and several presentation methods to make items available for use. Although ECM content may be protected by digital rights management (DRM), it is not required. ECM is distinguished from general content management by its cognizance of the processes and procedures of the enterprise for which it is created.
A proprietary file format is a file format of a company, organization, or individual that contains data that is ordered and stored according to a particular encoding scheme, such that the decoding and interpretation of this stored data is easily accomplished only with particular software or hardware that the company itself has developed. In contrast, an open or free format is a file format that is published and free to be used by everybody.
A Common Source Database (CSDB) provides the user with automated processes to handle the complete palette of CSDB objects. Technical documentation is used in many areas of everyday life. Product liability and many other issues regarding consumer protection have to be covered inside technical documentation. At minimum, a drawing including a few locators has to be provided. Much of this information is in accordance with the international S1000D specification.
A specification often refers to a set of documented requirements to be satisfied by a material, design, product, or service. A specification is often a type of technical standard.
Metadata is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including:
Document capture software refers to applications that provide the ability and feature set to automate the process of scanning paper documents or importing electronic documents, often for the purposes of feeding advanced document classification and data collection processes. Most scanning hardware, both scanners and copiers, provides the basic ability to scan to any number of image file formats, including: PDF, TIFF, JPG, BMP, etc. This basic functionality is augmented by document capture software, which can add efficiency and standardization to the process.
The Portable Document Format (PDF) was created by Adobe Systems, introduced at the Windows and OS/2 Conference in January 1993 and remained a proprietary format until it was released as an open standard in 2008. Since then, it has been under the control of an International Organization for Standardization (ISO) committee of industry experts.
The data validation rules should be embedded in the form itself, rather than accomplished in a post-processing environment. This gives the user an interactive, real-time experience. Often data validation requires a database look-up. The rules should allow this database query, providing the user with real-time choices based on the query results.
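A validation rule backed by a database look-up might look like the sketch below, which uses Python's standard `sqlite3` module with an in-memory database. The `vendors` table, its sample rows, and the function names are hypothetical; the two-character prefix used to build suggestions is likewise an arbitrary choice for the example.

```python
import sqlite3


def make_vendor_db():
    """Create a small in-memory look-up table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vendors (name TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO vendors (name) VALUES (?)",
        [("Acme Supplies",), ("Acorn Logistics",), ("Birch Tools",)],
    )
    conn.commit()
    return conn


def validate_vendor(conn, entered):
    """Check a form field against the database.

    Returns (is_valid, choices). On an exact, case-insensitive match,
    choices holds the canonical name; otherwise it holds suggestions
    that share the first two characters of the entered value.
    """
    entered = entered.strip()
    if not entered:
        return False, []
    row = conn.execute(
        "SELECT name FROM vendors WHERE lower(name) = lower(?)", (entered,)
    ).fetchone()
    if row:
        return True, [row[0]]
    prefix = entered[:2]
    rows = conn.execute(
        "SELECT name FROM vendors "
        "WHERE lower(substr(name, 1, ?)) = lower(?) ORDER BY name",
        (len(prefix), prefix),
    ).fetchall()
    return False, [r[0] for r in rows]


conn = make_vendor_db()
print(validate_vendor(conn, "acme supplies"))
print(validate_vendor(conn, "Acmee"))
```

Running the check while the user is still in the form means a misspelled name can be corrected from the suggested choices before the record is imported.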
At the organisational level an information systems strategy plan (ISSP) is a way to determine in general terms what information systems an organisation should have in place over the medium to long term (typically around three to five years [...]).