XDXF

Filename extension: .xdxf
Internet media type: application/xml
Developed by: Sergey Singov (initial development), later Leonid Soshinskiy
Initial release: 10 September 2006
Latest release: rev. 34 (January 19, 2022)
Type of format: XML dictionary format
Open format: Yes
Website: github.com/soshial/xdxf_makedict/

XDXF (XML Dictionary eXchange Format) is a project to unite all existing open dictionaries and provide both users and developers with a universal XML-based format, convertible from and to other popular formats like Mova, PtkDic, and StarDict.
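
As a rough illustration of what an XML-based dictionary looks like to a program, the following Python sketch reads a tiny in-memory dictionary. It assumes the article element `<ar>` and the key element `<k>` as described in the XDXF specification; the sample entries, the root attribute values, and the function names `article_body` and `read_articles` are hypothetical and chosen only for illustration. Real dictionaries may carry additional markup and metadata that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry dictionary; not taken from any real XDXF file.
SAMPLE = """<xdxf lang_from="ENG" lang_to="ENG" format="visual">
  <full_name>Hypothetical sample dictionary</full_name>
  <description>Two made-up entries for illustration.</description>
  <ar><k>apple</k> A round fruit.</ar>
  <ar><k>pear</k> A sweet fruit.</ar>
</xdxf>"""


def article_body(ar):
    """Return the text of an <ar> element, leaving out its <k> keys."""
    parts = [ar.text or ""]
    for child in ar:
        if child.tag != "k":
            # Keep the text of any non-key child, including nested text.
            parts.append("".join(child.itertext()))
        # Text that follows a child element belongs to the article body.
        parts.append(child.tail or "")
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parts).split())


def read_articles(xml_text):
    """Map each key (<k>) to the body of the article (<ar>) containing it."""
    root = ET.fromstring(xml_text)
    entries = {}
    for ar in root.iter("ar"):
        body = article_body(ar)
        for k in ar.findall("k"):
            key = "".join(k.itertext()).strip()
            if key:
                entries[key] = body
    return entries


if __name__ == "__main__":
    for key, body in read_articles(SAMPLE).items():
        print(f"{key}: {body}")
```

A converter to another dictionary format would start from a mapping like this one and write each key and body out in the target format's own layout.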

Available dictionaries

As of December 15, 2006, the XDXF project repository contained 615 dictionaries, totalling 936,189,613 bytes compressed and 24,804,355 articles.

Software

GUIs

The XDXF file format is used by Alpus, SimpleDict and GoldenDict. [1] StarDict has had basic support for XDXF since version 2.4.6. [2]

Converters

Several converters exist, including pyglossary and xdxf2slob. The project originally had its own converter, which has since been deprecated.

Alternatives

Several other formats serve a similar purpose, e.g., the Lexical Markup Framework (XML and other serializations), OntoLex (RDF), DICT (text format), and the dicML markup language. Neither dicML nor XDXF is completely specified: for example, XDXF lacks elements to annotate possible hyphenations, while the recent working draft of dicML does not include elements to describe the etymology of words.

References

  1. "GoldenDict's Supported Formats". GitHub. Konstantin Isakov. Retrieved 1 April 2017.
  2. "StarDict File Format document". GitHub. Hu Zheng. Retrieved 16 May 2014.