Heurist

Original author(s) Ian Johnson (Team Leader), Artem Osmakov (Senior Developer), Jessica Norris (Designer), Mitema Emmanuel (Programmer), Vincent Sheehan (Documentation/Webmaster), Abed Kassis (Server Manager), Tom Murtagh, Kim Jackson, Steve White and others.
Developer(s) Faculty of Arts at The University of Sydney
Stable release v5.1.10
Repository github.com/HeuristNetwork/heurist
Written in PHP, JavaScript
Operating system Linux, Microsoft Windows
Available in English
Type Web-based user-configurable data management software
License GNU GPLv3+
Website heuristnetwork.org
As of December 2019

Heurist is an Open Source online database builder and CMS publisher designed for Humanities research data and collections, including data on people, organisations, places, events, artefacts, documents, media, bibliographic records, [1] contemporary stories and other data which is rich in text and classification data, richly interlinked, and often heterogeneous. [2]

Heurist was originally designed by Ian Johnson (from 2005) and developed by the (now disbanded) Arts eResearch unit (AeR) at the University of Sydney. It continues to be actively developed within the Faculty of Arts and Social Sciences (version 6 released 2021). Free web services for building research databases are available at https://heuristplus.sydney.edu.au/ and https://heurist.Huma-Num.fr. New Heurist servers can be set up using installation packages downloadable from the project web site (http://HeuristNetwork.org). The source is available at https://github.com/HeuristNetwork/heurist.

Heurist was developed to overcome three problems identified as common to researchers in the Humanities (and others):

It aims to tackle these issues by:

Methodology

Heurist is written in PHP and JavaScript on top of a fixed MySQL/MariaDB data structure: all Heurist databases have the same underlying MySQL structure, because the schema of the domain is encoded directly in the database as editable data. Entities/record types, fields, vocabularies and terms are defined through data within the database rather than being hardcoded in the software or database structure. Heurist uses a key-value pair approach linked to a primary data table instantiating typed entities, allowing variant data structures and repeating value fields (a minimum of 0 or 1 and a maximum of 1 to m values) with maintained order. Relationships between entities are implemented as record pointer fields (equivalent to a Foreign Key) and Relationship Marker fields (constraining the creation of relationship records linking any two records/entities).
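The key-value approach described above can be illustrated with a minimal Python sketch. It is not Heurist's actual schema or code: the class names (FieldDef, RecordType, Record), the attribute names and the convention that max_values=0 means "unlimited" are all assumptions made for illustration. The sketch shows record types and fields defined as data, values stored as ordered (field, value) pairs, and a pointer field holding the id of another record.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical, simplified model. Names are illustrative only and do not
# correspond to Heurist's real tables or columns.


@dataclass
class FieldDef:
    name: str
    kind: str            # "text" or "pointer" in this sketch
    min_values: int = 0  # 0 = optional, 1 = required
    max_values: int = 1  # assumption: 0 means unlimited (repeating field)


@dataclass
class RecordType:
    name: str
    fields: Dict[str, FieldDef]  # the "schema" is just data


@dataclass
class Record:
    rec_id: int
    rec_type: RecordType
    # Ordered (field name, value) pairs; list order preserves value order.
    details: List[Tuple[str, Any]] = field(default_factory=list)

    def values(self, field_name: str) -> List[Any]:
        return [v for k, v in self.details if k == field_name]

    def add(self, field_name: str, value: Any) -> None:
        fdef = self.rec_type.fields[field_name]  # KeyError if not defined
        if fdef.max_values and len(self.values(field_name)) >= fdef.max_values:
            raise ValueError(f"{field_name} allows at most {fdef.max_values} value(s)")
        self.details.append((field_name, value))

    def missing_required(self) -> List[str]:
        return [
            f.name
            for f in self.rec_type.fields.values()
            if len(self.values(f.name)) < f.min_values
        ]


place = RecordType("Place", {"name": FieldDef("name", "text", 1, 1)})
person = RecordType("Person", {
    "name": FieldDef("name", "text", min_values=1, max_values=1),
    "alias": FieldDef("alias", "text", max_values=0),  # repeating
    "born_in": FieldDef("born_in", "pointer"),         # like a foreign key
})

sydney = Record(1, place)
sydney.add("name", "Sydney")

ada = Record(2, person)
ada.add("name", "Ada")
ada.add("alias", "A.")
ada.add("alias", "Ada L.")
ada.add("born_in", sydney.rec_id)  # pointer stores the target record id
```

Because the record type is itself data, adding a new field in this sketch means adding a FieldDef entry rather than altering any table or class definition, which is the property the paragraph above attributes to the design.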

Heurist has the following field types, all of which can have multiple cardinality:

Heurist provides several modes of data visualisation and export based on filtered subsets of the database: export in CSV, JSON, XML, KML, GeoJSON, GEXF for Gephi, IIIF manifests; tabular listing; user-defined reporting using Smarty; interactive maps and timelines (items with geographic or time fields); simple network diagrams; crosstabulation. Widgets for these visualisations can be embedded in the CMS website generated from the database, or in standalone web pages or iframes in an external website.
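As an illustration of exporting a filtered subset in one of these formats, the following Python sketch builds a GeoJSON FeatureCollection from a list of records. It does not use Heurist's export code or API: the record dictionaries, their keys (id, type, title, lon, lat) and the to_geojson helper are hypothetical. Only the GeoJSON structure itself (Feature, Point geometry, longitude before latitude) follows the published format.

```python
import json

# Hypothetical in-memory records; the keys are assumptions for this sketch.
records = [
    {"id": 1, "type": "Place", "title": "Sydney", "lon": 151.21, "lat": -33.87},
    {"id": 2, "type": "Person", "title": "Ada"},
    {"id": 3, "type": "Place", "title": "Paris", "lon": 2.35, "lat": 48.86},
]


def to_geojson(recs, predicate):
    """Return a GeoJSON FeatureCollection for records that pass the filter
    and carry coordinates. Records without geography are skipped."""
    features = []
    for rec in recs:
        if not predicate(rec) or "lon" not in rec or "lat" not in rec:
            continue
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON positions are [longitude, latitude].
                "coordinates": [rec["lon"], rec["lat"]],
            },
            "properties": {"id": rec["id"], "title": rec["title"]},
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_geojson(records, lambda rec: rec["type"] == "Place")
geojson_text = json.dumps(collection, indent=2)
```

The resulting geojson_text string can be loaded by any GeoJSON-aware mapping tool.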

Databases can be populated through form-based data entry, CSV import via a wizard which matches existing records and normalises data by extracting and linking entities based on selected columns, Zotero bibliography synchronisation, KML import, media uploads and indexing.
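The idea of normalising imported data by extracting and linking entities from a selected column can be sketched as follows. This is a simplified illustration, not the logic of Heurist's import wizard: the sample CSV, the normalise_column function and the case-insensitive name matching are assumptions chosen to keep the example short.

```python
import csv
import io

# Sample data for the sketch; the third row repeats an author in another case.
CSV_TEXT = """title,author,year
Book A,Jane Smith,1901
Book B,John Doe,1910
Book C,jane smith,1915
"""


def normalise_column(rows, column, existing):
    """Replace the text in `column` with the id of a matching entity.

    `existing` maps entity id -> name and is updated in place when a value
    matches no known entity. Matching here is a plain case-insensitive
    comparison, which is an assumption of this sketch.
    """
    index = {name.strip().lower(): eid for eid, name in existing.items()}
    next_id = max(existing, default=0) + 1
    linked = []
    for row in rows:
        key = row[column].strip().lower()
        if key not in index:
            # No match: create a new entity and remember it.
            index[key] = next_id
            existing[next_id] = row[column].strip()
            next_id += 1
        new_row = dict(row)
        new_row[column] = index[key]  # text replaced by a pointer (id)
        linked.append(new_row)
    return linked


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
authors = {1: "Jane Smith"}  # entities already present before the import
books = normalise_column(rows, "author", authors)
```

After the call, each book row points at an author id instead of repeating the author's name, and authors holds one entry per distinct person.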

Other functions include wizards to build simple or facetted searches, personal and shared saved searches, search expansion rules to pull in related records, workgroup ownership of records, group notifications, blogging, a bookmarklet for capturing web references, WYSIWYG formatted text, user and workgroup tags.

For developers there is an API and all the export formats are available as live feeds. XML output can be transformed through XSLT stored in records within the database (temporarily unavailable, due to be reinstated 2022). Heurist source code is available under GNU GPL from the GitHub repository at https://github.com/HeuristNetwork/heurist and can be installed on any LAMP server, including virtual servers in the NeCTAR Research cloud, Amazon AWS and virtual servers from most ISPs. It has also been successfully installed on Windows servers.

Applicability

Heurist was conceived as a digital knowledgebase for managing heterogeneous data with rich interlinking, in small to medium collections (typically <500K records), often rich in media, textual and categorisation data, such as those typically found in the Arts and Humanities, and in personal research spaces. It is not suitable for large, structured, homogeneous, numerical datasets typical of the Sciences. [5] [6]

Heurist allows management of information with spatial and temporal components. Spatial components include the ability to enter georeferenced points, polygons etc. directly into an editor, as well as the ability to upload spatial data such as KML and Shapefiles. Spatial data is displayed on a map view within the database. Temporal components include the ability to enter dates as calendar dates, ranges, fuzzy dates or radiocarbon dates, with confidence levels. Dates are displayed on a timeline generally linked to the map display.

As of the end of 2021, Heurist supports a couple of hundred projects on the public servers, ranging from large projects funded by the ERC (Europe), AHRC (UK), ANR (France) and ARC (Australia) to many small personal projects such as PhD research, primarily in Humanities disciplines.

Example applications

A more extensive list of examples can be found at http://HeuristNetwork.org/Projects

Recent projects (last 5 years)

tbc

Older projects

These projects remain active (end 2021)

Past projects

These projects are complete or no longer active.

References

  1. What’s new in the world of citation Management?
  2. Blanke, Tobias; Ann Borda; Gaby Bright; Bridget Soulsby (October 2008). "eResearch Australasia 2008". Ariadne. 57. Retrieved 8 October 2009.
  3. Berman, Merrick (March 2008). Georeferencing Historical Placenames and Tracking Changes Over Time (PDF). Georeferencing Workshop. Harvard University. Retrieved 8 October 2009.
  4. Wynne, Martin (July 2008). "Digital Humanities 2008 Oulu, Finland, June 25-28th" (PDF). CLARIN Newsletter (2): 7. Archived from the original (PDF) on 20 July 2011. Retrieved 8 October 2009.
  5. Heurist Help
  6. Johnson, Ian (2008). "Mapping the fourth dimension: a ten year retrospective" (PDF). Archeologia e Calcolatori. 19: 31–44. Retrieved 8 October 2009.
  7. "Pearling, Testimony of an Island Economy".
  8. About - Gallipoli: The First Day