Digital Automated Identification System

Digital automated identification system (DAISY)
Developer(s): Mark A. O'Neill
Stable release: 2.1.0 / February 1, 2016
Written in: C [citation needed]
Operating system: Linux
Platform: IA-32, x86-64, ARM
Available in: English
License: Proprietary commercial software
Website: www.tumblingdice.co.uk/daisy

Digital automated identification system (DAISY) is an automated species identification system optimised for the rapid screening of invertebrates (e.g. insects) by non-experts (e.g. parataxonomists).

DAISY was developed by Dr. Mark O'Neill during the mid-1990s. Development was supported by funding from the Darwin Initiative in 1997 [1] and BBSRC. [2] The intellectual property rights were acquired by O'Neill's company, Tumbling Dice Ltd, in February 2000 [3] at the end of the grant-funded Darwin project. Further development produced a web-accessible exemplar that can cope in near real time with groups containing several hundred taxa (e.g. hawk moths). On medium to high-end PC server hardware (e.g. a blade server), an identification is possible in under a second for a 300-taxon group. Parallelisation of the critical DAISY classifier code (using either bespoke FPGA technology or general-purpose GPU programming technology such as CUDA) is expected to give an order-of-magnitude increase in performance, which would allow DAISY to make real-time identifications within groups containing thousands of taxa (e.g. true flies).
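The article does not describe the internals of the DAISY classifier, so the following is only a minimal sketch of why pattern matching of this kind parallelises well. It assumes a plain nearest-neighbour comparison between a query feature vector and a labelled training set; the names `identify` and `nearest_in_chunk`, the feature vectors and the chunking scheme are all hypothetical and are not taken from DAISY. Each chunk of the training set can be searched independently, and only the per-chunk winners need to be compared at the end.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist  # Euclidean distance, Python 3.8+


def nearest_in_chunk(query, chunk):
    """Return (distance, label) of the closest training pattern in one chunk."""
    return min((dist(query, vec), label) for vec, label in chunk)


def identify(query, training, workers=4):
    """Return the label of the training pattern nearest to `query`.

    `training` is a non-empty list of (feature_vector, label) pairs.
    This is an assumed nearest-neighbour scheme, not DAISY's actual method.
    """
    size = max(1, -(-len(training) // workers))  # ceiling division
    chunks = [training[i:i + size] for i in range(0, len(training), size)]
    # Each chunk is searched independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        winners = list(pool.map(lambda c: nearest_in_chunk(query, c), chunks))
    # Reduce the per-chunk winners to a single overall best match.
    return min(winners)[1]
```

In CPython the thread pool mainly illustrates the structure of the computation rather than delivering a real speed-up. On a GPU or FPGA, the same split into independent comparisons followed by a reduction is what would allow the work to be spread across many processing elements.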

DAISY results for selected insect taxa

Taxon               Image type  Structure  Training images  Species  Success (%)
Belize Sphingidae   RGB         wing       705              58       97
Xylophanes sp.      RGB         wing       543              30       99
Parasitic wasps     mono        wing       559              47       95
UK butterflies      RGB         wing       818              57       98
UK macro moths      RGB         wing       744              37       98
Caterpillars        RGB         head       91               7        93
Caterpillars        RGB         body       508              10       99
Soft fruit pests    RGB         body       2634             23       91

DAISY results for other image classification tasks

Description         Image type  Structure       Training images  Classes  Success (%)
Food cans           RGB         label           31               5        100
Industrial objects  RGB         unposed object  155              14       100
Foraminifera tests  RGB         unposed object  198              8        95
Pollen grains       RGB         unposed object  660              112      99
Spiders             mono        genitalia       102              6        91
Human faces         mono        unposed face    400              41       99

DAISY has been used in several research projects by O'Neill [4] and others, and has featured in popular science TV programmes and magazine articles. The project was also the subject of a 2010 article in Science. [5]

In 2011, the first DAISY installation capable of scaling to hundreds of taxa was installed at the Natural History Museum in London. This server offered both VNC and web-service-based interfaces and was able to offload compute-intensive pattern matching operations onto an NVIDIA GPU programmed using CUDA. The installation could provide species-level identifications from a dataset of more than 300 taxa in less than a second in a multi-user environment.

More recently, with Innovate UK funding, DAISY has been extensively modified to meet the needs of upstream activities within the oil and gas sector, in particular biostratigraphy. The resultant system, GeoDAISY, represents a significant technological advance. It is capable of deep learning, knowledge encapsulation, pattern-based data mining and image-based content search, and can efficiently handle training sets consisting of millions of patterns on commodity hardware using a combination of smart data caching and OpenMP. Further details of GeoDAISY, and the rationale for developing it, are available as white papers on the Tumbling Dice LinkedIn page.

References

  1. "Automating Insect Identification for Inventorying Costa Rican Biodiversity". Darwin Initiative. Defra. Retrieved 15 December 2010.
  2. "Daisy Overview" (PDF). Tumbling Dice Ltd. 2007. Retrieved 14 December 2010.
  3. "Daisy". Tumbling Dice. Retrieved 14 December 2010.
  4. Watson, Anna T.; O'Neill, Mark A.; Kitching, Ian J. (2003). "A qualitative study investigating automated identification of living macrolepidoptera using the Digital Automated Identification SYstem (DAISY)". Systematics and Biodiversity. 1: 287–300. doi:10.1017/S1477200003001208. S2CID 86265419.
  5. Reed, Sarah (2010). "Pushing DAISY". Science. 328 (5986): 1628–1629. doi:10.1126/science.328.5986.1628. PMID 20576867.
  6. Leafsnap
  7. iPflanzen
  8. PlantNet
  9. Plants
  10. Plantifier
  11. NatureGate