NCSA Brown Dog

NCSA Brown Dog is a research project to develop a method for easily accessing stored historic research data, in order to maintain the long-term viability of large bodies of scientific research. It is supported by the National Center for Supercomputing Applications (NCSA), which is funded by the National Science Foundation (NSF). [1]

History

Brown Dog is part of the DataNet partners program funded by NSF in 2008. DataNet was conceived to address the increasingly digital and data-intensive nature of science, engineering and education. Brown Dog is part of a follow-on effort called Data Infrastructure Building Blocks (DIBBs), focused on building software to support DataNet. The project was proposed by researchers at NCSA and the University of Illinois Urbana-Champaign as well as researchers from Boston University and the University of North Carolina at Chapel Hill.

Unstructured, uncurated, long tail data

Much scientific data is small in scale, unstructured and uncurated, and thus not easily shared. Such data is sometimes referred to as "long tail" data, a term borrowed from statistics that here refers to the tail of the distribution of project sizes. The majority of smaller projects lack the resources to properly steward the data they produce. Long tail data, both past and present, has the potential to inform future research in many study areas, but much of it has become inaccessible due to obsolete software and file formats. The resulting impossibility of reviewing data from older research disrupts the overall scientific research project. [2]

Approach

Brown Dog describes itself as the "super mutt" of software [3] (thus the name "Brown Dog"), serving as a low-level data infrastructure to interface digital data content across the internet. Its approach is to use every possible source of automated help (i.e., software) in existence in a robust and provenance-preserving manner to create a service that can deal with as much of this data as possible. [4] The project sees the broader impact of its work in its potential to serve the general public as a sort of "DNS for data", with the goal of making all data and all file formats as accessible as webpages are today.

Technology

Brown Dog seeks to address problems involving the use of uncurated and unstructured data collections through the development of two services: the Data Access Proxy (DAP), to aid in the conversion of file formats, and the Data Tilling Service (DTS), for the automatic extraction of metadata from file contents. Once these are developed, researchers and general public users will be able to download browser plugins and other tools from the Brown Dog tool catalog. [1] [5]

Data Tilling Service

Data Tilling Service (DTS) will allow users to search data collections using an existing file to discover other similar files in a collection. A DTS search field, into which example files can be dropped, will be added to configured browsers. Dropping a file tells DTS to search all the files under a given URL for files similar to the dropped file. For example, while browsing an online image collection, a user could drop an image of three people into the search field, and the DTS would return all images in the collection that also contain three people. If DTS encounters a foreign file format, it will use DAP to make the file accessible. DTS also indexes the data and extracts and appends metadata to files and collections, enabling users to gain some sense of the type of data they are encountering.

This service runs on port 9443.
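The query-by-example search described above can be illustrated with a minimal Python sketch. It is not Brown Dog's implementation: the feature names (`people`, `outdoor`), the `INDEX` contents, the `find_similar` function and the distance threshold are all hypothetical, and the sketch assumes that metadata extraction has already reduced each file to a small set of numeric features.

```python
import math

# Hypothetical metadata index: file name -> numeric features that an
# extractor is assumed to have produced (not real Brown Dog output).
INDEX = {
    "family.jpg": {"people": 3.0, "outdoor": 1.0},
    "team.jpg": {"people": 3.0, "outdoor": 0.0},
    "portrait.jpg": {"people": 1.0, "outdoor": 0.0},
    "landscape.jpg": {"people": 0.0, "outdoor": 1.0},
}


def distance(a, b):
    """Euclidean distance between two feature dicts (missing keys count as 0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def find_similar(query, index, max_distance=1.0):
    """Return names of indexed files within max_distance of the query, closest first."""
    scored = sorted((distance(query, feats), name) for name, feats in index.items())
    return [name for dist, name in scored if dist <= max_distance]


# Features assumed to be extracted from the image the user dropped.
dropped = {"people": 3.0, "outdoor": 1.0}
matches = find_similar(dropped, INDEX)
```

In this toy index, only the two images with three people fall within the threshold; a real service would rely on far richer extracted features.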

Data Access Proxy

Data Access Proxy (DAP) allows users to access data files that would otherwise be unreadable. Similar to an internet gateway or Domain Name Service, the DAP configuration would be entered into a user's machine and browser settings. Data requests over HTTP would first be examined by DAP to determine if the native file format is readable on the client device. If not, DAP converts the file into the best available format readable by the client machine. Alternatively, the user could specify the desired format themselves.

This service runs on port 8184.
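The format decision described above can be sketched in Python as a search over a graph of available converters. This is an illustration under stated assumptions, not the DAP's actual logic: the `CONVERSIONS` table and the `conversion_path` function are hypothetical, and "best available format" is interpreted here as the readable format reachable in the fewest conversion steps.

```python
from collections import deque

# Hypothetical conversion graph: each format maps to the formats that some
# converter can produce from it directly (illustrative values only).
CONVERSIONS = {
    "wpd": ["doc", "rtf"],
    "doc": ["pdf", "txt"],
    "rtf": ["txt"],
    "pdf": [],
    "txt": [],
}


def conversion_path(source, readable, conversions=CONVERSIONS):
    """Return the shortest chain of formats from source to a client-readable
    format, or None if no chain of converters reaches one."""
    if source in readable:
        return [source]  # native format is already readable; no conversion
    queue = deque([[source]])
    seen = {source}
    while queue:  # breadth-first search finds a path with the fewest steps
        path = queue.popleft()
        for nxt in conversions.get(path[-1], []):
            if nxt in seen:
                continue
            if nxt in readable:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


# A client that can only read PDF requests a file stored as "wpd".
path = conversion_path("wpd", {"pdf"})
```

A user-specified target format fits the same sketch: passing a single-element set as `readable` restricts the search to that format.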

Use cases

Brown Dog targets three use cases proposed by groups within the EarthCube research communities. Developers and researchers from these communities will work together on use cases that span geoscience, engineering, biology and social science.

Long tail vegetation data in ecology and global change biology

This use case is led by Michael Dietze, Boston University.

Data on the abundance, species composition, and size structure of vegetation is critically important for a wide array of sub-disciplines in ecology, conservation, natural resource management, and global change biology. However, addressing many of the pressing questions in these disciplines will require that terrestrial biosphere and hydrologic models are able to assimilate the large amount of long-tail data that exists but is largely inaccessible. The Brown Dog team, in cooperation with researchers from Dietze's lab, will facilitate the capture of a huge body of smaller research-oriented vegetation data sets collected over many decades, as well as historical vegetation data embedded in Public Land Survey data dating back to 1785. This data will be used as initial conditions for models, to make sense of other large data sets, and for model calibration and validation. [1] [6]

Designing green infrastructure considering storm water and human requirements

This use case is led by Barbara Minsker, William Sullivan and Arthur Schmidt, all of the University of Illinois at Urbana-Champaign.

This case study involves developing novel green infrastructure design criteria and models that integrate requirements for storm water management with ecosystem and human health and well-being. Data accessibility and availability is a major challenge in addressing the scientific and social problems associated with the design of green spaces. The study will focus on identified areas of the Green Healthy Neighborhood Planning region within the City of Chicago where existing local sewer performance is most deficient and where changes in impervious area through green infrastructure would benefit underserved neighborhoods. Brown Dog will be used to extract long-tail experimental data on human landscape preferences and health impacts. This data will be used to develop a human health impacts model that will then be linked with a terrestrial biosphere model and a storm water model using Brown Dog technology. [1]

Development and application for critical zone studies

This use case is led by Praveen Kumar, University of Illinois at Urbana-Champaign.

The Critical Zone (CZ) is the "skin" of the earth, extending from the treetops to the bedrock, that is created by life processes working at scales from microbes to biomes. The Critical Zone supports all terrestrial living systems. Its upper part is the bio-mantle, where terrestrial biota live, reproduce, use and expend energy, and where their wastes and remains accumulate and decompose. It encompasses the soil, which acts as a geomembrane through which water and solutes, energy, gases, solids, and organisms interact with the atmosphere, biosphere, hydrosphere, and lithosphere. A variety of drivers affect this bio-dynamic zone, ranging from climate and deforestation to agriculture, grazing and human development. Understanding and predicting these effects is central to managing and sustaining vital ecosystem services such as soil fertility, water purification, and production of food resources, and, at larger scales, global carbon cycling and carbon sequestration.

The CZ provides a unifying framework for integrating terrestrial surface and near-surface environments, and reflects an intricate web of biological and chemical processes and human impacts occurring at vastly different temporal and spatial scales. The nature of these data creates significant challenges for inter-disciplinary studies of the CZ, because integrating the variety and number of data products and models has been a barrier. On the other hand, CZ data provides an excellent opportunity for defining, testing and implementing Brown Dog technologies. In this context, "unstructured" data is viewed broadly as a collection of heterogeneous data with formats that reflect temporal and disciplinary legacies; data from emerging low-cost, open-hardware-based sensors and embedded sensor networks that lack well-defined metadata and sensor characteristics; and data that are available as maps, images and text. [1]

NSF Award

The award, titled "CIF21 DIBBs: Brown Dog", was made in the winter of 2013 with a start date of October 1, 2013. The estimated expiration date is September 30, 2018. [7]

The award amount was $10,519,716.00, the largest DIBBs award. The principal investigator is Kenton McHenry of NCSA at the University of Illinois at Urbana-Champaign. The co-leaders are Jong Lee, NCSA/UIUC; Barbara Minsker, Civil and Environmental Engineering, University of Illinois at Urbana-Champaign; Praveen Kumar, Civil and Environmental Engineering, University of Illinois at Urbana-Champaign; and Michael Dietze, Department of Earth and Environment, Boston University.

References

  1. "Brown Dog". NCSA Brown Dog. Retrieved 31 July 2014.
  2. "DataUp—Data Curation for the Long Tail of Science". Microsoft Research Connections Blog. Microsoft Research Connections Team. Retrieved 7 August 2014.
  3. Woodie, Alex (6 January 2014). "NCSA Project Aims to Create a DNS-Like Service for Data". datanami. Retrieved 7 August 2014.
  4. Pletz, John (December 2013). "U of I researchers get millions for 'super mutt' to sniff out big-data trends". Chicago Business. Crain Communications, Inc. Retrieved 7 August 2014.
  5. Jewett, Barbara. "DATA SET FREE". NCSA Access Magazine. NCSA. Retrieved 7 August 2014.
  6. "BU Scientist, Collaborators Get $10.5 Million Grant to Develop Software for un-Curated Data". www.newswise.com. Boston University College of Arts and Sciences. Retrieved 7 August 2014.
  7. "Award#1261582 - CIF21 DIBBs: Brown Dog". nsf.gov. Retrieved 31 July 2014.