Type of site | Scientific support |
---|---|
Available in | English |
URL | cyverse |
Commercial | No |
Launched | 2008 |
The iPlant Collaborative, renamed CyVerse in 2017, is a virtual organization created by a cooperative agreement funded by the US National Science Foundation (NSF) to create cyberinfrastructure for the plant sciences (botany). [1] The NSF compared cyberinfrastructure to physical infrastructure, "... the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor". [2] In September 2013 it was announced that the National Science Foundation had renewed iPlant's funding for a second 5-year term with an expansion of scope to all non-human life science research. [3]
The project develops computing systems and software that combine computing resources, such as those of TeraGrid, with bioinformatics and computational biology software. Its goal is easier collaboration among researchers with improved data access and processing efficiency. Primarily centered in the United States, it collaborates internationally.
Biology is relying more and more on computers. [4] Plant biology is changing with the rise of new technologies. [5] With the advent of bioinformatics, computational biology, DNA sequencing, geographic information systems and other technologies, computers can greatly assist researchers who study plant life and seek solutions to challenges in medicine, biofuels, biodiversity, and agriculture, including problems such as drought tolerance, plant breeding, and sustainable farming. [6] Many of these problems cross traditional disciplines, so facilitating collaboration between plant scientists of diverse backgrounds and specialties is necessary. [6] [7] [8]
In 2006, the NSF solicited proposals to create "a new type of organization – a cyberinfrastructure collaborative for plant science" under a program titled "Plant Science Cyberinfrastructure Collaborative" (PSCIC), with Christopher Greer as program director. [9] A proposal was accepted (adopting the convention of using the word "Collaborative" as a noun) and iPlant was officially created on February 1, 2008. [1] [9] Funding was estimated at $10 million per year over five years. [10]
Richard Jorgensen led the team through the proposal stage and was the principal investigator (PI) from 2008 to 2009. [10] Gregory Andrews, Vicki Chandler, Sudha Ram and Lincoln Stein served as Co-Principal Investigators (Co-PIs) from 2008 to 2009. In late 2009, Stephen Goff was named PI and Daniel Stanzione was added as a Co-PI. [1] [11] [12] As of May 2014, Co-PI Stanzione had been replaced by four new Co-PIs: Doreen Ware at Cold Spring Harbor, Nirav Merchant and Eric Lyons at the University of Arizona, and Matthew Vaughn at the Texas Advanced Computing Center. [13]
The iPlant project supports what has been called e-Science, which is a use of information systems technology that is being adopted by the research community in efforts such as the National Center for Ecological Analysis and Synthesis (NCEAS), ELIXIR, [14] and the Bamboo Technology Project that started in September 2010. [15] [16] iPlant is "designed to create the foundation to support the computational needs of the research community and facilitate progress toward solutions of major problems in plant biology." [6] [17]
The project works as a collaboration. It seeks input from the wider plant science community on what to build. [18] Based on that input, it has enabled easier use of large data sets, [19] created a community-driven research environment for sharing existing data collections within and between research areas, [20] and shared data with provenance tracking. [21] [22] One model studied for collaboration was Wikipedia. [23] [24]
Several more recent National Science Foundation awards mentioned iPlant explicitly in their descriptions, as either a design pattern to follow or a collaborator with whom the recipient will work. [25]
The primary institution for the iPlant project is the University of Arizona, where the project is located within the BIO5 Institute in Tucson. [26] Since its inception in 2008, personnel have worked at other institutions including Cold Spring Harbor Laboratory, University of North Carolina, Wilmington, and the University of Texas at Austin in the Texas Advanced Computing Center. [27] Purdue University and Arizona State University were part of the original project group. [10]
Other collaborating institutions that received support from iPlant for their work on a Grand Challenge in phylogenetics starting in March 2009 included Yale University, University of Florida, and the University of Pennsylvania. [27] A trait evolution group was led at the University of Tennessee. [28] A visualization workshop employing iPlant was run by Virginia Tech in 2011. [29]
The NSF requires that funding subcontracts stay within the United States, but international collaboration started in 2009 with the Technical University Munich [27] and continued in 2010 with the University of Toronto. [29] [30] East Main Evaluation & Consulting provides external oversight, advice, and assistance. [31]
The iPlant project makes its cyberinfrastructure available in several different ways and offers services to make it accessible to its primary audience. The design was meant to grow in response to the needs of the research community it serves. [6]
The Discovery Environment integrates community-recommended software tools into a system that can handle terabytes of data, using high-performance supercomputers to run analyses much more quickly. Its interface is designed to hide this complexity from the end user. The goal was to make the cyberinfrastructure available to non-technical end users who are less comfortable with a command-line interface. [6] [32]
A set of application programming interfaces (APIs) allows developers to access iPlant services, including authentication, data management, and high-performance supercomputing resources, from custom, locally produced software. [6] [33]
Atmosphere is a cloud computing platform that provides easy access to pre-configured, frequently used analysis routines, relevant algorithms, and data sets, and accommodates computationally and data-intensive bioinformatics tasks. [6] It uses the Eucalyptus virtualization platform. [34] [35]
The iPlant Semantic Web effort uses an iPlant-created architecture, protocol, and platform called the Simple Semantic Web Architecture and Protocol (SSWAP) for semantic web linking using a plant science focused ontology. [6] [36] [37] SSWAP is based on the notion of RESTful web services with an ontology based on Web Ontology Language (OWL). [38] [39]
The Taxonomic Name Resolution Service (TNRS) is a free utility for correcting and standardizing plant names. This is needed because plant names that are misspelled, out of date (because a newer synonym is preferred), or incomplete make it hard to use computers to process large lists. [6] [40] [41]
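The three problems named above (misspelled, out-of-date, and incomplete names) can be illustrated with a minimal Python sketch. This is not the TNRS implementation or its API: the reference list, the synonym table, the function name `resolve_name`, and the similarity cutoff are all hypothetical, and the fuzzy matching simply uses the standard library's `difflib`.

```python
from difflib import get_close_matches

# Hypothetical reference data; a real service would use a curated taxonomic database.
ACCEPTED_NAMES = [
    "Zea mays",
    "Quercus alba",
    "Solanum lycopersicum",
    "Arabidopsis thaliana",
]
# Out-of-date name -> currently accepted name (illustrative entry only).
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def resolve_name(raw, cutoff=0.8):
    """Return the accepted name that best matches `raw`, or None if nothing is close."""
    parts = raw.split()
    if not parts:
        return None
    # Normalize whitespace and capitalization: "zea  MAYS" -> "Zea mays".
    cleaned = " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])
    # Fuzzy-match against accepted names and known synonyms together.
    candidates = ACCEPTED_NAMES + list(SYNONYMS)
    matches = get_close_matches(cleaned, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    # If the best match is an outdated synonym, map it to the accepted name.
    return SYNONYMS.get(best, best)


if __name__ == "__main__":
    for name in ["zea  mais", "Lycopersicon esculentum", "Quercus alb", "Homo sapiens"]:
        print(repr(name), "->", resolve_name(name))
```

With a cutoff this strict, names with no close counterpart in the reference list are left unresolved rather than guessed, which is usually the safer choice when processing large lists automatically.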
My-Plant.org is a social networking community for plant biologists, educators and others to come together to share information and research, collaborate, and track the latest developments in plant science. [6] [42] The My-Plant network calls its user groups "clades" and organizes them in a manner similar to the phylogenetics of plants themselves. [42] It was implemented using Drupal as its content management system. [42]
The DNA Subway website uses a graphical user interface (GUI) to generate DNA sequence annotations, explore plant genomes for members of gene and transposon families, and conduct phylogenetic analyses. It makes high-level DNA analysis available to faculty and students by simplifying annotation and comparative genomics workflows. [6] [43] It was developed for iPlant by the Dolan DNA Learning Center. [44] [45]
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The process of analyzing and interpreting data is sometimes referred to as computational biology, although the distinction between the two terms is often disputed. To some, the term computational biology refers to building and using models of biological systems.
Computational biology refers to the use of techniques in computer science, data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and data science, the field also has foundations in applied mathematics, molecular biology, cell biology, chemistry, and genetics.
The San Diego Supercomputer Center (SDSC) is an organized research unit of the University of California, San Diego. Founded in 1985, it was one of the five original NSF supercomputing centers.
E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology in 1999 and was used to describe a large funding initiative starting in November 2000. E-science has been more broadly interpreted since then, as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In 2014, IEEE eScience Conference Series condensed the definition to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle" in one of the working definitions used by the organizers. E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.
The Biocomplexity Institute of Virginia Tech was a research institute specializing in bioinformatics, computational biology, and systems biology. The institute had more than 250 personnel, including over 50 tenured and research faculty. Research at the institute involved collaboration in diverse disciplines such as mathematics, computer science, biology, plant pathology, biochemistry, systems biology, statistics, economics, synthetic biology and medicine. The institute developed -omic and bioinformatic tools and databases that can be applied to the study of human, animal and plant diseases as well as the discovery of new vaccine, drug and diagnostic targets.
United States federal research funders use the term cyberinfrastructure to describe research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services distributed over the Internet beyond the scope of a single institution. In scientific usage, cyberinfrastructure is a technological and sociological solution to the problem of efficiently connecting laboratories, data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge.
Vasant G. Honavar is an Indian-American computer scientist, and artificial intelligence, machine learning, big data, data science, causal inference, knowledge representation, bioinformatics and health informatics researcher and professor.
The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is an advanced computing research center that provides comprehensive advanced computing resources and support services to researchers in Texas and across the U.S. The mission of TACC is to enable discoveries that advance science and society through the application of advanced computing technologies. Specializing in high-performance computing, scientific visualization, data analysis and storage systems, software, research and development, and portal interfaces, TACC deploys and operates advanced computational infrastructure to enable the research activities of faculty, staff, and students of UT Austin. TACC also provides consulting, technical documentation, and training to support researchers who use these resources. TACC staff members conduct research and development in applications and algorithms, computing systems design/architecture, and programming tools and environments.
Richard A. Jorgensen is an American molecular geneticist and an early pioneer in the study of post transcriptional gene silencing.
Carole Anne Goble is a British academic who is Professor of Computer Science at the University of Manchester. She is principal investigator (PI) of the myGrid, BioCatalogue and myExperiment projects and co-leads the Information Management Group (IMG) with Norman Paton.
Integrated computational materials engineering (ICME) involves the integration of experimental results, design models, simulations, and other computational data related to a variety of materials used in multiscale engineering and design. Central to the achievement of ICME goals has been the creation of a cyberinfrastructure, a Web-based, collaborative platform which provides the ability to accumulate, organize and disseminate knowledge pertaining to materials science and engineering to facilitate this information being broadly utilized, enhanced, and expanded.
Pavel Arkadevich Pevzner is the Ronald R. Taylor Professor of Computer Science and director of the NIH Center for Computational Mass Spectrometry at University of California, San Diego. He serves on the editorial board of PLoS Computational Biology and he is a member of the Genome Institute of Singapore scientific advisory board.
Cathy H. Wu is the Edward G. Jefferson Chair and professor and director of the Center for Bioinformatics & Computational Biology (CBCB) at the University of Delaware. She is also the director of the Protein Information Resource (PIR) and the Northeast Bioinformatics Collaborative Steering Committee, and an adjunct professor at the Georgetown University Medical Center.
BisQue is a free, open source web-based platform for the exchange and exploration of large, complex datasets. It is being developed at the Vision Research Lab at the University of California, Santa Barbara. BisQue specifically supports large scale, multi-dimensional multimodal-images and image analysis. Metadata is stored as arbitrarily nested and linked tag/value pairs, allowing for domain-specific data organization. Image analysis modules can be added to perform complex analysis tasks on compute clusters. Analysis results are stored within the database for further querying and processing. The data and analysis provenance is maintained for reproducibility of results. BisQue can be easily deployed in cloud computing environments or on computer clusters for scalability. BisQue has been integrated into the NSF Cyberinfrastructure project CyVerse. The user interacts with BisQue via any modern web browser.
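The idea of metadata stored as arbitrarily nested tag/value pairs can be sketched in Python as a small tree structure. This is an illustration only, not BisQue's data model or API: the `Tag` class, its methods, and the example tag names and values are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Tag:
    """One tag/value pair that may hold further nested tags (hypothetical model)."""

    name: str
    value: str = ""
    children: list[Tag] = field(default_factory=list)

    def add(self, name: str, value: str = "") -> Tag:
        """Attach a nested tag and return it so calls can build deeper levels."""
        child = Tag(name, value)
        self.children.append(child)
        return child

    def find(self, name: str) -> Iterator[Tag]:
        """Yield every tag called `name` in this subtree, depth first."""
        if self.name == name:
            yield self
        for child in self.children:
            yield from child.find(name)


# Example: domain-specific annotations attached to one image (made-up values).
image = Tag("image", "leaf_001.tif")
experiment = image.add("experiment")
experiment.add("species", "Arabidopsis thaliana")
experiment.add("stain", "DAPI")
image.add("analysis").add("cell_count", "412")

if __name__ == "__main__":
    print([t.value for t in image.find("species")])
```

Because nesting depth is not fixed in advance, each research domain can organize its annotations in its own way while generic queries such as `find` continue to work.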
Sorin Drăghici is a Romanian-American computer scientist and a program director in the Division of Information and Intelligent Systems (IIS) of the Directorate for Computer and Information Science and Engineering (CISE) at the National Science Foundation (NSF). Previous positions include: Associate Dean for Entrepreneurship and Innovation of Wayne State University's College of Engineering, the Director of the Bioinformatics and Biostatistics Core at Karmanos Cancer Institute, and the Director of the James and Patricia Anderson Engineering Ventures Institute. Drăghici was elected a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) in 2022, for contributions to the analysis of high-throughput genomics and proteomics data. He has also been elected a Fellow of the Asia-Pacific Artificial Intelligence Association (AAIA).
Science gateways provide access to advanced resources for science and engineering researchers, educators, and students. Through streamlined, online, user-friendly interfaces, gateways combine a variety of cyberinfrastructure (CI) components in support of a community-specific set of tools, applications, and data collections. In general, these specialized, shared resources are integrated as a Web portal, mobile app, or a suite of applications. Through science gateways, broad communities of researchers can access diverse resources, which can save both time and money for themselves and their institutions. Functions and resources offered by science gateways include shared equipment and instruments, computational services, advanced software applications, collaboration capabilities, data repositories, and networks.
Srinivas Aluru is a professor in the School of Computational Science and Engineering at Georgia Institute of Technology, and co-Executive Director for the Georgia Tech Interdisciplinary Research Institute in Data Engineering and Science. His main areas of research are high performance computing, data science, bioinformatics and systems biology, combinatorial methods in scientific computing, and string algorithms. Aluru is a Fellow of the American Association for the Advancement of Science (AAAS) and the Institute of Electrical and Electronics Engineers (IEEE). He is best known for his research contributions in parallel algorithms and applications, interdisciplinary research in bioinformatics and computational biology, and particularly the intersection of these two fields.
Tracy Teal is an American bioinformatician and the executive director of Data Carpentry. She is known for her work in open science and biomedical data science education.
Keith A. Crandall is an American computational biologist, bioinformaticist, and population geneticist at George Washington University, where he is the founding director of the Computational Biology Institute, and professor in the Department of Biostatistics and Bioinformatics.
Ilkay Altintas is a Turkish-American data and computer scientist, and researcher in the domain of supercomputing and high-performance computing applications. Since 2015, Altintas has served as chief data science officer of the San Diego Supercomputer Center (SDSC), at the University of California, San Diego (UCSD), where she has also served as founder and director of the Workflows for Data Science Center of Excellence (WorDS) since 2014, as well as founder and director of the WIFIRE lab. Altintas is also the co-initiator of the Kepler scientific workflow system, an open-source platform that endows research scientists with the ability to readily collaborate, share, and design scientific workflows.