Galaxy[2] is an open-source scientific workflow system designed to make research accessible, reproducible, and transparent. Originally developed for computational biology, Galaxy has evolved into a domain-agnostic framework used across many scientific disciplines. Examples include data science[3], microbiology[4], medical research[5], neuroscience[6], virology[7] and outbreak detection[8], food safety[9], wastewater tracking and antibiotic resistance[10], long-read[11] and high-throughput[12] genomic sequencing, and bioinformatics[13].
For many computational biology processes, Galaxy accommodates scientists from newcomers to professionals. It supports code-free workflow development, GUI workflow visualization as well as command-line interface access, scheduled jobs, and cloud infrastructure management. It supports data persistence and data publishing to facilitate collaboration. The freely hosted services of UseGalaxy (United States, EU, and Australia) support a global community of over 500,000 registered users through the Galaxy Hub which holds events, an annual conference, and hundreds of free online tutorials at the Galaxy Training Network.
Use Cases
In 2021, members of the Galaxy team published a paper in Nature Biotechnology[14] detailing a method for tracking COVID-19 variants using Galaxy's scheduled jobs feature, Planemo, which is capable of processing and monitoring hundreds of thousands of samples.
In 2021, Galaxy partnered with the Vertebrate Genomes Project (VGP) which "aims to generate near error-free reference genome assemblies"[15] for approximately 70,000 vertebrate species.
In 2022, Goecks Lab introduced a scalable and modular pipeline, MCMICRO, which is capable of processing multiplexed imaging critical for analyzing complex tissue in cancer research and for improving precision oncology[16].
Galaxy is "an open, web-based platform for performing accessible, reproducible, and transparent genomic science."[23]
Accessibility
Computational biology is a specialized domain that often requires knowledge of computer programming. Galaxy gives biomedical researchers access to computational biology without requiring programming expertise.[24][25] To achieve this, Galaxy prioritizes a user-friendly interface[26] over the flexibility to construct highly complex workflows. This design choice makes typical analyses relatively easy to build, but makes it harder to build complex workflows that include, for example, looping constructs. (See Apache Taverna for an example of a data-driven workflow system that supports looping.[27])
Reproducibility
Reproducibility is fundamental to science: when scientific results are published, they should include sufficient information for others to replicate the experiment and obtain the same results. In recent years, significant efforts have been made to extend this standard beyond traditional laboratory experiments (the "wet lab") to computational research (the "dry lab"). However, achieving reproducibility in computational experiments has proven more challenging than initially anticipated.[28]
Galaxy supports reproducibility by systematically capturing all essential details of a computational analysis, ensuring that it can be precisely replicated at any point in the future. This includes recording all input, intermediate, and final datasets, as well as the parameters used and the exact sequence of analytical steps.
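The kind of record described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Galaxy's implementation or API: the function `run_recorded`, the toy tools `to_upper` and `take_head`, and the record layout are all hypothetical names chosen for this example. Each step's name, parameters, and an output checksum are stored, so a later rerun can be compared against the original.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a SHA-256 digest that identifies a dataset's exact contents."""
    return hashlib.sha256(data).hexdigest()


def run_recorded(steps, data: bytes):
    """Run steps in order, recording each tool, its parameters, and its output."""
    record = {"input": checksum(data), "steps": []}
    for name, func, params in steps:
        data = func(data, **params)
        record["steps"].append(
            {"tool": name, "params": params, "output": checksum(data)}
        )
    return data, record


# Two toy "tools" (hypothetical, for illustration only).
def to_upper(data: bytes) -> bytes:
    return data.upper()


def take_head(data: bytes, n: int) -> bytes:
    return b"\n".join(data.split(b"\n")[:n])


STEPS = [("to_upper", to_upper, {}), ("take_head", take_head, {"n": 2})]

result, record = run_recorded(STEPS, b"acgt\nttga\nccat")
rerun_result, rerun_record = run_recorded(STEPS, b"acgt\nttga\nccat")

# Identical input, steps, and parameters yield an identical record.
print(record == rerun_record)
```

Comparing the two records is the check that matters here: if any input, parameter, or intermediate dataset differed between runs, the checksums would no longer match.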
Transparency
Transparency is essential in science, as it enables verification, fosters collaboration, and accelerates discoveries by allowing others to build upon existing work. Galaxy promotes transparency in scientific research by allowing researchers to share their Galaxy Objects either publicly or with specific individuals. Shared items can be thoroughly examined, rerun as needed, and copied or modified to explore new hypotheses.
Galaxy provides a web interface for many text manipulation tools, enabling researchers to do their own custom reformatting and manipulation without having to know computer programming or shell scripting. Galaxy also includes interval manipulation tools for set-theoretic operations (e.g. intersection, union, ...) on intervals. Many biological file formats include genomic interval data (a frame of reference, e.g., chromosome or contig name, and start and stop positions), which allows data from these formats to be integrated.
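As a rough illustration of what set-theoretic operations on genomic intervals mean, the following Python sketch implements union and intersection. It is not the code behind Galaxy's interval tools: the `Interval` type, the `merge` and `intersect` functions, and the half-open coordinate convention (start included, end excluded) are assumptions made for this example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str  # frame of reference, e.g. chromosome or contig name
    start: int  # start position (assumed 0-based, inclusive)
    end: int    # stop position (assumed exclusive)


def merge(intervals):
    """Union: collapse overlapping or touching intervals per chromosome."""
    merged = []
    for iv in sorted(intervals):
        if merged and merged[-1].chrom == iv.chrom and iv.start <= merged[-1].end:
            last = merged[-1]
            merged[-1] = Interval(last.chrom, last.start, max(last.end, iv.end))
        else:
            merged.append(iv)
    return merged


def intersect(a, b):
    """Intersection: regions covered by both interval sets."""
    result = []
    for x in merge(a):
        for y in merge(b):
            if x.chrom == y.chrom:
                start, end = max(x.start, y.start), min(x.end, y.end)
                if start < end:
                    result.append(Interval(x.chrom, start, end))
    return sorted(result)


genes = [Interval("chr1", 100, 200), Interval("chr1", 150, 300), Interval("chr2", 0, 50)]
peaks = [Interval("chr1", 250, 400), Interval("chr2", 60, 80)]

print(intersect(genes, peaks))
print(merge(genes + peaks))
```

Because both inputs share the same frame of reference (chromosome name plus start and stop positions), intervals from different sources can be compared directly, which is what makes this kind of integration possible. The nested loop is quadratic and kept only for readability.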
Galaxy Objects: Datasets, Workflows, Histories, and Pages
Galaxy objects are anything that can be saved, persisted, and shared in Galaxy:
Datasets
Datasets include any input, intermediate, or output dataset used or produced in an analysis. Galaxy's data integration platform supports file uploads from the user's computer, by URL, and directly from many online external resources (such as the UCSC Genome Browser, BioMart and InterMine). Galaxy supports a range of widely used biological data formats, translation between those formats, and data conversions (see Tools).
Workflows
Workflows are computational analyses that specify all the steps (and parameters) in the analysis, but none of the data. Workflows are used to run the same analysis against multiple sets of input data.
Galaxy is a scientific workflow system. These systems provide a means to build multi-step computational analyses akin to a recipe. They typically provide a graphical user interface[31] for specifying what data to operate on, what steps to take, and what order to do them in.
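The separation between a workflow (steps and parameters) and the data it runs on can be sketched as follows. This is a minimal model under stated assumptions, not Galaxy's workflow format or API: the `Step` and `Workflow` classes and the toy tools `filter_min_length` and `sort_lines` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Step:
    name: str
    func: Callable
    params: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Workflow:
    """An ordered recipe of steps and parameters; it holds no data itself."""
    steps: tuple

    def run(self, dataset):
        # Feed the output of each step into the next one.
        for step in self.steps:
            dataset = step.func(dataset, **step.params)
        return dataset


# Two toy "tools" (hypothetical, for illustration only).
def filter_min_length(lines, min_length):
    return [line for line in lines if len(line) >= min_length]


def sort_lines(lines, reverse=False):
    return sorted(lines, reverse=reverse)


workflow = Workflow(steps=(
    Step("filter", filter_min_length, {"min_length": 4}),
    Step("sort", sort_lines, {"reverse": False}),
))

# The same workflow is applied unchanged to two different input datasets.
sample_a = ["TTGACA", "ACG", "GGCATT"]
sample_b = ["CCAT", "AT", "AAGT"]
results = [workflow.run(sample) for sample in (sample_a, sample_b)]
print(results)
```

Keeping the data out of the workflow definition is what lets one recipe be reused across any number of input sets.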
Histories
Histories are computational analyses (recipes) run with specified input datasets, computational steps and parameters. Histories include all intermediate and output datasets as well.
Pages
Pages enable the creation of a virtual paper that describes the how and why of the overall experiment. Tight integration of Pages with Histories, Workflows, and Datasets supports this goal, and Histories, Workflows, and Datasets can each include user-provided annotation.
Availability
Galaxy is available:
As a free public web server,[32] supported by the Galaxy Project.[33] This server includes many bioinformatics tools that are widely useful in many areas of genomics research. Users can create logins, and save histories, workflows, and datasets on the server. These saved items can also be shared with others.
On public web servers hosted by other organizations.[36] Several organizations with their own Galaxy installation have also opted to make those servers available to others.
Galaxy is an open source project and the community includes users, organizations that install their own instance, Galaxy developers, and bioinformatics tool developers. The Galaxy project has mailing lists,[40] a community hub,[41] and annual meetings.[42]
Nasr, Engy; Amato, Pierre; Bhardwaj, Anshu; Blankenberg, Daniel; Brites, Daniela; Cumbo, Fabio; Do, Katherine; Ferrari, Emanuele; Griffin, Timothy J. (2024-12-27). "microGalaxy: A gateway to tools, workflows, and training for reproducible and FAIR analysis of microbial data". bioRxiv: The Preprint Server for Biology. doi:10.1101/2024.12.23.629682. PMC 11703195. PMID 39764050.
Soiland-Reyes, S (2010-12-13). "Looping". The Taverna Knowledge Blog. knowledgeblog.org. Archived from the original on 30 December 2016. Retrieved 28 January 2015.
Ioannidis, J. P. A.; Allison, D. B.; Ball, C. A.; Coulibaly, I.; Cui, X.; Culhane, A. N. C.; Falchi, M.; Furlanello, C.; Game, L.; Jurman, G.; Mangion, J.; Mehta, T.; Nitzberg, M.; Page, G. P.; Petretto, E.; Van Noort, V. (2008). "Repeatability of published microarray gene expression analyses". Nature Genetics. 41 (2): 149–155. doi:10.1038/ng.295. PMID 19174838. S2CID 5153795.