The Carpentries

Founder: Greg Wilson
Executive Director: Kari L. Jordan
Website: carpentries.org
Formerly called: Software Carpentry Foundation

The Carpentries is a nonprofit organization that teaches software engineering and data science skills to researchers through instructional workshops. [1] [2] The Carpentries is made up of three program areas: Software Carpentry, Data Carpentry, and Library Carpentry.

The Carpentries workshops have been run internationally, including workshops at the Smithsonian Institution, [3] the Australian Research Data Commons, [4] CERN, [5] and in Antarctica. [6]

History

Software Carpentry workshops began in 1998 as week-long training courses taught by Brent Gorda and Greg Wilson at Los Alamos National Laboratory. [7] [8] [9] The Software Carpentry Foundation was formed in 2014 alongside the sibling foundation, Data Carpentry. [9] These organizations were merged in 2018 to form what is now known as The Carpentries. [2] In 2018, Library Carpentry became the third lesson program of The Carpentries. [1]

Workshops

Carpentries workshops are two-day workshops led by volunteer instructors who have been certified through the organization's training program. [10] [11] Content covered in a standard workshop includes using the command line and an introduction to a programming language such as R or Python. [1] [12] Workshops under the Data Carpentry program focus on specific subject domains, such as life sciences or social sciences. [10]

A Software Carpentry workshop is designed as an active, collaborative learning experience. Lessons are hands-on: learners practice by following along as instructors code live, while helpers assist students and help keep the class on pace. Training covers the core skills needed to be productive in a small research team. Tutorials alternate with practical exercises in which learners are encouraged to collaborate, and a shared collaborative document supports the learning process. [13] [14]

Lessons

Stable lessons

All lesson content under The Carpentries curriculum is licensed openly under Creative Commons licenses. [1] [11]

Before being adopted as an official Carpentries lesson, new lessons go through a series of stages designed to ensure they are sufficiently documented to be teachable by instructors outside of the initial author group.

The Carpentries shares its lessons openly, including The Carpentries Community Developed Lessons. There are three core topics: the Unix shell, version control with Git, and a programming language (Python or R). Curricula for these lessons are available in English and Spanish (select lessons only), as are Data Carpentry's lessons.

Community developed lessons

The Carpentries community has a collaborative and open process for developing lessons and sharing teaching materials. [citation needed] The Carpentries Incubator [15] contains lessons developed by community members. These lessons follow a life cycle with four stages: pre-alpha, alpha, beta, and stable. The life cycle begins with pre-alpha, where only the concept is offered, and passes through beta, where the lesson is taught in a workshop by instructors other than the authors, before a lesson is considered stable.

Pre-alpha is the draft stage that follows the initial lesson idea. The goal of the alpha stage is to collect and incorporate feedback from learners and co-instructors. The two lessons in the beta stage are Reproducible Computational Environments Using Containers [16] and Data Harvesting for Agriculture. [17]

The Carpentries Incubator has approximately 30 lessons available in the alpha stage, including From a Spreadsheet to a Database, [18] Python for Humanities, [19] and Metagenomics. [20] There is another main way for community members to share lesson material: The CarpentriesLab, [15] a repository for high-quality, peer-reviewed, short-format lessons that use the teaching approach and lesson design of The Carpentries. It is also possible to get peer review on the content of a lesson by submitting it to The Incubator through Carpentries. [21]

The lessons from both Carpentries Incubator and CarpentriesLab can be taught in meetups, classes or as complements to a standard two-day Carpentries workshop.

Other language lessons

The Carpentries community has developed Spanish versions of its core lessons: the Unix shell, version control with Git, and R as a programming language.

Funding

The Carpentries is fiscally sponsored by Community Initiatives [22] and funded through a combination of memberships, workshop fees, grants and donations. The Carpentries has over 70 member organizations, [23] including the Software Sustainability Institute, [24] the National Institute of Standards and Technology, [25] New Zealand eScience Infrastructure, [26] and Compute Canada. [27]

Library Carpentry was seed-funded in 2015 by a small grant from the Software Sustainability Institute. [28] In November 2017, the Library Carpentry program received a supplemental Institute of Museum and Library Services grant, in partnership with the California Digital Library, valued at $249,553. [29] [30]

In November 2019, the Chan Zuckerberg Initiative and the Gordon and Betty Moore Foundation announced a joint award of $2.65 million for The Carpentries. [31]

References

  1. Pugachev, Sarah (2019). "What Are "The Carpentries" and What Are They Doing in the Library?". Portal: Libraries and the Academy. 19 (2): 209–214. doi:10.1353/pla.2019.0011. ISSN 1530-7131. S2CID 146034351.
  2. Atwood, Thea P; Creamer, Andrew T.; Dull, Joshua; Goldman, Julie; Lee, Kristin; Leligdon, Lora C.; Oelker, Sarah K (2019). "Joining Together to Build More: The New England Software Carpentry Library Consortium". Journal of eScience Librarianship. 8 (1): e1161. doi:10.7191/jeslib.2019.1161.
  3. "Carpentries, Genomics, and Data Science training at the Smithsonian | Smithsonian Data Science Lab". datascience.si.edu. Retrieved 2019-11-10.
  4. "Supporting The Carpentries". ARDC. Retrieved 2019-11-10.
  5. "Software Carpentry at CERN (27-29 November 2019): Overview · Indico". Indico. Retrieved 2019-11-10.
  6. Perkel, Jeffrey M. (2018). "Software training in Antarctica". Nature. 560 (7719): 515. Bibcode:2018Natur.560..515P. doi:10.1038/d41586-018-06011-1. PMID 30127483. S2CID 52048713.
  7. Markel, Scott; Devenyi, Gabriel A.; Emonet, Rémi; Harris, Rayna M.; Hertweck, Kate L.; Irving, Damien; Milligan, Ian; Wilson, Greg (2018). "Ten simple rules for collaborative lesson development". PLOS Computational Biology. 14 (3): e1005963. arXiv:1707.02662. Bibcode:2018PLSCB..14E5963D. doi:10.1371/journal.pcbi.1005963. ISSN 1553-7358. PMC 5832188. PMID 29494585.
  8. Wilson, Gregory (2021). "The Third Bit". third-bit.com. “Start where you are, use what you have, help who you can”
  9. Wilson, Greg (2016). "Software Carpentry: lessons learned". F1000Research. 3: 62. doi:10.12688/f1000research.3-62.v2. ISSN 2046-1402. PMC 3976103. PMID 24715981.
  10. Pawlik, Aleksandra; van Gelder, Celia W.G.; Nenadic, Aleksandra; Palagi, Patricia M.; Korpelainen, Eija; Lijnzaad, Philip; Marek, Diana; Sansone, Susanna-Assunta; Hancock, John; Goble, Carole (2017). "Developing a strategy for computational lab skills training through Software and Data Carpentry: Experiences from the ELIXIR Pilot action". F1000Research. 6: 1040. doi:10.12688/f1000research.11718.1. ISSN 2046-1402. PMC 5516217. PMID 28781745.
  11. Labou, Stephanie; Otsuji, Reid (2019). "Expanding Library Resources for Data and Compute-Intensive Education and Research". 2019 15th International Conference on eScience (EScience). San Diego, CA, USA: IEEE. pp. 646–647. doi:10.1109/eScience.2019.00100. ISBN 978-1-7281-2451-3. S2CID 214594737.
  12. National Academies Of Sciences, Engineering; Division of Behavioral Social Sciences Education; Board On Science, Education; Division on Engineering Physical Sciences; Committee on Applied Theoretical Statistics; Board on Mathematical Sciences Analytics; Computer Science Telecommunications Board; Committee on Envisioning the Data Science Discipline: The Undergraduate Perspective (2018). Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. p. 55. doi:10.17226/25104. ISBN 978-0-309-47559-4. PMID 30407778. S2CID 86392049.
  13. Weaver, Belinda (2020). The efficacy and usefulness of software carpentry training: a follow-up cohort study (PDF) (master). Retrieved 2021-01-01. [dead link]
  14. "Instructor Training". Retrieved 2021-07-02.
  15. "Community Developed Lessons". The Carpentries.
  16. "Reproducible Computational Environments Using Containers: Introduction to Docker". carpentries-incubator.github.io.
  17. "Data Harvesting for Agriculture". carpentries-incubator.github.io.
  18. "From a Spreadsheet to a Database". carpentries-incubator.github.io.
  19. "Python for Humanities". carpentries-incubator.github.io.
  20. "Data processing and visualization for metagenomics". carpentries-incubator.github.io.
  21. "GitHub Repository". github.com. 9 November 2021.
  22. "Fiscally Sponsored Projects". Community Initiatives. Retrieved 2019-11-10.
  23. "4TU.ResearchData | Expanding Researchers' software skills at Technical Universities across The Netherlands". researchdata.4tu.nl. Archived from the original on 2020-07-07. Retrieved 2020-07-07.
  24. "The Carpentries and our partnership | Software Sustainability Institute". software.ac.uk. Retrieved 2019-11-11.
  25. Greene, Gretchen (2019-07-02). "Software and Data Carpentry". NIST. Retrieved 2019-11-11.
  26. "NeSI partners with Software Carpentry to expand research computing training". New Zealand eScience Infrastructure. Retrieved 2020-07-07.
  27. "Training | Compute Canada". 3 April 2015. Retrieved 2019-11-11.
  28. "Library Carpentry in words and numbers: all code, no woodwork". cradledincaricature.com. Retrieved 2024-09-17.
  29. "Library Carpentry Receives Supplemental IMLS Grant – UC3 :: California Digital Library". Retrieved 2020-01-13.
  30. "RE-85-17-0121-17". Institute of Museum and Library Services. 2017-08-30. Retrieved 2020-01-13.
  31. "$2.65 million to expand computational research skills in science". Scienceboard.net. Retrieved 2019-11-11.