| Field | Value |
|---|---|
| Founder | Greg Wilson |
| Executive Director | Kari L. Jordan |
| Website | carpentries |
| Formerly called | Software Carpentry Foundation |
The Carpentries is a nonprofit organization that teaches software engineering and data science skills to researchers through instructional workshops. [1] [2] The Carpentries is made up of three lesson programs: Software Carpentry, Data Carpentry, and Library Carpentry.
Carpentries workshops have been run internationally, including at the Smithsonian Institution, [3] the Australian Research Data Commons, [4] CERN, [5] and in Antarctica. [6]
Software Carpentry workshops began in 1998 as week-long training courses taught by Brent Gorda and Greg Wilson at Los Alamos National Laboratory. [7] [8] [9] The Software Carpentry Foundation was formed in 2014 alongside a sibling organization, Data Carpentry. [9] The two organizations merged in 2018 to form what is now The Carpentries. [2] Also in 2018, Library Carpentry became the third lesson program of The Carpentries. [1]
Carpentries workshops are two-day events led by volunteer instructors who have been certified through the organization's training program. [10] [11] A standard workshop covers using the command line and introduces a programming language such as R or Python. [1] [12] Workshops in the Data Carpentry program focus on specific subject domains, such as the life sciences or the social sciences. [10]
A Software Carpentry workshop is designed to be active and collaborative. Lessons are hands-on: learners practise by following along as instructors live-code, while helpers assist learners so the class keeps its pace. The training covers the core skills needed to be productive in a small research team. Short tutorials alternate with practical exercises that learners are encouraged to work through together, and a shared collaborative document is built up over the course of the workshop. [13] [14]
All lesson content in The Carpentries curriculum is openly licensed under Creative Commons licenses. [1] [11]
Before being adopted as an official Carpentries lesson, new lessons go through a series of stages designed to ensure they are sufficiently documented to be teachable by instructors outside of the initial author group.
The Carpentries shares its community-developed lessons, which cover three core topics: the Unix shell, version control with Git, and a programming language (Python or R). Curricula for these lessons are available in English and, for select lessons, in Spanish, alongside Data Carpentry's lessons.
The Carpentries community has a collaborative and open process for developing lessons and sharing teaching materials. [citation needed] The Carpentries Incubator [15] contains lessons developed by community members. These lessons follow a life cycle of four stages: pre-alpha, alpha, beta, and stable. The cycle begins at pre-alpha, where only the concept is offered, and by the beta stage the lesson is taught in workshops by instructors other than its authors.
A pre-alpha lesson is an initial draft of the lesson idea. In the alpha stage, the goal is to collect and incorporate feedback from learners and co-instructors. The two lessons in the beta stage are Reproducible Computational Environments using Containers [16] and Data Harvesting for Agriculture. [17]
The Carpentries Incubator has approximately 30 lessons in the alpha stage, ranging from a spreadsheet-to-database lesson [18] to Python for Humanities [19] and Metagenomics. [20] Community members can also share lesson material through the CarpentriesLab, [15] a repository of high-quality, peer-reviewed, short-format lessons that use The Carpentries' teaching approach and lesson design. It is also possible to have a lesson's content peer-reviewed by submitting it to The Carpentries Incubator. [21]
Lessons from both the Carpentries Incubator and the CarpentriesLab can be taught in meetups or classes, or as complements to a standard two-day Carpentries workshop.
The Carpentries community has developed Spanish versions of its core lessons: the Unix shell, version control with Git, and R as a programming language.
The Carpentries is fiscally sponsored by Community Initiatives [22] and funded through a combination of memberships, workshop fees, grants and donations. The Carpentries has over 70 member organizations, [23] including the Software Sustainability Institute, [24] the National Institute of Standards and Technology, [25] New Zealand eScience Infrastructure, [26] and Compute Canada. [27]
Library Carpentry received seed funding in 2015 through a small grant from the Software Sustainability Institute. [28] In November 2017, the Library Carpentry program received a supplemental Institute of Museum and Library Services grant of $249,553, in partnership with the California Digital Library. [29] [30]
In November 2019, the Chan Zuckerberg Initiative and the Gordon and Betty Moore Foundation announced a joint award of $2.65 million for The Carpentries. [31]
CalculiX is a free and open-source finite-element analysis application that uses an input format similar to Abaqus. It has an implicit and explicit solver (CCX) written by Guido Dhondt and a pre- and post-processor (CGX) written by Klaus Wittig. The original software was written for the Linux operating system. Convergent Mechanical has ported the application to the Windows operating system.
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The predecessor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors. NumPy is a NumFOCUS fiscally sponsored project.
Wolfram Research, Inc. is an American multinational company that creates computational technology. Wolfram's flagship product is the technical computing program Wolfram Mathematica, first released on June 23, 1988. Other products include WolframAlpha, Wolfram SystemModeler, Wolfram Workbench, gridMathematica, Wolfram Finance Platform, webMathematica, the Wolfram Cloud, and the Wolfram Programming Lab. Wolfram Research founder Stephen Wolfram is the CEO. The company is headquartered in Champaign, Illinois, United States.
The actor model in computer science is a mathematical model of concurrent computation that treats an actor as the basic building block of concurrent computation. In response to a message it receives, an actor can: make local decisions, create more actors, send more messages, and determine how to respond to the next message received. Actors may modify their own private state, but can only affect each other indirectly through messaging.
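A minimal Python sketch of these ideas follows. It assumes a single-threaded scheduler in place of real concurrency. Each actor keeps private state and reacts to one message at a time, and the only way to learn another actor's state is to ask for it by message. The names `ActorSystem`, `Actor`, `Counter`, and `Collector` are hypothetical and do not come from any actor library.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Tuple


class Actor:
    """Base class: an actor handles one message at a time in receive()."""

    system: "ActorSystem"  # assigned by ActorSystem.spawn

    def receive(self, message: Any) -> None:
        raise NotImplementedError


class ActorSystem:
    """Single-threaded scheduler that delivers queued messages in FIFO order."""

    def __init__(self) -> None:
        self._actors: Dict[str, Actor] = {}
        self._queue: Deque[Tuple[str, Any]] = deque()

    def spawn(self, name: str, actor: Actor) -> str:
        actor.system = self
        self._actors[name] = actor
        return name

    def send(self, recipient: str, message: Any) -> None:
        # Sending only enqueues; the recipient processes the message later.
        self._queue.append((recipient, message))

    def run(self) -> None:
        # Deliver messages until none remain.
        while self._queue:
            recipient, message = self._queue.popleft()
            self._actors[recipient].receive(message)


class Counter(Actor):
    def __init__(self) -> None:
        self._count = 0  # private state, never touched by other actors

    def receive(self, message: Any) -> None:
        kind, payload = message
        if kind == "inc":
            self._count += payload
        elif kind == "get":
            # Reply with a message instead of exposing the state directly.
            self.system.send(payload, ("value", self._count))


class Collector(Actor):
    def __init__(self) -> None:
        self.values: List[int] = []

    def receive(self, message: Any) -> None:
        kind, value = message
        if kind == "value":
            self.values.append(value)


system = ActorSystem()
collector = Collector()
system.spawn("counter", Counter())
system.spawn("collector", collector)
system.send("counter", ("inc", 2))
system.send("counter", ("inc", 3))
system.send("counter", ("get", "collector"))
system.run()
print(collector.values)  # [5] (read directly here only for demonstration)
```

Delivering every message from one FIFO queue keeps the example deterministic. A real actor runtime would typically give each actor its own mailbox and run actors concurrently.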
Quantum programming is the process of designing or assembling sequences of instructions, called quantum circuits, using gates, switches, and operators to manipulate a quantum system for a desired outcome or results of a given experiment. Quantum circuit algorithms can be implemented on integrated circuits, conducted with instrumentation, or written in a programming language for use with a quantum computer or a quantum processor.
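To make the circuit idea concrete, the sketch below simulates a two-gate circuit in plain Python by updating a state vector. No quantum SDK is involved. It assumes that qubit k corresponds to bit k of the basis-state index. The helper names are hypothetical.

```python
import math
from typing import List

State = List[complex]
Gate = List[List[complex]]

# Hadamard gate: H|0> = (|0> + |1>)/sqrt(2).
H: Gate = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
           [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def zero_state(n_qubits: int) -> State:
    """Return |00...0> as a list of 2**n amplitudes."""
    state = [0j] * (2 ** n_qubits)
    state[0] = 1 + 0j
    return state


def apply_gate(state: State, gate: Gate, target: int) -> State:
    """Apply a 2x2 gate (matrix indexed [out][in]) to qubit `target`."""
    new_state = [0j] * len(state)
    for index, amp in enumerate(state):
        in_bit = (index >> target) & 1
        base = index & ~(1 << target)  # index with the target bit cleared
        for out_bit in (0, 1):
            new_state[base | (out_bit << target)] += gate[out_bit][in_bit] * amp
    return new_state


def apply_cnot(state: State, control: int, target: int) -> State:
    """Flip the target qubit in every basis state whose control qubit is 1."""
    new_state = [0j] * len(state)
    for index, amp in enumerate(state):
        out_index = index ^ (1 << target) if (index >> control) & 1 else index
        new_state[out_index] += amp
    return new_state


# Circuit: H on qubit 0, then CNOT(0 -> 1), giving the Bell state (|00> + |11>)/sqrt(2).
state = apply_cnot(apply_gate(zero_state(2), H, target=0), control=0, target=1)
probabilities = [abs(amp) ** 2 for amp in state]
print([round(p, 3) for p in probabilities])  # [0.5, 0.0, 0.0, 0.5]
```

A full state vector grows as 2**n amplitudes, so this direct approach only suits a handful of qubits.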
PsychoPy is an open source software package written in the Python programming language primarily for use in neuroscience and experimental psychology research. Developed initially as a Python library and then as an application with a graphical interface, it now also supports JavaScript outputs to run studies online and on mobile devices. Unlike most packages, it provides users with a choice of interface: they can generate experiments by writing Python scripts, use a graphical interface that will generate a script for them, or combine both methods. Its platform independence is achieved through use of the wxPython widget library for the application and OpenGL for graphics calls. It is also capable of generating and delivering auditory stimuli.
Haskell is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell has pioneered a number of programming language features such as type classes, which enable type-safe operator overloading, and monadic input/output (IO). It is named after logician Haskell Curry. Haskell's main implementation is the Glasgow Haskell Compiler (GHC).
Madagascar is a software package for multidimensional data analysis and reproducible computational experiments.
Stan is a probabilistic programming language for statistical inference written in C++. The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function.
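The sketch below writes out, in plain Python rather than Stan, the kind of log-density calculation that a Stan model block describes for a normal model with an unknown mean and scale. The model, the prior on `mu`, and the function names are illustrative assumptions. The code keeps the normalizing constants that Stan's `~` statements drop, and it ignores the adjustment Stan applies when it transforms the constrained `sigma` internally. Its values are therefore not identical to Stan's internal `target`.

```python
import math
from typing import Sequence

# The (illustrative) Stan program this mirrors:
#
#   data { int<lower=0> N; vector[N] y; }
#   parameters { real mu; real<lower=0> sigma; }
#   model { mu ~ normal(0, 10); y ~ normal(mu, sigma); }


def normal_lpdf(x: float, mu: float, sigma: float) -> float:
    """Log of the normal density N(x | mu, sigma)."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


def log_density(mu: float, sigma: float, y: Sequence[float]) -> float:
    """Accumulate the log density imperatively, in the spirit of `target +=`."""
    if sigma <= 0:
        return -math.inf  # outside the support declared by <lower=0>
    target = 0.0
    target += normal_lpdf(mu, 0.0, 10.0)  # prior on mu
    for y_n in y:
        target += normal_lpdf(y_n, mu, sigma)  # likelihood of each observation
    return target


data = [0.8, 1.2, 1.1]
print(log_density(1.0, 1.0, data) > log_density(3.0, 1.0, data))  # True
```

A sampler such as Stan's only needs this function (and its gradient) up to an additive constant, which is why the model can be written as an ordinary imperative program.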
Eclipse Deeplearning4j is a programming library written in Java for the Java virtual machine (JVM). It is a framework with wide support for deep learning algorithms. Deeplearning4j includes implementations of the restricted Boltzmann machine, deep belief net, deep autoencoder, stacked denoising autoencoder and recursive neural tensor network, word2vec, doc2vec, and GloVe. These algorithms all include distributed parallel versions that integrate with Apache Hadoop and Spark.
Feature engineering is a preprocessing step in supervised machine learning and statistical modeling which transforms raw data into a more effective set of inputs. Each input comprises several attributes, known as features. By providing models with relevant information, feature engineering significantly enhances their predictive accuracy and decision-making capability.
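As a small illustration, the function below derives model inputs from a raw purchase record. It produces a time-of-day value, a weekend flag, a ratio of two raw fields, and a log-scaled amount. The record fields (`timestamp`, `amount`, `items`) and the chosen features are hypothetical examples, not a standard recipe.

```python
import math
from datetime import datetime
from typing import Dict, Union

RawRecord = Dict[str, Union[str, float, int]]


def engineer_features(record: RawRecord) -> Dict[str, float]:
    """Turn one raw purchase record into numeric model inputs."""
    timestamp = datetime.fromisoformat(str(record["timestamp"]))
    amount = float(record["amount"])
    items = int(record["items"])
    return {
        "hour": float(timestamp.hour),                            # time of day
        "is_weekend": 1.0 if timestamp.weekday() >= 5 else 0.0,   # Saturday=5, Sunday=6
        "amount_per_item": amount / items if items > 0 else 0.0,  # ratio of two raw fields
        "log_amount": math.log1p(amount),                         # compress a skewed scale
    }


raw = {"timestamp": "2024-03-16T09:05:00", "amount": 120.0, "items": 3}
features = engineer_features(raw)
# hour is 9.0, is_weekend is 1.0 (2024-03-16 is a Saturday), amount_per_item is 40.0
print(features)
```

Each output value is something a model could not easily compute from the raw string timestamp or from unscaled totals.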
A notebook interface or computational notebook is a virtual notebook environment used for literate programming, a method of writing computer programs. Some notebooks are WYSIWYG environments including executable calculations embedded in formatted documents; others separate calculations and text into separate sections. Notebooks share some goals and features with spreadsheets and word processors but go beyond their limited data models.
spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython. The library is published under the MIT license and its main developers are Matthew Honnibal and Ines Montani, the founders of the software company Explosion.
Cisco DevNet is Cisco's developer program to help developers and IT professionals who want to write applications and develop integrations with Cisco products, platforms, and APIs. Cisco DevNet includes Cisco's products in software-defined networking, security, cloud, data center, internet of things, collaboration, and open-source software development. The developer.cisco.com site also provides learning and sandbox environments as well as a video series for those trying to learn coding and testing apps.
ML.NET is a free software machine learning library for the C# and F# programming languages. It also supports Python models when used together with NimbusML. The preview release of ML.NET included transforms for feature engineering like n-gram creation, and learners to handle binary classification, multi-class classification, and regression tasks. Additional ML tasks like anomaly detection and recommendation systems have since been added, and other approaches like deep learning will be included in future versions.
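ML.NET itself is used from C# or F#. The following is only a Python illustration of what an n-gram feature transform produces and does not reflect ML.NET's API. Lowercasing and whitespace tokenization are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def word_ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """Return the contiguous n-word sequences in `text` (whitespace tokenization)."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text: str, max_n: int = 2) -> Counter:
    """Bag-of-n-grams features: counts of every 1..max_n gram, keyed by joined text."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(" ".join(gram) for gram in word_ngrams(text, n))
    return counts


print(word_ngrams("the cat sat", 2))               # [('the', 'cat'), ('cat', 'sat')]
print(ngram_counts("The cat the cat")["the cat"])  # 2
```

Counting bigrams as well as single words lets a linear classifier pick up short phrases whose meaning differs from their individual words.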
Tracy Teal is an American bioinformatician and the executive director of Data Carpentry. She is known for her work in open science and biomedical data science education.
Qiskit is an open-source software development kit (SDK) for working with quantum computers at the level of circuits, pulses, and algorithms. It provides tools for creating and manipulating quantum programs and running them on prototype quantum devices on IBM Quantum Platform or on simulators on a local computer. It follows the circuit model for universal quantum computation, and can be used for any quantum hardware that follows this model.
The tidyverse is a collection of open source packages for the R programming language introduced by Hadley Wickham and his team that "share an underlying design philosophy, grammar, and data structures" of tidy data. Characteristic features of tidyverse packages include extensive use of non-standard evaluation and encouraging piping.
Lean is a proof assistant and a functional programming language. It is based on the calculus of constructions with inductive types. It is an open-source project hosted on GitHub. It was developed primarily by Leonardo de Moura while employed by Microsoft Research and now Amazon Web Services, and has had significant contributions from other coauthors and collaborators during its history. Development is currently supported by the non-profit Lean Focused Research Organization (FRO).
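Here is a small illustration of what Lean proofs look like, in Lean 4 syntax using only the core library. The theorem names are arbitrary. The first statement holds by unfolding the definition of addition, the second reuses the core lemma `Nat.add_comm`, and the third is checked by evaluation.

```lean
-- `n + 0` reduces to `n` by the definition of addition on Nat, so `rfl` suffices.
theorem add_zero_example (n : Nat) : n + 0 = n := rfl

-- Reuse a lemma from Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- Concrete arithmetic is checked by evaluation.
example : 2 + 2 = 4 := rfl
```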
Karthik Ram is a research scientist at the Berkeley Institute for Data Science and member of the Initiative for Global Change Biology at the University of California, Berkeley. He is best known for being the co-founder of rOpenSci. Ram's work focuses on global change, data science, and open research software.