Semmle

Last updated

Semmle
Semmle logo.png
Type of business Subsidiary
FoundedDecember 2006 (2006-12) in Oxford, England
HeadquartersSan Francisco, California, U.S.
Founder(s) Oege de Moor
Key peopleOege de Moor, Pavel Avgustinov, Julian Tibble
IndustrySoftware analysis
Products Code analysis software and services
Parent GitHub [1] (2019–present)
URL semmle.com [ dead link ]

Semmle Inc is a code-analysis platform; Semmle was acquired by GitHub (itself owned by Microsoft) on 18 September 2019 for an undisclosed amount. [2] Semmle's LGTM technology automates code review, tracks developer contributions, and flags software security issues. [2] The LGTM platform leverages the CodeQL query engine (formerly QL) [3] to perform semantic analysis on software code bases. GitHub aims to integrate Semmle technology to provide continuous vulnerability detection services. [4] In November 2019, use of CodeQL was made free for research and open source. [5] CodeQL either shares a direct pedigree with .QL (dot-que-ell), which derives from the Datalog family tree, or is an evolution of similar technology.[ clarification needed ]

Contents

SemmleCode is an object-oriented query language for deductive databases developed by Semmle. It is distinguished within this class by its support for recursive query.

Corporate background

The company was headquartered in San Francisco, with its development operations based in Blue Boar Court, Alfred Street, central Oxford, England. Semmle's customers included Credit Suisse, NASA, and Dell. [6]

SemmleCode background

Academic

SemmleCode builds on academic research on querying the source of software programs. The first such system was Linton's Omega system, [7] where queries were phrased in QUEL. QUEL did not allow for recursion in queries, making it difficult to inspect hierarchical program structures such as the call graph. The next significant development was therefore the use of logic programming, which does allow such recursive queries, in the XL C++ Browser. [8] The disadvantage of using a full logic programming language is however that it is very difficult to attain acceptable efficiency. The CodeQuest system, [9] developed at the University of Oxford, was the first to exploit the observation that Datalog, a very restrictive version of logic programming, is in the sweet spot between expressive power and efficiency. The QL query language is an object-oriented version of Datalog.

Industrial

The early research works on querying the source of software programs spun off a number of industrial applications. In particular it became the cornerstone of systems for application intelligence (data mining on the source of software systems) and software renovation. In 2007, Paris-based CAST [10] is one of the market leaders in that area, and other significant players include BluePhoenix in Herzliya, Israel. SemmleCode differs from these systems in its use of an object-oriented query language, which allows programmers to easily formulate new queries that are particular to their own project.

A full account of the academic and industrial developments leading up to the creation of SemmleCode can be found in a paper by Hajiyev et al. [11]

Sample query in QL

To illustrate the use of QL, consider the well-known rule in object-oriented programming that public fields should be declared final. To find violations of that rule, we should search for fields that are public but not final. In QL, that requirement is expressed as follows:

fromFieldfwheref.hasModifier("public")andnot(f.hasModifier("final"))selectf.getDeclaringType().getPackage(),f.getDeclaringType(),f

Here not only is the offending field f selected, but also the package and type in which its declaration occurs.

SemmleCode integration with development environments

SemmleCode provides a user interface via the Eclipse IDE to query Java code (both source code and bytecode) as well as XML files, and to edit QL queries. This is however but one application of the technology that underlies it: QL can be used to query any other type of complex data.

As part of the fold into the Microsoft/GitHub corporate house, the original Eclipse-based workflow has been supplanted with a workflow based around Microsoft's Visual Studio Code. [3]

See also

Related Research Articles

Structured Query Language (SQL) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

OCaml is a general-purpose, high-level, multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others.

Programming languages can be grouped by the number and types of paradigms supported.

<span class="mw-page-title-main">F Sharp (programming language)</span> Microsoft programming language

F# is a general-purpose, strongly typed, multi-paradigm programming language that encompasses functional, imperative, and object-oriented programming methods. It is most often used as a cross-platform Common Language Infrastructure (CLI) language on .NET, but can also generate JavaScript and graphics processing unit (GPU) code.

A query language, also known as data query language or database query language (DQL), is a computer language used to make queries in databases and information systems. In database systems, query languages rely on strict theory to retrieve information. A well known example is the Structured Query Language (SQL).

Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties from Prolog. It is often used as a query language for deductive databases. Datalog has been applied to problems in data integration, networking, program analysis, and more.

<span class="mw-page-title-main">Git</span> Software for version control of files

Git is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers who are collaboratively developing source code during software development. Its goals include speed, data integrity, and support for distributed, non-linear workflows.

.QL is an object-oriented query language used to retrieve data from relational database management systems. It is reminiscent of the standard query language SQL and the object-oriented programming language Java. .QL is an object-oriented variant of a logical query language called Datalog. Hierarchical data can therefore be naturally queried in .QL in a recursive manner.

Azure DevOps Server is a Microsoft product that provides version control, reporting, requirements management, project management, automated builds, testing and release management capabilities. It covers the entire application lifecycle and enables DevOps capabilities. Azure DevOps can be used as a back-end to numerous integrated development environments (IDEs) but is tailored for Microsoft Visual Studio and Eclipse on all platforms.

Domain-driven design (DDD) is a major software design approach, focusing on modeling software to match a domain according to input from that domain's experts.

T2 Temporal Prover is an automated program analyzer developed in the Terminator research project at Microsoft Research.

<span class="mw-page-title-main">.NET Framework</span> Software platform developed by Microsoft

The .NET Framework is a proprietary software framework developed by Microsoft that runs primarily on Microsoft Windows. It was the predominant implementation of the Common Language Infrastructure (CLI) until being superseded by the cross-platform .NET project. It includes a large class library called Framework Class Library (FCL) and provides language interoperability across several programming languages. Programs written for .NET Framework execute in a software environment named the Common Language Runtime (CLR). The CLR is an application virtual machine that provides services such as security, memory management, and exception handling. As such, computer code written using .NET Framework is called "managed code". FCL and CLR together constitute the .NET Framework.

Perforce Software, Inc. is an American developer of software used for developing and running applications, including version control software, web-based repository management, developer collaboration, application lifecycle management, web application servers, debugging tools and agile planning software.

<span class="mw-page-title-main">Dafny</span> Programming language

Dafny is an imperative and functional compiled language that compiles to other programming languages, such as C#, Java, JavaScript, Go and Python. It supports formal specification through preconditions, postconditions, loop invariants, loop variants, termination specifications and read/write framing specifications. The language combines ideas from the functional and imperative paradigms; it includes support for object-oriented programming. Features include generic classes, dynamic allocation, inductive datatypes and a variation of separation logic known as implicit dynamic frames for reasoning about side effects. Dafny was created by Rustan Leino at Microsoft Research after his previous work on developing ESC/Modula-3, ESC/Java, and Spec#.

<span class="mw-page-title-main">SimpleITK</span> Open-source interface for the ITK framework

SimpleITK is a simplified, open-source interface to the Insight Segmentation and Registration Toolkit (ITK). The SimpleITK image analysis library is available in multiple programming languages including C++, Python, R, Java, C#, Lua, Ruby and Tcl. Binary distributions are available for all three major operating systems.

Microsoft, a technology company historically known for its opposition to the open source software paradigm, turned to embrace the approach in the 2010s. From the 1970s through 2000s under CEOs Bill Gates and Steve Ballmer, Microsoft viewed the community creation and sharing of communal code, later to be known as free and open source software, as a threat to its business, and both executives spoke negatively against it. In the 2010s, as the industry turned towards cloud, embedded, and mobile computing—technologies powered by open source advances—CEO Satya Nadella led Microsoft towards open source adoption although Microsoft's traditional Windows business continued to grow throughout this period generating revenues of 26.8 billion in the third quarter of 2018, while Microsoft's Azure cloud revenues nearly doubled.

Z3, also known as the Z3 Theorem Prover, is a satisfiability modulo theories (SMT) solver developed by Microsoft.

TerminusDB is an open source knowledge graph and document store. It is used to build versioned data products. It is a native revision control database that is architecturally similar to Git. It is listed on DB-Engines.

Flix is a functional, imperative, and logic programming language developed at Aarhus University, with funding from the Independent Research Fund Denmark, and by a community of open source contributors. The Flix language supports algebraic data types, pattern matching, parametric polymorphism, currying, higher-order functions, extensible records, channel and process-based concurrency, and tail call elimination. Two notable features of Flix are its type and effect system and its support for first-class Datalog constraints.

The Vadalog system is a Knowledge Graph Management System (KGMS) that offers a language for performing complex logic reasoning tasks over knowledge graphs. At the same time, Vadalog delivers a platform to support the entire spectrum of data science tasks: data integration, pre-processing, statistical analysis, machine learning, algorithmic modeling, probabilistic reasoning and temporal reasoning. Its language is based on an extension of the rule-based language Datalog, Warded Datalog±, a high-performance language using an aggressive termination control strategy. Vadalog can support the entire spectrum of data science activities and tools. The system can read from and connect to multiple sources, from relational databases, such as PostgreSQL and MySQL, to graph databases, such as Neo4j, as well as make use of machine learning tools, and a web data extraction tool, OXPath. Additional Python libraries and extensions can also be easily integrated into the system.

References

  1. "GitHub acquires Semmle to help developers spot code exploits". venturebeat.com. Retrieved 20 September 2019.
  2. 1 2 Lardinois, Frederic (18 September 2019). "GitHub acquires code analysis tool Semmle". techcrunch.com. TechCrunch . Retrieved 13 March 2021.
  3. 1 2 "Introducing CodeQL". semmle.com. Semmle. September 2019. Retrieved 13 March 2021. the 'QL' product and tooling has been renamed to CodeQL ... what was previously called a 'QL snapshot' is now a CodeQL database.
  4. De Simone, Sergio (19 September 2019). "GitHub to Integrate Semmle Code Analysis for Continuous Vulnerability Detection". infoq. InfoQ . Retrieved 13 March 2021.
  5. Krill, Paul (15 November 2019). "GitHub makes CodeQL free for research and open source". infoworld.com. InfoWorld . Retrieved 13 March 2021.
  6. "Spin-out company Semmle secures $8M from Accel Partners" (Press release). University of Oxford. 16 September 2014. Retrieved 18 September 2015.
  7. "Linton's Omega system". USA: University of California, Berkeley. 1983.
  8. Shahram Javey, Kin’ichi Mitsui, Hiroaki Nakamura, Tsuyoshi Ohira, Kazu Yasuda, Kazushi Kuse, Tsutomu Kamimura, and Richard Helm. Architecture of the XL C++ browser. In CASCON ’92: Proceedings of the 1992 conference of the Centre for Advanced Studies on Collaborative research, pages 369–379. IBM Press, 1992.
  9. "CodeQuest system". UK: Oxford University Computing Laboratory. Archived from the original on 9 October 2006.
  10. "CAST Software".
  11. Elnar Hajiyev, Mathieu Verbaere, and Oege de Moor, CodeQuest: Scalable Source Code Queries with Datalog. In ECOOP 2006: Proceedings of the 2006 European Conference on Object-Oriented Programming, pages 2–27. Springer, 2006.

Further reading