ChatScript

Last updated

ChatScript is a combination Natural Language engine and dialog management system designed initially for creating chatbots, but is currently also used for various forms of NL processing. It is written in C++. The engine is an open source project at SourceForge. [1] and GitHub. [2]

Contents

ChatScript was written by Bruce Wilcox and originally released in 2011, after Suzette (written in ChatScript) won the 2010 Loebner Prize, fooling one of four human judges. [3]

Features

In general ChatScript aims to author extremely concisely, since the limiting scalability of hand-authored chatbots is how much/fast one can write the script.

Because ChatScript is designed for interactive conversation, it automatically maintains user state across volleys. A volley is any number of sentences the user inputs at once and the chatbots response.

The basic element of scripting is the rule. A rule consists of a type, a label (optional), a pattern, and an output. There are three types of rules. Gambits are something a chatbot might say when it has control of the conversation. Rejoinders are rules that respond to a user remark tied to what the chatbot just said. Responders are rules that respond to arbitrary user input which is not necessarily tied to what the chatbot just said. Patterns describe conditions under which a rule may fire. Patterns range from extremely simplistic to deeply complex (analogous to Regex but aimed for NL). Heavy use is typically made of concept sets, which are lists of words sharing a meaning. ChatScript contains some 2000 predefined concepts and scripters can easily write their own. Output of a rule intermixes literal words to be sent to the user along with common C-style programming code.

Rules are bundled into collections called topics. Topics can have keywords, which allows the engine to automatically search the topic for relevant rules based on user input.

Example code

Topic: ~food(  ~fruit  fruit food eat)  t: What is your favorite food?     a: (~fruit) I like fruit also.     a: (~metal) I prefer listening to heavy metal music rather than eating it.  ?:  WHATMUSIC ( << what music you ~like >>) I prefer rock music. s: ( I * ~like * _~music_types)    ^if (_0 == country) {I don't like country.} else {So do I.} 

Words starting with ~ are concept sets. For example, ~fruit is the list of all known fruits. The simple pattern (~fruit) reacts if any fruit is mentioned immediately after the chatbot asks for favorite food. The slightly more complex pattern for the rule labelled WHATMUSIC requires all the words what, music, you and any word or phrase meaning to like, but they may occur in any order. Responders come in three types.  ?: rules react to user questions. s: rules react to user statements. u: rules react to either.

ChatScript code supports standard if-else, loops, user-defined functions and calls, and variable assignment and access.

Data

Some data in ChatScript is transient, meaning it will disappear at the end of the current volley. Other data is permanent, lasting forever until explicitly killed off. Data can be local to a single user or shared across all users at the bot level.

Internally all data is represented as text and is automatically converted to a numeric form as needed.

Variables

User variables come in several kinds. Variables purely local to a topic or function are transient. Global variables can be declared as transient or permanent. A variable is generally declared merely by using it, and its type depends on its prefix ($, $$, $_).

$_local  = 1   is a local transient variable being assigned a 1 $$global1.value = “hi” is a transient global variable which is a JSON object $global2 += 20  is a permanent global variable 

Facts

In addition to variables, ChatScript supports facts – triples of data, which can also be transient or permanent. Functions can query for facts having particular values of some of the fields, making them act like an in-memory database. Fact retrieval is very quick and efficient the number of available in-memory facts is largely constrained to the available memory of the machine running the ChatScript engine. Facts can represent record structures and are how ChatScript represents JSON internally. Tables of information can be defined to generate appropriate facts.

table: ~inventors(^who ^what) createfact(^who invent ^what) DATA: "Johannes Gutenberg" "printing press" "Albert Einstein" ["Theory of Relativity" photon "Theory of General Relativity"] 

The above table links people to what they invented (1 per line) with Einstein getting a list of things he did.

External communication

ChatScript embeds the Curl library and can directly read and write facts in JSON to a website.

Server

A ChatScript engine can run in local or server mode.

Pos-tagging, parsing, and ontology

ChatScript comes with a copy of English WordNet embedded within, including its ontology, and creates and extends its own ontology via concept declarations. It has an English language pos-tagger and parser and supports integration with TreeTagger for pos-tagging a number of other languages (TreeTagger commercial license required).

Databases

In addition to an internal fact database, ChatScript supports PostgreSQL, MySQL, MSSQL and MongoDB both for access by scripts, but also as a central filesystem if desired so ChatScript can be scaled horizontally. A common use case is to use a centralized database to host the user files and multiple servers to scale the ChatScript engine.

JavaScript

ChatScript also embeds DukTape, ECMAScript E5/E5.1 compatibility, with some semantics updated from ES2015+.

Spelling Correction

ChatScript has built-in automatic spell checking, which can be augmented in script as both simple word replacements or context sensitive changes. With appropriate simple rules you can change perfect legal words into other words or delete them. E.g., if you have a concept of ~electronic_goods and don't want an input of Radio Shack (a store name) to be detected as an electronic good, you can get the input to change to Radio_Shack (a single word), or allow the words to remain but block the detection of the concept.

This is particularly useful when combined with speech-to-text code that is imperfect, but you are familiar with common failings of it and can compensate for them in script.

Control flow

A chatbot's control flow is managed by the control script. This is merely another ordinary topic of rules, that invokes API functions of the engine. Thus control is fully configurable by the scripter (and functions exist to allow introspection into the engine). There are pre-processing control flow and post-processing control flow options available, for special processing.

Related Research Articles

<span class="mw-page-title-main">AWK</span> Programming language

AWK is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. Like sed and grep, it is a filter, and is a standard feature of most Unix-like operating systems.

<span class="mw-page-title-main">JavaScript</span> High-level programming language

JavaScript, often abbreviated as JS, is a programming language and core technology of the Web, alongside HTML and CSS. 99% of websites use JavaScript on the client side for webpage behavior.

<span class="mw-page-title-main">ELIZA</span> Early natural language processing computer program

ELIZA is an early natural language processing computer program developed from 1964 to 1967 at MIT by Joseph Weizenbaum. Created to explore communication between humans and machines, ELIZA simulated conversation by using a pattern matching and substitution methodology that gave users an illusion of understanding on the part of the program, but had no representation that could be considered really understanding what was being said by either party. Whereas the ELIZA program itself was written (originally) in MAD-SLIP, the pattern matching directives that contained most of its language capability were provided in separate "scripts", represented in a lisp-like representation. The most famous script, DOCTOR, simulated a psychotherapist of the Rogerian school, and used rules, dictated in the script, to respond with non-directional questions to user inputs. As such, ELIZA was one of the first chatterbots and one of the first programs capable of attempting the Turing test.

Rebol is a cross-platform data exchange language and a multi-paradigm dynamic programming language designed by Carl Sassenrath for network communications and distributed computing. It introduces the concept of dialecting: small, optimized, domain-specific languages for code and data, which is also the most notable property of the language according to its designer Carl Sassenrath:

Although it can be used for programming, writing functions, and performing processes, its greatest strength is the ability to easily create domain-specific languages or dialects

sed Standard UNIX utility for editing streams of data

sed is a Unix utility that parses and transforms text, using a simple, compact programming language. It was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs, and is available today for most operating systems. sed was based on the scripting features of the interactive editor ed and the earlier qed. It was one of the earliest tools to support regular expressions, and remains in use for text processing, most notably with the substitution command. Popular alternative tools for plaintext string manipulation and "stream editing" include AWK and Perl.

<span class="mw-page-title-main">Serialization</span> Conversion process for computer data

In computing, serialization is the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of objects does not include any of their associated methods with which they were previously linked.

<span class="mw-page-title-main">Comparison of command shells</span>

A command shell is a command-line interface to interact with and manipulate a computer's operating system.

JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays. It is a commonly used data format with diverse uses in electronic data interchange, including that of web applications with servers.

A read–eval–print loop (REPL), also termed an interactive toplevel or language shell, is a simple interactive computer programming environment that takes single user inputs, executes them, and returns the result to the user; a program written in a REPL environment is executed piecewise. The term usually refers to programming interfaces similar to the classic Lisp machine interactive environment. Common examples include command-line shells and similar environments for programming languages, and the technique is very characteristic of scripting languages.

A dialog manager (DM) is a component of a dialog system (DS), responsible for the state and flow of the conversation. Usually:

A webform, web form or HTML form on a web page allows a user to enter data that is sent to a server for processing. Forms can resemble paper or database forms because web users fill out the forms using checkboxes, radio buttons, or text fields. For example, forms can be used to enter shipping or credit card data to order a product, or can be used to retrieve search results from a search engine.

<span class="mw-page-title-main">YUI Library</span>

The Yahoo! User Interface Library (YUI) is a discontinued open-source JavaScript library for building richly interactive web applications using techniques such as Ajax, DHTML, and DOM scripting. YUI includes several cores CSS resources. It is available under a BSD License. Development on YUI began in 2005 and Yahoo! properties such as My Yahoo! and the Yahoo! front page began using YUI in the summer of that year. YUI was released for public use in February 2006. It was actively developed by a core team of Yahoo! engineers.

A single-page application (SPA) is a web application or website that interacts with the user by dynamically rewriting the current web page with new data from the web server, instead of the default method of loading entire new pages. The goal is faster transitions that make the website feel more like a native app.

<span class="mw-page-title-main">Google Closure Tools</span> JavaScript developer toolkit

Google Closure Tools is a set of tools to help developers build rich web applications with JavaScript. It was developed by Google for use in their web applications such as Gmail, Google Docs and Google Maps. As of 2023, the project had over 230K LOCs not counting the embedded Mozilla Rhino compiler.

CoffeeScript is a programming language that compiles to JavaScript. It adds syntactic sugar inspired by Ruby, Python, and Haskell in an effort to enhance JavaScript's brevity and readability. Specific additional features include list comprehension and destructuring assignment.

Mustache is a web template system. It is described as a logic-less system because it lacks any explicit control flow statements, like if and else conditionals or for loops; however, both looping and conditional evaluation can be achieved using section tags processing lists and anonymous functions (lambdas). It is named "Mustache" because of heavy use of braces, { }, that resemble a sideways moustache. Mustache is used mainly for mobile and web applications.

<span class="mw-page-title-main">Ember.js</span> JavaScript framework

Ember.js is an open-source JavaScript web framework that utilizes a component-service pattern. It is designed to allow developers to create scalable single-page web applications by incorporating common idioms, best practices, and patterns from other single-page-app ecosystem patterns into the framework.

<span class="mw-page-title-main">React (JavaScript library)</span> JavaScript library for building user interfaces

React is a free and open-source front-end JavaScript library for building user interfaces based on components by Facebook Inc. It is maintained by Meta and a community of individual developers and companies.

<span class="mw-page-title-main">Nim (programming language)</span> Programming language

Nim is a general-purpose, multi-paradigm, statically typed, compiled high-level system programming language, designed and developed by a team around Andreas Rumpf. Nim is designed to be "efficient, expressive, and elegant", supporting metaprogramming, functional, message passing, procedural, and object-oriented programming styles by providing several features such as compile time code generation, algebraic data types, a foreign function interface (FFI) with C, C++, Objective-C, and JavaScript, and supporting compiling to those same languages as intermediate representations.

Shapes Constraint Language (SHACL) is a World Wide Web Consortium (W3C) standard language for describing Resource Description Framework (RDF) graphs. SHACL has been designed to enhance the semantic and technical interoperability layers of ontologies expressed as RDF graphs.

References