JSFuck

JSFuck is an esoteric subset of JavaScript in which code is written using only six characters: [, ], (, ), !, and +. The name is derived from Brainfuck, an esoteric programming language that also uses a minimalistic alphabet consisting only of punctuation. Unlike Brainfuck, which requires its own compiler or interpreter, JSFuck is valid JavaScript code, so JSFuck programs can be run in any web browser or engine that interprets JavaScript. JSFuck can recreate all JavaScript functionality with such a limited character set because JavaScript is weakly typed and implicitly converts values between types, allowing any expression to be evaluated as any type. [1]

History

In July 2009, Yosuke Hasegawa created a web application called jjencode which could encode arbitrary JavaScript into an obfuscated form using only the 18 symbols []()!+,\"$.:;_{}~=. [2] [3] In January 2010, an informal competition was held in the "Obfuscation" forum of the sla.ckers.org web application security site to reduce the required character set to fewer than the eight characters []()!+,/. Contributors to the thread managed to eliminate the need for the , and / characters. [4] As of March 2010, an online encoder called JS-NoAlnum was available which used only the final set of six characters. [5] By the end of 2010, Hasegawa had made available a new encoder named JSF*ck, which also used only the minimum six characters. [6] [7] In 2012, Martin Kleppe created a "jsfuck" project on GitHub, [8] and a JSFuck.com website with a web app using that implementation of the encoder. [9]

JSFuck can be used to bypass detection of malicious code submitted on websites, e.g. in cross-site scripting (XSS) attacks. [10] Another potential use of JSFuck lies in code obfuscation. An optimized version of JSFuck has been used to encode jQuery, a JavaScript library, into a fully functional version written with just the six characters. [11]

Encoding methods

JSFuck code is extremely verbose. In JavaScript, the code alert("Hello World!"), which opens a pop-up window with the text "Hello World!", is 21 characters long. In JSFuck, the same code has a length of 4325 characters. [12] Certain single characters require far more than 1000 characters when expanded as JSFuck. This section offers an overview of how this expansion works.

Numbers

The number 0 is created by +[], where [] is the empty array and + is the unary plus, used to convert the right side to a numeric value (zero here). The number 1 is formed as +!![] or +!+[], where the boolean value true (expressed as !![] or !+[] in JSFuck) is converted into the numeric value 1 by the prepended plus sign. The digits 2 to 9 are formed by summing true the appropriate number of times. E.g. in JavaScript true + true = 2 and true = !![] = !+[], hence 2 can be written as !![]+!![] or !+[]+!+[]. Other digits follow a similar pattern. Integers consisting of two or more digits are written, as a string, by concatenating 1-digit arrays with the plus operator. For example, the string "10" can be expressed in JavaScript as [1] + [0]. By replacing the digits with the respective JSFuck expansions, this yields [+!+[]]+[+[]]. To get a numeric value instead of a string, one would enclose the previous expression in parentheses or square brackets and prepend a plus, yielding 10 = +([+!+[]]+[+[]]).
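The digit and multi-digit rules above can be sketched as a small generator. The helper names encodeDigit and encodeInteger are hypothetical and chosen only for illustration; this is a minimal sketch of the scheme described in this section, not the algorithm of any particular JSFuck encoder.

```javascript
// Hypothetical helper: JSFuck source text for a single digit 0-9.
function encodeDigit(d) {
  if (d === 0) return "+[]";              // unary plus turns [] into 0
  // +!![] is 1; every further +!![] adds another `true`
  return "+!![]" + "+!![]".repeat(d - 1);
}

// Hypothetical helper: JSFuck source text for a non-negative safe integer.
function encodeInteger(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const digits = String(n).split("").map(Number);
  if (digits.length === 1) return encodeDigit(digits[0]);
  // Concatenate one-digit arrays into a string, then convert with unary plus
  const asString = digits.map((d) => "[" + encodeDigit(d) + "]").join("+");
  return "+(" + asString + ")";
}

const source = encodeInteger(10);
console.log(source);        // +([+!![]]+[+[]])
console.log(eval(source));  // 10
```

The generated text contains only the six characters, and evaluating it with eval gives back the original number.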

Letters

Some letters can be obtained in JSFuck by accessing single characters in the string representations of simple boolean or numeric values such as "false", "true", "NaN", and "undefined" with an indexer (a number in square brackets). Other letters need further tricks: for example, converting the string "1e1000" into a number gives Infinity, which in turn makes the letter y accessible. [13]

The following is a list of primitive values used as building blocks to produce the simplest letters.

Value      JSFuck
false      ![]
true       !![] or !+[]
NaN        +[![]]
undefined  [][[]]
Infinity   +(+!+[]+(!+[]+[])[!+[]+!+[]+!+[]]+[+!+[]]+[+[]]+[+[]]+[+[]])
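As a quick check, the building blocks listed above can be evaluated directly in any JavaScript engine. The variable name samples is only illustrative; appending +[] turns each value into the string from which letters are later picked.

```javascript
// Each pair is [expected string, JSFuck expression from the list above]
const samples = [
  ["false", ![]],
  ["true", !![]],
  ["NaN", +[![]]],
  ["undefined", [][[]]],
  ["Infinity", +(+!+[]+(!+[]+[])[!+[]+!+[]+!+[]]+[+!+[]]+[+[]]+[+[]]+[+[]])],
];

for (const [expected, value] of samples) {
  // value + [] coerces the primitive to its string representation
  console.log(expected, "->", value + [], (value + []) === expected);
}
```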

Example: Creating the letter "a"

"a": Taken from the string "false". The second character of "false" is a, which can be accessed with:

  1. "false"[1]. "false" can be made from false+[], i.e. the boolean constant false plus an empty array.
  2. (false+[])[1]: We write false as ![] (negation applied to an empty array).
  3. (![]+[])[1]: 1 is a number, we can write it as +true.
  4. (![]+[])[+true]: Since false is ![], true is !![].
  5. (![]+[])[+!![]] – which evaluates to "a".

Proof: In JavaScript, alert((![]+[])[+!![]]) does the same as alert("a"). [14]
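The five rewriting steps can be checked one at a time. In this short sketch the step names are illustrative only; each intermediate expression should yield the same character.

```javascript
const step1 = "false"[1];          // index into the literal string
const step2 = (false+[])[1];       // build "false" from the boolean plus []
const step3 = (![]+[])[1];         // write false as ![]
const step4 = (![]+[])[+true];     // write 1 as +true
const step5 = (![]+[])[+!![]];     // write true as !![]

console.log([step1, step2, step3, step4, step5]);
// [ 'a', 'a', 'a', 'a', 'a' ]
```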

Other constructs

The Function constructor can be used to execute JavaScript code contained in a string as if it were native JavaScript. For example, the statement alert(1) is equivalent to Function("alert(1)")(). The Function constructor can be retrieved in JSFuck by accessing the constructor property of a well-known function, such as []["filter"] (Array.prototype.filter) or, in modern browsers, []["flat"] (Array.prototype.flat). With this, alert(1) becomes []["flat"]["constructor"]("alert(1)")().
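A minimal sketch of this lookup follows. Because alert is provided by browsers rather than by the language itself, the example string returns a value instead, so that it also runs outside a browser. The names FunctionViaArray and answer are illustrative.

```javascript
// Any function's `constructor` property is the global Function constructor
const FunctionViaArray = []["filter"]["constructor"];

// Compile a string into a function and call it immediately
const answer = FunctionViaArray("return 6 * 7")();

console.log(FunctionViaArray === Function); // true
console.log(answer);                        // 42
```

In a full JSFuck program the strings "filter" and "constructor", as well as the code string itself, would in turn be spelled out character by character using the six symbols.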

Character table

The characters with the shortest JSFuck expansions are listed below. Other Unicode characters can be expressed as well, but they generate considerably longer code.

Character  JSFuck
+          (+(+!+[]+(!+[]+[])[!+[]+!+[]+!+[]]+[+!+[]]+[+[]]+[+[]])+[])[!+[]+!+[]]
.          (+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]
0          +[]
1          +!![] or +!+[]
2          !![]+!![] or !+[]+!+[]
3          !![]+!![]+!![] or !+[]+!+[]+!+[]
4          !![]+!![]+!![]+!![] or !+[]+!+[]+!+[]+!+[]
5          !![]+!![]+!![]+!![]+!![] or !+[]+!+[]+!+[]+!+[]+!+[]
6          !![]+!![]+!![]+!![]+!![]+!![] or !+[]+!+[]+!+[]+!+[]+!+[]+!+[]
7          !![]+!![]+!![]+!![]+!![]+!![]+!![] or !+[]+!+[]+!+[]+!+[]+!+[]+!+[]+!+[]
8          !![]+!![]+!![]+!![]+!![]+!![]+!![]+!![] or !+[]+!+[]+!+[]+!+[]+!+[]+!+[]+!+[]+!+[]
9          !![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![] or !+[]+!+[]+!+[]+!+[]+!+[]+!+[]+!+[]+!+[]+!+[]
a          (![]+[])[+!+[]]
b          ([]+{})[!![]+!![]]
c          ([]+[][(![]+[])[+!![]]+(!![]+[])[+[]]])[!![]+!![]+!![]]
d          ([][[]]+[])[!+[]+!+[]]
e          (!![]+[])[!+[]+!+[]+!+[]]
f          (![]+[])[+[]]
i          ([![]]+[][[]])[+!+[]+[+[]]]
I          (+(+!+[]+(!+[]+[])[!+[]+!+[]+!+[]]+(+!+[])+(+[])+(+[])+(+[]))+[])[+[]]
j          ([]+{})[!![]+!![]+!![]]
l          (![]+[])[!+[]+!+[]]
N          (+[![]]+[])[+[]]
n          ([][[]]+[])[+!+[]]
o          (!![]+[][(![]+[])[+!![]]+(!![]+[])[+[]]])[+!![]+[+[]]]
r          (!+[]+[])[+!+[]]
s          (![]+[])[!+[]+!+[]+!+[]]
t          (!+[]+[])[+[]]
u          ([][[]]+[])[+[]]
y          (+[![]]+[+(+!+[]+(!+[]+[])[!+[]+!+[]+!+[]]+(+!+[])+(+[])+(+[])+(+[]))])[+!+[]+[+[]]]

The entries for b and j use {}, which is outside the six-character set; they evaluate as shown in JavaScript but are not pure JSFuck.
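Entries from the character table can be joined with + to spell longer strings. The sketch below stores a handful of rows as source text and uses a hypothetical helper, spell, to build the property name "flat". The names table and spell are assumptions made for illustration, and eval is used only to show that the generated text evaluates as expected.

```javascript
// A few rows from the character table, kept as source text
const table = {
  "+": "(+(+!+[]+(!+[]+[])[!+[]+!+[]+!+[]]+[+!+[]]+[+[]]+[+[]])+[])[!+[]+!+[]]",
  ".": "(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]",
  "a": "(![]+[])[+!+[]]",
  "f": "(![]+[])[+[]]",
  "i": "([![]]+[][[]])[+!+[]+[+[]]]",
  "l": "(![]+[])[!+[]+!+[]]",
  "t": "(!+[]+[])[+[]]",
  "y": "(+[![]]+[+(+!+[]+(!+[]+[])[!+[]+!+[]+!+[]]+(+!+[])+(+[])+(+[])+(+[]))])[+!+[]+[+[]]]",
};

// Hypothetical helper: concatenates the expansions of each character in `word`
function spell(word) {
  return [...word].map((ch) => {
    if (!(ch in table)) throw new Error("no expansion stored for " + ch);
    return table[ch];
  }).join("+");
}

const source = spell("flat");
console.log(source.length);  // far longer than the four characters it encodes
console.log(eval(source));   // flat
```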

Security

Because JSFuck code lacks the distinctive features of ordinary JavaScript, obfuscation techniques like it can help malicious JavaScript code bypass intrusion prevention systems [15] or content filters. For instance, the lack of alphanumeric characters in JSFuck, combined with a flawed content filter, allowed sellers to embed arbitrary JSFuck scripts in their eBay auction pages. [10]
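The following sketch illustrates why a purely keyword-based filter can miss such code. Everything in it is hypothetical: the blocklist, the function names, and the length threshold of 20 are assumptions made for illustration and do not describe the filter used by any real site.

```javascript
// Hypothetical blocklist of the kind a naive content filter might use
const blockedKeywords = ["script", "alert", "eval", "function"];

// Returns true when none of the blocked keywords occur in the input
function naiveFilterAllows(input) {
  const lowered = input.toLowerCase();
  return !blockedKeywords.some((word) => lowered.includes(word));
}

// Heuristic check: a long run consisting solely of the six JSFuck characters.
// The threshold of 20 is an arbitrary value chosen for this sketch.
function looksLikeJsfuck(input) {
  return /[\[\]()!+]{20,}/.test(input);
}

const plain = 'alert("a")';
// Evaluates to the string "ale" but contains no letters at all
const encodedFragment =
  "(![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]";

console.log(naiveFilterAllows(plain));            // false
console.log(naiveFilterAllows(encodedFragment));  // true
console.log(looksLikeJsfuck(encodedFragment));    // true
```

The keyword check rejects the plain call but accepts the encoded fragment, whereas a check on the character set flags it.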

References

  1. Jane Bailey/The Daily WTF: "Bidding on Security". http://thedailywtf.com/articles/bidding-on-security
  2. Hasegawa, Yosuke (2009-07-10). "jjencode - Encode any JavaScript program using only symbols". utf-8.jp. Archived from the original on 2009-07-15. Retrieved 2017-10-25.
  3. Hasegawa, Yosuke (July 2009). "UTF-8.jp [2009-07-28]". utf-8.jp. Archived from the original on 2009-07-28. Retrieved 2017-10-25.
  4. "Yet Another Useless Contest (but fun!) Less chars needed to run arbitrary JS code". sla.ckers.org. 2010-01-14. Archived from the original on 2011-03-01. Retrieved 2017-10-25.
  5. "js-noalnum_com.php". discogscounter.getfreehosting.co.uk. Archived from the original on 2010-03-01. Retrieved 2017-10-25.
  6. Aiko, Kenji (November 2010). "JSF*ck - []()!+". utf-8.jp. Archived from the original on 2010-12-01. Retrieved 2017-10-25.
  7. Hasegawa, Yosuke (November 2010). "UTF-8.jp [2010-11-30]". utf-8.jp. Archived from the original on 2010-11-30. Retrieved 2017-10-25.
  8. Kleppe, Martin (2012-07-16). "Commits · aemkei/jsfuck". github.com. Retrieved 2017-10-25.
  9. Kleppe, Martin (September 2012). "Site report for www.jsfuck.com". toolbar.netcraft.com. Retrieved 2017-10-25.
  10. Dan Goodin (3 February 2016). "eBay has no plans to fix "severe" bug that allows malware distribution [Updated]". Ars Technica.
  11. https://github.com/fasttime/jquery-screwed jQuery JavaScript library made of only six different characters: ! ( ) + [ ]
  12. "JScrewIt". JScrewIt. Retrieved 13 June 2021.
  13. http://patriciopalladino.com/blog/2012/08/09/non-alphanumeric-javascript.html "Brainfuck Beware: JavaScript is after you!"
  14. Adapted from: https://esolangs.org/wiki/JSFuck
  15. Ré Medina, Matías A. (2012-09). Bypassing WAFs with non-alphanumeric XSS. Retrieved from http://blog.infobytesec.com/2012/09/bypassing-wafs-with-non-alphanumeric-xss.html.
  16. Easter, Brandee (2020-04-02). "Fully Human, Fully Machine: Rhetorics of Digital Disembodiment in Programming". Rhetoric Review. 39 (2): 202–215. doi:10.1080/07350198.2020.1727096. ISSN 0735-0198. S2CID 219665562.