Autovivification

Last updated

In the Perl programming language, autovivification is the automatic creation of new arrays and hashes as required every time an undefined value is dereferenced. Perl autovivification allows a programmer to refer to a structured variable, and arbitrary sub-elements of that structured variable, without expressly declaring the existence of the variable and its complete structure beforehand. [1]

Contents

In contrast, other programming languages either:

  1. Require a programmer to expressly declare an entire variable structure before using or referring to any part of it; or
  2. Require a programmer to declare a part of a variable structure before referring to any part of it; or
  3. Create an assignment to a part of a variable before referring, assigning to or composing an expression that refers to any part of it.

Perl autovivification can be contrasted against languages such as Python, PHP, Ruby, and many of the C style languages, where dereferencing null or undefined values is not generally permitted. [lower-alpha 1] It can be compared to the HTML standard's "named access on the window object" [2] which results in corresponding globally scoped variables being automatically accessible to browser-based JavaScript.

Hashes

It is important to remember that autovivification happens when an undefined value is dereferenced. An assignment is not necessary. The debugger session below illustrates autovivification of a hash just from examining it:

DB<1>x\%h0HASH(0x2f1a248)emptyhashDB<2>x$h{1}{2}{3}{4}0undefDB<3>x\%h0HASH(0x2f1a248)1=>HASH(0x2f1a260)2=>HASH(0x29a3c68)3=>HASH(0x2dc3038)emptyhashDB<4>

The debugger session below illustrates autovivification of a hash from assigning to an inner hash:

DB<1>$h{A}{B}{C}{D}=1DB<2>x\%h0HASH(0x83c71ac)'A'=>HASH(0x837d50c)'B'=>HASH(0x83c71e8)'C'=>HASH(0x83c7218)'D'=>1DB<3>

Hashes several layers deep were created automatically without any declarations. Autovivification can prevent excessive typing. If Perl did not support autovivification, the structure above would have to be created as follows:

DB<1>%h=(A=>{B=>{C=>{D=>1}}})DB<2>x\%h0HASH(0x83caba4)'A'=>HASH(0x83cfc28)'B'=>HASH(0x83cab74)'C'=>HASH(0x83b6110)'D'=>1DB<3>

File and directory handles

Perl 5.6.1 and newer support autovivification of file and directory handles. [3] Calling open() on an undefined variable will set it to a filehandle. According to perl561delta, "[t]his largely eliminates the need for typeglobs when opening filehandles that must be passed around, as in the following example:

formy$file(qw(this.conf that.conf)){my$fin=open_or_throw('<',$file);process_conf($fin);# no close() needed}useCarp;subopen_or_throw{my($mode,$filename)=@_;openmy$h,$mode,$filenameorcroak"Could not open '$filename': $!";return$h;}

Emulation in other programming languages

C++

The C++ Standard Library's associative containers (std::unordered_map and std::map) use operator[] to get the value associated to a key. If there is nothing associated to this key, it will construct it and value initialize [4] [ unreliable source ][ failed verification ] the value. For simple types like int or float, the value initialization will be zero initialization.

std::unordered_map<std::string,std::vector<int>>a;a["the answer"].push_back(42);// Autovivifies the a["the answer"] vector and then appends to it.

Another example of counting occurrences of strings:

std::unordered_map<std::string,int>counts;while(auto&s=GetNextString()){counts[s]++;// creates counts[s] if it doesn't exist and set to zero, then increment.}

A similar trick can be achieved with the insert() method, which returns an iterator to the element associated to the key, even if it already exists.

Python

Python's built-in dict class can be subclassed to implement autovivificious dictionaries simply by overriding the __missing__() method that was added to the class in Python v2.5. [5] There are other ways of implementing the behavior, [6] [7] but the following is one of the simplest and instances of the class print just like normal Python dictionary objects.

>>> classTree(dict):... def__missing__(self,key):... value=self[key]=type(self)()... returnvalue>>> # Common names by class, order, genus, and type-species>>> common_names=Tree()>>> common_names['Mammalia']['Primates']['Homo']['H. sapiens']='human being'>>> common_names{'Mammalia': {'Primates': {'Homo': {'H. sapiens': 'human being'}}}}>>> # Famous quotes by play, act, scene, and page>>> quotes=Tree()>>> quotes['Hamlet'][1][3][3]='This above all: to thine own self be true.'>>> quotes{'Hamlet': {1: {3: {3: 'This above all: to thine own self be true.'}}}}

Ruby

Ruby hashes can take a block specifying an object to be returned for non-existing indexes. These can be used to implement autovivificious maps.

irb(main):001:0> tree=proc{Hash.new{|hash,key|hash[key]=tree.call}}=> #<Proc:0x007fda528749a0@(irb):1>irb(main):002:0> lupin=tree.call=> {}irb(main):003:0> lupin["express"][3]="stand and deliver"=> "stand and deliver"irb(main):004:0> lupin=> {"express"=>{3=>"stand and deliver"}}

Java

Java Map has a method computeIfAbsent [8] that can be used to emulate autovivificous maps.

publicstatic<K,V>Function<K,V>defaultDict(Map<K,V>map,Supplier<?extendsV>supplier){returnkey->map.computeIfAbsent(key,k->supplier.get());}publicstaticvoidmain(String[]args){Function<String,List<String>>dict=defaultDict(newHashMap<>(),ArrayList::new);dict.apply("foo").add("bar");}

PHP

PHP arrays are natively autovivificious.

$arr=array();$arr["express"][3]="stand and deliver";

However, this only applies to assignment, and not array access.

JavaScript

ES6 introduces a new Proxy class that can be used to implement autovivification. With other features of JavaScript, this can be reduced to a single line of code:

vartree=()=>newProxy({},{get:(target,name)=>nameintarget?target[name]:target[name]=tree()});// Test:vart=tree();t.first.second.third='text';console.log(t.first.second.third);// or t['first']['second']['third']

C#

C#, using indexers and C# 4.0 dynamics,

classTree{privateIDictionary<string,object>_dict=newDictionary<string,object>();publicdynamicthis[stringkey]{get{return_dict.ContainsKey(key)?_dict[key]:_dict[key]=newTree();}set{_dict[key]=value;}}}// Test:vart=newTree();t["first"]["second"]["third"]="text";Console.WriteLine(t["first"]["second"]["third"]);

DynamicObject can be used for implementing different syntaxes also,

usingSystem;usingSystem.Collections.Generic;usingSystem.Dynamic;classTree:DynamicObject{privateIDictionary<object,object>dict=newDictionary<object,object>();// for t.first.second.third syntaxpublicoverrideboolTryGetMember(GetMemberBinderbinder,outobjectresult){varkey=binder.Name;if(dict.ContainsKey(key))result=dict[key];elsedict[key]=result=newTree();returntrue;}publicoverrideboolTrySetMember(SetMemberBinderbinder,objectvalue){dict[binder.Name]=value;returntrue;}// for t["first"]["second"]["third"] syntaxpublicoverrideboolTryGetIndex(GetIndexBinderbinder,object[]indexes,outobjectresult){varkey=indexes[0];if(dict.ContainsKey(key))result=dict[key];elsedict[key]=result=newTree();returntrue;}publicoverrideboolTrySetIndex(SetIndexBinderbinder,object[]indexes,objectvalue){dict[indexes[0]]=value;returntrue;}}// Test:dynamict=newTree();t.first.second.third="text";Console.WriteLine(t.first.second.third);// or,dynamict=newTree();t["first"]["second"]["third"]="text";Console.WriteLine(t["first"]["second"]["third"]);

See also

Notes

  1. For example, Python raises an TypeError if None.__getitem__ is called. Dereferencing a null pointer in C results in undefined behavior; many C implementations choose to raise a segmentation fault.

Related Research Articles

<span class="mw-page-title-main">Hash table</span> Associative array for storing key-value pairs

In computing, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. A map implemented by a hash table is called a hash map.

<span class="mw-page-title-main">Ruby (programming language)</span> General-purpose programming language

Ruby is an interpreted, high-level, general-purpose programming language. It was designed with an emphasis on programming productivity and simplicity. In Ruby, everything is an object, including primitive data types. It was developed in the mid-1990s by Yukihiro "Matz" Matsumoto in Japan.

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of pairs, such that each possible key appears at most once in the collection. In mathematical terms, an associative array is a function with finite domain. It supports 'lookup', 'remove', and 'insert' operations.

In computer programming, an iterator is an object that progressively provides access to each item of a collection, in order.

In object-oriented (OO) and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

In computer programming, a reference is a value that enables a program to indirectly access a particular datum, such as a variable's value or a record, in the computer's memory or in some other storage device. The reference is said to refer to the datum, and accessing the datum is called dereferencing the reference. A reference is distinct from the datum itself.

In computer science, primitive data types are a set of basic data types from which all other data types are constructed. Specifically it often refers to the limited set of data representations in use by a particular processor, which all compiled programs must use. Most processors support a similar set of primitive data types, although the specific representations vary. More generally, "primitive data types" may refer to the standard data types built into a programming language. Data types which are not primitive are referred to as derived or composite.

In computing, type introspection is the ability of a program to examine the type or properties of an object at runtime. Some programming languages possess this capability.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

In computer programming, a function object is a construct allowing an object to be invoked or called as if it were an ordinary function, usually with the same syntax. In some languages, particularly C++, function objects are often called functors.

<span class="mw-page-title-main">Foreach loop</span> Control flow statement for traversing items in a collection

In computer programming, foreach loop is a control flow statement for traversing items in a collection. foreach is usually used in place of a standard for loop statement. Unlike other for loop constructs, however, foreach loops usually maintain no explicit counter: they essentially say "do this to everything in this set", rather than "do this x times". This avoids potential off-by-one errors and makes code simpler to read. In object-oriented languages, an iterator, even if implicit, is often used as the means of traversal.

In computer programming, the ternary conditional operator is a ternary operator that is part of the syntax for basic conditional expressions in several programming languages. It is commonly referred to as the conditional operator, conditional expression, ternary if, or inline if. An expression if a then b else c or a ? b : c evaluates to b if the value of a is true, and otherwise to c. One can read it aloud as "if a then b otherwise c". The form a ? b : c is the most common, but alternative syntax do exist; for example, Raku uses the syntax a ?? b !! c to avoid confusion with the infix operators ? and !, whereas in Visual Basic .NET, it instead takes the form If(a, b, c).

<span class="mw-page-title-main">Dynamic array</span> List data structure to which elements can be added/removed

In computer science, a dynamic array, growable array, resizable array, dynamic table, mutable array, or array list is a random access, variable-size list data structure that allows elements to be added or removed. It is supplied with standard libraries in many modern mainstream programming languages. Dynamic arrays overcome a limit of static arrays, which have a fixed capacity that needs to be specified at allocation.

In computer programming, a sigil is a symbol affixed to a variable name, showing the variable's datatype or scope, usually a prefix, as in $foo, where $ is the sigil.

In computer programming, thread-local storage (TLS) is a memory management method that uses static or global memory local to a thread. The concept allows storage of data that appears to be global in a system with separate threads.

The null coalescing operator is a binary operator that is part of the syntax for a basic conditional expression in several programming languages, such as : C# since version 2.0, Dart since version 1.12.0, PHP since version 7.0.0, Perl since version 5.10 as logical defined-or, PowerShell since 7.0.0, and Swift as nil-coalescing operator.

This comparison of programming languages (associative arrays) compares the features of associative array data structures or array-lookup processing for over 40 computer programming languages.

This comparison of programming languages compares how object-oriented programming languages such as C++, Java, Smalltalk, Object Pascal, Perl, Python, and others manipulate data structures.

A symbol in computer programming is a primitive data type whose instances have a human-readable form. Symbols can be used as identifiers. In some programming languages, they are called atoms. Uniqueness is enforced by holding them in a symbol table. The most common use of symbols by programmers is to perform language reflection, and the most common indirectly is their use to create object linkages.

The syntax of the Ruby programming language is broadly similar to that of Perl and Python. Class and method definitions are signaled by keywords, whereas code blocks can be defined by either keywords or braces. In contrast to Perl, variables are not obligatorily prefixed with a sigil. When used, the sigil changes the semantics of scope of the variable. For practical purposes there is no distinction between expressions and statements. Line breaks are significant and taken as the end of a statement; a semicolon may be equivalently used. Unlike Python, indentation is not significant.

References

  1. Schwartz, Randal L.; Phoenix, Tom (2003). Learning Perl Objects . O'Reilly Media, Inc. p.  42. ISBN   9780596004781. This process is called autovivification. Any nonexisting variable, or a variable containing undef, which is dereferenced while looking for a variable location (technically called an lvalue context), is automatically stuffed with the appropriate reference to an empty item...
  2. "HTML Standard. Named access on the Window object".
  3. "perl561delta - what's new for perl v5.6.1". Perl Programming Documentation.
  4. "Value initialization", C++ reference (wiki)
  5. "Mapping Types — dict" . Retrieved 2016-06-13.
  6. "What is the best way to implement nested dictionaries in Python?" . Retrieved 2016-06-13.
  7. "One-line Tree in Python" . Retrieved 2017-12-27.
  8. "Map (Java Platform SE 8)" . Retrieved 2015-05-17.