Autovivification

Last updated

In the Perl programming language, autovivification is the automatic creation of new arrays and hashes as required every time an undefined value is dereferenced. Perl autovivification allows a programmer to refer to a structured variable, and arbitrary sub-elements of that structured variable, without expressly declaring the existence of the variable and its complete structure beforehand. [1]

Contents

In contrast, other programming languages either: 1) require a programmer to expressly declare an entire variable structure before using or referring to any part of it; or 2) require a programmer to declare a part of a variable structure before referring to any part of it; or 3) create an assignment to a part of a variable before referring, assigning to or composing an expression that refers to any part of it.

Perl autovivification can be contrasted against languages such as Python, PHP, Ruby, and many of the C style languages, where dereferencing null or undefined values is not generally permitted. [lower-alpha 1] It can be compared to the HTML standard's "named access on the window object" [2] which results in corresponding globally scoped variables being automatically accessible to browser-based JavaScript.

Hashes

It is important to remember that autovivification happens when an undefined value is dereferenced. An assignment is not necessary. The debugger session below illustrates autovivification of a hash just from examining it:

DB<1>x\%h0HASH(0x2f1a248)emptyhashDB<2>x$h{1}{2}{3}{4}0undefDB<3>x\%h0HASH(0x2f1a248)1=>HASH(0x2f1a260)2=>HASH(0x29a3c68)3=>HASH(0x2dc3038)emptyhashDB<4>

The debugger session below illustrates autovivification of a hash from assigning to an inner hash:

DB<1>$h{A}{B}{C}{D}=1DB<2>x\%h0HASH(0x83c71ac)'A'=>HASH(0x837d50c)'B'=>HASH(0x83c71e8)'C'=>HASH(0x83c7218)'D'=>1DB<3>

Hashes several layers deep were created automatically without any declarations. Autovivification can prevent excessive typing. If Perl did not support autovivification, the structure above would have to be created as follows:

DB<1>%h=(A=>{B=>{C=>{D=>1}}})DB<2>x\%h0HASH(0x83caba4)'A'=>HASH(0x83cfc28)'B'=>HASH(0x83cab74)'C'=>HASH(0x83b6110)'D'=>1DB<3>

File and directory handles

Perl 5.6.1 and newer support autovivification of file and directory handles. [3] Calling open() on an undefined variable will set it to a filehandle. According to perl561delta, "[t]his largely eliminates the need for typeglobs when opening filehandles that must be passed around, as in the following example:

formy$file(qw(this.conf that.conf)){my$fin=open_or_throw('<',$file);process_conf($fin);# no close() needed}useCarp;subopen_or_throw{my($mode,$filename)=@_;openmy$h,$mode,$filenameorcroak"Could not open '$filename': $!";return$h;}

Emulation in other programming languages

C++

The C++ Standard Library's associative containers (std::unordered_map and std::map) use operator[] to get the value associated to a key. If there is nothing associated to this key, it will construct it and value initialize [4] the value. For simple types like int or float, the value initialization will be zero initialization.

std::unordered_map<std::string,std::vector<int>>a;a["the answer"].push_back(42);// Autovivifies the a["the answer"] vector and then appends to it.

Another example of counting occurrences of strings:

std::unordered_map<std::string,int>counts;while(auto&s=GetNextString()){counts[s]++;// creates counts[s] if it doesn't exist and set to zero, then increment.}

A similar trick can be achieved with the insert() method, which returns an iterator to the element associated to the key, even if it already exists.

Python

Python's built-in dict class can be subclassed to implement autovivificious dictionaries simply by overriding the __missing__() method that was added to the class in Python v2.5. [5] There are other ways of implementing the behavior, [6] [7] but the following is one of the simplest and instances of the class print just like normal Python dictionary objects.

>>> classTree(dict):... def__missing__(self,key):... value=self[key]=type(self)()... returnvalue>>> # Common names by class, order, genus, and type-species>>> common_names=Tree()>>> common_names['Mammalia']['Primates']['Homo']['H. sapiens']='human being'>>> common_names{'Mammalia': {'Primates': {'Homo': {'H. sapiens': 'human being'}}}}>>> # Famous quotes by play, act, scene, and page>>> quotes=Tree()>>> quotes['Hamlet'][1][3][3]='This above all: to thine own self be true.'>>> quotes{'Hamlet': {1: {3: {3: 'This above all: to thine own self be true.'}}}}

Ruby

Ruby hashes can take a block specifying an object to be returned for non-existing indexes. These can be used to implement autovivificious maps.

irb(main):001:0> tree=proc{Hash.new{|hash,key|hash[key]=tree.call}}=> #<Proc:0x007fda528749a0@(irb):1>irb(main):002:0> lupin=tree.call=> {}irb(main):003:0> lupin["express"][3]="stand and deliver"=> "stand and deliver"irb(main):004:0> lupin=> {"express"=>{3=>"stand and deliver"}}

Java

Java Map has a method computeIfAbsent [8] that can be used to emulate autovivificous maps.

publicstatic<K,V>Function<K,V>defaultDict(Map<K,V>map,Supplier<?extendsV>supplier){returnkey->map.computeIfAbsent(key,k->supplier.get());}publicstaticvoidmain(String[]args){Function<String,List<String>>dict=defaultDict(newHashMap<>(),ArrayList::new);dict.apply("foo").add("bar");}

PHP

PHP arrays are natively autovivificious.

$arr=array();$arr["express"][3]="stand and deliver";

However, this only applies to assignment, and not array access.

JavaScript

ES6 introduces a new Proxy class that can be used to implement autovivification. With other features of JavaScript, this can be reduced to a single line of code:

vartree=()=>newProxy({},{get:(target,name)=>nameintarget?target[name]:target[name]=tree()});// Test:vart=tree();t.first.second.third='text';console.log(t.first.second.third);// or t['first']['second']['third']

C#

C#, using indexers and C# 4.0 dynamics,

classTree{privateIDictionary<string,object>_dict=newDictionary<string,object>();publicdynamicthis[stringkey]{get{return_dict.ContainsKey(key)?_dict[key]:_dict[key]=newTree();}set{_dict[key]=value;}}}// Test:vart=newTree();t["first"]["second"]["third"]="text";Console.WriteLine(t["first"]["second"]["third"]);

DynamicObject can be used for implementing different syntaxes also,

usingSystem;usingSystem.Collections.Generic;usingSystem.Dynamic;classTree:DynamicObject{privateIDictionary<object,object>dict=newDictionary<object,object>();// for t.first.second.third syntaxpublicoverrideboolTryGetMember(GetMemberBinderbinder,outobjectresult){varkey=binder.Name;if(dict.ContainsKey(key))result=dict[key];elsedict[key]=result=newTree();returntrue;}publicoverrideboolTrySetMember(SetMemberBinderbinder,objectvalue){dict[binder.Name]=value;returntrue;}// for t["first"]["second"]["third"] syntaxpublicoverrideboolTryGetIndex(GetIndexBinderbinder,object[]indexes,outobjectresult){varkey=indexes[0];if(dict.ContainsKey(key))result=dict[key];elsedict[key]=result=newTree();returntrue;}publicoverrideboolTrySetIndex(SetIndexBinderbinder,object[]indexes,objectvalue){dict[indexes[0]]=value;returntrue;}}// Test:dynamict=newTree();t.first.second.third="text";Console.WriteLine(t.first.second.third);// or,dynamict=newTree();t["first"]["second"]["third"]="text";Console.WriteLine(t["first"]["second"]["third"]);

See also

Notes

  1. For example, Python raises an TypeError if None.__getitem__ is called. Dereferencing a null pointer in C results in undefined behavior; many C implementations choose to raise a segmentation fault.

Related Research Articles

<span class="mw-page-title-main">Ruby (programming language)</span> General-purpose programming language

Ruby is an interpreted, high-level, general-purpose programming language which supports multiple programming paradigms. It was designed with an emphasis on programming productivity and simplicity. In Ruby, everything is an object, including primitive data types. It was developed in the mid-1990s by Yukihiro "Matz" Matsumoto in Japan.

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of pairs, such that each possible key appears at most once in the collection. In mathematical terms, an associative array is a function with finite domain. It supports 'lookup', 'remove', and 'insert' operations.

In computer programming, an iterator is an object that enables a programmer to traverse a container, particularly lists. Various types of iterators are often provided via a container's interface. Though the interface and semantics of a given iterator are fixed, iterators are often implemented in terms of the structures underlying a container implementation and are often tightly coupled to the container to enable the operational semantics of the iterator. An iterator performs traversal and also gives access to data elements in a container, but does not itself perform iteration.

In object-oriented (OO) and functional programming, an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created. In some cases, an object is considered immutable even if some internally used attributes change, but the object's state appears unchanging from an external point of view. For example, an object that uses memoization to cache the results of expensive computations could still be considered an immutable object.

In computer programming, a reference is a value that enables a program to indirectly access a particular data, such as a variable's value or a record, in the computer's memory or in some other storage device. The reference is said to refer to the datum, and accessing the datum is called dereferencing the reference. A reference is distinct from the datum itself.

A list comprehension is a syntactic construct available in some programming languages for creating a list based on existing lists. It follows the form of the mathematical set-builder notation as distinct from the use of map and filter functions.

In computing, type introspection is the ability of a program to examine the type or properties of an object at runtime. Some programming languages possess this capability.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

In computer programming, a function object is a construct allowing an object to be invoked or called as if it were an ordinary function, usually with the same syntax. In some languages, particularly C++, function objects are often called functors.

<span class="mw-page-title-main">Foreach loop</span> Control flow statement for traversing items in a collection

In computer programming, foreach loop is a control flow statement for traversing items in a collection. foreach is usually used in place of a standard for loop statement. Unlike other for loop constructs, however, foreach loops usually maintain no explicit counter: they essentially say "do this to everything in this set", rather than "do this x times". This avoids potential off-by-one errors and makes code simpler to read. In object-oriented languages, an iterator, even if implicit, is often used as the means of traversal.

In computer programming, the ternary conditional operator is a ternary operator that is part of the syntax for basic conditional expressions in several programming languages. It is commonly referred to as the conditional operator, ternary if, or inline if. An expression a ? b : c evaluates to b if the value of a is true, and otherwise to c. One can read it aloud as "if a then b otherwise c". The form a ? b : c is by far and large the most common, but alternative syntaxes do exist; for example, Raku uses the syntax a ?? b !! c to avoid confusion with the infix operators ? and !, whereas in Visual Basic .NET, it instead takes the form If(a, b, c).

A bit array is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level parallelism in hardware to perform operations quickly. A typical bit array stores kw bits, where w is the number of bits in the unit of storage, such as a byte or word, and k is some nonnegative integer. If w does not divide the number of bits to be stored, some space is wasted due to internal fragmentation.

In computer programming, thread-local storage (TLS) is a memory management method that uses static or global memory local to a thread.

In computer programming, an anonymous function is a function definition that is not bound to an identifier. Anonymous functions are often arguments being passed to higher-order functions or used for constructing the result of a higher-order function that needs to return a function. If the function is only used once, or a limited number of times, an anonymous function may be syntactically lighter than using a named function. Anonymous functions are ubiquitous in functional programming languages and other languages with first-class functions, where they fulfil the same role for the function type as literals do for other data types.

The null coalescing operator is a binary operator that is part of the syntax for a basic conditional expression in several programming languages, including C#, PowerShell as of version 7.0.0, Perl as of version 5.10, Swift, and PHP 7.0.0. While its behavior differs between implementations, the null coalescing operator generally returns the result of its left-most operand if it exists and is not null, and otherwise returns the right-most operand. This behavior allows a default value to be defined for cases where a more specific value is not available.

This Comparison of programming languages (associative arrays) compares the features of associative array data structures or array-lookup processing for over 40 computer programming languages.

This comparison of programming languages compares how object-oriented programming languages such as C++, Java, Smalltalk, Object Pascal, Perl, Python, and others manipulate data structures.

In computer science, array is a data type that represents a collection of elements, each selected by one or more indices that can be computed at run time during program execution. Such a collection is usually called an array variable or array value. By analogy with the mathematical concepts vector and matrix, array types with one and two indices are often called vector type and matrix type, respectively. More generally, a multidimensional array type can be called a tensor type, by analogy with the physical concept, tensor.

The structure of the Perl programming language encompasses both the syntactical rules of the language and the general ways in which programs are organized. Perl's design philosophy is expressed in the commonly cited motto "there's more than one way to do it". As a multi-paradigm, dynamically typed language, Perl allows a great degree of flexibility in program design. Perl also encourages modularization; this has been attributed to the component-based design structure of its Unix roots, and is responsible for the size of the CPAN archive, a community-maintained repository of more than 100,000 modules.

The syntax of the Ruby programming language is broadly similar to that of Perl and Python. Class and method definitions are signaled by keywords, whereas code blocks can be defined by either keywords or braces. In contrast to Perl, variables are not obligatorily prefixed with a sigil. When used, the sigil changes the semantics of scope of the variable. For practical purposes there is no distinction between expressions and statements. Line breaks are significant and taken as the end of a statement; a semicolon may be equivalently used. Unlike Python, indentation is not significant.

References

  1. Schwartz, Randal L.; Phoenix, Tom (2003). Learning Perl Objects . O'Reilly Media, Inc. p.  42. ISBN   9780596004781. This process is called autovivification. Any nonexisting variable, or a variable containing undef, which is dereferenced while looking for a variable location (technically called an lvalue context), is automatically stuffed with the appropriate reference to an empty item...
  2. "HTML Standard. Named access on the Window object".
  3. "perl561delta - what's new for perl v5.6.1". Perl Programming Documentation.
  4. "Value initialization", C++ reference (wiki)
  5. "Mapping Types — dict" . Retrieved 2016-06-13.
  6. "What is the best way to implement nested dictionaries in Python?" . Retrieved 2016-06-13.
  7. "One-line Tree in Python" . Retrieved 2017-12-27.
  8. "Map (Java Platform SE 8)" . Retrieved 2015-05-17.