Zipping (computer science)

Last updated

In computer science, zipping is a function which maps a tuple of sequences into a sequence of tuples. This name zip derives from the action of a zipper in that it interleaves two formerly disjoint sequences. The inverse function is unzip.

Contents

Example

Given the three words cat, fish and be where |cat| is 3, |fish| is 4 and |be| is 2. Let denote the length of the longest word which is fish; . The zip of cat, fish, be is then 4 tuples of elements:

where # is a symbol not in the original alphabet. In Haskell this truncates to the shortest sequence , where :

zip3"cat""fish""be"-- [('c','f','b'),('a','i','e')]

Definition

Let Σ be an alphabet, # a symbol not in Σ.

Let x1x2... x|x|, y1y2... y|y|, z1z2... z|z|, ... be n words (i.e. finite sequences) of elements of Σ. Let denote the length of the longest word, i.e. the maximum of |x|, |y|, |z|, ... .

The zip of these words is a finite sequence of n-tuples of elements of (Σ ∪ {#}), i.e. an element of :

,

where for any index i > |w|, the wi is #.

The zip of x, y, z, ... is denoted zip(x, y, z, ...) or xyz ⋆ ...

The inverse to zip is sometimes denoted unzip.

A variation of the zip operation is defined by:

where is the minimum length of the input words. It avoids the use of an adjoined element , but destroys information about elements of the input sequences beyond .

In programming languages

Zip functions are often available in programming languages, often referred to as zip. In Lisp-dialects one can simply map the desired function over the desired lists, map is variadic in Lisp so it can take an arbitrary number of lists as argument. An example from Clojure: [1]

;; `nums' contains an infinite list of numbers (0 1 2 3 ...)(def nums(range))(def tens[102030])(def firstname"Alice");; To zip (0 1 2 3 ...) and [10 20 30] into a vector, invoke `map vector' on them; same with list(map vector numstens); ⇒ ([0 10] [1 20] [2 30])(map list numstens); ⇒ ((0 10) (1 20) (2 30))(map str numstens); ⇒ ("010" "120" "230");; `map' truncates to the shortest sequence; note missing \c and \e from "Alice"(map vector numstensfirstname); ⇒ ([0 10 \A] [1 20 \l] [2 30 \i])(map str numstensfirstname); ⇒ ("010A" "120l" "230i");; To unzip, apply `map vector' or `map list'(apply map list (map vector numstensfirstname));; ⇒ ((0 1 2) (10 20 30) (\A \l \i))

In Common Lisp:

(defparameternums'(123))(defparametertens'(102030))(defparameterfirstname"Alice")(mapcar#'listnumstens);; ⇒ ((1 10) (2 20) (3 30))(mapcar#'listnumstens(coercefirstname'list));; ⇒ ((1 10 #\A) (2 20 #\l) (3 30 #\i)) — truncates on shortest list;; Unzips(apply#'mapcar#'list(mapcar#'listnumstens(coercefirstname'list)));; ⇒ ((1 2 3) (10 20 30) (#\A #\l #\i))

Languages such as Python provide a zip() function. [2] zip() in conjunction with the * operator unzips a list: [2]

>>> nums=[1,2,3]>>> tens=[10,20,30]>>> firstname='Alice'>>> zipped=list(zip(nums,tens))>>> zipped[(1, 10), (2, 20), (3, 30)]>>> list(zip(*zipped))# unzip[(1, 2, 3), (10, 20, 30)]>>> zipped2=list(zip(nums,tens,list(firstname)))>>> zipped2# zip, truncates on shortest[(1, 10, 'A'), (2, 20, 'l'), (3, 30, 'i')] >>> list(zip(*zipped2))# unzip[(1, 2, 3), (10, 20, 30), ('A', 'l', 'i')]

Haskell has a method of zipping sequences but requires a specific function for each arity (zip for two sequences, zip3 for three etc.), [3] similarly the functions unzip and unzip3 are available for unzipping:

-- nums contains an infinite list of numbers [1, 2, 3, ...] nums=[1..]tens=[10,20,30]firstname="Alice"zipnumstens-- ⇒ [(1,10), (2,20), (3,30)] — zip, truncates infinite listunzip$zipnumstens-- ⇒ ([1,2,3], [10,20,30]) — unzipzip3numstensfirstname-- ⇒ [(1,10,'A'), (2,20,'l'), (3,30,'i')] — zip, truncatesunzip3$zip3numstensfirstname-- ⇒ ([1,2,3], [10,20,30], "Ali") — unzip

Language comparison

List of languages by support of zip:

Zip in various languages
LanguageZipZip 3 listsZip n listsNotes
Chapel zip (iter1iter2)zip (iter1iter2iter3)zip (iter1 ... itern)The shape of each iterator, the rank and the extents in each dimension, must be identical. [4]
Clojure (map list list1list2)
(map vector list1list2)
(map list list1list2list3)
(map vector list1list2list3)
(map list list1listn)
(map vector list1listn)
Stops after the length of the shortest list.
Common Lisp (mapcar#'listlist1list2)(mapcar#'listlist1list2list3)(mapcar#'listlist1...listn)Stops after the length of the shortest list.
D zip(range1, range2)
range1.zip(range2)
zip(range1, range2,range3)
range1.zip(range2, range3)
zip(range1, …, rangeN)
range1.zip(…, rangeN)
The stopping policy defaults to shortest and can be optionally provided as shortest, longest, or requiring the same length. [5] The second form is an example of UFCS.
F# List.zip list1list2
Seq.zip source1source2
Array.zip array1array2
List.zip3 list1list2list3
Seq.zip3 source1source2source3
Array.zip3 array1array2array3
Haskell zip list1list2zip3 list1list2list3zipnlist1listnzipn for n > 3 is available in the module Data.List. Stops after the shortest list ends.
Python zip(list1, list2)zip(list1, list2, list3)zip(list1, …, listn)zip() and map() (3.x) stops after the shortest list ends, whereas map() (2.x) and itertools.zip_longest() (3.x) extends the shorter lists with None items
Ruby list1.zip(list2)list1.zip(list2, list3)list1.zip(list1, .., listn)When the list being executed upon (list1) is shorter than the lists being zipped the resulting list is the length of list1. If list1 is longer nil values are used to fill the missing values [6]
Scala list1.zip(list2)If one of the two collections is longer than the other, its remaining elements are ignored. [7]
Unzip in various languages
LanguageUnzipUnzip 3 tuplesUnzip n tuplesNotes
Clojure (apply map vector ziplist)(apply map vector ziplist)(apply map vector ziplist)
Common Lisp (apply#'mapcar#'listziplist)(apply#'mapcar#'listziplist)(apply#'mapcar#'listziplist)
F# List.unzip list1list2
Seq.unzip source1source2
Array.unzip array1array2
List.unzip3 list1list2list3
Seq.unzip3 source1source2source3
Array.unzip3 array1array2array3
Haskell unzip ziplistunzip3 ziplistunzipnziplistunzipn for n > 3 is available in the module Data.List.
Python zip(*zipvlist)zip(*zipvlist)zip(*zipvlist)

See also

References

  1. map from ClojureDocs
  2. 1 2 map(function, iterable, ...) from section Built-in Functions from Python v2.7.2 documentation
  3. zip :: [a] -> [b] -> [(a, b)] from Prelude, Basic libraries
  4. "Statements — Chapel Documentation 1.25".
  5. "std.range - D Programming Language".
  6. "Class: Array".
  7. "IterableOps". scala-lang.org.