ECLiPSeWiki | Joachim / Arrays

Array handling

Arrays in ECLiPSe are represented as structures using the functor []/N with any arity. For example [](11,22,33) is an array of length 3, with elements at position 1, 2 and 3, represented by a term with functor []/3. Multi-dimensional arrays are represented as nested arrays, i.e. arrays of arrays.

This is note discusses a number of special cases arising in relation to empty arrays, multi-dimensional arrays, and "ragged" arrays.

Array or not array

ECLiPSe, like Prolog, does not use static typing. All data is either number, string, or a term with a functor (atom or structure). How functor-terms are interpreted is up the the program code. For arrays, this means that a term with functor []/N can be interpreted as an array or as a general data structure with a different meaning. It also means that everything that "looks like an array", i.e. has the functor []/N, can be used as an array.

This is often not a problem because a program should be clear about where it expects an array.

It becomes relevant when arrays contain arrays, because a sub-array could be interpreted as part of a multidimensional array, or simply as an element of an array with lower dimension.

An additional case of ambiguity arises from the use of [] (i.e. a term with functor []/0) for the representation of the empty array, because it is the same representation that is used for empty lists (and sometimes for empty instances of other data structures like trees). It could also be meant to represent a general atom whose name just happens to be '[]'.

Interpretation

The following table gives an overview about how different array-related primitives handle the special cases. We consider the following variants of arrays:

one-dim [N]: One-dimensional arrays of length 1 or greater. This is the basic case and handled by all array-related primitives.
one-dim [0]: One-dimensional array of length 0. This is handled by all primitives except the ones that access elements.
fixed-multi [M,N]: Two-or-more-dimensional arrays without degenerate/empty dimensions. These are handled by all array-related primitives.
fixed-multi [M,0]: Two-or-more-dimensional arrays where the last dimension is 0.
ragged [M,1..N]: Multi-dimensional, where the size of the subarrays in the same dimension may vary. This can be used to save space when representing diagonal matrices, for example. In most mathematical applications, there are no empty sub-arrays.
ragged [M,0..N]: As before, but individual sub-arrays may be empty.

	one-dim [N]	one-dim [0]	fixed multi [M,N]	fixed multi [M,0]	ragged [M,1..N]	ragged [M,0..N]
dim(-,+)	ok	ok	ok	3	n/a	n/a
dim(+,-)	ok	ok	ok	4	6	4,6
array_list/2 etc	ok	ok	ok 2	ok 2	ok 2	ok 2
array_flat/3	ok	ok	ok	ok	ok	ok
A[...],subscript/3	ok	ok 1	ok	ok	ok	ok
foreachelem do	ok	ok	ok	5	ok	5

Notes:

1: There are no elements to access (error).
2: primitives specifically for one-dimensional arrays (like array_list/2) consider only the top-level dimension.
3: will not generate empty subarrays, whole array is [] (since 6.1#175).
4: will consider empty dimension as not existing, i.e. dim([]([]),[1]).
5: will treat [] as array element rather than sub-array (except at the top level).
6: will give first subarray's size as dimension.

Principles

The principles behind the above behaviours are supposed to be
(1) where an array is (unambiguously) expected, interpret [] as (empty) array
(2) where there is ambiguity, do not interpret [] as an array

The alternative rule would have been
(2a) even where there is ambiguity, interpret [] as empty array

Observations affecting the choice are:

empty arrays are clearly useful in the 1-dimensional case, and do not cause any problems
in many use cases, multidimensional arrays do not need to have empty dimensions, e.g. mathematical matrix applications
arrays may contain things other than numbers, in particular lists (e.g. in the case of the ic_sets library, which uses lists to represent sets) or general atoms. With (2a) it would not be possible to interpret [] as array element, e.g. it would always be skipped by foreachelem.
(2) is the more flexible choice in the sense that if foreachelem delivers [] as an array element, the application code can explicit test for this, and interpret it as an empty array and skip it.
empty sub-arrays are sometimes useful, in particular with ragged arrays

Programming techniques

The deeper reason for the discussed ambiguities is the use of a "defaulty" data structure, i.e. one where the different cases are not clearly distinguished by different functors (instead the rule is "when it looks like an array, it's an array, everything else is an array element"). There are two ways to avoid this:

Wrap all array elements into a structure elem/1. Then an array can only contain elem/1 and []/N structures, and there is no ambiguity.
Carry the array dimensions [M,N], or at least the number of dimensions along together with the array itself. That way, it should be clear which levels of a multi-dimensional array contain sub-arrays, and which not.

Note that 1 is unnecessary when the array contains only numbers or strings, which are anyway distinguishable from arrays.

Open issues

The behaviour of dim(M,[3,0]) giving M=[] is debatable, on the other hand, it is not clear why there should be multiple representations of an array that holds no elements. If desired, it is relatively easy to replace dim(M,[3,0]) with

    dim(M,[3]), (foreachelem([],M) do true)

Note that, because of (2), dim([]([],[]),D) must return D=[2].

It may be good to have a variant of foreachelem which specifies the array depth in the same way as array_flat/3.