A Guide to the S-Lang Language: Arrays

10. Arrays

An array is a container object that can contain many values of one data type. Arrays are very useful objects and are indispensable for certain types of programming. The purpose of this chapter is to describe how arrays are defined and used in the S-Lang language.

10.1 Creating Arrays

The S-Lang language supports multi-dimensional arrays of all data types. Since the Array_Type is a data type, one can even have arrays of arrays. To create a multi-dimensional array of SomeType use the syntax


      SomeType [dim0, dim1, ..., dimN]

Here dim0, dim1, ... dimN specify the size of the individual dimensions of the array. The current implementation permits arrays to consist of up to 7 dimensions. When a numeric array is created, all its elements are initialized to zero. The initialization of other array types depends upon the data type, e.g., String_Type and Struct_Type arrays are initialized to NULL.

As a concrete example, consider


     a = Integer_Type [10];

which creates a one-dimensional array of 10 integers and assigns it to a. Similarly,


     b = Double_Type [10, 3];

creates a 30 element array of double precision numbers arranged in 10 rows and 3 columns, and assigns it to b.

There is a more convenient syntax for creating and initializing 1-d arrays. For example, to create an array of ten integers whose elements run from 1 through 10, one may simply use:


     a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

Similarly,


     b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];

specifies an array of ten doubles.

An even more compact way of specifying a numeric array is to use a range-array. For example,


     a = [0:9];

specifies an array of 10 integers whose elements range from 0 through 9. The most general form of a range array is


     [first-value : last-value : increment]

where the increment is optional and defaults to 1. This creates an array whose first element is first-value and whose successive values differ by increment. last-value sets an upper limit upon the last value of the array. The number of elements in the array is given by the expression

1 + (last-value - first-value)/increment
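As a quick check of this expression, the following sketch builds two integer range arrays and prints their lengths. It assumes that the length and vmessage intrinsics are available in the interpreter being used; the variable names r and s are purely illustrative.

```slang
% [1:10:2] holds 1, 3, 5, 7, 9.  With integer division,
% 1 + (10 - 1)/2 = 1 + 4 = 5 elements.
variable r = [1:10:2];
vmessage ("r has %d elements", length (r));

% With the default increment of 1, [0:9] has 1 + (9 - 0)/1 = 10 elements.
variable s = [0:9];
vmessage ("s has %d elements", length (s));
```

Note that last-value itself need not appear in the array: in the first case the last element is 9, not 10.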

Another way to create an array is to apply the dereference operator @ to the DataType_Type literal Array_Type. The actual syntax for this operation resembles a function call:

variable a = @Array_Type (data-type, integer-array);

where data-type is of type DataType_Type and integer-array is a 1-d array of integers that specify the size of each dimension. For example,


     variable a = @Array_Type (Double_Type, [10, 20]);

will create a 10 by 20 array of doubles and assign it to a. This method of creating arrays derives its power from the fact that it is more flexible than the other methods discussed in this section: both the data type and the list of dimensions are passed as ordinary values. We shall encounter it again later in this chapter in the context of the array_info function.

10.2 Reshaping Arrays

It is sometimes possible to change the 'shape' of an array using the reshape function. For example, a 1-d 10 element array may be reshaped into a 2-d array consisting of 5 rows and 2 columns. The only restriction on the operation is that the arrays must be commensurate, that is, the old and new shapes must contain the same number of elements. The reshape function follows the syntax

reshape (array-name, integer-array);

where array-name specifies the array to be reshaped to have the dimensions given by integer-array, a 1-dimensional array of integers. It is important to note that this does not create a new array; it simply reshapes the existing array. Thus,


       variable a = Double_Type [100];
       reshape (a, [10, 10]);

turns a into a 10 by 10 array.
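The in-place nature of reshape can be seen with a second variable that refers to the same array. The sketch below is only an illustration: it uses the array_info function, which is described later in this chapter, and assumes the vmessage intrinsic for output. The names b, dims, num_dims, and type are chosen for the example.

```slang
variable i;
variable a = Integer_Type [10];
for (i = 0; i < 10; i++) a[i] = i;

variable b = a;          % b refers to the same array as a
reshape (a, [5, 2]);     % 5 * 2 = 10 elements, so the shapes are commensurate

% Because no new array was created, b sees the new shape as well.
variable dims, num_dims, type;
(dims, num_dims, type) = array_info (b);
vmessage ("b has %d dimensions: %d by %d", num_dims, dims[0], dims[1]);
```

A request such as reshape (a, [3, 4]) would be rejected here, since 12 elements cannot be formed from an array of 10.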

10.3 Indexing Arrays

An individual element of an array may be referred to by its index. For example, a[0] specifies the zeroth element of the one dimensional array a, and b[3,2] specifies the element in the third row and second column of the two dimensional array b. As in C, array indices are numbered from 0. Thus, if a is a one-dimensional array of ten integers, the last element of the array is given by a[9]. Using a[10] would result in a range error.

A negative index may be used to index from the end of the array, with a[-1] referring to the last element of a, a[-2] referring to the next to the last element, and so on.
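A short sketch of both indexing directions follows. The array contents are arbitrary, and vmessage is assumed to be available for printing.

```slang
variable a = [10, 20, 30, 40];

vmessage ("first: %d", a[0]);          % index 0 is the first element: 10
vmessage ("last: %d", a[-1]);          % same element as a[3]: 40
vmessage ("next to last: %d", a[-2]);  % same element as a[2]: 30
```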

One may use the indexed value like any other variable. For example, to set the third element of an integer array to 6, use


     a[3] = 6;

Similarly, that element may be used in an expression, such as


     y = a[3] + 7;

Unlike other S-Lang variables which inherit a type upon assignment, array elements already have a type. For example, an attempt to assign a string value to an element of an integer array will result in a type-mismatch error.

One may use any integer expression to index an array. A simple example that computes the sum of the elements of a 10 element 1-d array is


      variable i, sum;
      sum = 0;
      for (i = 0; i < 10; i++) sum += a[i];

Unlike many other languages, S-Lang permits arrays to be indexed by other integer arrays. Suppose that a is a 1-d array of 10 doubles. Now consider:


      i = [6:8];
      b = a[i];

Here, i is a 1-dimensional range array of three integers with i[0] equal to 6, i[1] equal to 7, and i[2] equal to 8. The statement b = a[i]; will create a 1-d array of three doubles and assign it to b. The zeroth element of b, b[0], will be set to the sixth element of a, or a[6], and so on. In fact, these two simple statements are equivalent to


     b = Double_Type [3];
     b[0] = a[6];
     b[1] = a[7];
     b[2] = a[8];

except that using an array of indices is not only much more convenient, but executes much faster.

More generally, one may use an index array to specify which elements are to participate in a calculation. For example, consider


     a = Double_Type [1000];
     i = [0:499];
     j = [500:999];
     a[i] = -1.0;
     a[j] = 1.0;

This creates an array of 1000 doubles and sets the first 500 elements to -1.0 and the last 500 to 1.0. Actually, one may do away with the i and j variables altogether and use


     a = Double_Type [1000];
     a [[0:499]] = -1.0;
     a [[500:999]] = 1.0;

It is important to understand the syntax used and, in particular, to note that a[[0:499]] is not the same as a[0:499]. In fact, the latter will generate a syntax error.

Often, it is convenient to use a rubber range to specify indices. For example, a[[500:]] specifies all elements of a whose index is greater than or equal to 500. Similarly, a[[:499]] specifies the first 500 elements of a. Finally, a[[:]] specifies all the elements of a.
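The three rubber-range forms can be tried on a small array. This is a minimal sketch using a 10 element array rather than the 1000 element array of the earlier example; the variable b and the use of vmessage are assumptions made for illustration.

```slang
variable a = Double_Type [10];

a[[:4]] = -1.0;         % indices 0 through 4
a[[5:]] = 1.0;          % indices 5 through 9

variable b = a[[:]];    % all ten elements, collected into a new array

vmessage ("a[4] = %g, a[5] = %g, b has %d elements", a[4], a[5], length (b));
```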

Now consider a multi-dimensional array. For simplicity, suppose that a is a 100 by 100 array of doubles. Then the expression a[0, [:]] specifies all elements in the zeroth row. Similarly, a[[:], 7] specifies all elements in the seventh column. Finally, a[[3:5], [6:12]] specifies the 3 by 7 region consisting of rows 3, 4, and 5, and columns 6 through 12 of a.
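The following sketch extracts a row, a column, and a rectangular region, and then reports the shape of the region. It relies on the array_info function described later in this chapter and assumes that vmessage is available; the names row, col, and block are illustrative.

```slang
variable a = Double_Type [100, 100];

variable row = a[0, [:]];             % the 100 elements of row 0
variable col = a[[:], 7];             % the 100 elements of column 7
variable block = a[[3:5], [6:12]];    % rows 3 through 5, columns 6 through 12

variable dims, num_dims, type;
(dims, num_dims, type) = array_info (block);
vmessage ("block is %d by %d", dims[0], dims[1]);
```

Note the comma separating the row indices from the column indices; each dimension receives its own index or index array.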

We conclude this section with a few examples.

Here is a function that computes the trace (sum of the diagonal elements) of a square 2 dimensional n by n array:


      define array_trace (a, n)
      {
         variable sum = 0, i;
         for (i = 0; i < n; i++) sum += a[i, i];
         return sum;
      }

This fragment creates a 10 by 10 integer array, sets its diagonal elements to 5, and then computes the trace of the array:


      a = Integer_Type [10, 10];
      for (j = 0; j < 10; j++) a[j, j] = 5;
      the_trace = array_trace(a, 10);

We can get rid of the for loop as follows:


      j = Integer_Type [10, 2];
      j[[:],0] = [0:9];
      j[[:],1] = [0:9];
      a[j] = 5;

Here, the goal was to construct a 2-d array of indices that correspond to the diagonal elements of a, and then use that array to index a. To understand how this works, consider the middle statements. They are equivalent to the following for loops:


      variable i;
      for (i = 0; i < 10; i++) j[i, 0] = i;
      for (i = 0; i < 10; i++) j[i, 1] = i;

Thus, row n of j will have the value (n,n), which is precisely what was sought.

Another example of this technique is the function:


      define unit_matrix (n)
      {
         variable a = Integer_Type [n, n];
         variable j = Integer_Type [n, 2];
         j[[:],0] = [0:n - 1];
         j[[:],1] = [0:n - 1];
         
         a[j] = 1;
         return a;
      }

This function creates an n by n unit matrix, that is, a 2-d n by n array whose elements are all zero except on the diagonal, where they have a value of 1.

10.4 Arrays and Variables

When an array is created and assigned to a variable, the interpreter allocates the proper amount of space for the array, initializes it, and then assigns to the variable a reference to the array. So, a variable that represents an array has a value that is really a reference to the array. This has several consequences, some good and some bad. It is believed that the advantages of this representation outweigh the disadvantages. First, we shall look at the positive aspects.

When a variable is passed to a function, it is always the value of the variable that gets passed. Since the value of a variable representing an array is a reference, a reference to the array gets passed. One major advantage of this is rather obvious: it is a fast and efficient way to pass the array. This also has another consequence that is illustrated by the function


      define init_array (a, n)
      {
         variable i;
         
         for (i = 0; i < n; i++) a[i] = some_function (i);
      }

where some_function is a function that generates a scalar value to initialize the ith element. This function can be used in the following way:


      variable X = Double_Type [100000];
      init_array (X, 100000);

Since the array is passed to the function by reference, there is no need to make a separate copy of the 100000 element array. As pointed out above, this saves both execution time and memory. The other salient feature to note is that any changes made to the elements of the array within the function will be manifested in the array outside the function. Of course, in this case, this is a desirable side-effect.

To see the downside of this representation, consider:


      variable a, b;
      a = Double_Type [10];
      b = a;
      a[0] = 7;

What will be the value of b[0]? Since the value of a is really a reference to the array of ten doubles, and that reference was assigned to b, b also refers to the same array. Thus any changes made to the elements of a will also be made implicitly to b.
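The effect can be observed directly by printing b[0] after the assignment to a[0]. The sketch below simply repeats the fragment above and adds an output statement, assuming the vmessage intrinsic.

```slang
variable a, b;
a = Double_Type [10];
b = a;               % b and a now refer to the same array
a[0] = 7;

% The change made through a is visible through b.
vmessage ("b[0] = %g", b[0]);
```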

This raises the question: if assigning one variable that represents an array to another variable merely assigns a reference to the array, then how does one make a separate copy of the array? There are several answers, including using an index array, e.g., b = a[[:]]; however, the most natural method is to use the dereference operator:


      variable a, b;
      a = Double_Type [10];
      b = @a;
      a[0] = 7;

In this example, a separate copy of a will be created and assigned to b. It is very important to note that S-Lang never implicitly dereferences an object; one must explicitly use the dereference operator. This means that the elements of a dereferenced array are not themselves dereferenced. For example, consider dereferencing an array of arrays, e.g.,


      variable a, b;
      a = Array_Type [2];  
      a[0] = Double_Type [10];
      a[1] = Double_Type [10];
      b = @a;

In this example, b[0] will be a reference to the array that a[0] references because a[0] was not explicitly dereferenced.
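The sharing of the inner arrays, and one way to avoid it, can be sketched as follows. This is an illustration based on the behavior described above, not a definitive recipe; the helper variable inner is hypothetical and is used so that each statement indexes only one level at a time.

```slang
variable a, b, inner;
a = Array_Type [2];
a[0] = Double_Type [10];
a[1] = Double_Type [10];
b = @a;                  % copies the outer array only

inner = a[0];            % the array that a[0] refers to
inner[0] = 7.0;          % modify it through a

inner = b[0];            % b[0] refers to that same inner array
vmessage ("first element seen through b[0]: %g", inner[0]);

% To make b[1] independent of a[1], dereference the element explicitly.
inner = a[1];
b[1] = @inner;
```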

10.5 Using Arrays in Computations

Many functions and operations work transparently with arrays. For example, if a and b are arrays, then the sum a + b is an array whose elements are formed from the sum of the corresponding elements of a and b. A similar statement holds for all other binary and unary operations.
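A few element-wise operations on small integer arrays are sketched below. The values are arbitrary, and vmessage is assumed to be available for output.

```slang
variable a = [1, 2, 3];
variable b = [10, 20, 30];

variable c = a + b;     % 11, 22, 33
variable d = -a;        % -1, -2, -3
variable e = 2 * a;     % 2, 4, 6: the scalar is combined with each element

vmessage ("c = %d %d %d", c[0], c[1], c[2]);
```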

Let's consider a simple example. Suppose that we wish to solve a set of n quadratic equations whose coefficients are given by the 1-d arrays a, b, and c. In general, the solution of a quadratic equation will be two complex numbers. For simplicity, suppose that all we really want is to know what subset of the coefficients, a, b, c, correspond to real-valued solutions. In terms of for loops, we can write:


     variable i, d, index_array;
     index_array = Integer_Type [n];
     for (i = 0; i < n; i++)
       {
          d = b[i]^2 - 4 * a[i] * c[i];
          index_array [i] = (d >= 0.0);
       }

In this example, the array index_array will contain a non-zero value if the corresponding set of coefficients has a real-valued solution. This code may be written much more compactly and with more clarity as follows:


     variable index_array = ((b^2 - 4 * a * c) >= 0.0);

S-Lang has a powerful built-in function called where. This function takes an array of integers and returns a 2-d array of indices that correspond to where the elements of the input array are non-zero. This simple operation is extremely useful. For example, suppose a is a 1-d array of n doubles, and it is desired to set to zero all elements of the array whose value is less than zero. One way is to use a for loop:


     for (i = 0; i < n; i++) 
       if (a[i] < 0.0) a[i] = 0.0;

If n is a large number, this statement can take some time to execute. The optimal way to achieve the same result is to use the where function:


     a[where (a < 0.0)] = 0;

Here, the expression (a < 0.0) returns an array whose dimensions are the same size as a but whose elements are either 1 or 0, according to whether or not the corresponding element of a is less than zero. This array of zeros and ones is then passed to where which returns a 2-d integer array of indices that indicate where the elements of a are less than zero. Finally, those elements of a are set to zero.
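The same steps can be followed on a small array with known contents. In this sketch the intermediate variable neg is introduced only to make the result of where visible; the length and vmessage intrinsics are assumed.

```slang
variable a = [-1.5, 2.0, -3.0, 4.0];

variable neg = where (a < 0.0);    % indices of the negative elements
vmessage ("%d negative elements", length (neg));

a[neg] = 0;                        % a is now 0, 2, 0, 4
```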

As a final example, consider once more the example involving the set of n quadratic equations presented above. Suppose that we wish to get rid of the coefficients of the previous example that generated non-real solutions. Using an explicit for loop requires code such as:


     variable i, j, nn, tmp_a, tmp_b, tmp_c;
     
     nn = 0;
     for (i = 0; i < n; i++) 
       if (index_array [i]) nn++;
     
     tmp_a = Double_Type [nn];
     tmp_b = Double_Type [nn];
     tmp_c = Double_Type [nn];
     
     j = 0;
     for (i = 0; i < n; i++)
       {
          if (index_array [i]) 
            {
               tmp_a [j] = a[i];
               tmp_b [j] = b[i];
               tmp_c [j] = c[i];
               j++;
            }
       }
     a = tmp_a;
     b = tmp_b;
     c = tmp_c;

Not only is this a lot of code, it is also clumsy and error-prone. Using the where function, this task is trivial:


     variable i;
     i = where (index_array != 0);
     a = a[i];
     b = b[i];
     c = c[i];
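To make the where-based version concrete, here is a sketch with three hypothetical sets of coefficients. The discriminants are 0, -4, and 4, so the second equation has no real solutions and is dropped.

```slang
% x^2 + 2x + 1 = 0,  x^2 + 1 = 0,  x^2 - 1 = 0
variable a = [1.0, 1.0, 1.0];
variable b = [2.0, 0.0, 0.0];
variable c = [1.0, 1.0, -1.0];

% Non-zero where the discriminant is non-negative.
variable index_array = ((b^2 - 4 * a * c) >= 0.0);

variable i = where (index_array != 0);
a = a[i];
b = b[i];
c = c[i];

vmessage ("%d equations kept", length (a));
```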

All the examples up to now assumed that the dimensions of the array were known. However, the function array_info may be used to get information about an array, such as its data type and size. The function returns three values: an integer array containing the size of each dimension, the number of dimensions, and the data type. It may be used to determine the number of rows of an array as follows:


     define num_rows (a)
     {
        variable dims, type, num_dims;
        
        (dims, num_dims, type) = array_info (a);
        return dims[0];
     }

The number of columns may be obtained in a similar manner:


     define num_cols (a)
     {
        variable dims, type, num_dims;
        
        (dims, num_dims, type) = array_info (a);
        if (num_dims > 1) return dims[1];
        return 1;
     }

Another use of array_info is to create an array that has the same dimensions as another array:


     define make_int_array (a)
     {
        variable dims, num_dims, type;
        
        (dims, num_dims, type) = array_info (a);
        return @Array_Type (Integer_Type, dims);
     }
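The same idea can be tried without a function wrapper. The sketch below builds an integer array shaped like a 4 by 6 array of doubles and then inspects the result; the variable names are illustrative, and vmessage is assumed to be available.

```slang
variable a = Double_Type [4, 6];
variable dims, num_dims, type;

(dims, num_dims, type) = array_info (a);
variable b = @Array_Type (Integer_Type, dims);    % same shape, integer elements

(dims, num_dims, type) = array_info (b);
vmessage ("b is %d by %d", dims[0], dims[1]);
if (type == Integer_Type) vmessage ("b holds integers");
```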
