In previous post, we showed several simple usages of pattern matching. Pattern matching is actually even more useful with algebraic data types. In functional programming, an algebraic data type is a kind of composite type, i.e. a type formed by combining other types. Two common classes of algebraic type are Cartesian product types, i.e. tuples and records, and sum types (also called variants).

Tuples

A tuple is a collection of values, called fields, of arbitrary types.

# let t = (1, "ocaml");;
val t : int * string = (1, "ocaml")

To create a tuple, we write multiple values in order between (optional) parentheses and separated by commas. On the other hand, the corresponding type is written as an ordered sequence of n types, separated by *. We usually work with tuple values directly and let the compiler infer the type. Of course, we can explicitly define a tuple type.

# type mytype = int * string;;
type mytype = int * string

The way to get values back out of a tuple is by pattern matching:

# let (x, y) = t in x;;
- : int = 1
# let (x, y) = t in y;;
- : string = "ocaml"

The structure (x, y) is a pattern to match the fields of tuple. This is a perfect example that patterns are used to select data structures of a given shape, and bind identifiers to components of the data structure. Because it is a very common operation, OCaml has the helper functions fst and snd to return the first and second field of a pair.

# fst t;;
- : int = 1
# snd t;;
- : string = "ocaml"

We can easily implement these functions:

# let fst (x, _) = x;;
val fst : 'a * 'b -> 'a = <fun>
# let snd (_, y) = y;;
val snd : 'a * 'b -> 'b = <fun>

Note that fst and snd take only one argument that is a tuple. A C/C++ programmer may mistake it as two arguments. It is actually a pattern matching of the input argument. Besides, a variable cannot be bound several times by a given pattern. It is a common mistake that beginners use the same variable twice in a pattern to test for equality between two parts of a data structure.

# let eq_pair = function
  | (x, x) -> true
  | _ -> false;;
Error: Variable x is bound several times in this matching

However, we can use the when guards for this purpose.

# let eq_pair = function
  | (x, y) when x = y -> true
  | _ -> false;;
val eq_pair : 'a * 'a -> bool = <fun>

In above examples, you may notice interesting things like 'a and 'b in the types of functions fst, snd and eq_pair. They are the type variables of the form 'identifier to enable parametric polymorphism in OCaml. When such a polymorphic function is applied to an argument, the type inference mechanism knows how to instantiate the type variable. So for each particular use, the type of a polymorphic function can be determined. We will discuss the polymorphism in details in the next post.

Records

Records are similar to tuples for holding multiple values. However, they carry an unordered collection of labeled values. Before making a record, we must declare its type that is enclosed in braces and each field has a name and a type of its own.

# type person = {
  name : string;
  age  : int;
};;
type person = { name : string; age : int; }

Record expressions are of the form {f1=e1;...;fn=en} where the identifiers fi are field labels and ei are values.

# let peter = {name = "Peter Pan"; age = 7};;
val peter : person = {name = "Peter Pan"; age = 7}

Since fields are unordered in records, it does not matter in what order of fields when creating a record. To “update” a record, we use the keyword “with”.

# let peter = {peter with age = 10};;
val peter : person = {name = "Peter Pan"; age = 10}

Note that records are immutable by default. The “update” actually just create a new record. In the example, we don’t declare the type of record peter while the compiler successfully determines the type based on the filed labels. This is possible because records have a flat namespace. In old time, this will confuse the compiler’s type inference algorithm if fields in different records share the same name. Since OCaml 4.01, the compiler is smart enough to handle it in many situations.

# type person =
  {
    name : string;
    age  : int;
  };;
type person = { name : string; age : int; }
# type people =
  {
    name   : string;
    height : float;
  };;
type people = { name : string; height : float; }
# {name = "Peter Pan"; age = 10};;
- : person = {name = "Peter Pan"; age = 10}
# {name = "Alice"; height = 4.};;
- : people = {name = "Alice"; height = 4.}

Because the fields are labeled in records, it is easy to access them.

# peter.name;;
- : string = "Peter Pan"
# peter.age;;
- : int = 10

On the other hand, pattern matching is still our good friend to work with records.

# let print_person = function
  | {age; _}    when age < 10 -> print_string "kids"
  | {age; name} when age < 20 -> print_string name
  | _ -> ();;
val print_person : person -> unit = <fun>

Variants

In C++, we have the union type to allow a piece of memory to be accessed as different data types. However, it is error-prone and has many restrictions. The boost library provides the variant class as a safer alternative. However, it is still tedious to manipulate. In OCaml, we have a much better concept, the sum types or variants, to organize heterogeneous values of various types into the same type. The declaration of a variant type lists all possible shapes for values of that type. Each case is identified by a name, called a constructor, which serves both for constructing values of the variant type and inspecting them by pattern matching. Constructor names are capitalized to distinguish them from variable names that must start with a lowercase letter. Variants can be introduced with the following syntax:

type typename =
| Identifier1 of type1
| Identifier2 of type2
| Identifier3 of type3
| Identifier4 of type4

Like match with construct, the first vertical bar is also optional. The enum types in C++ are a special case of variants of constant constructors that take no arguments.

# type t =
  | Male
  | Female;;
type t = Male | Female

Once introduced, a constant constructor can be used afterwards exactly like a numeric constant.

# let x = Male;;
val x : t = Male

Lists

Of course, constructors with arguments are of more interest. In fact, the list types are sum types. Lists are extensively used in functional programming. A list is a sequence of values of the same type in the form [e1; e2; …; en]. Lists can also be built from the empty list [] by adding elements in front using the :: (“cons”) constructor.

# [1; 2; 3; 4];;
- : int list = [1; 2; 3; 4]
# "Life" :: ["is"; "good"];;
- : string list = ["Life"; "is"; "good"]

Conceptually, we can define the list type as

# type 'a list =
  | E (* empty list *)
  | Cons of 'a * 'a list;;
type 'a list = E | Cons of 'a * 'a list

As built-in types, E and Cons are replaced by [] and :: in OCaml respectively, providing a feeling of operators. But they are not operators as shown in the following

# (::) 1 [];;
Error: Syntax error: operator expected.

But it is okey to do

# (::)(1, []);;
- : int list = [1]

which shows that [] and :: are actually constructors.

In the type definition of lists, 'a is a type variable and stands for arbitrary type. Right now, C++ programmers may regard it as the type parameters in C++ templates although the mechanisms are different.

Clearly, lists in OCaml are same as the singly linked lists in C/C++ that are managed by pointers. But OCaml programmers do not need to manage memory and pointers explicitly. All memory management is done automatically by the compiler and runtime.

As with tuples and records, inspecting and destructuring lists are performed by pattern matching. List patterns have the exact same shape as list expressions, with identifier representing unspecified parts of the list. For example, we can define the function mem to test if a value exists in a list, and the function assoc to return the value associated with a given key in the list of pairs:

# let rec mem x = function
  | [] -> false
  | y :: l -> x = y || mem x l;;
val mem : 'a -> 'a list -> bool = <fun>
# let rec assoc key = function
  | (k, v) :: _ when k = key -> v
  | (k, v) :: l -> assoc key l
  | [] -> raise Not_found;;
val assoc : 'a -> ('a * 'b) list -> 'b = <fun>

It is not necessary to implement these functions by yourself. They are available in the module List of standard library. Of the List module, the map function is probably most used in programs. It applies a given function to each element of a list, and returns the list of the results.

# List.map (fun n -> n * 2) [1;2;3;4];;
- : int list = [2; 4; 6; 8]

The map function can be easily defined as follows

let rec map f = function
  | [] -> []
  | x :: l -> f x :: map f l;;

It is also a good example that functional programming prefers recursions to iterations.

Options

Options are another useful standard type that is defined by constructors.

type 'a option =
  | None
  | Some of 'a;;

In OCaml, options are widely used to represent nullable values. It is a little like NULL in C/C++, but in a type and memory safe way. Recall optional arguments are arguments with default values. In fact, we may not provide the default value of optional arguments in the definition of a function. If so, the optional argument will be of option type with the default value None.

# let add ?x y =
  match x with
  | None -> y
  | Some x -> x + y;;
val add : ?x:int -> int -> int = <fun>

We have reviewed some OCaml’s built-in algebraic data types. A lot of power of OCaml comes from its rich type system indeed. During our review, we also frequently meet polymorphic types and functions. In the next post, we will discuss OCaml’s approaches of polymorphism.

Advertisements