In the previous post, we started our journey into OCaml with a light discussion of functional programming. Functional programming languages are expression-oriented, i.e. every (or nearly every) construct is an expression and thus yields a value. In this post, we will study some basic expressions through code snippets, which can be tried out directly in the interactive OCaml interpreter called the “toplevel”. Note that OCaml’s toolchain also includes a bytecode compiler and an optimizing native-code compiler. Don’t worry if you don’t have OCaml available on your system: the website Try OCaml provides a complete online REPL in a web page.

The toplevel repeatedly reads OCaml phrases from the input, typechecks, compiles, and evaluates them, then prints the inferred type and the resulting value. It prints a # prompt before reading each phrase; a phrase can span several lines and is terminated by a double semicolon (;;). Note that the double semicolon in the toplevel exists for historical reasons and is not really needed in source files that are compiled. Comments are written between (* and *): an opening (* absorbs everything as part of the comment until a well-balanced closing *) is found. Nested comments are handled correctly (Oh Yeah!).
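As a small sketch of the two points above, here is what a source file (as opposed to a toplevel session) might look like. The names sum and product are made up for illustration, and let is used here only to give the results a name.

```ocaml
(* An outer comment (* with a nested comment inside *) that ends only here. *)

(* In a source file, consecutive definitions need no double semicolon. *)
let sum = 2 + 3
let product = sum * 2

(* Print both values so the file does something visible when run. *)
let () = Printf.printf "%d %d\n" sum product
```

Pasting the same lines into the toplevel works too, as long as the final phrase is terminated with a double semicolon.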

In a terminal, simply type ocaml to launch the toplevel. One can use the toplevel as a calculator. OCaml provides the common arithmetic operators on int and float values (note that a float literal cannot start with a decimal point, so write 0.5 rather than .5). For example, type the following simple expressions in the toplevel:

 # 2+3;;
 - : int = 5
 # 2.+.3.;;
 - : float = 5.

Of course, we get the right answers, 5 and 5., respectively. Unlike the interactive prompt of a dynamically typed language such as Python, the toplevel prints the type of each result before its value, because OCaml is statically typed and infers the type before evaluating anything. On the other hand, it looks odd to C/C++ programmers that OCaml uses different operators (+ vs +.) for int and float. In fact, OCaml doesn’t have implicit type conversion, and thus the following expression is invalid:

 # 2.+3;;
 Error: This expression has type float but an expression was expected of type int
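To make the two operator families concrete, here is a minimal sketch that uses only standard operators. The binding names are illustrative, and let merely names each result.

```ocaml
(* Integer operators: no trailing dot. *)
let int_sum = 2 + 3
let int_quot = 7 / 2        (* integer division truncates *)
let int_rem = 7 mod 2       (* remainder of the division *)

(* Float operators: the same symbols followed by a dot. *)
let float_sum = 2. +. 3.
let float_quot = 7. /. 2.
let float_neg = -. 0.5      (* unary minus on floats also takes a dot *)

let () =
  Printf.printf "%d %d %d\n" int_sum int_quot int_rem;
  Printf.printf "%g %g %g\n" float_sum float_quot float_neg
```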

Without automatic casts, OCaml prevents some kinds of bugs that occur in C/C++. Moreover, having distinct sets of arithmetic operators for int and float makes type inference easier. This has further implications for subtyping and objects, which we will discuss later. In place of implicit conversion, OCaml provides a set of explicit conversion functions between the primitive types int, float, bool, char, and string.

 # int_of_float 2. + 3;;
 - : int = 5

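Here we apply the function int_of_float to convert 2. to an integer. In OCaml, function application is written by simple juxtaposition: the function comes first, followed by its argument, and no parentheses are needed. Function application binds more tightly than infix operators, so the expression above is read as (int_of_float 2.) + 3. Parentheses can still be used for disambiguation and to change precedence.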

 # float_of_int (2 + 3);;
 - : float = 5.
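The following sketch gathers a few of the standard conversion functions and shows how application precedence interacts with operators. The binding names are invented for the example. All functions used are part of the standard library that is opened by default.

```ocaml
(* Application binds tighter than infix operators. *)
let a = int_of_float 2.9 + 3      (* read as (int_of_float 2.9) + 3 *)
let b = float_of_int (2 + 3)      (* parentheses force the addition first *)
let c = float_of_int 2 +. 3.      (* read as (float_of_int 2) +. 3. *)

(* Other explicit conversions between primitive types. *)
let s = string_of_int 42
let n = int_of_string s + 1
let code = int_of_char 'A'
let flag = string_of_bool true

let () = Printf.printf "%d %g %g %s %d %d %s\n" a b c s n code flag
```

Note that int_of_float truncates toward zero, so 2.9 becomes 2 rather than 3.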

Now it is time for the classic hello world example.

 # print_string "Hello world!\n";;
 Hello world!
 - : unit = ()

There are several interesting things to notice. First, string is a built-in primitive type, and string literals are written in double quotes with the usual C-style backslash escapes. Strings can be concatenated with the operator ^. In essence, a string is a sequence of 8-bit bytes, and there is no built-in support for handling UTF-8 text. Historically, strings were unusual in OCaml in that they were mutable data structures; since OCaml 4.06, strings are immutable by default and the separate bytes type provides mutable byte sequences. We will discuss mutable data structures in another post.
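Here is a short sketch of these string properties. The binding names are illustrative. The two-byte escape sequence below is the UTF-8 encoding of the letter é, written with hexadecimal escapes so the example does not depend on the encoding of the source file.

```ocaml
(* Concatenation with the ^ operator. *)
let greeting = "Hello" ^ ", " ^ "world!"

(* String.length counts bytes, not characters. *)
let greeting_len = String.length greeting

(* One accented letter encoded in UTF-8 occupies two bytes. *)
let e_acute = "\xc3\xa9"
let e_acute_len = String.length e_acute

let () = Printf.printf "%s %d %d\n" greeting greeting_len e_acute_len
```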

Because every expression yields a value, this printing expression must also return a value. As shown in the toplevel, the value is written as () and has type unit. The unit type consists of exactly one value and is used as the type of expressions that have “no value”, i.e. expressions that are evaluated for their side effects only. Besides calls to printing functions, while and for loops (both of which are expressions, not statements) also have type unit. Just as while and for loops are expressions, so is if.
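The sketch below checks the claim about unit by annotating a few side-effecting expressions with the type unit; the file would not compile if any of them had a different type. The binding names are hypothetical.

```ocaml
(* A printing call evaluates to the single value of type unit. *)
let printed : unit = print_string "side effect only\n"

(* Loops are expressions too, and they also have type unit. *)
let for_result : unit = for i = 1 to 3 do print_int i done
let while_result : unit = while false do () done

let () = print_newline ()
```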

Before finding out the types of if expressions, let’s look at the bool type, since the presence of a conditional construct implies the presence of boolean values. The type bool is composed of two values, true and false. As usual, there are comparison operators that return boolean values. Most of these operators will look familiar: =, <>, <, <=, >, >=. The comparison operators are polymorphic, meaning they work on most built-in data types. Note that polymorphism and automatic casting are two different concepts: both operands of a comparison must have the same type. In fact, one can’t compare two values of different types (e.g. 1 = 1.), because the type checker rejects such an expression.
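As a rough illustration, the same comparison operators can be applied to several primitive types, provided both operands share a type. The names below are made up for the example.

```ocaml
let on_ints = 1 < 2
let on_floats = 2.5 >= 2.5
let on_strings = "apple" < "banana"     (* strings compare byte by byte *)
let on_chars = 'a' <> 'b'

(* Mixing int and float needs an explicit conversion first. *)
let converted = float_of_int 1 = 1.

(* The next line is rejected by the type checker if uncommented:
   let mixed = 1 = 1.
*)

let () =
  Printf.printf "%b %b %b %b %b\n"
    on_ints on_floats on_strings on_chars converted
```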

Because functions are first-class citizens just like any other values, one may wonder whether we can compare two functions for equality. The answer is no, because deciding whether two functions are equal is undecidable in general. If we could compare arbitrary functions, we could decide whether a function terminates, but the halting problem is undecidable for Turing machines and the lambda calculus alike. In practice, an expression such as f = g typechecks, but evaluating it on function values raises an Invalid_argument exception at runtime.
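A minimal sketch of what happens in practice when = is applied to two functions. The names succ_a, succ_b, and outcome are hypothetical, and the example borrows anonymous functions (fun) and exception handling (try ... with), which have not been covered yet in this series.

```ocaml
(* Two functions that obviously compute the same thing. *)
let succ_a = fun x -> x + 1
let succ_b = fun x -> x + 1

(* The comparison typechecks, but structural equality refuses to
   inspect function values and raises Invalid_argument instead. *)
let outcome =
  try
    if succ_a = succ_b then "equal" else "different"
  with Invalid_argument _ -> "cannot compare functions"

let () = print_endline outcome
```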

Back to if expressions. In OCaml, if behaves much more like the conditional operator ? : in C than like C's if statement. An if expression has the form

if <bool_expression> then <expression1> else <expression2>

OCaml requires that both branches of an if expression have the same type. If <bool_expression> evaluates to true, then <expression1> is evaluated and its value is the value of the if expression; otherwise <expression2> is evaluated and its value is the value of the if expression. Since an expression must have a value, the else part is required unless the then part has type unit; when else is omitted, it is implicitly else ().
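The rules above can be sketched as follows. The names x, parity, and larger are invented for the example, and let is used only to name the results.

```ocaml
let x = 7

(* Both branches are strings, so the whole expression is a string. *)
let parity = if x mod 2 = 0 then "even" else "odd"

(* An if expression can appear anywhere a value is expected. *)
let larger = 1 + (if x > 10 then x else 10)

(* The else part may be omitted because the then branch has type unit. *)
let () = if x > 0 then print_string "x is positive\n"

(* Rejected by the type checker if uncommented, since the branches
   have different types:
   let bad = if x > 0 then 1 else "one"
*)

let () = Printf.printf "%s %d\n" parity larger
```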

We have briefly discussed some aspects of expressions, but there are plenty of features and details left. You are very much encouraged to explore further on the official OCaml website. In the next post of this series, I will talk about variables in OCaml.
