Today I am very excited to announce that Smile 1.1 has been released! Among many improvements, it brings a new high-level Scala API, an interactive shell, and a project website with programming guides, API docs, and more.

With Smile 1.1, data scientists can develop advanced models with high-level Scala operators in the shell, and developers can deploy those models immediately in their applications. That is, data scientists and developers can now speak the same language!

To try out Smile, download a prebuilt package, available as a tarball or a Mac dmg file, from the releases page of the GitHub project.

Smile runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). All you need is Java available on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.

The download packages are built with Java 8, but Java 7 is sufficient if you build Smile from source. The Scala API uses Scala 2.11. See the project website for instructions on building it yourself.

After installing or unpacking, the easiest way to play with Smile is the new interactive shell, which comes with the Scala API pre-imported. The Java API can also be used in the shell through Scala/Java interoperability.

In the home directory of Smile, type

./bin/smile

to enter the shell, which is based on the Scala interpreter, so you can run any valid Scala expression in it. In the simplest case, you can use it as a calculator. In addition, all high-level Smile operators are predefined in the shell. By default, the shell uses up to 4GB of memory. If you need more memory to handle large data, use the option -J-Xmx. For example,

./bin/smile -J-Xmx8192M

You can also modify the configuration file ./conf/application.ini for the memory and other JVM settings.
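Because the shell accepts any valid Scala expression, a quick sanity check after startup might look like the sketch below. It uses only plain Scala and standard JDK calls, nothing Smile-specific. The variable names are illustrative, and the heap figure the JVM reports can be somewhat lower than the value passed to -J-Xmx.

```scala
// Plain Scala and JDK calls only; nothing here is Smile-specific.

// Use the shell as a calculator.
val xs = Seq(2.0, 4.0, 4.0, 5.0)
val mean = xs.sum / xs.size
val variance = xs.map(v => math.pow(v - mean, 2)).sum / xs.size
println(s"mean = $mean, variance = $variance")

// Check how much heap the JVM is allowed to use, e.g. after changing -J-Xmx.
val maxHeapMB = Runtime.getRuntime.maxMemory / (1024L * 1024L)
println(s"max heap = $maxHeapMB MB")
```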

In the shell, type :help to print the Scala interpreter's help information. To get help on Smile's high-level operators, type help. You can also get detailed information on each operator by typing help("command"), e.g. help("svm"). To exit the shell, type :quit.

In the shell, type demo to bring up the demo window, which shows off Smile’s various machine learning capabilities.

You can also type benchmark() to see Smile’s performance on a few test datasets. You can run a particular benchmark with benchmark("test name"), where test name could be “airline”, “usps”, etc.

The data directory also includes many open datasets that are frequently used in research and benchmarking. Now let’s build a classification model with Smile. It is as easy as

val data = readArff("data/weka/iris.arff", 4)
val (x, y) = data.unzipInt

val rf = randomForest(x, y)
println(s"OOB error = ${rf.error}")
rf.predict(x(0))

In this example, we use the famous Iris data from R.A. Fisher. The data is in Weka’s ARFF format. The second parameter of readArff is the column index of the response variable. With our parsers, column indices start at 0. The function readArff returns an AttributeDataset object, which contains metadata in addition to the data itself. Then we use the helper function unzipInt to get the training data and labels. For regression, you may use unzipDouble instead, as the response variable is real-valued. Finally, we train a random forest with default parameters and print out its OOB (out-of-bag) error. We can apply the model to new data samples with the method predict.
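As a small extension of the example, the sketch below applies predict to every training sample and computes the fraction classified correctly. It assumes it is run inside the Smile shell started from the Smile home directory, and it uses only the calls shown above; the names predictions, correct, and accuracy are illustrative.

```scala
// Assumes the Smile shell, started from the Smile home directory.
val data = readArff("data/weka/iris.arff", 4)
val (x, y) = data.unzipInt
val rf = randomForest(x, y)

// Predict every training sample with the predict method used above.
val predictions = x.map(xi => rf.predict(xi))

// Fraction of training samples whose predicted label matches the true label.
val correct = predictions.zip(y).count { case (p, t) => p == t }
val accuracy = correct.toDouble / y.length
println(s"training accuracy = $accuracy")
```

Accuracy measured on the training data is usually optimistic, so the OOB error printed earlier is the more honest estimate of how the model behaves on unseen samples.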

Smile offers many advanced classification and regression algorithms. For learning purposes, you can easily visualize their decision boundaries on 2D data to get a feel for how they behave.

SVM Classification Boundary

Advanced unsupervised learning algorithms, from clustering, vector quantization, and association rule mining to manifold learning and multi-dimensional scaling, are available in the Scala API too.

DBSCAN Clustering

Please download Smile and start building your cool new machine learning app now!
