Smile 1.2.0 Released!

Dear Smilers,

We are proud to announce the release of Smile 1.2.0.

The key features of the 1.2.0 release are:

  • Headless plot. Smile’s plot functions depend on Java Swing. In server applications, we need to generate plots without creating Swing windows. With headless plot (enabled by the -Djava.awt.headless=true JVM option), we can create plots as follows:
    val canvas = ScatterPlot.plot(x, '.')
    val headless = new Headless(canvas)
  • All classification and regression models can be serialized by
    write(model) // Java serialization
    write.xstream(model) // XStream serialization
  • Refactor of Scala API.
    • Parsers are in object.
    • Parse JDBC ResultSet to AttributeDataset.
    • Model serialization methods in smile.write object.
  • Platt scaling for SVM
  • Smile NLP tokenizers are Unicode-aware.
  • Least squares can now handle rank-deficient matrices.
  • Various code improvements.
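The post only says that `write(model)` uses Java serialization. As a rough illustration of what Java serialization of a model involves, here is a minimal round trip using only `java.io`. `LinearModel`, `SerializationSketch`, `save` and `load` are hypothetical names for this sketch, not part of Smile's API.

```scala
import java.io._

// Hypothetical stand-in for a trained model. Case classes are Serializable
// by default, so they can be written with plain Java serialization.
case class LinearModel(weights: Array[Double], bias: Double) {
  def predict(x: Array[Double]): Double =
    weights.zip(x).map { case (w, xi) => w * xi }.sum + bias
}

object SerializationSketch {
  // Write any Serializable object to a file.
  def save(obj: Serializable, file: File): Unit = {
    val out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)))
    try out.writeObject(obj) finally out.close()
  }

  // Read the object back and cast it to the expected type.
  def load[T](file: File): T = {
    val in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(file))) {
      // Resolve classes with the class loader of this code, which helps when
      // the model class was compiled by a REPL or script runner.
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, getClass.getClassLoader)
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

Java serialization ties the file to the exact class definition, which is one reason a text-based alternative such as XStream can be handy.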
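Platt scaling turns raw SVM decision values f into probabilities by fitting a sigmoid P(y = 1 | f) = 1 / (1 + exp(A·f + B)). Below is a simplified sketch, assuming labels in {+1, -1} and plain gradient descent on the log loss. Platt's paper uses a more careful Newton-type optimizer, and this is not Smile's implementation; `PlattScaling`, `fit` and `probability` are hypothetical names, and the learning rate and iteration count are assumed defaults.

```scala
object PlattScaling {
  // Fit A and B in P(y = 1 | f) = 1 / (1 + exp(A * f + B)) by gradient
  // descent on the negative log-likelihood. Labels are assumed to be +1 / -1.
  def fit(f: Array[Double], y: Array[Int], iters: Int = 5000, lr: Double = 0.1): (Double, Double) = {
    val nPos = y.count(_ == 1)
    val nNeg = y.length - nPos
    // Platt's smoothed targets instead of hard 0/1 labels, to reduce overfitting.
    val tPos = (nPos + 1.0) / (nPos + 2.0)
    val tNeg = 1.0 / (nNeg + 2.0)
    val t = y.map(label => if (label == 1) tPos else tNeg)
    val n = f.length
    var a = 0.0
    var b = math.log((nNeg + 1.0) / (nPos + 1.0))
    for (_ <- 0 until iters) {
      var gA = 0.0
      var gB = 0.0
      for (i <- 0 until n) {
        val p = 1.0 / (1.0 + math.exp(a * f(i) + b))
        // d(NLL)/dA = sum (t - p) * f and d(NLL)/dB = sum (t - p).
        gA += (t(i) - p) * f(i)
        gB += t(i) - p
      }
      a -= lr * gA / n
      b -= lr * gB / n
    }
    (a, b)
  }

  // Calibrated probability of the positive class for decision value f.
  def probability(a: Double, b: Double, f: Double): Double =
    1.0 / (1.0 + math.exp(a * f + b))
}
```

In practice the sigmoid should be fitted on decision values from held-out data (or cross-validation), not on the SVM's own training outputs.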
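To illustrate what a Unicode-aware tokenizer buys, the sketch below compares Java's default `\w` regex class, which only matches ASCII word characters, with Unicode property classes. It is an illustration only, not Smile's tokenizer; `UnicodeTokenizer` and its methods are hypothetical.

```scala
object UnicodeTokenizer {
  // ASCII-only word pattern: by default \w in java.util.regex means [a-zA-Z_0-9].
  private val asciiWord = """\w+""".r
  // Unicode-aware pattern: runs of letters (\p{L}), combining marks (\p{M})
  // or digits (\p{N}) in any script.
  private val unicodeWord = """[\p{L}\p{M}\p{N}]+""".r

  def asciiTokens(text: String): List[String] = asciiWord.findAllIn(text).toList
  def unicodeTokens(text: String): List[String] = unicodeWord.findAllIn(text).toList
}
```

For example, the ASCII pattern splits "Café naïve Zürich" into fragments such as "Caf", "na", "ve", "Z" and "rich", while the Unicode-aware pattern keeps each word whole.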
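When the n × p design matrix X has rank r < p, XᵀX is singular, so the normal equations have no unique solution. One standard remedy, shown here as a sketch and not necessarily what Smile does internally, is the minimum-norm solution obtained from the SVD X = UΣVᵀ, where singular values below a tolerance ε are treated as zero:

```latex
\hat{\beta} = V \Sigma^{+} U^{\top} y,
\qquad
\Sigma^{+}_{ii} =
\begin{cases}
1/\sigma_i & \text{if } \sigma_i > \epsilon,\\
0 & \text{otherwise.}
\end{cases}

% Example: two identical columns, so rank(X) = 1 < p = 2.
X = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix},\quad
y = \begin{pmatrix} 1 \\ 2 \end{pmatrix}
\;\Rightarrow\;
X\beta = y \iff \beta_1 + \beta_2 = 1,
\qquad
\hat{\beta} = \arg\min_{\beta_1 + \beta_2 = 1} \|\beta\|_2
= \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}.
```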

Unicorn 2.0 is Released!


There are a lot of NoSQL databases out there. We have used or tried out many of them, and we love a lot of the cool features they offer. However, we also face many unique challenges in the highly regulated HCM SaaS business, so we kept looking for the unicorn database that would meet our requirements. Unfortunately, none of the existing solutions fully addresses all of our challenges. So two years ago we asked ourselves whether we could build our own solution. That is how the Unicorn database was born. Unicorn is built on top of BigTable-like storage engines such as Cassandra, HBase, or Accumulo. With different storage engines, we can achieve different strategies for consistency, replication, etc. Beyond the plain abstraction of the BigTable data model, Unicorn provides an easy-to-use document data model and a MongoDB-like API. Moreover, Unicorn supports directed property multigraphs, and documents can simply be vertices in a graph. With the built-in document and graph data models, developers can focus on the business logic rather than on tedious key-value pair manipulation. Of course, developers are still free to use key-value pairs for flexibility in special cases.

During the past two years, we have learned a lot and made many improvements, resulting in Unicorn 2.0, which we are excited to open source to the community.
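To make the contrast between a document model and raw key-value cells concrete, here is a hypothetical sketch of how a nested document and a graph edge could be laid out as BigTable-style cells (row key, column family, column qualifier, value). The `Cell` type, the `doc` and `graph` family names, and the dotted-qualifier and `label:target` encodings are all assumptions for illustration, not Unicorn's actual storage format.

```scala
// Hypothetical BigTable-style cell, for illustration only.
final case class Cell(rowKey: String, family: String, qualifier: String, value: String)

object DocumentToCells {
  // Flatten a (possibly nested) document into the cells of one row.
  // Nested fields become dotted qualifiers, e.g. "address.city".
  // Fields are sorted by name so the output order is deterministic.
  def flatten(rowKey: String, family: String, doc: Map[String, Any], prefix: String = ""): List[Cell] =
    doc.toList.sortBy(_._1).flatMap {
      case (k, nested: Map[_, _]) =>
        flatten(rowKey, family, nested.asInstanceOf[Map[String, Any]], prefix + k + ".")
      case (k, v) =>
        List(Cell(rowKey, family, prefix + k, v.toString))
    }

  // Store the edge (from)-[label]->(to) as a cell in the source document's row,
  // so the same row serves as both a document and a graph vertex.
  def edge(from: String, label: String, to: String): Cell =
    Cell(from, "graph", s"$label:$to", "1")
}
```

A document API hides this kind of encoding from application code, which is the "tedious key-value pair manipulation" the post refers to.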

In Memory of Andy Grove




Legendary former Intel CEO Andy Grove left us recently. Wearing many hats, he was an entrepreneur, a teacher, a writer, a philanthropist, and more. As Marc Andreessen said on Twitter, he was “the best company builder Silicon Valley has ever seen, and likely will ever see”. Even after more than thirty years, his book “High Output Management” is still a must-read for all middle managers.

In another of his best-selling books, “Only the Paranoid Survive”, he introduced concepts such as “strategic inflection point” and “strategic dissonance”, which have become part of the lexicon in both academia and practice. A strategic inflection point is a time in the life of a business when its fundamentals are about to change. That change can mean an opportunity to rise to new heights. But it may just as likely signal the beginning of the end.


Andy Grove steered Intel through several strategic inflection points, for example, the shift from the memory business to microprocessors when Intel realized it couldn’t keep up with Japanese competition. Soon Intel will face another inflection point. In fact, this new inflection point has already begun. Unfortunately, Intel no longer has Andy Grove.

Smile 1.1 is Released!



Today I am very excited to announce that Smile 1.1 is released! Among many improvements, it brings a new high-level Scala API, an interactive Shell, and a nice project website with programming guides, API docs, etc.!

With Smile 1.1, data scientists can develop advanced models with high-level Scala operators in the Shell, and developers can deploy them immediately in the app. That is, data scientists and developers can now speak the same language!

There is no big data in machine learning



Adult harlequin Great Dane standing in front of a white background

Back in graduate school, I worked on the so-called small sample size problem. In particular, I was working on linear discriminant analysis (LDA). For high-dimensional data (e.g. images, gene expression, etc.), the within-class scatter matrix is singular when the number of samples is smaller than the dimensionality. Therefore, LDA cannot be applied directly. You may think that we no longer have such small sample size problems in the era of Big Data. Well, the challenge is deeper than it looks.
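A quick numerical check of the singularity claim, using assumed toy data: with d = 5 features, n = 4 samples and k = 2 classes, the within-class scatter matrix S_w is built from n deviation vectors that sum to zero within each class, so its rank is at most n - k = 2 < 5 and it cannot be inverted. The object and method names are hypothetical, and the rank is computed by Gaussian elimination with an assumed tolerance.

```scala
object SmallSampleSize {
  type Matrix = Array[Array[Double]]

  // S_w = sum over classes c of sum over samples x in c of (x - mu_c)(x - mu_c)^T.
  def withinClassScatter(x: Matrix, labels: Array[Int]): Matrix = {
    val d = x.head.length
    val sw = Array.ofDim[Double](d, d)
    for (c <- labels.distinct) {
      val members = x.indices.filter(i => labels(i) == c).map(x(_))
      val mu = Array.tabulate(d)(j => members.map(_(j)).sum / members.size)
      for (v <- members; i <- 0 until d; j <- 0 until d)
        sw(i)(j) += (v(i) - mu(i)) * (v(j) - mu(j))
    }
    sw
  }

  // Numerical rank via Gaussian elimination with partial pivoting.
  def rank(m: Matrix, tol: Double = 1e-9): Int = {
    val a = m.map(_.clone)
    val rows = a.length
    val cols = a.head.length
    var r = 0
    for (c <- 0 until cols) {
      if (r < rows) {
        // Pick the largest remaining entry in column c as the pivot.
        val p = (r until rows).maxBy(i => math.abs(a(i)(c)))
        if (math.abs(a(p)(c)) > tol) {
          val tmp = a(p); a(p) = a(r); a(r) = tmp
          for (i <- r + 1 until rows) {
            val f = a(i)(c) / a(r)(c)
            for (j <- c until cols) a(i)(j) -= f * a(r)(j)
          }
          r += 1
        }
      }
    }
    r
  }
}
```

With four 5-dimensional samples in two classes, `rank(withinClassScatter(x, labels))` comes out as 2, so inverting S_w, as classical LDA requires, is impossible.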

High End Disruption



Professor Clayton Christensen’s theory of disruptive innovation has enjoyed huge success in examining low-end disruptions and new-market disruptions. But it has recently had difficulty explaining high-end disruptions such as the iPhone and Tesla. In fact, technologies that start in the high-end market and then reach the mainstream market are not new. Thomas Edison did it more than 100 years ago.


Similarly, after Henry Ford invented the Model T, only the rich (and cowboys and cowgirls) rode horses. Today, Elon Musk is doing it again!


Tomorrow, when self-driving cars take over the mainstream market, only a few will be able to afford to drive a car themselves.

The Other Side of the Sharing Economy




The Sharing Economy now touches nearly every aspect of everyday life. Besides the skyrocketing valuations, people also talk about things like:

  • Uber, the world’s largest taxi company, owns no vehicles.
  • AirBnB, the world’s largest accommodation provider, owns no real estate.

It is true that AirBnB doesn’t own a single room. But most hotels don’t own real estate either! They lease.

Oracle’s Dilemma



Opinions expressed are solely my own and do not express the views or opinions of my employer.

Today I am at Oracle Cloud World. I have high hopes of learning about their cloud strategy, because Oracle is the only hugely successful company founded during the client/server movement that is still led by its founder. Today we are facing another significant change: Cloud and SaaS. It is very interesting to see how Oracle responds to it.

How to Disrupt Financial Services


“Silicon Valley is coming,” JPMorgan Chase CEO Jamie Dimon warned in his annual letter to shareholders. Yes, FinTech startups are coming. In fact, there are currently 12,000 FinTech startups in the field. No doubt, most of them will fail, but a few will succeed and disrupt the financial services market. Of course, the million-dollar question is “How?!” I don’t have a crystal ball to divine the future, but history may teach us something. Let’s look at how technologies disrupted other fields (and themselves).