Risk Aversion and Sunk Cost Fallacy

In his book Misbehaving, Richard H. Thaler tells an interesting story. While teaching a class on decision-making to a group of executives from a company in the print media industry, Thaler put a scenario to them: Suppose you were offered an investment opportunity for your division that will yield one of two payoffs. After the investment is made, there is a 50% chance it will make a profit of $2 million and a 50% chance it will lose $1 million. When Thaler asked who would take on this project, only three of the twenty-three executives said they would. Then he asked the CEO how many of the projects he would want to undertake, supposing all projects were independent (that is, the success of one was unrelated to that of the others). The answer: all of them!
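
The arithmetic behind the CEO's answer can be sketched as follows. This is only an illustration: it assumes one project per executive (23 in total, read off the class size), independent outcomes, and payoffs in millions of dollars. The symbols X_i (payoff of project i) and K (number of successful projects) are introduced here and do not come from the book.

```latex
\begin{align*}
\mathbb{E}[X_i] &= 0.5 \times 2 + 0.5 \times (-1) = 0.5
  \quad \text{(\$ million per project)} \\
\mathbb{E}\Big[\sum_{i=1}^{23} X_i\Big] &= 23 \times 0.5 = 11.5 \\
\sum_{i=1}^{23} X_i &= 2K - (23 - K) = 3K - 23,
  \quad K \sim \mathrm{Binomial}(23,\, 0.5) \\
P\Big(\sum_{i=1}^{23} X_i < 0\Big) &= P(K \le 7)
  = \frac{1}{2^{23}} \sum_{k=0}^{7} \binom{23}{k}
  = \frac{390656}{8388608} \approx 0.047
\end{align*}
```

Under these assumptions, each project on its own has a 50% chance of losing money, but the portfolio as a whole loses money less than 5% of the time.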

Agile Software Development: China Navy



The best demonstration of agile software development is probably the modernization of China's navy. Following a “Run Swiftly in Small Steps” strategy, it has undergone a stunning modernization push that puts it near parity with the US. Look below at how it has steadily improved each class of its destroyers at shorter and shorter intervals. It is a grandmaster of agile development.

Payroll: An Overlooked Area in FinTech




A lot of brain power and money has been poured into FinTech, especially into lending and payments. These are indeed exciting areas with new business models and technologies. On the other hand, people rarely associate sexy FinTech with payroll services. Although it may sound boring, payroll is actually an overlooked gold mine for innovators. Traditionally, payroll service companies make money from service fees. New HCM service companies such as Zenefits work as insurance brokers while providing free payroll and HR services. But if we look under the hood at the process, there is an interesting opportunity.

Smile 1.2.0 Released!

Dear Smilers,

We are proud to announce the release of Smile 1.2.0.

The key features of the 1.2.0 release are:

  • Headless plot. Smile’s plot functions depend on Java Swing. Server applications need to generate plots without creating Swing windows. With headless plot (enabled by the -Djava.awt.headless=true JVM option), we can create plots as follows:
    val canvas = ScatterPlot.plot(x, '.')
    val headless = new Headless(canvas)
    canvas.save(new java.io.File("zone.png"))
  • All classification and regression models can be serialized by
    write(model) // Java serialization
    write.xstream(model) // XStream serialization
  • Refactoring of the smile.io Scala API.
    • Parsers are in the smile.read object.
    • A JDBC ResultSet can be parsed to an AttributeDataset.
    • Model serialization methods are in the smile.write object.
  • Platt scaling for SVM.
  • Smile NLP tokenizers are Unicode-aware.
  • Least squares can now handle rank deficiency.
  • Various code improvements.

Unicorn 2.0 is Released!


There are a lot of NoSQL databases out there. We have used or tried out many of them, and we love a lot of the cool features they offer. However, we also face many unique challenges in a highly regulated HCM SaaS business, so we kept looking for the unicorn database that would meet our requirements. Unfortunately, none of the existing solutions fully addresses all of our challenges. So two years ago we asked ourselves whether we could build our own solution. That is how the Unicorn database was born.

Unicorn is built on top of BigTable-like storage engines such as Cassandra, HBase, or Accumulo. With different storage engines, we can achieve different strategies for consistency, replication, etc. Beyond the plain abstraction of the BigTable data model, Unicorn provides an easy-to-use document data model and a MongoDB-like API. Moreover, Unicorn supports directed property multigraphs, and documents can simply be vertices in a graph. With the built-in document and graph data models, developers can focus on the business logic rather than on tedious key-value pair manipulation. Of course, developers are still free to use key-value pairs for flexibility in special cases.

During the past two years, we have learned a lot and made many improvements. The result is Unicorn 2.0, which we are excited to open source to the community.

In Memory of Andy Grove




Legendary former Intel CEO Andy Grove left us recently. Wearing many hats, he was an entrepreneur, a teacher, a writer, and a philanthropist. As Marc Andreessen said on Twitter, he was “the best company builder Silicon Valley has ever seen, and likely will ever see”. Even after more than thirty years, his book “High Output Management” is still a must-read for all middle managers.

In another best-selling book, “Only the Paranoid Survive”, he introduced concepts such as the “strategic inflection point” and “strategic dissonance”, which have become part of the lexicon both in academia and in practice. A strategic inflection point is a time in the life of a business when its fundamentals are about to change. That change can mean an opportunity to rise to new heights. But it may just as likely signal the beginning of the end.


Andy Grove steered Intel through several strategic inflection points, for example, the shift from the memory business to microprocessors when the company realized it couldn’t keep up with Japanese competition. Soon Intel will face another inflection point. In fact, this new inflection point has already begun. Unfortunately, Intel no longer has Andy Grove.

Smile 1.1 is Released!



Today I am very excited to announce that Smile 1.1 is released! Among many improvements, we get a new high-level Scala API, an interactive shell, and a nice project website with programming guides, API docs, and more!

With Smile 1.1, data scientists can develop advanced models with high-level Scala operators in the shell, and developers can deploy them immediately in the app. That is, data scientists and developers can now speak the same language!

There is no big data in machine learning



Back in graduate school, I worked on the so-called small sample size problem. In particular, I worked on linear discriminant analysis (LDA). For high-dimensional data (e.g. images, gene expression), the within-class scatter matrix is singular when the number of samples is smaller than the dimensionality. Therefore LDA cannot be applied directly. You may think that we don’t have such small sample size problems anymore in the era of Big Data. Well, the challenge is deeper than it looks.
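
To see why the scatter matrix becomes singular, here is a sketch using the standard textbook formulation of LDA. The notation is an assumption introduced for illustration and does not come from the post: n samples of dimension d, c classes C_k with means \mu_k, within-class scatter S_w, between-class scatter S_b, and projection matrix W.

```latex
\begin{align*}
S_w &= \sum_{k=1}^{c} \sum_{x_i \in C_k} (x_i - \mu_k)(x_i - \mu_k)^{\top}
  \in \mathbb{R}^{d \times d} \\
\operatorname{rank}(S_w) &\le n - c < d
  \quad \text{whenever } n < d \\
J(W) &= \frac{\lvert W^{\top} S_b W \rvert}{\lvert W^{\top} S_w W \rvert}
\end{align*}
```

Each class contributes at most |C_k| - 1 linearly independent centered samples, which gives the rank bound. Maximizing J(W) in the classical way relies on inverting S_w, and that inverse does not exist once the rank falls below d.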

High End Disruption



Professor Clayton Christensen’s theory of disruptive innovation has enjoyed huge success in explaining low-end disruptions and new-market disruptions. But it has recently had difficulty explaining high-end disruptions such as the iPhone and Tesla. In fact, technologies that start in the high-end market and then reach the mainstream market are not new. Thomas Edison did it more than 100 years ago.


After Henry Ford introduced the Model T, only the rich (and cowboys and cowgirls) kept riding horses. Today, Elon Musk is doing it again!


Tomorrow, only a few will be able to afford to drive a car themselves once self-driving cars take over the mainstream market.