unicorn

There are many NoSQL databases available. We have used or evaluated many of them, and we love many of the features they offer. However, we also face unique challenges in a highly regulated HCM SaaS business, so we kept looking for the unicorn database that would meet all of our requirements. Unfortunately, none of the existing solutions fully addresses our challenges. Two years ago we asked ourselves whether we could build our own solution. That is how the Unicorn database was born.

Unicorn is built on top of BigTable-like storage engines such as Cassandra, HBase, or Accumulo. Different storage engines let us choose different strategies for consistency, replication, etc. Beyond the plain abstraction of the BigTable data model, Unicorn provides an easy-to-use document data model and a MongoDB-like API. Moreover, Unicorn supports directed property multigraphs, and documents can simply be vertices in a graph. With the built-in document and graph data models, developers can focus on business logic rather than on tedious manipulation of key-value pairs. Of course, developers are still free to use key-value pairs directly for flexibility in special cases.

During the past two years, we have learned a lot and made many improvements. The result is Unicorn 2.0, which we are excited to open source to the community.

Unicorn is implemented in Scala and can be used as a client-side library without overhead. Unicorn also provides a shell for quick access to the database. The code snippets in this document can be run directly in the shell. An HTTP API, in the module Rhino, is also provided for non-Scala users.

With the module Narwhal, which is specialized for HBase, advanced features such as time travel, rollback, counters, and server-side filters are available. Users can also export data to Spark as RDDs for large-scale analytics. These RDDs can be converted to DataFrames or Datasets, which support SQL queries. Unicorn graphs can also be analyzed with Spark GraphX.

JSON

To support the document model, Unicorn has a rich JSON library. With it, users can work with JSON data much as they would in JavaScript. Moreover, it supports JSONPath to flexibly analyze, transform, and selectively extract data from JSON objects. Meanwhile, it is type safe and can catch many errors at compile time. Creating a JSON object is as simple as

val doc = json"""
  {
    "store": {
      "book": [
        {
          "category": "reference",
          "author": "Nigel Rees",
          "title": "Sayings of the Century",
          "price": 8.95
        },
        {
          "category": "fiction",
          "author": "Evelyn Waugh",
          "title": "Sword of Honour",
          "price": 12.99
        },
        {
          "category": "fiction",
          "author": "Herman Melville",
          "title": "Moby Dick",
          "isbn": "0-553-21311-3",
          "price": 8.99
        },
        {
          "category": "fiction",
          "author": "J. R. R. Tolkien",
          "title": "The Lord of the Rings",
          "isbn": "0-395-19395-8",
          "price": 22.99
        }
      ],
      "bicycle": {
        "color": "red",
        "price": 19.95
      }
    }
  }
  """

You can use the dot notation to access its fields just like in JavaScript:

doc.store.bicycle.color
doc.store.book(0).author

It is worth noting that we didn’t define the type or schema of the document, even though Scala is a strongly typed language. In other words, Unicorn’s JSON library combines the type safety of a strongly typed language with the flexibility of a dynamic language.
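One way such schema-free dot notation can be offered in Scala is the standard `scala.Dynamic` trait, which lets the compiler rewrite `node.field` into `node.selectDynamic("field")`. The sketch below illustrates that mechanism only; the names `Node`, `Obj`, `Arr`, `Str`, and `Undefined` are hypothetical, and this is not necessarily how Unicorn's JSON library is implemented.

```scala
import scala.language.dynamics

// Hypothetical sketch of dot-notation access; not Unicorn's JSON library.
object DynamicSketch {
  sealed trait Node extends Dynamic {
    // `node.field` is rewritten by the compiler to selectDynamic("field").
    def selectDynamic(field: String): Node = this match {
      case Obj(fields) => fields.getOrElse(field, Undefined)
      case _           => Undefined
    }

    // Array indexing: `node(0)`.
    def apply(index: Int): Node = this match {
      case Arr(items) if items.isDefinedAt(index) => items(index)
      case _                                      => Undefined
    }

    // `node.field(0)` is rewritten to applyDynamic("field")(0).
    def applyDynamic(field: String)(index: Int): Node =
      selectDynamic(field)(index)
  }

  final case class Obj(fields: Map[String, Node]) extends Node
  final case class Arr(items: Vector[Node]) extends Node
  final case class Str(value: String) extends Node
  case object Undefined extends Node

  val doc: Node = Obj(Map("store" -> Obj(Map(
    "bicycle" -> Obj(Map("color" -> Str("red"))),
    "book"    -> Arr(Vector(Obj(Map("author" -> Str("Nigel Rees")))))
  ))))

  val color: Node  = doc.store.bicycle.color
  val author: Node = doc.store.book(0).author
}
```

In this sketch a missing field yields `Undefined` instead of throwing, which mirrors JavaScript's behaviour for absent properties.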

We can also query JSON structures with JSONPath expressions, in the same way that XPath expressions are used with an XML document.

val jspath = JsonPath(doc)

// the authors of all books in the store
jspath("$.store.book[*].author")

// all authors
jspath("$..author")

// all things in store
jspath("$.store.*")

// the price of everything in the store
jspath("$.store..price")

// the third book
jspath("$..book[2]")

// the last book in order
jspath("$..book[-1:]")

// the first two books
jspath("$..book[0,1]")
jspath("$..book[:2]")

// filter all books with isbn number
jspath("$..book[?(@.isbn)]")

// filter all books cheaper than 10
jspath("$..book[?(@.price<10)]")

// all members of JSON structure
jspath("$..*")
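For readers new to JSONPath, it can help to see what a few of these expressions select when written against ordinary Scala collections. The sketch below is a plain-Scala analogy under the assumption that the books are modelled by a hypothetical `Book` case class; it does not use Unicorn's JsonPath.

```scala
// Plain-Scala analogy for a few JSONPath expressions; not Unicorn's JsonPath.
object FilterSketch {
  final case class Book(title: String, price: Double, isbn: Option[String])

  val books: List[Book] = List(
    Book("Sayings of the Century", 8.95, None),
    Book("Sword of Honour", 12.99, None),
    Book("Moby Dick", 8.99, Some("0-553-21311-3")),
    Book("The Lord of the Rings", 22.99, Some("0-395-19395-8"))
  )

  // $..book[?(@.isbn)]: keep the books that have an isbn field
  val withIsbn: List[String] = books.filter(_.isbn.isDefined).map(_.title)

  // $..book[?(@.price<10)]: keep the books cheaper than 10
  val cheap: List[String] = books.filter(_.price < 10).map(_.title)

  // $..book[:2]: the first two books
  val firstTwo: List[String] = books.take(2).map(_.title)

  // $..book[-1:]: the last book
  val last: List[String] = books.takeRight(1).map(_.title)
}
```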

Documents

With the easy-to-use document model and the data-as-API approach, agile development is within reach. A document is essentially a JSON object with a unique key. With the document data model, application developers can focus on business logic while Unicorn efficiently maps documents to key-value pairs in BigTable.
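To give a feel for what mapping a document to key-value pairs can look like, here is a minimal sketch that flattens a nested document into dotted-path columns. The layout is an assumption made purely for illustration; Unicorn's actual storage format may differ, and `flatten` is a hypothetical helper.

```scala
// Hypothetical sketch: NOT Unicorn's actual storage layout.
object FlattenSketch {
  // Flatten a nested document into (dotted path -> leaf value) pairs,
  // the kind of column/value cells a BigTable-like row could hold.
  def flatten(doc: Map[String, Any], prefix: String = ""): Map[String, Any] =
    doc.flatMap {
      case (field, nested: Map[_, _]) =>
        flatten(nested.asInstanceOf[Map[String, Any]], s"$prefix$field.")
      case (field, leaf) =>
        Map(s"$prefix$field" -> leaf)
    }

  val joe: Map[String, Any] = Map(
    "name"    -> "Joe",
    "salary"  -> 50000.0,
    "address" -> Map("city" -> "Roseland", "state" -> "NJ")
  )

  // Keys: name, salary, address.city, address.state
  val cells: Map[String, Any] = flatten(joe)
}
```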

It is easy to insert/upsert a document and get it back with the key.

// Create a table of documents.
val db = Unibase(Accumulo())
db.createTable("worker")
val workers = db("worker")

// Upsert a document
val joe = JsObject(
  "name" -> "Joe",
  "gender" -> "Male",
  "salary" -> 50000.0,
  "address" -> JsObject(
    "street" -> "1 ADP Blvd",
    "city" -> "Roseland",
    "state" -> "NJ",
    "zip" -> "07068"
  ),
  "project" -> JsArray("HCM", "NoSQL", "Analytics")
)

val key = workers.upsert(joe)

// Get it back
workers(key).get.prettyPrint

To update a document, simply pass a JSON object compatible with MongoDB’s update API:

val update = JsObject(
   "$id" -> key,
   "$set" -> JsObject(
     "salary" -> 100000.0,
     "address.street" -> "5 ADP Blvd"
   ),
   "$unset" -> JsObject(
     "gender" -> JsTrue
   )
)

workers.update(update)
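The effect of `$set` and `$unset` with dotted paths can be sketched on plain nested maps. This illustrates the MongoDB-style semantics only, assuming documents are modelled as `Map[String, Any]`; the helpers `set`, `unset`, and `update` are hypothetical and are not Unicorn's implementation.

```scala
// Sketch of MongoDB-style $set / $unset semantics; not Unicorn's code.
object UpdateSketch {
  type Doc = Map[String, Any]

  // Set the value at a path, creating intermediate objects as needed.
  def set(doc: Doc, path: List[String], value: Any): Doc = path match {
    case Nil          => doc
    case field :: Nil => doc + (field -> value)
    case field :: rest =>
      val child = doc.get(field) match {
        case Some(m: Map[_, _]) => m.asInstanceOf[Doc]
        case _                  => Map.empty[String, Any]
      }
      doc + (field -> set(child, rest, value))
  }

  // Remove the field at a path, if it exists.
  def unset(doc: Doc, path: List[String]): Doc = path match {
    case Nil          => doc
    case field :: Nil => doc - field
    case field :: rest =>
      doc.get(field) match {
        case Some(m: Map[_, _]) =>
          doc + (field -> unset(m.asInstanceOf[Doc], rest))
        case _ => doc
      }
  }

  // Apply all $set entries, then all $unset entries.
  def update(doc: Doc, toSet: Map[String, Any], toUnset: Set[String]): Doc = {
    val afterSet = toSet.foldLeft(doc) { case (d, (p, v)) =>
      set(d, p.split('.').toList, v)
    }
    toUnset.foldLeft(afterSet) { (d, p) => unset(d, p.split('.').toList) }
  }

  val joe: Doc = Map(
    "name"    -> "Joe",
    "gender"  -> "Male",
    "salary"  -> 50000.0,
    "address" -> Map("street" -> "1 ADP Blvd", "city" -> "Roseland")
  )

  val updated: Doc = update(
    joe,
    toSet = Map("salary" -> 100000.0, "address.street" -> "5 ADP Blvd"),
    toUnset = Set("gender")
  )
}
```

Note that only `address.street` changes inside `address`; the sibling field `city` is left untouched, which is the point of the dotted-path form.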

Multi-tenancy, in which multiple clients share the same database but each should see only its own data, is common in SaaS applications. Unicorn supports multi-tenancy so that each client sees only the appropriate view of the data.

val workers = db("worker")
workers.tenant = "IBM"
val ibmer = workers.upsert(json"""
  {
    "name": "Tom",
    "age": 40
  }
""")

workers.tenant = "Google"
val googler = workers.upsert(json"""
  {
    "name": "Tom",
    "age": 30
  }
""")

Because the tenant is now “Google”, the data of tenant “IBM” is not visible.

unicorn> workers(ibmer)
res5: Option[unicorn.json.JsObject] = None
unicorn> workers(googler)
res6: Option[unicorn.json.JsObject] = Some({"name":"Tom","age":30,"_id":"545ed4d1-280c-4b6a-a3cc-e0a3c5fc5b43"})
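One common way to obtain this kind of isolation is to make the tenant part of every row key, so that a lookup under one tenant cannot address another tenant's rows. The toy in-memory table below sketches that idea; it is an assumption for illustration, not a description of Unicorn's internals, and the `Table` class is hypothetical.

```scala
// Hypothetical sketch of tenant-scoped row keys; not Unicorn's internals.
object TenantSketch {
  type Doc = Map[String, Any]

  // A toy in-memory table. Each row key is the pair (tenant, id), so a
  // lookup under one tenant can never reach another tenant's rows.
  final class Table {
    private var rows = Map.empty[(String, String), Doc]
    private var nextId = 0
    var tenant: String = ""

    def upsert(doc: Doc): String = {
      nextId += 1
      val id = nextId.toString
      rows = rows.updated((tenant, id), doc)
      id
    }

    def apply(id: String): Option[Doc] = rows.get((tenant, id))
  }

  val workers = new Table
  workers.tenant = "IBM"
  val ibmer: String = workers.upsert(Map("name" -> "Tom", "age" -> 40))
  workers.tenant = "Google"
  val googler: String = workers.upsert(Map("name" -> "Tom", "age" -> 30))
}
```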

There are many other features, such as locality, scripting, time travel, and filters, that we cannot cover in this short overview. Please refer to our GitHub project for details. In what follows, we do want to give you a taste of our Spark and graph features.

Spark

For large scale analytics, we can export documents to Spark as RDD[JsObject].

import org.apache.spark._
import org.apache.spark.rdd.RDD

val conf = new SparkConf().setAppName("unicorn").setMaster("local[4]")
val sc = new SparkContext(conf)

val db = new Narwhal(HBase())
val table = db("worker")
table.tenant = "IBM"
val rdd = table.rdd(sc, json"""
                          {
                            "$$or": [
                              {
                                "age": {"$$gt": 30}
                              },
                              {
                                "state": "NJ"
                              }
                            ]
                          }
                        """)
rdd.count()

For analytics, SQL is still the best language. We can easily convert an RDD[JsObject] to a strongly typed DataFrame to be analyzed with Spark SQL.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

case class Worker(name: String, age: Int)
val workers = rdd.map { js => Worker(js.name, js.age) }
val df = sqlContext.createDataFrame(workers)
df.cache
df.show

df.registerTempTable("worker")
sqlContext.sql("SELECT * FROM worker WHERE age > 30").show

Graph

Unicorn supports directed property multigraphs. Documents from different tables can be added as vertices to a multigraph. Vertices that do not correspond to any document can also be added. Each relationship/edge has a label and optional data (any valid JsValue; the default value is JsInt(1)). In what follows, we create a graph of gods, an example from the Titan graph database.

val db = Unibase(Accumulo())
db.createGraph("gods")
val gods = db.graph("gods", new Snowflake(0))

val saturn = gods.addVertex(json"""{"label": "titan", "name": "saturn", "age": 10000}""")
val sky = gods.addVertex(json"""{"label": "location", "name": "sky"}""")
val sea = gods.addVertex(json"""{"label": "location", "name": "sea"}""")
val jupiter = gods.addVertex(json"""{"label": "god", "name": "jupiter", "age": 5000}""")
val neptune = gods.addVertex(json"""{"label": "god", "name": "neptune", "age": 4500}""")
val hercules = gods.addVertex(json"""{"label": "demigod", "name": "hercules", "age": 30}""")
val alcmene = gods.addVertex(json"""{"label": "human", "name": "alcmene", "age": 45}""")
val pluto = gods.addVertex(json"""{"label": "god", "name": "pluto", "age": 4000}""")
val nemean = gods.addVertex(json"""{"label": "monster", "name": "nemean"}""")
val hydra = gods.addVertex(json"""{"label": "monster", "name": "hydra"}""")
val cerberus = gods.addVertex(json"""{"label": "monster", "name": "cerberus"}""")
val tartarus = gods.addVertex(json"""{"label": "location", "name": "tartarus"}""")

gods.addEdge(jupiter, "father", saturn)
gods.addEdge(jupiter, "lives", sky, json"""{"reason": "loves fresh breezes"}""")
gods.addEdge(jupiter, "brother", neptune)
gods.addEdge(jupiter, "brother", pluto)

gods.addEdge(neptune, "lives", sea, json"""{"reason": "loves waves"}""")
gods.addEdge(neptune, "brother", jupiter)
gods.addEdge(neptune, "brother", pluto)

gods.addEdge(hercules, "father", jupiter)
gods.addEdge(hercules, "mother", alcmene)
gods.addEdge(hercules, "battled", nemean, json"""{"time": 1, "place": {"latitude": 38.1, "longitude": 23.7}}""")
gods.addEdge(hercules, "battled", hydra, json"""{"time": 2, "place": {"latitude": 37.7, "longitude": 23.9}}""")
gods.addEdge(hercules, "battled", cerberus, json"""{"time": 12, "place": {"latitude": 39.0, "longitude": 22.0}}""")

gods.addEdge(pluto, "brother", jupiter)
gods.addEdge(pluto, "brother", neptune)
gods.addEdge(pluto, "lives", tartarus, json"""{"reason": "no fear of death"}""")
gods.addEdge(pluto, "pet", cerberus)

gods.addEdge(cerberus, "lives", tartarus)

For graph traversal, we support a Gremlin-like API. The following example shows how to get the names of saturn’s grandchildren.

val g = gods.traversal
g.v(saturn).in("father").in("father").name
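To make the direction of `in("father")` concrete: a `father` edge points from child to father, so stepping in along it moves from a father to his children. The sketch below replays the two steps on a plain edge list; the `Edge` case class and the `in` helper are hypothetical and only mimic the Gremlin-style step.

```scala
// Plain-Scala sketch of the in(label) step; not Unicorn's traversal API.
object TraversalSketch {
  final case class Edge(from: String, label: String, to: String)

  val edges: List[Edge] = List(
    Edge("jupiter", "father", "saturn"),
    Edge("jupiter", "brother", "neptune"),
    Edge("hercules", "father", "jupiter"),
    Edge("hercules", "mother", "alcmene")
  )

  // in(label): follow edges with the given label backwards,
  // from their target vertex to their source vertex.
  def in(vertices: Set[String], label: String): Set[String] =
    edges.collect { case Edge(from, `label`, to) if vertices(to) => from }.toSet

  // saturn <-father- jupiter <-father- hercules
  val grandchildren: Set[String] = in(in(Set("saturn"), "father"), "father")
}
```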

Beyond simple graph traversal, Unicorn supports DFS, BFS, A* search, Dijkstra’s algorithm, etc.

val path = GraphOps.dijkstra(jupiter, cerberus, new SimpleTraveler(gods)).map { edge =>
  (edge.from, edge.label, edge.to)
}

path.foreach(println(_))
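For reference, Dijkstra's algorithm itself can be sketched in a few lines over a small part of the gods graph. The sketch assumes every edge has weight 1, which is an assumption made here rather than a statement about how SimpleTraveler weighs edges, and it does not use GraphOps.

```scala
import scala.collection.mutable

// Textbook Dijkstra over a plain edge list; not Unicorn's GraphOps.
object DijkstraSketch {
  // Assumption: every edge costs 1 unless stated otherwise.
  final case class Edge(from: String, label: String, to: String, weight: Int = 1)

  val edges: List[Edge] = List(
    Edge("jupiter", "father", "saturn"),
    Edge("jupiter", "brother", "neptune"),
    Edge("jupiter", "brother", "pluto"),
    Edge("neptune", "brother", "pluto"),
    Edge("pluto", "pet", "cerberus"),
    Edge("hercules", "battled", "cerberus")
  )

  def shortestPath(source: String, target: String): List[Edge] = {
    val outgoing = edges.groupBy(_.from)
    val dist = mutable.Map(source -> 0)
    val cameBy = mutable.Map.empty[String, Edge]
    val done = mutable.Set.empty[String]
    // PriorityQueue is a max-heap, so order by negated distance.
    val queue = mutable.PriorityQueue((0, source))(
      Ordering.by[(Int, String), Int](p => -p._1))

    while (queue.nonEmpty) {
      val (d, u) = queue.dequeue()
      if (!done(u)) {
        done += u
        for (e <- outgoing.getOrElse(u, Nil)) {
          val candidate = d + e.weight
          if (candidate < dist.getOrElse(e.to, Int.MaxValue)) {
            dist(e.to) = candidate
            cameBy(e.to) = e
            queue.enqueue((candidate, e.to))
          }
        }
      }
    }

    // Walk back from the target along the recorded edges.
    def walk(v: String, acc: List[Edge]): List[Edge] = cameBy.get(v) match {
      case Some(e) => walk(e.from, e :: acc)
      case None    => acc
    }
    walk(target, Nil)
  }

  val path: List[Edge] = shortestPath("jupiter", "cerberus")
}
```

An unreachable target yields an empty list in this sketch.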

Note that this search is performed on a single machine. For very large graphs, it is better to use a distributed graph computing engine such as Spark GraphX.

import org.apache.spark._

val conf = new SparkConf().setAppName("unicorn").setMaster("local[4]")
val sc = new SparkContext(conf)

val graph = db.graph("gods")
val graphx = graph.graphx(sc)

// Run PageRank
val ranks = graphx.pageRank(0.0001).vertices
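For intuition about what PageRank computes, here is a small single-machine sketch of the textbook power iteration on part of the gods graph. It is not GraphX's implementation: GraphX runs distributed, scales and initializes ranks differently, and its argument above is a convergence tolerance, whereas this sketch runs a fixed number of iterations. The damping factor 0.85 is the conventional default and is an assumption here.

```scala
// Textbook PageRank by power iteration; not GraphX's implementation.
object PageRankSketch {
  val edges: List[(String, String)] = List(
    "jupiter" -> "saturn", "jupiter" -> "neptune", "jupiter" -> "pluto",
    "neptune" -> "jupiter", "neptune" -> "pluto",
    "pluto" -> "jupiter", "pluto" -> "neptune",
    "hercules" -> "jupiter"
  )

  def pageRank(iterations: Int, damping: Double = 0.85): Map[String, Double] = {
    val vertices = edges.flatMap { case (from, to) => List(from, to) }.distinct
    val n = vertices.size
    val outDegree = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    var rank: Map[String, Double] = vertices.map(v => v -> (1.0 / n)).toMap

    for (_ <- 1 to iterations) {
      // Rank held by vertices without outgoing edges is spread evenly.
      val dangling = vertices.filterNot(outDegree.contains).map(rank).sum
      // Each vertex passes its rank in equal shares along its outgoing edges.
      val received = edges.groupBy(_._2).map { case (v, es) =>
        v -> es.map { case (from, _) => rank(from) / outDegree(from) }.sum
      }
      rank = vertices.map { v =>
        v -> ((1 - damping) / n +
          damping * (received.getOrElse(v, 0.0) + dangling / n))
      }.toMap
    }
    rank
  }

  val ranks: Map[String, Double] = pageRank(iterations = 50)
}
```

In this variant the ranks always sum to 1, and a vertex with no incoming edges, such as hercules, ends up with the smallest rank.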

In The Future

As you have seen, Unicorn has many features. We have come a long way, but we are not stopping. We are working hard to overhaul the design of full-text search and secondary indexes. We are also working on full ACID distributed transaction management. Stay tuned, and please contribute to the project. Thank you!
