Retrospective time-based UUID generation (with Spark)

I faced a problem: given about 50 GB of data in one database, export the records to another database with slight modifications, including generating UUIDs based on the records' timestamps (collisions were intolerable). We have a Spark cluster, and with it the problem did not seem tricky at all: create an RDD, map it, and send the result to the target DB.
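For concreteness, here is a minimal sketch of the kind of generation involved; it is not the actual project code. It lays out an RFC 4122 version-1 (time-based) UUID by hand, and the counter-based uniqueness trick and the fromTimestamp/node names are my own assumptions:

```scala
import java.util.UUID

object TimeUuid {
  // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
  private val UuidEpochOffset = 122192928000000000L

  // Build a version-1 (time-based) UUID from a record's millisecond timestamp.
  // `counter` disambiguates records sharing the same millisecond; `node` is the
  // 48-bit node identifier. Both are left to the caller in this sketch.
  def fromTimestamp(millis: Long, counter: Long, node: Long): UUID = {
    // A millisecond spans 10000 100-ns intervals; spend some of them on the counter.
    val ts       = millis * 10000L + UuidEpochOffset + (counter % 10000L)
    val timeLow  = ts & 0xFFFFFFFFL
    val timeMid  = (ts >>> 32) & 0xFFFFL
    val timeHi   = (ts >>> 48) & 0x0FFFL
    val msb      = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi // 0x1000 = version 1
    val clockSeq = 0x8000L | ((counter / 10000L) & 0x3FFFL)             // variant bits + clock seq
    val lsb      = (clockSeq << 48) | (node & 0xFFFFFFFFFFFFL)
    new UUID(msb, lsb)
  }
}
```

A timestamp alone is not enough: two records with the same timestamp would get the same UUID, which is exactly where collisions would come from; hence the explicit counter spread across the spare 100-ns slots and the clock sequence.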

(This post is about Spark. I have not written about it in this blog so far, but I will. In short, it is a framework for large-scale data processing, similar to Hadoop. It operates on abstract distributed data collections called Resilient Distributed Datasets (RDDs). Their interface is quite similar to that of functional collections (map, flatMap, etc.), plus some operations specific to Spark's distributed, large-scale nature.)
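A minimal, self-contained sketch of that interface (the input path and the word-counting logic are made up for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddExample {
  def main(args: Array[String]): Unit = {
    // Local mode, just for the sketch.
    val sc = new SparkContext(new SparkConf().setAppName("rdd-example").setMaster("local[*]"))

    // An RDD is a distributed collection; transformations look like ordinary
    // functional collection operations and are evaluated lazily.
    val lines   = sc.textFile("/data/records.txt") // hypothetical input path
    val words   = lines.flatMap(_.split("\\s+"))
    val lengths = words.map(_.length)

    // An action (here, reduce) forces the actual distributed computation.
    val total = lengths.reduce(_ + _)
    println(s"total characters in words: $total")
    sc.stop()
  }
}
```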

However, I was too optimistic – there were difficulties.

Continue reading

Type-safe query builders in Scala

Recently, I was hacking on phantom, a Scala library for querying the Cassandra database. At the time, it could not build prepared statements, which I needed, so I added this functionality myself. Soon afterwards, the phantom developers implemented prepared statements too :) Nevertheless, I decided to write this post about the core idea of my implementation.

Caution: Don’t do this at home, use shapeless :)

My plan was to be able to prepare queries like this:
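Something along these lines, where prepareQuery and the p[...] parameter markers are hypothetical stand-ins for the snippet embedded in the original post:

```scala
// Hypothetical shape, inferred from the execute call below: a query with
// three placeholders, typed String, Int and Boolean, in that order.
val query = prepareQuery("SELECT * FROM users WHERE name = ? AND age = ? AND active = ?")
  .p[String]
  .p[Int]
  .p[Boolean]
```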

and after that to execute it: query.execute("string", 1, false). Moreover, I wanted the execution to be type-safe, so that something like query.execute(1, "string", 0) would not compile.

I will abstract away from phantom queries and focus on the general idea. The code is here.
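To give the flavour here as well, a minimal sketch of such a builder (my own illustration, not phantom's API; it is simplified so that execute takes a hand-built HList instead of plain arguments, and bridging that last gap is precisely where shapeless helps):

```scala
object TypeSafeQuery {
  // A type-level list that records the parameter types of a query.
  sealed trait HList
  final case class HCons[+H, +T <: HList](head: H, tail: T) extends HList
  sealed trait HNil extends HList
  case object HNil extends HNil

  // Each p[A] call prepends A to the type-level parameter list, so the
  // compiler knows exactly which argument types execute must receive.
  final class Query[Params <: HList](val sql: String) {
    def p[A]: Query[HCons[A, Params]] = new Query(sql)
    def execute(args: Params): Unit =
      println(s"executing '$sql' with $args") // stand-in for real execution
  }

  def prepare(sql: String): Query[HNil] = new Query(sql)
}

object Demo extends App {
  import TypeSafeQuery._

  // Parameters are prepended, so declare them in reverse of the argument order.
  val query = prepare("SELECT * FROM users WHERE name = ? AND age = ? AND active = ?")
    .p[Boolean].p[Int].p[String]

  query.execute(HCons("string", HCons(1, HCons(false, HNil))))    // compiles
  // query.execute(HCons(1, HCons("string", HCons(false, HNil)))) // does not compile
}
```

The trick is that the query's type grows with every declared parameter, so a call with arguments in the wrong order is a type error rather than a runtime failure.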

Continue reading