Linkblog #3

Articles

  • NOSQL Patterns – an article about various patterns you can find in NoSQL systems: data partitioning, replication, cluster membership, consistency models, etc.
  • NoSQL Data Modeling Techniques. While the previous article is about the internals of NoSQL systems, this one is closer to usage. It covers data modeling techniques for storing your data in a NoSQL store, such as denormalization, aggregation, storing hierarchies, etc.
  • Facebook’s Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services – an overview of a paper that describes Facebook’s approach to analyzing the performance of its (internal) services, along with some interesting findings. In short, they parse the logs of internal services, which tells them how long each part of a request takes. As stated, the approach requires no additional instrumentation.
  • A series of articles about implementing monads in C#, with a bit of theory. The series shows that there is no mystery in monads and that the monad pattern can easily be implemented in a non-functional language like C#. This might help you understand monads without diving into a language like Haskell. Follow the links at the end of each article to get to the next part.
  • Java Garbage Collection Distilled – an explanation of the garbage collectors used in the HotSpot JVM and OpenJDK, GC trade-offs, and GC monitoring and tuning.
  • Akka Cluster Load Balancing – an approach to adaptive load balancing in an Akka cluster based on free heap space metrics. The author has implemented a custom actor router that directs work to the actor on the node with the smallest heap usage.
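The monad series works in C#, but the idea is just as small in other mainstream languages. Here is a minimal Maybe monad sketch in Python. It is not the code from the series, and the names Maybe, just, nothing, bind and safe_div are hypothetical. The just function wraps a plain value. The bind method chains steps that may fail and skips the remaining steps once a value is missing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Maybe(Generic[A]):
    """Either holds a value (present=True) or holds nothing."""
    value: Optional[A] = None
    present: bool = False

    @staticmethod
    def just(value: A) -> Maybe[A]:
        # "return"/"unit": wrap a plain value into the monad.
        return Maybe(value, True)

    @staticmethod
    def nothing() -> Maybe[A]:
        return Maybe()

    def bind(self, f: Callable[[A], Maybe[B]]) -> Maybe[B]:
        # Run the next step only if there is a value; otherwise
        # propagate "nothing" without calling f.
        return f(self.value) if self.present else Maybe.nothing()


def safe_div(x: float, y: float) -> Maybe[float]:
    # A step that can fail: division by zero yields nothing.
    return Maybe.just(x / y) if y != 0 else Maybe.nothing()
```

For example, Maybe.just(10.0).bind(lambda x: safe_div(x, 2)).bind(lambda x: safe_div(x, 5)) yields Maybe.just(1.0). A division by zero anywhere in the chain yields Maybe.nothing(), and the later steps are skipped.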

Talks

Projects

  • Bazel – Google has open-sourced Bazel, the build tool it uses to build most of its projects (marked as alpha).
  • facebook-tunnel – have free Facebook traffic? Then you can get to your internets through Facebook chat :) An amusing proof of concept.

Podcasts

  • IoT Podcast – a new podcast about the Internet of Things. There are two episodes so far.

Courses

  • Principles of Reactive Programming – the new iteration of Martin Odersky’s, Erik Meijer’s and Roland Kuhn’s course about reactive programming, reactive streams, actors, Akka, etc. The material is promised to be updated and improved. It starts on 13 April. I have already finished the course, but it would be interesting to see how it has changed.
    If you are going to take the course, drop me a line; it would be interesting to discuss it.

And a funny picture for you →

The Bloom filter

In many software engineering problems, we have a set and need to determine whether some value belongs to it. If the maximum possible cardinality of the set is small, the solution is straightforward. (Cardinality means the set's size, and the maximum size is the total number of elements we consider.) Just store the set explicitly, for instance as a red-black tree, update it when necessary, and check whether it contains the elements we are interested in. But what if the maximum cardinality is large, or we need to operate on many such sets simultaneously? Or what if the membership test itself is expensive?

Suppose we want to know whether an element belongs to a set. We have decided that false positives are acceptable with probability p: the answer is “yes”, but the element is not actually in the set. False negatives are not acceptable: the answer is “no”, but the element is actually in the set. A data structure that can help in this situation is the Bloom filter.

A Bloom filter (proposed by Burton Howard Bloom in 1970) consists of a bit array of m bits (all initially set to 0) and k different hash functions. Each hash function maps a value to a single integer, one of the m positions in the array.

Look at this picture from Wikipedia:
Bloom filter
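The following minimal Python sketch makes the structure concrete. It is not the post's own code. Adding an element sets the k bits chosen by the hash functions. A lookup answers “maybe present” only if all k bits are set, so it can give false positives but never false negatives.

This sketch makes two assumptions:

  • The k positions are derived from a single SHA-256 digest by double hashing (position_i = h1 + i·h2 mod m). This is a common substitute for k independent hash functions.
  • for_capacity uses the standard sizing approximations m = −n ln p / (ln 2)² and k = (m/n) ln 2, where n is the number of expected elements and p is the target false positive rate.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """A Bloom filter: m bits plus k hash positions per element."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially 0

    @classmethod
    def for_capacity(cls, n: int, p: float) -> "BloomFilter":
        # Standard approximations for n elements and false positive rate p.
        m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
        k = max(1, round(m / n * math.log(2)))
        return cls(m, k)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k positions from two 64-bit values
        # taken from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "added, or a false positive".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

For example, BloomFilter.for_capacity(1000, 0.01) gives k = 7 and m of about 9,600 bits (roughly 1.2 KB). This size does not depend on how long the stored strings are.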

Continue reading

Linkblog #2

Articles

Talks

Projects

Job

Books

  • Forecasting: principles and practice by Rob J. Hyndman and George Athanasopoulos. I have just started reading it, but it seems to be a good introduction to forecasting and various kinds of time series analysis.

Courses

And a bit of humour for you →

Linkblog #1

Articles

Projects

Podcasts

Job

And a funny picture for you →

Delayed message delivery in RabbitMQ

UPD June 01, 2015: there is a plugin for this now.

A lot of developers use the RabbitMQ message broker. It is quite mature but still lacks some features that one may need. One of them is delayed message delivery: there is no way to send a message that will be delivered after a specified delay (a limitation of the AMQP protocol). Fortunately, there is a hack for this.

RabbitMQ logo

Let’s start with dead letters. A message can become “dead” for several reasons, such as rejection or TTL (time-to-live) expiration. RabbitMQ can handle such messages by redirecting them to a particular exchange and routing key, and we can use this ability to implement delayed delivery. We will create a special queue for holding delayed messages. This queue will have no consumers, so messages stay in it until they expire. Once they expire, the messages are passed to the destination exchange and routing key, just as planned.
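The holding-queue trick comes down to a few queue arguments. Below is a minimal Python sketch. It assumes the pika client (1.x), and all queue, exchange and routing key names are hypothetical. The holding queue gets a queue-wide TTL (x-message-ttl, in milliseconds), plus x-dead-letter-exchange and x-dead-letter-routing-key. Messages are published to it through the default exchange. The functions take the channel as a parameter, so pika is only needed when talking to a real broker.

```python
def delay_queue_arguments(target_exchange: str, target_routing_key: str,
                          delay_ms: int) -> dict:
    """Arguments for a holding queue whose messages expire after delay_ms
    and are then dead-lettered to the real destination."""
    return {
        "x-message-ttl": delay_ms,                       # queue-wide TTL in ms
        "x-dead-letter-exchange": target_exchange,       # where expired messages go
        "x-dead-letter-routing-key": target_routing_key, # routing key they get there
    }


def declare_delay_queue(channel, name: str, target_exchange: str,
                        target_routing_key: str, delay_ms: int) -> None:
    # `channel` is assumed to be a pika 1.x channel (or anything with the same methods).
    channel.queue_declare(
        queue=name,
        durable=True,
        arguments=delay_queue_arguments(target_exchange, target_routing_key, delay_ms),
    )


def publish_delayed(channel, delay_queue: str, body: bytes) -> None:
    # The default exchange ("") routes by queue name straight into the holding queue.
    # Nobody consumes from it, so the message waits there until its TTL expires.
    channel.basic_publish(exchange="", routing_key=delay_queue, body=body)


# Usage with a real broker (needs `pip install pika` and a running RabbitMQ):
#   import pika
#   channel = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
#   declare_delay_queue(channel, "delay.5s", "orders", "orders.created", 5000)
#   publish_delayed(channel, "delay.5s", b"hello")
```

This sketch uses a queue-wide TTL, with one holding queue per delay value. RabbitMQ expires messages with a per-message TTL only when they reach the head of the queue, so mixing different per-message delays in one queue can hold short delays behind long ones.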

Continue reading

Introduction to Akka

There are several models of concurrent computing, and the actor model is one of them. I am going to give a glimpse of this model and of one of its implementations, the Akka toolkit.

Akka logo

The actor model

In the actor model, actors are objects that have state and behavior and communicate with each other by message passing. This sounds like good old objects from OOP. The crucial difference is that message passing is one-way and asynchronous: an actor sends a message to another actor and continues its work. In fact, actors are entirely reactive: all their activity happens in reaction to incoming messages, which are processed one at a time. This is not a limitation, however, because messages can be of any kind, including scheduled (timer) messages and network messages.
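To see what “one-way, asynchronous, one message at a time” means mechanically, here is a toy actor in Python built on a thread and a queue. It only illustrates the model and is not Akka's API. The names Actor, tell, receive, stop and Counter are hypothetical; tell and receive loosely mirror Akka's vocabulary.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the actor's thread to finish


class Actor:
    """A toy actor: private state, a mailbox, and one thread that handles
    messages strictly one at a time."""

    def __init__(self) -> None:
        self._mailbox: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message) -> None:
        # One-way and asynchronous: enqueue the message and return immediately.
        self._mailbox.put(message)

    def stop(self) -> None:
        # The sentinel is handled after every message sent before it.
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            self.receive(message)

    def receive(self, message) -> None:
        raise NotImplementedError


class Counter(Actor):
    def __init__(self) -> None:
        self.count = 0  # state touched only by the actor's own thread
        super().__init__()

    def receive(self, message) -> None:
        if message == "increment":
            self.count += 1
```

No locks are needed because only the actor's own thread touches count, and the mailbox serializes all updates. After 1,000 tell("increment") calls followed by stop(), counter.count is 1000.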

Continue reading

Value Classes in Scala

Type systems and compile-time type checking are great things. They can save you a couple of hours of debugging, and they can also document code and make it easier to understand. In my opinion, it’s wise to use them, and unfortunately, we sometimes don’t use them enough. Consider Integer/Int/int. A counter could be an Integer, an entity identifier could be an Integer, and a number in an arithmetic expression could be an Integer. In most cases, all these Integers have nothing to do with each other. In your domain, it is a bad idea to compare them, do arithmetic on them, pass one instead of another as a function parameter, etc.

In one of my projects (in C#), there are a dozen domain entities with integer identifiers that are passed all over the code. After a couple of bugs caused by mixed-up identifiers of different entities, I solved this problem by replacing plain integers with structs named like Id<EntityName>T. (In C#, structs are value types used for representing lightweight objects such as Point or Color, and the T suffix distinguishes the type names from property names.) The key idea was to introduce a new level of types and let the type checker scrutinize the code instead of me. It worked: I got rid of some old bugs in rarely used parts of the code, and I hope new bugs of this kind won’t bother me in the future. (Aside: I hope this post will persuade you not only to consider using value classes but also to think about the role of types in code quality.)
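In Scala, a value class wraps a single field and extends AnyVal, e.g. class UserId(val value: Int) extends AnyVal. In many cases the compiler avoids allocating the wrapper at runtime. Below is a rough Python analogue of the same idea, meant only as an illustration; UserId, OrderId and orders_of are hypothetical. Frozen dataclasses give each kind of identifier its own type, so:

  • identifiers of different entities never compare equal;
  • they cannot be added together;
  • a static checker such as mypy flags them when they are mixed up.

Unlike Scala value classes, these wrappers are real objects at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserId:
    """Identifier of a user; not interchangeable with other identifiers."""
    value: int


@dataclass(frozen=True)
class OrderId:
    """Identifier of an order."""
    value: int


def orders_of(user: UserId) -> list:
    # A static checker such as mypy rejects orders_of(OrderId(7)).
    # Python does not enforce type hints at runtime, so we also check here.
    if not isinstance(user, UserId):
        raise TypeError(f"expected UserId, got {type(user).__name__}")
    return []  # hypothetical lookup omitted
```

Dataclass equality is defined only between instances of the same class, so UserId(7) == OrderId(7) is False even though both wrap 7.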

Continue reading

Why I like Scala

I am familiar (more or less) with a number of programming languages and have both emotional and rational opinions about them. Scala is definitely among the languages I like. I have decided to summarize what I find attractive about Scala in a blog post, and here it is. I also have some ideas for posts about Scala and its technology stack, so an introduction is probably needed.

Scala logo

Scala is a general-purpose programming language created by Martin Odersky more than ten years ago. It compiles to JVM bytecode and is interoperable with Java in both directions (including mixed compilation), which lets Scala use the enormous amount of code written for the JVM. An interesting property, and one of the strongest selling points of the language, is its fusion of the object-oriented and functional programming paradigms.

Continue reading

Creating a simple parser with ANTLR

Recently, I faced the task of developing a tool that lets an application keep a base of (not very complex) logical rules. There were three requirements:

  1. The rules were to be written by non-programmers, so using the language the program is written in (Java/Scala) was not a good option.
  2. The rule base should be changeable without redeploying the application and, ideally, stored in a database.
  3. We should have control over compilation and error reporting.

The first and second requirements could be met by developing some kind of extremely simple Scala- or Groovy-based DSL. But I came up with several arguments against it:

  • The third requirement might be hard to meet.
  • The rules are quite simple, so embedding an interpreter of a general-purpose language might be overkill.
  • The language the rule consumer is written in might change (e.g., from Java/Scala to Python).

So, I decided to write a very simple rule parser/compiler. After creating a prototype, I decided to write this post; I hope it will be useful. The code, adapted for this post, is available in this repository.
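The post itself builds the parser with ANTLR. To illustrate what such a small rule compiler does, here is a hypothetical hand-written recursive-descent parser in Python. This is a different technique from a parser generated by ANTLR from a grammar. The rule syntax (fact names, and/or/not, parentheses) and all names in the code are assumptions of this sketch. compile_rule turns a rule into a function over a dictionary of facts. Errors carry the position of the offending input, in the spirit of requirement 3.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Facts = Dict[str, bool]
Rule = Callable[[Facts], bool]

TOKEN = re.compile(r"\(|\)|[A-Za-z_]\w*")
KEYWORDS = {"and", "or", "not"}


class RuleError(Exception):
    """A compile error with the position of the offending input."""


def tokenize(text: str) -> List[Tuple[str, int]]:
    tokens, pos = [], 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return tokens
        match = TOKEN.match(text, pos)
        if match is None:
            raise RuleError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append((match.group(), pos))
        pos = match.end()


def _and(a: Rule, b: Rule) -> Rule:
    return lambda facts: a(facts) and b(facts)


def _or(a: Rule, b: Rule) -> Rule:
    return lambda facts: a(facts) or b(facts)


def _not(a: Rule) -> Rule:
    return lambda facts: not a(facts)


class Parser:
    # Grammar: expr   := term ("or" term)*
    #          term   := factor ("and" factor)*
    #          factor := "not" factor | "(" expr ")" | NAME
    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.i = 0
        self.end = len(text)

    def peek(self) -> Optional[str]:
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def pos(self) -> int:
        return self.tokens[self.i][1] if self.i < len(self.tokens) else self.end

    def parse(self) -> Rule:
        rule = self.expr()
        if self.peek() is not None:
            raise RuleError(f"unexpected {self.peek()!r} at position {self.pos()}")
        return rule

    def expr(self) -> Rule:
        left = self.term()
        while self.peek() == "or":
            self.i += 1
            left = _or(left, self.term())
        return left

    def term(self) -> Rule:
        left = self.factor()
        while self.peek() == "and":
            self.i += 1
            left = _and(left, self.factor())
        return left

    def factor(self) -> Rule:
        tok = self.peek()
        if tok == "not":
            self.i += 1
            return _not(self.factor())
        if tok == "(":
            self.i += 1
            inner = self.expr()
            if self.peek() != ")":
                raise RuleError(f"expected ')' at position {self.pos()}")
            self.i += 1
            return inner
        if tok is None or tok == ")" or tok in KEYWORDS:
            raise RuleError(f"expected a fact name at position {self.pos()}")
        self.i += 1
        return lambda facts: facts.get(tok, False)  # unknown facts count as false


def compile_rule(text: str) -> Rule:
    return Parser(text).parse()
```

For example, compile_rule("raining and not (umbrella or indoors)") returns a function that is True for {"raining": True} and False for {"raining": True, "umbrella": True}. Treating unknown facts as false is a design choice of this sketch. A stricter compiler could reject unknown names at compile time instead.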
Continue reading

GNU Parallel

How many CPU cores does your computer have? Between 2 and 8, I guess. High time to use them all, isn’t it? But plenty of Unix utilities, such as grep, find and wc, know nothing about parallel data processing. They can’t split their input into 8 pieces and spawn the corresponding number of threads or processes to use all the power of your modern CPU.

This problem is too interesting and practical to leave unsolved. According to the Unix philosophy, 1) it is good for programs to do one thing well; 2) it is a good idea to combine simple programs to do complex things. grep does pattern matching well. How about parallelization?

There is a utility known as GNU Parallel whose main purpose is to execute arbitrary jobs in parallel on one or even multiple machines. The program is quite complex and feature-rich; see its man page and tutorial. Here, I just want to give a little taste of it.
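Here is a rough Python sketch of the split-and-spawn idea that grep and friends lack. It does not show how GNU Parallel itself is implemented, and all names in it are hypothetical. The sketch cuts the input into chunks, counts matching lines in each chunk in a separate worker process, and sums the results, much like running grep -c on each piece. GNU Parallel gets a similar effect for standard input with its --pipe option, which splits the input into blocks and passes each block to a separate job.

```python
import re
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import List, Sequence, Tuple


def count_matches(job: Tuple[str, Sequence[str]]) -> int:
    # Like `grep -c PATTERN` on one chunk of the input.
    pattern, lines = job
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))


def split(lines: Sequence[str], pieces: int) -> List[Sequence[str]]:
    # Cut the input into at most `pieces` contiguous chunks.
    size = max(1, -(-len(lines) // pieces))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_count(lines: Sequence[str], pattern: str, jobs: int = 8,
                   executor_cls=ProcessPoolExecutor) -> int:
    # Count matches in each chunk with `jobs` workers and add up the results.
    # ProcessPoolExecutor uses all cores; ThreadPoolExecutor is handy for quick tests.
    work = [(pattern, chunk) for chunk in split(lines, jobs)]
    with executor_cls(max_workers=jobs) as pool:
        return sum(pool.map(count_matches, work))
```

With ProcessPoolExecutor on platforms that start workers with spawn (Windows, macOS), call parallel_count from under an if __name__ == "__main__": guard.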

Continue reading