Excellent post about Akka Streams — Akka Team
Really nice introduction to #Akka Streams! FEEL THE POWER! ^^ — Viktor Klang
A must-read on #Akka #Streams!!! — Ellan Vannin CA
In many computer programs, the whole logic (or a vast part of it) is essentially step-by-step processing of data. Of course, this includes the situation when we iterate over the data and just execute the processing logic on every piece of it. However, there are a couple of complications here:
- the processing logic may be quite complex, with various aggregations, merging, routing, error recoveries, etc.;
- we might want to execute steps asynchronously, for instance, to take advantage of multi-processor machines or use I/O;
- asynchronous execution of data processing steps inherently involves buffering, queues, congestion, and other matters that are really difficult to handle properly (please read “Handling Overload” by Fred Hébert).
Therefore, sometimes it is a good idea to express this logic at a high level using some kind of framework, without the need to implement the (possibly complex) mechanics of asynchronous pipeline processing. This was one of the rationales behind frameworks like Apache Camel or Apache Storm.
Actor systems like Erlang or Akka are fairly good for building robust asynchronous data pipelines. However, they are quite low-level by themselves, so writing such pipelines might be tiresome. Newer versions of Akka make it possible to do pipeline processing at a fairly high level with Akka Streams. Akka Streams grew out of the Reactive Streams initiative and implements a streaming interface on top of the Akka actor system. In this post I would like to give a short introduction to this library.
You might be interested in an introductory post about Akka in this blog.
We will need a Scala project with two dependencies:
    "com.typesafe.akka" %% "akka-actor" % "2.4.14",
    "com.typesafe.akka" %% "akka-stream" % "2.4.14"
Akka Streams basics
In Akka Streams, we represent data processing in the form of a data flow through an arbitrarily complex graph of processing stages. Stages have zero or more inputs and zero or more outputs. The basic building blocks are Sources (one output), Sinks (one input) and Flows (one input and one output). Using them, we can build arbitrarily long linear pipelines. An example of a stream with just a Source, one Flow and a Sink:
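As a minimal sketch of such a linear pipeline, assuming Akka 2.4.x and the two dependencies listed earlier, the following connects one Source, one Flow and one Sink. The names `LinearPipeline`, `numbers`, `doubled` and `summed` are hypothetical and chosen only for illustration:

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical example object; the names here are illustrative assumptions.
object LinearPipeline {

  def run(): Int = {
    // Streams are executed by actors, so we need an ActorSystem and a materializer.
    implicit val system: ActorSystem = ActorSystem("linear-pipeline")
    implicit val materializer: ActorMaterializer = ActorMaterializer()

    try {
      // Source: one output, emits the numbers 1 to 5.
      val numbers: Source[Int, NotUsed] = Source(1 to 5)
      // Flow: one input and one output, doubles every element.
      val doubled: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)
      // Sink: one input, folds all elements into their sum.
      val summed: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

      // Nothing runs until the graph is materialized with runWith.
      val result: Future[Int] = numbers.via(doubled).runWith(summed)

      // Blocking is acceptable in a small demo; avoid it in real code.
      Await.result(result, 5.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit =
    println(run()) // prints 30
}
```

The three stages are plain values that describe the processing; they can be defined separately and reused, and the stream only starts when `runWith` materializes it.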