Introducing Akka Cloud to Edge Continuum. Build once for the Cloud. Seamlessly deploy to the Edge - Read Blog
Support
akka reactive

Concurrent sharing of “data in motion” across clusters with CRDTs in Akka Distributed Data

If we used just our hands to count the differences between traditional “Full Stack” applications and the new generation of loosely-coupled, distributed “Reactive” systems, we’d quickly run out of fingers.

Among things like higher resiliency and better elasticity due to asynchronicity and distribution, Reactive systems are built for hybrid cloud environments, multi-core processing, in-memory computation, data streaming, experimenting using microservices instead of large monolithic app servers, and so on.  

But how do Reactive systems, namely those based on Akka Clusters employing the actor model, concurrently share data across nodes, especially in light of issues like network partitions? This is when one or more nodes in the cluster become unresponsive to pings, and must be handled appropriately. After all, the node that believes the others to be unreachable may in fact be the unreachable node, and it’s good to know this. Especially when data sharing is required to perform correctly, in examples like:

  • Shopping cart that is available from all nodes (e.g. Amazon, Walmart, Expedia, etc.)
  • Location registry of microservices (e.g. Netflix, Twitter, Gilt, etc.)
  • State tracking of distributed compute jobs (think Apache Spark in the Cloud)
  • Caching of data, which is owned by external services (e.g. Banks, Medical, etc)

If you’re unfamiliar with Akka actors, then this isn’t as simple as you might think. I sat down with Patrik Nordwall, a senior developer on the Akka team, to talk more about Akka Distributed Data, and its use of Conflict-free Replicated Data Types (CRDTs), a special data type designed specifically to resolve conflicts that arise during concurrent updates across a cluster in a deterministic way.

Why it is hard to share data in actor-based distributed systems?

In distributed Reactive systems, there is information that needs to be shared across the cluster–this can be done very fast, with eventual consistency across nodes, or slower, with consistent data across nodes. It really depends on your needs.

So image that you are a site like Amazon using an Akka Cluster-based application where the user shopping cart needs to be available from all nodes. You can approach sharing data across nodes in a couple ways, neither of which are silver unicorns:

  1. Publish messages via the distributed pub-sub utility. This solution comes with some problems, such as what to do with lost messages, and how to get old information to new members in the cluster.
  2. Push to a database. This will add more infrastructure to your system, and this option is unlikely to be able to notify you in near-real-time when something is changed with actors.

Another concern is how to deal with concurrent updates. What happens when two actors on two different nodes in the cluster update the state at the same time? What is the correct value? And anyone working with distributed systems will eventually have to deal with the dreaded network partition mentioned above, in which the actors cannot communicate for a while yet still need to be able read the latest known data and eventually update it from both sides of the network split.  

Luckily, CRDTs were invented specifically for these scenarios. Here’s how they work...

What are CRDTs and why should I care?

CRDTs are unique data types that always know how to resolve conflicts of concurrent updates in a deterministic way. That means that updates can be performed from different places without any coordination, even when a communication channel is unavailable. Later, the changes can be reconciled into an eventually consistent state.

One of the most trivial CRDTs is a set where you can only add elements. Add "a" to the set from node-1 and "b" from node-2. When the set containing "a" is sent from node-1 to node-2 it is merged with the set containing "b" by taking the union of the two sets, i.e. the resulting set contains "a" and "b".

It does not matter in which order this is performed or how many times it is repeated, the end result is always the same. This type of set is trivial to implement because it does not allow removal of elements, but there are other types of CRDT sets that also support the removal of elements.

In the image above, we can see an example that illustrates how CRDTs can be implemented in a counter. Instead of using one single counter value, it is internally using one counter value per node. When the counter is incremented from node-1 it increments the slot belonging to node-1. Node-2 has its own slot, and so on. When merging it takes the highest value of each slot. The total count is the sum of all slots.

So why should you care about this? Basically, without CRDTs and concurrent data sharing among actors in your distributed system, you won’t be able to ensure the eventual consistency of any data in your production systems.

Akka Distributed Data and “Eventual Consistency” with CRDTs

Akka Distributed Data is a module that provides several useful CRDTs for sets, maps, counters and registers. It takes advantage of Akka features for replicating the data across nodes in a cluster via direct replication and gossip-based dissemination; this makes it incredibly fast at handling small data sets and let’s you have fine-grained control of the consistency level for reads and writes––i.e. highly persistent (meaning slower, but consistent data...see Akka Persistence for this) or eventually consistent (meaning faster, with data eventually becoming consistent).

Akka Distributed Data is eventually consistent and geared toward providing high speed read-write actions with low latency and a tolerance for network partitions. Compared to a persistent system (i.e. which is what Akka Persistence is used for), an eventually consistent system runs processes much faster but may return a read with an out-of-date value, which will be updated when full responsiveness is regained.

The data is accessed with an actor providing a key-value-store-like API. Below is an example of an actor that schedules tick messages to itself and for each tick adds or removes elements from a ORSet (observed-remove set), additionally subscribing to changes of this:

 object DataBot {
  private case object Tick
}
 
class DataBot extends Actor with ActorLogging {
  import DataBot._
 
  val replicator = 
    DistributedData(context.system).replicator
  implicit val node = Cluster(context.system)
 
  import context.dispatcher
  val tickTask = context.system.scheduler.schedule(5.seconds, 
    5.seconds, self, Tick)
 
  val DataKey = ORSetKey[String]("key")
 
  replicator ! Subscribe(DataKey, self)
 
  def receive = {
    case Tick =>
      val s = ThreadLocalRandom.current().nextInt(97, 123)
        .toChar.toString
      if (ThreadLocalRandom.current().nextBoolean()) {
        // add
        log.info("Adding: {}", s)
        replicator ! Update(DataKey, 
          ORSet.empty[String], WriteLocal)(_ + s)
      } else {
        // remove
        log.info("Removing: {}", s)
        replicator ! Update(DataKey,
          ORSet.empty[String], WriteLocal)(_ - s)
      }
 
    case _: UpdateResponse[_] => // ignore
 
    case c @ Changed(DataKey) =>
      val data = c.get(DataKey)
      log.info("Current elements: {}", data.elements)
  }
 
  override def postStop(): Unit = tickTask.cancel()     
}

How to get started today

Akka Distributed Data is available in Akka 2.4.0-RC1 (or later) and the documentation contains more information of how to get started. Several interesting samples are included and described in the Typesafe Activator tutorial named Akka Distributed Data Samples with Scala and Akka Distributed Data Samples with Java. Examples include:

  • Low Latency Voting Service
  • Highly Available Shopping Cart
  • Distributed Service Registry
  • Replicated Cache
  • Replicated Metrics

Get help with Akka from the experts

  • Training - Fast Track or Advanced on-site training courses for Java or Scala
  • Consulting - Code Reviews, Architecture Reviews, Production Readiness Reviews
  • Expert support - Unlimited support queries, 1-1 interaction and up to 24/7 production SLA with Typesafe Reactive Platform

SCHEDULE A 15MIN CHAT

 

The Total Economic Impact™
Of Lightbend Akka

  • 139% ROI
  • 50% to 75% faster time-to-market
  • 20x increase in developer throughput
  • <6 months Akka pays for itself