How does the Riak_Core ring distribute vnodes

As I have already mentioned, Riak_Core is a great framework to use if you like Erlang and have a problem that can be solved with a masterless, distributed ring of hosts.

There are several introductory articles about riak_core, but I think the original introduction by Andy Gross, the Try Try Try blog by Ryan Zezeski and the Riak Core wiki remain the most useful resources for learning about it.

So why write this post, you might wonder? Because even after studying every single bit of information about riak_core multiple times, I still had tons of questions about how the ring actually worked. It might well be evidence of my stupidity, but it also says something about the quality of Basho’s documentation, which … could be better (in contrast with their Erlang code, which is beautiful). For me the best way to figure something out is to write it down, so what follows is more for me than for you – sorry.

So, here is the ring:

Each sector of the ring represents a *primary* partition – this is important: in a stable ring there are no *fallback* partitions. Typically, when you save data into the ring you use a replication factor (usually N=3); riak will pick a sector on the ring and then walk down the ring, picking up the next two sequential partitions – this is what is called the preflist (preference list). As you can see from the picture, these partitions live on different physical nodes, so your data gets replicated across multiple physical nodes. All of this is true if you followed Basho’s recommendations and set up your ring correctly.
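
To make this concrete, here is a minimal sketch of how a preflist is obtained from the node's Erlang console (riak_core_util:chash_key/1 and riak_core_apl:get_primary_apl/3 are real riak_core functions; the service name magma is just the example application used later in this post):

%% hash the {Bucket, Key} pair onto the ring
DocIdx = riak_core_util:chash_key({<<"ping">>, <<"test1">>}).
%% ask for the N (= 3) primary partitions responsible for this key
PrefList = riak_core_apl:get_primary_apl(DocIdx, 3, magma).
%% PrefList is a list of {{Partition, Node}, primary} tuples: three
%% consecutive partitions on the ring, usually on different physical nodes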

So, question number one – what happens at startup of each physical node in the ring? I mean, you built the ring and committed the changes, so each physical node knows exactly which partitions belong to it, right?

In reality, each physical node starts up *all* the vnodes configured for the ring. If the ring was set up with 64 partitions, each node will create 64 vnodes on startup. It is easy to check – set the lager logging level to “debug” and restart one of the nodes in the ring; you will see that it starts vnodes for all the partitions:

21:47:27.551 [debug] Will start VNode for partition 0
21:47:27.561 [debug] Will start VNode for partition 548063113999088594326381812268606132370974703616
21:47:27.562 [debug] Will start VNode for partition 91343852333181432387730302044767688728495783936

After that, the newly started node exchanges metadata with the rest of the ring and starts an ownership handoff. When this is done the node keeps only the vnodes for the primary partitions it owns – the other vnode processes are shut down.

Basically, this means you need to be careful if you have logic in your vnode’s init function that must execute only once, for the primary partition, on startup.
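
For example, a hedged sketch of how such a check could look in init/1 (riak_core_ring_manager:get_my_ring/0 and riak_core_ring:index_owner/2 are real riak_core functions; one_off_setup/1 and the map-based state are hypothetical placeholders):

init([Partition]) ->
    {ok, Ring} = riak_core_ring_manager:get_my_ring(),
    case riak_core_ring:index_owner(Ring, Partition) =:= node() of
        true  -> one_off_setup(Partition);   %% runs only on the primary owner
        false -> ok                          %% started transiently / as a fallback
    end,
    {ok, #{partition => Partition}}.

Note that on a fresh cluster the ring ownership may still be settling when init/1 runs, so treat this as a sketch rather than a guarantee.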

The next question – where do the *fallback* partitions come from?

You have your ring up and ready; then something goes wrong and one of your physical nodes dies. The docs say your data will go to a *fallback* node, but where is it and what is it?

When I initially read the docs I got the impression that one of the remaining *primary* partitions would step in and accept the data until the failed node recovered. This is wrong.

Let’s see what really happens.

I am running a 3-node cluster and asking for the 3 primary partitions to send a ping command to:

(node1@iMac.home)8> riak_core_apl:get_apl(riak_core_util:chash_key({<<"ping">>, <<"test1">>}), 3, magma).

Now, let’s kill node3, which holds partition 913438523331814323877303020447676887284957839360,

and run the same command on the same node:

(node1@iMac.home)10> riak_core_apl:get_apl(riak_core_util:chash_key({<<"ping">>, <<"test1">>}), 3, magma).

OK, now we get a very similar list, but the ring says that partition 913438523331814323877303020447676887284957839360 now lives on node1 instead of node3.

Really? Let’s check: observer on node1 still shows only the 6 original vnodes the node had before.

[screenshot: observer on node1 showing the original 6 vnode processes]

So, let’s run a ping against {913438523331814323877303020447676887284957839360, ‘node1@iMac.home’}:

(node1@iMac.home)12> riak_core_vnode_master:sync_spawn_command({913438523331814323877303020447676887284957839360, 'node1@iMac.home'}, ping, magma_vnode_master).
00:03:34.079 [debug] Will start VNode for partition 913438523331814323877303020447676887284957839360
00:03:34.080 [debug] vnode :: magma_vnode/913438523331814323877303020447676887284957839360 :: undefined
00:03:34.080 [debug] Started VNode, waiting for initialization to complete <0.3531.0>, 913438523331814323877303020447676887284957839360
00:03:34.080 [debug] VNode initialization ready <0.3531.0>, 913438523331814323877303020447676887284957839360

This is interesting: riak_core *has started* a new vnode process for us, and this vnode replied to the ping.

And sure enough, now we can find this process in observer:

[screenshot: observer on node1 now showing the extra fallback vnode process]

So, this is the *fallback* vnode for the failed primary partition: it has the same partition number but lives on a different physical node, and it got created auto-magically on request.

Let’s see what happens when we bring node3 back to life.

It takes a few moments for node3 to rejoin the cluster and for the ring to gossip metadata, but once that is done the ring realises that we now have two vnodes for the same partition. This triggers a *hinted* handoff, and data (if there was any) is moved from node1 to node3, which hosts the primary partition again. After that, the *fallback* vnode on node1 gets deleted:

00:12:13.110 [debug] completed metadata exchange with 'node3@iMac.home'. nothing repaired
00:12:15.344 [debug] 913438523331814323877303020447676887284957839360 magma_vnode vnode finished handoff and deleted.
00:12:15.344 [debug] vnode hn/fwd :: magma_vnode/913438523331814323877303020447676887284957839360 :: undefined -> 'node3@iMac.home'
00:12:15.345 [debug] 913438523331814323877303020447676887284957839360 magma_vnode vnode excluded and unregistered.
00:12:23.113 [debug] started riak_core_metadata_manager exchange with 'node2@iMac.home' (<0.5662.0>)
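
The data movement you see in these logs is driven by the handoff callbacks of the riak_core_vnode behaviour. A schematic sketch (the callback names are the real behaviour callbacks and ?FOLD_REQ comes from riak_core_vnode.hrl, but the #state{} record, its store field and the dict-based storage are purely illustrative):

handoff_starting(_TargetNode, State) ->
    {true, State}.                          %% agree to hand our data off

handle_handoff_command(?FOLD_REQ{foldfun=FoldFun, acc0=Acc0}, _Sender, State = #state{store = Store}) ->
    %% fold over the local data; riak_core ships each folded item to the target vnode
    Acc = dict:fold(FoldFun, Acc0, Store),
    {reply, Acc, State};
handle_handoff_command(_Req, _Sender, State) ->
    {noreply, State}.

encode_handoff_item(Key, Value) ->
    term_to_binary({Key, Value}).

handle_handoff_data(Bin, State = #state{store = Store}) ->
    {Key, Value} = binary_to_term(Bin),
    {reply, ok, State#state{store = dict:store(Key, Value, Store)}}.

delete(State) ->
    %% called on the fallback once handoff completes: drop the local copy
    {ok, State#state{store = dict:new()}}.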

This is pretty cool and now makes perfect sense.

Hopefully this little post will help somebody learn a little bit more about riak_core.

UPDATE – shortly after I published this post, Valery Meleshkin (@sum3rman) pinged me and sent some awesome slides from an internal talk at his company about “Riak_Core Concepts and Misconceptions”. With his kind permission I am adding these slides here: riak_core


How to add Riak_Control to a custom Riak_Core application

Riak_Core is an Erlang-based implementation of the Amazon Dynamo model, written and open-sourced by Basho. Like everything written by Basho, this framework is super awesome and hugely beneficial for anyone interested in writing distributed applications. It abstracts away tons of complicated issues that are typical for any distributed system.

In my opinion, Riak_Core provides the next level of abstraction on top of Erlang/OTP and makes it relatively straightforward to build a project if you have a task that fits nicely into the masterless, distributed model that Dynamo gives you.

Riak_Control is an application that provides a web GUI dashboard for Riak. It lets you monitor the health of the Riak ring and add / remove nodes, and it is a basic but useful management tool for Riak.

Riak_Control is a generic app and can be used with any Riak_Core based system. To integrate it with your own app you just need to change a few configuration settings.

First, add riak_control and riak_api to your rebar.config:

{riak_control, ".*", {git, "", {tag, "2.1"}}},
{riak_api, ".*", {git, "", {tag, "2.1"}}}

riak_control uses mochiweb and webmachine, so these dependencies will be pulled in automatically. It would be very nice if Basho allowed users to choose which web server to use (I would much prefer Cowboy because it also gives you websockets, which mochiweb lacks), but currently that is not an option.

Add these apps to your application manifest so they are auto-started before your app starts:

{applications, [
    kernel,
    stdlib,
    riak_core,
    riak_api,
    riak_control
]},

Add entries for riak_api and riak_control into your app config file:

 {riak_api, [
   %% Other configs
   {http, [ {"", 8199} ]}
 ]},

 {riak_control, [
   {enabled, true}
 ]}

And that’s it.

When you build a release for your app and start it (either on a single node or on multiple nodes), you should be able to go to http://localhost:8199/admin#/cluster and see something like this:

[screenshot: Riak Control cluster dashboard]

You can now use this dashboard to add / remove nodes for your app, which is very convenient.


How to set up Prolog mode for Emacs on OS X

Erlang programmers are lucky because for them there is no question which language to learn next. Of course it should be Prolog – after all, Erlang is “a bastard child of Scheme and Prolog”. And of course hard-core Erlang programmers use Emacs :-). Are there any other text editors?

So, here are the steps to get SWI-Prolog working with Emacs on OS X.

1. Install SWI-Prolog with brew:

brew install swi-prolog 

I had a problem with brew after upgrading to Yosemite:

/usr/local/bin/brew: /usr/local/Library/brew.rb: /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby: bad interpreter: No such file or directory
/usr/local/bin/brew: line 23: /usr/local/Library/brew.rb: Undefined error: 0

If you get the error above, just edit the file “/usr/local/Library/brew.rb” so it points to the Current version of the Ruby interpreter:

#!/System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -W0

2. Install the prolog-el mode. I use el-get, so M-x el-get-install prolog-el did the trick.

3. Add the following into your init.el:

  (autoload 'run-prolog "prolog" "Start a Prolog sub-process." t)
  (autoload 'prolog-mode "prolog" "Major mode for editing Prolog programs." t)
  (autoload 'mercury-mode "prolog" "Major mode for editing Mercury programs." t)
  (setq prolog-system 'swi)
  (setq auto-mode-alist (append '(("\\.pl$" . prolog-mode)
                                  ("\\.m$" . mercury-mode))
                                auto-mode-alist))
To test the installation, open a new buffer, switch to Prolog mode with M-x prolog-mode and press C-c RET. This should open a Prolog interpreter in an inferior buffer. C-c C-b lets you evaluate (“consult”) a buffer of Prolog code, and C-h b lists all the key bindings available in Prolog mode.


[screenshot: Emacs running an inferior SWI-Prolog session]


How to use an Erlang application release with multiple configurations

In my opinion, configuring and building releases for Erlang applications has always been the most obscure and misunderstood part of the Erlang ecosystem. For me personally, building a release would always take a day or two because I could never get all the settings right in reltool.config, and the release invariably failed with the most unhelpful error messages.

The only Erlang book that explained in detail how Erlang releases were supposed to work was Erlang and OTP in Action.

Fortunately, one of the authors of this book, Eric Merritt, wrote a great tool, relx, and building releases became much simpler. Eric gave a talk about it. If you use Erlang you definitely need to check out relx – it will make your life much easier.

But some issues still remained. relx.config allows you to specify the sys.config and vm.args files for your release, which set the configuration for the VM (e.g. the name of the node via -sname or -name, the cookie, etc.), and some of these settings could not be changed if you wanted to run multiple instances of your release. For example, overriding the -name parameter via a command line argument wasn’t possible.

This was massively inconvenient, as you would need to build multiple releases of your application if, for example, you needed to run a cluster of several nodes. In that case you would use overlays in rebar.config to override some parameters of the release and generate a separate release for each node. You can find an example of such a configuration here.

Tristan Sloughter fixed this issue with one of his many additions to relx.

His solution (which is unfortunately not covered in the official relx documentation yet) makes use of an additional environment variable, RELX_REPLACE_OS_VARS. If this variable is set to true, the start-up script generated by relx will replace the placeholders in the vm.args and sys.config files with the values of the matching environment variables. It then creates vm2.args and sys2.config (with the placeholder values substituted) and starts your application with these config files instead of the original ones. This allows you to run the same release with multiple configurations and override settings for PROD and QA, for example.

For example, say you want to override the name of the node and the cookie.

In your relx.config you need to have the following option:

{extended_start_script, true}.

In your vm.args you add the placeholders that you want to replace:

-name ${NODE_NAME}
-cookie ${COOKIE}
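
The same placeholder syntax works in sys.config. A minimal sketch (the application name my_app and the http_port key are hypothetical; note that the substituted value arrives as a string, so your application has to convert it if it expects an integer):

[
 {my_app, [
    {http_port, "${HTTP_PORT}"}    %% replaced with the value of $HTTP_PORT
 ]}
].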

Then in your production start script you export the matching variables (including RELX_REPLACE_OS_VARS itself) and start the release:

export RELX_REPLACE_OS_VARS=true
export NODE_NAME=node_prod
export COOKIE=prod
exec _rel/your_app/bin/your_app foreground "$@"

And in your QA start script:

export RELX_REPLACE_OS_VARS=true
export NODE_NAME=node_qa
export COOKIE=qa
exec _rel/your_app/bin/your_app foreground "$@"

Make sure you generate your release with relx version 1.1.0 or later.

Kudos to Tristan for this great solution. I asked the question on this issue, otherwise I might never have found this feature.


How to learn to type on the Kinesis Advantage

I have recently got myself a new keyboard – the Kinesis Advantage.

I won’t go into the advantages and disadvantages of using it at length – you can find those discussions elsewhere. So far I am loving it: mechanical keys are definitely the way to go. I am a heavy Emacs user, and the ability to reprogram the key bindings is a massive plus.

But this post is about a little life hack. This keyboard will make you cry for at least a couple of weeks until you relearn to type, because none of the key chords your fingers remember will work anymore.

You can find some training material, but it is in paper format. A better way to use it is to feed the exercises into a touch-typing application.

Fortunately, TIPP10 is a perfect app for this, as it allows you to use your own exercises.

You can download the lesson files from here.


How to manage public and private Puppet modules with Vagrant

If you use Puppet for provisioning Vagrant VMs, chances are you want to use some standard public Puppet modules, which can be found on Puppet Forge, and the question becomes how to get them installed as part of the provisioning process. You can’t just download them as part of your Puppet provisioning, because the Puppet catalog will not compile – the modules will be missing at the time of the Puppet run. It is a chicken-and-egg situation.

You can, of course, manually download the modules from the Forge and store them in your project’s repository in the puppet/modules directory, alongside your own Puppet code. This is rather messy: you have to maintain the dependencies manually, and it makes your repository bigger than necessary. There has to be a better way of doing this.

I did some reading and found a few ideas on how to deal with this issue. One possibility is to use librarian-puppet. Once installed, librarian-puppet lets you specify the list of Puppet modules your infrastructure depends on in a Puppetfile, and it manages the installation of these modules and their dependencies for you.

The trick is that you have to install librarian-puppet and make it install the modules from the Forge *before* you run your Puppet provisioner, otherwise Puppet will complain about missing modules while compiling the catalog. So we need to use a shell provisioner to get librarian-puppet installed and then use it to install the modules from the Forge. After that you can safely use the Puppet provisioner to run your own Puppet code, knowing that all the dependencies your code needs are already installed.

Here are the steps:

In your Vagrantfile add the following line:

  # install librarian-puppet and run it to install puppet common modules.
  # This has to be done before puppet provisioning so that modules are available
  # when puppet tries to parse its manifests
  config.vm.provision :shell, :path => "provision/shell/"

It should be added *before* any puppet provisioners.

This will run the shell script, which for me lives in the provision/shell directory and looks like this (I borrowed this script from this project – most of the ideas in this post come from it):

#!/usr/bin/env bash

# Directory in which librarian-puppet should manage its modules directory
PUPPET_DIR=/vagrant/provision/puppet

# NB: librarian-puppet might need git installed. If it is not already installed
# in your basebox, this will manually install it at this point using apt or yum
$(which git > /dev/null 2>&1)
FOUND_GIT=$?
if [ "$FOUND_GIT" -ne '0' ]; then
  echo 'Attempting to install git.'
  $(which apt-get > /dev/null 2>&1)
  FOUND_APT=$?
  $(which yum > /dev/null 2>&1)
  FOUND_YUM=$?

  if [ "${FOUND_YUM}" -eq '0' ]; then
    yum -q -y makecache
    yum -q -y install git
    echo 'git installed.'
  elif [ "${FOUND_APT}" -eq '0' ]; then
    apt-get -q -y update
    apt-get -q -y install git
    echo 'git installed.'
  else
    echo 'No package installer available. You may need to install git manually.'
  fi
else
  echo 'git found.'
fi

if [ "$(gem search -i librarian-puppet)" = "false" ]; then
  gem install librarian-puppet
  cd $PUPPET_DIR && librarian-puppet install --path modules-contrib
else
  cd $PUPPET_DIR && librarian-puppet update
fi

The script cd-es into your provision/puppet directory (which is mounted into the Vagrant VM as /vagrant/provision/puppet) and installs librarian-puppet. It then reads the Puppetfile and installs the Puppet modules defined there into the provision/puppet/modules-contrib directory on your host machine. librarian-puppet installs modules into directories relative to the location of the Puppetfile, and you can specify the name of the root directory by passing the --path parameter to librarian-puppet when it is run for the first time:

librarian-puppet install --path modules-contrib

The Puppetfile just gives the location of the Forge and lists the modules you want to pull from it (the format is flexible enough to pull code repos from GitHub as well):

# Puppetfile
# Configuration for librarian-puppet. For example:
forge ""
mod "garethr/docker"
mod "camptocamp/archive"
mod "puppetlabs/vcsrepo"
mod "maestrodev/wget"
mod "puppetlabs/git"

And finally, in the Puppet provisioner configuration in your Vagrantfile you need to define “module_path” and list the locations of both your custom Puppet modules and the public ones installed by librarian-puppet:

  # Provide basic configuration, install git
  config.vm.provision "puppet" do |d|
    d.manifests_path = 'provision/puppet/manifests'
    d.manifest_file = 'site.pp'
    d.module_path = [ 'provision/puppet/modules-contrib', 'provision/puppet/modules' ]
    #d.options = "--verbose --debug"
  end

Make sure that an empty “modules-contrib” folder exists in your project; it needs to be there when you run ‘vagrant up’, otherwise Vagrant won’t be able to mount this folder into your VM.

Git doesn’t let you keep empty directories, but you can get around this by adding the following .gitignore to the modules-contrib folder:

# Ignore everything in this directory
*
# Except this file
!.gitignore

How to make Vagrant and Puppet clone a private GitHub repo

I have been playing with Vagrant and Puppet lately to automate my home development environment and got to the point where I needed to clone and build code from several private GitHub repos. I ended up spending two days making this work, as it turned out to be more difficult than I expected.

1. To start with, you need a working SSH private/public key pair on your host computer. The public key needs to be added to your private GitHub repo. Follow these instructions to set it up and test it.

2. Now you have a private key on your machine and the public key sitting on the GitHub side. You also want to enable SSH agent forwarding to save yourself from typing in the passphrase associated with your private key every time you connect to GitHub. Here are the instructions on how to enable SSH agent forwarding.

3. At this point you probably want your Vagrant provisioning process to be able to log in to your private GitHub repo as well. There are two ways of doing this:

  1. You can set up a Vagrant task that finds your private and public keys and copies them over to the Vagrant VM (into /root/.ssh, because Vagrant runs provisioning as root). This is possible but not very convenient.
  2. A much better way is to let the Vagrant VM use the already existing SSH keys from your host machine. This allows you to share your Vagrantfile with other developers who have access to your private repo, and they will be able to use their own SSH keys.

Add the following to your Vagrantfile:

 config.ssh.private_key_path = [ '~/.vagrant.d/insecure_private_key', '~/.ssh/id_rsa' ]
 config.ssh.forward_agent = true

You also need to add the github.com hostname to the list of SSH known hosts in your Vagrant VM.

The problem is that even if you enable SSH forwarding from your Vagrant VM, when the cloning job makes its first connection to GitHub it will get the following prompt and fail:

RSA key fingerprint is 16:27:ac:a5:7c:28:2d:36:63:2b:56:4d:eb:df:a6:48.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ',' (RSA) to the list of known hosts.

One way around this is to set the option StrictHostKeyChecking=no for SSH, but this opens you up to a certain security risk. A better way is to add github.com to /root/.ssh/known_hosts as part of provisioning. You can do it with the following rule in your Vagrantfile, which needs to be executed before you make an SSH connection to GitHub:

  # add github to the list of known_hosts
  config.vm.provision :shell do |shell|
    shell.inline = "mkdir -p $1 && touch $2 && ssh-keyscan -H $3 >> $2 && chmod 600 $2"
    shell.args = %q{/root/.ssh /root/.ssh/known_hosts "github.com"}
  end

Or (better!) use the following Puppet module:

# -*- mode: ruby -*-
# vi: set ft=ruby :

class known_hosts( $username = 'root' ) {
    $group = $username
    $server_list = [ 'github.com' ]

    file{ '/root/.ssh' :
      ensure => directory,
      group  => $group,
      owner  => $username,
      mode   => 0600,
    }

    file{ '/root/.ssh/known_hosts' :
      ensure  => file,
      group   => $group,
      owner   => $username,
      mode    => 0600,
      require => File[ '/root/.ssh' ],
    }

    file{ '/tmp/' :
      ensure => present,
      source => 'puppet:///modules/known_hosts/',
    }

    exec{ 'add_known_hosts' :
      command  => "/tmp/",
      path     => "/sbin:/usr/bin:/usr/local/bin/:/bin/",
      provider => shell,
      user     => 'root',
      require  => File[ '/root/.ssh/known_hosts', '/tmp/' ],
    }
}
This module will create the /root/.ssh directory and execute the following bash script:

#!/usr/bin/env bash

array=( 'github.com' )
for h in "${array[@]}"
do
    #echo $h
    ip=$(dig +short $h)
    ssh-keygen -R $h
    ssh-keygen -R $ip
    ssh-keyscan -H $ip >> /root/.ssh/known_hosts
    ssh-keyscan -H $h >> /root/.ssh/known_hosts
done

You need to add this bash script to the ‘files’ directory of your module; for me it lives in “puppet/modules/known_hosts/files/”.

All you need now is a rule to clone the GitHub repo. I am using the Puppet module vcsrepo, and it looks like this:

    vcsrepo { "/opt/code/${repo}":
      ensure => latest,
      owner => $username,
      group => $username,
      provider => git,
      require => [ Package[ 'git' ] ],
      source => "<your account name>/<your project name>.git",
      revision => 'master',

Here we clone from GitHub into the local Vagrant directory "/opt/code/${repo}", and this directory will be owned by the username that you define.

You can also add a rule to build your code:

    exec{ 'make':
      command     => "make",
      environment => "HOME=/home/${username}",
      cwd         => "/opt/code/${repo}",
      path        => "/sbin:/usr/bin:/usr/local/bin/:/bin/",
      require     => [ Vcsrepo[ "/opt/code/${repo}" ], Package[ 'erlang' ] ],
    }

Hopefully these instructions will save you some time.
