Resource Discovery in Erlang

It is interesting and surprising how some fundamental things are missing from the Erlang/OTP library. Of course it is impossible to include everything in OTP, but some things are almost obvious. In my opinion, Ulf Wiger's gproc library is one such example: I use it all the time to give a process a name and then refer to it by that name rather than by its PID. It is easy to do with gproc (without having to make the process a registered one), and the project's code quality is superb (not surprising given who the author is). Hopefully gproc will make it into OTP one day.
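As a quick illustration, here is a minimal sketch of the naming pattern I mean, using gproc's local unique names (the atom `my_worker` is just a placeholder name I made up):

```erlang
%% Inside the process you want to name (e.g. in a gen_server's init/1):
gproc:reg({n, l, my_worker}),          %% {n, l, Name} = unique (n) local (l) name

%% Later, from any other process on the same node,
%% look the process up by name instead of passing its pid around:
Pid = gproc:where({n, l, my_worker}),
Pid ! hello.
```

`gproc:where/1` returns `undefined` if nothing is registered under that name, so a real caller would usually pattern-match on the result.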

Another example is the resource discovery problem in an Erlang cluster. It is not easy: if you have a system made up of multiple components, you more or less have to hardcode the names of the nodes that provide a service of a given type. And what if you have new nodes entering and exiting your cluster all the time? I don't think OTP addresses this problem easily. Fortunately, Martin Logan showed a possible solution in the book "Erlang and OTP in Action". Chapter 8 contains an example of a simple resource discovery protocol. Apart from the fact that "Erlang and OTP in Action" is a great book in its own right, it is worth buying for this chapter alone.

Martin implemented the initial idea; I and a few other people made some additions to it, and it is now available on GitHub as resource_discovery.

I think it is as useful as Ulf's gproc. The idea is simple: you have nodes in your cluster which provide services (e.g. a logger, a webserver, a task worker, etc.), and there are nodes which need to consume such services (e.g. send a log message to one of 10 different loggers). This is where you need an automatic resource discovery mechanism: instead of picking a node name from some config file, you ask a question, "give me a resource of type 'logger'" or whatever, and the system replies with a list of all resources of that type. A resource could be a node name or a process PID; it doesn't matter. The important thing is that it is all dynamic: if you need to add extra task workers to your cluster, you just do it, and the resource discovery protocol will learn that you have new nodes providing the 'worker' service. The same thing happens when services drop off: one of the 10 loggers could disappear from the cluster, and it will be purged from resource discovery automatically.

If you decide to use it, you need to add resource_discovery as a dependency to your project:

{deps, [
    {resource_discovery, ".*", {git, "", "master"}}  %% git URL of the repo goes here
]}.

You need to start the resource_discovery application; I usually do it as part of my start/0 function in the _app.erl module:

-define(APPS, [lager, resource_discovery, example]).

%% Application callbacks
-export([start/0, start/2, stop/1]).

%% ===================================================================
%% Application callbacks
%% ===================================================================

start() ->
    [begin application:start(A), io:format("~p~n", [A]) end || A <- ?APPS].

start(_StartType, _StartArgs) ->
    lager:info("starting example on a node ~p", [node()]),
    example_sup:start_link().  %% start the application's top supervisor

stop(_State) ->
    ok.

Then, in the init/1 function of the process which provides the service, you announce that you have a service of the given type by adding it to resource discovery:

resource_discovery:add_local_resource_tuple({worker, self()}),

You can also register your interest in a service of another type that some other resource in the cluster provides, and then trigger resource synchronization:

resource_discovery:add_target_resource_type(logger),
resource_discovery:trade_resources(),

Here is a possible example of an init function:

init([]) ->
    process_flag(trap_exit, true),
    lager:info("starting task server on: ~p", [node()]),
    %% announce via resource_discovery that we have an available resource
    resource_discovery:add_local_resource_tuple({worker, self()}),
    %% add a request for 'logger' resources provided by other nodes
    resource_discovery:add_target_resource_type(logger),
    %% synchronize resources across the cluster
    resource_discovery:trade_resources(),
    {ok, #state{}}.

Now, if you need to find the PID of a 'worker' resource from another node and use it (e.g. by sending it a message with a task), you can ask how many such resources exist, get all of them, or get a single one:

 NofResources = resource_discovery:get_num_resource('worker'),
 AllWorkers = resource_discovery:get_resources('worker'),
 SingleWorker = resource_discovery:get_resource('worker')
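As a sketch of how the result might be used, assuming 'worker' resources were registered as pids (as in the init example above) and that get_resource/1 returns `{ok, Resource}` or an error when nothing is registered; the task message format is a made-up placeholder:

```erlang
%% Pick one 'worker' resource and hand it a task.
case resource_discovery:get_resource(worker) of
    {ok, WorkerPid} ->
        WorkerPid ! {task, crunch_numbers};
    _NotFound ->
        lager:warning("no 'worker' resources available in the cluster")
end.
```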

And when your worker leaves the cluster, you can clean up the resource from the global resource registry:

terminate(_Reason, _State) ->
    %% make the 'worker' resource unavailable for other clients
    resource_discovery:delete_local_resource_tuple({worker, self()}),
    lager:info("worker is shutting down on node ~p", [node()]),
    ok.

I find the resource_discovery app hugely useful and hope that somebody else feels the same.

This entry was posted in Erlang, Resource Discovery.

9 Responses to Resource Discovery in Erlang

  1. Pingback: Community | Mostly Erlang

  2. Could it be used to make some kind of "load balancer" between resources? For instance, we could ask to send a message to a worker of some type (resource type), and it would select the resource based on some algorithm (LRU, or round-robin). Is there some kind of project that does this?

    • Yes. In this scenario you would register your workers as the "worker" resource type, and when you have a task to execute, you ask resource discovery for an instance of a "worker" resource. You will get a pid of a worker or its name, depending on your definition of the "worker" resource, and then just send the task to that pid for execution. Of course, as an alternative you might choose to use RabbitMQ, for example: send your tasks to a queue and have your workers read from it.
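    A minimal sketch of this dispatch idea, using a random pick rather than true LRU or round-robin (it assumes workers were registered as pids, as in the init example above, and that at least one worker is registered; the task term is a placeholder):

    ```erlang
    %% Choose one of the registered 'worker' pids at random and send it a task.
    %% Note: lists:nth/2 will crash if Workers is empty, so a real caller
    %% should handle the empty-list case.
    Workers = resource_discovery:get_resources(worker),
    Worker = lists:nth(rand:uniform(length(Workers)), Workers),
    Worker ! {task, crunch_numbers}.
    ```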

  3. Pingback: 015 Languages With Robert Virding | Mostly Erlang

  4. This is a subject that I am also interested in. But I am not sure how resource_discovery would behave in a production environment. Is this something that is used in production by anybody? Maybe they can share their experience with it.
    I am also wondering if riak_core could be used as a resource discovery backend? It might be very helpful for big clusters.

    • Riak_core does implement a gossip protocol, but I haven't looked at it in detail, and if you want riak_core just for resource discovery, it might be overkill. I have yet to test performance with large clusters; the protocol might be a bit chatty. Also, I am discussing some improvements with Martin and Eric now, related to a more robust boot-up sequence: currently RD crashes if it can't ping any of the specified nodes, so you need to add the local node to the list of nodes to ping. Martin is going to introduce a config option not to crash in the absence of reachable nodes.

  5. odo says:

    This looks neat! Since it requires the Erlang nodes to be configured, it would benefit from a no-hands node discovery mechanism like the framework linked here.
    That would give zeroconf-style behaviour.

    • Nice. It looks like this framework uses multicast to discover nodes, which might be problematic in environments where multicast is banned. I think Martin Logan has already investigated the use of multicast.

      • odo says:

        You are right. We occasionally have problems when hosting providers have some `special` network setup. Trying it on EC2 would be interesting.
