Joe's spitting in the sawdust Erlang tutorials
Tutorial number 2
Last edited 2003-02-10
A webserver daemon

This tutorial shows you how to build a simple web server. All the code is here.

The web server runs as a system daemon. About half of this tutorial is concerned with setting up a system daemon. The other half is about the design and implementation of a simple web server. The web server is extremely simple (only three modules), but it does illustrate a few common programming techniques. As an added extra I also show how to test the code.

Please report all errors, omissions or improvements to the author.

1. Design of a web server
2. Making a system daemon
    2.1. Red Hat 7.3
    2.2. Detached mode
    2.3. Heartbeat mode
3. Increasing reliability

The problem

1. Design of a web server

The web server is typical of a large number of programs. It involves the interaction between Erlang and some entity operating in the outside world.

In order to do this in a consistent manner we write a device driver which we use to interface the external world with Erlang.

As far as an Erlang process is concerned, all other objects in its universe are Erlang processes. The only thing that an Erlang process knows how to do is send messages to, and receive messages from, other Erlang processes.

A web server is therefore a simple process that receives a message containing a request to read a page and which responds by sending that page to the process which requested the page.

The code for a simple web-server is something like:

    
    receive
        {Client, Request} ->
            Response = generate_response(Request),
            Client ! {self(), Response}
    end.
    

Here Request is an Erlang term representing a parsed HTTP request, and Response is an Erlang term representing an HTTP response. We also need some deep trickery to arrange that one instance of this process is started for each incoming request. The deep trickery is done in tcp_server.erl.
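
The "one process per incoming connection" arrangement can be sketched roughly as follows. This is not the code from tcp_server.erl: the module name accept_sketch, its function names and the echo handler are all assumptions made for illustration, and there is no error handling. A real server would hand the socket to the HTTP driver instead of echoing.

```erlang
-module(accept_sketch).
-export([start/1]).

%% Hypothetical sketch, NOT the contents of tcp_server.erl.
%% start(Port) opens a listening socket in a new process and returns
%% {ok, AcceptorPid, ActualPort}. Passing 0 lets the OS pick a free port.
start(Port) ->
    Parent = self(),
    Acceptor = spawn(fun() ->
        {ok, Listen} = gen_tcp:listen(Port, [binary, {packet, raw},
                                             {active, false},
                                             {reuseaddr, true}]),
        {ok, PortNo} = inet:port(Listen),
        Parent ! {self(), PortNo},
        accept_loop(Listen)
    end),
    receive
        {Acceptor, ActualPort} -> {ok, Acceptor, ActualPort}
    end.

%% Accept connections forever, spawning one handler process per socket.
accept_loop(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    Handler = spawn(fun() -> receive go -> handle(Socket) end end),
    %% Hand ownership of the socket to the handler before it starts work.
    ok = gen_tcp:controlling_process(Socket, Handler),
    Handler ! go,
    accept_loop(Listen).

%% Stand-in for the real per-connection work: echo one chunk and close.
handle(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Bin}  -> gen_tcp:send(Socket, Bin);
        {error, _} -> ok
    end,
    gen_tcp:close(Socket).
```

The handler waits for the go message so that it never touches the socket before ownership has been transferred to it.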

The above server is pretty simple, but it can only handle a single request. HTTP/1.1 persistent connections could be handled as follows:

    
    loop(Client) ->
        receive
            {Client, close} ->
                true;
            {Client, Request} ->
                Response = generate_response(Request),
                Client ! {self(), Response},
                loop(Client)
        after 10000 ->
            true
        end.
    

This 11 line function handles HTTP/1.1 persistent connections, and does data streaming etc. The entire web server code is in web_server.erl.
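
To try the loop in isolation, it can be wrapped in a small module. The following is a minimal sketch, not the code in web_server.erl: the module name persistent_demo, the helper functions start/0, request/2 and close/1, and the stub generate_response/1 (with its made-up {get, Path} request format) are assumptions made for illustration.

```erlang
-module(persistent_demo).
-export([start/0, request/2, close/1]).

%% Hypothetical wrapper: spawn the loop with the caller as its only client.
start() ->
    Client = self(),
    spawn(fun() -> loop(Client) end).

%% Send one request and wait for the matching response.
request(Server, Request) ->
    Server ! {self(), Request},
    receive
        {Server, Response} -> Response
    after 1000 ->
        timeout
    end.

%% Ask the server loop to terminate.
close(Server) ->
    Server ! {self(), close},
    ok.

%% The loop from the tutorial: serve requests until the client closes
%% the connection or stays silent for 10 seconds.
loop(Client) ->
    receive
        {Client, close} ->
            true;
        {Client, Request} ->
            Response = generate_response(Request),
            Client ! {self(), Response},
            loop(Client)
    after 10000 ->
        true
    end.

%% Stub response generator; the request format is invented for this demo.
generate_response({get, Path}) ->
    {200, "You asked for " ++ Path}.
```

Several calls to request/2 can be made against the same server process before close/1 is called, which is the behaviour a persistent connection needs.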

Now recall that HTTP requests are not simple Erlang terms, but are actually TCP streams, which obey an ad hoc syntax and grammar. Just to make life even more interesting, the TCP streams can be arbitrarily segmented.

For this reason we introduce an intermediary process (called a middle man). The middle man is a process whose only job is to recombine fragmented TCP packets, parse the packets assuming they are HTTP requests, and send the requests to the web server. This is shown below:

The structure of the http driver is simple:

    
    relay(Socket, Server, State) ->
        receive
            {tcp, Socket, Bin} ->
                Data = binary_to_list(Bin),
                parse_request(State, Socket, Server, Data);
            {tcp_closed, Socket} ->
                Server ! {self(), closed};
            {Server, close} ->
                gen_tcp:close(Socket);
            {Server, {data, Data}} ->
                gen_tcp:send(Socket, Data),
                relay(Socket, Server, State);
            {'EXIT', Server, _} ->
                gen_tcp:close(Socket)
        end.
    

If a packet comes from the client via a TCP socket, it is parsed by calling parse_request/4. If a message comes from the server, it is sent directly to the client. If either side terminates the connection, or an error occurs in the server, the connection is closed down. If this process terminates for any reason, all the connections are automatically closed down (to see why this is so you should examine the link structure of the program).

The variable State is a state variable representing the state of a simple re-entrant parser that is used to parse the incoming HTTP requests.

This code is in http_driver.erl.
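
The idea of carrying parser state between TCP segments can be illustrated with a much smaller example. The sketch below is not the parser in http_driver.erl: the module name header_collector and its functions are hypothetical. It only collects bytes until the blank line that ends an HTTP header, whereas the real driver also parses the request. It works on binaries, while the tutorial's driver converts to lists.

```erlang
-module(header_collector).
-export([new/0, feed/2]).

%% Hypothetical helper, not taken from http_driver.erl.
%% The state is simply the bytes received so far.
new() ->
    <<>>.

%% Feed one TCP segment into the collector.
%% Returns {more, NewState} while the header is still incomplete, or
%% {done, Header, Rest} once the terminating blank line has been seen.
feed(Buffer, Segment) when is_binary(Buffer), is_binary(Segment) ->
    Data = <<Buffer/binary, Segment/binary>>,
    case binary:split(Data, <<"\r\n\r\n">>) of
        [Header, Rest] -> {done, Header, Rest};
        [_Incomplete]  -> {more, Data}
    end.
```

Because each call returns the state to pass to the next call, the collector gives the same result however the stream is segmented, even when the blank line itself is split across two segments.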

Summary

The web server is built from two main modules, web_server.erl and http_driver.erl. The http driver is a simple re-entrant parser that interfaces the web server to the external world. The web server thinks it is talking to an Erlang process. All the nasty little details of re-entrant parsing and packet assembly are hidden in a device driver.

2. Making a system daemon

We want to run our web server as a system daemon.

A system daemon is a program which is automatically started when the system is started. How to do this is system dependent. The notes below show how to make a system daemon on my Red Hat 7.3 Linux machine. If anybody would like to mail me the details of how to do this for other operating systems, I will happily include them in the tutorial.

2.1. Red Hat 7.3

At the end of the file /etc/rc.d/rc.local I have added the following lines:

    
    # start my local demons
    /etc/rc.d/joe_services.sh &
    

This runs the script /etc/rc.d/joe_services.sh in the background. Note the & is very important - running this in the foreground can be disastrous and may deadlock your system :-)

joe_services.sh is as follows:

    #!/bin/sh
    ## start my local demons
    
    for i in /home/joe/services/*.sh 
      do
        /bin/su joe $i start
      done

This script is run as root. The command su joe $i start runs the shell script $i as user joe - note not as root.

The directory /home/joe/services contains, among other things, a file web_server.sh which is as follows:

    #!/bin/sh
    
    ##
    ## usage web_server.sh {start|stop|debug}
    ##
    
    ##   PA   = path to the web server
    ##   PORT  = port to run as
    
    PA=$HOME/tutorials/dev/web_server
    PORT=4501
    ERL=/usr/local/bin/erl
    HOSTNAME=`hostname`
    export HEART_COMMAND="$PA/web_server.sh start"
    
    case $1 in
    
      start)
        $ERL -boot start_sasl -sname webserver001 -pa $PA \
             -heart -detached -s web_server start $PORT 
        echo  "Starting Webserver"
        ;;
    
      debug)
        $ERL -sname  webserver001 -pa $PA -s web_server start $PORT
        ;;
    
      stop)
        $ERL -noshell -sname webserver_stopper -pa $PA \
               -s web_server stop webserver001@$HOSTNAME
        echo "Stopping webserver"
        ;;
    
      *)
        echo "Usage: $0 {start|stop|debug}"
        exit 1
    esac
    
    exit 0
    

For debugging I start the system with the command web_server.sh debug.

In production the server is started with the command web_server.sh start. This starts Erlang with the flags -detached and -heart.
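
Note that the -s flag passes the extra command-line words to the named function as a list of atoms, so a function started with -s web_server start 4501 receives ['4501'] rather than the integer 4501. A minimal sketch of the conversion, assuming a hypothetical helper module s_flag_args (the real web_server.erl may do this differently):

```erlang
-module(s_flag_args).
-export([port_from_args/1]).

%% Hypothetical helper: erl -s Mod Fun Arg1 ... calls Mod:Fun([Arg1, ...])
%% with every argument converted to an atom, so the port arrives as '4501'.
%% Any further arguments are ignored.
port_from_args([PortAtom | _]) when is_atom(PortAtom) ->
    list_to_integer(atom_to_list(PortAtom)).
```

For example, s_flag_args:port_from_args(['4501']) evaluates to the integer 4501, which can then be given to the listening socket.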

2.2. Detached mode

Starting Erlang with the flag -detached starts Erlang in detached mode. In detached mode Erlang runs silently in the background without a controlling terminal.

2.3. Heartbeat mode

Starting Erlang with the flag -heart starts Erlang in heartbeat mode. In heartbeat mode an external program monitors the Erlang system. If the Erlang system dies, the system is restarted by evaluating the command in the environment variable HEART_COMMAND. In web_server.sh the value of this variable is $PA/web_server.sh start, and so the program just gets restarted.
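
When debugging a daemon setup it can be handy to ask the running system whether it was started in heartbeat mode and which restart command it inherited. The following is a rough sketch only: the module heart_check is hypothetical and not part of the tutorial's code. It reads the HEART_COMMAND variable exported by web_server.sh and checks for the -heart flag.

```erlang
-module(heart_check).
-export([status/0]).

%% Hypothetical diagnostic, not part of the tutorial's code.
%% Returns {Flag, Command}, where Flag says whether the VM was started
%% with -heart and Command is the restart command found in the
%% environment (or undefined if the variable is not set).
status() ->
    Flag = case init:get_argument(heart) of
               {ok, _} -> enabled;
               error   -> disabled
           end,
    Command = case os:getenv("HEART_COMMAND") of
                  false -> undefined;
                  Cmd   -> Cmd
              end,
    {Flag, Command}.
```

Calling heart_check:status() from the shell of a system started with web_server.sh debug should report the flag as disabled, since the debug branch does not pass -heart.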

We can see this as follows. First we list all the Erlang processes, then we start the web server and check which Erlang and heart processes have been started:

    
    $ ps -ax | grep erl
    $ ./web_server.sh start
    Starting Webserver
    $ ps -ax | grep erl
    31367 pts/7    S      0:00 /usr/bin/beam 
    $ ps -ax | grep heart
    31369 ?        S      0:00 heart -pid 31367
    

Process 31367 is the Erlang web server. Process 31369 is the heartbeat process which is monitoring process 31367.

We now kill process 31367 and check to see what happens:

    
    $ kill 31367
    $ ps -ax | grep erl
    31386 ?        S      0:00 /usr/bin/beam 
    $ ps -ax | grep heart
    31388 ?        S      0:00 heart -pid 31386
    

Here we see that a new Erlang process and a new heartbeat process were started. The new Erlang process 31386 is the web server and it is monitored by process 31388.

The only way to kill the web server is to first kill the heartbeat process (31388) and then kill the Erlang process (31386). Alternatively, running the script web_server.sh stop will stop the web server in a controlled manner.

The above method of making a system daemon is in practice very reliable. The Wiki web at http://www.bluetail.com/wiki/ uses this technique and has been running for about two years without manual intervention.

3. Increasing reliability

We saw in the previous section how to use the -heart flag to restart the entire Erlang system in the event of failure. While testing my programs I almost automatically perform a coverage analysis. The code in web_server.erl contains code to perform a coverage analysis. The relevant parts of the code are as follows:

    
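    %% Cover-compile every module, then start the server on port 4501.
    cover_start() ->
        cover:start(),
        lists:map(fun(I) -> cover:compile(I) end, mods()),
        web_server:start(['4501']).
    
    %% Write one coverage report per module, then stop the Erlang system.
    cover_stop() ->
        lists:map(fun(I) -> cover:analyse_to_file(I) end, mods()),
        cover:stop(),
        erlang:halt().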
    

To run the coverage analysis I cold start Erlang, then give the command web_server:cover_start() in the Erlang shell. I then give a few commands in my web browser (to exercise the program). Then I move back to the command shell and give the command web_server:cover_stop(). This produces a number of files with names like web_server.COVER.out. These files can then be inspected to see how many times each individual line of code was evaluated.
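
As an alternative to reading the report files by eye, the per-line call counts can be inspected programmatically. The sketch below assumes a hypothetical helper module cover_summary. Its input is assumed to be the list found inside the {ok, List} result of cover:analyse(Module, calls, line), where each element has the form {{Module, Line}, Calls}.

```erlang
-module(cover_summary).
-export([uncovered_lines/1]).

%% Hypothetical helper, not part of web_server.erl.
%% Takes a list of {{Module, Line}, Calls} tuples and returns the
%% line numbers that were never evaluated, in their original order.
uncovered_lines(LineCalls) ->
    [Line || {{_Module, Line}, 0} <- LineCalls].
```

An empty result means that every executable line in the analysed module was evaluated at least once during the test run.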

At least - that's the theory - when I last tried this I got an error - if anybody knows why I'd be grateful if they told me :-)