SID2 Interface Specification

Reference Document
Web site: <http://www.sics.se/~olof/sid2.html>

Software site: <ftp://ftp.sics.se/users/olof/sid/sid2.tar.Z>

Olof Hagsand - <olof@sics.se>
The Swedish Institute of Computer Science

Stockholm, August 18, 1995

1. Introduction

SID2 is a communication module offering support for reliable multicast and group communication in a wide-area internet environment. The package is based on UDP/IP multicast and pthreads. The goal is to supply a distribution layer for hundreds or thousands of participants.

SID2 is an update of the SICS Distribution Package (SID or SID1) [1]. The most important difference between SID1 and SID2 is the nature of the reliable multicast protocol. In SID1, it was based on a positive acknowledgement scheme, while the reliable multicast protocol in SID2 is based on negative acknowledgements. SID2 is based on aggressive use of IP multicast, object-based communication, application layer framing and round-trip-time estimation.

We pursued an idea that originates in SRM [2]: reduce the amount of message passing and thereby minimize network load and increase scalability. The method is to make heavy use of multicasting, to make communication object-based, and to base reliability on a negative acknowledgement request/response scheme.

In group communication, network-level multicasting puts fewer packets on the network than point-to-point communication does. The load on network interfaces, protocol processing and CPU is also lower, and the saving grows with the number of participating processes. As multicast routing protocols and network interfaces evolve, multicast support will improve. But even today, multicast can be used beneficially on the internet, although there exist some limitations on the number of simultaneously connected groups. In short, multicast makes group communication scale better.

In SID2, reliable communication is based on objects recognized by the protocol. A protocol peer need not store messages until their arrival has been confirmed by all recipients. Instead, it may request the latest object replica from its local application by a callback/upcall.

If a peer detects a missing update message, or if it just requests an object, the peer simply requests the object on the associated multicast address. By round-trip-time estimation and a timeout algorithm, the closest peer with the latest version of the object responds. The reply is also multicast to inhibit other similar replies. In this way, the network is not flooded: in the optimal case, only the closest peer replies.
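The timeout step can be pictured with a small sketch. It assumes a randomized reply delay proportional to the estimated round-trip time, in the style of the SRM timers [2]. The function names and the constants c1 and c2 are hypothetical; SID2's actual timer algorithm is not described in this document.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch, not SID2 code: reply delay in milliseconds, taken
 * from the interval [c1*rtt, (c1+c2)*rtt].  "u" is a uniform random sample
 * in [0, 1].  A small RTT gives a short delay, so the closest peer tends
 * to reply first and the others can cancel when they hear that reply. */
double ex_reply_delay_ms(double rtt_ms, double c1, double c2, double u)
{
    assert(rtt_ms >= 0.0 && c1 >= 0.0 && c2 >= 0.0);
    assert(u >= 0.0 && u <= 1.0);
    return rtt_ms * (c1 + c2 * u);
}

/* Uniform sample in [0, 1] from the C library generator. */
double ex_uniform(void)
{
    return (double)rand() / (double)RAND_MAX;
}
```

A peer would call ex_reply_delay_ms(rtt, c1, c2, ex_uniform()) when a request arrives and send its reply only if no equivalent reply has been heard before the delay expires.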

1.1 What has changed in SID2?

In summary, some important features and differences from the original SID version are the following:

2. Datatypes

We start by defining some important datatypes.

2.1 Object Identifiers - struct sid_id

Objects have globally unique identifiers that are 128 bit (16 bytes) long. The identifiers are used in the multicast protocol to distinguish between objects. Object identifiers are supplied by the application.

The following functions are defined for Sid identifiers:

2.1.1 int sid_id_eq(struct sid_id *id0, struct sid_id *id1)

Return 1 if id0 and id1 are equal.

2.1.2 void sid_id_copy(struct sid_id *id0, struct sid_id *id1)

Copy identifier id0 to id1.

2.1.3 struct sid_id *sid_id_dup(struct sid_id *id)

Allocate and return a copy of sid identifier id.
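As an illustration of the three identifier functions, the following sketch assumes that an identifier is a plain 16-byte array. The ex_ names and the struct layout are hypothetical; the document only states that identifiers are 128 bits long.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: 128 bits stored as 16 raw bytes. */
struct ex_id {
    unsigned char bytes[16];
};

/* Return 1 if the two identifiers are equal, otherwise 0. */
int ex_id_eq(const struct ex_id *id0, const struct ex_id *id1)
{
    assert(id0 != NULL && id1 != NULL);
    return memcmp(id0->bytes, id1->bytes, sizeof id0->bytes) == 0;
}

/* Copy id0 to id1 (same argument order as sid_id_copy). */
void ex_id_copy(const struct ex_id *id0, struct ex_id *id1)
{
    assert(id0 != NULL && id1 != NULL);
    memcpy(id1->bytes, id0->bytes, sizeof id1->bytes);
}

/* Allocate a copy of id.  Returns NULL if allocation fails; the caller
 * releases the copy with free(). */
struct ex_id *ex_id_dup(const struct ex_id *id)
{
    struct ex_id *copy = malloc(sizeof *copy);
    if (copy != NULL)
        ex_id_copy(id, copy);
    return copy;
}
```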

2.2 Sequence Numbers - seq_t

Objects have sequence numbers (32 bit unsigned integers) to determine versions of objects. The protocol uses sequence numbers to be able to drop "old" objects. The following set of functions (implemented as macros) are used to compare sequence numbers:
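One common way to compare 32-bit sequence numbers so that the comparison survives wrap-around is to look at the sign of their difference. The sketch below shows that technique. The macro names are hypothetical and are not the names of the SID2 macros, and whether SID2 handles wrap-around this way is an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ex_seq_t;   /* stand-in for seq_t */

/* a is older than b if the difference a - b, read as a signed 32-bit
 * value, is negative.  This assumes the usual two's complement conversion
 * and that the two numbers are less than 2^31 apart. */
#define EX_SEQ_LT(a, b)  ((int32_t)((ex_seq_t)(a) - (ex_seq_t)(b)) < 0)
#define EX_SEQ_GEQ(a, b) (!EX_SEQ_LT(a, b))

/* Number of versions between an older and a newer sequence number. */
ex_seq_t ex_seq_distance(ex_seq_t older, ex_seq_t newer)
{
    assert(EX_SEQ_GEQ(newer, older));
    return newer - older;
}
```

With these definitions a version numbered 0xFFFFFFFF counts as older than version 0, so objects are not dropped as "old" when the counter wraps.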

2.3 Addresses - struct sid_addr

Sid addresses are used to identify communication peers and multicast addresses.

2.3.1 void sa2sid(struct sockaddr_in *addr, sid_addr *addr2)

Convert an internet address to a sid address.

2.3.2 void sid2sa(sid_addr *addr, struct sockaddr_in *addr2)

Convert a sid address to an internet address.

2.3.3 int sid_addr_eq(sid_addr *a0, sid_addr *a1)

Returns 1 if two sid addresses are equal, otherwise 0.

2.3.4 void sid_addr_copy(sid_addr *a0, sid_addr *a1)

Copy sid address from a0 to a1.

2.3.5 sid_addr *sid_addr_dup(sid_addr *a)

Allocate and return a copy of address a.

2.4 Buffers - struct sidbuf

The sidbuf data structure has the following C definition:

     struct sidbuf {
       struct sidbuf *s_next;
       unsigned short s_len;
       unsigned short s_off;
       unsigned short s_max;
     };

where s_next points to the next sidbuf in a chain, s_len is the length of the user data, s_off is the offset at which the user data begins, and s_max is the size of the whole sidbuf, i.e. the maximum possible value of s_off+s_len, given at creation time.

The functions used to manipulate sidbufs are the following:

2.4.1 struct sidbuf *sidbuf_get(unsigned int size)

Allocate a new sidbuf with a maximum payload "size". When used in the sid protocols, the sidbuf should be initialized in the following way:

     struct sidbuf *s = sidbuf_get(sid_max_packet_size());
     s->s_off += sid_hdr_offset();

in order to pre-allocate space for the sid communication header.

2.4.2 void sidbuf_copy(struct sidbuf *s0, struct sidbuf *s1)

Copy sidbuf s0 to s1. s0 and s1 should have the same maximal size (i.e. s_max).

2.4.3 struct sidbuf *sidbuf_dup(struct sidbuf *s)

Allocate and return a new sidbuf and copy the contents from s to the new sidbuf.

2.4.4 void sidbuf_free(struct sidbuf *s)

Free a sidbuf chain, i.e. the sidbuf s and, transitively, all sidbufs that are reached by s_next.

2.4.5 stod(s, type)

Macro that returns user data cast to "type" given a sidbuf "s".

2.4.6 Example

Allocate a sidbuf "s" with a maximal payload size of 1024. Add an offset of 64 (for example to reserve protocol header space) and copy an integer with value 23 into the payload.

     struct sidbuf *s;
     s = sidbuf_get(1024);        /* room for up to 1024 bytes */
     s->s_off += 64;              /* reserve 64 bytes of header space */
     s->s_len = sizeof(int);      /* the payload is a single int */
     *stod(s, int*) = 23;         /* store the value in the payload */
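To make the roles of s_off, s_len and s_max concrete, here is a self-contained sketch of a buffer with the same header fields. It assumes that the payload bytes are stored directly after the header, which this document does not state; all ex_ names are hypothetical stand-ins for the real sidbuf functions.

```c
#include <assert.h>
#include <stdlib.h>

struct ex_sidbuf {
    struct ex_sidbuf *s_next;   /* next buffer in the chain */
    unsigned short s_len;       /* length of user data */
    unsigned short s_off;       /* offset at which user data begins */
    unsigned short s_max;       /* total payload capacity */
};

/* Assumption: the payload area starts right after the header, so user
 * data lives at (payload start + s_off). */
#define EX_STOD(s, type) ((type)((char *)((s) + 1) + (s)->s_off))

/* Allocate a zeroed buffer with room for "size" payload bytes.
 * Returns NULL if allocation fails. */
struct ex_sidbuf *ex_sidbuf_get(unsigned int size)
{
    struct ex_sidbuf *s;

    assert(size <= 0xFFFFu);    /* s_max is an unsigned short */
    s = calloc(1, sizeof *s + size);
    if (s != NULL)
        s->s_max = (unsigned short)size;
    return s;
}

/* Free s and every buffer reachable through s_next. */
void ex_sidbuf_free(struct ex_sidbuf *s)
{
    while (s != NULL) {
        struct ex_sidbuf *next = s->s_next;
        free(s);
        s = next;
    }
}
```

Casting the payload pointer to int* as in the example above is only safe when s_off keeps the data suitably aligned; for arbitrary offsets, memcpy into and out of the payload is the safer choice.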

3. Requesting objects

An application peer requests an object by its identifier and a sequence number. The request is sent asynchronously to a multicast group and the sending thread does not block. When an update arrives, the callback interface dispatches the update to the requesting application (see Section 5). If an error or timeout occurs, the application is notified by a "giveup" callback.

3.1 void sid_obj_request(sid_addr *addr, struct sidbuf *s, struct sid_id *id, seq_t seq, int appl_entry)

Request an object with identifier "id" and a sequence number greater than or equal to "seq" on address "addr". If no update message matching the request is received, the request is resent periodically until it gives up. The appl_entry argument is a user-defined value used when dispatching to applications.

This corresponds to an "unrelated" request of an object, e.g. one stemming from an "internal" event. In the case when many peers may simultaneously request an object, such as in a negative ack, sid_obj_request_rel should be used.

3.2 void sid_obj_request_rel(sid_addr *addr, sid_addr *from, struct sidbuf *s, struct sid_id *id, seq_t seq, int appl_entry)

Same as sid_obj_request, except that the request is not sent immediately. Instead, the request is sent only if an update (request?) is not received within a time limit.

This corresponds to a negative acknowledgement or a request stemming from when many peers simultaneously request an object. "from" specifies the probable source of a lost packet. A peer detecting a missing update should use this function rather than sid_obj_request.

The data parameter s can be supplied to include a message to the responder. This parameter is deallocated by Sid and should not be used further by the user.

4. Updating objects

A Sid application may send a new version of an object on a multicast address, or send its current version due to a request, or periodically resend a version of an object. All these cases are examples of object updates.

Updates are made with an application entry (appl_entry) as a hook for callbacks when dispatching to applications.

4.1 void sid_obj_update(sid_addr *addr, struct sidbuf *s, struct sid_id *id, seq_t seq, int appl_entry)

Send the object with identifier "id" and version "seq" on address "addr". The actual object is supplied in "s". The data in "s" is deallocated by Sid and should not be used further by the user. "appl_entry" is the dispatch entry to be used in callbacks to the application.

If a local request matching the update exists, that request is cancelled.

4.2 void sid_obj_update_rel(sid_addr *addr, sid_addr *from, struct sidbuf *s, struct sid_id *id, seq_t seq, int appl_entry)

Send the object as a result of a request by "from". Delay it and if a similar update is detected, cancel it before it is sent.

A peer answering a sid_obj_request should use this function rather than sid_obj_update, in order to inhibit multiple replies.

5. Callback Interface

Sid needs to communicate with its applications to pass information, request services and give notification of events. Such communication is managed by the callback interface and can be made in a variety of ways.

5.1 Callback dispatch type - enum dispatch_type

Sid applications register callbacks to be invoked as requests, updates and events occur in the system. Callbacks can be made in different ways and are registered through a functional interface (see below).

The following callback types are supported (NB: not all are yet implemented):

The default case is that a thread is forked and a function is called within the new thread.

5.2 int sid_callback_obj_req(int appl_entry, sid_callback_fn callback, enum dispatch_type how)

Register callback when a request of an object with dispatch type "appl_entry" occurs. The "callback" function along with "how" specifies how the callback is made. The "callback" function should have the following parameters:

sid_addr* addr, sid_addr* from, struct sidbuf *s, struct sid_id *id, seq_t seq, int appl_entry

The following example shows the application code that registers a callback for object requests. That is, when an object request occurs with application entry 42, the function fn is called within a new thread.

     /* Callback invoked for each object request with entry 42 */
     void *fn(sid_addr *addr, sid_addr *from, struct sidbuf *s,
              struct sid_id *id, seq_t seq, int appl_entry)
     {
        /* Handle object request */
        return NULL;
     }

     /* Register callback "fn" */
     sid_callback_obj_req(42, fn, DISPATCH_FORK);

5.3 int sid_callback_obj(int appl_entry, sid_callback_fn callback, enum dispatch_type how)

Register callback when an update of an object with dispatch type "appl_entry" occurs. The "callback" function along with "how" specifies how the callback is made.
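The registration scheme can be pictured as a table indexed by application entry. The sketch below shows only that idea, with a direct function call as the dispatch method. The table size, the simplified callback signature and all ex_ names are hypothetical and do not describe how Sid stores its callbacks.

```c
#include <assert.h>
#include <stddef.h>

#define EX_MAX_ENTRIES 64       /* hypothetical table size */

/* Simplified callback type; the real sid_callback_fn takes more arguments. */
typedef void *(*ex_callback_fn)(int appl_entry, void *arg);

static ex_callback_fn ex_table[EX_MAX_ENTRIES];

/* Register fn for appl_entry.  Returns 0 on success, -1 on bad arguments. */
int ex_register(int appl_entry, ex_callback_fn fn)
{
    if (appl_entry < 0 || appl_entry >= EX_MAX_ENTRIES || fn == NULL)
        return -1;
    ex_table[appl_entry] = fn;
    return 0;
}

/* Call the callback registered for appl_entry in the calling thread.
 * Returns NULL if nothing is registered for that entry. */
void *ex_dispatch(int appl_entry, void *arg)
{
    if (appl_entry < 0 || appl_entry >= EX_MAX_ENTRIES ||
        ex_table[appl_entry] == NULL)
        return NULL;
    return ex_table[appl_entry](appl_entry, arg);
}

/* Example callback: counts invocations through the int that arg points to. */
void *ex_count_calls(int appl_entry, void *arg)
{
    (void)appl_entry;
    assert(arg != NULL);
    ++*(int *)arg;
    return arg;
}
```

A forking dispatch type would differ only in ex_dispatch, which would start a thread running the callback instead of calling it directly.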

5.4 void sid_callback_giveup(void *(*callback)(int appl_entry, char* string))

Called whenever the process gives up sending a request. The two arguments to the callback function are the application entry and an explaining string.

5.5 int sid_callback_block (int appl_entry, enum sid_packet_type pkt_type, struct timeval *T, sid_addr **addr, sid_addr **from, struct sidbuf **s, sid_id *id, seq_t *seq)

Block the calling thread until an "appl_entry" message occurs with type "pkt_type". The optional "T" specifies the maximum waiting period, if different from the system-defined timeout. The arguments "addr", "from", "s" and "seq" are optional and will be filled in appropriately if a successful return is made. If supplied, "id" works as a pattern matcher. The function returns -1 on error, for example when the wait times out.

The following example sends a request with application entry 42 for identifier id, and then blocks until a reply is received:

     struct sidbuf *s1;

     sid_obj_request(addr, s0, id, seq, 42);
     if (sid_callback_block(42, DATA_PKT, NULL, NULL,
                            NULL, &s1, id, NULL) < 0) {
         fprintf(stderr, "timeout\n");
         return;
     }
     sidbuf_free(s1);     /* release the reply buffer */

6. Rate control

The flow control in Sid1 was based on positive acks, similar to the algorithm in TCP/IP. In Sid2, a rate control (traffic shaping) scheme controls output according to a token bucket algorithm [3].

In the token bucket model, a bucket has a size and a rate. Tokens are periodically inserted into the bucket, while outgoing packets remove them. The bucket size is limited and tokens overflow when the bucket is full. The bucket size can therefore be seen as the maximum burst size, that is, the number of packets that can be sent back-to-back. The token rate controls how fast packets can be sent over a longer period.
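A minimal token bucket can be sketched as follows. It assumes that one token pays for one packet, that the size is counted in packets and the rate in tokens per second; the units of tb_size and tb_rate in Sid are not stated in this document, and the ex_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct ex_bucket {
    double tokens;   /* tokens currently available */
    double size;     /* bucket capacity, i.e. the maximum burst */
    double rate;     /* tokens added per second */
    double last;     /* time of the last refill, in seconds */
};

/* Start with a full bucket at time "now". */
void ex_bucket_init(struct ex_bucket *b, double size, double rate, double now)
{
    assert(b != NULL && size > 0.0 && rate > 0.0);
    b->tokens = size;
    b->size = size;
    b->rate = rate;
    b->last = now;
}

/* Refill according to the elapsed time, then try to take one token.
 * Returns 1 if a packet may be sent at time "now", otherwise 0. */
int ex_bucket_try_send(struct ex_bucket *b, double now)
{
    assert(b != NULL && now >= b->last);
    b->tokens += (now - b->last) * b->rate;
    if (b->tokens > b->size)
        b->tokens = b->size;     /* surplus tokens overflow and are lost */
    b->last = now;
    if (b->tokens < 1.0)
        return 0;
    b->tokens -= 1.0;
    return 1;
}
```

With size 2 and rate 1, two packets can leave immediately, a third must wait about one second, and a long idle period never accumulates more than two tokens.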

6.1 void get_rate_control(int *tb_size, int *tb_rate)

Get values of token bucket rate control parameters.

6.2 void set_rate_control(int tb_size, int tb_rate)

Set values of token bucket rate control parameters.

7. Misc

7.1 void sid_init()

Initialize the sid module, create threads, initiate tables, etc. Must be called before all other sid functions.

7.2 void sid_exit()

Terminate and exit Sid gracefully.

7.3 void sid_netw_ttl(int ttl)

Set IP multicast ttl level to "ttl". All multicast packets sent from this peer use this level.

7.4 int sid_hdr_offset()

Returns the length of a sid protocol header.

7.5 unsigned int sid_max_packet_size()

Returns the MTU (maximum transmission unit). A new sidbuf should not be smaller than this value.

7.6 void sid_set_max_packet_size(unsigned int size)

Set a new MTU (maximum transmission unit). This can be useful in an adaptive scheme where you want to adjust the MTU because of changing network conditions. The default is set by SID_NETW_PKT_SIZE, currently 1500.

7.7 void sid_netw_loss_rate(int prob)

Set the probability that individual packets will be dropped instead of sent to "prob", given in percent.
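The drop decision behind such a setting can be sketched as a comparison between the configured percentage and a random roll. This is only an illustration of the idea; the function names are hypothetical and the random source Sid uses is not documented here.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if a packet should be dropped.  prob_percent is the loss
 * rate in percent (0..100) and roll is a value in 0..99. */
int ex_should_drop(int prob_percent, int roll)
{
    assert(prob_percent >= 0 && prob_percent <= 100);
    assert(roll >= 0 && roll < 100);
    return roll < prob_percent;
}

/* Roll in 0..99 from the C library generator (slightly biased, which is
 * acceptable for a test aid). */
int ex_roll(void)
{
    return rand() % 100;
}
```

A sender would evaluate ex_should_drop(prob, ex_roll()) once per outgoing packet and skip the send when it returns 1, which makes it possible to exercise the request and update paths without a lossy network.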

7.8 void sid_netw_callback(sid_netw_callback_type fn)

Set a user-supplied function to receive all incoming packets. This is a way to bypass normal SID protocol processing.

8. RPC module

8.1 int sid_rpc(sid_addr *addr, int type, void *data, int len, void **reply_data, int* reply_len, sid_addr *from, struct timeval *T)

Remote procedure call. Blocks until one reply is returned or a timeout occurs.

8.2 int sid_callback_rpc_server(int appl_entry, sid_data_fn callback)

Register rpc server callbacks.

A Installation

Sid should be able to run on many Unix operating systems. The primary requirements are pthreads and IP multicast.

Sid is currently running on the following systems:

A.1 Software requirements

The software needed to run Sid is:

A.2 Retrieving the software

All required software outlined above can be fetched by anonymous ftp from sics.se: ftp://ftp.sics.se/users/olof/sid/sid2.tar.Z.

A.3 Compiling Sid

Sid uses GNU autoconf to configure itself automatically for your machine. From the directory where you want to have the object code, run the "configure" script in the distribution top directory. This script will create appropriate GNUmakefiles for your system. After this you should be able to type "gmake all", or whatever name you have GNU make installed under.

A.4 Running Sid

(Obsolete)
In order to run Sid, a nameserver must be accessible. Currently, the name of the nameserver is sid_nameserver. When the nameserver is running, Sid applications may be run.

References

[1] O. Hagsand and A. Westerlund, "SID --- The SICS Distributed Architecture Software User and Reference Manual for SID release A", SICS, 1994

[2] S. Floyd, V. Jacobson, C-G Liu, S. McCanne and L. Zhang, "Reliable Multicast Framework for Light-weight Sessions and Application Level Framing", Proceedings from SIGCOMM'95, Boston, MA, Sept, 1995

[3] A. Parekh, "A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks", MIT Laboratory for Information and Decision Systems, Report No. LIDS-TH-2089.