Three prototypes were implemented to experiment with different ways of gathering data and combining sensors in different settings. This section discusses the prototype implementations of the assistant and the lessons learnt. The implementations use an architecture similar to the one outlined in Section 2.4, where sensors monitor a pool of rather unstructured log-events.
Since the recent security debate about untrusted programs has focused largely on applet security, the initial plan was to develop a security assistant for the Java appletviewer. This idea was abandoned mainly due to problems in getting relevant log-data and the lack of non-trivial applets (from a security perspective). In the later prototypes ordinary Unix-programs were monitored, since they exhibit a more variegated behavior and better monitoring tools are available. Utility-programs such as find and grep have an intuitively straightforward behavior that seemed relevant to monitor. Larger programs such as emacs or Netscape (at least when running its embedded Java-interpreter) can have very complex behaviors and require more intricate descriptions of their expected behavior. Thus they were not considered in these prototypes.
The sensors implemented were focused on monitoring behavior related to file-accesses. This is because it is the domain that is most structured in today's operating systems and consequently easiest to get audit data from.
The first prototype was based on a specialization of the SecurityManager-class, which is responsible for implementing access policies for the standard Java libraries. The SecurityManager was modified to produce log-events whenever a resource was asked for. The log-events were immediately forwarded to the sensors, allowing them to throw a SecurityException whenever they found a resource request breaking the current security policy.
In this prototype the sensors were fairly simple. Most sensors kept a per-program count of what properties the applets accessed, e.g. how many files the applet attempted to read or how many TCP-connections the applet opened. Some sensors were more complex and could ask the user whether a particular action should be allowed, e.g. if a file could be read.
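The interplay between log-events and sensors described above can be illustrated with a minimal sketch. It is written in Python rather than Java, and every name in it (PolicyViolation, CountingSensor, AskUserSensor, check_request, the event kinds and the limits) is hypothetical; PolicyViolation merely stands in for Java's SecurityException. It is not the prototype's code, only one way such counting and asking sensors could be arranged.

```python
from collections import Counter


class PolicyViolation(Exception):
    """Stands in for Java's SecurityException in this sketch."""


class CountingSensor:
    """Keeps a per-applet count of each kind of resource request."""

    def __init__(self, limits=None):
        self.counts = Counter()
        self.limits = limits or {}  # hypothetical per-kind upper limits

    def handle(self, applet, kind, target):
        self.counts[(applet, kind)] += 1
        limit = self.limits.get(kind)
        if limit is not None and self.counts[(applet, kind)] > limit:
            raise PolicyViolation(f"{applet}: too many {kind} requests")


class AskUserSensor:
    """Asks the user (via a callback) whether a file may be read."""

    def __init__(self, ask):
        self.ask = ask  # callable(applet, target) -> bool

    def handle(self, applet, kind, target):
        if kind == "file_read" and not self.ask(applet, target):
            raise PolicyViolation(f"{applet}: user denied read of {target}")


def check_request(sensors, applet, kind, target):
    """Called for every resource request; any sensor may veto by raising."""
    for sensor in sensors:
        sensor.handle(applet, kind, target)
```

In this arrangement a request is allowed only if no sensor raises, which mirrors the way the sensors in the first prototype could reject a request by throwing an exception.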
Figure: Screenshot from the Java-prototype monitoring two applets.
The current version of Java (JDK1.0) has some limitations concerning what audit data can be extracted from the appletviewer. These are most likely deliberate design decisions to enable Java programs to run on smaller CPUs, but they make Java less satisfactory for unconstrained experimentation.
A note on Java. First, since memory management is part of the language, it is not possible to gather data concerning each applet's memory consumption. Since the garbage collection is automatic, it is not possible to prevent an applet from keeping a reference to an object and thus preventing it from being removed. Second, it is not possible to place limits on or to measure the applets' CPU-usage, nor to limit or get audit-events when threads are created.
Although the applets reside in different name-spaces, applets can under some circumstances communicate directly with each other [47]. This message-passing is not auditable in the current appletviewer.
The current implementation of the appletviewer also has some drawbacks. The Java security approach is more focused on hard security. If the applet is allowed to create a file, further accesses to it are not auditable from the SecurityManager. Once allowed to open a socket to its source URL the applet will henceforth be allowed to use this connection.
Most applets at the time were not using any auditable system resources, and the later prototypes monitored Unix-programs instead of applets. However, there is now a collection of hostile applets [24] which would be interesting to study. Most of these applets perform different kinds of denial-of-service attacks (e.g. spewing out large numbers of windows), but some also perform attacks such as unexpectedly killing other threads or sending mail in your name.
The second prototype was written entirely in Tcl, both because the objective was to investigate how sensors communicating in a KQML-like way should be constructed and because the sensors should be able to draw on the screen easily.
The Solaris command truss, which prints all system calls that a process executes, was used to generate system events. The system events were parsed and routed to subscribing sensors. Other sensors could in turn subscribe to messages from these sensors.
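As an illustration of the parsing step, the following Python sketch turns one line of system-call trace output into a structured event. The line format assumed here (an optional "pid:" prefix, then "call(args) = result") is a simplification of what truss prints, and the function name parse_line and the field names are hypothetical. Lines that do not fit the assumed shape, such as calls that report an error instead of a result, are simply skipped.

```python
import re

# Assumed, simplified shape of a trace line:  [pid:] call(args) = result
LINE = re.compile(
    r'^(?:(?P<pid>\d+):\s+)?(?P<call>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>\S+)'
)


def parse_line(line):
    """Return a dict describing the system call, or None if unparseable."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return {
        "pid": int(m["pid"]) if m["pid"] else None,
        "call": m["call"],
        "args": m["args"],
        "ret": m["ret"],
    }
```

The resulting dictionaries are the kind of system events that could then be routed to subscribing sensors.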
The prototype consists of four components: three sensors and one message router (a.k.a. facilitator). The components were placed in separate name-spaces (called interpreters in Tcl-jargon). Since Tcl is single-threaded, the messages were passed to a central message queue to ensure that each message would be handled completely before any other message could be invoked on a sensor. The facilitator subscribes to all log-events from a sensor filtering the audit data from truss. The sensors advertise their interest in certain events to the facilitator, which then asks the audit data filter to add these events to the stream of audit events. The facilitator keeps track of the current subscriptions and routes the messages to the appropriate sensors. The facilitator is quite simple and does not, in this prototype, match the sensors advertising what questions they can answer to the proper recipients.
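The routing scheme can be sketched as follows. This is a Python illustration, not the Tcl code of the prototype; the class name Facilitator and its methods are hypothetical. The single queue plays the role of the central message queue: a handler may post new messages, but these are only delivered after the current message has been handled completely.

```python
from collections import defaultdict, deque


class Facilitator:
    """Routes messages to subscribers through one central queue."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # message kind -> handlers
        self.queue = deque()

    def subscribe(self, kind, handler):
        """Register handler(facilitator, payload) for messages of this kind."""
        self.subscriptions[kind].append(handler)

    def post(self, kind, payload):
        """Enqueue a message; it is delivered later by run()."""
        self.queue.append((kind, payload))

    def run(self):
        """Deliver queued messages one at a time, in arrival order."""
        while self.queue:
            kind, payload = self.queue.popleft()
            for handler in list(self.subscriptions[kind]):
                handler(self, payload)
```

A sensor that subscribes to raw system events can post derived messages of its own, to which other sensors subscribe in turn.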
The sensors were designed to be as independent of each other as possible. The approach was to let them communicate with a speech-act based protocol, posing questions to each other in a content language. KIF is often suggested as a good content language for agents, since it makes it easy to formulate queries in a prolog-like manner with partially instantiated terms. Instead of KIF, this prototype uses a much simpler language where matching is done using just one regular expression. Instead of using anonymous variables in the terms, the bound variables are tagged with their names and sorted according to the tags. This makes it possible to construct a regular expression matching the partially instantiated term.
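One way to realise such tag-sorted matching is sketched below in Python. The concrete encoding ("tag=value;" fields in tag order) and the names encode and query_regex are assumptions made for this illustration; the actual syntax used in the Tcl-prototype is not given here. The sketch also assumes that values never contain a semicolon.

```python
import re


def encode(term):
    """Serialise a term as 'tag=value;' fields sorted by tag."""
    return "".join(f"{tag}={term[tag]};" for tag in sorted(term))


def query_regex(bound):
    """Build one regular expression matching every encoded term that
    contains the given bound fields; all other fields are left open."""
    gap = r"(?:[^;]*;)*"  # any number of whole fields
    parts = [re.escape(f"{tag}={bound[tag]};") for tag in sorted(bound)]
    return re.compile("^" + gap + gap.join(parts) + gap + "$")


message = encode({"pid": 42, "call": "open", "path": "/etc/passwd"})
query = query_regex({"call": "open", "pid": 42})
matched = query.match(message) is not None
```

Because both the message and the query list their fields in the same order, a single left-to-right regular expression suffices, and unbound fields need no placeholder variables.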
To free the components from the need for sophisticated query-parsing, their message streams are allowed to contain some messages that should not be transmitted. The facilitator is responsible for removing these unnecessary messages and for duplicating messages that were asked for by multiple subscribers. Hence all the messages are routed through the facilitator.
Figure: Early version of the Tcl-prototype, showing a ``visual fingerprint'' of the command find /home1/ara/demo -name *.tcl -exec ls -l {} ;
There is no concept of variables in this simple content language, and it is not possible to express that an unbound variable should be bound to the same value on both sides of a conjunction. This means that more complex queries have to be subdivided into several queries, thereby requiring the query-issuer to maintain information about the query during the querying. This increases the message passing needed among the components.
The efforts to keep the sensors simple and expressed imperatively caused a lot of unnecessary message passing. They also caused the sensors to grow, since the sensors had to update a lot of state regarding what they knew about the state-of-affairs.
Most of the time in the program was spent by the facilitator parsing the messages and finding the appropriate recipient. In a time-critical application like this it is apparently much more efficient to add better support for message handling (especially better content language handling and better subscription-facilities) in the components themselves.
The main conclusion drawn is that the sensors should negotiate what data they are interested in before they start the auditing. If this is done, dedicated communication channels can be set up between the components so that they know what messages they can expect on a certain channel. This reduces the need for parsing and runtime routing. Optimally, the message-passing overhead should be similar to an indirect method-call.
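The idea of negotiating interests first and routing afterwards can be sketched as below. This is a hypothetical Python illustration of the conclusion, not something implemented in the prototypes; the names ChannelSetup, request, start and emit are invented for the example. After start() no further negotiation is allowed, and delivering an event amounts to a dictionary lookup followed by direct calls, with no message parsing.

```python
class ChannelSetup:
    """Collects the sensors' interests before auditing starts."""

    def __init__(self):
        self._interests = {}  # event kind -> list of callbacks
        self._started = False

    def request(self, kind, callback):
        """Declare interest in one kind of event (only before start())."""
        if self._started:
            raise RuntimeError("negotiation is closed")
        self._interests.setdefault(kind, []).append(callback)

    def start(self):
        """Close negotiation and return the fixed channel table."""
        self._started = True
        return {kind: tuple(cbs) for kind, cbs in self._interests.items()}


def emit(channels, kind, *fields):
    """Deliver an event: one lookup, then plain calls to the receivers."""
    for callback in channels.get(kind, ()):
        callback(*fields)
```

Since each channel carries only one kind of event, a receiver knows the shape of the fields it will be called with and needs no parsing of its own.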
The third prototype addressed two problems from the Tcl-prototype: the extensive message-handling overhead and the awkwardness in expressing relations. The event-pool was moved into a prolog-program to which sensors could connect and subscribe to events or issue new events.
The prolog-program holds knowledge about how system events relate to the current state of the active processes. This reduces the need for storing and copying information that can be inferred from the log events.
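To illustrate what inferring state from log events can mean, the following Python sketch derives each process's table of open file descriptors from a list of events instead of having every sensor store it. The event shapes (open, close, dup, fork tuples) and the function name fd_tables are assumptions made for this example; the prototype itself expresses such knowledge in prolog.

```python
def fd_tables(events):
    """Derive {pid: {fd: path}} from a sequence of system events.

    Assumed event shapes (hypothetical):
      ("open", pid, path, fd), ("close", pid, fd),
      ("dup", pid, old_fd, new_fd), ("fork", parent_pid, child_pid)
    """
    tables = {}
    for event in events:
        kind = event[0]
        if kind == "open":
            _, pid, path, fd = event
            tables.setdefault(pid, {})[fd] = path
        elif kind == "close":
            _, pid, fd = event
            tables.get(pid, {}).pop(fd, None)
        elif kind == "dup":
            _, pid, old_fd, new_fd = event
            table = tables.setdefault(pid, {})
            if old_fd in table:
                table[new_fd] = table[old_fd]
        elif kind == "fork":
            _, parent, child = event
            # The child starts with a copy of the parent's descriptors.
            tables[child] = dict(tables.get(parent, {}))
    return tables
```

With the state derived on demand like this, a sensor only needs to ask its question; it does not have to keep its own copy of what each process has open.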
The content language was changed to prolog. This made it much easier to search and to express complex queries. The queries are plain prolog terms that are evaluated in the pool. Sensors can monitor the result of a query or ask for all current solutions. If the query is monitored, the stream to the query-poser is kept with the query, hence no routing of messages is needed.
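The two ways of using a query, asking for all current solutions and monitoring future ones, can be sketched as follows. The sketch is in Python and uses plain predicate functions where the prototype uses prolog terms, so it shows only the bookkeeping, not unification; the names EventPool, solutions, monitor and add are hypothetical. A Python list stands in for the stream to the query-poser.

```python
class EventPool:
    """Holds facts and the monitored queries with their output streams."""

    def __init__(self):
        self.facts = []
        self.monitors = []  # (query, stream) pairs kept together

    def solutions(self, query):
        """Return all facts currently in the pool that satisfy the query."""
        return [fact for fact in self.facts if query(fact)]

    def monitor(self, query, stream):
        """Keep the stream with the query; matches are pushed as they arrive."""
        self.monitors.append((query, stream))

    def add(self, fact):
        """Add a new fact and push it onto every matching monitor's stream."""
        self.facts.append(fact)
        for query, stream in self.monitors:
            if query(fact):
                stream.append(fact)
```

Because each monitored query carries its own stream, a new fact is delivered directly to the interested sensor without any separate routing step.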
Figure: Prolog-prototype displaying sensors and sensor information.
The use of a good knowledge-representation language greatly simplified the task of structuring the log data. Sensors for events such as ``what files were opened at a certain time'' or ``did process A cause process B to have fd X set to the same file as process C's fd 5'' are now very easy to write.
Using a language like prolog means, of course, accepting some loss of efficiency compared to a C-implementation, but the use of a powerful inference engine and an expressive content language was decidedly of great help.
Designing the sensors is the part that requires the most creativity and ingenuity. The sensors implemented in the prototypes were fairly simple and rule based, acting on certain events.
This section describes the sensors implemented, briefly discusses some other sensors that can be conceived, and makes some remarks on what features a sensor might have.
The sensors implemented in the prototypes were the following.
In the first prototype:
or for acting as a server for connections from the outside.
These sensors do not do any sophisticated classifying of events or comparison to a per-applet security profile. They were all straightforward to implement, with the exception of the applet sensor, since they use log-events from the SecurityManager.
Sensors measuring the number of threads and their respective CPU-usage could not be easily implemented since no hooks for that are available in the SecurityManager.
In the second prototype:
The third prototype re-implemented the filesensor and simpleutil-sensor in prolog. Since a lot of the facts that had to be saved explicitly in the Tcl-version could be inferred, the size of the sensors shrank considerably.
New sensors in the third prototype:
It is possible to conceive of a lot of sensors to implement. I mention some possible sensors to give some inspiration as to what can be done. Whether they are the ones to be used in later prototypes will depend on what actual behaviors the assistant will have to monitor.
All these measures can be used in conjunction with cardinality, frequency and distribution measures. The cardinality measures set common-sense limits to the activities. Frequency measures and distribution measures are harder to use since they presuppose long-term experience of the program in question. Moreover, the demand that the particular implementation exhibits a certain distribution of events in time and category makes the measures more difficult to use in a general way.
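As an illustration of the two simpler kinds of measure, the Python sketch below shows a cardinality limit (a total count that must not be exceeded) and a frequency limit (a maximum number of events within a sliding time window). The class names and the choice of a sliding window are assumptions made for this example; the limits themselves would have to come from common sense or from experience with the program.

```python
from collections import deque


class CardinalityLimit:
    """Allows at most `limit` events in total."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def observe(self):
        """Record one event; return True while within the limit."""
        self.count += 1
        return self.count <= self.limit


class FrequencyLimit:
    """Allows at most `max_events` events within any `window` seconds."""

    def __init__(self, max_events, window):
        self.max_events = max_events
        self.window = window
        self.times = deque()

    def observe(self, timestamp):
        """Record an event at `timestamp`; return True while within the limit."""
        self.times.append(timestamp)
        # Forget events that have fallen out of the window.
        while self.times and self.times[0] <= timestamp - self.window:
            self.times.popleft()
        return len(self.times) <= self.max_events
```

A distribution measure would additionally need a model of how events are expected to spread over time and category, which is why it requires more experience of the program than these two.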
Below are some ad hoc examples of higher level sensors describing the behavior of some classes of programs. These categories are not definitive in any way, but are given as inspiration for what can be said about a certain class of programs. Note that the descriptions do not state what a program does not do, since any non-stated behavior should be disallowed. Vague words like ``short'', ``might'' and ``almost'' indicate that the behavior varies between executions.