There are two aspects of related work to be discussed. First, there are the security issues and approaches for managing untrusted programs. Secondly, the related work done in the Intrusion Detection field, especially the approaches related to the monitoring of programs rather than humans. Familiarity with traditional access control mechanisms is assumed and not discussed here. Neither is models for managing the information-flow within an organization, such as the Bell-LaPadula [4] or Biba [5] security models, since they concern the issues of how information can flow between hierarchical security levels. Our approach is that decision-support instead of classification of information, is more manageable for non-expert users and will make the security administration less laborious.
The discussion is based on literature studies of the respective fields. This section discusses some approaches to design of safe-languages and why and where runtime monitoring of programs is needed.
Most programming language research to date has focused on increasing the expressiveness and efficiency of the program languages designed and usually not on issues of incorporation program parts that can not be trusted upon. E.g, Stroustrup [38] holds that
... the fact that users can cheat and use [global variable names] is mitigated by the fact that users can always cheat anyway: Language-level protection mechanisms are protection against accident, not against fraud.
This view has to be abandoned if untrusted program components are to be incorporated in a framework where they interact safely and efficiently with other program components. This has lead to the design of so called ``safe'' languages. In these languages, a component is never allowed to circumvent the programming abstractions and access restrictions, e.g casting types illegally, calling a function with arguments of inappropriate type or overflowing the stack. All components can feel confident that their private data will not be revealed if they themselves are correctly written.
Many of the access protection problems stem from the programs' ability to forge pointers, i.e using an integer as a pointer into data that is known to reside on a certain memory location. If pointers and pointer arithmetic are allowed, a program can use this to violate access restrictions by accessing objects as something they are not, e.g a byte array or an object with the same data layout as the actual object but without its access-restrictions. Removing pointers is usually the first decision made when designing a safe language.
Other features a safe language can provide are separate name-spaces so that a program can know that its variables and functions cannot be confused with those of other programs, ways to ascertain that some functions can not be overridden so that the programs can be sure that a method, e.g in a super-class of another object, provides a certain service, automatic garbage collection to alleviate problems with memory management.
Generally, safe languages use one or more of three approaches to ensure that the programs' access privileges are constrained: The access to the underlying system can be disallowed or restricted, the program can be analyzed to ensure that it conforms to certain stipulated restrictions or the language can use a computational model that makes certain actions impossible to implement. These respective approaches are discussed below.
All the safe languages studied (Java, Tcl, TeleScript) restrict the access to the underlying computer. The programs are run in a virtual machine that provides its own API for the programs to use.
The simplest approach to ensure that programs can not access each other's data is to place the programs in different name-spaces and only allow them to communicate through channels explicitly given to them. This approach is used Tcl7.5 [30], Penguin (Safe-Perl) [15] and the Java ClassLoader [39]. The programs can then only communicate with each other by sending messages to one-another, much like how Unix-processes in different name-spaces are allowed to communicate. In Tcl (v7.5) and Penguin no standard API is yet defined. The untrusted programs are run in an interpreter without any access to the outside world, but can be given aliases to commands from one of the trusted interpreters.
In Java, the API is instrumented with calls to a SecurityManager object that can selectively grant access privileges to the programs accessing the resources via the interface. The only discrimination implemented in the current appletviewer is between the sites the programs come from.
TeleScript's security model has many more features, but seems much less manageable. The service routines made available to the visiting agents can test various features of the agents, such as who owns the object and where it came from. Various restrictions can be set on the visiting agent, such as how long it can execute, whether it can pass on objects and whether it is allowed to leave. There are no provisions for managing general security policies and every service made available has to set up the restrictions itself. The current version of TeleScript (v1.0) executable code is never transferred between sites [19], and the originator of the executable code is supposed to be well-known and trusted, so TeleScript is currently not very relevant in the discussion of untrusted code.
Separate name-spaces are a good starting point for a secure system. Lacking in all the above approaches is yet a way to easily grant access to system resources. Part of the idea of using a Security Assistant is to be helped with these matters. It is worth to mention, that in Tcl and Penguin the name-spaces are initially completely closed whereas in Java, objects can under certain circumstances communicate via objects loaded from the local file system ClassLoader [47]. Also, some decisions of what to (and what not to) monitor are taken by the Java SecurityManager. Adding more audit-points requires modification of the system libraries, as mentioned in Section 3.1.
The message sending between objects causes a lot of overhead if the components communicate extensively, e.g if one component is a subclass of another or when a library (such as the GUI-library) has to be used extensively. If the programs can be counted on to obey the access restrictions demanded by the language they can be put in the same name-space, thereby greatly improving performance. If the source-code or a semi-compiled intermediate representation of a program is used for storing the program, the adherence to the language specification can be verified when the program is received. Any safe language could use the method of sending the source-code, but the program would then have to be emulated or complied when it is received. The distribution of intermediate code has advantages in efficiency and intellectual property rights and is used in Java, but it introduces the possibility of a malicious complier which generates code breaking the high-level language abstractions.
In the Java Verifier [46] the objective is to verify that the complied code does not break the high-level language access restrictions and that it accesses the other available objects correctly. The verifier tests the code to ensure that [39] :
The complied code conforms to certain restrictions as to how it can can be structured. The compiled code contains type information, allowing the verifier to ensure that illegal type conversion cannot take place. All references to objects outside of the code fragment have to be done through a table in the head of the code fragment. The code for methods are augmented with information of their maximum stack space and maximum number of registers used.
To implement efficient data flow analysis of the code fragments the verifier requires the stack and the registers to be equal regarding what types they contain whenever two execution paths merge. This self-imposed constraint from the compilers has the effect that some programs written in low-level Java can be deemed as incorrect even though they do not break any access restrictions. Restrictions of this kind are mainly a problem when compilation of other languages to Java byte-code is considered [6]. This has not been seen as a big problem for general users, especially compared to the benefits, since only some efficiency and no functionality is lost.
Analysis of code to verify that it conforms to an expected format is in spirit related to anomaly detection. The misuse detection approach when analyzing programs is taken by [26]. The program is analyzed to see if it contains tell-tale signs of actions such as time-dependent computations, segments of code that does not make use of the rest of the program or sequences of system calls with abnormal features, e.g race-condition dependence. In this approach, all known, potentially dangerous execution paths are discovered. This includes false positives and warnings in code that might not be used at all or even being malicious in the context it is being used.
Relying completely on code analysis is dangerous in practice. Flaws in the verifier have been reported in [12,17] and more are likely to be found. In addition to the possibility of bugs in the verifier, bugs and ambiguities in the runtime system such as those reported for Java in [11,47] show that a program's correctness is not a complete guarantee for its well-behavedness. Analyzing techniques are therefore best seen as a possible way to decrease the needed observation for malicious activity by being able to tell more about the potential dangers of a program.
Imposing constraints on what can be expressed in the byte-code could be seen as just an intermediate format for a higher-level language, in this case the only thing gained is that the compilation does not have to be done several times.
On the other hand, if the idea of code conventions is extended, more service oriented conventions or computational models can be conceived. In Java, the programs' view of the world is much like that of ordinary programs, e.g they can open network connections and access files. This means that when these programs are allowed access to the system the whole range of well-known attacks, like DNS-spoofing and mail daemon attacks are available [11]. An interesting alternative to this is to use an altogether different computational model as is done in Obliq [7]. In Obliq the values of a program can reside on other computers, but the programs themselves can not move the values; this is done transparently to them. The Obliq programs do not need a notion of a TCP/IP-network etc. That a program conforms to a certain computational model can be ensured by stating that it has to conform to certain restrictions on a more general language such as the Java byte-code.
Still, the problem here is how to actually grant resources to a program. The general problem of allowing access to data cannot be solved by choosing another view-of-the-world for a program since the programs will still have to access real data and this always means that a certain risk is involved. Also, more abstract models of computation means loss of efficiency because of the emulation that has to be done.