Introduction

Distributed Architecture Definition Languages (DADLs) are emerging as tools for formally representing the architecture of distributed systems. As architectures become a dominant theme in large distributed system development, methods for unambiguously specifying a distributed architecture will become indispensable.

An architecture represents the components of a large distributed software system together with their interfaces, methods of communication, and behaviors. It is the behaviors of the components, and the communication among them, that current approaches under-specify. To date, distributed system architectures have largely been represented by informal diagrams in which the components, their properties, their interaction semantics, their connections, and their behaviors are only loosely sketched, in partially successful attempts to specify the architecture.

Traditional computer languages such as C concentrate mainly on defining algorithms and data structures, using language-provided mechanisms for type definitions, functions, and control flow. The interface is only partially defined by header files, which specify function names, parameters, parameter types, and parameter order. This falls short of specifying the behavior of the interface. Traditional computer languages are far better suited to defining an implementation than to defining an architecture.

Consider the simple C program in Table 1, which calculates the sum of two integers.

**Table 1:** Example of a Traditional C Program with Header File.
[Listing not recoverable from the extracted text. The surviving fragment shows the header file declaring a function prototype that ends in `int n, int m);`.]

Traditional programming languages readily define data structures and algorithms, but they offer very little help in defining the architecture. In fact, they assume an architecture so implicit that most languages do not even expose it as a feature. The functions main and plus communicate through a medium that is a shared address space and is memory resident, ordered, highly reliable, synchronous, and error-free; this medium is materialized by the call-frame stack.

The language of communication is defined by the call statement. The function main sends two integers to the function plus and waits for an integer in reply. The function plus receives two integers and replies with their sum. The implicit call and return in the C language materialize this architecture.
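To make this implicit architecture visible, the following sketch builds the call-frame stack by hand in C. It is a simplified model under stated assumptions, not a description of how any compiler actually lays out frames (real frames also hold return addresses and saved registers); the names `struct frame`, `call_stack`, `plus_body`, and `call_plus` are hypothetical.

```c
/* A hand-built call-frame stack that makes C's implicit call/return explicit.
   All names here are hypothetical, chosen for illustration only. */
#define STACK_DEPTH 16

struct frame {
    int args[2];  /* the two integers "sent" by the caller */
    int ret;      /* the integer "replied" by the callee */
};

static struct frame call_stack[STACK_DEPTH];
static int sp = 0;  /* index of the next free frame */

/* Body of plus: reads its arguments from the frame and writes the reply into it. */
static void plus_body(struct frame *f) {
    f->ret = f->args[0] + f->args[1];
}

/* What "sum = plus(n, m);" does implicitly: push, send, run, receive, pop. */
static int call_plus(int n, int m) {
    struct frame *f = &call_stack[sp++];  /* push a frame (no overflow check in this sketch) */
    f->args[0] = n;
    f->args[1] = m;
    plus_body(f);        /* synchronous: control returns only when plus is done */
    int reply = f->ret;  /* reliable and ordered: the reply is always there */
    sp--;                /* pop the frame */
    return reply;
}
```

Calling `call_plus(2, 3)` returns 5 and leaves `sp` back at 0. Each property listed above is built in: one address space, a frame that is always present, and a reply that is available the moment `plus_body` returns.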

This implicit architecture is appropriate for small, simple programs, but as applications become more complex, larger, and distributed, the implicit call-frame stack architecture no longer fits. A distributed architecture may have to deal with disjoint address spaces and with a communication mechanism that is not memory resident and is unordered, unreliable, asynchronous, and error-prone. This is far from the assumptions of traditional computer languages, so it is no wonder that large systems are hard to define with them.
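The sketch below illustrates what changes when the medium is unreliable. It simulates a lossy channel in C; the drop pattern, the retry policy, and all names (`struct channel`, `channel_send`, `remote_plus`) are hypothetical stand-ins for a real network.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lossy channel: bit i of drop_mask set means transmission i is lost. */
struct channel {
    uint32_t drop_mask;
    uint32_t sent;  /* transmissions attempted so far */
};

/* Returns true if the transmission arrives. */
static bool channel_send(struct channel *c) {
    bool lost = (c->drop_mask >> (c->sent % 32u)) & 1u;
    c->sent++;
    return !lost;
}

/* plus over an unreliable medium: the request or the reply may be lost,
   so the caller retries up to max_tries times and may still fail. */
static bool remote_plus(struct channel *c, int n, int m, int max_tries, int *out) {
    for (int attempt = 0; attempt < max_tries; attempt++) {
        if (!channel_send(c))
            continue;       /* request lost: the server never saw it */
        int reply = n + m;  /* the server executes plus */
        if (!channel_send(c))
            continue;       /* reply lost: the server did the work anyway */
        *out = reply;
        return true;
    }
    return false;           /* the caller must handle failure explicitly */
}
```

With `drop_mask` set to `0x5`, the first request is lost and, on the second attempt, the reply is lost, so the call succeeds only on the third attempt after five transmissions. The server executed plus twice; that is harmless here because plus has no side effects, but a non-idempotent operation would need request identifiers to suppress duplicates.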

Object-based languages such as C++ extend the programming paradigm to include objects, subtypes, polymorphism, and inheritance. This greatly extends a language's ability to define data structures and algorithms. However, the underlying implicit architecture does not change: it still dictates memory-resident, ordered, highly reliable, synchronous, and error-free communication over a shared address space, materialized by a call-frame stack.

Another shortfall of the implicit object-based architecture lies in the definition of behavior. Although a C++ class interface defines the methods the class exports, it does not define the methods the class uses or requires. Thus two implementations can both match the interface perfectly yet behave entirely differently, because they are composed from different primitives.
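The sketch below illustrates the point in C, which has no classes, by using a struct of function pointers as the interface. Both implementations export the same `adder_plus` signature, but one is composed from ordinary addition and the other from a hypothetical 8-bit saturating primitive. Recording the required primitive in the struct is an assumption made for illustration, not a C or C++ language feature; all names are hypothetical.

```c
/* The exported interface is the same for every implementation: plus(n, m). */
typedef int (*add_primitive)(int, int);

/* Two primitives an implementation might be built from (hypothetical). */
static int native_add(int a, int b) {
    return a + b;
}

static int saturating_add8(int a, int b) {  /* clamps to the signed 8-bit range */
    int s = a + b;
    if (s > 127) return 127;
    if (s < -128) return -128;
    return s;
}

/* An implementation records the primitive it requires, which a C++ class
   interface would leave unstated. */
struct adder {
    add_primitive required_add;
};

/* The exported method: identical signature regardless of the primitive used. */
static int adder_plus(const struct adder *a, int n, int m) {
    return a->required_add(n, m);
}
```

For small arguments the two implementations agree, but for 100 and 100 one returns 200 and the other 127, although nothing in the exported signature reveals the difference.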

Some of the founding object-based languages, such as SIMULA [#!SIMULA62!#] and Smalltalk [#!SmallTalk83!#] [#!SmallTalk83b!#] [#!SmallTalk84!#] [#!SmallTalk89!#], tried to replace the implicit call-frame stack architecture with a message-passing queue. In this architecture, methods are invoked by passing messages between objects. However, the architecture is still implicit and under-defined, leaving no choice among alternative behaviors.
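A toy model of the message-passing idea follows: a FIFO queue replaces the call-frame stack, and the receiving object looks up each message's selector at run time. It is a sketch under stated assumptions, not a description of how SIMULA or Smalltalk are implemented; `struct message`, `struct queue`, `enqueue`, `dequeue`, and `receive` are hypothetical names.

```c
#include <string.h>

#define QUEUE_CAP 8

/* A message names a selector and carries its arguments. */
struct message {
    const char *selector;
    int n, m;
};

/* Bounded FIFO queue standing in for the call-frame stack. */
struct queue {
    struct message items[QUEUE_CAP];
    int head, tail;  /* number of queued messages is tail - head */
};

static int enqueue(struct queue *q, struct message msg) {
    if (q->tail - q->head == QUEUE_CAP)
        return 0;  /* queue full */
    q->items[q->tail % QUEUE_CAP] = msg;
    q->tail++;
    return 1;
}

static int dequeue(struct queue *q, struct message *out) {
    if (q->tail == q->head)
        return 0;  /* queue empty */
    *out = q->items[q->head % QUEUE_CAP];
    q->head++;
    return 1;
}

/* The receiving object maps a selector to a method;
   an unknown selector is reported as "not understood". */
static int receive(const struct message *msg, int *reply) {
    if (strcmp(msg->selector, "plus") == 0) {
        *reply = msg->n + msg->m;
        return 1;
    }
    return 0;
}
```

The queue decouples sender and receiver, but its discipline (FIFO, bounded, in memory) is fixed by the code rather than declared, which is the sense in which the architecture remains implicit.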

Distributed middleware support systems such as DCE [#!DCE:1996!#] extend the programming paradigm further. The DCE Interface Definition Language (IDL) includes argument flow (in or out parameters), interface identifiers, dynamic binding information, and exceptions. Using DCE, it is possible to define communication mechanisms for architectures that span disjoint address spaces and whose communication is not memory resident, unreliable, and error-prone. DCE accomplishes this by expanding the call mechanism: asynchronous communication is handled by providing threads, and unordered communication is supported through network datagrams over UDP. DCE replaces the traditional architecture with one better suited to distributed computing, but it does not allow a choice among alternative architectures.
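The sketch below shows, in plain C, how a remote call expands the ordinary call: explicit in and out parameters, and a status that separates a failure of the communication medium from a fault in the operation itself. It is not DCE code (real DCE client stubs are generated from the IDL), and the names `plus_stub`, `call_status`, `link_up`, and `link_down` are hypothetical.

```c
#include <limits.h>

/* Outcome of a remote call (hypothetical codes, not DCE's actual constants). */
enum call_status { CALL_OK = 0, CALL_COMM_FAILURE, CALL_FAULT_OVERFLOW };

/* Transport hook: returns nonzero if the request reached the server. */
typedef int (*transport_fn)(void);

static int link_up(void) { return 1; }
static int link_down(void) { return 0; }

/* Client stub for plus with explicit argument flow:
   [in] n, [in] m, [out] sum, and a status in place of an exception. */
static enum call_status plus_stub(transport_fn transport, int n, int m, int *sum) {
    if (!transport())
        return CALL_COMM_FAILURE;    /* the medium failed, not the operation */
    if ((m > 0 && n > INT_MAX - m) || (m < 0 && n < INT_MIN - m))
        return CALL_FAULT_OVERFLOW;  /* the server-side operation failed */
    *sum = n + m;                    /* [out] written only on success */
    return CALL_OK;
}
```

A caller must now check the status before using `sum`, something a local call to plus never has to do.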

CORBA [#!CORBA:1996!#] extends the programming paradigm to include messaging and distributed objects. Communication takes place over an information bus on which requests are issued and brokers respond to satisfy them. CORBA is directed at building object models for a large class of applications under one, and only one, request/broker architecture. Although this is essential for application development, it leaves architectural needs unmet. CORBA is closer to a detailed requirements specification that defines the needs of a particular application domain.
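The following sketch models the request/broker style in C: servants register the operations they provide, and a client names an operation without knowing which servant handles it. It is a simplification under stated assumptions (one process, a fixed-size table, no object references or marshalling) and uses no CORBA API; all names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SERVANTS 4

typedef int (*servant_fn)(int, int);

/* One entry on the hypothetical "bus": an operation name and its provider. */
struct registration {
    const char *operation;
    servant_fn servant;
};

struct broker {
    struct registration table[MAX_SERVANTS];
    size_t count;
};

/* A servant advertises an operation it can perform. */
static int broker_register(struct broker *b, const char *op, servant_fn fn) {
    if (b->count == MAX_SERVANTS)
        return 0;  /* registry full */
    b->table[b->count].operation = op;
    b->table[b->count].servant = fn;
    b->count++;
    return 1;
}

/* A client issues a request by name; the broker finds a servant to satisfy it. */
static int broker_request(const struct broker *b, const char *op,
                          int n, int m, int *reply) {
    for (size_t i = 0; i < b->count; i++) {
        if (strcmp(b->table[i].operation, op) == 0) {
            *reply = b->table[i].servant(n, m);
            return 1;
        }
    }
    return 0;  /* no servant provides this operation */
}

static int plus_servant(int n, int m) {
    return n + m;
}
```

Everything about how requests travel is fixed inside `broker_request`; an application can add operations, but it cannot choose a different interaction style, which is the limitation described above.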

Megaprogramming [#!Wiederhold:1992!#] extends the call mechanism to an asynchronous messaging paradigm between large components called megamodules. The communication between two megamodules is defined with language structures like setup, estimate, invoke, extract, and examine.
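The primitive family can be sketched as a small state machine in C. The semantics assigned to each primitive here (setup prepares the module, estimate predicts cost before committing, invoke starts work and returns immediately, examine polls progress, and extract retrieves the result) are an assumed reading for illustration; the `mm_` names, the state enumeration, and the simulated work steps are hypothetical.

```c
/* Hypothetical state of one megamodule invocation. */
enum mm_state { MM_IDLE, MM_READY, MM_RUNNING, MM_DONE };

#define WORK_STEPS 2  /* simulated amount of work per invocation */

struct invocation {
    enum mm_state state;
    int n, m;
    int steps_left;  /* simulated work remaining before a result exists */
    int result;
};

/* setup: prepare the megamodule for use. */
static void mm_setup(struct invocation *inv) {
    inv->state = MM_READY;
}

/* estimate: report the expected cost before committing to the call. */
static int mm_estimate(const struct invocation *inv) {
    (void)inv;
    return WORK_STEPS;
}

/* invoke: start the work and return immediately, without a result. */
static int mm_invoke(struct invocation *inv, int n, int m) {
    if (inv->state != MM_READY)
        return 0;
    inv->n = n;
    inv->m = m;
    inv->steps_left = WORK_STEPS;
    inv->state = MM_RUNNING;
    return 1;
}

/* examine: poll progress; each poll advances the simulated work by one step. */
static enum mm_state mm_examine(struct invocation *inv) {
    if (inv->state == MM_RUNNING && --inv->steps_left == 0) {
        inv->result = inv->n + inv->m;
        inv->state = MM_DONE;
    }
    return inv->state;
}

/* extract: retrieve the result, but only once the work is done. */
static int mm_extract(const struct invocation *inv, int *out) {
    if (inv->state != MM_DONE)
        return 0;
    *out = inv->result;
    return 1;
}
```

Between `mm_invoke` and a successful `mm_extract`, the client is free to do other work, which is the asynchrony an ordinary call statement cannot express.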

Languages such as Rapide [#!Luchham:1996!#] extend interface definitions to include events and the causal relationships between events. Borrowing the paradigm of hardware design, Rapide governs the behavior of an interface by signals and events that are synchronized by a clock. The interface is extended to include both the generated and the required methods, which allows it to act more like a meta-schema that governs both actions and simple behavior.

In comparison, Rapide expands the role of the call statement into a directed graph of causal events, and Megaprogramming expands it into a family of asynchronous primitives. The proposed DADL expands the call statement into conversations, behaviors, and contracts, with a focus on distributed systems.
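To make the idea of a causal event graph concrete, the sketch below stores events as nodes of a directed graph and decides whether one event causally precedes another by reachability; events related in neither direction are concurrent. This is an illustrative C model, not Rapide syntax or semantics, and the names `struct event_graph`, `add_cause`, `precedes`, and `concurrent` are hypothetical.

```c
#include <stdbool.h>

#define MAX_EVENTS 8

/* Directed graph of events: edge[a][b] means event a directly causes event b. */
struct event_graph {
    int count;
    bool edge[MAX_EVENTS][MAX_EVENTS];
};

static void add_cause(struct event_graph *g, int a, int b) {
    g->edge[a][b] = true;
}

/* True if a causally precedes b, i.e. b is reachable from a along cause edges. */
static bool precedes(const struct event_graph *g, int a, int b) {
    bool seen[MAX_EVENTS] = {false};
    int stack[MAX_EVENTS], top = 0;  /* each event is pushed at most once */
    stack[top++] = a;
    seen[a] = true;
    while (top > 0) {
        int e = stack[--top];
        for (int next = 0; next < g->count; next++) {
            if (g->edge[e][next] && !seen[next]) {
                if (next == b)
                    return true;
                seen[next] = true;
                stack[top++] = next;
            }
        }
    }
    return false;
}

/* Two distinct events are concurrent when neither causally precedes the other. */
static bool concurrent(const struct event_graph *g, int a, int b) {
    return a != b && !precedes(g, a, b) && !precedes(g, b, a);
}
```

For example, with edges from 0 to 1 and from 1 to 2 (a call, its request, and its reply) and an unrelated event 3, event 0 precedes event 2 while events 2 and 3 are concurrent; an ordinary call statement can express only a single total order.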

A DADL should not be confused with a requirements language. A requirement states the problem at a high level of abstraction, whereas a DADL defines a generic plan that binds the requirements to the implementation. Requirements languages such as STATEMATE [#!Harel:1988!#] and Modechart [#!Jahanian:1990!#] define the problem but not the solution.

This thesis proposes a DADL for specifying the architectures of distributed systems. It first defines the attributes that distinguish large distributed systems from other types of systems. It then defines the DADL itself, and finally uses the DADL to specify several key architectures.

Other related work includes Rapide [#!Luchham:1996!#], UniCon [#!Shaw:1994!#], ArTek [#!Terry:1994!#], Wright [#!Allen:1994!#], Code [#!Newton:1992!#], Demeter [#!Palsberg:1995!#], Modechart [#!Jahanian:1990!#], PSDL/CAPS [#!Luqi:1993!#], Resolve [#!Edwards:1994!#] and Meta-H [#!Vestal1:1994!#].