CHAIMS: Compiling High-level Access Interfaces for Multi-site Software
Towards the Science of Component Engineering

Incremental result extraction and progress monitoring in other protocols
written by Dorothea Beringer, March 1999

Incremental result extraction and progress monitoring
in JointFlow
JointFlow is the Joint Workflow Management Facility of CORBA [JointFlow98]. It is an implementation of interface 4 (I4) of the WfMC workflow reference model [WfMC94] on top of CORBA. JointFlow adopts an object-oriented view of workflow management: processes, activities, requesters, resources, process managers, event audits etc. are distributed objects that collaborate to get the overall job done. Each of these objects can be accessed over an ORB; the JointFlow specification defines their interfaces in IDL.
Starting execution of work
Simplified, work is started in the following way (see the sketch below):
-
A requester (an instance of WfRequester) gets the reference of a process manager (an instance of WfProcessMgr) from somewhere, e.g. over a naming service. The requester then invokes the create_process operation of the process manager, thus prompting it to create a new process (an instance of WfProcess). Both the requester and the process get references to each other, allowing future communication. Finally, the requester sets the context attributes in the process and invokes the start operation of the process. Context attributes not only contain the data, or pointers to the data, on which work should be performed; they can also determine which results are desired and what kind of work should be done (this can replace the notion of having several methods to choose from in a megamodule).
-
The process may be a wrapper of legacy code or of a physical device, or it may contain several execution steps encapsulated in activities (instances of WfActivity) that are instantiated during the execution of the workflow represented by this process instance. An activity might need external (human) help. This is achieved by assigning a resource (an instance of WfResource) to the activity, either by the activity itself or in conjunction with some resource manager that may or may not be implemented as an instance of a WfProcess. An activity may also itself act as a requester and start some other process via a process manager to do the work for it. The process contains references to its activities; an activity contains references to its resource assignments.
Prior to requesting the creation of a new process, a requester can get the signatures of the context as well as the result attributes (ProcessDataInfo) from the process manager. Yet this information only contains a list of pairs of attribute name and type name (strings denoting IDL types). If the type name is a complex type, no further information about its structure is available. Nor is any constraint information available, e.g. for determining which context attributes have to be set for specific results (for a travel reservation service, for instance, name information is always necessary, whereas the remaining information differs depending on whether a hotel room is reserved, a car rented, or a flight booked).
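A minimal sketch of this startup sequence, written in Java against the standard org.omg.CORBA and CosNaming classes, might look as follows. Only create_process and start are operation names taken from the text above; the stub and helper types (WfProcessMgr, WfProcess, WfRequester and their helpers) would be generated from the JointFlow IDL, and the operations result_signature and set_process_context as well as the NameValue pairs are assumptions made purely for illustration.

    import org.omg.CORBA.ORB;
    import org.omg.CosNaming.NamingContextExt;
    import org.omg.CosNaming.NamingContextExtHelper;

    // Sketch: a requester creating and starting a JointFlow process.
    // WfProcessMgr, WfProcess, WfRequesterImpl, NameValue, result_signature()
    // and set_process_context() are assumed names; only create_process() and
    // start() are taken from the description above.
    public class StartWork {
      public static void main(String[] args) throws Exception {
        ORB orb = ORB.init(args, null);

        // 1. Get the process manager reference, e.g. over the naming service.
        NamingContextExt nc = NamingContextExtHelper.narrow(
            orb.resolve_initial_references("NameService"));
        WfProcessMgr mgr = WfProcessMgrHelper.narrow(nc.resolve_str("TravelReservation"));

        // Optionally inspect the signatures of the context and result attributes
        // (ProcessDataInfo: pairs of attribute name and IDL type name):
        // ProcessDataInfo[] resultSig = mgr.result_signature();   // assumed name

        // 2. Create a requester object so that the process can later call back
        //    via receive_event().
        WfRequester requester = new WfRequesterImpl(orb);          // assumed servant

        // 3. Ask the manager to create a new process; requester and process
        //    now hold references to each other.
        WfProcess process = mgr.create_process(requester);

        // 4. Set the context attributes: the data to work on and the selection
        //    of the desired results / kind of work.
        NameValue[] context = { new NameValue("customer_name", "A. Smith") };
        process.set_process_context(context);                      // assumed name

        // 5. Start the work.
        process.start();
      }
    }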
Monitoring the progress of work
Both processes and activities are in one of the following states: running, not_running.not_started, not_running.suspended, completed (successfully), terminated (unsuccessfully), aborted (unsuccessfully). Assignments are either in the state potential (assignment not yet accepted by the resource) or accepted. A requester can query the state of a process, the states of the activities of the process (by querying and navigating the links from processes to activities), and the states of assignments (by querying and navigating the links from activities to assignments).
If the requester knows the workflow model with all its different steps implemented by the process, the requester might even be able to interpret the state information and figure out how far the process has progressed. Yet for a requester not familiar with the internal workflow logic of the process, this status information is not of great help: the requester can only determine whether the process is complete or not (though if the process is complete the requester would be notified via its receive_event operation anyway), and whether the process is running or suspended. Without intimate knowledge of the workflow logic and model, the requester has no way of determining how far a process has advanced or how much more time it might take. The same is true whenever a process either does not have sub-activities, or these sub-activities are hidden, as would be the case for autonomous processes located in other organizations that care about privacy.
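A sketch of such state polling from the requester's side is given below (Java). The navigation and accessor operations state(), name(), get_activities() and get_assignments() are names assumed for this illustration; the actual operations are defined by the JointFlow IDL.

    // Sketch: polling the state of a process, its activities and assignments.
    // state(), name(), get_activities() and get_assignments() are assumed names.
    void printStates(WfProcess process) {
      System.out.println("process state: " + process.state());
      for (WfActivity activity : process.get_activities()) {
        System.out.println("  activity " + activity.name() + ": " + activity.state());
        for (WfAssignment assignment : activity.get_assignments()) {
          // potential = not yet accepted by the resource, otherwise accepted
          System.out.println("    assignment: " + assignment.state());
        }
      }
    }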
Comparison to CPAM:
-
CPAM supports the notion that certain services may offer progress information (e.g. 40% done) that can be monitored. This information is more detailed than just running or complete, yet more aggregated and better suited for autonomous services than detailed information about component activities.
-
JointFlow signals completion of work to the requester and to the container process, whereas in CPAM this information has to be polled for by repeated progress monitoring.
There is a possible workaround in JointFlow for getting progress information: a process can have a special result attribute for progress information, and the process is free to update that attribute regularly. It can then send a WfEventAudit with the old and new value of the progress indicator to its requester after each update. Yet this result attribute cannot be polled by a requester (in contrast to CPAM and SWAP), because get_result only returns results if all of them are available at least as intermediate results.
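Seen from the process implementation, this workaround could look roughly as follows (Java sketch). Only receive_event(WfEventAudit) is an operation named above; the helper methods getProgressAttribute(), setProgressAttribute() and makeDataChangedEvent() are assumptions of this sketch.

    // Sketch of the workaround: the process keeps a special result attribute
    // "progress" up to date and pushes a data-change event to its requester.
    void reportProgress(WfProcessImpl process, WfRequester requester, int percentDone)
        throws Exception {
      int oldValue = process.getProgressAttribute();        // assumed helper
      process.setProgressAttribute(percentDone);            // e.g. 40 for "40% done"

      // build a WfEventAudit carrying the old and the new value of the attribute
      WfEventAudit event =
          process.makeDataChangedEvent("progress", oldValue, percentDone);

      // push it to the requester; the requester cannot poll for this attribute,
      // since get_result only answers once all results are at least intermediate
      requester.receive_event(event);
    }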
Extracting results incrementally
Both processes and activities have an operation get_result():ProcessData (returning a list of name-value pairs). Get_result does not take any input parameter and thus returns all the results. The get_result operation may be used to request intermediate result data, which may or may not be provided depending upon the details of the work being performed. If the results cannot yet be obtained, get_result raises an exception; any data returned in that case is meaningless. The results are not final until the unit of work is completed, resulting in a state change to the state complete and a notification of the container process via the operation complete() or of the requester via the operation receive_event(WfEventAudit). This kind of extraction of intermediate results corresponds to the progressive extraction of all result attributes in CPAM.
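From the requester's side, progressive extraction with get_result thus reduces to a call that either yields all (possibly intermediate) results or raises an exception, as in the following Java sketch; the exception name ResultNotAvailable and the representation of ProcessData as an array of name-value pairs are assumptions.

    // Sketch: requesting all (possibly intermediate) results of a process.
    // ResultNotAvailable and the NameValue[] representation of ProcessData
    // are assumed for this sketch.
    NameValue[] pollResults(WfProcess process) {
      try {
        // returns ALL result attributes, at least in an intermediate version
        return process.get_result();
      } catch (ResultNotAvailable e) {
        // not even intermediate results are available yet
        return null;
      }
    }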
The following features are not available in JointFlow:
-
Partial extraction with get_result: get_result extracts either all or none of the result values; there is no mechanism to return only some of the values and raise an exception only for the others.
-
Progressive extraction with get_result of just one result attribute when not all other results are yet ready for intermediate or final extraction.
-
There is no accuracy information for intermediate results, unless it is kept in a separate result attribute. There is no way to find out the accuracy, or whether intermediate results are ready at all, without actually requesting these results. The same is true for getting result updates over a WfEventAudit. Especially for large amounts of data this might be quite costly.
Whenever a process or an activity undergoes a state change, a change in the context data, or a change in the result data, the process or activity creates a WfEventAudit containing the old and the new information (for context and result data, only those attributes that have changed are listed). A process can send this event to its requester via the receive_event operation. Though no such operation is mentioned for sending events from an activity to its container process, some mechanism for it must exist, because a process is required to store all the events from itself and its activities in its history log. Drawback: in the case of large data, this messaging mechanism could result in huge amounts of traffic, especially if many increments of intermediate results are made available.
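On the requester's side, receive_event then has to distinguish the different kinds of events; a rough sketch is given below. The accessors event_type(), source() and changed_result_data() as well as the event-type strings are assumptions made for this illustration; the JointFlow specification defines its own audit-event structure.

    // Sketch of a requester reacting to incoming WfEventAudit events.
    // event_type(), source(), changed_result_data() and the event-type strings
    // are assumed names; JointFlow defines the actual audit-event structure.
    public void receive_event(WfEventAudit event) {
      if ("processCompleted".equals(event.event_type())) {
        // the final results can now be fetched with get_result()
        onCompleted(event.source());
      } else if ("processResultChanged".equals(event.event_type())) {
        // partial, progressive extraction: the event contains only the
        // changed result attributes (old and new values)
        for (NameValue changed : event.changed_result_data()) {
          updateLocalCopy(changed);
        }
      }
    }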
General question: how, when and by whom are process and activity objects
deleted?
Using WfEventAudit for partial and progressive result extraction:
-
A process can notify its requester of all changes in result and context
data, in addition to state changes.
-
Such events only contain the data (name-value pairs of old and new values) that has changed, thus having the same effect as partial progressive extraction.
-
Drawback: it is the process that determines which events are to be sent, not the requester. Unless the process has a special context attribute that tells it whether and which data-change events to send, there is no way for the requester to inform a process about its preferences.
-
It is not clear which operations would be used to notify a process about
changed result data in activities.
Given the fact that there exist no descriptions of scenarios for partial and progressive extraction, that partial extraction is mentioned nowhere, and that progressive extraction is only mentioned in the context of defining the effect of get_result when a process or activity is not yet completed, the use of events for partial and progressive extraction seems to be incidental. It becomes possible because a process is required to log events, events for changes in data therefore exist, and the receive_event operation of the requester can receive all the different kinds of events, not only state-change events. Therefore, though partial as well as progressive extraction is possible in JointFlow, JointFlow has not been designed for it, and the way of doing it is rather inconsistent and incidental. Furthermore, a specific process must provide additional context and result attributes to fine-tune partial and progressive extraction (determining notification by events, progress indicator, accuracy indicator). Without special protocol support, i.e. without being an integral part of the syntax and semantics of the JointFlow specification and of the system designs based on it, it is doubtful that any services would provide partial and progressive result extraction or high-level progress information.
CORBA notification service
All objects in JointFlow can use the CORBA notification service for WfEventAudits as well as for additional events. Thus it is possible to implement any notification between any objects in a particular implementation. Drawback: it is not specified who should receive which events via the notification service, so implementations that use the notification service for communication between the objects of JointFlow are no longer compatible with each other. Notification via the CORBA notification service is mainly intended for the integration of other, outside systems.
Incremental result extraction and progress monitoring
in SWAP
SWAP (Simple Workflow Access Protocol) [SWAP98] is a proposal for a workflow protocol based on extending HTTP. It mainly implements I4 (and to some extent also I2 and I3) of the WfMC reference model. The different components of a workflow system are internet resources that implement one or several of the interfaces defined in SWAP. The three main interfaces are ProcessInstance, ProcessDefinition and Observer. The messages exchanged between these resources are extended HTTP messages with methods like PROPFIND, CREATEPROCESSINSTANCE etc. The data is encoded as text/xml in the body of the message.
Starting work
-
Somebody, e.g. a resource implementing the interface Observer, knows the URI of the ProcessDefinition it is interested in. With PROPFIND it can ask for information about the resource; this includes information about the names and types of context and result data. As the response is in XML, the protocol itself does not limit the amount and depth of the type information given. The SWAP specification does not specify the syntax and semantics of type information.
-
A process instance is created and started by sending a CREATEPROCESSINSTANCE message to the appropriate ProcessDefinition resource (see the sketch after this list). This message also contains the context data to be set and the URI of an observer resource that should be notified about completion and other events. The response returns the URI of the created process instance resource, which implements the interface ProcessInstance. Context data can also be set by sending PROPPATCH messages to the process instance. The process is started either automatically by the ProcessDefinition resource if the CREATEPROCESSINSTANCE message contains the startImmediately flag, or by sending a PROPPATCH message to the process instance with the new state running. Additional Observer resources can subscribe to a process instance at any time (the SWAP specification does not specify whether they will receive only state-change events or also data-change and role-change events).
-
A process instance resource can delegate work to other resources by creating ActivityObserver resources (a specialization of the Observer interface), which create new process instances via process definition resources or give work to some human being or legacy system.
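To illustrate the HTTP-level mechanics, the following Java sketch sends a CREATEPROCESSINSTANCE message to a process-definition URI over a plain socket. The host, the path and the shape of the XML body are purely illustrative; the concrete XML vocabulary for context data, observer URI and the startImmediately flag is defined by the SWAP draft and only hinted at in the comments.

    import java.io.*;
    import java.net.Socket;

    // Sketch: sending a SWAP CREATEPROCESSINSTANCE message as an extended HTTP
    // request. Host, path and body are illustrative placeholders; the concrete
    // XML elements are defined by the SWAP internet draft.
    public class SwapCreate {
      public static void main(String[] args) throws IOException {
        String body =
            "<?xml version=\"1.0\"?>\r\n" +
            "<!-- context data, observer URI and startImmediately flag go here, -->\r\n" +
            "<!-- using the XML vocabulary of the SWAP draft                    -->\r\n";

        try (Socket s = new Socket("workflow.example.com", 80)) {
          Writer out = new OutputStreamWriter(s.getOutputStream(), "ISO-8859-1");
          out.write("CREATEPROCESSINSTANCE /definitions/travel HTTP/1.1\r\n");
          out.write("Host: workflow.example.com\r\n");
          out.write("Content-Type: text/xml\r\n");
          out.write("Content-Length: " + body.length() + "\r\n");
          out.write("\r\n");
          out.write(body);
          out.flush();

          // The response carries the URI of the new process-instance resource.
          BufferedReader in = new BufferedReader(
              new InputStreamReader(s.getInputStream(), "ISO-8859-1"));
          for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line);
          }
        }
      }
    }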
Result extraction and result monitoring
Results are extracted from a process instance by sending it a PROPFIND message. This message either returns all available results or, if it contains a list of result attributes to be returned, only the selected ones. Only result attributes that are available are returned. If requested attributes are not yet available, presumably an exception should be returned. SWAP does not specify whether the results returned by PROPFIND have to be final or not, though I rather assume they have to be final.
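A corresponding sketch for extracting selected result attributes with PROPFIND is shown below; again the path and the body listing the requested attributes are illustrative placeholders, and the real property-list syntax is defined by the SWAP draft.

    import java.io.*;
    import java.net.Socket;

    // Sketch: extracting selected result attributes from a process instance
    // with PROPFIND. Path and XML body are illustrative; the concrete
    // property-list syntax is defined by the SWAP internet draft.
    public class SwapPropfind {
      public static void main(String[] args) throws IOException {
        String body =
            "<?xml version=\"1.0\"?>\r\n" +
            "<!-- list of requested result attributes, e.g. hotelConfirmation, -->\r\n" +
            "<!-- encoded in the property-list vocabulary of the SWAP draft    -->\r\n";

        try (Socket s = new Socket("workflow.example.com", 80)) {
          Writer out = new OutputStreamWriter(s.getOutputStream(), "ISO-8859-1");
          out.write("PROPFIND /instances/travel-4711 HTTP/1.1\r\n" +
                    "Host: workflow.example.com\r\n" +
                    "Content-Type: text/xml\r\n" +
                    "Content-Length: " + body.length() + "\r\n\r\n" + body);
          out.flush();

          // The XML response contains only those requested attributes that are
          // currently available (plus the state of the process instance).
          BufferedReader in = new BufferedReader(
              new InputStreamReader(s.getInputStream(), "ISO-8859-1"));
          for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line);
          }
        }
      }
    }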
Completion of work of a process instance or another resource is signaled to an observer with the COMPLETE message. This message also contains the result data: all the name-value pairs that represent the final set of data as of the time of completion. After sending the COMPLETE message the resource no longer has to exist.
A process instance can also send NOTIFY messages to an observer resource. These messages transmit state-change events, data-change events, and role-change events. Data-change events contain the names and values of the data items that have changed. Who determines whether an observer is notified about all or only some of the possible state, data (result and context) and role changes? Requiring notification of data changes as a default or as mandatory seems to be overkill, as all result attributes would be sent to the observer at least twice, once by NOTIFY and once by COMPLETE, or even more often if PROPFIND messages are used before a COMPLETE is received.
Process instances receive result data from other processes or legacy systems over the ActivityObserver interface. As activity observers they can receive COMPLETE messages and PROPPATCH messages. Both contain a list of result attributes as name-value pairs, though in the case of PROPPATCH this can be a partial list. SWAP does not specify whether the results may also be intermediate or not.
Process progress monitoring
PROPFIND not only returns all available result values, it also returns the state of the process instance and additional descriptive information about the process. Possible states can be specified by the process itself; PROPFIND also returns the list of all possible state values, yet in most cases this would probably just be not_yet_running, running, suspended, completed, terminated. A process instance can be asked for all the activities it contains (the URIs of the activity observers it contains), and these activity observers can then be asked for their state information, which mirrors the state of the process instance or legacy system they are observing. Drawbacks: see the section on JointFlow.
Overall progress information is not specified by SWAP, but it could be implemented by a special result attribute, assuming that result attributes can be changed over time. Such a result attribute could be extracted at any time by PROPFIND, independent of the availability of other result attributes. Drawback: PROPFIND always returns all possible information about a process instance; the returned result attribute values can be restricted to a selection, but returning them cannot be turned off entirely.
Summary
These mechanisms allow the following kind of result extraction and progress
monitoring:
-
Partial result extraction: Either pushing results via NOTIFY messages or pulling results via PROPFIND messages is possible. NOTIFY sends all new result data; PROPFIND returns all available result data, whether or not they have already been returned by a previous PROPFIND. Notification of result changes without also sending the changes, or asking for the status of results without also receiving the results, is not possible.
-
Progressive result extraction: After reading the SWAP specification it is not entirely clear whether progressive result updates in a process instance are allowed or not. If not, the result attributes would not be available until their values are final. If yes, then progressive results can be extracted either by pushing results via NOTIFY messages or by pulling results via PROPFIND messages. NOTIFY sends all changed/updated result data; PROPFIND returns all available result data, whether or not they have changed since the last PROPFIND. Accuracy indication is not provided; it would have to be implemented via additional result attributes. The same is true for a simple complete/not_yet_final status of individual result attributes.
-
Monitoring the state of processes: Using PROPFIND to query the status of the process instance and the states of sub-activities and sub-process instances.
-
Monitoring overall progress: Introducing an additional result attribute.
SWAP presumably does not inhibit incremental result extraction and progress monitoring. Partial result extraction is even very straightforward and supported quite well by PROPFIND as well as NOTIFY. Some issues around progressive result extraction are not clear. Also, the monitoring part for incremental result extraction and the overall progress information is quite weak. This is clearly due to the fact that incremental result extraction is not a main objective of SWAP, if it has been an objective at all.
Incremental result extraction and progress monitoring
in CORBA-DII
CORBA offers two modes for interaction between a client and remote servers: the static and the dynamic interface to an ORB. For the static interface, an IDL definition must exist that is compiled into stub code that can be linked with the client. The client then executes remote procedure calls as if the remote methods were local.
The dynamic invocation interface (DII) offers dynamic access where no stub code is necessary. The client somehow gets the reference to a remote object (e.g. from another method call), and the client somehow has to know the IDL of the remote object, i.e., the names of the methods and the parameters they take. The client then creates a request for a method of that object. Creating a request for a specific object instance takes the following parameters: the method name as a string, a pointer to a list of named values for all the IN, INOUT and OUT parameters of the method, a named value for the return value of the method, and some flags. A named value is a structure containing the name of the parameter, the value as type any (or a pointer to the value and a CORBA type code), the length of the parameter, and some flags. The ORB needs all the information in the named values to make sure the parameters are the ones the server expects. As this is not checked at compile time, it is checked at run time using information such as the type codes. Creating a request has many similarities to how method IDs are created in Java JNI for calling Java methods out of C code.
Once the request is created, the method can be invoked. This is done either synchronously with invoke or asynchronously with send (in fact, some flags allow more elaborate settings). Invoke returns after the invocation has finished, and the client can then read all OUT parameters in the named value list. In the case of a send, the client is not blocked. In order to figure out when the invocation has finished, the client can use get_response, either in a blocking mode (it waits until the invocation is done) or in a non-blocking mode. When/if the return status of get_response indicates that the invocation is done, the client can read the OUT parameters from the named value list.
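The following Java sketch shows such a deferred (asynchronous) invocation using the DII as mapped to Java, where the asynchronous call is named send_deferred and the non-blocking check for completion is poll_response. The stringified object reference, the operation name compute and its parameter are illustrative.

    import org.omg.CORBA.*;

    // Sketch: deferred (asynchronous) invocation via the DII, Java mapping.
    // The IOR, the operation name "compute" and its parameter are illustrative.
    public class DiiDeferredCall {
      public static void main(String[] args) throws Exception {
        ORB orb = ORB.init(args, null);

        // reference obtained "somehow", e.g. from a previous call or a file
        org.omg.CORBA.Object target = orb.string_to_object(args[0]);

        // build the request: operation name, named IN argument, return type
        Request request = target._request("compute");
        request.add_named_in_arg("problem_size").insert_long(1000);
        request.set_return_type(orb.get_primitive_tc(TCKind.tk_double));

        // asynchronous start of the invocation
        request.send_deferred();

        // progress monitoring is limited to done / not done
        while (!request.poll_response()) {
          System.out.println("not done yet ...");
          Thread.sleep(1000);
        }

        // collect the result; only now are return value and OUT parameters valid
        request.get_response();
        double result = request.return_value().extract_double();
        System.out.println("result: " + result);
      }
    }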
With the DII, asynchronous invocation of methods is thus supported in CORBA. The progress of an invocation can be monitored only as far as done/not done is concerned; no further progress information is returned (e.g. how much has been done). Incremental extraction of results (i.e. of OUT parameters as well as the return value of a method) is not supported by the DII. When creating a request, the parameters for the method can be inserted into the request step by step using add_arg on the request object, yet this only concerns the creation of the request on the client side and cannot be compared to SETPARAM in CHAIMS.
In order to mimic the incremental result extraction of CHAIMS, one could use asynchronous method invocation with the DII coupled with the event service of CORBA. The client could be implemented as a PullConsumer for a special event channel CHAIMSresults; the servers could push results into that channel as soon as they are available, together with accuracy information. Though event channels could be used for that purpose (we could require that every megamodule uses event channels for this), an integration of incremental result extraction and invocation progress monitoring into the access protocol itself is definitely more adequate if we consider this functionality to be an integral part of the protocol.
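A sketch of the client (pulling) side of this idea is given below, assuming an ORB that provides the standard CosEventChannelAdmin and CosEventComm Java mappings. The channel name CHAIMSresults comes from the text above; registering the channel in the naming service and the way result name, value and accuracy are packed into the event Any are assumptions of this sketch.

    import org.omg.CORBA.Any;
    import org.omg.CORBA.BooleanHolder;
    import org.omg.CORBA.ORB;
    import org.omg.CosEventChannelAdmin.EventChannel;
    import org.omg.CosEventChannelAdmin.EventChannelHelper;
    import org.omg.CosEventChannelAdmin.ProxyPullSupplier;
    import org.omg.CosNaming.NamingContextExt;
    import org.omg.CosNaming.NamingContextExtHelper;

    // Sketch: a CHAIMS client pulling incremental results from a CORBA event
    // channel named "CHAIMSresults". The channel registration in the naming
    // service and the encoding of result name, value and accuracy in the
    // event Any are assumptions of this sketch.
    public class ResultPuller {
      public static void main(String[] args) throws Exception {
        ORB orb = ORB.init(args, null);
        NamingContextExt nc = NamingContextExtHelper.narrow(
            orb.resolve_initial_references("NameService"));
        EventChannel channel = EventChannelHelper.narrow(nc.resolve_str("CHAIMSresults"));

        // connect as a pure pull-style consumer (a nil consumer reference is allowed)
        ProxyPullSupplier supplier = channel.for_consumers().obtain_pull_supplier();
        supplier.connect_pull_consumer(null);

        BooleanHolder hasEvent = new BooleanHolder();
        while (true) {
          // non-blocking check for the next incremental result event
          Any event = supplier.try_pull(hasEvent);
          if (hasEvent.value) {
            // the megamodule is assumed to pack result name, value and accuracy
            // into the Any; decoding is omitted here
            System.out.println("received incremental result event: " + event);
          } else {
            Thread.sleep(500);
          }
        }
      }
    }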
References
[JointFlow98] Workflow Management Facility, Revised Submission, OMG Document Number bom/98-06-07, July 1998.
[WfMC94] Workflow Management Coalition: The Workflow Reference Model, Document Number TC00-1003, November 1994.
[SWAP98] Keith Swenson: Simple Workflow Access Protocol (SWAP), IETF Internet Draft, August 1998.