PeerWare in a Nutshell

The Model

The PeerWare coordination model is centered around the notion of a global virtual data structure (GVDS), which is a generalization of the Lime coordination model. According to this model, peers interact by accessing a data space that is transiently shared and dynamically built out of the data spaces provided by each accessible peer. From the point of view of the user accessing this GVDS, the content of the data structure is automatically and dynamically reconfigured according to changes occurring in the system, typically induced by changes in connectivity among peers.

Sharing Data

The data structure managed by PeerWare is organized as a directed graph composed of labelled nodes and documents, collectively referred to as items. Nodes are organized in an unrooted tree, while documents represent the leaves of the graph and are linked to one or more nodes. This graph represents a containment relation used to structure and classify the documents managed through the middleware. It resembles a filesystem in which directories play the role of nodes, files are the documents, and Unix-like hard links are allowed only on documents. The figure below shows an example of this data structure.

The data structure managed by PeerWare.
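To make the node/document relation concrete, the following is a minimal Java sketch of such a graph. It is not the PeerWare API: the class names ItemGraph, Node, and Document, and the methods child and link, are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the item graph; names are illustrative, not the PeerWare API.
final class ItemGraph {
    /** A document is a leaf that may be linked from several nodes. */
    static final class Document {
        final String label;
        Document(String label) { this.label = label; }
    }

    /** A node contains child nodes and links to documents. */
    static final class Node {
        final String label;
        final Map<String, Node> children = new LinkedHashMap<>();
        final List<Document> documents = new ArrayList<>();
        Node(String label) { this.label = label; }

        /** Returns the child with the given label, creating it if missing. */
        Node child(String childLabel) {
            return children.computeIfAbsent(childLabel, Node::new);
        }

        /** Links a document to this node, much like a hard link to a file. */
        void link(Document d) { documents.add(d); }
    }

    public static void main(String[] args) {
        Node projects = new Node("projects");
        Node alpha = projects.child("alpha");
        Node archive = projects.child("archive");
        Document report = new Document("report.xml");
        alpha.link(report);
        archive.link(report); // the same document is reachable from two nodes
        System.out.println(alpha.documents.get(0) == archive.documents.get(0)); // prints "true"
    }
}
```

Only documents can be linked from more than one place in this sketch; a node has exactly one parent, which mirrors the restriction on hard links described above.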

Each peer is associated with a local data structure, organized as described above, whose content is assumed to be stored locally to the peer. At any time, the local data structures held by the peers connected to PeerWare are made available to the other peers as part of the GVDS managed by PeerWare, which has the same structure as the local data structure (i.e., it complies with the above definition) and whose content is obtained by "superimposing" all the local data structures belonging to the peers currently connected, as shown in the figure below.

The GVDS resulting from two peers joining the PeerWare network.
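The superimposition can be sketched as a union of per-peer structures. The sketch below assumes, purely for illustration, that each local data structure is flattened into a map from a node path to the names of the documents linked to that node; the class Gvds and the method superimpose are hypothetical, and treating equally named documents as one item is an assumption of the sketch, not something the model states.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: the GVDS as the union of the local data structures.
final class Gvds {
    /**
     * Superimposes local data structures, each modeled as a map from a
     * node path to the names of the documents linked to that node.
     */
    static Map<String, Set<String>> superimpose(List<Map<String, Set<String>>> locals) {
        Map<String, Set<String>> global = new TreeMap<>();
        for (Map<String, Set<String>> local : locals) {
            for (Map.Entry<String, Set<String>> e : local.entrySet()) {
                // A node hosted by several peers shows the documents of all of them.
                global.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return global;
    }
}
```

When a peer connects or disconnects, the same function applied to the new list of local structures yields the reconfigured view.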

Changes in connectivity among peers (e.g., determined by mobility, or simply by logging in and out of the PeerWare network) determine changes in the content of the GVDS managed by PeerWare, as new local data structures may become available or disappear. Nevertheless, this reconfiguration is completely hidden from the peers accessing the GVDS, which need only be aware that its content and structure are allowed to change over time.

Accessing Data

The goal in designing PeerWare was to develop a flexible and extensible middleware including a minimal set of primitives, which could support both a proactive and a reactive style of interaction among peers. At the model level we pursued this goal by introducing only three main primitives, which can be applied either to the local data structure associated with a peer or to the GVDS: the first proactively operates on the data managed by PeerWare, the second is used to subscribe to events occurring on such data, while the third atomically combines the first two. The operation actually performed by these primitives is not encoded within them but is provided as a parameter: the action in the description below.

I = execute(Fn, Fi, A): This primitive takes a node filter Fn, an item filter Fi, and an action A, and executes the action on the projection of the data structure identified by Fn and Fi to determine a set of items I, which is returned to the caller. In particular, Fn determines a set of matching nodes and Fi filters the content of such nodes to determine the set of items handled by action A.
subscribe(Fn, Fi, Fe, C): Allows a peer to subscribe to the occurrence of events matching the event filter Fe and published within the projection of the data structure identified by the filters Fn and Fi. When such an event occurs, the callback C is executed locally to the caller.
I = executeAndSubscribe(Fn, Fi, Fe, A, C): Executes an arbitrary action A on the projection of the data structure identified by Fn and Fi, similarly to the execute primitive. In the same atomic step, it also subscribes to events that match Fe and occur within the same projection, specifying the callback C to be executed locally to the caller whenever such an event occurs.
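The three primitives can be sketched on a single local data structure as follows. This is a sketch under stated assumptions rather than the PeerWare API: the class LocalSpace, the types Item and Event, the helper create, and the choice of java.util.function predicates for the filters Fn, Fi, and Fe are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of the three primitives on one local data structure.
final class LocalSpace {
    /** An item, reduced here to the path of its containing node and a name. */
    static final class Item {
        final String nodePath;
        final String name;
        Item(String nodePath, String name) { this.nodePath = nodePath; this.name = name; }
    }

    /** An event published on an item. */
    static final class Event {
        final Item subject;
        final String kind;
        Event(Item subject, String kind) { this.subject = subject; this.kind = kind; }
    }

    private final List<Item> items = new ArrayList<>();
    private final List<Consumer<Event>> listeners = new ArrayList<>();

    /** Returns the projection selected by Fn (on node paths) and Fi (on items). */
    private List<Item> project(Predicate<String> fn, Predicate<Item> fi) {
        List<Item> out = new ArrayList<>();
        for (Item i : items) {
            if (fn.test(i.nodePath) && fi.test(i)) out.add(i);
        }
        return out;
    }

    /** I = execute(Fn, Fi, A): applies A to the projection and returns its result. */
    synchronized List<Item> execute(Predicate<String> fn, Predicate<Item> fi,
                                    Function<List<Item>, List<Item>> action) {
        return action.apply(project(fn, fi));
    }

    /** subscribe(Fn, Fi, Fe, C): runs C for matching events within the projection. */
    synchronized void subscribe(Predicate<String> fn, Predicate<Item> fi,
                                Predicate<Event> fe, Consumer<Event> callback) {
        listeners.add(e -> {
            if (fn.test(e.subject.nodePath) && fi.test(e.subject) && fe.test(e)) {
                callback.accept(e);
            }
        });
    }

    /** I = executeAndSubscribe(Fn, Fi, Fe, A, C): both steps under one lock. */
    synchronized List<Item> executeAndSubscribe(Predicate<String> fn, Predicate<Item> fi,
            Predicate<Event> fe, Function<List<Item>, List<Item>> action,
            Consumer<Event> callback) {
        List<Item> result = execute(fn, fi, action);
        subscribe(fn, fi, fe, callback);
        return result;
    }

    /** Helper for the sketch: creates an item and publishes a "created" event. */
    synchronized void create(String nodePath, String name) {
        Item item = new Item(nodePath, name);
        items.add(item);
        Event event = new Event(item, "created");
        for (Consumer<Event> l : new ArrayList<>(listeners)) l.accept(event);
    }
}
```

Passing the action as a function keeps the primitive itself generic: an identity action returns the projection unchanged, while another action could transform or filter it.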

Although the signature of these operations is identical for both local and global data structures, their effect is limited in scope by the nature of the data structure they are applied to. The semantics of the operations is also affected by this choice. In particular, the semantics of a global operation can be regarded as equivalent to a distributed execution of the corresponding operation on the local data structures of the peers currently connected.

Atomicity is guaranteed when operations are invoked on the local data structure of a single peer. When an operation is executed globally, PeerWare guarantees atomicity only for each of the corresponding operations executed on the individual local data structures that make up the global execution.
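The difference between local and global atomicity can be illustrated with a small sketch. It assumes an in-process stand-in for remote peers; the names GlobalExecute and Peer are hypothetical, and real dissemination over a network is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: a global execute as a series of local executes.
final class GlobalExecute {
    /** A peer holding a local list of item names. */
    static final class Peer {
        private final List<String> items = new ArrayList<>();

        synchronized void add(String item) { items.add(item); }

        /** Local execute: atomic with respect to other operations on this peer. */
        synchronized List<String> execute(Predicate<String> filter) {
            List<String> out = new ArrayList<>();
            for (String i : items) {
                if (filter.test(i)) out.add(i);
            }
            return out;
        }
    }

    /** Global execute: one local execute per connected peer, with no global lock. */
    static List<String> execute(List<Peer> connected, Predicate<String> filter) {
        List<String> result = new ArrayList<>();
        for (Peer p : connected) {
            result.addAll(p.execute(filter));
        }
        return result;
    }
}
```

Each call to Peer.execute sees a consistent snapshot of that peer, but a peer visited early may change before a later peer is visited, so the combined result is not a snapshot of the whole GVDS.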

As a final remark, the executeAndSubscribe operation extends execute with the ability to "hook" onto some information: by retrieving data and monitoring the events occurring on it in a single step, it allows the realization of schemes that provide strong consistency on that information. For instance, a programmer might want to retrieve the content of a node and be notified if any new document appears in that node, e.g., to build a graphical browser of the GVDS. The same behavior cannot be obtained by simply invoking execute followed by subscribe. Given the inherently distributed and asynchronous nature of the system, a peer could publish a relevant event right in between the execute and the subscribe. Such an event would not be captured by the subscription and the notification would never arrive, leaving the caller with an inconsistent view.
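The lost notification can be reproduced deterministically in a small sketch. The class NodeView and its methods are hypothetical and reduce the primitives to a single node holding document names; the call to publish placed between execute and subscribe stands in for a concurrent peer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of why execute followed by subscribe can miss an event.
final class NodeView {
    private final List<String> documents = new ArrayList<>();
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    synchronized List<String> execute() { return new ArrayList<>(documents); }

    synchronized void subscribe(Consumer<String> callback) { subscribers.add(callback); }

    /** Snapshot and subscription under one lock: no publish can slip in between. */
    synchronized List<String> executeAndSubscribe(Consumer<String> callback) {
        List<String> snapshot = execute();
        subscribe(callback);
        return snapshot;
    }

    synchronized void publish(String document) {
        documents.add(document);
        for (Consumer<String> c : new ArrayList<>(subscribers)) c.accept(document);
    }

    public static void main(String[] args) {
        // Two separate steps: the document published in between is never seen.
        NodeView racy = new NodeView();
        racy.publish("a.xml");
        List<String> view = racy.execute();
        racy.publish("b.xml"); // stands in for a concurrent peer
        racy.subscribe(view::add);
        System.out.println(view); // prints "[a.xml]"

        // One atomic step: the publish can only happen before or after it.
        NodeView atomic = new NodeView();
        atomic.publish("a.xml");
        List<String> browser = new ArrayList<>();
        browser.addAll(atomic.executeAndSubscribe(browser::add));
        atomic.publish("b.xml");
        System.out.println(browser); // prints "[a.xml, b.xml]"
    }
}
```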

Other operations are included in the model, namely to create new items, destroy existing items, and notify the occurrence of events. For further details on these operations, we suggest reading the papers describing PeerWare or jumping directly to the API.

The Middleware

General Considerations

The PeerWare model naturally suggests a middleware implementation that is intrinsically peer-to-peer, where each peer hosts a repository that contains its local data structure. An operation on the GVDS managed by the middleware, e.g., a global execute, is then performed by disseminating to the connected peers the request for a local invocation of the corresponding primitive, and sending the results back to the caller. Hence, each peer needs to host a run-time support to manage the routing of system messages, like event notifications and requests for operations.

Nevertheless, the model does not prescribe how such routing must be performed, e.g., what the topology of the network interconnecting the peers is, and what algorithms are used to perform routing on top of it. On the other hand, the PeerWare model includes several choices that have been made on purpose to open up opportunities to improve the efficiency and scalability of any PeerWare implementation, independently of the underlying architecture. In particular, the hierarchical nature of the chosen data structure provides a natural way to restrict the scope of the operations performed over the GVDS, and thus to allow optimizations of the processing involved. For instance, the distribution of requests for an execute should always be "steered" only towards the peers that actually contain the nodes targeted by this operation.

Moreover, the mechanism of actions not only allows programmers to define dynamically the exact behavior of the primitives through which they access the GVDS, but also allows computation to be moved close to resources, thus opening up interesting opportunities to efficiently implement complex operations over documents. At the architectural level this involves the use of mobile code technology to implement the shipping and fetching of the code of actions.

Finally, the model leaves unspecified the nature of the languages used to specify the filters Fn, Fi, and Fe. Here, the tradeoffs are between the expressive power placed in the hands of the programmer and the burden of added complexity and overhead placed on the middleware run-time support.

The Prototype

As mentioned, the current PeerWare implementation is meant to support the development of peer-to-peer applications for collaborative work, in a typical enterprise domain in which users are connected through wired or wireless links to a medium-sized fixed network. In this scenario, the fixed network may provide a backbone of permanently active peers, taking care of processing and routing the control messages related to requests for operations, as well as subscriptions to and notifications of events. Other peers may be permanently or discontinuously attached as leaves of this backbone, including a dynamic fringe of mobile peers, whose connectivity is enabled by wireless devices.

To support portability and platform independence, Java was chosen as the implementation language. Moreover, we designed the PeerWare run-time to be independent of the underlying repository, by decoupling the two through an adaptation layer, represented by a Java interface, which specifies the operations the PeerWare run-time needs to perform on the underlying repository. As a consequence of this choice, we prescribe very little about the data filtering language or the document format. Documents are managed by the PeerWare run-time as opaque data returned by the repository, whose processing may be delegated further to actions. In the current implementation, we chose a simple, open source XML repository, so data filters are XQL queries and documents are XML data.
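The decoupling described above might look like the following sketch. The actual interface is not shown in this document, so everything here is an assumption: the names RepositorySketch, RepositoryAdapter, InMemoryRepository, store, and query are hypothetical, and the toy repository replaces XQL with a plain substring match.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an adaptation layer; not the actual PeerWare interface.
final class RepositorySketch {
    /** Operations a run-time might need from the underlying repository. */
    interface RepositoryAdapter {
        /** Stores an opaque document under the given node. */
        void store(String nodePath, byte[] document);

        /** Returns the documents under nodePath selected by a repository-specific filter. */
        List<byte[]> query(String nodePath, String dataFilter);
    }

    /** Toy in-memory repository whose "filter language" is a substring match. */
    static final class InMemoryRepository implements RepositoryAdapter {
        private final Map<String, List<byte[]>> byNode = new HashMap<>();

        @Override
        public void store(String nodePath, byte[] document) {
            byNode.computeIfAbsent(nodePath, k -> new ArrayList<>()).add(document.clone());
        }

        @Override
        public List<byte[]> query(String nodePath, String dataFilter) {
            List<byte[]> out = new ArrayList<>();
            for (byte[] doc : byNode.getOrDefault(nodePath, new ArrayList<>())) {
                // The run-time never inspects the bytes; only the repository does.
                if (new String(doc, StandardCharsets.UTF_8).contains(dataFilter)) {
                    out.add(doc.clone());
                }
            }
            return out;
        }
    }
}
```

Because the filter is passed through as an uninterpreted string and documents as bytes, swapping in a repository with a different query language would not change the caller.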

As for the node filtering language, we adopted a simplified form of regular expression, similar to the one used by Unix shells, to reduce the effort needed to interpret and evaluate node filters. In this language, the wildcard "*" may appear only at the end of an expression, so a filter can select either a specific node or all the subnodes of a given one.
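A matcher for this kind of filter takes only a few lines. The sketch below assumes slash-separated node paths and treats a trailing "*" as matching every descendant of the prefix; both choices, as well as the names NodeFilter and matches, are assumptions made for illustration.

```java
// Hypothetical sketch of a node filter with an optional trailing wildcard.
final class NodeFilter {
    /**
     * Returns true if nodePath is selected by filter. A filter without "*"
     * selects exactly one node; a filter ending in "*" selects every path
     * that extends the part before the wildcard.
     */
    static boolean matches(String filter, String nodePath) {
        int star = filter.indexOf('*');
        if (star >= 0 && star != filter.length() - 1) {
            throw new IllegalArgumentException("'*' may appear only at the end: " + filter);
        }
        if (star < 0) {
            return filter.equals(nodePath);
        }
        String prefix = filter.substring(0, star);
        // The prefix itself is not a subnode, so require at least one more character.
        return nodePath.startsWith(prefix) && nodePath.length() > prefix.length();
    }
}
```

Restricting the wildcard to the end means evaluation is a single prefix comparison, with no backtracking.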

For events and the related filters we drew on our previous experience in implementing and using Jedi, a distributed publish/subscribe middleware. As in Jedi, PeerWare events are characterized by fields, each one having a name and a string value. In this schema, the event filtering language allows programmers to specify which fields must be present in the events they are interested in and, through a regular expression, what their values must be.
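Such a filter can be sketched with an event modeled as a map from field names to string values. The class EventFilter is hypothetical, and the use of java.util.regex with whole-value matching is an assumption: the regular expression dialect actually used by PeerWare is not specified here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of an event filter over named string fields.
final class EventFilter {
    private final Map<String, Pattern> required = new HashMap<>();

    /** Each entry maps a required field name to a regular expression for its value. */
    EventFilter(Map<String, String> fieldPatterns) {
        fieldPatterns.forEach((name, regex) -> required.put(name, Pattern.compile(regex)));
    }

    /** True if every required field is present and its whole value matches. */
    boolean matches(Map<String, String> event) {
        for (Map.Entry<String, Pattern> e : required.entrySet()) {
            String value = event.get(e.getKey());
            if (value == null || !e.getValue().matcher(value).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

Fields that the filter does not mention are ignored, so an event may carry more information than any single subscriber asks for.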

Jedi also inspired the design of the mechanisms currently used to route messages (i.e., events as well as requests for execute, subscribe, and executeAndSubscribe operations), which are based on a hierarchical architecture where peers are arranged in an unrooted tree. Given the characteristics of the domain we targeted, and the requirements we set, we decided that this architecture offers the best tradeoff: it allows the use of a fixed backbone of permanently active peers, avoids the potential for routing loops, and keeps the routing algorithms simple and efficient. To enhance flexibility, the tree of peers is allowed to change dynamically.

Information about the peers that host each node is dynamically maintained by the PeerWare run-time and is used to steer execute, subscribe, and executeAndSubscribe operations only towards the peers that actually host the data required to perform the operation.
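The steering information can be pictured as an index from nodes to the peers hosting them. The sketch below is a simplification that keeps the whole index in one place, whereas the run-time maintains it across a tree of peers; the names RoutingIndex, advertise, withdraw, and targets are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical sketch of the node-to-peers information used to steer requests.
final class RoutingIndex {
    private final Map<String, Set<String>> peersByNode = new HashMap<>();

    /** Records that a peer hosts the given node. */
    void advertise(String peer, String nodePath) {
        peersByNode.computeIfAbsent(nodePath, k -> new HashSet<>()).add(peer);
    }

    /** Forgets a peer, e.g., after it disconnects. */
    void withdraw(String peer) {
        peersByNode.values().forEach(peers -> peers.remove(peer));
        peersByNode.values().removeIf(Set::isEmpty);
    }

    /** Returns only the peers hosting at least one node accepted by the filter. */
    Set<String> targets(Predicate<String> nodeFilter) {
        Set<String> out = new TreeSet<>();
        peersByNode.forEach((node, peers) -> {
            if (nodeFilter.test(node)) out.addAll(peers);
        });
        return out;
    }
}
```

A request whose node filter selects nothing known to the index yields an empty target set and need not be forwarded at all.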

As in the case of the repository, security and access control are not handled directly by PeerWare but are delegated to an external security module, thus allowing different policies and protocols to be supported. PeerWare provides the hooks needed to establish secure communication channels and to perform authentication and access control on top of it.

For further details on these issues, we suggest reading the papers describing PeerWare or referring to the documentation of the PeerWare API.