Friday, February 18, 2005

Distributed K-ary System (DKS)

The latest version of DKS has recently changed; however, this report still contains some valid information.

Short Report on DKS
ASIM GHAFFAR
Dept. of Computer and Systems Sciences (DSV), KTH

FAISAL GHIAS MIR AND WAQAS AHMED SAEED
Department of Microelectronics and Information Technology, KTH

Abstract: This report describes the architecture of the DKS middleware, based on what we learned while implementing a toy chat application on top of DKS. It outlines the services that some of the main packages provide and explains the reasoning behind, and the concerns addressed by, the various layers.

General Terms: Architecture, Distributed System
Additional Key Words and Phrases: DKS, peer-to-peer, p2p

1. INTRODUCTION
DKS (Distributed K-ary System) is a peer-to-peer middleware developed at the Royal Institute of Technology (KTH) and the Swedish Institute of Computer Science (SICS). It is written entirely in Java. It supports scalable Internet-scale multicast, broadcast and name-based routing, and provides a simple Distributed Hash Table (DHT) abstraction. Strictly speaking, DKS is a structured overlay system that uses k-ary trees. One of its important features is location-independent routing based on virtual identifiers.

2. DKS DESCRIPTION
There are five main packages. The main one mirrors the system and is called org.kth.dks. The other sub-packages are also necessary for a functioning system. In the following section we briefly describe the different packages before commenting on the architecture. In the final section we conclude with the reasons behind the formation of the different layers.

2.1 Packages
The main package (org.kth.dks) has four interfaces and four classes that form the base of the overall DKS structure. At the heart of DKS is DKSInterface, an essential interface (one that must be implemented) that primarily deals with the routing of messages. All DKS nodes use it for joining and leaving the DKS system. It also declares the important function for looking up nodes, findResponsible(long). DKSImpl is the reference implementation of DKSInterface. When developing nodes, one has the option of either implementing DKSInterface directly or subclassing DKSImpl. For a small application one can use DKSImpl directly, as it provides the minimal functionality. This is exactly what we did for our chat application.
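
To make the role of findResponsible(long) more concrete, here is a minimal sketch. Only the method name findResponsible(long) and the notions of joining and leaving come from this report; the RingSketch class, its method signatures, and the rule that the responsible node is the first node at or after the identifier on the ring are assumptions made for illustration, not the actual DKS API.

```java
import java.util.TreeSet;

// Hypothetical stand-in for a DKS ring; not the real org.kth.dks classes.
class RingSketch {
    private final long n;                                // size of the identifier space (assumed)
    private final TreeSet<Long> nodes = new TreeSet<>(); // identifiers of the joined nodes

    RingSketch(long n) { this.n = n; }

    void join(long nodeId)  { nodes.add(Math.floorMod(nodeId, n)); }
    void leave(long nodeId) { nodes.remove(Math.floorMod(nodeId, n)); }

    // Assumed rule: the first node at or after id, wrapping around the ring.
    long findResponsible(long id) {
        if (nodes.isEmpty()) throw new IllegalStateException("ring is empty");
        Long successor = nodes.ceiling(Math.floorMod(id, n));
        return successor != null ? successor : nodes.first();
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch(1024);
        ring.join(100); ring.join(500); ring.join(900);
        System.out.println(ring.findResponsible(250)); // 500
        System.out.println(ring.findResponsible(950)); // 100 (wraps around)
        ring.leave(500);
        System.out.println(ring.findResponsible(250)); // 900
    }
}
```

The point of the sketch is that callers address identifiers, not machines: when a node leaves, the same identifier simply resolves to a different node.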

Another noteworthy interface is DKSDHTInterface, a subinterface of DKSInterface that additionally declares the DHT methods. DKS also provides a reference implementation of it, DKSDHTImpl.

Package dks_comm, as the name suggests, contains classes for communication and connection. It wraps the underlying communication infrastructure, abstracting both communication-related data structures (e.g. DKSNetAddress) and functionality (e.g. Listener) from the rest of the DKS system.

Package dks_node contains the classes used by a particular DKS node rather than by the DKS system as a whole. In the reference implementation there is a one-to-one relation between node and system. The important class in this package is DKSNode, which also implements DKSCallbackInterface. This class is somewhat specific to a DKS system with the configuration (N=1024, K=2, L=3, F=5). For a different configuration one has to implement a new class, as most of the configuration is declared final.

Package dks_marshal has classes for the various messages. All message classes implement a common interface, DKSMessage, which declares only three functions: marshall, unmarshall and getName. The package also has an additional class for marshalling and unmarshalling.
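
As an illustration of such a message class, here is a minimal sketch. The three method names are taken from the report, but their signatures, the DKSMessageSketch and ChatMessageSketch names, and the hand-written XML format are assumptions; the real DKSMessage interface may look different.

```java
// Hypothetical shape of the common message interface; only the three method
// names come from the report, the signatures are assumed.
interface DKSMessageSketch {
    String marshall();           // assumed: serialize the fields to an XML string
    void unmarshall(String xml); // assumed: restore the fields from that string
    String getName();            // name of the message type
}

// Hypothetical chat message carrying a single text field.
class ChatMessageSketch implements DKSMessageSketch {
    private String text = "";

    ChatMessageSketch() {}
    ChatMessageSketch(String text) { this.text = text; }

    public String getName() { return "chat"; }
    String getText() { return text; }

    public String marshall() {
        return "<" + getName() + ">" + escape(text) + "</" + getName() + ">";
    }

    public void unmarshall(String xml) {
        String open = "<" + getName() + ">";
        String close = "</" + getName() + ">";
        if (!xml.startsWith(open) || !xml.endsWith(close)) {
            throw new IllegalArgumentException("not a " + getName() + " message");
        }
        text = unescape(xml.substring(open.length(), xml.length() - close.length()));
    }

    // Escape the characters that would otherwise break the element content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Reverse of escape; "&amp;" is handled last so escaped entities survive.
    private static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String xml = new ChatMessageSketch("a < b & c").marshall();
        ChatMessageSketch received = new ChatMessageSketch();
        received.unmarshall(xml);
        System.out.println(xml + " -> " + received.getText());
    }
}
```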

2.2 Architecture
The following diagram shows the different layers. The middle four layers constitute DKS, and each has a corresponding package. However, the architecture is not strictly layered: the node and communication layers both perform marshalling, though at different levels, and the node also has direct access to the communication layer. The node marshals every message into XML before passing it to the communication layer. The communication layer then marshals this XML message into a wrapper class over a byte array before actually sending it using a DataOutputStream.
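
The second marshalling step can be sketched as follows. The use of a byte-array wrapper and of DataOutputStream comes from the description above; the WireFrameSketch class and its length-prefixed framing are assumptions, and in-memory streams stand in for the socket streams a real node would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper over a byte array; the length prefix is an assumed framing.
class WireFrameSketch {
    private final byte[] payload;

    WireFrameSketch(String xml) { this.payload = xml.getBytes(StandardCharsets.UTF_8); }
    private WireFrameSketch(byte[] payload) { this.payload = payload; }

    // Write the payload length followed by the payload bytes.
    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Read one frame: first the length, then exactly that many bytes.
    static WireFrameSketch readFrom(DataInputStream in) throws IOException {
        byte[] buffer = new byte[in.readInt()];
        in.readFully(buffer);
        return new WireFrameSketch(buffer);
    }

    String xml() { return new String(payload, StandardCharsets.UTF_8); }

    // Send an XML string through in-memory streams and read it back.
    static String roundTrip(String xml) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            new WireFrameSketch(xml).writeTo(new DataOutputStream(sink));
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(sink.toByteArray()));
            return readFrom(in).xml();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<chat>hello</chat>"));
    }
}
```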

From a programming perspective, the main thing is DKSInterface; everything else revolves around it. Currently, two implementations of this interface come with the code. Supporting classes are placed in various packages (according to their position in the layering), and one may decide whether or not to use them. For example, dks_comm contains supporting classes for a common communication structure that model TCP/IP communication. In other words, at this point in time DKS is an overlay system that runs over TCP/IP. For other types of network, however, one would need to implement the communication classes oneself. This explains why the communication classes are not part of the main package: the aim is to abstract DKS away from the underlying network.
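
The idea of keeping the overlay independent of the network can be pictured with the sketch below. All names in it (TransportSketch, LoopbackTransportSketch, OverlaySketch) are hypothetical and do not correspond to classes in dks_comm; the in-memory transport merely plays the role of "some network other than TCP/IP".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical transport abstraction; the real dks_comm classes are not shown in this report.
interface TransportSketch {
    void listen(String address, List<String> inbox); // deliver incoming messages to inbox
    void send(String address, String message);
}

// In-memory stand-in for a network other than TCP/IP.
class LoopbackTransportSketch implements TransportSketch {
    private final Map<String, List<String>> inboxes = new HashMap<>();

    public void listen(String address, List<String> inbox) {
        inboxes.put(address, inbox);
    }

    public void send(String address, String message) {
        List<String> inbox = inboxes.get(address);
        if (inbox == null) throw new IllegalArgumentException("no listener at " + address);
        inbox.add(message);
    }
}

// Overlay-side code depends only on the interface, never on a concrete network.
class OverlaySketch {
    private final TransportSketch transport;

    OverlaySketch(TransportSketch transport) { this.transport = transport; }

    void route(String address, String xml) { transport.send(address, xml); }

    public static void main(String[] args) {
        List<String> inbox = new ArrayList<>();
        TransportSketch network = new LoopbackTransportSketch();
        network.listen("node-b", inbox);
        new OverlaySketch(network).route("node-b", "<chat>hello</chat>");
        System.out.println(inbox); // [<chat>hello</chat>]
    }
}
```

Swapping the network then means supplying another TransportSketch implementation, while OverlaySketch stays unchanged.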

The current code has no implementation of multicasting; however, the separation of node (DKSNode) from system (DKSImpl) implicitly allows for it. One way to obtain multicast is to have each DKSImpl create an additional DKSNode per multicast group. As all nodes are connected in a global group, one node can create a new group by creating a new DKSNode and then informing all other members of the global group about this multicast group. All those interested in joining the group simply create an additional DKSNode (same IP, different port) and join that new node. This assumes, however, that DKSImpl holds an array of DKSNodes rather than a single DKSNode, because it is then DKSImpl that enables inter-group communication. Another way of achieving this is to create additional DKSImpl objects, but in that case it would be the application that handles the group communication. This would be a dirty solution, as multicast should not be the application's concern unless it has some application-specific requirement.
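
The first approach (one node per group inside a single system object) can be sketched as follows. This is not DKS code: GroupNodeSketch, GroupDirectorySketch and MultiGroupSystemSketch are hypothetical names, the shared directory replaces the announcement over the global group, and message delivery is simulated in memory instead of being routed through a ring.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node: one instance per group that a system has joined.
class GroupNodeSketch {
    final String group;
    final int port; // same IP, different port per group, as suggested in the text
    final List<String> delivered = new ArrayList<>();

    GroupNodeSketch(String group, int port) { this.group = group; this.port = port; }
}

// Hypothetical stand-in for the rings: knows which nodes belong to which group.
class GroupDirectorySketch {
    private final Map<String, List<GroupNodeSketch>> members = new HashMap<>();

    void register(GroupNodeSketch node) {
        members.computeIfAbsent(node.group, g -> new ArrayList<>()).add(node);
    }

    void multicast(String group, String message) {
        List<GroupNodeSketch> nodes = members.getOrDefault(group, Collections.emptyList());
        for (GroupNodeSketch node : nodes) node.delivered.add(message);
    }
}

// Hypothetical system object holding several nodes instead of exactly one.
class MultiGroupSystemSketch {
    private final GroupDirectorySketch directory;
    private final int basePort;
    private final Map<String, GroupNodeSketch> nodes = new HashMap<>();

    MultiGroupSystemSketch(GroupDirectorySketch directory, int basePort) {
        this.directory = directory;
        this.basePort = basePort;
        join("global"); // every system is a member of the global group
    }

    // Create an additional node for the group unless one already exists.
    GroupNodeSketch join(String group) {
        GroupNodeSketch node = nodes.get(group);
        if (node == null) {
            node = new GroupNodeSketch(group, basePort + nodes.size());
            nodes.put(group, node);
            directory.register(node);
        }
        return node;
    }

    List<String> received(String group) {
        GroupNodeSketch node = nodes.get(group);
        return node == null ? Collections.<String>emptyList() : node.delivered;
    }

    public static void main(String[] args) {
        GroupDirectorySketch directory = new GroupDirectorySketch();
        MultiGroupSystemSketch a = new MultiGroupSystemSketch(directory, 5000);
        MultiGroupSystemSketch b = new MultiGroupSystemSketch(directory, 6000);
        a.join("room");
        directory.multicast("room", "hi");
        System.out.println(a.received("room") + " " + b.received("room"));
    }
}
```

Because the system object owns all of its nodes, the application only names a group; it never has to manage a second system instance itself.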

3. CONCLUSION: WHY VARIOUS LAYERS?
This point was made implicitly in Section 2.2. Here we state the reasons explicitly:

1. The communication layer is separate so that DKS can support various network technologies. The current version has an implementation for TCP/IP.

2. The marshalling layer is separate for two reasons. The first is to separate the main DKS system skeleton from the protocol, which will make it possible in future to work on the protocol syntax independently of the other components. The second is to separate the binding of the protocol to a message technology (i.e. XML) from the rest of the system: at the moment the protocol is defined in XML, which the DKS system itself may not strictly need.

3. The node is separated from the main package because it represents the system's instantiation in one ring only. The system is ring-independent, as it might create more than one ring (e.g. for multicasting). How the node deals with communication and marshalling is also immaterial to the overall system, which is concerned only with a few basic functions such as routing, sending and joining. How the bookkeeping for these is done does not matter to the main system, so all such detail has been abstracted within the dks_node package.