Publish/subscribe systems have emerged in recent years as a promising paradigm for offering various popular notification services such as news monitoring, ecommerce site monitoring and alerting services for digital libraries. Since XML is widely used as the standard format for data exchange on the Web, a lot of research has focused on designing efficient and scalable XML filtering systems. In these systems, subscribers submit continuous queries expressed in XPath/XQuery to the system asking to be notified whenever the query is satisfied by incoming XML data. However, offering XML filtering functionality on Internet-scale and avoid the typical problems of centralized solutions, we need to deploy such a service in a distributed environment. In this thesis, we design, develop and evaluate an XML filtering system called FoXtrot. Our proposal combines the strengths of automata for fast XML filtering and distributed hash tables for building a fully distributed scalable system. Our approach differs architecturally from the majority of recent proposals that deal with distributed XML filtering since they assume an XML broker architecture, whereas our solution is built using a structured overlay network. In FoXtrot, we distribute an nondeterministic finite automaton on top of Pastry DHT motivated by their ability to be in several states at the same time, resulting in many different parallel executions. Structural matching is performed using the automaton while we study different methods for also distributing the task of predicate evaluation. As a result, FoXtrot scales both for a large number of queries and a large number of predicates per query. Finally, we perform an extensive experimental evaluation of our system and demonstrate that it can index millions of user queries exhibiting a high indexing and filtering throughput. At the same time FoXtrot achieves very good load balancing properties, improves its performance as we increase the size of the network and exhibits a sufficient degree of fault-tolerance. Our evaluation was done in a controlled environment of a local cluster and on the worldwide testbed provided by the PlanetLab network which represents the real-world conditions of the Internet.
|Institution||National and Kapodistrian University of Athens|