Summary Hash History (SHH)

The unprecedented growth of the world’s first non-profit, open-source encyclopedia has put considerable stress on the Wikimedia Foundation, which constantly seeks donations to cover its rising infrastructure and hosting costs while maintaining adequate quality of service. Many are concerned that this publicly owned content depends on the financial fate of a single organization. We propose using optimistic replication to preserve the encyclopedia’s content at multiple sites managed by different organizations. Replicating the Wikipedia database requires not only an efficient update exchange protocol but also a mechanism to identify the origin of update pollution, or “anonymous slander” as Wikipedia users frequently call it. To meet these challenges, we introduce the Summary Hash History (SHH) approach, in which each site maintains a tamper-evident update history that both mitigates security threats and readily determines the exact set of updates to transfer during peer-to-peer reconciliation between sites. Our first implementation, Basic-SHH, confirmed our intuition that SHH can serve as both a tamper-evident history and an efficient update exchange mechanism. However, our evaluations revealed that Basic-SHH cannot guarantee convergence among replicas when updates are concurrent. We therefore developed a variant, Associative-SHH, that overcomes this limitation: it provides eventual convergence and also enables concurrent updates to converge across partitioned networks.
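To make the idea of one history serving both as a tamper-evident record and as the basis for update exchange more concrete, the following Python sketch builds a generic hash history in which each entry’s summary hash covers its update and the summary hashes of its parents (a Merkle-DAG construction). It is an illustration under stated assumptions, not Basic-SHH or Associative-SHH: the names `HashHistory`, `Entry`, `summary_hash`, `missing_for` and `relation` are hypothetical, SHA-256 is an assumed hash choice, and `missing_for` naively takes the peer’s full set of known hashes rather than reproducing SHH’s more efficient exchange. It shows three properties: `verify` detects an altered update, a site can compute exactly which entries a peer lacks, and comparing two sites’ heads distinguishes a dominating history from concurrent ones.

```python
# Hypothetical sketch of a generic hash history (not the Basic-SHH or
# Associative-SHH implementation); SHA-256 is an assumed hash choice.
import hashlib
from dataclasses import dataclass


def summary_hash(update: bytes, parents: tuple[str, ...]) -> str:
    # Each summary commits to the update and, via parents, to all history beneath it.
    h = hashlib.sha256()
    h.update(len(update).to_bytes(8, "big") + update)  # length prefix avoids ambiguity
    for p in parents:
        h.update(bytes.fromhex(p))  # parent digests are fixed-length (32 bytes)
    return h.hexdigest()


@dataclass(frozen=True)
class Entry:
    update: bytes
    parents: tuple[str, ...]
    summary: str


class HashHistory:
    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}

    def heads(self) -> set[str]:
        # Heads are entries that no other entry names as a parent.
        referenced = {p for e in self.entries.values() for p in e.parents}
        return set(self.entries) - referenced

    def append(self, update: bytes) -> str:
        # Parents are sorted so their order never changes the hash; with
        # several heads, the new entry also acts as a merge.
        parents = tuple(sorted(self.heads()))
        entry = Entry(update, parents, summary_hash(update, parents))
        self.entries[entry.summary] = entry
        return entry.summary

    def ancestors(self, roots: set[str]) -> set[str]:
        seen, stack = set(), list(roots)
        while stack:
            s = stack.pop()
            if s not in seen:
                seen.add(s)
                stack.extend(self.entries[s].parents)
        return seen

    def verify(self) -> bool:
        # Tamper evidence: editing a stored update or parent link changes the
        # recomputed hash, which then no longer matches the entry's key.
        return all(summary_hash(e.update, e.parents) == s
                   for s, e in self.entries.items())

    def missing_for(self, peer_known: set[str]) -> list[Entry]:
        # Exact set of entries the peer lacks, ordered parents before children.
        order: list[Entry] = []
        done: set[str] = set()

        def visit(s: str) -> None:
            if s in done or s in peer_known:
                return
            done.add(s)
            for p in self.entries[s].parents:
                visit(p)
            order.append(self.entries[s])

        for head in sorted(self.heads()):
            visit(head)
        return order

    def receive(self, batch: list[Entry]) -> None:
        # Accept only entries whose hash checks out and whose parents are known.
        for e in batch:
            if summary_hash(e.update, e.parents) != e.summary:
                raise ValueError("tampered entry")
            if any(p not in self.entries for p in e.parents):
                raise ValueError("missing parent")
            self.entries[e.summary] = e


def relation(a: HashHistory, b: HashHistory) -> str:
    # a dominates b when every head of b already appears in a's history.
    ha, hb = a.heads(), b.heads()
    if ha == hb:
        return "equal"
    if hb <= a.ancestors(ha):
        return "a dominates"
    if ha <= b.ancestors(hb):
        return "b dominates"
    return "concurrent"


site_a, site_b = HashHistory(), HashHistory()
site_a.append(b"create page")
site_b.receive(site_a.missing_for(set(site_b.entries)))
site_a.append(b"edit at A")
site_b.append(b"edit at B")
print(relation(site_a, site_b))  # concurrent
```

Because every summary hash commits to the history beneath it, a receiving site can check each entry before accepting it. When concurrent entries arrive, the receiver ends up with more than one head, which is the point at which a conflict becomes visible; in this sketch the next local `append` lists all heads as parents and so records a merge.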

SHH Applications

S-Sync

Our first implementation of SHH is a file and directory synchronization application, S-Sync, in which the shared unit can be either a file or a directory. S-Sync provides interfaces for Conflict Detection and State Reconstruction.