RepliWiki Experiment

To keep our PlanetLab experiment realistic, we used real-world data extracted from the English version of the Wikipedia database, covering the 24-hour period of April 1st, 2005. This article assumes that the reader has downloaded the complete English-language article dump with all page edits (the following should also work with other languages, provided they use the same database schema). Information on downloading and installing the database dumps can be found through Wikipedia's website located below:

Alternatively, you can download our trace dumps, which include the overall database structure for RepliWiki and the trace table containing a full day's worth of updates for April 1st, 2005.


Because RepliWiki is a prototype application, we have no real need for the majority of the tables included in the traditional Wikipedia database. Furthermore, the traditional Wikipedia database contains a number of key dependencies which can cause severe headaches for multi-master applications. For this reason, we chose to create our own database schema for RepliWiki, as shown below:

Figure 1: Represents the users table. Note that each RepliNode keeps track of its own usernames, so usernames are relative to their domain.

Figure 2: Represents the updates table. This table keeps track of updates received from other sites in order to avoid importing the same update more than once.

Figure 3: Represents the articles and trace tables. The trace table contains all article updates for April 1st, 2005 and is used solely by our RepliTraceLoader utility. The articles table, on the other hand, is used to import and export articles to and from SHH-Sync, and is the source of the information displayed in our RepliWikiGUI.


To derive the update information from Wikipedia's database for import into our modified database schema, we queried the current and historical articles for the title, author, domain, timestamp, and text of every article update that occurred on April 1st, 2005. We then used the popular MySQL tool SQLyog to export the result set to an XML file. This XML file was then imported into our modified RepliWiki database schema using our RepliTraceLoader utility.

Because we wanted to simulate article updates between the various PlanetLab nodes, we randomly assigned each of the 2167 authors to one of the four possible domain names. We used our RepliShuffle program to perform this task. The output of its shuffling is shown below, along with a distribution table and chart:
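The random assignment described above can be illustrated with a short sketch. This is not the RepliShuffle source; it only shows one plausible way to do it, assuming each distinct author is mapped to one domain chosen uniformly at random. The domain names, the `shuffle_authors` and `distribution` helpers, and the seed are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical placeholder names; the experiment's real domain names are not listed here.
DOMAINS = ["domain-a", "domain-b", "domain-c", "domain-d"]


def shuffle_authors(authors, domains, seed=None):
    """Map each distinct author to one domain chosen uniformly at random."""
    rng = random.Random(seed)  # a fixed seed makes the assignment repeatable
    return {author: rng.choice(domains) for author in sorted(set(authors))}


def distribution(assignment):
    """Count how many authors were assigned to each domain."""
    return Counter(assignment.values())


# Stand-in author names; the real trace has 2167 distinct authors.
authors = [f"author{i}" for i in range(2167)]
assignment = shuffle_authors(authors, DOMAINS, seed=42)
counts = distribution(assignment)
```

Assigning by author rather than by update keeps all of an author's edits on a single domain, which matches the idea that usernames are relative to their domain.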

Figure 4

Trace Data Table

Figure 5

Trace Data Chart

Figure 6


As Figure 4 shows, the trace table contains the complete shuffled article list for April 1st, 2005. Because our goal was to simulate user interaction and observe SHH's performance in a deployable environment, we created the RepliTrace program, which runs simultaneously on each of the four PlanetLab nodes and inserts each article into the articles table at the time at which the update occurred on that specific domain. With the default parameter of '-t 0', articles are inserted as they actually occurred over the 24-hour period; the experiment can be sped up by passing values greater than 0 to the replitrace.jar file. An instance of the RepliTrace program running locally at its default speed is shown below:
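The replay idea can be sketched as follows. This is an illustration only, not the RepliTrace implementation: the `speedup` parameter is a hypothetical stand-in (how the '-t' value maps onto replay speed is not specified here), and `replay_offsets`, `replay`, and the `insert` callback are names invented for this example.

```python
import time
from datetime import datetime


def replay_offsets(timestamps, speedup=1.0):
    """Return, for each update, the seconds after replay start at which to insert it.

    speedup=1.0 replays in real time; speedup=60.0 compresses an hour into a minute.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    ordered = sorted(timestamps)
    if not ordered:
        return []
    start = ordered[0]
    return [(ts - start).total_seconds() / speedup for ts in ordered]


def replay(entries, insert, speedup=1.0, sleep=time.sleep):
    """Call insert(title) for each (timestamp, title) pair at its scaled offset."""
    ordered = sorted(entries, key=lambda entry: entry[0])
    offsets = replay_offsets([ts for ts, _ in ordered], speedup)
    elapsed = 0.0
    for offset, (_, title) in zip(offsets, ordered):
        sleep(offset - elapsed)  # wait only for the gap since the previous insert
        elapsed = offset
        insert(title)


# Three updates spread over one hour, replayed 60 times faster than real time.
stamps = [
    datetime(2005, 4, 1, 0, 0, 0),
    datetime(2005, 4, 1, 0, 10, 0),
    datetime(2005, 4, 1, 1, 0, 0),
]
offsets = replay_offsets(stamps, speedup=60.0)  # [0.0, 10.0, 60.0]
```

Passing `sleep` as a parameter lets the schedule be checked without actually waiting, for example by recording the requested delays instead of sleeping.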

Figure 7

While the RepliTrace program merely simulates user activity locally on the domain where it runs, it is the RepliDriver program that imports and exports these updates to and from SHH-Sync's dissemination and reconciliation folder. As mentioned in the FAQ, SHH-Sync uses OR to distribute the latest article updates among the other PlanetLab nodes. The RepliDriver program runs silently as a cron job at a user-specified interval (we used 5 minutes) in order to:

  1. Export entries occurring since the last recorded export.
  2. Publish these entries for dissemination using SHH-Sync.
  3. Import entries from other domains not previously imported.

Each of these updates is in the form of an XML document. Thus, if the RepliDriver has exported at 12:00 am, 12:15 am, and 12:30 am, then when it runs again at 12:45 am it will export ONLY the entries that have occurred since 12:30 am. We avoid export redundancy by storing a meta file in the RepliDriver program directory, and we avoid import redundancy by storing in the database a hash of each update file already imported.
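The two redundancy checks above can be sketched roughly as follows. This is not the RepliDriver code: the JSON meta file layout, the use of SHA-256, the in-memory `seen_hashes` set (standing in for the hashes kept in the database), and all function names are assumptions made for illustration. Timestamps are ISO 8601 strings so that plain string comparison orders them correctly.

```python
import hashlib
import json
from pathlib import Path


def load_last_export(meta_path):
    """Return the timestamp of the last export, or None on the first run."""
    path = Path(meta_path)
    if not path.exists():
        return None
    return json.loads(path.read_text())["last_export"]


def export_new_entries(entries, meta_path, now):
    """Select entries newer than the stored watermark, then advance the watermark to now."""
    last = load_last_export(meta_path)
    fresh = [
        entry for entry in entries
        if (last is None or entry["timestamp"] > last) and entry["timestamp"] <= now
    ]
    Path(meta_path).write_text(json.dumps({"last_export": now}))
    return fresh


def import_update(xml_bytes, seen_hashes):
    """Return True and record the hash if this update file is new, else return False."""
    digest = hashlib.sha256(xml_bytes).hexdigest()
    if digest in seen_hashes:
        return False  # already imported; skip it
    seen_hashes.add(digest)
    return True
```

With a watermark of 12:30 am stored in the meta file, a run at 12:45 am selects only the entries timestamped after 12:30 am and up to 12:45 am, and an update file delivered twice is imported once.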

Figure 8


Although we were readily able to observe SHH's performance using the SHH-Sync, RepliTrace, and RepliDriver programs alone, we also developed a bare-bones implementation of MediaWiki called RepliWikiGUI in order to visually access each of the PlanetLab nodes we maintain and watch article updates being disseminated in real time.

Figure 9: Users log in to their domain.

Figure 10: Notice that article updates from other domains show up under Recent Changes.

Figure 11: Users can view articles; however, RepliWiki does not interpret MediaWiki's template language. We do, however, support HTML and plain text, as shown.

Figure 12: Users can edit articles too.