To give a sample of the benefits of using XML: as part of my PhD work (report.ps.gz), I wrote a program in 1998 to cluster AML (Astronomical Markup Language) documents. It used both the meaningful links between the documents and the keywords associated with them, applied a noising partitioning technique, and displayed the result on a topic map. The documents could be retrieved automatically from various sources, starting from an initial document and following the AML links to fetch the related documents. It was a success but, like a lot of cool PhD software, it disappeared from the web because it could no longer be maintained.
In 2004, I needed a program to cluster other documents and couldn't find any free software for this simple task. I decided to resurrect the project, and I found a way to specify the list of documents, keywords and links in an external XML document. This way, it now works for any collection of documents, even non-XML ones.
Here is a sample document list, with keywords and links. The DTD is included in the package.
<DOCLIST>
  <DOCUMENT id="108">
    <URL>section2_1_2_7_APPRENDRE.html</URL>
    <TITLE>Vitesse orbitale</TITLE>
    <KEYWORDS>
      <KEYWORD>KEPLER</KEYWORD>
      <KEYWORD>MASSE</KEYWORD>
      <KEYWORD>MOUVEMENT</KEYWORD>
      <KEYWORD>TRAJECTOIRE</KEYWORD>
      <KEYWORD>VITESSE</KEYWORD>
    </KEYWORDS>
    <LINKS>
      <LINK toid="110"/>
    </LINKS>
  </DOCUMENT>
</DOCLIST>
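To illustrate how such a document list can be consumed, here is a minimal Java sketch that reads it with the standard DOM parser. The class name DocListReader, the nested Doc type and the parse method are hypothetical names chosen for this example; they are not taken from the package, and the actual program may read the file differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the Clustering package.
public class DocListReader {

    /** One DOCUMENT entry: id, URL, title, keywords and ids of linked documents. */
    public static final class Doc {
        public final String id;
        public final String url;
        public final String title;
        public final List<String> keywords = new ArrayList<>();
        public final List<String> links = new ArrayList<>();

        Doc(String id, String url, String title) {
            this.id = id;
            this.url = url;
            this.title = title;
        }
    }

    /** Parses a DOCLIST document given as a string; wraps parser errors in an unchecked exception. */
    public static List<Doc> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Doc> docs = new ArrayList<>();
            NodeList nodes = dom.getElementsByTagName("DOCUMENT");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                Doc d = new Doc(e.getAttribute("id"), text(e, "URL"), text(e, "TITLE"));
                NodeList kws = e.getElementsByTagName("KEYWORD");
                for (int k = 0; k < kws.getLength(); k++) {
                    d.keywords.add(kws.item(k).getTextContent().trim());
                }
                NodeList links = e.getElementsByTagName("LINK");
                for (int k = 0; k < links.getLength(); k++) {
                    d.links.add(((Element) links.item(k)).getAttribute("toid"));
                }
                docs.add(d);
            }
            return docs;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse document list", ex);
        }
    }

    /** Returns the trimmed text of the first child element with the given tag, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<DOCLIST><DOCUMENT id=\"108\">"
                + "<URL>section2_1_2_7_APPRENDRE.html</URL>"
                + "<TITLE>Vitesse orbitale</TITLE>"
                + "<KEYWORDS><KEYWORD>KEPLER</KEYWORD><KEYWORD>VITESSE</KEYWORD></KEYWORDS>"
                + "<LINKS><LINK toid=\"110\"/></LINKS>"
                + "</DOCUMENT></DOCLIST>";
        for (Doc d : parse(xml)) {
            System.out.println(d.id + " " + d.title + " " + d.keywords + " -> " + d.links);
        }
    }
}
```

Each document ends up with its keywords and the ids of the documents it links to, which is the information the clustering needs.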
When the document list is ready, the clustering program can be launched (just double-click on Clustering.jar).
The clustering algorithm first spreads the documents randomly on the grid, then moves them to reduce the "cost" progressively. After a while, it stops and the result is saved in a grid.xml file.
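The general idea of random placement followed by cost-reducing moves can be sketched as below. This is only an illustration under stated assumptions, not the program's actual algorithm: it uses a plain descent (a move is kept only if the cost does not increase) rather than the noising technique, and the cost function (pairwise similarity multiplied by Manhattan distance on the grid) is an assumption made for the example. The similarity matrix is assumed to have been computed beforehand, for instance from shared keywords and links. All class and method names are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch, not the code shipped in Clustering.jar.
public class GridClusteringSketch {

    /** Assumed cost: for every pair of documents, similarity times Manhattan distance. */
    public static double cost(int[][] pos, double[][] sim) {
        double total = 0;
        for (int i = 0; i < pos.length; i++) {
            for (int j = i + 1; j < pos.length; j++) {
                int dist = Math.abs(pos[i][0] - pos[j][0]) + Math.abs(pos[i][1] - pos[j][1]);
                total += sim[i][j] * dist;
            }
        }
        return total;
    }

    /** Places n documents in distinct random cells by shuffling the cell indices. */
    public static int[][] randomLayout(int n, int width, int height, Random rnd) {
        int cells = width * height;
        if (n > cells) throw new IllegalArgumentException("grid too small");
        int[] order = new int[cells];
        for (int i = 0; i < cells; i++) order[i] = i;
        for (int i = cells - 1; i > 0; i--) {          // Fisher-Yates shuffle
            int j = rnd.nextInt(i + 1);
            int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
        }
        int[][] pos = new int[n][2];
        for (int i = 0; i < n; i++) {
            pos[i][0] = order[i] % width;
            pos[i][1] = order[i] / width;
        }
        return pos;
    }

    /** Moves documents in place, keeping only moves that do not increase the cost; returns the final cost. */
    public static double improve(int[][] pos, double[][] sim,
                                 int width, int height, int iterations, Random rnd) {
        if (pos.length == 0) return 0;
        double current = cost(pos, sim);
        for (int it = 0; it < iterations; it++) {
            int d = rnd.nextInt(pos.length);
            int x = rnd.nextInt(width), y = rnd.nextInt(height);
            int oldX = pos[d][0], oldY = pos[d][1];
            int occupant = -1;                          // document already in the target cell, if any
            for (int k = 0; k < pos.length; k++) {
                if (pos[k][0] == x && pos[k][1] == y) occupant = k;
            }
            if (occupant == d) continue;                // already there, nothing to do
            pos[d][0] = x; pos[d][1] = y;
            if (occupant >= 0) { pos[occupant][0] = oldX; pos[occupant][1] = oldY; }  // swap
            double candidate = cost(pos, sim);
            if (candidate <= current) {
                current = candidate;                    // keep the move
            } else {                                    // undo the move
                pos[d][0] = oldX; pos[d][1] = oldY;
                if (occupant >= 0) { pos[occupant][0] = x; pos[occupant][1] = y; }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Toy similarities: documents 0 and 1 are related, as are 2 and 3.
        double[][] sim = {{0, 3, 0, 0}, {3, 0, 0, 0}, {0, 0, 0, 2}, {0, 0, 2, 0}};
        Random rnd = new Random(42);
        int[][] pos = randomLayout(4, 5, 5, rnd);
        System.out.println("cost before: " + cost(pos, sim));
        System.out.println("cost after:  " + improve(pos, sim, 5, 5, 2000, rnd));
    }
}
```

A plain descent like this can get stuck in a local minimum; methods such as noising perturb the search so that it can escape them.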
This grid XML file can then be displayed with the DispGrid applet, with an HTML file containing this code:
<applet code="dispgrid.DispGrid" archive="DispGrid.jar" width="100" height="100">
  <param name="url" value="http://server/grid.xml">
</applet>
Some web browsers prevent applets from opening a new window: Internet Explorer on Windows XP SP2 (it worked before SP2), the Google Toolbar's popup blocker, and Firefox 1.5 (it worked before version 1.5). In these cases, the applet cannot display a selected web page.
Author: Damien Guillaume