MERLINO
Introduction
Topic maps as a meta-navigation layer may help to improve the structuring of information on the Internet [1]. To achieve this objective, topic elements are connected with information resources that are considered relevant to a given topic subject, the so-called occurrences [2]. One unsolved question is how relevant information resources can be identified in the huge amount of information available on the Internet. This question is addressed only insufficiently in the topic map literature [2, 3], yet it is becoming increasingly important: the more information resources are connected to a topic map, the more useful the map becomes for a user. The common method of selecting occurrences manually is too time-consuming for information sources on the Internet. Search engines, by contrast, are an efficient and proven tool for information retrieval on the Internet [4].
In this project we developed a prototype for the semi-automated generation of occurrences using search engines. The prototype, Merlino, is a web application developed in Perl. It is capable of generating occurrences for any XML topic map [2]. The prototype identifies relevant information resources by automatically querying multiple search engines based on the knowledge stored in the topic map [1].
The prototype combines the retrieval power of search engines with the ability of topic maps to express semantic relationships [1]. Its objective is to accelerate and simplify the generation of occurrences.
A new version of MERLINO (coded in Java) will be coming soon.
Publications
ESWC 2005, Heraklion (Greece) - 2nd European Semantic Web Conference
Bernd Markscheffel, Hendrik Thomas, Dirk Stelzer:
Merlino - a prototype for semi automated generation of occurrences in Topic Maps using Internet search engines
In: Demo and Poster Proceedings of ESWC 2005
[PDF] Download the expose (2 pp. / 17 kb)
View this Paper in the bibmap Topic Map
bibmap.xtm
Report on the Open Space Sessions (pp. 271 - 280)
Alexander Sigel
In: Charting the Topic Maps Research and Applications Landscape
Lutz Maicher and Jack Park
First International Workshop on Topic Map Research and Applications, TMRA 2005, Leipzig, Germany, October 6-7, 2005,
Revised Selected Papers, Springer 2006.
Workflow for the generation of occurrences
Analysis of the topic map
In the first step, Merlino identifies information describing the subject and the context of the topic. This enables the prototype to identify relevant information resources for the topic. The prototype uses XPath queries to extract the stored information from the base and variant names of the topic, from existing occurrences and from the associations in which the topic participates [2].
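The analysis step can be illustrated with a minimal Python sketch. It is not Merlino's Perl code: the sample topic map, the function names and the returned dictionary layout are assumptions made for illustration, and the sample is a simplified XTM 1.0 fragment that is not validated against the schema. The sketch uses the limited XPath subset of Python's standard library to collect base names, variant names, occurrences and the names of associated topics.

```python
import xml.etree.ElementTree as ET

# XTM 1.0 and XLink namespaces
NS = {"xtm": "http://www.topicmaps.org/xtm/1.0/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical, simplified sample topic map (illustration only)
SAMPLE_XTM = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="puccini">
    <baseName>
      <baseNameString>Giacomo Puccini</baseNameString>
      <variant><variantName><resourceData>Puccini</resourceData></variantName></variant>
    </baseName>
    <occurrence><resourceRef xlink:href="http://example.org/puccini"/></occurrence>
  </topic>
  <topic id="tosca">
    <baseName><baseNameString>Tosca</baseNameString></baseName>
  </topic>
  <association>
    <member><topicRef xlink:href="#puccini"/></member>
    <member><topicRef xlink:href="#tosca"/></member>
  </association>
</topicMap>"""


def base_names_of(topic):
    """Return the base name strings of a topic element."""
    path = "xtm:baseName/xtm:baseNameString"
    return [e.text.strip() for e in topic.findall(path, NS) if e.text]


def describe_topic(root, topic_id):
    """Collect the information that describes a topic and its context."""
    topic = root.find(f"xtm:topic[@id='{topic_id}']", NS)
    if topic is None:
        raise KeyError(topic_id)

    variants = [e.text.strip()
                for e in topic.findall(".//xtm:variantName/xtm:resourceData", NS)
                if e.text]
    occurrences = [e.get(XLINK_HREF)
                   for e in topic.findall("xtm:occurrence/xtm:resourceRef", NS)]

    # Names of topics that share an association with the given topic
    related = []
    for assoc in root.findall("xtm:association", NS):
        refs = [r.get(XLINK_HREF, "").lstrip("#")
                for r in assoc.findall("xtm:member/xtm:topicRef", NS)]
        if topic_id not in refs:
            continue
        for other_id in refs:
            if other_id == topic_id:
                continue
            other = root.find(f"xtm:topic[@id='{other_id}']", NS)
            if other is not None:
                related.extend(base_names_of(other))

    return {
        "base_names": base_names_of(topic),
        "variants": variants,
        "occurrences": occurrences,
        "related": related,
    }


ROOT = ET.fromstring(SAMPLE_XTM)
INFO = describe_topic(ROOT, "puccini")
```

Here INFO holds the names and links that later steps could turn into search queries.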
Generation of search queries
The objective of the second step is to generate a set of search queries that specify the topic as accurately as possible. MERLINO automatically converts the identified information into a set of search queries. It applies a system of processing rules that describe how the identified information can be transformed into the query syntax of the search engines.
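The processing rules themselves are not listed here, so the following Python sketch uses two hypothetical rules purely for illustration: multi-word names are quoted as phrases, and each base name is additionally combined with the name of every associated topic to narrow the context. The function names and the generic query syntax are assumptions, not Merlino's actual rule set.

```python
def quote(term):
    """Quote multi-word terms so that they are searched as a phrase."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def build_queries(base_names, variants, related):
    """Turn topic information into an ordered, duplicate-free query list."""
    queries = []

    # Assumed rule 1: every base or variant name is a query on its own
    for name in base_names + variants:
        queries.append(quote(name))

    # Assumed rule 2: combine each base name with each associated topic name
    for name in base_names:
        for context in related:
            queries.append(f"{quote(name)} {quote(context)}")

    # Remove duplicates while keeping the original order
    return list(dict.fromkeys(queries))


QUERIES = build_queries(["Giacomo Puccini"], ["Puccini"], ["Tosca"])
```

A real rule system would also have to account for the differing operators of each search engine, which this sketch leaves out.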
Querying of search engines
Merlino automatically submits the set of search queries to preselected search engines. At the current stage of development, Google, Lycos and AltaVista can be queried. The search results of every query are extracted and stored in XML files. The identified information resources are considered potentially relevant to the topic and are therefore designated as occurrence candidates.
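How extracted results might be stored as XML can be sketched as follows. The element and attribute names (resultSet, hit, rank, url, title) are hypothetical, since the file format used by Merlino is not described here, and the hits are passed in as plain data rather than fetched from a search engine.

```python
import xml.etree.ElementTree as ET


def results_to_xml(engine, query, hits):
    """Serialize the hits of one query as an XML string.

    hits is a list of (url, title) pairs in the order returned by the engine.
    """
    root = ET.Element("resultSet", {"engine": engine, "query": query})
    for rank, (url, title) in enumerate(hits, start=1):
        hit = ET.SubElement(root, "hit", {"rank": str(rank)})
        ET.SubElement(hit, "url").text = url
        ET.SubElement(hit, "title").text = title
    return ET.tostring(root, encoding="unicode")


# Hypothetical example data (not real search results)
XML_TEXT = results_to_xml(
    "example-engine",
    '"Giacomo Puccini"',
    [("http://example.org/puccini", "Puccini biography"),
     ("http://example.org/tosca", "Tosca synopsis")],
)
```

Keeping the rank of each hit in the stored file preserves the information that a later scoring step may need.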
Pre-ranking of the occurrence candidates
The objective of the next step is to pre-rank the information resources among the occurrence candidates. The user may choose among several scoring methods, for example:
Merlino can use scoring information extracted from the collected search result sets [6] for internal ranking.
The external web impact factor can also be used to rank the occurrence candidates [5, 7]. To calculate the factor, the prototype submits suitable search queries to AltaVista.
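Both scoring ideas can be sketched in Python. The internal score below is an assumed stand-in (a sum of reciprocal ranks across result sets), because the exact scoring used by Merlino is not specified here. The external web impact factor follows the usual definition as the number of external pages linking to a site divided by the number of pages in that site; the counts are passed in as plain numbers instead of being obtained from a search engine.

```python
from collections import defaultdict


def reciprocal_rank_scores(result_sets):
    """Assumed internal score: sum of 1/rank over all result sets."""
    scores = defaultdict(float)
    for urls in result_sets:
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
    return dict(scores)


def external_wif(external_inlink_pages, site_pages):
    """External web impact factor: external inlinking pages per site page."""
    if site_pages <= 0:
        return 0.0
    return external_inlink_pages / site_pages


def rank_candidates(scores):
    """Order candidates by descending score, breaking ties alphabetically."""
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result sets from two queries
SCORES = reciprocal_rank_scores([["a", "b"], ["b", "c"]])
RANKING = rank_candidates(SCORES)
```

A candidate that appears near the top of several result sets rises in this ranking, which is one plausible way to reduce the number of candidates a user has to inspect first.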
Manual evaluation
The prototype does not analyse the semantic content of the information resources. The automated identification of occurrence candidates performed by Merlino must therefore be complemented by a manual evaluation conducted by the user. Only users can reliably evaluate the relevance of the information resources to the topic [8].
Summary
The main advantage of the prototype is its ability to process a large quantity of information resources automatically in order to find useful occurrence candidates. The scoring methods reduce the effort the user has to spend on evaluation. The prototype also facilitates the generation of topic maps and supports the development of topic map applications based on a top-down approach.
References
-
-
[3] Park J., Hunting S. XML Topic Maps: Creating and using topic maps for the web. Pearson Education Inc., United States of America, 2003.
[4] Oppenheim C., Morris A., McKnight C. The evaluation of WWW search engines. In Journal of Documentation, vol. 56, no. 2, 2000, 199-211.
[5] Thelwall M. Web impact factors and search engine coverage. In Journal of Documentation, vol. 56, no. 2, 2000, 185-189.
[6] Eastman C. M. 30,000 Hits May Be Better Than 300: Precision Anomalies in Internet Searches. In Journal of the American Society for Information Science and Technology, vol. 53, no. 11, 2002, 879-882.
[7] Ingwersen P. The calculation of the web impact factors. In Journal of Documentation, vol. 54, no. 2, 1998, 236-243.
[8] Fugmann R. Subject analysis and indexing: theoretical foundation and practical advice. Indeks-Verlag, Frankfurt/Main, 1993.