(Semantic) Similarity-Blog

Why are ballpoint pens and pencils similar?

Archive for the 'SIM-DL' Category

SIM-DL_A: A Novel Semantic Similarity Measure for Description Logics Reducing Inter-Concept to Inter-Instance Similarity

Abstract. While semantic similarity plays a crucial role in human categorization and reasoning, computational similarity measures have also been applied to fields such as semantics-based information retrieval and ontology engineering. Several measures have been developed to compare concepts specified in various description logics. In most cases, these measures are either structural or require a populated ontology. Structural measures fail with increasing expressivity of the used description logic, while several ontologies, e.g., geographic feature type ontologies, are not populated at all. In this paper, we present an approach to reduce inter-concept to inter-instance similarity and thereby avoid the canonization problem of structural measures. The novel approach, called SIM-DL_A, reuses existing similarity functions such as co-occurrence or network measures from our previous SIM-DL measure. The instances required for comparison are derived from the completion tree of a slightly modified DL-tableau algorithm as used for satisfiability checking. Instead of trying to find one (clash-free) model, the new algorithm generates a set of proxy individuals used for comparison. The paper presents the algorithm, alignment matrix, and similarity functions as well as a detailed example.
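A minimal sketch of the reduction idea described in the abstract, assuming string-typed proxy individuals and a greedy row-wise alignment; the class, method, and variable names are illustrative and the greedy aggregation is an assumption, not necessarily the alignment used in the paper.

import java.util.List;
import java.util.function.BiFunction;

// Sketch: reduce inter-concept to inter-instance similarity by comparing
// the sets of proxy individuals generated for the two concepts.
public class ProxyAlignmentSketch {

    // Asymmetric similarity of concept C (source proxies) to concept D (target proxies).
    static <I> double conceptSimilarity(List<I> proxiesOfC, List<I> proxiesOfD,
                                        BiFunction<I, I, Double> instanceSim) {
        if (proxiesOfC.isEmpty() || proxiesOfD.isEmpty()) {
            return 0.0; // nothing to align
        }
        // Alignment matrix: similarity of every source proxy to every target proxy.
        double[][] matrix = new double[proxiesOfC.size()][proxiesOfD.size()];
        for (int i = 0; i < proxiesOfC.size(); i++) {
            for (int j = 0; j < proxiesOfD.size(); j++) {
                matrix[i][j] = instanceSim.apply(proxiesOfC.get(i), proxiesOfD.get(j));
            }
        }
        // Greedy row-wise alignment: each source proxy keeps its best match.
        double sum = 0.0;
        for (double[] row : matrix) {
            double best = 0.0;
            for (double v : row) {
                best = Math.max(best, v);
            }
            sum += best;
        }
        return sum / proxiesOfC.size();
    }

    public static void main(String[] args) {
        // Toy example with string "individuals" and a trivial instance similarity.
        List<String> canal = List.of("proxy-canal-1", "proxy-canal-2");
        List<String> river = List.of("proxy-canal-1", "proxy-lake-1");
        System.out.println(conceptSimilarity(canal, river, (a, b) -> a.equals(b) ? 1.0 : 0.2));
    }
}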

Janowicz, K. and Wilkes, M. (2009; forthcoming): SIM-DL_A: A Novel Semantic Similarity Measure for Description Logics Reducing Inter-Concept to Inter-Instance Similarity. The 6th Annual European Semantic Web Conference (ESWC2009). Lecture Notes in Computer Science 5554, Springer, pp. 353-367.

SIM-DL Server and Protege Plug-in released (beta2.3)

The new beta versions of the SIM-DL semantic similarity server and the Protege plug-in (for 3.3.x) are available at sourceforge. Please note that this release contains a lot of experimental new features and is unstable (so using beta 2.2 may be a better idea in some cases). The new features (e.g., similarity estimations) are described in the FOIS 2008 paper linked below.

Janowicz, K., Maue, P., Wilkes, M., Braun, M., Schade, S., Dupke, S., and Kuhn, W. (2008; forthcoming): Similarity as a Quality Indicator in Ontology Engineering. 5th International Conference on Formal Ontology in Information Systems (FOIS 2008). Saarbruecken, Germany, October 31 - November 3, 2008.

Human participants test on the visualization of result sets

We are currently running a small web-based human participants test on the visualization of result sets in information retrieval processes. We would greatly appreciate your help with this test, which should not take more than ten minutes to complete.

» start test

SIM-DL Server and Protege Plug-in released (beta2.2)

We are happy to announce the release of the next beta version (2.2) of our SimCat similarity server and Protege plug-in! The new release contains several bug fixes and a context logger. The next release will focus on performance issues.

SIM-DL Server and Protege Plug-in released (beta2.1)

We are happy to announce the release of the next beta version (2.1) of our SimCat similarity server and Protege plug-in! The new release contains some major improvements such as an extended context model with several kinds of contexts, caching, lazy unfolding, and more intuitive user guidance for the plug-in.

Kinds of Contexts and their Impact on Semantic Similarity Measurement

Here is the final draft of my paper about the impact of context on semantic similarity, accepted for CoMoRea 2008:

Abstract: Semantic similarity measurement has gained attention over the last years as a non-standard inference service for various kinds of knowledge representations, including description logics. Most existing similarity measures compute an undirected overall similarity, i.e., they do not take the context of the similarity query into account. If they do, the notion of context is usually reduced to the selection of particular concepts for comparison (instead of comparing all concepts within an examined ontology). The importance of context in deriving meaningful similarity judgments is beyond question and has been examined in recent research. This paper argues that there are several kinds of contexts. Each of them has its own impact on the resulting similarity values, but also on their interpretation. To support this view, the paper introduces definitions for the examined contexts and illustrates their influence by example.

Janowicz, K. (2008): Kinds of Contexts and their Impact on Semantic Similarity Measurement. 5th IEEE Workshop on Context Modeling and Reasoning (CoMoRea) at the 6th IEEE International Conference on Pervasive Computing and Communication (PerCom’08); Hong Kong, 17 – 21 March 2008.

Similarity and Context - Papers added

The following papers have been added to the similarity and context section:

[132] V. Kashyap and A. Sheth (1996) Schematic and Semantic Similarities between Database Objects: A Context-based Approach. VLDB Journal 5 (4), pp. 276-304 [PDF] (external link)

[133] Keßler, C.; Raubal, M.; Janowicz, K. (2007; forthcoming) The Effect of Context on Semantic Similarity Measurement. 3rd International IFIP Workshop On Semantic Web & Web Semantics (SWWS ‘07). Lecture Notes in Computer Science, Springer. Vilamoura, Algarve, Portugal.

Algorithm, Implementation and Application of the SIM-DL Similarity Server

Our paper submitted to the Second International Conference on GeoSpatial Semantics (GeoS 2007) has been accepted for publication. I am especially happy about this because I published my first similarity paper at the first GeoS conference two years ago. Besides the SIM-DL similarity server and Protege plug-in, we will also present our application area, i.e., an improved gazetteer web interface using subsumption and similarity reasoning.

Abstract
Semantic similarity measurement has gained attention as a methodology for ontology-based information retrieval within GIScience over the last years. Several theories explain how to determine the similarity between entities, concepts or spatial scenes, while concrete implementations and applications are still missing. In addition, most existing similarity theories use their own representation language, while the majority of geo-ontologies are annotated using the Web Ontology Language (OWL). This paper presents a context and blocking aware semantic similarity theory for the description logic ALCHQ as well as its prototypical implementation within the open source SIM-DL similarity server. An application scenario is introduced showing how the Alexandria Digital Library Gazetteer can benefit from similarity in terms of improved search and annotation capabilities. Directions for further work are discussed.

Reference
[120] Janowicz, K., Keßler, C., Schwarz, M., Wilkes, M., Panov, I., Espeter, M. and Bäumer, B. (2007; forthcoming) Algorithm, Implementation and Application of the SIM-DL Similarity Server. Second International Conference on GeoSpatial Semantics (GeoS 2007). Lecture Notes in Computer Science, Springer. Mexico City, Mexico.

see also:
[73] Janowicz, K. (2006). Sim-DL: Towards a Semantic Similarity Measurement Theory for the Description Logic ALCNR in Geographic Information Retrieval. R. Meersman, Z. Tari, P. Herrero et al. (Eds.): SeBGIS 2006, OTM Workshops 2006, LNCS 4278, pp. 1681 – 1692, 2006.

A new version of the SIM-DL similarity server has been released

The first beta version of the SIM-DL similarity server and Protege plug-in is available at sourceforge for download!

SIM-DL server alpha2 released

Today the second alpha version of the SIM-DL similarity server was released. It can be downloaded from the sourceforge repository: SIM-DL. A new version of the Protege SIM-DL plug-in is also available.

Similarity Server Progress

Just a short report on our progress in developing a DIG-compliant semantic similarity measurement server for the description logic ALCNQ: The server consists of three parts, the ALCNQ reasoner that checks ABox satisfiability (and therefore also subsumption), a DIG server based on the Jetty web server, and finally the similarity module. Up to now we have first running alpha versions of the reasoner and server parts and are now starting to work on the similarity module (using an extended version of the SIM-DL theory [73]). The software is implemented in Java, and we plan to release a first public alpha version on sourceforge in May.
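To make the division of labour concrete, here is a rough sketch of how the three parts could be wired together; the interface and method names are illustrative assumptions and not the actual SIM-DL API (in the real server the DIG endpoint is backed by Jetty).

// Sketch of the three-part architecture: reasoner, DIG endpoint, similarity module.
interface AlcnqReasoner {
    // ABox satisfiability; subsumption C ⊑ D reduces to unsatisfiability of C ⊓ ¬D.
    boolean isSatisfiable(String expression);
}

interface SimilarityModule {
    // Concept-to-concept similarity following the (extended) SIM-DL theory [73].
    double similarity(String conceptA, String conceptB);
}

class DigEndpointSketch {
    private final AlcnqReasoner reasoner;
    private final SimilarityModule similarityModule;

    DigEndpointSketch(AlcnqReasoner reasoner, SimilarityModule similarityModule) {
        this.reasoner = reasoner;
        this.similarityModule = similarityModule;
    }

    // Handle a (heavily simplified) request: either a satisfiability ask or a similarity query.
    String handle(String verb, String arg1, String arg2) {
        switch (verb) {
            case "satisfiable": return Boolean.toString(reasoner.isSatisfiable(arg1));
            case "similarity":  return Double.toString(similarityModule.similarity(arg1, arg2));
            default:            return "unsupported request: " + verb;
        }
    }
}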

[73] Janowicz, K. (2006). Sim-DL: Towards a Semantic Similarity Measurement Theory for the Description Logic ALCNR in Geographic Information Retrieval. R. Meersman, Z. Tari, P. Herrero et al. (Eds.): SeBGIS 2006, OTM Workshops 2006, LNCS 4278, pp. 1681 – 1692, 2006. [PDF] (external link)

Human Subject Test about Role-Filler Similarity

Over the last weeks I have developed a small, web-based human subject test to compare three models of role-filler similarity against human judgments. After an introduction and motivation section, people are asked to rate the similarity between spatial relations, objects, and finally combinations of both. The results are compared to those of the computational theories. It turns out that both the multiplicative approach and the weighted average with automatically determined flexible weightings are potential candidates, whereas the simple (unweighted) average does not perform very well (as expected). Moreover, there is evidence that the multiplicative approach tends to underestimate, while the weighted average tends to overestimate. It took quite a while to really understand what the test should look like, which kind of rating system (sliders) to choose, and how to randomize the questions; however, I am still not satisfied, especially because the randomization sometimes leads to pairs that are really hard to compare or are all dissimilar. I will report on all design decisions later on.
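As a rough illustration of the kind of check behind the under-/overestimation observation (all numbers below are made up, not the actual study data), one can compare model predictions with human ratings and look at the mean signed deviation:

// Sketch: positive mean signed deviation = model tends to overestimate,
// negative = model tends to underestimate relative to human judgments.
public class JudgmentComparisonSketch {

    static double meanSignedDeviation(double[] modelPredictions, double[] humanRatings) {
        double sum = 0.0;
        for (int i = 0; i < modelPredictions.length; i++) {
            sum += modelPredictions[i] - humanRatings[i];
        }
        return sum / modelPredictions.length;
    }

    public static void main(String[] args) {
        // Hypothetical mean participant ratings and model predictions for three item pairs.
        double[] human          = {0.70, 0.45, 0.30};
        double[] multiplicative = {0.48, 0.36, 0.14};  // sits below the ratings
        double[] weightedAvg    = {0.78, 0.62, 0.44};  // sits above the ratings

        System.out.println("multiplicative bias:   " + meanSignedDeviation(multiplicative, human));
        System.out.println("weighted average bias: " + meanSignedDeviation(weightedAvg, human));
    }
}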

At the moment the test is available only in German, but an English version will be online within the next weeks. I will also give full access to the underlying database, so that everyone interested in human similarity judgments can download and use the results. So far, more than 40 people have participated in the test. Note, however, that it is still a pre-test, and I will run a face-to-face test with selected participants and slightly modified test settings in December. Human subject testing is a difficult task, and there is a lot of 'noise' to be removed (or taken into account) before getting useful results. If you have ideas about what can be improved, please comment on this posting.

Role & Filler Test: [German Version]

Sim-DL Slides from SeBGIS 2006

Here are the slides from my talk about measuring semantic similarity between concept representations phrased in description logics (such as ALCNR), given at the SeBGIS 2006 workshop in Montpellier, France. You can find all the math behind the framework in the paper. Comments are welcome! [PPT]

SimCat-Project

I am very happy to announce that we, the Ifgi Cognitive Engineering Group (ICEL) and the Muenster Semantic Interoperability Lab (MUSIL), have finally started our SimCat project. The kickoff meeting was on October 19th, and the project (funded by the German Research Foundation) will run until November 2008. Of course I will report on the ongoing work; for now, here is a list of topics that we will deal with or for which we already have first results from previous work. Moreover, we have started to develop a similarity server based on the SIM-DL theory [73] to measure similarity between concepts phrased in various description logics.

List of Topics:

Semantic Similarity (and)

   • Time:
        Concepts evolve over time, and therefore so does their similarity.

   • Context:
        As Goodman puts it, there is no meaning of similarity without defining its respects.

   • Goals / Affordances:
        Besides context, the goals and abilities of the user influence similarity.

   • Structured Representation:
        Concepts are not bags of features, but have a structure that influences similarity.

   • Representation Extraction:
        How to extract dimensions for geometric similarity approaches out of databases?

   • As Compromise:
        The role of similarity in decision support systems involving several users.

   • Generalization:
        Levels of abstraction and their influence on similarity.

   • Description Logics:
        How to measure similarity between DL concepts (see [73]).

   • Activation/Artificial Neural Networks:
        Can we use neural networks as activation & alignment structures for similarity?

List of Project Participants:

  • Boris Bäumer
  • Martin Espeter
  • Krzysztof Janowicz
  • Carsten Keßler
  • Ilija Panov
  • Martin Raubal
  • Mirco Schwarz
  • Marc Wilkes

[73] Janowicz, K. (2006). Sim-DL: Towards a Semantic Similarity Measurement Theory for the Description Logic ALCNR in Geographic Information Retrieval. R. Meersman, Z. Tari, P. Herrero et al. (Eds.): SeBGIS 2006, OTM Workshops 2006, LNCS 4278, pp. 1681 – 1692, 2006. [PDF] (external link)

Role & Filler-Similarity for Description Logics

At least in my opinion, there are two ways to handle similarity between role-filler pairs: The first (and maybe most straightforward) one is to define similarity as the product of the similarities derived by comparing roles and fillers (see equation 1). The second approach is a weighted sum of role and filler similarities (see equation 2).
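The equations themselves were embedded as images in the original post; based on the description, they presumably read roughly as follows, with sim_r denoting the inter-role and sim_c the inter-filler similarity (the normalization ω_r + ω_c = 1 is an assumption, consistent with the weights used further below):

Equation 1 (product):      sim_e(∃R.C, ∃S.D) = sim_r(R, S) · sim_c(C, D)
Equation 2 (weighted sum): sim_e(∃R.C, ∃S.D) = ω_r · sim_r(R, S) + ω_c · sim_c(C, D), with ω_r + ω_c = 1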

As an example, both equations measure the overlap between existential quantifications (sim_e), where sim_r is the inter-role and sim_c the inter-filler (range-concept) similarity. Equation 1 returns 0 if the compared roles or fillers are dissimilar (sim = 0), which is an advantage with respect to computation time and, more importantly, avoids misleading results as discussed below. Nevertheless, treating role and filler similarity as equally important seems oversimplified. Moreover, except for values in {0, 1}, the resulting similarity sim_e is by definition (of multiplication) smaller than sim_r and sim_c, which probably contradicts the human way of perceiving similarity! The second approach, however, raises the question of how to semi-automatically derive the weightings that determine the relative importance of inter-role and inter-filler similarity for sim_e. In addition, high similarity ratings (for sim_e) already occur if one of the measured similarities is high, while the other may even be 0.
Imagine a transportation device ontology where R specifies an inside relation and S a disjoint relation. If both fillers C and D stand for waterways, equation 1 yields 0, while equation 2 results in ω_c · 1. Now one may argue that the weighting for inter-role similarity should be higher, but then you just need to switch the example (by defining dissimilar fillers) to run into the same difficulty again.

To overcome this shortcoming, I have recently added the notion of thresholds from neural networks to the additive similarity approach, defining a minimum similarity value that sim_r and sim_c need to exceed; otherwise sim_e is 0. The question of how to derive the weightings and the threshold is still open, but maybe it is possible to integrate the notion of commonality and variability used in MDSM [37] for this purpose. So far, the theory presented in [73] uses the product similarity approach; its idea of context-awareness is comparable to MDSM, and therefore a combination seems promising. As a start, I have used a threshold t = 0.3, ω_r = 0.6 and ω_c = 0.4 for some first experiments within a simplified accommodation ontology.
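A minimal sketch of the three variants discussed above; only the threshold t = 0.3 and the weights ω_r = 0.6, ω_c = 0.4 are taken from the post, while the class and method names and the toy values are illustrative.

// Sketch of product, weighted-sum, and thresholded weighted-sum role-filler similarity.
public class RoleFillerSimilaritySketch {

    // First experimental setting mentioned above.
    static final double THRESHOLD = 0.3;
    static final double W_ROLE = 0.6;
    static final double W_FILLER = 0.4;

    // Equation 1: product of inter-role and inter-filler similarity.
    static double productSim(double simRole, double simFiller) {
        return simRole * simFiller;
    }

    // Equation 2: weighted sum of inter-role and inter-filler similarity.
    static double weightedSim(double simRole, double simFiller) {
        return W_ROLE * simRole + W_FILLER * simFiller;
    }

    // Threshold variant: both similarities must reach t, otherwise sim_e is 0.
    static double thresholdedWeightedSim(double simRole, double simFiller) {
        if (simRole < THRESHOLD || simFiller < THRESHOLD) {
            return 0.0;
        }
        return weightedSim(simRole, simFiller);
    }

    public static void main(String[] args) {
        // Toy case from the post: disjoint roles (simRole = 0), identical waterway fillers (simFiller = 1).
        double simRole = 0.0;
        double simFiller = 1.0;
        System.out.println("product:              " + productSim(simRole, simFiller));             // 0.0
        System.out.println("weighted sum:         " + weightedSim(simRole, simFiller));            // 0.4
        System.out.println("thresholded weighted: " + thresholdedWeightedSim(simRole, simFiller)); // 0.0
    }
}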

[37] Rodríguez, A. M. and M.J. Egenhofer, Comparing Geospatial Entity Classes: An Asymmetric and Context-Dependent Similarity Measure. International Journal of Geographical Information Science, 2004. 18(3): p. 229-256

[73] Janowicz, K. (2006). Sim-DL: Towards a Semantic Similarity Measurement Theory for the Description Logic ALCNR in Geographic Information Retrieval. R. Meersman, Z. Tari, P. Herrero et al. (Eds.): SeBGIS 2006, OTM Workshops 2006, LNCS 4278, pp. 1681 – 1692, 2006.