(Semantic) Similarity-Blog

Why ballpoint pens and pencils are similar?

Similarity and Context at IFIP SWWS 2007

This year, the Third International IFIP Workshop on Semantic Web & Web Semantics (IFIP SWWS 2007), held in conjunction with the OnTheMove Federated Conferences (OTM ’07), will have a special track on context and similarity!

Among many others, the topics of interest are:

Context-driven methods for ontology exploitation

  • Semantic similarity among ontology instances
  • Semantic granularity
  • Semantic ranking

Context-driven methods in specialized domains, e.g.:

  • Context-driven methods for geographical information resources
  • Context-driven methods for Multidimensional Media
  • Context-driven methods for Digital Library
  • Context-awareness for the Semantic Web
  • Ontology views

Human-centered aspects of sifting information resources w.r.t. context

  • Context representation
  • Context elicitation
  • Context visualization

Please download the CfP here: [PDF]

Similarity and Case-Based Reasoning

I have added a new section on similarity and case-based reasoning to the literature page, linking to some interesting work by Armin Stahl and Thomas Gabel.

Special Issue on Similarity-based Pattern Recognition

There is a new special issue on ‘Similarity-based Pattern Recognition’ in Pattern Recognition, Volume 39, Issue 10, Pages 1813-1948 (October 2006), edited by M. Bicego, V. Murino, M. Pelillo and A. Torsello. You can find the list of articles at ScienceDirect.

Similarity and Clustering

I have added a new section (and three papers) to the literature page called ‘Similarity and Clustering’. Please let me know of other interesting papers in this area.

[107] Petko Valtchev, Rokia Missaoui, Similarity-based Clustering versus Galois lattice building: Strengths and Weaknesses. Presented at the Workshop "Objects and Classification, A natural convergence". European Conference on Object-Oriented Programming 2000, Nice (FR), (12-16 June) 2000 [PDF] (external link)

[108] Miin-Shen Yang, Kuo-Lung Wu (2004) A Similarity-Based Robust Clustering Method. IEEE Trans. Pattern Anal. Mach. Intell. 26(4): 434-448

[109] Manuele Bicego, Vittorio Murino, M. A. T. Figueiredo (2004) Similarity-based classification of sequences using hidden Markov models. Pattern Recognition 37(12): 2281-2291. [PDF] (external link)

SimRank: A Measure of Structural-Context Similarity

A colleague pointed me to a paper from KDD 2002 called SimRank: A Measure of Structural-Context Similarity. It describes an inter-instance similarity measure that captures the similarity of the structural (graph-based) context in which instances occur, based on their relationships with other instances. In other words, instances are assumed to be similar if they are related to similar instances, and so on…
I also added the paper to the literature list.
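
To make the recursive idea a bit more concrete, here is a minimal Python sketch of the basic iterative computation: two different nodes are similar to the degree that the nodes pointing to them are similar. The decay factor of 0.8, the number of iterations, the function name and the toy graph are assumptions chosen for illustration only; please see the paper for the exact definitions and the optimizations discussed there.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank over all node pairs.

    in_neighbors maps each node to the list of nodes that point to it.
    decay and iterations are illustrative defaults, not prescribed values.
    """
    nodes = list(in_neighbors)
    # Start: every node is fully similar to itself and to nothing else.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            in_a, in_b = in_neighbors[a], in_neighbors[b]
            if not in_a or not in_b:
                # Nodes without incoming links have no structural context.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbours, damped by decay.
            total = sum(sim[(i, j)] for i in in_a for j in in_b)
            new_sim[(a, b)] = decay * total / (len(in_a) * len(in_b))
        sim = new_sim
    return sim


# Hypothetical toy graph: a university links to two professors,
# and each professor links to one student.
graph = {
    "univ": [],
    "prof_a": ["univ"],
    "prof_b": ["univ"],
    "student_a": ["prof_a"],
    "student_b": ["prof_b"],
}
scores = simrank(graph)
print(scores[("prof_a", "prof_b")])
print(scores[("student_a", "student_b")])
```

In this toy graph the two professors come out as similar because they share an in-neighbour, and the two students inherit a smaller similarity from their professors (roughly 0.8 and 0.64 with the assumed decay factor).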

[105] G. Jeh and J. Widom (2002) SimRank: A Measure of Structural-Context Similarity. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, July 2002; p. 538-543. [PDF] (external link)

SemMF - A Semantic Matching Framework

SemMF is a flexible framework for calculating semantic similarity between objects that are represented as arbitrary RDF graphs. The framework allows taxonomic and non-taxonomic concept matching techniques to be applied to selected object properties. Moreover, new concept matchers are easily integrated into SemMF by implementing a simple interface, thus making it applicable in a wide range of different use case scenarios. (taken from: http://sites.wiwiss.fu-berlin.de/suhl/radek/semmf/)

A poster from the International Semantic Web Conference (2005) about SemMF is available at sites.wiwiss.fu-berlin.de/suhl/radek/pub/SemMF_ISWC2005.pdf

Similarity and Web 2.0

I have added a new section to the literature list for papers on the role of similarity in Web 2.0. The papers deal with using inter-user similarity to improve trust ratings in social networks.

Similarity & GIS - Talks from the SeBGIS 2006 Workshop

Good news for GI scientists interested in semantic similarity and its applications: out of 15 papers accepted for presentation at the SeBGIS workshop co-located with the OTM 2006 Federated Conferences, 3 (in fact 4) were about semantic similarity. The talks are available at http://cs.ulb.ac.be/conferences/sebgis06/ . I will also put the references into the literature section.

Human Subject Test about Role-Filler Similarity

Over the last few weeks I have developed a small, web-based human subject test to compare three models of role-filler similarity against human judgments. After an introduction and motivation section, participants are asked to rate the similarity between spatial relations, objects and finally combinations of both. The results are compared to those of the computational theories. It turns out that both the multiplicative approach and the weighted average with automatically determined flexible weightings are potential candidates, whereas the simple (unweighted) average does not perform very well (as expected). Moreover, there is evidence that the multiplicative approach tends to underestimate similarity, while the weighted average generally overestimates it. It took quite a while to work out what the test should look like, which kind of rating system (sliders) to choose and how to randomize the questions. I am still not satisfied, though, especially because the randomization sometimes produces pairs that are really hard to compare or are all dissimilar. I will report on all design decisions later on.

At the moment the test is available only in German, but an English version will be online within the next few weeks. I will also give full access to the underlying database, so that everyone interested in human similarity judgments can download and use the results. So far more than 40 people have participated in the test. Note, however, that it is still a pre-test; I will run a face-to-face test with selected participants and slightly modified test settings in December. Human subject testing is a difficult task, and there is a lot of ‘noise’ to be removed (or taken into account) before getting useful results. If you have ideas about what can be improved, please comment on this posting.

Role & Filler Test: [German Version]

Sim-DL Slides from SeBGIS 2006

Here are the slides from my talk about measuring semantic similarity between concept representations phrased in description logics (such as ALCNR), given at the SeBGIS 2006 workshop in Montpellier, France. You can find all the math behind the framework in the paper. Comments are welcome! [PPT]

SimCat-Project

I am very happy to announce that we, the Ifgi Cognitive Engineering Group (ICEL) and the Muenster Semantic Interoperability Lab (MUSIL), have finally started our SimCat Project. The kickoff meeting took place on 19 October, and the project (funded by the German Research Foundation) will run until November 2008. Of course I will report on the ongoing work; for now, here is just a list of topics that we will deal with, or for which we already have first results from our previous work. Moreover, we have started to develop a similarity server based on the SIM-DL [73] theory to measure similarity between concepts phrased in various description logics.

List of Topics:

Semantic Similarity (and)

   • Time:
        Concepts evolve over time, and therefore so does their similarity.

   • Context:
        As Goodman puts it, similarity has no meaning unless its respects are defined.

   • Goals / Affordances:
        Besides context, the goals and abilities of the user influence similarity.

   • Structured Representation:
        Concepts are not bags of features, but have a structure that influences similarity.

   • Representation Extraction:
        How to extract dimensions for geometric similarity approaches out of databases?

   • As Compromise:
        The role of similarity in decision support systems involving several users.

   • Generalization:
        Levels of abstraction and their influence on similarity.

   • Description Logics:
        How to measure similarity between DL-concepts (see [73]).

   • Activation/Artificial Neural Networks:
        Can we use neural networks as activation & alignment structures for similarity?

List of Project Participants:

  • Boris Bäumer
  • Martin Espeter
  • Krzysztof Janowicz
  • Carsten Keßler
  • Ilija Panov
  • Martin Raubal
  • Mirco Schwarz
  • Marc Wilkes


[73] Janowicz, K. (2006). Sim-DL: Towards a Semantic Similarity Measurement Theory for the Description Logic ALCNR in Geographic Information Retrieval. R. Meersman, Z. Tari, P. Herrero et al. (Eds.): SeBGIS 2006, OTM Workshops 2006, LNCS 4278, pp. 1681 – 1692, 2006. [PDF] (external link)

Similarity of Semantic Relations

There is a new paper by Peter D. Turney called ‘Similarity of Semantic Relations’. I will also put it into the literature list; as a start, here is the abstract and a link to a PDF version:

There are at least two kinds of similarity. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, and information retrieval. Recently the Vector Space Model (VSM) of information retrieval has been adapted to measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data, and (3) automatically generated synonyms are used to explore variations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying semantic relations, LRA achieves similar gains over the VSM.

Turney, P.D. (2006), Similarity of semantic relations, Computational Linguistics, 32 (3), 379-416. [PDF] (external link)
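
As a rough illustration of the VSM idea mentioned in the abstract (each word pair is represented by a vector of pattern frequencies, and two pairs are compared via their vectors), here is a small Python sketch. The patterns and all counts below are invented toy values, not data from the paper, and the use of the cosine as the comparison measure is an assumption on my side; the sketch also leaves out everything that makes LRA special (automatically derived patterns, SVD smoothing, synonym variations).

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equally long frequency vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical joining patterns; X and Y stand for the two words of a pair.
patterns = ["X cuts Y", "X works with Y", "X chases Y"]

# Invented pattern frequencies per word pair (toy values for illustration only).
pair_vectors = {
    ("mason", "stone"): [4, 9, 0],
    ("carpenter", "wood"): [5, 8, 0],
    ("cat", "mouse"): [0, 1, 7],
}

analogous = cosine(pair_vectors[("mason", "stone")],
                   pair_vectors[("carpenter", "wood")])
unrelated = cosine(pair_vectors[("mason", "stone")],
                   pair_vectors[("cat", "mouse")])
print(analogous, unrelated)
```

With these toy counts, mason:stone ends up much closer to carpenter:wood than to cat:mouse, which is the kind of ranking one would use to pick the best answer in a multiple-choice analogy question.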

Pictures from GIScience 2006

Just some pictures from GIScience 2006 as a Flickr slideshow: [slideshow]

Similarity-Based Identity Assumptions

Here are the slides from my talk about steps towards a similarity-based identity assumption service given at GIScience 2006. You need PowerPoint >2000 for the transparency effects, sorry for that! [PPT]

See also [PDF] for a short summary of the underlying idea of using thematic information as an additional kind of (spatial) reference. Maybe this is also interesting in terms of gazetteer research?

Copyleft instead of Copyright

Please note that all papers and other information sources discussed here (also my own) are the intellectual property of their authors and may be affected by copyright restrictions of the respective publishers. Please contact the authors or publishers directly if you have any questions and keep their copyrights in mind.

Note, however, that all content provided as articles within the similarity-blog is of course licensed under a Creative Commons license. Please click on the CC logo to read more about the chosen license.

Interesting Course: Categories and Concepts

The Categories and Concepts course given in 2003 at the psychology department of the University of Texas at Austin by Bradley Love and Jody Hendrix contains a list (with download links) of selected core readings for people interested in concepts and categories, and also in the role of similarity.

Categories and Concepts-course at: http://love.psy.utexas.edu/~love/concepts/

SimPack - Toolkit

The Department of Informatics at the University of Zurich has developed a similarity measurement toolkit called SimPack. So far, it supports the following measurement approaches:

  • feature vectors
  • strings or sequences of strings
  • trees and graphs
  • information theory

The project is developed in Java and available (together with the Javadoc API) at: http://www.ifi.unizh.ch/ddis/simpack.html

Why Ballpoint Pens and Pencils are Similar?

Just in case you are wondering about the title of this blog: it is taken from an urban legend claiming that NASA spent millions of dollars on developing a space pen whose ink flows despite the lack of gravity, while the Russians simply used a pencil to solve the same problem (see http://en.wikipedia.org/wiki/Pencil; ‘pencils in space’ section). The German business magazine Handelsblatt took up the story for a television ad clip stating ‘it’s substance that matters’. The funny thing about the clip is that Handelsblatt claims that substance decides, but was not aware that the space pen story is a myth.
    Nevertheless, I took the WordNet definitions of ballpoint pen and pencil to compare their similarity using the MDSM approach [37], and it turned out that they were not very similar at all. This was the inspiration for my MDSM+TR [39] paper, where I tried to integrate Sowa’s Thematic Roles into MDSM to stress the importance of function for similarity assessments. Ballpoint pens and pencils are made out of different parts and materials, but both share the role of being writing implements.

Download the ad clip (German language only) at: http://www.bbdo.de/de/home/news/20030/spot_space_pen.html

Sowa’s Thematic Roles: http://www.jfsowa.com/ontology/thematic.htm
 

[37] Rodríguez, A. M. and M.J. Egenhofer, Comparing Geospatial Entity Classes: An Asymmetric and Context-Dependent Similarity Measure. International Journal of Geographical Information Science, 2004. 18(3): p. 229-256.

[39] Janowicz, K. (2005) Extending Semantic Similarity Measurement by Thematic Roles, in First International Conference on GeoSpatial Semantics, GeoS 2005, Mexico City, Mexico. Springer Verlag: Berlin. p. 137-152. [PDF] (external link)

Role & Filler-Similarity for Description Logics

At least in my opinion, there are two ways to handle similarity between role-filler pairs. The first (and maybe most straightforward) one is to define similarity as the product of the similarities derived by comparing roles and fillers (equation 1, i.e. sime = simr * simc). The second approach is a weighted sum of role and filler similarities (equation 2, i.e. sime = ωr * simr + ωc * simc).

As an example, both equations measure the overlap between existential quantifications (sime), where simr is the inter-role similarity and simc the inter-filler (range-concept) similarity. Equation 1 returns 0 if the compared roles or fillers are dissimilar (sim = 0), which is an advantage in terms of computation time and (more importantly) avoids misleading results, as discussed below. Nevertheless, treating role and filler similarity as equally important seems oversimplified. Moreover, unless simr or simc is 0 or 1, the resulting similarity sime is by definition (of multiplication) smaller than both simr and simc, which probably contradicts the way humans perceive similarity! The second approach, however, raises the question of how to semi-automatically derive the weightings that determine the relative importance of inter-role and inter-filler similarity for sime. In addition, high similarity ratings (for sime) already occur if one of the measured similarities is significant, even if the other is 0.
     Imagine a transportation device ontology in which R specifies an inside relation and S a disjoint relation. If both fillers C and D stand for waterways, equation 1 yields 0, while equation 2 results in ωc*1. Now one may argue that the weighting for inter-role similarity should be higher, but then you just need to switch the example (by defining dissimilar fillers) to run into the same difficulty again.
    To overcome this shortcoming, I have recently added the notion of thresholds from neural networks to the additive similarity approach: it defines a minimum similarity value that simr and simc need to exceed, otherwise sime is 0. The question of how to derive the weightings and the threshold is still open, but maybe it is possible to integrate the notions of commonality and variability used in MDSM [37] for this purpose. So far the theory presented in [73] uses the product similarity approach; however, its idea of context-awareness is comparable to that of MDSM, and therefore a combination seems promising. As a start, I have used a threshold t = 0.3, ωr = 0.6 and ωc = 0.4 for some first experiments within a simplified accommodation ontology.
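
The three variants discussed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are made up, the formulas are written down from the verbal description in this post (product, weighted sum, weighted sum with threshold), and I assume that both simr and simc have to be strictly greater than the threshold. The default values t = 0.3, ωr = 0.6 and ωc = 0.4 are the ones from my first experiments mentioned above.

```python
def sim_product(sim_r, sim_c):
    """Product approach: role and filler similarity are multiplied."""
    return sim_r * sim_c


def sim_weighted(sim_r, sim_c, w_r=0.6, w_c=0.4):
    """Weighted sum approach: w_r and w_c are the role and filler weightings."""
    return w_r * sim_r + w_c * sim_c


def sim_weighted_threshold(sim_r, sim_c, w_r=0.6, w_c=0.4, t=0.3):
    """Weighted sum with a threshold t that both similarities must exceed.

    Assumption: 'exceed' is read as strictly greater than t.
    """
    if sim_r <= t or sim_c <= t:
        return 0.0
    return w_r * sim_r + w_c * sim_c


# Transportation example from the text: the roles (inside vs. disjoint) are
# dissimilar, while both fillers are waterways.
sim_r, sim_c = 0.0, 1.0
print(sim_product(sim_r, sim_c))             # product: no similarity
print(sim_weighted(sim_r, sim_c))            # weighted sum: w_c * 1
print(sim_weighted_threshold(sim_r, sim_c))  # threshold: back to no similarity
```

For the transportation example the product and the threshold variant both return 0, while the plain weighted sum returns ωc, which is exactly the misleading result described above.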

[37] Rodríguez, A. M. and M.J. Egenhofer, Comparing Geospatial Entity Classes: An Asymmetric and Context-Dependent Similarity Measure. International Journal of Geographical Information Science, 2004. 18(3): p. 229-256

[73] Janowicz, K. (2006). Sim-DL: Towards a Semantic Similarity Measurement Theory for the Description Logic ALCNR in Geographic Information Retrieval. R. Meersman, Z. Tari, P. Herrero et al. (Eds.): SeBGIS 2006, OTM Workshops 2006, LNCS 4278, pp. 1681 – 1692, 2006. 

Hybrid Approaches to Similarity?

I have added a new category called ‘Hybrid Approaches to Similarity’ to the literature section; however, I am not entirely happy with it. Some authors explicitly state that their approaches are hybrid, but in my opinion this is the case for most recent theories. For instance, MDSM [37] is an extended version of Tversky’s ratio model [4] and therefore a classical feature-based approach. Nevertheless, in equations 2 and 3 a network model (based on the distance to the least upper bound) is chosen to determine the weighting α and therefore the asymmetry. Should this be called hybrid?
    As a start I put some papers into this section that clearly combine several approaches. A good example may be Schwering’s hybrid model [51].
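
For readers who do not have the ratio model in mind, here is a minimal Python sketch of it, using set cardinality as the salience function and the weighting scheme α and (1 - α) for the two sets of distinguishing features. The function name and the feature sets for ballpoint pen and pencil are made up for illustration, and α is simply passed in as a parameter; how MDSM derives α from the hierarchy is not modelled here.

```python
def ratio_model(features_a, features_b, alpha=0.5):
    """Feature-based ratio model with set cardinality as salience.

    alpha weights the features only a has, (1 - alpha) those only b has.
    """
    a, b = set(features_a), set(features_b)
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denominator = common + alpha * only_a + (1 - alpha) * only_b
    # Two empty feature sets: treated as not comparable (an assumption).
    return common / denominator if denominator else 0.0


# Hypothetical feature sets, invented for this example.
pen = {"write", "ink", "ball", "cap"}
pencil = {"write", "graphite", "wood"}

print(ratio_model(pen, pencil, alpha=0.5))
# With alpha != 0.5 the measure becomes asymmetric:
print(ratio_model(pen, pencil, alpha=0.8))
print(ratio_model(pencil, pen, alpha=0.8))
```

The point of the sketch is only that the feature comparison and the choice of α are two separate steps, so α can come from a completely different kind of model (such as a network distance) without changing the feature-based core.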

[37] Rodríguez, A. M. and M.J. Egenhofer, Comparing Geospatial Entity Classes: An Asymmetric and Context-Dependent Similarity Measure. International Journal of Geographical Information Science, 2004. 18(3): p. 229-256

[4] Tversky, A. (1977) Features of Similarity. Psychological Review. 84(4): p.327-352.

[51] Schwering, A. (2005). Hybrid model for semantic similarity measurement. 4th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE05). Agia Napa, Cyprus. Springer.
