(Semantic) Similarity-Blog

Why are ballpoint pens and pencils similar?

Archive for the 'Related Work' Category

Similarity and Context - Papers added

The following papers have been added to the similarity and context section:

[132] V. Kashyap and A. Sheth (1996) Schematic and Semantic Similarities between Database Objects: A Context-based Approach. VLDB Journal 5(4), pp. 276-304. [PDF]

[133] C. Keßler, M. Raubal and K. Janowicz (2007, forthcoming) The Effect of Context on Semantic Similarity Measurement. 3rd International IFIP Workshop on Semantic Web & Web Semantics (SWWS ’07), Vilamoura, Algarve, Portugal. Lecture Notes in Computer Science, Springer.

New Papers added…

… about 20 new similarity papers have been added to the literature section.

Similarity and Case-Based Reasoning

I have added a new section on similarity and case-based reasoning to the literature page, linking to some interesting work by Armin Stahl and Thomas Gabel.

Special Issue on Similarity-based Pattern Recognition

Pattern Recognition has published a special issue on ‘Similarity-based Pattern Recognition’ (Volume 39, Issue 10, pages 1813-1948, October 2006), edited by M. Bicego, V. Murino, M. Pelillo and A. Torsello. The list of articles is available at ScienceDirect.

Similarity and Clustering

I have added a new section called ‘Similarity and Clustering’ (with 3 papers) to the literature page. Please let me know of other interesting papers in this area.

[107] P. Valtchev and R. Missaoui (2000) Similarity-based Clustering versus Galois Lattice Building: Strengths and Weaknesses. Presented at the Workshop "Objects and Classification: A Natural Convergence", European Conference on Object-Oriented Programming (ECOOP 2000), Nice, France, 12-16 June 2000. [PDF]

[108] M.-S. Yang and K.-L. Wu (2004) A Similarity-Based Robust Clustering Method. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(4), pp. 434-448.

[109] M. Bicego, V. Murino and M.A.T. Figueiredo (2004) Similarity-based Classification of Sequences Using Hidden Markov Models. Pattern Recognition 37(12), pp. 2281-2291. [PDF]

SimRank: A Measure of Structural-Context Similarity

A colleague pointed me to a paper from KDD 2002 called SimRank: A Measure of Structural-Context Similarity. It describes an inter-instance similarity measure that captures the similarity of the structural (graph-based) context in which instances occur, based on their relationships with other instances. In other words, two instances are assumed to be similar if they are related to similar instances, which are in turn similar if they are related to similar instances, and so on…
I also added the paper to the literature list.

[105] G. Jeh and J. Widom (2002) SimRank: A Measure of Structural-Context Similarity. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, July 2002, pp. 538-543. [PDF]
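To make the recursive idea concrete, here is a minimal sketch of the naive iterative SimRank computation, written from my reading of the paper's definition. A node has similarity 1 with itself. For two distinct nodes a and b, the score is C / (|I(a)| * |I(b)|) times the sum of the scores of all pairs of their in-neighbours, or 0 if either node has no in-neighbours. The decay factor C = 0.8, the fixed number of iterations and the toy university graph are my own assumptions for illustration. This naive version touches every node pair on every iteration, so it is only practical for small graphs.

```python
from itertools import product


def simrank(in_neighbors, C=0.8, iterations=10):
    """Naive iterative SimRank.

    in_neighbors maps every node to the list of nodes with an edge into it
    (every node must appear as a key). Returns a dict mapping each ordered
    pair (a, b) to its similarity score after the given number of iterations.
    """
    nodes = list(in_neighbors)
    # R_0: a node is fully similar to itself and to nothing else.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbours gets score 0 with every other node.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbours, damped by C.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: a university links to two professors,
# and each professor links to one student.
graph = {
    "Univ": [],
    "ProfA": ["Univ"],
    "ProfB": ["Univ"],
    "StudentA": ["ProfA"],
    "StudentB": ["ProfB"],
}
scores = simrank(graph)
print(round(scores[("ProfA", "ProfB")], 2))        # 0.8
print(round(scores[("StudentA", "StudentB")], 2))  # 0.64
```

ProfA and ProfB score C = 0.8 because they share their only in-neighbour. StudentA and StudentB score C * C = 0.64 even though they share no neighbour at all, because their in-neighbours are themselves similar. This is the "similar if related to similar instances" idea in action.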

Similarity and Web 2.0

I have added a new section to the literature list with papers on the role of similarity in Web 2.0. These papers examine how inter-user similarity can be used to improve trust ratings in social networks.

Similarity of Semantic Relations

There is a new paper by Peter D. Turney called ‘Similarity of Semantic Relations’. I will also add it to the literature list. As a start, here are the abstract and a link to a PDF version:

There are at least two kinds of similarity. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, and information retrieval. Recently the Vector Space Model (VSM) of information retrieval has been adapted to measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data, and (3) automatically generated synonyms are used to explore variations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying semantic relations, LRA achieves similar gains over the VSM.

P.D. Turney (2006) Similarity of Semantic Relations. Computational Linguistics 32(3), pp. 379-416. [PDF]
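To make the VSM idea from the abstract concrete, here is a minimal sketch. Each word pair is represented as a row of pattern frequencies, and relational similarity is the cosine between rows. An optional truncated SVD smooths the counts, in the spirit of LRA's second extension. The pattern names and counts are invented toy data, not taken from the paper. The sketch also leaves out the rest of LRA: automatic pattern extraction, synonym-based pair variations, and any weighting of the raw counts.

```python
import numpy as np


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return float(np.dot(u, v) / (nu * nv))


def smooth_with_svd(matrix, k):
    """Project each row (word pair) onto the top-k singular dimensions.

    Cosines between rows of U_k * S_k equal cosines between rows of the
    rank-k reconstruction U_k S_k V_k^T, so the smaller matrix suffices.
    """
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


# Hypothetical toy data: rows are word pairs, columns are invented patterns
# ("X cuts Y", "X shapes Y", "X works with Y", "X works in Y").
pairs = ["mason:stone", "carpenter:wood", "doctor:hospital"]
counts = np.array([
    [8, 5, 6, 0],
    [7, 6, 5, 1],
    [0, 0, 1, 9],
], dtype=float)

raw = {p: row for p, row in zip(pairs, counts)}
reduced = dict(zip(pairs, smooth_with_svd(counts, k=2)))

for name, vecs in (("VSM (raw counts)", raw), ("SVD-smoothed", reduced)):
    print(name,
          round(cosine(vecs["mason:stone"], vecs["carpenter:wood"]), 3),
          round(cosine(vecs["mason:stone"], vecs["doctor:hospital"]), 3))
```

In both representations, mason:stone comes out far closer to carpenter:wood than to doctor:hospital. That ordering is what an analogy solver would rely on when picking the best answer among several candidate pairs.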

Interesting Course: Categories and Concepts

The Categories and Concepts course, taught in 2003 by Bradley Love and Jody Hendrix in the Department of Psychology at the University of Texas at Austin, offers a list of selected core readings (with download links). It is useful for anyone interested in concepts and categories, as well as in the role of similarity.

Categories and Concepts course: http://love.psy.utexas.edu/~love/concepts/

SimPack - Toolkit

The Department of Informatics at the University of Zurich has developed a similarity measurement toolkit called SimPack. So far, it supports the following measurement approaches:

  • feature vectors
  • strings or sequences of strings
  • trees and graphs
  • information theory

The toolkit is written in Java and is available, together with its Javadoc API documentation, at: http://www.ifi.unizh.ch/ddis/simpack.html
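SimPack itself is a Java library, and I do not reproduce its API here. As an independent illustration of two of the families listed above, the following Python sketch computes two measures. The first is a feature-based similarity (Jaccard: shared features divided by all features), which treats feature vectors as binary feature sets for simplicity. The second is a string similarity based on Levenshtein edit distance. The pen and pencil feature names are hypothetical.

```python
def jaccard(features_a, features_b):
    """Feature-based similarity: shared features / all features (1.0 if both empty)."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def levenshtein(s, t):
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def string_similarity(s, t):
    """Normalise edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


# Hypothetical binary features for the blog's running example.
pen = {"writing_tool", "handheld", "cylindrical", "uses_ink"}
pencil = {"writing_tool", "handheld", "cylindrical", "uses_graphite"}
print(jaccard(pen, pencil))              # 0.6
print(levenshtein("kitten", "sitting"))  # 3
```

The feature-based measure explains the pen/pencil similarity through the features they share. The string measure, by contrast, only looks at surface form, which is why toolkits such as SimPack offer several families of measures side by side.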