Over the last few weeks I have developed a small, web-based human subject test to compare three models of role-filler similarity against human judgments. After an introduction and motivation section, participants are asked to rate the similarity between spatial relations, then between objects, and finally between combinations of both. Their ratings are compared to the predictions of the computational theories. It turns out that both the multiplicative approach and the weighted average with automatically determined, flexible weights are potential candidates, whereas the simple (unweighted) average does not perform very well (as expected). Moreover, there is evidence that the multiplicative approach tends to underestimate the human ratings, while the weighted average generally overestimates them. It took quite a while to work out what the test should look like, which kind of rating system to choose (I settled on sliders) and how to randomize the questions. I am still not satisfied, however, especially because the randomization sometimes produces pairs that are really hard to compare, or sets in which all pairs are dissimilar. I will report on all design decisions later on.
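To make the three candidates a bit more concrete, here is a minimal sketch of how a relation (role) similarity and an object (filler) similarity could be combined under each of them. This is only an illustration under my own assumptions: the function names are made up, similarities are assumed to lie in [0, 1], and the weights are passed in by hand rather than determined automatically as in the actual model.

```python
# Illustrative sketch only: names and formulas are assumptions for this post,
# not the implementation used in the test.

def simple_average(sim_relation, sim_object):
    """Unweighted mean of the two partial similarities."""
    return (sim_relation + sim_object) / 2.0


def weighted_average(sim_relation, sim_object, w_relation, w_object):
    """Mean of the partial similarities with hand-supplied (hypothetical) weights."""
    total = w_relation + w_object
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_relation * sim_relation + w_object * sim_object) / total


def multiplicative(sim_relation, sim_object):
    """Product of the two partial similarities."""
    return sim_relation * sim_object


if __name__ == "__main__":
    rel, obj = 0.8, 0.5  # made-up example values, not data from the test
    print("simple average:  ", simple_average(rel, obj))
    print("weighted average:", weighted_average(rel, obj, 3.0, 1.0))
    print("multiplicative:  ", multiplicative(rel, obj))
```

For values in [0, 1] the product can never exceed the smaller of the two inputs, whereas any average lies between them. So under these assumptions the multiplicative combination always gives the lowest value of the three, which at least fits the direction of the under- and overestimation mentioned above.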
At the moment the test is available only in German, but an English version will be online within the next few weeks. I will also give full access to the underlying database, so that everyone interested in human similarity judgments can download and use the results. So far more than 40 people have participated in the test. Note, however, that this is still a pre-test; I will run a face-to-face test with selected participants and slightly modified test settings in December. Human subject testing is a difficult task, and there is a lot of ‘noise’ to be removed (or taken into account) before getting useful results. If you have ideas about what can be improved, please comment on this post.
Role & Filler Test: [German Version]