SWI-Prolog Natural Language Processing Primitives

4 library(isub): isub: a string similarity measure

Author: Giorgos Stoilos
See also: A string metric for ontology alignment, by Giorgos Stoilos, 2005.

The library(isub) implements a similarity measure between strings, comparable in purpose to the Levenshtein distance. Rather than counting edit operations, it is based on the lengths of the substrings that the two strings have in common.
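
To illustrate how a score can be built from common substrings, the following is a simplified sketch, not the library's implementation. It computes only a "commonality" term, assuming (as in Stoilos' published description) that the longest substring shared by both texts is removed repeatedly while it is longer than two characters, and that twice the total removed length is divided by the combined length of the texts. The full isub/4 measure also involves further terms (such as a dissimilarity penalty) that are omitted here. All predicate names below (isub_commonality_sketch/3, common_length/3, longest_common_substring/7, split3/4) are hypothetical.

```prolog
:- use_module(library(lists)).
:- use_module(library(aggregate)).

%!  isub_commonality_sketch(+Text1, +Text2, -Comm:float) is det.
%
%   Hypothetical, simplified illustration of the "commonality" part of
%   an isub-style score: Comm = 2 * Common / (Len1 + Len2), where Common
%   is the total length of the shared substrings removed by
%   common_length/3.  No normalization and no other score terms.
isub_commonality_sketch(Text1, Text2, Comm) :-
    text_to_string(Text1, S1), string_codes(S1, C1),
    text_to_string(Text2, S2), string_codes(S2, C2),
    length(C1, L1),
    length(C2, L2),
    (   L1 + L2 =:= 0
    ->  Comm = 0.0                      % two empty texts: defined as 0.0
    ;   common_length(C1, C2, Common),
        Comm is 2.0 * Common / (L1 + L2)
    ).

%   common_length(+A, +B, -Total): repeatedly remove the longest common
%   substring (assumed threshold: length > 2) from both code lists,
%   joining the remaining parts, and sum the removed lengths.
common_length(A, B, Total) :-
    (   longest_common_substring(A, B, Len, A1, A2, B1, B2),
        Len > 2
    ->  append(A1, A2, RestA),
        append(B1, B2, RestB),
        common_length(RestA, RestB, RestTotal),
        Total is Len + RestTotal
    ;   Total = 0
    ).

%   longest_common_substring(+A, +B, -Len, -A1, -A2, -B1, -B2): brute
%   force search; A = A1+Sub+A2 and B = B1+Sub+B2 where Sub has the
%   maximal length Len.  Fails if A and B share no element.
longest_common_substring(A, B, Len, A1, A2, B1, B2) :-
    aggregate_all(max(L, parts(X1, X2, Y1, Y2)),
                  ( split3(A, X1, Sub, X2),
                    split3(B, Y1, Sub, Y2),
                    length(Sub, L)
                  ),
                  max(Len, parts(A1, A2, B1, B2))).

%   split3(+List, -Before, -Sub, -After): Sub is a non-empty contiguous
%   part of List, with Before and After the surrounding parts.
split3(List, Before, Sub, After) :-
    append(Before, Rest, List),
    append(Sub, After, Rest),
    Sub \== [].
```

For example, ?- isub_commonality_sketch(language, languange, C). binds C to 12/17 (about 0.706): "langua" is removed first, and the remaining "ge" and "nge" share only two characters, which is below the assumed threshold.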

isub(+Text1:text, +Text2:text, +Normalize:bool, -Similarity:float)   [det]
Similarity is a measure of how similar Text1 and Text2 are; higher values mean more similar. For example:
?- isub('E56.Language', 'languange', true, D).
D = 0.711348.

If Normalize is true, isub/4 applies the string normalization used by the original authors: Text1 and Text2 are mapped to lowercase, and the characters ".", "_" and space are removed. Lowercase mapping uses the C library function towlower(). In general, the appropriate normalization is domain dependent and is better left to the caller; see, e.g., unaccent_atom/2.
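
Because the right normalization depends on the domain, one option is to normalize in the caller and pass false as Normalize. The sketch below uses two hypothetical helpers, my_normalize/2 and my_isub/3 (not part of the library), which strip accents with unaccent_atom/2 from library(porter_stem) and map to lowercase with downcase_atom/2. Unlike the built-in normalization, they keep ".", "_" and spaces. The sketch does not rely on unaccent_atom/2 succeeding for text without accents; it falls back to the unchanged atom in that case.

```prolog
:- use_module(library(isub)).
:- use_module(library(porter_stem)).    % provides unaccent_atom/2

%!  my_normalize(+Text, -Normalized:atom) is det.
%
%   Hypothetical caller-side normalization: strip accents, then map to
%   lowercase.  Punctuation and white space are kept.
my_normalize(Text, Normalized) :-
    text_to_string(Text, String),
    atom_string(Atom, String),
    (   unaccent_atom(Atom, Plain)
    ->  true
    ;   Plain = Atom                    % fall back to the input unchanged
    ),
    downcase_atom(Plain, Normalized).

%!  my_isub(+Text1, +Text2, -Similarity:float) is det.
%
%   Hypothetical wrapper: compare the caller-normalized texts with
%   isub/4's own normalization disabled.
my_isub(Text1, Text2, Similarity) :-
    my_normalize(Text1, N1),
    my_normalize(Text2, N2),
    isub(N1, N2, false, Similarity).
```

Keeping the normalization outside isub/4 makes it straightforward to substitute other preprocessing, such as stemming with porter_stem/2, where the domain calls for it.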

Text1 and Text2 can each be an atom, a string, or a list of characters or character codes.
Similarity is a float in the range [0.0..1.0], where 1.0 means most similar.
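
As a usage sketch, the hypothetical helper best_match/4 below ranks candidate texts against a query with isub/4 (built-in normalization enabled) and keeps the highest-scoring one. Because isub/4 accepts atoms, strings, and lists of characters or codes, the candidate list may mix these representations.

```prolog
:- use_module(library(isub)).
:- use_module(library(lists)).
:- use_module(library(aggregate)).

%!  best_match(+Query, +Candidates:list, -Best, -Similarity:float) is semidet.
%
%   Hypothetical helper: Best is the element of Candidates with the
%   highest isub/4 similarity to Query.  Fails if Candidates is empty.
%   On ties, the first candidate reaching the maximum is kept.
best_match(Query, Candidates, Best, Similarity) :-
    aggregate_all(max(S, C),
                  ( member(C, Candidates),
                    isub(Query, C, true, S)
                  ),
                  max(Similarity, Best)).
```

For example, ?- best_match(languange, ['E56.Language', "lingo", [l,a,n,g]], Best, S). compares an atom, a string and a character list against the query in a single call.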
