Have you ever wondered what similar keyphrases people use when searching for something, and how those phrases compare?
I have :)
Using a subset of keyphrases used by visitors of inlevel.com (some 550 items), I wrote a simple R program that tries to assign potentially similar phrases to each of the keyphrases used.
I employed three methods for this task:
- Levenshtein distance - see source code here
- length of longest common substring (LCS) - source code of my implementation here
- phrases starting with the base phrase - very crude implementation available here
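To make the three measures concrete, here is a minimal sketch in Python. It is not the R code linked above: the function names, the toy phrases, and the thresholds `max_distance` and `min_common` are all assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            # Extend the run ending at the previous characters, or reset to 0.
            run = prev[j - 1] + 1 if ca == cb else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


def starts_with_base(base: str, candidate: str) -> bool:
    """True if candidate is a longer phrase that begins with base."""
    return candidate != base and candidate.startswith(base)


def similar_phrases(base, phrases, max_distance=3, min_common=5):
    """Candidates for base under each method; both thresholds are arbitrary."""
    others = [p for p in phrases if p != base]
    return {
        "levenshtein": [p for p in others
                        if levenshtein(base, p) <= max_distance],
        "lcs": [p for p in others
                if longest_common_substring_len(base, p) >= min_common],
        "prefix": [p for p in others if starts_with_base(base, p)],
    }


# Made-up phrases, not taken from the real keyphrase set.
phrases = ["r tutorial", "r tutorials", "r tutorial pdf",
           "python tutorial", "rstudio"]
result = similar_phrases("r tutorial", phrases)
```

With these toy phrases, the Levenshtein filter keeps only "r tutorials", the common-substring filter also admits "r tutorial pdf" and "python tutorial", and the prefix test keeps "r tutorials" and "r tutorial pdf".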
As you will notice, each of the methods gives different results. No single set of results is definitively better than the others. Combining them in some way seems the most interesting solution.
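One possible way to combine the methods, sketched here in Python, is simple voting: a candidate counts as similar only if at least two methods propose it. This is just an illustration, not something I have tested on the real data. The function `combine_by_votes`, the `min_votes` threshold, and the candidate lists are all made up for the example.

```python
from collections import Counter


def combine_by_votes(candidates_by_method, min_votes=2):
    """Rank candidate phrases by how many methods proposed them."""
    votes = Counter()
    for candidates in candidates_by_method.values():
        # set() so one method cannot vote twice for the same phrase
        votes.update(set(candidates))
    # Most votes first; ties broken alphabetically for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [(phrase, n) for phrase, n in ranked if n >= min_votes]


# Hypothetical per-method candidates for the base phrase "r tutorial".
candidates = {
    "levenshtein": ["r tutorials"],
    "lcs": ["r tutorials", "r tutorial pdf", "python tutorial"],
    "prefix": ["r tutorials", "r tutorial pdf"],
}
combined = combine_by_votes(candidates)
```

Here "r tutorials" gets three votes and "r tutorial pdf" two, while "python tutorial" is dropped because only one method suggested it. Weighting the methods differently would be an equally reasonable choice.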
Correcting spelling mistakes before analyzing similarities would likely improve the results. Other methods could be added as well.