The problem I'm trying to solve:
- extract keywords from multiple texts
- try to summarize texts > sentence extraction
- group and relate products based on their descriptions > classification / clustering
- add relevant information to text based on similar / related text
Background reading on text mining in R:
- http://text-analysis.googlecode.com/files/Text_Mining_Infrastructure_in_R.pdf
- http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf - introduction to tm package
- http://cran.r-project.org/doc/Rnews/Rnews_2008-2.pdf - introduction to text mining in R
- http://epub.wu.ac.at/1923/1/document.pdf - text mining in R and its applications
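To make the "group and relate products based on their descriptions" goal a bit more concrete, here is a rough sketch of one common approach: weight the words in each description with TF-IDF and compare descriptions by cosine similarity. It is written in plain Python rather than R, purely as an illustration. The function names (`tfidf_vectors`, `cosine`, `most_similar`) and the smoothed IDF formula are my own assumptions and do not come from tm or any other package.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumption): terms found everywhere keep a small weight.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(docs, index):
    """Index of the document most similar to docs[index] (needs at least two docs)."""
    vectors = tfidf_vectors(docs)
    others = [i for i in range(len(docs)) if i != index]
    return max(others, key=lambda i: cosine(vectors[index], vectors[i]))
```

For example, `most_similar(["red cotton shirt", "blue cotton shirt", "stainless steel kettle"], 0)` picks the other shirt, because the kettle description shares no words with it. Real clustering would need stemming, stopword removal and a proper clustering step on top of these similarities.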
Then I jumped to the TextRank algorithm for keyword and sentence extraction. TextRank does not seem to be included in tm, but Java source code is available, so it should be possible to call it from R.
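For reference, the core idea of TextRank keyword extraction can be sketched in a few lines: build a graph of words that occur near each other, then run a PageRank-style iteration so that well-connected words score highest. The sketch below is plain Python, not the Java implementation mentioned above and not R code. It makes simplifying assumptions that I want to flag: a tiny hard-coded stopword list stands in for the part-of-speech filter, the window is counted after stopwords are removed, and the function name and parameters (`textrank_keywords`, `window`, `damping`, `iterations`, `top_n`) are hypothetical.

```python
import re
from collections import defaultdict

# A tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank words by a PageRank-style score over a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Undirected graph: connect words that appear within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Every word starts with the same score; each round it receives a share
    # of the score of every neighbor, split by that neighbor's degree.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            received = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * received
        scores = updated

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

Calling `textrank_keywords(some_text, top_n=10)` returns up to ten candidate keywords. Sentence extraction follows the same pattern, with sentences as nodes and a sentence similarity measure as edge weights.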
I will need to compare TextRank to KEA. The latter is available in R through the RKEA package.
Looks promising.
1 comment:
Hi, did you make any progress on this? I was going to start some research on this topic with R. If you can share your progress, it would really help.