Friday, December 24, 2010

First steps in text mining with R

Everyone is preparing for Christmas Eve dinner. No one is calling and there is little email. Looks like a perfect time to start researching text mining in R :)

The problems I'm trying to solve:

  • extract keywords from multiple texts
  • try to summarize texts > sentence extraction
  • group and relate products based on their descriptions > classification / clustering
  • add relevant information to text based on similar / related text

I've started with the tm package.

Then I jumped to the TextRank algorithm for keyword and sentence extraction. It seems TextRank is not present in tm, but Java source code is available, so it should be possible to call it from R.
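
To get a feel for what TextRank does before wiring the Java code into R, here is a minimal sketch of the keyword-extraction idea. It is written in Python rather than R and is not the reference implementation: the function name `textrank_keywords`, the tiny stopword list, and the default parameters are all my own assumptions, and a stopword filter stands in for the part-of-speech filter the original algorithm uses.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real use needs a fuller one).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to",
             "is", "are", "with", "by", "from", "it", "this", "that"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank words by a PageRank-style score over a word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]

    # Undirected graph: link each word to the next `window` words after it.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window + 1]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # Fixed number of synchronous updates, every score starting at 1.0.
    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbours[n]) for n in neighbours[word])
            for word in neighbours
        }

    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

Calling `textrank_keywords(description, top_n=10)` on a product description returns up to ten candidate keywords. Words that co-occur with many different neighbours float to the top, which is the behaviour I want to compare against KEA.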

I will need to compare TextRank to KEA. The latter is available for R as the RKEA package.

Looks promising.

1 comment:

Darren J said...

Hi, did you make any progress on it? I was going to start some research on this topic with R. If you can share your progress, it would really help.