How to extract key phrases from a text

Extracting keyphrases from a text is a more complex task than extracting keywords. In text mining, however, keyphrases are usually more useful than single keywords.

I have created a Perl script that extracts keyphrases from a text.

 

 

A phrase is any sequence of two or more words that occurs at least twice in the text.
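
As a rough illustration of this definition, here is a minimal sketch in Python (not the Perl script itself). The function name `repeated_phrases`, the simple regex tokenizer, and the cap of 4 words per phrase are assumptions made for the example.

```python
import re
from collections import Counter


def repeated_phrases(text, min_len=2, max_len=4, min_count=2):
    """Count every word sequence of min_len..max_len words and keep the repeated ones."""
    # Assumption: lowercase the text and treat runs of letters, digits
    # and apostrophes as words. Sentence boundaries are ignored.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Keep only sequences seen at least min_count times.
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


sample = "The prime minister spoke today. Later the prime minister left."
print(repeated_phrases(sample))
```

For the sample sentence, the candidates are "the prime", "prime minister" and "the prime minister", each seen twice. The overlap between them is why a later clean-up step is needed.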

The script first extracts the most frequently repeated word sequences from the text. It then finds phrases that are similar to other phrases already found and removes them. The last step removes common phrases that occur often but are not useful (like "he said" or "on the").
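
The ranking and clean-up steps described above could look roughly like the following Python sketch. It is not the Perl script. The names `COMMON_PHRASES`, `contains` and `filter_phrases` are hypothetical, and the similarity rule (one phrase contains the other word for word) is only an assumption, since the exact rule is not stated here. The input is a dictionary that maps each candidate phrase to the number of times it occurs.

```python
# Hypothetical list of frequent but uninformative phrases; a real list would be much longer.
COMMON_PHRASES = {"he said", "she said", "on the", "of the", "in the"}


def contains(longer, shorter):
    """True if the words of `shorter` appear as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def filter_phrases(counts, top=10):
    """Rank candidate phrases by frequency, then drop common and near-duplicate ones."""
    # Most frequent first; on ties prefer the longer phrase, then alphabetical order.
    ranked = sorted(counts, key=lambda p: (-counts[p], -len(p.split()), p))
    kept = []
    for phrase in ranked:
        if phrase in COMMON_PHRASES:
            continue
        words = phrase.split()
        # Assumption: two phrases are "similar" when one contains the other word for word.
        if any(contains(k.split(), words) or contains(words, k.split()) for k in kept):
            continue
        kept.append(phrase)
    return kept[:top]


candidates = {"the prime": 2, "prime minister": 2, "the prime minister": 2,
              "he said": 5, "on the": 4}
print(filter_phrases(candidates))
```

In this sketch the common phrases are skipped before the similarity check, so that a frequent phrase such as "on the" cannot suppress a longer, more useful phrase that happens to contain it.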

I used data from Google News for testing.

You can see how the Perl script extracts keyphrases from a text. There is a list of the latest news from the Google News service. Click the link next to a news item to see its keyphrases, then open the item and check whether the keyphrases are correct.

Last Updated on Monday, 09 August 2010 16:21
 

Comments  

 
#6 Amlodipine 2012-06-16 12:49
The keyword... It depends, I'd say, on situation, reality and emotion... which would be the perfect combination for extracting the words.
 
 
#5 ehabh 2012-02-16 04:24
Am I missing something, or is the code not published?
I have a large news site in Arabic and would love to see the common repeated phrases in my site for better editorial quality. I have a large data set of news of a few years. I would love to get a copy of the script and try it.
 
 
#4 Administrator 2011-07-20 05:51
Quoting climmi:
I think the key phrases extracted by your program are likely to be the commonly used terms. Since they are fixed terms, I will know what is mentioned in the article, but what if I want to know what it is talking about? How can I efficiently extract the key topics?

Yes. Identifying key topics is not an easy thing. I have some ideas about this and will try to code the functionality. I want to use Twitter and other social networks to get additional information and analyze it. This will work only for recently published news.
 
 
#3 climmi 2011-07-20 02:17
I think the key phrases extracted by your program are likely to be the commonly used terms. Since they are fixed terms, I will know what is mentioned in the article, but what if I want to know what it is talking about? How can I efficiently extract the key topics?
 
 
#2 Administrator 2011-02-08 06:27
Quoting Brendan Newlon:
Your script is wonderful! How did you do it? I'm trying to find a way to do the same thing (find key phrases and key words) in Classical Chinese religious texts for my research. Can you give me any advice?


I don't know if the script will work for Chinese. I am going to share the script, so you will be able to try it.
Can you try it on Chinese text here on the site to see if it works?
 
 
#1 Brendan Newlon 2011-02-08 04:25
Your script is wonderful! How did you do it? I'm trying to find a way to do the same thing (find key phrases and key words) in Classical Chinese religious texts for my research. Can you give me any advice?

Thanks,

--brendan