Calculating the importance of an event

Hundreds of thousands of new events are described on the Web every day. News sites, blogs and other resources publish information about them. Usually you need some way to filter the important events from the unimportant ones.

Here I will describe a tool that assigns a rating (from 1 to 10) to each message in the Google News service.

 

Filtering events (news) is important in many areas, especially in finance and business.

A few months ago I created the service News4Trader.com. It shows the latest business news with a rating (or rank, or importance) assigned to each message. The rating is from 1 to 10.

I want to describe how the importance of a message is calculated.

The Google News service lets you read all messages about an event in one place. It shows just the first sentences of each message and lets you go to the original site to read the full text. The strength of Google News is that it can find which messages relate to the same event.

So it is possible to extract some useful information about each event from the Google News service:

  • Count of messages about some event
  • Time when each message was published

The list of publication times gives a few useful things for calculating the rank: the time between the first and the last message, and the linear variation coefficient of the intervals between messages.

More about the second one.

If we have N messages, then we have N-1 intervals between messages. Each interval is measured in seconds (it could be minutes; the unit is not important).
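As a minimal sketch of this step, the Python below turns a list of publication times into the N-1 intervals and the total time between the first and last message. The function names and the sample timestamps are made up for illustration and are not taken from the service.

```python
from datetime import datetime


def publication_intervals(times):
    """Return the N-1 gaps, in seconds, between consecutive publication times."""
    ordered = sorted(times)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def time_span(times):
    """Return the number of seconds between the first and the last message."""
    return (max(times) - min(times)).total_seconds()


# Hypothetical publication times for one event (illustrative, not real data).
times = [
    datetime(2010, 3, 23, 9, 0, 0),
    datetime(2010, 3, 23, 9, 5, 0),
    datetime(2010, 3, 23, 9, 35, 0),
]

intervals = publication_intervals(times)  # [300.0, 1800.0]
span = time_span(times)                   # 2100.0
```

The intervals always add up to the total span, which is a handy sanity check.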

So we have N-1 values: P(1), P(2), P(3), ..., P(N-1). We use a modified formula for the linear variation of this list to get the coefficient K.

This coefficient is 0 if P(1)=P(2)=...=P(N-1)>0, and 1 if P(1)=P(2)=...=P(N-1)=0. In all other cases K is greater than 0 and less than 1.

In other words, K is 1 if all messages were published at the same time, and 0 if the messages were published one after another at equal, non-zero intervals.
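The exact modified formula is not given here, so the Python below is only one possible coefficient that satisfies the properties just listed, not necessarily the one the service uses. It assumes K is the mean absolute deviation of the intervals divided by twice their mean, with the all-zero case set to 1 by convention. The name burst_coefficient is hypothetical.

```python
def burst_coefficient(intervals):
    """One possible K: 0 for equal positive gaps, 1 for all-zero gaps.

    This is an assumed formula chosen to match the stated properties.
    Intervals are expected to be non-negative numbers.
    """
    if not intervals:
        raise ValueError("need at least two messages, i.e. one interval")
    mean = sum(intervals) / len(intervals)
    if mean == 0:
        # Every message has the same timestamp: K is defined as 1.
        return 1.0
    mean_abs_dev = sum(abs(p - mean) for p in intervals) / len(intervals)
    # For non-negative gaps the mean absolute deviation is always smaller
    # than 2 * mean, so the ratio stays below 1.
    return mean_abs_dev / (2 * mean)


k_even = burst_coefficient([60, 60, 60])  # 0.0, evenly spaced messages
k_same = burst_coefficient([0, 0, 0])     # 1.0, all published at once
k_mixed = burst_coefficient([0, 0, 600])  # strictly between 0 and 1
```

With this choice K grows as the messages bunch together: two simultaneous messages followed by a late one give a value between the two extremes.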

So we use three numbers to calculate the importance of an event:

  • C - Count of messages about some event
  • T - time in seconds between first and last message
  • K - coefficient described above.

The formula for the rank is:

R=c1*C+c2*(1/T)+c3*K

where c1, c2, c3 are coefficients found by experiment; they can still be improved.
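The formula above can be sketched in Python as follows. The default values of c1, c2 and c3 are placeholders for illustration only, not the experimentally tuned ones. Treating T as at least one second, to avoid dividing by zero when all messages share a timestamp, is also an assumption.

```python
def event_rank(count, span_seconds, k, c1=0.1, c2=100.0, c3=2.0):
    """Compute R = c1*C + c2*(1/T) + c3*K.

    The default coefficients are illustrative placeholders.
    """
    # Assumption: clamp T to one second so that 1/T is defined when
    # every message was published at the same moment.
    inverse_span = 1.0 / max(span_seconds, 1.0)
    return c1 * count + c2 * inverse_span + c3 * k


# 10 messages spread over 100 seconds with K = 0.5, using explicit
# coefficients so the arithmetic is easy to follow: 10 + 1 + 1.
rank = event_rank(10, 100.0, 0.5, c1=1.0, c2=100.0, c3=2.0)
```

More messages, a shorter time span and a higher K all push R up. The sketch stops at R; converting it to the 1 to 10 rating is left out.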

A full working service that uses this algorithm is at News4Trader.com

Last Updated on Tuesday, 23 March 2010 15:23