Roman Gelembjuk
SMEStorage php sample

I have written some libraries for the SMEStorage.com API. I have also communicated with programmers who used the SMEStorage API, and I found that the most difficult thing is getting started with it: for example, creating a folder or uploading a file. Once these first tasks are done, the next steps are easy.

Here I will describe how to create a folder in SMEStorage with PHP.
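The full walkthrough is in the post itself; as a rough sketch of the idea, here is a folder-creation request sent with PHP cURL. Note that the endpoint path, parameter names, and token handling below are hypothetical placeholders, not the documented SMEStorage API.

<?php
// Rough sketch of creating a folder through an HTTP API with PHP cURL.
// NOTE: the endpoint path, the parameter names and the token below are
// hypothetical placeholders, not the documented SMEStorage API calls.

$apiBase = 'https://smestorage.com/api'; // placeholder base URL
$token   = 'YOUR_AUTH_TOKEN';            // placeholder auth token

$ch = curl_init($apiBase . '/createfolder'); // placeholder method name
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
    'token'      => $token,
    'foldername' => 'My New Folder',
    'parent'     => '/',                 // create inside the root folder
)));

$response = curl_exec($ch);
if ($response === false) {
    die('Request failed: ' . curl_error($ch));
}
curl_close($ch);

echo $response; // the real API answer (XML or JSON) would be parsed here
?>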

Last Updated on Friday, 12 February 2010 12:46

Extracting useful content from Web-resource. Technologies review.

This is a review of techniques for extracting important content from web resources.

It is no secret that Internet giants like Google, Microsoft, and Yahoo already have powerful technologies that can extract the important content from web pages. But big corporations are not interested in publishing their technologies.

If we try to find information on this question, we find two types of resources: scientific publications, and blog posts and articles by individuals who have built small tools for their own needs.

Let us see what information is available today on the subject "Extracting useful (important) information from web pages".

Last Updated on Wednesday, 21 April 2010 14:22

Getting keywords of the text

I have written a simple Perl script that extracts keywords from a text.

The tool is available to play with at http://gelembjuk.com/cgi-bin/science/textmining/1/keywords.pl

This tool extracts all the words from a text, counts how many times each word is repeated, and then keeps the most frequently used words as keywords. The words are also filtered: some common words are dropped from the list.
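The tool itself is a Perl script; purely to illustrate the counting approach described above, here is a sketch in PHP. The stop word list and the word length threshold are my assumptions, not the ones used by the actual script.

<?php
// Sketch of the approach described above: count word repetitions,
// drop common (stop) words, keep the most frequent words as keywords.
// The stop word list and thresholds are assumptions for illustration.

function extract_keywords($text, $limit = 10)
{
    $stopWords = array('the', 'a', 'an', 'and', 'or', 'of', 'to', 'in',
                       'is', 'are', 'it', 'that', 'this', 'for', 'on', 'with');

    // split the text into lowercase words
    preg_match_all('/[a-z]+/', strtolower($text), $matches);

    // remember how many times each word repeats, skipping stop words
    $counts = array();
    foreach ($matches[0] as $word) {
        if (strlen($word) < 3 || in_array($word, $stopWords)) {
            continue;
        }
        $counts[$word] = isset($counts[$word]) ? $counts[$word] + 1 : 1;
    }

    // leave only the most used words as keywords
    arsort($counts);
    return array_slice(array_keys($counts), 0, $limit);
}

print_r(extract_keywords('Text mining is the process of deriving useful ' .
                         'information from text. Text mining tools count words.'));
?>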

Last Updated on Friday, 19 February 2010 14:27

How to extract useful content from HTML page

One of the most important tasks in information retrieval science is extracting useful content from Web pages.

The problem is that Web pages (usually HTML) contain the information (text) we want along with tons of additional material: navigation structures, advertising, etc. From the point of view of information retrieval tools, each HTML page has its main (useful) content plus auxiliary content that is helpful when browsing the Web, but not when extracting the data.

The task is to filter the useful content out of the HTML without knowing the structure of the page.

It is not difficult to get the useful content when the structure of the HTML page is known, for example by using WebScraper. But it is not possible to create templates or regular expressions for every page on the Web.
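For the known-structure case, a "template" can be as simple as one XPath query. Here is a minimal sketch using PHP's DOMDocument; the URL and the div id are assumptions about one particular page, which is exactly why this approach does not scale to the whole Web.

<?php
// The "known structure" case: when we know where a site keeps its
// article text, a single XPath query extracts it. The URL and the
// div id here are assumptions about one page, not a universal rule.

$html = file_get_contents('http://example.com/article.html');

$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings caused by real-world markup

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="content"]'); // the per-site "template"

if ($nodes->length > 0) {
    echo trim($nodes->item(0)->textContent);    // the main content only
} else {
    echo "Template did not match: the page has a different structure.\n";
}
?>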

Last Updated on Thursday, 18 March 2010 15:05