Welcome to Roman Gelembjuk's blog
How to extract useful content from an HTML page

One of the most important tasks in information retrieval is extracting useful content from Web pages.

The problem is that Web pages (usually HTML) surround the main information (text) with tons of additional text: navigation structures, advertising, etc. From the point of view of information retrieval tools, each HTML page has its main (useful) content plus auxiliary information that is helpful when browsing the Web but not when extracting data.

The task is to filter the useful content out of the HTML without knowing the structure of the page.

It is not difficult to get the useful content when the structure of the HTML page is known, for example by using WebScraper. But it is not possible to create templates or regular expressions for every page on the Web.
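
To make the task concrete, here is a minimal Python sketch of one common structure-agnostic heuristic: split the page into text blocks, then keep only the blocks that are reasonably long and are not mostly link text. This is only an illustration, not necessarily the method the full post describes. The names `BlockCollector` and `extract_main_text` and the thresholds `min_chars` and `max_link_density` are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries, and tags whose content is never text.
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section"}
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Collects text per block and counts how much of it sits inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished blocks as (text, link_chars)
        self._text = []        # text pieces of the block being built
        self._link_chars = 0   # characters of the current block inside <a>
        self._in_link = 0
        self._skip = 0

    def flush(self):
        """Close the current block and store it if it has any text."""
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(" ".join(data.split()))


def extract_main_text(html, min_chars=80, max_link_density=0.3):
    """Return blocks that are long enough and not dominated by link text."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush()
    kept = [
        text
        for text, link_chars in collector.blocks
        if len(text) >= min_chars and link_chars / len(text) <= max_link_density
    ]
    return "\n\n".join(kept)
```

Navigation bars and advertising blocks tend to be short and link-heavy, so they fall below the thresholds, while article paragraphs pass. The threshold values are guesses and would need tuning on real pages.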

Last Updated on Thursday, 18 March 2010 15:05
How I edited PHP files on a remote FTP server online

Recently I was away from both home and the office, and I had to make a quick bug fix on one of my sites.

I got the chance to use a friend's notebook. But it had no PHP editor or even a good FTP client, and I was not able to install any software on it.

I found a nice solution: editing my code online with the SMEStorage service.

Last Updated on Monday, 01 March 2010 15:03
Extracting data from HTML pages with Perl

Extracting data from HTML pages is a procedure used in more and more applications. Usually, coders use regular expressions to get data from an HTML page. This is not a good solution when a complex data structure needs to be extracted. Such regular expressions are usually long and difficult to understand when they need to be changed later or when another coder wants to modify them.

I found another solution. I have created a Perl module that extracts data using a template. The template is HTML code with special tags and attributes in place of the data to extract.
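
The Perl module and its actual template syntax are not shown in this excerpt. As a rough illustration of the general idea only, here is a Python sketch in which the template is a fragment of HTML with placeholders where the data sits. The `{{name}}` placeholder syntax and the functions `compile_template` and `extract_records` are made up for this example and are not the module's API. The sketch compiles the template into a regular expression internally, so the person maintaining it edits readable HTML rather than the pattern itself.

```python
import re


def compile_template(template):
    """Turn an HTML template with {{name}} placeholders into a compiled regex.

    The {{name}} syntax is hypothetical; literal HTML must match exactly.
    """
    # re.split with a capture group alternates: literal, name, literal, name, ...
    parts = re.split(r"\{\{(\w+)\}\}", template)
    pattern = []
    for index, part in enumerate(parts):
        if index % 2:
            # Odd items are placeholder names: capture lazily under that name.
            pattern.append(f"(?P<{part}>.*?)")
        else:
            # Even items are literal HTML from the template.
            pattern.append(re.escape(part))
    return re.compile("".join(pattern), re.DOTALL)


def extract_records(template, html):
    """Return one dict per place in the HTML where the template matches."""
    return [match.groupdict() for match in compile_template(template).finditer(html)]


if __name__ == "__main__":
    template = '<li><a href="{{url}}">{{title}}</a></li>'
    page = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>'
    for record in extract_records(template, page):
        print(record["url"], record["title"])
```

Because literal parts are matched exactly, this sketch breaks on small whitespace or attribute-order differences. A real template engine would need to compare parsed HTML structure instead.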

Last Updated on Friday, 26 February 2010 08:32
SMEStorage Joomla component

I have released the SMEStorage Joomla component. It is a tool for creating a file download directory on a Joomla site.

The component acts like a mirror of an SMEStorage folder. It shows folders and files and allows the files to be downloaded.

Files are stored with SMEStorage. SMEStorage allows files to be stored in different clouds: Amazon S3, Gmail, Google Docs, FTP, WebDAV, iDisc and more.

Last Updated on Friday, 19 February 2010 14:26
Uploading a file to SMEStorage

In this post I will describe a PHP sample that uploads a file to SMEStorage (http://www.smestorage.com).

In the previous post I wrote about how the SMEStorage API works in general and described how to create a folder.

Now I will extend that sample and add a function for uploading a file.

Last Updated on Thursday, 18 February 2010 14:43