Split text into sentences with PHP

Splitting text into sentences is a function needed in many applications.

Here I will describe how to split text into sentences with PHP.

 

I usually use the following function to split text into sentences:

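function toxttosentence($text){

    // ereg_replace() and split() were removed in PHP 7,
    // so str_replace(), preg_replace() and explode() are used instead.
    $text = str_replace("\t", "", $text);                // drop tabs
    $text = preg_replace("/\n\s+\n/", "\n\n", $text);    // empty out whitespace-only lines
    $text = preg_replace("/\n{3,}/", "\n\n", $text);     // collapse runs of blank lines

    $sentences = array();
    foreach (explode("\n\n", $text) as $b) {             // one paragraph at a time
        $b = preg_replace("/https?:\/\/(.*?)[\s\)]/", "", $b);   // strip URLs inside the text
        $b = preg_replace("/https?:\/\/([^\s]*?)$/", "", $b);    // strip a URL at the end
        $b = preg_replace("/\[\s*[0-9]*\s*\]/", "", $b);         // strip references such as [12]
        foreach (explode('.', $b) as $sent) {
            if (strlen(trim($sent)) > 3) {               // skip fragments that are too short
                $sent = preg_replace("/\n/", " ", $sent);
                $sent = trim(preg_replace("/ {2,}/", " ", $sent));
                $sentences[] = ucfirst($sent) . '.';     // capitalize and restore the period
            }
        }
    }

    return $sentences;
}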
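
Note that the function splits on every period, so an abbreviation such as "Dr." ends up in a fragment of its own. The following is only a minimal sketch of one possible workaround, not part of the function above: the helper name split_sentences_keep_abbr() and the short abbreviation list are assumptions made for illustration. It hides the period after each listed abbreviation behind a placeholder character, splits on the remaining periods, and then restores the hidden ones.

```php
<?php
// Hypothetical helper (not from the original article): splits on periods
// but keeps a few known abbreviations attached to the words that follow.
function split_sentences_keep_abbr($text, $abbreviations = array('Dr', 'Mr', 'Mrs', 'Ms', 'Prof'))
{
    // Placeholder character assumed not to occur in normal text.
    $placeholder = "\x01";

    // Protect the period that follows each known abbreviation.
    foreach ($abbreviations as $abbr) {
        $pattern = '/\b' . preg_quote($abbr, '/') . '\./';
        $text = preg_replace($pattern, $abbr . $placeholder, $text);
    }

    $sentences = array();
    foreach (explode('.', $text) as $sent) {
        $sent = trim($sent);
        if (strlen($sent) > 3) {   // same length check as the function above
            // Restore the protected periods and re-add the final one.
            $sentences[] = str_replace($placeholder, '.', $sent) . '.';
        }
    }
    return $sentences;
}

$result = split_sentences_keep_abbr("Dr. Smith lives in Boston. He works with Prof. Jones.");
print_r($result);
// $result is array("Dr. Smith lives in Boston.", "He works with Prof. Jones.")
```

This only covers abbreviations that are in the list. Forms with inner periods such as U.S.A are still split and would need their own handling.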

Last Updated on Friday, 19 March 2010 11:43
 

Comments  

 
#4 amal 2012-08-26 16:29
very nice and very useful article thank you
 
 
#3 Liam ONeill 2011-08-22 14:45
Ah, OK. I'm not too good with regex, but I guess the best way would be to check the length of the previous sentence: if it is less than, say, 2 words, it couldn't possibly be a sentence. Not totally foolproof, but it might avoid the most basic punctuation problems. I have spoken with my client and she is happy to ensure that all text inputted has at the end of each sentence (not written out like that. Just like this. Before the next sentence starts but I was hoping to find a solution
 
 
#2 Administrator 2011-08-22 14:38
Quoting Liam ONeill:
Does this function take into account things like Dr. Smith and U.S.A or is it just looking for periods?

As a side comment it really gets up my nose when I am not allowed to put ' on my name. My name is O'Neill but your form won't allow it in the name field. LOL

No. It uses the period as the sentence separator.
 
 
#1 Liam ONeill 2011-08-22 14:32
Does this function take into account things like Dr. Smith and U.S.A or is it just looking for periods?

As a side comment it really gets up my nose when I am not allowed to put ' on my name. My name is O'Neill but your form won't allow it in the name field. LOL