Generating Unique Text with Markov chains

A few years ago I wrote a Joomla component called AutoContent. It generates new unique content and adds it to a Joomla site. In general, this is not a good tool, because it is a black-hat SEO technique: if a search engine (e.g. Google) finds autogenerated text that makes no sense, the site can be banned.

I decided not to support that component anymore, but it contains some interesting classes.

One of them is a PHP class that generates new unique text with Markov chains.

From Wikipedia: …a Markov chain is a stochastic process with the Markov property … [which means] state changes are probabilistic, and the future state depends on the current state only.
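
To make the definition more concrete, here is a small sketch that estimates the transition probabilities of a chain whose states are words. The function name transitionProbabilities and the sample words are made up for this illustration; they are not part of the AutoContent component.

```php
<?php
// Hypothetical helper (not from AutoContent): for every word, estimate the
// probability of each word that directly follows it in the sequence.
function transitionProbabilities(array $words) {
    // Count how often each word is followed by each other word.
    $counts = array();
    for ($i = 0; $i < count($words) - 1; $i++) {
        $current = $words[$i];
        $next = $words[$i + 1];
        if (!isset($counts[$current][$next])) {
            $counts[$current][$next] = 0;
        }
        $counts[$current][$next]++;
    }

    // Turn the counts into probabilities that sum to 1 for each word.
    $probs = array();
    foreach ($counts as $current => $followers) {
        $total = array_sum($followers);
        foreach ($followers as $next => $n) {
            $probs[$current][$next] = $n / $total;
        }
    }
    return $probs;
}

// Sample data, chosen only for the demonstration.
$probs = transitionProbabilities(array('a', 'b', 'a', 'c', 'a', 'b'));
print_r($probs['a']);
```

In this sample, "a" is followed by "b" twice and by "c" once, so the estimate for "b" after "a" is 2/3. The table is indexed by the current word only, which is exactly the Markov property: earlier words do not influence the choice.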

Markov chains have various uses, but one of the best known on the web is generating new text from some existing text.

The algorithm is:

  1. Parse the given text into words, save the words to a list, and for each word record which words follow it in the text.
  2. Choose a starting word from the text. This word describes the current state of the Markov chain.
  3. Get the next word using the recorded relations: look up which words follow the current word in the original text and choose one of them at random. The chosen word becomes the current state.
  4. Repeat step 3 until text of the required size is generated.
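
The steps above can be sketched in a few lines for the simplest case, where the state is a single word. This is only a minimal illustration, not the code of the component: the names buildRelations and generateText are invented here, and the sketch stops early if it reaches a word that has no recorded follower.

```php
<?php
// Step 1 (hypothetical helper): map every word to the list of words that
// follow it in the text. Repeated followers are kept, so a frequent follower
// is proportionally more likely to be chosen later.
function buildRelations($text) {
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $relations = array();
    for ($i = 0; $i < count($words) - 1; $i++) {
        $relations[$words[$i]][] = $words[$i + 1];
    }
    return $relations;
}

// Steps 2-4 (hypothetical helper): start from a random word that has at
// least one follower, then keep choosing a random follower of the current word.
function generateText(array $relations, $size) {
    if (count($relations) == 0) {
        return '';
    }
    $current = array_rand($relations);   // step 2: random starting state
    $result = array($current);
    while (count($result) < $size && !empty($relations[$current])) {
        $followers = $relations[$current];
        $current = $followers[array_rand($followers)];   // step 3
        $result[] = $current;
    }
    return join(' ', $result);
}

// Sample text, chosen only for the demonstration.
$relations = buildRelations('the cat sat on the mat and the cat ran');
echo generateText($relations, 8), "\n";
```

The output differs from run to run because both the starting word and every following word are chosen at random.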

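function generateNewSentences($sentences, $wordscount = 2) {
    $hash = array();             // prefix => list of words seen right after it
    $resultsentences = array();

    // Build the prefix table from the source sentences.
    for ($i = 0; $i < count($sentences); $i++) {
        $words = explode(' ', trim($sentences[$i]));

        for ($k = 0; $k < count($words) - $wordscount; $k++) {
            $prefix = trim(join(' ', array_slice($words, $k, $wordscount)));
            if ($prefix == '') continue;

            if (empty($hash[$prefix])) {
                $hash[$prefix] = array($words[$k + $wordscount]);

                // Collect followers of the same prefix from the later sentences.
                for ($j = $i + 1; $j < count($sentences); $j++) {
                    if (preg_match('/' . preg_quote($prefix, '/') . '(.*)$/', $sentences[$j], $m)) {
                        $w = explode(' ', trim($m[1]));

                        if (count($w) > 0 && trim($w[0]) != '')
                            array_push($hash[$prefix], trim($w[0]));
                    }
                }
            }
        }
    }

    $prefixes = array_keys($hash);

    // Prefixes whose first character is unchanged by strtoupper() can start a sentence.
    $stpr = array();
    foreach ($prefixes as $pr) {
        if ($pr[0] == strtoupper($pr[0]))
            array_push($stpr, $pr);
    }

    // Without any starting prefix nothing can be generated.
    if (count($stpr) == 0) return $resultsentences;

    // Generate one new sentence per source sentence.
    for ($i = 0; $i < count($sentences); $i++) {
        $p = $stpr[rand(0, count($stpr) - 1)];
        $sent = explode(' ', $p);
        $cc = count(explode(' ', $sentences[$i]));

        $j = 0;

        do {
            // Stop if the current prefix has no known follower.
            if (empty($hash[$p])) break;

            $w = $hash[$p][rand(0, count($hash[$p]) - 1)];
            array_push($sent, $w);
            $j++;

            // Slide the prefix window one word forward.
            $p = join(' ', array_slice($sent, $j, $wordscount));
        } while (substr($w, -1) != '.' && $j < $cc * 2);

        $sn = join(' ', $sent);
        if ($sn[strlen($sn) - 1] != '.') $sn .= '.';
        array_push($resultsentences, $sn);
    }
    return $resultsentences;
}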

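Above is the code of the function generateNewSentences, which generates new text with Markov chains. The function requires an array of sentences as its first argument. The optional second argument, $wordscount, sets how many words form a prefix (2 by default).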
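
Since the function expects sentences rather than raw text, the input has to be split first. The following is one possible way to do that, not the component's own code: the helper name splitIntoSentences and the sample text are assumptions, and the split rule (a period followed by whitespace) is deliberately naive.

```php
<?php
// Hypothetical helper (not from AutoContent): split text into sentences
// after every period that is followed by whitespace.
function splitIntoSentences($text) {
    $parts = preg_split('/(?<=\.)\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_map('trim', $parts);
}

// Sample text, chosen only for the demonstration.
$text = 'Markov chains are simple. They only look at the current state. '
      . 'Markov chains are useful for generating text.';
$sentences = splitIntoSentences($text);
print_r($sentences);

// generateNewSentences() is the function from this article. It is called
// only if it has been loaded together with this snippet.
if (function_exists('generateNewSentences')) {
    print_r(generateNewSentences($sentences, 2));
}
```

A rule this simple will also split after abbreviations such as "e.g.", so real input may need a more careful splitter.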

 

Last Updated on Friday, 19 March 2010 11:29
 
