List Info

Thread: note 70824 added to function.levenshtein




note 70824 added to function.levenshtein
user name
2006-10-30 15:03:25
Here is a small function to implement a fuzzy search.
If someone is wondering about the mathematics inside, this
information can be found in "A fast bit-vector
algorithm for approximate string matching based on dynamic
programming" by Gene Myers, May 27 1998

<?php
// This can search for PATTERN into HAYSTACK with MIST
mistaken symbols in PATTERN
// OK_MATCH is a function ($Position, $Score)
// Position is zero-based index in HAYSTACK, where PATTERN
is found,
// and Score is the number of mistaken symbols
function FuzzyLook($pattern,$haystack,$mist,$ok_match)
{
	$m = strlen($pattern);
	$n = strlen($haystack);
	if($n==0 OR $m==0) return;
	// Precompute Peq[Z]
  for($i=0; $i<32; $i++) $Bit[$i] = (1 << $i);
	for($i=0; $i<$m; $i++) $Peq[ ord(substr($pattern,$i,1))
] |= $Bit[$i];

	$Pv = 0xFFFFFFFF;
	$Mv = 0;
	$Score = $m;

	for($j=0; $j<$n; $j++)
	{
		$Eq = $Peq[ ord(substr($haystack,$j,1))];
		$Xv = $Eq | $Mv;
		$Xh = ((($Eq & $Pv) + $Pv) ^ $Pv) | $Eq;
		$Ph = $Mv | ~ ($Xh | $Pv);
		$Mh = $Pv & $Xh;
		if($Ph & (1 << ($m-1))) $Score += 1;
		elseif($Mh & (1 << ($m-1))) $Score -= 1;
		$Ph <<= 1;
		$Pv = ($Mh << 1) | ~ ($Xv | $Ph);
		$Mv = $Ph & $Xv;
		if($Score <= $mist && $j>=$m-1)
$ok_match($j-$m+1,$Score);
	}
}

?>
----
Server IP: 66.163.161.117
Probable Submitter: 85.187.59.16
----
Manual Page -- http://www.php.net/manual/en/function.levenshtein.php
Edit        -- https://master
.php.net/note/edit/70824
Del: integrated  -- h
ttps://master.php.net/note/delete/70824/integrated
Del: useless     -- http
s://master.php.net/note/delete/70824/useless
Del: bad code    -- htt
ps://master.php.net/note/delete/70824/bad+code
Del: spam        -- https:/
/master.php.net/note/delete/70824/spam
Del: non-english -- 
https://master.php.net/note/delete/70824/non-english
Del: in docs     -- http
s://master.php.net/note/delete/70824/in+docs
Del: other reasons-- https://mast
er.php.net/note/delete/70824
Reject      -- https://mast
er.php.net/note/reject/70824
Search      -- https://
master.php.net/manage/user-notes.php

-- 
PHP Notes Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub
.php

[1]

about | contact  Other archives ( Real Estate discussion Medical topics )