Here is a small function to implement a fuzzy search.
If someone is wondering about the mathematics inside, this
information can be found in "A fast bit-vector
algorithm for approximate string matching based on dynamic
programming" by Gene Myers, May 27 1998
<?php
// This can search for PATTERN into HAYSTACK with MIST
mistaken symbols in PATTERN
// OK_MATCH is a function ($Position, $Score)
// Position is zero-based index in HAYSTACK, where PATTERN
is found,
// and Score is the number of mistaken symbols
function FuzzyLook($pattern,$haystack,$mist,$ok_match)
{
$m = strlen($pattern);
$n = strlen($haystack);
if($n==0 OR $m==0) return;
// Precompute Peq[Z]
for($i=0; $i<32; $i++) $Bit[$i] = (1 << $i);
for($i=0; $i<$m; $i++) $Peq[ ord(substr($pattern,$i,1))
] |= $Bit[$i];
$Pv = 0xFFFFFFFF;
$Mv = 0;
$Score = $m;
for($j=0; $j<$n; $j++)
{
$Eq = $Peq[ ord(substr($haystack,$j,1))];
$Xv = $Eq | $Mv;
$Xh = ((($Eq & $Pv) + $Pv) ^ $Pv) | $Eq;
$Ph = $Mv | ~ ($Xh | $Pv);
$Mh = $Pv & $Xh;
if($Ph & (1 << ($m-1))) $Score += 1;
elseif($Mh & (1 << ($m-1))) $Score -= 1;
$Ph <<= 1;
$Pv = ($Mh << 1) | ~ ($Xv | $Ph);
$Mv = $Ph & $Xv;
if($Score <= $mist && $j>=$m-1)
$ok_match($j-$m+1,$Score);
}
}
?>
----
Server IP: 66.163.161.117
Probable Submitter: 85.187.59.16
----
Manual Page -- http://www.php.net/manual/en/function.levenshtein.php
Edit -- https://master
.php.net/note/edit/70824
Del: integrated -- h
ttps://master.php.net/note/delete/70824/integrated
Del: useless -- http
s://master.php.net/note/delete/70824/useless
Del: bad code -- htt
ps://master.php.net/note/delete/70824/bad+code
Del: spam -- https:/
/master.php.net/note/delete/70824/spam
Del: non-english --
https://master.php.net/note/delete/70824/non-english
Del: in docs -- http
s://master.php.net/note/delete/70824/in+docs
Del: other reasons-- https://mast
er.php.net/note/delete/70824
Reject -- https://mast
er.php.net/note/reject/70824
Search -- https://
master.php.net/manage/user-notes.php
--
PHP Notes Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub
.php
|