List Info

Thread: Re: Fast way to find string in a file ?




Re: Fast way to find string in a file ?
country flaguser name
United States
2007-05-25 19:34:05

I get a little annoyed that yahoogroups strips out the leading spaces
which then greatly reduces the readibility of the code.

Maybe producing ">" prefixes on each line by replying to my own
message will overcome this.
I won't make a habit of this, this is just a test.
If anyone knows a way to preserve indents in posted code please let
me know.

> Your basic loop to scan a file is pretty simple and
> will look like this:
>
> function StringInFile(strFind, strFileName: string): boolean;
> const
> BUFSIZE = 8192;
> var
> fstm: TFileStream;
> numread: Longint;
> buffer: array [0..BUFSIZE-1] of char;
> szFind: array [0..255] of char;
> found: boolean;
> begin
> StrPCopy(szFind, strFind);
> found := False;
> fstm := TFileStream.Create(strFileName, fmOpenRead);
> repeat
> numread := fstrm.Read(Buffer, BUFSIZE);
> if BMFind(szFind, Buffer, numread) >= 0 then
>; found := True
>; else if numread = BUFSIZE then // more to scan
>; fstm.Position := fstmPosition - (Length(strFind)-1);
> until found or (numread < BUFSIZE);
> fstm.Free;
> Result := found;
&gt; end;
>;
> 1) The reason for backing up fstm.Position by nearly the length
&gt; of strFind is in case strFind crosses buffer boundaries.
>
> 2) The "BMFind" function used above is a Boyer-Moore search
&gt; as shown below. This is the fastest string search known.
&gt;
> function BMFind(szSubStr, buf: PChar; iBufSize: integer): integer;
> { Returns -1 if substring not found,
&gt; or zero-based index into buffer if substring found }
> var
> iSubStrLen: integer;
> skip: array [char] of integer;
> found: boolean;
> iMaxSubStrIdx: integer;
> iSubStrIdx: integer;
> iBufIdx: integer;
> iScanSubStr: integer;
> mismatch: boolean;
> iBufScanStart: integer;
> ch: char;
&gt; begin
&gt; { Initialisations }
> found := False;
&gt; Result := -1;
> { Check if trivial scan for empty string }
> iSubStrLen := StrLen(szSubStr);
> if iSubStrLen = 0 then
>; begin
&gt; Result := 0;
> Exit
>; end;
>;
> iMaxSubStrIdx := iSubStrLen - 1;
> { Initialise the skip table }
> for ch := Low(skip) to High(skip) do skip[ch] := iSubStrLen;
> for iSubStrIdx := 0 to (iMaxSubStrIdx - 1) do
> skip[szSubStr[iSubStrIdx]] := iMaxSubStrIdx - iSubStrIdx;
>
> { Scan the buffer, starting comparisons at the end of the
substring }
> iBufScanStart := iMaxSubStrIdx;
> while (not found) and (iBufScanStart < iBufSize) do
> begin
&gt; iBufIdx := iBufScanStart;
> iScanSubStr := iMaxSubStrIdx;
> repeat
&gt; mismatch := (szSubStr[iScanSubStr] <> buf[iBufIdx]);
>; if not mismatch then
>; if iScanSubStr > 0 then
>; begin // more characters to scan
>; Dec(iBufIdx); Dec(iScanSubStr)
&gt; end
> else
>; found := True;
&gt; until mismatch or found;
&gt; if found then
>; Result := iBufIdx
> else
>; iBufScanStart := iBufScanStart + skip[buf[iBufScanStart]];
>; end;
>; end;
>;
> I have included a "wholeword_only&quot; flag n the BMFind below.
&gt; This confirms or rejects the "found" result, and will
>; cause the loop to keep searching if match is rejected.
>
> function BMFind(szSubStr, buf: PChar; iBufSize: integer;
> wholeword_only: boolean): integer;
> { Returns -1 if substring not found,
&gt; or zero-based index into buffer if substring found }
> var
> iSubStrLen: integer;
> skip: array [char] of integer;
> found: boolean;
> iMaxSubStrIdx: integer;
> iSubStrIdx: integer;
> iBufIdx: integer;
> iScanSubStr: integer;
> mismatch: boolean;
> iBufScanStart: integer;
> ch: char;
&gt; begin
&gt; found := False;
&gt; Result := -1;
> iSubStrLen := StrLen(szSubStr);
> if iSubStrLen = 0 then
>; begin
&gt; Result := 0;
> Exit
>; end;
>;
> iMaxSubStrIdx := iSubStrLen - 1;
> { Initialise the skip table }
> for ch := Low(skip) to High(skip) do skip[ch] := iSubStrLen;
> for iSubStrIdx := 0 to (iMaxSubStrIdx - 1) do
> skip[szSubStr[iSubStrIdx]] := iMaxSubStrIdx - iSubStrIdx;
>
> { Scan the buffer, starting comparisons at the end of the
substring }
> iBufScanStart := iMaxSubStrIdx;
> while (not found) and (iBufScanStart < iBufSize) do
> begin
&gt; iBufIdx := iBufScanStart;
> iScanSubStr := iMaxSubStrIdx;
> repeat
&gt; mismatch := (szSubStr[iScanSubStr] <> buf[iBufIdx]);
>; if not mismatch then
>; if iScanSubStr > 0 then
>; begin // more characters to scan
>; Dec(iBufIdx); Dec(iScanSubStr)
&gt; end
> else
>; found := True;
&gt; until mismatch or found;
&gt; if found and wholeword_only then
>; begin
&gt; if (iBufIdx > 0) then
>; found := not IsCharAlpha(buf[iBufIdx - 1]);
>; if found then
>; if iBufScanStart < (iBufSize - 1) then
>; found := not IsCharAlpha(buf[iBufScanStart + 1]);
>; end;
>; if found then
>; Result := iBufIdx
> else
>; iBufScanStart := iBufScanStart + skip[buf[iBufScanStart]];
>; end;
>; end;
>;
> Obviously you'll be tempted to increase BUFSIZE on the assumption
> that it will improve performance. My experience is that it does
>; not, and that 8K is pretty optimum.
>

__._,_.___
.

__,_._,___
Re: Re: Fast way to find string in a file ?
country flaguser name
United Kingdom
2007-05-25 22:57:46

Hi,

> If anyone knows a way to preserve indents
> in posted code please let me know.

They are preserved if you receive messages by email
and display them in plain text:

delphi-en%40yahoogroups.com">delphi-enyahoogroups.com

Using > is a bad idea because it makes it look as though
you are quoting someone else's code. And whatever you
use, it will only preserve a single space indent in HTML.

It's better to replace all multiple spaces with a visible
character such as ~.

function StringInFile(strFind, strFileName: string): boolean;
const
~ BUFSIZE = 8192;
var
~ fstm: TFileStream;
~ numread: Longint;
~ buffer: array [0..BUFSIZE-1] of char;
~ szFind: array [0..255] of char;
~ found: boolean;
begin
~ StrPCopy(szFind, strFind);
~ found := False;
~ fstm := TFileStream.Create(strFileName, fmOpenRead);
~ repeat
~~~ numread := fstrm.Read(Buffer, BUFSIZE);
~~~ if BMFind(szFind, Buffer, numread) >= 0 then
~~~~~ found := True
~~~ else if numread = BUFSIZE then // more to scan
~~~~~ fstm.Position := fstmPosition - (Length(strFind)-1);
~ until found or (numread < BUFSIZE);
...

regards,

Martin.

__._,_.___
.

__,_._,___
[1-2]

about | contact  Other archives ( Real Estate discussion Medical topics )