Thread: Re: Searching magma string indices




Re: Searching magma string indices
United States
2007-03-05 21:59:29
Hi Brent,

> > Now, the expression '%foo%bar%' goes beyond the first requirement
> > of just searching on a prefix and/or a suffix.  For this a
> > different type of index would be needed that adds every possible
> > value for "Fredfoon Tobart", the following values:
> >
> >     fredfoon tobart
> >     redfoon tobart
> >     edfoon tobart
> >     dfoon tobart
> >     foon tobart
> >     oon tobart
> >     on tobart
> >     n tobart
> >      tobart
> >     tobart
> >     obart
> >     bart
> >     art
> >     rt
> >     t
> >
> > Your '%foo%bar%' expression would then need to translate to
> >
> >     (familyName from: 'foo' to: 'foo' maAlphabeticalNext)
> >     & (familyName from: 'bar' to: 'bar' maAlphabeticalNext)
> >
> > to find the object(s) that had "Fredfoon Tobart".
> 
> I must admit I am not following you here. Is this one of those Magma
> keyword indices which I have never managed to grok?

Yes.  As you know, even though many indexes only map one value per
indexed attribute, any index can just as easily map multiple values per
object, per indexed attribute.

Have a look at MaClause>>#shouldInclude:using:, you can see very simply
how it treats multiple values; if any one of the (multiple) indexed
values for a particular object is "in range", then the entire object
qualifies for that clause.

It's more generic than a "keyword" index, but a keyword index is an
easy way to grok it, because you know when you search for a keyword,
any of the keywords assigned to an object will cause that object to
match.

So in the above example, searching the first clause

> >     (familyName from: 'foo' to: 'foo' maAlphabeticalNext)

is qualified by the 5th element

> >     foon tobart

and the second clause

> >     & (familyName from: 'bar' to: 'bar' maAlphabeticalNext)

is qualified by the 12th element:

> >     bart

therefore the entire expression is satisfied and the object
qualifies.
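
As a rough illustration of the suffix idea, here is a workspace sketch in
plain Squeak.  It does not use any Magma classes; suffixKeys and matches
are hypothetical names, and each range clause is modelled as a simple
prefix test, on the assumption that the range from 'foo' to
'foo' maAlphabeticalNext covers the strings that begin with 'foo'.

```smalltalk
"Plain Squeak workspace sketch; suffixKeys and matches are hypothetical names."
| name suffixKeys matches |
name := 'Fredfoon Tobart' asLowercase.

"Every suffix of the name, longest first: the multiple values indexed for one object."
suffixKeys := (1 to: name size) collect: [:i | name copyFrom: i to: name size].

"A clause qualifies if ANY of the indexed values begins with the fragment
 (assumed stand-in for the from:to: range test)."
matches := [:fragment | suffixKeys anySatisfy: [:each | each beginsWith: fragment]].

(matches value: 'foo') & (matches value: 'bar')   "true"
```

In this model the conjunction alone does not check that 'foo' comes
before 'bar'; a name containing 'bar' followed later by 'foo' would
satisfy the same two clauses.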

> Also, whilst you are hunting elephants, SQL has both % and ?
> wildcards: % is any sequence of characters including the empty string
> and ? is precisely one character.
> So foo??bar would match fooABbar but not fooCDEbar and not fooFbar.
>
> Any chance you could bend Magma's indices into managing expressions
> with a fixed number of ?s (e.g. ?foo????bar??baz???)

Yes, there are several ways to do this but here's one.  A new index
type that simply indexes a unique value for each character at a
particular position.  Returning to the previous example, for an object
whose #familyName attribute is "fredfoon tobart", this index type would
index the object at the following values:

  (256 * 1) + $f asciiValue
  (256 * 2) + $r asciiValue
  (256 * 3) + $e asciiValue
  (256 * 4) + $d asciiValue
  (256 * 5) + $f asciiValue
  (256 * 6) + $o asciiValue
  (256 * 7) + $o asciiValue
  (256 * 8) + $n asciiValue
  (256 * 9) + Character space asciiValue
  (256 * 10) + $t asciiValue
  (256 * 11) + $o asciiValue
  (256 * 12) + $b asciiValue
  (256 * 13) + $a asciiValue
  (256 * 14) + $r asciiValue
  (256 * 15) + $t asciiValue
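
The list above can be generated mechanically.  The following is only a
workspace sketch in plain Squeak (keys is a hypothetical name, not part
of Magma), assuming character codes below 256 so that keys from
different positions can never collide:

```smalltalk
"Plain Squeak workspace sketch; keys is a hypothetical name."
| name keys |
name := 'fredfoon tobart'.

"One key per character: (256 * position) + character code, positions counted from 1."
keys := (1 to: name size)
    collect: [:pos | (256 * pos) + (name at: pos) asciiValue].

keys first.   "358, i.e. (256 * 1) + $f asciiValue"
keys size.    "15, one key per character"
```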

Then, whenever a single-character-matching string is specified, every
character other than the question-marks becomes a conjunction in the
query.  If the user is searching:

  where: [ : p | p familyName like: '????foon toba??' ]

then it would have to be parsed into the following regular clause:

  (familyName = ((256*5) + $f asciiValue))
  & (familyName = ((256*6) + $o asciiValue))
  & (familyName = ((256*7) + $o asciiValue))
  & (familyName = ((256*8) + $n asciiValue))
  & (familyName = ((256*9) + Character space asciiValue))
  & (familyName = ((256*10) + $t asciiValue))
  & (familyName = ((256*11) + $o asciiValue))
  & (familyName = ((256*12) + $b asciiValue))
  & (familyName = ((256*13) + $a asciiValue))

Notice the question marks are the positions on which we are not
qualifying, so any value can be placed in positions 1-4 and 14-15.
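
To make the parsing step concrete, here is a workspace sketch in plain
Squeak.  It only models the idea with ordinary collections; wanted and
indexed are hypothetical names, and no actual Magma query is built.

```smalltalk
"Plain Squeak workspace sketch; wanted and indexed are hypothetical names."
| pattern name wanted indexed |
pattern := '????foon toba??'.
name := 'fredfoon tobart'.

"Keys demanded by the pattern: one per character that is not a question mark."
wanted := OrderedCollection new.
pattern doWithIndex: [:ch :pos |
    ch = $? ifFalse: [wanted add: (256 * pos) + ch asciiValue]].

"Keys under which the object would have been indexed."
indexed := (1 to: name size)
    collect: [:pos | (256 * pos) + (name at: pos) asciiValue].

wanted size.                                        "9, for positions 5 to 13"
wanted allSatisfy: [:key | indexed includes: key]   "true"
```

Note that this test says nothing about the overall length of the name:
a longer name with the same characters in positions 5 to 13 would pass
as well.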

Does this all make sense?

> It is too darn hot here to think.

Yeah, you're probably just already missing skiing down fresh powder..


_______________________________________________
Magma mailing list
Magmalists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/magma


Re: Searching magma string indices
South Africa
2007-03-07 10:10:36
Hi Chris,

Thanks for your suggestions: I think we are getting closer.
However:

> > Also, whilst you are hunting elephants, SQL has both % and ?
> > wildcards: % is any sequence of characters including the empty
> > string and ? is precisely one character.
> > So foo??bar would match fooABbar but not fooCDEbar and not fooFbar.
> >
> > Any chance you could bend Magma's indices into managing expressions
> > with a fixed number of ?s (e.g. ?foo????bar??baz???)
> 
> Yes, there are several ways to do this but here's one.  A new index
> type that simply indexes a unique value for each character at a
> particular position.  Returning to the previous example, for an
> object whose #familyName attribute is "fredfoon tobart", this index
> type would index the object at the following values:
> 
>   (256 * 1) + $f asciiValue
>   (256 * 2) + $r asciiValue
>   (256 * 3) + $e asciiValue
>   (256 * 4) + $d asciiValue
>   (256 * 5) + $f asciiValue
>   (256 * 6) + $o asciiValue
>   (256 * 7) + $o asciiValue
>   (256 * 8) + $n asciiValue
>   (256 * 9) + Character space asciiValue
>   (256 * 10) + $t asciiValue
>   (256 * 11) + $o asciiValue
>   (256 * 12) + $b asciiValue
>   (256 * 13) + $a asciiValue
>   (256 * 14) + $r asciiValue
>   (256 * 15) + $t asciiValue
> 
> Then, whenever a single-character-matching string is specified, every
> character other than the question-marks becomes a conjunction in the
> query.  If the user is searching:
>
>   where: [ : p | p familyName like: '????foon toba??' ]
>
> then it would have to be parsed into the following regular clause:
>
>   (familyName = ((256*5) + $f asciiValue))
>   & (familyName = ((256*6) + $o asciiValue))
>   & (familyName = ((256*7) + $o asciiValue))
>   & (familyName = ((256*8) + $n asciiValue))
>   & (familyName = ((256*9) + Character space asciiValue))
>   & (familyName = ((256*10) + $t asciiValue))
>   & (familyName = ((256*11) + $o asciiValue))
>   & (familyName = ((256*12) + $b asciiValue))
>   & (familyName = ((256*13) + $a asciiValue))
>
> Notice the question marks are the positions on which we are not
> qualifying, so any value can be placed in positions 1-4 and 14-15.

Note: SQL actually uses '_', not '?', for a single-character wildcard.
Sorry: I continue to use '?' in this thread for consistency.

This all works except for expressions of the form '???', i.e. where we
are searching for all strings with exactly three characters.
I have yet to think of how to include this boundary condition, except
that we could additionally index on the size of the strings. In this
case the MaHashIndex would include a record with key = 3 and
values = 'all strings of length three'.
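
A rough sketch of that size-keyed lookup, in plain Squeak: an ordinary
Dictionary stands in for the MaHashIndex here, and the sample names and
the variable bySize are made up purely for illustration.

```smalltalk
"Plain Squeak workspace sketch; a Dictionary stands in for the index,
 and the sample names are hypothetical."
| names bySize |
names := #('tob' 'fred' 'bar' 'fredfoon tobart').

"Group every string under its size, so key 3 holds all strings of length three."
bySize := Dictionary new.
names do: [:each |
    (bySize at: each size ifAbsentPut: [OrderedCollection new]) add: each].

"The pattern '???' becomes a single lookup on key 3."
(bySize at: 3 ifAbsent: [#()]) asArray   "#('tob' 'bar')"
```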

Finally, we also need to support expressions like '%foo%bar?b??t?'.
I am hoping your earlier suggestions on searching for %'s can be merged
into the same index as the ?s.


Any ideas?

Brent




