Thread: regex literals




regex literals
Switzerland
2008-02-13 02:58:40
I'm thinking of adding regex literals to GNU Smalltalk.  The only syntax
I found that would work is ##/regex/.  /regex/ wouldn't work for the old
syntax, because the lexer has no way to understand that the / in this
example

     a: b
         /regex/ printNl

starts a regex and is not a division operator.  It would work in the new
syntax (after one of [ ( { ^ . keyword: identifier binary-message, and
maybe a few more I forgot, / would start a regex, otherwise it would be
a division operator), but I don't like to add a feature that cannot be
ported to other Smalltalks.
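
[A minimal sketch of the context rule described above, written in Python
rather than Smalltalk so it can be run on its own.  The token grammar, the
names REGEX_STARTERS and classify_slashes, and the exact starter set are
assumptions made for illustration; they are not taken from the GNU
Smalltalk lexer.  The sketch leaves "identifier" out of the starter set,
because a / directly after an operand is read here as division.]

```python
import re

# Token kinds (or punctuation texts) after which "/" is read as a regex start.
# Hypothetical set, loosely following the list given in the message above.
REGEX_STARTERS = {"[", "(", "{", "^", ".", "keyword", "binary"}

TOKEN = re.compile(r"""
    (?P<keyword>[A-Za-z_]\w*:)        # keyword part such as at:
  | (?P<identifier>[A-Za-z_]\w*)      # plain identifier
  | (?P<number>\d+)                   # integer literal
  | (?P<punct>[\[\](){}^.])           # brackets, caret, statement dot
  | (?P<binary>[-+*/<>=~,]+)          # binary operator characters
""", re.VERBOSE)


def classify_slashes(source):
    """Tokenize source, deciding regex vs. division from the previous token."""
    tokens = []
    prev = None   # kind (or punctuation text) of the previous token
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        if source[pos] == "/" and (prev is None or prev in REGEX_STARTERS):
            # Regex context: everything up to the next "/" is the pattern.
            end = source.find("/", pos + 1)
            if end == -1:
                raise ValueError("unterminated regex literal")
            tokens.append(("regex", source[pos + 1:end]))
            prev = "regex"
            pos = end + 1
            continue
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r}")
        kind, text = match.lastgroup, match.group()
        tokens.append((kind, text))
        # Punctuation is remembered by its text so "]" and "[" can differ.
        prev = text if kind == "punct" else kind
        pos = match.end()
    return tokens
```

With this rule "[ /f./ printNl ]" yields a regex token, while the old-syntax
example "a: b /regex/ printNl" is read as two divisions, which is the
ambiguity the message describes.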

What do you think?  Right now I'm more for "no" or "not yet", but I'm
open to discussion.

Paolo


_______________________________________________
help-smalltalk mailing list
help-smalltalk@gnu.org

http://lists.gnu.org/mailman/listinfo/help-smalltalk

Re: regex literals
2008-02-13 03:11:46
On Wed, 13 Feb 2008 09:58:40 +0100
Paolo Bonzini <bonzini@gnu.org> wrote:

> I'm thinking of adding regex literals to GNU Smalltalk.  The only syntax
> I found that would work is ##/regex/.
> ...
> What do you think?  Right now I'm more for "no" or "not yet", but I'm
> open to discussion.

One thing I've seen with locale describing symbols in VisualWorks is
#"de_de.UTF-8", so going along with this approach something like
#/.../ makes sense.

s.


Re: regex literals
Switzerland
2008-02-13 03:38:35
>> I'm thinking of adding regex literals to GNU Smalltalk.  The only syntax
>> I found that would work is ##/regex/.
>> ...
>> What do you think?  Right now I'm more for "no" or "not yet", but I'm
>> open to discussion.
>
> One thing I've seen with locale describing symbols in VisualWorks is
> #"de_de.UTF-8"

Yes, that's #'de_de.UTF-8'.  It's supported in GNU Smalltalk too, for
"weird" symbols that are not valid Smalltalk message names.

> , so going along with this approach something like
> #/.../ makes sense.

Two hashes because #/ is already valid Smalltalk (the symbol for the
binary selector /).

Paolo


Re: regex literals
2008-02-13 04:00:31
On Wed, 13 Feb 2008 10:38:35 +0100
Paolo Bonzini <bonzini@gnu.org> wrote:

> > One thing I've seen with locale describing symbols in VisualWorks is
> > #"de_de.UTF-8"
>
> Yes, that's #'de_de.UTF-8'.

Obviously, I need Smalltalk syntax coloring for my mail client.

s.


Re: regex literals
Switzerland
2008-02-13 04:52:46
Tony Garnock-Jones wrote:
> Paolo Bonzini wrote:
>> I'm thinking of adding regex literals to GNU Smalltalk.
>
> I'd be against this.
>
>  'a.*b' asRegex
>
> to me seems better, and doesn't require any lexer/parser changes.

It's also slower, which is why as of today 'a.*b' works even without
sending #asRegex.

However, *always* treating string literals as regexes is going to give
problems in the long term.  In particular, it would break with another
extension that I was thinking about:

     #(1 3 2 6 5 4) select: #odd => #(1 3 5)
     #(1 12 2) select: (1 to: 10) => #(1 2)
     #('foo' 'bar') select: ##/f./ => #('foo')

This would be quite easily implemented (#select: would send a new
message to its argument, e.g. #~, instead of #value:).  If regexes were
implemented simply as strings, however, there would be a conflict
between the Collection example (second) and the regex example (third):

     'foo' select: 'aeiouy' => 'oo'
     #('foo') select: 'f.' => cannot make it return 'foo' as I'd like!

That's why in this case, simply using string literals as regexes
wouldn't work.  You would need to specify #asRegex to get the desired
behavior.
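
[The conflict can be sketched in Python, under stated assumptions: the
function accepts stands in for sending #~ to the argument of #select:, and
both accepts and select are hypothetical names invented for this
illustration.  A plain string has to behave as a collection of characters,
so only a distinct regex object can request pattern matching.]

```python
import re


def accepts(predicate, element):
    """Hypothetical stand-in for the message #select: would send to its argument."""
    if isinstance(predicate, re.Pattern):
        # A real regex object: test whether the element matches.
        return predicate.search(element) is not None
    if callable(predicate):
        # A block-like predicate: call it.
        return bool(predicate(element))
    # Anything else is treated as a collection, plain strings included.
    return element in predicate


def select(collection, predicate):
    """Keep the elements the predicate accepts; strings stay strings."""
    kept = [e for e in collection if accepts(predicate, e)]
    return "".join(kept) if isinstance(collection, str) else kept


print(select("foo", "aeiouy"))                    # string used as a set of characters
print(select(["foo"], "f."))                      # still membership, so nothing is kept
print(select(["foo", "bar"], re.compile("f.")))   # explicit regex object
```

The second call cannot return 'foo', because the string 'f.' has already
been given collection semantics by the first call.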

As I said, I'm also thinking "no"/"not yet".  It's not paramount: older
code would be unaffected, and I could start implementing the above
(which is not happening any time soon), and then see if it is a problem.
Just, there *might* be one.

Paolo


Re: regex literals
United Kingdom
2008-02-13 05:20:30
Paolo Bonzini wrote:
> It's also slower, which is why as of today 'a.*b' works even without
> sending #asRegex.

Slower because of repeated sends of asRegex?

I'd rather see new syntax for compile-time evaluation in literal
position, instead of specialised syntax for regex literals.

##('a.*b' asRegex)

> However, *always* treating string literals as regexes is going to give
> problems in the long term.

Agreed. Type-punning is often not a great idea.

Regards,
  Tony



Re: regex literals
Switzerland
2008-02-13 05:27:31
>> It's also slower, which is why as of today 'a.*b' works even without
>> sending #asRegex.
> 
> Slower because of repeated sends of asRegex?

Yes.  Or just because 1 send is already more than 0!

> I'd rather see new syntax for compile-time evaluation in literal
> position, instead of specialised syntax for regex literals.
>
> ##('a.*b' asRegex)

A bit verbose but yes, it is a possibility if performance is a concern.
And it works now.

Paolo


Re: regex literals
United Kingdom
2008-02-13 05:35:33
Paolo Bonzini wrote:
> A bit verbose but yes, it is a possibility if performance is a concern.
> And it works now.

Sorry? There's existing compile-time-eval syntax? Cool!

Tony



Re: regex literals
United States
2008-02-13 14:30:56
Paolo Bonzini <bonzini@gnu.org> writes:
> However, *always* treating string literals as regexes is going to give
> problems in the long term.  In particular, it would break with another
> extension that I was thinking about:
>
>     #(1 3 2 6 5 4) select: #odd => #(1 3 5)

This is sort of in the Presource test suite:

#(1 3 2 6 5 4) select: #odd sendingBlock
  -| #(1 3 2 6 5 4) select: [:gensym | gensym odd]
  => #(1 3 5)

>     #(1 12 2) select: (1 to: 10) => #(1 2)

I would not use that

>     #('foo' 'bar') select: ##/f./ => #('foo')
>
> This would be quite easily implemented (#select: would send a new
> message to its argument, e.g. #~, instead of #value.

I would rather have a generalization of the sendingBlock protocol to
send explicitly, perhaps with this extension (because with literals,
there's no chance for confusion):

Eval [NoCandy.MyCodeMindset installIn: Namespace current]

NoCandy.Presrc.MessageMacro subclass: SelectLiteralBlocks [
    <pool: NoCandy.Presrc>      "eh?"

    "Obviously you would memoize this result"
    SelectLiteralBlocks class >> inlinableActions [
        "Since, for all these cases, the standard #select: semantics
         *obviously* aren't useful"
        ^{'`x to: `y' -> '[:`g1 | `g1 between: `x and: `y]'
          -> [:m |
              ((m atAll: #('`x' '`y')) allSatisfy: [:each |
                   each isLiteral and: [each value isInteger]])
                  ifTrue: [{'`g1' -> self newVariable}]].
          '`x' -> '[:`g1 | `g1 `sel]'
          -> [:m | | sel |
              sel := m at: '`x'.
              {sel isLiteral.
               sel value isSymbol.
               sel value numArgs = 0}
                  condEvery ifTrue: [{'`g1' -> self newVariable.
                                      #'`sel' -> sel value}]].
          '`x' -> '[:`g1 | `g1 ~ `x]'
          -> [:m | | x |
              x := m at: '`x'.
              (x isLiteral and: [x value isRegex])
                  ifTrue: [{'`g1' -> self newVariable}]].
          } collect: [:triplet |
             {CodeTemplate fromExpr: triplet key key.
              CodeTemplate fromExpr: triplet key value.
              triplet value}]
    ]

    expandMessage: sel to: rcv withArguments: args [
        | filter |
        filter := args first.
        self class inlinableActions do: [:triplet | | match expand test |
            match := triplet first. expand := triplet second.
              test := triplet third.
            (match match: filter) ifNotNil: [:pm |
                (test value: pm) ifNotNil: [:xtn |
                    xtn do: [:each | pm add: each].
                    ^STInST.RBMessageNode
                        receiver: rcv
                        selector: sel
                        arguments: {expand expand: pm}]]].
        ^self forgoExpansion
    ]
]

#(1 12 2) select: (1 to: 10)
  -| #(1 12 2) select: [:gensym | gensym between: 1 and: 10]
  => #(1 2)

On a side note, with Unicode, #∋ would be a good name for #~, or maybe
#includes:

-- 
But you know how reluctant paranormal phenomena are to reveal
themselves when skeptics are present. --Robert Sheaffer, SkI 9/2003


regex literals
Switzerland
2008-02-14 02:18:29
> This is sort of in the Presource test suite:
> 
> #(1 3 2 6 5 4) select: #odd sendingBlock
>   -| #(1 3 2 6 5 4) select: [:gensym | gensym odd]
>   => #(1 3 5)
> 
>>     #(1 12 2) select: (1 to: 10) => #(1 2)
> 
> I would not use that 

Note that it's just a special case of Collections:

	'foobar' select: 'aeiou' => 'ooa'

In fact, "#(1 1.2 2) select: (1 to: 10)" would *not* include 1.2 in the
result.

My desire is to allow the common idea of "select: #odd" without
implementing Symbol>>#value:.  I see no need to implement #sendingBlock
(all this IMHO of course) if you reason that:

1) right now, #select: and #collect: have the same "protocol" for the
argument, but the two are very different.  In the case of
#select:/#reject: the argument should return true/false for any
collection; for #collect: instead the argument should return an object
in the same domain as the source.

Taking an extreme position: #value: is the most overloaded method in
Smalltalk and the less you use it, the better.  (Because then you
can achieve more polymorphism and more DWIM.)

2) therefore, I decide that #select: (and #reject:) accept a different
thing than a block, a "predicate".  A predicate can be a unary block of
course, but also a symbol, a regex, a collection, ...  I chose #~ as the
message that the predicate protocol would implement because it's what we
use for regexes, but it's not necessary to implement it with that name
(also because we currently have "aString ~ aRegex", not the other way
round).

3) the same could apply to #collect:, but with a *different* message to
emphasize that the argument is not a "predicate", it is an "xyz" (name
to be decided; I didn't find any good one).  I don't have very strong
ideas on how to call the message, but it also could apply to symbols,
regexes and collections: for example

   #('1.2' '3.4') collect: #allButLast => #('1.' '3.')

   #('1.2' '3.4') collect: '^.*\.' asRegex => #('1.' '3.')
   #('1.2' '3.4') collect: '\.(.*)' asRegex => #('2' '4')

   #('foo' 'bar') collect: #(1 3) => #('fo' 'br')
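
[A Python sketch of how such a #collect: argument could be dispatched.
The names transform and collect are hypothetical, and the regex behaviour
(first group if the pattern has one, otherwise the whole match) is only
inferred from the examples above, not from any existing implementation.
The patterns below assume the dots in the examples are escaped.]

```python
import re


def transform(mapper, element):
    """Hypothetical counterpart of the not-yet-named message #collect: would send."""
    if isinstance(mapper, re.Pattern):
        found = mapper.search(element)
        if found is None:
            return None
        # Assumption: a pattern with a group yields that group, else the match.
        return found.group(1) if mapper.groups else found.group(0)
    if callable(mapper):
        # Block-like or selector-like mapper: apply it to the element.
        return mapper(element)
    # Otherwise a collection of 1-based indices into the element.
    return "".join(element[i - 1] for i in mapper)


def collect(collection, mapper):
    """Map every element through the mapper, whatever kind of object it is."""
    return [transform(mapper, e) for e in collection]


print(collect(["1.2", "3.4"], lambda s: s[:-1]))          # like #allButLast
print(collect(["1.2", "3.4"], re.compile(r"^.*\.")))      # whole match
print(collect(["1.2", "3.4"], re.compile(r"\.(.*)")))     # first group
print(collect(["foo", "bar"], [1, 3]))                    # characters 1 and 3
```

Keeping this dispatch separate from the predicate one reflects the point
in item 1: the #select: argument answers true or false, while the
#collect: argument answers a new object.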

> NoCandy.Presrc.MessageMacro subclass: SelectLiteralBlocks [
>     <pool: NoCandy.Presrc>      "eh?"

You mean <import: ...> here?

> On a side note, with Unicode, #∋ would be a good name
for #~, or maybe
> #includes: 

Now what Unicode symbols would be binary messages, and which would be
okay for identifiers/keywords?

Paolo

