Thread: regex literals
| regex literals |
  Switzerland |
2008-02-13 02:58:40 |
I'm thinking of adding regex literals to GNU Smalltalk. The only syntax
I found that would work is ##/regex/. /regex/ wouldn't work for the old
syntax, because the lexer has no way to understand that the / in this
example

    a: b
        /regex/ printNl

starts a regex and is not a division operator. It would work in the new
syntax (after one of [ ( { ^ . keyword: identifier binary-message, and
maybe a few more I forgot, / would start a regex, otherwise it would be
a division operator), but I don't like to add a feature that cannot be
ported to other Smalltalks.

What do you think? Right now I'm more for "no" or "not yet", but I'm
open to discussion.

Paolo
_______________________________________________
help-smalltalk mailing list
help-smalltalk gnu.org
http://lists.gnu.org/mailman/listinfo/help-smalltalk
| Re: regex literals |

|
2008-02-13 03:11:46 |
On Wed, 13 Feb 2008 09:58:40 +0100
Paolo Bonzini <bonzini gnu.org> wrote:
> I'm thinking of adding regex literals to GNU Smalltalk. The only syntax
> I found that would work is ##/regex/.
> ...
> What do you think? Right now I'm more for "no" or "not yet", but I'm
> open to discussion.

One thing I've seen with locale describing symbols in VisualWorks is
#"de_de.UTF-8", so going along with this approach something like
#/.../ makes sense.

s.
| Re: regex literals |
  Switzerland |
2008-02-13 03:38:35 |
>> I'm thinking of adding regex literals to GNU Smalltalk. The only syntax
>> I found that would work is ##/regex/.
>> ...
>> What do you think? Right now I'm more for "no" or "not yet", but I'm
>> open to discussion.
>
> One thing I've seen with locale describing symbols in VisualWorks is
> #"de_de.UTF-8"

Yes, that's #'de_de.UTF-8'. It's supported in GNU Smalltalk too, for
"weird" symbols that are not valid Smalltalk message names.

> , so going along with this approach something like
> #/.../ makes sense.

Two hashes because #/ is valid Smalltalk.

Paolo
| Re: regex literals |

|
2008-02-13 04:00:31 |
On Wed, 13 Feb 2008 10:38:35 +0100
Paolo Bonzini <bonzini gnu.org> wrote:
> > One thing I've seen with locale describing symbols in VisualWorks is
> > #"de_de.UTF-8"
>
> Yes, that's #'de_de.UTF-8'.

Obviously, I need Smalltalk syntax coloring for my mail client

s.
| Re: regex literals |
  Switzerland |
2008-02-13 04:52:46 |
Tony Garnock-Jones wrote:
> Paolo Bonzini wrote:
>> I'm thinking of adding regex literals to GNU Smalltalk.
>
> I'd be against this.
>
>     'a.*b' asRegex
>
> to me seems better, and doesn't require any lexer/parser changes.

It's also slower, which is why as of today 'a.*b' works even without
sending #asRegex.

However, *always* treating string literals as regexes is going to give
problems in the long term. In particular, it would break with another
extension that I was thinking about:

    #(1 3 2 6 5 4) select: #odd     => #(1 3 5)
    #(1 12 2) select: (1 to: 10)    => #(1 12)
    #('foo' 'bar') select: ##/f./   => #('foo')

This would be quite easily implemented (#select: would send a new
message to its argument, e.g. #~, instead of #value:). If regexes were
implemented simply as strings, however, there would be a conflict
between the Collection example (second) and the regex example (third):

    'foo' select: 'aeiouy'    => 'oo'
    #('foo') select: 'f.'     => cannot make it return 'foo' as I'd like!

That's why in this case, simply using string literals as regexes
wouldn't work. You would need to specify #asRegex to get the desired
behavior.

As I said, I'm also thinking "no"/"not yet". It's not paramount: older
code would be unaffected, and I could start implementing the above
(which is not happening any time soon), and then see if it is a problem.
Just, there *might* be one.

Paolo
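
The predicate idea in the message above can be sketched roughly as follows. This is only an illustration in GNU Smalltalk's bracket syntax: the selectors #accepts: and #selectMatching: are hypothetical names chosen so that the sketch does not touch the existing #select: or String>>#~, and it assumes that Regex is a class in the Smalltalk namespace that can be extended like this.

```smalltalk
"Hypothetical sketch: #accepts: and #selectMatching: are not kernel
 methods. Each kind of predicate answers true or false for one element."

BlockClosure extend [
    accepts: anObject [ ^self value: anObject ]
]

Collection extend [
    "A collection used as a predicate accepts the elements it includes."
    accepts: anObject [ ^self includes: anObject ]

    selectMatching: aPredicate [
        ^self select: [:each | aPredicate accepts: each]
    ]
]

Symbol extend [
    "Symbol is itself a collection, so it overrides the rule above
     and sends itself as a unary message instead."
    accepts: anObject [ ^anObject perform: self ]
]

Regex extend [
    "Assumes the existing 'aString ~ aRegex' matching message."
    accepts: aString [ ^aString ~ self ]
]

"The elements 1, 3 and 5 are odd."
(#(1 3 2 6 5 4) selectMatching: #odd) printNl.

"Of 1, 12 and 2, only 1 and 2 lie in the interval."
(#(1 12 2) selectMatching: (1 to: 10)) printNl.

"An explicit regex is matched against each string."
(#('foo' 'bar') selectMatching: 'f.' asRegex) printNl.

"A plain string is just a collection of characters, so no element matches."
(#('foo' 'bar') selectMatching: 'f.') printNl.
```

The last two expressions show the conflict described in the message: as long as a bare string literal behaves as a collection, it cannot also act as a regex predicate, so the regex has to be marked explicitly in some way.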
| Re: regex literals |
  United Kingdom |
2008-02-13 05:20:30 |
Paolo Bonzini wrote:
> It's also slower, which is why as of today 'a.*b' works even without
> sending #asRegex.

Slower because of repeated sends of asRegex?

I'd rather see new syntax for compile-time evaluation in literal
position, instead of specialised syntax for regex literals.

    ##('a.*b' asRegex)

> However, *always* treating string literals as regexes is going to give
> problems in the long term.

Agreed. Type-punning is often not a great idea.

Regards,
Tony
| Re: regex literals |
  Switzerland |
2008-02-13 05:27:31 |
>> It's also slower, which is why as of today 'a.*b' works even without
>> sending #asRegex.
>
> Slower because of repeated sends of asRegex?

Yes. Or just because 1 send is already more than 0!

> I'd rather see new syntax for compile-time evaluation in literal
> position, instead of specialised syntax for regex literals.
>
>     ##('a.*b' asRegex)

A bit verbose but yes, it is a possibility if performance is a concern.
And it works now.

Paolo
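
As a small illustration of the two spellings discussed above, the following sketch puts them side by side. The class Matcher and both selectors are hypothetical names made up for the example; the only things taken from the thread are 'a.*b' asRegex, the ##( ... ) compile-time form, and the "aString ~ aRegex" matching message.

```smalltalk
"Sketch only: Matcher and its two selectors are hypothetical names."
Object subclass: Matcher [
    matchesEachTime: aString [
        "#asRegex is sent to the string literal on every call."
        ^aString ~ 'a.*b' asRegex
    ]

    matchesPrecompiled: aString [
        "The expression inside ##( ... ) is evaluated when the method
         is compiled, and the result is stored like a literal."
        ^aString ~ ##('a.*b' asRegex)
    ]
]

"Both sends should give the same answer for the same input."
(Matcher new matchesEachTime: 'axxb') printNl.
(Matcher new matchesPrecompiled: 'axxb') printNl.
```

The second method trades a little verbosity for not repeating the conversion at run time, which is the performance point made in the reply.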
| Re: regex literals |
  United Kingdom |
2008-02-13 05:35:33 |
Paolo Bonzini wrote:
> A bit verbose but yes, it is a possibility if performance is a concern.
> And it works now.

Sorry? There's existing compile-time-eval syntax? Cool!
Tony
| Re: regex literals |
  United States |
2008-02-13 14:30:56 |
Paolo Bonzini <bonzini gnu.org> writes:
> However, *always* treating string literals as regexes is going to give
> problems in the long term. In particular, it would break with another
> extension that I was thinking about:
>
> #(1 3 2 6 5 4) select: #odd => #(1 3 5)

This is sort of in the Presource test suite:

    #(1 3 2 6 5 4) select: #odd sendingBlock
    -| #(1 3 2 6 5 4) select: [:gensym | gensym odd]
    => #(1 3 5)

> #(1 12 2) select: (1 to: 10) => #(1 12)

I would not use that

> #('foo' 'bar') select: ##/f./ => #('foo')
>
> This would be quite easily implemented (#select: would send a new
> message to its argument, e.g. #~, instead of #value:).

I would rather have a generalization of the sendingBlock protocol to
send explicitly, perhaps with this extension (because with literals,
there's no chance for confusion):

    Eval [NoCandy.MyCodeMindset installIn: Namespace current]

    NoCandy.Presrc.MessageMacro subclass: SelectLiteralBlocks [
        <pool: NoCandy.Presrc> "eh?"

        "obviously you would memoize this result"
        SelectLiteralBlocks class >> inlinableActions [
            "since, for all these cases, the standard #select: semantics
             *obviously* aren't useful"
            ^{'` x to: ` y' -> '[:`g1 | `g1 between: ` x and: ` y]'
                  -> [:m |
                      ((m atAll: #('` x' '` y')) allSatisfy: [:each |
                          each isLiteral and: [each value isInteger]])
                          ifTrue: [{'`g1' -> self newVariable}]].
              '` x' -> '[:`g1 | `g1 `sel]'
                  -> [:m | | sel |
                      sel := m at: '` x'.
                      {sel isLiteral.
                       sel value isSymbol.
                       sel value numArgs = 0}
                          condEvery ifTrue: [{'`g1' -> self newVariable.
                                              #'`sel' -> sel value}]].
              '` x' -> '[:`g1 | `g1 ~ ` x]'
                  -> [:m | | x |
                      x := m at: '` x'.
                      (x isLiteral and: [x value isRegex])
                          ifTrue: [{'`g1' -> self newVariable}]].
             } collect: [:triplet |
                 {CodeTemplate fromExpr: triplet key key.
                  CodeTemplate fromExpr: triplet key value.
                  triplet value}]
        ]

        expandMessage: sel to: rcv withArguments: args [
            | filter |
            filter := args first.
            self class inlinableActions do: [:triplet | | match expand test |
                match := triplet first. expand := triplet second.
                test := triplet third.
                (match match: filter) ifNotNil: [:pm |
                    (test value: pm) ifNotNil: [:xtn |
                        xtn do: [:each | pm add: each].
                        ^STInST.RBMessageNode
                            receiver: rcv
                            selector: sel
                            arguments: {expand expand: pm}]]].
            ^self forgoExpansion
        ]
    ]

    #(1 12 2) select: (1 to: 10)
    -| #(1 12 2) select: [:gensym | gensym between: 1 and: 10]
    => #(1 2)

On a side note, with Unicode, #∋ would be a good name for #~, or maybe
#includes:

--
But you know how reluctant paranormal phenomena are to reveal
themselves when skeptics are present. --Robert Sheaffer, SkI 9/2003
| regex literals |
  Switzerland |
2008-02-14 02:18:29 |
> This is sort of in the Presource test suite:
>
> #(1 3 2 6 5 4) select: #odd sendingBlock
> -| #(1 3 2 6 5 4) select: [:gensym | gensym odd]
> => #(1 3 5)
>
>> #(1 12 2) select: (1 to: 10) => #(1 12)
>
> I would not use that

Note that it's just a special case of Collections:

    'foobar' select: 'aeiou'    => 'ooa'

In fact, "#(1 1.2 2) select: (1 to: 10)" would *not* include 1.2 in the
result.

My desire is to allow the common idea of "select: #odd" without
implementing Symbol>>#value:. I see no need to implement #sendingBlock
(all this IMHO of course) if you reason that:

1) right now, #select: and #collect: have the same "protocol" for the
argument, but the two are very different. In the case of
#select:/#reject: the argument should return true/false for any
collection; for #collect: instead the argument should return an object
in the same domain as the source.

Taking an extreme position: #value: is the most overloaded method in
Smalltalk and the less you use it, the better. (Because then you can
achieve more polymorphism and more DWIM).

2) therefore, I decide that #select: (and #reject:) accept a different
thing than a block, a "predicate". A predicate can be a unary block of
course, but also a symbol, a regex, a collection, ... I chose #~ as the
message that the predicate protocol would implement because it's what we
use for regexes, but it's not necessary to implement it with that name
(also because we currently have "aString ~ aRegex", not the other way
round).

3) the same could apply to #collect:, but with a *different* message to
emphasize that the argument is not a "predicate", it is an "xyz" (name
to be decided; I didn't find any good one). I don't have very strong
ideas on what to call the message, but it also could apply to symbols,
regexes and collections: for example

    #('1.2' '3.4') collect: #allButLast      => #('1.' '3.')
    #('1.2' '3.4') collect: '^.*.' asRegex   => #('1.' '3.')
    #('1.2' '3.4') collect: '.(.*)' asRegex  => #('2' '4')
    #('foo' 'bar') collect: #(1 3)           => #('fo' 'br')

> NoCandy.Presrc.MessageMacro subclass: SelectLiteralBlocks [
>     <pool: NoCandy.Presrc> "eh?"

You mean <import: ...> here?

> On a side note, with Unicode, #∋ would be a good name for #~, or maybe
> #includes:

Now what Unicode symbols would be binary messages, and which would be
okay for identifiers/keywords?

Paolo