
Thread: Re: What to do for bytes in 2.6?




Re: What to do for bytes in 2.6?
United States
2008-01-17 21:11:10
> *If* we provide some kind of "backport" of
> bytes (even if it's just an alias for or trivial
> subclass of str), it should be part of a strategy 
> that makes it easier to write code that
> runs under 2.6 and can be automatically translated 
> to run under 3.0 with the same semantics. 

If it's just an alias or trivial subclass, then we 
haven't added anything that can't be done trivially
by the 2-to-3 tool.

I'm thinking that this is a deeper change. 
It doesn't serve either 2.6 or 3.0 to conflate the
str/unicode model with the bytes/text model.
Mixing the two in one place just creates a mess
in that one place.

I'm sure we're thinking that this is just an optional
transition tool, but the reality is that once people
write 2.6 tools that use the new model,
then 2.6 users are forced to deal with that model.
It stops being optional or something in the future;
it becomes a mental jump that needs to be made now
(while still retaining the previous model in mind
for all the rest of the code).

I don't think you need a case study to foresee that
it will be unpleasant to work with a code base
that commingles the two world views.

One other thought.  I'm guessing that apps that would
care about the distinction are already using unicode
and are already treating text as distinct from arrays
of bytes.  Instead, it's backwards-thinking 20th-century
neanderthal ascii-bound folks like myself who are going
to have transition issues.  It would be nice for us
knuckle-draggers to not have to face the issue until 3.0.


Raymond
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/nessto%40sharedlog.com

Re: What to do for bytes in 2.6?
2008-01-17 22:43:47
On Jan 17, 2008 7:11 PM, Raymond Hettinger <python@rcn.com> wrote:
> > *If* we provide some kind of "backport" of
> > bytes (even if it's just an alias for or trivial
> > subclass of str), it should be part of a strategy
> > that makes it easier to write code that
> > runs under 2.6 and can be automatically translated
> > to run under 3.0 with the same semantics.
>
> If it's just an alias or trivial subclass, then we
> haven't added anything that can't be done trivially
> by the 2-to-3 tool.

I suggest you study how the 2to3 tool actually works before
asserting this.

Consider the following function.

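def stuff(blah):
    # Accumulate everything the stream yields, up to 1024 units per read.
    foo = ""
    while True:
        bar = blah.read(1024)
        if bar == "":  # an empty result signals end of stream
            break
        foo += bar
    return foo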

Is it reading text or binary data from stream blah? We can't tell.  If
it's meant to be reading text, 2to3 should leave it alone. But if it's
meant to be reading binary data, 2to3 should change the string
literals to bytes literals (b"" in this case). (If it's used for both,
there's no hope.) As it stands, 2to3 hasn't a chance to decide what to
do, so it will leave it alone -- but the "translated" code will be
wrong if it was meant to be reading bytes.

However, if the two empty string literals were changed to b"", we
would know it was reading bytes. 2to3 could leave it alone, but at
least the untranslated code would be correct for 2.6 and the
translated code would be correct for 3.0.
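
A minimal sketch of the annotated version of the function above, for illustration only. It assumes the stream is an in-memory io.BytesIO (the example names no concrete stream type), and it relies on b"" being accepted as a literal, which is what the proposal would allow on 2.6:

```python
import io


def stuff(blah):
    # Same loop as above, but the b"" literals mark the data as binary.
    foo = b""
    while True:
        bar = blah.read(1024)
        if bar == b"":  # empty bytes means end of stream
            break
        foo += bar
    return foo


# Hypothetical usage: an in-memory binary stream stands in for a real file.
data = stuff(io.BytesIO(b"\x00\x01" * 2000))
print(len(data))
```

Nothing in the loop body needs translating; only the literals carry the intent.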

This may seem trivial (because we do all the work, and 2to3 just
leaves stuff alone), but having b"" and bytes as aliases for "" and
str in 2.6 would mean that we could write 2.6 code that correctly
expresses the use of binary data -- and we could use u"" and unicode
for code using text, and 2to3 would translate those to "" and str and
the code would be correct 3.0 text processing code.
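
To make the aliasing concrete, here is a small check of how bytes relates to str on whichever interpreter runs it. It assumes only the behaviour described above (a pure alias on 2.6, separate types on 3.0); the helper name describe_bytes_alias is hypothetical:

```python
import sys


def describe_bytes_alias():
    # Hypothetical helper: report how `bytes` relates to `str` here.
    if bytes is str:
        return "bytes is an alias for str (2.x behaviour)"
    return "bytes and str are distinct types (3.x behaviour)"


print(sys.version_info[0], describe_bytes_alias())

# Either way, a b"" literal produces an instance of `bytes`.
assert isinstance(b"", bytes)
```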

Note that we really can't make 2to3 assume that all uses of str and ""
are referring to binary data -- that would mistranslate the vast
majority of code that does non-Unicode-aware text processing, which I
estimate is the majority of small and mid-size programs.

> I'm thinking that this is a deeper change.
> It doesn't serve either 2.6 or 3.0 to conflate the
> str/unicode model with the bytes/text model.
> Mixing the two in one place just creates a mess
> in that one place.
>
> I'm sure we're thinking that this is just an optional
> transition tool, but the reality is that once people
> write 2.6 tools that use the new model,
> then 2.6 users are forced to deal with that model.
> It stops being optional or something in the future;
> it becomes a mental jump that needs to be made now
> (while still retaining the previous model in mind
> for all the rest of the code).

This may be true. But still, 2.6 *will* run 2.5 code without any
effort, so we will be able to mix modules using the 2.5 style and
modules using the 3.0 style (or at least some aspects of 3.0 style) in
one interpreter. Neither 2.5 nor 3.0 will support this combination.
That's why 2.6 is so important: it's a stepping stone.

> I don't think you need a case study to foresee that
> it will be unpleasant to work with a code base
> that commingles the two world views.

Well, you shouldn't commingle the two world views in a single module or
package. But that would just be bad style -- you shouldn't use
competing style rules within a package either (like using
words_with_underscores and camelCaseWords for method names).

> One other thought.  I'm guessing that apps that would
> care about the distinction are already using unicode
> and are already treating text as distinct from arrays
> of bytes.

Yes, but 99% of these still accept str instances in positions where
they require text. The problem is that the str type and its literals
are ambiguous -- their use is not enough to be able to guess whether
text or data is meant. Just being able to (voluntarily! on a
per-module basis!) use a different type name and literal style for
data could help forward-looking programmers get started on making the
distinction clear, thus getting ready for 3.0 without making the jump
just yet (or maintaining a 2.6 and a 3.0 version of the same package
easily, using 2to3 to automatically generate the 3.0 version from the
2.6 code base).

> Instead, it's backwards-thinking 20th-century
> neanderthal ascii-bound folks like myself who are going
> to have transition issues.  It would be nice for us
> knuckle-draggers to not have to face the issue until 3.0.

Oh, you won't. Just don't use the -3 command-line flag and don't put
"from __future__ import <whatever>" at the top of your modules, and
you won't have to change your ways at all. You can continue to
distribute your packages in 2.5 syntax that will also work with 2.6,
and your users will be happy (as long as they don't want to use your
code on 3.0 -- but if you want to give them that, *that* is when you
will finally be forced to face the issue).

Note that I believe that the -3 flag should not change semantics -- it
should only add warnings. Semantic changes must either be backwards
compatible or be requested explicitly with a __forward__ import (which
2to3 can remove).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: What to do for bytes in 2.6?
Japan
2008-01-17 22:57:22
Raymond Hettinger writes:

 > One other thought.  I'm guessing that apps that would
 > care about the distinction are already using unicode
 > and are already treating text as distinct from arrays
 > of bytes.

Indeed.  Mailman, for instance.  Yet Mailman still has problems with
(broken) wire protocol that sneaks past the gate, and causes some
exception that is only handled by the top-level "no matter what goes
wrong, we're not going to lose this post" handler (which literally
shunts it into a queue that only human admins look at -- it's not
Mailman's problem any more.)

However, I am not sure it would help Mailman to catch such bugs to
move from the str/unicode paradigm to the bytes/text paradigm.  The
problem Mailman faces is that there is no (single) Japanese foyer
where the characters have to exchange their muddy "bytes" shoes for
nice clean "unicode" slippers.  Instead, there are a number of ways to
get in, and the translation takes place (and sometimes not) at
different stages.  But this is not a Python issue; it has to do with
Mailman's design.
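
The "foyer" idea can be sketched as a single function that all incoming wire data must pass through before the rest of the program sees it. The name enter_foyer, the UTF-8 default, and the use of replacement characters for broken input are assumptions made for illustration; this is not how Mailman works:

```python
def enter_foyer(raw, encoding="utf-8"):
    """Hypothetical single entry point: bytes in, text out."""
    if not isinstance(raw, bytes):
        raise TypeError("expected bytes from the wire")
    # "replace" turns undecodable bytes into U+FFFD here at the door,
    # instead of letting them raise somewhere deep inside the program.
    return raw.decode(encoding, "replace")


text = enter_foyer(b"caf\xc3\xa9")   # well-formed UTF-8
broken = enter_foyer(b"caf\xe9")     # broken wire data sneaking in
```

With several independent ways in, each one would need to call such a function, which is the design problem described above rather than a language one.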

So I don't think this would be improved if we changed the paradigm
forcibly.  I don't see a benefit to apps like Mailman from changing
over in 2.x.


Re: What to do for bytes in 2.6?
United States
2008-01-17 23:30:37
"Guido van Rossum" <guido@python.org> wrote in message
news:ca471dc20801172043l3356e04et9b8e807177230c6f@mail.gmail.com...
| Is it reading text or binary data from stream blah? We can't tell.  If
| it's meant to be reading text, 2to3 should leave it alone. But if it's
| meant to be reading binary data, 2to3 should change the string
| literals to bytes literals (b"" in this case). (If it's used for both,
| there's no hope.) As it stands, 2to3 hasn't a chance to decide what to
| do, so it will leave it alone -- but the "translated" code will be
| wrong if it was meant to be reading bytes.

It seems that the main purpose of adding bytes (as more or less a
synonym for str when used as bytes) is to aid 2to3 translation.
So I think I would favor it being part of a future import.

| Note that I believe that the -3 flag should not change semantics -- it
| should only add warnings. Semantic changes must either be backwards
| compatible or be requested explicitly with a __forward__ import (which
| 2to3 can remove).

Were you planning to make bytes a __future__ (or __forward__?) import?
I think making it so should satisfy Raymond's concerns.  Even if whatever
you eventually do is technically backwards compatible, he is suggesting
that conceptually, it is not.  I see some validity to that view.

tjr




Re: What to do for bytes in 2.6?
United States
2008-01-17 23:44:52
> having b"" and bytes as aliases for "" and
> str in 2.6 would mean that we could write 2.6 code that correctly
> expresses the use of binary data -- and we could use u"" and unicode
> for code using text, and 2to3 would translate those to "" and str and
> the code would be correct 3.0 text processing code.

I see. There's a healthy benefit for 2to3 translation that cannot be
accomplished in any other way.   This may potentially more than offset the
negative impact to the 2.x world where it complexifies the language
without any immediate payoff.

FWIW, I'm sold on the rationale.  Hopefully, this can be confined
to just annotation and aliasing but not porting back any C API
changes or code that depends on the bytes/text distinction.  I worry
that as soon as the annotation is made available, it will ripple
throughout the code and pervade the language so that it cannot
be ignored by Py2.6 coders.  It's a bit of a Pandora's Box.

> Just being able to (voluntarily! on a
> per-module basis!) use a different type name and literal style for
> data could help forward-looking programmers get started on making the
> distinction clear, thus getting ready for 3.0 without making the jump
> just yet (or maintaining a 2.6 and a 3.0 version of the same package
> easily, using 2to3 to automatically generate the 3.0 version from the
> 2.6 code base).

Are we going to be "forward-looking" and change all of the
standard library modules?  Is this going to pop up everywhere
and become something you have to know about whether or
not you're a volunteering forward-looking programmer?  I hope not.


Raymond

Re: What to do for bytes in 2.6?
United States
2008-01-18 00:11:17
On 04:43 am, guido@python.org wrote:
>Just being able to (voluntarily! on a
>per-module basis!) use a different type name and literal style for
>data could help forward-looking programmers get started on making the
>distinction clear, thus getting ready for 3.0 without making the jump
>just yet (or maintaining a 2.6 and a 3.0 version of the same package
>easily, using 2to3 to automatically generate the 3.0 version from the
>2.6 code base).

Yes!  Yes!  Yes!  A thousand times yes!  :-D

This is *the* crucial feature which will make porting large libraries
like Twisted to 3.0 even possible.  Thank you, Guido.

To the critics of this idea: libraries which process text, if they are
meant to be correct, will need to deal explicitly with the issue of what
data-types they believe to be text, what methods they will call on them,
and how they deal with them.  You cannot get away from this.  It is not
an issue reserved for the "pure" future of 3.0; if your code doesn't
handle these types correctly now, it has bugs in it *now*.  (In fact I
am fixing some code with just such a bug in it right now.)

It is definitely possible to make your library code do the right thing
for different data types, continue to support str literals in 2.6, and
eventually require text / unicode input (after an appropriate
deprecation period, of course).  And it will be a lot easier if the
translations imposed by 2to3 are as minimal as possible.

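
The migration path described here (keep accepting str for now, warn about it, and only later require text) might look roughly like the following. The helper to_text and its ASCII default are assumptions made for illustration, not Twisted's API:

```python
import warnings


def to_text(value, encoding="ascii"):
    """Hypothetical helper: accept text, tolerate bytes for now."""
    if isinstance(value, bytes):
        # Transitional path: still works, but tells callers to pass text.
        warnings.warn("pass text, not bytes", DeprecationWarning,
                      stacklevel=2)
        return value.decode(encoding)
    return value
```

Once the deprecation period ends, the bytes branch would raise instead of decoding.
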
>Note that I believe that the -3 flag should not change semantics -- it
>should only add warnings. Semantic changes must either be backwards
>compatible or be requested explicitly with a __forward__ import (which
>2to3 can remove).

This also sounds great.

Re: What to do for bytes in 2.6?
2008-01-18 08:52:25
On Jan 17, 2008 9:30 PM, Terry Reedy <tjreedy@udel.edu> wrote:
> "Guido van Rossum" <guido@python.org> wrote in message
> news:ca471dc20801172043l3356e04et9b8e807177230c6f@mail.gmail.com...
> | Is it reading text or binary data from stream blah? We can't tell.  If
> | it's meant to be reading text, 2to3 should leave it alone. But if it's
> | meant to be reading binary data, 2to3 should change the string
> | literals to bytes literals (b"" in this case). (If it's used for both,
> | there's no hope.) As it stands, 2to3 hasn't a chance to decide what to
> | do, so it will leave it alone -- but the "translated" code will be
> | wrong if it was meant to be reading bytes.
>
> It seems that the main purpose of adding bytes (as more or less a
> synonym for str when used as bytes) is to aid 2to3 translation.
> So I think I would favor it being part of a future import.
>
> | Note that I believe that the -3 flag should not change semantics -- it
> | should only add warnings. Semantic changes must either be backwards
> | compatible or be requested explicitly with a __forward__ import (which
> | 2to3 can remove).
>
> Were you planning to make bytes a __future__ (or __forward__?) import?
> I think making it so should satisfy Raymond's concerns.  Even if whatever
> you eventually do is technically backwards compatible, he is suggesting
> that conceptually, it is not.  I see some validity to that view.

While it *could* be made conditional on a __future__ import, the cost
of those (both in terms of implementing them and using them) is rather
high so I'd prefer it to be always available. Given Raymond's later
response, I'm not sure it's worth the effort.

On Jan 17, 2008 9:44 PM, Raymond Hettinger <python@rcn.com> wrote:
> > having b"" and bytes as aliases for "" and
> > str in 2.6 would mean that we could write 2.6 code that correctly
> > expresses the use of binary data -- and we could use u"" and unicode
> > for code using text, and 2to3 would translate those to "" and str and
> > the code would be correct 3.0 text processing code.
>
> I see. There's a healthy benefit for 2to3 translation that cannot be
> accomplished in any other way.   This may potentially more than offset the
> negative impact to the 2.x world where it complexifies the language
> without any immediate payoff.
>
> FWIW, I'm sold on the rationale.  Hopefully, this can be confined
> to just annotation and aliasing but not porting back any C API
> changes or code that depends on the bytes/text distinction.

I'm fine with only making this a superficial veneer: b"" and bytes as
aliases for "" and str. However Thomas Heller's response requires more
thought.

> I worry
> that as soon as the annotation is made available, it will ripple
> throughout the code and pervade the language so that it cannot
> be ignored by Py2.6 coders.  It's a bit of a Pandora's Box.

Well, that's one opinion against another. And I presume by now you
have read Glyph's enthusiastic response. Getting Twisted on 3.0 is a
big enabler for getting other 3rd party packages and apps on board!

> > Just being able to (voluntarily! on a
> > per-module basis!) use a different type name and literal style for
> > data could help forward-looking programmers get started on making the
> > distinction clear, thus getting ready for 3.0 without making the jump
> > just yet (or maintaining a 2.6 and a 3.0 version of the same package
> > easily, using 2to3 to automatically generate the 3.0 version from the
> > 2.6 code base).
>
> Are we going to be "forward-looking" and change all of the
> standard library modules?  Is this going to pop up everywhere
> and become something you have to know about whether or
> not you're a volunteering forward-looking programmer?  I hope not.

I have no desire to fix up the standard library cosmetically, and I
don't see the need -- the standard library has already been forward
ported to 3.0, so the rationale just doesn't apply.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: What to do for bytes in 2.6?
2008-01-18 10:15:48
2008/1/18, Guido van Rossum <guido@python.org>:

> I don't think any of that is necessary. I would rather have the
> following two in the language by default (see my response to Terry and
> Raymond):
>
> bytes is an alias for str (not even a subclass)
> b"" is an alias for ""

+1

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

Re: What to do for bytes in 2.6?
Canada
2008-01-19 12:46:07
Guido van Rossum <guido@python.org> wrote:
> This may seem trivial (because we do all the work, and 2to3 just
> leaves stuff alone), but having b"" and bytes as aliases for "" and
> str in 2.6 would mean that we could write 2.6 code that correctly
> expresses the use of binary data -- and we could use u"" and unicode
> for code using text, and 2to3 would translate those to "" and str and
> the code would be correct 3.0 text processing code.

I like this solution because of its simplicity.

  Neil


Re: What to do for bytes in 2.6?
Canada
2008-01-19 12:53:42
Guido van Rossum <guido@python.org> wrote:
> bytes is an alias for str (not even a subclass)
> b"" is an alias for ""

One advantage of a subclass is that there could be a flag that warns
about combining bytes and unicode data.  For example, b"x" + u"y"
would produce a warning.  As someone who writes internationalized
software, I would happily use both the byte-designating syntax and
the warning flag, even if I wasn't planning to move to Python 3.
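
As a rough illustration of the kind of check such a flag would perform, here is a sketch in plain Python. The function warn_if_mixed and the choice of UnicodeWarning are assumptions, not an existing interpreter option:

```python
import warnings


def warn_if_mixed(left, right):
    """Hypothetical check: warn when exactly one operand is bytes."""
    mixed = isinstance(left, bytes) != isinstance(right, bytes)
    if mixed:
        warnings.warn("combining bytes and text", UnicodeWarning,
                      stacklevel=2)
    return mixed


warn_if_mixed(b"x", u"y")   # emits a UnicodeWarning
warn_if_mixed(u"x", u"y")   # silent
```

With a pure alias, a check like this cannot tell b"x" from a plain "x" on 2.x, since both are str; that is the case for making bytes a subclass.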

  Neil

