> "Dr.Ruud" <rvtol+news isolution.nl> writes:
>> Brain fart:
>>
>> my $foo = <<'FOO' :koi8-r;
>> raw koi8-r data here
>> FOO
> Brilliant, except for the complication that the end-of-heredoc string
> must be encoded in the foreign encoding (which I think is a solvable
> problem).

Do you mean that "FOO" would have to be in koi8-r rather than in the
script's own encoding, if any?

Yes, I agree it looks nifty; it's nice that it should use the existing
though little-used :attributes syntactic slot. But I think it leads
to some curious questions.

First of all, one wonders whether this wouldn't lead to more general,
per-literal encoding specs, such as C<'string':enc>, C<q!string!:enc>,
(or even C<q:string:enc> skipping the dup colon?), and similar ilk?

Even without :enc being applicable to general literals, though, what
about interpolated data from C< <<"FOO" >? That is, would something
like C<"str1 $var str2":euc-tw>, written heredockishly as

    my $foo = <<"FOO" :euc-tw;
    str1 $var str2
    FOO

mean: (line endings aside)

    decode(euc_tw => 'str1 ')
        . $var
        . decode(euc_tw => ' str2')

Or would it instead mean:

    decode euc_tw => ('str1 ' . $var . ' str2')

Which goes first?
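
The order matters whenever $var already holds decoded characters. Here is a minimal sketch of the two readings, assuming the core Encode module and using koi8-r in place of euc-tw purely for illustration; the variable names and sample data are invented for the example and are not from the thread:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: $var is already a character string (U+00E9, e-acute),
# not raw octets in the foreign encoding.
my $var = "\x{E9}";

# Reading 1: decode only the literal pieces, then interpolate.
my $piecewise = decode('koi8-r', 'str1 ') . $var . decode('koi8-r', ' str2');

# Reading 2: interpolate first, then decode the whole string.
# Here the 0xE9 from $var gets reinterpreted as a koi8-r octet.
my $whole = decode('koi8-r', 'str1 ' . $var . ' str2');

print $piecewise eq $whole ? "same\n" : "different\n";   # prints "different"
```

If $var held raw koi8-r octets instead, the second reading would be the one that decodes them correctly, so neither order is obviously right for every caller.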

Mmm, doesn't this mean we'd get to specify an encoding on readpipe?
I think it does!

    my $rebus = <<`HIC`:Latin1;
    cmd1 $var | cmd2
    cmd3
    HIC

Yum!

You know, that's almost even somewhat appealing--compared with
the alternative:

    my $rebus = do {
        open(my $rdpipe, "-| :encoding(Latin1)",
             "cmd1 $var | cmd2; cmd3");
        local $/;
        <$rdpipe>;
    };

Although certainly the simpler

    my $rebus = `cmd1 $var | cmd2; cmd3` :Latin1;

or, if you must,

    my $rebus = qx(cmd1 $var | cmd2; cmd3) :Latin1;

would be easier on the eye and mind than C< <<`HIC`:Latin1 > would.

Hm, looking at the command-interpolated version, it now seems pretty
obvious that variable interpolation must occur before "de-"encoding
(er, "en-"decoding? I just can't keep those two straight in my head!),
so that would mean

    my $rebus = decode Latin1 => qx(cmd1 $var | cmd2; cmd3);
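
To check that decoding after the read gives the same result as putting an :encoding layer on the pipe, here is a small sketch. It assumes a platform where list-form pipe opens work, and it uses the running perl ($^X) as a stand-in child command so nothing external is needed; the child command and variable names are illustrative only:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in child command: this same perl, emitting one Latin-1 octet (0xE9).
my @cmd = ($^X, '-e', 'binmode STDOUT; print "caf\xE9\n"');

# Route 1: let an :encoding layer on the pipe do the decoding.
open(my $rdpipe, '-|:encoding(Latin1)', @cmd) or die "can't fork: $!";
my $via_layer = do { local $/; <$rdpipe> };
close($rdpipe) or die "child failed: status $?";

# Route 2: slurp raw octets, then decode the whole result afterwards.
open(my $raw, '-|', @cmd) or die "can't fork: $!";
binmode $raw;
my $octets = do { local $/; <$raw> };
close($raw) or die "child failed: status $?";
my $via_decode = decode('Latin1', $octets);

print $via_layer eq $via_decode ? "same\n" : "different\n";
```

For a single-byte encoding such as Latin1 the two routes should agree; the sketch makes no claim about multibyte encodings.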

So I guess that clears up the order of operations on the prospective
C< <<"HIC":Latin1 > case, doesn't it?

    my $rebus = <<"HIC" :Latin1;
    str1 $var str2
    HIC

would be

    my $rebus = decode Latin1 => "str1 $var str2";

Hm...

    my @rebus = <<`HIC` :Latin1;
    str1 $var str2
    HIC

In Latin1, there's no trouble, but I'd have to unwrap that to
see when the implicit line-breaking split ran.

    my @rebus = split( /(?=\n)/, decode(Latin1 => `str1 $var str2`) );

I wonder a little about other line terminators in very funky encodings.
Let's say Jis0212-RAW had \v stuff far beyond \n. Would

    my @lines = <<`FOO` :jis0212-raw;
    str1 $var str2
    FOO

be therefore

    my @lines = split( /(?=\n)/, decode(jis0212_raw => `str1 $var str2`) );

Hm, looks like I'm relying on split losing the trailing null field there.
I guess I could write the regex as /(?=\n.)/s so split doesn't have to go
to extra work of splitting the last thing and then throwing it away.

Hm, maybe using \R might be better:

    my @lines = map { decode jis0212_raw => $_ }
                split( /(?=\R.)/s, `str1 $var str2`);

Oh, never mind; the qx// implicit split doesn't use \n; it uses $/
(which is a bit of a bother to put in a m//). So that's just:

    my @lines = split( m[(?=\Q$/\E.)]s,
                       decode(jis0212_raw => `str1 $var str2`) );
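
A runnable approximation of "decode, then split on $/" can be sketched as follows. The helper name decoded_records is hypothetical, the input is an in-memory octet string rather than real command output, and Latin-1 stands in for the funky encoding. Note that this sketch splits with a lookbehind rather than the lookahead used above, so that each record keeps its terminator at the end, which is how qx behaves in list context:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the thread): decode a blob of octets,
# then break it into records on the current value of $/.
sub decoded_records {
    my ($encoding, $octets) = @_;
    my $text = decode($encoding, $octets);
    # Assumption: fall back to "\n" if $/ is undef or empty.
    my $sep = (defined $/ && length $/) ? $/ : "\n";
    # The lookbehind splits *after* each separator, so every record
    # keeps its terminator; split drops any trailing empty field.
    return split /(?<=\Q$sep\E)/, $text;
}

my @lines = decoded_records('iso-8859-1', "caf\xE9\nna\xEFve\nlast");
printf "%d records\n", scalar @lines;   # prints "3 records"
```

Because the split runs on the decoded text, a separator that only exists after decoding is still honored; splitting the raw octets first, as in the map/decode variant, would not have that property for every encoding.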

> I'm surprised that no one responded to this suggestion.

I'd noticed only Juerd's original, not the <<FOO:koi reply, because
in my hastiness, I carelessly ran % scan `pick -subj Smack` and
so missed the intriguing reply.

Thanks, Johan! Glad you flagged it. Fun stuff, eh?

--tom

PS: Now that I think of it, those pod markups would be better
written as C<<< <<"HIC" >>> instead of C< <<"HIC" >, because
the space you get around the string varies. This:

    % ( echo "=head1 WITNESS" ; echo preamble 'I<<< <<"HIC":Latin1 >>>' postamble ) | pod2text
    WITNESS
        preamble *<<"HIC":Latin1* postamble

is probably better than this:

    % ( echo "=head1 WITNESS" ; echo preamble 'I< <<"HIC":Latin1 >' postamble ) | pod2text
    WITNESS
        preamble * <<"HIC":Latin1 * postamble