Thread: Charset conversion / stylistic question.

Charset conversion / stylistic question.
user name
2007-04-17 15:03:01
Hi all,

I'm just starting out and still a little bit intimidated about how to
do things correctly in Erlang and would like some advice. (In other
words, please bear with me.) I'd like to write some routines to recode
charsets and I'm not sure what would be the best way to go about it.
What I've come up with so far is:

  map (Function, <<A:8, Rest/binary>>) ->
      list_to_binary([Function(A) | map (Function, Rest)]);
  map (_Function, <<>>) -> <<>>.

  convert({ebcdic, iso_8859_1}, <<Binary/binary>>) ->
      map(fun cp037_to_iso8859_1/1, Binary);
  (...)
  convert({From, To}, <<_Binary/binary>>) ->
      {unsupported_encoding, From, To}.

  cp037_to_iso8859_1 (0) -> 0;
  (...)
  cp037_to_iso8859_1 (129) -> 97; % a
  cp037_to_iso8859_1 (130) -> 98; % b
  cp037_to_iso8859_1 (131) -> 99; % c
  cp037_to_iso8859_1 (132) -> 100; % d
  cp037_to_iso8859_1 (133) -> 101; % e
  (...)
  cp037_to_iso8859_1 (255) -> 159.


which works, so I'm relieved. But I'm not sure about the most
appropriate way to write the actual conversion functions. Using
pattern matching for each conversion like above, or just having one
function body with a bunch of if's in it or a case statement...

From the languages I'm used to, I'd probably use a bunch of arrays and
have the value of the `from` charset be an index into the conversion
array. This would be possible as well, though it wouldn't be possible
to convert multibyte charsets like UTF-8...

Thanks in advance and sorry about the newbie question,
   -tim
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: Charset conversion / stylistic question.
user name
2007-04-17 15:38:24

Hi Tim,

There is a problem in your map/2 function.
The call to list_to_binary/1 makes it non-tail recursive,
so the recursion cannot reuse the stack frame.
Also, you will get many unnecessary calls to list_to_binary()
(one per iteration).

One could instead write it like this:

map(F, Bin) ->
    list_to_binary(map_1(F, Bin)).

%% Build a plain list first; convert to a binary once at the end.
map_1(F, <<A:8, Rest/binary>>) -> [F(A) | map_1(F, Rest)];
map_1(_, <<>>) -> [].

In your map, the terminating case returned <<>>, which
would be right for a call to map(F, <<>>), but when it
ends the iteration it creates an improper list, e.g.
[$a,$b,$c|<<>>]. This is tolerated by list_to_binary(),
but you should be aware that it is happening, because
it can bite you in other situations.
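
A small sketch of how such a list behaves, with a hypothetical module name (improper_sketch). list_to_binary/1 accepts a binary as the tail of a list, while length/1 requires a proper list and fails with badarg.

```erlang
-module(improper_sketch).
-export([demo/0, check/1]).

%% The tail of this list is <<>> rather than [], so it is improper.
demo() ->
    check([$a, $b, $c | <<>>]).

check(List) ->
    %% Accepted: an iolist may end in a binary instead of [].
    Bin = list_to_binary(List),
    %% Rejected: length/1 only works on proper lists.
    {'EXIT', {badarg, _}} = (catch length(List)),
    Bin.
```

improper_sketch:demo() returns the binary built from the three characters, after confirming that length/1 refuses the same list.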

In OTP R11B-4, there is support for binary comprehensions,
even though it's still experimental. With it, I believe your
function could be done in this fashion (haven't tried it
myself, the syntax might not be quite correct):

map(F, Bin) ->
    << F(C):1 || C:1 <<- Bin >>.
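
For comparison, a sketch in the binary comprehension syntax that later OTP releases settled on, which differs from the experimental form above: the generator uses <= and both sides are written as binary segments. The module name bc_sketch is hypothetical, and F is assumed to return an integer in the range 0..255.

```erlang
-module(bc_sketch).
-export([map/2]).

%% Apply F to each byte of Bin and collect the results in a new binary.
map(F, Bin) ->
    << <<(F(C)):8>> || <<C:8>> <= Bin >>.
```

For example, bc_sketch:map(fun(C) -> C + 1 end, <<1,2,3>>) walks the input one byte at a time and builds the result binary directly, with no intermediate list.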

BR,
Ulf W

2007/4/17, Tim Becker <tim.beckergmx.net>:
Hi all,

I'm just starting out and still a little bit intimidated about how to
do things correctly in erlang and would like some advice. (In other
words, please bear with me.) I'd like to write some routines to recode
charsets and I'm not sure what would be the best way to go about it.
What I've come up with so far is:

  map (Function, <<A:8, Rest/binary>>) ->
      list_to_binary([Function(A) | map (Function, Rest)]);
  map (_Function, <<>>) -> <<>>.

  convert({ebcdic, iso_8859_1}, <<Binary/binary>>) ->
      map(fun cp037_to_iso8859_1/1, Binary);
  (...)
  convert({From, To}, <<_Binary/binary>>) ->
      {unsupported_encoding, From, To}.

  cp037_to_iso8859_1 (0) -> 0;
  (...)
  cp037_to_iso8859_1 (129) -> 97; % a
  cp037_to_iso8859_1 (130) -> 98; % b
  cp037_to_iso8859_1 (131) -> 99; % c
  cp037_to_iso8859_1 (132) -> 100; % d
  cp037_to_iso8859_1 (133) -> 101; % e
  (...)
  cp037_to_iso8859_1 (255) -> 159.


which works, so I'm relieved. But I'm not sure about the most
appropriate way to write the actual conversion functions. Using
pattern matching for each conversion like above, or just having one
function body with a bunch of if's in it or a case statement...

From the languages I'm used to, I'd probably use a bunch of arrays and
have the value of the `from` charset be an index into the conversion
array. This would be possible as well, though it wouldn't be possible
to convert multibyte charsets like UTF-8...

Thanks in advance and sorry about the newbie question,
   -tim