Thread: questions about "Language" info packets




questions about "Language" info packets
2007-02-13 05:38:35
nut.txt says:
|   "Language"
|       ISO 639 and ISO 3166 for language/country code

Does "ISO 639" mean ISO 639-1 or ISO 639-2?
Are both codes required or allowed?  If yes, in what format?

|       something like "eng" (US English)

When using a three-letter code from ISO 639-2, should a nut writer use
the bibliographic or the terminology code?

Are two-letter codes allowed at all?

|       can be 0 if unknown

Does this mean that there is no Language entry, or that it is an empty
string, or that it is a string containing a zero byte, or that the
string is "0"?

|       and "multi" if several languages

ISO 639-2 already has "mul" for multiple languages.
Does this mean that both "mul" and "multi" are allowed?


Regards,
Clemens
_______________________________________________
NUT-devel mailing list
NUT-devel@mplayerhq.hu

http://lists.mplayerhq.hu/mailman/listinfo/nut-devel

Re: questions about "Language" info packets
2007-02-13 07:15:08
Hi

On Tue, Feb 13, 2007 at 12:38:35PM +0100, Clemens Ladisch wrote:
> nut.txt says:
> |   "Language"
> |       ISO 639 and ISO 3166 for language/country code
> 
> Does "ISO 639" mean ISO 639-1 or ISO 639-2?
> Are both codes required or allowed?  If yes, in what format?

That is a very good question. As the example below is an ISO 639-2
code, I think it's clear that ISO 639-2 is allowed.

Furthermore, there is a link
http://www.loc.gov/standards/iso639-2/englangn.html
pointing to 639-2 but none to 639-1, so I'd say 639-1 is not allowed.
Also, all 639-1 codes have a code in 639-2, while many 639-2 codes
do not have one in 639-1.
Comments are of course welcome ...


> 
> |       something like "eng" (US English)
> 
> When using a three-letter code from ISO 639-2, should a nut writer use
> the bibliographic or the terminology code?

That is also a very good question. I think none of us was aware that
there are 2 different codes for some languages (that is, one based on
the native word for the language and one based on the English word),
but luckily the majority of the languages have just 1 code.


> 
> Are two-letter codes allowed at all?

I'd say no


> 
> |       can be 0 if unknown
> 
> Does this mean that there is no Language entry, or that it is an empty
> string, or that it is a string containing a zero byte, or that the
> string is "0"?

Hmm, ISO 639-2 contains "und" for undetermined, and nothing in our spec
forbids its use, so I am tempted to say that "und" must/should be used
if unknown and applications must treat an empty string like "und".


> 
> |       and "multi" if several languages
> 
> ISO 639-2 already has "mul" for multiple languages.
> Does this mean that both "mul" and "multi" are allowed?

I'd handle this like above:
"mul" must/should be used if multiple languages, but demuxers must
treat "multi" like "mul".

opinions?, comments? 

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The greatest way to live with honor in this world is to be what we
pretend to be. -- Socrates

Re: questions about "Language" info packets
2007-02-13 08:38:28
Michael Niedermayer wrote:

> 
> opinions?, comments? 
> 

Fine by me.

Who is going to update the spec with this clarification?

Maybe it is better to produce an addendum/errata.

lu

-- 

Luca Barbato

Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero


Re: questions about "Language" info packets
2007-02-13 09:26:11
Michael Niedermayer wrote:
> On Tue, Feb 13, 2007 at 12:38:35PM +0100, Clemens Ladisch wrote:
> > nut.txt says:
> > |   "Language"
> > |       ISO 639 and ISO 3166 for language/country code
> > 
> > Does "ISO 639" mean ISO 639-1 or ISO 639-2?
> > Are both codes required or allowed?  If yes, in what format?
> 
> That is a very good question. As the example below is an ISO 639-2
> code, I think it's clear that ISO 639-2 is allowed.
> 
> Furthermore, there is a link
> http://www.loc.gov/standards/iso639-2/englangn.html
> pointing to 639-2 but none to 639-1, so I'd say 639-1 is not allowed.
> Also, all 639-1 codes have a code in 639-2, while many 639-2 codes
> do not have one in 639-1.
> Comments are of course welcome ...
> 
> > |       something like "eng" (US English)
> > 
> > When using a three-letter code from ISO 639-2, should a nut writer use
> > the bibliographic or the terminology code?
> 
> That is also a very good question. I think none of us was aware that
> there are 2 different codes for some languages (that is, one based on
> the native word for the language and one based on the English word),
> but luckily the majority of the languages have just 1 code.

And we Germans are out of luck and cannot use nut?  

If the language code were just used as a code, it wouldn't matter which
one is to be used, but there are certain players that just display the
raw code instead of converting it to a language name, so I think it
makes sense to let the encoder choose which one to use.

> > Are two-letter codes allowed at all?
> 
> I'd say no

So ISO 3166 is out, too?

> > |       can be 0 if unknown
> > 
> > Does this mean that there is no Language entry, or that it is an empty
> > string, or that it is a string containing a zero byte, or that the
> > string is "0"?
> 
> Hmm, ISO 639-2 contains "und" for undetermined, and nothing in our spec
> forbids its use, so I am tempted to say that "und" must/should be used
> if unknown and applications must treat an empty string like "und".
> 
> > |       and "multi" if several languages
> > 
> > ISO 639-2 already has "mul" for multiple languages.
> > Does this mean that both "mul" and "multi" are allowed?
> 
> I'd handle this like above:
> "mul" must/should be used if multiple languages, but demuxers must
> treat "multi" like "mul".

OK.  Proposed new description:

    "Language"
        An ISO 639-2 (three-letter) language code, e.g. "eng" for English
        (see <http://www.loc.gov/standards/iso639-2/php/code_list.php>).
        All codes defined in ISO 639-2 are allowed, including "und"
        (Undetermined), "mul" (Multiple languages) and the bibliographic/
        terminology variants.
        For historical reasons, demuxers MUST treat "multi" like "mul" and
        "" (the empty string) like "und".


Regards,
Clemens
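
The proposed description above can be sketched in a few lines. This is only an illustration in Python, not code from any NUT implementation: the helper name normalize_language and the alias table are assumptions made up for this example. It folds the two legacy spellings onto their ISO 639-2 equivalents and passes every other value through unchanged.

```python
# Hypothetical helper; names are illustrative assumptions and are not
# taken from nut.txt or from any NUT library.
LEGACY_ALIASES = {
    "multi": "mul",  # old nut.txt spelling for "several languages"
    "": "und",       # empty string is treated as "undetermined"
}


def normalize_language(code: str) -> str:
    """Map legacy values to ISO 639-2 codes; pass other values through."""
    return LEGACY_ALIASES.get(code, code)


print(normalize_language("multi"))  # mul
print(normalize_language(""))       # und
print(normalize_language("ger"))    # ger
```

Values that are not in the table, including codes the reader does not know, are returned as-is rather than rejected.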

Re: questions about "Language" info packets
2007-02-13 14:36:34
On Tue, Feb 13, 2007 at 04:26:11PM +0100, Clemens Ladisch wrote:
> Michael Niedermayer wrote:
> > On Tue, Feb 13, 2007 at 12:38:35PM +0100, Clemens Ladisch wrote:
> > > nut.txt says:
> > > |   "Language"
> > > |       ISO 639 and ISO 3166 for language/country code
> > > 
> > > Does "ISO 639" mean ISO 639-1 or ISO 639-2?
> > > Are both codes required or allowed?  If yes, in what format?
> > 
> > That is a very good question. As the example below is an ISO 639-2
> > code, I think it's clear that ISO 639-2 is allowed.
> > 
> > Furthermore, there is a link
> > http://www.loc.gov/standards/iso639-2/englangn.html
> > pointing to 639-2 but none to 639-1, so I'd say 639-1 is not allowed.
> > Also, all 639-1 codes have a code in 639-2, while many 639-2 codes
> > do not have one in 639-1.
> > Comments are of course welcome ...
> > 
> > > |       something like "eng" (US English)
> > > 
> > > When using a three-letter code from ISO 639-2, should a nut writer use
> > > the bibliographic or the terminology code?
> > 
> > That is also a very good question. I think none of us was aware that
> > there are 2 different codes for some languages (that is, one based on
> > the native word for the language and one based on the English word),
> > but luckily the majority of the languages have just 1 code.
> 
> And we Germans are out of luck and cannot use nut? 


Huh??

> If the language code were just used as a code, it wouldn't matter which
> one is to be used, but there are certain players that just display the
> raw code instead of converting it to a language name, so I think it
> makes sense to let the encoder choose which one to use.

If this isn't acceptable to the user, then the user should choose a
player with a more "user friendly" display. Existing legacy devices
won't play nut files anyway, so it's something of a non-issue.

> > > Are two-letter codes allowed at all?
> > 
> > I'd say no
> 
> So ISO 3166 is out, too?

I'm against 2-letter codes. The number of languages is way too large
for these codes to be remotely sufficient.

> OK.  Proposed new description:
> 
>     "Language"
>         An ISO 639-2 (three-letter) language code, e.g. "eng" for English
>         (see <http://www.loc.gov/standards/iso639-2/php/code_list.php>).
>         All codes defined in ISO 639-2 are allowed, including "und"
>         (Undetermined), "mul" (Multiple languages) and the bibliographic/
>         terminology variants.
>         For historical reasons, demuxers MUST treat "multi" like "mul" and
>         "" (the empty string) like "und".

Historical reasons?? There are no such files, and this is a draft
(albeit frozen) spec. I don't see any way that translating "multi" to
"mul" and "" to "und" would improve functionality over just treating
them as an unexpected value. If there's cruft in the spec that can be
removed without really hurting anything, I'd like to remove it.

Rich

Re: questions about "Language" info packets
2007-02-13 15:35:09
Hi

On Tue, Feb 13, 2007 at 04:26:11PM +0100, Clemens Ladisch wrote:
> Michael Niedermayer wrote:
> > On Tue, Feb 13, 2007 at 12:38:35PM +0100, Clemens Ladisch wrote:
> > > nut.txt says:
> > > |   "Language"
> > > |       ISO 639 and ISO 3166 for language/country code
> > > 
> > > Does "ISO 639" mean ISO 639-1 or ISO 639-2?
> > > Are both codes required or allowed?  If yes, in what format?
> > 
> > That is a very good question. As the example below is an ISO 639-2
> > code, I think it's clear that ISO 639-2 is allowed.
> > 
> > Furthermore, there is a link
> > http://www.loc.gov/standards/iso639-2/englangn.html
> > pointing to 639-2 but none to 639-1, so I'd say 639-1 is not allowed.
> > Also, all 639-1 codes have a code in 639-2, while many 639-2 codes
> > do not have one in 639-1.
> > Comments are of course welcome ...
> > 
> > > |       something like "eng" (US English)
> > > 
> > > When using a three-letter code from ISO 639-2, should a nut writer use
> > > the bibliographic or the terminology code?
> > 
> > That is also a very good question. I think none of us was aware that
> > there are 2 different codes for some languages (that is, one based on
> > the native word for the language and one based on the English word),
> > but luckily the majority of the languages have just 1 code.
> 
> And we Germans are out of luck and cannot use nut? 

> 
> If the language code were just used as a code, it wouldn't matter which
> one is to be used, but there are certain players that just display the
> raw code instead of converting it to a language name, so I think it
> makes sense to let the encoder choose which one to use.

Hmm, I understand both "deu" and "ger" equally well (or badly).
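
For illustration only: a reader that wants "ger" and "deu" to select the same stream could fold bibliographic codes onto terminology codes before comparing. Nothing in this thread mandates such folding. The table below lists only a few of the ISO 639-2 B/T pairs, and the names B_TO_T, to_terminology and same_language are hypothetical.

```python
# A few ISO 639-2 bibliographic -> terminology pairs (partial list,
# chosen for illustration; the standard defines more).
B_TO_T = {
    "ger": "deu",  # German
    "fre": "fra",  # French
    "dut": "nld",  # Dutch
    "cze": "ces",  # Czech
    "chi": "zho",  # Chinese
}


def to_terminology(code: str) -> str:
    """Return the terminology code for a bibliographic one, else the input."""
    return B_TO_T.get(code, code)


def same_language(a: str, b: str) -> bool:
    """Compare two codes after folding B codes onto T codes."""
    return to_terminology(a) == to_terminology(b)
```

With this sketch, same_language("ger", "deu") is true, while codes that have only one form, such as "eng", compare as plain strings.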


> 
> > > Are two-letter codes allowed at all?
> > 
> > I'd say no
> 
> So ISO 3166 is out, too?

ISO 3166 has 2- and 3-letter codes too, but I wasn't speaking about
that ...


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No snowflake in an avalanche ever feels responsible. -- Voltaire

Re: questions about "Language" info packets
2007-02-13 15:44:57
Hi

On Tue, Feb 13, 2007 at 03:36:34PM -0500, Rich Felker wrote:
[...]
> > > > Are two-letter codes allowed at all?
> > > 
> > > I'd say no
> > 
> > So ISO 3166 is out, too?
> 
> I'm against 2-letter codes. The number of languages is way too large
> for these codes to be remotely sufficient.

ISO 3166 is about country codes, and about half of the codes from the
2-letter codespace seem to be used or reserved in some way ...


> 
> > OK.  Proposed new description:
> > 
> >     "Language"
> >         An ISO 639-2 (three-letter) language code, e.g. "eng" for English
> >         (see <http://www.loc.gov/standards/iso639-2/php/code_list.php>).
> >         All codes defined in ISO 639-2 are allowed, including "und"
> >         (Undetermined), "mul" (Multiple languages) and the bibliographic/
> >         terminology variants.
> >         For historical reasons, demuxers MUST treat "multi" like "mul" and
> >         "" (the empty string) like "und".
> 
> Historical reasons?? There are no such files, and this is a draft
> (albeit frozen) spec. I don't see any way that translating "multi" to
> "mul" and "" to "und" would improve functionality over just treating
> them as an unexpected value. If there's cruft in the spec that can be
> removed without really hurting anything, I'd like to remove it.

Well then, let's add a
"a muxer MUST ignore unknown language and country codes instead of
treating them as an error"

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Good people do not need laws to tell them to act responsibly, while bad
people will find a way around the laws. -- Plato

Re: questions about "Language" info packets
2007-02-13 15:47:37
Hi

On Tue, Feb 13, 2007 at 03:38:28PM +0100, Luca Barbato wrote:
> Michael Niedermayer wrote:
> 
> > 
> > opinions?, comments? 
> > 
> 
> Fine by me.
> 
> Who is going to update the spec with this clarification?

Well, I guess I will, after we agree on what exactly to do.


> 
> Maybe it is better to produce an addendum/errata.

Maybe adding a history chapter to the spec that records such
non-trivial changes would do?

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is not once nor twice but times without number that the same ideas
make their appearance in the world. -- Aristotle

Re: questions about "Language" info packets
2007-02-14 00:13:43
On Tue, Feb 13, 2007 at 10:44:57PM +0100, Michael Niedermayer wrote:
> > > OK.  Proposed new description:
> > > 
> > >     "Language"
> > >         An ISO 639-2 (three-letter) language code, e.g. "eng" for English
> > >         (see <http://www.loc.gov/standards/iso639-2/php/code_list.php>).
> > >         All codes defined in ISO 639-2 are allowed, including "und"
> > >         (Undetermined), "mul" (Multiple languages) and the bibliographic/
> > >         terminology variants.
> > >         For historical reasons, demuxers MUST treat "multi" like "mul" and
> > >         "" (the empty string) like "und".
> > 
> > Historical reasons?? There are no such files, and this is a draft
> > (albeit frozen) spec. I don't see any way that translating "multi" to
> > "mul" and "" to "und" would improve functionality over just treating
> > them as an unexpected value. If there's cruft in the spec that can be
> > removed without really hurting anything, I'd like to remove it.
> 
> Well then, let's add a
> "a muxer MUST ignore unknown language and country codes instead of
> treating them as an error"

Certainly. It's almost essential from a practical standpoint anyway,
since (I suppose... am I wrong?) language codes could be added to
639-2 after your implementation was released, making your
implementation suddenly become non-compliant if you rejected them.

Anyway, from a usability standpoint, I think the important feature is
that a piece of software, when searching for a given (known) language,
is able to find such a stream if one exists. This doesn't require any
semantic interpretation of the codes, just an agreement on which codes
will be used.

Rich
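
A rough sketch of the lookup described above, written in Python purely for illustration. It assumes streams are represented as a list of (stream id, language code) pairs, which is a made-up representation and not a NUT data structure; find_stream is a hypothetical name. The comparison is a plain string match, so values the software has never heard of are skipped rather than rejected.

```python
from typing import Optional, Sequence, Tuple


def find_stream(streams: Sequence[Tuple[int, str]], wanted: str) -> Optional[int]:
    """Return the id of the first stream tagged with `wanted`, or None."""
    for stream_id, language in streams:
        if language == wanted:  # exact match, no semantic interpretation
            return stream_id
    return None


# Example data is invented; "zzz" stands in for a code unknown to the reader.
streams = [(0, "und"), (1, "zzz"), (2, "eng")]
print(find_stream(streams, "eng"))  # 2
print(find_stream(streams, "deu"))  # None
```

The unknown value on stream 1 neither matches nor raises, which is the behaviour the "ignore unknown codes" rule asks for.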

Re: questions about "Language" info packets
2007-02-14 02:32:48
Michael Niedermayer wrote:
> On Tue, Feb 13, 2007 at 03:36:34PM -0500, Rich Felker wrote:
> [...]
> > > So ISO 3166 is out, too?
> > 
> > I'm against 2-letter codes. The number of languages is way too large
> > for these codes to be remotely sufficient.
> 
> ISO 3166 is about country codes, and about half of the codes from the
> 2-letter codespace seem to be used or reserved in some way ...

But should three-letter country codes be allowed?
In that case, how should the entire language string be formatted?
Something like "lll-ccc" where both "lll" and "-ccc" are optional?

> > >     For historical reasons, demuxers MUST treat "multi" like "mul" and
> > >     "" (the empty string) like "und".
> > 
> > Historical reasons?? There are no such files, and this is a draft
> > (albeit frozen) spec.

Well, I interpreted "frozen" to mean that no incompatible changes could
be made at all ...

> > I don't see any way that translating "multi" to "mul" and "" to "und"
> > would improve functionality over just treating them as an unexpected
> > value. If there's cruft in the spec that can be removed without really
> > hurting anything, I'd like to remove it.
> 
> Well then, let's add a
> "a muxer MUST ignore unknown language and country codes instead of
> treating them as an error"

Agreed.


Regards,
Clemens
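
To make the formatting question above concrete, here is a sketch of a parser for the "lll-ccc" form, where both parts are optional. The format is only a suggestion made in this message, not something the spec defines, and parse_language_tag is a hypothetical name. Input that does not fit the pattern is reported as unknown instead of raising, in line with ignoring unexpected values.

```python
import re
from typing import Optional, Tuple

# "lll" = three lowercase letters (language); "-ccc" = hyphen plus three
# letters (country). Both parts are optional, as suggested in the message.
# Accepting either letter case for the country part is an assumption.
_TAG = re.compile(r"([a-z]{3})?(?:-([A-Za-z]{3}))?")


def parse_language_tag(tag: str) -> Tuple[Optional[str], Optional[str]]:
    """Split "lll-ccc" into (language, country); missing parts are None."""
    match = _TAG.fullmatch(tag)
    if match is None:
        return (None, None)  # unrecognised form: ignore, do not raise
    return (match.group(1), match.group(2))
```

For example, parse_language_tag("eng") yields ("eng", None) and parse_language_tag("-USA") yields (None, "USA"); a caller cannot distinguish an empty string from an unparseable one in this sketch, which may or may not be acceptable.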