Thread: Lang attribute and "old" latin
|
|
| Lang attribute and "old" latin |
  United States |
2008-04-24 19:04:13 |
All,

As far as I know, current screen reading technology only supports a
limited number of languages.

I am in the process of reviewing a number of web documents that feature,
in part, a fair bit of "old Latin" (circa 13th century - it's a cool
academic project). At any rate, W3C guidance states "Clearly identify
changes in the natural language of a document's text and any text
equivalents (e.g., captions)." *AND* the ISO code for Latin is either
"LA" (ISO 639-1) or "LAT" (ISO 639-2), so clearly this *CAN* be done.

As well, Wikipedia suggests that "Screen readers without Unicode support
will read a character outside Latin-1 as a question mark, and even in
the latest version of JAWS, the most popular screen reader, Unicode
characters are very difficult to read." (Is this true? I was not aware
of this. The document often uses þ throughout this old Latin text - is
this going to be an issue?)

The question is, is there any real advantage gained by adding this
information (lang="lat") to the content? It is/would be a huge
undertaking, and if *not* done is pedantically/dogmatically wrong (fails
WCAG P1 4.1); however, I am at a loss to explain any real value in doing
it to the client, as at the end of the day I cannot myself find a "real
justification" that would improve the accessibility of the document.

Thoughts, arguments (either side) and other support gratefully accepted.

Cheers!

JF
|
|
| Re: Lang attribute and "old"
latin |
  China |
2008-04-24 20:24:27 |
On Fri, 25 Apr 2008 08:04:13 +0800, John Foliot - Stanford Online
Accessibility Program <jfoliot stanford.edu> wrote:

> As far as I know, current screen reading technology only supports a
> limited number of languages.
>
> I am in the process of reviewing a number of web documents that feature,
> in part, a fair bit of "old Latin" (circa 13th century - it's a cool
> academic project).
...
> The document often uses þ throughout this old Latin text - is this
> going to be an issue?)
...
> The question is, is there any real advantage gained by adding this
> information (lang="lat") to the content?

Depends on a lot of things. If nothing has special handling for Latin,
then no. So what might have special handling?

1. Screenreaders - pronunciation has changed (whereas I would be a little
   surprised to learn of any particular peculiarities in braille for
   Latin - anyone actually know?).
2. Dictionary lookup - if this is an academic project then it might be
   more important for other reasons than for accessibility reasons. (It
   is possible to machine-process words or even phrases in various useful
   ways, e.g. for machine translation. It is significantly more
   successful if you know for sure what language you are dealing with).
3. Typesetting

If you think none of these apply for the life of the data in the form you
mark it, and that people will be able to mark it post-factum if something
does come up, then no, there is no real benefit I can see. It depends on
*how* much of an undertaking it is - hand-coding language tags would be an
idiotic way to do it, but it may be easy to create a simple method for
doing it in a WYSIWYG style, which may make it worthwhile. Or it may be
totally infeasible, which means you take the "we didn't do this" until
someone comes along with the ability to make the use case justify the
effort (by changing one or both of those).

cheers

Chaals

--
Charles McCathieNevile  Opera Software, Standards Group
je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals  Try Opera 9.5: http://snapshot.opera.com
|
|
| Re: Lang attribute and "old"
latin |

|
2008-04-25 01:12:11 |
John Foliot - Stanford Online Accessibility Program wrote:

> As far as I know, current screen reading technology only supports a
> limited number of languages.

Rather limited, I'm afraid. Moreover, support for language switching on
the basis of language markup (lang or xml:lang attributes) is much more
limited.

In practical terms, using language markup at the top level (<html> or
<body> element) is a good move: it takes a very small effort, and it
helps some people. (But then it should be _correct_. It often isn't, so
e.g. Google does not use the information.)

Using language markup at other markup levels, e.g. for individual
paragraphs or even words, is rather pointless, sad to say. There isn't
much support worth mentioning. (I use it, but mostly as a matter of
principle, or habit, and not very consistently. Many W3C pages,
including pages that declare that it should be used, don't use it. Most
web pages don't even make a try, so what motivation is there for
software developers to support it?)

That's the big picture. In details, there's a lot that could be said,
especially about the problems, but this doesn't seem to be an
interesting topic to most people. However, mostly for "academic"
interest, I'll comment on your specific issues:

> I am in the process of reviewing a number of web documents that
> feature, in part, a fair bit of "old Latin" (circa 13th century -
> it's a cool academic project).

I took "old" Latin as referring to pre-classic Latin... Anyway, there's
no useful standardized way to distinguish between different forms of
Latin in language codes. You could use country codes, e.g. "la-GB" to
refer to Latin as used in the United Kingdom, but this would be
anachronistic for 13th century language and also useless.

> At any rate, W3C guidance states
> "Clearly identify changes in the natural language of a document's
> text and any text equivalents (e.g., captions)."

I'm afraid nobody, including the W3C, takes that seriously. It's just
too much trouble with little if any tangible benefit. It's based on
theoretical ideas - largely, law, poorly analyzed ideas - on the
_possible_ usefulness of language markup, rather than actual experience.

> *AND* the ISO code
> for Latin is either "LA" (ISO 639-1) or "LAT" (ISO 639-2) so clearly
> this *CAN* be done.

The technically correct language code for use in markup is "la", with
lowercase as the recommended spelling. HTML and XML specifications refer
to specifications that mandate the use of two-letter codes for languages
that have one.
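
As an illustration of the "two-letter code where one exists, in
lowercase" rule, here is a minimal Python sketch. The lookup table and
the function name preferred_lang_tag are assumptions made up for this
example; a real tool would need the full ISO 639 registry rather than
four hand-picked entries.

```python
# Hypothetical lookup table: a few ISO 639-2 three-letter codes and the
# ISO 639-1 two-letter codes that exist for the same languages.
THREE_TO_TWO = {
    "lat": "la",  # Latin
    "eng": "en",  # English
    "fra": "fr",  # French
    "nld": "nl",  # Dutch
}


def preferred_lang_tag(code: str) -> str:
    """Return the value to put in a lang attribute for the given code.

    The primary language subtag is lowercased and, if it is a known
    three-letter code, replaced by its two-letter equivalent. Anything
    after the first hyphen (e.g. a country subtag) is left untouched.
    """
    primary, sep, rest = code.strip().partition("-")
    primary = primary.lower()
    primary = THREE_TO_TWO.get(primary, primary)
    return primary + sep + rest


print(preferred_lang_tag("LAT"))
print(preferred_lang_tag("la"))
```

Both calls should yield "la", i.e. the markup would read lang="la"
rather than lang="lat".
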
> As well, wikipedia suggests that "Screen readers without Unicode
> support will read a character outside Latin-1 as a question mark,

Character support is a different issue and should not depend on language
markup, and mostly doesn't.

Generally, in special software like screen readers or specialized
browsers, we should expect character support to be more restricted than
in common modern browsers. Even Latin-1 isn't as safe as in "normal"
browsing. For example, what would a screen reader do upon encountering a
special character like "¶"? Would it recognize it as having a special
meaning (paragraph separator) and make a pause? Hardly. It probably
spells it out. This might mean saying "pilcrow sign", perhaps
independently of language being used (since character names aren't
widely localized - most characters don't even _have_ a name in most
languages), which might be complete gibberish even to people who
understand normal English.
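
To see which characters in a text fall inside or outside Latin-1, and
what their Unicode names are, something like the following Python sketch
could be used. The function name describe_non_ascii and the sample
string are assumptions for illustration only (the long s is included
simply as an example of a character outside Latin-1), and the sketch
says nothing about what any particular screen reader actually speaks.

```python
import unicodedata


def describe_non_ascii(text):
    """List each distinct non-ASCII character in text.

    Returns tuples of (character, code point, Unicode name, in_latin1),
    sorted by code point. Latin-1 covers code points up to U+00FF.
    """
    rows = []
    for ch in sorted(set(text)):
        if ord(ch) > 0x7F:
            rows.append((
                ch,
                f"U+{ord(ch):04X}",
                unicodedata.name(ch, "UNKNOWN"),
                ord(ch) <= 0xFF,
            ))
    return rows


# Thorn and pilcrow are inside Latin-1; the long s is not.
for row in describe_non_ascii("þ ¶ ſ"):
    print(row)
```

Note that þ (U+00FE) is itself a Latin-1 character, so the "outside
Latin-1" concern would not apply to it specifically.
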
> The question is, is there any real advantage gained by adding this
> information (lang="lat") to the content?

Very little, if any. But if used, it should be lang="la".

> I am at a loss to explain any real value
> in doing it to the client as at the end of the day I cannot myself
> find a "real justification" that would improve the accessibility of
> the document.

The best explanation that I could use (if someone offered to pay me for
adding such markup and I needed to soup up "internal" and "moral"
motivation) is the following (and it's lame, so this tells a lot):

If a user opens your HTML page in a word processor like Microsoft Word,
it will use the language markup, and this can be relevant when spelling
checks are "on", i.e. words classified as misspelled are highlighted.
Declaring Latin words as Latin prevents the program from applying
English spelling rules to them. (The copy of Word I just tested seems to
be Latin-ignorant. That is, it recognizes the words as being in Latin but
does not flag anything as misspelled and does not even hyphenate Latin
words. But even this is probably better than treating them as English or
some other language.)

On some browsers, like Firefox, the user can right-click on a word and
get information about its language. Sometimes it is useful to know that
a word is Latin. (But what are the odds that a user knows about such
functionality?)

Style sheets, either page or user style sheets, could be used to style
words in a particular language as different from others, using a
selector like [lang="la"] or :lang(la). However, this does not work e.g.
on IE 6, which does not recognize such selectors.

Moreover, some day some browsers or other software could make real use
of the markup.

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
|
|
| Re: Lang attribute and "old"
latin |

|
2008-04-25 02:09:18 |
Suppose that someone was thinking about incorporating support for
Latin in screenreaders and went crawling the web to find how much it
was used. If it wasn't marked (but "modern" languages often are), they
would conclude that there wasn't any. A chicken and egg problem.

If the rest of the text is in correct English, perhaps a quick spell
check would highlight where the non-English text occurs, making it
easier to mark up.
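
The spell-check idea could be sketched roughly as follows in Python.
Everything here is an assumption for illustration: the tiny
ENGLISH_WORDS set stands in for a real dictionary, the function name
non_english_runs is invented, and the sample sentence is made up. The
sketch only flags runs of consecutive unknown words as candidates for
lang markup; a person would still have to confirm each one.

```python
import re

# Tiny stand-in for a real English word list (an assumption for this sketch).
ENGLISH_WORDS = {
    "the", "charter", "begins", "with", "words", "and", "then",
    "continues", "in", "english",
}


def non_english_runs(text, known_words, min_run=2):
    """Return runs of at least min_run consecutive words not in known_words."""
    runs, current = [], []
    # [^\W\d_]+ matches runs of letters, including non-ASCII letters.
    for word in re.findall(r"[^\W\d_]+", text):
        if word.lower() in known_words:
            if len(current) >= min_run:
                runs.append(" ".join(current))
            current = []
        else:
            current.append(word)
    if len(current) >= min_run:
        runs.append(" ".join(current))
    return runs


sample = ("The charter begins with the words sciant presentes et futuri "
          "and then continues in English")
print(non_english_runs(sample, ENGLISH_WORDS))
```

With this sample the only run reported should be the four Latin words.
Isolated unknown words (names, typos) are ignored by the min_run
threshold, which is a design choice of the sketch, not a rule.
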
regards,
--
Alan Chuter,
Senior Web Accessibility Consultant, Technosite
(www.technosite.es)
Researcher, Inredis Project (www.inredis.es/)
Email: achuter technosite.es
Alternative email: achuter.technosite yahoo.com
Blogs: www.blogger.com/profile/09119760634682340619
|
|
| Re: Lang attribute and "old"
latin |
  United Kingdom |
2008-04-25 02:52:27 |
John Foliot - Stanford Online Accessibility Program wrote:

> As far as I know, current screen reading technology only supports a
> limited number of languages.

[snip]

> The question is, is there any real advantage gained by adding this
> information (lang="lat") to the content?

Even if there were no speech synthesis available for a language, screen
readers like JAWS can announce language changes and users can associate
particular voice configurations with particular languages.

As it happens, it looks like Classical Latin is among the MBROLA voices:

http://tcts.fpms.ac.be/synthesis/mbrola.html

It is therefore (at least theoretically) usable with at least some
screen readers and text-to-speech software, e.g. NVDA, FreeTTS (used by
FireVox), and Emacspeak:

http://www.nvda.fr/spip.php?article14
http://mambo.ucsc.edu/psl/mbrola/
http://web.mit.edu/ATIC/src/emacspeak-9.0/mbrola

--
Benjamin Hawkes-Lewis
|
|
| Re: Lang attribute and "old"
latin |
  United Kingdom |
2008-04-25 02:58:41 |
Jukka K. Korpela wrote:

> Using language markup at other markup levels, e.g. for individual
> paragraphs or even words, is rather pointless, sad to say. There isn't
> much support worth mentioning. (I use it, but mostly as a matter of
> principle, or habit, and not very consistently. Many W3C pages,
> including pages that declare that it should be used, don't use it. Most
> web pages don't even make a try, so what motivation is there for
> software developers to support it?)

Software developers /do/ support it. JAWS, for example, can switch
voices inline based on the LANG attribute.

> For example, what would a screen reader do upon encountering a
> special character like "¶"?

Depends on its configuration.

> Style sheets, either page or user style sheets, could be used to style
> words in a particular language as different from others, using a
> selector like [lang="la"] or :lang(la). However, this does not work e.g.
> on IE 6, which does not recognize such selectors.

If you're going to the trouble of adding lang attributes, you could add
class attributes for IE6 backwards compatibility at the same time.
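
A rough Python sketch of adding such a class next to each lang
attribute is shown below. The function name add_lang_class and the
"lang-" class prefix are assumptions for this example. The approach is
deliberately naive: it is a regular-expression pass, not real HTML
parsing, so it would also touch lang="..." appearing in ordinary text
and would produce a second class attribute on elements that already
have one.

```python
import re

# Matches lang="xx" or lang='xx' when preceded by whitespace, so that
# xml:lang (preceded by a colon) is left alone.
LANG_ATTR = re.compile(r"""(?<=\s)lang=(["'])([A-Za-z-]+)\1""")


def add_lang_class(html: str, prefix: str = "lang-") -> str:
    """Append class="<prefix>xx" right after each lang="xx" attribute."""
    def repl(match):
        return f'{match.group(0)} class="{prefix}{match.group(2).lower()}"'
    return LANG_ATTR.sub(repl, html)


print(add_lang_class('<span lang="la">sciant presentes</span>'))
```

A style sheet could then use a class selector such as .lang-la where
[lang="la"] or :lang(la) is not recognized.
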
--
Benjamin Hawkes-Lewis
|
|
| Re: Lang attribute and "old"
latin |
  Belgium |
2008-04-25 08:31:34 |
At 08:11 25/04/2008, Jukka K. Korpela wrote:
>John Foliot wrote:
>
> > As far as I know, current screen reading technology only supports a
> > limited number of languages.
>
>Rather limited, I'm afraid.

It is indeed limited. See also the old thread (April 2005) starting at
<http://lists.w3.org/Archives/Public/w3c-wai-gl/2005AprJun/0097.html>.
However, the number of languages supported by, for example, JAWS, is not
limited to the list at
<http://www.freedomscientific.com/fs_products/software_jawsinfo.asp>.
Local distributors, for example Freedom Scientific Benelux, can deliver a
JAWS version with a speech synthesizer for Dutch.
For a version that supports Latin, I would contact
Freedom Scientific Vatican City.

>Moreover, support to language switching on
>the basis of language markup (lang or xml:lang attributes) is much more
>limited.

In the tests I did with JAWS last year, language switching worked with
lang, but xml:lang was ignored.
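
Given that observation, one might want to check pages for elements that
carry xml:lang but no plain lang attribute. The following Python sketch
does that with the standard library's html.parser. The class name
XmlLangOnlyFinder and the sample markup are assumptions for
illustration, and the check reflects only the JAWS behaviour described
above, not a statement about other screen readers.

```python
from html.parser import HTMLParser


class XmlLangOnlyFinder(HTMLParser):
    """Collect start tags that carry xml:lang but no plain lang attribute."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (tag name, xml:lang value)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        names = dict(attrs)
        if "xml:lang" in names and "lang" not in names:
            self.findings.append((tag, names["xml:lang"]))


finder = XmlLangOnlyFinder()
finder.feed(
    '<p xml:lang="la">sciant</p>'
    '<p lang="la" xml:lang="la">presentes</p>'
)
print(finder.findings)
```

Only the first paragraph should be reported, since the second one has
both attributes.
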
Language subcodes may not work as expected in some screen readers
(based on my tests with JAWS; I tried to collect data for other screen
readers, without success; see
<http://lists.w3.org/Archives/Public/w3c-wai-ig/2008JanMar/0041.html>,
test data are still welcome).

>In practical terms, using language markup at the top level (<html> or
><body> element) is a good move: it takes a very small effort, and it
>helps some people. (But then it should be _correct_. It often isn't, so
>e.g. Google does not use the information.)

Even when the language markup is correct, Google does not
necessarily use that information. I have found webpages in Dutch with
correct language markup that still show up in the results when I
explicitly ask Google to return only pages in English.

>Using language markup at other markup levels, e.g. for individual
>paragraphs or even words, is rather pointless, sad to say. There isn't
>much support worth mentioning. (I use it, but mostly as a matter of
>principle, or habit, and not very consistently. Many W3C pages,
>including pages that declare that it should be used, don't use it. Most
>web pages don't even make a try, so what motivation is there for
>software developers to support it?)

What is the threshold for "not much support"?
Using the same threshold, one might arrive at the conclusion that
the percentage of screen reader users is so low that there is
"not much need" for markup that benefits screen reader users.
(I'm not accusing anyone on these lists, but see the comments
by some of the anonymous cowards at
<http://www.computerworld.com/comments/node/9077118?page=2>.)

>That's the big picture. In details, there's a lot that could be said,
>especially about the problems, but this doesn't seem to be an
>interesting topic to most people.

Just like global warming. That doesn't mean it's not important.
(Global warming affects more people than web accessibility,
and still most people don't care enough to change their behaviour.)

>However, mostly for "academic"
>interest, I'll comment on your specific issues:
>(...)
> > At any rate, W3C guidance states
> > "Clearly identify changes in the natural language of a document's
> > text and any text equivalents (e.g., captions)."
>
>I'm afraid nobody, including the W3C, takes that seriously. It's just
>too much trouble with little if any tangible benefit. It's based on
>theoretical ideas - largely, law, poorly analyzed ideas - on the
>_possible_ usefulness of language markup, rather than actual experience.

I guess Online Video Killed the Accessibility Star.
Most tutorials on captioning are in English and all too many accessibility
tutorials in English (on captioning or any other subject) pretend that
all documents are monolingual. By extension, they assume the same for video.
(Some captioning formats actually have codes for language switching,
but if you don't know where to look, you can waste a lot of time
searching for that information.)

>(...)

Best regards,

Christophe

---
Please don't invite me to LinkedIn, Facebook, Quechup or other
"social networks". You may have agreed to their "privacy policy", but
I haven't.

--
Christophe Strobbe
K.U.Leuven - Dept. of Electrical Engineering - SCD
Research Group on Document Architectures
Kasteelpark Arenberg 10 bus 2442
B-3001 Leuven-Heverlee
BELGIUM
tel: +32 16 32 85 51
http://www.docarch.be/
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
|
|