Thread: locale2lang-0.1 - BUG + fix: fallback to language_only extracted from Accept-Language h




user name
2007-08-30 18:59:59
hi samizdat-devel,

i think the bug + fix below could solve a practical problem for many
non-English speaking indymedia collectives or other independent media
groups: "activist spam", which someone posts as identical articles, in
English, on several dozen different local indymedia sites. This sort
of article is sometimes serious and sometimes more like conspiracy
theory, but AFAIK the people doing it usually have "en-US" in their
browser HTTP Accept-Language header. If the mono option is enabled
by the sysadmin and the user chooses this option:
  https://savannah.nongnu.org/patch/?6167
and if his/her preferred language is non-English, then s/he will
not even notice the presence of the "activist spam" article.

This could mean that less intervention, or less urgent intervention,
is needed by moderators (depending on the editorial policy, of course):
the decision about what to read (ignoring non-preferred languages
rather than just not preferring them) is made by the reader, not by
an editorial collective de facto deciding on behalf of the whole local
activist community. (Of course, ignoring real spam is not a good idea.
For that we have the Antispam class in antispam.rb.)

Anyway, read on if you're interested. 

cheers
boud



[bug #20932] locale2lang-0.1 - BUG + fix: fallback to language_only
     extracted from Accept-Language http header is needed


URL:
   <http://savannah.nongnu.org/bugs/?20932>

                  Summary: locale2lang-0.1 - BUG + fix: fallback to
language_only extracted from Accept-Language http header is needed
                  Project: Samizdat
             Submitted by: boud
             Submitted on: Wednesday 08/29/2007 at 22:04
                 Category: None
                 Severity: 3 - Normal
                   Status: Works For Me
                  Privacy: Public
              Assigned to: None
              Open/Closed: Open
          Discussion Lock: Any

    
_______________________________________________________

Details:

PROBLEM:
Even though RFC 2616 suggests that user agents (e.g. firefox) advise
their users to add a generic fallback language without a country code
(e.g. "en" in addition to "en-US"), in practice most users do not do this.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4

In particular, for non-English language samizdat sites, this
means that people whose browser sends only "en-US" end up getting
the default local language of the site. Their article then gets
published with message.language = the local language, not "en",
since formally speaking they state that they prefer "en-US" to the
local language, but they do not state any interest in "en".
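
The behaviour described above can be reproduced with a small standalone
sketch. It reuses the scan/sort expression that appears in the request.rb
patch in this report; the method name matched_languages and the sample
config_lang list are assumptions made for illustration, not Samizdat code.

```ruby
# Standalone sketch of exact-match Accept-Language handling.
# `matched_languages` is a hypothetical helper, not a Samizdat method.
def matched_languages(accept, config_lang)
  accept_language = []
  accept.scan(/([^ ,;]+)(?:;q=([^ ,;]+))?/).collect {|l, q|
    [l, (q ? q.to_f : 1.0)]          # a missing q value defaults to 1.0
  }.sort_by {|l, q| -q }.each {|l, q|
    # exact string match only: "en-US" never matches a configured "en"
    accept_language.push l if config_lang.include? l
  }
  accept_language
end

config_lang = ['pl', 'en']   # assumed site configuration, local language first

exact   = matched_languages('en-US', config_lang)           # => []
with_en = matched_languages('en-US,en;q=0.8', config_lang)  # => ["en"]
```

With an empty result the site falls back to its default language, which is
how an English article ends up tagged with the local language.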

This implies that if people want to add a local translation
rather than hiding an article, then moderator intervention
is required to change the language (unless the user chose
open editing).

Moreover, the monolanguage patch
https://savannah.nongnu.org/patch/?6167 (still under development)
will fail to exclude this type of article under the mono option,
since their language is wrongly tagged (except under a pedantic
interpretation of their request).

For these reasons, i'm putting this as a bug (with a proposed
fix) rather than a patch.

PROPOSED SOLUTION:
This requires a reasonably modern version of ruby gettext, e.g.
debian 1.7.0-1 or later. Copying gettext/locale_object.rb
into an older installation and using an appropriate require
statement is a hack to avoid a full installation of a recent
gettext.

The idea is that if a requested accept-language in the list
is not found, then parse off the language part of it and try
that instead. This could potentially create multiple entries
of the same language, but i suspect that shouldn't be a problem.
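
As a rough, dependency-free illustration of this idea, the sketch below
takes the part of the tag before the first "-" instead of calling
Locale::Object from gettext, and removes duplicate entries with uniq.
Both choices, the method name and the sample config_lang are assumptions
for illustration; the actual proposed change is the patch in this report.

```ruby
# Hypothetical helper: exact match first, then fall back to the primary
# language subtag (the part before the first "-").
def matched_languages_with_fallback(accept, config_lang)
  accept_language = []
  accept.scan(/([^ ,;]+)(?:;q=([^ ,;]+))?/).collect {|l, q|
    [l, (q ? q.to_f : 1.0)]
  }.sort_by {|l, q| -q }.each {|l, q|
    if config_lang.include? l
      accept_language.push l
    else
      # a plain string split stands in for Locale::Object.new(l).language
      lang_only = l.split('-').first
      accept_language.push lang_only if config_lang.include? lang_only
    end
  }
  accept_language.uniq   # drop repeated entries of the same language
end

config_lang = ['pl', 'en']   # assumed site configuration

fallback = matched_languages_with_fallback('en-US', config_lang)           # => ["en"]
deduped  = matched_languages_with_fallback('en-US,en;q=0.8', config_lang)  # => ["en"]
```

Without the uniq call the second request would yield "en" twice, which is
the harmless duplication mentioned above.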


--- s070818/samizdat/lib/samizdat/engine/request.rb     2007-08-14 01:16:53.000000000 +0200
+++ /usr/lib/ruby/1.8/samizdat/engine/request.rb        2007-08-29 23:02:06.869866760 +0200
@@ -165,8 +173,17 @@
         accept.scan(/([^ ,;]+)(?:;q=([^ ,;]+))?/).collect {|l, q|
           [l, (q ? q.to_f : 1.0)]
         }.sort_by {|l, q| -q }.each {|l, q|
-        accept_language.push l if config_lang.include? l
+#        accept_language.push l if config_lang.include? l
+        if config_lang.include? l
+          accept_language.push l
+        else
+          # try converting full locale (language tag) to ISO-639 language only
+          # http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4
+          lang_only = Locale::Object.new(l).language
+          accept_language.push lang_only if config_lang.include? lang_only
+        end
         }
+
       # lang cookie overrides Accept-Language
       lang = cookie('lang') and config_lang.include? lang and
         accept_language.unshift lang


FUTURE EXTENSIONS:
The relations between human languages, and how close or distant
they are, are well studied. A measure of the distance between
different languages could potentially be used as a backup to
find the likely closest language that a user would prefer,
rather than just taking what is considered the "language"
part of the locale/Accept-Language string.

Since the "narratives" which claim different national identities
often try to claim sharp distinctions between closely related
languages, this could potentially be quite a politically
sensitive issue. This is not surprising, and is not IMHO an
argument against doing this: an RDF engine specifically aimed
at grassroots, non-authoritarian media is necessarily going
to challenge artificial linguistic barriers if it's to get
somewhere near doing its task.

In any case, users with their own notions of language preferences
would still be able to state this by all the presently available
methods; a language metric would only be used as a fallback.
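
One possible shape for such a fallback, as a sketch only: the RELATED
table below is a made-up placeholder rather than a measured linguistic
distance, and closest_configured is a hypothetical helper that does not
exist in Samizdat.

```ruby
# Illustrative only: this table is a placeholder, not real distance data.
# Each key maps to candidate languages ordered from closest to farthest.
RELATED = {
  'nn' => ['nb', 'da', 'sv'],
  'gl' => ['pt', 'es'],
}

# Hypothetical helper: return the first configured language listed as
# close to `lang`, or nil when no listed candidate is configured.
def closest_configured(lang, config_lang, related = RELATED)
  (related[lang] || []).find {|candidate| config_lang.include? candidate }
end

near    = closest_configured('gl', ['en', 'es'])  # => "es"
unknown = closest_configured('xx', ['en', 'es'])  # => nil
```

Such a lookup would only run after the exact match and the language-only
match have both failed, so explicit user preferences always win.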

COMMENT:
The Locale:: module could probably also be used to check the
config files for valid languages and warn about invalid
languages/locales.
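
A minimal sketch of such a check, assuming a purely syntactic test is
enough: instead of the Locale:: module it uses a regular expression for
the language-tag grammar of RFC 2616 section 3.10, so it catches malformed
tags but not well-formed codes that are unassigned. The helper name
check_config_languages is hypothetical.

```ruby
# RFC 2616 section 3.10: language-tag = 1*8ALPHA *( "-" 1*8ALPHA )
LANGUAGE_TAG = /\A[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*\z/

# Hypothetical helper: warn about malformed tags, return the well-formed ones.
def check_config_languages(config_lang)
  valid, invalid = config_lang.partition {|l| l =~ LANGUAGE_TAG }
  invalid.each {|l| warn "invalid language tag in config: #{l.inspect}" }
  valid
end

valid = check_config_languages(['pl', 'en-US', 'en_US', ''])  # => ["pl", "en-US"]
```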



    
_______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Wednesday 08/29/2007 at 22:04  Name: 070829_locale2lang-0.1  Size: 997B
   By: boud

<http://savannah.nongnu.org/bugs/download.php?file_id=13832>

    
_______________________________________________________

Reply to this item at:

   <http://savannah.nongnu.org/bugs/?20932>

_______________________________________________
   Message sent via/by Savannah
   http://savannah.nongnu.org/



_______________________________________________
samizdat-devel mailing list
samizdat-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/samizdat-devel
