Thread: DO NOT REPLY New: - Patch for mod_autoindex to set the character set




DO NOT REPLY New: - Patch for mod_autoindex to set the character set
2007-04-12 12:12:56
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG·
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=42105>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND·
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=42105

           Summary: Patch for mod_autoindex to set the character set
           Product: Apache httpd-2
           Version: 2.3-HEAD
          Platform: All
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P2
         Component: mod_autoindex
        AssignedTo: bugshttpd.apache.org
        ReportedBy: poemlsuse.de


[Summarizing from the dev list here. See
http://marc.info/?l=apache-httpd-dev&m=117027634505806&w=2 and following posts.]

Users have a problem with directory listings generated by mod_autoindex:
it is not possible to control the character set with which the response
is marked. The server cannot know what the real encoding on disk is, so
it makes a very rough guess based on the OS it is running on, via
APR_HAS_UNICODE_FS, which is, as far (as little) as I looked, 1 on
Windows and 0 on Linux. Depending on it, mod_autoindex decides whether
to add a (fixed) charset to the content type:

#if APR_HAS_UNICODE_FS
    ap_set_content_type(r, "text/html;charset=utf-8");
#else
    ap_set_content_type(r, "text/html");
#endif

The thing is that Linux filesystems have stored UTF-8 encoded filenames
for ages, and since a system-wide UTF-8 locale is becoming more and more
widespread, filenames encoded as such are occurring much more frequently.
This means that on many servers the content type needs to be set
appropriately, so that the browser can display things correctly.

My first thought was to define APR_HAS_UNICODE_FS to 1, but that could
be just as wrong; it only means that the filesystem is Unicode capable,
not that the actual filenames happen to be encoded like that. Instead,
the right value depends on site-specific needs.

Thus, I think the right way is to make the character set configurable.
I am attaching a patch which adds an "AddDirectoryIndexCharset" directive
to the mod_autoindex configuration.

The patch actually removes the dependency on APR_HAS_UNICODE_FS. My
train of thought here is that utf-8 can (and should) be the default,
unless configured otherwise. This fits Windows (it has always been like
that), and it (largely) fits Linux. But I don't know about other
platforms.

On Thu, Feb 01, 2007 at 11:13:38AM -0600, William A. Rowe, Jr. wrote:
> Dr. Peter Poeml wrote:
> > On Thu, Feb 01, 2007 at 10:59:46 +0000, Joe Orton wrote:
> >> On Wed, Jan 31, 2007 at 09:45:12PM +0100, Dr. Peter Poeml wrote:
> >>> Users have a problem with directory listings generated by mod_autoindex:
> >>> It is not possible to control the character set with which the
> >>> response is marked.
> >> AddDefaultCharset does allow this already as you mention in the bug.
> >> Can't users who insist on using filenames using one encoding and file
> >> content using another simply use:
> >>
> >> AddDefaultCharset UTF-8
> >> AddCharset ISO-8859-1 .html
> >>
> >> or similar?
> >
> > I don't think so, because it means
> >  1) that all .html files would need to be ISO-8859-1
> >  2) you cannot have files with charset=somethingelse anymore
> >  3) all non-html files would need to be UTF-8 then, unless you add
> >     AddCharset directives for all of them...
>
> And you can't match by name.  I'm reviewing the patch, but I'll already
> offer a +1 on the concept.

On Thu, Feb 01, 2007 at 10:01:52PM +0100, Ruediger Pluem wrote:
> In the general case I agree with Joe that if things can be done with existing
> directives / code, no new directives / code should be added, but this case here
> is different.
>
> I think it is the ultimate duty of the content generator to set the correct
> content type / encoding. So in this case this would be mod_autoindex. Whether
> mod_autoindex detects this automatically or has a directive to set this is another
> story. Currently I would be in favour of a directive provided that there is
> no reliable and performant autodetection mechanism.
>
> From my point of view AddDefaultCharset and AddCharset should be used to
>
> - configure the "core content generator" of httpd (serving static files)
> - help fixing broken content generators who cannot set the encoding correctly
>   by themselves
>
> So +1 on the general concept.

Cool.

Here is the patch against trunk, with documentation added.

I hope I got the way of patching the documentation right. A review would
be very much appreciated.

Thanks,
Peter

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribehttpd.apache.org
For additional commands, e-mail: bugs-helphttpd.apache.org


DO NOT REPLY - Patch for mod_autoindex to set the character set
2007-08-30 16:45:51

------- Additional Comments From wroweapache.org  2007-08-30 14:45 -------
Something similar was created: IndexOptions now accepts Type=content/type
and Charset=foo, and this will be available in the next 2.0 and 2.2
releases of httpd.
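
A minimal configuration sketch of how that option might be used once it is
available. The directory path is hypothetical, and UTF-8 is only the right
value if the filenames on disk really are UTF-8 encoded:

```
<Directory "/srv/www/files">
    Options +Indexes
    # Label the generated directory listing as UTF-8.
    IndexOptions Charset=UTF-8
</Directory>
```

Because the charset is set per directory, a site with legacy filenames in
another encoding could presumably give a different Charset value there.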

It is a bit premature to presume UTF-8 on unix-ish systems, because by
definition their filenames are bytestreams. That said, OS X made it
explicit that filenames are UTF-8, so we follow your suggestion on at
least one 'unix'.

Thank you for your report!



DO NOT REPLY - Patch for mod_autoindex to set the character set
2007-08-30 16:46:10

wroweapache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED





