Thread: file URL is overspecified




file URL is overspecified
Poland
2007-06-15 05:50:14

The URI reference in HTML 4 points to RFC 2396, which has been obsoleted by RFC 3986.  The latter document has a new section 2.5, "Identifying Data", containing the following new material:

URI characters provide identifying data for each of the URI components, serving as an external interface for identification between systems.  Although the presence and nature of the URI production interface is hidden from clients that use its URIs (and is thus beyond the scope of the interoperability requirements defined by this specification), it is a frequent source of confusion and errors in the interpretation of URI character issues.  Implementers have to be aware that there are multiple character encodings involved in the production and transmission of URIs: local name and data encoding, public interface encoding, URI character encoding, data format encoding, and protocol encoding.

Local names, such as file system names, are stored with a local character encoding.  URI producing applications (e.g., origin servers) will typically use the local encoding as the basis for producing meaningful names.  The URI producer will transform the local encoding to one that is suitable for a public interface and then transform the public interface encoding into the restricted set of URI characters (reserved, unreserved, and percent-encodings). Those characters are, in turn, encoded as octets to be used as a reference within a data format (e.g., a document charset), and such data formats are often subsequently encoded for transmission over Internet protocols.

The new statements above are slightly incompatible with what the HTML specification says about URI encoding:

URIs do not contain non-ASCII values

That statement is true for what the RFC calls the "public interface encoding": it seems reasonable that the user agent should use a URL when it requests an external resource. However, requiring HTML documents to use a public URI for resources that the user agent is expected to serve without communicating with an external server, such as local files identified using the file scheme, seems an excessive complication to me.  Internet Explorer does not respect this prohibition because it uses IRIs, not URIs, internally, and converts them to URLs only when it needs to communicate with an external server.  If an external URL is percent-encoded in the source document, it is passed on unaltered because no further encoding is needed and the server is responsible for decoding; however, there is no server to decode a local URL, so a percent-encoded local URL remains unresolved.  That behavior does not comply with the current standard, but I think that in this case the implementation is right and the standard should allow some freedom with respect to local URLs.

Of course, one could always dismiss this with the argument that an HTML document containing a reference to a local resource cannot be published anyway and so can be authored as noncompliant.  However, this is only partially true.  The reason is that the prohibition of B.2.1 has propagated to the XSLT specification, which refers to it explicitly where it specifies how URI attributes should be transformed in html mode.  In effect, a document produced by a conforming XSLT processor for local usage is perfectly valid and perfectly useless: hyperlinks are broken and images do not show up.
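
To make the effect concrete, here is a minimal sketch of the escaping that B.2.1 recommends, as I read it: each non-ASCII character is converted to UTF-8 and every resulting byte is written as %HH.  The helper name escapeNonAscii and the file name are made up for illustration, and the sketch assumes well-formed input; it is not the code of any particular XSLT processor.

```typescript
// Sketch of the escaping recommended in HTML 4 section B.2.1:
// non-ASCII characters are converted to UTF-8 and each byte is written
// as %HH; ASCII characters (including existing % escapes) are left alone.
// `escapeNonAscii` is a hypothetical helper name, not a spec or library API.
function escapeNonAscii(uriAttribute: string): string {
  // In a well-formed string, a run of non-ASCII UTF-16 code units contains
  // only whole surrogate pairs, so the run can be encoded in one call.
  return uriAttribute.replace(/[^\x00-\x7F]+/g, (run) => encodeURIComponent(run));
}

// Hypothetical local file named "zażółć.html", written with escapes so the
// example does not depend on the encoding of this message.
const localIri = "file:///C:/dane/za\u017C\u00F3\u0142\u0107.html";
const escapedUri = escapeNonAscii(localIri);

console.log(escapedUri); // file:///C:/dane/za%C5%BC%C3%B3%C5%82%C4%87.html
```

The escaped form is what ends up in the href or src attribute of the generated page; according to the observation above, that is the form which fails to resolve for a local file.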

My suggestion: The constraints for URLs denoting local resources should be relaxed.

I understand that HTML 5 fixes this, which is perhaps the good news:

The href content attribute, if specified, must contain a URI (or IRI).

Best regards,

Christopher Yeleighton

RE: file URL is overspecified
United Kingdom
2007-06-15 07:11:31
>The reason is that the prohibition of B.2.1 propagated to the XSLT specification that refers to it explicitly where it specifies how URI attributes should be transformed in html mode.  In effect, a document produced by a conforming XSLT processor for local usage is perfectly valid and perfectly useless: hyperlinks are broken and images do not show up.
 
To help you get round the difference between what the HTML spec says and what current browsers do, XSLT 2.0 introduced the serialization parameter escape-uri-attributes="no", giving the XSLT author control over whether and which URIs in generated HTML pages are percent-encoded. Of course, this is only a small amelioration to this messy problem; but it helps.
 
Michael Kay


From: xsl-editors-requestw3.org [mailtosl-editors-requestw3.org] On Behalf Of Kristof Zelechovski
Sent: 15 June 2007 11:50
To: www-htmlw3.org
Cc: 'Tim Berners-Lee'; xsl-editorsw3.org; whatwgwhatwg.org
Subject: file URL is overspecified


RE: file URL is overspecified
Poland
2007-06-17 04:15:51

MSXML does not respect the escape-uri-attributes attribute.  It seems the best way to go in the Microsoft world is to use the XML output mode to generate XHTML and convert it to HTML using the native HTML Document object.  This was not particularly difficult; you can see the source code here.  I admit it is rather inefficient, but I wanted to use existing components and to keep the code short; even so, the code is still too long to just paste here.

On the other hand, if you want to use the xml-stylesheet processing instruction to generate the HTML code on the fly, it is possible to fix the broken links using decodeURI in the onload event handler; the downside is that the page will flash because the images will be invalid at the outset.
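
A minimal sketch of that decodeURI workaround follows.  The helper name decodeLocalHref and the choice to touch only file: links are my assumptions, and the browser wiring is shown only as a comment because it is untested here; whether the decoded value survives assignment depends on the browser.

```typescript
// Hypothetical helper: undo percent-encoding for local (file:) links only.
function decodeLocalHref(href: string): string {
  if (!/^file:/i.test(href)) {
    return href; // external URLs are decoded by the server; leave them alone
  }
  try {
    // decodeURI leaves escapes of reserved characters (e.g. %23) in place.
    return decodeURI(href);
  } catch (_err) {
    return href; // malformed escape sequence: keep the original value
  }
}

// Possible wiring in the page (assumption, not exercised by this sketch):
//   window.onload = () => {
//     for (const a of Array.from(document.links)) a.href = decodeLocalHref(a.href);
//     for (const img of Array.from(document.images)) img.src = decodeLocalHref(img.src);
//   };

const fixedHref = decodeLocalHref("file:///C:/dane/za%C5%BC%C3%B3%C5%82%C4%87.html");
console.log(fixedHref); // file:///C:/dane/zażółć.html
```

Because the images are only repaired once the handler runs, this does not avoid the flash mentioned above.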

That was just for the record; sorry for disturbing you if you consider this information useless.  Otherwise, I shall welcome all your comments.

Best regards

Christopher Yeleighton

 


From: Michael Kay [mailto:mhkmhk.me.uk]
Sent: Friday, June 15, 2007 2:12 PM
To: 'Kristof Zelechovski'; www-htmlw3.org
Cc: 'Tim Berners-Lee'; xsl-editorsw3.org; whatwgwhatwg.org
Subject: RE: file URL is overspecified

 


RE: file URL is overspecified
United Kingdom
2007-06-17 06:46:16
 

>MSXML does not respect the attribute escape-uri-attributes.

 

MSXML doesn't implement XSLT 2.0, so that's not surprising. I think you have an issue with the products and not with the W3C specs, so this list isn't going to help you much.

 

Michael Kay

http://www.saxonica.com/ 
