Thread: Re: doctype frustration




Re: doctype frustration
Canada
2007-08-04 14:11:00

Hello,

On 04/08/2007, Tim Seifert wrote in Digest Number 1935:

>> The head of the document includes:
>>
>> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
>> "http://www.w3.org/TR/html4/loose.dtd">
>
> It doesn't look typographically incorrect, here's the one from the HTML
> specifications:
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> "http://www.w3.org/TR/html4/loose.dtd">

Which is what I used.

>
> But we'd need to see your actual page, to be sure. Is it just the
> articles one that you mention below, or other pages on the site. The

All of them, but I've changed them as of this morning.

> page checked out fine, when I tried it with the W3C validator, using my
> browser to send a copy of the file. Likewise, it checked out fine with
> the WDG one being given the address directly. But putting the articles
> address directly into the W3C got the tentative assessment. That
> suggests server or validator issues.

Which browser did you use to send the file? IBrowse? And you sent it how? By
upload, or by referring to it by URI?

I'm asking because with either method using IBrowse to either upload or
direct the validator to the page, the page does NOT pass validation.

That makes me wonder whether IBrowse does contain a glitch after all, given
that you found no problem with the validation. However, the same problem
occurs with AWeb, so I'm inclined to think it's not an IBrowse problem.

>
> I seem to recall reading problems with validation when the DOCTYPE
> wasn't split in two lines, as above. But that was a bug in the
> validator, it's not an important matter. Likewise, I can remember
> hearing grumblings about its handling of CHARSET information in the
> META statements. It was years ago that I heard those comments.
>
>> <HTML>
>> <HEAD>
>> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=iso-8859-1">
>
> Try it with a space after text/html; before the CHARSET. Though I
> noticed that your file is like that when I tested it. But it really
> should be done with HTTP headers, not as part of the file.
>
>> <META NAME="AUTHOR" CONTENT="KRONSFELD CEMETERY">
>> <META NAME="revisit-after" CONTENT="15-days">

See other messages posted re. combinations of spacing and case. It's the
CHARSET word that causes it to choke if it's uppercase. It must be
lowercase. And I think that is a bug.

You say it should be done with HTTP headers, but what should be the
construction of an HTTP header? Are you saying an HTTP header can be
part of the HTML document? I don't find any information in the HTML
specification on how to construct an HTTP header. Isn't sending an HTTP
header the server's responsibility?

Well, I suppose this is actually some kind of HTTP header,

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

[Note to users: please don't copy the line above, because I've deliberately
inserted non-breaking spaces for illustration's sake]

If you would be so kind as to offer an example of a correct HTTP header,
that would be appreciated.

>> Perhaps it's the server delivering the pages that is causing the problem.
>
> [timbigblack ~]$ lynx --head
> http://www.mts.net/~ejunrau/kronsfeld/articles.html
>
> HTTP/1.1 200 OK
> Server: Sun-ONE-Web-Server/6.1
> Date: Sat, 04 Aug 2007 06:57:11 GMT
> Content-length: 3591
> Content-type: text/html
> Last-modified: Fri, 03 Aug 2007 17:07:17 GMT
> Accept-ranges: bytes
> Connection: close
>
> Looks okay, albeit it doesn't have the charset information that you're
> providing with the META statement. This is where it should really be.

That suggests that your server is also case sensitive when it comes to that
line in question. I've uploaded new versions of the pages. Try it again and
see what happens.

> If you can't configure the server, you might want to see if it accepts

I don't see how I can configure the server, but I could bring this to the
attention of the ISP's tech support and see what they say. Note that the
ISP's server delivering the pages for personal web pages such as at
http://www.mts.net/~ejunrau/ is case sensitive. I believe it's some kind of
Unix-based server. For example, "InDeX.htMl" is not the same as
"index.html". Apparently their "business" server is or was something to do
with a Windows 2000 server when I last had occasion to ask about it.

> you putting .htaccess files on the server. That might allow you to do
> something like the following to set the HTTP headers correctly:
>
> AddDefaultCharset iso-8859-1
>
> That's actually for the Apache server, but the technique is sometimes
> copied by other servers. Of course, be certain that you are using that
> charset.
>
> If you serve plain text files anywhere on the server, you really NEED to
> be setting the charset in the HTTP headers. Only a HTML file can use a

To echo my question above, how is that done?

> META statement to do what should be done by the HTTP headers. Without
> information about the charset there's some common, and sometimes
> problematical, assumptions: Unidentified by MIME text is treated as
> ASCII. The old default for unidentified webserved text was iso-8859-1.
> A new assumed default is UTF-8 (which is compatible with ASCII, but not
> iso-8859-1).

That is indeed what the validation server is doing, falling back to UTF-8
since it can't find the charset without the "fixed" syntax.
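
To illustrate the compatibility point in the quoted paragraph, here is a minimal Python sketch. The sample word is an assumption chosen for illustration and is not taken from the pages under discussion. It shows that plain ASCII text reads the same under all three charsets, while a single iso-8859-1 accented byte is not valid UTF-8.

```python
# Illustration only: the sample strings are assumptions, not page content.
ascii_text = "cafe"
accented_text = "caf\u00e9"  # "cafe" with an acute accent on the e

# Pure ASCII bytes decode identically as ASCII, UTF-8 and iso-8859-1.
ascii_bytes = ascii_text.encode("ascii")
same_everywhere = (
    ascii_bytes.decode("ascii")
    == ascii_bytes.decode("utf-8")
    == ascii_bytes.decode("iso-8859-1")
)

# The accented letter is the single byte 0xE9 in iso-8859-1.
latin1_bytes = accented_text.encode("iso-8859-1")


def decode_as_utf8(data):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# A consumer that falls back to UTF-8 cannot read the iso-8859-1 bytes.
utf8_reading = decode_as_utf8(latin1_bytes)

# The reverse mistake does not fail, it silently produces the wrong characters.
mojibake = accented_text.encode("utf-8").decode("iso-8859-1")
```

This is why a page that really is iso-8859-1 needs its charset declared somewhere the consumer can find it.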

>> So I changed the code to read,
>>
>> <META http-equiv="content-type" content="text/html;charset=iso-8859-1">
>>
>> instead of,
>>
>> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=iso-8859-1">
>>
>> and it passes validation. I didn't take the time to test out if only the
>> CHARSET parameter needed to be lowercased -- perhaps Bonnie tested that
>> -- but I thought for good measure I'd do the rest of the line also.
>>
>> I don't think this is correct validator behavior. Since when is html case
>> sensitive? Granted that CHARSET in the way it is used there is a
>> parameter and not a tag, IMHO, but I still don't think that justifies
>> that behavior.
>
> It's not HTML that's case sensitive, so to speak, in this regard, but
> the parser. XHTML is case sensitive. I don't recall whether HTTP
> header names are, though do recall reading that the parameters aren't
> (e.g. ISO-8859-1).
>
> This is a HTTP "Content-Type" header as received from a server (below).
> Note the case of the entire thing (header name, and header contents).
>
> Content-Type: text/html; charset=iso-8859-1
>
> If you're simulating headers, it'd be best to simulate them identically.

No, I'm not simulating headers. How is it done?

Regards
--

Ernest Unrau
Morden, Manitoba
CANADA
E-mail: saskwatch@mts.net

Re: Re: doctype frustration
United States
2007-08-04 16:07:07

For what it is worth: when I checked Ernest's initial page I was in
the Linux side of my dual-boot Amithlon/Ubuntu Linux system, and I
used the W3 web site from Firefox. I downloaded a copy of his page (by
saving to disk via the browser), then I changed the code and
verified it with Firefox against the W3 validator. I can go into IBrowse on
the same system and do it all over, but my conclusion is that the
meta tag for the charset specification needed lowercase for the charset
parameter, and this is probably independent of the browser submitting to W3.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bonnie Dalzell, MA
mail: 5100 Hydes Rd ---- Hydes MD USA 21082 ---- EMAIL: bdalzell@qis.net

freelance anatomist, vertebrate paleontologist, writer, illustrator, dog
breeder, computer nerd & iconoclast... Borzoi info at www.borzois.com.

Editor Net.Pet Online Animal Magazine - http://www.netpetmagazine.com
HOME http://www.qis.net/~borzoi/ BUSINESS http://www.batw.com

Re: Re: doctype frustration
Australia
2007-08-04 22:58:58

Tim:
>> page checked out fine, when I tried it with the W3C validator, using my
>> browser to send a copy of the file. Likewise, it checked out fine with
>> the WDG one being given the address directly. But putting the articles
>> address directly into the W3C got the tentative assessment. That
>> suggests server or validator issues.

Ernest Unrau:
> Which browser did you use to send the file? IBrowse? And you sent it how? By
> upload, or by referring to it by URI?
>
> I'm asking because with either method using IBrowse to either upload or
> direct the validator to the page, the page does NOT pass validation.

The browser I used was Opera on Linux. Looking at the differences, and
the messages I see when doing it, I presume it uploaded the page as a
file, rather than giving the URI to the validator.

Web browsers that give you an easy way to validate a page (such as with
right-click options), can do it in different ways: Pass the page
address to be checked to the validator, or upload the page that you
downloaded to the validator as a file. The second way adds your browser
as part of the equation, and its handling can affect the results, *and*
the server might have sent you something different than it might send to
others. It's not a good way to test things.

What matters is how the validator reacts when given the URI, as it's
seeing your page as any other browser will get it (served from your
webserver).

> You say it should be done with HTTP headers, but what should be the
> construction of an HTTP header? Are you saying an HTTP header can be
> part of the HTML document? I don't find any information in the HTML
> specification on how to construct an HTTP header. Isn't sending an HTTP
> header the server's responsibility?

The HTML specification is just about how the pages are formed. HTTP is
how the serving works (Hyper Text Transfer Protocol), whether it's
serving pages, images, or other data. That's defined by another
standard.

When you ask for something from a webserver, you send a HTTP request.
It includes lists of the things your browser will accept (text, html,
JPEGs, etc.), languages that you can read (configured by you), along
with the address for the data that you want. The server responds with a
HTTP header, and if possible, follows it with what you asked for. If it
can't, the HTTP headers include an error code, and possibly some written
information for you to read. The HTTP headers give information *about*
what you will receive. The browser then, according to the
specifications, acts upon that data in the manner appropriate to the
description. That is how you can see a written 404 error page when
asking for something like an image that wasn't there. It doesn't
matter what the URI is, if the server says you're getting text/html
you're getting it.
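
As a rough sketch of what a client does with that response, the Python below splits a response head into its status line and headers and reads the content type out of it. The sample text is the head that lynx reported for articles.html elsewhere in this thread. The function name is hypothetical, and real responses use CRLF line endings where this sample uses plain newlines.

```python
from email.parser import Parser

# The response head reported by "lynx --head" in this thread.
RAW_HEAD = """\
HTTP/1.1 200 OK
Server: Sun-ONE-Web-Server/6.1
Date: Sat, 04 Aug 2007 06:57:11 GMT
Content-length: 3591
Content-type: text/html
Last-modified: Fri, 03 Aug 2007 17:07:17 GMT
Accept-ranges: bytes
Connection: close
"""


def parse_response_head(raw):
    """Split a response head into (status code, reason, headers).

    Hypothetical helper for illustration. HTTP header fields use the
    same "Name: value" layout as mail headers, so the standard library
    mail parser can read them.
    """
    status_line, _, header_block = raw.partition("\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = Parser().parsestr(header_block, headersonly=True)
    return int(code), reason, headers


code, reason, headers = parse_response_head(RAW_HEAD)

# Header names are matched case-insensitively.
content_type = headers.get_content_type()

# No charset parameter was sent, so this lookup comes back empty.
charset = headers.get_content_charset()
```

Here `content_type` is "text/html" and `charset` is None, which matches the observation that the server names the type but says nothing about the encoding.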

The server identifies the type of data before sending it to you. It can
do that by actually assessing the data, it can do that by presuming that
all data served from certain directories are certain types, it can
presume that data served with certain filenames are certain types. On
the server is the only place that filenames really mean anything, and
only if that server works that way.
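
The filename approach can be sketched in Python with the standard mimetypes module. This only illustrates the general idea of an extension lookup table; it is not a description of how Sun ONE Web Server decides, and the function name and sample filenames are assumptions.

```python
import mimetypes


def type_from_filename(name):
    """Guess a media type purely from the filename extension.

    Hypothetical helper: returns None when the extension is unknown.
    """
    mime_type, _encoding = mimetypes.guess_type(name)
    return mime_type


html_type = type_from_filename("articles.html")
text_type = type_from_filename("notes.txt")
unknown_type = type_from_filename("README")
```

Note that a lookup like this yields only the media type. It cannot know which character set the author saved the file in, so the charset has to come from server configuration or from the document itself.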

> Well, I suppose this is actually some kind of HTTP header,
>
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

Nope, that's just a doctype that identifies the *type* and version of
HTML that the page is.

> If you would be so kind as to offer an example of a correct http header,
> that would be appreciated

I already did; there's an example of the received HTTP headers right
below. I used the lynx web browser (a text one), telling it to just
show me the headers (with the --head option). Note that they are all
generated by the server.

All that's required of you (as the site author) is to ensure that the
server knows what encoding scheme you've used. If the server was
preconfigured with one scheme, you could write your pages in the same
one. But I can see that it's not (it said just "text/html"). SysAdmins
often leave it that way, because their clients might use all manner of
different types. Serving pages with the wrong one is worse than serving
it with none. If there's a HTTP header providing the information, it
overrides anything else. Clients would have no way of saying anything
to the contrary, the server answer is absolute.
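
The precedence described above can be written as a small Python sketch. The function name and the optional fallback argument are assumptions for illustration; actual browsers and validators may add further steps, such as content sniffing.

```python
def effective_charset(header_charset, meta_charset, fallback=None):
    """Pick the charset a client would use, in order of precedence.

    Hypothetical helper: a charset in the HTTP Content-Type header wins,
    then one declared in the document's META element, then whatever
    default the client falls back to (None if it has no default).
    """
    for candidate in (header_charset, meta_charset, fallback):
        if candidate:
            return candidate.lower()
    return None


# The server named a charset, so the META declaration is ignored.
header_wins = effective_charset("UTF-8", "iso-8859-1")

# The server sent plain "text/html", so the META declaration is used.
meta_used = effective_charset(None, "ISO-8859-1")

# Neither source declares anything and the client has no default.
nothing_declared = effective_charset(None, None)
```

With the headers shown in this thread (plain "text/html"), the second case applies, which is why the META declaration matters on this particular server.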

>> [timbigblack ~]$ lynx --head
>> http://www.mts.net/~ejunrau/kronsfeld/articles.html
>>
>> HTTP/1.1 200 OK
>> Server: Sun-ONE-Web-Server/6.1
>> Date: Sat, 04 Aug 2007 06:57:11 GMT
>> Content-length: 3591
>> Content-type: text/html
>> Last-modified: Fri, 03 Aug 2007 17:07:17 GMT
>> Accept-ranges: bytes
>> Connection: close
>>
>> Looks okay, albeit it doesn't have the charset information that you're
>> providing with the META statement. This is where it should really be.

> That suggests that your server is also case sensitive when it comes to that
> line in question. I've uploaded new versions of the pages. Try it again and
> see what happens.

That's what *your* server sent. Most servers do not pay attention to
anything written in the HTML files. Some did, in the past, that's where
the META statements had their origins. The server would look at your
page, and then adjust its headers to suit. Later, it became a way of
sending information that clients had no way of programming the server to
send, as some browsers started paying attention to the information in
the HTML.

Some web browsers do things even worse, like MSIE, which can ignore all
descriptive information and start making guesses about what encoding the
page is. It often gets it wrong. I used to have ASCII pages, sent with
proper ASCII declaration. MSIE decided that it was something else, and
stuffed up the pages it received, some links stopped working, as it
changed the characters written in the link addresses. That's an example
of why it's a bad idea to subvert the original technique (HTTP headers,
which worked perfectly fine), with things that are bad substitutes;
stupid mistakes occur.

>> If you can't configure the server, you might want to see if it accepts

> I don't see how I can configure the server, but I could bring this to the
> attention of the ISP's tech support and see what they say.

That's why I mentioned the .htaccess files as Apache uses, and some
other webservers, too - so you can see if yours does. You'd stick
a .htaccess file in any directory you want to customise, and write
server configuration directives into it. That allows you to customise
that directory, and all its children. It's a useful way of letting
clients customise their websites, when they don't have any access to
server configuration. You want to have a look for a manual for your
server software ("Sun ONE Web Server"), and see how it does it.

> Note that the ISP's server delivering the pages for personal web pages such as at
> http://www.mts.net/~ejunrau/ is case sensitive. I believe it's some kind of
> Unix-based server. For example, "InDeX.htMl" is not the same as
> "index.html". Apparently their "business" server is or was something to do
> with a Windows 2000 server when I last had occasion to ask about it.

That's quite common. There's all manner of free *ix HTTP servers, but
Windows' one costs them, so only paying customers get it. Ironically,
the free Apache server is generally much better. Yours is using one by
Sun Microsystems, according to the HTTP headers I saw. I see that there
are docs for it at <http://docs.sun.com/app/docs/coll/S1_websvr61_en>,
but a brief scan through them doesn't look very helpful. I've found
system administrator configuration information, but the only thing I've
found about user customisation and .htaccess files is about "access":
<http://docs.sun.com/source/817-1831-10/agaccess.html#wp1011824>

Try what I mentioned before, though. Create a .htaccess file and drop
it into your root directory (".htaccess" is the filename), with the
following text inside it:

AddDefaultCharset iso-8859-1

It might just accept the same thing, or perhaps some variant of that
information. But you might be stuck with using the second-rate method
of using meta statements in your HTML.
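
To show what the directive is meant to achieve on the wire, here is a minimal sketch using Python's built-in test web server rather than Apache or Sun ONE. It is an assumption-laden stand-in, not a configuration for either product: the class name is hypothetical, and the handler simply appends a charset parameter to every text type it serves.

```python
import http.server


class Latin1Handler(http.server.SimpleHTTPRequestHandler):
    """Serve files with "; charset=iso-8859-1" added to text types.

    Hypothetical illustration of a server-side default charset. Only
    use it if the files really are saved as iso-8859-1.
    """

    def guess_type(self, path):
        # The parent class picks the media type from the file extension.
        media_type = super().guess_type(path)
        if media_type.startswith("text/"):
            media_type += "; charset=iso-8859-1"
        return media_type


# To try it locally (blocks until interrupted):
#   http.server.HTTPServer(("127.0.0.1", 8000), Latin1Handler).serve_forever()
```

A page served this way would arrive with "Content-type: text/html; charset=iso-8859-1", which is the header a working AddDefaultCharset line is intended to produce.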

>> If you're simulating headers, it'd be best to simulate them identically.

> No, I'm not simulating headers. How is it done?

Yes you are. The META statements in HTML are an attempt to do the same
job as the HTTP headers that a server sends before you get the HTML.
Hint - the "http-equiv" bit tells you it's trying to be an equivalent of
a HTTP header.

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

It's a meta statement element.
It's trying to be an equivalent of the HTTP Content-Type header.
And for that header's content to be text/html; charset=iso-8859-1.
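
Reading the META element that way can be sketched in Python: find the element, then treat its content attribute exactly like the value of a Content-Type header. This is a lenient reader written for illustration, with hypothetical class and function names. It is not how the W3C validator is implemented, and unlike the behaviour reported in this thread it accepts CHARSET in either case.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaContentType(HTMLParser):
    """Remember the content of the first META http-equiv Content-Type."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, not values.
        if tag != "meta" or self.content is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            self.content = attributes.get("content")


def meta_charset(html):
    """Return the charset declared by a META element, or None."""
    finder = MetaContentType()
    finder.feed(html)
    if finder.content is None:
        return None
    # Parse the attribute value as if it were a real header value.
    header = Message()
    header["Content-Type"] = finder.content
    return header.get_content_charset()


upper = meta_charset(
    '<META HTTP-EQUIV="Content-Type" '
    'CONTENT="text/html;CHARSET=iso-8859-1">'
)
lower = meta_charset(
    '<meta http-equiv="content-type" '
    'content="text/html; charset=iso-8859-1">'
)
```

Both spellings give "iso-8859-1" here, because the parameter name is compared without regard to case.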

I have some website authoring information on my own site, but not so
much in the way of website serving information, at this time:
<http://www.cameratim.com/computing/web-authoring-guide/contents>

--

Regards,
Tim Seifert.

__________________________________________________________
I've got a very bad feeling about this.
-- Han Solo
