Thread: Looking for broken links

Looking for broken links
2006-11-23 01:59:54
Jamie L. Mitchell wrote:
> I am working on a test automation project; one of the things that I need
> to do is find a fast way of looking for broken links in a script.
> Since the tool I am using does not easily support this feature, I was
> thinking of building a Delphi DLL which would do the following
> (somehow). Given a link, make sure the server actually is available,
> using the URL base (for example on my own web site, it would check to
> make sure that http://www.go-tac.com/ was actually there), and then
> try to look for the page itself. I would then call the DLL repeatedly
> from the automation tool script.
>
> My thought was to do a 'get' using an HTTP product. If the get
> failed, I would know that the link was broken. Clearly, this topic is
> not in my wheelhouse. I would like to bring as little information
> back as possible - speed matters. Is it possible to do a 'HEAD' to
> bring back less information? What would be the best way to handle this?
Indy would make it very easy. Its HTTP component has Get and Head
methods. If they succeed, Get returns the data it downloaded, while Head
fetches only the response headers. If they fail, they raise exceptions
related to the cause of the problem.
You don't need to do a two-step check, testing the server and the specific
page separately. Just look for the page. If the server isn't available,
you'll find out anyway. (Furthermore, the page at the root might not
exist even if deeper pages do.)
--
Rob

Looking for broken links
2006-11-23 05:54:03
The server may be temporarily down. If DNS info comes back, that's a better sign that the URL is valid. Therefore a ping or maybe tracert would be best with a timeout limit imposed.
Dave

Looking for broken links
2006-11-23 15:24:35
David Smith wrote:
> The server may be temporarily down. If DNS info comes back, that's a better sign that the URL is valid.
I think an HTTP request for the resource coming back with a reply of
"200 OK" is a better sign that the URL is valid.
DNS information will simply indicate that the domain name exists, but
that doesn't warrant a conclusion that all addresses at that domain
refer to valid resources.
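To make the limitation concrete, here is a small sketch of a DNS-only check, again in Python's standard library rather than Delphi. The name `host_resolves` is hypothetical and chosen only for this example.

```python
import socket
from urllib.parse import urlsplit


def host_resolves(url):
    """Return True if the host name in url can be resolved to an address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    return True
```

The path part of the URL is never consulted, so the answer is the same for every page on a given host, whether or not that page exists.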
> Therefore a ping or maybe tracert would be best with a timeout limit imposed.
If the server is temporarily down, then a ping won't come back, either.
Like I said, if Get or Head fails, you'll get an exception telling you
why. Maybe it's an EIdConnectTimeout, or EIdResolveError, or
EIdHTTPProtocolException. The last of these will also include an HTTP
response code. An HTTP request will include a DNS lookup anyway to resolve
the host name in the URL.
The key is that the code to initiate the command can be just one line.
Everything else is error checking, which is the primary task of Jamie's
code.
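The "one line plus error checking" shape might look roughly like the sketch below. It is written with Python's standard library, not Indy; the mapping to the Indy exception classes in the comments is only an approximate analogy, and the name `check_link` and the reason strings are assumptions for this example.

```python
import socket
import urllib.error
import urllib.request


def check_link(url, timeout=5.0):
    """Return an (ok, reason) pair describing the outcome of a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # The request itself is this one call; the rest is error checking.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, "HTTP %d" % response.status
    except urllib.error.HTTPError as exc:
        # The server answered with an error status
        # (roughly the EIdHTTPProtocolException case, with its response code).
        return False, "HTTP %d" % exc.code
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, socket.gaierror):
            # Host name lookup failed (roughly the EIdResolveError case).
            return False, "name resolution failed"
        if isinstance(exc.reason, socket.timeout):
            # Connection attempt timed out (roughly EIdConnectTimeout).
            return False, "timed out"
        return False, "connection failed: %s" % exc.reason
    except socket.timeout:
        # A timeout while waiting for the response is raised unwrapped.
        return False, "timed out"
```

A caller can log the reason next to each broken link, which separates pages that are missing from servers that could not be reached.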
--
Rob

Looking for broken links
2006-11-23 16:59:48
OK. I think we agree on the main point: the meat of the app will be in the response processing.
Dave