Thread: Working with search engines

Working with search engines
2006-09-14 22:04:12
Hi,

Welcome all. This is my first post here. I'm not sure if this is the
right place; I mean, there are some discussions directly on the
jspwiki.org website.

I use jspwiki (2.2.33 still) on my site. With search engines in mind,
I wanted to prepare my site to be indexed properly. I use container
authentication (form-based login) to limit access to editing and
deleting pages.

So I turned on ShortViewURLConstructor to get rid of
Wiki.jsp?page=xxx URLs.

I put the following robots.txt file in place:

User-agent: *
Disallow: /wiki/Edit.jsp
Disallow: /wiki/rss.jsp
Disallow: /wiki/Diff.jsp
Disallow: /wiki/PageInfo.jsp
Disallow: /wiki/Upload.jsp
Disallow: /wiki/wiki/UserPreferences
Disallow: /wiki/UserPreferences.jsp
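
As a rough illustration of what these rules do and do not cover, here is a minimal Java sketch of plain prefix matching, which is how Disallow lines work in the basic robots exclusion convention. The class name RobotsRules and the method isDisallowed are hypothetical helpers, not part of JSPWiki or any crawler, and real crawlers may add extensions (wildcards, Allow lines) that this sketch ignores.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of JSPWiki.
final class RobotsRules {
    // The Disallow prefixes from the robots.txt quoted in the post.
    static final List<String> DISALLOWED = List.of(
            "/wiki/Edit.jsp",
            "/wiki/rss.jsp",
            "/wiki/Diff.jsp",
            "/wiki/PageInfo.jsp",
            "/wiki/Upload.jsp",
            "/wiki/wiki/UserPreferences",
            "/wiki/UserPreferences.jsp");

    // A URL is blocked when its path (including any query string)
    // starts with one of the Disallow values.
    static boolean isDisallowed(String pathAndQuery) {
        for (String prefix : DISALLOWED) {
            if (pathAndQuery.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isDisallowed("/wiki/Edit.jsp?page=Start"));  // true
        System.out.println(isDisallowed("/wiki/wiki/Start"));           // false
        System.out.println(isDisallowed("/wiki/wiki/Start?version=2")); // false
    }
}
```

Note that under this matching a short view URL with ?version=2 is not covered by any of the rules, so such URLs would need a separate rule or a robots meta tag.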

I don't like having the same meta description all over the website,
so I've added the following to commonheader.jsp to place it only on
the start page:

<% String pageP = request.getParameter("page");
   if (pageP == null || "Start".equals(pageP)) { %>
<meta name="description" content="blah, blah..."/>
<% } %>

Today, I saw in the Apache access log Googlebot indexing previous
versions of my pages (with "GET /wiki/wiki/xxx?version=2 HTTP/1.1").
It had indexed the PageInfo URLs before my changes to robots.txt. So
I've added another kludge to commonheader.jsp:

<% if (request.getParameter("version") != null) { %>
<meta name="robots" content="noindex, follow" />
<% } %>
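
The two scriptlets above could also be expressed as plain Java, which makes the conditions easier to test outside a JSP. This is only a sketch under assumptions: the class HeadTags and its methods are hypothetical names, the start page is assumed to be called "Start" as in the post, and the description text is assumed to be already safe to embed in HTML.

```java
// Hypothetical helper for illustration only; not part of JSPWiki.
final class HeadTags {
    static final String START_PAGE = "Start"; // start page name taken from the post

    // Emit the description tag only when no page is given or the page is the start page.
    // The description is assumed to be HTML-safe already; no escaping is done here.
    static String descriptionTag(String pageParam, String description) {
        if (pageParam == null || START_PAGE.equals(pageParam)) {
            return "<meta name=\"description\" content=\"" + description + "\"/>";
        }
        return "";
    }

    // Emit "noindex, follow" whenever a version parameter is present on the request.
    static String robotsTag(String versionParam) {
        return versionParam != null
                ? "<meta name=\"robots\" content=\"noindex, follow\" />"
                : "";
    }

    public static void main(String[] args) {
        System.out.println(descriptionTag(null, "blah, blah..."));
        System.out.println(descriptionTag("SomeOtherPage", "blah, blah...").isEmpty());
        System.out.println(robotsTag("2"));
    }
}
```

In a JSP the two arguments would come from request.getParameter("page") and request.getParameter("version"), as in the scriptlets.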

Am I paranoid? I wonder if anyone else has had similar experiences.

In my opinion my robots.txt is good. I see no point in indexing those
pages. I also don't want previous versions of my pages indexed, both
to save bandwidth and because the old ones are just not up to date.

Any opinions?

Regards.
--
Mikolaj Rydzewski <miki ceti.pl> http://ceti.pl/~miki/
PGP KeyID: 8b12ab02
There are three kinds of people: men, women and unix.
_______________________________________________
Jspwiki-users mailing list
Jspwiki-users ecyrd.com
http://ecyrd.com/cgi-bin/mailman/listinfo/jspwiki-users

Working with search engines
2006-09-15 06:56:03
Nope, that's not too paranoid. Rampant bots can cause quite a lot of
havoc on the pages, including things like locking pages and bringing
your entire repository to its knees.

The 2.4 ViewTemplate.jsp has the following code by default:

<wiki:CheckVersion mode="notlatest">
<meta name="robots" content="noindex,nofollow" />
</wiki:CheckVersion>

This stops robots from indexing older pages. It is there mostly
because of spam, though.
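
The tag here uses "noindex,nofollow", while the kludge in the original post uses "noindex, follow". A small Java sketch of how such a content value breaks down into directives may make the difference clearer. The class RobotsDirectives and its methods are hypothetical, and the sketch only looks at the noindex and nofollow directives, not at everything a given crawler might honour.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of JSPWiki.
final class RobotsDirectives {
    // Split a robots meta content value into lower-cased, trimmed directives.
    static Set<String> parse(String content) {
        return Arrays.stream(content.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // The page may be indexed unless "noindex" is present.
    static boolean mayIndex(String content) {
        return !parse(content).contains("noindex");
    }

    // Links on the page may be followed unless "nofollow" is present.
    static boolean mayFollow(String content) {
        return !parse(content).contains("nofollow");
    }

    public static void main(String[] args) {
        System.out.println(mayIndex("noindex, follow"));   // false
        System.out.println(mayFollow("noindex, follow"));  // true
        System.out.println(mayIndex("noindex,nofollow"));  // false
        System.out.println(mayFollow("noindex,nofollow")); // false
    }
}
```

Both values keep old versions out of the index; the 2.4 default additionally asks robots not to follow links found on those old versions.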

In my own robots.txt, I also have the same pages disallowed as you
(except for rss.jsp).

/Janne