
Thread: Working with search engines




Working with search engines
user name
2006-09-14 22:04:12
Hi,

Welcome, all. This is my first post here. I'm not sure this is the right
place, since there are also some discussions directly on the jspwiki.org
website.

I use JSPWiki (still 2.2.33) on my site. With search engines in mind, I
wanted to prepare my site to be indexed properly. I use container
authentication (form-based login) to limit access to editing/deleting
pages.

So I turned on ShortViewURLConstructor to get rid of Wiki.jsp?page=xxx URLs.

I put up the following robots.txt file:

User-agent: *
Disallow: /wiki/Edit.jsp
Disallow: /wiki/rss.jsp
Disallow: /wiki/Diff.jsp
Disallow: /wiki/PageInfo.jsp
Disallow: /wiki/Upload.jsp
Disallow: /wiki/wiki/UserPreferences
Disallow: /wiki/UserPreferences.jsp
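The original robots.txt convention treats each Disallow value as a plain path prefix, with no wildcards. A minimal Java sketch of how a compliant crawler would apply the rules above; the class and method names and the page name "Main" are hypothetical, not part of JSPWiki:

```java
import java.util.List;

public final class RobotsRules {
    // Disallow values copied from the robots.txt above ("User-agent: *").
    private static final List<String> DISALLOWED = List.of(
        "/wiki/Edit.jsp",
        "/wiki/rss.jsp",
        "/wiki/Diff.jsp",
        "/wiki/PageInfo.jsp",
        "/wiki/Upload.jsp",
        "/wiki/wiki/UserPreferences",
        "/wiki/UserPreferences.jsp");

    // A URL path (query string included) is excluded if it starts with
    // any Disallow value: plain prefix matching, no wildcards.
    public static boolean isDisallowed(String pathAndQuery) {
        for (String prefix : DISALLOWED) {
            if (pathAndQuery.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isDisallowed("/wiki/Edit.jsp?page=Main"));  // true
        System.out.println(isDisallowed("/wiki/wiki/Main"));           // false
        System.out.println(isDisallowed("/wiki/wiki/Main?version=2")); // false
    }
}
```

The last case shows that old-version URLs under the short /wiki/wiki/ prefix are not covered by these rules, so they need some other treatment, such as a robots meta tag.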


I don't want the same meta description all over the website, so I've
added the following to commonheader.jsp to place it only on the start
page:

<% String pageP = request.getParameter("page");
   if (pageP == null || "Start".equals(pageP)) {
%>
  <meta name="description" content="blah, blah..."/>
<% } %>
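The condition in the snippet above can be isolated as a plain Java method, which makes it easy to check which requests get the tag. This is only a sketch: it assumes, as the snippet does, that the page name reaches the JSP as the "page" request parameter, and the class and method names are hypothetical.

```java
public final class DescriptionMeta {
    // True for the front page: no "page" parameter at all, or page=Start.
    public static boolean isStartPage(String pageParam) {
        return pageParam == null || "Start".equals(pageParam);
    }

    // Returns the description tag for the start page and "" for every other
    // page. The description is assumed to be a fixed, trusted string, so it
    // is not HTML-escaped here.
    public static String descriptionTag(String pageParam, String description) {
        if (!isStartPage(pageParam)) {
            return "";
        }
        return "<meta name=\"description\" content=\"" + description + "\"/>";
    }

    public static void main(String[] args) {
        System.out.println(descriptionTag(null, "blah, blah..."));
        System.out.println(descriptionTag("Start", "blah, blah..."));
        System.out.println(descriptionTag("SomeOtherPage", "blah, blah...").isEmpty()); // true
    }
}
```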

Today I saw in the Apache access log that Googlebot was indexing previous
versions of my pages ("GET /wiki/wiki/xxx?version=2 HTTP/1.1"); it had
indexed the PageInfo URLs before I changed robots.txt. So I've added
another kludge to commonheader.jsp:

<% if (request.getParameter("version") != null) { %>
        <meta name="robots" content="noindex, follow" />
<% } %>
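Requests like the one quoted above can be picked out of the access log mechanically. A minimal Java sketch, assuming the request line is logged in double quotes as in Apache's common/combined formats, that the user agent is logged too (combined format), and that Googlebot is recognized simply by that substring; the class name and the sample lines are made up for illustration:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class OldVersionHits {
    // Captures the path from a quoted request line such as
    // "GET /wiki/wiki/xxx?version=2 HTTP/1.1".
    private static final Pattern REQUEST =
        Pattern.compile("\"GET (\\S+) HTTP/[0-9.]+\"");

    // True if the line looks like a Googlebot GET with a "version" query parameter.
    public static boolean isBotOldVersionHit(String logLine) {
        if (!logLine.contains("Googlebot")) {
            return false; // crude user-agent check, good enough for a log scan
        }
        Matcher m = REQUEST.matcher(logLine);
        if (!m.find()) {
            return false;
        }
        String path = m.group(1);
        int q = path.indexOf('?');
        if (q < 0) {
            return false; // no query string, so no version parameter
        }
        for (String param : path.substring(q + 1).split("&")) {
            if (param.startsWith("version=")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up sample lines in combined log format.
        List<String> lines = List.of(
            "192.0.2.1 - - [15/Sep/2006:00:10:00 +0200] \"GET /wiki/wiki/xxx?version=2 HTTP/1.1\" 200 5120 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1)\"",
            "192.0.2.1 - - [15/Sep/2006:00:11:00 +0200] \"GET /wiki/wiki/xxx HTTP/1.1\" 200 4096 \"-\" \"Mozilla/5.0 (compatible; Googlebot/2.1)\"");
        for (String line : lines) {
            System.out.println(isBotOldVersionHit(line)); // true, then false
        }
    }
}
```

On a real log one would read the file line by line (for example with java.nio.file.Files.lines) and count the matches.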


Am I paranoid? I wonder if anyone else has had similar experiences.

In my opinion my robots.txt is good; I see no point in indexing those
pages. I also don't want previous versions of my pages indexed, both to
save bandwidth and because the old ones are simply out of date.

Any opinions?

Regards.


-- 
Mikolaj Rydzewski      <miki@ceti.pl>        http://ceti.pl/~miki/
                    PGP KeyID: 8b12ab02
There are three kinds of people: men, women and unix.

_______________________________________________
Jspwiki-users mailing list
Jspwiki-users@ecyrd.com
http://ecyrd.com/cgi-bin/mailman/listinfo/jspwiki-users
Working with search engines
user name
2006-09-15 06:56:03
Nope, that's not too paranoid. Rampant bots can cause quite a lot of
havoc on the pages, including things like locking pages and putting
your entire repository on its knees.

The 2.4 ViewTemplate.jsp has the following code by default:

   <wiki:CheckVersion mode="notlatest">
         <meta name="robots"
content="noindex,nofollow" />
   </wiki:CheckVersion>

This stops robots from indexing older page versions. The main reason
for it is spam, though.
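The two approaches in this thread differ slightly: the commonheader.jsp kludge marks every request that carries a "version" parameter, while the 2.4 template, as described, marks only versions other than the latest, and it uses nofollow where the kludge uses follow. A rough Java sketch of the difference; it does not reproduce how the CheckVersion tag looks up the latest version, so latestVersion is simply passed in, and the class and method names are hypothetical:

```java
public final class RobotsForVersion {
    static final String KLUDGE_TAG =
        "<meta name=\"robots\" content=\"noindex, follow\" />";
    static final String TEMPLATE_TAG =
        "<meta name=\"robots\" content=\"noindex,nofollow\" />";

    // commonheader.jsp kludge: any "version" parameter gets the tag,
    // even when it names the current version.
    public static String kludgeTag(String versionParam) {
        return versionParam != null ? KLUDGE_TAG : "";
    }

    // "notlatest"-style check: tag only when an older version is requested.
    // Assumption of this sketch: a missing, non-numeric or non-positive
    // value is treated as the latest version.
    public static String notLatestTag(String versionParam, int latestVersion) {
        if (versionParam == null) {
            return "";
        }
        int requested;
        try {
            requested = Integer.parseInt(versionParam.trim());
        } catch (NumberFormatException e) {
            return "";
        }
        return (requested > 0 && requested < latestVersion) ? TEMPLATE_TAG : "";
    }

    public static void main(String[] args) {
        // A page whose latest version is 5 (illustrative value).
        System.out.println(kludgeTag("5").isEmpty());       // false: tagged anyway
        System.out.println(notLatestTag("5", 5).isEmpty()); // true: latest, indexable
        System.out.println(notLatestTag("2", 5).isEmpty()); // false: old version
    }
}
```

With follow, crawlers may still use the links on an old version to discover other pages; nofollow asks them not to.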

In my own robots.txt, I also have the same pages disallowed as you
(except for rss.jsp).

/Janne

