Thread: Created: (NUTCH-514) Indexer should only index pages with fetch status SUCCESS
|
|
| Created: (NUTCH-514) Indexer should
only index pages with fetch status
SUCCESS |
  United States |
2007-07-14 07:10:05 |
Indexer should only index pages with fetch status SUCCESS
---------------------------------------------------------
Key: NUTCH-514
URL: https://issues.apache.org/jira/browse/NUTCH-514
Project: Nutch
Issue Type: Improvement
Components: indexer
Reporter: Doğacan Güney
Priority: Minor
Fix For: 1.0.0

Currently, if you parse during fetch, Nutch only parses pages
that were fetched successfully (i.e., have the status
STATUS_FETCH_SUCCESS). But if you run parse as a separate
job, Nutch also parses pages like "404 Not Found"s or
"301 Moved"s. Since most of these can be parsed
successfully, they are indexed and show up in search
results.

IMO, we should either somehow mark contents so that a
separate parse doesn't output non-STATUS_FETCH_SUCCESS pages,
or we should filter them out in the indexer.
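
The first option (having a separate parse skip anything that was not fetched successfully) could look roughly like the sketch below. This is only an illustration under stated assumptions: `FetchOutcome`, `FetchedContent`, and `SeparateParseSketch` are hypothetical stand-ins, not Nutch classes, and the sketch assumes the fetch outcome is somehow available next to the stored content, which is exactly the "mark contents" part that would still need to be designed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Nutch classes.
enum FetchOutcome { SUCCESS, GONE, REDIR_PERM }

final class FetchedContent {
    final String url;
    final FetchOutcome outcome; // outcome recorded when the page was fetched
    final String body;

    FetchedContent(String url, FetchOutcome outcome, String body) {
        this.url = url;
        this.outcome = outcome;
        this.body = body;
    }
}

final class SeparateParseSketch {
    // Returns the URLs a separate parse job would emit if it skipped
    // every entry that was not fetched successfully.
    static List<String> parseOnlySuccessful(List<FetchedContent> segment) {
        List<String> parsedUrls = new ArrayList<>();
        for (FetchedContent content : segment) {
            if (content.outcome != FetchOutcome.SUCCESS) {
                continue; // e.g. a 404 or 301 body: parseable, but not wanted
            }
            parsedUrls.add(content.url);
        }
        return parsedUrls;
    }

    public static void main(String[] args) {
        List<FetchedContent> segment = Arrays.asList(
            new FetchedContent("http://example.com/", FetchOutcome.SUCCESS,
                "<html>home</html>"),
            new FetchedContent("http://example.com/old", FetchOutcome.REDIR_PERM,
                "<html>301 Moved</html>"),
            new FetchedContent("http://example.com/missing", FetchOutcome.GONE,
                "<html>404 Not Found</html>"));
        System.out.println(parseOnlySuccessful(segment));
    }
}
```

With this approach the unwanted pages never reach the parse output at all, so every later step is spared the extra data, at the cost of the parse job needing access to the fetch outcome.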
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.
|
|
| Updated: (NUTCH-514) Indexer should
only index pages with fetch status
SUCCESS |
  United States |
2007-07-14 07:12:04 |
[ https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-514:
--------------------------------
Attachment: NUTCH-514.patch

A simple patch for the issue. It updates the indexer to check
for STATUS_FETCH_SUCCESS.

I will let this patch stay here for a while. Is having 404
and 301 (and similar) pages in search results useful to
anyone? Also, should we skip them in the parser or filter them
in the indexer (as this patch does)?
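
For readers without the attachment at hand, the indexer-side idea can be sketched as follows. This is not the contents of NUTCH-514.patch; it is a minimal illustration in which `PageStatus`, `ParsedPage`, and `IndexerFilterSketch` are hypothetical names, standing in for whatever status value and page record the real indexer sees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration; the real patch works on Nutch's own classes.
enum PageStatus { FETCH_SUCCESS, FETCH_GONE, FETCH_REDIR_PERM }

final class ParsedPage {
    final String url;
    final PageStatus status; // fetch status carried along with the parsed page
    final String text;

    ParsedPage(String url, PageStatus status, String text) {
        this.url = url;
        this.status = status;
        this.text = text;
    }
}

final class IndexerFilterSketch {
    // The whole filter: only successfully fetched pages are indexable.
    static boolean shouldIndex(ParsedPage page) {
        return page.status == PageStatus.FETCH_SUCCESS;
    }

    // Returns the URLs that would be added to the index.
    static List<String> index(List<ParsedPage> pages) {
        List<String> indexedUrls = new ArrayList<>();
        for (ParsedPage page : pages) {
            if (shouldIndex(page)) {
                indexedUrls.add(page.url);
            }
        }
        return indexedUrls;
    }

    public static void main(String[] args) {
        List<ParsedPage> pages = Arrays.asList(
            new ParsedPage("http://example.com/", PageStatus.FETCH_SUCCESS, "home"),
            new ParsedPage("http://example.com/missing", PageStatus.FETCH_GONE,
                "404 Not Found"));
        System.out.println(index(pages));
    }
}
```

Filtering at this stage leaves the parser untouched, so the 404 and 301 pages are still parsed and stored; they are simply kept out of the search index.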
|
|
| Commented: (NUTCH-514) Indexer should
only index pages with fetch status
SUCCESS |
  United States |
2007-07-30 06:02:52 |
[ https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516365 ]
Doğacan Güney commented on NUTCH-514:
-------------------------------------
Since no one commented, I am assuming that no one wants to
see 404 and others in their search results. So I am going to
commit this one soon.
|
|
| Commented: (NUTCH-514) Indexer should
only index pages with fetch status
SUCCESS |
  United States |
2007-07-30 11:20:53 |
[ https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516428 ]
Andrzej Bialecki commented on NUTCH-514:
-----------------------------------------
+1. We're only humans with 24 hours in a day .. ;)

Actually, this issue is part of a bigger problem, i.e. how
to deal with redirected pages, and this patch doesn't solve
the underlying problem. We can add it as a band-aid for the
time being, until we come up with a proper fix. I'd suggest
that the commit message include a reference to NUTCH-273
and NUTCH-353.
|
|
| Closed: (NUTCH-514) Indexer should only
index pages with fetch status SUCCESS |
  United States |
2007-07-30 14:03:53 |
[ https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney closed NUTCH-514.
-------------------------------
Resolved and committed.
|
|
| Resolved: (NUTCH-514) Indexer should
only index pages with fetch status
SUCCESS |
  United States |
2007-07-30 14:03:52 |
[ https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney resolved NUTCH-514.
---------------------------------
Resolution: Fixed
Assignee: Doğacan Güney

Committed in rev. 561092.
|
|
| Commented: (NUTCH-514) Indexer should
only index pages with fetch status
SUCCESS |
  United States |
2007-07-30 23:19:53 |
[ https://issues.apache.org/jira/browse/NUTCH-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516613 ]
Hudson commented on NUTCH-514:
------------------------------
Integrated in Nutch-Nightly #166 (See
[http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/166/])
|
|