|
|
| Created: (NUTCH-518) Fix
OpicScoringFilter to respect scoring
filter chaining |
  United States |
2007-07-18 03:16:04 |
Fix OpicScoringFilter to respect scoring filter chaining
--------------------------------------------------------
Key: NUTCH-518
URL: https://issues.apache.org/jira/browse/NUTCH-518
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 1.0.0
Reporter: Enis Soztutar
Fix For: 1.0.0
Opic Scoring returns the score that it calculates, rather
than returning previous_score * calculated_score. This
prevents using another scoring filter along with Opic
scoring.
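A minimal sketch of the difference described above, not the actual Nutch code: `ChainedScorer`, `ChainingSketch`, and their methods are hypothetical names introduced only for illustration. A filter that ignores the incoming score discards whatever earlier filters computed, while one that multiplies by it keeps their contribution.

```java
// Hypothetical stand-in for a scoring filter's indexer hook; not the Nutch API.
interface ChainedScorer {
    float score(float initScore);
}

final class ChainingSketch {
    // Behaviour described in the report: the incoming score is ignored.
    static ChainedScorer ignoring(float calculated) {
        return initScore -> calculated;
    }

    // Behaviour the report asks for: the incoming score is scaled.
    static ChainedScorer multiplying(float calculated) {
        return initScore -> initScore * calculated;
    }

    // Runs the filters in sequence, feeding each one the previous result.
    static float run(float initScore, ChainedScorer... chain) {
        float score = initScore;
        for (ChainedScorer scorer : chain) {
            score = scorer.score(score);
        }
        return score;
    }
}
```

With `run(1.0f, multiplying(0.5f), ignoring(2.0f))` the first filter's 0.5 is lost, whereas `run(1.0f, multiplying(0.5f), multiplying(2.0f))` keeps it in the product.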
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.
|
|
| Updated: (NUTCH-518) Fix
OpicScoringFilter to respect scoring
filter chaining |
  United States |
2007-07-18 03:18:04 |
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-518:
--------------------------------
Attachment: opicScoring.chain.patch
The patch is attached; it was formerly part of the patch in
NUTCH-439.
|
|
| Resolved: (NUTCH-518) Fix
OpicScoringFilter to respect scoring
filter chaining |
  United States |
2007-07-18 13:05:05 |
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney resolved NUTCH-518.
---------------------------------
Resolution: Fixed
Assignee: Doğacan Güney
Fixed in rev. 557344.
|
|
| Closed: (NUTCH-518) Fix
OpicScoringFilter to respect scoring
filter chaining |
  United States |
2007-07-18 13:05:05 |
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney closed NUTCH-518.
-------------------------------
Resolved and committed.
|
|
| Reopened: (NUTCH-518) Fix
OpicScoringFilter to respect scoring
filter chaining |
  United States |
2007-07-18 13:32:05 |
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki reopened NUTCH-518:
-------------------------------------
This one was too quick, I think ... I wanted to discuss
whether the chaining process should use multiplication or
addition. I'm not entirely sure that multiplication is the
right choice: if one of the preceding filters returns 0,
OpicScoringFilter has no chance to affect that decision.
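To make the concern concrete, here is a small sketch that compares the two combination rules under discussion. `CombineSketch`, `product`, and `sum` are hypothetical helpers, not part of Nutch; they only show how a single zero behaves under each rule.

```java
final class CombineSketch {
    // Multiplicative chaining: one zero anywhere forces the result to zero.
    static float product(float... scores) {
        float result = 1.0f;
        for (float s : scores) {
            result *= s;
        }
        return result;
    }

    // Additive chaining: a zero contributes nothing but vetoes nothing.
    static float sum(float... scores) {
        float result = 0.0f;
        for (float s : scores) {
            result += s;
        }
        return result;
    }
}
```

For the inputs `0.0f` and `3.5f`, `product` yields zero while `sum` still reflects the second filter's score.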
|
|
| Commented: (NUTCH-518) Fix
OpicScoringFilter to respect scoring
filter chaining |
  United States |
2007-07-18 13:40:04 |
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513679 ]
Doğacan Güney commented on NUTCH-518:
-------------------------------------
Sure. I thought you were OK with it since you mentioned you
were going to commit it in NUTCH-439. Should I revert the
commit or leave it in for now?
|
|
| Commented: (NUTCH-518) Fix
OpicScoringFilter to respect scoring
filter chaining |
  United States |
2007-07-18 15:03:04 |
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513704 ]
Andrzej Bialecki commented on NUTCH-518:
-----------------------------------------
Right, I was too quick too ... ;) Leave it in for now. Let's
agree first on what is the right way to do this.
|
|
| Commented: (NUTCH-518) Fix
OpicScoringFilter to respect scoring
filter chaining |
  United States |
2007-07-18 23:27:05 |
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513807 ]
Hudson commented on NUTCH-518:
------------------------------
Integrated in Nutch-Nightly #154 (See
[http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/154/])
|
|
| Commented: (NUTCH-518) Fix
OpicScoringFilter to respect scoring
filter chaining |
  United States |
2007-07-19 01:12:04 |
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513819 ]
Enis Soztutar commented on NUTCH-518:
-------------------------------------
Since there is no ordering among scoring filters, if we do
something specific to zero in OpicScoring, it might lead to
nondeterministic behaviour. Let's say, for example, the code
in OpicScoring is changed so that:
  public float indexerScore(Text url, Document doc, CrawlDatum dbDatum,
      CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) {
    if (initScore != 0)
      return (float) Math.pow(dbDatum.getScore(), scorePower) * initScore;
    else
      // do smt nasty
  }
Then there will be a big difference depending on whether
scoring-opic is run before or after scoring-foo.
As far as I can tell from the messages on the mailing lists,
scoring filters are used for restricting the crawl to
topics, so zero-handling might break topic-specific crawls.
So my vote is to keep the current implementation.
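The order dependence can be sketched as follows. Everything here is an assumption made for illustration: `OrderSketch`, the fixed `OPIC` value, the `topicVeto` filter, and in particular the zero branch of `opicZeroAware`, which stands in for the unspecified special case and is not taken from any patch.

```java
final class OrderSketch {
    // Assumed OPIC-derived score for one page (hypothetical value).
    static final float OPIC = 2.0f;

    // Plain multiplicative chaining, with no special case for zero.
    static float opicMultiply(float initScore) {
        return initScore * OPIC;
    }

    // Hypothetical zero special case: fall back to the OPIC score alone.
    static float opicZeroAware(float initScore) {
        return initScore != 0 ? initScore * OPIC : OPIC;
    }

    // Hypothetical topic filter that vetoes an off-topic page.
    static float topicVeto(float initScore) {
        return initScore * 0.0f;
    }
}
```

Running the zero-aware variant before the topic filter gives `topicVeto(opicZeroAware(1.0f))`, which is zero, while running it afterwards gives `opicZeroAware(topicVeto(1.0f))`, which is the OPIC score, so the veto is undone. With `opicMultiply` both orders agree, which is the property the current implementation preserves.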
|
|
| Commented: (NUTCH-518) Fix
OpicScoringFilter to respect scoring
filter chaining |
  United States |
2007-07-19 01:26:04 |
[ https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513821 ]
Doğacan Güney commented on NUTCH-518:
-------------------------------------
This is another alternative. I am not suggesting that we use
it but just to put it on the table:
* Remove initial score argument from indexerScore and
generatorSortValue.
* Change ScoringFilters.java to collect scores from
different ScoringFilter-s.
* Calculate their geometric mean.
This approach is far more aggressive. It is like a logical
AND. With geometric mean a page is 'important' pretty much
only if *all* scoring filters decide that it is important. I
really like this approach, but it won't work for people who
want to give a high score to pages with certain content even
if the page itself has no inlinks (for this case, addition
would have worked very well).
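The geometric-mean idea can be sketched roughly as below. This is an illustration under assumptions, not a proposed patch: `GeometricMeanSketch` and `geometricMean` are hypothetical names, and the sketch assumes the scores collected from the filters are non-negative.

```java
final class GeometricMeanSketch {
    // Geometric mean of the scores gathered from all filters.
    // Assumes non-negative scores; any zero makes the result zero.
    static float geometricMean(float... scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("at least one score is required");
        }
        double logSum = 0.0;
        for (float s : scores) {
            if (s < 0) {
                throw new IllegalArgumentException("scores must be non-negative");
            }
            if (s == 0) {
                return 0.0f; // one filter saying "unimportant" decides the outcome
            }
            logSum += Math.log(s);
        }
        // exp(mean of logs) equals the n-th root of the product.
        return (float) Math.exp(logSum / scores.length);
    }
}
```

A page scored `0.0f` by a link-based filter and `100.0f` by a content filter ends up at zero here, which is the logical-AND behaviour described above and the reason addition suits the content-boost case better.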
|
|