Thread: parse-mp3 plugin concatenating previous tags for text field




parse-mp3 plugin concatenating previous tags for text field
user name
2006-12-11 13:32:54
The parse-mp3 plugin seems to be keeping state from the previous
parse's text content. For every new mp3 file parsed, it puts the
contents of all the previous text fields into the plain text field
for that file.

You can see this by fetching a set of mp3s in one segment, then
viewing their plain text in the nutch webapp. The plain text for each
file will include the contents of all files fetched in that round,
which makes searching fruitless.

I made a tiny band-aid change to MP3Parser.java and
MetadataCollector.java against the nightly. It seems to fix the
problem.


--- MP3Parser.java      2006-12-10 09:43:26.000000000 -0500
+++ MP3Parser.java.new  2006-12-10 16:37:03.000000000 -0500
@@ -67,7 +67,7 @@
        fos.write(raw);
        fos.close();
        MP3File mp3 = new MP3File(tmp);
-
+         metadataCollector.clearText();
        if (mp3.hasID3v2Tag()) {
          parse = getID3v2Parse(mp3, content.getMetadata());
        } else if (mp3.hasID3v1Tag()) {

--- MetadataCollector.java      2006-12-10 09:43:26.000000000 -0500
+++ MetadataCollector.java.new  2006-12-10 16:37:28.000000000 -0500
@@ -42,6 +42,10 @@
        this.conf = conf;
    }

+  public void clearText() {
+       text = "";
+  }
+
    public void notifyProperty(String name, String value) throws MalformedURLException {
      if (name.equals("TIT2-Text"))
        setTitle(value);






parse-mp3 plugin concatenating previous tags for text field
user name
2006-12-12 15:13:33
Could you please create a JIRA issue and attach this patch there so
it won't get lost? It also helps keep the CHANGES file up to date,
since you can just copy and paste from the issue when you commit.

--
 Sami Siren

Brian Whitman wrote:
> The parse-mp3 plugin seems to be saving a state of the previous parse's
> text content. For every new mp3 file parsed, it is putting the contents
> of all the previous text fields in the plain text field for that file.
> 
> You can see this by fetching a set of mp3s in one segment, then viewing
> their plain text in the nutch webapp. The plaintext will include the
> contents of all files fetched in that round, which makes searching
> fruitless.
> 
> I made a tiny band-aid change to MP3Parser.java and
> MetadataCollector.java against the nightly. It seems to fix the problem.
> 
> 
> --- MP3Parser.java      2006-12-10 09:43:26.000000000 -0500
> +++ MP3Parser.java.new  2006-12-10 16:37:03.000000000 -0500
> @@ -67,7 +67,7 @@
>        fos.write(raw);
>        fos.close();
>        MP3File mp3 = new MP3File(tmp);
> -
> +         metadataCollector.clearText();
>        if (mp3.hasID3v2Tag()) {
>          parse = getID3v2Parse(mp3, content.getMetadata());
>        } else if (mp3.hasID3v1Tag()) {
> 
> --- MetadataCollector.java      2006-12-10 09:43:26.000000000 -0500
> +++ MetadataCollector.java.new  2006-12-10 16:37:28.000000000 -0500
> @@ -42,6 +42,10 @@
>        this.conf = conf;
>    }
> 
> +  public void clearText() {
> +       text = "";
> +  }
> +
>    public void notifyProperty(String name, String value) throws MalformedURLException {
>      if (name.equals("TIT2-Text"))
>        setTitle(value);
