Thread: RE: how can I say to jackrabbit to index a text when I put a TIFF in the repository?
|
|
| RE: how can I say to jackrabbit to index
a text when I put a TIFF in the
repository? |
  United States |
2008-03-27 12:59:30 |
Maybe add a reference property to your TIFF node, that
references a node that contains the text?
-----Original Message-----
From: Paco Avila <pavila git.es>
Sent: Thursday, March 27, 2008 1:51pm
To: users jackrabbit.apache.org
Subject: how can I say to jackrabbit to index a text when I
put a TIFF in the repository?
I need to store a TIFF in the repository and make it searchable. I have
the text version of this file; how can this text be indexed?
--
Paco Avila <pavila git.es>
GIT Consultors
|
|
| Re: how can I say to jackrabbit to index
a text when I put a TIFF in the
repository? |

|
2008-03-28 01:26:47 |
Hi,
On Thu, Mar 27, 2008 at 7:59 PM, Dave Brosius <dbrosius mebigfatguy.com> wrote:
> Maybe add a reference property to your TIFF node, that references
> a node that contains the text?
Or just a normal string property with the text to be indexed.
BR,
Jukka Zitting
|
|
| Re: how can I say to jackrabbit to index
a text when I put a TIFF in the
repository? |
  Spain |
2008-03-28 01:43:11 |
On Fri, 28-03-2008 at 08:26 +0200, Jukka Zitting wrote:
> Hi,
>
> On Thu, Mar 27, 2008 at 7:59 PM, Dave Brosius <dbrosius mebigfatguy.com> wrote:
> > Maybe add a reference property to your TIFF node, that references
> > a node that contains the text?
>
> Or just a normal string property with the text to be indexed.
>
> BR,
>
> Jukka Zitting
But, in this case, the query can't be:
/jcr:root//element(*,my:document)[jcr:contains(nt:resource,'hola mundo')]
and should be something like (if I store the text in my:docText
property):
/jcr:root//element(*,my:document)[jcr:contains(my:docText,'hola mundo')]
because Lucene is not indexing the "document text version". By the way,
can I get the text generated by text-extractors or is it only used by
the Lucene engine?
--
Paco Avila <pavila git.es>
GIT Consultors
|
|
| Re: how can I say to jackrabbit to index
a text when I put a TIFF in the
repository? |

|
2008-03-28 01:57:27 |
Hi,
On Fri, Mar 28, 2008 at 8:43 AM, Paco Avila <pavila git.es> wrote:
> On Fri, 28-03-2008 at 08:26 +0200, Jukka Zitting wrote:
> > Or just a normal string property with the text to be indexed.
>
> But, in this case, the query can't be:
>
> /jcr:root//element(*,my:document)[jcr:contains(nt:resource,'hola mundo')]
>
> and should be something like (if I store the text in my:docText
> property):
>
> /jcr:root//element(*,my:document)[jcr:contains(my:docText,'hola mundo')]
>
> because Lucene is not indexing the "document text version".
You could use jcr:contains(., 'hola mundo'), which looks in all
properties of a node.
Alternatively, you could also put the text in a TIFF comment and
implement a custom TextExtractor class that pulls that comment for
Jackrabbit to index as the text version of the TIFF file.
> By the way, can I get the text generated by text-extractors or
> is it only used by the Lucene engine?
No, it's only used for Lucene. But of course you can instantiate and
run the text extractors manually on any binary property you like.
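As a rough illustration of the TIFF comment idea above, the sketch below covers only the extraction half, in plain Java: it reads the ImageDescription tag (270) from the first IFD of a TIFF held in a byte array. Treating that tag as the "TIFF comment" is an assumption, and the class and method names are hypothetical. Hooking the result into Jackrabbit's TextExtractor mechanism is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads the ImageDescription tag (270) from a TIFF's first IFD. */
public class TiffDescriptionReader {

    private static final int TAG_IMAGE_DESCRIPTION = 270;
    private static final int TYPE_ASCII = 2;

    /** Returns the description text, or an empty string if the tag is absent. */
    public static String readDescription(byte[] tiff) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("Too short to be a TIFF");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // Bytes 0-1 give the byte order: "II" little-endian, "MM" big-endian.
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Not a TIFF: bad byte-order mark");
        }
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Not a TIFF: bad magic number");
        }
        int ifd = buf.getInt(4);                  // offset of the first IFD
        int entries = buf.getShort(ifd) & 0xFFFF; // number of 12-byte entries
        for (int i = 0; i < entries; i++) {
            int entry = ifd + 2 + i * 12;
            int tag = buf.getShort(entry) & 0xFFFF;
            int type = buf.getShort(entry + 2) & 0xFFFF;
            int count = buf.getInt(entry + 4);
            if (tag != TAG_IMAGE_DESCRIPTION || type != TYPE_ASCII) {
                continue;
            }
            // Values of up to 4 bytes are stored inline in the entry itself.
            int offset = count <= 4 ? entry + 8 : buf.getInt(entry + 8);
            String text = new String(tiff, offset, count, StandardCharsets.ISO_8859_1);
            // ASCII values are NUL-terminated; drop the terminator and anything after it.
            int nul = text.indexOf('\0');
            return nul >= 0 ? text.substring(0, nul) : text;
        }
        return "";
    }
}
```

A custom extractor would presumably hand this string to the indexer. Truncated or malformed files are not handled beyond the header checks, and only the first IFD is inspected.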
BR,
Jukka Zitting
|
|
| Re: how can I say to jackrabbit to index
a text when I put a TIFF in the
repository? |

|
2008-03-28 02:19:41 |
Why don't you just add a mixin node type (one that would contain the
document version property) to the nt:resource node while uploading, and
store the doc version as an additional property on the nt:resource node?
This would solve your problem, if I understood your use case the right
way.
regards,
philipp
|
|
| Re: how can I say to jackrabbit to index
a text when I put a TIFF in the
repository? |
  Spain |
2008-03-28 02:33:28 |
What do you mean by "document version"? The text from the TIFF image?
--
Paco Avila <pavila git.es>
GIT Consultors
|
|
| Re: how can I say to jackrabbit to index
a text when I put a TIFF in the
repository? |
  Spain |
2008-03-28 02:39:38 |
On Fri, 28-03-2008 at 08:57 +0200, Jukka Zitting wrote:
> Hi,
>
> On Fri, Mar 28, 2008 at 8:43 AM, Paco Avila <pavila git.es> wrote:
> > On Fri, 28-03-2008 at 08:26 +0200, Jukka Zitting wrote:
> > > Or just a normal string property with the text to be indexed.
> >
> > But, in this case, the query can't be:
> >
> > /jcr:root//element(*,my:document)[jcr:contains(nt:resource,'hola mundo')]
> >
> > and should be something like (if I store the text in my:docText
> > property):
> >
> > /jcr:root//element(*,my:document)[jcr:contains(my:docText,'hola mundo')]
> >
> > because Lucene is not indexing the "document text version".
>
> You could use jcr:contains(., 'hola mundo') that looks in all
> properties of a node.
But nt:resource is a subnode and my:docText is a property. Will
jcr:contains(., 'hola mundo') search in all properties and subnodes?
> Alternatively, you could also put the text in a TIFF comment and
> implement a custom TextExtractor class that pulls that comment for
> Jackrabbit to index as the text version of the TIFF file.
>
> > By the way, can I get the text generated by text-extractors or
> > is it only used by the Lucene engine?
>
> No, it's only used for Lucene. But of course you can instantiate and
> run the text extractors manually on any binary property you like.
Thanks.
|
|
| Re: how can I say to jackrabbit to index
a text when I put a TIFF in the
repository? |

|
2008-03-28 02:44:54 |
Hi,
On Fri, Mar 28, 2008 at 9:39 AM, Paco Avila <pavila git.es> wrote:
> On Fri, 28-03-2008 at 08:57 +0200, Jukka Zitting wrote:
> > You could use jcr:contains(., 'hola mundo') that looks in all
> > properties of a node.
>
> But nt:resource is a subnode and my:docText is a property. Will
> jcr:contains(., 'hola mundo') search in all properties and subnodes?
Ah, yes. Put the extra text property in the nt:resource node (with an
appropriate mixin or subtype as mentioned by Philipp). Then
jcr:contains(jcr:content,'hola mundo') should match the content.
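The query variants discussed in this thread differ only in the first argument of jcr:contains. A minimal Java sketch that builds them as strings follows. The class and method names are hypothetical, and only single quotes are escaped (by doubling them), not the other special characters of the full-text search syntax.

```java
/** Hypothetical helper that builds the XPath full-text queries from this thread. */
public class FullTextQueries {

    /** Doubles single quotes so the text is safe inside an XPath string literal. */
    static String quote(String text) {
        return "'" + text.replace("'", "''") + "'";
    }

    /**
     * Builds an XPath full-text query for nodes of the given type.
     * scope is "." for all properties of the node, a property name such as
     * "my:docText", or a child node name such as "jcr:content".
     */
    public static String contains(String nodeType, String scope, String text) {
        return "/jcr:root//element(*," + nodeType + ")"
                + "[jcr:contains(" + scope + "," + quote(text) + ")]";
    }

    public static void main(String[] args) {
        // Text stored in a property of the my:document node itself.
        System.out.println(contains("my:document", "my:docText", "hola mundo"));
        // All properties of the my:document node.
        System.out.println(contains("my:document", ".", "hola mundo"));
        // Properties of the jcr:content (nt:resource) child node.
        System.out.println(contains("my:document", "jcr:content", "hola mundo"));
    }
}
```

The resulting statement can be passed to QueryManager.createQuery(statement, Query.XPATH). The jcr:content variant assumes the extra text property is stored on the nt:resource child node, as suggested above.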
BR,
Jukka Zitting
|
|
| Re: how can I say to jackrabbit to index
a text when I put a TIFF in the
repository? |

|
2008-03-28 02:47:49 |
> What do you mean with "document version"? the text from the TIFF image?
I was referring to my:docText in your example.
You could then call jcr:contains(jcr:content, 'hola mundo').
Regards,
philipp
|
|
| Re: how can I say to jackrabbit to index
a text when I put a TIFF in the
repository? |
  Spain |
2008-03-28 02:49:13 |
On Fri, 28-03-2008 at 09:44 +0200, Jukka Zitting wrote:
> Hi,
>
> On Fri, Mar 28, 2008 at 9:39 AM, Paco Avila <pavila git.es> wrote:
> > On Fri, 28-03-2008 at 08:57 +0200, Jukka Zitting wrote:
> > > You could use jcr:contains(., 'hola mundo') that looks in all
> > > properties of a node.
> >
> > But nt:resource is a subnode and my:docText is a property. Will
> > jcr:contains(., 'hola mundo') search in all properties and subnodes?
>
> Ah, yes. Put the extra text property in the nt:resource node (with an
> appropriate mixin or subtype as mentioned by Philipp). Then
> jcr:contains(jcr:content,'hola mundo') should match the content.
Ok, I understand it now. Thanks!
--
Paco Avila <pavila git.es>
GIT Consultors
|
|