
Thread: some interesting problem




some interesting problem
user name
2006-12-27 19:06:17
Hello, pylucene-dev.

Sometimes I receive a weird exception for Unicode data, which is Japanese text. Here is an entry from the log:

2006-12-27 11:01:18,541 ERROR
Traceback (most recent call last):
  File "/home/search/lib/index/Index.py", line 91, in indexDocument
    doc.add(Field("summary", fields['summary'], Field.Store.YES, Field.Index.TOKENIZED))
InvalidArgsError: (<type 'PyLucene.Field'>, '__init__', ('summary', '\xe7\xb4\xa0\xe6\x95\xb5\xe3\x81\xaa\xe3\x82\xaf\xe3\x83\xaa\xe3\x82\xb9\xe3\x83\x9e\xe3\x82\xb9\xe3\x83\x97\xe3\x83\xac\xe3\x82\xbc\xe3\x83\xb3\xe3\x83\x88\xe3\x81\x8c\xe5\xb1\x8a\xe3\x81\x8d\xe3\x81\xbe\xe3\x81\x97\xe3\x81\x9f\xef\xa3\xa6\xe3\x83\xab\xe3\x82\xa4\xe3\x82\xb5\xe3\x83\xb3\xe3\x82\xbf\xe3\x81\x95\xe3\x82\x93\xe3\x81\x8b\xe3\x82\x89\xef\xa6\xa8\xe3\x81\x84\xe3\x81\x88\xe3\x81\x84\xe3\x81\x88\xe3\x80\x82\xef\xbc\x91\xe5\xb9\xb4\xe9\xa0\x91\xe5\xbc\xb5\xe3\x81\xa3\xe3\x81\x9f\xe3\x80\x8c\xe8\x87\xaa\xe5\x88\x86\xe3\x80\x8d\xe3\x81\x8b\xe3\x82\x89\xe3\x80\x8c\xe8\x87\xaa\xe5\x88\x86\xe3\x80\x8d\xe3\x81\xab\xe3\x80\x82\xe3\x81\xa7\xe3\x81\x99\xe3\x80\x82\xef\xbc\x88\xe7\xac\x91\xef\xbc\x89\xe3\x81\x9d\xe3\x82\x8c\xe3\x82\x82\xe3\x80\x8e\xe8\xa6\xaa\xe3\x81\xb0\xe3\x81\x8b\xe3\x82\xb0\xe3\x83\x83\xe3\x82\xba\xe3\x80\x8f\xf0\x95\xbe\xb9', <Field_Store: YES>, <Field_Index: TOKENIZED>))

What is actually wrong with the parameters?

Thanks.

--
Yura Smolsky


_______________________________________________
pylucene-dev mailing list
pylucene-dev@osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
some interesting problem
user name
2006-12-27 19:16:47
On Wed, 27 Dec 2006, Yura Smolsky wrote:

> what is actually wrong with parameters?

Dunno, it could be a problem with converting to Unicode?

It looks like the argument is a regular Python string instance, not a unicode string instance. Because Java uses only Unicode strings, regular Python strings are converted to Unicode by assuming they're UTF-8 encoded. Is that the case with this string?
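Whether a byte string really is UTF-8 can be checked by attempting a strict decode. The sketch below assumes Python 3, where `bytes` plays the role of Python 2's plain `str`; the helper name `is_valid_utf8` is hypothetical, and the sample is only the first three characters of the 'summary' value from the traceback.

```python
def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")  # strict mode: any malformed sequence raises
    except UnicodeDecodeError:
        return False
    return True


# First nine bytes of the 'summary' value in the traceback (U+7D20 U+6575 U+306A).
sample = b"\xe7\xb4\xa0\xe6\x95\xb5\xe3\x81\xaa"

print(is_valid_utf8(sample))       # True
print(is_valid_utf8(sample[:-1]))  # False: the last character is cut short
# The same three characters encoded as Shift_JIS instead of UTF-8:
print(is_valid_utf8(u"\u7d20\u6575\u306a".encode("shift_jis")))  # False
```

A `False` here would mean the bytes cannot be read as UTF-8 as-is; it does not by itself say which encoding the data actually uses.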

A way around the problem is to convert the string to Unicode yourself before passing it to PyLucene.
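The workaround of decoding the value yourself before it reaches PyLucene could look roughly like this. It is a sketch written for Python 3; since `bytes` is an alias of `str` in Python 2.6+, it should behave the same there. `to_unicode` and the sample `fields` dict are hypothetical, and UTF-8 is only an assumed default that must match the real source encoding. PyLucene itself is not imported; the final `doc.add(...)` line from the traceback is shown as a comment.

```python
def to_unicode(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding.

    UTF-8 is an assumed default: it must match how the data was actually
    encoded. Decoding is strict, so undecodable input raises
    UnicodeDecodeError here instead of being handed on to PyLucene.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text: pass through unchanged


# Hypothetical input standing in for the 'fields' dict used in Index.py.
fields = {"summary": b"\xe7\xb4\xa0\xe6\x95\xb5\xe3\x81\xaa"}
summary = to_unicode(fields["summary"])
print(summary == u"\u7d20\u6575\u306a")  # True

# With PyLucene installed, the call from the traceback would then receive text:
# doc.add(Field("summary", summary, Field.Store.YES, Field.Index.TOKENIZED))
```

Passing text that is already Unicode through unchanged makes the helper safe to call on every field, whatever its origin.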

If you send in a piece of code that reproduces the problem, I can be more helpful.

Andi..
some interesting problem
user name
2006-12-27 19:39:51
Hello, Andi.

I thought the string could be a regular Python string. Hmm... I will try to change that.

--
Yura Smolsky

