Thread: some interesting problem

2006-12-27 19:06:17
Hello, pylucene-dev.

Sometimes I receive a weird exception for Unicode data, which is
Japanese text. Here is an entry from the log:

2006-12-27 11:01:18,541 ERROR
Traceback (most recent call last):
  File "/home/search/lib/index/Index.py", line 91, in indexDocument
    doc.add(Field("summary", fields['summary'], Field.Store.YES, Field.Index.TOKENIZED))
InvalidArgsError: (<type 'PyLucene.Field'>, '__init__', ('summary', '\xe7\xb4\xa0\xe6\x95\xb5\xe3\x81\xaa\xe3\x82\xaf\xe3\x83\xaa\xe3\x82\xb9\xe3\x83\x9e\xe3\x82\xb9\xe3\x83\x97\xe3\x83\xac\xe3\x82\xbc\xe3\x83\xb3\xe3\x83\x88\xe3\x81\x8c\xe5\xb1\x8a\xe3\x81\x8d\xe3\x81\xbe\xe3\x81\x97\xe3\x81\x9f\xef\xa3\xa6\xe3\x83\xab\xe3\x82\xa4\xe3\x82\xb5\xe3\x83\xb3\xe3\x82\xbf\xe3\x81\x95\xe3\x82\x93\xe3\x81\x8b\xe3\x82\x89\xef\xa6\xa8\xe3\x81\x84\xe3\x81\x88\xe3\x81\x84\xe3\x81\x88\xe3\x80\x82\xef\xbc\x91\xe5\xb9\xb4\xe9\xa0\x91\xe5\xbc\xb5\xe3\x81\xa3\xe3\x81\x9f\xe3\x80\x8c\xe8\x87\xaa\xe5\x88\x86\xe3\x80\x8d\xe3\x81\x8b\xe3\x82\x89\xe3\x80\x8c\xe8\x87\xaa\xe5\x88\x86\xe3\x80\x8d\xe3\x81\xab\xe3\x80\x82\xe3\x81\xa7\xe3\x81\x99\xe3\x80\x82\xef\xbc\x88\xe7\xac\x91\xef\xbc\x89\xe3\x81\x9d\xe3\x82\x8c\xe3\x82\x82\xe3\x80\x8e\xe8\xa6\xaa\xe3\x81\xb0\xe3\x81\x8b\xe3\x82\xb0\xe3\x83\x83\xe3\x82\xba\xe3\x80\x8f\xf0\x95\xbe\xb9', <Field_Store: YES>, <Field_Index: TOKENIZED>))

What is actually wrong with the parameters?

Thanks.

--
Yura Smolsky
_______________________________________________
pylucene-dev mailing list
pylucene-dev osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev

2006-12-27 19:16:47
On Wed, 27 Dec 2006, Yura Smolsky wrote:
> Sometimes I receive a weird exception for Unicode data, which is
> Japanese text. Here is an entry from the log:
>
> 2006-12-27 11:01:18,541 ERROR
> Traceback (most recent call last):
>   File "/home/search/lib/index/Index.py", line 91, in indexDocument
>     doc.add(Field("summary", fields['summary'], Field.Store.YES, Field.Index.TOKENIZED))
> InvalidArgsError: (<type 'PyLucene.Field'>, '__init__', ('summary', '\xe7\xb4\xa0\xe6\x95\xb5 [...] \xe3\x80\x8f\xf0\x95\xbe\xb9', <Field_Store: YES>, <Field_Index: TOKENIZED>))
>
> What is actually wrong with the parameters?

Dunno, it could be a problem with converting to Unicode?

It looks like the argument is a regular Python string instance, not a
unicode string instance. Because Java uses only Unicode strings, regular
Python strings are converted to Unicode by assuming they're UTF-8
encoded. Is that the case with this string?
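
One way to answer that question is to try decoding the byte string as UTF-8 and see whether it fails. The following is only a minimal sketch: `is_valid_utf8` is a hypothetical helper, not part of PyLucene, and the `b"..."` literals assume Python 2.6 or later (on Python 2 they are ordinary `str` objects). The sample bytes are the last seven bytes of the string in the log entry.

```python
def is_valid_utf8(data):
    """Return True if the byte string `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The last seven bytes of the string shown in the log entry.
tail = b"\xe3\x80\x8f\xf0\x95\xbe\xb9"
print(is_valid_utf8(tail))

# A sequence cut off in the middle of a character does not decode.
print(is_valid_utf8(b"\xe3\x80"))
```

A check like this only tells you whether the bytes are well-formed UTF-8. It does not say whether the library on the other side accepts every character they encode.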

A way around the problem is to convert the string to Unicode yourself
before passing it to PyLucene.
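
As a rough illustration of that workaround, the sketch below decodes a byte string explicitly before it would be handed to `Field`. The helper name `to_unicode` and the assumption that the data is UTF-8 are mine, not something the thread confirms. The `Field(...)` call is copied from the traceback and left as a comment so that the snippet runs without PyLucene installed.

```python
def to_unicode(value, encoding="utf-8"):
    """Return `value` as a unicode string, decoding byte strings explicitly.

    Assumes byte strings are encoded with `encoding` (UTF-8 by default).
    Raises UnicodeDecodeError if they are not, which surfaces bad input
    before it reaches the indexing call.
    """
    if isinstance(value, bytes):  # `bytes` is an alias for `str` on Python 2.6+
        return value.decode(encoding)
    return value


# First six bytes of the string from the log entry, decoded explicitly.
summary = to_unicode(b"\xe7\xb4\xa0\xe6\x95\xb5")

# With PyLucene available, the call from the traceback would then become:
# doc.add(Field("summary", to_unicode(fields['summary']),
#               Field.Store.YES, Field.Index.TOKENIZED))
```

Decoding at the boundary also makes a wrong encoding fail in your own code, with a `UnicodeDecodeError` that names the offending byte, rather than inside the library.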

If you send in a piece of code that reproduces the problem, I can be
more helpful.

Andi..

2006-12-27 19:39:51
Hello, Andi.

I thought that the string could be a regular Python string. Hmm... I will
try to change that.

--
Yura Smolsky