
Thread: Problems with StringReader()




user name
2006-11-28 17:21:48
Brilliant - I see the problem now, thank you very much.

As you say, in my first example I was calling StringReader directly
with whatever read() returned, which is not necessarily UTF-8. My own
StringReader class didn't specify UTF-8 either.

I've just simply added

uniText=unicode(textString, 'utf8','ignore' )

and passed this into PyLucene.StringReader and getBestFragment and it works!
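
For anyone reading the archive later, here is a minimal sketch of that decoding step using only the standard library. The helper name decode_lossy and the sample bytes are hypothetical, and the PyLucene calls appear only as a comment because they are not exercised here. On Python 2, raw.decode('utf-8', 'ignore') is equivalent to the unicode(textString, 'utf8', 'ignore') call above.

```python
def decode_lossy(raw):
    """Decode raw bytes as UTF-8, silently dropping any undecodable bytes.

    Hypothetical helper: same effect as unicode(raw, 'utf8', 'ignore')
    on Python 2, and as bytes.decode on Python 3.
    """
    return raw.decode('utf-8', 'ignore')


# Sample input mixing one Latin-1 byte (\xe9) with valid UTF-8 (\xc3\xaf).
raw = b'caf\xe9 na\xc3\xafve'
uniText = decode_lossy(raw)

# The invalid byte is dropped; the valid UTF-8 sequence is kept.
# uniText would then be passed to both PyLucene.StringReader and
# getBestFragment, as described in the message above.
```

Note that 'ignore' discards bytes it cannot decode, so characters can go missing from the highlighted fragment. If the real encoding of the files is known, decoding with that encoding instead would preserve them.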

Thanks again,

Phil.



-----Original Message-----
From: pylucene-dev-bouncesosafoundation.org [mailto:pylucene-dev-bouncesosafoundation.org] On Behalf Of Andi Vajda
Sent: 28 November 2006 17:05
To: 'pylucene-devosafoundation.org'
Subject: Re: [pylucene-dev] Problems with StringReader()


On Tue, 28 Nov 2006, BEADLING, Philip, GBM wrote:

>    def highlight( self, searchText, searchResultFilenames ):
>        for filename in searchResultFilenames:
>            # Find text directory from documents directory and convert
>            # network fileshare to local mount
>            textFile = filename.replace("\\Documents\\","\\Text\\") + ".txt"
>            textFile = textFile.replace("\\", "/")
>            textFile = textFile.replace("//networkshare/IRDcaf/Documentation", "/Documentation")
>
>            print "<br>", searchText, "<br>", textFile
>            if os.path.isfile( textFile ):
>                filen = open( textFile, 'r' )
>                textString = filen.read()
>                filen.close()
>                term = Term( "field", searchText )
>                termQuery = TermQuery( term )
>                scorer = QueryScorer( termQuery )
>                highlighter = Highlighter( scorer )
>                simpAn = SimpleAnalyzer()
>                # PROBLEM IS HERE!!!!
>                reader = PyLucene.StringReader( textString )
>                tokenStream = simpAn.tokenStream("field", reader )
>                print highlighter.getBestFragment( tokenStream, textString )
>

At first quick glance, it doesn't look like 'textString' is going to be of
type 'unicode' in the above code sample. What comes out of a Python file's
read method is an object of type 'str'. I believe PyLucene will try to
convert the 'str' into a 'unicode' object by assuming 'utf-8' encoding. If
your 'str' is not 'utf-8' encoded, then that conversion is going to fail.
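
The failure mode can be illustrated in plain Python, without PyLucene. This is only a sketch of the decoding step described above, not a statement of what PyLucene does internally; the helper name is_valid_utf8 and the sample strings are hypothetical.

```python
def is_valid_utf8(raw):
    """Return True if the raw bytes decode cleanly as UTF-8 (strict mode)."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# The same text encoded two ways: only one of them is valid UTF-8.
latin1_bytes = u'caf\xe9'.encode('latin-1')  # a lone \xe9 byte, invalid as UTF-8
utf8_bytes = u'caf\xe9'.encode('utf-8')      # \xc3\xa9, valid UTF-8

print(is_valid_utf8(latin1_bytes))
print(is_valid_utf8(utf8_bytes))
```

A check like this on the bytes returned by read() would show whether a given text file is safe to hand over as an implicit UTF-8 'str'.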

If you send in a piece of code that runs (with the required data) that
reproduces the problem you're experiencing, I might be able to help you
better.

Andi..
_______________________________________________
pylucene-dev mailing list
pylucene-devosafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
