
[Xapian-devel] indexing and searching of timed events





2008-06-05 09:33:10
indexing and searching of timed events
Hello,

I am working on an indexing/search engine for speech and would like to
try Xapian for it. I have an idea of how to do this in Xapian, but I am
not sure whether it is correct, since I have only looked at the Xapian
code briefly.

Tokens I need to index:
Each speech audio recording processed by a speech recognizer is
converted to a directed graph of hypotheses. Each hypothesis contains
the recognized word, start time, end time and confidence score. These
hypotheses overlap in time, so there are generally several hypotheses
at each point in time.

A simple graph of hypotheses (output of speech recognizer):
http://www.research.ibm.com/journal/sj/404/brown1.gif

So I suppose the main thing I need to change in the Xapian code is the
termpos type (in types.h), which is just an unsigned integer. For speech
indexing I need to change it to a struct containing the start time, end
time and score of each recognized word.
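
A minimal sketch of what such a struct might look like. This is not
Xapian code: the names TimedPos, make_timed_pos and overlaps are
hypothetical, and millisecond timestamps and a score in [0, 1] are
assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical replacement for an integer term position: one recognizer
// hypothesis with its time span and confidence. Not part of Xapian.
struct TimedPos {
    std::uint32_t start_ms;  // start time in milliseconds (assumed unit)
    std::uint32_t end_ms;    // end time in milliseconds (assumed unit)
    float score;             // recognizer confidence, assumed in [0, 1]
};

// Build a TimedPos, checking that the span is not reversed.
inline TimedPos make_timed_pos(std::uint32_t start_ms,
                               std::uint32_t end_ms,
                               float score) {
    assert(start_ms <= end_ms);
    return TimedPos{start_ms, end_ms, score};
}

// Order by start time, then end time, so position lists can be kept sorted
// the way integer positions are.
inline bool operator<(const TimedPos& a, const TimedPos& b) {
    if (a.start_ms != b.start_ms) return a.start_ms < b.start_ms;
    return a.end_ms < b.end_ms;
}

// True when the two hypotheses share some stretch of time.
// Spans that merely touch (a ends exactly where b starts) do not overlap.
inline bool overlaps(const TimedPos& a, const TimedPos& b) {
    return a.start_ms < b.end_ms && b.start_ms < a.end_ms;
}
```

Unlike a plain integer position, two entries for different words can
overlap here, which is the property the hypothesis graph needs.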

Then, to search for phrases correctly, I have to change the code in
./matcher/phrasepostlist.cc to take the start and end times into
account.
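
One way the time-aware phrase check could work, as a standalone sketch
rather than a patch to phrasepostlist.cc. The names Hyp, follows and
best_phrase_score are hypothetical, and so are the rules used here: a
word may follow another if it starts within max_gap_ms of the previous
word's end, and a phrase is scored by the product of the confidences
(assumed positive).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical hypothesis record; not a Xapian type.
struct Hyp {
    std::int64_t start_ms;  // start time in milliseconds (assumed unit)
    std::int64_t end_ms;    // end time in milliseconds (assumed unit)
    double score;           // recognizer confidence, assumed > 0
};

// b may follow a if b starts within max_gap_ms of a's end (either side).
inline bool follows(const Hyp& a, const Hyp& b, std::int64_t max_gap_ms) {
    const std::int64_t d = b.start_ms - a.end_ms;
    return d >= -max_gap_ms && d <= max_gap_ms;
}

// terms[t] holds the hypotheses for the t-th word of the phrase within one
// recording. Returns the best product of scores over all chains in which
// each word follows the previous one, or 0.0 if no such chain exists.
inline double best_phrase_score(const std::vector<std::vector<Hyp>>& terms,
                                std::int64_t max_gap_ms) {
    if (terms.empty()) return 0.0;
    // best[i]: best score of a chain ending at hypothesis i of the
    // current word; 0.0 means no chain reaches it.
    std::vector<double> best;
    for (const Hyp& h : terms[0]) best.push_back(h.score);
    for (std::size_t t = 1; t < terms.size(); ++t) {
        std::vector<double> next(terms[t].size(), 0.0);
        for (std::size_t j = 0; j < terms[t].size(); ++j) {
            for (std::size_t i = 0; i < terms[t - 1].size(); ++i) {
                if (best[i] > 0.0 &&
                    follows(terms[t - 1][i], terms[t][j], max_gap_ms)) {
                    const double s = best[i] * terms[t][j].score;
                    if (s > next[j]) next[j] = s;
                }
            }
        }
        best = std::move(next);
    }
    double result = 0.0;
    for (double s : best) {
        if (s > result) result = s;
    }
    return result;
}
```

This version compares every pair of hypotheses for adjacent words; with
position lists sorted by start time the inner loop could be narrowed,
but that is left out to keep the sketch short.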

Please correct me if I am wrong or have missed something. I am really
new to Xapian, so I would be grateful for any hint on this problem
(tutorial, code snippet, doxygen page, ...).

Thank you,
Miso



_______________________________________________
Xapian-devel mailing list
Xapian-devel@lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-devel
