Thread: acts_as_solr
|
|
| acts_as_solr |

|
2006-08-29 02:25:31 |
I've spent a few hours tinkering with a Ruby ActiveRecord plugin to
index, delete, and search models fronted by a database into Solr.
The results are:

  $ script/console
  >> Book.new(:title => "Solr in Action", :author => "Yonik & Hoss").save
  => true
  >> Book.new(:title => "Lucene in Action", :author => "Otis & Erik").save
  => true
  >> action_books = Book.find_by_solr("action")
  => [#<Book:0x2406db0 attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>,
   #<Book:0x2406d74 attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]
  >> action_books = Book.find_by_solr("actions") # to show stemming
  => [#<Book:0x279ebbc attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>,
   #<Book:0x279eb80 attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]
  >> Book.find_by_solr("yonik OR otis") # to show QueryParser boolean expressions
  => [#<Book:0x2793adc attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>,
   #<Book:0x2793aa0 attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]

My model looks like this:

  class Book < ActiveRecord::Base
    acts_as_solr
  end

(ain't ActiveRecord slick?!)
acts_as_solr adds save and destroy hooks. All model attributes are
sent to Solr like this:

  >> action_books[0].to_solr_doc.to_s
  => "<doc><field name='id'>Book:21</field><field name='type'>Book</field><field name='pk'>21</field><field name='title_t'>Solr in Action</field><field name='author_t'>Yonik &amp; Hoss</field></doc>"

The Solr id is formatted as <model_name>:<primary_key>. The type field
holds the model name and is AND'd onto queries to narrow them to the
requesting model, the pk field is the primary key of the database
table, and the rest of the attributes are named with an _t suffix to
leverage Solr's dynamic field capability. All _t fields are copied into
the default search field, "text".
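
To make the document layout described above concrete, here is a small standalone sketch that builds the same kind of <doc> from a plain Hash, without ActiveRecord or REXML. It is only an illustration under stated assumptions: the helper names (escape_xml, solr_field, solr_doc_xml) are hypothetical and not part of the plugin, and the hand-rolled escaping stands in for what REXML does in the real code.

```ruby
# Standalone sketch (not the plugin's code): build the Solr <doc> XML
# for a record given as a plain Hash of attributes.
# Assumption: every non-id attribute becomes a "<name>_t" dynamic text field.

XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  "'" => '&apos;', '"' => '&quot;'
}.freeze

# Escape the five XML special characters in a value.
def escape_xml(text)
  text.to_s.gsub(/[&<>'"]/, XML_ESCAPES)
end

# Render one <field name='...'>value</field> element.
def solr_field(name, value)
  "<field name='#{escape_xml(name)}'>#{escape_xml(value)}</field>"
end

# id is "<model_name>:<primary_key>", type is the model name,
# pk is the primary key, and the remaining attributes get an _t suffix.
def solr_doc_xml(model_name, attributes)
  pk = attributes.fetch('id').to_s
  fields = [
    solr_field('id', "#{model_name}:#{pk}"),
    solr_field('type', model_name),
    solr_field('pk', pk)
  ]
  attributes.each_pair do |key, value|
    next if key.to_s == 'id'
    fields << solr_field("#{key}_t", value)
  end
  "<doc>#{fields.join}</doc>"
end

book = { 'id' => 21, 'title' => 'Solr in Action', 'author' => 'Yonik & Hoss' }
puts solr_doc_xml('Book', book)
```

Keeping the model name in both the id and the type field is what lets several models share one index: the id stays unique across models, and the type field gives queries something to filter on.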
At this point it is extremely basic, with no configurability, and there
are lots of issues to address to flesh this out into something robust
and general purpose. But as a proof of concept, I'm pleased at how easy
it was to write this hook.
I'd like to commit this to the Solr repository. Any objections?
Once committed, folks will be able to use "script/plugin install ..."
to install the Ruby side of things, and using a binary distribution
of Solr's example application and a custom solr/conf directory (just
for schema.xml) they'd be up and running quite quickly. If it's OK to
commit, what directory should I put things under? How about just
"ruby"?
I currently do not foresee having a lot of time to spend on this, but
I do feel quite strongly that having an "acts_as_solr" hook into
ActiveRecord will really lure in a lot of Rails developers. I'm sure
there will be plenty that will not want a hybrid Ruby/Java
environment, and for them there is the ever-improving Ferret
project. Ferret, however, would still need layers added on top of it
to achieve all that Solr provides, so Solr is where I'm at now.
Despite my time constraints, I'm volunteering to bring this prototype
to a documented and easily usable state, and manage patches submitted
by savvy users to make it robust.
Thoughts?
Erik
p.s. And for the really die-hard bleeding edgers, the complete
acts_as_solr code is pasted below, which you can put into a Rails
project in vendor/plugins/acts_as_solr.rb, along with a simple
one-line init.rb (require 'acts_as_solr') in vendor/plugins.
Sheepishly, here's the hackery....
--------
require 'active_record'
require 'rexml/document'
require 'net/http'
require 'erb' # for ERB::Util.url_encode (already loaded inside Rails)

# POST a body to the local Solr instance and return the response body.
def post_to_solr(body, mode = :search)
  url = URI.parse("http://localhost:8983")
  post = Net::HTTP::Post.new(mode == :search ? "/solr/select" : "/solr/update")
  post.body = body
  post.content_type = 'application/x-www-form-urlencoded'
  response = Net::HTTP.start(url.host, url.port) do |http|
    http.request(post)
  end
  return response.body
end

module SolrMixin
  module Acts #:nodoc:
    module ARSolr #:nodoc:
      def self.included(base)
        base.extend(ClassMethods)
      end

      module ClassMethods
        def acts_as_solr(options={}, solr_options={})
          # configuration = {}
          # solr_configuration = {}
          # configuration.update(options) if options.is_a?(Hash)
          # solr_configuration.update(solr_options) if solr_options.is_a?(Hash)
          after_save :solr_save
          after_destroy :solr_destroy
          include SolrMixin::Acts::ARSolr::InstanceMethods
        end

        def find_by_solr(q, options = {}, find_options = {})
          # narrow the query to documents of this model
          q = "(#{q}) AND type:#{self.name}"
          response = post_to_solr("q=#{ERB::Util::url_encode(q)}&wt=ruby&fl=pk")
          # wt=ruby returns a Ruby hash literal
          data = eval(response)
          docs = data['response']['docs']
          return [] if docs.size == 0
          ids = docs.collect {|doc| doc['pk']}
          conditions = [ "#{self.table_name}.id in (?)", ids ]
          result = self.find(:all, :conditions => conditions)
        end
      end

      module InstanceMethods
        def solr_id
          "#{self.class.name}:#{self.id}"
        end

        def solr_save
          logger.debug "solr_save: #{self.class.name} : #{self.id}"
          xml = REXML::Element.new('add')
          xml.add_element to_solr_doc
          response = post_to_solr(xml.to_s, :update)
          solr_commit
          true
        end

        # remove from index
        def solr_destroy
          logger.debug "solr_destroy: #{self.class.name} : #{self.id}"
          post_to_solr("<delete><id>#{solr_id}</id></delete>", :update)
          solr_commit
          true
        end

        # note: this sends <optimize/>, which also commits
        def solr_commit
          post_to_solr('<optimize waitFlush="false" waitSearcher="false"/>', :update)
        end

        # convert instance to Solr document
        def to_solr_doc
          logger.debug "to_doc: creating doc for class: #{self.class.name}, id: #{self.id}"
          doc = REXML::Element.new('doc')
          # Solr id is <classname>:<id> to be unique across all models
          doc.add_element field("id", solr_id)
          doc.add_element field("type", self.class.name)
          doc.add_element field("pk", self.id.to_s)
          # iterate through the fields and add them to the document
          self.attributes.each_pair do |key,value|
            # _t is appended as a dynamic "text" field for Solr
            doc.add_element field("#{key}_t", value.to_s) unless key.to_s == "id"
          end
          return doc
        end

        def field(name, value)
          field = REXML::Element.new("field")
          field.add_attribute("name", name)
          field.add_text(value)
          field
        end
      end
    end
  end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include SolrMixin::Acts::ARSolr
end
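
For anyone who wants to see what find_by_solr actually sends without starting Rails or Solr, the request body can be sketched in isolation. This is a rough illustration only: solr_select_body is a hypothetical helper name, and the wt=ruby and fl=pk parameters are simply copied from the code above.

```ruby
require 'erb'

# Hypothetical helper (not part of the plugin): wrap the user's query,
# restrict it to one model via the type field, and form-encode it the
# same way find_by_solr does.
def solr_select_body(query, model_name)
  scoped = "(#{query}) AND type:#{model_name}"
  "q=#{ERB::Util.url_encode(scoped)}&wt=ruby&fl=pk"
end

puts solr_select_body('yonik OR otis', 'Book')
```

The parentheses around the user's query matter: without them, a query such as "yonik OR otis" would combine with the type clause according to the query parser's operator precedence rather than being restricted to the model as a whole.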
|
|
| acts_as_solr |

|
2006-08-29 20:12:13 |
: I've spent a few hours tinkering with a Ruby ActiveRecord plugin to
: index, delete, and search models fronted by a database into Solr.

I don't know crap about Ruby, but that looks pretty cool.

: I'd like to commit this to the Solr repository. Any objections?
: commit, what directory should I put things under? How about just
: "ruby"?

no objections .. as for where, my gut says somewhere under src/ (i.e.
src/ruby), but the current src/ tree is very focused on the server itself
-- src/java, src/scripts, and src/webapp all being completely server
specific, src/apps and src/test being server specific in nature since they
focus on src/java.

perhaps a top level "clients" directory with this going in clients/ruby ?
-Hoss
|
|
| acts_as_solr |

|
2006-08-29 20:16:31 |
On Aug 29, 2006, at 4:12 PM, Chris Hostetter wrote:
> perhaps a top level "clients" directory with this going in clients/ruby ?

Pardon me for chiming in, but this is a very good idea. I would
suggest that Java clients should also go in here.
phil.
--
Whirlycott
Philip Jacob
phil whirlycott.com
http://www.whirlycott.com/phil/
|
|
| acts_as_solr |

|
2006-08-29 20:22:22 |
On 8/29/06, WHIRLYCOTT <phil whirlycott.com> wrote:
> On Aug 29, 2006, at 4:12 PM, Chris Hostetter wrote:
>
> > perhaps a top level "clients" directory with this going in clients/ruby ?
>
> Pardon me for chiming in, but this is a very good idea. I would also
> suggest that Java clients should also go in here.

Might this fit better under a contrib/ umbrella? This would more
closely model Lucene's layout.
-Mike
|
|
| acts_as_solr |

|
2006-08-29 20:41:38 |
: > > perhaps a top level "clients" directory with this going in clients/ruby ?
: > Pardon me for chiming in, but this is a very good idea. I would also
: > suggest that Java clients should also go in here.
: Might this fit better under a contrib/ umbrella? This would more
: closely model lucene's layout.

Maybe ... "contrib" in the "Java Lucene" project sense however is all java
code, i would imagine that if someone wrote a perl utility to deal with
index files it would not make sense to put it in the Lucene "contrib"
directory for that reason ... there may be Java code submitted down the
road that we think is useful enough to make available in releases, but
niche enough that we don't want to put it in the main solr.war, which might
be more along the lines of the "contrib" notion -- hence my suggestion of
"clients" ...

...but i'm just thinking out loud at this point, i don't have a strong
opinion either way.
-Hoss
|
|
| acts_as_solr |

|
2006-08-29 21:25:34 |
On 8/29/06, Chris Hostetter <hossman_lucene fucit.org> wrote:
> Maybe ... "contrib" in the "Java Lucene" project sense however is all java
> code, i would imagine that if someone wrote a perl utility to deal with
> index files it would not make sense to put it in the Lucene "contrib"
> directory for that reason ... there may be Java code submitted down the
> road that we think is useful enough to make available in releases, but
> niche enough that we don't want to put it in the main solr.war, which might
> be more along the lines of the "contrib" notion -- hence my suggestion of
> "clients" ...
>
> ...but i'm just thinking out loud at this point, i don't have a strong
> opinion either way.

Your point definitely resonates. Clients are also a rather important
type of third-party contribution for a webapp, so a top-level
directory makes sense.
-Mike
|
|
| acts_as_solr |

|
2006-08-29 21:33:39 |
Let's create it as a top-level directory solely because it might give
people a small head-start in SOLR evaluation and getting things off the
ground (less navigation around the tree to get started). If there are any
problems, we can always revert back to /contrib/clients.
B
-----Original Message-----
From: Mike Klaas [mailto:mike.klaas gmail.com]
Sent: Tuesday, August 29, 2006 3:26 PM
To: solr-user lucene.apache.org
Subject: Re: acts_as_solr

On 8/29/06, Chris Hostetter <hossman_lucene fucit.org> wrote:
> Maybe ... "contrib" in the "Java Lucene" project sense however is all java
> code, i would imagine that if someone wrote a perl utility to deal with
> index files it would not make sense to put it in the Lucene "contrib"
> directory for that reason ... there may be Java code submitted down the
> road that we think is useful enough to make available in releases, but
> niche enough that we don't want to put it in the main solr.war, which might
> be more along the lines of the "contrib" notion -- hence my suggestion of
> "clients" ...
>
> ...but i'm just thinking out loud at this point, i don't have a strong
> opinion either way.

Your point definitely resonates. Clients are also a rather important
type of third-party contribution for a webapp, so a top-level
directory makes sense.
-Mike
|
|
| acts_as_solr |

|
2006-08-30 13:18:22 |
Currently src is all server specific and I would rather see it kept that
way.
I am OK with either /client or /contrib.
Bill
On 8/29/06, Brian Lucas <blucasco gmail.com> wrote:
>
> Let's create it as a top-level directory solely because it might give
> people a small head-start in SOLR evaluation and getting things off the
> ground (less navigation around the tree to get started). If there are any
> problems, we can always revert back to /contrib/clients.
>
> B
|
|
| acts_as_solr |

|
2006-08-30 17:36:12 |
Cool stuff Erik!
I think Ruby is very fertile ground for Solr to pick up
users/developers right now.

Getting into some little details, it looks like a commit (which
actually does an optimize) is done on every .save, right?
I also notice that the commit is asynchronous... so one could do a
save, then do an immediate search and not see the changes yet, right?
I don't know anything about RoR and ActiveRecord, but hopefully there
is some way to avoid a commit on every operation.
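
One possible shape for avoiding a commit on every operation, offered only as a sketch and not as anything the plugin does today: send each update right away, but issue a single <commit/> once per batch. Everything here is hypothetical, including the BatchedSolrUpdater name and the batch size; the block passed to the constructor stands in for post_to_solr(body, :update) so the idea can be tried without a running Solr.

```ruby
# Hypothetical sketch: post updates immediately, commit once per batch.
# The block given to new stands in for post_to_solr(body, :update).
class BatchedSolrUpdater
  def initialize(batch_size, &poster)
    @batch_size = batch_size
    @poster = poster
    @pending = 0 # updates sent since the last commit
  end

  # Send one <add> or <delete> message; commit when the batch is full.
  def update(xml)
    @poster.call(xml)
    @pending += 1
    commit if @pending >= @batch_size
  end

  # Send a single <commit/> if anything is waiting to be committed.
  def commit
    return if @pending.zero?
    @poster.call('<commit/>')
    @pending = 0
  end
end

# Usage with a fake poster that just records what would be sent.
sent = []
updater = BatchedSolrUpdater.new(2) { |body| sent << body }
updater.update('<add><doc>first</doc></add>')
updater.update('<add><doc>second</doc></add>')
updater.update('<add><doc>third</doc></add>')
updater.commit # flush the partial batch
puts sent
```

The trade-off is visibility: documents posted since the last commit will not show up in searches until the batch fills or commit is called explicitly.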
> I'd like to commit this to the Solr repository.
+1
Let's go with clients/ruby
-Yonik
|
|
| acts_as_solr |

|
2006-08-30 19:41:58 |
You might want to look at acts_as_searchable for Ruby:
http://rubyforge.org/projects/ar-searchable
That's a similar plugin for the Hyperestraier search engine using its
REST interface.
On 8/28/06, Erik Hatcher <erik ehatchersolutions.com> wrote:
> I've spent a few hours tinkering with an Ruby ActiveRecord plugin to
> index, delete, and search models fronted by a database into Solr.
|