Thread: New AllegroGraph adapter (development)
|
|
| New AllegroGraph adapter (development) |

|
2007-10-30 08:25:59 |
Good morning! Thank you for developing such a useful library.

Franz, Inc. is generously sponsoring the development of an ActiveRDF
adapter for AllegroGraph. Currently, it's still fairly primitive, and
it's not yet fast enough for production work.

You can access the source tree through RubyForge:

  http://rubyforge.org/scm/?group_id=4460

Be warned: The tree is in a state of considerable flux, and
more-or-less everything is subject to change.

Right now, this adapter works as a read/write SPARQL-over-HTTP
adapter. (We'd be happy to merge the useful bits of this code back
into activerdf_sparql.) In the future, we're looking at various ways
to speed up the adapter considerably, and we may change the underlying
protocol to something with a quicker round-trip time.

* Why it's slow

We've been wrestling with a few design issues, and we're not quite
sure about what we should do next. Our biggest problem: Our query
granularity is too small for optimum performance.

Imagine that we have an RDF database with the following schema:

  foaf:Organization
    foaf:name
    foaf:member
  foaf:Person
    foaf:title
    foaf:name
    foaf:nick

Now imagine that we write the following Ruby code:

  FOAF::Organization.find_all.each do |org|
    puts "Members of #{org.name}:"
    org.all_member.each do |person|
      puts "  #{person.title} #{person.name} (#{person.nick})"
    end
  end

When we run this code, we make separate SPARQL queries for each call
to "org.name", "person.title", "person.name" and "person.nick". But
instead of going to an in-process database, these queries each require
an HTTP request to a separate process. This costs us at least 5-25ms
per query, depending on OS and processor speed.
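
To make the cost concrete, here is a rough back-of-the-envelope sketch. The counts (10 organizations, 20 members each) are made up for illustration, and it assumes that "find_all" and each "org.all_member" call also cost one query apiece, which the message above does not state.

```ruby
# Estimate how many queries the nested loop issues and what the HTTP
# round trips alone would cost. All numbers are illustrative assumptions.
def query_count(orgs, members_per_org)
  per_org    = 2   # org.name + org.all_member (assumed one query each)
  per_member = 3   # person.title, person.name, person.nick
  1 + orgs * (per_org + members_per_org * per_member)   # 1 = find_all
end

def overhead_ms(queries, latency_ms)
  queries * latency_ms
end

queries = query_count(10, 20)
puts "#{queries} queries"
puts "#{overhead_ms(queries, 5)}-#{overhead_ms(queries, 25)} ms of round trips"
```

Even for this small data set the loop would spend a few seconds or more on round trips alone, before any query work is counted.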

We'd like to speed this up considerably, but we're not sure where to
begin. We might be able to improve performance considerably using
larger-granularity queries, but we're not sure how to achieve that
with ActiveRDF. For example, it might be possible to reduce that
entire inner loop to a single query, assuming we have enough knowledge
of the schema:

  SELECT ?s ?title ?name ?nick WHERE {
    <http://.../OrgName> foaf:member ?s .
    ?s rdf:type foaf:Person .
    OPTIONAL { ?s foaf:title ?title } .
    OPTIONAL { ?s foaf:name ?name } .
    OPTIONAL { ?s foaf:nick ?nick }
  }
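
A query of this shape could be generated from a list of property names. The following is only a sketch in plain Ruby: "member_query" is a hypothetical helper, not part of ActiveRDF, the organization URI is a placeholder, and it only builds the query string without sending it anywhere.

```ruby
# Build one SELECT that fetches several optional FOAF properties for every
# member of an organization. Hypothetical helper; no server is contacted.
def member_query(org_uri, properties)
  lines = []
  lines << "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>"
  lines << "PREFIX foaf: <http://xmlns.com/foaf/0.1/>"
  lines << "SELECT ?s #{properties.map { |p| "?#{p}" }.join(' ')} WHERE {"
  lines << "  <#{org_uri}> foaf:member ?s ."
  lines << "  ?s rdf:type foaf:Person ."
  # One OPTIONAL per property, so a missing value does not drop the member.
  properties.each { |p| lines << "  OPTIONAL { ?s foaf:#{p} ?#{p} }" }
  lines << "}"
  lines.join("\n")
end

puts member_query("http://example.org/OrgName", %w[title name nick])
```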

Is ActiveRDF an appropriate library if we want to make such
large-granularity queries? Obviously, this is pretty far outside the
mission of the existing rdflite, redland and sesame2 adapters.

Thank you for any information you can provide, and thank you for such
an excellent RDF library!

Cheers,
Eric
_______________________________________________
ActiveRDF mailing list
ActiveRDF lists.deri.org
http://lists.deri.org/mailman/listinfo/activerdf
|
|
| Re: large-granularity queries |
2007-10-30 09:09:42 |
Eric Kidd wrote:
> When we run this code, we make separate SPARQL queries for each call
> to "org.name", "person.title", "person.name" and "person.nick". But
> instead of going to an in-process database, these queries each require
> an HTTP request to a separate process. This costs us at least 5-25ms
> per query, depending on OS and processor speed.
>
> We'd like to speed this up considerably, but we're not sure where to
> begin. We might be able to improve performance considerably using
> larger-granularity queries, but we're not sure how to achieve that
> with ActiveRDF. For example, it might be possible to reduce that
> entire inner loop to a single query, assuming we have enough knowledge
> of the schema:
>
>   SELECT ?s ?title ?name ?nick WHERE {
>     <http://.../OrgName> foaf:member ?s .
>     ?s rdf:type foaf:Person .
>     OPTIONAL { ?s foaf:title ?title } .
>     OPTIONAL { ?s foaf:name ?name } .
>     OPTIONAL { ?s foaf:nick ?nick }
>   }
>
> Is ActiveRDF an appropriate library if we want to make such
> large-granularity queries? Obviously, this is pretty far outside the
> mission of the existing rdflite, redland and sesame2 adapters.

I think it is possible, by adding an alternative virtual object to
activeRDF. Instead of only having an object that corresponds to an
RDF:Resource (or in addition to having that object), we should have a
class RDF:Resources, which corresponds to the results of a query. The
advantage of this is that we delay actually firing off a query until an
'each' operator is called in Ruby -- until then the predicates would be
virtual methods of the RDF:Resources object, and we could add to the
query without actually submitting to the server. The trick would be an
'each' implementation that would be sufficiently clever to combine the
query implicit in its subroutine with the query-so-far of its parent
object for some reasonable set of subroutines.
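
A very reduced sketch of the deferred-query part of this idea might look as follows. "LazyResources", its "with" method and the "runner" callback are all hypothetical names: properties are requested explicitly rather than through virtual methods, and a lambda stands in for the HTTP request. It does not attempt the harder part, which is combining the query implied by the block with the query so far.

```ruby
# A lazy result set: property requests are only recorded, and a single
# query string is built and handed to the runner when `each` is called.
class LazyResources
  include Enumerable

  def initialize(type, runner)
    @type = type
    @properties = []
    @runner = runner   # assumed: callable that takes a query, returns rows
  end

  # Record wanted properties without touching the server.
  def with(*properties)
    @properties.concat(properties)
    self
  end

  def to_sparql
    vars = @properties.map { |p| "?#{p.split(':').last}" }
    optionals = @properties.zip(vars).map { |p, v| "OPTIONAL { ?s #{p} #{v} }" }
    "SELECT ?s #{vars.join(' ')} WHERE { ?s rdf:type #{@type} . #{optionals.join(' ')} }"
  end

  # The only place where a query is actually issued.
  def each(&block)
    @runner.call(to_sparql).each(&block)
  end
end

issued = []
runner = lambda { |query| issued << query; [{ "s" => "ex:alice", "name" => "Alice" }] }
people = LazyResources.new("foaf:Person", runner).with("foaf:title", "foaf:name")
puts issued.size                        # nothing has been sent yet
people.each { |row| puts row["name"] }
puts issued.size                        # exactly one query was issued
```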

A second advantage of the approach would be that it could handle blank
nodes, and it could handle clusters of literals in alternate languages.

Of course, I have not implemented anything yet...

Benno
|
|
| Re: large-granularity queries |

|
2007-10-30 12:41:40 |
On 30.10.2007, at 15:09, Benno Blumenthal wrote:
> Eric Kidd wrote:
>> When we run this code, we make separate SPARQL queries for each call
>> to "org.name", "person.title", "person.name" and "person.nick". But
>> instead of going to an in-process database, these queries each
>> require an HTTP request to a separate process. This costs us at
>> least 5-25ms per query, depending on OS and processor speed.
>>
>> We'd like to speed this up considerably, but we're not sure where to
>> begin. We might be able to improve performance considerably using
>> larger-granularity queries, but we're not sure how to achieve that
>> with ActiveRDF. For example, it might be possible to reduce that
>> entire inner loop to a single query, assuming we have enough
>> knowledge of the schema:
>>
>>   SELECT ?s ?title ?name ?nick WHERE {
>>     <http://.../OrgName> foaf:member ?s .
>>     ?s rdf:type foaf:Person .
>>     OPTIONAL { ?s foaf:title ?title } .
>>     OPTIONAL { ?s foaf:name ?name } .
>>     OPTIONAL { ?s foaf:nick ?nick }
>>   }
>>
>> Is ActiveRDF an appropriate library if we want to make such
>> large-granularity queries? Obviously, this is pretty far outside the
>> mission of the existing rdflite, redland and sesame2 adapters.
>
> I think it is possible, by adding an alternative virtual object to
> activeRDF. Instead of only having an object that corresponds to an
> RDF:Resource, (or in addition to having that object), we should
> have a class RDF:Resources, which corresponds to the results of a
> query. The advantage of this is that we delay actually firing off
> a query until an 'each' operator is called in Ruby -- until then
> the predicates would be virtual methods of the RDF:Resources
> object, and we could add to the query without actually submitting
> to the server. The trick would be an 'each' implementation that
> would be sufficiently clever to combine the query implicit in its
> subroutine with the query-so-far of its parent object for some
> reasonable set of subroutines.

This sounds like an interesting idea.

There definitely needs to be some sort of optimisation for SPARQL
query speed. So it is a good thing to have a lively discussion about
this, and probably more than one implementation for some good old
Darwinian competition.

I actually often used a fairly low-level variation of the
RDF:Resources idea. When doing expensive queries with large results,
I just made a query by hand, like this:

  Query.new.distinct(:s1, :p1, :p2).
    where(:s1, someproperty, "some sort of ID").
    where(:s1, prop1, :p1).
    where(:s1, prop2, :p2).
    execute

Then I could get all the necessary data from the big result array.

We could support this type of idiom more elegantly and transparently
with some sort of higher-granularity SPARQL query.
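
Pulling the data back out of such a flat result could look roughly like this. The sketch assumes the result is an array of [s1, p1, p2] rows in the order of the selected variables, and the rows shown are hard-coded stand-ins, not real query output.

```ruby
# Group flat [subject, p1, p2] rows by subject, collecting distinct values.
# The rows below are made-up sample data standing in for a query result.
rows = [
  ["ex:alice", "Alice", "ally"],
  ["ex:bob",   "Bob",   "bobby"],
  ["ex:alice", "Alice", "al"]
]

def group_by_subject(rows)
  grouped = Hash.new { |hash, subject| hash[subject] = { :p1 => [], :p2 => [] } }
  rows.each do |subject, p1, p2|
    grouped[subject][:p1] << p1 unless grouped[subject][:p1].include?(p1)
    grouped[subject][:p2] << p2 unless grouped[subject][:p2].include?(p2)
  end
  grouped
end

by_subject = group_by_subject(rows)
puts by_subject["ex:alice"][:p1].inspect
puts by_subject["ex:alice"][:p2].inspect
```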

Speaking of not-really-released features: I have experimented with a
very primitive SPARQL query result cache. This had very good results,
but I only used it with static SPARQL stores. It would of course be
much more complex for dynamic stores.

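
The experimental cache itself is not shown in this thread, so the following is only a guess at the simplest form such a cache could take. "QueryCache" is a hypothetical class that memoizes results by query string and never invalidates on its own, which is why it would only suit a static store.

```ruby
# Minimal query-result cache keyed by the query string.
# The executor block stands in for whatever really runs the query.
class QueryCache
  attr_reader :misses

  def initialize(&executor)
    @executor = executor
    @results = {}
    @misses = 0
  end

  def execute(query)
    return @results[query] if @results.key?(query)
    @misses += 1
    @results[query] = @executor.call(query)
  end

  # A dynamic store would need to call this (or something finer-grained)
  # after every write.
  def clear
    @results.clear
  end
end

cache = QueryCache.new { |query| ["result for #{query}"] }
cache.execute("SELECT ?s WHERE { ?s ?p ?o }")
cache.execute("SELECT ?s WHERE { ?s ?p ?o }")
puts cache.misses
```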
> Second advantage of the approach would be that it could handle
> blank nodes, and it could handle clusters of literals in alternate
> languages.
>
> Of course, I have not implemented anything yet...

The current trunk of activerdf supports blank nodes in SPARQL queries.

|
|
| Re: New AllegroGraph adapter (development) |

|
2007-10-30 12:48:23 |
On 30.10.2007, at 14:25, Eric Kidd wrote:
> Good morning! Thank you for developing such a useful library.

Thanks, no problem :P

> We'd like to speed this up considerably, but we're not sure where to
> begin.

It is always good to see new ideas in this area. Speed is currently
the most important issue for future ActiveRDF development IMHO.

> Is ActiveRDF an appropriate library if we want to make such
> large-granularity queries? Obviously, this is pretty far outside the
> mission of the existing rdflite, redland and sesame2 adapters.

I will have to think about it some more, and Eyal should also take a
look at the issue.

But it would probably be better to implement this kind of thing with
a Domain Specific Language (DSL) on top of activerdf.

I hear there is some sort of plugin for Rails which can be used to
emulate a data source for Rails, with all the bells and whistles
which Rails can then use.

There is also the possibility of somehow incorporating
ActiveResource, which can be used to access resources over REST.

I am currently wrapping up my master's thesis about design patterns
for the semantic web, and I think I can spend more time on this issue
after that.

|
|
| Re: New AllegroGraph adapter (development) |

|
2007-10-30 12:56:20 |
|
| Benjamin --
Be sure to send it out to this list when you're done with your thesis. I bet a bunch of people on here would enjoy reading what you have to say about the topic.
Eric --
Regarding your query:

  SELECT ?s ?title ?name ?nick WHERE {
    <http://.../OrgName> foaf:member ?s .
    ?s rdf:type foaf:Person .
    OPTIONAL { ?s foaf:title ?title } .
    OPTIONAL { ?s foaf:name ?name } .
    OPTIONAL { ?s foaf:nick ?nick }
  }

Wouldn't it be possible to implement some caching layer that simply performs

  SELECT ?p ?o WHERE { <your_subject> ?p ?o }

when you request any particular property for the first time?
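
A minimal sketch of such a caching layer, in plain Ruby: "CachedResource" and its "get" method are hypothetical names, and the block passed to the constructor stands in for running the SELECT ?p ?o query (it is assumed to return [predicate, object] pairs).

```ruby
# Fetch every (predicate, object) pair for a subject on first access, then
# answer later property lookups from memory.
class CachedResource
  def initialize(uri, &fetch_all)
    @uri = uri
    @fetch_all = fetch_all   # assumed: returns [[predicate, object], ...]
    @properties = nil
  end

  def get(predicate)
    load_properties if @properties.nil?
    @properties.fetch(predicate, [])
  end

  private

  def load_properties
    @properties = {}
    @fetch_all.call(@uri).each do |predicate, object|
      (@properties[predicate] ||= []) << object
    end
  end
end

calls = 0
person = CachedResource.new("ex:alice") do |uri|
  calls += 1
  [["foaf:name", "Alice"], ["foaf:nick", "ally"]]
end
puts person.get("foaf:name").inspect
puts person.get("foaf:nick").inspect
puts calls   # the fetch ran once for both lookups
```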

You might consider implementing the configuration of such a caching system in the same way the ActiveRecord associations work, so that you would end up with something like this (note: I realize this isn't how ActiveRDF class objects are currently declared):

  class FOAF::Person < ActiveRDF::Base
    belongs_to FOAF::Organization, :include => true
  end

That would tell the cache to eager-load ?org ?p ?o WHERE <user> foaf:organization ?org .

-ted
-- Edward Benson http://www.edwardbenson.com/
|
| Re: large-granularity queries |

|
2007-10-30 13:06:31 |
|
Another option if you want to do raw SPARQL queries is to store them in RHTML templates (a bit hackish, but it works) and then render them to a string.

I've found that works really well when you have some parameters being passed in by the user that will end up affecting the SPARQL query --- you can just write your query as an ERb template, use the controller to set up the variables that fill in the template, and then do something like the following:

  query = render_to_string :partial => 'queries/annotation_query', :locals => local_hash, :template => false
  results = SPARQL.execute_sparql_query(query)

Where views/queries/annotation_query looks something like:

  <%= render :partial => 'namespaces' %>

  SELECT DISTINCT ?Annotation ?Entity ?EntityLabel ?Poly ?Point
  WHERE {
    # snip
    # ....
    # Things like this:

    # -----------------------------------------------------------------------
    # FILTER - on class [optional]
    # -----------------------------------------------------------------------
    <% if ( (defined? classes) && (classes != nil))
         classes.each { |klass| %>
    ?Entity rdf:type <<%= klass %>>
    <%   }
       end %>

    #snip .....

You have to use the result set manually, but it allows you to issue some pretty complex queries without a lot of work.

While the ActiveRecord-like object mapping is a great part of ActiveRDF, sometimes complex SPARQL queries are best maintained as plain old SPARQL template files.
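
The same templating idea can be tried outside of Rails with Ruby's standard ERB library. This is a standalone sketch, not the code from the application described above: the template text, the "render_query" helper and the example class URI are all made up for illustration.

```ruby
require 'erb'

# A SPARQL query kept as an ERB template and rendered to a string.
# In a real application the template would live in its own file.
TEMPLATE = <<'EOT'
SELECT DISTINCT ?Entity ?EntityLabel WHERE {
  ?Entity rdfs:label ?EntityLabel .
<% classes.each do |klass| -%>
  ?Entity rdf:type <<%= klass %>> .
<% end -%>
}
EOT

def render_query(classes)
  # trim_mode '-' lets "-%>" swallow the newline after a control line.
  ERB.new(TEMPLATE, trim_mode: '-').result_with_hash(classes: classes)
end

puts render_query(["http://example.org/Building"])
```

Values that come from user input would need to be validated before they are substituted into the query text.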

-ted
-- Edward Benson http://www.edwardbenson.com/
|
| Re: large-granularity queries |

|
2007-11-01 07:21:35 |
On 10/30/07 10:09 -0400, Benno Blumenthal wrote:
>> Is ActiveRDF an appropriate library if we want to make such
>> large-granularity queries? Obviously, this is pretty far outside the
>> mission of the existing rdflite, redland and sesame2 adapters.
>
> I think it is possible, by adding an alternative virtual object to
> activeRDF. Instead of only having an object that corresponds to an
> RDF:Resource, (or in addition to having that object), we should have a
> class RDF:Resources, which corresponds to the results of a query. The
> advantage of this is that we delay actually firing off a query until an
> 'each' operator is called in Ruby -- until then the predicates would be
> virtual methods of the RDF:Resources object, and we could add to the
> query without actually submitting to the server. The trick would be an
> 'each' implementation that would be sufficiently clever to combine the
> query implicit in its subroutine with the query-so-far of its parent
> object for some reasonable set of subroutines.

Agreed, that sounds like a good way. I have to let it sink in and
think about it for a while, but I do feel that it's appropriate and
necessary. In cases where speed is important, I now often resort to
the lower-level Query object, Query.new.distinct(:s).where(....) as
described by Benjamin, since it allows such compound queries without
multiple round-trips, but it would of course be much nicer to hide
this under the hood.

So... I'll think about it. As always, any suggestions, half-baked
implementations or patches would speed up my thinking considerably!

-eyal
|
|
| Re: New AllegroGraph adapter (development) |

|
2007-11-01 07:27:13 |
On 10/30/07 09:25 -0400, Eric Kidd wrote:
> Franz, Inc. is generously sponsoring the development of an ActiveRDF
> adapter for AllegroGraph. Currently, it's still fairly primitive, and
> it's not yet fast enough for production work.
>
> Right now, this adapter works as a read/write SPARQL-over-HTTP
> adapter. (We'd be happy to merge the useful bits of this code back
> into activerdf_sparql.) In the future, we're looking at various ways
> to speed up the adapter considerably, and we may change the underlying
> protocol to something with a quicker round-trip time.

Aside from discussing the speed-up of querying in ActiveRDF, I just
wanted to say that I consider the development of this AllegroGraph
adapter great news, that I'm very glad to see it being open source,
and that I'm grateful to Franz Inc. for sponsoring your work!

Also, I saw your comment in the test code, regarding the way ActiveRDF
supports the setting of attribute values
(smersh.foaf::member = tatiana). I'm also not happy with this.
Suggestions for improvement, anybody?

-eyal
|
|
| Re: New AllegroGraph adapter (development) |

|
2007-11-01 08:40:35 |
On 10/30/07, Edward Benson <edward.benson gmail.com> wrote:
> Regarding your query:
>
>   SELECT ?s ?title ?name ?nick WHERE {
>     <http://.../OrgName> foaf:member ?s .
>     ?s rdf:type foaf:Person .
>     OPTIONAL { ?s foaf:title ?title } .
>     OPTIONAL { ?s foaf:name ?name } .
>     OPTIONAL { ?s foaf:nick ?nick }
>   }
>
> Wouldn't it be possible to implement some caching layer that simply
> performs
>
>   SELECT ?p ?o WHERE { <your_subject> ?p ?o }
>
> when you request any particular property for the first time?

That's a good question. Let me try to explain what I was thinking.
Imagine the following (contrived) schema:

  geo:Country
    geo:name     rdfs:Literal
    geo:gdp      rdfs:Literal
    geo:citizen  foaf:Person

...and the following loop:

  GEO::Country.find_all.each do |country|
    puts "#{country.name} #{country.gdp}"
  end

In this rather silly case, querying up front for geo:name and geo:gdp
saves us 2*(number of countries) queries as we iterate through the
loop.

But querying for geo:citizen is not such a good idea, unless the
programmer *really, really* needs it. :-(

> You might consider implementing the configuration of such a caching
> system in the same way the ActiveRecord associations work, so that
> you would end up with something like this (note: I realize this isn't
> how ActiveRDF class objects are currently declared):
>
>   class FOAF::Person < ActiveRDF::Base
>     belongs_to FOAF::Organization, :include => true
>   end

Yeah, this is a nice idea. We could probably fill in most of this data
automatically by querying for:

  ?p rdf:type rdf:Property .
  ?p rdfs:domain ?domain .
  ?p rdfs:range ?range
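
Once such schema rows are available, deciding what to fetch up front could be as simple as the sketch below. The rows are hand-written rather than fetched, "eager_properties" is a hypothetical helper, and the rule "eager-load only literal-valued properties" is just one assumption about how to avoid pulling in things like geo:citizen.

```ruby
# Pick the properties worth eager-loading for a class from
# [property, domain, range] rows. The rows are made-up sample data.
SCHEMA = [
  ["geo:name",    "geo:Country", "rdfs:Literal"],
  ["geo:gdp",     "geo:Country", "rdfs:Literal"],
  ["geo:citizen", "geo:Country", "foaf:Person"]
]

def eager_properties(schema, klass)
  # Keep literal-valued properties of the class; skip links to resources.
  literal_rows = schema.select do |_property, domain, range|
    domain == klass && range == "rdfs:Literal"
  end
  literal_rows.map { |row| row.first }
end

puts eager_properties(SCHEMA, "geo:Country").inspect
```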
Does anybody else have any suggestions?
Thank you for your ideas!
Cheers,
Eric
|
|
| Re: large-granularity queries |

|
2007-11-01 08:42:39 |
On 10/30/07, Benno Blumenthal <benno iri.columbia.edu> wrote:
> The trick would be an 'each' implementation that would be
> sufficiently clever to combine the query implicit in its subroutine
> with the query-so-far of its parent object for some reasonable set of
> subroutines.

Interesting! Let me see if I understand this idea. Suppose we have:

  FOAF::Person.find_all.each do |person|
    puts "#{person.title} #{person.name}"
  end

...and on the first pass through the loop, the "person.title" and
"person.name" queries would be run in the ordinary fashion? And then,
once the first iteration of the loop was finished, the system could
re-issue the original query, asking for "title" and "name"?

Or did you have something else in mind?
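
The "watch the first iteration" step of this reading could be sketched as follows. "Recorder" is a hypothetical wrapper around a plain Hash, not an ActiveRDF resource, and the sketch only records which properties the loop body touches; re-issuing the original query with those properties is left out.

```ruby
# Record which properties a loop body reads from its first element, so a
# later bulk query could ask for exactly those properties.
class Recorder
  attr_reader :seen

  def initialize(values)
    @values = values
    @seen = []
  end

  # Treat any known key as a property access and remember it.
  def method_missing(name, *args)
    return super unless @values.key?(name)
    @seen << name unless @seen.include?(name)
    @values[name]
  end

  def respond_to_missing?(name, include_private = false)
    @values.key?(name) || super
  end
end

first = Recorder.new(:title => "Dr.", :name => "Alice", :nick => "ally")
body = lambda { |person| "#{person.title} #{person.name}" }
puts body.call(first)
puts first.seen.inspect   # the properties to request in the bulk query
```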
Thank you very much for looking at this problem!
Cheers,
Eric
|
|
|
|