
Thread: New AllegroGraph adapter (development)




New AllegroGraph adapter (development)
2007-10-30 08:25:59
Good morning! Thank you for developing such a useful library.

Franz, Inc. is generously sponsoring the development of an ActiveRDF
adapter for AllegroGraph. Currently, it's still fairly primitive, and
it's not yet fast enough for production work.

You can access the source tree through RubyForge:

  http://rubyforge.org/scm/?group_id=4460

Be warned: The tree is in a state of considerable flux, and
more-or-less everything is subject to change.

Right now, this adapter works as a read/write SPARQL-over-HTTP
adapter. (We'd be happy to merge the useful bits of this code back
into activerdf_sparql.) In the future, we're looking at various ways
to speed up the adapter considerably, and we may change the underlying
protocol to something with a quicker round-trip time.

* Why it's slow

We've been wrestling with a few design issues, and we're not quite
sure what we should do next. Our biggest problem: our queries are too
fine-grained for optimum performance.

Imagine that we have an RDF database with the following schema:

  foaf:Organization
    foaf:name
    foaf:member

  foaf:Person
    foaf:title
    foaf:name
    foaf:nick

Now imagine that we write the following Ruby code:

  FOAF::Organization.find_all.each do |org|
    puts "Members of #{org.name}:"
    org.all_member.each do |person|
    puts "  #{person.title} #{person.name} (#{person.nick})"
    end
  end

When we run this code, we make separate SPARQL queries for each call
to "org.name", "person.title", "person.name" and "person.nick". But
instead of going to an in-process database, each of these queries
requires an HTTP request to a separate process. This costs us at least
5-25 ms per query, depending on OS and processor speed.
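To put those numbers in perspective, here is a rough Ruby sketch that counts the round trips the loop above would make. It assumes one query for find_all, two per organization (org.name and org.all_member) and three per member; the organization and member counts are invented example inputs, not measurements.

```ruby
# Back-of-the-envelope round-trip count for the nested loop above.
# Assumes one SPARQL query per attribute access; orgs and members_per_org
# are made-up example inputs, not measurements.
def fine_grained_queries(orgs, members_per_org)
  find_all   = 1   # FOAF::Organization.find_all
  per_org    = 2   # org.name + org.all_member
  per_person = 3   # person.title + person.name + person.nick
  find_all + orgs * (per_org + members_per_org * per_person)
end

# Same loop if each organization's inner loop becomes a single query:
# find_all, then org.name plus one combined member query per organization.
def coarse_grained_queries(orgs)
  1 + orgs * 2
end

# Bounds from the 5-25 ms per round trip quoted above.
def latency_range_ms(queries)
  [queries * 5, queries * 25]
end

fine   = fine_grained_queries(10, 20)   # 1 + 10 * (2 + 20 * 3) = 621
coarse = coarse_grained_queries(10)     # 1 + 10 * 2 = 21
puts "fine-grained:   #{fine} queries, #{latency_range_ms(fine).join('-')} ms"
puts "coarse-grained: #{coarse} queries, #{latency_range_ms(coarse).join('-')} ms"
```

With these invented numbers the fine-grained loop spends roughly 3 to 16 seconds on round-trip latency alone, versus about 0.1 to 0.5 seconds for the coarse-grained version.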

We'd like to speed this up considerably, but we're not sure where to
begin. Larger-granularity queries might help a lot, but we're not sure
how to achieve that with ActiveRDF. For example, assuming we have
enough knowledge of the schema, it might be possible to reduce that
entire inner loop to a single query:

  SELECT ?s ?title ?name ?nick WHERE {
    <http://.../OrgName> foaf:member ?s .
    ?s rdf:type foaf:Person .
    OPTIONAL { ?s foaf:title ?title } .
    OPTIONAL { ?s foaf:name ?name } .
    OPTIONAL { ?s foaf:nick ?nick }
  }
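A minimal Ruby sketch of how an adapter might generate that query for an arbitrary list of optional properties. The helper member_query and the URI http://example.org/OrgName are hypothetical stand-ins (the real organization URI is elided above), and the foaf: and rdf: prefixes are assumed to be declared elsewhere.

```ruby
# Hypothetical helper (not part of ActiveRDF): build one SPARQL query that
# fetches every member of an organization together with a list of optional
# properties. Assumes the foaf: and rdf: prefixes are declared elsewhere.
def member_query(org_uri, type, properties)
  # "foaf:title" -> "?title"
  vars = properties.map { |prop| "?#{prop.split(':').last}" }
  optionals = properties.zip(vars).map { |prop, var| "  OPTIONAL { ?s #{prop} #{var} }" }

  [
    "SELECT ?s #{vars.join(' ')} WHERE {",
    "  <#{org_uri}> foaf:member ?s .",
    "  ?s rdf:type #{type} .",
    optionals.join(" .\n"),
    "}"
  ].join("\n")
end

puts member_query("http://example.org/OrgName", "foaf:Person",
                  %w[foaf:title foaf:name foaf:nick])
```

Deriving variable names from the local part of each property is a simplification; two properties with the same local name (say dc:title and foaf:title) would collide.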

Is ActiveRDF an appropriate library if we want to make such
large-granularity queries? Obviously, this is pretty far outside the
mission of the existing rdflite, redland and sesame2 adapters.

Thank you for any information you can provide, and thank you for such
an excellent RDF library!

Cheers,
Eric
_______________________________________________
ActiveRDF mailing list
ActiveRDF@lists.deri.org
http://lists.deri.org/mailman/listinfo/activerdf

Re: large-granularity queries
2007-10-30 09:09:42
Eric Kidd wrote:
> When we run this code, we make separate SPARQL queries for each call
> to "org.name", "person.title", "person.name" and "person.nick". But
> instead of going to an in-process database, these queries each require
> an HTTP request to a separate process. This costs us at least 5-25ms
> per query, depending on OS and processor speed.
>
> We'd like to speed this up considerably, but we're not sure where to
> begin. We might be able to improve performance considerably using
> larger-granularity queries, but we're not sure how to achieve that
> with ActiveRDF. For example, it might be possible to reduce that
> entire inner loop to a single query, assuming we have enough knowledge
> of the schema:
>
>   SELECT ?s ?title ?name ?nick WHERE {
>     <http://.../OrgName> foaf:member ?s .
>     ?s rdf:type foaf:Person .
>     OPTIONAL { ?s foaf:title ?title } .
>     OPTIONAL { ?s foaf:name ?name } .
>     OPTIONAL { ?s foaf:nick ?nick }
>   }
>
> Is ActiveRDF an appropriate library if we want to make such
> large-granularity queries? Obviously, this is pretty far outside the
> mission of the existing rdflite, redland and sesame2 adapters.

I think it is possible, by adding an alternative virtual object to
ActiveRDF. Instead of only having an object that corresponds to an
RDF:Resource (or in addition to having that object), we should have a
class RDF:Resources, which corresponds to the results of a query. The
advantage of this is that we delay actually firing off a query until an
'each' operator is called in Ruby; until then, the predicates would be
virtual methods of the RDF:Resources object, and we could add to the
query without actually submitting it to the server. The trick would be
an 'each' implementation clever enough to combine the query implicit in
its block with the query-so-far of its parent object, for some
reasonable set of blocks.
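One way to picture the deferred-query idea is the following Ruby sketch. Everything in it is hypothetical: the class LazyResources, the with method (standing in for the virtual predicate methods), and the executor interface are assumptions, not ActiveRDF code. Predicate calls only extend the pending query; a single combined query is built and sent when each runs.

```ruby
# Hypothetical sketch of a deferred "RDF:Resources" collection: calls that add
# predicates only extend the pending query; nothing is sent to the store until
# #each runs, and then exactly one combined query is issued.
class LazyResources
  include Enumerable

  # base_pattern: the query-so-far, e.g. "?s rdf:type foaf:Person"
  # executor:     any callable that takes a SPARQL string and returns an
  #               array of row hashes (an assumed interface)
  def initialize(base_pattern, executor)
    @base_pattern = base_pattern
    @executor = executor
    @properties = []
  end

  # Record predicates and return self so calls can be chained.
  def with(*properties)
    @properties |= properties
    self
  end

  def to_sparql
    vars = @properties.map { |prop| "?#{prop.split(':').last}" }
    optionals = @properties.zip(vars).map { |prop, var| "OPTIONAL { ?s #{prop} #{var} }" }
    "SELECT ?s #{vars.join(' ')} WHERE { #{([@base_pattern] + optionals).join(' . ')} }"
  end

  # The only place a round trip happens.
  def each(&block)
    @executor.call(to_sparql).each(&block)
  end
end

sent = []
fake_store = lambda do |sparql|
  sent << sparql
  [{ "s" => "ex:alice", "name" => "Alice" }, { "s" => "ex:bob", "name" => "Bob" }]
end

people = LazyResources.new("?s rdf:type foaf:Person", fake_store).with("foaf:name", "foaf:nick")
names = people.map { |row| row["name"] }   # one query for the whole loop
```

A fuller version would probably use method_missing so that people.name records foaf:name, and would still need the harder part described above: merging the query implied by a nested loop body into its parent's query.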

A second advantage of this approach is that it could handle blank
nodes, and it could handle clusters of literals in alternate languages.

Of course, I have not implemented anything yet...

Benno

Re: large-granularity queries
2007-10-30 12:41:40
On 30.10.2007, at 15:09, Benno Blumenthal wrote:

> I think it is possible, by adding an alternative virtual object to
> activeRDF. Instead of only having an object that corresponds to an
> RDF:Resource, (or in addition to having that object), we should
> have a class RDF:Resources, which corresponds to the results of a
> query. The advantage of this is that we delay actually firing off
> a query until an 'each' operator is called in Ruby -- until then
> the predicates would be virtual methods of the RDF:Resources
> object, and we could add to the query without actually submitting
> to the server. The trick would be an 'each' implementation that
> would be sufficiently clever to combine the query implicit in its
> subroutine with the query-so-far of its parent object for some
> reasonable set of subroutines.

This sounds like an interesting idea.

There definitely needs to be some sort of optimisation for SPARQL
query speed.

So it is a good thing to have a lively discussion about this, and
probably more than one implementation, for some good old Darwinian
competition.

I have actually often used a fairly low-level variation of the
RDF:Resources idea.

When doing expensive queries with large results, I just made a query
by hand, like this:

  Query.new.distinct(:s1, :p1, :p2).
    where(:s1, someproperty, "some sort of ID").
    where(:s1, prop1, :p1).
    where(:s1, prop2, :p2).
    execute

Then I could get all the necessary data from the big result array.
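For reference, a small Ruby sketch of the "get everything from the big result array" step. It assumes, without checking ActiveRDF's documentation, that execute returns one array per row with the values in the order of the distinct variables; index_by_subject is a hypothetical helper.

```ruby
# Hypothetical post-processing for a hand-built compound query such as
#   Query.new.distinct(:s1, :p1, :p2).where(...).execute
# assuming each result row is an array [s1, p1, p2] in select order.
def index_by_subject(rows, *names)
  rows.each_with_object({}) do |(subject, *values), index|
    # Later rows for the same subject overwrite earlier ones.
    index[subject] = names.zip(values).to_h
  end
end

rows = [
  ["ex:alice", "Dr.", "Alice"],
  ["ex:bob",   nil,   "Bob"]
]
people = index_by_subject(rows, :title, :name)
puts people["ex:alice"][:name]
```

Multi-valued properties come back as several rows for the same subject; this sketch keeps only the last one, so a real version would collect arrays instead.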

We could support this kind of idiom more elegantly and transparently
with some sort of higher-granularity SPARQL query.


Speaking of not-yet-released features:

I have experimented with a very primitive SPARQL query result cache.
This gave very good results, but I only used it with static SPARQL
stores. It would of course be much more complex for dynamic stores.


> Second advantage of the approach would be that it could handle
> blank nodes, and it could handle clusters of literals in alternate
> languages.
>
> Of course, I have not implemented anything yet...

The current trunk of activerdf supports blank nodes in SPARQL queries.

Re: New AllegroGraph adapter (development)
2007-10-30 12:48:23
On 30.10.2007, at 14:25, Eric Kidd wrote:

> Good morning! Thank you for developing such a useful library.

Thanks, no problem :P

> We'd like to speed this up considerably, but we're not sure where to
> begin.

It is always good to see new ideas in this area. Speed is currently the
most important issue for future ActiveRDF development, IMHO.

> Is ActiveRDF an appropriate library if we want to make such
> large-granularity queries? Obviously, this is pretty far outside the
> mission of the existing rdflite, redland and sesame2 adapters.


I will have to think about it some more, and Eyal should also take a
look at the issue.

But it would probably be better to implement this kind of thing with a
Domain Specific Language (DSL) on top of ActiveRDF.

I hear there is some sort of plugin for Rails which can be used to
emulate a data source for Rails, with all the bells and whistles which
Rails can then use.

There is also the possibility of somehow incorporating ActiveResource,
which can be used to access resources over REST.

I am currently wrapping up my master's thesis about design patterns for
the semantic web, and I think I can spend more time on this issue after
that.

Re: New AllegroGraph adapter (development)
2007-10-30 12:56:20
Benjamin --

Be sure to send it out to this list when you're done with your thesis. I bet a bunch of people on here would enjoy reading what you have to say about the topic.

Eric --

Regarding your query:

 SELECT ?s ?title ?name ?nick WHERE {
   <http://.../OrgName> foaf:member ?s .
   ?s rdf:type foaf:Person .
   OPTIONAL { ?s foaf:title ?title } .
   OPTIONAL { ?s foaf:name ?name } .
   OPTIONAL { ?s foaf:nick ?nick }
 }
 

Wouldn't it be possible to implement some caching layer that simply performs

SELECT ?p ?o WHERE { <your_subject> ?p ?o }

when you request any particular property for the first time?
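A rough Ruby sketch of such a per-subject cache, for illustration only: SubjectCache, the executor interface, and the example URI are hypothetical, and the stand-in store just extracts the subject from the query text.

```ruby
# Hypothetical per-subject cache: the first access to any property of a
# subject fetches all of its (predicate, object) pairs in one query,
#   SELECT ?p ?o WHERE { <subject> ?p ?o }
# and later accesses to that subject are answered from memory.
class SubjectCache
  attr_reader :round_trips

  # executor: callable taking a SPARQL string and returning [[p, o], ...]
  def initialize(executor)
    @executor = executor
    @by_subject = {}
    @round_trips = 0
  end

  # Returns every object for the predicate (RDF properties may be multi-valued).
  def get(subject, predicate)
    properties = (@by_subject[subject] ||= load_subject(subject))
    properties.fetch(predicate, [])
  end

  private

  def load_subject(subject)
    @round_trips += 1
    rows = @executor.call("SELECT ?p ?o WHERE { <#{subject}> ?p ?o }")
    rows.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
  end
end

alice = "http://example.org/alice"
store = { alice => [["foaf:name", "Alice"], ["foaf:nick", "al"]] }
cache = SubjectCache.new(->(sparql) { store.fetch(sparql[/<([^>]*)>/, 1], []) })
cache.get(alice, "foaf:name")   # first access: one round trip
cache.get(alice, "foaf:nick")   # answered from the cache
```

The trade-off is that the first access pulls every property of the subject, including large multi-valued ones.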

You might consider implementing the configuration of such a caching system in the same way the ActiveRecord associations work, so that you would end up with something like this (note: I realize this isn't how ActiveRDF class objects are currently declared):

class FOAF::Person < ActiveRDF::Base
  belongs_to FOAF::Organization, :include => true
end

That would tell the cache to eager-load the organization's properties, i.e. SELECT ?org ?p ?o WHERE { <user> foaf:organization ?org . ?org ?p ?o }.


 
-ted

--
Edward Benson
http://www.edwardbenson.com/
Re: large-granularity queries
2007-10-30 13:06:31
Another option if you want to do raw SPARQL queries is to store them in RHTML templates (a bit hackish, but it works) and then render them to a string.

I've found that works really well when you have some parameters being passed in by the user that will end up affecting the SPARQL query; you can just write your query as an ERb template, use the controller to set up the variables that fill in the template, and then do something like the following:

    query = render_to_string :partial => 'queries/annotation_query', :locals => local_hash, :template => false
    results = SPARQL.execute_sparql_query(query)

Where views/queries/annotation_query looks something like:

<%= render :partial => 'namespaces' %>
SELECT DISTINCT
?Annotation ?Entity ?EntityLabel ?Poly ?Point
WHERE {
# snip
# ....
# Things like this:
# -----------------------------------------------------------------------
# FILTER - on class [optional]
# -----------------------------------------------------------------------
<% if ( (defined? classes) && (classes != nil))
     classes.each { |klass|
%>
  ?Entity rdf:type <<%= klass %>>
<% }
   end %>

#snip
.....

You have to use the result set manually, but it allows you to issue some pretty complex queries without a lot of work.

While the ActiveRecord-like object mapping is a great part of ActiveRDF, complex SPARQL queries are sometimes better maintained in plain-old SPARQL template files.
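Outside Rails, the same template idea can be sketched with Ruby's standard-library ERB. The template, the classes and limit variables, and render_query are illustrative assumptions; this is not Ted's actual partial.

```ruby
require "erb"

# Plain-Ruby version of the "SPARQL as a template" idea. trim_mode "-" lets
# <%- ... -%> lines vanish from the output instead of leaving blank lines.
TEMPLATE = <<~'ERB'
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  SELECT DISTINCT ?Entity WHERE {
  <%- classes.each do |klass| -%>
    ?Entity rdf:type <<%= klass %>> .
  <%- end -%>
  }
  LIMIT <%= limit %>
ERB

def render_query(classes:, limit:)
  ERB.new(TEMPLATE, trim_mode: "-").result_with_hash(classes: classes, limit: limit)
end

puts render_query(classes: ["http://xmlns.com/foaf/0.1/Person"], limit: 10)
```

Values are interpolated verbatim, so anything user-supplied should be validated as an IRI before it reaches the template, or the query is open to SPARQL injection.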
-ted

--
Edward Benson
http://www.edwardbenson.com/
Re: large-granularity queries
2007-11-01 07:21:35
On 10/30/07 10:09 -0400, Benno Blumenthal wrote:
>> Is ActiveRDF an appropriate library if we want to make such
>> large-granularity queries? Obviously, this is pretty far outside the
>> mission of the existing rdflite, redland and sesame2 adapters.
> I think it is possible, by adding an alternative virtual object to
> activeRDF. Instead of only having an object that corresponds to an
> RDF:Resource, (or in addition to having that object), we should have a
> class RDF:Resources, which corresponds to the results of a query. The
> advantage of this is that we delay actually firing off a query until an
> 'each' operator is called in Ruby -- until then the predicates would be
> virtual methods of the RDF:Resources object, and we could add to the
> query without actually submitting to the server. The trick would be an
> 'each' implementation that would be sufficiently clever to combine the
> query implicit in its subroutine with the query-so-far of its parent
> object for some reasonable set of subroutines.

Agreed, that sounds like a good approach. I have to let it sink in and
think about it for a while, but I do feel that it's appropriate and
necessary. In cases where speed is important, I now often resort to the
lower-level Query object, Query.new.distinct(:s).where(....), as
described by Benjamin, since it allows such compound queries without
multiple round-trips, but it would of course be much nicer to hide this
under the hood.

So... I'll think about it. As always, any suggestions, half-baked
implementations or patches would speed up my thinking considerably!

  -eyal

Re: New AllegroGraph adapter (development)
2007-11-01 07:27:13
On 10/30/07 09:25 -0400, Eric Kidd wrote:
> Franz, Inc. is generously sponsoring the development of an ActiveRDF
> adapter for AllegroGraph. Currently, it's still fairly primitive, and
> it's not yet fast enough for production work.
>
> Right now, this adapter works as a read/write SPARQL-over-HTTP
> adapter. (We'd be happy to merge the useful bits of this code back
> into activerdf_sparql.) In the future, we're looking at various ways
> to speed up the adapter considerably, and we may change the underlying
> protocol to something with a quicker round-trip time.

Aside from discussing the speed-up of querying in ActiveRDF, I just
wanted to say that I consider the development of this AllegroGraph
adapter great news, that I'm very glad to see it being open source, and
that I'm grateful to Franz Inc. for sponsoring your work!

Also, I saw your comment in the test code regarding the way ActiveRDF
supports setting attribute values (smersh.foaf::member = tatiana).
I'm not happy with this either. Suggestions for improvement, anybody?

  -eyal

Re: New AllegroGraph adapter (development)
2007-11-01 08:40:35
On 10/30/07, Edward Benson <edward.bensongmail.com> wrote:
> Regarding your query:
>
>  SELECT ?s ?title ?name ?nick WHERE {
>    <http://.../OrgName> foaf:member ?s .
>    ?s rdf:type foaf:Person .
>    OPTIONAL { ?s foaf:title ?title } .
>    OPTIONAL { ?s foaf:name ?name } .
>    OPTIONAL { ?s foaf:nick ?nick }
>  }
>
> Wouldn't it be possible to implement some caching layer that simply performs
>
> SELECT ?p ?o WHERE { <your_subject> ?p ?o }
>
> when you request any particular property for the first time?

That's a good question. Let me try to explain what I was thinking.

Imagine the following (contrived) schema:

 geo:Country
   geo:name rdfs:Literal
   geo:gdp rdfs:Literal
   geo:citizen foaf:Person

...and the following loop:

 GEO::Country.find_all.each do |country|
   puts "#{country.name} #{country.gdp}"
 end

In this rather silly case, querying up front for geo:name and geo:gdp
saves us 2*(number of countries) queries as we iterate through the
loop.

But querying for geo:citizen is not such a good idea (a country can
have a huge number of citizens), unless the programmer *really,
really* needs it. :-(
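A small Ruby sketch of that trade-off as a prefetch policy: eagerly load only literal-valued predicates and leave resource-valued ones such as geo:citizen to be fetched on demand. The SCHEMA hash, both helpers, and the figure of 200 countries are hypothetical, and the policy itself is an assumption, not something ActiveRDF does.

```ruby
# Hypothetical prefetch policy: eagerly load only literal-valued predicates
# and leave resource-valued ones (like geo:citizen) to be fetched on demand.
SCHEMA = {
  "geo:name"    => "rdfs:Literal",
  "geo:gdp"     => "rdfs:Literal",
  "geo:citizen" => "foaf:Person"
}.freeze

def prefetchable(schema)
  schema.select { |_predicate, range| range == "rdfs:Literal" }.keys
end

# Queries avoided when every prefetched predicate would otherwise cost
# one round trip per country inside the loop.
def queries_saved(schema, countries)
  prefetchable(schema).size * countries
end

puts prefetchable(SCHEMA).inspect
puts queries_saved(SCHEMA, 200)   # 2 * 200 = 400
```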

> You might consider implementing the configuration of such a caching system
> in the same way the ActiveRecord associations work, so that you would end up
> with something like this (note: I realize this isn't how ActiveRDF class
> objects are currently declared):
>
> class FOAF::Person < ActiveRDF::Base
>   belongs_to FOAF::Organization, :include => true
> end

Yeah, this is a nice idea. We could probably fill in most of this data
automatically by querying for:

 ?p rdf:type rdf:Property
 ?p rdfs:domain ?domain
 ?p rdfs:range ?range
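A hypothetical Ruby helper showing how rows from such a schema query (property, domain, range) could be grouped into a per-class map that an adapter might consult when deciding what to eager-load. The row format and the helper name are assumptions.

```ruby
# Hypothetical helper: group rows from a schema query such as
#   SELECT ?p ?domain ?range WHERE {
#     ?p rdf:type rdf:Property .
#     ?p rdfs:domain ?domain .
#     ?p rdfs:range ?range }
# into { class => { property => range } }.
def properties_by_class(rows)
  rows.each_with_object({}) do |(property, domain, range), map|
    (map[domain] ||= {})[property] = range
  end
end

rows = [
  ["geo:name",    "geo:Country", "rdfs:Literal"],
  ["geo:gdp",     "geo:Country", "rdfs:Literal"],
  ["geo:citizen", "geo:Country", "foaf:Person"]
]
schema = properties_by_class(rows)
p schema["geo:Country"]
```

Many published vocabularies leave rdfs:domain or rdfs:range unset, so in practice such a map would often be incomplete.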

Does anybody else have any suggestions?

Thank you for your ideas!

Cheers,
Eric

Re: large-granularity queries
2007-11-01 08:42:39
On 10/30/07, Benno Blumenthal <bennoiri.columbia.edu> wrote:
> The trick would be an
> 'each' implementation that would be sufficiently clever to combine the
> query implicit in its subroutine with the query-so-far of its parent
> object for some reasonable set of subroutines.

Interesting! Let me see if I understand this idea. Suppose we have the
following loop:

 FOAF::Person.find_all.each do |person|
   puts "#{person.title} #{person.name}"
 end

...and on the first pass through the loop, the "person.title" and
"person.name" queries would be run in the ordinary fashion? And then,
once the first iteration of the loop was finished, the system could
re-issue the original query, asking for "title" and "name" as well?

Or did you have something else in mind?
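The reading above can be sketched in Ruby as follows. The names RecordingItem, PrefetchedItem and each_with_learning are hypothetical, and items are read with person['foaf:title'] rather than ActiveRDF's person.title accessors to keep the example self-contained.

```ruby
# Hypothetical "learn on the first pass" iterator: the first item is served by
# ordinary one-query-per-property lookups while we record which properties the
# loop body reads; those properties are then fetched for all remaining items
# in a single batched call.
class RecordingItem
  attr_reader :accessed

  def initialize(subject, lookup)
    @subject = subject
    @lookup = lookup
    @accessed = []
  end

  def [](property)
    @accessed << property unless @accessed.include?(property)
    @lookup.call(@subject, property)
  end
end

class PrefetchedItem
  def initialize(subject, values, lookup)
    @subject = subject
    @values = values
    @lookup = lookup
  end

  # Fall back to a single lookup if the body reads a property it skipped
  # on the first pass.
  def [](property)
    @values.fetch(property) { @lookup.call(@subject, property) }
  end
end

# lookup: (subject, property) -> value, one query per call
# batch:  (subjects, properties) -> { subject => { property => value } }
def each_with_learning(subjects, lookup, batch)
  return if subjects.empty?

  first = RecordingItem.new(subjects.first, lookup)
  yield first
  rest = subjects.drop(1)
  return if rest.empty?

  values = batch.call(rest, first.accessed)
  rest.each { |s| yield PrefetchedItem.new(s, values.fetch(s, {}), lookup) }
end

data = {
  "ex:alice" => { "foaf:title" => "Dr.", "foaf:name" => "Alice" },
  "ex:bob"   => { "foaf:title" => "Mr.", "foaf:name" => "Bob" },
  "ex:carol" => { "foaf:title" => "Ms.", "foaf:name" => "Carol" }
}
log = []
lookup = ->(s, prop) { log << :single; data[s][prop] }
batch = ->(subjects, props) { log << :batch; subjects.to_h { |s| [s, data[s].slice(*props)] } }

lines = []
each_with_learning(data.keys, lookup, batch) do |person|
  lines << "#{person['foaf:title']} #{person['foaf:name']}"
end
```

For three people this costs two single lookups plus one batch; for larger groups the count stays roughly at (properties touched on the first pass) + 1 round trips. The fallback lookup covers loop bodies that read different properties on later iterations.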

Thank you very much for looking at this problem!

Cheers,
Eric