Thread: Thoughts on database persistence
|
|
| Thoughts on database persistence |

|
2006-03-18 16:11:48 |
On 3/18/06, Jukka Zitting <jukka.zitting gmail.com> wrote:
> Hi,
>
> I just added JNDI/DataSource-based versions of the database
> persistence manager and file system classes as requested in JCR-313.
> The comments on the issue thread got me thinking about the current
> "simple" approach and database configurability in general. The
> database file system classes are pretty much mirrors of the
> persistence manager counterparts, so I'll just focus on the PM classes
> here, the same ideas apply to both situations.
>
> The JCR-313 issue thread focused much on the question of
> implementation "simplicity". After going through the code I think the
> question has more to do with the approach of keeping the database
> connection throughout the PM lifecycle and caching prepared statements
> for performance rather than any inherent simplicity of the
> implementation approach.

'Simple' also refers to use of a very simple data model instead of
a fully normalized schema or some object-relational mapping.

> The main point seems to be that the current
> implementation wants to prepare the used statements once during
> initialization rather than once per method call. This is somewhat in
> conflict with the J2EE best practice of keeping a database connection
> and related resources like prepared statements only for the duration
> of a single operation.

those best practices apply to j2ee applications. the point is that i don't
consider jackrabbit to be a j2ee application, jackrabbit is infrastructure
and has other requirements regarding its persistence layer than a
database application.

>
> Incidentally there happens to be one approach that would keep the
> performance advantages of the current approach, remove the conflict
> with J2EE practices, and even simplify the implementation! Like this:
>
> 1) Change the DatabasePersistenceManager to get a database connection
> and prepare the used statements per each operation to comply with J2EE
> practices.

note that write operations must occur within a single transaction, i.e.
you can't get a new connection for every write operation.

>
> 2) Use the Commons DBCP DriverAdapterCPDS DataSource implementation
> with PreparedStatement pooling in SimpleDbPersistenceManager to keep
> the performance gains.
>
> 3) Remove the now unneeded Connection and PreparedStatement members
> and resetStatement() method from the DatabasePersistenceManager class
> to simplify the implementation.
>
> The cost of this change would be a bit of pooling overhead per each
> persistence manager operation (should be insignificant compared to the
> cost of the database operations) and the introduction of commons-dbcp
> and commons-pool as dependencies.
>
> This change would also clarify that the responsibility of any extra
> database shutdown operations like in the current
> DerbyPersistenceManager rests on the subclass as the
> DatabasePersistenceManager class would no longer keep any stable
> reference to the underlying database.
>
> What do you think? I can take a shot at implementing this if you think
> it's worth doing.

-1 for changing SimpleDbPersistenceManager as suggested right now

on the other hand i have no problem with adding a new more sophisticated
db pm as suggested.

time and experience will tell if we want to keep them both or not.

cheers
stefan
>
> BR,
>
> Jukka Zitting
>
> --
> Yukatan - http://yukatan.fi/ - info yukatan.fi
> Software craftsmanship, JCR consulting, and Java development
>
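
Editor's sketch of the two resource-handling styles contrasted in this message: keeping one connection and prepared statement for the whole PM lifecycle, versus obtaining them per operation. This is not Jackrabbit code. The class names, the SQL string, and the table it mentions are hypothetical, and the Proxy-based stubs exist only so the example runs without a JDBC driver; they count how often a statement is prepared.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import javax.sql.DataSource;

/** Style A: one connection and one prepared statement kept for the whole lifecycle. */
final class LongLivedLoader implements AutoCloseable {
    private final Connection con;
    private final PreparedStatement select;

    LongLivedLoader(DataSource ds, String sql) throws SQLException {
        con = ds.getConnection();
        select = con.prepareStatement(sql); // prepared once, at initialization
    }

    byte[] load(String id) throws SQLException {
        select.clearParameters();
        select.setString(1, id);
        try (ResultSet rs = select.executeQuery()) {
            return rs.next() ? rs.getBytes(1) : null;
        }
    }

    @Override
    public void close() throws SQLException {
        select.close();
        con.close();
    }
}

/** Style B: connection and statement are held only for a single operation. */
final class PerOperationLoader {
    private final DataSource ds;
    private final String sql;

    PerOperationLoader(DataSource ds, String sql) {
        this.ds = ds;
        this.sql = sql;
    }

    byte[] load(String id) throws SQLException {
        try (Connection con = ds.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) { // prepared on every call
            ps.setString(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getBytes(1) : null;
            }
        }
    }
}

/** Driver-free stubs (illustration only): every query returns an empty result. */
final class CountingStubs {
    static <T> T stub(Class<T> type, Function<String, Object> byMethodName) {
        return type.cast(Proxy.newProxyInstance(
                CountingStubs.class.getClassLoader(),
                new Class<?>[] {type},
                (proxy, method, args) -> byMethodName.apply(method.getName())));
    }

    /** Returns a DataSource whose connection counts prepareStatement calls. */
    static DataSource dataSource(AtomicInteger prepares) {
        ResultSet empty = stub(ResultSet.class, m -> m.equals("next") ? Boolean.FALSE : null);
        PreparedStatement ps =
                stub(PreparedStatement.class, m -> m.equals("executeQuery") ? empty : null);
        Connection con = stub(Connection.class, m -> {
            if (m.equals("prepareStatement")) {
                prepares.incrementAndGet();
                return ps;
            }
            return null;
        });
        return stub(DataSource.class, m -> m.equals("getConnection") ? con : null);
    }
}

final class ConnectionStylesDemo {
    public static void main(String[] args) throws SQLException {
        String sql = "SELECT DATA FROM NODE WHERE ID = ?"; // hypothetical schema
        AtomicInteger longLived = new AtomicInteger();
        try (LongLivedLoader loader = new LongLivedLoader(CountingStubs.dataSource(longLived), sql)) {
            for (int i = 0; i < 3; i++) loader.load("n" + i);
        }
        AtomicInteger perOperation = new AtomicInteger();
        PerOperationLoader loader = new PerOperationLoader(CountingStubs.dataSource(perOperation), sql);
        for (int i = 0; i < 3; i++) loader.load("n" + i);
        System.out.println(longLived.get() + " vs " + perOperation.get()); // prints "1 vs 3"
    }
}
```

The statement pooling Jukka proposes in step 2 is meant to make the per-call prepare in style B cheap by serving it from a pool; the stub here does not model that.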
|
|
| Thoughts on database persistence |

|
2006-03-18 18:28:07 |
Hi,
On 3/18/06, Stefan Guggisberg <stefan.guggisberg gmail.com> wrote:
> 'Simple' also refers to use of a very simple data model instead of
> a fully normalized schema or some object-relational mapping.

Agreed. A different data model would require a fully separate PM class
(like in the orm- or dbd- contribs). I believe the
SimpleDbPersistenceManager data model is good for the current needs
and pretty much orthogonal to the way the database connection is
handled.

> those best practices apply to j2ee applications. the point is that i don't
> consider jackrabbit to be a j2ee application, jackrabbit is infrastructure
> and has other requirements regarding its persistence layer than a
> database application.

Good point. In many cases Jackrabbit however lives in a J2EE
environment and, as expressed in JCR-313, there are legitimate needs
for using it within the constraints of existing database deployments.

> note that write operations must occur within a single transaction, i.e.
> you can't get a new connection for every write operation.

Ah, good point. That pretty much downs my proposal. So, withdrawn for now.

BR,
Jukka Zitting
--
Yukatan - http://yukatan.fi/ - info yukatan.fi
Software craftsmanship, JCR consulting, and Java development
|
|
| Thoughts on database persistence |

|
2006-03-18 20:43:29 |
Hi to all,

more thoughts on database persistence ...

It seems that having a JDBC-based persistence manager as the default
implementation misleads users: new and not-so-new users often think
that Jackrabbit will benefit from RDBMS features, and they analyze
Jackrabbit internals with J2EE best practices in mind.

Keeping the SimpleDbPersistenceManager simple is a good option not
only for the sake of simplicity, but also because other approaches are
discouraged by design decisions. As Stefan has pointed out a few times,
Jackrabbit is designed to stand in its own right. That means it is
not designed to leverage any persistence storage engine, RDBMS
included.

The fact that Derby is the default PM doesn't mean it's the best option;
there is overhead related to SQL parsing, and too many unused features.
It took me a while to understand it, but I agree that for now the
best option is a simple, transactional btree implementation, as
Stefan has been pointing out for a long time. Something like
http://jdbm.sourceforge.net/ would probably be a better fit. Stefan,
WDYT? Is it worth giving it a try?

Since questions about leveraging RDBMS capabilities come up on the
mailing list all the time, and if the comments above make any sense,
I suggest adding a few more entries to the FAQ that make clear
Jackrabbit is not just a layer on top of an RDBMS. WDYT?

e.g.
----

I want to use Jackrabbit in a J2EE environment and I want to use JNDI
to configure JDBC connections. How can I do it?
You can override the default implementation and get connections
through JNDI, but take into account that using an RDBMS in server mode
is not the best option. Jackrabbit *is* a storage engine by itself.

Does Jackrabbit leverage RDBMS capabilities?
No, all Jackrabbit needs from a PersistenceManager implementation is a
simple transactional persistence mechanism that supports large
collections. A simple btree implementation suffices.

What are the benefits of using a JDBC-based PM implementation?
Only the RDBMS administrative stuff, scheduled backups, etc.

---

my 0,0002 cents, in case it's worth that much ;)
edgar
ps, congratulations to all. you are all doing a great job!!
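
Editor's sketch of what the second FAQ answer above asks of a store: a sorted key/value structure whose writes become visible all at once. It is not Jackrabbit's PersistenceManager interface and not jdbm; every name is hypothetical, and java.util.TreeMap (an in-memory red-black tree) only stands in for an on-disk btree, so durability and large collections are not modelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical minimal store: sorted keys, all-or-nothing writes. */
final class TinyTxStore {
    // Committed state; TreeMap stands in for an on-disk btree.
    private final TreeMap<String, byte[]> committed = new TreeMap<>();
    // Pending writes of the open transaction; Optional.empty() marks a delete.
    private Map<String, Optional<byte[]>> pending;

    void begin() {
        if (pending != null) throw new IllegalStateException("transaction already open");
        pending = new HashMap<>();
    }

    void put(String id, byte[] data) {
        requireTx().put(id, Optional.of(data.clone()));
    }

    void remove(String id) {
        requireTx().put(id, Optional.empty());
    }

    /** Reads see committed state only. */
    byte[] get(String id) {
        byte[] data = committed.get(id);
        return data == null ? null : data.clone();
    }

    /** Applies every pending write, then closes the transaction. */
    void commit() {
        requireTx().forEach((id, value) -> {
            if (value.isPresent()) committed.put(id, value.get());
            else committed.remove(id);
        });
        pending = null;
    }

    /** Discards every pending write. */
    void rollback() {
        requireTx();
        pending = null;
    }

    int size() {
        return committed.size();
    }

    private Map<String, Optional<byte[]>> requireTx() {
        if (pending == null) throw new IllegalStateException("no open transaction");
        return pending;
    }
}

final class TinyTxStoreDemo {
    public static void main(String[] args) {
        TinyTxStore store = new TinyTxStore();

        store.begin();
        store.put("node-1", new byte[] {1});
        store.put("node-2", new byte[] {2});
        store.rollback();                 // neither write becomes visible
        System.out.println(store.size()); // prints 0

        store.begin();
        store.put("node-1", new byte[] {1});
        store.put("node-2", new byte[] {2});
        store.commit();                   // both writes become visible together
        System.out.println(store.size()); // prints 2
    }
}
```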
|
|
| Thoughts on database persistence |

|
2006-03-18 21:26:11 |
On 3/18/06, Edgar Poce <edgarpoce gmail.com> wrote:
> What are the benefits of using a JDBC-based PM implementation?
> Only the RDBMS administrative stuff, scheduled backups, etc.

"only" seems to diminish the vast importance of these things. if
jackrabbit provided more in the way of management tools and backup
capability, people might be less anxious to put a database underneath
it.
|
|
| Thoughts on database persistence |

|
2006-03-18 21:51:21 |
On 3/18/06, Brian Moseley <bcm osafoundation.org> wrote:
> On 3/18/06, Edgar Poce <edgarpoce gmail.com> wrote:
>
> > What are the benefits of using a JDBC-based PM implementation?
> > Only the RDBMS administrative stuff, scheduled backups, etc.
>
> "only" seems to diminish the vast importance of these things. if
> jackrabbit provided more in the way of management tools and backup
> capability, people might be less anxious to put a database underneath
> it.
>

You are right, I was thinking mainly about performance. But sure, things
like audit, backup and recovery are major issues.
|
|
| Thoughts on database persistence |

|
2006-03-19 03:14:36 |
Hi!
I am one of those who brought this subject to the ML in the past
(unfortunately, not in as much detail as Jukka did).

I tend to agree with Jukka from the perspective of known J2EE best
practices. Stefan's points, though, are more important from a
functionality and performance point of view. What looks really
interesting is that both ideas would work quite well together if:

- we define a middle "persistence manager" layer that defines the
  needed atomic operations (this fulfills the requirement that all
  writes take place in the same transaction)
- we look at the current persistence manager as a JDBC-like
  single-operation provider.

With the above in mind, one will be able to define:

- in the higher level: how the connection is handled
- in the current PM layer: how the implementation (real persistence
  access) is handled

I would probably need more knowledge of the current implementation
details to come up with a full proposal, but I hope the devs will
understand what I am trying to say.

hope this is more than 2c
./alex
--
.w( the_mindstorm )p.
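
Editor's sketch of one possible reading of the two layers described above, not a proposal from the thread's authors. The upper layer decides how a connection is obtained and runs a batch of single operations as one atomic unit; the lower layer only knows how to execute one operation on a connection it is handed. All type names are hypothetical, and the generic parameter C with its in-memory FakeConnection stands in for a real connection type such as java.sql.Connection.

```java
import java.util.ArrayList;
import java.util.List;

/** Lower layer: one storage operation that runs on a connection it is given. */
interface SingleOperation<C> {
    void execute(C connection) throws Exception;
}

/** Upper-layer policy: how a connection is obtained and how a unit of work ends. */
interface ConnectionPolicy<C> {
    C open() throws Exception;
    void commit(C connection) throws Exception;
    void rollback(C connection) throws Exception;
    void close(C connection) throws Exception;
}

/** Upper layer: runs a batch of single operations as one atomic unit. */
final class AtomicWriter<C> {
    private final ConnectionPolicy<C> policy;

    AtomicWriter(ConnectionPolicy<C> policy) {
        this.policy = policy;
    }

    void write(List<SingleOperation<C>> operations) throws Exception {
        C connection = policy.open(); // one connection for the whole batch
        try {
            for (SingleOperation<C> operation : operations) {
                operation.execute(connection);
            }
            policy.commit(connection);
        } catch (Exception e) {
            policy.rollback(connection); // none of the batch becomes durable
            throw e;
        } finally {
            policy.close(connection);
        }
    }
}

/** In-memory stand-in for a connection: buffers writes until commit. */
final class FakeConnection {
    final List<String> buffered = new ArrayList<>();
}

/** In-memory stand-in for a connection-handling strategy. */
final class InMemoryPolicy implements ConnectionPolicy<FakeConnection> {
    final List<String> durable = new ArrayList<>();
    int opened = 0;

    public FakeConnection open() { opened++; return new FakeConnection(); }
    public void commit(FakeConnection c) { durable.addAll(c.buffered); }
    public void rollback(FakeConnection c) { c.buffered.clear(); }
    public void close(FakeConnection c) { }
}

final class LayeredDemo {
    public static void main(String[] args) throws Exception {
        InMemoryPolicy policy = new InMemoryPolicy();
        AtomicWriter<FakeConnection> writer = new AtomicWriter<>(policy);

        List<SingleOperation<FakeConnection>> batch = new ArrayList<>();
        batch.add(c -> c.buffered.add("store node-1"));
        batch.add(c -> c.buffered.add("store node-2"));
        writer.write(batch);

        System.out.println(policy.durable); // prints [store node-1, store node-2]
        System.out.println(policy.opened);  // prints 1
    }
}
```

Swapping InMemoryPolicy for a policy that borrows from a pool, or for one that keeps a single long-lived connection, would change connection handling without touching the single operations, which is the separation the message argues for.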
|
|
| Thoughts on database persistence |

|
2006-03-20 10:18:30 |
hi edgar
On 3/18/06, Edgar Poce <edgarpoce gmail.com> wrote:
> Hi to all,
>
> more thoughts on database persistence ...
>
> It seems that having a JDBC-based persistence manager as the default
> implementation misleads users: new and not-so-new users often think
> that Jackrabbit will benefit from RDBMS features, and they analyze
> Jackrabbit internals with J2EE best practices in mind.
>
> Keeping the SimpleDbPersistenceManager simple is a good option not
> only for the sake of simplicity, but also because other approaches are
> discouraged by design decisions. As Stefan has pointed out a few times,
> Jackrabbit is designed to stand in its own right. That means it is
> not designed to leverage any persistence storage engine, RDBMS
> included.
>
> The fact that Derby is the default PM doesn't mean it's the best option;
> there is overhead related to SQL parsing, and too many unused features.
> It took me a while to understand it, but I agree that for now the
> best option is a simple, transactional btree implementation, as
> Stefan has been pointing out for a long time. Something like
> http://jdbm.sourceforge.net/ would probably be a better fit. Stefan,
> WDYT? Is it worth giving it a try?

i've never taken a closer look at jdbm but it's certainly something worth
investigating.

> Since questions about leveraging RDBMS capabilities come up on the
> mailing list all the time, and if the comments above make any sense,
> I suggest adding a few more entries to the FAQ that make clear
> Jackrabbit is not just a layer on top of an RDBMS. WDYT?

yes, i agree.

cheers
stefan

> e.g.
> ----
>
> I want to use Jackrabbit in a J2EE environment and I want to use JNDI
> to configure JDBC connections. How can I do it?
> You can override the default implementation and get connections
> through JNDI, but take into account that using an RDBMS in server mode
> is not the best option. Jackrabbit *is* a storage engine by itself.
>
> Does Jackrabbit leverage RDBMS capabilities?
> No, all Jackrabbit needs from a PersistenceManager implementation is a
> simple transactional persistence mechanism that supports large
> collections. A simple btree implementation suffices.
>
> What are the benefits of using a JDBC-based PM implementation?
> Only the RDBMS administrative stuff, scheduled backups, etc.
>
> ---
>
> my 0,0002 cents, in case it's worth that much ;)
> edgar
>
> ps, congratulations to all. you are all doing a great job!!
|
|
|
|