Thread: multiple writers and remotetcp backends
| multiple writers and remotetcp backends |
  Australia |
2007-04-30 21:36:54 |
Hi guys,

We've been in discussion with Richard and Olly on this issue, in various
different forums, but as the correct answer isn't immediately obvious,
I'm opening it up for wider discussion and comment.

The problem is that a xapian tcp-server in 'writable' mode makes no
attempt to ensure only one 'active' connection at a time is trying to
modify the database. If multiple connections are made to a writable
server, the behaviour is undefined (or even if it were defined, it is
unlikely to be defined in a way that would make it useful). While some
applications can ensure internally that only a single connection is made
to such a server, some applications are architected such that multiple
processes, possibly even multiple machines, must coordinate this "single
writer" approach. This becomes quite difficult without support inside
xapian itself.
It seems there are 2 general solutions we can implement:

* Only ever allow a single connection to the writable server. When a
  second connection is attempted, we either refuse the connection, or
  allow the connection just to send back an authoritative 'writer
  already connected' response, and then close the connection.

* Implement a kind of 'queue' or some other way to block the incoming
  connection. In this case we would accept the connection, respond with
  a message indicating they are in a queue (your call is important!) and
  then block until the first writer is complete. The client side of the
  connection then has a choice regarding waiting in the queue, or
  hanging up and trying again later.
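
As a rough illustration of the two policies, here is a minimal Python sketch. It is not Xapian code: `WriterGate` and `WriterBusy` are hypothetical names, and a `threading.Lock` stands in for whatever the server would use to track the active writer. With `wait=False` a second writer is refused at once (the first option); with `wait=True` it blocks, optionally with a timeout, until the slot is free (the second option).

```python
import threading


class WriterBusy(Exception):
    """Hypothetical error: another writer already holds the slot."""


class WriterGate:
    """Allows at most one active writer at a time (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self, wait=False, timeout=None):
        if not wait:
            # Option 1: refuse immediately if a writer is connected.
            got = self._lock.acquire(blocking=False)
        elif timeout is None:
            # Option 2: block until the current writer is done.
            got = self._lock.acquire()
        else:
            # Option 2 with a limit, so the caller can hang up and retry.
            got = self._lock.acquire(timeout=timeout)
        if not got:
            raise WriterBusy("writer already connected")

    def release(self):
        self._lock.release()
```

Note that `threading.Lock` makes no promise about the order in which blocked callers are woken, so a real queue with fair ordering would need more than this.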
In my opinion, the second option sounds the "best", but the first option
seems "good enough" and easier to implement. I'm sure there are both
other options and opinions on this topic, so I'm soliciting all feedback
on this, with the intention of opening a xapian bug to track the status,
and ultimately end up with a patch.
Thanks,
Mark
_______________________________________________
Xapian-devel mailing list
Xapian-devel lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-devel
| Re: multiple writers and remotetcp backends |
  United Kingdom |
2007-05-01 03:27:31 |
Mark Hammond wrote:
> It seems there are 2 general solutions we can implement:
>
> * Only ever allow a single connection to the writable server. When a
>   second connection is attempted, we either refuse the connection, or
>   allow the connection just to send back an authoritative 'writer
>   already connected' response, and then close the connection.
>
> * Implement a kind of 'queue' or some other way to block the incoming
>   connection. In this case we would accept the connection, respond
>   with a message indicating they are in a queue (your call is
>   important!) and then block until the first writer is complete. The
>   client side of the connection then has a choice regarding waiting in
>   the queue, or hanging up and trying again later.
>
> In my opinion, the second option sounds the "best", but the first
> option seems "good enough" and easier to implement.
I agree. If we implement the first solution, I imagine that many clients
will need to emulate the behaviour of the second solution by repeatedly
opening a connection to poll for the lock. While the first option is
slightly simpler to implement, I think that the majority of the
implementation work is likely to be in enforcing the single-writer
constraint. I recommend attempting to implement the first solution first
anyway, since the second solution requires everything that the first one
does: once we've got reliable checking of the locks on the connection,
we can implement a queue on top of that.
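
The client-side polling described above could look roughly like the following Python sketch. Everything here is an assumption for illustration: `connect` is any callable that opens a writable connection, `WriterBusy` is a hypothetical error the client would see when refused, and the retry count and delays are arbitrary.

```python
import time


class WriterBusy(Exception):
    """Hypothetical error: the server refused because a writer exists."""


def connect_with_retry(connect, attempts=5, delay=0.1, backoff=2.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    The wait between attempts grows by `backoff` each time, so many
    waiting clients do not hammer the server at a fixed rate.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except WriterBusy:
            if attempt == attempts - 1:
                raise  # give up: let the caller decide what to do
            sleep(delay)
            delay *= backoff
```

This is the emulation of a queue that the first solution pushes onto every client, which is part of the argument for eventually doing the blocking on the server side.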
One thing which both of these will probably need is a change to the
remote protocol to allow a connection to specify whether it is writable
or readonly at the time the connection is opened. This would allow the
lock on the database to be checked and obtained at this point for a
writable database, rather than waiting for the connection to attempt a
write operation.
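
To make the idea concrete, here is a small Python sketch of an opening message that carries the access mode. This is not the actual Xapian remote protocol: the `OPEN` tag, the one-byte mode field, the constants and the reply strings are all invented for illustration.

```python
import struct

# Hypothetical mode values, not taken from the real protocol.
MODE_READONLY = 0
MODE_WRITABLE = 1

_OPEN_FORMAT = "!4sB"  # 4-byte tag followed by a 1-byte mode


def encode_open(mode):
    """Build the first message a client would send on connecting."""
    if mode not in (MODE_READONLY, MODE_WRITABLE):
        raise ValueError("unknown mode: %r" % (mode,))
    return struct.pack(_OPEN_FORMAT, b"OPEN", mode)


def handle_open(message, writer_active):
    """Server side: decide at connection time, before any write is tried."""
    tag, mode = struct.unpack(_OPEN_FORMAT, message)
    if tag != b"OPEN":
        raise ValueError("unexpected message tag: %r" % (tag,))
    if mode == MODE_WRITABLE and writer_active:
        return "refused: writer already connected"
    return "ok"
```

Because the mode arrives in the first message, the server can take or refuse the write lock straight away instead of discovering the conflict on the first write operation.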
In theory, Xapian itself is meant to prevent there being two instances
of a Writable database for the same path in existence concurrently:
however, the tcpserver avoids the check for this by opening the database
in its server process, and then passing it through a fork to the
sub-processes. I don't think there's any way we can check for this state
in the core Xapian library, so the tcpserver itself needs to enforce the
single-writer constraint.
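
The way a fork can sidestep an open-time lock check can be shown with an analogy in Python on a POSIX system. This is not how Xapian implements its locking; it only uses `fcntl.flock` to show the general effect: a handle opened before `fork()` is shared with the child, so the child never goes through the check that a fresh open would hit. The helper names are made up for this sketch.

```python
import fcntl
import os


def try_lock(fd):
    """Return True if an exclusive, non-blocking flock on fd succeeds."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def child_results(lock_path, inherited_fd):
    """Fork a child and report (via_inherited, via_fresh): whether it
    could lock through the inherited descriptor and through a new one."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        status = 1
        try:
            os.close(read_end)
            via_inherited = try_lock(inherited_fd)
            fresh_fd = os.open(lock_path, os.O_RDWR)
            via_fresh = try_lock(fresh_fd)
            os.write(write_end, bytes([via_inherited, via_fresh]))
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code
    os.close(write_end)
    data = os.read(read_end, 2)
    os.close(read_end)
    os.waitpid(pid, 0)
    return bool(data[0]), bool(data[1])
```

If the parent already holds the lock on `inherited_fd`, the child "succeeds" through the inherited descriptor but is refused through a freshly opened one, which is the kind of check the forked sub-processes never perform.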
I'm not sure what happens for the Windows variant of the server (which
is a threaded implementation), but I imagine that there the same
instance of the database is accessed by each connection: if this is
correct, there are likely to be other problems since the database is not
safe for concurrent access. A threaded implementation needs to create a
new instance of the database for each thread.
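
One common way to give each thread its own instance is thread-local storage. The Python sketch below assumes a stand-in `FakeDatabase` class in place of a real database handle; `get_database` is a hypothetical helper, not part of any Xapian API.

```python
import threading


class FakeDatabase:
    """Stand-in for a database handle that is not thread-safe."""

    def __init__(self, path):
        self.path = path


_local = threading.local()  # each thread sees its own attributes


def get_database(path):
    """Return this thread's own handle, opening it on first use."""
    db = getattr(_local, "db", None)
    if db is None or db.path != path:
        db = FakeDatabase(path)
        _local.db = db
    return db
```

Within one thread, repeated calls return the same object; a different thread gets a separate object, so no handle is ever shared between connections served by different threads.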
> I'm sure there are both other options and opinions on this topic, so
> I'm soliciting all feedback on this, with the intention of opening a
> xapian bug to track the status, and ultimately end up with a patch.

Opening a bug sooner rather than later would probably be advisable, so
that the history of this discussion is easy to find in future.

In particular, discussion of this bug may make it clear whether it
should block the 1.0.0 release, and if so we
--
Richard
| Re: multiple writers and remotetcp backends |
  United Kingdom |
2007-05-01 06:49:27 |
On Tue, May 01, 2007 at 12:36:54PM +1000, Mark Hammond wrote:
> The problem is that a xapian tcp-server in 'writable' mode makes no
> attempt to ensure only one 'active' connection at a time is trying to
> modify the database. If multiple connections are made to a writable
> server, the behaviour is undefined (or even if it were defined, it is
> unlikely to be defined in a way that would make it useful).
I'd not appreciated this happened from the previous discussion - this is
certainly a bug. I understood the issue was just that of trying to
marshal multiple processes wanting to write to the same remote server in
a sane way.

Looking at the code, I believe it's also wrong that we open the database
and then fork multiple processes which can make use of it, even for a
read-only Database. We certainly don't promise that you can use the same
Xapian object from different threads. I think similar rules ought to
apply over fork.

But this matters much more for writers - with the current backends, it
happens to work OK for readers I think.

So we should probably leave the reader issue for now, as it can be fixed
without API or ABI changes, but fix the writer issue.
Cheers,
Olly
| Re: multiple writers and remotetcp backends |
  United Kingdom |
2007-05-01 07:42:08 |
Olly Betts wrote:
> On Tue, May 01, 2007 at 12:36:54PM +1000, Mark Hammond wrote:
>> The problem is that a xapian tcp-server in 'writable' mode makes no
>> attempt to ensure only one 'active' connection at a time is trying to
>> modify the database. If multiple connections are made to a writable
>> server, the behaviour is undefined (or even if it were defined, it is
>> unlikely to be defined in a way that would make it useful).
>
> I'd not appreciated this happened from the previous discussion - this
> is certainly a bug. I understood the issue was just that of trying to
> marshal multiple processes wanting to write to the same remote server
> in a sane way.

I thought that was the problem too until looking at the code this
morning.

> Looking at the code, I believe it's also wrong that we open the
> database and then fork multiple processes which can make use of it,
> even for a read-only Database. We certainly don't promise that you can
> use the same Xapian object from different threads. I think similar
> rules ought to apply over fork.

Yes - I would imagine that the behaviour over fork() is system dependent
- it works okay for readers on Linux, but may well not on other
platforms.

> But this matters much more for writers - with the current backends, it
> happens to work OK for readers I think.

I'm dubious that it will work correctly under load even with just
readers on Windows, having looked at the code, due to it being a
threaded rather than a forking model.

> So we should probably leave the reader issue for now, as it can be
> fixed without API or ABI changes, but fix the writer issue.
I imagine that a simple fix would be to open the database when a new
connection comes in, instead of opening the database when the server is
started and passing it to each sub-process (or sub-thread on Windows).
That way, the normal locking code will be able to enforce the necessary
constraints.

This will mean that a server will only allow one connection at a time if
the writable parameter is set, and should throw an exception to clients
that try to open a second concurrent connection.
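
A minimal Python sketch of the "open per connection" idea follows, for POSIX systems. It is not the tcpserver code: `WritableHandle`, `handle_connection`, the `lock` file name and the `DatabaseLockError` class defined here are stand-ins, and `fcntl.flock` is used only as an example of a lock that is checked at open time.

```python
import fcntl
import os


class DatabaseLockError(Exception):
    """Stand-in error: the database is already open for writing."""


class WritableHandle:
    """Takes an exclusive lock when opened, releases it when closed."""

    def __init__(self, db_dir):
        lock_path = os.path.join(db_dir, "lock")  # hypothetical file name
        self._fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            fcntl.flock(self._fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(self._fd)
            raise DatabaseLockError("database is locked by another writer") from None

    def close(self):
        os.close(self._fd)  # closing the descriptor drops the lock


def handle_connection(db_dir, work):
    """Open the database for this connection only, then run `work`."""
    handle = WritableHandle(db_dir)  # raises if a writer is connected
    try:
        return work(handle)
    finally:
        handle.close()
```

Because every connection opens its own handle, the second concurrent writer fails at open time with an error that can be passed back to the client, and the lock is freed as soon as the first connection finishes.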
If performance is found to be a problem, we can implement a pooling
system or similar at a later date.

Thoughts? Or shall I just try implementing it and check if it works as
desired?
--
Richard
| Re: multiple writers and remotetcp backends |
  United Kingdom |
2007-05-01 08:30:32 |
On Tue, May 01, 2007 at 01:42:08PM +0100, Richard Boulton wrote:
> Olly Betts wrote:
> >Looking at the code, I believe it's also wrong that we open the
> >database and then fork multiple processes which can make use of it,
> >even for a read-only Database. We certainly don't promise that you
> >can use the same Xapian object from different threads. I think
> >similar rules ought to apply over fork.
>
> Yes - I would imagine that the behaviour over fork() is system
> dependent - it works okay for readers on Linux, but may well not on
> other platforms.

I believe it should be OK currently. But (for example) it will be a
problem when we start trying to lock the revisions which readers are
currently using, so we want to avoid people relying on it working.

> >But this matters much more for writers - with the current backends,
> >it happens to work OK for readers I think.
>
> I'm dubious that it will work correctly under load even with just
> readers on Windows, having looked at the code, due to it being a
> threaded rather than a forking model.

Yes, that could well be a problem already.

> >So we should probably leave the reader issue for now, as it can be
> >fixed without API or ABI changes, but fix the writer issue.
>
> I imagine that a simple fix would be to open the database when a new
> connection comes in, instead of opening the database when the server
> is started and passing it to each sub-process (or sub-thread on
> windows). That way, the normal locking code will be able to enforce
> the necessary constraints.

Yes.

> This will mean that a server will only allow one connection at a time
> if the writable parameter is set, and should throw an exception to
> clients that try to open a second concurrent connection.

Yes. Ultimately I think we want a writable reader to be able to specify
if the server should fail right away or block if the database is locked,
but failing is certainly an improvement over what is currently happening
and will do for now.

> If performance is found to be a problem, we can implement a pooling
> system or similar at a later date.

If opening a database is "too slow", we should first try to address that
head on as that would benefit non-remote users too.

> Thoughts? Or shall I just try implementing it and check if it works
> as desired?

Go for it.
Cheers,
Olly