Thread: Indexing
2006-02-24 09:18:09
Hi Chris!
Been hectic days here, haven't had time to follow up. But here goes:
Chris Muller <chris funkyobjects.org> wrote:
> Howdy Göran!
>
> What version of Magma are you using?
1.0
Btw, I am slightly confused about this. There is MagmaServerLoader,
MagmaTesterLoader and MagmaClientLoader. I presume these three MCs use
MC deps to refer to the latest of all components they consist of. I
understand that server is a superset of client etc. I guess that using
any of these would give me Magma 1.1 - or rather - the latest of all
packages, right?
And then I presume that Magma1.0 is an MC referring to a frozen set of
older snapshots, but does it correspond to MagmaServerLoader or
MagmaClientLoader or ... what?
Anyway, I just loaded Magma1.0-cmm.4 and my app still works.
(Cees and others are now using Monticello Configurations, perhaps that
is an option for you too - a config is just a list of specific
snapshots)
> > AFAICT MaTransaction>>markRead:using: calls
> > monitorLargeCollectionChanges:, but the WeakSet in MaTransaction
> > just keeps growing. I assume it uses identity but perhaps should
> > use equality? Looks like a bug, anyway that is not my issue here...
>
> I'm trying to understand this without having access to
> Squeak right now.. not remembering what WeakSet you
> may be talking about; I know the MaTransaction has a
> 'readSet' which is a WeakIdentityKeyDictionary but
> what is the name of the variable referencing the
> WeakSet? I don't remember, I'm afraid I'll have to
> wait until this weekend to comment, sorry.
The WeakSet I am referring to is largeCollectionChanges.
In markRead:using: it says at the end:
    ...
    anObject maIsLargeCollection
        ifTrue:
            [ self monitorLargeCollectionChanges: anObject changes.
            anObject session: session ].
    ^anObject
And in MaTransaction>>monitorLargeCollectionChanges: we have:
    monitorLargeCollectionChanges: aMaLargeCollectionChanges
        largeCollectionChanges add: aMaLargeCollectionChanges
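The identity-versus-equality question raised above can be illustrated with a small, language-neutral sketch (written in Python here, not Smalltalk). The `Changes` class and both counting helpers are hypothetical stand-ins, not Magma code, and the thread does not confirm which comparison the WeakSet actually uses. The sketch only shows why a set that compares by identity grows when equal-but-distinct tracker objects are added repeatedly.

```python
class Changes:
    """Hypothetical stand-in for a per-collection change tracker."""

    def __init__(self, collection_id):
        self.collection_id = collection_id

    def __eq__(self, other):
        # Two trackers are "equal" if they track the same collection.
        return (isinstance(other, Changes)
                and self.collection_id == other.collection_id)

    def __hash__(self):
        return hash(self.collection_id)


def count_by_identity(objects):
    # Every distinct object counts, even if it equals another one.
    return len({id(o) for o in objects})


def count_by_equality(objects):
    # Objects that compare equal collapse into one entry.
    return len(set(objects))


# Two trackers for collection 7 (e.g. created on separate reads), one for 8.
trackers = [Changes(7), Changes(7), Changes(8)]
identity_count = count_by_identity(trackers)
equality_count = count_by_equality(trackers)
```

With identity comparison all three trackers are kept; with equality the two trackers for collection 7 collapse into one.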
Ok, so in my app I have MagmaCollections in three different places and
given the number of instances of my domain objects at the moment I
should have this number of MagmaCollections:
    (Q2Model allInstances size * 2) + (Q2Process allInstances size) ==> 9
Note: Q2Model has two MagmaCollection instvars and Q2Process one.
This gives me 9 right now. MagmaCollection allInstances size gives me
378! And MagmaCollectionChanges allInstances size gives 379.
Hmmmmm, ok - now I cleaned out my Magma db directory (it had tons of
older MagmaCollection files - indexes that is - around 370-ish). Now it
looks much better. I still have "twice too many" MagmaCollections in my
image though:
    {(Q2Model allInstances size * 2) + (Q2Process allInstances size).
    MagmaSession allInstances size.
    MagmaCollection allInstances size.
    MagmaCollectionChanges allInstances size}
    ===> #(9 2 18 19)
Q2Model allInstances size is 1 - which is my domain model root object.
It has two MagmaCollections. Then I have Q2Process - 7 instances with
one MagmaCollection each. That is the expected number given a single
MagmaSession. The second MagmaSession seems to be an extra internal
session used by Magma (right?) and perhaps that session is for some
reason also materializing the collections - which would explain the
double amount (18 instead of 9). And yes, I have files on disk
indicating 9 MagmaCollections (there are 9 unique numbers used in the
.hdx filenames).
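The arithmetic above can be written out as a tiny sketch (Python, not Smalltalk). The function name and parameters are made up for illustration. It assumes, as described in this message, two collections per Q2Model, one per Q2Process, and one in-image copy of every collection per MagmaSession.

```python
def expected_collections(models, processes, sessions=1):
    """Expected MagmaCollection instances in the image (illustrative only)."""
    per_session = models * 2 + processes * 1
    return per_session * sessions


single_session = expected_collections(models=1, processes=7)
two_sessions = expected_collections(models=1, processes=7, sessions=2)
```

One model and seven processes give 9 collections for a single session and 18 for two, which matches the counts reported above.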
Anyway, this looks "reasonable" and funnily enough the problem that
spurred this email (the index not being updated) seems to have magically
disappeared. It might have been related to these older index files
lying around? Odd.
> > So the code in captureOldHashesFor: seems to only capture the hashes
> > for the large collections that are "monitored", but in my case this
> > happens to only include the collection I navigated through (since I
> > then read it), but not the other! Is this logic really correct? Or
> > do I need to "trick" the session into reindexing my other
> > collection too somehow?
>
> No, this does not sound correct. All LargeCollections
> should be monitored as soon as they're persistent. If
> they're not persistent, changing keys doesn't matter.
> But again, I'm talking codeless here..
So a MagmaSession always "knows" all MagmaCollections in a db,
regardless of whether they have been navigated and materialized in the
session yet?
> Brent are you around? Didn't we just fix a bug
> related to this recently?
>
> Sorry you had this problem Göran. I will investigate
> it this weekend and have an answer/fix for you.
No problem, I haven't had many issues at all with Magma so far - and in
this case it was probably because there were older files in the db dir
(my guess).
Btw, in my app you can actually have the server "build" a separate Magma
db, then download it, unzip and reconnect to it on the clients locally -
so it was nice to see that you are very careful with predicting "odd"
scenarios - because I then stumbled onto this exception (perfectly
correctly - because I needed to reconnect etc) signalled in
MagmaSession>>validateRemoteId:
"Cannot connect because the repository has been replaced."
> > The current result seems to be that object A is only reindexed in
> > the collection I navigated through. I assume there is currently no
> > way for Magma to know which collections it should re-index on its
> > own.
>
> Nope, no way. I tried real hard for Magma to detect
> this automatically but eventually concluded it was
> impossible without severely affecting performance.
>
> I'll get back to you tomorrow or Sunday.. Be sure to
> tell me what version you're using.
>
> - Chris
Hmmm, let me see now... above you are saying (I guess) that only
monitored MagmaCollections will be reindexed. And AFAICT from the code
the monitored collections are the ones we have materialized in the
session. But above you wrote "All LargeCollections should be monitored
as soon as they're persistent." which seems contradictory.
I am at a loss. Right now my app seems to work nicely - but perhaps
that is just because I materialize all these collections in my
sessions.
regards, Göran
PS. Very happy with Magma so far. And yes, the second demo the other
day went fine and we have a GO for the project! And most likely we will
open source it too.

2006-02-24 16:49:51
> Btw, I am slightly confused about this. There is
> MagmaServerLoader,
> MagmaTesterLoader and MagmaClientLoader. I presume
> these three MCs use
> MC deps to refer to the latest of all components
> they consist of. I
> understand that server is a superset of client etc.
> I guess that using
> any of these would give me Magma 1.1 - or rather -
> the latest of all
> packages, right?
Yep. MagmaClientLoader has just the client packages,
used to ONLY connect to a *remote* server.
MagmaServerLoader includes all the packages in client
plus some extras for the server. These are needed to
either host a server or connect using #openLocal:.
MagmaTesterLoader includes all in server (and client)
plus a bunch extra for the test cases.
> And then I presume that Magma1.0 is an MC referring
> to a frozen set of
> older snapshots, but does it correspond to
> MagmaServerLoader or
> MagmaClientLoader or ... what?
Better to think of them not as "older" (because
they're actually newer with these patches this week)
but as "minus the security code".
Rather than create three separate Loader packages for
1.0, I just created one that includes everything (a la
"Magma1.0TesterLoader"). The premise is that soon 1.1
will be the best one to use. With these last two
fix-updates to 1.0, it is now branched from 1.1 on
SqueakSource. I'm not planning to release another 1.1
for a few weeks yet.
So this is another reason to stay with 1.0 for now. I
have merged the fixes into my own local 1.1 but not
planning to commit it to SqueakSource yet until I'm
done with this iteration.
> Anyway, I just loaded Magma1.0-cmm.4 and my app
> still works.
>
> (Cees and others are now using Monticello
> Configurations, perhaps that
> is an option for you too - a config is just a list
> of specific
> snapshots)
I have no preference either way other than I really
don't want to have a SqueakSource server running right
now just to use MC-Configs.. When they support
File-based repositories I'll check them out again.
> Ok, so in my app I have MagmaCollections in three
> different places and
> given the number of instances of my domain objects
> at the moment I
> should have this number of MagmaCollections:
>
> (Q2Model allInstances size * 2) + (Q2Process
> allInstances size) ==> 9
>
> Note: Q2Model has two MagmaCollection instvars and
> Q2Process one.
>
> This gives me 9 right now. MagmaCollection
> allInstances size gives me
> 378! And MagmaCollectionChanges allInstances size
> gives 379.
The next time this happens, see how many instances of
MagmaSession you have. Remember, they all have their
own copy of all the MagmaCollections and changes.
There have been intermittent issues with cleanup of
old sessions over the years, it may be back.. It was
always related to Block/Method contexts holding old
Sessions in one of their (temp-var?) references..
There is a utility method, MagmaSession
class>>#cleanUp, which enumerates all instances of
these contexts and does a fine job of getting rid of the
old ones; print-it to see the before/after instance count.
> Hmmmmm, ok - now I cleaned out my Magma db directory
> (it had tons of
> older MagmaCollection files - indexes that is -
> around 370-ish). Now it
> looks much better.
Now this confuses me. "Cleaning up" the directory
files alone should have no effect on the number of
instances in the image.. ??
> I still have "twice too many"
> MagmaCollections in my
> image though:
> ...
> That is the expected
> number given a single
> MagmaSession. The second MagmaSession seems to be an
> extra internal
> session used by Magma (right?) and perhaps that
> session is for some
> reason also materializing the collections - which
> would explain the
> double amount (18 instead of 9).
Exactly right. Magma has a meta-model that is
maintained via its own transaction mechanism. The
meta-model includes such things as the
class-definitions, the magma-collections and their
indexes, the code-base for the repository, etc. See
MagmaRepositoryDefinition. It is the root of the
"meta side".
When a new class-definition or large-collection is
added, the server refreshes its own "internal" session
because it must know about them to do its work
properly.
> > No, this does not sound correct. All LargeCollections
> > should be monitored as soon as they're persistent. If
> > they're not persistent, changing keys doesn't matter.
> > But again, I'm talking codeless here..
>
> So a MagmaSession always "knows" all
> MagmaCollections in a db,
> regardless of if they have been navigated and
> materialized in the
> session yet?
Since all the MagmaCollections are part of the
MagmaRepositoryDefinition (the meta root), and this
definition is faulted down and materialized upon
connect, the answer is yes, each connected
MagmaSession always knows all MagmaCollections in a
db.
> Btw, in my app you can actually have the server
> "build" a separate Magma
> db, then download it, unzip and reconnect to it on
> the clients locally -
Wow, can you tell me more about this? This is
obviously part of the "working offline" function,
right?
This might be painful if you are planning to try to
"merge" the offline work back into the "master" later.
I have planned, for 1.2, an efficient server-to-server
protocol that will allow large chunks of domains to be
transported between repositories without having to go
through the client; and, further, to be able to "sync"
up with the original repository. I hope to have this
done by summer.
> so it was nice to see that you are very careful with
> predicting "odd"
> scenarios - because I then stumbled onto this
> exception (perfectly
> correctly - because I needed to reconnect etc)
> signalled in
> MagmaSession>>validateRemoteId :
>
> "Cannot connect because the repository has been
> replaced."
>
>
I never imagined anyone would run into that condition
so soon. Glad you are putting it through some good
paces.
So I gather you discovered you just need to connect
with a new MagmaSession instance instead of trying to
reuse the old one.
> Hmmm, let me see now... above you are saying (I
> guess) that only
> monitored MagmaCollections will be reindexed. And
> AFAICT from the code
> the monitored collections are the ones we have
> materialized in the
> session. But above you wrote "All LargeCollections
> should be monitored
> as soon as they're persistent." which seems
> contradictory.
This question is hopefully answered now (above). All
MagmaCollections in the db are monitored as soon as
you connect because they're part of the meta
RepositoryDef. All newly created ones since the
connect are monitored as soon as they become
persistent via your commit. Non-persistent
collections with indices do not suffer from key-change
side-effects.
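The rule described above can be sketched as a toy model (Python, not Smalltalk). `Repository`, `Session`, and their methods are hypothetical names for illustration only and do not mirror Magma's classes. The sketch assumes only what this message states: every persistent collection is monitored at connect time, and a new collection becomes monitored once a commit makes it persistent.

```python
class Repository:
    """Toy stand-in for the repository's meta definition."""

    def __init__(self, collections=()):
        self.collections = set(collections)  # names of persistent collections


class Session:
    """Toy stand-in for a connected session."""

    def __init__(self):
        self.repo = None
        self.monitored = set()

    def connect(self, repo):
        # All persistent collections are known as soon as we connect,
        # whether or not the application has navigated to them.
        self.repo = repo
        self.monitored |= repo.collections

    def commit(self, new_collections):
        # Newly created collections become persistent, and therefore
        # monitored, only when they are committed.
        added = set(new_collections)
        self.repo.collections |= added
        self.monitored |= added


repo = Repository({"models", "processes"})
session = Session()
session.connect(repo)
before_commit = set(session.monitored)
session.commit(["transactions"])
after_commit = set(session.monitored)
```

In this toy model a collection that has not been committed is simply absent from `monitored`, matching the remark that non-persistent collections need no monitoring.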
> PS. Very happy with Magma so far. And yes, the
> second demo the other
> day went fine and we have a GO for the project! And
> most likely we will
> open source it too.
Fantastic! Someday I hope my Java-Oracle cohorts will
at least *listen* to an alternative for five minutes
without smirk and ridicule (about which they know
NOTHING). In the meantime, we spend hours and
hundreds of e-mails every day toiling over
column-lengths, types, slow-BLOBs and CLOBs,
constraint order, naming-abbreviation "standards", DBA
fights, etc. etc. Blecch!
- Chris

2006-02-24 19:06:20
Hi Chris!
Chris Muller <chris funkyobjects.org> wrote:
[SNIP]
> So this is another reason to stay with 1.0 for now. I
> have merged the fixes into my own local 1.1 but not
> planning to commit it to SqueakSource yet until I'm
> done with this iteration.
Ok, yes, I will be sticking to 1.0 until there is a compelling reason to
switch for me - and KryptOn is not AFAICT such a reason in this
particular project.
> > Anyway, I just loaded Magma1.0-cmm.4 and my app
> > still works.
> >
> > (Cees and others are now using Monticello
> > Configurations, perhaps that
> > is an option for you too - a config is just a list
> > of specific
> > snapshots)
>
> I have no preference either way other than I really
> don't want to have a SqueakSource server running right
> now just to use MC-Configs.. When they support
> File-based repositories I'll check them out again.
Oh, ok. Didn't know that.
> > Ok, so in my app I have MagmaCollections in three
> > different places and
> > given the number of instances of my domain objects
> > at the moment I
> > should have this number of MagmaCollections:
> >
> > (Q2Model allInstances size * 2) + (Q2Process
> > allInstances size) ==> 9
> >
> > Note: Q2Model has two MagmaCollection instvars and
> > Q2Process one.
> >
> > This gives me 9 right now. MagmaCollection
> > allInstances size gives me
> > 378! And MagmaCollectionChanges allInstances size
> > gives 379.
>
> The next time this happens, see how many instances of
> MagmaSession you have. Remember, they all have their
> own copy of all the MagmaCollections and changes.
Right, I am aware of that.
> There have been intermittent issues with cleanup of
> old sessions over the years, it may be back.. It was
> always related to Block/Method contexts holding old
> Sessions in one of their (temp-var?) references..
> There is a utility method, MagmaSession
> class>>#cleanUp, which enumerates all instances of
> these contexts and does a fine job of getting rid of the
> old ones; print-it to see the before/after instance count.
Good advice! I have been battling trying to get rid of MagmaSessions
quite a bit you see.
It has seemed quite odd to me, but I will try that.
> > Hmmmmm, ok - now I cleaned out my Magma db directory
> > (it had tons of
> > older MagmaCollection files - indexes that is -
> > around 370-ish). Now it
> > looks much better.
>
> Now this confuses me. "Cleaning up" the directory
> files alone should have no effect on the number of
> instances in the image.. ??
No, I actually toasted the whole dir, recreated the db and indexes and
all.
The problem is probably related to the fact that my "fill the db with
stuff" code also creates the indexes (at the same time as I instantiate
the MagmaCollections) so running that code (reinitializing my domain
model) over and over creates more and more index files. And then - when
I close and reopen the db Magma evidently gets a bit confused - that is
my guess.
> > I still have "twice too many"
> > MagmaCollections in my
> > image though:
> > ...
> > That is the expected
> > number given a single
> > MagmaSession. The second MagmaSession seems to be an
> > extra internal
> > session used by Magma (right?) and perhaps that
> > session is for some
> > reason also materializing the collections - which
> > would explain the
> > double amount (18 instead of 9).
>
> Exactly right. Magma has a meta-model that is
> maintained via its own transaction mechanism. The
> meta-model includes such things as the
> class-definitions, the magma-collections and their
> indexes, the code-base for the repository, etc. See
> MagmaRepositoryDefinition. It is the root of the
> "meta side".
Aha. Nice. And good to know.
> When a new class-definition or large-collection is
> added, the server refreshes its own "internal" session
> because it must know about them to do its work
> properly.
>
> > > No, this does not sound correct. All LargeCollections
> > > should be monitored as soon as they're persistent. If
> > > they're not persistent, changing keys doesn't matter.
> > > But again, I'm talking codeless here..
> >
> > So a MagmaSession always "knows" all
> > MagmaCollections in a db,
> > regardless of if they have been navigated and
> > materialized in the
> > session yet?
>
> Since all the MagmaCollections are part of the
> MagmaRepositoryDefinition (the meta root), and this
> definition is faulted down and materialized upon
> connect, the answer is yes, each connected
> MagmaSession always knows all MagmaCollections in a
> db.
Ok. Good. Now I have a much better "picture" of how this works.
> > Btw, in my app you can actually have the server
> > "build" a separate Magma
> > db, then download it, unzip and reconnect to it on
> > the clients locally -
>
> Wow, can you tell me more about this? This is
> obviously part of the "working offline" function,
> right?
Indeed. The master server has code to create a separate Magma db, then
does an intricate veryDeepCopy of the model, excluding various parts
depending on the permissions of the user etc, and stores it in the new
db. The db is then zipped up and served out by KomHttpServer as a single
zip file. Then I use external calls to wget and unzip (because I expect
this db to possibly become quite large) from the client to get it down,
unpack etc.
The neat part is that all this is done behind a Seaside UI so the user
simply logs on, chooses a "mirror" and presses "download" and voila -
back to the login screen, but now the client Seaside app has a partial
mirror of the master server db.
> This might be painful if you are planning to try to
> "merge" the offline work back into the "master" later.
Nope, not at all. All changes to the domain model are modelled using
the Command pattern - or as I like to call them "transactions" (not to
be confused with Magma transactions of course).
So all modifications to the model are funneled through the top object
which in turn creates instances of Q2Txn (with concrete subclasses for
each type of change), calls them to do their work and then stores them
in a MagmaCollection.
So basically I should be able to nuke the model and rebuild it in full
by simply applying all those Q2Txn instances in sequence. Quite
Prevaylerish in style.
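The Command-pattern approach described above can be sketched as follows (Python, not Smalltalk). All names here (`Model`, `Txn`, `AddName`, `RemoveName`, `Root`, `rebuild`) are hypothetical and are not the Q2Txn classes from the application. The sketch assumes only what the message states: every change goes through a top object, is recorded as a command object, and the model can be rebuilt by replaying the log in order.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    names: list = field(default_factory=list)


class Txn:
    """Base class for one recorded change to the model."""

    def apply(self, model):
        raise NotImplementedError


@dataclass
class AddName(Txn):
    name: str

    def apply(self, model):
        model.names.append(self.name)


@dataclass
class RemoveName(Txn):
    name: str

    def apply(self, model):
        model.names.remove(self.name)


class Root:
    """Top object: every modification is funneled through execute()."""

    def __init__(self):
        self.model = Model()
        self.log = []  # stands in for the persistent collection of txns

    def execute(self, txn):
        txn.apply(self.model)
        self.log.append(txn)


def rebuild(log):
    """Recreate a model from scratch by replaying the log in sequence."""
    model = Model()
    for txn in log:
        txn.apply(model)
    return model


root = Root()
root.execute(AddName("alpha"))
root.execute(AddName("beta"))
root.execute(RemoveName("alpha"))
replayed = rebuild(root.log)
```

Replaying the log yields the same state as the live model, which is the property that makes the "nuke and rebuild" idea work.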
Now - this model comes into real play in the offline scenario - a client
simply first downloads all "unknown" Q2Txns, applies them (bringing the
local Magma db up to date), then uploads all local Q2Txns to be applied
at the master server.
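The two-step exchange described above might look like this in outline (Python, not Smalltalk). The `sync` function and its arguments are hypothetical. Transactions are reduced to plain string ids, applying them is reduced to recording them, and conflict handling is left out entirely.

```python
def sync(applied, pending, master_log):
    """One sync round between a client and the master (illustrative only).

    applied    -- ids of txns already uploaded or downloaded by the client
    pending    -- ids of txns created locally and not yet uploaded
    master_log -- ids of all txns the master has applied, in order
    """
    known = set(applied) | set(pending)

    # Step 1: download the txns the client has not seen and apply them.
    downloaded = [t for t in master_log if t not in known]
    applied.extend(downloaded)

    # Step 2: upload the local txns so the master can apply them.
    master_log.extend(pending)
    applied.extend(pending)
    pending.clear()
    return downloaded


master = ["t1", "t2", "t3"]
client_applied = ["t1"]
client_pending = ["c1"]
fetched = sync(client_applied, client_pending, master)
```

After one round both sides hold the same four transactions and the client has nothing left to upload.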
I have all this working today - the Q2Txn instances are first
"disconnected" (using UUIDs instead of object refs) from the domain
objects, serialized using ReferenceStream and gzipped, then sent over as
a ByteArray using SOAP (which does base64 encoding I think) and then
rematerialized on the other side, reconnected in the new model and
"applied". Works like a charm.
And since I then have real objects for all operations I can attach
specific conflict code to each kind of transaction object. So a little
bit of manual work - but it pays off. And in other ways too - like
having full complete logging and traceability of all changes - per
definition.
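The disconnect, serialize, gzip, encode, and reconnect pipeline described a little earlier can be sketched with Python's standard library. This is only an analogy: JSON stands in for ReferenceStream, a plain function call stands in for the SOAP transfer, and `disconnect`, `pack`, `unpack`, and `reconnect` are hypothetical helper names.

```python
import base64
import gzip
import json
import uuid


def disconnect(kind, target_uuid, args):
    """Describe a change using the target's UUID instead of an object ref."""
    return {"kind": kind, "target": str(target_uuid), "args": args}


def pack(txn):
    """Serialize, gzip and base64-encode a disconnected transaction."""
    raw = json.dumps(txn).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def unpack(payload):
    """Reverse of pack(): decode, gunzip and deserialize."""
    raw = gzip.decompress(base64.b64decode(payload))
    return json.loads(raw.decode("utf-8"))


def reconnect(txn, registry):
    """Look up the real domain object for the txn in the receiving model."""
    return registry[txn["target"]]


# The receiving side keeps a registry from UUID string to domain object.
process_id = uuid.uuid4()
process = {"name": "process-1"}
registry = {str(process_id): process}

wire = pack(disconnect("rename", process_id, {"name": "process-2"}))
received = unpack(wire)
target = reconnect(received, registry)
```

Because the payload carries only a UUID, the receiving side can resolve it against its own copy of the model, which is why the same transaction can be applied on both master and client.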
> I have planned, for 1.2, an efficient server-to-server
> protocol that will allow large chunks of domains to be
> transported between repositories without having to go
> through the client; and, further, to be able to "sync"
> up with the original repository. I hope to have this
> done by summer.
Ok, sounds like very useful tech for us - but we can't wait for it.
But it might come in handy later on.
Our scenario is first a full download of a partial db done on the LAN
and then regular synchs (sending those Q2Txns back and forth) with quite
small data. And since the Q2Txns are only deltas they turn out very
small.
[SNIP]
> So I gather you discovered you just need to connect
> with a new MagmaSession instance instead of trying to
> reuse the old one.
Indeed. No problem.
> > Hmmm, let me see now... above you are saying (I
> > guess) that only
> > monitored MagmaCollections will be reindexed. And
> > AFAICT from the code
> > the monitored collections are the ones we have
> > materialized in the
> > session. But above you wrote "All LargeCollections
> > should be monitored
> > as soon as they're persistent." which seems
> > contradictory.
>
> This question is hopefully answered now (above). All
> MagmaCollections in the db are monitored as soon as
> you connect because they're part of the meta
> RepositoryDef. All newly created ones since the
> connect are monitored as soon as they become
> persistent via your commit. Non-persistent
> collections with indices do not suffer from key-change
> side-effects.
Ok. Got it.
> > PS. Very happy with Magma so far. And yes, the
> > second demo the other
> > day went fine and we have a GO for the project! And
> > most likely we will
> > open source it too.
>
> Fantastic! Someday I hope my Java-Oracle cohorts will
> at least *listen* to an alternative for five minutes
> without smirk and ridicule (about which they know
> NOTHING). In the meantime, we spend hours and
> hundreds of e-mails every day toiling over
> column-lengths, types, slow-BLOBs and CLOBs,
> constraint order, naming-abbreviation "standards", DBA
> fights, etc. etc. Blecch!
Hehe, yes indeed. A sidenote:
I ran a 2-hour workshop yesterday with 8 other employees at Toolkit
(where I work).
It was a "Shock and Awe"-workshop throwing them right into a stripped
version of my customer app - focusing mainly on Seaside but with Magma
inside too of course.
One of the fun parts is that with the Seaside/Magma integration and my
bits and pieces already in place they never ever saw a single line
related to the db.
One pair of developers added instvars in the domain model, created
objects per user object in the model, yaddayadda - and it "just worked".
Even if they actually know a bit about OODBs I still think they were a
bit mesmerized. I mean - hey, they didn't write a single line of code
for it - not even a "commit".
> - Chris
regards, Göran