Thread: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs




FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
user name
2005-10-20 20:51:06
Folks,

On Thu, 20 Oct 2005, Gayn Winters wrote:

> > Imagine that each data block is marked with labels
> > on change. It doesn't matter how many labels there
> > are, there will be only one data block saved.
> 
> In trying to follow this thread, I started looking around for a precise
> definition of snapshot.  Man mksnap_ffs wasn't too helpful, and googling
> for "snapshot" etc. wasn't fruitful.  I'm guessing that the original
> author of the thread (user at dhp.com) may also need such a definition.
> Can someone provide a pointer to a specification or at least an RFC-like
> paper?


I found one:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/README.snapshot?rev=1.4

and further, I did some tests and discovered that what I was being told
(by you folks) was indeed correct.

No matter how many snapshots you have, the changes in blocks since the
time before the first snapshot are only recorded in one of them.  That is
to say, if I do the following:

- create 4 1gig /dev/zero filled files
- create a snapshot
- overwrite one of those 1gig files with /dev/random

My free space will have decreased by 1gig.  So far so good.

If I then:

- create a second snapshot
- overwrite a different 1gig file with /dev/random

My free space merely decreases by another 1gig.  It makes sense to me now
because it has occurred to me that since the second file had not changed
between the creation of the first and second snapshot, there is no reason
for _both_ snapshots to _both_ say "this 1gig random file used to be
filled with zeros" - it would be redundant.

So that's great ... but I am curious, how do they know ?  I think my
previous assumption (that the first _and_ the second snapshot file would
_both_ have to record the change of file #2 from zero to random) was based
on the notion that these snapshot files were totally autonomous and
independent, and had no general organization behind them.  If that was the
case, then I am still fairly certain both snapshots would need to record
the change of the second file.

So what is the behind the scenes organization that makes it possible for
the snapshot files to not duplicate data like that ?

ALSO,

I have noticed that if you:

- dd 1gig /dev/zero file
- create snapshot
- overwrite that 1gig file with /dev/random

(free space decreases by 1gig, as expected)

- rewrite that 1gig file with /dev/zero again

You _don't_ get that 1gig of free space back ... which surprises me, since
it was all zeros before, and it's all zeros now ... how does the snapshot
know those are "different zeros" ?  And what ramifications does this have
for restoring, etc., if identical files do not get counted as identical in
the snapshot ?

thanks.

_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
user name
2005-10-20 21:22:26

> -----Original Message-----
> From: user [mailto:user@dhp.com]
> Sent: Thursday, October 20, 2005 1:51 PM
> To: Gayn Winters
> Cc: 'Andrew P.'; freebsd-questions@freebsd.org
> Subject: RE: FreeBSD UFS2 snapshots, and math ... - resolved,
> but two more Qs
> 
> 
> 
> Folks,
> 
> On Thu, 20 Oct 2005, Gayn Winters wrote:
> 
> > > Imagine that each data block is marked with labels
> > > on change. It doesn't matter how many labels there
> > > are, there will be only one data block saved.
> > 
> > In trying to follow this thread, I started looking around for a precise
> > definition of snapshot.  Man mksnap_ffs wasn't too helpful, and googling
> > for "snapshot" etc. wasn't fruitful.  I'm guessing that the original
> > author of the thread (user at dhp.com) may also need such a definition.
> > Can someone provide a pointer to a specification or at least an RFC-like
> > paper?
> 
> 
> I found one:
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/README.snapshot?rev=1.4
> 
> and further, I did some tests and discovered that what I was being told
> (by you folks) was indeed correct.
> 
> No matter how many snapshots you have, the changes in blocks since the
> time before the first snapshot are only recorded in one of them.  That is
> to say, if I do the following:
> 
> - create 4 1gig /dev/zero filled files
> - create a snapshot
> - overwrite one of those 1gig files with /dev/random
> 
> My free space will have decreased by 1gig.  So far so good.
> 
> If I then:
> 
> - create a second snapshot
> - overwrite a different 1gig file with /dev/random
> 
> My free space merely decreases by another 1gig.  It makes sense to me now
> because it has occurred to me that since the second file had not changed
> between the creation of the first and second snapshot, there is no reason
> for _both_ snapshots to _both_ say "this 1gig random file used to be
> filled with zeros" - it would be redundant.
> 
> So that's great ... but I am curious, how do they know ?  I think my
> previous assumption (that the first _and_ the second snapshot file would
> _both_ have to record the change of file #2 from zero to random) was based
> on the notion that these snapshot files were totally autonomous and
> independent, and had no general organization behind them.  If that was the
> case, then I am still fairly certain both snapshots would need to record
> the change of the second file.
> 
> So what is the behind the scenes organization that makes it possible for
> the snapshot files to not duplicate data like that ?
> 
> ALSO,
> 
> I have noticed that if you:
> 
> - dd 1gig /dev/zero file
> - create snapshot
> - overwrite that 1gig file with /dev/random
> 
> (free space decreases by 1gig, as expected)
> 
> - rewrite that 1gig file with /dev/zero again
> 
> You _don't_ get that 1gig of free space back ... which surprises me, since
> it was all zeros before, and it's all zeros now ... how does the snapshot
> know those are "different zeros" ?  And what ramifications does this have
> for restoring, etc., if identical files do not get counted as identical in
> the snapshot ?
> 
> thanks.
> 

I just finished skimming an old paper by McKusick on Soft Updates:
http://www.usenix.org/publications/library/proceedings/usenix99/full_papers/mckusick/mckusick.pdf
This paper is dated 1999.  Does anyone know if it accurately reflects
how soft updates and snapshots in FreeBSD 5.4 are implemented?  If so,
it would answer the above questions.

-gayn


FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
user name
2005-10-21 13:34:45
user <user@dhp.com> writes:

> Folks,
> 
> On Thu, 20 Oct 2005, Gayn Winters wrote:
> 
> > > Imagine that each data block is marked with labels
> > > on change. It doesn't matter how many labels there
> > > are, there will be only one data block saved.
> > 
> > In trying to follow this thread, I started looking around for a precise
> > definition of snapshot.  Man mksnap_ffs wasn't too helpful, and googling
> > for "snapshot" etc. wasn't fruitful.  I'm guessing that the original
> > author of the thread (user at dhp.com) may also need such a definition.
> > Can someone provide a pointer to a specification or at least an RFC-like
> > paper?
> 
> 
> I found one:
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/README.snapshot?rev=1.4
> 
> and further, I did some tests and discovered that what I was being told
> (by you folks) was indeed correct.
> 
> No matter how many snapshots you have, the changes in blocks since the
> time before the first snapshot are only recorded in one of them.  That is
> to say, if I do the following:
> 
> - create 4 1gig /dev/zero filled files
> - create a snapshot
> - overwrite one of those 1gig files with /dev/random
> 
> My free space will have decreased by 1gig.  So far so good.
> 
> If I then:
> 
> - create a second snapshot
> - overwrite a different 1gig file with /dev/random
> 
> My free space merely decreases by another 1gig.  It makes sense to me now
> because it has occurred to me that since the second file had not changed
> between the creation of the first and second snapshot, there is no reason
> for _both_ snapshots to _both_ say "this 1gig random file used to be
> filled with zeros" - it would be redundant.
> 
> So that's great ... but I am curious, how do they know ?  I think my
> previous assumption (that the first _and_ the second snapshot file would
> _both_ have to record the change of file #2 from zero to random) was based
> on the notion that these snapshot files were totally autonomous and
> independent, and had no general organization behind them.  If that was the
> case, then I am still fairly certain both snapshots would need to record
> the change of the second file.

Yes, they both need to notice, but they can share the actual copy of
the data.

> So what is the behind the scenes organization that makes it possible for
> the snapshot files to not duplicate data like that ?

Without trying to give a whole course in filesystems (there are books
available if you want to go in depth), the data in the file is held in a
number of data blocks, but there is meta-data that tells where the data
is.  When a file is overwritten, the snapshots continue to use the old
version of the meta-data, which continues to point to the old data, while
the "real" filesystem creates a new meta-data container pointing to new
data blocks.  If you then make another snapshot, the snapshot will use the
new meta-data and its associated underlying data.

It's an application of the "copy-on-write" principle.
http://en.wikipedia.org/wiki/Copy-on-write
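
To make the sharing concrete, here is a rough Python sketch of the
description above.  It is only a toy model under stated assumptions, not
how FFS is actually implemented: each file is a single block (standing in
for 1gig), a snapshot is just a frozen copy of the name-to-block map, and
the ToyVolume class and its method names are made up for illustration.

```python
class ToyVolume:
    """Toy model: files and snapshots are maps from name to block number."""

    def __init__(self):
        self.blocks = {}      # block number -> contents
        self.refs = {}        # block number -> how many maps point at it
        self.live = {}        # file name -> block number (one block per file)
        self.snapshots = []   # each snapshot is a frozen copy of self.live
        self._next = 0

    def _release(self, number):
        self.refs[number] -= 1
        if self.refs[number] == 0:        # nobody points here any more
            del self.refs[number], self.blocks[number]

    def write(self, name, data):
        """Put data in a fresh block; the old block survives only if a
        snapshot still points at it."""
        old = self.live.get(name)
        self.blocks[self._next] = data
        self.refs[self._next] = 1
        self.live[name] = self._next
        self._next += 1
        if old is not None:
            self._release(old)

    def snapshot(self):
        """Copy the meta-data only; no data blocks are duplicated."""
        frozen = dict(self.live)
        for number in frozen.values():
            self.refs[number] += 1
        self.snapshots.append(frozen)

    def used(self):
        return len(self.blocks)


vol = ToyVolume()
for name in ("f1", "f2", "f3", "f4"):
    vol.write(name, "zeros")          # four zero-filled files
vol.snapshot()                        # first snapshot
vol.write("f1", "random-a")
after_first = vol.used()              # 5: four live blocks plus the old f1
vol.snapshot()                        # second snapshot
vol.write("f2", "random-b")
after_second = vol.used()             # 6: one old f2 block, not two
shared = vol.snapshots[0]["f2"] == vol.snapshots[1]["f2"]
print(after_first, after_second, shared)
```

In this model both snapshots "notice" the change to f2 in the sense that
both still point at its old block, but that block exists only once, which
matches the free space dropping by one unit per overwrite in the
experiment.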

> ALSO,
> 
> I have noticed that if you:
> 
> - dd 1gig /dev/zero file
> - create snapshot
> - overwrite that 1gig file with /dev/random
> 
> (free space decreases by 1gig, as expected)
> 
> - rewrite that 1gig file with /dev/zero again
> 
> You _don't_ get that 1gig of free space back ... which surprises me, since
> it was all zeros before, and it's all zeros now ... how does the snapshot
> know those are "different zeros" ?  And what ramifications does this have
> for restoring, etc., if identical files do not get counted as identical in
> the snapshot ?

The snapshot doesn't know what the bits in the file are.  All it knows
is that the file's data used to be, say in "block 1857" and now the
file's data are in "block 1956".  The fact that both blocks are
identical is not detected.
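
A small Python sketch of that point, again as a toy model rather than the
real FFS code: the helper names (allocate, overwrite) and the starting
block number are arbitrary, and eight bytes stand in for the 1gig file.
The snapshot records block numbers only, so nothing ever compares
contents.

```python
import itertools

blocks = {}                          # block number -> contents
_numbers = itertools.count(1857)     # arbitrary starting block number


def allocate(data):
    """Store data in a fresh block and return its number."""
    number = next(_numbers)
    blocks[number] = data
    return number


def overwrite(old_number, data, pinned):
    """Write data to a new block; free the old one unless a snapshot pins it."""
    new_number = allocate(data)
    if old_number not in pinned:
        del blocks[old_number]
    return new_number


ZEROS = bytes(8)                     # stands in for the 1gig of zeros

live = allocate(ZEROS)               # file filled from /dev/zero
snapshot = {live}                    # the snapshot records block numbers only
live = overwrite(live, b"\x9c\x11\x04\xe7\xaa\x30\x5b\xd2", snapshot)  # "random"
live = overwrite(live, ZEROS, snapshot)                                # zeros again

snapshot_block = next(iter(snapshot))
print(sorted(blocks))                           # two blocks still allocated
print(blocks[snapshot_block] == blocks[live])   # same contents ...
print(snapshot_block == live)                   # ... but different block numbers
```

The block holding the random data is freed on the second overwrite, but
the block pinned by the snapshot is never released just because an
identical block now exists elsewhere, so the space does not come back.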

If you're really interested in this, I suggest reading a decent
operating systems book.  It's a lot easier to understand the specific
implementation when you have a good grip on the standard terminology
and principles.
FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
user name
2005-10-21 14:24:30

On 21 Oct 2005, Lowell Gilbert wrote:

> The snapshot doesn't know what the bits in the file are.  All it knows
> is that the file's data used to be, say in "block 1857" and now the
> file's data are in "block 1956".  The fact that both blocks are
> identical is not detected.
> 
> If you're really interested in this, I suggest reading a decent
> operating systems book.  It's a lot easier to understand the specific
> implementation when you have a good grip on the standard terminology
> and principles.


Thanks very much for your help.  I am going to read a book or two - my
plan was to start with "the design and implementation of the 4.4BSD OS",
but I wanted to update it with more modern information - like snapshots,
etc., which I will do with those URLs we have already posted RE: the
snapshot work.

If you have any others, let me know.

FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
user name
2005-10-21 16:08:24
user <user@dhp.com> writes:

> On 21 Oct 2005, Lowell Gilbert wrote:
> 
> > The snapshot doesn't know what the bits in the file are.  All it knows
> > is that the file's data used to be, say in "block 1857" and now the
> > file's data are in "block 1956".  The fact that both blocks are
> > identical is not detected.
> > 
> > If you're really interested in this, I suggest reading a decent
> > operating systems book.  It's a lot easier to understand the specific
> > implementation when you have a good grip on the standard terminology
> > and principles.
> 
> 
> Thanks very much for your help.  I am going to read a book or two - my
> plan was to start with "the design and implementation of the 4.4BSD OS",
> but I wanted to update it with more modern information - like snapshots,
> etc., which I will do with those URLs we have already posted RE: the
> snapshot work.
> 
> If you have any others, let me know.

Yes.  Start with something more basic, because McKusick's books assume
that you are already acquainted with the standard terminology.
Tanenbaum's are the usual recommendations.  And when you do get to
McKusick, you'll do a lot better with the new "Design and
Implementation of the FreeBSD Operating System," which covers a lot of
these recent improvements.