Thread: SoC project proposal
SoC project proposal, Slovakia, 2007-03-16 18:19:39
Hi,
I have written a project proposal for this year's Summer of Code. You can read it here:
http://wasabi.fiit.stuba.sk/~haad/netbsd/soc_pro_ext3.html
Here is the short version: <<EOF
General
The Ext2 file system is the de facto standard Unix-like file system used on Linux installations. Ext2 does not have journaling capabilities, so Ext3 was built on top of it to add them without breaking compatibility with Ext2. Ext3 is now a stable journaled file system used on many Linux installations.
NetBSD currently fully supports the Ext2 file system at the kernel level. Unfortunately, there is no support for the new features included in Ext3, although Ext3 file systems can be mounted provided that their journal is clean. It would be very nice if NetBSD had Ext3 file system support, because the system would immediately gain a journaled file system as well as compatibility with Linux.
NetBSD really needs a good, stable journaled file system. Disks and RAID volumes keep growing, and sizes of 1TB or more are now common. FFS was not designed for disks of this size: we have problems with file systems larger than 2TB (neither FFS nor FFS2 is suitable for this size). Good ext3/ext4 support would solve these problems.
EXT3 file system features:
* Journaling
* Over 16TB file system size
1. Journaling
In a nutshell, the ext3 journal is a regular file which stores whole metadata (and optionally data) blocks that have been modified, before they are written into the file system. This means it is possible to add a journal to an existing ext2 file system without the need for data conversion.
When the file system is changed (e.g. a file is renamed), the changes are stored in a transaction in the journal, and that transaction may be either complete or incomplete at the time of a crash. If a transaction is complete at the time of a crash (or in the normal case where the system does not crash), then the blocks in that transaction are guaranteed to represent a valid file system state, and they are copied into the file system. If a transaction is incomplete at the time of the crash, there is no guarantee of consistency for its blocks, so they are discarded.
2. Availability
Unlike ext2, ext3 does not require a file system check, even after an unclean system shutdown, except in certain rare hardware failure cases (e.g. hard drive failures). This is because data is written to disk in such a way that the file system can always be brought back to a consistent state by replaying the journal. The time to recover an ext3 file system after an unclean shutdown does not depend on the size of the file system or the number of files; rather, it depends on the size of the journal used to maintain consistency.
3. Data Integrity
The ext3 file system can provide stronger guarantees about data integrity in case of an unclean system shutdown. You choose the type and level of protection that your data receives. You can choose to keep the file system consistent but allow damage to file data in the case of an unclean shutdown; this can give a modest speedup in some, but not all, circumstances. Alternatively, you can choose to ensure that the data is consistent with the state of the file system; this means that you will never see garbage data in recently written files after a crash.
Linux uses the Journaling Block Device (JBD) layer to manage the journals of file systems such as ext3. I think NetBSD needs something similar to Linux's JBD.
Journaling Block Device
Linux uses JBD for journaling. JBD provides atomic operations; it was designed to add journaling capabilities to a block device. The ext3 file system code informs JBD of the modifications it is performing (grouped into transactions). The journal supports starting and stopping transactions, and in case of a crash it can replay the transactions to quickly put the partition back into a consistent state.
A good journal API could also be used by our non-journaled file systems, e.g. FFS. The main goal of my SoC project should be the design and implementation of a good journal API, followed by ext3fs support.
The JBD API is used to open, load, commit and administer journal transactions on a device. In Linux, JBD is defined in fs/jbd/ and include/linux/jbd.h.
JBD uses these objects in its API: handle, transaction and journal.
1. Handle: a single atomic update to the file system. A handle is a group of writes/updates on disk that should be performed atomically.
2. Transaction: handles are grouped into transactions. Only whole transactions are flushed to the journal. A transaction is atomic by nature because it consists only of atomic handles. Over its lifetime, including commit, a transaction passes through these states:
   1. Running: the transaction is live and can accept new handles. Only one transaction in the system can be in the running state.
   2. Locked: the transaction does not accept any new handles, but existing handles are not yet complete. Once all the existing handles have completed, the transaction moves to the next state.
   3. Flush: all the handles in the transaction are complete. The transaction is writing itself to the journal.
   4. Commit: the entire transaction log has been written to the journal. The transaction is writing a commit block indicating that its log in the journal is complete.
   5. Finished: the transaction has been written completely to the journal. It has to remain there until its blocks have been written to their actual locations on the disk.
Extending our ext2fs support
Our ext2fs implementation is located in src/sys/ufs/ext2fs/. Paths are relative to it unless I explicitly give another path. By Linux paths I implicitly mean paths under /usr/src/linux/fs/ext3/.
Ext3fs superblock
I have to extend our superblock structure, defined in ext2fs.h, to support the ext3fs journal options. Our superblock structure includes padding which can be used for adding new features.
Also, struct m_ext2fs needs at least a new "journal mounted" flag. If we want EXT3 ACL support, struct ext3_acl_header and struct ext3_acl_entry are needed.
Journal
A journal is a log that internally manages updates for a single block device. Updates are first stored in the journal and then written to their real locations on the disk. The journal area is managed as a circular buffer: when the end of the area is reached, the journal wraps around and reuses space whose transactions have already been written to their final locations.
User land part
I have to write usable BSD-licensed mke2fs and e2fsck programs if we want to use the ext3 file system without additional packages from pkgsrc. I will also write a new mount_e2fs, or extend our existing one, to support journaling.
Documentation
Write good documentation about the development process so that other developers can use it, and include it in the NetBSD Internals book.
EOF
I'm working on this proposal now, so it's a work in progress, but I want to discuss this project here.
Regards
--------------------------------------------------------------------
Adam Hamsik
ICQ 249727910
jabber haad jabber.org
--------------------------------------------------------------------
There are 10 kinds of people in the world. Those who understand binary numbers, and those who don't.

Re: SoC project proposal, Slovakia, 2007-03-18 07:10:21
Bill Stouder-Studenmund wrote:
> On Sat, Mar 17, 2007 at 12:19:39AM +0100, haad wrote:
>> [...]
>> NetBSD as operating system really need good, stable journal file system, today
>> disks and raids become more and more bigger with size about 1TB or more. FFS was
>> not designed for disks size like this. We have problems with file system sizes
>> over 2TB (nor FFS or FFS2 is suitable for this size) good ext3/ext4 support will
>> give away these problems.
>
> Note: this is not correct. While I do not question the idea that it could
> be EXCRUCIATINGLY PAINFUL to use either an ffs1 or ffs2 file system for a
> multi-TB file system, it can be done.
>
> ffs1 supports 2^31 fs blocks. These are what everyone calls fragments. So
> a 1k fragment size ffs can support 2 TB. A 4k fragment can support 8 TB,
> and so on.
>
> Changing the block pointers to 64-bit numbers was one of the main points
> of ffs2/ufs2. So many-TB support was the point.
>
>> [...]
>> Good journal API can be used in our non journaled filesystems e.g ffs. Main goal
>> of my Soc project should be design and implementation of good journal API and
>> then implement ext3fs support.
>
> Sounds good.
Great.
>
>> [...]
>> 2. Handles can be stored in groups called transactions. Only transactions are
>>    flushed to journal. Transaction is atomicity in nature because consists only
>>    from atomic handles. When transaction is being committed it can have these states:
>>    1. Running: the transaction currently is live and can accept new
>>       handles. In a system only one transaction can be in the running state.
>
> I'll want to see how things develop, but this could be a bottle neck
> eventually. If I understand you correctly, this means that only one thread
> in the file system can update metadata (or real data) at once. I don't
> like that idea. However, chances are that this is a fine assumption to
> start with.
I can look at this later, once it works with one thread. I will keep the multiple-thread option in mind when I design and code this API.
>
>> [...]
>> Also struct m_ext2fs need to have a least new journal mounted flag. If we want
>> EXT3 ACL support structures for struct ext3_acl_header,struct ext3_acl_entry are
>> needed.
>
> Just to be clear, we will be adding the exact same features that normal
> ext3fs has, correct?
Yes. AFAIK, ACLs are not essential for using ext3fs.
>> [...]
>> User land part
>>
>> I have to write usable BSD license mke2fs program, and e2fsck if we want to use
>> ext3 file system without additional packages from pkgsrc. Here I will also write
>> new or extend our mount_e2fs to support journaling.
>
> This is not correct. While we would PREFER a BSD-licensed set of tools, we
> can use GPL'd tools if needed. I mainly mention this as the other aspects
> of this project NEED to happen in the SoC time frame, while this can be
> cleaned up later.
Sorry about this; I will remove this part from my project proposal.
Regards
Adam Hamsik

Re: SoC project proposal, Canada, 2007-03-18 07:41:54
>> Note: this is not correct. While I do not question the idea that it
>> could be EXCRUCIATINGLY PAINFUL to use either an ffs1 or ffs2 file
>> system for a multi-TB file system, it can be done.
I think that characterization is a bit inaccurate.
At work we're using ffs1 for a filesystem just epsilon under 2T (it's a 1k/8k filesystem; we'd make it bigger except the RAID card it's on doesn't support logical drives >2T), and it works fine. What do you expect to be "EXCRUCIATINGLY PAINFUL" about it?
/~\ The ASCII                          der Mouse
\ / Ribbon Campaign
 X  Against HTML          mouse rodents.montreal.qc.ca
/ \ Email!       7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: SoC project proposal, United States, 2007-03-18 19:08:46
On Sun, Mar 18, 2007 at 01:10:21PM +0100, haad wrote:
> Bill Stouder-Studenmund wrote:
> > On Sat, Mar 17, 2007 at 12:19:39AM +0100, haad wrote:
> >> User land part
> >>
> >> I have to write usable BSD license mke2fs program, and e2fsck if we want to use
> >> ext3 file system without additional packages from pkgsrc. Here I will also write
> >> new or extend our mount_e2fs to support journaling.
> >
> > This is not correct. While we would PREFER a BSD-licensed set of tools, we
> > can use GPL'd tools if needed. I mainly mention this as the other aspects
> > of this project NEED to happen in the SoC time frame, while this can be
> > cleaned up later.
>
> Sorry for this, I will remove this part from my project proposal.
Just making it an optional, "if time permits" item will be fine.
You will be required to deliver what you promise in order to get the money, or we will have to somehow explain why you got full funding without delivering everything. Thus it is unfair to make (or even let) you list required deliverables that go beyond what is necessary. But "nice to have" items are fine if listed as such.
Take care,
Bill

Re: SoC project proposal, United States, 2007-03-18 19:17:19
On Sun, Mar 18, 2007 at 08:41:54AM -0400, der Mouse wrote:
> >> Note: this is not correct. While I do not question the idea that it
> >> could be EXCRUCIATINGLY PAINFUL to use either an ffs1 or ffs2 file
> >> system for a multi-TB file system, it can be done.
>
> I think that characterization is a bit inaccurate.
>
> At work we're using ffs1 for a filesystem just epsilon under 2T (it's a
> 1k/8k filesystem; we'd make it bigger except the RAID card it's on
> doesn't support logical drives >2T), and it works fine. What do you
> expect to be "EXCRUCIATINGLY PAINFUL" about it?
fsck time on reboot after unclean shutdown.
Actual operation should be fine.
It also depends on how well your drives work.
It can also depend on where your pain points are.
Take care,
Bill

FFS fsck (was: SoC project proposal), 2007-03-20 15:54:30
> What do you expect to be "EXCRUCIATINGLY PAINFUL" about it?
> fsck time on reboot after unclean shutdown.
Wouldn't background fsck (on softdeps) solve that?

Re: FFS fsck (was: SoC project proposal), United States, 2007-03-20 18:15:41
On Tue, Mar 20, 2007 at 09:54:30PM +0100, Edgar Fuß wrote:
> > What do you expect to be "EXCRUCIATINGLY PAINFUL" about it?
>
> > fsck time on reboot after unclean shutdown.
>
> Wouldn't background fsck (on softdeps) solve that?
Kinda, kinda not.
With a journal, you replay maybe 16 MB of journal data (write it from the journal to disk) and you're done.
With softdeps and background fsck, you still have to read the whole disk and fsck it. Sure, you're up and running quickly, but if you actually need something from the fixed fs, you have to wait. You also have to slosh around an amount of metadata that is proportional to the disk size, and you trigger LOTS of seeks. So while your file system is mounted, it is performing in a very degraded mode.
Take care,
Bill

Re: SoC project proposal, 2007-03-21 16:25:20
On Sat, Mar 17, 2007 at 12:19:39AM +0100, haad wrote:
> [...]
> EXT3 file system features:
>
> * Journaling
> * Over 16TB file system size
ffsv2 can actually handle filesystems much larger than that.
Also note that ext2/ext3 have some limitations, like 16-bit uid/gid and only one-second resolution for access/create/modification times. This may or may not be a problem, depending on your application.
--
Manuel Bouyer <bouyer antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--

Re: SoC project proposal, United Kingdom, 2007-03-21 16:58:32
On Wed, Mar 21, 2007 at 10:25:20PM +0100, Manuel Bouyer wrote:
> >
> > Over 16TB file system size
>
> ffsv2 can actually handle filesystems much larger than that.
Even lowly ffsv1 can do 256TB (with 64kB fragments), or 32TB with 64kB blocks and 8kB fragments.
(modulo bugs in untested code paths)
> Also note that ext2/ext3 has some limitations, like 16bit uid/gid,
> only one second resolution for access/create/modification times.
Hmm... those don't look good!
David
--
David Laight: david l8s.co.uk
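David's figures work out if ffsv1 can address 2^32 fragments, twice the 2^31 that Bill used earlier in the thread; presumably the difference is whether the 32-bit fragment number is treated as signed or unsigned, which would fit his "untested code paths" caveat. In binary units, with the 2^31 case for comparison:

```latex
\begin{aligned}
2^{32} \times 64\,\mathrm{KiB} &= 2^{32} \times 2^{16}\,\mathrm{B} = 2^{48}\,\mathrm{B} = 256\,\mathrm{TiB},\\
2^{32} \times 8\,\mathrm{KiB}  &= 2^{32} \times 2^{13}\,\mathrm{B} = 2^{45}\,\mathrm{B} = 32\,\mathrm{TiB},\\
2^{31} \times 64\,\mathrm{KiB} &= 2^{31} \times 2^{16}\,\mathrm{B} = 2^{47}\,\mathrm{B} = 128\,\mathrm{TiB}.
\end{aligned}
```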