Thread: Observations on setting up a PEAR channel




Observations on setting up a PEAR channel
user name
2007-03-04 06:40:38
Having just set up my own (test, internal) PEAR channel, I thought I'd
offer some thoughts and observations about the PEAR installer in
general, and how it manages packages. Before I start, let me say that I
appreciate very sincerely the enormous amount of work that Greg and
others have put into the PEAR installer and Greg's Chiara_PEAR_Server
package. It's all great.

First let me state my main point of reference: RPM. For those who are
not familiar with it, RPM (www.rpm.org) is a package management system
for Linux operating systems and is used by various OSes including Red
Hat Enterprise Linux (RHEL), CentOS, Fedora, SuSE and others. RPM, like
the PEAR package.xml system, allows the packaging of metadata (about
versioning, dependencies and suchlike) plus an actual piece of software
(binaries, data, configuration files etc.) into an output package. RPM
calls the metadata a "spec file" (the contents of which are remarkably
similar to a package.xml, albeit in a different format) and the
resulting package "an RPM package" (the equivalent of a PEAR .tar/.tgz
output package).

RPM is the base system but doesn't have any concept of "channels"
("repositories" in RPM terminology) or any features to actually support
resolution of dependencies. Thus, it is commonly used alongside package
management tools such as "yum" or "APT", which supply the missing link
by enabling one to do things like "yum install [somepackage]", which
will install "somepackage" and all of its dependencies, analogous to
"pear install --alldeps [somepackage]". I'll mainly refer to "yum" as
this is the standard tool on Fedora and CentOS, the systems I'm most
familiar with.

Enough about the overview, let's consider a couple of noteworthy things
that make RPM/yum different to PEAR.

1. Separation of channel information and package metadata

PEAR intrinsically ties a particular package file (.tgz) to a specific
channel. The <channel> element has to be defined in the package.xml file
and this is therefore encoded in the output package. In other words,
the channel is specified at source rather than destination. In contrast,
RPM spec files contain no explicit information about the repository from
which they will ultimately be served. Ditto for dependencies, which are
given as a package name *without* any channel information. Instead, all
*channel* metadata are defined dynamically at runtime in the yum
configuration file on the *end-user system* (destination), with snippets
like this (simple example):

[mychannelname]
name = Example Repository
baseurl = http://myrepo.example.com/


"mychannelname" and "Example Repository" are not special in the above
configuration, and could vary arbitrarily between destination systems.
Think of "mychannelname" as being like a channel alias.
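
To make the "alias" point concrete, here is a minimal Python sketch. It
is not part of yum; it only uses the standard configparser module to
read two yum-style snippets. The second section name and label are made
up for illustration. Both configurations resolve to the same repository
URL even though the local names differ.

```python
import configparser

# Two hypothetical end-user configurations that point at the same
# repository under different local aliases and display names.
CONFIG_A = """
[mychannelname]
name = Example Repository
baseurl = http://myrepo.example.com/
"""

CONFIG_B = """
[some-other-alias]
name = A completely different label
baseurl = http://myrepo.example.com/
"""


def repo_urls(config_text):
    """Return the set of base URLs defined in a yum-style config snippet."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {parser[section]["baseurl"] for section in parser.sections()}


# The alias and the name are local choices; only baseurl identifies the source.
print(repo_urls(CONFIG_A) == repo_urls(CONFIG_B))
```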

Separating metadata about the package itself from the transport route
(channel/repository) has a number of benefits:

a) It's easy to move/rename a package repository; no rebuilding of
packages is necessary.

b) Portability is improved; both output packages (tgz) and their
respective sources can be shared/copied between channels *without
changes*. There are many common uses for this, for example sharing
packages between two distinct repositories - e.g. a private and public
repo, where the public one might be a subset of the private one.

c) It "makes sense" in a lot of ways; it is a logical separation and
there is certainly an argument to be made that the mode of serving does
not belong in metadata about the package.

d) It makes the initial packaging process simpler; a package can be
built NOW and subsequently incorporated into a/some channel(s), unlike
with PEAR where a channel must be set up and 'channel-discover'd before
a package can even be built for that channel. This makes the barrier to
entry very high for someone, especially since Chiara_PEAR_Server can be
tricky to get up and running.

e) It means that tricky bugs like #10254 (RPM-building specs for
external channels fails) wouldn't exist.

Now, just out of interest, I started to have a little look at what the
technical issues might be in moving the channel metadata from originator
to end-user:

- The current package2 schema (and the code in
   PackageFile/v2/Validator.php) enforces that <channel> must be present;
   however, that's easy enough to change.

- The REST metadata could remain exactly the same; the <c> parts would
   just be filled in from the actual channel that's being generated
   rather than from the package.xmls of the contained packages.

- At least to kick off with, the dependency handling (specification of
   requirements) could stay pretty much the same and still include
   channel data. This is restrictive, but it would be a first step. (Note
   that removing the channel name from a dependency *does* have security
   implications, but not insurmountable ones - we work with it just fine
   in RPM land).

- The main thing that would need to change (and I don't think this is
   huge) is that the Registry would fill in the channel identifier at
   install-time, *from the actual channel that a package was installed
   from*, rather than from the package's metadata.

- We would not be able to extract channel data from static tarballs for
   use with things like pear make-rpm-spec, but that's OK.
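
The Registry change in the fourth point could be pictured roughly like
this. This is a toy Python model, not the real PEAR registry code; the
class, its methods, the package name and the version number are all
invented for illustration. The only point is that the channel is
recorded from where the package was served, not read from the package
itself.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy stand-in for the installer's registry (hypothetical, not PEAR's)."""
    installed: dict = field(default_factory=dict)

    def record_install(self, package_name, version, served_from):
        # The channel comes from the download source, not from anything
        # stored inside the package's own metadata.
        self.installed[(served_from, package_name)] = version

    def channels_of(self, package_name):
        """Return the channels from which a package of this name was installed."""
        return sorted(ch for (ch, name) in self.installed if name == package_name)


registry = Registry()
registry.record_install("Log", "1.0.0", served_from="pear.example.com")
print(registry.channels_of("Log"))
```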

Are there major architectural issues I'm overlooking? Downsides?

2. Managing your own channel

Chiara_PEAR_Server is a great tool. Let's get that out of the way
upfront. And it has many uses. However, for simple situations (e.g. a
single maintainer), it could be argued that it's overkill, not least
because it requires external complexities like a database and user list.
Let me explain how it works in yum land.

If I have a directory full of packages (.rpm packages that is, the
equivalent of .tgz output files), all I need to do is run, from the
command line, "createrepo ." to create metadata for that set of
packages. (That may include multiple packages and multiple versions of
the same package, by the way). That creates a directory called
"repodata", inside which are a number of XML files (similar to the
various ones PEAR uses in REST mode) that have all the metadata about
the packages in that directory. No external databases, tools or
configurations are required. To set this up as a public repository
(channel), *all* I need to do is start serving that directory from the
web. Nothing more. Then, someone else can add ("discover") that
repository on their system as described earlier and do "yum install
[mypackage]". There's a beautiful simplicity and flexibility in that.
Each directory is self-contained, and no extra software is required to
start serving a repository. Like PEAR with REST, it's also trivial to
mirror, as you simply need to mirror a directory of static files. To add
a new package, I just create the package, drop it into the directory and
do "createrepo ." again.

Now, I don't think that PEAR is that much different here - the REST
method of describing channel metadata is very similar to yum; I think we
are just missing a command line frontend to generate the metadata rather
than having to use a more complex tool like the web UI of
Chiara_PEAR_Server. (Again: I'm not saying that anything about
Chiara_PEAR_Server is bad. It would just be nice to have a slightly
simpler building-block). I think this should be possible - in theory
all the XML metadata is derivable solely from the package.xml files of
the packages in the directory, right? In fact if nobody else has done
it, I'm tempted to code this up myself when I get the time. My use case
would be something like this:

pear makechannel channelname /path/to/channel

for example

pear makechannel pear.example.com .

to create REST metadata for pear.example.com for packages in the current
directory. I would structure the directory layout slightly differently,
but I think that's cool as everything (including the download URL) is
defined explicitly in the REST data (for the download URL,
r/[pkgname]/[ver].xml -> <g>[url]</g>)
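
As a rough idea of what such a "makechannel" step might do, here is a
Python sketch (a real implementation would presumably be written in PHP
as a PEAR command). It is an assumption-laden illustration, not the real
REST schema: only the r/[pkgname]/[ver].xml path and the <c> and <g>
elements come from the discussion above, while the function names, the
<r>, <p> and <v> elements, the base URL and the demo package are
placeholders. It reads package.xml from each .tgz in a directory and
produces one small XML document per release.

```python
import io
import tarfile
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def local(tag):
    """Strip any XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_package_xml(archive_path):
    """Return (name, release_version) from the package.xml inside a tarball."""
    with tarfile.open(archive_path) as tar:
        member = next(m for m in tar.getmembers()
                      if Path(m.name).name == "package.xml")
        root = ET.parse(tar.extractfile(member)).getroot()
    name = version = None
    for child in root:
        if local(child.tag) == "name":
            name = child.text.strip()
        elif local(child.tag) == "version":
            for sub in child:
                if local(sub.tag) == "release":
                    version = sub.text.strip()
    return name, version


def make_channel(channel_name, directory, base_url):
    """Build {relative path: XML text} for every .tgz release in directory."""
    files = {}
    for archive in sorted(Path(directory).glob("*.tgz")):
        name, version = read_package_xml(archive)
        r = ET.Element("r")
        ET.SubElement(r, "p").text = name
        # The channel is filled in from the channel being generated.
        ET.SubElement(r, "c").text = channel_name
        ET.SubElement(r, "v").text = version
        ET.SubElement(r, "g").text = f"{base_url}/{archive.name}"
        files[f"r/{name}/{version}.xml"] = ET.tostring(r, encoding="unicode")
    return files


# Demo: build one tiny release archive in a scratch directory and index it.
with tempfile.TemporaryDirectory() as scratch:
    package_xml = (b'<package xmlns="http://pear.php.net/dtd/package-2.0">'
                   b"<name>Example</name><version><release>1.0.0</release>"
                   b"<api>1.0.0</api></version></package>")
    info = tarfile.TarInfo("package.xml")
    info.size = len(package_xml)
    with tarfile.open(Path(scratch) / "Example-1.0.0.tgz", "w:gz") as tar:
        tar.addfile(info, io.BytesIO(package_xml))
    demo = make_channel("pear.example.com", scratch,
                        "http://pear.example.com/get")

for path, body in demo.items():
    print(path, body)
```

Nothing here needs a database or a user list; the output is a set of
static files that could be written to disk and served as-is.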

So, to summarise, the PEAR Installer and RPM+yum fulfil very similar
roles in their respective areas. However, as an observer, I had to jump
through many more hoops to get up and running with a PEAR channel. None
of them were pointless, or useless, but it seems to me that it would
be nice to lower the barrier to entry for simple cases.

-- 
PEAR Development Mailing List (http://pear.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


Re: Observations on setting up a PEAR channel
user name
2007-03-04 10:37:41
Tim Jackson wrote:
> Enough about the overview, let's consider a couple of noteworthy things
> that make RPM/yum different to PEAR.

0. PEAR has automatic channel server resolution

Unlike RPMs, where the server location is not defined in the spec file,
if you don't already know where to find a package, you have to manually
add the server to your list of repositories. RPM puts the burden of
locating servers on the user. This works great when you only want to
add 1 or 2 packages at a time, but from personal experience, it is a
royal pain in the ass if you want to install non-standard RPMs. In
fact, I have the same issue with all OS-based distribution systems.

0.5.

That's another difference: RPM was designed to allow maintaining remote
updates of an OS, where the need for cross-server dependencies is
non-existent. If you have an RPM named "Log" from one server, RPM
assumes that an RPM named "Log" from another server is in fact the same,
and will treat them as identical. This can be extremely dangerous in
PHP, where common tasks implemented in userland always have the same
name. Without defining the source of the package in the meta-data
(package.xml) we run the risk of allowing people to destroy their
installation by accidentally "upgrading" to a package from another
channel.
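
The collision can be illustrated with a small Python sketch. This is a
toy model, not PEAR or RPM code; the function, the dictionaries, the
version numbers and the second channel name are made up. When packages
are identified by name alone, a "Log" from another server looks like an
upgrade; when they are identified by (channel, name), it does not.

```python
def find_upgrades(installed, available, key):
    """Return available packages that would replace an installed one.

    `key` extracts the identity used to decide two packages are "the same".
    """
    current = {key(p): p for p in installed}
    return [p for p in available
            if key(p) in current and p["version"] > current[key(p)]["version"]]


def by_name(package):
    """Identity without channel information."""
    return package["name"]


def by_channel_and_name(package):
    """Identity that includes the channel."""
    return (package["channel"], package["name"])


installed = [{"channel": "pear.php.net", "name": "Log", "version": (1, 9, 0)}]
available = [{"channel": "pear.example.com", "name": "Log", "version": (2, 0, 0)}]

print(len(find_upgrades(installed, available, by_name)))              # 1
print(len(find_upgrades(installed, available, by_channel_and_name)))  # 0
```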

More sinisterly, if a malicious user were to clone the PEAR package, for
instance, and make it a hidden dependency in their own package, it would
be possible to "upgrade" the PEAR installer itself and install spyware
without the end user ever knowing.

This is probably the main reason that RPM requires the end user to
manually set up servers, as it requires a tradeoff of convenience for
security. PEAR puts the onus of convenience on the package distributor
to a certain extent, but provides extra security to ensure that
malicious packages are not possible.

A lot of thought went into how to do this properly, and because of the
initial decision to allow auto-discovery of channel services, meta-data
cannot be separated from the package without introducing tremendous
risk.

However, the channel.xml specification fully supports mirroring, which
is a simple way to move packages to another server without having to
change anything. The mirrors are defined at the source channel, again
for security reasons.

> d) It makes the initial packaging process simpler; a package can be
> built NOW and subsequently incorporated into a/some channel(s), unlike
> with PEAR where a channel must be set up and 'channel-discover'd before
> a package can even be built for that channel. This makes the barrier to
> entry very high for someone, especially since Chiara_PEAR_Server can be
> tricky to get up and running.

This is not true - if you take the channel.xml from
http://pear.php.net/channel.xml and modify it to define a new channel,
you can

pear channel-add channel.xml

and immediately install/upgrade packages from that channel.

> Now, just out of interest, I started to have a little look at what the
> technical issues might be in moving the channel metadata from originator
> to end-user:

The installer would have to be completely redesigned, and frankly the
better solution is to make it easier to set up channel information.

It is quite possible to create a channel server that does not require a
database, all we would need to do is generate REST from the release
archives. This has been on the distant TODO list in my mind, and would
be a wonderful PEAR package as a channel server for "lite" channels.

> 2. Managing your own channel

It might be helpful to understand the history behind channels.
Originally, pear.php.net was an XML-RPC-based service, and development
versions of PEAR 1.4.0 right up to 1.4.0a12
(http://pear.php.net/package/PEAR/download/1.4.0a12) used XML-RPC.
PEAR_Server was designed and released prior to 1.4.0a12, and so like
pear.php.net, REST was an addon after the fact. At a certain point, I
found my energy to redesign was lagging. In addition, I suspected
that adoption of Chiara_PEAR_Server was not high enough to really see
the issues, and that I should wait to redesign. Now that people are
really starting to use it, I think the time is ripe for a revisit and an
official channel server "lite" distributed through pear.php.net.

> So, to summarise, the PEAR Installer and RPM+yum fulfil very similar
> roles in their respective areas. However, as an observer, I had to jump
> through many more hoops to get up and running with a PEAR channel. None
> of them were pointless, or useless, but it seems to me that it would
> be nice to lower the barrier to entry for simple cases.

I agree, it is very interesting to see the barriers from others'
perspectives, thanks for this interesting and provocative post.

Greg



Re: Re: Observations on setting up a PEAR channel
user name
2007-03-04 15:54:27
Greg Beaver wrote:

First, thanks for your comments Greg. To reiterate, in case it wasn't
clear, I'm not criticising any past design decisions, just offering some
(hopefully) constructive thoughts from a "third party" perspective.
Incidentally, if there are any background docs I should be reading,
please do point me in the right direction (similarly for references in
the PEAR Installer Manifesto printed book, which I have read and which
has actually given me a very interesting introduction to much of the
history which precedes where we find ourselves today).

> Unlike RPMs, where the server location is not defined in the spec file,
> if you don't already know where to find a package, you have to manually
> add the server to your list of repositories. RPM puts the burden of
> locating servers on the user. This works great when you only want to
> add 1 or 2 packages at a time, but from personal experience, it is a
> royal pain in the ass if you want to install non-standard RPMs. In
> fact, I have the same issue with all OS-based distribution systems.

This is all true and a significant benefit of the way PEAR does things.
Thanks for the reminder.

> That's another difference: RPM was designed to allow maintaining remote
> updates of an OS, where the need for cross-server dependencies is
> non-existent.

I'm not sure that's quite right; we often have cross-server deps.
They're just not explicitly defined as such and don't have
auto-discovery. As you rightly point out (and I touched on in my
previous comments), this has both up and down sides. The ability for a
package (including a core package) to be "upgraded" across repositories
is a side-effect of this, which can be exploited both usefully (where
intended) and maliciously (if a repository is compromised). I think
PEAR is very clever here and the namespace delimitation is a good thing;
I was just pointing out some of the complications it causes from a new
user point of view.

> However, the channel.xml specification fully supports mirroring, which
> is a simple way to move packages to another server without having to
> change anything. The mirrors are defined at the source channel, again
> for security reasons.

Interesting point, although I wasn't specifically talking about mirrors.
I was thinking in general of the case where (let's say) a friend decides
that he wants to rsync packages X, Y and Z (but not A, B and C) from my
package repository, and import them into his (different) repository.

> if you take the channel.xml from
> http://pear.php.net/channel.xml and modify it to define a new channel,
> you can
> pear channel-add channel.xml
> and immediately install/upgrade packages from that channel.

Ah. Interesting to know.

> It is quite possible to create a channel server that does not require a
> database, all we would need to do is generate REST from the release
> archives. This has been on the distant TODO list in my mind, and would
> be a wonderful PEAR package as a channel server for "lite" channels.

Excellent, that confirms what I thought then.

> Now that people are really starting to use it, I think the time is ripe
> for a revisit and an official channel server "lite" distributed through
> pear.php.net.

Fantastic. If you or anyone else does make a start on this, please let
me know - I would like to help. And when I get time, if nobody else is
working on it, I may well kick it off myself.


Tim


