RFC: yup enhancements


Subject: RFC: yup enhancements
From: Hollis Blanchard (hollis-lists@austin.rr.com)
Date: Mon Oct 15 2001 - 22:11:17 MDT


The current yup has some limitations:

1. It understands only one (main) repository. You can't draw from multiple
repositories, and of course you can't use a repository which is
incomplete.
2. It can't handle multiple architectures at all, including "source" (as in
.src.rpm).
3. It has a pretty big memory footprint, and is a bit slow to start up.

Here are some yup internals:

- yup maintains a list of initially-available packages in a big text file
(yup.db.init).
- yup downloads yup.db.diff files which it applies as a patch to
yup.db.init, producing yup.db.list.
- yup.db.list is the master list of all available packages. Here's a sample:

.begin RPM
  .filename ./YellowDog/RPMS/ElectricFence-2.2.2-5.ppc.rpm
  .name ElectricFence
  .version 2.2.2
  .release 5
  .size 64186
  .license GPL
  .arch ppc
  .rpmgroup Development/Tools
  [...]

This file on my YDL 2.0 system is currently more than 3 MB in size. The file
is parsed into objects (with attributes 'name', 'version', etc). That takes
a considerable amount of time.
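
To make the parsing step concrete, here is a minimal sketch of reading
records like the sample above into dictionaries. It is not yup's actual
parser: the function name parse_package_list is hypothetical, the sketch
uses modern Python syntax, and because the sample is truncated it simply
assumes that a new record starts at each ".begin" line.

```python
def parse_package_list(text):
    """Parse '.begin RPM' records into a list of attribute dicts.

    Assumption: each record starts with a '.begin' line and every other
    line starting with '.' is a '.key value' pair. How a record ends is
    not shown in the sample, so no terminator is required here.
    """
    packages = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(".begin"):
            current = {}
            packages.append(current)
        elif line.startswith(".") and current is not None:
            # ".name ElectricFence" -> key "name", value "ElectricFence"
            key, _, value = line[1:].partition(" ")
            current[key] = value
    return packages


sample = """\
.begin RPM
  .filename ./YellowDog/RPMS/ElectricFence-2.2.2-5.ppc.rpm
  .name ElectricFence
  .version 2.2.2
  .release 5
  .size 64186
  .license GPL
  .arch ppc
  .rpmgroup Development/Tools
"""

packages = parse_package_list(sample)
print(packages[0]["name"], packages[0]["version"], packages[0]["release"])
```

Every line of the 3+ MB file goes through a loop like this on each
startup, which is where the parsing time comes from.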

The reason for the .diff file is clear: bandwidth. Without it, whenever a
package in yup.db.list is updated, users would have to download the entire
3+ MB list again. By using the diff, they can download only the package
information that has changed.

Ok, so that's how it is right now.

Python (the language yup is written in) has an interesting feature: it
allows you to save an object to a file and load it again later. This is
called pickling an object.
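
As a small illustration of pickling, the sketch below saves one package's
header information to a file and loads it back using the standard pickle
module. The dictionary layout and the file name are assumptions made for
the example, not anything yup defines.

```python
import os
import pickle
import tempfile

# Hypothetical header information for one package (layout is an assumption).
info = {"name": "ElectricFence", "version": "2.2.2", "release": "5", "arch": "ppc"}

# Save the object to a file...
path = os.path.join(tempfile.mkdtemp(), "ElectricFence.pickle")
with open(path, "wb") as f:
    pickle.dump(info, f)

# ...and load it again later, with no text parsing involved.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == info)
```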

I think this could help us in a few ways:
- pickles would eliminate parsing time (replacing it with a much faster
load time).
- pickles would probably reduce yup.db.list size significantly because they
would not need to be human-readable. (All the yup.db files should really be
transparently gzipped already, but that doesn't matter now...)
- pickles could also allow us to avoid keeping both yup.db.list and
yup.db.init around (detailed below).

Pickles would *only* be used to hold package header information - not the
packages themselves. In fact it may be best to pickle each RPM's Python
header structure itself (I haven't looked at it yet).

The (server) directory structure I'm considering is something like this:

yup.version: [text file]
    0.8
yup.arch.ppc/list: [text file]
    ElectricFence 2.2.2-5
    ImageMagick 5.2.7-2
    ...
yup.arch.ppc/pkginfo/ [directory of pickled objects]
    ElectricFence.pickle
    ImageMagick.pickle
    ...
yup.arch.ppc/pkg/ [directory of RPMs]
    ElectricFence-2.2.2-5.ppc.rpm
    ImageMagick-5.2.7-2.ppc.rpm
    ...

I'm not sure any other files outside a yup.arch directory are needed. Please
correct me if I've overlooked something.
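
A rough sketch of how a client might read the 'list' file and build paths
into this layout follows. The helper names (parse_list, pkginfo_path,
pkg_path) are hypothetical, and the sketch assumes each 'list' line is
exactly "name version-release" as in the example above.

```python
def parse_list(text):
    """Map package name -> 'version-release' from the text of a 'list' file."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            name, version_release = parts
            versions[name] = version_release
    return versions


def pkginfo_path(arch, name):
    """Server-relative path of a package's pickled header information."""
    return "yup.arch.%s/pkginfo/%s.pickle" % (arch, name)


def pkg_path(arch, name, version_release):
    """Server-relative path of the RPM itself."""
    return "yup.arch.%s/pkg/%s-%s.%s.rpm" % (arch, name, version_release, arch)


listing = parse_list("ElectricFence 2.2.2-5\nImageMagick 5.2.7-2\n")
print(pkginfo_path("ppc", "ElectricFence"))
print(pkg_path("ppc", "ImageMagick", listing["ImageMagick"]))
```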

Before executing any action, yup would do the following for each
repository:
1. Always download the 'list' file for each arch.
2. Compare the version of each package in 'list' with the version of the
local pickle.
3. If the remote version is more current, download the updated pickle.
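
The comparison in steps 2 and 3 could look roughly like the sketch below.
The function name packages_to_fetch is hypothetical, and the sketch makes
one simplifying assumption: any difference between the remote and local
version string counts as "more current". A real implementation would need
proper RPM version ordering instead of a plain inequality test.

```python
def packages_to_fetch(remote_versions, local_versions):
    """Return names whose pickles should be downloaded.

    Both arguments map package name -> 'version-release'.
    Simplification (an assumption): a differing or missing local version
    is treated as stale; no real version ordering is done.
    """
    stale = []
    for name, remote in sorted(remote_versions.items()):
        if local_versions.get(name) != remote:
            stale.append(name)
    return stale


remote = {"ElectricFence": "2.2.2-5", "ImageMagick": "5.2.7-2"}
local = {"ElectricFence": "2.2.2-5", "ImageMagick": "5.2.6-1"}
print(packages_to_fetch(remote, local))
```

Only the small 'list' file is downloaded every time; pickles are fetched
just for the names this returns.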

In this way, yup keeps the most up-to-date pickles locally at all times,
which it can then use to quickly make decisions regarding the availability
of updates, dependencies, etc. There is also no single 'master' list - the
pickles can be organized by repository. (In theory this organization could
be added to the current yup source, but I believe the code makes too many
assumptions that only one package list exists.)
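
One possible way to organize the local pickles by repository is sketched
below. The cache layout, the repository names "main" and "extras", and
both function names are purely illustrative assumptions.

```python
import os


def local_pickle_path(cache_root, repository, arch, name):
    """Local cache path for one pickle, kept separate per repository."""
    return os.path.join(cache_root, repository, "yup.arch." + arch,
                        "pkginfo", name + ".pickle")


def repositories_offering(name, versions_by_repo):
    """List the repositories whose 'list' file mentions the package."""
    return sorted(repo for repo, versions in versions_by_repo.items()
                  if name in versions)


# Hypothetical per-repository contents of the 'list' files.
versions_by_repo = {
    "main": {"ElectricFence": "2.2.2-5", "ImageMagick": "5.2.7-2"},
    "extras": {"ImageMagick": "5.2.7-2"},
}
print(repositories_offering("ImageMagick", versions_by_repo))
print(repositories_offering("ElectricFence", versions_by_repo))
```

Because each repository has its own listing, an incomplete repository
is not a problem: a package is simply looked up in whichever
repositories offer it.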

There are optimizations that can and should be made, but does this general
idea sound ok? Any comments welcome.

-Hollis


