Thread: Versioning databases

Versioning databases
2006-06-05 02:33:40
Sounds nice. I had thought of also (somehow) saving diffs in a db so
you could generate the test db you used previously. Don't know if there
is interest in this, but we had a prototype of this a few years ago.

Joe

Michael James wrote:
> Some biological databases actually come in versions,
> for example, we are up to the TIGR4 rice genome and
> swissprot UniProtKB/Swiss-Prot Release 50.0 of 30-May-2006.
>
> Others just change daily: NCBI:nr, NCBI:nt, etc.
>
> All this effort creates a problem for repeatability:
> the BLAST results you get next week
> won't quite be the ones you got today.
>
> It seems to me that the situation would be improved
> by tagging results "BLAST against ncbi.nih.gov nr 2006-06-05 000"
>
> This means we need to come up with a versioning scheme,
> and for anything without one, I'd suggest something as simple as
>   issuing_authority database date 3_digit_release_number
> e.g. ncbi.nih.gov nr 2006-06-05 000
>
> For uniqueness, use the internet name for issuing_authority.
>
> The database is the filename stripped of all qualifiers.
> Remove things like .gz .00.tar.gz
>
> The date in ISO format!
>
> 3 more digits to ensure uniqueness.
>
> Such a scheme would also be
> a big win for us database administrators.
> We could start to weave it through the tangled web
> of different providers and formats
> so we actually know the original issuing authority
> for the file we are downloading.
>
> What do you think?
> michaelj

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615
_______________________________________________
Bioclusters maillist - Bioclusters bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters
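
A minimal Python sketch of the tagging scheme Michael James proposes in
the message quoted above. The function names (strip_qualifiers,
version_tag) and the set of suffixes treated as "qualifiers" are
assumptions made for illustration; the post itself only gives .gz and
.00.tar.gz as examples of what to remove.

```python
from datetime import date

# Suffixes treated as removable qualifiers. This set is an assumption;
# the original post only mentions ".gz" and ".00.tar.gz".
QUALIFIER_SUFFIXES = {"gz", "tar", "bz2", "Z", "zip"}


def strip_qualifiers(filename):
    """Reduce a download filename such as 'nr.00.tar.gz' to 'nr'."""
    name = filename
    while True:
        stem, dot, ext = name.rpartition(".")
        # Peel off one trailing qualifier at a time: a known archive or
        # compression suffix, or a purely numeric volume number like "00".
        if dot and stem and (ext in QUALIFIER_SUFFIXES or ext.isdigit()):
            name = stem
        else:
            return name


def version_tag(issuing_authority, filename, released, release_number=0):
    """Build 'issuing_authority database date 3_digit_release_number'."""
    if not 0 <= release_number <= 999:
        raise ValueError("release_number must fit in three digits")
    database = strip_qualifiers(filename)
    # date.isoformat() gives the ISO YYYY-MM-DD form asked for in the post.
    return f"{issuing_authority} {database} {released.isoformat()} {release_number:03d}"


if __name__ == "__main__":
    print(version_tag("ncbi.nih.gov", "nr.00.tar.gz", date(2006, 6, 5)))
```

Run as a script, this prints "ncbi.nih.gov nr 2006-06-05 000", the
example tag given in the post.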

Versioning databases
2006-06-05 03:02:45
It seems much of this could be addressed by an svn repository. I know I'd sure appreciate typing 'svn update nt'. What was in your prototype?

Versioning databases
2006-06-05 03:06:10
Just a simple postgresql saving of compressed deltas with a simple
front end. SVN wasn't popular at the time, and cvs didn't look like it
could handle it. Even svn might blow lots of time in diff calculation.

Mike Cariaso wrote:
> It seems much of this could be addressed by a svn repository. I know I'd
> sure appreciate typing 'svn update nt'. What was in your prototype?

--
Joseph Landman, Ph.D
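
The idea described above (store compressed deltas in a database, then
replay them to regenerate an earlier copy) can be sketched in Python as
follows. This is not a reconstruction of the prototype, whose details
are not given in the thread. It assumes sqlite3 in place of PostgreSQL
so the example is self-contained, line-level deltas computed with
difflib, and zlib for compression; DeltaStore, make_delta and
apply_delta are hypothetical names.

```python
import difflib
import json
import sqlite3
import zlib


def make_delta(old, new):
    """Describe how to rebuild the list `new` from the list `old`."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])       # reuse old[i1:i2]
        elif j2 > j1:
            ops.append(["data", new[j1:j2]])   # literal lines from new
        # a pure delete contributes nothing to the rebuilt file
    return ops


def apply_delta(old, ops):
    """Rebuild the newer list of lines from `old` and a delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaStore:
    """Version 1 is stored in full; later versions as forward deltas."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS snapshot ("
            "version INTEGER PRIMARY KEY, payload BLOB NOT NULL)"
        )

    def add(self, lines):
        """Store a new version and return its number (starting at 1)."""
        count = self.db.execute("SELECT COUNT(*) FROM snapshot").fetchone()[0]
        body = lines if count == 0 else make_delta(self.get(count), lines)
        blob = zlib.compress(json.dumps(body).encode("utf-8"))
        self.db.execute(
            "INSERT INTO snapshot (version, payload) VALUES (?, ?)",
            (count + 1, blob),
        )
        self.db.commit()
        return count + 1

    def get(self, version):
        """Regenerate the requested version by replaying the deltas."""
        rows = self.db.execute(
            "SELECT payload FROM snapshot WHERE version <= ? ORDER BY version",
            (version,),
        ).fetchall()
        if version < 1 or len(rows) != version:
            raise KeyError(version)
        lines = []
        for n, (blob,) in enumerate(rows):
            body = json.loads(zlib.decompress(blob).decode("utf-8"))
            lines = body if n == 0 else apply_delta(lines, body)
        return lines
```

Calling store.add(lines) for each download and store.get(n) later gives
back version n exactly. The sketch holds whole files in memory and
difflib's matching can be slow on large inputs, so it only illustrates
the bookkeeping; it says nothing about whether the approach scales to
something the size of nt.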

Versioning databases
2006-06-05 14:43:11
On 5 Jun 2006, at 4:06 am, Joe Landman wrote:
> Just a simple postgresql saving of compressed deltas with a simple
> front end. SVN wasn't popular at the time, and cvs didn't look
> like it could handle it. Even svn might blow lots of time in diff
> calculation.

Oooh yes. I've seen what happens when even small things like
bacterial genomes are kept in CVS repositories. It's not pretty.
Putting nt in it? *shudder*

I'm not sure I'd trust diff and patch to do the right thing anyway,
especially with repetitive or highly similar sequences. Does diff
use enough context to be reliable for DNA sequence? I doubt it.
Diff is essentially a sequence alignment algorithm anyway, and we all
know all about those.

Tim
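
The concern about context can be illustrated with a small Python
sketch. A unified diff normally carries three lines of context around
each change, and a tool that has to find a hunk by its context alone
cannot tell identical stretches apart. The helper below, a hypothetical
context_matches function written for this illustration, lists every
offset at which a block of context lines occurs in a file.

```python
def context_matches(lines, context):
    """Return every offset at which the block `context` occurs in `lines`."""
    if not context:
        raise ValueError("context must contain at least one line")
    width = len(context)
    return [
        start
        for start in range(len(lines) - width + 1)
        if lines[start:start + width] == context
    ]


if __name__ == "__main__":
    # Six identical sequence lines: three lines of context fit in four places.
    repetitive = ["ACGT"] * 6
    print(context_matches(repetitive, ["ACGT"] * 3))

    # Distinct lines: the same amount of context pins down one position.
    distinct = ["AAAA", "CCCC", "GGGG", "TTTT", "ACGT", "TGCA"]
    print(context_matches(distinct, ["CCCC", "GGGG", "TTTT"]))
```

The first call prints [0, 1, 2, 3] and the second prints [1]. With
repetitive sequence the context alone is ambiguous, so a correct result
depends on the recorded line numbers still being right.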