Thread: Metadata section of XLS files
|
|
| Metadata section of XLS files |

|
2006-12-14 17:38:26 |
Hi guys,
I'm sorry to bother you with this, but you seem to be the
only people reachable outside Microsoft that know about the
XLS file format. I have worked a lot with the XML
versions and also made some minor enhancements to the OO
XSLTs that convert WordML to OO, but now I need some help
with the binary version.
I'm trying to write a comparison function that compares two
versions of a document with each other and should return
true if the documents have the same content and false
otherwise. I'm using an MD5 hash to do this.
The reason is that I want to eliminate versions of
documents in SharePoint where only metadata has changed.
Unfortunately, SharePoint is so clever that it writes
metadata not only into its own database, but also into the
document itself, if it is an Office document type.
Therefore I want to strip off the header (and trailer) that
contains metadata. For doc files this is quite easy. I just
had to remove (or overwrite with zeros) the first 2554 and
the last 1520 bytes and compare the files afterwards.
Unfortunately this strategy does not work with XLS files. It
seems that every sheet inside the file has its own copy of
metadata.
Can you give me any advice on how to get rid of the metadata
(just for the comparison)? Is there any byte sequence I can
search for and then overwrite the next x bytes with zeros?
I would be really thankful for any help.
Thanks a lot and best regards
René
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe sc.openoffice.org
For additional commands, e-mail: dev-help sc.openoffice.org
|
|
| Metadata section of XLS files |

|
2006-12-14 18:08:18 |
On Thu, Dec 14, 2006 at 06:38:26PM +0100, René Peinl wrote:
> Hi guys,
> I'm sorry to bother you with this, but you seem to be the
> only people reachable outside Microsoft that know about the
> XLS file format. I have worked a lot with the XML versions
> and also made some minor enhancements to the OO XSLTs that
> convert WordML to OO, but now I need some help with the
> binary version.
> I'm trying to write a comparison function that compares two
> versions of a document with each other and should return
> true if the documents have the same content and false
> otherwise. I'm using an MD5 hash to do this.
> The reason is that I want to eliminate versions of documents
> in SharePoint where only metadata has changed. Unfortunately,
> SharePoint is so clever that it writes metadata not only into
> its own database, but also into the document itself, if it
> is an Office document type.
> Therefore I want to strip off the header (and trailer) that
> contains metadata. For doc files this is quite easy. I just
> had to remove (or overwrite with zeros) the first 2554 and
> the last 1520 bytes and compare the files afterwards.
> Unfortunately this strategy does not work with XLS files. It
> seems that every sheet inside the file has its own copy of
> metadata.
> Can you give me any advice on how to get rid of the metadata
> (just for the comparison)? Is there any byte sequence I can
> search for and then overwrite the next x bytes with zeros?
> I would be really thankful for any help.
> Thanks a lot and best regards
> René
The metadata is stored in standard OLE2 format. You cannot
rely on it being at a specific byte position in the file.
There are simple tools available to dump the content (e.g.
via libgsf), but you would need to write something yourself
if your goal is to strip out some of the properties. The
code in libgsf (C) or HPSF/POI (Java) should make it fairly
simple.
There are some docs available on the properties in OLE2 at
http://jakarta.apache.org/poi/hpsf/
where you can also find some docs on the OLE2 container
format itself.
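To illustrate the idea, here is a minimal Python sketch of a comparison that ignores the property streams instead of fixed byte ranges. It assumes the streams have already been extracted from the OLE2 container by some library (libgsf, POI or similar) into a mapping from stream name to bytes. The function names and the `streams` mapping are hypothetical and only serve the example. The two names listed are the standard OLE2 property set streams; whether SharePoint writes metadata anywhere else is not covered here.

```python
import hashlib

# Standard OLE2 property set streams (the names start with the byte 0x05).
METADATA_STREAMS = {
    "\x05SummaryInformation",
    "\x05DocumentSummaryInformation",
}


def content_digest(streams):
    """Return an MD5 hex digest over all non-metadata streams.

    `streams` is a hypothetical mapping {stream name: bytes} that an
    OLE2 library is assumed to have produced from the container.
    """
    md5 = hashlib.md5()
    for name in sorted(streams):
        if name in METADATA_STREAMS:
            continue  # skip the document properties
        data = streams[name]
        # Hash name and length too, so stream boundaries are unambiguous.
        md5.update(name.encode("utf-16-le"))
        md5.update(len(data).to_bytes(8, "little"))
        md5.update(data)
    return md5.hexdigest()


def same_content(streams_a, streams_b):
    """True if both documents match once the property streams are ignored."""
    return content_digest(streams_a) == content_digest(streams_b)
```

Sorting the stream names makes the digest independent of the order in which the library happens to return them.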
|
|
| Metadata section of XLS files |

|
2006-12-15 15:35:03 |
René Peinl wrote:
> Hi guys, I'm sorry to bother you with this, but you seem to
> be the only people reachable outside Microsoft that know
> about the XLS file format. I have worked a lot with the XML
> versions and also made some minor enhancements to the OO
> XSLTs that convert WordML to OO, but now I need some help
> with the binary version. I'm trying to write a comparison
> function that compares two versions of a document with each
> other and should return true if the documents have the same
> content and false otherwise. I'm using an MD5 hash to do
> this. The reason is that I want to eliminate versions of
> documents in SharePoint where only metadata has changed.
> Unfortunately, SharePoint is so clever that it writes
> metadata not only into its own database, but also into the
> document itself, if it is an Office document type. Therefore
> I want to strip off the header (and trailer) that contains
> metadata. For doc files this is quite easy. I just had to
> remove (or overwrite with zeros) the first 2554 and the last
> 1520 bytes and compare the files afterwards. Unfortunately
> this strategy does not work with XLS files. It seems that
> every sheet inside the file has its own copy of metadata.
> Can you give me any advice on how to get rid of the metadata
> (just for the comparison)? Is there any byte sequence I can
> search for and then overwrite the next x bytes with zeros?
> I would be really thankful for any help. Thanks a lot and
We have a complete description of the OLE2 container file
format:
http://sc.openoffice.org/compdocfileformat.pdf
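As a small illustration of why fixed byte positions do not work, the following Python sketch reads a few fields from the compound document header. The offsets are the usual header layout for this container format, stated here as an assumption to be checked against the specification; the function name and the dictionary keys are made up for this example. Streams live in sectors, and where a given stream starts depends on the allocation tables and the directory, not on a constant offset.

```python
import struct

OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_header(data):
    """Parse a few fields from a compound document header (first 512 bytes)."""
    if len(data) < 512 or data[:8] != OLE2_SIGNATURE:
        raise ValueError("not an OLE2 compound document")
    # Offsets 30 and 32: sector sizes stored as powers of two.
    sector_shift, short_shift = struct.unpack_from("<HH", data, 30)
    # Offset 48: SecID of the first sector of the directory stream.
    (dir_secid,) = struct.unpack_from("<i", data, 48)
    sector_size = 1 << sector_shift
    return {
        "sector_size": sector_size,
        "short_sector_size": 1 << short_shift,
        "dir_secid": dir_secid,
        # Assumes the header occupies the first sector-sized block.
        "dir_offset": (dir_secid + 1) * sector_size,
    }
```

With the common 512-byte sectors, `dir_offset` equals 512 + SecID * 512. From the directory one can then look up the property streams by name rather than by position.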
Regards
Daniel
|
|
| Metadata section of XLS files |

|
2006-12-18 03:20:34 |
René Peinl wrote:
> Hi guys,
> I'm sorry to bother you with this, but you seem to be the
> only people reachable outside Microsoft that know about the
> XLS file format. I have worked a lot with the XML versions
> and also made some minor enhancements to the OO XSLTs that
> convert WordML to OO, but now I need some help with the
> binary version.
> I'm trying to write a comparison function that compares two
> versions of a document with each other and should return
> true if the documents have the same content and false
> otherwise. <snip>
> I would be really thankful for any help.
> Thanks a lot and best regards
> René
Is this a new OOo thing - providing support to M$
developers?
--
Xfce on PCLinuxOS, OOo 2.0.2 (en_GB).
Direct mail to "teaman" is not opened; if
necessary, email "realmail"
I try to take one day at a time - but sometimes several days
attack me at once (Ashleigh Brilliant)