Thread: converting HTML to OpenOffice Writer




converting HTML to OpenOffice Writer
user name
2006-07-10 00:53:29
I have an XSLT style sheet that converts HTML to an OO document.

It's pretty rough; I only have one transformation that I need to do
right now, so there might be all sorts of problems with it.

However, it might be of interest to people here.

Anyone want to see it? Help me make it better?

-- 
Nic Ferrier
http://www.tapsellferrier.co.uk   for all your tapsell ferrier needs

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@xml.openoffice.org
For additional commands, e-mail: dev-help@xml.openoffice.org

converting HTML to OpenOffice Writer
user name
2006-07-10 07:45:33
On 10/07/06, Nic James Ferrier <nferrier@tapsellferrier.co.uk> wrote:
> I have an XSLT style sheet that converts HTML to an OO document.
>
> It's pretty rough; I only have one transformation that I'm needing to
> do right now so there might be all sorts of problems with it.
>
> However, it might be of interest to people here.
>
>
> Anyone want to see it? Help me make it better?

Hi Nic.
   I'll help if you want some? Might be an idea to use it with
TagSoup (http://mercury.ccil.org/~cowan/XML/tagsoup/), since most
HTML isn't all that clever?

regards

-- 
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk


converting HTML to OpenOffice Writer
user name
2006-07-10 10:14:38
Hi Dave

>    I'll  help if you want osme? Might be an idea to use it with
> http://mercury.ccil.org/~cowan/XML/tagsoup/  tagsoup since most
> html isn't all that clever?

Hmmm... Maybe. But I don't see this ever becoming a generalized
converter from HTML to OO. I see it as a step in a specific pipeline
that requires quite good HTML. A very adaptable, generalized converter
would need mapping support between HTML and OO, and that will be
complicated. More complicated than I want, anyway.


For example, I am publishing my CV like this. I maintain the CV in
Emacs org-mode. From there I generate an XOXO microformat file, which I
then XSLT into well-marked-up HTML (with DIVs and things).

I can then use html2oo.xslt to transfer that into OO, and from there get
Word or anything else that OO can spit out.


Another example of an application I had in mind is something I built
for Thompson: it built websites out of legal content by converting
their SGML content to XML and then to HTML via XSLT. I also had to
convert the XML to Word by using an XSL-FO processor.

But now I would just have a single HTML design, with CSS providing the
look for the web pages and html2oo.xslt producing Word (via
OpenOffice).


Anyway... I've inlined the stylesheet at the bottom. As I said, it's
not comprehensive yet, but I will add more elements as I need them.

Right now, I'm controlling the resulting OO file with a Makefile that
looks like this:

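  # Zip the unpacked template, now holding the new content.xml, back into doc.odt.
  doc.odt: doc/content.xml
	cd doc && zip -r ../doc.odt *

  # Generate content.xml; --html makes xsltproc parse doc.html as HTML.
  doc/content.xml: html2oo.xslt doc.html doc
	xsltproc --html html2oo.xslt doc.html > doc/content.xml

  # Unpack an existing doc.odt (e.g. one saved from Writer) to act as the template.
  doc:
	[ -d doc ] || ( mkdir doc && unzip -d doc doc.odt )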


There are options for making this better, but it kinda depends on what
tools you want to use for the XSLT.
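One such option, if Python is available, is to skip the unpacked doc/ directory and build the .odt straight from the template. The sketch below is an assumption, not part of html2oo (write_odt is a hypothetical name): it copies every entry of a template .odt except content.xml, which it replaces with the XSLT output. It writes the mimetype entry first and uncompressed, as the ODF packaging rules require.

```python
import zipfile


def write_odt(template_odt, content_xml, out_odt):
    """Write out_odt: a copy of template_odt with content.xml replaced.

    template_odt and out_odt may be paths or binary file objects, but must
    not be the same file. content_xml is the XML text produced by the XSLT.
    (Hypothetical helper, not part of html2oo.xslt.)
    """
    with zipfile.ZipFile(template_odt) as src, \
            zipfile.ZipFile(out_odt, "w", zipfile.ZIP_DEFLATED) as dst:
        # ODF requires 'mimetype' to be the first entry, stored uncompressed.
        dst.writestr("mimetype", src.read("mimetype"),
                     compress_type=zipfile.ZIP_STORED)
        for info in src.infolist():
            # mimetype is already written; directory entries are not needed.
            if info.filename == "mimetype" or info.is_dir():
                continue
            if info.filename == "content.xml":
                data = content_xml.encode("utf-8")
            else:
                data = src.read(info.filename)
            # Everything else is copied unchanged, deflate-compressed.
            dst.writestr(info.filename, data)
```

For example, `write_odt("doc.odt", xml_text, "cv.odt")`, where `xml_text` holds what xsltproc wrote. The template's META-INF/manifest.xml is copied as-is, which works because the replaced entry keeps the name content.xml.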


If I set up a darcs (http://abridgegame.org/darcs/) repository for this,
would anyone contribute, do you think? Would you?


<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
                xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
                xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
                xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
                xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
                xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
                xmlns:xlink="http://www.w3.org/1999/xlink"
                xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
                xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
                xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
                xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
                xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
                xmlns:math="http://www.w3.org/1998/Math/MathML"
                xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
                xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
                xmlns:ooo="http://openoffice.org/2004/office"
                xmlns:ooow="http://openoffice.org/2004/writer"
                xmlns:oooc="http://openoffice.org/2004/calc"
                xmlns:dom="http://www.w3.org/2001/xml-events"
                xmlns:xforms="http://www.w3.org/2002/xforms"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <!--
         Copyright (C) 2006 by Tapsell-Ferrier Limited

         This program is free software; you can redistribute it and/or modify
         it under the terms of the GNU General Public License as published by
         the Free Software Foundation; either version 2, or (at your option)
         any later version.

         This program is distributed in the hope that it will be useful,
         but WITHOUT ANY WARRANTY; without even the implied warranty of
         MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
         GNU General Public License for more details.

         You should have received a copy of the GNU General Public License
         along with this program; see the file COPYING.  If not, write to the
         Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
         Boston, MA  02110-1301  USA
      -->

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/html">
        <office:document-content office:version="1.0">
            <office:scripts/>
            <office:font-face-decls>
                <style:font-face style:name="StarSymbol" svg:font-family="StarSymbol" style:font-charset="x-symbol"/>
                <style:font-face style:name="DejaVu Sans1" svg:font-family="'DejaVu Sans'" style:font-pitch="variable"/>
                <style:font-face style:name="DejaVu Serif" svg:font-family="'DejaVu Serif'" style:font-family-generic="roman" style:font-pitch="variable"/>
                <style:font-face style:name="DejaVu Sans" svg:font-family="'DejaVu Sans'" style:font-family-generic="swiss" style:font-pitch="variable"/>
            </office:font-face-decls>
            <office:automatic-styles>
                <style:style style:name="Table1" style:family="table">
                    <style:table-properties style:width="6.925in" table:align="margins"/>
                </style:style>
                <style:style style:name="Table1.A" style:family="table-column">
                    <style:table-column-properties style:column-width="3.4625in" style:rel-column-width="32767*"/>
                </style:style>
                <style:style style:name="Table1.A1" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in"
                        fo:border-left="0.0007in solid #000000" fo:border-right="none"
                        fo:border-top="0.0007in solid #000000" fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table1.B1" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in" fo:border="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table1.A2" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in"
                        fo:border-left="0.0007in solid #000000" fo:border-right="none"
                        fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table1.B2" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in"
                        fo:border-left="0.0007in solid #000000" fo:border-right="0.0007in solid #000000"
                        fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table2" style:family="table">
                    <style:table-properties style:width="6.925in" table:align="margins"/>
                </style:style>
                <style:style style:name="Table2.A" style:family="table-column">
                    <style:table-column-properties style:column-width="3.4625in" style:rel-column-width="32767*"/>
                </style:style>
                <style:style style:name="Table2.A1" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in"
                        fo:border-left="0.0007in solid #000000" fo:border-right="none"
                        fo:border-top="0.0007in solid #000000" fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table2.B1" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in" fo:border="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table2.A2" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in"
                        fo:border-left="0.0007in solid #000000" fo:border-right="none"
                        fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="Table2.B2" style:family="table-cell">
                    <style:table-cell-properties fo:padding="0.0382in"
                        fo:border-left="0.0007in solid #000000" fo:border-right="0.0007in solid #000000"
                        fo:border-top="none" fo:border-bottom="0.0007in solid #000000"/>
                </style:style>
                <style:style style:name="P1" style:family="paragraph" style:parent-style-name="Table_20_Heading">
                    <style:paragraph-properties fo:text-align="start" style:justify-single-word="false"/>
                    <style:text-properties fo:font-style="normal" fo:font-weight="normal"
                        style:font-style-asian="normal" style:font-weight-asian="normal"
                        style:font-style-complex="normal" style:font-weight-complex="normal"/>
                </style:style>
                <style:style style:name="P2" style:family="paragraph" style:parent-style-name="Standard" style:list-style-name="L1"/>
                <style:style style:name="P3" style:family="paragraph" style:parent-style-name="Standard" style:list-style-name="L2"/>
                <text:list-style style:name="L1">
                    <text:list-level-style-bullet text:level="1" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="0.25in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="2" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="0.5in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="3" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="0.75in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="4" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="1in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="5" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="1.25in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="6" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="1.5in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="7" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="1.75in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="8" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="2in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="9" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="2.25in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="10" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="2.5in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                </text:list-style>
                <text:list-style style:name="L2">
                    <text:list-level-style-bullet text:level="1" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="0.25in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="2" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="0.5in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="3" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="0.75in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="4" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="1in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="5" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="1.25in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="6" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="1.5in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="7" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="1.75in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="8" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CB;">
                        <style:list-level-properties text:space-before="2in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="9" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25A0;">
                        <style:list-level-properties text:space-before="2.25in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                    <text:list-level-style-bullet text:level="10" text:style-name="Bullet_20_Symbols" style:num-suffix="." text:bullet-char="&#x25CF;">
                        <style:list-level-properties text:space-before="2.5in" text:min-label-width="0.25in"/>
                        <style:text-properties style:font-name="StarSymbol"/>
                    </text:list-level-style-bullet>
                </text:list-style>
            </office:automatic-styles>
            <xsl:apply-templates select="body"/>
        </office:document-content>
    </xsl:template>

    <xsl:template match="body">
        <office:body>
            <office:text>
                <office:forms form:automatic-focus="false" form:apply-design-mode="false"/>
                <text:sequence-decls>
                    <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
                    <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
                    <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
                    <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
                </text:sequence-decls>
                <xsl:apply-templates select="node()"/>
            </office:text>
        </office:body>
    </xsl:template>

    <xsl:template match="h1">
        <text:h text:style-name="Heading_20_1"><xsl:apply-templates select="node()"/></text:h>
    </xsl:template>

    <xsl:template match="h2">
        <text:h text:style-name="Heading_20_2"><xsl:apply-templates select="node()"/></text:h>
    </xsl:template>

    <xsl:template match="h3">
        <text:h text:style-name="Heading_20_3"><xsl:apply-templates select="node()"/></text:h>
    </xsl:template>

    <xsl:template match="h4">
        <text:h text:style-name="Heading_20_4"><xsl:apply-templates select="node()"/></text:h>
    </xsl:template>


    <xsl:template match="table">
        <table:table table:name="Table1" table:style-name="Table1">
            <table:table-column table:style-name="Table1.A" table:number-columns-repeated="2"/>
            <!-- FIXME: should not do this...
                 instead simply apply on node() and have template matches for tr[th] -->
            <!-- ODF expects at most one table:table-header-rows in a table's
                 row sequence, so all header rows share a single wrapper. -->
            <xsl:if test="tr[th]">
                <table:table-header-rows>
                    <xsl:for-each select="tr[th]">
                        <table:table-row>
                            <xsl:apply-templates select="th|td"/>
                        </table:table-row>
                    </xsl:for-each>
                </table:table-header-rows>
            </xsl:if>
            <xsl:for-each select="tr[td]">
                <table:table-row>
                    <xsl:apply-templates select="td"/>
                </table:table-row>
            </xsl:for-each>
        </table:table>
    </xsl:template>

    <xsl:template match="th|td">
        <table:table-cell table:style-name="Table1.A1" office:value-type="string">
            <xsl:call-template name="text_applyer"/>
        </table:table-cell>
    </xsl:template>

    <xsl:template match="ul">
        <text:list text:style-name="L1">
            <!-- FIXME: should not do this...
                 instead simply apply on node() and have template matches for li -->
            <xsl:for-each select="li">
                <text:list-item><xsl:call-template name="text_applyer"/></text:list-item>
            </xsl:for-each>
        </text:list>
    </xsl:template>

    <xsl:template name="text_applyer">
        <xsl:choose>
            <xsl:when test="text()"><text:p text:style-name="Standard"><xsl:value-of select="."/></text:p>
            </xsl:when>
            <xsl:otherwise><xsl:apply-templates select="node()"/></xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <xsl:template match="p">
        <text:p text:style-name="Standard"><xsl:apply-templates select="node()"/></text:p>
        <text:p text:style-name="Standard"></text:p>
    </xsl:template>

</xsl:stylesheet>
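For comparison, the element-to-style mapping that these templates encode can also be written as a lookup table. The sketch below is Python with ElementTree rather than XSLT, and ELEMENT_MAP, convert and convert_body are hypothetical names. It covers the same h1-h4, p and ul cases and assumes well-formed XHTML with no namespace. Like text_applyer, it flattens inline markup to plain text. Unlike the stylesheet, it sets text:outline-level on headings and does not emit the empty spacer paragraph after each p.

```python
import xml.etree.ElementTree as ET

# ODF 1.0 namespace URIs, as declared in html2oo.xslt.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)


def t(local):
    """Return a name qualified with the ODF text namespace."""
    return "{%s}%s" % (TEXT_NS, local)


# Hypothetical mapping table: HTML element -> (ODF text element, Writer style).
ELEMENT_MAP = {
    "h1": ("h", "Heading_20_1"),
    "h2": ("h", "Heading_20_2"),
    "h3": ("h", "Heading_20_3"),
    "h4": ("h", "Heading_20_4"),
    "p": ("p", "Standard"),
}


def flatten(elem):
    """Collapse an element's inline markup into its plain text."""
    return "".join(elem.itertext())


def convert(elem):
    """Convert one HTML block element into an ODF text element."""
    if elem.tag == "ul":
        odf_list = ET.Element(t("list"), {t("style-name"): "L1"})
        for li in elem.findall("li"):
            item = ET.SubElement(odf_list, t("list-item"))
            para = ET.SubElement(item, t("p"), {t("style-name"): "Standard"})
            para.text = flatten(li)
        return odf_list
    name, style = ELEMENT_MAP[elem.tag]
    out = ET.Element(t(name), {t("style-name"): style})
    if name == "h":
        # "h2" -> outline level "2", so Writer's navigator sees the heading.
        out.set(t("outline-level"), elem.tag[1])
    out.text = flatten(elem)
    return out


def convert_body(xhtml):
    """Map the supported children of <body> into an office:text element."""
    body = ET.fromstring(xhtml).find("body")
    office_text = ET.Element("{%s}text" % OFFICE_NS)
    for child in body:
        # Unsupported elements are skipped; a general converter needs more rules.
        if child.tag in ELEMENT_MAP or child.tag == "ul":
            office_text.append(convert(child))
    return office_text


if __name__ == "__main__":
    sample = ("<html><body><h2>Skills</h2><p>XSLT and <b>Emacs</b></p>"
              "<ul><li>darcs</li></ul></body></html>")
    print(ET.tostring(convert_body(sample), encoding="unicode"))
```

Keeping the mapping in one table makes adding an element a one-line change, which is roughly the kind of "mapping support" a more general converter would have to grow.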


-- 
Nic Ferrier
http://www.tapsellferrier.co.uk   for all your tapsell ferrier needs


converting HTML to OpenOffice Writer
user name
2006-07-10 11:38:11
Nic James Ferrier wrote:

>I have an XSLT style sheet that converts HTML to an OO document.
>
>It's pretty rough; I only have one transformation that I'm needing to
>do right now so there might be all sorts of problems with it.
>
>However, it might be of interest to people here.
>
>
>Anyone want to see it?
>

sure

> Help me make it better?
>  
>

Sure, although you might want to compare it with

http://www.opendocumentfoundation.org/repos/svn/libopendocument/trunk/xsl/default/document2xhtml.xsl

or, more generally,

http://www.opendocumentfoundation.org/repos/svn/libopendocument/trunk/

HTH

Michi

-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
michael.wechner@wyona.com                        michi@apache.org
+41 44 272 91 61
