Thread: WW8 font import

WW8 font import
user name
2006-09-06 19:12:22
Hi list-members,

I'm trying to find where the OOo WW8 filter import code determines the
font of a character. I see that a font table is built in the WW8Fonts
constructor, so I'm guessing that somewhere there's a text attribute
which contains a value pointing to an entry in this table. But I can't
find it. Can anyone help?

Thanks,
Alan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribesw.openoffice.org
For additional commands, e-mail: dev-helpsw.openoffice.org

WW8 font import
user name
2006-09-06 00:20:23
On Wed, 2006-09-06 at 21:12 +0200, Alan Yaniger wrote:
> Hi list-members,
>
> I'm trying to find where the OOo WW8 filter import code determines the
> font of a character. I see that a font table is built in the WW8Fonts
> constructor, so I'm guessing that somewhere there's a text attribute
> which contains a value pointing to an entry in this table. But I can't
> find it. Can anyone help?

Yeah, see sw/source/filter/ww8/ww8par6.cxx,
"SwWW8ImplReader::Read_FontCode", which eventually calls
SwWW8ImplReader::GetFontParams with the font id, and that font id is
looked up in the list that WW8Fonts knows about.

There will be different font ids for the "western" font, CTL font and
CJK font. And as always, if none is explicitly set for a range of text,
then that of the currently applied character style is used, and if
there is no character style (which is the normal case) then that of the
paragraph style in operation will be used.
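
The fallback order described above could be sketched roughly as follows.
This is only an illustration: FontIds, Script, pick, resolveFontId and
fontName are hypothetical names invented for this example, not the real
Writer classes, and the real importer works on attribute sets rather
than plain structs.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model; these are not the real Writer classes.
const int kUnset = -1;            // "no font id set at this level"

enum Script { Western, CJK, CTL };

struct FontIds {
    int western, cjk, ctl;        // one font id per script, kUnset if absent
};

// Select the id that belongs to one script.
int pick(const FontIds& ids, Script s) {
    switch (s) {
        case Western: return ids.western;
        case CJK:     return ids.cjk;
        default:      return ids.ctl;
    }
}

// Fallback order from the thread: explicit setting on the text range,
// then the character style (if one is applied), then the paragraph style.
int resolveFontId(Script s, const FontIds& run,
                  const FontIds* charStyle, const FontIds& paraStyle) {
    int id = pick(run, s);
    if (id == kUnset && charStyle) id = pick(*charStyle, s);
    if (id == kUnset) id = pick(paraStyle, s);
    return id;
}

// A font id is just an index into the font table built from the document.
std::string fontName(const std::vector<std::string>& table, int id) {
    if (id < 0 || id >= static_cast<int>(table.size())) return std::string();
    return table[id];
}
```

If every level leaves the id unset, resolveFontId returns kUnset, which
is the situation where some document-wide default has to take over.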

C.

WW8 font import
user name
2006-09-07 15:12:17
Hi Caolan,

Thanks for your help. If I may ask for more help:

I've noticed that when a certain Hebrew font is defined in Word as the
font of the "Header 1" style, Read_FontCode is not called with the
font's id passed as an argument. Instead of displaying the text with
the "Header 1"-defined font, OOo uses the default CTL font. If in Word
I change "Header 1" to use a different Hebrew font, Read_FontCode *is*
called with the font's id passed as an argument, and the text looks
fine.

The calls to Read_FontCode apparently are determined by the sprm, as
defined in GetWW2SprmDispatcher().

I would like to debug this by making a copy of the Word document,
changing the Header 1 font in one of the two copies, importing both
files into OOo, dumping the sprms of the two files and their contents,
and then comparing the dumps.

a) Do you think this is the proper way to debug this?
b) If so, how would I create such a dump?

Thanks,
Alan

WW8 font import
user name
2006-09-06 20:53:49
On Thu, 2006-09-07 at 17:12 +0200, Alan Yaniger wrote:
> Hi Caolan,
>
> Thanks for your help. If I may ask for more help:
>
> I've noticed that when a certain Hebrew font is defined in Word as the
> font of the "Header 1" style, Read_FontCode is not called with the
> font's id passed as an argument. Instead of displaying the text with
> the "Header 1"-defined font, OOo uses the default CTL font. If in Word
> I change "Header 1" to use a different Hebrew font, Read_FontCode *is*
> called with the font's id passed as an argument, and the text looks
> fine.
>
> The calls to Read_FontCode apparently are determined by the sprm, as
> defined in GetWW2SprmDispatcher()
>
> I would like to debug this by making a copy of the Word document,
> changing the Header 1 font in one of them, importing the two files
> into OOo, and dumping the sprm's of the two files and their contents,
> and then comparing the dumps.
>
> a) Do you think this is the proper way to debug this?
> b) If so, how would I create such a dump?

Well, what format is your document in? Word 97+, Word 6/95 or Word 2?
Probably Word 97 (in which case the table used is
"GetWW8SprmDispatcher"), but yes, when the importer sees the "sprm" in
the stream it calls the appropriate handler for that code, and for the
WW8 fontcode setting for CTL I'd expect the sprm to be 0x4a5E (for
WW8).
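
The "sprm id selects a handler" idea can be sketched as a small lookup
table. This is a toy model, not the real GetWW8SprmDispatcher: Reader,
readFontCodeCTL, dispatcher and dispatch are hypothetical names, and the
only sprm id taken from the thread is 0x4A5E.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for the importer; the real dispatcher tables in
// the WW8 filter are more elaborate than this.
struct Reader {
    int ctlFontId = -1;                       // -1: handler never called

    // Handler for the CTL font code: operand is a 16-bit font id.
    void readFontCodeCTL(const uint8_t* data) {
        ctlFontId = data[0] | (data[1] << 8); // little-endian
    }
};

typedef void (Reader::*Handler)(const uint8_t*);

// Table mapping sprm ids to handlers; only one entry for illustration.
const std::map<uint16_t, Handler>& dispatcher() {
    static const std::map<uint16_t, Handler> table = {
        {0x4A5E, &Reader::readFontCodeCTL},
    };
    return table;
}

// Returns true if a handler was registered for this sprm and was called.
bool dispatch(Reader& r, uint16_t sprm, const uint8_t* operand) {
    std::map<uint16_t, Handler>::const_iterator it = dispatcher().find(sprm);
    if (it == dispatcher().end()) return false;
    (r.*(it->second))(operand);
    return true;
}
```

In this model, a document that never contains the sprm never triggers
the handler, so ctlFontId stays at -1, which mirrors the "Read_FontCode
is not called" symptom.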

You speak here of a style. Styles inherit from each other, so if e.g.
Heading 1 does not explicitly set a font it would inherit the font from
its parent. In MSWord itself you have the styles and formatting menu.
If you investigate there you should be able to see what Word says about
your style. Writer is similar: press F11, right click on Heading 1,
choose modify, and look at the text in the organizer tab, for example
whether it is based on something else (linked with) and what explicit
overrides are applied to this style. Following the chain backwards in
the UI of Writer and Word should show where the font is really
explicitly set.

You need to know where the font that "Heading 1" should be using was
really set: whether it was explicitly set in Word for Heading 1 but
didn't appear in Writer correctly, or whether this actually happened in
a parent style and the Heading 1 import itself is fine.

At the top of the style chain you have the "Normal" style in Word and
the "Default" style in Writer. The defaults for these styles, because
they are the top of the chain, are a little different. They are read in
ww8scan.cxx into ftcStandardChpCTLStsh from the header of the
stylesheet, which is at the start of all the styles.

So, check in the Writer and Word UIs where the difference in the style
chain occurs, and start work on that style, or on where the initial
default fontid for CTL is set.
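
Following the chain backwards can also be expressed in code. The sketch
below assumes a hypothetical Style record with a parent index and an
optional CTL font id; effectiveCtlFont is an invented helper that
reports which style actually supplied the value, falling back to the
stylesheet-wide default when no style in the chain sets one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified style record (not the filter's own type).
struct Style {
    std::string name;
    int parent;      // index of the "based on" style, -1 at the top
    int ctlFontId;   // -1 when this style does not set a CTL font itself
};

// Walk up the "based on" chain until some style sets the CTL font.
// If none does, return the stylesheet default. 'setBy' receives the name
// of the style that supplied the value, or is cleared for the default.
int effectiveCtlFont(const std::vector<Style>& styles, int idx,
                     int stylesheetDefault, std::string* setBy) {
    // The hop counter guards against a malformed, cyclic parent chain.
    for (std::size_t hops = 0;
         idx >= 0 && idx < static_cast<int>(styles.size()) &&
         hops < styles.size();
         ++hops) {
        const Style& s = styles[idx];
        if (s.ctlFontId >= 0) {
            if (setBy) *setBy = s.name;
            return s.ctlFontId;
        }
        idx = s.parent;
    }
    if (setBy) setBy->clear();
    return stylesheetDefault;
}
```

Running this on the same chain as loaded from two variants of a document
would show whether the difference sits in a particular style or in the
stylesheet default.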

Once you are looking at the right style, then tweaking the sprms by
poking is a reasonable approach. There are some notes here though...

1. In Word itself make sure that fastsave is disabled, as otherwise the
   documents are really, really difficult to work with. This is in
   Options->Save in Word, somewhere near the end of the menubar.
2. Word documents are OLE structured storage documents, so they are
   effectively a pile of streams combined into one file. You will see a
   flattened view with a basic hex editor, but you can still work with
   it for a little hex editing.
3. And finally, the file format is little endian, so e.g. 0x4a5E will
   be stored as 5E 4A in the stream.
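
As a rough illustration of note 3, a scan over the flattened file bytes
might look like the sketch below. findSprm is a hypothetical helper, and
it assumes the sprm is followed directly by a 16-bit operand, as with
the font id discussed here. A raw byte match is only a candidate: the
pair 5E 4A can also occur by chance in text or other data.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Naive scan of a flattened byte buffer for the little-endian encoding of
// a 16-bit sprm id. Each hit is (offset of the sprm, 16-bit value that
// follows it). Matches are candidates only, not verified sprms.
std::vector<std::pair<std::size_t, uint16_t> >
findSprm(const std::vector<uint8_t>& buf, uint16_t sprm) {
    std::vector<std::pair<std::size_t, uint16_t> > hits;
    const uint8_t lo = static_cast<uint8_t>(sprm & 0xFF);  // stored first
    const uint8_t hi = static_cast<uint8_t>(sprm >> 8);    // stored second
    for (std::size_t i = 0; i + 3 < buf.size(); ++i) {
        if (buf[i] == lo && buf[i + 1] == hi) {
            uint16_t operand =
                static_cast<uint16_t>(buf[i + 2] | (buf[i + 3] << 8));
            hits.push_back(std::make_pair(i, operand));
        }
    }
    return hits;
}
```

Comparing the number of hits for 0x4A5E in two saved variants of a
document is one cheap way to spot a missing font sprm.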

Any hex editor of your choice would be sufficient to find 0x4A5Es and
change the following fontid value; for example, here's my primitive
viewer and poker:
http://www.skynet.ie/~caolan/Packages/chex.html
http://www.skynet.ie/~caolan/Packages/cpoke.html

C.

WW8 font import
user name
2006-09-08 11:48:59
Hi Caolan,

I appreciate your guidance here. Yes, in my Word document, the CTL font
is set higher up in the chain. "Header 1" inherits from "Header", which
inherits from "Normal", which is where the CTL font is set. But
curiously, I found that if in Word I change the CTL font in "Normal" to
the Hebrew font "Miriam", a hex dump of the doc file shows one fewer
0x4a5E "sprm" than if I change it to any other font. And indeed I found
that when importing the file with "Miriam" into OOo, there is one fewer
call to Read_FontCode. As a result, the variable "bCTLFontChanged" is
"false" when WW8RStyle::Set1StyleDefaults() is called, and OOo sets the
display font to OOo's default CTL font instead of to "Miriam".

Do you know why MS Word is saving the file this way when the font is
"Miriam"? Might "Miriam" be defined in some other setting, and Word
thinks that "Normal" is getting the font from there?

Thanks,
Alan

WW8 font import
user name
2006-09-07 18:09:04
On Fri, 2006-09-08 at 13:48 +0200, Alan Yaniger wrote: 
> Hi Caolan,
>
> I appreciate your guidance here. Yes, in my Word document, the CTL
> font is set higher up in the chain. "Header 1" inherits from "Header"
> which inherits from "Normal", which is where the CTL font is set. But
> curiously, I found that if in Word I change the CTL font in "Normal"
> to the Hebrew font "Miriam", a hex dump on the doc file shows one less
> 0x4a5E "srpm" than if I change it to any other font. And indeed I
> found that when importing the file with "Miriam" to OOo, there is one
> less call to Read_FontCode. As a result, the variable
> "bCTLFontChanged" is "false" when WW8RStyle::Set1StyleDefaults() is
> called, and OOo sets the display font to OOo's default CTL font
> instead of to "Miriam".
>
> Do you know why MS Word is saving the file this way when the font is
> "Miriam"? Might "Miriam" be defined in some other setting, and Word
> thinks that "Normal" is getting the font from there?

That's excellent work. Now, the default style can have its fontid set,
but if the fontid is *not* set, then we look to the default fontids for
the MSWord stylesheet that owns these styles. This is
"rgftcStandardChpStsh" and is read in ww8scan.cxx; see
"ftcStandardChpCTLStsh" for what is the CTL one (or what we *think* is
the CTL one).

If the internal MSWord default font is Miriam, it is possibly the case
that Word doesn't set this even in the "Normal" style, as it is the
same as the root MSWord CTL font, and that the "default CTL font" is
expected to be used. What I'd expect to see here is something being set
for the ftcStandardChpCTLStsh value. Can you use your debugger, or some
printfs, to see if...

a) ftcStandardChpCTLStsh gets set to some value when your document is
   loaded, and if so what the value is? This value is just an index
   into the font list.

The problem is likely to be that either
a) this number is not being read, or is read incorrectly,
b) the number is correct, but the list of fontnames was read
   incorrectly and they don't match,
c) Word has some ugly hack of some kind which hardcodes the root
   default CTL font to Miriam, or
d) something else.

WW8Fonts::WW8Fonts is where we read in this list of fontnames, and the
fontid is a simple index into that list. So the other thing to do is to
debug there and see what fonts this document contains, and what their
indexes are. What we're really looking for here is to see if Miriam is
in this fontlist, and what index in the fontlist it has.

Ideally the index in this fontlist in WW8Fonts::WW8Fonts for "Miriam"
is the same as the value for ftcStandardChpCTLStsh. But now we need to
know if Miriam is missing from this list or not, and what value for
ftcStandardChpCTLStsh was set.
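
The check described above amounts to a name lookup plus a comparison. A
minimal sketch, assuming the font list has been dumped into a vector of
names (fontIndex, Diagnosis and checkDefaultCtl are hypothetical helpers
for this example, not filter code):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Index of a font name in the document's font list, or -1 if absent.
int fontIndex(const std::vector<std::string>& fonts,
              const std::string& name) {
    for (std::size_t i = 0; i < fonts.size(); ++i)
        if (fonts[i] == name) return static_cast<int>(i);
    return -1;
}

enum Diagnosis {
    FontMissing,     // expected font is not in the list at all
    IndexMismatch,   // font is present, but the default id points elsewhere
    Consistent       // default id is the index of the expected font
};

// Compare the stylesheet's default CTL font id with the index of the
// font that Word actually displays.
Diagnosis checkDefaultCtl(const std::vector<std::string>& fonts,
                          const std::string& expected, int defaultCtlId) {
    const int idx = fontIndex(fonts, expected);
    if (idx < 0) return FontMissing;
    return idx == defaultCtlId ? Consistent : IndexMismatch;
}
```

A FontMissing result would point towards the font list being read
incorrectly, while IndexMismatch would point towards the default id.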

One possibility, but it's very speculative, is that in ww8scan.cxx, in
WW8Style::WW8Style, ftcStandardChpCTLStsh is not read from the MSWord
document at all because the particular version of Word you have writes
a short header with no default CTL fontid value, and what we should do
in this case is take the Western fontid. But that's purely an idea.
More data is needed as to what fonts WW8Fonts::WW8Fonts sees, and what
index is read for ftcStandardChpStsh and ftcStandardChpCTLStsh.
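
To make the speculative "short header" idea concrete, here is a sketch
that reads default font ids from a length-prefixed header and records
whether the CTL id was really present. The byte layout used here is an
assumption made up for the example (a 16-bit payload length followed by
up to three 16-bit ids); it is not the actual stylesheet header layout,
and DefaultFonts, le16 and readDefaultFonts are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct DefaultFonts {
    uint16_t western, cjk, ctl;
    bool ctlWasRead;   // false when the header was too short to hold it
};

// Read one little-endian 16-bit value at byte offset 'off'.
static uint16_t le16(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

// ASSUMED layout: [u16 payload length][u16 western][u16 cjk][u16 ctl].
// Trailing fields that do not fit in the payload are treated as "not
// written by this Word version" and fall back to the Western id.
bool readDefaultFonts(const std::vector<uint8_t>& b, DefaultFonts& out) {
    if (b.size() < 2) return false;
    const std::size_t len = le16(b, 0);
    if (len < 2 || b.size() < 2 + len) return false;
    out.western = le16(b, 2);
    out.cjk = len >= 4 ? le16(b, 4) : out.western;
    out.ctlWasRead = len >= 6;
    out.ctl = out.ctlWasRead ? le16(b, 6) : out.western;  // speculative
    return true;
}
```

Keeping a flag such as ctlWasRead separates "the file says 0" from "the
field stayed at its initial 0", which is exactly the distinction that
matters when debugging this.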

C.

WW8 font import
user name
2006-09-08 14:59:53
Hi Caolan,

I found that Miriam is included in the font list in the WW8Fonts
constructor, with index 3.
I also found that ftcStandardChpCTLStsh has the value 0.
If in my debugger I change ftcStandardChpCTLStsh's value to 3, Miriam's
index, lo and behold, the text is displayed in the Miriam font. So now,
given the four possibilities you listed of what went wrong, how do I go
about finding out which is correct?

Thanks very much,
Alan

WW8 font import
user name
2006-09-07 19:59:49
On Fri, 2006-09-08 at 16:59 +0200, Alan Yaniger wrote:
> Hi Caolan,
>
> I found that Miriam is included in the font list in the WW8Fonts
> constructor, with index 3.
> I also found that ftcStandardChpCTLStsh has the value 0.
> If in my debugger, I change ftcStandardChpCTLStsh's value to 3,
> Miriam's index - lo and behold, the text is displayed in the Miriam
> font. So now, given the four possibilities you listed of what went
> wrong, how do I go about finding out which is correct?

That's the tricky bit.

Does ftcStandardChpCTLStsh actually get *read* from the file as 0, or
is it 0 because it remained at the default value of 0?

I.e. if a breakpoint is put on
        rSt >> ftcStandardChpCTLStsh;
in ww8scan.cxx, is it triggered?

Maybe also send me an empty .doc which shows this behaviour and I might
be able to hazard a better guess as to what the problem is.

C.

WW8 font import
user name
2006-09-07 21:09:09
On Thu, 2006-09-07 at 20:59 +0100, Caolan McNamara wrote:
> On Fri, 2006-09-08 at 16:59 +0200, Alan Yaniger wrote:
> > Hi Caolan,
> >
> > I found that Miriam is included in the font list in the WW8Fonts
> > constructor, with index 3.
> > I also found that ftcStandardChpCTLStsh has the value 0.
> > If in my debugger, I change ftcStandardChpCTLStsh's value to 3,
> > Miriam's index - lo and behold, the text is displayed in the Miriam
> > font. So now, given the four possibilities you listed of what went
> > wrong, how do I go about finding out which is correct?
>
> That's the tricky bit.
>
> Does ftcStandardChpCTLStsh actually get *read* from the file as 0, or
> is it 0 because it remained at the default value of 0.
>
> i.e. if a breakpoint is put on
>         rSt >> ftcStandardChpCTLStsh;
> in ww8scan.cxx is it triggered ?
>
> Maybe also send me an empty .doc which shows this behaviour and I
> might be able to hazard a better guess as to what the problem is.

Hmm, as you say, all 3 default fontids are 0, but the Miriam font is
the 4th font. It might be that Word totally ignores this value; I'll
look into it.

Perhaps hexpoking the .doc file at the ftcStandardChp*Stsh value
locations, setting them to other, different but valid indexes into the
font table, and reloading in Word itself will show whether Word
actually honours that setting at all, or only for e.g. western, and
otherwise always uses its internal default font as the top-of-the-tree
font setting. I.e. Word might be hardcoded to use Miriam for the
default CTL font when "Normal" doesn't explicitly set it.

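For anyone repeating the hexpoking experiment without a dedicated tool,
overwriting a 16-bit value in a copy of the file bytes can be sketched
as below. poke16 is a hypothetical helper; finding the right offset of
the ftcStandardChp*Stsh values in a given file is not covered here and
has to be done by hand.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Return a copy of 'buf' with the 16-bit value 'v' written at byte
// offset 'off' in little-endian order (low byte first). The original
// buffer is left untouched so the two variants can be compared.
std::vector<uint8_t> poke16(std::vector<uint8_t> buf, std::size_t off,
                            uint16_t v) {
    if (off + 1 >= buf.size())
        throw std::out_of_range("offset past end of buffer");
    buf[off]     = static_cast<uint8_t>(v & 0xFF);  // low byte
    buf[off + 1] = static_cast<uint8_t>(v >> 8);    // high byte
    return buf;
}
```

Writing the poked bytes to a new file and opening that in Word keeps
the original document intact for comparison.
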
C.

WW8 font import
user name
2006-09-10 09:11:37
Hi Caolan,

I've found that hexpoking the value of ftcStandardChpStsh (the western
font) is honored by Word. By changing the value from 0 to 2, the text
was displayed by Word in Arial instead of Times New Roman.

However, when I did the same to ftcStandardChpCTLStsh (for Hebrew), I
should have gotten text in Arial, but I got it in Miriam. It seems that
if "Normal" is set to use "Miriam" for CTL, Word just acts as if CTL
was not set at all, and relies on Miriam as a default, hardcoded in
Word somewhere.

To import a document like this into OOo, do we have to do the same?
Namely, hardcode "Miriam" as a default CTL font when the value of
ftcStandardChpCTLStsh is 0 and the language is Hebrew (and supply other
defaults for Arabic and other RTL languages)? Or will this break
something else?
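
The rule being proposed could look something like the sketch below. It
is only an illustration of the question, not existing filter code: Lang,
hardcodedCtlDefault and chooseCtlFont are hypothetical names, and only
the Hebrew to "Miriam" pairing comes from this thread. No defaults for
Arabic or other languages are known here, so they fall through to the
application default.

```cpp
#include <string>

// Hypothetical language tag for the document's CTL text.
enum Lang { Hebrew, Arabic, OtherLang };

// Hardcoded per-language CTL default. Only Hebrew -> "Miriam" is taken
// from the thread; other languages return an empty string (unknown).
std::string hardcodedCtlDefault(Lang lang) {
    return lang == Hebrew ? "Miriam" : "";
}

// styleSetsCtlFont: did "Normal" carry an explicit CTL font sprm?
// styleFont:        the font named by that sprm, if any.
// appDefault:       the importing application's own default CTL font.
std::string chooseCtlFont(bool styleSetsCtlFont,
                          const std::string& styleFont,
                          Lang lang, const std::string& appDefault) {
    if (styleSetsCtlFont) return styleFont;   // explicit setting wins
    const std::string hard = hardcodedCtlDefault(lang);
    return hard.empty() ? appDefault : hard;
}
```

Whether such a rule would break other documents is exactly the open
question; the sketch only shows where the decision would sit.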

Alan
