Hi Kohei,
Kohei Yoshida wrote:
> Embedding of R into Calc is legally impossible as R is released under
> GPL. However, we could dynamically load it at run time without
> violating the license term (but again, IANAL).
Well, by embedding I mean loading the necessary resources at runtime
(as discussed on the stat wiki page).
> Do you have a reference for this code i.e. is it entirely your own
> code, or is it derived from another source (application, book, paper,
> etc.)?
The code is a direct port by *ME* of the mathematical definition of
ANOVA, i.e. the definition given in all the books (and articles) I have
read about ANOVA.
HOWEVER: almost all books then give another, "practical" formula, which
calculates the sum of squared residuals as a difference of 2 other
terms. *These formulas* (containing a difference) are usually
numerically *unstable* (see my previous discussion, and the current
implementation of correlation and some other Calc functions, for the
adverse effects of a subtraction), in that they can generate impossible
values (like a negative F statistic).
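
To illustrate the point about subtraction, here is a minimal Python
sketch (not the Calc code). It compares the within-group sum of squares
computed from the definition with the textbook shortcut
sum(x^2) - sum(T_i^2 / n_i). The function names and the data (a small
spread sitting on a large offset) are made up for the illustration.

```python
def sse_direct(groups):
    """Within-group sum of squares, straight from the definition."""
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((x - mean) ** 2 for x in g)
    return total


def sse_shortcut(groups):
    """Textbook shortcut: sum(x^2) - sum(T_i^2 / n_i), T_i = group total."""
    sum_sq = sum(x * x for g in groups for x in g)
    correction = sum(sum(g) ** 2 / len(g) for g in groups)
    # Two huge, nearly equal numbers are subtracted here, so most of the
    # significant digits cancel and rounding error dominates the result.
    return sum_sq - correction


# Hypothetical data: the exact within-group sum of squares is 0.04.
OFFSET = 1e9
groups = [
    [OFFSET + 0.1, OFFSET + 0.2, OFFSET + 0.3],
    [OFFSET + 0.2, OFFSET + 0.3, OFFSET + 0.4],
]

if __name__ == "__main__":
    print("direct:  ", sse_direct(groups))
    print("shortcut:", sse_shortcut(groups))
```

On well-scaled data both functions agree. With the offset above, the
squares are of the order of 1e18, where neighbouring doubles are more
than 100 apart, so the shortcut cannot resolve a value like 0.04 and
may return zero or a value of the wrong sign.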
So, this is essentially my port of the formula. Of course, I took a look
at the correlation code (because I previously had NO idea how to access
the data in Calc). My other reference (a very good one) was:
http://courses.ncssm.edu/math/Stat_Inst/PDFS/NEWANOVA.pdf
(see also the stat wiki page, ANOVA section,
http://wiki.services.openoffice.org/wiki/Statistical_Data_Analysis_Tool#Multiple-Groups_Inference).
However, that document describes the matrix approach to ANOVA. Calc does
NOT allow matrix calculations, so I translated the ideas back to simple
calculations. [Currently I have implemented only the one-way,
non-blocked type of ANOVA. When the code is complete and functional, I
will make the changes for the various flavours of ANOVA, but it makes NO
sense now to have multiple code segments to update.]
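
For readers who want to see what "simple calculations" means here, the
following is a minimal Python sketch of a one-way, non-blocked ANOVA
using only loops over the groups and no matrices. It is an illustration
of the definition, not the code submitted for Calc; the function name
and return layout are assumptions.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_error) for a one-way, non-blocked ANOVA.

    `groups` is a list of lists of numbers, one inner list per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = 0.0  # variation of the group means around the grand mean
    ss_error = 0.0    # variation of the observations around their group mean
    for g in groups:
        mean = sum(g) / len(g)
        ss_between += len(g) * (mean - grand_mean) ** 2
        ss_error += sum((x - mean) ** 2 for x in g)

    df_between = k - 1
    df_error = n_total - k
    # Both sums are sums of squares, so F cannot come out negative.
    # Note: ss_error == 0 (no within-group variation) raises
    # ZeroDivisionError in this sketch.
    f_stat = (ss_between / df_between) / (ss_error / df_error)
    return f_stat, df_between, df_error
```

For example, one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]]) has
between-group and error sums of squares of 26 and 6, with 2 and 6
degrees of freedom, so F = (26/2) / (6/6) = 13.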
One LAST COMMENT:
The code outputs the *F statistic*, NOT the *p-value*.
To obtain the p-value, a call to FDIST('F statistic value', dfB, dfE)
must be made, BUT:
- I have NOT yet figured out where this function is and what it is named
- all statistical software outputs both the F statistic AND the p-value
  [and a lot of other data - this is actually meaningful]
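
As a rough illustration of the missing step, the sketch below computes
the upper-tail probability P(F > f) of the F distribution in plain
Python, via the regularized incomplete beta function evaluated with a
standard continued fraction. I am assuming this is the quantity FDIST
is meant to return; it is a stand-in, not the Calc implementation, and
the function names are made up.

```python
import math


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use the symmetry relation where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f_stat, df_between, df_error):
    """p-value P(F > f_stat) for an F distribution with the given dfs."""
    if f_stat <= 0.0:
        return 1.0
    x = df_error / (df_error + df_between * f_stat)
    return reg_inc_beta(df_error / 2.0, df_between / 2.0, x)
```

With dfB = 2 the result has the closed form
(dfE / (dfE + 2 * F)) ** (dfE / 2), which is a handy check. Where SciPy
is available, scipy.stats.f.sf(f_stat, df_between, df_error) computes
the same tail probability.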
So, in the longer term we must also think about more expanded output,
because Calc is really limited here. All modern statistical functions
and techniques DO OUTPUT a lot of data, not just a p-value.
Hope this clarifies some issues.
Sincerely,
Leonard Mada