Thread: Statistical Functions Implementation




Statistical Functions Implementation
user name
2006-11-15 16:59:17
Hi Kohei,

Kohei Yoshida wrote:
> Embedding of R into Calc is legally impossible as R is released under
> GPL.  However, we could dynamically load it at run time without
> violating the license term (but again, IANAL).

Well, by embedding I mean loading the necessary resources at runtime
(as discussed on the stat wiki page).

> Do you have a reference for this code i.e. is it entirely your own
> code, or is it derived from another source (application, book, paper,
> etc.)?

The code is a direct port by *ME* of the mathematical definition of
ANOVA, i.e. the definition given in every book and article I have read
about ANOVA.

HOWEVER: almost all books then give another, "practical" formula, which
calculates the sum of squared residuals as a difference of 2 other
terms. *These formulas* (containing a difference) are usually *unstable*
(see my previous discussion, and the current implementation of
correlation and some other Calc functions, for the adverse effects of
a subtraction), in that they can generate impossible values (like a
negative F statistic).
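
To illustrate the instability described above, here is a minimal Python
sketch (not the Calc code; the function names and the data are made up
for this example). It computes the within-group sum of squared
residuals twice: once directly from the definition, and once with the
usual textbook shortcut that subtracts two large terms. The data have a
large common offset, so the true value is about 0.04.

```python
def ss_within_definition(groups):
    """Sum of squared residuals, computed directly from the definition."""
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((x - mean) ** 2 for x in g)
    return total


def ss_within_shortcut(groups):
    """Textbook shortcut: sum(x^2) - sum(T_j^2 / n_j).

    Both terms are huge compared with their difference, so almost all
    significant digits cancel in the subtraction.
    """
    sum_of_squares = sum(x * x for g in groups for x in g)
    correction = sum(sum(g) ** 2 / len(g) for g in groups)
    return sum_of_squares - correction


# Hypothetical data: small spread on top of a large offset.
offset = 1e9
groups = [
    [offset + 0.1, offset + 0.2, offset + 0.3],
    [offset + 1.1, offset + 1.2, offset + 1.3],
]

print("definition:", ss_within_definition(groups))  # close to 0.04
print("shortcut:  ", ss_within_shortcut(groups))    # far from 0.04
```

With these inputs the shortcut result can only be zero or a large
multiple of the floating-point spacing near 6e18, so it may well come
out negative, which is how a negative F statistic can arise.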

So, this is essentially my port of the formula. Of course, I took a
look at the correlation code (because I previously had NO idea how to
access the data in Calc). My other reference (a very good one) was:

http://courses.ncssm.edu/math/Stat_Inst/PDFS/NEWANOVA.pdf

(see also the stat wiki page, ANOVA section,
http://wiki.services.openoffice.org/wiki/Statistical_Data_Analysis_Tool#Multiple-Groups_Inference).

However, that document describes the matrix approach to ANOVA. Calc
does NOT allow matrix calculations, so I transposed the ideas back to
simple calculations. [Currently I have implemented only the one-way,
non-blocked type of ANOVA. When the code is complete and functional, I
will make the changes for the various flavours of ANOVA, but it makes
NO sense now to have multiple code segments to update.]
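
For readers who want to see what "simple calculations" means for the
one-way, non-blocked case, here is a minimal Python sketch of the
definitional formula. It is not the code posted to the list; the
function name and the sample data are assumptions made for
illustration. dfB and dfE are the between-group and error degrees of
freedom.

```python
def one_way_anova_f(groups):
    """Return (F, dfB, dfE) for a one-way ANOVA, using only sums and means."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = sum(x for g in groups for x in g) / n_total

    ss_between = 0.0   # variation of group means around the grand mean
    ss_within = 0.0    # variation of observations around their group mean
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical example data: three groups of three observations.
f_stat, df_b, df_e = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_e)  # F is about 13, with dfB = 2 and dfE = 6
```

No subtraction of large accumulated sums is involved: every squared
term is a deviation from a mean, so neither sum of squares can become
negative.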

One LAST COMMENT:
The code outputs the *F statistic*, NOT the *p-value*.
To obtain the p-value, a call to FDIST('F statistic value', dfB, dfE)
must be made, BUT:
 - I have NOT yet figured out where this function is and what it is
   named
 - all statistical software outputs both the F statistic AND the
   p-value [and a lot of other data - this is actually meaningful]
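
As a rough illustration of what the FDIST step does, the upper-tail
probability of the F distribution can be written with the regularized
incomplete beta function: p = I_x(dfE/2, dfB/2), with
x = dfE / (dfE + dfB * F). The Python sketch below is an assumption
about how one might compute it with the standard library only; it is
not Calc's implementation, and all function names are hypothetical.
The continued-fraction evaluation follows the widely used modified
Lentz scheme.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for this m.
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for coeff in (even, odd):
            d = 1.0 + coeff * d
            if abs(d) < tiny:
                d = tiny
            c = 1.0 + coeff / c
            if abs(c) < tiny:
                c = tiny
            d = 1.0 / d
            delta = d * c
            h *= delta
        # Stop once the last (odd) step no longer changes the result.
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_upper_tail(f_stat, df_between, df_within):
    """P(F > f_stat) for an F distribution with (df_between, df_within) df."""
    if f_stat <= 0.0:
        return 1.0
    x = df_within / (df_within + df_between * f_stat)
    return regularized_incomplete_beta(df_within / 2.0, df_between / 2.0, x)


# Hypothetical usage: p-value for F = 13 with dfB = 2 and dfE = 6.
print(f_upper_tail(13.0, 2, 6))
```

For dfB = 2 the result can be checked by hand, because the tail
probability reduces to (dfE / (dfE + 2F))^(dfE/2).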

So, in the longer term we must also think about more expanded output,
because Calc is really limited here. All modern statistical functions
and techniques DO OUTPUT a lot of data, not just a p-value.

Hope this clarifies some issues.

Sincerely,

Leonard Mada

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribesc.openoffice.org
For additional commands, e-mail: dev-helpsc.openoffice.org

Statistical Functions Implementation
user name
2006-11-15 17:56:11
Leonard Mada wrote:
> One LAST COMMENT:
> The code outputs the *F statistic*, NOT the *p Value*.
> To obtain the p-value, a call to FDIST('F statistic value', dfB, dfE)
> must be made, BUT:
> - I did NOT figure out yet where this function is and how it's named
> - all statistic software outputs both the F statistic AND the p-value
>   [and a lot of other data - this is actually meaningful]
>
> So, in the longer term we must also think about more expanded output,
> because Calc is really limited here. And all modern statistical
> functions and techniques DO OUTPUT a lot of data, not just a p-value.

A function can return an array (LINEST does this, for example).
However, ANOVA was always an example of the things that might better be
handled in an add-on doing a one-time calculation, not an add-in
function (see issue 4921).
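
To make the array idea concrete, here is a small Python sketch of a
one-way ANOVA routine that returns a rectangular block of results
rather than a single number, roughly in the spirit of an array-returning
function. The layout (a header row plus "Between" and "Within" rows) and
the function name are assumptions for illustration only, not a proposal
for the actual Calc interface.

```python
def anova_table(groups):
    """Return a one-way ANOVA summary as a rectangular array (list of rows)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    # Every row has the same width, so the result fits a cell range.
    return [
        ["Source", "SS", "df", "MS", "F"],
        ["Between", ss_between, df_between, ms_between, ms_between / ms_within],
        ["Within", ss_within, df_within, ms_within, ""],
    ]


# Hypothetical example data: three groups of three observations.
for row in anova_table([[1, 2, 3], [2, 3, 4], [5, 6, 7]]):
    print(row)
```

Whether such a block is better delivered by an array function or by a
one-time add-on calculation is exactly the design question raised above.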

Niklas
