Metropoli BBS - stat03.doc

VIEWER: stat03.doc MODE: TEXT (CP437)
			STAT03

A collection of statistical/matrix utilities for the HP48G(X)

		(c) 1996 by Christian Meland
		christian.meland@pfi.no

IMPORTANT: This is the third version, and it has not been fully 
tested. Backup your memory. Also, I do not know if it will run on 
any other version than rom version R, as some of the unsupported 
statistics entries have moved around a bit.


0.0 DISCLAIMER
==============

STAT03 is distributed in the hope that it will be useful, but the 
copyright holder provide the program "as is" without warranty of 
any kind, either expressed or implied, including, but not limited to, 
the implied warranties of merchantability and fitness for a 
particular purpose. In no event will the copyright holder be liable 
to you for damages, including any general, special, incidental or 
consequential damages arising out of the use or inability to use the 
program.


1.0 CREDITS
===========

Tamir Demri wrote the quiksort program I use in MEDI, PTILE 
and SRT. He also wrote the heapsort-program I modified to sort 
arrays with two columns (more than two columns are 
delt with by using indexes.)
Laurent Bercot (HPZeus) wrote the NDUPN program.
The code used to revert the stack is from Jim Donnelly's "An intro 
to hp-48 sysRPL and ass. lang."
I used Mika Heiskanen's JAZZ to write the library.
Scott Hyde suggested to implement harmonic and geometric means 
and did some beta-testing.


2.0 OVERVIEW
============

STAT03 is a collection of utilities I find useful for users working 
with statistics and numerical matrices. It is still mostly basic 
commands, but they should be of great help for those who would 
like to implement more advanced tests and tasks.


2.1 HISTORY
===========

This is the third release. The number of commands has increased to 
49, and the size to about 9,3 kb. More than 23 % is ml, the rest is 
sysRPL. So far I have not got much response on my programs, so 
maybe that's why the progress is slow :-)? If you use it, give me a 
word or two, suggestions ar appreciated (christian.meland@pfi.no)


3.0 INSTALLATION
================

The STAT03 library works in any port of a HP48G(X).

To install STAT03 on your HP48 download the file 'stat03.lib' to 
your calculator and store in it a port of your choice. For example to 
store the library to port 0, type in (or press suitable keys)

	'stat03.lib' DUP RCL SWAP PURGE 0 STO

and power-cycle the calculator. The library will autoattach itself to 
the HOME directory.

To delete the library, press ON-C, then type (press)
:p: 1042 DUP DETACH PURGE
where p is the port number where the library is stored. (If this 
makes no sense to you, the port number is 0 ;-) )


4.0 COMMANDS
============


->VEC (alias VEC)

Stack: (real-meta
	or list
	or array --> [ vector ]


->CVEC (alias CV)

Stack:	(real-meta
	or list
	or array --> [[ column ]] (n*1)


MANIP

MANIP is used to manipulate matrices, like "let column 4 be equal 
to exp(column 5)". The user has access to an algebraic command 
CNR, that allows him/her to spesify column-numbers.

Stack:	([[real]] col_nr << program >>
	or [[real]] col_nr 'algebraic'
	or [[real]] col_nr 'global'
	or [[real]] col_nr 'local'
	or [[real]] col_nr real	 --> [[real]]

col_nr can be from 1 to p+1, where p=number of columns in input-
array. If col_nr are p+1, a new column is appended, or else the 
spesified column is overwritten.

Examples:

[[1 4][2 5][3 6]] 3 << 1 CNR 2 CNR + >> MANIP
-> [[1 4 5][2 5 7][3 6 9]]

[[1 4 5][2 5 7][3 6 9]] 3 'Sum(I=1,3,CNR(I))' MANIP
-> [[1 4 10][2 5 14][3 6 18]]


CNR

CNR is used in MANIP, and will not give any reaponse if used 
outside MANIP.

Syntax: col_nr CNR or 'CNR(col_nr)'



You will find that many of the M... commands are stack-
replacements for some of the built-in stat. commands. Most of 
them also give the user some additional features, such as mean of a 
spesific column or mean of the entire array.


MTOT

Stack:	[[real]] -> [ tot1 tot2 ... totp]
	or [[real]] col_nr -> tot(col_nr)
	or [[real]] 0 -> tot(whole array)


MMEAN

Stack:	[[real]] -> [ mean1 mean2 ... meanp]
	or [[real]] col_nr -> mean(col_nr)
	or [[real]] 0 -> mean(whole array)


MSDEV

Stack:	[[real]] -> [ sdev1 sdev2 ... sdevp]
	or [[real]] col_nr -> sdev(col_nr)
	or [[real]] 0 -> sdev(whole array)


MMAX

Stack:	[[real]] -> [ max1 max2 ... max]
	or [[real]] col_nr -> max (col_nr)
	or [[real]] 0 -> max(whole array)


MMIN

Stack:	[[real]] -> [ min1 min2 ... minp]
	or [[real]] col_nr -> min(col_nr)
	or [[real]] 0 -> min(whole array)


MVAR

Stack:	[[real]] -> [ var1 var2 ... varp]
	or [[real]] col_nr -> var(col_nr)
	or [[real]] 0 -> var(whole array)


MSSQ

Stack:	[[real]] -> [ ssq1 ssq2 ... ssqp]
	or [[real]] col_nr -> ssq(col_nr)
	or [[real]] 0 -> ssq(whole array)

Sum of squares, Sum(X^2)


MSXY

Stack:	[[real]] col_x col_y -> sum(x*y)

Sum of products, Sum(x*y)


MSLN

Stack:	[[real]] -> [ sumln1 sumln2 ... sumlnp]
	or [[real]] col_nr -> sumln(col_nr)
	or [[real]] 0 -> sumln(whole array)

Sum of logaritms, Sum(ln(x))


MGMEAN

Stack:	[[real]] -> [ gmean1 gmean2 ... gmeanp]
	or [[real]] col_nr -> gmean(col_nr)
	or [[real]] 0 -> gmean(whole array)

Geometric mean, Prod(X)^(1/n)


MHMEAN

Stack:	[[real]] -> [ hmean1 hmean2 ... hmeanp]
	or [[real]] col_nr -> hmean(col_nr)
	or [[real]] 0 -> hmean(whole array)

Harmonic mean, n/(sum(inv(x)))


MMEDI

Stack:	[[real]] -> [ medi1 medi2 ... medip]
	or [[real]] col_nr -> medi(col_nr)
	or [[real]] 0 -> medi(whole array)

Median of data.


MC

Stack:	[[real]] -> number_of_cols
	or [real] -> number_of_elements

Number of columns. For vectors: number of elements.


MR

Stack:	[[real]] -> number_of_cols
	or [real] -> number_of_elements

Number of rows. For vectors: number of elements.


SQARR

Stack: ([[real]] --> [[real]] 
	or [real] --> [real])

Squares all values in an array.


EADD

Stack:	[[real]] [[real]] -> [[real]] (both can be vectors)
	or [[real]] real -> [[real]]
	or real [[real]] -> [[real]]

Addition of arrays, or addition by a constant. If two arrays are 
added, the number of elements must be equal, and the result will 
inherit the dimensions from the array on stack level 1


ESUB

Stack:	[[real]] [[real]] -> [[real]] (both can be vectors)
	or [[real]] real -> [[real]]
	or real [[real]] -> [[real]]

Subtraction of arrays, or subtraction of a constant. If two arrays are 
subtracted, the number of elements must be equal, and the result 
will inherit the dimensions from the array on stack level 1


EMUL

Stack:	[[real]] [[real]] -> [[real]] (both can be vectors)
	or [[real]] real -> [[real]]
	or real [[real]] -> [[real]]

Element-by-element multiplication of arrays, or multiplication by a 
scalar. If two arrays are multiplied, the number of elements must be 
equal, and the result will inherit the dimensions from the array on 
stack level 1


EDIV

Stack:	[[real]] [[real]] -> [[real]] (both can be vectors)
	or [[real]] real -> [[real]]
	or real [[real]] -> [[real]]

Element-by-element division of arrays, or division by a divisor. If 
two arrays are divided, the number of elements must be equal, and 
the result will inherit the dimensions from the array on stack level 
1


MEDI

Stack:	--> [ medians ] or median

Returns the median of the array stored as SigmaDAT. Very fast 
(returns the median of a { 100 10 } RANDom array in 1.5 sec's).


WMEAN

Stack:	--> mean

WMEAN returns the weighted mean value. The data-values are 
stored in the xcol of SigmaDAT, and the weights (or frequencies) 
are stored in the ycol.


WSDEV

Stack:	--> std

WSDEV returns the weighted standard deviation value. The data-
values are stored in the xcol of SigmaDAT, and the weights (or 
frequencies) are stored in the ycol.


WVAR

Stack:	--> var

WVAR returns the weighted variance value. The data-values are 
stored in the xcol of SigmaDAT, and the weights (or frequencies) 
are stored in the ycol.


WPSDEV

Stack:	--> pstd

WPSDEV returns the weighted standard deviation for the 
population (denominator n instead of the usual n-1). The data-
values are stored in the xcol of SigmaDAT, and the weights (or 
requencies) are stored in the ycol.


WPVAR

Stack:	--> pvar

WPSTD returns the weighted variance for the population 
(denominator n instead of the usual n-1). The data-values are 
stored in the xcol of SigmaDAT, and the weights (or frequencies) 
are stored in the ycol.


COLM

Stack:	[[real]] col_nr --> [[real]]

Extracts the spesific column from an array. Same as COL- SWAP 
DROP ->COL, but faster, the result is still a column, and does not 
error if the input has only one column.


ROWM

Stack:	[[real]] row_nr --> [real]

Extracts the spesific row from an array. Same as ROW- SWAP 
DROP, but faster.


COLSW

Stack:	[[real]] col_nr1 col_nr2 --> [[real]]

Swap two columns in an array. Same as CSWP, but faster.


BINS1

Stack:	min delta nbins --> [[bins]] [below above]

Fast replacement for the built in BINS-command, with minor 
changes. Very fast in some cases, fast in other. A large number of 
bins and/or a large amount of data located right below max and 
(min+max)/2 will slow the process much more than a high number 
of data to count.

Comparison: (all data are produced by RAND, with an initial 123 
RDZ)

Program 	#data	min	delta	nbins	Time	#data/sec
BINS1		100	0	1/20	20	0.1878	532.3
BINS						2.184	9.16
BINS1		1000	0	1/100	100	2.850	350.8
BINS						44.38	22.53
BINS1		1000	0	1/10	10	0.5624	1778
BINS						41.52	24.08
BINS1		3000	0	1/100	100	7.908	380
BINS						ca. 133	ca. 22
BINS1		3000	0	1/2	2	1.266	2370
BINS1		3000	-1	1	1	0.891	3366
BINS1		1000	-1	1/100	100	3.226	310
BINS1		1000	1	1/100	100	0.726	1377

Given min, delta and nbins, it returns the number of observations 
in each cell [[ min,min+d>,[min+d,min+2d>,...,[min+(nbins-1)d, 
min+nbins*d>] and [number below, number above]
Note that the last interval is open [...> and not closed [...], as with 
BINS. The program does not handle data with exponent equal to 
499 properly, but this should not be any problem in most cases.


PTILE

Stack:	[[real]] p --> percentile
	or [real] p --> percentile

Returns the p'th percentile of the array. 0<=p<=100. Fast.


SRT (alias QSRT)

Stack:	[[real]] --> [[real]]
	or { reals } --> { reals }
	or real_meta --> real_meta
	or [[real]] col_nr --> [[real]]				
	<---- NEW
	or [[real]] -col_nr --> [[real]] (descending)		
	<---- NEW
	or { list of any objects } << program >> -> { sorted list }
	<---- NEW

This is an attempt to make a fast general purpose sort command. 
The three new "sub-commands" use a heapsort algoritm written by 
Tamir Demri, I have made some modifications and added some ml 
indexing and rearrengment routines. The other "sub-commands" 
use a quicksort algoritm written by Tamir Demri. They require 
about 1.6 times size of object to sort free space. To sort in 
decreasing order, use REV after sorting is done. SRT is very fast. It 
sorts an 1000 elem. array (made by RAND) in 1.54 sec's.

When input is [[real]] col_nr, it will sort the array by that spesific 
column, and in decreasing order if col_nr is negative.

When input is {list} << program>>, the list can contain any object, 
and different types can be mixed. The << program >> should, for 
each element in the list, return a real that is to be the sorting-key. 
For example, to sort a list of elements by their size, one would use
<< BYTES SWAP DROP >> as sorting-key. The program is quite 
fast, to sort a list of 500 random complex numbers by their 
imaginary part, only takes 4 sec. if you use :: C>Im ; as << 
program >>. The list is decomposed and rebuilt, so you may run 
into a heavy garbage-collection from time to time. Also you need 
some free space ( appx. 2 times size of list + 16 times number of 
elements) To sort lists of strings, remember that NUM only give 
you the value of the first character. You may use the built in SORT 
command, or use :: ROMPTR 412 5D #>% ; as << program >> 
(ROMPTR 412 5D is the routine to revert strings, #>% is the body 
of B->R, but it works well with strings, too.)
Tips: To randomize a list, simply use << DROP RAND >> as 
sorting-key.


TRNS

Stack:	[[real]] --> [[real]]
	or [[complex]] --> [[complex]]
	or 'local' -->
	or 'global' -->

Faster replacement for the built in TRN. Conjugates the elements 
of a complex array (like TRN does).


META

Stack:	n --> 1 2 ... n n
Makes a real meta object.
Example: 5 META --> 1 2 3 4 5 5


NDUPN

Stack: (ob n --> ob ob ... ob n)

Code written by HPZeus. Makes n copies of the object on stack 
level 2.


REV

Stack:	{list} --> {list}
	or meta --> meta
	or string --> string
	or [[real]] --> [[real]]
	or [real] --> [real]

Reverses the elements/characters/numbers of an object.


ZALPHA (algebraic)

Stack:	alpha --> Z(alpha)

For a given alpha value 0<=alpha<=1, computes the z (standard 
normally distr.) value such that P(Z>z)=alpha (=> 0 1 z UTPN = 
alpha).
Flag 54:
Clear: ZALPHA gives a (good enough) numerical approximation. 
Typical time: 0.06 sec. Accuracy is in most cases better than +/-
0.001
Set: ZALPHA gives an accurate answer.
Part of algoritm found in Numerical Recipes


TALPHA (algebraic)

Stack:	df alpha --> t(df,alpha)

For a given set of (df, alpha), df >=1, 0<=alpha<=1, TALPHA 
computes the t (student-t distr.) value such that P(T>t|df)=alpha 
(=> df t UTPT = alpha). 
Flag 54:
Clear: TALPHA gives a (good enough) numerical approximation. 
Typical time: 0.13 sec. Accuracy is in most cases better than +/-
0.01
Set: TALPHA gives an accurate answer.
Part of algoritm found in Numerical Recipes


UTPG (algebraic)

Stack:	k x --> utpg

Returns the upper tail prob. that a gamma(k) deviate should be 
greater than x.
= 1 - integral(0,x,exp(-t)*t^(k-1),t)/gamma(k)
Uses a numerical solution found in Numerical Recipes. Note that 
although the gammadistribution has two parameters, you can 
always standardize x such that only k (the shape-parameter) is 
needed. I did not want to implement the standardisation, because 
the gamma-distribution is presented in different ways in different 
books. The most usual ones require that you either multiply or 
divide x by a (a is the scale-parameter).


DIGAMMA (algebraic)

Stack:	x --> digamma(x)

Returns the digammafunction of x.
Digamma(x)=	d/dx[ln(gamma(x))]


UNI

Stack:	n -> [uniform(0,1)]
	or { n k } -> [[uniform(0,1)]]

Return a vector or array filled with RAND-numbers.


NORM

Stack:	--> norm

Generates a normally distr. value with expectation 0 and variance 
1. Norm=sqrt(-2*ln(%RAN)*SINRAD(2PI*%RAN)


NNORM

Stack:	mean sdev n --> [[real]]

Given mean, sdev and number of deviates, generates n independent 
values from a normal(mean,sdev^2) distribution, and collects them 
in a { n 1 } array. Uses a acsept/reject  algoritm (donÆt know the 
correct english phrase)


MAKERN

Stack:	n ->

Routine to draw öitemsö without replacement. (like picking 
numbered balls out of a hat.) This command make a program in the 
current directory named ÆRANnÆ where n is the number of balls to 
pick. MAKERN will error if n is greater than 125000. The size of 
the ÆRANnÆ program is n/2+20,5 bytes.

Example: Make a program to pick 5 balls.
5 MAKERN
You will now find a program named ÆRAN5Æ in your directory. 
With an initial 1 RDZ it will return:
RAN5	-> 4
RAN5	-> 3
RAN5	-> 5
RAN5	-> 2
RAN5	-> 1
RAN5	-> Error: No More Balls Left!

The ÆRANnÆ program is self-modifying, so donÆt use it from a port. 
To see that it actually is self-modifying, recall a copy of the 
program to the stack, then execute << RANn DROP >> several 
times.

 
STEST

Stack:	{list of names/objects} n --> [[number mean std. 
	lower upper]...]

Appends a program named 'TestPrg' (must exist in the same 
directory, and have stackdiagram ( ob_from_list --> real)) to each 
of the elements in the list n times, and records the results of each of 
the n trials. The order of which the list is traversed, is randomized. 
I use it mostly to test which of several procedures are the fastest 
one.

For example, to see which of SIN and COS is in general the fastest 
on to perform, I would store SIN and COS under the names 'S' and 
'C' (you are free to choose your own names), then store this 
program under the name 'TestPrg' (assuming RAD mode) 
<< RAND 6.2832 * SWAP TIM DTAG SWAP DROP >>
(TIM is in M. Heiskanen's HACK library) Then I would execute
{ S C } 50 STEST
On my calc, it returns
 [[ 1 15.38... 1.55... 14.94... 15.82...]
 [ 2 15.56... 0.94... 15.30... 15.83... ]]
indicating that even if SIN was a bit faster than COS, it was not 
signifficantly faster (because the intervals cover eachother).
The level of confidence of the interval 95%.
Note that TestPrg may take input from the user, and such be used 
as a data collector.


PREGR

Used in polynomial regression. Still not complete, but this 
will help you on the way.

Stack: [[real]] m -> [[1 r1 r1^2 ... r1^m]
		...[1 rn rn^2 ... rn^m]]
To fit a 4'th degree polynomial to the data in a column of an array, 
given that the array has two columns and the first is x, the second 
is y, I would execute the following program:

<< ->COL DROP SWAP 4 PREGR LSQ >>

A faster version of LSQ is << DUP TRNS DUP ROT * INV 
SWAP * SWAP * >>. It is not as accurate though.

Things to do:

t-tests (one and two-sample, indep. and depn.)
ANOVA (one and two-way)
Konfidence-intervals for p, my, sigma
UTPBinomial
Trigamma
Maybe simulation of some more distributions 
(exponential...yes, I know it's easy)
Chi-square tests
Any suggestions?
[ RETURN TO DIRECTORY ]