STAT03 A collection of statistical/matrix utilities for the HP48G(X) (c) 1996 by Christian Meland christian.meland@pfi.no IMPORTANT: This is the third version, and it has not been fully tested. Backup your memory. Also, I do not know if it will run on any other version than rom version R, as some of the unsupported statistics entries have moved around a bit. 0.0 DISCLAIMER ============== STAT03 is distributed in the hope that it will be useful, but the copyright holder provide the program "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. In no event will the copyright holder be liable to you for damages, including any general, special, incidental or consequential damages arising out of the use or inability to use the program. 1.0 CREDITS =========== Tamir Demri wrote the quiksort program I use in MEDI, PTILE and SRT. He also wrote the heapsort-program I modified to sort arrays with two columns (more than two columns are delt with by using indexes.) Laurent Bercot (HPZeus) wrote the NDUPN program. The code used to revert the stack is from Jim Donnelly's "An intro to hp-48 sysRPL and ass. lang." I used Mika Heiskanen's JAZZ to write the library. Scott Hyde suggested to implement harmonic and geometric means and did some beta-testing. 2.0 OVERVIEW ============ STAT03 is a collection of utilities I find useful for users working with statistics and numerical matrices. It is still mostly basic commands, but they should be of great help for those who would like to implement more advanced tests and tasks. 2.1 HISTORY =========== This is the third release. The number of commands has increased to 49, and the size to about 9,3 kb. More than 23 % is ml, the rest is sysRPL. So far I have not got much response on my programs, so maybe that's why the progress is slow :-)? If you use it, give me a word or two, suggestions ar appreciated (christian.meland@pfi.no) 3.0 INSTALLATION ================ The STAT03 library works in any port of a HP48G(X). To install STAT03 on your HP48 download the file 'stat03.lib' to your calculator and store in it a port of your choice. For example to store the library to port 0, type in (or press suitable keys) 'stat03.lib' DUP RCL SWAP PURGE 0 STO and power-cycle the calculator. The library will autoattach itself to the HOME directory. To delete the library, press ON-C, then type (press) :p: 1042 DUP DETACH PURGE where p is the port number where the library is stored. (If this makes no sense to you, the port number is 0 ;-) ) 4.0 COMMANDS ============ ->VEC (alias VEC) Stack: (real-meta or list or array --> [ vector ] ->CVEC (alias CV) Stack: (real-meta or list or array --> [[ column ]] (n*1) MANIP MANIP is used to manipulate matrices, like "let column 4 be equal to exp(column 5)". The user has access to an algebraic command CNR, that allows him/her to spesify column-numbers. Stack: ([[real]] col_nr << program >> or [[real]] col_nr 'algebraic' or [[real]] col_nr 'global' or [[real]] col_nr 'local' or [[real]] col_nr real --> [[real]] col_nr can be from 1 to p+1, where p=number of columns in input- array. If col_nr are p+1, a new column is appended, or else the spesified column is overwritten. Examples: [[1 4][2 5][3 6]] 3 << 1 CNR 2 CNR + >> MANIP -> [[1 4 5][2 5 7][3 6 9]] [[1 4 5][2 5 7][3 6 9]] 3 'Sum(I=1,3,CNR(I))' MANIP -> [[1 4 10][2 5 14][3 6 18]] CNR CNR is used in MANIP, and will not give any reaponse if used outside MANIP. Syntax: col_nr CNR or 'CNR(col_nr)' You will find that many of the M... commands are stack- replacements for some of the built-in stat. commands. Most of them also give the user some additional features, such as mean of a spesific column or mean of the entire array. MTOT Stack: [[real]] -> [ tot1 tot2 ... totp] or [[real]] col_nr -> tot(col_nr) or [[real]] 0 -> tot(whole array) MMEAN Stack: [[real]] -> [ mean1 mean2 ... meanp] or [[real]] col_nr -> mean(col_nr) or [[real]] 0 -> mean(whole array) MSDEV Stack: [[real]] -> [ sdev1 sdev2 ... sdevp] or [[real]] col_nr -> sdev(col_nr) or [[real]] 0 -> sdev(whole array) MMAX Stack: [[real]] -> [ max1 max2 ... max] or [[real]] col_nr -> max (col_nr) or [[real]] 0 -> max(whole array) MMIN Stack: [[real]] -> [ min1 min2 ... minp] or [[real]] col_nr -> min(col_nr) or [[real]] 0 -> min(whole array) MVAR Stack: [[real]] -> [ var1 var2 ... varp] or [[real]] col_nr -> var(col_nr) or [[real]] 0 -> var(whole array) MSSQ Stack: [[real]] -> [ ssq1 ssq2 ... ssqp] or [[real]] col_nr -> ssq(col_nr) or [[real]] 0 -> ssq(whole array) Sum of squares, Sum(X^2) MSXY Stack: [[real]] col_x col_y -> sum(x*y) Sum of products, Sum(x*y) MSLN Stack: [[real]] -> [ sumln1 sumln2 ... sumlnp] or [[real]] col_nr -> sumln(col_nr) or [[real]] 0 -> sumln(whole array) Sum of logaritms, Sum(ln(x)) MGMEAN Stack: [[real]] -> [ gmean1 gmean2 ... gmeanp] or [[real]] col_nr -> gmean(col_nr) or [[real]] 0 -> gmean(whole array) Geometric mean, Prod(X)^(1/n) MHMEAN Stack: [[real]] -> [ hmean1 hmean2 ... hmeanp] or [[real]] col_nr -> hmean(col_nr) or [[real]] 0 -> hmean(whole array) Harmonic mean, n/(sum(inv(x))) MMEDI Stack: [[real]] -> [ medi1 medi2 ... medip] or [[real]] col_nr -> medi(col_nr) or [[real]] 0 -> medi(whole array) Median of data. MC Stack: [[real]] -> number_of_cols or [real] -> number_of_elements Number of columns. For vectors: number of elements. MR Stack: [[real]] -> number_of_cols or [real] -> number_of_elements Number of rows. For vectors: number of elements. SQARR Stack: ([[real]] --> [[real]] or [real] --> [real]) Squares all values in an array. EADD Stack: [[real]] [[real]] -> [[real]] (both can be vectors) or [[real]] real -> [[real]] or real [[real]] -> [[real]] Addition of arrays, or addition by a constant. If two arrays are added, the number of elements must be equal, and the result will inherit the dimensions from the array on stack level 1 ESUB Stack: [[real]] [[real]] -> [[real]] (both can be vectors) or [[real]] real -> [[real]] or real [[real]] -> [[real]] Subtraction of arrays, or subtraction of a constant. If two arrays are subtracted, the number of elements must be equal, and the result will inherit the dimensions from the array on stack level 1 EMUL Stack: [[real]] [[real]] -> [[real]] (both can be vectors) or [[real]] real -> [[real]] or real [[real]] -> [[real]] Element-by-element multiplication of arrays, or multiplication by a scalar. If two arrays are multiplied, the number of elements must be equal, and the result will inherit the dimensions from the array on stack level 1 EDIV Stack: [[real]] [[real]] -> [[real]] (both can be vectors) or [[real]] real -> [[real]] or real [[real]] -> [[real]] Element-by-element division of arrays, or division by a divisor. If two arrays are divided, the number of elements must be equal, and the result will inherit the dimensions from the array on stack level 1 MEDI Stack: --> [ medians ] or median Returns the median of the array stored as SigmaDAT. Very fast (returns the median of a { 100 10 } RANDom array in 1.5 sec's). WMEAN Stack: --> mean WMEAN returns the weighted mean value. The data-values are stored in the xcol of SigmaDAT, and the weights (or frequencies) are stored in the ycol. WSDEV Stack: --> std WSDEV returns the weighted standard deviation value. The data- values are stored in the xcol of SigmaDAT, and the weights (or frequencies) are stored in the ycol. WVAR Stack: --> var WVAR returns the weighted variance value. The data-values are stored in the xcol of SigmaDAT, and the weights (or frequencies) are stored in the ycol. WPSDEV Stack: --> pstd WPSDEV returns the weighted standard deviation for the population (denominator n instead of the usual n-1). The data- values are stored in the xcol of SigmaDAT, and the weights (or requencies) are stored in the ycol. WPVAR Stack: --> pvar WPSTD returns the weighted variance for the population (denominator n instead of the usual n-1). The data-values are stored in the xcol of SigmaDAT, and the weights (or frequencies) are stored in the ycol. COLM Stack: [[real]] col_nr --> [[real]] Extracts the spesific column from an array. Same as COL- SWAP DROP ->COL, but faster, the result is still a column, and does not error if the input has only one column. ROWM Stack: [[real]] row_nr --> [real] Extracts the spesific row from an array. Same as ROW- SWAP DROP, but faster. COLSW Stack: [[real]] col_nr1 col_nr2 --> [[real]] Swap two columns in an array. Same as CSWP, but faster. BINS1 Stack: min delta nbins --> [[bins]] [below above] Fast replacement for the built in BINS-command, with minor changes. Very fast in some cases, fast in other. A large number of bins and/or a large amount of data located right below max and (min+max)/2 will slow the process much more than a high number of data to count. Comparison: (all data are produced by RAND, with an initial 123 RDZ) Program #data min delta nbins Time #data/sec BINS1 100 0 1/20 20 0.1878 532.3 BINS 2.184 9.16 BINS1 1000 0 1/100 100 2.850 350.8 BINS 44.38 22.53 BINS1 1000 0 1/10 10 0.5624 1778 BINS 41.52 24.08 BINS1 3000 0 1/100 100 7.908 380 BINS ca. 133 ca. 22 BINS1 3000 0 1/2 2 1.266 2370 BINS1 3000 -1 1 1 0.891 3366 BINS1 1000 -1 1/100 100 3.226 310 BINS1 1000 1 1/100 100 0.726 1377 Given min, delta and nbins, it returns the number of observations in each cell [[ min,min+d>,[min+d,min+2d>,...,[min+(nbins-1)d, min+nbins*d>] and [number below, number above] Note that the last interval is open [...> and not closed [...], as with BINS. The program does not handle data with exponent equal to 499 properly, but this should not be any problem in most cases. PTILE Stack: [[real]] p --> percentile or [real] p --> percentile Returns the p'th percentile of the array. 0<=p<=100. Fast. SRT (alias QSRT) Stack: [[real]] --> [[real]] or { reals } --> { reals } or real_meta --> real_meta or [[real]] col_nr --> [[real]] <---- NEW or [[real]] -col_nr --> [[real]] (descending) <---- NEW or { list of any objects } << program >> -> { sorted list } <---- NEW This is an attempt to make a fast general purpose sort command. The three new "sub-commands" use a heapsort algoritm written by Tamir Demri, I have made some modifications and added some ml indexing and rearrengment routines. The other "sub-commands" use a quicksort algoritm written by Tamir Demri. They require about 1.6 times size of object to sort free space. To sort in decreasing order, use REV after sorting is done. SRT is very fast. It sorts an 1000 elem. array (made by RAND) in 1.54 sec's. When input is [[real]] col_nr, it will sort the array by that spesific column, and in decreasing order if col_nr is negative. When input is {list} << program>>, the list can contain any object, and different types can be mixed. The << program >> should, for each element in the list, return a real that is to be the sorting-key. For example, to sort a list of elements by their size, one would use << BYTES SWAP DROP >> as sorting-key. The program is quite fast, to sort a list of 500 random complex numbers by their imaginary part, only takes 4 sec. if you use :: C>Im ; as << program >>. The list is decomposed and rebuilt, so you may run into a heavy garbage-collection from time to time. Also you need some free space ( appx. 2 times size of list + 16 times number of elements) To sort lists of strings, remember that NUM only give you the value of the first character. You may use the built in SORT command, or use :: ROMPTR 412 5D #>% ; as << program >> (ROMPTR 412 5D is the routine to revert strings, #>% is the body of B->R, but it works well with strings, too.) Tips: To randomize a list, simply use << DROP RAND >> as sorting-key. TRNS Stack: [[real]] --> [[real]] or [[complex]] --> [[complex]] or 'local' --> or 'global' --> Faster replacement for the built in TRN. Conjugates the elements of a complex array (like TRN does). META Stack: n --> 1 2 ... n n Makes a real meta object. Example: 5 META --> 1 2 3 4 5 5 NDUPN Stack: (ob n --> ob ob ... ob n) Code written by HPZeus. Makes n copies of the object on stack level 2. REV Stack: {list} --> {list} or meta --> meta or string --> string or [[real]] --> [[real]] or [real] --> [real] Reverses the elements/characters/numbers of an object. ZALPHA (algebraic) Stack: alpha --> Z(alpha) For a given alpha value 0<=alpha<=1, computes the z (standard normally distr.) value such that P(Z>z)=alpha (=> 0 1 z UTPN = alpha). Flag 54: Clear: ZALPHA gives a (good enough) numerical approximation. Typical time: 0.06 sec. Accuracy is in most cases better than +/- 0.001 Set: ZALPHA gives an accurate answer. Part of algoritm found in Numerical Recipes TALPHA (algebraic) Stack: df alpha --> t(df,alpha) For a given set of (df, alpha), df >=1, 0<=alpha<=1, TALPHA computes the t (student-t distr.) value such that P(T>t|df)=alpha (=> df t UTPT = alpha). Flag 54: Clear: TALPHA gives a (good enough) numerical approximation. Typical time: 0.13 sec. Accuracy is in most cases better than +/- 0.01 Set: TALPHA gives an accurate answer. Part of algoritm found in Numerical Recipes UTPG (algebraic) Stack: k x --> utpg Returns the upper tail prob. that a gamma(k) deviate should be greater than x. = 1 - integral(0,x,exp(-t)*t^(k-1),t)/gamma(k) Uses a numerical solution found in Numerical Recipes. Note that although the gammadistribution has two parameters, you can always standardize x such that only k (the shape-parameter) is needed. I did not want to implement the standardisation, because the gamma-distribution is presented in different ways in different books. The most usual ones require that you either multiply or divide x by a (a is the scale-parameter). DIGAMMA (algebraic) Stack: x --> digamma(x) Returns the digammafunction of x. Digamma(x)= d/dx[ln(gamma(x))] UNI Stack: n -> [uniform(0,1)] or { n k } -> [[uniform(0,1)]] Return a vector or array filled with RAND-numbers. NORM Stack: --> norm Generates a normally distr. value with expectation 0 and variance 1. Norm=sqrt(-2*ln(%RAN)*SINRAD(2PI*%RAN) NNORM Stack: mean sdev n --> [[real]] Given mean, sdev and number of deviates, generates n independent values from a normal(mean,sdev^2) distribution, and collects them in a { n 1 } array. Uses a acsept/reject algoritm (don’t know the correct english phrase) MAKERN Stack: n -> Routine to draw ”items” without replacement. (like picking numbered balls out of a hat.) This command make a program in the current directory named ’RANn’ where n is the number of balls to pick. MAKERN will error if n is greater than 125000. The size of the ’RANn’ program is n/2+20,5 bytes. Example: Make a program to pick 5 balls. 5 MAKERN You will now find a program named ’RAN5’ in your directory. With an initial 1 RDZ it will return: RAN5 -> 4 RAN5 -> 3 RAN5 -> 5 RAN5 -> 2 RAN5 -> 1 RAN5 -> Error: No More Balls Left! The ’RANn’ program is self-modifying, so don’t use it from a port. To see that it actually is self-modifying, recall a copy of the program to the stack, then execute << RANn DROP >> several times. STEST Stack: {list of names/objects} n --> [[number mean std. lower upper]...] Appends a program named 'TestPrg' (must exist in the same directory, and have stackdiagram ( ob_from_list --> real)) to each of the elements in the list n times, and records the results of each of the n trials. The order of which the list is traversed, is randomized. I use it mostly to test which of several procedures are the fastest one. For example, to see which of SIN and COS is in general the fastest on to perform, I would store SIN and COS under the names 'S' and 'C' (you are free to choose your own names), then store this program under the name 'TestPrg' (assuming RAD mode) << RAND 6.2832 * SWAP TIM DTAG SWAP DROP >> (TIM is in M. Heiskanen's HACK library) Then I would execute { S C } 50 STEST On my calc, it returns [[ 1 15.38... 1.55... 14.94... 15.82...] [ 2 15.56... 0.94... 15.30... 15.83... ]] indicating that even if SIN was a bit faster than COS, it was not signifficantly faster (because the intervals cover eachother). The level of confidence of the interval 95%. Note that TestPrg may take input from the user, and such be used as a data collector. PREGR Used in polynomial regression. Still not complete, but this will help you on the way. Stack: [[real]] m -> [[1 r1 r1^2 ... r1^m] ...[1 rn rn^2 ... rn^m]] To fit a 4'th degree polynomial to the data in a column of an array, given that the array has two columns and the first is x, the second is y, I would execute the following program: << ->COL DROP SWAP 4 PREGR LSQ >> A faster version of LSQ is << DUP TRNS DUP ROT * INV SWAP * SWAP * >>. It is not as accurate though. Things to do: t-tests (one and two-sample, indep. and depn.) ANOVA (one and two-way) Konfidence-intervals for p, my, sigma UTPBinomial Trigamma Maybe simulation of some more distributions (exponential...yes, I know it's easy) Chi-square tests Any suggestions?