(*
 * LANGUAGE    : ANS Forth
 * PROJECT     : Forth Environments
 * DESCRIPTION : Estimate MFLOPS rating
 * CATEGORY    : Benchmark
 * AUTHOR      : Marcel Hendrix
 * LAST CHANGE : January 13, 2001, Marcel Hendrix
 *)

        NEEDS -miscutil

        REVISION -flops "─── MFLOPS rating Version 0.00 ───"

DOC
(*
   Flops.c is a 'c' program which attempts to estimate your system's
   floating-point 'MFLOPS' rating for the FADD, FSUB, FMUL, and FDIV
   operations based on specific 'instruction mixes' (discussed below).
   The program provides an estimate of PEAK MFLOPS performance by making
   maximal use of register variables with minimal interaction with main
   memory. The execution loops are all small so that they will fit in
   any cache. Flops.c can be used along with Linpack and the Livermore
   kernels (which exercise memory much more extensively) to gain further
   insight into the limits of system performance. The flops.c execution
   modules also include various percent weightings of FDIV's (from 0% to
   25% FDIV's) so that the range of performance can be obtained when
   using FDIV's. FDIV's, being computationally more intensive than
   FADD's or FMUL's, can impact performance considerably on some systems.

   Flops.c consists of 8 independent modules (routines) which, except for
   module 2, conduct numerical integration of various functions. Module 2
   estimates the value of pi based upon the Maclaurin series expansion of
   atan(1). MFLOPS ratings are provided for each module, but the
   program's overall results are summarized by the MFLOPS(1), MFLOPS(2),
   MFLOPS(3), and MFLOPS(4) outputs.

   The MFLOPS(1) result is identical to the result provided by all
   previous versions of flops.c. It is based only upon the results from
   modules 2 and 3. Two problems surfaced in using MFLOPS(1). First, it
   was difficult to completely 'vectorize' the result due to the
   recurrence of the 's' variable in module 2.
   This problem is addressed in the MFLOPS(2) result which does not use
   module 2, but maintains nearly the same weighting of FDIV's (9.2%) as
   in MFLOPS(1) (9.6%). The second problem with MFLOPS(1) centers around
   the percentage of FDIV's (9.6%) which was viewed as too high for an
   important class of problems. This concern is addressed in the
   MFLOPS(3) result, where FDIV's drop to 3.4%, and in the MFLOPS(4)
   result, where NO FDIV's are conducted at all.

   The number of floating-point instructions per iteration (loop) is
   given below for each module executed:

   MODULE   FADD   FSUB   FMUL   FDIV   TOTAL  Comment
     1        7      0      6      1      14   7.1%  FDIV's
     2        3      2      1      1       7   difficult to vectorize.
     3        6      2      9      0      17   0.0%  FDIV's
     4        7      0      8      0      15   0.0%  FDIV's
     5       13      0     15      1      29   3.4%  FDIV's
     6       13      0     16      0      29   0.0%  FDIV's
     7        3      3      3      3      12   25.0% FDIV's
     8       13      0     17      0      30   0.0%  FDIV's

   A*2+3     21     12     14      5      52   A=5, MFLOPS(1), Same as
            40.4%  23.1%  26.9%   9.6%         previous versions of the
                                               flops.c program. Includes
                                               only Modules 2 and 3, does
                                               9.6% FDIV's, and is not
                                               easily vectorizable.

   1+3+4     58     14     66     14     152   A=4, MFLOPS(2), New output
   +5+6+    38.2%   9.2%  43.4%   9.2%         does not include Module 2,
   A*7                                         but does 9.2% FDIV's.

   1+3+4     62      5     74      5     146   A=0, MFLOPS(3), New output
   +5+6+    42.5%   3.4%  50.7%   3.4%         does not include Module 2,
   7+8                                         but does 3.4% FDIV's.

   3+4+6     39      2     50      0      91   A=0, MFLOPS(4), New output
   +8       42.9%   2.2%  54.9%   0.0%         does not include Module 2,
                                               and does NO FDIV's.

   NOTE: Various timer routines are included as indicated below. The
         timer routines, with some comments, are attached at the end
         of the main program.

   NOTE: Please do not remove any of the printouts.

   EXAMPLE COMPILATION:
   UNIX based systems
       cc -DUNIX -O flops.c -o flops
       cc -DUNIX -DROPT flops.c -o flops
       cc -DUNIX -fast -O4 flops.c -o flops
       .
       .
       .
     etc.

   Al Aburto
   aburto@nosc.mil
*)
ENDDOC

 1e                     FCONSTANT A0
-0.1666666666671334e    FCONSTANT A1
 0.833333333809067e-2   FCONSTANT A2
 0.198412715551283e-3   FVALUE    A3    -- changeable ...
 0.27557589750762e-5    FCONSTANT A4
 0.2507059876207e-7     FVALUE    A5    -- changeable ...
 0.164105986683e-9      FCONSTANT A6

 1e                     FCONSTANT B0
-0.4999999999982e       FCONSTANT B1
 0.4166666664651e-1     FCONSTANT B2
-0.1388888805755e-2     FCONSTANT B3
 0.24801428034e-4       FCONSTANT B4
-0.2754213324e-6        FCONSTANT B5
 0.20189405e-8          FCONSTANT B6

 0.3999999946405e-1     FCONSTANT D1
 0.96e-3                FCONSTANT D2
 0.1233153e-5           FCONSTANT D3

 0.48e-3                FCONSTANT E2
 0.411051e-6            FCONSTANT E3

(* **************************************************
   Set Variable Values.
   T[1] references all timing results relative to
   one million loops.

   The program will execute from 31250 to NLimit
   loops, based on a runtime of Module 1 of at least
   TLimit. That is, the runtime of Module 1 is used
   to determine the number of loops to execute.
   No more than NLimit loops are allowed.

   The original flops.c used TLimit = 15.0 seconds
   and NLimit = 512000000; this port uses
   TLimit = 1.5 seconds and NLimit = 51200000.
   ************************************************** *)

#1500     =: TLimit     -- 1.5 seconds ( was 15000 )
#51200000 =: NLimit     -- maximum number of loops ( was 512000000 )
#15625    =: loops      -- Initial number of loops, DO NOT CHANGE!

: FVALUES 0 ?DO 0e FVALUE LOOP ;

8 FVALUES T[0]  T[1]  T[2]  T[3]  T[4]  T[5]  T[6]  T[7]
8 FVALUES T[8]  T[9]  T[10] T[11] T[12] T[13] T[14] T[15]
8 FVALUES T[16] T[17] T[18] T[19] T[20] T[21] T[22] T[23]
8 FVALUES T[24] T[25] T[26] T[27] T[28] T[29] T[30] T[31]
4 FVALUES T[32] T[33] T[34] T[35]

0e FVALUE scale
0e FVALUE nulltime
loops VALUE m

\ #128 ALLOT \ ( n ) ALLOT

\ Elapsed time in seconds since TIMER-RESET.
: FMS?  ( F: -- time )  MS? S>F 1e-3 F* ;

\ Elapsed time scaled to usec per loop, with the empty-loop time removed.
: SFMS? ( F: -- stime ) FMS? scale F* nulltime F- ;

(* *************************************************
   Module 1. Calculate integral of df(x)/f(x) defined
   below. Result is ln(f(1)). There are 14
   double precision operations per loop
   ( 7 +, 0 -, 6 *, 1 / ) that are included in the timing.
   50.0% +, 00.0% -, 42.9% *, and 07.1% /
   ************************************************* *)
: MODULE-1 ( -- )
    loops 0 LOCALS| sa n |
    1e 0e 0e 0e 0e FLOCALS| xx uu ss vv ww |
    BEGIN  sa TLimit <
    WHILE  0e TO ss  0e TO vv  1e TO ww
           n 2* DUP TO n  S>F 1/F TO xx
           TIMER-RESET
           n 1 DO  ww +TO vv
                   vv xx F* TO uu
                   uu D3 F* D2 F+ uu F* D1 F+
                   uu E3 F* E2 F+ uu F* D1 F+ uu F* ww F+ F/ +TO ss
             LOOP
           MS? TO sa
           n NLimit >=
    UNTIL THEN
    1e6 n S>F F/ FDUP TO scale TO T[1]
    TIMER-RESET  n 0 DO LOOP  FMS? scale F* 0e FMAX TO nulltime
    sa S>F 1e-3 F* scale F* nulltime F- TO T[2]
    T[2] 14e ( #flops?) F/ FDUP TO T[3] 1/F TO T[4]
    D1 D2 F+ D3 F+  D1 E2 F+ E3 F+ F1+ F/  D1 F+
    ss F2* F+ F2/ xx F* 1/F
    FDUP 40e3 F* scale F/ F>S TO m ( note: m is multiple of 4 )
    ( sb) 25.2e F- ( error)
    ." 1 " ( error) #13 E.R T[2] #12 F.R T[4] #12 F.R ;

(* *****************************************************
   Module 2. Calculate value of PI from Taylor Series
   expansion of atan(1.0). There are 7
   double precision operations per loop
   ( 3 +, 2 -, 1 *, 1 / ) that are included in the timing.
   42.9% +, 28.6% -, 14.3% *, and 14.3% /
   ***************************************************** *)
: MODULE-2 ( -- )
    1e 0e 0e 0e 0e -1e FLOCALS| sa xx uu ss vv ww |
    TIMER-RESET ( note: m is multiple of 4 )
    m 2/ 0 DO  2e +TO uu  2e +TO uu  LOOP
    FMS? scale F* 0e FMAX TO T[5]
    sa TO uu  0e TO vv  0e TO ww  0e TO xx
    TIMER-RESET
    m 2/ 0 DO  uu 2e F+
                5e FOVER F- +TO xx
               FDUP -5e F*  +TO vv
                5e FOVER F/ +TO ww
               2e F+ FDUP TO uu
               -5e FOVER F- +TO xx
               FDUP  5e F*  +TO vv
               -5e FSWAP F/ +TO ww
         LOOP
    FMS? scale F* TO T[6]
    T[6] T[5] F- 7e ( #flops) F/ FDUP TO T[7] 1/F TO T[8]
    sa xx F* m S>F F/ F>S TO m
    ww 4e F* 5e F/  5e vv F/ F+ ( sb)
    31.25e vv FDUP FSQR F* F/ F-
    PI F- ( pi_error)
    CR ." 2 " #13 E.R T[6] T[5] F- #12 F.R T[8] #12 F.R ;

(* ******************************************************
   Module 3. Calculate integral of sin(x) from 0.0 to
   PI/3.0 using the Trapezoidal Method. Result is 0.5.
   There are 17 double precision operations per loop
   (6 +, 2 -, 9 *, 0 /) included in the timing.
   35.3% +, 11.8% -, 52.9% *, and 00.0% /
   ****************************************************** *)
: MODULE-3 ( -- )
    \ Module 3 subtracts A3 and A5, so they must be positive here.
    \ MODULE-4 negates them; restore the magnitudes in case MAIN is rerun.
    A3 FABS TO A3  A5 FABS TO A5
    PI m 3 * S>F F/ 0e 0e 0e 0e FLOCALS| uu ss vv ww xx |
    TIMER-RESET
    m 1 DO  1e +TO vv
            vv xx F* TO uu
            uu FSQR
            A6 FOVER F* A5 F- FOVER F* A4 F+ FOVER F* A3 F-
               FOVER F* A2 F+ FOVER F* A1 F+ F* F1+
            uu F* +TO ss
      LOOP
    SFMS? FDUP TO T[9] 17e F/ FDUP TO T[10] 1/F TO T[11]
    PI 3e F/ TO uu  uu FSQR TO ww
    A6 ww F* A5 F- ww F* A4 F+ ww F* A3 F- ww F* A2 F+ ww F* A1 F+ ww F*
    1e F+ uu F* ( sa)
    ss F2* F+ F2/ xx F*
    0.5e F- ( error )
    CR ." 3 " #13 E.R T[9] #12 F.R T[11] #12 F.R ;

(* ***********************************************************
   Module 4. Calculate Integral of cos(x) from 0.0 to PI/3
   using the Trapezoidal Method. Result is sin(PI/3).
   There are 15 double precision operations per loop
   (7 +, 0 -, 8 *, and 0 / ) included in the timing.
   46.7% +, 00.0% -, 53.3% *, 00.0% /
   *********************************************************** *)
: MODULE-4 ( -- )
    \ From here on A3 and A5 are added, so flip their signs.
    A3 FNEGATE TO A3  A5 FNEGATE TO A5
    PI m 3 * S>F F/ 0e 0e 0e 0e FLOCALS| uu ss vv ww xx |
    TIMER-RESET
    m 1 DO  I S>F xx F* FSQR
            B6 FOVER F* B5 F+ FOVER F* B4 F+ FOVER F* B3 F+
               FOVER F* B2 F+ FOVER F* B1 F+ FSWAP F* F1+
            +TO ss
      LOOP
    SFMS? FDUP TO T[12] 15e F/ FDUP TO T[13] 1/F TO T[14]
    PI 3e F/ TO uu  uu FSQR TO ww
    B6 ww F* B5 F+ ww F* B4 F+ ww F* B3 F+ ww F* B2 F+ ww F* B1 F+ ww F*
    F1+ ( sa)
    ss F2* F1+ F+ F2/ xx F* ( sa )
    A6 ww F* A5 F+ ww F* A4 F+ ww F* A3 F+ ww F* A2 F+ ww F* A1 F+ ww F*
    A0 F+ uu F* ( sb)
    F- ( error )
    CR ." 4 " #13 E.R T[12] #12 F.R T[14] #12 F.R ;

(* ***********************************************************
   Module 5. Calculate Integral of tan(x) from 0.0 to PI/3
   using the Trapezoidal Method. Result is
   -ln(cos(PI/3)) = ln(2). There are 29 double precision
   operations per loop (13 +, 0 -, 15 *, and 1 /)
   included in the timing.
   44.8% +, 00.0% -, 51.7% *, and 03.4% /
   *********************************************************** *)
: MODULE-5 ( -- )
    PI m 3 * S>F F/ 0e 0e 0e 0e FLOCALS| uu ss vv ww xx |
    TIMER-RESET
    m 1 DO  I S>F xx F* FDUP FSQR FSWAP FOVER ( ww uu ww )
            A6 FOVER F* A5 F+ FOVER F* A4 F+ FOVER F* A3 F+
               FOVER F* A2 F+ FOVER F* A1 F+ F* F1+ F* ( ww vv )
            FSWAP
            B6 FOVER F* B5 F+ FOVER F* B4 F+ FOVER F* B3 F+
               FOVER F* B2 F+ FOVER F* B1 F+ F* F1+
            F/ +TO ss
      LOOP
    SFMS? FDUP TO T[15] 29e F/ FDUP TO T[16] 1/F TO T[17]
    PI 3e F/ TO uu  uu FSQR TO ww
    A6 ww F* A5 F+ ww F* A4 F+ ww F* A3 F+ ww F* A2 F+ ww F* A1 F+ ww F*
    F1+ uu F* ( sa )
    B6 ww F* B5 F+ ww F* B4 F+ ww F* B3 F+ ww F* B2 F+ ww F* B1 F+ ww F*
    F1+ F/ ( sa/sb )
    ss F2* F+ F2/ xx F*
    0.6931471805599453e F- ( error )
    CR ." 5 " #13 E.R T[15] #12 F.R T[17] #12 F.R ;

(* ***********************************************************
   Module 6. Calculate Integral of sin(x)*cos(x) from 0.0 to
   PI/4 using the Trapezoidal Method. Result is
   sin(PI/4)^2 / 2 = 0.25. There are 29 double precision
   operations per loop (13 +, 0 -, 16 *, and 0 /)
   included in the timing.
   44.8% +, 00.0% -, 55.2% *, and 00.0% /
   *********************************************************** *)
: MODULE-6 ( -- )
    PI m 4 * S>F F/ 0e 0e 0e 0e FLOCALS| uu ss vv ww xx |
    TIMER-RESET
    m 1 DO  I S>F xx F* FDUP FSQR ( uu ww ) FSWAP FOVER ( ww uu ww )
            A6 FOVER F* A5 F+ FOVER F* A4 F+ FOVER F* A3 F+
               FOVER F* A2 F+ FOVER F* A1 F+ F* F1+ F* ( vv )
            FSWAP
            B6 FOVER F* B5 F+ FOVER F* B4 F+ FOVER F* B3 F+
               FOVER F* B2 F+ FOVER F* B1 F+ F* F1+
            F* +TO ss
      LOOP
    SFMS? FDUP TO T[18] 29e F/ FDUP TO T[19] 1/F TO T[20]
    PI 4e F/ TO uu  uu FSQR TO ww
    A6 ww F* A5 F+ ww F* A4 F+ ww F* A3 F+ ww F* A2 F+ ww F* A1 F+ ww F*
    F1+ uu F* ( sa )
    B6 ww F* B5 F+ ww F* B4 F+ ww F* B3 F+ ww F* B2 F+ ww F* B1 F+ ww F*
    F1+ F* ( sa*sb )
    ss F2* F+ F2/ xx F*
    0.25e F- ( error )
    CR ." 6 " #13 E.R T[18] #12 F.R T[20] #12 F.R ;

(* ******************************************************
   Module 7.
   Calculate value of the definite integral
   from 0 to sa of 1/(x+1), x/(x*x+1), and x*x/(x*x*x+1)
   using the Trapezoidal Rule. There are 12 double
   precision operations per loop ( 3 +, 3 -, 3 *, and 3 / )
   that are included in the timing.
   25.0% +, 25.0% -, 25.0% *, and 25.0% /
   ****************************************************** *)
: MODULE-7 ( -- )
    0e 0e 0e 0e 1e 102.3321513995275e FLOCALS| sa ww ss uu vv xx |
    sa m S>F F/ TO vv
    TIMER-RESET
    m 1 DO  I S>F vv F* FDUP TO xx ( xx) FSQR TO uu
            ss  ww FDUP xx F+ F/
                xx uu ww F+ F/ F+
                uu FDUP xx F* ww F+ F/ F+
            F- TO ss
      LOOP
    SFMS? FDUP TO T[21] 12e F/ FDUP TO T[22] 1/F TO T[23]
    sa TO xx  xx FSQR TO uu
    ww  ww xx ww F+ F/ F+
        xx uu ww F+ F/ F+
        uu xx uu F* ww F+ F/ F+
    FNEGATE ( sa)
    ss F2* F+ vv F* 18e F* TO sa
    sa -2000e F* scale F/ F>S TO m
    sa 500.2e F+ ( error)
    CR ." 7 " #13 E.R T[21] #12 F.R T[23] #12 F.R ;

(* ***********************************************************
   Module 8. Calculate Integral of sin(x)*cos(x)*cos(x)
   from 0 to PI/3 using the Trapezoidal Method. Result is
   (1-cos(PI/3)^3)/3. There are 30 double precision
   operations per loop included in the timing:
   13 +, 0 -, 17 *, 0 /
   43.3% +, 00.0% -, 56.7% *, and 00.0% /
   *********************************************************** *)
: MODULE-8 ( -- )
    PI m 3 * S>F F/ 0e 0e 0e 0e FLOCALS| uu ss vv ww xx |
    TIMER-RESET
    m 1 DO  I S>F xx F* FDUP FSQR ( uu ww )
            B6 FOVER F* B5 F+ FOVER F* B4 F+ FOVER F* B3 F+
               FOVER F* B2 F+ FOVER F* B1 F+ FOVER F* F1+
            FSQR ( vv^2 ) FSWAP
            A6 FOVER F* A5 F+ FOVER F* A4 F+ FOVER F* A3 F+
               FOVER F* A2 F+ FOVER F* A1 F+ F* F1+ ( uu vv^2 w )
            F* F* +TO ss
      LOOP
    SFMS? FDUP TO T[24] 30e F/ FDUP TO T[25] 1/F TO T[26]
    PI 3e F/ TO uu  uu FSQR TO ww
    A6 ww F* A5 F+ ww F* A4 F+ ww F* A3 F+ ww F* A2 F+ ww F* A1 F+ ww F*
    F1+ uu F* ( sa )
    B6 ww F* B5 F+ ww F* B4 F+ ww F* B3 F+ ww F* B2 F+ ww F* B1 F+ ww F*
    F1+ FSQR F* ( sa*sb^2 )
    ss F2* F+ F2/ xx F*
    0.29166666666666667e F- ( error )
    CR ." 8 " #13 E.R T[24] #12 F.R T[26] #12 F.R ;

(* *************************************************
   MFLOPS(1) output. This is the same weighting
   used for all previous versions of the flops.c
   program. Includes Modules 2 and 3 only.
   ************************************************* *)
: MFLOPS(1) ( F: -- mflops )
    T[6] T[5] F- 5e F* T[9] F+ 52e F/
    FDUP TO T[27] 1/F FDUP TO T[28] ;

(* *************************************************
   MFLOPS(2) output. This output does not include
   Module 2, but it still does 9.2% FDIV's.
   ************************************************* *)
: MFLOPS(2) ( F: -- mflops )
    T[2] T[9] F+ T[12] F+ T[15] F+ T[18] F+ T[21] 4e F* F+ 152e F/
    FDUP TO T[29] 1/F FDUP TO T[30] ;

(* *************************************************
   MFLOPS(3) output. This output does not include
   Module 2, but it still does 3.4% FDIV's.
   ************************************************* *)
: MFLOPS(3) ( F: -- mflops )
    T[2] T[9] F+ T[12] F+ T[15] F+ T[18] F+ T[21] F+ T[24] F+ 146e F/
    FDUP TO T[31] 1/F FDUP TO T[32] ;

(* *************************************************
   MFLOPS(4) output. This output does not include
   Module 2, and it does NO FDIV's.
   ************************************************* *)
: MFLOPS(4) ( F: -- mflops )
    T[9] T[12] F+ T[18] F+ T[24] F+ 91e F/
    FDUP TO T[33] 1/F FDUP TO T[34] ;

: MAIN ( -- )
    PRECISION >R 4 SET-PRECISION
    CR ." FLOPS Forth Program (Double Precision), V2.0 14 Jan 2001"
    CR
    1e6 loops S>F F/ TO scale
    loops TO m
    CR ." Module Error RunTime MFLOPS"
    CR ." (usec)"
    CR MODULE-1 MODULE-2 MODULE-3 MODULE-4
       MODULE-5 MODULE-6 MODULE-7 MODULE-8
    CR
    CR ." Iterations = " m EOL #10 .R
    CR ." NullTime (usec) = " nulltime EOL #10 F.R
    CR ." MFLOPS(1) = " MFLOPS(1) EOL #10 F.R
    CR ." MFLOPS(2) = " MFLOPS(2) EOL #10 F.R
    CR ." MFLOPS(3) = " MFLOPS(3) EOL #10 F.R
    CR ." MFLOPS(4) = " MFLOPS(4) EOL #10 F.R
    CR R> SET-PRECISION ;

DOC
(*
   FLOPS Forth Program (Double Precision), V2.0 14 Jan 2001

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      5.6843E-13      0.5690     24.6046
     2     -7.7639E-14      0.1927     36.3165
     3     -9.5260E-15      0.9010     18.8679
     4      3.9895E-14      0.3910     38.3632
     5      2.4366E-14      1.3285     21.8291
     6     -3.4120E-15      1.1142     26.0265
     7     -5.1273E-11      1.0185     11.7820
     8      3.0153E-14      1.1215     26.7499

   Iterations      =    4001600
   NullTime (usec) =     0.0360
   MFLOPS(1)       =    27.8858
   MFLOPS(2)       =    18.1433
   MFLOPS(3)       =    22.6576
   MFLOPS(4)       =    25.7955
*)
ENDDOC

:ABOUT  CR ." Try: MAIN " ;

        .ABOUT -flops CR

                              (* End of Source *)