(* * LANGUAGE : ANS Forth * PROJECT : Forth Environments * DESCRIPTION : check list of URLs for new data * CATEGORY : Tool * AUTHOR : Marcel Hendrix * LAST CHANGE : August 18, 2002, Marcel Hendrix *) NEEDS -miscutil NEEDS -pipes NEEDS -assemble REVISION -scooter "ÄÄÄ Scooter - URL fetch Version 1.00 ÄÄÄ" PRIVATES DOC (* WHAT ---- This program uses the w3m netbrowser. Communication is through a pipe. The idea is to check if any of the URLs in the "saved-urls.dat" file have changed since last time we checked. This is done by computing the 32-bit CRC of these pages. You can check URLs in parallel with browsing the 'net in the "normal" way. BUGS ---- There is a large timeout when a page is off-line or not accessible. I don't see an easy fix for that. (It may be possible to change ACCUMULATE to have a time-out.) EXAMPLE OUTPUT -------------- FORTH> update-crcs 0 stable :: http://www.tim-mann.org/chess.html/index.html 1 stable :: http://home.iae.nl/users/mhx/index.html 2 stable :: http://www.bagley.org/~doug/shootout/index.html 3 stable :: http://www.cs.utk.edu/~ghenry/distrib/archive.htm 4 stable :: http://www98.phys.virginia.edu/classes/551.jvn.fall01/index.html 5 stable :: http://www.azillionmonkeys.com/qed/asm.html 6 stable :: http://www.cs.bell-labs.com/cm/cs/pearls/index.html 7 stable :: http://algoart.com/index.html 8 stable :: http://www-cs-faculty.stanford.edu/~knuth/index.html 9 stable :: http://home.hccnet.nl/a.w.m.van.der.horst/index.html 10 stable :: http://pweb.de.uu.net/schwalm.hb/index.html 11 stable :: http://www.jwdt.com/~paysan/bigforth.html 12 stable :: http://www.sunweb.ch/custom/epprecht/index.html 13 stable :: http://www.colorforth.com/index.html 14 stable :: http://www.fig-uk.org/codeindex/index.html 15 stable :: http://www.fig-uk.org/index.html 16 stable :: http://www.geocities.com/forthlinks/index.html 17 stable :: http://home.earthlink.net/~neilbawd/index.html 18 stable :: http://www.quartus.net/discus/index.html 19 stable :: http://www.forth.org.ru/index.html 20 stable :: http://dec.bournemouth.ac.uk/forth/index.html ok *) ENDDOC -- Tools ----------------------------------------------------------------- -- I forgot how to do this in high-level CODE crc32 ( n1 char -- n2 ) rpush, ebx pop, edx -> [esp] xchg, \ pop crc to edx 8 b# -> ecx mov, \ loop count @@1: edx shr, \ shift crc bh rcr, bl ror, \ shift character ebx -> eax mov, \ save character bh -> bl xor, @@2 offset SHORT jns, \ skip if equal $EDB88320 d# -> edx xor, \ crc-32 polymial 1 04C1 1DB7 @@2: eax -> ebx mov, \ restore character @@1 loop, \ next bit edx -> [esp] xchg, \ crc to tos rpop, ebx jmp, END-CODE PRIVATE \ calculate crc-32 of string : crc-32 ( c-addr u -- n2 ) -1 -ROT BOUNDS ?DO I C@ crc32 LOOP INVERT ;P : ACCUMULATE ( c-addr umax handle -- c-addr u ) 0 3 PICK LOCALS| start sz handle umax addr | BEGIN umax DUP WHILE addr SWAP handle READ-FILE ?FILE DUP WHILE DUP +TO sz addr umax ROT /STRING TO umax TO addr REPEATED DROP start sz ;P -- http access ----------------------------------------------------------- : GET-URL ( c-addr1 u1 -- c-addr2 u2 ) S" c:/w3m/w3m.exe -dump " 2SWAP $+ R/O OPEN-PIPE ?FILE >S PAD #100000 S ACCUMULATE ( -- c-addr u ) S> CLOSE-PIPE ?FILE ; : .URL ( c-addr u -- ) GET-URL CR TYPE CR ; : URL->CRC ( c-addr1 u1 -- c-addr2 u2 u3 ) GET-URL crc-32 ; -- A small URL database -------------------------------------------------- 0 VALUE rec# PRIVATE CREATE URL-DBASE PRIVATE HERE #100 CELLS DUP ALLOT ERASE : cleanup ( u -- ) DROP URL-DBASE rec# 0 ?DO @+ FREE ?ALLOCATE LOOP DROP CLEAR rec# URL-DBASE #100 CELLS ERASE ;P : ADD-URL ( c-addr u -- ) rec# 100 U>= ABORT" URL-DBASE full" DUP 2 CELLS + ALLOCATE ?ALLOCATE ( -- c-addr u addr ) CELL+ CELLPACK CELL- URL-DBASE rec# CELL[] ! 1 +TO rec# ;P ' cleanup IS-FORGET ADD-URL : ADD-CRC URL-DBASE []CELL @ ! ;P ( u rec# -- ) : ADD-DATA rec# >S ADD-URL S> ADD-CRC ; ( crc c-addr u -- ) : URL@ URL-DBASE []CELL @ CELL+ @+ ;P ( ix -- c-addr u ) : CRC@ URL-DBASE []CELL @ @ ;P ( ix -- u ) : job DROP TYPE ;P ( c-addr u ix -- ) : RECORD->FILE ( crc c-addr u1 handle -- ) >S ROT U>D #12 (UD.R) S~ S" ~ $+ 2SWAP $+ S~ " ADD-DATA~ $+ S> WRITE-LINE ?FILE ;P : SAVE-URLS ( -- ) S" saved-urls.dat" W/O BIN CREATE-FILE ?FILE LOCAL handle URL-DBASE rec# 0 ?DO @+ @+ SWAP @+ handle RECORD->FILE LOOP DROP handle CLOSE-FILE ?FILE ;P : RESTORE-URLS S" saved-urls.dat" INCLUDED ;P ( -- ) RESTORE-URLS : .URLS ( -- ) URL-DBASE rec# 0 ?DO @+ @+ SWAP @+ CR I 3 .R ." :: " TYPE C/L #20 - HTAB ." -- CRC = " H. LOOP DROP ; -- Vector this to what you want done when a page changes ----------------- DEFER DO-SOMETHING ( c-addr u index -- ) ' job IS DO-SOMETHING : UPDATE-CRCs ( -- ) URL-DBASE rec# 0 ?DO CR I 3 .R SPACE I URL@ TYPE @+ @+ >S @+ GET-URL 2DUP crc-32 DUP >S I ADD-CRC S> S> <> IF C/L #30 - HTAB ." :: URL contents have changed" I DO-SOMETHING ELSE C/L #30 - HTAB ." :: stable" 2DROP ENDIF LOOP DROP SAVE-URLS ; :ABOUT CR .~ Try: S" http://some_url.html" GET-URL ( -- c-addr u ) -- get URL content in a string~ CR .~ S" http://some_url.html" .URL -- print page content~ CR .~ S" http://some_url.html" URL->CRC ( -- u ) -- compute 32-bit CRC of page~ CR .~ crc S" http://some_url.html" ADD-DATA -- store crc entry in database file~ CR ." .URLS -- print all items in database" CR ." UPDATE-CRCs -- update CRCs and perform actions" ; .ABOUT -scooter CR DEPRIVE (* End of Source *)