FDScript Programming Guide

The most comprehensive way to use FramerD is through FDScript, the FramerD scripting language. FDScript is a dialect of the Scheme programming language with a number of special FramerD-related extensions as well as special extensions for text analysis, web scripting, and general operating system access.

This document describes how FDScript differs from and extends the Scheme standard. It also introduces the basic FDScript facilities for tasks such as text analysis and operating system access. It is not intended as a tutorial for Scheme programming; for learning Scheme, the schemers.org site provides many valuable resources.

What's Cool. FDScript includes a framework for building distributed applications, pervasive support for international programming (including text searching, matching, and processing) with Unicode, and language-level support for non-deterministic programming. This is in addition to operating system access functions, extensive tools for web scripting, general purpose text analysis tools, and specific tools for dealing with HTML, XML, and MIME documents.

Of course, the raison d'être of FDScript is access to the persistent object and association databases maintained by FramerD. FDScript is used to implement shell and web access to FramerD databases, and it provides the basis for FramerD applications.

What's Missing. FDScript is a full implementation of R4RS Scheme except for full continuations. In FDScript, it is only possible to return from a given procedure call once; in particular, one cannot return from call-with-current-continuation more than once. With respect to the R5RS standard, FDScript is missing the standardized top-level environments and the hygienic macro system, although FDScript 2.0 does have an unhygienic macro facility.

This document is a manual for writing programs in FDScript; it assumes some familiarity with the Scheme language and is intended for use in conjunction with other FramerD documentation, especially the FramerD Concepts document.

Distributed Programming

FDScript allows programs and data to be distributed across processors and machines, by a special remote procedure call protocol which allows Scheme objects to be passed among multiple clients and servers.

Distributed programming in FDScript is organized around the notion of servers processing requests. Distributing a program across many machines consists of defining servers with different roles, depending on the capacities of the machines, the dependencies between services and data, and prosaic concerns of bandwidth and connectivity.

Every server on a machine has a particular address, called a port, on which it listens for requests. Combining this port with the name of the machine defines a unique server id which other programs can use to access the server. The syntax port@host specifies a server, where the port can be:

an integer address, typically larger than 1000 and necessarily less than 65536, since port numbers are 16-bit values
a service name, defined by an operating system database
a "touch tone encoded" integer address, where an integer address is encoded alphabetically
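The text does not spell out the touch-tone scheme. As a minimal Python sketch, assuming that letters map to digits as on a standard telephone keypad (ABC=2 through WXYZ=9) and that the resulting digit string is read as the port number, the encoding might look like this. The function touch_tone_port is a hypothetical name, and FramerD's actual mapping may differ.

```python
# Sketch of "touch tone" port encoding, ASSUMING the standard telephone
# keypad letter groups (ITU E.161). FramerD's real mapping may differ.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def touch_tone_port(name: str) -> int:
    """Map each letter of a (letters-only) name to its keypad digit and
    read the resulting digit string as a port number."""
    digits = "".join(LETTER_TO_DIGIT[ch] for ch in name.lower())
    port = int(digits)
    if not 0 < port < 65536:  # TCP/UDP port numbers are 16-bit
        raise ValueError(f"{name!r} encodes to {port}, outside the port range")
    return port


if __name__ == "__main__":
    print(touch_tone_port("fact"))   # 3228
    print(touch_tone_port("demos"))  # 33667
```

Under this assumption, longer names quickly overflow the 16-bit port range, so only short names can be encoded this way.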

Each server is effectively a remote Scheme interpreter which provides some subset of the Scheme namespace augmented by whatever special procedures it defines. These special procedures are called operations, but it can be useful to think of them as procedures which happen to be executed remotely. We give a brief description of how to start your own server below, but a more detailed description can be found in Implementing DType servers.

There are numerous ways to use a remote server from FDScript. The most seamless uses an expression of the form (USE-SERVER "service@host" op), which returns a remote procedure; applying that procedure on the local machine invokes the remote operation op (typically a symbol) on the server listening for service requests on the machine host. For example,

[fdscript] (define rplus (use-server "demos@framerd.org" '+))
[fdscript] (rplus 3 4 5)
12

uses an Internet connection to add three numbers. A more interesting use would be:

[fdscript] (define nlphrase (use-server "demos@framerd.org" 'nlphrase))
[fdscript] (nlphrase "This sentence begins with a T")
(#((#("This" DETERMINER "this") #("sentence" NOUN)) (#("begins" VERB "begin")) 
   (#("with" PREPOSITION) #("a" DETERMINER) #("T" NOMINALIZATION "t"))))

The infrastructure for remote evaluation can be accessed directly through the FDScript functions dtype-eval and dtcall. The dtype-eval procedure takes a Scheme expression and a server id and evaluates the expression on the remote server, e.g.

[fdscript] (dtype-eval '(if (even? (length (session-id))) 'even-id 'odd-id)
                       "demos@framerd.org")
EVEN-ID

The dtcall procedure takes a server id, a remote operation, and any number of arguments and applies the operation to the arguments remotely, for example

[fdscript] (dtcall "demos@framerd.org" 'nlphrase "Many sentences start with M")
(#((#("Many" DETERMINER "many") #("sentences" PLURAL-NOUN "sentence")) 
   (#("start" VERB)) (#("with" PREPOSITION) #("M" NOMINALIZATION "m"))))

The dtcall procedure differs from dtype-eval in that it evaluates its arguments locally, so that:

[fdscript] (define my-sentence "Many sentences start with M")
[fdscript] (dtcall "demos@framerd.org" 'nlphrase my-sentence)
(#((#("Many" DETERMINER "many") #("sentences" PLURAL-NOUN "sentence")) 
   (#("start" VERB)) (#("with" PREPOSITION) #("M" NOMINALIZATION "m"))))

does what you would expect: my-sentence is evaluated locally, and its value is sent to the server.
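The difference between the two procedures is where argument expressions are evaluated. The following Python toy models each "server" as a dictionary namespace and each expression as a nested tuple; it is not FramerD's protocol, and the operation word-count is hypothetical. dtype_eval ships the unevaluated expression, so every name is resolved on the server, while dtcall receives argument values already computed on the client.

```python
# Toy model of DTYPE-EVAL vs. DTCALL. NOT FramerD's wire protocol:
# namespaces are dicts and expressions are nested tuples.


class Sym(str):
    """A variable reference; plain str values are string literals."""


def evaluate(expr, env):
    """Evaluate a tiny Lisp-like expression in the namespace env."""
    if isinstance(expr, Sym):
        return env[expr]                      # variable lookup in env
    if isinstance(expr, tuple):
        fn, *args = [evaluate(e, env) for e in expr]
        return fn(*args)                      # apply operator to evaluated args
    return expr                               # literals evaluate to themselves


def dtype_eval(expr, server_env):
    """Ship the unevaluated expression; the server resolves every name."""
    return evaluate(expr, server_env)


def dtcall(server_env, op, *args):
    """Arguments arrive already evaluated; only op is looked up remotely."""
    return server_env[op](*args)


SERVER_ENV = {"word-count": lambda s: len(s.split())}   # hypothetical operation
CLIENT_ENV = {"my-sentence": "Many sentences start with M"}

if __name__ == "__main__":
    value = evaluate(Sym("my-sentence"), CLIENT_ENV)     # evaluated locally
    print(dtcall(SERVER_ENV, "word-count", value))       # 5
    try:
        dtype_eval((Sym("word-count"), Sym("my-sentence")), SERVER_ENV)
    except KeyError as err:
        print("unbound on the server:", err)
```

The dtype_eval call fails because my-sentence exists only in the client's namespace, which is exactly the situation dtcall avoids.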

Remote processing with dtcall can also be initiated from your operating system's command line using the dtcall command; for example:

sh% dtcall demos@framerd.org nlphrase "Other sentences start with other letters"
(#((#("Other" DETERMINER "other") 
    #("sentences" PLURAL-NOUN "sentence")) 
   (#("start" VERB)) 
   (#("with" PREPOSITION) #("other" DETERMINER) 
    #("letters" NOUN "letter"))))

Starting a Server

Starting a server can be as simple as creating a server configuration file and calling the fdserver program on this file. A server configuration file is just a regular Scheme text file, with the suffix (type) .fdz, which may include some special function calls to configure the server.

;; This is the file myfact.fdz
(set-port-id! "fact")
(define (fact n)
  (define (fact-iter i f)
    (if (= i 0) f (fact-iter (- i 1) (* f i))))
  (fact-iter n 1))
(define (help)
  "This server provides an iterative factorial computation 
through the operation FACT")

Given this definition, a `local' server can be started with just the line:

sh% fdserver myfact.fdz --local &

The --local argument tells fdserver to run the server "locally", where it can be accessed by the current machine using the hostname localhost but not accessed from anywhere else. The ampersand (&) at the end of the line tells the computer to run the server in the background, so you can type other things at the command line.

Once the server has been started, it can be used remotely from fdscript:

[fdscript] (define rfact (use-server "fact@localhost" 'fact))
[fdscript] (rfact 33)
8683317618811886495518194401280000000

It can also be used from the command line (with dtcall):

sh% dtcall fact@localhost fact 10
3628800

If the program is started without the --local argument, as in:

sh% fdserver myfact.fdz &

it can be accessed from other machines with a server id of the form fact@hostname, where hostname is the name of the machine running fdserver. As discussed at length in Running DType Servers, this means that any machine on the Internet that knows the server id can connect to the server, but there are numerous ways to restrict access.

FramerD servers are described in detail in the FramerD Server Guide.

Choices: Non-Deterministic Values

FDScript allows values to be "non-deterministic", implicitly representing several possible results or outcomes. These values, called choices, simplify many programming patterns.

FDScript includes a novel facility for non-deterministic programming organized around a construct called the choice. A choice describes a set of values which may be any object except another choice. When FDScript encounters a choice, it automatically explores different possible outcomes based on each element of the choice. This makes it very simple to describe certain kinds of processes and operations by characterizing the inputs and outputs of procedures as choices rather than single values.

Choices in FDScript are descended from the AMB operator discussed by John McCarthy and various versions of this idea implemented by David Chapman, Ramin Zabih, David McAllester, and Jeff Siskind. They first entered FramerD in its predecessor language, Framer, as a way of regularizing functions involving multi-valued and single-valued slots.

Choices are distinct from the multiple values provided by Common Lisp and as specified by the R5RS Scheme standard. These facilities allow a procedure to return structured multiple values, where different value positions have different semantics (e.g. the first value might be an x coordinate and the second value might be a y coordinate). Choices in FDScript, on the other hand, represent an unstructured set of values.

Curly braces represent literal choices, so evaluating a choice between 3, 4, and 5 just returns a choice between the three numbers.

[fdscript] {3 4 5}
{3 4 5}

However, adding 10 to the set of choices returns a different set of choices:

[fdscript] (+ {3 4 5} 10)
{13 14 15}

while multiplying the set of choices by itself produces even more options:

[fdscript] (* {3 4 5} {3 4 5})
{9 12 15 16 20 25}

Whenever FDScript applies a procedure to a set of choices, it picks each of the choices, applies the procedure, and combines the results; thus, if we define SQUARE as:

(define (square x) (* x x))

and apply it to the same set of choices as above, we get only three choices back, since square is called once for each of the three inputs and each input is multiplied only by itself:

[fdscript] (square {3 4 5})
{9 16 25}

When a procedure returns a non-deterministic value, we can apply another procedure to it, as in:

[fdscript] (+ (square {3 4 5}) 10)
{19 26 35}

Most FDScript procedures work in exactly this way when given non-deterministic arguments, passing on any non-determinism in their arguments to their results. However, some procedures work differently, either combining non-deterministic arguments into a single deterministic result or taking deterministic arguments and returning a set of choices (a non-deterministic result).

When a procedure returns a non-deterministic result consisting of only one choice, that is the same as a deterministic result. This means that a regular procedure can return a deterministic result from a non-deterministic argument, as in:

[fdscript] (square {3 -3})
9
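The combination rule can be stated compactly: call the procedure once for each combination of argument elements and take the union of the results. The Python sketch below models choices as frozensets; apply_nd and as_choice are hypothetical names, and the code mirrors the behavior described here rather than FramerD's implementation. Because itertools.product yields nothing when any input is empty, the same rule also makes a failed argument fail the whole call.

```python
from itertools import product

# Model of applying an ordinary procedure to choices: call it once per
# combination of argument elements and union the results. Frozensets stand
# in for choices; this is not FramerD's implementation.


def as_choice(value):
    """Treat a non-choice value as a one-element choice."""
    if isinstance(value, (set, frozenset)):
        return frozenset(value)
    return frozenset([value])


def apply_nd(fn, *args):
    """Apply fn across all combinations of the argument choices."""
    results = set()
    # product() yields nothing if any argument is empty, so one failed
    # argument makes the whole call fail.
    for combo in product(*(as_choice(a) for a in args)):
        results |= as_choice(fn(*combo))  # choice-valued results are merged
    return frozenset(results)


def square(x):
    return x * x


if __name__ == "__main__":
    c = {3, 4, 5}
    print(sorted(apply_nd(lambda a, b: a + b, c, 10)))           # [13, 14, 15]
    print(sorted(apply_nd(lambda a, b: a * b, c, c)))            # [9, 12, 15, 16, 20, 25]
    print(sorted(apply_nd(square, c)))                           # [9, 16, 25]
    print(sorted(apply_nd(square, {3, -3})))                     # [9]
    print(sorted(apply_nd(lambda a, b: a + b, {10, 8}, set())))  # []
```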

Deterministic results from non-deterministic inputs

Built-in procedures for generating deterministic results from non-deterministic inputs include:

(pick-one set) randomly selects one of the choices in set
(choice->list set) returns the choices as a list of elements
(choice-size set) returns the number of choices in set
(empty? expr), (fail? expr) return true if evaluating expr returns no values
(exists? expr) returns true if evaluating expr returns any values at all
(contains? val expr) returns true if the result of evaluating expr includes val

[fdscript] (PICK-ONE (CHOICE 2 3 4))
3
[fdscript] (PICK-ONE (CHOICE 2 3 4))
2
[fdscript] (CHOICE-SIZE (CHOICE 2 3 4))
3
[fdscript] (CHOICE-SIZE 8)
1
[fdscript] (CHOICE-SIZE {})
0
[fdscript] (FAIL? (CHOICE))
#T
[fdscript] (FAIL? 3)
#F
[fdscript] (EMPTY? (CHOICE 3 4))
#F
[fdscript] (DEFINE (EVEN? x) (if (zero? (remainder x 2)) x (CHOICE)))
[fdscript] (EXISTS? (CHOICE))
#F
[fdscript] (EXISTS? 3)
#T
[fdscript] (EXISTS? (even? (CHOICE 3 5 9)))
#F
[fdscript] (EXISTS? (even? (CHOICE 2 3 5 9)))
#T
[fdscript] (CONTAINS? 2 (CHOICE 2 3 4))
#t
[fdscript] (CONTAINS? 5 (CHOICE 2 3 4))
#f
[fdscript] (CONTAINS? 8 (+ (CHOICE 2 3 4) (CHOICE 4 5 6)))
#t
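These reducers are easy to mimic with ordinary sets. In the Python sketch below (hypothetical names, frozensets standing in for choices), a plain value counts as a one-element choice, which is why choice_size(8) is 1 just as (CHOICE-SIZE 8) is.

```python
import random

# Stand-ins for the deterministic reducers above. Frozensets play the role
# of choices and a plain value is a one-element choice. Sketch only.


def as_choice(value):
    if isinstance(value, (set, frozenset)):
        return frozenset(value)
    return frozenset([value])


def pick_one(value):
    """One arbitrary element. Behavior on an empty choice is not described
    in the text; this sketch raises IndexError."""
    return random.choice(list(as_choice(value)))


def choice_size(value):
    return len(as_choice(value))


def fail_p(value):            # FAIL? / EMPTY?
    return choice_size(value) == 0


def exists_p(value):          # EXISTS?
    return choice_size(value) > 0


def contains_p(item, value):  # CONTAINS?
    return item in as_choice(value)


if __name__ == "__main__":
    print(choice_size({2, 3, 4}), choice_size(8), choice_size(set()))  # 3 1 0
    sums = {a + b for a in (2, 3, 4) for b in (4, 5, 6)}
    print(contains_p(8, sums))  # True
```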

Non-deterministic results from deterministic inputs

Other built-in procedures generate non-deterministic results from deterministic arguments. The most basic such procedure is CHOICE, which returns its arguments non-deterministically, e.g.

[fdscript] (choice 3 4 5)
{3 4 5}
[fdscript] (+ (CHOICE 3 4 5) 10)
{13 14 15}

while another important one is ELTS which returns the elements of a sequence non-deterministically, e.g.:

[fdscript] (elts '(a b c))
{A B C}
[fdscript] (elts "def")
{#\d #\f #\e}

Failure and Pruning

A procedure can also return no choices at all. This "return value" is called a failure and is indicated by a pair of empty curly braces "{}", e.g.

[fdscript] (CHOICE)
{}

When a procedure is called on a failure, the procedure itself returns a failure, so:

[fdscript] (+ (CHOICE 10 8) (CHOICE))
{}

This special result, indicating no returned choices, is called a failure because of the way that choices are used in searching by non-deterministic programming. If you think of a given procedure as doing some `search' given the constraints of its arguments, returning the empty choice can be considered as "failing" in that part of the search.

The early termination on failure is called "pruning." We say that the call to + was pruned because the second call to CHOICE failed. Note that if a subexpression fails in this way, none of the remaining arguments are evaluated, E.G.

[fdscript] (+ (CHOICE) (begin (lineout "last argument") 3))
{}

doesn't produce the output line `last argument' because the whole expression is pruned before the final form is evaluated.
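Pruning depends on arguments being evaluated in order and evaluation stopping at the first failure. Here is a Python sketch under that reading, with each argument wrapped in a zero-argument function (a thunk) so the evaluation order is explicit. call_with_pruning and last_argument are hypothetical names, not FramerD internals.

```python
from itertools import product

# Model of pruning: arguments are evaluated left to right, and as soon as
# one yields an empty choice the call fails without evaluating the rest.

CALLS = []  # records which side-effecting argument thunks actually ran


def call_with_pruning(fn, *arg_thunks):
    evaluated = []
    for thunk in arg_thunks:
        value = thunk()
        if isinstance(value, (set, frozenset)):
            choice = frozenset(value)
        else:
            choice = frozenset([value])
        if not choice:          # failure: prune, skipping later arguments
            return frozenset()
        evaluated.append(choice)
    return frozenset(fn(*combo) for combo in product(*evaluated))


def last_argument():
    """Stands in for (begin (lineout "last argument") 3)."""
    CALLS.append("last argument")
    return 3


if __name__ == "__main__":
    add = lambda a, b: a + b
    print(call_with_pruning(add, lambda: frozenset(), last_argument))  # frozenset()
    print(CALLS)  # [] because the second argument never ran
```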

Using Choices to Represent Sets

Non-deterministic return values can be used to represent sets, as in the following definition of set intersection, which specifies the base case and naturally generalizes:

[fdscript] (define (intersect x y) (if (equal? x y) x (fail)))
[fdscript] (intersect (CHOICE 3 4 5 6) (CHOICE 5 6 7 8))
{5 6}

We can see the value combination process in action by adding trace statements to the INTERSECT procedure, as in:

[fdscript] (define (intersect x y) 
            (lineout "INTERSECT " x " = " y " is " (equal? x y))
            (if (equal? x y) x {}))
[fdscript] (intersect (CHOICE 3 4 5) (CHOICE 5 6 7))
    INTERSECT 3 = 5 is #f 
    INTERSECT 3 = 6 is #f
    INTERSECT 3 = 7 is #f
    INTERSECT 4 = 5 is #f
    INTERSECT 4 = 6 is #f
    INTERSECT 4 = 7 is #f
    INTERSECT 5 = 5 is #t 
    INTERSECT 5 = 6 is #f
    INTERSECT 5 = 7 is #f
5

Of course, this is an inefficient way to compute intersections. FDScript provides a number of special forms for dealing with non-deterministic values, which we describe in the next section.

Combining Choices

There are a variety of FDScript special forms for dealing with non-deterministic values. They are called "special" forms because they do not follow the normal rules for non-deterministic procedure combination.

(INTERSECTION expr₁ expr₂)

evaluates expr₁ and expr₂ and returns only the values returned by both expressions. E.G.

[fdscript] (INTERSECTION {3 4 5} {2 4 6})
4

On very large choices, operations like intersection can be very time consuming. FDScript provides a special flavor of choice, the sorted choice which can be optimized for these sorts of operations. The function sorted-choice returns such a choice.

(UNION expr₁ expr₂)

evaluates expr₁ and expr₂ and returns the results from both. E.G.

[fdscript] (UNION {3 4 5} {2 4 6})
{2 3 4 5 6}

(DIFFERENCE expr₁ expr₂)

evaluates expr₁ and expr₂ and returns the results of expr₁ which are not returned by expr₂. E.G.

[fdscript] (DIFFERENCE {3 4 5} {2 4 6})
{3 5}

(TRY expr_i...)

Evaluates each expr_i in order, returning the results of the first one which doesn't fail (i.e., which produces at least one value), E.G.

[fdscript] (TRY (INTERSECTION (CHOICE 3 4 5) (CHOICE 6 7 8)) ; This one fails
               (INTERSECTION (CHOICE 3 4 5) (CHOICE 1 2 3))  ; This one doesn't
               (INTERSECTION (CHOICE 3 4 5) (CHOICE 4 5 6))) ; This one doesn't get a chance
3
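For comparison with the pairwise INTERSECT procedure defined earlier, the special forms above map directly onto hash-based set operations. In this Python sketch (hypothetical names; frozensets stand in for choices), try_ takes zero-argument functions so that later alternatives run only when earlier ones fail, and a union with an empty argument is not pruned.

```python
# Combining special forms modeled with Python's hash-based set operations.
# Illustrative only, not FramerD code.


def intersection(a, b):
    return frozenset(a) & frozenset(b)


def union(a, b):
    return frozenset(a) | frozenset(b)


def difference(a, b):
    return frozenset(a) - frozenset(b)


def try_(*thunks):
    """Return the first non-empty result, or the empty choice if all fail."""
    for thunk in thunks:
        result = frozenset(thunk())
        if result:
            return result
    return frozenset()


if __name__ == "__main__":
    print(sorted(intersection({3, 4, 5}, {2, 4, 6})))  # [4]
    print(sorted(union({3, 4, 5}, {2, 4, 6})))         # [2, 3, 4, 5, 6]
    print(sorted(difference({3, 4, 5}, {2, 4, 6})))    # [3, 5]
    print(sorted(union(set(), {3, 4})))                # [3, 4]
    print(sorted(try_(lambda: intersection({3, 4, 5}, {6, 7, 8}),
                      lambda: intersection({3, 4, 5}, {1, 2, 3}),
                      lambda: intersection({3, 4, 5}, {4, 5, 6}))))  # [3]
```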

Pruning and Special Forms

You may have figured out that non-deterministic evaluation and pruning can't apply to the special forms above, or else an expression like:

        (UNION (CHOICE) (CHOICE 3 4))

would automatically be pruned. Some other special forms also break the default rules for combination and pruning. For instance, the formatted output functions such as LINEOUT don't do automatic enumeration and pruning, so you get the following behavior:

[fdscript] (LINEOUT "This is empty: " (CHOICE) " but this isn't: " (CHOICE 2 3))
    This is empty: {} but this isn't: {2 3}

Choices and User Procedures

User procedures (like the procedure INTERSECT which we defined above) automatically invoke the interpreter's search and combination mechanisms. For instance, the following fragment generates possible sentences:

[fdscript] (DEFINE (sentence subject verb object) (list subject verb object))
[fdscript] (sentence (CHOICE "Moe" "Larry" "Curly") 
                  (CHOICE "hit" "kissed")
                  (CHOICE "Huey" "Dewey" "Louie"))
{("Moe" "hit" "Huey") ("Larry" "hit" "Huey") ("Curly" "hit" "Huey") 
            ("Moe" "kissed" "Huey") ("Larry" "kissed" "Huey") 
            ("Curly" "kissed" "Huey") ("Moe" "hit" "Dewey") 
            ("Larry" "hit" "Dewey") ("Curly" "hit" "Dewey") 
            ("Moe" "kissed" "Dewey") ("Larry" "kissed" "Dewey") 
            ("Curly" "kissed" "Dewey") ("Moe" "hit" "Louie") 
            ("Larry" "hit" "Louie") ("Curly" "hit" "Louie") 
            ("Moe" "kissed" "Louie") ("Larry" "kissed" "Louie") 
            ("Curly" "kissed" "Louie")}

There is one caveat to the non-deterministic application of user procedures. If a user procedure takes a dotted or optional argument, the argument is bound to a list of the remaining choices rather than a choice among the lists that they would generate. So, this definition calls LINEOUT once on the choice {3 4}:

[fdscript] (define (list-choices . x) (lineout "Results are: " (car x)))
[fdscript] (list-choices (CHOICE 3 4))
    Results are: {3 4}

while this definition calls list-choices separately on the returned values:

[fdscript] (define (list-choices x) (lineout "Results are: " x))
[fdscript] (list-choices (CHOICE 3 4))
    Results are: 3
    Results are: 4

The key point is that if a procedure is expecting a choice as an argument and needs the choice to remain a choice (rather than having its elements enumerated), the argument should be extracted from a "dotted" argument. For instance, suppose we wanted to define a function which returns twice the size of a choice. We might try to write it this way:

[fdscript] (DEFINE (BAD-DOUBLE-SIZE x) (* 2 (choice-size x)))
[fdscript] (BAD-DOUBLE-SIZE 3) ; <== this works fine
2
[fdscript] (BAD-DOUBLE-SIZE {3 4 5 6}) ; <== this doesn't
2

but that won't work because the x argument is bound to each of the numbers in the choice individually, rather than as an entire choice at once. A correct definition would be:

[fdscript] (DEFINE (DOUBLE-SIZE . ARGS) (* 2 (CHOICE-SIZE (CAR ARGS))))
[fdscript] (DOUBLE-SIZE 3) ; <== this still works fine
2
[fdscript] (DOUBLE-SIZE {3 4 5 6}) ; <== and so does this...
8
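The two binding rules can be modeled side by side. In this Python sketch (hypothetical names, frozensets as choices), call_enumerating binds the parameter to each element in turn, as with an ordinary parameter, while call_whole passes the entire choice at once, as with a dotted parameter.

```python
# Model of the two ways a parameter can be bound to a choice (not FramerD
# code): an ordinary parameter enumerates, a dotted one sees the whole set.


def choice_size(value):
    """Number of elements; a non-choice value counts as one element."""
    return len(value) if isinstance(value, (set, frozenset)) else 1


def call_enumerating(fn, choice):
    """Ordinary parameter: call fn once per element, combine results."""
    return frozenset(fn(x) for x in choice)


def call_whole(fn, choice):
    """Dotted parameter: fn receives the entire choice as one value."""
    return frozenset([fn(frozenset(choice))])


def bad_double_size(x):   # like BAD-DOUBLE-SIZE: x is a single element
    return 2 * choice_size(x)


def double_size(args):    # like DOUBLE-SIZE: args is the whole choice
    return 2 * choice_size(args)


if __name__ == "__main__":
    print(call_enumerating(bad_double_size, {3, 4, 5, 6}))  # frozenset({2})
    print(call_whole(double_size, {3, 4, 5, 6}))            # frozenset({8})
```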

Choices and variables

Choices can be stored and saved in a variety of ways. For instance, the special form SET! sets a variable to contain a set of possible values, so one can say:

[fdscript] (SET! small-primes (CHOICE 2 3 5 7 11 13 17 19))
[fdscript] (define (divides? x y) (if (zero? (remainder x y)) y {}))
[fdscript] (divides? 15 small-primes)
{3 5}

The special form SET+! adds a set of values non-deterministically to a variable. For example,

[fdscript] (SET! small-odd-numbers (CHOICE 1 3 5 7))
[fdscript] small-odd-numbers
{5 1 7 3}
[fdscript] (SET+! small-odd-numbers (CHOICE 9 17))
[fdscript] small-odd-numbers
{5 9 1 7 17 3}

The binding special forms LET and LET* can be used to store non-deterministic values in the same way as set!. E.G.

[fdscript] (define (divides? x y) (if (zero? (remainder x y)) y {}))
[fdscript] (let ((small-primes (CHOICE 2 3 5 7 11 13 17 19)))
            (divides? 15 small-primes))
{3 5}

(For old-time Scheme aficionados, this interpretation of LET breaks the equivalence of LET and LAMBDA, since writing the above as an application of a lambda would automatically iterate through the choices.)

Iterating over choices

Sometimes it is important to be able to process each element of a choice separately. FDScript provides three special forms supporting this kind of processing, DO-CHOICES, FOR-CHOICES, and FILTER-CHOICES:

(DO-CHOICES (var val-expr) expr₁ expr₂...)

Evaluates all of the expr_i with var bound to each of the values returned by val-expr. E.G.

[fdscript] (DO-CHOICES (x (CHOICE 3 4)) (lineout "I saw a " x))
    I saw a 4
    I saw a 3

DO-CHOICES can be used to unpack a set of values to pass to forms which don't automatically unpack their arguments (such as LINEOUT), as in the following definition which puts each of the values returned by FGET on a different line:

(define (print-slot-values frame slot)
  (let ((values (fget frame slot)))
    (lineout "The " slot " of " frame " is:")
    (do-choices (value values)
      (lineout "            " value))))

(FOR-CHOICES (var val-expr) expr₁ expr₂...)

Like DO-CHOICES, but combines the results of evaluating the last expr_i for each value, E.G.

[fdscript] (FOR-CHOICES (x (CHOICE 3 4 5 6)) (if (zero? (remainder x 2)) (+ x 3)))
{7 9}

(FILTER-CHOICES (var value-expr) test-expr_i...)

Evaluates value-expr, binds var to each element in turn, and returns those elements for which every test-expr_i returns true given the binding. E.G.

[fdscript] (DEFINE (EVEN? x) (zero? (remainder x 2)))
[fdscript] (FILTER-CHOICES (num (CHOICE 1 2 3 4 5 6)) 
            (EVEN? num))
{2 4 6}
[fdscript] (FILTER-CHOICES (num (CHOICE 1 2 3 4 5 6)) 
            (EVEN? num)
            (< num 6))
{2 4}
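The three forms can be approximated with ordinary loops over sets. This Python sketch uses hypothetical names and frozensets for choices; in it, a body that returns the empty choice contributes nothing to for_choices, which is how the sketch reproduces the result of the one-armed IF in the FOR-CHOICES example.

```python
# Python models of DO-CHOICES, FOR-CHOICES and FILTER-CHOICES.
# Frozensets stand in for choices; illustrative only.


def do_choices(values, body):
    """Run body on each element for its side effects; returns None."""
    for v in values:
        body(v)


def for_choices(values, body):
    """Combine the results of body over every element."""
    results = set()
    for v in values:
        r = body(v)
        results |= set(r) if isinstance(r, (set, frozenset)) else {r}
    return frozenset(results)


def filter_choices(values, *tests):
    """Keep the elements for which every test returns true."""
    return frozenset(v for v in values if all(test(v) for test in tests))


def even(n):
    return n % 2 == 0


if __name__ == "__main__":
    seen = []
    do_choices({3, 4}, seen.append)
    print(sorted(seen))  # [3, 4]
    print(sorted(for_choices({3, 4, 5, 6},
                             lambda x: x + 3 if even(x) else frozenset())))  # [7, 9]
    print(sorted(filter_choices(range(1, 7), even)))                      # [2, 4, 6]
    print(sorted(filter_choices(range(1, 7), even, lambda n: n < 6)))     # [2, 4]
```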

Hashing Utilities

FDScript provides primitive hashtables and "hashsets" to support efficient operations with large heterogeneous data sets. These are analogous to Perl's associative arrays or Python's dictionaries.

FDScript provides fast implementations of sets and association tables using an internal hashing implementation. These functions are similar to those provided by other programming environments, so our descriptions here will be brief.

(make-hashtable): returns an empty hash table.
(hashtable-get hashtable key): gets the value(s) associated with key in hashtable.
(hashtable-add! hashtable key new): adds new to the values associated with key in hashtable.
(hashtable-set! hashtable key new): makes new be the only values associated with key in hashtable.
(hashtable-zap! hashtable key): removes any associations with key in hashtable.

For example, the following code stores the squares of the integers from 0 to 199 in a hashtable:

    [fdscript] (define square-table (make-hashtable))
    [fdscript] square-table
    [#hashtable 0/19]
    [fdscript] (dotimes (i 200) (hashtable-add! square-table i (* i i)))
    [fdscript] square-table
    [#hashtable 200/271]
    [fdscript] (hashtable-get square-table 20 #f)
    400
    [fdscript] (hashtable-zap! square-table 20)
    [fdscript] (hashtable-get square-table 20 #f)
    {}
    [fdscript] (hashtable-add! square-table 30 300) ; Not true!
    [fdscript] (hashtable-get square-table 30 #f)
    {900 300} ; <== Note multiple values
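A hashtable with multi-valued associations behaves like a dictionary mapping each key to a set. This Python sketch (a hypothetical MultiTable class, not FramerD code) replays the square-table session above; note how adding 300 under key 30 leaves two values.

```python
from collections import defaultdict


class MultiTable:
    """Sketch of a hashtable whose keys map to sets of values, loosely
    mimicking hashtable-add!, hashtable-set!, hashtable-zap! and
    hashtable-get. Not FramerD code."""

    def __init__(self):
        self._table = defaultdict(set)

    def add(self, key, value):
        """Like hashtable-add!: add value to the key's existing values."""
        self._table[key].add(value)

    def replace(self, key, value):
        """Like hashtable-set!: value becomes the key's only value."""
        self._table[key] = {value}

    def zap(self, key):
        """Like hashtable-zap!: drop every association for key."""
        self._table.pop(key, None)

    def get(self, key):
        """Like hashtable-get: all values for key (empty if none)."""
        return frozenset(self._table.get(key, ()))


if __name__ == "__main__":
    squares = MultiTable()
    for i in range(200):
        squares.add(i, i * i)
    print(squares.get(20))          # frozenset({400})
    squares.zap(20)
    print(squares.get(20))          # frozenset()
    squares.add(30, 300)
    print(sorted(squares.get(30)))  # [300, 900]
```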

FDScript also provides a "hashset" facility for maintaining large sets of objects with fast tests for membership:

(make-hashset): returns an empty hashset.
(hashset-get hashset elt): returns true if elt is in hashset.
(hashset-add! hashset elt): adds elt to hashset.
(hashset-zap! hashset elt): removes elt from hashset.
(hashset-elts hashset): returns the elements of hashset as a non-deterministic set.

For example, the following code stores 1 and the primes up to 29 in a hashset:

[fdscript] (define primes-table (make-hashset))
[fdscript] primes-table
[#hashset 0/19]
[fdscript] (hashset-add! primes-table (amb 1 2 3 5 7 11 13 17 19 23 29))
[fdscript] (hashset-get primes-table 15)
#F
[fdscript] (hashset-get primes-table 17)
#T
[fdscript] (hashset-get primes-table 2)
#T
[fdscript] (hashset-zap! primes-table 2)
[fdscript] (hashset-get primes-table 2)
#F
[fdscript] (hashset-elts primes-table)
{1 3 5 7 11 13 17 19 23 29}

In addition to hashing primitives, FDScript provides a generic ordering function for many Lisp objects, which allows numbers, strings, symbols, pairs, vectors, etc. to be placed in a "total order". This ordering is first based on types, with numbers being smaller than all other types and proceeding in order: numbers, characters, symbols, OIDs, strings, pairs, vectors, records, and slotmaps. Objects of the same type are ordered numerically, lexicographically (using Unicode character values), or recursively.

The generic ordering can be accessed through the primitives ANY<? and ANY>?, as in:

[fdscript] (any<? 33 44)
#t
[fdscript] (any<? 33 "forty-four")
#t
[fdscript] (any<? "thirty-three" 44)
#f
[fdscript] (any>? "thirty-three" 44)
#t
[fdscript] (any<? "thirty-three" "three hundred")
#t

The procedure SORTED takes a choice and returns a vector whose elements are sorted by the generic comparison function:

[fdscript] (sorted (choice "abc" "abd"))
#("abc" "abd")
[fdscript] (sorted (choice 110/17 1 2.3))
#(1 2.300000 110/17)

When SORTED is given a second argument, that argument is a procedure which is applied to each element to obtain the key used for sorting, for example

[fdscript] (sorted (choice '(3 . "three") '(28 . "twenty-eight")
                       '(3000000000 . "really big"))
	       car)
#((3 . "three") (28 . "twenty-eight") (3000000000 . "really big"))
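One way to realize such an ordering is a sort key that compares a type rank first and the value second. This Python sketch is an illustration under stated assumptions: it models only numbers, strings, lists (as pairs) and tuples (as vectors), uses the type order listed above, and relies on Python comparing strings by code point. generic_key, any_lt and any_gt are hypothetical names.

```python
from fractions import Fraction
from numbers import Real

# Sort key imitating the generic ordering: type rank first, then value.
# Only numbers, strings, lists (as pairs) and tuples (as vectors) are
# modeled; the ranks follow the type order given in the text.
TYPE_RANK = {"number": 0, "character": 1, "symbol": 2, "oid": 3,
             "string": 4, "pair": 5, "vector": 6, "record": 7, "slotmap": 8}


def generic_key(obj):
    if isinstance(obj, bool):
        raise TypeError("booleans are not modeled in this sketch")
    if isinstance(obj, Real):
        return (TYPE_RANK["number"], obj)
    if isinstance(obj, str):
        return (TYPE_RANK["string"], obj)  # str compares by code point
    if isinstance(obj, list):
        return (TYPE_RANK["pair"], tuple(generic_key(x) for x in obj))
    if isinstance(obj, tuple):
        return (TYPE_RANK["vector"], tuple(generic_key(x) for x in obj))
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def any_lt(a, b):
    return generic_key(a) < generic_key(b)


def any_gt(a, b):
    return generic_key(a) > generic_key(b)


if __name__ == "__main__":
    print(any_lt(33, "forty-four"))    # True
    print(any_gt("thirty-three", 44))  # True
    print(sorted([Fraction(110, 17), 1, 2.3], key=generic_key))
    pairs = [(3000000000, "really big"), (28, "twenty-eight"), (3, "three")]
    print(sorted(pairs, key=lambda p: generic_key(p[0])))  # like SORTED with CAR
```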

Sequence functions

FDScript provides a number of generic "sequence" functions based on similar functions in Common Lisp. These functions operate uniformly on lists, vectors, strings, and packets, attempting to reduce the cognitive overload of all these extra data types.

Sequences are either lists, vectors, strings or packets. Generic functions on sequences include:

(elt sequence index): returns the indexth element of sequence. For strings, this will be a character; for packets, it will be an integer in the range 0-255; and for vectors and lists it could be any object. This procedure fails (returns the empty choice) if sequence has no element at position index.
(reverse sequence): returns a sequence of the same type with its elements in reverse order.
(length sequence): returns the number of elements in sequence
(find key sequence): returns an element of sequence which is EQUAL? to key or #F otherwise.
(position key sequence [start]): returns the position of the first element of sequence after start which is EQUAL? to key or #F otherwise. If start (an integer) is not provided, the absolute first occurrence is returned.
(count key sequence): returns the number of elements of sequence which are EQUAL? to key.
(subseq sequence start [end]): returns the subsequence of sequence starting at start and ending at end (or the end of sequence if end is not specified).
(remove key sequence): returns a copy of sequence with all elements EQUAL? to key removed.
(search sub-sequence sequence [start]): returns an offset into sequence where sub-sequence starts, or #f otherwise. sequence and sub-sequence need not be the same type. If start is specified, the search starts at the offset start in sequence (but still returns an offset relative to the beginning of sequence).
(mismatch sequence1 sequence2 [start1] [start2]): returns the offset at which sequence1 and sequence2 begin to differ. If start1 and start2 are specified, they indicate starting places in sequence1 and sequence2 respectively.
(doseq (var sequence [index]) body...): evaluates body repeatedly with each element (in order) bound to var. If the variable index is provided, it is bound to the position in the sequence where the element is found.
(first sequence): returns the first element of sequence
(second sequence): returns the second element of sequence
(third sequence): returns the third element of sequence
(fourth sequence): returns the fourth element of sequence
(fifth sequence): returns the fifth element of sequence
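As an illustration of how such functions can be written once for every sequence type, here are Python sketches of POSITION and SEARCH that only index individual elements, so they work on lists, strings and bytes alike. The names are hypothetical, None stands in for #F, and treating start as inclusive is an assumption.

```python
from typing import Optional, Sequence

# Sketches of generic POSITION and SEARCH over any Python sequence
# (list, tuple, str, bytes). None stands in for #F. Not FramerD code.


def position(key, seq: Sequence, start: int = 0) -> Optional[int]:
    """Index of the first element at or after start that equals key."""
    for i in range(start, len(seq)):
        if seq[i] == key:
            return i
    return None


def search(sub: Sequence, seq: Sequence, start: int = 0) -> Optional[int]:
    """Offset, counted from the beginning of seq, where sub occurs at or
    after start. Elements are compared one at a time, so sub and seq may
    be different sequence types."""
    n = len(sub)
    for i in range(start, len(seq) - n + 1):
        if all(seq[i + j] == sub[j] for j in range(n)):
            return i
    return None


if __name__ == "__main__":
    print(position("c", "abcabc", 3))      # 5
    print(search(["b", "c"], "abcd"))      # 1 (list searched in a string)
    print(search("cd", "abcdcd", 3))       # 4 (offset still from the start)
    print(search(b"\x01\x02", [0, 1, 2]))  # 1 (bytes elements are ints 0-255)
    print(search("xyz", "abc"))            # None
```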

Formatted Output with PRINTOUT

FDScript includes a formatted output library modelled (and named) after InterLisp's PRINTOUT. PRINTOUT can be used to create formatted messages for the user or to generate textual data files. The PRINTOUT model is also used by the HTML generation procedures in the FDWWW library.

FDScript provides a simple and elegant way of generating formatted output. Most other Lisp dialects provide FORMAT commands descended in spirit from Fortran's FORMAT directive. In FDScript, we instead take InterLisp's PRINTOUT expression as our model. Each formatted output procedure takes an arbitrary number of arguments and evaluates each one. If the value is a string, it is output without enclosing quotes; if it is the void value (such as is returned by iteration functions), nothing is output; and for any other value, the procedure WRITE is called to display it, which produces a Lisp-like representation of the object.

The procedure PRINTOUT processes its arguments and sends the results to the standard output. The function LINEOUT does the same but appends a newline to the end of the output.

The procedure STRINGOUT does its output to a string and returns the result without doing any external output.

If one of the arguments to a PRINTOUT function is an iteration expression (like DOLIST), its body can call PRINTOUT itself. Since the iteration expression returns void, only the output generated inside the loop will be seen.

The procedure printout-to takes an output stream as its first argument, followed by the same kinds of arguments as PRINTOUT, and sends the generated output to that stream.
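The argument rules above are simple enough to sketch. In this Python model, None stands in for the void value, and write_repr is a hypothetical WRITE-like printer covering only strings, booleans, lists and numbers; none of it is FramerD's actual printer.

```python
import io

# Model of the PRINTOUT argument rules (not FramerD's printer): strings
# print without quotes, None (standing in for void) prints nothing, and
# anything else goes through a WRITE-like printer.


def write_repr(obj):
    """Hypothetical WRITE-like printer for a few Python types."""
    if isinstance(obj, str):
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    if isinstance(obj, bool):
        return "#t" if obj else "#f"
    if isinstance(obj, list):
        return "(" + " ".join(write_repr(x) for x in obj) + ")"
    return str(obj)


def stringout(*args):
    """Like STRINGOUT: build the output as a string."""
    parts = []
    for arg in args:
        if arg is None:  # void contributes no output
            continue
        parts.append(arg if isinstance(arg, str) else write_repr(arg))
    return "".join(parts)


def printout_to(port, *args):
    """Like PRINTOUT-TO: send the output to the given stream."""
    port.write(stringout(*args))


def lineout(*args):
    """Like LINEOUT: standard output plus a trailing newline."""
    print(stringout(*args))


if __name__ == "__main__":
    lineout("Two plus two is ", 2 + 2, "; a list: ", ["a", 1])
    port = io.StringIO()
    printout_to(port, "x = ", 33)
    print(port.getvalue())
```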

Useful Input/Output Functions

FDScript provides a number of special functions for input and output. These include forms and procedures for binding the default input and output streams, working with "virtual streams" writing to strings, and doing binary input and output.

FDScript implements Scheme ports as an input and output abstraction. The function open-input-file opens an external file for input; the function open-output-file opens an external file for output. The resulting ports can be passed as the optional port argument to functions like write, display, and newline, or as the first argument to printout-to.

The ports returned by these functions can also be made the default port for input or output. The form (WITH-INPUT port ...body...) evaluates body with a default input port of port. Similarly, the form (WITH-OUTPUT port ...body...) evaluates body with a default output port of port.

Variants of these forms take filenames as arguments and implicitly open an input or output file. The form (WITH-INPUT-FROM-FILE filename ...body...) evaluates body with a default input port reading data from filename. Similarly, the form (WITH-OUTPUT-TO-FILE filename ...body...) evaluates body with a default output port writing data to filename.

In addition to file ports, string ports allow programs to read from and write to strings. A string input port reads from a literal string as though it were a file; a string output port accumulates its output in a string which can be extracted along the way. The function (open-string-input-stream string) opens a string input port for reading, e.g.

(define p1 (open-string-input-stream "(first) (second)"))
(read p1)
(first)
(read p1)
(second)
(read p1)
#EOF

while the form (open-string-output-stream) creates a stream for output whose "output thus far" can be extracted with STRING-STREAM-CONTENTS, e.g.

(define p2 (open-string-output-stream))
(write '(first) p2)
(write '(second) p2)
(string-stream-contents p2)
"(FIRST)(SECOND)"

String streams can also be used implicitly with the form (WITH-OUTPUT-TO-STRING ...body...) which evaluates body with output going (by default) to a string whose value is returned. Thus, we can say:

(with-output-to-string (write '(first)) (write '(second)))
"(FIRST)(SECOND)"

or with the form (WITH-INPUT-FROM-STRING string ...body...) which evaluates forms given default input from the string string, e.g.

(with-input-from-string "33+5i 44.5"
  (list (read) (read)))
(33+5i 44.5)

Binary I/O

A binary input or output file can be opened by calling the fopen function with a "b" mode, which returns an input or output port. The functions read-byte and write-byte read and write integer-valued bytes on such ports.

The function write-data can be used to write a packet to a file or output stream. (write-data packet stream-or-filename) writes the bytes in a packet directly to the output stream.

DTypes can be written to binary output ports with the function write-dtype and read with the function read-dtype.

An object's DTYPE representation can be written to a file with write-dtype-to-file; a DTYPE representation for an object can be added to the end of a file with the function add-dtype-to-file. These can be used together with read-dtype-from-file to accumulate a set of objects in a file.

DTypes can also be written to packets with the function write-dtype-to-packet and read from packets with the function read-dtype-from-packet. For example,

[fdscript] (write-dtype-to-packet "foo")
[#PACKET 8 0x0600000003666f6f]
[fdscript] (write-dtype-to-packet "föb")
[#PACKET 9 0x400206006600f60062]
[fdscript] (write-dtype-to-packet 88)
[#PACKET 5 0x0300000058]
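Since read-dtype-from-packet reverses write-dtype-to-packet, a packet can serve as a serialized copy of an object. A minimal round-trip sketch:

```scheme
;; Serialize an object to a DType packet, then decode it again
(define original '("example" 1 (nested list)))
(define encoded (write-dtype-to-packet original))
(define decoded (read-dtype-from-packet encoded))

;; Compare the decoded copy with the original
(equal? decoded original)
```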

Direct binary I/O is possible with four functions:

read-byte: Reads a single byte from the stream as an integer between 0 and 255

Operating System Functions

FDScript provides a variety of functions for interacting with the host operating system. These can be useful in the construction of system utilities and in connecting systems of description to the systems they are describing.

These functions are useful for tracking resources, converting non-FramerD data into FramerD data, and other operations.

Environment access

(getenv var)

looks up the value associated with the string var in the following places:

the global FDScript environment (potentially modified by configuration files or profiles)

(under WIN32) the Windows Registry, under the key "Software\\FramerD\\environment\\var" beneath both the user and local machine roots
through the ANSI standard function getenv on the variable var

For example,

    [fdscript] (getenv "USER")
    "haase"
    [fdscript] (getenv "SUPER_POOL")
    "/usr/local/share/framerd/super-pool"

(cgetenv var)

uses the C library function getenv to get the value of the environment variable var, e.g.

    [fdscript] (cgetenv "TERM")
    "VT100"

(timestring)

returns a string representing the current time, e.g.

    [fdscript] (timestring)
    "15:45"

(session-id)

returns a string representing the current FDScript session, e.g.

    [fdscript] (session-id)
    "framerd: haase@eliza.media.mit.edu OS:Digital Unix Release:Jan 24 1997 Fri Jan 24 23:50:03 1997"

(system printout-args...)

Combines printout-args to make a command line, which it passes to the default command interpreter. For example,

[fdscript] (define filename "badfile")
[fdscript] (system "rm " filename)
0
[fdscript] (system "rm " filename) ; Already gone
rm: cannot remove `badfile': No such file or directory
256

(CD dir)
(CWD dir)

changes the current working directory to be dir.

Exploring the Filesystem

FDScript uses strings to represent files and directories in the file system. The file system can be explored by the functions GETFILES and GETDIRS. GETFILES takes a directory name and returns all of the files it contains; GETDIRS also takes a directory name but returns all of the subdirectories it contains. The following procedure gets all of the files recursively underneath a particular directory, taking advantage of getfiles, getdirs, and FDScript's automatic non-determinism:

(define (allfiles dir)
  (choice (getfiles dir)
          (allfiles (getdirs dir))))

These predicates can be applied to give information about a file given its name:

(file-exists? filename) returns true if filename exists
(file-writable? filename) returns true if filename can be modified
(directory? filename) returns true if filename is a directory
(symbolic-link? filename) returns true if filename is a symbolic link
(regular-file? filename) returns true if filename is a regular file (not a directory or a symbolic link)

The following functions can be applied to pathnames to generate other pathnames or components of pathnames:

(fullname path) returns a complete pathname (based at the file system root) given a relative pathname.
(basename path) returns the final part of a pathname, with the directory component removed.
(dirname path) returns the initial part of a pathname, just the directory.
(readlink path) returns the target of a link or the file itself otherwise.

Other information about particular files can be determined with these functions:

(file-size filename) returns the size (in bytes) of a regular file
(file-access-time filename) returns the last time a file was accessed, as a timestamp object
(file-creation-time filename) returns the time at which a file was created, as a timestamp object
(file-modification-time filename) returns the last time at which a file was modified, as a timestamp object
(file-owner filename) returns a string describing the owner of filename

The predicate (FILE-OLDER? file1 file2) returns true if file1 is older than file2.
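These predicates combine naturally into small utilities. For instance, the hypothetical helper needs-rebuild? below checks whether a generated file is missing or out of date with respect to its source:

```scheme
;; needs-rebuild? is a hypothetical helper: true when TARGET does
;; not exist yet, or exists but is older than SOURCE.
(define (needs-rebuild? source target)
  (or (not (file-exists? target))
      (file-older? target source)))

(needs-rebuild? "parser.scm" "parser.dtype")
```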

User-specific Information

(get-user-data)
(get-user-data username)
(get-user-data numeric-userid) returns information about a specified user, defaulting to the current user.

[fdscript] (get-user-data)
#[UID 31406
  GID 501
  UNAME "haase"
  TEXT-DATA "Kenneth Haase"
  HOMEDIR "/local/haase"
  SHELL "/bin/bash"]
[fdscript] (get-user-data "root")
#[UID 0 GID 0 UNAME "root" TEXT-DATA "root" HOMEDIR "/root" SHELL "/bin/bash"]
[fdscript] (get-user-data 0)
#[UID 0 GID 0 UNAME "root" TEXT-DATA "root" HOMEDIR "/root" SHELL "/bin/bash"]
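Assuming the value returned by GET-USER-DATA is a slotmap (it prints like the slotmap returned by (resources), described under Counting Resources), individual fields can be extracted with GET; user-shell is a hypothetical helper:

```scheme
;; user-shell is a hypothetical helper returning the SHELL field of
;; the record for USER (a user name or numeric user ID).
(define (user-shell user)
  (get (get-user-data user) 'shell))

(user-shell "root")
```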

(get-homedir) returns the absolute pathname of the current user's home directory.
```
[fdscript] (get-homedir)
"/local/haase"
```

Accessing the Web

FDScript also has built-in functions for retrieving documents from the World Wide Web. The function URLSTRING returns the content of a remote URL as a string, using any information the server sends about character set and encoding. It signals an error if the retrieved object does not have a MIME text type.

The function URLGET is more general: it returns a slotmap describing a document of any MIME type, as parsed by FramerD's internal MIME parser.

Functions Dealing with Time

The basic time structure in FramerD is the timestamp, which comes in two flavors: simple timestamps, which represent moments with a precision of seconds, and complex timestamps, which represent moments with varying degrees of precision (days, seconds, milliseconds, microseconds, etc.) and also carry timezone information.

(timestamp)
(timestamp string)
(timestamp string timezone)
(timestamp timestamp timezone)
Returns a timestamp object. Without an argument or with #f as an argument, the timestamp describes the current moment; with a string argument, the string is parsed as an ISO-8601 formatted time, e.g. 1990-01-20T15:00:00-5:00 describes the 20th of January, 1990 at 3 p.m. Eastern Standard Time, while 1990-01-20T20:00:00GMT describes the same moment in Greenwich Mean Time. When the timezone argument is provided, it either changes the timezone of the first argument (keeping the moment the same) or is used in interpreting it. For example, (timestamp "1990-01-20T15:00:00EST" "GMT") would return a timestamp which prints out as 1990-01-20T20:00:00UTC.
(xtimestamp)
(xtimestamp precision)
(xtimestamp timestamp precision)
Returns a timestamp object with a particular precision. With no arguments, it returns a timestamp for the current moment with the greatest possible precision; with a precision argument, it returns a timestamp with that precision; with a timestamp and a precision, it returns a version of that timestamp with the given precision. Precision can be one of the symbols year, month, day, hour, minute, second, millisecond, microsecond, or nanosecond. For example, (xtimestamp #f 'millisecond) returns a timestamp for the current moment with millisecond precision.
(get-month) or (get-month timestamp) returns a symbol denoting the current month or the month of a particular timestamp (in the local timezone), e.g. (get-month) ==> MARCH.
(get-year) Gets the current year (AD), e.g. (get-year) ==> 1997
(get-hour) Gets the current hour (hours since midnight), e.g. (get-hour) ==> 14
(get-season) Gets the current season, being ambiguous on the edges, e.g. (get-season) ==> {winter spring}
(get-day) returns a symbol describing the current day of the week, e.g. (get-day) ==> THURSDAY
(get-daytime) Returns a symbol describing the current time of day, being ambiguous on the edges, e.g. (get-daytime) ==> {afternoon evening}
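The sketch below parses the two ISO-8601 strings from the timestamp description, which denote the same moment, converts one of them to GMT with the two-argument form, and compares the month of each with GET-MONTH:

```scheme
;; Two ISO-8601 strings that describe the same moment
(define t-est (timestamp "1990-01-20T15:00:00-5:00"))
(define t-gmt (timestamp "1990-01-20T20:00:00GMT"))

;; The two-argument form keeps the moment but changes the timezone
(define t-converted (timestamp t-est "GMT"))

;; GET-MONTH reports the month of a timestamp in the local timezone;
;; 20 January is far enough from a month boundary that both agree.
(get-month t-est)
(get-month t-gmt)
```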

Accessing the WIN32 Registry

Under WIN32, FDScript also provides some access to the Windows Registry. The registry can be used to store fixnums, strings, lists of strings, and binary data packets. The functions to use are:

(registry-get path entry)

looks up the value associated with the string entry under the registry key path. This does a search which first looks in the "Current User" tree and then looks in the "Local Machine" tree, returning the first branch to have a matching entry. It returns the empty set if the entry is not defined. E.G.

    [fdscript] (registry-get "Software\\MUSOFT\\framerd" "super-pool")
    xxx

(registry-set! path entry value)

changes the value associated with the string entry under the registry key path. This does a search which first looks in the "Current User" tree and then looks in the "Local Machine" tree, changing the first branch to have a matching entry. If the entry is not defined in either tree, it is created in the "Current User" tree. E.G.

[fdscript] (registry-set! "Software\\MUSOFT\\etc" "birthday" 22197)
#t
[fdscript] (registry-get "Software\\MUSOFT\\etc" "birthday")
22197

These functions can be combined with the functions write-dtype-to-packet and read-dtype-from-packet to store arbitrary LISP objects in the registry, e.g.

[fdscript] (registry-set! "Software\\MUSOFT\\etc" "example"
              (write-dtype-to-packet '("example" 1)))
#t
[fdscript] (read-dtype-from-packet (registry-get "Software\\MUSOFT\\etc" "example"))
("example" 1)

Counting Resources

The procedure (resources) returns a slotmap containing various implementation-dependent resource information, e.g.

[fdscript] (resources)
#[MEMORY: 688 SWAPS: 0 USER-USECS: 57584 SYSTEM-USECS: 103456
  CONSES: 746 MALLOCD: 264 CONS-MEMORY: 12232 REFERENCED-OIDS: 0
  LOADED-OIDS: 0]

The function GET can be used to extract fields from a slotmap, E.G.

[fdscript] (get (resources) 'cons-memory)
167218

The (clock) function returns the number of microseconds of processing time expended since the first time clock was called:

[fdscript] (clock)
0
[fdscript] (clock)
1652000
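Because CLOCK measures processing time, the difference between two calls gives the time spent between them. A minimal sketch, where time-call is a hypothetical helper:

```scheme
;; time-call is a hypothetical helper: it calls THUNK, prints the
;; processing time the call took (in microseconds, per CLOCK), and
;; returns the thunk's value.
(define (time-call thunk)
  (let* ((start (clock))
         (result (thunk))
         (elapsed (- (clock) start)))
    (lineout "Elapsed: " elapsed " microseconds")
    result))

;; Time a simple counting loop
(time-call (lambda () (do ((i 0 (+ i 1))) ((= i 100000) i))))
```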

The (memusage) function returns the number of KBytes of memory being used by the data of the current process. This is based on the operating system's accounting.

The (consusage) function returns the number of bytes of memory being used by the current process. This uses FramerD's own accounting methods rather than the operating system's, and also leaves out conses which have been allocated but are not currently in use.

Accessing and modifying configuration information

FramerD installations and applications are customized by configuration files loaded when the installations or applications start up. A configuration file is a set of variable bindings which are established when the configuration file is loaded. No expressions are evaluated (which makes configuration files somewhat safer), but a configuration file can define or redefine default values as well as add values to variables which are already defined (potentially making them into choices).

Every FramerD application loads the "system configuration" file; interactive applications generally also load a "user profile" containing user-specific information. In addition, the executables fdscript and fdserver both take arguments of the form --config=file to specify additional configuration files. These files can be manipulated from the system command line with the scripts fdconfig, fdprofile, and fdcfg, as described in the user's guide. Configuration files can also be modified from the evaluator by several primitives.

(CONFIG-SET! file var val) sets var in file to have the value val.

(CONFIG-ADD! file var val) adds the value val to the binding of var specified in file. If file already defines var, val is just added to the values there. If file doesn't define var, val is added as an augmentation, so that it will be added to any existing value when the config file is loaded.

(CONFIG-RESET! file var) removes any values associated with var in file.

Internationalization

FramerD and FDScript both use Unicode internally to represent characters, strings, and symbols. This means that programs and data can include characters from hundreds of national languages at the same time. Thus a FramerD frame can have one slot containing data in Greek characters, another containing different data in Japanese Kanji, and yet another slot in the Korean Hangul character set.

All of the string and character functions work with Unicode strings, as in:

[fdscript] (subseq "Êtes-vous parlé français?" 0 4)
"Êtes"
[fdscript] (position #\ç "Êtes-vous parlé français?")
20
[fdscript] (string-upcase "Êtes-vous parlé français?")
"ÊTES-VOUS PARLÉ FRANÇAIS?"

Unmarked versions of characters can be extracted with the functions CHAR-BASE and CHAR-LOWER-BASE:

[fdscript] (char-base #\ç)
#\c
[fdscript] (char-lower-base #\É)
#\e

Similar functions exist for strings:

[fdscript] (string-base "Êtes-vous parlé Français?")
"Etes-vous parle Francais?"
[fdscript] (string-lower-base "Êtes-vous parlé Français?")
"etes-vous parle francais?"

permitting canonicalization of strings from various languages. However, the result is not guaranteed to be an ASCII string, as in:

[fdscript] (string-base "I hope to döss at the Schloß")
"I hope to doss at the Schloß"

FDScript also supports a diversity of external character encodings, allowing it to read and emit data in many different character sets. A character set is a mapping from some external character encoding into the Unicode representation used by FramerD. The contents of a file with a particular encoding can be converted into a string with the function FILESTRING whose second argument specifies the encoding. For example,

(filestring "xx.txt" "latin-1")

(filestring "john1.txt" "latin-7")

A packet (a byte vector) can be converted into a string by the function packet->string and converted back by the function string->packet, both of which require a character set specifier. For example, using the FILEDATA function to get the above file as a packet:

(packet->string (filedata "sassure1.txt"))

(string->packet "")

(equal? (string->packet "") "")
#t

The encoding of a program source file can be specified in several ways:

the second argument to load can specify a character set, e.g. (load "zh-parser.scm" "BIG5")
the file can call the function set-file-encoding! to change the encoding used for the file currently being loaded, e.g. (set-file-encoding! "koi8")
the head of the file can include a special line of the form -*- text-encoding: latin-2 -*-

If the encoding of a file is not specified, a default encoding is used. This default encoding can be set in numerous ways:

after an application has started, the function set-default-encoding! can be used, as in (set-default-encoding! "latin-1")
when an fdscript listener is started, a command line option such as --charset=koi8 can be specified;
before the application launches, the environment or config variable CHARSET can be set

This default encoding is also used for interactions with the console, unless it is overridden by the function set-console-encoding!.

Regardless of the character encoding in force, Unicode characters can always be entered as Unicode escapes (modelled on Java) of the form \uxxxx or \Uxxxxxxxx, where xxxx (four digits, after a lowercase u) or xxxxxxxx (eight digits, after an uppercase U) is the hexadecimal code of the corresponding Unicode character. These escape sequences are interpreted at a very low level, before the text is parsed, so an escaped character plays the same syntactic role as the character itself. Thus, the following is parsed as a string, because \u0022 is the code for the double quote character:

[fdscript] \u0022foo\u0022
"foo"
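The same mechanism works inside string literals, so a non-ASCII character can be written without depending on the file's encoding. A minimal sketch:

```scheme
;; The escape in this literal encodes U+00E9 (LATIN SMALL LETTER E
;; WITH ACUTE), so the string has four characters.
(define cafe "caf\u00e9")

;; STRING-BASE strips the accent, giving a plain ASCII string here
(define cafe-base (string-base cafe))
```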

The character sets built into FramerD include all of the ISO-8859 character sets as well as the KOI-8 character set for the Russian language. In addition, FramerD is able to read the mapping files provided by the Unicode consortium. These files can be found at ftp://ftp.unicode.org/pub/mappings/ and installed with .

Multi-Threaded Programming

FDScript experimentally provides facilities for multi-threaded programming. These include procedures for starting parallel threads of computation and for synchronizing access to shared resources.

On some platforms, FDScript provides support for the implementation of multi-threaded applications. Multi-threaded applications can do many things at once, proceeding with one task while blocked on another. On machines with multiple processors, different tasks can be divided among the different processors, possibly leading to performance improvements over performing all of the tasks on a single processor.

The support for multi-threaded programming in FDScript is provisional. The chief constructs for starting multiple independent threads are PARALLEL and SPAWN.

(parallel expr_i...): Evaluates each expr_i in a separate thread, combining the returned result choices into a single set of choices. In the absence of side effects (including I/O), this is just equivalent to AMB.
(spawn expr_i...): Evaluates each expr_i in a separate thread, but returns immediately and discards any results returned by the individual expressions.
(make-mutex): Returns a "mutex object" which can be used to make sure that separate threads do not interfere when accessing shared resources.
(with-mutex-locked mutex-expr expr_i...): Evaluates mutex-expr and then evaluates each of the expr_i expressions while guaranteeing that no other thread will evaluate a with-mutex-locked expression referring to the same value of mutex-expr.
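A minimal sketch of these constructs: three threads each increment a shared counter, and the mutex keeps their updates from interleaving. It assumes threads are supported on the platform and that PARALLEL, since it combines the results of its threads, returns only after all of them have finished; hits, hits-lock, and count-hit! are illustrative names:

```scheme
;; A shared counter and the mutex that protects it
(define hits 0)
(define hits-lock (make-mutex))

;; count-hit! is a hypothetical helper; WITH-MUTEX-LOCKED ensures that
;; only one thread at a time runs the read-modify-write on HITS.
(define (count-hit!)
  (with-mutex-locked hits-lock
    (set! hits (+ hits 1))))

;; Run three increments, each in its own thread
(parallel (count-hit!) (count-hit!) (count-hit!))
```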

Synchronized Procedures

FDScript also provides synchronized procedures. A procedure returned by SLAMBDA (which is syntactically identical to LAMBDA) or defined by SDEFINE (which is syntactically identical to DEFINE) is guaranteed to be running in at most one thread at any moment.

For example, the following server initialization (.fdz) file uses a synchronized procedure to control writing to a data file even when running on a multi-threaded server (by default, FramerD servers are multi-threaded on platforms where the configure script can determine how to build them that way).

;; This is the file fdlog.fdz
(sdefine (log x)
  (add-dtype-to-file x "log.dtype"))

This is also an example of a "safe" wrapper around a potentially dangerous function (add-dtype-to-file). External clients can call the defined log procedure, but cannot call add-dtype-to-file directly (which writes to the local filesystem).
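SLAMBDA can protect state captured in a closure in the same way. In this sketch (next-ticket is a hypothetical name), the counter update never runs in two threads at once, so every caller receives a distinct number:

```scheme
;; next-ticket is a hypothetical synchronized procedure.  The LET
;; creates private state; SLAMBDA guarantees that at most one thread
;; is running the body (and so updating last-ticket) at any moment.
(define next-ticket
  (let ((last-ticket 0))
    (slambda ()
      (set! last-ticket (+ last-ticket 1))
      last-ticket)))
```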

Working with Subjobs

FDScript programs can run other programs as subjobs, sending input to them and reading their output.

A subjob is a separate process from the FDScript interpreter with which the interpreter interacts. Subjobs can be local subjobs (started as programs on the same machine as the interpreter) or remote subjobs (started by connecting to a remote socket across the Internet). Both of these are called subjobs because the FDScript process may send output to and read input from them.

The simplest sort of subjob is started with the SYSTEM procedure, which executes a command on the local operating system. It takes no input (other than its command line) and its output is just sent to the console directly. The call to SYSTEM waits until the external program is done and then returns the exit code of the program.

The SYSTEM procedure takes an argument list like those passed to PRINTOUT and uses them to construct a command line. For example:

[fdscript] (define filename "test.fdx")
[fdscript] (system "chmod a+x " filename)
1

The OPEN-PROCESS procedure starts a parallel subprocess. Its first argument is the program to start, and its remaining arguments are converted into strings and passed to the program. OPEN-PROCESS starts the subprocess and immediately returns a subjob with which the FDScript process can interact. This interaction uses the regular I/O functions, applied to particular ports associated with the process.

(SUBJOB-INPUT subjob) returns an output port which can be used to send output to the subjob. (SUBJOB-OUTPUT subjob) returns an input port which can be used to read the output of the subjob. Error messages from subjobs started by OPEN-PROCESS are sent to the console.

The procedure OPEN-PROCESS-E is just like OPEN-PROCESS but uses its initial argument to specify where error messages from the process should be sent. If this first argument is a string, the error messages are sent to the file named by the string; if the first argument is false #F, errors are sent to a special stream which can be retrieved by the SUBJOB-ERRORS accessor. If the first argument is anything else, errors are just sent to the console.

For example, this interaction shows FDScript using an inferior FDScript process to evaluate expressions:

[fdscript] (define xx (open-process "fdscript" "-"))
;; Nothing (void) was returned
;; Values changed (1): XX
[fdscript] (printout-to (subjob-input xx) '(+ 2 3 (* 4 5)) "\n")
;; Nothing (void) was returned
[fdscript] (readline (subjob-output xx))
"25"

The accessor SUBJOB-PID returns the process ID of a created subjob. The procedure SUBJOB-CLOSE terminates a running subjob; its second argument, when provided, specifies the signal that is sent to the subjob via the kill() function.
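The exchange in the transcript above can be wrapped in a small helper. This is a sketch: ask-subjob is a hypothetical name, and it assumes, as in that transcript, that the inferior fdscript process prints each result on a single line:

```scheme
;; ask-subjob is a hypothetical helper: it writes EXPR followed by a
;; newline to the subjob, flushes the port so the text is actually
;; sent, and returns the next line the subjob prints.
(define (ask-subjob proc expr)
  (printout-to (subjob-input proc) expr "\n")
  (flush-output (subjob-input proc))
  (readline (subjob-output proc)))

;; Start an inferior FDScript reading expressions from its input
(define calc (open-process "fdscript" "-"))
(ask-subjob calc '(* 6 7))

;; Terminate the inferior process when it is no longer needed
(subjob-close calc)
```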

The procedure OPEN-SOCKET opens a TCP stream connection to a designated port on a particular host and returns a subjob structure for interacting with that remote connection. The first argument should be a hostname. The second argument identifies the port on the remote server: it can be an integer, a service name, or a touch-tone encoded port number. OPEN-SOCKET returns a subjob object on which the SUBJOB-INPUT and SUBJOB-OUTPUT accessors will work. There is no SUBJOB-ERRORS for remote subjobs. SUBJOB-CLOSE works on remote subjobs by closing the stream connection to the remote server.

For example, the following fragment accesses the FramerD web server:

[fdscript] (define sock (open-socket "framerd.org" "http"))
;; Nothing (void) was returned
;; Values changed (1): SOCK
[fdscript] (printout-to (subjob-input sock) "GET /\n")
;; Nothing (void) was returned
[fdscript] (flush-output (subjob-input sock))
#t
[fdscript] (readline (subjob-output sock))
""
[fdscript] (readline (subjob-output sock))
"FramerD"
[fdscript] (readline (subjob-output sock))
""

Error Handling

Since the world is an uncertain place, programs can often encounter unexpected conditions and situations. One tool for building robust but understandable programs is to separate out the routine execution of procedures from the handling of unexpected conditions. FDScript has several tools for supporting this sort of horizontal modularization.

The FDScript error model is based on the idea of user procedures or primitives raising exceptions to indicate an unexpected condition. In the current model, there is no way to handle the error where it occurred (by, for instance, trying an operation again). Instead, programs can set up contexts for catching and handling these errors.

The easiest way to catch errors is with the procedure SIGNALS-ERROR? which takes a single argument. The function returns false (#F) if the argument was evaluated without raising any exceptions (and thus discards the return value); otherwise, the function returns an error object describing the signalled error. For example,

[fdscript] (signals-error? (+ 2 3))
#f
[fdscript] (signals-error? (+ 2 'a))
[#ERROR ("Type Error" "+: not an integer" A)]

The error object, which may also be commonly returned by remote function evaluations, can be tested for with the predicate ERROR? and its components can be accessed with the primitives ERROR-EXCEPTION, ERROR-DETAILS, and ERROR-IRRITANT. E.G.

[fdscript] (define errobj (signals-error? (+ 2 'a)))
#f
[fdscript] errobj
[#ERROR ("Type Error" "+: not an integer" A)]
[fdscript] (error? errobj)
#T
[fdscript] (error-exception errobj)
"Type Error"
[fdscript] (error-details errobj)
"+: not an integer"
[fdscript] (error-irritant errobj)
A

The return value from normal evaluation is accessible by using SIGNALS-ERROR+?, which returns multiple values (not choices): first the error object or #F, then the values returned by the evaluation. E.G.

[fdscript] (signals-error+? (+ 2 3))
#f
;;+1: 5

These additional values can be accessed using multiple-value-bind, as in:

[fdscript] (define (test-eval expr)
             (multiple-value-bind (error? result) (signals-error+? (eval expr))
               (if error? (lineout "Evaluating " expr " signalled " error?)
                   (lineout "Evaluating " expr " returned " result))))
[fdscript] (test-eval '(+ 2 3))
Evaluating (+ 2 3) returned 5
[fdscript] (test-eval '(+ 2 a))
Evaluating (+ 2 A) signalled [#ERROR ("Variable is unbound" "EVAL" A)]
[fdscript] (test-eval '(+ 2 'a))
Evaluating (+ 2 'A) signalled [#ERROR ("Type Error" "+: not an integer" A)]

More sophisticated processing can be done with the special form ON-ERROR which evaluates its first argument and returns its value if no exceptions were raised. If exceptions were raised however, the remaining expressions in the ON-ERROR form are evaluated in an environment with the following bindings:

EXCEPTION: a string identifying the signalled error;
EXCEPTION-DETAILS: a string providing additional information about the error (for instance a filename)
IRRITANT: the lisp object whose character caused the error; for instance, the object which happens to be the wrong type for an operation;
BACKTRACE: a string containing the backtrace of program execution, which may be quite long, but can be parsed to extract call context information
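A minimal sketch, where safe-sum is a hypothetical helper; it assumes the value of the last handler expression becomes the value of the ON-ERROR form:

```scheme
;; safe-sum is a hypothetical helper.  If (+ x y) signals an error,
;; the handler expressions run with EXCEPTION (and the other names
;; listed above) bound; the final #f is assumed to be returned.
(define (safe-sum x y)
  (on-error (+ x y)
            (lineout "Could not add " x " and " y ": " exception)
            #f))

(safe-sum 2 3)
(safe-sum 2 'a)
```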

Another option, between these two possibilities, is the CATCH-ERRORS procedure which evaluates its body and returns the result of the final expression. If any exceptions are raised during the execution of the body, the CATCH-ERRORS form returns an error object describing the raised exception, its details, and the irritant.
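A sketch comparing a successful and a failing body; the variable names are illustrative:

```scheme
;; Every body expression succeeds, so the last value is returned
(define ok-result (catch-errors (+ 1 2) (* 10 2)))

;; The second expression raises a type error, so an error object
;; is returned instead of a value
(define bad-result (catch-errors (+ 1 2) (+ 2 'a) (* 10 2)))

;; Inspect the outcome with ERROR? and the error accessors
(if (error? bad-result)
    (lineout "Failed: " (error-exception bad-result)
             ", irritant " (error-irritant bad-result))
    (lineout "Result: " bad-result))
```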

User FDScript code can signal an error with the form RAISE-EXCEPTION. It takes one to three arguments: an exception name (a string or symbol), a details description (a string), and an irritant (a lisp object).
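A minimal sketch of a procedure that reports invalid input through RAISE-EXCEPTION, with the resulting error object captured by SIGNALS-ERROR?; checked-sqrt is a hypothetical helper:

```scheme
;; checked-sqrt is a hypothetical helper that rejects negative input.
;; RAISE-EXCEPTION is given an exception name, a details string, and
;; the offending value as the irritant.
(define (checked-sqrt x)
  (if (< x 0)
      (raise-exception "Domain Error" "checked-sqrt: negative argument" x)
      (sqrt x)))

;; SIGNALS-ERROR? returns #f on success, or an error object on failure
(define err (signals-error? (checked-sqrt -4)))
(error-exception err)
(error-irritant err)
```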

Programming in the Large

FDScript has a variety of functions to support programming in the large. These include a module system and various file loading routines to support the development and packaging of libraries. The module system allows the organization of programs into different non-conflicting namespaces, with explicit interfaces between them.

FDScript provides a simple module system for organizing programs into distinct namespaces with designated interfaces between them. The advantage of this organization is that the implementations of shared libraries or utilities do not need to worry about name conflicts between their internal functions. A module A can define a function initialize (for instance) without worrying about conflicts with a different initialize function in module B.

Modules must explicitly export variable bindings to make them visible to other modules; those other modules must also explicitly use the other module to get access to their exported variables. These two relationships are the keys to the module system.

There are two broad classes of modules: unregistered modules are bound to variables in some local environment; registered modules are maintained in two global registries, distinguished by whether the module is judged "safe" (does not access readily abused system functions for file or network access) or "enabled". The safe modules generally provide language extensions that build on the core Scheme and FramerD functions; the enabled modules generally provide additional functionality for accessing the file system, network, or subprocesses.

Registered modules are generally referred to by symbols, possibly including slashes to indicate a module hierarchy. An interactive user or program file can arrange to use the bindings of a module by calling the USE-MODULE procedure. Its argument should evaluate to either a module object or a symbol. If it is a symbol, the corresponding module is retrieved from the appropriate global registry(ies).

If a named module has not been registered, FDScript will look for a file which implements it. For a module named module, it looks for paths of any of the forms:

dir/module.fdx
dir/module.so (under Unix)
dir/module.dll (under WIN32)
dir/module/module.fdx

where dir can be replaced with each of the paths in the list of pathnames bound to MYFDPATH and then with each of the list of pathnames bound to %FDPATH. The %FDPATH variable is typically defined in the configuration file set up when FramerD was installed. The default directory on this list can also be revealed by the command fdxs modules.

In any of the above cases, the current environment is changed to inherit bindings from the specified module. This means that subsequent expressions and definitions will be able to access the bindings of the specified module.

Making Modules

From FDScript itself, a program file can specify its module with the special form (in-module module_name). If the first parameter is a simple symbol, an unregistered module is created and the variable module_name is bound to that module in two environments: the environment where in-module was called and the newly created environment, which is made current for the rest of the program file. If the first parameter is a more complex expression, it can either evaluate directly to a module (in which case that module is made current and subsequent expressions will be evaluated in and modify it) or it can evaluate to a symbol, denoting a registered module.

The most common case is the one where evaluating the parameter yields a symbol (often the parameter is simply a quoted symbol). In this case, in-module does one of two things:

if module_name is a registered module, it makes it current and evaluates the rest of the file inside that module;
if module_name is not a registered module, it creates a new module, registers it (in the enabled module registry), and switches to it.

The special form in-safe-module works just like in-module but will only search the "safe" module registry and will only create a new module in that registry.

Both in-module and in-safe-module take an optional second argument specifying the other modules which the designated module should use (as with USE-MODULE above). This argument is a choice of module objects and/or symbols designating registered modules. The symbol SAFE has special semantics: it causes any newly created module to have access only to the "safe" system functions, which don't touch the file system or open new network connections.
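
The decision that in-module and in-safe-module make for a symbol argument can be modeled in Python as shown below. The registries are plain dictionaries and the names are hypothetical. The sketch also assumes two lookup rules: in-module consults the enabled registry and then the safe one, while in-safe-module consults only the safe registry. As described above, each form creates a missing module in its own registry.

```python
class Module:
    """Minimal stand-in for an FDScript module: a name and its bindings."""

    def __init__(self, name):
        self.name = name
        self.bindings = {}


# Hypothetical models of the two global registries.
ENABLED_REGISTRY = {}
SAFE_REGISTRY = {}


def in_module(name, safe=False):
    """Return the module that becomes current for the symbol `name`.

    An existing registered module is reused; otherwise a new module is
    created and registered (in the safe registry for in-safe-module,
    in the enabled registry for in-module).
    """
    registries = [SAFE_REGISTRY] if safe else [ENABLED_REGISTRY, SAFE_REGISTRY]
    for registry in registries:
        if name in registry:
            return registry[name]
    module = Module(name)
    registries[0][name] = module      # first registry is where new modules go
    return module
```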

Within a module, symbols are exported by the special form module-export!, e.g.

(module-export! 'whois)

exports the symbol whois from the current module. The argument to module-export! can be a choice, as in:

(module-export! '{whois whereis})

Other ways to make modules

Modules can also be created with the forms STANDARD-MODULE and SAFE-MODULE. Each of these takes an arbitrary number of expressions, evaluates them in a newly created module, and returns that module. STANDARD-MODULE creates a module which has access to all of the FDScript functions. SAFE-MODULE creates a module which cannot use the "risky functions" that access the local file system, make new network connections, or change the active configuration.

The standard FDScript environment consists of the following namespaces:

a global namespace containing most FDScript functions
a "restricted" module containing functions for accessing the local file system, running system functions, making network connections, and configuring FramerD database access.
an "osprims" module containing functions for many common sorts of operating system access
an "fdinternals" module containing less common functions for getting at OIDs, pools, and their values
an "fdmaint" module containing functions for maintaining pools and indices
an "fdtext" module containing functions for text matching, searching, parsing, and other operations.
an "xmlgen" module containing functions for generating XML documents
a "htmlgen" module containing functions for generating HTML documents

The startup environment for FDScript uses the restricted module and the text module. The HTML or XML generation modules can be added by evaluating (use-module 'htmlgen) or (use-module 'xmlgen), respectively. The startup environment for the fdcgi executable automatically uses both the HTMLGEN and XMLGEN modules.

The module structure is used as a security mechanism for FramerD servers. The server startup file is loaded into its own module, which directly uses the restricted and text modules as well as a special module of server functions. Each connection to the server is given its own environment, which uses the module created at startup but no other modules. In particular, this means that the startup module (defined by the .fdz file) can use restricted functions, but remote clients cannot call these functions directly.
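
The effect of this arrangement can be modeled in Python with a lookup rule. A module first consults its own bindings and then the exported bindings of the modules it directly uses, without following those modules' own uses. That non-transitive rule is an assumption drawn from the description above, not a documented FDScript algorithm. The module contents and names such as open-file and lookup-word are hypothetical.

```python
class Module:
    """A module: its own bindings, the names it exports, and modules it uses."""

    def __init__(self, name, bindings=None, exports=(), uses=()):
        self.name = name
        self.bindings = dict(bindings or {})
        self.exports = set(exports)   # names visible to modules that use this one
        self.uses = list(uses)        # modules whose exports this one can see

    def lookup(self, symbol):
        """Own bindings first, then exports of *directly* used modules only."""
        if symbol in self.bindings:
            return self.bindings[symbol]
        for used in self.uses:
            if symbol in used.exports:
                return used.bindings[symbol]
        raise NameError(f"unbound variable: {symbol}")

    def can_see(self, symbol):
        try:
            self.lookup(symbol)
        except NameError:
            return False
        return True


# Hypothetical contents: one restricted function, one server function.
restricted = Module("restricted", {"open-file": "<opens files>"}, exports={"open-file"})
startup = Module("startup", {"lookup-word": "<server function>"},
                 exports={"lookup-word"}, uses=[restricted])
connection = Module("connection", uses=[startup])
```

Here startup.can_see("open-file") is true, but connection.can_see("open-file") is false. A function such as lookup-word can still call open-file internally, because it is defined in the startup module's environment.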

Loading Functions

FDScript provides some useful loading functions for writing portable programs divided into multiple pieces. These are especially useful in module.fdx files, which may implement a single module composed of multiple source files.

The function LOAD-LIBRARY is just like LOAD but searches along the variable FDPATH for any relative paths. For example, if FDPATH were the list ("/usr/local/share/libs/" "/usr/share/libs"), a call to (load-library "fishnet/module.fdx") would load the first of the following files which it could find:

/usr/local/share/libs/fishnet/module.fdx
/usr/share/libs/fishnet/module.fdx

The LOAD-LIBRARY function supports maintaining common code libraries into which newly implemented libraries can be placed. FDPATH is often a system-wide definition, so to allow for personalization LOAD-LIBRARY first tries the list of directories in FDMYPATH. Both can be set as configuration variables.
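
The search that LOAD-LIBRARY performs can be sketched in Python as follows. This is an illustrative analogue with a hypothetical function name. It tries FDMYPATH before FDPATH, as described above, and assumes that absolute paths are handed on unchanged, as plain LOAD would use them.

```python
import os


def load_library_path(relative, fdmypath, fdpath):
    """Return the file LOAD-LIBRARY would load for `relative`, or None.

    Directories in FDMYPATH are tried before those in FDPATH; an
    absolute path is returned unchanged.
    """
    if os.path.isabs(relative):
        return relative
    for directory in list(fdmypath) + list(fdpath):
        candidate = os.path.join(directory, relative)
        if os.path.isfile(candidate):
            return candidate          # first existing file wins
    return None
```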

When a module consists of several files, the procedure LOAD-COMPONENT can be used to load the component files portably. LOAD-COMPONENT interprets relative pathnames with respect to the file in which LOAD-COMPONENT is being evaluated. For example, if the file "/usr/local/share/fishnet/module.fdx" contained the expression (load-component "analyze.fdx"), it would load the file "/usr/local/share/fishnet/analyze.fdx". LOAD-COMPONENT could be written in terms of the function GET-COMPONENT, which generates an absolute pathname based on the location of the file currently being loaded. This is useful for (among other things) referring to data files, so the same module.fdx file could say:

(use-pool (get-component "fishnet.pool"))

to use the file pool "/usr/local/share/fishnet/fishnet.pool".
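
What GET-COMPONENT computes can be sketched in Python: it resolves a relative name against the directory of the file being loaded. LOAD-COMPONENT would then load the resulting path. This is an illustrative analogue, and the file being loaded is passed explicitly because the sketch has no notion of a current load context.

```python
import os


def get_component(current_file, relative):
    """Resolve `relative` against the directory of the file being loaded."""
    if os.path.isabs(relative):
        return relative
    base = os.path.dirname(os.path.abspath(current_file))
    return os.path.normpath(os.path.join(base, relative))
```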

The TX Text Processing Library

FDScript includes a suite of sophisticated tools for analyzing and parsing text in a variety of languages. This section describes those tools and their uses.

The TX library is the part of FDScript that deals with text. It includes a powerful pattern matching facility together with procedures for stemming (Porter), hashing (MD5), and morphological analysis. It also includes specialized parsers for HTML, XML, MIME, and RFC822 email messages.

The Pattern Matcher

The TX pattern matcher recognizes and extracts structure from arbitrary strings. TX is organized around matching patterns (which are LISP objects) against strings (which are linear sequences of characters). Since FramerD strings can include any Unicode character, these strings may contain the characters of any human language and most machine languages.

Taken by itself, a pattern specifies a set of strings; for instance, the pattern (isalnum+) matches any sequence of alphanumeric characters, so that:

[fdscript] (tx-match '(isalnum+) "haase")
#t

but:

[fdscript] (tx-match '(isalnum+) "haase@media")
#f

since `@' isn't a letter or number. The pattern (isalnum+) also matches letters and numbers in other languages, so

[fdscript] (tx-match '(isalnum+) "häse")
#t

(isalnum+) is called a matching operator. Strings and matching operators are the "basis level" for matching and searching: any search or match eventually gets down to either strings or matching operators. However, the matcher provides two general and powerful ways to combine these primitives.

Vector Patterns match Sequences

A vector pattern combines several patterns into a sequence, matching all strings consisting of a substring matched by the vector's first element followed by a substring matching the vector's second element, and so on. For example, the following vector pattern matches the string "haase@media":

[fdscript] (tx-match '#((isalnum+) "@" (isalnum+))  "haase@media")
#t

since the first (isalnum+) matches "haase", the string "@" matches "@" (strings always match themselves), and the second (isalnum+) matches "media". Note that this pattern would not, however, match a string like "haase%prep.ai.mit.edu".

Choices can be used as Patterns

Alternatives like this can be described by using FramerD choices to represent different patterns which can be matched. For example, we can extend the pattern above to also match "haase%prep.ai.mit.edu":

[fdscript] (tx-match '#((isalnum+) {"@" "%"} (isalnum+))  "haase%prep.ai.mit.edu")
#t

The choices in a pattern like this need not be strings; any pattern can be recursively included, e.g.

[fdscript] (tx-match '#((isalnum+) {"@" "%" (ispunct)} (isalnum+))  "haase-media")
#t
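
To make these combination rules concrete, here is a small Python model of the matcher. It is not FramerD's implementation. Python tuples stand in for vector patterns, frozensets stand in for choices, and a hypothetical Op class stands in for matching operators. Each pattern maps a start position to the set of positions where a match can end, and a whole-string match succeeds when the end of the string is among them. String comparison ignores case, following the matcher's default behaviour described later in this section.

```python
import unicodedata


class Op:
    """A matching operator such as (ispunct) or (isalnum+)."""

    def __init__(self, test, repeat):
        self.test = test        # predicate on a single character
        self.repeat = repeat    # True for the "+" (one-or-more) forms

    def ends(self, s, i):
        if not self.repeat:
            return {i + 1} if i < len(s) and self.test(s[i]) else set()
        j = i
        while j < len(s) and self.test(s[j]):
            j += 1
        return {j} if j > i else set()   # maximizing: longest run only


ISALNUM_PLUS = Op(str.isalnum, repeat=True)
ISPUNCT = Op(lambda c: unicodedata.category(c).startswith("P"), repeat=False)


def ends(pattern, s, i):
    """Set of positions where a match of `pattern` starting at i can end."""
    if isinstance(pattern, str):                 # strings match themselves
        j = i + len(pattern)
        return {j} if s[i:j].lower() == pattern.lower() else set()
    if isinstance(pattern, tuple):               # vector: one after another
        positions = {i}
        for element in pattern:
            positions = {e for p in positions for e in ends(element, s, p)}
        return positions
    if isinstance(pattern, frozenset):           # choice: any alternative
        return {e for alt in pattern for e in ends(alt, s, i)}
    if isinstance(pattern, Op):
        return pattern.ends(s, i)
    raise TypeError(f"unsupported pattern: {pattern!r}")


def tx_match(pattern, s):
    """True when the pattern matches the whole string."""
    return len(s) in ends(pattern, s, 0)
```

For instance, tx_match((ISALNUM_PLUS, frozenset({"@", "%", ISPUNCT}), ISALNUM_PLUS), "haase-media") mirrors the last example above.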

Named Patterns

When a symbol is used as a pattern, the value of that symbol is used for the matching, allowing complex patterns to be broken into smaller pieces. The procedure tx-closure (with abbreviation txc) takes a pattern and associates it with the current environment, so that symbol references within the pattern will be resolved in the corresponding environment. An example may make things clearer:

(define user-name '(isalnum+))
(define host-name
  {(isalnum+)
   #((isalnum+) "." (isalnum+) ".edu")
   #((isalnum+) "." (isalnum+) "." (isalnum+) ".edu")
   #((isalnum+) "." (isalnum+) "." (isalnum+) "." (isalnum+) ".edu")})
[fdscript] (tx-match (tx-closure '#(user-name "@" host-name))
                     "haase@media.mit.edu")
#t

The use of symbols as patterns is mostly meant to reduce the complexity of individual patterns and enhance their readability. Technically, however, it also makes the matcher more powerful, because it allows the specification of recursive patterns.
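
Named patterns can be added to the same kind of Python model by resolving a symbol in an environment at match time. This sketch is an illustration rather than FramerD's implementation. A hypothetical Name class stands for a symbol, a Closure pairs a pattern with a dictionary playing the role of the environment that tx-closure captures, and plain functions stand in for operators. The parens entry is recursive, which is the extra power mentioned above.

```python
def alnum_plus(s, i):
    """Operator like (isalnum+): the longest run of alphanumerics at i."""
    j = i
    while j < len(s) and s[j].isalnum():
        j += 1
    return {j} if j > i else set()


class Name:
    """A symbol used as a pattern, looked up when matching."""

    def __init__(self, symbol):
        self.symbol = symbol


class Closure:
    """A pattern paired with the environment its names resolve in (cf. txc)."""

    def __init__(self, pattern, env):
        self.pattern = pattern
        self.env = env


def ends(pattern, s, i, env):
    if isinstance(pattern, Closure):             # switch environments
        return ends(pattern.pattern, s, i, pattern.env)
    if isinstance(pattern, Name):                # resolve the symbol now
        return ends(env[pattern.symbol], s, i, env)
    if isinstance(pattern, str):
        j = i + len(pattern)
        return {j} if s[i:j].lower() == pattern.lower() else set()
    if isinstance(pattern, tuple):
        positions = {i}
        for element in pattern:
            positions = {e for p in positions for e in ends(element, s, p, env)}
        return positions
    if isinstance(pattern, frozenset):
        return {e for alt in pattern for e in ends(alt, s, i, env)}
    if callable(pattern):
        return pattern(s, i)
    raise TypeError(f"unsupported pattern: {pattern!r}")


def tx_match(pattern, s, env=None):
    return len(s) in ends(pattern, s, 0, env or {})


env = {
    "user-name": alnum_plus,
    "host-name": frozenset({
        alnum_plus,
        (alnum_plus, ".", alnum_plus, ".edu"),
        (alnum_plus, ".", alnum_plus, ".", alnum_plus, ".edu"),
    }),
    # Recursive: balanced parentheses, possibly empty.
    "parens": frozenset({"", ("(", Name("parens"), ")")}),
}
email = Closure((Name("user-name"), "@", Name("host-name")), env)
```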

How To Do Things With Patterns

We now know enough about patterns to look at the different ways patterns can be used in the TX package. Patterns can be used for more than matching against strings. As we saw above, the function tx-extract extracts the structure of the match:

[fdscript] (tx-extract '#((isalnum+) "@" (isalnum+))  "haase@media.mit.edu")
#("haase" "@" "media.mit.edu")

tx-extract treats named patterns as "atoms" and doesn't expand the internal structure of their match. This allows something like this:

[fdscript] (tx-extract (txc '#(user-name "@" host-name)) "haase@media.mit.edu")
#("haase" "@" "media.mit.edu")

where simple substitution would extract the substructure of the hostname "media.mit.edu", rather than treating it as a single chunk:

[fdscript] (tx-extract (vector user-name "@" host-name) "haase@media.mit.edu")
#("haase" "@" #("media" "." "mit" ".edu"))

Note that in this example, we use vector to construct the pattern on the fly.

The function tx-search locates the first substring which matches a pattern, returning the integer position at which the substring starts. For example,

[fdscript] (tx-search '(isdigit+) "My name is 007, JAMES 007")
11

The function tx-matcher returns the length of the initial substring that a pattern matches, for example

[fdscript] (tx-matcher '(isdigit+) "123ABC")
3

The function tx-gather returns the substrings of a string which match a pattern, as in

[fdscript] (tx-gather '(isdigit+) "There were 12 grapes and 66 apples")
;; There are 2 results
{"12" "66"}

The matches are returned as a choice and can then be operated on by other procedures. For example, using read-from-string would return the actual numeric values:

[fdscript] (read-from-string
             (tx-gather '(isdigit+) "There were 12 grapes and 66 apples"))
;; There are 2 results
{12 66}
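
For readers who know regular expressions, tx-search, tx-matcher and tx-gather correspond roughly to Python's re.search, re.match and re.findall. The analogues below illustrate that mapping and are not FDScript code. Python's regex engine backtracks rather than using TX's maximizing operators, but that makes no difference for a simple digit run like this one.

```python
import re

DIGITS = re.compile(r"\d+")   # rough analogue of (isdigit+)


def tx_search(regex, text):
    """Start position of the first match, like tx-search (None if absent)."""
    m = regex.search(text)
    return m.start() if m else None


def tx_matcher(regex, text):
    """Length of the match anchored at the start, like tx-matcher."""
    m = regex.match(text)
    return m.end() if m else None


def tx_gather(regex, text):
    """Every matching substring; FDScript returns these as a choice."""
    return regex.findall(text)
```

Converting the gathered strings with int plays the role of read-from-string in the last example.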

The function tx-segment breaks a larger string into smaller substrings at separators designated by a particular pattern. For instance, we can get substrings separated by vowels as follows:

(define vowels '(+ {"a" "e" "i" "o" "u"}))
[fdscript] (tx-segment "How long has it been?" vowels)
("H" "w l" "ng h" "s " "t b" "n?")

which we could glue back together with string-append:

[fdscript] (apply string-append (tx-segment "How long has it been?" vowels))
"Hw lng hs t bn?"

The function tx-fragment works much like tx-segment, but it keeps the separating strings, so we would have:

[fdscript] (tx-fragment "How long has it been?" vowels)
("" "H" "o" "w l" "o" "ng h" "a" "s " "i" "t b" "ee" "n?")

Applying string-append to the results of tx-fragment will restore the original string, as in:

[fdscript] (apply string-append
              (tx-fragment "How long has it been?" vowels))
"How long has it been?"

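In Python, both behaviours come from re.split: a capturing group around the separator pattern makes it keep the separators. This is an illustrative analogue, not FDScript. The FDScript result of tx-fragment shown above also begins with an empty string, which this sketch does not reproduce. Joining the pieces still restores the original text.

```python
import re

# Rough analogue of (+ {"a" "e" "i" "o" "u"}); TX ignores case by default.
VOWELS = r"[aeiou]+"


def tx_segment(text, separator=VOWELS):
    """Substrings between separators, with the separators dropped."""
    return re.split(separator, text, flags=re.IGNORECASE)


def tx_fragment(text, separator=VOWELS):
    """Like tx_segment, but the capturing group keeps the separators."""
    return re.split(f"({separator})", text, flags=re.IGNORECASE)
```
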
Parsing Files with Record Streams

Finally, we can take files and use patterns to divide them into records without having to load the whole file into a string. This can be useful with large data files used in other databases or applications. One starts by creating a record stream with the function open-record-stream, which takes a filename, a pattern, and (optionally) a text encoding (e.g. iso-8859/1 or BIG5).

Once a record stream has been created, the function read-record sequentially returns chunks of text from the file which match the record pattern. The function read-spacing can read the spacing between records.
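
A record stream of this kind can be sketched in Python as a reader that pulls the file in chunks and scans the buffered text with a regular expression. The class and method names are hypothetical analogues of open-record-stream, read-record and read-spacing, not FDScript's implementation. The sketch assumes a pattern that cannot match the empty string and whose match is settled once the following character has been read (or the file has ended). That holds for delimiter-terminated records such as the tag pattern <[^>]*>, but not for every pattern.

```python
import io
import re


class RecordStream:
    """Incremental record reader over a text file object (a sketch)."""

    def __init__(self, fileobj, pattern, chunk_size=4096):
        self.file = fileobj
        self.regex = re.compile(pattern)
        self.chunk_size = chunk_size
        self.buffer = ""
        self.eof = False

    def _next_match(self):
        """Find the next match we trust, reading more text as needed."""
        while True:
            m = self.regex.search(self.buffer)
            # Trust a match only if it ends before the buffered text does.
            if m and (m.end() < len(self.buffer) or self.eof):
                return m
            if self.eof:
                return None
            chunk = self.file.read(self.chunk_size)
            if chunk:
                self.buffer += chunk
            else:
                self.eof = True

    def read_spacing(self):
        """Consume and return the text before the next record."""
        m = self._next_match()
        end = m.start() if m else len(self.buffer)
        spacing, self.buffer = self.buffer[:end], self.buffer[end:]
        return spacing

    def read_record(self):
        """Consume and return the next record, or None at end of file.

        Assumption: any spacing before the record is skipped.
        """
        m = self._next_match()
        if m is None:
            return None
        self.buffer = self.buffer[m.end():]
        return m.group()


def records_and_spacing(text, pattern, chunk_size=3):
    """Alternate read_spacing and read_record over an in-memory string."""
    stream = RecordStream(io.StringIO(text), pattern, chunk_size)
    pieces = []
    while True:
        pieces.append(stream.read_spacing())
        record = stream.read_record()
        if record is None:
            return pieces
        pieces.append(record)
```

With the pattern r"<[^>]*>", alternating the two calls splits markup into text and tags, much as the XML parser example later in this guide does with read-spacing and read-record.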

Review

As we've seen, patterns in TX are built out of five simple elements:

strings match themselves
vectors of patterns match one pattern after another
choices match one of many patterns
symbols match patterns defined by global variables
operators (like (isalnum+)) match certain kinds of substrings

Knowing how these simple pieces work and what operators are available, you can write and read patterns in TX. The following sections list the available operators. This pattern language was designed to be more readable than standard regular expression languages such as those provided by the POSIX regex library or Perl.

Simple Operators

Simple operators are built-in primitives for identifying syntactic points (beginnings and ends of lines), character properties (spacing, case, punctuation, etc.), and some common patterns (mail ids, markup, etc.).

(bol): matches either the beginning of a string or the beginning of a new line
(eol): matches either the end of a string or the end of a line
(isalpha): matches any alphabetic character
(isalpha+): matches any string of alphabetic characters
(isdigit): matches any base 10 digit character
(isdigit+): matches any sequence of base 10 digits
(isalnum): matches any alphanumeric character
(isalnum+): matches any string of alphanumeric characters
(ispunct): matches any punctuation character
(ispunct+): matches any string of punctuation characters
(isupper): matches any upper-case character
(isupper+): matches any string of upper-case characters
(islower): matches any lower-case character
(islower+): matches any string of lower-case characters
(isspace): matches any whitespace character
(isspace+): matches any sequence of whitespace characters
(spaces): matches any sequence of whitespace characters
(lsymbol): matches any LISP symbol
(csymbol): matches any valid C identifier
(mailid): matches any email address or message reference

The primitive match operators which match more than a single character are maximizing; this means that they match the longest string possible. In particular, they do not match shorter prefixes of their longest match: an operator like (isalpha+) will match the substring "abc" in the string "abc3", but will not match the substring "ab". This makes matching much faster, and the more general sort of matching can be done by using the compound * and + operators (e.g. (+ (isalpha))).
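
The difference between a maximizing operator and the compound form shows up clearly in a small Python model. Each function returns the set of positions at which a match starting at i may end. This is an illustrative sketch with hypothetical helper names. Under the semantics described above, it shows why a sequence like #((isalpha+) "c3") would fail on "abc3" while one built on (+ (isalpha)) would succeed.

```python
def maximal_run(test, s, i):
    """Operator like (isalpha+): only the longest run starting at i."""
    j = i
    while j < len(s) and test(s[j]):
        j += 1
    return {j} if j > i else set()


def repeated_single(test, s, i):
    """Compound form like (+ (isalpha)): every run of length one or more."""
    ends = set()
    j = i
    while j < len(s) and test(s[j]):
        j += 1
        ends.add(j)
    return ends


def followed_by(ends, s, literal):
    """End positions after matching `literal` at each position in `ends`."""
    return {e + len(literal) for e in ends if s[e:e + len(literal)] == literal}
```

Here maximal_run(str.isalpha, "abc3", 0) is {3}, while repeated_single(str.isalpha, "abc3", 0) is {1, 2, 3}. Only the second leaves position 2 available for "c3" to follow.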

Parameterized Operators

(char-not chars) matches any string that does not contain any of the characters in chars (which is a string). For example:

[fdscript] (tx-match '(char-not "+-") "333.5")
#t
[fdscript] (tx-match '(char-not "+-") "333.5+5i")
#f

(char-range first-char last-char) matches any character whose Unicode code point lies between the characters first-char and last-char (inclusive). For example, we could rewrite (islower) with

[fdscript] (tx-match '(char-range #\a #\z) "a")
#t
[fdscript] (tx-match '(char-range #\a #\z) "m")
#t

though this only works for ASCII characters, whereas (islower) works for any Unicode character.
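
Both parameterized operators reduce to simple character tests, which the following Python predicates model. They are illustrative analogues, not FDScript functions. In particular, treating the empty string as a non-match for char-not is an assumption.

```python
def char_not(chars, s):
    """Sketch of (char-not chars): s is non-empty and avoids all of chars."""
    return len(s) > 0 and not any(c in chars for c in s)


def char_range(first, last, c):
    """Sketch of (char-range first last): one character whose code point
    lies between those of first and last, inclusive."""
    return len(c) == 1 and ord(first) <= ord(c) <= ord(last)
```

For example, char_range("a", "z", "ä") is false even though "ä".islower() is true. This is why (islower) is the better choice for non-ASCII text.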

Compound operators

A compound operator takes another pattern as a parameter. Four of the most useful compound operators are (* pat), (+ pat), (NOT pat), and (NOT> pat). (* pat) matches any number (including zero) of consecutive substrings matching pat; (+ pat) matches one or more consecutive substrings matching pat; (not pat) matches all the substrings that do not contain pat; and (not> pat) matches the longest possible string consisting of anything BUT pat.

For example, we can recognize certain nonsense words:

[fdscript] (tx-match '(* {"hum" "dum" "doo" "de"}) "humdumdoodedum")
#t

which uses a choice as the repeated pattern. We can even extract structure from this nonsense:

[fdscript] (tx-extract '(* {"hum" "dum" "doo" "de"}) "humdumdoodedum")
(* "hum" "dum" "doo" "de" "dum")

More interestingly, we can use the (* pat) operator to match lists of items whose length may vary, e.g.

[fdscript] (tx-extract '(* #((isalnum+) {(eol) (isspace+) #("," (isspace+))}))
                       "foo bar baz")
(* #("foo" " ") #("bar" " ") #("baz" ""))
[fdscript] (tx-extract '(* #((isalnum+) {(eol) (isspace+) #("," (isspace+))}))
                       "foo, bar, baz, quux")
(* #("foo" #("," " ")) #("bar" #("," " "))
   #("baz" #("," " ")) #("quux" ""))

The (* pat) operator also succeeds when there are no occurrences of its pattern, so we get the somewhat confusing:

[fdscript] (tx-match '(* #((isalnum+) {(eol) (isspace+) #("," (isspace+))}))
                     "")
#t

though it does have some standards:

[fdscript] (tx-match '(* #((isalnum+) {(eol) (isspace+) #("," (isspace+))}))
                     ",")
#f

We can use the operator (+ pat) for cases where there will always be at least one instance of the pattern. So, we get

[fdscript] (tx-match '(+ #((isalnum+) {(eol) (isspace+) #("," (isspace+))}))
                     "")
#f

but can still handle the single case:

[fdscript] (tx-match '(+ #((isalnum+) {(eol) (isspace+) #("," (isspace+))}))
                     "cook")
#t
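
The behaviour of (* pat) and (+ pat) on these examples can be reproduced with Python regular expressions, where * and + play the same roles. In this analogue (not FDScript), [^\W_]+ approximates (isalnum+) for Unicode letters and digits, and $ approximates (eol). Python's engine backtracks instead of maximizing, which does not change the outcome for these inputs.

```python
import re

# One list item: a word, then end of string, whitespace, or ", ".
ITEM = r"[^\W_]+(?:$|\s+|,\s+)"
STAR = re.compile(f"(?:{ITEM})*")   # like (* pat): zero or more items
PLUS = re.compile(f"(?:{ITEM})+")   # like (+ pat): one or more items


def tx_match(regex, text):
    """True when the regex matches the whole text."""
    return regex.fullmatch(text) is not None
```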

The (NOT pat) operator is apparently simple but hides some complexity. In its top level usage, it just reverses the behaviour of tx-match:

[fdscript] (tx-match '(not (isalpha+)) "good")
#f

Matching character case

Normally the matcher ignores case when comparing strings, so you have

(tx-match "Good" "good")
#t

however, the compound operator (MATCH-CASE pat) causes a pattern to pay attention to case, so that you have

[fdscript] (tx-match '(match-case "Good") "good")
#f

(MATCH-CASE pat) (which can be abbreviated MC) turns on case comparison; the complementary operator (IGNORE-CASE pat) (which can be abbreviated IC) turns it back off. So, we can have:

[fdscript] (tx-match
            '(match-case #("Good" ", " (ignore-case "BAD") ", " "Ugly"))
            "Good, bad, Ugly")
#t
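
Python's regular expressions offer a close analogue. Case folding can be switched on for a whole match or, with the scoped inline flag (?i:...), for just part of a pattern. The sketch below mirrors the examples above and is an illustration, not FDScript code. In it, match-case is the plain regex and the nested ignore-case becomes a scoped flag.

```python
import re


def tx_like_match(pattern, text):
    """Default TX behaviour: literal comparison ignores case."""
    return re.fullmatch(re.escape(pattern), text, re.IGNORECASE) is not None


def match_case(pattern, text):
    """Like (match-case pattern) for a plain string: case must agree."""
    return re.fullmatch(re.escape(pattern), text) is not None


# (match-case #("Good" ", " (ignore-case "BAD") ", " "Ugly")) as one regex:
# the scoped flag (?i:...) re-enables case folding for "BAD" only.
MIXED = re.compile(r"Good, (?i:BAD), Ugly")
```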

Other Text Processing Functions

The function MD5 returns a packet containing the MD5 message digest of its string argument, e.g.

[fdscript] (MD5 "I feel so unique")
[#PACKET 16 0x6a145c9f21b7cc4fe8a488ad59b34267]

If the string contains non-ASCII characters, MD5 hashes its UTF-8 encoding. The MD5 function can also be called on a packet, as in:

[fdscript] (MD5 (write-dtype-to-packet '(SENTENCE "I am hungry")))
[#PACKET 16 0xbf2e69fd6c8b5023c9e73510c40260f3]
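
The hashing rule described above can be mirrored with Python's hashlib: strings are hashed as their UTF-8 bytes and packets as raw bytes. This is an analogue for checking digests outside FDScript. Bytes stand in for packets, and the function name is hypothetical.

```python
import hashlib


def md5_packet(value):
    """MD5 digest as 16 bytes; str input is hashed as its UTF-8 encoding."""
    data = value.encode("utf-8") if isinstance(value, str) else bytes(value)
    return hashlib.md5(data).digest()
```

Calling .hex() on the result gives the same hexadecimal form that FDScript prints after 0x.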

The function (refpoints string) returns all the capitalized sequences of words in string, which correspond very roughly to the significant proper names. It filters out a small set of stop words and initial capitals.

[fdscript] (refpoints "Elvis and Princess Di met at the House of Blues
in Tulsa, Oklahoma.  They listened to `Boogie-Woogie Bugle Boy' on the juke
box.")
{;; There are 7 results
 "Tulsa" "Princess Di" "Elvis" "Blues"
 "House" "Boogie-Woogie Bugle Boy" "Oklahoma"}

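The heuristic can be approximated in Python by collecting runs of capitalized words and dropping stop words. This is a rough sketch, not the REFPOINTS implementation. The stop list is hypothetical, [A-Z] only recognizes ASCII capitals, and the filtering of initial capitals mentioned above is not modeled.

```python
import re

# Hypothetical stop list; the real REFPOINTS list is not given here.
STOP_WORDS = {"A", "An", "The", "They", "He", "She", "It", "We", "I"}

# A capitalized word (letters, digits, hyphens), optionally followed by
# more capitalized words separated only by whitespace.
CAPITALIZED_RUN = re.compile(r"[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*")


def refpoints(text):
    """Rough analogue of REFPOINTS: capitalized runs minus stop words."""
    runs = (" ".join(m.group().split()) for m in CAPITALIZED_RUN.finditer(text))
    return {run for run in runs if run not in STOP_WORDS}
```

On the sentence above, this sketch finds the same seven phrases, because the only other capitalized run, "They", is on its stop list.
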
The function (parse-timestring string) attempts to interpret string as a date and time with respect to the current time, returning a timestamp object:

[fdscript] (parse-timestring "July 4, 1976 11:10 PM")
#<"1976-07-04T18:10:00GMT">

The function (stem-word word) applies the Porter stemming algorithm to produce a canonical form for word. This is not the linguistic root of the word, but a special token which may not be a word at all. For example,

(stem-word "trees")
"tree"
(stem-word "meeting")
"meet"
(stem-word "meets")
"meet"
(stem-word "flies")
"fli"
(stem-word "flying")
"fly"

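The full Porter algorithm has five steps. The Python sketch below implements only steps 1a to 1c (plural removal, -eed/-ed/-ing removal with its clean-up rules, then y to i), following Porter's published rules. This is enough to reproduce the examples above, but it is not a complete stemmer and not FramerD's code.

```python
def is_consonant(word, i):
    """Porter's definition: y is a consonant unless preceded by one."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """m in [C](VC)^m[V]: the number of vowel-to-consonant transitions."""
    m, prev_vowel = 0, False
    for i in range(len(stem)):
        consonant = is_consonant(stem, i)
        if consonant and prev_vowel:
            m += 1
        prev_vowel = not consonant
    return m


def has_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(stem):
    return (len(stem) >= 2 and stem[-1] == stem[-2]
            and is_consonant(stem, len(stem) - 1))


def ends_cvc(stem):
    """*o: consonant-vowel-consonant, last letter not w, x or y."""
    n = len(stem)
    return (n >= 3 and is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")


def step1a(word):
    if word.endswith(("sses", "ies")):
        return word[:-2]                 # sses -> ss, ies -> i
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def step1b(word):
    if word.endswith("eed"):
        return word[:-1] if measure(word[:-3]) > 0 else word
    for suffix in ("ed", "ing"):
        stem = word[:-len(suffix)]
        if word.endswith(suffix) and has_vowel(stem):
            if stem.endswith(("at", "bl", "iz")):
                return stem + "e"
            if ends_double_consonant(stem) and stem[-1] not in "lsz":
                return stem[:-1]
            if measure(stem) == 1 and ends_cvc(stem):
                return stem + "e"
            return stem
    return word


def step1c(word):
    if word.endswith("y") and has_vowel(word[:-1]):
        return word[:-1] + "i"
    return word


def stem_word(word):
    """Porter steps 1a-1c only; words of two letters or fewer are kept."""
    word = word.lower()
    if len(word) <= 2:
        return word
    return step1c(step1b(step1a(word)))
```
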
The function MORPHRULE implements a simple form of morphological analysis. Its first argument is a string, its second argument is a set of "suffix rules", and its third argument is a set of root forms. Each rule is a vector of a suffix and its replacement. The function returns whichever root forms it can derive from the first argument using the suffix rules. The set of rules is a choice, while the set of roots can be either a choice of strings or a hashset of strings (which can make it much faster).

For example, here is a very simple English morphological analyzer:

(define rules {
  #("ing" "") #("ed" "") #("s" "") #("ies" "y")
  #("nning" "n") #("nned" "n")})
(define roots {"cook" "fly" "skin"})
[fdscript] (morphrule "cooking" rules roots)
"cook"
[fdscript] (morphrule "flying" rules roots)
"fly"
[fdscript] (morphrule "flies" rules roots)
"fly"

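The rule application described above can be modeled in Python. Each rule is a (suffix, replacement) pair, and a candidate root is kept only if it appears in the root set. This is an illustrative analogue of MORPHRULE with Python sets in place of choices and hashsets. It assumes each rule is applied once, to the end of the word.

```python
RULES = {("ing", ""), ("ed", ""), ("s", ""), ("ies", "y"),
         ("nning", "n"), ("nned", "n")}
ROOTS = {"cook", "fly", "skin"}


def morphrule(word, rules, roots):
    """Roots reachable from `word` by replacing one suffix per rule."""
    found = set()
    for suffix, replacement in rules:
        if word.endswith(suffix):
            candidate = word[:len(word) - len(suffix)] + replacement
            if candidate in roots:       # keep only known root forms
                found.add(candidate)
    return found
```

For example, morphrule("skinning", RULES, ROOTS) yields {"skin"} through the ("nning", "n") rule. The plain ("ing", "") rule would produce "skinn", which is not a root.
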
Parsing HTML and XML

FDScript contains a custom library for parsing HTML and XML files. The basic engine of the parser is a non-validating XML parser which understands the peculiarities of certain HTML tags. This allows it to parse both HTML and XML. The function PARSE-HTML takes a string as input and returns a nested list structure representing the XML/HTML structure of the document.

Each element of the nested list structure consists of three items: a symbol indicating the HTML/XML tag, a list of attributes associated with the tag, and a list of the elements (strings and subexpressions) making up the item's content. For example,

[fdscript] (parse-html (filestring "test.html"))
((BODY () 
    ("\n" 
     (P () ("This is a test of " (STRONG () ("FramerD")) " HTML parsing\n")) 
     (P ((ALIGN "RIGHT")) ("It has several paragraphs\n")) "\n")))

where the file test.html would contain the following:

<BODY>
<P>This is a test of <STRONG>FramerD</STRONG> HTML parsing
<P ALIGN=RIGHT>It has several paragraphs
</P>
</BODY>
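
The nested (tag attributes content) structure can be reproduced in Python with the standard html.parser module. The builder below is a sketch, not PARSE-HTML itself. Two simplifications stand in for the HTML peculiarities mentioned above: a new P (or LI) start tag closes an open element of the same kind, and a short list of void elements never takes content. Nested Python lists stand in for Lisp lists.

```python
from html.parser import HTMLParser

# Simplified assumptions standing in for HTML's special cases.
IMPLICITLY_CLOSED = {"p", "li"}   # <p> closes an open P, <li> an open LI
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Build [TAG, [(ATTR, value), ...], [content...]] lists."""

    def __init__(self):
        super().__init__()
        self.root = ["#ROOT", [], []]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in IMPLICITLY_CLOSED and self.stack[-1][0] == tag.upper():
            self.stack.pop()
        node = [tag.upper(), [(name.upper(), value) for name, value in attrs], []]
        self.stack[-1][2].append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, and anything inside it.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth][0] == tag.upper():
                del self.stack[depth:]
                break

    def handle_data(self, data):
        self.stack[-1][2].append(data)


def parse_html(text):
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root[2]
```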

The function PARSE-XML does the same for XML files and prints warnings if malformed XML is encountered. On the file above, PARSE-XML nests the second paragraph inside the first and notifies the user of the unbalanced tags:

[fdscript] (parse-xml (filestring "test.html"))
[16:04:07 P entity closed with (BODY ())]
((BODY () 
    ("\n" 
     (P () 
        ("This is a test of " (STRONG () ("FramerD")) " HTML parsing\n" 
         (P ((ALIGN "RIGHT")) ("It has several paragraphs\n")) "\n")))))

In addition to the many special forms for generating HTML and XML in the HTMLGEN and XMLGEN modules, FDScript provides the functions UNPARSE-HTML and UNPARSE-XML, which take the results of the above functions and regenerate the HTML or XML they describe. This process inserts close tags for non-empty elements like P, so that we would have:

[fdscript] (unparse-html (parse-html (filestring "test.html")))

This is a test of FramerD HTML parsing
It has several paragraphs

;; Nothing (void) was returned

Parsing MIME and RFC-822

The function READ-MIME takes either a string or a packet and interprets it according to the MIME protocol. It returns a slotmap whose slots contain the header fields of the message and whose CONTENT slot contains the body of the message.

[fdscript] (read-mime (filestring "test-message"))
#[FROM "haase@media.mit.edu"
  TO "walter@media.mit.edu"
  CONTENT "Looks like it's going to happen

-- Ken
"]

If the message is multi-part (e.g. has attachments), the CONTENT slot will be a list of slotmaps, one for each component. Each component slotmap has a MIME-TYPE slot and a CONTENT slot. The CONTENT slot will be either a string or a packet, depending on whether its MIME type is a text type.

The MIME parser can use any of the character encodings which FramerD knows about, both for body text and for header fields. The argument may be either a string (which will already have been UTF-8 encoded) or a packet (which is taken as Latin-1).
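
A comparable slot structure can be built in Python with the standard email package. The sketch below is an analogue, not READ-MIME. It uses a dict for the slotmap, upper-cases header names to mimic slot names, and keeps only the last value of a repeated header. Text parts are decoded with their declared charset, falling back to Latin-1 as the text above does for packets.

```python
import email


def mime_slots(msg):
    """Convert an email.message.Message into a dict of slots."""
    slots = {name.upper(): value for name, value in msg.items()}
    slots["MIME-TYPE"] = msg.get_content_type()
    if msg.is_multipart():
        # One slot dict per component, like FDScript's list of slotmaps.
        slots["CONTENT"] = [mime_slots(part) for part in msg.get_payload()]
        return slots
    payload = msg.get_payload(decode=True) or b""   # undoes base64, etc.
    if msg.get_content_maintype() == "text":
        charset = msg.get_content_charset() or "latin-1"
        slots["CONTENT"] = payload.decode(charset, errors="replace")
    else:
        slots["CONTENT"] = payload      # raw bytes play the role of a packet
    return slots


def read_mime(text):
    return mime_slots(email.message_from_string(text))
```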

The function (get-mailids string) returns all the whitespace-separated substrings of string which contain an at sign (@), which roughly corresponds to all email addresses or message references in the string.

[fdscript] (get-mailids "I heard that fdr@whitehouse.gov thinks hitler@reich.org is a fascist.")
{"fdr@whitehouse.gov" "hitler@reich.org"}

Simple but Handy

In addition to the facilities above, the text library (or FDScript itself) includes some handy functions for text strings:

(has-suffix suffix string): returns #t if string ends in suffix
(has-prefix prefix string): returns #t if string starts with prefix
(uppercase? string): returns #t if string has no lowercase characters
(lowercase? string): returns #t if string has no uppercase characters
(capitalized? string): returns #t if the first character of string is uppercase
(multi-line? string): returns #t if string contains newlines
(numeric? string): returns #t if string contains only numeric or punctuation characters
(empty-string? string): returns #t if string has no characters
(whitespace% string): returns the percentage (an integer from 0 to 99) of characters in string which are whitespace
(alphabetic% string): returns the percentage (an integer from 0 to 99) of characters in string which are alphabetic characters
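
Most of these predicates have direct Python counterparts, and the sketch below spells out one plausible reading of each. The names are hypothetical and the treatment of edge cases such as the empty string is an assumption. The percentages are computed with truncating integer division, so they may differ slightly from FDScript's.

```python
import unicodedata


def has_suffix(suffix, s):
    return s.endswith(suffix)


def has_prefix(prefix, s):
    return s.startswith(prefix)


def is_uppercase(s):      # uppercase?: no lowercase characters
    return not any(c.islower() for c in s)


def is_lowercase(s):      # lowercase?: no uppercase characters
    return not any(c.isupper() for c in s)


def is_capitalized(s):    # capitalized?: first character is uppercase
    return s[:1].isupper()


def is_multi_line(s):     # multi-line?: contains a newline
    return "\n" in s


def is_numeric(s):        # numeric?: only digits or punctuation
    return all(c.isdigit() or unicodedata.category(c).startswith("P") for c in s)


def is_empty_string(s):   # empty-string?: no characters at all
    return len(s) == 0


def percent_of(predicate, s):
    """Share of characters satisfying predicate, as a truncated percentage."""
    return 100 * sum(1 for c in s if predicate(c)) // len(s) if s else 0


def whitespace_percent(s):
    return percent_of(str.isspace, s)


def alphabetic_percent(s):
    return percent_of(str.isalpha, s)
```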

Implementing new commands with scripts

New shell commands can be implemented as FDScript program files. Under Unix, these should be marked as executable and should start with a line something like:

#!/usr/local/bin/fdscript

depending on where your local copy of fdscript lives. The remaining lines are FDScript expressions evaluated to implement the command. If these expressions define a procedure main, this procedure is applied to the command line arguments of the script. For example, suppose the file square.fdx contained the following text:

#!/usr/local/bin/fdscript
;; This is the file square.fdx
(define (square x) (* x x))
;; PARSE-ARG will convert a string to a number
(define (main x) (square (parse-arg x)))

we could use the file as a command from the shell:

sh% square.fdx 10
100

provided that square.fdx is executable.

The script can also access the arguments to the command through several variables:

nargs is the number of arguments
args is a list of all the arguments
arg1, arg2, arg3, and arg4 are the first four arguments (if given)

These arguments are generally strings, which the function parse-arg will convert to Lisp objects.

The default FramerD installation installs a command fdinstall-script (itself an FDScript script) which puts the appropriate #! line at the front of a file and makes it executable. When called with two filename arguments, it stores the executable script in the second filename and leaves the source file (the first argument) untouched. Thus, we could create a simple square command using our square.fdx file:

sh% fdinstall-script square.fdx square
sh% square 2000
4000000

Slightly more complex commands can provide command-line access to FramerD databases. For instance, the following script finds WordNet senses based on a word and a more general word:

#!/usr/local/bin/fdscript
;; This is the file find-sense.fdx
(use-pool "brico@framerd.org") ; replace with local server
(define (main word category)
  (let ((candidates (find-frames "brico@framerd.org" 'words word))
	(super-senses (find-frames "brico@framerd.org" 'words category)))
    ;; check each sense of WORD against the senses of CATEGORY
    (do-choices (candidate candidates)
      (if (value-path? candidate 'hypernym super-senses)
	  (lineout candidate)))))

which would work as follows:

sh% fdinstall-script find-sense.fdx find-sense
sh% find-sense dog animal
@/brico/f902("dog" "domestic_dog" "Canis_familiaris")
sh% find-sense dog person
@/brico/185c6("cad" "bounder" "blackguard" "dog" "hound" "heel")
@/brico/18651("dog")
@/brico/18b22("frump" "dog")

FDScript's Dirty Macros

FDScript provides a very simple macro facility for implementing syntactic extensions of the core FDScript language. When the value of a symbol is a list of the form:

(macro (expr) body...)

the evaluator uses body to preprocess all expressions starting with the symbol. The expressions in body are evaluated in a "safe environment" where only the basic Scheme/FDScript functions are available and the variable expr is bound to the top level expression being processed. For example:

(define push
 '(macro (expr)
   `(set! ,(caddr expr) (cons ,(cadr expr) ,(caddr expr)))))

defines a version of Common LISP's push macro, used thus:

[fdscript] (define atoms '())
[fdscript] (push 'x atoms)
[fdscript] (push 'y atoms)
[fdscript] atoms
(Y X)
[fdscript] (let ((nums '()))
             (dotimes (i 5) (push i nums))
             nums)
(4 3 2 1 0)
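
What such a macro does can be modeled in Python by representing expressions as nested lists and macros as functions from one expression to another. This sketch is not FDScript's evaluator. It expands eagerly, re-expanding the result until no macro call remains at its head, then expands subexpressions. It does not special-case quote. Like the facility described here, it does nothing to stop names introduced by a macro from colliding with the caller's, which is what makes such macros "dirty".

```python
def push_macro(expr):
    """(push item place) => (set! place (cons item place))."""
    _, item, place = expr
    return ["set!", place, ["cons", item, place]]


def expand(expr, macros):
    """Expand every list whose first element names a macro."""
    if not isinstance(expr, list) or not expr:
        return expr
    head = expr[0]
    if isinstance(head, str) and head in macros:
        return expand(macros[head](expr), macros)   # rewrite, then re-expand
    return [expand(sub, macros) for sub in expr]


MACROS = {"push": push_macro}
```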

An Example XML Parser

Here we use FDScript's record streams to write a very simple non-validating XML parser.

(define attributes #(" " (not> {">" "/"})))
(define element-pattern
  (tx-closure #("<" {"" "/"} (isalnum+) {"" attributes} {"" "/"} ">")))
(define (empty-string? x) (= (length x) 0))
(define (xml-parser-loop rs content-fn stack)
  ;; Read the content and process it
  (let ((content (read-spacing rs))) (content-fn stack content))
  ;; Get fresh markup
  (let* ((markup (read-record rs)))
    (if (eof-object? markup)
	(cond ((null? stack) 'ok)
	      (else (lineout "File ended early at " stack)
		    stack))
	;; If there is some, extract its structure and branch
	(let* ((extraction (tx-extract element-pattern markup))
	       (start-element (empty-string? (vector-ref extraction 1)))
	       (empty-element (not (empty-string? (vector-ref extraction 4))))
	       (tag (vector-ref extraction 2))
	       (attribs (vector-ref extraction 4)))
	  (cond (empty-element ;; empty elements have null content
		 (content-fn (cons (cons tag attribs) stack) "")
		 (xml-parser-loop rs content-fn stack))
		(start-element ;; start elements push onto the stack
		 (xml-parser-loop rs content-fn 
				  (cons (cons tag attribs) stack)))
		((equal? tag (car (car stack)))
		 ;; Matching non-start non-empty elements pop the stack
		 (xml-parser-loop rs content-fn (cdr stack)))
		(else
		 ;; anything else reports an error and returns the stream
		 (lineout "Element mismatch, started with "
		   (car (car stack)) " ended with "
		   tag)
		 rs))))))
(define (xml-parser filename content-fn)
  (let ((stream (open-record-stream filename element-pattern)))
    (xml-parser-loop stream content-fn '())))
(define (test-fn stack content)
  (lineout "Stack is " stack)
  (printout "  at content: ") (print content))

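Given a file mini.xml containing the following text:

<P>This is a test.  This is a <bold>bold</bold> statement about our
image (<img/>).</P>

we can run the parser: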

[17:08:30 MIT FramerD library 2.2 (C) 1994-2000, built Mar 19 2001]
[fdscript] (load "mini-xml.fdx")
;; Nothing (void) was returned
;; Values changed (6): ATTRIBUTES EMPTY-STRING? ELEMENT-PATTERN XML-PARSER-LOOP XML-PARSER TEST-FN
[fdscript] (xml-parser "mini.xml" test-fn)
Stack is ()
  at content: ""
Stack is (("P" . ""))
  at content: "This is a test.  This is a "
Stack is (("bold" . "") ("P" . ""))
  at content: "bold"
Stack is (("P" . ""))
  at content: " statement about our\nimage ("
Stack is (("img" . "/") ("P" . ""))
  at content: ""
Stack is (("P" . ""))
  at content: ")."
Stack is ()
  at content: "\n\n\n"
OK