uForth Quick Guide


Todd Coram (todd of maplefish.com)

DRAFT DRAFT DRAFT
Sep 30, 2009

Introduction

uForth is a Forth-like scripting system for microcontrollers with limited resources. It is written in C and is best used as a glue language to tie C functions and services together dynamically. With that intention, uForth is not a full system Forth. It is neither ANSI-compliant nor suitable for Forth programming from the ground up.

uForth is a 16-bit signed Forth that can fit into tiny microcontrollers. The VM and interpreter fit in as little as 8KB of Flash and 400 bytes of RAM (assuming, but not counting, a flash resident dictionary).

The dictionary stores all entries as 16-bit cells in the processor's native endianness. The dictionary can be saved and loaded on processors with the same endianness. It should be trivial to transform the dictionary between processors of different endianness.
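
As a rough sketch of that transformation, assuming the saved image is nothing more than a flat array of 16-bit cells: the helper name swap_dict_endianness is hypothetical and is not part of uForth. Packed strings may need separate handling, depending on how characters are laid out within a cell.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of uForth: byte-swap every 16-bit cell
 * of a dictionary image in place. Assumes the image is a flat array of
 * 16-bit cells with nothing else embedded in it. */
void swap_dict_endianness(uint16_t *dict, size_t ncells) {
  size_t i;
  for (i = 0; i < ncells; i++) {
    uint16_t c = dict[i];
    /* Move the low byte up and the high byte down. */
    dict[i] = (uint16_t)((c << 8) | (c >> 8));
  }
}
```

Running the helper twice over the same image restores the original cells, which makes for an easy sanity check.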

uForth implements ~55 primitives in C. While some of these words can be defined in uForth itself, they are implemented in C for space efficiency and speed.

uForth is a primitive-optimized, token-based Forth. By this, I mean that primitives are compiled inline and all words are executed without the overhead of C function calls. By compiling the primitives inline, we save 110 bytes of dictionary space (an important consideration if the dictionary is RAM based) and 1 extra indirection (through the return stack) per primitive invocation.
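
To illustrate the general idea (this is a toy, not uForth's actual inner interpreter), here is one common way to run token code where primitives are handled inline by a switch and only colon definitions go through the return stack. The token values, stack sizes, and the run() function are all assumptions made up for this sketch.

```c
#include <stdint.h>

typedef int16_t CELL;

/* Hypothetical token numbering; real uForth tokens may differ.
 * Any cell value that is not a primitive token is treated as the
 * dictionary index of a colon definition, so in this toy a colon
 * definition cannot start at index 0..3. */
enum { T_EXIT = 0, T_LIT = 1, T_ADD = 2, T_DUP = 3 };

/* Execute token code in dict[] starting at ip.
 * Returns the top of the data stack (0 if the stack is empty).
 * No stack bounds checking is done in this sketch. */
CELL run(const CELL *dict, CELL ip) {
  CELL ds[16], rs[16];   /* data stack and return stack */
  int dsp = -1, rsp = -1;
  for (;;) {
    CELL tok = dict[ip++];
    switch (tok) {
    case T_LIT:          /* push the cell that follows the token */
      ds[++dsp] = dict[ip++];
      break;
    case T_ADD:          /* ( a b -- a+b ) */
      ds[dsp - 1] = (CELL)(ds[dsp - 1] + ds[dsp]);
      dsp--;
      break;
    case T_DUP:          /* ( a -- a a ) */
      ds[dsp + 1] = ds[dsp];
      dsp++;
      break;
    case T_EXIT:         /* return to caller, or stop at top level */
      if (rsp < 0) return dsp >= 0 ? ds[dsp] : 0;
      ip = rs[rsp--];
      break;
    default:             /* colon definition: call by dictionary index */
      rs[++rsp] = ip;
      ip = tok;
      break;
    }
  }
}
```

Note that the primitives never touch the return stack; only the default branch (a call to a colon definition) does.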

Limitations

uForth is a scripting language. It is still very much a Forth, albeit a tiny one. As such, it has limitations.

uForth is not a strong meta Forth environment. You can define new words (:), create, and mark words immediate, but there is no POSTPONE nor DOES>, nor any decent way to disassemble words (yet!).

There is some limited support of strings and characters. Strings are packed into 16 bit cells. The lack of rich string support further distinguishes uForth scripting from traditional Forth system building.
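
As a minimal sketch of what packing can look like, assuming two 8-bit characters per cell with the first character in the low byte: this layout and the pack_string name are my assumptions for illustration, not a description of uForth's actual format.

```c
#include <stddef.h>
#include <stdint.h>

typedef int16_t CELL;

/* Hypothetical layout (not confirmed for uForth): two 8-bit characters
 * per cell, first character in the low byte, odd tail padded with 0.
 * Packs s into out[] and returns the number of cells used. Stops early
 * if max_cells is reached. */
size_t pack_string(const char *s, CELL *out, size_t max_cells) {
  size_t n = 0;
  while (*s != '\0' && n < max_cells) {
    uint16_t lo = (unsigned char)*s++;
    uint16_t hi = 0;
    if (*s != '\0') hi = (unsigned char)*s++;
    out[n++] = (CELL)(uint16_t)(lo | (hi << 8));
  }
  return n;
}
```

With this layout a string of N characters occupies (N + 1) / 2 cells, which is why string handling stays cheap but awkward compared to byte-addressed Forths.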

There is no RAM based string storage, hence no ." . If you really want to print out a string, add it to the dictionary with ," .

If you want more string support you can add it via Scripting C.

uForth doesn't support floating point.

uForth has no notion of I/O. You can easily provide this by extending uForth with C (see Scripting C for an example).

uForth keeps word headers and code together. While not a real limitation, this does make it a bit more difficult to generate a compact headerless dictionary.

uForth's primary cell format is a signed 16-bit int. This limits uForth to a 32KB dictionary (RAM or ROM) and 32KB of user RAM (see uForth Addressing).

Don't forget: uForth is a scripting language, not a systems language. If you are using that much dictionary or user RAM, you are doing something wrong. Go find a real Forth!

Why all of these limitations? Read on...

Goals

Here is a short list of uForth goals:

  1. Support scripting C functions and libraries.
  2. Embed into existing C programs.
  3. Run on small microcontrollers.
  4. Support cross platform development.
  5. Fast and light.
  6. Easily extendable via C.
  7. ROMable dictionary.
  8. Cross compilation and development (portable dictionary).
  9. Simple virtual machine.

Scripting C using uForth

To support scripting, uForth is easily extended with C. C code has direct access to uForth memory and stacks. There is a single entry function called uforth_stat c_handle(void). It takes no parameters since it is expected to pop values off of the uForth data stack (and leave results there too).

The c_handle() function is invoked via the cf uForth word. A typical registration by a C application could look like this:

  uforth_interpret(": . 1 cf ;");
  uforth_interpret(": emit 2 cf ;");
  uforth_interpret(": key 3 cf ;");
  uforth_interpret(": mem 4 cf ;");
  uforth_interpret(": type 5 cf ;");
  uforth_interpret(": .s 6 cf ;");
  uforth_interpret(": save-image 7 cf ;");
  uforth_interpret(": load-image 8 cf ;");

The c_handle() function for the above definitions would begin like this:

uforth_stat c_handle(void) {
  CELL r1 = dpop();
  char *str;
  switch(r1) {
  case 1: /* dot  */
    if (iram.didx > -1) {
      r1 = dpop();
      printf("%d ",r1);
      fflush(stdout);
    } else printf("EMPTY\n");
    break;
  case 2: /* emit */
    printf("%c", dpop());
    break;

You can also call uForth from C, passing words as C strings. This is provided by the uforth_stat uforth_interpret(char*) function (the same function used for the registrations above).
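
To make the c_handle() dispatch pattern concrete, here is a self-contained sketch that runs without uForth. The stub data stack, the uforth_stat values, and selector 100 are all stand-ins invented for this example; in a real build dpop() and friends come from uForth itself.

```c
#include <stdint.h>

typedef int16_t CELL;

/* Stand-in status type; the real uforth_stat values may differ. */
typedef enum { U_OK = 0, U_ERROR = 1 } uforth_stat;

/* Stub data stack standing in for uForth's own (no bounds checks). */
CELL stack[16];
int didx = -1;
void dpush(CELL v) { stack[++didx] = v; }
CELL dpop(void) { return stack[didx--]; }

/* Selector 100 is hypothetical: ( a b -- max ).
 * The selector itself is popped first, exactly as cf leaves it. */
uforth_stat c_handle(void) {
  CELL sel = dpop();
  switch (sel) {
  case 100: {
    CELL b = dpop();
    CELL a = dpop();
    dpush(a > b ? a : b);   /* leave the result on the data stack */
    return U_OK;
  }
  default:                  /* unknown selector */
    return U_ERROR;
  }
}
```

On the uForth side such a handler would be exposed the same way as the words above, for example with a definition like : max 100 cf ; passed in as a C string.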

The Portable Dictionary

uForth doesn't directly support writing to a Flash stored dictionary. You can designate the dictionary as RAM based or ROM (Flash) based. It is up to your application to work out how to compile definitions to Flash (see TBD example).

The PC (Unix/Cygwin) uForth allows you to save dictionary snapshots with the save-image command. You specify the output file (for example: save-image core.img) and it will dump the dictionary. This dumped dictionary can be loaded into a uForth at compile time. That is, you can do script development on the PC and then save the script as a loadable dictionary. There is a C program (make_core_dict) that can convert this image into a C array that can be included in a target build via a generated header file (e.g. core_dict.h). This array can be designated as ROM or RAM based (depending on your target compiler).
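
The image-to-header step can be sketched roughly as follows. This is not the source of make_core_dict; the function name emit_dict_header, the array name core_dict, and the output layout are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the output step of a tool like
 * make_core_dict: write n dictionary cells to out as a C array
 * definition, eight values per line. Returns 0 on success and
 * -1 on a write error. */
int emit_dict_header(FILE *out, const int16_t *cells, size_t n) {
  size_t i;
  if (fprintf(out, "const short core_dict[%lu] = {\n",
              (unsigned long)n) < 0)
    return -1;
  for (i = 0; i < n; i++) {
    /* Break the line after every 8th value and after the last one. */
    const char *sep = (i % 8 == 7 || i + 1 == n) ? "\n" : " ";
    if (fprintf(out, "%d,%s", (int)cells[i], sep) < 0)
      return -1;
  }
  if (fputs("};\n", out) == EOF)
    return -1;
  return 0;
}
```

Whether the generated array ends up in ROM or RAM is then a matter of how your target compiler treats const data (or whatever placement attribute it provides).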

Since uForth's C extension is based on tokens (numerical values as selectors), you can pretty much move this dictionary to any target uForth as long as the token convention is followed!

uForth Bootstrapping

The PC executable uforth.boot can be used when there is no default pre-dumped dictionary (i.e. core_dict.h). This executable compiles primitives into a RAM based dictionary and loads init.f.

If you use save-image core.img and then make_core_dict <core.img >core_dict.h you can then retire uforth.boot and compile/run uforth (for your target platform).

uForth Addressing

Addressing in uForth is restricted to relative indexing. There are two address spaces: dictionary and uram (there is also an iram but it doesn't support addressing). The dictionary may live in RAM or ROM. If it lives in ROM, special functions may be called to read and write values.

The dictionary is addressed by 16 bit cells. The address is expressed as an index starting at 0.

The uram space is also addressed by 16 bit cells. In order to distinguish uram addressing from dictionary addressing, uram is indexed as negative values starting at -1.

This allows you to use the same words (for example: @ and !) to address both uram and the dictionary.
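
The sign-based split can be sketched in C like this. The array sizes and the fetch/store names are made up for the example, and the sketch assumes a RAM based dictionary with no bounds checking; it is not uForth's actual implementation of @ and !.

```c
#include <stdint.h>

typedef int16_t CELL;

#define DICT_CELLS 64   /* arbitrary sizes for this sketch */
#define URAM_CELLS 16

CELL dict[DICT_CELLS];
CELL uram[URAM_CELLS];

/* Non-negative addresses index the dictionary from 0.
 * Negative addresses index uram: -1 is uram[0], -2 is uram[1], ... */
CELL fetch(CELL addr) {
  return addr >= 0 ? dict[addr] : uram[-(addr + 1)];
}

void store(CELL addr, CELL val) {
  if (addr >= 0)
    dict[addr] = val;
  else
    uram[-(addr + 1)] = val;
}
```

Writing the uram index as -(addr + 1) rather than -addr - 1 keeps the arithmetic in range even for the most negative 16-bit address.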

That being said, uForth comes with the usual Forth words. Where ANSI word names are used, ANSI functionality should follow.

uForth is case sensitive and uses lowercase for core words.

Words

uForth supports 55 primitive words. Many of these words could be written in uForth itself, but in order to save precious dictionary space, they are written as primitives. In fact, some of the immediate primitives generate other primitives (a prime candidate for coding in uForth!). In the future those words may be moved out of the C core and into uForth.

uForth makes no claim to ANSI compatibility, but where a word shares its name with an ANSI word, it is intended to behave as ANSI specifies.

Here are the currently implemented primitive words:

               !  '  (  *  +  ,  ,"  -  /  0=  0jmp?  :  ;  <  <0
               =  >  >r  @  [']  abort  and  begin  cf  create  do
               drop  dup  else  exec  exit  here  i  if  immediate
               j  leave  lit  loop  lshift  not  or  pick  r>  repeat
               rpick  rshift  swap  then  until  variable  while  xor

