BASM (Basic ASseMbler) Part 2

An inline Basic Z80 assembler for CPC464 and later home computers:

In my previous post, BASM (Basic ASseMbler) Part 1, I introduced my little personal coding project: a simple but functional Z80 assembler written in Locomotive Basic (the Basic of the Amstrad/Schneider CPC464 and later home computers).

This ‘part 2’ delves into what I built in the days after the plan took shape, and how I went about it.

Resources then and now

One of the biggest advantages I had compared to my 30+ years younger self is obviously ‘the internet’ (in addition to 30+ years of experience in designing and developing software…). These days it is hard to conceive how we did things before the internet happened. Everything you needed to know, you had to learn at school (not an option for my programming explorations, as the teachers at my school hardly knew what a computer was, let alone how to program one) or get from magazines and books.

Finding good books was a challenge in itself, as of course you couldn’t google for good programming books.

Anyway, I decided I was going to grant myself this advantage and found some really nice resources online to help me out in this little endeavour. The following sites were particularly useful (in no particular order):

‘Modern’ coding in ‘old basic’

Another thing that has changed significantly since the early ’80s is programming itself (especially programming ‘at home’).

We’re now so used to full-featured IDEs with syntax highlighting, code refactoring, version management, project templates, debugging, UI design tools, etc. that it is hard to imagine how we could ever live without them.

Back in the early ’80s, on our 8-bit home computers, we of course had none of this. The machine started up straight into Basic. This was our ‘shell’ and our IDE. At the same time, this meant you could literally turn on your computer and start coding within seconds, without loading any additional software.

This made the threshold to start experimenting with code extremely low. Basic itself is also very easy to learn, though back then it was at least as limited as it was easy.

Locomotive Basic, which was included in ROM on the CPC computers, was a pretty good Basic implementation compared to most of its competitors. But unlike ‘modern’ versions of Basic, every line of code needed a line number, and the only ‘control flow’ statements supported were:

  • For <iterator> = <lower> to <upper> / Next
  • While <condition> / Wend
  • If <condition> then <code> else <code> (all on a single line)
  • Goto <line_number>
  • Gosub <line_number> / Return

The Basic did have rudimentary support for ‘functions’ through the ‘def fn’ statement. With it you could define a single-expression function that returned a real number or an integer. These functions did support parameters.

But for a ‘function’ that did more than compute a single number in one expression, the best we had was Gosub. You could ‘gosub’ to a line number, do some work there and use the ‘return’ statement to go back to the statement directly after the ‘gosub’. There was no support for parameters or return values; these had to be ‘simulated’ using variables, which were all global. And I mean GLOBAL, as in global for the entire Basic program.
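To make the Gosub/Return mechanics concrete, here is a toy Python model. It is a sketch under my own simplifying assumptions, not how the CPC's ROM actually implements it: a hypothetical run() walks a dict of line numbers, ‘gosub’ pushes the place to resume on a return stack, ‘return’ pops it, ‘goto’ just jumps, and every variable lives in one flat, global dict.

```python
def run(program, start):
    """Toy interpreter for GOTO/GOSUB/RETURN (simplified, assumed semantics).

    program maps a line number to a list of (op, arg) statements, so one
    line can hold several statements, like Basic's ':' separator.
    Returns (variables, printed_values).
    """
    variables = {}      # one flat namespace: every variable is global
    return_stack = []   # GOSUB pushes a resume point, RETURN pops it
    printed = []
    lines = sorted(program)
    line, idx = start, 0
    while True:
        if idx >= len(program[line]):      # end of line: continue with the next
            pos = lines.index(line) + 1
            if pos == len(lines):          # fell off the end of the program
                return variables, printed
            line, idx = lines[pos], 0
            continue
        op, arg = program[line][idx]
        idx += 1                           # by default, run the next statement
        if op == "let":
            name, value = arg
            variables[name] = value
        elif op == "print":
            printed.append(variables[arg])
        elif op == "goto":                 # plain jump: the stack is untouched
            line, idx = arg, 0
        elif op == "gosub":                # remember the statement after GOSUB
            return_stack.append((line, idx))
            line, idx = arg, 0
        elif op == "return":
            if not return_stack:           # Basic stops with an error here
                raise RuntimeError("RETURN without GOSUB")
            line, idx = return_stack.pop()
        elif op == "end":
            return variables, printed


# A GOSUB whose subroutine GOTOs elsewhere before returning.
demo = {
    10: [("gosub", 100), ("print", "ret%")],
    20: [("end", None)],
    100: [("goto", 200)],
    200: [("let", ("ret%", 7)), ("return", None)],
}
print(run(demo, 10))  # ({'ret%': 7}, [7])
```

Because ‘goto’ leaves the return stack alone, a ‘return’ reached via a ‘goto’ (line 100 jumping to 200 above) still goes back to the statement after the original ‘gosub’.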

Locomotive Basic also supported some basic ‘switch/case’-like functionality using ‘on <variable> goto <line#>, <line#>, …’ or ‘on <variable> gosub <line#>, <line#>, …’. The <variable> had to be an integer: on the value 1 it would ‘goto’ or ‘gosub’ the first line number, on 2 the second, etc.
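The selection rule of ‘on … gosub’ is easy to model. The Python sketch below is a hypothetical illustration: on_target() and dispatch() are my own names, the mnemonic table is invented apart from line 14000 (the ‘ld’ handler in BASM), and treating any value outside 1..n as ‘no jump’ is an assumption of the sketch.

```python
def on_target(n, targets):
    """Line number that 'on n goto/gosub t1, t2, ...' selects.

    1 selects the first target, 2 the second, and so on. Assumption for
    this sketch: any other value means 'no jump', i.e. execution simply
    continues with the next statement (modelled as None).
    """
    if 1 <= n <= len(targets):
        return targets[n - 1]
    return None


# Hypothetical use in an assembler: turn a mnemonic into 1, 2, 3, ...
# and let the ON statement pick the matching handler line.
MNEMONICS = ["ld", "add", "jp"]
HANDLERS = [14000, 15000, 16000]   # 14000 is BASM's ld handler; others invented

def dispatch(mnemonic):
    n = MNEMONICS.index(mnemonic) + 1 if mnemonic in MNEMONICS else 0
    return on_target(n, HANDLERS)

print(dispatch("add"))   # 15000
print(dispatch("nop"))   # None
```

One ‘on’ statement replaces a chain of single-line ‘if … then gosub’ tests, at the cost of first mapping the value to a small integer.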

Further, there were only 3 ‘scalar’ data types:

  1. String: max 255 characters, denoted by a ‘$’ at the end of the variable name
  2. Integer (signed): -32768 to 32767 (16-bit two’s complement), denoted by a ‘%’ at the end of the variable name
  3. Real (floating point): a 5-byte representation (4 bytes mantissa and 1 byte exponent), denoted by a ‘!’ at the end of the variable name, or by no suffix at all (it’s the default type)
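A small Python sketch of these rules: var_type() and int16_bits() are hypothetical helper names, and rejecting out-of-range values is simply how this sketch mirrors the 16-bit limit.

```python
INT_MIN, INT_MAX = -2**15, 2**15 - 1   # -32768 .. 32767

def var_type(name):
    """Type implied by a Locomotive Basic variable name's suffix."""
    if name.endswith("$"):
        return "string"
    if name.endswith("%"):
        return "integer"
    return "real"                       # '!' suffix, or no suffix at all

def int16_bits(value):
    """16-bit two's complement bit pattern of an integer variable's value."""
    if not INT_MIN <= value <= INT_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit integer")
    return value & 0xFFFF

for name in ("arg1$", "ret%", "x!", "x"):
    print(name, var_type(name))
print(hex(int16_bits(-1)), hex(int16_bits(-32768)))   # 0xffff 0x8000
```

The bit pattern matters once an assembler has to emit a negative 16-bit operand: -1 is stored as &FFFF.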

The only ‘complex’ data type was the array. Arrays could have a maximum of 125 dimensions. Usually you would declare them somewhere early in your code using the ‘dim’ statement: the type suffix follows the variable name, and then, in parentheses, you specify the highest index for each dimension, for example:

dim myArray%(10,20)

This was more or less it.
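To illustrate what a ‘dim’ allocates, here is a Python sketch. It assumes 0-based indices with each declared number being the highest index, so a dimension gets one more element than the number written; dim() is a hypothetical helper.

```python
def dim(*upper_bounds, fill=0):
    """Nested lists shaped like 'dim a%(b1,b2,...)'.

    Each bound is the highest index, so a dimension has bound + 1 elements
    (indices 0..bound). Numeric arrays start out as 0; pass fill="" to
    mimic a string array.
    """
    if not upper_bounds:
        return fill
    first, *rest = upper_bounds
    return [dim(*rest, fill=fill) for _ in range(first + 1)]


my_array = dim(10, 20)                   # like: dim myArray%(10,20)
print(len(my_array), len(my_array[0]))   # 11 21
print(my_array[10][20])                  # 0
```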

In order to provide some structure to my code, I decided early on that I would try to come up with some simple coding style principles and guidelines.

Complex conditions

All complex conditions are defined as DEF FN functions. This makes if statements easier to read, especially considering that the entire IF THEN ELSE has to be written on a single line.

The following fragment shows a number of such functions. These particular functions check the type of an instruction’s operands:

10100 ' operand type checks
10101 DEF FNisR(r$)=len(r$)=1 and instr("abcdehl",r$)>0 ' single-letter 8-bit register only
10102 DEF FNisJ(j$)=j$="ixh" or j$="ixl" or j$="iyh" or j$="iyl"
10103 DEF FNisN(n$)=n$="0" or n$="00" or (val(n$)>0 and val(n$)<=&FF) ' TODO improve reliability
10104 DEF FNisA(n$)=n$="&0" or n$="&00" or n$="&0000" or (val(n$)>0 and val(n$)<=32767) or (val(n$)<0 and val(n$)>=-32768) ' TODO improve reliability
10105 DEF FNisInd(i$)=left$(i$,1)="(" and right$(i$, 1)=")"
10106 DEF FNisIndA(i$)=FNisA(mid$(i$, 2, len(i$)-2))
10107 DEF FNisQ(q$)=q$="bc" or q$="de" or q$="hl" or q$="ix" or q$="iy" or q$="sp"
10108 DEF FNisC(c$)=c$="nz" or c$="z" or c$="nc" or c$="c" or c$="po" or c$="pe" or c$="p" or c$="m"
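For readers more at home in modern languages, here is roughly the same set of checks as a Python sketch. The set and function names are mine; this version treats an operand as an 8-bit register only when it is exactly one of the seven single-letter names. In Locomotive Basic a comparison yields -1 for true and 0 for false, which basic_bool() mimics; that is why these checks can be combined with ‘and’ and ‘or’.

```python
REGS_8 = {"a", "b", "c", "d", "e", "h", "l"}                # FNisR
INDEX_HALVES = {"ixh", "ixl", "iyh", "iyl"}                 # FNisJ
PAIRS = {"bc", "de", "hl", "ix", "iy", "sp"}                # FNisQ
CONDITIONS = {"nz", "z", "nc", "c", "po", "pe", "p", "m"}   # FNisC

def is_r(op):
    return op in REGS_8

def is_j(op):
    return op in INDEX_HALVES

def is_q(op):
    return op in PAIRS

def is_c(op):
    return op in CONDITIONS

def is_ind(op):
    """FNisInd: the operand is wrapped in parentheses, e.g. (hl) or (&4000)."""
    return len(op) >= 2 and op[0] == "(" and op[-1] == ")"

def basic_bool(flag):
    """Locomotive Basic comparisons yield -1 for true and 0 for false."""
    return -1 if flag else 0

print(is_r("c"), is_c("c"), is_r("bc"))   # True True False
```

Note that "c" passes both is_r() and is_c(): whether it means the C register or the carry condition depends on the instruction (ld c,a versus jp c,&4000).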


In order to make it easier to simulate modern functions in Basic, I decided to use a number of predefined variables to represent function arguments and return values.

The following example shows what this looks like:

20000 ' getR(arg1$) -> ret%
20001 if arg1$="a" then ret%=ra%:return
20002 if arg1$="b" then ret%=rb%:return
20003 if arg1$="c" then ret%=rc%:return
20004 if arg1$="d" then ret%=rd%:return
20005 if arg1$="e" then ret%=re%:return
20006 if arg1$="h" then ret%=rh%:return
20007 if arg1$="l" then ret%=rl%:return
20008 ret%=-1:return

The initial comment line shows the ‘signature’ of the function. As you can see, it accepts a single argument, arg1$ (in this case a string), and returns an integer, ret%. Functions with multiple parameters would use arg1$, arg2$, etc.; if the second parameter were an integer, it would be called arg2%. Some functions return ‘tuples’; for these the return values are put in variables like retxy% and rethl%, which together form the return tuple (xy, hl).

The idea behind this convention is mostly to avoid introducing too many global variables, as each of them costs memory and they all fill up the single global namespace. With this approach there are only a few variables, shared by possibly many functions. Of course, this also means the names are not very descriptive, and you have to take care to assign the proper value to the argument variable before ‘calling’ the function, and to read the return value from the right variable afterwards.

An example of a ‘function’ call:

14130 arg1$=stmt$(3):gosub 20000 ' call getR(arg1$)
14131 opc1%=112 or ret%:return

In line 14130, arg1$ is set to the contents of stmt$(3), then the function is ‘called’ (gosub 20000).
In the next line, the return value ret% is used to construct an opcode.
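The same convention translated into a Python sketch: a dict of shared ‘registers’ replaces the global variables, and a subroutine takes no parameters at all. The register codes are an assumption (the standard Z80 encoding), and since 112 (&70) is the base of the Z80 LD (HL),r group, I assume lines 14130-14131 handle that form; ld_ind_hl_r() is a hypothetical name.

```python
# Shared globals standing in for Basic's arg1$ and ret% (all "functions"
# read their arguments from, and write their results to, these slots).
G = {"arg1$": "", "ret%": 0}

# Assumption: ra%, rb%, ... hold the standard Z80 3-bit register codes
# (6 is reserved for (hl), so it is not in this table).
R_CODES = {"b": 0, "c": 1, "d": 2, "e": 3, "h": 4, "l": 5, "a": 7}

def sub_get_r():
    """20000 ' getR(arg1$) -> ret% : no parameters, only globals."""
    G["ret%"] = R_CODES.get(G["arg1$"], -1)

def ld_ind_hl_r(operand):
    """Mirrors lines 14130-14131."""
    G["arg1$"] = operand        # arg1$=stmt$(3)
    sub_get_r()                 # gosub 20000
    return 112 | G["ret%"]      # opc1%=112 or ret%

print(hex(ld_ind_hl_r("a")))    # 0x77, i.e. ld (hl),a
```

An unknown register makes getR return -1, and 112 OR -1 is still -1 (in Python as in 16-bit Basic), so a caller has to check ret% before using it as an opcode.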

Functions that call other functions themselves (nested calls) need to save their own argument values before assigning a new value to an argument variable for the nested call.
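A Python sketch of that save-before-nesting rule, using the same shared-globals idea. getPair is a hypothetical function (it returns the codes of both halves of a register pair as a tuple, in the retxy%/rethl% style), and the register codes are assumed to be the standard Z80 ones.

```python
G = {"arg1$": "", "ret%": 0, "retxy%": 0, "rethl%": 0}
R_CODES = {"b": 0, "c": 1, "d": 2, "e": 3, "h": 4, "l": 5, "a": 7}  # assumed

def sub_get_r():
    """getR(arg1$) -> ret%"""
    G["ret%"] = R_CODES.get(G["arg1$"], -1)

def sub_get_pair():
    """Hypothetical getPair(arg1$) -> (retxy%, rethl%): codes of a pair's halves.

    Both nested calls need arg1$ themselves, so our own argument is saved
    first. Without the copy, reading arg1$ again would give "h", not "hl".
    """
    own = G["arg1$"]            # save our argument before reusing arg1$
    G["arg1$"] = own[0]         # nested call 1: high register
    sub_get_r()
    G["retxy%"] = G["ret%"]
    G["arg1$"] = own[1]         # nested call 2: low register, from the saved copy
    sub_get_r()
    G["rethl%"] = G["ret%"]
    G["arg1$"] = own            # restore, so the caller's arg1$ is unchanged

G["arg1$"] = "hl"
sub_get_pair()
print(G["retxy%"], G["rethl%"])   # 4 5
```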

All in all, this is clearly not a perfect solution, but for the complexity of the BASM code it seems to work pretty well.

Funnily enough, the code resembles assembly code a bit, where registers are often used to pass arguments to ‘functions’.

Use of goto

Even though I tried to use the ‘function’ convention as much as possible, it is really hard in Basic to write a full program without also using ‘goto’s. I guess it is possible to do without, but this would often make the code very verbose and, in the end, harder to read than simply using ‘goto’s.

The following example shows where I use goto statements:

14001 if FNisR(stmt$(2)) and FNisR(stmt$(3)) then goto 14036 'RR
14036 arg1$=stmt$(2):gosub 20000:oper1%=ret%
14037 arg1$=stmt$(3):gosub 20000:oper2%=ret%
14038 opc1%=64 OR FNfoct(oper1%) or oper2%:return

In this fragment, line 14001 first uses the DEF FN function FNisR(…) to test whether operands 1 and 2 (stmt$(2) and stmt$(3)) are both registers and, if so, it goto’s line 14036.
On line 14036 the ‘function’ getR(arg1$) is ‘called’ (gosub 20000) to get the register value for operand 1.

In line 14037 the same is done for operand 2.

Line 14038 constructs the opcode from the base code 64 and the two register codes (I’ll go into more detail on how these opcodes are structured later). This line also has a ‘return’. This might seem strange, as 14001 used goto 14036 to get here. However, this entire example is actually part of another ‘function’ that is called on line:

11180 if stmt$(1)="ld" then gosub 14000

And that is where the ‘return’ on line 14038 sends the program flow back to: the statement following ‘gosub 14000’.
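As a rough preview of the opcode construction in line 14038: the Z80 encodes LD r,r' as the bit pattern 01dddsss, so 64 (&40) supplies the top bits, the destination register code goes into bits 5-3 and the source code into bits 2-0. FNfoct is not shown in this post, so the Python sketch below assumes it shifts a code left by three bits (multiplies it by 8); the register codes are the standard Z80 ones, which is also an assumption about ra%, rb% and so on.

```python
R_CODES = {"b": 0, "c": 1, "d": 2, "e": 3, "h": 4, "l": 5, "a": 7}  # assumed

def foct(code):
    """Assumed behaviour of FNfoct: move a 3-bit code into bits 5-3."""
    return code << 3

def ld_r_r(dst, src):
    """opc1% = 64 OR FNfoct(oper1%) OR oper2% for 'ld dst,src'."""
    return 64 | foct(R_CODES[dst]) | R_CODES[src]

for dst, src in (("a", "b"), ("b", "c"), ("h", "l")):
    print(f"ld {dst},{src} -> &{ld_r_r(dst, src):02X}")
# ld a,b -> &78, ld b,c -> &41, ld h,l -> &65
```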

The reason for using goto in this example is mostly that an entire IF THEN ELSE has to be on a single line. If this were not required, I could simply put lines 14036-14038 in the code block of the THEN.

This is a typical situation where Basic requires you to be pragmatic and make the best of it.

Another use of goto in BASM is for ‘exception handling’ (though that’s a rather big word for what’s happening in BASM).

Whenever the ‘lexer’/‘parser’ of BASM encounters an error in the provided assembler code, it ‘goto’s line 20100:

13112 if stmt$(3)="" then goto 20100

When this occurs all processing is halted and an error message is printed.

20100 ' ill inst error
20110 if stmt$(0)="" then label$="" else label$=stmt$(0)+":"
20120 print "Illegal instruction:"
20130 print using "#### \ \ & &,&";l;label$;stmt$(1);stmt$(2);stmt$(3)
20140 end

There is no returning from this (see line 20140). So in this case a goto is also suitable.
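In a language with exceptions, the same ‘jump to the error handler and never come back’ pattern is usually written with raise. The Python sketch below is only an analogy: IllegalInstruction, check_operands(), error_message() and assemble() are hypothetical names, a statement is modelled as the list stmt$(0..3) (label, mnemonic, operand 1, operand 2), and the message only approximates the PRINT USING output without reproducing its exact field widths.

```python
class IllegalInstruction(Exception):
    """Raised where BASM does 'goto 20100'."""

def check_operands(stmt):
    # like 13112: if stmt$(3)="" then goto 20100
    if stmt[3] == "":
        raise IllegalInstruction()

def error_message(line_no, stmt):
    # like 20110-20130: the label gets a ':' appended when present
    label = stmt[0] + ":" if stmt[0] else ""
    return f"Illegal instruction:\n{line_no:4d} {label} {stmt[1]} {stmt[2]},{stmt[3]}"

def assemble(program):
    """program: list of (line_no, [label, mnemonic, op1, op2])."""
    for line_no, stmt in program:
        try:
            check_operands(stmt)
        except IllegalInstruction:
            # BASM prints and then ENDs: nothing after this point runs
            return error_message(line_no, stmt)
    return "ok"

print(assemble([(10, ["", "ld", "a", "b"]), (20, ["loop", "ld", "a", ""])]))
```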

This is more or less all there is to the coding conventions I used for ‘program flow’ and structured programming.

In a following blog post (or several) in this series, I’ll go into more detail on the ‘lexing’ and ‘parsing’ of the assembler code and how opcodes are generated.