by: Matthew Hagerty
A computer basically has three main parts that are required for it to do something useful:
1. A CPU to execute instructions and manipulate data
2. Memory to store programs and data
3. Some form of input and output so the program can actually do something useful
Understanding the interaction between the CPU and memory is important to being proficient in any programming language, but is absolutely critical when using assembly language.
Memory can be further broken down into “volatile” and “non-volatile”, which simply refers to what happens to the data in the memory when electrical power is removed. Volatile memory will lose, or forget, all its data when power is lost. Volatile memory is typically the RAM (random access memory) in the computer and physically consists of integrated circuit chips. Examples of non-volatile memory are floppy disks, hard drives, and CD and DVD ROMs. For this discussion we are talking about the RAM in a machine without regard to virtual memory systems.
There is also another type of memory called ROM (read only memory) that is similar to RAM in that it is physically made of integrated circuit chips. ROM, however, is non-volatile, so it remembers its program instructions and data when power is removed. There is a catch though: the program instructions and data in the ROM cannot be changed, hence the name “read only memory”. The program instructions and data in a ROM are put there during manufacturing and have a special purpose in a computer that we will get to in a moment.
Memory is always measured in bytes (except in very esoteric computer systems which we are not talking about here) and is generally accessed by the CPU in “chunks” whose size depends on the bus architecture of the CPU. For example, the popular 6502 and Z80 processors are 8-bit CPUs and access memory one byte at a time (typically memory cannot be accessed in chunks smaller than a single byte). The 99/4A computer uses the TMS9900, which is a 16-bit CPU, so it always accesses memory in two-byte chunks. The more modern 32-bit and 64-bit CPUs access memory in four-byte and eight-byte chunks respectively, and have advanced features like memory managers, level-1 and level-2 caches, etc. We are not considering these advanced features in this discussion, although understanding the virtual memory and cache lines of modern CPUs is very important when programming on those kinds of systems.
To understand memory, the easiest analogy is probably that of a mailbox. Each memory “location” has an address, just like a house has an address. At a given address is a storage location (mailbox) in which a single byte of data can be stored. That’s it. The number of available memory locations depends on the design of the computer and the CPU.
To communicate with the outside world and memory in particular, the CPU has two primary interfaces called the “address bus” and the “data bus”. A “bus” is physically made up of a series of electrical contacts on the CPU itself, and depending on the CPU each bus will consist of a certain number of these contacts. In CPU parlance the terms “8-bit”, “16-bit”, “32-bit”, etc. refer to the physical size of the CPU’s data bus, which as you may recall from the previous discussion, directly affects how many bytes the CPU can read or write at one time. The size of the address bus affects how many memory locations a CPU can directly address. As an example, the Z80 CPU has an 8-bit data bus and a 16-bit address bus. Thus the Z80 can read and write one byte of data to any one of 65,536 (2^16) or 64K memory locations. The TMS9900 CPU has a 16-bit data and address bus, so it can access the same number of memory locations as the Z80, but the TMS9900 can access the data two bytes at a time.
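The relationship between bus widths and what a CPU can reach is simple arithmetic. The following is a minimal sketch of it; the function names are made up for this illustration and the bus widths are the ones quoted above.

```python
def addressable_locations(address_bits: int) -> int:
    """Number of distinct addresses an address bus of this width can select."""
    return 2 ** address_bits


def bytes_per_access(data_bits: int) -> int:
    """Number of bytes moved in one transfer over a data bus of this width."""
    return data_bits // 8


# Z80 as described above: 8-bit data bus, 16-bit address bus.
print(addressable_locations(16))          # 65536
print(addressable_locations(16) // 1024)  # 64 (that is, 64K)
print(bytes_per_access(8))                # 1

# A 16-bit data bus moves two bytes per transfer.
print(bytes_per_access(16))               # 2
```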
Another term that is important to understand when talking about CPUs is “word”. A byte has a universal size of 8 bits. A “word”, however, is typically related to the size of the CPU’s data bus. So a Z80’s “word” size is 8 bits, but a TMS9900 or Intel 8086 “word” is 16 bits. On modern CPUs like the Intel and AMD 32-bit x86 processors, the natural “word” size of the CPU is 32 bits. The names do not always follow, however: x86 documentation kept the definitions from the original 8088 and 8086 CPUs, so on every x86 processor a “word” (WORD) is still 16 bits, a “double word” (DWORD) is 32 bits, and a “quad word” (QWORD) is 64 bits. The point to remember is, when you are reading and see the term “word” in reference to a CPU, the actual size of the “word” of memory will depend on the CPU to which the term is being applied.
Because the “word” size of the TMS9900 CPU is 16 bits, the amount of memory directly addressable by the CPU is sometimes written as 32,768 “words”. That is correct, but it can be confusing since it looks like the TMS9900 can only access half as much data as a Z80 or 6502 CPU, which have 8-bit “word” sizes.
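The two ways of quoting the same memory size can be checked with a short calculation. This is only an illustrative sketch; the function name is hypothetical.

```python
def capacity_in_words(total_bytes: int, word_bits: int) -> int:
    """Express a memory size given in bytes as a count of CPU words."""
    return total_bytes // (word_bits // 8)


TOTAL_BYTES = 2 ** 16  # 65,536 bytes of directly addressable memory

print(capacity_in_words(TOTAL_BYTES, 8))   # 65536 words of 8 bits
print(capacity_in_words(TOTAL_BYTES, 16))  # 32768 words of 16 bits

# Both figures describe exactly the same number of bytes.
print(32768 * 2 == 65536 * 1)              # True
```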
When the CPU needs a piece of data, it places the address of the desired data on the address bus and informs the memory subsystem it wants to read the data at the specified address. The memory chips responsible for the specified address retrieve the data, place the data on the data bus, and then tell the CPU the data is ready to be read.
Writing to memory is similar to reading. The CPU places the address of where to store the data on the address bus, and it places the actual data to be stored on the data bus. The CPU then informs the memory subsystem to store the data at the specified address. Once complete, the memory subsystem informs the CPU that the data was successfully stored.
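The read and write sequences above, together with the mailbox analogy, can be modeled in a few lines. This is a rough software sketch, not a description of real bus signals; the `Memory` class and its method names are invented for the example.

```python
class Memory:
    """A row of one-byte 'mailboxes', each selected by its address."""

    def __init__(self, address_bits: int = 16):
        self.size = 2 ** address_bits
        self.cells = bytearray(self.size)  # every location starts at 0

    def _check(self, address: int) -> None:
        if not 0 <= address < self.size:
            raise IndexError(f"address {address:#x} is outside the address space")

    def read(self, address: int) -> int:
        """The CPU supplies an address; memory returns the byte stored there."""
        self._check(address)
        return self.cells[address]

    def write(self, address: int, value: int) -> None:
        """The CPU supplies an address and a byte; memory stores the byte."""
        self._check(address)
        if not 0 <= value <= 0xFF:
            raise ValueError("a single location holds exactly one byte")
        self.cells[address] = value


ram = Memory()
ram.write(0x1234, 0x42)
print(hex(ram.read(0x1234)))  # 0x42
```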
Internal to a CPU are several “registers” that help make all this memory access possible. As a programmer you need to understand these registers and how to use them. The CPU registers are typically measured in bits and are usually the same size as the CPU’s data or address bus. Every CPU is going to have at least two registers:
1. The Program Counter (PC)
2. The Status Register (ST)
The program counter is a special CPU controlled register that always points to the address in memory where the next instruction will be read. Since this register is holding a memory address, it is usually the same size as the CPU’s address bus. On the Z80, even though it is an 8-bit CPU (based on its data bus), the program counter is 16 bits since the Z80 can address 65,536 (64K) bytes of memory. The status register is another CPU controlled register that will be changed based on certain events that happen in the CPU. The status register’s size will depend on the CPU design and is “bit mapped”, meaning each bit position in the register has a specific meaning depending on whether the bit is set to 1 or 0. For example, after a subtraction operation, there is usually a “carry bit” in the status register that will be 1 or 0 depending on whether the subtraction caused a carry (or borrow). The status register bits are usually referred to as “flags”, and if a bit is 1 it is called “set”, and if a bit is 0 it is called “reset”. Every instruction the CPU can execute may affect the status register flags in certain ways, and your program can make decisions based on these flags.
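To make the idea of a bit-mapped status register concrete, here is a small sketch of an 8-bit subtraction that updates two flags. The bit positions and the rule that the carry flag is set when a borrow occurs are assumptions chosen for this example; real CPUs differ in both layout and convention.

```python
# Hypothetical flag layout, not taken from any particular CPU.
CARRY_FLAG = 0b0000_0001
ZERO_FLAG = 0b0000_0010


def subtract_8bit(a: int, b: int) -> tuple[int, int]:
    """Return (result, status) for an 8-bit a - b.

    Assumed convention: CARRY_FLAG is set when the subtraction needed
    a borrow (a < b), ZERO_FLAG is set when the result is zero.
    """
    result = (a - b) & 0xFF  # keep only the low 8 bits
    status = 0
    if a < b:
        status |= CARRY_FLAG
    if result == 0:
        status |= ZERO_FLAG
    return result, status


result, status = subtract_8bit(5, 7)
print(result)                     # 254
print(bool(status & CARRY_FLAG))  # True
print(bool(status & ZERO_FLAG))   # False

# A program "makes a decision" by testing a flag:
result, status = subtract_8bit(7, 7)
if status & ZERO_FLAG:
    print("the two values were equal")
```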
A CPU will generally have other internal registers that can be used by the programmer for various tasks. Some of the registers might have specific uses, for example an “accumulator” register used during mathematical operations, or an index register that can be used to help access bytes in memory based on an offset. All CPUs have different registers and you will need to know what those registers are and if any of them have specific or special purposes. In the TMS9900 CPU there are 16 “general purpose” registers available to the programmer, meaning none of them have any special uses (except for register zero, but only with certain instructions). Most CPUs of that time only had about 4 to 8 registers and each had a special use, which makes the 9900 very flexible and easier to program in comparison. However, unlike most other CPUs, the 9900’s registers are not built into the CPU itself! Most CPUs’ registers are “hardware” registers that reside inside the CPU chip itself, which makes accessing the data in a register very fast. The 16 general purpose registers of the 9900 CPU are actually stored in the computer’s RAM. The CPU has a special hardware register called the “workspace pointer” (WP) that holds the memory address of where the register memory starts in the computer’s RAM. This design has the unfortunate side effect of making register access slower on the 9900 CPU when compared to other CPUs. There is an “up side” to this design, however: it makes a context switch very efficient, which is important in a multi-tasking system, the kind of system the 9900 CPU was designed for. Unfortunately the 99/4A is a single tasking computer and this feature of the CPU is not needed, so instead of a benefit it becomes a slow down.
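Because the 9900’s registers live in RAM, each register is just a memory word at a fixed offset from the address held in the CPU’s workspace hardware register (TI calls it the workspace pointer, WP): register n occupies the two bytes starting at WP + 2n. The sketch below shows that calculation; the function name is hypothetical and the workspace addresses are example values only.

```python
def register_address(workspace_pointer: int, register_number: int) -> int:
    """Memory address of TMS9900 register Rn for a given workspace start."""
    if not 0 <= register_number <= 15:
        raise ValueError("the 9900 has registers R0 through R15")
    # Each register is one 16-bit word, so registers are 2 bytes apart.
    return (workspace_pointer + 2 * register_number) & 0xFFFF


wp = 0x8300  # example workspace start, chosen for illustration
print(hex(register_address(wp, 0)))   # 0x8300
print(hex(register_address(wp, 1)))   # 0x8302
print(hex(register_address(wp, 15)))  # 0x831e

# A context switch only needs a new workspace start: the same register
# number now refers to a different block of 32 bytes in RAM.
other_wp = 0x8320
print(hex(register_address(other_wp, 1)))  # 0x8322
```

This also shows why a context switch is cheap on this design: nothing has to be copied, since changing one address swaps all 16 registers at once.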
When a CPU is powered on, the first thing it must do is begin executing an instruction. Every CPU has what is known as a “power on” or “start up” address which is where the CPU will look for the first instruction to begin executing. So when power is applied to the CPU, it resets itself and loads the startup address into its program counter which causes a memory read operation to that memory address. Thus, there had better be a program instruction in memory at that startup address or the CPU will have nothing to do and the computer will not work.
This is where the ROM that we talked about earlier comes into play. Since RAM will be empty when the computer is turned on (remember that RAM is volatile memory), it cannot be used to get the computer started. A ROM chip is always going to be located at the CPU’s startup address and will contain instructions and data that will get the computer up and running. This process is called a “cold boot”.
In older computers the boot ROM would usually be about 2K (2048 bytes) and contain enough code to allow the computer to recognize other parts of the system like disk drives and cartridges, put something on the screen, and get the computer to a state where the user can begin using it. The classic computers from the mid 1980’s also typically had a BASIC language interpreter in a ROM chip, so when you turned the computer on, if you didn’t have a game cartridge inserted or a disk to load, you could at least write programs in BASIC, and that was a very wonderful thing.
Having those ROM chips does have a downside, however. Since the CPU can only address a fixed number of memory locations, and since the contents of a ROM chip cannot be changed, the ROMs eat up some of the memory that would be available for other programs and data. Where the ROMs are (meaning what memory addresses they respond to) and how big they are (2K, 4K, 8K, etc.) will totally depend on the computer. The location of the ROM, RAM, and other memory devices in a computer make up what is called the computer’s “memory map”, and as a programmer it is important that you know where in memory these devices are.
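A memory map can be pictured as a table of address ranges and the device that answers in each range. The map below is entirely made up for illustration (it is not the 99/4A’s map or that of any real machine); it simply shows how ROMs reduce the RAM available in a 64K address space.

```python
# Hypothetical memory map: (first address, last address, device).
MEMORY_MAP = [
    (0x0000, 0x07FF, "boot ROM"),   # 2K
    (0x0800, 0x27FF, "BASIC ROM"),  # 8K
    (0x2800, 0xFFFF, "RAM"),        # whatever is left
]


def device_at(address: int) -> str:
    """Return the name of the device that responds to an address."""
    for first, last, name in MEMORY_MAP:
        if first <= address <= last:
            return name
    raise ValueError(f"nothing is mapped at {address:#06x}")


def bytes_of(device: str) -> int:
    """Total number of addresses assigned to the named device."""
    return sum(last - first + 1
               for first, last, name in MEMORY_MAP if name == device)


print(device_at(0x0000))  # boot ROM
print(device_at(0x3000))  # RAM
print(bytes_of("RAM"))    # 55296, out of 65536 addressable bytes
```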
The RAM in a computer is where the instructions that make up your program code and the data your program needs to work with will be stored while the computer is running your program. Your “data” includes anything your program needs to do its job, for example in a game you might have the player’s screen coordinates, the current score, locations of “bad guys”, an array of bytes that represent the play field, etc. The “art of programming” is being able to understand a problem and come up with a solution in the form of data structures and program instructions to manipulate that data. For example, in a game the problem might be stated as “shoot all the rocks”. One solution might require data structures in the form of a list to track the location of all the rocks and how much damage each rock has received. You also need to know where the player is, where the player’s bullets are, etc. All this information will be stored in the computer’s RAM and manipulated by the instructions that you write.
Memory Access Specific to the TMS9900 CPU in the 99/4A Computer
The 9900 CPU has some quirks that are very important to understand. Because it is a 16-bit CPU, its instructions are built from 16-bit words, which means every assembly instruction in your program will require at least two bytes of memory. Therefore, the program counter will always contain an even address.
The 9900 has word (16-bit) and byte oriented instructions, but even the byte instructions always access memory 16 bits at a time. When a program on the 9900 references a memory location at an odd address, the CPU will request the memory location from the even address associated with the odd address. For example, to retrieve the byte at memory location 3, the 9900 will actually place memory address 2 on the address bus and will receive the two bytes at memory locations 2 and 3.
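The odd-address example can be sketched as follows. The function name is made up; the model assumes the 9900’s byte order, in which the byte at the even address is the high (most significant) byte of the word.

```python
def read_byte(memory: bytes, address: int) -> tuple[int, int]:
    """Model a 9900-style byte read.

    Returns (bus_address, value): the even address actually placed on the
    address bus, and the byte the program asked for.
    """
    bus_address = address & 0xFFFE  # clear the lowest bit: always even
    # The full 16-bit word arrives, even though only one byte is wanted.
    word = (memory[bus_address] << 8) | memory[bus_address + 1]
    if address & 1:
        value = word & 0xFF         # odd address: low byte of the word
    else:
        value = word >> 8           # even address: high byte of the word
    return bus_address, value


memory = bytes([0x10, 0x11, 0x22, 0x33])  # locations 0, 1, 2, 3

print(read_byte(memory, 3))  # (2, 51)  -> address 2 on the bus, value 0x33
print(read_byte(memory, 2))  # (2, 34)  -> address 2 on the bus, value 0x22
```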
Because of its 16-bit word orientation, and because it provides byte-oriented instructions, the 9900 CPU has a side effect that causes it to do twice as many memory operations as other CPU’s. For example, suppose we want to write to memory location 3. Since the CPU cannot write directly to an odd memory location, it must write to both memory locations 2 and 3. But doing so would destroy the contents of memory location 2 because in this example we only want to write a single byte to memory location 3. To solve this problem the CPU does a read-before-write. So, first it will read both memory locations 2 and 3, then replace the byte going to memory location 3, and finally it will write back both bytes. Also, while this read-before-write is not necessary when accessing 16-bit values (two bytes at an even address), the CPU still performs the read-before-write because internally it uses the same micro-code for all memory operations (both word and byte oriented).
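The read-before-write sequence for a single byte can be modeled like this. It is a simplified sketch with a hypothetical function name, assuming the byte at the even address is the high byte of the word; it counts memory operations only and ignores timing.

```python
def write_byte(memory: bytearray, address: int, value: int) -> int:
    """Model a 9900-style byte write; returns the number of memory operations."""
    even = address & 0xFFFE

    # Operation 1: read the whole 16-bit word containing the target byte.
    word = (memory[even] << 8) | memory[even + 1]

    # Replace only the byte being written, keeping its neighbor intact.
    if address & 1:
        word = (word & 0xFF00) | (value & 0xFF)         # odd: low byte
    else:
        word = (word & 0x00FF) | ((value & 0xFF) << 8)  # even: high byte

    # Operation 2: write the whole 16-bit word back.
    memory[even] = word >> 8
    memory[even + 1] = word & 0xFF
    return 2


memory = bytearray([0x00, 0x00, 0xAA, 0xBB])  # locations 0, 1, 2, 3
operations = write_byte(memory, 3, 0x55)

print(memory.hex())  # 0000aa55  (location 2 is untouched)
print(operations)    # 2
```

An 8-bit CPU storing one byte would need a single write here, which is the doubling of memory operations described above.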
Even though the 9900 is a 16-bit CPU and it is typically running at 3MHz, because of the read-before-write it performs for all memory operations, 8-bit CPUs of the day like the Z80 and 6502 running at equal (or sometimes slower) speeds could usually outperform the 9900.
The 9900 CPU as configured in the 99/4A computer also has another disadvantage. The original design of the 99/4A called for the TMS9985, a CPU with an 8-bit data bus that was not ready in time, so the 16-bit TMS9900 was shoe-horned into the machine. The 9985 was to have 256 bytes of “workspace” RAM inside the CPU, which had to be replicated in external RAM for the 9900. So, the 99/4A has 256 bytes of RAM on the 9900’s 16-bit bus, but because of the expense of memory chips at the time, and because 8-bit devices were cheaper, the other RAM in the system is 8-bit memory. Since the 9900 is a 16-bit CPU, everything other than the 256 bytes of “scratchpad” RAM and the console ROMs is accessed via a 16-to-8 bit multiplexer that adds 4 “wait states” to every memory operation. This is significantly compounded by the read-before-write nature of the 9900 CPU and makes the machine very slow despite its 16-bit nature.
It is important to understand these nuances because simply knowing how a CPU operates is not enough. You have to know and understand the system within which the CPU is placed. For example, if you were trying to write a speed critical loop, it is very important to understand that in the 99/4A all memory access causes wait states except when accessing the limited 256 bytes of workspace RAM (or the console ROMs). This also highlights an advantage of using assembly language over a higher level language. In a high level language you generally do not get to choose where your program code or data is placed, since the language itself performs all memory management.