Computer Architecture / Organization -- Modern Processor Architecture [part 1]


1. Introduction

The objective of this Section is not to provide the reader with sufficient knowledge to author a compiler code generator or design a hardware system.

Rather, it is intended to illustrate ideas conveyed throughout the text as a whole and to demonstrate that they really do appear in real commercial devices. Nevertheless, a clear overview of the major features of each example should result, and references are provided.

It is part of the philosophy of this guide not to render it dependent on any single commercial design but to concentrate attention on those concepts which are fundamental. The reader should bear this in mind. Lastly, each of the Sections below is intended to be self-contained and self-sufficient in order to allow the possibility of consideration of any one system alone. As a result some material is necessarily repeated. However, the reader is strongly advised to read all three.

Much may be learned from the comparison of the three machines.

Note: The NS32000 is considered as a series of processors whereas only the M68000 itself is presented, and not its successors…M68010, M68020 etc. This is justified on two counts. First, all NS32000 series processors share the same programmer's architecture and machine language. This is not true of the successors to the M68000. Secondly, it seemed desirable to first consider a simpler architecture without "enhancements" and "extensions".

2. Motorola 68000

2.1 Architecture

Design philosophy

The Motorola 68000 was a direct development of an earlier 8-bit microprocessor (the 6800) and is fabricated in a self-contained integrated device using VLSI electronic technology. It has succeeded in both the work-station and real-time control markets.

From a software engineering point of view it satisfies the following requirements…

• Reduction of semantic gap

• Structured (procedural) programming support

• Security for multi-user operating system

The M68000 fully qualifies as a complex instruction set computer (CISC). A large instruction set and range of "powerful" addressing modes seek to reduce the semantic gap between a statement in a high level language and one, of equal meaning, in machine language. Several instructions are included in order to allow compact, fast execution of programming constructs and procedure invocation, entry and return. Complex addressing modes similarly support the efficient referencing of both dynamic and static, structured or elemental, data. It should be noted that the ease with which this support may be utilized when machine code is automatically generated by a compiler is a separate issue outside the scope of this guide.

The M68000 also provides support for secure operation of an operating system.

Two operating modes are possible…user and supervisor. The current mode is always indicated by a flag in the processor state register. Certain instructions are privileged to the supervisor to protect first the operating system and secondly users from each other. Privilege violation causes a trap which can be dealt with by the operating system. A trap instruction allows invocation of operating system procedures. A single operand defines an offset into a table of sixteen vectors to operating system procedures.

There follows a list of the most important features of M68000 architecture…

• Complex instruction set (CISC)

• One or two address instructions

• Register file for expression evaluation (no windowing)

• Stack for procedure, function and interrupt service subroutine implementation

• Vectored interrupt mechanism

Instruction opcodes may require one or two operands. Most instructions executed operate on two operands. Operands may be of one, two or four bytes in length.

Vectored interrupts are prioritized and may be masked to inhibit those below a certain priority specified within the processor state register (PSR).

[1. It is one of the criticisms of the CISC approach that compilers cannot easily optimize code on a CISC architecture because of the extent of choice available. See [Patterson & Ditzel 80], [Tabak 87].

2. A trap is a software generated processor interrupt or exception. ]

Programmer's architecture

FIG. 1 shows the programmer's architecture of the M68000. Two register files are provided, one for addresses and one for data. a7 is in fact two registers each of which is used as a stack pointer (SP). Two stacks are maintained, one for supervisor mode, which is used for interrupt service routines, and one for user mode, which is used for subroutines. Any of the remaining address registers may be used as a frame pointer, from which to reference local variables within procedures, and as a static base pointer, from which to reference global variables.

TBL. 1: Flags in the M68000 processor state register and their meaning (when set)

FIG. 1: M68000 programmer's architecture

On reset the processor boots itself using code located at an address found in the interrupt despatch table (FIG. 2). The supervisor stack pointer is initialized with an address found below the reset vector at the bottom of the interrupt despatch table and hence at the bottom of memory at address zero.

The boot code must initialize the user stack and interrupt despatch table and then load (if necessary) and enter the remainder of the operating system kernel.

FIG. 3 depicts the much simplified form of the memory map.

FIG. 4 shows the PSR. The processor state recorded supports both cardinal and twos-complement signed integer arithmetic operations, and also holds the current interrupt enable, trace enable and supervisor/user mode. The lower byte is called the condition code register (CCR), and may be universally accessed. The upper byte however is reserved for privileged access by the supervisor only. TBL. 1 summarizes the condition code flags and their meaning.
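The split between the privileged upper byte and the universally accessible CCR can be illustrated by unpacking a 16-bit state word. The following is a minimal sketch only: the bit positions are assumed to be the usual M68000 layout (trace at bit 15, supervisor at bit 13, interrupt mask in bits 10 to 8, and X, N, Z, V, C in bits 4 to 0), and the name decode_sr is hypothetical.

```python
def decode_sr(sr):
    """Split a 16-bit M68000 state word into named fields (illustrative only)."""
    return {
        # Upper byte: supervisor access only
        "trace": bool(sr & 0x8000),
        "supervisor": bool(sr & 0x2000),
        "int_mask": (sr >> 8) & 0x7,   # 3-bit interrupt priority mask
        # Lower byte: the condition code register (CCR)
        "X": bool(sr & 0x10),
        "N": bool(sr & 0x08),
        "Z": bool(sr & 0x04),
        "V": bool(sr & 0x02),
        "C": bool(sr & 0x01),
    }

flags = decode_sr(0x2704)   # supervisor mode, mask 7, zero flag set
```

Under these assumptions the value 0x2704 decodes as supervisor mode with all maskable interrupts inhibited and only the Z flag set.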

[3. In M68000 terminology, "word" refers to two consecutive bytes and "long" refers to four.]

FIG. 2: M68000 interrupt despatch table

Addressing modes

TBL. 2 summarizes the addressing modes available on the M68000 together with their assembly language notation and effective address computation. Note that "[…]", in the effective address column, should be read as "contents of…".

An addressing mode is specified within the basic instruction in a 6-bit field.

This is divided into two sub-fields…mode and reg. The latter may be used either to specify a register number or to qualify the mode itself. For example, both absolute and immediate modes share the same value (111₂) in the mode field.

The reg field dictates how the necessary instruction extension shall be interpreted. In the case of data register direct mode (000₂) reg contains the number of the data register to be accessed. Address register indirect is encoded similarly (mode=010₂).
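The way reg either names a register or qualifies the mode can be sketched as a small decoder. This is an illustration under assumptions, not a complete decoder: mode is taken to occupy the upper three bits of the 6-bit field, the tables follow the standard M68000 encoding, and the names MODES, QUALIFIED and decode_ea are hypothetical.

```python
MODES = {
    0b000: "data register direct",
    0b001: "address register direct",
    0b010: "address register indirect",
    0b011: "address register indirect with post-increment",
    0b100: "address register indirect with pre-decrement",
    0b101: "address register indirect with displacement",
    0b110: "address register indirect with index",
}

# When mode is 111 the reg sub-field qualifies the mode instead of naming a register.
QUALIFIED = {
    0b000: "absolute short",
    0b001: "absolute long",
    0b010: "PC relative with displacement",
    0b011: "PC relative with index",
    0b100: "immediate",
}

def decode_ea(field):
    """Return (mode name, register number or None) for a 6-bit field."""
    mode, reg = (field >> 3) & 0b111, field & 0b111
    if mode == 0b111:
        return QUALIFIED.get(reg, "reserved"), None
    return MODES[mode], reg
```

For example, decode_ea(0b000011) names data register 3, whereas decode_ea(0b111100) yields immediate mode with no register at all.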

Indexing and displacement modifiers are permitted to allow referencing elemental data via an offset from a pointer, and elements within an array using a variable index stored in an address register (for efficient modification).

Pre-decrement and post-increment addressing modes render easier the maintenance and use of stacks and queues. They allow a move instruction to implement push and pop stack operations as opposed to merely copying or inspecting an item on the top of stack.
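The difference between pushing, popping and merely inspecting the top of stack can be modelled in a few lines. This is a sketch under assumptions: memory is represented as a dictionary of word values keyed by address, the stack is taken to grow downwards, and the class and method names are hypothetical.

```python
class WordStack:
    """Models a stack addressed through -(SP) and (SP)+ (illustrative only)."""

    def __init__(self, top):
        self.mem = {}      # address -> 16-bit word
        self.sp = top      # stack pointer; the stack grows downwards

    def push(self, value):         # models move.w <value>, -(SP)
        self.sp -= 2               # pre-decrement, then store
        self.mem[self.sp] = value & 0xFFFF

    def pop(self):                 # models move.w (SP)+, <destination>
        value = self.mem[self.sp]  # fetch, then post-increment
        self.sp += 2
        return value

    def top(self):                 # models move.w (SP), <destination>
        return self.mem[self.sp]   # inspects only; SP is unchanged
```

A single move with the appropriate addressing mode thus both transfers the data and adjusts the pointer, which is what makes push and pop one-instruction operations.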

FIG. 3: M68000 memory map

Byte or word operations affect only the lower order fields within data registers.

Hence move.b d0,d1 will copy the contents of the least significant byte in d0 into that of d1. None of the upper three bytes in d1 will be affected in any way. It is as though they did not exist and the register was simply one byte long. The same applies to arithmetic and logical operations. In the case where memory is addressed, word or long word alignment is necessary. An attempt to access, say, a word at an odd address will result in failure and an address error trap.
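The effect of operand length on a destination data register can be illustrated as a masking operation. The sketch below is a model only; the names SIZE_MASK and move_to_dreg are hypothetical and condition flags are ignored.

```python
SIZE_MASK = {"b": 0xFF, "w": 0xFFFF, "l": 0xFFFFFFFF}

def move_to_dreg(size, src, dst):
    """Return the 32-bit destination register after move.<size> src,dst."""
    mask = SIZE_MASK[size]
    # Only the low-order field is replaced; higher bytes of dst are preserved.
    return (dst & ~mask & 0xFFFFFFFF) | (src & mask)
```

With src = 0x11223344 and dst = 0xAABBCCDD, a byte move leaves 0xAABBCC44 in the destination and a word move leaves 0xAABB3344.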

Immediate addressing includes the possibility of a short, "quick" operand stored within the instruction. Many constant operands in a typical code segment are small, e.g. loop index increments. A "quick" operand in an arithmetic instruction is just three bits long whereas for a move it is eight bits.
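A compiler must decide whether a constant fits a "quick" field. The check below is a sketch under assumptions: the three-bit arithmetic field is taken to represent the values 1 to 8, the eight-bit move field is taken to be sign-extended, and the function name fits_quick is hypothetical.

```python
def fits_quick(mnemonic, value):
    """True if value can be held in the instruction's own quick field."""
    if mnemonic in ("addq", "subq"):
        return 1 <= value <= 8          # 3-bit field (assumed to encode 1..8)
    if mnemonic == "moveq":
        return -128 <= value <= 127     # 8-bit field, assumed sign-extended
    return False                        # no quick form for this instruction
```

A loop index increment of one therefore costs no instruction extension at all.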

FIG. 4: M68000 processor state register

TBL. 2: M68000 addressing modes

TBL. 3: Instruction set of the M68000: Program control

TBL. 4: Instruction set of the M68000: Expression evaluation (continued in next Table)

TBL. 5: Instruction set of the M68000: Expression evaluation (continued from last Table)

TBL. 6: Conditional branching on the M68000

Instruction set

Tables 3, 4 and 5 summarize the M68000 instruction set. Where relevant, instruction variants are provided for operation on byte, word and long operands.

Program control is facilitated by a suite of instructions for condition evaluation and conditional branch. A compare multiple element (cmpm) instruction is included to allow simple, optimized implementation of array and string comparison. TBL. 6 shows all possible conditional branches, with the processor state flag on which they depend, and their meaning.

In the case of two-address instructions the first is referred to as the source and the second the destination. The result of the operation will be placed in the destination which usually must be a register. Where operand order is important, for example with subtraction or division, one must take care since ordering is right to left. Hence sub.w (a0),d0 means subtract the contents of the memory location whose address is in a0 from the contents of d0 into which register the result is to be placed. Similarly divu #4,d7 means divide the contents of d7 by four. Note that instructions for short "quick" operands exist for addition and subtraction but not for multiplication and division.

A load/store programming approach may be taken with the M68000 since…

• Immediate to register

• Memory to register

• Register to memory

…moves are efficiently supported and encouraged. Only move instructions allow memory to memory movement. All arithmetic and logical operations place their result in a data register which all but forces a load/store approach.

FIG. 5: M68000 basic instruction formats

A problem remaining with the M68000, although it represents a great improvement over earlier machines, is that the instruction set is not wholly symmetric with respect to addressing modes. Care must be taken to ensure that use of a selected addressing mode is permitted with a given instruction. This can cause complication for the compiler author.

The instruction format typically includes fields for…

• Opcode

• Operand length

• Addressing mode for each operand

Instruction encoding depends strongly on the instruction concerned. The opcode begins at the most significant bit in the operation word or basic instruction (FIG. 5). In the illustration, the field marked "opcode" includes a 2-bit sub field which encodes operand length. Shown are two common formats. Several other formats are possible including those for single direct register addressed operand and conditional branch instructions.

The following instruction extensions may be required depending on addressing mode…

• Index word

• Immediate value

• Displacement

• Absolute address

Zero, one or two extensions are allowed. FIG. 6 shows the instruction extension formats.

An excellent concise summary of the M68000 architecture, together with information required for hardware system integration, is to be found in [Kane 81]. A complete exposition which is suitable for a practical course on M68000 system design, integration and programming, is to be found in [Clements 87].

FIG. 6: M68000 instruction extension format

2.2 Organization

Processor organization

The organization of the M68000 is a fairly standard contemporary design consisting of a single ALU and a fully microcoded control unit. External communication must be fully memory-mapped since no explicit support exists (such as dedicated i/o instructions or control signals). Later derivatives, such as the M68020, possess an instruction cache and pipelining.

Physical memory organization

The system bus is non-multiplexed with address bus width of twenty-four bits giving a uniform, linear 16-megabyte memory map. The data bus is of width sixteen bits although byte access is permitted. A drawback of the design is the requirement of word alignment. Words and long words may only be accessed at even addresses.

Bus timing signals (FIG. 7) include AS (address strobe), which asserts that a valid address is available, and DTACK (data acknowledge), which asserts that valid data has been received or transmitted. If DTACK has not been asserted by the second half of the final clock cycle the start of the subsequent bus cycle will be delayed until it appears. Wait states (extra clock cycles) are then inserted into the bus cycle. One extra level signal is sent by the bus master for each byte of the data bus to indicate whether each is required or not. That the more significant data bus byte carries data from/to an even address, and the less significant byte data for an odd address, reflects the fact that data is stored with more significant bytes lower in memory [ Such an organization is sometimes referred to as "big-endian". ].

FIG. 7: M68000 bus timing for a read operation

2.3 Programming Constructs

Some closure of the semantic gap has been obtained by the designers both by careful selection of addressing modes for data referencing and by the inclusion of instructions which go far in implementing directly the commands of a high level language. Code generation is intended to produce fewer instructions.

However, the instructions themselves are more complex. In other words the problem of efficiently implementing selection and iteration is to be solved once and for all in microcode instead of code.

A "for" loop always requires signed addition of constant, compare and conditional branch operations on each iteration. This is a very common construct indeed. The constant is usually small, very frequently unity. The designers of the M68000 included a single instruction to this end (db<cond>, with condition set to false) optimizing its implementation in microcode once and for all. To take advantage of this instruction, a slight extra burden is thus placed upon the compiler to isolate loops with unity index decrements. There is more to db<cond> however. It checks a condition flag first, before decrementing the index and comparing it to -1. This may be used to implement loops which are terminated by either the success of a condition or an index decrementing to zero.

An example is that of repeatedly reading data elements into a buffer until either the buffer is full or an end of stream symbol is encountered. Unfortunately it is not always easy for a compiler to detect this kind of loop. There follows a code skeleton for each kind of loop discussed.

    move.b  ntimes, d<n>                    move.w  #len, d<m>
; loop start                            ; loop start
    …                                       move.w  input, d<n>
    dbf     d<n>, -<start offset>           …
                                            cmpi.w  #eos, d<n>
                                            dbeq    d<m>, -<start offset>
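The behaviour of db<cond> described above can be traced with a small model. It is a sketch only: the counter is assumed to be the low sixteen bits of the data register, and the names dbcc, count_iterations and read_until are hypothetical.

```python
def dbcc(condition, counter):
    """Model one execution of db<cond>; returns (new_counter, branch_taken)."""
    if condition:                        # condition succeeded: leave the loop
        return counter, False
    counter = (counter - 1) & 0xFFFF     # 16-bit decrement of the index
    return counter, counter != 0xFFFF    # 0xFFFF is -1: the count is exhausted

def count_iterations(n):
    """dbf loop: the condition is always false, so only the count ends it."""
    iterations = 0
    while True:
        iterations += 1                  # loop body
        n, taken = dbcc(False, n)
        if not taken:
            return iterations

def read_until(stream, eos, length):
    """dbeq loop: stop on the end of stream symbol or when the count runs out."""
    buffer, counter = [], length
    for symbol in stream:
        buffer.append(symbol)                           # loop body
        counter, taken = dbcc(symbol == eos, counter)   # compare, then dbeq
        if not taken:
            break
    return buffer
```

Note that, because the comparison is with -1 rather than zero, a counter loaded with n gives n+1 passes through the loop body in this model.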

Below are shown two alternative implementations of a case construct, each with its disadvantages. The result of the computation of the case expression is first moved into a data register where it may be efficiently manipulated. Comparisons are then performed in order to detect which offset to use with a branch. Each offset thus directs the "thread of control" to a code segment derived from the high level language statement associated with a particular case label. Each selectable code segment must end with a branch to the instruction following the case construct end. It is usual to place case code segments above the case instruction. The disadvantage of this implementation is that quite a lot of code must be generated and hence executed in order to complete the branch and exit from the code segment selected. Its advantage is that the code produced is relocatable without effort since it is position independent.

    move.w  result, d<n>                    move.w  result, d<n>
    cmp.w   #value 1, d<n>                  asl.w   #2, d<n>
    beq.s   <offset 1>                      move.l  5(PC, d<n>), a<m>
    cmp.w   #value 2, d<n>                  jmp     (a<m>)
    beq.s   <offset 2>                      <address 1>
    …                                       <address 2>
    bra.s   <else offset>                   <address 2>
                                            …

The method shown on the left is inefficient if the number of case labels is large (greater than about ten). However, for a small number it is more compact and hence usually preferred. In effect it is simply a translation of case into multiple if…then…else constructs at the machine level.

In the implementation on the right, known as the computed address method, a table of addresses is employed. Address computation is effected, prior to indirection, by use of PC relative with index and displacement addressing. The offset into the table must first be computed from the case expression value by shifting left twice (each address is four bytes long). The computed address method requires the compiler to generate the case label bounds together with code to verify that the case expression value falls within them. If it fails to do so an offset should be used which points to a code segment generated from the "else" clause in the construct. All table entries not corresponding to case label values should also point to the else code. The disadvantage here is that the table generated has to include an entry for every possible case expression value between the bounds rather than every stated case label value.
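The table which a compiler must generate for the computed address method, and the bounds check which guards it, can be sketched as follows. Targets are represented here by strings rather than addresses, and the names build_jump_table and dispatch are hypothetical.

```python
def build_jump_table(cases, else_target):
    """cases maps each case label value to its target; returns (low, high, table)."""
    low, high = min(cases), max(cases)
    # One entry for every value between the bounds, not just for each label.
    table = [cases.get(value, else_target) for value in range(low, high + 1)]
    return low, high, table

def dispatch(value, low, high, table, else_target):
    """Select a target for a case expression value."""
    if value < low or value > high:      # bounds check generated by the compiler
        return else_target
    return table[value - low]            # indexed fetch, then jump

low, high, table = build_jump_table({1: "A", 2: "B", 5: "C"}, "ELSE")
```

Here the labels 1, 2 and 5 give a five-entry table in which the entries for 3 and 4 point to the else code. Two labels at 0 and 1000 would give a table of 1001 entries, which illustrates the memory cost of widely scattered labels.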

You should be able to see why, from the point of view of the contemporary machine, widely scattered case label values cause either poor performance or excessive memory consumption depending on the compiler case implementation.

In the latter instance the programmer may prefer to use a number of if…then… else constructs. However such a decision would mean that the architecture has dictated (and complicated) software design. Where possible, it is better to design an architecture to efficiently support the implementation of source constructs rather than the other way around.

FIG. 8: M68000 stack following subroutine invocation and entry

Procedures

Invocation of a procedure is very straightforward since the instruction set offers direct support through dedicated instructions for saving and restoring registers and creating and destroying a stack frame for local variables. link should be used at the start of a procedure. It creates a stack frame of the size quoted (in bytes). unlk should appear at the procedure end. It automatically destroys the stack frame by copying the frame pointer into the stack pointer and restoring the frame pointer itself (from a value saved by link on the stack). Any of the address registers may be employed as frame pointer. In order to save registers on the stack which are to be used within the procedure, movem may be employed as shown in the code segments which follow the next paragraph.

FIG. 8 depicts the stack contents following execution of movem and link, on entry to a procedure. Finally, the last instructions in a procedure should be movem, lea, rts to restore registers and throw away items on the stack which were passed as parameters, thus no longer required, by simply adjusting the value of the stack pointer. A return from the procedure is then effected by copying the return address back into the program counter (PC). In the case of a function procedure⁶ one must take care to leave the return value on the top of stack.

    move.w  0, -(SP)                        movem.l d<n>-d<m>/a<p>-a<q>, -(SP)
    move.w  <parameter 1>, -(SP)            link    a<r>, #<frame size>
    …                                       …
    move.w  <parameter n>, -(SP)            …
    bsr     <offset>                        …
    move.w  (SP)+, <result>                 unlk    a<r>
                                            movem.l (SP)+, d<n>-d<m>/a<p>-a<q>
                                            lea.l   +<parameter block size>(SP), SP
                                            rts

The above code skeletons show how a function procedure call and return may be effected. Prior to the bsr (branch to subroutine) space is created for the return value by pushing an arbitrary value of the required size (word shown). Parameters are then loaded onto the stack in a predetermined order. On procedure entry, any registers to be used within the function procedure are saved, so that they may be restored on exit, and the stack frame then created.
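The effect of link and unlk on the stack and frame pointers can be traced with a small model. This is a sketch under assumptions: all stacked items are treated as four bytes long, the frame size is given as a positive byte count, and the class name Cpu and its fields are hypothetical.

```python
class Cpu:
    """Models only the stack pointer and one frame pointer (illustrative)."""

    def __init__(self, sp):
        self.mem = {}     # address -> 32-bit value
        self.sp = sp
        self.fp = 0       # whichever address register serves as frame pointer

    def push(self, value):
        self.sp -= 4
        self.mem[self.sp] = value

    def pop(self):
        value = self.mem[self.sp]
        self.sp += 4
        return value

    def link(self, frame_size):
        self.push(self.fp)        # save the caller's frame pointer
        self.fp = self.sp         # establish the new frame pointer
        self.sp -= frame_size     # reserve space for local variables

    def unlk(self):
        self.sp = self.fp         # discard the local variables
        self.fp = self.pop()      # restore the caller's frame pointer
```

After unlk the stack pointer once again addresses whatever was on top of the stack before link, so the state on procedure entry is fully recovered.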

[6. …using the Modula-2 terminology. Pascal users would normally use the term "function". ]

Expression evaluation

The M68000 is a register machine for the purposes of expression evaluation. For example, the following code segment may be used to evaluate (b² − 4ac) / 2a and store the result in RootSquared.

    move.w  a, d0
    muls    #2, d0
    move.w  c, d1
    muls    a, d1
    muls    #4, d1
    move.w  b, d2
    muls    d2, d2
    sub.w   d1, d2
    divs    d0, d2
    move.w  d2, RootSquared

The processor was not designed to perform expression evaluation on the stack.

There are two reasons why it would not be sensible to attempt it. Firstly, it would be inefficient. Only very rarely would the compiler require more than eight data registers. Registers are accessed without bus access cycles. Secondly, the arithmetic instructions are designed to leave the result in a data register. Stack evaluation simply is not supported. The instruction set is designed with the intention that registers be used to the maximum effect.
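The register sequence given earlier for RootSquared can be checked by tracing it with ordinary variables standing in for d0, d1 and d2. This is a sketch only: register widths and overflow are ignored, signed division is assumed to truncate toward zero, and the function name root_squared is hypothetical.

```python
def root_squared(a, b, c):
    """Trace the register sequence; each line mirrors one instruction."""
    d0 = a              # move.w a, d0
    d0 *= 2             # muls #2, d0
    d1 = c              # move.w c, d1
    d1 *= a             # muls a, d1
    d1 *= 4             # muls #4, d1
    d2 = b              # move.w b, d2
    d2 *= d2            # muls d2, d2
    d2 -= d1            # sub.w d1, d2
    d2 = int(d2 / d0)   # divs d0, d2 (quotient assumed truncated toward zero)
    return d2           # move.w d2, RootSquared
```

Only three data registers are needed, which supports the observation that eight are rarely exceeded.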

Data referencing is usually performed using address register indirect with displacement. The register used is the…

• Frame pointer if the variable is local

• Static base pointer if the variable is global

Two address registers should be reserved for use in this way.

Accessing elements within an array is achieved by address register indirect with displacement and index. The displacement locates the base of the array (or string), offset from frame or static base pointer, and the index locates the required element.

M68000 assembly language programming

Programming the M68000 using assembly language requires detailed documentation of the assembler, linker and loader to be employed. The Motorola standard mnemonics and symbols are documented, together with an excellent treatment of assembly language programming in general and of the M68000 in particular, in [Kane, Hawkins & Leventhal 81]. However, it does not detail the programming tools required. Many contemporary workstations are built around the M68000 or its derivatives including the Apple Macintosh and Sun 300 series.

A very thorough and extremely readable exposition of the Macintosh 68000 Development System tools is to be found within [Little 86].

FIG. 9: NS32000 programmer's architecture

3. National Semiconductor 32000

3.1 Architecture

Design philosophy

The National Semiconductor 32000 was designed in the early 1980s to meet the demand for very high performance systems in both the real-time control and work-station markets. Principal characteristics of the design are…

• Reduction of semantic gap

• Structured programming support

• Software module support

• Virtual memory

• Security for multi-user operating system

The three most distinctive characteristics are listed at the top. Both instruction set and addressing modes are designed to reduce code size and execution time of high level language statements. Single instructions replace several required in earlier machines. Most revolutionary, however, is explicit support for modular software. Items in external modules may be directly referenced, be they procedures or data.

There follows a list of features present…

• Complex instruction set (CISC)

• Two-address instructions

• Instruction cache (queue)

• Register file for expression evaluation (no windowing)

• Stack for procedure, function and interrupt service subroutine implementation

• Demand paged virtual memory with self-maintained associative translation cache

• Vectored interrupt mechanism with programmable arbitration

• Symmetric architecture with respect to…

- Number of operands (two)

- Addressing mode usage (any instruction may use any mode)

- Register usage (general purpose; address, data or array index)

- Processor (8, 16 and 32 bit versions use common machine language)

Among these the only truly original feature is that of symmetry. Almost any instruction may employ any addressing mode. Any register may be used for any purpose, address or data, and each is thus referred to as a general purpose register (GPR). This is intended to make compiler code generation easier.

Interrupt/event arbitration protocols available are…

• Polling (software control)

• Fixed priority

• Rotating priority

Only a higher priority event may cause pre-emption of an interrupt service routine.

FIG. 10: NS32000 memory map

FIG. 11: NS32000 processor state register

Programmer's architecture

FIG. 9 shows the NS32000 programmer's architecture. Eight 32-bit general purpose registers are provided which has been shown to be adequate for the vast majority of expression evaluations.

Six special purpose registers (SPR) define the memory map (FIG. 10) for any running program. Three SPRs point to areas of (virtual) memory containing data. The static base register (SB) points to the base of static or global memory. The frame pointer (FP) points to local memory where variables, local to the currently executing procedure, are dynamically stored in a stack frame. The program counter (PC) points to the next instruction to be executed. FIG. 11 shows the processor state register (PSR) which defines the processor state. The state recorded in each flag is summarized in TBL. 7. Supervisor access only is allowed to the most significant byte to prevent users from interfering with the operating system in a multi-user environment.

SP0 and SP1 point to the "tops" of the supervisor stack and user stack respectively. Both stacks actually grow downwards in memory so that the addition of an item actually decreases the value of the address stored in the relevant stack pointer. Supervisor stack access is privileged to the operating system alone and is used for system subroutines. Most or all of these will be invoked via interrupt or trap exceptions. The supervisor call (svc) trap instruction is used by a program to invoke operating system procedures.

IntBase points to the base of the interrupt despatch table (FIG. 12) containing external procedure descriptors (see below) for all exception handling subroutines. The first sixteen are of fixed purpose. From there onwards are those for subroutines selected via a vector, read from the data bus after an interrupt, which serves as an index into the table.

MOD points to the current module descriptor (FIG. 13) which describes the software module to which the procedure currently executing belongs. It is only 16 bits in length which implies that all loaded module descriptors should reside in the bottom 64k of memory. As far as the machine is concerned a module is described via a pointer to the base of its global variables (SB), a pointer to the base in memory of its executable program code, and a pointer to the base of a link table (FIG. 13).

The program code of a module is simply a concatenation of its component procedures. Each procedure may be referenced, from within another module, by an external procedure descriptor (FIG. 14). This is composed of two fields.

The least significant sixteen bits form a pointer to the parent module descriptor.

The most significant sixteen bits form an offset from the base of program code, found in the module descriptor, where the entry point of the procedure is to be found.

It is these which are used as "vectors" in the interrupt despatch table. They also form one kind of entry in the module link table to describe procedures referenced which belong to other modules. The other kind of entry in the link table is simply the absolute address of a variable belonging to another module. Hence whenever an application is loaded, the descriptors and link tables for all its component software modules must be initialized in memory.

TBL. 7: Flags in the NS32000 processor state register and their meaning (when set)

Flag   Write access   Meaning when set
U      Privileged     User mode hence privileged instruction causes undefined instruction trap
N      Any            Negative result of twos-complement arithmetic operation
Z      Any            Zero result of arithmetic operation
F      Any            Flag used for miscellaneous purposes e.g. arithmetic overflow
L      Any            Lower value of second operand in comparison operations
T      Any            Trace in operation causing TRC trap after every instruction
C      Any            Carry after an addition, borrow after a subtraction

FIG. 12: NS32000 interrupt despatch table

FIG. 13: NS32000 module descriptor and link table

FIG. 14: NS32000 external procedure descriptor
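The way an external procedure descriptor leads to a procedure entry point, as described above, can be sketched as follows. The module table is represented here by a dictionary keyed by module descriptor address, and the names split_descriptor, entry_point and the dictionary keys are hypothetical.

```python
def split_descriptor(descriptor):
    """Split a 32-bit external procedure descriptor into its two fields."""
    module_ptr = descriptor & 0xFFFF         # least significant sixteen bits
    offset = (descriptor >> 16) & 0xFFFF     # most significant sixteen bits
    return module_ptr, offset

def entry_point(descriptor, module_table):
    """Resolve a descriptor to the address of the procedure entry point.

    module_table maps a module descriptor address to a dictionary holding
    the three pointers listed in the text: 'sb', 'program_base', 'link_base'.
    """
    module_ptr, offset = split_descriptor(descriptor)
    module = module_table[module_ptr]
    return module["program_base"] + offset
```

For instance, with a module descriptor at 0x0040 whose program code begins at 0x20000, the descriptor 0x01240040 resolves to the entry point 0x20124.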

Addressing modes

TBL. 8 offers a summary of NS32000 addressing modes together with their encoding and effective address computation. Note that "[…]" in the effective address column means "contents of…".

Any principal addressing mode may be extended by addition of a scaled index.

The scaling indicates whether the array is one of…

• Bytes

• Words

• Double words

• Quad words

…where the term word is interpreted as meaning two bytes.
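The effect of the scaled index on the effective address can be expressed in one line. In this sketch the principal addressing mode is assumed to have already produced a base address, and the names SCALE and scaled_index_address are hypothetical.

```python
SCALE = {"byte": 1, "word": 2, "double": 4, "quad": 8}   # element size in bytes

def scaled_index_address(base_address, index, element):
    """Address of element number index in an array starting at base_address."""
    return base_address + index * SCALE[element]
```

Element 3 of an array of double words based at 0x1000 is thus found at 0x100C, without the compiler having to generate a separate shift or multiply.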

TBL. 8: NS32000 addressing modes

FIG. 15: NS32000 basic instruction format

Instruction set

Tables 9 and 10 show almost all of the NS32000 instructions together with the operation caused by their execution. The "i" notation denotes one of the following operand lengths…

• Byte (i=b)

• Word (i=w)

• Double word (i=d)

The instruction set is thus also symmetric with respect to data length. For example movw means "move a word".

TBL. 11 lists all the possible branch conditions of the processor state and the associated branch instruction mnemonic. The possibility of branching according to the simultaneous state of two flags helps close the semantic gap with if…then…else selection. Note that semantic gap closure for selection and iteration is also assisted via the inclusion of add, compare & branch and case instructions (see below).

The general instruction is composed of a basic instruction (FIG. 15) of length one, two or three bytes possibly followed by one or two instruction extensions containing one of the following…

• Index byte

• Immediate value

• Displacement

…depending on both the instruction, which may have an implied operand, and each of the two addressing modes, one or both of which may require qualification. The basic instruction encodes…

• Opcode

• Operand length

• Addressing mode for each operand

FIG. 16 shows the format of a displacement extension which may be one, two or four bytes in length.

A complete description of the NS32000 instruction set and addressing modes may be found in [National Semiconductor 84].

TBL. 9: Instruction set of the NS32000: Program control

TBL. 10: Instruction set of the NS32000: Expression evaluation

TBL. 11: Branch conditions for the NS32000

FIG. 16: NS32000 displacement instruction extension format

3.2 Organization

Processor organization

FIG. 17 shows the organization of the NS32332. This is an evolved member of the NS32000 series. The design is a hybrid stack+register machine, offering the convenience of a stack for procedure implementation and the speed and code compactness afforded by register file expression evaluation. An instruction cache queues instructions fetched when the bus is otherwise idle. A dedicated barrel shifter and adder are provided for rapid effective address calculation.

Address and data are multiplexed on a common bus. Additional working registers are provided. These will be invisible even to the compiler and are used by the microcode in implementing instructions.

FIG. 17: NS32332 processor organization

Physical memory organization

Physical memory of the NS32000 is organized as a simple, uniform linear array where each address value points to a single byte. Hence it is said to offer byte oriented addressing. However, depending on the data bus width of the processor in use, two or even four bytes may be read simultaneously.

A modular interleaved memory is supported as shown in FIG. 18. Four signals are provided by the processor to enable each memory bank to partake in any given bus transaction depending on the address and word length required.

This allows byte-oriented addressing without word alignment access restrictions.

Hence, for example, a two-byte word can be read at an odd address [ NS32000 documentation reserves the term "word" to mean two bytes, "double word" four bytes and "quad word" eight bytes. In the text here the term is used more generally. ].
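The way the four bank enable signals allow unaligned access can be sketched as follows. This is a minimal model only, assuming a 32-bit data bus with four byte-wide banks where bank k holds every byte whose address is congruent to k modulo 4. The function name bus_cycles is hypothetical, and the model assumes that an access crossing a four-byte boundary is completed in a second bus cycle.

```python
def bus_cycles(address, nbytes, bus_bytes=4):
    """Return one tuple of bank-enable flags per bus cycle.

    Assumption: bank k holds the bytes whose address is k modulo bus_bytes,
    and an access that crosses a bus-width boundary takes a further cycle.
    """
    cycles = []
    remaining = nbytes
    while remaining > 0:
        first_bank = address % bus_bytes
        # Bytes that can be transferred before the boundary is reached.
        count = min(remaining, bus_bytes - first_bank)
        enables = tuple(first_bank <= bank < first_bank + count
                        for bank in range(bus_bytes))
        cycles.append(enables)
        address += count
        remaining -= count
    return cycles


# A two-byte word at odd address 1 enables banks 1 and 2 in a single cycle.
print(bus_cycles(1, 2))
```

A two-byte word at address 3 would, under the same assumptions, enable bank 3 in one cycle and bank 0 in the next.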

TBL. 12 shows a table of all possible modes of bus access. FIG. 19 shows the timing diagram for a bus transaction without address translation.

TBL. 12: NS32000 bus access types

FIG. 18: NS32000 modular interleaved memory

FIG. 19: NS32000 bus timing

Virtual memory organization

The NS32000 employs a two level demand paged address translation scheme as shown in FIG. 20. A page size of 512 bytes is used to optimize the trade-off between page table size and program locality. Each page present in memory is pointed to by one of 128 entries in a special page called a pointer table. Each pointer table is pointed to by one of 256 entries in a page table. The page table itself is located via a pointer register in the memory management unit (MMU).

Only 132k of memory need thus be allocated to provide complete virtual to physical address mapping. However, usually only the page table and pointer tables currently in use are kept in memory. Each process running on the system may have its own private page and pointer tables. This affords both security and the possibility of sharing physical pages.
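The two level translation described above may be illustrated by the sketch below. It assumes a 24-bit virtual address (256 x 128 x 512 bytes) split into an 8-bit page table index, a 7-bit pointer table index and a 9-bit offset, and it assumes four-byte table entries, which is consistent with a 128-entry pointer table occupying one 512-byte page. The names split_virtual_address, translate, full_mapping_bytes and PageFault are hypothetical, and Python dictionaries stand in for the tables held in memory.

```python
PAGE_BITS = 9      # 512-byte pages
POINTER_BITS = 7   # 128 entries per pointer table
TABLE_BITS = 8     # 256 entries in the page table
ENTRY_BYTES = 4    # assumption: each table entry occupies four bytes


class PageFault(LookupError):
    """Raised when a pointer table or page is absent from memory."""


def split_virtual_address(va):
    """Split an address into (page table index, pointer table index, offset)."""
    offset = va & ((1 << PAGE_BITS) - 1)
    pointer_index = (va >> PAGE_BITS) & ((1 << POINTER_BITS) - 1)
    table_index = (va >> (PAGE_BITS + POINTER_BITS)) & ((1 << TABLE_BITS) - 1)
    return table_index, pointer_index, offset


def translate(va, page_table):
    """Walk both levels; page_table maps index -> {index -> page base}."""
    table_index, pointer_index, offset = split_virtual_address(va)
    pointer_table = page_table.get(table_index)
    if pointer_table is None:
        raise PageFault("pointer table absent")
    page_base = pointer_table.get(pointer_index)
    if page_base is None:
        raise PageFault("page absent")
    return page_base + offset


def full_mapping_bytes():
    """Memory needed for the page table plus every pointer table."""
    page_table = (1 << TABLE_BITS) * ENTRY_BYTES
    pointer_tables = (1 << TABLE_BITS) * (1 << POINTER_BITS) * ENTRY_BYTES
    return page_table + pointer_tables


print(split_virtual_address(0x123456))
print(full_mapping_bytes())
```

Under these assumptions the complete mapping occupies 1024 + 131072 = 132096 bytes, which agrees with the figure of 132k quoted above.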

FIG. 20: NS32000 virtual to physical address translation

An associative translation cache is employed to avoid the extra bus cycles otherwise required to look up entries in the page and pointer tables. Without it address translation would be hopelessly slow. The cache replacement algorithm employed is least recently used (LRU) and cache size is just thirty-two entries.
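The behaviour of a thirty-two entry cache with LRU replacement can be modelled in a few lines. This is a sketch of the replacement policy only, not of the hardware: the class TranslationCache and the callback walk_tables (standing in for the page and pointer table look-up) are hypothetical names.

```python
from collections import OrderedDict


class TranslationCache:
    """Software model of a small translation cache with LRU replacement."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.entries = OrderedDict()   # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def lookup(self, virtual_page, walk_tables):
        if virtual_page in self.entries:
            # Hit: mark the entry as most recently used.
            self.entries.move_to_end(virtual_page)
            self.hits += 1
            return self.entries[virtual_page]
        # Miss: the tables in memory must be consulted (extra bus cycles).
        self.misses += 1
        frame = walk_tables(virtual_page)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # discard least recently used
        self.entries[virtual_page] = frame
        return frame


cache = TranslationCache(capacity=2)
for page in (1, 2, 1, 3):
    cache.lookup(page, lambda p: p + 100)
print(list(cache.entries), cache.hits, cache.misses)
```

In the small example the capacity is reduced to two so that replacement can be seen: page 2 is discarded because page 1 was used more recently.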

Note that page faults can occur because either the desired page or the pointer table is absent from memory. When one occurs the MMU signals the central processing unit (CPU) to abort the current instruction and return all registers to their state before it began. The PC, PSR and SP are saved on the interrupt stack and an abort trap occurs, whereupon the page swap may be carried out. The MMU is designed to support a least frequently used (LFU) page replacement algorithm.

Security is afforded in the following ways…

• Separate "supervisor" page table

• Separate page table per process

• "Page protection" attributes to each page and pointer table entry

• Supervisor alone may modify page and pointer tables

FIG. 21 shows the bus timing modified to allow address translation. Note that only one extra clock cycle per transaction is required provided a hit is obtained by the associative translation cache, which is 98% efficient!

3.3 Programming

Constructs

Closure of the semantic gap has been sought by the designers both by careful selection of addressing modes for data referencing and by the inclusion of instructions which go as far as possible in directly implementing the commands of a high level language. Code generation is intended to produce fewer instructions.

However, the instructions themselves are more complex. In other words the problem of efficiently implementing selection and iteration is solved once and for all in microcode instead of code.

A "for" loop always requires signed addition of constant, compare and conditional branch operations on each iteration. Since this is a very common construct indeed, and the constant is usually small, the designers of the NS32000 included a single instruction (acb) to this end, optimizing its implementation in microcode once and for all.
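The effect of acb may be modelled as below. The sketch assumes that the instruction adds the constant to the index and takes the branch while the updated index is non-zero. The function names acb and for_loop are hypothetical and serve only to illustrate the control flow.

```python
def acb(increment, index):
    """Model of acb: add the constant, then report whether to branch.

    Assumption: the branch is taken while the updated index is non-zero.
    """
    index += increment
    return index, index != 0


def for_loop(ntimes, body):
    """Execute body ntimes (ntimes >= 1), counting the index down to zero."""
    index = ntimes                      # mov ntimes, index
    while True:                         # loop start
        body(index)
        index, branch = acb(-1, index)  # acb $-1, index, <loop start>
        if not branch:
            break


visited = []
for_loop(3, visited.append)
print(visited)
```

Note that the body is executed before the test, so this form is only appropriate when at least one iteration is required.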

For loop:

mov ntimes, index
; loop start
…
acb $-1, index, *-<start offset>

Case:

case *+4[r<n>:]
<offset1>
<offset2>
…

Above are code "skeletons" for the implementation of both for loop and case constructs. The case instruction effects a multi-way branch where the branch offset is selected according to the value placed previously in r<n>. This is used as an index into a table of offsets which may be placed anywhere but which it is sensible to locate directly below the case instruction. The argument to case is the location of an offset to be added to the PC, which is addressed using PC memory space mode.

Each offset thus directs the "thread of control" to a code segment derived from the high level language statement associated with a particular case label. Each selectable code segment must end with a branch to the instruction following the case construct end. It is usual to place such code segments above the case instruction.

The compiler must generate code to evaluate the case expression which may then simply be placed in the index register. It should also generate the case label bounds, an offset table entry for every value within the bounds and code to verify that the case expression value falls within them. If it fails to do so an offset should be used which points to a code segment generated from the else clause in the case construct. Any offset not corresponding to a case label value should also point to the else segment. You should be able to see why widely scattered case label values indicate an inappropriate use of a case construct. In such circumstances it is better to use a number of if…then…else constructs (perhaps nested).
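The work required of the compiler for a case construct can be sketched as follows. The functions build_offset_table and select_offset are hypothetical, and plain integers stand in for the branch offsets. The sketch shows the bounds, the table entry for every value within the bounds and the use of the else offset for any value without a label.

```python
def build_offset_table(label_offsets, else_offset):
    """Return (low, high, table) for a mapping of case label -> offset.

    Every value between the bounds receives an entry. Values with no
    case label are directed to the else segment.
    """
    low, high = min(label_offsets), max(label_offsets)
    table = [label_offsets.get(value, else_offset)
             for value in range(low, high + 1)]
    return low, high, table


def select_offset(value, low, high, table, else_offset):
    """Bounds check, then index the table as the case instruction would."""
    if value < low or value > high:
        return else_offset
    return table[value - low]


low, high, table = build_offset_table({1: 100, 2: 200, 5: 500}, else_offset=900)
print(table)
print(select_offset(3, low, high, table, 900))
```

Two labels of value 1 and 1000 would require a table of one thousand entries, nearly all of them pointing to the else segment, which is why widely scattered label values are better served by if…then…else constructs.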

FIG. 21: NS32000 bus timing with address translation

Procedures

Invocation of a procedure is very straightforward since the instruction set offers direct support via dedicated instructions for saving and restoring registers and for creating and destroying a stack frame for local variables. The enter instruction should be used as the first instruction of a procedure. It saves a nominated list of registers and creates a stack frame of the size quoted (in bytes). The exit instruction should be the last but one instruction. It restores a nominated list of registers and automatically destroys the stack frame by copying the frame pointer into the stack pointer and then restoring the frame pointer itself (from a value saved by enter on the stack). FIG. 22 depicts the stack contents following execution of enter on entry to a procedure. Finally, the last instruction in a procedure should be ret, which throws away the items on the stack which were passed as parameters, and are thus no longer required, by simply adjusting the value of the stack pointer. It then effects a return from the procedure by copying the return address back into the program counter (PC). In the case of a function procedure [ This uses the Modula-2 terminology. Pascal users would normally use the term function. ] one must take care to specify the argument of ret (i) so as to leave the return value on the top of stack.

Calling procedure:

movqd 0, tos
movd <parameter 1>, tos
…
movd <parameter n>, tos
bsr <offset>
movd tos, <result>

Called procedure:

enter [<reg list>], $<frame size>
…
exit [<reg list>]
ret $<parameter block size>

FIG. 22: NS32000 stack following subroutine invocation and entry

The above code skeleton shows how a function procedure call and return may be effected. Prior to the bsr (branch to subroutine) space is created for the return value by pushing an arbitrary value of the required size onto the stack (double word shown). Parameters are then loaded onto the stack in a predetermined order. On procedure entry, any registers to be used within the function procedure are saved, so that they may be restored on exit, and the stack frame created.
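The sequence just described can be followed step by step in a small software model. This is a sketch under several assumptions: each stack slot is taken to hold one double word, a Python list stands in for the stack, and enter is assumed to save the frame pointer first, then allocate the frame, then save the registers. The class Machine and the function call_sum are hypothetical.

```python
class Machine:
    """Toy model of the stack discipline; one list element per double word."""

    def __init__(self):
        self.stack = []                 # top of stack is the end of the list
        self.fp = 0                     # frame pointer, held as a stack depth
        self.regs = {"r0": 0, "r1": 0}

    def push(self, value):
        self.stack.append(value)

    def pop(self):
        return self.stack.pop()

    def enter(self, reg_list, frame_words):
        self.push(self.fp)                    # save caller's frame pointer
        self.fp = len(self.stack)             # new frame begins here
        self.stack.extend([0] * frame_words)  # space for local variables
        for reg in reg_list:                  # save nominated registers
            self.push(self.regs[reg])

    def exit(self, reg_list):
        for reg in reversed(reg_list):        # restore nominated registers
            self.regs[reg] = self.pop()
        del self.stack[self.fp:]              # copy FP into SP
        self.fp = self.pop()                  # restore caller's FP

    def ret(self, param_words):
        return_address = self.pop()
        if param_words:
            del self.stack[-param_words:]     # discard the parameters
        return return_address


def call_sum(machine, a, b):
    # Calling procedure
    machine.push(0)                  # movqd 0, tos : space for the result
    machine.push(a)                  # movd <parameter 1>, tos
    machine.push(b)                  # movd <parameter 2>, tos
    machine.push("return address")   # pushed by bsr <offset>
    # Called procedure
    machine.enter(["r0"], frame_words=1)
    fp = machine.fp
    machine.regs["r0"] = machine.stack[fp - 4] + machine.stack[fp - 3]
    machine.stack[fp - 5] = machine.regs["r0"]   # write the result slot
    machine.exit(["r0"])
    machine.ret(param_words=2)
    # Calling procedure again
    return machine.pop()             # movd tos, <result>


m = Machine()
m.regs["r0"] = 7
print(call_sum(m, 2, 3), m.stack, m.regs["r0"])
```

After the call the stack is empty and r0 holds its original value, since the argument of ret removed exactly the two parameters and left the result on the top of stack.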

External procedures, i.e. those which reside in other software modules of the application, may be invoked using either cxp (call external procedure) or cxpd (call external procedure via descriptor) instead of bsr. The argument to cxp is simply an offset (displacement) within the current module link table. That of cxpd is an external procedure descriptor (see above). In either case rxp (return from external procedure) must be used in place of ret.

Expression evaluation

The NS32000 is a register machine for the purposes of expression evaluation.

For example, the following code segment may be used to evaluate

movd a, r0
muld $2, r0
movd c, r1
muld a, r1
muld $4, r1
movd b, r2
muld r2, r2
subd r2, r1
divd r1, r0
movd r0, RootSquared

The processor was not designed to perform expression evaluation on the stack.

There are two reasons why it would not be sensible to attempt it. Firstly, it would be inefficient. Only very rarely would the compiler require more than eight registers. Registers are accessed without bus access cycles. Secondly, the stack is only modified if a tos operand access class is read as is the case with the first, but not the second, operand in an arithmetic instruction. Hence add 4 (SP), tos will leave the stack size unaltered. The first operand will remain. The instruction set is designed with the intention that registers be used to the maximum effect.

Data referencing is usually performed using memory space mode, in particular…

• Frame memory space mode if the variable is local

• Static memory space mode if the variable is global

Accessing elements within an array is achieved by concatenating a scaled index address modifier to a memory space mode.
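The address arithmetic implied by a scaled index modifier can be sketched as follows. The function effective_address and its parameter names are hypothetical. The model assumes that the memory space mode supplies a base register value plus a displacement, and that the index register is multiplied by the element size in bytes before being added.

```python
def effective_address(base, displacement, index=0, element_bytes=1):
    """Memory space mode address, optionally modified by a scaled index.

    Assumption: base is the value of the register implied by the mode
    (for example the frame pointer for a local variable).
    """
    return base + displacement + index * element_bytes


# Element 3 of a local array of double words located 40 bytes below
# a frame pointer of value 0x1000.
print(hex(effective_address(0x1000, -40, index=3, element_bytes=4)))
```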

NS32000 assembly language programming

Programming the NS32000 using assembly language requires detailed documentation of the assembler, linker and loader to be employed. The National Semiconductor assembler is documented in [National Semiconductor 87]. This runs under the Unix operating system and hence allows standard Unix tools to be used. [Martin 87] is devoted to the subject and is highly readable.