Computer Architecture / Organization -- System organization [part 2]




Error detection and correction

Where reliability is required to exceed that inherent in the memory devices employed, error detection, or even error detection and correction, may be employed. Any such scheme implies an extra field added to every word of memory.

Parity offers the simplest and cheapest error detection. A single extra parity bit is updated on every write transaction. Two alternatives exist…

• Even parity

• Odd parity

…according to whether the parity bit denotes that an even or odd number of 1s is present in the word. Even parity may be computed as a single cascade of exclusive-or (XOR) operations according to…

p = d0 ⊕ d1 ⊕ … ⊕ d(n−1)

Only single errors may be detected using a single parity bit. No correction is possible, only an event reported to the processor whose response may be programmed as an interrupt service routine if an interrupt is enabled for such an event. Any memory error is as likely to be systematic as random nowadays.

Hence, on small systems, it is now often considered satisfactory simply to report a memory device fault to the poor user or systems manager.
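The scheme may be sketched as follows (Python, purely illustrative; in practice the parity logic is a small hardware tree of XOR gates):

```python
def even_parity(word, width=8):
    """Return the even-parity bit: the XOR of all data bits."""
    p = 0
    for i in range(width):
        p ^= (word >> i) & 1
    return p

word = 0b10110010
stored = even_parity(word)               # written alongside the word

# On read: recompute and compare with the stored parity bit.
assert even_parity(word) == stored               # no error reported
corrupted = word ^ (1 << 2)                      # single bit flip
assert even_parity(corrupted) != stored          # error detected, not corrected
```

Note that a second bit flip in the same word would restore the parity and so go undetected, which is why a single parity bit offers detection of single errors only.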

Hamming code offers single error correction/double error detection (SECDED). Although only a single error may be corrected, double errors may be detected and reported as an event. First, consider what is required of an additional syndrome word which is capable of indicating whether a word is correct and, if not, which of all possible errors has occurred. Given a data word length of n bits and a syndrome word length of m bits there are n+m possible error states.

Including the correct state, there are thus n+m+1 possible states of a memory read result. The syndrome word length is therefore defined by…

2^m ≥ n + m + 1

…i.e. m is the smallest integer such that the syndrome can encode every possible state.

This implies the relationship shown in Table 1 between data word length, syndrome word length and percentage increase of physical memory size.

It is obvious from this that such a code is only economic on systems with large data bus width. Further, it would be useful if the syndrome word possessed the following characteristics…

• Value = zero ⇒ No error ⇒ No correction

• Value > zero ⇒ Error; value ⇒ bit in error (invert to correct)


Table 1: Relationship between parameters for Hamming error detection code

Data bits    Syndrome bits    Percentage increase in memory size

8            4                50
16           5                31
32           6                19
64           7                11
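The entries of Table 1 follow directly from the inequality above; a few lines of Python (illustrative only) reproduce them:

```python
def syndrome_bits(n):
    """Smallest m satisfying 2^m >= n + m + 1."""
    m = 0
    while (1 << m) < n + m + 1:
        m += 1
    return m

for n in (8, 16, 32, 64):
    m = syndrome_bits(n)
    # data bits, syndrome bits, percentage increase in memory size
    print(n, m, round(100 * m / n))
```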

Hamming code achieves all these desirable features by forming subsets of the data word and recording the parity of each. Each bit of the syndrome word in fact just represents the parity of a chosen subset of the data bits. It is possible to choose these sets in such a way that any change in parity, detected by an XOR between recorded and calculated syndrome words, indicates, not just an error, but precisely which bit is in error. For example, the 5-bit odd parity syndrome of a 16-bit data word is calculated via…

The error vector is calculated via…

E = p ⊕ p′

…where the p_i are stored as the syndrome and the p′_i are computed after a read. E = 0 indicates no error; E ≠ 0 indicates one or two errors, each (or each pair) of which will cause E to take a unique value, allowing the erroneous bit(s) to be identified and corrected.
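The classical construction, in which parity bits occupy the power-of-two positions and each covers the positions whose binary index shares that bit, can be sketched as follows (Python, illustrative only; here 8 data bits take 4 syndrome bits, as in Table 1, and a real SECDED implementation would add one further overall parity bit for double-error detection):

```python
def hamming_encode(data):
    """Encode a list of data bits; parity bits occupy power-of-two positions."""
    m = len(data)
    r = 0
    while (1 << r) < m + r + 1:          # syndrome length from 2^r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * n                       # position pos stored at index pos - 1
    j = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: a data position
            code[pos - 1] = data[j]
            j += 1
    for i in range(r):                   # parity bit p = 2^i covers every
        p = 1 << i                       # position whose index has bit i set
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos - 1]
        code[p - 1] ^= parity            # force even parity over the subset
    return code

def hamming_correct(code):
    """Return (data, syndrome); a non-zero syndrome names the erroneous bit."""
    code = code[:]
    syndrome = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:
        code[syndrome - 1] ^= 1          # invert the bit in error
    data = [code[pos - 1]
            for pos in range(1, len(code) + 1) if pos & (pos - 1)]
    return data, syndrome
```

A single flipped bit anywhere in the stored codeword, data or syndrome, yields a syndrome equal to the position of that bit, which is then inverted to correct it.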

2.2 Virtual memory organization

Requirements

Virtual memory simply means the memory as it appears to the compiler and programmer. The requirements may be summarized…

• Single linear memory map

• Security

Single linear memory map…hides the complexity and detail of physical memory, which should be of no interest whatsoever to the compiler or machine level programmer. Physical memory organization design aims should be considered distinct from those of virtual memory organization. In other words, as far as the compiler is concerned, all of memory is unified into a single memory map. This memory map will typically be of a volume approximately equal to that of the low cost (mass storage) device and of access time approximately equal to that of "main" memory. Memory management encapsulates the task of ensuring that this is so. All virtual memory should be considered non-volatile.

These requirements pose severe problems for conventional programming techniques where the poor user, as well as the programmer, is required to distinguish between memory devices by portability (e.g. "floppy" vs. "hard" disc) and by volatility ("buffer" vs. "disc"). It is hardly surprising that computers are only used by a tiny proportion of those who would benefit (e.g. ~7% for business applications). The most promising new paradigm, which might unify user and programmer classes, is that of the object.

Objects are internal representations of "real world" entities. They are composed of…

• Methods

• State

State simply means variables which are private to the object. Methods are operations which affect state. Objects communicate by message passing.

The important point is that neither user nor compiler need give consideration to physical memory organization. Objects are persistent, a fact consonant with the non-volatility of virtual memory. By rendering communication explicit, the need for a visible filing system is obviated. The intention here is to point out the relationship between objects and virtual memory [15]. It is not appropriate to give a full introduction to object-oriented systems here. The reader is referred to an excellent introduction in BYTE magazine [Thomas 89] and to the "classic" text [Goldberg & Robson 83].

Security…of access becomes essential when a processor is a shared resource among multiple processes. The simplest approach, which guarantees effectiveness given a suitable compiler, is for each process to be allocated private memory, accessible by no other. The problem is that no such guarantee is necessarily possible at the architecture level. A careless or malicious programmer can easily create code (e.g. using an assembler) which accesses the private memory of another process unless the architecture design renders this physically impossible.

To achieve a secure architecture, memory access must be controlled by the process scheduler which is responsible for memory allocation to processes when they are created.

Memory device hierarchy

In order to address the problem of meeting the requirements of a virtual memory, we begin by adopting a hierarchical model of physical memory (FIG. 23). The topmost volume exists to render the fastest possible mean access time per reference. The bottom volume exists to render the lowest possible cost per bit.

Memory management ensures that optimal use is made of each resource.

Given a cache, the top two layers are directly accessible on the system bus.

The means of unifying them into a single memory map, accessible via a single physical address, is discussed above. The task of unifying the result with mass storage, via a single virtual address, is discussed below.

[ 14. See Section 2.

15. A full treatment of the subject of virtual memory implementation more properly belongs in a text on operating systems, e.g. [Deitel 84]. ]


FIG. 23: Hierarchy of memory devices


FIG. 24: Virtual to physical address translation

Virtual to physical address translation

The primary sub-task of memory management is address translation. The virtual memory map is split up into a sequence of blocks which may be either fixed or variable in size, called pages or segments respectively. As a result the virtual address may be split into two fields…

• Block number

• Offset into block

All device memory maps are divided into block frames, each of which holds some or other block. Blocks are swapped between frames as required, usually across device boundaries, by the memory manager [16]. The physical location of every block is maintained in a block map table.

Address translation requires two operations…

• Look up base address

• Add base address to offset

…as shown in FIG. 24.
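For a paged memory the two operations collapse to a field extraction, a table look-up and a concatenation; a sketch (Python, with a hypothetical dictionary standing in for the block map table and 512-byte pages):

```python
PAGE_BITS = 9                                 # 512-byte pages

def translate(vaddr, page_table):
    """Map a virtual address to a physical one via the block map table."""
    page = vaddr >> PAGE_BITS                 # block number field
    offset = vaddr & ((1 << PAGE_BITS) - 1)   # offset-into-block field
    if page not in page_table:
        raise LookupError("page fault")       # block not in main memory
    frame = page_table[page]                  # look up base (frame number)
    return (frame << PAGE_BITS) | offset      # concatenate frame and offset

# e.g. virtual page 3 resident in page frame 7:
assert translate((3 << 9) | 0x15, {3: 7}) == (7 << 9) | 0x15
```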

Paged memory

For the moment we shall simplify the discussion by assuming a paged memory, though what follows also applies to a segmented memory. The size of a page will affect system performance. Locality has been shown to justify a typical choice of 512 bytes.

In fact the block map table must also record the physical memory device where the block is currently located. If it is not directly addressable in main memory a page fault event is reported to the processor. Whether software or hardware, the memory manager must respond by swapping in the required page to main memory. This strategy of swapping in a page when it is first referenced is called demand paging and is the most common and successful. Anticipatory paging is an alternative strategy which attempts to predict the need for a page before it has been referenced.

Security may be afforded by each scheduled process possessing its own distinct page map table. This may be used in such a way that no two processes are physically able to reference the same page frame even if their code is identical. It is only effective if no process other than the operating system is able to initialize or modify page map tables. That part of the operating system which does this is the memory allocation component of the process scheduler. It alone must have the ability to execute privileged instructions, e.g. to access a page map table base address register. Note that data or code may be shared by simply mapping a page frame to pages in more than one page map table.

A page replacement strategy is necessary since a decision must be taken as to which page is to be swapped out when another is swapped in. Exactly the same arguments apply here as for the replacement strategy used for updating the contents of an associative cache (see above). As before, the principle of locality suggests that the least recently used (LRU) strategy will optimize performance given structured code. Unfortunately it is very difficult to approximate efficiently. See [Deitel 84] for a full treatment of this topic.

[Note that this activity used to require software implementation, which would form part of the operating system. It is now typically subject to hardware implementation.]
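The LRU strategy itself is simple to state; it is only its exact maintenance on every reference that is costly. A sketch with an ordered map (Python; real memory managers merely approximate this, e.g. with referenced bits):

```python
from collections import OrderedDict

class LRUFrameSet:
    """Model of main memory holding a fixed number of page frames."""
    def __init__(self, nframes):
        self.nframes = nframes
        self.resident = OrderedDict()          # page -> True, oldest first

    def reference(self, page):
        """Reference a page; return True if a page fault (swap-in) occurred."""
        if page in self.resident:
            self.resident.move_to_end(page)    # now most recently used
            return False                       # hit
        if len(self.resident) >= self.nframes:
            self.resident.popitem(last=False)  # evict least recently used
        self.resident[page] = True             # swap in the demanded page
        return True

frames = LRUFrameSet(3)
faults = sum(frames.reference(p) for p in [1, 2, 3, 1, 4, 2])
```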


FIG. 25: Fragmentation of a segmented memory

Lastly, because address translation must occur for every single memory reference, speed is of the highest importance. As pointed out above, a table look up and an addition is required for each translation. Addition of page frame address to offset merely requires concatenation [ Only the page frame number is needed, rather than its complete base address: because pages have fixed size, the frame number alone completely specifies the frame location. The physical base address of a page frame is just its number followed by the appropriate number of zeros, e.g. nine zeros for a page size of 512 bytes. ]. Hence it is the table look up that limits performance. Because of this it is common to employ a dedicated associative cache for the page map table entries deemed most likely to be referenced next. This typically forms part of a processor extension called a memory management unit (MMU) which is also usually capable of maintaining the entire page map table without software intervention by independently responding to all page fault events.

Segmented memory

Everything said above about paged memory also applies to segmented memory, which offers an advantage in the ease of rendering security of access, at the cost of significantly more difficult memory management due to possible fragmentation of the memory map of every physical memory device.

Security is easier to achieve since the logical entities which require protection (e.g. the state of a process or object) will naturally tend to vary in size. It is easier to protect one segment than a number of pages.


FIG. 26: Address translation in a paged-segmented memory

Fragmentation is the term for the break up of a memory map such that free memory is divided into many small areas. An example schematic diagram is shown in FIG. 25. It arises due to repeated swapping in and out of segments which, by definition, vary in size. The damaging consequence is that, after a period of operation, a time will arrive where no contiguous area of memory may be found to frame a segment being swapped in.

At the expense of considerable complexity, it is possible to enjoy the best of both worlds by employing a paged segmented memory. Here the memory may be considered first divided into segments and subsequently into pages, with segment boundaries constrained to coincide with page boundaries.

FIG. 26 shows how address translation is now performed. A virtual address is composed of a triplet…

• Segment

• Page offset within selected segment

• Word offset within selected page

The segment number selects the page map table to be used. The page number selects the page, offset from the segment base. Finally the word number selects the word, offset from the page base. Note that, although only a single addition is required, two table look ups must be performed. That is the potential disadvantage. Very fast table look ups must be possible. Benefit from caching both tables is impossible if frequent segment switching occurs, e.g. between code and data.
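The triplet translation might be sketched as follows (Python; the nested dictionaries are hypothetical stand-ins for the segment and page map tables):

```python
PAGE_SIZE = 512

def translate(segment, page, word, segment_table):
    """Two look-ups, one addition: segment -> page table -> frame, plus offset."""
    page_table = segment_table[segment]   # first look-up selects the page table
    frame = page_table[page]              # second look-up selects the frame
    return frame * PAGE_SIZE + word       # single addition (a concatenation)

# segment 0 has two pages (frames 12 and 13); segment 1 has one (frame 40):
segment_table = {0: {0: 12, 1: 13}, 1: {0: 40}}
assert translate(0, 1, 7, segment_table) == 13 * 512 + 7
```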

FIG. 27 allows comparison of the appearance of the three different virtual memory organization schemes discussed.


FIG. 27: Three different kinds of virtual memory organization.

3. External communication (I/O)

3.1 Event driven memory mapped I/O

Ports

A port, in the real world, is a place where goods arrive and depart. In a computer the same purpose is fulfilled except that it is data which is received or transmitted. The most efficient way for a processor to access a port is to render it addressable on the system bus. Each port has its own distinct address. A read operation then receives data while a write transmits it. Because ports thus appear within the main memory map this technique is known as memory mapped I/O.

Communication is inherently asynchronous since the port acts as a buffer. Once data is deposited there, either a processor or external device, depending on direction of data transfer, may read it whenever it is ready. Synchronous communication is possible if…

• Port arrival

• Port departure

…events are reported (FIG. 28).


FIG. 28: Port arrival and departure events

Ports may be unidirectional or bidirectional, the direction of data transfer being under program control.

Device drivers

The process to which arrival and departure events are reported is called a device driver. Any system with more than one port for input or output must be multiprocessing, at least at the virtual level. In the now rare case where no interrupt generation is possible, polling of all ports must be iteratively undertaken to determine when events have occurred and to select [18] the appropriate device driver to generate a response.

The software architecture for a collection of synchronous communication port drivers is shown below expressed in Occam…

PAR i = 0 FOR devices
  WHILE running
    SEQ
      c.event[i] ? signal
      port[i] ? data
      process (data)

This code assumes the availability of a separate channel for each event. If only a single such channel is available it will be necessary to wait for a signal upon it and subsequently poll event sources.
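One pass of such a polling dispatcher might be sketched as follows (Python, with invented stand-in port objects; a real driver would read memory-mapped status registers):

```python
class Port:
    """Stand-in for a memory-mapped port with an event status flag."""
    def __init__(self):
        self.pending = False
        self.data = None

    def read(self):
        self.pending = False              # reading the port clears the flag
        return self.data

def poll_once(ports, handlers):
    """One pass over all ports, dispatching any pending events."""
    for i, port in enumerate(ports):
        if port.pending:
            handlers[i](port.read())      # select the appropriate driver

ports = [Port(), Port()]
log = []
handlers = [lambda d: log.append(("kbd", d)),
            lambda d: log.append(("net", d))]
ports[1].pending, ports[1].data = True, 42
poll_once(ports, handlers)                # only the second driver runs
```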

If a compiler is unavailable for a language which supports such real-time systems programming, interrupt service routines must be individually coded and placed in memory such that the interrupt control mechanism is able to vector correctly. Note that at least two separate routines are required for each device.

Portability is lost. It is not sufficient that a language supports the encoding of interrupt routines. It must also properly support programming of multiple concurrent processes.

In many real-time applications the process which consumes the data also acts as the device driver. This is not usually the case with general purpose computer workstations. Most programs for such machines could not run efficiently conducting their I/O via synchronous communication. In the case of input from a keyboard the running program would be idle most of the time awaiting user key presses. The solution is for the keyboard device driver to act as an intermediary and communicate synchronously with the keyboard and asynchronously with the program, via a keyboard buffer.

The function of the event driven device drivers, which form the lowest layer of the operating system in a work-station, is to mediate between running programs and external devices. They usually communicate synchronously with the devices and asynchronously with running processes.

[ 18. …using a CASE construct.

19. Asynchronous communication implies the use of a buffer which must be protected from simultaneous access by both consumer and producer via mutual exclusion.]

Protocol

External communication channel protocols may be divided into two classes…

• Bit serial

• Bit parallel

Parallel protocols support the transfer of all data bits simultaneously. Bit serial protocols support the transfer of data bits sequentially, one after the other.

Serial protocols must each include a synchronization protocol. The receiver must obviously be able to unambiguously determine exactly when the first data bit is to appear, as well as whether it is to be the most or least significant bit. One method of achieving synchronization is to transmit a continuous stop code until data is to be sent, preceded by a start code of opposite polarity. The receiver need only detect the transition between codes. However it must still know fairly accurately the duration of a data bit.

Serial interfaces are easily and cheaply implemented, requiring only a bidirectional shift register at each end of a 1-bit data channel.
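Word-level framing with start and stop codes might look like this (Python sketch; the idle-high line, single stop bit and least-significant-bit-first order are assumptions following the common asynchronous convention, which the text leaves open):

```python
def frame(byte, data_bits=8):
    """Start bit (0), data bits LSB first, stop bit (1, same polarity as idle)."""
    bits = [0]                                           # start code
    bits += [(byte >> i) & 1 for i in range(data_bits)]  # data, LSB first
    bits += [1]                                          # stop code
    return bits

def deframe(bits, data_bits=8):
    """Recover the data word; the 1 -> 0 transition located the start bit."""
    assert bits[0] == 0 and bits[-1] == 1
    return sum(b << i for i, b in enumerate(bits[1:1 + data_bits]))

assert deframe(frame(0x41)) == 0x41
```

Note how the receiver still depends on knowing the bit duration: it samples at fixed intervals after the start transition, which is why the baud rate must be agreed in advance.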

Both serial and parallel protocols require a transaction protocol. Perhaps the simplest is the busy/ready protocol. Each party emits a level signal indicating whether it is busy or ready to proceed. Each transaction commences with the sender asserting ready. When the receiver ceases to assert busy, data transfer commences and the receiver re-asserts busy, so indicating acknowledgement to the sender. The entire cycle is termed a handshake. Finally, when the receiver port has been cleared, it must resume a ready signal, allowing the next word to be transmitted.
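The handshake can be illustrated as an event trace (Python; a simulation of the signal sequence only, not of real electrical signalling):

```python
def busy_ready_transfer(words):
    """Simulate the busy/ready handshake, returning received data and a trace."""
    received, trace = [], []
    for w in words:
        trace.append("sender asserts READY")
        trace.append("receiver drops BUSY")        # ready to proceed
        received.append(w)                         # data transfer
        trace.append("receiver asserts BUSY")      # acknowledgement
        trace.append("receiver port cleared: READY")
    return received, trace

data, trace = busy_ready_transfer([0x10, 0x20])
assert data == [0x10, 0x20]
assert len(trace) == 8                             # one handshake per word
```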

Note that all protocols are layered. Layers of interest are…

• Bit

• Word

• Packet (Frame)

• Message

Only the first two have been discussed here. The rest are more properly treated in a text on digital communications.


FIG. 29: Parallel port registers of the 6522 VIA mapped into memory

Peripheral interface devices

Here two commercially available peripheral interface devices are introduced.

What follows is in no way intended to be sufficient to prepare the reader to design working systems. Rather, it is intended to impart something of the nature of current commercially available devices.

The 6522 Versatile Interface Adaptor (VIA) implements a pair of parallel ports whose function is subject to program control via the provision of a control register and status register. Further assistance is given the systems programmer in the provision of two programmable timers and a programmable shift register. Each port, each timer and the shift register are capable of generating interrupt requests. FIG. 29 depicts the device as it appears in the main memory map.

Data direction registers are included for each of the two ports, A and B. Each bit within determines whether the corresponding bit in the port is an output or an input. Thus, if required, each port can be subdivided into up to eight subsidiary ports by grouping bits together.

The peripheral control register determines the protocol of each independent port. Automatic input and/or output handshaking is provided as an option. Other options include, for example, the polarity of the handshake signals of the external device.

[ 20. See Section 5.

21. Their practical exploitation would require data sheets which are easily obtainable from electronic component suppliers.

22. …which may, with a little difficulty, be used as a serial port. ]


FIG. 30: Parallel port status register of the 6522 VIA

The VIA generates a single interrupt request following each type of event for which it is so enabled. Interrupt requests are enabled by setting the appropriate bit in the interrupt enable register. The device driver (interrupt service routine) must respond by polling the status of the VIA by reading the interrupt flag register which records the event which has occurred (FIG. 30).

The auxiliary control register decides whether or not data is latched as a result of a handshake, which would usually be the case. It also controls the shift register and the timers.

Timers are just counters, usually decremented by the system clock. Timer control allows for free running mode, where a timer repeatedly counts down from a value stored in it by the processor, or one shot mode, whereby it counts down to zero just once. An extremely useful option is to cause an event on each time-out, allowing the processor to conduct operations upon timed intervals. Timers may even be used to generate a waveform on a port output pin, by loading new values after each time-out, or to count edge signals arriving on a port input pin.

Shift register control allows for the shift timing to be controlled by…

• System clock tick

• Timer time-out

• External clock tick

…events. It also determines whether the shift direction is in or out and allows the shift register to be disabled. Note that no provision is made for a transaction protocol. This would have to be implemented in software using parallel port bits.

Serial communication is much better supported by the 6551 Asynchronous Communications Interface Adaptor (ACIA), whose memory mapped registers are shown in FIG. 31.


FIG. 31: Serial port registers of the 6551 ACIA mapped into memory

Bit level synchronization is established by use of an accurate special clock and predetermining the baud rate (the number of bits transferred per second). The control register allows program control of this and other parameters, such as the number of bits in the stop code (between one and two) and the length of the data word (between five and eight).

Like the VIA's interrupt flag register, the ACIA status register encodes which event has occurred and brought about an interrupt request (FIG. 32). The device driver must poll it to determine its response. Note that parity error detection is supported.

The command register provides control over the general function of the interface device. Parity generation and checking may be switched on or off.

Automatic echo of incoming data, back to its source, is also an option. Other options are the enabling/disabling of interrupt request generation on port arrival/departure events and the enabling/disabling of transactions altogether.

Serial communication has been traditionally used for the communication between a terminal and a modem, which connects through to a remote computer.

For this reason the handshake signals provided on commercial serial interface devices are called…

• Data terminal ready (sent)

• Data set ready (received)

Terminal communication

Thus far the mechanisms whereby the familiar terminal communicates data both in, from the keyboard, and out, to the "screen", remain unexplained in this volume.

Here is an overview of how a keyboard and a raster video display are interfaced to the system bus in a memory mapped fashion.

Every keyboard is basically an array of switches, each of which activates one row signal and one column signal, allowing its identity to be uniquely characterized by the bit pattern so produced. An encoder is then employed to produce a unique binary number for each key upon a "key press" event. It is not a difficult matter to arrange the codes produced to match those of the ASCII standard.


FIG. 32: Serial port status register of the 6551 ACIA


FIG. 33: Keyboard interface using a VIA port


FIG. 34: Raster scan of a phosphor screen

FIG. 33 shows the encoder connected to the system by means of a VIA port. The handshaking is not shown. Here the keyboard ready signal equates with key press event and should cause the VIA to generate an interrupt request and a handshake response (acknowledge).

The key matrix need not be large since certain bits in the character code output by the encoder are determined by the three special keys…

• Control (bits 5, 6)

• Shift (bits 4, 5)

• Alt (bit 7)

A minimum of 64 character keys, plus the special keys, is usually required. Thus an 8×8 matrix would be sufficient.

The raster video display is much more difficult. Current technology relies on the cathode ray tube (CRT) for physical display. It may be briefly summarized as an electron beam scanned across a very large array of phosphor dots, deposited on the inside of a glass screen, causing them to glow. The beam is scanned raster fashion (FIG. 34), typically with approximately one thousand lines. By varying the intensity of the beam with time, in accordance with its position, the screen is made to exhibit a desired brightness pattern, e.g. to display characters.

The rapidity with which the intensity may be varied depends upon the quality of the CRT and determines the maximum number of picture elements or pixels which may be independently rendered at different brightness. The screen is divided up into a two-dimensional array of pixels. It is arranged that this array be memory mapped, so that a program may modify the brightness pattern displayed simply by modifying the values stored in the array (FIG. 35). Typically, given a word width of one byte, a zero value results in the corresponding pixel being black (unilluminated) and FF (hexadecimal) results in it being white (fully illuminated). The digital values must be converted to an analogue of the intensity (usually a voltage) by a digital-to-analogue converter (DAC). Such a system would offer monochromatic graphics support. Color graphics support uses one of two possible techniques…

• Color look up table (CLUT)

• Red, green and blue primary color planes (RGB)


FIG. 35: Byte mapped raster video display

In either case a CRT is required which is capable of exciting three different phosphors, one for each primary color. Typically this is done using three separate electron beams whose intensity is determined by the outputs of three separate DACs. A CLUT is a memory whose address input is the logical color, stored in each pixel location in the screen map, and whose data output is the physical color consisting of the digitally encoded intensities of each of the three primary colors.
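The look-up itself is trivial; sketched in Python with an invented four-entry palette:

```python
# Hypothetical palette: logical color -> (red, green, blue) DAC intensities.
clut = {
    0: (0x00, 0x00, 0x00),   # black
    1: (0xFF, 0x00, 0x00),   # red
    2: (0x00, 0xFF, 0x00),   # green
    3: (0xFF, 0xFF, 0xFF),   # white
}

screen_map = [[0, 1], [2, 3]]            # one logical color per pixel

def physical_color(x, y):
    """Address the CLUT with the pixel's logical color from the screen map."""
    return clut[screen_map[y][x]]

assert physical_color(1, 1) == (0xFF, 0xFF, 0xFF)
```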

In a RGB system a distinct screen map is stored for each primary color.

Address interleaving may be employed to give the impression of a single screen map accessible, for example, as a two-dimensional array of 4-byte words. Each pixel location has one byte for each primary color intensity and one reserved for other kinds of pixel labelling.

Extra hardware is required for drawing characters. Returning to the simpler monochromatic graphics system, we shall now look at how characters are displayed.

A character is an instance of a class of graphic object of that name. In fact the class "character" may be subdivided into a number of subclasses called fonts, so that each displayed character is an instance of one or other font as well as of character and graphic object.

Each character (instance of a font) is most simply defined as a two dimensional array of pixel values. Usually character pixel values may only be black or white. FIG. 36 shows an example. Font design requires good software and much artistic skill. Modern work-stations support user expansion of a font library, usually by purchasing commercial designs.

The screen map must be accessible both from the system bus and from a raster display controller. A memory accessible from two separate buses is known as a dual port memory. Mutual exclusion must be enforced to prevent simultaneous access by both controller and system.


FIG. 36: Simple example of font character design


FIG. 37: Use of raster display controller to interface with a video display

[ 23. A very high quality system might define characters using grey levels as well.

24. Font marketing is a rather telling example of a new product which is pure information. One day a major share of the world economy might be the direct trade of pure information products over public communication networks.

25. ...more commonly referred to as a cathode ray tube controller (CRTC). An example is the 6545 CRTC integrated device.]

The raster display controller generates the addresses of locations for digital-to-analogue conversion, synchronized with display beam position. Synchronization with the display is achieved via…

• Horizontal sync

• Vertical sync

...signals. Upon receipt of horizontal sync the beam moves extremely rapidly back to the left-hand side of the screen and begins a new scan line. Upon receipt of vertical sync it flies back to the top left-hand corner to begin scanning a new frame. Three parameters alone are enough to characterize a raster display…

• Frame rate

• Line rate

• Pixel rate

FIG. 37 shows how a raster display controller is connected to the system bus, the dual port memory holding the screen map and the display itself.

In a pure graphics system the font ROM would be bypassed and the whole address required for the (much larger) screen map memory. Mutual exclusion of the screen memory may be achieved by inserting a wait state into the bus cycle if necessary. The necessary buffering of each memory connection (port) is not shown in the diagram.

In the character-oriented system shown, the screen map is made up of a much smaller array of character values, each one a code (usually ASCII) defining the character required at that position. A typical character-oriented display would be twenty-four lines of eighty characters. The code is used as part of the address in a font ROM [ It need not in fact be a ROM but usually this is the case. ] which stores the graphical definition of every character in every available font. The least significant bits determine the pixel within the character definition and are supplied by the controller. Just as a color graphics display requires multiple planes, an extra plane is required here to define the font of each character location. For example, address interleaving may be employed to give the appearance of a single two-dimensional array of 2-byte words. The least significant byte holds the character code, the most significant holds the font number, used as the most significant address byte in the font ROM.
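The font ROM address arithmetic might be sketched as follows (Python; the field widths are invented for illustration, assuming 256 characters per font and a 16-row character cell):

```python
ROWS_PER_CHAR = 16     # assumed character cell height (scan rows)
CHARS_PER_FONT = 256   # assumed characters per font

def font_rom_address(font, char_code, row):
    """Font number forms the most significant field, then the character code,
    then the scan row within the cell (the row is supplied by the controller)."""
    assert 0 <= char_code < CHARS_PER_FONT and 0 <= row < ROWS_PER_CHAR
    return (font * CHARS_PER_FONT + char_code) * ROWS_PER_CHAR + row

# font 1, character 'A' (ASCII 0x41), scan row 3:
assert font_rom_address(1, 0x41, 3) == (256 + 0x41) * 16 + 3
```

Each ROM word read out is one row of pixels for the character cell, shifted out serially as the beam sweeps that scan line.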

Systems which are capable of overlaying, or partitioning, text and graphics on the same display are obviously more complicated but follow the same basic methodology.


FIG. 38: Connection of system bus to external communications bus via an IOP

3.2 External communication (I/O) processors

Small Computer Systems Interface (SCSI)

There are typically many external devices with which a system must communicate. Some of these may be mass storage devices offering cheap, non volatile but slow memory. Others may facilitate communication with other systems, via a network interface, or communication with users, for example via a laser printer.

One approach, which has become commonplace, is to connect all external devices together onto an external bus interfaced to the system bus by an I/O processor (IOP). FIG. 38 shows such an arrangement. An example of such is the Small Computer Systems Interface (SCSI) [ANSI 86], which is well defined by a standards committee and well supported by the availability of commercially available integrated interface devices (e.g. NCR 5380). A detailed account of a hardware and software project using this chip may be found in [Ciarcia 86].

Every device must appear to have a single linear memory map, each location of which is of fixed size and is referred to as a sector or block. Each device is assigned a logical unit number whose value determines arbitration priority. Up to eight devices are allowed, including the host adaptor to the system bus. The system then appears on the external bus as just another device, but is usually given the highest priority. A running program may then program the SCSI interface to undertake required operations, e.g. the read command shown in FIG. 39.


FIG. 39: SCSI Read command

Each SCSI bus transaction is made up of the following phases…

1. Bus free

2. Arbitration

3. Selection

4. Reselection

5. Command

6. Data transfer

7. Status

8. Message

Arbitration is achieved without a dedicated arbiter. Any device requiring the bus asserts the BSY signal on the control subsidiary bus and also the data line whose bit number equals its logical unit number. If, after a brief delay, no higher-priority data bit is set, that device wins mastership of the bus.

Selection of the slave, or target, is achieved by asserting the SEL control signal together with the data bit corresponding to the required target and, optionally, that of the initiator. The target must respond by asserting BSY within a specified interval. If the target is conducting a time-intensive operation, such as a seek, it may disconnect and allow the bus to go free for other transactions. Afterwards it must arbitrate to reselect the initiator and complete the transaction.

SCSI seeks to make every device appear the same by requiring that each obeys an identical set of commands. The command set is said to be device independent and includes the following commands…

• Read

• Write

• Seek

An example of the format of a typical command is shown in FIG. 39. Note that, by setting a link flag, commands may be chained together to form an I/O program. Chained commands avoid the time-consuming process of arbitration.

Integrated SCSI interfaces, such as the NCR 5380, are capable of reading an I/O program in the memory of the host automatically via DMA.

Status and message phases are used to pass information about the progress and success of operations between initiator and target.


FIG. 40: Registers and connection of link processors

Hard channels (Links)

The severe limitation of bus communication is that total bandwidth is fixed for the whole system. As more devices are added to an external bus, a point is reached beyond which performance falls. An alternative approach is inspired by the model of external devices as processors running processes in their own right. Data flow between processors is facilitated by providing dedicated hard channels. Bandwidth then expands to whatever is required, as long as sufficient hard channels are available.

So far only one architecture offers hard channels. The Inmos Transputer is a complete computer with processor, memory and four hard channels (links), integrated into a single physical device. Links may be connected directly to other Transputers or indirectly, via link adaptors, to "alien" devices.

Each link is controlled by a link processor which implements synchronous communication by means of a rendezvous at a dedicated memory location. It is programmed by means of dedicated instructions (in and out) which transfer three values into its own private registers (FIG. 40). The local process is identified by its workspace pointer, which in turn identifies the PC value. The workspace pointer is stored so that, when the transaction is complete, the process may be rescheduled simply by copying it to the tail of the ready queue. The message is identified simply by a pointer. A count of the bytes to be transferred is decremented each time a byte is successfully transferred; the transaction is complete when the count reaches zero. Then the locally participating process, be it sender or receiver, is rescheduled and thus allowed to proceed and execute its next instruction as soon as its turn comes to run again.
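The three-register mechanism and the rescheduling step can be sketched as follows. This is a behavioural model only: the register and function names are illustrative (not Inmos's), and the byte-at-a-time loop stands in for the hardware transfer engine.

```python
# Sketch of a Transputer link transaction: three private registers per
# side, byte-by-byte transfer, then rescheduling of both processes.
from dataclasses import dataclass
from collections import deque

@dataclass
class LinkRegisters:
    """The three values an `in` or `out` instruction loads (names here
    are illustrative): the local process's workspace pointer, a pointer
    to the message, and a count of bytes still to transfer."""
    workspace: int
    pointer: int
    count: int

def transfer(sender: LinkRegisters, tx_memory: bytearray,
             receiver: LinkRegisters, rx_memory: bytearray,
             ready_queue: deque) -> None:
    """Move one byte at a time, decrementing each count; when the count
    reaches zero the transaction is complete and both workspace pointers
    are copied to the tail of the ready queue for rescheduling."""
    assert sender.count == receiver.count
    while sender.count > 0:
        rx_memory[receiver.pointer] = tx_memory[sender.pointer]
        sender.pointer += 1
        receiver.pointer += 1
        sender.count -= 1
        receiver.count -= 1
    ready_queue.append(sender.workspace)
    ready_queue.append(receiver.workspace)
```

Because a process is rescheduled only when its count reaches zero, the rendezvous gives fully synchronous channel semantics with no polling by either processor.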

The link protocol consists of byte data transfers, each in a frame with two start bits and one stop bit. An acknowledge handshake, confirming reception, is returned by the receiver even before the data transfer is complete (FIG. 41), so that no delay is required between transactions.
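The framing just described can be sketched as follows. The two start bits, eight data bits and zero stop bit follow the text; the LSB-first data ordering and the two-bit acknowledge shape are assumptions of this sketch.

```python
# Sketch of link frame construction (bit ordering is an assumption).

def data_frame(byte: int) -> list:
    """Frame one data byte: two start bits (1, 1), eight data bits
    (shown here LSB first), and one stop bit (0) -- 11 bits in all."""
    bits = [(byte >> i) & 1 for i in range(8)]
    return [1, 1] + bits + [0]

def ack_frame() -> list:
    """Model the acknowledge as a minimal two-bit frame: a start bit
    followed by a zero, distinguishable from a data frame's 1, 1."""
    return [1, 0]
```

Since the receiver can return the acknowledge as soon as it has recognized the start of a data frame, the sender may begin the next frame immediately after the stop bit.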


FIG. 41: Protocol of a hard channel (link)

The advantages of a system composed of a number of Transputers over a purely bus based system may be summarized as follows…

• Communication bandwidth increases linearly with number of Transputers

• Arbitration is accounted for in scheduling mechanism

• Link adaptors are easier to implement than bus adaptors, requiring no dedicated software overhead

See Section 10 for more about Transputers, which represent a major step change in paradigm, and not just in the area of external communications.

Exercises

Question one

i Show, by example, why a two-dimensional memory is most efficiently rendered square.

ii The cost per bit (ci ), size (si ) and access time (ti ) of memory device i in a given memory hierarchy are such that… The hit ratio, hi , of each device is defined as the proportion of references satisfied by device i without recourse to one lower in the hierarchy.

Give expressions for the following…

• Mean cost per bit

• Mean access time

• Cost efficiency (expressed as cost per bit of cheapest device over mean cost per bit)

• Access efficiency (expressed as access time of fastest device over mean access time)

What is the overall cost efficiency and access efficiency for the memory hierarchy described below (typical for a contemporary work-station)…

Device   Cost (pence per bit)   Size (bytes)   Access time (ns)   Hit ratio
1        10^1                   10^3           10^1               0.9
2        10^0                   10^6           10^3               0.9999
3        10^-2                  10^8           10^7               1.00

What would the hit ratio of the topmost device have to be to yield an overall access efficiency of 10%?

Question two

i Summarize the component signal channels of the system control bus. Include channels for all signals mentioned in this Section.

ii Draw a timing diagram for daisy chain bus arbitration. Explain how a lower priority device, which requests the bus simultaneously with a higher priority one, eventually acquires mastership.

iii Some system bus implementations use polled arbitration whereby, when the bus is requested, the arbiter repeatedly decrements a poll count which corresponds to a device number. As soon as the requesting device recognizes its number, it asserts a busy signal and thus becomes bus master. This ensures that, if more than one device requests the bus, the one with the highest number is granted it.

Contrast the advantages and disadvantages of the following three bus arbitration protocols.

• Daisy chain

• Polled

• SCSI bus method

Question three

i Draw a schematic diagram showing how an interleaved memory, three DACs, a raster display controller and an RGB video monitor are connected together to yield a three-plane RGB color graphics display.

ii Show how the following components…

• VIA

• Modulo 8 counter

• 3-bit decoder

• 3-bit encoder

…may be connected in order to read a very simple 64-key unencoded key matrix, which consists simply of overlaid rows and columns of conductors such that the intersection of each row and column pair may be shorted (FIG. 42). Explain how it is read, to produce a 6-bit key code, upon a key press event.

Question four

The LRU replacement strategy, for either an associative cache or a demand paged virtual memory, is difficult and slow to implement since each entry must be labelled with a time stamp and all entries consulted to determine when one is to be replaced.

Devise an alternative implementation which approximates LRU yet is efficient both in extra memory for entry labelling and in the rapidity with which the entry to be replaced may be determined. The minimum amount of combinational logic must be employed.

Question five

i Using the Hamming code described in this Section, derive the syndrome for the data word FAC9 (base 16).

ii Show that every possible error, both in data and in syndrome, produces a distinct error vector when the Hamming code described in this Section is employed.


FIG. 42: 64-key unencoded key matrix

