Guide to Computer Architecture and System Design--DESIGN METHODOLOGY AND EXAMPLE SYSTEMS (part 1)





1. PARALLEL PROCESS FACTORS

Parallel processing can be attributed to the following factors:

Pipelining

It is a technique of decomposing a sequential process into sub-operations, with each sub-process being executed in a special dedicated segment that operates concurrently with all other segments (arithmetic pipelines and instruction pipelines). RISC uses an efficient instruction pipeline. Data dependency is tackled by compiler support on a RISC machine for proper subtask scheduling.
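As a rough illustration (not from the text), the gain can be counted in clock cycles: a k-segment linear pipeline finishes n tasks in about k + n - 1 cycles instead of the k * n cycles of purely sequential execution. The small C sketch below assumes illustrative values for k and n.

/* pipeline_gain.c - a minimal sketch of linear pipeline speedup */
#include <stdio.h>

int main(void)
{
    int k = 4;                    /* number of pipeline segments (assumed) */
    int n = 100;                  /* number of tasks (assumed)             */
    int pipelined  = k + n - 1;   /* first result after k cycles, then one per cycle */
    int sequential = k * n;       /* each task passes through all k segments alone   */

    printf("sequential cycles: %d\n", sequential);
    printf("pipelined cycles : %d\n", pipelined);
    printf("speedup          : %.2f\n", (double)sequential / pipelined);
    return 0;
}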

Vector processing

For scientific computing, the vector-instruction capability has to be tapped with a properly pipelined approach (in a multiple-processor environment) to respond to real-time applications. Speedup is most often achieved with supercomputers.

Array processors

This is achieved by deploying more functional units, basically to support SIMD with pipelines, besides meeting fault-tolerance requirements with modular redundancy. SIMD benches already offer good service as special attachments to general-purpose computers.

In a multiprocessing environment, cache coherence problems arise because of the need to share writable data. In hardware solutions, the cache controller is specifically designed to monitor bus requests from all CPUs and IOPs; such a controller is referred to as a snoopy cache controller.
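As a hedged sketch of the idea (the states and transitions below are a simplified MSI-style protocol, not any particular machine's controller), a snoopy controller watches bus traffic and downgrades or invalidates its own copy of a cache line:

/* snoop.c - simplified snoopy cache line state machine (illustrative only) */
typedef enum { INVALID, SHARED, MODIFIED } line_state;
typedef enum { BUS_READ, BUS_WRITE } bus_op;

/* Called when the controller observes another CPU's or IOP's bus request
   for an address this cache also holds. */
line_state snoop(line_state current, bus_op op)
{
    switch (current) {
    case MODIFIED:                    /* dirty copy: flush, then downgrade */
        return (op == BUS_READ) ? SHARED : INVALID;
    case SHARED:                      /* a remote write invalidates us     */
        return (op == BUS_WRITE) ? INVALID : SHARED;
    default:
        return INVALID;
    }
}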

Today's computing architectures are aimed at reliability, availability and serviceability, for these have gained a high degree of confidence in user applications. Some of the fault-tolerant machines include the ESS (Electronic Switching System), NASA's SIFT for commercial aircraft, System R of the IBM San Jose Research Laboratory for database management, and the PLURIBUS of the ARPA (Advanced Research Projects Agency) computer network for real-time needs. The complexity of many software systems has become unmanageable, and the natural consequences are delays and unreliability.

The consequences of software errors and failures are normally more serious for on-line systems than for batch processing machines. Thus software must be consistent, robust and fail-safe to approach the behavior of a fault-free computer.

Spoofing is an active endeavor in which the offender induces the system to provide the desired information, while bugging is a passive activity in which the offender must await user communications and can only steal what the users transmit. Hardware reliability calls for good modular designs and testing for fault coverage in the VLSI sphere. System reliability, on the other hand, has to concentrate on software redundancy techniques like N-version programming, good module coupling and cohesion, thorough self-diagnostics, and a range of debugging tools that serve as real working aids alongside the ever-improving knowledge power of the irreplaceable human experts on intelligent platforms.

The present decade may see natural language processing and neural networking taken up as architectural blends.

Parallel processing and multiprocessing have appeared since the 1980s to drive newer systems.

The Cray-1 and Cyber-205 use vectorizing compilers. Examples of multiprocessor systems include the Univac 1100/80, the Fujitsu M382, the IBM 370/168 MP, and the Cray X-MP. A high degree of pipelining and multiprocessing is greatly emphasized in commercial supercomputers. In what follows, the principles of multiple processes and some example computer structures are touched upon to depict the architectural trends over the past two decades.

2. PARALLEL PROCESSING MOTIVES

Research and development of multiprocessor systems are aimed at improving throughput, reliability, flexibility and availability. The speedup that can be achieved by a parallel machine with n identical processors working concurrently on a single problem is at most n times that of a single processor, whereas throughput is defined more for a batch process. In practice, the actual speedup varies from log2 n to an upper bound of n/ln n. More often the achievable speed is dictated by the chosen problem and the supporting algorithms. Computer systems have improved through differing phases such as batch processing, multiprogramming, time-sharing and multiprocessing. Varying amounts of parallel processing mechanisms have been captured in uniprocessor Von Neumann designs. The techniques include multiplicity of functional units, parallelism and pipelining within the CPU, overlapped CPU and I/O operations, and balancing of subsystem bandwidths.
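The quoted range can be tabulated directly; the short C sketch below (illustrative only) prints the log2 n lower figure and the n/ln n upper bound against the ideal speedup of n.

/* speedup_bounds.c - compile with -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
    for (int n = 2; n <= 64; n *= 2) {
        double lower = log2((double)n);       /* pessimistic estimate */
        double upper = n / log((double)n);    /* quoted upper bound   */
        printf("n = %2d  lower ~ %5.2f  upper ~ %6.2f  (ideal %d)\n",
               n, lower, upper, n);
    }
    return 0;
}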

Parallel processing applications in various spheres call for basic scientific research, in fields ranging from database maintenance systems to supercomputing targets.

Memory Subsystems

Block-structured programs yield a high degree of modularity, as found with structured languages like Pascal, C and Algol. The modules are compiled to produce machine code in a logical space which may be loaded, linked and executed. The set of logically related contiguous data elements so produced is commonly called a segment. Segments are allowed to have variable sizes, unlike pages. The method of mapping a virtual address into the physical address space, and the sharing of segments, are critical design factors. The Burroughs B5500 uses segmentation techniques. Each system process has a Segment Table (ST), pointed to by a segment table base register (STBR) while the process is active. The STBR permits relocation of the ST, itself a segment, which resides in main memory with the active programs. The page fault rate function is minimized by dynamic dataflow computing, good memory allocation strategies and cache memory maps. Pre-fetching with a swap facility suits the time-sharing configuration.
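A minimal sketch of the translation step, assuming an illustrative table layout rather than the actual B5500 formats: the segment number indexes the segment table addressed through the STBR, the offset is bounds-checked, and the physical address is base plus offset.

/* seg_xlate.c - illustrative segmented address translation */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    unsigned base;     /* physical base address of the segment */
    unsigned length;   /* segment length in words              */
} seg_entry;

/* seg_table stands in for the ST that the STBR would point at. */
unsigned translate(const seg_entry *seg_table, unsigned seg_no, unsigned offset)
{
    const seg_entry *e = &seg_table[seg_no];
    if (offset >= e->length) {          /* protection / bounds violation */
        fprintf(stderr, "segment fault\n");
        exit(1);
    }
    return e->base + offset;            /* physical address */
}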

Since terminals are relatively slow devices which interact often, serial data transmission assures reliable communication. The Intel 8089 input/output processor provides the parallel data transfer used with microcomputers. Distributed data processing involves a good amount of memory management of shared and sharable information. Multiprocessors are classified by the way their memory is organized. A multiprocessor system with common shared memory is called a shared-memory or tightly coupled multiprocessor. Loosely coupled or distributed-memory systems have their own private local memories and prove fruitful when the interaction between tasks is minimal. The bus being a constraint, various interconnection methods like time-shared bus, multiport memory, crossbar switching and hypercube are used with loosely coupled processing elements.

3. PIPELINE MECHANISM AND MACHINES

The efficient utilization of processor elements during pipelining and the effective control of program parameters play a significant role in determining response times.

The number of tasks that can be completed by a pipeline per unit time is called its throughput. The efficiency of a linear pipeline is measured by the percentage of busy time-space spans over the total time-space span, which equals the sum of all busy and idle time-space spans. An intelligent compiler must be developed to detect the concurrency among vector instructions so that parallelism can be installed, which otherwise is lost through the use of conventional languages. It is also desirable in a CISC (complex instruction set computer) to use high-level programming languages with rich parallel constructs on vector processors and multi-process protocols. Algorithms do play a dominant role in process optimization.
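For a linear pipeline of k segments, n tasks and clock period tau, the busy time-space span is n*k segment-cycles out of a total of k*(k + n - 1); the sketch below (assumed values, illustrative only) evaluates the resulting efficiency and throughput.

/* pipe_measures.c - efficiency and throughput of a linear pipeline */
#include <stdio.h>

int main(void)
{
    int    k   = 6;        /* segments (assumed)             */
    int    n   = 200;      /* tasks (assumed)                */
    double tau = 20e-9;    /* clock period, 20 ns (assumed)  */

    double efficiency = (double)(n * k) / (k * (k + n - 1));  /* = n/(k+n-1)      */
    double throughput = n / ((k + n - 1) * tau);               /* tasks per second */

    printf("efficiency: %.3f\n", efficiency);
    printf("throughput: %.3e tasks/s\n", throughput);
    return 0;
}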

The attached array processors include the AP-120B (FPS-164) and the IBM 3838. Vector computers include the early systems Star-100 and TI-ASC, having speeds of 30 to 50 million operations per second, and the further improved systems like the Cray-1, Cyber-205 and VP-200. The Cray-1 is not a "stand-alone" computer; a front-end host computer serves as the system manager. The computation section, using a 12.5 ns clock period, has a 64-bit wordlength and plenty of register storage. The memory section has interleaved banks, each of 65,536 words, and also incorporates the SECDED (single error correction and double error detection) facility for safe communication. The I/O section has 12 input and output channels, also meeting priority resolvers.

The efficiency of a computer depends heavily on the inherent parallelism in the programs it executes. The system designers and programmers share the responsibility of exploiting the embedded parallelism on pipelined machines.

Multiple Processors Programming

Multiprocessing supports concurrent operations among co-operating processes in spite of shared resources. The Carnegie-Mellon Cm*, the Cyber-170 and the PDP-10 are some model examples of multiprocessing architecture. The symmetric systems facilitate error recovery in case of failure, by techniques like N-version programming and backtracking.

Recoverability, however, is not synonymous with reliability. The inherent redundancy of a multiprocessor improves the fault-tolerant character of a system. The connectivity of a multiprocessor organization is decided by whether the nodes are loosely or tightly coupled and by the adjacency matrix. The time-shared bus configuration has the least hardware cost and is fit for a small number of processors. The crossbar switch means tight connectivity and calls for reliability measures. Multiport memories allow distributed memory management.

The presence of private caches in a multiprocessor necessarily introduces cache coherence problems that may result in data inconsistency. Non-homogeneous processing elements with differing functionality call for software resource managers in a multiprocessor system. Program control structures aid the programmer in writing efficient parallel constructs. The high degree of concurrency in a multiprocessor can increase the complexity of fault handling, especially in the recovery step. The use of transputers for scientific computing employs concurrency among the coordinating process elements. Database distribution for a loosely coupled multiprocessor environment calls for good system design in addition to the embedded physique of the machine. Scaling up the activities and speedup are contrasting attributes to be met by effective algorithmics on the respective machines.

Multiprocess Control

With a high degree of concurrency in multiprocessors, deadlocks arise when members of a group of processes holding resources are blocked indefinitely from access to resources held by other processes within the group. A deadlock is sustained by the following conditions: mutual exclusion, non-preemption, hold-and-wait, and circular wait.
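One common remedy, sketched below with POSIX mutexes purely as an illustration, is to break the circular-wait condition by imposing a global order in which every process acquires its resources.

/* lock_order.c - breaking circular wait by ordered acquisition (sketch) */
#include <pthread.h>

static pthread_mutex_t res_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t res_b = PTHREAD_MUTEX_INITIALIZER;

/* Every process takes res_a before res_b, so no cycle of waits can form. */
void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&res_a);
    pthread_mutex_lock(&res_b);
    /* ... use both shared resources ... */
    pthread_mutex_unlock(&res_b);
    pthread_mutex_unlock(&res_a);
    return NULL;
}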


FIG. 1 SIMD Queue Model

Consider a system consisting of P identical processing elements and a single infinite queue to which processes arrive. The mean processing time of processes on each processor is 1/μ and the mean inter-arrival time of processes to the system is 1/λ. Figure 1 shows the model.

The utilization of the processors is ρ = u/P, where u is the traffic intensity given by

u = λ / μ
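A small numeric sketch of the model (the arrival and service rates below are assumed, not from the text) shows how u and ρ are obtained and when the queue remains stable (ρ < 1).

/* queue_model.c - traffic intensity and utilization for Fig. 1 */
#include <stdio.h>

int main(void)
{
    double lambda = 40.0;   /* arrivals per second (assumed)                  */
    double mu     = 10.0;   /* completions per second per processor (assumed) */
    int    P      = 8;      /* number of identical processing elements        */

    double u   = lambda / mu;   /* traffic intensity     */
    double rho = u / P;         /* processor utilization */

    printf("traffic intensity u = %.2f\n", u);
    printf("utilization rho     = %.2f %s\n", rho,
           rho < 1.0 ? "(stable)" : "(queue grows without bound)");
    return 0;
}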

The number of parallel tasks, the pipe length of each task and the vectorization factors play a dominant role on an MIMD (multiple instruction stream, multiple data stream) multiprocessing bench, along with balanced system bandwidths and compiler-compilers for portability, to augment the parallel processing scenario.

4. VLSI ARCHITECTURE

The constraints of power dissipation, I/O pin count and long communication delays are prominent in VLSI. Properly designed parallel structures that need to communicate only with their nearest neighbors will gain the most from very-large-scale integration. The delay in crossing a chip on polysilicon, one of the three primary interconnect layers on an NMOS chip, can be 10 to 50 times the delay of an individual gate. WSI (Wafer Scale Integration) implementations of highly parallel computing structures demand high yield on the wafer. The wafer is structured so that the presence of faulty processing elements is masked off and only functional ones are used. Many practical problems of testing, routing around a faulty PE, power consumption, synchronization, and packaging remain open even to date. Feature extraction and pattern classification are initial candidates for possible VLSI implementation. The eigenvector approaches to feature selection and Bayes quadratic discriminant functions must be realizable with VLSI hardware. VLSI computing fits applications in image processing, syntactic pattern recognition, pictorial queries and database systems, gaining reliability.

Packet-switching networks for dataflow multiprocessors and wafer-scale integration of the switch lattice are worth mentioning among the parallel processing architectures.

5. CERTAIN SYSTEM CONCEPTS

Illiac IV System Concepts

The Illiac IV project was started in the Computer Science Department at the University of Illinois on the principle of parallel operation, to achieve a computational rate of 10^9 instructions per second. The system employed 256 processors operating simultaneously under a central control.

The logical design of Illiac IV is patterned after that of the Solomon computers. In this design, a single master control unit (CU) sends instructions to a sizable number of independent processing elements (PEs) and transmits addresses to individual memory units associated with these PEs. The design of Illiac IV contained four CUs, each of which controlled a 64-ALU array processor. Each PE has its own 2048-word, 64-bit memory, called a PE memory, which can be accessed in no longer than 350 ns. I/O is handled by a host B6500 computer system. The operating system, inclusive of assemblers and compilers, resides in the B6500.

Illiac IV was indeed used as a network resource by the ARPA network system. The instruction set included a good number of arithmetic and logical functions. The B6500 supervises a 10^12-bit write-once read-only laser memory developed by the Precision Instrument Company. The beam from an argon laser records binary data by burning microscopic holes in a thin film of metal coated on a strip of polyester sheet, which is carried by a rotating drum. The read and record rate is four million bits per second. A projected use of this memory allows the user to "dump" large quantities of programs and data into this storage medium for leisurely review at a later time; hard-copy output can optionally be made from files within the laser memory.

Super Computer Architecture Example

The Texas Instruments Advanced Scientific Computer (TI-ASC) is a highly modular system offering a wide spectrum of computing power and configurability. It also has a peripheral processor and supports on-line bulk storage. A significant feature of the central processor hardware is an operand look-ahead capability. Data communications are controlled by a data concentrator which, in turn, interfaces to the memory control unit through a channel control device. The TI-980, the data concentrator, is a general-purpose computer with up to 64K 16-bit words of memory and a one-microsecond cycle time.

The operating system provides for buffering, reformatting, routing, protocol handling, error control and recovery procedures, and system control messages. Standard types of magnetic tape drives, card equipment, and printers are interfaced to the ASC. The memory control unit, acting asynchronously, is the specialty that lets the system keep pace with the technology race.

Staran Array Processor

The associative or content-addressed memory can retrieve, in a single memory access, any data from memory. The STARAN S is an associative array processor consisting of a symbolic assembler called APPLE and a set of supervisor, utility, debug, diagnostic and subroutine library program packages. Actual applications have been in real-time sensor-related surveillance and control systems. In an air traffic control application, a two-array STARAN S-500 was interfaced via leased telephone lines with the output of the FAA ARSR long-range radar at Suitland, Maryland. Digitized radar and beacon reports for all air traffic within a 55-mile radius of Philadelphia were transmitted to STARAN in real time. A processing function for locating specific character strings (such as place names) in textual information was developed for STARAN that ran 100 times faster than on a Sigma 5 conventional machine. The large-scale data management architects are certainly the beneficiaries of the associative mapping.

The Goodyear Aerospace STARAN and the Parallel Element Processing Ensemble (PEPE), built around an associative memory (AM) instead of the conventional random-access memory, belong to the SIMD class. AM is content-addressable, also allowing parallel access to multiple memory words. This allows for use in searching databases, besides the image processing and pattern recognition domains.

The processor's utility and complexity are often dedicated to the control it supports.

Control architectures range from simple one-by-one instruction execution to the wait/await forms of dataflow computing, besides the other SIMD and MIMD benches.

The MPP System

In 1979, the massively parallel processor (MPP) for image processing was brought out by Goodyear Aerospace, containing 16,384 microprocessors operating in parallel to process satellite images. Multiple-SIMD systems are dedicated to massively large arithmetic and logical processing. High-level languages like Tranquil and Glypnir are extensions for the Illiac IV machine toward array processing.

One of the first projects aimed at demonstrating the advantages of the RISC (reduced instruction set computer) architecture was conducted at the University of California, Berkeley.

The Berkeley RISC I is a 32-bit integrated-circuit CPU that supports 32-bit addresses and either 8-, 16- or 32-bit data. It has a total of 31 instructions and a 32-bit instruction format containing an 8-bit fixed-length opcode. It has 138 registers. The strategy employed is more hardwired control, and a rich register set with compact addressing modes allows these machines to fit distributed parallel processing ends. Thus, a multiprocessor scheme of this type allows desirable features in a server-client architecture, in order to meet continuing throughput under continuous system operation.

In universal high-speed machines it is necessary to have a large-capacity, fast-access main store. Though it is technically feasible to make a one-level memory hierarchy, it is more economical to provide a core-store and drum combination with virtual memory support. High-speed computing splits the program store into four separate stacks, extracting many instructions in a single cycle to offset the 2 μs machine cycle time.

The memory segmentation of the Intel processors is similar to the usage of multiple stacks.

Burroughs' B 6500 Stack Mechanism

The Burroughs B6500/B7500 use the stack architecture mechanism to cope with multiple processes and support the languages ALGOL, COBOL, and FORTRAN. Some salient features of these systems are dynamic storage allocation, re-entrant programming, recursive procedure facilities, a tree-structured stack organization, memory protection and an efficient interrupt system. The software has to cope with good bookkeeping of the stacks used dynamically during program execution and meet the tangible events of data flow and instruction fetches. The addressing environment of the program is maintained by hardware. The stack orientation of the supervisor at runtime thus establishes an opening for multiple processes composed of complex program modules to run smoothly.

VAX-11 Extension to the DEC Family

VAX-11 is the Virtual Address extension of the PDP-11 architecture. It extends the 16-bit virtual addressing of the PDP-11 to 32 bits, giving an address capacity of 4.3 gigabytes. The high-level language compilers accept the same source languages as the equivalent PDP-11 compilers, and execution of the compiled programs gives the same results.

Its environment is real-time operation with a Unibus tradition. The VAX-11 is actually quite stack oriented and, although it is not optimally encoded for the purpose, can easily be used as a pure stack architecture if desired. It employs sixteen 32-bit general registers which are used for both fixed- and floating-point operands. Besides the program counter and stack pointer, the frame pointer (FP) and argument pointer (AP) need special mention. The VAX-11 procedure calling convention builds a data structure on the stack called a stack frame; FP contains the address of this structure, and AP contains the address of the data structure depicting the argument list. The T bit of the PSW, when set, forces a trap at the end of each instruction; this trap is useful for program debugging and analysis. The software is supported by a good mix of addressing schemes. The virtual address space of 4.3 gigabytes supports timesharing jobs and priority scheduling. The CPU is a microprogrammed processor which implements the native and compatibility-mode instruction sets, the memory management, and the interrupt and exception mechanisms. Thus the system fits into the class of maxi computers of the uniprocessor type.

Any mix of four Unibus or Massbus adapters provides for attaching peripheral buses that are not compatible with the VAX-11/780 processor/memory.

B 5000 Design Issues

A hardware-free (machine-independent) notation with symbol-manipulative capabilities, like ALGOL, was used in the B 5000 processor. Speed of compilation and program debugging were improved in order to reduce problem time. Program syntax should permit an almost mechanical translation from source languages into efficient machine code, as is the case with COBOL. Parallel processing can be facilitated for a multiprogramming domain. The character mode of usage allows list structures as employed in information processing with interpreter languages. Each program word contains 4 syllables, equal to 48 bits of store. The machine follows a stack mechanism for addressing capacity and a reliable amount of data transfer. It employs program reference tables and Polish notation to ease object-code generation. The B 5000 also uses nesting of subroutines and a dynamic storage allocation policy.
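The appeal of Polish (postfix) notation for a stack machine can be seen in a few lines: the object string is consumed left to right, with operands pushed and operators applied to the top of the stack, and no precedence bookkeeping is needed. The C sketch below is illustrative only (single-digit operands, two operators).

/* postfix.c - evaluating a reverse-Polish string with a stack */
#include <ctype.h>
#include <stdio.h>

int eval_postfix(const char *p)
{
    int stack[64], top = 0;
    for (; *p; ++p) {
        if (isdigit((unsigned char)*p)) {
            stack[top++] = *p - '0';                     /* push literal   */
        } else if (*p == '+' || *p == '*') {
            int b = stack[--top], a = stack[--top];
            stack[top++] = (*p == '+') ? a + b : a * b;  /* apply operator */
        }
    }
    return stack[0];
}

int main(void)
{
    printf("%d\n", eval_postfix("234*+"));   /* 2 + 3*4 = 14 */
    return 0;
}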

PDP-11 Evolution

Sets of computing systems exhibiting architectural similarity form a family. The evolving technology accounts for much of the categorization, besides the program compatibility features. A family is usually planned to span a wide cost and performance range, which has to cope with software reuse, engineering and interchangeable peripherals. The PDP-11 of Digital Equipment Corporation, a minicomputer, is a system evolved for range. Weaknesses like limited addressability, a small number of working registers, non-provision of stack facilities, and limited I/O processing were improved upon to build the PDP-11 general-purpose computer. The system is primarily intended for scientific computation because of the addition of six 64-bit registers for floating-point arithmetic. It follows a Unibus architecture to accommodate the performance utility. A read-only memory can be well used for re-entrant code but is definitely unfit for the interactive I/O used extensively in display processors and signal processing. Despite the deficiencies, the system could support a multiprogramming environment meeting both COBOL and FORTRAN users. It operates under RSX-11M, a real-time system for project groups.

Thus a good amount of information science has gone into the PDP systems' range coverage.

The machine manufacturers chose either to hold cost constant with increasing functionality or to decrease the cost while maintaining constant functionality on the growth path.

PLAs (Programmable Logic Arrays) and microprogramming (the firmware concept) have been deployed actively in keeping such systems viable.

Cray-1 System

A maxi-computer is the largest machine that can be built in a given technology at a given time. The Cray-1 supports fast floating-point operation (20 to 50 million flops) and makes use of an optimized Fortran compiler for vector processing. The system is equipped with 12 I/O channels, 16 memory banks, 12 functional units and more than 4 Kbytes of register storage. Though the machine is of the CISC category, the instructions are expressed in either one or two 16-bit parcels. The arithmetic and logic instructions occupy a 7-bit opcode.

With a good capability for pipelining, the system indicates interrupt conditions by the use of a 9-bit flag register. The interrupt conditions are: normal exit, error exit, I/O interrupt, uncorrected memory error, program range error, operand range error, floating-point overflow, real-time clock interrupt, and console interrupt. Floating-point numbers are composed of a 49-bit signed-magnitude fraction and a 15-bit biased exponent, while integer arithmetic is performed in 24-bit or 64-bit 2's complement form. The addressing scheme is highly flexible for array operations. The Cray operating system is a batch type supporting 63 jobs in a multiprogramming environment. The Cray Assembler Language has the features of a powerful macro assembler, an overlay loader, a full range of utilities including a text editor, and some debug aids. Front-end computers can be attached to any of the I/O channels without affecting the Cray's performance. Cache memory, dynamic microprogram mapping and optimum algorithms shall go a long way in aiding aspiring Cray users.
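A 49-bit signed-magnitude fraction plus a 15-bit biased exponent fill a 64-bit word (1 sign bit, 15 exponent bits, 48 coefficient bits). The field positions used in the sketch below are assumptions made for illustration, not a statement of the Cray-1 word layout.

/* cray_fp_fields.c - unpacking an assumed sign/exponent/coefficient layout */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t word = 0x4001C00000000000ULL;                    /* arbitrary test pattern */

    unsigned sign        = (unsigned)(word >> 63);            /* 1 bit   */
    unsigned exponent    = (unsigned)((word >> 48) & 0x7FFF); /* 15 bits */
    uint64_t coefficient = word & 0xFFFFFFFFFFFFULL;          /* 48 bits */

    printf("sign=%u exponent=0x%04X coefficient=0x%012llX\n",
           sign, exponent, (unsigned long long)coefficient);
    return 0;
}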

Symbol System

The SYMBOL language is directly implemented in hardware and thus uses less main memory for "system software"; less virtual-memory transfer time is also needed. Source programs are special forms of string fields: variable-length ASCII character strings with delimiters defining length and type. This is the outcome of a major developmental effort in increasing hardware functional abilities (memory management). The translator accepts SYMBOL and produces a reverse-Polish object string by means of direct hardware compilation. Fairchild built the SYMBOL computer during 1964-70; it provided hardware support for interactive compilation. A study of a modern computer installation and its users as a total "system" reveals where and how the computing cost is divided; consultants from Iowa State University did such a study. The objective of data processing is to solve problems, where the "user with a problem" is the input and the "answer" is the output. It is assumed that the user has his problem well defined and has the data available, but it is not yet programmed. The conversion of the problem to a computable language and the debugging necessary for correct execution are included in the total cost of operating an installation. The SYMBOL hardware has been engineered for good reliability and, at the same time, easy maintenance.

IBM System/38

IBM System/38 is an object-oriented machine for database maintenance with major attributes like programming independence, authorization ability and dynamic microprogramming. It employs many pointer types to smooth the data flow and permits the user to declare object rights. The processor and I/O units have access to the main storage, and a multilevel, queue-driven task control is achieved with microcode.

The system, in essence, supports modular programs with a structured approach, attracting expert database designers.

Personal computing systems: Alto

Personal computers are often targeted at a particular application area, such as scientific calculation, education, business, or entertainment. By personal computing we mean a low-cost computer structure that is dedicated to a single user. The Xerox Palo Alto Research Center (PARC) Alto is a high-performance personal computer, notably through its 3 Mbit/sec communication facility on Ethernet systems. The system supports 64K words (1 word = 16 bits) of semiconductor memory. Applications include document production, interactive programming, animation, simulation and playing music. BCPL, a typeless implementation language that has much in common with its well-known descendant C, was employed extensively to build Alto software. The BCPL emulator provides a vectored interrupt system implemented entirely in microcode, a reason for servers being slowed down on the early Alto PCs. A design for a communication system must anticipate the need for standard communication protocols in addition to standards for the physical transmission media. The Alto was designed at a time when experience with protocols was limited.

Microprogramming is a form of emulation wherein one ISP (instruction set processor) is used to interpret a target ISP. Microprogramming permits an orderly approach to control design. The microprogram is easy to debug and maintain, and it makes the control easy to check via coding techniques (e.g. parity and Hamming codes). One can implement complex ISPs through microprogramming. A two-way switch, controlled by a special flip-flop called a conditional flip-flop, is inserted between control matrices toward the construction of good microprograms for sensible instructions like repetitive loops, more prominent in RISC machines and multiuser operating systems. In a parallel computer with an asynchronous arithmetic unit, every gate requires only one kind of control waveform and the timing of that waveform is not critical, but a serial machine of the Von Neumann type requires many control signals, including a decoding tree and even the external pins, which slows down the process.
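The flavor of microprogrammed control can be conveyed by a tiny interpreter step; the control-word fields and signal names below are invented for illustration and do not describe any particular machine.

/* micro_step.c - one step of an illustrative micro-engine */
typedef struct {
    unsigned alu_op      : 4;   /* which ALU function to assert              */
    unsigned reg_select  : 4;   /* which register drives the bus             */
    unsigned mem_read    : 1;
    unsigned mem_write   : 1;
    unsigned branch_cond : 2;   /* 0 = fall through, 1 = branch on zero flag */
    unsigned next_addr   : 12;  /* target of a microbranch                   */
} micro_word;

/* Assert the signals of the current microinstruction, return the next address. */
unsigned micro_step(const micro_word *ctrl_store, unsigned upc, int zero_flag)
{
    micro_word mw = ctrl_store[upc];
    /* ... drive alu_op, reg_select, mem_read/mem_write onto the datapath ... */
    if (mw.branch_cond == 1 && zero_flag)
        return mw.next_addr;        /* conditional microbranch (the two-way switch) */
    return upc + 1;                 /* otherwise sequence to the next microword     */
}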

6. BIT SLICED MICROPROCESSOR

An example of a microprogrammable system based on the Am2910 sequencer and the Am2901 bit-slice ALUs is targeted to the PDP-8 ISP, the main feature being clarity of implementation while increasing the design complexity. The DEC PDP-11, employing the Unibus concept, is a CISC type microprogrammed to align data and control paths using cache stores. With fast Schottky TTL and a good pipeline, it provides the parallelism needed to meet performance targets. A desk-top computer can be used as a simple calculator (programmed machine) at any time during the entry or execution of a program. The wider display modes allow the generation of entirely arbitrary patterns on the CRT screen through the use of a graphics raster. The HP 9845A, having a high degree of physical integration, provides a CRT-to-thermal-printer dump. This was made possible by providing the ability to use the contents of the 16K cache memory as a source of data to drive the internal thermal printer. That printer has a thermal print-head with 560 uniformly spaced print resistors. The graphics dump produces a dot-for-dot image of the CRT's graphics-mode display on the printer.

Intel Microprocessors evolution

Microcomputers have revolutionized design concepts in countless applications. VLSI can meet desirable properties like a high gate-to-pin ratio, regular cell structure and high-volume applications. The Intel 8080 had to use separate clock and controller chips to form a processing unit. The 8085, which appeared in 1976, gave the lowest component count as well as the lowest power requirement, meeting many real-time interrupt applications controllable by the RIM and SIM instructions. The 8086 chip of 1978 could provide memory segments, varying register types and an addressing capacity of one megabyte. A system based on the 8086 CPU has a wider instruction set with nine flags, slowly shifting over to the personal-computer benches, with later evolutions supporting a number of high-level languages for portability. But a careful look at the Intel processors definitely conveys the best abilities of the assembly languages, better suited for dedicated business processes and real-time industrial control applications. MOS technology is characterized by parameters like propagation delay of gates, speed-power product measured in picojoules, gate density and cost.
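The one-megabyte reach of the 8086 with 16-bit registers comes from forming a 20-bit physical address as segment * 16 + offset; the short sketch below illustrates the arithmetic.

/* seg_off.c - 8086-style segment:offset to 20-bit physical address */
#include <stdint.h>
#include <stdio.h>

uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFF;   /* 20-bit result */
}

int main(void)
{
    printf("0x%05X\n", phys_addr(0xB800, 0x0010));   /* prints 0xB8010 */
    return 0;
}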

Electron-beam lithography will make possible the scaling down of structures to micron and submicron sizes. The influential areas for microcomputer software include diagnostic tools, specialized logic analyzers and hardware emulation tools. High-level language support by microcomputer architects will make system programming efficient. The twenty-first century is likely to see the birth of machines that directly execute high-level languages, for user-friendliness and increased throughput.

7. TRANSPUTER TECHNOLOGY

As parallel processing has become affordable because of the availability of cost-effective processors, the need for parallel programming constructs has become inevitable.

Modula-2 and ADA have added features to aid concurrency on a timeshared Unibus computer architecture.

The transputer, introduced in 1985, combines high-speed computing with high-speed communication, with its own built-in on-chip memory, floating-point processor, timer and serial communication data links for direct communication with other transputers.

OCCAM is a language specially designed for parallel processing using transputers as processing elements. The small number of registers, together with the simplicity of the instruction set, enables the processor to have relatively simple (and fast) data paths and control logic. Stack (zero-address) addressing is followed implicitly in covering the arithmetic and logical operations. The close mapping between the Occam process model and the transputer architecture allows for an efficient implementation of concurrent parallel processes. The scheduler maintains different queues, including priority processes. The possible scheduling points are defined by the time-slice period and a set of jump instructions allowing rescheduling. Standard Inmos links provide full-duplex synchronized communication between transputers, opening avenues for high-performance concurrent systems in a single-processor system or in network architectures. The communication speeds range from 5 to 20 Mbits/sec. The features of a few transputers are shown in Table 1.


Table 1 Transputer features.

The INMOS T9000 is a 32-bit CMOS microprocessor having a 64-bit floating-point unit, 16 Kbytes of cache memory, a communication processor, 4 high-bandwidth (20 Mbytes/sec) full-duplex serial communication links and 2 control links. Its off-chip message routing is supported by the IMS C104, a low-latency, high-bandwidth dynamic routing switch. The TMS C40 is a digital signal processing parallel processor for real-time applications. It is a 32-bit processor with a 40-50 ns instruction cycle and a throughput of 320 Mbytes/sec. It also supports on-chip DMA on 6 channels with external/internal clocking.

The major applications of transputer technology include super-computing, process control and image processing.

An Occam program consists of a number of parallel processes, which can be run by a single processor or, if fast execution is required, by several processors. Occam processes are built from three primitives, namely assignment, input and output.

Examples:

x := y + z

channel ? var

adc1 ? temp

where the variable "temp" is input from channel "adc1".

channel ! var

dac1 ! cont.var

where the variable "cont.var" is output to channel "dac1".

The primitive processes are combined to form constructs like SEQ, PAR, ALT and ALT with priority. The control structures include WHILE and IF statements. PAR avoids deadlock by allowing concurrency. Every variable and channel used must be declared with a data type.

The configuration description must include each processor in the network, the type of each processor, and the link interconnections between the processors. The T800 processor, with four bidirectional links, can be associated with eight unidirectional channels.

Multiplexers and demultiplexers play a key role in data acquisition software.

Parallel C is another popular concurrent programming language for a transputer system. It provides mechanisms to dynamically create new concurrent threads of execution within a task. Each thread has its own stack allocated by its creator, but shares its code, static data and heap space with other threads in the same task. Semaphore functions in the run-time library are provided to control access to shared data and channels. Parallel C is relatively weak in compile-time checks as compared to OCCAM. The transputer network can be carefully designed towards SIMD algorithms and the still crucial MIMD image processing domains. PARAM, based on the T-500, is a parallel processing system developed at C-DAC, Pune.
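The Parallel C run-time library itself is not reproduced here; the POSIX sketch below only illustrates the same pattern described above, with threads sharing static data and a semaphore guarding access to it.

/* shared_threads.c - threads plus a semaphore guarding shared data (illustrative) */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t guard;          /* controls access to the shared counter             */
static int   shared_count;   /* shared static data, as threads in one task see it */

static void *thread_body(void *arg)
{
    (void)arg;
    sem_wait(&guard);        /* enter critical section */
    shared_count++;
    sem_post(&guard);        /* leave critical section */
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    sem_init(&guard, 0, 1);
    for (int i = 0; i < 4; ++i) pthread_create(&t[i], NULL, thread_body, NULL);
    for (int i = 0; i < 4; ++i) pthread_join(t[i], NULL);
    printf("count = %d\n", shared_count);
    sem_destroy(&guard);
    return 0;
}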

The Advanced Plant Automation and Control System (APACS) developed by the Electronics Research and Development Corporation, Trivandrum, is based on transputers. The prototype model has undergone field trials at RCF Bombay in the ammonium bicarbonate plant.

The results show an increase in production, with quality improvement and a reduction in plant shutdown time. Certain applications, such as flight testing, require high-speed acquisition of data (about 2000 I/O points communicated every 10 ms). Xerox Corporation's Ethernet is universally adopted as a de facto standard for LANs.

Today, most major computer networks (e.g. DECnet, HP's AdvanceNet, and Novell) are commonly based on the IEEE 802.3 Ethernet standards. While the physical layer provides only a raw bit-stream service, the data link layer attempts to make the physical link reliable and provides the means to activate, maintain and deactivate the link. It also provides services like error detection and control to the higher layers.

The transport layer and application layer are exploited by implementing typical software routines that perform network functions like file transfer, chat, network management and error management. The two competing protocol suites for Ethernet are TCP/IP and DECnet. Since virtually all UNIX systems support TCP/IP, it was chosen to implement TCP/IP at the transputer end, making communication with X Windows possible.

The transputer represents a novel approach to designing VLSI microprocessor systems.

Inmos's principal goal is to provide a family of high-performance programmable VLSI components containing high-bandwidth communication links for the creation of concurrent multiprocessor systems. The IMS T424 has a simple microcoded CPU and 2 Kbytes of on-chip memory. It has only six registers: the A, B and C registers form an evaluation stack, the workspace pointer points to local variables at run time, the next-instruction pointer is similar to a program counter, and the operand register is used to form instruction operands. It has only one instruction format, one byte long. The system designer is still faced with decisions concerning which processes are most effective when operating concurrently, how concurrent processes should be distributed over transputers, and when those processes should communicate with one another.
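The one-byte format is commonly described as a 4-bit function code plus a 4-bit data nibble accumulated in the operand register, with a prefix function extending the operand four bits at a time; the C sketch below decodes an assumed instruction stream on that basis and is illustrative rather than a faithful T424 model.

/* decode.c - sketch of one-byte function/data decoding with a prefix */
#include <stdint.h>
#include <stdio.h>

#define FN_PFIX 0x2   /* prefix function code (assumed here) */

int main(void)
{
    uint8_t  code[] = { 0x21, 0x2A, 0x43 };   /* two prefixes, then an operation */
    uint32_t oreg   = 0;                       /* operand register               */

    for (unsigned i = 0; i < sizeof code; ++i) {
        unsigned fn   = code[i] >> 4;          /* upper nibble: function */
        unsigned data = code[i] & 0x0F;        /* lower nibble: operand  */
        oreg |= data;
        if (fn == FN_PFIX) {
            oreg <<= 4;                        /* keep building the operand */
        } else {
            printf("execute fn=%u operand=0x%X\n", fn, oreg);   /* fn 4, operand 0x1A3 */
            oreg = 0;                          /* operand register cleared  */
        }
    }
    return 0;
}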
