Guide to Computer Architecture and System Design--PARALLEL PROCESSING AND FAULT-TOLERANCE (page 2)




4. MULTIPROCESSING EPILOGUE

Microprogramming

With the birth and growth of microprocessor technology, the microprogramming strategy started springing up for flexible systems. Microprogramming is a systematic and orderly approach to CPU control unit design, introduced by Wilkes in the early 1950s. Each machine instruction is interpreted by a sequence of microinstructions (a microprogram). ROMs, RAMs and PLAs are utilized in microprogram designs. A microinstruction may contain parallel commands, which speeds up instruction execution.

Vertical microinstructions are of small width, easy to decode, and result in lengthier microprograms. Horizontal microprogramming calls for concurrency and optimization methods, and potential conflicts in bus transfers and in data-dependent micro-operations must be handled, whereas vertical microprogramming establishes simple coding schemes. Hardwired control units are faster at the expense of dedicated circuitry. Typical applications of microprogramming include emulation, direct execution of high-level languages and tuning of architectures. Emulation is defined as the interpretation of an instruction set different from the native one. The DEC VAX 11/780 has both a native instruction set and an emulated PDP-11 set. If the host is faster, the implementation of the guest gets speeded up.
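
As a rough illustration of the width/encoding trade-off described above, the Python sketch below compares the control-store word width of a horizontal and a vertical microinstruction layout for a hypothetical datapath; none of the field names or widths come from the text, they are assumptions for illustration only.

    # Hypothetical horizontal layout: one wide word, one bit or field per
    # control point, so several micro-operations (an ALU op plus a register
    # transfer, say) can fire in the same cycle -- at the cost of width.
    HORIZONTAL_FIELDS = {
        "alu_op":     3,   # 8 ALU operations, selected directly
        "src_bus_en": 8,   # one enable bit per register driving the source bus
        "dst_reg_ld": 8,   # one load bit per destination register
        "mem_read":   1,
        "mem_write":  1,
        "next_addr":  8,   # next microinstruction address
    }

    # Hypothetical vertical layout: heavily encoded fields, so the word is
    # short and easy to decode, but only one micro-operation per word,
    # which makes the microprograms themselves longer.
    VERTICAL_FIELDS = {
        "opcode": 4,   # encoded micro-operation (ALU, move, memory, branch)
        "src":    3,   # encoded source register number
        "dst":    3,   # encoded destination register number
        "next":   8,
    }

    def word_width(fields):
        """Control-store word width in bits for a given field layout."""
        return sum(fields.values())

    print("horizontal word:", word_width(HORIZONTAL_FIELDS), "bits")  # 29 bits
    print("vertical word:  ", word_width(VERTICAL_FIELDS), "bits")    # 18 bits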

EULER (an ALGOL 60 derivative) is a direct execution language for the IBM 360/30. Direct execution is useful when the micro engine is flexible and the high-level language lends itself to easy feature extraction. Static microprogramming fits better with microprocessor-based process instrumentation systems.

Tuning of architectures lends itself to dynamic microprogramming, which has to account for instruction-usage monitoring and performance efficiency, and for decisions to modify the native instruction set on evolving, powerful processors. Thus, microprogramming essentially leads to language implementations for problem-oriented sectors.

Also, the single-assignment rule of dataflow computation helps concurrency and parallelism among micro-operations, taking over the job of resource-conflict detection and process scheduling from the operating system. The extreme step is a high-level language architecture for direct execution, designed with a bottom-up machine design strategy. Stack architectures fit well with specific numerical-control production process applications using the LISP language, and also for enhancing the arithmetic capacity of existing systems whose limits otherwise show up as overflow errors.

The CRAY-1 is about two and a half times faster than the Amdahl 470 V/6 (in 1978), with its 29 nsec cycle time, owing to the concurrent activation of twelve tightly coupled functional units by vertical microprogramming and pipelining. The I/O section consists of 24 channels, of which 12 are for input and 12 for output. The instruction formats are similar to those of the CDC 6600, with vector and backup registers in addition, presenting interesting challenges to compiler writers. The I/O devices are the transducers which allow the outside world to communicate with processor memory. Analog-to-digital and digital-to-analog converters are most desirable in process control instrumentation. RS-232C is used with modems for serial communication in local networks. Attractive candidates for read-only stores (ROS) are executable function-call programs (math and graphics libraries) besides monitors and supervisors. For mass production of computers with ROS, the thick-film process in IC fabrication provides cost savings and increased reliability.

Ethernet is a standard local area network connecting terminals and computers over a range of 100 meters to 10 kilometers at speeds of 1 to 10 megabits per second. The topology is that of an unrooted tree: there is a single path between any two nodes of the network, which facilitates easy collision detection, meaning a way of aborting colliding transmissions. Error checking is done by a CRC (cyclic redundancy check), in which the choice of the generator polynomial is crucial.
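
The sketch below is a minimal illustration of CRC generation and checking by polynomial division modulo 2. It uses a small 8-bit generator for brevity; actual Ethernet frames carry the 32-bit IEEE 802.3 CRC (generator 0x04C11DB7) with initialization and bit-ordering conventions that are omitted here.

    def crc_remainder(data: bytes, poly: int, width: int) -> int:
        """Compute the CRC of `data` by long division modulo 2.

        `poly` is the generator polynomial without its leading x^width term,
        e.g. poly=0x07, width=8 for x^8 + x^2 + x + 1.
        """
        crc = 0
        topbit = 1 << (width - 1)
        mask = (1 << width) - 1
        for byte in data:
            crc ^= byte << (width - 8)   # bring the next 8 message bits in
            for _ in range(8):
                if crc & topbit:
                    crc = ((crc << 1) ^ poly) & mask
                else:
                    crc = (crc << 1) & mask
        return crc

    frame = b"hello"
    fcs = crc_remainder(frame, poly=0x07, width=8)           # sender appends this
    assert crc_remainder(frame + bytes([fcs]), 0x07, 8) == 0  # clean frame -> zero remainder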

Current memories include error detection and correction schemes. Local networks provide hardware interfaces and software protocols to continue operating in the presence of errors, moving towards fault tolerance. The low cost of introducing extra bits for error control improves the MTBF of semiconductor memories by a factor of ten.

For example, the IBM 370, CRAY-1 and DEC VAX 11/780 have 8-bit ECCs appended to 64-bit words for SEC-DED (single error correction, double error detection) and also detect about 70% of multiple (larger than 2-bit) errors.
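
As a toy illustration of the SEC-DED idea, the sketch below uses an extended Hamming(8,4) code rather than the 8-check-bits-on-64 arrangement used by those machines: it corrects any single-bit error and flags double-bit errors.

    def hamming84_encode(nibble):
        """Encode 4 data bits (list of 0/1) into an 8-bit SEC-DED codeword."""
        d = nibble
        c = [0] * 8                        # c[1..7]: Hamming(7,4); c[0]: overall parity
        c[3], c[5], c[6], c[7] = d
        c[1] = c[3] ^ c[5] ^ c[7]          # parity over positions 1,3,5,7
        c[2] = c[3] ^ c[6] ^ c[7]          # parity over positions 2,3,6,7
        c[4] = c[5] ^ c[6] ^ c[7]          # parity over positions 4,5,6,7
        c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]   # for double-error detection
        return c

    def hamming84_decode(c):
        """Return (status, data). Corrects one error, flags two."""
        c = c[:]
        syndrome = 0
        for pos in range(1, 8):
            if c[pos]:
                syndrome ^= pos            # XOR of the positions of 1-bits
        overall = c[0] ^ c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]
        if syndrome and overall:           # single error inside the Hamming part
            c[syndrome] ^= 1
            return "corrected", [c[3], c[5], c[6], c[7]]
        if syndrome and not overall:       # two errors: detectable, not correctable
            return "double error", None
        if overall:                        # error in the overall parity bit only
            return "corrected", [c[3], c[5], c[6], c[7]]
        return "ok", [c[3], c[5], c[6], c[7]]

    word = hamming84_encode([1, 0, 1, 1])
    word[6] ^= 1                           # flip one bit in transit
    print(hamming84_decode(word))          # ('corrected', [1, 0, 1, 1])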

Electron-beam-addressed memories (EBAMs) are used for real-time slow switching.

Random addressing of a bit (pixel) is possible, and block addressing is done by a TV-like scan. Reading is partially destructive, and refreshing is necessary every minute. These EBAMs promise to be fast compared to charge-coupled devices and useful as extended core storage.

The Gordon Research Conferences, the American Chemical Society and many universities sponsor symposia on adhesion and adhesives. Thermal shock is an environmental test performed to expose differences in expansion coefficients among components of the packaging system in the VLSI area. The 64-pin MC 68000 (16-bit microprocessor) is offered in a heat-spreader package (68000G). Besides holding the chip in place, die-bonding adhesives must conduct heat from the chip to the heat sink. All organic adhesives release water vapor in hermetically sealed packages; the released water vapor must be kept below 15,000 ppm to avoid chip deterioration. In today's world, failure of adhesives could cause computers to stop functioning, cities to black out or missiles to misfire.

Having discussed multiprocessing trends, we finally touch on the aspect of system reliability in Section 5.

5. FAULT TOLERANCE IN COMPUTERS

Performance evaluation of a computing system is called for when the system is not to be underutilized, i.e., when the system cannot be allowed to go insecure and when input effort need not be increased nor output profitability impaired. In 1936, A. M. Turing showed that most computers are logically equivalent to each other, since they can compute the same wide class of computable functions, conditional only on the character sets they can accept and print and on the storage size.

System evaluation is of interest to architects, system programmers and users. Manual operations like tape and disk mounting, printer paper bursting and console interaction can have significant effects on overall performance. Thruput is a commonly used measure for batch systems and is expressed in jobs completed per minute. Its lack of sensitivity to the order of completion makes thruput less important for fast-response interactive systems.

Statistical and probabilistic methods will become more important in computer systems as they serve more users in many unpredictable ways. Performance results, whether obtained by measurement, simulation or analysis, are meaningful only with respect to the choices made. Hardware monitors help in measuring resource performance metrics. The main route to microprogram speed improvement is reducing instruction fetch times and making more effective use of registers and fast storage units.

A few universities have taken the lead in producing fast-compile compilers. The WATFOR Fortran compilers (University of Waterloo, Canada) and PL/C, a PL/I compiler from Cornell University, offer excellent diagnostic aids besides very fast compilation. Dan Ingalls and Don Knuth of Stanford University developed a software tool, the execution-time profile monitor, taking FORTRAN source as input, to study program performance metrics. Automatic program generators working from problem specifications have definitely improved programmers' online time in utilizing machine investments, which are often leased because of unaffordable cost and demand. REMAPT is a language for describing parts for numerically controlled machine tools. An execution profile analyzer helps system programmers identify code suitable for microprogramming. Multics (Multiplexed Information and Computing Service) has been successfully utilized in a predominantly timesharing environment at MIT.
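
A rough modern analogue of such an execution-time profile monitor can be put together with Python's built-in cProfile module; the workload below is purely hypothetical and merely stands in for the FORTRAN programs the original tool analyzed.

    import cProfile
    import pstats

    def inner_loop(n):
        # Deliberately heavy kernel: the kind of hot spot a profile exposes
        # as a candidate for microprogramming or hand optimization.
        return sum(i * i for i in range(n))

    def workload():
        total = 0
        for _ in range(200):
            total += inner_loop(5000)
        return total

    profiler = cProfile.Profile()
    profiler.runcall(workload)
    stats = pstats.Stats(profiler).sort_stats("cumulative")
    stats.print_stats(5)     # the top few entries show where the time goes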

Fault intolerance aims at eliminating the sources of faults as far as possible, whereas fault tolerance involves redundancy to provide a required level of service despite faults having occurred or being present. Reliability and availability determine the system dependability factor. High-availability systems keep downtime to a minimum, and maintainability is affiliated with the software engineering area.

Hardware redundancy is used as standby for continuous operation. The Tandem computer is a high-availability system for commercial transactions. Triple modular redundancy is a majority-voting concept for the fault-tolerant instrumentation domain.
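
A minimal sketch of the triple modular redundancy idea follows; the module outputs are made-up values, and a real voter would of course be implemented in hardware rather than Python.

    def tmr_vote(a, b, c):
        """Return the majority value of three replicated module outputs.

        Any single faulty module is outvoted; if all three disagree, the
        fault is detected but cannot be masked.
        """
        if a == b or a == c:
            return a
        if b == c:
            return b
        raise RuntimeError("no majority: more than one module has failed")

    outputs = [42, 41, 42]        # three replicated modules; module 2 is faulty
    print(tmr_vote(*outputs))     # 42 -- the single fault is masked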

Software reliability is of major concern in the global market. A 25% increase in problem complexity results in a 100% increase in programming complexity.

Software reliability is the probability that the software will execute for a particular period of time without a failure, weighted by the cost to the user of each failure encountered.
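
One common way to make this definition concrete (an assumption, not a formula given in the text) is to model failures with a constant rate, so that R(t) = exp(-lambda * t), and to weight the expected number of failures by a per-failure cost. The numbers below are purely illustrative.

    import math

    failure_rate = 0.002          # assumed failures per hour of execution
    mission_time = 100.0          # hours of failure-free operation required

    reliability = math.exp(-failure_rate * mission_time)
    print(f"R({mission_time} h) = {reliability:.3f}")     # about 0.819

    # Cost-weighted view: expected failures times an assumed cost per failure.
    expected_failures = failure_rate * mission_time
    cost_per_failure = 5000.0     # hypothetical cost to the user
    print(f"expected failure cost: {expected_failures * cost_per_failure:.0f}")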

Fault avoidance and fault tolerance are both associated with software reliability. The Bell System's TSPS (Traffic Service Position System) employs fault-correction methods and has a stringent availability requirement: downtime cannot exceed 2 hours in 40 years.
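
A quick back-of-the-envelope check of what that requirement implies for availability, using only the figures quoted above:

    # At most 2 hours of downtime in 40 years of operation.
    HOURS_PER_YEAR = 365.25 * 24
    total_hours = 40 * HOURS_PER_YEAR
    downtime_hours = 2

    availability = 1 - downtime_hours / total_hours
    print(f"required availability: {availability:.7f}")           # about 0.9999943
    print(f"allowed downtime: {downtime_hours / 40 * 60:.0f} minutes per year")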

Users must devise good algorithms and well-defined data structures to exploit real concurrency with conventional languages. A black-box approach of information hiding can be followed in software development, complemented by glass-box tests, to improve safe communication between modules.

The quantitative evaluation of software quality proposed by B. W. Boehm in 1976 listed 60 quality metrics. In 1977, Walters and McCall reduced the quality factors to 11 candidates.

Accuracy in program error prediction is a major problem in quality control of a large scale software system.

A fault is a damage, defect or deviation from the normal state of the computing system on which tasks execute. Fault detection is concerned with detecting the manifestation of a fault by some means other than program execution, while error detection deals with detecting those errors in program execution induced by faults. Fault location is possible only with 100% error detection. Totally self-checking circuits are used to detect faults concurrently with normal operation. The purpose of error-detecting codes is simply to detect the presence of errors whose non-recognition could be harmful.

A code C is capable of detecting all unidirectional errors if its codewords are unordered. A syndrome is a binary word computed by the decoder and used in deciding which codeword was transmitted. Bose and Rao have shown that constant-weight codes with minimum distance 2t + 2 are t-error-correcting / all-unidirectional-error-detecting codes. In general, a distance-K code will detect up to (K - 1) errors. In future, code distances are expected to play a vital role in data communication.
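
The toy sketch below illustrates both properties on a small constant-weight code (every codeword has exactly two 1-bits out of four): its minimum distance K lets it detect up to K - 1 random errors, and its codewords are pairwise unordered, which is what guarantees detection of all unidirectional errors.

    from itertools import combinations

    code = ["1100", "1010", "1001", "0110", "0101", "0011"]

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def unordered(a, b):
        """Neither codeword covers the other (no a <= b or b <= a bitwise)."""
        a_le_b = all(x <= y for x, y in zip(a, b))
        b_le_a = all(y <= x for x, y in zip(a, b))
        return not (a_le_b or b_le_a)

    min_distance = min(hamming(a, b) for a, b in combinations(code, 2))
    print("minimum distance:", min_distance)                   # 2
    print("detects up to", min_distance - 1, "random errors")  # the K-1 rule
    print("all pairs unordered:",
          all(unordered(a, b) for a, b in combinations(code, 2)))
    # Unordered codewords mean that a purely unidirectional error pattern
    # (only 1->0 flips, or only 0->1 flips) can never map one codeword
    # onto another, so every such error is detected.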

The minimum distance of a code, D(C), is the minimum Hamming distance between any two distinct codewords. Residue codes are well-known arithmetic error-detecting codes, while Berger codes are optimal systematic AUED (all-unidirectional-error-detecting) codes. Error-correcting codes can be classified according to the number of erroneous bits that can be corrected/detected. Error-control schemes include ARQ (automatic repeat request) and FEC (forward error correction) methods. The ARQ scheme is preferred in data communication networks.
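
A minimal sketch of a Berger encoder and checker follows; the check symbol is simply the binary count of zeros in the information bits, which is what makes the code systematic and all-unidirectional-error-detecting.

    import math

    def berger_encode(info: str) -> str:
        """Append the Berger check symbol to a string of information bits."""
        k = len(info)
        check_len = math.ceil(math.log2(k + 1))   # bits needed to count 0..k zeros
        zeros = info.count("0")
        return info + format(zeros, f"0{check_len}b")

    def berger_check(word: str, k: int) -> bool:
        """True if the received word is a valid Berger codeword."""
        info, check = word[:k], word[k:]
        return int(check, 2) == info.count("0")

    cw = berger_encode("1011001")      # 7 info bits, 3 zeros -> check "011"
    print(cw)                          # 1011001011
    print(berger_check(cw, 7))         # True
    # A burst of 1->0 errors can only raise the zero count in the info part
    # while lowering the value of the check symbol (and vice versa for 0->1),
    # so any unidirectional error is always detected.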

A hybrid method combining ARQ and FEC will improve error control as well as increase thruput across bus bottlenecks. Coding theory, currently a subject of active research, finds a practical place at the frontiers of computer science on the way towards reliable and secure machines.
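
The sketch below is a toy stop-and-wait ARQ loop over a simulated noisy channel; the error probability, the crude checksum standing in for a CRC, and the retry limit are all assumptions for illustration. A hybrid scheme would add an FEC stage so that only uncorrectable frames trigger retransmission.

    import random

    random.seed(1)

    def checksum(data: bytes) -> int:
        return sum(data) & 0xFF              # crude detector standing in for a CRC

    def noisy_channel(frame, error_prob=0.3):
        """Deliver the frame, occasionally corrupting one payload byte."""
        payload, chk = frame
        if random.random() < error_prob and payload:
            i = random.randrange(len(payload))
            payload = payload[:i] + bytes([payload[i] ^ 0x01]) + payload[i + 1:]
        return payload, chk

    def send_with_arq(payload: bytes, max_tries=10):
        frame = (payload, checksum(payload))
        for attempt in range(1, max_tries + 1):
            received, chk = noisy_channel(frame)
            if checksum(received) == chk:    # receiver ACKs a clean frame
                return received, attempt
            # NAK or timeout: fall through and retransmit
        raise RuntimeError("retry limit exceeded")

    data, tries = send_with_arq(b"sensor block 7")
    print(data, "delivered after", tries, "attempt(s)")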

To conclude, parallel processing issues continue to evolve around changing computer architectures and the problem spaces they address.

TERMS

Virtual memory management, dataflow, insecure, graph, network, algorithm, arithmetic pipeline, dataflow, local memory, sorting, MIMD, Illiac IV, Handler, latency, RISC, petrinets, microprogramming, CRC, adhesives, fault intolerance, Boehm, fault, syndrome.

QUIZ

1. When is a graph said to be completely connected?

2. What is congestion? Explain how the congestion can be avoided in networks.

3. Compare and contrast synchronous and asynchronous buses from the viewpoints of data bandwidth, interface circuit cost and reliability.

4. When is serial communication called for? Explain the 8286 transceivers for half duplex communications with an interface schematic.

5. Differentiate between errors, faults and failures.

6. Clearly distinguish error detection and repairability of a system fault.

7. Write notes on testability and fault coverage in VLSI designs.

8. What are static dataflow machines?

9. Explain any one type of pipelining for concurrency with an objective approach.

10. How does the SIMD configuration help the ASICs group?

11. What is multiprocessing on a MIMD machine?

12. Mention the merits of horizontal microprogramming.

13. Define fault detection.

14. Write extensive notes on coding theory for secure systems.

15. Explain the features of a project from your own experience that illustrate the need for parallel processing in an implementable task.
