Guide to Computer Architecture and System Design--CENTRAL PROCESSING UNIT: POWER OF ARCHITECTURES (part 1)


1. LEVELS OF ABSTRACTION

A central processing unit (CPU) is built around the arithmetic logic unit (ALU), which can process data to varying degrees depending on the architectural constraints. This unit is termed the system controller in a dual sense: it controls itself to arrive at a solution to the input program, and it is in turn controlled by the data available in primary memory, which is housed on the motherboard. The central processor is also called the microprocessor, which was born in 1971. Computer systems designed with no programmability are called programmed machines. Examples are hand-held calculators, business desk-top calculating machines and dedicated process instrumentation systems, which belong to the Application-Specific Integrated Circuit (ASIC) design group.

But since the manipulation and interpretation of data vary from user to user, computers have to be programmable to offer flexibility across applications.

Though every processing element is a number cruncher, each has its own machine language. Language support is therefore a primary parameter of any architecture. Fig. 1(a) presents the different levels of language organization. As the figure shows, the assembly level is the special area of system engineers, who must be thorough with the language of the particular central processing unit in terms of its layout and its powerful system routines (such as macros, pseudo-programs and micro-routines), besides the basic instruction set the processor supports at the assembly level for truly tailorable events. With this exhaustive knowledge of the processing element (P.E.) configuration, an assembler has been written for many a system. It is, in essence, a true translator: it performs only calculations and computations before arriving at the object code, i.e., the .obj files. Any mistakes (called bugs) that surface after translation are therefore attributable entirely to the programmer's logic in the .ASM source file.

Defn. "An assembler translates the stated assembly code into machine code in a readable form ".
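As a rough illustration of this definition, the sketch below assembles a few lines for a made-up 8-bit machine. The mnemonics, the opcode values and the `assemble` function are hypothetical and do not correspond to any real processor; the point is only the one-to-one mapping from each source line to machine-code bytes.

```python
# Hypothetical opcode table for an imaginary 8-bit machine (not a real CPU).
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}


def assemble(source):
    """Translate assembly text into a list of machine-code bytes."""
    machine_code = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()      # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        mnemonic, operands = parts[0].upper(), parts[1:]
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        machine_code.append(OPCODES[mnemonic])
        for operand in operands:               # one-byte operands written in hex
            value = int(operand, 16)
            if not 0 <= value <= 0xFF:
                raise ValueError(f"operand out of range: {operand}")
            machine_code.append(value)
    return machine_code


program = """
LOAD 10    ; fetch the byte at address 0x10
ADD 11     ; add the byte at address 0x11
STORE 12   ; store the result at address 0x12
HALT
"""
print([f"{byte:02X}" for byte in assemble(program)])
```

A logical error in the source (say, storing to the wrong address) would be translated just as faithfully, which is why bugs are attributed to the source file rather than to the assembler.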

But as architectures evolve towards computer networks and portability, the need has arisen for cross-assemblers in distributed and parallel processing environments.

"A cross-assembler is one which translates the assembly program of one machine to the object code of a second machine, where the machines can be said to be equivalent if not identical."

=======

Fig. 1 (a) Levels of computer architecture, from top to bottom:

Software: systems and application programs

High-level (problem-oriented) languages; translation by compiler

Assembly language; translation by assembler

Operating system machine level; hybrid

Conventional machine level; interpretation

Microprogramming level; executed directly by hardware

=======

Cross-assembling across processors in the parallel processing domain places heavy weight on portability issues in order to serve portable tasks. The complexity of computers grew considerably with the birth and growth of operating systems. A system can be said to be operating if the operating system is alive. Thus, an architecture has to maintain a good deal of its own management data before it is capable of managing user data. Accordingly, in the domain of software engineering, program maintenance is left to the user (the programmer, in general), and problem accounting is left to the processing element.

A high-level language is more user friendly and English-like, and each language has been designed for specific application areas, such as voluminous data processing in commercial circles and scientific computing, finally boiling down to packages for interactive and command-based languages. Languages at this level are strictly syntax-driven and serve a more general semantic base. Table 1 (a) lists the merits of HLLs versus assembly-level languages. Table 1 (b) lists some assembly-level models. Computer systems now span a wide area, from desk-top computing in daily applications to microprocessor-based process control, covering both programmed needs and intelligent bases. The machine level is, of course, the mother tongue as far as machines are concerned for number crunching, and it is of a more one-to-one nature.

2. POWERFUL PARAMETERS

==============

Table 1 (a) HLL vs ALL

High-level languages:

(i) Machine independent and user-friendly

(ii) Less programmer effort

(iii) Provide standardized application packages as utilities and are 100% commercialized

Assembly-level languages:

(i) One-to-one and for systems engineers

(ii) Very fast execution, meeting both time and storage complexities

(iii) Best fit the ASICs area with machines of shorter word length, microcontrollers being examples

==============

Table 1 (b) Assembly language machine models

=============

With the stored-program concept and sequential processing owing to John von Neumann, and with the birth of VLSI in the 1970s and the arrival of microprocessors, it is essential to study the power of computers so that the desirable features can be embedded into a machine to meet both interactive, command-based applications and demand-oriented ones. The power of a CPU is governed by the following parameters, which are discussed one by one.

(i) The clock of the system;

(ii) The wordlength of the processor;

(iii) The external Buses;

(iv) The inbuilt computing power;

(v) CPU Storage optimization;

(vi) Data transfer schemes - priority of;

(vii) The instruction set complexity; and

(viii) Cache memory and associative linking.

Clock

In real time and in a fast world, our activities are governed by the universal clock, which is based on the elementary unit of time, the second. Microprocessors began with a clock frequency of 2 to 3 MHz and have progressed, to date, to clock speeds of 50 to 60 MHz in a span of two decades. The clock speed is thus the first and foremost factor dictating the throughput of a system composed of multiple and parallel events.
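To put these frequencies in perspective, the short sketch below converts a clock frequency into the duration of one clock cycle. The function name `clock_period_ns` is introduced only for this illustration, and the calculation deliberately ignores how many cycles an instruction needs.

```python
def clock_period_ns(frequency_mhz):
    """Return the duration of one clock cycle in nanoseconds."""
    return 1000.0 / frequency_mhz


# Frequencies quoted in the text: early microprocessors and later ones.
for frequency in (2, 3, 50, 60):
    print(f"{frequency} MHz -> {clock_period_ns(frequency):.1f} ns per cycle")
```

Going from 2 MHz to 50 MHz shortens the cycle from 500 ns to 20 ns, a factor of 25 before any architectural improvement is counted.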

Fig. 1 (b) Growth of word lengths.

Word Length

A microprocessor like the Intel 8085 is an 8-bit CPU, which can perform arithmetic and logical operations on 8 bits of data at one stroke. In the realm of architectures, the growth of wordlengths from 1 bit to 64 bits in a matter of two decades is a tremendous gain in number-crunching capacity. See Fig. 1 (b). Application-wise, dedicated process-control areas are suited to a wordlength of 8 bits; the 16-bit category accommodates the PC community (pioneered by IBM Corporation); 32-bit machines cater to timesharing and multi-user platforms; and the powerful giants of > 64 bits serve the supercomputing categories.

Without doubt, wordlength governs both the power of a machine and the cost of installing such a system. It has become standard to designate a set of 8 bits as a byte.
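The effect of wordlength can be sketched by simulating an 8-bit adder and using it to add 16-bit numbers. This is an illustrative model only; the function names are assumptions of this sketch, not part of any processor's documentation.

```python
def add_8bit(a, b, carry_in=0):
    """Add two 8-bit values as an 8-bit ALU would; return (sum, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add_16bit_on_8bit_alu(x, y):
    """Add two 16-bit values with two passes through the 8-bit adder."""
    low, carry = add_8bit(x & 0xFF, y & 0xFF)        # low bytes first
    high, carry = add_8bit(x >> 8, y >> 8, carry)    # high bytes plus carry
    return (high << 8) | low, carry


result, carry = add_16bit_on_8bit_alu(0x12FF, 0x0001)
print(hex(result), carry)
```

An 8-bit machine needs two ALU passes (and the carry between them) for what a 16-bit machine does at one stroke, which is one way the wordlength translates into computing power.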

External Buses

Most early systems multiplexed the address and data pins because of the highly sequential nature of instruction flow in a program. But with the inclusion of parallel programs, viz., user-written, user-called and self-resident embedded software, it has become essential to have separate address and data paths to meet the constraints of turnaround time and throughput as far as the machine architect is concerned. For example, the IBM/360 supports decimal arithmetic as well as floating-point arithmetic, which obviously have to proceed in parallel on batch machines to improve processor utilization. In this respect, relocation is a facility which adds to the embedded power of the central processor.

Relocation means being able to move a machine code program about in memory with its addresses still being valid. More precisely, it means being able to defer the decision on where the program is to be loaded until loading time (that is, not at assembly or compilation time). To achieve this, instead of being absolute (that is, referring to a specific location in memory), addresses must be relative to something that will be adjusted at load time (when the program is loaded into memory just prior to execution). This depends on the mode used to specify primary memory locations, such as direct or base addressing. A loader accomplishes the task of loading a program into main memory, usually from backing store. Addressing memory locations is left to the instruction set of a computer, which is closely tied to the wordlength.
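The idea of deferring address decisions to load time can be sketched with a toy loader. The program format, the operation names and the `load` function below are invented for this illustration: each instruction carries an address relative to the start of the program, and the loader adds the base address chosen at load time.

```python
def load(program, memory, base):
    """Copy a relocatable program into memory starting at address `base`.

    Relative addresses are converted to absolute ones by adding `base`.
    """
    for offset, (operation, relative_address) in enumerate(program):
        if relative_address is None:
            absolute_address = None
        else:
            absolute_address = base + relative_address
        memory[base + offset] = (operation, absolute_address)
    return base                      # entry point of the loaded program


# "JUMP 2" means: jump to the third instruction of this program.
program = [("JUMP", 2), ("NOP", None), ("HALT", None)]
memory = [None] * 32

load(program, memory, 4)             # same program loaded at address 4 ...
load(program, memory, 20)            # ... and again at address 20
print(memory[4], memory[20])
```

The same program image works at both locations because only the loader, not the assembler, commits to absolute addresses.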

Thus, computers with a relatively short wordlength have to employ a more complex addressing mechanism requiring multiple accesses to memory just to fetch the instruction.
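As a simple worked example, the sketch below counts how many memory accesses are needed to fetch one instruction for a given wordlength. It assumes one word is read per access and ignores alignment and prefetching; the three-byte figure corresponds to an 8085 instruction that carries a 16-bit address after its opcode, such as LDA.

```python
def fetch_accesses(instruction_bits, word_bits):
    """Memory accesses needed to fetch one instruction, one word per access."""
    return -(-instruction_bits // word_bits)     # ceiling division


# A 3-byte instruction (opcode + 16-bit address) on machines of various widths.
for word_bits in (8, 16, 32):
    print(f"{word_bits}-bit words: {fetch_accesses(24, word_bits)} access(es)")
```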

For the CPU, the data operated on (i.e., operands) are available either in registers or in attached storage (primary and cache memories), and the operators themselves, more often than not, reside within the CPU. Thus, the number of buses and the width of each bus play a crucial role in computer operations.

The in-built computing power

Both bipolar and unipolar devices are suitable for building a computing element covering logic and arithmetic. The speed of the operator is the primary concern governing the power of a central processing unit, besides the desirable property of embedding more storage to enhance throughput by reducing the number of accesses to primary memory for current and active data. Some essential parameters for different logic families are given in Table 2.

Table 2 Performance data for IC types

Because of the major power advantage of unipolar devices, MOS technology has become a promising candidate for VLSI design, packing more capability into a single CPU. The integration strength of a CPU directly reflects the instruction capabilities of a machine. Based on their processing abilities, CPUs are said to follow either the CISC (complex instruction set computer) or the RISC (reduced instruction set computer) approach. Fig. 2 (a) gives the block diagram of a reduced instruction set machine. It has to be accepted that storage inside the CPU is costlier than general memory capacity when building powerful compute-bound applications. Thus, the type of computation is reflected only by the main problem orientation.

The evolution of a CPU towards reduced instruction set computing (RISC) needs to be viewed in the context of application specific integrated circuits (ASICs). The CPU has to be designed having as few functional units as possible in terms of processor design.

The ALU is implemented using adders, shift registers and combinational logic because a hardware ALU operates at a very high speed, within just a few nanoseconds, compared to the data rate. For example, the Motorola 68000 VLSI chip, which has about 68,000 transistors in NMOS technology and operates at a clock of 8 MHz, has a minimum instruction time of 500 nanoseconds and a maximum instruction time of 21.25 microseconds.
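The instruction times quoted for the 68000 can be converted into clock cycles with a little arithmetic. The sketch below assumes only the figures given in the text (8 MHz clock, 500 ns and 21.25 microseconds); the function name is introduced for illustration.

```python
def clock_cycles(instruction_time_ns, clock_mhz):
    """Number of clock cycles that fit into the given instruction time."""
    period_ns = 1000.0 / clock_mhz           # 8 MHz gives 125 ns per cycle
    return instruction_time_ns / period_ns


fastest = clock_cycles(500, 8)               # minimum instruction time
slowest = clock_cycles(21_250, 8)            # 21.25 microseconds in ns
print(f"fastest instruction: {fastest:.0f} cycles")
print(f"slowest instruction: {slowest:.0f} cycles")
```

At 8 MHz the spread is from 4 cycles to 170 cycles per instruction, which shows how widely instruction cost can vary within one instruction set.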

In the RISC unit shown in the Figure 2 (a), many internal registers serve as general purpose, program pointers and save areas. Thus, a relatively high amount of feeding can occur enhancing the inherent parallelism feature embedded in a CPU.

By using a proper pipeline-design, many complex instructions can be achieved by a few instructions of the RISC machine, which involves a higher degree of co-operation among the machine architects and compiler writers in the use of compilers to optimize object code performance.

===========

Fig. 2(a) RISC Architecture. Blocks shown: primary memory; cache memory; mass register memory (storage facility); compact arithmetic and logic unit; data bus; address bus; control bus.

=========

Attributes such as few instruction types and addressing modes, fixed and easily decoded instruction formats, fast single-cycle execution and hardwired control accelerate throughput, justifying the need for large storage inside the CPU. In addition, cache memories are an added feature of RISCs in database management systems, converging either towards applications of information technology or towards the building of expert systems for real-time applications on an intelligent base. This feature provides more of a parallel processing environment around a shared database, which is the demand of today.

External to the CPU, we can have separate address pins and data pins for the parallel flow of information, incorporating a memory-mapped I/O feature for multi-user, multi-tasking applications. The only limitation of RISC is its lack of utility in the area of scientific computing.

Examples include the RISC architecture developed at Stanford University, which DEC, Silicon Graphics, Sony and the Military Avionics Program in the U.S.A. are using. A typical RISC microprocessor uses fewer memory addressing modes and program manipulation instructions, and stronger memory system diagnostics.

A noteworthy feature of the computer industry is the inclusion of engineering workstations in the market of PCs, midrange systems and even mainframes.

An overwhelming number of workstations use RISC technology. Newer systems are built around 32-bit conventional architectures, e.g., Intel's 80386/80486, as well as RISC architectures such as Sun Microsystems' SPARC and those of MIPS Computer Systems.

With the present-day trend towards multitasking, the in-built computing power can be optimized by intelligent timesharing designs on a single CPU platform, and a systematic method of memory organization and synchronization can be attempted using semaphores and dataflow networking. The CPU power is also governed by the ability of the system supervisor, i.e., the operating system, to deliver effective throughput and improved turnaround time as far as the configuration of data transfer schemes is concerned. Today, CPUs are available with clock rates from 2 MHz to 40 MHz, which reflects the type of application environment.

CPU Storage Optimization

The need for increased storage within a CPU is attributed to two different situations, viz.,

(i) To manage with a simple instruction set in arriving at solutions at higher speeds;

(ii) To match the computing speeds available from the CPU elements, more so in the domain of pipeline and parallel processing.

Though the CPU storage available is no doubt an asset, the return can be realized only through intelligent programming coordination. This applies both to multi-user interactive environments and to giant batch processing machines for heavy scientific computing. In essence, additional overheads in both hardware and software cannot be ruled out in order to optimize the use of available resources at any point in time. For example, the Intel 8086 maintains a 6-byte instruction queue, apart from its rich register storage, towards optimization. For compute-bound problems, an 8087 numeric co-processor and an I/O processor may be added alongside the main 8086 CPU to speed up activities. The 8086 has a 20-bit address bus, and its data bus is 16 bits wide.

Modular programming, which calls for memory segmentation when using high-level languages, is a desirable feature in chip designs. Peak computing power is often stated in megaflops (millions of floating-point operations per second) or LIPS (logical inferences per second). The LIPS figure is basically determined by keeping a close watch on the activity of the various CPU flags for a stated problem.
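The 20-bit address bus of the 8086 can be related to its 16-bit registers with a short calculation. The sketch below uses the segment-and-offset rule of the 8086 (segment shifted left by four bits, plus offset), which is standard for that processor but is not spelled out in the text above; the function name is an assumption of this sketch.

```python
ADDRESS_BITS = 20                                # width of the 8086 address bus
ADDRESS_SPACE = 1 << ADDRESS_BITS                # 1,048,576 bytes (1 MB)


def physical_address(segment, offset):
    """Combine two 16-bit values into a 20-bit physical address."""
    return ((segment << 4) + offset) % ADDRESS_SPACE   # wraps past 1 MB


print(ADDRESS_SPACE)
print(hex(physical_address(0x1234, 0x0010)))
```

Two 16-bit quantities are enough to reach every byte of the 1 MB space, which is how a 16-bit data path coexists with a 20-bit address bus.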

Data Transfer Schemes - Priority of:

Every computer system employs the following data transfer methods:

PROGRAMMED DATA TRANSFER

Most .obj and .exe files are executed using this method. Essentially, many instructions are just data movement across the working sections when the CPU is active, and about 80 to 90% of the computer time is allotted to this. The basic reason is that the processing elements are, more often than not, sequential by nature. This is a synchronous scheme with a predictable output. Algorithms can improve the throughput of any available machine manifold.
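A minimal sketch of a programmed transfer is given below, assuming a hypothetical device that has a word ready on every third status check. The class `PolledDevice` and the function `programmed_transfer` are invented for illustration; the point is that the CPU itself executes every step of the transfer, including the status checks that yield nothing.

```python
class PolledDevice:
    """Hypothetical input device that is ready on every third status check."""

    def __init__(self, words, checks_per_word=3):
        self.words = list(words)
        self.checks_per_word = checks_per_word
        self.checks = 0

    def ready(self):
        self.checks += 1
        return bool(self.words) and self.checks % self.checks_per_word == 0

    def read(self):
        return self.words.pop(0)


def programmed_transfer(device, count):
    """CPU-driven transfer: poll the status, then move one word at a time."""
    if count > len(device.words):
        raise ValueError("device does not hold that many words")
    buffer = []
    polls = 0
    while len(buffer) < count:
        polls += 1                       # every poll costs CPU time
        if device.ready():
            buffer.append(device.read())
    return buffer, polls


data, polls = programmed_transfer(PolledDevice([10, 20, 30]), 3)
print(data, polls)
```

The outcome is fully predictable, as the text notes for this synchronous scheme, but two of every three polls in this sketch do no useful work.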

INTERRUPT DRIVEN I/O

Sometimes interrupting programs must be attended to before the main program is resumed. Also, on a multi-user platform, the data wanted by a terminal is communicated immediately through operating system commands (which are permanently resident programs that have the highest priority of attention). Most process control and electronic instrumentation programs are interruptive in nature, as are interactive programs. At the apex, computer networks are used either for inter-process communication or just for data routing, and they employ interrupt-driven I/O mechanisms to a higher degree.
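The behaviour described here, in which the main program is suspended, pending requests are served in priority order and the main program then resumes, can be sketched as follows. The function `run` and the interrupt names are hypothetical, and a lower number is assumed to mean a higher priority.

```python
import heapq


def run(main_steps, interrupts):
    """Run a main program, servicing interrupts between its steps.

    `interrupts` maps a step index to the (priority, name) requests that
    arrive just before that step; lower numbers are served first.
    """
    log = []
    pending = []
    for step in range(main_steps):
        for request in interrupts.get(step, []):
            heapq.heappush(pending, request)
        while pending:                           # highest priority first
            _, name = heapq.heappop(pending)
            log.append(f"isr:{name}")
        log.append(f"main:{step}")               # main program resumes
    return log


trace = run(3, {1: [(2, "keyboard"), (0, "os_command")]})
print(trace)
```

In this trace the operating system command is served before the keyboard request even though it was listed second, reflecting its higher priority.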

DMA (DIRECT MEMORY ACCESS)

This is also called cycle-stealing transfer because, in essence, the external buses are used by the channel requesting the DMA operation. Most flush-in and flush-out activities need the DMA scheme, not only to meet their own ends but also to give the CPU a higher performance figure. Fig. 2 (b) gives the block diagram of the DMA scheme. The use of the interrupt-driven and DMA methods increasingly reflects the character of a computer system through prolonged use during its lifetime.
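Cycle stealing can be pictured as a timeline of bus ownership. The sketch below is a simplified model under stated assumptions: the DMA channel takes one bus cycle in every `steal_every` cycles while the CPU still has work, and takes all remaining cycles once the CPU is finished. The function name and parameters are illustrative only.

```python
def simulate_bus(cpu_cycles_needed, dma_words, steal_every=4):
    """Return the bus owner ("CPU" or "DMA") for each bus cycle."""
    timeline = []
    cpu_done = 0
    dma_done = 0
    cycle = 0
    while cpu_done < cpu_cycles_needed or dma_done < dma_words:
        cycle += 1
        cpu_finished = cpu_done >= cpu_cycles_needed
        dma_turn = dma_done < dma_words and (
            cycle % steal_every == 0 or cpu_finished
        )
        if dma_turn:
            timeline.append("DMA")       # the channel steals this bus cycle
            dma_done += 1
        else:
            timeline.append("CPU")
            cpu_done += 1
    return timeline


timeline = simulate_bus(cpu_cycles_needed=6, dma_words=2)
print(timeline)
```

The CPU loses only the occasional cycle while the block moves without any instruction being executed for each word, which is the source of the performance gain over a programmed transfer.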

Fig. 2 (b) DMA block schematic

Instruction set complexity

The power of a system depends mainly on the total number of instructions it supports and more qualitatively the various addressing schemes the processor may employ.

The different addressing modes are listed and explained one by one.

==========

Fig. 3 (a) Direct addressing. Instruction format: op-code | address of operand. The address field may hold: source and destination registers; an actual memory address; a port number for input or output; a program branch label.

==========

DIRECT ADDRESSING

Fig. 3 (a) illustrates this type of accessing mechanism. In essence, every instruction has an opcode (operation code), and the way operands are accessed is implied by both the instruction mnemonic and the operands themselves. Here, the operand address is part of the instruction itself and is explicitly stated. Examples are moving data between CPU registers and loading data between memory locations and registers, where again the memory address is directly specified. Branch instructions to a direct address also fall under this category. IN and OUT instructions are a special category of this group. Instruction complexity is often dictated by attributes such as instruction type, instruction length and instruction time. To summarize the features of the direct addressing method, the following points are noteworthy:

(i) It is the fastest method (less instruction time);

(ii) The address is a part of the instruction;

(iii) It is simple and easy to implement.
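The cases listed for direct addressing (register operands, a memory address, a port number and a branch target, all stated in the instruction itself) can be sketched with a toy interpreter. The mnemonics are loosely modelled on 8085-style names, but the machine, its instruction format and the `execute` function are hypothetical.

```python
def execute(program, memory, ports):
    """Run a toy program in which every operand is named in the instruction."""
    registers = {"A": 0, "B": 0}
    pc = 0
    while True:
        opcode, *operands = program[pc]
        pc += 1
        if opcode == "LDA":                  # load A from a memory address
            registers["A"] = memory[operands[0]]
        elif opcode == "STA":                # store A to a memory address
            memory[operands[0]] = registers["A"]
        elif opcode == "MOV":                # destination and source registers
            registers[operands[0]] = registers[operands[1]]
        elif opcode == "OUT":                # port number in the instruction
            ports[operands[0]] = registers["A"]
        elif opcode == "JMP":                # branch to a direct address
            pc = operands[0]
        elif opcode == "HLT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {opcode}")


memory = [0] * 8
memory[5] = 42
ports = {}
program = [
    ("LDA", 5),          # 0: A <- memory[5]
    ("MOV", "B", "A"),   # 1: B <- A
    ("JMP", 4),          # 2: skip the next instruction
    ("STA", 6),          # 3: never executed
    ("STA", 7),          # 4: memory[7] <- A
    ("OUT", 1),          # 5: port 1 <- A
    ("HLT",),            # 6: stop
]
registers = execute(program, memory, ports)
print(registers, memory[7], ports)
```

No address calculation is needed before an operand can be reached, which is why the method is both fast and simple to implement.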
