Microelectronics: HARDWARE -- Inside the CPU







The criteria for the choice of a CPU are briefly set out. After an introduction to the nature of a program, the functions of the units within the CPU are discussed in detail. Measures to increase operating speeds are discussed.

When we are choosing a CPU for a microelectronic system, there are a number of points to consider:

• Microprocessor or microcontroller? If we are building a system that is required to process a large amount of data very quickly, then we choose a microprocessor. We can back up its powerful features with plenty of memory and a wide range of peripherals and input/output devices. If high performance and complex data processing are not essential, and particularly if the system is to take up only a small amount of space and is to be inexpensive, or is an embedded system, then we choose a microcontroller.

• Clock speed. High speed may be essential for large-scale processing, especially real-time control and complex graphics.

However, high speed makes the CPU expensive and may cause problems in the design of the circuit board. For many applications, a clock speed of 1 MHz is more than adequate.

• Bus width. The wider the address bus, the larger the address space. The wider the data bus, the easier and faster it is to process large numerical values more precisely. However, wide busses make it more difficult to lay out the circuit board. Wide busses are used in powerful computer systems, but an 8-bit data bus and a 16-bit address bus are sufficient for very many applications.

• Instruction set. The choice between a CISC processor (see CISC and RISC, below) and a RISC processor depends partly on the application.

In general, RISC processors are considered to be the faster choice.

• I/O facilities. Microcontrollers differ widely in the number of I/O pins or ports.

• On-chip facilities. Microcontrollers vary in the amount of RAM (none, or 25 bytes to 1 KB), PROM (none to 8 KB) and EPROM/EEPROM (none to 8 KB) that they have, and in whether or not they have an ADC, a DAC, and one or more timers.

Although there are many different kinds of CPU, as the previous section made clear, there are many similarities between them. This is because, although some CPUs are faster than others, some address more memory than others, and some have a bigger instruction set than others, they all basically do the same thing. Before we go on to look inside the CPU, we will outline what it does.

Programs and processing

A computer program consists of a sequence of codes. For long-term storage these may be held on a disk (floppy disk, hard disk or compact disc) or tape. They are transferred to a block of RAM when the CPU is to read them. Alternatively, the codes are stored in ROM and read from there. Each code group is either:

• an instruction to the CPU, or

• a value for it to use.

For simplicity, assume that each code group consists of a single byte.

This is actually the case in many CPUs, but some read longer codes. If a code is 8 bits long, there can be 256 (2^8) different codes, each corresponding to a different instruction. So there can be 256 different instructions to the CPU, which is more than enough for most CPUs. Similarly, there can be 256 different values (0, 1, 2, …, 255), which is certainly not enough for the calculations we may want the CPU to perform. However, there are ways of expressing larger numbers and negative numbers, and also ways of expressing numbers with greater precision (for example, numbers such as 23.456789). In SECTION 7, we will explain the ways of coding values. For the moment, we are thinking only of single-byte values.

The codes are not codes as we write or print them on paper. They are the states of sets of eight flip-flops (if SRAM) or eight MOSFET transistors (if DRAM), or eight fusible links (if PROM) and so on.

=========

CISC and RISC

Performing a logical operation by using a circuit of logic gates (hardware) is about ten times faster than running the equivalent software on a CPU. Traditionally, manufacturers increased the speed of their processors by including extra logic gates connected to perform a wider range of operations. As a result, a larger number (several hundred) of different instructions is needed to operate the CPU. This type of organization is known as a complex instruction set computer, shortened to CISC. Most 8-bit CPUs, and more powerful ones such as the Intel Pentium and Motorola 68000, are CISC processors.

Designers have analyzed the programs written for CISC machines and noted that, in practice, a relatively small number (about 20%) of instructions is used for the majority (about 80%) of operations. The other instructions are used less often. These instructions can be omitted (together with the internal hardware associated with them), leaving the MPU with a reduced instruction set (RISC) of fewer than a hundred instructions. It is also possible to eliminate the internal software that many CISC chips use for performing certain operations. This simplifies the structure and the operation of the MPU and makes it faster. It also has other advantages. For example, the instructions can all be the same length, which makes them quicker to fetch and decode. In many of the newer CPUs, all instructions can be performed in a single cycle of the system clock. Examples of RISC processors are the Digital Alpha and Intel 80960 microprocessors, and the PIC17CXX microcontrollers.

If a RISC processor has to perform a task whose instruction has been eliminated from the instruction set, it has to be programmed to perform the operation in several steps instead of just one, so it may take longer than a CISC processor would. However, because such operations are needed only rarely, there is an overall gain in speed from using the RISC processor.

==============

Suppose the CPU is reading from a DRAM chip. The states of the eight transistors (reading from D7 to D0) are:

OFF OFF OFF OFF OFF ON ON OFF

This is what one byte of a program really is. It is the physical state of a set of transistors or other electronic storage devices. When the byte is read, it becomes a set of high and low voltages on the data bus:

L L L L L H H L

To make it easier to read, we represent low voltage by '0' and high voltage by '1':

0 0 0 0 0 1 1 0

As the CPU works its way through memory reading the sequence of bytes, it finds:

00000110

01011001

01111000

00000110

00100011

10000000

01110111

01110110

This is what the CPU reads into its registers, one line (one byte) at a time. There, the corresponding bits in the register are either set (= 1) or reset (= 0). At this stage we are back to flip-flops that are set or not set.

Remember, the program is really an array of set or reset flip-flops in memory (or their equivalent, transistors that are switched on or off).

Depending on which bits are high or low in each byte, the CPU takes action accordingly. There are complicated logic circuits in the CPU which, on receiving a given combination of 0's and 1's, cause the CPU to perform a particular action.

Although they mean a lot to the CPU, rows of binary digits mean very little to us. The strings of 0's and 1's are difficult to read. To make them easier to understand, we take the rows of bits as binary numbers, and then convert them to their equivalents in hexadecimal:

06 59 78 06

23 80 77 76

This does not mean much, except to an expert in Z80 machine code.
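The two conversions described above can be sketched in Python. The first turns transistor states into bits, and the second turns bits into hex. The bit strings are copied from the listing above. The helper name `to_hex` is illustrative only.

```python
# Step 1: transistor states (D7..D0) -> bits, as for the first byte read.
states = "OFF OFF OFF OFF OFF ON ON OFF".split()
first_byte = "".join("1" if s == "ON" else "0" for s in states)

# Step 2: the eight bytes read from memory, as D7..D0 bit strings.
program_bits = [
    "00000110", "01011001", "01111000", "00000110",
    "00100011", "10000000", "01110111", "01110110",
]


def to_hex(bits: str) -> str:
    """Treat an 8-bit string as a binary number and return two hex digits."""
    value = int(bits, 2)          # e.g. "00000110" -> 6
    return f"{value:02X}"         # e.g. 6 -> "06"


hex_codes = [to_hex(b) for b in program_bits]
print(first_byte)                 # 00000110
print(" ".join(hex_codes))        # 06 59 78 06 23 80 77 76
```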

Here is the list of codes, with an explanation of what the CPU does as it reads them:

06 Load into register B the number that comes next.

59 $59.

78 Copy the value in B to the accumulator (A).

06 Load into register B the number that comes next.

23 $23.

80 Add register B to register A and leave the result in A.

77 Copy the contents of A into the address that is in the HL registers (we assume that there is already an address in HL).

76 Halt operations.

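This table shows the contents of the registers and the memory location at each step of the program ('-' means the contents are not yet known):

Code(s)   A    B    (HL)
06 59     -    59   -
78        59   59   -
06 23     59   23   -
80        7C   23   -
77        7C   23   7C
76        7C   23   7C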

Notes:

(1) numbers are in hex and the addition is hex.

(2) values are copied from one register to another, not moved.

(3) everything the CPU does is done in a series of many short, simple steps.

The example shows how the CPU is told to add two numbers and store their sum in memory. Basically, all CPUs perform this operation in the same step-by-step way, so all CPUs need a similar structure in order to do it.
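The program walked through above can be imitated in a few lines of Python. This is a minimal sketch and not a Z80 emulator. It handles only the five opcodes in the example and ignores the flags. It also assumes a hypothetical address $2000 in HL, because the text says only that HL already holds some address. The program counter behaves as described in this section: it is incremented after each byte is read.

```python
def run(memory: bytearray, hl: int) -> dict:
    """Execute the five Z80 opcodes used in the example, starting at address 0.

    HL is supplied by the caller. Flags are not modelled, and any other
    opcode raises an error.
    """
    a = b = 0
    pc = 0                          # program counter: address of the next byte
    while True:
        opcode = memory[pc]
        pc += 1                     # PC now points at the following byte
        if opcode == 0x06:          # LD B,n: load the next byte into B
            b = memory[pc]
            pc += 1
        elif opcode == 0x78:        # LD A,B: copy B into A
            a = b
        elif opcode == 0x80:        # ADD A,B: 8-bit addition, result in A
            a = (a + b) & 0xFF
        elif opcode == 0x77:        # LD (HL),A: copy A into memory at HL
            memory[hl] = a
        elif opcode == 0x76:        # HALT: stop and report the registers
            return {"A": a, "B": b}
        else:
            raise ValueError(f"opcode {opcode:02X} is not handled in this sketch")


program = bytes([0x06, 0x59, 0x78, 0x06, 0x23, 0x80, 0x77, 0x76])
memory = bytearray(0x10000)         # 64 KB: the full 16-bit address space
memory[0:len(program)] = program
HL = 0x2000                         # hypothetical address assumed to be in HL
regs = run(memory, HL)
print(f"A={regs['A']:02X} B={regs['B']:02X} (HL)={memory[HL]:02X}")
# prints A=7C B=23 (HL)=7C
```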

Inside the CPU


FIG. 1 The internal structure of a typical microprocessor includes complex logic circuits (control unit and ALU), with numerous registers, all linked by the internal bus.

FIG. 1 illustrates the main features found in most CPUs. The CPU comprises:

Control unit: This is a highly complex logic circuit which finds out what the microprocessor must do next and oversees its execution.

There are several stages in its operation and it steps from stage to stage in response to pulses arriving from the system clock. It repeats this cycle indefinitely. Control lines, not shown in the figure, run from the control unit to all other parts of the microprocessor. Through these, it exerts its control over the whole MPU. It also has connections to external parts of the system (memory, and ports for example) through the control bus.

Internal bus: This has a similar structure to the address and data busses of the external microelectronic system, and performs similar functions.

The internal bus may not have the same width as the address and data busses. For example, the Pentium has a 32-bit internal bus but has a 64-bit data bus. The 68000 family has a 32-bit internal bus, but the address bus has 24 bits and the data bus has 16 bits.

Data bus register: The register is the equivalent of an I/O port, through which the CPU receives data from the rest of the system or outputs data to it. Incoming data can be held in the register until the CPU is ready to receive it. Exchange of data between the internal bus and the data bus is under the control of the control unit.

Address bus register: This is the equivalent of an output port, and is used to transfer addresses from the internal bus to the external address bus.

Arithmetic/logic unit (ALU): The second complex logic circuit in the CPU is concerned with processing data. Its operations are controlled by the control unit. Given a single value, it can increment it (add 1) or decrement it (subtract 1). Given two values, it can add them or subtract them. As explained in SECTION 7, multiplication and division are often done by repeated addition or subtraction. However, in some processors the ALU can perform multiplication and division directly, which is faster. The ALU is also able to perform logical operations, such as bitwise AND and OR, on a pair of values.
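As a rough model of the operations just listed, here is a sketch of an 8-bit ALU in Python. It assumes that results wrap round modulo 256, as they do in an 8-bit register. The `multiply` and `divide` helpers are hypothetical names. They use repeated addition and subtraction, and `multiply` keeps a 16-bit product. Both are assumptions of this sketch and do not describe any particular CPU.

```python
MASK = 0xFF  # 8-bit results wrap round modulo 256


def inc(x: int) -> int: return (x + 1) & MASK
def dec(x: int) -> int: return (x - 1) & MASK
def add(x: int, y: int) -> int: return (x + y) & MASK
def sub(x: int, y: int) -> int: return (x - y) & MASK
def and_(x: int, y: int) -> int: return x & y   # bitwise AND
def or_(x: int, y: int) -> int: return x | y    # bitwise OR


def multiply(x: int, y: int) -> int:
    """Multiply by repeated addition, keeping a 16-bit result."""
    product = 0
    for _ in range(y):
        product = (product + x) & 0xFFFF
    return product


def divide(x: int, y: int) -> tuple[int, int]:
    """Divide by repeated subtraction; returns (quotient, remainder)."""
    if y == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    while x >= y:
        x -= y
        quotient += 1
    return quotient, x


print(hex(add(0xF0, 0x20)))   # 0x10: the carry out of bit 7 is lost
print(multiply(20, 30))       # 600
print(divide(100, 7))         # (14, 2)
```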

The remaining units of the CPU are registers. How many there are and the way they are used varies widely from one CPU to another, but there are four registers that are nearly always present:

Program counter: The CPU must always keep track of where it has got to in a program. The program counter is a register holding the address of the byte currently being read. As soon as the CPU has read the byte stored at that address, the address held in the program counter is incremented by 1. We say that the program counter now points to the next address. This address is put on the internal bus, then into the address bus register and finally on to the address bus itself. In this way, the CPU works its way through a block of memory, reading each byte as it goes. Occasionally, the microprocessor has to jump to a different part of memory and continue reading from a new address. On these occasions, the control unit puts the new address in the program counter to direct the microprocessor to the different part of the program.

Stack pointer: Another register is used to hold the address of the top of the stack. The stack is a small block of memory where very important data is stored temporarily. The action of the stack is described in SECTION 11.
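SECTION 11 describes the stack in detail. As a preview, this sketch assumes Z80-style behavior, in which the stack grows downward in memory and the stack pointer holds the address of the last byte pushed. The class name `Stack` and the starting address $8000 are illustrative assumptions.

```python
class Stack:
    """A Z80-style stack in a 64 KB memory, pushing and popping 16-bit values."""

    def __init__(self, memory: bytearray, sp: int):
        self.memory = memory
        self.sp = sp                                   # stack pointer

    def push16(self, value: int) -> None:
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = (value >> 8) & 0xFF     # high byte first
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = value & 0xFF            # then low byte

    def pop16(self) -> int:
        low = self.memory[self.sp]                     # low byte is on top
        self.sp = (self.sp + 1) & 0xFFFF
        high = self.memory[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        return (high << 8) | low


memory = bytearray(0x10000)
stack = Stack(memory, 0x8000)      # hypothetical initial stack pointer
stack.push16(0x1234)
stack.push16(0xABCD)
print(hex(stack.pop16()), hex(stack.pop16()), hex(stack.sp))
# prints 0xabcd 0x1234 0x8000: last in, first out
```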

Status register: In some CPUs, this is called a flag register. It holds essential information about the result of an operation that has just occurred. The flag register usually has a capacity of one or two bytes, but the information is stored as the separate bits within the bytes. These flag bits are individually set to 1 or reset to 0 to indicate certain events.

For example, every time a calculation gives a zero result, the zero flag (Z) is set to 1. If we want to know whether the most recent calculation gave a zero result, the CPU can look at this particular bit in the status register to see if it is 1 or 0. Another flag, the sign flag (S), is set when a calculation gives a negative result (bit 7 of the result is 1) and reset when the result is positive (zero counts as positive). A further example of a flag is the carry flag (C), which is set when there is a carry-out from a calculation. The use of this flag is illustrated in SECTION 7. The HC, or half-carry, flag is set when there is a carry from bit 3 into bit 4. This is not used in ordinary addition, but is important if the program is adding in BCD. The table below shows the positions of these flags and others in the flag register F of the Z80 microprocessor:

Bit:   7  6  5  4   3  2    1  0
Flag:  S  Z  -  HC  -  P/V  N  C

(- = bit not used)
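To make the flag rules concrete, this sketch performs an 8-bit addition and builds the F register bit by bit, following the Z80 bit positions given above. It is a simplification. The two unused bits are left at 0, because their behavior on a real Z80 is undocumented. P/V is treated as the overflow flag, which is how the Z80 uses it after an addition. The function name is hypothetical.

```python
def add8_with_flags(a: int, b: int) -> tuple[int, int]:
    """8-bit ADD as on the Z80: returns (result, F register)."""
    total = a + b
    result = total & 0xFF
    s = (result >> 7) & 1                             # sign: bit 7 set = negative
    z = 1 if result == 0 else 0                       # zero result
    h = 1 if (a & 0x0F) + (b & 0x0F) > 0x0F else 0    # carry from bit 3 into bit 4
    # Overflow: both inputs had the same sign but the result's sign differs.
    pv = 1 if (~(a ^ b) & (a ^ result) & 0x80) else 0
    n = 0                                             # N is reset by an addition
    c = 1 if total > 0xFF else 0                      # carry out of bit 7
    f = (s << 7) | (z << 6) | (h << 4) | (pv << 2) | (n << 1) | c
    return result, f


print([hex(v) for v in add8_with_flags(0x59, 0x23)])  # ['0x7c', '0x0']
print([hex(v) for v in add8_with_flags(0x80, 0x80)])  # ['0x0', '0x45']: Z, P/V, C set
```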

Instruction register: When an instruction code is read from memory, it is placed in the instruction register. The code is then passed on to the control unit, which carries out the instruction.

Accumulator and other registers: In many CPUs a special register, the accumulator (register A), is set aside as the main register used in processing. A high proportion of the instructions refer to operations involving the accumulator. We will refer to the accumulator as A from now on. Examples of operations involving A include loading data from memory into A, storing data from A into memory, incrementing and decrementing A, adding another stored value to the one held in A, and manipulating data in A by shifting the bits in various ways. These things are done according to the instruction that has been stored in the instruction register. The result of the operation is placed on the internal bus, and is often circulated back to the accumulator to replace the value that was there before. At the same time, one or more of the flags in the status register are set or reset, depending on the result of the operation. Some operations involve two values, one stored in the accumulator and one in a general register. FIG. 1 shows only one general register, but most microprocessors have several of these to make operations more flexible. The Z80, for example, has two banks of eight registers. The main register set is:

AF

BC

DE

HL

A is the accumulator and F is the flag register. The remaining six registers (B, C, D, E, H and L) can be used for storing data temporarily, for example the intermediate results in a sequence of calculations. All registers are 8 bits wide, but the six general registers can be combined in the pairs BC, DE and HL to hold 16-bit values. The machine-code program above included an example of this: the HL pair, acting as a 16-bit register, held a RAM address.
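Combining two 8-bit registers into a 16-bit value amounts to a shift followed by an OR. This is a minimal sketch with the hypothetical helper names `pair` and `split`. The first register of each pair (B, D or H) supplies the upper byte.

```python
def pair(high: int, low: int) -> int:
    """Combine two 8-bit registers (e.g. H and L) into one 16-bit value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def split(value: int) -> tuple[int, int]:
    """Split a 16-bit value back into its high and low 8-bit registers."""
    return (value >> 8) & 0xFF, value & 0xFF


H, L = 0x20, 0x00
print(hex(pair(H, L)))            # 0x2000, usable as a memory address
print(split(0x1234))              # (18, 52), i.e. ($12, $34)
```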

There is an alternate set of 8 registers, called A', F', B' and so on, which can be switched over to when a second line of processing is required. The Z80 also has:

• Register I, which points to a table of interrupt service routines stored in memory.

• Register R, which counts the number of executed program steps.

• Registers IX and IY, which are two index registers used in indexed addressing (SECTION 12).

FIG. 1 and the description above apply generally to many microprocessors and to many microcontrollers too. The main difference is that microcontrollers may also have memory and other devices, such as timers, built into them. There are also CPUs, such as the 6502, which rely solely on the accumulator for processing. In other processors there is no special accumulator register set aside for the majority of the processing. Instead, the CPU has a number of registers, any of which can be subjected to the whole range of arithmetical and logical operations. For instance, the '1200' microcontroller has no accumulator but instead has 32 registers, any of which can be used in the same way as an accumulator.

Fast processors

On the whole, the modest processing speeds of microcontrollers are more than adequate for the tasks they have to perform. A washing machine that spends about 10 minutes at each washing stage and 5 minutes in rinsing and spinning does not need a processor capable of processing data at 10 million operations per second. In contrast, a flight control computer may have a mass of data to analyze as fast as possible before passing instructions to the control surfaces of the airplane. Similarly, a PC running a complex animated graphics program in over 16 million colors and with high-quality sound requires a high speed processor. These applications have led processor designers to increase the processing speed of microprocessors in various ways. An increase in the frequency of the system clock is an obvious solution, provided that the processor can be re-designed to operate at the increased speed.

Other measures to improve operating speed include:

RISC processors: These have been described earlier in this section. In most applications, they are faster than CISC processors. Some CPUs, such as later members of the Pentium family, combine the two approaches: they accept CISC instructions but translate them internally into simpler, RISC-like operations.

===========

Microcode

A CISC processor has several hundred instructions, some of them very complex. It is not possible for these instructions to be executed directly. Instead, when an instruction is waiting to be executed, the processor looks in a special ROM on its chip and finds there a short program that tells the processor how to carry out the instruction. In this way, the instruction is replaced by a microprogram written in a special machine code known as microcode. The microcode takes over the operation of the control unit, the ALU and other units of the processor until the microprogram is completed. Calling up, decoding and executing the microprogram takes time. The decoding may take even longer than the actual execution.

RISC processors not only have fewer different instructions but all of these instructions can be executed directly as they reach the control unit. There is no microcode ROM on a RISC chip. This saves time, so nearly all instructions are executed in just one clock cycle. As a result, a RISC processor executes instructions about four times faster than a CISC processor.

===========

Wide busses: Parallel transfer of data, both inside the CPU and in the system outside, allows more data to be transferred at each cycle. The faster processors have busses that are 32 or 64 bits wide.

Dual processing: Some processors have two ALUs working in parallel, so that two instructions may be processed at once. This speeds up execution, but the technique cannot be used to full effect if the two instructions take different lengths of time to execute.

Prefetch buffer: In a conventional computer, the instruction is fetched from memory and then executed. This is referred to as the fetch-execute cycle. The time required for fetching and executing (which may involve fetching data to work on) sets the speed of operation of the processor. A processor with a prefetch buffer (such as the 8086 family, including the Pentium) does not wait for the cycle to begin before fetching the instruction. Instead, it takes any opportunity when it is not busy to fetch the next and subsequent instructions. It stores them in the buffer without executing them. The instructions are then there, stored on the chip, ready to send on to the control unit as soon as the time comes to process them.

Cache memory: This is a special type of RAM with short access time.

It may be located on the processor chip, giving the fastest access, or there may be a special cache memory chip as part of the system's RAM. It is used for storing addresses and data that might be useful to the CPU in the near future. For example, it can hold data that has been read in ahead of the time it is required. When the CPU needs this data, it looks in the cache first. If the data is there, the CPU uses it. If it is not there, the CPU looks in the normal RAM. In some systems, data is stored in Level 1 (L1) and Level 2 (L2) caches. L1 is on the CPU chip, so it is almost instantly available. The L2 cache is fast SRAM, often on the computer board. It is not read as quickly as L1, but it is faster than the DRAM used for main memory. Some processors have separate caches for data and for instructions. The instruction cache holds instructions when they are first loaded, and from there they go to the prefetch buffer.
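The "look in the cache first" rule can be sketched as follows. This is a deliberately simplified model. The cache is a Python dictionary holding single bytes, and when it is full the oldest entry is discarded (first in, first out). Real caches hold whole lines of bytes and may use other replacement policies. The class name and the capacity of four entries are assumptions.

```python
class CachedMemory:
    """Read bytes through a small cache in front of slower main RAM."""

    def __init__(self, ram: bytes, capacity: int = 4):
        self.ram = ram
        self.cache = {}              # address -> byte; dicts keep insertion order
        self.capacity = capacity
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        if address in self.cache:            # look in the cache first
            self.hits += 1
            return self.cache[address]
        self.misses += 1                     # not there: go to main RAM
        value = self.ram[address]
        if len(self.cache) >= self.capacity:
            oldest = next(iter(self.cache))  # evict the oldest entry (FIFO)
            del self.cache[oldest]
        self.cache[address] = value          # keep a copy for next time
        return value


mem = CachedMemory(bytes(range(256)))
for addr in (10, 11, 10, 11, 10):
    mem.read(addr)
print(mem.hits, mem.misses)          # 3 2
```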

Floating point unit: The floating-point format can represent large positive and negative numbers in four bytes, but it is more complicated to process. Processing in the accumulator using software routines is slow. A floating-point unit is a logic circuit designed specially for processing numbers in floating-point format, and it is appreciably quicker. It may also include circuitry for multiplication and division.
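The text does not say which four-byte floating-point format is meant. The sketch below assumes the widely used IEEE 754 single-precision format. It uses Python's standard `struct` module to show the four stored bytes, and the precision lost when a value such as 23.456789 is stored in them. The function names are hypothetical.

```python
import struct


def float_to_hex_bytes(x: float) -> str:
    """Pack x into four bytes (IEEE 754 single precision, big-endian) as hex."""
    return struct.pack(">f", x).hex(" ").upper()


def round_trip(x: float) -> float:
    """Store x in four bytes and read it back, exposing any precision lost."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


print(float_to_hex_bytes(1.0))    # 3F 80 00 00
print(float_to_hex_bytes(-2.0))   # C0 00 00 00
print(round_trip(23.456789))      # close to 23.456789, to about 7 significant digits
```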

MMX: Multimedia extensions are instructions intended for speeding up graphics and sound processing in multimedia applications. They include SIMD (single instruction, multiple data) instructions. For example, a single SIMD instruction can be used to change the color of many pixels at the same time.
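Python cannot issue MMX instructions, but the idea behind SIMD can be imitated in software. Several small values are packed into one word, and the whole word is processed at once. This technique is often called SWAR (SIMD within a register). In this illustrative sketch, four 8-bit pixel values are packed into a 32-bit word, and a single XOR inverts all four colors together. The helper names are hypothetical.

```python
def pack(values: list) -> int:
    """Pack four 8-bit values into one 32-bit word (first value in the top byte)."""
    word = 0
    for v in values:
        word = (word << 8) | (v & 0xFF)
    return word


def unpack(word: int) -> list:
    """Unpack a 32-bit word into four 8-bit values."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def invert_four_pixels(packed: int) -> int:
    """One XOR inverts all four 8-bit pixel values at once."""
    return packed ^ 0xFFFFFFFF


pixels = [0x00, 0x40, 0x80, 0xFF]
print([hex(v) for v in unpack(invert_four_pixels(pack(pixels)))])
# prints ['0xff', '0xbf', '0x7f', '0x0']
```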

Branch prediction: When a processor has performed a certain operation, it is very likely that the program will require it to perform the same operation again. For example, when the program has a loop in it, the processor has to jump back to the beginning of the loop every time it reaches the end. It may run round the loop hundreds of times, so at any stage it is safe to predict that it will jump back to the beginning when it comes to the end. Only on the last time round the loop will this prediction be wrong. Special circuits within the CPU store the instructions most recently used, on the assumption that they will be used again.
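The simplest kind of branch prediction assumes that each jump goes the same way as it did last time. This sketch checks that rule against the loop example. The one-bit predictor is an assumption for illustration, since real processors use more elaborate schemes. For a loop run 100 times, it mispredicts only once, on the final exit from the loop.

```python
def count_mispredictions(outcomes: list, initial_prediction: bool = True) -> int:
    """Predict that each branch goes the same way it went last time."""
    prediction = initial_prediction
    wrong = 0
    for taken in outcomes:
        if taken != prediction:
            wrong += 1
        prediction = taken          # remember the latest outcome
    return wrong


# A loop run 100 times: the jump back is taken 99 times, then not taken.
loop_branch = [True] * 99 + [False]
print(count_mispredictions(loop_branch))   # 1
```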

Coordinated instruction set and compiler: As will be explained in SECTION 7, a compiler is a program that lets the programmer key in the program in a form that is more understandable than machine code.

After the program has been typed in, the compiler turns it into machine code, that is, into instructions that the processor understands. Usually the instruction set is designed by the electronics engineers who design the layout and logic of the chip. After that stage is complete, the compiler is written by a software designer. This can cause problems if it turns out to be difficult to produce efficient code using some of the instructions. For many of the more recent processors, including the '1200', the instruction set and the compiler are designed at the same time, with the hardware and software engineers working as a team. The result is a processor/compiler combination that operates with the best possible efficiency to give faster-running programs.

Pipelining: In a conventional computer, a byte of data on its way from RAM to the ALU of the CPU is copied from one location to another (FIG. 2a). There is one transfer per clock cycle, so the whole process in the example takes 3 cycles. Bytes arrive at the ALU every 3 cycles.

With pipelining (FIG. 2b), the second and third bytes begin their journey before the first byte has reached the ALU, so a byte arrives at the ALU every cycle. Pipelining is advantageous but does not work well with certain kinds of instruction. Moreover, if the CPU is using branch prediction (see above) and the prediction is wrong, all the data in the pipeline has to be discarded.

FIG. 2 Pipelining is a technique for speeding up the reading of data. Without pipelining (a), a byte may take 3 clock cycles to reach the ALU. A new byte arrives every 3 cycles. With pipelining (b), a new byte arrives every cycle.
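The cycle counts in FIG. 2 generalize to a simple formula, assuming one transfer per stage per clock cycle and no stalls. Without pipelining, n bytes take 3n cycles. With pipelining, the first byte takes 3 cycles and each later byte adds one more. The function names in this sketch are hypothetical.

```python
def cycles_without_pipeline(n_bytes: int, stages: int = 3) -> int:
    """Each byte completes every stage before the next byte starts."""
    return n_bytes * stages


def cycles_with_pipeline(n_bytes: int, stages: int = 3) -> int:
    """The first byte needs `stages` cycles; each later byte arrives one cycle after."""
    if n_bytes == 0:
        return 0
    return stages + (n_bytes - 1)


for n in (1, 3, 10):
    print(n, cycles_without_pipeline(n), cycles_with_pipeline(n))
# prints 1 3 3, then 3 9 5, then 10 30 12
```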

Out of order execution: If a calculation has several steps in it, the values to be used at a given step depend on values obtained in previous steps. The steps must be executed in the correct order. However, in a system with two or more processors, it is possible for an idle processor to check ahead to find an instruction that does not depend on previous calculations, and execute that instruction ahead of time. The result will then be ready when required.
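The check that an idle unit makes before running an instruction early can be sketched as a dependency test. This hypothetical example uses made-up registers R1 to R9, and it treats every earlier instruction as not yet finished. An instruction may start early only if it does not read an earlier result, does not overwrite an earlier instruction's input, and does not write the same register. Real processors add techniques such as register renaming, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str          # human-readable form, for printing only
    dest: str          # register written
    sources: tuple     # registers read


def can_start_early(program: list) -> list:
    """Return the instructions after the first that could run ahead of time."""
    ready = []
    for i, later in enumerate(program[1:], start=1):
        conflict = any(
            earlier.dest in later.sources       # needs an earlier result
            or later.dest in earlier.sources    # would overwrite an earlier input
            or later.dest == earlier.dest       # would write the same register
            for earlier in program[:i]
        )
        if not conflict:
            ready.append(later.text)
    return ready


program = [
    Instr("R1 = R2 * R3", "R1", ("R2", "R3")),   # slow multiply, still running
    Instr("R4 = R1 + R5", "R4", ("R1", "R5")),   # needs R1: must wait
    Instr("R6 = R7 + R8", "R6", ("R7", "R8")),   # independent: can go early
    Instr("R2 = R6 - R9", "R2", ("R6", "R9")),   # needs R6, and overwrites R2
]
print(can_start_early(program))   # ['R6 = R7 + R8']
```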

===================

EXERCISE 1 -- Selecting a processor

Select a microprocessor or microcontroller that would be suitable for:

(a) switching traffic lights, with pre-settable delays at each stage.

(b) controlling an automatic weighing machine. Weights are displayed on a liquid crystal display.

(c) a hand-held stock logger, such as is used in supermarkets for shelf stock-taking.

(d) controlling a printer that is attached to a computer.

(e) monitoring the water level in a tank and warning when the rate of rise is too fast.

Use manufacturers' data sheets and other reference materials to help you make your choice. Make a list of the reasons for your choice.

EXERCISE 2 -- Inside the CPU

Investigate the internal structure of a microprocessor or microcontroller, using the manufacturers' data sheets and other reference materials. Draw a diagram to show the main units and the way data flows between them. Show the ports and indicate the direction(s) of flow of signals.

Write a brief account of the internal structure, outlining what each unit does.

List the flags of the status register (or equivalent register) and explain what each flag means.

Note any features of the device which suit it for special applications. Note any features that give the device high operating speed.

Problems on the CPU

1. What are the differences between CISC and RISC processors?

2. List the kinds of additional units that are included on the chip of several named microcontrollers.

3. What is machine code and in what forms does it exist in a microelectronic system?

4. Explain the function of the arithmetic logic unit.

5. Describe the status register of the Z80 or other named CPU, and the meaning of any three of the flags.

6. List the registers of the Z80 or other named CPU, and describe their functions.

7. List four ways in which the speed of operation of a microelectronic system may be increased.

8. Write a general account of the architecture of a named microprocessor or microcontroller, illustrated by a block diagram. Explain how the architecture is related to the functions of the device.

9. Rewrite the table to show the computer performing the operation 172 + 35 (these are decimals) and storing it in $2A2C.

10. Using the information given in the example, write a program in Z80 machine code to add 45, 125 and 23 and store the sum in $143D.

Multiple choice questions

1. The binary number 11010011101011 is represented in hex by:

A $BE43.

B $34EB.

C $13547.

D $D3A3.

2. A system is designed to have an address space from zero to $3FFF.

The number of address lines required is:

A 16 383.

B 16 384.

C 14.

D 15.

3. A processor that is programmed by only a small number of instructions is called a:

A RISC processor.

B microcontroller.

C CISC processor.

D PLC unit.

4. Programs are stored in the SRAM memory of a computer as:

A bits.

B a series of 1's and 0's.

C arrays of transistors switched on or off.

D machine code.

5. The program counter of a processor holds $14C2. The processor reads an instruction that tells it to jump to a new address, ten bytes further on. The address in the program counter will change to:

A $A.

B $14C3.

C $14CC.

D $0000.

6. A register that holds the flags may be called a:

A status register.

B program counter.

C index register.

D instruction register.

7. If the zero flag is set (=1) it indicates that:

A the result of the previous operation was not zero.

B the result of the previous operation was zero.

C the carry-out bit is 1.

D the result of the previous operation was $0000.

8. With respect to CISC processors, RISC processors:

A are slower.

B may be faster in many applications.

C are more difficult to program.

D are faster.

9. A place where data is stored for fast access is:

A cache memory.

B DRAM.

C the prefetch buffer.

D a CD-ROM drive.

10. In bitwise logic, $59 AND $33 is:

A $92.

B $8C.

C $7B.

D $11.


Updated: Thursday, May 18, 2017 10:09 PST