Random-access phrase recorder (Jan. 1987)








High-quality compressed speech data is easily recorded and replayed on a BBC microcomputer using this speech digitization unit.

by A.L. EVANS AND J. FENNER

OKI SEMICONDUCTOR DEVICES

ADPCM speech chips are available from stock at Manhattan Skyline of Maidenhead, who are sole UK distributors for OKI semiconductor parts. A chip set comprising MSM5218RS, MSM5204RS, ALP4B (two) and 384kHz ceramic resonator is available to readers at the special price of £16.70 (x1.34 for $), half the normal one-off price. Manhattan Skyline Ltd are at Manhattan House, Bridge Road, Maidenhead, Berks SL6 8DB.

The sequential nature of tape recorders limits the therapeutic and educational techniques that can be utilized. A random-access tape recorder would allow a variety of approaches to be tried, though rapid access to sequential tape/cassette recordings is expensive (ref. 1). A useful system can be implemented by using a BBC B microcomputer and storing the speech data on floppy discs, though the impracticality of storing huge files on disc makes some form of data compression desirable. The system can also be used to record a.d.p.c.m. data for prom-based synthesizers.



Fig. 3. Speech digitization unit includes eight-bit analog-to-digital converter, parallel-to-serial converter, analysis / synthesis chip and interface to the 1MHz bus.

SYSTEM DESCRIPTION

The system components are shown in Fig.1. Adaptive differential pulse code modulation, a waveform coding technique, compresses the speech data by factors of two to four.

All units are standard, readily available items apart from the speech digitization unit, the design of which is described in this article.

Data is exchanged between the digitization unit and computer memory via the 1MHz bus. Power is supplied via the 5V line in the analog port, leaving the auxiliary power socket free for the disc drive. The system is controlled by means of a touch pad (concept keyboard) which allows the recording or playback of a large number of files of data, limited only by the disc filing system (31 for the Acorn DFS). Each file can hold as much data as is allowed by the computer's memory, so the data capacity of the disc is soon approached unless an 80-track drive is used.
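As a rough illustration of why disc capacity is soon approached, the sketch below counts how many full-length recordings fit on one side of a disc. The disc geometry (10 sectors of 256 bytes per track, two sectors taken by the catalogue) is the usual single-density Acorn DFS layout and is an assumption here, as is the 24K file size; the function name is purely illustrative.

```python
# Assumed single-density Acorn DFS geometry (not given in the article).
SECTOR_BYTES = 256
SECTORS_PER_TRACK = 10
CATALOGUE_SECTORS = 2     # sectors reserved for the disc catalogue
MAX_FILES = 31            # catalogue limit quoted in the article


def files_per_side(tracks: int, file_bytes: int) -> int:
    """Number of files of the given size that fit on one disc side."""
    usable_sectors = tracks * SECTORS_PER_TRACK - CATALOGUE_SECTORS
    # Files occupy whole sectors, so round the size up.
    sectors_per_file = -(-file_bytes // SECTOR_BYTES)
    return min(MAX_FILES, usable_sectors // sectors_per_file)


# Assumed worst case: every file fills 24K of memory.
for tracks in (40, 80):
    print(tracks, "tracks:", files_per_side(tracks, 24 * 1024), "files")
```

Under these assumptions a 40-track side holds four full-length files and an 80-track side holds eight; shorter words and phrases take proportionally less space.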

In practice, a useful system has been implemented with the concept keyboard allowing a choice of nine files, each containing the speech data corresponding to a word or a short phrase. A small area of the touch pad is reserved for controlling the record or playback mode; a prompt appears on the screen so that the user is aware of the currently selected mode. Writing the data file to disc takes one to two seconds and, if the selected word or phrase is not currently in memory, there is a similar delay while data is read from the disc before speech starts.

SPEECH DIGITIZATION UNIT

Direct digitization of speech waveforms at 8kHz allows only some three seconds' worth of data to be acquired in the 32K memory of the standard BBC B microcomputer (allowing a modest 8K for program storage). To gain more recording time, the adaptive differential p.c.m. technique implemented by the OKI company in their MSM5218 chip is used to give a data compression factor of two. In addition, the system allows software selection of sampling frequency, implementation of analysis and synthesis on the same chip, easy interfacing to an input/output port, and low-power c-mos circuitry.
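The recording times quoted above can be checked with a short calculation. This is only a sketch: it assumes that all 24K left after program storage holds speech data and that each a.d.p.c.m. code occupies four bits, which is what a compression factor of two on eight-bit samples implies.

```python
# Rough recording-time estimate for the figures quoted in the text.
MEMORY_BYTES = 32 * 1024      # standard BBC B memory
PROGRAM_BYTES = 8 * 1024      # reserved for program storage
SAMPLE_RATE_HZ = 8000         # sampling frequency


def recording_seconds(bits_per_sample: int) -> float:
    """Seconds of speech that fit in the memory left over for data."""
    data_bits = (MEMORY_BYTES - PROGRAM_BYTES) * 8
    return data_bits / (SAMPLE_RATE_HZ * bits_per_sample)


direct_pcm = recording_seconds(8)   # eight-bit samples stored directly
adpcm = recording_seconds(4)        # assumed four-bit a.d.p.c.m. codes
print(f"direct: {direct_pcm:.2f} s, a.d.p.c.m.: {adpcm:.2f} s")
```

With these assumptions, direct storage gives about 3.07 seconds and four-bit codes give about 6.14 seconds.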

In the digitization unit, Fig.2, the audio input from the microphone is buffered, filtered and amplified before entering the eight-bit analog-to-digital converter (MSM5204). The data converter has a built-in sample-and-hold function, and is controlled by the start-conversion signal of the analysis/synthesis chip. Data is transferred to a parallel-to-serial converter (4014) before being presented to the 5218 as eight-bit serial data. Since the 5218 expects a 12-bit data input, the least significant four bits are padded out as additional zeros using hard-wired circuitry (4024 and 4011 on the full circuit diagram in Fig.3). The analyzed data is transferred to memory from the data pins D0-D3 of the 5218. Figure 4 shows the timing diagrams for the data transfer to and from the microcomputer via the 6821 peripheral interface adapter. Port A of the 6821 p.i.a. is reserved for bidirectional data transfer while port B is dedicated to controlling the 5218 chip functions. The master timing signal VCK from the 5218 is detected via the CB1 pin of the p.i.a. The p.i.a. is addressed by the computer at &FCEX using minimal decoding logic (Fig.3).
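The zero-padding step is done in hardware, but its effect on the data can be sketched in a few lines. The function names are illustrative only, and the most-significant-bit-first serial order is an assumption rather than something stated in the article.

```python
def pad_to_12_bits(sample8: int) -> int:
    """Place an eight-bit sample in the top of a 12-bit word, low four bits zero."""
    if not 0 <= sample8 <= 0xFF:
        raise ValueError("sample must fit in eight bits")
    return sample8 << 4


def serial_bits(word12: int) -> list[int]:
    """Return the 12 bits of the word, most significant first (assumed order)."""
    return [(word12 >> shift) & 1 for shift in range(11, -1, -1)]


word = pad_to_12_bits(0xA5)
print(hex(word))          # 0xa50
print(serial_bits(word))  # eight data bits followed by four zeros
```

The last four bits of the serial stream are always zero, which is the job the 4024 and 4011 perform in the circuit.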


Fig.1. In the random access phrase recorder, all units are standard except the speech digitization unit which uses a.d.p.c.m. to compress the speech data.


Fig. 2. Speech digitization unit is based on the OKI MSM5218 a.d.p.c.m. analysis / synthesis chip which compresses the digitized speech waveform before sending it to the computer. The same chip also decodes the data and gives a synthesized waveform as an analog output.

Data returns from the computer again via the 6821 p.i.a. The 5218 is placed into synthesis mode by taking pin 6 low (port B bit 3). The VCK signal from the 5218 still controls the data transfer (Fig.4). An analog output is produced at pin 18 and after low-pass filtering the signal enters an audio amplifier before the speech is generated by an external speaker.

The components used in the speech digitization unit are readily available (OKI chips, active filters and resonator from Manhattan Skyline, Bridge Road, Maidenhead, Berks SL6 8DP). Component costs, including interconnecting plugs, cable and the enclosure, are below £55 (x1.34 for $).

The quality of the speech produced is good, its intelligibility being close to high quality recorded speech and significantly better than l.p.c. speech in a similar environment.


Fig. 4. Timing diagrams for the data transfer between the MSM5218 and microcomputer, (a) synthesis (b) analysis. The MSM5218 provides an 8kHz clock to control the data transfer.


Fig. 5. Flow diagram of the Basic program which controls the recording and playback of speech data.


Fig. 6. System has been made 'user-friendly' by using a touch pad to control the recording and playback. A program is provided to edit layout of the paper overlay.

SOFTWARE

The programs are written in Basic with machine code routines controlling the acquisition and playback of data. The main program is described by the flow diagram of Fig.5. The system waits for a touch-pad key to be selected and then branches to the routines controlling the data transfer.

If recording of speech data is selected, the acquire routine is entered. All interrupts are disabled so that data acquisition occurs at a predictable rate. After the p.i.a. has been initialized, the system is synchronized with the VCK signal from the digitization unit and data is acquired until the recording is terminated by the touch pad. When the allocated section of memory is full, the data starts overwriting the same section, so that using a pointer allows the most recent six seconds of speech to be retained. When acquisition is terminated, the program adds a stop code, re-enables interrupts and returns to Basic so that the data may be transferred to disc.
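The overwrite-and-pointer scheme described above is a circular buffer. The following is a minimal sketch of the idea in Python rather than the authors' machine code; the class name, the one-code-per-entry storage and the stop code value are all assumptions made for illustration.

```python
STOP_CODE = 0xFF  # hypothetical marker; the article does not give its value


class RingRecorder:
    """Fixed-size store that keeps only the most recent samples."""

    def __init__(self, capacity: int) -> None:
        self.buffer = [0] * capacity
        self.pointer = 0        # next position to write
        self.wrapped = False    # True once old data has been overwritten

    def write(self, value: int) -> None:
        """Store one value, wrapping to the start when the store is full."""
        self.buffer[self.pointer] = value
        self.pointer += 1
        if self.pointer == len(self.buffer):
            self.pointer = 0
            self.wrapped = True

    def finish(self) -> list[int]:
        """Return the retained data, oldest first, followed by the stop code."""
        if self.wrapped:
            ordered = self.buffer[self.pointer:] + self.buffer[:self.pointer]
        else:
            ordered = self.buffer[:self.pointer]
        return ordered + [STOP_CODE]


recorder = RingRecorder(capacity=4)
for value in range(1, 7):   # six values into a four-entry store
    recorder.write(value)
print(recorder.finish())    # [3, 4, 5, 6, 255]
```

In the example the two oldest values are overwritten and the remaining four are returned in the order they were recorded, which is how a full memory would come to hold the most recent six seconds of speech.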

In a similar way, when playback is selected, the program initially decides whether the desired speech is already in memory and if necessary reads the data in from a disc file. Again, all interrupts are disabled before the system initializes the p.i.a. for transfer of data to the digitization unit. The system detects the VCK signal from the 5218 and transfers the data according to the timing diagram of Fig.4. When the stop code is detected, interrupts are re-enabled and the system returns to the main program.
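The playback logic can be sketched in the same spirit: check whether the phrase is already in memory, read it from disc only if not, then send codes until the stop code appears. The function names, the dictionary standing in for memory and the stop code value are hypothetical, and the timing against VCK is left out entirely.

```python
from typing import Callable, Dict, Iterator, List

STOP_CODE = 0xFF  # hypothetical marker; the article does not give its value


def fetch_phrase(name: str,
                 in_memory: Dict[str, List[int]],
                 read_from_disc: Callable[[str], List[int]]) -> List[int]:
    """Return the phrase data, reading the disc file only if it is not in memory."""
    if name not in in_memory:
        in_memory[name] = read_from_disc(name)
    return in_memory[name]


def playback_codes(data: List[int]) -> Iterator[int]:
    """Yield codes in order, ending when the stop code is met."""
    for value in data:
        if value == STOP_CODE:
            return
        yield value


# Usage with a stand-in for the disc filing system.
disc = {"HELLO": [3, 4, 5, 6, STOP_CODE, 7]}
disc_reads: List[str] = []


def read_from_disc(name: str) -> List[int]:
    disc_reads.append(name)
    return disc[name]


memory: Dict[str, List[int]] = {}
print(list(playback_codes(fetch_phrase("HELLO", memory, read_from_disc))))
print(list(playback_codes(fetch_phrase("HELLO", memory, read_from_disc))))
print(disc_reads)  # the disc is read once only
```

The second request finds the phrase already in memory, which corresponds to speech starting without the one to two second disc delay.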

The system's versatility has been extended by providing an overlay editor which allows, for example, a therapist to design an overlay for the touch pad (see below). Thus the user has control of the number and size of sensitive elements on the touch pad. The overlay editor can be entered at the beginning of the main program, but the program defaults to using the current overlay after a suitable time delay.

Software. Listings of the program controlling the acquisition and replay of the speech data, and for the overlay editor, together with the p.c. prototype layout are available from the editorial office in return for a self-addressed A4 envelope, marked 'phrase recorder'.

Aled Evans and John Fenner are medical physicists in the West of Scotland Health Board's Department of Clinical Physics and Bioengineering in Glasgow.

References:

OKI Semiconductor Application Note. Simultaneous speech analysis and synthesis with MSM5218, 1982.

1. Thomas, A. The random access tape recorder system. Proc. 4th Annual Conference on Rehabilitation Engineering, Washington D.C. 1981.

2. Keating, D., Evans, A.L., Wyper, D.J. and Cunningham, E. Comparison of the intelligibility of some low-cost speech synthesis devices. British Journal of Disorders of Communication (vol.21), 1986, pp 167-172.


(adapted from: Wireless World, Jan. 1987)
