Instruments

## TMS34082

## Designer's Handbook

# TMS34082 Designer's Handbook 

2564007-9721 revision A May 1991

Texas
INSTRUMENTS

## IMPORTANT NOTICE

Texas Instruments (TI) reserves the right to make changes to or to discontinue any semiconductor product or service identified in this publication without notice. TI advises its customers to obtain the latest version of the relevant information to verify, before placing orders, that the information being relied upon is current.

Tl warrants performance of its semiconductor products to current specifications in accordance with Tl's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Unless mandated by government requirements, specific testing of all parameters of each device is not necessarily performed.

TI assumes no liability for Tl applications assistance, customer product design, software performance, or infringement of patents or services described herein. Nor does TI warrant or represent that license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such semiconductor products or services might be or are used.

Texas Instruments products are not intended for use in life-support appliances, devices, or systems. Use of a TI product in such applications without the written consent of the appropriate TI officer is prohibited.

## Preface

## Read This First

## How to Use This Manual

The purpose of this user's guide is to provide the TI customer with information on the TMS34082 graphics floating-point processor. This manual can also be used as a reference guide for developing hardware or software applications. The following list summarizes the contents of the chapters and appendices in this user's guide.
Chapter 1 Overview of the TMS34082 Introduces the TMS34082, its key features, typical applications, and support tools available.

## Chapter 2 Pinout and Pin Descriptions

Illustrates the TMS34082s package, identifies the interfaces that signals are associated with, and provides an explanation of each signal.

## Chapter 3 Data Formats

Discusses the integer and floating-point operand formats accepted by the TMS34082.

## Chapter 4 Architecture

Describes the architectural elements of the TMS34082. Includes the bus interfaces, sequence control, registers, internal floating-point unit core, and test logic.

## Chapter 5 Coprocessor Mode

Describes using the TMS34082 as a coprocessor to the TMS34020, including the hardware interface, recommended configurations, and example programs with timing diagrams.

## Chapter 6 Host-Independent Mode

Provides information on using the TMS34082 as a stand-alone processor or a coprocessor to another host.

## Chapter 7 Internal Instructions

Shows how to use internal instructions in both coprocessor and host-independent mode. Explains the format and provides an alphabetical reference of the internal instruction set.

## Chapter 8 External Instructions

Shows how to use external instructions in both coprocessor and host-independent mode. Explains the format and provides an alphabetical reference of the external instruction set.

## Appendix A System Design Considerations

Provides recommendations on logic design, bypass capacitors, PWB design, and thermal considerations.

Appendix B TMS34082 Data Sheet
Contains the commercial data sheet for the TMS34082A.

## Appendix C SMJ34082 Data Sheet

Contains the advance information military data sheet for the SMJ34082A.

## Appendix D Maximizing Your MFLOPS with the TMS34082 and Motorola MC68030

Contains an application note on interfacing the TMS34082 (in host-independent mode) to the Motorola MC68030.

## Appendix E A High-Performance Floating-Point Image Computing Workstation for Medical Applications

Contains an application note on an imaging system using a TMS34020 with four TMS34082 coprocessors.

Appendix F Parallel Signal and Matrix Processing with the TMS34082
Contains an application note outlining and analyzing a TMS34082-based parallel architecture design.

## Related Documentation

The following documents are available from Texas Instruments. To obtain a copy of any of these TIdocuments, please call the Customer Response Center (CRC) at (800) 232-3200 unless otherwise noted. When ordering, please identify the book by its title and corresponding literature number.

TMS34082A Data Sheet (literature number SCGS001) is included in Appendix B of this book. It contains electrical specifications, timing information, and mechanical data for the TMS34082A.

SMJ34082A Data Sheet (literature number SGUS012A) is included in Appendix C of this book. It contains electrical specifications, timing information, and mechanical data for the SMJ34082A.

TMS34020 User's Guide (literature number SPVU019) discusses hardware aspects of the TMS34020, such as pin functions, architecture, stack operations, and interfaces. Contains the TMS34020 instruction set and interface to the TMS34082.

TMS34020 Data Sheet (literature number SPVS004) contains electrical specifications, timing information, and mechanical data for the TMS34020.

TMS34082 Software Tool Kit User's Guide describes the C compiler, assembler, linker, librarian, and simulator that are available for developing TMS34082 external instruction code. Call your TI sales representative for the demonstration version of the tool kit.

TMS340 Family Code-Generation Tools User's Guide (literature number SPVU004) describes the C compiler, assembler, linker, archiver, and auxiliary tools that are available for developing TMS34010, TMS34020, or TMS34020/TMS34082 code.

TMS34082 Assembly Support for Code-Generation Tools User's Guide (literature number SPVU029) summarizes the instruction code used with the TMS34082.

TIGA Interface User's Guide (literature number SPVU015) describes the Texas Instruments Graphics Architecture (TIGA), a software interface that standardizes communication between application software and TMS340-based hardware for IBM-compatible PCs.

TMS34082 3-D Graphics Library User's Guide describes an extensive array of C -callable functions including polygon clipping, shading, and vector and matrix operations. This library is TIGA-compatible and can also be used in non-TIGA applications. Call your TI sales representative or the DVP System Engineering Hotline for information on purchasing this product.

You may also find the following documentation useful. Many of the complex graphics instructions in the TMS34082 are based on algorithms found in this book:

Foley, James, and Andries van Dam. Fundamentals of Interactive Computer Graphics. Reading, Massachusetts: Addison-Wesley, 1982.

## Style and Symbol Conventions

This document uses the following conventions.
Program listings, program examples, filenames, and symbol names are shown in a special typeface similar to a typewriter's. Examples use a bold version of the special typeface for emphasis.

Here is a sample program listing:

| 0011 | 0005 | 0001 | .field | 1,2 |
| :--- | :--- | :--- | :--- | :--- |
| 0012 | 0005 | 0003 | .field | 3,4 |
| 0013 | 0005 | 0006 | .field | 6,3 |
| 0014 | 0006 |  | .even |  |

In syntax descriptions, the instruction is in a bold typeface font and parameters are in an italic typeface. Portions of a syntax that are in bold should be entered as shown; portions of a syntax that are in italics describe the type of information that should be entered. Here is an example of an instruction syntax:

## NEGF CRs, CRd

This instruction has two parameters, indicated by CRs and CRd. When you use NEGF, the parameters must be actual TMS34082 registers, such as RA9 and RB1.

Square brackets ( [ and ]) identify an optional parameter. If you use an optional parameter, you specity the information within the brackets; you don't enter the brackets themselves. Here's an example of an instruction that has an optional parameter:

## MOVD *Rs+, CRd[, count]

The MOVD instruction has three parameters. The first two parameters, Rs and CRd, are required. The third parameter, count, is optional. As this syntax shows, if you use the optional third parameter, you must precede it with a comma.

In the internal instruction set listings, Rs and Rd refer to TMS34020 source and destination registers, respectively. CRs and CRd refer to coprocessor or TMS34082 registers.

## Trademarks

EPIC, SCOPE, and TIGA are trademarks of Texas Instruments Incorporated. IBM, PC-DOS, and PC/AT are trademarks of International Business Machines, Inc.

MS-DOS is a trademark of Microsoft Corporation.
NeXT is a trademark of NeXT, INC.
PAL is a registered trademark of Monolithic Memories, Inc.
X Windows Systems is a trademark of the Massachusetts Institute of technology.

## If You Need Assistance . . .

| If you want to . . . Do this . . |  |
| :---: | :---: |
| Receive more information about TI floating-point products | Call the CRC Hot Line: <br> (800) 232-3200 <br> Or write to: <br> Texas Instruments Incorporated <br> Datapath VLSI Products <br> Marketing Communications <br> P.O. Box 655303, M/S 8316 <br> Dallas, Texas 75265 |
| Order TI documentation | Call the CRC Hot Line: (800) 232-3200 |
| Ask questions about product operation or report suspected problems | Call DVP Systems Engineering Hot Line: (214) 997-3970 |
| Inquiries related to this document: | Write to: <br> Texas Instruments Incorporated Datapath VLSI Products Marketing Communications P.O. Box 655303, M/S 8316 Dallas, Texas 75265 |

## Contents

1 Overview of the TMS34082 ..... 1-1
1.1 TMS34082 Key Features ..... 1-2
1.2 Performance Benchmarks ..... 1-3
1.3 TMS34082 General Description ..... 1-4
1.4 Typical Applications ..... 1-9
1.5 Development Tools ..... 1-11
1.5.1 TMS34082 Software Tool Kit ..... 1-11
1.5.2 TMS34082 3-D Graphics Library ..... 1-12
1.5.3 TMS34082 Demonstration Board ..... 1-13
1.6 TMS34020 Graphics System Processor ..... 1-14
1.6.1 TMS34020 Key Features ..... 1-14
1.6.2 TMS34020 Software Tools ..... 1-17
1.6.3 TIGA Graphics Interface ..... 1-19
1.6.4 TMS34020 Software Development Board ..... 1-20
1.7 TMS34082 Ordering Information ..... 1-21
1.8 Technical Assistance ..... 1-22
2 Pinout and Pin Descriptions ..... 2-1
2.1 Pinout ..... 2-2
2.2 Pin Functional Descriptions ..... 2-5
3 Data Formats ..... 3-1
3.1 Integer Formats ..... 3-2
3.1.1 Signed Integers ..... 3-2
3.1.2 Unsigned Integer ..... 3-2
3.2 Floating-Point Formats ..... 3-3
3.2.1 Single-Precision Floating-Point ..... 3-3
3.2.2 Double-Precision Floating-Point ..... 3-3
3.2.3 Denormal and Wrapped Numbers ..... 3-4
3.2.4 Special Floating-Point Numbers ..... 3-5
3.2.5 Range of Floating-Point Numbers ..... 3-6
4 Architecture ..... 4-1
4.1 Functional Block Diagram ..... 4-2
4.2 Operating Modes ..... 4-3
4.3 Bus Interfaces ..... 4-4
4.3.1 LAD Bus ..... 4-4
4.3.2 MSD Bus ..... 4-5
4.4 Sequence Control ..... 4-7
4.5 Registers ..... 4-8
4.5.1 Register Files RA and RB ..... 4-11
4.5.2 Feedback Registers $C$ and $C T$ ..... 4-12
4.5.3 Configuration Register (CONFIG) ..... 4-13
4.5.4 Status Register ..... 4-19
4.5.5 Indirect Address Register ..... 4-22
4.5.6 Stack ..... 4-22
4.5.7 Interrupt Vector Register ..... 4-23
4.5.8 Interrupt Return Register ..... 4-23
4.5.9 COUNTX and COUNTY Registers ..... 4-24
4.5.10 MIN-MAX/LOOPCT Register ..... 4-24
4.6 FPU Core ..... 4-25
4.6.1 Operand Selection ..... 4-25
4.6.2 Pipeline Registers ..... 4-27
4.6.3 ALU ..... 4-29
4.6.4 Multiplier ..... 4-29
4.6.5 Output Control ..... 4-31
4.7 $\overline{R E S E T}$ and RDY ..... 4-32
4.8 Emulation Control ..... 4-33
4.9 JTAG Test Port ..... 4-34
4.9.1 Test Instructions ..... 4-34
4.9.2 Boundary Scan Register ..... 4-35
5 Coprocessor Mode ..... 5-1
5.1 TMS34020/TMS34082 Interface Overview ..... 5-2
5.2 Clocks ..... 5-4
5.3 TMS34082 Initialization ..... 5-4
5.4 Configuration Register Settings for Coprocessor Mode ..... 5-5
5.4.1 Exception Masks ..... 5-5
5.4.2 Fast vs IEEE Mode ..... 5-5
5.4.3 Pipeline Mode Settings ..... 5-5
5.5 TMS34020/TMS34082 LAD Bus Operation ..... 5-6
5.5.1 LAD Bus Protocol ..... 5-7
5.5.2 Enabling the LAD Bus Drivers ..... 5-12
5.5.3 Bus Faults ..... 5-12
5.6 Polling the Coprocessor ..... 5-14
5.7 Interrupt Handling ..... 5-15
5.7.1 Exception Detect Interrupts ..... 5-15
5.7.2 Software Interrupts ..... 5-16
5.7.3 Interrupting the TMS34020 ..... 5-16
5.8 TMS34020/TMS34082 Code Example ..... 5-18
5.9 TMS34020/TMS34082 Timing Examples ..... 5-20
5.10 MSD Bus Operation in Coprocessor Mode ..... 5-24
5.10.1 Connecting External Memory ..... 5-24
5.10.2 TMS34082 External SRAM Timing Analysis ..... 5-25
5.10.3 Using External Code ..... 5-26
5.11 TMS34020/TMS34082/SRAM Code Example ..... 5-28
5.12 Multiple TMS34082s ..... 5-36
6 Host-Independent Mode ..... 6-1
6.1 Initialization ..... 6-2
6.1.1 Pin Connections ..... 6-2
6.1.2 Bootstrap Loader ..... 6-2
6.2 LAD Bus ..... 6-4
6.2.1 Control Signals ..... 6-4
6.2.2 Immediate Data Transfers ..... 6-5
6.3 MSD Bus ..... 6-7
6.3.1 MSD Bus Control Signals ..... 6-7
6.3.2 Memory Models ..... 6-8
6.4 Reset ..... 6-8
6.5 Wait States ..... 6-9
6.6 User Programmable Outputs ..... 6-9
6.7 Conditional Code Input ..... 6-9
6.8 Interrupts ..... 6-10
6.8.1 Hardware Interrupts ..... 6-10
6.8.2 Software Interrupts ..... 6-10
6.8.3 Exception Detect Interrupts ..... 6-11
7 Internal Instructions ..... 7-1
7.1 Internal Instructions Overview ..... 7-2
7.2 Complex Graphics Instructions ..... 7-4
7.3 Internal Routine Addresses and Cycle Counts ..... 7-7
7.4 Coprocessor Mode Internal Instruction Format ..... 7-14
7.4.1 Coprocessor ID Field ..... 7-14
7.4.2 Register Field ..... 7-14
7.4.3 Addressing Mode Field ..... 7-15
7.4.4 FPU Operation Field ..... 7-15
7.5 Type, Size, and I Fields ..... 7-16
7.6 Internal Instruction Opcodes ..... 7-17
8 External Instructions ..... 8-1
8.1 Overview ..... 8-2
8.2 FPU Processing Instruction Format ..... 8-2
8.2.1 FPU Processing Sequencer Opcodes ..... 8-3
8.2.2 Operand Selection ..... 8-3
8.2.3 FPU Processing Instruction Codes ..... 8-8
8.3 External Instruction Cycle Counts ..... 8-9
8.4 General Restrictions for External Instructions ..... 8-13
8.5 External Assembly Instructions ..... 8-14
A System Design Considerations ..... A-1
A. 1 Logic Design ..... A-2
A. 2 Bypass Capacitors ..... A-3
A. 3 PWB Design ..... A-4
A. 4 Clock Routing ..... A-5
A. 5 Thermal Considerations ..... A-6
B TMS34082A Data Sheet ..... B-1
C SMJ34082A Data Sheet ..... C-1
D Maximizing Your MFLOPS with the TMS34082 and Motorola MC68030 ..... D-1
Overview ..... D-3
Objectives ..... D-3
TMS34082 Overview ..... D-3
System Architecture ..... D-4
System Overview ..... D-4
Objectives and Trade-Offs ..... D-5
Software Description ..... D-5
Overview of Code Development ..... D-5
Big Endian, Little Endian ..... D-6
TMS34082 Code Development ..... D-8
Motorola MC68030 Code Development ..... D-9
Intel 80286 Code Development ..... D-10
Hardware Description ..... D-11
Overview ..... D-11
PC/AT Interface ..... D-11
Host Processor Interface ..... D-12
TMS34082 as a Parallel Processor ..... D-12
Performance Analysis ..... D-13
System Information - Parts List ..... D-14
Schematics - Hardware Design ..... D-15
PAL ${ }^{\circledR}$ Code Listing ..... D-39
Memory Decode for TMS34082 Accelerator Board ..... D-39
I/O Decode for TMS34082 Accelerator Board ..... D-41
Status Control for TMS34082 Accelerator Board ..... D-42
Byte Enable Decode for TMS34082 Accelerator Board ..... D-44
Pattern Decode for TMS34082 Accelerator Board ..... D-45
Software Listings ..... D-47
References ..... D-47
E A High Performance Floating-Point Image Computing Workstation for Medical ApplicationsE-1
Abstract ..... E-3
Introduction ..... E-4
Background ..... E-5
System Architecture ..... E-6
NeXT ${ }^{\text {tm }}$ Host System and Interface Logic ..... E-6
Processors ..... E-8
Memory ..... E-9
Video Display ..... E-9
Software Architecture ..... E-10
Application Areas ..... E-12
PACS Workstation ..... E-12
Electronic Alternator ..... E-13
Image Processing and Graphics ..... E-14
Conclusion ..... E-16
Acknowledgements ..... E-17
References ..... E-17
F Parallel Signal and Matrix Processing with the TMS34082 ..... F-1
Introduction ..... F-3
The HARP Architecture ..... F-4
TMS34082 Host-Independent Mode Optimizations ..... F-8
Algorithms ..... F-10
Simulation Results and Performance Analysis ..... F-13
Conclusion ..... F-19
Bibliography ..... F-20

## Figures

Figure 1-1. TMS34082 High-Level Block Diagram ..... 1-4
Figure 1-2. Coprocessor Mode Bus Architectures ..... 1-7
Figure 1-3. Host-Independent Mode Bus Architectures ..... 1-8
Figure 1-4. Sample TMS34082 Architectures ..... 1-10
Figure 1-5. Overview of TMS34082 Code-Generation Tools ..... 1-11
Figure 1-6. TMS34082 Demonstration Board Block Diagram ..... 1-13
Figure 1-7. TMS34020 High-Level Block Diagram ..... 1-15
Figure 1-8. TMS34020 and TMS34082 Software Tools ..... 1-18
Figure 1-9. Graphics Processing Shared Between TMS340 and Host Processors ..... 1-19
Figure 1-10. TMS34020 SDB Block Diagram ..... 1-20
Figure 2-1. TMS34082 Pinout, 145-Pin PGA Package ..... 2-2
Figure 3-1. IEEE Signed Integer Format ..... 3-2
Figure 3-2. IEEE Unsigned Integer Format ..... 3-2
Figure 3-3. IEEE Single-Precision Format ..... 3-3
Figure 3-4. IEEE Double-Precision Format ..... 3-4
Figure 3-5. Special Floating-Point Formats ..... 3-5
Figure 4-1. Functional Block Diagram ..... 4-2
Figure 4-2. Register Usage ..... 4-8
Figure 4-3. TMS34082 Register Model ..... 4-9
Figure 4-4. General-Purpose Registers ..... 4-11
Figure 4-5. Register Files with ONEFILE High ..... 4-12
Figure 4-6. Host-Independent Mode LAD Bus Configuration for LADCFG High ..... 4-15
Figure 4-7. MSD Bus Configuration for MEMCFG Low ..... 4-16
Figure 4-8. MSD Bus Configuration for MEMCFG High ..... 4-16
Figure 4-9. Indirect Address Register Format ..... 4-22
Figure 4-10. Stack Register Format ..... 4-23
Figure 4-11. Interrupt Vector Register Format ..... 4-23
Figure 4-12. Interrupt Return Register Format ..... 4-23
Figure 4-13. COUNT Registers Format ..... 4-24
Figure 4-14. MIN-MAX/LOOPCT Register Format ..... 4-24
Figure 4-15. FPU Core Functional Block Diagram ..... 4-26
Figure 4-16. Effects of Pipelining ..... 4-28
Figure 4-17. Functional Diagram for ALU ..... 4-29
Figure 4-18. Functional Diagram for Multiplier ..... 4-30
Figure 4-19. Instruction Register Order of Scan ..... 4-34
Figure 4-20. Boundary Scan Register Order of Scan ..... 4-36
Figure 5-1. TMS3402/TMS34082 Register Model ..... 5-2
Figure 5-2. TMS34020/TMS34082 Interconnection ..... 5-3
Figure 5-3. Transferring a Command from the TMS34020 to the TMS34082 ..... 5-8
Figure 5-4. Transferring TMS34020 Registers to the TMS34082 ..... 5-9
Figure 5-5. Transferring from the TMS34082 to a TMS34020 Register ..... 5-10
Figure 5-6. Transferring Memory to the TMS34082 ..... 5-11
Figure 5-7. Transferring from the TMS34082 to Memory ..... 5-12
Figure 5-8. Multiply 2 Double-Precision Numbers in TMS34020 Registers and Store Back to TMS34020 Registers (Mode 1) ..... 5-21
Figure 5-9. Add 2 Single-Precision Numbers from DRAM and Store Result Back to DRAM (Mode 2) ..... 5-22
Figure 5-10. Add 2 Single-Precision Numbers from DRAM and Store Result Back to DRAM (Mode 2), Instructions Not in TMS34020 Cache ..... 5-23
Figure 5-11. TMS34020/TMS34082/SRAM with Minimal SRAM Code Space (MEMCFG $=\mathrm{L}$ ) ..... 5-24
Figure 5-12. TMS34020/TMS34082/SRAM with Maximum SRAM Code/Data Space (MEMCFG = L) ..... 5-25
Figure 5-13. Memory Map for External Memory ..... 5-26
Figure 5-14. Example Subroutine Using the Jump Table ..... 5-27
Figure 5-15. TMS34020 with Multiple TMS34082/SRAM Blocks (MEMCFG = L) ..... 5-37
Figure 6-1. Bootstrap Loader ..... 6-3
Figure 6-2. Using FIFOs on the LAD Bus ..... 6-4
Figure 6-3. Using COINT as a Device Select (LADCFG=H) ..... 6-5
Figure 7-1. Source for Internal Instructions in Coprocessor Mode ..... 7-1
Figure 7-2. 3-D Graphics Pipeline Using TMS34082 Complex Instructions ..... 7-4
Figure 8-1. Source of Instructions for Coprocessor Mode ..... 8-1
Figure 8-2. Instructions in Host-Independent Mode ..... 8-1
Figure 8-3. Operand Selection ..... 8-4
Figure A-1. Example of Using $\overline{R A S}$ and $\overline{\text { CAS }}$ Buffers in Coprocessor Mode ..... A-2
Figure A-2. Recommended Bypass Capacitor Placement ..... A-3
Figure A-3. Recommended Clock Routing Techniques ..... A-5
Figure D-1. Motorola MC68030 Interface to the TMS34082 - Block Diagram ..... D-4
Figure D-2. Data Organization in Memory ..... D-7
Figure D-3. Block Diagram - TMS34082 Code ..... D-8
Figure D-4. Block Diagram - Motorola MC68030 Code ..... D-9
Figure D-5. Block Diagram - PC/AT Code ..... D-10
Figure D-6. PC/AC Interface: I/O and Memory Addressing ..... D-11
Figure D-7. Motorola MC68030 Interface: Memory Addressing ..... D-12
Figure D-8(a). Block Diagram ..... D-15
Figure D-8(b). Block Diagram ..... D-16
Figure D-9. PC/AT I/F and Control, Details of U1, U5, U6, U7, and U8 ..... D-17
Figure D-10. PC/AT I/F and Control, Details of U29 ..... D-18
Figure D-11. PC/AT I/F and Control, Details of U2, U3, and U4 ..... D-19
Figure D-12. Motorola MC68030 and Address Buffers, Details of U31, U32, and U33 ..... D-20
Figure D-13. Motorola MC68030 and Address Buffers, Details of U10 ..... D-21
Figure D-14. Motorola MC68030 and Address Buffers, Details of Oscillator and U30 ..... D-22
Figure D-15. Motorola MC68030 Decode/Control, Details of U11 ..... D-22
Figure D-16. Motorola MC68030 Decode/Control, Details of RP1, RP2, and RP3 ..... D-23
Figure D-17. Motorola MC68030 Decode/Control, Details of U11, U12, U13, and U30 ..... D-24
Figure D-18. $8 \mathrm{~K} \times 8$ SRAM, Details of U15, U16, U17, and U18 ..... D-25
Figure D-19. Motorola MC68030 Decode/Control, Details of U3, and U37 ..... D-27
Figure D-20. $8 \mathrm{~K} \times 8$ DP-SRAM, Details of U14, and U19 ..... D-28
Figure D-21. FIFO Logic, Details of U23, U24, U25, and U26 ..... D-29
Figure D-22. $8 \mathrm{~K} \times 8 \mathrm{DP}-$ SRAM, Details of U20 ..... D-31
Figure D-23. FIFO Logic, Details of U21 ..... D-32
Figure D-24. FIFO Logic DP-SRAM, Details of U22 ..... D-33
Figure D-25. FIFO Logic, Details of U14 ..... D-34
Figure D-26. TMS34082, Details of U38 ..... D34
Figure D-27. TMS34082, Details of U28 ..... D-35
Figure D-28. TMS34082, Details of U28 ..... D-36
Figure D-29. AT-Bus Connector ..... D-37
Figure D-30. Capacitors ..... D-38
Figure E-1. UWGSP3 Software Architecture ..... E-6
Figure E-2. UWGSP3 Software Architecture ..... E-10
Figure E-3. Lowpass Filter Specification Window ..... E-11
Figure E-4. Image Load (left) and Virtual Frame Buffer (right) Windows ..... E-12
Figure F-1. System Architecture ..... F-5
Figure F-2. System Memory Map ..... F-6
Figure F-3. PE Architecture ..... F-7
Figure F-4. LAD Bus Controller Architecture ..... F-7
Figure F-5: Parallel Jaccobi Updating on a Systolic Architecture ..... F-12
Figure F-6. Matrix Multiplication Performance on 10-Processor Systems ..... F-14
Figure F-7. $28 \times 128$ Matrix Multiplication on P-Processor Systems ..... F-14
Figure F-8. QRD Performance on 10-Processor Systems ..... F-16
Figure F-9. $\quad 28 \times 128$ QRD on P-Processor Systems ..... F-16
Figure F-10. SVD Performance on 8-Processor Systems ..... F-17
Figure F-11. $48 \times 48$ SVD on P-Processor Systems ..... F-18

## Tables

Table 1-1. TMS34082 Integer Benchmark Timings ..... 1-3
Table 1-2. TMS34082 Floating-Point Benchmark Timings ..... 1-3
Table 1-3. Description of the Benchmarks Used ..... 1-3
Table 1-4. Applications for the TMS34082 ..... 1-9
Table 1-5. TMS34082 Product Information ..... 1-21
Table 2-1. Pin Assignments (PGA Package) ..... 2-3
Table 2-2. Alphabetical Listing - Pin Assignments (PGA Package) ..... 2-4
Table 2-3. LAD Bus Signals ..... 2-5
Table 2-4. MSD Bus Signals ..... 2-7
Table 2-5. Clock and Control Signals ..... 2-9
Table 2-6. Emulation Control Signals ..... 2-9
Table 2-7. Power and N/C Signals ..... 2-10
Table 3-1. Floating-Point Number Representations ..... 3-6
Table 4-1. MSD Bus Control Signals ..... 4-5
Table 4-2. Memory Operations on MSD ..... 4-5
Table 4-3. Internal Registers ..... 4-10
Table 4-4. Configuration Register Definition ..... 4-14
Table 4-5. Pipeline Settings ..... 4-17
Table 4-6. Handling Wrapped Multiplier Outputs ..... 4-18
Table 4-7. Data Ordering for Loads/Stores ..... 4-18
Table 4-8. Rounding Modes ..... 4-19
Table 4-9. Status Register Definition ..... 4-20
Table 4-10. Signal States During Reset ..... 4-32
Table 4-11. Test Modes ..... 4-33
Table 4-12. Test Pins for Normal Operation ..... 4-34
Table 4-13. Instruction Register Opcodes ..... 4-35
Table 4-14. Boundary Scan Register Enable Bits ..... 4-35
Table 5-1. Recommended TMS34082 Pin Connections ..... 5-3
Table 5-2. Bus Cycle Completion Conditions ..... 5-13
Table 5-3. Bit Definitions for TMS34020 Status Check Command ..... 5-14
Table 5-4. Parameters Used for Calculating SRAM Speed ..... 5-25
Table 6-1. Pin Connections ..... 6-2
Table 7-1. Internal ROM Routines (for Mode 0 FPU Operations) ..... 7-8
Table 7-2. Coprocessor IDs ..... 7-14
Table 7-3. Addressing Modes ..... 7-15
Table 7-4. Operand Types ..... 7-16
Table 8-1. Cycle Counts for External Instructions ..... 8-9
Table 8-2. Bit Definitions for External Instructions ..... 8-14
Table D-1. Performance Comparison Chart ..... D-13
Table D-2. System Information - Parts List ..... D-14
Table D-3. $\quad 8 \mathrm{~K} \times 8$ SRAM DP-SRAM, Detail Pin Assignments for U15, U16, U17, and U18 ..... D-26
Table D-4. FIFO Logic, Details of Pin Assignments for U23, U24, U25, and U26 ..... D-30
Table F-1. Distributed FFT Performance Results ..... F-15
Table F-2. Pipelined FFT Performance Results for Real-Time Signal and Image Processing F-17

## Examples

Example 5-1. Using the Status Check Command ..... 5-14
Example 5-2. Saving and Restoring the TMS34082 Machine State ..... 5-16
Example 5-3. Multiplying Two $3 \times 3$ Matrices ..... 5-18
Example 5-4. Instructions for a $3 \times 3$ by $3 \times 3$ Matrix Multiply ..... 5-19
Example 5-5. Assembler Source for Double-Precision Multiply ..... 5-20
Example 5-6. Assembler Source for Single-Precision Add ..... 5-20
Example 5-7. TMS34020 Assembler Listing for $3 \times 3$ by $3 \times 3$ Matrix Multiply ..... 5-29
Example 5-8. TMS34082 Assembler Listing for $3 \times 3$ by $3 \times 3$ Matrix Multiply ..... 5-30
Example 5-9. Assembler Code for Multiple TMS34082s ..... 5-38

## Chapter 1

## Overview of the TMS34082

The Texas Instruments TMS34082 Graphics Floating-Point Processor is designed for your advanced numeric applications. This high-performance device offers an outstanding price/performance ratio, flexibility, and ease of use with TI's development tools. The TMS34082 acts as either a tightly coupled coprocessor for the TMS34020 Graphics System Processor (GSP), as an independent processor, or as a coprocessor to another host.

By integrating a 64-bit IEEE Floating-Point Unit (FPU) with a modified Harvard architecture microprocessor and multi-port register files onto a single device, the TMS34082 can sustain exceptionally high internal throughput rates. All internal data paths are 64 bits wide. The RISC-like basic instruction set executes at a rate of one instruction per clock cycle. In addition, many popular numeric and graphics routines are contained directly on-chip.

The TMS34082 offers an attractive cost/performance ratio and supports the integration of graphics- and computation-intensive solutions in a single, low-cost device. The cost per MFLOP performance achieved by the TMS34082 makes it an ideal floating-point solution.

Texas Instruments supports the TMS34082 with a complete set of PC-based hardware and software developmenttools, including an easy-to-use simulator, a TMS34020/TMS34082 software development board, a TMS34082 demonstration board, a 3-D graphics library, an optimizing C compiler, a macro-assembler, and software libraries.

### 1.1 TMS34082 Key Features

High-performance floating-point RISC processor optimized for graphics
Two operating modes:
Floating-point coprocessor for the TMS34020 Graphics System Processor Independent floating-point processor

Direct connection to TMS34020 coprocessor interface
Direct extension to the TMS34020 instruction set Multiple TMS34082 capability

Fast instruction cycle time:
TMS34082-40 . . 50-ns coprocessor mode, 50-ns host-independent mode
TMS34082-32. . .62.5-ns coprocessor mode, 60-ns host-independent mode

Sustained data transfer rates of 160M bytes/second (TMS34082-40)
Sequencer executes internal or user-programmed instructions
Twenty-two 64-bit data registers
Comprehensive floating-point and integer instruction set Internal programs for vector, matrix, and 3-D graphics operations

Full IEEE Std 754-1985 compatibility:
Addition, subtraction, multiplication, and comparison
Division and square root
Selectable data formats:
32-bit integer
32-bit single-precision floating-point
64-bit double-precision floating-point
External memory addressing capability:
Program storage (up to 64 K words)
Data storage (up to 64 K words)
$0.8-\mu \mathrm{m}$ EPIC ${ }^{\text {TM }}$ CMOS technology
High-performance
Low power (<1.5 W)

### 1.2 Performance Benchmarks

Tables 1-1 and 1-2 show benchmark timings. Table 1-3 describes the benchmarks selected to show TMS34082 performance.

Table 1-1. TMS34082 Integer Benchmark Timings ${ }^{\dagger}$

| Benchmark | Units of Measure | Integer |  |
| :--- | :--- | :---: | :---: |
|  |  | TMS34082A-32 | TMS34082A-40 |
| MIPS Equivalents | MIPS | 32 | 40 |
| Dhrystones | Dhrystones/second | 10,240 | 12,800 |

## Table 1-2. TMS34082 Floating-Point Benchmark Timings ${ }^{\dagger}$

| Benchmark | Units of Measure | Single-Precision |  | Double-Precision |  |
| :--- | :--- | :---: | :---: | :---: | :---: |
|  |  | TMS34082A-32 | TMS34082A-40 | TMS34082A-32 | TMS34082A-40 |
| Peak MFLOPS | MFLOPS | 32 | 40 | 16 | 20 |
| Linpack | MFLOPS | 11.0 | 13.7 | 6.3 | 7.9 |
| Whetstones | MWhetstones/second | 7.9 | 9.9 | 4.6 | 5.7 |

$\dagger$ Based on actual measured system performance.

## Table 1-3. Description of the Benchmarks Used ${ }^{\ddagger}$

| Benchmark | Operations Tested | Where Applicable |
| :--- | :--- | :--- |
| Linpack | Floating-point and integer array manipulation, <br> including Gaussian elimination, vector dot products, <br> and matrix multiplication | Dense systems of linear equations with array <br> manipulation |
| Whetstones | Mathematical operations: integer, floating-point, and <br> trigonometric operations | Engineering and scientific computing applications |
| Dhrystones | Enumeration, record and pointer manipulation, and <br> integer operations | Systems programming applications |

$\ddagger$ Reference: Hinnant, David F., "What Makes a Good Benchmark?", MIPS, September, 1988, pp. 102-103.

### 1.3 TMS34082 General Description

The TMS34082 is a high-speed floating-point processor implemented in the Texas Instruments advanced $0.8 \mu \mathrm{~m}$ CMOS technology. On a single chip, the TMS34082 combines a 16 -bit sequencer and a three-operand 64-bit FPU (source A, source B, destination) with twenty-two 64 -bit data registers. The data registers are organized into two banks of 10 registers each, with two registers for internal feedback. In addition, an instruction register to control FPU execution, a status register to retain the most recent FPU status results, eight control registers, and a two-register stack are provided. The key architectural elements are shown in Figure 1-1.

The ALU and the multiplier are closely coupled and work in parallel to perform sums of products and products of sums. During multiply/accumulate operations, both the ALU and the multiplier are active, and the registers in the FPU core can be used to feed back products and accumulate sums without tying up locations in register banks $A$ and $B$.

Data or code may be transferred between the LAD and MSD ports at the rate of one 32 -bit word per clock cycle with a one clock latency. That comes out to 1.28 billion bits/second. This provides sufficient bandwidth to quickly transfer vector or scalar arrays into or out of external memories. Up to 512 words may be transferred with a single memory move instruction.

Figure 1-1. TMS34082 High-Level Block Diagram


The TMS34082 complies fully with IEEE Std 754-1985, the industry standard for binary floating-point formats. Floating-point operands can be either singleor double-precision. In addition to floating-point operations, the TMS34082 performs 32-bit integer arithmetic, logical comparisons, and shifts. Integer operations may be performed on 32-bit 2s complement or unsigned operands. Floating-point to integer and integer to floating-point conversions are also available.

The comprehensive RISC-like instruction set eliminates the need for complex CISC-type instructions or wide microcoded instruction words. By programming the TMS34082 at the simplest level, operations are customized for each application and most instructions execute in one clock cycle. Divide and square root instructions are ideal for numeric processing and graphics rendering, such as ray tracing routines. Using dedicated hardware and patented algorithms, the TMS34082 calculates a 64-bit double-precision divide or square root result in only 13 or 16 clock cycles, respectively.

In a single clock cycle, two single-precision or integer operands may:

1) Be read from the register file
2) Be run through the ALU and/or multiplier
3) Have result placed back into the register file

This is accomplished with both the internal pipeline and output registers disabled. Double-precision multiplies take two clock cycles to complete. Such low latencies simplify writing assembly language code, eliminating the problem of data coherency in a long pipeline. Refilling or flushing the instruction pipeline is fast, also.

An internal ROM includes many commonly used matrix, graphics, and vector routines as described below. With the exception of MIN-MAX and compare operations, these routines are constructed directly from the TMS34082's basic instruction set. The internal routines include:

Matrix operations consisting of $1 \times 3,3 \times 3,1 \times 4$, and $4 \times 4$ matrix multiplies

Graphics routines such as backface testing, clipping, 2-D and 3-D compares, linear interpolation, 1-D and 2-D MIN-MAX, viewport scaling and conversion, cubic splines, and polygon elimination

Vector operations including add, subtract, magnitude, scaling, dot product, cross product, normalization, and reflection

Additional routines for $3 \times 3$ convolution, multiply/accumulate, and polynomial expansion

When used with the TMS34020, the TMS34082 operates in coprocessor mode. The TMS34020 can control multiple TMS34082 coprocessors without any additional glue logic or buffering. The clock and control signals are generated directly by the TMS34020. You can use external memory to store subroutines as well as data for those subroutines. See Chapter 5 for additional information.

When used alone or with processors other than the TMS34020, the TMS34082 functions in host-independent mode. The TMS34082 is fully programmable and can interface to other processors (such as a RISC, $80 \times 86$, or Motorola MC680x0 processor) or floating-point subsystems through its two 32-bit bidirectional buses. Chapter 6 covers this mode in greater detail.

Other features include:
Support of common microprocessor addressing modes (register, direct, indirect, postincrement, immediate)

A fully synchronous, on-chip, direct memory interface to SRAMs/ EPROMs with no glue logic and to DRAMs/VRAMs with minimal glue logic

Fully user-programmable hardware and software realtime interrupts.
The TMS34082 may implement a von Neumann architecture, a modified Harvard architecture, or a mixture of both. In a von Neumann architecture, data and instruction memories both reside on the same bus. However, a Harvard architecture has separate data and instruction sources so that both may be fetched in parallel. External data may originate from either the LAD or MSD ports. External instructions may only come from the MSD port, but the LAD port can be used to input jump entries into the MSD port memory.

Figure 1-2 shows possible TMS34082 bus architectures for coprocessor mode. In addition, Figure 1-3 shows several example architectures for host-independent mode.

Figure 1-2. Coprocessor Mode Bus Architectures


Data from LAD Port, Instructions from LAD and MSD Port


Figure 1-3. Host-Independent Mode Bus Architectures


Data and Instructions from MSD Port


Data from LAD and MSD Ports, Instructions from MSD Port


Data from LAD and MSD Ports, Instruction from MSD Port

### 1.4 Typical Applications

The 64-bit power and exceptional flexibility of the TMS34082 meet system computing requirements across the performance spectrum. These range from workstations to personal computers to embedded controllers. Table 1-4 lists typical end uses for this device. Figure 1-4 shows several examples of systems using the TMS34082.

Table 1-4. Applications for the TMS34082

| Numeric Processor | Graphics Processor |
| :--- | :--- |
| CAD/CAE workstations | 3-D graphics processing |
| UNIXJDOS accelerator for RISC/CISC machines | Graphics workstations/super workstations |
| Scientific computing | Image processing |
| Personal computers | Laser printers |
| Vector processing | Graphics rendering engines |
| Multiprocessing architectures | Imaging compression/decompression, JPEG |
| Digital signal processing | Flight simulators |
| High-speed protocol engines | Electronic publishing |
| Array processing | Computer animation |

Figure 1-4. Sample TMS34082 Architectures


Coprocessor for $80 \times 86$, Motorola MC680×0, or RISC Processor


Laser Printer Engine

### 1.5 Development Tools

### 1.5.1 TMS34082 Software Tool Kit

The TMS34082 Software Tool Kit can be used to develop code for host-independent mode applications or for external subroutines in coprocessor mode. The tool kit includes:

An ANSI standard, optimizing C compiler
A macro-assembler
A linker
An object code librarian
A functional simulator
The C compiler supports common subexpression elimination. A peephole optimizer is also provided to further enhance the execution speed and the code size of the source program. Inline assembly code can be incorporated into the C program for time-critical and hardware-dependent code sections. The object librarian allows the storage of frequently used functions in libraries for easy access (see Figure 1-5).

Figure 1-5. Overview of TMS34082 Code-Generation Tools


Included with the TMS34082 tool kit are highly optimized transcendental assembly language routines for sine, cosine, tangent, arc sine, arc cosine, and arc tangent. These are accurate to the least significant bit.

The TMS34082 tool kit will execute on an IBM PC/AT or compatible machine with MS-DOS (or PC-DOS) 2.0 or higher, 640 K of memory, one floppy drive, and one hard drive. An 80287/80387 math coprocessor is required for the simulator. A demonstration version of the Software Tool Kit is also available.

The interactive simulator displays the entire machine state of the TMS34082 (such as registers, address counter, stack, status register) and works with the C compiler/assembler/linker object files. The simulator is menu driven. During program execution, breakpoints may be set and the trace memory displayed. The cycle counting feature is useful when evaluating performance of the processor or during code optimization.

The TMS340 Family compiler and assembler, which support both the TMS34020 and TMS34082, are described in subsection 1.6.3 of this document.

### 1.5.2 TMS34082 3-D Graphics Library

The TMS34082 3-D Graphics Library contains an extensive array of C-callable functions including polygon clipping, shading, and vector and matrix operations. The library is TIGA-compatible and can also run as a non-TIGA product, giving the user portability and flexibility. The task of porting graphics standard to the TMS34020/TMS34082 is greatly simplified with the variety of functions in the library. The library also includes a 3-D graphics pipeline that can shorten the development time for application programs.

### 1.5.3 TMS34082 Demonstration Board

The TMS34082 Demonstration Board is a 40-MFLOP parallel processor with up to 3Mbytes of on-board memory. This powerfulboard allows youto evaluate performance and write code for the TMS34082 using the software tool kit, develop algorithm implementations, and integrate the software modules with the hardware. In addition, programs are executed directly on the TMS34082, resulting in much faster execution times than a software simulator. The board plugs into a PC/AT ${ }^{\text {TM }} 32$-bit card slot. Figure $1-6$ is a block diagram of the demonstration board.

Figure 1-6. TMS34082 Demonstration Board Block Diagram


Built on a PC/AT card occupying a single slot, the TMS34082 Demonstration Board features:

TMS34082-40 Floating-Point Processor (operating in host-independent mode)

20 MHz processor clock speed, 7.9 MFLOPs double-precision Linpack
Fully programmable: von Neumann or modified Harvard architectures or both

2M-bytes VRAM memory on LAD port accessible though PC/AT bus interface

256K-bytes VRAM memory on MSD port accessible through PC/AT bus interface, expandable up to 1M bytes of VRAM memory

### 1.6 TMS34020 Graphics System Processor

The TMS34020 Graphics System Processor (GSP) is an advanced 32-bit microprocessor optimized for graphic display systems. The TMS34020 is a member of the TMS340 family of computer graphics products from Texas Instruments.

The TMS34020 provides high-performance cost-effective solutions for applications that require efficient data manipulations in a graphics environment. The TMS34020 can be configured to serve in a host-based, standalone, or multiprocessing system. It has both host and multiprocessor interfaces to facilitate implementation of multiple TMS34020 systems.

The TMS34020 is supported by a full set of hardware and software development tools, including an optimizing C compiler, assembler, software libraries, a PC-based development board on a PC-based emulator. The TMS340 Family Code Generation Tools may be used to develop code for the TMS34082 in coprocessor mode. In addition, the TMS34020 is fully compatible with and supported by the Texas Instruments Graphics Architecture (TIGA).

### 1.6.1 TMS34020 Key Features

Fully programmable 32-bit general-purpose processor with 512M-byte linear address range (bit addressable)

Second generation graphics system processor:
Object code compatible with the TMS34010
Enhanced instruction set
Optimized graphics instructions
Direct coprocessor interface to TMS34082 Floating-Point Processor
On-chip peripheral features include:
Programmable CRT control
Direct DRAM/NRAM interface
Direct communication with an external (host) processor
Communication with multiple TMS34020s
Functional expansion with the coprocessor interface
Automatic CRT display refresh
Instruction set supports special graphics functions such as pixel processing, XY addressing, and window clip/hit detection

Programmable 1-,2-, 4-, $8-, 16$-, or 32 -bit pixel size
16 Boolean and 6 arithmetic pixel processing options (raster-ops)
30 general-purpose 32-bit registers
512-byte LRU on-chip instruction cache
General Description
The TMS340 family from Texas Instruments combines the best features of general-purpose microprocessors and graphics controllers to create a range of cost-effective, flexible, powerful graphics systems. The key features of the TMS340 family are speed, a high degree of programmability, and efficient manipulation of hardware-supported data types such as pixels and 2-dimensional pixel arrays.

With a built-in instruction cache, the ability to simultaneously access memory and registers, and an instruction set that enhances raster graphics operations, the TMS34020 provides programmable control of the CRT interface as well as the memory interface (both standard DRAM and multiport RAM). The 4G-bit (512M-byte) physical address space is completely bit addressable on bit boundaries using variable width data fields (1 to 32 bits). Figure 1-7 is a TMS34020 high-level block diagram.

Figure 1-7. TMS34020 High-Level Block Diagram


The TMS34020 unique memory interface speeds performance of tasks such as bit alignment and masking while supporting advanced DRAM access modes. The 32 -bit architectures supplies the large blocks of contiguously-addressable memory that are necessary in graphics applications.

Systems designed with the TMS34020 can utilize VRAM technology to facilitate applications such as high-bandwidth frame buffers. This circumvents the bottleneck often encountered when using conventional DRAMs in graphics systems.

The TMS34020 instruction set includes a full complement of general-purpose instructions, as well as graphics functions, that can be used to construct efficient high-level instructions. The instructions support arithmetic and Boolean operations, data moves, conditional jumps, and subroutine calls and returns.

The TMS34020 architecture supports a variety of pixel sizes, frame buffer sizes, and screen sizes. On-chip functions have been carefully selected so that no functions tie the TMS34020 to a particular display resolution. This enhances the portability of graphics software and allows the TMS34020 to adapt to graphics standards such as MIT's X-Windows ${ }^{\text {TM }}$, CGI/CGM, GKS, NAPLPS, PHIGS, and evolving industry standards.

Texas Instruments offers a wide variety of system solutions. The simplest TMS340 graphics system consists of the TMS34020 alone. Floating-point computations are performed in software using IEEE floating-point libraries. Adding a TMS34082 appears merely as an extension to the TMS34020 instruction set. The same calculations run much faster in dedicated hardware rather than software.

Adding external memory to the TMS34082 allows user-programmed subroutines, such as shading or contour fitting, to execute while the TMS34020 is performing other functions. Since the data for the subroutines is also in external memory, the TMS34082 is effectively decoupled from the TMS34020. The TMS34020 can poll the TMS34082 to see if the subroutine has finished. The highest performance TMS340 graphics solutions contain one or more TMS34020 along with multiple TMS34082s in a parallel processing environment. The TMS34020 acts as the display manager and also orchestrates tasks for the floating-point coprocessors. Jobs and/or data may be loaded into external memory of one TMS34082 while other TMS34082s are still executing.

### 1.6.2 TMS34020 Software Tools

Texas Instruments offers extensive development support for the TMS340 graphics family. Software tools for the TMS34020 also comprehend the TMS34082. The TMS340 Family software tools include:

An optimizing C compiler
An assembler
An archiver for building object libraries
A linker
A loader for TMS34020 and TMS34082 absolute load modules
A C source debugger
The compiler accepts programs written in C language. It outputs assembly language source code that is then processed by the assembler to convert the mnemonics to object code. The compiler and assembler generate efficient TMS34082 code in the form of internal instructions. The C compiler allows time-critical routines written in assembly language to be called from within the C program. The converse is also available; assembly routines may call C functions.

If external TMS34082 memory is present, the TMS34082 Software Tool Kit must be used to generate the subroutine code in the form of external instructions. When the TMS34082 load module has been generated, the TMS34020 loader can download both load modules as shown in Figure 1-8.

The TMS340 Family C Source Debugger supports both the TMS34020 and the TMS34082 in coprocessor mode. Other debugging tools for the TMS34082 in coprocessor and host-independent modes are available from third-party vendors.

Figure 1-8. TMS34020 and TMS34082 Software Tools


TMS340 Family Software Code Generation
Tools (used for generating TMS34020 code and TMS34082 internal instructions)

TMS34082 Software Tool Kit (generates
TMS34082 code for external memory)

### 1.6.3 TIGA $^{\text {TM }}$ Graphics Interface

The Texas Instruments Graphics Architecture (TIGA) is a software interface standard for the TMS340 family of graphics system processors. TIGA enhances the performance of MS-DOS-based PCs that contain a TMS34020 or TMS34020 (and an optional TMS34082) and an 8088/86 or 80286/80386 host microprocessor by optimizing communications between the graphics processor and the host processor. The TIGA interface allows the host and graphics processors to share execution of the application, as shown in Figure 1-9.

Figure 1-9. Graphics Processing Shared Between TMS340 and Host Processors


### 1.6.4 TMS34020 Software Development Board

The TMS34020 Software Development Board (SDB20) is a high-performance PC/AT bus graphics card. It allows you to write applications software for the TMS34020 and its companion floating-point processor, the TMS34082. The board also demonstrates the simplicity of hardware design using the TMS34020 and TMS34082 for high-performance bit-mapped graphics displays.

An optional upgrade kit, the TMS34082 SRAM Upgrade Kit, contains a business card sized board with the TMS34082 and 32K bytes of SRAM, plus software and documentation. The board plugs into the TMS34082 socket presently existing on the SDB20.

Figure 1-10. TMS34020 SDB Block Diagram


Key features of the TMS34020 SDB include:
1M-byte VRAM organized as $256 \mathrm{~K} \times 32$ bits
1M-byte DRAM organized as $256 \mathrm{~K} \times 32$ bits
TMS34082 Floating-Point Coprocessor (optional)
VGA support for $640 \times 480$ pixel resolution
Software selectable resolutions:
$1024 \times 768$ by 4 or 8 bits per pixel
$640 \times 480$ by 4 or 8 bits per pixel
$640 \times 480$ VGA mode
Software configurable base address over a full 16M-byte range
TMS34020 emulation support

### 1.7 TMS34082 Ordering Information

For the latest ordering and pricing information, please call your local Tl field sales representative or authorized TI distributor. Table 1-5 summarizes the products available for the TMS34082.

Table 1-5. TMS34082 Product Information

| Type | Description | Part Number |
| :---: | :---: | :---: |
| Silicon Devices | TMS34082A device, 32 MHz , 145-pin ceramic PGA package | TMS34082AGC-32 |
|  | TMS34082A device, 40 MHz , 145-pin ceramic PGA package | TMS34082AGC-40 |
| Documentation | TMS34082A Data Sheet | SCGS001 |
|  | TMS34082 Designer's Handbook | SCGU004 |
| Software | TMS34082 Demonstration Software Tool Kit | Contact TI |
|  | TMS34082 Software Tool Kit | TMDS3440808201 |
|  | TMS34082 3-D Graphics Library | Contact TI |
|  | TIGA Software Developer's Kit (includes the TMS340 Family Code Generation Tools and C Debugger for the PC) | TMS340SDK-PC |
| Hardware | TMS34020 Software Development Board (SDB20) | TMS3460120000 |
|  | TMS34082 SRAM Upgrade Kit | TMDS3481800-02 |

### 1.8 Technical Assistance

The Texas Instruments Datapath VLSI Products Systems Engineering group is a resource available to help you in the selection of Tl's high-performance FPUs, such as the TMS34082 Graphics Floating-Point Processor. Located in Dallas, the group works directly with designers to provide ready answers to device-related questions and also prepares a variety of applications information. The phone number for the DVP Systems Engineering hotline is (214) 997-3970.

## Chapter 2

## Pinout and Pin Descriptions

This chapter illustrates the TMS34082 pinouts and provides detailed descriptions of the TMS34082 signals. For mechanical dimensions of the TMS34082A packages, please refer to the data sheet in Appendix B. For mechanical dimensions of the SMJ34082A packages, please refer to the data sheet in Appendix C.

### 2.1 Pinout

The TMS34082A and the SMJ34082A are offered in a ceramic, 145-pin grid array (PGA) package (GC). Figure 2-1 shows the 145-pin PGA pinout.

Figure 2-1. TMS34082 Pinout, 145-Pin PGA Package
(TOP VIEW)


Table 2-1. Pin Assignments (PGA Package)

|  | Pin |  | Pin |  | Pin |  | Pin |  | Pin |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| GC\# | Name | GC\# | Name | GC\# | Name | GC\# | Name | GC\# | Name |
| A1 | NC | B15 | LAD27 | F1 | MSD10 | K15 | RDY | P2 | NC |
| A2 | LAD1 | C1 | MSD4 | F2 | MSD9 | L1 | MSD18 | P3 | MSD29 |
| A3 | LAD3 | C2 | MSD3 | F3 | $V_{C C}$ | L2 | MSD21 | P4 | MSD31 |
| A4 | LAD5 | C3 | MSD0 | F13 | CORDY | L3 | MSD23 | P5 | MSA1 |
| A5 | LAD8 | C4 | VSS | F14 | $\overline{\text { ALTCH }}$ | L13 | VSS | P6 | MSA3 |
| A6 | LAD9 | C5 | $V_{C C}$ | F15 | $\overline{\text { CAS }}$ | L14 | CIDO | P7 | MSA6 |
| A7 | LAD11 | C6 | LAD6 | G1 | MSD13 | L15 | CID2 | P8 | MSA8 |
| A8 | LAD12 | C7 | VSS | G2 | MSD12 | M1 | MSD20 | P9 | MSA10 |
| A9 | LAD13 | C8 | $V_{C C}$ | G3 | MSD11 | M2 | MSD24 | P10 | MSA13 |
| A10 | LAD15 | C9 | $\mathrm{V}_{\text {SS }}$ | G13 | WE | M3 | VSS | P11 | MWR |
| A11 | LAD17 | C10 | $\mathrm{V}_{\text {CC }}$ | G14 | EC1 | M13 | $V_{C C}$ | P12 | $\overline{\text { MOE }}$ |
| A12 | LAD19 | C11 | LAD21 | G15 | ECO | M14 | LCLK1 | P13 | INTG |
| A13 | LAD22 | C12 | $V_{S S}$ | H1 | MSD14 | M15 | LCLK2 | P14 | BUSFLT |
| A14 | LAD24 | C13 | LAD25 | H2 | TDO | N1 | MSD22 | P15 | RAS |
| A15 | NC | C14 | LAD26 | H3 | $V_{S S}$ | N2 | MSD26 | R1 | NC |
| B1 | MSD1 | C15 | LAD29 | H13 | $V_{S S}$ | N3 | VCC | R2 | MSD27 |
| B2 | NC | D1 | MSD6 | H14 | LOE | N4 | MSD28 | R3 | MSD30 |
| B3 | LADO | D2 | MSD5 | H15 | TDI | N5 | VSS | R4 | MSAO |
| B4 | LAD2 | D3 | MSD2 | J1 | MSD15 | N6 | $V_{C C}$ | R5 | MSA2 |
| B5 | LAD4 | D4 | NC | 12 | MSD16 | N7 | MSA5 | R6 | MSA4 |
| B6 | LAD7 | D13 | $V_{C C}$ | J3 | $\mathrm{V}_{\text {CC }}$ | N8 | $V_{S S}$ | R7 | MSA7 |
| B7 | LAD10 | D14 | LAD28 | J13 | CC | N9 | $V_{C C}$ | R8 | TCK |
| B8 | TMS | D15 | LAD31 | J14 | MASTER | N10 | MSA14 | R9 | MSA9 |
| B9 | LAD14 | E1 | MSD8 | J15 | CLK | N11 | $V_{S S}$ | R10 | MSA11 |
| B10 | LAD16 | E2 | MSD7 | K1 | MSD17 | N12 | $\overline{M A E}$ | R11 | MSA12 |
| B11 | LAD18 | E3 | $V_{S S}$ | K2 | MSD18 | N13 | LRDY | R12 | MSA15 |
| B12 | LAD20 | E13 | VSS | K3 | VSS | N14 | SF | R13 | DS/CS |
| B13 | LAD23 | E14 | LAD30 | K13 | CID1 | N15 | RESET | R14 | $\overline{M C E}$ |
| B14 | NC | E15 | $\overline{\text { COINT }}$ | K14 | $\overline{\text { INTR }}$ | P1 | MSD25 | R15 | NC |

Table 2-2. Alphabetical Listing - Pin Assignments (PGA Package)

| Pin |  | Pin |  | Pin |  | Pin |  | Pin |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Name | GC\# | Name | GC\# | Name | GC\# | Name | GC\# | Name | GC\# |
| $\overline{\text { ALTCH }}$ | F14 | LAD14 | B9 | MSA3 | P6 | MSD16 | J2 | TCK | R8 |
| BUSFLT | P14 | LAD15 | A10 | MSA4 | R6 | MSD17 | K1 | TDI | H15 |
| CAS | F15 | LAD16 | 810 | MSA5 | N7 | MSD18 | L1 | TDO | H2 |
| CC | J13 | LAD17 | A11 | MSA6 | P7 | MSD19 | K2 | TMS | B8 |
| CIDO | L14 | LAD18 | B11 | MSA7 | R7 | MSD20 | M1 | $\mathrm{V}_{\text {CC }}$ | C5 |
| CID1 | K13 | LAD19 | A12 | MSA8 | P8 | MSD21 | $\underline{L}$ | $V_{C C}$ | C8 |
| CID2 | L15 | LAD20 | B12 | MSA8 | R9 | MSD22 | N1 | $V_{C C}$ | C10 |
| CLK | J15 | LAD21 | C11 | MSA10 | P9 | MSD23 | L3 | $V_{\text {cc }}$ | D13 |
| COINT | E15 | LAD22 | A13 | MSA11 | R10 | MSD24 | M2 | VCC | F3 |
| CORDY | F13 | LAD23 | B13 | MSA12 | R11 | MSD25 | P1 | $V_{\text {cc }}$ | J3 |
| DS/CS | R13 | LAD24 | A14 | MSA13 | P10 | MSD26 | N2 | $V_{\text {cc }}$ | M13 |
| EC0 | G15 | LAD25 | C13 | MSA14 | N10 | MSD27 | R2 | $\mathrm{V}_{\mathrm{CC}}$ | N3 |
| EC1 | G14 | LAD26 | C14 | MSA15 | R12 | MSD28 | N4 | $V_{\text {cc }}$ | N6 |
| INTG | P13 | LAD27 | B15 | MSD0 | C3 | MSD29 | P3 | $\mathrm{V}_{\mathrm{CC}}$ | N8 |
| INTR | K14 | LAD28 | D14 | MSD1 | B1 | MSD30 | R3 | $V_{S S}$ | C4 |
| LADO | B3 | LAD29 | C15 | MSD2 | D3 | MSD31 | P4 | $V_{S S}$ | C7 |
| LAD1 | A2 | LAD30 | E14 | MSD3 | C2 | MWR | P11 | $V_{S S}$ | C9 |
| LAD2 | B4 | LAD31 | D15 | MSD4 | C1 | NC | A1 | $V_{S S}$ | C12 |
| LAD3 | A3 | LCLK1 | M14 | MSD5 | D2 | NC | A15 | $V_{\text {SS }}$ | E3 |
| LAD4 | B5 | LCLK2 | M15 | MSD6 | D1 | NC | B2 | $V_{S S}$ | E13 |
| LAD5 | A4 | LOE | H14 | MSD7 | E2 | NC | B14 | $V_{S S}$ | H3 |
| LAD6 | C6 | LRDY | N13 | MSD8 | E1 | NC | D4 | $V_{S S}$ | H13 |
| LAD7 | B6 | MAE | N12 | MSD9 | F2 | NC | P2 | $V_{S S}$ | K3 |
| LAD8 | A5 | MASTER | J14 | MSD10 | F1 | NC | R1 | $V_{S S}$ | 113 |
| LAD8 | A6 | $\overline{\mathrm{MCE}}$ | R14 | MSD11 | G3 | NC | R15 | $V_{S S}$ | M3 |
| LAD10 | B7 | $\overline{\text { MOE }}$ | P12 | MSD12 | G2 | RAS | P15 | $V_{S S}$ | N5 |
| LAD11 | A7 | MSAO | R4 | MSD13 | G1 | RDY | K15 | $V_{S S}$ | N8 |
| LAD12 | A8 | MSA1 | P5 | MSD14 | H1 | RESET | N15 | $V_{\text {SS }}$ | N11 |
| LAD13 | A9 | MSA2 | R5 | MSD15 | J1 | SF | N14 | WE | G13 |

### 2.2 Pin Functional Descriptions

The following tables contain the TMS34082 signal descriptions grouped by their functions.

## Table 2-3. LAD Bus Signals

| Pin |  | //O/Z | Description |
| :---: | :---: | :---: | :---: |
| Name | No. |  |  |
| $\overline{\text { ALTCH }}$ | F14 | 1 | Address Latch, active low. In coprocessor mode, falling edge of $\overline{\text { ALTCH }}$ latches instruction and status present on the LAD bidirectional bus (LAD31-0). |
|  |  | 0 | In host-independent mode, $\overline{\text { ALTCH }}$ is an address output write strobe for memory accesses on LAD31-0. |
| BUSFLT | P14 | 1 | Bus Fault. In coprocessor mode when high, indicates a data fault on the LAD bus (LAD31-0) during current bus cycle which causes TMS34082 not to capture the current data on LAD bus. Tied low if not used. Not used in host-independent mode. |
| $\overline{\text { CAS }}$ | F15 | 1 | Column Address Strobe, active low. In the coprocessor mode, causes TMS34082 to latch LAD bus data on CAS low-to-high transition if LRDY was high and BUSFLT was low at the previous LCLK2 rising edge. |
|  |  | O/Z | In host-independent mode, this signal is the read strobe output. |
| LADO | B3 | 1/O/Z | Local Address and Data Bus. In coprocessor mode, used by TMS34020 to input instructions and data operands to TMS34082, and used by TMS34082 to output results. In host-independent mode, used by the TMS34082 for address output and data I/O. |
| LAD1 | A2 |  |  |
| LAD2 | B4 |  |  |
| LAD3 | A3 |  |  |
| LAD4 | B5 |  |  |
| LAD5 | A4 |  |  |
| LAD6 | C6 |  |  |
| LAD7 | B6 |  |  |
| LAD8 | A5 |  |  |
| LAD9 | A6 |  |  |
| LAD10 | B7 |  |  |
| LAD11 | A7 |  |  |
| LAD12 | A8 |  |  |
| LAD13 | A9 |  |  |
| LAD14 | B9 |  |  |
| LAD15 | A10 |  |  |
| LAD16 | B10 |  |  |
| LAD17 | A11 |  |  |
| LAD18 | B11 |  |  |
| LAD19 | A12 |  |  |
| LAD20 | B12 |  |  |
| LAD21 | C11 |  |  |
| LAD22 | A13 |  |  |
| LAD23 | B13 |  |  |
| LAD24 | A14 |  |  |

## Table 2-3. LAD Bus Signals (Continued)

| Pin |  |  | Description |
| :---: | :---: | :---: | :---: |
| Name | No. | I/O/Z |  |
| LAD25 | C 13 | I/OR | Local Address and Data Bus. In coprocessor mode, used by TMS34020 to input instructions and data operands to TMS34082, and used by TMS34082 to output results. In host-independent mode, used by the TMS34082 for address output and data I/O. |
| LAD26 | C14 |  |  |
| LAD27 | B15 |  |  |
| LAD28 | D14 |  |  |
| LAD29 | C15 |  |  |
| LAD30 | E14 |  |  |
| LAD31 | D15 |  |  |
| $\overline{\text { LOE }}$ | H14 | 1 | Local Bus Output Enable, active low. Enables the local bus (LAD31-0) to be driven at the proper times when low. In addition, during the host-independent mode when LADCFG is low, does not affect $\overline{A L T C H}, \overline{C A S}, \overline{W E}$, CORDY, or COINT. When LADCFG is high, $\overline{A L T C H, ~ C O I N T, ~ a n d ~}$ CORDY are not disabled by LOE high; $\overline{\mathrm{CAS}}$ and $\overline{W E}$ are disabled. |
| LRDY | N13 | 1 | Local Bus Data Ready. In coprocessor mode, LDRY high indicates that data is available on LAD bus. LRDY low indicates that the TMS34082 should not load data from LAD31-0. In host-independent mode, when LRDY goes low, the device is stalled until LRDY is set high again. Tied high if not used. |
| $\overline{\text { RAS }}$ | P15 | 1 | Row Address Strobe, active low. In coprocessor mode this signal is high during all coprocessor instruction cycles. Not used in host-independent mode. |
| SF | N14 | 1 | Special Function. When high, indicates the LAD bus input is an instruction or data from TMS34020 registers. When low, indicates the LAD input is adata operand from memory. Not used in host-independent mode. |
| $\overline{W E}$ | G13 | 1 | Write Enable, active low. In coprocessor mode, the LAD bus write strobe from the TMS34020 to enable a write to or from the TMS34082 LAD bus. |
|  |  | OR | In host-independent mode, $\overline{\text { WE }}$ is the TMS34082 data write strobe. |

Table 2-4. MSD Bus Signals

| Pin |  |  | Description |
| :---: | :---: | :---: | :---: |
| Name | No. | 1/0/2 |  |
| DS/CS | R13 | 0 | Data Space/Code Space Select. When MEMCF is low and DS/CS is low, selects program memory on MSD port. When MEMCFG is low and DS/CS is high, selects data memory on MSD port. When MEMCFG is high, DS/CS is memory chip select, active low. |
| $\overline{\text { MAE }}$ | N12 | 1 | External Memory Address and Data Output Enable, active low. When this signal is low, the TMS34082 can output an address on MSA15-0 and data on MSD31-0. MAE high does not disable DS/CS, MCE, MWR, or MOE. |
| $\overline{\text { MCE }}$ | R14 | 0 | Memory Chip Enable. When MEMCFG is low, active (low) indicates access to external memory on MSD31-0. When MEMCFG is high, MCE low is external code memory chip select. |
| $\overline{\text { MOE }}$ | P12 | 0 | Memory Output Enable, active low. When low, enables output from external memory onto the MSD port. |
| MSAO | R4 | O/Z | Memory Address Bus. Addresses up to 64 K words of external program memory or up to 64 K words of external data memory on the MSD port, depending on setting of DS/CS select. |
| MSA1 | P5 |  |  |
| MSA2 | R5 |  |  |
| MSA3 | P6 |  |  |
| MSA4 | R6 |  |  |
| MSA5 | N7 |  |  |
| MSA6 | P7 |  |  |
| MSA7 | R7 |  |  |
| MSA8 | P8 |  |  |
| MSA9 | R9 |  |  |
| MSA10 | P9 |  |  |
| MSA11 | R10 |  |  |
| MSA12 | R11 |  |  |
| MSA13 | P10 |  |  |
| MSA14 | N10 |  |  |
| MSA15 | R12 |  |  |

## Table 2-4. MSD Bus Signals (Continued)

| Pin |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- |
| Name | No. | I/O/Z |  |  |
| MSD0 | C3 |  |  |  |
| MSD1 | B1 |  |  |  |
| MSD2 | D3 |  |  |  |
| MSD3 | C2 |  |  |  |
| MSD4 | C1 |  |  |  |
| MSD5 | D2 |  |  |  |
| MSD6 | D1 |  |  |  |
| MSD7 | E2 |  |  |  |
| MSD8 | E1 |  |  |  |
| MSD9 | F2 |  |  |  |
| MSD10 | F1 |  |  |  |
| MSD11 | G3 |  |  |  |
| MSD12 | G2 |  |  |  |
| MSD13 | G1 |  |  |  |
| MSD14 | H1 |  |  |  |
| MSD15 | J1 | I/O/Z | External Memory Data Bus. Used to read from or write to external data or program memory. |  |
| MSD16 | J2 |  |  |  |
| MSD17 | K1 |  |  |  |
| MSD18 | L1 |  |  |  |
| MSD19 | K2 |  |  |  |
| MSD20 | M1 |  |  |  |
| MSD21 | L2 |  |  |  |
| MSD22 | N1 |  |  |  |
| MSD23 | L3 |  |  |  |
| MSD24 | M2 |  |  |  |
| MSD25 | P1 |  |  |  |
| MSD26 | N2 |  |  |  |
| MSD27 | R2 |  |  |  |
| MSD28 | N4 |  |  |  |
| MSD29 | P3 |  |  |  |
| MSD30 | R3 |  |  |  |
| MSD31 | P4 |  |  |  |
| MWR | P11 | O | Memory Write Enable. When low, data on MSD31-0 can be written to external program or data |  |

## Table 2-5. Clock and Control Signals

| Pin |  | VO/Z | Description |
| :---: | :---: | :---: | :---: |
| Name | No. |  |  |
| CC | J13 | 1 | Condition Code Input. May be used as an external conditional input for branch conditions. |
| $\begin{aligned} & \hline \text { CIDO } \\ & \text { CID1 } \\ & \text { CID2 } \end{aligned}$ | $\begin{aligned} & \mathrm{L} 14 \\ & \mathrm{~K} 13 \\ & \mathrm{~L} 15 \end{aligned}$ | 1 | Coprocessor ID. Used to set a coprocessor ID so that TMS34020 Graphics System Processor controlling multiple TMS34082s can designate which coprocessor is being selected by the current instruction. Tied low in host-independent mode. |
| CLK | J15 | 1 | System Clock in host-independent mode. Tied low in coprocessor mode. |
| $\overline{\text { COINT }}$ | E15 | 0 | Coprocessor interrupt Request, active low. In coprocessor mode, signals an exception not masked out in the configuration register. Remains low until the status register is read. In host-independent mode, user programmable I/O when LADCFG is low. Designates bus cycle boundaries on LAD31-0 when LADCFG is high. |
| CORDY | F13 | 0 | Coprocessor Ready. In coprocessor mode, if the TMS34020 sends an instruction before the TMS34082 has completed a previous instruction, this signal goes low to indicate that the TMS34020 should wait. User-programmable in host-independent mode. |
| INTG | P13 | 0 | Interrupt Grant. This signal is set high to acknowledge an interrupt request input in host-independent mode. |
| $\overline{\text { NTR }}$ | K14 | 1 | Interrupt Request, active low. Causes call to subroutine address in interrupt vector register in host-independent mode. Tied high in coprocessor mode. |
| LCLK1 LCLK2 | $\begin{aligned} & \text { M14 } \\ & \text { M15 } \end{aligned}$ | 1 | Local Clock 1 and 2, generated by the TMS34020, 90 degrees out of phase, to provide timing inputs to TMS34082 in coprocessor mode. Tied low in host-independent mode. |
| MSTR | J14 | 1 | Coprocessor/Host-Independent Mode Select. When low, puts the TMS34082 in coprocessor mode. When high, puts the TMS34082 in host-independent mode. |
| RDY | K15 | 1 | Ready. When RDY is low, causes a nondestructive stall of sequencer and floating-point operations. All internal registers and status in the FPU core are preserved. Also, no output lines will change state. |
| RESET | N15 | 1 | Reset, active low. Resets sequencer output and clears pipeline registers, internal states, status, and exception disable registers in FPU core. Other registers are unaffected. |

Table 2-6. Emulation Control Signals

| Pin |  |  |  |
| :--- | :--- | :---: | :--- |
| Name | No. | vo/z | Description |
| EC0 G14 <br> EC1 G15 | 1 | Emulator Mode Control and Test. Tied high for normal operation. |  |
| TCK | R8 | I | Test Clock for JTAG 4-wire boundary scan. Tied low for normal operation. |
| TDI | H15 | 1 | Test Data Input for JTAG 4-wire boundary scan. May be left floating. |
| TDO | H2 | O | Test Data Output for JTAG 4-wire boundary scan. |
| TMS | B8 | 1 | Test Mode Select for JTAG 4-wire boundary scan. May be left floating. |

## Table 2-7. Power and N/C Signals

|  |  |  |
| :---: | :---: | :---: |
| Name | No. | Description |
| NC | A1 |  |
| NC | A15 |  |
| NC | B2 |  |
| NC | 814 | No internal connection. These pins should be leff floating. |
| NC | D4 |  |
| NG | P2 |  |
| NC | R1 |  |
| NC | R15 |  |
| $V_{\text {CC }}$ | C5 |  |
| $V_{C C}$ | C8 |  |
| $V_{C C}$ | C10 |  |
| $V_{C C}$ | D13 |  |
| $V_{C C}$ | F3 |  |
| $V_{C C}$ | J3 |  |
| $V_{C c}$ | M13 | 5-V power supply. All pins must be connected and used. |
| $V_{\text {cc }}$ | N3 |  |
| Vcc | N6 |  |
| $V_{C C}$ | N9 |  |
| VSS | C4 |  |
| $V_{\text {SS }}$ | C7 |  |
| $V_{S S}$ | C9 |  |
| $V_{S S}$ | C12 |  |
| $V_{S S}$ | E3 |  |
| $V_{S S}$ | E13 |  |
| $V_{S S}$ | H3 | Ground pins. All pins must be connected and used. |
| $V_{S S}$ | H13 |  |
| $V_{S S}$ | K3 |  |
| VSS | L13 |  |
| $V_{S S}$ | M3 |  |
| $V_{S S}$ | N5 |  |
| VSs | N8 |  |
| $V_{S S}$ | N11 |  |

## Chapter 3

## Data Formats

The TMS34082 accepts operands as either:
IEEE floating-point numbers (IEEE Standard 754-1985)
Unsigned 32-bit integers
32-bit 2s-complement signed integers
Floating-point operands may be either single-precision (32 bits) or double-precision ( 64 bits). All internal integer instructions use signed integer data formats.

### 3.1 Integer Formats

The TMS34082 recognizes two types of integers: signed and unsigned. Only one type may be used in a single instruction. Internal instructions use only signed integers.

### 3.1.1 Signed Integers

A signed integer is a 32 -bit value in 2 s -complement format, as shown below. The most significant bit is the sign bit; a 1 signifies a negative number. Signed integers can represent values from $-2,147,438,648$ to $+2,147,438,647$.

Figure 3-1. IEEE Signed Integer Format


### 3.1.2 Unsigned Integer

An unsigned integer is also a 32-bit value, but can only represent positive numbers. The range for unsigned integers is 0 to 4,294,967,295.

Figure 3-2. IEEE Unsigned Integer Format


### 3.2 Floating-Point Formats

IEEE formats for floating-point operands, both single- and double-precision, consist of three fields; the sign (s), the exponent (e), and the fraction ( $f$ ), in that order. The most significant bit is the sign bit. The value of the mantissa contains a hidden bit, an implicit leading 1 , as shown below:
1.fraction

The representation of a normalized floating-point number is:

$$
\left.(-1)^{s} \times 1 . f \times 2^{(e-b i a s}\right)
$$

The bias is a number added to the true exponent to ensure that the exponent (e) is always positive. The bias is 127 for single-precision or 1023 for double-precision. Further details of IEEE formats and exceptions are covered in the IEEE Standard for Binary Floating-Point Arithmetic, IEEE Standard 754-1985.

### 3.2.1 Single-Precision Floating-Point

Single-precision floating-point numbers are 32 bits long; the exponent field is 8 bits, and the fraction field is 23 bits. The exponent is biased by 127. Single precision can represent values from $\pm 2^{-126}$ to $\pm 2^{127} \times\left(2-2^{-23}\right)$. That is approximately $\pm 1.2 \times 10^{-38}$ to $\pm 3.4 \times 10^{38}$. The format for a single-precision number is shown in Figure 3-3.

Figure 3-3. IEEE Single-Precision Format

s: sign of fraction
e: 8-bit exponent, biased by 127 (true exponent +127 )
f: 23-bit fraction

### 3.2.2 Double-Precision Floating-Point

A double-precision floating-point number is a 64-bit value. The exponent field is 11 bits, biased by 1023, and the fraction field is 52 bits. The range for double-precision is $\pm 2^{-1022}$ to $\pm 2^{1023} \times\left(2-2^{-52}\right)$, or approximately $\pm 2.2 \times 10^{-308}$ to $\pm 1.8 \times 10^{308}$.

Figure 3-4. IEEE Double-Precision Format

s : sign of fraction
e: 11-bit exponent, biased by 1023 (true exponent + 1023)
f: 52-bit fraction

### 3.2.3 Denormal and Wrapped Numbers

The TMS34082 also handles two other data formats that permit operations on very small floating-point numbers. Denormalized and wrapped floating-point numbers represent the same values, but in different formats. If very small values can be approximated by 0 in your application, you can set the Fast bit in the configuration register to force all denormal and wrapped inputs and outputs to 0 .

The ALU accepts denormalized numbers, that is, floating-point numbers so small that they cannot be normalized. A denormalized number results from decrementing the biased exponent field to 0 before normalization is complete. A denormal has the form of a floating-point number with a 0 exponent, a nonzero fraction, and a 0 in the leftmost (hidden) bit of a mantissa.

A single-precision denormalized number is equal to the following:

$$
(-1)^{s} \times(2)^{-126} \times 0 . f
$$

For double-precision, a denormal is equal to the following:

$$
(-1)^{s} \times(2)^{-1022} \times 0 . f
$$

If denormalized numbers are input to the multiplier, they will cause status exceptions. Denormals can be passed to the ALU to be wrapped. The wrapped operand is then input to the multiplier.

A wrapped number is a number created by normalizing a denormalized number's fraction field and subtracting from the exponent the number of shift positions (minus one) required to do so. The exponent is encoded as a $2 s$-complement negative number. When the mantissa of the denormal is normalized by shifting it left, the exponent field decrements from all 0s (wraps past 0) to a negative2s-complement number (except in the case of 0.1 xxx . . . , where the exponent is not decremented).

### 3.2.4 Special Floating-Point Numbers

There are three other special floating-point value representations (see Figure 3-5):

Zero (positive or negative) is represented by the appropriate sign bit, a 0 exponent field, and a 0 fraction field.

Infinity (positive or negative) is represented by the appropriate sign bit, is in the exponent field, and a 0 fraction field.

A Not a Number ( NaN ) designates data that has no mathematical value. A NaN has is in the exponent field with a nonzero fraction.

A NaN is produced whenever an invalid operation (such as division by 0 ) is executed. The TMS34082 treats all NaNs as signaling NaNs , setting the invalid (I) flag in the status register. The TMS34082 outputs all NaNs (regardless of input form) with a 0 sign bit and all 1 s in the exponent and fraction fields.

Figure 3-5. Special Floating-Point Formats

Single-Precision


| 63 | 62 | 52 |
| :---: | :---: | :---: |
| s | $00 \ldots 00$ | $00 \ldots \ldots \ldots \ldots \ldots \ldots .00$ |



### 3.2.5 Range of Floating-Point Numbers

Table 3-1 shows the range of possible single- and double-precision floating-point numbers.

Table 3-1. Floating-Point NumberRepresentations

| Type | Sign | Exponent | Hidden Bit | Fraction |
| :---: | :---: | :---: | :---: | :---: |
| NaNs | 0 | $11 . .11$ | 1 | $11 . .11$ |
|  |  | : : |  | : : |
|  | 0 | $11 . .11$ |  | $10 . .00$ |
|  |  | $11 . .11$ |  | $01 . .11$ |
|  |  | : : | 1 | : : |
|  | 0 | $11 . .11$ |  | $00 . .01$ |
| Positive Infinity | 0 | $11 . .11$ | 1 | $00 . .00$ |
| Positive Normals | 0 | $11 . .10$ | 1 | $11 . .11$ |
|  |  | : : |  | : |
|  | 0 | $00 . .01$ |  | $00 . .00$ |
| Positive Denormals | 0 | $00 . .00$ | 0 | 11 .. 11 |
|  |  | : : |  | : |
|  | 0 | $00 . .00$ |  | $00 . .01$ |
| Zero (Positive) | 0 | $00 . .00$ | 1 | $00 . .00$ |
| Zero (Negative) | 1 | $00 . .00$ | 1 | $00 . .00$ |
| Negative Denormals | 1 | 00 .. 00 | 0 | $00 . .01$ |
|  |  | : : |  | : |
|  | 1 | $00 . .00$ |  | $11 . .11$ |
| Negative Normals | 1 | $00 . .01$ | 1 | $00 . .00$ |
|  |  | : |  | : |
|  | 1 | $11 . .10$ |  | $11 . .11$ |
| Negative Infinity | 1 | $11 . .11$ | 1 | $00 . .00$ |
| NaNs | 1 | $11 . .11$ | 1 | $00 . .01$ |
|  |  | : : |  | : |
|  | 11 | $11 . .11$ |  | $01 . .11$ |
|  |  | $11 . .11$ |  | $10 . .00$ |
|  |  | : : | 1 | : : |
|  | 1 | $11 . .11$ |  | $11 . .11$ |
| Single: |  | < 8 bits> |  | <-23 bits -> |
|  |  | < 11 bits> |  | $<-52$ bits -> |

## Chapter 4

## Architecture

Because the sequencer, control and data registers, and FPU core are closely coupled, the TMS34082 can execute a wide variety of complex floating-point or integer calculations rapidly with a minimum of external data transfers. The internal architecture of the FPU core supports concurrent operation of the multiplier and the ALU, providing several options for storing or feeding back intermediate results. Also, several special registers are available to support calculations for graphics algorithms. Each of the main architectural elements of the TMS34082 is discussed in this chapter.

### 4.1 Functional Block Diagram

The main architectural features of the TMS34082 are illustrated in Figure 4-1.
Figure 4-1. Functional Block Diagram


### 4.2 Operating Modes

The TMS34082 has two operating modes: coprocessor mode and host-independent mode.

In coprocessor mode, the TMS34082 acts as a floating-point coprocessor to the TMS34020 Graphics System Processor. The TMS34082 is a direct extension of the TMS34020 and its instruction set. Operation in coprocessor mode is signaled by tying the MSTR input low. Chapter 5 details this operating mode.

In host-independent mode, the TMS34082 is a floating-point RISC processor. It may be used as a coprocessor to another host processor, as a parallel processor, or as a stand-alone processor. To operate in host-independent mode, the MSTR input must be high. This mode is covered in Chapter 6.

### 4.3 Bus Interfaces

The TMS34082 has two buses: the LAD (LAD31-0) and the MSD (MSD31-0). Each is a 32-bit bidirectional bus which can be used to transfer instructions and/or data.

One 32-bit operand can be input to the TMS34082 data registers each cycle. A 64-bit double-precision floating-point operand is input in two cycles. Transfers to and from the data registers can normally be programmed as block moves (loading one or more sets of operands with a single move instruction to minimize I/O overhead). Block transfers up to 512 words in length can be programmed in either direction between the LAD and MSD buses.

### 4.3.1 LAD Bus

When the TMS34082 is used as a coprocessor to the TMS34020, the LAD bus is the main interface between the two devices. Both data and instructions from the TMS34020 are input on the LAD bus. The data can be stored in internal registers or transferred to memory on the MSD port. In addition, data (from registers or the MSD bus) can be sent to the TMS34020.

With a single TMS34020 instruction, the TMS34020 can transfer both an instruction and data to the TMS34082. Data may be from TMS34020 registers or the local memory controlled by the TMS34020.

In host-independent mode, the LAD bus is used as a databus. Instructions may not be input on the LAD bus. However, data (an address) may be read from the LAD port to an internal register, and a jump to that address performed.

To permit direct input to or output from the LAD bus, other options are available for control of the bus in host-independent mode. When two 32-bit operands are selected for input to the FPU core, one operand may come directly from the LAD bus. A result from the FPU core may simultaneously be written to a data register and the LAD bus.

The main control signals for the LAD bus are:
$\overline{\overline{A L T C H}}$
$\overline{\mathrm{CAS}}$
$\overline{\mathrm{WE}}$
$\overline{\mathrm{LOE}}$
SF (coprocessor mode only).

The function of these signals depends upon the operating mode and are discussed further in Chapters 5 - Coprocessor Mode - and Chapter 6 - Host-Independent Mode.

### 4.3.2 MSD Bus

The MSD bus (MSD31-0) and its associated address bus (MSA15-0) are the external memory interface for the TMS34082. Control signals allow you to have separate code and data storage on the MSD port. Up to 64K 32-bit words of code space and 64 K words of data space are directly supported. The bus and control signals are optimized for use with static RAM (SRAM) memory. However, with some external logic, this bus may also be connected to DRAMs, VRAMs, or other system buses.

The MSD bus is the main instruction source in host-independent mode. Data may also be accessed on this port. The TMS34082 can operate with the LAD bus as its single data bus and the MSD bus as the instruction source, or with data storage on both ports and the program memory on the MSD port.

In coprocessor mode, use of the MSD bus is optional. External user-generated subroutines may be accessed via the MSD bus. In addition, data for these routines may be stored in memory on the MSD port. The code and data for these subroutines may be downloaded from the TMS34020 memory using an LAD-to-MSD move.

MSD bus control is the same in both coprocessor and host-independent modes. Control signals are summarized in Table 4-1. Different combinations of $\overline{M C E}, \overline{M W R}$, and $\overline{M O E}$ distinguish between memory reads and writes. Table 4-2 lists the memory operation performed for each combination of signals.

Table 4-1. MSD Bus Control Signals

| Name | Function |
| :--- | :--- |
| MSA15-0 | Memory Address Output |
| DS//्टS | Data Space/Code Space Select. This signal goes low to select program <br> memory or high to select data memory. |
| $\overline{\text { MCE }}$ | Memory Chip Enable. This signal goes low when reading from or writing to <br> memory. |
| $\overline{\text { MOE }}$ | Memory Output Enable. This signal goes low when reading from memory. |
| $\overline{\text { MWR }}$ | Memory Write Enable. This signal goes low when writing to memory. |
| $\overline{\text { MAE }}$ | MSD Bus Enable. When this input is low, the TMS34082 can output data and <br> address on MSD and MSA. |

Table 4-2. Memory Operations on MSD

| $\overline{\text { MCE }}$ | $\overline{\mathrm{MWR}}$ | $\overline{\mathrm{MOE}}$ | Memory Operation |
| :---: | :---: | :---: | :--- |
| 0 | 0 | 0 | Invalid |
| 0 | 0 | 1 | Write to memory |
| 0 | 1 | 0 | Read from memory |
| 0 | 1 | 1 | Invalid |
| 1 | x | x | No memory access |

The $\mathrm{DS} / \overline{\mathrm{CS}}$ output acts as the most significant address bit selecting between code and data memory. If a single block of memory is used for both code and data space, this output may be ignored. Without DS/CS, only 64 K words of memory can be accessed.

An alternate control scheme is chosen by setting the MEMCFG bit in the configuration register high. Then, $D S / \overline{C S}$ is the data space chip enable and $\overline{M C E}$ is the code space chip enable. Refer to subsection 4.5.3.3 - MSD Bus Configuration - for more information.

If the memory on the MSD port is shared with another processor, $\overline{\text { MAE }}$ may be used to prevent bus conflicts. When memory on the MSD port is shared, the host processor can monitor the state of the memory chip enable (MCE) to determine when the TMS34082 is accessing memory.

Otherwise, MAE may be tied low. The TMS34082 will only drive the MSD bus when writing to memory (signaled by MWR low).

### 4.4 Sequence Control

The sequencer selects the next program execution address either from internal code or from external program memory. Next address sources include:

Program counter
Instruction register
Stack
Interrupt vector register
Interrupt return register
Indirect address register
The two-deep stack is used to store return addresses for jump-to-subroutine instructions. When the TMS34082 receives an interrupt, the sequencer jumps to the interrupt service routine at the address given by the interrupt vector register. The interrupt return register stores the address where execution resumes after the interrupt routine is completed. The indirect address register is used for indirect branches and jumps to subroutines.

The sequencer allows many options for program execution control. Branches on status, conditional and unconditional jumps to subroutines, counted loops, and interrupt service routines may be programmed.

### 4.5 Registers

The TMS34082 contains:
Twenty 64-bit general-purpose registers
Two embedded 64-bit feedback registers
Ten control registers
Control registers are 17 to 32 bits long as shown in the register model in Figure 4-3. The 32 -bit control registers COUNTX, COUNTY, and MIN-MAXILOOPCT are used for internal graphics instructions. When you are not using these instructions, the registers are available for temporary storage.

32-bit single-precision floating-point or integer data is stored in the upper half (bits 63-32) of a register as shown in Figure 4-2. Double-precision data uses the complete 64-bit register. If a double-precision number is loaded into a 32 -bit register, both halves are written to the register. The first half of the data is lost because it is overwritten by the second half.

Figure 4-2. Register Usage


Integer or Single-Precision Numbers

| 63 | 32 | 31 | 0 |
| :--- | :--- | :--- | :--- |
|  |  |  |  |

Double-Precision Numbers

Register files RA and RB can be written to or read from the external buses as can the control registers. Internal registers C and CT are embedded in the FPU core and can only be accessed by the FPU internal buses. The C and CT registers cannot be used as sources or destinations for move instructions. Several other registers are not available as sources for FPU operations as listed in Table 4-3.

Block moves begin at the register address given in the instruction and sequence through the registers in the order shown in the register model, Figure 4-3. C and CT are omitted from the sequence because they cannot be accessed by the external buses. After the last register address (MIN-MAX/LOOPCT), the sequence starts again at address 0 (RAO).

Figure 4-3. TMS34082 Register Model


## Table 4-3. Internal Registers

| Address | Register | Restrictions on Use |
| :---: | :---: | :---: |
| 00000 | RAO |  |
| 00001 | RA1 |  |
| 00010 | RA2 |  |
| 00011 | RA3 |  |
| 00100 | RA4 |  |
| 00101 | RA5 |  |
| 00110 | RA6 |  |
| 00111 | RA7 |  |
| 01000 | RA8 |  |
| 01001 | RA9 |  |
| 01010 | c | Not a source or destination for external moves. C and CT cannot both be used as operands in the same instruction. |
| 01011 | CT | Not a source or destination for external moves. C and CT cannot both be used as operands in the same instruction. |
| $01100{ }^{+}$ | STATUS | Not a source for FPU instructions |
| $01101^{\ddagger}$ | CONFIG | Not a source for FPU instructions |
| $01110^{\ddagger}$ | COUNTX | Not a source for FPU instructions |
| $01111^{\ddagger}$ | COUNTY | Not a source for FPU instructions |
| 10000 | RB0 |  |
| 10001 | RB1 |  |
| 10010 | RB2 |  |
| 10011 | RB3 |  |
| 10100 | RB4 |  |
| 10101 | RB5 |  |
| 10110 | RB6 |  |
| 10111 | RB7 |  |
| 11000 | RB8 |  |
| 11001 | RB9 |  |
| 11010 | VECTOR | Not a source for FPU instructions |
| 11011 | MCADDR | Not a source for FPU instructions |
| $11100 \dagger$ | SUBADDO | Not a source for FPU instructions |
| $11101^{\ddagger}$ | SUBADD1 | Not a source for FPU instructions |
| $11110^{\ddagger}$ | IRAREG | Not a source for FPU instructions |
| $11111^{\ddagger}$ | MIN-MAX/LOOPCT | Not a source for FPU instructions |

† Using this address as a source register in external code inputs data directly from the LAD bus to the FPU. Only valid in host-independent mode.
$\ddagger$ Using this address as a source register in external code inputs the value one of the appropriate format (integer, single-, or double-precision) to the FPU.

### 4.5.1 Register Files RA and RB

The TMS34082 contains two register files, each with ten 64-bit registers. Most instructions operate on one value from each of the RA and RB register files and return the result to any register. Figure 4-4 illustrates the general-purpose registers of the TMS34082.

Figure 4-4. General-Purpose Registers


When the ONEFILE control bit is set high in the configuration register, data written to a register in RA file is simultaneously written to the corresponding location in RB file. For example, the same data is written to both RA1 and RB1 at once. In this mode the two register files act as a ten-word, two-read/one-write register file, as shown in Figure 4-5.

Figure 4-5. Register Files with ONEFILE High


### 4.5.2 Feedback Registers C and CT

The two 64-bit feedback registers, C and CT, are embedded in the FPU core. Data is stored in the C and CT registers in an unpacked format. That is, integer and single-precision numbers are not stored in the upper 32-bits of the registers, but aligned in fields throughout the 64 bits. For this reason, you should always make sure the data type in the instruction matches the actual data in the register.

C or CT can be used as one or both operands in an instruction, but may not be used together in the same instruction. For example, $\mathrm{C}+\mathrm{CT}$ is not valid, but $\mathrm{C}+\mathrm{C}$ is. The feedback registers may not be accessed for external moves.

The CT feedback register is used in integer divide and square root operations as a temporary holding register. Any data stored in CT will be lost during an integer divide or square root.

### 4.5.3 Configuration Register (CONFIG)

The configuration register (CONFIG) is a special 32-bit register which you load to set up the following TMS34082 functions:

Exception handling
Bus configurations
Pipeline configurations
Denormalized number handling
Data transfer operations
Rounding modes
The configuration register is initialized to FFE00020h. Writing to this register during a block move will not change the operation of LADCFG, MEMCFG, and LOAD until the move is complete. There is a one-cycle delay from when a new value is moved to the configuration register until that value takes effect. If the instruction following a move to the configuration register requires the new setting of the register to be valid, insert one nop (NoOperation) instruction after the move.

The format of the configuration register is given in Table 4-4.

## Table 4-4. Configuration Register Definition

| Bit No. | Name | Description |
| :---: | :---: | :---: |
| 31 | MIVAL | Multiplier invalid operation (l) exception mask. Initialized to one (enabled). |
| 30 | MOVER | Multiplier overflow (V) exception mask. Initialized to one (enabled). |
| 28 | MUNDER | Multiplier underflow (U) exception mask. Initialized to one (enabled). |
| 28 | MINEX | Multiplier inexact ( X ) exception mask. Initialized to one (enabled). |
| 27 | MDIVO | Divide by zero (DIVO) exception mask. Initialized to one (enabled). |
| 26 | MDENORM | Multiplier wrapped number output (DENORM) exception mask. Initialized to one (enabled). |
| 25 | AIVAL | ALU invalid operation (l) exception mask. Initialized to one (enabled). |
| 24 | AOVER | ALU overflow (V) exception mask. Initialized to one (enabled). |
| 23 | AUNDER | ALU underflow (U) exception mask. Initialized to one (enabled). |
| 22 | AINEX | ALU inexact ( $X$ ) exception mask. Initialized to one (enabled). |
| 21 | ADENORM | ALU denormal output (DENORM) exception mask. Initialized to one (enabled). |
| 20-11 | N/A | Reserved for later use. Initialized to all zeros. |
| 10 | VERSION | Version number, read only. Set to one. |
| 9 | LADCFG | LAD bus configuration for host-independent mode. When high, COINT defines LAD bus cycle boundaries. The setting of this bit has no effect in coprocessor mode. Initialized to zero. |
| 8 | MEMCFG | MSD bus configuration. When high, $\overline{M C E}$ and $\mathrm{DS} / \overline{\mathrm{CS}}$ are code and data space chip enable, respectively. Initialized to zero. |
| 7 | N/A | Reserved for later use. Initialized to zero. Note: You must always write a zero to this bit. |
| 6 | ONEFILE | When high, causes simultaneous write to both register files. Initialized to zero. |
| 5 | PIPES2 | When high, makes the FPU core output registers transparent. When low, the output registers are enabled. Initialized to one. |
| 4 | PIPES1 | When high, makes the FPU core internal pipeline registers transparent. When low, the FPU internal pipeline registers are enabled. Initialized to zero. |
| 3 | FAST | When high, Fast mode is selected (all denormalized inputs and outputs are zeroed). When low, IEEE mode is selected. Initialized to zero. |
| 2 | LOAD | Load order. $0=$ MSH, then LSH; $1=\mathrm{LSH}$, then MSH. Initialized to zero. |
| 1 | RND1 | Rounding mode select 1 . Initialized to zero. |
| 0 | RNDO | Rounding mode select 0. Initialized to zero. |

### 4.5.3.1 Exception Mask

The mask bits (bits 31-21) serve as exception detect enables. Setting bits high enables the detection of the specific exceptions. Exceptions that are unimportant to your specific application may be masked to prevent unwanted interrupts. When an enabled exception occurs, the ED bit in the status register is set high and can be used to generate interrupts.

When the exception mask has been loaded, the mask is appliedto the contents of the status register to disable unnecessary exceptions. Status results are ORed together and, if true, the exception detect (ED) status bit is set high. Individual status flags remain active and can be read independently of mask operations.

Since inexact results are normal for floating-point operations, you should usually mask out this exception for both the ALU (AINEX) and multiplier (MINEX).

### 4.5.3.2 LAD Bus Configuration (Host-Independent Mode)

The LADCFG bit (bit9) defines the LAD bus configuration for host-independent mode. Two different configurations are possible.

When LADCFG is low, COINT is a user-programmable signal not associated with the LAD bus. $\overline{\mathrm{CAS}}$ and $\overline{\mathrm{WE}}$ are not affected by $\overline{\text { LOE ( }}$ LAD bus enable).

When LADCFG is high, $\overline{\text { COINT }}$ defines LAD bus cycle boundaries and is controlled by bit 1 (C bit) of LAD move instructions. Also, CAS and $\overline{\mathrm{WE}}$ are disabled (placed in a high impedance state) when LOE is high.

With LADCFG high, a move instruction with the C bit high sets $\overline{\text { COINT }}$ low before the first word is moved. COINT remains low until the move is complete. You could use COINT to select between two devices on the LAD bus. COINT becomes the chip enable for one of the devices as shown in Figure 4-6.

The setting of $\overline{\text { COINT }}$ has no effect in coprocessor mode.
Figure 4-6. Host-Independent Mode LAD Bus Configuration for LADCFG high


### 4.5.3.3 MSD Bus Configuration

The MEMCFG bit defines the function of control signals for the MSD bus. Two different configurations are possible.

When MEMCFG is low, $\overline{M C E}$ is the memory chip enable signal. It goes low when memory is being accessed. DS/CS functions as the most significant address bit, selecting data memory when high or code memory when low. This configuration is illustrated in Figure 4-7.

Figure 4-7. MSD Bus Configuration for MEMCFG low


When MEMCFG is high, $\overline{\text { MCE }}$ becomes the code space chip enable and DS/CSthe data space chip enable. Both are active low. This may eliminate the need for an external inverter on DS/ $\overline{C S}$. Figure 4-8 show this configuration.

Figure 4-8. MSD Bus Configuration for MEMCFG high


### 4.5.3.4 Pipeline Settings

The PIPES2 and PIPES1 bits (bits 5-4) of the configuration register define the piepline register settings for the internal FPU core. PIPES2 is the enable for the FPU core output registers; PIPES1 is for the FPU core internal pipeline registers. Both are active low. When high, data flows through the registers. Table 4-5 details the pipeline operation for each setting of PIPES.

Table 4-5. Pipeline Settings

| PIPES2 | PIPES1 | Operation |
| :---: | :---: | :--- |
| 0 | 0 | Both pipeline registers enabled |
| 0 | 1 | Only FPU internal pipeline registers enabled |
| 1 | 0 | Only FPU output registers enabled |
| 1 | 1 | Both pipeline registers disabled (flowthrough) |

For more information on pipeline registers, refer to subsection 4.6.2-Pipeline Registers.

### 4.5.3.5 Fast and IEEE Modes

The FAST bit (bit 3 ) selects the mode for handling denormalized inputs and outputs. For many applications, very small numbers may be treated as zero, allowing the programmer to use Fast mode. In the Fast mode (FAST=1):

All denormalized or wrapped inputs and outputs are forced to zero and do not cause any status exceptions.

The DENIN (denormal input) status exception is disabled.
Using Fast mode simplifies error handling because you do not have to wrap and unwrap denormalized numbers. Forcing very small (denormalized) numbers to zero causes a loss of accuracy, however. If you multiply a very large number by a denormal, the result may be significantly larger than zero. If it is important in your application to distinguish between very small numbers and zero, use IEEE mode.

Setting FAST $=0$ selects IEEE mode. In this mode, the ALU can operate on denormalized inputs and return denormals. Denormals are not valid input to the multiplier; they must be wrapped first. If you input a denormal to the multiplier, the DENIN flag will be asserted and the result will be invalid (I flag set). Exponent underflow is possible during multiplication of small operands even when the operands are not wrapped numbers. If the multiplier result underflows, a wrapped number will be output. In IEEE mode, the wrapped number is not forced to zero.

When the multiplier produces a wrapped number as its result, it may be passed to the ALU and unwrapped. A zero is output if the wrapped result is too small to represent as a denormal (smaller then the minimum denormal). Table 4-6 describes how you should unwrap multiplier results and the status flags that are set when wrapped numbers are output from the multiplier.

Table 4-6. Handling Wrapped Multiplier Outputs

| Type of Result | Status Bits |  |  | Notes |
| :--- | :---: | :---: | :---: | :---: |
|  | DENORM | $\mathbf{X}$ | RND |  |
| Wrapped, exact | 1 | 0 | 0 | Unwrap with Wrapped, exact instruction |
| Wrapped, inexact | 1 | 1 | 0 | Unwrap with Wrapped, inexact instruction |
| Wrapped, increased in magnitude | 1 | 1 | 1 | Unwrap with Wrapped, rounded instruction |

### 4.5.3.6 Load Order

Since 64-bit double-precision data must be transferred 32 bits at a time, the TMS34082 must know which half of the word is loaded first. The LOAD bit (bit 2) defines the expected order. If LOAD $=0$, the most significant half (MSH) is transferred first, followed by the least significant half (LSH). When LOAD $=1$, the LSH is transferred first. The LOAD bit also determines the order data is read out of a register. Table 4-7 shows the load order for all data formats.

Table 4-7. Data Ordering for Loads/Stores

| Data Format | Size | Words Accessed |  |
| :---: | :---: | :---: | :---: |
|  |  | CONFIG LOAD bIt=0 <br> 31 $\qquad$ 0 | CONFIG LOAD bit $=1$ <br> 31 $\qquad$ |
| Integer | 32 bits | そ\%My, | \% X |
| Single-precision | 32 bits | Whande |  |
| Double-precision | 64 bits |  |  |

### 4.5.3.7 Rounding Modes

The TMS34082 supports the four IEEE standard rounding modes:
Round to nearest
Round towards zero (truncate)
Round towards positive infinity (round up)
Round towards minus infinity (round down)
The rounding function is selected by bits RND1 and RND0 as shown in Table 4-8. The default setting is round to nearest.

## Table 4-8. Rounding Modes

| RND1 | RNDO | Rounding Modes |
| :---: | :---: | :--- |
| 0 | 0 | Round towards nearest |
| 0 | 1 | Round towards zero (truncate) |
| 1 | 0 | Round towards infinity (round up) |
| 1 | 1 | Round towards negative infinity (round down) |

You should select the rounding mode which will minimize procedural errors. Rounding to nearest introduces an error no more than half of the least significant bit. Since rounding to nearest may involve rounding either up or down in successive steps, rounding errors tend to cancel each other.

In contrast, directed rounding modes may introduce errors approaching one bit for each rounding operation. Rounding errors may accumulate rapidly, particularly with single-precision operations.

### 4.5.4 Status Register

The floating-point status register (STATUS) is a 32-bit register used for reporting the exceptions that occur during TMS34082 operations and status codes set by the results of implicit and explicit compare operations. The status register is cleared upon reset, except for the INTENED flag which is set to one in coprocessor mode.

The status register can be used by test-and-branch instructions to control program flow. Because of the large number of FPU status outputs, branches on status can be used to save program execution time. The status register contents are also important when dealing with status exceptions including such conditions as overflow, underflow, invalid operations, or illegal data formats (such as infinity, Not a Number ( NaN ), or denormalized operands).

Table 4-9. Status Register Definition

| Bit No. | Name | Description |
| :---: | :---: | :---: |
| 31 | N | Sign bit. When high, the result is negative. ( $A<B$ for compare operations) |
| 30 | GT | $A>B$ (valid only for compare operations) |
| 29 | Z | zero flag. ( $A=B$ for compare operations) |
| 28 | V | IEEE Overflow flag. The result is greater than the largest allowable value for the specified format. |
| 27 | 1 | IEEE Invalid Operation flag. A NaN has been input to the FPU or an invalid operation has been requested. If I goes high because a NaN was input, the STX flags indicate which port had the NaN . |
| 26 | U | IEEE Underflow flag. The result is inexact and less than the minimum allowable value for the specified format. In Fast mode, this condition causes a zero result. |
| 25 | X | IEEE inexact flag. The result of an operation is inexact. |
| 24 | DIV0 | Divide by zero. An invalid operation involving a zero divisor has been detected by the multiplier. |
| 23 | RND | The mantissa of a number has been increased in magnitude by rounding. If the number generated was wrapped, then the unwrap, rounded instruction must be used to properly unwrap the wrapped number (see Table 4-6). |
| 22 | DENIN | The input to the multiplier is a denormal number. When DENIN goes high, the STX flags indicate which port had the denormal input. |
| 21 | DENORM | The multiplier output is a wrapped number or the ALU output is a denormal number. In the Fast mode, this condition causes the result to go to zero. It also indicates an invalid integer operation, for example, PASS (-A) with unsigned integer operand. |
| 20 | STX1 | A NaN or a denormal has been input on the A port. |
| 19 | STX0 | A NaN or a denormal has been input on the B port. |
| 18 | ED | Exception detect status signal representing logical OR of all enabled exceptions in the exception disable register. |
| 17 | UNORD | The two inputs of a comparison operation are unordered, that is, one or both of the inputs is a NaN . |
| 16 | INTFLG | Software interrupt flag. Set by external code to signal a software interrupt. |
| 15 | INTENHW | Hardware interrupt (INTR) enable |
| 14 | NXOROV | $N$ (negative) XOR V (overfiow) |
| 13 | VANDZB | V(overilow) AND NOTZ (not zero) |
| 12 | INTENED | ED interrupt enable (initialized to zero in host-independent mode, one in coprocessor mode). |
| 11 | INTENSW | Software interrupt enable for INTFLG (bit 16) |
| 10 | ZGT | $\mathrm{Zn}>\mathrm{Zmax}$ (valid for 2-D MIN-MAX instructions) |
| 9 | ZLT | $\mathrm{Zn}<\mathrm{Zmin}$ (valid for 2-D MIN-MAX instructions) |
| 8 | YGT | Yn > Ymax (valid for 1-D or 2-D MIN-MAX instructions) |
| 7 | YLT | $\mathrm{Yn}<\mathrm{Y}_{\min }$ (valid for 1-D or 2-D MIN-MAX instructions) |
| 6 | XGT | $\mathrm{X}_{\mathrm{n}}>\times \mathrm{mmax}$ (valid for 1-D or 2-D MIN-MAX instructions) |
| 5 | XLT | X n < Xmin (valid for 1-D or 2-D MIN-MAX instructions) |
| 4 | HINT | Hardware interrupt flag |
| 3-0 | n/a | Reserved, set to zero |

Output exceptions may be due to either an illegal data format or to a procedural error, such as:

Results too large or too small to be represented in the selected precision are signaled by V (overflow) and U (underflow).

An ALU output which was increased in magnitude by rounding causes $X$ (inexact) to be set.

Wrapped outputs from the multiplier may be inexact and increased in magnitude by rounding, which sets the $X$ (inexact) and RND (rounded) status flags high.

DENORM is set when the multiplier output is wrapped or the ALU output is denormalized.

DENORM is also set high when an illegal integer operation is performed.
DIV0 is set whenever the divisor is zero. The result of the operation is infinity.

Invalid operations cause the I flag to be set. The I bit will also go high if a NaN is input to the FPU.

The ED flag is a logical OR of the above exceptions. If any of the exception flags is high, ED will also be high. Exceptions can be masked out of ED by setting the appropriate bits in the configuration register. If the ED interrupt (INTENED) is enabled, an interrupt is generated when ED goes high.

Status flags are provided for both floating-point and integer results. Integer status is provided using Z for zero detect, N for sign, and V for overflow/carryout. Bits 14 and 13 are logical combinations of these three flags.

If the floating-point input to the multiplier is a denorm, DENIN will be set. If the input to the FPU is a NaN, I (invalid operation) will be set. STX1-0 indicate which operand is the source of the exception when either a denormal is input to the multiplier (DENIN=1) or a NaN is input ( $l=1$ ).

NaN inputs are all treated as IEEE signaling NaNs causing the I flag to be set. When the FPU outputs a NaN , it is always in the form of a signaling NaN with the I and appropriate STX flags set high. The exponent and fraction fields of the NaN are set to all 1s, regardless of the input fraction.

Invalid operations that set the I flag include:
Operations with NaN inputs
Zero divided by zero
Positive infinity minus positive infinity or negative infinity minus negative infinity

Positive infinity plus negative infinity
Square root of a negative number
Zero multiplied by infinity
The result of these operations is a NaN .
Bits 15,12 , and 11 in the status register are used to enable interrupts. Interrupts are enabled by setting INENHW (hardware interrupt), INTENSW (software interrupt), or INTENED (ED interrupt) high. A software interrupt is generated by writing to the status register with bit 16 (INTFLG) set to one.

### 4.5.5 Indirect Address Register

The indirect address register (MCADDR) can be set to point to a memory location for indirect move or jump operations on the MSD port. MCADDR is cleared upon reset. Although MCADDR cannot be used directly as an operand for FPU instructions, you can do an arithmetic operation on the value in MCADDR by first moving the contents to a register file location. Then perform the operation, choosing MCADDR as the destination.

The function of bit 16 varies, depending on whether the instruction is a move or jump. During a move instruction, bit 16 selects data space when set high or code space when low. During a jump instruction, bit 16 selects an internal instruction when set high or an external instruction when low (see Figure 4-9).

Figure 4-9. Indirect Address Register Format


### 4.5.6 Stack

The stack contains two subroutine return address registers (SUBADD0 and SUBADD1) which serves as a two-deep last-in, first-out (LIFO) stack. A subroutine jump causes the program counter to be pushed onto the stack, and a return from subroutine pops the last address pushed onto the stack. More than two pushes will overwrite the contents of SUBADD1.

Bit 31 (Pointer) is set high in the stack location that was written last and reset to zero in the other stack location. Setting bit 30 (Enable) high enables a write into bit 31 (set or reset the pointer) in either stack location. If bit 31 is zero in both SUBADD0 and SUBADD1 (as when the stack has been saved externally and later restored), SUBADDO can be designated as top of stack by setting bit 31 . The stack pointers are cleared upon reset.

Bit $16(I)$ is set high when the address in a stack location points to an internal routine or set low when the address is an external instruction.

Figure 4-10. Stack Register Format

| 31 | 30 | 16 |  |  |  |  |  |  |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $P$ | $E$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | 1 | SUBADDO |
| $P$ | $E$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | 1 | SUBADD1 |

### 4.5.7 Interrupt Vector Register

The interrupt vector register (VECTOR) serves as a pointer to an external program to be executed upon receipt of an interrupt. Bit 16 (I) is always set low to point to a routine in external code space. The interrupt vector is cleared on reset. This register is only 17 bits wide (as shown in Figure 4-11) and should not be used for temporary storage.

Figure 4-11. Interrupt Vector Register Format


### 4.5.8 Interrupt Return Register

The interrupt return register (IRAREG) retains a copy of the program counter at the time of an external interrupt. This address is used as the next execution address upon returning from the interrupt. Bit 16 (I) is set high when the address points to an internal instruction or set low when the address is in an external instruction. This register is not affected by the reset signal and, as illustrated in Figure 4-12, is only 17 bits wide and should not be used for temporary storage.

Figure 4-12. Interrupt Return Register Format


### 4.5.9 COUNTX and COUNTY Registers

The counter registers (COUNTX, COUNTY) are used to store the current counts of the minimum and maximum values when executing MIN-MAX instructions. They may also serve as temporary storage for the user. COUNTX and COUNTY are cleared on reset.

Figure 4-13. COUNT Registers Format

| 31 | $16 \quad 15$ | Count for MIN value |
| :--- | :--- | :--- | :--- | :--- |
| Count for MAX value | 0 |  |

The COUNTX register is updated on both the 1-D and 2-D MIN-MAX instruction such that the count of the current minimum value is in the lower 16 bits of the register and the count of the current maximum value is in the upper 16 bits. The COUNTY register is used only in the 2-D MIN-MAX instruction to keep track of the counts of the minimum and maximum for the second value of a pair.

### 4.5.10 MIN-MAX/LOOPCT Register

The MIN-MAX/LOOPCT register stores the current values of two separate counters. The LSH contains the current loop counter and the MSH is used to hold the current minimum or maximum value of a MIN-MAX operation. This register may also serve as temporary storage for the user. The MIN-MAX/LOOPCT register is cleared upon reset.

Figure 4-14. MIN-MAX/LOOPCT Register Format
31

| MIN-MAX value | Loop Count |
| :---: | :---: |

### 4.6 FPU Core

The FPU core consists of a multiplier and ALU, each with an intermediate pipeline register and an output register. The multiplier and ALU may operate independently or in parallel.

The major components include:
Operand multiplexers
Pipeline registers
ALU
Multiplier
Output control
Figure 4-15 shows a functional block diagram of the FPU core.

### 4.6.1 Operand Selection

Four multiplexers select the multiplier and ALU operands. Possible operand sources are:

RA and RB register files
Internal feedback registers C and CT
FPU core output registers
The FPU core output registers provide the previous multiplier or ALU result. Note that if the output registers are used as operands, they must be enabled. (See subsection 4.6.2 - Pipeline Registers - for additional details.)

For external instructions, immediate data from the LAD bus or the value 1 may also be chosen as operands. These are selected by setting the appropriate address bits (see section 4.5 - Registers) and selecting the RA or RB register file as operands.

The selection of operands also depends on the ALU or multiplier operation chosen. Single-operand instructions aregenerally performed only on registers in the RA file. Exceptions to this are the PASS instruction and certain complex internal instructions. Also in chained mode (the ALU and multiplier acting in parallel) the RB operand may optionally be forced to 0 in the ALU or 1 in the multiplier.

Figure 4-15. FPU Core Functional Block Diagram
to/from Control Register


### 4.6.2 Pipeline Registers

Two levels of internal data registers are available to segment the internal data paths of the TMS34082 FPU core. The registers are enabled by setting the PIPES2-1 bits in the configuration register. The most basic choice is whether to use the device in unpipelined mode (with no internal registers enabled) or whether to enable one or more pipeline registers. When no internal registers are enabled, the clock period is longest (the TMS34082 timing specitications are contained in Appendix A) .

Enabling one or both sets of pipeline registers segments the data paths. When the intermediate pipeline is enabled, the register-to-register delay inside the device is minimized, allowing operation with the minimum cycle time. While one FPU instruction is executing, the next instruction may be input so that overlapping operations occur. This is commonly known as pipelined execution.

The TMS34082 may also operate with both sets of pipeline registers disabled. With this setting, two 32 -bit operands are read from the register file, an operation is performed by the ALU or multiplier, and the result is stored in the register file, all in one clock cycle. A double-precision ALU operation takes one clock cycle, but double-precision multiplies require two clock cycles to complete.

When the ALU and multiplier operate in parallel (chained mode), two data operands come from the register files while multiplier and ALU feedback provide the other two operands. Therefore, in chained mode the FPU core outputregisters mustbe enabled. After the chained operation is completed and the results have been stored, the FPU core output registers may be disabled again. Wait until all operations have completed to change pipeline settings to avoid loosing any results.

The selection of pipeline registers determines the latency from input to output, the number of cycles required for an instruction to be processed and the results to appear. For each register level enabled in the data path, one clock cycle is added to the latency from input to when the result is valid in the register file. Figure $4-16$ shows the latency of different pipeline settings. A result may be used as input on the same cycle that it is clocked into the register file.

## Figure 4-16. Effects of Pipelining

| Clock Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Instruction Register | Instruction 1 | Instruction 2 | Instruction 3 | Instruction 4 |  |  |  |
| gister File |  | Instruction 1 Results | Instruction 2 Results | Instruction 3 Results | Instruction 4 Results |  |  |



Both sets of pipeline registers are controlled by the PIPES2 and PIPES1 bits in the configuration register. When the device is powered up or reset, the intermediate pipeline registers are enabled (PIPES1=0) and the output registers are transparent (PIPES2=1). For internal instructions, control logic sets the pipeline registers as needed and restores them to their previous configuration after the instruction is completed.

Pipeline settings should be changed only when all instructions executing in the FPU core are completed and results are stored in the register file. Otherwise, results will be lost. The nop (No Operation) instruction may be inserted to allow time for the last instruction to finish before changing the pipeline configuration.

When using chained mode, the nop instruction may be used to adjust output register timing. Each nop instruction keeps the results in the output registers for one additional clock cycle. nop may be used in this manner only when the output registers are enabled.

### 4.6.3 ALU

The pipelined ALU contains a circuit for floating-point addition and/or subtraction of aligned operands, a pipeline register, an exponent adjuster, and a normalizer/rounder as shown in Figure 4-17. Exception logic is provided to detect denormalized inputs; these can be flushed to zero if the FAST input is set high. If the FAST input is low, the ALU accepts a denormal as input. The denormal exception flag (DENORM) goes high when the ALU output is a denormal.

Figure 4-17. Functional Diagram for ALU


Integer processing in the ALU includes both arithmetic and logical operations in either $2 s$ complement numbers or unsigned integers. The ALU performs addition, subtraction, comparison, logical shifts, logical AND, logical OR, and logical XOR. Format conversions and wrapping/unwrapping of denormals are also done by the ALU.

### 4.6.4 Multiplier

The pipelined multiplier (see Figure 4-6) performs a basic multiply function, division, and square root. The operands can be single- or double-precision floating-point numbers and can be converted to absolute values before multiplication takes place. Integer operands may also be used.

If the operands to the multiplier are double-precision or mixed precision (i.e., one single-precision and one double-precision), then one extra clock cycle is required to get the product through the multiplier pipeline. This means that for PIPES1=1, one clock cycle is required for the multiplier pipeline; for PIPES1=0, two clock cycles are required for the multiplier pipeline.

An exception circuit is provided to detect denormalized inputs; these are indicated by a high on the DENIN signal. Denormalized inputs must be wrapped by the ALU before multiplication, division, or square root. If results are wrapped (signaled by a high on the DENORM status pin), they must be unwrapped by the ALU first.

The multiplier and ALU can be operated simultaneously. Division and square root are each performed as an independent multiplier operation, even though both multiplier and ALU are active during these operations.

Figure 4-18. Functional Diagram for Multiplier


### 4.6.5 Output Control

An output MUX selects which result (ALU or multiplier) is written to the register. The instruction specifies where the result is stored. Results may be directed to the twenty registers in files RA and RB, the feedback registers ( $C$ and $C T$ ), or the other temporary storage registers.

Although it is possible to direct the result to the CONFIG, STATUS, MCADDR, VECTOR, IRAREG, SUBADD0, and SUBADD1 registers, it is not recommended. These registers have dedicated functions as discussed in section 4.5 .

The COUNTX, COUNTY, and MIN-MAX/LOOPCT may be used as temporary storage registers. Because they are only 32-bits wide, double-precision results cannot be stored in these registers.

## 4.7 $\overline{\text { RESET }}$ and RDY

The RESET input is an active low signal that asynchronously clears the internal states and resets the configuration and status registers to the default values. Internal pipeline registers are cleared, but the register files, C, and CT are not affected.

During reset, control inputs are in an inactive state as shown in Table 4-10. The LAD and MSD buses are placed in a high-impedance state, and the MSA bus outputs an address of 0 .

## Table 4-10. Signal States During Reset

| Signal Name | Logic Level |
| :---: | :---: |
| LAD31-0 | high impedance |
| $\overline{\text { ALTCH }}$ | high |
| $\overline{\text { CAS }}$ | high |
| $\overline{\text { WE }}$ | high |
| MSD31-0 | high impedance |
| MSA15-0 | low |
| $\overline{\mathrm{DS} / \overline{\mathrm{CS}}}$ | high |
| $\overline{\text { MAE }}$ | high |
| $\overline{\mathrm{MCE}}$ | high |
| $\overline{\text { MOE }}$ | high |
| $\overline{\text { MWR }}$ | high |
| $\overline{\mathrm{COINT}}$ | high |
| $\overline{\mathrm{CORDY}}$ | high |
| $\overline{\mathrm{INTG}}$ | low |

$\dagger$ Host-independent mode only.
Operation resumes on the rising edge of the clock after RESET is set high again. In host-independent mode, MCE becomes active and causes a read from code address 0 . In coprocessor mode, the TMS34082 goes to an idle state, waiting for an instruction from the TMS34020.

The TMS34082 can be nondestructively stalled by setting the RDY input low. The next rising clock edge is inhibited. Normal operation resumes on the cycle after the RDY input is set high again.

While halted, the registers and internal states are unaltered. Output pins remain at their previous levels. The asynchronous inputs ( $\overline{L O E}, \overline{M O E}$, and $\overline{\text { RESET }}$ ) are still active. If an interrupt is received while the device is stalled, it will be queued and serviced after operation resumes.

### 4.8 Emulation Control

Two emulation mode control pins, EC1-0, support system testing. These may be used, for example, to place all outputs in a high-impedance state, isolating the TMS34082 from the rest of the system.

Test modes are given in Table 4-11. For normal operation, EC1 and EC0 must both be high.

Table 4-11. Test Modes

| EC1-0 | Operation |
| :---: | :---: |
| 00 | All output and $1 / O$ pins are forced low |
| 01 | All output and 1/O pins are forced high |
| 10 | All output pins are placed in high-impedance state |
| 11 | Normal operation |

### 4.9 JTAG Test Port

The TMS34082 includes a 4-wire Test Access Port (TAP) interface that allows serial scan access to test circuitry within the device. This TAP is compatible with the IEEE 1149.1 (JTAG) specification. It was designed using the TI Scope ${ }^{T M}$ (System Controllability, Observability, and Partitioning Environments) guidelines. For normal operation, the input pins should be connected as shown below.

Table 4-12. Test Pins for Normal Operation

| Signal Name | Logic Level |
| :---: | :---: |
| TCLK | Tied low or high |
| TDI | Tie high or leave floating |
| TMS | Tie high or leave floating |

### 4.9.1 Test Instructions

The TAP includes an 8-bit instruction register used to tell the device what instruction is to be executed. The instruction register is loaded serially via the TDI input. The order of scan is shown in Figure 4-19.

Figure 4-19. Instruction Register Order of Scan


Four test instructions are supported; Table 4-13 lists their binary opcodes. Any instruction code not supported is interpreted as the Bypass instruction.

## Bypass

A one-bit bypass register is selected in the scan path. Data input from TDI is shifted into the bypass register, then out through TD0.

## Extest

This is the 1149.1 Extest instruction with the boundary scan register in the scan path. Data appearing at the device inputs and outputs is captured. Data previously loaded into the boundary scan register is applied to the device inputs and through the device outputs.

## Intest

This is the 1149.1 Intest instruction with the boundary scan register in the scan path. Data appearing at the device inputs and outputs is captured. Data previously loaded into the boundary scan register is applied to the device inputs and through the device outputs.

Sample
This instruction conforms to the 1149.1 Sample/Preload instruction. Data appearing at the device inputs and outputs is sampled without affecting normal operation. The boundary scan register is selected in the scan path.

Table 4-13. Instruction Register Opcodes

| Binary Opcode | Opcode | Description |
| :---: | :---: | :--- |
| 00000000 | BYPASS | Bypass scan |
| 00000011 | INTEST | Boundary scan in test mode |
| 10000010 | SAMPLE | Sample boundary scan in normal mode |
| 11111111 | EXTEST | Boundary scan in test mode |

### 4.9.2 Boundary Scan Register

The boundary scan register contains 181 bits, one for each functional input and output on the TMS34082. Each I/O pin has both an input and an output register bit associated with it. In addition, some three-state outputs have an additional bit in the scan register. These represent internal three-state enable registers, not actual pins on the package. Table 4-14 lists these scan bits and the outputs they affect.

Table 4-14. Boundary Scan Register Enable Bits

| Scan Name | Affected Outputs |
| :---: | :---: |
| CO-EN | $\overline{\text { COINT }}$, CORDY |
| ALTCH-EN | $\overline{\text { ALTCH }}$ (output) |
| CAS-EN | $\overline{\text { CAS (output), } \overline{\text { WE }} \text { (output) }}$ |
| LAD-EN | LAD31-O (outputs only) |
| MSD-EN | MSD31-0 (outputs only) |
| MSA-EN | MSA15-0 |
| MWR-EN | $\overline{\text { MWR }, \overline{M O E}, \text { DS/CS, } \overline{\text { MCE, }} \text {, INTG }}$ |

The boundary scan register is used to store test data that is to be applied internally and/or externally to the TMS34082 and to capture and store data that is applied to the functional inputs and outputs. The boundary scan register order of scan is shown in Figure 4-20.

Figure 4-20. Boundary Scan Register Order of Scan


Figure 4-20. Boundary Scan Register Order of Scan (Continued)


Figure 4-20. Boundary Scan Register Order of Scan (Continued)


## Chapter 5

## Coprocessor Mode

The TMS34082 provides closely coupled floating-point support for the TMS34020. The devices were designed with a direct-wire interface that requires no additional external glue logic. Combinations of TMS34020 and TMS34082 devices provide the performance to cover a broad range of graphic applications. This family of solutions makes upgrading your design easy.

The TMS34082 is more than a simple coprocessor. It contains complex instructions specifically tailored for graphics operations. The ability of the TMS34020 and TMS34082 to operate in parallel, support for multiple TMS34082 devices, and the option of adding external user-generated subroutines also increase system performance.

### 5.1 TMS34020/TMS34082 Interface Overview

Operation in coprocessor mode assumes the MSTR input signal is set low. In this mode, the TMS34082 acts as a tightly coupled coprocessor to the TMS34020. In terms of the instruction set and register resources, the TMS34082 appears as an extension to the TMS34020 register and instruction set.

Figure 5-1 shows the register allocation for the TMS34020/TMS34082 combination.

Figure 5-1. TMS3402/TMS34082 Register Model


TMS34020 Registers
TMS34082 Registers
The TMS34082 executes two different instruction sets:
Internal instructions from the TMS34020 are input on the LAD port. They include complex graphics, matrix, and vector routines. These are described in Chapter 7.

External instructions are input on the MSD port. This is a RISC-like instruction set. They are used to write user-defined subroutines. External instruction are covered in Chapter 8.

The interface between the TMS34020 and the TMS34082 consists of direct connections between pins. No glue logic is required other than gating the ready signals into the TMS34020. Figure 5-2 shows the interconnection.

The LAD interface includes the following signals: LAD31-0, $\overline{\mathrm{LOE}}, \overline{\mathrm{ALTCH}}$, LRDY, BUSFLT, $\overline{R A S}, \overline{\mathrm{CAS}}, \overline{W E}, \mathrm{SF}, \overline{\mathrm{COINT}, \mathrm{CORDY} \text {. These signals }}$ communicate between TMS34020 and TMS34082 in coprocessor mode. COINT and CORDY are the only signals that go from the TMS34082 to the TMS34020; all other signals are inputs to the TMS34082. COINT should be connected to one of the TMS34020 local interrupt requests, LINT1 or LINT2.

Figure 5-2. TMS34020/TMS34082 Interconnection


CORDY from the TMS34082 is logically ORed with other ready signals from the system to form the TMS34020 LRDY input ready signal. Note that LRDY connects to both the TMS34020 and the TMS34082 inputs.

When operating in the coprocessor mode, connect the remaining TMS34082 pins as shown in Table 5-1.

Table 5-1. Recommended TMS34082 Pin Connections

| Signal Name | Description | Logic Level |
| :--- | :--- | :---: |
| MSTR | Coprocessor/Host-independent mode select | tie low |
| CLK | Host-independent mode clock | tie low |
| CID2-0 | Coprocessor ID (assembler default is $000_{2}$ ) | tie low |
| EC1-0 | Emulator mode control | tie high |
| TCK | Test clock input | tie low |
| $\overline{\text { LOE }}$ | LAD output enable | tie low |
| $\overline{\text { INTR }}$ | Interrupt request input | tie high |

### 5.2 Clocks

Local clock input signals LCLK1 and LCLK2 are generated by the TMS34020. Internally, the TMS34082 generates a rising clock edge from each LCLK1 edge (rising or falling). In coprocessor mode, the TMS34082 actually operates at twice the LCLK1 input clock frequency.

LCLK1 controls most of the TMS34082 internal logic while LCLK2 is used for several simple functions such as synchronizing interrupt requests.

CLK is the system clock input in host-independent mode. It should be tied low for coprocessor mode.

### 5.3 TMS34082 Initialization

The TMS34082 uses the same $\overline{\text { RESET }}$ input signal that the TMS34020 uses. Upon reset, the TMS34082 clears all pipeline registers and internal states. The configuration register and status register return to their default values. When RESET returns high in coprocessor mode, the TMS34082 is in an idle state waiting for the next instruction from the TMS34020. The RESET signal is an asynchronous signal and does not require specific setup or hold times to a clock. However, the minimum pulse duration requirement must be met.

### 5.4 Configuration Register Settings for Coprocessor Mode

The configuration register (CONFIG) defines several selectable features of the TMS34082. The following subsections recommend settings for this register in coprocessor mode. Part of your system initialization program should set the configuration register to the appropriate value.

### 5.4.1 Exception Masks

Since inexact operations are common in floating-point operations, you should usually disable this exception for both the multiplier and ALU by setting the MINEX and AINEX bits to 0 .

### 5.4.2 Fast vs IEEE Mode

For most graphics applications where integer and single-precision floating-point number formats are used, operating the TMS34082 in Fast mode is sufficient. This also holds true for most double-precision floating-point applications. Because the internal instruction set does not include instructions to wrap and unwrap denormalized numbers, you should use Fast mode if you do not have memory on the MSD port for external instructions.

However, when working with very large or very small double-precision values, IEEE mode can be used to operate on denormalized numbers. Possible uses of IEEE mode include image processing and digital signal processing applications where accuracy is critical. External instructions must be used to wrap and unwrap denormalized numbers. See Chapter 8 for details on these instruction.

### 5.4.3 Pipeline Mode Settings

For coprocessor mode, the TMS34082 pipeline mode settings (PIPES2-1 in the CONFIG register) affect the performance of very few internal instructions. Most simple instructions, such as adds or multiplies, finish executing before the TMS34020 can issue the next instruction. Using the default setting allows you to run the TMS34082 at the maximum clock rate. This setting (PIPES2 $=1$, PIPES1 $=0$ ) is recommended unless you are using chained mode external instructions. While using chained mode instructions, PIPES2 should be set low to enable the FPU core output registers.

The complex instructions contained in internal ROM change the pipeline setting as needed and restore the previous pipeline setting after the instruction is completed.

### 5.5 TMS34020/TMS34082 LAD Bus Operation

The TMS34020 local memory interface is made up of a multiplexed address/data bus and associated control signals. During a memory cycle, the address and status are output on the LAD bus, and then the LAD bus is used for the data transfer. The local memory and DRAM/VRAM interfaces are used for transferring data or instructions between the TMS34020, memory, or the TMS34082 in addition to generating refreshing cycles for DRAM/VRAM.

In coprocessor mode, the TMS34082 LAD bus connects directly to the TMS34020 LAD bus. Coprocessor commands from the TMS34020 are input on this bus. In addition, data transfers between the TMS34020 or its local memory and the TMS34082 occur through the LAD bus. Transfers between the LAD and MSD buses can also be programmed.

A single coprocessor instruction may be used to pass a command to the TMS34082 and transfer data to/from the TMS34020 or memory. There are five general types of coprocessor instructions.

Command-only instructions transfer no data to the TMS34082.
TMS34020 to TMS34082 transfer instructions pass a command and data to the coprocessor. Three types of transfers are available:
move one 32-bit parameter
move two 32-bit parameters
move one 64-bit parameter
TMS34082 to TMS34020 transfer instructions pass a command to the coprocessor and the TMS34082 outputs data to the LAD bus. Two types of instructions are available:
move one 32-bit parameter
move one 64-bit parameter
Memory to TMS34082 transfer instructions pass a command from the TMS34020 and data from memory to the coprocessor. Up to 32 32-bit words may be transferred. Three types of memory moves are available:
move the number of words specified in the coprocessor instruction using postincrement
move the number of words specified in the coprocessor instruction using predecrement
move the number of words specified in a register using postincrement.

TMS34082 to memory transfer instructions pass a command to the coprocessor and the TMS34082 outputs data to the LAD bus. Up to 32 32-bit words may be transferred. Two types of memory moves are available:
move the number of words specified in the coprocessor instruction using postincrement
move the number of words specified in the coprocessor instruction using predecrement

### 5.5.1 LAD Bus Protocol

Both data and instructions are transferred over the bidirectional LAD bus in coprocessor mode. A unique combination of signal inputs distinguishes an instruction from data. SF, $\overline{\mathrm{ALTCH}}, \overline{\mathrm{CAS}}, \overline{\mathrm{RAS}}$, and $\overline{\mathrm{WE}}$ are used to distinguish coprocessor functions from other operations on the LAD bus.

The TMS34020 first fetches a coprocessor instruction from either internal cache or from local memory on the LAD bus. A coprocessor command is then issued to the TMS34082 from the TMS34020 by way of the following protocol:

A valid coprocessor ID (CID2-0) on LAD31-29
LAD3-0 $=0000_{2}$
$\overline{\text { RAS }}$ high
SF high during the falling edge of $\overline{\text { ALTCH }}$
Note: When using one TMS34082 in a system, the assembler/compiler default for $\mathrm{CID} 2-0=000$.
The command is then decoded and executed by the appropriate TMS34082. If a command-only instruction is issued, the TMS34082 begins execution at the rising edge of LCLK1 after $\overline{\text { ALTCH }}$ falls. A timing diagram for command-only instructions is shown in Figure 5-3.

Figure 5-3. Transferring a Command from the TMS34020 to the TMS34082


If operands are required from DRAM/VRAM, the TMS34020 sets up the appropriate DRAMNRAM address and timing. The data is then transferred directly between the TMS34082 and DRAM/VRAM.

All transfers to/from the TMS34082 are 32 bits wide. Therefore, the TMS34082 uses neither the TMS34020 SIZE16 signal nor all four individual byte enables (CAS3-0). Also, the even 32 TMS34020 assemblerdirective should be placed before all blocks of DRAM/VRAM memory that are used to store data or external code to be sent to the TMS34082. If the 32-bit words are not aligned on long word boundaries, the data is not sent to the TMS34082 correctly.

Instructions that pass data and commands to the TMS34082 begin execution on the rising edge of LCLK1 after $\overline{\mathrm{CAS}}$ rises after the last data transfer. Timing diagrams for instructions that transfer data and commands are given in Figures 5-4 through 5-7.

Figure 5-4. Transterring TMS34020 Registers to the TMS34082


Figure 5-5. Transferring from the TMS34082 to a TMS34020 Register


Figure 5-6. Transferring Memory to the TMS34082


When the TMS34082 is transferring data to memory, the TMS34020 outputs the memory address on the LAD bus. An extra clock cycle, called a spacer, is then inserted before the TMS34082 outputs data. The spacer is added to allow time for the TMS34020 to stop driving the LAD bus and the TMS34082 to set up valid data on LAD.

Figure 5-7. Transferring from the TMS34082 to Memory


### 5.5.2 Enabling the LAD Bus Drivers

The LAD bus drivers are enabled only when $\overline{\text { LOE }}$ is low, the correct TMS34082 coprocessor ID has been selected, and during the proper time slot within the execution cycle. Just bringing LOE low does not cause the LAD bus drivers to turn on. For most applications using a single TMS3020, TMS34082, and DRAMNRAM, $\overline{\text { LOE }}$ may be tied low.

In a system with multiple TMS34082 coprocessors, only one coprocessor can drive the LAD bus at a time. The TMS34082 contains internal logic that only allows it to drive the LAD bus when its coprocessor ID is contained in the move instruction. A TMS34082 write instruction with the broadcast ID is ignored.

### 5.5.3 Bus Faults

The TMS34082 BUSFLT input signal also ties directly to the TMS34020 BUSFLT pin. The TMS34082 supports bus retries and bus fault conditions in conjunction with the TMS34020. The bus cycle conditions are defined in Table 5-2.

## Table 5-2. Bus Cycle Completion Conditions

| Completion Condition | BUSFLT | LRDY |
| :--- | :---: | :---: |
| Wait | 0 | 0 |
| Successful transfer | 0 | 1 |
| Retry | 1 | 0 |
| Bus fault | 1 | 1 |

In the event of a systems fault involving the TMS34082, the abort command allows the TMS34020 to regain control. The abort terminates all coprocessor activity, restoring the TMS34082 to a known state so that it is available for further commands from the TMS34020. Chapter 7 covers the abort command in greater detail.

### 5.6 Polling the Coprocessor

When the TMS34020 issues an instruction to the TMS34082, CORDY (coprocessor ready) is high. It remains high even while the TMS34082 is busy executing the instruction. However, if another instruction is sent by the TMS34020 before the previous instruction has completed, CORDY will go low immediately, indicating that the TMS34020 must wait. When the TMS34082 is ready to accept the new instruction, CORDY returns high to signal the TMS34020 that the coprocessor is ready to accept a command. Because CORDY is usually ORed with other terms to form LRDY, CORDY going low also sends LRDY low, halting the TMS34020.

The instruction will still be valid on the LAD bus when CORDY (and LRDY) toggle, and the TMS34082 will latch the instruction. However, for longer TMS34082 operations, such as lengthy subroutines stored in SRAM, the TMS34020 may have to wait for a long period of time before the TMS34082 is ready. This ties up the TMS34020 and keeps it from executing other code. Instead, the TMS34020 can check the coprocessor's operating condition before issuing an instruction by way of the check status command. The TMS34020 assembler pseudo-op for this command is CHECK.

In response to the check status command, the TMS34082 outputs a status code to signal if it is busy or not. The TMS34082 returns a value of all 1 s if busy or all Os if idle, as shown in Table 5-3. This instruction is described further in Chapter 7.

## Table 5-3. Bit Definitions for TMS34020 Status Check Command

| Description | LAD Output |
| :---: | :---: |
| Coprocessor not busy | 0000 0000h |
| Coprocessor busy | FFFF FFFFh |

The TMS34020 does not have to enter an extended wait state to obtain access to the selected coprocessor, but may continue with another task not requiring the TMS34082. This allows the two devices to execute instructions in parallel. See Example 5-1 for an example of code using the check status command.

## Example 5-1. Using the Status Check Command



### 5.7 Interrupt Handling

The TMS34082 has two interrupt input sources in coprocessor mode:
An exception detect (ED) interrupt used to signal the TMS34082 that a status exception occurred

A software interrupt generated by an external instruction input on the MSD bus

Each exception has its own interrupt enable flag in the status register. If external SRAM memory is not used, the software interrupt should be disabled. On reset, the exception detect (ED) interrupt is enabled and the software interrupt is disabled.

Because hardware interrupts are not allowed in coprocessor mode, the hardware interrupt should be disabled. This is the default setting of the hardware interrupt enable flag in the status register. Also, INTR should be tied high.

### 5.7.1 Exception Detect Interrupts

If the exception detect interrupt is enabled, $\overline{\text { COINT }}$ goes low when the ED flag in the status register is 1 . The ED flag goes high when a status exception occurs (see subsection 4.5.3.1) COINT signals the exception to the TMS34020. This exception does not cause the TMS34082 to branch to the interrupt vector register address. The TMS34082 aborts the current instruction and goes to an idle state.

The $\overline{\text { COINT }}$ signal may be connected to either the TMS34020 LINT1 or LINT2 input. You can also combine COINT with other interrupt requests in the system to form LINT1 or LINT2. If its interrupts are enabled, the TMS34020 will branch to an interrupt vector to service the TMS34082 request.
$\overline{\text { COINT }}$ and ED are reset by reading the STATUS register. You should do this as part of your interrupt service routine.

In the interrupt service routine, saving the state of the TMS34082 may be desired. This is best accomplished by executing a block move of the TMS34082 registers to DRAM/VRAM memory. The TMS34020 assembly language instructions listed in Example 5-2 can be used for the desired precision. These routines do not save or restore the C and CT register. Restoring the TMS34082 machine state consists of moving the register values from memory back to the TMS34082. Restoring the status register sets the ED flag high. However, writing a 1 to ED will not cause an interrupt.

## Example 5-2. Saving and Restoring the TMS34082 Machine State

```
MOVE RA0, *A1+, 30 ; integer move, use TMS34020 register Al
    ; as the memory pointer
MOVF RA0, *A1+, 30 ; single-precision move, use TMS34020
    ; register Al as memory pointer
MOVD RA0, *AI+, 15 ; double-precision move, use TMS34020
; register Al as memory pointer,
MOVD RB1, *A1+, 15 ; remainder of double-precision move
    ; restoring TMS34082 machine state
MOVE *AI+, RA0, 30 ; integer move, use TMS34020 register A1
    ; as the memory pointer
MOVF *AI+, RA0, 30 ; single-precision move, use TMS34020
    ; register Al as memory pointer
MOVD *AI+, RA0, 15 ; double-precision move, use TMS34020
    ; register Al as memory pointer,
MOVD *A1+, RB1, 15 ; remainder of double-precision move
```


### 5.7.2 Software Interrupts

If software interrupts are enabled, an interrupt may be generated by an external instruction fetched from the MSD port. The interrupt sets the interrupt grant output (INTG) low, saves the current program counter in the interrupt return register (IRAREG) and branches to the address in the interrupt vector register. Interrupts are also disabled.

Your service routine should restore software interrupts at the end. The final instruction should be a return from interrupt that will branch to the value in the interrupt return register.

### 5.7.3 Interrupting the TMS34020

For some applications using long external subroutines, it is desirable to interrupt the TMS34082 to signal that the subroutine is finished. This relieves the TMS34020 from having to check the TMS34082 to see if it is ready for the next instruction.

This may be accomplished by intentionally executing an instruction (in external code) that sets the ED flag high. This causes COINT to go low, signaling an interrupt to the TMS34020. Any instruction that generates an exception flag, such as invalid operation, will work.

Possible instructions include:
Divide using 0 as the dividend
Use NaN as the operand for any instruction
Unwrap the floating-point value one (unwrap ONE.f)
In order to distinguish an intentional ED interrupt from one generated by a real exception, a register or memory location should first be loaded with a status code. Then the illegal operation is performed. The TMS34020 interrupt service routine should read the register or memory location to determine if the interrupt was intentional. The routine should also reset the register or memory location.

Before causing the ED interrupt, the external routine should make sure the internal stack (registers SUBADDR0 and SUBADDR1) is empty. This can be accomplished by clearing the stack pointers (bit 31) in both registers. You may wish to save the contents of these registers in external memory before clearing the stack pointers.

### 5.8 TMS34020/TMS34082 Code Example

Using combinations of the MMPY0F, MMPY1F, MMPY2F, and MMPY3F single-precision floating-point multiply instructions allows for several matrix multiply operations: $1 \times 3$ by $3 \times 3,1 \times 4$ by $4 \times 4,3 \times 3$ by $3 \times 3$, and $4 \times 4$ by $4 \times 4$. The following example shows the use of MMPY0F, MMPY1F and MMPY2F in performing a single-precision floating-point $3 \times 3$ by $3 \times 3$ matrix multiply, giving a $3 \times 3$ matrix result.

## Example 5-3. Multiplying Two $3 \times 3$ Matrices

$$
\left[\begin{array}{lll}
\mathrm{A}_{00} & A_{01} & A_{02} \\
\mathrm{~A}_{10} & A_{11} & A_{12} \\
\mathrm{~A}_{20} & A_{21} & A_{22}
\end{array}\right] \times\left[\begin{array}{lll}
\mathrm{B}_{00} & \mathrm{~B}_{01} & \mathrm{~B}_{02} \\
\mathrm{~B}_{10} & \mathrm{~B}_{11} & \mathrm{~B}_{12} \\
\mathrm{~B}_{20} & \mathrm{~B}_{21} & \mathrm{~B}_{22}
\end{array}\right]=\left[\begin{array}{lll}
\mathrm{C}_{00} & \mathrm{C}_{01} & \mathrm{C}_{02} \\
\mathrm{C}_{10} & \mathrm{C}_{11} & \mathrm{C}_{12} \\
\mathrm{C}_{20} & \mathrm{C}_{21} & \mathrm{C}_{22}
\end{array}\right]
$$

Algorithm:

$$
\begin{aligned}
& C_{00}=A_{00} \times B_{00}+A_{01} \times B_{10}+A_{02} \times B_{20} \\
& C_{01}=A_{00} \times B_{01}+A_{01} \times B_{11}+A_{02} \times B_{21} \\
& C_{02}=A_{00} \times B_{02}+A_{01} \times B_{12}+A_{02} \times B_{22} \\
& C_{10}=A_{10} \times B_{00}+A_{11} \times B_{10}+A_{12} \times B_{20} \\
& C_{11}=A_{10} \times B_{01}+A_{11} \times B_{11}+A_{12} \times B_{21} \\
& C_{12}=A_{10} \times B_{02}+A_{11} \times B_{12}+A_{12} \times B_{22} \\
& C_{20}=A_{20} \times B_{00}+A_{21} \times B_{10}+A_{22} \times B_{20} \\
& C_{21}=A_{20} \times B_{01}+A_{21} \times B_{11}+A_{22} \times B_{21} \\
& C_{22}=A_{20} \times B_{02}+A_{21} \times B_{12}+A_{22} \times B_{22}
\end{aligned}
$$

Matrix values:

| MATRIX A $=$ | 10 | 0 | 11 |
| :--- | ---: | ---: | ---: |
| -3 | -1 | -5 |  |
| 13 | 1 | 6 |  |
| MATRIX B $=$ | 3 | 1 | 5 |
| 2 | 1 | 3 |  |
| 4 | -1 | 1 |  |
| MATRIX C $=$ | 76 | -1 | 61 |
|  | -32 | 1 | -23 |
| 65 | 8 | 74 |  |

## Example 5-4. Instructions for a $3 \times 3$ by $3 \times 3$ Matrix Multiply

```
; Code for multiplying one 3 < 3 by another 3 < 3 matrix
    .IEEEFL ; Force IEEE floating-point representations
BEGIN;
; Move matrix B to the TMS34082
    MOVI MATRIXB,A0
    MOVF *A0+,RA0, 10
    MOVF *A0+,RB0,6
Point AO to first row of matrix A
    MOVI MATRIXA,A0
; Point Al to first row of matrix C
    MOVI MATRIXC, A1
    MOVI 3, A2
                                    ; three rows
ROWLOOP;
; Loop through all three rows
    MOVF *A0+,RB9,1 ; Movefirst A value on row to the TMS34082
    MMPYOF ; Multiply down the B column
    MOVF *A0+, RB9, 1 ; Move second A value on row to the TMS34082
    MMPY1F ; Multiply and accumulate down the second B column
    MOVF *A0+, RB9, 1 ; Move third A value on row to the TMS34082
    MMPY2F ; Multiply and accumulate down the third B column
; Move the current C row into TMS34020 memory
    MOVF RB6, *A1+, 3 ; Get the three row values
    DEC A2 ; Done four rows yet?
    JRNZ ROWLOOP ; If no, then compute the next row
HERE; JRUC HERE ; Done, endless loop
;Matrix storage
    .SECT "DATA"
MATRIXA
    .FLOAT 10, 0, 11
    .FLOAT -3, -1, -5
    .FLOAT 13, 1, 16
MATRIXB
    .FLOAT 3, 1, 5, 0 ; The zeros on the end of these rows are
    .FLOAT 2, 1, 3, 0 ; not necessary, but allow a memory-to-
    .FLOAT 4, -1, 1, 0 ; register transfer for the matrix.
    .FLOAT 0, 0, 0, 0 ; This row of zeros is necessary
MATRIXC
    .FLOAT 0, 0, 0
    . FLOAT 0, 0, 0
    .FLOAT 0, 0, 0
.SECT "TEXT"
```


### 5.9 TMS34020/TMS34082 Timing Examples

The following timing diagrams illustrate the timing relationships between the TMS34020 and TMS34082.

Figure 5-8 shows the multiplication of two double-precision numbers in TMS34020 registers and assumes that the TMS34020 instructions are contained in cache. The assembler source code is shown below.

## Example 5-5. Assembler Source for Double-Precision Multiply

```
MOVD A0, A1, RA0
MOVD A2, A3, RB0
MPYD RA0, RB0, RA4
MOVD RA4, A4, A5
```

Figure 5-9 shows an add operation for two single-precision numbers from DRAM assuming that the TMS34020 instructions are contained in cache. The assembler source code is shown below.

## Example 5-6. Assembler Source for Single-Precision Add

```
ADDF *A0+, RA0, RB0, RA2
MOVF RA2, *A1+
```

Figure $5-10$ shows the same add operation (adding two single-precision numbers from DRAM). However, this time the TMS34020 instructions are not in cache.


Note: Assume instructions are in TMS34020 cache, TMS34082 pipeline registers turned on (PIPES1=0) and output registers turned off (PIPES2=1), TMS34082 load order is MSH, then LSH.

Figure 5-8. Multiply 2 Double-Precision Numbers in TMS34020 Registers and Store Result Back to TMS34020 Register (Mode 0)


Note: Assume instructions are in TMS34020 cache, TMS34082 pipeline registers turned on (PIPES1 = 0) and output registers turned off (PIPES2-1), DRAM page mode accesses.

Figure 5-9. Add 2 Single-Precision Numbers from DRAM and Store Result Back to DRAM (Mode 2)


Note: Assume instructions are not in TMS34020 cache, TMS34082 pipeline registers turned on (PIPES1 $=0$ ) and output registers turned off (PIPES2=1) DRAM page mode accesses.

Figure 5-10. Add 2 Sinqle-Precision Numbers from DRAM and Store Result Back to DRAM (Mode 2), Instructions Not in TMS34020 Cache

### 5.10 MSD Bus Operation in Coprocessor Mode

Use of the MSD bus in coprocessor mode is optional. External memory on MSD31-0 can be used to store data, user-programmed subroutines, or both. External instructions for user-defined subroutines are covered in Chapter 8. Control signals for MSD and MSA buses, discussed in subsection 4.3.2, operate the same in host-independent and coprocessor modes. Different combinations of control signals distinguish between data memory and code memory.

Data or program code can be downloaded to external memory from the LAD bus. The data (or code) can be stored in the TMS34020's DRAM/VRAM memory and loaded by a LAD-to-MSD bus transfer.

### 5.10.1 Connecting External Memory

External coprocessor code space is added to the TMS34082 MSD port by adding external SRAM as shown in Figure 5-11. No external glue logic is necessary.

Figure 5-11. TMS34020/TMS34082/SRAM with Minimal SRAM Code Space (MEMCFG $=1$ )


The maximum amount of external memory directly addressable by the TMS34082 is 64 K words of program code and 64 K words of data as shown in Figure $5-13$. This comes out to 512 K bytes total. When additional memory is necessary, segmentation or paging techniques can be utilized.

Figure 5-12. TMS34020/TMS34082/SRAM with Maximum SRAM Code/Data Space (MEMCFG = L)


CC is a condition code input and may be used as an external input for branch conditions in external code. It is not used in internal instructions.

### 5.10.2 TMS34082 External SRAM Timing Analysis

When connecting external SRAM to the TMS34082 for code space and/or data space on the MSD port, the following calculations can be used in determining the total SRAM access time. These times must also include any chip select decode delays. The general formula for computing SRAM access times is:
$(1 / 2) \times t_{c(L C 1)}-t_{s u(M S D)}-t_{p(L C 1-M S A V)}=$ SRAM access speed
A description of these parameters is provided in Table 5-4.
Table 5-4. Parameters Used for Calculating SRAM Speed

| Parameter | Description |
| :---: | :---: |
| $\mathrm{t}_{\mathrm{c}(\mathrm{LC} 1)}$ | Local clock LCLK1 period: $1 / \mathrm{f}_{\text {clock }}$ |
| $\mathrm{t}_{\mathrm{Su}}$ (MSD) | Setup time: MSD data before LCLK1 high |
| $\mathrm{t}_{\mathrm{p}}($ LC1-MSAV $)$ | Propagation delay: LCLK1 to MSA valid |

The time delay incurred by inserting decode logic between the TMS34082 and external SRAM memory would be subtracted from the left side of the equation. For example, if an SN74AS32 (with a propagation delay of 6 ns maximum) is used in generating the SRAM chip enable (CE), then the SRAM access time requirements would subsequently be decreased by 6 ns.

### 5.10.3 Using External Code

Adding external memory to the MSD port allows you to write customized subroutines for your applications. External code is executed by performing a jump to subroutine command issued by the TMS34020.

The memory space is divided into a jump table and general-purpose memory for code and data, as shown in Figure 5-13. There are 32 entries into the subroutine jump table. The jump entry points start at address 0 and increment by 2. This allows two instructions (in the jump table) per subroutine. Using this memory organization, the jump table is relatively small, leaving the remaining memory to be partitioned as best suits your application.

Figure 5-13. Memory Map for External Memory


Figure 5-14 illustrates how an external subroutine would execute. The final instruction in the subroutine should be a return from subroutine (RTS). This puts the TMS34082 in an idle mode, waiting for the next instruction from the TMS34020.

Note: Before executing the final return from subroutine, the stack (SUBADDR1-0) must be empty. You may wish to save the contents of these registers in external memory. Then clear the stack pointers (bit 31) in both registers.

Figure 5-14. Example Subroutine Using the Jump Table


### 5.11 TMS34020/TMS34082/SRAM Code Example

This example describes a $3 \times 3$ by $3 \times 3$ matrix multiply routine using a subroutine stored in TMS34082 external SRAM. Data values for both matrices are stored in DRAMNRAM. Therefore, they must be fetched from memory and transferred to RA8-0 and RB8-0 (using the memory address pointers contained in TMS34020 registers B1 and B2, respectively).

Description of operation:

$$
\left[\begin{array}{lll}
\mathrm{A}_{00} & \mathrm{~A}_{01} & \mathrm{~A}_{02} \\
\mathrm{~A}_{10} & \mathrm{~A}_{11} & \mathrm{~A}_{12} \\
\mathrm{~A}_{20} & \mathrm{~A}_{21} & \mathrm{~A}_{22}
\end{array}\right] \times\left[\begin{array}{lll}
\mathrm{B}_{00} & \mathrm{~B}_{01} & \mathrm{~B}_{02} \\
\mathrm{~B}_{10} & \mathrm{~B}_{11} & \mathrm{~B}_{12} \\
\mathrm{~B}_{20} & \mathrm{~B}_{21} & \mathrm{~B}_{22}
\end{array}\right]=\left[\begin{array}{lll}
\mathrm{C}_{00} & \mathrm{C}_{01} & \mathrm{C}_{02} \\
\mathrm{C}_{10} & \mathrm{C}_{11} & \mathrm{C}_{12} \\
\mathrm{C}_{20} & \mathrm{C}_{21} & \mathrm{C}_{22}
\end{array}\right]
$$

Algorithm:

$$
\begin{aligned}
& C_{00}=A_{00} \times B_{00}+A_{01} \times B_{10}+A_{02} \times B_{20} \\
& C_{01}=A_{00} \times B_{01}+A_{01} \times B_{11}+A_{02} \times B_{21} \\
& C_{02}=A_{00} \times B_{02}+A_{01} \times B_{12}+A_{02} \times B_{22} \\
& C_{10}=A_{10} \times B_{00}+A_{11} \times B_{10}+A_{12} \times B_{20} \\
& C_{11}=A_{10} \times B_{01}+A_{11} \times B_{11}+A_{12} \times B_{21} \\
& C_{12}=A_{10} \times B_{02}+A_{11} \times B_{12}+A_{12} \times B_{22} \\
& C_{20}=A_{20} \times B_{00}+A_{21} \times B_{10}+A_{22} \times B_{20} \\
& C_{21}=A_{20} \times B_{01}+A_{21} \times B_{11}+A_{22} \times B_{21} \\
& C_{22}=A_{20} \times B_{02}+A_{21} \times B_{12}+A_{22} \times B_{22}
\end{aligned}
$$

The register file contents before the routine are:

| RA0 $=A_{00}$ | RB0 $=B_{00}$ |
| :--- | :--- |
| RA1 $=A_{01}$ | RB1 $=B_{01}$ |
| RA2 $=A_{02}$ | RB2 $=B_{02}$ |
| RA3 $=A_{10}$ | RB3 $=B_{10}$ |
| RA4 $=A_{11}$ | RB4 $=B_{11}$ |
| RA5 $=A_{12}$ | RB5 $=B_{12}$ |
| RA6 $=A_{20}$ | $R B 6=B_{20}$ |
| RA7 $=A_{21}$ | $R B 7=B_{21}$ |
| RA8 $=A_{22}$ | RB8 $=B_{22}$ |

The register file contents after the routine are:

| RA0 $=C_{00}$ | RB0 $=\mathrm{B}_{00}$ |
| :--- | :--- |
| RA1 $=\mathrm{C}_{01}$ | RB1 $=\mathrm{B}_{01}$ |
| RA2 $=\mathrm{C}_{02}$ | RB2 $=\mathrm{B}_{02}$ |
| RA3 $=\mathrm{C}_{10}$ | RB3 $=\mathrm{B}_{10}$ |
| RA4 $=\mathrm{C}_{11}$ | RB4 $=\mathrm{B}_{11}$ |
| RA5 $=\mathrm{C}_{12}$ | RB5 $=\mathrm{B}_{12}$ |
| RA6 $=\mathrm{C}_{20}$ | RB6 $=\mathrm{B}_{20}$ |
| RA7 $\mathrm{C}_{21}$ | RB7 $=\mathrm{B}_{21}$ |
| RA8 $=\mathrm{C}_{22}$ | RB8 $=\mathrm{B}_{22}$ |
| CT $=$ unknown |  |

Examples 5-7 and 5-8 are the assembly language source listings for both the TMS34020 and the TMS34082. The TMS34082 listing is for the TMS34082 external matrix multiply instructions contained in SRAM. Assume that the matrix multiply routine begins at address 3 Eh in SRAM and that the SRAM area for constants is from address FEh through FFh. The timing diagram for this example is shown in Figure 5-15.

Example 5-7. TMS34020 Assembler Listing for $3 \times 3$ by $3 \times 3$ Matrix Multiply

```
MOVEF *B1+, RA0, 9 ; move first matrix to coprocessor register file A,
    ; starting at memory address contained in 34030
    ; register B1
MOVEF *B2+, RB0, 9 ; move second matrix to coprocessor register file B,
    ; starting at register file B, memory address
    ; contained in 34020 register B2
CEXEC 0, 0000FFF ; coprocessor jump to external routine #31 decimal, at
    ; SRAM address 3Eh
```


## Example 5-8. TMS34082 Assembler Listing for $3 \times 3$ by $3 \times 3$ Matrix Multiply

```
segment code,memt.ype=0
    cjmp A, MAT
        ; jump to matrix multiply routine
MAT: ld CONFIG.i, all_pipes, 1
        ; load CONFIG register to turn on output registers (PIPES2=0)
    mult RA0.f, RBO.f, CT
        ; A00 * B00
    mult RA0.f, RB1.f, C
        ; A00 * B01
    mult.pass RAl.f, MULFB, RB3.f, CT, MULT
        ; A01 * B10, (A00 * B00) + 0
    mult.pass RA1.f, MULFB, RB4.f, CT, MULT
        ; A01 * B11, (A00 * B01) + 0
    mult.add RA2.f, MULFB, RB6.f, ALUFB, CT, ALU
        ; A02 * B20, (A01 * B10) + (A00 * B B00)
    mult.add RA2.f, MULFB, RB7.f, ALUFB, CT, ALU
        ; A02 * B21, (A01 * B11) + (A00 * B01)
    mult.add RA0.f, MULFB, RB2.f, ALUFB, RA0, ALU
        ; A00 * B02, (A02 * B20) + ((A01 * B10) + (A00 * B B00)) = C C00
    mult.add RA3.f, MULFB, RB0.f, ALUFB, RA1, ALU
```



```
    mult.pass RA1.f, MULFB, RB5.f, CT, MULT
        ; A01 * B12, (A00 * B02) + 0
    mult.pass RA4.f, MULFB, RB3.f, CT, MULT
        ; A11 * B10, (A10 * B00) + 0
    mult.add RA2.f, MULFB, RB8.f, ALUFB, CT, ALU
        ; A02 * B B2, (A01 * B B12) + (A00 * B B % )
    mult.add RA5.f, MULFB, RB6.f, ALUFB, CT, ALU
        ; A}12 * B20, (A11 * B10) + (A10 * B B00)
    mult.add RA3.f, MULFB, RB1.f, ALUFB, RA2, ALU
        ; A10 * B01, (A12 * B22) + ((A01 * B B12) + (A00 * B02 ) = C C02
    mult.add RA3.f, MULFB, RB2.f, ALUFB, RA3, ALU
        ; A}\mp@subsup{A}{10}{* * B
    mult.pass RA4.f, MULFB, RB4.f, CT, MULT
        ; A11 * B11, (A10 * B01) + 0
    mult.pass RA4.f, MULFB, RB5.f, CT, MULT
        ; A11 * B
    mult.add RA5.f, MULFB, RB7.f, ALUFB, CT, ALU
```



```
    mult.add RA5.f, MULFB, RB8.f, ALUFB, CT, ALU
        ; }\mp@subsup{A}{12}{}*\mp@subsup{B}{22}{},(\mp@subsup{A}{11}{}*\mp@subsup{B}{12}{})+(\mp@subsup{A}{10}{}* * B O2 )
    mult.add RA6.f, MULFB, RB0.f, ALUFB, RA4, ALU
```



```
    mult.add RA6.f, MULFB, RB1.f, ALUFB, RA5, ALU
        ; A}\mp@subsup{A}{20}{* * B
    mult.pass RA7.f, MULFB, RB3.f, CT, MULT
        ; A21 * B10, (A}20 * B B00) + 0 
    mult.pass RA7.f, MULFB, RB4.f, CT, MULT
        ; A21 * B11, (A20 * B}\mp@subsup{B}{01}{})+
    mult.add RA8.f, MULFB, RB6.f, ALUFB, CT, ALU
        ; A}\mp@subsup{A}{22}{* * B20, (A21 * B}\mp@subsup{B}{10}{0})+(\mp@subsup{A}{20}{}* * B00
```


## Example 5-8. TMS34082 Assembler Listing for $3 \times 3$ by $3 \times 3$ Matrix Multiply (Continued)

```
mult.add RA8.f, MULFB, RB7.f, ALUFB, CT, ALU
    ; A22 * B 
mult.add RA6.f, MULFB, RB2.f, ALUFB, RA6, ALU
    ; A20 * B02, (A22 * B20) + ((A21 * B10) + (A20 * B B00)) = C C20
mult.add RA7.f, MULFB, RB5.f, ALUFB, RA7, ALU
    ; A}21*\mp@subsup{B}{12}{\prime},(\mp@subsup{A}{22}{}*\mp@subsup{B}{21}{})+((\mp@subsup{A}{21}{}*\mp@subsup{B}{11}{})+(\mp@subsup{A}{20}{** B}\mp@subsup{B}{01}{}))=C=C2
mult.pass RA8.f, MULFB, RB8.f, CT, MULT
    ; A}\mp@subsup{A}{22 * B22, (A20 * B }{02
pass MULFB.f, RA8
    ; (A}\mp@subsup{2}{21}{* * B12) + 0
add MULFB.f, ALUFB.f, CT
    ; (A}\mp@subsup{A}{22}{* * B22) + (A20 * B B22)
nop
    ; no operation
add RA8.f, ALUFB, RA8.f
    ; (A21 * B12) + ((A22 * B22) + (A20 * B20)) = C C22
nop
    ; no operation
nop
    ; no operation
ld CONFIG.i, pipeline_only, 1
    ; load configuration register to turn off output registers (PIPES2=1)
rts
    ; return from subroutine, go to internal TMS34082 wait state
.segment data,memtype=1
    all_pipes: .data 0xFFC08
        ; CONFIG register setting for all pipeline registers enabled
    pipeline_only: .data 0xFFC28
        ; CONFIG register setting to turn off output registers only
```

Figure 5-15. $3 \times 3$ Matrix Multiply Using External SRAM for Data Space and Code Space (Mode 3)


MSD Bus ${ }^{T}$

DSICS
7
$\overline{M C E}$
7
$\overline{M O E}$
7
$\overline{M W R}$
$T$

Figure 5-15. $3 \times 3$ Matrix Multiply Using External SRAM for Data Space and Code Space (Mode 3) (Continued)


Figure 5-15. $3 \times 3$ Matrix Multiply Using External SRAM for Data Space and Code Space (Mode 3) (Continued)

$\overline{M W R}$

Figure 5-15 $3 \times 3$ Matrix Multiply Using External SRAM for Data Space and Code Space (Mode 3) (Continued)


### 5.12 Multiple TMS34082s

More than one coprocessor may be connected to the TMS34020 by setting the appropriate coprocessor ID field (CID2-0). Up to seven TMS34082s may be used with each TMS34020. See Figure 5-16. Assuming that each TMS34082 CORDY pin has a separate pull-up resistor, the TMS34020 can determine which coprocessors are present in the system by writing to and reading from TMS34082 register locations.

Figure 5-15. TMS34020 with Multiple TMS34082/SRAM Blocks (MEMCFG = L)


When CID2-0 $=100_{2}$, the TMS34020 broadcasts the instruction to all coprocessors. Broadcast reads by the TMS34082s are not permitted and are ignored.

Using the TMS34020 assembler directive called .coproc, the coprocessor ID number (between 0 and 7) may be set for generic coprocessor instructions. This directive maintains the coprocessor ID until another directive is received. An example follows where the default coprocessor ID is set to 1 and then to 0.

## Example 5-9. Assembler Code for Multiple TMS34082s

| coproc 1 | ; set the default coprocessor ID to 001 for the following |
| :--- | :--- |
|  | ; instructions |
| MPYF | RA2,RB0,RA8 |
| ADDF | RA8,RB2,RA5 |
| SQRTF | RA5,RA5 |
| COproc 0 | ; set the default coprocessor ID to 000 for the following |
|  | instructions |
| SUB | RAO, C,RA0 |
| SUB | RA1,C,RA1 |

Thus, while coprocessor 1 is still calculating its floating-point square root, coprocessor 0 is performing integer subtracts.For additional details on the assembler directives, refer to the TMS340 Family Code Generation Tools User's Guide.

## Chapter 6

## Host-Independent Mode

Operation in the host-independent mode assumes that the MSTR input signal is set high. The TMS34082 has several hardware control signals, as well as programmable features, which support system functions such as initialization, data transfer, or interrupts in host-independent mode. Details of initialization, LAD bus (LAD31-0) and MSD bus (MSD31-0) interface control, and interrupt handling are provided in this chapter.

### 6.1 Initialization

The following sections detail pin connections and initialization in host-independent mode.

### 6.1.1 Pin Connections

When operating in host-independent mode, you should connect TMS34082 pins as shown in Table 6-1.

Table 6-1. Pin Connections

| Signal Name | Description | Logic Level |
| :---: | :--- | :---: |
| SF | Special function input; not used in host-independent mode | tie low |
| RAS | Row Address Strobe; not used in host-independent mode | tie low |
| CID2-0 | Coprocessor ID; not used in host-independent mode | tie low |
| LCLK1-2 | Local clocks for coprocessor mode | tie low |
| MSTR | Host-independent/coprocessor mode select | tie high |
| EC1-0 | Emulator mode control | tie high |
| TCK | Test Clock | tie low |

### 6.1.2 Bootstrap Loader

To simplify initialization of external program memory, the TMS34082 provides a bootstrap loader. Once invoked, the loader causes the TMS34082 to read 65 words from the LAD bus and write 64 words to the external program memory on the MSD bus. The first word read is used to initialize the configuration register. The remaining words are instructions written to the code space of external memory, starting at address 0 .

To invoke the loader:

1) Set $\overline{R E S E T}$ low
2) Set $\overline{I N T R}$ low
3) After the minimum pulse duration, set $\overline{\text { RESET }}$ and $\overline{\mathbb{N T R}}$ high again

As shown in Figure 6-1, $\overline{\text { RESET must remain low while } \overline{\text { INTR }} \text { is pulled low. }}$ During the initialization, the TMS34082 is reset. Internal states and status are cleared, but data registers are not affected; the control registers return to their default values.

Figure 6-1. Bootstrap Loader


Loader operation begins on the second clock cycle after $\overline{\text { RESET }}$ and $\overline{\text { INTR }}$ return high. The first word is read into the configuration register on the rising edge of the third clock. Each successive rising edge loads an instruction word. The instruction word is output on the MSD bus one clock cycle after it is input on the LAD bus.

Once the loader is activated, an external interrupt (signaled by $\overline{N T R}$ low) is not granted until the load sequence is finished. However, RESET going low terminates the loader. When the load sequence is finished, program execution begins at external address 0 .

### 6.2 LAD Bus

In host-independent mode, the LAD bus is used to transfer data or instructions to and from the TMS34082 or the MSD bus. Instruction words may be transferred from the LAD bus to the MSD bus, but instructions cannot be input to the TMS34082 from the LAD bus. Details of LAD bus control and data input are given in the following sections.

### 6.2.1 Control Signals

Data transfers on the LAD bus are controlled primarily by the following signals:
$\overline{\mathrm{ALTCH}}$, the address write strobe
$\overline{\mathrm{CAS}}$, the memory read strobe
$\overline{W E}$, the memory write enable
The TMS34082 outputs an address during a cycle when $\overline{\text { ALTCH }}$ is low. The address may be latched externally on the rising edge of $\overline{\text { ALTCH. Because all }}$ 32 bits of the LAD bus can be used for an address, the LAD bus accesses up to 4G 32-bit words of memory.

When $\overline{W E}$ is low, data is output by the TMS34082 on the LAD bus. If multiple 32-bit words are output, $\overline{W E}$ toggles high at each rising clock edge, then returns low.

When $\overline{\text { CAS }}$ is low, the LAD bus is an input, reading data into the TMS34082. When multiple words are input, $\overline{\mathrm{CAS}}$ toggles at each rising clock edge.

If a bidirectional FIFO is used instead of memory, $\overline{\mathrm{CAS}}$ can be directly connected to the read clock and $\overline{W E}$ to the write clock. The CC input can be used to signal the TMS34082 when data is ready for input from the FIFO stack. (See Figures 6-2 and 6-3 for possible configurations.)

Figure 6-2. Using FIFOs on the LAD Bus


If $\operatorname{LADCFG}$ is set high in the configuration register, $\overline{\text { COINT defines bus cycle }}$ boundaries. If an indirect move to or from the LAD bus is coded with the $C$ bit (bit 1) set high, $\overline{\text { COINT goes low at the beginning of the move and remains }}$ low until the move is complete. COINT can be used to select a device on the LAD bus, as shown in Figure 6-2. In this case, $\overline{\text { COINT }}$ is the output enable for a FIFO.

Figure 6-3. Using $\overline{C O I N T}$ as a Device Select (LADCFG=H)


The TMS34082 only drives the LAD bus during instructions that output an address or data. The LAD bus drivers are disabled at any other time.

LOE, the LAD bus output enable, enables and disables the LAD bus. The LAD bus is placed in a high-impedance state when $\overline{\mathrm{LOE}}$ is high. However, bringing $\overline{\text { LOE }}$ low does not cause the LAD bus drivers to turn on. The instruction being executed must also enable the drivers.

If no other processors share the LAD bus, $\overline{L O E}$ may be tied low. Other wise, $\overline{L O E}$ may be used to prevent bus conflicts between the TMS34082 and other system masters.

LADCFG controls the signals affected by $\overline{L O E}$. If LADCFG is high, setting $\overline{L O E}$ high also disables $\overline{\mathrm{CAS}}$ and $\overline{\mathrm{WE}}$. When LADCFG is low, $\overline{\mathrm{COINT}}$ is a user-programmable output. $\overline{\mathrm{LOE}}$ does not affect $\overline{\mathrm{CAS}}$ or $\overline{\mathrm{WE}}$.

### 6.2.2 Immediate Data Transfers

Data input on the LAD bus can be written to data registers, control registers, or passed through for output on the MSD bus. Alternatively, the LAD bus input can be selected directly as an FPU source operand without writing to a register.

The clock period may be extended for immediate data input that does not meet the minimum data setup time. The clock is stretched by the data delay plus 5 ns . Refer to TMS34082 data sheet timing diagrams for additional information.

An FPU result can be written to a data register and passed out to the LAD bus. When this is done, the minimum clock period is extended by 15 ns (TMS34082-40) to allow for the propagation delay from the FPU core to the outputs.

Depending on the specific system implementation, transferring data to and from the LAD bus without intervening register operations can significantly improve throughput. Data moves to and from internal registers can be minimized at the cost of adjusting the clock period to assure integrity of FPU results onto the LAD bus.

### 6.3 MSD Bus

The MSD bus can be used to access either external data memory or external code memory, depending on the combination of control signals required. In the host-independent mode, the MSD bus is the source for all instructions. Data can also be transferred to or from the TMS34082 over the MSD bus, and data transfers between the LAD and MSD buses are possible.

### 6.3.1 MSD Bus Control Signals

Up to 64 K 32 -bit data operands and 64 K instructions may be directly addressed on the MSD bus. The address of memory is output on MSA15-0.

External memory operations are controlled by:
DS/ $\overline{C S}$, data space/code space select
$\overline{M C E}$, memory chip enable
$\overline{\mathrm{MOE}}$, memory output enable
$\overline{M W R}$, memory write enable
$\overline{\mathrm{MAE}}, \mathrm{MSD}$ bus output enable
When memory configuration (MEMCFG) is low, DS/ $\overline{C S}$ functions as the most significant address bit. DS/CS high selects data memory; DS/CS low selects code memory. $\overline{\mathrm{MCE}}$ is the memory chip enable for both code and data memory.

When MEMCFG is high, DS/ $\overline{C S}$ is the chip select for data memory and $\overline{M C E}$ is the chip select for code memory. This may eliminate the need for an external inverter.

The TMS34082 outputs data on the MSD bus when $\overline{M W R}$ and $\overline{M A E}$ are low. Otherwise, the device does not drive the MSD bus. If memory on the MSD bus is not shared, $\overline{\mathrm{MAE}}$ can be tied low.

If the memory on the MSD port is shared with a host processor, the $\overline{M A E}$ and RDY signals can be used to prevent conflicts between the TMS34082 and the host processor. The host processor can monitor the state of $\overline{M C E}$ (for MEMCFG low) to determine when the TMS34082 is not accessing memory. If $\overline{M C E}$ is not active, the host processor takes control of the MSD bus by asserting $\overline{M A E}$ and RDY low. Setting RDY low halts the TMS34082.

### 6.3.2 Memory Models

The TMS34082 Software Tool Kit supports three memory models: small, medium, and large.

The small memory model places the code and data in the same memory space. DS/ $\overline{C S}$ is unused. The maximum memory allowed is 64 K 32 -bit words, a combination of instructions and data.

The medium memory model uses separate data and code spaces. Up to 64 K of data words and 64 K of instructions are accessed.

The large memory model partitions the code space into banks, each containing 64 K words. External segment registers determine which bank is being accessed. Constants are stored in the same bank as the code that uses them. Variable data is stored in memory on the LAD bus. For more information on segment register requirements, see the TMS34082 Software Tool Kit User's Guide.

### 6.4 Reset

The TMS34082 is reset when the RESET input is brought low. $\overline{\text { RESET }}$ is an asynchronous signal that requires no setup or hold times with respect to the clock. However, the minimum pulse duration requirement must be met. Data registers are not affected by reset.

Upon reset, all internal states and pipeline registers are cleared. Control registers return to their default values, except for the interrupt register which is unaffected. Data registers are also not affected by reset. The state of control signals during reset is listed in Chapter 4, Table 4-10.

The TMS34082 ignores the first rising clock edge after $\overline{R E S E T}$ is returnedhigh. Program execution begins on the second cycle at address 0 . RESET is also used in conjunction with the $\mathbb{I N T R}$ signal to call a bootstrap loader. This operation is detailed in subsection 6.1.2.

### 6.5 Wait States

Setting RDY low causes the TMS34082 to stall. This input can be used to create wait states for slow memory accesses. Stalling the device does not affect any internal states or registers and output lines do not change.

In host-independent mode, LRDY can be used to stall the device. The function and timing are the same as RDY.

RDY (or LRDY) must be set low a minimum setup time before the rising clock edge you wish to inhibit. Operation resumes on the next rising clock edge after RDY (or LRDY) is set high. Again, there is a minimum setup time requirement before that clock edge.

### 6.6 User Programmable Outputs

In the host-independent mode, CORDY is a user-programmable output. If the LADCFG bit in the configuration register is low, $\overline{\text { COINT }}$ is also a user-programmable output. When LADCFG is high, COINT is used in LAD bus moves and is not programmable.

CORDY (or $\overline{\text { COINT }}$ ) is set high or low using the set mask instruction. CORDY (or COINT) remains at that setting until it is changed by another set mask instruction. $\overline{\text { COINT }}$ and CORDY are set/reset independent of each other.

### 6.7 Conditional Code Input

The CC pin is an external condition code input. A conditional jump to subroutine or conditional branch can be performed based on the state of this pin.

The CC input allows you to control program flow basedon some external status from other devices in your system. By polling this input, you can determine, for example, if a host processor has an instruction queued for the TMS34082.

### 6.8 Interrupts

The TMS34082 supports three types of interrupts in host-independent mode: hardware, software, and exception detects. Each of these has its own interrupt enable.

### 6.8.1 Hardware Interrupts

Upon power up or reset, hardware interrupts are disabled. Before enabling interrupts, the address of the interrupt handling routine should be stored in the interrupt address register. Hardware interrupts are enabled by setting INTENHW (bit 15 of the status register) high using the set mask instruction. A hardware interrupt is then signaled by setting INTR low.

When a hardware interrupt is received, the current program counter is pushed into the interrupt return register. The hardware interrupt flag, HINT (bit 4 of the status register), and interrupt grant, INTG, are set high. The interrupt mask is saved and all interrupts are disabled. The address in the interrupt vector is output to MSA15-0, causing a branch to the interrupt service routine.

After the interrupt service routine, the interrupts should be enabled again before a return from interrupt instruction is executed. Restoring the hardware interrupt clears the HINT flag and INTG.

Only one hardware interrupt may be queued. If a hardware interrupt is received while the first interrupt is being processed, the interrupt is recorded and serviced after the first interrupt sequence is finished. If a third or subsequent hardware interrupt is signaled, it will be ignored.

If a hardware interrupt is received during a multicycle instruction (such as divides, square roots, or moves), the interrupt is queued and serviced after the instruction is completed.

### 6.8.2 Software Interrupts

Upon power up or reset, software interrupts are disabled. Before enabling interrupts, the address of the interrupt handling routine should be stored in the interrupt address register. Software interrupts are enabled by setting INTENSW (bit 11 of the status register) high using the set mask instruction. An interrupt is then signaled by using the set mask instruction to send a software interrupt.

When a software interrupt is received, the current program counter is pushed into the interrupt return register. The software interrupt flag, INTFLG (bit 16 of the status register), and INTG is set high. The address in the interrupt vector is output to MSA15-0, causing a branch to the interrupt service routine.

The interrupts should be re-enabled before a return from interrupt instruction is executed. Restoring the software interrupt clears the HINT flag.

Because hardware interrupts may be queued, a hardware interrupt received while a software interrupt is being processed is recorded and serviced after the software interrupt is complete. This assumes the hardware interrupt was enabled before the software interrupt was received. If another hardware interrupt is signaled, it will be ignored.

### 6.8.3 Exception Detect Interrupts

A third type of interrupt is the exception detect interrupt. In the event of an FPU status exception in host-independent mode, the internal ED signal (bit 18 of the status register) is set high, causing an exception detect interrupt. If interrupts based on specific exceptions are not desired, the exceptions can be masked from the error detect (ED) logic by using the appropriate bits in the configuration register.

Upon power up or reset, exception detect interrupts are disabled. Before enabling interrupts, the address of the exception handling routine should be stored in the interrupt address register. Exception interrupts are enabled by setting INTENED (bit 12 of the status register) high using the set mask instruction.

When an error is detected and ED interrupts are enabled, the current program counter is pushed into the interrupt return register. ED is set high. The address in the interrupt vector is output to MSA15-0, causing a branch to the interrupt service routine.

The interrupts should be restored before a return from interrupt instruction is executed. Restoring interrupts clears the ED flag.

Because hardware interrupts may be queued, a hardware interrupt received while an exception interrupt is being processed is recorded and serviced after the first interrupt is finished. This assumes the hardware interrupt was enabled before the exception interrupt was received. If another hardware interrupt is signaled, it will be ignored.

## Chapter 7

## Internal Instructions

The TMS34082 internal instruction set includes arithmetic and logical operations, as well as complex instructions stored in an internal program ROM. Several addressing modes are available for internal instructions in addition to data types for integer, single- and double-precision floating-point formats.

In the coprocessor mode, the TMS34082 executes internal instructions through the LAD bus as shown in Figure 7-1.

Figure 7-1. Source for Internal Instructions in Coprocessor Mode


In the host-independent mode, an internal instruction can be executed by jumping to the proper internal ROM address. Chapter 8 of this manual shows the correct syntax for the JSR (jump to subroutine) and CJSR (conditional jump to subroutine) instructions.

### 7.1 Internal Instructions Overview

The TMS34082 FPU performs a wide range of internal arithmetic and logical operations, as well as complex operations (flagged ${ }^{\dagger}$ ), summarized below. Complex instructions are multicycle routines stored in the internal program ROM. These form a powerful set of primitives for graphics operations.

## One Operand Operations

| Absolute Value | 1s Complement |
| :--- | :--- |
| Square Root | 2s Complement |
| Reciprocalt $^{\dagger}$ |  |

## Conversions

Integer to Single-Precision Integer to Double-Precision
Single- to Double-Precision

## Two Operand Operations

Add
Subtract
Compare

## Matrix Operations

$4 \times 4,4 \times 4$ Multiply ${ }^{\dagger} \quad 3 \times 3,3 \times 3$ Multiply ${ }^{\dagger}$
$1 \times 4,4 \times 4$ Multiply ${ }^{\dagger}$

## Graphics Operations

Backface Testing $\dagger$
Polygon Clipping ${ }^{\dagger}$
2-D Linear Interpolation ${ }^{\dagger}$
2-D Window Compare ${ }^{\dagger}$
2-Plane Clipping (X, Y, X) ${ }^{\dagger}$
2-D Cubic Spline ${ }^{\dagger}$
Image Processing
$3 \times 3$ Convolution ${ }^{\dagger}$

## Chained Operations

Polynomial Expansion ${ }^{\dagger}$
1-D Min/Max ${ }^{\dagger}$

## Vector Operations

Add ${ }^{\dagger}$
Subtract ${ }^{\dagger}$
Magnitude ${ }^{\dagger}$
Scaling ${ }^{\dagger}$
$\dagger$ Indicates complex instructions

Single-Precision to Integer Double-Precision to Integer Double- to Single-Precision

Multiply/Accumulate ${ }^{\dagger}$
2-D Min/Max ${ }^{\dagger}$

Dot Product ${ }^{\dagger}$
Cross Product ${ }^{\dagger}$
Normalization ${ }^{\dagger}$
Reflection ${ }^{\dagger}$

Multiply
Divide
$1 \times 3,3 \times 3$ Multiply ${ }^{\dagger}$

Polygon Elimination ${ }^{\dagger}$
Viewport Scaling and Conversion ${ }^{\dagger}$
3-D Linear Interpolation ${ }^{\dagger}$
3-D Volume Compare ${ }^{\dagger}$
2-Plane Color Clipping (R, B, G, I) ${ }^{\dagger}$ 3-D Cubic Spline ${ }^{\dagger}$

The internal routines can be used in either coprocessor or host-independent mode. In coprocessor mode, the internal routines are invoked by TMS34020 instructions to its coprocessor(s). When the TMS34082 is used as a stand-alone processor, the internal microprograms can be called as subroutines by the externally stored code.

### 7.2 Complex Graphics Instructions

The internal complex instructions may be combined to form a 3-D graphics pipeline. A typical 3-D graphics pipeline includes three major operations on the input object database. The object database is first manipulated to generate normal vectors, and then transformed. The color and intensity values are also calculated. The second step involves the clipping of the objects to the viewing volume. Finally, the objects are displayed according to the rendering style selected. Figure 7-2 shows a typical 3-D graphics pipeline using the complex instructions.

Figure 7-2. 3-D Graphics Pipeline Using TMS34082 Complex Instructions


The complex instructions used in the polygon clipping mechanism can be organized into three functional groups. The first set consists of a single test (BACKF) to determine whether the polygon is forward or backward facing. The second set of instructions (CKVTXI, CKVTX) performs a test to trivially accept or reject a polygon as being visible by checking the vertex coordinates against the viewing volume. The third set of instructions (OUTC3X, OUTC3Y, OUTC3Z, CLIPFX, CLIPFY, CLIPFZ, CLIPRX, CLIPRY, CLIPRZ, CLIPCF, CLIPCR) determines whether the polygon edge crosses the viewing volume boundary and generates the new vertices and color values for the clipped polygon.

The complex instructions are implemented to make efficient use of the TMS34082 Registers and internal status is maintained throughout the clipping mechanism, thus allowing successive polygon edges to be clipped without repeated loading of vertex information. Figure 7-3 details the clipping portion of the pipeline.

Figure 7-3. 3-D Polygon Clipping Flow Chart


### 7.3 Internal Routine Addresses and Cycle Counts

External programs can call internal routines by executing a jump to subroutine with bit 16 (internal code select) set high and the address of the internal routine as the jump address. Internal routine addresses are given in Table 7-1.

The following table lists internal routines, their addresses, and the number of machine states required to complete the routine. The number in parenthesis after the machine states is the number of cycles before the next operation may begin. For example, it takes five clock cycles to complete an integer CPW (compare point to window) instruction where the status and results are valid; it would take 4 cycles after the CPW began executing before another operation to begin. In coprocessor mode, a machine state is half an LCLK1 period. Therefore, the number of LCLK1 cycles required is the number of machine states divided by 2 . In host-independent mode, a machine state is one CLK period.

These cycle counts are for mode 0 instructions only (no data transfers) after the instruction reaches the TMS34082. Only mode 0 instructions may be used in host-independent mode. In coprocessor mode, the time required to execute mode 1 and mode 2 instructions is the same as the related mode 1 instruction after both instruction and data have reached the TMS34082. The TMS34020 takes one LCLK1 cycle to output a mode 0 instruction and two (one-operand) or three (two-operand) LCKL1 cycles for a mode 1 instruction. A mode 2 instruction requires three TMS34020 LCLK1 cycles, plus one cycle for each memory transfer.

Table 7-1. Internal ROM Routines (for Mode O FPU Operations)

| Hex <br> Address | Assembler Opcode | Description | Precision | Machine States |
| :---: | :---: | :---: | :---: | :---: |
| 000 | ADD | Sum of ra and rb | integer | 2(1) |
| 001 | SUB | Subtract rb from ra | integer | 2(1) |
| 002 | CMP | Set status bits on result of ra minus rb | integer | 2(1) |
| 003 | SUB | Subtract ra from rb | integer | 2(1) |
| 004 |  | reserved |  |  |
| 005 |  | reserved |  |  |
| 006 | MOVE ${ }^{\dagger}$ | Load n FPU registers from TMS34020 GSP or its memory | integer | (see Note) |
| 007 | MOVE ${ }^{\text { }}$ | Save n FPU registers from TMS34020 GSP or its memory | integer | (see Note) |
| 008 | MPYS | Multiply ra and rb | integer | 2(1) |
| 009 | DIVS | Divide ra by rb | integer | 16(15) |
| 00A | INV | Divide 1 by rb | integer | 16(15) |
| OOB |  | reserved |  |  |
| 00C |  | reserved |  |  |
| OOD | MOVE | Move ra to rd, multiple, for $n$ registers | integer | (see Note) |
| O0E | MOVE | Move rb to rd, multiple, for n registers | integer | (see Note) |
| 00F |  | reserved |  |  |
| 010 | CPW | Compare point to window | integer | 5(4) |
| 011 | CPV | Compare point to volume | integer | 7(6) |
| 012 | BACKF | Test polygon for facing direction (backface test) | integer | 16(15) |
| 013 | INMNMX | Setup FPU registers for MNMX1 or MNMX2 instruction |  | 2(1) |
| 014 | LINTX | Linear interpolation, $X$ plane | integer | 26(25) |
| 015 | CLIPFX | Clip a line to an $X$ plane pair boundary (start w/point 1) | integer | 34(33) |
| 016 | CLIPRX | Clip a line to an $X$ plane pair boundary (start w/point 2) | integer | 34(33) |
| 017 | CLIPC | Clip color values to a plane pair boundary (start w/point 1) | integer | 27(26) |
| 018 | SCALE | Scale and convert coordinates for viewport | integer | 56(55) |
| 019 | MTRAN | Transpose a matrix | integer | 13(12) |
| 01A | CKVTX | Compare polygon vertex to a clipping volume | integer | 6(5) |
| 01B | CONV | $3 \times 3$ convolution | integer | 32(31) |
| 01 C | CLIPCR | Clip color values to a plane pair boundary (start w/point 2) | integer | 27(26) |
| 01D | OUTC3X | Compare a line to a clipping value, X plane | integer | 5(4) |
| 01E | CSPLN | Calculate cubic spline | integer | 22(21) |
| 01F |  | reserved |  |  |
| 020 | MOVE | Copy ra to rd | integer | 2(1) |
| 021 | NOT | Place 1's complement of ra in rd | integer | 2(1) |
| 022 | ABS | Place absolute value of ra in rd | integer | 2(1) |
| 023 | NEG | Place negated value of ra in rd | integer | 2(1) |
| 024 |  | reserved |  |  |
| 025 |  | reserved |  |  |

$\dagger$ Cannot be used in host-independent mode.
NOTE: Number of machine states varies, depending on the number of words moved.

Table 7-1. Internal ROM Routines (for Mode O FPU Operations) (Continued)

| Hex <br> Address | Assembler Opcode | Description | Precision | Machine States |
| :---: | :---: | :---: | :---: | :---: |
| 026 |  | reserved |  |  |
| 027 | VSCL ${ }^{\dagger}$ | Multiply vector by a scaling factor | integer | 4(3) |
| 028 | SQAR | Place (ra * ra) in rd | integer | 4(3) |
| 029 | SQRT | Extract square root of ra | integer | 20(19) |
| 02A | SQRTA | Extract square root of absolute value of ra | integer | 20(19) |
| 02B | ABORT | Stop execution of any FPU instruction | integer | 2(1) |
| 02C | CKVTX1 | Initialize check vertex instruction |  | 2(1) |
| 02D | CHECK | Check for previous instruction completion |  | 2(1) |
| 02E | MOVTSRAM ${ }^{\dagger}$ | Move data from system memory to external memory |  |  |
| 02F | MOVFSRAM ${ }^{\dagger}$ | Move data to system memory from external memory |  |  |
| 030 | POLY ${ }^{\dagger}$ | Polynomial expansion | integer | 4(3) |
| 031 | MAC ${ }^{\dagger}$ | Multiply and accumulate | integer | 4(3) |
| 032 | MNMX1 ${ }^{\dagger}$ | Determine 1-D minimum and maximum of a series | integer | 3(2) |
| 033 | MNMX2 ${ }^{\dagger}$ | Determine 2-D minimum and maximum of a series of pairs | integer | 5(4) |
| 034 | MMPYO | Multiply matrix elements 3-0 by vector element 0 | integer | 6(5) |
| 035 | MMPY1 | Multiply matrix elements 7-4 by vector element 1 | integer | 10(9) |
| 036 | MMPY2 | Multiply matrix elements 11-8 by vector element 2 | integer | 12(11) |
| 037 | MMPY3 | Multiply matrix elements 15-12 by vector element 3 | integer | 12(11) |
| 038 | MADD | Add matrix elements 15-12 to vector integer | integer | 9(8) |
| 039 | VADD | Add two vectors | integer | 4(3) |
| 03A | VSUB | Subtract a vector from a vector | integer | 4(3) |
| 03B | VDOT | Compute scalar dot product of two vectors | integer | $7(6)$ |
| 03C | VCROS | Compute cross product of two vectors | integer | 9(8) |
| 03D | VMAG | Determine the magnitude of a vector | integer | 30(29) |
| 03E | VNORM | Normalize a vector to unit magnitude | integer | 50(49) |
| 03F | VRFLCT | Given normal and incident vectors, find the reflection | integer | 16(15) |
| 080 | ADDF | Sum of ra and rb | single | 2(1) |
| 081 | SUBF | Subtract rb from ra | single | 2(1) |
| 082 | CMPF | Set status bits on result of ra minus rb | single | 2(1) |
| 083 | SUBF | Subtract ra from rb | single | 2(1) |
| 084 | ADDA | Absolute value of sum of ra and rb | single | 2(1) |
| 085 | SUBA | Absolute value of (ra minus rb) | single | 2(1) |
| 086 | MOVF | Loadn FPU registers from TMS34020 GSP or its memory | single |  |
| 087 | MOVF | Save n FPU registers from TMS34020 GSP or its memory | single |  |
| 088 | MPYF | Multiply ra and rb | single | 2(1) |
| 089 | DIVF | Divide ra by rb | single | 7(6) |
| 08A | INVF | Divide 1 by rb | single | 7(6) |
| 08B | ASUBA | Absolute value of ra minus absolute value of rb | single | 2(1) |
| 08C |  | reserved |  |  |

$\dagger$ Cannot be used in host-independent mode.
NOTE: Number of machine states varies, depending on the number of words moved.

Table 7-1. Internal ROM Routines (for Mode O FPU Operations) (Continued)

| Hex <br> Address | Assembler Opcode | Description | Precision | Machine States |
| :---: | :---: | :---: | :---: | :---: |
| 08D | MOVEF ${ }^{\dagger}$ | Move ra to rd, multiple, for $n$ registers | single | (see Note) |
| 08E | MOVEF ${ }^{\dagger}$ | Move ra to rd, multiple, for $n$ registers | single | (see Note) |
| 08F |  | reserved |  |  |
| 090 | CPWF | Compare point to window | single | 5(4) |
| 091 | CPVF | Compare point to volume | single | 7(6) |
| 092 | BACKFF | Test polygon for facing direction (backface test) | single | 16(15) |
| 093 | INMNMXF | Setup FPU registers for MNMX1 and MNMX2 | single | 2(1) |
| 094 | LINTXF | Linear interpolation, X plane | single | 17(16) |
| 095 | CLIPFXF | Clip a line to an X plane pair boundary (start w/point 1) | single | 25(24) |
| 096 | CLIPRXF | Clip a line to an X plane pair boundary (start w/point 2) | single | 25(24) |
| 097 | CLIPCF | Clip color values to a plane pair boundary (start w/point 1) | single | 18(17) |
| 098 | SCALEF | Scale and convert coordinates for viewport | single | 21(20) |
| 099 | MTRANF | Transpose a matrix | single | 13(12) |
| 09A | CKVTXF | Compare polygon vertex to a clipping volume | single | 6(5) |
| 09B | CONVF | $3 \times 3$ convolution | single | 17(16) |
| 09C | CLIPCRF | Clip color values to a plane pair boundary (start w/point2) | single | 18(17) |
| 09D | OUTC3XF | Compare a line to a clipping value, X plane | single | 5(4) |
| 09E | CSPLNF | Calculate cubic spline | single | 22(21) |
| 09F |  | reserved |  |  |
| OAO | MOVE | copy ra to rd | single | 2(1) |
| 0A1 | NOT | Place 1's complement of ra in rd | single | 2(1) |
| OA2 | ABS | Place absolute value of ra in rd | single | 2(1) |
| OA3 | NEG | Place negated value of ra in rd | single | 2(1) |
| OA4 | CVFD | Convert single-precision to double-precision | single | 2(1) |
| OA5 | CVFI | Convert single-precision to integer | single | 2(1) |
| OA6 | CVIF | Convert integer to single-precision | single | 2(1) |
| 0A7 | VSCLF ${ }^{\dagger}$ | Multiply vector by a scaling factor | single | 4(3) |
| 0A8 | SQARF | Place (ra * ra) in rd | single | 4(3) |
| OA9 | SQRTF | Extract square root of ra | single | 10(9) |
| OAA | SQRTAF | Extract square root of absolute value of ra | single | 10(9) |
| OAB | ABORT | Stop execution of any FPU instruction |  | 2(1) |
| OAC | CKVTX1 | Initialize check vertex instruction |  | 2(1) |
| OAD | CHECK | Check for previous instruction completion |  | 2(1) |
| OAE | MOVTSRAM ${ }^{\dagger}$ | Move data from system memory to external memory |  |  |
| OAF | MOVFSRAM ${ }^{\dagger}$ | Move data to system memory from external memory |  |  |
| OBO | POLYF ${ }^{\text {f }}$ | Polynomial expansion | single | 4(3) |
| OB1 | MACF ${ }^{\dagger}$ | Multiply and accumulate | single | 4(3) |
| OB2 | MNMX1F ${ }^{\dagger}$ | Determine 1-D minimum and maximum of a series | single | 3(2) |
| 0B3 | MNMX2F ${ }^{\dagger}$ | Determine 2-D minimum and maximum of a series of pairs | single | 5(4) |

$\dagger$ Cannot be used in host-independent mode.
NOTE: Number of machine states varies, depending on the number of words moved.

## Table 7-1. Internal ROM Routines (for Mode O FPU Operations) (Continued)

| Hex <br> Address | Assembler Opcode | Description | Precision | Machine States |
| :---: | :---: | :---: | :---: | :---: |
| OB4 | MMPYOF | Multiply matrix elements 3-0 by vector element 0 | single | 6(5) |
| 085 | MMPY1F | Multiply matrix elements 7-4 by vector element 1 | single | 10(9) |
| 0B6 | MMPY2F | Multiply matrix elements 11-8 by vector element 2 | single | 12(11) |
| 0B7 | MMPY3F | Multiply matrix elements 15-12 by vector element 3 | single | 12(11) |
| 0B8 | MADDF | Add matrix elements 15-12 to vector | single | 9(8) |
| OB9 | VADDF | Add two vectors | single | 4(3) |
| OBA | VSUBF | Subtract a vector from a vector | single | 4(3) |
| OBB | VDOTF | Compute scalar dot product of two vectors | single | 7(6) |
| OBC | VCROSF | Compute cross product of two vectors | single | 9(8) |
| OBD | VMAGF | Determine the magnitude of a vector | single | 20(19) |
| OBE | VNORMF | Normalize a vector to unit magnitude | single | $31(30)$ |
| OBF | VRFLCTF | Given normal and incident vectors, find the reflection | single | 16(15) |
| 0 CO | ADDD | Sum of ra and rb | double | 2(1) |
| 0 C 1 | SUBD | Subtract rb from ra | double | 2(1) |
| 0C2 | CMPD | Set status bits on result of ra minus rb | double | 2(1) |
| 0C3 | SUBD | Subtract ra from rb | double | 2(1) |
| 0 C 4 | ADDA | Absolute value of sum of ra and rb | double | 2(1) |
| 0 C 5 | SUBA | Absolute value of (ra minus rb) | double | 2(1) |
| 0C6 | MOVD ${ }^{\dagger}$ | Load $n$ FPU registers from TMS34020 GSP or its memory | double | (see Note) |
| 0 C 7 | MOVD ${ }^{\dagger}$ | Save $n$ FPU registers from TMS34020 GSP or its memory | double | (see Note) |
| $0 \mathrm{C8}$ | MPYD | Multiply ra and rb | double | 3(2) |
| 0C9 | DIVD | Divide ra by rb | double | 13(12) |
| OCA | INVD | Divide 1 by rb | double | 13(12) |
| OCB | ASUBA | Absolute value of ra minus absolute value of rb | double | 2(1) |
| OCC |  | reserved |  |  |
| OCD | MOVD ${ }^{\dagger}$ | Move ra to rd, multiple, for n registers | double | (see Note) |
| OCE | MOVD ${ }^{\dagger}$ | Move rb to rd, multiple, for n registers | double | (see Note) |
| OCF |  | reserved |  |  |
| ODO | CPWD | Compare point to window | double | 5(4) |
| OD1 | CPVD | Compare point to volume | double | 7(6) |
| OD2 | BACKFD | Test polygon for facing direction (backface test) | double | 25(24) |
| 0D3 | INMNMXD | Setup FPU registers for MNMX1 and MNMX2 | double | 2(1) |
| OD4 | LINTXD | Linear interpolation, $X$ plane | double | 26(25) |
| 0D5 | CLIPFXD | Clip a line to an $X$ plane pair boundary (start w/point 1) | double | 35(34) |
| 0D6 | CLIPRXD | Clip a line to an $X$ plane pair boundary (start w/point 2) | double | 35(34) |
| OD7 | CLIPCD | Clip color values to a plane pair boundary (start w/point 1) | double | 28(27) |
| 0D8 | SCALED | Scale and convert coordinates for viewport | double | 33(32) |
| 0D8 | MTRAND | Transpose a matrix | double | 13(12) |
| ODA | CKVTXD | Compare polygon vertex to a clipping volume | double | 6(5) |

$\dagger$ Cannot be used in host-independent mode.
NOTE: Number of machine states varies, depending on the number of words moved.

Table 7-1. Internal ROM Routines (for Mode O FPU Operations) (Continued)

| Hex Address | Assembler Opcode | Description | Precision | Machine States |
| :---: | :---: | :---: | :---: | :---: |
| ODB | CONVD | $3 \times 3$ convolution | double | 29(30) |
| ODC | CLIPCRD | Clip color values to a plane pair boundary (start w/point 1) | double | 31(30) |
| ODD | OUTC3XD | Compare a line to a clipping value, X plane | double | 5(4) |
| ODE | CSPLND | Calculate cubic spline | double | $31(30)$ |
| ODF |  | reserved |  |  |
| 0E0 | MOVE | Copy ra to rd | double | 2(1) |
| OE1 | NOT | Place 1's complement of ra in rd | double | 2(1) |
| OE2 | ABS | Place absolute value of ra in rd | double | 2(1) |
| 0E3 | NEG | Place negated value of ra in rd | double | 2(1) |
| OE4 | CVDF | Convert double-precision to single-precision | double | 2(1) |
| 0E5 | CVDI | Convert double-precision to integer | double | 2(1) |
| 0E6 | CVID | Convert integer to double-precision | double | 2(1) |
| 0E7 | VSCLD ${ }^{\dagger}$ | Multiply vector by a scaling factor | double | 7(6) |
| 0E8 | SQARD | Place (ra * ra) in rd | double | 5(4) |
| 0E9 | SQRTD | Extract square root of ra | double | 16(15) |
| OEA | SQRTAD | Extract square root of absolute value of ra | double | 16(15) |
| OEB | ABORT | Stop execution of any FPU instruction |  | 2(1) |
| OEC | CKVTX1 | Initialize check vertex instruction |  | 2(1) |
| OED | CHECK | Check for previous instruction completion |  | 2(1) |
| OEE |  | reserved |  |  |
| OEF |  | reserved |  |  |
| OFO | POLYD ${ }^{\dagger}$ | Polynomial expansion | double | 5(4) |
| 0F1 | MACD ${ }^{\dagger}$ | Multiply and accumulate | double | 5(4) |
| 0F2 | MNMX1D ${ }^{\dagger}$ | Determine 1-D minimum and maximum of a series | double | 3(2) |
| OF3 | MNMX2D ${ }^{\dagger}$ | Determine 2-D minimum and maximum of a series of pairs | double | 5(4) |
| 0F4 | MMPYOD | Multiply matrix elements 3-0 by vector element 0 | double | 11(10) |
| OF5 | MMPY1D | Multiply matrix elements $7-4$ by vector element 1 | double | 14(13) |
| 0F6 | MMPY2D | Multiply matrix elements 11-8 by vector element 2 | double | 16(15) |
| 0F7 | MMPY3D | Multiply matrix elements 15-12 by vector element 3 | double | 16(15) |
| 0F8 | MADDD | Add matrix elements 15-12 to vector | double | 9(8) |
| OF9 | VADDD | Add two vectors | double | 4(3) |
| OFA | VSUBD | Subtract a vector from a vector | double | 4(3) |
| OFB | VDOTD | Compute scalar dot product of two vectors | double | 10(9) |
| OFC | VCROSD | Compute cross product of two vectors | double | 15(14) |
| OFD | VMAGD | Determine the magnitude of a vector | double | 29(28) |
| OFE | VNORMD | Normalize a vector to unit magnitude | double | 49(48) |
| OFF | VRFLCTD | Given normal and incident vectors, find the reflection | double | 23(22) |
| 114 | LINTY | Linear interpolation, Y plane | integer | 26(25) |

$\dagger$ Cannot be used in host-independent mode.
NOTE: Number of machine states varies, depending on the number of words moved.

Table 7-1. Internal ROM Routines (for Mode O FPU Operations) (Continued)

| Hex <br> Address | Assembler <br> Opcode | Description | Precision | Machine <br> States |
| :---: | :--- | :--- | :--- | :---: |
| 115 | CLIPFY | Clip a line to an Y plane pair boundary (start w/point 1) | integer | $34(33)$ |
| 116 | CLIPRY | Clip a line to an Y plane pair boundary (start w/point 2) | integer | $34(33)$ |
| 11 D | OUTC3Y | Compare a line to a clipping value, Y plane | integer | $5(4)$ |
| 194 | LINTYF | Linear interpolation, Y plane | single | $17(16)$ |
| 195 | CLIPFYF | Clip a line to an Y plane pair boundary (start w/point 1) | single | $25(24)$ |
| 196 | CLIPRYF | Clip a line to an Y plane pair boundary (start w/point 2) | single | $25(24)$ |
| 19 D | OUTC3YF | Compare a line to a clipping value, Y plane | single | $5(4)$ |
| $1 D 4$ | LINTYD | Linear interpolation, Y plane | double | $17(16)$ |
| $1 D 5$ | CLIPFYD | Clip a line to an Y plane pair boundary (start w/point 1) | double | $25(24)$ |
| $1 D 6$ | CLIPRYD | Clip a line to an Y plane pair boundary (start w/point 2) | double | $25(24)$ |
| $1 D D$ | OUTC3YD | Compare a line to a clipping value, Y plane | double | $5(4)$ |
| 214 | LINTZ | Linear interpolation, Z plane | integer | $26(25)$ |
| 215 | CLIPFZ | Clip a line to an Z plane pair boundary (start w/point 1) | integer | $34(33)$ |
| 216 | CLIPRZ | Clip a line to an Z plane pair boundary (start w/point 2) | integer | $34(33)$ |
| $21 D$ | OUTC3Z | Compare a line to a clipping value, Z plane | integer | $5(4)$ |
| 294 | LINTZF | Linear interpolation, Z plane | single | $17(16)$ |
| 295 | CLIPFZF | Clip a line to an Z plane pair boundary (start w/point 1) | single | $25(24)$ |
| 296 | CLIPRZF | Clip a line to an Z plane pair boundary (start w/point 2) | single | $25(24)$ |
| $29 D$ | OUTC3ZF | Compare a line to a clipping value, Z plane | single | $5(4)$ |
| $2 D 4$ | LINTZD | Linear interpolation, Z plane | double | $17(16)$ |
| $2 D 5$ | CLIPFZD | Clip a line to an Z plane pair boundary (start w/point 1) | double | $25(24)$ |
| $2 D 6$ | CLIPRZD | Clip a line to an Z plane pair boundary (start w/point 2) | double | $25(24)$ |
| $2 D D$ | OUTC3ZD | Compare a line to a clipping value, Z plane | double | $5(4)$ |

$\dagger$ Cannot be used in host-independent mode.
NOTE: Number of machine states varies, depending on the number of words moved.

### 7.4 Coprocessor Mode Internal Instruction Format

The format of the TMS34082 instruction in coprocessor mode is shown below. The instruction is issued by the TMS34020 via the LAD bus.

| 31 | 28 | 24 | 20 | 15 | 13 | 8 | 7 | 6 | 6 | 5 | 0 |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | ra | rb | rd | md | fpuop | type | size | 0 | 1 | 00000 |  |

### 7.4.1 Coprocessor ID Field

The 3-bit ID field identifies which coprocessor the instruction is intended for. This coprocessor ID corresponds to the settings of the CID2-0 pins. To broadcast an instruction to all coprocessors, the ID field is set to 4 . The TMS34020 documentation recommends the coprocessor ID assignments shown below. However, both the TMS34020 and TMS34082 support using up to seven TMS34082s per TMS34020.

The assembler defaults to an ID of $\mathrm{00O}_{2}$. To define another ID as the current ID, use the coprocessor assembler directive.

## Table 7-2. Coprocessor IDs

| ID | Coprocessor | ID | Coprocessor |
| :---: | :---: | :---: | :--- |
| 000 | FPU 0 | 100 | FPU broadcast |
| 001 | FPU 1 | 101 | Reserved (or FPU 4) |
| 010 | FPU 2 | 110 | Reserved (or FPU 5) |
| 011 | FPU 3 | 111 | User defined (or FPU 6) |

### 7.4.2 Register Field

The ra, rb, and rd fields are for the two sources ( A and B ) and destination within the FPU. For most two-operand instructions, one operand must come from each register file. Register addresses were listed in Table 4-3. For the ra and rb fields, only the four least significant bits of the register address are used. Some multi-operand instructions redefine the ra, rb, and rd field.

Valid values for registers operands are:
ra: RA0-RA9 (also, C, and CT following rules below)
rb: RB0-RB9 (also, C, and CT following rules below)
rd: RAO-RA9 RB0-RB9, C, and CT
NOTE: Although the TMS34020 assembler only allows the above registers as destinations, the TMS34082 will accept any register address as a destination.

The following is a list of rules for using the C and CT registers as operands:

1) Do not use C or CT as source operands in any mode 1 or 2 ("Load and.") instructions.
2) Do not use C or CT in any MOVE, MOVD, or MOVF instructions. If it is necessary to move a value to or from the C or CT register, use the PASS, PASSF, or PASSD instruction (depending on the type of number in C or CT). C and CT are legal operands for the PASSx instructions. However, the type of number in C or CT must match the type (integer single-, or double-precision) of the PASSx instruction.
3) Do not use C or CT as source operands for integer divide (DIVS), integer inverse (INV), convert integer to single-precision (CVIF) or convert integer to double-precision (CVID) instructions.
4) For instructions requiring two source operands, C or CT can be used as both operands, but cannot be used together in the same instruction.

### 7.4.3 Addressing Mode Field

Four addressing modes are defined for the TMS34082. The md field indicates the addressing mode. Each addressing mode corresponds to one or two general-purpose TMS34020 coprocessor commands. Specific TMS34082 instructions are created by specifying the fields of the internal instruction as shown above.

## Table 7-3. Addressing Modes

| Mode | md <br> Field | Operation | General <br> TMS34020 <br> Coprocessor <br> Command |
| :---: | :---: | :--- | :---: |
| 0 | 00 | FPU internal operations with no jumps or external moves | CEXEC |
| 1 | 01 | Transfer instruction and data to/from TMS34020 registers | CMOVGC, <br> CMOVCG |
| 2 | 10 | Transfer instruction and data to/from memory (controlled by <br> TMS34020) on LAD bus | CMOVMC, <br> CMOVCM |
| 3 | 11 | Jump to external instructions in TMS34082 external <br> memory | CEXEC |

### 7.4.4 FPU Operation Field

The fpuop field tells the TMS34082 which operation (such as addition or subtraction) or complex instructions (such as clipping) to perform. Sometimes the rb field is also used to specify the operation. A list of instructions and their associated fpuop field is given in the TMS34082A Data Sheet (Appendix B).

### 7.5 Type, Size, and I Fields

The type and size bits identify the type of operand, as shown in Table 7-4. The I bit is used to indicate to the coprocessor that this is a 'reissue' of a coprocessor instruction due to a bus interruption. The least significant four bits are the bus status bits, which will all be zero to indicate a coprocessor cycle.

Table 7-4. Operand Types

| Type | Size | Operand Type |
| :---: | :---: | :--- |
| 0 | 0 | 32-bit Integer |
| 0 | 1 | Reserved |
| 1 | 0 | Single-precision floating-point (32-bit) |
| 1 | 1 | Double-precision floating-point (64-bit) |

### 7.6 Internal Instruction Opcodes

Details of each internal routine follow. The routines are listed alphabetically by their TMS34020 assembler opcodes.

Sets of related instructions (same operation, different operand types) are listed together. Sets begin on a new page and may contain the following information.

Syntax: Shows you how to enter an instruction. Each valid operand type is listed, along with its syntax. Bold text should be entered as shown. Italic text represents a symbol that tells what type of information should be entered. These symbols are further described in the operand section.

Execution: Illustrates the effects of execution on TMS34020 and TMS34082 registers and memory. The shaded portion represents steps that are executed for double-precision instructions only.

TMS34020 Instruction Words: Shows the object code generated for an instruction. This is the instruction to the TMS34020. In this instruction, transfers is the number of 32-bit words moved across the LAD bus. Transfers will generally be the number of operands for an integer or single-precision instruction. For a double-precision instruction, transfers is twice the number of operands.

TMS34082 Instruction Word: Shows the command generated by the TMS34020 that is sent (via the LAD bus) to the TMS34082. In this word, $t$ and $s$ are used to specify the type and size bits, respectively.

Operands: Explains the symbols used in the syntax section. Implied operands are values that must be in the appropriate register(s) before the instruction is executed. The following symbols are used as operands:

Rs, $\mathrm{Rs}_{1}, \mathrm{Rs}_{2} \quad$ TMS34020 source register(s)
$\mathrm{Rd}, \mathrm{Rd}_{1}, \mathrm{Rd}_{2} \quad$ TMs34020 destination register(s)
CRs, $\mathrm{CRs}_{1}, \mathrm{CRs}_{2}$

CRd
TMS34082 source register. Must be from the RA or RB register files, C, or CT. See the restrictions on the use of $C$ and CT given in subsection 7.3.2.

Unless otherwise noted, $C$ or CT may be substituted for RA or RB registers in any instruction which does not require data transfers to/from the TMS34020 or memory.

Description: Discusses the purpose of the instruction and any other general information related to it.

Algorithm: Illustrates the operations performed in a multicycle, complex instruction. The shaded portion represents steps that are executed for double-precision instructions only.

Temporary Storage: Lists registers that are used in complex instructions. Any value stored inthese registers prior to instruction execution will belost.

Outputs: Lists the registers that contain the result(s) of the complex instruction.

Instruction Type: Shows the type of TMS34020 coprocessor instruction. The TMS34020 has several general-purpose coprocessor instructions that are used to create the specific TMS34082 instructions.

Examples: Illustrates the correct syntax for a specific instruction and describes the effects of the instruction on memory and registers using various sets of data.

Not all topics are included for each instruction set. Each set contains at least the Syntax, Execution or Algorithm, both Instruction Words, and the Description sections.

## Syntax

## Execution

'34020
Instruction Words

Instruction to '34082

## Description

ABORT
Halts TMS34082

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| 1 D |  |  | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |


| 31 | 29 |  | 0 |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | ---: |
| ID | 0 | 0001 | 0110 | 0000 | 0001 | 1110 | 0000 | 0000 |

This instruction will cancel all activity within the TMS34082, returning the FPU to an inactive state. Any time this instruction is present on a coprocessor cycle with a valid coprocessor ID, the addressed TMS34082 will ABORT all internal processing activity immediately. Block moves will be aborted before completion of the last move.

CEXEC, short

| Syntax | Type |  |  |  | Syntax |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | Integer |  |  |  | ABS |  | CRs, CRd |  |  |  |  |  |  |  |  |
|  | Double-Precision |  |  |  | ABSD |  | CRs, CRd |  |  |  |  |  |  |  |  |
|  | Single-Precision |  |  |  | ABSF |  | CRs, CRd |  |  |  |  |  |  |  |  |
| Execution | $\|\mathrm{CRs}\| \rightarrow \mathrm{CRd}$ |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| '34020 <br> Instruction Words | 15 <br> 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | type | size |
|  | ID |  | CRs |  |  |  | 0 | 0 | 1 | 0 | CRd |  |  |  |  |
| Instruction to '34082 | $31 \quad 2928$ |  | 2524 |  |  | 2120 |  | 1615 |  |  |  |  |  |  | 0 |
|  | ID | CR |  |  | 10 |  | CRd |  | 00 | 1 | 11 | t | s00 | 0 | 00 |
| Operands | CRs | MS3 | 4082 | RA | sour | e r | giste | C | tain | ng | e op | ra |  |  |  |
|  | CRd | MS3 | 4082 | des | tinat | O | egist |  |  |  |  |  |  |  |  |
| Description | ABSx tak CRd. | s the |  |  | val | ue | $f$ the | on |  |  |  |  | res | he re | ult in |
|  | The sour | e reg | giste | , CR | s, m | ust | be in | e | A re | gist | file |  |  |  |  |
| Instruction Type | CEXEC, | hort |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | ABS RA6 | , RB |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | This exa stores th | ple <br> inte | tak | th <br> esult | abs |  | e val | e | the | inte |  | onte | nts | $f R A$ | and |

Type Integer
Double-Precision
Single-Precision
$\mathrm{Rs}_{1} \rightarrow$ CRs

$\mid$ CRs $\mid \rightarrow$ CRd

## Syntax

ABS Rs ${ }_{1}$, CRs, CRd
ABSD $R s_{1}, R s_{2}$, CRs, CRd
ABSF Rs ${ }_{1}$, CRs, $C R d$
Syntax
Execution
'34020
Instruction Words

Operands

Description
Instruction to '34082

Destipn
Instruction Words

Integer or Single-Precision:

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | R | Rs 1 |  |  |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | type | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | CRs |  |  |  | 0 | 0 | 1 | 0 | CRd |  |  |  |  |

Double-Precision:


Instruction Type

## Example


$R_{1} \quad$ TMS34020 source register for the integer or single-precision operand to the TMS34082 (or half of the 64-bit value for double-precision operands)
$\mathrm{Rs}_{2}$ TMS34020 source register for the remaining half of the 64-bit double-precision floating-point value to TMS34082.

CRs TMS34082 RA register to contain the 32-bit integer operand
CRd TMS34082 destination register
ABSx loads the contents of $\mathrm{Rs}_{1}$ (and $\mathrm{Rs}_{2}$ for double-precision values) into CRs, takes the absolute value of the contents of CRs, and stores the result in CRd.

The TMS34082 source register, CRs, must be in the RA register file. CMOVGC, one or two registers

ABSF A5, RA6, RB7
This example loads the single-precision contents of TMS34020 register A5 into TMS34082 register RA6, takes the absolute value of the contents of RA6, and stores the single-precision result in RB7.





| Syntax | Type | Syntax |
| :---: | :---: | :---: |
|  | Integer Double-Precision Single-Precision | $\begin{aligned} & \text { ADD } * R s+, C R s_{1}, C R s_{2}, C R d \\ & \text { ADDD } * R s+, C R s_{1}, C R s_{2}, C R d \\ & \text { ADDF } * R s+, C R s_{1}, C R s_{2}, C R d \end{aligned}$ |
| Execution | *Rs $\rightarrow \mathrm{CRs}_{1}$ |  |
|  | $\mathrm{Rs}+32 \rightarrow \mathrm{Rs}$ |  |
|  |  |  |
|  | RS + $32=$ Rs |  |
|  | $\star \mathrm{Rs} \rightarrow \mathrm{CRs}_{2}$ |  |
|  | $\mathrm{Rs}+32 \rightarrow \mathrm{Rs}$ |  |
|  | RRS $=$ CRS ${ }_{\text {d }}$ |  |
|  | RS $+32=$ Frs |  |
|  | $\mathrm{CRs}_{1}+\mathrm{CRs}_{2} \rightarrow \mathrm{CRd}$ |  |

'34020
Instruction Words

Instruction to '34082

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | transfers |  |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | type | size | 0 | 0 | R |  |  |  |
| ID |  |  | CRs 1 |  |  |  | CRs2 |  |  |  | CRd |  |  |  |


| 31 | 29 | 28 | 24 | 2120 |  | 15 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | $\mathrm{CRs}_{1}$ | $\mathrm{CRs}_{2}$ | CRd | 1000 | 000 t | 5000 | 0000 |

Rs TMS34020 register containing the memory address
CRs $_{1}$ TMS34082 register to contain the first operand
$\mathrm{CRs}_{2}$ TMS34082 register to contain the second operand
CRd TMS34082 destination register
ADDx loads the contents of memory pointed to by $R s$ into $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, adds the contents of $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, and stores the result in CRd. After each load from memory, Rs is incremented by 32.

The two TMS34082 source registers, $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, must be in opposite register files.

CMOVMC, postincrement, constant count
ADD *A5+, RA5, RB6, RB7
This example loads memory starting at the address given by TMS34020 register A5 into TMS34082 registers RA5 and RB6, adds the integer values from RA5 and RB6, and stores the result in RB7.



Syntax
Execution
'34020 Instruction Words

Instruction to '34082

Operands

Description

Instruction Type
Example

ADDAF $R s_{1}$, Rs $_{2}$, CRs $_{1}$, CRs $_{2}$, CRd
$\mathrm{Rs}_{1} \rightarrow \mathrm{CRs}_{1}$
$\mathrm{Rs}_{2} \rightarrow \mathrm{CRs}_{2}$
$\left|\mathrm{CRs}_{1}+\mathrm{CRs}_{2}\right| \rightarrow \mathrm{CRd}$


$\mathrm{Rs}_{1} \quad$ TMS34020 source register for first 32-bit single-precision floatingpoint value to TMS34082
$\mathrm{Rs}_{2}$ TMS34020 source register for second 32-bit single-precision floating-point value to TMS34082
$\mathrm{CRs}_{1}$ TMS34082 register to contain the first single-precision operand
$\mathrm{CRs}_{2}$ TMS34082 register to contain the second single-precision operand
CRd TMS34082 destination register
ADDAF loads the contents or $\mathrm{Rs}_{1}$ and $\mathrm{Rs}_{2}$ into $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$ respectively, takes the absolute value of the sum of $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, and stores the result in CRd.
$\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, the two TMS34082 source registers, must be in opposite register files.

The integer and double-precision floating-point forms of this instruction are not supported.

CMOVGC, two registers
ADDAF A5, A9, RA7, RB9, RB0
This example loads the contents of TMS34020 registers A5 and A9 into TMS34082 registers RA7 and RB9 respectively, adds the contents of RA7 and RB9, takes the absolute value, and stores the result in RB0.
Syntax
Execution

'34020
Instruction Words

## Operands

## Description

Instruction Type

## Example



| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 |  | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | transfers |  |
| 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | size | 0 | 0 | R |  |  |  |
| ID |  |  | CRs 1 |  |  |  | CRs2 |  |  |  | CRd |  |  |  |


| 31 | 2524 |  | 2120 | 1615 |  |  | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs ${ }_{1}$ | $\mathrm{CRs}_{2}$ | CRd | 1000 | 1001 | s 000 | 0000 |

Rs TMS34020 register containing the memory address

CRs $_{1}$ TMS34082 register to contain the first operand
$\mathrm{CRs}_{2}$ TMS34082 register to contain the second operand
CRd TMS34082 destination register
ADDAx loads the contents of memory pointed to by $R s$ into $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, adds the contents of $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, takes the absolute value, and stores the result in CRd. After each load from memory, Rs is incremented by 32.
$\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, the two TMS34082 source registers, must be in opposite register files.

The integer form of this operation is not supported.
CMOVMC, postincrement, constant count
ADDAD *A5+, RA7, RB9, RB0
This example loads memory starting at the address given by TMS34020 register A5 into TMS34082 registers RA7 and RB9, adds the double-precision contents of RA7 and RB9, takes the absolute value, and stores the result in RBO.



Syntax
Execution

'34020
Instruction Words

Instruction to '34082

## Operands

## Description

Instruction Type
CMOVMC, postincrement, constant count
Example
ASUBAD *A3+, RA7, RB3, RB1
This example loads memory starting at the address given by TMS34020 register A3 into TMS34082 registers RA7 and RB3, subtracts the absolute value of the contents of RB3 from the absolute value of RA7, and stores the result in RB1.

Syntax
'34020
Instruction Words

Instruction to '34082

## Description

Implied Operands

| Type | Syntax |
| :--- | :--- |
| Integer | BACKF |
| Double-Precision | BACKFD |
| Single-Precision | BACKFF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


| 31 | 29 |  |  |  | 0 |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 10 | 0 | 0000 | 0000 | 0000 | 0010 | $010 t$ | $s 000$ | 0000 |

A convex polygon is tested to determine whether it is facing the current view area or if it is facing away from the current view area. This allows the elimination of polygons that do not need to be drawn in the current image. The first three vertices of the polygon are entered and tested as to rotation direction. If the direction is clockwise (forward facing), the polygon is visible; if the direction is counterclockwise (backward facing), then the polygon is invisible. This instruction also detects the case where the plane defined by the three points passes through the viewing point (position) of the eye. In this case, the polygon may be drawn as a line or ignored. The algorithm assumes that all of the vertices of the polygon lie on the plane defined by the first three vertices.
$R A O=X 0$,
$R A 1=Y 0$,
$R A 2=Z 0$,
$\mathrm{RA} 3=\mathrm{W} 0$
RA4 $=\mathrm{X} 1$,
$R A 5=\mathrm{Y} 1$,
RA6 $=\mathrm{Z} 1$,
$\mathrm{RA} 7=\mathrm{W} 1$
$R B 0=X 2$,
$R B 1=Y 2$,
RB2 $=\mathrm{Z} 2$,
$\mathrm{RB} 3=\mathrm{W} 2$
where $\mathrm{Xn}, \mathrm{Yn}, \mathrm{Zn}, \mathrm{Wn}$ are the coordinates of vertex Vn , already stored in the coprocessor registers.

| Algorithm | $C=R B 1 \times R A 3$ | ; Y $2 \times W 0$ |
| :---: | :---: | :---: |
|  | $\mathrm{C}=\mathrm{C}-(\mathrm{RA1} \times \mathrm{RB} 3)$ | ; $\mathrm{Y} 2 \times \mathrm{W} 0)-(\mathrm{Y} 0 \times \mathrm{W} 2)$ |
|  | $\mathrm{RB8}=\mathrm{C} \times$ RA4 | $;((Y 2 \times W 0)-(Y 0 \times W 2)) \times \mathrm{X} 1$ |
|  | $\mathrm{C}=\mathrm{RA} 0 \times \mathrm{RB} 3$ | ; X0×W2 |
|  | $\mathrm{C}=\mathrm{C}-(\mathrm{RB} 0 \times \mathrm{RA} 3)$ | ; (X0 $\times$ W2) - (X2-W0) |
|  | $\mathrm{RB} 9=\mathrm{C} \times$ RA5 | ; ((X0 $\times W 2$ ) - (X2 $\times$ W0) $) \times \mathrm{Y} 1$ |
|  | $\mathrm{C}=\mathrm{RB} 0 \times \mathrm{RA} 1$ | ; $\mathrm{X} 2 \times \mathrm{Y} 0$ |
|  | $C=C-(R A 0 \times R B 1)$ | ; $(\mathrm{X} 2 \times \mathrm{Y} 0)-(\mathrm{Y} 2 \times \mathrm{Y} 0)$ |
|  | $\mathrm{RAB}=\mathrm{C} \times \mathrm{RA} 7$ | $;((\mathrm{X} 2 \times \mathrm{Y} 0)-(\mathrm{Y} 2 \times \mathrm{X} 0)) \times \mathrm{W} 1$ |
|  | $R A 8=R A 8+R B 9$ | $;((\mathrm{X} 2 \times \mathrm{Y} 0)-(\mathrm{Y} 2 \times \mathrm{X} 0)) \times \mathrm{W} 1$ |
|  |  | $;+((X 0 \times W 2)-(X 2 \times W 0)) \times Y 1$ |
|  | $R A 8=R A 8+R B 8$ | $;((\mathrm{Y} 2 \times \mathrm{W} 0)-(\mathrm{Y} 0 \times \mathrm{W} 2)) \times \mathrm{X} 1$ |
|  |  | $;+((X 0 \times W 2)-(X 2 \times W 0)) \times Y 1$ |
|  |  | ; + ((X2 2 Y0) $-(\mathrm{Y} 2 \times \mathrm{X} 0)) \times \mathrm{W} 1$ |
|  | $\begin{aligned} & \text { if RAB }<0 \text { then } N=1 \\ & \text { else } N=0 \end{aligned}$ | ; set N as appropriate |
|  | $\text { if RA8 }=0 \text { then } Z=1$ | ; set Z as appropriate |
|  | else $Z=0$ |  |


| Temporary Storage | C, CT, RA8, RB8, RB9 |
| :---: | :---: |
| Outputs | The N and V status bits are set to indicate the following: |
|  | N Z Description |
|  | 00 Polygon is forward facing |
|  | 01 Polygon is parallel to view (reject or draw as line) |
|  | 10 Polygon is backward facing |
|  | 11 Polygon is backward facing |
| Instruction Type | CEXEC, short |

Syntax
Execution
'34020
Instruction Words

## CHECK Rd

If coprocessor is busy
FFFF FFFFh $\rightarrow$ Rd
If coprocessor is idle
$00000000 \mathrm{~h} \rightarrow \mathrm{Rd}$

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 1 |  |  | Rd | 0 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $R$ |  |  |  |  |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| ID |  |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |

Instruction to '34082

## Operands

Description

| 31 | 29 |  |  |  | 0 |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 10 | 0 | 0001 | 1010 | 0000 | 0001 | 1110 | 0000 | 0000 |

Rd TMS34020 destination register for status information
CHECK checks the status of the coprocessor. If the TMS34082 coprocessor is busy, CHECK sets all the bits in Rd to 1. If the TMS34082 coprocessor is idle, CHECK sets all the bits in Rd to 0 .

This instruction allows polling of the TMS34082 prior to sending subsequent instructions to avoid halting the TMS34020 if the FPU is not ready to accept new commands. This polling may be required for user-defined instruction sequences that utilize the external program and data memory of the TMS34082.

Instruction Type
Example

CMOVGC, one register
CHECK A4
If the TMS34082 coprocessor is busy, this example sets all the bits in register A4 to 1. If the TMs34082 coprocessor is idle, this example resets all the bits in register A4 to 0 .


|  | You may now reload vertex $V(n+1)$ and repeat the instruction for all vertices in a polygon. |
| :---: | :---: |
| Temporary Storage | C, RB9 |
| Outputs | The status is set (ZGT, ZLT, YGT, YLT, XGT, and XLT) according to position. <br> $V=1 \quad$ Vertex out <br> $Z=1 \quad$ Vertex in |
|  | If repeated for all vertices in a polygon then: |
|  | $\underline{\mathrm{V}}$ - Description |
|  | 00 The polygon crosses the boundary of the clipping volume |
|  | 01 The polygon is totally inside the clipping volume |
|  | 10 The polygon is totally outside the clipping volume |
|  | 11 Not valid |
|  | The boundaries of the clipping volume that are crossed by the polygon may be determined by the ZLT (Z-plane), YLT (Y-plane), and XLT (X-plane) bits. |
| Instruction Type | CEXEC, short |
| Example | CKVTXI |
|  | MOVF *A5+, RA0, 4 |
|  | CKVTXF |
|  | This example first initializes the TMS34082 by executing the check vertex initialize instruction. Then the four homogeneous coordinates of the vertex are loaded, starting at the address given in TMS34020 register A5. Finally the status register is set according to the results of the check. |

## Syntax

'34020

## CKVTXI

\left.| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| ID |  |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |$\right) 0$.

Instruction to '34082

| 31 | 29 |  |  |  | 0 |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | ---: |
| 10 | 0 | 0001 | 1000 | 0000 | 0001 | 1100 | 0000 | 0000 |

The CKVTXI instruction is used to initialize several bits in the status register before the first Check Vertex (CKVTX) instruction.

## Algorithm

Instruction Type
reset XLT
reset YLT
reset ZLT
set XGT
set YGT
set ZGT
;set starting $X$ OR outcode to 0 ;set starting Y OR outcode to 0 ;set starting Z OR outcode to 0 ;set starting $X$ AND outcode to 1 ;set starting Y AND outcode to 1 ;set starting Z AND outcode to 1

CEXEC, short

## Syntax

'34020
Instruction Words

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | type | size |  |
| ID |  |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Instruction to '34082

## Description

Implied Operands

Algorithm
$C=C \times R B 3$
; $\mathrm{t} \times \mathrm{W} 2$
RB9 $=$ RA3
$R A 8=$ RB4 - RA4
; R2-R1
$C=C / R B 9$
; $\mathrm{t}^{\prime}=\mathrm{t} \times \mathrm{W} 2 / \mathrm{W} 1$ '
RA9 $=$ RB5 - RA5
; B2-B1
$C T=R A 8 \times C$
; (R2-R1) $\times \mathrm{t}^{\prime}$
$\mathrm{RA} 4=\mathrm{CT}+\mathrm{RA} 4 \quad ; \mathrm{R} 1{ }^{\prime}=\mathrm{R} 1+(\mathrm{R} 2-\mathrm{R} 1) \times \mathrm{t}^{\prime}$
$\mathrm{CT}=\mathrm{RA} 9 \times \mathrm{C}$
; $(\mathrm{B} 2-\mathrm{B} 1) \times \mathrm{t}^{\prime}$
$R A 5=C T+R A 5 \quad ; B 1 '=B 1+(B 2-B 1) \times t^{\prime}$
$R A 8=R B 6-R A 6$
; G2 - G1
$C T=R A 8 \times C$
; (G2-G1) $\times \mathrm{t}^{\prime}$
$R A 6=R A 6+C T \quad ; G 1 '=G 1+(G 2-G 1) \times t^{\prime}$
Temporary Storage CT, RA8, RA9
Outputs
RA4 $=$ R1' (red)
RA5 $=$ B1' (blue)
RA6 $=$ G1' (green)
$\mathrm{CT}=\mathrm{t}^{\prime}$
Instruction Type
CEXEC, short


## Syntax

'34020 Instruction Words

Instruction to '34082

## Description

Implied Operands

| Type | Syntax |
| :--- | :--- |
| Integer | CLIPFX |
| Double-Precision | CLIPFXD |
| Single-Precision | CLIPFXF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | type | size |
| ID |  |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


| 31 | 29 |  |  | 0 |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0 | 0000 | 0000 | 0000 | 0010 | $101 t$ | 5000 | 0000 |

The CLIPFXx Instruction clips a line to the viewing volume when its first endpoint is outside the clipping (viewable) volume. Use CLIPFXx only if the $X$ coordinate of the first endpoint of a line is outside of the viewing volume. It also provides an interpolation factor that is used by the CLIPCx instruction when performing Gouraud shading. The endpoints are described by the homogeneous coordinates $\mathrm{P} 1=[\mathrm{X} 1, \mathrm{Y} 1, \mathrm{Z} 1, \mathrm{~W} 1]$ and $\mathrm{P} 2=[\mathrm{X} 2, \mathrm{Y} 2, \mathrm{Z} 2, \mathrm{~W} 2]$.

| RA0 $=\mathrm{X} 1$ | RB0 $=\mathrm{X} 2$ |
| :--- | :--- |
| RA1 $=\mathrm{Y} 1$ | RB1 $=\mathrm{Y} 2$ |
| RA2 $=\mathrm{Z1}$ | RB2 $=\mathrm{Z2}$ |
| RA3 $=W 1$ | RB3 $=\mathrm{W} 2$ |

## Algorithm

$C=R A O$
$C T=R B O$
If RA0 $<0$ then set $(\mathrm{N}=1)$
If $\mathrm{N}=1$ then

$$
\begin{array}{ll}
R B 8=R B 3+C T & ; b=W 2+X 2 \\
R A 8=R A 3+C & ; a=W 1+X 1
\end{array}
$$

else
RB8 $=$ RB3 - CT $\quad ; b=W 2-X 2$
$R A 8=R A 3-C \quad ; a=W 1-X 1$
$R B 9=R B 0-$ RA0 $\quad ; \mathrm{X} 2-\mathrm{X} 1$
RB8 $=$ RA8 - RB8 $\quad ; a-b$
$R A 9=R B 1-R A 1 \quad ; Y 2-Y 1$
$C=R A 8 / R B 8 \quad ; t=a /(a-b)$
$\mathrm{RA} 8=\mathrm{RB} 2-\mathrm{RA} 2 \quad ; \mathrm{Z2}-\mathrm{Z1}$
$C T=R B 9 \times C$
$R A 0=C T+R A 0$
; $(\mathrm{X} 2-\mathrm{X} 1) \times \mathrm{t}$
$C T=R A 9 \times C \quad ;(Y 2-Y 1) \times t$
$R A 1=C T+R A 1 \quad ; Y 1 \prime=Y 1+(Y 2-Y 1) \times t$
$\mathrm{RA} 9=\mathrm{RB} 3-\mathrm{RA} 3 \quad ; \mathrm{W} 2-\mathrm{W} 1$
$\mathrm{CT}=\mathrm{RA} 8 \times \mathrm{C} \quad ;(\mathrm{Z2}-\mathrm{Z} 1) \times \mathrm{t}$
$\mathrm{RA} 2=\mathrm{CT}+\mathrm{RA} 2 \quad ; \mathrm{Z} 1^{\prime}=\mathrm{Z1}+(\mathrm{Z2}-\mathrm{Z} 1) \times \mathrm{t}$
$C T=R A 9 \times C \quad ;(W 2-W 1) \times t$
$R A 3=C T+R A 3 \quad ; W 1^{\prime}=W 1+(W 2-W 1) \times t$

| Temporary Storage | CT, RA8, RA9, RB8, RB9 |
| :--- | :--- |
| Outputs | RA0 $=\mathrm{X} 1$ |
|  | RA1 $=\mathrm{Y} 1$ |
|  | $\mathrm{RA} 2=\mathrm{Z} 1$ |
|  | $\mathrm{RA} 3=\mathrm{W} 1$ |
|  | $\mathrm{C}=\mathrm{t}$ |
|  | Instruction Type |
|  | CEXEC, short |



| Temporary Storage | CT, RA8,RA9, RB8, RB9 |
| :--- | :--- |
| Outputs | RA0 $=\mathrm{X} 1$ |
|  | RA1 $=\mathrm{Y} 1$ |
|  | $\mathrm{RA} 2=\mathrm{Z} 1$ |
|  | $\mathrm{RA} 3=\mathrm{W} 1$ |
|  | $\mathrm{C}=\mathrm{t}$ |
|  |  |
|  | Instruction Type |
|  | CEXEC, short |


| Syntax | Type | Syntax |
| :--- | :--- | :--- |
|  | Integer | CLIPFZ |
|  | Double-Precision | CLIPFZ |
|  | Single-Precision | CLIPFZF |

Instruction to '34082

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |

'34020
Instruction Words

| 31 | 29 |  |  |  | 0 |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0 | 0000 | 0000 | 0010 | 0010 | $101 t$ | 5000 | 0000 |

The CLIPFZx Instruction clips a line to the viewing volume when its first endpoint is outside the clipping (viewable) volume. Use CLIPFZx only if the $Z$ coordinate of the first endpoint of a line is outside of the viewing volume. It also provides an interpolation factor that is used by the CLIPCx instruction when performing Gouraud shading. The endpoints are described by the homogeneous coordinates $\mathrm{P} 1=[\mathrm{X} 1, \mathrm{Y} 1, \mathrm{Z} 1, \mathrm{~W} 1]$ and $\mathrm{P} 2=[\mathrm{X} 2, \mathrm{Y} 2, \mathrm{Z} 2, \mathrm{~W} 2]$.

Implied Operands
$\begin{array}{ll}\text { RA0 }=X 1 & \text { RB0 }=X 2 \\ \text { RA1 }=Y 1 & \text { RB1 }=Y 2 \\ \text { RA2 }=Z 1 & \text { RB2 }=Z 2 \\ \text { RA3 }=W 1 & \text { RB3 }=W 2\end{array}$

## Algorithm

$\mathrm{C}=\mathrm{RA} 2$
$\mathrm{CT}=\mathrm{RB} 2$
If RA2 $<0$ then set ( $\mathrm{N}=1$ )
If $\mathrm{N}=1$ then

$$
\begin{array}{ll}
\text { RB8 }=\text { RB3 }+C T & ; b=W 2+Z 2 \\
\text { RA }=\text { RA } 3+C & ; a=W 1+Z 1
\end{array}
$$

else
$\mathrm{RB} 8=\mathrm{RB} 3-\mathrm{CT} \quad ; \mathrm{b}=\mathrm{W} 2-\mathrm{Z} 2$
$\mathrm{RA} 8=\mathrm{RA} 3-\mathrm{C} \quad ; \mathrm{a}=\mathrm{W} 1-\mathrm{Z} 1$
RB9 $=$ RB0 - RA $0 \quad ; \mathrm{X} 2-\mathrm{X} 1$
$R B 8=R A 8-R B 8 \quad ; a-b$
$\mathrm{RA} 9=\mathrm{RB} 1-\mathrm{RA} 1 \quad ; \mathrm{Y} 2-\mathrm{Y} 1$
$\mathrm{C}=$ RA8 /RB8 $\quad ; \mathrm{t}=\mathrm{a} /(\mathrm{a}-\mathrm{b})$
RA8 $=$ RB2 - RA2 $;$ Z2 $-\mathrm{Z1}$
$\mathrm{CT}=\mathrm{RB} 9 \times \mathrm{C} \quad ;(\mathrm{X} 2-\mathrm{X} 1) \times \mathrm{t}$
$\mathrm{RA} 0=\mathrm{CT}+\mathrm{RA} 0 \quad ; \mathrm{X} 1=\mathrm{X} 1+(\mathrm{X} 2-\mathrm{X} 1) \times \mathrm{t}$
$\mathrm{CT}=\mathrm{RA} 9 \times \mathrm{C} \quad ;(\mathrm{Y} 2-\mathrm{Y} 1) \times \mathrm{t}$
$\mathrm{RA} 1=\mathrm{CT}+\mathrm{RA} 1 \quad ; \mathrm{Y} 1{ }^{\prime}=\mathrm{Y} 1+(\mathrm{Y} 2-\mathrm{Y} 1) \times \mathrm{t}$
RA9 $=$ RB3 - RA $3 \quad$; W2 - W 1
CT $=$ RA $8 \times \mathrm{C} \quad ;(\mathrm{Z2}-\mathrm{Z} 1) \times \mathrm{t}$
$\mathrm{RA} 2=\mathrm{CT}+\mathrm{RA} 2 \quad ; \mathrm{Z1}=\mathrm{Z1}+(\mathrm{Z2}-\mathrm{Z} 1) \times \mathrm{t}$
CT $=$ RA $9 \times \mathrm{C} \quad ;(\mathrm{W} 2-\mathrm{W} 1) \times \mathrm{t}$
$R A 3=C T+R A 3 \quad ; W 1=W 1+(W 2-W 1) \times t$

Temiporary Storage CT, RA8, RA9, RB8, RB9
Outputs

$$
\begin{aligned}
& \mathrm{RA} 0=\mathrm{X} 1^{\prime} \\
& \mathrm{RA} 1=\mathrm{Y} 1 \\
& \mathrm{RA} 2=\mathrm{Z} 1^{\prime} \\
& \mathrm{RA} 3=\mathrm{W} 1^{\prime} \\
& \mathrm{C}=\mathrm{t}
\end{aligned}
$$

Instruction Type CEXEC, short

## Syntax

'34020
Instruction Words

| Type | Syntax |
| :--- | :--- |
| Integer | CLIPRX |
| Double-Precision | CLIPRXD |
| Single-Precision | CLIPRXF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Instruction to '34082

| 31 | 29 |  |  | 0 |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0 | 0000 | 0000 | 0010 | 0010 | $110 t$ | 5000 | 0000 |

## Description

Implied Operands

Algorithm
$C T=R B O$
$C=R A O$
If RB0 $<0$ then set $(\mathrm{N}=1)$
If $N=1$ then

$$
R B 8=R A 3+C \quad ; b=W 1-X 1
$$

$R A 8=R B 3+C T$

$$
; a=W 2-X 2
$$

else

$$
\text { RB8 }=\text { RA3 }-C \quad ; b=W 1+X 1
$$

$$
\mathrm{RA} 8=\mathrm{RB} 3-\mathrm{CT}
$$

$$
; a=W 2+X 2
$$

$R B 9=R A 0-R B 0$
$R B 8=R A 8-R B 8$
; X1 - X2
$R A 9=R A 1-R B 1$
$C=R A 8 / R B 8$
$R A 8=R A 2-R B 2$
; $\mathrm{a}-\mathrm{b}$
$C T=R B 9 \times C$
$R A O=C T+R B O$
; Y1 - Y2
; $t=a /(a-b)$
; Z1 - Z2
$C T=R A 9 \times C$
; $(\mathrm{X} 1-\mathrm{X} 2) \times t$
; $\mathrm{X} 2^{\prime}=\mathrm{X} 2+(\mathrm{X} 1-\mathrm{X} 2) \times t$
$R A 1=C T+R B 1$
; $(\mathrm{Y} 1-\mathrm{Y} 2) \times \mathrm{t}$
$R A 9=R A 3-R B 3$
$; \mathrm{Y} 2^{\prime}=\mathrm{Y} 2+(\mathrm{Y} 1-\mathrm{Y} 2) \times \mathrm{t}$
$C T=R A 8 \times C$
$\mathrm{RA} 2=\mathrm{CT}+\mathrm{RB} 2$
; W1 -W2
$C T=R A 9 \times C$
$R A 3=C T+R B 3$
; $(\mathrm{Z} 1-\mathrm{Z} 2) \times \mathrm{t}$
; Z2' $=\mathbf{Z 2}+(\mathrm{Z} 1-\mathrm{Z} 2) \times t$
; (W1 - W2) $\times t$
; $W 2$ ' $=W 2+(W 1-W 2) \times t$

Temporary Storage CT, RA8, RA9, RB8, RB9
Outputs
This writes [ $\left.\mathrm{X} 2^{\prime}, \mathrm{Y} 2^{\prime}, \mathrm{Z2}{ }^{\prime}, \mathrm{W} 2^{\prime}\right]$ over [ $\mathrm{X} 1, \mathrm{Y} 1, \mathrm{Z} 1, \mathrm{~W} 1$ ].
RAO $=X 2{ }^{\prime}$
$\mathrm{RA} 1=\mathrm{Y} 2^{\prime}$
RA3 $=Z 2^{\prime}$
RA4 $=$ W2'
$\mathrm{C}=\mathrm{t}$
Instruction Type CEXEC, short
Syntax
'34020
Instruction Words

| Type | Syntax |
| :--- | :--- |
| Integer | CLIPRY |
| Double-Precision | CLIPRYD |
| Single-Precision | CLIPRYF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | type | size |
| ID |  |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Instruction to '34082

| 31 | 29 |  |  |  | 0 |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 10 | 0 | 0000 | 0000 | 0001 | 0010 | $110 t$ | 5000 | 0000 |

## Description

Implied Operands
The CLIPRYx Instruction clips a line to the viewing volume when its second endpoint is outside the clipping (viewable) volume. Use CLIPRYx only if the $Y$ coordinate of the second endpoint of a line is outside of the viewing volume. It also provides an interpolation factor that is used by the CLIPCRx instruction when performing Gouraud shading. The endpoints are described by homogeneous coordinates $\mathrm{P} 1=[\mathrm{X} 1, \mathrm{Y} 1, \mathrm{Z} 1, \mathrm{~W} 1]$ and $\mathrm{P} 2=[\mathrm{X} 2, \mathrm{Y} 2, \mathrm{Z} 2, \mathrm{~W} 2]$.

$$
\begin{array}{ll}
\mathrm{RA} 0=\mathrm{X} 1 & \mathrm{RB}=\mathrm{X} 2 \\
\mathrm{RA} 1=\mathrm{Y} 1 & \text { RB1 }=\mathrm{Y} 2 \\
\mathrm{RA} 2=\mathrm{Z} 1 & \text { RB2 }=\mathrm{Z2} \\
\mathrm{RA} 3=\mathrm{W} 1 & \text { RB3 }=\mathrm{W} 2
\end{array}
$$

## Algorithm

$\mathrm{CT}=\mathrm{RB} 1$
$C=$ RA1
If RB1 < 0 then set $(N=1)$
If $\mathrm{N}=1$ then

$$
\begin{array}{ll}
\text { RB8 }=\text { RA3 }+C & ; b=W 1-Y 1 \\
\text { RA } 8=\text { RB3 }+C T & ; a=W 2-Y 2
\end{array}
$$

else

| RB8 $=$ RA3 -C | ; $\mathrm{b}=\mathrm{W} 1+\mathrm{Y} 1$ |
| :---: | :---: |
| RA8 $=$ RB3 - CT | ; $\mathrm{a}=\mathrm{W} 2+\mathrm{Y} 2$ |
| RB9 $=$ RAO-RB0 | ; X1-X2 |
| RB8 $=$ RA8 - RB8 | ;a-b |
| $\mathrm{RA} 9=\mathrm{RA} 1-\mathrm{RB} 1$ | ; Y 1 - Y2 |
| $\mathrm{C}=$ RA8 /RB8 | ; $t=a /(a-b)$ |
| $\mathrm{RA} 8=\mathrm{RA} 2-\mathrm{RB} 2$ | ; Z 1 - Z2 |
| $\mathrm{CT}=\mathrm{RB} 9 \times \mathrm{C}$ | ; $(\mathrm{X} 1-\mathrm{X} 2) \times \mathrm{t}$ |
| $\mathrm{RAO}=\mathrm{CT}+\mathrm{RBO}$ | ; $\mathrm{X} 2^{\prime}=\mathrm{X} 2+(\mathrm{X} 1-\mathrm{X} 2) \times \mathrm{t}$ |
| $C T=R A 9 \times C$ | ; Y 1 - Y2) $\times \mathrm{t}$ |
| $\mathrm{RA} 1=\mathrm{CT}+\mathrm{RB} 1$ | ; $\mathrm{Y} 2^{\prime}=\mathrm{Y} 2+(\mathrm{Y} 1-\mathrm{Y} 2) \times \mathrm{t}$ |
| $\mathrm{RA} 9=\mathrm{RA} 3-\mathrm{RB} 3$ | ; W1-W2 |
| $C T=R A 8 \times C$ | ; $(\mathrm{Z} 1-\mathrm{Z} 2) \times \mathrm{t}$ |
| $\mathrm{RA} 2=\mathrm{CT}+\mathrm{RB} 2$ | ; $\mathrm{Z2}$ ' $=\mathrm{Z2}+(\mathrm{Z} 1-\mathrm{Z} 2) \times \mathrm{t}$ |
| CT $=$ RA9 $\times$ C | ; (W1-W2) $\times$ t |
| $R A 3=C T+R B 3$ | ; W2' $=\mathrm{W} 2+(\mathrm{W} 1-\mathrm{W} 2) \times$ |


| Temporary Storage | CT, RA8, RA9, RB8, RB9 |
| :--- | :--- |
| Outputs | This writes $\left[\mathrm{X} 2^{\prime}, \mathrm{Y} 2^{\prime}, \mathrm{Z2} 2^{\prime}, \mathrm{W} 2^{\prime}\right]$ over $[\mathrm{X} 1, \mathrm{Y} 1, \mathrm{Z} 1, \mathrm{~W} 1]$. |
|  | RA0 $=\mathrm{X} 2^{\prime}$ |
|  | $\mathrm{RA1}=\mathrm{Y} 2^{\prime}$ |
|  | $\mathrm{RA3}=\mathrm{Z2}$ |
|  | $\mathrm{RA4}=\mathrm{W} 2^{\prime}$ |
|  | $\mathrm{C}=\mathrm{t}$ |
|  | Instruction Type |
|  | CEXEC, short |

Syntax
'34020
Instruction Words

| Type | Syntax |
| :--- | :--- |
| Integer | CLIPRZ |
| Double-Precision | CLIPRZD |
| Single-Precision | CLIPRZF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | type | size |  |
| ID |  |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |

Instruction to '34082

| 31 | 29 |  | 0 |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | ---: |
| ID | 0 | 0000 | 0000 | 0010 | 0010 | $110 t$ | $s 000$ | 0000 |

## Description

Implied Operands

Algorithm
The CLIPRZx Instruction clips a line to the viewing volume when its second endpoint is outside the clipping (viewable) volume. Use CLIPRZx only if the $Z$ coordinate of the second endpoint of a line is outside of the viewing volume. It also provides an interpolation factor that is used by the CLIPCRx instruction when performing Gouraud shading. The endpoints are described by the homogeneous coordinates $\mathrm{P} 1=[\mathrm{X} 1, \mathrm{Y} 1, \mathrm{Z} 1, \mathrm{~W} 1]$ and $\mathrm{P} 2=[\mathrm{X} 2, \mathrm{Y} 2, \mathrm{Z} 2, \mathrm{~W} 2]$.

$$
\begin{array}{ll}
R A 0=X 1 & \text { RB0 }=\mathrm{X} 2 \\
\mathrm{RA} 1=\mathrm{Y} 1 & \text { RB1 }=\mathrm{Y} 2 \\
\mathrm{RA} 2=\mathrm{Z} 1 & \text { RB2 }=\mathrm{Z2} \\
\mathrm{RA} 3=\mathrm{W} 1 & \text { RB3 }=\mathrm{W} 2
\end{array}
$$

$\mathrm{CT}=\mathrm{RB} 2$
$\mathrm{C}=\mathrm{RA} 2$
If RA2 $<0$ then set ( $\mathrm{N}=1$ )
If $\mathrm{N}=1$ then

| RB8 $=$ RA3 $+C$ | $; b=W 1-Z 1$ |
| :--- | :--- |
| RA8 $=$ RB3 $+C T$ | $; a=W 2-Z 2$ |

else
$\mathrm{RB} 8=\mathrm{RA} 3-\mathrm{C} \quad ; \mathrm{b}=\mathrm{W} 1+\mathrm{Z} 1$
RA8 $=$ RB3 - CT $\quad ; a=W 2+Z 2$
RB9 $=$ RA $0-$ RB $0 ;$ X1 -X 2
$R B 8=$ RA8 - RB8 $\quad ; a-b$
$\mathrm{RA} 9=\mathrm{RA} 1-\mathrm{RB} 1 \quad ; \mathrm{Y} 1-\mathrm{Y} 2$
$\mathrm{C}=$ RA8 /RB8 $\quad ; \mathrm{t}=\mathrm{a} /(\mathrm{a}-\mathrm{b})$
$\mathrm{RA} 8=\mathrm{RA} 2-\mathrm{RB} 2 \quad ; \mathrm{Z1}-\mathrm{Z2}$
$\mathrm{CT}=\mathrm{RB} 9 \times \mathrm{C} \quad ;(\mathrm{X} 1-\mathrm{X} 2) \times \mathrm{t}$
$\mathrm{RAO}=\mathrm{CT}+\mathrm{RB} 0 \quad ; \mathrm{X} 2^{\prime}=\mathrm{X} 2+(\mathrm{X} 1-\mathrm{X} 2) \times \mathrm{t}$
$C T=R A 9 \times C \quad ;(Y 1-Y 2) \times t$
$\mathrm{RA} 1=\mathrm{CT}+\mathrm{RB} 1 \quad ; \mathrm{Y} 2^{\prime}=\mathrm{Y} 2+(\mathrm{Y} 1-\mathrm{Y} 2) \times \mathrm{t}$
RA9 $=$ RA $3-$ RB3 $;$ W $1-$ W2
$\mathrm{CT}=\mathrm{RA} 8 \times \mathrm{C} \quad ;(\mathrm{Z} 1-\mathrm{Z} 2) \times \mathrm{t}$
$\mathrm{RA} 2=\mathrm{CT}+\mathrm{RB} 2 \quad ; \mathrm{Z2}=\mathrm{Z2}+(\mathrm{Z} 1-\mathrm{Z2}) \times \mathrm{t}$
CT $=$ RA9 $\times \mathrm{C} \quad ;(\mathrm{W} 1-\mathrm{W} 2) \times t$
$R A 3=C T+R B 3 \quad ; W 2=W 2+(W 1-W 2) \times t$

| Temporary Storage | CT, RA8, RA9, RB8, RB9 |
| :--- | :--- |
| Outputs | This writes $\left[\mathrm{X} 2^{\prime}, \mathrm{Y}^{\prime}, \mathrm{Z2}^{\prime}, \mathrm{W} 2^{\prime}\right]$ over $[\mathrm{X} 1, \mathrm{Y} 1, \mathrm{Z1}, \mathrm{~W} 1]$. |
|  | RA0 $=\mathrm{X} 2^{\prime}$ |
|  | RA1 $=\mathrm{Y} 2^{\prime}$ |
|  | RA3 $=\mathrm{Z2} 2^{\prime}$ |
|  | RA4 $=\mathrm{W} 2^{\prime}$ |
|  | $\mathrm{C}=\mathrm{t}$ |

Instruction Type
CEXEC, short





| Syntax | Type | Syntax |
| :---: | :---: | :---: |
|  | Integer | CMP - *Rs, $\mathrm{CRs}_{1}, \mathrm{CRs}_{2}$ |
|  | Double-Precision | CMPD - *Rs, $\mathrm{CRs}_{1}, \mathrm{CRs}_{2}$ |
|  | Single-Precision | CMPF $-* R s, \mathrm{CRs}_{1}, \mathrm{CRs}_{2}$ |
| Execution | Rs - $32 \rightarrow$ Rs |  |
|  | $\star \mathrm{Rs} \rightarrow \mathrm{CRs}_{1}$ |  |
|  |  |  |
|  |  |  |
|  | Rs $-32 \rightarrow$ Rs |  |
|  | $* \mathrm{Rs} \rightarrow \mathrm{CRs}_{2}$ |  |
|  |  |  |
|  |  |  |
|  | Flags ( $\mathrm{CRs}_{1}-\mathrm{CR}$ | TMS34082 Status Register |

## '34020 <br> Instruction Words

Instruction to '34082

## Operands

## Description

Instruction Type

## Example

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | transfers |  |  |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | t | 5 | 0 | 0 | R | Rs |  |  |  |
| 10 |  |  | CRs 1 |  |  |  | CRs2 |  |  |  | 0 | 0 | 0 | 0 | 0 |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | CRs 1 | CRs 2 | 00000 | 1000 | $010 t$ | $s 000$ | 0000 |  |

Rs TMS34020 register containing the memory address
$\mathrm{CRs}_{1} \quad$ TMS34082 register to contain the first operand. Must be from RA register file.
$\mathrm{CRs}_{2}$ TMS34082 register to contain the second operand. Must be from RB register file.

CMPx loads the contents of memory pointed to by Rs into $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, subtracts $\mathrm{CRs}_{2}$ from $\mathrm{CRs}_{1}$, and sets the appropriate status bits in the TMS34082 status register. Before each load from memory, Rs is decremented by 32 .

CMOVMC, predecrement, constant count
CMP -*A5, RA5, RB6
This example loads the integer contents of memory starting at the address given by TMS34020 register A5 minus 32 into TMS34082 registers RA5 and RB6, subtracts the integer contents of RB6 from RA5, and sets the status bits in the TMS34082 status register.


Syntax
'34020
Instruction Words

Instruction to '34082

| 31 | 29 |  |  | 0 |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $1 D$ | 0 | 0000 | 0000 | 0000 | 0010 | $001 t$ | 5000 | 0000 |

## Description

Implied Operands

Algorithm

Temporary Storage
Outputs
Status Bits

| Type | Syntax |
| :--- | :--- |
| Integer | CPV |
| Double-Precision | CPVD |
| Single-Precision | CPVF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

A point $[\mathrm{Xn}, \mathrm{Yn}, \mathrm{Zn}]$ is compared to the volume defined by $X \mathrm{~min}, \mathrm{Ymin}, \mathrm{Zmin}$ and Xmax, Ymax, Zmax . Six comparison bits within the status register are set according to the comparison. The TMS34020 may read the status and perform a 64-way branch based on the six comparison bits.

$$
\begin{array}{lll}
\mathrm{RA} 0=\mathrm{X} \min & \mathrm{RA} 3=\mathrm{X} \max & \mathrm{RB}=\mathrm{Xn} \\
\mathrm{RA} 1=\mathrm{Y} \min & \mathrm{RA} 4=\mathrm{Y} \max & \mathrm{RB} 1=\mathrm{Yn} \\
\mathrm{RA} 2=\mathrm{Z} \min & \mathrm{RA} 5=\mathrm{Z} \max & \mathrm{RB} 2=\mathrm{Zn}
\end{array}
$$

If RBO - RAO $<0$, set XLT ; test for XLT (Xn - Xmin) else reset XLT
If RA3 - RB0 $<0$, set XGT ; test for XGT (Xmax -Xn ) else reset XGT
If RB1 - RA1 <0, set YLT ; test for YLT (Yn - Ymin) else reset YLT
If RA4 - RB1 <0, set YGT ; test for YGT (Ymax - Yn) else reset $Y G T$
If RB2 - RA2 $<0$, set ZLT ; test for ZLT ( $\mathrm{Zn}-\mathrm{Zmin}$ ) else reset ZLT
If RA5 - RB2 $<0$, set ZGT ; test for ZGT (Zmax - Zn) else reset ZGT

## CT

Status register set
XLT (bit 5) is set high if ( $X n<X_{\min }$ )
XGT (bit 6) is set high if $(X n>X \max )$
YLT (bit 7) is set high if ( $\mathrm{Yn}<\mathrm{Y} \min$ )
YGT (bit 8) is set high if ( $\mathrm{Yn}>\mathrm{Y} \max$ )
ZLT (bit 9 ) is set high if $(\mathrm{Zn}<\mathrm{Zmin})$
ZGT (bit10) is set high if ( $\mathrm{Zn}>\mathrm{Zmax}$ )
Instruction Type


Syntax
'34020
Instruction Words

| Type | Syntax |
| :--- | :--- |
| Integer | CSPLN |
| Double-Precision | CSPLND |
| Single-Precision | CSPLNF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 0 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | type | size |
| ID |  |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Instruction to '34082

Description

Implied Operands

## Algorithm

$C=C+R A 4$
; $T=T+d T$
$R A 7=R B 3$
; $X=A 3$
$R A 7=(R A 7 \times C)+R B 2$
; $X=(X \times T)+A 2$
$R A 7=(R A 7 \times C)+R B 1$
$; X=(X \times T)+A 1$
$R A 7=(R A 7 \times C)+R B 0$
$; X=(X \times T)+A 0$
$R A 8=R B 7$
$; Y=B 3$
$R A 8=(R A 8 \times C)+R B 6$
$; Y=(Y \times T)+B 2$
$R A 8=(R A 8 \times C)+R B 5 \quad ; Y=(Y \times T)+B 1$
$R A 8=(R A 8 \times C)+R B 4 \quad ; Y=(Y \times T)+B 0$
$C T=R A 1 \times C \quad ; Z=C 3 \times T$
$R A 9=C T+R A 0 \quad ; Z=Z+C 2$
$R A 9=(R A 9 \times C)+R B 9 \quad ; Z=(Z \times T)+C 1$
$R A 9=(R A 9 \times C)+R B 8 \quad ; Z=(Z \times T)+C 0$
Temporary Storage C, CT
Outputs
$R A 7=X$
RA8 $=Y$
$R A 9=Z$
Instruction Type

CEXEC, short

## Syntax

Execution
'34020 Instruction Words

Instruction to '34082

## Operands

## Description

Instruction Type

## Example

CVDF CRs, CRd
(CRs) $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| ID |  |  | CRs |  |  |  | 0 | 1 | 0 | 0 | CRd |  |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | CRs | 0100 | CRd | 0001 | 1111 | 1000 | 0000 |  |  |

CRs TMS34082 source register containing a 64-bit double-precision floating-point operand

CRd TMS34082 destination register
CVDF converts a 64-bit IEEE double-precision floating-point number to a 32-bit IEEE single-precision floating-point number. The double-precision number resides in CRs, and the converted single-precision number is stored in CRd.

The source register, CRs, must be in the RA register file.
CEXEC, short
CVDF RA5, RA7
This example converts the contents of RA5 to a single-precision floating-point number and stores the result in RA7.



## Syntax

Execution
'34020
Instruction Words

Instruction to '34082

## Operands

## Description

## Temporary

Storage
Example

CVDF - *Rs, CRs, CRd

$$
\begin{aligned}
& \mathrm{Rs}-32 \rightarrow \mathrm{Rs} \\
& { }^{*} \mathrm{Rs} \rightarrow \mathrm{CRs} \\
& \mathrm{Rs}-32 \rightarrow \mathrm{Rs} \\
& { }^{\mathrm{Rs} \rightarrow \mathrm{CRs}} \\
& (\mathrm{CRs}) \rightarrow \mathrm{CRd}
\end{aligned}
$$

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | R |  |  |  |  |
| 10 |  |  | CRs |  |  |  | 0 | 1 | 0 | 0 | CRd |  |  |  |  |
| 31 | 29 | 28 | 25 | 24 | 21 | 20 |  | 16 | 15 |  |  |  |  |  | 0 |
| ID |  | CRs |  | 0100 |  | CRd |  |  | 1001 |  | 1111 |  | 1000 | 0000 |  |

Rs TMS34020 register containing the memory address
CRs TMS34082 register to contain the 64-bit double-precision floating-point operand

CRd TMS34082 destination register
CVDF loads the double-precision contents of memory pointed to by Rs into CRs and converts the 64-bit IEEE double-precision floating-point value to a 32-bit IEEE single-precision floating-point value. The double-precision number resides in CRs, and the converted single-precision number is stored in CRd. Before each load from memory, Rs is decremented by 32.

The TMS34082 source register, CRs, must be in the RA register file.
CMOVMC, predecrement, constant cont
CVDF -*B5, RA5, RA7
This example loads the contents of memory starting at the address given by TMS34020 register B5 minus 32 into TMS34082 register RA5, converts the contents of RA5 to a single-precision number, and stores the result in RA7.


Syntax CVDI Rs $s_{1}$, Rs $_{2}$ CRs, CRd
Execution
'34020 Instruction Words

Instruction to '34082

Operands

Description

Instruction Type
Example
(CRs) $\rightarrow$ CRd floating-point operand

CRd TMS34082 destination register

CMOVGC, two registers
CVDI A4, A5, RA5, RB7

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | R |  | $\mathrm{Rs}_{1}$ |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | R |  |  |  |
| ID |  |  | CRs |  |  |  | 0 | 1 | 0 | 1 | CRd |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs | 0101 |  | CRd | 0101 | 1111 | 1000 | 0000 |

$\mathrm{Rs}_{1} \quad$ TMS34020 source register for half the 64-bit double-precision floating-point value to TMS34082
$\mathrm{Rs}_{2}$ TMS34020 source register for remaining half of the 64-bit double-precision floating-point operand

CRs TMS34082 source register to contain the double-precision

CVDI loads a 64-bit IEEE double-precision floating-point number from $\mathrm{Rs}_{1}$ and $\mathrm{Rs}_{2}$ into CRs and converts it to a 32-bit integer number. The double-precision number resides in CRs, and the converted integer number is stored in CRd.

The TMS34082 source register, CRs, must be in the RA register file.

This example loads TMS34020 registers A4 and A5 into TMS34082 register RA5, converts the contents of RA5 to an integer, and stores the result in RB7.

Syntax
Execution
'34020
Instruction Words

## Operands

## Description

## Example

CVDI -*Rs, CRs, CRd
Rs $-32 \rightarrow$ Rs
*Rs $\rightarrow$ CRs
Rs $-32 \rightarrow R s$
*Rs $\rightarrow$ CRs
(CRs) $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | R | Rs |  |  |  |
| ID |  |  | CRs |  |  |  | 0 | 1 | 0 | 1 | CRd |  |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 |  | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 10 | CRs | 0101 | CRd | 1001 | 1111 | 1000 | 0000 |  |  |  |

Rs TMS34020 register containing the memory address
CRs TMS34082 register to contain the 64-bit double-precision floating-point operand

CRd TMS34082 destination register
CVDI loads the double-precision contents of memory pointed to by the predecremented value of Rs into CRs and converts the 64-bit IEEE double-precision floating-point value to an integer value. The double-precision number resides in CRs, and the converted integer number is stored in CRd. Before each load from memory, Rs is decremented by 32.

The TMS34082 source register, CRs, must be in the RA register file.
CMOVMC, predecrement, constant count
CVDI -*B5, RA5, RA7
This example loads the contents of memory starting at the address given by TMS34020 register B5 minus 32 into TMS34082 register RA5, converts the contents of RA5 to an integer number, and stores the result in RA7.


## Syntax

Execution
'34020
Instruction Words

Instruction to '34082

## Operands

Description

Instruction Type
Example

CVFD Rs, CRs, CRd
Rs $\rightarrow$ CRs
(CRs) $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | R | Rs |  |  |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | CRs |  |  |  | 0 | 1 | 0 | 0 | CRd |  |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 |  | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | CRs | 0100 | CRd | 0101 | 1111 | 0000 | 0000 |  |  |  |

Rs TMS34020 source register containing the 32-bit single-precision floating-point value to TMS34082

CRs TMS34082 register to contain the 32-bit single-precision floating-point operand

CRd TMS34082 destination register
CVFD loads the single-precision contents of Rs into CRs and converts the 32-bit IEEE single-precision floating-point value to a 64-bit IEEE double-precision floating-point value. The single-precision number resides in CRs, and the converted double-precision number is stored in CRd.

The TMS34082 source register, CRs, must be in the RA register file.
CMOVGC, one register
CVFD B5, RA5, RA7
This example loads TMS34020 register B5 into TMS34082 register RA5, converts the contents of RA5 to a double-precision number, and stores the result in RA7.
Syntax
Execution
34020
Instruction Words

Instruction to '34082

## Operands

## Description

CVFD * Rs + , CRs, CRd
$\star$ Rs $\rightarrow$ CRs
Rs $+32 \rightarrow$ Rs
(CRs) $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | R | Rs |  |  |  |
| ID |  |  | CRs |  |  |  | 0 | 1 | 0 | 0 | CRd |  |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 |  | 15 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs | 0100 | CRd | 1001 | 1111 | 0000 | 0000 |  |  |  |

Rs TMS34020 register containing the memory address
CRs TMS34082 register to contain the 32-bit single-precision floating-point operand

CRd TMS34082 destination register
CVFD loads the single-precision contents of memory pointed to by Rs into CRs and converts the 32-bit IEEE single-precision floating-point value to a 64-bit IEEE double-precision floating-point value. The single-precision number resides in CRs, and the converted double-precision number is stored in CRd. After each load from memory, Rs is incremented by 32.

The TMS34082 source register, CRs, must be in the RA register file.
CMOVMC, postincrement, constant count
CVFD *B5+, RA5, RA7
This example loads the contents of memory starting at the address given by TMS34020 register B5 into TMS34082 register RA5, converts the contents of RA5 to a double-precision number, and stores the result in RA7.

Syntax
Execution
'34020
Instruction Words

Instruction to '34082

Operands

Description

Instruction Type
Example

CVFD -*Rs + , CRs, CRd
Rs $-32 \rightarrow$ Rs
*Rs $\rightarrow$ CRs
(CRs) $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | R | Rs |  |  |  |
| ID |  |  | CRs |  |  |  | 0 | 1 | 0 | 0 | CRd |  |  |  |  |



Rs TMS34020 register containing the memory address
CRs TMS34082 register to contain the 32-bit single-precision floating-point operand

CRd TMS34082 destination register
CVFD loads the single-precision contents of memory pointed to by Rs into CRs and converts the 32-bit IEEE single-precision floating-point value to a 64-bit IEEE double-precision floating-point value. The single-precision number resides in CRs, and the converted double-precision number is stored in CRd. Before each load from memory, Rs is decremented by 32.

The TMS34082 source register, CRs, must be in the RA register file.
CMOVMC, predecrement, constant count
CVFD -*B5, RA5, RA7
This example loads the contents of memory starting at the address given by TMS34020 register B5 minus 32 into TMS34082 register RA5, converts the contents of RA5 to a double-precision number, and stores the result in RA7.

Syntax

## Execution

'34020 Instruction Words

Instruction to '34082

Operands

Description

Instruction Type

## Example

CVFI CRs, CRd
(CRs) $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
| ID |  |  | CRs |  |  |  | 0 | 1 | 0 | 1 | CRd |  |  |  |  |


| 2928 |  | 2524 | 2120 | 1615 |  |  | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs | 0101 | CRd | 0001 | 1111 | 0000 | 0000 |

CRs TMS34082 source register containing a 32-bit single-precision floating-point operand

CRd TMS34082 destination register
CVFI converts a 32-bit IEEE single-precision floating-point value to a 32-bit integer value. The single-precision number resides in CRs, and the converted integer number is stored in CRd.

The source register, CRs, must be in the RA register file.
CEXEC, short
CVFI RA5, RA7
This example converts the contents of RA5 to an integer and stores the result in RA7.

Syntax
Execution
'34020 Instruction Words

Instruction to '34082

## Operands

## Description

## Instruction Type

Example

CVFI Rs, CRs, CRd
Rs $\rightarrow$ CRs
(CRs) $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | R | Rs |  |  |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | CRs |  |  |  | 0 | 1 | 0 | 1 | CRd |  |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | CRs | 0101 | CRd | 0101 | 1111 | 0000 | 0000 |  |  |

Rs TMS34020 source register for the 32-bit single-precision floatingpoint value to TMS34082

CRs TMS34082 register to contain the 32-bit single-precision floating-point operand

CRd TMS34082 destination register
CVFI loads the single-precision contents of Rs into CRs and converts the 32-bit IEEE single-precision floating-point value to a 32-bit integer value. The single-precision number resides in CRs, and the converted integer number is stored in CRd.

The TMS34082 source register, CRs, must be in the RA register file.
CMOVGC, one register
CVFI B5, RA5, RB7
This example loads TMS34020 register B5 into TMS34082 register RA5, converts the contents of RA5 to an integer, and stores the result in RB7.


Syntax
Execution
'34020 Instruction Words

Instruction to '34082

Operands

Description

Instruction Type
Example

CVFI - *Rs, CRs, CRd
Rs $-32 \rightarrow$ Rs
*Rs $\rightarrow$ CRs
(CRs) $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | R | Rs |  |  |  |
| ID |  |  | CRs |  |  |  | 0 | 1 | 0 | 1 | CRd |  |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs | 0101 | CRd | 1001 | 1111 | 0000 | 0000 |  |  |  |

Rs TMS34020 register containing the memory address
CRs TMS34082 register to contain the 32-bit single-precision floating-point operand

CRd TMS34082 destination register
CVFI loads the single-precision contents of memory pointed to by Rs into CRs and converts the 32-bit IEEE single-precision floating-point value to a 32-bit integer value. The single-precision number resides in CRs, and the converted integer number resides in CRd. Before each load from memory, Rs is decremented by 32.

The TMS34082 source register, CRs, must be in the RA register file.
CMOVMC, predecrement, constant count
CVFI -*B5, RA5, RA7
This example loads the contents of memory starting at the address given by TMS34020 register B5 minus 32 into TMS34082 register RA5, converts the contents of RA5 to an integer number, and stores the result in RA7.


Syntax CVID Rs, CRs, CRd

## Execution $\quad$ Rs $\rightarrow$ CRs

(CRs) $\rightarrow$ CRd
'34020
Instruction Words


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| DD | CRs | 0110 | CRd | 0101 | 1111 | 1000 | 0000 |  |  |

Rs TMS34020 source register containing the 32-bit integer value to TMS34082

CRs TMS34082 source register to contain the 32-bit integer operand
CRd TMS34082 destination register
CVID loads the integer contents of Rs into CRs and converts a 32-bit integer value to a 64-bit IEEE double-precision floating-point value. The integer resides in CRs, and the converted double-precision number is stored in CRd. For this instruction, the integer in Rs must be sent as both words of a 64-bit transfer.

The TMS34082 source register, CRs, must be in the RA register file.
CMOVGC, two registers
CVID B5, RA5, RA7
This example loads TMS34020 register B5 into TMS34082 register RA5, converts the contents of RA5 to a double-precision number, and stores the result in RA7.

| Syntax | CVIF CRs, CRd |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $(C R s) \rightarrow$ CRd |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| '34020 Instruction Words | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
|  | ID |  |  | CRs |  |  |  | 0 | 1 | 1 | 0 | CRd |  |  |  |  |
| Instruction to '34082 | $\begin{array}{llllllll}31 & 29 & 28 & 25 & 24 & 21 & 20\end{array}$ |  |  |  |  |  |  | 16 |  | 15 |  |  |  |  | 0 |  |
|  | ID |  | CRs |  | 0110 |  |  | CRd |  | 0001 |  | 1111 | 0000 |  | 0000 |  |
| Operands | CRs TMS34082 source register containing the 32-bit integer operand |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | CRd TMS34082 destination register |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Description | CVIF converts a 32-bit integer value to a 32-bit IEEE single-precision floating-point value. The integer resides in CRs, and the converted single-precision number is stored in CRd. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | The source register, CRs, must be in the RA register file. C and CT may no be used as operands for this instruction. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Type | CEXEC, short |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | CVIF RA5, RA7 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | This example converts the contents of RA5 to a single-precision number and stores the result in RA7. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

Syntax
Execution
'34020 Instruction Words

Instruction to '34082

## Operands

## Description

## Instruction Type

## Example

CVIF Rs, CRs, CRd
Rs $\rightarrow$ CRs
(CRs) $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | R | Rs |  |  |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | CRs |  |  |  | 0 | 1 | 1 | 0 | CRd |  |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | CRs | 0110 | CRd | 0101 | 1111 | 0000 | 0000 |  |  |

Rs TMS34020 source register for the 32-bit integer value to TMS34082
CRs TMS34082 source register to contain the 32-bit integer operand
CRd TMS34082 destination register
CVIF loads the integer contents of Rs into CRs and converts a 32-bit integer value to a 32 -bit IEEE single-precision floating-point value. The integer resides in CRs, and the converted single-precision number resides in CRd.

The TMS34082 source register, CRs, must be in the RA register file.
CMOVGC, one register
CVIF A3, RA5, RA7
This example loads TMS34020 registers of A3 into TMS34082 register RA5, converts the contents of RA5 to a single-precision number, and stores the result in RA7.
Syntax
Execution
'34020
Instruction Words

$$
\text { CVIF *Rs }+ \text {, CRs, CRd }
$$

$* R s \rightarrow$ CRs
Rs $+32 \rightarrow$ Rs
(CRs) $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | R |  |  |  |  |
| ID |  |  | CRs |  |  |  | 0 | 1 | 1 | 0 | CRd |  |  |  |  |

Instruction to '34082

## Operands

## Description

| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs | 0110 | CRd | 1001 | 1111 | 0000 | 0000 |  |

Rs TMS34020 register containing the memory address
CRs TMS34082 register to contain the 32-bit integer operand
CRd TMS34082 destination register
CVIF loads the integer contents of memory pointed to by Rs into CRs and converts the 32-bit integer value to a 32-bit IEEE single-precision floating-point value. The integer number resides in CRs, and the converted single-precision number is stored in CRd. After each load from memory, Rs is incremented by 32.

The TMS34082 source register, CRs, must be in the RA register file.
Instruction Type
Example
CMOVMC, postincrement, constant count
CVIF *B5+, RA5, RA7
This example loads the contents of memory starting at the address given by TMS34020 register B5 into TMS34082 register RA5, converts the contents of RA5 to a single-precision number, and stores the result in RA7.

## Syntax

Execution
'34020
Instruction Words

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | R |  |  |  |  |
| ID |  |  | CRs |  |  |  | 0 | 1 | 1 | 0 | CRd |  |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs | 0110 | CRd | 1001 | 1111 | 0000 | 0000 |  |

Operands

## Description

Instruction Type

## Example

CVIF - *Rs, CRs, CRd
Rs $-32 \rightarrow$ Rs
*Rs $\rightarrow$ CRs
(CRs) $\rightarrow$ CRd

## Instruction to '34082

Rs TMS34020 register containing the memory address
CRs TMS34082 register to contain the 32-bit integer operand
CRd TMS34082 destination register
CVIF loads the integer contents of memory pointed to by Rs into CRs and converts the 32-bit integer value to a 32-bit IEEE single-precision floating-point value. The integer number resides in CRs, and the converted single-precision number is stored in CRd. Before each load from memory, Rs is decremented by 32 .

The TMS34082 source register, CRs, must be in the RA register file.
CMOVMC, predecrement, constant count
CVIF -*B5, RA5, RA7

This example loads the contents of memory starting at the address given by TMS34020 register B5 minus 32 into TMS34082 register RA5, converts the contents of RA5 to a single-precision number, and stores the result in RA7.




Instruction Type
Example
Syntax
Execution
'34020
Instruction Words

## Operands

Description
Type
Syntax
Integer DIVS Rs ${ }_{1}$, Rs $_{2}$, CRs $_{1}$, CRs $_{2}$, CRd
Single-Precision DIVF $R s_{1}$, Rs $_{2}$, CRs $_{1}, C R s_{2}$, CRd
$\mathrm{Rs}_{1} \rightarrow \mathrm{CRs}_{1}$
$\mathrm{Rs}_{2} \rightarrow \mathrm{CRs}_{2}$
100
$\left(\frac{\mathrm{CRs}_{1}}{\mathrm{CRs}_{2}}\right) \rightarrow \mathrm{CRd}$


Instruction to '34082

| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | CRs 1 | CRs 2 | CRd | 0101 | 001 t | 0000 | 0000 |  |

$\mathrm{Rs}_{1} \quad$ TMS34020 source register for the first value to TMS34082
$\mathrm{Rs}_{2} \quad$ TMS34020 source register for the second value to TMS34082
$\mathrm{CRs}_{1}$ TMS34082 register to contain the first operand. Must be in RA register file.
$\mathrm{CRs}_{2}$ TMS34082 register to contain the second operand. Must be in RB register file.

CRd TMS34082 destination register
DIVx loads the contents of $\mathrm{Rs}_{1}$ and $\mathrm{Rs}_{2}$ into $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$ respectively, divides the contents of $\mathrm{CRs}_{1}$ by $\mathrm{CRs}_{2}$, and stores the result in CRd. For integer divides, the CT register is used for temporary storage. Any value stored in this register prior to DIVS will be corrupted.

The double-precision form of this instruction is not supported.
CMOVGC, two registers
DIVF A5, A6, RA5, RB6, RA7
This example loads TMS34020 registers A5 and A6 into TMS34082 registers RA5 and RB6 respectively, divides the contents of RA5 by RB6, and stores the result in RA7.



Syntax
Execution
'34020
Instruction Words

## GETCST

TMS34082 Status Register $\rightarrow$ ST register of TMS34020

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |

Instruction to '34082

| 31 | 29 |  |  | 0 |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 10 | 0 | 0000 | 0000 | 0000 | 0100 | 1110 | 0000 | 0000 |

Description

Instruction Type
Example

GETCST loads 4 MSBs of the TMS34082 status register (STATUS) into the TMS34020 status register (ST).

## CMOVCS

GETCST
This example sends the TMS34082 status register to the TMS34020. The TMS34020 takes the value and masks off the 4 MSBs ; it then stuffs the values in the TMS34020 status register corresponding to the $\mathrm{N}, \mathrm{C}, \mathrm{Z}, \mathrm{V}$ bits.



## Syntax

'34020
Instruction Words
INMNMX

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| ID |  |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Instruction to '34082

| 31 | 29 |  |  |  | 0 |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 10 | 0 | 0000 | 0000 | 0000 | 0010 | 0110 | 0000 | 0000 |

## Description

Algorithm

Instruction Type

The IMNMX instruction configures the registers in preparation for either the MNMX1 or MNMX2 instruction. The following initializations occur (internal flags are set; register values are not altered):

$$
\begin{array}{ll}
\text { RB0 }=\text { MAX } & \text {; set to positive infinity (used to store minimum } X \text { values) } \\
\text { RB1 }=\text { MIN } & \text {; set to negative infinity (used to store maximum } X \text { values) } \\
\text { RB2 }=\text { MAX } & \text {; set to positive infinity (used to store minimum } Y \text { values) } \\
\text { RB3 }=\text { MIN } & \text {; set to negative infinity (used to store maximum } Y \text { values) } \\
\text { COUNTX }=0 & \text {; bits 15-0 for } X \text { minimums, bits } 31-16 \text { for } X \text { maximums } \\
\text { COUNTY }=0 & \text {; bits 15-0 for } Y \text { minimums, bits } 31-16 \text { for } Y \text { maximums } \\
\text { Count }=0 & \text {; set count to zero (bits 31-16 of MIN-MAXILOOPCT register }
\end{array}
$$



## Syntax

Execution
'34020
Instruction Words

Instruction to '34082

## Operands

Description

Instruction Type

## Example

Type
Integer
Double-Precision
Single-Precision
$\mathrm{Rs}_{1} \rightarrow \mathrm{CRs}$

$\frac{1}{\mathrm{CRs}} \rightarrow \mathrm{CRd}$
Integer or Single-Precision:

|  |  |  |  |  |  |  |  |  |  |  |  | $\mathrm{Rs}_{1}$ |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | R |  |  |  |  |
| 0 | 1 | 0 | 1 | 0 | 1 | 0 | type | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | 0 | 0 | 0 | 0 | CRs |  |  |  | CRd |  |  |  |  |

## Double-Precision:

|  | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | R |  | $\mathrm{Rs}_{1}$ |  |
| 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | R |  | R |  |
| ID |  |  | 0 | 0 | 0 | 0 | CRs |  |  |  | CRd |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 10 | 0000 | CRs | CRd | 0101 | $010 t$ | 5000 | 0000 |

$\mathrm{Rs}_{1} \quad$ TMS34020 source register containing the operand (or half of the 64-bit double-precision floating-point operand.)
$\mathrm{Rs}_{2}$ TMS34020 source register containing the remaining half of the double-precision operand.

CRs TMS34082 register to contain the operand. Must be in the RB register file.

CRd TMS34082 destination register
This instruction loads the contents of the $\mathrm{Rs}_{1}$ (and $\mathrm{Rs}_{2}$ for double-precision) into CRs, divides 1 by CRs, and places the result in CRd. For integer inverses, CT is used as a temporary storage register. Any value stored in CT prior to INV will be corrupted.

CMOVGC, one or two registers
INV A2, RB8, RB2
This example loads the contents of TMS34020 register A2 into RB8, divides 1 by RB8, and stores the integer result in RB2.

Syntax
Execution
34020
Instruction Words

Instruction to '34082

## Operands

Description

Instruction Type
Example

Type
Syntax
Integer
Double-Precision
INV *Rs+, CRs, CRd
INVD *Rs + , CRs, CRd
Single-Precision
INVF *Rs+, CRs, CRd
*Rs $\rightarrow$ CRs
Rs $+32 \rightarrow$ Rs


$\frac{1}{\mathrm{CRs}} \rightarrow \mathrm{CRd}$

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 10 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | transfers |
| 1 | 0 | 0 | 1 | 0 | 1 | 0 | type | size | 0 | 0 | R | Rs |  |  |
| ID |  |  | 0 | 0 | 0 | 0 | CRs |  |  |  | CRd |  |  |  |


| $31 \quad 29$ | 28 | 25 | 24 | 21 | 20 | 16 | 15 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0000 | CRs | CRd | 1001 | $010 t$ | s 000 | 0000 |

Rs TMS34020 register containing the memory address
CRs TMS34082 register to contain the operand. Must be in the RB register file.

CRd TMS34082 destination register
This instruction loads the contents of memory pointed to by Rs into CRs, divides 1 by CRs, and places the result in CRd. After each load from memory, Rs is incremented by 32. For integer inverses, CT is used as a temporary storage register. Any value stored in CT prior to INV will be corrupted.

CMOVMC, postincrement, constant count
INVD *A2+, RB8, RB1
This example loads the double-precision contents of memory starting at the address given by TMS34020 register A2 into TMS34082 register RB8, divides 1 by RB8, and stores the result in RB1.


## Syntax

Execution
'34020 Instruction Words

Instruction to '34082

Operands

Description

Instruction Type
Example

## JUMPC $n$

Execute external TMS34082 instructions found at address $2 \times n$

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | $n$ |  |  |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


| 31 | 29 | $28 \quad 25$ | $24 \quad 21$ | $20 \quad 16$ | 1514 | 13 | 9 | 8 | 3 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID |  | 0000 | 0000 | 0000 | 11 | n |  | 00000 | 0000 |

$n \quad$ Specifies the jump table entry to which the TMS34082 instruction execution is sent May be a number from 0 to 15.

JUMPC begins execution of TMS34082 external instructions stored in TMS34082 external local memory. The starting address is specified as TMS34082 external memory address $2 \times n$. Usually, a jump table is stored in these locations to permit calling several complex subroutines.

CEXEC, long
JUMPC 4
This example executes TMS34082 instructions stored in the TMS34082's local memory on the MSD bus. Instruction execution begins at address 8.



## Syntax <br> '34020 <br> Instruction Words

| Type | Syntax |
| :--- | :--- |
| Integer | LINTZ |
| Double-Precision | LINTZD |
| Single-Precision | LINTZF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |

Instruction to '34082

| 31 | 29 |  | 0 |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | ---: |
| ID | 0 | 0000 | 0000 | 0010 | 0010 | $100 t$ | 5000 | 0000 |

## Description

Implied Operands

Algorithm

Temporary Storage
Outputs

Instruction Type

Perform linear interpolation given two points and a plane (the plane is assumed perpendicular to one of the coordinate axes).

| $R A 0=X 1$ | $R B 0=X 2$ |
| :--- | :--- |
| $R A 1=Y 1$ | $R B 1=Y 2$ |
| $R A 2=Z 1$ | $R B 2=Z 2$ |

$\mathrm{RA} 3=\mathrm{RB} 9-\mathrm{RA} 2 \quad ; \mathrm{Z3}-\mathrm{Z1}$
$R B 6=R B 0-R A 0 \quad ; X 2-X 1$
$\mathrm{RB} 7=\mathrm{RB} 1-\mathrm{RA} 1 \quad ; \mathrm{Y} 2-\mathrm{Y} 1$
$R B 8=$ RB2 - RA2 $; Z 2-\mathrm{Z1}$
$\mathrm{C}=\mathrm{RA} 3 / \mathrm{RB} 8$
$\mathrm{RB} 6=\mathrm{C} \times \mathrm{RB} 6$
; $t=(\mathrm{Z} 3-\mathrm{Z} 1) /(\mathrm{Z} 2-\mathrm{Z} 1)$
; $\mathrm{t} \times(\mathrm{X} 2-\mathrm{X} 1)$
$R B 7=C \times R B 7 \quad ; t \times(Y 2-Y 1)$
$\mathrm{RB} 8=\mathrm{C} \times$ RB8 $\quad ; \mathrm{t} \times(\mathrm{Z2}-\mathrm{Z1})$
$R A 0=R B 6+R A 0 \quad ; X 3=X 1+(t \times(X 2-X 1))$
$R A 1=R B 8+R A 1 \quad ; Y 3=Y 1+(t \times Y 2-Y 1))$
$R A 2=R B 8+R A 2 \quad ; Z 3=Z 1+(t \times(Z 2-Z 1))$
C, RA3, RB8-RB6
RAO $=\mathrm{X} 3 \quad$; interpolated values
$R A 1=Y 3$
$\mathrm{RA} 2=\mathrm{Z} 3$
CEXEC, short


| Syntax | Type | Syntax |
| :--- | :--- | :--- |
|  | Integer | MAC $_{1}, \mathrm{Rs}_{2}, \mathrm{CRs}_{1}, \mathrm{CRs}_{2}$ |
|  | Single-Precision | MACF Rs $_{1}, R s_{2}, C R s_{1}, C R s_{2}$ |

Execution
'34020
Instruction Words
Instruction Words

Instruction to '34082

## Operands

## Implled Operands

## Description

## Outputs

Instruction Type
Example
$\mathrm{Rs}_{1} \rightarrow \mathrm{CRs}_{1}$
$\mathrm{Rs}_{2} \rightarrow \mathrm{CRs}_{2}$
$\mathrm{C}+\left(\mathrm{CRs}_{1} \times \mathrm{CRs}_{2}\right) \rightarrow \mathrm{C}$

| 15 | 14 | 13 | 12 | 11 | 10 | 0 | 8 | 6 | 5 | 4 | 3 | 2 | 1 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | $R$ | $R_{1}$ |  |  |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | type | 0 | 0 | 0 | $R$ | $R_{2}$ |  |  |
| CR |  |  |  |  |  |  |  |  |  |  |  | $\mathrm{CRs}_{1}$ |  | 0 |


| 31 | 29 | 28 | 25 | 24 | 20 | 19 | 16 | 15 |  | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | CRs $_{1}$ | 00011 | CRs $_{2}$ | 0111 | 111 t | 0000 | 0000 |  |  |  |

$\mathrm{Rs}_{1} \quad$ TMS34020 source register for the first (An) value to TMS34082
$\mathrm{Rs}_{2} \quad$ TMS34020 source register for the second (Bn) value to TMS34082
CRs $_{1} \quad$ TMS34082 register to contain the $A_{n}$ operand. Must be in the RA register file.
$\mathrm{CRs}_{2}$ TMS34082 register to contain the $\mathrm{B}_{\mathrm{n}}$ operand. Must be in the RB register file.

C Register Previously accumulated sum
MACx is used to perform multiply and accumulate operations of the form:

$$
\left(\left(A_{0} \times B_{0}\right)+\left(A_{1} \times B_{1}\right)+\left(A_{2} \times B_{2}\right)+\ldots\left(A_{n} \times B_{n}\right)\right)
$$

This instruction loads two operands from $\mathrm{Rs}_{1}$ and $\mathrm{Rs}_{2}$ into $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$ respectively, performs one multiply, and adds the result to the previously accumulated sum.

The double-precision form of this instruction is not supported.
The new accumulated sum is stored in the $C$ Register. The next multiply/accumulate may now be performed.

CMOVGC, two registers
MAC A1, A2, RA1, RB1
This instruction loads the integer contents of A1 and A2 into RA1 and RB1, respectively, and multiples the contents of RA1 by RB1. The product is added to the value stored in the C register and the result is stored back in C .

Syntax

Execution
'34020
Instruction Words

Instruction to '34082

## Operands

implied Operands
Description

Outputs

Instruction Type

Type
Syntax
Integer
Single-Precision

MAC *Rs,+ CRs $_{1}$, CRs $_{2}$ [, count]
MACF $* R s+$, CRs ${ }_{1}$, CRs ${ }_{2}$ [, count]

Repeat count times:
$* \mathrm{Rs} \rightarrow \mathrm{CRs}_{1}$
Rs $+32 \rightarrow$ Rs
*Rs $\rightarrow \mathrm{CRs}_{2}$
$\mathrm{C}+\left(\mathrm{CRs}_{1} \times \mathrm{CRs}_{2}\right) \rightarrow \mathrm{C}$


| 31 | 29 | 28 | 25 | 24 | 20 | 19 | 16 |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | $\mathrm{CRs}_{1}$ | 00011 | CRs $_{2}$ | 1011 | 111 t | 0000 | 0000 |  |  |

Rs TMS34020 register containing the memory address
CRs $_{1}$ TMS34082 register to contain the $A_{n}$ operand. Must be in the RA register file.
$\mathrm{CRs}_{2}$ TMS34082 register to contain the $\mathrm{B}_{\mathrm{n}}$ operand. Must be in the RB register file.
count Number of times the instruction is executed; must be between 1-16 (default is 1 ). The number of transfers is $2 \times$ count

C Register
Previously accumulated sum
MACx is used to perform multiply and accumulate operations of the form:

$$
\left(\left(A_{0} \times B_{0}\right)+\left(A_{1} \times 1\right)+\left(A_{2} \times B_{2}\right)+\ldots\left(A_{n} \times B_{n}\right)\right) .
$$

This instruction loads two operands from memory starting at the address given by TMS34020 register Rs into TMS34082 registers $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, performs one multiply, and adds the result to the previously accumulated sum. This sequence is repeated count times. After each load from memory, Rs is incremented by 32 .

The double-precision form of this instruction is not supported.
The new accumulated sum is stored in the $C$ register. The next multiply/accumulate may now be performed.

CMOVMC, postincrement, constant count

## Example

CLRF C
MACF *A1+, RA9, RB9, 6
This example performs a sum of six products. First, the TMS34082 C register is set to zero. Then, the single-precision contents of memory starting at TMS34020 register A1 is loaded into TMS34082 registers RA9 and RB9. The contents of RA9 and RB9 are multiplied, the result is added to the C register, and the sum is stored in C . This process is repeated 5 more times. The end result is stored in C.



## Syntax

'34020 Instruction Words

Instruction to '34082

Description

Implied Operands

| $A 4 \times 4$ matrix in the FPU registers |  |  |  |
| :--- | :--- | :--- | :--- |
| $R A 0=B 00$ | $R A 1=B 01$ | RA2 $=\mathrm{B} 02$ | RA3 $=\mathrm{B} 03$ |
| $R A 4=B 10$ | $R A 5=B 11$ | RA6 $=\mathrm{B} 12$ | RA7 $=\mathrm{B} 13$ |
| $R A 8=B 20$ | $R A 9=B 21$ | RB0 $=\mathrm{B} 22$ | RB1 $=\mathrm{B} 23$ |
| $R B 2=B 30$ | $R B 3=B 31$ | RB4 $=\mathrm{B} 32$ | RB5 $=\mathrm{B} 33$ |

The first element $(\mathrm{AxO})$ of a row vector: $\mathrm{RB} 9=\mathrm{AxO}$

## Algorithm

$R B 6=R B 9 \times R A 0 ; A x 0 \times B 00$
$\mathrm{RB} 7=\mathrm{RB} 9 \times \mathrm{RA} 1 \quad ; \mathrm{Ax0} \times \mathrm{B} 01$
$R B 8=R B 9 \times R A 2 ; A x 0 \times B 02$
$R B 9=R B 9 \times R A 3 ; A x 0 \times B 03$
$\mathrm{CT}=\mathrm{RB} 9$; CT is used to store $(\mathrm{Ax0} \times \mathrm{B} 03)$ value ; since RB9 will be corrupted.
Temporary Storage C
Outputs

Instruction Type
Example

| Type | Syntax |
| :--- | :--- |
| Integer | MMPYO |
| Double-Precision | MMPYOD |
| Single-Precision | MMPYOF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


| 31 | 29 |  | 0 |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | ---: |
| ID | 0 | 0000 | 1000 | 0000 | 0011 | 111 t | 5000 | 0000 |

This instruction multiplies the matrix B by a vector element, A 0 . This instruction may be combined with MMPY1, MMPY2, and MMPY3 to multiply matrices of several sizes $1 \times 4$ by $4 \times 4,4 \times 4$ by $4 \times 4,1 \times 3$ by $3 \times 3$, and $3 \times 3$ by $3 \times 3$ matrix multiplies may be implemented.

A $4 \times 4$ matrix in the FPU registers

| $\mathrm{RB6}=\mathrm{RB} 9 \times \mathrm{RA0}$ | ; $\mathrm{AxO} \times \mathrm{BOO}$ |
| :---: | :---: |
| $\mathrm{RB} 7=\mathrm{RB} 9 \times \mathrm{RA} 1$ | ; $A x 0 \times B 01$ |
| $R B 8=R B 9 \times R A 2$ | ; $\mathrm{Ax0} 0 \times \mathrm{B02}$ |
| $R B 9=R B 9 \times R A 3$ | ; $\mathrm{Ax} 0 \times \mathrm{B03}$ |

$R B 6=A x 0 \times B 00$
$\mathrm{RB} 7=\mathrm{Ax} 0 \times \mathrm{B} 01$
$\mathrm{RB} 8=\mathrm{Ax} 0 \times \mathrm{B} 02$
$R B 9=C T=A x 0 \times B 03$
CEXEC, short
See Example 5-4 for code for a $3 \times 3$ by $3 \times 3$ matrix multiply.

| Type | Syntax |
| :--- | :--- |
| Integer | MMPY1 |
| Double-Precision | MMPY1D |
| Single-Precision | MMPY1F |

'34020
Instruction Words

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | type | size |
| 1 D |  |  | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |

Instruction to '34082

| 31 | 29 |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0 | 0000 | 1010 | 0000 | 0011 | $111 t$ | 5000 |

This instruction multiplies the matrix B by an vector element, A1. This instruction may be combined with MMPY0, MMPY2, and MMPY3 to multiply matrices of several sizes. $1 \times 4$ by $4 \times 4,4 \times 4$ by $4 \times 4,1 \times 3$ by $3 \times 3$, and $3 \times 3$ by $3 \times 3$ matrix multiplies may be implemented.

A $4 \times 4$ matrix in the FPU registers:

| RA0 $=$ B00 | RA1 $=$ B01 | RA2 $=B 02$ | RA3 $=$ B03 |
| :--- | :--- | :--- | :--- |
| RA4 $=$ B10 | RA5 $=$ B11 | RA6 $=$ B12 | RA7 $=$ B13 |
| RA8 $=$ B20 | RA9 $=$ B21 | RB0 $=$ B22 | RB1 $=$ B23 |
| RB2 $=$ B30 | RB3 $=$ B31 | RB4 $=$ B32 | RB5 $=$ B33 |

The initial products from MMPYO for the resulting matrix row:
RB6 $=\mathrm{Ax} 0 \times \mathrm{B} 00$
$\mathrm{RB} 7=\mathrm{Ax} 0 \times \mathrm{B} 01$
$\mathrm{RB} 8=\mathrm{Ax0} \times \mathrm{B} 02$
$C T=A x 0 \times B 03$

The second element (Ax1) of a row vector: RB9 = Ax1
Algorithm
$\begin{array}{ll}R B 6=R B 6+(R B 9 \times R A 4) & ;(\mathrm{A} 00 \times \mathrm{B} 00)+(\mathrm{A} \times 1 \times \mathrm{B} 10) \\ \mathrm{RB} 7=\mathrm{RB} 7+(\mathrm{RB9} \times \mathrm{RA} 5) & ;(\mathrm{A} 0 \times \mathrm{B} 01)+(\mathrm{A} 1 \times \mathrm{B} 11) \\ \mathrm{RB} 8=\mathrm{RB} 8+(\mathrm{RB9} \times \mathrm{RA} 6) & ;(\mathrm{A} 00 \times \mathrm{B} 02)+(\mathrm{A} \times 1 \times \mathrm{B} 12) \\ \mathrm{RB} 9=\mathrm{CT}+(\mathrm{RB} 9 \times \mathrm{RA} 7) & ;(\mathrm{A} 00 \times \mathrm{B} 03)+(\mathrm{A} 1 \times \mathrm{B} 13)\end{array}$
$C T=$ RB9 $; C T$ is used to store the fourth value since
; RB9 will be corrupted.

Temporary Storage
Inputs

C
$R B 6=(A x 0 \times B 00)+(A x 1 \times B 10)$
$R B 7=(A x 0 \times B 01)+(A \times 1 \times B 11)$
$R B 8=(A x 0 \times B 02)+(A x 1 \times 12)$
$R B 9=C T=(A x 0 \times B 03)+(A x 1 \times B 13)$
CEXEC, short
See Example 5-4 for code for a $3 \times 3$ by $3 \times 3$ matrix multiply.
Syntax

34020
Instruction Words

Instruction to '34082

Description

Implled Operands

## Algorithm

Temporary Storage
Outputs

| Type | Syntax |
| :--- | :--- |
| Integer | MMPY2 |
| Double-Precision | MMPY2D |
| Single-Precision | MMPY2F |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |


| 31 | 29 |  |  |  | 0 |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0 | 0000 | 1100 | 0000 | 0011 | 111 t | 5000 | 0000 |

This instruction multiplies the matrix B by a vector element, A 2 . This instruction may be combined with MMPY0, MMPY1, and MMPY3 to multiply matrices of several sizes. $1 \times 4$ by $4 \times 4,4 \times 4$ by $4 \times 4,1 \times 3$ by $3 \times 3$, and $3 \times 3$ by $3 \times 3$ matrix multiplies may be implemented.

A $4 \times 4$ matrix in the FPU registers:

| $\mathrm{RAO}=\mathrm{B00}$ | $\mathrm{RA1}=\mathrm{B01}$ | $\mathrm{RA} 2=\mathrm{B} 02$ | $\mathrm{RA} 3=\mathrm{B03}$ |
| :---: | :---: | :---: | :---: |
| $R A 4=B 10$ | $\mathrm{RA} 5=\mathrm{B} 11$ | $R A 6=B 12$ | $R A 7=B 13$ |
| $R A 8=B 20$ | $\mathrm{RA} 9=\mathrm{B} 21$ | $\mathrm{RB0}=\mathrm{B} 22$ | $R B 1=B 23$ |
| $R B 2=B 30$ | $\mathrm{RB3}=\mathrm{B} 31$ | $\mathrm{RB4}=\mathrm{B} 32$ | $R B 5=B 33$ |

The accumulated sums from MMPY0 and MMPY1 for the resulting matrix:
$R B 6=(A x 0 \times B 00)+(A x 1 \times B 10)$
$R B 7=(A x 0 \times B 01)+(A x 1 \times B 11)$
RB8 $=(\mathrm{Ax0} \times \mathrm{B} 02)+(\mathrm{Ax1} \times \mathrm{B} 12)$
$C T=(A x 0 \times B 03)+(A x 1 \times B 13)$
The third element (Ax2) of a row vector: RB9 $=A x 2$

| $R B 6=R B 6+(C \times R A 8)$ | ; $(\mathrm{Ax0} 0 \times \mathrm{B00}+\mathrm{Ax} 1 \times \mathrm{B} 10)+(\mathrm{Ax} 2 \times \mathrm{B} 20)$ |
| :---: | :---: |
| $\mathrm{RB} 7=\mathrm{RB} 7+(\mathrm{C} \times \mathrm{RA} 9)$ | $;(A x 0 \times B 01+A x 1 \times B 11)+(A x 2 \times B 21)$ |
| $\mathrm{RB8}=\mathrm{RB} 8+(\mathrm{C} \times \mathrm{RB} 0)$ | $;(A x 0 \times B 02+A x 1 \times B 12)+(A x 2 \times B 22)$ |
| $\mathrm{RB9}=\mathrm{CT}+(\mathrm{C} \times \mathrm{RB} 1)$ | $;(A x 0 \times B 03+A x 1 \times B 13)+(A x 2 \times B 23)$ |
| $C T=R B 9$ | ; CT is used to store the fourth value since ; RB9 will be corrupted. |

CT
RB6 $=(\mathrm{Ax0} \times \mathrm{B} 00)+(\mathrm{Ax} 1 \times \mathrm{B} 10)+(\mathrm{A} 2 \times \mathrm{B} 20)$
$R B 7=(A x 0 \times B 01)+(A x 1 \times B 11)+(A x 2 \times B 21)$
$R B 8=(A x 0 \times B 02)+(A x 1 \times B 12)+(A x 2 \times B 22)$
$R B 9=C T=(A x 0 \times B 03)+(A x 1 \times B 13)+(A x 2 \times B 23)$
Note that the result of this operation is the completed row for a $1 \times 3$ by $3 \times 3$ or $3 \times 3$ by $3 \times 3$ matrix multiply.

CEXEC, short
See Example 5-4 for code for a $3 \times 3$ by $3 \times 3$ matrix multiply.

## Syntax

'34020 Instruction Words

Instruction to '34082

Description

Implied Operands

Algorithm

Temporary Storage

## Outputs

Instruction Type

$$
\begin{array}{ll}
C=R B 9 & \\
R B 9=C T & \\
R B 6=R B 6+(C \times R B 2) & ;(A \times 0 \times B 00+A \times 1 \times B 10+A \times 2 \times B 20) \\
& ;+(A \times 3 \times B 30) \\
R B 7=R B 7+(C \times R B 3) & ;(A \times 0 \times B 01+A \times 1 \times B 11+A \times 2 \times B 21) \\
& ;+(A \times 3 \times B 31) \\
R B 8=R B 8+(C \times R B 4) & ;(A \times 0 \times B 02+A \times 1 \times B 12+A \times 2 \times B 22) \\
& ;+(A \times 3 \times B 32) \\
R B 9=R B 9+(C \times R B 5) & ;(A \times 0 \times B 03+A \times 1 \times B 13+A \times 2 \times B 23) \\
& ;+(A \times 3 \times B 33)
\end{array}
$$

| Type | Syntax |
| :--- | :--- |
| Integer | MMPY3 |
| Double-Precision | MMPY3D |
| Single-Precision | MMPY3F |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |


| 31 | 29 |  |  |  | 0 |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | 0 | 0000 | 1110 | 0000 | 0011 | 111 t | 5000 | 0000 |

This instruction multiplies the matrix B by a vector element, A3. This instruction may be combined with MMPY0, MMPY1, and MMPY2 to multiply matrices of several sizes. $1 \times 4$ by $4 \times 4,4 \times 4$ by $4 \times 4,1 \times 3$ by $3 \times 3$, and $3 \times 3$ by $3 \times 3$ matrix multiplies may be implemented.
A matrix in FPU registers:

| RA0 $=$ B00 | RA1 $=$ B01 | RA2 $=$ B02 | RA3 $=$ B03 |
| :--- | :--- | :--- | :--- |
| RA4 $=$ B10 | RA5 $=$ B11 | RA6 $=$ B12 | RA7 $=$ B13 |
| RA8 $=$ B20 | RA9 $=$ B21 | RB0 $=$ B22 | RB1 $=$ B23 |
| RB2 $=$ B30 | RB3 $=$ B31 | RB4 $=$ B32 | RB5 $=$ B33 |

The accumulated sums from MMPY0, MMPY1 and MMPY2 for the resulting matrix:

```
RB6 \(=(\mathrm{Ax0} \times \mathrm{B} 00)+(\mathrm{Ax1} \times \mathrm{B} 10)+(\mathrm{Ax} 2 \times \mathrm{B} 20)\)
\(R B 7=(A x 0 \times B 01)+(A x 1 \times B 11)+(A x 2 \times B 21)\)
RB8 \(=(\mathrm{Ax0} \times \mathrm{B} 02)+(\mathrm{A} 1 \times \mathrm{B} 12)+(\mathrm{Ax} 2 \times \mathrm{B} 22)\)
\(C T=(A x 0 \times B 03)+(A x 1 \times B 13)+(A x 2 \times B 23)\)
```

The fourth element $(A x 3)$ of a row vector: $\mathrm{RB} 9=\mathrm{Ax} 3$

C

The output of this operation is the result matrix row.
RB6 = Result $x 0$
RB7 $=$ Result $\times 1$
RB8 = Result $x 2$
RB9 $=$ Result $x 3$
CEXEC, short

```
Syntax
'34020 Instruction Words
```

Instruction to '34082

## Description

## Operands

Implied Operands

## Algorithm

Temporary Storage
Outputs

| Type | Syntax |
| :--- | :--- |
| Integer | MNMX1 CRs |
| Double-Precision | MNMX1D CRs |
| Single-Precision | MNMX1F CRs |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | type | size |
| ID |  |  | CRs |  |  |  | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |


| $31 \quad 29$ | 28 | 25 |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 10 | CRs | 00100 | 0000 | 0011 | $111 t$ | 5000 | 0000 |

The 1-D Min/Max function compares the current data to a current minimum value and a current maximum value. If the current data is less than the minimum then the minimum is set to the current data; if the current data is greater than the current maximum then the maximum value is set to the current data. For each current data tested a counter is incremented and when the minimum or maximum values are updated the current counter value is put in a minimum count or maximum count register so that the count of the data responsible for the minimum or maximum is in the respective count register. The INMNMX instruction should be used to initialize the min/max registers before the first MNMX1 instruction.

CRs TMS34082 register containing the value to test for minimum/maximum. Must be in the RA register file.

RB0 = Current integer minimum
RB1 = Current integer maximum
COUNTX contains the counts for the current maximum and minimum values Bits 15-0 are the count value for the current minimum Bits 31-16 are the count value for the current maximum

```
If CRs < RB0
    RB0 = CRs ; RB0 tracks current X minimum
    COUNTX bits 15-0 = Count
    If CRs > RB1
    RB1 = CRs ; RB1 tracks current X maximum
    COUNTX bits 31-16 = Count
    Count = Count +1
```

None
$\mathrm{RB}=$ minimum of (CRs and RB0)
RB1 = maximum of (CRs and RB1)
COUNTX 15-0 is updated to the current count if CRs is a minimum.
COUNTX 31-16 is updated to the current count if CRs is a maximum.

Instruction Type CEXEC, short
Syntax
'34020
Instruction Words

Instruction to '34082

## Description

Operands

Implied Operands

| Type | Syntax |
| :--- | :--- |
| Integer | MNMX2 $C R s_{1}, C R s_{2}$ |
| Double-Precision | MNMX2D $C R s_{1}, C R s_{2}$ |
| Single-Precision | MNMX2F $C R s_{1}, C R s_{2}$ |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | type | size |
| ID |  |  | CRs 1 |  |  |  | 0 | 0 | 1 | 1 | 0 | CRs2 |  |  |  |



The 2-D Min/Max function compares two current data values ( X and Y ) to a current minimum value and a current maximum value. If the current data is less than the minimum then the minimum is set to the current data; if the current data is greater than the current maximum then the maximum value is set to the current data. For each current data tested a counter is incremented and when the minimum or maximum values are updated, the current counter value is put in a minimum count or maximum count register so that the count of the data responsible for the minimum or maximum is in the respective count register. The INMNMX instruction should be used to initialize the min/max registers before the first MNMX2 instruction.
$\mathrm{CRs}_{1}$ TMS34082 register containing the value to test for X minimum/maximum. Must be in RA register file.
$\mathrm{CRs}_{2}$ TMS34082 register containing the value to test for Y minimum/maximum. Must be in RA register file.

RB0 $=$ current $X$ minimum
RB1 $=$ current $X$ maximum
RB2 $=$ current $Y$ minimum
RB3 $=$ current $Y$ maximum
COUNTX contains the counts for the current maximum and minimum values Bits 15-0 are the count value for the current X minimum Bits 31-16 are the count value for the current $X$ maximum
COUNTY contains the counts for the current $Y$ maximum and minimum values Bits 15-0 are the count value for the current $Y$ minimum Bits 31-16 are the count value for the current $Y$ maximum

| Algorithm | If $\mathrm{CRs}_{1}<\mathrm{RB} 0$ <br> $\mathrm{RBO}=\mathrm{CRs}_{1}$ <br> ; RBO tracks current $X$ minimum <br> COUNTX bits $15-0=$ Count <br> If $\mathrm{CRs}_{1}>$ RB1 <br> $\mathrm{RB} 1=\mathrm{CRs}_{1}$ <br> ; RB1 tracks current X maximum <br> COUNTX bits $31-16=$ Count <br> If $\mathrm{CRs}_{2}<\mathrm{RB} 2$ <br> RB2 $=$ CRs $_{2} \quad ;$ RB2 tracks current $Y$ minimum <br> COUNTY bits $15-0=$ Count <br> If $\mathrm{CRs}_{2}>\mathrm{RB} 3$ <br> $\mathrm{RB} 3=\mathrm{CRs}_{2} \quad$; RB3 tracks current Y maximum <br> COUNTY bits $31-16=$ Count <br> Count $=$ Count +1 |
| :---: | :---: |
| Temporary Storage | None |
| Outputs | $\mathrm{RBO}=$ minimum of $\left(\mathrm{CRs}_{1}\right.$ and RBO) <br> RB1 = maximum or (CRs 1 and RB1) <br> $\mathrm{RB} 2=$ minimum of $\left(\mathrm{CRs}_{2}\right.$ and RB2) <br> RB3 $=$ maximum or ( $\mathrm{CRs}_{2}$ and RB3) <br> COUNTX $15-0$ is updated to the current count if CRs $_{1}$ is a $X$ minimum. COUNTX 31-16 is updated to the current count if CRs $_{1}$ is a $X$ maximum. COUNTY 15-0 is updated to the current count if $\mathrm{CRs}_{2}$ is a Y minimum. COUNTY 31-16 is updated to the current count if $\mathrm{CRs}_{2}$ is a Y maximum. |
| Instruction Type | CEXEC, short |

Syntax

Execution
'34020
Instruction Words

| Type | Syntax |
| :--- | :--- |
| Integer | MOVE Rs, CRd |
| Single-Precision | MOVF Rs, CRd |

Rs $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | R | Rs |  |  |  |
| 0 | 1 | 0 | 0 | 1 | 1 | 0 | type | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | CRd |  |  |  |  |

Instruction to '34082

| 31 | 2928 | 2120 | 16 |  |  |  |  |
| :---: | ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | 0000 | 0000 | CRd | 0100 | $110 t$ | 0000 | 0000 |

## Description

Instruction Type

## Example

Rs TMS34020 source register for the 32-bit value to TMS34082
CRd TMS34082 destination register to hold the 32-bit value MOVx moves the contents of Rs into CRd.

CMOVGC, one register
MOVF A5, RA7
This example moves the single-precision floating-point contents of TMS34020 register A5 into TMS34082 register RA7.

## Syntax

## Execution

'34020
Instruction Words

Instruction to '34082

Operands

## Description

Instruction Type
Example

| Type | Syntax |
| :--- | :--- |
| Integer | MOVE $R s_{1}, R s_{2}, C R d$ |
| Double-Precision | MOVD $R s_{1}, R s_{2}, C R d$ |
| Single-Precision | MOVF $R s_{1}, R s_{2}, C R d$ |

Integer or Single-Precision:
$\mathrm{Rs}_{1} \rightarrow \mathrm{CRd}$
advance to next TMS34082 register
$\mathrm{Rs}_{2} \rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | R | $\mathrm{Rs}_{1}$ |  |
| 0 | 1 | 0 | 0 | 1 | 1 | 0 | type | size | 0 | 0 | R | Rs 2 |  |
| ID |  |  |  |  |  |  |  |  |  |  |  | 0 | 0 |


| 2928 |  | 2019 |  | 1615 |  |  | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 10 | 0000 | 0000 | CRd | 0100 | 110 t | s 000 | 0000 |

$R_{1} \quad$ TMS34020 source register for the first value (or half of a double-precision value) to TMS34082
$\mathrm{Rs}_{2} \quad$ TMS34020 source register for the second value (or the remaining half of the double-precision value) to TMS34082

CRd TMS34082 destination register that holds the first value. For integer and single-precision moves, the second value will be placed in the next register in the TMS34082 register sequence list.

MOVx moves the contents of $\mathrm{Rs}_{1}$ and $\mathrm{Rs}_{2}$ into CRd (and CRd+1 for integer and single-precision instructions).

For double-precision moves, the TMS34082 configuration register LOAD bit determines whether the LSBs or the MSBs will be moved first:

If the LOAD bit $=1, \quad$ then the LSBs are moved first (32 LSBs of the fraction)

If the LOAD bit $=0, \quad$ then the MSBs are moved first (sign, exponent, and 20 MSB of the fraction)

The LOAD bit default is 0 .
CMOVGC, two registers
MOVE A5, A6, RA7
This instruction moves the integer contents of TMS34020 registers A5 and A6 into TMS34082 registers, RA7 and RA8, respectively.

Syntax
Execution
'34020
Instruction Words

Instruction to '34082

## Operands

Description

MOVD $C R s, R d_{1}, R d_{2}$

CRs (MSH or LSH) $\rightarrow$ Rd $_{1}$ CRs (LSH or MSH) $\rightarrow \mathrm{Rd}_{2}$

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 2 | 1 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | R | $\mathrm{Rd}_{1}$ |  |
| 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | R | $\mathrm{Rd}_{2}$ |  |
| ID |  |  |  |  |  |  |  |  |  |  |  |  | 0 |


| 31 | 2928 | 1615 | 0 |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0000 | 0000 | CRd | 0100 | 1111 | 1000 | 0000 |

CRs TMS34082 source register holding the value to TMS34020
$\mathrm{Rd}_{1} \quad$ TMS34020 destination register for half the double-precision value
$\mathrm{Rd}_{2} \quad$ TMS34020 destination register for the remaining half of the double-precision value

MOVD moves one 64-bit value from TMS34082 register CRs to TMS34020 registers $\mathrm{Rd}_{1}$ and $\mathrm{Rd}_{2}$.

The TMS34082 configuration register LOAD bit determines whether the LSBs or the MSBs will be moved first:

> If the LOAD bit =1, $\quad \begin{aligned} & \text { then the LSBs are moved first } \\ & \text { (32 LSBs of the fraction) }\end{aligned}$ If the LOAD bit $=0$, $\begin{aligned} & \text { then the MSBs are moved first } \\ & \text { (sign, exponent, and } 20 \mathrm{MSBs} \text { of the fraction) }\end{aligned}$

The LOAD bit default is 0 .
CMOVCG, two registers
MOVD RA7, A5, A6
This example moves the double-precision floating-point contents of TMS34082 register RA7 to TMS34020 registers A5 and A6. The order (MSBs or LSBs in A5) depends on the value of the LOAD bit in the configuration register.

## Syntax

| Type | Syntax |
| :--- | :--- |
| Integer | MOVE $* R s+, C R d, R d$ |
| Double-Precision | MOVD $* R s+, C R d, R d$ |
| Single-Precision | MOVF $* R s+, C R d, R d$ |

## Execution

'34020
Instruction Words

Instruction to '34082


Rs TMS34020 source register containing the address of the first 32-bit value (or half of the 64-bit value) to move to the TMS34082

CRd TMS34082 destination register to hold the first value
Rd TMS34020 register containing the number of 32-bit transfers to make. This value must be in the range 0 to 31

$$
\begin{array}{ll}
\text { If } \mathrm{Rd}=0, & \text { then } 32 \text { 32-bit transfers are made } \\
\text { If } \mathrm{Rd}=1 \rightarrow 31, & \text { then Rd 32-bit transfers are made }
\end{array}
$$

Note that because 64-bit floating-point values require two 32-bit moves, an odd number in Rd will give unpredictable results.

## Description

Instruction Type

## Example

MOVx moves values from memory beginning at the address in Rs into TMS34082 registers beginning at CRd. Rs is incremented after each transfer. CRs is advanced to the next register in the sequence list after each 32-bit transfer for integer and single-precision moves, after every two 32-bit transfers for double-precision moves. The number of 32-bit transfers made is determined by the value of $R d$.

For double-precision moves, the TMS34082 configuration register LOAD bit determines whether the LSBs or the MSBs will be moved first:

> If the LOAD bit $=1, \quad \begin{aligned} & \text { then the LSBs are moved first } \\ & \text { (32 LSBs of the fraction) }\end{aligned}$ If the LOAD bit $=0$, $\begin{aligned} & \text { then the MSBs are moved first } \\ & \text { (sign, exponent, and } 20 \text { MSBs of the fraction) }\end{aligned}$

The LOAD bit default is 0 .
CMOVMC, postincrement, register count
MOVE *A5+, RA7, B7
This instruction moves integer values from TMS34020 memory location pointed to by A5 to TMS34082 registers beginning at RA7. After each 32-bit transfer, register A5 is incremented, and the TMS34082 destination is advanced to the next register in the TMS34082 register sequence list. B7 holds the number of 32-bit transfers to be made.
Syntax
Execution
'34020
Instruction Words

Instruction to '34082

## Operands

## Description

Type
Syntax
Integer
Double-Precision
Single-Precision
Repeat count times
$* R s \rightarrow$ CRd
$R$ Rs $+32 \rightarrow$ Rs MOVE *Rs+, CRd, [,count]
MOVD *Rst, CRd, [,count]
MOVF *Rst, CRd, [,count]

##  <br> \#Femsempe

advance to the next TMS34082 register

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |  | transfers |  |  |
| 1 | 0 | 0 | 0 | 1 | 1 | 0 | type | size | 0 | 0 | $R$ | Rs |  |  |
| ID |  |  |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  | CRd |


| 31 | 2928 | 1615 |  | 0 |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0000 | 0000 | CRd | 1000 | $110 t$ | 5000 |

Rs TMS34020 source register containing the address of the first 32-bit value (or half the first 64-bit value) to move to the TMS34082

CRd TMS34082 destination register to hold the first operand
count The number of 32 -bit or 64 -bit transfers to make. This value must be in the range 1 to 32 for integer and single-precision moves or 1 to 16 for double-precision moves. The default value is 1 . Count determines the value of transfers:

Integer or Single-Precision:
If count $=32, \quad$ then transfers $=0$
If count $=1 \rightarrow 31, \quad$ then transfers $=$ count
Double-Precision:

$$
\begin{array}{ll}
\text { If count }=16, & \text { then transfers }=0 \\
\text { If count }=1 \rightarrow 15, & \text { then transfers }=2 \times \text { count }
\end{array}
$$

MOVx moves values from memory beginning at the address in Rs into TMS34082 registers beginning at CRd. Rs is incremented after each transfer. CRs is advanced to the next register in the sequence list after each 32-bit transfer for integer and single-precision moves, after everytwo 32-bit transfers for double-precision moves. The number of 32 -bit transfers made is determined by the value of count.

For double-precision moves, the TMS34082 configuration register LOAD bit determines whether the LSBs or the MSBs will be moved first:

If the LOAD bit $=1, \quad$ then the LSBs are moved first ( 32 LSBs of the fraction)

If the LOAD bit $=0, \quad$ then the MSBs are moved first (sign, exponent, and 20 MSBs of the fraction)

The LOAD bit default is 0 .

Instruction Type
Example

CMOVMC, postincrement, constant count
MOVD *A5+, RB7, 4
This example moves four 64-bit double-precision floating-point values from TMS34020 memory location pointed to by A5 to TMS34082 registers beginning at RB7. After each 32-bit transter, register A5 is incremented; after every two 32-bit transfers, the TMS34082 destination is advanced to the next register in the TMS34082 register sequence list. Count specifies that four 64-bit transfers (eight 32 -bit transfers) are made.
Syntax
Execution
'34020
Instruction Words

Instruction to ’34082

## Operands

## Description

Syntax MOVE -*Rs, CRd, [, count] MOVD -*Rs, CRd [, count] MOVF -*Rs, CRd [, count]

Repeat count times
$R s-32 \rightarrow R s$
${ }^{*}$ Rs $\rightarrow$ CRd


advance to the next TMS34082 register


Rs TMS34020 source register containing the address of the bit immediately after the first 32-or 64-bit value to move to the TMS34082

CRd TMS34082 destination register to hold the first value
count The number of 32- or 64-bit transfers to make. This value must be in the range 1 to 32 for integer and single-precision moves or 1 to 16 for double-precision moves; the default value is 1 . Count determines the value of transfers:

Integer or Single-Precision:
If count $=32, \quad$ then transfers $=0$
If count $=1 \rightarrow 31, \quad$ then transfers $=$ count
Double-Precision:

$$
\begin{array}{ll}
\text { If count }=16, & \text { then transfers }=0 \\
\text { If count }=1 \rightarrow 15, & \text { then transfers }=2 \times \text { count }
\end{array}
$$

MOVx moves values from memory beginning at the address in (Rs -32 ) into TMS34082 registers beginning at CRd. Before each transfer, the contents of Rs are decremented; after each transfer (or every two transfers for double-precision moves), the TMS34082 destination is advanced to the next register in the TMS34082 register sequence list. The number oftransters made is determined by the value of count.
For double-precision moves, the TMS34082 configuration register LOAD bit determines whether the LSBs or the MSBs will be moved first:

If the LOAD bit $=1$, then the LSBs are moved first ( 32 LSBs of the fraction)

If the LOAD bit $=0, \quad$ then the MSBs are moved first (sign, exponent, and 20 MSBs of the fraction)

The LOAD bit default is 0 .
Instruction Type CMOVMC, predecrement, constant count

## Example <br> MOVF - *A5, RB7, 4

This example moves four 32 -bit single-precision floating-point values from TMS34020 memory location pointed to by (A5-32) to TMS34082 registers beginning at RB7. Before each 32-bit transfer, register A5 is decremented; after each transfer, TMS34082 destination is advanced to the next register in the TMS34082 register sequence list. Count specifies that four 32-bit transfers are made.

Syntax

## Execution

'34020 Instruction Words

Instruction to '34082

## Operands

## Description

| Type | Syntax |
| :--- | :--- |
| Integer | MOVE $C R d, * R d+[$, count $]$ |
| Double-Precision | MOVD CRd,*Rd $+[$, count $]$ |
| Single-Precision | MOVF CRd, *Rd + [, count $]$ |

Repeat count times
CRs $\rightarrow$ \& Rd
$R d+32 \rightarrow R d$


advance to the next TMS34082 register

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | R | 0 |  |  |
| 1 | 0 | 0 | 0 | 1 | 1 | 1 | type | size | 0 | 0 |  | transfers |  |  |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  | CRd |  |  |


| 31 | 2019 |  | 1615 |  | 0 |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0000 | 0000 | CRd | 1000 | 111 t | s 000 | 0000 |

CRs TMS34082 source register for the first 32-bit value (or half of the first 64-bit value) to TMS34020 memory

Rd TMS34020 register containing the address for the first value transferred
count The number of 32- or 64-bit transfers to make. This value must be in the range 1 to 32 for integer and single-precision moves or 1 to 16 for double-precision moves. The default value is 1 . Count determines the value of transfers:

Integer or Single-Precision:

$$
\begin{array}{ll}
\text { If count }=32, & \text { then transfers }=0 \\
\text { If count }=1 \rightarrow 31, & \text { then transfers }=\text { count }
\end{array}
$$

Double-Precision:

$$
\begin{array}{ll}
\text { If count }=16, & \text { then transfers }=0 \\
\text { If count }=1 \rightarrow 15, & \text { then transfers }=2 \times \text { count }
\end{array}
$$

MOVx moves the values from TMS34082 registers beginning at CRd to memory beginning at the address in Rd. After each 32-bit transfer, Rd is incremented. The TMS34082 register is advanced to the next register in the TMS34082 register sequence after every 32-bit transfer for integer and single-precision moves or after every second 32-bit transfer for double-precision moves. The number of transfers made is determined by the value of count.

For double-precision moves, the TMS34082 configuration register LOAD bit determines whether the LSBs or the MSBs will be moved first:

| If the LOAD bit $=1$, | then the LSBs are moved first <br> (32 LSBs of the fraction) |
| :--- | :--- |
| If the LOAD bit $=0$, | then the MSBs are moved first <br> (sign, exponent, and20 MSBs of the fraction) |

The LOAD bit default is 0 .

Instruction Type
Example

CMOVCM, postincrement, constant count
MOVE RB7, *A5+, 4
This example moves four 32-bit integer values from TMS34082 registers beginning at RB7 to TMS34020 memory pointed to by A5. After each 32-bit transfer, register A5 is incremented, and the TMS34082 destination is advanced to the next register in the TMS34082 register sequence list. Count specifies that four 32-bit transfers are made.
Syntax
Execution
'34020
Instruction Words

Instruction to '34082

## Operands

## Description

| Type | Syntax |
| :--- | :--- |
| Integer | MOVE CRs, $-* R d[$, count $]$ |
| Double-Precision | MOVD CRs $-* R d$, count |
| Single-Precision | MOVF CRs,$-* R d[$, count $]$ |

Repeat count times
$\mathrm{Rd}-32 \rightarrow \mathrm{Rd}$
CRs $\rightarrow$ Rd


advance to the next TMS34082 register

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | $R$ | Rd |  |  |  |
| 1 | 0 | 0 | 0 | 1 | 1 | 1 | type | size | 0 | 0 |  | transfers |  |  |  |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  | CRd |  |  |  |



CRd TMS34082 source register for the first value to TMS34020 memory
Rd TMS34020 register containing the address of the bit immediately following the 32 bits (or 64 bits for double-precision moves) used to store the first value transferred.
count The number of 32- or 64-bit transfers to make. This value must be in the range 1 to 32 for integer and single-precision moves or 1 to 16 for double-precision moves. The default value is 1 . Count determines the value of transfers:

Integer or Single-Precision:

| If count $=32$, | then transfers $=0$ |
| :--- | :--- |
| If count $=1 \rightarrow 31$, |  |
| then transfers $=$ count |  |

Double-Precision:
If count $=16, \quad$ then transfers $=0$
If count $=1 \rightarrow 15$,
then transfers $=2 \times$ count

MOVx moves the values from TMS34082 registers beginning at CRd to memory beginning at the address (Rd-32). Before each 32-bit transfer, Rd is decremented; after each 32-bit transfer (or every two transfers for double-precision moves), the TMS34082 register is advanced to the next register in the TMS34082 register sequence. The number of 32-bit transfers made is determined by the value of count.

For double-precision moves, the TMS34082 configuration register LOAD bit determines whether the LSBs or the MSBs will be moved first:
$\begin{array}{ll}\text { If the LOAD bit }=1, & \begin{array}{l}\text { then the LSBs are moved first } \\ (32 \text { LSBs of the fraction) }\end{array} \\ \text { If the LOAD bit }=0, & \begin{array}{l}\text { then the MSBs are moved first } \\ \text { (sign, exponent, and } 20 \text { MSBs of the fraction) }\end{array}\end{array}$
The LOAD bit default is 0 .

Instruction Type
Example

CMOVCM, predecrement, constant count
MOVD RB7, -*A5, 2
This example moves two 64-bit double-precision values from TMS34082 registers beginning at RB7 to TMS34020 memory pointed to by (A5 - 32). Before each 32-bit transfer, register A5 is decremented; after every two 32-bit transfers, the TMS34082 destination is advanced to the next register in the TMS34082 register sequence list. Count specifies that two 64-bit transfers are made (four 32-bit transfers).

| Syntax | Type |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | Integ <br> Doub <br> Sing | $\begin{aligned} & \mathrm{er} \\ & \text { le-P } \\ & \mathrm{e}-\mathrm{P} \end{aligned}$ | recis cisio |  |  | MO <br> MO <br> MO | $\begin{aligned} & \text { IE } \\ & \text { ID } \\ & \text { IF } \end{aligned}$ | CRs CRs, CRs, |  | [, co <br> [, co <br> [, co |  |  |  |  |  |  |
| Execution |  | Rs dva | unt <br> $\rightarrow \mathrm{CR}$ <br> ce to |  |  |  |  | $82 \text { C }$ |  |  | d | egist |  |  |  |  |
| '34020 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| Instruction Words | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | type | size |
|  |  | ID |  |  |  | Rs |  |  |  | unt |  |  |  | CRd |  |  |
| Instruction to '34082 | 31 | 29 | 8 |  | 24 |  | 19 | - |  | 15 |  |  |  |  |  | 0 |
|  | 1 D |  | CR |  |  | unt |  | CRd |  | 000 | 1 | $101 t$ |  | 000 | 0 | 00 |

## Operands

Description

## Instruction Type

Example

CRs Source register RA that holds the first value to move
CRd Destination register to hold the first value moved
count The number of registers to move. This value must be in the range of 1 to 15 ; the default is 1 .

MOVx moves count values from registers starting with CRs to registers starting with CRd. Both source and destination registers are advanced to the next register in the TMS34082 register sequence after each move.

The first source register, CRs, must be in the RA register file.
CEXEC, short
MOVF RA7, RB4, 3
This example moves three 32-bit single-precision floating-point values from TMS34082 register RA7, RA8, and RA9 to TMS34082 registers RB4, RB5, and RB6, respectively.

Syntax
Execution
'34020
Instruction Words

Instruction to '34082

## Operands

## Implied Operands

## Description

MOVFSRAM *Rd+[, count]
Repeat count times
*MCADDR $\rightarrow$ *Rd
$\mathrm{Rd}+32 \rightarrow \mathrm{Rd}$
MCADDR $+32 \rightarrow$ MCADDR

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | R | Rd |  |  |  |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | transfers |  |  |  |  |
| ID |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |


| 31 | 2928 |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 00011 | 1100 | 0000 | 1001 | 1110 | 0000 | 0000 |

Rd TMS34020 register (indirect postincrement) containing the address of the first 32-bit integer value transferred
count The number of 32-bit transfers to make. This value must be in the range 1 to 32 ; the default value is 1 . Count determines the value of transfers:

If count $=32, \quad$ then transfers $=0$
If count $=1 \rightarrow 31, \quad$ then transfers $=$ count
MCADDR
TMS34082 indirect address register containing the first address in memory on the MSD port for the first 32-bit value to move

MOVFSRAM moves the 32-bit values from memory on the MSD point beginning with the address in MCADDR to memory beginning at the address in Rd. After each 32 -bit transfer, Rd and MCADDR are incremented. The number of 32-bit transfers made is determined by the value of count.

NOTE: Since MCADDR refers to 32-bit word addresses and Rs refers to bit addresses, MCADDR is incremented by 1 (one 32-bit word) and Rs is incremented by 32 (one 32-bit word).

CMOVCM, postincrement, constant count
Syntax
Execution
'34020
Instruction Words

Instruction to '34082

Operands

Implied Operands

Description

Instruction Type

MOVFSRAM -*Rd [, count]
Repeat count times
$\mathrm{Rd}-32 \rightarrow \mathrm{Rd}$
*MCADDR $\rightarrow$ *Rd
MCADDR $+32 \rightarrow$ MCADDR

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | R | Rd |  |  |  |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | transfers |  |  |  |  |
| ID |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |


| 31 | 2928 |  | 0 |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0 | 0011 | 1100 | 0000 | 1001 | 1110 | 0000 |

Rd TMS34020 register (indirect predecrement) containing the address of the bit immediately following the 32-bits used to store the first 32-bit integer value transferred
count The number of 32-bit transfers to make. This value must be in the range 1 to 32 ; the default value is 1 . Count determines the value of transfers:

$$
\begin{array}{ll}
\text { If count }=32, & \text { then transfers }=0 \\
\text { If count }=1 \rightarrow 31, & \text { then transfers }=\text { count }
\end{array}
$$

MCADDR
TMS34082 indirect address register containing the first address in memory on the MSD port for the first 32-bit value to move

MOVFSRAM moves the 32-bit values from memory on the MSD port beginning at the address in MADDR to memory beginning at the address (Rd-32). Before each 32-bit transfer, Rd is decremented; after each 32-bit transfer, MCADDR is incremented. The number of 32 -bit transfers made is determined by the value of count.

NOTE: Since MCADDR refers to 32-bit word addresses and Rs refers to bit addresses, MCADDR is incremented by 1 (one 32-bit word) and Rs is decremented by 32 (one 32-bit word).

CMOVCM, predecrement, constant count

## Syntax

## Execution

'34020
Instruction Words

Instruction to '34082

Operands

Implied Operands

Description

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 0 |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | $R$ |  | $R \mathrm{Rd}$ |  |  |  |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | $R$ |  | $R \mathrm{Rs}$ |  |  |  |
| ID |  |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |

MOVTSRAM *Rst, Rd

If $R d=0$
Repeat 32 times
*Rs $\rightarrow$ *MCADDR
Rs $+32 \rightarrow$ Rs
MCADDR $+32 \rightarrow$ MCADDR

If $\mathrm{Rd}=1 \rightarrow 31$
Repeat Rdtimes
*Rs $\rightarrow$ *MCADDR
Rs $+32 \rightarrow$ Rs
MCADDR $+32 \rightarrow$ MCADDR

| 31 | 2828 |  |  | 0 |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $1 D$ | 00001 | 1100 | 0000 | 1001 | 1110 | 0000 | 0000 |

Rs TMS34020 source register (indirect postincrement) containing the address of the first 32-bit value to move

Rd TMS34020 register containing the number of 32-bit transfers to make. This value must be in the range 0 to 31
$\begin{array}{ll}\text { If } R d=0, & \text { then } 32 \text { 32-bit transfers are made } \\ \text { If } R d=1 \rightarrow 31, & \text { then Rd 32-bit transfers are made } .\end{array}$
MCADDR
TMS34082 indirect address register containing the first address in memory on the MSD port where the 32-bit values are to be stored

MOVTSRAM moves 32-bit values from memory beginning at the address in Rs into memory on the MSD port beginning at the address in MCADDR. After each transfer, Rs and MCADDR are incremented. The number of 32-bit transfers made is determined by the contents of Rd.

NOTE: Since MCADDR refers to 32-bit word addresses and Rs refers to bit addresses, MCADDR is incremented by 1 (one 32-bit word) and Rs is incremented by 32 (one 32-bit word).

CMOVMC, postincrement, register count
Syntax
Execution
34020
Instruction Words

Instruction to '34082

Operands

Implied Operands

Description

Instruction Type

MOVTSRAM *Rs+ [, count]
Repeat count times
*Rs $\rightarrow$ *MCADDR
Rs $+32 \rightarrow$ Rs
MCADDR $+32 \rightarrow$ MCADDR

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1. | 1 | 0 | 1 | 0 | 0 | transfers |  |  |  |  |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | R | Rs |  |  |  |
| ID |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |



Rs TMS34020 source register (indirect postincrement) containing the address of the first 32-bit value to move
count The number of 32-bit transfers to make. This value must be in the range 1 to 32 ; the default value is 1 . Count determines the value of transfers:
$\begin{array}{ll}\text { If count }=32, & \text { then transfers }=0 \\ \text { If count }=1 \rightarrow 31, & \text { then transfers }=\text { count }\end{array}$
MCADDR
TMS34082 indirect address register containing the first address in memory on the MSD port where the 32-bit values are to be stored

MOVTSRAM moves the 32-bit values from memory beginning at the address in Rs into memory on the MSD port beginning at the address in MCADDR. After each transfer, Rs and MCADDR are incremented. The number of 32-bit transfers made is determined by the value of count.

NOTE: Since MCADDR refers to 32-bit word addresses and Rs refers to bit addresses, MCADDR is incremented by 1 (one 32-bit word) and Rs is incremented by 32 (one 32-bit word).

CMOVMC, postincrement, constant count

## Syntax

## Execution

'34020 Instruction Words

Instruction to '34082

## Operands

## Implled Operands

## Description

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 2 | 1 | 0 |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |  | transfers |  |  |  |  |
| 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | R |  | Rs |  |  |  |
| ID |  |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |

MOVTSRAM -*Rs [, count]
Repeat count times
Rs $-32 \rightarrow$ Rs
*Rs $\rightarrow$ *MCADDR
MCADDR $+32 \rightarrow$ MCADDR
$31 \quad 2928$
ID $\quad 00001 \quad 1100 \quad 0000 \quad 1001 \quad 1110 \quad 0000 \quad 0000$
Rs TMS34020 source register (indirect predecrement) containing the address of the bit immediately after first 32 -bit integer to move to the coprocessor
count The number of 32-bit transfers to make. This value must be in the range 1 to 32 ; the default value is 1 . Count determines the value of transfers:

$$
\begin{array}{ll}
\text { If count }=32, & \text { then transfers }=0 \\
\text { If count }=1 \rightarrow 31, & \text { then transfers }=\text { count }
\end{array}
$$

## MCADDR

TMS34082 indirect address register containing the first address in memory on the MSD port where the 32-bit values are to be stored

MOVTSRAM moves the 32-bit values from memory beginning at the address in (Rs - 32) into memory on the MSD port beginning at the address in MCADDR. Before each transfer, the contents of Rs are decremented; after each transfer, the contents of the MCADDR register are incremented. The number of 32 -bit transfers made is determined by the value of count.

NOTE: Since MCADDR refers to 32-bit word addresses and Rs refers to bit addresses, MCADDR is incremented by 1 (one 32-bit word) and Rs is decremented by 32 (one 32-bit word).

CMOVMC, predecrement, constant count

Syntax
Execution
'34020
Instruction Words

Instruction to '34082

## Operands

Description

Instruction Type
Example

| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | CRs 1 | CRs 2 | CRd | 0101 | $000 t$ | 0000 | 0000 |  |  |


| Type | Syntax |
| :--- | :--- |
| Integer | MPYS $R s_{1}, R s_{2}, C R s_{1}, C R s_{2}, C R d$ |
| Single-Precision | MPYF $R s_{1}, R s_{2}, C R s_{1}, C R s_{2}, C R d$ |

$\mathrm{Rs}_{1} \rightarrow \mathrm{CRs}_{1}$
$\mathrm{Rs}_{2} \rightarrow \mathrm{CRs}_{2}$
$\mathrm{CRs}_{1} \times \mathrm{CRs}_{2} \rightarrow \mathrm{CRd}$

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | $R$ | 0 |  |
| 0 | 1 | 0 | 1 | 0 | 0 | 0 | type | 0 | 0 | 0 | $R$ | Rs1 |  |
| CRs1 |  |  |  |  |  |  |  |  |  | 0 | Rs2 |  |  |
| CRs 2 |  |  |  |  |  |  |  |  |  | CRd |  |  |  |

$\mathrm{Rs}_{1} \quad$ TMS34020 source register for the first value to coprocessor
$\mathrm{Rs}_{2} \quad$ TMS34020 source register for the second value to coprocessor
$\mathrm{CRs}_{1}$ Coprocessor register to contain the first operand
$\mathrm{CRs}_{2}$ Coprocessor register to contain the second operand
CRd Coprocessor destination register
MPYx loads the contents of $R s_{1}$ and $\mathrm{Rs}_{2}$ into $\mathrm{CRs}_{1}$ and $C R s_{2}$ respectively, multiplies $\mathrm{CRs}_{1} \times \mathrm{CRs}_{2}$, and stores the result in CRd. The two operands, $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, must be in opposite register files.

The double-precision form of this instruction is not supported.
CMOVGC, two registers
MPYS A5, A6, RA5, RB6, RA7
This example loads TMS34020 registers A5 and A6 into TMS34082 registers RA5 and RB6 respectively, multiplies the contents of RA5 by RB6, and stores the integer result in RA7.


Syntax

Execution
'34020
Instruction Words

Instruction to '34082

Operands

Description

Instruction Type
Example

| Type | Syntax |
| :--- | :--- |
| Integer | MPYS $-* R s+, C R s_{1}, C R s_{2}, C R d$ |
| Double-Precision | MPYD $-{ }^{*} R s, C R s_{1}, C R s_{2}, C R d$ |
| Single-Precision | MPYF $-* R s+, C R s_{1}, C R s_{2}, C R d$ |

Rs $-32 \rightarrow$ Rs
*Rs $\rightarrow \mathrm{CRs}_{1}$
$R s-32 \rightarrow$ Rs

$\star \mathrm{Rs} \rightarrow \mathrm{CRs}_{2}$


$\mathrm{CRs}_{1} \times \mathrm{CRs}_{2} \rightarrow \mathrm{CRd}$

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | count |  |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 | type | size | 0 | 0 | R |  |  |  |
| ID |  |  | CRs1 |  |  |  | CRs2 |  |  |  | CRd |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | CRs1 | CRs 2 | CRd | 1001 | $000 t$ | 5000 | 0000 |  |

Rs TMS34020 source register containing the memory address
CRs $_{1}$ Coprocessor register to contain the first operand
$\mathrm{CRs}_{2}$ Coprocessor register to contain the second operand
CRd Coprocessor destination register
MPYx loads the contents of memory pointed to by Rs into $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, multiplies $\mathrm{CRs}_{1}$ by $\mathrm{CRs}_{2}$ and stores the result in CRd. Before each load from memory, Rs is decremented by 32. The two operands, $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, must be in opposite register files.

CMOVMC, predecrement, constant count
MPYD -*A5, RA5, RB6, RA7
This example loads memory starting at the address given by TMS34020 register A5 minus 32 into coprocessor registers RA5 and RB6, multiplies the contents of RA5 by RB6 and stores the result in RA7.


Syntax
Execution
'34020
Instruction Words

Instruction to '34082

Operands

Description

Instruction Type
Example

$\mathrm{Rs}_{1} \quad$ TMS34020 source register for the value (or half the value for doubleprecision) to TMS34082
$\mathrm{Rs}_{2}$ TMS34020 source register for the remainder of the 64-bit doubleprecision floating-point value to TMS34082

CRs TMS34082 register containing the operand
CRd TMS34082 destination register
NEGx loads the contents of $\mathrm{Rs}_{1}$ (and $\mathrm{Rs}_{2}$ for double-precision) into register CRs, negates CRs, and stores the result in CRd. The integer instruction (NEG) takes the 2 s complement of the value.

The source register, CRs, must be in the RA TMS34082 register file.
CMOVGC, one or two registers
NEGD A5, A6, RA5, RB7
This example loads the double-precision floating-point contents of TMS34020 registers A5 and A6 into RA5, negates the contents of RA5 and stores the result in RB7.
Syntax
Execution
'34020
Instruction Words

Instruction to '34082

| Type | Syntax |
| :--- | :--- |
| Integer | NEG $* R s+, C R s, C R d$ |
| Double-Precision | NEGD $* R s+, C R s, C R d$ |
| Single-Precision | NEGF $* R s+, C R s, C R d$ |


| *Rs $\rightarrow$ CRs |
| :---: |
| $R s+32 \rightarrow$ Rs |
|  |
|  |
| $-\mathrm{CRs} \rightarrow$ CRd |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 10 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | transfers |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | type | size | 0 | 0 | R | Rs |  |  |
|  | ID |  | CRs |  |  |  | 0 | 0 | 1 | 1 | CRd |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 20 | 18 | 16 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs | 0011 | CRd | 1001 | 111 t | $\mathbf{s} 000$ | 0000 |

Rs TMS34020 register containing the memory address
CRs TMS34082 register to contain the operand
CRd TMS34082 destination register
NEGx loads the contents of memory pointed to by Rs into CRs, negates the contents of CRs, and stores the result in CRd. The integer instruction (NEG) takes the 2s complement of the value. After each load from memory, Rs is incremented by 32.

The source register, CRs, must be in the RA TMS34082 register file.
CMOVMC, postincrement, constant count
NEGF *A5+, RA5, RB7
This example loads memory at the address given by TMS34020 register A5 into TMS34082 register RA5, negates the contents of RA5, and stores the result in RB7.


Syntax
Execution
'34020
Instruction Words

Instruction to '34082

## Operands

## Description

Instruction Type

## Example

NOT Rs, CRs, CRd
Rs $\rightarrow$ CRs
NOT CRs $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | R | Rs |  |  |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | CRs |  |  |  | 0 | 0 | 0 | 1 | CRd |  |  |  |  |


| 31 | 29 | 28 | 25 | $24 \quad 20$ | 19 |  | 16 | 15 |  |  | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID |  | CRs |  | 0001 |  | CRd |  | 0101 | 1110 | 0000 | 0000 |

Rs TMS34020 source register for the 32-bit integer value to TMS34082
CRs TMS34082 register to contain the 32-bit integer operand
CRd TMS34082 destination register
NOT loads the contents (integer) of Rs into the CRs, takes the 1s complement of the contents of register CRs, and stores the result in CRd.

The source register, CRs, must be in the RA TMS34082 register file.
CMOVGC, one register
NOT A5, RA5, RA7
This example loads TMS34020 register A5 into TMS34082 register RA5, takes the 1 s complement of the contents of RA5, and stores the result in RA7.


Syntax

## Execution

'34020
Instruction Words

Instruction to '34082

Operands

Description

Instruction Type
Example

NOT -*Rs, CRs, CRd
Rs $-32 \rightarrow$ Rs
*Rs $\rightarrow$ CRs
NOT CRs $\rightarrow$ CRd

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | R |  |  |  |  |
| ID |  |  | CRs |  |  |  | 0 | 0 | 0 | 1 | CRd |  |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 20 | 19 |  | 16 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | CRs | 0001 | CRd | 0101 | 1110 | 0000 | 0000 |  |

Rs TMS34020 register containing the memory address
CRs TMS34082 register to contain the 32-bit integer operand
CRd TMS34082 destination register
NOT loads the contents (integer) of memory pointed to by Rs into the CRs, takes the 1s complement of the contents of register CRs, and stores the result in CRd. Before each load from memory, Rs is decremented by 32.

The source register, CRs, must be in the RA TMS34082 register file.
CMOVMC, predecrement, constant count
NOT -*A5, RA5, RA7
This example loads memory at the address given by TMS34020 register A5 minus 32 into TMS34082 register RA5 takes the is complement of the contents of RA5, and stores the result in RA7.

Syntax
'34020
Instruction Words

Instruction to '34082

Description

Implied Operands

## Algorithm

Temporary Storage
Outputs

Instruction Type

| Type | Syntax |
| :--- | :--- |
| Integer | OUTC3X |
| Double-Precision | OUTC3XD |
| Single-Precision | OUTC3XF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | type | size |
| ID |  |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


| 31 | 29 |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0 | 0000 | 0000 | 0000 | 0011 | $101 t$ | 5000 |

The OUTC3Xx algorithm compares the given endpoints of a line to the clipping volume in the $X$-axis. The instruction sets three status register bits based on the location of the two endpoints with respect to the clipping volume. OUTC3Xx is used before the clipping instructions to determine which ends of the line need to be clipped.
$\mathrm{RA} 0=\mathrm{X} 1$
$\mathrm{RB} 0=\mathrm{X} 2$
$\mathrm{RA} 1=\mathrm{Y} 1$
$\mathrm{RB} 1=\mathrm{Y} 2$
$\mathrm{RA} 2=\mathrm{Z} 1$
$\mathrm{RB} 2=\mathrm{Z} 2$
$\mathrm{RA} 3=\mathrm{W} 1$
RB3 $=W 2$
$C T=R B 3$
; $\mathrm{CT}=\mathrm{W} 2$
$C=R A 3$
; $\mathrm{C}=\mathrm{W} 1$
$C T=C T-|R B 0| \quad ; \operatorname{set} V=1$ if $(W 2-|X 2|)<0$
$C=C-|R A 0| \quad ;$ set $N=1$ if $(W 1-|X 1|)<0$
If $N=1$ and $V=1$ and $(\operatorname{sign} X 1=\operatorname{sign} X 2)$, then $\operatorname{set} Z=1$
C, CT
Status bits set:
$\underline{Z} \underline{N}$ $\qquad$
1111 both points outside on same side of volume in X -axis
011 both points outside on opposite sides of the volume in $X$-axis
010 only point P1 [X1,Y1,Z1,W1] outside of volume in X -axis
$0 \quad 0 \quad 1$ only point $\mathrm{P} 2[\mathrm{X} 2, \mathrm{Y} 2, \mathrm{Z2}, \mathrm{~W} 2]$ outside of volume in X -axis
000 both points $P 1$ and $P 2$ inside the volume in $X$-axis
CEXEC, short
Syntax

'34020
Instruction Words

Instruction to '34082

| Type | Syntax |
| :--- | :--- |
| Integer | OUTC3Y |
| Double-Precision | OUTC3YD |
| Single-Precision | OUTC3YF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |


| 31 | 29 |  |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 10 | 0 | 0000 | 0000 | 0001 | 0011 | $101 t$ | 5000 |

The OUTC3Yx algorithm compares the given endpoints of a line to the clipping volume in the Y -axis. The instruction sets three status register bits based on the location of the two endpoints with respect to the clipping volume. OUTC3Yx is used before the clipping instructions to determine which ends of the line need to be clipped.

Implied Operands

| RA0 $=\mathrm{X} 1$ | RB0 $=\mathrm{X} 2$ |
| :--- | :--- |
| RA1 $=\mathrm{Y} 1$ | RB1 $=\mathrm{Y} 2$ |
| RA $2=\mathrm{Z} 1$ | RB2 $=\mathrm{Z} 2$ |
| RA3 $=\mathrm{W} 1$ | RB3 $=\mathrm{W} 2$ |

Algorithm
$\mathrm{CT}=\mathrm{RB} 3 \quad ; \mathrm{CT}=\mathrm{W} 2$
$C=R A 3 \quad ; C=W 1$
$C T=C T-|R B 1| \quad ;$ set $V=1$ if $(W 2-|Y 2|)<0$
$C=C-|R A 1| \quad ;$ set $N=1$ if $(W 1-|Y 1|)<0$
If $N=1$ and $V=1$ and (sign $Y 1=\operatorname{sign} Y 2)$, then $\operatorname{set} Z=1$
Temporary Storage
C, CT
Status bits set:
Z N V
Description
$\begin{array}{lll}1 & 1 & 1 \\ \text { both points outside on same side of volume in } \mathrm{Y} \text {-axis }\end{array}$
011 both points outside on opposite sides of the volume in $Y$-axis
010 only point $\mathrm{P} 1[\mathrm{X} 1, \mathrm{Y} 1, \mathrm{Z} 1, \mathrm{~W} 1]$ outside of volume in Y -axis
$0 \quad 0 \quad 1$ only point $\mathrm{P} 2[\mathrm{X} 2, \mathrm{Y} 2, \mathrm{Z} 2, \mathrm{~W} 2]$ outside of volume in Y -axis
000 both points P 1 and P 2 inside the volume in Y -axis
Instruction Type

## Syntax

'34020
Instruction Words

| Type | Syntax |
| :--- | :--- |
| Integer | OUTC3Z |
| Double-Precision | OUTC3ZD |
| Single-Precision | OUTC3ZF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |

Instruction to '34082

Description

Implied Operands

Algorithm

Temporary Storage
Outputs

Instruction Type

| 31 | 28 |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 10 | 0 | 0000 | 0000 | 0010 | 0011 | $101 t$ | 5000 | 0000 |

The OUTC3Zx algorithm compares the given endpoints of a line to the clipping volume in the Z-axis. The instruction sets three status register bits based on the location of the two endpoints with respect to the clipping volume. OUTC3Zx is used before the clipping instructions to determine which ends of the line need to be clipped.

| RA0 $=\mathrm{X} 1$ | RB0 $=\mathrm{X} 2$ |
| :--- | :--- |
| RA1 $=\mathrm{Y} 1$ | RB1 $=\mathrm{Y} 2$ |
| RA2 $=\mathrm{Z} 1$ | RB2 $=\mathrm{Z} 2$ |
| RA3 $=\mathrm{W} 1$ | RB3 $=\mathrm{W} 2$ |

$$
\begin{array}{ll}
C T=\text { RB3 } & ; C T=W 2 \\
C=\text { RA3 } & ; C=W 1 \\
C T=C T-\mid \text { RB2 } 2 & ; \operatorname{set} V=1 \text { if }(W 2-|Z 2|)<0 \\
C=C-|R A 2| & ; \operatorname{set} N=1 \text { if }(W 1-|Z 1|)<0 \\
\text { If } N=1 \text { and } V=1 \text { and }(\operatorname{sign} Z 1=\operatorname{sign} \mathrm{Z2}) \text {, then } \operatorname{set} Z=1
\end{array}
$$

C, CT
Status bits set:
$\underline{\mathrm{Z}} \mathrm{N} \mathrm{V}$
Description
$\begin{array}{llll}1 & 1 & 1 & \text { both points outside on same side of volume in } Z \text {-axis }\end{array}$
$\begin{array}{lll}0 & 1 & 1 \\ \text { both points outside on opposite sides of the volume in } Z \text {-axis }\end{array}$
010 only point $\mathrm{P} 1[\mathrm{X} 1, \mathrm{Y} 1, \mathrm{Z} 1, \mathrm{~W} 1]$ outside of volume in Z -axis
$0 \quad 0 \quad 1$ only point $\mathrm{P} 2[\mathrm{X} 2, \mathrm{Y} 2, \mathrm{Z2}, \mathrm{~W} 2]$ outside of volume in Z -axis
000 both points P1 and P2 inside the volume in Z-axis
CEXEC, short

Syntax
34020
Instruction Words

Instruction to '34082

| Type | Syntax |
| :---: | :---: |
| Integer | POLY $\mathrm{CRs}_{1}, \mathrm{CRs}_{2}$ |
| Double-Precision | POLYD CRs $_{1}$, CRs $_{2}$ |
| Single-Precision | POLYF $\mathrm{CRs}_{1}, \mathrm{CRs}_{2}$ |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | type | size |
| ID |  |  |  | CRs 1 |  | 0 | 0 | 0 | 0 | 1 |  | CRs 2 |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 20 | 19 | 16 | 15 | 0 |
| :---: | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 10 | CRs1 | 0 | 0001 | CRs $_{2}$ | 0011 | $111 t$ | s000 | 0000 |  |

Description

Implied Operands
POLYx performs a multiply and accumulate of the form:

$$
\left.A_{n} \times X^{n}+A_{n-1} \times X^{n-1}+A_{n-2} \times X^{n-2}+\ldots\right)+A_{0}
$$

which can also be represented as:

$$
\left(\ldots\left(\left(A_{n} \times X+A_{n-1}\right) \times X+A_{n-2}\right) \times X+\ldots\right)+A_{0}
$$

where the value $X$ is assumed present in the TMS34082 C register and the coefficients $A_{n}$ through $A_{1}$ are to be multiplied by $X$ and accumulated. This instruction multiplies $\mathrm{CRs}_{1}$ by C , adds the result to $\mathrm{CRs}_{2}$, and stores the sum in $\mathrm{CRs}_{1}$.

CRs $_{1}$ TMS34082 register containing $A_{n}$ or accumulated value. Must be in the RA register file.
$\mathrm{CRs}_{2}$ TMS34082 register containing $\mathrm{A}_{\mathrm{n}-1}$ or next coefficient in series. Must be in the RB register file.

Algorithm
$\mathrm{CT}=\mathrm{C} \times \mathrm{CRs}_{1}$
$\mathrm{CRs}_{1}=\mathrm{CT}+\mathrm{CRs}_{2}$
CT
The new accumulated value in $\mathrm{CRs}_{1}$
CEXEC, short

Syntax
'34020
Instruction Words

| Type | Syntax |
| :--- | :--- |
| Integer | SCALE |
| Double-Precision | SCALED |
| Single-Precision | SCALEF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Instruction to '34082

| 31 | 29 |  | 0 |  |  |  |  |
| :---: | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ID | 000000 | 0000 | 0000 | 0011 | $000 t$ | 5000 | 0000 |

Description

Implled Operands

Algorithm

Temporary Storage
Outputs
$\mathrm{RAO}=\mathrm{X} 1^{\prime}$
$R A 1=Y 1^{\prime}$
RA2 $=\mathrm{Z} 1^{\prime}$
$\mathrm{RA} 3=\mathrm{W} 1$
Instruction Type
CEXEC, short

Syntax
Execution
'34020
Instruction Words

Instruction to '34082

## Operands

Description

Instruction Type

## Example

| Type | Syntax |
| :--- | :--- |
| Integer | SQR $R s_{1}, C R s, C R d$ |
| Double-Precision | SQRD $R s_{1}, R s_{2}, C R s, C R d$ |
| Single-Precision | SQRF $R s_{1}, C R s, C R d$ |

$\mathrm{Rs}_{1} \rightarrow \mathrm{CRs}$

CRs $\times \mathrm{CRs} \rightarrow \mathrm{CRd}$
Integer or Single-Precision:

|  |  |  |  |  | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | R | Rs 1 |  |  |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | type | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | CRs |  |  |  | 1 | 0 | 0 | 0 | CRd |  |  |  |  |

## Double-Precision:



| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs | 1000 | CRd | 0101 | $111 t$ | $\mathbf{s} 000$ | 0000 |

$\mathrm{Rs}_{1} \quad$ TMS34020 source register for the value (or half the value for doubleprecision operands) to TMS34082
$\mathrm{Rs}_{2}$ TMS34020 source register for the remaining half of the 64-bit operand to the TMS34082

CRs TMS34082 register to contain the operand
CRd TMS34082 destination register
SQRx loads the contents of Rs into CRs, squares the contents of CRs, and stores the result in CRd.

The source register, CRs, must be in the RA TMS34082 register file.
CMOVGC, one register
SQR A5, RA5, RB7
This example loads TMS34020 register A5 into TMS34082 register RA5, squares the contents of RA5, and stores the result in RB7.

| Syntax | Type | Syntax |
| :---: | :---: | :---: |
|  | Integer | SQR *Rs+, CRs, CRd |
|  | Double-Precision | SQRD *Rs +, CRs, CRd |
|  | Single-Precision | SQRF *Rs + , CRs, CRd |
| Execution | $*$ Rs $\rightarrow$ CRs |  |
|  | $\mathrm{Rs}+32 \rightarrow \mathrm{Rs}$ |  |
|  |  |  |
|  |  |  |
|  | $\mathrm{CRs} \times \mathrm{CRs} \rightarrow \mathrm{CRd}$ |  |

'34020
Instruction Words

Instruction to '34082

Operands

Description

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | transfers |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | type | size | 0 | 0 | R | Rs |  |  |
|  | ID |  | CRs |  |  |  | 1 | 0 | 0 | 0 | CRd |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 | 15 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 10 | CRs | 1000 | CRd | 1001 | $111 t$ | 5000 | 0000 |  |

Rs TMS34020 source register containing the memory address
CRs TMS34082 register to contain the operand

CRd TMS34082 destination register
SQRx loads the contents of memory pointed to by Rs into CRs, squares the contents of CRs, and stores the result in CRd. After each load from memory, Rs is incremented by 32.

The source register, CRs, must be in the RA TMS34082 register file.
CMOVMC, postincrement, constant count
SQR *A5+, RA5, RB7
This example loads memory starting at the address given by TMS34020 register A5 into TMS34082 register RA5, squares the contents of RA5, and stores the result in RB7.


| Syntax | Type |  |  |  |  | Syntax |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | Integer |  |  |  |  | SQRT CRs, CRd |  |  |  |  |  |  |  |  |  |  |
|  | Double-Precision |  |  |  |  | SQRTD CRs, CRd |  |  |  |  |  |  |  |  |  |  |
|  | Single-Precision |  |  |  |  | SQRTF CRs, CRd |  |  |  |  |  |  |  |  |  |  |
| Execution | $\sqrt{\text { CR }}$ | $\rightarrow$ | Rd |  |  |  |  |  |  |  |  |  |  |  |  |  |
| '34020 <br> Instructlon Words | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | type | size |
|  | ID |  |  | CRs |  |  |  | 1 | 0 | 0 | 1 | CRd |  |  |  |  |
| Instruction to '34082 | $\begin{array}{rrrrrrrr}31 & 29 & 28 & 25 & 24 & 21 & 20\end{array}$ |  |  |  |  |  |  | $16 \quad 15$ |  |  |  | 0 |  |  |  |  |
|  | 1 |  | CR |  |  | 01 |  | CRd |  | 000 |  | 111 t | s | 000 | 00 | 00 |
| Operands | CRs TMS34082 source register containing the operand |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | CRd TMS34082 destination register |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Description | SQRTx takes the square root of the contents of CRs and stores the result in CRd. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | The source register, CRs, must be in the RA TMS34082 register file. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Transparency | CEXEC, short |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | SQRTD RA5, RA7 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | This example takes the square root of the contents of RA5 and stores the result in RA7. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

## Syntax

| Type | Syntax |
| :--- | :--- |
| Integer | SQRT $R s_{1}$, CRs, CRd |
| Double-Precision | SQRTD $R s_{1}, R s_{2}$, CRs, CRd |
| Single-Precision | SQRTF $R s_{1}$, CRs, CRd |


| Execution | $\mathrm{Rs}_{1} \rightarrow \mathrm{CRs}$ |
| :---: | :---: |
|  |  |
|  | $\sqrt{\text { CRs }} \rightarrow \mathrm{CRd}$ |

## '34082 Instruction Words

Integer or Single-Precision:

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | R | Rs1 |  |  |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | type | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | CRs |  |  |  | 1 | 0 | 0 | 1 | CRd |  |  |  |  |

Double-Precision:

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | $R$ | 0 |  |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | $R$ | $R_{1}$ |  |  |
| ID |  |  |  | CRs |  |  |  |  |  |  |  |  |  | 1 |

Instruction to '34082

| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 10 | CRs | 1001 | CRd | 0101 | $111 t$ | 5000 | 0000 |

## Operands

Description

## Transparency

Example
$\mathrm{Rs}_{1} \quad$ TMS34020 source register for the value (or half the double-precision value) to the TMS34082
$\mathrm{Rs}_{2}$ TMS34020 source register for the value for the remaining half of the double-precision value to the TMS34082

CRs TMS34082 register to contain the operand
CRd TMS34082 destination register
SQRTx loads the contents of Rs into CRs, takes the square root of the contents of CRs, and stores the result in CRd.

The source register, CRs, must be in the RA TMS34082 register file.
CMOVGC, one register
SQRTF A5, RA5, RA7
This example loads TMS34020 register A5 into TMS34082 register RA5, takes the square root of the single-precision floating-point value in RA5, and stores the result in RA7.


Syntax
'34020
Instruction Words

Instruction to '34082

Operands

Description

Transparency

## Example

| Type | Syntax |
| :--- | :--- |
| Integer | SQRT $-* R s, C R s, C R d$ |
| Double-Precision | SQRTD $-* R s, C R s, C R d$ |
| Single-Precision | SQRTF $-* R s, C R s, C R d$ |

Rs $-32 \rightarrow$ Rs
*Rs $\rightarrow$ CRs

##  <br> 

$\sqrt{\mathrm{CRs}} \rightarrow \mathrm{CRd}$

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 10 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | transfers |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | type | size | 0 | 0 | R |  |  |  |
| ID |  |  | CRs |  |  |  | 1 | 0 | 0 | 1 | CRd |  |  |  |


| 31 | 29 | 28 | 25 | 24 | 21 | 20 | 16 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs | 1001 |  | CRd | 1001 | $111 t$ | 5000 |

Rs TMS34020 source register containing the memory address
CRs TMS34082 register to contain the operand
CRd TMS34082 destination register
SQRTx loads the contents of memory pointed to by Rs minus 32 into CRs, takes the square root of the contents of CRs, and stores the result in CRd. Before each load from memory, Rs is decremented by 32.

The source register, CRs, must be in the RA TMS34082 register file.
CMOVMC, predecrement, constant count
SQRTF -*A5, RA5, RA7
This example loads memory starting at the address given by TMS34020 register A5 minus 32 into TMS34082 register RA5, takes the square root of the single-precision floating-point value in RA5, and stores the result in RA7.

| Syntax | Type |  |  |  | Syntax |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | Integer |  |  |  | SQRTA |  | CRs, CRd |  |  |  |  |  |  |  |  |
|  | Double-Precision |  |  |  | SQRTAD CRs, CRd |  |  |  |  |  |  |  |  |  |  |
|  | Single-Precision |  |  |  | SQRTAF CRs, CRd |  |  |  |  |  |  |  |  |  |  |
| Execution | $\sqrt{\text { CRs }} \rightarrow$ | Rd |  |  |  |  |  |  |  |  |  |  |  |  |  |
| $34020$ <br> Instruction Words | $15 \quad 14$ | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | $1{ }^{1} 1$ | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | type | size |
|  | ID |  | CRs |  |  |  | 1 | 0 | 1 | 0 |  |  | CRd |  |  |
| Instruction to '34082 | $31 \quad 2928$ |  | 2524 |  |  | 2120 |  | 1615 |  |  | 0 |  |  |  |  |
|  | ID | CR |  |  | 10 |  | CRd |  | 000 | 1 | 111 |  | s 000 | 00 | 000 |
| Operands | CRs TMS34082 register containing the operand |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | CRd TMS34082 destination register |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Description | SQRTAx takes the square root of the absolute value of the contents of CRs and stores the result in CRd. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | The source register, CRs, must be in the RA TMS34082 register file. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Transparency | CEXEC, short |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | SQRTA RA5, RB7 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | This example takes the square root of the absolute value of RA5 and stores the result in RB7. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

Syntax

Execution

| Type | Syntax |
| :--- | :--- |
| Integer | SQRTA $R s_{1}, C R s, C R d$ |
| Double-Precision | SQRTAD $R s_{1}, R s_{2}, C R s, C R d$ |
| Single-Precision | SQRTAF $R s_{1}, C R s, C R d$ |

$\mathrm{Rs}_{1} \rightarrow \mathrm{CRs}$


## '34082 Instruction Words

Integer or Single-Precision:

|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | R | Rsi |  |  |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | type | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | CRs |  |  |  | 1 | 0 | 1 | 0 | CRd |  |  |  |  |

Double-Precision:


Instruction to '34082

## Operands

Description

Transparency
Example

CMOVGC, one register
SQRTAD A3, A5, RA5, RA7
This example loads TMS34020 register A5 and A3 into TMS34082 register RA5, takes the square root of the absolute value of the contents of RA5, and stores the result in RA7.

Syntax
Execution
'34020
Instruction Words

Instruction to '34082

Operands

Description

Transparency
Example

| Type | Syntax |
| :---: | :---: |
| Integer | SQRTA *Rs + , CRs, CRd |
| Double-Precision | SQRTAD *Rst, CRs, CRd |
| Single-Precision | SQRTAF *Rs+, CRs, CRd |
| *Rs $\rightarrow$ CRs |  |
| $\mathrm{Rs}+32 \rightarrow \mathrm{Rs}$ |  |
|  |  |
|  |  |
| $\sqrt{\text { CRs }} \rightarrow$ CRd |  |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 10 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | transfers |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | type | size | 0 | 0 | R |  |  | s |
| ID |  |  | CRs |  |  |  | 1 | 0 | 1 | 0 | CRd |  |  |  |



Rs TMS34020 source register containing the memory address
CRs TMS34082 register to contain the operand
CRd TMS34082 destination register
SQRTAx loads the contents of memory pointed to by Rs into CRs, takes the square root of the absolute value of the contents of CRs, and stores the result in CRd. After each load from memory, Rs is incremented by 32.

The source register, CRs, must be in the RA TMS34082 register file.
CMOVMC, postincrement, constant count
SQRTA *A5+, RA5, RA7
This example loads memory starting at the address given by TMS34020 register A5 into TMS34082 register RA5, takes the square root of the absolute value of the contents of RA5, and stores the result in RA7.

Syntax
Execution
'34020
Instruction Words

Instruction to '34082

Operands

Description

Transparency

## Example

| Type | Syntax |
| :--- | :--- |
| Integer | SQRTA $-\star R s, C R s, C R d$ |
| Double-Precision | SQRTAD $-\star R s, C R s, C R d$ |
| Single-Precision | SQRTAF $-\star R s, C R s, C R d$ |



| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 10 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | transfers |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | type | size | 0 | 0 | R |  |  |  |
| ID |  |  | CRs |  |  |  | 1 | 0 | 1 | 0 | CRd |  |  |  |


| 31 | 2928 | 2524 |  | 1615 |  | 0 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs | 1010 | CRd | 1001 | 111 t | $\mathbf{5 0 0 0}$ | 0000 |

Rs TMS34020 source register containing the memory address
CRs TMS34082 register to contain the operand
CRd TMS34082 destination register
SQRTAx loads the contents of memory pointed to by Rs minus 32 into CRs, takes the square root of the absolute value of the contents of CRs, and stores the result in CRd. Before each load from memory, Rs is decremented by 32.

The source register, CRs, must be in the RA TMS34082 register file.
CMOVMC, predecrement, constant count
SQRTA -*A5, RA5, RA7
This example loads memory starting at the address given by TMS34020 register A5 minus 32 into TMS34082 register RA5, takes the square root of the absolute value of the contents of RA5, and stores the result in RA7.



Syntax
Execution
'34020
Instruction Words

Instruction to '34082

Operands

Description

## Transparency

Example
Type
Integer
Single-Precision
$\mathrm{Rs}_{1} \rightarrow \mathrm{CRs}_{1}$
$\mathrm{Rs}_{2} \rightarrow \mathrm{CRs}_{2}$
$\mathrm{CRs}_{2}-\mathrm{CRs}_{1} \rightarrow \mathrm{CRd}$
SUB Rsp, Rs $s_{1}$, CRs $_{2}, C R s_{1}, C R d$

SUBF Ris ${ }_{2}, R s_{1}, C R s_{2}, C R s_{1}, C R d$

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | R |  | Rs, |  |
| 0 | 1 | 0 | 0 | 0 | 1 | 1 | type | 0 | 0 | 0 | R |  |  |  |
| ID |  |  | CRs ${ }_{1}$ |  |  |  | CRs2 |  |  |  | CRd |  |  |  |



Rs1 TMS34020 source register for the first (subtrahend) value to TMS34082
$R_{2}$ TMS34020 source register for the second (minuend) value to TMS34082
$\mathrm{CRs}_{1}$ TMS34082 RA register to contain the subtrahend operand
$\mathrm{CRs}_{2}$ TMS34082 RB register to contain the minuend operand
CRd TMS34082 destination register
SUBx loads the contents of $\mathrm{Rs}_{1}$ and $\mathrm{Rs}_{2}$ into $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$ respectively, subtracts the contents of $\mathrm{CRs}_{1}$ from $\mathrm{CRs}_{2}$, and stores the result in CRd. Note that in the syntax, $\mathrm{Rs}_{2}$ and $\mathrm{CRs}_{2}$ are listed before $\mathrm{Rs}_{1}$ and $C R s_{1}$.

The syntax for this instruction and the previous instruction, subtract (RA register - RB register), is similar. The order of the operands determines which instruction is used. If an RA register is listed first, the previous instruction is used. If an RB register is first, this instruction is used.

The double-precision form of this instruction is not supported.
CMOVGC, two registers
SUB A3, A0, RB5, RA3, RA7
This example loads TMS34020 registers B6 and A0 into TMS34082 registers RB5 and RA3, subtracts the contents of RA3 from RB5, and stores the result in RA7.
Syntax
Execution
I34020
Instruction Words

Instruction to '34082
Description
Transparency
Example

| Type | Syntax |
| :--- | :--- |
| Integer | SUB $* R s+C R s_{1}, C R s_{2}, C R d$ |
| Double-Precision | SUBD $* R s+C R s_{1}, C R s_{2}, C R d$ |
| Single-Precision | SUBF $* R s+C R s_{1}, C R s_{2}, C R d$ |

${ }^{* R s} \rightarrow \mathrm{CRs}_{1}$
$R s+32 \rightarrow R s$


$\star \mathrm{Rs} \rightarrow \mathrm{CRs}_{2}$
Rs $+32 \rightarrow$ Rs


$\mathrm{CRs}_{1}-\mathrm{CRs}_{2} \rightarrow \mathrm{CRd}$

| 15 | 14 | 13 | 2 | 1 | 0 |  | 8 |  | 5 | 4 | 3 | 2 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | transfers |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | type | size | 0 | 0 | R |  | Rs |
| ID |  |  | CRs ${ }_{1}$ |  |  |  | CRs2 |  |  |  | CRd |  |  |


| 31 | 2928 | 2524 |  | 1615 |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | $\mathrm{CRs}_{1}$ | $\mathrm{CRs}_{2}$ | CRd | 1000 | 001 t | s 000 | 0000 |

Rs TMS34020 register containing the memory address
$\mathrm{CRs}_{1}$ TMS34082 RA register to contain the minuend operand
CRs2 TMS34082 RB register to contain the subtrahend operand
CRd TMS34082 destination register
SUBx loads the contents of memory pointed to by Rs into $\mathrm{CRs}_{1}$ and $\mathrm{CRs}_{2}$, subtracts the contents of $\mathrm{CRs}_{2}$ from $\mathrm{CRs}_{1}$, and stores the result in CRd. After each load from memory, Rs is incremented by 32.

The syntax for this instruction and the next instruction for subtract (RB register - RA register) is similar. The order of the operands determines which instruction is used. If an RA register is listed first, this instruction is used. If an RB register is first, the other instruction is used.

CMOVMC, postincrement, constant count
SUBF *A0+, RA5, RB3, RA7
This example loads memory starting at the address given by TMS34020 register A0 into TMS34082 registers RA5 and RB3, subtracts the contents of RB3 from RA5, and stores the result in RA7.





## Syntax

Execution
'34020
Instruction Words

Instruction to '34082

Operands

## Description

Instruction Type
Example

SUBAF $R s_{1}, R s_{2}, C R s_{1}, C R s_{2}, C R d$

$$
\begin{aligned}
& \mathrm{Rs}_{1} \rightarrow \mathrm{CRs}_{1} \\
& \mathrm{Rs}_{2} \rightarrow \mathrm{CRs}_{2} \\
& \left|C R s_{1}-\mathrm{CRs}_{2}\right| \rightarrow \mathrm{CRd}
\end{aligned}
$$



| 2928 |  | 2524 | 2120 | 1615 |  |  | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | $\mathrm{CRs}_{1}$ | CRs 2 | CRd | 0100 | 1011 | 0000 | 0000 |

$\mathrm{Rs}_{1}$ TMS34020 source register for first 32-bit single-precision floating-point value to coprocessor
$\mathrm{Rs}_{2}$ TMS34020 source register for second 32-bit single-precision floating-point value to coprocessor
$\mathrm{CRs}_{1}$ Coprocessor RA register to contain the first single-precision operand
$\mathrm{CRs}_{2}$ Coprocessor RB register to contain the second single-precision operand

CRd Coprocessor destination register
This instruction loads the contents of $R s_{1}$ and $R s_{2}$ into $C R s_{1}$ and $C R s_{2}$ respectively and subtracts $\mathrm{CRs}_{2}$ from $\mathrm{CRs}_{1}$, placing the absolute value of the result in CRd.

The integer and double-precision forms of this instruction are not supported.
CMOVGC, two registers
SUBAF A9, A3, RA9, RB3, RB1
This instruction loads the contents of TMS34020 registers A9 and A3 into coprocessor registers RA9 and RB3 respectively, subtracts RB3 from RA9, takes the absolute value of the difference, and stores the result in RB1.




| Syntax | Type | Syntax |
| :--- | :--- | :--- |
|  | Integer | VADD |
|  | Double-Precision | VADDD |
|  | Single-Precision | VADDF |

```
'34020
Instruction Words
```

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | type | size |
| ID |  |  |  | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |

Instruction to '34082

| 31 | 2928 |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 10 | 0 | 0001 | 0010 | 0000 | 0011 | 111 t | s 000 | 0000 |

Description

Implied Operands

Adds the $X, Y$ and $Z$ components of a vector in RB2-RB0 to the $X, Y$, and $Z$ components of a vector in RA2-RA0.

$$
\begin{array}{ll}
\mathrm{RA} 0=\mathrm{X} 1 & \mathrm{RB} 0=\mathrm{X} 2 \\
\mathrm{RA} 1=\mathrm{Y} 1 & \mathrm{RB} 1=\mathrm{Y} 2 \\
\mathrm{RA} 2=\mathrm{Z} 1 & \mathrm{RB} 2=\mathrm{Z} 2
\end{array}
$$

$$
\begin{array}{ll}
R A 0=R A 0+R B 0 & ; X 1+X 2 \\
R A 1=R A 1+R B 1 & ; Y 1+Y 2 \\
R A 2=R A 2+R B 2 & Z Z 1+Z 2
\end{array}
$$

Temporary Storage
Outputs
Instruction Type

None
The sum of the vectors is stored in RA2-RA0.
CEXEC, short


Syntax
'34020
Instruction Words

| Type | Syntax |
| :--- | :--- |
| Integer | VDOT |
| Double-Precision | VDOTD |
| Single-Precision | VDOTF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |

Instruction to '34082

| 31 |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 10 | 0 | 0001 | 0110 | 0000 | 0011 | $111 t$ | 5000 | 0000 |

## Description

Implied Operands

## Algorithm

Temporary Storage
Outputs
Instruction Type

Given two vectors V1 in RA2-RA0 and V2 in RB2-RB0, calculate the dot product.

Vector V1 in RA2-RA0 and vector V2 in RB2-RB0
$\mathrm{RAO}=\mathrm{X} 1$
$\mathrm{RB} 0=\mathrm{X} 2$
RA1 $=\mathrm{Y} 1$
$\mathrm{RB} 1=\mathrm{Y} 2$
RB2 $=\mathbf{Z 2}$
$C=R A 0 \times R B 0$
; $\mathrm{X} 1 \times \mathrm{X} 2$
$C=C+(R A 1 \times R B 1)$
; $(\mathrm{X} 1 \times \mathrm{X} 2)+(\mathrm{Y} 1 \times \mathrm{Y} 2)$
$R A 4=C+(R A 2 \times R B 2) \quad ;(X 1 \times X 2)+(Y 1 \times Y 2)+(Z 1 \times Z 2)$
C
The scalar dot product of the two vectors is stored in RA4.
CEXEC, short

## Syntax

'34020
Instruction Words

Instruction to '34082

| Type | Syntax |
| :--- | :--- |
| Integer | VMAG |
| Double-Precision | VMAGD |
| Single-Precision | VMAGF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | type | size |
| ID |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |


| 31 |  |  | 0 |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 00001 | 1010 | 0000 | 0011 | $111 t$ | 5000 | 0000 |

## Description

Implied Operands

Algorithm

Temporary Storage
Outputs
Instruction Type
Given a vector in RA2-RA0, compute the length of the vector.
$\mathrm{RAO}=\mathrm{X} 1$
$R A 1=Y 1$
$\mathrm{RA} 2=\mathrm{Z} 1$

$$
\begin{array}{ll}
C=R A 0 & \\
\text { RA3 }=C \times C & ;(X \times X) \\
\text { CT }=\text { RA1 } & \\
\text { CT }=C T \times C T & ;(Y \times Y) \\
R A 3=C T+R A 3 & \\
C=\text { RA2 } & (Z \times Z)+(Y \times Y) \\
C T=C \times C & ;(X \times X)+(Y \times Y)+(Z \times Z) \\
\text { RA3 }=C T+\text { RA3 } & ; S Q R T\left(X^{2}+Y^{2}+Z^{2}\right) \\
\text { RA3 }=\text { SQRT(RA3 }) &
\end{array}
$$

C, CT
The scalar magnitude of the vector is stored in RA3.
CEXEC, short

Syntax
'34020
Instruction Words

| Type | Syntax |
| :--- | :--- |
| Double-Precision | VNORMD |
| Single-Precision | VNORMF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | size |
| ID |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |

Instruction to '34082

| 31 | 2928 |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0 | 0001 | 1100 | 0000 | 0011 | 1111 | 5000 | 0000 |

## Description

Implied Operands
$R A 0=X 0$
$R A 1=Y 0$
$\mathrm{RA} 2=\mathrm{Z} 0$
Algorithm
$C=R A O$
$C=C \times C$
; X0 $\times$ X0
CT $=$ RA1
$R A 9=C T \times C T$
; $\mathrm{YO} \times \mathrm{YO}$
$R A 9=C+R A 9$
$\mathrm{C}=\mathrm{RA} 2$
$C=C \times C$
RA9 $=\mathrm{C}+\mathrm{RA} 9$
ZO $\times$ ZO
$\mathrm{C}=\mathrm{SQRT}(\mathrm{RA} 9) \quad:$ SQRT $\left(X 0^{2}+Y 0^{2}+Z 0^{2}\right)$
$R A 3=C \quad$; save the magnitude in RA3
$C=1 / C \quad ; 1 /$ magnitude
$R A 0=C \times R A 0$
$R A 1=C \times R A 1$
$\mathrm{RA} 2=\mathrm{C} \times \mathrm{RA} 2$

Temporary Storage
Outputs

Instruction Type

C, CT, RA9
The unit length vector is stored in registers RA2-RA0.
$R A O=X 0 /\left(S Q R T\left(X 0^{2}+Y 0^{2}+Z 0^{2}\right)\right)$
RA1 $=Y 0 /\left(\right.$ SQRT $\left.\left(X 0^{2}+Y 0^{2}+Z 0^{2}\right)\right)$
RA2 $=Z 0 /\left(\operatorname{SQRT}\left(X 0^{2}+Y 0^{2}+Z 0^{2}\right)\right)$
RA3 $=\operatorname{SQRT}\left(X 0^{2}+Y 0^{2}+Z 0^{2}\right)$
$\mathrm{C}=1 /$ (SQRT $\left(X 0^{2}+Y 0^{2}+Z 0^{2}\right)$ )
CEXEC, short
Syntax
'34020
Instruction Words

Instruction to '34082

Description

Implied Operands

Algorithm

Temporary Storage

## Outputs

$C=R A 0 \times R B 0$
$C=C+(R A 1 \times R B 1)$
$C=C+($ RA2 $\times$ RB2 $) \quad ;$ scalar dot product in $C \quad(\cos ($ Theta $))$
$C=C+C \quad ; C=2 \times \cos$ (Theta)
$C T=C \times R A O$
RB0 $=\mathrm{CT}-$ RB0 $\quad ; \mathrm{Xr}=\mathrm{Xn} \times(2 \times \cos ($ Theta $))-\mathrm{Xi}$
$C T=C \times R A 1$
$R B 1=C T-R B 1 \quad ; Y r=Y n \times(2 \times \cos (T h e t a))-Y i$
$C T=C \times R A 2$
$\mathrm{RB} 2=\mathrm{CT}-\mathrm{RB} 2 \quad ; \mathrm{Zr}=\mathrm{Zn} \times(2 \times \cos ($ Theta $))-\mathrm{Zi}$
C, CT, RA9
The reflected vector components $x, y$ and $z$ are stored in RB2-RB0.

$$
\begin{aligned}
& R B 0=X r=X n \times(2 \times((X n \times X r)+(Y n \times Y r)+(Z n \times Z r)))-X i \\
& R B 1=Y r=Y n \times(2 \times((X n \times X r)+(Y n \times Y r)+(Z n \times Z r)))-Y i \\
& R B 2=Z r=Z n \times(2 \times((X n \times X r)+(Y n \times Y r)+(Z n \times Z r)))-Z i
\end{aligned}
$$

Instruction Type
CEXEC, short

| Type | Syntax |
| :--- | :--- |
| Integer | VRFLCT |
| Double-Precision | VRFLCTD |
| Single-Precision | VRFLCTF |


| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 6 | 6 | 4 |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | type | size |  |
| ID |  |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |


| 31 |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | 0 | 0001 | 1110 | 0000 | 0011 | $111 t$ | 5000 | 0000 |

The VRFLCT instruction calculates the vector reflection of a vector incident on a surface defined by a normal vector. The normal vector should be normalized before issuing the VRFLCT instruction.

Vector in RA2-RA0 is the normal vector $\left(X_{n} i+Y_{n} j+Z_{n} k\right)$ ). Vector in RB2-RB0 is the incident vector ( $\mathrm{X}_{\mathrm{i}} \mathbf{i}+\mathrm{Y}_{\mathrm{i}} \mathrm{j}+\mathrm{Z}_{\mathrm{i}} \mathbf{k}$ ) )
RAO $=\mathrm{Xn} \quad \mathrm{RBO}=\mathrm{Xi}$
$\mathrm{RA} 1=\mathrm{Yn} \quad \mathrm{RB} 1=\mathrm{Yi}$
$\mathrm{RA} 2=\mathrm{Zn} \quad \mathrm{RB} 2=\mathrm{Zi}$


Syntax

Type
Integer
Double-Precision
Single-Precision

## Syntax

VSCL $R_{1}$, CRs
VSCLD Rs ${ }_{1}, R_{2}$, CRs
VSCLF Rs ${ }_{1}$, CRs

## '34082 Instruction Words

Integer or Single-Precision:

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | R |  |  |  |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | type | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ID |  |  | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | CRs |  |  |  |

Double-Precision:

| 15 | 14 | 13 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 2 | 1 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | R | 0 |  |
| 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | R |  |  |
| 10 |  |  |  |  |  |  |  |  |  |  |  |  |  |

Instruction to '34082

## Description

## Operands

Implied Operands
$\mathrm{RA} 0=\mathrm{X} 1$
RA1 $=\mathrm{Y} 1$
RA2 $=\mathrm{Z} 1$

## Algorithm

$\mathrm{Rs}_{1} \rightarrow \mathrm{CRs}$

$\mathrm{RAO}=\mathrm{RA} 0 \times \mathrm{CR}$
RA1 $=$ RA $1 \times$ CRs
$R A 2=R A 2 \times C R s$

Temporary Storage
Outputs

None
The scaled vector is stored in RA2-RA0.
RAO $=X 1$ '
RA1 $=Y 1^{\prime}$
RA2 $=$ Z1'
Instruction Type
CMOVGC, one or two registers

Syntax

'34020
Instruction Words

Instruction to '34082

Description

Operands

Implied Operands

| Type | Syntax |
| :--- | :--- |
| Integer | VSCL $-* R s, C R s$ |
| Double-Precision | VSCLD $-* R s, C R s$ |
| Single-Precision | VSCLF $-* R s, C R s$ |



| 31 | 2928 |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ID | CRs | 0 | 1110 | 0000 | 1001 | $111 t$ | s 000 | 0000 |

The $\mathrm{X}, \mathrm{Y}$, and Z components of a vector in registers RA2-RA0 are multiplied by a scalar in CRs (loaded from memory pointed to by Rs). Before each load from memory, Rs is decremented by 32.

Rs TMS34020 register containing the memory address
CRs Coprocessor RB register to contain the scaling factor. Must be in the RB register file.
$R A 0=X 1$
RA1 $=\mathrm{Y} 1$
$\mathrm{RA} 2=\mathrm{Z} 1$
Algorithm
Rs $-32 \rightarrow$ Rs
*Rs $\rightarrow$ CRs


$\mathrm{RAO}=\mathrm{RA} 0 \times \mathrm{CRs}$
$\mathrm{RA} 1=\mathrm{RA} 1 \times \mathrm{CRs}$
$R A 2=R A 2 \times C R s$

Temporary Storage
Outputs

Instruction Type

None
The scaled vector is stored in RA2-RA0.
RAO $=X 1$ '
$R A 1=Y 1$ '
$\mathrm{RA} 2=\mathrm{Z1}{ }^{\prime}$
CMOVMC, predecrement, constant count


## Chapter 8

## External Instructions

The external instruction set is executed through the MSD port of the TMS34082. The multiplier and ALU may be operated in parallel using these RISC-like instructions. Integer, single-precision, and double-precision floating-point formats are supported. In coprocessor mode, user-defined subroutines constructed out of external instructions may be executed through the MSD port. See Figure 8-1.

Figure 8-1. Source of Instructions for Coprocessor Mode


In host-independent mode, the TMS34082 is controlled by external instructions input on the MSD bus.

Figure 8-2. Instructions in Host-Independent Mode


### 8.1 Overview

External instructions are 32 bits long and their formats (number, length, and function of fields) depend upon the operations being selected. Separate formats are provided for data transfers to and from the TMS34082, FPU processing, test and branch operations, and subroutine calls.

In the host-independent mode, the TMS34082 is controlled by external instructions input on the MSD bus. In the coprocessor mode, the TMS34082 executes user-defined routines (external instructions stored in memory on the MSD bus) by executing a jump to external code. Up to 32 routines may be defined by the user using external instructions in coprocessor mode.

To cause a jump to the external routine, the TMS34020 sends the TMS34082 an instruction with the md field (bits 15-14) set high. The fpuop is the routine number ( $0-31$ ). The TMS34082 multiplies the routine number by two to get the jump address. This creates a compact jump table where every other address is the starting address of a routine. The remaining memory can then be allocated according to user need. Using every other address as a starting address allows a single-instruction subroutine to be implemented without another jump. For more complex routines, the first instruction in the routine will be a jump to another memory location. In either case, the last instruction should be a return from subroutine or jump to internal instruction address 10FFF (hex). This puts the TMS34082 in an idle state, waiting for the next instruction from the TMS34020. Before the last return from subroutine or jump to internal address 10FFF, the stack (SUBADDR1-0) must be cleared. This can be accomplished by setting the stack pointer (bit 31) in both registers to 0 . You may wish to save the contents of these registers in external memory before clearing the stack pointers.

### 8.2 FPU Processing Instruction Format

The largest group of external instructions control FPU operations. These instructions can select operands from input registers, internal feedback, or from the LAD bus (32-bit operations only). Independent ALU or multiplier operations and chained-mode operations (ALU and multiplier acting in parallel) can be coded.

The format for an FPU processing instruction is shown below:

| 28 | 27 | 2322 | 2019 |  | 15 |  | 14 | 11 | 10 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| sequencer op |  | ra |  | rb |  | rd | sel_op | FPU operation |  |

### 8.2.1 FPU Processing Sequencer Opcodes

Valid sequencer opcodes for this instruction format:
0000 continue
0001 continue with LAD enable for output ( $\overline{\text { ALTCH }}$ strobe)
0010 continue with LAD enable for output $\overline{W E}$ strobe) ${ }^{\dagger}$
$\dagger$ Permits simultaneous write to a register and to the LAD bus. Writing to the LAD bus during FPU operation requires a 15 -ns extension (TMS34082-40) of the clock period when the write is performed.

### 8.2.2 Operand Selection

Instructions that control FPU operations can select operands from internal registers, internal feedback, or the LAD bus (32-bit operations only). When register addresses are used as sources (ra or rb field), only the lower four bits are used. Most instructions use three operands:
ra is the operand $A$ source address (RA9-0, C, CT)
rb is the operand B source address (RB9-0, C, CT)
rd is the result destination address
When ra (or rb) is set to $1100_{2}$, the A (or B ) operand comes from the LAD bus without first being written into a register.

When the CONFIG, COUNTX, or COUNTY register (address 13, 14, or 15) is selected as the ra operand, a one is input to the FPU.

When the SUBADD1, IRAREG, or MIN-MAX/LOOPCT register (address 29, 30 , or 31 ) is selected as the rb operand, a one is input to the FPU.

The sel_op field chooses the operands. When low, sel_op bits $14-11$ select the following feedback operands:
bit 14 for ALU feedback to multiplier A input
bit 13 for multiplier feedback to multiplier B input
bit 12 for multiplier feedback to ALU A input
bit 11 for ALU feedback to ALU B input
The sel_op bits allow many different combinations of operands from the register file and feedback registers. Figure 8-1 shows the operands selected for each combination of sel_op bits.

Note: If feedback operands are used, the FPU core output registers must be enabled (PIPES2=0).

Figure 8-3. Operand Selection


Figure 8-3. Operand Selection (Continued)

sel_op $=0101$

sel_op $=0110$

sel_op $=0111$

Figure 8-3. Operand Selection (Continued)


Figure 8-3. Operand Selection (Continued)


### 8.2.3 FPU Processing Instruction Codes

Instruction bits 10-0 select the multiplier or ALU operation. When the FPU core is busy with multicycle operations (division, square root, or double-precision floating-point multiplication), the FPU stops the sequencer until the FPU is ready for the next operation.

### 8.3 External Instruction Cycle Counts

Table 8-1 lists external instructions, pipeline settings, and the number of cycles required to complete each routine. The number in parenthesis after each cycle count is the number of cycles before the next operation may begin. For block move operations, $n$ specifies the number of words transferred.

Table 8-1. Cycle Counts for External Instructions

| Assembler Opcode | Description of Routine | Cycle Counts |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | $\begin{gathered} \text { PIPES2-1 } \\ 11 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 10 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 01 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 00 \end{gathered}$ |
| ADD | Add A + B | 1(1) | 2(1) | 2(1) | 3(1) |
| AND | Logical AND A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| ANDNA | Logical AND not A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| ANDNB | Logical AND A, not B | 1(1) | 2(1) | 2(1) | 3(1) |
| CJMP | Conditional jump | 1(1) | 1(1) | 1(1) | 1(1) |
| CSJR | Conditional jump to subroutine | 1(1) | 1(1) | 1(1) | 1(1) |
| CMP | Compare A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| COMPL | Pass 1's complement of A | 1(1) | 2(1) | 2(1) | 3(1) |
| DIV | Divide A / B <br> single-precision <br> double-precision integer | $\begin{gathered} 8(8) \\ 13(13) \\ 16(16) \end{gathered}$ | $\begin{gathered} 8(7) \\ 13(12) \\ 16(15) \end{gathered}$ | $\begin{gathered} 9(7) \\ 15(12) \\ 17(15) \end{gathered}$ | $\begin{gathered} 9(7) \\ 15(12) \\ 17(15) \end{gathered}$ |
| DTOF | Convert from DP to SP | 1(1) | 2(1) | 2(1) | 3(1) |
| DTOI | Convert from DP to integer | 1(1) | 2(1) | 2(1) | 3(1) |
| DTOU | Convert from DP to unsigned integer | 1(1) | 2(1) | 2(1) | 3(1) |
| FTOD | Convert from SP to DP | 1(1) | 2(1) | 2(1) | 3(1) |
| FTOI | Convert from SP to integer | 1(1) | 2(1) | 2(1) | 3(1) |
| FTOU | Convert from SP to unsigned integer | 1(1) | 2(1) | 2(1) | 3(1) |
| ITOD | Convert from integer to DP | 1(1) | 2(1) | 2(1) | 3(1) |
| ITOF | Convert from integer to SP | 1(1) | 2(1) | 2(1) | 3(1) |
| LD | Load n words into register single-precision double-precision integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| LDLCT | Load loop counter with value | 1(1) | 1(1) | 1(1) | 1(1) |
| MASK | Set programmable mask | 1(1) | 1(1) | 1(1) | 1(1) |
| MOVA | Move A | 1(1) | 2(1) | 2(1) | 3(1) |
| MOVLM | Move $n$ words from LAD bus to MSD bus single-precision double-precision integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \\ \hline \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \\ \hline \end{gathered}$ |

Table 8-1. Cycle Counts for External Instructions (Continued)

| Assembler Opcode | Description of Routine | Cycle Counts |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | $\begin{gathered} \text { PIPES2-1 } \\ 11 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 10 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 01 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 00 \end{gathered}$ |
| MOVML | Move n words from MSD bus to LAD bus single-precision double-precision integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| MOVRR | Multiple move, register to register single-precision double-precision integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| MULT | Multiply A * B single-precision double-precision integer | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| MULT.ADD |  | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| MULT.NEG |  | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| MULT.PASS | $\begin{aligned} & \text { Multiply } \mathrm{A}_{1} * \mathrm{~B}_{1} \text {, Add } \mathrm{A}_{2}+0 \\ & \text { single-precision } \\ & \text { double-precision } \\ & \text { integer } \end{aligned}$ | $\begin{gathered} 1(1) \\ 2(2) \\ 1(1) \end{gathered}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| MULT.SUB | ```Multiply A1 * B1, Subtract A2 - B2 single-precision double-precision integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| MULT.2SUBA |  | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| MULT.SUBRL | $\begin{aligned} & \text { Muttiply } \mathrm{A}_{1} * \mathrm{~B}_{1} \text {, Subtract } \mathrm{B}_{2}-\mathrm{A}_{2} \\ & \text { single-precision } \\ & \text { double-precision } \\ & \text { integer } \\ & \hline \end{aligned}$ | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |

Table 8-1. Cycle Counts for External Instructions (Continued)

| Assembler Opcode | Description of Routine | Cycle Counts |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | $\begin{gathered} \text { PIPES2-1 } \\ 11 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 10 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 01 \end{gathered}$ | $\begin{gathered} \hline \text { PIPES2-1 } \\ 00 \end{gathered}$ |
| NEG | Pass - A | 1(1) | 2(1) | 2(1) | 3(1) |
| NOR | Logical NOR A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| OR | Logical OR A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| PASS | Pass A | 1(1) | 2(1) | 2(1) | 3(1) |
| PASS | Pass B | 1(1) | 2(1) | 2(1) | 3(1) |
| PASS.ADD | Multiply $\mathrm{A}_{1}$ * 1, Add $\mathrm{A}_{2}+\mathrm{B}_{2}$ <br> single-precision <br> double-precision integer | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| PASS.NEG | ```Multiply A1 * 1, Subtract 0-A2 single-precision double-precision integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| PASS.PASS | Multiply $\mathrm{A}_{1} * 1, \operatorname{Add} \mathrm{~A}_{2}+0$ single-precision double-precision integer | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| PASS.SUB | ```Multiply A1 * 1, Subtract A2 - B2 single-precision double-precision integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| PASS.2SUBA | ```Multiply A1 * 1, Subtract 2-A A single-precision double-precision integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| PASS.SUBRL | ```Multiply A1 * 1, Subtract B2 - A2 single-precision double-precision integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| RTI | Return from interrupt | 1(1) | 1(1) | 1(1) | 1(1) |
| RTS | Return from subroutine | 1(1) | 1(1) | 1(1) | 1(1) |
| SLL | Logical shift left A by B bits | 1(1) | 2(1) | 2(1) | 3(1) |
| SQRT | Square root of $A$ single-precision double-precision integer | $\begin{aligned} & 11(11) \\ & 16(16) \\ & 20(20) \end{aligned}$ | $\begin{aligned} & 11(10) \\ & 16(15) \\ & 20(19) \end{aligned}$ | $\begin{aligned} & 12(10) \\ & 17(15) \\ & 21(19) \end{aligned}$ | $\begin{aligned} & 12(10) \\ & 17(15) \\ & 21(19) \end{aligned}$ |
| SRA | Arithmetic shift right $A$ by B bits | 1(1) | 2(1) | 2(1) | 3(1) |
| SRL | Logical shift right $A$ by B bits | 1(1) | 2(1) | 2(1) | 3(1) |

## Table 8-1. Cycle Counts for External Instructions (Continued)

| Assembler <br> Opcode | Description <br> of Routine | Cycle Counts |  |  |  |
| :--- | :--- | :---: | :---: | :---: | :---: |
|  |  | PIPES2-1 <br> 11 | PIPES2-1 <br> 10 | PIPES2-1 <br> 01 | PIPES2-1 <br> 00 |
| ST | Store $n$ words from register <br> single-precision <br> double-precision | $n+1$ | $n+1$ | $n+1$ | $n+1$ |
|  | integer | $2 n+1$ | $2 n+1$ | $2 n+1$ | $2 n+1$ |
|  | Subtract A-B | $n+1$ | $n+1$ | $n+1$ | $n+1$ |
| SUB | Subtract B-A | $1(1)$ | $2(1)$ | $2(1)$ | $3(1)$ |
| SUBRL | Convert from unsigned integer to DP | $1(1)$ | $2(1)$ | $2(1)$ | $3(1)$ |
| UTOD | $1(1)$ | $2(1)$ | $2(1)$ | $3(1)$ |  |
| UTOF | Convert from unsigned integer to SP | $1(1)$ | $2(1)$ | $2(1)$ | $3(1)$ |
| UWRAPI | Unwrap inexact operand | $1(1)$ | $2(1)$ | $2(1)$ | $3(1)$ |
| UWRAPR | Unwrap rounded operand | $1(1)$ | $2(1)$ | $2(1)$ | $3(1)$ |
| UWRAPX | Unwrap exact operand | $1(1)$ | $2(1)$ | $2(1)$ | $3(1)$ |
| WRAP | Wrap denormalized operand | $1(1)$ | $2(1)$ | $2(1)$ | $3(1)$ |
| XOR | Logical exclusive OR A, B | $1(1)$ | $2(1)$ | $2(1)$ | $3(1)$ |

### 8.4 General Restrictions for External Instructions

Restrictions that apply to all external instructions are as follows:
Registers C and CT cannot both be used as operands in the same instruction.

Absolute value modifiers are permitted with floating-point operations only.
Integer and floating-point operand types cannot be used in the same operation (except conversions).

Signed and unsigned integer operand types cannot be used in the same operation.

Operands with the LAD bus as the source cannot be specified with a double-precision operand type.

Multiplier and ALU feedback (MULFB and ALUFB) cannot be specified as operands unless the FPU core output registers are turned on (PIPES2 = 1).

Results from chained-mode operations are always of the same type. If one result is double-precision, the other is forced to be also. For example, a multiply/pass operation with double-precision multiplier inputs and a single-precision input for the pass operation will result in two double-precision outputs. Be careful that subsequent instructions have the correct data types when these results are used as input.

### 8.5 External Assembly Instructions

A detailed explanation of each external instruction is provided on the following pages of this chapter. The instructions are in alphabetical order by their TMS34082 assembler opcode. Table 8-2 is a list of the selectable bit definitions used in this chapter.

## Table 8-2. Bit Definitions for External Instructions

| Bit Number | Mnemonic | Description |
| :---: | :---: | :---: |
| 29 | e | $0=$ normal operation, <br> $1=$ send output to LAD bus with $\overline{W E}$ strobe |
| 28 | h | 0 = normal operation, <br> $1=$ send output to LAD bus with $\overline{\text { ALTCH }}$ strobe |
| 27-24 | ra | operand A source address |
| 23-20 | rb | operand B source address |
| 19-15 | rd | result destination address |
| 14-11 | sel_op | operand selection (see subsection 8.2.2) |
| 9-7 | type or t | operand format: <br> $000=$ single-precision on ra and single-precision on rb $001=$ single-precision on ra and double-precision on rb $010=$ double-precision on ra and single-precision on rb $011=$ double-precision on ra and double-precision on rb $100=$ integer (2's complement) on both ra and rb $101=$ unsigned integer on both ra and rb |
| 8 | pa | precision of ra: $0=$ single-precision, $1=$ double-precision |
| 7 | pb | precision of rb : <br> $0=$ single-precision, $1=$ double-precision |
| 6 | s | output source: <br> $0=$ ALU result, $1=$ multiplier result |
| 5 | a | negate ALU result: $0=$ normal ALU result, $1=$ negated ALU result |
| 4 | va | absolute value of ra: $0=r a, 1=\|r a\|$ |
| 3 | vb | absolute value of rb : $0=\mathrm{rb}, 1=\|\mathrm{rb}\|$ |
| 2 | vy | absolute value of rd: $0=\mathrm{rd}, 1=\|\mathrm{rd}\|$ |
| 2 | m | negate multiplier result: $0=$ normal multiplier result, $1=$ negated multiplier result |
| 2 | ny | negate output result: <br> $0=$ normal output result, $1=$ negated output result |
| 1 | wa | wrapped number on ra: $0=$ normal format, $1=$ wrapped number |
| 0 | wb | wrapped number on rb: $0=$ normal format, $1=$ wrapped number |



| Syntax | and ra.type, rb.type, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | ra AND rb $\rightarrow$ rd |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 |  | 30 | 29 |  | 28 | 27 |  | 23 |  |  | 15 |
|  | 0 |  | 0 | e |  | h |  | ra |  | rb |  |  |
|  | $14 \quad 11$ | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | sel_op | 0 | 1 | 0 | t | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Description | This instruction takes the logical AND of ra with rb and places the result in rd. |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RAO <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT Register <br> ALUFB (ALU feedback) <br> LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra and ib | $i$ (signed integer) <br> u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra and rb | none |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | RA9-RAO RB9-RB0 C or CT STATUS, VECTOR | CON MC |  | OUNT SUBA | $X,$ $\mathrm{DD}$ | UNT SUB | DD | IR | G, | LOO |  |  |
| Modifiers for rd | e (send output to LAD bus, $\overline{\text { WE }}$ strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |
| Restrictions | The types for ra and rb must be the same. |  |  |  |  |  |  |  |  |  |  |  |
| Example | and LAD.i, ONE.i, CT |  |  |  |  |  |  |  |  |  |  |  |


| Syntax | andna ra.type, rb.type, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | (NOT ra) AND rb $\rightarrow$ rd |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 |  | 29 |  | 28 | 27 |  | 23 |  |  | 15 |
|  | 0 | 0 |  | e |  | h |  | ra |  | 内 |  |  |
|  | $14 \quad 11$ | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | sel_op | 0 | 1 | 0 | $t$ | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| Description | This instruction takes the logical AND operation of (NOT ra) with rb and places the result in rd. |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RAO <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT Register <br> ALUFB (ALU feedback) <br> LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra and rb | i (signed integer) u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |
| Modiflers for ra and rb | none |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | $\begin{aligned} & \text { RA9-RA0 } \\ & \text { RB9-RB0 } \\ & \text { C or CT } \\ & \text { STATUS, CONFIG, COUNTX, COUNTY } \\ & \text { VECTOR, MCADDR, SUBADDO, SUBADD1, IRAREG, LOOPCT } \end{aligned}$ |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | e (send output to LAD bus, $\overline{\text { WE }}$ strobe) <br> $h$ (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |
| Restrictions | The types for ra and rb must be the same. |  |  |  |  |  |  |  |  |  |  |  |
| Example | andna RA0.u, RB8.u, C.h |  |  |  |  |  |  |  |  |  |  |  |


| Syntax | andnb ra |  | , rb.t | , rd[.m | mod |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | ra AND ( | T | rb) |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 |  | 30 | 29 |  | 28 | 27 |  | 23 |  |  | 15 |
|  | 0 |  | 0 | e |  | h |  | ra |  | rb |  |  |
|  | 1411 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | sel_op | 0 | 1 | 0 | t | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| Description | This instr the result | ction $\mathrm{n} \text { rd }$ | ntak <br> d. | he logi | ical | ND ol | erat | no | with | NO |  | places |
| Sources for ra | RA9-RAO <br> C or CT MULFB ( LAD (Imm ONE (the | egis Multip edia valu | ter plier ate da ene | dback <br> from | $\angle A D$ |  |  |  |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT <br> ALUFB <br> LAD (Imm <br> ONE (the | egis LU edia valu | ter feedb ate da e one) |  | $\angle A D$ |  |  |  |  |  |  |  |
| Types for ra and rb | i (signed <br> u (unsign | integ d in | er) <br> teger |  |  |  |  |  |  |  |  |  |
| Destinations for rd | RA9-RAO <br> RB9-RB0 <br> C or CT <br> STATUS, <br> VECTOR | CON <br> MC | NFIG ADD | UUNT SUBA | X, DD | UN <br> SUB | DD |  |  |  |  |  |
| Modifiers for rd | e (send <br> $h$ (send | utput <br> utput | $\begin{aligned} & \text { It to } L \\ & \text { it to } \end{aligned}$ | bus, bus, |  | robe) $\overline{\mathrm{H}} \operatorname{str}$ |  |  |  |  |  |  |
| Restrictions | The types | for | ra an | must | be | e sa |  |  |  |  |  |  |
| Example | andnb C | .i, | ONE | RBI | . h |  |  |  |  |  |  |  |


| Syntax | Cjmp cond_masks, address <br> If condition is true, <br> jump address $\rightarrow$ Program Count <br> If condition is false, <br> 1+ Program Count$\rightarrow$ Program Count |
| :--- | :--- |

## Alternate Opcodes <br> The following are alternative opcodes recognized by the TMS34082 Software Tool Kit that perform the same instruction as cjmp.

Opcode Description

| beq address | Branch on equal |
| :--- | :--- |
| bge address | Branch on greater than or equal |
| bgt address | Branch on greater than |
| ble address | Branch on less than or equal |
| blt address | Branch on less than |
| bne address | Branch on not equal |
| boh address | Branch on overflow high |
| bol address | Branch on overflow low |
| br address | Branch unconditional <br> brloop address <br> bucch address <br> buccl address |
|  | Branch on loop count |
|  | Branch on CC pin high |
| jmpind |  |
| jmpindch on CC pin low |  |

## Example

```
cjmp D | P, 0x030
```

This example decrements the value in the LOOPCT register, then checks to see if it is zero. If it is, the jump is taken (since $P$ is set to change polarity). The address output on MSA15-MSA0 is 30 hex.
Syntax cjsr cond_masks, addressExecution
Instruction Words
Description
Conditions forcond_masks
If condition is true,
Program Counter $\rightarrow$ SABADDRx jump address $\rightarrow$ Program Counter If condition is false,

    Program Counter \(+1 \rightarrow\) Program Counter
    | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 18 | 18 | 17 | 16 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 0 | 0 | 1 | A | N | G | Z | V | E | C | P | D | O | M | I |

15

```
jump address
```

Jump conditional to the specified subroutine address. During a jump instruction, no FPU operations are performed.
Listed below are the jump instruction condition mask bits (enabled when high):

| A | Always | C | CC pin |
| :--- | :--- | :--- | :--- |
| N | Negative | P | Change polarity (for N,G,Z,V,E,C, and D) |
| G | Greater than | D | Decrement LOOPCT, Jump not zero |
| Z | Zero | M | Jump indirect using MCADDR |
| V | Overflow | I | Jump to internal ROM routine |
| E | ED bit |  |  |

An unconditional jump may be done by setting the $A$ mask bit high. If $C$ is enabled and the CC bit is high, all other jump condition enables except $P, M$, and I are turned off. Multiple jump conditions are separated by vertical bar '|' and are logically ORed together. The condition mask $P$ changes polarity for each individual bit before the logical OR operation.
Range for address 0x0-0xFFFF

## Alternate Opcodes

Example

The following are alternative opcodes recognized by the TMS34082 Software Tool Kit that perform the same instruction as cjsr.

| Opcode | Description |
| :--- | :--- |
| call address | Call unconditional |
| callcch address | Call on CC pin high |
| callccl address | Call on CC pin low |
| calleq address | Call on equal |
| callge address | Call on greater than or equal |
| calligt address | Call on greater than |
| calle address | Call on less than or equal |
| callit address | Call on less than |
| callne address | Call on not equal |
| calloh address | Call on overflow high |
| callol address | Call on overflow |
|  |  |
| callind | Call indirect unconditional |
| callindcch | Call indirect on CC pin high |
| callindccl | Call indirect on CC pin low |
| callindeq | Call indirect on equal |
| callindge | Call indirect on greater than or equal |
| callindgt | Call indirect on greater than |
| callindle | Call indirect on less than or equal |
| callindlt | Call indirect on less than |
| callindne | Call indirect on not equal |
| callindoh | Call indirect on overflow high |
| callindol | Call indirect on overflow low |
| intcall address | Internal call unconditional |
| cjsr 1 c m |  |

This instruction checks the CC input and jumps to the address in the MCADDR register if CC is high.

Note: 'cjsr A, address' is equivalent to 'call address'


| Syntax | compl ra.type, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | (NOT ra) $\rightarrow$ rd |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 |  |  | 29 | 28 | 27 | 24 | 23 |  |  | 21 | 20 | 19 | 15 |
|  | 0 |  |  | e | h |  | , | 0 |  |  | 0 | 0 |  | d |
|  | 14 |  | 10 | 9 | 8 | 6 |  | 5 | 4 |  |  | 1 |  | 0 |
|  | sel_ |  | 0 | 1 | type | 0 |  | 1 | 0 |  |  | 1 |  | 0 |
| Description | This instruction takes the 1s complement of ra and places it in rd. |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RA0 <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra | i (signed integer) <br> u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra | none |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | RA9-R RB9-R C or STAT VECT |  |  |  | UNT <br> UBA |  | $\begin{aligned} & \text { JUN } \\ & \text { SU } \end{aligned}$ |  |  | STATUS, CONFIG, COUNTX, COUNTY VECTOR, MCADDR, SUBADDO, SUBADD1, IRAREG, LOOPCT | $\mathrm{G}, \mathrm{~L}$ | $\mathrm{PC}^{-}$ |  |  |
| Modifiers for rd | e (send output to LAD bus, $\overline{\mathrm{WE}}$ strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | compl RA7.i, C.h |  |  |  |  |  |  |  |  |  |  |  |  |  |





| Syntax | dtou ra[.modifier], rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | ra (double-precision $\rightarrow$ rd (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 | 29 | 28 | 27 | 24 | 23 | 22 | 21 |  |  |  | 15 |
|  | 0 | 0 | e | h |  |  | 0 | 0 | 0 |  |  | rd |  |
|  | $14 \quad 11$ | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |  |  |
|  | sel_op | 0 | 0 | 1 | 1 | 0 | 1 | va | 0 | 1 | 1 |  |  |
| Description | This instruction takes a double-precision floating-point formatted value in ra and converts it to an unsigned integer. |  |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RA0 <br> C or CT Register MULFB (Multiplier feedback) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra | type is implicit in the opcode |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra | $v$ (absolute value) |  |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | ```RA9-RA0 RB9-RB0 C or CT STATUS, CONFIG, COUNTX, COUNTY VECTOR, MCADDR, SUBADD0, SUBADD1, IRAREG, LOOPCT``` |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | e (send output to LAD bus, $\overline{\text { WE }}$ strobe) <br> h (send output to LAD bus, $\overline{\text { LLTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | dtou RA7.v, C |  |  |  |  |  |  |  |  |  |  |  |  |






| Syntax | itof ra, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | ra (integer) $\rightarrow$ rd (single-precision) |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 | 29 | 28 | 27 | 24 | 23 | 22 | 21 | 20 | 19 | 15 |
|  | 0 | 0 | e | h | ra |  | 0 | 0 | 0 | 0 |  | rd |
|  | 1411 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 |  | 0 |
|  | sel_op | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |  | 0 |
| Description | This instruction converts the value in ra from integer form to single-precision floating-integer and places the result in rd. |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RAO <br> C or CT Register MULFB (Multiplier feedback) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra | type is implicit in the opcode |  |  |  |  |  |  |  |  |  |  |  |
| Modiflers for ra | none |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | ```RA9-RAO RB9-RB0 C or CT STATUS, CONFIG, COUNTX, COUNTY VECTOR, MCADDR, SUBADD0, SUBADD1, IRAREG, LOOPCT``` |  |  |  |  |  |  |  |  |  |  |  |
| Modiflers for rd | e (send output to LAD bus, $\overline{\mathrm{WE}}$ strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH strobe) }}$ |  |  |  |  |  |  |  |  |  |  |  |
| Example | itof ONE, RBO.h |  |  |  |  |  |  |  |  |  |  |  |

## Syntax <br> Instruction Words

## Description

Destinations for reg

Types for reg

Sources for address

Range for count
Example

Id reg.type, address, count


During a move instruction no FPU operation is performed. Register control logic for move instructions counts sequentially from the beginning register address, with the exception that the $C$ and $C T$ registers are omitted from the count. The entire register file acts like a ring buffer during the move instruction. The C and CT registers are not accessible to moves. It is illegal to use the C or CT register address as the starting address for a move instruction.
$T$ (Type) and $S$ (Size) give the format of the numbers
T $0=$ integer, $\quad 1=$ floating point $S 0=32$ bits $1=64$ bits
Note: Setting TS = 01 is reserved
Word count is the number of operands to be moved ( $n$ ). A count of 0 will move 256 items. The beginning register address is stored in the register field, and the beginning memory address is the start address field (bits 15-0).

An indirect move is designated by selecting MCADDR as the address. The M bit will be set low and the 16 low-order bits are then disregarded. The starting address in memory comes from the MCADDR register.

To move data from the LAD bus, LAD is selected as the address. The L bit will be set high, and the low-order 16 bits are set to 0 . An address of 'COINT' will load data from the LAD bus and set COINT low for the cycles the load is executing. ( C will be set high in the instruction word.) This option is valid for host-independent mode only.
RA9-RA0
RB9-RB0
STATUS, CONFIG, COUNTX, COUNTY
VECTOR, MCADDR, SUBADD0, SUBADD1, IRAREG, LOOPCT
f (single-precision floating-point) d (double-precision floating-point)
i (signed integer)
u (unsigned integer)
0x0-0xFFFF
MCADDR, LAD, COINT
0-31
ld CONFIG.i, $0 \times 100,1$
ld RAO.i, MCADDR, 3
ld RB1.i, LAD, 3

| Syntax | Idict count |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | count $\rightarrow$ LOOPCT |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 18 | 18 | 17 | 16 |
|  | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 15 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | count |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Description | This instruction loads the LOOPCT register with the value specified by count. If the register is loaded with 0 , the loop would execute 64 K times. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Range for count | 0x0-0xFFFFF |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | ldict 0x0A |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | This example loads the LOOPCT register with 10 (A hex). |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |


| Syntax | Idmcaddr address |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | address $\rightarrow$ MCADDR |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 |
|  | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 16 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | address |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Description | This instruction loads the indirect address (MCADDR) register with the value specified by count. This is a 17-bit value; the most significant bit selects between code and data space. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Range for address | 0x0-0x1FFFF |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | ldmcaddr 0x0A |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

This example loads the MCADDR register with 10 (A hex).



## Syntax

Instruction Words
movim address.type, count[, memory_type]

| 31 | 30 | 29 | 28 | 27 | 26 | 25 | 18 | 17 | 16 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 0 | 0 | T | S | word count | M | D |  |


| 15 | 2 |  | 1 |
| :--- | :---: | :---: | :---: |
| start address | $\mathrm{c}^{\dagger}$ |  |  |

$\dagger$ for indirect moves only.
Each instruction can transfer up to 256 items that are 1 or 2 words long. During a move instruction, no FPU operation is performed.
$T$ (Type) and $S$ (Size) determine the number format.
T $0=$ integer, $\quad 1=$ floating point
$S 0=32$ bits, $1=64$ bits
Note: Setting TS $=01$ is reserved
Word count is the number of operands to be moved ( $n$ ). A word count of 0 will move 256 items. 51232 -bit values may be moved by setting count=0 and specifying double-precision format. The beginning memory address (for MSD) is the start address field.

An indirect MOVE is selected by using 'MCADDR' for the address. Bit 17 (M) will be set high. The starting address is found in the indirect address register (MCADDR). When $M$ is high, the 16 low-order bits are disregarded, with the exception of bit 1 . Choosing 'COINT' as the address will set the C bit high. The COINT output will be low during the cycles the move is executing. The MCADDR register stores the starting address. This option is only valid for host-independent mode.

If bit 16 (D) is high, data space is used as the destination. If the bit is low, code space is used.

Valid memory types are CODE ( $D=0$ ) and DATA ( $D=1$ ). The default value, if none is specified, is CODE. If 'MCADDR' or 'COINT' is the address, the memory type must NOT be specified (bit 16 of the MCADDR register selects the memory type).

Destination addresses
$0 \times 0-0 \times F F F F$ MCADDR, COINT
Types for reg
$f$ (single-precision floating-point) d (double-precision floating-point) i (signed integer) u (unsigned integer)

Range for count
Types for memory_type

0-255
CODE
DATA

Example<br>movlm _vec1.i, 2, DATA<br>movlm MCADDR.f, 2<br>movlm COINT.i, 2

Syntax
Instruction Word

## Description

Sources for address

Types for reg

Range for count
movml address.type, count[, count[, memory_type]

| 31 | 30 | 29 | 28 | 27 | 26 | 2120 | 16 |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 1 | 0 | 0 | T | S | word count | M | D |


| 15 | 2 | 1 |
| :---: | :---: | :---: |
| start address | $\mathrm{C}^{\dagger}$ |  |

$\dagger_{\text {for indirect moves only. }}$
Each instruction can transfer up to 256 items that are 1 or 232 -bit words long. During a move instruction, no FPU operation is performed.
Valid sequencer opcode for this instruction format 1101 move $n$ words from MSD to LAD.
$T$ (Type) and $S$ (Size) determine the number format
T $0=$ integer, $\quad 1=$ floating point
$S 0=32$ bits, $1=64$ bits
Note: Setting TS $=01$ is reserved
Word count is the number of operands to be moved (n). A word count of 0 will move 256 items. 512 32-bit values may be moved by setting count=0 and specifying double-precision format. The beginning memory address (for MSD) is the start address field.
An indirect MOVE is selected by using 'MCADDR' for the address. Bit 17 (M) will be set high. The starting address is found in the indirect address register (MCADDR). When $M$ is high, the 16 low-order bits are disregarded, with the exception of bit 1. Choosing 'COINT' as the address will set the $C$ bit high. The COINT output will be low during the cycles the move is executing. The MCADDR register stores the starting address. This option is only valid for host-independent mode.
If bit 16 (D) is high, data space is used as the source. If the bit is low, code space is used.
Valid memory types are CODE ( $\mathrm{D}=0$ ) and DATA $(\mathrm{D}=1)$. The default value, if none is specified, is CODE. If 'MCADDR' or 'COINT' is the address, the memory type must NOT be specified (bit 16 of the MCADDR register selects the memory type).
0x0-0xFFFF
MCADDR, COINT
f (single-precision floating-point) d (double-precision floating-point) i (signed integer) $u$ (unsigned integer)

## Types for memory_type

## Example

CODE
DATA
movml _vec2.f, 3, DATA
movml MCADDR.f, 3
movml COINT.i, 3


| Syntax | mult ra.[modifier]type, rb.[modifier]type, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $\mathrm{ra} \times \mathrm{rb} \rightarrow \mathrm{rd}$ |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | $30 \quad 29$ |  |  | 28 | 27 |  | 2423 |  | 2019 | 15 |
|  | 0 | 0 | e |  | h |  | ra |  | rb |  |  |
|  | $14 \quad 11$ | 109 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | sel op | 0 | pa | pb | 1 | 0 | va | vb | ny | wa | wb |
| Description | This instruction takes the product of ra and rb and places it in rd. |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RAO <br> C or CT Register ALUFB (ALU feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |
| Types for ra and rb | $\dagger$ (single-precision floating-point) <br> d (double-precision floating-point) <br> i (signed integer) <br> u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | RA9-RA0 <br> RB9-RB0 <br> C or CT <br> STATUS, CONFIG, COUNTX, COUNTY <br> VECTOR, MCADDR, SUBADD0, SUBADD1, IRAREG, LOOPCT |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | n (negated, not valid for integer types) <br> $e$ (send output to LAD bus, WE strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH strobe) }}$ |  |  |  |  |  |  |  |  |  |  |
| Restrictions | Feedback registers (ALUFB or MULFB) may not be used as operands for double-precision multiplies. |  |  |  |  |  |  |  |  |  |  |
| Example | mult RAO.f,C.f, CT |  |  |  |  |  |  |  |  |  |  |



| Modifiers for rd | a (negate ALU result, valid only for chained mode noninteger types) m (negate multiplier result, valid only for chained mode noninteger types) <br> e (send output to LAD bus, WE strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |
| :---: | :---: |
| Output_sources | ALU MULT |
| Restrictions | If ra2 is specified, then at least one feedback source must be used (either ra or ra2). If rb2 is specified, then at least one feedback source must be used (either rb or rb2). |
|  | Feedback registers (ALUFB or MULFB) may not be used as operands for double-precision multiplies. |
| Example | mult.add RA2.f, lad2, RB7.u. ALUFB2, CT.a, ALU |


| Syntax | mult.neg ra.type, ra2. rb.type, rd[.mod |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $\mathrm{ra} \times \mathrm{rb} \rightarrow \mathrm{rd}$ or MULFB; $0-\mathrm{ra} 2 \rightarrow \mathrm{rd}$ O |  |  |  |  |
| Instruction Words | 31 | 30 | 2 |  | 28 |
|  | 0 | 0 |  |  | h |
|  | $14 \quad 11$ | $10 \quad 9$ | 8 | 7 | 6 |
|  | sel_op | 1 | pa | pb | $s$ |
| Description | The chained-mode instruction places th rd or the multiplier feedback, and concu 0 and places the result into either rd or |  |  |  |  |
| Sources for ra | RA9-RA0 <br> C or CT Register <br> ALUFB (ALU feedback) <br> LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |
| Types for ra and rb | $f$ (single-precision floating-point) <br> d (double-precision floating-point) <br> $i$ (signed integer) <br> u (unsigned integer) |  |  |  |  |
| Modifiers for ra and rb | none |  |  |  |  |
| Sources for ra2 | RA9-RA0 <br> C or CT register MULFB (Multiplier feedback) LAD (immediate data from LAD bus) ONE (the value one) |  |  |  |  |
| Sources for rb2 | RB9-RB0 <br> C or CT register ALUFB (ALU feedback) LAD (immediate) ONE (the value one) |  |  |  |  |
| Destinations for rd | $\begin{aligned} & \text { RA9-RA0 } \\ & \text { RB9-RB0 } \\ & \text { C or CT } \end{aligned}$ |  |  |  |  |


| Modifiers for ra | a (negate ALU result, valid only for chained mode noninteger types) |
| :--- | :--- |
|  | $m$ (negate multiplier result, valid only for chained mode noninteger types) |
|  | e (send output to LAD bus, $\overline{\text { WE strobe) }}$ |
|  | h (send output to LAD bus, $\overline{\text { ALTCH strobe) }}$ |
| Output_sources | ALU |
|  | MULT |

Restrictions

Example

If ra2 is specified then at least one feedback source must be used (either ra or ra2). If rb2 is specified then at least one feedback source must be used (either rb or rb2).

Feedback registers (ALUFB or MULFB) may not be used as operands for double-precision multiplies.
mult.neg RA1.f, LAD2, RB6d, RB0, MULT

| Syntax | mult.pass ra.type, ra2, rb.type, rd[.modifier], output_source |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $\mathrm{ra} \times \mathrm{rb} \rightarrow \mathrm{rd}$ or MULFB; ra2 $+0 \rightarrow \mathrm{rd}$ or ALUFB |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 |  | 29 |  | 28 | 27 |  | 2423 |  | 2019 | 15 |
|  | 0 | 0 |  | 0 |  | 0 |  | ra or ra2 |  | b |  |  |
|  | $14 \quad 11$ | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | sel_op | 1 | $t$ | pa | pb | $s$ | 1 | 0 | a | m | 0 | 0 |
| Description | This chained-mode instruction places the product of a value of ra and rb in rd or the multiplier feedback, and concurrently places the sum of the value of ra2 and 0 into either rd of the ALU feedback. |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RAO <br> C or CT Register <br> ALUFB (ALU feedback) <br> LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra and rb | none |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra2 | RA9-RA0 <br> C or CT register MULFB (Multiplier feedback) LAD (immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Sources for rb2 | RB9-RB0 <br> C or CT register <br> ALUFB (ALU feedback) <br> LAD (immediate) <br> ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | $\begin{aligned} & \text { RA9-RA0 } \\ & \text { RB9-RB0 } \\ & \text { C or CT } \end{aligned}$ |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | a (negate ALU result, valid only for chained mode noninteger types) <br> m (negate multiplier result, valid only for chained mode noninteger types) <br> e (send output to LAD bus, WE strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |
| Output_sources | ALU <br> MULT |  |  |  |  |  |  |  |  |  |  |  |


| Restrictions | If ra2 is specified then at least one feedback source must be used (either ra <br> or ra2). If rb2 is specified then at least one feedback source must be used <br> (either rb or rb2). |
| :--- | :--- |
| Feedback registers (ALUFB or MULFB) may not be used as operands for |  |
| double-precision multiplies. |  |

mult.pass RA4.f, C2, RB6.d, CT.a, ALU

Syntax
Execution

Instruction Words

| 31 | 30 | 29 | 28 |  | 27 | 2423 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | $e$ | h | ra or ra2 | rb or rb2 | rd |  |


| 14 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| sel_op | 1 | t | pa | pb | s | 0 | 0 | a | m | 0 | 1 |

Description

Sources for ra

Sources for rb

Types for ra and rb
mult.sub ra.type, ra2, rb.type, rb2, rd[.modifier], output_source
$\mathrm{ra} \times \mathrm{rb} \rightarrow \mathrm{rd}$ or MULFB; ra2 - rb2 $\rightarrow \mathrm{rd}$ or ALUFB

This chained-mode instruction places the product of the values of ra and rb in either rd or MULFB, and concurrently places the difference of the values from ra2 and rb2 into rd or ALUFB.

RA9-RA0
C or CT Register
ALUFB (ALU feedback)
LAD (Immediate data from LAD bus)
ONE (the value one)
RB9-RB0
C or CT Register
MULFB (Multiplier feedback)
LAD (Immediate data from LAD bus)
ONE (the value one)
f (single-precision floating-point) d (double-precision floating-point)
i (signed integer)
u (unsigned integer)
none

RA9-RA0
C or CT register
MULFB (Multiplier feedback)
LAD (immediate data from LAD bus)
ONE (the value one)
Sources for rb2
RB9-RB0
C or CT register
ALUFB (ALU feedback)
LAD (immediate)
ONE (the value one)
Destinations for rd
RA9-RA0

RB9-RB0
C or CT

| Modifiers for rd | a (negate ALU result, valid only for chained mode noninteger types) <br> m (negate multiplier result, valid only for chained mode non-nteger types) <br> e (send output to LAD bus, $\overline{W E}$ strobe) |
| :--- | :--- |
| h (send output to LAD bus, $\overline{\text { ALTCH strobe) }}$ |  |

Feedback registers (ALUFB or MULFB) may not be used as operands for double-precision multiplies.

Example
mult.sub RA8.d, MULFB2, RB4.d, ALUFB2, RA9.m,MULT

Syntax
Execution
Instruction Words

Instruction Words

Sources for ra

Sources for rb

Modifiers for ra and rb
none

Sources for ra2

Sources for rb2

Destinations for rd

RA9-RA0
C or CT register
MULFB (Multiplier feedback)
LAD (immediate data from LAD bus)
ONE (the value one)
RB9-RB0
C or CT register
ALUFB (ALU feedback)
LAD (immediate)
ONE (the value one)
RA9-RA0
RB9-RB0
C or CT
Modifiers for rd a (negate ALU result, valid only for chained mode noninteger types)
m (negate multiplier result, valid only for chained mode noninteger types)
e (send output to LAD bus, WE strobe)
$h$ (send output to LAD bus, ALTCH strobe)
Output_sources
ALU
MULT
Restrictions
Example

| Syntax | mult.subrl ra.type, ra2. rb.type, rb2; rd[.modifier], output_source |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $\mathrm{ra} \times \mathrm{rb} \rightarrow \mathrm{rd}$ or MULFB; $\mathrm{rb2}-\mathrm{ra} 2 \rightarrow \mathrm{rd}$ or ALUFB |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 |  | 30 | 29 |  | 28 | 27 | 72 | 23 |  |  | 15 |
|  | 0 |  | 0 | e |  | h |  | ra or ra2 |  | or rb2 |  |  |
|  | $14 \quad 11$ | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | sel_op | 1 | It | pa | pb | s | 0 | 0 | a | m | 1 | 1 |
| Description | This instruction places the product of a value in ra and rb in either rd or multiplier feedback and concurrently subtracts the value of ra2 from rb2 and places the result in either rd or the ALU feedback. |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RA0 <br> C or CT Register <br> ALUFB (ALU feedback) <br> LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra and rb | f (single-precision floating-point) <br> d (double-precision floating-point) <br> i (signed integer) <br> u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra and rb | none |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra2 | RA9-RA0 <br> C or CT register MULFB (Multiplier feedback) LAD (immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Sources for rb2 | RB9-RB0 <br> C or CT register ALUFB (ALU feedback) LAD (immediate) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | $\begin{aligned} & \text { RA9-RAO } \\ & \text { RB9-RBO } \\ & \text { C or CT } \end{aligned}$ |  |  |  |  |  |  |  |  |  |  |  |


| Modifiers for rd | a (negate ALU result, valid only for chained mode noninteger types) |
| :--- | :--- |
|  | $m$ (negate multiplier result, valid only for chained mode noninteger types) |
|  | e (send output to LAD bus, $\overline{W E}$ strobe) |
|  | $h$ (send output to LAD bus, $\overline{A L T C H}$ strobe) |
| Output_sources | ALU |
|  | MULT |

Restrictions

Example


## Syntax

## Instruction Words

nop

| 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

## Description

This instruction performs no operation. If FPU core output registers are enabled (PIPES2=0), the output registers hold their previous value.

This instruction may be used if the TMS34082 is idle, or to wait for a previous instruction to finish.

| Syntax | nor ra.type, rb.type, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | ra NOR rb $\rightarrow$ rd |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 |  | 29 |  | 28 | 27 |  | 23 |  |  | 15 |
|  | 0 | 0 |  | e |  | h |  | ra |  | rb |  |  |
|  | $14 \quad 11$ | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | sel_op | 0 | 1 | 0 | t | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
| Description | This instruction takes the logical NOR of ra with rb and places the result in rd. |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RAO <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT Register <br> ALUFB (ALU feedback) <br> LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra and rb | i (signed integer) u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra and rb | none |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | RA9-RA0 RB9-RB0 C or CT STATUS, VECTOR | CONFI MCAD | IG, DDR | UUN <br> SUBA | X, | UNT | DD1 | IR | G, | LOO |  |  |
| Modifiers for rd | e (send output to LAD bus, $\overline{W E}$ strobe) <br> $h$ (send output to LAD bus, $\overline{\text { LLTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |
| Restrictions | The types for ra and rb must be the same. |  |  |  |  |  |  |  |  |  |  |  |
| Example | nor CT.u, LAD.u, RB9.e |  |  |  |  |  |  |  |  |  |  |  |



| Syntax | pass ra.[modifier]type, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $\mathrm{ra} \rightarrow \mathrm{rd}$ |  |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 | 29 |  | 28 | 27 | 24 | 23 | 22 | 21 | 20 | 19 | 15 |
|  | 0 | 0 | e |  | h |  |  | 0 | 0 | 0 | 0 | r |  |
|  | $14 \quad 11$ | 10 | 9 |  | 7 | 6 | 5 |  |  | 2 | 1 |  |  |
|  | sel_op | 0 |  | type |  | 0 | 1 |  |  | 0 | 0 |  |  |
| Description | This instruction copies the value in ra to rd. |  |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RAO <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra | f (single-precision floating-point) <br> d (double-precision floating-point) <br> $i$ (signed integer) <br> u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra | $v$ (absolute value, not valid for integer types) |  |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | RA9-RA0 <br> RB9-RB0 <br> C or CT <br> STATUS, CONFIG, COUNTX, COUNTY <br> VECTOR, MCADDR, SUBADD0, SUBADD1, IRAREG, LOOPCT |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | e (send output to LAD bus, $\overline{\mathrm{WE}}$ strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | pass RA5.vf, CT |  |  |  |  |  |  |  |  |  |  |  |  |


| Syntax | pass rb.[modifier]type, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $\mathrm{rb} \rightarrow \mathrm{rd}$ |  |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 | 29 |  | 28 |  | 27 | 26 | 25 | 24 |  |  | 15 |
|  | 0 | 0 | - |  | h |  | 0 | 0 | 0 | 0 | r | rd |  |
|  | 1411 | 10 | 9 |  | 7 | 6 |  |  |  | 2 |  | 0 |  |
|  | se!_op | 0 |  | type |  | 0 |  |  |  | 1 |  | 1 |  |
| Description | This instruction copies the value in rb to rd. |  |  |  |  |  |  |  |  |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT Register ALUFB (ALU feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |  |
| Types for rb | f (single-precision floating-point) <br> d (double-precision floating-point) <br> i (signed integer) <br> u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rb | $v$ (absolute value, not valid for integer types) |  |  |  |  |  |  |  |  |  |  |  |  |
| Destinatlons for rd | ```RA9-RA0 RB9-RBO C or CT STATUS, CONFIG, COUNTX, COUNTY VECTOR, MCADDR, SUBADD0, SUBADD1, IRAREG, LOOPCT``` |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | e (send output to LAD bus, $\overline{W E}$ strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH strobe) }}$ |  |  |  |  |  |  |  |  |  |  |  |  |
| Execution | pass RB2.i, CT |  |  |  |  |  |  |  |  |  |  |  |  |


Modifiers for rd
Output_sourcesALUMULT
Restrictions
Example
pass.add RA.d, MULFB2, RB9.f, CT,ALU


| Syntax | pass.pass ra.type, ra2. rd[.modifier], output_source |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $\mathrm{ra} \times 1 \rightarrow \mathrm{rd}$ or MULFB; ra2 $+0 \rightarrow \mathrm{rd}$ or ALUFB |  |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 | 29 | 28 | 27 | 24 | 23 | 22 | 21 |  |  |  | 15 |
|  | 0 | 0 | e | h |  |  | 0 | 0 | 0 |  | 0 |  |  |
|  | 14 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |  |  |
|  | sel | 1 | t | pa | pb | s | 1 | 1 | a | m | 0 |  |  |
| Description | This chained-mode instruction places the product of a value of ra and 1 in rd or the multiplier feedback, and concurrently places the sum of the value of ra2 and 0 into either $r d$ of the ALU feedback. |  |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RA0 <br> C or CT Register <br> ALUFB (ALU feedback) <br> LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra | $f$ (single-precision floating-point) <br> d (double-precision floating-point) <br> $i$ (signed integer) <br> u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra | none |  |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra2 | RA9-RA0 <br> C or CT register MULFB (Multiplier feedback) LAD (immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | $\begin{aligned} & \text { RA9-RA0 } \\ & \text { RB9-RB0 } \\ & \text { C or CT } \end{aligned}$ |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | a (negate ALU result, valid only for chained mode noninteger types) <br> m (negate multiplier result, valid only for chained mode noninteger types) <br> e (send output to LAD bus, WE strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |  |
| Output_sources | ALU <br> MULT |  |  |  |  |  |  |  |  |  |  |  |  |
| Restrictions | If rb2 is specified then at least one feedback source must be used (either rb or rb2). |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | pass.pass RA7.f, C2, RB0, ALU |  |  |  |  |  |  |  |  |  |  |  |  |


| Syntax | pass.sub ra.type, ra2. rb.type, rd[.modifier], output_source |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $\mathrm{ra} \times 1 \rightarrow \mathrm{rd}$ or MULFB; ra2 $-\mathrm{rb} \rightarrow \mathrm{rd}$ or ALUFB |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 |  | 29 |  | 28 | 27 |  | 2423 |  | 2019 | 15 |
|  | 0 | 0 |  |  |  | 0 |  | ra or ra2 |  | rb |  |  |
|  | $14 \quad 11$ | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | sel op | 1 | $t$ | pa | pb | s | 0 | 1 | a | m | 0 | 1 |
| Description | This chained-mode instruction places the product of the values of ra and 1 in either rd or MULFB, and concurrently places the difference of the values from ra2 and rb into rd or ALUFB. |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RA0 <br> C or CT Register ALUFB (ALU feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra and rb | $f$ (single-precision floating-point) d (double-precision floating-point) <br> $i$ (signed integer) <br> u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra and rb | none |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra2 | RA9-RAO <br> C or CT register MULFB (Multiplier feedback) LAD (immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | RA9-RA0 RB9-RB0 C or CT |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | a (negate ALU result, valid only for chained mode noninteger types) <br> m (negate multiplier result, valid only for chained mode noninteger types) <br> e (send output to LAD bus, WE strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |

Output_sources

Restrictions

## Example

ALU
MULT
If ra2 is specified then at least one feedback source must be used (either ra or ra2).
pass.sub RA1.i, LAD2, RB7.u, CT, ALU

| Syntax | pass.subri ra2. rb.type, rd[.modifier], output_source |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $\mathrm{ra} \times 1 \rightarrow \mathrm{rd}$ or MULFB; $\mathrm{rb}-\mathrm{ra} 2 \rightarrow \mathrm{rd}$ or ALUFB |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 |  | 29 |  | $\underline{28}$ | $27 \quad 2423$ |  |  | 2019 |  | 15 |
|  | 0 | 0 |  | e |  |  |  | ra or ra2 |  | rb |  |  |
|  | $14 \quad 11$ | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | sel_op | 1 | $t$ | pa | pb | s | 0 | 1 | a | m | 1 | 1 |
| Description | This instruction places the product of a value in ra and 1 in either rd or multiplier feedback and concurrently the value of ra2 and rb and places the result in either rd or the ALU feedback. |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RAO <br> C or CT Register ALUFB (ALU feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra and rb | $f$ (single-precision floating-point) <br> d (double-precision floating-point) <br> $i$ (signed integer) <br> u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra and rb | none |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra2 | RA9-RAO <br> C or CT register MULFB (Multiplier feedback) LAD (immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | $\begin{aligned} & \text { RA9-RAO } \\ & \text { RB9-RB0 } \\ & \text { C or CT } \end{aligned}$ |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | a (negate ALU result, valid only for chained mode noninteger types) <br> m (negate multiplier result, valid only for chained mode noninteger types) <br> e (send output to LAD bus, WE strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |
| Output_sources | ALU <br> MULT |  |  |  |  |  |  |  |  |  |  |  |


| Restrictions | If ra2 is specified then at least one feedback source must be used (either ra <br> or ra2). <br> If rb2 is specified then at least one feedback source must be used (either rb <br> or rb2). |
| :--- | :--- |
| Example | pass.subrl C.d, MULFB2, RB9.d, RA0.m, ALU |

Syntax
Execution
Instruction Words

Description

Sources for ra

Types for ra

Destinations for rd

Modifiers for rd

Output_sources

Restrictions

Example
pass.2suba ra.type, ra2, rd[.modifier], output_source
$r a \times 1 \rightarrow r d$ or MULFB; $2-r a 2 \rightarrow r d$ or ALUFB


| 14 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| sel_op | 1 | t | pa | pb | s | 0 | 1 | a | m | 1 | 0 |

This chained-mode instruction places the product of ra and 1 in rd or in MUL feedback, and concurrently subtracts the value of ra2 from 2 and places the result in rd or in the ALU feedback.

RA9-RAO
C or CT Register
ALUFB (ALU feedback)
LAD (Immediate data from LAD bus)
ONE (the value one)
$f$ (single-precision floating-point)
d (double-precision floating-point)
i (signed integer)
u (unsigned integer)
none
RA9-RA0
C or CT register
MULFB (Multiplier feedback)
LAD (immediate data from LAD bus)
ONE (the value one)
RA9-RA0
RB9-RB0
C or CT
a (negate ALU result, valid only for chained mode noninteger types)
m (negate multiplier result, valid only for chained mode noninteger types)
e (send output to LAD bus, WE strobe)
$h$ (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe)
ALU
MULT
If ra2 is specified then at least one feedback source must be used (either ra or ra2).
pass.2suba RA2.f, ONE2, RB9.f, CTa, MULT

## Syntax

## Execution

Instruction Words

## Description

Alternate Opcodes

## rti

IRAREG $\rightarrow$ Program Counter

| 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

This instruction causes a jump to the address stored in the interrupt return register (IRAREG). It does not affect the INTG signal, which remains active until interrupts are re-enabled.
reti is an equivalent opcode for this instruction.

| Syntax | rts |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | SUBADDR0 or SUBADDR1 $\rightarrow$ Program Counter |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 |
|  | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Description | This instruction causes a jump to the address stored in the top of the stack. Note: ret is also valid |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Alternate Opcodes | ret is an equivalent opcode for this instruction. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

## Syntax

Instruction Words
sll ra.type, rb.type, rd[.modifier]

| 31 | 30 |  | 29 |  | 28 |  | 27 | 2423 |  | 2019 | rd | 15 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 |  |  | e |  |  |  | ra |  | rb |  |  |  |
| $14 \quad 11$ | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |  |
| sel_op | 0 | 1 | type |  | 0 | 1 | 0 | 1 | 0 | 0 | 0 |  |

This instruction shifts the value in ra to the left by the number of bit positions indicated in rb. Zeros are shifted into the least significant bit location.

RA9-RAO
C or CT Register
MULFB (Multiplier feedback)
ONE (the value one)
RB9-RB0
i (signed integer) u (unsigned integer)
none

Destinations for rd RA9-RA0
RB9-RB0
C or CT
STATUS, CONFIG, COUNTX, COUNTY
VECTOR, MCADDR, SUBADD0, SUBADD1, IRAREG, LOOPCT
Modifiers for rd

Restrictions

## Example

e (send output to LAD bus, $\overline{W E}$ strobe)
$h$ (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe)
The shift count is input as a five-bit positive number right-aligned in the exponent field of a single-precision floating point number. All other bits in the 32-bit word should be set to zero. For example $0 \times 00800000=$ shift count of 1.

This example shows how to shift using a variable shift value stored in RA0 and the operand to be shifted in RA1.

```
ld RB0.u, shift_const, 1 ; load RB0 with shift count
    ; of 23
sll RA0.u, RBO.u, RB1 ; prepare run-time shift
    ; value (in RAO).
sll RA1.u, RB1.u, C ; actual shift of RA1 with
    ; the shift value in RAO
shift_const: data 0x0b 8000 00
    ; equivalent shift count of 23
```

| Syntax | sqrt ra.[modifier]type, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $\sqrt{r a} \rightarrow r d$ |  |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 | 29 |  | 28 | 27 | 24 | 23 | 22 | 21 | 20 |  | 15 |
|  | 0 | 0 | e |  | h |  |  | 0 | 0 | 0 | 0 |  |  |
|  | $14 \quad 11$ | 10 | 9 |  | 7 | 6 | 5 | 4 | 3 | 2 | 1 |  |  |
|  | sel_op | 0 |  | type |  | 1 | 1 | va | 1 | ny | wa |  |  |
| Description | This instruction takes the square root of the value in ra and places it in rd. |  |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RAO <br> C or CT Register ALUFB (ALU feedback) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra | $f$ (single-precision floating-point) <br> d (double-precision floating-point) <br> $i$ (signed integer) <br> $u$ (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra | $v$ (absolute value, not valid for integer types) <br> w (wrapped, not valid for integer types) |  |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | RA9-RAO <br> RB9-RB0 <br> C or CT <br> STATUS, CONFIG, COUNTX, COUNTY <br> VECTOR, MCADDR, SUBADDO, SUBADD1, IRAREG, LOOPCT |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | n (negated, not valid for integer types) <br> e (send output to LAD bus, WE strobe) <br> h (send output to LAD bus, $\overline{\mathrm{ALTCH}}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |  |
| Restrictions | Absolute value modifiers, negated result and wrapped numbers are only permitted with floating-point operations. |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | sqrt RA7.u, C.n |  |  |  |  |  |  |  |  |  |  |  |  |




## Syntax

Instruction Word

Description

## Sources for reg

st reg.type, address, count

| 31 | 30 | 29 | 28 | 27 | 26 | 25 |  |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| L | 1 | M | 1 | T | S | word count | register |


| 15 | 2 | 1 |
| :---: | :---: | :---: |
| start address | $\mathrm{C}^{\dagger}$ | $A^{\dagger}$ |

$\dagger$ For LAD moves only.
During a move instruction, no FPU operation is performed. Register control logic for move instructions counts sequentially from the beginning register address, with the exception that the C and CT registers are omitted from the count. The entire register file acts like a ring buffer during the move instruction. The C and CT registers are not accessible to moves. It is illegal to use the C or CT register address as the starting address for a move instruction.
$T$ (Type) and $S$ (Size) give the format of the numbers
T $0=$ integer $\quad 1=$ floating point
$S 0=32$ bits $\quad 1=64$ bits
Note: Setting TS $=01$ is reserved
Word count is the number of operands to be moved (n). A count of 0 will move 256 items 1 or 2 32-bit words long. The beginning register address is stored in the register field, and the beginning memory address is the start address field (bits 15-0).

An indirect move is designated by selecting MCADDR as the address. The M bit will be set low, and the 16 low-order bits are then disregarded. The starting address in memory comes from the MCADDR register.

To move data to the LAD bus, 'LAD' is selected as the address. The $L$ bit will be set high, and the low-order 16 bits are set to 0 . An address of LAD_A will write data to the LAD bus with an $\overline{\text { ALTCH }}$ strobe (instead of the normal $\overline{W E})$. The $A$ bit will be set high in the instruction word.

An address of 'COINT' will write data to the LAD bus and set COINT low for the cycles the store is executing. (C will be set high in the instruction word.) An address of 'COINT_A' will store to the LAD bus with COINT enabled and an $\overline{\text { ALTCH }}$ strobe (instead of the normal $\overline{W E}$ ). $C$ and $A$ will be set high in the instruction word. The COINT and COINT_A options are only valid for host-independent mode.

RA9-RA0
RB9-RB0
STATUS, CONFIG, COUNTX, COUNTY
VECTOR, MCADDR, SUBADD0, SUBADD1, IRAREG, LOOPCT

| Types for reg | f (single-precision floating-point) <br> d (double-precision floating-point) <br> $i$ (signed integer) <br> u (unsigned integer) |
| :---: | :---: |
| Destination addresses | $\begin{aligned} & \text { 0x0-0xFFFF } \\ & \text { MCADDR, LAD, COINT, COINT_A, LAD_A } \end{aligned}$ |
| Range for count | 0-31 |
| Example | st RAO.f, MCADDR, 3 st RB1.i, LAD, 5 |


| Syntax | sub ra.[modifier]type, rb.[modifier]type, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | $\mathrm{ra}-\mathrm{rb} \rightarrow \mathrm{rd}$ |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 |  | 29 |  | 28 | 27 |  | 2423 | 2019 |  | 15 |
|  | 0 | 0 |  |  |  | h |  | ra |  | rb |  |  |
|  | $14 \quad 11$ | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |  |
|  | sel_op | 0 | t | pa | pb | 0 | 0 | va | vb | vy | 0 |  |
| Description | This instruction places the difference in the values in ra and rb in rd. |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RA0 <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Sources for rb | RB9-RB0 <br> C or CT Register ALUFB (ALU feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra and rb | f (single-precision floating-point) d (double-precision floating-point) <br> $i$ (signed integer) <br> u (unsigned integer) |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra and rb | $v$ (abslute value, not valid for integer types) |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | ```RA-9RAO RB9-RB0 C or CT STATUS, CONFIG,COUNTX, COUNTY VECTOR, MCADDR, SUBADD0, SUBADD1, IRAREG, LOOPCT``` |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | $v$ (absolute value, not valid for integer types) <br> $e$ (send output to LAD bus, WE strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |
| Example | sub LAD.vd, ONE.vf, RAO.h |  |  |  |  |  |  |  |  |  |  |  |



| Syntax | utod ra, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | ra (unsigned integer) $\rightarrow$ rd (double-precision) |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 | 29 | 28 | 27 | 24 | 23 | 22 | 21 | 20 | 19 | 15 |
|  | 0 | 0 | e | h | ra |  | 0 | 0 | 0 | 0 |  | rd |
|  | 14 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | sel_o | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| Description | This instruction converts an unsigned integer value in ra to a double-precision floating-point format and places the result in rd. |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RA0 <br> C or CT Register MULFB (Multiplier feedback) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra | type is implicit in the opcode |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra | none |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | ```RA9-RAO RB9-RB0 C or CT STATUS, CONFIG, COUNTX, COUNTY VECTOR, MCADDR, SUBADDO, SUBADD1, IRAREG, LOOPCT``` |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | $e$ (send output to LAD bus, $\overline{\text { WE }}$ strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |
| Example | utod MULFB, CT.e |  |  |  |  |  |  |  |  |  |  |  |


| Syntax | utof ra, rd[.modifier] |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Execution | ra (unsigned integer) $\rightarrow$ rd (single-precision) |  |  |  |  |  |  |  |  |  |  |  |  |
| Instruction Words | 31 | 30 | 29 | 28 | 27 |  | 24 | 23 | 22 | 21 | 20 | 19 | 15 |
|  | 0 | 0 | e | h |  | ra |  | 0 | 0 | 0 | 0 |  | rd |
|  | 14 | 10 | 9 | 8 | 7 |  | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|  | sel_ | 0 | 0 | 0 | 0 |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| Description | This instruction converts an unsigned integer in ra to single-precision floating-point format and places it in rd. |  |  |  |  |  |  |  |  |  |  |  |  |
| Sources for ra | RA9-RA0 <br> C or CT Register MULFB (Multiplier feedback) LAD (Immediate data from LAD bus) ONE (the value one) |  |  |  |  |  |  |  |  |  |  |  |  |
| Types for ra | type is implicit in the opcode |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for ra | none |  |  |  |  |  |  |  |  |  |  |  |  |
| Destinations for rd | ```RA9-RA0 RB9-RB0 C or CT STATUS, CONFIG, COUNTX, COUNTY VECTOR, MCADDR, SUBADD0, SUBADD1, IRAREG, LOOPCT``` |  |  |  |  |  |  |  |  |  |  |  |  |
| Modifiers for rd | e (send output to LAD bus, $\overline{W E}$ strobe) <br> h (send output to LAD bus, $\overline{\text { ALTCH }}$ strobe) |  |  |  |  |  |  |  |  |  |  |  |  |
| Example | utof LAD, RA9.e |  |  |  |  |  |  |  |  |  |  |  |  |







## Appendix A

## System Design Considerations

Using high-performance CMOS logic devices, such as the TMS34082, requires careful attention to high-speed logic design and PWB design practices. A few simple design techniques can reduce check-out time during the development phase and, more importantly, improve system reliability as your product enters production. The following sections are general recommendations to reduce your chances of intermittent problems.

## A. 1 Logic Design

Check to make sure that the drive capability of each TMS34082 output driver is not exceeded, particularly with the clock drivers. This can affect the output signal quality as well as driver supply demands.

When operating in coprocessor mode, do not use buffers on the following signals between the TMS34020 and TMS34082 (unless a critical path timing analysis between the two devices has been completed):

LCLK1 and LCLK2 (local clocks)
$\square \overline{\text { ALTCH }}$ (address latch)
■ $\overline{\mathrm{CAS}}$ (column address strobe)

- SF (special function)

Figure A-1 shows how $\overline{R A S}$ and $\overline{\text { CAS }}$ buffers can be added for DRAMNRAM memory. These buffers effectively isolate the DRAM/NRAM devices from the TMS34020.

Figure A-1. Example of Using $\overline{R A S}$ and $\overline{C A S}$ Buffers in Coprocessor Mode


## A. 2 Bypass Capacitors

The TMS34082 is a high-speed CMOS device containing two 32-bit data buses and one 16 -bit address bus. As a result, a constant voltage source must be maintained for the device during signal transitions. The TMS34082 contains $10 \mathrm{~V}_{\mathrm{CC}}$ pins and 14 GND pins for internal power requirements.

External bypass capacitors must also be used for decoupling the switching transitions. Use two or more $0.1-\mu \mathrm{F}$ low-leakage high-quality capacitors around the perimeter of the TMS34082 package or under the device. Place the capacitors as close to the TMS34082 as possible. These are used to filter out unwanted switching noise caused by the CMOS output drivers, one of the major sources of noise. Also, use one $470-\mathrm{pF}$ low-leakage high-quality capacitor to reduce the very high frequency noise (such as clock frequencies) and at least one $10-\mu \mathrm{F}$ solid tantalum filter capacitor to take care of low frequency noise (such as power supply surges). The $10-\mu \mathrm{F}$ filter capacitor smooths out voltage spikes during switching transitions. The capacitance values are approximate and should have a working voltage of at least 10 V . By using three capacitor sizes, three different frequency bands of noise are filtered as opposed to just one narrow band for one bypass capacitor size.

Figure A-2. Recommended Bypass Capacitor Placement


## A. 3 PWB Design

The TMS34082 should be designed into a PC board environment with an embedded $V_{C C}$ or GND plane. For any production high-speed logic board, power planes are an absolute necessity. Each $V_{C C}$ and GND pin on the TMS34082 must be connected to the appropriate supply pin. Use the shortest amount of PWB etch possible. This effectively forms a common reference point throughout the PC board as well as the device substrate.

As with most complex CMOS devices, extra care must be used when distributing CMOS logic over more than one GND plane. An example of this is when a TMS34020 is on one board and multiple TMS34082s (running in coprocessor mode) are located on a daughtercard. The common ground connection between the two power planes behaves like an inductor according to transmission line theory. The greater the current, the greater the inductance. Here, the solution is to use many GND connections and to make them as short as possible. In addition, even more bypass capacitors should be used.

When using a PGA socket, use gold-plated contacts where the TMS34082 pins mate into the socket to lower the inductance and resistance. A gold plating thickness of 10 microinches is sufficient.

## A. 4 Clock Routing

Clocks are the heart of a high-performance system, so a little extra care will pay off many times over. Many of these ideas not only apply to the TMS34082, but to most high-speed CMOS logic devices.
PC board layout must take into account transmission-line theory. It is generally accepted that any clock line over 7 inches long should be considered as a transmission line. Use a daisy-chained clock distribution system and avoid using a ' $T$ ' (where three lines of etch come into a common vertex) or stubs. Avoid the use of $90^{\circ}$ angles within the clock trace; use arcs or smooth lines instead, as shown in Figure A-3. This reduces the number of signal reflections within the clock trace.

Figure A-3. Recommended Clock Routing Techniques


When routing your PC board, route the clock signals first (they may even be hand routed). To help reduce cross talk and radiated RF interference, keep the length of clock interconnections as short as possible and place the majority of clock routing next to one of the $V_{C C}$ or $G N D$ power planes. Cross talk is where one signal gets coupled onto another signal; one trace behaves like a transmitter antenna and the other trace acts as a receiver.To further reduce cross talk, make certain that the clock trace does not run parallel to data or control lines for more than three inches if they are spaced within 100 mils of each other. Traces adjacent to the clock lines that are connected to GND also may be used.
Since many clock interconnections behave like transmission lines, impedance mismatches can generate reflections. From a time-domain point of view, these can result in ringing, undershoot, and overshoot. If the clock drivers generate excessive amounts of ringing and undershoot at their destinations, it will be necessary to put either an impedance matching termination network at the farthest signal point from the driver or a series resistor ( $22 \Omega$ to $39 \Omega$ ) between the clock driver output and the receiving input. Using a series resistor also slows down the signal response times slightly. The amount of undershoot or ringing may be difficult to predict before hand, but there are many good articles on transmission line theory for PC board design.

## A. 5 Thermal Considerations

Because the TMS34082 is implemented in CMOS, its power consumption requirements are low and generate little heat. You must make certain that the operating temperature of the surrounding environments is within TMS34082 operating specifications.

## Appendix B

## TMS34082A Data Sheet

The pinout, electrical specifications, timing diagrams, and mechanical specifications are contained within the TMS34082 Data Sheet and appear in this appendix.

- High-Performance Floating-Point RISC

Processor Optimized for Graphics

- Two Operating Modes
- Floating-Point Coprocessor for TMS34020 Graphics System Processor
- Independent Floating-Point Processor
- Direct Connection to TMS34020

Coprocessor Interface

- Direct Extension to the TMS34020 Instruction Set
- Multiple TMS34082A Capability
- Fast Pipeline Instruction Cycle Time
- TMS34082A-40 . . 50-ns Coprocessor Mode . . . 50-ns Host-Independent Mode
- TMS34082A-32 . . . 62.5-ns Coprocessor Mode . . . 60-ns Host-Independent Mode
- Sustained Data Transfer Rates of 160

MBytes/s (TMS34082-40)

- Sequencer Executes Internal or

User-Programmed Instructions

- 22 64-Bit Data Registers
- Comprehensive Floating-Point and Integer Instruction Set
- Internal Programs for Vector, Matrix, and 3-D Graphics Operations
- Full IEEE Standard 754-1985 Compatibility
- Addition, Subtraction, Multiplication, and Comparison
- Division and Square Root
- Selectable Data Formats
- 32-Bit Integer
- 32-Bit Single-Precision Floating-Point
- 64-Bit Double-Precision Floating-Point
- External Memory Addressing Capability
- Program Storage (up to 64K Words)
- Data Storage (up to 64 K Words)
- $0.8-\mu \mathrm{m}$ EPIC ${ }^{\text {T" }}$ CMOS Technology
- High-Performance
- Low Power (< 1.5 W )


## description

The TMS34082A is a high-speed graphics floating-point processor implemented in Texas Instruments advanced $0.8-\mu \mathrm{m}$ CMOS technology. The TMS34082A combines a 16 -bit sequencer and a 3 -operand (source A, source B, and destination) 64-bit Floating-Point Unit (FPU) with 22 64-bit data registers on a single chip. The data registers are organized into two files of ten registers each, with two registers for internal feedback. In addition, it provides an instruction register to control FPU execution, a status register to retain the most recent FPU status outputs, eight control registers, and a two-deep stack (see functional block diagram).

The TMS34082A is fully compatible with IEEE Standard 754-1985 for binary floating-point addition, subtraction, multiplication, division, square root, and comparison. Floating-point operands can be either in single- or double-precision IEEE format.

In addition to floating-point operations, the TMS34082A performs 32-bit integer arithmetic, logical comparisons, and shifts. Integer operations may be performed on 32-bit 2s complement or unsigned operands. Integer results are 32 -bits long (even for $32 \times 32$ integer multiplication). Absolute value conversions, floating-point to integer conversions, and integer to floating-point conversions are available.

The ALU and the multiplier are closely coupled and can be operated in parallel to perform sums of products or products of sums. During multiply/accumulate operations, both the ALU and the multiplier are active and the registers in the FPU core can be used to feedback products and accumulate sums without tying up locations in register files $A$ and $B$.

When used with the TMS34020, the TMS34082A operates in the coprocessor mode. The TMS34020 can control multiple TMS34082A coprocessors. When used as a stand-alone or with processors other than the TMS34020, the TMS34082A operates in the host-independent mode. The TMS34082A is fully programmable by the user and can interface to other processors or floating-point subsystems through its two 32-bit bidirectional buses. In

EPIC is a trademark of Texas Instruments Incorporated.
the coprocessor mode, the TMS340 family tools may be used to develop code for the TMS34082A. The TMS34082A software tool kit is used to develop code for host-independent mode applications or for external routines in the coprocessor mode.

## pin descriptions

Pin descriptions and grid assignments for the TMS34082A are given on the following pages. The pin at location D4 has been added for indexing purposes.


PIN GRID ASSIGNMENTS

| PIN |  | PIN |  | PIN |  | PIN |  | PIN |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| No. | name | No. | name | No. | NAME | No. | NAME | No. | NAME |
| A1 | NC | B15 | LAD27 | F1 | MSD10 | K15 | RDY | P2 | NC |
| A2 | LAD1 | C 1 | MSD4 | F2 | MSD9 | L1 | MSD18 | P3 | MSD29 |
| A3 | LAD3 | C2 | MSD3 | F3 | $V_{C C}$ | L2 | MSD21 | P4 | MSD31 |
| A4 | LAD5 | C3 | MSDO | F13 | CORDY | L3 | MSD23 | P5 | MSA1 |
| A5 | LAD8 | C4 | $V_{S S}$ | F14 | $\overline{\text { ALTCH }}$ | L13 | VSS | P6 | MSA3 |
| A6 | LAD9 | C5 | $V_{C C}$ | F15 | $\overline{\text { CAS }}$ | L14 | CIDO | P7 | MSA6 |
| A7 | LAD11 | C6 | LAD6 | G1 | MSD13 | L15 | CID2 | P8 | MSA8 |
| A8 | LAD12 | C7 | $\mathrm{V}_{\text {SS }}$ | G2 | MSD12 | M1 | MSD20 | P9 | MSA10 |
| A9 | LAD13 | C8 | $V_{C C}$ | G3 | MSD11 | M2 | MSD24 | P10 | MSA13 |
| A10 | LAD15 | C9 | VSS | G13 | WE | M3 | $\mathrm{V}_{\text {SS }}$ | P11 | MWR |
| A11 | LAD17 | C10 | $V_{C C}$ | G14 | EC1 | M13 | $V_{C C}$ | P12 | $\overline{\mathrm{MOE}}$ |
| A12 | LAD19 | C11 | LAD21 | G15 | ECO | M14 | LCLK1 | P13 | INTG |
| A13 | LAD22 | C12 | $V_{S S}$ | H1 | MSD14 | M15 | LCLK2 | P14 | BUSFLT |
| A14 | LAD24 | C13 | LAD25 | H2 | TDO | N1 | MSD22 | P15 | $\overline{\mathrm{RAS}}$ |
| A15 | NC | C14 | LAD26 | H3 | $v_{S S}$ | N2 | MSD26 | R1 | NC |
| B1 | MSD1 | C15 | LAD29 | H13 | $V_{S S}$ | N3 | $V_{C C}$ | R2 | MSD27 |
| B2 | NC | D1 | MSD6 | H14 | $\overline{\text { LOE }}$ | N4 | MSD28 | R3 | MSD30 |
| B3 | LADO | D2 | MSD5 | H15 | TDI | N5 | $V_{S S}$ | R4 | MSAO |
| B4 | LAD2 | D3 | MSD2 | J1 | MSD15 | N6 | $V_{C C}$ | R5 | MSA2 |
| B5 | LAD4 | D4 | NC | J2 | MSD16 | N7 | MSA5 | R6 | MSA4 |
| B6 | LAD7 | D13 | $V_{C C}$ | J3 | $V_{C C}$ | N8 | $V_{S S}$ | R7 | MSA7 |
| B7 | LAD10 | D14 | LAD28 | J13 | CC | N9 | $V_{C C}$ | R8 | TCK |
| B8 | TMS | D15 | LAD31 | J14 | MSTR | N10 | MSA14 | R9 | MSA9 |
| B9 | LAD14 | E1 | MSD8 | J15 | CLK | N11 | $V_{S S}$ | R10 | MSA11 |
| B10 | LAD16 | E2 | MSD7 | K1 | MSD17 | N12 | MAE | R11 | MSA12 |
| B11 | LAD18 | E3 | $V_{S S}$ | K2 | MSD19 | N13 | LRDY | R12 | MSA15 |
| B12 | LAD20 | E13 | $V_{S S}$ | K3 | VSS | N14 | SF | R13 | DS/CS |
| B13 | LAD23 | E14 | LAD30 | K13 | CID1 | N15 | RESET | R14 | $\overline{\mathrm{MCE}}$ |
| B14 | NC | E15 | $\overline{\text { COINT }}$ | K14 | INTR | P1 | MSD25 | R15 | NC |

logic symbol ${ }^{\dagger}$

$\dagger$ This symbol is in accordance with ANSIIIEEE Std 91-1984.

## functional block diagram



TERMINAL FUNCTIONS

| PIN |  |  | DESCRIPTION |
| :---: | :---: | :---: | :---: |
| NAME | NO. | 1/O |  |
| $\overline{\text { ALTCH }}$ | F14 | $\begin{gathered} 1 \\ {[0]} \end{gathered}$ | Address Latch, active low. In the coprocessor mode, falling edge of $\overline{\text { ALTCH }}$ latches instruction and status present on the LAD bidirectional bus (LAD31-0). In the host-independent mode, $\overline{\text { ALTCH }}$ is address output strobe for memory accesses on LAD31-0. |
| BUSFLT | P14 | 1 | Bus Fault. In the coprocessor mode, BUSFLT high indicates a data fault on the LAD bus (LAD31-0) during current bus cycle, which in turn causes TMS34082A not to capture current data on LAD bus. Tied low if not used or in the host-independent mode. |
| $\overline{\text { CAS }}$ | F15 | $\begin{gathered} 1 \\ {[0]} \end{gathered}$ | Column Address Strobe, active low. In the coprocessor mode, causes TMS34082A to latch LAD bus data when $\overline{C A S}$ has a low-to-high transition if LRDY was high and BUSFLT was low at the previous LCLK2 rising edge. In the host-independent mode, this signal is the read strobe output. |
| CC | J13 | 1 | Condition Code Input. In both modes, may be used as an external conditional input for branch conditions. |
| $\begin{aligned} & \text { CIDO } \\ & \text { CID1 } \\ & \text { CID2 } \end{aligned}$ | $\begin{aligned} & \text { L14 } \\ & \text { K13 } \\ & \text { L15 } \end{aligned}$ | 1 | Coprocessor ID. In the coprocessor mode, used to set a coprocessor ID so that a TMS34020 Graphics System Processor controlling multiple TMS34082A coprocessors can designate which coprocessor is being selected by the current instruction. Tied low in the host-independent mode. |
| CLK | $J 15$ | 1 | System Clock. In the coprocessor mode, tied low. In the host-independent mode, input is the system clock. |
| $\overline{\mathrm{COINT}}$ | E15 | 0 | Coprocessor Interrupt Request, active low. In the coprocessor mode, signals an exception not masked out in the configuration register. Remains low until the status register is read. In the host-independent mode, user programmable I/O when LADCFG is low. When LADCFG is high, designates bus cycle boundaries on LAD31-0. |
| CORDY | F13 | 0 | Coprocessor Ready. In the coprocessor mode, if the TMS34020 sends an instruction before the TMS34082A has completed a previous instruction, this signal goes low to indicate that the TMS34020 should wait. In the host-independent mode, user programmable. |
| DS/CS | $R 13$ | 0 | Data Space/Code Space. In both modes, when MEMCFG is low and DS/CS is low, selects program memory on MSD port. When MEMCFG is low and DS/CS is high, selects data memory on MSD port. When MEMCFG is high, DS/CS is memory chip select, active low. |
| $\begin{aligned} & \text { EC0 } \\ & \text { EC1 } \end{aligned}$ | $\begin{aligned} & \text { G15 } \\ & \text { G14 } \end{aligned}$ | 1 | Emulator Mode Control and Test. In both modes, tied high for normal operation. |
| INTG | P13 | 0 | Interrupt Grant Output. In the coprocessor mode, INTG is low. In the host-independent mode, this signal is set high to acknowledge an interrupt request input. |
| INTR | K14 | 1 | Interrupt Request Input, active low. In the coprocessor mode, INTR is tied high. In the host-independent mode, causes call to subroutine address in interrupt vector register. |

$\dagger$ The []'s denote the type of buffer utilized in the host-independent mode. If no [ ]'s appear, the buffer type is identical for both modes of operation.

TERMINAL FUNCTIONS (Continued)

| PIN |  |  | DESCRIPTION |
| :---: | :---: | :---: | :---: |
| NAME | NO. |  | DESCRIPTION |
| LAD0 | B3 | I/O | Local Address and Data Bus. In the coprocessor mode, used by TMS34020 to input instructions and data operands to TMS34082A, and used by TMS34082A to output results. In the host-independent mode, used by the TMS34082A for address output and data $/ / \mathrm{O}$. |
| LAD1 | A2 |  |  |
| LAD2 | B4 |  |  |
| LAD3 | A3 |  |  |
| LAD4 | B5 |  |  |
| LAD5 | A4 |  |  |
| LAD6 | C6 |  |  |
| LAD7 | B6 |  |  |
| LAD8 | A5 |  |  |
| LAD9 | A6 |  |  |
| LAD10 | B7 |  |  |
| LAD11 | A7 |  |  |
| LAD12 | A8 |  |  |
| LAD13 | A9 |  |  |
| LAD14 | B9 |  |  |
| LAD15 | A10 |  |  |
| LAD16 | B10 |  |  |
| LAD18 | B11 |  |  |
| LAD19 | A12 |  |  |
| LAD20 | B12 |  |  |
| LAD21 | C11 |  |  |
| LAD22 | A13 |  |  |
| LAD23 | B13 |  |  |
| LAD24 | A14 |  |  |
| LAD25 | C13 |  |  |
| LAD26 | C14 |  |  |
| LAD27 | B15 |  |  |
| LAD28 | D14 |  |  |
| LAD29 | C15 |  |  |
| LAD30 | E14 |  |  |
| LAD31 | D15 |  |  |
| LCLK1 LCLK2 | $\begin{aligned} & \text { M14 } \\ & \text { M15 } \end{aligned}$ | 1 | Local Clocks 1 and 2. In the coprocessor mode, two local clocks generated by the TMS34020,90 degrees out of phase, to provide timing inputs to TMS34082A. In the host-independent mode, tied low. |
| $\overline{\text { LOE }}$ | H14 | 1 | Local Bus Output Enable, active low. In both modes, enables the local bus (LAD31-0) to be driven at the proper times when low. In addition during the host-independent mode when LADCFG is low, does not affect $\overline{\mathrm{ALTCH}}, \overline{\mathrm{CAS}}, \overline{\mathrm{WE}}, \mathrm{CORDY}$, or $\overline{\mathrm{COINT}}$. When LADCFG is high, $\overline{\mathrm{ALTCH}}, \overline{\mathrm{COINT}}$, and CORDY are not disabled by $\overline{\mathrm{LOE}}$ high; $\overline{\mathrm{CAS}}$ and $\overline{\mathrm{WE}}$ are disabled. |
| LRDY | N13 | 1 | Local Bus Data Ready. In the coprocessor mode, when LRDY is high, indicates that data is available on LAD bus. When LRDY is low, indicates that the TMS34082A should not load data from LAD31-0 and may also be used in conjunction with BUSFLT. In the host-independent mode, when LRDY is low, the device is stalled until LRDY is set high again and tied high if not used. |
| $\overline{M A E}$ | N12 | 1 | Memory Address and Data Output Enable, active low. In both modes, with $\overline{\text { MAE }}$ low, the TMS34082A can output an address on MSA15-0 and data on MSD31-0. MAE high does not disable DS/CS, MCE, MWR, or MOE. |
| $\overline{\mathrm{MCE}}$ | R14 | 0 | Memory Chip Enable. In both modes, when MEMCFG low, active (low) indicates access to external memory on MSD31-0. When MEMCFG is high, MCE low is external code memory chip select. |
| $\overline{\text { MOE }}$ | P12 | 0 | Memory Output Enable, active low. In both modes when low, enables output from external memory on to MSD port. |

TERMINAL FUNCTIONS (Continued)

|  |  |  |  |
| :---: | :---: | :---: | :---: |
| NAME | NO. | 1/0 | DESCRIPTION |
| MSAO | R4 | 0 | Memory Address output. In both modes, addresses up to 64 K words of external program memory and/or up to 64 K words of data memory on the MSD port, depending on setting of DS/CS select. |
| MSA1 | P5 |  |  |
| MSA2 | R5 |  |  |
| MSA3 | P6 |  |  |
| MSA4 | R6 |  |  |
| MSA5 | N7 |  |  |
| MSA6 | P7 |  |  |
| MSA7 | R7 |  |  |
| MSA8 | P8 |  |  |
| MSA9 | R9 |  |  |
| MSA10 | P9 |  |  |
| MSA11 | R10 |  |  |
| MSA12 | R11 |  |  |
| MSA13 | P10 |  |  |
| MSA14 | N10 |  |  |
| MSA15 | R12 |  |  |
| MSD0 | C3 |  |  |
| MSD1 | B1 |  |  |
| MSD2 | D3 |  |  |
| MSD3 | C 2 |  |  |
| MSD4 | C1 |  |  |
| MSD5 | D2 |  |  |
| MSD6 | D1 |  |  |
| MSD7 | E2 |  |  |
| MSD8 | E1 |  |  |
| MSD9 | F2 |  |  |
| MSD10 | F1 |  |  |
| MSD11 | G3 |  |  |
| MSD12 | G2 |  |  |
| MSD13 | G1 |  |  |
| MSD14 | H1 |  |  |
| MSD15 | J1 |  | External Memory Data. In both modes, I/Os to external memory. Used to read from or write to external |
| MSD16 | J2 | 1/0 | data or program memory on the MSD port. |
| MSD17 | K1 |  |  |
| MSD18 | L1 |  |  |
| MSD19 | K2 |  |  |
| MSD20 | M1 |  |  |
| MSD21 | L2 |  |  |
| MSD22 | N1 |  |  |
| MSD23 | L3 |  |  |
| MSD24 | M2 |  |  |
| MSD25 | P1 |  |  |
| MSD26 | N2 |  |  |
| MSD27 | R2 |  |  |
| MSD28 | N4 |  |  |
| MSD29 | P3 |  |  |
| MSD30 | R3 |  |  |
| MSD31 | P4 |  |  |
| MSTR | J14 | I | Host-Independent/Coprocessor Mode Select. In the coprocessor mode, MSTR must be tied low to operate properly. In the host-independent mode, MSTR must be tied high to operate properly. |
| $\overline{M W R}$ | P11 | 0 | Memory Write Enable. In both modes, when low, data on MSD31-0 can be written to external program or data memory. |

## TERMINAL FUNCTIONS (Continued)

| PIN |  | l/O ${ }^{\text {+ }}$ | DESCRIPTION |
| :---: | :---: | :---: | :---: |
| NAME | NO. |  |  |
| NC | A1 |  | No Internal Connection. These pins should be left floating. |
|  | A15 |  |  |
|  | B2 |  |  |
|  | B14 |  |  |
|  | D4 |  |  |
|  | P2 |  |  |
|  | R1 |  |  |
|  | R15 |  |  |
| $\overline{\text { RAS }}$ | P15 | 1 | Row Address Strobe, active low. In the coprocessor mode, $\overline{\mathrm{RAS}}$ is high during all of coprocessor instruction cycle. In the host-independent mode, it is not used. |
| RDY | K15 | 1 | Ready. In both modes, when RDY is low, it causes a nondestructive stall of sequencer and floating-point operations. All internal registers and status in the FPU core are preserved. Also, no output lines will change state. |
| RESET | N15 | 1 | Reset, active low. In both modes, resets sequencer output and clears pipeline registers, internal states, status, and exception disable registers in FPU core. Other registers are unaffected. |
| SF | N14 | 1 | Special Function Input. In the coprocessor mode when SF is high, indicates the LAD bus input is an instruction or data from TMS34020 registers. When SF is low, indicates the LAD input is a data operand from memory. In the host-indèpendent mode, not used. |
| TCK | R8 | 1 | Test Clock for JTAG four-wire boundary scan. In both modes, TCK is low for normal operation. |
| TDI | H15 | 1 | Test Data Input for JTAG four-wire boundary scan. In both modes, TDI may be left floating. |
| TDO | H2 | 0 | Test Data Output for JTAG four-wire boundary scan |
| TMS | B8 | 1 | Test Mode Select for JTAG four-wire boundary scan. In both modes, TMS may be left floating. |
| $V_{C C}$ | C5 | 1 | 5-V Power Supply. All pins must be connected and used. |
|  | C8 |  |  |
|  | C10 |  |  |
|  | D13 |  |  |
|  | F3 |  |  |
|  | J3 |  |  |
|  | M13 |  |  |
|  | N3 |  |  |
|  | N6 |  |  |
|  | N9 |  |  |
| $V_{\text {SS }}$ | C4 | 1 | Ground Pins. All pins must be connected and used. |
|  | C7 |  |  |
|  | C9 |  |  |
|  | C12 |  |  |
|  | E3 |  |  |
|  | E13 |  |  |
|  | H3 |  |  |
|  | H13 |  |  |
|  | K3 |  |  |
|  | L13 |  |  |
|  | M3 |  |  |
|  | N5 |  |  |
|  | N8 |  |  |
|  | N11 |  |  |
| WE | G13 | $\begin{gathered} 1 \\ {[\mathrm{O}]} \\ \hline \end{gathered}$ | Write Enable, active low. In the coprocessor mode, the write strobe from the TMS34020 to enable a write to or from the TMS34082A LAD bus. In the host-independent mode, the TMS34082A write strobe output. |

[^0]D3150, SEPTEMBER 1988 - REVISED MAY 1991 - SCGS001

## data flow

The TMS34082A has two bidirectional 32-bit buses, LAD31-0 and MSD31-0. Each bus can be used to pass instructions and data operands to the FPU core and to output results. A separate 16-bit bus, MSA15-0, provides memory addressing capability on the MSD bus.

When the TMS34082A is used as a coprocessor for the TMS34020 Graphics System Processor (GSP), data for the TMS34082A can be transferred through the 32-bit bidirectional data bus (LAD31-0) and may be passed to any internal registers or to external memory on the memory expansion interface (MSD31-0). When the TMS34082A is used as a standalone FPU, it can use both the LAD bus (LAD31-0) and the MSD bus (MSD31-0) to interface with external data memory or system buses.
In the host-independent mode, the TMS34082A can be operated with the LAD bus as its single data bus and the MSD bus as the instruction source, or with data storage on either port and the program memory on the MSD bus.
The data space/code space (DS/CS) output can be used to control access either to data memory or program memory on the MSD port. Up to 64 K words of code space and 64 K words of data space are directly supported. In the coprocessor mode, both instructions and data are transferred on the LAD bus with the option of accessing external user-generated programs on the MSD port.
One 32 -bit operand can be input to the data registers each clock cycle. A 64-bit double-precision floating-point operand is input in two cycles. Transfers to or from the data registers can normally be programmed as block moves, loading one or more sets of operands with a single move instruction to minimize I/O overhead. Several modes for moving operands and instructions are available. Block transfers up to 512 words between the LAD and MSD buses can be programmed in either direction.
To permit direct input to or output from the LAD bus in the host-independent mode, other options for controlling the LAD bus have been implemented. When two 32 -bit operands are being selected for input to the FPU core, one operand may be selected from LAD. On output from the FPU, a result may simultaneously be written to a register and to the LAD bus.
During initialization in the host-independent mode, a bootstrap loader can bring 65 32-bit words from the LAD bus and write them out to external program memory on the MSD bus, after which the device begins executing from the first memory location (zero). The first word is loaded into the configuration register. This option facilitates the initial loading of program memory on the MSD port upon power-up.

## architecture

Because the sequencer, control and data registers, and FPU core are closely coupled, the TMS34082A can execute a variety of complex floating-point or integer calculations rapidly, with a minimum of external data transfers. The internal architecture of the FPU core supports concurrent operation of the multiplier and the ALU, providing several options for storing or feeding back intermediate results. Also, several special registers are available to support specific calculations for graphics algorithms. Each of the main architectural elements of the TMS34082A is discussed below.

The control functions of the TMS34082A are provided by sequence control logic, register control logic, and bus interface control logic, together with user-programmed configuration settings stored in the configuration register. The on-board sequencer selects the next program execution address, either from internal code or from external program memory. Next-address sources include the program counter, stack, interrupt vector register, interrupt return register, or address register (for indirect jumps).
COUNTX, COUNTY, and MIN-MAX/LOOPCT registers are used for temporary storage by internal graphics routines. They may also serve as temporary storage for the user.

A separate FPU status register is provided, which can be used by test-and-branch instructions to control program execution. Because of the large number of status outputs, branches on status can be easily programmed. The status register contents are also important when dealing with status exceptions including such conditions as overflow, underflow, invalid operations (divide by zero), or illegal data formats such as infinity, Not a Number ( NaN ), or denormalized operands.

Register control logic permits all data and control registers to be accessed in accordance with applicable architectural restrictions. Register files $A$ and $B$ can be written to or read from the external buses, as can the control registers. Internal registers C and CT are embedded in the FPU core and can only be accessed by the FPU internal buses. The C and CT registers cannot be used as sources or destinations for MOVE instructions, and several registers (listed in Table 1) are not available as sources for FPU operations.

TABLE 1. INTERNAL REGISTERS

| REGISTER ADDRESS | REGISTER NAME | RESTRICTIONS ON USE |
| :---: | :---: | :---: |
| 00000 | RA0 |  |
| 00001 | RA1 |  |
| 00010 | RA2 |  |
| 00011 | RA3 |  |
| 00100 | RA4 |  |
| 00101 | RA5 |  |
| 00110 | RA6 |  |
| 00111 | RA7 |  |
| 01000 | RA8 |  |
| 01001 | RA9 |  |
| 01010 | $\mathrm{c}^{\dagger}$ | Not a source or destination for moves |
| 01011 | $\mathrm{CT}^{\dagger}$ | Not a source or destination for moves |
| 01100 | STATUS | Not a source for FPU instructions |
| 01101 | CONFIG | Not a source for FPU instructions |
| 01110 | COUNTX | Not a source for FPU instructions |
| 01111 | COUNTY | Not a source for FPU instructions |
| 10000 | RBO |  |
| 10001 | RB1 |  |
| 10010 | RB2 |  |
| 10011 | RB3 |  |
| 10100 | RB4 |  |
| 10101 | RB5 |  |
| 10110 | RB6 |  |
| 10111 | RB7 |  |
| 11000 | RB8 |  |
| 11001 | RB9 |  |
| 11010 | VECTOR | Not a source for FPU instructions |
| 11011 | MCADDR | Not a source for FPU instructions |
| 11100 | SUBADD0 | Not a source for FPU instructions |
| 11101 | SUBADD1 | Not a source for FPU instructions |
| 11110 | IRAREG | Not a source for FPU instructions |
| 11111 | MIN-MAX/LOOPCT | Not a source for FPU instructions |

[^1]
## TMS34082A

## GRAPHICS FLOATING-POINT PROCESSOR

## D3150, SEPTEMBER 1988 - REVISED MAY 1991 - SCGS001

## register files A and B, feedback registers C and CT

TMS34082A contains two register files, each with ten 64-bit registers and two 64-bit feedback registers. Most instructions will operate on one value from each of the RA and RB register files and return the result to either the RA or RB files or one of the feedback registers.

When the ONEFILE control bit is high in the configuration register, data written to a register in file RA is simultaneously written to the corresponding location in file RB. In this mode, the two register files act as a ten-word, two-read/one-write register file.



FEEDBACK REGISTERS


FIGURE 1. DATA REGISTERS
Two 64-bit feedback registers, $C$ and CT, are embedded in the FPU core. FPU instructions may use the feedback registers as one of the operands, but the registers cannot be accessed for external moves. The C and CT registers can be used as either the A or B operand, but both cannot be used as operands during the same instruction. However, C (or CT) may be used for more than one operand in the same instruction. For example, $C+C T$ is not a valid instruction, but $C+C$ is.

The CT feedback register is used in integer divide operations as a temporary holding register. Any data stored in CT will be lost during an integer divide.

## internal control/status register definitions

## configuration register definition

The configuration register (CONFIG) is a special 32-bit register that the user loads to configure the TMS34082A for exception handling, IEEE mode (vs. fast mode), rounding modes, and data-fetch operations. The configuration register is initialized to 'FFE00420' hex.

TABLE 2. CONFIGURATION REGISTER DEFINITION

| BIT NO. | NAME | DESCRIPTION |
| :---: | :---: | :---: |
| 31 | MIVAL | Multiplier invalid operation (I) exception mask. Initialized to 1 (enabled). |
| 30 | MOVER | Multiplier overfiow ( V ) exception mask. Initialized to 1 (enabled). |
| 29 | MUNDER | Multiplier underflow (U) exception mask. Initialized to 1 (enabled). |
| 28 | MINEX | Multiplier inexact ( $X$ ) exception mask. Initialized to 1 (enabled). |
| 27 | MDIVO | Divide by zero (DIVO) exception mask. Initialized to 1 (enabled). |
| 26 | MDENORM | Multiplier denormal (DENORM) exception mask. Initialized to 1 (enabled). |
| 25 | AIVAL | ALU invalid operation (I) exception mask. Initialized to 1 (enabled). |
| 24 | AOVER | ALU overflow ( $V$ ) exception mask. Initialized to 1 (enabled). |
| 23 | AUNDER | ALU underflow (U) exception mask. Initialized to 1 (enabled). |
| 22 | AINEX | $A L U$ inexact ( $X$ ) exception mask. Initialized to 1 (enabled). |
| 21 | ADENORM | ALU denormal (DENORM) exception mask. Initialized to 1 (enabled). |
| 11-20 | N/A | Reserved, set to all Os. |
| 10 | REVISION | Revision number, read only. Set to 1. |
| 9 | LADCFG | When low, $\overline{\mathrm{CAS}}, \overline{\mathrm{WE}}, \mathrm{CORDY}, \overline{\mathrm{COINT}}$, and $\overline{\mathrm{ALTCH}}$ are active signals not affected by $\overline{\mathrm{LOE}}$. When high, $\overline{\mathrm{LOE}}$ high places $\overline{C A S}$ and $\overline{W E}$ in high impedance, as well as the LAD bus. $\overline{C O I N T}$, which defines the LAD cycle boundaries, is controlled by bit 1 of the LAD move instruction instead of the set mask instruction. $\overline{\text { COINT }}$ will remain high unless a LAD move instruction (with bit 1 high) is in progress. The setting of this bit has no effect in the coprocessor mode. Initialized to 0. |
| 8 | MEMCFG | When high, MCE becomes code space chip enable and DS/CS becomes data space chip enable (eliminates need for external inverter). When low, $\overline{M C E}$ is chip select for external code and data space. DS/ $\overline{C S}$ functions as an address bit which selects code space (when low) or data space (when high). Initialized to 0. |
| 7 | N/A | Reserved for later use. Initialized to 0 . Must be loaded with 0 . |
| 6 | ONEFILE | When high, causes simultaneous write to both register files (for example, to both RA0 and RB0 at once). The register files act as a single two-read, one-write register file. Initialized to 0. |
| 5 | PIPES2 | When high, makes FPU output registers transparent. When low, registers are enabled. Initialized to 1. |
| 4 | PIPES1 | When high, makes FPU internal pipeline registers transparent. When low, registers are enabled. Initialized to 0. |
| 3 | FAST | When high, fast mode is selected (all denormalized inputs and outputs are 0 ). When low, IEEE mode is selected. Initialized to 0 . |
| 2 | LOAD | Load order. $0=$ MSH, then LSH; $1=$ LSH, then MSH. Initialized to 0 . |
| 1 | RND1 | Rounding mode select 1 . Initialized to 0 . |
| 0 | RND0 | Rounding mode select 0 . Initialized to 0. |

LSH denotes least-significant half of a 64-bit word, MSH denotes most-significant half of a 64-bit word.
The mask bits serve as exception detect enables for the exception masks listed above. Setting the bit high (logic ' 1 ') enables the detection of the specific exception. When an enabled exception occurs, the ED bit in the status register will be set high and can be used to generate interrupts. The fast bit allows the TMS34082A to control the handling of denormalized numbers. When the fast bit is set high, all denormalized numbers input to the device are flushed to zero, and all denormalized results are also flushed to zero (this is also called 'sudden underflow'). When the fast bit is low, IEEE mode is selected. Denormalized numbers may be generated by (or input to) the ALU. Denormalized numbers must first be wrapped before being used as operands for multiply or divide instructions.

The LOAD bit defines the expected order of double-precision operands. Atreset, this bit will default to 0 indicating that the most significant 32 bits are transferred first. If the bit is set to a 1 , then the expected order of 64 -bit data transfers starts with the least significant 32 bits.
The RNDO and RND1 bits select the IEEE rounding mode, as shown in Table 3.

## GRAPHICS FLOATING-POINT PROCESSOR

TABLE 3. ROUNDING MODE

| RND1 - RND0 | ROUNDING MODES |
| :---: | :--- |
| 00 | Round towards nearest |
| 01 | Round toward zero (truncated) |
| 10 | Round towards infinity (round up) |
| 11 | Round towards negative infinity (round down) |

## status register definition

The floating-point status register (STATUS) is a 32-bit register used for reporting the exceptions that occur during TMS34082A operations and status codes set by the results of implicit and explicit compare operations. The status register is cleared upon reset, except for the INTENED flag, which is set to 1 in the coprocessor mode.

TABLE 4. STATUS REGISTER DEFINITION

| BIT NO. | NAME | DESCRPTION |
| :---: | :---: | :---: |
| 31 | N | Sign bit ( A < B fiag for compare) |
| 30 | GT | $A>B$ (valid on compare) |
| 29 | Z | Zero flag ( $A=B$ for compare) |
| 28 | V | IEEE overflow flag. The result is greater than the largest allowable value for the specified format. |
| 27 | 1 | IEEE invalid operation flag. A NaN has been input to the multiplier or the ALU, or an invalid operation [(0 * 1) or $(\infty-\infty)$ or $(-\infty+\infty)$ ] has been requested. This signal also goes high if an operation involves the square root of a negative number. When IVAL hoes high, the STX pins indicate which port had the NaN . |
| 26 | U | IEEE underflow flag. The result is inexact and less than the minimum allowable value for the specified format. In fast mode, this condition causes the result to go to zero. |
| 25 | X | IEEE inexact flag. The result of an operation is inexact. |
| 24 | DIVO | Divide by zero. An invalid operation involving a zero divisor has been detected by the multiplier. |
| 23 | RND | The mantissa of a number has been increased in magnitude by rounding. If the number generated was wrapped, then the 'unwrap rounded' instruction must be used to properly unwrap the wrapped number. |
| 22 | DENIN | Input to the multiplier is a denormalized number. When DENIN goes high, the STX pins indicate which port has the denormal input. |
| 21 | DENORM | The multiplier output is wrapped number or the ALU output is a denormalized number. In fast mode, this condition causes the result to go to zero. It also indicates an invalid integer operation with a negative unsigned integer result. |
| 20 | STX1 | A NaN or a denormalized number has been input on the A port. |
| 19 | STX0 | A NaN or a denormalized number has been input on the B port. |
| 18 | ED | Exception detect status signal representing logical OR of all enabled exceptions in the configuration register. |
| 17 | UNORD | The two inputs of a comparison operation are unordered, i.e.; one or both of the inputs is an NaN . |
| 16 | INTFLG | Software interrupt flag. Set by external code to signal a software interrupt. |
| 15 | INTENHW | Hardware interrupt (INTR) enable, active high (initialized to zero) |
| 14 | NXOROV | $N$ (negative) XOR V (overflow) |
| 13 | VANDZB | $V$ (overflow) AND $\bar{Z}$ (NOT zero) |
| 12 | INTENED | ED interrrupt enable, active high (initialized to zero in the host-independent mode, one in the coprocessor mode) |
| 11 | INTENSW | Software interrupt (INTFLG) enable, active high (initialized to zero) |
| 10 | ZGT | $\mathrm{Zn}>\mathrm{Zmax}$ (valid for 2-D MIN-MAX instruction) |
| 9 | ZLT | $\mathrm{Zn}<\mathrm{Zmin}$ (valid for 2-D MIN-MAX instruction) |
| 8 | YGT | $\mathrm{Y} \mathrm{n}>\mathrm{Ymax}$ (valid for 1-D or 2-D MIN-MAX instruction) |
| 7 | YLT | Yn < Ymin (valid for 1-D or 2-D MIN-MAX instruction) |
| 6 | XGT | $X \mathrm{X}>\times \mathrm{max}$ (valid for 1-D or 2-D MIN-MAX instruction) |
| 5 | XLT | X n < X min (valid for 1-D or 2-D MIN-MAX instruction) |
| 4 | HINT | Hardware interrupt flag |
| 3-0 | N/A | Reserved |

indirect address register (MCADDR) definition
The indirect address register (MCADDR) can be set to point to a memory location for indirect move or jump operations through the MSD port. MCADDR is cleared upon reset.

| $\mathbf{3 1}$ | $\mathbf{X}$ | X | X | X | X | X | X | X | X | X | X | X | X | X | X | V | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |

FIGURE 2. INDIRECT ADDRESS DEFINITION
The function of bit 16 varies, depending on whether the instruction is a MOVE or JUMP. During a MOVE instruction, bit 16 selects data space when set high, or code space when low. During a JUMP instruction, bit 16 selects an internal instruction when set high, or an external instruction when low.

## stack registers (SUBADD1-SUBADDO) definition

The stack contains two subroutine return address registers, SUBADDO and SUBADD1, which serves as a two-deep LIFO (last-in, first-out) stack. A subroutine jump causes the program counter to be pushed onto the stack, and a return from subroutine pops the last address pushed on the stack. More than two pushes will overwrite the contents of SUBADD1.

Bit 31 (Pointer) is set high in the stack location that was written last and reset to zero in the other stack location. Setting bit 30 (Enable) high enables a write into bit 31 (set or reset the pointer) in either stack location. If bit 31 is zero in both SUBADD0 and SUBADD1 (as when the stack has been saved externally and later restored), SUBADDO can be designated as top of stack by setting bit 31 . The stack pointers (bit 31) are cleared upon reset.
Bit 16 (I) is set high when the address in a stack location points to an internal routine, or set low when the address is for an external instruction.


FIGURE 3. STACK DEFINITION

## interrupt vector register (VECTOR) definition

The interrupt vector register (VECTOR) serves as a pointer to an external program to be executed upon receipt of an interrupt. Bit 16 (I) is always set low to point to a routine in external code space. The interrupt vector is cleared on reset.


## FIGURE 4. INTERRUPT VECTOR DEFINITION

## interrupt return register (IRAREG) definition

The interrupt return register (IRAREG) retains a copy of the program counter at the time of an external interrupt. This address is used as the next execution address upon returning from the interrupt. Bit 16 (I) is set high when the address in the stack location points to an internal instruction, or set low when the address is for an external instruction. This register is not affected by the reset signal.


FIGURE 5. INTERRUPT RETURN DEFINITION

## COUNTX and COUNTY registers definition

The counter registers (COUNTX, COUNTY) are used to store the current counts of the minimum and maximum values when executing MIN-MAX instructions. COUNTX and COUNTY are cleared on reset.

| 31 | 16 | 0 |
| :--- | :--- | :--- | :--- |
| COUNT FOR MAX VALUE | COUNT FOR MIN VALUE |  |

FIGURE 6. COUNTY AND COUNTX REGISTER DEFINITION
The COUNTX register is updated on both the 1-D and 2-D MIN-MAX instruction such that the count of the current minimum value is in the lower 16 bits of the register and the count of the current maximum value is in the upper 16 bits. The COUNTY register is used only in the 2-D MIN-MAX instruction to keep track of the counts of the minimum and maximum for the second value of a pair. The COUNTX and COUNTY registers may also be used for temporary storage when not using the MIN-MAX instructions.

## MIN-MAX/LOOPCT register

The MIN-MAX/LOOPCT register stores the current values of two separate counters. The LSH contains the current loop counter, and the MSH is used to hold the current minimum or maximum value of a MIN-MAX operation. The MIN-MAX/LOOPCT register is cleared upon reset. The MIN-MAX/LOOPCT register may also be used for temporary storage when not using the MIN-MAX instructions.

FIGURE 7. MIN-MAX/LOOPCT REGISTER DEFINITION

## FPU core

The FPU core itself consists of a multiplier and an ALU, each with an intermediate pipeline register and an output register (see Figure 8, FPU core functional block diagram). Four multiplexers select the multiplier and ALU operands from the data registers, feedback registers, or previous multiplier or ALU result. Results are directed either to the internal feedback registers ( C or CT ), the 20 data registers in register files RA and RB, or the ten other miscellaneous registers.

Both the internal pipeline registers and the output registers can be enabled or made transparent (disabled) by setting the PIPES2-PIPES1 bits in the configuration register. When the device is powered up, the default settings of the internal registers are PIPES2 high (output registers transparent) and PIPES1 low (internal pipeline registers enabled).
When the FPU core is used for chained operations, the multiplier and ALU operate in parallel. Two data inputs are provided from the RA and RB input registers, while multiplier and ALU feedback are used as the other two operands. While in the chained mode, the output registers of the FPU must be enabled to latch feedback operands. The appropriate registers must be enabled by setting the PIPES2-PIPES1 controls in the configuration register at the beginning of chained operations, and the PIPES2-PIPES1 control should then be reinitialized upon termination.
Fully pipelined operation (both pipeline and output registers enabled) affects timing when writing results back to the RA and RB register files. To adjust writeback timing, it is possible to issue the NOP (no operation) instruction to the FPU core when the results are to be retained in the output registers for one or more additional cycles. The NOP instruction is only effective when the output registers are enabled, as each NOP causes the output register contents to be retained for one additional cycle.

SCGS001 - D3150, SEPTEMBER 1988 - REVISED MAY 1991


FIGURE 8. FPU CORE FUNCTIONAL BLOCK DIAGRAM

## TMS34082A operating modes

The TMS34082A can operate as a stand-alone floating-point processor or a graphics coprocessor to the TMS34020 Graphics System Processor. Control of FPU operation is provided either from external program memory or from the TMS34020. External instructions are addressed by address lines MSA15-0 and are input on MSD31-0. TMS34020 instructions are input on LAD31-0.
Both the MSD and LAD buses can be used for data transfers as well. Combinations of control signals distinguish instruction fetches from data transfers. A single instruction may be used to transfer data and to perform an operation within the FPU.
The TMS34082A supports external code and data storage with the memory expansion interface, MSD31-0. Up to 64K 32 -bit data operands and 64 K instructions may be added externally to the TMS34082A. The signal DS/CS controls whether data space or code space is being accessed, and read/write control is provided with the chip enable ( $\overline{\mathrm{MCE}}$ ), output enable ( $\overline{\mathrm{MOE}}$ ), address enable ( $\overline{\mathrm{MAE}}$ ), write enable ( $\overline{\mathrm{MWR}}$ ), and address lines (MSA15-0).
The TMS34082A also provides instructions that allow the TMS34020 to read/write directly from/to external memory. The external code support permits full utilization of the TMS34082A features and instruction set.

## coprocessor-mode operation

Operation in the coprocessor mode assumes MSTR is low. In this mode, the TMS34082A acts as a closely coupled coprocessor to the TMS34020. The interface between the two devices consists of direct connections between pins. More than one coprocessor may be connected to the TMS34020 by setting the appropriate coprocessor ID (CID2-CID0). Up to four coprocessors executing in parallel may be used with a single TMS34020.
In the coprocessor mode, clock signals are provided by LCLK1 and LCLK2 from the TMS34020. Internally, the FPU generates a rising clock edge from each LCLK1 edge (rising or falling). Thus, the TMS34082A actually operates at twice the LCLK1 input clock frequency.

## initialization (coprocessor mode)

On reset, the TMS34082A clears all pipeline registers and internal states. The configuration register and status register return to their initialization values. When $\overline{\text { RESET }}$ returns high in the coprocessor mode, the TMS34082A is in an idle state waiting for the next instruction from the TMS34020.

## LAD bus control (coprocessor mode)

Both data and instructions are transferred over the bidirectional LAD bus in the coprocessor mode. A unique combination of signal inputs distinguishes an instruction from data. SF, $\overline{A L T C H}, \overline{\mathrm{CAS}}, \overline{\mathrm{RAS}}$, and $\overline{\mathrm{WE}}$ are used to designate coprocessor functions from other operations on the LAD bus.
Data may be transferred to or from TMS34020 registers or memory via LAD31-0. Transfers between the LAD and MSD buses can also be programmed. A single coprocessor instruction may be used to transfer data to the TMS34082A and then perform an FPU operation.

## MSD bus control (coprocessor mode)

Use of the MSD bus in the coprocessor mode is optional. External memory on MSD31-0 can be used to store data, user-programmed subroutines, or both. Different combinations of control signals distinguish between data memory and code memory. Control signals for MSD and MSA buses operate the same in the host-independent and coprocessor modes.

## interrupt handling (coprocessor mode)

A software interrupt to the TMS34082A is generated by the set mask external instruction. When the interrupt is granted, the current program counter is stored in the interrupt return register, and a branch to the interrupt vector address is executed. Software interrupts may be disabled.

If the exception detect interrupt (ED) is enabled, a TMS34082A exception causes $\overline{\text { COINT }}$ to go low, signalling the exception to the TMS34020. This exception does not cause a branch to the interrupt vector. If its interrupts are enabled, the TMS34020 will branch to an interrupt vector to service the TMS34082A request. Interrupts are cleared by reading the TMS34082A status register.

## host-independent mode operation

Operation in the host-independent mode assumes MSTR high. The TMS34082A has several hardware control signals, as well as programmable features, which support system functions such as initialization, data transfer, or interrupts in the host-independent mode. CLK provides the input clock to the TMS34082A. Details of initialization, LAD and MSD bus interface control, and interrupt handling are provided in the following sections.

## initialization (host-independent mode)

To simplify initialization of external program memory, the TMS34082A provides a bootstrap loader to perform an initial program load of 64 instructions. Once invoked, the loader causes the TMS34082A to read 65 words from the LAD bus and write 64 words out to the external program memory on the MSD bus, beginning with location 0 . The first word read is used to initialize the configuration register.
This loader is invoked by first setting $\overline{\text { RESET }}$ low, and then $\overline{N T R}$ low. A separate timing diagram for using the bootstrap loader is provided (see Figure 34). INTR should be taken low after $\overline{\text { RESET }}$ is already low, as shown in the diagram. When the bootstrap loader is started, the FPU core is reset (internal states and status are cleared, but not data registers) and the stack pointer, program counter, and interrupt vector register are all set to zero.
$\overline{\text { RESET }}$ must be set high again before the loader operation can start (see Figure 34). Once the loader is active, an external interrupt (signalled by $\overline{\mathrm{NTR}}$ low) will not be granted until the load sequence is finished. However, $\overline{R E S E T}$ going low terminates the load sequence, regardless of whether the sequence is complete. When the load sequence is finished, the device begins program execution at external address 0.

## LAD bus control (host-independent mode)

Data transfer from the LAD bus (LAD31-0) is controlled primarily by output signals, $\overline{\mathrm{ALTCH}}, \overline{\mathrm{WE}}$, and $\overline{\mathrm{CAS}}$. $\overline{\mathrm{ALTCH}}$ is the address write strobe that signals an address is being output on the LAD bus. The $\overline{\mathrm{CAS}}$ signal is the read strobe, and $\overline{W E}$ is the write enable output to memory.
If a bidirectional FIFO is used instead of memory, $\overline{C A S}$ can be directly connected to the read clock and $\overline{W E}$ to the write clock. The CC input can be used to signal the TMS34082A when data is ready for input from the FIFO stack.

Data input on the LAD bus can be written to data registers, control registers, or passed through for output on the MSD bus. Alternatively, the LAD bus input can be selected directly as an FPU source operand without writing to a register.
An FPU result can be written to a data register and at the same time be passed out on the LAD bus. When this is done, the clock period may need to be extended up to 15 ns (TMS34082A-40) to allow for the propagation delay from the FPU core to the outputs.

Depending on the specific system implementation, transferring data to and from the LAD bus without intervening register operations may significantly improve throughput. In the host-independent mode, data moves to and from internal registers can be minimized at the cost of adjusting the clock period to assure integrity of FPU inputs to and output from the LAD bus.

POST OFFICE BOX 655303 • DALLAS, TEXAS 75265

## TMS34082A <br> GRAPHICS FLOATING-POINT PROCESSOR

D3150, SEPTEMBER 1988 - REVISED MAY 1991 - SCGS001

## MSD bus control (host-independent mode)

The MSD bus can be used to access either external data memory or external code memory, depending on the combination of control signals required. If the memory on the MSD port is shared with a host processor, the $\overline{\text { MAE }}$ and RDY signals can be used to prevent conflicts between the host and the TMS34082A. When memory on the MSD port is shared, the host processor can monitor the state of the TMS34082A memory chip enable (MCE) to determine when the TMS34082A is not accessing the memory.
Otherwise, the $\overline{M A E}$ signal may be tied low (if unused), and the TMS34082A can use $\overline{M O E}, \overline{M C E}, \overline{M W R}$, and DS/CS to control external memory operations into either data space or code space, as selected by DS/CS.

## interrupt handling (host-independent mode)

Interrupts to the TMS34082A can be signalled by setting the interrupt request input (INTR) low. INTR is associated with the vector in the interrupt vector register. Software interrupts are signalled by setting the software interrupt flag in the status register.

In the event of an FPU status exception in the host-independent mode, an interrupt is generated that causes a branch to an exception handler routine. The address of the exception handler is stored in the interrupt vector register by the user prior to execution of the FPU program. Interrupts may be disabled by setting the appropriate bits in the status register.
absolute maximum ratings over operating free-air temperature range (unless otherwise noted) $\dagger$
Supply voltage, $\mathrm{V}_{\mathrm{CC}}$ (see Note 1) ...................................................................... 6 V

Off-state output voltage range ....................................................................... 2 V to 6 V
Operating free-air temperature range $\ldots \ldots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .0^{\circ} \mathrm{C}$ to $70^{\circ} \mathrm{C}$
Storage temperature range ................................................................... $-10^{\circ} \mathrm{C}$ to $150^{\circ} \mathrm{C}$
$\dagger$ Stresses beyond those listed under "absolute maximum ratings" may cause permanent damage to the device. These are stress ratings only and functional operation of the device at these or any other conditions beyond those indicated under "recommended operating conditions" is not implied. Exposure to absolute-maximum-rated conditions for extended periods may affect device reliability.
NOTE 1: All voltage levels are with respect to ground ( $V_{S S}$ ).
recommended operating conditions

|  |  |  |  | MIN | NOM | MAX | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $V_{\text {CC }}$ | Supply voltage |  |  | 4.75 | 5 | 5.25 | V |
| $V_{\text {SS }}$ | Supply voltage (see Note 2) |  |  | 0 | 0 | 0 | V |
| $\mathrm{V}_{\mathrm{IH}}$ | High-level input voltage |  |  | 2 |  | cc+0.3 | V |
| $\mathrm{V}_{\text {IL }}$ | Low-level input voltage |  |  | -0.3 |  | 0.8 | V |
| ${ }^{\mathrm{I} \mathrm{OH}}$ | High-level output current |  |  |  |  | -8 | mA |
| 1 OL | Low-level output current |  |  |  |  | 8 | mA |
| ${ }_{\text {f clock }}$ | Clock frequency | Coprocessor mode | TMS34082A-32 |  |  | 8 | MHz |
|  |  | Coprocessor mode | TMS34082A-40 |  |  | 10 |  |
|  |  | Host-independent mode | TMS34082A-32 |  |  | 16.7 |  |
|  |  |  | TMS34082A-40 |  |  | 20 |  |
| TA | Operating free-air temperature |  |  | 0 |  | 70 | ${ }^{\circ} \mathrm{C}$ |

NOTE 2: In order to minimize noise on $V_{S S}$, care should be taken to provide a minimum-inductance path between the $V_{S S}$ pins and system ground.
electrical characteristics over recommended operating free-air temperature range (unless otherwise noted)

| PARAMETER |  |  | TEST CONDITIONS |  | MIN | TYP $\ddagger$ | MAX | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\mathrm{V}_{\mathrm{OH}}$ | High-level output voltage |  | $\mathrm{V}_{\text {CC }}=4.75 \mathrm{~V}$, | $1 \mathrm{OH}=-8 \mathrm{~mA}$ | 2.6 |  |  | V |
| $\mathrm{V}_{\mathrm{OL}}$ | Low-level output voltage |  | $\mathrm{V}_{\text {CC }}=4.75 \mathrm{~V}$, | $1 \mathrm{OL}=8 \mathrm{~mA}$ |  |  | 0.6 | V |
| 10 | High-impedance bidirectional pins output current |  | $\mathrm{V}_{\mathrm{CC}}=4.75 \mathrm{~V}$, | $\mathrm{V}_{\mathrm{O}}=2.8 \mathrm{~V}$ |  |  | 10 | $\mu \mathrm{A}$ |
|  |  |  | $\mathrm{V}_{\text {CC }}=4.75 \mathrm{~V}$, | $\mathrm{V}_{\mathrm{O}}=0.6 \mathrm{~V}$ |  |  | -10 |  |
| 11 | Input current |  | $\mathrm{V}_{1}=\mathrm{V}_{\text {SS }}$ to $\mathrm{V}_{\text {C }}$ |  |  |  | $\pm 5$ | $\mu \mathrm{A}$ |
| $\mathrm{ICC}^{\text {§ }}$ | Supply current | Dynamic | $V_{C C}=5.25 \mathrm{~V}$ |  |  |  | 300 | mA |
|  |  | Quiescent | $\mathrm{V}_{1}=\mathrm{V}_{\text {IL }}$ max or | $\mathrm{I}^{\mathrm{OH}}=1 \mathrm{OL}=0$ |  |  | 50 | mA |
|  |  |  | $\mathrm{V}_{1}=0.2 \mathrm{Vor} \mathrm{V}_{\text {c }}$ | $\mathrm{I}_{\mathrm{OH}}=\mathrm{I}_{\mathrm{OL}}=0$ |  |  | 50 |  |
| $\mathrm{C}_{i}$ | Input capacitance |  |  |  |  | 10 |  | pF |

$\ddagger$ All typical values are at $\mathrm{V}_{C C}=5 \mathrm{~V}$ and $\mathrm{T}_{\mathrm{A}}=25^{\circ} \mathrm{C}$.
§ICC is measured at maximum clock frequency. Inputs are presented with random logic highs and lows to assure the toggling of internal nodes.

## coprocessor mode (MSTR low)

switching characteristics over recommended ranges of supply voltage and operating free-air temperature (unless otherwise noted) $\dagger$
propagation delay times

| PARAMETER |  | FIGURE | TMS34082A-32 |  | TMS34082A-40 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $t_{p}$ (ATCL-CORV) | Propagation delay time, $\overline{\text { ALTCH }}$ low to CORDY valid |  | 11 |  | 40 |  | 35 | ns |
| tp(ATCH-LADV) | Propagation delay time, $\overline{\text { ALTCH }}$ high to LAD data valid | 16 |  | 35 |  | 30 |  |  |
| $\mathrm{t}_{\mathrm{p}}$ (CASL-LADV) | Propagation delay time, $\overline{\text { CAS }}$ low to LAD data valid | 14 |  | 30 |  | 25 |  |  |
| tp(CASH-LADZ) | Propagation delay time, $\overline{\text { CAS }}$ high to LAD disabled | 14 |  | 30 |  | 25 |  |  |
| $\mathrm{t}_{\mathrm{p}}($ LC1-DCSL)ML | Propagation delay time, LCLK1 $\uparrow$ or $\downarrow$ to DS/ $\overline{\mathrm{CS}}$ low with MEMCFG low | 17, 21, 23 |  | 21 |  | 18 |  |  |
| tp(LC1-DCSH)ML | Propagation delay time, LCLK1 $\uparrow$ or $\downarrow$ to DS/CS high with MEMCFG low | $\begin{aligned} & \hline 17,19,21, \\ & 23,24,26 \\ & \hline \end{aligned}$ |  | 21 |  | 18 |  |  |
| $\mathrm{t}_{\mathrm{p}}(\mathrm{LC} 1-\mathrm{DCSL}$ )MH | Propagation delay time, LCLK1 $\uparrow$ or $\downarrow$ to DS/CS low with MEMCFG high | $\begin{array}{\|c\|} \hline 18,20,22, \\ 25,27 \\ \hline \end{array}$ | 3 | 26 | 3 | 18 |  |  |
| $\mathrm{t}_{\mathrm{p}}(\mathrm{LC} 1-\mathrm{DCSH}) \mathrm{ML}$ | Propagation delay time, LCLK1 $\uparrow$ or $\downarrow$ to $\mathrm{DS} / \mathrm{CS}$ high with MEMCFG high | $\begin{gathered} \hline 18,20,22, \\ 25,27 \\ \hline \end{gathered}$ | 3 | 13 | 3 | 11 |  |  |
| tp(LC1-DCSH)ML | Propagation delay time, LCK1 $\uparrow$ or $\downarrow$ to MCE low | $\begin{aligned} & \hline 17-19, \\ & 21-27 \end{aligned}$ | 3 | 21 | 3 | 18 |  |  |
| $\mathrm{t}_{\mathrm{p}}($ LC1-DCSH)ML | Propagation delay time, LCLK1 $\uparrow$ or $\downarrow$ to $\overline{\text { MCE }}$ high with MEMCFG low | $\begin{gathered} 17,19,21, \\ 23 \end{gathered}$ | 3 | 23 | 3 | 18 |  |  |
| tp(LC1-MCEH)MH | Propagation delay time, LCLK1 $\uparrow$ or $\downarrow$ to $\overline{\text { MCE }}$ high with MEMCFG high | $\begin{array}{\|c} \hline 18,22,25, \\ 27 \end{array}$ | 3 | 13 |  | 11 |  |  |
| ${ }^{\text {tp }}$ (LC1-MOEL) | Propagation delay time, LCLK1 $\uparrow$ or $\downarrow$ to $\overline{\text { MOE }}$ low | $\begin{array}{\|c} \hline 17,18, \\ 21-23,26, \\ 27 \end{array}$ | 10 | 30 |  | 25 |  |  |
| tp(LC1-MOEH) | Propagation delay time, LCLK1 $\uparrow$ or $\downarrow$ to $\overline{\text { MOE }}$ high | $\begin{array}{\|c} \hline 17,18, \\ 21-23,26, \\ 27 \end{array}$ | 3 | 13 |  | 11 |  |  |
| ${ }^{\text {tp }}$ (LC1-MSDV) | Propagation delay time, LCLK1 $\uparrow$ or $\downarrow$ to MSA address valid | 17-27 |  | 20 |  | 18 |  |  |
| ${ }^{\text {tp }}$ (LC1-MSDV) | Propagation delay time, LCLK1 $\uparrow$ or $\downarrow$ to MSD data valid | $\begin{gathered} \hline 19,20-22, \\ 24,25 \end{gathered}$ |  | 38 |  | 36 |  |  |
| $t_{p(L C 1-M W R L) ~}^{\text {a }}$ | Propagation delay time, LCLK1 $\uparrow$ or $\downarrow$ to $\overline{\text { MWR }}$ low | $\begin{array}{\|c\|} \hline 19-22,24, \\ 25 \\ \hline \end{array}$ | 10 | 30 | 10 | 25 |  |  |
| ${ }^{\text {tp }}$ (LC1-MWRH) | Propagation delay time, LCLK $1 \uparrow$ or $\downarrow$ to $\overline{\text { MWR }}$ high | $\begin{gathered} 20-22,24, \\ 25 \end{gathered}$ | 3 | 13 | 3 | 11 |  |  |
| ${ }^{\text {tp }}$ (LC1H-COIL) | Propagation delay time, LCLK1 $\uparrow$ to $\overline{\text { COINT }}$ low | 12 |  | 23 |  | 15 |  |  |
| $\mathrm{t}_{\mathrm{p}}$ (LC1H-COIH) | Propagation delay time, LCLK1 $\uparrow$ to $\overline{\text { COINT }}$ high | 12 |  | 23 |  | 15 |  |  |
| tp(LC1H-LADV) | Propagation delay time, LCLK1 $\uparrow$ to LAD data valid | 16 |  | 28 |  | 23 |  |  |
| tp(MSDV-LADV) | Propagation delay time, MSD data valid to LAD data valid | 26, 27 |  | 30 |  | 25 |  |  |
| tp(RASH-LADXZ) | Propagation delay time, $\overline{\text { RAS }}$ high to LAD disabled | 16 |  | 30 |  | 25 |  |  |

$\dagger$ See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.

## coprocessor mode (MSTR low)

switching characteristics over recommended ranges of supply voltage and operating free-air temperature (unless otherwise noted) (continued) ${ }^{\dagger}$
enable and disable times

| PARAMETER |  | FIGURE | TMS34082A-32 |  | TMS34082A-40 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| ten(LOEL-LADZX) | Enable time, LOE low to LAD enabled |  | 16 | 3 | 15 | 3 | 14 | ns |
| $\mathrm{t}_{\text {en }}$ (MAEL-MSAZX) | Enable time, MAE low to MSA enabled | 21, 22 | 3 | 15 | 3 | 12 |  |  |
| ten(MAEL-MSDZX) | Enable time, MAE low to MSD enabled | 22 | 3 | 15 | 3 | 12 |  |  |
| $t_{\text {dis }}$ (LOEH-LADXZ) | Disable time, LOE high to LAD disabled | 16 | 3 | 15 | 3 | 12 | ns |  |
| $\mathrm{t}_{\text {dis }}$ (MAEH-MSAXZ) | Disable time, $\overline{\text { MAE }}$ high to MSA disabled | 21, 22 | 3 | 15 | 3 | 12 |  |  |
| $t_{\text {dis }}$ (MAEH-MSDXZ) | Disable time, MAE high to MSD disabled | 21 | 3 | 15 | 3 | 12 |  |  |

valid times

| PARAMETER |  | FIGURE | TMS34082A-32 |  | TMS34082A-40 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $t_{v}$ (MWRH-MSA) | Valid time, MSA address after MWR high |  | 20-22, 24, 25 | 1 |  | 1 |  | ns |
| $\mathrm{t}_{\mathrm{V} \text { (MWRH-MSD) }}$ | Valid time, MSD output data after MWR high | 20-22, 24, 25 | 1 |  | 1 |  |  |  |
| $\mathrm{t}_{\mathrm{v}}$ (LC1-MSA) | Valid time, MSA address valid after LCK $\uparrow$ or $\downarrow$ | 17-22, 24-27 | 3 |  | 3 |  |  |  |
| $\mathrm{t}_{\mathrm{v}}(\mathrm{LC} 1 \mathrm{~L}-\mathrm{COR})$ | Valid time, CORDY valid after LCLK1 low | 11 | 0 |  | 0 |  |  |  |

timing requirements over recommended ranges of supply voltage and operating free-air temperature (unless otherwise noted) ${ }^{\dagger}$
clock period and pulse duration

|  | PARAMETER | FIGURE | PIPELINE CONTROLS PIPES2-PIPES1 | TMS34082A-32 |  | TMS34082A-40 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{t}_{\mathrm{C}}(\mathrm{LC} 1)$ | Clock period, LCLK1 (1/f clock) | 10, 17-22, 24-27 | X0 | 125 |  | 100 |  | ns |
|  |  |  | 11 | 152 |  | 136 |  |  |
| $\mathrm{t}_{\mathrm{c}(\mathrm{LC} 2)}$ | Clock period, LCLK2 (1/f clock) | 10 | $\times 0$ | 125 |  | 100 |  |  |
|  |  |  | 11 | 152 |  | 136 |  |  |
| ${ }^{t}$ w(LC1H) | Pulse duration, LCLK1 high | 10 | X0 | 52.5 |  | 42.5 |  | ns |
|  |  |  | 11 | 66 |  | 61 |  |  |
| $t_{\text {w }}$ (LC1L) | Pulse duration, LCLK1 low | 10 | X0 | 52.5 |  | 42.5 |  |  |
|  |  |  | 11 | 66 |  | 61 |  |  |
| $\mathrm{t}_{\mathrm{w}}(\mathrm{LC} 2 \mathrm{H})$ | Pulse duration, LCLK2 high | 10 | $\times 0$ | 52.5 |  | 42.5 |  |  |
|  |  |  | 11 | 66 |  | 61 |  |  |
| ${ }^{\text {w }}$ (LC2L) | Pulse duration, LCLK2 low | 10 | X0 | 52.5 |  | 42.5 |  |  |
|  |  |  | 11 | 66 |  | 61 |  |  |
| $\mathrm{tw}_{\mathrm{w}}$ (DCSH)MH | Pulse duration, DS/ $\overline{\mathrm{CS}}$ high with MEMCFG high | 20,25,27 | XX | 5 |  | 7 |  |  |
| $\mathrm{t}_{\mathrm{w}}$ (RSTL) | Pulse duration, $\overline{\text { RESET }}$ low | 12 | XX | 30 |  | 30 |  |  |
| $t_{\text {w }}$ (MCEH) | Pulse duration, $\overline{M C E}$ high | 18, 25, 27 | XX | 5 |  | 7 |  |  |
| $\mathrm{t}_{\text {w }}$ (MOEH) | Pulse duration, $\overline{\text { MOE }}$ high | 17, 18, 23, 26, 27 | XX | 8 |  | 8 |  |  |
| $\mathrm{t}_{\text {w }}$ (MWRH) | Pulse duration, MWR high | 20,24, 25 | XX | 8 |  | 8 |  |  |

[^2] PIPES2 high and PIPES1 low. No other pipeline settings are specified.

## TMS34082A

GRAPHICS FLOATING-POINT PROCESSOR

D3150, SEPTEMBER 1988 - REVISED MAY 1991 - SCGS001
coprocessor mode (MSTR Iow)
timing requirements over recommended ranges of supply voltage and operating free-air temperature (unless otherwise noted) (continued) ${ }^{\dagger}$
transition times

|  | PARAMETER |  | TMS34082A-32 | TMS34082A-40 |  |
| :--- | ---: | ---: | ---: | ---: | ---: |
|  | FIGURE | MIN | MAX | MIN | MAX |
| UNIT |  |  |  |  |  |
| $\mathrm{t}_{\mathrm{t}(\mathrm{LC} 1)}$ | Transition time, LCLK1 | 10 | 15 | 13.5 |  |
| $\mathrm{t}_{\mathrm{t}(\mathrm{LC} 2)}$ | Transition time, LCLK2 | 10 | ns |  |  |

setup and hold times

| PARAMETER |  | FIGURE | TMS34082A-32 |  | TMS34082A-40 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{t}_{\text {Su( }}$ (BUS-LC2H) | Setup time, BUSFLT valid before LCLK2 $\uparrow$ |  | 11 | 20 |  | 13 |  | ns |
| $\mathrm{t}_{\text {su }}(\mathrm{CC}-\mathrm{LC} 1)$ | Setup time, CC valid before LCLK1 $\uparrow$ or $\downarrow$ | 12 | 7 |  | 5 |  |  |  |
| $t_{\text {Su }}$ (LAD-ATCL) | Setup time, LAD address valid before ALTCH low | 13-16, 23 | 15 |  | 12 |  |  |  |
| $\mathrm{t}_{\text {SU(LAD-CASH) }}$ | Setup time, LAD address valid before $\overline{\text { cas }}$ high | 13, 15, 24, 25 | 13 |  | 10 |  |  |  |
| $\mathrm{t}_{\text {Su }}$ (LRD-LC2H) | Setup time, LRDY valid before LCLK2 $\uparrow$ | 11 | 20 |  | 13 |  |  |  |
| $\mathrm{t}_{\text {SU(MSD-LC }}$ () | Setup time, MSD data valid before LCLK1 $\uparrow$ or $\downarrow$ | 17, 18, 23 | 11 |  | 7 |  |  |  |
| $t_{\text {Su(RASH-ATCL }}$ | Setup time, RAS high before $\overline{\text { ALTCH }}$ low | 13-15, 23 | 35 |  | 30 |  |  |  |
| $\mathrm{t}_{\text {Su(RDYL-LC1) }}$ | Setup time, RDY low before LCLK1 $\uparrow$ or $\downarrow$ | 12 | 20 |  | 10 |  |  |  |
| $\mathrm{t}_{\text {Su }}$ (RSTH-LC1) | Setup time, RESET high before LCLK1 $\uparrow$ or $\downarrow$ | 12 | 40 |  | 40 |  |  |  |
| ${ }^{\text {S Su(SF-ATCL }}$ ) | Setup time, SF valid before ALTCH low | 13-16, 23 | 15 |  | 10 |  |  |  |
| $t_{\text {Su(WEL-CASL) }}$ | Setup time, WE low for data write before CAS low | 13, 16 | 15 |  | 12 |  |  |  |
| th(ATCH-SF) | Hold time, SF valid after ALTCH high | 13-15, 23 | 15 |  | 12 |  | ns |  |
| th(ATCL-LAD) | Hold time, LAD address valid after ALTCH low | 13-16, 23 | 21 |  | 13 |  |  |  |
| $t_{\text {( }}$ (CASH-LAD) | Hold time, LAD data valid after $\overline{\mathrm{CAS}}$ high | 13, 15, 24, 25 | 0 |  | 0 |  |  |  |
| th(CASH-SF) | Hold time, SF valid after $\overline{\mathrm{CAS}}$ high | 13-15, 23 | 15 |  | 12 |  |  |  |
| th(LC1-CC) | Hold time, CC valid after LCLK1 $\uparrow$ or $\downarrow$ | 12 | 3 |  | 3 |  |  |  |
| $\mathrm{th}(\mathrm{LC} 1-\mathrm{MSD})$ | Hold time, MSD input data valid after LCLK1 $\uparrow$ or $\downarrow$ | 17, 18, 23 | 5 |  | 5 |  |  |  |
| th(LC1-RDY) | Hold time, RDY valid after LCLK1 $\uparrow$ or $\downarrow$ | 12 | 3 |  | 3 |  |  |  |
| th(LC1H-LC2L) | Hold time, LCLK2 low after LCLK1 high | 10 | 16 |  | 12 |  |  |  |
| th(LC2H-BUS) | Hold time, BUSFLT valid after LCLK2 high | 11 | 0 |  | 0 |  |  |  |
| th(LC2H-LC1H) | Hold time, LCLK1 high after LCLK2 high | 10 | 16 |  | 12 |  |  |  |
| th(LC2H-LRD) | Hold time, LRDY valid after LCLK2 high | 11 | 0 |  | 0 |  |  |  |
| th(WEH-SF) | Hold time, SF valid after WE high | 13 | 15 |  | 12 |  |  |  |

$\dagger$ See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.
coprocessor mode (MSTR low)
timing requirements over recommended ranges of supply voltage and operating free-air temperature (unless otherwise noted) (continued) ${ }^{\dagger}$
delay times

| PARAMETER |  | FIGURE | TMS34082A-32 |  | TMS34082A-40 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{t}_{\mathrm{d}}$ (DCSL-MCEL)MH | Delay time, DS/CS high to $\overline{\mathrm{MCE}}$ low with MEMCFG high |  | 18, 22 | 4 |  | 4 |  | ns |
| $\mathrm{t}_{\mathrm{d}}(\mathrm{DCSH}-\mathrm{MWRL})$ | Delay time, DS/CS high to MWR low | 19, 24 | 6 |  | 6 |  |  |  |
| $t_{d}(\mathrm{MCEH}-\mathrm{DCSL}) \mathrm{MH}$ | Delay time, $\overline{M C E}$ high to DS/CS Iow with MEMCFG high | 20 | 4 |  | 4 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}$ (MCEH-MWRL) | Delay time, $\bar{M} C E$ high to $\overline{M W R}$ low | 25 | 7 |  | 7 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}$ (MOEH-MWRL) | Delay time, MOE high to MWR low | 19 | 7 |  | 7 |  |  |  |
| ${ }^{\text {t }}$ (MSAV-MWRL $)$ | Delay time, MSA valid to $\overline{\text { MWR }}$ low | $\begin{gathered} 20-22,24 \\ 25 \end{gathered}$ | 5 |  | 5 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}(\mathrm{MSDZ}$-MOEL) | Delay time, MSD disabled to MOE low | 21, 22 | 3 |  | 3 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}$ (MWRH-MCEL)MH | Delay time, MWR high to MCE low with MEMCFG high | 25 | 4 |  | 4 |  |  |  |
| $\mathrm{t}_{\mathrm{d} \text { (MWRH-MOEL) }}$ | Delay time, $\overline{M W R}$ high to $\overline{M O E}$ low | 19, 21, 22 | 7 |  | 7 |  |  |  |
| td(MWRH-MSDVZ) | Delay time, MWR high to MSD disabled | 21 | 1 | 9 | 1 | 9 |  |  |
| $\mathrm{t}_{\mathrm{d}}$ (MWRL-MSDZX) | Delay time, $\overline{\text { MWR }}$ low to MSD enabled | 21, 22 | 0 | 7 | 0 | 7 |  |  |

$\dagger$ See Parameter Measurement information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.

## TMS34082A

GRAPHICS FLOATING-POINT PROCESSOR

D3150, SEPTEMBER 1988 - REVISED MAY 1991 - SCGS001
host-independent mode (MSTR high)
switching characteristics over recommended ranges of supply voltage and operating free-air temperature (unless otherwise noted) ${ }^{\dagger}$
propagation delay times


[^3]TMS34082A GRAPHICS FLOATING-POINT PROCESSOR

SCGS001 - D3150, SEPTEMBER 1988 - REVISED MAY 1991
host-independent mode (MSTR high)
switching characteristics over recommended ranges of supply voltage and operating free-air temperature (unless otherwise noted) (continued) ${ }^{\dagger}$
enable and disable times

| PARAMETER |  | FIGURE | TMS34082A-32 |  | TMS34082A-40 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| ten(CLKH-LADZX) | Enable time, CLK high to LAD enabled |  | 29, 30 | 5 |  | 5 |  | ns |
| ten(LOEL-LADZX) | Enable time, $\overline{\text { LOE }}$ low to LAD enabled | 33 | 5 | 18 | 5 | 14 |  |  |
| ten(MAEL-MSAZX) | Enable time, $\overline{\text { MAE }}$ low to MSA enabled | 41, 42 | 3 | 15 | 3 | 12 |  |  |
| ten(MAEL-MSDZX) | Enable time, $\overline{\text { MAE }}$ low to MSD enabled | 42 | 3 | 15 | 3 | 12 |  |  |
| $\mathrm{t}_{\text {dis }}$ (CLKH-LADXZ) | Disable time, CLK high to LAD disabled $\ddagger$ | 29,30 |  | 25 |  | 23 | ns |  |
| $\mathrm{t}_{\text {dis }}($ LOEH-LADXZ) | Disable time, $\overline{\text { LOE }}$ high to LAD disabled | 33 | 5 | 15 | 5 | 12 |  |  |
| $\mathrm{t}_{\text {dis }}($ MAEH-MSAXZ $)$ | Disable time, MAE high to MSA disabled | 41, 42 | 3 | 15 | 3 | 12 |  |  |
| t dis(MAEH-MSDXZ) | Disable time, $\overline{\text { MAE }}$ high to MSD disabled | 42 | 3 | 15 | 3 | 12 |  |  |

valid times

| PARAMETER |  | FIGURE | TMS34082A-32 |  | TMS34082A-40 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $t_{v}$ (ATCH-LAD) | Valid time, LAD output data after ALTCH high |  | 29,30 | 2 |  | 2 |  | ns |
| $\mathrm{t}_{\mathrm{v}}$ (CLKH-MSA) | Valid time, MSA address valid after CLK high | 35-47 | 3 |  | 3 |  |  |  |
| $\mathrm{t}_{\mathrm{V} \text { (MWRH-MSD) }}$ | Valid time, MSD data valid after MWR high | 35, 36, 40-42 | 1 |  | 1 |  |  |  |
| $t^{\text {V (MWRH-MSA) }}$ | Valid time, MSA address valid after MWR high | 35, 36, 40-41 | 1 |  | 1 |  |  |  |
| $t_{v}$ (WEH-LAD) | Valid time, LAD data valid after $\bar{W}$ | 30, 33, 43, 44 | 2 |  | 2 |  |  |  |

$\dagger$ See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.
$\ddagger$ Valid only for last write in sesries. The LAD bus is not placed in high-impedance state between consecutive outputs.

GRAPHICS FLOATING-POINT PROCESSOR
D3150. SEPTEMBER 1988 - REVISED MAY 1991 - SCGS001

## host-independent mode (MSTR high)

timing requirements over recommended ranges of supply voltage and operating free-air temperatre (unless otherwise noted) $\dagger$
clock period and pulse duration

| PARAMETER |  | FIGURE | PIPELINE CONTROLS PIPES2-PIPES1 | TMS34082A-32 | TMS34082A-40 | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN MAX |  | MIN MAX |  |
| ${ }^{\text {t }}$ (CLK) | Clock period time, CLK (1/fclock) |  | 28-31, 33-48 | $\begin{aligned} & \hline \mathrm{X0} \\ & 11 \end{aligned}$ | $\begin{aligned} & 60 \\ & 66 \end{aligned}$ | $\begin{aligned} & 50 \\ & 61 \end{aligned}$ | ns |
| $t_{\text {w }}$ (ATCH) | Pulse duration, $\overline{\text { ALTCH }}$ high | 30 | XX | 7 | 7 | ns |
| $\mathrm{t}_{\mathrm{w}}$ (CASH) | Pulse duration, $\overline{C A S}$ high | 29,31, 32, 35, 36 | XX | 7 | 7 |  |
| ${ }^{\text {w }}$ (CLKH) | Pulse duration, CLK high | 28 | XX | 15 | 15 |  |
| $\mathrm{t}_{\mathrm{w}}$ (CLKL) | Pulse duration, CLK low | 28 | XX | 15 | 15 |  |
| $\mathrm{t}_{\mathrm{w} \text { (DCSH) }}$ | Pulse duration, DS/ES high | 36, 40, 44 | XX | 5 | 5 |  |
| $\mathrm{t}_{\text {W (ITRL) }}$ | Pulse duration, INTR Iow | 34, 47 | XX | 20 | 15 |  |
| $\mathrm{t}_{\mathrm{w}}(\mathrm{MCEH})$ | Pulse duration, $\overline{\mathrm{MCE}}$ high | 36, 38, 44-46 | XX | 5 | 5 |  |
| $\mathrm{t}_{\text {W }}$ (MOEH $)$ | Pulse duration, $\overline{\mathrm{MOE}}$ high | 37, 38, 43-46 | $X X$ | 8 | 8 |  |
| $t_{\text {w }}$ (MWRH) | Pulse duration, MWR high | 35, 36, 40 | XX | 8 | 8 |  |
| $t_{\text {w }}$ (RSTL) | Pulse duration, $\overline{\mathrm{RESET}}$ fow | 34 | XX | 30 | 20 |  |
| $t_{\text {w }}$ (WEH) | Pulse duration, $\overline{W E}$ high | 30, 33, 43, 44 | XX | 7 | 7 |  |

## transition time

| PARAMETER |  | FIGURE | TMS34082A-32 |  | TMS34082A-40 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MiN | MAX |  |
| $\mathrm{tt}_{( } \mathrm{CLK}$ ) | Transition time, CLK |  | 28 |  | 15 |  | 15 | ns |

$\dagger$ See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.

## host-independent mode (MSTR high)

timing requirements over recommended ranges of supply voltage and operating free-air temperatre (unless otherwise noted) (continued) ${ }^{\dagger}$
setup and hold times

| PARAMETER |  | FIGURE | TMS34082A-32 |  | TMS34082A-40 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{t}_{\text {su( }}$ (CC-CLKH) | Setup time, CC before CLK high |  | 45 | 7 |  | 5 |  | ns |
| ${ }^{\text {tsu }}$ (LADV-CLKL) | Setup time, LAD data valid before CLK low for immediate data input $\ddagger$ | 32 | 10 |  | 10 |  |  |  |
| $\mathrm{t}_{\text {SU( }}$ (ITRL-CLKH) | Setup time, INTR before CLK high | 47 | 20 |  | 10 |  |  |  |
| $\mathrm{t}_{\text {su }}$ (LAD-CLKH) | Setup time, LAD input data valid before CLK high | $\begin{gathered} 29,31, \\ 34-36 \end{gathered}$ | 9 |  | 9 |  |  |  |
| tsu(LRD-CLKH) | Setup time, LRDY before CLK high | 48 | 20 |  | 15 |  |  |  |
| tsu(MSD-CLKH) | Setup time, MSD data valid before CLK high | $\begin{gathered} \hline 37,38, \\ 43-47 \end{gathered}$ | 10 |  | 8 |  |  |  |
| ${ }^{\text {t }}$ su(RDYV-CLKH) | Setup time, RDY valid before CLK high | 48 | 20 |  | 10 |  |  |  |
| $\mathrm{t}_{\text {su }}$ (RSTH-CLKH) | Setup time, $\overline{\text { RESET }}$ high before CLK high | 34 | 40 |  | 40 |  |  |  |
| ${ }^{\text {tsu(RSTL-ITRL) }}$ | Setup time, $\overline{\text { RESET }}$ low before $\overline{\text { INTR }}$ low for bootstrap loader | 34 | 10 |  | 10 |  | ns |  |
| th(CLKH-CC) | Hold time, CC after CLLK high | 45 | 0 |  | 0 |  |  |  |
| th(CLKH-ITR) | Hold time, $\overline{\text { NTTR }}$ after CLK high | 47 | 0 |  | 0 |  |  |  |
| th(CLKH-LAD) | Hold time, LAD input data valid after CLK high | $\begin{gathered} 29,31,35, \\ 36 \\ \hline \end{gathered}$ | 3 |  | 3 |  |  |  |
| $\mathrm{th}^{\text {(CLKH-LRD) }}$ | Hold time, LRDY after CLK high | 48 | 0 |  | 0 |  |  |  |
| th(CLKH-MSD) | Hold time, MSD data valid after CLK high | $\begin{gathered} \hline 37,38, \\ 43-47 \end{gathered}$ | 2 |  | 2 |  |  |  |
| th(CLKH-RDY) | Hold time, RDY after CLK high | 48 | 0 |  | 0 |  |  |  |
| th(CLKL-LAD) | Hold time, LAD data after CLK low for immediate data input ${ }^{\ddagger}$ | 32 | 5 |  | 5 |  |  |  |
| th(ITRL-RSTH) | Hold time, $\overline{\text { RESET }}$ low after $\overline{\text { INTR }}$ low for bootstrap loader | 34 | 10 |  | 10 |  |  |  |

$\dagger$ See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.
$\ddagger$ This mode permits data input that does not meet the minimum setup before CLK high. The clock period for this mode must be extended according to the equation:

Adjusted clock period $=$ Normal clock period + Data delay $+5 n s$
The data delay is the delay from CLK high to valid data. This mode may not be used to input data for divides or square roots.

## D3150, SEPTEMBER 1988 - REVISED MAY 1991 - SCGS001

## host-independent mode (MSTR high)

timing requirements over recommended ranges of supply voltage and operating free-air temperature (unless otherwise noted) (continued) ${ }^{\dagger}$
delay times

| PARAMETER |  | FIGURE | TMS34082A-32 |  | TMS34082A-40 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| ${ }^{\text {d }}$ (ATCH-CASL $)$ | Delay time, $\overline{\text { ALTCH }}$ high to $\overline{\text { CAS }}$ low |  | 29 | 6 |  | 6 |  | ns |
| $\mathrm{t}_{\mathrm{d}}$ (ATCH-WEL) | Delay time, $\overline{\text { ALTCH }}$ high to $\overline{W E}$ low | 30 | 5 |  | 5 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}(\mathrm{CASH}-\mathrm{ATCL})$ | Delay time, CAS high to ALTCH low | 29 | 5 |  | 5 |  |  |  |
| ${ }^{t_{d}(C A S H-W E L)}$ | Delay time, $\overline{\text { CAS }}$ high to WE low | 33 | 5 |  | 5 |  |  |  |
| $\mathrm{t}_{\mathrm{d} \text { (COIL-ATCL) }}$ | Delay time, $\overline{\text { COINT }}$ low to ALTCH low | 29, 30 | 0 |  | 0 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}(\mathrm{COIL}-\mathrm{CASL})$ | Delay time, $\overline{\text { COINT }}$ low to $\overline{\text { CAS }}$ low | 31,35, 36 | 2 |  | 2 |  |  |  |
| ${ }^{\text {t }}$ (COIL-WEL) | Delay time, COINT low to $\bar{W} E$ low | 33 | 0 |  | 0 |  |  |  |
| $\mathrm{t}_{\text {d(DCSH-MCEL) }}$ MH | Delay time, DS/CS high to MCE low with MEMCFG high | 38,42 | 4 |  | 4 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}$ (DCSH-MWRL) | Delay time, DS/CS high to MWR low | 35, 39 | 6 |  | 6 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}$ (MCEH-DCSL)MH | Delay time, MCE high to DC/CS low with MEMCFG high | 40 | 4 |  | 4 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}(\mathrm{MCEH}-\mathrm{MWRL})$ | Delay time, $\overline{M C E}$ high to $\overline{M W R}$ low | 36 | 7 |  | 7 |  |  |  |
| $\mathrm{t}_{\text {d }}$ (MOEH-MWRL) | Delay time, $\bar{M}$ ( ${ }^{\text {a }}$ high to MWR low | 39 | 7 |  | 7 |  |  |  |
| ${ }^{\text {d }}$ (MSAV-MWRL) | Delay time, MSA valid to MWR low | $\begin{gathered} 35,36 \\ 40-42 \end{gathered}$ | 5 |  | 5 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}$ (MSDZ-MOEL) | Delay time, MSD disabled to MOE low | 41, 42 | 3 |  | 3 |  |  |  |
| $\mathrm{t}_{\text {d(MWRH-MCEL }}$ MH | Delay time, $\overline{M W R}$ high to MCE low with MEMCFG high | 36 | 4 |  | 4 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}$ (MWRH-MOEL) | Delay time, MWR high to MOE low | 41, 42 | 7 |  | 7 |  |  |  |
| ${ }^{\text {d }}$ d(MWRH-MSDXZ $)$ | Delay time, MWR high to MSD disabled | 42 | 1 | 9 | 1 | 9 |  |  |
| ${ }^{\text {d }}$ (MWRL-MSDZX) | Delay time, MWR low to MSD enabled | 41, 42 | 0 | 7 | 0 | 7 |  |  |
| $\mathrm{t}_{\mathrm{d}}(W \mathrm{WEH}-\mathrm{ATCL})$ | Delay time, WE high to ALTCH low | 29 | 5 |  | 5 |  |  |  |
| $t_{d}$ (WEH-CASL) | Delay time, $\overline{\text { WE }}$ high to $\overline{\mathrm{CAS}}$ low | 31 | 5 |  | 5 |  |  |  |

$\dagger$ See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.

## EXPLANATION OF LETTER SYMBOLS

This data sheet uses a type of letter symbol based on JEDEC Std-100 and IEC Publication 748-2, 1985, to describe time intervals. The format is:

Where:
$t_{A(B C-D E) F}$
Subscript $A$ indicates the type of dynamic parameter being represented. One of the following is used:
Switching Characteristics:
$p=$ Propagation delay time
en = Enable time
dis $=$ Disable time
Timing Requirements:
c = Clock period
$\mathrm{w}=$ Pulse duration
$\mathrm{t}=$ Transition time
$d=$ Delay time
su $=$ Setup time
$h=$ Hold time
$v=$ Valid time
Subscript $B$ indicates the name of the signal or terminal for which a change of state or level (or establishment of a state or level) constitutes a signal event assumed to occur first, that is, at the beginning of the time interval.

Subscript $C$ indicates the direction of the transistion and/or the final state or level of the signal represented by B. One or two of the following are used:
$H=$ High or transition to high
L = Low or transition to low
$V=A$ valid steady-state level
$X=$ Unknown, changing, or "don't care" level
$Z=$ High-impedance (off) state
Subscript $D$ indicates the name of the signal or terminal for which a change of state or level (or establishment of a state or level) constitutes a signal event assumed to occur last, that is, at the end of the time interval.

Subscript $E$ indicates the direction of the transition and/or the final state or level of the signal represented by D. One or two of the symbols described in Subscript $C$ are used.

Subscript $F$ indicates additional information such as mode of operation, test conditions, etc.
The hyphen between the $C$ and D subscripts is omitted when no confusion is likely to occur. For these letter symbols on this data sheet, the signal names are further abbreviated as follows:

| SIGNAL NAME | B\&D SUBSCRIPT | SIGNAL NAME | $\begin{gathered} \text { B\&D } \\ \text { SUBSCRIPT } \end{gathered}$ | SIGNAL NAME | B \& D SUBSCRIPT | SIGNAL NAME | $\begin{gathered} \text { B\&D } \\ \text { SUBSCRIPT } \end{gathered}$ | SIGNAL NAME | $\begin{gathered} \hline \text { B\&D } \\ \text { SUBSCRIPT } \end{gathered}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\overline{\text { ALTCH }}$ | ATC | CORDY | COR | LCLK2 | LC2 | MSA(0:15) | MSA | TCK | TCK |
| BUSFLT | BFT | DC/CS | DCS | $\overline{\text { LOE }}$ | LOE | MSD(0:31) | MSD | TDI | TDI |
| $\overline{\text { CAS }}$ | CAS | EC(0:1) | EC | LRDY | LRD | $\overline{\text { MWR }}$ | MWR | TDO | TDO |
| CC | CC | INTG | INT | $\overline{\text { MAE }}$ | MAE | $\overline{R A S}$ | RAS | TMS | TMS |
| CID(0:2) | CID | $\overline{\text { NTR }}$ | ITR | MSTR | MST | RDY | RDY | $\mathrm{V}_{\mathrm{CC}} N_{\text {Ss }}$ | - |
| CLK | CLK | LAD(0:31) | LAD | $\overline{\text { MCE }}$ | MCE | RESET | RST | WE | WE |
| COINT | COI | LCLK1 | LC1 | $\overline{\text { MOE }}$ | MOE | SF | SF | MEMCFG | M |

## PARAMETER MEASUREMENT INFORMATION



NOTES: A. Phase relationships between waveforms were chosen arbitrarily. All input pulses are supplied by pulse generators having the following characteristics: $\mathrm{PRR}=1 \mathrm{MHz}, \mathrm{Z}_{\mathrm{O}}=50 \Omega, \mathrm{t}_{\mathrm{r}} \leq 6 \mathrm{~ns}, \mathrm{t}_{\mathrm{f}} \leq 6 \mathrm{~ns}$.
B. Waveform 1 is for an output with internal conditions such that the output is low except when disabled by the output control.
C. Waveform 2 is for an output with internal conditions such that the output is high except when disabled by the output control. For tpLZ and $\mathrm{t}_{\mathrm{PHZ}}, \mathrm{V}_{\mathrm{OL}}$ and $\mathrm{V}_{\mathrm{OH}}$ are measured values.

FIGURE 9

PARAMETER MEASUREMENT INFORMATION


FIGURE 10. COPROCESSOR MODE, INPUT CLOCKS

$\dagger$ Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period.
FIGURE 11. COPROCESSOR MODE, BUS CONTROL SIGNALS

PARAMETER MEASUREMENT INFORMATION


FIGURE 12. COPROCESSOR MODE, CONTROL SIGNALS

## PARAMETER MEASUREMENT INFORMATION


†Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period.
FIGURE 13. COPROCESSOR MODE, TMS34020 GSP TO TMS34082A

PARAMETER MEASUREMENT INFORMATION

$\dagger$ Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period.
FIGURE 14. COPROCESSOR MODE, TMS34082A TO TMS34020 GSP INCLUDING COPROCESSOR INTERNAL CYCLE

## PARAMETER MEASUREMENT INFORMATION


$\dagger$ Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period.
FIGURE 15. COPROCESSOR MODE, DRAM/VRAM MEMORY TO TMS34082A

## GRAPHICS FLOATING-POINT PROCESSOR

D3150, SEPTEMBER 1988 - REVISED MAY 1991 - SCGS001
PARAMETER MEASUREMENT INFORMATION

† Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period.
FIGURE 16. COPROCESSOR MODE, TMS34082A TO DRAM/VRAM MEMORY


[^4]FIGURE 17. COPROCESSOR MODE MSD BUS TIMING, MEMORY TO TMS34082A WITH MEMCFG LOW


NOTE: This example shows a data read followed by an instruction read followed by an instruction read. This option for using DS/ $\overline{\mathrm{CS}}$ as data space chip enable and MCE as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, DS/ $\overline{C S}$ and $\overline{M C E}$ rise after every clock edge. In this mode, DS/ $\overline{C S}$ and $\overline{M C E}$ may not both be active (low) at the same time.

FIGURE 18. COPROCESSOR MODE MSD BUS TIMING, MEMORY TO TMS34082A WITH MEMCFG HIGH

## PARAMETER MEASUREMENT INFORMATION



[^5]FIGURE 19. COPROCESSOR MODE MSD BUS TIMING, TMS34082A TO MEMORY WITH MEMCFG LOW

## PARAMETER MEASUREMENT INFORMATION



NOTE: This example shows multiple data writes. Timing for multiple code writes would be similar. This option for using DS/CS as data space chip enable and $\overline{M C E}$ as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, DS/ $\overline{C S}$ and $\overline{M C E}$ rise after every clock edge. In this mode, $\overline{D S} / \overline{C S}$ and $\overline{M C E}$ may not both be active (low) at the same time.

FIGURE 20. COPROCESSOR MODE MSD BUS TIMING, TMS34082A TO MEMORY WITH MEMCFG HIGH

PARAMETER MEASUREMENT INFORMATION

t The setting of DS/ $\overline{C S}$ determines whether the value on the MSD bus is an instruction or data.
$\ddagger \overline{\text { MCE }}$ does not toggle at each clock edge.
$\S \overline{M O E}$ goes high at each clock edge.
NOTE: This example shows a data write followed by an instruction read.
FIGURE 1. COPROCESSOR MODE, MSD ENABLE/DISABLE TIMING WITH MEMCFG LOW

## PARAMETER MEASUREMENT INFORMATION



NOTE: This example shows a data write followed by an instruction read. Timing for multiple code writes would be similar. This option for using $D S / \overline{C S}$ as data space chip enable and $\overline{M C E}$ as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, DS/ $\overline{\mathrm{CS}}$ and $\overline{\mathrm{MCE}}$ rise after every clock edge. In this mode, DS/ $\overline{\mathrm{CS}}$ and $\overline{\mathrm{MCE}}$ may not both be active (low) at the same time.

FIGURE 2. COPROCESSOR MODE, MSD BUS ENABLE/DISABLE TIMING WITH MEMCFG HIGH

PARAMETER MEASUREMENT INFORMATION

$\dagger$ Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period.
$\ddagger$ The setting of DS/ $\overline{C S}$ determines whether the value on the MSD bus in an instruction or data.
$\S \overline{M C E}$ does not toggle at each rising clock edge.
$\llbracket \overline{M O E}$ goes hiigh at each rising clock edge.
FIGURE 3. COPROCESSOR MODE, JUMP TO EXTERNAL MEMORY SUBROUTINE WITH MEMCFG LOW

$\dagger \overline{M C E}$ does not toggle at each clock edge.
$\ddagger \overline{M O E}$ goes high at each clock edge.
FIGURE 4. COPROCESSOR MODE, LAD TO MSD BUS TRANSFER TIMING WITH MEMCFG LOW

## PARAMETER MEASUREMENT INFORMATION



[^6]FIGURE 5. COPROCESSOR MODE, LAD TO MSD BUS TRANSFER TIMING WITH MEMCFG HIGH

## PARAMETER MEASUREMENT INFORMATION


$\dagger \overline{M C E}$ does not toggle at each clock edge.
$\ddagger \overline{M O E}$ goes high at each clock edge.
FIGURE 6. COPROCESSOR MODE, MSD TO LAD BUS TRANSFER TIMING WITH MEMCFG LOW

$\dagger \mathrm{DS} / \overline{\mathrm{CS}}$ valid for moves to data space; $\overline{\mathrm{MCE}}$ valid for moves to code space. Only one would be valid for each move instruction.
NOTE: This option for using DS/CS as data space chip enable and $\overline{M C E}$ as code space chip enable is involved by setting the MEMCFG bit high in the configuration register.

FIGURE 7. COPROCESSOR MODE, MSD TO LAD BUS TRANSFER TIMING WITH MEMCFG HIGH

## D3150. SEPTEMBER 1988 - REVISED MAY 1991 - SCGS001

## PARAMETER MEASUREMENT INFORMATION



FIGURE 8. HOST-INDEPENDENT MODE, LAD BUS TIMING FOR MEMORY TO TMS34082A

$\dagger \overline{\text { COINT }}$ timing is for LADCFG high only. When the LADCFG bit is set high in the configuratin register, $\overline{\text { COINT }}$ is controlled by bit 1 of the LAD move instruction instead of the set mask instruction.
NOTE: This timing diagram assumes an external address latch to store address for external memory reads. Data input hold time on the latch is zero; data (or address) output hold time is nonzero.

FIGURE 9. HOST-INDEPENDENT MODE, LAD BUS TIMING FOR MEMORY TO TMS34082A

## PARAMETER MEASUREMENT INFORMATION


$\dagger$ Valid only for last write in series. The LAD bus is not placed in high-impedance state between consecutive outputs.
$\ddagger \overline{\text { COINT }}$ timing is for LADCFG high only. When the LADCFG bit is set high in the configuration register, $\overline{\text { COINT }}$ is controlled by bit 1 of the LAD move instruction instead of the set mask instruction.
NOTE: This timing diagram assumes an external address latch to store address for external memory reads. Data input hold time is zero. Data (or address) outputhold time is nonzero. Valid only for last write in series. The LAD bus is not placed in high impedance between consecutive outputs.

FIGURE 10. HOST-INDEPENDENT MODE, LAD BUS TIMING FOR TMS34082A TO MEMORY

D3150, SEPTEMBER 1988 - REVISED MAY 1991 - SCGSO01

$\dagger \overline{\text { COINT }}$ timing is for LADCFG high only. When the LADCFG bit is set high in the configuration register, $\overline{\text { COINT }}$ is controlled by bit 1 of the LAD move instruction instead of the set mask instruction.

FIGURE 11. HOST-INDEPENDENT MODE, LAD BUS TIMING INPUT TO TMS34082A

$\dagger$ This mode permits data input which does not meet the minimum setup before CLK high. For immediate data input, CLK must be high for more than 20 ns . This input mode cannot be used to input data for divides and square roots.

Adjusted clock period $=$ Normal clock period + Data delay $+5 n s$
FIGURE 12. HOST-INDEPENDENT MODE, LAD BUS TIMING INPUT OF IMMEDIATE DATA TO TMS34082A

$\dagger$ When the LADCFG bit is high, $\overline{L O E}$ high places $\overline{C A S}$ and $\overline{W E}$ (as well as the LAD bus) in high impedance.
$\ddagger$ Valid only for LADCFG high. When the LADCFG bit is high in the configuration register, COINT is controlled by bit 1 of the LAD move instruction instead of the set mask instruction.
NOTE: If the instruction writes the result of an FPU operation to a register and outputs the result to the LAD bus, in the same cycle, the minimum clock period must be extended.

FIGURE 13. HOST-INDEPENDENT MODE, LAD BUS TIMING OUTPUT FROM TMS34082A

## PARAMETER MEASUREMENT INFORMATION


$\dagger \overline{\text { RESET }}$ is level sensitive. When $\overline{R E S E T}$ is set low, both LAD and MSD buses are placed in high-impedance state. When $\overline{R E S E T}$ is released, the sequencer forces a jump to address 0 . If INTR goes low while $\overline{R E S E T}$ is low, the loader moves 64 words through to the external memory on MSD. Timing for the LAD to MSD move is shown in a later diagram, with the exception that the first word on LAD loads the configuration register and does not pass to the MSD bus.
$\ddagger$ INTR may be low one or more cycles after $\overline{R E S E T}$ goes low. $\overline{R E S E T}$ is held low, and then $\overline{\text { INTR }}$ is taken low. The bootstrap loader starts when $\overline{\text { RESET }}$ is set high, which may involve a delay of one or more cycles after INTR goes low.
NOTE: When the bootstrap loader is invoked, the first data word input on the LAD bus should be the configuration register settings, which will be written into the configuration register. This allows the user to select the MEMCFG setting, for reading or writing memory on the MSD port, as well as the LADCFG setting for the LAD bus interface.

FIGURE 14. HOST-INDEPENDENT MODE LAD BUS TIMING, BOOTSTRAP LOADER OPERATION

$\dagger \overline{\text { COINT }}$ timing is for LADCFG high only. When the LADCFG bit is set high in the configuration register, $\overline{\mathrm{COINT}}$ is controlled by bit 1 of the LAD move instruction instead of the set mask instruction.
$\ddagger \overline{M C E}$ does not toggle at each rising clock edge.
$\S \overline{M O E}$ goes high at each rising clock edge.
FIGURE 15. HOST-INDEPENDENT MODE, LAD TO MSD BUS TRANSFER TIMING WITH MEMCFG LOW

$\dagger$ COINT timing is for LADCFG high only. When the LADCFG bit is set high in the configuration register, COINT is controlled by bit 1 of the LAD move instruction instead of the set mask instruction.
$\ddagger \mathrm{DS} / \overline{\mathrm{CS}}$ valid for moves to data space; $\overline{\text { MCE }}$ valid for moves to code space. Only one of these would be valid for each move instruction. § This option for using DS/CS as data space chip enable and $\overline{M C E}$ as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register.

FIGURE 16. HOST-INDEPENDENT MODE, LAD TO MSD BUS TRANSFER TIMING WITH MEMCFG HIGH

## PARAMETER MEASUREMENT INFORMATION


$\dagger$ The setting of DS/CS determines whether the value on the MSD bus is an instruction or data.
$\ddagger \overline{\mathrm{MCE}}$ does not toggle at each rising clock edge.
$\S \overline{M O E}$ goes high at each rising clock edge.
NOTE; This example shows a data read followed by an instruction read.
FIGURE 17. HOST-INDEPENDENT MODE MSD BUS TIMING, MEMORY TO TMS34082A WITH MEMCFG LOW

## GRAPHICS FLOATING-POINT PROCESSOR

D3150, SEPTEMBER 1988 - REVISED MAY 1991 - SCGS001

## PARAMETER MEASUREMENT INFORMATION



NOTE: This example shows a data read followed by an instruction read followed by an instruction read. This option for using DS/ $\overline{\mathrm{CS}}$ as data space chip enable and MCE as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, DS/CS and $\overline{M C E}$ rise after every rising clock edge. In this mode, DS/ $\overline{C S}$ and $\overline{M C E}$ may not both be active (low) at the same time.

FIGURE 18. HOST-INDEPENDENT MODE MSD BUS TIMING, MEMORY TO TMS34082A WITH MEMCFG HIGH

TMS34082A

## PARAMETER MEASUREMENT INFORMATION


$\dagger$ The setting of DS/CS determines whether the value on the MSD bus is an instruction or data.
$\ddagger \overline{\mathrm{MCE}}$ does not toggle at each rising clock edge.
$\S \overline{M W R}$ goes high at each rising clock edge.
NOTE: This example shows a data write followed by a code read.
FIGURE 19. HOST-INDEPENDENT MODE MSD BUS TIMING, TMS34082A TO MEMORY WITH MEMCFG LOW


NOTE: This example shows multiple data writes. Timing for multiple code writes would be similar. This option for using DS/ $\overline{\mathrm{CS}}$ as data space chip enable and $\overline{M C E}$ as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, $\mathrm{DS} / \overline{\mathrm{CS}}$ and $\overline{\mathrm{MCE}}$ rise after every rising clock edge. In this mode, DS/CS and $\overline{\mathrm{MCE}}$ may not both be active (low) at the same time.

FIGURE 20. HOST-INDEPENDENT MODE MSD BUS TIMING, TMS34082A TO MEMORY WITH MEMCFG HIGH

## PARAMETER MEASUREMENT INFORMATION



[^7]FIGURE 21. HOST-INDEPENDENT MODE, MSD ENABLE/DISABLE TIMING WITH MEMCFG LOW


NOTE: This example shows a data write followed by an instruction read. Timing for multiple code writes would be similar. This option for using DS/ $\overline{C S}$ as data space chip enable and $\overline{M C E}$ as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, $D S / \overline{C S}$ and $\overline{M C E}$ rise after every rising clock edge. In this mode, $D S / \overline{C S}$ and $\overline{M C E}$ may not both be low at the same time.

FIGURE 22. HOST-INDEPENDENT MODE, MSD BUS ENABLE/DISABLE TIMING WITH MEMCFG HIGH

## PARAMETER MEASUREMENT INFORMATION


$\dagger \overline{\mathrm{MCE}}$ does not toggle at each rising clock edge.
$\ddagger \overline{\text { MOE }}$ goes high at each rising clock edge.
FIGURE 23. HOST-INDEPENDENT MODE, MSD TO LAD BUS TRANSFER TIMING WITH MEMCFG HIGH

## GRAPHICS FLOATING-POINT PROCESSOR

D3150, SEPTEMBER 1988 - REVISED MAY 1991 - SCGS001

$\dagger \mathrm{DS} / \overline{\mathrm{CS}}$ valid for moves to data space; $\overline{M C E}$ valid for moves to code space. Only one would be valid for each move instruction.
NOTE: This option for using DS/CS as data space chip enable and MCE as code space chip enable is involved by setting the MEMCFG bit high in the configuration register.

FIGURE 24. HOST-INDEPENDENT MODE, MSD TO LAD BUS TRANSFER TIMING WITH MEMCF HIGH

TMS34082A GRAPHICS FLOATING-POINT PROCESSOR

$\dagger$ Dotted line shows DS/ $\overline{\mathrm{CS}}$ for MEMCFG high.
$\ddagger$ The $C C$ input is registered on each rising edge of the clock, so the $C C$ bit can be latched one cycle and tested during the next cycle.
FIGURE 25. HOST-INDEPENDENT MODE, MSD BUS TIMING TEST CONDITION (CC) AND BRANCH

## PARAMETER MEASUREMENT INFORMATION


$\dagger$ Dotted line shows DS/ $\overline{\mathrm{CS}}$ for MEMCFG high.
$\ddagger$ Valid for MEMCFG low only. When MEMCFG low, $\overline{\text { COINT }}$ is set high by the set mask instruction, and it remains high until reset with another set mask instruction.
§ The CORDY output is set low by the set mask instruction, and it remains low until reset with another set mask instruction.
FIGURE 26. HOST-INDEPENDENT MODE MSD BUS TIMING, SET/RESET COINT AND CORDY

## PARAMETER MEASUREMENT INFORMATION


$\dagger$ Dotted lines show DS/ $\overline{C S}$ and $\overline{M C E}$ for MEMCFG high.
$\ddagger \overline{\mathbb{N T R}}$ is negative-edged triggered.
NOTE: Interrupts are not granted during multi-cycle instructions. This example shows two interrupt requests. The first is granted immediately; the second, after the first is finished. INTG remains high after an interrupt is granted until interrupts are reenabled or a return from interrupt instruction is executed.

FIGURE 27. HOST-INDEPENDENT MODE, MSD BUS TIMING EXTERNALINTERRUPT TO TMS34082A


NOTE: When either RDY or LRDY is set low and the setup time before CLK high is observed, the device is stalled for one or more clock cycles, until RDY or LRDY is set high again. During a wait state, internal states and status are preserved and output signals do not change. LRDY can be used in this manner only in the host-independent mode.

FIGURE 28. HOST-INDEPENDENT MODE, MSD BUS TIMING WAIT STATE TIMING

## PROGRAMMING INFORMATION

## programming the TMS34082A

The TMS34082A is supported by a software development tool kit, including a C compiler and an assembler. Program development using the tools is described in the TMS34082A tool kit documentation. Information on internal instructions and listing of the external instructions are provided in the following sections.
In both the coprocessor and host-independent modes, the TMS34082A instruction word is 32 bits long. The number, length, and arrangement of fields in the 32 -bit word depends on the operating mode and operation selected. Internal microcode to the TMS34082A is not restricted to the same 32-bit instruction formats so certain internal programs may execute faster than the same operations written with external code can achieve.
In the coprocessor mode, the TMS34082A can execute instructions both from the TMS34020 and from the program memory on the MSD bus (MSD31-0). In the host-independent mode the TMS34082A is controlled from code input on the MSD bus. Internal instructions may be executed in the host-independent mode by performing a jump to the internal address.

## PROGRAMMING INFORMATION

## internal instructions

The TMS34082A FPU performs a wide range of internal arithmetic and logical operations, as well as complex operations (flagged ' $\dagger$ '), summarized below. Complex instructions are multi-cycle routines stored in the internal program ROM.

One-Operand Operations:

Absolute Value
Square Root
Reciprocal ${ }^{\dagger}$
Conversions:
Integer to Single
Integer to Double
Single to Double
Two-Operand Operations:
Add
Subtract
Compare
Matrix Operations:

$$
\begin{array}{ll}
4 \times 4,4 \times 4 \text { Multiply } \dagger & 3 \times 3,3 \times 3 \text { Multiply } \dagger \\
1 \times 4,4 \times 4 \text { Multiply } \dagger & 1 \times 3,3 \times 3 \text { Multiply }
\end{array}
$$

Graphics Operations:
Backface Testing ${ }^{\dagger}$
Polygon Clipping ${ }^{\dagger}$
2-D Linear Interpolation ${ }^{\dagger}$
2-D Window Compare ${ }^{\dagger}$
2-Plane Clipping ( $X, Y, Z)^{\dagger}$
2-D Cubic Spline ${ }^{\dagger}$
Image Processing:
$3 \times 3$ Convolution ${ }^{\dagger}$
Chained Operations:
Polynomial Expansion $\dagger$
1-D Min/Max ${ }^{\dagger}$
Vector Operations:
Add $\dagger$
Subtract ${ }^{\dagger}$
Magnitude ${ }^{\dagger}$
Scaling ${ }^{\dagger}$

1s Complement
2s Complement

Single to Integer
Double to Integer
Double to Single
Multiply
Divide

Polygon Elimination ${ }^{\dagger}$
Viewport Scaling and Conversion ${ }^{\dagger}$
3-D Linear Interpolation ${ }^{\dagger}$
3-D Volume Compare ${ }^{\dagger}$
2-Plane Color Clipping (R,B,G,I) $\dagger$
3-D Cubic Spline ${ }^{\dagger}$

Multiply/Accumulate ${ }^{\dagger}$
2-D Min/Max ${ }^{\dagger}$
Dot Product ${ }^{\dagger}$
Cross Product ${ }^{\dagger}$
Normalization $\dagger$
Reflection ${ }^{\dagger}$

The internal ROM routines may be used in either the coprocessor or host-independent mode. In the coprocessor mode, the internal routines are invoked by TMS34020 instructions to its coprocessor(s).
In the host-independent mode, the internal programs can be called as subroutines by the externally stored code. External programs can call internal routines by executing a jump to subroutine with bit 16 (internal code select) set high and the address of the internal routine as the jump address.

The format of the TMS34082A instruction in the coprocessor mode is shown in Figure 49. The instruction is issued by the TMS34020 via the LAD bus.

[^8]
## PROGRAMMING INFORMATION



FIGURE 29. TMS34082A INSTRUCTION
The 3-bit ID field identifies the coprocessor for which the instruction is intended. This coprocessor ID corresponds to the settings of the CID2-CID0 pins. To broadcast an instruction to all coprocessors, the ID is set to 4 h .

TABLE 5. COPROCESSORID

| ID | COPROCESSOR |
| :---: | :---: |
| 000 | FPU0 |
| 001 | FPU1 |
| 010 | FPU2 |
| 011 | FPU3 |
| 100 | FPU broadcast |
| 101 | Reserved |
| 110 | Reserved |
| 111 | User defined |

Four coprocessor addressing modes are defined for the TMS34082A. The md field indicates the addressing mode.

TABLE 6. ADDRESSING MODES

| MODE | MD FIELD | OPERATION |
| :---: | :---: | :--- |
| 0 | 00 | FPU internal operations with no jump or external moves |
| 1 | 01 | Transfer data to/from TMS34020 registers |
| 2 | 10 | Transfer data to/from memory (controlled by TMS34020) |
| 3 | 11 | External instructions |

The type and size bits identify the type of operand; as shown below in Table 7. The I bit is used to indicate to the TMS34082A that this is a reissue of a coprocessor instruction due to a bus interruption. The least significant four bits are the bus status bits, which will all be zero to indicate a coprocessor cycle.

TABLE 7. OPERAND TYPES

| TYPE | SIZE | OPERAND TYPE |
| :---: | :---: | :--- |
| 0 | 0 | 32-bit integer |
| 0 | 1 | Reserved |
| 1 | 0 | Single-precision floating-point (32-bit) |
| 1 | 1 | Double-precision floating-point (64-bit) |

The ra, rb, and rd fields are for the two sources and destination within the FPU. Register addresses are listed in Table 1. For the ra and rb fields, only the four least significant bits of the register address are used. The ra field may only use the RA register file, C , and CT . The RB field may only use the RB register file, C and CT .

The Floating-Point Unit Operation (fpuop) field is the FPU opcode ( 5 bits) described in Tables 8, 9, and 10.
In the coprocessor mode, the TMS34082A executes user-defined routines (stored in external memory on the MSD bus) by executing a jump to external code. For this instruction, the md field (bits 15-13) is set high and the fpuop field gives the routine number ( $0-31$ ). The TMS34082A multiplies the routine number by two to get the jump address. For example, routine number 14 would have a jump address of 28 decimal or 1 C hex.

## PROGRAMMING INFORMATION

The routines are coded using the external instruction format discussed in the next section. The last instruction should be a jump to internal instruction address OFFFh with the l-bit(internal) set or a return from subroutine instruction. This puts the FPU in an idle state, waiting for the next instruction from the TMS34020.

TABLE 8. COPROCESSOR MODE INSTRUCTIONS

| FPUOP | TMS34020 ASSEMBLER OPCODE | DESCRIPTION |
| :---: | :---: | :---: |
| 00000 | ADDx | Sum of ra and rb, place in rd |
| 00001 | SUBx | Subtract rb from ra, place result in rd |
| 00010 | CMPx | Set status bits on result of ra minus rb |
| 00011 | SUBx | Subtract ra from rb, place result in rd |
| 00100 | ADDAx | Absolute value of sum of ra and rb, place result in rd |
| 00101 | SUBAx | Absolute value of (ra minus rb), place result in rd |
| 00110 | MOVE or MOVx | Load multiple FPU registers from TMS34020 GSP or its memory |
| 00111 | MOVE or MOVx | Save multiple FPU registers to TMS34020 GSP or its memory |
| 01000 | MPYx | Multiply ra and rb, place result in rd |
| 01001 | DIVx | Divide ra by rb, place result in rd |
| 01010 | INVx | Divide 1 by rb, place result in rd |
| 01011 | ASUBAx | Absolute value of ra minus absolute value of rb, place in rd |
| 01100 | reserved |  |
| 01101 | MOVEx | Move ra to rd, multiple, for $n$ registers |
| 01110 | MOVEx | Move rb to rd, multiple, for $n$ registers |
| 01111 | (see Table 10) | Single operand instructions, rb field redefined |
| 10000 | CPWx | Compare point to window (set XLT, XGT, YLT, TGT) |
| 10001 | CPVx | Compare point to volume (set XLT, XGT, YLT, YGT, ZLT, ZGT) |
| 10010 | BACKFx | Test polygon for facing direction (backface test) |
| 10011 | INMNMXx | Setup FPU registers for MNMX1 or MNMX2 instruction |
| 10100 | LINTx | Given [ $\mathrm{X} 1, \mathrm{Y} 1, \mathrm{Z} 1$ ], [ $\mathrm{X} 2, \mathrm{Y} 2, \mathrm{Z} 2$ ], and a plane, find [ $\mathrm{X} 3, \mathrm{Y} 3, \mathrm{Z} 3$ ] |
| 10101 | CLIPFX | Clip a line to a plane pair boundary (start with point 1) |
| 10110 | CLIPRx | Clip a line to a plane pair boundary (start with point 2) |
| 10111 | CLIPCFX | Clip color values to a plane pair boundary (start with point 1) |
| 11000 | SCALEX | Scale and convert coordinates for viewpoint |
| 11001 | MTRANx | Transpose a matrix |
| 11010 | CKVTXx | Compare a polygon vertex to a clipping volume |
| 11011 | CONVx | $3 \times 3$ convolution |
| 11100 | CLIPCRx | Clip color values to a plane pair boundary (start with point 2) |
| 11101 | OUTC3x | Compare a line to a clipping value |
| 11110 | CSPLNx | Calculate cubic spline for given coefficients |
| 11111 | (see Table 11) | Vector and matrix instructions, rb field redefined |

[^9]
## PROGRAMMING INFORMATION

TABLE 9. COPROCESSOR MODE INSTRUCTIONS, FPUOP $=\mathbf{0 1 1 1 1}{ }_{2}$

| RB | TMS34020 ASSEMBLER OPCODE | DESCRIPTION |
| :--- | :--- | :--- |
| 0000 | PASS | Copy ra to rd |
| 0001 | NOT | Place 1s complement of ra in rd |
| 0010 | ABS | Place absolute value of ra in rd |
| 0011 | NEG | Place negated value of ra in rd |
| 0100 | CVDF | Convert double in ra to single in rd (T and S define ra) |
| 0100 | CVFD | Convert single in ra to double in rd (T and S define ra) |
| 0101 | CVDI | Convert double in ra to integer in rd (T and S define ra) |
| 0101 | CVFI | Convert single in ra to integer in rd (T and S define ra) |
| 0110 | CVID | Convert integer in ra to double in rd (T and S define ra) |
| 0110 | CVIF | Convert integer in ra to single in rd (T and S define ra) |
| 0111 | VSCLx | Multiply each component of a velocity by a scaling factor |
| 1000 | SQARx | Place (ra * ra) in rd |
| 1001 | SQRTx | Extract square root or ra, place in rd |
| 1010 | SQRTAx | Extract square root of absolute value of ra, place in rd |
| 1011 | ABORT | Stop execution of any FPU instruction |
| 1100 | CKVTXI | Initialize check vertex instruction |
| 1101 | CHECK | Check for previous instruction completion |
| 1110 | MOVMEM | Move data from system memory to external memory @ MCADDR |
| 1111 | MOVMEM | Move data to system memory from external memory @ MCADDR |

TABLE 10. COPROCESSOR MODE INSTRUCTIONS, FPUOP $=11111_{2}$

| RB | TMS34020 ASSEMBLER OPCODE | DESCRIPTION |
| :---: | :---: | :--- |
| 0000 | POLYx | Polynomial expansion |
| 0001 | MACx | Multiply and accumulate |
| 0010 | MNMX1x | Determine 1-D minimum and maximum of a series |
| 0011 | MNMX2x | Determine 2-D minimum and maximum of a series of pairs |
| 0100 | MMPY0x | Multiply matrix elements $0,1,2,3$ by vector element 0 |
| 0101 | MMPY1x | Multiply matrix elements $4,5,6,7$ by vector element 1 |
| 0110 | MMPY2x | Multiply matrix elements $8,9,10,11$ by vector element 2 |
| 0111 | MMPY3x | Multiply matrix elements $12,13,14,15$ by vector element 3 |
| 1000 | MADDx | Add matrix elements $12,13,14,15$ to vector |
| 1001 | VADDx | Add two vectors |
| 1010 | VSUBx | Subtract a vector from a vector |
| 1011 | VDOTx | Compute scalar dot product of two vectors |
| 1100 | VCROSx | Compute cross product of two vectors |
| 1101 | VMAGx | Determine the magnitude of a vector |
| 1110 | VNORMx | Normalize a vector to unit magnitude |
| 1111 | VRFLCTx | Given normal and incident vectors, find the reflection |

$F$ denotes single-precision, $D$ denotes double-precision floating-point, $x$ denotes operand type, and a blank designates signed integer

## PROGRAMMING INFORMATION

## external instructions

External instructions are 32 bits iong, and their formats (number, length, and function of fields) depend on the operations being selected. Separate formats are provided for data transfers, FPU processing, test and branch operations, and subroutine calls.

Instructions that control FPU operations can select operands from input registers, internal feedback, or from the LAD bus (32-bit operations only). The format for an FPU processing instruction is shown in Figure 50.

| 31 | 28 | 24 |  | 15 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| OP | RA | RB | RD | SEL_OP | INSTRUCTION |

## FIGURE 30. FPU PROCESSING EXTERNAL INSTRUCTION FORMAT

The op field selects the sequencer operation. Three continue instructions are available to permit control of the $\overline{\text { WE and }} \overline{\text { ALTCH }}$ strobe outputs, which enable LAD output in the host-independent mode. The ra, rb, and rd fields are for the two sources and destination in the TMS34082A register file. The sel_op field selects the source of the operands: register file or feedback registers. The instruction field designates the operation to be performed.

External instructions and cycle counts are listed in Table 11. Absolute values of operands or results, negated results, and wrapped number inputs are selectable options. Chained operations, using the multiplier and ALU in parallel, and other instructions to control program flow and move data are included.
External instruction timing depends on the pipeline registers setting, controlled by the PIPES2-1 bits in the configuration register. Most FPU processing instructions (with the exception of divide, square root, and double-precision multiply) execute in one cycle per pipeline stage.

## PROGRAMMING INFORMATION

TABLE 11. EXTERNAL INSTRUCTIONS AND TIMING

| TMS34082A ASSEMBLER OPCODE | DESCRIPTION OF ROUTINE | $\begin{gathered} \text { PIPES2-1 } \\ 11 \end{gathered}$ | $\begin{gathered} \hline \text { PIPES2-1 } \\ 10 \\ \hline \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 01 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 00 \end{gathered}$ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| ADD | Add A + B | 1(1) | 2(1) | 2(1) | 3(1) |
| AND | Logical AND A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| ANDNA | Logical AND NOT A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| ANDNB | Logical AND A, NOT B | 1(1) | 2(1) | 2(1) | 3(1) |
| CJMP | Conditional jump | 1(1) | 1(1) | 1(1) | 1(1) |
| CSJR | Conditional jump to subroutine | 1(1) | 1(1) | 1(1) | 1(1) |
| CMP | Compare A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| COMPL | Pass 1s complement of $A$ | 1(1) | 2(1) | 2(1) | 3(1) |
| DIV | Divide A / B SP DP integer | $\begin{gathered} 8(8) \\ 13(13) \\ 16(16) \end{gathered}$ | $\begin{gathered} 8(7) \\ 13(12) \\ 16(15) \end{gathered}$ | $\begin{gathered} 9(7) \\ 15(12) \\ 17(15) \end{gathered}$ | $\begin{gathered} 9(7) \\ 15(12) \\ 17(15) \end{gathered}$ |
| DTOF | Convert from DP to SP | 1(1) | 2(1) | 2(1) | 3(1) |
| DTOI | Convert from DP to integer | $1(1)$ | 2(1) | 2(1) | 3(1) |
| DTOU | Convert from DP to unsigned integer | 1(1) | 2(1) | 2(1) | 3(1) |
| FTOD | Convert from SP to DP | 1(1) | 2(1) | 2(1) | 3(1) |
| FTOI | Convert from SP to integer | 1(1) | 2(1) | 2(1) | 3(1) |
| FTOU | Convert from SP to unsigned integer | 1(1) | 2(1) | 2(1) | 3(1) |
| ITOD | Convert from integer to DP | 1(1) | 2(1) | 2(1) | 3(1) |
| ITOF | Convert from integer to SP | 1(1) | 2(1) | 2(1) | 3(1) |
| LD | Load n words into register <br> SP <br> DP <br> integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| LDLCT | Load loop counter with value | 1(1) | 1(1) | 1(1) | 1(1) |
| LDMCADDR | Load MCADDR with value | 1(1) | 1(1) | $1(1)$ | 1(1) |
| MASK | Set programmable mask | 1(1) | 1(1) | 1 (1) | 1(1) |
| MOVA | Move A (no status flags active) | 1(1) | 2(1) | 2(1) | 3(1) |
| MOVLM | Move $n$ words from LAD bus to MSD bus SP <br> DP <br> integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| MOVML | Move n words from MSD bus to LAD bus SP <br> DP integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| MOVRR | Multiple move, register to register <br> SP <br> DP <br> integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| MULT.ADD | ```Multiply A}\mp@subsup{A}{1}{*}\mp@subsup{\textrm{B}}{1}{},\mathrm{ Add A A2 + B2 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |

DP denotes double-precision, and SP denotes single-precision.

## PROGRAMMING INFORMATION

TABLE 11. EXTERNAL INSTRUCTIONS AND TIMING (Continued)

| TMS34082A ASSEMBLER OPCODE | DESCRIPTION OF ROUTINE | $\begin{gathered} \text { PIPES2-1 } \\ 11 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 10 \end{gathered}$ | $\begin{aligned} & \text { PIPES2-1 } \\ & 01 \end{aligned}$ | $\begin{gathered} \text { PIPES2- } 1 \\ 00 \end{gathered}$ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| MULT.NEG | ```Multiply A1* 的, Subtract 0-A A SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| MULT | Multiply A * B SP DP integer | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| MULT.PASS | ```Multiply A }\mp@subsup{A}{1}{*}\mp@subsup{\textrm{B}}{1}{},\mathrm{ Add A A + 0 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | 3(1) <br> 4(2) <br> 3(1) |
| MULT.SUB | ```Multiply A1 * B1, Subtract A2 - B2 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| MULT.2SUBA | ```Multiply A1 * B1, Subtract 2- A2 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | 3(1) <br> 4(2) <br> 3(1) |
| MULT.SUBRL | ```Multiply A1* B1, Subtract B2- A2 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| NEG | Pass - A (2s Complement) | 1(1) | 2(1) | 2(1) | 3(1) |
| NOR | Logical NOR A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| OR | Logical OR A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| PASS | Pass A | 1(1) | 2(1) | 2(1) | 3(1) |
| PASS | Pass B | 1(1) | 2(1) | 2(1) | 3(1) |
| PASS.ADD | ```Multiply A1 * 1, Add A2 + B2 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| PASS.NEG | ```Multiply A1* 1, Subtract 0-A2 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| PASS.PASS | ```Multiply A1 * 1, Add A 2 + 0 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| PASS.SUB | ```Multiply A1*1, Subtract A2 - B2 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| PASS.2SUBA | ```Multiply A1 * 1, Subtract 2- A2 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | 3(1) <br> 4(2) <br> 3(1) |

DP denotes double-precision, and SP denotes single-precision.

## PROGRAMMING INFORMATION

TABLE 11. EXTERNAL INSTRUCTIONS AND TIMING (Continued)

| TMS34082A <br> ASSEMBLER OPCODE | DESCRIPTION of ROUTINE | CYCLE COUNTS |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | $\begin{gathered} \hline \text { PIPES2-1 } \\ 11 \end{gathered}$ | $\begin{gathered} \hline \text { PIPES2-1 } \\ 10 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 01 \end{gathered}$ | $\begin{gathered} \hline \text { PIPES2-1 } \\ 00 \end{gathered}$ |
| RTS | Return from subroutine | 1(1) | 1(1) | 1(1) | 1(1) |
| SLL | Logical shift left A by B bits | 1(1) | 2(1) | 2(1) | 3(1) |
| SQRT | Square root of $A$ SP DP integer | $\begin{aligned} & 11(11) \\ & 16(16) \\ & 20(20) \end{aligned}$ | $\begin{aligned} & 11(10) \\ & 16(15) \\ & 20(19) \end{aligned}$ | $\begin{aligned} & 12(10) \\ & 17(15) \\ & 21(19 \end{aligned}$ | $\begin{aligned} & 12(10) \\ & 17(15) \\ & 21(19) \end{aligned}$ |
| PASS.SUBRL | ```Multiply A1* 1, Subtract B2 - A2 SP DP integer``` | $\begin{array}{r} 1(1) \\ 2(2) \\ 1(1) \\ \hline \end{array}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| SRA | Arithmetic shift right $A$ by $B$ bits | 1(1) | 2(1) | 2(1) | 3(1) |
| SRL | Logical shift right $A$ by $B$ bits | 1(1) | 2(1) | 2(1) | 3(1) |
| ST | Store n words from register SP DP integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| SUB | Subtract A - B | 1(1) | 2(1) | 2(1) | 3(1) |
| SUBRL | Subtract B-A | 1(1) | 2(1) | 2(1) | 3(1) |
| UTOD | Convert from unsigned integer to DP | 1(1) | 2(1) | 2(1) | 3(1) |
| UTOF | Convert from unsigned integer to SP | 1(1) | 2(1) | 2(1) | 3(1) |
| UWRAPI | Unwrap inexact operand | 1(1) | 2(1) | 2(1) | 3(1) |
| UWRAPR | Unwrap rounded operand | 1(1) | 2(1) | 2(1) | 3(1) |
| UWRAPX | Unwrap exact operand | 1(1) | 2(1) | 2(1) | 3(1) |
| WRAP | Wrap denormalized operand | 1(1) | 2(1) | 2(1) | 3(1) |
| XOR | Logical exclusive OR A, B | 1(1) | 2(1) | 2(1) | 3(1) |

DP denotes double-precision, and SP denotes single-precision.

## MECHANICAL DATA

## GC pin-grid-array ceramic package

This is a hermetically sealed package.


NOTES: A. Pins are located within $0,13(0.005)$ radius of true position relative to each other at maximum meterial condition and within $0,457(0.018)$ radius of the center of the ceramic.
B. Dimensions do not include solder finish.

## Appendix C

## SMJ34082A Data Sheet

The pinout, electrical specifications timing diagrams, and mechanical specifications are contained within the SMJ34082A Data Sheet and appear in this appendix.

The SMJ34082A is fully characterized over military temperature range.

- Military Temperature Range $\left(-55^{\circ} \mathrm{C}\right.$ to $125^{\circ} \mathrm{C}$ )
- Class B, High-Reliability Processing
- High-Performance Floating-Point RISC Processor Optimized for Graphics
- Two Operating Modes
- Floating-Point Coprocessor for SMJ34020 Graphics System Processor
- Independent Floating-Point Processor
- Direct Connection to SMJ34020 Coprocessor Interface
- Direct Extension to the SMJ34020 Instruction Set
- Multiple SMJ34082A Capability
- Fast Pipelined Instruction Cycle Time
- SMJ34082A-30 . . . 66-ns Coprocessor Mode . . . 65-ns Host-Independent Mode
- SMJ34082A-28 . . . 70-ns Coprocessor Mode . . . 70-ns Host-Independent Mode
- Sustained Data Transfer Rates of 120 Mbytes/s (SMJ34082A-30)
- Sequencer Executes Internal or User-Programmed Instructions
- 22 64-Bit Data Registers
- Comprehensive Floating-Point and Integer Instruction Set
- Internal Programs for Vector, Matrix, and 3-D Graphics Operations
- Full IEEE Standard 754-1985 Compatibility
- Addition, Subtration, Multiplication, and Comparison
- Division and Square Root
- Selectable Data Formats
- 32-Bit Integer
- 32-Bit Single-Precision Floating-Point
- 64-Bit Double-Precision Floating-Point
- External Memory Addressing Capability
- Program Storage (up to 64K Words)
- Data Storage (up to 64K Words)
- 0.8- $\mu \mathrm{m} \mathrm{EPIC}{ }^{\text {TM }}$ CMOS Technology
- High-Performance
- Low Power (< 2 W)


## description

The SMJ34082A is a high-speed graphics floating-point processor implemented in Texas Instruments advanced $0.8-\mu \mathrm{m}$ CMOS technology. The SMJ34082A combines a 16 -bit sequencer and a 3 -operand (source A, source B, and destination) 64-bit Floating-Point Unit (FPU) with 22 64-bit data registers on a single chip. The data registers are organized into two files of ten registers each, with two registers for internal feedback. In addition, it provides an instruction register to control FPU execution, a status register to retain the most recent FPU status outputs, eight control registers, and a two-deep stack (see functional block diagram).
The SMJ34082A is fully compatible with IEEE Standard 754-1985 for binary floating-point addition, subtraction, multiplication, division, square root, and comparison. Floating-point operands can be either in single- or double-precision IEEE format.

In addition to floating-point operations, the SMJ34082A performs 32-bit integer arithmetic, logical comparisons, and shifts. Integer operations may be performed on 32-bit 2s complement or unsigned operands. Integer results are 32-bits long (even for $32 \times 32$ integer multiplication). Absolute value conversions, floating-point to integer conversions, and integer to floating-point conversions are available.

The ALU and the multiplier are closely coupled and can be operated in parallel to perform sums of products or products of sums. During multiply/accumulate operations, both the ALU and the multiplier are active and the registers in the FPU core can be used to feedback products and accumulate sums without tying up locations in register files $A$ and $B$.
When used with the SMJ34020, the SMJ34082A operates in the coprocessor mode. The SMJ34020 can control multiple SMJ34082A coprocessors. When used as a stand-alone or with processors other than the SMJ34020, the SMJ34082A operates in the host-independent mode. The SMJ34082A is fully programmable by the user

EPIC is a trademark of Texas Instruments Incorporated.
and can interface to other processors or floating-point subsystems through its two 32 -bit bidirectional buses. In the coprocessor mode, the TMS340 family tools may be used to develop code for the SMJ34082A. The TMS34082A Software Tool Kit is used to develop code for host-independent mode applications or for external routines in the coprocessor mode.

## pin descriptions

Pin descriptions and grid assignments for the SMJ34082A are given on the following pages. The pin at location D4 has been added for indexing purposes.


SGUS012A - D3592, SEPTEMBER 1990 - REVISED MAY 1991
Pin Grid Assignments

| PIN |  | PIN |  | PIN |  | PIN |  | PIN |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| NO. | NAME | NO. | NAME | NO. | NAME | NO. | NAME | NO. | NAME |
| A1 | NC | B15 | LAD27 | F1 | MSD10 | K15 | RDY | P2 | NC |
| A2 | LAD1 | C1 | MSD4 | F2 | MSD9 | L1 | MSD18 | P3 | MSD29 |
| A3 | LAD3 | C2 | MSD3 | F3 | $V_{C C}$ | L2 | MSD21 | P4 | MSD31 |
| A4 | LAD5 | C3 | MSDO | F13 | CORDY | L3 | MSD23 | P5 | MSA1 |
| A5 | LAD8 | C4 | $V_{S S}$ | F14 | $\overline{\text { ALTCH }}$ | L13 | $\mathrm{V}_{S S}$ | P6 | MSA3 |
| A6 | LAD9 | C5 | $V_{C C}$ | F15 | $\overline{\text { CAS }}$ | L14 | CIDO | P7 | MSA6 |
| A7 | LAD11 | C6 | LAD6 | G1 | MSD13 | L15 | CID2 | P8 | MSA8 |
| A8 | LAD12 | C7 | $V_{S S}$ | G2 | MSD12 | M1 | MSD20 | P9 | MSA10 |
| A9 | LAD13 | C8 | $V_{C C}$ | G3 | MSD11 | M2 | MSD24 | P10 | MSA13 |
| A10 | LAD15 | C9 | $\mathrm{V}_{\text {SS }}$ | G13 | $\overline{W E}$ | M3 | $v_{S S}$ | P11 | MWR |
| A11 | LAD17 | C10 | $V_{C C}$ | G14 | EC1 | M13 | $V_{C C}$ | P12 | $\overline{\text { MOE }}$ |
| A12 | LAD19 | C11 | LAD21 | G15 | ECO | M14 | LCLK1 | P13 | INTG |
| A13 | LAD22 | C12 | $v_{S S}$ | H1 | MSD14 | M15 | LCLK2 | P14 | BUSFLT |
| A14 | LAD24 | C13 | LAD25 | H2 | TDO | N1 | MSD22 | P15 | $\overline{\text { RAS }}$ |
| A15 | NC | C14 | LAD26 | H3 | $V_{S S}$ | N2 | MSD26 | R1 | NC |
| B1 | MSD1 | C15 | LAD29 | H13 | $\mathrm{V}_{S S}$ | N3 | $V_{C C}$ | R2 | MSD27 |
| B2 | NC | D1 | MSD6 | H14 | $\overline{\text { LOE }}$ | N4 | MSD28 | R3 | MSD30 |
| B3 | LADO | D2 | MSD5 | H15 | TDI | N5 | $V_{S S}$ | R4 | MSAO |
| B4 | LAD2 | D3 | MSD2 | J1 | MSD15 | N6 | $V_{C C}$ | R5 | MSA2 |
| B5 | LAD4 | D4 | NC | J2 | MSD16 | N7 | MSA5 | R6 | MSA4 |
| B6 | LAD7 | D13 | $\mathrm{V}_{\text {cc }}$ | J3 | $\mathrm{V}_{\mathrm{CC}}$ | N8 | $V_{S S}$ | R7 | MSA7 |
| B7 | LAD10 | D14 | LAD28 | J13 | CC | N9 | $V_{C C}$ | R8 | TCK |
| B8 | TMS | D15 | LAD31 | J14 | MSTR | N10 | MSA14 | R9 | MSA9 |
| B9 | LAD14 | E1 | MSD8 | J15 | CLK | N11 | VSS | R10 | MSA11 |
| B10 | LAD16 | E2 | MSD7 | K1 | MSD17 | N12 | MAE | R11 | MSA12 |
| B11 | LAD18 | E3 | $v_{S S}$ | K2 | MSD19 | N13 | LRDY | R12 | MSA15 |
| B12 | LAD20 | E13 | $V_{S S}$ | K3 | $V_{S S}$ | N14 | SF | R13 | DS/CS |
| B13 | LAD23 | E14 | LAD30 | K13 | CID 1 | N15 | RESET | R14 | MCE |
| B14 | NC | E15 | $\overline{\text { COINT }}$ | K14 | INTR | P1 | MSD25 | R15 | NC |

logic symbol ${ }^{\dagger}$

$\dagger$ This symbol is in accordance with ANSI/IEEE Std 91-1984.
functional block diagram


Terminal Functions

| PIN |  | 1/0t | DESCRIPTION |
| :---: | :---: | :---: | :---: |
| NAME | NO. |  |  |
| $\overline{\text { ALTCH }}$ | F14 | $\begin{gathered} 1 \\ {[0]} \end{gathered}$ | Address Latch, active low. In the coprocessor mode, falling edge of ALTCH latches instruction and status present on the LAD bidirectional bus (LAD31-0). In the host-independent mode, $\overline{\mathrm{ALTCH}}$ is address output strobe for memory accesses on LAD31-0. |
| BUSFLT | P14 | 1 | Bus Fault. Inthe coprocessormode, BUSFLThigh indicates adatafaulton the LAD bus (LAD31-0) during current bus cycle, which in turn causes SMJ34082A not to capture current data on LAD bus. Tied low if not used or in the host-independent mode. |
| $\overline{\text { CAS }}$ | F15 | $\begin{gathered} 1 \\ {[0]} \end{gathered}$ | Column Address Strobe, active low. Inthe coprocessor mode, causes SMJ34082A to latch LAD bus data when CAS has a low-to-high transition if LRDY was high and BUSFLT was low at the previous LCLK2 rising edge. In the host-independent mode, this signal is the read strobe output. |
| CC | $J 13$ | 1 | Condition Code Input. In both modes, may be used as an external conditional input for branch conditions. |
| $\begin{aligned} & \text { CID0 } \\ & \text { CID1 } \\ & \text { CID2 } \end{aligned}$ | $\begin{aligned} & \mathrm{L} 14 \\ & \mathrm{~K} 13 \\ & \mathrm{~L} 15 \end{aligned}$ | 1 | Coprocessor ID. In the coprocessor mode, used to set a coprocessor ID so that a SMJ34020 Graphics System Processor controlling multiple SMJ34082A coprocessors can designate which coprocessor is being selected by the current instruction. Tied low in the host-independent mode. |
| CLK | J15 | 1 | System Clock. In the coprocessor mode, tied low. In the host-independent mode, input is the system clock. |
| COINT | E15 | 0 | Coprocessor Interrupt Request, active low. In the coprocessor mode, signals an exception not masked out in the configuration register. Remains low until the status register is read. In the host-independent mode, user programmable I/O when LADCFG is low. When LADCFG is high, designates bus cycle boundaries on LAD31-0. |
| CORDY | F13 | 0 | Coprocessor Ready. In the coprocessor mode, if the SMJ34020 sends an instruction before the SMJ34082A has completed a previous instruction, this signal goes low to indicate that the SMJ34020 should wait. In the host-independent mode, user programmable. |
| DS/CS | R13 | 0 | Data Space/Code Space. In both modes, when MEMCFG is low and DS/CS is low, selects program memory on MSD port. When MEMCFG is low and DS/CS is high, selects data memory on MSD port. When MEMCFG is high, DS/CS is memory chip select, active low. |
| $\begin{aligned} & \text { EC0 } \\ & \text { EC1 } \end{aligned}$ | $\begin{aligned} & \text { G15 } \\ & \text { G14 } \end{aligned}$ | 1 | Emulator Mode Control and Test. In both modes, tied high for normal operation. |
| INTG | P13 | 0 | Interrupt Grant Output. In the coprocessor mode, INTG is low. In the host-independent mode, this signal is set high to acknowledge an interrupt request input. |
| INTR | K14 | 1 | Interrupt Request Input, active low. In the coprocessor mode, $\overline{\text { NTR }}$ is tied high. Inthe host-independent mode, causes call to subroutine address in interrupt vector register. |

T The [ ]'s denote the type of buffer ut
ilized in the host-independent mode. If no [ ]'s appear, the buffer type is identical for both modes of operation.

## Terminal Functions (Continued)

|  |  |  |  |
| :---: | :---: | :---: | :---: |
| NAME | NO. |  |  |
| LADO | B3 |  |  |
| LAD1 | A2 |  |  |
| LAD2 | B4 |  |  |
| LAD3 | A3 |  |  |
| LAD4 | B5 |  |  |
| LAD5 | A4 |  |  |
| LAD6 | C6 |  |  |
| LAD7 | B6 |  |  |
| LAD8 | A5 |  |  |
| LAD9 | A6 |  |  |
| LAD10 | B7 |  |  |
| LAD11 | A7 |  |  |
| LAD12 | A8 |  |  |
| LAD13 | A9 |  |  |
| LAD14 | B9 |  | Local Address and Data Bus. In the coprocessor mode, used by SMJ34020 to input instructions and |
| LAD15 | A10 | 1/0 | data operands to SMJ34082, and used by SMJ34082A to output results. In the host-independent mode, |
| LAD16 | B10 | $1 /$ | used by the SMJ34082A for address output and data I/O. |
| LAD17 | A11 |  |  |
| LAD18 | B11 |  |  |
| LAD19 | A12 |  |  |
| LAD20 | B12 |  |  |
| LAD21 | C11 |  |  |
| LAD22 | A13 |  |  |
| LAD23 | B13 |  |  |
| LAD24 | A14 |  |  |
| LAD25 | C13 |  |  |
| LAD26 | C14 |  |  |
| LAD27 | B15 |  |  |
| LAD28 | D14 |  |  |
| LAD29 | C15 |  |  |
| LAD30 | E14 |  |  |
| LAD31 | D15 |  |  |
| LCLK1 LCLK2 | $\begin{aligned} & \text { M14 } \\ & \text { M15 } \end{aligned}$ | 1 | LocalClocks 1 and2. In the coprocessor mode, two local clocks generated by the SMJ34020,90 degrees out of phase, to provide timing inputs to SMJ34082A. In the host-independent mode, tied low. |
| $\overline{L O E}$ | H14 | 1 | Local Bus Output Enable, active low. In both modes, enables the local bus (LAD31-0) to be driven at the proper times when low. In addition during the host-independent mode when LADCFG is low, does not affect $\overline{A L T C H}, \overline{\mathrm{CAS}}, \overline{W E}, \mathrm{CORDY}$, or $\overline{\mathrm{COINT}}$. When LADCFG is high, $\overline{\mathrm{ALTCH}}, \overline{\mathrm{COINT}}$, and CORDY are not disabled by $\overline{L O E}$ high; $\overline{C A S}$ and $\overline{W E}$ are disabled. |
| LRDY | N13 | 1 | Local Bus Data Ready. In the coprocessor mode, when LRDY is high, indicates that data is available on LAD bus. When LRDY is low, indicates that the SMJ34082A should not load data from LAD31-0 and may also be used in conjunction with BUSFLT. In the host-independent mode, when LRDY is low, the device is stalled until LRDY is set high again and tied high if not used. |
| $\overline{\mathrm{MAE}}$ | N12 | 1 | Memory Address and Data Output Enable, active low. In both modes, with MAE low, the SMJ34082A can output an address on MSA15-0 and data on MSD31-0. MAE high does not disable DS/CS, $\overline{M C E}, \overline{M W R}$, or MOE. |
| $\overline{\mathrm{MCE}}$ | R14 | 0 | Memory Chip Enable. In both modes, when MEMCFG low, active (low) indicates access to external memory on MSD31-0. When MEMCFG is high, MCE low is external code memory chip select. |
| MOE | P12 | 0 | Memory Output Enable, active low. In both modes when low, enables output from external memory on to MSD port. |

Terminal Functions (Continued)

|  |  |  |  |
| :---: | :---: | :---: | :---: |
| NAME | NO. | 1/O | DESCRIPTION |
| MSAO | R4 | 0 | Memory Address output. In both modes, addresses upto 64 K words of external programmemory and/or up to 64 K words of data memory on the MSD port, depending on setting of DS/CS select. |
| MSA1 | P5 |  |  |
| MSA2 | R5 |  |  |
| MSA3 | P6 |  |  |
| MSA4 | R6 |  |  |
| MSA5 | N7 |  |  |
| MSA6 | P7 |  |  |
| MSA7 | R7 |  |  |
| MSA8 | P8 |  |  |
| MSA9 | R9 |  |  |
| MSA10 | P9 |  |  |
| MSA11 | R10 |  |  |
| MSA12 | R11 |  |  |
| MSA13 | P10 |  |  |
| MSA14 | N10 |  |  |
| MSA15 | R12 |  |  |
| MSD0 | C3 |  |  |
| MSD1 | B1 |  |  |
| MSD2 | D3 |  |  |
| MSD3 | C2 |  |  |
| MSD4 | C1 |  |  |
| MSD5 | D2 |  |  |
| MSD6 | D1 |  |  |
| MSD7 | E2 |  |  |
| MSD8 | E1 |  |  |
| MSD9 | F2 |  |  |
| MSD10 | F1 |  |  |
| MSD11 | G3 |  |  |
| MSD12 | G2 |  |  |
| MSD13 | G1 |  |  |
| MSD14 | H1 |  |  |
| MSD15 | J1 | I/O | External Memory Data. In both modes, I/Os to external memory. Used to read from or write to external |
| MSD16 | J2 | 1/O | data or program memory on the MSD port. |
| MSD17 | K1 |  |  |
| MSD18 | L1 |  |  |
| MSD19 | K2 |  |  |
| MSD20 | M1 |  |  |
| MSD21 | L2 |  |  |
| MSD22 | N1 |  |  |
| MSD23 | L3 |  |  |
| MSD24 | M2 |  |  |
| MSD25 | P1 |  |  |
| MSD26 | N2 |  |  |
| MSD27 | R2 |  |  |
| MSD28 | N4 |  |  |
| MSD29 | P3 |  |  |
| MSD30 | R3 |  |  |
| MSD31 | P4 |  |  |
| MSTR | J14 | 1 | Host-Independent/Coprocessor Mode Select. In the coprocessor mode, MSTR must be tied low to operate properly. In the host-independent mode, MSTR must be tied high to operate properly. |
| MWR | P11 | 0 | Memory Write Enable. In both modes, when low, data on MSD31-0 can be written to external program or data memory. |

Texas

Terminal Functions (Continued)

| PIN |  | v/0t | DESCRIPTION |
| :---: | :---: | :---: | :---: |
| NAME | No. |  |  |
| NC | A1 <br> A15 <br> B2 <br> B14 <br> D4 <br> P2 <br> R1 <br> R15 |  | No Internal Connection. These pins should be left floating. |
| $\overline{\text { RAS }}$ | P15 | 1 | Row Address Strobe, active low. In the coprocessor mode, $\overline{\text { RAS }}$ is high during all of coprocessor instruction cycle. In the host-independent mode, it is not used. |
| RDY | K15 | 1 | Ready. In both modes, when RDY is low, it causes a nondestructive stall of sequencer and floating-point operations. All internal registers and status in the FPU core are preserved. Also, no output lines will change state. |
| RESET | N15 | 1 | Reset, active low. In both modes, resets sequencer output and clears pipeline registers, internal states, status, and exception disable registers in FPU core. Other registers are unaffected. |
| SF | N14 | 1 | Special Function Input. In the coprocessor mode when SF is high, indicates the LAD bus input is an instruction or data from SMJ34020 registers. When SF is low, indicates the LAD input is a data operand from memory. In the host-independent mode, not used. |
| TCK | R8 | 1 | Test Clock for JTAG four-wire boundary scan. in both modes, TCK is low for normal operation. |
| TDI | H15 | 1 | Test Data Input for JTAG four-wire boundary scan. In both modes, TDI may be left floating. |
| TDO | H2 | 0 | Test Data Output for JTAG four-wire boundary scan |
| TMS | B8 | 1 | Test Mode Select for JTAG four-wire boundary scan. In both modes, SMJ may be leff floating. |
| $V_{C C}$ | C5 C8 C10 D13 F3 J3 M13 N3 NB N9 | 1 | 5-V Power Supply. All pins must be connected and used. |
| $\mathrm{V}_{S S}$ | C4 <br> C7 <br> C9 <br> C12 <br> E3 <br> E13 <br> H3 <br> H13 <br> K3 <br> L13 <br> M3 <br> N5 <br> N8 <br> N11 | 1 | Ground Pins. All pions must be connected and used. |
| $\overline{W E}$ | G13 | $\begin{gathered} \hline 1 \\ {[0]} \end{gathered}$ | Write Enable, active low. In the coprocessor mode, the write strobe from the SMJ34020 to enable a write to or from the SMJ34082A LAD bus. In the host-independent mode, the SMJ34082A write strobe output. |

[^10]D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUS012A

## data flow

The SMJ34082A has two bidirectional 32-bit buses, LAD31-0 and MSD31-0. Each bus can be used to pass instructions and data operands to the FPU core and to output results. A separate 16-bit bus, MSA15-0, provides memory addressing capability on the MSD bus.

When the SMJ34082A is used as a coprocessor for the SMJ34020 Graphics System Processor (GSP), data for the SMJ34082A can be transferred through the 32-bit bidirectional data bus (LAD31-0) and may be passed to any internal registers or to external memory on the memory expansion interface (MSD31-0). When the SMJ34082A is used as a standalone FPU, it can use both the LAD bus (LAD31-0) and the MSD bus (MSD31-0) to interface with external data memory or system buses.

In the host-independent mode, the SMJ34082A can be operated with the LAD bus as its single data bus and the MSD bus as the instruction source, or with data storage on either port and the program memory on the MSD bus.

The data space/code space (DS/ $\overline{\mathrm{CS}}$ ) output can be used to control access either to data memory or program memory on the MSD port. Up to 64 K words of code space and 64 K words of data space are directly supported. In the coprocessor mode, both instructions and data are transferred on the LAD bus with the option of accessing external user-generated programs on the MSD port.

One 32-bit operand can be input to the data registers each clock cycle. A 64-bit double-precision floating-point operand is input in two cycles. Transfers to or from the data registers can normally be programmed as block moves, loading one or more sets of operands with a single move instruction to minimize I/O overhead. Several modes for moving operands and instructions are available. Block transfers up to 512 words between the LAD and MSD buses can be programmed in either direction.

To permit direct input to or output from the LAD bus in the host-independent mode, other options for controlling the LAD bus have been implemented. When two 32-bit operands are being selected for input to the FPU core, one operand may be selected from LAD. On output from the FPU, a result may simultaneously be written to a register and to the LAD bus.

During initialization in the host-independent mode, a bootstrap loader can bring 6532 -bit words from the LAD bus and write them out to external program memory on the MSD bus, after which the device begins executing from the first memory location (zero). The first word is loaded into the configuration register. This option facilitates the initial loading of program memory on the MSD port upon power-up.

## architecture

Because the sequencer, control and data registers, and FPU core are closely coupled, the SMJ34082A can execute a variety of complex floating-point or integer calculations rapidly, with a minimum of external data transfers. The internal architecture of the FPU core supports concurrent operation of the multiplier and the ALU, providing several options for storing or feeding back intermediate results. Also, several special registers are available to support specific calculations for graphics algorithms. Each of the main architectural elements of the SMJ34082A is discussed below.
The control functions of the SMJ34082A are provided by sequence control logic, register control logic, and bus interface control logic, together with user-programmed configuration settings stored in the configuration register. The on-board sequencer selects the next program execution address, either from internal code or from external program memory. Next-address sources include the program counter, stack, interrupt vector register, interrupt return register, or address register (for indirect jumps).
COUNTX, COUNTY, and MIN-MAX/LOOPCT registers are used for temporary storage by internal graphics routines. They may also serve as temporary storage for the user.

A separate FPU status register is provided, which can be used by test-and-branch instructions to control program execution. Because of the large number of status outputs, branches on status can be easily programmed. The status register contents are also important when dealing with status exceptions including such conditions as overflow, underflow, invalid operations (divide by zero), or illegal data formats such as infinity, Not a Number ( NaN ), or denormalized operands.

Register control logic permits all data and control registers to be accessed in accordance with applicable architectural restrictions. Register files $A$ and $B$ can be written to or read from the external buses, as can the control registers. Internal registers C and CT are embedded in the FPU core and can only be accessed by the FPU internal buses. The C and CT registers cannot be used as sources or destinations for MOVE instructions, and several registers (listed in Table 1) are not available as sources for FPU operations.

Table 1. linternal Registers

| REGISTER ADDRESS | REGISTER NAME | RESTRICTIONS ON USE |
| :---: | :---: | :---: |
| 00000 | RAO |  |
| 00001 | RA1 |  |
| 00010 | RA2 |  |
| 00011 | RA3 |  |
| 00100 | RA4 |  |
| 00101 | RA5 |  |
| 00110 | RA6 |  |
| 00111 | RA7 |  |
| 01000 | RA8 |  |
| 01001 | RA9 |  |
| 01010 | $\mathrm{c}^{\dagger}$ | Not a source or destination for moves |
| 01011 | CT ${ }^{\top}$ | Not a source or destination for moves |
| 01100 | STATUS | Not a source for FPU instructions |
| 01101 | CONFIG | Not a source for FPU instructions |
| 01110 | COUNTX | Not a source for FPU instructions |
| 01111 | COUNTY | Not a source for FPU instructions |
| 10000 | RBO |  |
| 10001 | RB1 |  |
| 10010 | RB2 |  |
| 10011 | RB3 |  |
| 10100 | RB4 |  |
| 10101 | RB5 |  |
| 10110 | RB6 |  |
| 10111 | RB7 |  |
| 11000 | RB8 |  |
| 11001 | RB9 |  |
| 11010 | VECTOR | Not a source for FPU instructions |
| 11011 | MCADDR | Not a source for FPU instructions |
| 11100 | SUBADDO | Not a source for FPU instructions |
| 11101 | SUBADD1 | Not a source for FPU instructions |
| 11110 | IRAREG | Not a source for FPU instructions |
| 11111 | MIN-MAX/LOOPCT | Not a source for FPU instructions |

${ }^{T} C$ and CT registers cannot both be used for FPU operand sources in the same instruction.

## GRAPHICS FLOATING-POINT PROCESSOR

D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUS012A

## register files A and B, feedback registers C and CT

SMJ34082A contains two register files, each with ten 64-bit registers and two 64-bit feedback registers. Most instructions will operate on one value from each of the RA and RB register files and retum the result to either the RA or RB files or one of the feedback registers.
When the ONEFILE control bit is high in the configuration register, data written to a register in file RA is simultaneously written to the corresponding location in file RB. In this mode, the two register files act as a ten-word, two-read/one-write register file.


Figure 1. Data Registers
Two 64-bit feedback registers, C andCT, are embedded in the FPU core. FPU instructions may use the feedback registers as one of the operands, but the registers cannot be accessed for external moves. The C and CT registers can be used as either the $A$ or $B$ operand, but both cannot be used as operands during the same instruction. However, C (or CT) may be used for more than one operand in the same instruction. For example, $C+C T$ is not a valid instruction, but $C+C$ is.

The CT feedback register is used in integer divide operations as a temporary holding register. Any data stored in CT will be lost during an integer divide.

## Internal control/status register definitions

## configuration register definition

The configuration register (CONFIG) is a special 32-bit register that the user loads to configure the SMJ34082A for exception handling, IEEE mode (vs. fast mode), rounding modes, and data-fetch operations. The configuration register is initialized to 'FFE00420' hex.

Table 2. Configuration Register Definition

| BIT NO. | NAME | DESCRIPTION |
| :---: | :---: | :---: |
| 31 | MIVAL | Multiplier invalid operation (l) exception mask. Initialized to 1 (enabled). |
| 30 | MOVER | Multiplier overflow (V) exception mask. Initialized to 1 (enabled). |
| 29 | MUNDER | Multiplier underflow (U) exception mask. Initialized to 1 (enabled). |
| 28 | MINEX | Multiplier inexact ( $X$ ) exception mask. Initialized to 1 (enabled). |
| 27 | MDIVO | Divide by zero (DIVO) exception mask. Initialized to 1 (enabled). |
| 26 | MDENORM | Multiplier denormal (DENORM) exception mask. Initialized to 1 (enabled). |
| 25 | AIVAL | ALU invalid operation (I) exception mask. initialized to 1 (enabled). |
| 24 | AOVER | ALU overflow (V) exception mask. Initialized to 1 (enabled). |
| 23 | AUNDER | ALU underflow ( $U$ ) exception mask. Initialized to 1 (enabled). |
| 22 | AINEX | ALU inexact ( $X$ ) exception mask. Initialized to 1 (enabled). |
| 21 | ADENORM | ALU denormal (DENORM) exception mask. Initialized to 1 (enabled). |
| 11-20 | N/A | Reserved, set to all Os. |
| 10 | REVISION | Revision number, read only. Set to 1. |
| 9 | LADCFG | When low, $\overline{C A S}, \overline{W E}$, CORDY, $\overline{C O I N T}$, and $\overline{A L T C H}$ are active signals not affected by LOE. When high, $\overline{\text { LOE }}$ high places $\overline{\mathrm{CAS}}$ and $\overline{\mathrm{WE}}$ in high impedance, as well as the LAD bus. COINT, which defines the LAD cycle boundaries, is controlled by bit 1 of the LAD move instruction instead of the set mask instruction. COINT will remain high unless a LAD move instruction (with bit 1 high) is in progress. The setting of this bit has no effect in the coprocessor mode. Initialized to 0 . |
| 8 | MEMCFG | When high, $\overline{M C E}$ becomes code space chip enable and DS/CS becomes data space chip enable (eliminates need for external inverter). When low, $\overline{M C E}$ is chip select for external code and data space. DS/CS functions as an address bit which selects code space (when low) or data space (when high). Initialized to 0. |
| 7 | N/A | Reserved for later use. Initialized to 0 . Must be loaded with 0 . |
| 6 | ONEFILE | When high, causes simultaneous write to both register files (for example, to both RAO and RBO at once). The register files act as a single two-read, one-write register file. Initialized to 0. |
| 5 | PIPES2 | When high, makes FPU output registers transparent. When low, registers are enabled. Initialized to 1. |
| 4 | PIPES1 | When high, makes FPU internal pipeline registers transparent. When low, registers are enabled. Initialized to 0. |
| 3 | FAST | When high,fast mode is selected (all denormalized inputs and outputs are 0 ). When low, IEEE mode is selected. Initialized to 0. |
| 2 | LOAD | Load order. $0=M \mathrm{MSH}$, then LSH; $1=\mathrm{LSH}$, then MSH. Initialized to 0 . |
| 1 | RND1 | Rounding mode select 1. Initialized to 0. |
| 0 | RNDO | Rounding mode select 0 . Initialized to 0 . |

LSH denotes least-significant half of a 64 -bit word, MSH denotes most-significant half of a 64 -bit word.
The mask bits serve as exception detect enables for the exception masks listed above. Setting the bit high (logic ' 1 ') enables the detection of the specific exception. When an enabled exception occurs, the ED bit in the status register will be set high and can be used to generate interrupts. The fast bit allows the SMJ34082A to control the handling of denormalized numbers. When the fast bit is set high, all denormalized numbers input to the device are flushed to zero, and all denormalized results are also flushed to zero (this is also called 'sudden underflow'). When the fast bit is low, IEEE mode is selected. Denormalized numbers may be generated by (or input to) the ALU. Denormalized numbers must first be wrapped before being used as operands for multiply or divide instructions.
The LOAD bit defines the expected order ofdouble-precision operands. At reset, this bit will default to 0 indicating that the most significant 32 bits are transferred first. If the bit is set to a 1 , then the expected order of 64 -bit data transfers starts with the least significant 32 bits.

The RND0 and RND1 bits select the IEEE rounding mode, as shown in Table 3.

Table 3. Rounding Mode

| RND1 - RNDO | ROUNDING MODES |
| :---: | :--- |
| 00 | Round towards nearest |
| 01 | Round toward zero (truncated) |
| 10 | Round towards infinity (round up) |
| 11 | Round towards negative infinity (round down) |

## status register definition

The floating-point status register (STATUS) is a 32-bit register used for reporting the exceptions that occur during SMJ34082A operations and status codes set by the results of implicit and explicit compare operations. The status register is cleared upon reset, except for the INTENED flag, which is set to 1 in the coprocessor mode.

Table 4. Status Register Definition

| BIT NO. | NAME | DESCRPTION |
| :---: | :---: | :---: |
| 31 | N | Sign bit ( $A<B$ flag for compare) |
| 30 | GT | $A>B$ (valid on compare) |
| 29 | Z | Zero flag ( $A=B$ for compare) |
| 28 | V | IEEE overflow flag. The result is greater than the largest allowable value for the specified format. |
| 27 | 1 | IEEE invalid operation flag. A NaN has been input to the multiplier or the ALU, or an invalid operation [(0*1) or $(\infty-\infty)$ or $(-\infty+\infty)$ ] has been requested. This signal also goes high if arroperation involves the square root of a negative number. When IVAL hoes high, the STX pins indicate which port had the NaN. |
| 26 | U | IEEE underflow flag. The result is inexact and less than the minimum allowable value for the specified format. In fast mode, this condition causes the result to go to zero. |
| 25 | $X$ | IEEE inexact flag. The result of an operation is inexact. |
| 24 | DIV0 | Divide by zero. An invalid operation involving a zero divisor has been detected by the multiplier. |
| 23 | RND | The mantissa of a number has been increased in magnitude by rounding. If the number generated was wrapped, then the 'unwrap rounded' instruction must be used to properly unwrap the wrapped number. |
| 22 | DENIN | Input to the multiplier is a denormalized number. When DENIN goes high, the STX pins indicate which port has the denormal input. |
| 21 | DENORM | The multiplier output is wrappednumber or the ALUoutput is a denormalized number. In fast mode, this condition causes the result to go to zero. It also indicates an invalid integer operation with a negative unsigned integer result. |
| 20 | STX1 | A NaN or a denormalized number has been input on the A port. |
| 18 | STXO | A NaN or a denormalized number has been input on the B port. |
| 18 | ED | Exception detect status signal representing logical OR of all enabled exceptions in the configuration register. |
| 17 | UNORD | The two inputs of a comparison operation are unordered, i.e.; one or both of the inputs is an NaN. |
| 16 | INTFLG | Software interrupt flag. Set by external code to signal a software interrupt. |
| 15 | INTENHW | Hardware interrupt (INTR) enable, active high (initialized to zero) |
| 14 | NXOROV | N (negative) XOR V (overflow) |
| 13 | VANDZB | $V$ (overflow) AND $\bar{Z}$ (NOT zero) |
| 12 | INTENED | ED interrrupt enable, active high (initialized to zero in the host-independent mode, one in the coprocessor mode) |
| 11 | INTENSW | Software interrupt (INTFLG) enable, active high (initialized to zero) |
| 10 | ZGT | $\mathrm{Zn}>\mathrm{Zmax}$ (valid for 2-D MIN-MAX instruction) |
| 9 | ZLT | $\mathrm{Zn}<\mathrm{Zmin}$ (valid for 2-D MIN-MAX instruction) |
| 8 | YGT | Yn $>$ Ymax (valid for 1-D or 2-D MIN-MAX instruction) |
| 7 | YLT | $\mathrm{Y}_{\mathrm{n}}<\mathrm{Y}_{\mathrm{min}}$ (valid for 1-D or 2-D MIN-MAX instruction) |
| 6 | XGT | $\mathrm{X} n>\times \mathrm{max}$ (valid for 1-D or 2-D MIN-MAX instruction) |
| 5 | XLT | Xn < X min (valid for 1-D or 2-D MIN-MAX instruction) |
| 4 | HINT | Hardware interrupt flag |
| 3-0 | N/A | Reserved |

indirect address register (MCADDR) definition
The indirect address register (MCADDR) can be set to point to a memory location for indirect move or jump operations through the MSD port. MCADDR is cleared upon reset.


Figure 2. Indirect Address Definition
The function of bit 16 varies, depending on whether the instruction is a MOVE or JUMP. During a MOVE instruction, bit 16 selects data space when set high, or code space when low. During a JUMP instruction, bit 16 selects an internal instruction when set high, or an external instruction when low.

## stack registers (SUBADD1-SUBADD0) definition

The stack contains two subroutine return address registers, SUBADD0 and SUBADD1, which serves as a two-deep LIFO (last-in, first-out) stack. A subroutine jump causes the program counter to be pushed onto the stack, and a return from subroutine pops the last address pushed on the stack. More than two pushes will overwrite the contents of SUBADD1.

Bit 31 (Pointer) is set high in the stack location that was written last and reset to zero in the other stack location. Setting bit 30 (Enable) high enables a write into bit 31 (set or reset the pointer) in either stack location. If bit 31 is zero in both SUBADD0 and SUBADD1 (as when the stack has been saved externally and later restored), SUBADDO can be designated as top of stack by setting bit 31. The stack pointers (bit 31) are cleared upon reset.
Bit $16(1)$ is sethigh when the address in a stack location points to an internal routine, or set low when the address is for an external instruction.

| 31 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $P$ | $E$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | 1 |  |
| $\mathbf{P}$ | $E$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | $X$ | 1 |  |

Figure 3. Stack Definition

## interrupt vector register (VECTOR) definition

The interrupt vector register (VECTOR) serves as a pointer to an external program to be executed upon receipt of an interrupt. Bit 16 (I) is always set low to point to a routine in external code space. The interrupt vector is cleared on reset.


Figure 4. Interrupt Vector Definition

## interrupt return register (IRAREG) definition

The interrupt return register (IRAREG) retains a copy of the program counter at the time of an external interrupt. This address is used as the next execution address upon returning from the interrupt. Bit $16(\mathrm{I})$ is set high when the address in the stack location points to an internal instruction, or set low when the address is for an external instruction. This register is not affected by the reset signal.


Figure 5. Interrupt Return Definition

## COUNTX and COUNTY registers definition

The counter registers (COUNTX, COUNTY) are used to store the current counts of the minimum and maximum values when executing MIN-MAX instructions. COUNTX and COUNTY are cleared on reset.

| 31 | 16 |  |
| :--- | :---: | :---: | :---: |
|  | COUNT FOR MAX VALUE | COUNT FOR MIN VALUE |

Figure 6. COUNTY and COUNTX Register Definition
The COUNTX register is updated on both the 1-D and 2-D MIN-MAX instruction such that the count of the current minimum value is in the lower 16 bits of the register and the count of the current maximum value is in the upper 16 bits. The COUNTY register is used only in the 2-D MIN-MAX instruction to keep track of the counts of the minimum and maximum for the second value of a pair. The COUNTX and COUNTY registers may also be used for temporary storage when not using the MIN-MAX instructions.

## MIN-MAX/LOOPCT register

The MIN-MAX/LOOPCT register stores the current values of two separate counters. The LSH contains the current loop counter, and the MSH is used to hold the current minimum or maximum value of a MIN-MAX operation. The MIN-MAX/LOOPCT register is cleared upon reset. The MIN-MAX/LOOPCT register may also be used for temporary storage when not using the MIN-MAX instructions.

COUNT FOR MIN-MAX VALUE
LOOP COUNT

Figure 7. MIN-MAX/LOOPCT Register Definition

## FPU core

The FPU core itself consists of a multiplier and an ALU, each with an intermediate pipeline register and an output register (see Figure 8, FPU core functional block diagram). Four multiplexers select the multiplier and ALU operands from the data registers, feedback registers, or previous multiplier or ALU result. Results are directed either to the internal feedback registers (C or CT), the 20 data registers in register files RA and RB, or the ten other miscellaneous registers.

Both the internal pipeline registers and the output registers can be enabled or made transparent (disabled) by setting the PIPES2-PIPES1 bits in the configuration register. When the device is powered up, the default settings of the internal registers are PIPES2 high (output registers transparent) and PIPES1 low (internal pipeline registers enabled).

When the FPU core is used for chained operations, the multiplier and ALU operate in parallel. Two data inputs are provided from the RA and RB input registers, while multiplier and ALU feedback are used as the other two operands. While in the chained mode, the output registers of the FPU must be enabled to latch feedback operands. The appropriate registers must be enabled by setting the PIPES2-PIPES1 controls in the configuration register at the beginning of chained operations, and the PIPES2-PIPES1 control should then be reinitialized upon termination.

Fully pipelined operation (both pipeline and output registers enabled) affects timing when writing results back to the RA and RB register files. To adjust writeback timing, it is possible to issue the NOP (no operation) instruction to the FPU core when the results are to be retained in the output registers for one or more additional cycles. The NOP instruction is only effective when the output registers are enabled, as each NOP causes the output register contents to be retained for one additional cycle.


Figure 8. FPU Core Functional Block Diagram

## GRAPHICS FLOATING-POINT PROCESSOR

D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUS012A
SMJ34082A operating modes
The SMJ34082A can operate as a stand-alone floating-point processor or a graphics coprocessor to the SMJ34020 Graphics System Processor. Control of FPU operation is provided either from external program memory or from the SMJ34020. External instructions are addressed by address lines MSA15-0 and are input on MSD31-0. SMJ34020 instructions are input on LAD31-0.
Both the MSD and LAD buses can be used for data transfers as well. Combinations of control signals distinguish instruction fetches from data transfers. A single instruction may be used to transfer data and to perform an operation within the FPU.
The SMJ34082A supports external code and data storage with the memory expansion interface, MSD31-0. Up to 64 K 32 -bit data operands and 64 K instructions may be added externally to the SMJ34082A. The signal DS/CS controls whether data space or code space is being accessed, and read/write control is provided with the chip enable ( $\overline{\mathrm{MCE}}$ ), outputenable ( $\overline{\mathrm{MOE}}$ ), address enable ( $\overline{\mathrm{MAE}}$ ), write enable ( $\overline{\mathrm{MWR}}$ ), and address lines (MSA15-0).
The SMJ34082A also provides instructions that allow the SMJ34020 to read/write directly from/to external memory. The external code support permits full utilization of the SMJ34082A features and instruction set.

## coprocessor-mode operation

Operation in the coprocessor mode assumes MSTR is low. In this mode, the SMJ34082A acts as a closely coupled coprocessor to the SMJ34020. The interface between the two devices consists of direct connections between pins. More than one coprocessor may be connected to the SMJ34020 by setting the appropriate coprocessor ID (CID2-CID0). Up to four coprocessors executing in parallel may be used with a single SMJ34020.
In the coprocessor mode, clock signals are provided by LCLK1 and LCLK2 from the SMJ34020. Internally, the FPU generates a rising clock edge from each LCLK1 edge (rising or falling). Thus, the SMJ34082A actually operates at twice the LCLK1 input clock frequency.

## initialization (coprocessor mode)

On reset, the SMJ34082A clears all pipeline registers and internal states. The configuration register and status register return to their initialization values. When RESET returns high in the coprocessor mode, the SMJ34082A is in an idle state waiting for the next instruction from the SMJ34020.

## LAD bus control (coprocessor mode)

Both data and instructions are transferred over the bidirectional LAD bus in the coprocessor mode. A unique combination of signal inputs distinguishes an instruction from data. SF, $\overline{A L T C H}, \overline{C A S}, \overline{R A S}$, and $\overline{W E}$ are used to designate coprocessor functions from other operations on the LAD bus.

Data may be transferred to or from SM 334020 registers or memory via LAD31-0. Transfers between the LAD and MSD buses can also be programmed. A single coprocessor instruction may be used to transfer data to the SMJ34082A and then perform an FPU operation.
MSD bus control (coprocessor mode)
Use of the MSD bus in the coprocessor mode is optional. External memory on MSD31-0 can be used to store data, user-programmed subroutines, or both. Different combinations of control signals distinguish between data memory and code memory. Control signals for MSD and MSA buses operate the same in the host-independent and coprocessor modes.
interrupt handling (coprocessor mode)
A software interrupt to the SMJ34082A is generated by the set mask external instruction. When the interrupt is granted, the current program counter is stored in the interrupt return register, and a branch to the interrupt vector address is executed. Software interrupts may be disabled.

If the exception detect interrupt (ED) is enabled, a SMJ34082A exception causes $\overline{\text { COINT }}$ to go low, signalling the exception to the SMJ34020. This exception does not cause a branch to the interrupt vector. If its interrupts are enabled, the SMJ34020 will branch to an interrupt vector to service the SMJ34082A request. Interrupts are cleared by reading the SMJ34082A status register.

## host-independent mode operation

Operation in the host-independent mode assumes MSTR high. The SMJ34082A has several hardware control signals, as well as programmable features, which support system functions such as initialization, data transfer, or interrupts in the host-independent mode. CLK provides the input clock to the SMJ34082A. Details of initialization, LAD and MSD bus interface control, and interrupt handling are provided in the following sections.

## initialization (host-independent mode)

To simplify initialization of external program memory, the SMJ34082A provides a bootstrap loader to perform an initial program load of 64 instructions. Once invoked, the loader causes the SMJ34082A to read 65 words from the LAD bus and write 64 words out to the external program memory on the MSD bus, beginning with location 0 . The first word read is used to initialize the configuration register.

This loader is invoked by first setting RESET low, and then INTR low. A separate timing diagram for using the bootstrap loader is provided (see Figure 34). INTR should be taken low after RESET is already low, as shown in the diagram. When the bootstrap loader is started, the FPU core is reset (internal states and status are cleared, but not data registers) and the stack pointer, program counter, and interrupt vector register are all set to zero.

RESET must be set high again before the loader operation can start (see Figure 34). Once the loader is active, an external interrupt (signalled by INTR low) will not be granted until the load sequence is finished. However, $\overline{R E S E T}$ going low terminates the load sequence, regardless of whether the sequence is complete. When the load sequence is finished, the device begins program execution at external address 0 .

## LAD bus control (host-independent mode)

Data transfer from the LAD bus (LAD31-0) is controlled primarily by output signals, $\overline{\mathrm{ALTCH}}, \overline{\mathrm{WE}}$, and $\overline{\mathrm{CAS}}$. $\overline{\text { ALTCH }}$ is the address write strobe that signals an address is being output on the LAD bus. The $\overline{\mathrm{CAS}}$ signal is the read strobe, and $\overline{W E}$ is the write enable output to memory.
If a bidirectional FIFO is used instead of memory, $\overline{C A S}$ can be directly connected to the read clock and $\overline{W E}$ to the write clock. The CC input can be used to signal the SMJ34082A when data is ready for input from the FIFO stack.

Data input on the LAD bus can be written to data registers, control registers, or passed through for output on the MSD bus. Alternatively, the LAD bus input can be selected directly as an FPU source operand without writing to a register.

An FPU result can be written to a data register and at the same time be passed out on the LAD bus. When this is done, the clock period may need to be extended up to 15 ns (SMJ34082-30) to allow for the propagation delay from the FPU core to the outputs.

Depending on the specific system implementation, transferring data to and from the LAD bus without intervening register operations may significantly improve throughput. In the host-independent mode, data moves to and from internal registers can be minimized at the cost of adjusting the clock period to assure integrity of FPU inputs to and output from the LAD bus.

D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUSO12A
MSD bus control (host-independent mode)
The MSD bus can be used to access either external data memory or external code memory, depending on the combination of control signals required. If the memory on the MSD port is shared with a host processor, the MAE and RDY signals can be used to prevent conflicts between the host and the SMJ34082A. When memory on the MSD port is shared, the host processor can monitor the state of the SMJ34082A memory chip enable (MCE) to determine when the SMJ34082A is not accessing the memory.
Otherwise, the $\overline{M A E}$ signal may be tied low (if unused), and the SMJ34082A can use $\overline{M O E}, \overline{M C E}, \overline{M W R}$, and DS/CS to control external memory operations into either data space or code space, as selected by DS/CS.
interrupt handling (host-independent mode)
Interrupts to the SMJ34082A can be signalled by setting the interrupt request input (INTR) low. $\overline{\text { INTR }}$ is associated with the vector in the interrupt vector register. Software interrupts are signalled by setting the software interrupt flag in the status register.

In the event of an FPU status exception in the host-independent mode, an interrupt is generated that causes a branch to an exception handler routine. The address of the exception handler is stored in the interrupt vector register by the user prior to execution of the FPU program. Interrupts may be disabled by setting the appropriate bits in the status register.

## absolute maximum ratings over operating free-air temperature range (unless otherwise noted) ${ }^{\dagger}$


$\dagger$ Stresses beyond those listed under "absolute maximum ratings" may cause permanent damage to the device. These are stress ratings only and functional operation of the device at these or any other conditions beyond those indicated under "recommended operating conditions" is not implied. Exposure to absolute-maximum-rated conditions for extended periods may affect device reliability.
NOTE 1: All voltage levels are with respect to ground ( $V_{S S}$ ).
recommended operating conditions

| PARAMETER |  |  |  | MIN | NOM MAX | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| VCC | Supply voltage |  |  | 4.5 | $5 \quad 5.5$ | V |
| $V_{\text {SS }}$ | Supply voltage (see Note 2) |  |  |  | 0 | V |
| $\mathrm{V}_{\text {IH }}$ | High-level input voltage |  |  | 2.4 | $\mathrm{V}_{\mathrm{CC}}+0.3$ | V |
| $V_{\text {IL }}$ | Low-level input voltage |  |  | -0.3 | 0.6 | V |
| IOH | High-level output current |  |  |  | -8 | mA |
| 1 OL | Low-level output current |  |  |  | 8 | mA |
| $f_{\text {clock }}$ | Clock frequency |  | SMJ34082A-28 |  | 7.1 | MHz |
|  |  | Coprocessor mode | SMJ34082A-30 |  | 7.6 |  |
|  |  | Host-independent Mode | SMJ34082A-28 |  | 14.3 |  |
|  |  |  | SMJ34082A-30 |  | 15.4 |  |
| TA | Operating free-air temperature |  |  | -55 |  | ${ }^{\circ} \mathrm{C}$ |
| TC | Operating case temperature |  |  |  | 125 | ${ }^{\circ} \mathrm{C}$ |

NOTE 2: In orderto minimize noise on $\mathrm{V}_{S S}$, care should be taken toprovide a minimum-inductance path between the $\mathrm{V}_{\text {SS }}$ pins and system ground.
electrical characteristics over recommended operating free-air (minimum) and case (maximum) temperature range (unless otherwise noted)

| PARAMETER |  |  | TEST CONDITIONS |  | MIN | TYP ${ }^{\ddagger}$ | MAX | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\mathrm{V}_{\mathrm{OH}}$ | High-level output voltage |  | $\mathrm{V}_{C C}=4.5 \mathrm{~V}$, | $\mathrm{l}^{\mathrm{OH}}=-8 \mathrm{~mA}$ | 2.6 |  |  | V |
| VOL | Low-level output voltage |  | $\mathrm{V}_{C C}=4.5 \mathrm{~V}$, | $\mathrm{IOL}=8 \mathrm{~mA}$ |  |  | 0.6 | V |
| lo | High-impedance bidirectional pins output current |  | $\mathrm{V}_{C C}=4.5 \mathrm{~V}$, | $\mathrm{V}_{\mathrm{O}}=2.8 \mathrm{~V}$ |  |  | 10 | $\mu \mathrm{A}$ |
|  |  |  | $\mathrm{V}_{C C}=4.5 \mathrm{~V}$, | $\mathrm{V}_{\mathrm{O}}=0.8 \mathrm{~V}$ |  |  | -10 |  |
| 4 | Input current |  | $V_{1}=V_{\text {SS }}$ to $V$ |  |  |  | $\pm 10$ | $\mu \mathrm{A}$ |
| $\mathrm{lcc}^{\text {§ }}$ | Supply current | Dynamic | $V_{C C}=5.5 \mathrm{~V}$ |  |  |  | 325 | mA |
|  |  | Quiescent | $\mathrm{V}_{1}=\mathrm{V}_{1 L}$ max or | $\mathrm{IOH}^{\prime}=\mathrm{I}_{\mathrm{OL}}=0$ |  |  | 50 | mA |
|  |  |  | $V_{1}=0.2 \mathrm{~V}$ or $V$ | $\mathrm{OH}=1 \mathrm{OL}=0$ |  |  | 50 |  |
| $\mathrm{C}_{i}$ | Input capacitance |  |  |  |  | 10 |  | pF |

$\ddagger$ All typical values are at $\mathrm{V}_{C C}=5 \mathrm{~V}$ and $\mathrm{T}_{A}=25^{\circ} \mathrm{C}$.
$\S_{\text {ICC }}$ is measured at maximum clock frequency. Inputs are presented with random logic highs and lows to assure the toggling of internal nodes.

SMJ34082A
GRAPHICS FLOATING-POINT PROCESSOR
D3592. SEPTEMBER 1990 - REVISED MAY 1991 - SGUS012A

## coprocessor mode (MSTR low)

switching characteristics over recommended ranges of supply voltage and operating free-air (minimum) and case (maximum) temperature range (unless otherwise noted) ${ }^{\dagger}$
propagation delay times


[^11]
## coprocessor mode (MSTR low)

switching characteristics over recommended ranges of supply voltage and pperating free-air (minimum) and case (maximum) temperature range (unless otherwise noted) (continued) ${ }^{\dagger}$
enable and disable times

| PARAMETER |  | FIGURE | SMJ34082A-28 |  | SMJ34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| ten(LOEL-LADZX) | Enable time, $\overline{\text { LOE }}$ low to LAD enabled |  | 16 | 2 | 17 | 2 | 17 | ns |
| ten(MAEL-MSAZX) | Enable time, MAE low to MSA enabled | 21, 22 | 2 | 17 | 2 | 17 |  |  |
| ten(MAEL-MSDZX) | Enable time, MAE low to MSD enabled | 22 | 2 | 17 | 2 | 17 |  |  |
| $\mathrm{t}_{\text {dis }}($ LOEH-LADXZ) | Disable time, LOE high to LAD disabled | 16 | 2 | 17 | 2 | 17 | ns |  |
| tdis(MAEH-MSAXZ) | Disable time, MAE high to MSA disabled | 21,22 | 2 | 17 | 2 | 17 |  |  |
| tdis(MAEH-MSDXZ) | Disable time, MAE high to MSD disabled | 21 | 2 | 17 | 2 | 17 |  |  |

valid times

| PARAMETER |  | Figure | SMJ34082A-28 |  | SMJ34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{tv}_{\mathrm{v} \text { (MWRH-MSA) }}$ | Valid time, MSA address after MWR high |  | $\begin{gathered} 20-22,24, \\ 25 \\ \hline \end{gathered}$ | 0 |  | 0 |  | ns |
| $\mathrm{tv}_{\mathrm{v} \text { (MWRH-MSD) }}$ | Valid time, MSD data output after MWR high | $\begin{gathered} \hline 20-22,24, \\ 25 \\ \hline \end{gathered}$ | 0 |  | 0 |  |  |  |
| $\mathrm{t}_{\mathrm{v}}(\mathrm{LC} 1-\mathrm{MSA})$ | Valid time, MSA address valid after LCK $\uparrow$ or $\downarrow$ | $\begin{aligned} & 17-22, \\ & 24-27 \end{aligned}$ | 3 |  | 3 |  |  |  |
| tv(LC1L-COR) | Valid time, CORDY valid after LCLK1 low | 11 | 0 |  | 0 |  |  |  |

timing requirements over recommended ranges of supply voltage and operating free-air (minimum) and case (maximum) temperature range (unless otherwise noted) ${ }^{\dagger}$ clock period and pulse duration

| PARAMETER |  | FIGURE | SMJ34082A-28 |  | SMJ 34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{t}_{\mathrm{c}}$ (LC1) | Clock period, LCLK1 (1/filock) with PIPES1 low |  | $\begin{gathered} 10,17-22, \\ 24-27 \end{gathered}$ | 170 |  | 162 |  | ns |
| $\mathrm{t}_{\mathrm{c}}(\mathrm{LC} 2)$ | Clock period, LCLK2 (1/fclock) with PIPES1 low | 10 | 170 |  | 162 |  |  |  |
| ${ }^{\text {w }}$ (LCC1H) | Pulse duration, LCLK1 high | 10 | 76 |  | 72 |  | ns |  |
| ${ }^{\text {w }}$ (LCC1L) | Pulse duration, LCLK1 low | 10 | 76 |  | 72 |  |  |  |
| ${ }^{\text {w }}$ (LC2H) | Pulse duration, LCLK2 high | 10 | 76 |  | 72 |  |  |  |
| ${ }^{\text {w }}$ (LC2L) | Pulse duration, LCLK2 low | 10 | 76 |  | 72 |  |  |  |
| $\mathrm{t}_{\text {w }}$ (DCSH)MH | Pulse duration, DS/CS high with MEMCFG high | 20,25, 27 | 5 |  | 5 |  |  |  |
| ${ }^{\text {w }}$ (RSTL) | Pulse duration, RESET low | 12 | 35 |  | 30 |  |  |  |
| $\mathrm{t}_{\mathrm{w}}$ (MCEH) | Pulse duration, MCE high | 18,25, 27 | 5 |  | 5 |  |  |  |
| $\mathrm{t}_{\text {w }}$ (MOEH ) | Pulse duration, $\overline{\text { MOE }}$ high | $\begin{gathered} 17,18,23, \\ 26,27 \end{gathered}$ | 5 |  | 5 |  |  |  |
| $\mathrm{t}_{\text {w }}$ (MWRH ${ }^{\text {a }}$ | Pulse duration, MWR high | 20,24,25 | 5 |  | 5 |  |  |  |

†ee Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.

## SMJ34082A

## GRAPHICS FLOATING-POINT PROCESSOR

D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUSO12A

## coprocessor mode (MSTR low)

timing requirements over recommended ranges of supply voltage and operating free-air (minimum) and case (maximum) temperature range (unless otherwise noted) (continued) ${ }^{\dagger}$ transition times

| PARAMETER |  | FIGURE | SMJ34082A-28 |  | SMJ34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{t}_{\text {t }}(\mathrm{LC} 1)$ | Transition time, LCLK1 |  | 10 |  | 15 |  | 15 | ns |
| ti(LC) | Transition time, LCLK2 | 10 |  | 15 |  | 15 |  |  |

setup and hold times

| PARAMETER |  | FIGURE | SMJ34082A-28 |  | SMJ34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{t}_{\text {su( }}$ (BUS-LC2H) | Setup time, BUSFLT valid before LCL.K2 $\uparrow$ |  | 11 | 20 |  | 13 |  | ns |
| $\mathrm{t}_{\text {su }}$ (CC-LC1) | Setup time, CC valid before LCLK1 $\uparrow$ or $\downarrow$ | 12 | 7 |  | 7 |  |  |  |
| $t_{\text {su(LAD-ATCL }}$ | Setup time, LAD address valid before ALTCH low | 13-16, 23 | 17 |  | 17 |  |  |  |
| $\mathrm{t}_{\text {su( }}$ (LAD-CASH) | Setup time, LAD address valid before $\overline{C A S}$ high | $\begin{gathered} 13,15,24 \\ 25 \end{gathered}$ | 15 |  | 15 |  |  |  |
| $t_{\text {su(LRD-LC2H) }}$ | Setup time, LRDY valid before LCLK2 $\uparrow$ | 11 | 20 |  | 20 |  |  |  |
| $t_{\text {su }}$ (MSD-LC1) | Setup time, MSD data valid before LCLK1 $\uparrow$ or $\downarrow$ | 17,18, 23 | 12 |  | 12 |  |  |  |
| $t_{\text {Su }}$ (RASH-ATCL) | Setup time, RAS high before ALTCH low | 13-15, 23 | 35 |  | 30 |  |  |  |
| $t_{\text {su }}$ (RDYL-LC1) | Setup time, RDY low before LCLK1 $\uparrow$ or $\downarrow$ | 12 | 20 |  | 15 |  |  |  |
| $\mathrm{t}_{\text {Su }}$ (RSTH-LC1) | Setup time, $\overline{\text { RESET }}$ high before LCLK1 $\uparrow$ or $\downarrow$ | 12 | 50 |  | 50 |  |  |  |
| $t_{\text {Su }}(S F-A T C L)$ | Setup time, SF valid before ALTCH low | 13-16, 23 | 15 |  | 15 |  |  |  |
| $\mathrm{t}_{\text {Su(WEL-CASL) }}$ | Setup time, WE low for data write before CAS low | 13, 16 | 15 |  | 15 |  |  |  |
| $\mathrm{th}^{\text {(ATCH-SF) }}$ | Hold time, SF valid after ALTCH high | 13-15, 23 | 15 |  | 12 |  | ns |  |
| th(ATCL-LAD) | Hold time, LAD address valid after ALTCH low | 13-16, 23 | 21 |  | 17 |  |  |  |
| th(CASH-LAD) | Hold time, LAD data valid after $\overline{\mathrm{CAS}}$ high | $\begin{gathered} 13,15,24 \\ 25 \\ \hline \end{gathered}$ | 0 |  | 0 |  |  |  |
| th(CASH-SF) | Hold time, SF valid after CAS high | 13-15, 23 | 15 |  | 15 |  |  |  |
| th(LCt-CC) | Hold time, CC valid after LCLK1 $\uparrow$ or $\downarrow$ | 12 | 5 |  | 5 |  |  |  |
| th(LC1-MSD) | Hold time, MSD input data valid after LCLK1 $\uparrow$ or $\downarrow$ | 17,18,23 | 4 |  | 4 |  |  |  |
| th(LC1-RDY) | Hold time, RDY valid after LCLK1 $\uparrow$ or $\downarrow$ | 12 | 5 |  | 5 |  |  |  |
| th(LC1H-LC2L) | Hold time, LCLK2 low after LCLKK1 high | 10 | 20 |  | 20 |  |  |  |
| th(LC2H-BUS) | Hold time, BUSFLT valid after LCLK2 high | 11 | 5 |  | 5 |  |  |  |
| $t_{\text {h }}(\mathrm{LC} 2 \mathrm{H}-\mathrm{LC} 1 \mathrm{H})$ | Hold time, LCLK1 high after LCLK2 high | 10 | 20 |  | 20 |  |  |  |
| th(LC2H-LRD) | Hold time, LRDY valid after LCLK2 high | 11 | 5 |  | 5 |  |  |  |
| th(WEH-SF) | Hold time, SF valid after WE high | 13 | 20 |  | 20 |  |  |  |

† See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES 1 low. No other pipeline settings are specified.

## coprocessor mode (MSTR low)

timing requirements over recommended ranges of supply voltage and operating free-air (minimum) and case (maximum) temperature range (unless otherwise noted) (continued) ${ }^{\dagger}$
delay times

| PARAMETER |  | FIGURE | SMJ34082A-28 |  | SMJ34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{t}_{\mathrm{d} \text { (DCSH-MCEL) }}$ MH | Delay time, DS/CS high to MCE low with MEMCFG high |  | 18, 22 | 4 |  | 4 |  | ns |
| $\mathrm{t}_{\mathrm{d}}$ (DCSH-MWRL) | Delay time, DS/CS high to MWR low | 19, 24 | 5 |  | 5 |  |  |  |
| $\mathrm{td}_{\text {(MCEH-DCSL)MH }}$ | Delay time, $\overline{M C E}$ high to DS/CS low with MEMCFG high | 20 | 4 |  | 4 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}$ (MCEH-MWRL) | Delay time, MCE high to MWR low | 25 | 5 |  | 5 |  |  |  |
| $\mathrm{t}_{\text {d}}$ (MOEH-MWRL) | Delay time, MOE high to MWR low | 19 | 5 |  | 5 |  |  |  |
| $t_{\text {d }}$ (MSAV-MWRL) | Delay time, MSA valid to MWR low | $\begin{gathered} 20-22,24, \\ 25 \\ \hline \end{gathered}$ | 4 |  | 4 |  |  |  |
| $t_{\text {d }}$ (MSDZ-MOEL) | Delay time, MSD disabled to MOE low | 21, 22 | 2 |  | 2 |  |  |  |
| $\mathrm{t}_{\text {d(MWRH-MCEL }}$ MH | Delay time, MWR high to MCE low with MEMCFG high | 25 | 5 |  | 5 |  |  |  |
| $\mathrm{td}_{\text {d }}$ MWRH-MOEL) | Delay time, MWR high to MOE low | 19, 21, 22 | 5 |  | 5 |  |  |  |
| $\mathrm{t}_{\text {d(MWRH-MSDVZ) }}$ | Delay time, MWF high to MSD disabled | 21 | 1 | 12 | 1 | 9 |  |  |
| $t_{\text {d }}($ MWRL-MSDZX) | Delay time, MWR low to MSD enabled | 21, 22 | 1 | 13 | 1 | 13 |  |  |

$\dagger$ See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.

## D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUS012A

## host-Independent mode (MSTR high)

switching characteristics over recommended ranges of supply voltage and operating free-air (minimum) and case (maximum) temperature range (unless otherwise noted) ${ }^{\dagger}$
propagation delay times

$\dagger$ See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.
$\ddagger$ Interrupts are not granted during multicycle instructions.

## host-Independent mode (MSTR high)

switching characteristics over recommended ranges of supply voltage and pperating free-air (minimum) and case (maximum) temperature range (unless otherwise noted) (continued) $\dagger$ propagation delay times (continued)

| PARAMETER |  | FIGURE | SMJ34082A-28 | SMJ34082A-30 | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN MAX | MIN MAX |  |
| ${ }^{\text {t }}$ (CLLKH-WEH) | Propagation delay time, CLK $\uparrow$ to $\bar{W} E$ high |  | $\begin{gathered} 30,33,43 \\ 44 \end{gathered}$ | 10 | 10 | ns |
| ${ }^{t_{p}(C L K H-W E L)}$ | Propagation delay time, CLK $\uparrow$ to $\overline{\mathrm{W} E}$ low | $\begin{gathered} 30,33,43 \\ 44 \end{gathered}$ | 30 | 30 |  |  |

enable and disable times

| PARAMETER |  | FIGURE | SMJ34082A-28 |  | SMJ34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| ten(CLKH-LADZX) | Enable time, CLK high to LAD enabled |  | 29,30 | 5 |  | 5 |  | ns |
| $\mathrm{t}^{\text {en (LOEL-LADZX) }}$ | Enable time, $\overline{\text { LOE }}$ low to LAD enabled | 33 | 2 | 17 | 2 | 17 |  |  |
| ten(MAEL-MSAZX) | Enable time, MAE Low to MSA enabled | 41, 42 | 2 | 17 | 2 | 17 |  |  |
| ten(MAEL-MSDZX) | Enable time, MAE Iow to MSD enabled | 42 | 2 | 17 | 2 | 17 |  |  |
| $\mathrm{t}_{\text {dis }}($ CLKKH-LADZX) | Disable time, CLK high to LAD disabled ${ }^{\ddagger}$ | 29,30 |  | 25 |  | 25 |  |  |
| $t_{\text {dis }}($ LOEH-LADXZ) | Disable time, LOE high to LAD disabled | 33 | 2 | 17 | 2 | 17 | ns |  |
| tdis (MAEH-MSAXZ) | Disable time, MAE high to MSA disabled | 41,42 | 2 | 17 | 2 | 17 |  |  |
| tids(MAEH-MSDXZ) | Disable time, MAE high to MSD disabled | 42 | 2 | 17 | 2 | 17 |  |  |

valid times

| PARAMETER |  | Figure | SMJ34082A-28 |  | SMJ34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MiN | MAX |  |
| tv(ATCH-LAD) | Valid time, LAD output data after ALTCH high |  | 29,30 | 2 |  | 2 |  | ns |
| $\mathrm{t}_{\mathrm{v} \text { ( }}$ (LKH-MSA) | Valid time, MSA address valid after CLK high | 35-47 | 3 | $\cup$ | 3. |  |  |  |
| tv(MWRH-MSD) | Valid time, MSD data valid after $\overline{\text { MWR }}$ high | $\begin{aligned} & 35,36, \\ & 40-42 \\ & \hline \end{aligned}$ | 1 |  | 1 |  |  |  |
| tv(MWRH-MSA) | Valid time, MSA address valid after MWF high | $\begin{aligned} & 35,36, \\ & 40-41 \end{aligned}$ | 1 |  | 1 |  |  |  |
| $t_{v}$ (WEH-LAD) | Valid time, LAD data valid after $\overline{\text { WE }}$ | $\begin{gathered} 30,33,43, \\ 44 \end{gathered}$ | 2 |  | 2 |  |  |  |

${ }^{\dagger}$ See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES 1 low. No other pipeline settings are specified.
$\ddagger$ Valid only for last write in series. The LAD bus is not placed in high-impedance state between consecutive outputs.

## host-Independent mode (MSTR high)

timing requirements over recommended ranges of supply voltage and operating free-air (minimum) and case (maximum) temperature range (unless otherwise noted) ${ }^{\dagger}$
clock period and pulse duration

| PARAMETER |  | FIGURE | SMJ34082A-28 |  | SMJ34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{t}_{\mathrm{c}}$ (CLK) | Clock period time, CLK (1/fclock) with PIPES1 low |  | $\begin{aligned} & 28-31, \\ & 33-48 \end{aligned}$ | 78 |  | 73 |  | ns |
| ${ }^{\text {w }}$ (ATCH) | Pulse duration, $\overline{\text { ALTCH }}$ high | 30 | 5 |  | 5 |  | ns |
| ${ }_{\text {tw }}$ (CASH) | Pulse duration, $\overline{\mathrm{CAS}}$ high | $\begin{gathered} 29,31,32, \\ 35,36 \end{gathered}$ | 5 |  | 5 |  |  |
| $t_{\text {W(CLKH) }}$ | Pulse duration, CLK high | 28 | 17 |  | 17 |  |  |
| $\mathrm{t}_{\mathrm{w}}$ (CLKL) | Pulse duration, CLK low | 28 | 22 |  | 22 |  |  |
| $t_{w}$ (DCSH) | Pulse duration, DS/CS high | 36, 40, 44 | 5 |  | 5 |  |  |
| ${ }^{\text {w }}$ (ITRL) | Pulse duration, INTR low | 34, 47 | 30 |  | 30 |  |  |
| ${ }^{\text {w }}$ (MCEH) | Pulse duration, MCE high | $\begin{gathered} 38,38, \\ 44-46 \end{gathered}$ | 5 |  | 5 |  |  |
| ${ }^{\text {w }}$ (MOEH) | Pulse duration, $\overline{\mathrm{MOE}}$ high | $\begin{gathered} 37,38, \\ 43-46 \\ \hline \end{gathered}$ | 6 |  | 6 |  |  |
| ${ }^{\text {w }}$ (MWRH) | Pulse duration, MWR high | 35, 36, 40 | 6 |  | 6 |  |  |
| $\mathrm{t}_{\mathrm{w}}$ (RSTL) | Pulse duration, $\overline{\text { RESET }}$ low | 34 | 40 |  | 40 |  |  |
| ${ }^{\text {tw }}$ (WEH) | Pulse duration, $\overline{\text { WE }}$ high | $\begin{gathered} 30,33,43, \\ 44 \end{gathered}$ | 5 |  | 5 |  |  |

## transition time

| PARAMETER |  | Figure | SMJ34082A-28 |  | SMJ34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| t(CLK) | Transition time, CLK |  | 28 |  | 15 |  | 15 | ns |

[^12] PIPES2 high and PIPES1 low. No other pipeline settings are specified.

## host-independent mode (MSTR high)

timing requirements over recommended ranges of supply voltage and operating free-air (minimum) and case (maximum) temperature range (unless otherwise noted) (continued) ${ }^{\dagger}$
setup and hold times

| PARAMETER |  | FIGURE | SMJ34082A-28 |  | SMJ34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{t}_{\text {su( }}$ (CC-CLKH) | Setup time, CC before CLK high |  | 45 | 7 |  | 7 |  | ns |
| ${ }^{\text {tsu }}$ (LADV-CLKL $)$ | Setup time, LAD data valid before CLK low for immediate data input ${ }^{\ddagger}$ | 32 | 15 |  | 15 |  |  |  |
| $\mathrm{t}_{\text {Su( }}$ (ITRL-CLIKH) | Setup time, INTR before CLK high | 47 | 20 |  | 15 |  |  |  |
| $\mathrm{t}_{\text {su }}$ (LAD-CLKH) | Setup time, LAD input data valid before CLK high | $\begin{gathered} 29,31, \\ 34-36 \end{gathered}$ | 15 |  | 13 |  |  |  |
| $\mathrm{t}_{\text {su }}$ (LRD-CLKH) | Setup time, LRDY before CLK high | 48 | 20 |  | 15 |  |  |  |
| $\mathrm{t}_{\text {su }}$ (MSD-CLKH) | Setup time, MSD data valid before CLK high | $\begin{gathered} 37,38, \\ 43-47 \end{gathered}$ | 13 |  | 13 |  |  |  |
| $\mathrm{t}_{\text {su }}$ (RDYV-CLKH) | Setup time, RDY valid before CLK high | 48 | 20 |  | 12 |  |  |  |
| $\mathrm{t}_{\text {Su( }}$ (RSTH-CLKH) | Setup time, RESET high before CLK high | 34 | 45 |  | 45 |  |  |  |
| $t_{\text {su }}$ (RSTL-ITRL) | Setup time, $\overline{R E S E T}$ low before INTR low for bootstrap loader | 34 | 20 |  | 20 |  | ns |  |
| th(CLKH-CC) | Hold time, CC after CLK high | 45 | 3 |  | 3 |  |  |  |
| $\mathrm{th}^{(C L K H-I T R)}$ | Hold time, INTR after CLK high | 47 | 3 |  | 3 |  |  |  |
| th(CLKH-LAD) | Hold time, LAD input data valid after CLK high | $\begin{gathered} 29,31,35, \\ 36 \end{gathered}$ | 5 |  | 5 |  |  |  |
| $\mathrm{th}_{\text {( }}(\mathrm{LL} \mathrm{KH}-\mathrm{LRD})$ | Hold time, LRDY after CLK high | 48 | 0 |  | 0 |  |  |  |
| th(CLKH-MSD) | Hold time, MSD input data valid after CLK high | $\begin{gathered} 37,38 \\ 43-47 \end{gathered}$ | 4 |  | 4 |  |  |  |
| th(CLKH-RDY) | Hold time, RDY after CLK high | 48 | 0 |  | 0 |  |  |  |
| th(CLKL-LAD) | Hold time, LAD data after CLK low for immediate data input ${ }^{\ddagger}$ | 32 | 5 |  | 5 |  |  |  |
| th(ITRL-RSTH) | Hold time, $\overline{R E S E T}$ low after INTR low for bootstrap loader | 34 | 15 |  | 15 |  |  |  |

TSee Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.
$\ddagger$ This mode permits data input that does not meet the minimum setup before CLK high. The clock period for this mode must be extended according to the equation:

$$
\text { Adjusted clock period }=\text { Normal clock period }+ \text { Data delay }+5 n s
$$

The data delay is the delay from CLK high to valid data. This mode may not be used to input data for divides or square roots.
host-independent mode (MSTR high)
timing requirements over recommended ranges of supply voltage and operating free-air (minimum) and case (maximum) temperature range (unless otherwise noted) (continued) ${ }^{\dagger}$ delay times

| PARAMETER |  | FIGURE | SMJ34082A-28 |  | 3MJ34082A-30 |  | UNIT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | MIN | MAX | MIN | MAX |  |
| $\mathrm{t}_{\text {d }}$ (ATCH-CASL) | Delay time, $\overline{\text { ALTCH }}$ high to $\overline{\text { CAS }}$ low |  | 29 | 5 |  | 5 |  | ns |
| $\mathrm{td}_{\text {d }}$ ATCH-WEL) | Delay time, $\overline{\text { ALTCH }}$ high to WE low | 30 | 3 |  | 3 |  |  |  |
| $\mathrm{t}_{\text {d }}$ (CASH-ATCL) | Delay time, CAS high to ALTCH low | 29 | 3 |  | 3 |  |  |  |
| $\mathrm{t}_{\text {d }}$ (CASH-WEL) | Delay time, CAS high to WE low | 33 | 3 |  | 3 |  |  |  |
| $t_{\text {d }}$ (COIL-ATCL) | Delay time, COINT low to ALTCH low | 29,30 | 0 |  | 0 |  |  |  |
| $\mathrm{t}_{\mathrm{d}}$ (COIL-CASL) | Delay time, COINT low to CAS low | 31, 35, 36 | 0 |  | 0 |  |  |  |
| $\mathrm{t}_{\mathrm{d} \text { (COIL-WEL) }}$ | Delay time, COINT low to WE low | 33 | 0 |  | 0 |  |  |  |
| $\mathrm{td}_{\text {d(DCSH-MCEL) }}$ M ${ }^{\text {d }}$ | Delay time, DS/CS high to MCE low with MEMCFG high | 38, 42 | 5 |  | 5 |  |  |  |
| td(DCSH-MWRL) | Delay time, DS/CS high to MWR low | 35,39 | 4 |  | 4 |  |  |  |
| $\mathrm{t}_{\text {d}}$ (MCEH-DCSL)MH | Delay time, $\overline{M C E}$ high to DC/्̄CS low with MEMCFG high | 40 | 5 |  | 5 |  |  |  |
| td(MCEH-MWRL) | Delay time, MCE high to MWR low | 36 | 5 |  | 5 |  |  |  |
| $\mathrm{t}_{\text {d}}$ (MOEH-MWRL) | Delay time, MOE high to MWR low | 38 | 5 |  | 5 |  |  |  |
| $t_{\text {d }}$ (MSAV-MWRL) | Delay time, MSA valid to MWR low. | $\begin{gathered} 35,36, \\ 40-42 \end{gathered}$ | 4 |  | 4 |  |  |  |
| $\mathrm{t}^{\text {d (MSDZ-MOEL) }}$ | Delay time, MSD disabled to MOE low | 41, 42 | 2 |  | 2 |  |  |  |
| $\mathrm{td}_{\text {d }}$ MWRH-MCEL)MH | Delay time, MWF high to MCE low with MEMCFG high | 36 | 5 |  | 5 |  |  |  |
| td(MWRH-MOEL) | Delay time, MWR high to MOE low | 41, 42 | 5 |  | 5 |  |  |  |
| td (MWRH-MSDXZ) | Delay time, MWR high to MSD disabled | 42 | 1 | 12 | 1 | 9 |  |  |
| td(MWRL-MSDZX) | Delay time, MWR low to MSD enabled | 41, 42 | 1 | 13 | 1 | 13 |  |  |
| $\mathrm{t}_{\mathrm{d}}$ (WEH-ATCL) | Delay time, WE high to ALTCH low | 29 | 3 |  | 3 |  |  |  |
| td (WEH-CASL) | Delay time, WE high to CAS low | 31 | 3 |  | 3 |  |  |  |

† See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.

## SMJ34082A <br> GRAPHICS FLOATING-POINT PROCESSOR

## EXPLANATION OF LETTER SYMBOLS

This data sheet uses a type of letter symbol based on JEDEC Std-100 and IEC Publication 748-2, 1985, to describe time intervals. The format is:

Where:

$$
t_{A(B C-D E) F}
$$

Subscript A indicates the type of dynamic parameter being represented. One of the following is used:
Switching Characteristics:
$\mathrm{p}=$ Propagation delay time
en = Enable time
dis = Disable time
Timing Requirements:
c = Clock period
$\mathrm{w}=$ Pulse duration
$\mathrm{t}=$ Transition time
d = Delay time
su $=$ Setup time
h = Hold time
$v=$ Valid time
Subscript $B$ indicates the name of the signal or terminal for which a change of state or level (or establishment of a state or level) constitutes a signal event assumed to occur first, that is, at the beginning of the time interval.

Subscript $C$ indicates the direction of the transistion and/or the final state or level of the signal represented by B. One or two of the following are used:
$H=$ High or transition to high
$\mathrm{L}=$ Low or transition to low
$\mathrm{V}=\mathrm{A}$ valid steady-state level
$X=$ Unknown, changing, or "don't care" level
Z = High-impedance (off) state
Subscript $D$ indicates the name of the signal or terminal for which a change of state or level (or establishment of a state or level) constitutes a signal event assumed to occur last, that is, at the end of the time interval.

Subscript $E$ indicates the direction of the transition and/or the final state or level of the signal represented by D. One or two of the symbols described in Subscript $C$ are used.

Subscript $F$ indicates additional information such as mode of operation, test conditions, etc.
The hyphen between the $C$ and $D$ subscripts is omitted when no confusion is likely to occur. For these letter symbols on this data sheet, the signal names are further abbreviated as follows:

| SIGNAL NAME | B \& D SUBSCRIPT | SIGNAL NAME | B \& D SUBSCRIPT | SIGNAL NAME | B\&D SUBSCRIPT | SIGNAL NAME | B\&D SUBSCRIPT | SIGNAL NAME | B\&D SUBSCRIPT |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\overline{\text { ALTCH }}$ | ATC | CORDY | COR | LCLK2 | LC2 | MSA(0:15) | MSA | TCK | TCK |
| BUSFLT | BFT | DC/CS | DCS | LOE | LOE | MSD(0:31) | MSD | TDI | TDI |
| CAS | CAS | EC(0:1) | EC | LRDY | LRD | $\overline{\text { MWR }}$ | MWR | TDO | TDO |
| CC | CC | INTG | INT | $\overline{\text { MAE }}$ | MAE | $\overline{\text { RAS }}$ | RAS | TMS | TMS |
| CID(0:2) | CID | INTR | ITR | MSTR | MST | RDY | RDY | $\mathrm{VCO}_{\text {coss }}$ | - |
| CLK | CLK | LAD(0:31) | LAD | $\overline{\text { MCE }}$ | MCE | RESET | RST | $\overline{W E}$ | WE |
| COINT | COI | LCLK1 | LC1 | $\overline{\text { MOE }}$ | MOE | SF | SF | MEMCFG | M |

## PARAMETER MEASUREMENT INFORMATION

| LOAD CIRCUIT PARAMETERS |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
| TIMING PARAMETERS |  | $\underset{(\mathrm{PF})}{\mathrm{CLOAD}^{\dagger}}$ | $\begin{aligned} & \mathrm{IOL} \\ & (\mathrm{~mA}) \end{aligned}$ | $\begin{aligned} & \hline \mathrm{loH} \\ & (\mathrm{~mA}) \end{aligned}$ | $\overline{\mathbf{V}_{\text {LOAD }}}$ (V) |
| ten | ${ }^{\text {tPZH }}$ | 65 | 8 | -8 | 0 |
|  | tPZL |  |  |  | 3 |
| $t_{\text {dis }}$ | tPHZ | 65 | 8 | -8 | 1.5 |
|  | tpLZ |  |  |  |  |
| ${ }^{\text {t }}$ p |  | 65 | 8 | -8 | $\ddagger$ |

${ }^{C_{\text {LOAD }} \text { includes the typical load circuit and distributed capacitance. }}$
$\ddagger \frac{V_{\mathrm{LOAD}}-\mathrm{V}_{\mathrm{OL}}}{\mathrm{IOL}}=50 \Omega$, where $\mathrm{V}_{\mathrm{OL}}=0.6 \mathrm{~V}, \mathrm{l}_{\mathrm{OL}}=8 \mathrm{~mA}$.


VOLTAGE WAVEFORMS SETUP AND HOLD TIMES INPUT RISE AND FALL TIMES


VOLTAGE WAVEFORMS
ENABLE AND DISABLE TIMES, 3-STATE OUTPUTS
NOTES: A. Phase relationships between waveforms were chosen arbitrarily. All inputpulses are supplied by pulse generators having the following characteristics: $\mathrm{PRR}=1 \mathrm{MHz}, \mathrm{Z}_{\mathrm{O}}=50 \Omega, \mathrm{t}_{\mathrm{r}} \leq 6 \mathrm{~ns}, \mathrm{t}_{\mathrm{f}} \leq 6 \mathrm{~ns}$.
B. Waveform 1 is for an output with internal conditions such that the output is low except when disabled by the output control.
C. Waveform 2 is for an output with internal conditions such that the output is high except when disabled by the output control. For tpL $Z$ and $\mathrm{tPHZ}_{\mathrm{Z}}, \mathrm{V}_{\mathrm{OL}}$ and $\mathrm{V}_{\mathrm{OH}}$ are measured values.

Figure 9

## PARAMETER MEASUREMENT INFORMATION



Figure 10. C̣oprocessor Mode, Input Clocks

${ }^{\dagger}$ Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period.
Figure 11. Coprocessor Mode, Bus Control Signals

## PARAMETER MEASUREMENT INFORMATION



Figure 12. Coprocessor Mode, Control Signals

## PARAMETER MEASUREMENT INFORMATION


${ }^{\dagger}$ Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period.
Figure 13. Coprocessor Mode, SMJ34020 GSP to SMJ34082

## GRAPHICS FLOATING-POINT PROCESSOR

D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUS012A

## PARAMETER MEASUREMENT INFORMATION


$\dagger_{\text {Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period. }}$
Figure 14. Coprocessor Mode, SMJ34082A to SMJ34020 GSP Including Coprocessor Internal Cycle

PARAMETER MEASUREMENT INFORMATION

${ }^{\dagger}$ Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period.
Figure 15. Coprocessor Mode, DRAM/VRAM Memory to SMJ34082

## PARAMETER MEASUREMENT INFORMATION


$\dagger$ Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period.
Figure 16. Coprocessor Mode, SMJ34082A to DRAM/VRAM Memory

$\dagger$ The setting of DS/ $\overline{C S}$ determines whether the value on the MSD bus is an instruction or data.
$\ddagger \overline{\mathrm{MCE}}$ dos not toggle at each clock edge.
$\S \overline{\mathrm{MOE}}$ goes high at each clock edge.
NOTE: This example shows a data read followed by an instruction read.
Figure 17. Coprocessor Mode MSD Bus Timing, Memory to SMJ34082A with MEMCFG Low

## PARAMETER MEASUREMENT INFORMATION



NOTE: This example shows a data read followed by an instruction read followed by an instruction read. This option for using DS/CS as data space chip enable and MCE as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, $\mathrm{DS} / \overline{\mathrm{CS}}$ and $\overline{\mathrm{MCE}}$ rise after every clock edge. In this mode, $\mathrm{DS} / \overline{\mathrm{CS}}$ and $\overline{\mathrm{MCE}}$ may not both be active (low) at the same time.

Figure 18. Coprocessor Mode MSD Bus Timing, Memory to SMJ34082A with MEMCFG High

$\dagger$ The setting of DS/CS determines whether the value on the MSD bus is an instruction or data.
$\ddagger \overline{\mathrm{MCE}}$ does not toggle at each clock edge.
$\S \overline{\text { MWR goes high at each clock edge. }}$
NOTE: This example shows a data write followed by a code read.
Figure 19. Coprocessor Mode MSD Bus Timing, SMJ34082A to Memory with MEMCFG Low


NOTE: This example shows multiple data writes. Timing for multiple code writes would be similar. This option for using DS/CS as data space chip enable and $\overline{M C E}$ as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, DS/ $\overline{C S}$ and $\overline{M C E}$ rise after every clock edge. In this mode, $D S / \overline{C S}$ and $\overline{M C E}$ may not both be active (low) at the same time.

Figure 20. Coprocessor Mode MSD Bus Timing, SMJ34082A to Memory with MEMCFG High

$\dagger$ The setting of DS/CS determines whether the value on the MSD bus is an instruction or data.
$\ddagger \overline{M C E}$ does not toggle at each clock edge.
$\S \overline{\text { MOE goes high at each clock edge. }}$
NOTE: This example shows a data write followed by an instruction read.
Figure 21. Coprocessor Mode, MSD Enable/Disable Timing with MEMCFG Low

## PARAMETER MEASUREMENT INFORMATION



NOTE: This example shows a data write followed by an instruction read. Timing for multiple code writes would be similar. This option for using DS/ $\overline{C S}$ as data space chip enable and $\overline{M C E}$ as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, DS/ $\overline{C S}$ and $\overline{M C E}$ rise after every clock edge. In this mode, $D S / \overline{C S}$ and $\overline{M C E}$ may not both be active (low) at the same time.

Figure 22. Coprocessor Mode, MSD Bus Enable/Disable Timing with MEMCFG High

## PARAMETER MEASUREMENT INFORMATION


$\dagger$ Q1, Q2, Q3, and Q4 represent the first, second, third, and fourth quarter clocks, respectively, of the LCLK1 clock period.
$\ddagger$ The setting of DS/CS determines whether the value on the MSD bus in an instruction or data.
§ $\overline{M C E}$ does not toggle at each rising clock edge.
${ }^{\top} \overline{M O E}$ goes hiigh at each rising clock edge.
Figure 23. Coprocessor Mode, Jump to External Memory Subroutine with MEMCFG Low


Figure 24. Coprocessor Mode, LAD to MSD Bus Transfer Timing with MEMCFG Low

PARAMETER MEASUREMENT INFORMATION

${ }^{\dagger}$ DS/ $\overline{C S}$ valid for moves to data space; $\overline{M C E}$ valid for moves to code space. Only one of these would be valid for each move instruction.
$\ddagger$ This option for using DS/CS as data space chip enable and MCE as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register.

Figure 25. Coprocessor Mode, LAD to MSD Bus Transfer Timing with MEMCFG High

## SMJ34082A

GRAPHICS FLOATING-POINT PROCESSOR
D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUS012A
PARAMETER MEASUREMENT INFORMATION

$\dagger$ MCE dos not toggle at each clock edge.
$\ddagger \overline{M O E}$ goes high at each clock edge.
Figure 26. Coprocessor Mode, MSD to LAD Bus Transfer Timing with MEMCFG Low

$\dagger_{\mathrm{DS}} / \overline{\mathrm{CS}}$ valid for moves to data space; $\overline{\mathrm{MCE}}$ valid for moves to code space. Only one would be valid for each move instruction. NOTE: This option for using DS/CS as data space chip enable and MCE as code space chip enable is involved by setting the MEMCFG bit high in the configuration register.

Figure 27. Coprocessor Mode, MSD to LAD Bus Transfer Timing with MEMCFG High

## PARAMETER MEASUREMENT INFORMATION



Figure 28. Host-Independent Mode, Input Clock


[^13]Figure 29. Host-Independent Mode, LAD Bus Timing for Memory to SMJ34082A

PARAMETER MEASUREMENT INFORMATION

$\dagger$ Valid only for last write in series. The LAD bus is not placed in high-impedance state between consecutive outputs.
 move instruction instead of the set mask instruction.
NOTE: This timing diagram assumes an external address latch to store address for external memory reads. Data input hold time is zero. Data (or address) outputhold time is nonzero. Validonly for last write in series. The LAD bus is not placed in high impedance between consecutive outputs.

Figure 30. Host-Independent Mode, LAD Bus Timing for SMJ34082A to Memory

## D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUS012A

## PARAMETER MEASUREMENT INFORMATION


$\dagger \overline{\text { COINT }}$ timing is for LADCFG high only. When the LADCFG bit is set high in the configuration register, $\overline{C O I N T}$ is controlled by bit 1 of the LAD move instruction instead of the set mask instruction.

Figure 31. Host-Independent Mode, LAD Bus Timing Input to SMJ34082A

$\dagger$ This mode permits data input which does not meet the minimum setup before CLK high. For immediate data input, CLK must be high for more than 20 ns. This input mode cannot be used to input data for divides and square roots.

Adjusted clock period $=$ Normal clock period + Data delay +5 ns
Figure 32. Host-Independent Mode, LAD Bus Timing Input of Immediate Data to SMJ34082A

PARAMETER MEASUREMENT INFORMATION

$t$ When the LADCFG bit is high, $\overline{L O E}$ high places $\overline{C A S}$ and $\overline{W E}$ (as well as the LAD bus) in high impedance.
$\ddagger$ Valid only for LADCFG high. When the LADCFG bit is high in the configuration register, COINT is controlled by bit 1 of the LAD move instruction instead of the set mask instruction.
NOTE: If the instruction writes the result of an FPU operation to a register and outputs the result to the LAD bus, in the same cycle, the minimum clock period must be extended.

Figure 33. Host-Independent Mode, LAD Bus Timing Output from SMJ34082A

## PARAMETER MEASUREMENT INFORMATION


$\dagger \overline{\text { RESET }}$ is level sensitive. When $\overline{\text { RESET }}$ is set low, both LAD and MSD buses are placed in high-impedance state. When $\overline{\mathrm{RESET}}$ is released, the sequencer forces a jump to address 0 . If INTR goes low while $\overline{\text { RESET }}$ is low, the loader moves 64 words through to the external memory on MSD. Timing for the LAD to MSD move is shown in a later diagram, with the exception that the first word on LAD loads the configuration register and does not pass to the MSD bus.
$\mp$ INTR may be low one or more cycles after $\overline{R E S E T}$ goes low. $\overline{R E S E T}$ is held low, and then $\overline{\text { INTR }}$ is taken low. The bootstrap loader starts when $\overline{R E S E T}$ is set high, which may involve a delay of one or more cycles after INTR goes low.
NOTE: When the bootstrap loader is invoked, the first data word input on the LAD bus should be the configuration register settings, which will be written into the configuration register. This allows the user to select the MEMCFG setting, for reading or writing memory on the MSD port, as well as the LADCFG setting for the LAD bus interface.

Figure 34. Host-Independent Mode LAD Bus Timing, Bootstrap Loader Operation

## PARAMETER MEASUREMENT INFORMATION



[^14]Figure 35. Host-Independent Mode, LAD to MSD Bus Timing with MEMCFG Low

## GRAPHICS FLOATING-POINT PROCESSOR

D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUS012A

## PARAMETER MEASUREMENT INFORMATION


$\dagger \overline{\text { COINT timing is for LADCFG high only. When the LADCFG bit is set high in the configuration register, COINT is controlled by bit } 1 \text { of the LAD }}$ move instruction instead of the set mask instruction.
$\ddagger \mathrm{DS} / \overline{\mathrm{CS}}$ valid for moves to data space; $\overline{\mathrm{MCE}}$ valid for moves to code space. Only one of these would be valid for each move instruction.
§ This option for using DS/CS as data space chip enable and MCE as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register.

Figure 36. Host-Independent Mode, LAD to MSD Bus Transfer Timing with MEMCFG High

PARAMETER MEASUREMENT INFORMATION

$\dagger$ The setting of DS/ $\overline{C S}$ determines whether the value on the MSD bus is an instruction or data.
$\ddagger \overline{\mathrm{MCE}}$ dos not toggle at each rising clock edge.
$\S \overline{M O E}$ goes high at each rising clock edge.
NOTE; This example shows a data read followed by an instruction read.
Figure 37. Host-Independent Mode MSD Bus Timing, Memory to SMJ34082A with MEMCFG Low

## PARAMETER MEASUREMENT INFORMATION



NOTE: This example shows a data read followed by an instruction read followed by an instruction read. This option for using DS/CS as data space chip enable and $\overline{M C E}$ as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, $\overline{D S} \overline{C S}$ and $\overline{M C E}$ rise after every rising clock edge. In this mode, $D S \overline{C S}$ and $\overline{M C E}$ may not both be active (low) at the same time.

Figure 38. Host-Independent Mode MSD Bus Timing, Memory to SMJ34082A with MEMCFG High

$\dagger$ The setting of DS/CS determines whether the value on the MSD bus is an instruction or data.
$\ddagger \overline{M C E}$ dos not toggle at each rising clock edge.
$\S \overline{\text { MWR }}$ goes high at each rising clock edge.
NOTE: This example shows a data write followed by a code read.
Figure 39. Host-Independent Mode MSD Bus Timing, SMJ34082A to Memory with MEMCFG Low

PARAMETER MEASUREMENT INFORMATION


NOTE: This example shows multiple data writes. Timing for multiple code writes would be similar. This option for using DS/CS as data space chip enable and $\overline{M C E}$ as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, $\mathrm{DS} / \overline{\mathrm{CS}}$ and $\overline{\mathrm{MCE}}$ rise after every rising clock edge. In this mode, $\mathrm{DS} / \overline{\mathrm{CS}}$ and $\overline{M C E}$ may not both be active (low) at the same time.

Figure 40. Host-Independent Mode MSD Bus Timing, SMJ34082A to Memory with MEMCFG High

$\dagger$ The setting of DS/ $\overline{C S}$ determines whether the value on the MSD bus is an instruction or data.
$\ddagger \overline{\mathrm{MCE}}$ dos not toggle at each rising clock edge.
$\S \frac{\mathrm{MOE}}{\mathrm{Moes}}$ high at each rising clock edge.
NOTE: This example shows a data write followed by an instruction read.
Figure 41. Host-Independent Mode, MSD Enable/Disable Timing with MEMCFG Low

## GRAPHICS FLOATING-POINT PROCESSOR

D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUS012A
PARAMETER MEASUREMENT INFORMATION


NOTE: This example shows a data write followed by an instruction read. Timing for multiple code writes would be similar. This option for using $\mathrm{DS} / \overline{\mathrm{CS}}$ as data space chip enable and $\overline{M C E}$ as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register. When MEMCFG is high, DS/CS and MCE rise after every rising clock edge. In this mode, DS/CS and $\overline{M C E}$ may not both be low at the same time.

Figure 42. Host-Independent Mode, MSD Bus Enable/Disable Timing with MEMCFG High

PARAMETER MEASUREMENT INFORMATION

$\dagger \overline{\mathrm{MCE}}$ dos not toggle at each rising clock edge.
$\ddagger \overline{\mathrm{MOE}}$ goes high at each rising clock edge.
Figure 43. Host-Independent Mode, MSD to LAD Bus Transfer Timing with MEMCFG High

## PARAMETER MEASUREMENT INFORMATION


$\dagger$ DS/ $\overline{C S}$ valid for moves to data space; $\overline{M C E}$ valid for moves to code space. Only one would be valid for each move instruction.
NOTE: This option for using DS/CS as data space chip enable and $\overline{M C E}$ as code space chip enable is involved by setting the MEMCFG bit high in the configuration register.

Figure 44. Host-Independent Mode, MSD to LAD Bus Transfer Timing with MEMCF High

PARAMETER MEASUREMENT INFORMATION

$\dagger$ Dotted line shows DS/CS for MEMCFG high.
$\ddagger$ The CC input is registered on each rising edge of the clock, so the CC bit can be latched one cycle and tested during the next cycle.
Figure 45. Host-Independent Mode, MSD Bus Timing Test Condition (CC) and Branch

$\dagger$ Dotted line shows DS/CS for MEMCFG high.
$\ddagger$ Valid for MEMCFG low only. When MEMCFG low, COINT is set high by the set mask instruction, and it remains high until reset with another set mask instruction.
§ The CORDY output is set low by the set mask instruction, and it remains low until reset with another set mask instruction.
Figure 46. Host-Independent Mode MSD Bus Timing, SET/RESET $\overline{\text { COINT }}$ and CORDY

$\dagger$ Dotted lines show DS/CS and $\overline{M C E}$ for MEMCFG high.
$\ddagger$ INTR is negative-edged triggered.
NOTE: Interrupts are not granted during multi-cycle instructions. This example shows two interrupt requests. The first is granted immediately; the second, after the first is finished. INTG remains high after an interrupt is granted until interrupts are reenabled or a return from interrupt instruction is executed.

Figure 47. Host-Independent Mode, MSD Bus Timing External Interrupt to SMJ34082A

## PARAMETER MEASUREMENT INFORMATION



[^15]Figure 48. Host-Independent Mode, MSD Bus Timing Wait State Timing

## PROGRAMMING INFORMATION

## programming the SMJ34082A

The SMJ34082A is supported by a software development tool kit, including a C compiler and an assembler. Program development using the tools is described in the TMS34082A tool kit documentation. Information on internal instructions and listing of the external instructions are provided in the following sections.

In both the coprocessor and host-independent modes, the SMJ34082A instruction word is 32 bits long. The number, length, and arrangement of fields in the 32-bit word depends on the operating mode and operation selected. Internal microcode to the SMJ34082A is not restricted to the same 32-bit instruction formats so certain internal programs may execute faster than the same operations written with external code can achieve.
In the coprocessor mode, the SMJ34082A can execute instructions both from the SMJ34020 and from the program memory on the MSD bus (MSD31-0). In the host-independent mode the SMJ34082A is controlled from code input on the MSD bus. Internal instructions may be executed in the host-independent mode by performing a jump to the internal address.

## PROGRAMMING INFORMATION

## internal instructions

The SMJ34082A FPU performs a wide range of internal arithmetic and logical operations, as well as complex operations (flagged ' $\dagger$ '), summarized below. Complex instructions are multi-cycle routines stored in the internal program ROM.

One-Operand Operations:

Absolute Value
Square Root
Reciprocal ${ }^{\dagger}$
Conversions:

> Integer to Single
> Integer to Double
> Single to Double

Two-Operand Operations:
Add Multiply
Subtract Divide
Compare
Matrix Operations:
$4 \times 4,4 \times 4$ Multiply $\dagger$
$1 \times 4,4 \times 4$ Multiply $\dagger$
$3 \times 3,3 \times 3$ Multiply ${ }^{\dagger}$
Graphics Operations:
Backface Testing ${ }^{\dagger}$
Polygon Clipping ${ }^{\dagger}$
2-D Linear Interpolation ${ }^{\dagger}$
2-D Window Compare ${ }^{\dagger}$
2-Plane Clipping $(X, Y, Z)^{\dagger}$
2-D Cubic Spline ${ }^{\dagger}$
Image Processing:
$3 \times 3$ Convolution $\dagger$
Chained Operations:
Polynomial Expansion $\dagger$
1-D Min/Max ${ }^{\dagger}$
Vector Operations:

| Add ${ }^{\dagger}$ | Dot Product ${ }^{\dagger}$ |
| :--- | :--- |
| Subtract $^{\dagger}$ | Cross Product $^{\dagger}$ |
| Magnitude $^{\dagger}$ | Normalization $^{\dagger}$ |
| Scaling $^{\dagger}$ | Reflection |

Multiply/Accumulate ${ }^{\dagger}$
2-D Min/Max ${ }^{\dagger}$
Dot Product ${ }^{\dagger}$
Cross Product ${ }^{\dagger}$
Normalization
Reflection ${ }^{\dagger}$

The internal ROM routines may be used in either the coprocessor or host-independent mode. In the coprocessor mode, the internal routines are invoked by SMJ34020 instructions to its coprocessor(s).

In the host-independent mode, the internal programs can be called as subroutines by the externally stored code. External programs can call internal routines by executing a jump to subroutine with bit 16 (internal code select) set high and the address of the internal routine as the jump address.
The format of the SMJ34082A instruction in the coprocessor mode is shown in Figure 49. The instruction is issued by the SMJ34020 via the LAD bus.

[^16]PROGRAMMING INFORMATION


Figure 49. SMJ34082A Instruction
The 3-bit ID field identifies the coprocessor for which the instruction is intended. This coprocessor ID corresponds to the settings of the CID2-CID0 pins. To broadcast an instruction to all coprocessors, the ID is set to 4 .

Table 5. Coprocessor ID

| 1D | COPROCESSOR |
| :---: | :---: |
| 000 | FPUO |
| 001 | FPU1 |
| 010 | FPU2 |
| 011 | FPU3 |
| 100 | FPU broadcast |
| 101 | Reserved |
| 110 | Reserved |
| 111 | User defined |

Four coprocessor addressing modes are defined for the SMJ34082A. The md field indicates the addressing mode.

Table 6. Addressing Modes

| MODE | MD FIELD | OPERATION |
| :---: | :---: | :--- |
| 0 | 00 | FPU internal operations with no jump or external moves |
| 1 | 01 | Transfer data to/from SMJ34020 registers |
| 2 | 10 | Transfer data to/from memory (controlled by SMJ34020) |
| 3 | 11 | External instructions |

The type and size bits identify the type of operand; as shown below in Table 7. The I bit is used to indicate to the SMJ34082A that this is a reissue of a coprocessor instruction due to a bus interruption. The least significant four bits are the bus status bits, which will all be zero to indicate a coprocessor cycle.

Table 7. OPERAND Types

| TYPE | SIZE | OPERAND TYPE |
| :---: | :---: | :--- |
| 0 | 0 | 32-bit integer |
| 0 | 1 | Reserved |
| 1 | 0 | Single-precision floating-point (32-bit) |
| 1 | 1 | Double-precision floating-point (64-bit) |

The ra, rb, and rd fields are for the two sources and destination within the FPU. Register addresses are listed in Table 1. For the ra and rb fields, only the four least significant bits of the register address are used. The ra field may only use the RA register file, C , and CT . The RB field may only use the RB register file, C and CT.
The Floating-Point Unit Operation (fpuop) field is the FPU opcode ( 5 bits) described in Tables 8, 9, and 10.
In the coprocessor mode, the SMJ34082A executes user-defined routines (stored in external memory on the MSD bus) by executing a jump to external code. For this instruction, the md field (bits 15-13) is set high and the fpuop field gives the routine number ( $0-31$ ). The SMJ34082A multiplies the routine number by two to get the jump address. For example, routine number 14 would have a jump address of 28 decimal or 1 C hex.

## SMJ34082A GRAPHICS FLOATING-POINT PROCESSOR

SGUS012A - D3592, SEPTEMBER 1990 - REVISED MAY 1991

## PROGRAMMING INFORMATION

The routines are coded using the external instruction format discussed in the next section. The last instruction should be a jump to internal instruction address 0FFFh with the l-bit(internal) set or a return from subroutine instruction. This puts the FPU in an idle state, waiting for the next instruction from the SMJ34020.

Table 8. Coprocessor Mode Instructions

| FPUOP | TMS34020 ASSEMBLER OPCODE | DESCRIPTION |
| :---: | :---: | :---: |
| 00000 | ADDx | Sum of ra and rb, place in rd |
| 00001 | SUBx | Subtract rb from ra, place result in rd |
| 00010 | CMPX | Set status bits on result of ra minus rb |
| 00011 | SUBx | Subtract ra from rb, place result in rd |
| 00100 | ADDAx | Absolute value of sum of ra and ri, place result in rd |
| 00101 | SUBAX | Absolute value of (ra minus rb), place result in rd |
| 00110 | MOVE or MOVx | Load multiple FPU registers from SMJ34020 GSP or its memory |
| 00111 | MOVE or MOVx | Save multiple FPU registers to SMJ34020 GSP or its memory |
| 01000 | MPYx | Multiply ra and rb, place result in rd |
| 01001 | DIVx | Divide ra by rb, place result in rd |
| 01010 | INVx | Divide 1 by rb, place result in rd |
| 01011 | ASUBAx | Absolute value of ra minus absolute value of rb , place in rd |
| 01100 | reserved |  |
| 01101 | MOVEX | Move ra to rd, multiple, for $n$ registers |
| 01110 | MOVEX | Move rb to rd, multiple, for n registers |
| 01111 | (see Table 10) | Single operand instructions, rb field redefined |
| 10000 | CPWx | Compare point to window (set XLT, XGT, YLT, TGT) |
| 10001 | CPVx | Compare point to volume (set XLT, XGT, YLT, YGT, ZLT, ZGT) |
| 10010 | BACKFx | Test polygon for facing direction (backface test) |
| 10011 | INMNMXX | Setup FPU registers for MNMX1 or MNMX2 instruction |
| 10100 | LINTx | Given [ $\mathrm{X} 1, \mathrm{Y} 1, \mathrm{Z} 1$ ], [X2, Y2, Z2], and a plane, find [ $\mathrm{X} 3, \mathrm{Y} 3, \mathrm{Z} 3$ ] |
| 10101 | CLIPFx | Clip a line to a plane pair boundary (start with point 1) |
| 10110 | CLIPRx | Clip a line to a plane pair boundary (start with point 2) |
| 10111 | CLIPCFx | Clip color values to a plane pair boundary (start with point 1) |
| 11000 | SCALEX | Scale and convert coordinates for viewpoint |
| 11001 | MTRANx | Transpose a matrix |
| 11010 | CKVTXx | Compare a polygon vertex to a clipping volume |
| 11011 | CONVx | $3 \times 3$ convolution |
| 11100 | CLIPCRx | Clip color values to a plane pair boundary (start with point 2) |
| 11101 | OUTC3x | Compare a line to a clipping value |
| 11110 | CSPLNx | Calculate cubic spline for given coefficients |
| 11111 | (see Table 11) | Vector and matrix instructions, rb field redefined |

$F$ denotes single-precision, $D$ denotes double-precision floating-point, $x$ denotes operand type, and a blank designates signed integer

## PROGRAMMING INFORMATION

Table 9. Coprocessor Mode Instructions, FPUOP = 01111,

| RB | TMS34020 ASSEMBLER OPCODE |  |
| :---: | :---: | :--- |
| 0000 | PASS | Copy ra to rd |
| 0001 | NOT | Place is complement of ra in rd |
| 0010 | ABS | Place absolute value of ra in rd |
| 0011 | NEG | Place negated value of ra in rd |
| 0100 | CVDF | Convert double in ra to single in rd (T and S define ra) |
| 0100 | CVFD | Convert single in ra to double in rd (T and S define ra) |
| 0101 | CVDI | Convert double in ra to integer in rd (T and S define ra) |
| 0101 | CVFI | Convert single in ra to integer in rd (T and S define ra) |
| 0110 | CVID | Convert integer in ra to double in rd (T and S define ra) |
| 0110 | CVIF | Convert integer in ra to single in rd (T and S define ra) |
| 0111 | VSCLx | Multiply each component of a velocity by a scaling factor |
| 1000 | SQARx | Place (ra * ra) in rd |
| 1001 | SQRTx | Extract square root or ra, place in rd |
| 1010 | SQRTAx | Extract square root of absolute value of ra, place in rd |
| 1011 | ABORT | Stop execution of any FPU instruction |
| 1100 | CKVTXI | Initialize check vertex instruction |
| 1101 | CHECK | Check for previous instruction completion |
| 1110 | MOVMEM | Move data from system memory to external memory @ MCADDR |
| 1111 | MOVMEM | Move data to system memory from external memory @ MCADDR |

Table 10. Coprocessor Mode Instructions, FPUOP = $\mathbf{1 1 1 1 1}_{2}$

| RB | TMS34020 ASSEMBLER OPCODE | DESCRIPTION |
| :---: | :---: | :--- |
| 0000 | POLYx | Polynomial expansion |
| 0001 | MACx | Multiply and accumulate |
| 0010 | MNMX1x | Determine 1-D minimum and maximum of a series |
| 0011 | MNMX2x | Determine 2-D minimum and maximum of a series of pairs |
| 0100 | MMPY0x | Multiply matrix elements $0,1,2,3$ by vector element 0 |
| 0101 | MMPY1x | Multiply matrix elements $4,5,6,7$ by vector element 1 |
| 0110 | MMPY2x | Multiply matrix elements $8,9,10,11$ by vector element 2 |
| 0111 | MMPY3x | Multiply matrix elements $12,13,14,15$ by vector element 3 |
| 1000 | MADDx | Add matrix elements $12,13,14,15$ to vector |
| 1001 | VADDx | Add two vectors |
| 1010 | VSUBx | Subtract a vector from a vector |
| 1011 | VDOTx | Compute scalar dot product of two vectors |
| 1100 | VCROSx | Compute cross product of two vectors |
| 1101 | VMAGx | Determine the magnitude of a vector |
| 1110 | VNORMx | Normalize a vector to unit magnitude |
| 1111 | VRFLCTx | Given normal and incident vectors, find the reflection |

$F$ denotes single-precision, $D$ denotes double-precision floating-point, $x$ denotes operand type, and a blank designates signed integer

## PROGRAMMING INFORMATION

## external instructions

External instructions are 32 bits long, and their formats (number, length, and function of fields) depend on the operations being selected. Separate formats are provided for data transfers, FPU processing, test and branch operations, and subroutine calls.
Instructions that control FPU operations can select operands from input registers, internal feedback, or from the LAD bus (32-bit operations only). The format for an FPU processing instruction is shown in Figure 50.

| 31 | 28 | 20 |  | 11 |  | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| OP | RA | RB | RD | SEL OP | INSTRUCTION |  |

Figure 50. FPU Processing External Instruction Format
The op field selects the sequencer operation. Three continue instructions are available to permit control of the $\overline{W E}$ and $\overline{\text { ALTCH }}$ strobe outputs, which enable LAD output in the host-independent mode. The ra, rb, and rd fields are for the two sources and destination in the SMJ34082A register file. The sel_op field selects the source of the operands: register file or feedback registers. The instruction field designates the operation to be performed.
External instructions and cycle counts are listed in Table 11. Absolute values of operands or results, negated results, and wrapped number inputs are selectable options. Chained operations, using the multiplier and ALU in parallel, and other instructions to control program flow and move data are included.

External instruction timing depends on the pipeline registers setting, controlled by the PIPES2-1 bits in the configuration register. Most FPU processing instructions (with the exception of divide, square root, and double-precision multiply) execute in one cycle per pipeline stage.

## PROGRAMMING INFORMATION

Table 11. External Instructions and Timing

| SMJ34082A ASSEMBLER OPCODE | DESCRIPTION OF ROUTINE | $\begin{gathered} \hline \text { PIPES2-1 } \\ 11 \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 10 \end{gathered}$ | $\begin{gathered} \hline \text { PIPES2-1 } \\ 01 \end{gathered}$ | $\begin{aligned} & \text { PIPES2-1 } \\ & 00 \end{aligned}$ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| ADD | Add A + B | 1(1) | 2(1) | 2(1) | 3(1) |
| AND | Logical AND A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| ANDNA | Logical AND NOT A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| ANDNB | Logical AND A, NOT B | 1(1) | 2(1) | 2(1) | 3(1) |
| CJMP | Conditional jump | 1(1) | 1(1) | 1(1) | 1(1) |
| CS.JR | Conditional jump to subroutine | 1(1) | 1(1) | 1(1) | 1(1) |
| CMP | Compare A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| COMPL | Pass is complement of $A$ | 1(1) | 2(1) | 2(1) | 3(1) |
| DIV | Divide A/B $S P$ DP integer | $\begin{gathered} 8(8) \\ 13(13) \\ 16(16) \\ \hline \end{gathered}$ | $\begin{gathered} 8(7) \\ 13(12) \\ 16(15) \end{gathered}$ | $\begin{gathered} 9(7) \\ 15(12) \\ 17(15) \end{gathered}$ | $\begin{gathered} 9(7) \\ 15(12) \\ 17(15) \end{gathered}$ |
| DTOF | Convert from DP to SP | 1(1) | 2(1) | 2(1) | 3(1) |
| DTOI | Convert from DP to integer | 1(1) | 2(1) | 2(1) | 3(1) |
| DTOU | Convert from DP to unsigned integer | 1(1) | 2(1) | 2(1) | 3(1) |
| FTOD | Convert from SP to DP | 1(1) | 2(1) | 2(1) | 3(1) |
| FTOI | Convert from SP to integer | 1(1) | 2(1) | 2(1) | 3(1) |
| FTOU | Convert from SP to unsigned integer | 1(1) | 2(1) | 2(1) | 3(1) |
| ITOD | Convert from integer to DP | 1(1) | 2(1) | 2(1) | 3(1) |
| ITOF | Convert from integer to SP | 1(1) | 2(1) | 2(1) | 3(1) |
| LD | Load $n$ words into register SP DP integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| LDLCT | Load loop counter with value | 1(1) | 1(1) | 1 (1) | 1(1) |
| LDMCADDR | Load MCADDR with value | 1(1) | 1(1) | 1(1) | 1(1) |
| MASK | Set programmable mask | 1(1) | 1(1) | 1(1) | 1(1) |
| MOVA | Move A (no status flags active) | 1(1) | 2(1) | 2(1) | 3(1) |
| MOVLM | Move n words from LAD bus to MSD bus SP DP integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| MOVML | Move n words from MSD bus to LAD bus SP <br> DP integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| MOVRR | Multiple move, register to register SP <br> DP integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| MULT.ADD | ```Multiply \(A_{1} * B_{1}\), Add \(A_{2}+B_{2}\) SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |

DP denotes double-precision, and SP denotes single-precision.

## PROGRAMMING INFORMATION

Tabale 11. External Instructions and Timing (Continued)

| SMJ34082A ASSEMBLER OPCODE OPCODE | DESCRIPTION OF ROUTINE | $\begin{gathered} \hline \text { PIPES2-1 } \\ 11 \end{gathered}$ | $\begin{gathered} \hline \text { PIPES2-1 } \\ 10 \end{gathered}$ | PIPES2-1 01 | PIPES2-1 00 |
| :---: | :---: | :---: | :---: | :---: | :---: |
| MULT.NEG | ```Multiply \(A_{1} * B_{1}\), Subtract \(0-A_{2}\) SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| MULT | $\begin{aligned} & \hline \text { Multiply A * B } \\ & \text { SP } \\ & \text { DP } \\ & \text { integer } \\ & \hline \end{aligned}$ | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| MULT.PASS | ```Multiply A1* B1, Add A2 + 0 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| MULT.SUB | ```Multiply \(A_{1} * B_{1}\), Subtract \(A_{2}-B_{2}\) SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| MULT.2SUBA | ```Multiply A1 * B1, Subtract 2- A2 SP DP integer``` | $\begin{aligned} & 1(t) \\ & 2(2) \\ & 1(t) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| MULT.SUBRL | ```Multiply \(A_{1} * B_{1}\), Subtract \(B_{2}-A_{2}\) SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| NEG | Pass - A (2s Complement) | 1(1) | 2(1) | 2(1) | 3(1) |
| NOR | Logical NOR A, B | 1(1) | 2(1) | 2(1) | 3(1) |
| OR | Logical OR A, B | 1(1) | 2(1) | 2(1) | $3(1)$ |
| PASS | Pass A | 1(1) | 2(1) | 2(1) | 3(1) |
| PASS | Pass B | 1(1) | 2(1) | 2(1) | 3(1) |
| PASS.ADD | ```Multiply A1* 1, Add A 2 + B2 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| PASS.NEG | ```Multiply A1* 1, Subtract 0-A A SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| PASS.PASS | ```Multiply A1 * 1, Add A A2+0 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |
| PASS.SUB | ```Multiply A1 * 1, Subtract A2- B2 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| PASS.2SUBA | ```Multiply A1*1, Subtract 2-A SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \end{aligned}$ |

DP denotes double-precision, and SP denotes single-precision.

## PROGRAMMING INFORMATION

Table 11. External Instructions and Timing (Continued)

| SMJ34082A ASSEMBLER OPCODE | DESCRIPTION OF ROUTINE | CYCLE COUNTS |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | $\begin{gathered} \hline \text { PIPES2-1 } \\ 11 \\ \hline \end{gathered}$ | $\begin{gathered} \text { PIPES2-1 } \\ 10 \end{gathered}$ | $\begin{aligned} & \text { PIPES2-1 } \\ & 01 \end{aligned}$ | $\begin{gathered} \text { PIPES2-1 } \\ 00 \end{gathered}$ |
| RTS | Return from subroutine | 1(1) | 1(1) | 1(1) | 1(1) |
| SLL | Logical shift left A by B bits | 1(1) | 2(1) | 2(1) | 3(1) |
| SQRT | ```Square root of \(A\) SP DP integer``` | $\begin{aligned} & 11(11) \\ & 16(16) \\ & 20(20) \end{aligned}$ | $\begin{aligned} & 11(10) \\ & 16(15) \\ & 20(19) \end{aligned}$ | $\begin{aligned} & 12(10) \\ & 17(15) \\ & 21(19 \end{aligned}$ | $\begin{aligned} & 12(10) \\ & 17(15) \\ & 21(18) \end{aligned}$ |
| PASS.SUBRL | ```Multiply A1 * 1, Subtract B2 - A2 SP DP integer``` | $\begin{aligned} & 1(1) \\ & 2(2) \\ & 1(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 2(1) \\ & 3(2) \\ & 2(1) \\ & \hline \end{aligned}$ | $\begin{aligned} & 3(1) \\ & 4(2) \\ & 3(1) \\ & \hline \end{aligned}$ |
| SRA | Arithmetic shift right $A$ by $B$ bits | 1(1) | 2(1) | 2(1) | 3(1) |
| SRL | Logical shift right $A$ by $B$ bits | 1(1) | 2(1) | 2(1) | 3(1) |
| ST | Store n words from register SP DP integer | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ | $\begin{gathered} n+1 \\ 2 n+1 \\ n+1 \end{gathered}$ |
| SUB | Subtract A-B | 1(1) | 2(1) | 2(1) | 3(1) |
| SUBRL | Subtract B-A | 1(1) | 2(1) | 2(1) | 3(1) |
| UTOD | Convert from unsigned integer to DP | 1(1) | 2(1) | 2(1) | 3(1) |
| UTOF | Convert from unsigned integer to SP | 1(1) | 2(1) | 2(1) | 3(1) |
| UWRAPI | Unwrap inexact operand | 1(1) | 2(1) | 2(1) | 3(1) |
| UWRAPR | Unwrap rounded operand | 1(1) | 2(1) | 2(1) | 3(1) |
| UWRAPX | Unwrap exact operand | 1(1) | 2(1) | 2(1) | 3(1) |
| WRAP | Wrap denormalized operand | 1(1) | 2(1) | 2(1) | 3(1) |
| XOR | Logical exclusive OR A, B | 1(1) | 2(1) | 2(1) | 3(1) |

DP denotes double-precision, and SP denotes single-precision.

## MECHANICAL DATA

GB pin-grid-array ceramic package
This is a hermetically sealed package.


NOTES: D. Pins are located within $0,13(0.005)$ radius of true position relative to each other at maximum meterial condition and within $0,457(0.018)$ radius of the center of the ceramic.
E. Dimensions do not include solder finish.

SMJ34082A
GRAPHICS FLOATING-POINT PROCESSOR
D3592, SEPTEMBER 1990 - REVISED MAY 1991 - SGUS012A

## Appendix D

## Maximizing Your MFLOPS with the TMS34082 and Motorola MC68030

This application report demonstrates one way that the TMS34082 floating-point processor can be coupled to a Motorola MC68030 microprocessor for high-performance and cost-effective, IEEE 74-1985 compatible, floating-point solutions.

## Overview

## Objectives

The TMS34082 Floating-Point Processor from Texas Instruments is a cost-effective, high-performance floating-point device. The objective of this application report is to demonstrate one way that the TMS34082 floating-point processor can be tightly coupled to a Motorola MC68030 microprocessor for high-performance and cost-effective, IEEE 754-1985 compatible, floating-point solutions. This application report is for Motorola MC680x0 users who interface to the VME/VSB bus, develop stand-alone systems, or who require fast floating-point processor solutions.

This document will show the simplicity and efficiency with which the TMS34082 interfaces with the Motorola MC68030 as a parallel floating-point processor. This report will also show the advanced floating-point capabilities of the TMS 34082 compared to the Motorola MC6888X family.

Direct comparisons have been made between Motorola's coprocessor family and the TMS34082 Floating-Point Processor. Table 1 in the performance analysis section details a comparison of the TMS34082 and the Motorola MC68881. The results clearly show the increase in performance realized by choosing the TMS34082 as the host floating-point processor. By operating the TMS34082 in parallel with the Motorola MC68030, multiple operations can be processed simultaneously for enhanced performance.

When running the TMS34082 floating-point processor in parallel with the Motorola MC68030 as a host, the host processor must ensure the floating-point processor is always busy. In addition, the host processor must also have access to the floating-point processor's outputs and complete control for immediate stalls or interrupts. Details of the system architecture can be found in the System Architecture section.

## TMS34082 Overview

The TMS34082 has features that are unique to floating-point processors. Some of these features are described below.

- Dual buses for accessing both data space and code space: This design allows you to download data over the LAD bus and transfer both instructions and data over the MSD bus, using the TMS34082's ability to simultaneously load instructions and operands over its two buses.
- Dynamic bus-switching: The CC pin can be triggered to affect an immediate jump to a preloaded address. Similar to an interrupt, this feature lets you jump straight to a routine in SRAM.
- Pipelining: The Harvard architecture within the TMS34082 allows pipelined data flow through the internal TMS34082 FPU, maximizing sustained throughput.
- Dynamic pipeline settings: Dynamic pipelining allows flexibility with data flow and feedbacks. Pipeline settings in the configuration register will direct feedback to registers, maximize throughput, or process vectors.
- FAST vs IEEE mode: The TMS34082 can function in fully IEEE 754-1985 compatible mode as well as in a mode that allows flushing all denormalizezd numbers to zero (FAST mode).
- Exception handling: The internal structure of the TMS34082 allows detection of status exceptions via software interrupts that generate address vectors to exception handling subroutines.

Internally, the TMS34082 has 22 onboard registers, which are well suited for matrix multiply graphic routines. It also has selectable data formats such as 32-bit integer, 32-bit floating-point, and 64-bit floating-point processors. In addition, there are internal programs for vector, matrix, and graphics operations.

The TMS34082s dual-bus structure gives you greater design flexibility. You can dynamically switch between the LAD and MSD buses while downloading instructions and data. Results are output on either the LAD or MSD bus.

## System Architecture

## System Overview

Memory mapping is chosen to interface the TMS34082 to the MotorolaMC68030 in this design because it is direct and yields high-performance solutions. Furthermore, memory mapping allows the designer the flexibility to develop the floating-point processing interface around system memory.

Parallel processing provides the greatest throughput when coupling the TMS34082 to the Motorola MC68030 processor. In this design, the parallel processing tasks use buffers for data, instructions, and output (see Figure 1). The TMS34082 receives instructions and secondary data via the MSD bus from a dual-port SRAM (DP-SRAM). The dual-port SRAM has been preloaded by the Motorola MC68030. Primary data is obtained over the LAD bus through a FIFO buffer, which has also been preloaded by the Motorola MC68030.
Employing a FIFO buffer to download data to the LAD bus makes effective use of the Motorola MC68030's blocking loading capability, thus freeing the host processor for other functions. The LAD bus FIFO buffer can block load the TMS34082's internal registers with minimal overhead.

After receiving the data, the TMS34082 completes its calculations and writes its results into the dual-port SRAM buffer. To communicate when the calculations have been completed, the TMS34082 can interrupt the Motorola MC68030 and tell it to poll to the dual-port SRAM for output. Alternately, an optimizing compiler can set up boundary limits indicating when the DP-SRAM is full.


Figure 1. Motorola MC68030 Interface to the TMS34082 - Block Diagram
The system is initialized through a bootstrap loader program. The TMS34082 reads its start-up data through the LAD bus and transfers it via the MSD bus to the DP-SRAM. The first word of data is used to load the configuration register. After 65 clock cycles, the onboard program counter resets itself to 0 and reads from that address in the DP-SRAM.

The Motorola MC68030 receives its code and data from a dual-port, $8 \mathrm{~K} \times 32$ SRAM. The SRAM information is uploaded from an PC/AT supervisory host through address and data buffers. The bus arbitration handshaking between the PC/AT bus and the Motorola MC68030 is accomplished by I/O mapping on the PC/AT.

Maximum throughput could be realized with an optimizing compiler by grouping functions and operands so that calculations can be pipelined and the registers can be loaded as a block.

As a download host, the IBM PC/AT is accessible to most users, allowing the duplication of this design with relative ease.

## Objectives and Trade-Offs

The design objectives during the initial phases of the project were, in order of rank: performance, cost, size, and power. To maximize performance while keeping costs to a minimum, the following guidelines were used:

- Gain in performance should be commensurate with the gain in cost. In other words, a $10 \%$ increase in performance must be justified by no more than $10 \%$ gain in cost.
- A primary objective was to demonstrate the TMS34082's full capabilities by operating at maximum speed without wait states. This is accomplished by using parts that sufficiently meet the TMS34082 throughput requirements for maximum performance.

There are two schools of thought in processing floating-point operands. The first is to load all data from the Motorola MC68030 host through the FIFOs onto the LAD bus. Instructions and other data are loaded into the DP-SRAM, which the TMS34082 could access over the MSD bus. Results are then placed back into the DP-SRAM and read by the Motorola MC68030. This method is slightly faster, but requires a more sophisticated compiler.

The other approach is to toggle the CC signal to the TMS34082. CC is activated by setting the appropriate mask bit in the configuration register. Toggling CC signals the TMS34082 program loads an address vector over the LAD bus that points to an MSD address in external memory. The TMS34082 then executes the routine at this address. This example is useful when the DP-SRAM acts as a monitor and contains routines that are accessed frequently. An optimizing compiler would load relevant operands to the DP-SRAM or to TMS34082 internal registers and then point to a routine contained in DP-SRAM. The trade-off is that one clock cycle is lost in the jump process, but the compiler would have less overhead.

## Software Description

## Overview of Code Development

The objective of the software programs is to demonstrate the full capabilities of the TMS34082. Operands to the TMS34082 are represented in single-precision, double-precision, and integer formats.

Other features presented in these programs are:

- matrix operations,
- conversions between formats,
- arithmetic operations,
- vector processing,
- feedback operations,
- internal ROM routines,
- and block moves, making efficient use of the internal register set.

All of the resident software has been written in the processor's respective assembler language. Software driving the PC/AT is primarily written in C or assembly language.

Code development necessarily begins with the TMS34082. This then becomes the data code for the Motorola MC68030. Routines are written in Motorola MC68030 assembly language to handle data uploads to the FIFO, both uploads and downloads of data/instruction code to the DP-SRAM, and Motorola MC68030 host code resident in the $8 \mathrm{~K} \times 32$ host SRAMs.

The initial code supplied to hostSRAMs is transferred from the PC/AT. The resident Motorola MC68030 assembly language routines are translated from that format to one that the PC/AT recognizes.

The test software developed for this system writes and reads data from the host SRAM to test for correctness, address range functionality, and setup time validity. In addition, it allows thorough testing of the PC/AT bus and validation of hostmemory setup and hold times. The MS-DOS debugger is initially used for testing, while C code is implemented for more thorough test capabilities. In addition, the C code allows for ready upload and download of system software routines.

## Big Endian, Little Endian

Programmers of this system must take into consideration the differences between Big Endian and Little Endian. The Motorola MC68030 device memory can be addressed on a byte-by-byte basis. The data for each byte in a 32-bit word (long word) is in order from most significant to least significant bit. But, the bytes are arranged in order of least significant to most significant (Little Endian). Intel microprocessors reverse their bytes as compared to Motorola processors. Intel arranges bytes from most significant to least significant (Big Endian). Figure 2 illustrates further details on byte arrangement. The hardware description, Appendix B, details more information on mixed implementation of Big Endian/Little Endian.

The PC/AT's backplane uses a different technique to address memory. A byte starts on an addressable byte boundary. A word consisting of two bytes starts on an arbitrary boundary, and the high byte corresponds to a high address (see Figure 2), while the low byte corresponds to a low address.

Code written for this design must take these data formats into consideration.

| Long Word \$ 00000000 |  |  |  | \$0000 0000 |
| :---: | :---: | :---: | :---: | :---: |
| Word \$ 00000000 |  | Word \$ 00000002 |  |  |
| Byte \$0000 0000 | Byte \$ 00000001 | Byte \$ 00000002 | Byte \$0000 0003 |  |
| Long Word \$ 00000004 |  |  |  | \$0000 0004 |
| Word \$ | 0000004 | Word \$ | 000008 |  |
| Byte \$ 00000004 | Byte \$ 00000005 | Byte \$ 00000006 | Byte \$ 00000007 |  |

Intel 80286

| Data |  | Address |
| :---: | :---: | :---: |
| Word \$00000 |  | $\$ 0000$ |
| Byte $\$ 00001$ | Byte $\$ 00000$ |  |
| Word \$00002 |  |  |
| Byte \$00003 |  | Byte $\$ 00002$ |

TMS34082

| Data | Address |
| :---: | :---: |
| Long Word $\$ 0000$ | $\$ 0000$ |
| Long Word \$0001 | $\$ 0001$ |

Figure 2. Data Organization in Memory

## TMS34082 Code Development

Code for the TMS34082 includes abootstrap loader, hardware test, and processorroutines. During startup, routines confirm proper operation of all supporting hardware such as the SRAMs and FIFOs, to ensure the functioning of interfaces to the Motorola MC68030 host and to evaluate the accuracy of internal TMS34082 firmware.

A simple walking 1s and 0s is used to check the DP-SRAMs. The FIFOs can be checked with the bootstrap routine to verify that the proper data is being clocked through the device. In addition, the bootstrap also confirms LAD to MSD bus transfers. The bootstrap is enacted by the Motorola MC68030 by asserting the $\overline{\mathrm{HALT}}$ and the $\overline{\mathrm{INTR}}$ pins. (Consult the TMS34082 data sheet for bootstrap timing characteristics.)

The main software routine will make use of all the relevant internal instructions that demonstrate the TMS34082's processing capabilities. Two subprograms demonstrating the device's superior floating-point capabilities in processing matrix-multiply and transcendental functions are also included. Further, the TMS34082 can be reset either by the host processor, by the PC/AT, or manually.


Figure 3. Block Diagram - TMS34082 Code

## Motorola MC68030 Code Development

The Motorola MC68030 software is divided into six sections:

1. test,
2. read data/code from host SRAM,
3. output code to FIFOs and DP-SRAMs,
4. retrieve code from DP-SRAM,
5. and write data back to host SRAM.

Transferring data is relatively simple and can be seen in detail under the Software Listing section. The fundamental purpose of the test section is to check the DP-SRAM access and functionality from the Motorola MC68030 side. Checkout of the FIFOs has already been completed by the TMS34082 software. The host SRAM needs to be checked by the PC/AT before loading and by the Motorola MC68030 upon startup to verify correct dual access after arbitration (see Figure 4).


Figure 4. Block Diagram - Motorola MC68030 Code

## Intel 80286 Code Development

The PC/AT code has five primary functions (see Figure 5):

1. to upload and download code to Motorola MC6́88030's host SRAM,
2. to test hardware,
3. to provide for a convenient development platform,
4. to perform as a supervisory controller,
5. and to establish communication with the host system.


Figure 5. Block Diagram - PC/AT Code

## Hardware Description

## Overview

The board is an add-on card that fits into a 16-bit PC/AT slot. The speed at which the PC/AT bus accesses the board is not critical, since it acts as a supervisory host only. Its purpose is to transfer data to the $8 \mathrm{~K} \times 8$ SRAM, control bus arbitration, and to read status signals on the card. See the enclosed schematics section for details.

The hardware is best described by breaking the board down into two subsystems: the PC/AT interface and the Motorola MC68030 subsystem.

## PC/AT Interface

Two types of data are down loaded from the PC/AT to the Motorola MC68030: memory and I/O information. In the PC/AT, main memory is established for addresses 000000 H to $07 \mathrm{FFFFH}(512 \mathrm{~K})$, I/O expansion ROM for locations $0 \mathrm{C} 0000 \mathrm{H}-0 \mathrm{DFFFFH}$, and prototype card I/O addresses for $300 \mathrm{H}-31 \mathrm{FH}$. The software for this design makes use of all three of these memory ranges. All system development software is written in the 512 K bytes of user memory space.

To down load data to the board's 8 K words host SRAM an address in the middle of the PC/AT's I/O expansion ROM memory is chosen, 0 D 0000 H . Thus, 8 K words ( 32 bits wide) is placed in addresses $0 \mathrm{D} 0000 \mathrm{H}-0 \mathrm{D} 8000 \mathrm{H}$ (See Figure 6).


Figure 6. PC/AC Interface: I/O and Memory Addressing
Since the PC/AT system bus is 16 bits wide, it needs to match the 32 -bit logic of the Motorola MC68030/TMS34082 system. Interface decode logic handles this by way of an odd/even address toggle, i.e. an even address indicates the lower 16 bits and an odd address indicates the upper 16 bits. Data must be loaded sequentially for this to function properly. An alternative would be to use the dynamic bus sizing capabilities of the Motorola MC68030, but this would require additional handshake logic and minimize real estate for future expansion plans.

The PC/AT Technical Reference Manual recommends that prototype I/O addresses lie between 310 H and 31 FH . Input/output data is configured to be read and written from address 318 H . I/O is mainly used to control bus arbitration, to read status information, and to avoid address bus contention.

If you are only designing with the Motorola MC68030 and the TMS34082, then no design changes need to be made to compensate for byte addressing. However, if you prefer to implement the design as it is described in this report, byte addressing is of concern. While both Motorola's MC68030 and Intel's 80286 have their MSB at the leftmost position, the order in which the bytes are addressed is reversed, also known as a Big Endian/Little Endian format.

Two simple solutions present themselves. The first, a software solution, is to write code to reverse the byte order. The second, a hardware solution, is to simply reverse the data bits to match the byte order. For a prototype system such as this, a software solution is the preferred choice.

The PC/AT interface subsystem uses four PALs to handle address decoding for memory and I/O mapping, status acquisitions, and bus arbitration (See Figure 1).

## Host Processor Interface

This design operates in purely synchronous mode, reducing the overhead logic required to notify the Motorola MC68030 of data size acknowledgements and reducing instruction overhead.

To assist the system designer in applying the TMS34082 to their Motorola MC68030-based system, the following guidelines have been used:

- Interrupts to the Motorola MC68030 are disabled by pulling the signals IPLD $0, \overline{\mathrm{IPL}} 1, \overline{\mathrm{IPL}} 2 \mathrm{high}$. This implies that $\overline{\text { AVEC }}$ is tied high, which also simplifies synchronous operations.
- Occasionally, the TMS34082 and the Motorola MC68030 may attempt to access the same address location in the DP-SRAM, causing a collision. This contention is handled by having the DP-SRAM's BUSY flag pull the $\overline{\mathrm{BERR}}$ and the $\overline{\mathrm{HALT}}$ signals low simultaneously to delay the current cycle.
- Because this application always uses 32-bit data formats, $\overline{\mathrm{DSACK}} 0$ and $\overline{\mathrm{DSACK}} 1$ are pulled high to prevent assertion during synchronous operation with STERM.
- $\overline{\text { STERM }}$ is decoded as a synchronous bus cycle terminator. This also reduces bus cycle delays due to misaligned transfers as they are always 32 bits wide.
- Since this project employs relatively fast SRAMs, external cache is not needed and Motorola MC68030 internal cache is not used. Therefore, $\overline{\mathrm{CIN}}, \overline{\mathrm{CDIS}}$, and $\overline{\mathrm{CBACK}}$ are tied high. This also assists in stabilizing the setup and hold times during periods when $\overline{\mathrm{AS}}$ is asserted during sychronous operations.
- The memory management features of the Motorola MC68030 are not used. Consequently, $\overline{\text { MMUDIS }}$ is tied high.
- Arbitration between the PC/AT bus and the Motorola MC68030's bus is handled by onboard PAL logic.

Memory Addressing has been encoded as follows: host SRAM accesses at 00008000 H, DP-SRAM accesses at 00010000 H , and FIFO accesses at location 00020000 H . Thus, address bits 15,16 , and 17 can be used for each individual memory access.


Figure 7. Motorola MC68030 Interface: Memory Addressing

## TMS34082 as a Parallel Processor

Interfacing to the TMS34082 is simple and direct. This project emphasizes a design approach that requires minimal support hardware. By coupling a FIFO buffer directly to the LAD bus, external address latching and decoding is not required. From the MSD bus, the TMS34082 is directly coupled to the DP-SRAM, further reducing the decode hardware.

The data/code space is directly linked to address locations $0000 \mathrm{H}-07 \mathrm{FFH}$ and could be expanded to 64 K words as required. For further information regarding pin definitions and electrical characteristics, please refer to the TMS34082 data sheet.

## Performance Analysis

To accurately compare the performance of two coprocessors produced by two different manufacturers, it is essential to incorporate commonalties. A preliminary analysis has been completed that compares execution times of functions that are similar to the Motorola MC68881 and the TMS34082 (see Table 1).

The TMS 34082 typically speeds execution by 30-40 times. This does not take into consideration effective addressing, overlap, and pipelining, which widens the gap between the TMS34082's execution times and those of the Motorola coprocessor family. Detailed calculations are available upon request.

Table 1. Performance Comparison Chart

| Generic Instruction | Format/ Precision | TMS34082 |  | Motorola MC68881 |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | TI Instruction Syntax | Execution Time | Motorola Instruction Syntax | Execution Time |
| Add | Integer | ADD | 2 | FADD | 80 |
|  | Single | ADDF | 2 |  | 72 |
|  | Double | ADDD | 2 |  | 78 |
| Divide | Integer | DIV | 16 | FDIV | 132 |
|  | Single | DIVF | 7 |  | 124 |
|  | Double | DIVD | 13 |  | 130 |
| 1s Complement | Integer | NOT | 2 | FCMP | 62 |
|  | Single |  | 2 |  | 54 |
|  | Double |  | 2 |  | 60 |
| Absolute Value | Integer | ABS | 2 | FABS | 62 |
|  | Single |  | 2 |  | 54 |
|  | Double |  | 2 |  | 60 |
| Negate | Integer | NEG | 2 | FNEG | 62 |
|  | Single |  | 2 |  | 54 |
|  | Double |  | 2 |  | 60 |
| Multiply | Integer | MPY | 2 | FMUL | 100 |
|  | Single | MPYF | 2 |  | 92 |
|  | Double | MPYD | 3 |  | 98 |
| Square Root | Integer | SQRT | 20 | FSQRT | 134 |
|  | Single | SQRTF | 10 |  | 126 |
|  | Double | SQRTD | 16 |  | 132 |
| Subtract | Integer | SUB | 2 | FSUB | 80 |
|  | Single | SUBF | 2 |  | 72 |
|  | Double | SUBD | 2 |  | 78 |

## System Information — Parts List

Table 2. Parts List

| Reference Designation | Name | Pins | WIDTH <br> (MILS) |
| :---: | :---: | :---: | :---: |
| U1, U2 | PAL20L8 | 24 | 300 |
| U3 | PAL16RA8 | 20 | 300 |
| U4 | PAL16R4 | 20 | 300 |
| U5, U6, U7, U8 | 74BCT245 | 20 | 300 |
| U9 | 74AS74 | 14 | 300 |
| U10 | Motorola MC68030 | $13 \times 13$ | PGA |
| U11, U12 | 74F08 | 14 | 300 |
| U13 | PAL22V10 | 24 | 300 |
| U14 | 74F08 | 14 | 300 |
| U15, U16, U17, U18 | 7C185 | 28 | 300 |
| U19 | IDT7132 | 48 | 600 |
| U20, U21, U22 | IDT7142 | 48 | 600 |
| U23, U24, U25, U26 | 74ALS2232 | 24 | 300 |
| U27 | 74AS74 | 14 | 300 |
| U28 | TMS34082 | $15 \times 15$ | PGA |
| U29 | 74BCT244 | 20 | 300 |
| U30 | 74F74 | 24 | 300 |
| U31, U32, U33 | 74BCT244 | 20 | 300 |
| U34 | 74F74 | 14 | 300 |
| U35 | 74F08 | 14 | 300 |
| U36 | 74F374 | 20 | 300 |
| U37 | 74F08 | 14 | 300 |
| U38 | 74F00 | 14 | 300 |
| RP1, RP2, RP3 | 10K Puill-up Resistor | 9 | SIP |
| HDR1 | Platform | 16 | 300 |
| C1-41 | $0.01 \mu \mathrm{~F}$ Capacitor |  |  |
| C42 | $10 \mu \mathrm{~F}$ Capacitor |  |  |
| X1 | 25 MHZ Oscillator | 14 | 300 |
| X2 | 40 MHZ Oscillator | 14 | 300 |
| SW1 | SPST Switch |  |  |
| SW2 | SPDT Switch |  |  |
| SW3 | 8 POS. DIP SW | 16 | 300 |

Schematics - Hardware Design


Figure 8(a). Block Diagram


Figure 8(b). Block Diagram


Figure 9. PC/AT I/F and Control, Details of U1, U5, U6, U7, and U8


Figure 10. PC/AT I/F and Control, Details of U29


Figure 11. PC/AT I/F and Control, Details of U2, U3, and U4


Figure 12. Motorola MC68030 and Address Buffers, Details of U31, U32, and U33


Figure 13. Motorola MC68030 and Address Buffers, Details of U10


Figure 14. Motorola MC68030 and Address Buffers, Details of Oscillator and U30


Figure 15. Motorola MC68030 Decode/Control, Details of U11


Figure 16. Motorola MC68030 Decode/Control, Details of RP1, RP2, and RP3


Figure 17. Motorola MC68030 Decode/Control, Details of U11, U12, U13, and U30


Figure 18. $8 \mathrm{~K} \times 8$ SRAM DP-SRAM, Details of U15, U16, U17, and U18

Table 3. 8K x 8 SRAM DP-SRAM, Detail Pin Assignments for U15, U16, U17, and U18

| Device |  | 8K $\times 8$ SRAM |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | U15 | U16 | U17 | U18 |
| Name | in <br> Number | External Connection Signal Name |  |  |  |
| A0 | 21 | A2 | A2 | A2 | A2 |
| A1 | 23 | A3 | A3 | A3 | A3 |
| A2 | 24 | A4 | A4 | A4 | A4 |
| A3 | 25 | A5 | A5 | A5 | A5 |
| A4 | 2 | A6 | A6 | A6 | A6 |
| A5 | 3 | A7 | A7 | A7 | A7 |
| A6 | .4 | A8 | A8 | A8 | A8 |
| A7 | 5 | A9 | A9 | A9 | A9 |
| A8 | 6 | A10 | A10 | A10 | A10 |
| A9 | 7 | A11 | A11 | A11 | A11 |
| A10 | 8 | A12 | A12 | A12 | A12 |
| A11 | 9 | A13 | A13 | A13 | A13 |
| A12 | 10 | A14 | A14 | A14 | A14 |
| 1/00 | 11 | D0 | D8 | D16 | D24 |
| 1/01 | 12 | D1 | D9 | D17 | D25 |
| I/O2 | 13 | D2 | D10 | D18 | D26 |
| 1/03 | 15 | D3 | D11 | D19 | D27 |
| 1/04 | 16 | D4 | D12 | D20 | D28 |
| I/05 | 17 | D5 | D13 | D21 | D29 |
| 1/06 | 18 | D6 | D14 | D22 | D30 |
| 1/07 | 19 | D7 | D15 | D23 | D31 |
| $\overline{O E}$ | 22 | 8KRDEN | 8KRDEN | 8KRDEN | 8KRDEN |
| $\overline{\mathrm{CE}} 1$ | 20 | LLCS | LMCS | UMCS | UUCS |
| CE2+ | 26 | VCC | VCC | $\mathrm{V}_{\mathrm{CC}}$ | $V_{C C}$ |
| $\overline{W E}$ | 27 | 8KWREN | 8KWREN | 8KWREN | 8KWREN |
| NC | 1 | No Connection |  |  |  |



Figure 19. Motorola MC68030 Decode/Control, Details of U3 and U37


Figure 20. 8K x 8 SRAM DP-SRAM, Details of U14 and U19


Figure 21. FIFO Logic, Details of U23, U24, U25, and U26

Table 4. FIFO Logic, Detail Pin Assignments for U23, U24, U25, and U26

| Device |  | SN74ALS2232 |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Block Number |  | U23 | U24 | U25 | U28 |
| Pin |  | External Connection Signal Name |  |  |  |
| D0 | 2 | D0 | D8 | D16 | D24 |
| D1 | 3 | D1 | D9 | D17 | D25 |
| D2 | 4 | D2 | D10 | D18 | D26 |
| D3 | 5 | D3 | D11 | D19 | D27 |
| D4 | 7 | D4 | D12 | D20 | D28 |
| D5 | 8 | D5 | D13 | D21 | D29 |
| D6 | 9 | D6 | D14 | D22 | D30 |
| D7 | 10 | D7 | D15 | D23 | D31 |
| $\overline{\text { RST }}$ | 1 | RESET | $\overline{\text { RESET }}$ | $\overline{\text { RESET }}$ | $\overline{\text { RESET }}$ |
| $\overline{\text { FULL }}$ | 11 | $\overline{68 H A L T}$ | 68HALT | $\overline{68 H A L T}$ | 68HALT |
| LOCK | 12 | FIFOWR | FIFOWR | FIFOWR | FIFOWR |
| Q0 | 23 | LADO | LAD8 | LAD16 | LAD24 |
| Q1 | 22 | LAD1 | LAD9 | LAD17 | LAD25 |
| Q2 | 21 | LAD2 | LAD10 | LAD18 | LAD26 |
| Q3 | 20 | LAD3 | LAD11 | LAD19 | LAD27 |
| Q4 | 18 | LAD4 | LAD12 | LAD20 | LAD28 |
| Q5 | 17 | LAD5 | LAD13 | LAD21 | LAD29 |
| Q6 | 16 | LAD6 | LAD14 | LAD22 | LAD30 |
| Q7 | 15 | LAD7 | LAD15 | LAD23 | LAD31 |
| $\overline{\mathrm{OE}}$ | 24 | $\overline{\text { COINT }}$ | $\overline{\text { COINT }}$ | $\overline{\text { COINT }}$ | COINT |
| EMPTY | 14 | FIFOSTAL | FIFOSTAL | FIFOSTAL | FIFOSTAL |
| UNCK | 13 | FIFORD | FIFORD | $\overline{\text { FIFORD }}$ | FIFORD |



Figure 22. $8 \mathrm{~K} \times 8$ SRAM DP-SRAM, Details of U20


Figure 23. FIFO Logic, Details of U21


Figure 24. FIFO Logic DP-SRAM, Details of U22


Figure 25. FIFO Logic, Details of U14


Figure 26. TMS34082, Details of U38


Figure 27. TMS34082, Details of U28


Figure 28. TMS34082, Details of U28


|  |  |  |
| :--- | :--- | :--- |
| J1 | A6 | ATDT3 |
| J1 | A7 | ATDT2 |
| J1 | A8 | ATDT1 |
| J1 | A9 | ATDT0 |
| J1 | A20 | ATCLK |
| J1 | A12 | ATAD19 |
| J1 | A13 | ATAD18 |
| J1 | A14 | ATAD17 |
| J1 | A15 | ATAD16 |
| J1 | A16 | ATAD15 |
| J1 | A17 | ATAD14 |
| J1 | A18 | ATAD13 |
| J1 | A19 | ATAD12 |
| J1 | A20 | ATAD11 |
| J1 | A21 | ATAD10 |
| J1 | A222 | ATAD9 |
| J1 | A23 | ATAD8 |
| J1 | A24 | ATAD7 |
| J1 | A25 | ATAD6 |
| J1 | A26 | ATAD5 |
| J1 | A27 | ATAD4 |
| J1 | A28 | ATAD3 |
| J1 | A29 | ATAD2 |
| J1 | A30 | ATAD1 |
| J1 | A31 | ATAD0 |

Figure 29. AT-Bus Connector


Figure 30. Capacitors

## PAL ${ }^{\circledR}$ Code Listing

NOTE: All code is written using PAL ${ }^{\circledR}$ ASM software.

| Memory Decode for TMS 34082 Accelerator Board |  |
| :--- | :--- |
| PATTERN | MEMORY DECODE FUNCTIONS |
| REVISION | 1A |
| AUTHOR | MIKE ROBERTS |
| COMPANY | TEXAS INSTRUMENTS |
| DATE | OCTOBER 11, 1989 |

; This PAL will decode memory functions from the PC/AT to the Motorola MC68030 host SRAM.
CHIP 20 L 8 PAL20L8

$\left[\begin{array}{ccc}10 & 11 & 12 \\ \text { AOWR } & \text { BALE } & \text { GND }\end{array}\right.$

| $;$ PIN |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $;$ | 13 | 14 | 15 | 16 | 17 | 18 |
|  | ISMEMRD | ISMEMWR | IATHIEN | ATAD23 | ATAD22 | ATEN1 |
|  |  |  |  |  |  |  |
|  | 19 | 20 | 21 | 22 | 23 | 24 |
|  | ATCNTL | IATADDREN | ATEN2 | IATLOEN | AEN | $V_{C C}$ |

## EQUATIONS

## ; ALWAYS ENABLED

| ATLOEN.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| :--- | :--- |
| ATHIEN.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| ATADDREN.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| ATCNTL.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| ATEN1.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| ATEN2.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |

; ABOVE EQUATIONS NOT REQUIRED SINCE PAL ASM DEFAULTS TO THESE.
; THEY HAVE BEEN ADDED FOR CLARITY.
ATEN1 =/BALE * ATAD19 * ATAD18 * ATAD16 */ATAD17 */AEN
;USED AS A GATE TO ASSERT ACCESS TO HOST SRAM

ATEN2 $=/($ ATAD23 + ATAD22 + ATAD21 + ATAD20 $)$
; INTERMEDIATE TERM DESELECTING ADDRESS LINES 23-20

```
ATLOEN = ATEN1 */ATADO1 */SMEMRD * ATEN2
    + ATEN1 */ATAD01 * ISMEMWR * ATEN2
```

; DECODES LOW BYTE IN EITHER READ OR WRITE MODE
ATHIEN = ATEN1 * ATAD01 * /SMEMRD * ATEN2

+ ATEN1 * ATAD01 */SMEMWR * ATEN2
; DECODES HIGH BYTE IN EITHER READ OR WRITE MODE
PAL is a registered trademark of Monolithic Memories Inc.


# ATADDREN $=$ ATEN1 */ATAD01 */SMEMRD * ATEN2 + ATEN1 */ATAD01 * /SMEMWR * ATEN2 <br> + ATEN1 * ATAD01 */SMEMRD * ATEN2 <br> ; ENABLES THE ADDRESS BUFFERS FOR HIGH OR LOW BYTE 

I/O Decode for TMS34082 Accelerator Board
PATTERN DECODE CONTROL FUNCTIONS
REVISION 1A
AUTHOR MIKE ROBERTS
COMPANY TI
DATE 10/10/89
; This PAL will decode I/O functions to set I/O signals.
; DEVICE U2

| CHIP | 20 L |  | PAL20L |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ; PIN |  |  |  |  |  |  |  |  |
| ; 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| ATADO | ATAD1 | ATAD2 | ATAD3 | ATAD4 | ATAD5 | ATAD6 | ATAD7 | ATAD8 |
| 10 | 11 | 12 | 13 | 14 |  |  |  |  |
| ATAD9 | ATAD10 | GND | /IORD | /IOWR |  |  |  |  |
| ; PIN |  |  |  |  |  |  |  |  |
| 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
| NC124 | /STATUSRD/CNTRLENNC2 |  |  | NC3 | NC4 | NC5 | SYSCNTL | BALE |
|  |  |  |  |  |  |  |  |  |
| VCC |  |  |  |  |  |  |  |  |

EQUATIONS
; ALWAYS ENABLED

| ISTATUSRD.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| :--- | :--- |
| ICNTRLEN.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| SYSCNTL.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| NC1.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| NC2.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| NC3.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| NC4.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |

CNTRLEN = /BALE *ATAD9 * ATAD8 * ATAD4 * ATAD3 */ATAD2 */ATAD1* /IOWR ; USED TO ENABLE I/O WRITE REGISTER FOR PC/AT Motorola MC68030 ARBITRATION AND CONTROL OF TMS34082 HALT FUNCTIONS

STATUSRD $=/$ IORD */BALE * ATAD9 * ATAD8 * ATAD4 * ATAD3 */ATAD2 * ATAD1
; READ STATUS FROM ASYNCHRONOUS PAL
; USED TO READ STATUS FROM STATUS REGISTER
INVERT = ATADO
; USED TO INVERT PC/AT ADDRESS 0

```
Status Control for TMS34082 Accelerator Board
PATTERN STATUS CONTROL FUNCTIONS
REVISION 1A
AUTHOR MIKE ROBERTS
COMPANY TI
DATE 10/16/89
```

; This PAL will decode memory from the Motorola MC68030 to external DP-SRAM and the FIFO buffer.
CHIP 16RA8 PAL16RA8

| ; DEVICE U3 |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 12 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| PRLD NC2 | /RESET | NC3 | NC4 | NC5 | /BG | CLK | NC6 |
| 10 |  |  |  |  |  |  |  |
| DBEN |  |  |  |  |  |  |  |
| ; PIN |  |  |  |  |  |  |  |
| $11 \quad 12$ | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
| /STATUSRD NC7 | NC8 | ATDT2 | ATDT1 | ATDTO | NC9 | NC10 | NC11 |
| $20 \quad 21$ | 22 | 23 | 24 |  |  |  |  |
| DPRD TEEM | /FIFOWR | NC5 | $\mathrm{V}_{\mathrm{CC}}$ |  |  |  |  |

## EQUATIONS

$$
\begin{aligned}
\text { TERM } & =68 \mathrm{RW}{ }^{*} \mathrm{~A} 14^{*} / \mathrm{A} 15^{*} / \mathrm{AS} \\
& +/ 68 \mathrm{RW}{ }^{*} \mathrm{~A} 14^{*} / \mathrm{A} 15^{*} / \mathrm{DAS} \\
& +/ 68 \mathrm{RW} * / \mathrm{A} 14^{*} / \mathrm{A} 15^{*} / \mathrm{DAS}
\end{aligned}
$$

; SYNCHRONOUS TERMINATION SIGNAL FOR FIFO AND DP-SRAM
$/$ FIFOWR $=/(/ 68 R W$ * $/$ DBEN * AS * $/$ A14 * A15 $)$
; FIFO WRITE ENABLE. SINCE FIFO IS EDGE-TRIGGERED, THESE SIGNALS ARE RECOMMENDED.

```
DPCE = 68R2 * A14 */A15 */AS
    +/68RW * A14 */A15 */DAS
```

; DUAL-PORT CHIP ENABLE

```
DPRD =68RW * A14 * A15 * IASI; 8K SRAM READ SELECT
DPWRUU = A14 */A15 */A1 */A0 */68RW
; BYTE ENABLE SELECTS FOR UPPER-UPPER BYTE
DPWRUM = A14 */A15 */A1 * A0 */68RW
    + A14 */A15 */A1 */68RW */SIZ0
    + A14 */A15 */A1 */68RW * SIZ1
```

; BYTE ENABLE FOR UPPER-MIDDLE BYTE
DPWRLM $=\mathrm{A} 14$ * $/ \mathrm{A} 15$ * A1 */A0 */68RW
$+\mathrm{A} 14^{*} / \mathrm{A} 15^{*} / \mathrm{A} 1$ */ $/ 68 \mathrm{RW}$ * $/ \mathrm{SIZ1}$ * SIZ0

$$
\begin{aligned}
& +\mathrm{A} 14^{*} / \mathrm{A} 15^{*} / \mathrm{A} 1 \text { * } / 68 \mathrm{RW}{ }^{*} \mathrm{SIZ1} \text { * } \mathrm{SIZO} \\
& +\mathrm{A} 14^{*} / \mathrm{A} 15^{*} / \mathrm{A} 1^{*} / \mathrm{A0} * / 68 \mathrm{RW} \text { */SIZ }
\end{aligned}
$$

; BYTE ENABLE FOR UPPER-LOWER BYTE
DPWRLL $=A 14$ * $/ A 15$ * A1 * A0 */68RW

+ A14 */A15 * A0 */68RW * SIZ1 * SIZ0
+ A14 */A15 * /68RW */SSIZ1 * /SIZ0
+ A14 */A15 * A1 * /68RW * SIZ1
; BYTE ENABLE FOR LOWER-LOWER BYTE

| Byte Enable Decode for TMS34082 Accelerator Board |  |
| :--- | :--- |
| PATTERN | DECODE CONTROL FUNCTION |
| REVISION | 1A |
| AUTHOR | MIKE ROBERTS |
| COMPANY | TI |
| DATE | $10 / 12 / 1989$ |

; This PAL will decode byte enables of the Motorola MC68030 and PC/AT bytes to the 8K-SRAM.
CHIP 16R4 PAL16R4

EQUATIONS
; ALL FUNCTIONS ARE ALWAYS ENABLED
BR

$$
:=A T D T 3 * B G
$$

; BUS REQUEST TO Motorola MC68030, ONLY ACTIVE WHEN Motorola MC68030 BUS GRANT HIGH

BGACK : = ATDT2 */BG
; BUS GRANT ACKNOWLEDGE SIGNAL FROM PC/AT, ACTIVE WHEN BUS GRANTED
68RST $\quad:=$ NATDT4
; THIS SIGNAL RESETS THE Motorola MC68030
TIRST $:=$ ATDT5
; THIS SIGNAL RESETS THE TMS34082

| Pattern Decode for TMS 34082 Accelerator Board |  |
| :--- | :--- |
| PATTERN | DECODE CONTROL FUNCTION |
| REVISION | 1A |
| AUTHOR | MIKE ROBERTS |
| COMPANY | TI |
| DATE | $10 / 12 / 1989$ |

; This PAL will decode memory from the Motorola MC68030 to external devices.

| CHIP | 22V10 |  | PAL22V10 |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ;PIN |  |  |  |  |  |  |  |  |
| ; 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| NC1 | A19 | A18 | /68RW | AO | A1 | A15 | A16 | A17 |
| 10 | 11 | 12 |  |  |  |  |  |  |
| A30 | DAS | GND |  |  |  |  |  |  |
| 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| IC1 | /8KWRCS | /BOOT | /68TIRS | NC4 | /DPRDC | STERM | NC2 | /8KRDCS |
| ; 22 | 23 | 24 |  |  |  |  |  |  |
| /8KCE | /FIFOWR | $\mathrm{V}_{\mathrm{CC}}$ |  |  |  |  |  |  |

; MYTHICAL PIN THPC/AT SETS THE REGISTERS ON BOOT-UP CALLED VAPOR VAPOR

## EQUATIONS

| /8KWRCS.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| :--- | :--- |
| /BOOT.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| /68TIRST.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| /DPRDCS.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| /STERM.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| /8KRDCS.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| /8KCE.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |
| /FIFOWR.TRST | $=\mathrm{V}_{\mathrm{CC}}$ |

VAPOR.SETF

$$
=\mathrm{V}_{\mathrm{CC}}
$$

8KWRCS $=/ 68 \mathrm{RW} * / \mathrm{A} 14 * / \mathrm{A} 15 * / \mathrm{DAS} * / \mathrm{AS} *$ RST
; 8K SRAM WRITE RE-SELECT
8KRDCS $=68$ RW*/A14*/A15*/AS*RST
; 8K SRAM READ PRE-SELECT
STERM =/A14 */A15 * 68RW ; 8KRDCS
+/A14*/A15 */68RW */DAS ;8KWRCS
; SYNCHRONOUS TERMINATION ENDING SYNCHRONOUS CYCLES
$/$ UUCS $=/ 9 /$ A14 * $/$ A15 * $/$ A1 * $/ 68$ RW * RST + 68RW * $/$ A14 * $/$ A15 * $/$ AS * RST + ATEN1 * ATADO)
; BYTE ENABLE SELECTS FOR UPPER-UPPER BYTE
/UMCS = ///A14*/A15*/A1 * A0*/68RW*RST + /A14*/A15*/A1*/SIZO*RST + /A14*/A15
*/A1 */68RW * SIZ1 * RST + /A14 */A15 (68RW * RST + 68RW */A14 */A15 */AS * RST * ATEN1

* ATADO)

```
; BYTE ENABLE FOR UPPER-MIDDLE BYTE
\begin{tabular}{|c|c|}
\hline /LMCS & \begin{tabular}{l}
\(=/(\) A14 * \(/\) A15 * A1 * A0 /68RW * RST \\
+/A14*/A15*/A1 /68RW */SIZ1*/SIZ0 *RST \\
+/A14*/A15 */A1 */68RW * AIZ1 * SIZ0 * RST \\
+ /A14 */A15 */A1 * A0 */68RW */SIZ0 * RST \\
+ 68RW * /A14 */A15 */AS * RST ;8KREAD \\
+ ATEN1 * ATADO)
\end{tabular} \\
\hline
\end{tabular}
;BYTE ENABLE FOR UPPER-MIDDLE BYTE
\begin{tabular}{|c|c|}
\hline /LLCS & \begin{tabular}{l}
\(=/\left(\mathrm{A} 14^{*} / \mathrm{A} 15^{*} \mathrm{~A} 1^{*} \mathrm{~A} 0 * / 68 \mathrm{RW}\right.\) * RST \\
+/A14*/A15*A0*/68RW *SIZ1 *SIZ0 *RST \\
+/A14*/A15 */68RW */SIZ1 *SIZ0 *RST \\
+/A14 */A15 * A1 */68RW SIZ1 * RST \\
+ 68RW */A14 */A15 */AS * RST ;8KREAD \\
+ ATEN1 * /ATADO
\end{tabular} \\
\hline
\end{tabular}
; BYTE ENABLE FOR LOWER-LOWER BYTE
```


## Software Listings

Software listings available upon request. Contact the DVP Systems Engineering group at (214) 997-3970.
References
TMS34082 Designer's Handbook, Texas Instruments, 1990.
TMS34082 Data Sheet, Texas Instruments, 1990.
Motorola MC68030 User's Manual, Motorola, Inc., 1989.
680x0: Programming by Example, Stan Kelly-Bootle, Howard Sams and Company, 1988.
Motorola MC68881: Floating-Point Coprocessor User's Manual, Motorola, Inc., 1988.
The Motorola Motorola MC68020 and Motorola MC68030 Microprocessors: Assembly Language, Interfacing, and Design, Thomas L. Harman, Prentice Hall, 1989.

80286 and 80287 Programmer's Reference Manual, Intel, Inc. 1987.
8086/8088/80286 Assembly Language, Leo J. Scanlon, Brady Publishing Co., 1988.
Assembly Language Primer for the IBM PC \& XT, Robert Lafore, Plume/Waite, 1984.
Technical Reference: Personal Computer AT, IBM, 1985.
80286 Hardware Reference Manual, Intel, Inc., 1987.

## Appendix E

## A High Performance Floating-Point Image Computing Workstation for Medical Applications

This appendix describes the hardware and software architecture of a medium-cost floating-point image processing and display subsystem for the $\mathrm{NeXT}{ }^{\mathrm{TM}}$ computer, and its applications as a medical imaging workstation.

Karl S. Mills, Gilman K. Wong, and Yongmin Kim Image Computing Systems Laboratory (ICSL) Department of Electrical Engineering University of Washington Seattle, WA 98195

Reprinted with permission from
Medical Imaging IV: Image Capture and Display, Society of Photo-Optical Instrumentation Engineers (SPIE)

Conference Proceedings
Yongman Kim, Editor
Volume 1232, Summer 1990


#### Abstract

The medical imaging field relies increasingly on imaging and graphics techniques in diverse applications with needs similar to (or more stringent than) those of the military, industrial and scientific communities. However, most image processing and graphics systems available for use in medical imaging today are either expensive, specialized, or in most cases both. High performance imaging and graphics workstations which can provide real-time results for a number of applications, while maintaining affordability and flexibility, can facilitate the application of digital image computing techniques in many different areas.

This paper describes the hardware and software architecture of a medium-cost floating-point image processing and display subsystem for the $\mathrm{NeXT}^{\mathrm{TM}}$ computer, and its applications as a medical imaging workstation. Medical imaging applications of the workstation include use in a Picture Archiving and Communications System (PACS), in multimodal image processing and 3-D graphics workstation for a broad range of imaging modalities, and as an electronic alternator utilizing its multiple monitor display capability and large and fast frame buffer.

The subsystem provides a $2048 \times 2048 \times 32$-bit frame buffer ( 16 Mbytes of image storage) and supports both 8 -bit gray scale and 32-bit true color images. When used to display 8-bit gray scale images, up to four different 256 -color palettes may be used for each of four $2 \mathrm{~K} \times$ IK x 8 -bit image frames. Three of these image frames can be used simultaneously to provide pixel selectable region of interest display. A $1280 \times 1024$ pixel screen with $1: 1$ aspect ratio can be windowed into the frame buffer for display of any portion of the processed image or images. In addition, the system provides hardware support for integer zoom and an 82-color cursor. This subsystem is implemented on an add-in board occupying a single slot in the $\mathrm{NeXT}^{\mathrm{TM}}$ computer. Up to three boards may be added to the NeXT ${ }^{\mathrm{TM}}$ for multiple display capability (e.g., three $1280 \times 1024$ monitors, each with a 16-Mbyte frame buffer).

Each add-in board provides an expansion connector to which an optional image computing coprocessor board may be added. Each coprocessor board supports up to four processors for a peak performance of 160 MFLOP5. The coprocessors can execute programs from external high-speed microcode memory as well as built-in internal microcode routines. The internal microcode routines provide support for 2-D and 3-D graphics operations, matrix and vector arithmetic, and image processing in integer, lEEE single-precision floating point, or lEEE double-precision floating point.

In addition to providing a library of C functions which links the NeXT ${ }^{\mathrm{TM}}$ computer to the add-in board and supports its various operational modes, algorithms and medical imaging application programs are being developed and implemented for image display and enhancement. As an extension to the built-in algorithms of the coprocessors, 2-D Fast Fourier Transform (FET), 2-D Inverse FFT, convolution, warping and other algorithms (e.g., Discrete Cosine Transform) which exploit the parallel architecture of the coprocessor board are being implemented.


## Introduction

The medical field relies increasingly on image computing in many applications areas. Current needs in the medical field include the employment of image processing and graphics in medical image enhancement, simple measurement, or scientific visualization of change, movement, and flow, as well as successive 2-D slices in 3-D medical images. X-ray Computed Tomography (CT), Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) all use computationally intensive reconstruction methods to produce detailed cross sections of the structure. Other medical imaging modalities include digital radiography (digital X-rays), ultrasound and nuclear medicine scanners. These imaging modalities are used to understand internal anatomical and functional pathologies and to utilize that information in various clinical cases, for example during brain or orthopedic surgery. Image processing techniques are necessary for picture enhancement, and computing various statistics in applications like detecting suspicious cancer cells from pap smears. Picture Archiving and Communications System (PACS) with filmless archiving for all the images is a powerful concept with vast untapped potential. High-performance graphics and imaging workstations are essential for successful PACS.

This paper describes the most recent of a series of affordable, high-performance image computing workstations, the University of Washington Graphics System Processor \#3 (UWGSP3) and its application to medical imaging. The UWGSP3 image processing board set supports the following features:

- Single $2 \mathrm{k} \times 2 \mathrm{k} \times 32$-bit ( 16 Mbytes ) roamable video/frame memory implemented entirely with 1 Mbit VRAMs
- 32 bits per pixel configured as 24-bit true-color system with 8 overlay bits, or up to four 8-bit pseudo-color or gray-scale frames (or 3 frames with overlay)
- 160 MFLOPS peak performance for high-speed integer and floating-point image processing and graphics functions
- $1280 \times 102460-\mathrm{Hz}$ noninterlaced color display with $1: 1$ screen aspect ratio
- Hardware zoom, roam, and cursor support
- Up to 3 different color palettes, each driven by a different plane, can be displayed at once for region of interest (ROI) operations
- Expansion port for digitizer, additional frame memory or other devices
- Improved system performance (4 to 8 times that of previous UWGSP systems)
- Support for window-oriented user interface
- $\mathrm{NeXT}^{\mathrm{mm}}$ host system
- Medium-cost

The UWGSP3 offers the powerful, yet flexible environment necessary for meeting the stringent needs of many imaging applications. Applications other than those in medicine include scientific applications: astronomy, remote sensing, geology, seismology, oceanography, and earth resources planning; industrial applications: machine vision and robotics, tolerance verification, parts identification, optical character recognition, and thermography; military applications: field-deployable military workstations for map analysis and processing, target identification and tracking, and surveillance; forensics: fingerprint analysis and identification, signature verification and dental records analysis; and graphics applications: computer image display and synthesis (for example, solid modeling, ray tracing, object rendering and shading), image overlay, graphic arts, and ad preparation. Although some of these applications may never be implemented, these are the types of applications which could be developed on UWGSP3.

E-4 $\quad$ A High Performance Floating-Point Image Computing Workstation for Medical Applications

## Background

Several image computing subsystems have been developed in the Image Computing Systems Laboratory (ICSL) of the University of Washington. The University of Washington Graphics System Processor \#1 (UWGSP1), developed in 1987, was the first of these systems. It has been used as a low-end PACS medical imaging workstation in the University of Washington PACS prototype system [Gee et al., 1989]. This first generation image computing subsystem was implemented on 2 IBM IC/AT protyping cards, heavily utilizing the processing power of the TMS34010 Graphics System Processor (GSP) and TMS32020 Digital Signal Processor (DSP). In UWGSP1, the screen and graphics functions are controlled by the GSP, and the DSP is used as a numeric coprocessor accessed via First-In First-Out (FlFO) buffers from the GSP. The spatial resolution of the display is $512 \times 512$ pixels with a contrast resolution of 8 bits per pixel. Hardware zoom, pan and scroll, one video frame buffer, and three workspace buffers are incorporated in the system. Software developed for the UWGSP1 includes point operations, arithmetic and logical operations, Region of interest (ROI), convolution, geometric transformation, Fast Fourier Transform (FFT) and Inverse (IFFT). Used in conjunction with a PC/AT host, UWGSP1 provides a flexible three processor low-cost medium performance workstation for fixed-point image processing applications.

While the UWGSP1 has proven to be a viable performer in various image analysis and processing applications, experience with the system exposed problem areas that required attention. UWGSP1 suffered from the following problems which somewhat limited its usefulness as an image computing workstation:

- The DSP's 16-bit fixed-point arithmetic can cause serious problems in accuracy of some image processing and graphics operations due to overflow, truncation, and other problems,
- Communication between the GSP and DSP through FIFO buffers is inefficient and difficult to manage,
- Some DSP operations are slow (e.g., 2-D FFT on $512 \times 512$ images takes about 16 seconds),
- For many applications, $512 \times 512$ display resolution is not enough, and
- Because the screen aspect ratio is not $1: 1$, warping of images is required for them to appear in proper proportion.

Because of these limitations, a second generation image processing subsystem was proposed (UWGSP2) and implemented at the ICSL in 1988 [Chinn et al., 1988]. UWGSP2 utilizes the Texas Instruments' 74ACT8837 Floating Point Processor (FPP) as a replacement for the TMS32020 DSP, to provide high-performance floating-point implementation of computationally-intensive image processing and graphics algorithms. By incorporating the FPP in the second generation design, most of the problems associated with the DSP's 16 -bit fixed-point arithmetic operations were alleviated, while still obtaining a performance increase of about 2 times that of UWGSP1. However, the GSP to FPP FIFO interface continued to be a data flow bottleneck in the system, the display resolution was still insufficient for many imaging applications, and the screen aspect ratio was still other than 1:1.

A third generation system (UWGSP3) has been designed and implemented at the ICSL in 1989, and overcomes the limitations of the earlier systems by adding increased display resolution (from $512 \times 512$ to $1280 \times 1024$ ), increased frame buffer storage (from 1 Mbyte to 16 Mbytes), support for 32-bit true-color as well as 8-bit gray scale images or up to 24 -bit gray scale images windowed and leveled into 8 bits, an intuitive graphical user interface, and multiple floating-point coprocessors for 160 MFLOPS of peak processing performance. This system and its application to medical imaging are described below.

The UWGSP3 is implemented on a single multilayer printed circuit board, with an expansion connector for an optional coprocessor board. It is designed around two special purpose VLSI processors, the TMS34020 second generation Graphics System Processor and the TMS34082 Floating-Point Processor. Figure 1 shows a block diagram of the system with major components which include a NeXT ${ }^{\text {TM }}$ Host System and Interface Logic, the TMS34020 Graphics System Processor, four TMS34082 Floating-Point Processors, Local Program Memory (1 Mbyte), and Video Display and Frame Buffer Memory (16 Mbytes). Each of these major design blocks is described below.

## System Architecture

## NeXT ${ }^{\text {тм }}$ Host System and Interface Logic

The host system for UWGSP3, the NeXT ${ }^{\text {TM }}$ computer, was selected over other potential host systems (e.g., MAC Il, PC/AT compatibles, SUN, etc.) mainly for its flexibility and ease of use and programming. The NeXT 's ${ }^{\mathrm{TM}}$ operating system, Mach (compatible with BSD 4.3 UNIX), provides a popular, portable, and flexible environment for software development and maintenance. Although UNIX provides an extremely versatile development environment, it is somewhat cryptic and cumbersome for the general user. However, the $\mathrm{NeXT}{ }^{\mathrm{TM}}$ provides a user friendly "Macintosh-like" interface for the nonprogrammer, while still providing the excellent development environment afforded by UNIX. Furthermore, the NeXT ${ }^{T M}$ architecture includes a high-speed 32-bit bus (NextBus, an enhanced NuBus) providing burst transfer rates of up to 100 Mbytes per second, and the significant board real estate necessary to support complex hardware designs. Another benefit afforded by the $\mathrm{NeXT}^{\mathrm{TM}}$ is an interactive interface developmentenvironment (Interface Builder) which can generate user interface code directly. Because the user interface usually represents approximately $20 \%$ of the code, but requires as much as $80 \%$ of the effort, this capability can provide a significant savings in the time to develop various medical imaging applications by simplifying the generation and modification of application software interfaces [Jobs, 1989]. NeXT 's ${ }^{\text {TM }}$ object-oriented approach to software development makes it possible to develop image processing code modules which could be integrated into applications and user interfaces by the end user.

The backplane of the NeXT ${ }^{\mathrm{Tm}}$ computer supplies three expansion slots. Thus, up to three UWGSP3 subsystems can be inserted into the NeXT ${ }^{\mathrm{TM}}$ for applications that require multiple displays. Interfacing of the $\mathrm{NeXT}^{\mathrm{TM}}$ host to the UWGSP3 system is provided using a dedicated host interface port on the TMS34020. Executable programs, operands, images, and commands are passed to the UWGSP3 and its local memory via this host interface, with the NeXT ${ }^{\mathrm{TM}}$ acting as the master and the UWGSP3 acting as a slave device. The host initializes the subsystem by transferring a GSP executable command decoder into GSP program memory via the host interface port. With the command decoder installed, image processing and graphics functions may be issued from the host. Once a command has been issued to the GSP, the host is free to pursue other functions as may be required, while the GSP decodes the command and executes the appropriate program on the UWGSP3 local bus.


Figure 1. UWGSP Block Diagram

## Processors

The two high-performance special purpose VLSI processors used in UWGSP3 represent state-of-the-art performance and integration. Texas Instruments TMS34020 is the second generation of an advanced high-performance CMOS 32-bit microprocessor optimized for graphics display systems [Texas Instruments, 1989]. Addressing is bit oriented and all data structures such as pixel size and frame size as well as display characteristics are defined in internal GSP control registers, allowing the GSP to be configured to support a wide variety of display devices and formats. The TMS34020 contains a built-in instruction cache, hardware support for raster graphics instructions, video display timing generation hardware, as well as a memory controller and video memory controller. Extensions to the basic architecture of the GSP are provided through its coprocessor interface. Special instructions and cycles are available for enhancing data flow to coprocessors while maintaining a closely coupled processor-coprocessor environment.

The TMS34082 is a high-speed ( 40 MFLOPS peak) floating-point processor combining on a single chip a 16-bit sequencer, address generation and a three operand floating-point unit with twenty-two 64-bit data registers [Texas Instruments, 1989]. Single and double precision IEEE floating-point operations are supported for addition, subtraction, multiplication, division, square root, and comparison. In addition to floating-point operations, 32-bit integer arithmetic, logic operations and shifts may be performed by the 34082. To allow integer pixels to be manipulated in floating point, conversions are provided from integer to single or double precision formats and vice versa. To make the FPP more useful in imaging and graphics applications, internl micrcoode routines are provided for vector and matrix operations and the following graphics and image processing functions:

- $3 \times 3$ variable kernel convolution
- Backface elimination
- Polygon, 2-Plane, and 2-Plane color clipping
- 2-D and 3-D cubic spline
- 2-D window compare and 3-D volume compare
- Viewport scaling and conversion
- 2-D and 3-D linear interpolation
- Polygon elimination

External microcode support is also available to allow custom algorithm implementation on the 34082 processors. Additional image processing and graphics algorithms utilizing one to four 34082 processors are currently being implemented on UWGSP3.
Using the Texas Instruments TMS34020 GSP and the Texas Instruments TMS34082 Floating-Point Processor as a closely coupled processor pair alleviates much of the data transfer bottleneck experienced in the first and second generation UWGSP subsystems. Images stored in frame buffer memory can be transferred directly to the FPP rather than being read by the GSP and rewritten to FFO buffers as in the earlier UWGSP systems. But, with the display area and pixel depth each more than four times that of UWGSP1 or 2, additional processing capability is required to overcome the added computational demands imposed. For this reason multiple (up to 4) FPPs can be attached to the local GSP bus to provide this processing horsepower. As indicated In Figure 1, the FPPs connect directly to the Local Address and Data (LAD) bus of the GSP. Each FPP is also attached to its own bank of high speed $16 \mathrm{~K} \times 32$ bit static memory for external microcode and data storage via the MicroStore Data (MSD) and Address (MSA) buses. The static memory and the MSD and MSA buses operate independently from the GSP's LAD bus, thus reducing GSP local bus activity. Transfers between the GSP memory and the FPP static memory pass through the FPP via the LAD and MSD buses when data or programs are needed by the coprocessors. Registers may also be transferred between the GSP and FPPs at any time.

Using the computing horsepower of the TMS34082's, UWGSP3 can outperform the UWGSP2 system by 4 to 8 times for computationally intensive operations requiring floating-point accuracy. By incorporating the TMS34020 as the graphics engine, graphics and other imaging operations also see a performance increase of 4 to 8 times that of the current UWGSP subsystems.

## Memory

Memory on the GSP local bus is linear and can be partitioned in a user-defined manner. The video buffer on UWGSP3 is configured normally as a single $2048 \times 2048 \times 32$-bit buffer, but may be reconfigured as four 2048 $\times 2048 \times 8$-bit planes, four $4096 \times 4096 \times 4$-bit planes, four $8 \mathrm{~K} \times 8 \mathrm{~K} \times 2$-bit planes, or four $16 \mathrm{~K} \times 16 \mathrm{~K} \times 1$-bit planes. The large video display buffer provides the ability to load large images into the buffer and roam through them, or to load several different images (e.g., an entire CT or MR study) into the buffer at once. For graphics or computer image generation applications, having a video buffer of more than two times the screen size allows double buffering of the display for smooth image and graphics transitions. The video frame buffer is implemented entirely in 1 Mbit multiport Video RAM (VRAM). The use of VRAM substantially increases the availability of the local bus because screen refresh data moves over a separate path to the combined lookup tables and digital to analog converters (RAMDACs).

The GSP program memory consists of 256K x 32-bits of Dynamic RAM (DRAM). This memory is used to store the local programs and data needed to control the display, manipulate images and graphics, and control the four coprocessors. Because the GSP contains the necessary hardware to control both DRAM and VRAM directly, the memory interface requires only the addition of buffers, transceivers and minimal control logic.

## Video Display

UWGSP3 also provides a solution to the resolution and aspect ratio problems experienced in earlier UWGSP systems. The aspect ratio for the subsystem is adjusted for $1: 1$ in all display modes, providing a proportionally correct image required formost graphics and image processing applications. Furthermore, the $1280 \times 1024$ display resolution provides sufficient display resolution for most applications, while a roamable video/frame buffer of 2 K $\times 2 \mathrm{~K} \times 32$-bits ( 16 Mbytes) provides an acceptable solution to all others. The GSP generates the video timing signals; however, it cannot drive the display itself.

Four Brooktree RAMDACs are used to drive the monitor. Each RAMDAC has a $256 \times 24$ bit lookup table (LUT) which drives 8 bits each of red, green and blue signals. The red, green and blue outputs of each RAMDAC are summed together and the composite signals are used to drive the monitor. For true color applications, one RAMDAC will drive only red, one will drive only green and one will drive only blue. The fourth RAMDAC provides 8 bits of overlay information. For gray scale or pseudo-color applications, a single RAMDAC drives red, green and blue outputs concurrently while the other RAMDACs are disabled. While in this mode, it is possible to do region-of-interest (ROI) (i.e., different portions of the screen are assigned different color mappings and/or image data) by switching on and off different RAMDACs In specific regions of the display on a pixel-by-pixel basis. Thus, by enabling different combinations of the RAMDACs, the frame buffer can be configured either as a 24 -bit true color buffer with 8-bit overlay or as four separate 8, 4,2, or 1-bit buffers. Bit-per-pixel selection of $8,4,2$, or 1 is supported directly in hardware in the RAMDACs and augmented by appropriate clocking of the VRAMs. Overlay is also available in the ROI or $8,4,2$, or 1 -bit modes; however, one of the 8 -bit planes must be used for the overlay leaving only three available for image display. Images with contrast resolution ranging from 9 to 24 bits per pixel may also be windowed and leveled into 8 bit gray scale or pseudocolor images using the FPPs. The RAMDACs also include support for a hardware cursor and integer zoom. The cursor shape, color and intensity are stored in a $64 \times 64 \times 2$-bit array within each RAMDAC. The hardware zoom feature requires the support of an external state machine implemented in a Xilinx Logic Cell Array.

## Software Architecture

The overall software architecture for the UWGSP3 and NeXT ${ }^{\mathrm{TM}}$ host system is shown in Figure 2. At the lowest level, drivers local to the UWGSP3 board provide screen management functions as well as graphics and image processing primitives. Commands and data are transferred from the host system to the UWDGSP3 over the NextBus using NeXT ${ }^{\mathrm{TM}}$ hardware specific drivers via the host interface. The NeXT ${ }^{\mathrm{TM}}$ drivers use memory mapped I/O to make the entire usable GSP address space available to the host. Implemented on top of this functionality will be device independent image processing and graphics functions which will provide a consistent and portable software interface for applications.

The communication protocol between the host and UWGSP3 is administered by a command decoder running locally on the UWGSP3. The command decoder prowides entry points to screen management functions as well as entry points to FPP external microcode routines and FPP management functions.


Figure 2. UWGSP3 Software Architecture
Microcoded routines are implemented as parallelized algorithms. Each FPP is assigned a different portion of the image to process while the GSP is responsible for handling the data transfers to and from the FPPs. This is done in such a way as to maintain the scalability of the coprocessor board; that is, the routines will be able to utilize any number of FPPs up to the maximum of four. As long as the parallel algorithms maintain a ratio of 3:1 or greater for the amount of time spent processing relative to the amount of time spent transferring data, the power of all four FFPs can be fully utilized. Code generation for both the TMS34020 and TMS34O82 is being done in C with assembly language and microcode mixed in as necessary for optimization.

Using the Interface Builder development tools available on the NeXT ${ }^{\mathrm{TM}}$, different graphical user interfaces can be quickly prototyped and implemented for applications. Figures 3 and 4 illustrate a prototype interface of an interactive filtering package for UWGSP3 being developed. Figure 3 shows an example of the window used to specify a filter's frequency response. The shape of the frequency response curve may he changed by either typing in the desired parameters or by using the mouse to interactively drag one of the control points (identified as a solid black dot). In this example, a lowpass fllter is shown; but in addition, there are filter windows for highpass, bandpass, bandstop, and azimuthal filters. Once a desired filter is designed, the usercan apply the filter to the image by performing a 2-D FFT operation on the image, multiplying the filter and image in the frequency domain, peforming a 2-D IFFT operation, and displaying the filtered image in its window. The UWGSP3 can complete the entire process interactively (e.g., taking only a few seconds for a $256 \times 256$ image). Thus, trying fllters with different characteristics can be easily supported on UWGSP3 without undue delay to the user.


Figure 3. Lowpass Filter Specification Window
Besides allowing interactive filter specification for frequency domain filtering, the package supports image loading and frame buffer roaming and zooming (Figure 4). The image load window (on the left) allows any size image (up to $2048 \times 2048$ ) to be loaded anywhere within the $2048 \times 2048$ frame buffer. Control buttons are provided for standard sized images from $64 \times 64$ to $2048 \times 2048$. Another set of control buttons determines which channel the image will he loaded into: red, green, blue, or overlay. The frame buffer window is a scaled representation of the entire frame buffer area. The rectangular black outline defines the boundaries of the current display region. The mouse may be used to move the display to a different portion of the frame buffer by clicking and dragging on the display outline. Zoom buttons on the right allow the display to be zoomed by any integer from 1 to 8 . The size of the display outline shrinks to reflect the reduced display region as higher zoom levels are employed. The position of previously loaded images are indicated by the shaded rectangles in the frame buffer window. The interface development effort has been in progress for several months and is almost complete at the time of this writing. The programmer attributes the short development time to the use of the Interface Builder tools.


Figure 4. Image Load (left) and Virtual Frame Buffer (right) Windows

## Application Areas

The UWGSP3 board set and software libraries transform the NeXT ${ }^{\text {TM }}$ computer into an affordable image processing and graphics workstation with processing performance currently available only with much higher cost workstations. Furthermore, the UWGSP3 board set is designed to be flexible enough to provide processing in a wide range of imaging and graphics applications while most other systems are optimized for specific tasks. Described below are a few of the application areas of UWGSP3 in medical imaging, which show mainly the image processing and display capabilities of the system; however, 2-D and 3-D graphics tools are also being developed.

## PACS Workstation

Requirements for a PACS workstation include the following:

- High-resolution image display
- Large image frame buffer and magnetic storage
- Text display
- Network to tie together workstations
- Archival storage
- Rapid image retrieval and display

Together, the UWGSP3 and the $\mathrm{NeXT}^{\mathrm{TM}}$ computer provide a solution to each of these requirements. The 1280 $x 1024$ display satisfies the condition for high resolution display in most applications (with the possible exception of digital radiography where resolutions of up to $2048 \times 2048$ are needed). In the arena of on-line storage, we have 16 Mbytes of frame buffer memory available for image storage. In terms of 8 -bit gray-scale images, this is enough storage for $161 \mathrm{k} \times 1 \mathrm{k}$ images, $64512 \times 512$ images, $256256 \times 256$ images, or $1024128 \times 128$ images. In addition, NeXT ${ }^{\mathrm{TM}}$ offers a 660 Mbyte ESDI hard drive which may be used for temporary storage of a local image database to hold a reasonable number of images downloaded from the central database. Furthermore, the UWGSF3's overlay channel meets the PACS workstation's textual (as well as graphical) annotation needs. As for network capabilities, the NeXT ${ }^{\mathrm{TM}}$ features a built-in thin wire Ethernet interface and its operating system supports the TCP1IP protocol. This makes the UWGSP3 system suitable for use with other PACS systems, such as the UW prototype PACS, which already use Ethernet and TCP/IP Kim et al., 1989]. Finally, the NeXT 's ${ }^{\mathrm{TM}}$ leading role in using optical disk technology provides UWGSP3 with a high-capacity transportable storage media capable of storing 256 Mbytes per disk at a cost of about $\$ 0.20$ per Mbyte, or the UWGSP3 system can have access to a central archival storage unit, e.g., a 5-114" optical disk jukebox at a lower cost per Mbyte.

One area in which UWGSP3 could use improvement is in the rapid transfer of data from storage to the display. Currently, it takes ahout 3 seconds to load a $512 \times 512 \times 8$-bit image and 38 seconds to load a $2048 \times 2048 \times 32$-bit true-color image. However, we are currently evaluating the feasibility of incorporating a parallel transfer disk to improve the NeXT 's ${ }^{\mathrm{TM}}$ disk performance. Furthermore, we are studying a hardware modification to allow UWGSP3 to operate closer to the NextBus' 100 Mbytes second peak transfer rate.

Another benefit of UWGSP3 is derived directly from the NeXT ${ }^{\text {TM }}$ host system. O'Malley [1989] advocates an iterative approach to PACS workstation development that involves a cycle of prototype evaluation and revision. The NeXT 's ${ }^{\mathrm{TM}}$ Interface Builder tool and its use of the Objective-C object-oriented programming language, provide the rapid prototyping ability needed for this type of development paradigm.

## Electronic Alternator

The use of a digital system for emulating a conventional film alternator has been analyzed by several researchers [McNeill et al., 1988] [Beard et al., 1988][Choi et al., 1990]. Analysis of these systems and their ability to emulate the current film altemator configuration digitally, has revealed several problems which must be overcome before workstations of this type can be used in radiology departments. Some of the problems include:

- Slow image loading rates for large images
- Inadequate number of image displays
- Resolution requirements ( $2 \mathrm{k} \times 2 \mathrm{k}$ ) are cost prohibitive
- Current systems do not address radiologists' needs beyond display and processing

Image loading rates for images as large as 2 kx 2 k vary from system to system, but may require as much as 1 minute to load. Recent advances in disk technology such as the Parallel Transfer Disk (PTD) can reduce this to a few seconds, but the ability to maintain as many images as possible resident in memory still remains important, to provide as interactive a system as possible. The UWGSP3 can hold up to four $2 \mathrm{k} \times 2 \mathrm{k} \times 8$ bit images in memory at one time, any of which can be displayed instantaneously. In addition, up to three UWGSP3 may be installed in a single $\mathrm{NeXT}{ }^{\mathrm{TM}}$ computer system allowing a total of up to twelve $2 \mathrm{k} \times 2 \mathrm{k} \times 8$-bit images or $\operatorname{six} 2 \mathrm{k} \times 2 \mathrm{k} \times 16$-bit images to be resident in memory at any one time. Coupled with a PTD this large number of image storage frames would allow for acceptable speeds in displaying radiological studies.

Viewing a large number of images at one time is important to the radiologist in that it better emulates the current mechanical alternator configumtion and allows images to be compared side-by-side. As previously indicated, currently up to 3 UWGSP3 boards can be installed in a single NeXT ${ }^{\text {TM }}$ computer. Thus 3 separate displays, in addition to the $\mathrm{NeXT}{ }^{\mathrm{TM}}$ display are available for image viewing and manipulation. This limitation on the number of displays is not limited by the design of the UWGSP3 itself, but in the number of backplane slots available on the $\mathrm{NeXT}^{\mathrm{TM}}$ computer system. In later iterations of the board, multiple displays may be available on a single board, further increasing the number of available displays.

Providing displays with resolution as high as 2 kx 2 k is at present cost prohibitive when a large number of displays are required in UWGSP3, this issue was addressed by implementing a $1280 \times 1024$ display which is adequate for display of most imaging modalities. Display of multiple images as large as 2 kx 2 k is also possible on UWGSP3 by using hardware pan and scroll of the $1280 \times 1024$ display in the $2 \mathrm{k} \times 2 \mathrm{k}$ image frame. For multiple displays, this roaming can be done on all images concurrently, providing the same positioning of the display in all images. Or, if desired, the images can be panned and scrolled individually to view different areas of each image.
Many electronic altemator systems have addressed the display and processing needs of the radiologist. However, the integration of verbal annotation and/or digital film annotation is not always addressed. Verbal annotation may still be done using existing dictation hardware (e.g., a dictaphone system); however, the integration of this into the electronic altemator system would allow the verbal annotation to be directly associated with the digital image in a complex database. The $\mathrm{NeXT}^{\mathrm{TM}}$ computer host provides the built-in capability for voice digitization, which could be linked with the image in a database. Potential also exists for voice recognition of commands and for speech to text conversion, using the Digital Signal Processor (DSP) available on the $\mathrm{NeXT}^{\mathrm{TM}}$ computer host or some other specialized hardware.

## Image Processing and Graphics

The optional 160 MFLOPS peak performance coprocessor board makes the UWGSF3 an excellent platform for image computing. The GSP supports many frame buffer manipulation functions (e.g., P1XBLT, FILL, and image arithmetic) and the FPPs include many built-in microcode routines for both image processing and 2-D and 3-D graphics operations (e.g., $3 \times 3$ convolution, matrix and vector operations, polygon clipping, and backface elimination). Thus, the UWGSF3 serves as a platform suitable for both image processing and graphics applications.

## NxN Variable Size 2-D Convolution

NxN 2-D convolution can be used to implement a variety of image processing filtering operations such as lowpass, highpass, edge enhancement and edge detection. The algorithm is parallelized by segmenting the image into regions and assigning each locality to a different FPP. The predicted performance for a $512 \times 512 \times 32$-bit image (using all four FPPs) is as follows:

- $5 \times 5$ kernel 0.7 seconds
- $11 \times 11$ kernel 3.0 seconds
- $15 \times 15$ kernel 5.5 seconds


## FFT/IFFT

In some image analysis applications, FFT filtering techniques are often more convenient and intuitive than 2-D convolution and therefore more desirable. Thus, UWGSP3 must provide efficient FFT and IFFT algorithms. The FFT and IFFT will be implemented using the row and column method. Each of the four FPPs will be given an entire row or column to process, thereby parallelizing the operation. The predicted performance for a $512 \times 512 \times 32$-bit image using all four FPPs is 4 seconds for either an FFT or an IFFT.

## Geometric Transformations (warping)

Geometric transformations are utilized in correction of image distortion arising from deficiencies in the acquisition apparatus, image registration for multimodal image analysis, and interpolated zooms for applications in image magnification and minification. Each FPP is assigned a different region of the resultant image. For each destination pixel, the FPPs calculate the coordinates of the source pixels. The GSP passes the source pixels to the FPPs which then perform a bilinear interpolation using the pixel values. Predicted performance of a lower-order warp using bilinear interpolation is 2.5 seconds for a $512 \times 512 \times 32$-bit image.

## Window and Level

In digital radiography and CT and MR images, pixel sizes of up to 12 bits are generated. Most systems do window and leveling of these images by manipulating a 12-bit to 8-bit video output lookup table [Austin, et al, 1988]. Because UWGSP3 utilizes 8-bit RAMDACs, this method cannot be used. Instead the coprocessor board is used to perform a transformation of the 12 -bit image into a window and leveled 8-bit image. The FPPs are used to calculate a transformation lookup table. Regions of the image are then sent to the FPPs which use the lookup tables to produce the 8 -bit image. One of the advantages of this method is that the window and level operation can be limited to user defined regions and need not affect the entire display. Furthermore, this method may be used with pixel sizes greater than 12 bits (up to 24 bits per pixel). A full screen ( $1280 \times 1024$ ) transformation of a 24 -bit image to an 8-bit image is expected to take less than 0.3 seconds.

## Graphics

As mentioned in previous sections of this paper, the TMS34082 FPP includes many built-in microcode routines for 2-D and 3-D graphics which can be used to form the core functionality of a graphics library. In addition, the relatively large external microcode and data storage ( $16 \mathrm{~K} \times 32$-bit for each FPP) allows higher level operations such as ray tracing and volume rendering to be added to the standard set of functions. Furthermore, the UWGSP3 architecture which couples the GSP with multiple FPPs, allows the computational workload to be distributed among the processors. Thus, each FPP can be given a different portion of the object database or a different set of tasks in the rendering process while the GSP is utilized to maintain the integrity of the frame buffer and transfer blocks of data to and from the FPPs.

Since the design is centered around the TM534020, the availability of the Texas Instruments Graphics Architecture TIGA-340 interface allows a UWGSP3 ported to an IBM AT-compatible (or MCA or EISA-based) architecture to be immediately compatible with many graphics applications written for this standard. In addition, UWGSP3 can be made to emulate other widely accepted video adapters such as EGA and VGA to support a vast number of different application programs.

For the current NeXT ${ }^{\text {TM }}$-based version of UWGSF3, graphics standards including PHIGS, PHIGS + , GKS, and Renderman are being evaluated for implementation. The use of one of these standardized graphical programming interfaces, with the low level operations written to exploit the multiple FPP architecture, will further enhance UWGSP3's utility for 2-D and 3-D graphics applications.

## Conclusion

With its increased display resolution, enlarged frame buffer storage, multiple floating point processors and intuitive graphical user interface, UWGSP3 represents an innovation in image computing workstation design and a significant step towards providing affordable real-time display and processing for a variety of applications. It provides an integrated platform for more acceptable and productive end-user environments for both image processing and graphics in the future. In this paper, we have described the basic architecture of UWGSP3, and several solutions to medical imaging applications including use as a PACS workstation, image processing and graphics computational engine, and a multiple display electronic alternator. With the hardware implementation and low-level software now completed, the task of creating the application software to achieve the UWGSP3's potential in these areas and others will extend into the months ahead.

## Acknowledgements

We would like to acknowledge the efforts of Clark Haass of the ICSL for his work on the interactive filtering package user interface. We would also like to acknowledge Texas Instruments for their help and the donation of the prototype chips and development tools used in UWGSP3; in particular, we wish to thank Mike Asal of the GSP Group in Houston and Ron Drafz of the VLSI Group in Dallas.

## References

J. D. Austin and T. Van Hook, "Medical image processing on an enhanced workstation," SP1E Medical Imaging II, 914:1317-1324, 1988.
D. Beard, J. L. Creasy, J. Symon and R. Cromartie, "Experiment comparing fllm and the FilmPlane radiology workstation," SP1E Medical Imaging II, 914:933-937, 1988.
P. Chinn, R. M. Pier, L. A. DeSoto, H. G. Zieber, D. A. Verheiden and Y. Kim, "PC-based floating point image processing system," Electronic Imaging Spring ' 88 , pp. 233-238, 1988.
H.S.Choi,H. W.Park, D. R. Haynor and Y. Kim, "Development of a prototype electronic alternator for DIN/PACS environment and its evaluation," SPIE Medical Imaging IV, 1234:in press, 1990.
S. Jobs, Keynote Address at the University of Washington Computer Fair, March, 1989.
Y. Kim, D. R. Haynor, 0. Saarinen, A. H. Rowberg, and J. W. Loop, "Preliminary PACS experience at the University of Washington," Electronic Imaging Spring '89, pp. 142-147, 1989.
J. C. Gee, L. A. DeSoto, D. R. Haynor, Y. Kim and J. W. Loop, "User interface design for a radiological imaging workstation," SP1E Medical Imaging III, 1093:122-132, 1989.
K. McNeill, G.W. Seeley, K. Maloney, L. Fajardo, and M. Kozik, "Comparison of a digital workstation and a film alternator," SP1E Medical Imaging II, 914:929-932, 1988.
K. G. O'Malley, "An iterative approach to development of a PACS display workstation," SPIE Medical Imaging III, 1093:293-300, 1989.

Texas Instruments TMS34020 User's Guide, 2nd Review Copy, August, 1989.
Texas Instruments TMS34082 Floating-Point Processor Product Preview, September, 1989.

## Appendix F

## Parallel Signal and Matrix Processing with the TMS34082




This application note will discuss and analyze a TMS34082 based parallel architecture.

E. M. Dowling and Z. Fu<br>Erik Jonsson School of Engineering and Computer Science<br>The University of Texas at Dallas<br>Richardson, Texas 75083-0688

R. S. Drafz

Texas Instruments, Incorporated, M/S 8316
Dallas, Texas 75265-5303

## Introduction

VLSI floating-point processor technology is evolving to meet the increasing need to execute sophisticated algorithms at higher and higher rates. Architectural advances in floating-point pipelines and processor organization have led to the TMS34082's high speed arithmetic core and its RISC control structure. However, some applications require much higher speeds than provided by a single TMS34082. A parallel processing solution may be the answer.

The goal of parallel processing is to speed computation by designing the appropriate number of processors into the system. These processors each work on pieces of the algorithm separately, and pass intermediate results among themselves. The simplest and most common form of parallel processing is to assign each processor a different tasks. For example, in a typical computer system, there may be a simple processor in the keyboard, a CPU, and coprocessors for memory management and floating-point operations. Of interest here are the parallel processing architectures that use many identical processors to solve a single numerical problem.

Some common architectures that achieve this are shared memory machines. (Sequent and Multi-Max), distributed memory machines (Ncube, IPSC) and systolic arrays (WARP) [1]. They all use duplicate processors, but have different storage, communication, and programming schemes. Some parallel architectures require all processors to execute the same instructions, but to work on multiple data streams (SIMD = Single Instruction Multiple Data.) Others allow each processor to act independently by providing multiple instruction streams (MIMD = Multiple Instruction Multiple Data.) Many architectures exist to solve numerical problems such as those that arise in scientific computation and signal and image processing applications. Many experimental machines have been targeted to these structured computation intensive applications [1] [3] [4] [5].

In this application note we will design and analyze a TMS34082 based parallel architecture. The architecture will be a MIMD hybrid shared /distributed memory machine that supports message passing as well as systolic data streaming. This structure provides maximumflexibility at a relative low cost. In addition, the system can be scaled so that any number of processors can be added as the application requires. The system reaches a peak of 40 MFLOPs per processor, and sustains a rate of 10 MFLOPs per processor on structured numerical algorithms. For example, if an algorithm must be executed in real time at 150 MFLOPs , a system with about fifteen or sixteen processors is needed.

Parallel architectures are measured in terms of their speed increase over a sequential architecture using the same type basic cell. (A cell can be thought of as a processing unit that would be the CPU/memory/I/O system in a sequential machine). In the MIMD system, the I/O portion of a cell is generally connected to other cells as well as other conventional I/O devices such as disks, A/D converters, etc. Parallel architectures performance is limited by dependencies found in many algorithms, or, more fundamentally, found in many mathematical problems. These dependencies might cause the parallel architecture to performless efficiently than a straight sequential architecture due to communication overhead. If the problem itself is not inherently serial, then a parallel algorithm must be designed to solve the problem. Often this parallel algorithm can be derived from the sequential algorithm by rescheduling the computations so that the algorithm dependencies are satisfied, but many independent calculations will be computed at each step.

At this stage of design, the algorithm becomes linked to the underlying architecture. The most challenging part of the design is not to decide which computations can computed in parallel, but to minimize communication delays and waiting time. This requires the algorithm designer to match the dependence structure of the algorithm to the systems communication structure and processor granularity.

The parallel algorithm must be analyzed to see how it performed. The optimum is to use $n$ processors for an $n$ times speed increase (linear speed-up.) However, Amdahl predicted that most algorithms will have a logarithmic speed-up because their communication burden typically grows exponentially. Fortunately, linear speed-ups can often be attained on modern architectures solving well structured practical numerical problems, due to their regular communications requirements.
As we see, there is a tight interplay between the algorithm and the architecture. In a sense, the algorithm is mapped onto the architecture. It is our intent to design an architecture that can efficiently supportmany different algorithms. A hybrid approach as discussed in the next section was taken to achieve this level of flexibility. The architecture provides all point-to-point and broadcast paths through a single bus. A bidirectional ring of FIFO buffers connects adjacent processors so that high throughput can be achieved. The architecture design was driven by the matrix multiplication, FFT, QRD, and SVD algorithms. Simulation was used to arrive at an architecture that could support all of these representative algorithms.
Once the basic architecture is established, the cell must be carefully designed. As stated above, the performance of the parallel architecture is based on the speed increase over a single cell. If the cell is poorly designed and slow, the increase will be negligible. The cell designed here is based on the TMS34082 acting in host-independent mode. The MSD bus is used for instructions, while the LAD bus is used for data. The TMS34082 requires some sort of addressing assistance on the LAD side, an address latch at the very least. With this assistance, the TMS34082 has a Harvard architecture so can be made to maintain a steady instruction stream while manipulating data on the LAD side. This capability is of paramount importance in a TMS34082 system, and does not come without careful attention. The TMS34082 does not have any organic LAD addressing support. All LAD addresses must be computed in the floating-point core. Furthermore, when the C-compiler is used, local variables are stored on the MSD side and are accessed through stack operations. Even the stack pointer manipulations are carried out in the floating-point pipe, causing 'bubbles; and reducing performance. To bring the performance of the cell up to the full TMS34082 capabilities, a more sophisticated LAD bus controller will be specified. This controller will have its own register set and an integer unit to perform pointer manipulations under the control of an extended instruction field. The same LAD bus controller will be capable of routing data to more than one destination in a single bus cycle and will be able to move data while the TMS34082 is performing other functions such as floating-point loops.

## The HARP Architecture

The design of the Hybrid Array Ring Processor (HARP) architecture was guided by a set of goals. First, the architecture was designed to perform a wide range scientific and DSP oriented algorithms. Second, it was designed to be scalable and expandable so that applications specific systems could be easily configured to the user's needs. It was also decided that the architecture should support both single- and double-precision IEEE standard floating-point arithmetic. The principle concern with the interconnection structure was that it had to support both high throughput and fast point-to-point paths. The interconnection topology was to be as simple as possible, yet had to support the target algorithms cleanly and efficiently. From a software perspective, the architecture had to be programmable in an extended version of C, much like hypercubes. Also the architecture needed to support an operating system such as UNIX. Finally, a version had to be implemented that could be accurately simulated in software so that performance measurements could be made.

Matrix multiplication, FFT, QRD and SVD algorithms shaped the architecture. A simulator was built using the Rice Parallel Processing Testbed (RPPT) package along with the TMS34082 C-compiler and chip simulator. The RPPT simulator can run programs written in a superset of $C$, called Concurrent $C$ (CC). An architecture model was written so that waiting time, date transfer delays, and the overall effects of the system topology could be measured. Profile information from the TMS34082 simulator is fed into the RPPT simulation so that the overall simulator measures the actual cycle counts of the parallel program executing on the architecture. Architectural modifications were made whenever limitations and bottlenecks were revealed by the simulation process.


Figure 1. System Architecture
The overall HARP architecture is depicted in Figure 1. The host can be any subsystem that can run the desired operating system. For purposes of simulation, the host was considered to be a 33 MHz MC68030 with associated support hardware. The system bus can be any high-performance bus, but for simulation purposes was taken to be the native MC68030 processor bus running in asynchronous mode. The architecture, however, can be readily implemented for an open bus standard such as VME, Future Bus, etc. for example, a VME based version would be built around an available single-board UNIX engine, memory boards, and I/O cards. PEs would be added in groups of four per card. Each card would have a ring port in and a ring port out connector on the front. Thus flexible systems can be configured to meet specific processing, memory and I/O requirements. Application programs are written in CC to run on P processors, so that the same program will run on systems with different numbers of processors.
The hybrid aspects of the architecture are highlighted when looking at the system's programming models. From a global perspective, the host sees a conventional system that is augmented with smart memory segments as outlined in the memory map of Figure 2. The host must load these segments with code and data and read out the results. From the PE/bus perspective, the system takes the shape of a shared memory machine. Using the shared memory mode, processes are forked to the various nodes and communication primarily takes place over the bus where synchronization is maintained using semaphores and join constructs. The system can also be viewed as a message passing machine. Here processes on the nodes communicate using the send- and receive-message commands. The messages can be routed through the ring or over the bus. Finally, the system can be programmed as a linear or ring systolic array (with broadcast.) The systolic mode is the fastest mode if local PE to PE communication is required by the application algorithm. In this mode a steady data stream flows through the ring network in lockstep with processing. The cell architecture is optimized so that systolic communication and computation overlap.


Figure 2. System Memory Map
The PE is depicted in Figure 3. It consists of a TMS34082 floating-point RISC processor, a 30 ns 512 K word memory bank, a bus interface, a local bus controller, and a 35ns 32-bit BIFIFO that connects the PE[k] to the $\operatorname{PE}[\mathrm{k}+1]$. A full system bus interface implementation includes a local bus arbitration protocol that allows any PE to directly access any other PEs' local memory. The PE is built around the TMS34082's Harv ard architecture. The program is stored on the Micro Store Data/Address (MDS \& MSA buses) side while the data is stored on the Local Address and Data (LAD) side. The TMS34082 architecture consists of a high-performance FPU, a register file of twenty 32 -bit (or ten 64-bit) registers, and a microsequencer. The TMS34082 does not have an address arithmetic unit nor does it have an address bus on the LAD side. In order for the system to reach high performance levels, an external LAD bus controller was designed to compute LAD addresses and provide low level timing and control signals to the devices attached to the LAD bus (FIFOs, SRAM, bus interface).


Figure 3. PE Architecture
The LAD controller provided over an order of magnitude performance increase. With the controller, the TMS34082 is capable of performing an entire load or store operation and pointer update in a single clock cycle. The LAD controller can interleave accesses from the memory and the FIFO, as is often needed. The LAD bus controller also can be configured to accelerate systolic ring communication and local memory operations. Suppose the TMS34082 of $\mathrm{PE}[\mathrm{k}]$ is to read a word from the FIFO connected to $\mathrm{PE}[\mathrm{k}-1]$. In many applications it will be also necessary to pass the datum onto the FIFO that is connected to $\mathrm{PE}[\mathrm{k}+1]$ and possibly store the received datum in local memory. If both are required, the LAD controller will generate the signals for the FIFO[k-1] read, the FIFO[ $\mathrm{k}+1]$ write, and the memory write all in the same TMS 34082 read cycle. The LAD controller's FFT address generator speeds FFTs by over a factor of 100 .


Figure 4. LAD Bus Controller Architecture

Different LAD controller designs have been considered. Some are algorithm specific, like the FFT, and some are more general. The most general LAD controller, as depicted in Figure 3, can be configured and performs LAD addressing under program control. The architecture of the LAD controller is shown in Figure 4. It consists of a register file, and addressing ALU (complete with FFT instructions), a LAD bus timing and control signal decode section, and an instruction decoder. The LAD controller accepts an instruction stream from an extended microcode field on the MSD bus. The registers may be loaded from the MSD bus as immediate operands. In a sense, the TMS34082's instruction set is expanded to include LAD addressing modes. The LAD addressing related fields in the instructions are stored in a separate four-bit memory (LADM) whose address lines are connected to the MSA bus.

## TMS34082 Host-Independent Mode Optimizations

The TMS34082 operates in either the coprocessor or the host-independent mode. It is most commonly used as a floating-point accelerator for the TMS340 graphics processor and is less commonly used in the host-independent mode, where it acts as an autonomous floating-point RISC processor. While the HARP architecture does assume a host processor to run the operating system, it employs the TMS34082s as loosely coupled processors operating in the host-independent mode. In this section some of the more striking aspects of designing a system around the TMS34082 in host-independent mode are discussed. The TMS34082 is capable of achieving very high computational throughputs.

The TMS34082 is truly a compact architecture, fitting somewhere between a conventional RISC and a dedicated floating-point unit. It lacks many common microprocessor features such as an on-board parallel address generator, while it provides an assortment of high-performance floating-point operations and subroutines such as division and square root. The key to success with the TMS34082 in host-independent mode is to keep it fed with data. If the TMS34082 is asked to manage the data stream into the chip, performance will be downgraded. Also, special care must be taken with loop management.

The first thing to take into consideration when developing a system for the host-independent TMS34082 is its Harvard architecture. The chip is designed to accept an instruction stream from the MSD (microstore data) bus and a data stream from the LAD bus. In coprocessor mode, the TMS340 has no problem keeping the TMS34082 fed with data because the LAD bus is directly connected to the TMS340 bus. However, in host-independent mode, special care must be taken to keep the data flowing on the LAD bus under the control of something other than the TMS34082. The reason for this is two fold. First of all, the TMS34082 LAD bus has both address and data multiplexed onto the same bus. Without external support, this means that an address must be first sent out to an external address latch prior to any LAD access, reducing the LAD bus bandwidth by $50 \%$. A more sever problem arises if the TMS34082 is used to generate data addresses as would be done in a conventional architecture. Because the TMS34082 does not have a separate integer addressing unit, all pointer updates must be computed in the FPU core. This means that the floating-point pipeline must be broken every time an external access is required. It turns out that non-judicious use of the TMS34082 for pointer updating can easily downgrade performance by an order of magnitude or more. Thus it is recommended that an external bus controller be designed into the system that performs the pointer manipulations needed to support the algorithms that will run on the target system.

Once the hardware has been configured, it is important to optimize the software. The first rule of thumb is to make judicious use of registered variables. Often compilers will use frame pointers and related pointer arithmetic to access local variables. As mentioned previously, if the TMS34082 is to perform pointer arithmetic, it will have to do it in the floating-point pipeline at great expense. The compiler will often place local variables on the stack which is located on the MSD side. This means load and store operations take two cycles each instead of the one cycle on the LAD side. More importantly, each local variable access will involve several external accesses to compute the stack pointer relative address of the local variable. Due to these considerations, it takes the TMS34082 11 cycles to compute $k=k+1$ if $k$ is a local variable defined on the MSD stack. On the other hand, if $k$ were declared as a registered variable, the same operation would require only one cycle. Thus great performance dividends will be paid to those whoput as many of the most often used local variables of each routine into registers.

The old axiom that $90 \%$ of a program's time is spent on $10 \%$ of the code is very true when it comes to numerical routines. In fact, most numerical routines spend $90 \%$ or more of their time in tight inner loops. For example, the inner loop of the routine to multiply two $256 \times 256$ matrices on a single TMS34082 would be entered and exited 65,536 times. At each iteration, one multiplication and one addition is performed, which can be computed by the TMS34082 in a single cycle using the mult.add command. Now consider the overhead needed to run the loop. If the loop counter variable were located on the MSD stack it would take 11 cycles to increment the loop counter, it would take another 12 cycles to check to see if the loop terminated. In addition, using the FPU core to perform loop counter iterations and stack pointer manipulations forces the compiler to use separate multiply and add operations and store intermediate results. Thus while the TMS34082 provides the ability to compute a floating-point multiply-accumulate a single cycle, an unoptimized loop might spend 30 cycles each iteration in loop overhead. If care is not taken, loop overhead alone can reduce the performance to $3.33 \%$ efficiency. This figure does not even account for data accesses. The loop overhead can be reduced to about four cycles per iteration if registered variables are used. Even better, it can be reduced to one cycle through the use of the LOOPCT register and the cjmp.d instruction. The cjmp.d instruction is a decrement and branch instruction that decrements LOOPCT, compares it against zero, and takes the appropriate branch all within a single cycle.

Now take data accesses into account. An inner-product loop is set up by initializing LOOPCT with the inner-product length, clearing the accumulator, and loading the base addresses of the two input data arrays. First consider the case where the pointers are loaded into an external LAD controller. Two data loads are performed in two cycles while the LAD controller autoincrements the pointers for the next loop iteration. Next a mult.add instruction is used to perform the multiply accumulate in a single cycle. Finally, the cjmp.d is used to decrement the loop counter and branch to the beginning of the loop. This implementation required four cycles per loop iteration and a LAD controller that could interleave two increment pointer registers. Now consider an implementation that does not use an external LAD controller, but does use LOOPCT and registered variables for the loop pointers. The loop starts out by performing two pointer additions with the output sent to the LAD bus and two loads; four cycles. Next the multiply and add are computed in two instructions because the FPU pipeline had been interrupted. The cjmp.d is the last instruction in the loop. This loop had a total count of nine cycles.

The two above loops can sustain 10 MFLOPs and 4.44 MFLOPs respectively. Slightly enhanced performance can be achieved many loop iterations are in-line coded into a single loop iteration. If the vector length is 100 , then the inner product could be computed in ten loop iterations if ten multiply accumulates are performed in each loop iteration. With the external LAD controller, twenty loads are followed by ten mult.adds and one cjmp.d. This reduces the number of cjmps by nine, but adds additional loop end condition checking overhead if the number of loop iterations is not a multiple of ten. Anywhere between one and ten multiply accumulates can be performed in the inner loop depending on the divisors of the inner product length. Using this optimization technique, the sustained inner product performance can be raised from 10 MFLOPs to 15 MFLOPs.

Experience shows that one must recognize what the TMS34082 is and what it is not. It is a high-performance floating-point unit that can execute floating-point code efficiently. It is not a general purpose processor with a full set of addressing modes and parallel on-board executions units. Great speed-ups in compiler generated code can be easily achieved through judicious use of registered variables. Hand optimized assembler level optimizations can be attained by using the LOOPCT register and the cjmp.d instruction. Further speed-ups come through the use of and external LAD side address generator. Once a system architecture is defined, systems level optimizations can be made to overlay various bus operations into the same cycle.

## Algorithms

In this section several algorithms will be briefly discussed. First consider the problem of multiplying the matrices $A \in R^{m \times r}$ and $B \in R^{r \times n}$ to form the product $C \in R^{m \times n}$ on a system with P PEs. At the start of the algorithm, $A$ and $B$ are stored in the shared memory. At the end of the algorithm, the product matrix $C$ is returned to the shared memory. The first step of the algorithm is for the host to move the columns of B into the PE s using the system bus. Column $b_{j}$ is moved to PE[jmodP]. This column will be used to compute $c_{j}=A b_{j}$ so that $c_{j}$ will be accumulated on $\mathrm{PE}[\mathrm{j} \bmod \mathrm{P}]$. The matrix A is moved into the array at $\mathrm{PE}[0]$ and $\mathrm{PE}[0]$ 's right output buffer concurrently. The inner product of row $a_{i}$ and each resident $b_{j}$ is formed and stored as $c_{i j}$ 's. $\mathrm{PE}[\mathrm{k}]$ reads a word or row $a_{i}$ fromPE[k-1], one word at a time, directly into the TSM34082's FPU pipeline and stores the row in memory for future use and transfers it to the FIFO connected to PE[k+1] all in the same cycle. The rows of A stream through the system until the trailing row cycles through. Due to the ordering of computations, $\mathrm{PE}[0]$ will finish first, then PE[1] etc. Once $\operatorname{PE}[0]$ finishes, it sends its results over the bus to the system memory. Then PE[1] will follow suit, then PE[2] ... etc. on down the line.

Next consider the radix-2 decimation in time (DIT) FFT algorithm. Assume that the number of PEs, P is a power of two. Also assume that the LAD bus controller is capable of performing FFT address generation in addition to the autoincrement mode used in the previous algorithm. For purposes of illustration, suppose that a $\mathrm{N}=1024$ point FFT is to PE performed on $\mathrm{P}=8$ processors. The algorithm is outlined as follows. First decimate the time series into eight 128 -point subsequences and send the ith subsequence to $P E[i]$ over the system bus. Next each node computes a 128 -point radix-2 DIT FFT on the local subsequence. These sequences are built back up using standard binary tree recombination with twiddling. The tree is viewed as having the root node in processor zero $\log _{2}(\mathrm{~N})$ iterations into the future. At the first iteration, each PE is considered to be a leaf of the tree. At this iteration each $\operatorname{PE}[2 k+1]$ sends its results to $P E[k]$ for $k=0 \ldots(N / 2)-1$. The even PEs then perform the twiddle and recombination operations so that the even cells now have 256 -point sequences. At the next iteration, $\mathrm{PE}[4 \mathrm{~K}+2]$ sends its results to $\mathrm{PE}[4 \mathrm{k}]$ for $\mathrm{k}=0 \ldots(\mathrm{~N} / 4)-1$. Now the mod 4 PEs twiddle and recombine. Next $\mathrm{PE}[4]$ sends its result to $\mathrm{PE}[0]$ and $\mathrm{PE}[0]$ assembles the final 1024 point result. The communication of the algorithm is not local, but the algorithmpermits the data to be routed through idle cells so that a negligible penalty is paid for the nonlocal communication. The nonlocal communication only costs one cycle of delay per route-through node; the data rate of the data stream is not effected. Simulation studies have shown that this extra cost has a negligible effect on performance. The simulation studies did show that performance was reduced due to the nonsequential access requirement within a given local vector computation. The nonsequential addressing forces one to send entire messages instead of single elements at a time transparently. Also, as the recombination process progresses, more and more processors become idle.

Another version of the FFT was studied that was able realize the full potential of the system. Most applications that require an FFT actually require many FFTs. For example, a real-time processing system might require that 1024 point frames of an incoming signal be computed continuously. In image processing, a $512 \times 512$ pixel FFT can be computed by first performing 512512 -point column FFTs followed by 512512 -point row FFTs. Similarly, spectral based PDE solvers used in scientific application s require large numbers of 1-D FFTs to compute a single 3-D FFT. The course granularity of the system allows each separate PE to compute an FFT separately. This is a pure smart memory algorithm. The host loads the PEs with data and pulls out results. If enough processors are in the system, the overall computation rate is limited only by the amount of time it takes to load and unload a single smart memory segment with a complex data vector.

The next algorithm considered was the QRD. The QRD provides an alternative way to solve linear systems. The standard algorithmused to solve linear systems is Gaussian elimination with partial pivoting. The pivoting portion of that algorithm degrades performance in the HARP architecture, but a Householder QRD maps quite well. $A \in R^{m \times}{ }^{n}$ has a factorization $\mathrm{A}=\mathrm{QR}$ where $Q \in R^{m \times m}$ is orthognal and $R \in R^{m \times n}$ is upper triangular [11]. Consider the case where $\mathrm{m}=\mathrm{n}$ and $\operatorname{rand}(\mathrm{A})=\mathrm{n}$. Write $\mathrm{Ax}=\mathrm{b}$ as $\mathrm{QRx}=\mathrm{b}$ so that multiplying on both sides by $Q^{T}$ gives the triangular system $R x=Q^{T}$ which can be solved by back substitution. Note that multiplying both sides by $Q^{T}$ is equivalent triangularizing A by a sequence of orthognal transformations and applying these same transformations to $b$. In the case where $m>n$, this procedure may also be applied to solve linear least squares problems.

Suppose $A \in R^{m \times} n_{\text {is }}$ to be decomposed on a P processor system. Assign column $a_{j}$ to $\mathrm{PE}[\mathrm{j} \bmod \mathrm{P}]$. If P does not divide $n$, then some PEs will have extra columns. First set the iteration variable, $k$, and proceed as follows. At iteration k , the PE containing $a_{k}$ computes the vector $v_{k} \in R^{m-k+1}$ such that the bottom ( $\mathrm{k}-\mathrm{m}$ ) -element subvector of $a_{k}$, denoted $a_{k}(k: m)$ satisfies $H_{l} a_{k}\left((k: m)=\alpha \varepsilon_{1}\right.$ where $\varepsilon_{1}$ is the k-order unit standard basis vector and $\alpha=\left\|a_{k}(k: m)\right\|$. The kth transformation must be applied only to columns k through n . So $v_{k}$ is sent down the ring to the right from the PE where $a_{k}$ resides. When the head of the $v_{k}$ data stream arrives, the remaining PEs perform the transformation, $a_{j}(k: m)=a_{j}(k: m)-\left(v T_{k} a_{j}(k: m)\right) v_{k} \forall j>k$ to the local columns. The algorithm is essentially a waveform algorithm where the computation wave propagates to the right and a trail of results $(R)$ are left behind. If the matrix $Q$ is desired, the $v$-vectors may be saved so that $Q$ is available in factored form. If the algorithm is used to solve a linear or linear least squares system, the vector $b$ is augmented as the last column of A and loaded accordingly.

The final algorithm to be discussed is the SVD. Hestenes's method [6] [9] for computing the SVD has received much attention lately in the literature due to its parallel nature. The Hestense method is a one-sided Jaccobi algorithm that operates by applying orthognalizing plane rotations to all pairs of columns of the matrix $A \in R^{m \times n}$. A sequence of all pairs of such rotation is called a sweep. The algorithm can be shown to converge after a sufficient number of sweeps have been applied. Once the algorithmhas converged, the matrix, $A$, will have been transformed via orthognal transformations to another matrix, $B \in R^{m \times n}$, whose columns are orthognal to each other. If the product of the sequence of orthognal transformations is collected in $V \in R^{n \times n}$, then we have $A V=B$. It is trivial to next factor $B$ as $B=U \Sigma$ gives $A=U \Sigma V^{T}$, which can be seen to be the SVD of A. We do note that this algorithm generated $U \in R^{m \times n}$ and $\Sigma \in R^{n} x^{n}$ instead of $U \in R^{m \times m}$ and $\Sigma \in R^{m} x^{n}$ but that no information was lost.

Several systolic array algorithms have been devised to perform the Hestenes SVD algorithm on a linear bidirectional systolic array [2] [8] [10]. The methods are based on a theorem that states that the order in which all the pairs columns are orthognalized does not affect the overall convergence of the Jaccobi algorithm [8]. Algorithms are designed by selecting an ordering where groups of $P$ pairs can be computed at each time step on $P$ processors. The key to a successful ordering is that the next group of $P$ pairs in the ordering can be generated by local shifts of columns between processors. Figure 5 shows how columns are switched in order to generate such an ordering on a P-element bidirectional array. In the figure each PE is assumed to have two vector registers, VRa, and VRb which each hold a column of the matrix. If the columns are loaded into the vector registers, and permuted as depicted in the figure, one sweep will be computed every N - 2 update cycles. At the end of the sweep, the updated columns will return to their original position in the array, ready for the next sweep.


Figure 5. Parallel Jaccobi Updating on a Systolic Architecture
It is clear that this algorithm can be directly implemented on the HARP architecture. The bidirectional ring is sufficient for communication, and the vector registers can be implemented as buffer areas in the local SRAM. Each node must first compute the Jaccobi rotation matrix, apply it to the local columns stored on that node, and send the results to adjacent nodes as indicated in the figure. All nodes perform the same function except the end nodes. The shifting of data between the two data buffers on PE[N-1] can be accomplished by a single pointer swap. If the number of columns in the matrix exceeds 2 P , then the algorithm is slightly more complicated. Now PEs will have to do the job of more than one PE. The complication comes in the form of pointer housekeeping and additional conditional statements. Only the computations that represent the boarder PEs of the sub-array need to communicate with the neighbor PEs, the internal nodes of the subarrays just exchange pointers.

## Simulation Results and Performance Analysis

A simulator was constructed using the RPPT simulation package. The RPPT package consists of a CC compiler, an architecture modeling / analysis package, and a facility to bind CC programs to the architecture model. Once a program and architecture have been bound, RPPT runs a simulation of the program running on the architecture. While doing so, RPPT keeps track of time using a parallel time construct. It is able to account for delays caused by bus contentions, processes waiting for input, data transfers across a bus or communication channel, and processors executing code. The above mentioned delays can be caused by either the parallel program or the underlying architecture or both. In order to account for the time spent by processors executing code, RPPT converts the CC program into assembly code and assigns to each basic block a cycle count (a basic block has one entry, one exit, and each instruction in the block is performed exactly once.) It then inserts an instruction at the front of each basic block that increments a cycle counter variable by the number of cycles spent in that basic block. This act is called profiling, and is done to the native MC68020 assembly code generated for the execution of the simulation of a SUN3 platform. In order to make RPPT count TMS34082 cycles, the node program is recompiled on the TMS34082 C compiler, translated to assembly code, and profiled with using the TMS34028 simulator in single step mode. The basic blocks of the MC68020 assembly code are then cross profiled by replacing the MC68020 cycle counts with the TMS 34082 cycle counts. The key is that both programs execute the same C code so are essentially the same. The architectural modifications of the LAD bus controller are brought into the simulation here by updating the cycle count numbers to reflect the elimination of the cycles that are actually performed in parallel by the LAD controller.

The matrix multiplication was analyzed first. It showed us that the maximum sustainable throughput of a node was essentially limited to 10 MFLOPs . This is so because in the inner-loop of a long inner product required two loads, a mult.add, and a conditional jump. Thus two FLOPs are performed every four cycles, so that at 20 MHz , the TMS34082 can continuously compute data streams at a rate of 10 MFLOPs. This limit can be raised to up to 15.5 MFLOPs if the exact number of elements in the inner product is known ahead of time. For example, if the inner product length were 100 , the loop iterations consisting of 20 loads, ten mult.adds and one conditional jump would each perform 20 FLOPs every 31 cycles. The program used in the simulation was written for the general case so that the nodes were essentially limited to 10 MFLOPs sustained throughput rate.

The simulator was used to measure the efficiency of the HARP and to study the effects of matrix size and the number of processors in the system. The simulation accounts for the time to move the inputs to the nodes from the shared memory, compute the results, and move the results from the individual nodes back into the shared memory. The simulator counts all cycles to include addressing, loop management, testing of conditions, etc. It gives an indication as to the amount of time spent performing FLOPs and the amount of time spent on communication and overhead. Figure 6 shows a plot of the average MFLOPs achieved by a ten element array running matrix multiplications. Note that as the size of the matrices increase, the overall performance of the system approaches the theoretical limit of 100 MFLOPs . The reason that performance is not as close to the limit for smaller matrices, is that the I/O and program overhead becomes more significant. Figure 7 show a plot of the performance measured in average MFLOPs for systems running a $128 \times 128$ matrix multiplication using P processors. Note that for up to 32 processors (the largest array the simulator could handle) there is a linear speed-up as more processors are added. This violation of Amdahl's law is predictable because the communication overhead of the algorithm/architecture combination clearly does not increase exponentially as more processors are added.


Figure 6. Matrix Multiplication Performance on 10-Processor Systems


Figure 7. $128 \times 128$ Matrix Multiplication on P-Processor Systems

The next algorithm that was implemented was the FFT. The LAD controller provided hardware support for the radix-2 addressing scheme. In the first set of experiments, single FFTs of various lengths were computed on different sized arrays. The results are summarized in Table 1. We note that the sustained MFLOPS on a single processor is within $75 \%$ of the maximum sustainable throughput for a single node. The pay off for adding more processors, however, is less pronounced than in the matrix multiply algorithm. This is due to the fact that communication overhead can not be completely overlapped with computations. Thus, as more processors are added, the execution time of the algorithm decreases, but the efficiency of the system also decreases.

The pipeline FFT algorithm was also analyzed. Here the number of processors was determined to form an FFT pipeline that maximized overall performance. Using this number of processors, performance was limited only by the system bus bandwidth. Table 2 shows the results of the second set of experiments for transform lengths from 256 to 4096 that were performed on the optimum sized arrays. For each transform length/ array size pair, the table lists several parameters. First the 1-D pipeline FFT effective computation time is listed followed by the maximum sampling rate that could be accommodated for the various transform lengths. The next column shows how much time it takes to compute an $\mathrm{N} \times \mathrm{N} 2-\mathrm{DFFT}$ using the row/column algorithm. The sustained MFLOPs achieved for each 2-DFFT is listed last. The maximum attainable sustained computation rate can be taken to be $10 *$ P MFLOPs, were $P$ is the number of processors in the array. The efficiency is the measured sustained MFLOPS divided by the total attainable MFLOPs. The simulation shows the system efficiency ranges from $67.8 \%$ for the $256 \times 256$ transform to $80.8 \%$ for the $4096 \times 4096$ transform.

Table 1. Distributed FFT Performance Results.

|  | $N=512$ | $\mathrm{N}=1024$ | $N=2048$ | $\mathrm{N}=4096$ |
| :---: | :---: | :---: | :---: | :---: |
| $P=1$ | $\begin{gathered} t=3.69 \\ S M=7.47 \end{gathered}$ | $\begin{gathered} t=7.94 \\ \mathrm{SM}=7.74 \end{gathered}$ | $\begin{gathered} \mathrm{t}=17.0 \\ \mathrm{SM}=7.96 \end{gathered}$ | $\begin{gathered} t=36.2 \\ S M=8.14 \end{gathered}$ |
| $P=4$ | $\begin{gathered} t=1.64 \\ S M=16.86 \end{gathered}$ | $\begin{gathered} t=3.39 \\ S M=18.12 \end{gathered}$ | $\begin{gathered} t=7.03 \\ S M=19.23 \end{gathered}$ | $\begin{gathered} t=14.59 \\ S M=20.22 \end{gathered}$ |
| $\mathrm{P}=8$ | $\begin{gathered} t=1.40 \\ S M=19.79 \end{gathered}$ | $\begin{gathered} t=2.82 \\ S M=21.77 \end{gathered}$ | $\begin{gathered} t=5.74 \\ \mathrm{SM}=23.55 \end{gathered}$ | $\begin{gathered} t=11.72 \\ S M=25.17 \end{gathered}$ |
| $P=16$ | $\begin{gathered} t=1.31 \\ S M=21.15 \end{gathered}$ | $\begin{gathered} t=2.59 \\ S M=23.74 \end{gathered}$ | $\begin{gathered} t=5.18 \\ S M=26.09 \end{gathered}$ | $\begin{gathered} t=10.43 \\ S M=28.26 \end{gathered}$ |

$P=\#$ of processors. $t=$ time in milliseconds and $S M=$ sustained MFLOPS.
The column-systolic Householder QRD was also executed on the simulator. Figure 8 shows the sustained MFLOP rating of the QRD as a function of matrix dimension on a ten processor system. Note that the algorithm approaches the 100 MFLOPs maximum sustainable capacity of the system nearly as fast as the matrix multiplication algorithm, but levels off to 90 MFLOPS due to additional serial threads in the QRD algorithm. Figure 9 indicates that for large matrix size, that the algorithm has linear speed-up as more processors are added. This is due to the fact that communications and computations are nearly full pipelined. It is also due to the modulo P wrapping of the columns to the array and the use of the external ring connection. This mapping strategy achieves nearly perfect load balancing and allows the inherent dependencies of QRD to be effectively eliminated by allowing the system to execute more than one iteration of the algorithm at a time.


Figure 8. QRD Performance on 10-Processor Systems


Figure 9. $128 \times 128$ QRD on P-Processor Systems

The last algorithm that was implemented to date was the modified Hestenes-Luk SVD. The algorithm was found to be efficient for large matrices especially if the number of processors was large. As Figure 10 indicates, as more and more columns are mapped to each processor, the efficiency of the algorithm diminishes slightly. This is due to the fact that more and more time must be spent on pointer operations and loop overhead since the array is actually emulating a large array. Memory constraints limited the range of the number of processors that could be used to implement the SVD, but Figure 11 shows the results of the system performing $48 \times 48$ SVDs on differing numbers of processors. We note that the number of processors must divide the dimensions of the matrix or some processors will need to be idle. The figure indicates a linear speed-up as more processors are added for large sized problems. This behavior is expected due to the fact that the communication overhead does not grow as more processors are added. Actually, as more processors are added, the communication delay remains constant while the pointer overhead and housekeeping diminishes.

Table 2. Pipelined FFT Performance Results for Real-Time Signal and Image Processing

| $N$ | Processors <br> Required | Time Per <br> Pipelined <br> FFT | Maximum Data <br> Rate | NxN FFT <br> Execution <br> Time | NxN FFT <br> Sustained <br> MFLOPs |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 256 | 18 | 94 | 5.45 MHz | 51.5 ms | 122 |
| 512 | 19 | 186 | 5.51 MHz | 197 ms | 143 |
| 1024 | 21 | 372 | 5.51 MHz | 776 ms | 162 |
| 2048 | 22 | 746 | 5.50 MHz | 3.09 sec | 179 |
| 4096 | 24 | 1468 | 5.43 MHz | 12.43 sec | 194 |



Figure 10. SVD Performance on 8-Processor Systems


Figure 11. $48 \times 48$ SVD on P-Processor Systems

## Conclusion

In this paper a hybrid architecture for matrix, DSP, image, and scientific computations has been presented to harness the power of N TMS34082 floating-point processors. The architecture can be programmed using many different programming models and parallel processing paradigms so that efficient programs can be written for a broad range of algorithms. The machine may be programmed as a shared memory machine, a message passing distributed memory machine, or a systolic array. The architecture may dynamically switch between any of these modes under software control.

The architecture is optimized to operate with multiple TMS34082s. To this end, a local bus controller is introduced to assist the TMS34082s in pointer manipulations and to provide a fast addressing capability on the LAD bus. The bus controller also provides the ability to perform multiple bus operations, such as a fetch and a send, in the same cycle. By allowing the bus controller to have its own instruction stream, a program controlled DMA mechanism makes it possible for the cell to send messages or pass systolic data streams while the processor was executes numerical loops. While a simple address latching scheme seems reasonable, use of the smart LAD bus controller leads to speed-ups of two to three orders of magnitude.

The system was implemented using the TMS34082 Toolkit along with the RPPT simulation package. Matrix Multiplication, FFT, QRD and SVD algorithms were coded in Concurrent C and executed on the architecture model to provide detailed cycles counts which were converted into MFLOPs ratings for each algorithm. The simulations showed what must be done to make the system execute code efficiently. The main findings were that the TMS34082 must be freed from pointer manipulations whenever possible, that registered variables should be utilized to reduce costly stack operations, and that the LOOPCT register together with the cjmp.d instruction should be used to control loops. Hand optimizations to the assembly code generated by the Ccompiler were needed off-load LAD pointer manipulations to the bus controller hardware. The simulation showed that high performance can be achieved if the system is carefully designed and code is optimized. Algorithms can often sustain computation rates approaching MFLOPs per processor, where the MFLOPs rating account for program overhead and data I/O time. For example, the simulation showed the matrix multiplication algorithm could run at just under 100 MFLOPs on a ten TMS34082 system.

## Bibliography

1. M. Annaratone, E. Arnould, R. Cohn, T. Gross, and H. T. Kung. Harp architecture: From prototype to production. In Proceedings of the 1987 National Computer Conference, Chicago, Illinois, June 1987.
2. R. P. Brent and F. T. Luk. The solution of singular-value and symmetric eigenvalue problems on multiprocessor arrays. SIAM J. Sci. Stat. Compt., 6(1):69-84, January 1985.
3. E. M. Dowling, M. Griffin, and F. J. Taylor. A multi-purpose VLSI floating-point array processor. 22nd Annual Asilomar Conference on Signals, Systems, and Computers, October, 1988.
4. B. L. Drake, F. T. Luk, J. M. Speiser, and J. J. Symanski. Slapp: A systolic linear algebra parallel processor. Computer, 20(7):35-43, July, 1987.
5. D. W. Foulser and R. Schreiber. The saxpy matrix-1: A general purpose systolic computer. Computer, 2-(7):35-43, July, 1987.
6. M. R., Hestenes. Inversion of matrices by biorthogonalization and related results. SIAM J. App. Math, 6(1):51-90, March, 1958.
7. H. T. Kung. Why systolic architectures? IEEE Computer Magazine, pages 37-46, January, 1982.
8. F. T. Luk. On parallel Jaccobi orderings. SIAM J. Sci. Stat. Compt., 10(1), January, 1989.
9. J. C. Nash. A one sided transformation method of the singular value decomposition and algebraic eigenproblem. The Computer Journal, 18:74-76, 1975.
10. R. A. Whiteside, N. S. Ostlund, and P. G. Hibbard. a parallel diagonalization algorithm for a loop multiple processor system. IEEE Trans. on Computers, C-33(5):409-413, May 1984.
11. G. H. Golub and C. F. Van Loan. Matrix Computations, 2nd Ed., John Hopkins University Press, Baltimore, MD, 1989.

## TI Worldwide Sales Offices

ALABAMA: Huntsville: 4960 Corporate Drive, Suite N-150, Huntsville, AL 35805-6202, (205) 837-7530.
ARIZONA: Phoenix: 8825 N. 23 rd Avenue, Suite 100, Phoenix, AZ 85021, (602) 995-1007 CALIFORNIA: Irvine: 1920 Main Street, Suite 900, Irvine, CA 92714, (714) 660-1200; Roseville: 1 Sierra Gate Plaza, Suite 255B, Roseville, CA 95678, (916) 786-9208; San Diego: 5625 Ruffin Road, Suite 100, San Diego CA 92123, (619) 278-9601; Santa Clara: 5353 Betsy Ross Drive, Santa Clara, CA 95054, (408) 980-9000; Woodland Hills: 21550 Oxnard Street, Suite 700, Woodland Hills, CA 91367, (818) 704-8100.

COLORADO: Aurora: 1400 S. Potomac Street, Suite 101, Aurora, CO 80012, (303) 368-8000. CONNECTICUT: Wallingford: 9 Barnes Industrial Park So., Wallingford, CT 06492, (203) 269-0074.
FLORIDA: Altamonte Springs: 370 S. North Lake Boulevard, Suite 1008, Altamonte Springs, FL. 32701, (407) 260-2116; Fort Lauderdale: 2950 N.W.' 62 nd Street, Suite 100, Fort Lauderdale, FL 33309, (305) 973-8502; Tampa: Lauderdale, FL 33309, (305) 973-8502; Ta
4803 George Road, Suite 390, Tampa, FL 33634-6234, (813) 882-00i7.
GEORGIA: Norcross: 5515 Spalding Drive,
Norcross, GA 30092, (404) 662-7900.
ILLINOIS: Arlington Heights: 515 W . Algonquin, Arlington Heights, iL 60005, (708) 640-3000.
INDIANA: Carmel: 550 Congressional Drive, Suite 100, Carmel, IN 46032, (317) 573-6400; Fort Wayne: 118 E. Ludwig Road, Suite 102, Fort Wayne, IN 46825, (219) 482-3311. IOWA: Cedar Rapids: 373 Collins Road N.E., Suite 201, Cedar Rapids, IA 52402, (319) 395-9551.
KANSAS: Overland Park: 7300 College Boulevard, Lighton Plaza, Suite 150, Overland Park, KS 66210, (913) 451-4511.
MARYLAND: Columbia: 8815 Centre Park Drive, Suite 100, Columbia, MD 21045, (301) 964-2003.
MASSACHUSETTS: Waltham: 950 Winter Street, Suite 2800, Waltham, MA 02154, (617) 895-9100.
MICHIGAN: Farmington Hills: 33737 W. 12 Mile Road, Farmington Hills, MI 48331, (313) 553-1500; Grand Rapids: 3075 Orchard Vista Drive S.E., Grand Rapids, MI 49506, (616) 957-4202.
MINNESOTA: Eden Prairie: 11000 W. 78th Street, Suite 100, Eden Prairie, MN 55344, (612) 828-9300.
MISSOURI: St. Louis: 12412 Powerscourt Drive, Suite 125, St. Louis, MO 63131, (314) 821-8400.
NEW JERSEY: Iselin: Parkway Towers, 485E. Route 1 South, Iselin, NJ 08830, (201) 750-1050. NEW MEXICO: Albuquerque: 1224 Parsons Court, N.E., Albuquerque, NM 87112, (505) 291-0495.
NEW YORK: East Syracuse: 6365 Collamer Drive, East Syracuse, NY 13057, (315) 463-9291; Fishkill: 300 Westage Business Center, Suite 140, Fishkill, NY 12524, (914) 897-2900; Melville: 1895 Walt Whitman Road, P.O. Box 2936, Melville, NY 11747, (516) 454-6600; Pittsford: 2851 Clover Street, Pittsford, NY 14534, (716) 385-6770.
NORTH CAROLINA: Charlotte: 8 Woodlawn Green, Suite 100, Charlotte, NC 28217, (704) 527-0930; Raleigh: 2809 Highwoods Boulevard, Suite 100, Raleigh, NC 27625, (919) 876-2725.
OHIO: Beachwood: 23775 Commerce Park Road, Beachwood, OH 44122, (216) 464-6100; Beavercreek: 4200 Colonel Glenn Highway, Suite 600, Beavercreek, OH 45431, (513) 427-6200.

OREGON: Beaverton: 6700 S.W. 105th Street, Suite 110, Beaverton, OR 97005, (503) 643-6758. PENNSYLVANIA: Blue Bell: 670 Sentry Parkway, Blue Bell, PA 19422, (215) 825-9500. PUERTO RICO: Hato Rey: 615 Merchantile Plaza Building, Suite 505, Hato Rey, PR 00918, (809) 753-8700.

TEXAS: Austin: 12501 Research Boulevard, Austin, TX 78759, (512) 250-7655; Dallas: 7839 Churchill Way, Dallas, TX 75251, (214) 917-1264; Houston: 9301 Southwest Freeway, Suite 360, Houston, TX 77074, (713) 778-6592. UTAH: Salt Lake City: 1800 S. West Temple Street, Suite 201, Salt Lake City, UT 84115, (801) 466-8973.

WASHINGTON: Redmond: 5010 148th Avenue N.E., Building B, Suite 107, Redmond, WA 98052, (206) 881-3080.
WISCONSIN: Waukesha: 20825 Swenson Drive, Suite 900, Waukesha WI 53186, (414) 798-1001.
CANADA: Nepean: 301 Moodie Drive, Mallorn Center, Suite 102, Nepean, Ontario, Canada K2H 9C4, (613) 726-1970; Richmond Hill: 280 Centre Street East, Richmond Hill, Ontario, Canada L4C 1B1, (416) 884-9181; St. Laurent: 9460 Trans Canada Highway, St. Laurent, Quebec, Canada H4S 1R7, (514) 335-8392.

ARGENTINA: Texas Instruments Argentina Viamonte 1119, 1053 Capital Federal, Buenos Aires, Argentina, 1/748-3699.
AUSTRALIA (\& NEW ZEALAND): Texas Instruments Australia Ltd., 6-10 Talavera Road, North Ryde (Sydney), New South Wales, Australia 2113, 2-878-9000; 5th Floor, 418 Street, Kilda Road, Melbourne, Victoria, Australia 3004, 3 267-4677; 171 Philip Highway, Elizabeth, South Australia 5112, 8 255-2066.
AUSTRIA: Texas Instruments GmbH., Hietzinger Kai 101-105, A-1130 Wien, (0222) 9100-0. BELGIUM: S.A. Texas Instruments Belgium N.V., 11, Avenue Jules Bordetlaan 11, 1140 Brussels, Belgium, (02) 2423080.
BRAZIL: Texas Instruments Electronicos do Brasil Ltda., Rua Paes Leme, 524-7 Andar Pinheiros, 05424 Sao Paulo, Brazil, 0815-6166.
DENMARK: Texas instruments A/S, Borupvang 2D, DK-2750 Ballerup, (44) 687400.
FINLAND: Texas Instruments OY, P.O. Box 86, 02321 Espoo, Finland, (0) 8026517.
FRANCE: Texas Instruments France, 8-10 Avenue Morane Saulnier-B.P. 67, 78141 Velizy Villacoublay cedex, France, (1) 30701003. GERMANY: Texas Instruments Deutschland GmbH., Haggertystrasse 1, 8050 Freising, (08161) 80-0 od. Nbst; Kurfürstendamm 195-196, 1000 Berlin 15, (030) 8827365 ; Düsseldorfer Strasse 40, 6236 Eschborn 1, (06196) 80 70; Kirchhorster Strasse 2, 3000 Hannover 51, (0511) 64 68-0; Maybachstrasse II, 7302 Ostrildern 2 (Nellingen), (0711) 34 03-0; Gildehofcenter, Hollestrasse 3, 4300 Essen 1, (0201) 24 25-0.

HOLLAND: Texas Instruments Holland B.V., Hogehilweg 19, Postbus 12995, 1100 AZ Amsterdam-Zuidoost, Holland, (020) 5602911

HONG KONG: Texas Instruments Hong Kong Ltd., 8th Floor, World Shipping Center, 7 Canton Road, Kowloon, Hong Kong, 7351223.
HUNGARY: Texas Instruments International, Budaorsi u.42, $\mathrm{H}-1112$ Budapest, Hungary, (1) 1 666617.

IRELAND: Texas Instruments Ireland Ltd., 7/8 Harcourt Street, Dublin 2, Ireland, (01) 755233
ITALY: Texas Instruments Italia S.p.A., Centro Direzionale Colleoni, Palazzo Perseo-Via Paracelso, 12, 20041, Agrate Brianza (Mi), (039) 63221; Via Castello della Magliana, 38, 00148 Roma, (06) 5222651; Via Amendola, 17, 40100 Bologna, (051) 554004.
JAPAN: Texas Instruments Japan Ltd., Aoyama Fuji Building 3-6-12 Kita-aoyama Minato-ku, Tokyo, Japan 107, 03-3498-2111; MS Shibaura Building 9F, 4-13-23 Shibaura, Minato-ku, Tokyo, Japan 108, 03-3769-8700; Nissho-iwai Building 5F, 2-5-8 Imabashi, Chuou-ku, Osaka, Japan 541, 06-204-1881; Dai-ni Toyota Building Nishi-kan 7F, 4-10-27 Meieki, Nakamura-ku Nishi-kan 7F, 4-10-27 Meieki, Nakamura-ku,
Nagoya, Japan 450, 052-583-8691; Kanazawa Nagoya, Japan 450, 052-583-8691; Kanazawa
Oyama-cho Daiichi Seimei Building 6F, 3-10 Oyama-cho, Kanazawa, Ishikawa, Japan 920, 0762-23-5471; Matsumoto Showa Building 6F, 1-2-11 Fukashi, Matsumoto, Nagano, Japan 390, 0263-33-1060; Daiichi Olympic Tachikawa Building 6F, 1-25-12, Akebono-cho, Tachikawa, Tokyo, Japan 190, 0425-27-6760; Yokohama Nishiguchi KN Building 6F, 2-8-4 Kita-Saiwai, Nishiguchi KN Building 6F, 2-8-4 Kita-Saiwai,
Nishi-Ku, Yokohama, Kanagawa, Japan 220, Nishi-Ku, Yokohama, Kanagawa, Japan 220
045-322-6741; Nihon Seimei Kyoto Yasaka Building 5F, 843-2, Higashi Shiokohjicho, Higashi-iru, Nishinotoh-in, Shiokohji-dori, Shimogyo-ku, Kyoto, Japan 600, 075-341-7713; Sumitomo Seimei Kumagaya Building 8F, 2-44 Yayoi, Kumagaya, Saitama, Japan 360, 0485-22-2440; 2597-1, Aza Harudai, Oaza Yasaka, Kitsuki, Oita, Japan 873, 09786-3-3211.
KOREA: Texas Instruments Korea Ltd., 28th Floor, Trade Tower, 159-1, Samsung-Dong, Kangnam-ku Seoul, Korea, 25512800.
MEXICO: Texas Instruments de Mexico S.A., Alfonso Reyes 115, Col. Hipodromo Condesa, Mexico, D.F., Mexico 06120, 5/525-3860.
MIDDLE EAST: Texas Instruments, No. 13, 1st Floor Mannai Building, Diplomatic Area, P.O. Box 26335, Manama Bahrain, Arabian Gulf, 973 274681.

NORWAY: Texas Instruments Norge A/S, PB 106, Refstad (Sinsenveien 53), 0513 Oslo 5, Norway, (02) 155090.
PEOPLE'S REPUBLIC OF CHINA: Texas Instruments China Inc., Beijing Representative Office, 7-05 CITIC Building, 19 Jianguomenwai Dajie, Beijing, China, 500-2255, Ext. 3750 .
PHILIPPINES: Texas instruments Asia Ltd.,
Philippines Branch, 14th Floor, Ba-Lepanto Building, Paseo de Roxas, Makati, Metro Manila, Philippines, 28176031.
PORTUGAL: Texas Instruments Equipamento Electronico (Portugal) LDA., 2650 Moreira Da Maia, 4470 Maia, Portugal (2) 9481003.

## SINGAPORE (\& INDIA, INDONESIA,

MALAYSIA, THAILAND): Texas instruments Singapore (PTE) Ltd., Asia Pacific Division, 101 Thomson Road, \#23-01, United Square, Singapore 1130, 3508100.
SPAIN: Texas Instruments España S.A., c/Gobelas 43, Ctra de La Coruna km. 14, La Florida, 28023 Madrid, Spain, (1) 3728051 ; c/Diputacion, 279-3-5, 08007 Barcelona, Spain, (3) 3179180.

SWEDEN: Texas Instruments International Trade Corporation (Sverigefilialen), Box 30, S-164 93 Kista, Sweden, (08) 7525800.
SWITZERLAND: Texas instruments Switzerland AG, Riedstrafse 6, CH-8953 Dietikon, Switzerland, (01) 7442811.
TAIWAN: Texas Instruments Supply Company, Taiwan Branch, Room 903, 9th Floor, Bank Tower, 205 Tung Hua N. Road, Taipei, Taiwan, Republic of China, 27139311.
UNITED KINGDOM: Texas Instruments Ltd., Manton Lane, Bedford, England, MK41 7PA, (0234) 270111.

## TI North

 American Sales OfficesALABAMA: Huntsville: (205) 837-7530
ARIZONA: Phoenix: (602) 995-1007
CALIFORNIA: Irvine: (714)660-1200
Roseville: (916) 786-9208
San Diego: (619) 278-9601
Santa Clara: (408) 980-9000
Woodland Hills: (818) 704-8100
COLORADO: Aurora: (303) 368-8000
CONNECTICUT: Wallingford: (203) 269-0074
FLORIDA: Altamonte Springs: (407) 260-2116
Fort Lauderdale: (305) 973-8502
Tampa: (813) 882-0017
GEORGIA: Norcross: (404) 662-7900
ILLINOIS: Arlington Heights: (708) 640-3000
INDIANA: Carmel: (317) 573-6400
Fort Wayne: (219) 482-3311
IOWA: Cedar Rapids: (319) 395-9551
KANSAS: Overland Park: (913) 451-4511
MARYLAND: Columbia: (301) 964-2003
MASSACHUSETTS: Waltham: (617) 895-9100
MICHIGAN: Farmington Hills: (313) 553-1500
Grand Rapids: (616) 957-4202
MINNESOTA: Eden Prairie: (612) 828-9300
MISSOURI: St. Louis: (314) 821-8400
NEW JERSEY: Iselin: (201) 750-1050
NEW MEXICO: Albuquerque: (505) 291-0495
NEW YORK: East Syracuse: (315) 463-9291
Fishkill: (914) 897-2900
Melville: (516) 454-6600
Pittsford: (716) 385-6770
NORTH CAROLINA: Charlotte: (704) 527-0930 Raleigh: (919) 876-2725
OHIO: Beachwood: (216) 464-6100
Beavercreek: (513) 427-6200
OREGON: Beaverton: (503) 643-6758
PENNSYLVANIA: Blue Bell: (215) 825-9500
PUERTO RICO: Hato Rey: (809) 753-8700
TEXAS: Austin: (512) 250-7655
Dallas: (214) 917-1264
Houston: (713) 778-6592
UTAH: Salt Lake City: (801) 466-8973 WASHINGTON: Redmond: (206) 881-3080 WISCONSIN: Waukesha: (414) 798-1001 CANADA: Nepean: (613) 726-1970 Richmond Hill: (416) 884-9181
St. Laurent: (514) 335-8392

## TI Regional Technology Centers

CALIFORNIA: Irvine: (714) 660-8140 Santa Clara: (408) 748-2220 GEORGIA: Norcross: (404) 662-7950 ILLINOIS: Arlington Heights: (708) 640-2909 INDIANA: Indianapolis: (317) 573-6400 MASSACHUSETTS: Waltham: (617) 895-9196 MEXICO: Mexico City: 491-70834 MINNESOTA: Minneapolis: (612) 828-9300 TEXAS: Dallas: (214) 917-3881
CANADA: Nepean: (613) 726-1970

## Customer Response Center

TOLL FREE: (800) 336-5236
OUTSIDE USA: (214) 995-6611 (8:00 a.m. - 5:00 p.m. CST)

# TI Authorized North American Distributors 

Alliance Electronics, Inc. (military product only) Almac Electronics
Arrow/Kierulff Electronics Group
Arrow (Canada)
Future Electronics (Canada)
GRS Electronics Co., Inc.
Hall-Mark Electronics
Lex Electronics
Marshall Industries
Newark Electronics
Wyle Laboratories
Zeus Components
Rochester Electronics, Inc. (obsolete product only (508) 462-9332)

## TI Distributors

ALABAMA: Arrow/Kierulf (205) 837-6955; Hall-Mark (205) 837-8700; Marshall (205) 881-9235; Lex (205) 895-0480 ARIZONA: Arrow/Kierulff (602) 437-0750 Hath-Mark (602) 437-1200; Marshall (602) 496-0290; Lex (602) 431-0030; Wyle (602) 437-2088.
CALIFORNIA: Los Angeles/Orange County: Arrow/Kierulff (818) 701-7500, (714) 838-5422; Hall-Mark (818) 773-4500, (714) 727-6000; Marshall (818) 407-4100, (714) 458-5301; Lex (818) 880-9686, (714) 863-0200; Wyle (818) 880-9000, (714) 863-9953; Zeus (714) 921-9000 (818) 889-3838

Sacramento: Hall-Mark (916) 624-9781; Marshall (916) 635-9700; Lex (916) 364-0230; Wyle (916) 638-5282;
San Diego: Arrow/Kierulff (619) 565-4800; Hall-Mark (619) 268-1201; Marshall (619) 578-9600; Lex (619) 495-0015; Wyle (619) 565-9171; Zeus (619) 277-9681;
San Francisco Bay Area: Arrow/Kierulff (408) 441-9700; Hall-Mark (408) 432-4000; Marshall (408) 942-4600; Lex (408) 432-7171; Wyle (408) 727-2500; Zeus (408) 629-4789.
COLORADO: Arrow/Kierulff (303) 373-5616; Hall-Mark (303) 790-1662; Marshall (303) 451-8383; Lex (303) 799-0258; Wyle (303) 457-9953.
CONNECTICUT: Arrow/Kierulff (203) 265-7741; Hall-Mark (203) 271-2844; Marshall (203) 265-3822; Lex (203) 264-4700.
FLORIDA: Fort Lauderdale: Arrow/Kierulff (305) 429-8200; Hall-Mark (305) 971-9280; Marshall (305) 977-4880; Lex (305) 977-7511 Orlando: Arrow/Kierulff (407) 333-9300; Hall-Mark (407) 830-5855; Marshall (407) $767-8585$; Lex (407) 331-7555; Zeus (407) 365-3000;
Tampa: Hall-Mark (813) 541-7440; Marshall (813) 573-1399; Lex (813) 541-5100. GEORGIA: Arrow/Kierulff (404) 497-1300; Hall-Mark (404) 623-4400; Marshall (404) 923-5750; Lex (404) 449-9170.
ILLINOIS: Arrow/Kierulff (708) 250-0500; Hall-Mark (708) 860-3800; Marshall (708) 490-0155; Newark (312)784-5100; Lex (708) 330-2888.
INDIANA: Arrow/Kierulff (317) 299-2071; Hall-Mark (317) 872-8875; Marshall (317) 297-0483; Lex (317) 843-1050.


Texas
Instruments

IOWA: Arrow/Kierulf (319) 395-7230; Lex (319) 373-1417.
KANSAS: Arrow/Kierulff (913) 541-9542; Hall-Mark (913) 888-4747; Marshall (913) 492-3121; Lex (913) 492-2922.
MARYLAND: Arrow/Kierulff (301) 995-6002; Hall-Mark (301) 988-9800; Marshall (301) 622-1118; Lex (301) 596-7800; Zeus (301) 997-1118.
MASSACHUSETTS: Arrow/Kierulff (508) 658-0900; Hall-Mark (508) 667-0902; Marshall (508) 658-0810; Lex (508) 694-9100; Wyle (617) 272-7300; Zeus (617) 863-8800.
MICHIGAN: Detroit: Arrow/Kierulff (313) 462-2290; Hall-Mark (313) 462-1205; Marshall (313) 525-5850; Newark (313) 967-0600; Lex (313) 525-8100;

Grand Rapids: Arrow/Kierulff (616) 243-0912.
MINNESOTA: Arrow/Kierulff (612) 830-1800; Hall-Mark (612) 941-2600; Marshall (612) 559-2211; Lex (612) 941-5280.
MISSOURI: Arrow/Kierulff (314) 567-6888; Hall-Mark (314) 291-5350; Marshall (314) 291-4650; Lex (314) 739-0526.
NEW HAMPSHIRE: Lex (800) 833-3557.
NEW JERSEY: Arrow/Kierulff (201) 538-0900, (609) 596-8000; GRS (609) 964-8560; Hall-Mark (201) 515-3000, (609) 235-1900; Marshall (201) 882-0320, (609) 234-9100; Lex (201) 227-7880, (609) 273-7900.

NEW MEXICO: Alliance (505) 292-3360.
NEW YORK: Long Island: Arrow/Kierulff (516) 231-1000; Hall-Mark (516) 737-0600; Marshall (516) 273-2424; Lex (516) 231-2500; Zeus (914) 937-7400;
Rochester: Arrow/Kierulff (716) 427-0300; Hall-Mark (716) 425-3300; Marshall (716) 235-7620; Lex (716) 383-8020;
Syracuse: Marshall (607) 798-1611.
NORTH CAROLINA: Arrow/Kierulff (919)
876-3132; (919) 725-8711; Hall-Mark (919) 872-0712; Marshall (919) 878-9882; Lex (919) 876-0000.
OHIO: Cleveland: Arrow/Kierulff (216)
248-3990; Hall-Mark (216) 349-4632; Marshall (216) 248-1788; Lex (216) 464-2970;

Columbus: Hall-Mark (614) 888-3313;
Dayton: Arrow/Kierulff (513) 435-5563; Marshall (513) 898-4480; Lex (513) 439-1800; Zeus (513) 293-6162.
OKLAHOMA: Arrow/Kierulff (918) 252-7537; Hall-Mark (918) 254-6110; Lex (918) 622-8000. OREGON: Almac (503) 629-8090; Arrow/Kierulff (503) 627-7667; Marshall (503) 644-5050; Wyle (503) 643-7900.

PENNSYLVANIA: Arrow/Kierulff (215) 928-1800; GRS (215) 922-7037; Marshall (412) 788-0441; Lex (412) 963-6804.
TEXAS: Austin: Arrow/Kierulff (512) 835-4180; Hall-Mark (512) 258-8848; Lex (512) 339-0088; Wyle (512) 345-8853;
Dallas: Arrow/Kierulff (214) 380-6464; Hall-Mark (214) 553-4300; Marshall (214) 233-5200; Lex (214) 247-6300; Wyle (214) 235-9953; Zeus (214) 783-7010;

Houston: Arrow/Kierulff (713) 530-4700; Hall-Mark (713) 781-6100; Marshall (713) 895-9200; Lex (713) 784-3600; Wyle (713) 879-9953.
UTAH: Arrow/Kierulff (801) 973-6913; Marshall (801) 485-1551; Wyle (801) 974-9953.

WASHINGTON: Almac (206) 643-9992, (509) 924-9500; Arrow/Kierulff (206) 643-4800 Marshall (206) 486-5747; Wyle (206) 881-1150 WISCONSIN: Arrow/Kierulff (414) 792-0150; Hall-Mark (414) 797-7844; Marshall (414) 797-8400; Lex (414) 784-9451.
CANADA: Calgary: Future (403) 235-5325; Edmonton: Future (403) 438-2858;
Montreal: Arrow Canada (514) 735-5511; Future (514) 694-7710; Marshall (514) 694-8142;

Ottawa: Arrow Canada (613) 226-6903; Future (613) 820-8313; Quebec City: Arrow Canada (418) 871-7500;

Toronto: Arrow Canada (416) 670-7769; Future (416) 612-9200; Marshall (416)458-8046;

Vancouver: Arrow Canada (604) 421-2333;
Future (604) 294-1166.


[^0]:    †The [ ]'s denote the type of buffer utilized in the host-independent mode. If no [ ]'s appear, the buffer type is identical for both modes of operation.

[^1]:    $\dagger \mathrm{C}$ and CT registers cannot both be used for FPU operand sources in the same instruction.

[^2]:    $\dagger$ See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for

[^3]:    $\dagger$ See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES1 low. No other pipeline settings are specified.
    $\ddagger$ Interrupts are not granted during multicycle instructions.

[^4]:    $\dagger$ The setting of DS/CS determines whether the value on the MSD bus is an instruction or data.
    $\ddagger \overline{M C E}$ dos not toggle at each clock edge.
    $\S \overline{M O E}$ goes high at each clock edge.
    NOTE: This example shows a data read followed by an instruction read.

[^5]:    $\dagger$ The setting of DS/ $\overline{C S}$ determines whether the value on the MSD bus is an instruction or data.
    $\ddagger \overline{M C E}$ does not toggle at each clock edge.
    $\S \overline{\text { MWR }}$ goes high at each clock edge.
    NOTE: This example shows a data write followed by a code read.

[^6]:    $\dagger$ DS/ $\overline{C S}$ valid for moves to data space; $\overline{M C E}$ valid for moves to code space. Only one of these would be valid for each move instruction. $\ddagger$ This option for using DS/ $\overline{C S}$ as data space chip enable and $\overline{M C E}$ as code space chip enable is invoked by setting the MEMCFG bit high in the configuration register.

[^7]:    $\dagger$ The setting of DS/CS determines whether the value on the MSD bus is an instruction or data.
    $\ddagger \overline{M C E}$ does not toggle at each rising clock edge.
    $\S \overline{\text { MOE }}$ goes high at each rising clock edge.
    NOTE: This example shows a data write followed by an instruction read.

[^8]:    $\dagger$ Indicates a complex instruction.

[^9]:    F denotes single-precision, D denotes double-precision floating-point, x denotes operand type, and a blank designates signed integer

[^10]:    f The [ ]'s denote the type of buffer utilized in the host-independent mode. If no [ ]'s appear, the buffer type is identical for both modes of operation.

[^11]:    See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for PIPES2 high and PIPES 1 low. No other pipeline settings are specified.

[^12]:    $\dagger_{\text {See Parameter Measurement Information for load circuit, voltage waveforms, and timing diagrams. The device parameters are measured for }}$

[^13]:     instruction instead of the set mask instruction.
    NOTE: This timing diagram assumes an external address latch to store address for external memory reads. Data input hold time on the latch is zero; data (or address) output hold time is nonzero.

[^14]:    $\dagger \overline{\text { COINT }}$ timing is for LADCFG high only. When the LADCFG bit is set high in the configuration register, $\overline{\text { COINT }}$ is controlled by bit 1 of the LAD move instruction instead of the set mask instruction.
    $\ddagger \overline{M C E}$ does not toggle at each rising clock edge.
    $\S \overline{\mathrm{MOE}}$ goes high at each rising clock edge.

[^15]:    NOTE: When either RDY or LRDY is set low and the setup time before CLK high is observed, the device is stalled for one or more clock cycles, until RDY or LRDY is set high again. During a wait state, internal states and status are preserved and output signals do not change. LRDY can be used in this manner only in the host-independent mode.

[^16]:    $\dagger$ Indicates a complex instruction.

