
DATE:		August 14, 1997
TO:		Matias Budhiantho and Dr. Brian L. Evans
FROM:		Chi Duong
SUBJECT:	Completion report on my project to complete the
                simulator of the Texas Instruments TMS320C30 Digital
		Signal Processor (DSP), and to design a
		regression test for this C30 simulator.


ABSTRACT

A fully compatible simulator for the Texas Instruments TMS320C30 (C30)
processor is the final goal of the C30 simulator project, which has been
developed in four stages:  (1) design a data structure for the C30
simulator, (2) design the C30 pipeline cycle engine, (3) implement the C30
instruction set, and (4) design a regression test.  The first two
stages were done by Chris Moy as his Fall 1996 Senior Honors Design
Project, and the last two by me as my Summer 1997 Senior Honors Design
Project.  The third stage of the project, execution, was implemented with
over 100 subroutines that execute the more than 100 instructions in the
C30 instruction set.  Since the C30 instructions can be divided into
subgroups according to their common characteristics, I created several
C++ inline functions to avoid repeating code segments, which
simplified the code significantly.  For the last stage, I
designed a regression test that uses random opcode testing to
compare our C30 simulator with the C30 processor under the DSP Starter
Tool Kit (DSK) debugger environment.  We now have a complete C30
simulator design and an effective regression test to find and fix bugs.
All we need is some time to run the regression test and debug further
before we can claim that our C30 simulator is fully compatible.


INTRODUCTION

The TMS320C30 (C30) processor is a high-performance CMOS 32-bit
floating-point device which employs a pipeline consisting of four
stages:  fetch, decode, read, and execute.  Chris Moy succeeded in
designing an accurate cycle engine that correctly fetches and decodes
instructions, and detects and handles pipeline conflicts [1].  However,
the execution stage, which handles the C30 instruction set, was barely
implemented.  In addition, since the C30 simulator was still in its
development phase when I started my project, no comprehensive testing had
been applied.  As a result, besides the known problems marked as "FIXME"
in most of the files, several other flaws and bugs were hidden in the
C/C++ code, especially in the disassembler and the files implementing the
C30 addressing modes.

My tasks were to (1) complete the execution stage of the pipeline, which
involves writing over 100 subroutines, each executing one instruction in
the C30 instruction set, and (2) design a regression test to validate the
full compatibility of the C30 simulator with the processor on a
pipeline-cycle basis.


EXECUTION OF THE C30 INSTRUCTION SET

The C30 instruction set has 113 instructions, which are described in
detail in the User's Guide [2].  The execution stage carries out the
operation for each instruction and, for most instructions, updates the
status register according to the result of the operation.

1.  SUBGROUPS OF THE C30 INSTRUCTION SET AND THE USE OF C++ INLINE
FUNCTIONS:   I divided the 113 instructions into six groups:  conditional,
integer arithmetic, integer logical, floating-point arithmetic, parallel
format, and special cases.  Except for the special cases, the
instructions in each subgroup share common characteristics, which
are handled by different C++ inline functions as described below.

	A.  Updating the status register:  there are three different ways to
update the status register, one for each of three subgroups: (1) integer
arithmetic, (2) integer logical and load integer/floating-point number,
and (3) floating-point arithmetic.  According to the result of each
operation, all instructions in (1) need to update the underflow (uf),
negative (n), zero (z), overflow (v), and latched overflow (lv) bits of
the status register; all instructions in (2) need to update the uf, n, z,
and v bits; and instructions in (3) need to update n, z, and lv.  Three
inline functions, one per subgroup, update these bits of the status
register.
	B.  Conditional group:  the C30 instruction set provides 11 conditional
instructions with 20 different condition codes such as negative, zero,
and greater than.  These condition codes are evaluated by the
testConditionalCodes(<cond>) function, which returns 1 if the bits of the
status register set by the previous instruction satisfy the condition
<cond>, and returns 0 otherwise.  All 11 conditional instructions use
testConditionalCodes(<cond>) to decide whether to perform their
operations.  For example, the instruction "LDI<cond>   -1,R0" loads
-1 into R0 only if testConditionalCodes(<cond>) returns 1.
	C.  Parallel format:  the C30 instruction set has a very rich set of
parallel instructions, each of which carries out two operations at the
same time.  For example, "MPYI3 R4,R3,R0   ||   SUBI3 *AR7,*AR6,R1"
performs a multiplication and a subtraction in one cycle.  The parallel
instruction set is a distinctive feature of the C3x Digital Signal
Processors that allows the processor to process signals as fast as
possible.  Since most parallel instructions update the status register
according to the results of both operations (MPYI3 and SUBI3 in this
example), I wrote one subroutine to set the status register for the
parallel floating-point instructions.
	D.  Special cases:  the rest of the C30 instructions belong to two
special cases: program control instructions such as branches, and
floating-point/integer conversions.  There is no need to use common
inline functions for these cases because the program control instructions
are short and simple, and the conversion instructions are unique.  I
followed the flow charts in the User's Guide [2] step by step to
implement the conversion instructions such as FLOAT, FIX, RND, and NORM.
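
The status-register update in item A and the condition test in item B can
be sketched in C++ as follows.  This is a minimal sketch, not the
simulator's actual code: only the name testConditionalCodes comes from
this report, while StatusFlags, Cond, updateIntegerArithmeticFlags,
loadIntegerConditional, and all signatures are hypothetical.  It covers
only the integer-arithmetic subgroup and a handful of the 20 condition
codes, and it assumes that integer results always clear uf.

```cpp
#include <cstdint>

// Hypothetical container for the status bits named in item A.
// The real status register is a bit field; booleans keep the sketch simple.
struct StatusFlags {
    bool uf = false;  // underflow
    bool n  = false;  // negative
    bool z  = false;  // zero
    bool v  = false;  // overflow
    bool lv = false;  // latched overflow
};

// A few illustrative conditions; the C30 defines 20 condition codes.
enum class Cond { Unconditional, Zero, NotZero, Negative, GreaterThan };

// Subgroup (1): update uf, n, z, v, and lv from the exact (64-bit) result
// of a 32-bit integer operation.
inline void updateIntegerArithmeticFlags(StatusFlags& st, std::int64_t exact) {
    const std::uint32_t low = static_cast<std::uint32_t>(exact);  // low 32 bits
    st.v  = exact > INT32_MAX || exact < INT32_MIN;
    st.lv = st.lv || st.v;             // latched: once set, it stays set
    st.z  = (low == 0);
    st.n  = (low & 0x80000000u) != 0;  // sign bit of the low 32 bits
    st.uf = false;                     // assumption: integer results never underflow
}

// Returns 1 if the flags left by the previous instruction satisfy cond,
// and 0 otherwise.
inline int testConditionalCodes(Cond cond, const StatusFlags& st) {
    switch (cond) {
        case Cond::Unconditional: return 1;
        case Cond::Zero:          return st.z ? 1 : 0;
        case Cond::NotZero:       return st.z ? 0 : 1;
        case Cond::Negative:      return st.n ? 1 : 0;
        case Cond::GreaterThan:   return (!st.n && !st.z) ? 1 : 0;
    }
    return 0;  // not reached for the enumerators above
}

// LDI<cond> src,dst: load only when the condition holds.
inline void loadIntegerConditional(Cond cond, const StatusFlags& st,
                                   std::int32_t src, std::int32_t& dst) {
    if (testConditionalCodes(cond, st)) {
        dst = src;
    }
}
```

For instance, after a subtraction whose result is zero,
loadIntegerConditional(Cond::Zero, st, -1, r0) loads -1 into r0, whereas
the same call with Cond::NotZero leaves r0 unchanged.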

Besides the inline functions, the execution file is further simplified by
the clever way Prof. Evans's disassembler program transfers the
operands specified in opcodes to the C30 pipeline.  His order of passing
operands allows 3-operand and 1-operand instructions to be implemented by
the same code.  For example, the function S_add3, which executes the ADD3
(3-operand) instruction, is implemented by calling the S_add function,
which executes the ADD (1-operand) instruction.  As a result, the
execution file is free of repeated code and is easy to read and debug.
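
One way such code sharing could work is sketched below.  The report does
not spell out the operand order the disassembler uses, so this is only an
assumed scheme: the decoder always hands the execution routine two source
values and a destination, and for the shorter form it passes the current
destination value as the second source.  S_add and S_add3 are the names
used above; DecodedOperands, decodeShortForm, decodeThreeOperand, and
all signatures are hypothetical.

```cpp
#include <cstdint>

// Hypothetical operand bundle produced by the disassembler/decoder.
struct DecodedOperands {
    std::int32_t src1;   // first source value
    std::int32_t src2;   // second source value
    std::int32_t* dst;   // destination register
};

// Short form (ADD src,dst): the destination doubles as the second source.
inline DecodedOperands decodeShortForm(std::int32_t src, std::int32_t& dst) {
    return DecodedOperands{src, dst, &dst};
}

// 3-operand form (ADD3 src2,src1,dst): both sources are explicit.
inline DecodedOperands decodeThreeOperand(std::int32_t a, std::int32_t b,
                                          std::int32_t& dst) {
    return DecodedOperands{a, b, &dst};
}

// Executes the addition; unsigned arithmetic gives wrap-around behavior.
// Overflow handling and status-register updates are omitted here.
inline void S_add(const DecodedOperands& ops) {
    const std::uint32_t sum = static_cast<std::uint32_t>(ops.src1) +
                              static_cast<std::uint32_t>(ops.src2);
    *ops.dst = static_cast<std::int32_t>(sum);
}

// Because the operands arrive in the same layout, the 3-operand routine
// can simply forward to the short-form routine.
inline void S_add3(const DecodedOperands& ops) {
    S_add(ops);
}
```

With this layout the arithmetic lives in one place, so a bug fixed in
S_add is fixed for S_add3 as well.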

2.  PRELIMINARY TESTING OF THE EXECUTION CODE:   Grouping the C30
instruction set and using the C++ inline functions made it possible
for me to carry out the preliminary testing of the 113 instructions by
manually running 3 or 4 instructions from each subgroup.  While doing the
preliminary testing, I not only checked my C30 execution design but also
fixed bugs in the disassembler.  By July 22, our C30 simulator had passed
the preliminary testing and was ready for thorough regression testing.


REGRESSION TESTING FOR THE C30 SIMULATOR

The ultimate goal of the C30 simulator project is a fully compatible C30
simulator.  This means that our C30 simulator must operate exactly as the
C30 Digital Signal Processor (DSP) does.  To verify the compatibility
between the simulator and the DSP, I designed a random_test subroutine
which executes the same random opcode on both our C30 simulator and the
DSP, and then compares all CPU-register values between the two.  This
random-opcode regression testing method was suggested by Keith
Larson, a Texas Instruments (TI) Application Engineer [3], who designed
the TI DSP Starter Tool Kit (DSK) debugger.  Keith's DSK debugger is an
interface between the DSP assembly language program and the processor.
My first step in designing the regression test was to build Keith's DSK
debugger with our C30 simulator included, so that the debugger can
communicate with our C30 simulator as it does with the DSP.  The next
step was to develop the random_test subroutine inside this new DSK/C30
debugger.  Below is the flow chart of the random_test subroutine.

                               ___________________
                               |    Initialize    |
                               | DSP  & simulator |
                               |__________________|
                                         |
                                         |
                             ____________|_____________
____________________________| Generate a random opcode |
|                           |__________________________|
|                                        |
|                                        |
|                   _____________________|_______________
|                   |  Replace the invalid random opcode |
|                   |     with NOP (No OPeration)        |
|                   |____________________________________|
|                                        |
|                                        |
|                ________________________|_______________
|                |  Save current CPU registers' values  |
|                |          of the C30 simulator        |
|                |______________________________________|
|                                        |
|                                        |
|                ____________Step one step on the____________
|                |                                           |
|             ___|____                                _______|_________
|             |  DSP  |                               | C30 simulator  |
|             |_______|                               |________________|
|                 |                                          |
|                 |__________________________________________|
|                             ___________|________
|                  __________|  Compare  values  |__________
|                  |         |     (match?)      |         |
|                  |         |___________________|         |
|                  |                                       |
|______ yes________|                                       no
                                         __________________|___________________
                                         | Print out saved & current values of |
                                         |      DSP and simulator              |
                                         |_____________________________________|


                     Flow chart 1.  Random-Opcode Regression Test Scheme


As the flow chart shows, the regression test keeps running and
comparing CPU-register values until a bug is encountered.  When that
happens, the user can fix the bug using the print-outs of the
CPU-register values of the DSP and the C30 simulator.  Once the bug is
fixed, the user restarts the regression test.  I will run this regression
test until no more bugs are found.  Because this regression test checks
the compatibility of the C30 simulator on random opcodes, the number
of opcodes to be tested will be in the millions.
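
The loop in Flow chart 1 can be sketched in C++ as follows.  This is an
illustration under stated assumptions, not the code inside the DSK/C30
debugger: only the name random_test comes from this report, while Target,
Registers, Mismatch, isValidOpcode, and the signature are hypothetical.
The register count and the NOP encoding are assumed values that should be
checked against the User's Guide [2], and storing every register in 32
bits is a simplification.  The sketch needs a C++17 compiler.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <random>

constexpr std::size_t kRegisterCount = 28;         // assumed CPU register count
constexpr std::uint32_t kNopOpcode = 0x0C800000u;  // assumed NOP encoding

using Registers = std::array<std::uint32_t, kRegisterCount>;

// Hypothetical interface that both the DSP and the simulator expose.
struct Target {
    std::function<void(std::uint32_t)> step;   // execute one opcode
    std::function<Registers()> readRegisters;  // snapshot of the CPU registers
};

// Everything the user needs to diagnose a disagreement.
struct Mismatch {
    std::uint32_t opcode;      // opcode that exposed the difference
    std::size_t opcodesRun;    // opcodes executed so far, including this one
    Registers savedSimulator;  // simulator registers before the step
    Registers dsp;             // DSP registers after the step
    Registers simulator;       // simulator registers after the step
};

// Runs up to maxOpcodes random opcodes on both targets and returns the
// first mismatch, or std::nullopt if the register values always agree.
inline std::optional<Mismatch> random_test(
        const Target& dsp, const Target& simulator,
        const std::function<bool(std::uint32_t)>& isValidOpcode,
        std::size_t maxOpcodes, std::uint32_t seed) {
    std::mt19937 rng(seed);
    for (std::size_t i = 0; i < maxOpcodes; ++i) {
        std::uint32_t opcode = static_cast<std::uint32_t>(rng());
        if (!isValidOpcode(opcode)) {
            opcode = kNopOpcode;  // replace an invalid opcode with NOP
        }
        const Registers saved = simulator.readRegisters();
        dsp.step(opcode);
        simulator.step(opcode);
        const Registers dspNow = dsp.readRegisters();
        const Registers simNow = simulator.readRegisters();
        if (dspNow != simNow) {
            return Mismatch{opcode, i + 1, saved, dspNow, simNow};
        }
    }
    return std::nullopt;
}
```

Unlike the flow chart, which loops until a mismatch occurs, the sketch
stops after maxOpcodes steps so that a run always terminates.  Passing
the seed explicitly means a failing opcode sequence can be replayed after
a fix.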


CONCLUSION

I have successfully completed the two specifications of my Senior Honors
Project, which were to complete the C30 simulator and to design a
comprehensive regression test.  However, our current C30 simulator still
has bugs.  More time to run the regression test and debug further is all
we need before we can claim that our simulator is 100% compatible with the
DSP.  I plan to continue running the regression test as part of my
EE260 course work next semester, and to attain a fully compatible
simulator prior to October 31, 1997, in time for our Web-Enabled Texas
Instruments TMS320C30 Simulator (WETICS) project to be submitted to the
TI DSP Design Contest.

Beyond being a TI Design Contest project, WETICS, which is the C30
simulator with a Web-based user interface written by Dogu Arifler and
Saleem Marwat for their EE464H projects, is a very useful tool for product
evaluation, distance learning, and computer-aided design over the Web [4].


REFERENCES

[1]	Chris Moy, Completion report on my Texas Instruments TMS320C30
        simulator, The University of Texas at Austin, ECE department,
        EE464H final report, December 5, 1996.

[2]	TMS320C3x User's Guide, Texas Instruments, 1994.

[3]	E-mails from Keith Larson, Texas Instruments Application Engineer,
        August 1997.

[4]	Weekly meetings with Prof. Brian Evans, ECE department, The
        University of Texas at Austin, June through August 1997.
