
Completion report on my Texas Instruments TMS320C30 simulator.

by Chris Moy

December 5, 1996

Supervising Professor: Dr. Brian L. Evans


ABSTRACT

Currently, no freely distributable simulator exists for the TMS320C30,
a floating-point digital signal processor manufactured by Texas Instruments.
My simulator, written in C, emulates the C30 pipeline, performing the four
stages: fetch, decode, read, and execute.  I modeled my code on the behavior
of the C30.  While the C30 performs the stages simultaneously, my software
must execute them sequentially, so the stages are called in order of
priority.  Memory and register access counters are used to identify
pipeline conflicts.  When a conflict is detected, the lower priority stages
are skipped, thus interlocking the pipeline.  While the simulator is not 100%
compatible as originally planned, it does correctly compute effective
addresses, perform program counter updates, and detect and handle pipeline
conflicts.  In the future, the instruction set can be completed to reach
full compatibility, and a graphical user interface can be added.  The
finished application will be a useful tool for C30 programmers at no cost.




INTRODUCTION

The Texas Instruments TMS320C30 (C30) simulator is a software application,
written in C and compiled with a C++ compiler, that emulates the C30
floating-point digital signal processor.  The simulator takes C30 machine
code as input and emulates the data pipeline of the processor [1].
Originally, the objective of this project was to design a 100% compatible
simulator.  However, due to the complexity of that task, the final goal
became an accurate cycle engine that correctly fetches and decodes
instructions and detects and handles pipeline conflicts.  Currently, there
is no freely distributable simulator for the C30, which means that C30
programmers must use expensive evaluation boards.  With my simulator,
programmers can create and optimize programs for the C30 at no cost.

C30 PIPELINE

The C30 employs a pipeline consisting of four stages: fetch, decode, read,
and execute.  Usually, the four stages operate on four separate instructions
simultaneously.  However, pipeline conflicts can occur.  One example is an
attempt by the processor to read from or write to a memory block more times
in one cycle than the block has ports.  The C30 handles this problem using
a technique called interlocking [2].  The processor halts the stage
responsible for the conflict and all stages before it.  This reduces
efficiency, but ensures correct execution of an instruction sequence.

DESIGN

To store the information contained in the C30, I had to create a virtual
chip.  My first task was to design a data structure that held the visible
architecture: the extended precision registers, auxiliary registers, index
registers, program counter, status register, stack pointer, and repeat mode
registers.  My next task was to create a data structure that stored the
state of the pipeline.  This included memory access counters, an interlock
flag, and the data passed between stages: the instruction word from the
fetch stage, the operands from the read stage, and the operand,
destination, and opcode function pointers from the decode stage.
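
The two structures described above might be sketched as follows.  This is
an illustration only, not the simulator's actual source: every type, field,
and function name (C30State, Pipeline, OpFn, pipeline_clear) is
hypothetical, and the number of tracked memory blocks is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N_MEM_BLOCKS = 2 };   /* assumed number of tracked memory blocks */

/* Visible architecture of the virtual chip. */
typedef struct {
    int64_t  r[8];           /* extended precision registers (wider than 32 bits) */
    uint32_t ar[8];          /* auxiliary registers */
    uint32_t ir[2];          /* index registers */
    uint32_t pc;             /* program counter */
    uint32_t st;             /* status register */
    uint32_t sp;             /* stack pointer */
    uint32_t rs, re, rc;     /* repeat mode registers: start, end, count */
} C30State;

typedef struct Pipeline Pipeline;
typedef void (*OpFn)(C30State *cpu, Pipeline *pipe);

/* State of the pipeline, including data handed from stage to stage. */
struct Pipeline {
    unsigned  mem_access[N_MEM_BLOCKS]; /* per-block access counters */
    int       interlock;                /* nonzero while the pipeline is stalled */
    uint32_t  instr_word;               /* from the fetch stage */
    uint32_t  operand[2];               /* from the read stage */
    uint32_t *operand_ptr[2];           /* from the decode stage */
    uint32_t *dest_ptr;                 /* from the decode stage */
    OpFn      op;                       /* opcode routine, from the decode stage */
};

/* Return the pipeline to an empty, non-interlocked state. */
void pipeline_clear(Pipeline *pipe)
{
    static const Pipeline empty = {{0}};
    assert(pipe != NULL);
    *pipe = empty;
}
```

Keeping the architectural state separate from the pipeline state means a
reset of the pipeline does not disturb register contents.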

Then I turned my attention to the design of a routine that coordinated the
stages of the pipeline.  In the C30, the stages occur simultaneously.
However, in C, the stages must occur sequentially.  So, first I had to
decide the order in which the stages were called.  The obvious choice was
to call the stages by priority.  The highest priority stage, execute, was
called first, followed by the read, decode, and fetch stages.  Then I added
checks to detect pipeline conflicts.  Before each stage, I called a routine
that determined where data was being read from or written to and updated
the appropriate counter.  The stage's routine was called only if no memory
block had been accessed more than twice and no address-forming register had
been accessed more than once [3].  If either limit had been exceeded, the
interlock flag was set and the remaining stages were skipped.  At the
beginning of every cycle, the cycle count was incremented and the interlock
flag and access counters were cleared.
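
A minimal sketch of such a cycle routine is given here, under simplifying
assumptions: each stage touches at most one memory block and one
address-forming register per cycle, and the stage routines themselves are
replaced by a counter.  The names Sim, sim_init, claim_resources, and
run_cycle are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stages listed in priority order, highest first. */
enum { STAGE_EXECUTE, STAGE_READ, STAGE_DECODE, STAGE_FETCH, N_STAGES };
enum { N_BLOCKS = 2, N_ADDR_REGS = 8 };            /* assumed sizes */
enum { MAX_BLOCK_ACCESSES = 2, MAX_REG_ACCESSES = 1 };

typedef struct {
    unsigned long cycle;
    int      interlock;
    unsigned block_access[N_BLOCKS];
    unsigned reg_access[N_ADDR_REGS];
    int      stage_block[N_STAGES];  /* block each stage wants, -1 = none */
    int      stage_reg[N_STAGES];    /* register each stage wants, -1 = none */
    unsigned stage_runs[N_STAGES];   /* stand-in for the real stage routines */
} Sim;

void sim_init(Sim *s)
{
    int i;
    assert(s != NULL);
    memset(s, 0, sizeof *s);
    for (i = 0; i < N_STAGES; i++) {
        s->stage_block[i] = -1;
        s->stage_reg[i] = -1;
    }
}

/* Count this stage's accesses; return 0 if a limit is now exceeded. */
static int claim_resources(Sim *s, int stage)
{
    int b = s->stage_block[stage];
    int r = s->stage_reg[stage];
    if (b >= 0 && ++s->block_access[b] > MAX_BLOCK_ACCESSES)
        return 0;
    if (r >= 0 && ++s->reg_access[r] > MAX_REG_ACCESSES)
        return 0;
    return 1;
}

void run_cycle(Sim *s)
{
    int stage;
    assert(s != NULL);
    s->cycle++;
    s->interlock = 0;
    memset(s->block_access, 0, sizeof s->block_access);
    memset(s->reg_access, 0, sizeof s->reg_access);
    for (stage = STAGE_EXECUTE; stage < N_STAGES; stage++) {
        if (!claim_resources(s, stage)) {
            s->interlock = 1;        /* skip this and all lower priority stages */
            break;
        }
        s->stage_runs[stage]++;      /* the real stage routine would run here */
    }
}
```

Because the stages are visited from highest to lowest priority, the first
stage to exceed a limit and every stage after it in the loop are the ones
that stall, which matches the interlocking behavior described earlier.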

The next step in the project was to design the four stages of the pipeline.
The simplest routine was the operand read stage, which simply retrieved
values from pointers and stored them in the pipeline.  This stage also
passed along opcode and destination pointers to the execution stage.  The
execution stage consisted of hundreds of routines that carried out the
actions of the instructions.  Every opcode had a corresponding routine that
performed its action.  The status register was updated in these routines.
The fetch routine retrieved the instruction word to which the program counter
pointed.  Then the program counter was updated according to the state of the
simulator.  If an appropriate flag was set by the execution stage, a new
value was stored in the program counter.  This was the result of a branch
instruction.  If the simulator was in repeat mode, according to the value
of a bit in the status register, and the program counter pointed to the end
of the loop, the program counter was loaded with the address at the beginning
of the loop.  If none of these conditions existed, the program counter was
simply incremented.
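
The program counter rules above can be sketched as one function.  This is a
simplified illustration, not the simulator's code: the Cpu type, its field
names, and the ST_RM mask value are assumptions, and repeat-counter
bookkeeping is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define ST_RM 0x100u   /* assumed position of the repeat mode bit in ST */

typedef struct {
    uint32_t pc;             /* program counter */
    uint32_t st;             /* status register */
    uint32_t rs, re;         /* repeat start and end addresses */
    int      branch_taken;   /* set by the execution stage on a branch */
    uint32_t branch_target;  /* new program counter value for that branch */
} Cpu;

/* Fetch the word at the program counter, then update the program counter. */
uint32_t fetch(Cpu *cpu, const uint32_t *program, uint32_t program_len)
{
    uint32_t word;
    assert(cpu->pc < program_len);
    word = program[cpu->pc];

    if (cpu->branch_taken) {
        /* A branch was executed: load the new address. */
        cpu->pc = cpu->branch_target;
        cpu->branch_taken = 0;
    } else if ((cpu->st & ST_RM) && cpu->pc == cpu->re) {
        /* Repeat mode and end of loop: go back to the start of the loop. */
        cpu->pc = cpu->rs;
    } else {
        cpu->pc++;
    }
    return word;
}
```

The branch check comes first so that a branch out of a repeated block takes
precedence over the loop-back; that ordering follows the description above.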

The decode stage was not entirely my own design.  Texas Instruments gave me
permission to use the code from a C30 disassembler, so I took the existing
code and added statements that stored information in my pipeline data
structure.  The disassembler was called with the instruction
word as a parameter.  Then the disassembler stored a pointer to the
appropriate execution stage routine, formed operand and destination
addresses, and stored pointers to these addresses in the pipeline structure.
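
The hand-off from decode to read to execute through stored pointers can be
illustrated with a toy example.  The instruction encoding below (opcode in
bits 31-24, destination register in bits 23-16, source register in bits
15-8) is invented for the sketch and is not the C30 encoding; the three
opcodes and all names are likewise hypothetical, and the Texas Instruments
disassembler code is not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Pipe Pipe;
typedef void (*ExecFn)(Pipe *p);

struct Pipe {
    ExecFn   op;        /* execution routine chosen by decode */
    int32_t *src;       /* pointer to the source operand */
    int32_t *dst;       /* pointer to the destination */
    int32_t  operand;   /* value latched by the read stage */
};

/* One routine per opcode. */
static void op_load(Pipe *p) { *p->dst  = p->operand; }
static void op_add(Pipe *p)  { *p->dst += p->operand; }
static void op_sub(Pipe *p)  { *p->dst -= p->operand; }

static const ExecFn op_table[] = { op_load, op_add, op_sub };

/* Decode: pick the routine and form operand and destination pointers. */
int decode(Pipe *p, uint32_t word, int32_t regs[8])
{
    uint32_t opcode = word >> 24;
    uint32_t dst = (word >> 16) & 7u;
    uint32_t src = (word >> 8) & 7u;
    assert(p != NULL && regs != NULL);
    if (opcode >= sizeof op_table / sizeof op_table[0])
        return -1;                     /* unknown opcode */
    p->op  = op_table[opcode];
    p->dst = &regs[dst];
    p->src = &regs[src];
    return 0;
}

/* Read: retrieve the value behind the operand pointer. */
void read_operands(Pipe *p) { p->operand = *p->src; }

/* Execute: call the routine that decode selected. */
void execute(Pipe *p) { p->op(p); }
```

With function pointers stored at decode time, the execution stage needs no
switch statement of its own; it simply calls whatever routine it was given.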

Many of these routines required a pointer into the simulator's own memory,
but their inputs were addresses in the C30 memory map.  So I wrote a routine
that takes a C30 address as input and returns a pointer to the corresponding
location in simulated memory.
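
One way such a translation routine could look is a table of address
regions, each backed by a host array.  The sketch below is an assumption
about the approach, not the simulator's code.  The two regions are meant to
resemble the on-chip RAM blocks, but their base addresses and sizes should
be treated as placeholders and checked against the User's Guide [3]; the
names Region and c30_ptr are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t  base;    /* first C30 address in the region */
    uint32_t  size;    /* region length in 32-bit words */
    uint32_t *store;   /* host array backing the region */
} Region;

static uint32_t ram0[0x400];
static uint32_t ram1[0x400];

/* Placeholder memory map: two 1K-word blocks. */
static const Region regions[] = {
    { 0x809800u, 0x400u, ram0 },
    { 0x809C00u, 0x400u, ram1 },
};

/* Map a C30 address to a host pointer, or NULL if nothing is mapped there. */
uint32_t *c30_ptr(uint32_t addr)
{
    size_t i;
    for (i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        /* Unsigned subtraction wraps, so one comparison checks both bounds. */
        if (addr - regions[i].base < regions[i].size)
            return &regions[i].store[addr - regions[i].base];
    }
    return NULL;
}
```

Returning NULL for unmapped addresses lets a caller report a bad access
instead of silently reading or writing host memory outside the arrays.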

The final step in the design was the command-line interface.  This was the
program's main routine.  It read commands followed by parameters from the
keyboard.  The commands included loading a C30 program, resetting the
simulator, restarting the current C30 program, running or stepping the
current program, adding or removing breakpoints, and reading or modifying
registers in the simulated C30.
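
A command parser for an interface of this kind might be sketched as
follows.  The command words (load, reset, restart, run, step, break,
unbreak, reg) and the function parse_command are invented for illustration;
the report does not give the simulator's actual command syntax.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum {
    CMD_LOAD, CMD_RESET, CMD_RESTART, CMD_RUN, CMD_STEP,
    CMD_BREAK, CMD_UNBREAK, CMD_REG, CMD_UNKNOWN
} Command;

static const struct { const char *name; Command cmd; } cmd_table[] = {
    { "load",    CMD_LOAD    },  /* load a C30 program */
    { "reset",   CMD_RESET   },  /* reset the simulator */
    { "restart", CMD_RESTART },  /* restart the current program */
    { "run",     CMD_RUN     },
    { "step",    CMD_STEP    },
    { "break",   CMD_BREAK   },  /* add a breakpoint */
    { "unbreak", CMD_UNBREAK },  /* remove a breakpoint */
    { "reg",     CMD_REG     },  /* read or modify a register */
};

/* Split a line into a command and its parameter text (at most 63 chars). */
Command parse_command(const char *line, char arg[64])
{
    char name[16] = "";
    size_t i;
    assert(line != NULL && arg != NULL);
    arg[0] = '\0';
    /* The return value is ignored: unmatched fields keep their defaults. */
    sscanf(line, "%15s %63[^\n]", name, arg);
    for (i = 0; i < sizeof cmd_table / sizeof cmd_table[0]; i++) {
        if (strcmp(name, cmd_table[i].name) == 0)
            return cmd_table[i].cmd;
    }
    return CMD_UNKNOWN;
}
```

A main routine would read a line with fgets, pass it to parse_command, and
dispatch on the result, leaving each command to interpret its own
parameters.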



TESTING

I compiled and debugged my software in Borland C++ 4.51 on my personal
computer.  I tested each component of the simulator using a test main
program, which set up the state and pipeline data structures, then called
the routine being tested.  Naturally, as with any large program, there
were several bugs in almost every routine.  So the standard course of
action was to trace the program, step by step, until the bug was found.

The original proposed testing method was to run the same program on the
simulator and a C30 evaluation board located in a computer on the second
floor of the Engineering Science Building.  However, since 100% compatibility
was not feasible in the time given, I instead tested the simulator by
running C30 code sequences and comparing the results with the expected
results given in the TMS320C3x User's Guide [3].  I tested repeat mode
operation, address forming, pipeline conflict detection, and most of the
instructions.


RESULTS

Most of the tests failed at first, due to minor bugs in my code.  However,
after debugging, the simulator passed all tests.  It accurately formed
addresses in all addressing modes, correctly performed single-instruction
and block repeats, detected all pipeline conflicts in the test sequences
and interlocked accordingly, and carried out the tested instructions.



CONCLUSION

My simulator was a success.  Although I did not create a finished product,
I did design an accurate engine for the C30 simulator.  With some work on
the instruction set, a fully compatible simulator can be finished in
the future.  Another future step for the simulator is the design of
a graphical user interface.  Then, C30 programmers will have an extremely
useful and free tool to design and optimize their software.  The simulator
will also be used next year in the new undergraduate digital signal
processing class, taught by Dr. Brian Evans.


REFERENCES

[1] Personal interview with Prof. Brian Evans, ECE department, The
    University of Texas at Austin, September 6, 1996.

[2] P. Lapsley, J. Bier, A. Shoham, and E. A. Lee, DSP Processor
    Fundamentals: Architecture and Features.  Fremont, CA: Berkeley
    Design Technology, Inc., 1996, ch. 9, pp. 103-113.

[3] TMS320C3x User's Guide, Texas Instruments, 1994.
