Recent advances in device technology have made it possible to build multi-core processors that exploit parallel and concurrent processing, and commercial processors that support simultaneous multithreading (SMT) are now in wide use. Most of these processors aim to exploit instruction-level parallelism (ILP). However, the limits of ILP exploitation are being reached. Moreover, the more aggressively a processor exploits ILP, the more complicated its circuitry becomes.
We take a simpler approach that focuses only on thread-level parallelism (TLP), and we are developing a TLP-oriented processor named Fuce. The Fuce processor is designed around the continuation-based multithread execution model, which is an advanced version of the conventional dataflow computing model. A continuance is defined as the continuation of computation, together with the transfer of data, between threads. Every thread is executed as a non-preemptive sequence of instructions, and a thread becomes ready to execute once all continuances from its preceding threads have been notified. Continuation-based multithread execution unifies external event processing and internal computation as thread execution.
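The activation rule described above can be sketched in software. The following is a minimal illustration only, not the Fuce programming model itself: the names `Thread`, `Scheduler`, `notify`, and `demo` are hypothetical, and the fan-in counter is an assumed way of tracking how many continuances a thread still awaits.

```python
from collections import deque


class Thread:
    """A non-preemptive block of work that fires once all continuances arrive."""

    def __init__(self, name, fan_in, body):
        self.name = name
        self.pending = fan_in  # continuances still awaited (assumed counter)
        self.inputs = {}       # data delivered along with continuances
        self.body = body       # called as body(inputs, scheduler)


class Scheduler:
    """Hypothetical software stand-in for thread activation control."""

    def __init__(self):
        self.ready = deque()
        self.trace = []

    def notify(self, thread, key=None, value=None):
        """Deliver one continuance, optionally carrying data, to `thread`."""
        if key is not None:
            thread.inputs[key] = value
        thread.pending -= 1
        if thread.pending == 0:
            # All preceding threads have signalled: the thread is ready.
            self.ready.append(thread)

    def run(self):
        while self.ready:
            thread = self.ready.popleft()
            self.trace.append(thread.name)
            thread.body(thread.inputs, self)  # runs to completion, never preempted


def demo():
    sched = Scheduler()
    result = {}
    join = Thread("sum", 2, lambda inp, s: result.update(total=inp["a"] + inp["b"]))
    left = Thread("left", 1, lambda inp, s: s.notify(join, "a", 3))
    right = Thread("right", 1, lambda inp, s: s.notify(join, "b", 4))
    # External events are modeled the same way as internal continuances.
    sched.notify(left)
    sched.notify(right)
    sched.run()
    return sched.trace, result["total"]
```

In this sketch the `sum` thread cannot start until both `left` and `right` have notified it, which mirrors the rule that a thread becomes ready only when all of its continuances have arrived.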
The Fuce processor is a chip multiprocessor equipped with a Thread Activation Controller (TAC) and multiple Thread Execution Units (TEUs). The TAC is a hardware implementation of the control for continuation-based multithread execution. Because the TAC drastically reduces the overhead of multithreaded execution, the Fuce processor can offer a more practical cost/performance balance as a TLP processor. The TEUs execute multiple threads concurrently. Each TEU consists of an execution unit paired with a pre-load unit. The execution unit is a simple RISC core that executes register-to-register instructions. The pre-load unit executes only the load instructions that set up a thread's context. Each TEU has two register files: one is used by the execution unit for the current thread, and the other is used by the pre-load unit to set up the context for the next thread. The execution unit and the pre-load unit run in parallel. Thus, with the TAC and multiple TEUs, the Fuce processor exploits TLP.
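The benefit of pairing an execution unit with a pre-load unit can be illustrated with a simple timing model. This is a sketch under stated assumptions rather than a description of the actual hardware: it considers a single TEU, a fixed thread order, and hypothetical per-thread costs given as `(load_cycles, execute_cycles)` pairs; the function names are illustrative.

```python
def serial_time(threads):
    """Cycles when each thread's context is loaded, then the thread executes."""
    return sum(load + execute for load, execute in threads)


def overlapped_time(threads):
    """Cycles when the next thread's context is pre-loaded into the second
    register file while the current thread executes (assumed model)."""
    if not threads:
        return 0
    total = threads[0][0]  # the first context cannot be hidden behind execution
    for i, (_, execute) in enumerate(threads):
        next_load = threads[i + 1][0] if i + 1 < len(threads) else 0
        # Execution and pre-load run in parallel; the slower one sets the pace.
        total += max(execute, next_load)
    return total


# Hypothetical workload: (load_cycles, execute_cycles) per thread.
workload = [(2, 5), (3, 4), (6, 1)]
```

For this made-up workload, `serial_time(workload)` gives 21 cycles and `overlapped_time(workload)` gives 14, because every context load except the first overlaps with the execution of the preceding thread.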
To verify the behavior and evaluate the performance of the Fuce processor, we have developed a Fuce processor emulator using multiple FPGA chips. The FPGA implementation of the Fuce processor is written in VHDL. We have also developed software simulators for gate-level and event-level simulation, using VHDL and Java respectively.
In this presentation, we introduce the core concepts of the Fuce architecture, the programming model for Fuce, and an overview of the Fuce processor design. We then present an evaluation of the Fuce processor in terms of both parallel execution performance and hardware cost.