For many applications, allocating performance among all of the tasks in a system-on-chip (SoC) design is much easier, and provides greater design flexibility, with multiple CPUs than with just one control processor and multiple blocks of logic. Multiple-processor design changes the role of processors, making it possible to design programmability into many functions while keeping power budgets under control.
The biggest advantage of using multiple processors as SoC task blocks is that they're programmable, so changes can be made in software after the chip design is finished. This means that complex state machines can be implemented in firmware running on the processor, significantly reducing verification time. And one SoC can often be used for multiple products, turning features on and off as necessary.
Multiple-processor design promotes much more efficient use of memory blocks. A multiple-processor-based approach makes most of the memories processor-visible, processor-controlled, processor-managed, processor-tested and processor-initialized. Additionally, this reduces overall memory requirements while promoting the flexible sharing and reuse of on-chip memories.
But how do you pick the right embedded processors for multiple-CPU designs? How do you partition your design to take maximum advantage of multiple processors? How do you manage the software among all the processors? How do you connect them and manage communications in the hardware?
Four techniques
At the conceptual level, the entire system can be treated as a constellation of concurrent, interacting subsystems or tasks. Each task communicates with other subsystems and shares common resources (memory, data structures, network points). Developers start from a set of tasks for the system and exploit the parallelism by applying a spectrum of techniques, including four basic actions:
- Allocate (mostly) independent tasks to different processors, with communications among tasks expressed via shared memory and messages.
- Speed up each individual task by optimizing the processor on which it runs using a configurable processor.
- For particularly performance-critical tasks, decompose the task into a set of parallel tasks running on a set of optimized, intercommunicating processors.
- Combine multiple low-bandwidth tasks on one processor by time-slicing. This approach degrades parallelism, but may improve SoC cost and efficiency if the processor has enough available computation cycles.