Designing inter-block connectivity

This post presents the different aspects of designing an interface between blocks in a device. The main purpose is to outline the factors that affect this type of design and to offer a solution that addresses most of the concerns.

The issue of interfacing two blocks that run on the same clock on a chip seems trivial at first glance. You have two blocks of the same frequency, both receiving their clock from the same clock source, so all timing paths are simple and solvable, right?
Unfortunately, this is wrong. If you allow uncontrolled interfaces between blocks, you will likely get into trouble when trying to close timing at the chip top level. In the example below, two blocks interface through a path that includes some logic on each side.
[Figure: inter-block sampling example]
The natural solution for this case is budgeting: each block gets constraints that split the cycle time between them in a way that ensures top-level integration is clean. The problems with this approach come from several aspects, such as top-level routing distance, placement of flops inside each block, and the iterations required when the budget in one of the blocks cannot be met. Assuming you have 5-15 blocks in a device and thousands of interconnecting signals, the budgeting effort is huge and seldom brings the chip top level to timing closure.
To make things worse, the methods employed by placement tools rely heavily on timing, so once the cells have been placed according to a specific budget, there is no simple way to shorten this budget without changes to block placement and, inevitably, more work.
So what can we do to keep those interfaces under control?
The answer is to sample: register all interfaces at both the block output and the block input. This lets top-level routing occupy a full cycle, allows simple constraints in both blocks without sophisticated budgeting, and prevents iterations between the blocks and the top level at full-chip timing closure.
[Figure: inter-block sampling, good example]
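To put rough numbers on the difference, here is a small Python sketch of the setup-slack arithmetic for the two cases. Every delay value is a made-up illustration (a 2000 ps clock, 600 ps of top-level routing and so on), not data from a real design, and the `Path` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One register-to-register timing path; all delays in picoseconds."""
    clk_to_q: int    # launch flop clock-to-output delay
    src_logic: int   # combinational logic inside the source block
    route: int       # top-level routing between the blocks
    dst_logic: int   # combinational logic inside the destination block
    setup: int       # capture flop setup time

    def slack(self, period: int) -> int:
        """Setup slack: what is left of the cycle after all delays."""
        used = (self.clk_to_q + self.src_logic + self.route
                + self.dst_logic + self.setup)
        return period - used


PERIOD = 2000  # assumed 500 MHz clock, in ps

# Uncontrolled interface: logic on both sides shares the cycle with routing.
unsampled = Path(clk_to_q=100, src_logic=700, route=600, dst_logic=800, setup=50)

# Sampled interface: the top-level path is flop -> route -> flop only.
sampled = Path(clk_to_q=100, src_logic=0, route=600, dst_logic=0, setup=50)

unsampled_slack = unsampled.slack(PERIOD)  # -250 ps: timing violation
sampled_slack = sampled.slack(PERIOD)      # 1250 ps: routing has room to grow
```

With these assumed numbers the unsampled path misses timing by 250 ps, while the sampled top-level path has 1250 ps to spare, and neither block needs a budget that depends on the other.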
But does this solve everything? What about latency? What about flow control?
Well, the issue of latency is inherent and the architecture should be robust enough to function at those latencies. The main issue is flow control, since stopping a pipeline which has independent sampling points at both block boundaries would require signals to freeze the pipe simultaneously at both blocks. Those signals will naturally become timing critical and cannot be sampled, so we go back to the same problem.
The natural way to solve a pipe stall issue without freezing the pipe is to use a FIFO. A FIFO can absorb the additional latency of the proposed sampled interface and gives both sides a simple interface. The source side logic sees a FIFO load interface with a full indication, so it can write to the interface FIFO whenever there is space in it. The destination side sees a FIFO with an empty indication, so it can read whenever the FIFO is not empty, or stall the pipe by not reading. A stall on the destination side eventually results in a FIFO full indication on the source side, stalling the upstream logic.
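The behavior described above can be sketched as a cycle-based Python model. This is a behavioral illustration only, not RTL and not the RTLery implementation; the class name `InterfaceFifo`, the `stages` parameter (registers crossed in each direction) and the `transfer` helper are assumptions made for this example. The source keeps its own fill counter, which also counts words still in flight, so `full` never depends on an unsampled signal from the other block.

```python
from collections import deque


class InterfaceFifo:
    """Behavioral model of a FIFO split across a sampled block boundary."""

    def __init__(self, depth, stages=1):
        self.depth = depth
        self.src_count = 0                  # source view: stored + in-flight words
        self.storage = deque()              # destination-side storage
        self.fwd = deque([None] * stages)   # data registers toward the destination
        self.ret = deque([False] * stages)  # read notifications back to the source

    @property
    def full(self):    # seen by the source block
        return self.src_count >= self.depth

    @property
    def empty(self):   # seen by the destination block
        return not self.storage

    def tick(self, write=None, read=False):
        """Advance one clock. `write` is a data word (not None) or None.
        Returns the word read by the destination, or None."""
        if write is not None and self.full:
            raise RuntimeError("write while full")
        popped = self.storage.popleft() if (read and self.storage) else None
        # Shift both register pipelines by one stage.
        self.fwd.append(write)
        arrived = self.fwd.popleft()
        self.ret.append(popped is not None)
        freed = self.ret.popleft()
        if arrived is not None:
            self.storage.append(arrived)
        # Update the source-side fill counter.
        self.src_count += (write is not None) - freed
        return popped


def transfer(fifo, items, dest_stalled):
    """Send `items` through `fifo`; dest_stalled(cycle) says when the
    destination refuses to read. Returns the words in arrival order."""
    pending, received, cycle = deque(items), [], 0
    while len(received) < len(items):
        write = pending.popleft() if pending and not fifo.full else None
        out = fifo.tick(write=write, read=not dest_stalled(cycle))
        if out is not None:
            received.append(out)
        cycle += 1
    return received
```

A short usage example: `transfer(InterfaceFifo(depth=4, stages=2), list(range(20)), lambda c: 5 <= c < 15)` returns the 20 words in order even though the destination stalls for ten cycles. The stall reaches the source only through its own `full` flag, so no freeze signal has to cross the top level in a single cycle.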

Using a generic interface FIFO on all inter-block interfaces of a device can also help the verification effort. A unified BFM can be used for all those interfaces, simplifying the construction of the verification environment and streamlining block-level testing.
A FIFO implementation requires pointers and fill level counters on both sides, so that each side can handle its part of the functionality independently. The diagram below demonstrates a design of such an interface FIFO.
[Figure: interface FIFO block diagram]
To design such a FIFO you also need to consider issues like behavior at reset, FIFO depth calculation and more. More details are available in the RTLery library's interconnect interface FIFO component, where you can find a detailed design and Verilog code of a robust interface FIFO.
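As a rough illustration of the depth question, the following Python sketch counts how many words get through a simplified model of a sampled interface for a given depth, and searches for the smallest depth that does not throttle the source. The model is an assumption made for this example (the source writes whenever it sees not-full, the destination reads whenever it sees not-empty, and each direction crosses `stages` registers); the function names are hypothetical and the result depends on where the registers sit in a real design.

```python
from collections import deque


def delivered(depth, stages, cycles=200):
    """Words read by the destination in `cycles` clocks."""
    src_count = 0                       # source-side fill counter
    stored = 0                          # words in destination-side storage
    fwd = deque([False] * stages)       # write strobes crossing the top level
    ret = deque([False] * stages)       # read notifications returning
    total = 0
    for _ in range(cycles):
        write = src_count < depth       # source writes when not full
        read = stored > 0               # destination reads when not empty
        fwd.append(write)
        ret.append(read)
        arrived = fwd.popleft()
        freed = ret.popleft()
        stored += int(arrived) - int(read)
        src_count += int(write) - int(freed)
        total += int(read)
    return total


def min_full_rate_depth(stages, cycles=200):
    """Smallest depth that delivers as much as a very deep FIFO would."""
    best = delivered(depth=cycles, stages=stages, cycles=cycles)
    depth = 1
    while delivered(depth, stages, cycles) < best:
        depth += 1
    return depth
```

In this model the minimum comes out as 4 for one stage and 6 for two, which matches the round trip of a write strobe and its returning read notification. A shallower FIFO still transfers data correctly but inserts bubbles, so the depth has to grow with the sampling latency added on the interface.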
Amnon Parnass