Deficit round robin multi-fifo queuing

 
 
Verified

This design is a queuing system based on FIFOs and deficit round robin arbitration (DRR). The typical usage of this queuing structure is to store data from multiple sources, select one of them according to the deficit fair arbitration method and drive it to a shared resource such as a communication link or processing unit. This component contains the verified RTL Verilog code of the FIFO structure and the deficit work conserving round robin arbiter. A deficit based arbitration serves to balance the processing time of data elements from the different queues so all queues eventually get equal processing time on the shared resource, regardless of the processing time of the individual data elements in the queues.

Block diagram

FIFO queuing block diagram

 

Features

·         Deficit Work conserving Round robin arbitration based on parallel prefix computation (PPC), Enabling fair arbitration according to the processing time of the data element at the head of each queue.

·         Use of datasize information from each queue and a common “quantum value”

·         Parameterized number of FIFOs

·         Generic register based FIFOs

·         Valid indication for output validity, usable for pipeline integration.

·         Parameterized FIFO width and depth sizes.

·         Acknowledge signal for flow control, enables downstream logic to stall the arbitration.

Optional Features

The optional features are available using different downloadable files.

·         Synchronous and asynchronous reset

Deliverables

·         verified RTL code

·         Access to detailed design documentation

·         1 year support and customization service.

Parameters table

Parameter

Valid values

description

FIFOCNT(NOQ)

>7

Number of FIFOs in the queuing system

WIDTH

 

FIFO width

DEPTH

any

FIFO depth

DATASIZE _S

any

the width of the datasize vectors for the arbitration

NOQ –denotes the number of queues.

Interface table

Signal name

Direction/width

description

clk

Input

Clock signal for main domain A, free running clock, needs to toggle for proper functionality

Datain[NOQ – 1:0]

Input [WIDTH -1:0]

input data bus for each of the FIFOs

resetb

Input

Active low reset signal, when asserted, mask would be set 0.

ack

input

Indication that the previous grant has been consumed by the downstream logic and the arbiter may progress to the next requestor.

valid

Output

 

Indication that the current output is valid and is a result of a valid non-empty FIFO.

fifoload

Input [NOQ – 1:0]

Request input vector, width of the vector is the number of queues.

Fifofull

Output [NOQ – 1:0]

Fifo full indication for upstream flow control

datasize[NOQ – 1:0]

Input [DATASIZE_S – 1:0]

The number of individual datasize vectors is equal to the number of fifoload signal. The DATASIZE_S parameter is the width of the datasize input vectors. The datasize vector is a concatenation of dataasize vectors per queue.

Dataout

Output [WIDTH -1:0]

Output data bus, according to the arbiter selection.

Functionality

The queuing system contains multiple FIFOs and an arbitration module enabling deficit work conserving round robin arbitration (DRR). The arbiter balances the overall processing time assigned to each requestor by looking at information from the selected element during arbitration. The typical case for using a deficit work conserving round robin arbiter is for fairly selecting data packets to be sent on a communication link. Each FIFO will get his share of the downstream processing utilization independent it’s number of transactions and their sizes. Empty FIFO will inevitably create discrepancies to this fairness rule. Deficit fairness example from simulation:

queue

Number of transactions

Accumulated datasize

1

105

14,503

2

315

15,326

3

420

14,113

Several flows of data from different sources can share a common resource according to a fair distribution, offsetting datasize or processing time from causing some of the queue to manipulate the downstream processing of communication links.

generic FIFO

A “generic FIFO” is a reference name to the simple type of synchronous FIFOs, where the memory array is based of flip-flops and the pointers and status (full, empty, almost full etc.) are generated using counters or pointers logic.

The load and extract operations of the FIFO impact the FIFO pointers.

When the load signal is asserted, the data on the FIFO datain is loaded into the position pointed by the write pointer and the pointer is then incremented by 1 to point to the next FIFO entry.

If the FIFO full indication is asserted, and the extract signal is not asserted, the load operation would be ignored and the pointer would not be incremented. More details about the generic FIFO can be found here.

Valid signal

The purpose of the valid signal is to enable downstream logic to identify the case where no FIFO contains data, so pipeline integration can be achieved.

Acknowledge signal function

The ack signal serves for stalling the arbiter in the case where downstream logic is unable to process the grant. This function enables the integration of the arbiter in flow controlled pipeline implementation the arbiter would stall on its latest granted FIFO while ack is de-asserted and continue once it is asserted again.

Round Robin Arbitration

Round robin arbitration is a scheduling scheme which gives to each requestor its share of using a common resource for a limited time or data elements. The basic algorithm is that once a requestor has been serves he would “go around” to the end of the line and be the last to be served again.

The simplest form of round robin arbiter is based on assignment of a fixed time slot per requestor; this can be implemented using a circular counter. Alternatively, the arbiter can be defined to allow a specific number X of data elements per each requestor, in which case X data elements from each requestor would be processed before moving to the next one.

Round robin arbiters are typically used for arbitration for shared resources, load balancing, queuing systems, and resource allocation. Any application requiring a minimal fairness where none of the requestors is suffering from starvation is a valid application for a round robin arbiter.

Work conserving round robin

The term “work conserving” refers to the ability of a round robin arbiter to skip requestors that do not have any data to process and move directly to the next requestor who has valid data to process. This method of arbitration is more efficient as it does not “waste” time on idle requestors.

The work conserving round robin (WCRRB) arbiter is therefore required to constantly look ahead into the next available requestors and enable the next valid requestor to be granted immediately after the current requestor is complete.

The function of looking constantly into all requests in order to calculate the next valid grant, may be costly in logic, and as the number of requestors get bigger, faster computation methods must be applied in order to meet timing and limit the size of the logic.

Deficit PPC work conserving round robin arbiter

The deficit round robin arbiter (DWCRRB) is a variation of the round robin where the attention is focused on the length of processing the currently selected element requires of the downstream logic. The arbiter balances the overall processing time assigned to each requestor by looking at the downstream processing time during arbitration.

The typical case for using a deficit work conserving round robin arbiter is for arbitrating data packets to be sent on a communication link. In this case, the arbitration would prevent requestors with large packets from dominating the link and give a fair share of the link bandwidth to requestors with shorter packet sizes.

The Deficit round robin arbiter is based on a deficit counter and a constant value called quantum. The requestor will be served only if the size of its current data element is lower than the deficit counter and the size will be reduced from the counter. If the size is higher, it will wait for the next arbitration round and its deficit counter will be increased by the quantum value. This type of arbitration is called “deficit” as the remainder of the deficit counter after reducing the size is kept for next round, helping smaller size requestors get fair priority.

In the context of arbitration between packet queues, the DWCRRB serves packets at the head of every non-empty queue whose deficit counter is larger than the current packet's size. If the deficit counter is lower, then the queue is skipped. At the end of each round the deficit counter of all non-empty queues is increased by the given value called quantum. This increased value is used to calculate the queue’s eligibility for service the next time around when the scheduler examines each queue for serving its current packet. If the queue is served, then the deficit counter is decremented by the size of packet being served. This type of arbitration is fair, in queuing systems with variable packet sizes. More details about the Deficit round robin arbitration can be found here.

Arbiter Deficit logic

The arbiter deficit logic is using deficit counters to determine the eligibility of a requestor to be granted. For a requestor to be granted, its deficit counter has to be higher than its current request_datasize. The deficit counter is update in the following conditions:

  1. When a request is granted and ack is asserted, the deficit counter is decremented by the request_datasize
  2. When the request signal changes from 0 to 1, the deficit counter is incremented by quantum to allow it to participate in the current round if possible.
  3. When all requests are blocked, all deficit counters of valid requests are incremented by quantum so that there are no unnecessary idle rounds. This also serves when the arbiter starts and all deficit counters are zero, to have the deficit counter reach the level of the first requestor’s datasize.
  4. At the end of each round, all active requestors will have their deficit counters incremented.
  5. When the request signal changes from 1 to 0, the deficit counter is cleared.

The arbiter performs round robin arbitration between all eligible requestors in each round. If a request rises, it will be considered as the eligibility is constantly calculated and the rising request will have its deficit counter incremented by quantum.

When a request is de-asserted, the corresponding deficit counter will be cleared so the priority on toggling request will not be affected by remainders from past transactions.

At each round of the arbiter, all active requestors have their deficit counters incremented, so it can be evaluated again for eligibility to be granted. In case a requestor has several transactions which accumulated datasize does not exceed the deficit counter, they will be granted one after the other and the deficit counter will be decremented, till the datasize of the next transaction is higher than the deficit counter.

Reset

The control logic for the FIFO and arbitration is logic is either asynchronous or synchronous depending on the reset policy of the instantiating device.

 

Test Items

the below table summarizes the test items that were checked  for the different decoders

Item name

Description

Check resulting arbiter distribution

Run many requests where all requestors have constant valid request asserted. See that the accumulated datasize of all requestors independent of datasize distribution. For that purpose, some requestors use low datasize while others use high datasize.

Test with few requestors

Check that with 1,2,3.. requestors only, check fairness is kept.

Arbiter request down/up

Check the case where some FIFOs are almost empty so arbiter requests are de-asserted and re-asserted. See that the deficit counters are updated accordingly.

Fifo full functionality

Check FIFO full; see that the FIFO load when full is dropped.

Quantum

Check with different quantum sizes.

Ack

Check stalling the arbitration through ack and constantly asserted ack signal.

 

 

 

Assumptions

the following assumptions and rules should be followed, those can be used as error indications:

  1. The Arbiter deficit counter should not be incremented to its saturation level where the highest bit is set.