Ppc based deficit work conserving round robin arbiter

 
 
Integrated

The deficit round robin arbiter (DWCRRB or DRR) is a variation of the round robin arbitration where the arbitration decision is focused on the processing length of the currently selected element when using the downstream shared resource. The arbiter balances the overall processing time assigned to each requestor by looking at processing length information from the selected element during arbitration. Round robin arbitration is a scheduling scheme which gives to each requestor its share of using a common resource for a limited time or data elements. The basic algorithm indicates that once a requestor has been serves he would “go around” to the end of the line and be the last to be served again. The typical case for using a deficit work conserving round robin arbiter is for fairly selecting data packets to be sent on a communication link. This component contains the verified RTL Verilog code of a deficit work conserving round robin arbiter.

Block diagram

The below basic block diagram shows the deficit PPC based work conserving round robin arbiter.

deficit PPC based work conserving round robin arbiter.

Features

  • Work conserving round robin with deficit arbitration methods.
  • Datasize input for each of the requestors. Indicating the data size of the next element from the requestor.
  • Constant input quantum value.
  • No time slots wasted on inactive requestors.
  • ack signal, for downstream stall of the arbiter, the grant vector will change only if ack is asserted. For a new grant every cycle; the ack signal can be kept asserted.
  • Fast parallel computation using PPC method.
  • Valid output indicating the arbiter grant is valid, meaning there are valid requests asserted.
  • Wrap around, meaning that the round robin reaches the end of the vector, it will return to the requestors at the beginning without losing cycles.
  • Parameterized output stage, providing either one-hot, binary decoded or thermometer decoded output for usage in different applications.

Optional features

Optional features are available through different downloadable files

  • synchronous and asynchronous reset.

Deliverables

  • verified RTL code
  • Access to detailed design documentation
  • 1 year support and customization service.

parameters table

parameters for typical FIFO

Parameter

Valid values

description

DATASIZE_S

any

the width of the data size and quantum vectors

WIDTH

>7

Number of requestors

OUTPUT_TYPE

0,1,2

0 – ceil (log(WIDTH)) binary encoded index of the     granted requestor.

1 -  WIDTH one-hot encoding of the granted requestor

2 – WIDTH thermometer encoding of the granted requestor, all bits lower than the selected are 0.

Interface table

Signal name

Direction/width

description

clk

Input

Clock signal

resetb

Input

Active low reset signal, when asserted, mask would be set 0.

ack

input

Indication that the previous grant has been consumed by the downstream logic and the arbiter may progress to the next requestor.

valid

Output

 

Indication that the current grant is valid and is a result of a valid requestor.

request

Input [WIDTH – 1:0]

Request input vector

grant

Output [WIDTH – 1:0]

Grant output, either binary decoded or one-hot.

In case of binary decoding the width of the grant would be log2(request WIDTH) rounded up.

quantum

Input [DATASIZE_S -1:0]

The constant value to be added to deficit counter at each round.

Datasize[WIDTH -1:0]

Input [DATASIZE_S -1:0]

Datasize vector per request specifying the current data item size to be evaluated by the artbiter. The size of the vector is a product of the DATASIZE_S and WIDTH parameters.

 

Deficit PPC work conserving round robin arbiter

The deficit round robin arbiter (DWCRRB) is a variation of the round robin where the attention is focused on the length of processing the currently selected element requires of the downstream logic. The arbiter balances the overall processing time assigned to each requestor by looking at the downstream processing time during arbitration.

The typical case for using a deficit work conserving round robin arbiter is for arbitrating data packets to be sent on a communication link. In this case, the arbitration would prevent requestors with large packets from dominating the link and give a fair share of the link bandwidth to requestors with shorter packet sizes.

The Deficit round robin arbiter is based on a deficit counter and a constant value called quantum. The requestor will be served only if the size of its current data element is lower than the deficit counter and the size will be reduced from the counter. If the size is higher, it will wait for the next arbitration round and its deficit counter will be increased by the quantum value. This type of arbitration is called “deficit” as the remainder of the deficit counter after reducing the size is kept for next round, helping smaller size requestors get fair priority.

In the context of arbitration between packet queues, the DWCRRB serves packets at the head of every non-empty queue whose deficit counter is larger than the current packet's size. If the deficit counter is lower, then the queue is skipped. At the end of each round the deficit counter of all non-empty queues is increased by the given value called quantum. This increased value is used to calculate the queue’s eligibility for service the next time around when the scheduler examines each queue for serving its current packet. If the queue is served, then the deficit counter is decremented by the size of packet being served. This type of arbitration is fair, in queuing systems with variable packet sizes.

Round Robin Arbitration

Round robin arbitration is a scheduling scheme which gives to each requestor its share of using a common resource for a limited time or data elements. The basic algorithm is that once a requestor has been serves he would “go around” to the end of the line and be the last to be served again.

The simplest form of round robin arbiter is based on assignment of a fixed time slot per requestor; this can be implemented using a circular counter. Alternatively, the arbiter can be defined to allow a specific number X of data elements per each requestor, in which case X data elements from each requestor would be processed before moving to the next one.

Round robin arbiters are typically used for arbitration for shared resources, load balancing, queuing systems, and resource allocation. Any application requiring a minimal fairness where none of the requestors is suffering from starvation is a valid application for a round robin arbiter.

Work conserving round robin

The term “work conserving” refers to the ability of a round robin arbiter to skip requestors that do not have any data to process and move directly to the next requestor who has valid data to process. This method of arbitration is more efficient as it does not “waste” time on idle requestors.

The work conserving round robin (WCRRB) arbiter is therefore required to constantly look ahead into the next available requestors and enable the next valid requestor to be granted immediately after the current requestor is complete.

The function of looking constantly into all requests in order to calculate the next valid grant, may be costly in logic, and as the number of requestors get bigger, faster computation methods must be applied in order to meet timing and limit the size of the logic.

Parallel prefix computation (PPC)

PPC parallel prefix computation

Parallel prefix computation is a type of parallel calculation serving to reduce the number of stages it takes to calculate a function on multiple inputs by using parallel computations where possible. The example below is for a OR computation resulting in a thermometer decoded vector. 

From the first bit which is set onwards, all bits in the output vector would be set.

The OR parallel prefix computation enables the arbitration logic to find quickly the first request which is asserted, resulting in the ability to find the next valid requestor with the minimal possible logic cones.

The function of an OR PPC is demonstrated in the following table and drawing:

Request vector

PPC output

0001

1111

0010

1110

1100

1100

0100

1100

1000

1000

0011

1111

1001

1111

1110

1110

 

PPC Work conserving round robin arbiter

Using the PPC computation for a work conserving round robin uses a mask to block the currently granted requestor, enabling the next selection of the PPC to indicate the next requestor in line.

To generate the requestor ID from the PPC output vector the vector is binary decoded, resulting with the ID of the granted requestor.

The arbiter also handles the wrap-around of the requesting vector. At some point, one of the highest bits in the requestor’s vector is selected and the next selection would have to be at the lower requestors.  The wrap-around operation is performed without loss of cycles.

The arbiter outputs would also indicate that the current result is valid. This would enable downstream logic to ignore the arbiter output in the situation where no requests are asserted.

There is one cycle delay between the assertion of a request and the grant for it, so the arbiter can be integrated into a block as a single pipe stage.

Valid signal function

The purpose of the Arbiter valid signal is to enable downstream logic to distinguish between the case where no requests are asserted and the case where request 0 is asserted in a binary decoded arbiter. Those two cases would result in a binary decoded output of 0 and the valid bit would indicate that request 0 is granted.

Acknowledge signal function

The ack signal serves for stalling the arbiter in the case where downstream logic is unable to process the grant. This function enables the integration of the arbiter in flow controlled pipeline implementation.

The PPC mask, responsible for masking the current request and causing the arbiter to grant the next requestor, will only change if the ack signal is asserted and the PPC output is valid.

Output type

The arbiter can have three types of output, depending on the requirement of the surrounding logic.

A one-hot output type outputs a vector where only the bit associated with the selected requestor is asserted in the grant vector. The grant vector in this case would have the width of the request vector.

A binary decoded output would have the index of selected requestor, decoded using the number of bits required to decode the highest requestor number. The lowest requestor index is 0. Therefore, the valid bit should be used to determine if the arbiter output is valid.

The Thermometer decoding option would output a thermometer decoder vector. This can be used by downstream logic for further calculation or serve other scheduling schemes using this round robin arbiter.

 

Output Type

Parameter setting

description

Binary decoded

0

ceil (log(WIDTH)) binary encoded index of the granted requestor

One-hot

1

WIDTH one-hot encoding of the granted requestor

Thermometer

2

WIDTH one-hot encoding of the granted requestor

Deficit logic

The arbiter deficit logic is using deficit counters to determine the eligibility of a requestor to be granted. For a requestor to be granted, its deficit counter has to be higher than its current request_datasize. The deficit counter is update in the following conditions:

  1. When a request is granted and ack is asserted, the deficit counter is decremented by the request_datasize
  2. When the request signal changes from 0 to 1, the deficit counter is incremented by quantum to allow it to participate in the current round if possible.
  3. When all requests are blocked, all deficit counters of valid requests are incremented by quantum so that there are no unnecessary idle rounds. This also serves when the arbiter starts and all deficit counters are zero, to have the deficit counter reach the level of the first requestor’s datasize.
  4. At the end of each round, all active requestors will have their deficit counters incremented.
  5. When the request signal changes from 1 to 0, the deficit counter is cleared.

The arbiter performs round robin arbitration between all eligible requestors in each round. If a request rises, it will be considered as the eligibility is constantly calculated and the rising request will have its deficit counter incremented by quantum.

When a request is de-asserted, the corresponding deficit counter will be cleared so the priority on toggling request will not be affected by remainders from past transactions.

At each round of the arbiter, all active requestors have their deficit counters incremented, so it can be evaluated again for eligibility to be granted. In case a requestor has several transactions which accumulated datasize does not exceed the deficit counter, they will be granted one after the other and the deficit counter will be decremented, till the datasize of the next transaction is higher than the deficit counter.

 

Test Items

the below table summarizes the test items that were checked for the arbiter

Item name

Description

Check resulting arbiter distribution

Run many requests where all requestors have constant valid request asserted. See that the accumulated datasize of all requestors independent of datasize distribution. For that purpose, some requestors use low datasize while others use high datasize.

Test with few requestors

Check that with 1,2,3.. requestors only, check fairness is kept.

Request down/up

Check the case where some requests are de-asserted and re-asserted. See that the deficit counters are updated accordingly.

Ack

Check stalling the arbitration through ack and constantly asserted ack signal.

 

 

assumptions

the following assumptions and rules should be followed, those can be used as error indications:

  1. The deficit counter should not be incremented to its saturation level where the highest bit is set.

Usability

The typical case for using a deficit work conserving round robin arbiter is for arbitrating data packets to be sent on a communication link. In this case, the arbitration would prevent requestors with large packets from dominating the link and give a fair share of the link bandwidth to requestors with shorter packet sizes.

The arbiter can be integrated with a multiple FIFO or multiple queue architecture or into a pipeline.

Integrating with a FIFO system

The typical usage of a round robin arbiter for a FIFO system with multiple FIFOs and an arbiter to select which of them will be extracted is implemented through the following connections:

  1. Connect the request vector to the ~empty or each of the FIFOs so when the FIFO is not empty, a request to the arbiter will be asserted.
  2. Connect the datasize vector to the FIFO output data size portion.
  3. Connect a one-hot grant vector (OUTPUT_TYPE=1) to the extract of the FIFOs. This would be masked with the downstream flow control signaling.
  4. Connect the arbiter ack signal to the downstream flow control signal so each time there is no flow control, the ack would be asserted providing that the arbiter valid is asserted.