Linked list queuing

 
 
Verified

A linked-list based queuing system manages the enqueue, dequeue and arbitration actions for a multiple queue structure implemented using a single embedded SRAM memory array. This component contains the verified RTL Verilog code of the memory structure and round robin arbiter and implements a queuing system based on dynamically allocated memory per queue. The implementation is based on a linked list concept where each queue is allocated memory locations according to its accumulating size, using the memory space most efficiently. The typical usage of this linked list queuing is to hold pointers for a much larger array where actual data such as communication packets is stored.

Block diagram

Linked List queuing block diagram

Features

  • Multiple linked list structure implemented in an embedded SRAM memory array.
  • Parameterized number of FIFO queues.
  • Round robin arbitration for de-queue path.
  • Free buffers list is managed as a separate link list in the same memory.
  • Parameterized memory array data width.
  • Parameterized memory array depth.
  • Using a two port SRAM memory, with 1 read and 1 write.
  • External memory instantiation allowing usage of and embedded SRAM.

Optional Features

Optional features are available through different downloadable files

  • Synchronous and asynchronous reset

Deliverables

  • Verified RTL code
  • Access to detailed design documentation
  • 1 year support and customization service.

parameters table

Parameter

Valid values

description

FIFOCNT(NOQ)

>7

Number of FIFOs in the queuing system

WIDTH

 

FIFO width

DEPTH

any

FIFO depth

 

Interface table

Signal name

Direction/width

description

clk

Input

Clock signal

resetb

Input

Active low reset signal, when asserted, mask would be set 0.

enqueue_request

Input [NOQ-1:0]

Enqueue request, onehot vector indicating to which queue the operation is intended.

enqueue_data

Input [WIDTH -1:0]

Enqueue data

enqueue_ready

Output

Indicating that there is space for enqueue of the next item. Can be used for flow control of upstream logic in the case no space is available.

dequeue_enable

Input

Indicates the queues to perform dequeue according to the arbitration. De-asserting this signal stalls dequeue operations. Can be used for flow control from downstream logic.

dequeue_data

Output [Width-1:0]

De-queue data, typically ready 1 cycle after the dequeue enable is asserted as the memory read latency is one cycle.

dequeue_ready

Output

Indicating that the queues are not empty and a dequeue operation may be performed using dequeue_enable.

dequeue_valid

Output

Indicating the the data on the dequeue_data is valid.

mem_read_addr

Output [log(DEPTH) -1:0]

Memory read address, to be connected to memory instance.

mem_write_addr

Output [log(DEPTH) -1:0]

Memory write address, to be connected to memory instance.

Mem_read_en

Output

Memory read enable, active high

Mem_write_en

Output

Memory write enable, active high

mem_write_data

Output [MEMWIDTH -1:0]

Memory write data

Mem_dout

Input [MEMWIDTH -1:0]

Memory read data. As the memory hold both the data and next link list reference, the MEMWIDTH is the sum of WIDTH and the memory address width.

Functionality

The function of the linked-list based queuing system is to manage the enqueue, dequeue and arbitration actions for a multiple queue structure implemented using a single embedded SRAM memory array. Each queue can use as much of the memory as required, resulting in better utilization of the memory without pre-allocating a large amount of memory for each queue.

As not all memory entries are used for one of the linked lists, there is a need to maintain the list of entries that are currently available for use. This is accomplished using one additional list, implemented in the same memory, holding the unused buffers.

The queue for dequeue operations is selected by a work conserving round robin arbiter. The arbiter would allow fair share of dequeue for all queues that are not empty. The arbitration is “work conserving”, meaning that there are no wasted cycles for queues that are empty.

Linked list

A linked list is a data structure which consists of a group of nodes which together represent a sequence, typically used as a queue or an ordered list. Each node is composed of a datum and a reference (in other words, a pointer) to the next node in the sequence. For the minimal usage of a linked list, it requires a “head pointer”, always pointing to the next element to be extracted (dequeue) from the list and a “tail pointer”, always pointing to the next empty node allocated for the list.

Linked list example

Enqueue and dequeue

The enqueue operation loads an item into the memory and links it to a specific linked list according to the enqueue_request indication. The enqueue data is inserted into the memory location pointed by the tail pointer and the reference part is changed to point to a new memory location which is now empty and is reserved for the queue. The tail pointer of the specific queue is changed to point to the new memory location as well, enabling the next enqueue operation to use this location.

The dequeue operation extracts an item from memory and unlinks it from a specific linked list according to the selection of the round robin arbiter. The dequeue operation starts with a read of the memory entry pointed by the queue head pointer, the data part of the entry is returned and the reference part is loaded into the head pointer, which now points to the next entry on the linked list.

Linked list dequeue example

 

Embedded memory

The usage of a single memory array for storing the linked list for multiple queues enables area efficient storage for multiple queues of unequal sizes. The memory used has two ports, allowing it to read one entry and write one entry each clock cycle. The memory read latency assumed in this component is 1 cycle and the option to use the returning pointer as read was implemented so consecutive dequeue operation can be performed to the same queue cycle after cycle. This implementation may result in timing issues, depending on the required speed and the memory timing. For more detailed option for embedded memory integration and timing problems please consult the Applications tab of this component page.

The depth of the memory array is set by the MEMDEPTH parameter and the width of the memory is WIDTH+MEMADDR_S. typically there is a need to replace the memory instance in the component with an embedded memory using those sizes.

Round Robin Arbitration

Round robin arbitration is a scheduling scheme which gives to each requestor its share of using a common resource for a limited time or data elements. The basic algorithm is that once a requestor has been serves he would “go around” to the end of the line and be the last to be served again.

The simplest form of round robin arbiter is based on assignment of a fixed time slot per requestor; this can be implemented using a circular counter. Alternatively, the arbiter can be defined to allow a specific number X of data elements per each requestor, in which case X data elements from each requestor would be processed before moving to the next one.

Round robin arbiters are typically used for arbitration for shared resources, load balancing, queuing systems, and resource allocation. Any application requiring a minimal fairness where none of the requestors is suffering from starvation is a valid application for a round robin arbiter.

Work conserving round robin

The term “work conserving” refers to the ability of a round robin arbiter to skip requestors that do not have any data to process and move directly to the next requestor who has valid data to process. This method of arbitration is more efficient as it does not “waste” time on idle requestors.

The work conserving round robin (WCRRB) arbiter is therefore required to constantly look ahead into the next available requestors and enable the next valid requestor to be granted immediately after the current requestor is complete.

The function of looking constantly into all requests in order to calculate the next valid grant, may be costly in logic, and as the number of requestors get bigger, faster computation methods must be applied in order to meet timing and limit the size of the logic.

For more details about the arbiter go here

Enqueue Interface

The enqueue interface consists of an enqueue_request vector indicating to which queue the operation is to be performed. The enqueue_data contains the actual data item to be written into the queue. The enqueue_ready signal indicates that the memory has sufficient memory to support the current enqueue operation. When the enqueue_ready signal is de-asserted, an enqueue operation would be ignored. This signal should be used for integrating the enqueue logic into a pipeline implementation or handshake with the enqueue_ready signal serving to flow control the upstream logic.

Dequeue interface

The Dequeue interface enables the extract operation from the queue structure. The dequeue_ready output indicates to the downstream logic that there are valid entries to be extracted in one of the queues. The dequeue_enable signal in an input indicating that the downstream logic performs a dequeue operation. This signal can be used by the downstream logic to stall the dequeue operations for flow control purposes. When both dequeue_enable and dequeue_read signals are asserted, a dequeue operation is performed and the dequeue_data will be valid in the next cycle for downstream logic to use. The dequeue_valid output indicates that the data is valid. The dequeue_enable signal can be constantly held asserted, so whenever the queues are ready a dequeue operation will be performed. When the dequeue_enable signal is used for flow control, care must be ataken as the dequeued data would be valid after one cycle.

The below waveform illustrates the usage of the dequeue interface signals:

Linked list Dequeue interface waveform

Reset

Resetting the component does not clear the memory and only affects the random logic. The head and tail pointer of each queue are cleared so effectively, the content of the memory can no longer be accessed. The reset can be either synchronous or asynchronous.

 

Test Items

the below table summarizes the test items that were checked for this component

Item name

Description

Enqueue_ready

Checked the case where the memory is full, enqueue ready is de-asserted and no enqueue data is lost.

Free node availability

Checked that as long as the total number of used buffer is below the total memory size, there are always available nodes for enqueue, during enqueue bursts.

Node loss

Checked that no nodes are lost in the case of dequeue bursts.

Queue functionality

Checked that each queue maintains FIFO ordering and do not loose data.

Reset functionality

Head and tail pointer initialization

Initialization counter

 

Dequeue_ready

Check no dequeue is performed when signal is deasserted.

Dequeue_enable

Check that dequeue is performed only when both dequeue_ready and dequeue_enable are asserted.

Consecutive dequeue

Check consecutive dequeues to the same queue by filling only one queue.

 

 

Round robin arbitration

Check for fair arbitration when all queues are mostly not empty.

 

Assumptions

  1. The embedded memory is a 1 cycle read latency and the read pointer can be immediately used as address for the next read transaction. Some memories may have timing which does not allow such a design. RTLery can assist you in modifying the component to the required trade-offs, please use the bellow comment form to specify your specific requirement.
  2. The Enqueue_request vector is one-hot, so no more than one bit is set each cycle.

  3. The memory contains enough space to allow for several entries per queue.