DDRx DLL

Verified

The DDR DLL core implements a master-slave structure where a master measures the cycle time and one or more slaves are used for delaying signals such as DDR DQS strobe.

The core provides a PVT-compensated 25% delay, using digital standard-cell-based delay lines with the granularity of a standard minimal inverter (15-40 ps at 40G). The delay line does not introduce duty cycle distortion or significant jitter to the DQS signal.

The core allows multiple slaves to be connected to the same master. This configuration can support up to 4 DQS delay slaves for a DDR or similar application at 400 MHz. For higher frequencies or widely spaced DQS slaves, up to 2 slaves per master are recommended to overcome OCV effects.

The core is built from delay line taps, which are pre-designed and pre-placed separately for minimal granularity, plus synthesizable control logic that handles master initialization and locking.

The DLL requires either the single rate clock (e.g. 400 MHz for DDR-800) or the double rate clock, and can be programmed to measure a double rate clock to support lower frequencies.

The core is composed of hard macros for the most sensitive logic and soft synthesizable parts for the rest. The hard macros contain only standard cells, so the core can be ported between process nodes and foundries with limited timing closure complexity.

Estimated cell count: up to 2000 standard cells (including ~300 flops) for the master, and 500 cells (including ~70 flops) for the slave.

DLL block diagram

Features

  • PVT compensation - Master-slave functionality where a single master may be connected to multiple slaves.
  • Slave phase shift - 25% phase shift, with an option to offset or override the slave delay select for testability or for optimizing the DQS sampling point.
  • Structure - The DLL uses only standard library cells and does not require separate voltage supplies.
  • Jitter and errors - The DLL granularity error is limited to 1 inverter delay (15-40 ps at 40G).
  • Frequency range - The DLL supports input clock frequencies between 250 MHz (measuring a double rate clock) and 700 MHz. The slave allows for a delay of 350 ps to 1 ns, suitable for 25% of a cycle at 250 MHz to 700 MHz.
  • Slave select update - The slave module includes a signal that enables the master-to-slave select update, so the update can be restricted to times when no actual DQS is present, preventing DQS glitches.
  • Initialization - The DLL is started once the clocks driving the master are stable. The initialization sequence locks the DLL master to measure one cycle of the input clock. A signal indicates that the DLL is locked.
  • Testing and calibration - The slave override option allows testing the read data valid window of a DDR interface, and the offset option allows placing the DQS sampling point at the middle of the data valid eye.
  • Testability - The DDR DLL includes full scan support for ATPG testability.

Deliverables

  • Netlist for delay lines and sensitive logic
  • RTL for control and initialization logic
  • Complete P&R instructions
  • Complete timing closure instructions
  • DFT & testing instructions
  • Complete documentation
  • Timing and P&R review
  • Optional hard macro delivery of the delay lines in the required process node.

Master Interface table

| Signal name | Direction/width | Description |
|---|---|---|
| clk2x | Input | Double rate clock, used by the master for measuring the cycle time. |
| cclk | Input | Control clock, this is typically a slower clock used for the DLL control logic. There is no alignment requirement between the cclk and the clk2x. |
| resetb | Input | Asynchronous reset signal |
| mstr_enable | Input | Master enable signal, starts the master operation. This signal should be asserted after the input clocks are stable at the required frequency. The signal is to be kept asserted while the master is active. |
| master_lock | Output | Master lock indication, indicates that the master is locked to the clock frequency and the output can be used for slave components. |
| mstr2slv_qc | Output [6:0] | Master quarter cycle indication, suitable for slave components with up to 128 taps. |
| scan_en | Input | Scan enable signal, select shift operation for sequential elements. |
| scan_mode | Input | Puts the master into scan mode |
| mstr_scan_in | Input | Scan chain input for hardened part |
| mstr_scan_out | Output | Scan chain output for hardened part |

Slave Interface table

| Signal name | Direction/width | Description |
|---|---|---|
| cclk | Input | Control clock, this is typically a slower clock used for the DLL control logic. All control signals are relative to this clock. |
| resetb | Input | Asynchronous reset signal |
| slave_in | Input | Slave input signal, used for connecting the strobe to be delayed. |
| slave_out | Output | Slave output signal, used for connecting the delayed signal. |
| mstr2slv_qc | Input [6:0] | Master quarter cycle indication, the slave uses this vector to set the required delay. |
| slave_delay_line_offset | Input [6:0] | Delay line offset, used with slave_delay_line_offset_dir to apply an offset to the mstr2slv_qc. Should be set to 0 as default for no offset. |
| slave_delay_line_offset_dir | Input | Direction of offset, used with the slave_delay_line_offset field. When set to 1, the offset lowers the delay; when set to 0, the offset increases the delay. |
| slave_delay_line_override | Input [6:0] | Delay line override, serves to override the mstr2slv_qc. The value in this vector is used directly to select the number of taps in the slave delay lines. |
| slave_override_enable | Input | Delay line override enable, when selected, the slave delay line tap vector is overridden by the slave_delay_line_override vector. |
| slave_soft_reset | Input | Soft reset, returns the tap vector to its initial value. |
| scan_en | Input | Scan enable signal, select shift operation for sequential elements. |
| scan_mode | Input | Puts the slave into scan mode |
| slv_scan_in | Input | Scan chain input for hardened part |
| slv_scan_out | Output | Scan chain output for hardened part |

Functionality

A DLL is an essential component of any DDR system. The purpose of the DLL in this context is to delay the DDR data strobe by ¼ of a cycle so it can safely sample a double data rate data bus in the middle of the data valid window. The DLL tracks temperature changes and ensures that the delay is kept at ¼ of a cycle in all conditions. The DDR DLL component is designed as a master-slave structure. The master measures the clock cycle using a high resolution digital delay line effectively tracking the changes of delay resulting from the temperature and voltage effects. The slave delay line is practically identical, so a delay of 25% can be derived from the master measurement. Multiple slaves can be connected to the same master, but care should be taken so effects of on-chip-variation of the process will not impact the DLL accuracy. The master communicates to the slave the number of delay elements that represent a ¼ cycle delay.
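The tracking behaviour described above can be illustrated with a short numeric sketch. It assumes that one delay line tap equals the quoted inverter granularity (15-40 ps) and that the master simply counts how many whole taps fit in a quarter cycle. The function names are hypothetical and the real master logic may differ.

```python
def quarter_cycle_taps(clk1x_mhz: float, tap_delay_ps: float) -> int:
    """Number of whole taps that fit in a quarter of the clk1x period."""
    period_ps = 1e6 / clk1x_mhz  # MHz -> period in picoseconds
    return int((period_ps / 4) // tap_delay_ps)


def slave_delay_ps(taps: int, tap_delay_ps: float) -> float:
    """Delay of a slave line built from the same kind of taps."""
    return taps * tap_delay_ps


if __name__ == "__main__":
    # Assumed fast / typical / slow tap delays within the quoted 15-40 ps range.
    for tap_ps in (15.0, 25.0, 40.0):
        taps = quarter_cycle_taps(400.0, tap_ps)
        print(f"tap={tap_ps} ps -> {taps} taps, "
              f"delay={slave_delay_ps(taps, tap_ps)} ps")
```

As the tap delay drifts, the tap count changes so that the slave delay stays within one tap of the ideal quarter cycle (625 ps at 400 MHz in this example).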

The DLL is constructed of standard cells only and can be mapped to any target library or process with minimal re-design effort. Using standard cells simplifies the implementation and requires no special tools and flows to place and route the blocks.

Master clocking

The master block requires two input clocks. The clk2x clock is the clock to be measured. It is typically connected to the same PLL or clock source that is used for the surrounding logic, but there is no balancing requirement. The clock is used for internal purpose only and better results can be achieved by minimizing its delay. The master internally derives the clk1x, measures it and uses it to determine a ¼ cycle delay for the slaves.

The cclk is a control clock, used for clocking the master control logic, and can run at a slower frequency. Typically, a ½ frequency clock (½ the frequency of clk1x) can be used for that purpose, but clk1x itself is also sufficient. The control clock should be the same clock used for controlling the surrounding logic, because all control signal timing is relative to the cclk; this simplifies the interface to the DLL master. There is no alignment required between the cclk and the clk2x.

Master input frequency

The master is designed to use a double rate clock from the PLL driving the interface logic. The double rate clock lets the master operate at relatively slow frequencies while keeping the maximal delay of the master delay line short. The selection between the two frequency measurement modes is automatic and requires no user intervention: the master measures the double rate clock if the provided clock is too slow for it to measure a single rate clock. The master outputs the value of a ¼ cycle regardless of which clock is actually measured.
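A minimal sketch of this mode selection follows. The master delay line length (`master_taps`) and the selection rule (measure clk1x if its period fits in the delay line, otherwise measure clk2x) are assumptions for illustration; the source does not describe the internal decision logic.

```python
def master_measurement(clk1x_period_ps: float, tap_delay_ps: float,
                       master_taps: int):
    """Return (measured clock name, quarter-cycle tap count).

    master_taps is a hypothetical master delay line length.
    """
    max_delay_ps = master_taps * tap_delay_ps
    if clk1x_period_ps <= max_delay_ps:
        # Single rate clock fits: a quarter cycle is 1/4 of the measured taps.
        taps_per_period = int(clk1x_period_ps // tap_delay_ps)
        return "clk1x", taps_per_period // 4
    clk2x_period_ps = clk1x_period_ps / 2
    if clk2x_period_ps > max_delay_ps:
        raise ValueError("clock too slow for this delay line")
    # Double rate clock measured: a clk1x quarter cycle is 1/2 of the measured taps.
    taps_per_period = int(clk2x_period_ps // tap_delay_ps)
    return "clk2x", taps_per_period // 2


if __name__ == "__main__":
    print(master_measurement(2500.0, 25.0, 128))  # 400 MHz clk1x
    print(master_measurement(4000.0, 25.0, 128))  # 250 MHz clk1x
```

In both branches the returned value represents a quarter of the clk1x cycle, matching the statement that the output does not depend on which clock was measured.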

Master initialization and locking

The initialization of the master is performed by internal logic with minimal user intervention. After the reset signal is de-asserted, the environment is required to make sure the double rate clock input is stable at the correct frequency and then assert the master enable signal. The master takes a few hundred microseconds to lock, after which it asserts the master lock signal. To change the frequency and re-lock the master, de-assert the enable signal, change the frequency, and then re-assert the enable signal to start the lock sequence again. When the master is required to use the double rate clock, the locking time may be longer.
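The sequence above can be sketched as a testbench-style routine. The `dut` handle, its `wait_cclk` method and the `FakeMaster` stand-in are hypothetical, and `resetb` is assumed to be active low based on its name; only the signal names come from the master interface table.

```python
class FakeMaster:
    """Stand-in for a simulator handle; locks after a fixed number of
    cclk cycles with reset released and mstr_enable asserted."""

    def __init__(self, lock_after_cycles: int = 1000):
        self.resetb = 0
        self.mstr_enable = 0
        self.master_lock = 0
        self._lock_after = lock_after_cycles
        self._enabled_cycles = 0

    def wait_cclk(self, cycles: int = 1) -> None:
        for _ in range(cycles):
            if self.resetb and self.mstr_enable:
                self._enabled_cycles += 1
            else:
                self._enabled_cycles = 0
            self.master_lock = int(self._enabled_cycles >= self._lock_after)


def lock_master(dut, timeout_cycles: int = 100_000) -> int:
    """Release reset, assert mstr_enable and wait for master_lock.

    The caller must make sure the clocks are already stable.
    Returns the number of cclk cycles waited after enabling.
    """
    dut.resetb = 1          # assumed active-low reset: 1 = de-asserted
    dut.wait_cclk(2)
    dut.mstr_enable = 1     # must stay asserted while the master is active
    for waited in range(1, timeout_cycles + 1):
        dut.wait_cclk()
        if dut.master_lock:
            return waited
    raise TimeoutError("master_lock was not asserted")


def relock_master(dut, change_frequency, timeout_cycles: int = 100_000) -> int:
    """De-assert enable, change the clock frequency, then lock again."""
    dut.mstr_enable = 0
    dut.wait_cclk(2)
    change_frequency()      # hypothetical callback that reprograms the PLL
    return lock_master(dut, timeout_cycles)


if __name__ == "__main__":
    master = FakeMaster(lock_after_cycles=50)
    print("locked after", lock_master(master), "cclk cycles")
```

The timeout is a guard for the testbench only; a suitable value depends on the cclk frequency and the lock time of the actual master.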

Slave configuration

The slave part of the DLL does not actually require initialization; it has an asynchronous reset signal and can also be returned to its initial state through a synchronous soft reset. The slave is programmed to a specific ¼ cycle delay through the master-to-slave quarter cycle vector. The slave takes the delay indicated by the vector and uses it to delay the input signal accordingly. Any change the master makes to the configuration vector causes the slave to change the delay, either increasing or decreasing it.

Slave update

The slave update time is of high importance because any change to the delay may cause unexpected glitches on the delayed signal. To prevent this unwanted situation, the slave is to be updated only when the output is not in use. For a DDR application this may mean updating the slave during a write cycle or during refresh time. A low rate of update for the slave is acceptable as the DLL is designed to follow slow delay changes resulting from temperature drifts that occur over long periods of time. Choosing to update the DLL slave at the correct time is essential for proper slave operation. The update operation should be precisely synchronized to the slave input signal, so the input signal is idle when the slave delay line configuration is changed and for a few cycles after it.
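A behavioural sketch of this gating is shown below. The bus-state labels, the `update_allowed` rule and the `SlaveSelect` class are hypothetical; the sketch only models the idea that the slave keeps its previous tap selection until an update is explicitly allowed, and it does not model the idle cycles required after the change.

```python
def update_allowed(bus_state: str) -> bool:
    """True when the slave input (DQS) is expected to be idle.

    Write cycles and refresh time are the examples given in the text.
    """
    return bus_state in ("write", "refresh")


class SlaveSelect:
    """Holds the slave tap selection between allowed updates."""

    def __init__(self, initial_taps: int = 0):
        self.taps = initial_taps

    def tick(self, mstr2slv_qc: int, update_enable: bool) -> int:
        # Follow the master only when the update is enabled.
        if update_enable:
            self.taps = mstr2slv_qc
        return self.taps


if __name__ == "__main__":
    slave = SlaveSelect(initial_taps=40)
    schedule = [("read", 40), ("read", 41), ("refresh", 41), ("read", 42)]
    for state, qc in schedule:
        taps = slave.tick(qc, update_allowed(state))
        print(f"{state:8s} master={qc} slave={taps}")
```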

Slave override

In order to allow for debug and testability of the slave and its surrounding logic, the slave supports two override methods. The first is an offset mechanism allowing the user to offset the slave configuration word by increasing or decreasing the slave delay by a constant value. This feature can be used for calibrating a DDR DQS signal to be placed at the center of the data valid window. The second feature enables the slave to completely override the master indication and set the number of delay elements directly. This feature can be used for exploration of the data valid window size for the purpose of DDR interface characterization and debugging.

Note: Care should be taken not to overflow the delay line configuration vector.
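The two mechanisms can be summarized in a small behavioural model of the tap selection, using the signal names from the slave interface table. How the hardware behaves on overflow is not stated in the source, so this sketch simply raises an error to flag a configuration that leaves the 7-bit range.

```python
TAP_BITS = 7
MAX_TAP = (1 << TAP_BITS) - 1  # 127 for the [6:0] vectors


def effective_taps(mstr2slv_qc: int,
                   slave_delay_line_offset: int = 0,
                   slave_delay_line_offset_dir: int = 0,
                   slave_delay_line_override: int = 0,
                   slave_override_enable: bool = False) -> int:
    """Tap count the slave would use for a given configuration."""
    if slave_override_enable:
        taps = slave_delay_line_override
    elif slave_delay_line_offset_dir == 1:
        taps = mstr2slv_qc - slave_delay_line_offset  # dir 1 lowers the delay
    else:
        taps = mstr2slv_qc + slave_delay_line_offset  # dir 0 increases the delay
    if not 0 <= taps <= MAX_TAP:
        # Model-only check; real hardware behaviour on overflow is unknown.
        raise ValueError(f"tap selection {taps} outside 0..{MAX_TAP}")
    return taps


if __name__ == "__main__":
    print(effective_taps(40))
    print(effective_taps(40, slave_delay_line_offset=5))
    print(effective_taps(40, slave_delay_line_offset=5,
                         slave_delay_line_offset_dir=1))
    print(effective_taps(40, slave_delay_line_override=10,
                         slave_override_enable=True))
```

Because the offset is applied on top of the master value, it keeps following the master as conditions change, whereas the override replaces the master value entirely.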

DFT and testability

The DLL supports scan ATPG for ATE fault detection. The scan chains of the hardened non-synthesizable parts of the DLL are pre-inserted into the netlist and cover most sequential and most combinational elements of the hardened design. For the synthesizable control logic, scan can be inserted using the normal DFT flow of the surrounding logic. 

Verification environment (VIP)

The figure below is a block diagram of the DDR DLL verification environment. The environment generates read transactions from the DRAM model. It adds skew and jitter to the DQ and DQS signals and drives them to the DUT. The DLL slave DUT delays the DQS by ¼ cycle and the environment uses the delayed signal to sample the incoming data. The sampled data is then compared to the data generated by the DRAM model for checking.

The environment models the immediate DLL environment and does not handle I/O, write transactions, DRAM preamble requirements and some other aspects belonging to the DDR PHY implementation.

DLL verification env diagram

 

Verification Environment blocks

The VIP is divided into several blocks with the following functionality:

DDR DRAM model

The DDR model receives read commands from the environment tasks block. It will generate a burst of read data per each request, driving the DQ[7:0] and DQS aligned per each octet. When there is no request the model would drive X on the DQ and DQS signal, modeling unknown value.

Skew generator

The skew generator introduces skew between the different DQ bits and the DQS bit of each octet. This effectively shortens the data valid window going to the PHY. The skew generator also adds jitter to the signals, modeled as X values around each data value change. The net effect of the skew generator is to model a data eye for each data valid window.

DDR PHY model

The DDR PHY model is a simplified DDR PHY that samples data from the skew generator using the delayed DQS from the DLL slave. The data is sampled on both the rising and falling edges of the DQS. The sampled data is then sent to the environment scoreboard for comparison.

Scoreboard

The purpose of the scoreboard is to check that the data sampled by the delayed DQS is the correct one. The scoreboard would signal an error when the data was not sampled correctly. Any test which is targeted for passing would be required to pass all data comparisons correctly.

Clock generator

The clock generator generates a double frequency clock to the master and a clock to the DRAM model and the rest of the blocks in the environment.

Environment tasks

The environment tasks contain tasks that are used for generating the different test scenarios.

The main tasks are:

  1. Read command task, initiating a read command.
  2. DLL master initialization task.
  3. Reset de-assertion task.
  4. DLL slave update task.
  5. Data valid window measurement task, containing a sequence of operations that measure the size of the data valid window in slave delay line taps.
  6. DLL slave offsetting task, enabling the setting of an offset in the DLL slave.
  7. Skew generator configuration setting task, used for setting the skew generator to a specific skew scenario and a specific data valid window size.

Test Items

The following test item list summarizes the tests that were executed on the DLL. Some of the tests are also available as part of the environment deliverables.

All tests would run multiple times with different simulation seeds.

| # | Item name | Description | Test name |
|---|---|---|---|
| 1 | Simple data sample | Check received data is sampled correctly after DLL locking. Use minimal jitter and skew. | Test_0 |
| 2 | Narrow DVW, Large skew | Check received data under high jitter and DQ bit skew. | Test_0 |
| 3 | Manual offset | Set the jitter and skew generator for skew between DQ and DQS and narrow DVW. Check test fails with no manual offset of slave and passes when an offset is programmed. Check offset in both directions. | Test_0 |
| 4 | Manual override | Set the skew generator to a valid window around specific taps. Override the slave delay line to the specific tap. Repeat for several taps. Override to a tap outside the window, expect failing samples, repeat for higher and lower taps. | Test_2 |
| 5 | Temperature cycle | Change master and slave delay line taps to different delays. Emulate temperature rise and fall, use narrow DVW, and check for no failures and constant DQS placement within the data valid window. The delay line temperature emulation is done by changing the delay of each element. | Test_3 |
| 7 | DVW measurement | Implement sequence for finding DVW width, use different DVW sizes. | Test_4 |
| 8 | Frequency support | Check simple scenario across all the valid frequency range, vary delay line elements delay through slow, fast, typical process corners delay according to library specification. | Test_5 |
| 9 | Minimal frequency | Check for minimal frequency using fast process delay line elements delay. Check frequency out of the range; verify master locks for a maximal delay of the delay line. | Test_5 |
| 10 | Maximal frequency | Check for maximal frequency using slow process; verify master lock with minimal delay. | Test_5 |

DDR Data valid window calculation

The DLL can be used to measure the data valid window of the interface. This functionality uses the slave override vector and a software sequence to determine the start and end tap of the valid window. Using a software or hardware sequence, the user can perform the following actions:

  1. Set the delay line tap vector to a specific location using the override feature.
  2. Perform read transactions on the interface, using BIST or any other mechanism.
  3. Mark the result for the specific tap.
  4. Repeat for all taps.

The result is a list of taps that pass the interface testing, covering all the locations where the DQS can be placed relative to the DQ data bits while still passing the DDR read transaction.
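The sweep can be sketched as follows. The `read_test` callback is a hypothetical hook that is expected to program `slave_delay_line_override` (with `slave_override_enable` set) to the given tap, run the read transactions, and report pass or fail. Taking the longest contiguous run of passing taps as the window is a choice made for this sketch.

```python
from typing import Callable, List, Optional, Tuple


def sweep_taps(read_test: Callable[[int], bool],
               num_taps: int = 128) -> List[int]:
    """Try every tap and return the ones whose read test passes."""
    return [tap for tap in range(num_taps) if read_test(tap)]


def window_bounds(passing: List[int]) -> Optional[Tuple[int, int]]:
    """Start and end tap of the longest contiguous run of passing taps.

    `passing` must be sorted in ascending order. Returns None if empty.
    """
    best = None
    start = None
    prev = None
    for tap in passing:
        if prev is None or tap != prev + 1:
            start = tap  # a new run begins here
        if best is None or tap - start > best[1] - best[0]:
            best = (start, tap)
        prev = tap
    return best


if __name__ == "__main__":
    # Stand-in for a real interface test: taps 30..50 pass.
    passing = sweep_taps(lambda tap: 30 <= tap <= 50)
    print(window_bounds(passing))
```

The default of 128 taps matches the range of the 7-bit override vector; a slave with fewer taps would use a smaller `num_taps`.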

Converting the results from a number of taps to actual delay requires deeper understanding of the delay line structure; please contact us for help on this subject.

DDR valid window sampling optimization

DQS-DQ diagram

Placing the DQS sampling strobe at the middle of the data valid window can potentially improve the performance of the DDR interface. After the data valid window taps are mapped using the slave override feature, the offset of the data valid window center from the measured ¼ cycle at the master output can be derived. This offset can be used for placing the DQS strobe at the center of the data valid window through the slave offset feature. Using the offset feature maintains the tracking capability of the master over temperature changes while keeping the strobe as close as possible to the data valid window center.
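As a sketch of this derivation, the offset and its direction can be computed from the measured window bounds and the master value. The function name is hypothetical, the window center is rounded down to a whole tap, and `mstr2slv_qc` is assumed to be the value the master reported at the time the window was measured.

```python
def centering_offset(window_start: int, window_end: int,
                     mstr2slv_qc: int):
    """Return (slave_delay_line_offset, slave_delay_line_offset_dir).

    Direction follows the slave interface table: 0 increases the delay,
    1 lowers it.
    """
    center = (window_start + window_end) // 2
    delta = center - mstr2slv_qc
    if abs(delta) > 127:
        raise ValueError("offset does not fit the 7-bit offset field")
    return abs(delta), (0 if delta >= 0 else 1)


if __name__ == "__main__":
    # Window measured at taps 30..50 while the master reported 36 taps.
    print(centering_offset(30, 50, 36))
```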