Introduction
While keeping my end goal in sight during this project, I noticed there were some components on the Arty that I didn't really know how to use. One of them was the DRAM mounted on the board. Hence, I wanted to implement the most obvious effect that needs some kind of memory of previous samples: a delay.
Basically, the delay takes the samples at some point of the chain at a rate of 48 kHz, stores them in a circular buffer in RAM and sets up a read pointer at some distance from the write pointer, so that the read side yields audio from the past. The maths end up being quite simple: at a sampling rate of 48 kHz, every sample of distance between the read and write pointers adds 1/48000 s ≈ 20.8 µs of delay. So, if I want a delay of up to ~2 seconds, I need 2 s × 48000 samples/s × 2 bytes = 192000 bytes of memory. Honestly, that's not much at all when the mounted memory is 256 MB.
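The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are mine, and it assumes one 16-bit (2-byte) memory slot per sample with no packing overhead.

```python
FS_HZ = 48_000          # sampling rate used throughout the post
BYTES_PER_SAMPLE = 2    # one 16-bit sample


def delay_per_sample_us(fs_hz: int = FS_HZ) -> float:
    """Delay added by each sample of distance between read and write pointers."""
    return 1e6 / fs_hz


def bytes_for_delay(seconds: float, fs_hz: int = FS_HZ,
                    bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Buffer size needed to hold `seconds` of audio."""
    samples = round(seconds * fs_hz)
    return samples * bytes_per_sample


if __name__ == "__main__":
    print(f"{delay_per_sample_us():.1f} us per sample")  # 20.8 us per sample
    print(bytes_for_delay(2.0), "bytes for 2 s")         # 192000 bytes for 2 s
```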
So, on the specification side, I need a circular buffer that can store 16-bit samples at 48 kS/s and read them back at a variable distance. From now on, I will treat this module as a FIFO for all purposes.
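As a behavioral reference for that FIFO, here is a minimal Python model of a circular buffer whose read pointer trails the write pointer by a configurable distance. It is a hypothetical sketch (class and method names are mine) that ignores the DDR side entirely and only shows the pointer arithmetic.

```python
class DelayLine:
    """Circular buffer where the read pointer trails the write pointer by `delay` samples."""

    def __init__(self, size: int, delay: int) -> None:
        if not 0 < delay < size:
            raise ValueError("delay must be between 1 and size - 1 samples")
        self.buf = [0] * size
        self.size = size
        self.delay = delay
        self.wr = 0  # write pointer

    def push(self, sample: int) -> int:
        """Store one sample and return the one written `delay` pushes ago."""
        self.buf[self.wr] = sample
        rd = (self.wr - self.delay) % self.size  # read pointer wraps around the buffer
        out = self.buf[rd]
        self.wr = (self.wr + 1) % self.size
        return out

    def set_delay(self, delay: int) -> None:
        """Change the read/write distance on the fly (the 'variable distance')."""
        if not 0 < delay < self.size:
            raise ValueError("delay must be between 1 and size - 1 samples")
        self.delay = delay


if __name__ == "__main__":
    line = DelayLine(size=8, delay=3)
    print([line.push(x) for x in range(1, 11)])  # [0, 0, 0, 1, 2, 3, 4, 5, 6, 7]
```

The distance must stay below the buffer size; at exactly the buffer size the read pointer would land on the sample that was just written.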
Talking with the memory controller
Setting up the controller in Vivado was honestly a painful experience. I had to figure out many things along the way, so let me write down a recipe that will be useful the next time I have to implement a memory controller.
DDR Configuration
Almost all of the configuration is intuitive, but Vivado might not recognize/remember that you set up a Digilent board. I had to figure out the pin assignment from the schematic and enter it by hand. Whenever I had to reconfigure the controller, those settings were lost, so I spent a lot of time inserting many rows of pins in the configuration form. So, here is the configuration for the Arty A7:
Other settings are more or less dictated by what you want to do. I didn't want AXI, so I disabled that interface. Another tricky part was that the default memory part was not the one on my board. I selected MT41K128M16XX-15E for my memory, with a data width of 16, 4 bank machines and normal ordering.
While I don't need crazy speed, I will stick with correctness when possible, hence I set the impedance controls to RZQ/6 and used BANK-ROW-COLUMN as the memory address mapping.
In my application I have two clocks: a system clock at 100 MHz and a DSP clock at 60 MHz. I use the system clock for the controller, so in the configuration I set its input to “No Buffer”, with the reset active low, and I disabled the XADC because I want the analog front end to have exclusive control of it.
The other settings aren't complicated to understand, so I'm skipping them. After this, you get a memory controller with a native interface that's not too hard to use.
Communication protocol
Here things got a bit messy on my side. I integrated the controller with the delay function itself, instead of implementing a generic controller and a delay module that uses it. Hence, what comes next is not necessarily standard.
The memory interface generator (MIG from now on) exposes a few controls:
The rules are simple, and the best way to implement the control is with an FSM.
This FSM starts as soon as the host side requests a sample write by setting host_wr_ff2, and then waits for the memory controller to be ready and available for writing. Writes to the memory happen in packets of 128 bits, so I have to pack 8 of my 16-bit samples before executing a write. For this, you set a memory address (which I advance in steps of 16, the 16 bytes of a 128-bit word) and the content to be written, and when the memory controller signals that it is ready to receive commands (app_rdy, plus app_wdf_rdy for the write data), you set app_wdf_wren and app_en. Something like this:
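To make the packing concrete, the sketch below models in Python how eight 16-bit samples could be placed into one 128-bit word and recovered. The bit ordering (first sample in bits [15:0]) and the helper names are assumptions for illustration, not necessarily what my HDL does; what matters is that the writer and the reader agree on the same order.

```python
SAMPLE_BITS = 16
SAMPLES_PER_WORD = 8               # 8 x 16 bits = one 128-bit MIG data word
MASK = (1 << SAMPLE_BITS) - 1


def pack_word(samples: list[int]) -> int:
    """Pack 8 signed 16-bit samples into a 128-bit integer, sample 0 in bits [15:0]."""
    if len(samples) != SAMPLES_PER_WORD:
        raise ValueError("need exactly 8 samples per 128-bit word")
    word = 0
    for i, s in enumerate(samples):
        word |= (s & MASK) << (SAMPLE_BITS * i)  # & MASK keeps the two's-complement bits
    return word


def to_signed(value: int) -> int:
    """Interpret a 16-bit field as a two's-complement sample."""
    return value - (1 << SAMPLE_BITS) if value & (1 << (SAMPLE_BITS - 1)) else value


def unpack_word(word: int) -> list[int]:
    """Inverse of pack_word: split a 128-bit word into 8 signed samples."""
    return [to_signed((word >> (SAMPLE_BITS * i)) & MASK) for i in range(SAMPLES_PER_WORD)]


if __name__ == "__main__":
    samples = [0, 1, -1, 32767, -32768, 100, -100, 12345]
    word = pack_word(samples)
    print(f"{word:032x}")              # the 128-bit word as 32 hex digits
    assert unpack_word(word) == samples
```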
A consequence of writing in 128-bit words is that the delay granularity is 8 samples: before writing or reading, I have to collect those 8 samples. I could instead store a single sample per write and waste the rest of each word, but I consider the resulting granularity, 8/48000 s ≈ 167 µs, good enough for this application.
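Since the read pointer can only sit on 128-bit boundaries, any requested delay gets rounded to a whole number of 8-sample words. A small sketch of that rounding, with hypothetical names and assuming rounding to the nearest word with a minimum distance of one word:

```python
FS_HZ = 48_000
SAMPLES_PER_WORD = 8  # one 128-bit write holds 8 samples


def granularity_us(fs_hz: int = FS_HZ) -> float:
    """Smallest delay step: one 128-bit word worth of samples."""
    return 1e6 * SAMPLES_PER_WORD / fs_hz


def quantize_delay(requested_s: float, fs_hz: int = FS_HZ) -> tuple[int, float]:
    """Return (distance in 128-bit words, actual delay in seconds)."""
    words = max(1, round(requested_s * fs_hz / SAMPLES_PER_WORD))  # keep at least one word apart
    return words, words * SAMPLES_PER_WORD / fs_hz


if __name__ == "__main__":
    print(f"step = {granularity_us():.1f} us")  # step = 166.7 us
    print(quantize_delay(0.5))                  # (3000, 0.5)
```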
Reading is similar, but app_cmd changes (3'b001 for a read instead of 3'b000 for a write):
The MIG processes the request and signals with app_rd_data_valid when the data on app_rd_data is ready to be fetched.
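Putting the write and read handshakes together, the following Python model sketches one possible FSM, evaluated once per ui_clk cycle. It is hypothetical: the state names, the write-then-read sequencing per request and the Mealy-style outputs are my assumptions, and address handling is left out. The app_* names and the app_cmd encodings (3'b000 write, 3'b001 read) are those of the MIG native interface; asserting app_wdf_end together with app_wdf_wren assumes the 4:1 configuration, where one 128-bit beat is a complete burst.

```python
from enum import Enum, auto

# MIG native-interface app_cmd encodings
CMD_WRITE = 0b000
CMD_READ = 0b001


class State(Enum):
    IDLE = auto()
    WRITE = auto()       # waiting for app_rdy and app_wdf_rdy
    READ_CMD = auto()    # waiting for app_rdy to accept the read
    READ_WAIT = auto()   # waiting for app_rd_data_valid


class DelayMemFsm:
    """Cycle-level sketch: one 128-bit write, then one 128-bit read, per host request."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.rd_data = None  # last 128-bit word returned by the MIG

    def step(self, wr_req: bool, app_rdy: bool, app_wdf_rdy: bool,
             app_rd_data_valid: bool = False, app_rd_data: int = 0) -> dict:
        """Evaluate one ui_clk cycle and return the app_* outputs for that cycle."""
        out = {"app_en": 0, "app_cmd": CMD_WRITE, "app_wdf_wren": 0, "app_wdf_end": 0}
        if self.state is State.IDLE:
            if wr_req:  # e.g. the synchronized host_wr_ff2 flag
                self.state = State.WRITE
        elif self.state is State.WRITE:
            # A command is accepted only when app_en and app_rdy are high together;
            # the data is pushed in the same cycle, so app_wdf_rdy is required too.
            if app_rdy and app_wdf_rdy:
                out.update(app_en=1, app_wdf_wren=1, app_wdf_end=1)
                self.state = State.READ_CMD
        elif self.state is State.READ_CMD:
            out["app_cmd"] = CMD_READ
            if app_rdy:
                out["app_en"] = 1
                self.state = State.READ_WAIT
        elif self.state is State.READ_WAIT:
            out["app_cmd"] = CMD_READ
            if app_rd_data_valid:
                self.rd_data = app_rd_data  # 128 bits = 8 delayed samples
                self.state = State.IDLE
        return out
```

In real HDL you would more likely register these outputs and hold app_en high until app_rdy is seen; the combinational version is just easier to follow in a model.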
Honestly, this is not particularly complicated, and avoiding the AXI interface was worth it for such a simple application.
Clock Domain Crossing
As an analog designer, I found the different clocks of the controller and the host an obvious problem to solve: at some point, one clock will sample a flip-flop input while the data is still changing, resulting in what's known as metastability.
The trick to solve the problem? Just add another flip-flop. A metastable flip-flop needs some time to resolve (how much depends on the architecture), and the probability that it is still unresolved decays exponentially with the time it is given. The additional flip-flop gives the first one a whole clock period to resolve before its output is used (a metastable state is undefined, not necessarily midway between zero and one). This is essentially a stochastic process, and usually one extra FF is enough to avoid audible glitches; if it were not, a third one would reduce the probability of propagating a bad state to extremely low values. I discovered this is something digital designers have to give a lot of thought to. For me, it was expected.
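The usual textbook way to quantify this is the synchronizer mean time between failures. As a sketch (the constants τ and T_W are device-specific and come from the vendor's characterization, not from anything I measured):

$$
\mathrm{MTBF} = \frac{e^{t_r/\tau}}{T_W \, f_{clk} \, f_{data}}
$$

Here t_r is the time the first flip-flop is given to resolve (roughly one receiving-clock period minus setup and clock-to-output times), τ is the flip-flop's resolution time constant, T_W its metastability window, f_clk the receiving clock frequency and f_data the rate at which the crossing signal toggles. Each extra synchronizer stage adds about one clock period T_clk to t_r, multiplying the MTBF by roughly e^{T_clk/τ}, which is why a third flip-flop pushes failures to extremely low rates.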
The synchronizer has to live in the clock domain that receives the signal, so, for example, in the host the relevant piece of the module looks like this:
So ram_wr_flag is generated in the MIG clock domain, but it passes through two registers, ram_wr_ff1 and ram_wr_ff2, clocked in the host domain before being used (ram_wr_ff2 is the usable bit; never use ram_wr_ff1). The MIG domain does the same in the opposite direction.
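A behavioral Python model of that two-stage synchronizer, clocked by the receiving domain, is sketched below. It reuses the ram_wr_ff1/ram_wr_ff2 names from the text but is otherwise hypothetical; it cannot model metastability itself, only the latency and the rule that only the second stage is consumed. This structure is meant for single-bit flags: a multi-bit value such as a sample must be held stable while the synchronized flag is sampled, or cross through a handshake or an asynchronous FIFO instead.

```python
class TwoFlopSync:
    """Two flip-flops in series, both clocked by the receiving domain."""

    def __init__(self) -> None:
        self.ram_wr_ff1 = 0  # may go metastable in hardware: never read it directly
        self.ram_wr_ff2 = 0  # synchronized output, safe to use

    def clock(self, ram_wr_flag: int) -> int:
        """One rising edge of the receiving clock; returns the synchronized bit."""
        # Both registers update on the same edge, so ff2 takes ff1's *old* value.
        self.ram_wr_ff2, self.ram_wr_ff1 = self.ram_wr_ff1, ram_wr_flag
        return self.ram_wr_ff2


if __name__ == "__main__":
    sync = TwoFlopSync()
    print([sync.clock(x) for x in [1, 1, 1, 0, 0, 0]])  # [0, 1, 1, 1, 0, 0]
```

A flag captured by ram_wr_ff1 on one edge only reaches ram_wr_ff2 on the next, which is the extra clock period the first stage gets to settle.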
Implementation at the top
The MIG launches a calibration step at power-up, so in the top module we have to hold the controller in reset until calibration is done. Usually there is a flag to read for this (init_calib_complete on the MIG), but in my case I just wait for some time.
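A sketch of that fixed wait, modeled in Python: a counter holds the user logic in reset for a given number of clock cycles, plus a helper to size the counter. The 100 MHz clock matches the system clock mentioned earlier, but the 1 ms wait and the active-high reset are hypothetical values for illustration, not the ones in my design; gating the reset on the MIG's init_calib_complete output would be the more robust option.

```python
CLK_HZ = 100_000_000  # system clock from the post


def wait_cycles(wait_us: int, clk_hz: int = CLK_HZ) -> int:
    """Clock cycles needed to wait at least `wait_us` microseconds (integer ceiling)."""
    return (wait_us * clk_hz + 999_999) // 1_000_000


def counter_width(cycles: int) -> int:
    """Bits needed for a counter that can reach `cycles`."""
    return max(1, cycles.bit_length())


class ResetHold:
    """Holds a hypothetical active-high reset for `hold_cycles` clock edges."""

    def __init__(self, hold_cycles: int) -> None:
        self.hold_cycles = hold_cycles
        self.count = 0

    def clock(self) -> int:
        held = self.count < self.hold_cycles
        if held:
            self.count += 1  # counter stops once the wait is over
        return int(held)


if __name__ == "__main__":
    n = wait_cycles(1000)                          # hypothetical 1 ms wait
    print(n, "cycles,", counter_width(n), "bits")  # 100000 cycles, 17 bits
```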
The actual instantiation of the module looks like this:
Remember that I wanted the XADC exclusively for the front end, hence the temperature is just hard-coded into the module, and we disregard its effect on calibration for this application.
Conclusion
This is almost a recipe for a MIG implementation, and the key lessons are:
- Next time, separate the controller from the user. The way I did it, I can't really use the memory for anything else
- The CDC was an expected problem, with an intuitive solution that also happens to be the standard one
- As the project grows in complexity, I will need some way of testing it (DFT, design for testability). I will write about my solution to this in the next post
