9 Replies Latest reply on Jun 23, 2017 8:14 AM by mswi

    HyperRAM user refresh

    j.m.granville_2505236

      The HyperRAM data sheet says :

         

      The host system may also effectively increase the tCMS value by explicitly taking responsibility for performing all refresh and doing burst refresh reading of multiple sequential rows in order to catch up on distributed refreshes missed by longer transactions.

         

      The core DRAM array requires periodic refresh of all bits in the array. This can be done by the host system by reading or writing a location in each row within a specified time limit. The read or write access copies a row of bits to an internal buffer. At the end of the access the bits in the
      buffer are written back to the row in memory, thereby recharging (refreshing) the bits in the row of DRAM memory cells.

         

      Array Rows = 8192    Array Refresh Interval = 64ms     tRFH  =  40ns

         

      We want to use the user-refresh option, but the data & appnotes give no examples or guidelines on the most efficient way to do this.

         

      It was expected to find a "refresh++" command in the register space that would increment ROW and perform refresh inside tRFH, but there is no sign of that.

         

      Does that leave a dummy read, as in "This can be done by the host system by reading or writing a location in each row within a specified time limit.", as the only way to implement user refresh ?

         

      Here, we need to increment A21 - A9 (CA34 - 22) on each read, to scan the Rows.

         

      However, that looks like a lot of wasted clocks ( around 22?! & >> tRFH ) for what should be a very simple task.

         

      There also looks to be no method to disable the Auto-refresh, which means the user-refresh has to manage wasted Auto-refresh time slots, further impacting refresh energy efficiency.

         

      Can you give any links to user-refresh examples, and suggestions on how to do this in the least possible time ?

        • 1. Re: HyperRAM user refresh
          mswi

          [Max] Pasting your text so I can insert my comments:

             

          The HyperRAM data sheet says :

             

          The host system may also effectively increase the tCMS value by explicitly taking responsibility for performing all refresh and doing burst refresh reading of multiple sequential rows in order to catch up on distributed refreshes missed by longer transactions.

             

          The core DRAM array requires periodic refresh of all bits in the array. This can be done by the host system by reading or writing a location in each row within a specified time limit. The read or write access copies a row of bits to an internal buffer. At the end of the access the bits in the
          buffer are written back to the row in memory, thereby recharging (refreshing) the bits in the row of DRAM memory cells.

             

          Array Rows = 8192    Array Refresh Interval = 64ms     tRFH  =  40ns

             

          [Max] So I think you are using an Industrial temperature grade device at 100 MHz with tCMS = 4 us and tRWR=tACC=tRFH=40ns …

             

          We want to use the user-refresh option, but the data & appnotes give no examples or guidelines on the most efficient way to do this.

             

          [Max] It sounds like manual refresh is optional, and not a requirement. The most efficient way is to use self-refresh; all manual methods are less efficient. My suggestion is to let go of the idea of manual refresh since it cannot match the power, time or CPU efficiency of self-refresh.

             

          It was expected to find a "refresh++" command in the register space that would increment ROW and perform refresh inside tRFH, but there is no sign of that.

             

          [Max] Correct. If you want hardware support you should use the self-refresh logic. If you want to do manual refresh, then the work is fully manual – the device provides no “helper” commands.

             

          Does that leave a dummy read, as in "This can be done by the host system by reading or writing a location in each row within a specified time limit.", as the only way to implement user refresh ?

             

          [Max] Correct. You need to load a row to buffer (for read this happens when the RAM receives CA[23:16]); when the transaction ends the buffer is written to the DRAM array, which is the refresh. If you can trick your HyperBus controller into not clocking out data for a read transaction you can skip those cycles – in this case the theoretical minimum cost is tRWR + tACC (see Figure 3.7 in the HyperRAM datasheet). Note that this is twice the actual cost of self-refresh, which is tRFH (tRWR=tACC=tRFH=40ns for your device and frequency).
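          As a rough check of the figures above, the sketch below compares the total time spent on refresh per 64 ms interval for the theoretical minimum manual cost (tRWR + tACC per row) against self-refresh (tRFH per row). The timing values are the ones quoted in this thread; the function name `refresh_cost` is hypothetical, and the model assumes every one of the 8192 rows is refreshed exactly once per interval.

```python
# Figures quoted in the thread for the 64 Mbit Industrial-grade part.
ROWS = 8192                  # array rows
REFRESH_INTERVAL_NS = 64e6   # 64 ms array refresh interval
T_RWR_NS = 40.0              # read-write recovery time
T_ACC_NS = 40.0              # initial access time
T_RFH_NS = 40.0              # internal refresh time per row


def refresh_cost(per_row_ns, rows=ROWS, interval_ns=REFRESH_INTERVAL_NS):
    """Return (total ns spent per refresh interval, fraction of the interval)."""
    total_ns = per_row_ns * rows
    return total_ns, total_ns / interval_ns


manual_ns, manual_frac = refresh_cost(T_RWR_NS + T_ACC_NS)
self_ns, self_frac = refresh_cost(T_RFH_NS)

print(f"manual: {manual_ns / 1e3:.2f} us per 64 ms ({manual_frac:.3%})")
print(f"self:   {self_ns / 1e3:.2f} us per 64 ms ({self_frac:.3%})")
```

          Under these assumptions the manual minimum is twice the self-refresh cost, but both are a small fraction of the 64 ms interval. Command/address clocks are not included in the manual figure.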

             

          Here, we need to increment A21 - A9 (CA34 - 22) on each read, to scan the Rows.

             

          [Max] Correct, for a single HyperBus read command that accesses at most one row. For linear burst read spanning N rows the command overhead only needs to be transmitted once.
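          The row scan discussed above can be sketched as follows. Only the mapping of A21..A9 onto CA[34:22] comes from this thread; the read bit at CA47 and the linear-burst bit at CA45 are assumptions about the HyperBus command/address layout and should be checked against the datasheet. `refresh_read_ca` and `ca_bytes` are hypothetical helper names.

```python
ROW_BITS = 13          # A21..A9 select one of 8192 rows
ROW_SHIFT_CA = 22      # A9 lands on CA22, so the row field is CA[34:22]

# Assumed HyperBus CA control bits; verify against the datasheet.
CA_READ = 1 << 47      # R/W# = 1 for a read
CA_LINEAR = 1 << 45    # burst type = linear


def refresh_read_ca(row):
    """Build a 48-bit CA word for a dummy read at column 0 of `row`."""
    if not 0 <= row < (1 << ROW_BITS):
        raise ValueError("row out of range")
    return CA_READ | CA_LINEAR | (row << ROW_SHIFT_CA)


def ca_bytes(ca):
    """Split the CA word into six bytes, CA[47:40] first."""
    return ca.to_bytes(6, "big")


for row in (0, 1, 8191):
    print(row, ca_bytes(refresh_read_ca(row)).hex())
```

          With this layout the lowest row bits (CA23, CA22) sit in the fourth byte, which is consistent with the row being loaded once CA[23:16] has been received.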

             

          However, that looks like a lot of wasted clocks ( around 22?! & >> tRFH ) for what should be a very simple task.

             

          [Max] If you insist on using the bus for refresh, the time overhead can be high.

             

          There also looks to be no method to disable the Auto-refresh,

             

          [Max] Correct, there is no way to disable self-refresh.

             

          which means the user-refresh has to manage wasted Auto-refresh time slots, further impacting refresh energy efficiency.

             

          [Max] Manual refresh is not the way to minimize refresh energy cost; self-refresh is the most efficient way.

             

          Can you give any links to user-refresh examples, and suggestions on how to do this in the least possible time ?

             

          [Max] I don't have any examples of manual refresh. The only reason to use manual refresh is if you are required to perform such large, continuous, HyperBus transactions that self-refresh cannot keep up. In this case you cannot have the most efficient method since you must manually refresh the entire array within the time limit, irrespective of any progress made by self-refresh. Unless you have a clear and strong requirement to do HyperBus transactions that force you to manually catch up the refresh, I suggest that you use the self-refresh.

             

          [Max] My recommendation is to spend some time looking at your HyperBus controller to learn what it takes to support self-refresh. The only requirement on the RAM side, assuming you are using the default CR1[1:0] configuration, is that the HyperBus controller must keep all bus transactions to less than 4us. If you can do this then refresh is solved and you can move on to other problems.

             

          [Max] Also remember that tCMS is not specified, you may adjust it to be longer than the recommended value.  I suggest that the multiplier be either 1 (4 us) or 1.5 (6 us) so you can avoid manual refresh; the 2x multiplier is marginal at 85C for the Industrial temperature grade; and the 4x multiplier requires manual refresh.

          • 2. Re: HyperRAM user refresh
            j.m.granville_2505236

            Max, Thanks, great, that helps a lot.  Some clarifying questions :

               

            [Max] Correct. You need to load a row to buffer (for read this happens when the RAM receives CA[23:16]); when the transaction ends the buffer is written to the DRAM array, which is the refresh. If you can trick your HyperBus controller into not clocking out data for a read transaction you can skip those cycles – in this case the theoretical minimum cost is tRWR + tACC (see Figure 3.7 in the HyperRAM datasheet). Note that this is twice the actual cost of self-refresh, which is tRFH (tRWR=tACC=tRFH=40ns for your device and frequency).

               

            We can output any clock count, so yes we can skip cycles. Provided we meet  tRWR + tACC, what is the minimum possible clock count  ?  eg  CA[23:16] loads on the 4th clock edge, if the clock ceases there, and CS# delays tACC, does that meet all requirements ? Put another way, is a CLK required during tACC ?

               

            I also see this in data "The device automatically enables this mode (Active Clock Stop) when clock remains stable for tACC + 30 ns....The Active Clock Stop mode must not be used in violation of the tCSM limit. CS# must go high before tCSM is violated."

               

            I presume that second part, applies only to Auto-refresh, and not user-refresh cases ?

               

            [Max] I don't have any examples of manual refresh. ... Unless you have a clear and strong requirement to do HyperBus transactions that force you to manually catch up the refresh, I suggest that you use the self-refresh.

               

            Yes, we do have "a clear and strong requirement to do HyperBus transactions that force us to manually catch up the refresh", hence these questions.

               

            I think Cypress really should provide manual refresh examples, as you have hinted at here, as there are many use cases that cannot tolerate 4us fragmentation.  Such examples should include minimal-edges possible cases.

               

             

               

            > There also looks to be no method to disable the Auto-refresh,  [Max] Correct, there is no way to disable self-refresh.

               

            Hmm... If we cannot disable Auto-refresh, is there any operational risk of a collision ? (between our refresh, and the still active background refresh). If the Auto-refresh timer/monostable is reset by every user refresh, we are probably OK, as we will refresh faster than 4us.

               

            If there is a possible refresh collision case, does that require added CLKs in order to complete both refresh tasks ? (eg 3 Clock Latency adds 6 edges) and/or a delay before CS# goes high ?

            • 3. Re: HyperRAM user refresh
              mswi

              Let's work the problem from the beginning. Here is a C&P from the datasheet, section 1 (emphasis is mine):

                 

              Because the DRAM cells cannot be refreshed during a read or write transaction, there is a requirement that the host not perform read or write burst transfers that are long enough to block the necessary internal logic refresh operations when they are needed.  The host is required to limit the duration of transactions and allow additional initial access latency, at the beginning of a new transaction, if the memory indicates a refresh operation is needed.

                 

              This means Job #1 for you is to assess your RAM read/write use case to see if your operations are done at such a high duty cycle or duration that the required mode of automatic refresh cannot work for you. So far I have not seen evidence that you have completed this first step of due diligence. I have also not seen any other users present a use case where distributed refresh cannot work. So please perform this engineering analysis and report back to the forum with the result.

              • 4. Re: HyperRAM user refresh
                j.m.granville_2505236

                Hi Max,

                   

                It seems you are not reading my questions ?  Usually, Job #1 for Vendor support personnel, is to read the customer's questions.

                   

                So far, I have not seen evidence that you have completed this first step of tech support due diligence.

                   

                Perhaps one reason you have not seen any other users present a use case where distributed refresh cannot work, is you seem unable to comprehend why this may be a problem.

                   

                We are not using a FPGA here, so we do not have the luxury of any logic, and any clock speed.

                   

                Instead, we have limited clock speed choices, and limited cycles in which to complete tasks. A quite normal engineering compromise situation. We are trying to save development time.

                   

                Yes, that means we DO have a case where distributed refresh cannot work.

                   

                Perhaps you can find someone within Cypress who can answer our questions ?

                   

                You may also like to pass on the suggestion that Cypress do publish examples, of the least-number-of clocks needed to do manual refresh.

                   

                Q1:  We can output any clock count, so yes we can skip cycles. 

                   

                Provided we meet  tRWR + tACC, what is the minimum possible refresh clock count  ?  eg  CA[23:16] loads on the 4th clock edge, if the clock ceases there, and CS# delays tACC, does that meet all requirements ? Put another way, is a CLK required during tACC ? 

                   

                Q2: is there any operational risk of a collision ? (between our refresh, and the cannot-be-disabled background refresh). If the Auto-refresh timer/monostable is reset by every user refresh, we are probably OK, as we will refresh faster than 4us.

                   

                It seems the refresh window is locked to CS, so we are safe from collisions with cannot-be-disabled background refresh, during our long burst playbacks.

                   

                 

                   

                Q3:  Is there a release schedule for the next generation of HyperRAM ?

                • 5. Re: HyperRAM user refresh
                  mswi

                  I'm sorry, but I fail to see how the data you have provided so far proves that you cannot allow sufficient duty cycle for the automatic refresh to work per the design intent for this device. This is not about clock cycles or clock speed, it's about time.

                     
                        
                  • All you need to do is have CS# go high at least once every tCMS interval (nominally every 1 us or 4 us depending on the part number and operating temperature) and the refresh happens automatically in the most efficient way.
                  • If you have enough time to manually refresh the entire part once every array refresh interval (every 16 ms or 64 ms depending on the part number and operating temperature), or to refresh a word every tCMS interval, then you have more than enough CS# high time to allow the automatic refresh to work.
                  • Many memory controllers perform memory transactions in cache line lengths, where the CS# goes high after some bounded number of cache line fills - so normally there are plenty of CS# high windows for refresh to take place automatically.
                     

                  If you would be so kind as to tell me your MCU and memory controller, I would be happy to review the documentation to make our communication easier. 

                     

                  Q1: This question can wait until you can convince me that manual refresh makes sense for you. Until then my recommendation is to use this RAM design as it is intended to be used - i.e. with automatic refresh. This will give you the best performance for the least amount of development work, risk and system overhead.

                     

                  Q2: No, manual and automatic refresh operations cannot collide. Automatic refresh only happens as necessary after a CS# rising edge, and if the refresh is still in progress on the next CS# falling edge then the RWDS signal is driven high to request additional latency from the system.

                     

                  Q3: The current flash roadmap shows no new generation of HyperRAM through 2021: http://www.cypress.com/file/206951/download .

                  • 6. Re: HyperRAM user refresh
                    j.m.granville_2505236
                        

                    This is not about clock cycles or clock speed, it's about time.

                       
                       

                    Agreed. It is all about time.

                       
                        

                    Q1: This question can wait until you can convince me that manual refresh makes sense for you. Until then my recommendation is to use this RAM design as it is intended to be used - i.e. with automatic refresh. This will give you the best performance for the least amount of development work, risk and system overhead.

                       
                       

                    The data makes it clear manual refresh is possible. I am somewhat at a loss as to why a Cypress employee would patronize customers, and ignore what the data says.

                       

                    The problem is the data sheet covers manual refresh poorly, and fails to stipulate the minimum clocks required, which is why I ask here. I expected a simple numeric reply, not a game of ping pong.

                       

                    Since your experience of real applications seems very limited, here is a simple one for you :

                       

                    Display 800 pixels, at a Pixel clock of 38.1MHz, real time video, direct from HyperRAM, no external buffers. 

                       

                    The time budget here is 800/38.1M =  20.9973us, so I look forward to your example of how to stream 800 pixels, with no breaks, inside that tCMS limit !!

                       

                    Most readers will see at a glance that 20.9973us is larger than 4us, and clearly automatic refresh is off the table.

                       

                    There is a time of 7.559us during which user refresh may be applied, which is ~26.47% of the total time. If the Cypress data gave the minimum clocks required for manual refresh, we could readily calculate how many refresh cycles could fit into that 7.559us.
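                    The line budget above can be worked through as a sketch. The pixel count, pixel clock, and 7.559us idle time are from this post; the 80 ns per-row cost is the tRWR + tACC theoretical minimum mentioned earlier in the thread and is an assumption here, as is the requirement that all 8192 rows be refreshed within 64 ms (fewer rows are needed if only part of the memory is in use).

```python
import math

PIXELS_PER_LINE = 800
PIXEL_CLOCK_HZ = 38.1e6
BLANKING_US = 7.559             # idle time per line quoted in the post
ROWS = 8192
REFRESH_INTERVAL_US = 64_000.0  # 64 ms
PER_ROW_REFRESH_NS = 80.0       # assumed: tRWR + tACC, excluding CA clocks

active_us = PIXELS_PER_LINE / PIXEL_CLOCK_HZ * 1e6
line_us = active_us + BLANKING_US
blanking_fraction = BLANKING_US / line_us

# Number of video lines in one refresh interval, and the rows each
# blanking period must refresh to cover the whole array in time.
lines_per_interval = REFRESH_INTERVAL_US / line_us
rows_per_line = math.ceil(ROWS / lines_per_interval)
refresh_us_per_line = rows_per_line * PER_ROW_REFRESH_NS / 1e3

print(f"active {active_us:.4f} us, line {line_us:.4f} us")
print(f"blanking fraction {blanking_fraction:.2%}")
print(f"rows to refresh per line: {rows_per_line}")
print(f"refresh time per line: {refresh_us_per_line:.2f} us of {BLANKING_US} us")
```

                    Replacing `PER_ROW_REFRESH_NS` with the real per-row cost (command/address clocks plus latency at the actual bus clock) shows how much of the 7.559us window remains.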

                       
                        

                    Q2: No, manual and automatic refresh operations cannot collide. Automatic refresh only happens as necessary after a CS# rising edge, and if the refresh is still in progress on the next CS# falling edge then the RWDS signal is driven high to request additional latency from the system.

                       
                       

                    That's a partial answer, but fails to stipulate what "happens as necessary" means.

                       

                    Q: Is there an auto-refresh monostable running ~ 4us, and if so, is that monostable reset every time a manual refresh is applied (ie a new row access). If this resets, then I would say they cannot collide, as now Auto-refresh defers to manual refresh.

                       

                    On the other hand, if there is no reset action, every ~4us the HyperRAM will decide it needs to insert additional latency, as it then inserts its own, totally redundant, refresh.

                    • 7. Re: HyperRAM user refresh
                      mswi

                      Hello j.m.granville_2505236:

                         

                      It's great that you are so confident in your experience and ability. However you seem to be missing key facts about how HyperBus and HyperBus Controllers work.

                         

                      You have not specified your hardware, but the typical HyperRAM user will have chosen an MCU with a HyperBus memory controller; alternatively users can choose an FPGA configured with a soft HyperBus controller. Cypress invented the HyperBus protocol, and we created HyperBus Master Controller IP that we provide to MCU and FPGA chipset vendors to help them deliver HyperBus support to the market. Here is the list:

                             

                      Typically the supported chipsets are ARM-based systems where the HyperBus memory controller is an AXI slave connected via the AXI bus to one of the available AXI masters. 

                         

                      Cut & paste from your last transmission:

                         

                      >>Display 800 pixels, at a Pixel clock of 38.1MHz, real time video, direct from HyperRAM, no external buffers. 

                         

                      >> The time budget here is 800/38.1M =  20.9973us, so I look forward to your example of how to stream 800 pixels,

                         

                      >>  with no breaks, inside that tCMS limit !!

                         

                      >> Most readers will see at a glance that 20.9973us is larger than 4us, and clearly automatic refresh is off the table.

                         

                      Ok, so you are reading pixels (assume 4B/pixel) from the system memory map at 38.1 MHz. This is roughly 152 MB/s, which leaves plenty of headroom compared to the maximum HyperRAM read rate of 200 MB/s for the 3V part or 333 MB/s for the 1.8V part.

                         

                      Please read the following to learn why automatic refresh is clearly not off the table.

                         

                      Your turnaround time calculation for a line of pixels incorrectly assumes that the rate at which words are supplied to the graphics controller is the same as the rate at which words are read from the HyperRAM. The following graphic shows a block diagram of the HyperBus controller used in the Cypress Traveo MCUs (S6J3200 Series Hardware Manual); all derivatives of the Cypress HyperBus controller IP are expected to be similar:

                         

                      (BLOCK DIAGRAM)

                         

                       

                         

                      Note that the read path contains two FIFOs - the RX FIFO and the R DAT FIFO. These FIFOs are intended to enable the HyperBus to run faster than the AXI bus - once the RX FIFO is filled the HyperBus transaction is automatically stopped, and it restarts once the RX FIFO has space. The stopping and starting of the HyperBus transactions, so long as the peak loading rate is high enough, is completely invisible at the AXI bus - data pops off the R DAT FIFO at a rate determined by the AXI master. The difference in peak queue loading and unloading rates is where the time comes from that enables the insertion of periodic CS# high intervals (and other overhead time periods) into the HyperBus transaction stream. Yes, this does introduce jitter into the RX FIFO word loading rate, but the FIFO isolation ensures that this jitter is not transmitted to the AXI bus. 

                         

                      So once you choose the HyperBus clock frequency, you can use the tCMS value for your device (either 1 us or 4 us depending on the chip ordering option) to calculate the HyperBus command/data transaction size that fits within tCMS at your HyperRAM clock speed (see below for the overhead periods to account for). Then you set up your AXI Master to request that number of words per transaction. The AXI Master transaction properties are copied directly into the HyperBus command/address fields by the HyperBus controller.

                         

                      Here is the waveform for the HyperRAM read timing (S27KL0641, S27KS0641 HyperRAM™ Self-Refresh DRAM 3.0V/1.8V 64 Mbit (8 Mbyte)):

                         

                      (TIMING DIAGRAM)

                         

                         

                      Note that the time between the falling edge of CS# and the subsequent rising edge of CS# is tCMS - the maximum CS# low time that allows for automatic refresh.

                         

                      For the next paragraph we can ignore the "additional latency" period shown in the graphic. 

                         

                      Now that the read transaction properties have been received by the HyperBus controller, the HyperBus controller then lowers CS#, transmits the command/address data cycles, waits the access time tACC, and receives the requested number of words into the RX FIFO (starting and stopping as necessary depending whether the RX FIFO is full or not).

                         

                      Once this transaction is complete the HyperBus controller raises CS# for tCSHI (the minimum CS# high time between transactions) - it is at this point that a HyperRAM automatic refresh cycle may start. If automatic refresh initiates during tCSHI, then it either completes within tRFH while CS# is high, or if CS# goes low before the refresh completes then the HyperRAM sets RWDS high during the command/address cycles to request an additional latency period from the HyperBus controller to allow the refresh to complete - the graphic above shows this additional latency period. Then the rest of the HyperBus transaction completes normally.

                         

                      Notice there are multiple sources of per-transaction timing overhead on the HyperBus side:

                         
                            
                      • six cycles of command/address transmission time
                      • tACCESS time:  >36 ns to 40 ns
                      • tCSHI time: >10 ns
                      • tRFH time: >36 ns to 40 ns
                         

                      You will need to account for these overhead periods when sizing the AXI Master read transaction that fits within tCMS. Again, due to the FIFO isolation in the HyperBus controller, none of these overhead periods are visible on the AXI side if the HyperBus clock is fast enough.
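                      The sizing step described above can be sketched with a rough model. The overhead figures are the ones listed in this post; the 100 MHz clock, the 4 us tCMS, and the 4 bytes per pixel are the values assumed elsewhere in the thread. The function name `max_burst` and the simplification that the overheads simply add (with tRFH charged in full to every transaction) are assumptions, so treat the result as a pessimistic estimate and not a datasheet figure.

```python
def max_burst(clock_mhz, t_cms_ns=4000.0, ca_clocks=6, t_acc_ns=40.0,
              t_cshi_ns=10.0, t_rfh_ns=40.0):
    """Rough model of one read transaction that keeps CS# low within tCMS.

    Returns (payload bytes, sustained MB/s including CS# high overhead).
    """
    period_ns = 1000.0 / clock_mhz
    overhead_ns = ca_clocks * period_ns + t_acc_ns
    # Whole data clocks that still fit inside the CS# low limit.
    data_clocks = int((t_cms_ns - overhead_ns) // period_ns)
    payload_bytes = 2 * data_clocks          # DDR on an 8-bit bus: 2 bytes/clock
    cs_low_ns = overhead_ns + data_clocks * period_ns
    cycle_ns = cs_low_ns + t_cshi_ns + t_rfh_ns
    return payload_bytes, payload_bytes / cycle_ns * 1e3   # bytes/ns -> MB/s


payload, sustained = max_burst(clock_mhz=100)
demand = 38.1 * 4   # MB/s: pixel clock times the assumed 4 bytes per pixel
print(f"payload {payload} bytes, sustained {sustained:.1f} MB/s, demand {demand:.1f} MB/s")
```

                      The comparison only holds if a FIFO sits between the HyperBus and the consumer, as in the block diagram; a design that streams pixels directly from the bus cannot absorb the CS# high gaps.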

                         

                      So now you have your answer for how to stream 800 pixels at 38.1 MHz with no breaks while not violating the tCMS limit:

                         
                            
                      • If you are using HyperRAM from Cypress with a HyperBus controller from one of our chipset partners, just follow the outline above.
                      • If you are not using a HyperBus controller from one of our chipset partners, your use case is unsupportable.
                         

                      You can learn more about the HyperBus protocol here:

                             

                      You can see an example for how to use HyperBus efficiently with HyperFlash (HyperRAM is not covered): 

                             

                      Best of luck with your project. 

                         

                      Kind Regards,

                         

                      Max

                         

                         

                      • 8. Re: HyperRAM user refresh
                        j.m.granville_2505236

                        Hi Max,

                           

                        Again, your comprehension failures result in a tangent....

                           

                         

                           
                            

                        You will need to account for these overhead periods when sizing the AXI Master read transaction that fits within tCMS. Again, due to the FIFO isolation in the HyperBus controller, none of these overhead periods are visible on the AXI side if the HyperBus clock is fast enough.

                            

                        So now you have your answer for how to stream 800 pixels at 38.1 MHz with no breaks while not violating the tCMS limit:

                            

                        If you are using HyperRAM from Cypress with a HyperBus controller from one of our chipset partners, just follow the outline above. 
                        If you are not using a HyperBus controller from one of our chipset partners, your use case is unsupportable.

                           
                           

                        I clearly stated no external buffers and no breaks.  Your solution has both external buffers and breaks, so I mark that a *fail*.

                           

                        Interesting that you claim the 'non dedicated controller case' is unsupported. Does Cypress really intend to limit HyperBus parts to use only with "a HyperBus controller from one of our chipset partners" ?!

                           

                        The Data states that user managed refresh is possible, so someone is wrong ? Is it you, or is the data sheet wrong ?

                           

                        We can get the outputs we need from a single frame buffer, (here, the refresh is implicit) but the next improvement step is to refresh at least one other frame buffer, which is what we had hoped Cypress support would be able to assist with. We are happy to use only part of the total memory area.

                           

                        Your replies show you have only one song sheet, that you keep returning to for every use case.

                           

                        Our question remains open : What is the minimum number of clocks needed for a user refresh cycle ?

                           

                        My overall impression of HyperBUS design (and sadly the support training too) is that it was rather rushed, and 'done by an intern'.

                           

                        Much smarter would have been to have a refresh mode, where the user clocked and the timing was managed without needing to resend command and address for every row. That is plainly dumb and inefficient.

                           

                        This very simple feature would have saved refresh energy, and made HyperRAM easier to use in lower clock speed cases, by slashing the bus traffic.  The industry trend is to make memories smarter.

                        • 9. Re: HyperRAM user refresh
                          mswi

                          The datasheet is as clear as it needs to be on manual refresh. Yes, it is possible. The obvious way to do it is to set up a legal HyperBus read transaction of any type you like where the read transaction spans the region of memory you need to refresh; this constitutes the minimum number of cycles to perform refresh. Computing total clock cycles is trivial by referring to the datasheet. You are already doing this for one frame buffer, so just duplicate the process that is already working.
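                          As a sketch of that clock-count computation, the snippet below compares one linear burst read spanning whole rows with one short read per row. The 512-word row length follows from the A8..A0 column address implied earlier in the thread, and the 22-clock per-transaction overhead is the rough estimate from the original question, not a datasheet value. Both function names are hypothetical, and the model assumes a linear burst crosses row boundaries without extra latency.

```python
WORDS_PER_ROW = 1 << 9    # column address A8..A0, 16-bit words, 1 word per clock
ROWS = 8192
OVERHEAD_CLOCKS = 22      # assumed per-transaction overhead (estimate from the thread)


def burst_refresh_clocks(rows, overhead_clocks=OVERHEAD_CLOCKS):
    """Clocks for one linear burst read covering `rows` whole rows."""
    return overhead_clocks + rows * WORDS_PER_ROW


def dummy_read_clocks(rows, clocks_per_read=OVERHEAD_CLOCKS):
    """Clocks for one short read per row, touching a single location in each row."""
    return rows * clocks_per_read


clock_mhz = 100.0
for name, clocks in (("burst", burst_refresh_clocks(ROWS)),
                     ("per-row", dummy_read_clocks(ROWS))):
    print(f"{name}: {clocks} clocks, {clocks / clock_mhz / 1e3:.2f} ms at {clock_mhz:.0f} MHz")
```

                          Under these assumptions a burst that reads every word is far more expensive than touching one location per row, so the burst approach mainly makes sense when the data is being read anyway, as with the frame buffer that is already displayed.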