1 Reply Latest reply on Dec 22, 2014 11:48 AM by userc_41545

    Reset Recovery Summary and Optimization

       This Subject summarizes the voluminous “Reset Recovery Considerations” Posts.


      I am writing this as a benefit to others, and also for myself since the recap effort refreshes the points in my mind.     



      The premise is to find ways to deal with resets that occur. Resets should also be prevented where possible, but the cost of prevention can be high, particularly for resets caused by environmental EMI, some of which may originate in equipment not related to the embedded system.



      A Watchdog Reset can be just as disruptive as the other resets; development testing should explore how close to the WDT threshold you are, with remedial efforts applied when the “cushion” is too low. Recovery from a WDT event in a production system is a user consideration: does it represent a solid failure or just a single occurrence? CyResetStatus is useful to report WDT events.
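      As a rough illustration of reporting WDT events, the sketch below decodes a reset status byte and keeps a count of back-to-back WDT resets, which is one possible way to tell a single occurrence from a solid failure. The mask values, the type and function names, and the "streak" policy are assumptions made for this example; on a real project the masks should come from the cy_boot headers and the status byte from CyResetStatus.

```c
#include <stdint.h>

/* Assumed bit masks, for illustration only. Use the definitions from the
   cy_boot headers on a real project instead of these values. */
#define RST_WDT_MASK 0x08u
#define RST_SW_MASK  0x20u

typedef enum { RST_NONE_REPORTED, RST_WATCHDOG, RST_SOFTWARE } reset_cause_t;

/* Decode a reset status byte into one of the soft reset causes. */
reset_cause_t classify_soft_reset(uint8_t status)
{
    if (status & RST_WDT_MASK) { return RST_WATCHDOG; }
    if (status & RST_SW_MASK)  { return RST_SOFTWARE; }
    return RST_NONE_REPORTED;  /* hard resets are not reported here */
}

/* Hypothetical policy: count consecutive WDT resets (saturating at 255).
   Any other cause clears the count. The count itself would have to live
   in SRAM that is neither cleared nor initialized. */
uint8_t update_wdt_streak(reset_cause_t cause, uint8_t streak)
{
    if (cause != RST_WATCHDOG) { return 0u; }
    return (streak < 255u) ? (uint8_t)(streak + 1u) : 255u;
}
```

      A streak of one suggests a single occurrence; a streak that keeps growing across resets points toward a solid failure.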



      In the process of analysis of resets, a technique was developed to find where in code a reset was occurring. A byproduct was to be able to find those areas of code that were heavy consumers of MCU capacity. The technique is discussed below.     



      Resets fall into two supersets. Some existing documents contain misstatements.  From Cypress Tech Support:     


      - “Software reset and watchdog reset fall under soft reset sources.”          


      - “XRES, IPOR, PRES, LVI and HVI sources fall under hard reset sources.”          


      Only Soft resets are reported via CyResetStatus. The WDT implementation is counterintuitive in that it is a soft reset.


       LVI, as a hard reset source, can be specified to produce either a reset or an interrupt (which has user code insertion spots).  The choice should be obvious.          



      Reset recovery should be tempered as to whether it represents an unrecoverable event (power supply failure), or a transient condition (EMI, Power Failure without backup power).     



      Reset recovery has the objective of restoring the system to the state where it was when a reset occurred while reducing the time of the outage. Critical systems may need some variety of backups. While a recovery is underway, some means is required to prevent making interim actions effective until the recovery is complete.     



      I use the 32K external crystal for an external clock for accurately timing components and producing a DIY RTC. The startup time for the 32K oscillator is a consideration. I am exploring using an external RTC Module with a small coin battery backup. A configurable square wave output is available for use within PSoC3 (may need synchronizing). Having a battery backup makes the RTC always available and oscillator startup delay is eliminated.     



      While it is possible to control components without saving their control parameters, it should be discouraged. The current values of these parameters are the key to quickly returning the system to the state it was in when the reset occurred. EEPROM is not a viable answer because of the write duty cycle and its large but limited lifetime. SRAM can be the answer, but the default action of most resets is to clear SRAM. There is a .cydwr System option to not clear SRAM on startup (and resets), which appears to be the desired choice.



      A method of determining if a POR Reset occurred is required. A POR requires establishing the system; otherwise a recovery is the method of choice. But POR is not reported so other detection is required. Examining a configuration SRAM entry could provide the differentiation. I employ a Global, “RUN”, which is used to switch between the backup and PSoC3: if it is set, then PSoC3 has been in control when the reset occurred. This applies to all hard resets; thus recovery is the same.     
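      A minimal sketch of this kind of detection follows. It assumes SRAM is not cleared on reset and that the flag has no initializer. The post does not say what value RUN holds, so the marker value and the function names here are hypothetical; a multi-bit marker is used because SRAM contents after a true power-up are unpredictable.

```c
#include <stdint.h>

#define RUN_MAGIC 0xA55Au  /* hypothetical marker value */

typedef enum { START_COLD, START_RECOVER } start_mode_t;

/* Deliberately has no initializer, so the startup code does not rewrite
   it. (On a hosted C build it simply starts at zero.) */
uint16_t RUN;

/* Marker present: PSoC3 was in control when the reset hit, so recover.
   Anything else is treated like a POR and the system is established. */
start_mode_t classify_start(void)
{
    return (RUN == RUN_MAGIC) ? START_RECOVER : START_COLD;
}

/* Call once the system is established and PSoC3 takes control. */
void mark_running(void) { RUN = RUN_MAGIC; }

/* Call when control is handed to the backup. */
void mark_stopped(void) { RUN = 0u; }
```

      main() would call classify_start() before touching RUN, then branch to either full establishment or recovery.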



      Most resets invoke INIT.A51, whose purpose is to restore initialized SRAM variables to their initial values. Component control parameters stored in initialized SRAM variables would be overwritten and so could not be used during recovery. Avoid initialization for recovery-sensitive SRAM variables.



      Any dependent SRAM variables would not need to be recalculated because not clearing SRAM would leave them intact. They should not be initialized out of habit from typical coding practices (else INIT.A51 will wipe them out).



      Most resets return components to their DWR settings. There is no option to skip the reconfiguration. Therefore a recovery task is to return the component settings to the values saved in SRAM, which is one reason why not clearing SRAM is desirable. The component reconfiguration for soft resets is a superset of that for hard resets; the last step for a soft reset is to re-establish the components to their last known state.
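      The idea of keeping component settings in SRAM and re-applying them after the DWR reconfiguration can be sketched as below. The register struct, the default values and all function names are hypothetical stand-ins so that the example runs on a PC; on the target the writes would go through the component APIs instead.

```c
#include <stdint.h>

/* Hypothetical stand-in for the settings of two components. */
typedef struct {
    uint8_t pwm_compare;
    uint8_t idac_value;
} component_regs_t;

#define DWR_PWM_COMPARE 128u  /* assumed design-time default */
#define DWR_IDAC_VALUE  0u    /* assumed design-time default */

/* Shadow copy kept in SRAM; no initializer, so it survives a reset when
   SRAM is not cleared. */
component_regs_t shadow;

/* What a reset does: components return to their DWR settings. */
void apply_dwr_defaults(component_regs_t *hw)
{
    hw->pwm_compare = DWR_PWM_COMPARE;
    hw->idac_value  = DWR_IDAC_VALUE;
}

/* Every runtime change records the value in the shadow first, then
   writes the component, so the shadow always holds the intended state. */
void set_pwm_compare(component_regs_t *hw, uint8_t v)
{
    shadow.pwm_compare = v;
    hw->pwm_compare = v;
}

void set_idac_value(component_regs_t *hw, uint8_t v)
{
    shadow.idac_value = v;
    hw->idac_value = v;
}

/* Recovery step: put the components back to their last known state. */
void restore_from_shadow(component_regs_t *hw)
{
    hw->pwm_compare = shadow.pwm_compare;
    hw->idac_value  = shadow.idac_value;
}
```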



      Knowing where in code a reset occurred may assist in the recovery or remedial action. It may require an additional set of steps. If the “where” includes more than one possible action that might cause a reset (e.g., turning motors on or off), then selectively inserting a digital output pin instruction, used as a DSO trigger, allows finding which of the actions is causing the reset. GlobalSignalRef might assist this by providing a signal rather than an instruction, but interpreting it based on time is more of an art.



      The technique alluded to above was initially developed with “where” as the goal, but has evolved to serve a twofold purpose. The second purpose is to statistically determine the heavy MCU users. I refer to the technique as CallName logging/sampling. It uses ENUM ordinal entries of the CallNames (with a suffix to make the compiler happy). Each call of interest is preceded by saving the suffixed CallName of the called routine to a Global, and followed by a post-call restore of the Global to the name of the caller. The Global thus contains the name of the called routine during its execution and is restored at its completion. If the calling routine is not of sufficient interest, the post-call restore could set the Global to a “not of interest” indicator (zero fills the bill). The Global is only available after a reset if SRAM is not cleared and the Global is not initialized. If you decide to retain Clear SRAM, it would require Generated Code modifications to retain the Global (a Case has been entered to request such mods, including Generated Code as a possible participant).
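      A small sketch of CallName logging is shown here. The routine names, the `_cn` suffix and the wrapper macro are hypothetical choices for illustration; the post only states that a suffix is used and that the Global is set before the call and restored after it.

```c
#include <stdint.h>

/* ENUM of CallNames; the _cn suffix keeps the enumerators from clashing
   with the function names. Zero means "not of interest". */
typedef enum {
    NOT_OF_INTEREST_cn = 0,
    main_cn,
    ReadSensors_cn,
    DriveMotors_cn
} call_name_t;

/* One-byte Global; no initializer so it can survive a reset when SRAM is
   not cleared. volatile because DMA or a debugger may read it. */
volatile uint8_t CallName;

/* Set the Global to the callee, make the call, restore it to the caller. */
#define CALL_LOGGED(callee, caller, call_expr) \
    do {                                       \
        CallName = (uint8_t)(callee);          \
        call_expr;                             \
        CallName = (uint8_t)(caller);          \
    } while (0)

/* Demo only: records what the Global held while the routine was running. */
uint8_t seen_inside;

void ReadSensors(void) { seen_inside = CallName; }

void example_loop(void)
{
    CALL_LOGGED(ReadSensors_cn, main_cn, ReadSensors());
}
```

      After a reset, reading CallName before anything overwrites it tells which logged routine was executing.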



      Statistical sampling of CallNames: by using DMA, it is possible to sample the CallName Global and send it to one of two destinations. An IDAC8 can be used with its output routed to an analog output pin; an analog scope monitoring the pin would show, via intensity, where most time is being spent. A 1-byte CallName Global output (limited to 256 values) should be discrete enough to see individually on the scope. If an “off board” communication link is available, then a UART (or other link) destination provides a data stream that a PC program can analyze to produce a histogram and a usage ranking. If the PC program were to also read and parse the CallName ENUM, the outputs could be more user-friendly. The DMA sampling frequency will need some empirical testing to give useful results. If the UART is the method of choice, I recommend “wading” through the original Posts to learn a simplified DMA UART methodology (it will be found on Page 3).
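      On the PC side, the analysis can be as simple as counting how often each CallName value appears in the sampled stream. The sketch below assumes the samples have already been received into a byte buffer; the function names are hypothetical, and skipping value zero assumes zero is used as the “not of interest” indicator.

```c
#include <stddef.h>
#include <stdint.h>

#define CALLNAME_VALUES 256u  /* a 1-byte CallName has 256 possible values */

/* Count how many samples fell on each CallName value. */
void callname_histogram(const uint8_t *samples, size_t n,
                        uint32_t hist[CALLNAME_VALUES])
{
    size_t i;
    for (i = 0; i < CALLNAME_VALUES; ++i) { hist[i] = 0u; }
    for (i = 0; i < n; ++i) { hist[samples[i]]++; }
}

/* Return the CallName value with the most samples, ignoring value 0
   ("not of interest"). Ties go to the lowest value. */
unsigned busiest_callname(const uint32_t hist[CALLNAME_VALUES])
{
    unsigned best = 1u;
    unsigned i;
    for (i = 2u; i < CALLNAME_VALUES; ++i) {
        if (hist[i] > hist[best]) { best = i; }
    }
    return best;
}
```

      Repeating the search after zeroing the winning bin gives the next entry in the usage ranking.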



      AN60630 shows ways to optimize PSoC3, an excellent reference. However, it is time and co$t intensive to apply the optimization to all portions of a project. The old 80/20 rule says to find the 20% of the code which consumes 80% of the capacity; your development Dollar$ spent on optimization will be reduced. CallName sampling can show statistically which of the calls are consuming the most capacity. Optimize the worst and work downward until the co$t vs. reward says stop.





      Understanding resets has been a long road to travel, eased by the welcome help of other Forum contributors; sometimes just asking a question can lead to analyzing an area that might not otherwise be considered.



       Now I need to apply these guidelines retroactively to my project.     








        • 1. Re: Reset Recovery Summary and Optimization

           The LVI_isr interrupt:





           The LVI interrupt is not implemented as a component. The LVI_isr is located in Generated Code, without a prefixed component name. LVI_isr.c has provision for inserting user code. The interrupt can be activated by APIs documented in the System Reference Guide starting on page 84. The APIs are more user friendly than register bit banging.



          No distinction is made as to the reason for the interrupt; it could be LVIA, LVID or HVI. It is a user responsibility to determine why the interrupt occurred, what to do about it, and whether continuing PSoC operation is viable.



          There are at least three conditions that trigger an LVI interrupt: a transient, a reduction of voltage, or a power supply failure. An HVI interrupt is a different animal, and concluding that an HVI occurred may require eliminating the LVI causes first.



          The API side effects apply to interrupts (the SRG is not inclusive), thus a recursive interrupt condition may occur. Preventing recursion requires disabling the LVI_isr until it is determined that a transient condition is the cause, at which point the interrupt may be re-enabled.



          Because my project uses an ADC DelSig with an Input Range of VSSA to VDDA, it is necessary to calibrate the ADC by using an external bandgap VRef (with VDDA set at 5 V, I use 4.096 V). I use VDDA as the basis of developing the VRef so that if VDDA drops, the VRef input pin will ultimately drop along with VDDA. Since I have an external VRef, and it and other ADC inputs are sampled once per second, I may not activate LVIA; detection and remedial action for VDDA transients might be made during the ADC sampling code. The relationship of external VRef to VDDA is inverse; a drop of VDDA will cause the VRef conversion to increase. Once VDDA drops to VRef, no further ADC increase is possible. If VDDA falls below the minimum 1.7 V, the device will go into and remain in reset. Above the minimum and below the specified trigger level, the ADC remains operative; however, with an Input Range dependent on VDDA it will not produce correct conversions.
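          The inverse relationship can be put into numbers. Assuming an ideal single-ended conversion whose full scale tracks VDDA (no offset or gain error), the VRef reading is counts = full_scale × VRef / VDDA, so VDDA can be estimated as VRef × full_scale / counts. The function name and the idea of passing full scale as a parameter are assumptions for this sketch; the 4.096 V reference is the value mentioned above.

```c
#include <stdint.h>

#define VREF_MV 4096u  /* external bandgap reference, in millivolts */

/* Estimate VDDA in millivolts from the conversion of the external VRef.
   full_scale is the count produced by an input equal to VDDA (at most
   65535 here, to stay within 32-bit arithmetic). Returns 0 when no
   estimate is possible: a zero reading, or a reading at full scale,
   which means VDDA has dropped to VRef or below. */
uint32_t estimate_vdda_mv(uint32_t vref_counts, uint32_t full_scale)
{
    if (vref_counts == 0u || vref_counts >= full_scale) {
        return 0u;
    }
    /* Rounded integer division of VREF_MV * full_scale by the reading. */
    return (VREF_MV * full_scale + vref_counts / 2u) / vref_counts;
}
```

          With a 16-bit full scale of 65535, a healthy 5 V VDDA reads about 53686 counts for the 4.096 V reference, while a sag to 4.5 V raises the reading to about 59651 counts. A threshold on the VRef reading in the once-per-second sampling code can therefore flag a VDDA drop.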



          LVID represents another challenge. A VDDD monitoring scheme can be used, à la VDDA monitoring. It requires another analog pin and Mux input. A failure of the VDDA power supply while VDDD is still operational could lead to a VDDD based VRef presenting excessive voltage to an analog input pin. VDDD does not have the ADC calibration requirement, thus a simple resistor divider could be used to present a voltage that does not exceed the electrical specifications of the analog pin when a VDDA failure occurs. The voltage tolerance of a SIO pin is another possible solution for the VDDD VRef input. Since VDDD drives the MCU, if it drops below the PRES threshold, a reset will occur; and if it remains below PRES, the PSoC will not leave reset. My backup transfer is a digital output pin that drives a level converting circuit which in turn drives a relay. When VDDD drops below the PRES threshold the relay will be off.



          A PRESD Event will most likely cause the loss of SRAM data due to low voltage. A POR will not contain any retained SRAM data because it starts from no voltage. This should be self-evident but is stated here as a reminder.



          If a condition exists where continuation is impossible, reverting to the backup is the answer in my project. If a backup is not available, then some method to stop the embedded system should be available.    



          This leads to the question of which resets main() should detect and what alternatives are needed. It must handle POR; PRES essentially requires the same action as POR. With CyResetStatus not reporting any Hard resets, the decision needs some assistance.



          Those resets that can be categorized should lead to recovery efforts. LVI_isr.c is the starting place for those recovery efforts. It may seem strange to recommend another reset, but issuing a Software Reset on exit from LVI_isr.c, after setting some semaphores, may get main() back in the picture, with the semaphores guiding the recovery (Caveat: I have not yet tested such a scenario).
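          The semaphore hand-off could look roughly like the sketch below, which is as untested on hardware as the scenario itself. The reason codes, the complement check byte and all names are assumptions for illustration; the software reset is replaced by a flag so the logic can be exercised on a PC. On the target, the semaphore bytes would have to sit in SRAM that is neither cleared nor initialized.

```c
#include <stdint.h>

/* Hypothetical reasons the ISR might pass to main(). */
typedef enum {
    REQ_NONE = 0,
    REQ_LVIA_TRANSIENT = 1,
    REQ_LVID_TRANSIENT = 2
} recovery_req_t;

/* Semaphore bytes; no initializers. The second byte is the bitwise
   complement of the first, so leftover SRAM contents are unlikely to
   look like a valid request. */
uint8_t sem_reason;
uint8_t sem_check;

/* Host stand-in: set where the target would issue its software reset. */
int reset_requested;

/* Called from the ISR user-code section: leave a note, then reset. */
void request_recovery_and_reset(recovery_req_t reason)
{
    sem_reason = (uint8_t)reason;
    sem_check  = (uint8_t)~sem_reason;
    reset_requested = 1;  /* target: the software reset call goes here */
}

/* Called early in main(): fetch and consume any valid request. */
recovery_req_t take_recovery_request(void)
{
    recovery_req_t r = REQ_NONE;
    if ((uint8_t)~sem_reason == sem_check &&
        sem_reason <= (uint8_t)REQ_LVID_TRANSIENT) {
        r = (recovery_req_t)sem_reason;
    }
    sem_reason = 0u;  /* 0 / 0 is not a valid pair, so this clears it */
    sem_check  = 0u;
    return r;
}
```

          main() would combine the returned request with the reported soft reset status to choose between full establishment and a guided recovery.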



          While not an immediate function of the LVI_isr, subsequent recovery speed has been empirically found to be delayed by the use of a 32K watch crystal oscillator. This is with the default 32K Clock “Start on Reset” specified, and also programmatically starting the oscillator. The …_Start only sets bits in two registers; thus the delay is unexpected and needs further review. I use the 32K Clock in clocking multiple components and need to ensure it is running on startup and following a reset. It may be desirable to avoid restarting the oscillator on a soft reset to avoid unnecessary delay; or include an external RTC I2C Module that also provides a 32K square wave output (this speeds up all startup/resets).