94 Replies Latest reply on Dec 31, 2014 6:08 PM by user_14586677

    Reset Recovery Considerations

    bruce.horvath

       After experiencing Hard Resets due to EMI from an adjacent unit, I came to the realization that EMI Resets were going to be difficult to eliminate. The CY8CKIT-030 was used as a development test bench; I hardly expect to do a better job of PCB design than the Cypress engineers. And I will not be able to control the environment in which the embedded PSoC3 is installed.

         

      This thread focuses on trying to figure out how to "peacefully" co-exist with Resets, both Soft and Hard varieties.

         

      Since there are two varieties of Resets, the first item is to determine the type currently occurring.

         

      Resets of the Soft variety are easier to work with; they leave SRAM as is, and the Components and Registers are not reset to their DWR settings. User code resumes at main(). The main Soft Reset is the WDT Reset. Because the Registers are not reset, RESET_SR0 identifies the reset source. What is lacking is a means to determine where in your code the WDT tripped, and a count of how many trips have occurred. The Soft Reset clears the CPU registers, so the Stack no longer has meaning, but "crawling" the Stack (if idata was left uncleared) could possibly show where the WDT tripped. Counting WDT events could be a user responsibility; SRAM can serve as the count repository (but an intervening Hard Reset clears SRAM and the WDT count would be lost). The normal main() work of initializing Components should probably be skipped after a Soft Reset. If the problem section of code can be determined, then selective initialization may cure the WDT problem.
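      A minimal sketch of the user-maintained WDT count, written as plain C so it can be tried off-target. All names are hypothetical and nothing here is a Cypress API; the two flags are assumed to be supplied by the caller (for example from the reset status and from some check of whether SRAM survived).

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical bookkeeping kept in SRAM that is assumed to survive a Soft Reset. */
typedef struct {
    uint8_t wdt_count;   /* number of WDT resets seen since SRAM was last cleared */
} reset_stats_t;

/* Call once, early in main().
   sram_was_cleared: true if a Hard Reset wiped SRAM (the old count is meaningless).
   wdt_bit_set:      true if the reset status reports a WDT reset. */
void note_reset(reset_stats_t *stats, bool sram_was_cleared, bool wdt_bit_set)
{
    if (sram_was_cleared) {
        stats->wdt_count = 0;            /* start counting again after a Hard Reset */
    }
    if (wdt_bit_set && stats->wdt_count < UINT8_MAX) {
        stats->wdt_count++;              /* saturate instead of wrapping to zero */
    }
}
```

      The count saturates at 255 so that a long run of WDT trips cannot wrap around and look like a healthy system.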

         

      Hard Resets pose a significant challenge. The action to be taken is highly dependent on the functions that the PSoC3 is designed to perform.

         

      Since SRAM clearing is an option on a startup reset, the tradeoff between a fast startup and using a Global Variable as an SRAM-clearing semaphore should be evaluated. Yes, CyResetStatus could be used to see what type of Hard Reset occurred (after extensive research into CyResetStatus, I conclude it is not currently working for PSoC3 with Creator 3.0 SP2). But if the reset is of the PRESx variety, then even a working CyResetStatus does not set a bit (I experienced EMI bad enough to trigger PRESx with LVID and LVIA not activated).

         

      The Hard reset SRAM semaphore works by declaring a Global Variable that is not initialized. The last thing main() does is set the variable to a one. On entry to main(), examination of the variable shows if SRAM was cleared and thus a Hard Reset has occurred. A POR or XRES will show a zero semaphore. If the semaphore still contains the one, a Soft reset occurred (assumes that SRAM Clear remains the option).
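      A minimal sketch of the semaphore test, assuming the variable sits in SRAM that is cleared by a Hard Reset and left alone by a Soft Reset. The names are hypothetical; the semaphore is passed by pointer only so the logic can be exercised off-target, whereas on the PSoC3 it would be the uninitialized Global Variable itself.

```c
#include <stdint.h>

typedef enum { RESET_HARD, RESET_SOFT } reset_kind_t;   /* hypothetical names */

#define SEMAPHORE_SET 1u   /* the "one" written once main() has finished starting up */

/* Examine the semaphore on entry to main(): anything other than the
   expected value is treated as "SRAM was cleared", i.e. a Hard Reset. */
reset_kind_t classify_reset(const volatile uint8_t *semaphore)
{
    return (*semaphore == SEMAPHORE_SET) ? RESET_SOFT : RESET_HARD;
}

/* Set the semaphore so that the next reset can be classified. */
void arm_semaphore(volatile uint8_t *semaphore)
{
    *semaphore = SEMAPHORE_SET;
}
```

      If SRAM were ever left uncleared after power-up, a single byte could hold the value one by chance; a multi-byte pattern would make a false "Soft" verdict less likely.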

         

      How to handle a Hard reset is the main consideration. I will explore what I am considering in followup posts. I am not sure with my particular project how to handle Hard resets. This thread is to solicit comments and suggestions which are welcomed.

         

      Thanks for your interest,

         

      Bruce

        • 1. Re: Reset Recovery Considerations
          bruce.horvath

           My project has some Time of Day dependencies, currently supplied via the CY8CKIT-030 32k crystal clock and a few Counters. Time is set from the PC program, which obviously only happens when the PC is running and detects that an incorrect time is being reported.

             

          A Hard reset clears the RTC Counters thus creating a problem.

             

          RTC modules with an I2C interface are available at a modest cost. RTC modules with crystals and backup batteries on a PCB are available for about $10 more for development purposes. The modules also produce a 1Hz square wave that can replace my need for a 1 second semaphore.  The PC update is a little bit different but not a significant problem to change (I2C code vs. Counter update).

             

          The RTC requirement requires a redesign but is solvable.

             

          The next item I need to look at is my use of two PWM outputs. A new post after I research this area.

             

          Bruce

          • 2. Re: Reset Recovery Considerations
            bruce.horvath

             An RTC followup.

               

            The KIT 32k oscillator requires some time to get up and running. This is delaying the restart from a Hard reset. A POR is not a problem, but an undesirable Hard reset delay makes things worse.

               

            A separate RTC module would only be started once or twice in its lifetime and thus not delay a restart. A small coin backup battery is mandatory to ensure that the RTC module does not get wiped clean on a power failure (another incremental cost).

               

            The KIT 32k oscillator is best left out of the equation. Any PSoC3 clocks dependent on the 32k clock need rethinking.

               

            On to the PWM outputs.

               

            Bruce

            • 3. Re: Reset Recovery Considerations
              bruce.horvath

               A retrogression,

                 

              "Crawling" the Stack does not seem to be viable on further examination with my rudimentary Stack knowledge.

                 

              The Stack contains addresses. Don't know how an address can be interpreted on the fly in main() to control subsequent recovery actions. Offline is another story but:

                 

              The Stack is a LIFO stack. Since we get going again in main(), main() is on the top of the stack. If remnants of the stack remain, the current main() entry has wiped out the record of the code that was running at the time of the WDT fault.

                 

              Anybody like to comment on this topic?

                 

              Still researching PWM when I regressed!

                 

              Bruce

              • 4. Re: Reset Recovery Considerations
                user_1377889

                Maybe I am wrong, but I would never try to analyze at which point of my program an externally caused event generated a reset. What new knowledge are you expecting to get? I would tend to link an EMI event to a reset event to prove a dependency, but this should be done totally independent of the program running.

                   

                It could turn out to be difficult to catch an EMI pulse, but there are several possibilities, like a battery-powered surveillance device, a triggering scope or LA, and probably some more.

                   

                 

                   

                Bob

                • 5. Re: Reset Recovery Considerations
                  bruce.horvath

                   Now on to the PWM outputs.

                     

                  One is used to control a proportional valve that does not have a good correlation between its analog input and the resulting output flow (pressure is the significant cause of variation). The PWM feeds an integrating circuit that converts it to analog and also level shifts the output. Because of this imprecision, a flow turbine is installed to measure the effect. There is also a time delay in the valve action; the time delay helps reduce any reset restart delay problems. The flow turbine can be read immediately following a Hard reset and "reverse computed" to determine a good restarting point for the PWM. For this to work, the reset cause must be known, so that a POR can be told apart from an EMI reset.
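                  One way the "reverse computation" could look, purely as a sketch: interpolate in a calibration table that maps measured flow back to a PWM duty value. The table, its units, and every name here are assumptions for illustration; given the pressure dependence mentioned above, the result would only be a ballpark restart point.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical calibration point: turbine flow reading and the PWM duty
   value that produced it under typical pressure. */
typedef struct {
    uint16_t flow;
    uint8_t  duty;
} cal_point_t;

/* Estimate a restart duty from a flow reading by linear interpolation.
   The table must be sorted by strictly ascending flow. Readings outside
   the table are clamped to the first or last entry. */
uint8_t duty_for_flow(const cal_point_t *table, size_t n, uint16_t flow)
{
    size_t i;

    if (n == 0) {
        return 0;
    }
    if (flow <= table[0].flow) {
        return table[0].duty;
    }
    for (i = 1; i < n; i++) {
        if (flow <= table[i].flow) {
            int32_t span = (int32_t)table[i].flow - (int32_t)table[i - 1].flow;
            int32_t off  = (int32_t)flow - (int32_t)table[i - 1].flow;
            int32_t dd   = (int32_t)table[i].duty - (int32_t)table[i - 1].duty;
            return (uint8_t)((int32_t)table[i - 1].duty + (dd * off) / span);
        }
    }
    return table[n - 1].duty;
}
```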

                     

                  The second PWM is a "blind" output in that the PWM is in direct control of the device it feeds; no feedback. There is an interaction between the first PWM flow value and this PWM's interrupted value, sufficient I think to get back somewhere in the ballpark for a start. A delay in restarting this PWM will cause an interruption in the operation of the device; inertia of the device is somewhat of a saving factor (the device is an ECM and thus does not use a capacitor start motor, so resuming it at some indeterminate point is not a problem). Also, a significant PWM step function can introduce a noticeable effect. This area will require some empirical testing to smooth out the kinks.

                     

                  I think as a result of lesser EMI noise, the ADC DelSig and other sensors exhibit some jitter. I have coded some Exponential Smoothing routines to reduce the jitter. My code starts with an Exponential Smoothing Period of 1, incrementing by 1 each sample until the desired Period is reached. Each variable has its own desired period. I think this smoothing is "friendly" to the reset startup.
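                  A sketch of smoothing with a ramped period, as one reading of the description above. It assumes the smoothing factor is 1/period, so the first sample after a reset is taken as-is and later samples are weighted less and less until the desired period is reached. Names are hypothetical, and float is used for clarity; fixed-point arithmetic may suit the 8051 core better.

```c
#include <stdint.h>

typedef struct {
    float    value;          /* current smoothed value */
    uint16_t period;         /* period applied to the latest sample; 0 = no sample yet */
    uint16_t target_period;  /* desired period for this variable */
} smoother_t;

void smoother_init(smoother_t *s, uint16_t target_period)
{
    s->value = 0.0f;
    s->period = 0;
    s->target_period = (target_period == 0) ? 1 : target_period;
}

/* Feed one sample and return the smoothed value.
   The period grows by 1 per sample until it reaches target_period. */
float smoother_update(smoother_t *s, float sample)
{
    if (s->period < s->target_period) {
        s->period++;
    }
    /* With period == 1 this reduces to value = sample. */
    s->value += (sample - s->value) / (float)s->period;
    return s->value;
}
```

                  While the period is still ramping, the output equals the plain average of the samples seen so far, which is why the filter produces usable values right after a restart instead of slowly climbing from zero.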

                     

                  Next are the Digital outputs. More research is needed before I know how the resets affect these.

                     

                  Bruce

                  • 6. Re: Reset Recovery Considerations
                    bruce.horvath

                     Bob,

                       

                    I used a triggering DSO to determine that my resets were EMI caused.

                       

                    I don't think it is possible with my project to eliminate EMI resets.

                       

                    The purpose of this thread is to explore how to recover from a reset; be it a WDT or EMI caused.

                       

                    I currently use synchronized APIs, the untimeliness of any one of which may lead to a WDT event. Not all of my Data Acquisition sensors are used to control the process. It would be nice to know if a WDT was caused by a reporting-only sensor so I could cut it out of the acquisition. I intended to address the synchronized/unsynchronized API calls further along; your point is well taken and I am addressing it now to indicate it is an area of concern, since the WDT trip is currently devoid of clues. By using timed loops with API completion tests inside them, along with WDT "tickling", an "infinite loop" can be avoided while knowledge of where the excessive delay is occurring is at hand. A WDT trip outside of the sync API calls should hopefully be eliminated during the testing and debug phases.
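                    A sketch of such a timed loop, assuming a simple poll budget rather than a hardware timer. All names are hypothetical; the completion test and the watchdog "tickle" are left to the caller and appear only in the usage comment.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical poll budget for one wait on an API completion test. */
typedef struct {
    uint8_t  stage;       /* ID of the wait in progress, e.g. one per sensor */
    uint16_t polls_left;  /* poll attempts remaining before giving up */
} wait_budget_t;

void wait_begin(wait_budget_t *w, uint8_t stage, uint16_t max_polls)
{
    w->stage = stage;
    w->polls_left = max_polls;
}

/* Call once per poll. Returns true when the budget is used up,
   so the caller can abandon the wait instead of letting the WDT trip. */
bool wait_expired(wait_budget_t *w)
{
    if (w->polls_left == 0) {
        return true;
    }
    w->polls_left--;
    return false;
}

/* Usage on the target (sensor_done() and tickle_wdt() are placeholders):
 *
 *     wait_begin(&w, STAGE_FLOW_SENSOR, 1000);
 *     while (!sensor_done()) {
 *         if (wait_expired(&w)) { report_timeout(w.stage); break; }
 *         tickle_wdt();
 *     }
 */
```

                    Because the stage ID is recorded before the loop starts, a timeout can be attributed to a specific call, and a reporting-only sensor that keeps timing out can be dropped from the acquisition.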

                       

                    Thanks,

                       

                    Bruce

                    • 7. Re: Reset Recovery Considerations
                      bruce.horvath

                       Bob,

                         

                      On rereading your post I see what your EMI reset concern is.

                         

                      I don't care where the EMI reset occurs in my code. It is an entirely random event. I do care about how to recover from an EMI reset.

                         

                      I do care where a WDT event occurs in my code. It is probably in the currently executing call. Knowing where is instrumental in attempting to circumvent the problem and gives an indication if there is a specific hardware problem that needs addressing.

                         

                      My reference to a "sync" API should read "async" API.

                         

                      Thanks,

                         

                      Bruce

                      • 8. Re: Reset Recovery Considerations
                        ki.leung

                        A RESET in your case is not a good thing.

                           
                            For some issues that seem unavoidable, we normally start by asking these questions:

                            What is the purpose of knowing why it RESET?

                            What do you want to avoid?

                            What will you do with this information? What will you do without it?

                            What is the implication if you do nothing?
                           

                        and the manager would ask

                           

                        What is the cost of fixing or not fixing it?

                        • 9. Re: Reset Recovery Considerations
                          bruce.horvath

                           HL,

                             

                          Good questions.

                             

                          Why -- A Soft recovery is a minimal disruption. A Hard recovery is a more prolonged event.

                             

                          Avoid -- A prolonged disruption and possible loss of control. Damage to the fairly expensive equipment being controlled (some safety "valves" are also in control but only actuate under extreme conditions; no human lives are at stake).

                             

                          Do nothing -- A probable "No Sale" condition. Or a "bad press" event.

                             

                          Cost -- could result in a Go/Nogo decision on project. Costs of a workaround are a one time development cost.

                             

                          Desirable (My project goal)-- A minimally disruptive condition as a result of a reset.

                             

                          Thanks for making me think about your questions,

                             

                          Bruce

                          • 10. Re: Reset Recovery Considerations
                            ki.leung

                            OK, new questions

                               
                                 1.       If there is an interruption, and not knowing when or how it was interrupted, is there a safe or acceptable sequence to restart the control?   
                               
                                 2.       If answer to 1 is yes, what is the sequence?   
                               
                                 3.       If there is an interruption, and there is a way of knowing when and/or how it was interrupted, and the system can re-establish the state of control at the time of interruption and continue (after some delay due to system restart), is this acceptable?
                               
                                 4.       If 3 is acceptable, can the system re-establish the state of control? What information is needed?   
                               
                                It seems that the hardware approach (ferrite core, shielding...) is rejected already. Any reason why?   
                            • 11. Re: Reset Recovery Considerations
                              user_1377889

                              Building fail-safe systems - and that is what I see you want to do - is a scientific job, and it is - as you can see throughout the world - barely manageable. From a high-energy cosmic particle that flips a bit in the flash, through EMI events (your focus), to unresponsive external hardware such as sensors and actuators, there is a wide field of events to cover with adequate provisions.

                                 

                              So handling the requirements the old-fashioned way, "divide et impera", can lead us to success:

                                 

                              First, second and third points are: Shielding, shielding, shielding.

                                 

                              Shielding of the supply with ferrites, zener diodes and similar methods must be considered.

                                 

                              Shielding of the inputs and outputs with similar provisions will prevent failures from leaking in through this path.

                                 

                              Shielding of the complete device against magnetic or electric fields or radiation, static electricity

                                 

                              The next point is: Protection

                                 

                              As long as the device controls any external machinery (which is quite usually the case) there must be a control state at the inputs of the machinery that reflects a "safe" state. This state must be enforced on the controlling lines externally. So in case the device does not run, the controlled machine is in a safe state.

                                 

                              An indispensable point is: Self-Test

                                 

                              As you are probably aware, your PC does a self-test every time you switch it on, so you have to implement something similar: checking the CPU and its environment, checking the sensors and checking the actuators (which might call for some design considerations to "see" results).

                                 

                              Next point is Communication.

                                 

                              A safe device must be able to report its state to give the whole machinery a chance to act on changes accordingly.

                                 

                              Every PC power supply has a signal named "Power Good". This is the least indication a "dumb" device could deliver, and it can signal to the connected machine whether to trust the controlling signal or not. More complex is a "Heartbeat" signal which dynamically signals that the controller is still alive.

                                 

                              Last, but not least, point is: Quality assurance

                                 

                              Last PC comparison: When your PC was shut down improperly (plug pulled), the next time you start it you get informed about that issue. How? Just by recording the fact that the PC was powered up and by recording a power down, which will not happen in the case of the pulled plug.

                                 

                              A PSoC has an EEPROM which you could use to record a legal shut-down and the time (you are using an RTC) of the last power-up; when that is less than a second ago, there must be something darn wrong.
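                              A sketch of that bookkeeping, assuming a small record kept in EEPROM and a time in whole seconds from the RTC. The record is passed by pointer as a stand-in for the EEPROM read and write, which are not shown; all names and the one-second threshold constant are hypothetical.

```c
#include <stdint.h>

#define MIN_UPTIME_S 1u   /* assumed threshold: a restart sooner than this is suspicious */

/* Hypothetical record kept in EEPROM between runs. */
typedef struct {
    uint8_t  clean_shutdown;  /* 1 = the previous run ended with an orderly shut-down */
    uint32_t last_boot_time;  /* RTC time, in seconds, of the previous power-up */
} boot_record_t;

typedef enum {
    BOOT_NORMAL,             /* previous run was shut down properly */
    BOOT_AFTER_UNCLEAN_STOP, /* previous run ended without a recorded shut-down */
    BOOT_RESET_STORM         /* unclean stop AND restarted within MIN_UPTIME_S */
} boot_verdict_t;

/* Call at start-up with the stored record and the current RTC time.
   Updates the record for the run that is now beginning. */
boot_verdict_t boot_check(boot_record_t *rec, uint32_t now)
{
    boot_verdict_t verdict = BOOT_NORMAL;

    if (!rec->clean_shutdown) {
        verdict = ((now - rec->last_boot_time) < MIN_UPTIME_S)
                      ? BOOT_RESET_STORM
                      : BOOT_AFTER_UNCLEAN_STOP;
    }
    rec->clean_shutdown = 0;      /* stays 0 unless this run shuts down properly */
    rec->last_boot_time = now;
    return verdict;
}

/* Call as the last step of an orderly shut-down. */
void mark_clean_shutdown(boot_record_t *rec)
{
    rec->clean_shutdown = 1;
}
```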

                                 

                               

                                 

                              As you can see, there are some inevitable requirements you should focus on, because that will help you to build designs that you and your customer can trust.

                                 

                               

                                 

                              Bob

                              • 12. Re: Reset Recovery Considerations
                                user_14586677

                                "The second PWM is a "blind" output in that the PWM is in direct control of the device it feeds; no feedback. There is an interaction between the first PWM flow value and this PWM's interrupted value, sufficient I think to get back somewhere in the ballpark for a start. A delay in restarting this PWM will cause an interruption in the operation of the device; inertia of the device is somewhat of a saving factor (the device is an ECM and thus does not use a capacitor start motor, so resuming it at some indeterminate point is not a problem). Also, a significant PWM step function can introduce a noticeable effect. This area will require some empirical testing to smooth out the kinks."

                                   

                                 

                                   

                                Clearly you have a feedback loop overall in system response. If you are converting PWM output to DC, there is a substantial delay vs. ripple tradeoff in the output filter. Attached is an analysis of filter approaches for simple filters. One useful way of getting at this is to use MATLAB or LabVIEW and simulate it. If you do not have these, there are a number of free SPICE simulators available, like LTspice or TINA-TI.
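                                To put rough numbers on the delay vs. ripple tradeoff, here is a small design-time sketch meant to be run on a PC rather than on the PSoC. It assumes a single-pole RC filter driven by an ideal PWM whose period is much shorter than RC; it is not taken from the attached analysis, and the function names are hypothetical.

```c
/* Approximate peak-to-peak ripple of a single-pole RC filter on a PWM output.
   Valid when R*C is much larger than the PWM period. Worst case is duty = 0.5.
   v_high: PWM high level in volts (low level assumed to be 0 V)
   duty:   duty cycle, 0.0 to 1.0 */
double rc_ripple_pp(double v_high, double duty, double f_pwm_hz,
                    double r_ohm, double c_farad)
{
    return v_high * duty * (1.0 - duty) / (f_pwm_hz * r_ohm * c_farad);
}

/* Time for the filtered output to settle within 1% of a step in duty cycle:
   ln(100) * R * C for a single-pole filter. */
double rc_settle_1pct(double r_ohm, double c_farad)
{
    return 4.605170186 * r_ohm * c_farad;
}
```

                                For example, 5 V PWM at 10 kHz into 10 kohm and 1 uF gives about 12.5 mV of ripple at 50% duty and roughly 46 ms to settle within 1%. A tenfold larger RC cuts the ripple tenfold and lengthens the settling time tenfold, which is the tradeoff in a nutshell.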

                                   

                                 

                                   

                                Regards, Dana.

                                • 13. Re: Reset Recovery Considerations
                                  user_14586677

                                  "Last PC comparison: When your PC was shut down improperly (plug pulled), the next time you start it you get informed about that issue. How? Just by recording the fact that the PC was powered up and by recording a power down, which will not happen in the case of the pulled plug.

                                     

                                  A PSoC has an EEPROM which you could use to record a legal shut-down and the time (you are using an RTC) of the last power-up; when that is less than a second ago, there must be something darn wrong."

                                     

                                   

                                     

                                  This approach is great for situations where you are getting detectable shutdowns, but your situation does not permit this.

                                     

                                   

                                     

                                  Some app notes on EMI/layout that may be of help are attached.

                                     

                                   

                                     

                                  Regards, Dana.

                                  • 14. Re: Reset Recovery Considerations
                                    bruce.horvath

                                     Dana, Bob, et al.,

                                       

                                    Good stuff! Plenty to keep me busy.

                                       

                                    Let me reply to the points raised.

                                       

                                    Prevention -- Ferrite cores were tried and are still in place (15, count them) with not much improvement. Same for ceramic bypass filters. Shielding is possible, but adds a quantum increase in cost compared to learning how to co-exist. The frequency of intermittent EMI events is hard to gauge, but I think some progress has been made in reducing them.

                                       

                                    Protection -- Yes, there is a fallback to an existing rudimentary control system (the one used in the "plain vanilla" system, which would still exist in a PSoC3-supplemented system); the PSoC3 is used to supplement it in order to improve efficiency and capacity. My project uses relays driven by digital outputs to control which system is RUNning the show. The first thing a reset does is drop power to the RUN relay, which causes a noticeable audible change in the unit's operation as control reverts. A recovery causes the reverse change (a PWM monitor component is employed, whose input is also switched; this allows the reverse to be a gradual change since the plain vanilla setting is available). I think I need to take care that the relay switching does not cause EMI (I currently have coil suppression and am inclined to add contact suppression). I don't want rapid EMI events from the relays.

                                       

                                    Communication -- A Bluetooth radio link is included. It is only effective when a PC program is listening. Data transmitted includes sensor readouts, system state, and unusual conditions (sticky bits). Reset info is currently included but is not very useful due to the current state of affairs with CyResetStatus; it is possibly useful statistically but not for remediation. Bluetooth complicates the shielding.

                                       

                                    2nd PWM -- The more I think about this area, the more "other" feedback items I realize exist. Temperature differentials can help determine the previous settings, but are only ballpark precision indicators. There is a delay between making a control adjustment and seeing a temperature change due to the mass of the unit's components (the M in E=MC**2).

                                       

                                    1st PWM -- Integration circuits were designed with 5Spice, verified with analog scope, and are working nicely. Don't have any evidence of it being a source of interference.

                                       

                                    A WDT followup -- With the use of async API calls, and testing with faster WDT settings to detect "near WDT conditions", a WDT reset would indicate that a significant problem has occurred. I am currently not sure a WDT recovery is a wise move; I am inclined to focus only on Hard Reset recovery efforts.

                                       

                                    Thanks again,

                                       

                                    Bruce
