wiced_wifi_scan can fail to run/return WICED_SUCCESS in the following instances (gleaned from examining the source in 'wifi.c'):
- If another scan is in progress and it fails to finish within 3 seconds of the second scan's launch.
- If the scan cannot malloc enough memory for the required dynamic elements: a 100-entry BSSID table and a callback structure (< 50 bytes).
- If Wiced fails to add an event listener for the scan result handler (perhaps because the listener limit has been reached).
- If Wiced cannot access the data bus
- If the data bus times out.
- If the wifi chipset indicates that it failed to initiate the scan.
That's what I can find digging through stuff. I will assume that the same path applies for wiced_wifi_get_rssi. I am not surprised that you can still transmit and receive during this time.
I can see the timing having an impact on our calls to get the RSSI, since we make that call once per second.
But for the scan, the failure seems to occur after a long streak of failed "wiced_wifi_join_specific" calls. Usually we scan for networks, see the network, and join without a problem. But sometimes (on only a few of our devices), when we disconnect from a network due to being out of range and then come back into range, the device will scan, then fail to join the network. After 10(?) minutes of failed joins, the scan begins to fail as well.
Memory leaks are possible, but the memory is controlled by the wiced interface, which means the leak would have to be somewhere in the wwd or IOVAR libraries. There is 70KB of allocatable memory, so a large BSSID table shouldn't be a problem.
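As a rough sanity check on that estimate, here is the arithmetic as a small C sketch. It assumes each BSSID table entry is a bare 6-byte MAC address (I have not confirmed the entry type in wifi.c) and takes the "< 50 bytes" callback structure as an upper bound. All names here are made up for illustration and are not Wiced identifiers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one table entry is a plain 6-byte MAC address. */
typedef struct { uint8_t octet[6]; } bssid_t;

enum {
    BSSID_TABLE_ENTRIES = 100,       /* table size quoted earlier in the thread */
    CALLBACK_STRUCT_MAX = 50,        /* "< 50 bytes", treated as an upper bound */
    HEAP_BYTES          = 70 * 1024  /* "70KB of allocatable memory" */
};

/* Upper bound on the bytes one scan needs to allocate. */
static size_t scan_alloc_upper_bound(void)
{
    return BSSID_TABLE_ENTRIES * sizeof(bssid_t) + CALLBACK_STRUCT_MAX;
}

/* Nonzero if the scan allocations would fit in free_bytes of heap. */
static int scan_alloc_fits(size_t free_bytes)
{
    return scan_alloc_upper_bound() <= free_bytes;
}
```

Under these assumptions a scan needs at most about 650 bytes, which is under 1% of the heap, so the malloc should only fail if the heap is nearly exhausted or badly fragmented.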
The data bus is also controlled by wiced, and our data transfers are still going through (when the RSSI failures occur). Therefore the data bus is still alive. Wiced can't use the bus for commands but can still use it for socket sends? Seems unlikely.
So, I just found an issue with scanning that may explain your problem: When I wish to connect to a less than fully specified SSID (i.e., SSID and password only), my application initiates a scan and, upon finding the correct SSID, uses the resulting ap definition to initiate a connection. The problem is that connecting in the middle of a scan appears to prevent the high level scan system from getting a completion message. This in turn leaves the semaphore that "protects" the scan system unset. However, due to a (in my opinion, rather large) bug in the scan call logic, if the call fails to get the semaphore, it sets the semaphore in the abort sequence. So, if the return value of scan indicates WICED_TIMEOUT and you initiate a second scan immediately after the first, the second one will likely succeed because the semaphore is now set.
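To make the sequence concrete, here is a toy single-threaded model of the behaviour described above. It is not the Wiced source: the semaphore type, the function names and the result codes are all hypothetical, and the real code waits with a timeout rather than failing immediately.

```c
#include <stdbool.h>

typedef enum { TOY_SCAN_OK, TOY_SCAN_TIMEOUT } toy_scan_result_t;

/* Toy binary semaphore: 1 = scan system free, 0 = scan in progress. */
typedef struct { int count; } toy_sem_t;

static bool toy_sem_try_get(toy_sem_t *s)
{
    if (s->count > 0) {
        s->count = 0;
        return true;
    }
    return false;
}

static void toy_sem_set(toy_sem_t *s)
{
    s->count = 1;
}

/* Models the suspected bug: the abort path sets a semaphore it never got. */
static toy_scan_result_t toy_scan_start(toy_sem_t *s)
{
    if (!toy_sem_try_get(s)) {
        toy_sem_set(s);          /* abort sequence "releases" the semaphore */
        return TOY_SCAN_TIMEOUT;
    }
    return TOY_SCAN_OK;
}

/* Normal completion path; never runs if a join interrupts the scan. */
static void toy_scan_complete(toy_sem_t *s)
{
    toy_sem_set(s);
}
```

Starting from a free semaphore, the first toy_scan_start succeeds. If a join then interrupts the scan so that toy_scan_complete is never called, the next toy_scan_start reports a timeout but leaves the semaphore set, and the call after that succeeds again.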
For us, we usually scan, and wait for the scan handler to return that it is done scanning completely. After which we proceed to search the list for the best connection and then initiate a join. I'm still searching for anywhere that we could be calling two API calls at the same time. So far not much luck, and our issue seems to be isolated to a few specific units (but that might just be the paranoia).
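For reference, the "search the list for the best connection" step looks roughly like the sketch below. It assumes "best" means the strongest RSSI among entries with a matching SSID, and it is meant to run only after the scan handler has reported completion. The record type and function name are simplified stand-ins, not the Wiced scan result types.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for one scan result (hypothetical type). */
typedef struct {
    char ssid[33];   /* up to 32 characters plus terminating NUL */
    int  rssi_dbm;   /* signal strength; closer to 0 is stronger */
} ap_record_t;

/* Returns the matching entry with the highest RSSI, or NULL if none match. */
static const ap_record_t *pick_best_ap(const ap_record_t *list,
                                       size_t count,
                                       const char *ssid)
{
    const ap_record_t *best = NULL;

    for (size_t i = 0; i < count; i++) {
        if (strcmp(list[i].ssid, ssid) != 0) {
            continue;            /* different network, skip it */
        }
        if (best == NULL || list[i].rssi_dbm > best->rssi_dbm) {
            best = &list[i];     /* strongest match seen so far */
        }
    }
    return best;
}
```

The join is only issued with the entry this returns, so no join should overlap a scan that is still running.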
Thanks for the information Dan. We will look into scan logic.