Since upgrading to 3.7.0-3, Wi-Fi scans fail most of the time.
We check if ( malloced_scan_result->status == WICED_SCAN_INCOMPLETE ), with an else branch for completion, just as the snip.scan app does, and the completion message never arrives.
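For readers following along, here is a minimal, self-contained sketch of the callback pattern being described: each AP arrives as a heap-allocated result marked "incomplete", and one final callback with a different status marks the end of the scan. Only WICED_SCAN_INCOMPLETE and malloced_scan_result come from the post; the demo_ types, fields, and functions are hypothetical stand-ins so the example compiles without the SDK, and they are not the real WICED definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the SDK types; NOT the real WICED definitions. */
typedef enum {
    DEMO_SCAN_INCOMPLETE,   /* one AP result; more may follow     */
    DEMO_SCAN_COMPLETED     /* final event: the scan has finished */
} demo_scan_status_t;

typedef struct {
    int                signal_dbm;  /* RSSI of the reported AP */
    demo_scan_status_t status;
} demo_scan_result_t;

typedef struct {
    int ap_count;       /* number of AP results seen so far      */
    int scan_complete;  /* set once the completion event arrives */
} demo_scan_state_t;

/* Handler in the snip.scan style: it owns and frees each malloc'ed result. */
void demo_scan_result_handler(demo_scan_state_t* state, demo_scan_result_t* malloced_scan_result)
{
    if (malloced_scan_result == NULL) {
        return;
    }
    if (malloced_scan_result->status == DEMO_SCAN_INCOMPLETE) {
        state->ap_count++;          /* an individual AP result */
    } else {
        state->scan_complete = 1;   /* the event the poster never receives */
    }
    free(malloced_scan_result);
}

/* Example helper: allocate one event the way a producer would. */
demo_scan_result_t* demo_make_result(demo_scan_status_t status, int signal_dbm)
{
    demo_scan_result_t* r = malloc(sizeof *r);
    if (r != NULL) {
        r->status = status;
        r->signal_dbm = signal_dbm;
    }
    return r;
}
```

If the final event is never delivered, scan_complete stays 0 indefinitely, which is the symptom reported here.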
Most of the time I can see the success message reach WWD, but it never reaches the scan result callback:
Scan result: channel=0 signal=-66 ssid=1871MEMBER2 bssid=58:97:1e:56:22:24
3241: Event (interface, type, status, reason): WWD_STA_INTERFACE WLC_E_ESCAN_RESULT WLC_E_STATUS_PARTIAL WLC_E_REASON_INITIAL_ASSOC
Scan result: channel=0 signal=-69 ssid=ATT208 bssid=30:60:23:65:22:90
3241: Event (interface, type, status, reason): WWD_STA_INTERFACE WLC_E_ESCAN_RESULT WLC_E_STATUS_SUCCESS WLC_E_REASON_INITIAL_ASSOC
Hi,
I tried the Wi-Fi scan with WICED 3.7.0-3 on a BCM4343W Avnet kit and did not see any issue. All the APs are scanned, the same as with previous versions; on WICED 3.7.0 I see the same APs.
Can you tell us which platform you are using?
We're using a custom platform, most similar to the WWCD2 devkit.
dstudejio wrote:
Since using 3.7.0.3, wifi scans appear to not work a majority of the time anymore.
Do you mean it was working with an older SDK version?
If so, please share debug logs from both the old and the latest SDK for comparison.
It was working 100% of the time with 3.7.0.
The debug logs are the same in both cases: each reports a series of WLC_E_ESCAN_RESULT events that ends with WLC_E_STATUS_SUCCESS.
The difference is that in the failing case the callback is never told that the scan is complete.
The current 3.7.x series SDK is not usable for us because of the BESL memory leak and other regressions.
So I'm not going to test it right now.
I will verify this carefully once I get an SDK update.
mwf_mmfae, any thoughts? I notice the issue seems to occur more often when a large number of APs are present.
Please describe your custom platform that is most similar to the WWCD2 devkit.
Within this forum, we support our developer kits and, to some extent, those of our partners.
Are you developing with a partner module?
I wasn't aware that custom designs were not supported here. Can you point me to the right channel for support?
I realized that the networking worker thread has a very small (16-entry) queue. The scan returned well over 16 APs, which overflowed this queue, so the "success" message was rejected. I've increased the size of the queue and scanning now works 100% of the time.
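A toy model of the overflow described above, assuming the worker queue is a fixed-depth FIFO with a non-blocking enqueue that rejects events when full, and that the consumer makes no progress while the scan is being posted. The depth of 16 is the figure reported in this thread; the demo_ names and the non-blocking behavior are illustrative assumptions, not the SDK's code.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_QUEUE_CAPACITY 16   /* queue depth reported in the thread */

typedef enum { DEMO_EV_RESULT, DEMO_EV_COMPLETE } demo_event_t;

typedef struct {
    demo_event_t items[DEMO_QUEUE_CAPACITY];
    int head;    /* index of the oldest queued event */
    int count;   /* number of queued events          */
} demo_queue_t;

/* Non-blocking enqueue: returns false (event dropped) when the queue is full. */
bool demo_try_push(demo_queue_t* q, demo_event_t ev)
{
    if (q->count == DEMO_QUEUE_CAPACITY) {
        return false;
    }
    q->items[(q->head + q->count) % DEMO_QUEUE_CAPACITY] = ev;
    q->count++;
    return true;
}

/* Producer side of one scan: post every AP result, then the completion event.
 * Counts rejected results in *dropped_results and returns true only if the
 * completion event made it into the queue. */
bool demo_post_scan(demo_queue_t* q, int ap_count, int* dropped_results)
{
    *dropped_results = 0;
    for (int i = 0; i < ap_count; i++) {
        if (!demo_try_push(q, DEMO_EV_RESULT)) {
            (*dropped_results)++;
        }
    }
    return demo_try_push(q, DEMO_EV_COMPLETE);
}
```

Under these assumptions at most 15 results plus the completion event fit, so a scan reporting 16 or more APs loses the completion event. That is consistent with the later observation that successful scans always reported fewer than 16 APs.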
I would need to understand what you mean by custom. From an application perspective, almost all designs are custom.
However, the expectation here on the community forum is that you are using either our development kit or one from a module partner, along with a production module from that module partner.
So if you are somehow developing directly with an SoC, I would have to direct you back to the local Cypress team that signed off on the engagement so that they can line up factory support.
Thank you for the response. At this point the platform isn't important: the networking worker thread's default message queue holds 16 elements, which causes problems when scanning more than 16 Wi-Fi APs.
dstudejio wrote:
thank you for response. at this point the platform isn't important the
networking worker thread default message queue size is 16 elements which
causes issues when scanning than 16 wifi aps.
The message queue size does not limit the number of APs you can scan.
Those are completely different things.
Well, each scan result is sent directly from WWD to the networking worker thread, which then invokes the callback. When more than 16 APs are scanned, the queue fills and the remaining messages are not queued. For me, increasing the message queue size helped, but I think a better solution would handle any number of AP responses.
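One way to handle any number of AP responses, sketched here as a hypothetical design rather than the SDK's fix: the producer chains partial results into a heap-allocated list and posts a single event when the scan completes. A scan then costs one queue slot regardless of AP count, and arrival order is kept because the consumer walks the list front to back. The demo_ names are invented for illustration, and the code below models only the batch, not the queue post itself.

```c
#include <assert.h>
#include <stdlib.h>

/* One AP result; hypothetical, simplified fields. */
typedef struct demo_ap_node {
    int                  signal_dbm;
    struct demo_ap_node* next;
} demo_ap_node_t;

/* Producer-side accumulator: results are chained here instead of each one
 * taking a queue slot. In a real multi-threaded system this list would need
 * a lock (or strict hand-off) because another thread consumes it. */
typedef struct {
    demo_ap_node_t*  head;
    demo_ap_node_t** tail;   /* points at the last node's next field */
    int              count;
} demo_ap_batch_t;

void demo_batch_init(demo_ap_batch_t* b)
{
    b->head  = NULL;
    b->tail  = &b->head;
    b->count = 0;
}

/* Called for every partial result; costs heap memory, not a queue slot.
 * Returns 0 on success, -1 if allocation fails. */
int demo_batch_append(demo_ap_batch_t* b, int signal_dbm)
{
    demo_ap_node_t* n = malloc(sizeof *n);
    if (n == NULL) {
        return -1;
    }
    n->signal_dbm = signal_dbm;
    n->next = NULL;
    *b->tail = n;        /* append at the end to preserve arrival order */
    b->tail = &n->next;
    b->count++;
    return 0;
}

/* Consumer side: runs once, from the single "scan complete" event, walks the
 * whole batch in arrival order, frees it, and returns how many results it saw. */
int demo_batch_consume(demo_ap_batch_t* b)
{
    int seen = 0;
    demo_ap_node_t* n = b->head;
    while (n != NULL) {
        demo_ap_node_t* next = n->next;
        seen++;          /* a real handler would report n->signal_dbm here */
        free(n);
        n = next;
    }
    demo_batch_init(b);  /* batch is empty and reusable for the next scan */
    return seen;
}
```

The trade-off is that results arrive only after the scan finishes (no incremental reporting) and memory use grows with the number of APs, so a real implementation would still want an upper bound.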
If you want to check whether it's really a queue-size issue,
add debug code where the message is enqueued so you will know
whether the enqueue fails.
BTW, I'm sure I can scan more than 16 APs without modifying the queue size.
I'm running into an issue that seems to be new in 3.7.0-3. I haven't analyzed what actually causes it, but it doesn't occur when my code base is reverted to 3.7.0.
I noticed that when scans completed successfully, fewer than 16 APs were always reported. I looked into the callback mechanism and found that it uses the networking thread, whose queue holds 16 items. I hypothesized that this was the problem and increased the queue size to 40. After this change, the scan callback chain completed for much larger numbers of reported APs; in the area I was in, I think it was around 30 APs. When I changed the queue size back, the scans failed again.
I admit the dynamics of my application may differ from those of the snip.scan application, and perhaps from the other platform, but I believe the mechanics of the issue amount to a race condition: to handle larger numbers of APs, the networking thread must process scan results faster than WWD can queue them. This issue will affect applications other than mine.
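To make the race argument concrete, here is a small single-threaded simulation, assuming the producer posts one event per tick and the consumer removes one queued event every service_interval ticks. It is a hypothetical model, not a trace of WWD. It shows that a 16-deep queue is not a hard cap on AP count while the consumer keeps up, but becomes one when the consumer is slower than the producer or is starved.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_CAPACITY 16   /* queue depth reported in the thread */

/* Simulate one scan. The producer posts ap_count results and then one
 * completion event, one event per tick. The consumer removes one queued
 * event every service_interval ticks (0 means it never runs, e.g. it is
 * starved by a higher-priority task). Returns true if the completion event
 * was accepted by the queue. */
bool demo_completion_delivered(int ap_count, int service_interval)
{
    int queued = 0;
    for (int tick = 1; tick <= ap_count + 1; tick++) {
        bool is_completion = (tick == ap_count + 1);

        /* The consumer gets its turn before the producer posts on this tick. */
        if (service_interval > 0 && tick % service_interval == 0 && queued > 0) {
            queued--;
        }

        if (queued < DEMO_CAPACITY) {
            queued++;                 /* event accepted */
        } else if (is_completion) {
            return false;             /* full queue: completion is dropped */
        }
        /* otherwise an AP result is dropped and the scan continues */
    }
    return true;
}
```

For example, with the consumer draining every tick, demo_completion_delivered(30, 1) succeeds; with the consumer starved, demo_completion_delivered(16, 0) fails, and when the consumer runs at half the producer's rate the limit moves to 31 APs.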
dstudejio wrote:
I admit the dynamics of my application may be different than the snip.scan application and perhaps the other platform, but I believe the mechanics of the issue constitute a race condition. The networking thread needs to process scans faster than the WWD can queue it up in order to handle larger numbers of APs. This issue will be troublesome for other applications than my own.
Does snip.scan work without modifying the queue size?
I don't want to jump to a conclusion about the fix for the problem.
(At least for now, changing the queue size does not make sense to me.)
As I remember, you said it worked 100% of the time before the 3.7.0-3 SDK,
and in older SDKs the queue size is the same.
I'd rather figure out the root cause than quickly patch it with a workaround.
I agree with you that resizing the queue is an undesirable workaround. Unfortunately, for a couple of days I won't be in a location with that many APs, so I can't do more testing on this issue for a while.
However, I don't think more testing is necessary. If you review the mechanism and design of the WWD scan process, it's clear the issue does not depend on my application. WWD feeds scan results into the networking queue faster than the networking thread can handle them, and the queue is too small for the number of APs at my location. IMO this is the root cause. The issue was present, but not exposed, in 3.7.0. My best guess, based on reviewing the changes in 3.7.0-3, is that introducing semaphores into the Wi-Fi scan process altered the timing enough to expose it.
BTW, which network stack are you using?
For my application, NetX Duo. The issue appears even without any IP network up.
dstudejio wrote:
for my application, NetX Duo. The issue appears without any IP network up.
I've never used NetX Duo; I use LwIP.
Anyway,
add code to check whether the enqueue is failing (in WICED/internal/wifi.c):
if ( wiced_rtos_send_asynchronous_event( WICED_NETWORKING_WORKER_THREAD, (event_handler_t) scan_handler->results_handler, (void*) ( result_iter ) ) != WICED_SUCCESS )
{
// Enqueue failed: this scan result was not queued. Add your debug code here.
}
Sorry, I'm not in a test environment where I can do this right now.
I have to pause this investigation for now. Based on my work so far, I am 100% certain you will find that the debug code you mentioned runs.
I'm happy to discuss any solutions that remove the race condition.
I realized that one thing that exacerbated this issue is that I had a higher-priority task pretty much blocking the networking thread. With the networking thread unblocked, it seems to service this queue just fine. I'm still a little concerned that this design doesn't account for the queue filling up under other real-time loads.
I am encountering the same issue. We have over 40 access points in range and we never receive a scan-complete status. Since the semaphore was introduced, snip.scan never proceeds past the first scan, and we see the same problem in our application. Changing the semaphore timeout to 10 seconds is a temporary workaround for us, but I think a proper fix is warranted.
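A bounded wait keeps the scanning task from hanging forever when the completion event is lost, which is what the 10-second timeout workaround achieves. Below is a minimal sketch of that pattern using POSIX semaphores as a stand-in for the SDK's RTOS semaphore; the snip.scan code and the SDK's own semaphore call are not reproduced here, and demo_wait_scan_complete is a hypothetical name. Note that a timeout only masks the lost event: the scan still ends without a completion callback.

```c
#define _POSIX_C_SOURCE 200809L   /* expose sem_timedwait and clock_gettime */
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

/* Wait up to timeout_ms for the "scan complete" semaphore.
 * Returns true if completion was signalled, false on timeout or error. */
bool demo_wait_scan_complete(sem_t* scan_done, long timeout_ms)
{
    /* sem_timedwait takes an absolute CLOCK_REALTIME deadline. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {   /* normalize nanosecond overflow */
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc;
    do {
        rc = sem_timedwait(scan_done, &deadline);
    } while (rc == -1 && errno == EINTR);    /* retry if interrupted by a signal */
    return rc == 0;
}
```

In the scenario above, the scan task would call this after starting a scan, and the result handler would post the semaphore when it sees the completion status.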
Indeed. I just heard that a major release (WICED 4.0?) is due any day now, which might fix this. I fixed it by making sure my higher-priority threads weren't dominating, allowing the networking thread to process its queue.
I fixed this by making sure my higher priority threads weren't dominating & allowing the networking thread to process.
> If you have a high-priority thread dominating, then I think fixing that is the right approach. You may have a lot more problems than just the scan results.
Agreed. The task priority change definitely helped overall stability.
This issue has been fixed: the queue handling the scan results in wifi.c was overfilling. The next SDK will include this fix.
vik86 wrote:
This issue has been fixed, the queue handling the scan result in wifi.c was getting over filled. The next SDk will have this fix.
Can you post the fix so people can verify it right away?
vik86 wrote:
This issue has been fixed, the queue handling the scan result in wifi.c was getting over filled. The next SDk will have this fix.
Can you just post the fix?
I think the best way to resolve an issue reported on the forum is to post the fix rather than ask users to test the next (not yet released) SDK.
That way the reporter can review and verify it before the new SDK is released.
Otherwise, the next SDK may still have the issue, and people would have to wait for yet another SDK.
Attaching a temporary early-access fix. Replace this in <WICED_SDK>/WICED/internal
Somehow I feel uncomfortable about the changes; I think the fix is *wrong*.
Setting aside the pointless rename (scan_handler->in_scan_handler), which makes the diff bigger,
and the unnecessary new g_scan_semaphore_and_timer_inited flag
(wiced_wifi_init() is supposed to run before any scan, so I have no idea why you need that flag),
the main change is this:
There are now two paths that call scan_handler.results_handler(scan_result_ptr):
one in WICED_NETWORKING_WORKER_THREAD, and one called directly by wiced_wifi_scan_networks_ex().
Your direct call to scan_handler.results_handler() may handle scan_complete before WICED_NETWORKING_WORKER_THREAD handles the rest of the scan results.
The original event queue guarantees the order in which scan results are processed, so scan_complete is always the last one.
I think the new code is worse than the original behavior.
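The ordering concern can be shown with a toy model: results go through a FIFO worker queue that is drained only after the producer finishes (as when the worker thread is slow or preempted), while completion either joins the same queue or is handed to the handler directly. This is a hypothetical illustration of the two-path structure described above, not the attached patch; all demo_ names are invented.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_EVENTS 32

/* Records the order in which the application's handler sees events:
 * 'R' for an AP result, 'C' for scan complete. */
typedef struct {
    char seen[DEMO_MAX_EVENTS + 1];
    int  n;
} demo_log_t;

static void demo_handler(demo_log_t* log, char ev)
{
    if (log->n < DEMO_MAX_EVENTS) {
        log->seen[log->n++] = ev;
        log->seen[log->n] = '\0';
    }
}

/* Simulate one scan with `results` AP results. All results go through the
 * worker queue. If bypass_completion is nonzero, completion is delivered to
 * the handler directly instead of being queued behind the results. */
void demo_run_scan(demo_log_t* log, int results, int bypass_completion)
{
    char queue[DEMO_MAX_EVENTS];
    int  queued = 0;

    log->n = 0;
    log->seen[0] = '\0';

    for (int i = 0; i < results && queued < DEMO_MAX_EVENTS; i++) {
        queue[queued++] = 'R';
    }
    if (bypass_completion) {
        demo_handler(log, 'C');          /* direct call: skips the queue */
    } else if (queued < DEMO_MAX_EVENTS) {
        queue[queued++] = 'C';           /* FIFO: completion stays last */
    }

    /* The worker thread finally runs and drains the queue in FIFO order. */
    for (int i = 0; i < queued; i++) {
        demo_handler(log, queue[i]);
    }
}
```

With a single queue the completion event cannot overtake results (the trace is "RRRC" for three APs); with a second, direct path nothing orders the two, and the handler can see "CRRR".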
I am just wondering whether this is the same problem I see when too many APs are present. I didn't read the entire thread, but I saw something similar and wanted to confirm. Also, are you on dual band or 2.4 GHz only? I think the WCD2 board is dual band.
Thank you for your explanation; I'm sorry if I misinterpreted.
I'm working only on 2.4 GHz, but the environment I'm testing in has 30+ APs, even at 2.4 GHz. Many of them are at marginal range, which makes the issue appear intermittently (scans succeed when fewer than 16 APs are reported).