Your numbers seem to be off from the errno definitions in LWIP. I don't see a '115'. '11' is "not connected"; '5' is "operation in progress".
Typically ERR_INPROGRESS means the operation had not finished when the call that started it returned; in your case that is probably the lwip_send() call. It usually comes from non-blocking calls.
#define ERR_OK 0 /* No error, everything OK. */
#define ERR_MEM -1 /* Out of memory error. */
#define ERR_BUF -2 /* Buffer error. */
#define ERR_TIMEOUT -3 /* Timeout. */
#define ERR_RTE -4 /* Routing problem. */
#define ERR_INPROGRESS -5 /* Operation in progress */
#define ERR_VAL -6 /* Illegal value. */
#define ERR_WOULDBLOCK -7 /* Operation would block. */
#define ERR_IS_FATAL(e) ((e) < ERR_VAL)
#define ERR_ABRT -8 /* Connection aborted. */
#define ERR_RST -9 /* Connection reset. */
#define ERR_CLSD -10 /* Connection closed. */
#define ERR_CONN -11 /* Not connected. */
#define ERR_ARG -12 /* Illegal argument. */
#define ERR_USE -13 /* Address in use. */
#define ERR_IF -14 /* Low-level netif error */
#define ERR_ISCONN -15 /* Already connected. */
The list is in lwip\arch.h; at least that seems to be the one used, since I do see a 115 being reported. Below is the full list as seen in the file.
EDIT: The list you posted seems to be the LWIP error codes and not the errno codes. The return values are quite different from the errno values, which have to be checked separately based on what LWIP returns as an error. In this specific case, LWIP returns a timeout code, i.e. -3, but the errno is checked when this occurs, and the errno can be any of the three values mentioned earlier (115, 11 or 5).
#define EPERM 1 /* Operation not permitted */
#define ENOENT 2 /* No such file or directory */
#define ESRCH 3 /* No such process */
#define EINTR 4 /* Interrupted system call */
#define EIO 5 /* I/O error */
#define ENXIO 6 /* No such device or address */
#define E2BIG 7 /* Arg list too long */
#define ENOEXEC 8 /* Exec format error */
#define EBADF 9 /* Bad file number */
#define ECHILD 10 /* No child processes */
#define EAGAIN 11 /* Try again */
#define ENOMEM 12 /* Out of memory */
#define EACCES 13 /* Permission denied */
#define EFAULT 14 /* Bad address */
#define ENOTBLK 15 /* Block device required */
#define EBUSY 16 /* Device or resource busy */
#define EEXIST 17 /* File exists */
#define EXDEV 18 /* Cross-device link */
#define ENODEV 19 /* No such device */
#define ENOTDIR 20 /* Not a directory */
#define EISDIR 21 /* Is a directory */
#define EINVAL 22 /* Invalid argument */
#define ENFILE 23 /* File table overflow */
#define EMFILE 24 /* Too many open files */
#define ENOTTY 25 /* Not a typewriter */
#define ETXTBSY 26 /* Text file busy */
#define EFBIG 27 /* File too large */
#define ENOSPC 28 /* No space left on device */
#define ESPIPE 29 /* Illegal seek */
#define EROFS 30 /* Read-only file system */
#define EMLINK 31 /* Too many links */
#define EPIPE 32 /* Broken pipe */
#define EDOM 33 /* Math argument out of domain of func */
#define ERANGE 34 /* Math result not representable */
#define EDEADLK 35 /* Resource deadlock would occur */
#define ENAMETOOLONG 36 /* File name too long */
#define ENOLCK 37 /* No record locks available */
#define ENOSYS 38 /* Function not implemented */
#define ENOTEMPTY 39 /* Directory not empty */
#define ELOOP 40 /* Too many symbolic links encountered */
#define EWOULDBLOCK EAGAIN /* Operation would block */
#define ENOMSG 42 /* No message of desired type */
#define EIDRM 43 /* Identifier removed */
#define ECHRNG 44 /* Channel number out of range */
#define EL2NSYNC 45 /* Level 2 not synchronized */
#define EL3HLT 46 /* Level 3 halted */
#define EL3RST 47 /* Level 3 reset */
#define ELNRNG 48 /* Link number out of range */
#define EUNATCH 49 /* Protocol driver not attached */
#define ENOCSI 50 /* No CSI structure available */
#define EL2HLT 51 /* Level 2 halted */
#define EBADE 52 /* Invalid exchange */
#define EBADR 53 /* Invalid request descriptor */
#define EXFULL 54 /* Exchange full */
#define ENOANO 55 /* No anode */
#define EBADRQC 56 /* Invalid request code */
#define EBADSLT 57 /* Invalid slot */
#define EDEADLOCK EDEADLK
#define EBFONT 59 /* Bad font file format */
#define ENOSTR 60 /* Device not a stream */
#define ENODATA 61 /* No data available */
#define ETIME 62 /* Timer expired */
#define ENOSR 63 /* Out of streams resources */
#define ENONET 64 /* Machine is not on the network */
#define ENOPKG 65 /* Package not installed */
#define EREMOTE 66 /* Object is remote */
#define ENOLINK 67 /* Link has been severed */
#define EADV 68 /* Advertise error */
#define ESRMNT 69 /* Srmount error */
#define ECOMM 70 /* Communication error on send */
#define EPROTO 71 /* Protocol error */
#define EMULTIHOP 72 /* Multihop attempted */
#define EDOTDOT 73 /* RFS specific error */
#define EBADMSG 74 /* Not a data message */
#define EOVERFLOW 75 /* Value too large for defined data type */
#define ENOTUNIQ 76 /* Name not unique on network */
#define EBADFD 77 /* File descriptor in bad state */
#define EREMCHG 78 /* Remote address changed */
#define ELIBACC 79 /* Can not access a needed shared library */
#define ELIBBAD 80 /* Accessing a corrupted shared library */
#define ELIBSCN 81 /* .lib section in a.out corrupted */
#define ELIBMAX 82 /* Attempting to link in too many shared libraries */
#define ELIBEXEC 83 /* Cannot exec a shared library directly */
#define EILSEQ 84 /* Illegal byte sequence */
#define ERESTART 85 /* Interrupted system call should be restarted */
#define ESTRPIPE 86 /* Streams pipe error */
#define EUSERS 87 /* Too many users */
#define ENOTSOCK 88 /* Socket operation on non-socket */
#define EDESTADDRREQ 89 /* Destination address required */
#define EMSGSIZE 90 /* Message too long */
#define EPROTOTYPE 91 /* Protocol wrong type for socket */
#define ENOPROTOOPT 92 /* Protocol not available */
#define EPROTONOSUPPORT 93 /* Protocol not supported */
#define ESOCKTNOSUPPORT 94 /* Socket type not supported */
#define EOPNOTSUPP 95 /* Operation not supported on transport endpoint */
#define EPFNOSUPPORT 96 /* Protocol family not supported */
#define EAFNOSUPPORT 97 /* Address family not supported by protocol */
#define EADDRINUSE 98 /* Address already in use */
#define EADDRNOTAVAIL 99 /* Cannot assign requested address */
#define ENETDOWN 100 /* Network is down */
#define ENETUNREACH 101 /* Network is unreachable */
#define ENETRESET 102 /* Network dropped connection because of reset */
#define ECONNABORTED 103 /* Software caused connection abort */
#define ECONNRESET 104 /* Connection reset by peer */
#define ENOBUFS 105 /* No buffer space available */
#define EISCONN 106 /* Transport endpoint is already connected */
#define ENOTCONN 107 /* Transport endpoint is not connected */
#define ESHUTDOWN 108 /* Cannot send after transport endpoint shutdown */
#define ETOOMANYREFS 109 /* Too many references: cannot splice */
#define ETIMEDOUT 110 /* Connection timed out */
#define ECONNREFUSED 111 /* Connection refused */
#define EHOSTDOWN 112 /* Host is down */
#define EHOSTUNREACH 113 /* No route to host */
#define EALREADY 114 /* Operation already in progress */
#define EINPROGRESS 115 /* Operation now in progress */
#define ESTALE 116 /* Stale NFS file handle */
#define EUCLEAN 117 /* Structure needs cleaning */
#define ENOTNAM 118 /* Not a XENIX named type file */
#define ENAVAIL 119 /* No XENIX semaphores available */
#define EISNAM 120 /* Is a named type file */
#define EREMOTEIO 121 /* Remote I/O error */
#define EDQUOT 122 /* Quota exceeded */
#define ENOMEDIUM 123 /* No medium found */
#define EMEDIUMTYPE 124 /* Wrong medium type */
You got this from WICED-SDK-xxxx tree?
Yes, WICED 2.4.0.
WICED uses this one-
Yes, WICED does use that one also. The file you are referring to is used for the return values of LWIP calls, but only sometimes: most error return values are hard-set to -1, completely disregarding the actual error value, which is why the errno has to be checked. errno != return value. errno can be set by anything in the system to indicate the status of the last error. The lwip_send call returns -1, which by itself means nothing at all, because any error that occurs in lwip_send causes a return of -1. Therefore I check the errno of the system, which is also set by LWIP calls deeper in the stack, and I find 115 or 5 or 11.
If you got a positive return code from lwip_send(), it is actually the number of bytes sent.
LWIP is not returning a positive value, it is returning -1. But this does not provide any information about the error. The errno is a global value, set by LWIP, but not returned by LWIP.
Does this make sense? I'll repeat: errno is NOT the same thing as the return value from lwip_send.
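To make the distinction concrete, here is a minimal sketch of reading errno right after a failed send. fake_send() is a hypothetical stand-in for lwip_send() so the snippet builds on a host machine, and the set of errno values treated as transient is my assumption based on the values reported in this thread, not something taken from the WICED sources.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for lwip_send(): fails with -1 and leaves the
 * reason in errno, which is the behaviour described in this thread. */
static int fake_send(int sock, const void *data, size_t len, int flags)
{
    (void)sock; (void)data; (void)len; (void)flags;
    errno = EINPROGRESS;
    return -1;
}

/* Return 1 when the errno value suggests the call may succeed if retried.
 * The choice of values is an assumption made for this sketch. */
static int errno_is_transient(int err)
{
    return err == EINPROGRESS || err == EAGAIN || err == EWOULDBLOCK;
}

/* Send once. On success return the byte count; on failure return -1 and
 * store the errno captured immediately after the call in *err_out. */
static int send_once(int sock, const void *data, size_t len, int *err_out)
{
    int rc;

    errno = 0;                          /* clear any stale value first */
    rc = fake_send(sock, data, len, 0);
    *err_out = (rc < 0) ? errno : 0;    /* read errno before anything else runs */
    return rc;
}
```

The point is only that the -1 return value and the errno are read separately, and that errno should be captured straight after the call, before another call can overwrite it.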
You are right that WICED uses both, what a mess!
I searched for EINPROGRESS, but it doesn't turn up as being referenced anywhere. It is really bizarre that you saw 115.
As for EIO, ERR_ARG is mapped to it, and it is also what err_to_errno() returns when passed an error code beyond the end of err_to_errno_table.
I can't copy/paste the source code; I have been having problems pasting stuff into this forum. I don't know if it is my browser or the forum editor.
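Since the source can't be pasted here, this is a rough sketch of how a table-driven err_to_errno() of this kind could look. The ERR_* numbers follow the list posted earlier in this thread. The individual table entries are my assumption (loosely modelled on upstream lwIP), except ERR_ARG -> EIO and the out-of-range fallback to EIO, which are the two cases described above.

```c
#include <errno.h>

/* lwIP error codes, numbered as in the list posted earlier in this thread. */
#define ERR_OK          0
#define ERR_MEM        (-1)
#define ERR_BUF        (-2)
#define ERR_TIMEOUT    (-3)
#define ERR_RTE        (-4)
#define ERR_INPROGRESS (-5)
#define ERR_VAL        (-6)
#define ERR_WOULDBLOCK (-7)
#define ERR_ABRT       (-8)
#define ERR_RST        (-9)
#define ERR_CLSD       (-10)
#define ERR_CONN       (-11)
#define ERR_ARG        (-12)
#define ERR_USE        (-13)
#define ERR_IF         (-14)
#define ERR_ISCONN     (-15)

/* Assumed mapping, indexed by -err. Not copied from the WICED tree. */
static const int err_to_errno_table[] = {
    [-ERR_OK]         = 0,
    [-ERR_MEM]        = ENOMEM,
    [-ERR_BUF]        = ENOBUFS,
    [-ERR_TIMEOUT]    = EWOULDBLOCK,
    [-ERR_RTE]        = EHOSTUNREACH,
    [-ERR_INPROGRESS] = EINPROGRESS,
    [-ERR_VAL]        = EINVAL,
    [-ERR_WOULDBLOCK] = EWOULDBLOCK,
    [-ERR_ABRT]       = ECONNABORTED,
    [-ERR_RST]        = ECONNRESET,
    [-ERR_CLSD]       = ENOTCONN,
    [-ERR_CONN]       = ENOTCONN,
    [-ERR_ARG]        = EIO,
    [-ERR_USE]        = EADDRINUSE,
    [-ERR_IF]         = EIO,
    [-ERR_ISCONN]     = EALREADY,
};

/* Translate an lwIP error code to an errno value.
 * Anything outside the table falls back to EIO. */
static int err_to_errno(int err)
{
    if (err > 0 || err < ERR_ISCONN) {
        return EIO;
    }
    return err_to_errno_table[-err];
}
```

If the real table resembles this one, ERR_TIMEOUT (-3) would surface as EWOULDBLOCK, which is 11 in the errno list above, and ERR_INPROGRESS (-5) as 115. That would be consistent with the values reported in this thread, but it is a guess until someone checks the actual table.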
WICED does not use the LwIP errno and you should disregard it when attempting to debug connectivity problems.
Why are you using lwip_send() and not the WICED API?
It sounds like you may be mixing native LwIP objects with WICED objects which will cause all sorts of bad behaviour and possible memory corruption.
Our platform does not allow us to use the full wiced implementation. We are using the wwd API with lwip to implement our solution.
All outgoing packets will end up in low_level_output() found in wwd_network.c under the Wiced/Network/LwIP/wwd directory.
I would put a breakpoint on the failed return statement and see if you end up there (it returns ERR_INPROGRESS). If so, then the send failed because you have lost Wi-Fi connectivity for some reason.
Can you take a sniffer trace of the events that lead up to the disconnection from the AP?
We mitigated the issue by retrying the send of the same packet up to 5 times. This seems to have "fixed" the problem, since we manage to successfully transmit after the second or third try. Before adding the retry logic, we did capture the network traffic between the device and the system it was communicating with. All we saw was that we would be happily sending packets when, all of a sudden, nothing would come out of the device anymore. Now, with the retries, we manage to stay connected.
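The retry logic amounts to something like the sketch below. The names send_with_retry(), send_fn and flaky_send() are made up for illustration; flaky_send() only simulates a sender that fails twice with EINPROGRESS and then succeeds. Retrying on 115, 11 and 5 reflects the errno values seen in this thread and is an assumption, not a general rule.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_SEND_ATTEMPTS 5

/* Same shape as lwip_send(sock, data, len, flags). */
typedef int (*send_fn)(int sock, const void *data, size_t len, int flags);

/* Try to send the same buffer up to MAX_SEND_ATTEMPTS times.
 * Returns the byte count on success, or -1 with errno left set by the
 * last failed attempt. */
static int send_with_retry(send_fn do_send, int sock,
                           const void *data, size_t len)
{
    int attempt;

    for (attempt = 1; attempt <= MAX_SEND_ATTEMPTS; attempt++) {
        int rc;

        errno = 0;
        rc = do_send(sock, data, len, 0);
        if (rc >= 0) {
            return rc;                  /* success: bytes sent */
        }
        /* Only retry on the errno values seen in this thread. */
        if (errno != EINPROGRESS && errno != EAGAIN && errno != EIO) {
            return -1;
        }
    }
    return -1;                          /* all attempts failed */
}

/* Hypothetical sender used to exercise the loop:
 * fails twice with EINPROGRESS, then succeeds. */
static int flaky_calls = 0;

static int flaky_send(int sock, const void *data, size_t len, int flags)
{
    (void)sock; (void)data; (void)flags;
    if (++flaky_calls < 3) {
        errno = EINPROGRESS;
        return -1;
    }
    return (int)len;
}
```

With flaky_send() as the sender, send_with_retry() succeeds on the third attempt, which matches the "second or third try" behaviour described above. A short delay between attempts may be worth adding on a real device; it is left out here.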
More about the history of the problem: the LWIP return value that comes with the errno of 115 or 11 is usually -3, which is a simple timeout. This timeout was added to LWIP by the WICED group, which is evident from the comment "/* WICED_CHANGES - added timeout check */".
We did modify this timeout check, because we noticed that the variable used as the timeout value, apimsg->msg.msg.bc.timeout, was complete garbage. "bc" is part of a union, and the apiflags field of the "w" struct, which shares that union with bc and other members, was being set somewhere in the call stack for lwip_send. Most of the time, because of what happened to be in memory on the stack and how the different structures in the union are aligned, the timeout value stayed 0, which corresponds to wait forever. Sometimes, however, the timeout became a garbage value that led to a non-wait-forever timeout.
Long story short: after realizing that the wait-forever timeout caused the system to hang forever in the case of EINPROGRESS, EAGAIN or any other errno, we changed the timeout value from apimsg->msg.msg.bc.timeout to a hardcoded value of 10,000 ms. This is what allowed us to retry a transmission that times out with EINPROGRESS or EAGAIN, and it mostly resolves the issue. The only remaining problem is: why does the system hang to the point that it takes 10 seconds to send out a single TCP packet on a not too congested network, when 99% of packets make it out without a hitch?
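For anyone who wants to see the union hazard in isolation, here is a deliberately simplified sketch. struct demo_api_msg is a hypothetical layout and not the real lwIP api_msg definition; it only shows how filling in one union member changes what is read back through another, and what replacing that read with a fixed timeout looks like.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layout. NOT the real lwIP api_msg definition. */
struct demo_api_msg {
    union {
        struct { uint8_t apiflags; } w;   /* filled in on the send path */
        struct { uint32_t timeout; } bc;  /* belongs to a different operation */
    } msg;
};

/* Fill in only msg.w, then read msg.bc.timeout, which nobody set.
 * Because both members share storage, the "timeout" is whatever bytes
 * the write to apiflags left behind. */
static uint32_t timeout_seen_after_write(uint8_t apiflags)
{
    struct demo_api_msg m;

    memset(&m, 0, sizeof m);
    m.msg.w.apiflags = apiflags;
    return m.msg.bc.timeout;
}

/* The workaround described above: ignore the aliased field on the send
 * path and use a fixed timeout instead. */
#define SEND_TIMEOUT_MS 10000u

static uint32_t effective_send_timeout(const struct demo_api_msg *m)
{
    (void)m;                  /* msg.bc.timeout is not valid here */
    return SEND_TIMEOUT_MS;
}
```

In this toy layout a zero apiflags reads back as timeout 0 (wait forever) and any non-zero apiflags reads back as a non-zero timeout. In the real structure the overlap depends on the actual member layout and alignment, which is why the value was 0 most of the time and garbage only occasionally.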
Thank you for the detailed analysis.
The TCP packet may not be flagged as sent until an ACK has been received from the other side. The error may not be at an 802.11 layer but rather on the IP layer as it routes the packet to the server.
Can you try running the same code but connecting to the HTTP server on the AP? I expect you will need to change the processing logic in your code, but it might be worthwhile to see if you have similar problems when connecting to a TCP service that won't have any routing issues. If you still see lockups while talking to the AP, that would seem to indicate an issue at the 802.11 layer.