ssl_handshake_client_async freeze at state SSL_CLIENT_CERTIFICATE

StKa_2187866
Level 2

Hi everyone,

WICED 4.1.0 cannot connect to Azure IoT Hub over a TLS-based MQTT connection because WICED fails to establish the TLS channel.

A direct connection has never worked for us, going back to when our product was based on WICED 3.5.2. That's why we spun up TLS proxies (NGINX) while waiting for a fix. However, it seems that 4.1.0 does not solve the issue either, so here is some investigation that may help you with debugging.

We took Wireshark traces for the following scenarios:

1) Connected MQTT.fx app directly to Azure IoT Hub (OK)

2) Connected WICED's secure_mqtt app directly to Azure IoT Hub (Not OK)

3) Connected WICED's secure_mqtt app through NGINX proxy (OK)

After looking into the Wireshark traces, we could see that there is one major difference between our NGINX proxies and Azure IoT Hub. Along with the "Server Hello" message, Azure IoT Hub also sends a "Client Certificate Request" to the client, indicating that a client certificate is requested.

However, MQTT authentication in Azure IoT Hub is not based on client certificates, but rather on credentials. In such a case, the client is not expected to send any certificate. This is where the trace from MQTT.fx is useful: that app dealt with the request by simply answering with no certificates. In fact, according to RFC 5246:

If no suitable certificate is available, the client MUST send a certificate message containing no certificates.
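
For reference, the empty Certificate message that the RFC describes is only seven bytes at the handshake layer. Below is a minimal C sketch, assuming TLS 1.2 framing as in RFC 5246 section 7.4.6 and leaving out the record-layer header. The helper name build_empty_certificate_msg is hypothetical and is not part of WICED or BESL; it only illustrates what a client is expected to put on the wire.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a WICED/BESL API): writes a TLS 1.2
 * Certificate handshake message that carries zero certificates.
 * Returns the number of bytes written, or 0 if the buffer is
 * missing or too small. The record-layer header is not included. */
static size_t build_empty_certificate_msg(uint8_t *out, size_t out_len)
{
    static const uint8_t msg[] = {
        0x0b,             /* HandshakeType: certificate (11)          */
        0x00, 0x00, 0x03, /* handshake body length: 3 bytes           */
        0x00, 0x00, 0x00  /* certificate_list length: 0 (empty list)  */
    };

    if (out == NULL || out_len < sizeof msg) {
        return 0;
    }
    memcpy(out, msg, sizeof msg);
    return sizeof msg;
}
```

A client that has no identity configured would send these bytes (wrapped in a handshake record) in reply to the server's certificate request and then continue with ClientKeyExchange.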

Unfortunately, WICED's approach is a bit different. What we observed was that the app got stuck in a call to ssl_handshake_client_async() while processing state SSL_CLIENT_CERTIFICATE. Eventually the app crashed due to the watchdog, as bdide_1722426 described in this unanswered question.

Sadly, I cannot dive deeper and understand why ssl_handshake_client_async() fails. This is where I need some help. As evidence, I have attached the .pcapng files containing the traces of the aforementioned scenarios.

I am really looking forward to a reply from some representative of Cypress. This has been a big issue for a long time.

Thanks in advance,

Stavros.

1 Solution
AndrewY_86
Employee

Thanks for reporting this. Please use this workaround in the meantime.

In libraries/protocols/MQTT/mqtt_network.c, in mqtt_network_init(), make the following change:

else
{
    wiced_tls_init_context( &socket->tls_context, NULL, (const char*) conn->peer_cn );
}

to

else
{
    wiced_tls_identity_t *fake_tls_identity = malloc( sizeof(wiced_tls_identity_t) );
    if( fake_tls_identity != NULL )
    {
        memset( fake_tls_identity, 0, sizeof(wiced_tls_identity_t) );
        wiced_tls_init_context( &socket->tls_context, fake_tls_identity, (const char*) conn->peer_cn );
    }
    else
    {
        wiced_tls_init_context( &socket->tls_context, NULL, (const char*) conn->peer_cn );
    }
}


0 Likes
9 Replies
AxLi_1746341
Level 7

I observe the same problem on sdk-3.7.0-7.

It is surprising that the app gets stuck and then crashes due to the watchdog when no certificate/key is provided.

The WICED TLS library really needs a fix.

Anonymous
Not applicable

Very relevant question, precisely documented. Cypress must fix this!

0 Likes

Hi @anyu! Thanks for your reply. The fix indeed works in the test app, which is based on SDK 4.1.0! The device managed to connect to Azure IoT Hub and I have exchanged messages.

But I have two questions:

1) Is the dynamically allocated resource released in wiced_tls_init_context?

2) Is this fix applicable to SDK 3.7.0?

0 Likes

Never mind about question (2). It also works on SDK 3.7.0.

Thanks!

anyu wrote:

Thanks for reporting this, please use this workaround in the meantime

libraries/protocols/MQTT/mqtt_network.c

in mqtt_network_init(), make the following change

else
{
    wiced_tls_init_context( &socket->tls_context, NULL, (const char*) conn->peer_cn );
}

to

else
{
    wiced_tls_identity_t *fake_tls_identity = malloc( sizeof(wiced_tls_identity_t) );
    if( fake_tls_identity != NULL )
    {
        memset( fake_tls_identity, 0, sizeof(wiced_tls_identity_t) );
        wiced_tls_init_context( &socket->tls_context, fake_tls_identity, (const char*) conn->peer_cn );
    }
    else
    {
        wiced_tls_init_context( &socket->tls_context, NULL, (const char*) conn->peer_cn );
    }
}

This is very strange code.

So if malloc for fake_tls_identity fails, it falls back to the same code as before?

In addition, what if you fixed the BLE library? How would users know they no longer need to add the additional malloc?

0 Likes
AndrewY_86
Employee

This is a workaround, not a fix. The downside is that the allocated resources do not get freed; you can free them in your own cleanup code. This thread can be updated when the BESL fix is included in the SDK.
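
One way to handle the cleanup mentioned above is to keep the placeholder pointer next to the TLS context so it can be freed when the connection is torn down. The following is only a sketch under stated assumptions: the demo_* types and demo_tls_init_context() are stand-ins for wiced_tls_identity_t, the WICED TLS context and wiced_tls_init_context() so that the example compiles on its own, and mqtt_tls_slot_t, mqtt_tls_setup() and mqtt_tls_cleanup() are hypothetical names, not SDK functions. It also reports an allocation failure instead of silently falling back to the NULL identity path.

```c
#include <stdlib.h>

/* Stand-ins so the sketch builds without the SDK (assumptions, not
 * the real WICED definitions). */
typedef struct { unsigned char opaque[64]; } demo_tls_identity_t;
typedef struct {
    demo_tls_identity_t *identity;
    const char          *peer_cn;
} demo_tls_context_t;

static void demo_tls_init_context(demo_tls_context_t *ctx,
                                  demo_tls_identity_t *identity,
                                  const char *peer_cn)
{
    ctx->identity = identity;
    ctx->peer_cn  = peer_cn;
}

/* Hypothetical wrapper that owns the placeholder identity. */
typedef struct {
    demo_tls_context_t   tls_context;
    demo_tls_identity_t *placeholder_identity; /* owned; NULL when unused */
} mqtt_tls_slot_t;

/* Allocates a zeroed placeholder identity and initialises the context.
 * Returns 0 on success, -1 if the allocation fails. */
static int mqtt_tls_setup(mqtt_tls_slot_t *slot, const char *peer_cn)
{
    slot->placeholder_identity = calloc(1, sizeof *slot->placeholder_identity);
    if (slot->placeholder_identity == NULL) {
        return -1; /* surface the failure to the caller */
    }
    demo_tls_init_context(&slot->tls_context, slot->placeholder_identity, peer_cn);
    return 0;
}

/* Call only after the TLS connection has been closed, so that nothing
 * still dereferences the identity. */
static void mqtt_tls_cleanup(mqtt_tls_slot_t *slot)
{
    free(slot->placeholder_identity);
    slot->placeholder_identity = NULL;
    slot->tls_context.identity = NULL;
}
```

In the real library the pointer would have to live somewhere reachable from the MQTT socket teardown path; where exactly is not stated in this thread.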

anyu wrote:

This is a workaround, not a fix. The downside is that the allocated resources do not get freed; you can free them in your own cleanup code. This thread can be updated when the BESL fix is included in the SDK.

People outside the Cypress team do not know *when* this workaround can be removed.

In addition, when other people search for the issue on the forum, they usually copy-paste the same code. Such a "workaround" is a source of buggy/redundant code.

0 Likes
Anonymous
Not applicable

Is this workaround still relevant with WICED 5.2?