MQTT Reconnection causing hang in the system

Anonymous
Not applicable

Hi mwf_mmfae,

I am not able to reconnect to MQTT after the connection goes down. The system hangs when I attempt an MQTT reconnect. While debugging, I found that the hang comes from a call to ssl_handshake_server_async().

I tried this with SDK 3.5.2 and also with 4.0 (backporting the core BESL changes from 4.0 to 3.5.2).

The code snippet is:

    while ( 1 )
    {
        do
        {
            ret = aws_mqtt_conn_open( app_info.mqtt_object, mqtt_connection_event_cb );
            connection_retries++;
        } while ( ( ret != WICED_SUCCESS ) && ( connection_retries < WICED_MQTT_CONNECTION_NUMBER_OF_RETRIES ) );

        do
        {
            ret = aws_mqtt_app_subscribe( app_info.mqtt_object, app_info.shadow_delta_topic, WICED_MQTT_QOS_DELIVER_AT_MOST_ONCE );
            connection_retries++;
        } while ( ( ret != WICED_SUCCESS ) && ( connection_retries < WICED_MQTT_CONNECTION_NUMBER_OF_RETRIES ) );

        shadow_close();
        wiced_rtos_delay_milliseconds( 1000 );
    }

In my shadow_close() I am just calling:

    mqtt_network_deinit( &( ( (mqtt_connection_t*) app_info.mqtt_object )->socket ) );
    mqtt_connection_deinit( (mqtt_connection_t*) app_info.mqtt_object );

Am I making a mistake? Please go through this and give some suggestions.

If possible, can you provide a sample application or an API that handles MQTT reconnection? This is essentially mandatory.
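
For readers following along, here is a minimal sketch of one way to structure such a reconnect loop. In the loop in the question, connection_retries is never reset between the open and subscribe steps, and shadow_close() runs on every pass, including after a successful connect. The sketch gives each step its own retry budget, tears down only after a failed round, and backs off between rounds. Whether this has anything to do with the hang in ssl_handshake_server_async() is not established here. All names (reconnect_ops_t, retry_step, reconnect_with_backoff, the fake_* stubs) are hypothetical; in a WICED application the hooks would wrap aws_mqtt_conn_open(), aws_mqtt_app_subscribe(), shadow_close() and wiced_rtos_delay_milliseconds().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks; not part of the WICED SDK. */
typedef bool (*step_fn)( void *ctx );
typedef void (*close_fn)( void *ctx );
typedef void (*delay_fn)( uint32_t ms );

typedef struct
{
    step_fn  open;       /* returns true once the connection is open */
    step_fn  subscribe;  /* returns true once the subscription is up */
    close_fn close;      /* releases socket and connection state     */
    delay_fn delay;      /* blocks for the given number of ms        */
    void    *ctx;
} reconnect_ops_t;

/* Retry one step; the counter is local, so each step gets a fresh budget. */
static bool retry_step( step_fn step, void *ctx, unsigned max_retries )
{
    unsigned attempt;
    for ( attempt = 0; attempt < max_retries; attempt++ )
    {
        if ( step( ctx ) )
        {
            return true;
        }
    }
    return false;
}

/* Returns true when connected and subscribed, false after max_rounds failures. */
bool reconnect_with_backoff( const reconnect_ops_t *ops, unsigned max_retries,
                             unsigned max_rounds, uint32_t initial_delay_ms,
                             uint32_t max_delay_ms )
{
    uint32_t delay_ms = initial_delay_ms;
    unsigned round;

    for ( round = 0; round < max_rounds; round++ )
    {
        if ( retry_step( ops->open, ops->ctx, max_retries ) &&
             retry_step( ops->subscribe, ops->ctx, max_retries ) )
        {
            return true; /* session is up: do not close it */
        }
        ops->close( ops->ctx ); /* tear down only after a failed round */
        ops->delay( delay_ms );
        /* Double the wait, capped at max_delay_ms. */
        delay_ms = ( delay_ms > max_delay_ms / 2 ) ? max_delay_ms : delay_ms * 2;
    }
    return false;
}

/* Stubs standing in for the real SDK calls, for illustration only. */
typedef struct
{
    unsigned open_failures_left;
    unsigned close_calls;
} fake_link_t;

static bool fake_open( void *ctx )
{
    fake_link_t *link = (fake_link_t *) ctx;
    if ( link->open_failures_left > 0 )
    {
        link->open_failures_left--;
        return false;
    }
    return true;
}

static bool fake_subscribe( void *ctx ) { (void) ctx; return true; }
static void fake_close( void *ctx )     { ( (fake_link_t *) ctx )->close_calls++; }
static void fake_delay( uint32_t ms )   { (void) ms; }
```

A caller would fill a reconnect_ops_t once and call reconnect_with_backoff() from an application thread whenever the link is known to be down, rather than from inside the MQTT event callback.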

46 Replies
Anonymous
Not applicable

Hi kausik and ntejaswa,

Adding to this thread: is the keep-alive mechanism working in 3.5.2 or 4.0? Even though I set it to 5 seconds, the disconnect notification only arrives after about 30 minutes.


SDK 4.0 always sets keep alive to zero in the MQTT CONNECT packet.

I believe this is a bug, and you can modify it like this:

#ifndef KEEP_POSSIBLE_MQTT_KEEP_ALIVE_BUG
    MQTT_BUFFER_PUT_SHORT( &frame->buffer, args->keep_alive );
#else
    MQTT_BUFFER_PUT_SHORT( &frame->buffer, 0 ); /* Keep alive for now */
#endif

Note that LWT may not behave as expected if keep alive is set to zero.

According to the MQTT 3.1.1 specification, the server may still choose to disconnect the client:

     A Keep Alive value of zero (0) has the effect of turning off the keep alive mechanism. This means that, in this case, the Server is not required to disconnect the Client on the grounds of inactivity.
     Note that a Server is permitted to disconnect a Client that it determines to be inactive or non-responsive at any time, regardless of the Keep Alive value provided by that Client.

Also, the server may not publish the LWT when keep alive is off, unless other mechanisms are used to detect an unexpected disconnect.

SDK 4.0.1 fixed this in exactly the same way as above, and it "looks" good to me.

I'll report back if I find any issue.
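
To make the keep-alive discussion concrete, here is a small sketch of how the Keep Alive field is laid out and what it implies for the broker. In MQTT 3.1.1 the field is a 16-bit value in seconds, sent most significant byte first, and a server that receives nothing from the client for one and a half times a non-zero Keep Alive must drop the connection. The helper names put_u16_be and server_timeout_ms are hypothetical and are not SDK functions; the SDK's own MQTT_BUFFER_PUT_SHORT macro is assumed to do the equivalent of the first one.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit value most significant byte first, as MQTT 3.1.1 encodes
 * the Keep Alive field of the CONNECT packet. Returns the bytes written. */
size_t put_u16_be( uint8_t *buf, uint16_t value )
{
    buf[0] = (uint8_t) ( value >> 8 );
    buf[1] = (uint8_t) ( value & 0xFF );
    return 2;
}

/* Silence period after which a 3.1.1 server must disconnect the client:
 * 1.5 times the Keep Alive. A result of 0 means keep alive is disabled,
 * so the server has no such deadline. */
uint32_t server_timeout_ms( uint16_t keep_alive_s )
{
    return (uint32_t) keep_alive_s * 1500u;
}
```

With a Keep Alive of 5 seconds the broker should notice a dead client after roughly 7.5 seconds, which is consistent with the report above: a 30-minute delay suggests the CONNECT packet carried 0 rather than 5.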

Anonymous
Not applicable

Adding to this: I would like to know why I keep getting a WICED_MQTT_DISCONNECT_EVENT when I do multiple publishes or subscribes. I have seen this behavior during testing, where our testers run scripts that keep publishing to the same thing. The disconnect event mostly comes from the following code snippet. What is the reason, and what is the solution?

static wiced_result_t mqtt_manager_tick( void* arg )
{
    mqtt_connection_t *conn = (mqtt_connection_t *) arg;

    if ( mqtt_manager( MQTT_EVENT_TICK, NULL, conn ) != WICED_SUCCESS )
    {
        /* Publish is an async method (we don't get an OK), so we simulate the OK after sending it */
        if ( conn->callbacks != NULL )
        {
            wiced_mqtt_event_info_t event;

            event.type = WICED_MQTT_EVENT_TYPE_DISCONNECTED;
            event.data.err_code = WICED_MQTT_CONN_ERR_CODE_INVALID;
            conn->callbacks( (void*) conn, &event );
        }
        return WICED_ERROR;
    }
    return WICED_SUCCESS;
}

Anonymous
Not applicable

Hi,

Any updates?


This "disconnect" seems to be the same thing as "symptom 1" reported in this thread.

The deepest place I traced in the source is in mqtt_network.c:

    static wiced_result_t mqtt_disconnect_callback( wiced_tcp_socket_t *socket, void *args )

The above callback is registered with the network stack when connecting to the broker, and is designed to be called on a "TCP disconnected" event.

I have no idea why it is called so unexpectedly, even when I put any of my boards (BRCM EVB / Avnet EVB / SPIL EVB / each of our customized PCBs) less than 1 meter away from the APs (Apple Airport Extreme / DLink / ASUS / smartphone hotspots), while all other laptops and smartphones in the same office always stay connected.

Currently it looks like Cypress's position is "you developers should just reconnect on disconnected events" (mwf_mmfae, please correct me if I'm wrong). I totally agree that we should implement a reconnect mechanism to recover from cases such as "very long distance", "noisy background" or "poor Wi-Fi AP". But unexpected disconnection in fairly good environments is not acceptable for good, reliable products, is it?

The problem is that you don't know how to implement the reconnect mechanism correctly.

Your logic depends on the events you get; however, sometimes you get an unexpected event and sometimes you don't get the expected one.

AFAICT, something is wrong in a lower layer of the SDK.
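
One way to cope with both failure modes described above (spurious events and missing events) is to keep the callback trivial and let a worker decide when to reconnect. The sketch below is an assumption-laden illustration, not SDK code: link_monitor_t and the monitor_* functions are hypothetical names, timestamps are assumed to come from a millisecond tick counter, and a real RTOS application would additionally need to protect the shared state between the callback and the worker thread.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { LINK_DOWN, LINK_UP } link_state_t;

typedef struct
{
    link_state_t state;
    bool         reconnect_pending; /* set by events, consumed by the worker */
    uint32_t     last_rx_ms;        /* time of the last packet from the broker */
    uint32_t     silence_limit_ms;  /* longest tolerated silence while up */
} link_monitor_t;

void monitor_init( link_monitor_t *m, uint32_t now_ms, uint32_t silence_limit_ms )
{
    m->state = LINK_DOWN;
    m->reconnect_pending = false;
    m->last_rx_ms = now_ms;
    m->silence_limit_ms = silence_limit_ms;
}

/* Call after a successful connect and subscribe. */
void monitor_on_connected( link_monitor_t *m, uint32_t now_ms )
{
    m->state = LINK_UP;
    m->reconnect_pending = false;
    m->last_rx_ms = now_ms;
}

/* Call for every packet received from the broker, including ping responses. */
void monitor_on_rx( link_monitor_t *m, uint32_t now_ms )
{
    m->last_rx_ms = now_ms;
}

/* Call from the disconnect callback: it only raises a flag, so repeated or
 * stale events collapse into a single pending reconnect. */
void monitor_on_disconnect_event( link_monitor_t *m )
{
    if ( m->state == LINK_UP )
    {
        m->reconnect_pending = true;
    }
}

/* Call periodically from the worker. Returns true when a reconnect should be
 * started, either because an event arrived or because the link went silent
 * without any event. Unsigned subtraction tolerates tick wraparound. */
bool monitor_poll( link_monitor_t *m, uint32_t now_ms )
{
    if ( m->state == LINK_UP &&
         (uint32_t) ( now_ms - m->last_rx_ms ) > m->silence_limit_ms )
    {
        m->reconnect_pending = true;
    }
    if ( !m->reconnect_pending )
    {
        return false;
    }
    m->reconnect_pending = false;
    m->state = LINK_DOWN;
    return true;
}
```

This does not explain why the TCP disconnect callback fires in a good radio environment; it only keeps the application's reaction to it bounded and predictable.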

Anonymous
Not applicable

Hi anandram,

You need logic such that, if the internet connection goes down and then comes back, you re-initialize the whole MQTT process.

Just send an HTTP GET request to google.com and check whether it returns a response.

If it does, the Wi-Fi module is successfully connected to the internet; otherwise the internet connection has been lost.

Thanks & Regards

Chintan patel
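
The suggestion above amounts to detecting the moment connectivity comes back and re-initializing MQTT once at that point. A minimal sketch of that edge detection follows; net_watch_t and net_watch_update are hypothetical names, and the result of the connectivity probe (for example, whether the HTTP GET succeeded) is assumed to be passed in as online_now.

```c
#include <stdbool.h>

typedef struct
{
    bool was_online; /* result of the previous connectivity probe */
} net_watch_t;

/* Feed in the latest probe result. Returns true exactly once per
 * offline-to-online transition, which is when MQTT should be
 * re-initialized. */
bool net_watch_update( net_watch_t *w, bool online_now )
{
    bool restored = online_now && !w->was_online;
    w->was_online = online_now;
    return restored;
}
```

Initializing was_online to true at startup, after the first successful connect, avoids a redundant re-initialization on the first probe.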
