5 Replies Latest reply on Oct 25, 2020 1:46 AM by NiLe_4796031

    ble busy vs. throughput

    NiLe_4796031

      I'm having trouble getting a sustained 1.2 Mbps from the BLE on the PSoC 63 dev kit. I'm setting up a connection with the 2M LE PHY and running this hot loop to push out notification packets as quickly as possible:

       

          static uint32 buffer[NOTIFY_MAX_LEN];  // NOTIFY_MAX_LEN is in bytes, but we allocate uint32 here for alignment

          memset(buffer, 0, NOTIFY_MAX_LEN);

          notify_packet.handleValPair.value.len = NOTIFY_MAX_LEN;

          notify_packet.handleValPair.value.val = (uint8_t *)buffer;  // val is a byte pointer

         

          uint32_t *packetno = &buffer[0];

          uint32_t *busyno = &buffer[1];  // 4 bytes into buffer

       

          while (1) {

              Cy_BLE_ProcessEvents();

              if (sampling) {

                  if (Cy_BLE_GATT_GetBusyStatus(notify_packet.connHandle.attId) == CY_BLE_STACK_STATE_FREE) {

                      cy_en_ble_api_result_t api_result = Cy_BLE_GATTS_Notification(&notify_packet);

                      if (api_result != CY_BLE_SUCCESS) {

                          CY_ASSERT(false);

                      }

                      ++*packetno;

                      *busyno = 0;

                  } else {

                      ++*busyno;

                  }

              }

          }

       

      I'm getting timing problems correlated with 'busy' responses from the BLE stack. Each line of the following data is a timestamp (seconds within the wall-clock minute), followed by p: packetno and b: busyno.

       

      29.5292574 p: 0  b: 0

      29.5322507 p: 1  b: 0

      29.5352564 p: 2  b: 0

      29.5372934 p: 3  b: 0

      29.5412882 p: 4  b: 59128

      29.5432429 p: 5  b: 0

      29.5462694 p: 6  b: 3230

      29.5492359 p: 7  b: 0

      29.5522354 p: 8  b: 5328

      29.5552358 p: 9  b: 0

      29.5582419 p: 10  b: 3230

      29.5602476 p: 11  b: 0

      29.5642403 p: 12  b: 5278

      29.5662404 p: 13  b: 0

      29.5692399 p: 14  b: 3229

      29.5722412 p: 15  b: 0

      29.5752360 p: 16  b: 5321

      29.5782358 p: 17  b: 0

      29.5812354 p: 18  b: 3230

      29.5872353 p: 19  b: 0

      29.5912468 p: 20  b: 5072

      29.6480122 p: 21  b: 0

      29.6508323 p: 22  b: 6238

      29.6528226 p: 23  b: 0

      29.6568276 p: 24  b: 54304

      29.6590555 p: 25  b: 0

      29.7081333 p: 26  b: 3229

      29.7109300 p: 27  b: 0

      29.7138875 p: 28  b: 46741

      29.7158918 p: 29  b: 0

      29.7189114 p: 30  b: 3217

      29.7219352 p: 31  b: 0

      29.7253067 p: 32  b: 5291

      29.7282809 p: 33  b: 0

      29.7301239 p: 34  b: 3229

      29.7331253 p: 35  b: 0

      29.7361498 p: 36  b: 5339

      29.7391269 p: 37  b: 0

      29.7425234 p: 38  b: 3230

      29.7445339 p: 39  b: 0

      29.7485335 p: 40  b: 5285

      You can see that packet 24 carries a large busyno count of 54304, and shortly afterwards there is a gap of about 0.05 seconds (between packets 25 and 26). A similar gap appears between packets 20 and 21. Notice how 21 packets were sent during [29.52s .. 29.60s) vs. 5 packets during [29.60s .. 29.70s). The throughput would be fine if not for these occasional large gaps, which cause the average throughput to drop by about 300 kbps (measured over a minute).
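To make the gaps easier to spot in a longer log, here is a minimal sketch that counts inter-packet gaps above a threshold. The function name `count_gaps`, the array `sample_ts`, and the 20 ms threshold used below are illustrative assumptions; the timestamps are copied from packets 19 to 27 in the log above.

```c
#include <assert.h>
#include <stddef.h>

/* Count inter-packet gaps longer than threshold_s seconds.
 * ts[] holds receive timestamps in seconds, in arrival order.
 * If longest_s is non-NULL, it receives the longest gap seen. */
size_t count_gaps(const double *ts, size_t n, double threshold_s,
                  double *longest_s)
{
    assert(ts != NULL);
    size_t gaps = 0;
    double longest = 0.0;
    for (size_t i = 1; i < n; ++i) {
        double dt = ts[i] - ts[i - 1];
        if (dt > longest) {
            longest = dt;
        }
        if (dt > threshold_s) {
            ++gaps;
        }
    }
    if (longest_s != NULL) {
        *longest_s = longest;
    }
    return gaps;
}

/* Timestamps of packets 19..27, copied from the log above. */
static const double sample_ts[] = {
    29.5872353, 29.5912468, 29.6480122, 29.6508323, 29.6528226,
    29.6568276, 29.6590555, 29.7081333, 29.7109300,
};
```

Calling `count_gaps(sample_ts, 9, 0.02, &longest)` on this excerpt flags the two long gaps (after packets 20 and 25), while the normal spacing of 2 to 4 ms stays below the threshold.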

       

      There's a related question, "BLE stack busy prevents notification sending", which suggests changing the queue depth by right-clicking on something, but I've looked in PSoC Creator and there's no such option. cy_ble_stack.h has a comment with the instructions:

           *  To increase the BLE Stack's default queue depth(CY_BLE_L2CAP_STACK_Q_DEPTH_PER_CONN) and achieve better throughput for the attribute MTU greater than 32,

           *  use the AddQdepthPerConn parameter in the 'Expression View' of the Advanced tab in the BLE component GUI. To Access the 'Expression View', right click on

           *  the 'Advanced' tab in th BLE Component GUI and select the 'Show Expression View' option.

      but I don't see any Expression View or Show Expression View. There is an advanced tab but no right-click option.

       

      What can I do to improve notification throughput in my setup: a single connection on the 2M PHY, notifications on one characteristic, and the BLE stack running single-core on the CM4?

        • 1. Re: ble busy vs. throughput
          GaneshD_41

          Hi,

           

          1. but I don't see any Expression View or Show Expression View. There is an advanced tab but no right-click option.

           

          To enable the Expression View, go to Tools --> Options --> Design Entry --> Component Catalog and check "Enable Param Edit views", as shown in the image below:

           

          [image: expn_view.PNG]

           

          2. What is the throughput you are getting with PSoC 6 Throughput out of the box code example?

          https://www.cypress.com/documentation/code-examples/ce222046-psoc-6-mcu-bluetooth-low-energy-ble-connectivity-ble-throug…

           

          3. Are you using our development kits for both Server and Client? Also, what is the distance between the Server and the Client?

           

          Please try to increase the output power level of TX and see if there is any improvement. If possible please attach your Client and Server projects for us to review the firmware and settings in detail.

           

          Thanks

          Ganesh

          • 2. Re: ble busy vs. throughput
            NiLe_4796031

            1. Thank you! I've set the queue depth max to 100 and the behaviour has changed. I'm now seeing long stretches with zero busy counts, but there are still delays without the stack reporting busy:

            21.8175873 p: 29  b: 0

            21.8205914 p: 30  b: 0

            21.8235466 p: 31  b: 0

            21.8266331 p: 32  b: 0

            21.8520539 p: 33  b: 0

            21.8540651 p: 34  b: 0

            21.8570655 p: 35  b: 0

            21.8600660 p: 36  b: 0

            21.8630542 p: 37  b: 0

            21.9120541 p: 38  b: 0

            21.9150539 p: 39  b: 0

            21.9172115 p: 40  b: 0

            21.9202087 p: 41  b: 0

            21.9230543 p: 42  b: 0

            21.9260539 p: 43  b: 0

            21.9280571 p: 44  b: 0

            21.9716949 p: 45  b: 0

            21.9746477 p: 46  b: 0

            21.9768011 p: 47  b: 0

            21.9798032 p: 48  b: 0

            and when the stack does go busy, it goes busy for a long time (subset of packets marked with busy != 0):

            22.4665515 p: 104  b: 677682

            23.0663724 p: 208  b: 507997

            23.6720110 p: 312  b: 511308

            24.3261274 p: 416  b: 555815

            25.0486755 p: 520  b: 618584

            25.4658381 p: 624  b: 339045

            26.0515141 p: 728  b: 444960

            26.6061265 p: 833  b: 511021

            27.1727199 p: 937  b: 476721

            27.7750679 p: 1041  b: 508591

            Ultimately I'm getting the same throughput for a minute-long test run.
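As a rough cross-check of the numbers above, the average packet rate can be computed from any two log entries. This is only a sketch: the helper names `packet_rate` and `throughput_kbps` are made up for illustration, and the payload size has to be supplied by the caller because NOTIFY_MAX_LEN is not stated in this thread.

```c
#include <assert.h>

/* Average packet rate (packets per second) between two log entries. */
double packet_rate(unsigned first_pkt, double first_ts,
                   unsigned last_pkt, double last_ts)
{
    assert(last_ts > first_ts && last_pkt >= first_pkt);
    return (double)(last_pkt - first_pkt) / (last_ts - first_ts);
}

/* Payload throughput in kbps for a given packet rate.
 * payload_bytes is an assumption supplied by the caller. */
double throughput_kbps(double pkts_per_s, unsigned payload_bytes)
{
    return pkts_per_s * payload_bytes * 8.0 / 1000.0;
}
```

Between the first and last busy entries listed above (p: 104 at 22.4665515 and p: 1041 at 27.7750679), `packet_rate` gives roughly 176.5 packets per second, gaps included.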

             

            2/3. I think we need two kits to run the benchmark test? I've looked at its code closely and copied everything that looked even plausibly relevant. I've ordered a second kit to run the test with; it should arrive later this week.

             

            The client is a Windows 10 laptop (Dell XPS 15 7590 -- Killer AX1650[1]) which is about 25cm away from the PSoC 6 dev kit. I tried turning up the connection TX power from default 0 dBm to 4 dBm and didn't notice any difference.

             

            Note that the messages are only notifications so there shouldn't be any resending, and the packet numbers ("p:") I'm quoting are in the notification data itself, so any dropped messages would show a skipped counter. I would expect TX power problems to cause dropped packets, not delayed packets? Or is there some other bidirectional communication between the two parties that is renegotiating the channel as long as new data is being transmitted?

             

            [1] Dell support calls it a "Killer 1650x", and I can't find any evidence that Killer supports BLE as opposed to just BT, but I can't find any other communication module on this laptop, so I assume that's it?

            • 3. Re: ble busy vs. throughput
              NiLe_4796031

              2/3. I think we need two kits to run the benchmark test? I've looked at its code closely and copied everything that looked even plausibly relevant. I've ordered a second kit to run the test with; it should arrive later this week.

              Here's the output with the original code:

              *****************CE222046: PSoC 6 MCU BLE Throughput Measurement *****************
              Role : Client (GATT IN)
              **********************************************************************************

              Scanning for GAP Peripheral with address: 00:A0:50:AA:BB:FF

              Found target device with address: 00:A0:50:AA:BB:FF

              Scan stopped as device was found. Initiating Connection...

              Connected to Device

              Throughput is: 1062 kbps.
              Throughput is: 1177 kbps.
              Throughput is: 1162 kbps.
              Throughput is: 1131 kbps.
              Throughput is: 1203 kbps.
              Throughput is: 1187 kbps.
              Throughput is: 1185 kbps.
              Throughput is: 1218 kbps.
              Throughput is: 1190 kbps.
              Throughput is: 1179 kbps.
              Throughput is: 1185 kbps.
              Throughput is: 1157 kbps.
              Throughput is: 1203 kbps.
              Throughput is: 1197 kbps.
              Throughput is: 1137 kbps.
              Throughput is: 1191 kbps.
              Throughput is: 1208 kbps.
              Throughput is: 1128 kbps.
              Throughput is: 1228 kbps.
              Throughput is: 1186 kbps.
              Throughput is: 1161 kbps.
              Throughput is: 1212 kbps.
              Throughput is: 1116 kbps.
              Throughput is: 1181 kbps.
              Throughput is: 1199 kbps.
              Throughput is: 1152 kbps.
              Throughput is: 1155 kbps.
              Throughput is: 1194 kbps.
              Throughput is: 1215 kbps.
              The first obvious difference is that I'm running it single-core on CM4. I took CE222046_GATT_Out and set the BLE block to CM4 and replaced main_cm0p.c's main with:

              int main(void)

              {

                  cy_en_ble_api_result_t apiResult;

                  __enable_irq(); /* Enable global interrupts. */

                  Cy_SysEnableCM4(CY_CORTEX_M4_APPL_ADDR);

                  while (1) {

                      Cy_SysPm_CpuEnterSleep(CY_SYSPM_WAIT_FOR_INTERRUPT);

                  }

              }

              and that causes a slightly lower bandwidth:

              Scanning for GAP Peripheral with address: 00:A0:50:AA:BB:FF

              Found target device with address: 00:A0:50:AA:BB:FF

              Scan stopped as device was found. Initiating Connection...

              Connected to Device

              Throughput is: 819 kbps.

              Throughput is: 1135 kbps.

              Throughput is: 1177 kbps.

              Throughput is: 1141 kbps.

              Throughput is: 928 kbps.

              Throughput is: 977 kbps.

              Throughput is: 991 kbps.

              Throughput is: 994 kbps.

              Throughput is: 938 kbps.

              Throughput is: 966 kbps.

              Throughput is: 971 kbps.

              Throughput is: 968 kbps.

              Throughput is: 1001 kbps.

              Throughput is: 926 kbps.

              Throughput is: 1025 kbps.

              Throughput is: 1012 kbps.

              Throughput is: 976 kbps.

              Throughput is: 1010 kbps.

              Throughput is: 1038 kbps.

              Throughput is: 1034 kbps.

              Throughput is: 1050 kbps.

              Throughput is: 1054 kbps.

              Throughput is: 1079 kbps.

              Throughput is: 1075 kbps.

              Throughput is: 1081 kbps.

              Throughput is: 1116 kbps.

              Throughput is: 1071 kbps.

              Throughput is: 1089 kbps.

              Throughput is: 1111 kbps.

              Throughput is: 1099 kbps.

              but still better than I'm measuring through to my laptop.

              • 4. Re: ble busy vs. throughput
                NiLe_4796031

                I tried switching this around a bit and shrank the notify packets down to 1 byte of user data. To my surprise, the blocks of time where nothing is being transmitted still remain despite the lower total bandwidth!

                 

                At this stage, I'd like to know what the CPU is doing that corresponds to those gaps. Even just a lot of samples of PC at random times would suffice. I've been unable to figure out how to do that. I'm looking into two approaches and would appreciate any input on either:

                1. software: use a timer-driven interrupt and attempt to read out the thread-mode PC from the interrupt handler. Is the previous PC in the LR register in the interrupt handler? How do I read it?

                2. hardware, such as ETM or ITM. I don't have any ARM/Cortex debug cables; I have a Saleae logic analyzer (reads voltages only, cannot transmit) and a "JTagulator". Do I need to buy a ULINKpro, or is there something simpler I can use?

                • 5. Re: ble busy vs. throughput
                  NiLe_4796031

                  If possible please attach your Client and Server projects for us to review the firmware and settings in detail.

                  If you have a github username you can send me then I can give you read permissions to the client project.