1 Reply Latest reply on Mar 2, 2020 11:00 PM by DheerajK_81

    Seems like GraphicLCDIntf_WriteM8 could be optimized ...

    LaPe_296836

      The generated code is:

       

      void LCD_INTERFACE_WriteM8(uint8 d_c, uint8 wrData[], uint16 num)

      {

          uint32 i;

         

          for(i = 0u; i < num; i++)

          {

              while((LCD_INTERFACE_STATUS_REG & LCD_INTERFACE_CMD_QUEUE_FULL) != 0u)

              {

                  /* The command queue is full */

              }  

              LCD_INTERFACE_CMD_FIFO_REG = d_c; 

       

              #if (LCD_INTERFACE_BUS_WIDTH == 16u)

                  CY_SET_REG16(LCD_INTERFACE_DATA_FIFO_PTR, wrData[i]);

              #else /* 8-bit interface */

                  LCD_INTERFACE_DATA_FIFO_REG = wrData[i];

              #endif /* LCD_INTERFACE_BUS_WIDTH == 16u */

          }

      }

       

      Is it really necessary to write the d_c value every time in the loop?

       

      Also, it would be nice to have a version of this in hand-written assembler optimized for the case where the array length is a multiple of 3 (for RGB). In that case you could unroll the loop by 3 without any complexity overhead.  The current inner loop is compiled (for speed) as:

       

      #           .L21:
      # 2378      ldrb  r3, [r4]        // Get FIFO full flag
      # DB07      lsls  r3, r3, #31     // Check LSB
      # FCD4      bmi   .L21            // If set, loop
      # 3070      strb  r0, [r6]        // Write the D/CX value (1 for data), req'd by HW?  Or maybe not?
      # 11F8013B  ldrb  r3, [r1], #1    // Get the next pixel value to send, increment source pointer
      # 2B70      strb  r3, [r5]        // Store pixel value
      # 9142      cmp   r1, r2          // Is source pointer at the end?
      # F6D1      bne   .L21            // If not, loop

       

      Sorry for the table inserted above -- not sure why I couldn't get around that.

       

      So, with unrolling we'd save 2 out of 3 instances of:

          cmp r1, r2

          bne .L21

       

      Overall, that would go from an average of about 15 cycles per loop iteration to about 11 cycles per iteration!
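Before resorting to hand-written assembler, the unroll-by-3 idea can be sketched in C. This is a host-side sketch only: the `mock_*` variables and the `write_one` helper are hypothetical stand-ins for the generated `LCD_INTERFACE_*` registers, and `write_m8_unrolled3` is an illustrative name, not part of the component API. It assumes `num` is a multiple of 3 and still writes the D/C value for every data byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_CMD_QUEUE_FULL 0x01u
#define MOCK_LOG_SIZE       64u

/* Hypothetical stand-ins for the hardware registers (not the real ones). */
static volatile uint8_t mock_status = 0u;      /* bit 0 = command queue full */
static uint8_t mock_cmd_log[MOCK_LOG_SIZE];    /* D/C values "written" */
static uint8_t mock_data_log[MOCK_LOG_SIZE];   /* data bytes "written" */
static size_t  mock_count = 0u;                /* number of write cycles */

/* One write cycle: wait for room, then write D/C and the data byte. */
static inline void write_one(uint8_t d_c, uint8_t value)
{
    while ((mock_status & MOCK_CMD_QUEUE_FULL) != 0u)
    {
        /* The command queue is full */
    }
    if (mock_count < MOCK_LOG_SIZE)
    {
        mock_cmd_log[mock_count]  = d_c;
        mock_data_log[mock_count] = value;
    }
    mock_count++;
}

/* Sends num bytes, three per loop pass; num must be a multiple of 3. */
void write_m8_unrolled3(uint8_t d_c, const uint8_t *wrData, uint16_t num)
{
    const uint8_t *end = wrData + num;

    assert((num % 3u) == 0u);

    while (wrData != end)
    {
        write_one(d_c, *wrData++);   /* R */
        write_one(d_c, *wrData++);   /* G */
        write_one(d_c, *wrData++);   /* B */
    }
}
```

Whether the compiler actually emits one compare-and-branch per three writes, and whether the saving is measurable, would need to be confirmed in the generated listing.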

       

      I was hoping for a write cycle time of 40 ns (the hardware interface speed), and the best I'm seeing is 280 ns.

       

      Also, I'm currently running Cortex M4 at 100 MHz.  I would like to run at 150 MHz (on CY8CPROTO-063 board).
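As a rough check of these figures: at 100 MHz one CPU cycle takes 10 ns, so a 40 ns write leaves 4 cycles per byte, and the observed 280 ns corresponds to 28 cycles. At 150 MHz, 40 ns is 6 cycles. The small helper below (`cycles_in_ns` is a hypothetical name used only for this illustration) does the same arithmetic.

```c
#include <stdint.h>

/* Number of whole CPU cycles that fit in a time window of ns nanoseconds. */
static uint32_t cycles_in_ns(uint32_t cpu_hz, uint32_t ns)
{
    return (uint32_t)(((uint64_t)cpu_hz * ns) / 1000000000u);
}
```

Since the compiled inner loop quoted above is 8 instructions, a polled software loop of this shape would not be expected to reach 40 ns per write at either clock speed.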

       

      Thanks for any help!

       

      Larry

        • 1. Re: Seems like GraphicLCDIntf_WriteM8 could be optimized ...
          DheerajK_81

          Yes, it is necessary to write the d_c line in every write cycle.  Each write cycle (a WRX high-low-high sequence) consists of 3 control signals (DCX, RDX, WRX) and the data signals (DB[17:0]). DCX is a control signal that tells whether the value on the data signals is a command or data: the data signals carry a command when DCX is low ('0') and data when DCX is high ('1').

          [Attached image: graphicslcd.PNG]

           

          When I say it's necessary, I mean with respect to the CY8CKIT-028-TFT Shield, which has the NewHaven Display (NHD-2.4-240320CF-CTXI-F). Please refer to its datasheet for more information on this display.

           

          If your display's datasheet specifies a write cycle sequence that does not need the d_c line to be driven in each write, you can move the d_c write outside the for loop. Hope this makes sense.
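A minimal host-side sketch of that variant follows. It assumes a display that latches D/C between writes. Whether the interface component itself still needs one command FIFO entry per data word is not confirmed here and should be checked against the component datasheet. All `mock_*` names and `write_m8_dc_once` are hypothetical stand-ins, not the generated API.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_QUEUE_FULL 0x01u   /* placeholder status bit */
#define MOCK_LOG_SIZE   64u

/* Hypothetical stand-ins for the hardware registers. */
static volatile uint8_t mock_status = 0u;
static uint8_t mock_cmd_reg = 0u;
static size_t  mock_cmd_writes = 0u;
static uint8_t mock_data_log[MOCK_LOG_SIZE];
static size_t  mock_data_writes = 0u;

void write_m8_dc_once(uint8_t d_c, const uint8_t wrData[], uint16_t num)
{
    uint32_t i;

    /* D/C is written once, before the loop (assumption: it stays latched). */
    mock_cmd_reg = d_c;
    mock_cmd_writes++;

    for (i = 0u; i < num; i++)
    {
        while ((mock_status & MOCK_QUEUE_FULL) != 0u)
        {
            /* Wait for room; the flag to poll depends on the real interface */
        }
        if (mock_data_writes < MOCK_LOG_SIZE)
        {
            mock_data_log[mock_data_writes] = wrData[i];
        }
        mock_data_writes++;
    }
}
```

This removes one store per iteration from the loop body, but it is only valid if both the display and the interface hardware accept data writes without a matching D/C write.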

           

          To operate your application at 150 MHz, you can make use of the PLL. In the Design Wide Resources tab, click Clocks > Edit Clocks. Under the FLL/PLL tab, enable the PLL and set the frequency to 150 MHz.

           

          Now go to the High Frequency Clocks tab and change Clk_HF0 to use Path1 (the PLL output). You will need to change the divider for Clk_Peri to 2 (giving 75 MHz), since it has a legal maximum of 100 MHz.  Set the divider for Clk_Fast to 1 to get 150 MHz. Clk_Fast is the clock used by the CM4.
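The divider choices can be sanity-checked with simple arithmetic. The sketch below uses hypothetical names (`divided_hz`, `CLK_PERI_MAX_HZ`) and only the figures quoted in this reply. It does not call any clock API.

```c
#include <stdint.h>

#define CLK_HF0_HZ      150000000u  /* PLL output routed to Clk_HF0 */
#define CLK_PERI_MAX_HZ 100000000u  /* Clk_Peri limit quoted above */

/* Frequency of a clock derived from source_hz through an integer divider. */
static uint32_t divided_hz(uint32_t source_hz, uint32_t divider)
{
    return source_hz / divider;
}
```

With a divider of 1, Clk_Peri would run at 150 MHz and exceed its limit; a divider of 2 gives 75 MHz, while Clk_Fast with a divider of 1 stays at the full 150 MHz.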

           

          Hope this helps!

           

          Regards,

          Dheeraj