I ran some tests and they show something different.
See the attached project and use it as suggested, with zero, two, or four input/output pins. If your values differ from mine, try playing around with the optimization settings.
Some of the flash bytes are used to program the control registers of the modules at startup. To save flash, you can use the ECC part of the memory to store this initialization data.
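To show why that startup configuration costs flash at all, here is a rough host-side sketch of the general idea, not the code the tool actually generates. All the names (fake_regs, reg_init_t, init_table, apply_init_table) and the register values are made up for illustration, and a plain array stands in for the real memory-mapped control registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the memory-mapped control registers. */
#define REG_COUNT 8u
static uint8_t fake_regs[REG_COUNT];

/* One initialization record: which register to write, and with what value. */
typedef struct {
    uint8_t offset;
    uint8_t value;
} reg_init_t;

/* On a real part a const table like this sits in non-volatile memory,
 * which is why every configured register costs a few bytes.
 * The offsets and values here are invented. */
static const reg_init_t init_table[] = {
    { 0u, 0x3Cu },
    { 2u, 0x01u },
    { 5u, 0xA5u },
};

#define INIT_TABLE_LEN (sizeof init_table / sizeof init_table[0])

/* Copy each record into its register at startup.
 * Records with an out-of-range offset are skipped.
 * Returns the number of records actually applied. */
static size_t apply_init_table(const reg_init_t *table, size_t len,
                               uint8_t *regs, size_t reg_count)
{
    size_t applied = 0;
    for (size_t i = 0; i < len; ++i) {
        if (table[i].offset < reg_count) {
            regs[table[i].offset] = table[i].value;
            ++applied;
        }
    }
    return applied;
}
```

Calling apply_init_table(init_table, INIT_TABLE_LEN, fake_regs, REG_COUNT) once at startup writes the three values. The more modules and pins you configure, the longer a table like this gets, so moving it out of the main flash area frees that space for code.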