Since we recently switched from -O0 to -Os, the delay loop needs a higher iteration count and an __asm__("nop") in its body so that the compiler does not optimize the loop away. The real fix is, of course, to add a proper timer-based delay function. Also fix a bunch of cosmetic issues and typos.
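
For reference, a minimal sketch of the kind of busy-wait loop described above. The function name `delay_loop` and its return value are hypothetical and only for illustration; the actual loop and count in the tree may differ. It assumes a GCC-compatible compiler and a target that has a `nop` instruction.

```c
/* Hypothetical busy-wait delay, not the actual function from the tree.
 * Assumes a GCC-compatible compiler and a target with a "nop" instruction. */
unsigned long delay_loop(unsigned long count)
{
    unsigned long i;

    for (i = 0; i < count; i++) {
        /* An asm statement is treated as having side effects, so the
         * compiler has to emit it once per iteration and cannot drop
         * the otherwise empty loop at -Os. */
        __asm__ volatile ("nop");
    }

    /* Returned only so the sketch can be checked on a host build. */
    return i;
}
```

The resulting delay still depends on the clock speed and on the code the compiler generates for the loop, which is why a timer-based delay remains the proper fix.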