Since we recently switched from -O0 to -Os, the delay loop needs a higher
iteration count and an __asm__("nop") in its body; otherwise the compiler
optimizes the loop away.
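
For illustration, a minimal sketch of such a loop, assuming GCC or Clang.
The function name delay_loop and its count parameter are hypothetical and
not taken from this change:

```c
#include <stdint.h>

/*
 * Hypothetical busy-wait delay (name and signature are illustrative only).
 * The inline "nop" is a side effect the compiler must keep, so the loop
 * is not removed at -Os. "volatile" is spelled out for clarity; GCC
 * already treats an asm statement without operands as volatile.
 */
static void delay_loop(uint32_t count)
{
    uint32_t i;

    for (i = 0; i < count; i++)
        __asm__ volatile ("nop");
}
```

The count has no fixed relation to wall-clock time; it depends on the CPU
clock and on the code the compiler generates.
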
The real fix is to add a proper timer-based delay function, of course.
Also, fix a bunch of cosmetic issues and typos.