The previous implementation of copy_from_pm assumed the destination buffer was aligned on a 16-bit boundary. Cortex-M0/M0+ cores (STM32F0*, STM32L0*) do not support unaligned accesses, so a 16-bit store to an odd address causes a hard fault. This implementation is from Kuldeep Dhaka's tree: it does a 16-bit copy only if the destination buffer is 16-bit aligned, and falls back to a bytewise copy otherwise. Fixes GH issues #401, #461.
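
For illustration, the aligned/unaligned split described above might look roughly like the sketch below. This is not the code from the tree: the name copy_from_pm_sketch, the parameter names, and the odd-length handling are assumptions, and it assumes a little-endian target whose source memory can be read as consecutive 16-bit halfwords.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the libopencm3 source.
 * Assumes a little-endian target and a source region that is readable
 * as consecutive 16-bit halfwords.
 */
static void copy_from_pm_sketch(void *buf, const volatile void *vpm, uint16_t len)
{
	const volatile uint16_t *pm = vpm;
	uint8_t *dst = buf;
	uint16_t halfwords = len >> 1;

	if ((uintptr_t)dst & 1u) {
		/* Odd destination address: split each halfword into two byte stores. */
		for (; halfwords; pm++, halfwords--) {
			uint16_t value = *pm;
			*dst++ = (uint8_t)(value & 0xffu);
			*dst++ = (uint8_t)(value >> 8);
		}
	} else {
		/* Even destination address: a 16-bit store cannot fault on alignment. */
		for (; halfwords; pm++, dst += 2, halfwords--) {
			*(uint16_t *)(void *)dst = *pm;
		}
	}

	/* Odd length: copy the low byte of the final halfword. */
	if (len & 1u) {
		*dst = (uint8_t)(*pm & 0xffu);
	}
}
```

Only the destination alignment is tested; the source is always read 16 bits at a time, so the sketch relies on the source pointer itself being halfword aligned.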