This speeds up memcpy by copying a word at a time if source and destination are
aligned in mod 4. That is, if n and m are a positive integer:
4n -> 4m: aligned, 4x speed.
4n -> 4m+1: misaligned.
4n+1 -> 4m+1: aligned in mod 4, 4x speed.
Ran the unit test on Peppy:
> runtest
...
Running test_memcpy... (speed gain: 120300 -> 38103 us) OK
...
Ran make buildall -j:
...
Running test_memcpy... (speed gain: 2084 -> 549 us) OK
...
Note misaligned case is also optimized. Unit test runs in 298 us on Peppy while
it takes about 475 with the original memcpy.
TEST=Described above.
BUG=chrome-os-partner:23720
BRANCH=none
Signed-off-by: Daisuke Nojiri <dnojiri@chromium.org>
Change-Id: Ic12260451c5efd0896d6353017cd45d29cb672db
Tested-by: Daisuke Nojiri <dnojiri@google.com>
Reviewed-on: https://chromium-review.googlesource.com/185618
Reviewed-by: Randall Spangler <rspangler@chromium.org>
Reviewed-by: Vincent Palatin <vpalatin@chromium.org>
Commit-Queue: Daisuke Nojiri <dnojiri@google.com>