I was having problems with poor texturing and pixel generation performance and this led me to a discovery showing that we've got some memory bandwidth problem.
I have found a memory benchmark for ARM somewhere and here is the result: (tested with CPU forced to 800 MHz)
For comparison take a look at some results of a budget ARMv5 based PXA168:Code:# ./bench -c *** Memory Write Throughput (in MB/s) *** method uncached write alloc cached cached + wa STRB 20.13 111.72 21.09 703.36 STR 80.51 116.05 96.58 2360.09 STM4 296.82 151.88 362.11 4626.08 STM8 472.76 151.83 483.36 4627.50 STRD 161.36 137.92 166.89 3869.30 *** Uncached Memory Read/Copy Throughput (in MB/s) *** method PLD=0 PLD=1 PLD=2 PLD=4 PLD=8 PLD=16 PLD=32 PLD=64 PLD=128 LDRB 162.24 144.11 258.62 274.32 269.98 263.23 260.21 259.76 258.64 LDR 193.90 193.66 310.99 351.18 375.03 367.31 360.99 358.13 355.26 LDM 204.18 193.68 306.79 353.04 384.39 381.31 392.85 388.84 385.59 LDRD 204.21 203.42 305.01 351.89 383.71 391.90 384.32 376.56 373.59 CPY_B 19.77 19.60 19.82 20.84 20.51 20.44 20.34 20.32 20.18 CPY_Bwa 72.03 70.54 75.73 64.99 62.53 63.87 68.30 67.75 65.36 CPY_R 76.16 75.08 75.27 75.62 70.64 68.59 67.59 64.94 66.95 CPY_Rwa 82.43 78.56 88.69 85.83 82.24 87.70 87.13 87.24 86.89 CPY_M 172.33 165.57 218.40 273.56 249.96 234.81 226.51 212.76 209.11 CPY_Mwa 87.58 82.60 88.01 89.85 88.74 89.15 88.99 89.15 88.97 CPY_D 143.90 139.86 142.20 148.45 132.86 126.41 123.19 121.28 120.09 CPY_Dwa 85.68 82.83 100.70 101.00 102.80 104.21 104.64 105.03 104.63 *** Cached Memory Read/Copy Throughput (in MB/s) *** method PLD=0 PLD=1 PLD=2 PLD=4 PLD=8 PLD=16 PLD=32 PLD=64 PLD=128 LDRB 679.48 684.08 606.73 622.53 634.39 641.44 644.26 646.50 646.85 LDR 2358.92 2147.46 1534.96 1727.22 1847.36 1909.69 1942.35 1960.75 1970.66 LDM 3896.89 3881.15 2329.95 2860.00 3092.73 3215.32 3290.66 3324.51 3339.91 LDRD 3883.67 3341.83 1738.20 1960.19 2425.67 2521.37 2573.97 2600.14 2610.03 CPY_B 361.39 355.13 333.20 338.04 342.50 342.95 345.25 345.59 345.30 CPY_Bwa 363.01 357.77 335.12 340.67 344.09 345.78 346.47 347.72 346.73 CPY_R 1311.97 1243.69 1011.94 1053.65 1089.78 1110.73 1119.19 1123.18 1127.48 CPY_Rwa 1314.46 1251.09 1013.02 1058.00 1093.04 1113.29 1124.37 1126.51 1132.56 CPY_M 2363.95 1975.07 1521.41 1865.39 2015.65 2086.81 2117.08 2135.36 2143.50 CPY_Mwa 2154.73 1827.76 1430.19 1760.06 1862.13 1915.94 1848.40 1962.59 1966.80 CPY_D 2361.09 2148.24 1531.07 1727.52 1844.28 1907.03 1941.78 1959.55 1964.37 CPY_Dwa 2358.54 2141.83 1531.46 1637.18 1725.21 1770.53 1794.15 1811.14 1818.27
memspeed on Aspenite — ARM, OMAP, Xscale Linux Kernel
The most alarming are cached writes and uncached reads, which look really slow.
I have attached sources and binary of the benchmark I used:
memspeed.tar.gz


8Likes
LinkBack URL
About LinkBacks

Reply With Quote

Bookmarks