For the first test we will calculate the number of divisors for the fist N natural numbers, will test with N = {1048576, 2097152, 3145728, 4194304, 5242880 6291456, 7340032, 8388608, 9437184 }.
The complexity of the algorithm that i have implemented is n*sqrt(n) so normally it will not scale very well with large values of N.
But before we begin let me introduce you to my rig:
CPU: Phenom X4 960T
GPU: Radeon 7770 GHZ Edition
The price of the CPU was almost identical to that of the GPU so i think this is a fair comparison to see how much more performance you can get for the same amount of money if you use a aparapi and a GPU to solve it.
The performance gain is from 17X for the small instances up to 41X for the largest ones. This is almost unbelievable if you take into account how easy was to modify the algorithm into an Aparapi kernel and how cheap is to buy one of this cards.
In the next weeks i will test this technology with some other problems to see how will perform.
Below are the results and the java code i have used for this test:
Results
1: import com.amd.aparapi.Kernel;
2: public class NumberOfDivisors {
3: public static void main(String[] arg) {
4: for (int k = 1; k <= 9; k++) {
5: long min = Long.MAX_VALUE;
6: for (int t = 0; t < 4; t++) {
7: final int[] a;
8: a = new int[1024 * 1024 * k];
9: for (int i = 0; i < a.length; i++) {
10: a[i] = 0;
11: }
12: Kernel kernel = new Kernel() {
13: @Override
14: public void run() {
15: int k = getGlobalId();
16: for (int i = 2; i * i <= k; i++) {
17: if (k % i == 0) {
18: a[k]++;
19: }
20: }
21: }
22: };
23: kernel.setExecutionMode(Kernel.EXECUTION_MODE.JTP);
24: long start = System.currentTimeMillis();
25: kernel.execute(a.length);
26: long end = System.currentTimeMillis();
27: min = Math.min(min, end - start);
28: // System.out.println(end - start + " ms");
29: }
30: System.out.println(k + " " + min + ",");
31: }
32: }
33: }
No comments:
Post a Comment