Aparapi cpu gpu benchmarks: Levenshtein distance

In this test i tried to adapt the classic Levenshtein_distance algorithm to Aparapi. What i have did was to basically calculate in parallel the values for each color.
Lest say:
A = the first string,
B = the second string
then my algorithm runs the same kernel N = ( len(A)+len(B)-1) times ( this number is equals with the number of different colors ).

I executed the algorithm on some random generated string and here are the results:

The GPU time is 37X slower than the iterative version for the 1024*4 length strings and 14X slower for the 1024*14 length string. Why is this?
I did some testing and the 2 most probably caused are:
a) Kernel start time, if you run the same kernel multiple times even if it doesn't do nothing it takes around 0.6 ms when run in GPU mode and 0.125 ms when run in CPU mode ( the 2 times are without buffer copy and java to opencl code translation ).
b) Low number of operations executed by each kernel run. The maximum number of operations executed at on time is len(B).

What did i learned from this? When you start a kernel make sure it has a lot of data to process at least millions if not billions of operations. If you start a kernel to make 10 000, 100 000 even 1 000 000 if will be slower than running that code on the CPU.

Results
Java code

3 comments:

Hooji11 March 2013 at 08:44
What software do you use to generate your graphs? It is very nice.
vortex11 March 2013 at 11:06
I just used google docs, it very nice because the changes that you do to the graphs are reflected instantly on the blog
Anonymous9 April 2014 at 05:44
It seems weird to me. I think GPU must be faster relatively as the string length getting longer. It may be caused when the group size is set unproperly... I'd like to check it myself soon

Wednesday, 2 January 2013

Levenshtein distance

3 comments: