I have two functions: one takes an array parameter and the other takes a pointer parameter. They return the same result, but I don't know which one is better to use.

#define SIZE 1024

int sumA(int a[SIZE][SIZE])
{
  int sum = 0;
  for(int y = 0; y < SIZE; y++)
    for(int x = 0; x < SIZE; x++)
      sum += a[x][y];

  return sum;
}

int sumB(int *a)
{
  int sum[4] = {0, 0, 0, 0};

  for( int i = 0; i < SIZE*SIZE; i += 4 )
  {
      sum[0] += a[i+0];
      sum[1] += a[i+1];
      sum[2] += a[i+2];
      sum[3] += a[i+3];
  }

  return sum[0] + sum[1] + sum[2] + sum[3];
}
    
Magic numbers like i += 4 are very bad; so bad that it immediately makes sumA the better choice. If you must use sizes of elements, use the sizeof operator, not a hardcoded number. – Carey Gregory 6 hours ago
Why does sumB maintain 4 separate sums only to sum them all at the end? Why not just have one sum like sumA? – Chris Drew 6 hours ago
@Bora U Why did you delete the code? – Johnathan Gross 6 hours ago
    
I have rolled back the edit, as the answers/comments did not make any sense – Ed Heal 5 hours ago
    
int sum[4] = {0, 0, 0, 0}; -> int sum[] = {0, 0, 0, 0}; - Let the compiler do the counting – Ed Heal 5 hours ago

4 Answers

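In both cases the array is not copied: the argument decays to a pointer, so what the function receives is effectively a reference to the caller's data. There is no difference there.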

Your sum function apparently has a lot of knowledge about the array that is passed in. So I believe it is better to force the array to be of the kind that is expected by the function.

Edit: If you pass a variable of type int[SIZE][SIZE] to a function that accepts an int pointer (int*), you have to explicitly cast it to int* (or pass &a[0][0]), or the compiler will not accept it.

Therefore

int sumA(int a[SIZE][SIZE])

is the better of the two.

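sumB is better from a performance standpoint because the loop is manually unrolled. The thing to be careful of is whether the number of elements in a is a multiple of 4: you do not check for this, and if it is not, the loop reads past the end of the array, which is undefined behavior and could crash the program.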

Edit: sumA is certainly better as it is a lot more strict (it knows the exact dimensions)
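
To make the multiple-of-4 concern concrete, here is a minimal sketch of an unrolled sum that also handles a leftover tail. The name sumUnrolled and the explicit element-count parameter n are assumptions for illustration; they are not part of the question's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of sumB: the element count is a parameter, so the
// function no longer assumes SIZE*SIZE elements or a multiple of 4.
int sumUnrolled(const int* a, std::size_t n)
{
    assert(a != nullptr || n == 0);  // precondition: valid pointer unless empty

    int sum[4] = {0, 0, 0, 0};
    std::size_t i = 0;

    // Main loop: four elements per iteration, stopping before a partial group.
    for (; i + 4 <= n; i += 4)
    {
        sum[0] += a[i + 0];
        sum[1] += a[i + 1];
        sum[2] += a[i + 2];
        sum[3] += a[i + 3];
    }

    // Tail loop: the 0 to 3 elements that did not fill a group of four.
    for (; i < n; ++i)
        sum[0] += a[i];

    return sum[0] + sum[1] + sum[2] + sum[3];
}
```

With the question's array this would be called as sumUnrolled(&a[0][0], SIZE * SIZE). Since SIZE*SIZE is a multiple of 4 there, the tail loop simply never runs.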

    
I'm skeptical that sumB will have any performance advantage at all once the compiler is done optimizing. – Carey Gregory 6 hours ago
@CareyGregory On coliru (gcc) with -O3 the second one is 10x faster – Sopel 5 hours ago

The first solution is more type-safe but is bad for performance:

  1. The loops are ordered so that consecutive additions jump through memory by SIZE*sizeof(int) bytes each time, which is bad for the cache
  2. You have to rely on the compiler to unroll the loop, and even to realize that the work can be done with one loop instead of two
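
The stride problem from point 1 can be sketched as follows. sumColumnMajor repeats the question's loop order, while sumRowMajor swaps the index roles so the inner loop touches adjacent ints; both names, and the checkOrdersAgree helper, are hypothetical. This only illustrates the access pattern, and how much it helps depends on the compiler and hardware.

```cpp
#include <cassert>

#define SIZE 1024

// Loop order from the question: the inner loop changes the FIRST index,
// so each step jumps SIZE * sizeof(int) bytes ahead in memory.
int sumColumnMajor(int a[SIZE][SIZE])
{
    int sum = 0;
    for (int y = 0; y < SIZE; y++)
        for (int x = 0; x < SIZE; x++)
            sum += a[x][y];
    return sum;
}

// Cache-friendlier order: the inner loop changes the LAST index,
// so consecutive reads are adjacent (C++ arrays are stored row by row).
int sumRowMajor(int a[SIZE][SIZE])
{
    int sum = 0;
    for (int y = 0; y < SIZE; y++)
        for (int x = 0; x < SIZE; x++)
            sum += a[y][x];
    return sum;
}

// Both orders visit every element exactly once, so the totals must match.
void checkOrdersAgree(int a[SIZE][SIZE])
{
    assert(sumColumnMajor(a) == sumRowMajor(a));
}
```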

The second solution unrolls the loop manually. Combined with the separate sum variables, this works well with CPU pipelining (if the loop isn't vectorized in the first place) and results in fewer branches.

Both get vectorized, but the second one is vectorized better. In the first one, swapping the loop order does improve things, but it does not make the resulting assembly identical (they are close in speed, but the second one is a few times longer). https://godbolt.org/g/bW1Jkd I measured a 10x difference in performance (with -O3 on coliru, with gcc) in favor of the second solution.

Therefore I suggest a hybrid of the two of them:

int sumA(int a[SIZE][SIZE])
{
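  // The message argument keeps this valid before C++17 as well.
  static_assert(SIZE % 4 == 0, "SIZE must be a multiple of 4");
  int* flat_a = &(a[0][0]);  // view the 2D array as one contiguous block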

  int sum[4] = {0, 0, 0, 0};

  for( int i = 0; i < SIZE*SIZE; i += 4 )
  {
      sum[0] += flat_a[i+0];
      sum[1] += flat_a[i+1];
      sum[2] += flat_a[i+2];
      sum[3] += flat_a[i+3];
  }

  return sum[0] + sum[1] + sum[2] + sum[3];
}

It's not a complicated function, and everything is still easily readable.

Also, I don't think the constant 4 should be made 'non-magic' unless the unrolling is made completely generic, which would require some template magic. Naming a value should indicate that it can change without completely breaking everything.
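
For what a 'completely generic' unrolling might look like, here is one possible sketch, assuming the unroll factor becomes a template parameter. sumUnrolledBy, Unroll, N and example are made-up names, and the function takes a flat one-dimensional array for brevity. Whether the compiler actually unrolls the fixed-count inner loop is up to the optimizer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical generic version: Unroll is chosen at the call site and
// N is deduced from the array argument.
template <std::size_t Unroll, std::size_t N>
int sumUnrolledBy(const int (&a)[N])
{
    static_assert(Unroll > 0, "unroll factor must be positive");
    static_assert(N % Unroll == 0, "array size must be a multiple of Unroll");

    int partial[Unroll] = {};  // one accumulator per lane, zero-initialized

    for (std::size_t i = 0; i < N; i += Unroll)
        for (std::size_t lane = 0; lane < Unroll; ++lane)  // fixed trip count
            partial[lane] += a[i + lane];

    // Combine the per-lane accumulators into the final result.
    int total = 0;
    for (std::size_t lane = 0; lane < Unroll; ++lane)
        total += partial[lane];
    return total;
}

// Usage sketch: the same data summed with two different unroll factors.
void example()
{
    const int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    assert(sumUnrolledBy<4>(data) == 36);
    assert(sumUnrolledBy<2>(data) == 36);
}
```

Here the factor is a named parameter precisely because it can change without breaking anything: the static_assert rejects sizes that do not fit.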

Between the two, I would use the more strongly typed one, but with a correct name sumB might be viable and more generic: int sumSIZESIZEints(const int*)

Contrary to what you might expect, the first SIZE in sumA's parameter is ignored: the array parameter is adjusted to a pointer to its rows, resulting in

int sumA(int (*a)[SIZE])
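
This adjustment can be checked at compile time. A small sketch, where sumP and firstOfRow are hypothetical names added only for the demonstration:

```cpp
#include <cassert>
#include <type_traits>

#define SIZE 1024

int sumA(int a[SIZE][SIZE]);  // declaration as written in the question
int sumP(int (*a)[SIZE]);     // hypothetical: the pointer form spelled out

// Both declarations have exactly the same function type.
static_assert(std::is_same<decltype(sumA), decltype(sumP)>::value,
              "the outer SIZE is not part of the parameter type");

int firstOfRow(int a[SIZE][SIZE], int row)
{
    // Inside the function, a is a pointer to a row of SIZE ints.
    static_assert(std::is_same<decltype(a), int (*)[SIZE]>::value,
                  "a is a pointer, not a 2D array");
    assert(row >= 0);  // the number of rows cannot be checked from a itself
    return a[row][0];
}
```

As a consequence, an int[2][SIZE] array is accepted by firstOfRow without any diagnostic, even though it has only two rows.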

An even more strongly typed version would take the array by reference, which keeps both dimensions in the type:

#include <numeric>  // std::accumulate

int sum(const int (&a)[SIZE][SIZE])
{
    // Sum all SIZE*SIZE elements as one contiguous range.
    return std::accumulate(&a[0][0], &a[0][0] + SIZE * SIZE, 0);
}