NXC: Matrix algebra library

muntoo · Post by **muntoo** » 13 Jul 2011, 23:49

Share and Enjoy! (Best I could do.)

void MatrixTransp(float &A[][], float &C[][], int R, int S)
{
	int s, r;
	float tmp[];
	
	float A_r[];
	float A_s;
	float C_s[];
		
	// ArrayInit2D(C, tmp, 0, S, R)
	asm
	{
		arrinit tmp, 0, R
		arrinit C, tmp, S
		arrinit tmp, 0, 0
	}
	
	r = R;
	matrix_transp_for_r:
	asm { sub r, r, 1} // --r;
	
		// A_r = A[r];
		asm { index A_r, A, r }

		s = S;
		matrix_transp_for_s:
		asm { sub s, s, 1} // --s;
		
			// C[s][r] = A[r][s];
			asm
			{
				// A_s = A[r][s];
				index A_s, A_r, s

				// C_s = C[s];
				index C_s, C, s

				// C[s][r] = A_s;
				replace C_s, C_s, r, A_s // C_s[r] = A_s;
				replace C, C, s, C_s // C[s] = C_s;
			}
			
		asm { brtst GT, matrix_transp_for_s, s }
		
	asm { brtst GT, matrix_transp_for_r, r }
}

HaWe · Post by **HaWe** » 14 Jul 2011, 08:57

thanks to all!
wrt John's changes: of course a masterpiece (although admittedly hard to read^^ -
I wonder if it's possible to make the NXC compiler generate more optimized code from the start so that we don't have to use NBC...?)
(mun2, I haven't integrated your Transpose version because I hadn't time to test it yet - BTW: did you already make some speed tests with both versions?)

I changed John's "passing to functions" syntax a little to apply to my own ones, and I added a Rotate function to rotate in the geographical sense (like compasses do).
To distinguish both clearly from another, I called them
MatrixRotate2DMath (positive= counter clockwise rotation)
MatrixRotate2DGeo (positive= clockwise rotation)

Calculating with the latest functions, I came upon 1 problem:
Although the mathematical algorithms itself (both my old ones as well as the new ones as far as I tried) all are working both for more-dim matrices and for 1-dim vectors, but e.g. the functions for multiplying do not allow to pass a 2x2 matrix and a 2x1 vector like V[2]={0;1} as a parameter (i.e. although A[2][2] x V[2] and V[2] x A[2][2] are well-defined):
MatrixMatrixMult(A,V,C, 2,2,1) causes an error.

My question:
Can we have function overloading soon or will we have to build and to use always 4 differently named functions in future for
A[m][n] x B[n][k],
A[m][n] x V[n] and
V[n]T x B[n][k]
V[n]T x W[n]
?

spillerrec · Post by **spillerrec** » 14 Jul 2011, 13:36

Here is my version of MatrixTransp, which is just optimized in the same way as the MatrixMatrixMult

Code: Select all

void MatrixTransp(float A[][], float &C[][], int R, int S) {
	float tmp[];
	ArrayInit2D(C,tmp, 0, S,R);

	int s = S;	lbl_Trans_start_s: s--;
		//tmp = C[s];
		asm{ index tmp, C, s }
		int r = R;	lbl_Trans_start_r: r--;
			float val_temp;
			float arr_temp[];
		//	tmp[r]=A[r][s];
			asm{ index arr_temp, A, r }
			asm{ index val_temp, arr_temp, s }
			asm{ replace tmp, tmp, r, val_temp }
		asm{ brtst GT, lbl_Trans_start_r, r }
		asm{ replace C, C, s, tmp }
		//C[s] = tmp;
	asm{ brtst GT, lbl_Trans_start_s, s }
}

Time for 100 runs with a 3x3 array:

Original: 427
Muntoo: 300
Mine: 241

Time for 100 runs with an 10x7 array:

Original: 2654
Muntoo: 1608
Mine: 1178

I'm going to try to see if I can improve performance further though. I have a few ideas, but it is not quite sorted out yet.
(And because I prioritized playing games last night, munto came with his version first. Competition is getting though

)

John, what kind of optimizations have you added to level 3?
Right now there are still some reluctant mov statements when doing simple array reads and writes as shown above. Have this been improved? (If so, just for these statements, or for all statements which have these issues?)
What about boolean expressions in conditional statements? It is probably what is used most in NXC code, yet they are far from efficient. If you have not done it yet, I did some work on figuring out how to do this efficiently for both simple and complex expressions. If you are interested I can finish it and pass it to you. (Mostly making sure that there isn't any errors and make sure I didn't forgot any seldom used operators and something like that.)

HaWe · Post by **HaWe** » 14 Jul 2011, 15:11

spillerrec, mun2:
I tested both, both calculate fine - but true: sp's is faster - sry mun2 =(
I integrated the speed-up T function in to my TO library. :)

a little problem:
when I have an array A[4][4] ,

: Matrix.jpg (4.88 KiB) Viewed 8909 times

how is it possible to delete e.g. row(1) completely to get the "remainding" array B[4][3] as a result?
and next, to delete line (3), to get the "remainding" array C[3][3] as a result?

: SUBa3.jpg (2.44 KiB) Viewed 8909 times

I will have to do this operation 16 times, for all elements of A[4][4], in order to get all 16 possible 3x3 permutations = remainding 3x3 sub-arrays.

spillerrec · Post by **spillerrec** » 14 Jul 2011, 15:47

Create a new 3x3 array. Copy the values you want over in the new. Overwrite the old array with the new array.
I can make a function that should can do this somewhat efficient. I will post it here later today.

HaWe · Post by **HaWe** » 14 Jul 2011, 16:40

sure, locate each single cell, copy, and paste: will work, but it's complicated.
I was more thinking of something like (pseudo code)

Code: Select all

copy Temp=A;
for s=0...3
  for r=0...3
    delete(Temp, row(r))
    shift the rest rows up
    delete(Temp, column(s))
    shift the rest columns to the left
    S[0...2][0...2]=Temp[0...2][0...2]

spillerrec · Post by **spillerrec** » 14 Jul 2011, 17:08

It is not possible to delete elements an array. However there are some native functions to copy larger parts of an array in order to create a new one. Check out the documentation.

HaWe · Post by **HaWe** » 14 Jul 2011, 17:23

ok, I already thought is was that way.
It's not so important, I just wanted to have all the sub-matrices for calculating the minor matrices, and then using them for the inverse of a 4x4 matrix.
Actually it had been only a challenge, I guess I won't really need it for my current project either.

But I'm still curious about that:

It was me who wrote:Can we have function overloading soon or will we have to build and to use always 4 differently named functions in future for
A[m][n] x B[n][k],
A[m][n] x V[n] and
V[n]T x B[n][k]
V[n]T x W[n]
?

afanofosc · Post by **afanofosc** » 14 Jul 2011, 19:14

Here's a tweaked version of spillerrec's optimized Transp function that looks like regular old NXC code:

Code: Select all

#define ArrayIndex(_out, _A, _i)   asm { index _out, _A, _i }
#define ArrayReplace(_A, _i, _new) asm { replace _A, _A, _i, _new }

#define BranchTest(_cmp, _lbl, _v1) asm{ brtst _cmp, _lbl, _v1 }
#define BranchComp(_cmp, _lbl, _v1, _v2) asm{ brcmp _cmp, _lbl, _v1, _v2 }

void MatrixTransp(float A[][], float &C[][], int R, int S) {
   float tmp[], arr_temp[], val_temp;
   int s, r;
   ArrayInit(tmp, 0, R);
   ArrayInit(C, tmp, S);
   s = S;
   lbl_Trans_start_s:
   {
      s--;
      r = R;
      lbl_Trans_start_r:
      {
         r--;
         ArrayIndex(arr_temp, A, r);
         ArrayIndex(val_temp, arr_temp, s);
         ArrayReplace(tmp, r, val_temp);
      }
      BranchTest(GT, lbl_Trans_start_r, r);
      ArrayReplace(C, s, tmp);
   }
   BranchTest(GT, lbl_Trans_start_s, s);
}

The index of C into tmp before the inner loop is not needed since I left tmp allocated with R elements and the s-th array in C is completely overwritten inside the nested loop. You could also use S as the outer loop index since it does not need to be reset back to its initial value like r does. That would save a single line of code that is only executed once (the s = S line). And if you are worried about the memory used by the R elements in tmp you could add ArrayInit(tmp, 0, 0) after the last BranchTest line.

I have changed the repeat loop in NXC to generate "label/subtract/body/brtst GT label" code instead of the "label1/subtract/brtst LT label2/body/jmp label1/label2" code it was previously generating. I have also added a little more optimizing that only occurs at level 3+. I need some people to try out level 3 to see if they run into any problems. I will post a new test release this evening. The new optimization code at level 3 takes additional time to run, of course.

John Hansen

spillerrec · Post by **spillerrec** » 14 Jul 2011, 20:43

Oh, I didn't notice that index was unneeded at all. I had actually noticed that I hadn't yet removed 's' before I posted it but I let it stay that way since I didn't bother update the test results...

Good idea of using defines to hide the NBC stuff!
It would be nice to have a version of the repeat structure which also gives the variable which holds the current index. For example by having an extra optional parameter, "repeat( amount, var )". It would avoid most of those NBC-ish loops... I would prefer that "repeat( S, S )" wouldn't start by "mov S, S;" though...
Btw, last time I checked (which is several months ago), the for(;;) structure also wasn't optimal either. It had an unneeded jmp somewhere iirc.

Mindboards

NXC: Matrix algebra library

Re: NXC: resolved - Matrix algebra library

Re: NXC: resolved - Matrix algebra library

Re: NXC: Matrix algebra library

Re: NXC: Matrix algebra library

Re: NXC: Matrix algebra library

Re: NXC: Matrix algebra library

Re: NXC: Matrix algebra library

Re: NXC: Matrix algebra library

Re: NXC: Matrix algebra library

Re: NXC: Matrix algebra library

Who is online