Here is my version of MatrixTransp, which is optimized in the same way as MatrixMatrixMult:
Code:
void MatrixTransp(float A[][], float &C[][], int R, int S) {
  float tmp[];
  ArrayInit2D(C, tmp, 0, S, R);
  int s = S; lbl_Trans_start_s: s--;
    // tmp = C[s];
    asm{ index tmp, C, s }
    int r = R; lbl_Trans_start_r: r--;
      float val_temp;
      float arr_temp[];
      // tmp[r] = A[r][s];
      asm{ index arr_temp, A, r }
      asm{ index val_temp, arr_temp, s }
      asm{ replace tmp, tmp, r, val_temp }
    asm{ brtst GT, lbl_Trans_start_r, r }  // loop again while r > 0
    // C[s] = tmp;
    asm{ replace C, C, s, tmp }
  asm{ brtst GT, lbl_Trans_start_s, s }    // loop again while s > 0
}
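For anyone who wants to check the loop structure off the brick, here is a rough plain-C rendering of the same count-down idea. It is only a sketch: the name transpose_countdown and the flat row-major layout are my assumptions, not part of the NXC code above. Each do/while corresponds to one label plus one brtst GT, so the index is decremented first and tested against zero once per pass.

```c
#include <assert.h>

/* Transpose a rows x cols matrix `a` (row-major, flat) into `c`,
 * which must hold cols x rows floats.
 * Hypothetical helper for illustration; not the NXC routine itself. */
void transpose_countdown(const float *a, float *c, int rows, int cols) {
    /* The index is decremented before the first test, so an empty
     * dimension would start at -1. Reject that up front. */
    assert(rows > 0 && cols > 0);

    int s = cols;
    do {
        s--;
        /* Row s of the result. In C this is a pointer into c, so no
         * write-back is needed; the NXC version works on a copy and
         * therefore ends the outer loop with a `replace`. */
        float *tmp = c + s * rows;

        int r = rows;
        do {
            r--;
            tmp[r] = a[r * cols + s];   /* tmp[r] = A[r][s] */
        } while (r > 0);                /* like: brtst GT, ..., r */
    } while (s > 0);                    /* like: brtst GT, ..., s */
}
```

Note the assert: like the NXC version, this loop shape assumes at least one row and one column.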
Time for 100 runs with a 3x3 array:
Original: 427
Muntoo: 300
Mine: 241
Time for 100 runs with a 10x7 array:
Original: 2654
Muntoo: 1608
Mine: 1178
I'm going to see if I can improve performance further, though. I have a few ideas, but they are not quite sorted out yet.
(And because I prioritized playing games last night, muntoo posted his version first. The competition is getting tough.)
John, what kind of optimizations have you added to level 3?
Right now there are still some redundant mov statements when doing simple array reads and writes, as shown above. Has this been improved? (If so, just for these statements, or for all statements that have this issue?)
What about boolean expressions in conditional statements? They are probably what is used most in NXC code, yet they are far from efficient. If you have not done it yet, I did some work on figuring out how to do this efficiently for both simple and complex expressions. If you are interested, I can finish it and pass it on to you. (What remains is mostly checking that there aren't any errors and that I didn't forget any seldom-used operators.)
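To illustrate the kind of thing I mean (this is not my finished scheme, just one well-known approach), here is a plain-C sketch of the condition a > 0 && (b > 0 || c > 0) written two ways. The function names are made up for the example, and I am assuming the cost comes from storing every subexpression as a 0/1 value before testing it. The second version never stores a boolean at all: each comparison becomes one conditional jump, and later comparisons are skipped once the result is known.

```c
#include <assert.h>

/* Value-based: every subexpression is first stored as 0 or 1,
 * then the combined value is tested once at the end. */
int cond_materialised(int a, int b, int c) {
    int t1 = (a > 0);
    int t2 = (b > 0);
    int t3 = (c > 0);
    int t4 = t2 | t3;
    int t5 = t1 & t4;
    if (t5 != 0) return 1;
    return 0;
}

/* Jump-based: the same condition expressed only as branches.
 * No temporaries, and later comparisons are skipped when the
 * outcome is already decided (short-circuit). */
int cond_jumps(int a, int b, int c) {
    if (!(a > 0)) goto is_false;   /* a fails: whole && is false */
    if (b > 0) goto is_true;       /* b holds: the || is already true */
    if (!(c > 0)) goto is_false;   /* last chance for the || */
is_true:
    return 1;
is_false:
    return 0;
}
```

Both functions return the same result for every input; only the number of stores and tests differs.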