Page 1 of 7

Need help with efficiency NXC code

Posted: 11 Aug 2011, 01:52
by nxtboyiii
Hi,
I need some help with some code I have. It runs a little slow and I am trying to make it run faster.
It has lots of &&'s and ||'s all mixed up which probably cause part of the lagging.
I would just need help overall, especially the for-loop in the "mGameTask.nxc"
I would greatly appreciate some help.

Thanks,
nxtboy III

Re: Need help with efficiency NXC code

Posted: 11 Aug 2011, 02:38
by linusa
The first thing in optimizing code is always measuring: Find out where it runs slowly. It's often not where you think. Old rule:
Measure twice, optimize once.
I'm not only posting this because it's true, but also because I'd be really happy if you were willing to give my NXC Profiler a spin: http://www.mindstorms.rwth-aachen.de/tr ... XCProfiler

Unfortunately the NXTasy post with a sort-of manual is gone, but you could basically do something like this. In your task Game:

Code: Select all

// SET UP THE PROFILER
// comment this out to silently disable profiling without any side effects!
#define PROFILER_ENABLE
// how many times did we use PROFILER_BEGINSECTION (and ENDSECTION)
#define PROFILER_MAXSECTIONS 7
// name of the binary results file we later analyze
#define PROFILER_RESULTSFILE "game.prf"
// this does all the magic :-)
#include "Profiler01.nxc"

task Game()
{
    PROFILER_START();

    while(1)
    {
        while(play == 1)
        {
            paddx=fmapx == floor(fmapx)?0:1;
            if(SENSOR_2 && SENSOR_1)
            {
                StopAllTasks();
            }
            ShootFireBall();
            Jump();
            mapy=floor(fmapy);
            mapx=floor(fmapx);
            paddy=fmapy == floor(fmapy)?0:1;
            paddx=fmapx == floor(fmapx)?0:1;
            MoveLeft();
            mapy=floor(fmapy);
            mapx=floor(fmapx);
            if(fmapy == floor(fmapy))
            {
                paddy=0;
            }
            else
            {
                paddy=1;
            }
            
            PROFILER_BEGINSECTION(0);
            MoveRight();
            PROFILER_ENDSECTION(0);
            
            mapy=floor(fmapy);
            mapx=floor(fmapx);
            paddx=fmapx == floor(fmapx)?0:1;
            if(!ButtonPressed(BTNCENTER,0) && jc != 0)
            {
                ffd=1;
                jumping=0;
            }
            
            PROFILER_BEGINSECTION(1);
            FireBallHit();
            PROFILER_ENDSECTION(1);
            
            PROFILER_BEGINSECTION(2);
            Down();
            PROFILER_ENDSECTION(2);
            
            s=0;
            mapy=floor(fmapy);
            mapx=floor(fmapx);
            CheckVargs();
            Tile();
            CheckFireBall();
            fballx+=xpp;
            fbally+=ypp;
//********************************************************************************************************************
            // [snipped]
            
//********************************************************************************************************************
            // [snipped]
        }
        PlayToneEx(600,30,1,0);
        
        PROFILER_STOP();
    }
}
As you can see, I added some #define, an #include, and then the profiler-commands. All you have to make sure is, that somewhere in that file, PROFILER_START() gets called once before the other profiler commands. At the end, somewhere, you have to call PROFILER_STOP() once, too. Otherwise it won't work. I'm not sure if I chose the right position in your code...

The main thing is then to mark sections that should be profiled. Set how many sections you have at max with

Code: Select all

#define PROFILER_MAXSECTIONS 7
in the beginning.

Then you have to manually wrap every line or section of code with PROFILER_BEGINSECTION(n) and PROFILER_ENDSECTION(n), where n is a constant integer expression. I think you don't have to have the sections "in order", but it's all an early alpha version, so better do that. You should also make sure that both commands are in the same scoping level, i.e. you begin and end a section within the same "code flow" (for example, avoid having an end-section outside some if-statement that is not reached as often as the begin-section).

That should be it. You should run the program and stop it gracefully (so that PROFILER_STOP() gets executed). You should then find a file called "game.prf" on your NXT. Copy it to your computer, and run the Python script to generate a highlighted HTML version of your code. If you can't get it to work, send me the .prf file, and I'll do it.

You can of course wrap the stuff you marked with //*************** in your code in between profiler sections, I just didn't do it to save code space here.

You have to download Profiler01.nxc and put it to your projects directory, here is a direct download link: http://www.mindstorms.rwth-aachen.de/tr ... iler01.nxc

The profiled program might run slower (up to a factor of 2 to 10), this is normal. You can leave in all the profiler commands and just not #define PROFILER_ENABLE. If you do this, all profiler-commands evaluate to empty macros and should have no effects at all.

More things and an example is described on the project page I linked above. Sorry if this kind of hijacked your thread, but I can think together with the NXC-experts here it could seriously help speed up you program -- once we know where it's slow :-)


Edit: I should note this whole stuff only works for ONE NXC file at the moment. And I guess all profiler section commands should be in one single task. This is all given for your game task.

Re: Need help with efficiency NXC code

Posted: 11 Aug 2011, 20:39
by nxtboyiii
That is very neat! Unfortunately, I do not have a Python compiler. How would I actually speed up the program?

Thanks

Re: Need help with efficiency NXC code

Posted: 11 Aug 2011, 21:46
by linusa
nxtboyiii wrote:That is very neat! Unfortunately, I do not have a Python compiler.
Python is free and open source, you just need an interpreter, 1 installer. You'd usually grab the appropriate version from here http://www.python.org/download/releases/2.7.2/ (for windows, the correct x86 or x64 MSI installer).

After that, you would open a command prompt on your computer and run

Code: Select all

python AnalyzeNXCProfile.py -p game.prf -s mGameTask.nxc -o game.html -v 
This would create a html-file game.html with the profiled & highlighted code. You'd have to download these two python files (the analyzer program) before of course: http://www.mindstorms.rwth-aachen.de/tr ... nippets.py and http://www.mindstorms.rwth-aachen.de/tr ... Profile.py (and put them into the same directory where you have mGameTask.nxc and game.prf).
The game.prf is generated on your NXT by the profiled version of your program.


But you don't have to do this. If you send me the game.prf and the exact version of mGameTask.nxc that created it, I could generated the HTML file for you.
nxtboyiii wrote: How would I actually speed up the program?
Well, the actual optimization comes after that of course. But profiling gives a clou at WHERE to optimize and do the speed ups. Right now, you say your program is slow between the parts you marked with //************** . Did you measure that, or is it just a feeling?


Anyway, I didn't give you a single optimization hint for NXC in this thread yet, sorry. If anyone else wants to go ahead and help, please do so. I just saw the opportunity for a useful "real life test" of the profiler. The results might confirm what you say, that your program is slow between the **** parts, or it might show that your program is slow somewhere else...

You know, there's usually some kind of 90-10 rule for each computer program, which says that "90% of the execution time is spent in 10% of the code". Our goal is now to find that 10%.

Re: Need help with efficiency NXC code

Posted: 11 Aug 2011, 22:21
by nxtboyiii
Well, I know that if I sped that part up(with the //****) then it would go faster.
You see, when there are more enemies, it goes slower because it has to process more enemies. If I make that part faster, then when there are more enemies it won't slow down as much and won't make a huge difference.

I could probably generate and send the game.prf file tomorrow, maybe.

Thanks

Re: Need help with efficiency NXC code

Posted: 11 Aug 2011, 23:30
by linusa
nxtboyiii wrote:I could probably generate and send the game.prf file tomorrow, maybe.
Ok, but no pressure or anything. Just if you feel like it, whenever. Otherwise never mind.

You still would have to place the profiler sections at "interesting parts", where you feel the program might be slow or where you want to know how often / how fast parts get executed. Oe more thing: nested sections are not allowed! I.e. not this:

Code: Select all

PROFILER_BEGINSECTION(0);
// code...
  PROFILER_BEGINSECTION(1);
  // code...
  PROFILER_ENDSECTION(1);
// code...
PROFILER_ENDSECTION(0);
nxtboyiii wrote:Well, I know that if I sped that part up(with the //****) then it would go faster.
You see, when there are more enemies, it goes slower because it has to process more enemies. If I make that part faster, then when there are more enemies it won't slow down as much and won't make a huge difference.

I looked a bit at that code, and there are two general principles you can apply:

1. In your big if-clauses, make assumptions about the conditions and sort them cleverly. I.e. take

Code: Select all

if(fmapy == floor(fmapy) && (E.eat == 3 || E.eat == 7) && (mapx == E.mex || mapx+paddx == E.mex || mapx == E.mex+E.pex || mapx+paddx == E.mex+E.pex) && jumped == 0)
The && and || operators are short-circuited. This means, the if-clause is only evaluated as along as it needs to be. As soon as it can be "decided", the program moves on. So for example this bit

Code: Select all

if (likelyThing && unlikelyThing) {
  // code ...
is slower than

Code: Select all

if (unlikelyThing && likelyThing ) {
  // code ...
This is because in the first case, most of the times 2 expressions will be evaluated: Mostly likelyThing , and then unlikelyThing. If you reorder this, then in most of the cases unlikelyThing is false, and then likelyThing doesn't have to be evaluated any more, as the whole && expression can't become true anymore anyway.

The other thing is with ||:

Code: Select all

if (unlikelyThing || likelyThing ) {
  // code ...
is slower than

Code: Select all

if (likelyThing || unlikelyThing) {
  // code ...
From these simple rules, re-order your big if-statements in unlikely and likely conditions -- if you can make assumptions. If not, happy debugging and TextOut: Measure, see what's going on, and react on it!.

Re: Need help with efficiency NXC code

Posted: 11 Aug 2011, 23:41
by linusa
2. The other general thing is: "Precache" as much as possible. Array-access (lookup) is very expensive in NXC (same ballpark as math operations). So, I noticed you frequently use things like

Code: Select all

(map[E.mex-1][E.mey].sld >= 1 || map[E.mex+1][E.mey].sld >= 1)
If you know which of these array elements get used more than once (I suspect most of them do), create local variables at the begin of your loop:

Code: Select all

int map_sld_01 =[E.mex-1][E.mey].sld;
int map_sld_11 =[E.mex][E.mey].sld;
int map_sld_21 =[E.mex+1][E.mey].sld;
and so on (I think you get the principle: I encoded the -1 as index 0, the index "+0" as 1, and +1 as 2, and just put x, then y in the variable name). Then use these vars instead of the array.

Just a side note: In C++ (and maybe C, but I don't know C that much), it's a good habit to use

Code: Select all

const int map_sld_01 = ...;
// or, depending on context,
// const int & rMap_sld_01 = ...; // this is a const reference
With this consts vars (or const references), the compiler might be more likely to do optimizations.


Anyway, you'd have to cleverly analyze your code, and only do the array-precache-lookup, if you know you're probably losing it. I didn't check your code so good, but if for example, in your first if-clause, you check if an enemy is dead (and exit/break in case he is), then do this array lookup only if your still "in" that loop, not before. The lookups are expensive, too. And if you do them too often or too early, you might deteriorate performance.

This is why it's so important to measure execution speed. You have to know and verify whether your optimization made any sense. Do you have a FPS / frame counter in your game?

A simple timer around your enemy-loop might be enough to at least show the milliseconds it took. The profiler-sections are just some sort of automated glorified timers, which also calc the average execution time...

Re: Need help with efficiency NXC code

Posted: 12 Aug 2011, 21:31
by nxtboyiii
Thanks! That helps a lot.

If you know which of these array elements get used more than once (I suspect most of them do), create local variables at the begin of your loop:

int map_sld_01 =[E.mex-1][E.mey].sld;
int map_sld_11 =[E.mex][E.mey].sld;
int map_sld_21 =[E.mex+1][E.mey].sld;
But the E.mex and E.mey change during the loop(I should probably change the variable names to make them more understandable).

Re: Need help with efficiency NXC code

Posted: 12 Aug 2011, 23:42
by muntoo
@Linus Is this your NXTasy post?
readme01.txt wrote:I couldn't resist to start on a little proof-of-concept. Features of what I'm working on:
  • Use the "profiler" by including a certain NXC file
  • Code to be measured has to be included between two custom commands -- I refer to this as a "section" of code.
  • Results are written to a binary file
  • File is retrieved from the NXT by using NeXTTool
The analysis will produce a HTML-version of the code, with statistics for the profiled sections. Numbers include:
  • How many times a section was executed
  • How many times a section was executed without the firmware "ticking" (i.e. in less of 1 ms)
  • How many ms ticked in total during code execution in a section (more useful the longer a section takes)
  • Total runtime of the executed program
  • Average time per call (if available/if it makes sense), relative percentage of the section compared to total runtime
I still have to figure out how to analyze/interpret the tick-count and elapsed ms data...


I decided to implement the first part fully in NXC. A custom preprocessor / parser in Python could be done later. At the moment users who want to profile their programs have to include profile commands on their own.


At the very top of our program, we need:

Code: Select all

#define PROFILER_ENABLE
#define PROFILER_MAXSECTIONS 10
#define PROFILER_RESULTSFILE "sample.prf"
#include "Profiler01.nxc"
By commenting (i.e. not defining) PROFILER_ENABLE, the whole profiler does NOTHING. This is a very comfortable off-switch. If disabled, all other profiler-functions will produce empty lines with comments -- no performance penalty at all.

PROFILER_MAXSECTIONS should ideally be the number of sections you want to profile. It's not a problem if this number is higher, but it takes up more memory. PROFILER_RESULTSFILE specifies the name of the binary file to be written to the NXTs flash. It gets freshly deleted at the start of every run. Finally we include the Profiler library version 0.1.



In our program we have to indicate the begin and end of our program. Most times this is in task main:

Code: Select all

task main(){
    PROFILE_START();

    ...

    PROFILE_STOP();
}

To actually profile code, we use this:

Code: Select all

    PROFILE_BEGINSECTION(0);
    ...
    PROFILE_ENDSECTION(0);


    PROFILE_BEGINSECTION(1);
    ...
    PROFILE_ENDSECTION(1);
The order of the arguments doesn't matter, we could start with section 2 and then follow with 0 or whatever. Important is that we don't do nested sections (yet). If a certain portion of code calls a function, which has its own section, than that's totally ok. It's just for later processing of the source-code file that we don't have overlapping sections (at the moment).


So I've tried this, and the first part -- profiling the code -- works. It all depends on how often the profiler-code is called of course. If you don't do it inside loops, you don't notice the performance at all. But if you do it inside a busy loop, performance can deteriote between a factor of 2 and 10, rough estimate. The profiler-results file gets generated fine, but I haven't written the part to analyze it yet.


If you're interested, here's the sourcecode:
http://www.mindstorms.rwth-aachen.de/tr ... iler01.nxc
It contains some neat preprocessor-tricks I think B)


As always, comments are very welcome
;)

Re: Need help with efficiency NXC code

Posted: 13 Aug 2011, 22:47
by linusa
muntoo wrote:@Linus Is this your NXTasy post?
;)
Thanks, good I saved that part. But no, I believe in that post, there was more :-)

Anyway, muntoo, don't you have some performance-intense code or some NXC-OpenGL-like stuff or a game or whatever where you'd want to try the NXC Profiler? I'd love to see a real-life testcase, but I'm without an NXT (for a long time already).