Speed issues

Mikenoworth · December 16, 2006, 11:17:09 AM

I've always had this issue with CRM32Pro, so it's nothing new. I'm not sure if I'm doing anything wrong, but with the default renderer for Windows XP, using CTile to display about 13 x 11 tiles on the screen plus two CRM32Pro_CSprite sprites, I get under ~20 fps at 800x600x32 on an 800 MHz PC with 320 MB of memory and an ATI Radeon 9600 Pro. At 800x600x16 the speed goes up by maybe 5 to 10 fps. At 800x600x8 it's really good, but my graphics use their own palettes, up to 80 colors each, so things like my text, which uses the color 4, 4, 4, get converted to 0, 0, 0 and therefore become transparent. I don't want to hack it and change the colorkey just to get a good frame rate at a bit depth that's hardly ever used anymore. Plus, for some reason it crashed really hard at 8bpp fullscreen, but doesn't at 16 or 32 (or 24, if it's supported; I never tried it).

I've optimized the tile display as far as I can, but I'm not sure how to improve performance further. Oh, and compiling as release has all kinds of nice issues (CTiles don't display!!). Also, when I run the debug build from its directory, nothing shows and it eventually quits with no errors and no error info in the log file. When it's compiled as release the sprites show, but the CTiles don't, again with no errors.

And I guess you know about this, but the DirectX renderer either accesses or frees memory through an invalid pointer when it shuts down, so I avoid using it, even though it is very fast. I will list the CTile display code, but I see nothing obviously wrong with it.

Code:

class Static_Layer
{
public:
	Static_Layer(char* DPF, char* name, int size=256, bool repeat=false)
	{				
		Tileset = new CRM32Pro_CTile();
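		if(!Tileset->Load(DPF, name))
		{
			// loading failed: free the tileset and null the pointer so that
			// Do() bails out instead of using an unloaded tileset
			delete Tileset;
			Tileset = 0;
			return;
		}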

		// set offset to the bottom of the layer of tiles
		y_offset = (size - 11) << 6; // 6 = * 64
		// set size of map in y coord amount of tiles.
		y_size = size;
		// if map scrolling repeats
		y_repeat = repeat;

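		// log debug info
		// NOTE: Test_Fill() is only compiled into debug builds, so in a release
		// build data[][] stays uninitialized unless it is filled elsewhere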
		#ifdef _DEBUG
			Test_Fill();

			Tileset->Info();
		#endif		
	}
	~Static_Layer(void)
	{
		delete Tileset;
	}

	void Test_Fill(void)
	{
		// fill in some random tiles for ground
		for(int x = 0; x < 13; ++x)
			for(int y = 0; y < y_size; ++y)
			{
				// temp
				data[x][y] = CRM32Pro.Rand() % 5 + 1;
			}
	}

	void Do(float offspeed=1.0f)
	{		
		if(!Tileset) return;

		int x = 0;
		int y = 0;
		// draw the ground
		for(x = 0; x < 13; ++x)
			for(y = 0; y < 11; ++y)
			{								
				// temp			
				// << (shl) is multiply
				// >> (shr) is divide
				Tileset->SetTileSet( 1, 64  );	
				int xpos = (x << 6); 
				int ypos = (y << 6) - (y_offset%64) - 32; // 32 pixel shift downwards
				int tiley = (y+(y_offset >> 6));
				Tileset->SetPosition(xpos, ypos);
				// only draw rows that exist in the map, so data[] is never
				// indexed out of bounds once y_offset scrolls past either end
				if(tiley >= 0 && tiley < y_size)
					Tileset->Draw(0, data[x][tiley]);

				// on-screen debug info - "yard lines"
				#ifdef _DEBUG
					if(x==12)
					{
						char info[43]; // stack buffer; avoids new/delete on every frame
						sprintf_s(info, sizeof(info), "[%i]", tiley);
						Font->PutString(CRM32Pro.screen, xpos - 4, ypos, info);
					}
				#endif
			}		

		// on-screen debug info
		#ifdef _DEBUG
			char info[43]; // stack buffer; avoids new/delete on every frame
			sprintf_s(info, sizeof(info), "[Map_Layer First Tile = %i]", (y_offset>>6));
			Font->PutString(CRM32Pro.screen, 1, 1, info);
			sprintf_s(info, sizeof(info), "[Map_Layer Last Tile = %i]", y + (y_offset >> 6));
			Font->PutString(CRM32Pro.screen, 1, 24, info);
			sprintf_s(info, sizeof(info), "[Map_Layer.Y_Offset = %i]", y_offset);
			Font->PutString(CRM32Pro.screen, 1, 48, info);
		#endif
			
			y_offset -= (int)offspeed;		
	}

private:
	CRM32Pro_CTile* Tileset;
	// screen of 64x64 tiles WxH = 13 x 11 
	// max map size = 13 x 512	
	int data[13][512];	
	int y_size;
	int y_offset;	
	bool y_repeat;
};
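
The scroll arithmetic in Do() relies on y_offset % 64 and y_offset >> 6. Because y_offset keeps decreasing on every call, it eventually goes negative, and in C++ the % operator then yields a negative remainder. Below is a minimal sketch of an offset-to-row mapping that stays consistent for negative offsets and wraps around when the map repeats, assuming that wrapping is what y_repeat is meant to do. FloorDiv, FloorMod, FirstVisibleRow, PixelShift and MapRow are hypothetical helper names, not part of CRM32Pro.

```cpp
#include <cassert>

const int kTileSize = 64; // tile edge in pixels, as in the listing above

// Division that rounds toward negative infinity (C++ '/' rounds toward zero).
int FloorDiv(int a, int b) {
    assert(b > 0);
    int q = a / b;
    return (a % b < 0) ? q - 1 : q;
}

// Remainder that is always in [0, b), even for negative a.
int FloorMod(int a, int b) {
    assert(b > 0);
    int r = a % b;
    return (r < 0) ? r + b : r;
}

// Index of the first map row that is (partly) visible for a scroll offset.
int FirstVisibleRow(int y_offset) { return FloorDiv(y_offset, kTileSize); }

// How many pixels of that first row are scrolled off the top of the screen.
int PixelShift(int y_offset) { return FloorMod(y_offset, kTileSize); }

// Map row shown at a given screen row. Wraps around when 'repeat' is set;
// otherwise returns -1 for rows outside the map so the caller can skip them.
int MapRow(int y_offset, int screen_row, int map_rows, bool repeat) {
    assert(map_rows > 0);
    int row = FirstVisibleRow(y_offset) + screen_row;
    if (repeat) return FloorMod(row, map_rows);
    return (row >= 0 && row < map_rows) ? row : -1;
}
```

With a 256-row map, MapRow(-1, 0, 256, true) gives row 255, so a repeating layer keeps scrolling seamlessly, while the non-repeating variant reports -1 and the caller can skip the draw.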

TheAzazel · December 16, 2006, 12:52:32 PM

Mike,

I would need the source files to check this on my own PC. 8-bit mode is quite outdated and 16-bit should be enough, but I want to see the errors you got and look at the performance.

Have you tried glSDLBenchmark? It would give you a good indicator of your computer's performance. If you can, send me its results together with the source files.

Which compiler are you using, and on which platform? I know that SDL is having problems with the DirectX backend, which is why they changed the default backend to windib; it is not necessarily slower, and it is more compatible with Win64 and Vista.

Waiting for your feedback :). Do not use my dm2 account as it has problems; use the megatorm account instead.

Cheers!

Mikenoworth · December 16, 2006, 01:24:17 PM

I will try all this tomorrow... well, later today actually. I need to get some sleep. But CTile seems to be the biggest performance hit. (I did remove that line of code you told me I didn't have to call every time, which makes a lot of sense.)

Give me about 8 to 10 hours, until I get home or wake up. :)

Mikenoworth · December 16, 2006, 01:31:47 PM

The table wouldn't copy-n-paste correctly, so here's the text readout:

Quote:
· (x86 Family 6 Model 8 Stepping 6)
· CPU Speed: 863 Mhz
· System memory total/available: 319Mb / 82Mb
· ATI Technologies Inc. - RADEON 9600 x86/SSE - 2.0.6012 WinXP Release
· Screen resolution 800x600x32bits

· Test 1 using 'software' engine.
-> Number of sprites: 100
-> Screen is at 32 bits per pixel
-> Screen is in video memory
-> Screen has double-buffering enabled
-> Sprite is in video memory
-> Sprite blit uses hardware acceleration
-> 84.49 frames per second

· Test 2 using 'software' engine.
-> Number of sprites: 1000
-> Screen is at 32 bits per pixel
-> Screen is in video memory
-> Screen has double-buffering enabled
-> Sprite is in video memory
-> Sprite blit uses hardware acceleration
-> 33.31 frames per second

· Test 3 using 'software' engine.
-> Number of sprites: 100
-> Screen is at 32 bits per pixel
-> Screen is in video memory
-> Sprite is in video memory
-> Sprite blit uses hardware acceleration
-> 353.27 frames per second

· Test 4 using 'software' engine.
-> Number of sprites: 1000
-> Screen is at 32 bits per pixel
-> Screen is in video memory
-> Sprite is in video memory
-> Sprite blit uses hardware acceleration
-> 35.84 frames per second

· Test 5 using 'software' engine.
-> Number of sprites: 100
-> Screen is at 32 bits per pixel
-> Screen is in video memory
-> Screen has double-buffering enabled
-> Sprite is in video memory
-> 612.73 frames per second

· Test 6 using 'software' engine.
-> Number of sprites: 1000
-> Screen is at 32 bits per pixel
-> Screen is in video memory
-> Screen has double-buffering enabled
-> Sprite is in video memory
-> 233.31 frames per second

· Test 7 using 'software' engine.
-> Number of sprites: 100
-> Screen is at 32 bits per pixel
-> Screen is in video memory
-> Sprite is in video memory
-> 886.92 frames per second

· Test 8 using 'software' engine.
-> Number of sprites: 1000
-> Screen is at 32 bits per pixel
-> Screen is in video memory
-> Sprite is in video memory
-> 209.42 frames per second

· Test 9 using 'software' engine.
-> Number of sprites: 100
-> Screen is at 32 bits per pixel
-> Screen is in system memory
-> Sprite is in system memory
-> Sprite blit uses RLE acceleration
-> 31.15 frames per second

· Test 10 using 'software' engine.
-> Number of sprites: 1000
-> Screen is at 32 bits per pixel
-> Screen is in system memory
-> Sprite is in system memory
-> Sprite blit uses RLE acceleration
-> 4.47 frames per second

TheAzazel · December 17, 2006, 04:13:16 PM

Hope my email helps you :)

cheers!

Mikenoworth · December 18, 2006, 01:59:41 AM

I split the Do() members into Logic() and Render() and moved Logic() into the CRM32Pro.Update(&event) loop, while Render() stayed in RenderGFX(), and it's running better now. Not perfect, but better. :) I'm sure in fullscreen it will run how it should. But we'll see when there are 20 flyers on the screen with all kinds of bullets and particle systems running. hehe

TheAzazel · December 18, 2006, 11:17:29 AM

Well done!! :)

How many fps are you getting now?

Uhm... a particle system? I think it will kill the performance :)
One thing you can use is per-pixel alpha. If your target platform is Windows, the current library version has experimental support for OpenGL hardware acceleration (for the moment the fade system doesn't work at all; I have to test it more), so you can try:

CRM32Pro.Config.VideoRenderer = RENDER_OPENGL;
CRM32Pro.Config.VideoAccel = ACCEL_HARDSMOOTH; // it needs this flag to use a double buffer

About window mode: it works both in a window and in fullscreen, and remember, you get hardware acceleration even with per-pixel alpha. I'm sure this will help your performance A LOT :)

Cheers!

Mikenoworth · December 19, 2006, 04:56:04 PM

What's a game without particle systems?!?! Haha.

Hm. So OpenGL is stable enough then? Let's see... Wowee! Well, it's _too_ fast, haha! Getting ~400 fps. The ship did have a max speed of 8.0f; I kicked that down to 3.0f. I may have to drop it even further, to about 1.5f - 2.0f.

Let me build another stable copy to send to you so you can tell me how the speeds feel on your PC.

BTW the code has had some major changes. :twisted:

TheAzazel · December 19, 2006, 06:07:15 PM

Yes, it is quite stable, but not everything works. Remember, do not use fade with RENDER_OPENGL until I can fix it.

Uhm... in theory, if your logic code runs only inside the logic loop (inside CRM32Pro.Update()), you are completely independent of rendering performance, so your ship should move at the same speed on a machine running at 10 FPS or at 400 FPS. Do you know what I mean?
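
To illustrate the idea of logic that is independent of the frame rate, here is a minimal fixed-timestep sketch. It is not how CRM32Pro.Update() is implemented (the thread does not show that); Ship, FixedStepper and Simulate are hypothetical names, and the 20 ms tick and 3.0 speed are assumptions chosen for the example.

```cpp
#include <cassert>

// Hypothetical ship state; not a CRM32Pro type.
struct Ship {
    float y;              // position in pixels
    float speed_per_tick; // pixels moved per logic tick
};

// Accumulates elapsed real time and runs the logic a whole number of
// times per frame, however long that frame took to render.
struct FixedStepper {
    int tick_ms;        // length of one logic tick in milliseconds
    int accumulated_ms; // real time not yet consumed by logic ticks

    explicit FixedStepper(int tick) : tick_ms(tick), accumulated_ms(0) {
        assert(tick > 0);
    }

    // Call once per rendered frame with the duration of that frame.
    // Returns how many logic ticks were run.
    int Advance(int frame_ms, Ship& ship) {
        accumulated_ms += frame_ms;
        int ticks = 0;
        while (accumulated_ms >= tick_ms) {
            ship.y += ship.speed_per_tick; // one logic step
            accumulated_ms -= tick_ms;
            ++ticks;
        }
        return ticks;
    }
};

// Simulates total_ms of play at a given frame duration and returns the
// ship's final position.
float Simulate(int frame_ms, int total_ms) {
    Ship ship = {0.0f, 3.0f};
    FixedStepper stepper(20); // 50 logic ticks per second
    for (int elapsed = 0; elapsed < total_ms; elapsed += frame_ms)
        stepper.Advance(frame_ms, ship);
    return ship.y;
}
```

In this sketch one second of play moves the ship the same distance whether frames take 100 ms (10 FPS) or 5 ms (200 FPS), because only the number of elapsed ticks matters. If movement code runs once per rendered frame instead, speed scales with the frame rate.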

I'm longing for that new version :P

Cheers!

Mikenoworth · December 20, 2006, 03:28:42 AM

I do know what you mean, but it's not working like that. I placed the code like you said.. *shrug* I'm packaging everything now for you.

Mikenoworth · December 22, 2006, 09:37:32 AM

Well, truthfully I'm not sure exactly how you expect it to run, but if I change it from, say, RENDER_OPENGL to the default (for Windows XP), the performance drops by at least half, and the logic does not seem to make up for the drop.

But I don't know _a lot_ about timing mechanisms. I only know enough to make my own and hack it together until it does what it should.

As long as I can get victims to test my game, I will continue to work on the performance and timing. But maybe the timing is doing what it's supposed to now?

I really enjoy using CRM32Pro, you've already done all of the exhaustive work for me! WOOT!! Praise CRM32Pro! :wink:

Mikenoworth · December 23, 2006, 09:56:02 PM

One more thing, which I doubt can be improved any more - CRM32Pro_CSprite::Collision() lags the entire game when it is called. :!:

I do check to make sure the two sprites being collision checked are close enough that collision will probably be true before attempting to call Collision().

Of course, when you have an 8x8 bullet in a 16x16 sprite, or a 24x20 ship in a 32x32 sprite, without calling Collision() there's gonna be a lot of "HEY THAT DIDN'T HIT ME!!!" or "HEY I DIDN'T MISS HIM!!!" going on. (like in Quake 1, 2 and 3, because models would go outside their bounding boxes when running)

Anyway. I know there probably isn't much hope for a fix there. :) It's cool.. I can make it an option in a menu for newer PCs (1.5 GHz+).

TheAzazel · December 24, 2006, 02:41:27 AM

Hi Mike,

The collision detection system is as fast as possible. Internally, the first thing it does is a fast bounding-box test, so you won't gain any frames by doing that check yourself before calling it (you would be doing the same work twice!).

As the collision test is performed on surfaces, it works fine and fast with ACCEL_SOFTWARE. With other modes it is extremely slow, because for each detection the surface has to be downloaded from video RAM to system RAM over the PCI, AGP or PCI-E bus, and it doesn't matter at all if you have the latest PCI-E x16: it will be SLOW, very SLOW!

So, if you need to use the collision system (as I guess you do :)), you only have one way to go: ACCEL_SOFTWARE.

I really suggest you have a look at one of the examples: SpriteCollision. You can enable/disable it, and it has a debug output that shows each sprite's bounding box in blue and the collision area as a red rectangle. That area is where the system checks pixel by pixel, and as soon as it finds a collision, it returns a hit :).

There is a little trick to speed up the process: use as few colorkey pixels as possible. For example, imagine a small solid rectangle of 16x8 pixels embedded in a 32x32 sprite, with all the pixels surrounding the solid rectangle set to the colorkey. This causes a lot of pixel-by-pixel checks and is not very efficient. The best way is to store the solid rectangle in a 16x8 sprite, which avoids those unnecessary colorkey pixels. Well, this is just a clear-cut example, but I hope it helps :)

In the future, when I fully support OpenGL, this kind of thing will be supported, but for the moment... ACCEL_SOFTWARE for president :P

Cheers!

P.S.: Thinking about it, I could add a new member to set the "granularity" of the collision system: instead of checking pixel by pixel, it would check every 2, 3, 4 or x pixels. That way you would lose accuracy in favor of speed. What do you think?
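
The two-stage test and the proposed granularity can be sketched as follows. This is an illustration of the technique only, not CRM32Pro's actual code: Sprite, Collides and MakeSprite are hypothetical names, pixels are assumed to be plain 32-bit values in system memory, and the colorkey value is arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sprite; not a CRM32Pro type. Pixels are stored row by row.
struct Sprite {
    int x, y;                          // top-left position on screen
    int w, h;                          // size in pixels
    std::vector<std::uint32_t> pixels; // w * h entries
    std::uint32_t colorkey;            // pixels equal to this are transparent

    bool Solid(int px, int py) const { return pixels[py * w + px] != colorkey; }
};

// Returns true if a and b overlap on a solid pixel. 'step' is the
// granularity: 1 checks every pixel, 2 every other pixel, and so on.
bool Collides(const Sprite& a, const Sprite& b, int step) {
    assert(step > 0);
    // Stage 1: fast bounding-box test, which also yields the overlap area.
    int left   = std::max(a.x, b.x);
    int right  = std::min(a.x + a.w, b.x + b.w);
    int top    = std::max(a.y, b.y);
    int bottom = std::min(a.y + a.h, b.y + b.h);
    if (left >= right || top >= bottom) return false;
    // Stage 2: sample pixels inside the overlap area only, and stop at
    // the first position where both sprites are solid.
    for (int y = top; y < bottom; y += step)
        for (int x = left; x < right; x += step)
            if (a.Solid(x - a.x, y - a.y) && b.Solid(x - b.x, y - b.y))
                return true;
    return false;
}

// Builds a w x h sprite that is transparent except for a solid
// sw x sh rectangle whose top-left corner is at (sx, sy).
Sprite MakeSprite(int x, int y, int w, int h, int sx, int sy, int sw, int sh) {
    const std::uint32_t key = 0x00FF00FFu; // arbitrary colorkey for the sketch
    Sprite s{x, y, w, h, std::vector<std::uint32_t>(w * h, key), key};
    for (int py = sy; py < sy + sh; ++py)
        for (int px = sx; px < sx + sw; ++px)
            s.pixels[py * w + px] = 0xFFFFFFFFu;
    return s;
}
```

A larger step cuts the number of pixel checks roughly by the square of the step, but it can miss thin overlaps: two 16x8 solids that touch along a single pixel column are detected with step 1 and can be skipped with step 3.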

Mikenoworth · December 27, 2006, 06:30:11 AM

Hm.. I swear I replied to this a couple of days ago, but it's not here.. So anyway..

About the unusual sprite size: the .DPF tool limits you to 16x16 as the smallest sprite size, which is why I end up using sprites that are too big for some graphics.

The collision granularity sounds like a great idea! But also, with that bit you said about the red rectangle showing the collision area, maybe you could add a BB_Collision() member or something that only uses a _tightly_ bounded box?? (which is calculated simply by finding the outermost pixels.. which can be done when the sprite is loaded.. but you know this. 8) ) Less accurate, but it's a nice option if you want to pass up the current collision implementation in favor of performance.
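
A tight bounding box of this kind could be computed once at load time roughly like this. It is a sketch under assumptions: Rect, TightBounds and BoxesOverlap are hypothetical names (BB_Collision() itself is only a suggestion in this thread), and pixels are assumed to be 32-bit values stored row by row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rect { int x, y, w, h; };

// Smallest rectangle that contains every non-colorkey pixel of a w x h
// sprite. Returns an empty rectangle (w == h == 0) if the sprite is
// fully transparent.
Rect TightBounds(const std::vector<std::uint32_t>& pixels, int w, int h,
                 std::uint32_t colorkey) {
    assert(w > 0 && h > 0);
    assert(pixels.size() == static_cast<std::size_t>(w) * h);
    int min_x = w, min_y = h, max_x = -1, max_y = -1;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (pixels[y * w + x] != colorkey) {
                min_x = std::min(min_x, x);
                max_x = std::max(max_x, x);
                min_y = std::min(min_y, y);
                max_y = std::max(max_y, y);
            }
    if (max_x < 0) return Rect{0, 0, 0, 0};
    return Rect{min_x, min_y, max_x - min_x + 1, max_y - min_y + 1};
}

// Box-only collision: a and b are tight bounds relative to their sprites,
// (ax, ay) and (bx, by) are the sprites' top-left screen positions.
bool BoxesOverlap(const Rect& a, int ax, int ay, const Rect& b, int bx, int by) {
    if (a.w == 0 || a.h == 0 || b.w == 0 || b.h == 0) return false;
    return ax + a.x < bx + b.x + b.w && bx + b.x < ax + a.x + a.w &&
           ay + a.y < by + b.y + b.h && by + b.y < ay + a.y + a.h;
}
```

For the 24x20 ship in a 32x32 sprite mentioned earlier, the tight box removes the transparent border, so a bullet that only enters that border no longer counts as a hit, at the cost of one scan over the pixels when the sprite is loaded.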

Having lots of collision options is my cup of tea. Sometimes you need pixel by pixel, and sometimes you need every other pixel, and sometimes you just need to check a bounding box.

But the "granularity" idea sounds cool! You could cut collision check time by at _least_ half.

One more thing: I've looked over the docs, but I haven't found a way to get the size of a sprite. Is there a member for that? Something I missed somewhere?

Anyway, thanks and happy holidays!

TheAzazel · December 27, 2006, 12:55:20 PM

Hi Mike,

Hehehe, a few days ago I fixed the stupid 16x16 sprite size limit in EditorDPF. Now it is set to 4x4, which is acceptable, isn't it? :)

About the collision system: I will implement the fast approach first :), adding a new member that checks for collision using only a bounding box. I will add the "granularity" feature to my ToDo list.

I hope to deliver the new release today... we will see if I can do it!
Of course, I recommend you use it as soon as it is available on the webpage (including EditorDPF :P).

Cheers!
