Altazimuth

Multithreaded Renderer Beta [NOW IN MASTER!]


Wow, awesome. I haven't had much time to play with this but I ran around Heartland MAP05 a bit -- for comparison, I noticed that when using a single thread (on an i9-10900K with 32GB RAM), I was getting between 60 and 144 FPS (144 being my monitor's refresh rate), roughly scaling with the complexity of the rendered view, with only about 7% CPU utilization (according to Task Manager). With 20 threads, it's running at 144 FPS more consistently, and utilizing over 90% of the CPU. It still dips a bit into the 130s. Still, pretty crazy. I'll play with this more later.

 

One possible issue with the menu:

When using the left arrow key to set renderer threads to the maximum number from single threaded mode, renderer threads is set to 19 (max threads - 1) rather than maximum threads (20).

 

10 hours ago, skillsaw said:

One possible issue with the menu:

When using the left arrow key to set renderer threads to the maximum number from single threaded mode, renderer threads is set to 19 (max threads - 1) rather than maximum threads (20).

Thanks for the report, fixed. Good to know it's running far better in Heartland. That was one of my targets, and some scenes really tested the performance. I won't generate a new build for solely this change just to keep things simpler in the case that there are crash reports to deal with soon.

I'm half-tempted to make the contexts a number entry field to reduce the window recreation spam, but then there's no exact way to know the max thread count, and also number entry fields don't work if you're going controller-only which is something I've been trying to improve the experience for recently.
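For anyone curious what the wrap-around in that menu report looks like in code, here is a minimal sketch. It is not the actual Eternity menu code: `stepThreadCount` and its parameters are hypothetical names, and it only assumes that the thread count is 1-based and wraps between 1 and the maximum.

```cpp
#include <cassert>

// Hypothetical helper (not Eternity's real code): step a 1-based thread
// count by delta, wrapping within the inclusive range [1, maxThreads].
int stepThreadCount(int current, int delta, int maxThreads)
{
    assert(maxThreads >= 1 && current >= 1 && current <= maxThreads);

    // Shift to 0-based, wrap with a non-negative modulo, shift back.
    // Forgetting either shift is a typical source of "max - 1" results.
    int zeroBased = (current - 1 + delta) % maxThreads;
    if (zeroBased < 0)
        zeroBased += maxThreads;
    return zeroBased + 1;
}
```

With 20 threads available, stepping left from 1 lands on 20 and stepping right from 20 lands back on 1.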


There are now macOS builds (both x64 and ARM) courtesy of printz. There are no major differences between them and the builds I made yesterday. The only fix present is for the off-by-one that skillsaw pointed out, and that's extremely minor.


This definitely is a great upcoming feature on Eternity. Thanks to Altazimuth for even starting this and Gooberman for his pioneering work in multithreading Doom.


I noticed the following FPS progression on Heartland: 1 thread - 98 FPS, 2 threads - 110 FPS, 4 threads - 117 FPS, and beyond that the more threads it has, the lower the performance gets. Good job!

Especially since I prefer capped framerates in software modes.


Thanks for all the kind words. I've uploaded a new build that fixes a crash if you changed the number of contexts after a swirling flat had been rendered (it'd crash as soon as another swirling flat was drawn).


That's cool, and such a nice addition to the renderer. I'm wondering if this could be of great help for rendering maps with 3D models, if those were ever implemented.


I don't believe that's ever a plan, and performance is not the limitation.


3D models are hard to fit into Doom, but sometimes nothing can replace them, like beautiful trees, which are way too inconvenient to model in a map editor. Tree sprites don't really have that depth effect, and if such an actor were enlarged, it could create nice forest effects. Yeah, I definitely want too much, sorry.


Well, from what I see in others' comments, this is actually real!

For a moment I almost thought it was another hoax, just like the other one that ended up in Post-Hell.

Gonna give it a try once I can.


Wow, the multithread renderer is a big thing.

Thanks for the hard work Altazimuth.

 

I get a whopping bump from 50 fps to 80 fps when I increase the threads from 1 to 4.

4 seems to be the sweet spot for my notebook.

More than 4 threads are possible, but then the cooling system only gets louder and the fps rises only a little.


Glad to hear of the perf improvement! Sadly as long as the synchronisation time increase per thread is so large it'll end up outweighing the gradually-decreasing gains from increasing the number of render threads. If there's any way that people think I might be able to improve this then I'd owe you a debt of gratitude if you informed me of it.
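To make that trade-off concrete, here is a toy model, not a measurement of Eternity. It assumes the frame time is `work / n + syncPerThread * n`, meaning the rendering work divides evenly across `n` threads while the synchronisation cost grows linearly with `n`. Both the formula and the names are assumptions for illustration only.

```cpp
#include <cassert>

// Toy model (an assumption, not measured): frame time in milliseconds when
// perfectly divisible work is split over n threads and each extra thread
// adds a fixed synchronisation cost.
double modelFrameTime(double workMs, double syncPerThreadMs, int n)
{
    assert(n >= 1);
    return workMs / n + syncPerThreadMs * n;
}

// Returns the thread count in [1, maxThreads] with the lowest modelled
// frame time. Ties go to the smaller thread count.
int bestThreadCount(double workMs, double syncPerThreadMs, int maxThreads)
{
    assert(maxThreads >= 1);
    int best = 1;
    double bestTime = modelFrameTime(workMs, syncPerThreadMs, 1);
    for (int n = 2; n <= maxThreads; ++n)
    {
        const double t = modelFrameTime(workMs, syncPerThreadMs, n);
        if (t < bestTime)
        {
            bestTime = t;
            best = n;
        }
    }
    return best;
}
```

Under this model the minimum sits near the square root of `workMs / syncPerThreadMs`, so with 8 ms of work and 0.5 ms of sync cost per thread the best count is 4, and adding more threads past that makes frames slower, which is the shape of curve people are reporting.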

In other news I uploaded another build, though this only really folds in fixes from master. Most of the rendering changes were to tidy things up and make more things const.


I have now tested on my good old i7-3770K desktop.

 

The framerate in my test scene goes from 88 fps with 1 thread to 127 fps with 3 threads. 4 threads give nearly the same fps as 3 threads.

With 5 threads the fps goes backwards.

But hey, that is a bump of around 45% more frames on this old CPU.

 

My test scene is a map with 5 edge portals on screen, with the player standing on one of them.

[Attached screenshot: 3Thread.png]


Good stuff! Maybe you could make it so it autodetects which amount of threads is needed for the best performance, although I suppose that would be a tall ask for ultimately little net benefit.


I have nothing to add, except to say well done. Amazing work.


I couldn't think of a better wad to try with, so I just resorted to the ol' NUTS.WAD

And I'm running this on an Intel machine:

 

Spoiler

 

11th Gen Intel(R) Core(TM) i3-1115G4 @ 3.00GHz (4 CPUs), ~3.0GHz

8192MB RAM

DirectX 12

Intel(R) UHD Graphics

 

 

I get 20-30 FPS on average when everything is fucking going after me in the middle of the combat, it's a steady framerate for a slaughtermap, so I guess it does make an improvement! (It usually gets a worse framerate when playing with Woof! on the start sector for example)

 

Not too informative, not too casual.


After benchmarking the new multithreaded renderer this morning, I played DOOM for some hours today with the 2023-05-01 build, and I am impressed by the performance on my old computer and notebook.

So far I have found no visual glitches caused by the multithreaded renderer. Good work!

 

Edited by Meerschweinmann

12 hours ago, Andromeda said:

Maybe you could make it so it autodetects which amount of threads is needed for the best performance

Not sure how I'd go about doing this. It would require some wide-scale rewrites, and even with that done I'm not sure I'd be confident in any sort of heuristics I could use to figure out what the best thread count is.

 

10 hours ago, ValveMercenary said:

I couldn't think of a better wad to try with, so I just resorted to the ol' NUTS.WAD

I think NUTS is largely slow due to just how many monsters are thinking, rather than being rendered. It's definitely a combo but IIRC the thinking takes up way more time, meaning that faster rendering isn't gonna help toooo much.

9 hours ago, Altazimuth said:

I think NUTS is largely slow due to just how many monsters are thinking, rather than being rendered. It's definitely a combo but IIRC the thinking takes up way more time, meaning that faster rendering isn't gonna help toooo much.

Well, my bad. I couldn't quite think of something to try it with, and I forgot that monster thinking is separate from rendering.

Is there some WAD that pushes the limits of rendering which you could recommend, so I could try it with this?

Thanks.

On 5/1/2023 at 11:05 AM, Altazimuth said:

Glad to hear of the perf improvement! Sadly as long as the synchronisation time increase per thread is so large it'll end up outweighing the gradually-decreasing gains from increasing the number of render threads.

What are you synchronizing? Resource (texture) access?

 

In GZDoom's software renderer I first check the pointer for a resource without locking. If the pointer is not null then it is already loaded and I can safely use it. If the pointer is null then I perform a mutex lock since only one thread can safely load it. See GetSoftwareTexture for an example of this strategy. Basically this removes virtually all synchronization in the gzd multithreaded renderer, except for when the threads wait for a new frame to begin or the first time something needs to load.
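The strategy above is usually called double-checked locking. Here is a minimal sketch of it, not GZDoom's actual `GetSoftwareTexture` code: `Texture`, `LazyTexture` and the fake 64x128 "load" are hypothetical stand-ins, and the use of `std::atomic` with acquire/release ordering is my own assumption about how to express the lock-free check safely in standard C++.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for a loaded software texture.
struct Texture
{
    int width;
    int height;
};

// Sketch of "check the pointer without locking, lock only to load".
class LazyTexture
{
public:
    ~LazyTexture() { delete loaded_.load(); }

    const Texture *get()
    {
        // Fast path: no lock. A non-null pointer means the texture is
        // already fully loaded and safe to use.
        Texture *tex = loaded_.load(std::memory_order_acquire);
        if (tex)
            return tex;

        // Slow path: only one thread may load, so take the mutex.
        std::lock_guard<std::mutex> lock(loadMutex_);

        // Re-check under the lock: another thread may have loaded it
        // while this one was waiting for the mutex.
        tex = loaded_.load(std::memory_order_relaxed);
        if (!tex)
        {
            tex = new Texture{64, 128}; // placeholder for the real load
            ++loadCount_;
            loaded_.store(tex, std::memory_order_release);
        }
        return tex;
    }

    // Number of times the slow path actually loaded (for illustration).
    // Only read this once no other thread is calling get().
    int loadCount() const { return loadCount_; }

private:
    std::atomic<Texture *> loaded_{nullptr};
    std::mutex loadMutex_;
    int loadCount_ = 0; // only modified while holding loadMutex_
};
```

After the first call, every later `get()` returns through the fast path without touching the mutex, which is why the locking cost all but disappears once resources are warm.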

4 hours ago, dpJudas said:

What are you synchronizing?

Just the start/end of threaded rendering. I might be using terminology incorrectly here, to be fair. Any synchronising within the render threads should be at a minimum. All texture caching and such were moved outside rendering. Allocations are all for thread-local heaps, last I checked. There is a mutex for the global zone heap, but basically nothing uses it from within the render threads.


I believe you are referring to thread joining. Waiting for each thread to terminate has the typical overheads you describe.


Yeah honestly for the guy who ironed out all these multithreading kinks I really don't know what I'm doing to a large degree. Initial set-up was based on Rum & Raisin code and then the vast majority of it was coming up with novel solutions to issues reported to me by ThreadSanitizer.

It should be borne in mind that there's no thread joining here, just setting an atomic bool to true and releasing a semaphore (at which point the threads will spin). The whole communication between the threads happens on the render end here, and on the main thread's end here.
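As a rough illustration of that kind of handshake, here is one plausible reading of it, not the Eternity code linked above. Everything named here (`Semaphore`, `RenderContext`, `beginFrame`, `renderOneFrame`, `waitForFrame`) is hypothetical, and which side spins is my assumption: in this sketch the main thread spins on the atomic bool until the worker clears it. The semaphore is built from a mutex and condition variable so the sketch compiles as C++11; C++20 offers `std::counting_semaphore` instead.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal counting semaphore from C++11 parts.
class Semaphore
{
public:
    void release()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }

    void acquire()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Hypothetical per-thread render state.
struct RenderContext
{
    Semaphore start;               // main thread releases to start a frame
    std::atomic<bool> busy{false}; // true while the frame is in flight
    int framesRendered = 0;        // stand-in for real rendering output
};

// Main thread: set the flag, then wake the worker. No thread is created
// or joined per frame.
void beginFrame(RenderContext &ctx)
{
    ctx.busy.store(true, std::memory_order_release);
    ctx.start.release();
}

// Worker thread: one iteration of its render loop.
void renderOneFrame(RenderContext &ctx)
{
    ctx.start.acquire();
    ++ctx.framesRendered; // placeholder for rendering this context's share
    ctx.busy.store(false, std::memory_order_release);
}

// Main thread: spin until the worker reports that the frame is finished.
void waitForFrame(const RenderContext &ctx)
{
    while (ctx.busy.load(std::memory_order_acquire))
        std::this_thread::yield();
}
```

In a real program each worker would call `renderOneFrame` in a loop on its own long-lived `std::thread`, and the main thread would call `beginFrame` and then `waitForFrame` for every context once per frame. The per-frame cost is then one semaphore wake-up plus a spin per context, which is the overhead that grows with the thread count.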


I've decided I'm pretty happy with how things are, even without load balancing. I plan on merging this into master in 1.5 days, unless anybody has any major reports.


Okay if it is the frame start/end part of the threads then there's not much more you can do to improve that. The only thing really would be to further distribute the work between the threads (some finish earlier than others due to a simpler BSP subtree), but that is really difficult with this method.
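For reference, the baseline that load balancing would improve on is a fixed, even split of the work. The sketch below assumes each render context owns a contiguous range of screen columns; the thread doesn't spell that layout out, so treat it as an assumption, and `ColumnRange` and `splitColumns` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range of screen columns [begin, end) owned by one context.
struct ColumnRange
{
    int begin;
    int end;
};

// Evenly divide screenWidth columns among numContexts contexts. Any
// remainder is handed out one column at a time to the first contexts.
std::vector<ColumnRange> splitColumns(int screenWidth, int numContexts)
{
    assert(screenWidth >= 0 && numContexts >= 1);

    std::vector<ColumnRange> ranges;
    ranges.reserve(static_cast<std::size_t>(numContexts));

    const int base = screenWidth / numContexts;
    const int extra = screenWidth % numContexts;

    int x = 0;
    for (int i = 0; i < numContexts; ++i)
    {
        const int width = base + (i < extra ? 1 : 0);
        ranges.push_back(ColumnRange{x, x + width});
        x += width;
    }
    return ranges;
}
```

An even split like this is cheap but blind to scene complexity: a context whose columns look down a long, detailed corridor finishes later than one facing a flat wall, and everyone waits for the slowest. A load balancer would move the range boundaries based on how long each context took on previous frames.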

 

I did some testing with GZD some years back where I partially implemented the other thing from R&R: draw the walls as spans. That gave a massive speed improvement even for simple maps just because it reduces the pressure on the caches so much. Especially at higher resolutions. I never finished it for GZD due to too many drawers to port for me to bother, but I can highly recommend implementing that optimization too if you're up to it. :)

7 hours ago, dpJudas said:

I never finished it for GZD due to too many drawers to port for me to bother, but I can highly recommend implementing that optimization too if you're up to it. :)

I took a stab at an incredibly simplistic attempt but couldn't figure out how to resolve the rendering issues easily enough to bother. It seemed like some sort of persistent data was causing sprites to not render in the zones between the render contexts where the load balancing was happening. I'll probably pester GooberMan when he's freer.


@ceski found an issue with r_sprprojstyle, which I have since fixed. I'm not going to upload a new build since I plan on merging into master tomorrow.
