sqpat

RealDOOM: DOOM Ported to 16-bit Real Mode


When the Doom community runs out of things to port Doom to, they just invent new things to port Doom to.

 

Can't wait until someone figures out how to get the game running on some crusty 8-bit CPU like a Z80.


I think I'll be sharing ideas here, which I've recently discovered on FastDoom, for reducing the memory footprint (as this is the most important part of avoiding unnecessary paging). It's possible to unify the STBAR and STARMS lumps, as there will be no multiplayer. This lowers memory usage by 1 KB, and makes the code a bit faster since there is no need to update the status bar twice.

 

With some optimizations and downgrades I think fast 286s will be able to run Doom at somewhat playable framerates (similar to 386SX)

On 9/21/2023 at 2:00 PM, Dark Pulse said:

When the Doom community runs out of things to port Doom to, they just invent new things to port Doom to.

 

Can't wait until someone figures out how to get the game running on some crusty 8-bit CPU like a Z80.

 

Already done:

 

 

30 minutes ago, viti95 said:

That's not based on vanilla Doom source code, so not Doom at all.

 

Obviously. That was a joke comment, the same as Dark Pulse's one that I quoted. 

11 hours ago, viti95 said:

I think I'll be sharing ideas here, which I've recently discovered on FastDoom, for reducing the memory footprint (as this is the most important part of avoiding unnecessary paging). It's possible to unify the STBAR and STARMS lumps, as there will be no multiplayer. This lowers memory usage by 1 KB, and makes the code a bit faster since there is no need to update the status bar twice.

 

Oh interesting, I'll have to take a look! I changed some of the code in there recently and was able to pull out some code related to multiplayer (frags, I think), and I also removed all the boolean pointer logic and more or less hardcoded it. It's much simpler now, but I haven't looked very closely at any lump optimizations.

 

11 hours ago, viti95 said:

  With some optimizations and downgrades I think fast 286s will be able to run Doom at somewhat playable framerates (similar to 386SX)

 

I am starting to think so too. There are some big possibilities if 1:1 vanilla Doom compatibility is dropped, like a fixed viewport width that fits in 1 byte (240? 250? 255?), which would reduce visplane memory usage a bit and turn a number of 2-byte variables into 1-byte ones. Maybe also lower-precision fixed-point; I've already done this in a few places, like variables holding heights and also fine angles. I've been wondering about 24-bit ints/structs too.
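For the 24-bit idea, one option is to store a value in three bytes and pack/unpack it on access. A minimal sketch, assuming little-endian byte order as on x86; `int24_packed`, `int24_set` and `int24_get` are hypothetical names, not RealDOOM code:

```c
#include <stdint.h>

/* Hypothetical 3-byte signed integer, stored little-endian (x86 order). */
typedef struct {
    uint8_t b[3];
} int24_packed;

/* Store the low 24 bits of v; values outside -8388608..8388607 wrap. */
static void int24_set(int24_packed *p, int32_t v)
{
    uint32_t u = (uint32_t)v;
    p->b[0] = (uint8_t)(u & 0xFF);
    p->b[1] = (uint8_t)((u >> 8) & 0xFF);
    p->b[2] = (uint8_t)((u >> 16) & 0xFF);
}

/* Reassemble the three bytes and sign-extend from bit 23. */
static int32_t int24_get(const int24_packed *p)
{
    uint32_t u = (uint32_t)p->b[0]
               | ((uint32_t)p->b[1] << 8)
               | ((uint32_t)p->b[2] << 16);
    if (u & 0x800000u)
        return (int32_t)u - 0x1000000;   /* negative: subtract 2^24 */
    return (int32_t)u;
}
```

Each field saves a byte over `int32_t`, but every read and write pays for the byte shuffling, so this probably only makes sense for large, rarely touched arrays.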

 

 

 

I tried looking into EMS 4.0 support to try to get 8 or 10 physical pages instead of just 4, but I'm having trouble getting it to work with any software driver. Most drivers don't seem to want to give you more physical pages: EMM386 only supports 4 for sure, and QEMM386 might support it, but I'm fighting with the configuration options right now. I'm traveling, so I can't test on real hardware.

 

The current main issue continues to be that there is some sort of memory bug I haven't found yet in 16 bit, which manifests itself as lock-ups (usually when the player fires a weapon). It's making it difficult to do comparative 16-bit optimizations and benchmarking right now. I'm sure I'll eventually find it, but it's a cat-and-mouse game where adding debug code makes the problem go elsewhere. Meanwhile I've been trying to add some other features and optimizations and hoping for the bug to show up in a way that makes it easier to fix.

 


I've been following this closely since I saw the comment in the Doom8088 post in the VCFed forums, good to see a thread here on DW!

On 9/21/2023 at 2:12 PM, sqpat said:

You probably want close to 620 KB free and 3-4 MB of EMS minimum right now.

I've been a bit confused about whether it's possible to get in-game, based on the GitHub commit history, but if it is, the conventional RAM requirement explains the difficulty I've been having: I can't get UMA and my hardware EMS card to work at the same time on my Protech PM286 mobo, which means something like DOSMAX is out of the question, and the best I can get is about 615KB of free conventional RAM.

1 hour ago, deathz0r said:

I've been following this closely since I saw the comment in the Doom8088 post in the VCFed forums, good to see a thread here on DW!

I've been a bit confused about whether it's possible to get in-game, based on the GitHub commit history, but if it is, the conventional RAM requirement explains the difficulty I've been having: I can't get UMA and my hardware EMS card to work at the same time on my Protech PM286 mobo, which means something like DOSMAX is out of the question, and the best I can get is about 615KB of free conventional RAM.

 

It's possible to get it running with less, using as little as 400-450k of conventional memory and everything else in EMS. I've been recently adding code to have more and more variables optionally work in conventional memory, to figure out what combination works best, so it's a little messy now.

 

Basically, if you just go into z_zone.c and set all the #defined block sizes to 1 (STATIC_CONVENTIONAL_BLOCK_SIZE_1, STATIC_CONVENTIONAL_BLOCK_SIZE_2, STATIC_CONVENTIONAL_SPRITE_SIZE, etc.), then no big memory blocks will be reserved for conventional allocations. (It has to be 1; zero will cause boot issues.)

 

One of my 286s got in-game and rendered about 30-40 frames before crashing last week; there's a recording I posted above. Ultimately, though, once things are really optimized, you will want your machine configured for as much conventional memory as possible to reduce EMS paging.


It's nice to get some information about this project too, thanks. Good luck! I have a feeling we'll have something running at least 10fps on a 286 within a year. Maybe even more fps. With decades of accumulated optimization tricks and algorithms, it feels inevitable.


Interesting looking source port.

 

Also I pretty much have the same profile pic as you LOL.


Okay - I'm back from a month of traveling and hoping to get back to work on this. 

 

The main outstanding issue is that there is still some sort of random (probably memory corruption) bug in the 16 bit build. Because of this, I still can't really do benchmark comparisons of code and feature changes. Fixing this bug is the #1 priority for now. Once that's done, I think I can make a bunch of easy improvements and benchmark some ideas.

 

One funny thing I've noticed about memory management for this project: while at first I moved everything to EMS, the recent trend has been to use the available conventional memory to put things back there and avoid EMS as much as possible. In the shareware version, it's probably possible to put just about everything but textures in conventional memory. I don't think there is enough space in the retail versions to do this, though. (Also, we still aren't using sound. That will take up a lot of memory too... I wonder if there will be a safe way to page sound code in and out of EMS.)

 

 


How do you debug RealDoom? I tried to use the Watcom debugger for FastDoom but it was very prone to crash, so I ended up using the classic log files and realtime second screen MDA/Hercules output.

12 hours ago, viti95 said:

How do you debug RealDoom? I tried to use the Watcom debugger for FastDoom but it was very prone to crash, so I ended up using the classic log files and realtime second screen MDA/Hercules output.

 

Yep... up till now it's always been output to log files or to the screen.

 

This came up a lot during the transition to the EMS memory manager, as thousands of lines were rewritten to use paged variables, and there were many bugs. Usually I'd find that the player x/y or some other field diverged from a normal timedemo. I'd figure out the tic/frame it happened on, use debug code to find out which thinker caused the issue, then turn on some static flags when frame X and thinker Y were active, then print out various values to figure out what diverged from a standard build of PCDoomV2. Usually there was another bad variable, and I'd work backwards in frames to when that variable became wrong, and so on, until I found the bad line of code that started the bug.
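The first step of that process, finding the first tic where a test build diverges from a reference run, is easy to automate if both builds log the same per-tic values. A minimal sketch, assuming a hypothetical log of player x/y per tic (`tic_snapshot` and `first_divergent_tic` are illustrative names):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-tic record written to a log by both builds. */
typedef struct {
    int32_t x, y;   /* player position in fixed-point map units */
} tic_snapshot;

/* Return the first tic where the runs disagree, or -1 if they match for
   the length of the shorter log. */
static long first_divergent_tic(const tic_snapshot *ref, size_t ref_len,
                                const tic_snapshot *run, size_t run_len)
{
    size_t n = ref_len < run_len ? ref_len : run_len;
    for (size_t i = 0; i < n; i++) {
        if (ref[i].x != run[i].x || ref[i].y != run[i].y)
            return (long)i;
    }
    return -1;
}
```

Once the tic is known, the same comparison can be repeated with more fields (or per thinker) logged for just that tic, which keeps the debug output small.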

 

The current bug I'm dealing with bounces around to different areas every time I make any code change, though, so I can't just change the debug code, rebuild and run again to find the source. I'm trying to create the simplest possible reproduction of the bug. It's almost surely a memory leak (or maybe even a bug in the timer code).


Phew, I was finally able to fix the main bug causing all the desyncs in the 16-bit build. A mobj pointer was being paged out and then continued to be used in P_DamageMobj. After fixing this bug, all three demos seem to play back. I'm going to continue some work on conventional memory allocation and then do some comparative benchmarks to see what helps most when it's in conventional memory.
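This class of bug is easy to reproduce in a toy model. A pointer into the EMS page frame is only valid while the same logical page stays mapped; any call that maps a different page silently changes what that address shows. A minimal sketch, assuming a single 16-byte frame that copies pages in and out (real EMS remaps through the driver and uses 16 KB pages, but the effect on a cached pointer is the same); `map_page`, `deref` and the damage functions are hypothetical, not the actual P_DamageMobj code:

```c
#include <stdint.h>
#include <string.h>

/* Toy EMS model: one page frame that shows one logical page at a time. */
#define TOY_PAGE_SIZE 16
#define TOY_LOGICAL_PAGES 2

static uint8_t logical_pages[TOY_LOGICAL_PAGES][TOY_PAGE_SIZE];
static uint8_t page_frame[TOY_PAGE_SIZE];
static int mapped_page = -1;

/* Write back the currently mapped page, then bring in the requested one. */
static void map_page(int page)
{
    if (mapped_page == page)
        return;
    if (mapped_page >= 0)
        memcpy(logical_pages[mapped_page], page_frame, TOY_PAGE_SIZE);
    memcpy(page_frame, logical_pages[page], TOY_PAGE_SIZE);
    mapped_page = page;
}

/* The returned pointer is only valid until the next map_page() call. */
static uint8_t *deref(int page, int offset)
{
    map_page(page);
    return &page_frame[offset];
}

static void reset_pages(void)
{
    memset(logical_pages, 0, sizeof logical_pages);
    logical_pages[0][0] = 100;   /* target's health lives on page 0 */
    logical_pages[1][0] = 50;    /* source's health lives on page 1 */
    mapped_page = -1;
}

/* BUG: 'health' is kept across a call that maps a different page. */
static void damage_buggy(void)
{
    uint8_t *health = deref(0, 0);
    (void)deref(1, 0);           /* e.g. looking at the attacker */
    *health -= 10;               /* now lands in page 1's data */
}

/* Fix: re-fetch the pointer after anything that may remap. */
static void damage_fixed(void)
{
    uint8_t *health = deref(0, 0);
    (void)deref(1, 0);
    health = deref(0, 0);
    *health -= 10;
}

static uint8_t read_health(int page)
{
    return *deref(page, 0);
}
```

In the buggy version the target keeps 100 health while the attacker drops from 50 to 40, which is exactly the kind of silent corruption that only shows up later as a desync.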


That's really good! Keep us updated on the project.


Some good and bad news I suppose - The bad is that there are still some bugs (two that I can count) that are popping up from time to time causing crashes during gameplay and demos, but the good news is that they are somewhat uncommon and also pretty deterministic, so it should be easier to fix, and it also doesn't get in the way of benchmarking too much.

 

In the short term, in addition to fixing bugs, my focus is on maximizing performance with regards to what gets stored in EMS versus what little conventional memory we have.

 

Comparative Benchmarks with EMS/Conventional Allocations

 

So timedemo 3, run in 86Box on a Pentium 233 with high detail and a very small window, yielded the following results:

 

(Percentages are relative to the everything-in-EMS run.)

Configuration                   Realtics       EMS manager reads   EMS page swaps
Everything in EMS               2134 (100%)    6618473 (100%)      1746208 (100%)
Level data in conventional      1593 (74.6%)   4674431 (70.6%)     409409 (23.4%)
Texture info in conventional    1942 (91.0%)   5312437 (80.2%)     1363269 (78.1%)
Sprite info in conventional     2122 (99.4%)   6528690 (98.6%)     1728079 (98.9%)
Thinkers in conventional        2022 (94.8%)   4736768 (71.5%)     1518549 (86.9%)

Everything in EMS: yes, it ran exactly 1:1 with gametics.
Level data: 65535 bytes made available in conventional memory for sectors, lines, vectors, etc. (not enough for everything, but enough for a few large fields).
Texture info: requires around 48000 bytes in shareware DOOM.
Sprite info: requires 7000 bytes in shareware DOOM.
Thinkers: I gave them around 50000 bytes, but supporting max thinkers = 1000 would require as much as 97000.

 

 

So it's pretty clear that caching the level variables like sectors causes the biggest speedup... texture info is also good to cache. Sprite info seems like it may not be worth caching. I'd like to make around 128k available for level data and also have the texture info cached eventually. Right now there's something like 50-100k available depending on build settings. Another 20-30k at least can be freed up if overlays work (hard to tell when there are unrelated bugs), and I can probably save the same off the stack eventually (it's currently 32k, but it should be easy to make some code changes and get it down to 8k or maybe less; that will require testing). And then in theory a well-configured machine will have as much as 96k of extra memory lying around in the upper memory area, so I think there will be a lot to work with once I write some code to make use of that space too. I also have to keep in mind that sound and music will require a lot of memory at some point.

 

Some fields like texture info are pretty static and don't need to be changed after game start, while things like level data (sectors, etc) don't really change in size after level load, which means you can just give them space in a contiguous memory blob and not get any fragmentation during the level - then between levels you just clear it out and re-allocate. So they don't even need complicated allocation management.
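That "contiguous blob, cleared between levels" pattern is essentially an arena (bump) allocator. A minimal sketch, assuming the arena lives in one pre-reserved conventional block; `level_arena` and its functions are hypothetical names, and the 2-byte alignment is an assumption suited to 16-bit structs:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical level-data arena: no per-allocation headers, and no
   fragmentation because nothing is ever freed individually. */
typedef struct {
    uint8_t *base;
    size_t   size;
    size_t   used;
} level_arena;

static void arena_init(level_arena *a, void *block, size_t size)
{
    a->base = (uint8_t *)block;
    a->size = size;
    a->used = 0;
}

/* Returns NULL when full; a caller could fall back to EMS in that case. */
static void *arena_alloc(level_arena *a, size_t bytes)
{
    size_t rounded = (bytes + 1u) & ~(size_t)1u;   /* keep 2-byte alignment */
    if (rounded > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Between levels: everything allocated for the old level is discarded. */
static void arena_reset(level_arena *a)
{
    a->used = 0;
}
```

Because level data is sized once at load time, the only bookkeeping needed is a single offset, which is about as cheap as allocation gets.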

 

Thinkers are kind of a pain because they are constantly being created and removed, which causes some fragmentation. I made a really basic allocator that was fairly wasteful on memory but good enough to run a demo, and it seems it doesn't improve run speed much. I think what's going on is that while a lot of reads are being made for thinker objects, many of them are 'cache hits', hitting something already in the EMS page frame, so the page swaps don't actually decrease much. It's probably best to leave these in EMS, though I could maybe allocate them a dedicated range of pages, which might lead to an even better cache hit rate.
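The gap between "reads" and "page swaps" can be illustrated by treating the 4-page EMS frame as a small cache. A minimal sketch, assuming LRU replacement (not necessarily RealDOOM's actual policy); `page_frame_model` and its functions are hypothetical:

```c
#include <stddef.h>

/* Toy model of a 4-page EMS frame as an LRU cache of logical pages.
   Every access counts as a read; only a miss costs a page swap. */
#define PHYS_PAGES 4

typedef struct {
    int  page[PHYS_PAGES];      /* logical page held by each physical slot */
    long last_use[PHYS_PAGES];  /* time of each slot's most recent use */
    long clock;
    long reads;
    long swaps;
} page_frame_model;

static void model_init(page_frame_model *m)
{
    for (int i = 0; i < PHYS_PAGES; i++) {
        m->page[i] = -1;
        m->last_use[i] = -1;
    }
    m->clock = m->reads = m->swaps = 0;
}

static void model_read(page_frame_model *m, int logical_page)
{
    int victim = 0;
    m->reads++;
    m->clock++;
    for (int i = 0; i < PHYS_PAGES; i++) {
        if (m->page[i] == logical_page) {   /* hit: already mapped */
            m->last_use[i] = m->clock;
            return;
        }
        if (m->last_use[i] < m->last_use[victim])
            victim = i;                     /* least recently used so far */
    }
    m->page[victim] = logical_page;         /* miss: remap one slot */
    m->last_use[victim] = m->clock;
    m->swaps++;
}
```

Cycling through 4 pages ten times costs 40 reads but only 4 swaps; cycling through 5 pages thrashes, and every one of the 50 reads becomes a swap. That is why moving a hot but compact working set (like thinkers) out of EMS can cut reads a lot without cutting swaps much, and why keeping thinkers on dedicated pages could help.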

 

I think the player object is still in EMS. I'd like it in conventional memory, but that might require some code rewriting.

 

Hybrid visplanes are still active. I'm sure I can mess around with how many are in conventional memory and how much is in EMS to get some better average performance. Currently 60 are in conventional memory and more than that becomes EMS visplanes.
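The hybrid split can be pictured as an accessor that picks storage by index. A minimal sketch, assuming planes are numbered in allocation order each frame as in vanilla-style R_FindPlane; `visplane_stub_t`, `get_visplane` and `MAX_VISPLANES_SKETCH` are hypothetical, and the EMS side is just a plain array standing in for paged storage:

```c
#include <stdint.h>

#define NUM_CONVENTIONAL_VISPLANES 60   /* the split mentioned in the post */
#define MAX_VISPLANES_SKETCH      128   /* hypothetical total */

/* Trimmed stand-in for a visplane; the real struct also carries the
   per-column top/bottom arrays, which is what makes visplanes big. */
typedef struct {
    int32_t height;
    int16_t picnum, lightlevel;
    int16_t minx, maxx;
} visplane_stub_t;

static visplane_stub_t conventional_planes[NUM_CONVENTIONAL_VISPLANES];

/* Stand-in for EMS-backed storage; a real build would map the right EMS
   page on this path before handing out a pointer. */
static visplane_stub_t ems_planes[MAX_VISPLANES_SKETCH - NUM_CONVENTIONAL_VISPLANES];

/* Low indices are a plain array lookup; the rest take the EMS path. */
static visplane_stub_t *get_visplane(int index)
{
    if (index < NUM_CONVENTIONAL_VISPLANES)
        return &conventional_planes[index];
    return &ems_planes[index - NUM_CONVENTIONAL_VISPLANES];
}
```

Simple views stay entirely on the fast path and only busy views spill into EMS, so moving the split point trades conventional memory for average speed.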

 

 

I think once this EMS performance is sort of "maxed out", we'll end up with about twice the framerate as before. Which is a nice start, but not enough. We will still be in 'fast pentium or 486' territory requirements at that point. 
 

I'm kind of just starting to get back into the feel of everything in the codebase after being away for so long, but I should be able to more or less work on this every day for the rest of the year. Hopefully I can get a pretty stable build sometime this month that is compatible with the doom shareware WAD.


I managed to free up a lot of conventional memory using overlays and reducing stack (from 32k to 3k) and I also removed a bunch of debugging code that was also slowing down runtime.

 

Currently, all level data (vertexes, sectors, sidedefs, linedefs, linebuffer, subsectors, nodes, segs) is in conventional memory, as well as texture and sprite info. This adds up to about 150k of the most-used data shoved into that space.

 

Using 86Box, with an ISA Tseng ET4000AX in every machine, here are some demo3 results for various processors.

 

(Fullscreen Low Detail is screenblocks 10 with low detail; Small is screenblocks 5 with high detail. Demo3 is 2134 gametics long, so a realtics score of 2134 corresponds to 35 FPS.)

 

Processor   Fullscreen, Low Detail   Small, High Detail   FPS
            (realtics)               (realtics)           (fullscreen / small)
P-233       1967                     1246                 37.97 / 59.94
P-100       3507                     2325                 21.30 / 32.12
P-75        4365                     3033                 17.11 / 24.62
AMD P90     4336                     2854                 17.22 / 23.92
AMD-120     5451                     3633                 13.7  / 20.55
DX4-75      6350                     4241                 11.76 / 17.6
DX-33       17225                    11968                4.34  / 6.24
386-40      21800                    15302                3.42  / 4.88
286-25      51107                    36074                1.46  / 2.07


I'd say the P-75/AMD P90 numbers correspond to "acceptable speeds" and the P-100 numbers to "good speeds". To get acceptable speeds on a 286-25, we need to be about 12x faster, assuming 86Box is accurate. There might be another big performance jump from freeing up enough conventional memory to put more visplanes there. I'm sure there's another big jump in optimizing the ASM in drawing, EMS code, and math. Potato detail might help some too. It's good to get some real numbers, though.
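For reference, the FPS column follows from the tic rate: demo3 is 2134 gametics long and Doom counts 35 tics per second, so FPS = 35 * 2134 / realtics. The same numbers give the speed ratios directly, e.g. 51107 / 4365 is about 11.7, which is the "about 12x" gap between the 286-25 and the P-75. A small sketch of that arithmetic (function names are illustrative):

```c
/* Demo3 length and Doom's tic rate. */
#define DEMO3_GAMETICS 2134L
#define TICRATE 35L

/* Frames per second for a timedemo run of demo3. */
static double demo3_fps(long realtics)
{
    return (double)(TICRATE * DEMO3_GAMETICS) / (double)realtics;
}

/* How many times faster the fast run is than the slow run. */
static double speed_ratio(long slow_realtics, long fast_realtics)
{
    return (double)slow_realtics / (double)fast_realtics;
}
```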

 

Short term, I may be moving code around so that overlays save more memory. (For example, in w_wad.c, we can pull the WAD initialization code out into a separate file, because it isn't needed during the rest of gameplay.)

 

I haven't committed the code for this build yet, but the exe is somewhat stable and attached below. Make sure you are using it in conjunction with the shareware doom wad, the attached dstrings.txt and use a very minimal dos setup (attached autoexec and config.sys included.) MEM should show 619KB or more available. This build is mainly configured to barely work with demo3. You can probably try other levels but it'll probably not fit in conventional and will go into EMS.

doom16.7z


A few small improvements -

 

Use of overlays and putting initialization code in there such that it gets paged out saved around 14000 bytes of conventional space. It may have caused some bugs in level intermissions and the 32-bit build so I'm going to have to revisit this. I can still save some space by putting a few other things in the overlay like shutdown code.

 

I cached the player mobj in conventional memory. It's hard to measure a speed improvement because it's within the realm of run-to-run noise, but I assume it was a small (1% at best) speed improvement. EMS pagination went down 3 or 4 percent and a few hundred bytes were saved.

 

Thinkers were packed into their own dedicated EMS pages. Since most thinker code interacts with other thinkers, it's important to have them in the same pages rather than interspersed with other allocation types, so that fragmentation doesn't cause extra paging in and out. The new thinker allocations also have less storage overhead than generic EMS allocations, so I was able to save 6000 bytes here too. Performance seemed to get around 2% faster, and EMS pagination dropped 3 or 4 percent.

 

I've freed up 20000 bytes here but honestly am running out of ideas on how to use this extra memory to improve performance further. Most of what's left in EMS are now things that get used very infrequently, once per frame at most. We will eventually want more memory available for level data in bigger levels, or more memory available for texture info, etc in commercial versions of the game, as well as memory for sound down the road, so maybe it's just going to be for those things. 


With FastDoom I learned that the bigger speedups came from optimizing the rendering code (ASM), so I guess that's the first thing to do after fixing all bugs. Also converting 32-bit variables to 16-bit or 8-bit will help a lot (I think @Frenkel is doing this on Doom8088).


Using 16-bit variables instead of 32-bit helps, but I'm not sure about 8-bit though. I've looked at some disassembled code and I've seen that it converts bytes to words.

Using 32-bit variables as indices into arrays sometimes causes the compiler to crash :).

 

I got a nice speed improvement by not bit-shifting 64-bit variables in FixedMul and FixedDiv. Not using FixedMul/FixedDiv, and thus not using 64-bit variables, also helps performance.
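The post doesn't show the actual change, so as an illustration of the general idea: a 16.16 FixedMul can be built from 32-bit partial products instead of one 64-bit multiply and shift. A minimal sketch in portable C; `FixedMul64` and `FixedMulSplit` are hypothetical names, and it assumes `>>` on a negative `int32_t` is an arithmetic shift, as on the usual x86 compilers. On a real 16-bit compiler even the 32-bit multiplies become helper calls, so a tuned version would probably do this in ASM with 16-bit MULs:

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16 fixed point, as in Doom */

/* Reference version: one 64-bit multiply and a 64-bit shift. */
static fixed_t FixedMul64(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * b) >> 16);
}

/* Split each operand into a signed high word and an unsigned low word,
   then add the partial products already shifted into place. Unsigned
   arithmetic gives the same wraparound as truncating to 32 bits. */
static fixed_t FixedMulSplit(fixed_t a, fixed_t b)
{
    int32_t  ah = a >> 16;                 /* signed integer part */
    uint32_t al = (uint32_t)a & 0xFFFF;    /* unsigned fraction part */
    int32_t  bh = b >> 16;
    uint32_t bl = (uint32_t)b & 0xFFFF;

    uint32_t r = (uint32_t)(ah * bh) << 16;    /* hi*hi, shifted up */
    r += (uint32_t)(ah * (int32_t)bl);         /* cross terms need no shift */
    r += (uint32_t)((int32_t)al * bh);
    r += (al * bl) >> 16;                      /* lo*lo: keep the top half */
    return (fixed_t)r;
}
```

Both functions return the low 32 bits of floor(a * b / 65536), so they agree for all inputs, including negative ones.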

 

I'm now working on potato mode in Doom8088. The view window is 240x128 pixels, but every 4 horizontal pixels are the same, so you get a 60x128 graphics mode.
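The 60x128 trick amounts to computing each column once and repeating it four times. In a planar (unchained) VGA mode, four horizontally adjacent pixels sit in four planes at the same address, so with all planes enabled a single byte write fills all four; the sketch below only shows the equivalent result in a linear buffer. `potato_expand_row` is a hypothetical helper:

```c
#include <stdint.h>

#define POTATO_COLS 60
#define POTATO_SCALE 4   /* 60 * 4 = 240 screen columns */

/* Expand one row of 60 rendered pixels into 240 screen pixels. */
static void potato_expand_row(const uint8_t *src, uint8_t *dst)
{
    for (int x = 0; x < POTATO_COLS; x++) {
        uint8_t c = src[x];
        for (int k = 0; k < POTATO_SCALE; k++)
            dst[x * POTATO_SCALE + k] = c;
    }
}
```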

Using flat walls, flat sky, flat floors and ceilings I get 6.5 FPS in 86Box emulating a 286 @ 25 MHz.

 

What's next for Doom8088? Replacing info.c by getters ;)


I've thought about comparing 8- and 16-bit variables but haven't done it yet. I've already converted as much as I could from 32 to 16. In theory, if I were willing to accept some rounding errors that made demos play back incorrectly, I could drop precision further on some items. But I'm trying to keep things 1:1 as much as possible.

 

I have a custom fixed_point union where I can move 8- and 16-bit fields around instead of shifting, but this was done a long time ago, so I don't have any data on its performance. But bit shifting is slow before the 386, so I have to imagine it helps.
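The idea behind such a union: on a little-endian CPU the integer part of a 16.16 value is simply the high 16-bit word, so it can be read or written directly, while on the 8086/286 a multi-bit shift costs more cycles the further it shifts (the 386 added a barrel shifter). A minimal sketch of the general technique; `fixed_point_u` is a hypothetical layout, not necessarily RealDOOM's:

```c
#include <stdint.h>

/* 16.16 value viewed either whole or as two words (little-endian x86). */
typedef union {
    int32_t w;                /* whole 16.16 value */
    struct {
        uint16_t fracbits;    /* low word: fractional part */
        int16_t  intbits;     /* high word: integer part */
    } h;
} fixed_point_u;

static int16_t fixed_int_part(fixed_point_u f)
{
    return f.h.intbits;       /* same value as (int16_t)(f.w >> 16) */
}
```

Writing `f.h.intbits = 5` likewise sets the integer part without a shift or mask.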

 

Yeah, I finished replacing info.c with getters a couple of days ago. There are also a number of structs I pushed into overlaid function getters so they get paged out after initialization. I've saved another 10k or so since the last post, and tried a variety of ways to use that memory to cache different things: most-used textures, flats, patches, etc. It really only adds up to around a 1% speed improvement for using up 32k of memory, so I think I've more or less hit the limit on reasonable speed improvements from caching. (Pulling from EMS doesn't take that long; it's mostly just the overhead involved in page management. We've already reduced pagination by over 95% from when it was uncached, so there's not much further to take it.)

 

For sure, ASM improvements to drawing and math functions will help a lot. And maybe just a lot of work on the math functions in general. At some point I will have to go thru an ASM phase. Maybe different key functions can also be compiled in different faster compilers and reintroduced into the codebase down the road.

 

Lowering quality (non textured flats or potato mode) might be the next step to work on, but I already know that there won't be enough performance improvement from just these things to get us to playable framerates on the fastest 16 bit cpus. Maybe that's just how things will be though.

 

It sure would be nice to have a proper profiler.

1 hour ago, Frenkel said:

Using 16-bit variables instead of 32-bit helps, but I'm not sure about 8-bit though. I've looked at some disassembled code and I've seen that it converts bytes to words.

 

Yep compilers aren't very smart. Under some conditions and ASM coding it's possible to process two 8-bit registers at the same time with a single instruction (like SIMD). I use this trick extensively to convert 256-color backbuffered modes to other video modes in FastDoom. Also having 8-bit variables is better for the 8088 as the data bus is 8-bit wide.
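As a generic illustration of that SIMD-within-a-register idea (not FastDoom's actual conversion code), two 8-bit lanes packed in a 16-bit word can be added in one pass while keeping the low byte's carry from spilling into the high byte. `add_bytes_x2` is a hypothetical helper:

```c
#include <stdint.h>

/* Add two pairs of bytes packed in 16-bit words, lane by lane, with
   wraparound inside each byte. */
static uint16_t add_bytes_x2(uint16_t a, uint16_t b)
{
    /* Add the low 7 bits of each lane; these sums can't cross lanes. */
    uint16_t low7 = (uint16_t)((a & 0x7F7F) + (b & 0x7F7F));
    /* Fix up each lane's top bit with XOR (add without carry-out). */
    return (uint16_t)(low7 ^ ((a ^ b) & 0x8080));
}
```

For example, 0x01FF + 0x0101 gives 0x0200: the low lane wraps from 0xFF to 0x00 without carrying into the high lane.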

 

1 hour ago, sqpat said:

Lowering quality (non textured flats or potato mode) might be the next step to work on, but I already know that there won't be enough performance improvement from just these things to get us to playable framerates on the fastest 16 bit cpus. Maybe that's just how things will be though. 


Potato mode is easy to implement, and it greatly reduces the number of OUT instructions issued per frame. If graphic fidelity is not an issue, it's possible to modify the visplane rendering to use flat colors without color diminishing; it's much faster, as it doesn't require a conversion from columns to rows (I took this idea from @Optimus's OptiDoom). The main problem after optimizing the graphics routines is optimizing the game logic while maintaining demo compatibility at the same time; there are lots of optimizations that break it quite easily.


My focus these past couple days has been on taking as much as possible that was in EMS, and pulling it back into conventional memory - but also removing the EMS backup functionality, which means I don't need to go thru an accessor and I can just directly access pointers again. I'm guessing we can probably fit everything in memory after all as I've freed up quite a bit. (Though once we're working on say, DOOM 2 and not shareware DOOM it might be tough.)  It's looking like a solid (3-5% ?) runtime improvement but it'll take a few days to work out some bugs. Funnily enough, code is mostly being reverted to how it used to be with pointers and not MEMREFs being passed around everywhere.

 

Alongside this I'm refactoring mobj_t to be a lot smaller. Some fields, like the nightmare respawn data, are large and very rarely used and can stay in EMS; things like the player pointer are wasteful, since we can just compare mt_type or something instead. But if I want to fit thinkers in conventional memory after all, there's a lot of code I can rewrite to be more compact again too. I used to have mobj_t at 97 bytes; I'd like to get it into the 60s. Multiplied by many hundreds or even a thousand MAX_THINKERS, that will add up. I haven't pulled upper memory blocks into play yet, but I'm guessing 64k from there will be enough. I don't even want to think about where sound code and data will fit.

 

Once this is all done, hopefully in a few days, I'll probably start on some baseline benchmarks again and then work on ASM math function improvements. Since the earlier benchmarks (Nov 4), something like 25-30k of memory has been freed up, and things are around 4-5% faster. There are definitely bigger and easier speed gains to be had with ASM down the road, but I want to get the core memory management code and patterns as tight as possible first.


Just a bit of an update. I've worked on various little improvements this past week, like combining fields and removing certain cached data fields in lines, sectors, etc. that I deemed not to give enough speed benefit to be worth the memory. The biggest improvement ultimately was putting mobj_t, ceil_t, floor_t, etc. in conventional memory and then combining them with the thinker allocation list, which removed a lot of the cross-referencing between the thinker list and the actual data (thinkers no longer need a pointer, and mobjs etc. no longer need a thinker pointer). It was a somewhat noticeable speed jump: I got my first P133 demo3 realtics score in the high 1600s with screenblocks 5 and full quality, after starting at around 1850 a few weeks back.
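One way to picture "combining the data with the thinker allocation list": each slot holds both the list links and the object data, and the links are 16-bit indices, so neither side needs a pointer to the other. A minimal sketch under those assumptions; the slot layout, `MAX_THINKERS_SKETCH` and all function names are hypothetical, and the data structs are trimmed stand-ins:

```c
#include <stdint.h>

#define MAX_THINKERS_SKETCH 64
#define NO_THINKER 0xFFFF

typedef enum { TH_FREE, TH_MOBJ, TH_FLOOR } thinker_kind;

/* Links and object data share one slot; links are slot indices. */
typedef struct {
    uint16_t prev, next;
    uint8_t  kind;
    union {
        struct { int16_t x, y, health; } mobj;
        struct { int16_t sector, speed; } floor;
    } data;
} thinker_slot;

static thinker_slot slots[MAX_THINKERS_SKETCH];
static uint16_t active_head = NO_THINKER;
static uint16_t free_head;

static void thinkers_init(void)
{
    for (uint16_t i = 0; i < MAX_THINKERS_SKETCH; i++) {
        slots[i].kind = TH_FREE;
        slots[i].next = (uint16_t)(i + 1 < MAX_THINKERS_SKETCH ? i + 1 : NO_THINKER);
    }
    free_head = 0;
    active_head = NO_THINKER;
}

/* Pop a slot off the free list and push it onto the active list. */
static uint16_t thinker_add(thinker_kind kind)
{
    uint16_t i = free_head;
    if (i == NO_THINKER)
        return NO_THINKER;
    free_head = slots[i].next;
    slots[i].kind = (uint8_t)kind;
    slots[i].prev = NO_THINKER;
    slots[i].next = active_head;
    if (active_head != NO_THINKER)
        slots[active_head].prev = i;
    active_head = i;
    return i;
}

/* Unlink from the active list and return the slot to the free list. */
static void thinker_remove(uint16_t i)
{
    if (slots[i].prev != NO_THINKER)
        slots[slots[i].prev].next = slots[i].next;
    else
        active_head = slots[i].next;
    if (slots[i].next != NO_THINKER)
        slots[slots[i].next].prev = slots[i].prev;
    slots[i].kind = TH_FREE;
    slots[i].next = free_head;
    free_head = i;
}
```

Fixed-size slots also mean no per-allocation header and no fragmentation; the trade-off is that every slot is as large as the biggest thinker type.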

 

The memory bugs are starting to pop up again, so I need to go on a bugfixing spree again for now. I'm also going to update the readme on the github soon with somewhat of a roadmap, but I want to work towards a 0.1 release by the end of the year that is 'mostly' stable for shareware doom. That 0.1 release will probably not have any ASM or significant math code improvements yet - I'd really just prefer to do 'memory stuff' before I start working on 'asm stuff' because I don't want to rewrite ASM later because I changed my structs later or something.

 

Crazy Long Term Idea - RealDOOM is the OS?

 

I've looked into EMS 4.0 functions a bit earlier, and I'm starting to look into them more now. Basically, EMS 4.0 mostly is an update to the EMS spec to support multitasking. The spec specifies all kinds of possibilities but in practice the main thing the hardware tends to support is mapping of the 256-640k region of main memory. During in-game gameplay of doom, there are two main phases of the runtime - the 'physics' portion (running thinkers, for the most part) and then the render portion. There's also a lot of variables that are only used for one of those - visplanes are never used during physics, thinker data  (mobj etc) is never used during rendering. If you want to go real deep, there are certain fields inside of sectors, lines, etc that are only used in one or another. Ideally, this data can be split into two blocks that map to the same region, and you can just swap from physics to render data as you switch between the two at runtime akin to how a multitasker is switching between two separate programs. It's a little complicated because the EMS mapping is done in multiples of 16kb blocks, aligned with 16kb memory regions and at runtime you have to get these memory blocks lined up dynamically, which feels like you're fighting with the OS to get exact physical memory regions.

 

But thinking about it even more, there's a lot more potential here. Not only is there a lot of data that is only used during one phase or another - there is a lot of code that is only used during one phase or another. Multitaskers using EMS 4.0 are dynamically switching tasks in that EMS region which also include code, obviously. If a bunch of render code and physics code could also live in the same mappable memory region and similarly be swapped out with a near instant EMS call, even more memory becomes available. Now maybe your map's flats and the most-used textures will now fit in conventional memory during the render phase. Maybe when music or sfx interrupts are called, they do an EMS swap to load their data in and swap back before the interrupt ends, and the sound and music is also mapping to this region.

 

It's not exactly simple for a real-mode application to tell DOS (whether at link time, compile time, or runtime) "load this function's code at 0x40000". There's probably some way to do it if you wrote startup code that basically does run-time linking to put things where they need to be. This is approaching OS functionality, though. And I think RealDOOM in some idealized final form is more or less becoming the operating system, in one way or another, so that it has complete control of the address space and of where things are loaded, and it can use EMS multitasking functions to dynamically, 'instantly' and seamlessly swap big memory regions in and out at once. This isn't any sort of near-term goal, and I'm guessing the project will probably never get that far.

 

So as a much more reasonable halfway step - I'm wondering if I can just manage to allocate a 64kb block at the end of conventional memory (576k-640k) at run time with some mallocs/reallocs/etc. Then I could make that 64kb region work as the swappable data region between the two phases, which should already go a long way to free up even more memory.
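The alignment arithmetic behind this is simple once expressed in real-mode segments: a segment counts 16-byte paragraphs, so a 16 KB EMS page is 0x400 paragraphs, 256 KB is segment 0x4000, 576 KB is 0x9000 and 640 KB is 0xA000. That makes 256-640 KB 24 pages and the 576-640 KB window 4 pages. A minimal sketch; all names are hypothetical, and the (logical page, segment) table only loosely mirrors the pairs that the EMS 4.0 multi-page mapping call takes, so check the LIM EMS 4.0 specification for the exact calling convention:

```c
#include <stdint.h>

#define EMS_PAGE_BYTES 16384UL
#define EMS_PAGE_PARAS (EMS_PAGE_BYTES / 16)   /* 0x400 paragraphs */

/* Round a segment up to the next 16 KB boundary (no overflow check). */
static uint16_t align_segment_up(uint16_t seg)
{
    return (uint16_t)((seg + EMS_PAGE_PARAS - 1) & ~(EMS_PAGE_PARAS - 1));
}

/* Whole 16 KB pages between two segments (end exclusive). */
static unsigned pages_between(uint16_t start_seg, uint16_t end_seg)
{
    uint16_t first = align_segment_up(start_seg);
    if (first >= end_seg)
        return 0;
    return (unsigned)((end_seg - first) / EMS_PAGE_PARAS);
}

/* One entry per 16 KB window: which logical page to show there. */
typedef struct {
    uint16_t logical_page;
    uint16_t segment;
} page_map_entry;

/* Build a per-phase table (e.g. physics vs render) over a window. */
static void build_phase_map(page_map_entry *out, uint16_t first_logical,
                            uint16_t start_seg, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        out[i].logical_page = (uint16_t)(first_logical + i);
        out[i].segment = (uint16_t)(start_seg + i * EMS_PAGE_PARAS);
    }
}
```

Switching phases would then mean handing the driver the other table for the same four windows.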

 

Before that, I'll probably just pull 64 KB from upper memory and enjoy that "free" memory region first.


Another idea to reduce memory usage: compress the IWADs better. @fraggle has updated wadptr and fixed old issues; this reduces the amount of memory used for certain graphics, and especially the size of the sidedefs.
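The sidedef saving comes from packing: identical sidedefs are stored once and every linedef that referenced a duplicate is pointed at the surviving copy. A minimal sketch of that general technique (not wadptr's actual code); the struct mirrors Doom's 30-byte on-disk sidedef, and `pack_sidedefs` and `MAX_SIDES_SKETCH` are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Doom's on-disk sidedef: 2 offsets, 3 texture names, sector (30 bytes). */
typedef struct {
    int16_t xoff, yoff;
    char    upper[8], lower[8], middle[8];
    int16_t sector;
} sidedef_t;

#define MAX_SIDES_SKETCH 4096   /* hypothetical cap for the sketch */

/* Keep the first copy of each identical sidedef and rewrite the linedef
   side references (0xFFFF = no sidedef). Returns the new sidedef count.
   O(n^2) for clarity; a real tool would hash. */
static size_t pack_sidedefs(sidedef_t *sides, size_t count,
                            uint16_t *line_side_refs, size_t ref_count)
{
    uint16_t remap[MAX_SIDES_SKETCH];
    size_t kept = 0;
    if (count > MAX_SIDES_SKETCH)
        return count;                   /* too big for the sketch: no-op */
    for (size_t i = 0; i < count; i++) {
        size_t j;
        for (j = 0; j < kept; j++)
            if (memcmp(&sides[j], &sides[i], sizeof(sidedef_t)) == 0)
                break;
        if (j == kept)
            sides[kept++] = sides[i];   /* first occurrence: keep it */
        remap[i] = (uint16_t)j;
    }
    for (size_t r = 0; r < ref_count; r++)
        if (line_side_refs[r] != 0xFFFF)
            line_side_refs[r] = remap[line_side_refs[r]];
    return kept;
}
```

Comparing whole structs with `memcmp` works here only because the layout has no padding and unused name bytes are zero-filled.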
