sqpat

Members
  • Content count

    25
About sqpat

  • Rank
    Warming Up

  1. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    The 0.11 release needs UMBs - try the 0.10 release which didn't use them (basically the only difference between the two). You will probably have to make sure your config has as much free memory as absolutely possible - it's definitely easier if you have UMBs and DOSMAX or something like that available.
  2. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    The github readme has been updated with a basic roadmap (and more timedemo scores). Release 0.11 will come soon - it's not really too different from 0.10; I've fixed a few render bugs and also added UMB support to free enough memory to make the last remaining shareware level (e1m6) playable. Upcoming releases will all have to do with varying levels of EMS 4.0 and multitasking support, so this will be the last version playable with EMS 3.2. This isn't a problem for late 286 machines whose chipsets should support these features or machines running EMM386, but earlier 286es without advanced ISA memory cards, or XT machines dependent on lo-tech EMS cards and other simple EMS boards, won't be able to run later versions. Maybe a repro card will come around that makes this easier to do, oh well.
  3. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    Some updates - I got UMBs working, and now pull around 70k from there. I then went ahead and allocated a bunch more memory to level data, allowing e1m6 to be playable. The conventional memory usage for the build went from 615818 -> 543738 -> 569802, which is still pretty comfortably low now. I might fix a couple bugs and make a quick 0.11 release before going heavy on EMS 4.0 multitasking prep work. EMS 4.0 hardware is somewhat more difficult to come by, especially for a board that will be compatible with XT class systems. The lo-tech EMS boards won't work, and a lot of other ones are 16 bit only I think. Later, faster 286es have chipset support and aren't a problem. I'm not sure how emm386 will work out yet. Meanwhile, I benched the 0.1 release on a few pieces of hardware over the past day. Below is a full quality timedemo 3 playback on a 4.77 MHz 8088. It's a little under a million realtics, about a 7 hour video and 0.0888 fps :) A V20 at 9.5 MHz finished my usual bench (screenblocks 5, hi detail) at 162157 realtics, which is about 1/5th of the speed of a fast 286. I have a 16 MHz turbo V20 board I can try it on too at some point.
  4. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    You can definitely get faster fps on 86box, by a factor of 2 or 3 or so, but yeah, dosbox should at least be playable and is generally more convenient.
    Oh, yeah - sorry, I thought that was clear - only shareware doom is supported right now. Doom2 generally has larger levels and different content and will need more work. I haven't tested doom2 content at all and it may be buggy - especially viles and the last boss and stuff. Once I've freed up a lot more memory via some upcoming improvements, it'll make more sense to get things like doom 2 or sound working.
    Oh yeah, the finale... I've not tested it at all! I'm sure it's probably broken, haha. Maybe it is because of the wipe. Wipes were removed because they basically need 128KB of free space to run (two VGA screen buffers), which is a huge chunk of your 640 KB. Once EMS multitasking is in there I can probably make it happen. That reminds me - level restarts after game over also don't work, perhaps because of the screen wipe. You have to go to the menu and select a new game in that case.
    A wadptr-compressed WAD will crash without extra work on the codebase, because of some of my own optimizations. In order to save space on indexing the wad, I calculate and store the sizes in a certain way, and when wad entries overlap it creates 'negative size' entries, which leads to trouble. I think what's likely to happen down the road is that rather than using wadptr, I will basically generate the wadptr-style compressed wad at runtime by finding duplicate entries and filtering them out. Thanks for your testing and input!
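To make the 'negative size' problem concrete, here is a minimal sketch of one way a space-saving index could produce it. The post does not say how RealDOOM actually stores sizes, so the scheme below (keep only offsets, derive each size from the next entry's offset) and the names `derived_lump_size` and `first_bad_lump` are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact index: only file offsets are kept, in directory
   order, and a lump's size is derived as "next offset minus this offset". */
static int32_t derived_lump_size(const uint32_t *offsets, size_t count,
                                 size_t i, uint32_t end_of_data)
{
    uint32_t next = (i + 1 < count) ? offsets[i + 1] : end_of_data;
    /* Subtract in 64 bits so a backwards jump shows up as a negative size. */
    return (int32_t)((int64_t)next - (int64_t)offsets[i]);
}

/* Index of the first lump whose derived size is negative, or -1 if none. */
static long first_bad_lump(const uint32_t *offsets, size_t count,
                           uint32_t end_of_data)
{
    for (size_t i = 0; i < count; i++) {
        if (derived_lump_size(offsets, count, i, end_of_data) < 0)
            return (long)i;
    }
    return -1;
}
```

With offsets that only ever increase, every derived size is positive. Once a tool lets a later directory entry point back at earlier data, the offsets stop being monotonic and the entry before the backwards jump gets a negative size, so a loader built on this assumption would need to detect or rebuild such entries.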
  5. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    0.1 Release
    OK, the game has been working pretty cleanly in DOSBOX again, which for all its faults with compatibility at least makes development cycles so much faster since I don't have to mount images and stuff. Thanks to that I got tons of work done in the past few days - I cleaned up a dozen or so bugs and also converted lots and lots of physics and render code to 16 bit logic (then fixed the bugs that caused). Tomorrow I will figure out how github releases work and update the project page there, but attached here is the exe I assume will go out as RealDOOM 0.1. Basically this should work with all maps in shareware except E1M6. I don't *think* it will crash anymore - I haven't seen funny memory bugs recently, I think they are all fixed... There are some features not in there yet, like sound of course, savegames, and screen wiping. There are some render bugs like overdrawing in the intermission screen, a minor fuzz draw bug in timedemo3, and also some sprite masking bugs.
    You need about 605-610k for this one, a little less than the previous one, but if you use a standard nearly blank MS-DOS 6.22 config with EMM386 loaded (sample config.sys in the zip file) it should work. The zip also has the dstrings.txt file, which is necessary. Don't forget to include DOOM1.WAD.
    The 286-25 bench (screenblocks 5 hi quality Demo3) ran at 31821 realtics, compared to 36074 a month ago. That corresponds to about 12% faster, and the fps is around 2.37 now. This has pretty much just been from rewriting algorithms and refactoring code - there's still no ASM or anything. The pentium times are around 10% faster over the same period... it's hard to say for sure but I think some of the improvements aimed at the 286 specifically (like reducing shifts) have been beneficial.
    EDIT: I've run some extra benches, and it seems like on real hardware (vs 86box) I get 15% faster times on my 286 and 4-5% faster times on my 386 DX-40. My MMX meanwhile ran about 30% faster. Not really sure what the reason is, maybe memory access times or something, but it's a good sign anyway. I may try to get it to run on the turbo XT tomorrow for funsies... I wonder if it will beat 100,000 realtics...
    realdoom_0.1.7z
  6. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    I've managed to fix a half dozen or so known bugs (corrupted palettes, timedemo desyncs, readthis rendering, level intermission crashes) and all the demos are running correctly again with all the improvements from the past few weeks. There is a weird random bug where the level begins to render as a mess - a lot of the walls suddenly render at the wrong angles, among other garbage. I've managed to dump all the level data (sectors, lines, segs, etc) from memory to file after this has happened, but all the data checked out as good. Automap also renders fine when this happens. I don't think this is a visplane bug either. I will probably check a few other things, like whether the trigonometry lookup tables are getting corrupted or something. I think once this last bug is fixed, I will cut the 0.1 release based off of that. Basically, all the shareware content plays fine except for E1M6, which is so much bigger than all the other shareware levels - I would need to clear up another 30kb or so to make it fit. That'll be easy once I have UMB allocation working, which should be another easy 64k (at least). I will probably update with another minor release after that, then begin on EMS 4.0 work, which feels like it will free up at least another 100k. Stability is as good as it's ever been; outside of the random render bug popping up every couple minutes and other known issues (savegames, sound, nightmare respawns) I don't see too much wrong anymore.
  7. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    Well, I spent a week cleaning up bugs but also trying out some optimizations. There are some notes in the comments of the codebase about precalculating lineopenings and I gave it a try. You can precalculate it in level setup but have to treat it as a cache and mark things dirty whenever platforms move and sector floors/ceilings change, basically. It wastes 6-7 KB and unfortunately it's hard to measure the improvement. I tried averaging ten runs' worth of timedemos and it's a tiny bit faster, maybe 0.25% on average or so? (Is there a better way?) I tried 8 bit validcounts for linedefs to save some space, but it seems that's not enough uniqueness to avoid overlaps/collisions of values, and bugs happen like a thousand tics into demos. Removing the back references in mobj_t (for sector lists and block lists) is not hard to do since they aren't used all that much. A few KB can be saved by rewriting a bit of code and removing the sPrev and bPrev fields. I think the result is ultimately a tiny bit measurably slower, so I may revert this later. I'm realizing that if I can get the EMS 4.0 improvements I mentioned earlier to work, then I will have way more memory than I need during the physics portion of the code, so all these tricks to reduce the sizes of certain fields won't really matter - what will matter is only speed during that phase, and memory usage becomes more important during rendering due to the amount of memory used by textures. I'm going to clean up a couple remaining broken features and known bugs, make a simple release build that can hopefully run all the shareware levels, then after that focus on allocating UMB space and EMS 4.0.
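The 8 bit validcount problem can be reproduced in isolation. The sketch below is not RealDOOM code; `line_stamp`, `validcount8` and the helper names are made up for the example. It only shows why a counter that wraps after 256 traversals can make a line look "already checked" when its old stamp happens to match again.

```c
#include <stdint.h>

#define NUM_LINES 4

static uint8_t line_stamp[NUM_LINES]; /* per-line "last checked in" stamp */
static uint8_t validcount8;           /* wraps back around after 256 steps */

/* Start a new traversal by bumping the counter. */
static void begin_check(void)
{
    validcount8++;
}

/* Returns 1 if the line still needs checking in this traversal (and marks
   it), 0 if its stamp says it was already handled. */
static int visit_line(int line)
{
    if (line_stamp[line] == validcount8)
        return 0;
    line_stamp[line] = validcount8;
    return 1;
}

/* Visit a line, let `gap` further traversals pass without touching it,
   then try to visit it again and report what visit_line() says. */
static int revisit_after(int line, int gap)
{
    begin_check();
    visit_line(line);
    for (int i = 0; i < gap; i++)
        begin_check();
    return visit_line(line);
}
```

A line left untouched for exactly 256 traversals is wrongly skipped on the next visit, while any smaller gap behaves correctly. That is rare enough to stay hidden for a long time, which fits the bugs only showing up well into a demo.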
  8. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    Hmm, so my original intent was to make it compatible with the original commercial WADs. However, there's no reason I can't add some initialization code (in an overlay, which won't take up in-game runtime memory) that detects an uncompressed commercial IWAD at startup and runs the wadptr code on it to compress it... so for now, I will probably just use the compressed WADs and make a TODO to port wadptr into some 16 bit code that can optionally run at startup. Thanks for the suggestion. Some doom 2 sidedef lumps were over 64k and I didn't really want to write a kludge to deal with that.
  9. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    Just a bit of an update. I've worked on various little improvements the past week, like combining fields and removing certain cached data fields in lines, sectors, etc that I deemed not enough of a speed benefit to be worth the memory. The biggest improvement ultimately was putting mobj_t, ceil_t, floor_t, etc in conventional memory and then combining them with the thinker allocation list, so I reduced all the cross referencing between the thinker list and the actual data (thinkers no longer need a pointer, mobj etc no longer need a thinker pointer). It was a somewhat noticeable speed jump: I got my first P133 demo3 realtics score in the high 1600s with screenblocks 5 and full quality, after starting at 1850 a few weeks back. The memory bugs are starting to pop up again, so I need to go on a bugfixing spree again for now. I'm also going to update the readme on the github soon with somewhat of a roadmap, but I want to work towards a 0.1 release by the end of the year that is 'mostly' stable for shareware doom. That 0.1 release will probably not have any ASM or significant math code improvements yet - I'd really just prefer to do 'memory stuff' before I start working on 'asm stuff', because I don't want to rewrite ASM later because I changed my structs or something.
    Crazy Long Term Idea - RealDOOM is the OS?
    I've looked into EMS 4.0 functions a bit earlier, and I'm starting to look into them more now. Basically, EMS 4.0 is mostly an update to the EMS spec to support multitasking. The spec specifies all kinds of possibilities, but in practice the main thing the hardware tends to support is mapping of the 256-640k region of main memory. During in-game gameplay of doom, there are two main phases of the runtime - the 'physics' portion (running thinkers, for the most part) and then the render portion. There are also a lot of variables that are only used for one of those - visplanes are never used during physics, thinker data (mobj etc) is never used during rendering. If you want to go real deep, there are certain fields inside of sectors, lines, etc that are only used in one or the other. Ideally, this data can be split into two blocks that map to the same region, and you can just swap from physics to render data as you switch between the two at runtime, akin to how a multitasker switches between two separate programs. It's a little complicated because the EMS mapping is done in multiples of 16kb blocks, aligned with 16kb memory regions, and at runtime you have to get these memory blocks lined up dynamically, which feels like you're fighting with the OS to get exact physical memory regions.
    But thinking about it even more, there's a lot more potential here. Not only is there a lot of data that is only used during one phase or another - there is a lot of code that is only used during one phase or another. Multitaskers using EMS 4.0 are dynamically switching tasks in that EMS region, which also includes code, obviously. If a bunch of render code and physics code could also live in the same mappable memory region and similarly be swapped out with a near instant EMS call, even more memory becomes available. Now maybe your map's flats and the most-used textures will fit in conventional memory during the render phase. Maybe when music or sfx interrupts are called, they do an EMS swap to load their data in and swap back before the interrupt ends, and the sound and music also map to this region. It's not exactly simple for a real mode application to say to DOS (whether it's link time, compile time, or runtime) "load this function's code at 0x40000". There's probably some way to do this if you wrote some startup code that basically does run-time linking at startup to put things where they need to be. This is approaching OS functionality, though. And I think RealDOOM in some idealized final form is more or less becoming the operating system in one way or another, so it has complete control of the address space and where things are loaded, and it can use EMS multitasking functions to dynamically, 'instantly' and seamlessly load big memory regions in and out at once.
    This isn't any sort of near term goal and the project will probably never get that far, I'm guessing. So as a much more reasonable halfway step - I'm wondering if I can just manage to allocate a 64kb block at the end of conventional memory (576k-640k) at run time with some mallocs/reallocs/etc. Then I could make that 64kb region work as the swappable data region between the two phases, which should already go a long way to free up even more memory. Before that, I'll probably just pull 64 KB from upper memory and enjoy that "free" memory region first.
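The 16 KB alignment constraint is easy to put into numbers. The helpers below are a small sketch with made-up names (`segment_to_linear`, `is_page_aligned`, `mappable_pages`); they only do the address arithmetic and do not call any EMS function.

```c
#include <stdint.h>

#define EMS_PAGE_SIZE 16384UL /* EMS maps memory in 16 KB pages */

/* Linear address of a real-mode segment: segment * 16. */
static uint32_t segment_to_linear(uint16_t segment)
{
    return (uint32_t)segment << 4;
}

/* 1 if the address sits exactly on a 16 KB page boundary. */
static int is_page_aligned(uint32_t linear)
{
    return (linear % EMS_PAGE_SIZE) == 0;
}

/* Number of whole 16 KB pages inside [start, end), rounding start up to
   the next page boundary and end down to the previous one. */
static uint32_t mappable_pages(uint32_t start, uint32_t end)
{
    uint32_t first = (uint32_t)((start + EMS_PAGE_SIZE - 1) / EMS_PAGE_SIZE);
    uint32_t last = (uint32_t)(end / EMS_PAGE_SIZE);
    return (last > first) ? last - first : 0;
}
```

The 256-640k region holds 24 pages and the 576k-640k block holds 4. A block that starts even one paragraph past a page boundary (segment 0x9001 instead of 0x9000) loses a whole page, which is why a region obtained from an ordinary allocation has to be lined up carefully before it can serve as a swappable window.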
  10. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    My focus these past couple days has been on taking as much as possible that was in EMS and pulling it back into conventional memory - but also removing the EMS backup functionality, which means I don't need to go thru an accessor and I can just directly access pointers again. I'm guessing we can probably fit everything in memory after all, as I've freed up quite a bit. (Though once we're working on, say, DOOM 2 and not shareware DOOM it might be tough.) It's looking like a solid (3-5%?) runtime improvement but it'll take a few days to work out some bugs. Funnily enough, code is mostly being reverted to how it used to be, with pointers and not MEMREFs being passed around everywhere. Alongside this I'm refactoring mobj_t to be a lot smaller - some fields like nightmare respawn data are huge and very rarely used and can stay in EMS; stuff like the player pointer is dumb and wasteful, we can just compare mt_type or something instead. But if I want to fit thinkers in conventional memory after all, then there's a lot of code I can rewrite to be more compact again too. I used to have mobj_t at 97 bytes, I'd like to get it into the 60s. Multiplied by many hundreds or even a thousand MAX_THINKERS, that adds up. I haven't pulled upper memory blocks into play yet but I'm guessing 64k from there will be enough. I don't even want to think of where sound code and data will fit. Once this is all done, hopefully in a few days, I'll probably start on some baseline benchmarks again and then work on ASM math function improvements. Since the earlier benchmarks (nov 4) something like 25-30k of memory has been freed up, and things are around 4-5% faster. There are definitely easier, bigger speed gains to be had with ASM down the road, but I want to get the core memory management code and patterns as tight as possible first.
  11. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    I've thought about comparing 8 and 16 bit variables but haven't done it yet. I've already converted as much as I could from 32 to 16. In theory, if I were willing to have some rounding errors that made demos play back incorrectly, I could drop precision further on some items. But I'm trying to keep things 1:1 as much as possible. I have a custom fixed_point union where I can move 8-16 bit fields around instead of shifting, but this was done a long time ago so I don't have any data on performance. But bit shifting is slow prior to the 386, so I have to imagine it helps. Yeah, I finished replacing info.c with getters a couple days ago. There are also a number of structs I pushed into overlaid function getters so they get paged out after initialization. I've saved another 10k or so since the last post, and tried a variety of things to use the memory to cache different things - most used textures, flats, patches, etc. It really only adds up to around 1% speed improvement for using up 32k of memory, so I think I've more or less hit the limit on reasonable speed improvements with caching. (Pulling from EMS doesn't take that long; mostly it's just the overhead involved in page management. We've already reduced pagination over 95% from when it was uncached, so there's not much further to take it.) For sure, ASM improvements to drawing and math functions will help a lot. And maybe just a lot of work on the math functions in general. At some point I will have to go thru an ASM phase. Maybe different key functions can also be compiled in different, faster compilers and reintroduced into the codebase down the road. Lowering quality (non textured flats or potato mode) might be the next step to work on, but I already know that there won't be enough performance improvement from just these things to get us to playable framerates on the fastest 16 bit cpus. Maybe that's just how things will be though. It sure would be nice to have a proper profiler.
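The union trick can be sketched as follows. This is an illustration under assumptions, not the project's actual `fixed_point` type: the name `fixed_u`, its field names and the helper functions are invented here, and the field order assumes a little-endian CPU such as x86.

```c
#include <stdint.h>

/* A 16.16 fixed-point value viewed either as one 32-bit word or as two
   16-bit halves. Field order assumes little-endian byte order. */
typedef union {
    int32_t w;
    struct {
        uint16_t frac;  /* low 16 bits: fractional part */
        int16_t whole;  /* high 16 bits: integer part */
    } h;
} fixed_u;

/* 1 when the host stores the low byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}

/* Integer part the usual way, with a 16-bit shift. */
static int16_t int_part_shift(int32_t v)
{
    return (int16_t)(v >> 16);
}

/* Integer part by reading the high half directly, with no shift. */
static int16_t int_part_union(int32_t v)
{
    fixed_u u;
    u.w = v;
    return u.h.whole;
}

/* Build a fixed-point value from an integer without shifting. */
static int32_t from_int_union(int16_t i)
{
    fixed_u u;
    u.h.frac = 0;
    u.h.whole = i;
    return u.w;
}
```

On a 16 bit CPU a 32 bit value already lives in two 16 bit registers or memory words, so reading the high half is a plain 16 bit move, whereas a shift by 16 on a pre-386 chip is typically done one bit at a time or through a helper routine.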
  12. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    A few small improvements - Use of overlays, and putting initialization code in there such that it gets paged out, saved around 14000 bytes of conventional space. It may have caused some bugs in level intermissions and the 32-bit build, so I'm going to have to revisit this. I can still save some space by putting a few other things in the overlay, like shutdown code. I cached the player mobj in conventional memory. It's hard to measure a speed improvement because it's within the realm of run-to-run noise, but I assume it was a small (1% at best) speed improvement. EMS pagination went down 3 or 4 percent and a few hundred bytes were saved. Thinkers were packed into their own dedicated EMS pages. Since most thinker code interacts with other thinkers, it's important to have them in the same pages rather than interspersed with other allocation types; that reduces the paging in and out that happens when they are fragmented across pages. The new thinker allocations do not have as much storage overhead as generic EMS allocations, so I was able to save 6000 bytes here too. Performance seemed to get around 2% faster, and EMS pagination dropped 3 or 4 percent. I've freed up 20000 bytes here but honestly am running out of ideas on how to use this extra memory to improve performance further. Most of what's left in EMS is now things that get used very infrequently, once per frame at most. We will eventually want more memory available for level data in bigger levels, or more memory available for texture info, etc in commercial versions of the game, as well as memory for sound down the road, so maybe it's just going to be for those things.
  13. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    I managed to free up a lot of conventional memory using overlays and reducing stack (from 32k to 3k), and I also removed a bunch of debugging code that was slowing down runtime. Currently, all level data (vertexes, sectors, side defs, line defs, linebuffer, subsectors, nodes, segs) is in conventional memory, as well as texture and sprite info. This adds up to about 150k of the most-used data shoved into that space. Using 86box, every machine using an ISA Tseng ET4000ax, here are some demo3 results for various processors. (Fullscreen Low Detail is screenblocks 10, Small is screenblocks 5. Demo3 is 2134 gametics, so 2134 realtics corresponds to 35 FPS.)

    Processor   Fullscr Low Detail   Small Hi Detail   FPS
                (realtics)           (realtics)        (Fullscr / Small)
    P-233        1967                 1246             37.97 / 59.94
    P-100        3507                 2325             21.30 / 32.12
    P-75         4365                 3033             17.11 / 24.62
    AMD P90      4336                 2854             17.22 / 23.92
    AMD-120      5451                 3633             13.7 / 20.55
    DX4-75       6350                 4241             11.76 / 17.6
    DX-33       17225                11968             4.34 / 6.24
    386-40      21800                15302             3.42 / 4.88
    286-25      51107                36074             1.46 / 2.07

    I'd say the P-75/AMD P90 number corresponds with "acceptable speeds" and the P-100 number corresponds with "good speeds". For the 286-25 to reach those speeds, we're about 12x away, assuming 86box accuracy. There might be another big performance jump in freeing up enough conventional memory to put more visplanes in conventional memory. I'm sure there's another big jump in optimizing ASM in drawing, EMS code, and math. Potato detail might help some too. It's good to get some real numbers though. Short term, I may be moving code around to make overlays reduce memory more. (For example, in w_wad.c, we can pull out the wad initialization code into a different file because it's not needed during the rest of gameplay.) I haven't committed the code for this build yet, but the exe is somewhat stable and attached below. Make sure you are using it in conjunction with the shareware doom wad and the attached dstrings.txt, and use a very minimal dos setup (autoexec and config.sys included in the attachment). MEM should show 619KB or more available. This build is mainly configured to barely work with demo3. You can probably try other levels but they'll probably not fit in conventional and will go into EMS.
    doom16.7z
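The relationship between realtics and FPS used in these benchmark posts can be written down directly. The sketch below assumes DOOM's 35 tics per second and the 2134 gametic length of demo3 quoted in the post; the function names are invented for the example.

```c
#define TICRATE 35           /* game tics per second */
#define DEMO3_GAMETICS 2134L /* length of demo3, as quoted in the post */

/* Average frames per second for a timedemo: the demo renders one frame per
   gametic, and realtics is the wall-clock time measured in 1/35 s units. */
static double timedemo_fps(long gametics, long realtics)
{
    return (double)gametics * TICRATE / (double)realtics;
}

/* How many times slower one run is than another, by realtics. */
static double slowdown_factor(long slow_realtics, long fast_realtics)
{
    return (double)slow_realtics / (double)fast_realtics;
}
```

For example, `timedemo_fps(DEMO3_GAMETICS, 36074)` gives roughly 2.07 for the 286-25 small-window run, and `slowdown_factor(36074, 3033)` gives a little under 12 for the gap between the 286-25 and the P-75.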
  14. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    Some good and bad news I suppose - The bad is that there are still some bugs (two that I can count) that are popping up from time to time causing crashes during gameplay and demos, but the good news is that they are somewhat uncommon and also pretty deterministic, so they should be easier to fix, and they also don't get in the way of benchmarking too much. In the short term, in addition to fixing bugs, my focus is on maximizing performance with regards to what gets stored in EMS versus what little conventional memory we have.

    Comparative Benchmarks with EMS/Conventional Allocations
    So timedemo 3 run on 86box with a pentium 233, high detail and a very small window yielded the following results:

    Everything in EMS:
        Realtics 2134 (100% - yes, it ran exactly 1:1 with gametics)
        6618473 reads from EMS memory manager (100%)
        1746208 EMS page swaps (100%)
    65535 bytes made available for sectors, lines, vectors, etc in conventional (not enough for everything, but was enough for a few large fields):
        Realtics 1593 (74.6%)
        4674431 reads from EMS memory manager (70.6%)
        409409 EMS page swaps (23.4%)
    Texture info cached in conventional (requires around 48000 bytes in shareware DOOM):
        Realtics 1942 (91.0%)
        5312437 reads from EMS memory manager (80.2%)
        1363269 EMS page swaps (78.1%)
    Sprite info cached in conventional (requires 7000 bytes in shareware DOOM):
        Realtics 2122 (99.4%)
        6528690 reads from EMS memory manager (98.6%)
        1728079 EMS page swaps (98.9%)
    Thinkers cached in conventional (I gave it around 50000 bytes, but to support max thinkers = 1000 would require as much as 97000):
        Realtics 2022 (94.8%)
        4736768 reads from EMS memory manager (71.5%)
        1518549 EMS page swaps (86.9%)

    So it's pretty clear caching the level variables like sectors and such causes the biggest speed up... texture info is also good to cache. Sprite info seems like it may not be worth caching. I'd like to make around 128k available for level data and also have the texture info cached eventually. Right now there's something like 50-100k available depending on build settings. Another 20-30k at least can get freed up if overlays work (hard to tell when there are unrelated bugs) and I can probably save the same off the stack eventually (currently using 32k stack, but it should be easy to make some code changes and then make this 8k or maybe less; will require testing). And then in theory a well configured machine will have as much as 96k of extra memory lying around in the upper memory area, so I think there will be a lot to work with once I write some code to make use of that space too. I have to consider that at some point sounds and music will require a lot of memory too though.
    Some fields like texture info are pretty static and don't need to be changed after game start, while things like level data (sectors, etc) don't really change in size after level load, which means you can just give them space in a contiguous memory blob and not get any fragmentation during the level - then between levels you just clear it out and re-allocate. So they don't even need complicated allocation management. Thinkers are kind of a pain because they are constantly being created and recreated, which causes some fragmentation. I made a really basic allocator that was kind of wasteful on memory and good enough to run a demo, and it seems it doesn't improve run speed too much. I think what's going on is that while a lot of reads are being made for thinker objects, a lot of them are 'cache hits', hitting something already in the EMS page frames, meaning that the page swaps don't actually decrease too much. It's probably best to leave these in EMS, though I can maybe allocate them a dedicated range of pages, and that might lead to an even better cache hit rate. I think the player object is still in EMS - I'd like that in conventional, but it might require some code rewrite. Hybrid visplanes are still active. I'm sure I can mess around with how many are in conventional memory and how many are in EMS to get some better average performance. Currently 60 are in conventional memory and anything beyond that becomes EMS visplanes.
    I think once this EMS performance is sort of "maxed out", we'll end up with about twice the framerate as before. Which is a nice start, but not enough. We will still be in 'fast pentium or 486' territory requirements at that point. I'm kind of just starting to get back into the feel of everything in the codebase after being away for so long, but I should be able to more or less work on this every day for the rest of the year. Hopefully I can get a pretty stable build sometime this month that is compatible with the doom shareware WAD.
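The "contiguous memory blob" approach for level data amounts to a bump (arena) allocator. The sketch below is a generic version of that technique, not the RealDOOM allocator; the arena size and the names `level_alloc` and `level_reset` are placeholders chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define LEVEL_ARENA_SIZE 4096u /* placeholder; the post aims for around 128k */

static uint8_t level_arena[LEVEL_ARENA_SIZE]; /* one contiguous blob */
static size_t level_used;                     /* bytes handed out so far */

/* Hand out the next `size` bytes of the blob, or NULL if it does not fit.
   Sizes are rounded up to an even number of bytes. */
static void *level_alloc(size_t size)
{
    size = (size + 1u) & ~(size_t)1u;
    if (size > LEVEL_ARENA_SIZE - level_used)
        return NULL;
    void *p = &level_arena[level_used];
    level_used += size;
    return p;
}

/* Between levels: throw everything away at once. */
static void level_reset(void)
{
    level_used = 0;
}
```

Because nothing is freed individually, there is no per-allocation header and no fragmentation during a level. That fits sectors, lines and similar data whose sizes are fixed at level load, and it is exactly what does not fit thinkers, which come and go all the time.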
  15. sqpat

    RealDOOM: DOOM Ported to 16-bit Real Mode

    Phew, I finally was able to fix the main bug causing all the desyncs in the 16 bit build. A mobj was being paged out while its pointer continued to be used in P_DamageMobj. After fixing this bug, it seems all three demos play back. I'm going to continue some work on conventional memory allocation and then do some comparative benchmarks to see what helps most when it's in conventional memory.
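This class of bug, a pointer into a paged window that outlives the page it pointed at, can be shown with a toy model. Everything below is a simulation with invented names: a single 16 byte window and copying stand in for real EMS mapping, and `damage_buggy` only mirrors the shape of the problem, not the actual P_DamageMobj code.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 16
#define NUM_PAGES 2

static uint8_t backing[NUM_PAGES][PAGE_SIZE]; /* stand-in for EMS storage */
static uint8_t frame[PAGE_SIZE];              /* the single mapped window */
static int mapped_page = -1;

/* Make `page` visible in the window (saving the previous one first) and
   return a pointer into the window at `offset`. */
static uint8_t *page_in(int page, int offset)
{
    if (page != mapped_page) {
        if (mapped_page >= 0)
            memcpy(backing[mapped_page], frame, PAGE_SIZE);
        memcpy(frame, backing[page], PAGE_SIZE);
        mapped_page = page;
    }
    return &frame[offset];
}

static uint8_t read_byte(int page, int offset)
{
    return *page_in(page, offset);
}

static void write_byte(int page, int offset, uint8_t v)
{
    *page_in(page, offset) = v;
}

/* Buggy: keeps using `target` after something else was paged in. */
static void damage_buggy(int damage)
{
    uint8_t *target = page_in(0, 0); /* health byte lives on page 0 */
    (void)page_in(1, 0);             /* an unrelated access maps page 1 */
    *target -= (uint8_t)damage;      /* now modifies page 1's data */
}

/* Fixed: fetch the pointer again after anything that may page. */
static void damage_fixed(int damage)
{
    (void)page_in(1, 0);
    uint8_t *target = page_in(0, 0);
    *target -= (uint8_t)damage;
}
```

In the buggy version the write lands on whatever page is currently mapped, so the intended object keeps its old value while unrelated data is corrupted, which is the kind of silent divergence that desyncs a demo.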