Extract objects into individual assembly files. #4
I did that for Sonic Classic Heroes; it is relatively easy to do. I never backported it to the community disassembly since people whine endlessly about the S1 disassembly because Hivebrain did just that at some point. Then again, the kind of people that whine about this is the same kind of people that prefer using the 2007 S2 disassembly, so...
I don't see why it isn't the sensible thing to do. I'm pretty sure the original source code wasn't all in a single file.
I agree it is the sensible thing to do; my point was more about how people will whine endlessly -- the "this is why we don't get to have nice things" factor.
Who has the final say on what gets done?
In this case, the Sonic Retro Tech Members (of which I am part), who probably agree that it is the sensible thing to do.
The main problems I see with it are: some objects have common routines in the middle of their code, some objects have special code outside their main routines, and some objects share code with another object.
Perhaps it could be broken up at least into parts. For example, all the level event code in one asm file, all the object code in one asm file, all the special stage code in one asm file. For objects themselves, maybe Sonic and Tails code in one file initially. Also, what are the concerns with some files containing a single subroutine? I think a lot of applications written in assembly generally do this, as the source code for a single subroutine can be so large. And all related subroutines can go in a categorised directory.
One of my other concerns with making significant structural changes to the disassembly is that it makes guides and project files for programs complicated. For example, SonLVL depends on the spring object's sprite mappings being in s2.asm to display them. If the mappings were moved, the project files would either need to be updated for the new file, which would break compatibility with people basing hacks on older revisions, or split into two sets, one for the older revisions and one for the newer ones, which could then potentially lead to more splits down the line as more changes occur (and honestly, who knows whether they took the code before or after any given revision).
It seems like you are letting these issues prevent any further refinement of the structure of the disassembly. The problem is that if I extract code into different files for my own personal branch, I don't think Git would be able to handle further PRs which document code in s2.asm and move them accordingly to the separate assembly file in my branch. This means I am stuck having to navigate a 100K-line file which GitHub can't display except in raw mode. Are there not more advantages to separating out the code than there are disadvantages?
I don't hate the idea of splitting everything, but I'm not super fond of it either. Personally, I find it a bit tedious trying to research something across multiple files here, there and everywhere; a decent IDE would probably make this easier, but equally, any decent, modern text editor is able to parse, highlight and find in files as long as s2.asm without breaking a sweat. The argument that splitting will make pull requests from a split fork difficult has an inverse. I've had S3C in its own Git repo for the past year, and hooking the main disassembly up as an upstream and merging improvements directly back to my ongoing project has been hugely beneficial. If the S3&K disassembly were to then radically split its main assembly files at this stage, I can only predict I'd have to drop this capability or undergo enormous headaches to ensure my own work and changes fit into a new shape. I don't mean to say that the primary repos should be absolutely beholden to child projects, as it'd be all too easy to run into complete paralysis, but I feel this is just as relevant an issue as the PR one from that angle. I do think MainMemory's considerations regarding external editors are pretty important too - it's again not ideal to be restricted by external factors, but in the context the disassemblies are likely to be used, I think it's fair to be concerned about interoperability with derived projects. Also, while I don't have much time for some folks' dogged determination to stick with antiquated disassemblies and ludditery around labels and macros, by far the loudest and most common complaint I have heard against the SCM disassemblies is the granular file splitting in Sonic 1, and that particular point is one I can understand. The question I want to ask is what would be improved, what problem would be solved by splitting the main assembly. 
I'm not convinced by the idea that large text files are implicitly harder to deal with these days, but I'd be interested to hear what the actual difficulties are with the current situation. Being unable to view the files directly in GitHub's interface is indeed an inconvenience, but View Raw seems fairly satisfactory for the very few occasions I've personally needed this. I'm sure I've missed plenty of "for" arguments in past holy wars, but I think it'd be worth laying them out here. I don't necessarily disagree with a reasonably split structure being a sensible option, but I'd personally like there to be a concrete idea of exactly why (and how) that should happen, and how usable the result will be, before implementing a relatively costly change, particularly one the wider community are reasonably likely to object to, with the possibility that even more people might jump back to the safety net of the 2007 version. If there's really a good justification for it, it shouldn't be that difficult to persuade people to stick with it, particularly if we can illustrate a better workflow using the modified setup.
I think the benefit of splitting up routines is that you can group together routines that aren't necessarily together in the main assembly. Effectively, subdirectory organisation provides implicit namespacing of all the subroutines. I would have thought that would help with research and finding where a particular piece of logic is. Provided any other projects have the latest commit of master as a parent, any downstream merges should be straightforward. It would just mean all the subprojects also have to use the same file split. If the file split is well agreed on, most people should be up for it. In terms of readability, I find a file split provides a good structured hierarchy of the code without actually looking at the code. I have been able to find things much quicker in the S1 repo due to the file splitting, as I do not need to work out what text string to search for to find what I am looking for in the code. And if I did want to find a text string across multiple files, any good text editor supports that, as does the GitHub search box. I have made a list of all the named labels in s2.asm ordered by address to help show the linear structure.
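One way to generate a list like that is sketched below in Python (a hypothetical helper, not necessarily how the list above was made). It assumes labels start in column 0 and end with a colon, and it treats disassembler-style names (`loc_`, `byte_`, `word_`, `sub_` and similar prefixes followed by a hex address) as unnamed. Source order stands in for address order, which only holds while the file assembles straight through from top to bottom.

```python
import re

# Assumed label style: starts in column 0 and ends with a colon, e.g. "Obj25:".
LABEL_LINE = re.compile(r"^([A-Za-z_.][A-Za-z0-9_.]*):")
# Assumed auto-generated names: a known prefix followed by a hex address.
AUTO_LABEL = re.compile(r"^(?:loc|byte|word|dword|off|unk|sub)_[0-9A-Fa-f]+$")


def named_labels(source):
    """Return the named (non auto-generated) labels in the order they appear.

    For a file that assembles top to bottom, source order matches ROM order.
    """
    labels = []
    for line in source.splitlines():
        match = LABEL_LINE.match(line)
        if match and not AUTO_LABEL.match(match.group(1)):
            labels.append(match.group(1))
    return labels
```

For example, a snippet containing `Obj25:`, `loc_11DFA:` and `Obj25_Index:` would yield `["Obj25", "Obj25_Index"]`.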
I really would like to see changes in the structure of the disassembly, including moving all the binary files so they are organized by usage rather than data type, like the S3K disassembly, but it would complicate things, especially for me, because I have to figure out what to do with SonLVL's project files. Honestly, I'm mostly just waiting and hoping that FraGag will be able to do everything he's said he would with his own assembler/IDE and we'll be able to start over with completely new disassemblies for it.
That's certainly something I agree with - to me the current structure seems analogous to having subdirectories in "My Pictures" for "JPEG", "BMP", "PNG", and then naming all the files in those directories ".bin" for some reason. The latter applies even to S3&K - wouldn't designations such as ".mdtile.kosm" make everyone's lives easier in the long run? (While we're on bugbears, for that matter, the mystery-meat "Objnn" style object labelling always rubs me up the wrong way, seeming barely a step above "loc_n", but perhaps that comes from spending too much time with S3&K, where the actual object ID is barely relevant.) I personally find these issues a bigger pain than whether the code itself is split or not. I can't really argue with IntelOrca's points, other than to wonder: if we were considering a structure of putting all level events in one file, all non-player object code in one file, etc., as suggested earlier as a compromise, wouldn't that suggest that it must all be reasonably logically ordered in the original file in the first place to make that work? In S3&K, I generally find the original code to be well enough organised to get by, other than some oddities resulting from a few S3 level events etc. being out of place and Carnival Night generally being all over the ROM for some reason. My suggestion would be for two things to happen. Firstly, to open this up as a discussion in the Tech Members lounge, if not E&RE itself. I have no doubt that at least the latter will lead to unpleasantness, but I'd rather at least have the conversation in a slightly wider context, since I'm not sure there's necessarily a large or representative group following this discussion.
At least trying to cultivate a reasonable discussion such as this one, where the benefits and drawbacks can be outlined and alternative workflows can be suggested for folks who'd need to change the way they do things, seems like the right thing to do, and might save the "grumble grumble somebody changed everything behind my back and I can't work with it and I hate it" factor. Secondly, how about trying to flesh something concrete out on a branch, and involving as many people as are willing (and capable) in that process? I wouldn't want that branch to be hanging around as a permanent alternative - there'd have to come a point where it's either merged or abandoned based on feedback. Also note that when I say involving people, I mean purely from the perspective of soliciting feedback and ideas which may or may not be acted upon - pure community projects don't have a stellar history here of course, whereas strong leadership will at least result in a decision, even if many others hate it. Applying the same branch process to all the main MD disassemblies at once might not be a bad idea either. Just some ideas, anyway - they perhaps rely too much on anyone having time to drive this stuff, amongst other things.
Perhaps many of these issues could be solved with a script to automate it, just to see what it looks like. I agree that the directory structure for levels should be levels\ehz\ followed by the files for mappings, chunks, etc., which could have their own extensions that make it more obvious what compression they're using, too. For HTZ, I think it should only include the level layout, so that it basically looks like you are using EHZ mappings with a different palette, rather than a "shared" mapping for EHZ and HTZ, if that makes sense. Yesterday, all my text editors were struggling to edit s2.asm because the file was so large. This is one of the reasons why I wanted to split it, before I start documenting all the objects and level events.
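As a rough idea of what such a script could do, here is a hedged Python sketch; it is not an existing tool. The cut list, the `objects/` paths and the `include "path"` spelling (the form AS-style assemblers accept) are assumptions. Each chosen label starts a new file that runs until the next cut, and a cut mapped to `None` returns to the main file. Because every extracted chunk is replaced by an `include` at the exact spot it came from, the assembler should see the same text in the same order.

```python
import re

# Assumed label style: starts in column 0 and ends with a colon, e.g. "Obj25:".
LABEL_LINE = re.compile(r"^([A-Za-z_.][A-Za-z0-9_.]*):")


def split_at_labels(source, cuts):
    """Split `source` at the given labels.

    `cuts` is an ordered list of (label, filename) pairs. A filename starts a new
    extracted file at that label; None means "resume writing to the main file".
    Returns (new_main_source, {filename: file_source}).
    """
    targets = dict(cuts)
    main_lines, files = [], {}
    current = None  # None = currently copying into the main file
    for line in source.splitlines(keepends=True):
        match = LABEL_LINE.match(line)
        if match and match.group(1) in targets:
            current = targets[match.group(1)]
            if current is not None:
                if current in files:
                    raise ValueError(f"{current} is used by more than one cut")
                # Leave an include exactly where the extracted code used to be.
                main_lines.append(f'\tinclude "{current}"\n')
                files[current] = []
        if current is None:
            main_lines.append(line)
        else:
            files[current].append(line)
    return "".join(main_lines), {name: "".join(body) for name, body in files.items()}
```

Shared routines sitting in the middle of an object would need their own cut entry; otherwise they travel with whichever file they land in.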
'Sensibly' massive changes should be kept to their own branch. Just look at ObjectMacros, a tool-compatibility breaker that keeps itself separate. If this branch requires effort to keep up to date, then that's your problem. The users of the joint-code system shouldn't have to suffer because you want a different layout. People are willing to stick with Hivebrain's severely outdated S1 disassembly after all these years because they don't want the Git disasm's formatting. It's bad enough that they miss out on the newer labels and commenting, but if that disassembly were replaced with the Git disassembly, how would they react? It's hardly refinement if I can already search 'Slicer' in s2.asm and immediately find its entry in the object pointer table, and with it the main name of the object's code, and can continue searching 'Slicer' and find almost every piece of code relevant to the Slicer in the whole ROM, only for s2.asm to be split and ruin all of that. I wasn't exactly having a problem locating the Slicer's code before. So what would I have to do? Open every asm file and then search through all of them? That just defeats the point of having it split. I could take the slower route, fumble around with the various folders, and pray that one folder actually contains all code relating to the object of interest... which is likely under the same name as the string I was just searching. The modularity's already there: open the big code file, search the name of the object you're looking for, and look at the code past the ObjXX label. If I want the level events, I'll search 'LevelEvents', and there it is. What you're suggesting is another, slower way of doing the same thing. You might as well have a text file with a list of labels sorted into groups. At least that won't destroy s2.asm's structure or readability. If I wanted to find everything relating to EHZ, I would search 'EHZ'. I'd find the palette cycle, music, demo data, even the assigned level select icon.
Would a split be as thorough as that, or would I still have to search through the hollowed-out s2.asm, or even the other split files? Would InitCam_Index be split into its own area, with all of the code it points to being a part of it, to keep things together, in the middle of nowhere; or would it just be a collection of 'include's pointing into a never-ending list of folders, just so everything fits with the rest of its zone? Eventually, the code becomes unreadable. It'll either be half done, like S1, or it'll be so much of a mess that it won't be worth it. Splitting it all is redundant. What is an object code folder if not a duplicate of Obj_Index? What is the distance between related code if a search tool and an appropriate naming scheme can effectively erase it? If the code's been named, it's already been grouped. All that splitting will do is break compatibility with tools and guides, disregard all users of previous versions, and serve to replace something that worked just fine before. No Hivebrain users have set out to merge all files back into sonic.asm and replace the current disassembly with it, putting their personal needs above others'. I've merged all files back into sonic.asm once before. So could anyone else. If I felt there was a need, I'd save everyone the hassle and create a public repo for the thing, as I did with Project Two Eight, making two options available. And here you want to erase the disassembly everyone else has been using because you don't want to maintain your own. You're completely going against one group in favour of another. There can be no instance where "most people should be up for it". If you and others want to put up with that, that's fine. But for the above reasons and more, don't make us have to put up with it too.
I am not expecting everyone to change over in a flick. I just wanted to start a discussion on splitting the disassembly, pros and cons. Ideally, I think the disassembly should be turned into "source code", closer to what the game would have been assembled from in the first place. If this were started as a side project, in a new branch or repository, it could be shaped through many discussions. A hack developer might then consider developing a hack by branching off a more project-oriented code base, which tools like SonLVL could also handle.
Wow, it's been almost three years. Anyway, nowadays I'm not as opposed to the idea of splitting the game's code up, having gotten much more used to working with C projects, where it's common practice. My main reason for posting here, though, is to suggest a certain way of splitting objects and the like: do so in a way that closely follows the game's original source code. While updating the disassembly to produce REV02 binaries, I noticed these odd things that the disassembly refers to as 'JmpTo's. They're these (sometimes huge) groups of JMP instructions to common functions, like PlaySound and SolidObject. What I found notable at the time was that they're largely absent in REV02. It took a lot of work to find and note exactly which ones were removed, and which ones were not, leading me to believe they were machine-generated, rather than hand-written. Well, that, and for whatever reason they usually have a stray NOP instruction at the end that does nothing, and they also pad themselves to the nearest longword. Long story short, I eventually found that a likely assembler Sonic Team may have used was the 2500AD assembler, which had a special property: it could produce .obj files and link them together. Now, I've never used this assembler, so I can't be sure, but those JmpTos are likely part of the linking process. They must have updated their version of the assembler at the time they made REV02 (or perhaps just changed a setting), leading to the JmpTos being totally different in that version. My theory is that each JmpTo block indicates the end of a source file, since you can find them at the end of each object. Notably, according to this, some objects were part of the same source file, such as Obj25, Obj37, the unused Big Ring, and ObjDC - all being ring objects. At the very least, splitting the code this way will provide some kind of template that people can hopefully agree on, rather than having whoever splits it just adapt it to their style, a la S1 Git.
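To test this theory across the whole file, a small script could list where each JmpTo block ends. Below is a hedged Python sketch that works on a list of labels in source order; it assumes the disassembly's `JmpTo_Name`, `JmpTo2_Name` (and so on) naming and reports the first non-JmpTo label after each run of JmpTo labels as a candidate file start. Files that emit no JmpTos will not show up, so the output is a list of candidates, not proof.

```python
import re

# Assumed naming scheme: JmpTo_PlaySound, JmpTo2_PlaySound, JmpTo15_SolidObject, etc.
JMPTO_LABEL = re.compile(r"^JmpTo\d*_")


def candidate_file_starts(labels):
    """Given labels in source order, return each label that directly follows a
    run of JmpTo labels, i.e. a guessed start of the next source file."""
    starts = []
    in_jmpto_block = False
    for label in labels:
        if JMPTO_LABEL.match(label):
            in_jmpto_block = True
        elif in_jmpto_block:
            starts.append(label)
            in_jmpto_block = False
    return starts
```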
This is a pretty interesting discussion. As I'm just about to structure a disassembly I'm working on, it's great to read about some of the pros, cons and concerns you guys have about changing your own repo to something more structured. @Clownacy - this worry really jumps out at me, though: "How would I find 'Slicer'? Open every asm file and then search through all of them?" Hopefully, in the 5 years since you posted that, you've already moved to a new tool for writing code, but on the off chance you haven't, can I point out that there are editors capable of searching through multiple files in a directory structure (subdirectories included)? I'd say this is an easily mitigated problem, unless there are solid reasons you can't use the likes of Notepad++ or Visual Studio Code.
I saw that, but it makes no mention of knowing there are ways of searching through these commonly used split files; perhaps you still open them one by one and have just gotten much more used to it!? I know it was unlikely, but imagine how much of a time saver someone mentioning it would have been if you didn't realise those tools could do this! I reckoned the benefit of mentioning it, in case you didn't know, far outweighed the cost of mentioning it when you were already aware. :)
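For anyone without such an editor to hand, the same kind of search takes only a few lines of Python. This is a minimal sketch; the `*.asm` pattern and the directory layout are assumptions, and it does a plain substring match rather than anything label-aware.

```python
from pathlib import Path


def find_in_tree(root, needle, pattern="*.asm"):
    """Search every file matching `pattern` under `root`, subdirectories included.

    Returns (path, line_number, line_text) tuples for each line containing `needle`.
    """
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as source:
            for number, line in enumerate(source, start=1):
                if needle in line:
                    hits.append((str(path), number, line.rstrip("\n")))
    return hits
```

Calling `find_in_tree("path/to/disassembly", "Slicer")` would list every matching line across the main file and all split files, much like a single search in the unsplit s2.asm.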
As a somewhat related side note, I believe Sonic 2 used the ProAsm assembler on the Amiga, as it supported all the directives and macro formatting in the S2NA debug mode source code, and also did the JmpTo thing, though only for word branches whose targets were too far away, not necessarily for references outside the file (it also just errors out on byte-sized branches being too far away). Symbol data generation also appeared to be in line with S2NA's symbol data. I dunno about any other assembler that would fit the bill as well as that one (if anyone does know of one, then do tell!). The main assemblers SoA recommended to devs for DOS didn't support things like "addsym" or do the "JmpTo" thing, even with linking for all of them enabled. 2500 AD's assembler (X68k) also has different directives. With that in mind, then yeah, I totally agree with the idea that JmpTos indicate the end of a source file. It might be worth looking into further at some point.
It was mentioned here that Sonic 2 may have used either some version of Devpac for some Atari computer or ProASM on the Amiga (noting there may have been a DOS version made by another company). Does Devpac do the JmpTo thing, though? Note that, by comparing the code of Sonic 1, Sonic 2 Nick Arcade, and Sonic 2 final, as well as that debug mode source code snippet, I can confirm that JmpTos are created whenever a word-sized branch is too far away.
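The range check itself is simple arithmetic. On the 68000, a word-sized `bra.w`/`bsr.w`/`Bcc.w` stores a signed 16-bit displacement measured from the branch's address plus 2, so it reaches targets from -32768 to +32767 bytes relative to that point. The Python sketch below encodes that rule; `needs_jmpto` is a hypothetical helper that just applies the theory from this thread, not the linker's actual logic.

```python
def word_branch_in_range(branch_address, target):
    """True if a word-sized 68000 branch at branch_address can reach target.

    The displacement is a signed 16-bit value relative to branch_address + 2,
    i.e. the address immediately after the opcode word.
    """
    displacement = target - (branch_address + 2)
    return -0x8000 <= displacement <= 0x7FFF


def needs_jmpto(branch_address, target):
    """Hypothetical rule from the discussion: out-of-range word branches get a JmpTo."""
    return not word_branch_in_range(branch_address, target)
```

Near the edge of the range, two branches to the same routine can land on different sides of the limit, so an earlier call may be in range while a later one is not.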
Are we sure it's as simple as using a JmpTo when a word-sized branch is out of range? Look at I get that it's possible that the assemblers follow some kind of all-or-nothing approach, where if they detect one branch to a certain address is out of range, then they make all branches to that address in that particular source file use a JmpTo/longword-sized jump. But how would a single-pass assembler pull off what REV02 is doing, where it will either use a four-byte This weirdness only happens in the menu code: I don't get it. If I make the JmpTos automatically create themselves depending on whether the destination is in range or not, everything in Sonic 2 is bit-perfect but that darn menu code. This is such a headache...
My best guess is that in REV02, they manually changed those BSRs to JSRs, though I couldn't tell you the reason why. Obj5F is a good example of manual code changes that take into account JmpTos not being automatically generated, with JmpTo_DisplaySprite. I can't really imagine it any other way, to be honest. Dunno about why JmpTos were being produced for those specific branches, either, though it is worth noting that the menu code does seem to be put in a location where the branches to those functions border on being out of range (earlier branches are in range, but the later ones aren't). Also dunno why I said it was ProAsm that generated the JmpTos, when it's the linker that would do that. I am very much unsure of what linker they used.
I would like to refer to this comment I made in the Sonic 1 disassembly, because the logic in that also applies here in Sonic 2. Sonic 2 also adds the "RINGACT" (ring layouts) and "TBL2" (additional compressed graphics, placed between the sound driver + drum samples and music + SFX data) sections.
This all seems to match up quite nicely with Sonic 2's Nick Arcade prototype, particularly where there are lengthy gaps of uninitialized data. (I guess, by extension, this also includes other Sonic prototypes.) Can anything similar be determined for Sonic 3 & Knuckles, by chance?
I've made an extremely primitive split of s2.asm into separate files in accordance with the game's linker data, which can be found in the 'code-split' branch. Unfortunately, there seem to be a handful of cases where it's obvious that one file ends and another begins, but no linker data exists to indicate this. Notably, this is a problem with the lamppost object (which is currently fused with the floor and wall distance calculation logic) and the spike and GHZ purple rock objects (which are conjoined). This seems to be due to a combination of the file not using any long-distance external symbols (preventing JmpTos from being emitted), and the file incidentally ending on an address that is a multiple of 4 (preventing 'align 4' or a dummy 'nop' from being emitted). Thoughts?
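The arithmetic behind that last point: 68000 code is word-aligned, so a file can only end on an address that is 0 or 2 modulo 4, and padding to a longword therefore adds either nothing or a single 2-byte `nop`. The Python sketch below (hypothetical helper names) captures when a file boundary would leave any trace in the ROM under the assumptions described in this comment.

```python
NOP_SIZE = 2  # a 68000 NOP (opcode $4E71) is a single 2-byte word


def padding_to_align(end_address, alignment=4):
    """Bytes needed to bring end_address up to the next multiple of alignment."""
    return (-end_address) % alignment


def nops_needed(end_address):
    """Number of 2-byte nops that longword padding would emit after end_address."""
    return padding_to_align(end_address) // NOP_SIZE


def boundary_leaves_trace(end_address, emitted_jmpto):
    """Hypothetical heuristic: a file boundary is visible in the ROM only if the
    linker emitted JmpTos or had to pad the file out to a longword."""
    return emitted_jmpto or padding_to_align(end_address) != 0
```

Under this heuristic, a file with no JmpTos that happens to end at an address such as $1234 (a multiple of 4) leaves no mark at all, which matches the lamppost and spike cases.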
Some things I should note:
There are probably a few more things that should be split based on information found in the prototypes, but this should be useful.
Aha, awesome idea to check the prototypes too!
As Brainulator pointed out in #4, the prototypes contain linker output in some places where it is missing in the final ROM.
One more: there's a nop between the SolidObject routines and Sonic's code in the September 14 build.
Thanks to Brainulator for finding the linker data for this and pointing it out in #4.
Great, thanks!
I was wondering whether it is a good idea to extract the objects into an assembly file for each object and include them where they were extracted from in s2.asm, like what was done in the Sonic 1 disassembly. This shouldn't change the resulting binary that is created if order is maintained.
It would probably be easier to do it in stages rather than all in one go, though, unless someone is committed to doing that.
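Whichever way the extraction is staged, each step can be checked by rebuilding and comparing the output against a known-good ROM byte for byte. A minimal Python sketch follows; the file names are placeholders, and reporting the first differing offset is just a convenience for tracking down which include went out of order.

```python
import sys
from pathlib import Path
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if a and b are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))  # one image is a prefix of the other
    return None


def compare_roms(reference_path, built_path) -> Optional[int]:
    """Compare two ROM images on disk; see first_difference for the result."""
    return first_difference(Path(reference_path).read_bytes(), Path(built_path).read_bytes())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names): python compare_roms.py reference.bin build.bin
    offset = compare_roms(sys.argv[1], sys.argv[2])
    print("identical" if offset is None else f"first difference at offset ${offset:06X}")
```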