-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring suggestions #36
Comments
Thanks a lot for suggesting these. Overall, yes, all of these are very welcome, with a few caveats that I comment on below: Regarding modularity and functional style, OK but only if not at the expense of readability. For such a small API where most procedures are < 20 LOC, I personally don't see meaningful gains here, but let's look at specific suggestions for changes. Submodules are fine, but again, personally, for such small modules, I prefer to see the whole thing in one file. For the time being, can you open a separate issue for A thing to keep in mind regarding the functional style is that I'll be soon introducing a module of activation subroutines (there already are activation functions) which will modify the activation in place (and be called aptly, Another (simple) refactor that is coming up (and has been done in FKB) and that must come before CNNs is to implement forward and backward propagation on the layer type rather than the network type. I expect that a base layer type will become an For tests, I prefer test-drive--simpler to me and a Fortran-lang project. CI, UML, and FORD files will all be very welcome. If opening PR, please keep one PR per feature (e.g. CI, UML, FORD, associate, submodules should each be separate PRs). |
@milancurcic thanks for getting back to me so quickly with the feedback. I'm not familiar with test-drive. I look forward to checking it out. FYI, I just added a "Nomenclature" section at the end of the checklist in the original comment in this issue. |
While I appreciate that many people like this convention, I don't, sorry. It's possible that we will need a suffix in source filenames if we start using submodules, so let's discuss that then. Another heads-up on nomenclature that I forgot to mention earlier: Currently module names and source files are prefixed with |
@milancurcic regarding the separation of procedure interfaces from procedure definitions, I think it's important to keep two categories of reader in mind: developers and users. Many users (especially a naive user) just want to know how to invoke a procedure (as described in the interface body) and what it does (as might be described in a comment). From a user perspective, there's likely little benefit from bundling the procedure definition with the procedure interface body and there are the downsides of information overload, increased scrolling, and possible confusion (e.g., if the developer wants me to see the innards, was that because the procedure does something unexpected like modifying non-local variables?). Also, "user" in this context is anyone who needs to invoke the procedure, including a contributing developer. Former Fortran committee member and Fortran book author Richard Maine makes the point that separating interfaces from definitions has subtle conceptual benefits beyond the practical benefits often discussed (compilation cascades). I've found this separation very helpful in working with clients on modernizing code and trying to disentangle some of their mental mapping of how different parts of a project relate to each other when they've gotten used to doing things a certain way like using global data. In such cases, people make a big conceptual leap when they realize they don't need to read through every line of a procedure to competently use the procedure. Lastly, it's always possible to repeat the interface information in the procedure definition if one wants to do so. I prefer to avoid the duplication and the potential inconsistencies, but at least it's still nice to clearly exposes the minimal information that a user needs without burdening them with unnecessary implementation details. |
OK, that's a fair point and I trust you, even if I have some initial resistance. I opened #40 to track it. |
@milancurcic FWIW, I just refactored a module on another project and remembered another benefit of submodules: tighter scoping. Non-public entities can be moved to the submodule. When there are constants, module variables, and procedures that are only used internally in executable statements, moving such entities to submodules means less that a user has to wade through. I haven't looked yet at whether this is the case for any entities in neural-fortran. We'll see. |
@milancurcic I keep remembering additional subtleties to the module/submodule pairing (which is why I need to write a new book to gather all this stuff in one place). The separation affords a really nice way around Fortran's prohibition against circular module dependencies. In situations where it would be useful for module A to use module B and module B to use module A, a prohibited circular relationship, submodules often come to the rescue! When the executable statements of a procedure in module A reference a procedure in module B and vice versa, what the procedures are really referencing are each other's interfaces so the dependency graph forms an X instead of a circle:
where the above depiction looks best when rendered in a browser. This doesn't necessarily come up often, but it shows one of the more subtle benefits of exposing the distinction between two very different concepts that otherwise get conflated. When the procedure interface and the procedure definition are separate, many otherwise circular dependencies vanish. I think that's a really nice example of the conceptual benefits. |
My book uses what it calls the Surrogate pattern to solve the above problem. I recently heard MIT computer science Prof. Saman Amarasinghe say "Design patterns are all the things you forgot to put in the language." In our book, we had to invent a pattern in Fortran 2003 as a workaround for a problem that's more naturally solved by a built-in language capability in Fortran 2008. |
Of course, just 2 days after I wrote "This doesn't necessarily come up often,..." regarding circular module dependencies, I now see a recent commit message indicating that Brad (@everythingfunctional) had to do some work to break a circular dependency in software that we're developing for a clients at a private company. I haven't dug through the commit to see if submodules could have helped in this case, but seeing this so soon after my statement makes me wonder just how often the issue arises. It's possible that the only reason I was thinking it doesn't come up often is because we've all gotten used to having to design around a limitation in the language that has been alleviated by submodules, which most people don't use yet. |
@milancurcic I'd like to work on the test items from the above checklist next. The main thing I'd like to do is make passing tests more quiet. This is a philosophy that I picked up from Tom Clune and it's also reflected in how I'll take a look at test-drive. On first look, it appears pretty similar to Vegetables so I'm hoping I can come up to speed on it quickly. If not, I could contribute Vegetables tests and either you could help me convert them to test-drive tests or I could figure out how to keep the Vegetables on my fork so they don't get pushed to the modern-fortran repository. Once the testing has been updated a bit, I'll work on setting up CI testing via GitHub Actions. |
I just realized that I wrote "... keep the Vegetables on my fork..." without thinking about the pun. I'll have use that one again. In the mean time, I'll go for a test-drive! 😆 |
On first pass through the test-drive repository, I recommend Garden, the parallel successor to Vegetables. The differences I see so far are
I get the sense that the procedural style of test-drive is considered a feature and I can understand how that style is more immediately intuitive to most Fortran programmers, but it imposes an unnecessary burden on the newcomer. With a functional style, it's immediately obvious what the inputs are (the arguments) and what the output is (the result). Fortunately, test-drive is simple enough that this information can be at least partially inferred pretty quickly from context in its current form, but that still leaves some uncertainty about argument To understand things like argument But item 2 is what seems to me like a deal breaker. I imagine that it's possible that item 3 mitigates the issue if the intention is that the end user handles the parallelism in the main driver, but that leaves a lot of logic to the user to decide, e.g., will every image run every test, and if so, how does one test the parallel execution of the collective subroutines in neural-fortran? |
Thanks for the detailed summary. I mostly see the same picture, but my takeaways are different.
No doubt garden packs a lot of bells and whistles that are very useful for very complex projects. What appeals to me about test-drive is that it's a small and simple tool with a few procedures. And not the least, again, it's a Fortran-lang project, which I'm personally invested in. |
@milancurcic my most significant points are in the text below the itemized list. The simplicity of test-drive is partly because it has a lot less capability and it burdens the user with reading procedure implementations just to understand calling semantics that would be completely obvious in a functional programming style. By contrast, I've used Garden's predecessor, Vegetables, on O(10) projects and I can't recall a single time that I felt the need to read a Vegetables procedure implementation because every procedure I use is a function that takes only About two years ago, I experimented with going all-in on attempting to follow the functional programming style to its logical conclusion by designing very nearly every part of a simulation package around pure functions. The experience blew me away. It was the first time that I had ever written about 1,200 lines of code that just worked very nearly the first time I integrated all the pieces together. Seriously. I spent almost no time debugging. I had never experienced that before and now it has become the norm for me. It was the basis of the keynote address that I gave at the International Fortran Conference last year. Because you use Vegetable Examples |
@milancurcic I just clicked on the link you provided to the parallel code in test-drive. When I mentioned parallelism, I wasn't referring to executing multiple unit tests concurrently. I was referring to testing SPMD parallel code. Wouldn't forking multiple threads to run independent tests cause problems if the tests are exploiting SPMD parallelism, e.g., multi-image tests? |
I see, I didn't understand that. You're right, I don't think test-drive was designed with that in mind. At the very least, all tests in a suite (assuming one suite per driver) would need to run correctly on the same number of images. And, reporting of the suite results would be duplicated. So, I think it's possible to test parallel code with test-drive, but yes, the writer of the tests would need to handle the parallel logic inside the tests. Test-drive is then not the best candidate for testing parallel code. Thank you for the You don't have to sell me on functional style--I was sold years ago. Functional style is an advantage for me, but it alone is not sufficient to warrant choosing one tool over another. The experience that you describe trying to understand test-drive is something that I didn't experience. I never felt the need to look at the test-drive implementation. In fact, the first time I looked was when I went to see how parallelism was implemented in an earlier message in this thread. I adapted many stdlib tests to test-drive (see fortran-lang/stdlib#494) and I didn't look at the implementation a single time. I merely looked at some existing tests that Sebastian already wrote. So, my explanation for the apparently different movies playing in your and my heads is: You used Garden to write many tests--it makes sense to you; I used test-drive to write many tests--it makes sense to me. That said, I don't expect Garden tests to be any more difficult to write than test-drive tests. It's a merely a matter of learning the API.
Damian, this is key. This was the barrier for me when first looking at Garden/Veggies. I looked at about a dozen different Garden tests before. Seeing functions called Once I switch mindset and get used to what Garden is trying to do, the API is quite attractive. Once I remember I'm in an almost DSL-land, it's even more attractive than that of test-drive. If anything, Garden is too smart. Your explanation that made it click for me should be the first thing in the tutorial. Otherwise, if prospective users come in expecting that the function names alone should convey some meaning, they may get confused. For me, the advantages to Garden over test-drive are:
Let's go with Garden then. If you want to work on this soon-ish, I recommend staying within the limits of MNIST data and io modules, and the activation functions. Most of the other modules will be entirely re-written with a re-factor that I'm working on. |
Exactly. I experienced the same, but I couldn't think of any better suggestions to give for the names. As you probably picked up, "It" refers back to the quote just after
In reading the Garden documentation yesterday, I discovered that it's inspired by behavior-driven development (BDD), which in turn grew out of test-driven development (TDD). I had heard of BDD before, but didn't really get it until just now. This quote from the Wikipedia page precisely describes what Garden/Veggies tests read like: "BDD is largely facilitated through the use of a simple DSL using natural-language constructs (e.g., English-like sentences) that can express the behaviour and the expected outcomes."
Thanks for your patience with me. Despite my long-windedness, I often spend time editing my comments down to try to make them shorter, but there's just so much to say! It reminds me of when jazz band leader Miles Davis was complaining about saxophonist John Coltrane taking too long on his solos. Coltrane responded that once he gets going, he just can't figure out how to stop. Miles replied, "Try taking your lips off the horn." Sometimes I probably just need to try taking my fingers off the keyboard. LOL This will be quick and painless. :) |
Most of the stuff done here, will close. |
@milancurcic would you be open to a pull request with some refactoring that is minor but global? If so, I would submit one or more pull requests with the changes described below. In all honesty, one of the main reasons I do steps like these is because it walks me through the project in a way that keeps my brain actively involved in consuming and understanding the code, but there are potential inherent benefits to the project in the parenthetical descriptions below. If you like some ideas but not others, you could check the ones you like. Otherwise, I can check them off as the pull requests get reviewed and merged.
Increase modularity
Adopt more functional style
Useassociate
wherever possible to ensure immutability (which reduces chances of mistaken data modifications)reverse
in one or two places)Leverage Vegetables in the Tests
Note: Not using vegetables but essentially resolved as of v0.3.0.
Generate & Deploy Documentation
Add PlantUML diagramsAdd a CI script that deploys the documentation to GitHub PagesContinuous Integration Testing
Nomenclature
This is a style that Brad (@everythingfunctional) and I adopted in recent code:
Module name suffix_m
Type name suffix_t
_submodule
Note: Using
_submodule
for submodules as of v0.2.0.The text was updated successfully, but these errors were encountered: