Contribution to project quality by mitigating code smells. #11240
Hello, cBioPortal Community!
We are excited to contribute to your incredible project. As part of a Software Maintenance course in our academic program, we were encouraged to engage with open-source projects by identifying and addressing code smells and by analyzing and improving code quality. Using Understand, a static analysis tool, we reviewed the codebase and implemented targeted improvements. The changes reduce cyclomatic complexity and improve modularity and maintainability.
This pull request introduces refactorings across multiple classes and methods, focusing on enhancing readability, improving cohesion, and ensuring adherence to software engineering best practices. Through this contribution, we hope to make the codebase more sustainable for the community and easier for new contributors to navigate. Below, we provide a detailed account of the changes, their motivations, and the outcomes achieved.
1. Feature Envy:
Classes: MolecularProfileRepository and related components.
Issues:
Misplaced responsibilities, with methods handling molecular profile logic instead of focusing on persistence.
Solution: Extracted molecular profile-related logic into a new utility class, MolecularProfileUtil.
Encapsulated grouping and interaction with molecular profiles in dedicated methods (groupIdentifiersByProfileType).
Impact: Increased cohesion in MolecularProfileRepository.
Centralized and reusable logic in MolecularProfileUtil.
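As a rough illustration of moving grouping logic out of a repository and into a utility class, here is a minimal sketch. Only the names MolecularProfileUtil and groupIdentifiersByProfileType come from this description; the `Profile` stand-in, its fields, and the method signature are assumptions made for the example and may differ from the code in this pull request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual cBioPortal classes.
final class MolecularProfileUtilSketch {

    // Minimal stand-in for a molecular profile: an identifier plus its type.
    static final class Profile {
        final String stableId;
        final String profileType;

        Profile(String stableId, String profileType) {
            this.stableId = stableId;
            this.profileType = profileType;
        }
    }

    private MolecularProfileUtilSketch() {
        // Utility class; not meant to be instantiated.
    }

    // Groups profile identifiers by profile type, keeping the order in which
    // each type is first seen. The signature is an assumption.
    static Map<String, List<String>> groupIdentifiersByProfileType(List<Profile> profiles) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Profile profile : profiles) {
            grouped.computeIfAbsent(profile.profileType, type -> new ArrayList<>())
                   .add(profile.stableId);
        }
        return grouped;
    }
}
```

With the grouping in one place, a repository class can call the utility and stay focused on persistence.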
2. Large Class and Long Method:
Classes: CustomEhcacheProvider and SessionServiceController.
Issues: Classes with multiple responsibilities and overly complex methods.
Difficulty understanding and maintaining code with high cyclomatic complexity.
Solution: Split responsibilities into dedicated components:
CustomEhcacheProvider was modularized: large methods (e.g., detectCacheConfigurationErrorsAndLog) were extracted into smaller, testable utility functions (CacheValidationUtils).
SessionServiceController was split into two controllers:
VirtualStudyController for virtual study operations.
CustomDataController for custom session data management.
Impact:
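To make the idea of splitting one large validation method into small, testable checks concrete, here is a minimal sketch. The names CacheValidationUtils and detectCacheConfigurationErrorsAndLog appear in this description, but the individual checks, their parameters, and the validation rules below are hypothetical and are not taken from the actual cache configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch; the rules and parameters are illustrative assumptions.
final class CacheValidationUtilsSketch {

    private static final Logger LOG =
        Logger.getLogger(CacheValidationUtilsSketch.class.getName());

    private CacheValidationUtilsSketch() {
        // Utility class; not meant to be instantiated.
    }

    // Assumed rule: a named size setting must be a positive number of megabytes.
    static List<String> checkPositiveSize(String settingName, Integer megabytes) {
        List<String> errors = new ArrayList<>();
        if (megabytes == null) {
            errors.add(settingName + " is not set");
        } else if (megabytes <= 0) {
            errors.add(settingName + " must be positive but was " + megabytes);
        }
        return errors;
    }

    // Assumed rule: a disk-backed cache needs a non-blank storage location.
    static List<String> checkStorageLocation(boolean diskEnabled, String location) {
        List<String> errors = new ArrayList<>();
        if (diskEnabled && (location == null || location.trim().isEmpty())) {
            errors.add("disk cache is enabled but no storage location is set");
        }
        return errors;
    }

    // Thin coordinator: gathers messages from each check, logs them, and
    // returns true when at least one problem was found.
    static boolean detectCacheConfigurationErrorsAndLog(
            Integer heapMegabytes, boolean diskEnabled, String location) {
        List<String> errors = new ArrayList<>();
        errors.addAll(checkPositiveSize("heap size (MB)", heapMegabytes));
        errors.addAll(checkStorageLocation(diskEnabled, location));
        for (String error : errors) {
            LOG.warning(error);
        }
        return !errors.isEmpty();
    }
}
```

Because each check returns its messages instead of logging directly, it can be unit tested without a logger or a running cache.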
3. Long Method:
Class: DataBinner.
Issues: High cyclomatic complexity in the calcNumericalDataBins method.
Muddled logic due to multiple responsibilities within a single method.
Solution:
Impact:
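As a generic illustration of how method extraction can lower the cyclomatic complexity of a numeric binning routine, the sketch below splits the work into three single-purpose steps and a short coordinating method. It uses simple equal-width bins, which is a deliberately simplified stand-in and not the algorithm of calcNumericalDataBins; every class and method name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of method extraction; not the project's binning logic.
final class NumericBinningSketch {

    private NumericBinningSketch() {
        // Utility class; not meant to be instantiated.
    }

    // Step 1: drop missing and non-finite values.
    static List<Double> keepFinite(List<Double> values) {
        List<Double> finite = new ArrayList<>();
        for (Double value : values) {
            if (value != null && !value.isNaN() && !value.isInfinite()) {
                finite.add(value);
            }
        }
        return finite;
    }

    // Step 2: compute binCount + 1 equally spaced edges from min to max.
    static double[] equalWidthEdges(double min, double max, int binCount) {
        double[] edges = new double[binCount + 1];
        double width = (max - min) / binCount;
        for (int i = 0; i <= binCount; i++) {
            edges[i] = min + i * width;
        }
        return edges;
    }

    // Step 3: count values per bin; the last bin includes its upper edge.
    static int[] countPerBin(List<Double> values, double[] edges) {
        int binCount = edges.length - 1;
        int[] counts = new int[binCount];
        double width = (edges[binCount] - edges[0]) / binCount;
        for (double value : values) {
            int index = width == 0 ? 0 : (int) ((value - edges[0]) / width);
            counts[Math.min(index, binCount - 1)]++;
        }
        return counts;
    }

    // Coordinator: each branch now lives in a helper that can be tested alone.
    static int[] calcBins(List<Double> rawValues, int binCount) {
        if (binCount <= 0) {
            throw new IllegalArgumentException("binCount must be positive");
        }
        List<Double> finite = keepFinite(rawValues);
        if (finite.isEmpty()) {
            return new int[binCount];
        }
        double min = finite.get(0);
        double max = finite.get(0);
        for (double value : finite) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        return countPerBin(finite, equalWidthEdges(min, max, binCount));
    }
}
```

The coordinating method reads as a short sequence of named steps, and the filtering, edge computation, and counting can each be covered by their own unit tests.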
Code Quality Improvements:
The refactorings resulted in measurable improvements in the following metrics:
Cyclomatic Complexity: Reduced in multiple classes and methods, leading to easier comprehension and maintenance.
Modularity: Improved by isolating responsibilities into smaller, reusable classes and methods.
Cohesion: Increased by ensuring each class has a focused responsibility.
Testability: Smaller methods and classes can now be independently tested, improving code reliability.
Tools and Process:
Static Analysis: Understand, used to review the codebase and identify code smells.
Refactoring Techniques: Method extraction, class extraction, and responsibility reallocation to utility classes.
Validation: All refactorings were tested to ensure no change in functionality.
Existing tests were run successfully, and new unit tests were added for the extracted utilities.
Challenges Faced: Understanding the legacy codebase and identifying areas for improvement required significant time due to the large and complex architecture.
Balancing modularization against the need to maintain existing interfaces and avoid disrupting other parts of the system.
Final Notes: These refactorings align the codebase with modern software design principles, ensuring that:
Code is easier to read and maintain.
Logic is modular and reusable.
The project is better prepared for future enhancements.
This contribution follows the criteria of improving code quality based on collected metrics and addressing code smells. We are open to further feedback from the maintainers and community to refine this work. Thank you for the opportunity to collaborate!