Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BOLT: A Practical Binary Optimizer for Data Centers and Beyond #45

Open
kavon opened this issue Jul 19, 2018 · 0 comments
Open

BOLT: A Practical Binary Optimizer for Data Centers and Beyond #45

kavon opened this issue Jul 19, 2018 · 0 comments

Comments

@kavon
Copy link
Member

kavon commented Jul 19, 2018

Link: https://arxiv.org/abs/1807.06735

Abstract:

Performance optimization for large-scale applications has recently become more important as computation continues to move towards data centers. Data-center applications are generally very large and complex, which makes code layout an important optimization to improve their performance. This has motivated recent investigation of practical techniques to improve code layout at both compile time and link time. Although post-link optimizers had some success in the past, no recent work has explored their benefits in the context of modern data-center applications.

In this paper, we present BOLT, a post-link optimizer built on top of the LLVM framework. Utilizing sample-based profiling, BOLT boosts the performance of real-world x86 64-bit ELF applications even for highly optimized binaries built with both feedback-driven optimizations (FDO) and link-time optimizations (LTO). We demonstrate that post-link performance improvements are complementary to conventional compiler optimizations, even when the latter are done at a whole-program level and in the presence of profile information. BOLT has been deployed inside Facebook for multiple data-center workloads. For data-center applications, BOLT achieves up to 8.0% performance speedups on top of profile-guided function reordering and LTO. We have also applied BOLT to GCC and Clang binaries, and our evaluation shows that BOLT speeds up these binaries by up to 15.3% on top of FDO and LTO, and up to 35.5% if the binaries are built without FDO and LTO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant