This is a comprehensive list of papers on database theory for understanding and building database systems. It covers various aspects of database systems, including the essential theoretical background, classic system design, and multiple modules within the database.
The list is organized into different categories and subcategories for easy navigation. Each paper is accompanied by a title, author, and publication year, along with a link to the full text if available.
This collection serves as a learning and training resource primarily for the Tencent Cloud Database Team and is also open to external researchers, students, and learners interested in database systems.
If you are reading this and working through these papers, we would like to talk with you about opportunities on the Tencent Cloud Database Team (@Henry L.).
This list is generated automatically from a Sheet document. If you have suggestions or would like to contribute to this list, please file an issue, and we will update our sheet to make the changes publicly available.
Any contribution that helps improve this list and makes it more comprehensive and useful to the community is welcome. Here are some ways you can contribute:
- Add a new paper: If you have a paper that you think should be included in this list, please file an issue to provide the paper's title, author, publication year, and a link to the full text (if available).
- Update an existing paper: If you find any errors or outdated information in the list, please file an issue to provide the correct information.
- Remove a paper: If you think a paper is no longer relevant or useful, please file an issue to suggest its removal.
- General suggestions: If you have any general suggestions or feedback on how to improve this list, please file an issue to share your thoughts.
- Database Papers
- A Relational Model Of Data For Large Shared Data Banks (1970) - Codd, Edgar F.
- Sequel: A Structured English Query Language (1974) - Chamberlin, Donald D., and Raymond F. Boyce.
- Ingres: A Relational Data Base System (1975) - Held, G. D., M. R. Stonebraker, and Eugene Wong.
- Extending The Database Relational Model To Capture More Meaning (1979) - Codd, Edgar F.
- A Critique Of The Sql Database Language (1984) - Date, C. J.
- A Critique Of Snapshot Isolation (2012) - Yabandeh M, Gómez Ferro D.
- The Part-Time Parliament (1998) - Lamport, Leslie.
- Paxos Made Simple (2001) - Lamport, Leslie.
- Consensus: Bridging Theory And Practice (2014) - Ongaro, Diego.
- In Search Of An Understandable Consensus Algorithm (Extended Version) (2014) - Ongaro, Diego, and John Ousterhout.
- Distributed Consensus Revised (2019) - Howard, Heidi.
- A Generalised Solution To Distributed Consensus (2019) - Howard, Heidi, and Richard Mortier.
- Paxos Vs Raft: Have We Reached Consensus On Distributed Consensus? (2020) - Howard, Heidi, and Richard Mortier.
- Consistency Tradeoffs In Modern Distributed Database System Design (2012) - Abadi, Daniel.
- Logical Physical Clocks And Consistent Snapshots In Globally Distributed Databases (2014) - Kulkarni S S, Demirbas M, Madappa D, et al.
- Ark: A Real-World Consensus Implementation (2014) - Kasheff, Zardosht, and Leif Walsh.
- Polarfs: An Ultra-Low Latency And Failure Resilient Distributed File System For Shared Storage Cloud Database (2018) - Cao, Wei, et al.
- Anna: A Kvs For Any Scale (2018) - Wu, Chenggang, et al.
- Strong And Efficient Consistency With Consistency-Aware Durability (2021) - Ganesan, Aishwarya, et al.
- Architecture Of A Database System. Foundations And Trends In Databases (2007) - Hellerstein J M, Stonebraker M, Hamilton J.
- System R: Relational Approach To Database Management (1976) - Astrahan, Morton M., et al.
- The Design And Implementation Of Ingres (1976) - Stonebraker, Michael, et al.
- The Design Of Postgres (1986) - Stonebraker, Michael, and Lawrence A. Rowe.
- Query Processing In Main Memory Database Management Systems (1986) - Lehman, Tobin J., and Michael J. Carey.
- Megastore: Providing Scalable, Highly Available Storage For Interactive Services (2011) - Baker J, Bond C, Corbett J C, et al.
- Spanner: Google's Globally Distributed Database (2013) - Corbett, James C., et al.
- Online, Asynchronous Schema Change In F1 (2013) - Rae, Ian, et al.
- Amazon Aurora: Design Considerations For High Throughput Cloud-Native Relational Databases (2017) - Verbitski, Alexandre, et al.
- Looking Back At Postgres (2019) - Hellerstein, Joseph M.
- Cockroachdb: The Resilient Geo-Distributed Sql Database (2020) - Taft, Rebecca, et al.
- F1 Lightning: Htap As A Service (2020) - Yang, Jiacheng, et al.
- Tidb: A Raft-Based Htap Database (2020) - Huang, Dongxu, et al.
- Polardb Serverless: A Cloud Native Database For Disaggregated Data Centers (2021) - Cao, Wei, et al.
- Bigtable: A Distributed Storage System For Structured Data (2006) - Chang, Fay, et al.
- Dynamo: Amazon’s Highly Available Key-Value Store (2007) - DeCandia, Giuseppe, et al.
- Pnuts: Yahoo!’s Hosted Data Serving Platform (2008) - Cooper, Brian F., et al.
- Cassandra - A Decentralized Structured Storage System (2010) - Lakshman, Avinash, and Prashant Malik.
- Windows Azure Storage: A Highly Available Cloud Storage Service With Strong Consistency (2011) - Calder, Brad, et al.
- Azure Data Lake Store: A Hyperscale Distributed File Service For Big Data Analytics (2017) - Ramakrishnan, Raghu, et al.
- Pnuts To Sherpa: Lessons From Yahoo!’s Cloud Database (2019) - Cooper, Brian F., et al.
- Access Path Selection In A Relational Database Management System (1979) - Selinger, P. Griffiths, et al.
- Query Optimization By Simulated Annealing (1987) - Ioannidis, Yannis E., and Eugene Wong.
- The Exodus Optimizer Generator (1987) - Graefe, Goetz, and David J. DeWitt.
- Extensible/Rule Based Query Rewrite Optimization In Starburst (1992) - Pirahesh, Hamid, Joseph M. Hellerstein, and Waqar Hasan.
- The Volcano Optimizer Generator: Extensibility And Efficient Search (1993) - Graefe, Goetz, and William J. McKenna.
- The Cascades Framework For Query Optimization (1995) - Graefe, Goetz.
- An Overview Of Query Optimization In Relational Systems (1998) - Chaudhuri, Surajit.
- Robust Query Processing Through Progressive Optimization (2004) - Markl, Volker, et al.
- Orca: A Modular Query Optimizer Architecture For Big Data (2014) - Soliman, Mohamed A., et al.
- Parallelizing Query Optimization On Shared-Nothing Architectures (2015) - Trummer, Immanuel, and Christoph Koch.
- The Memsql Query Optimizer: A Modern Optimizer For Real-Time Analytics In A Distributed Database (2016) - Chen, Jack, et al.
- Processing Queries With Quantifiers: A Horticultural Approach (1983) - Dayal, Umeshwar.
- Translating Sql Into Relational Algebra: Optimization, Semantics, And Equivalence Of Sql Queries (1985) - Ceri, Stefano, and Georg Gottlob.
- Grammar-Like Functional Rules For Representing Query Optimization Alternatives (1988) - Lohman, Guy M.
- Query Optimization By Predicate Move-Around (1994) - Levy, Alon Y., Inderpal Singh Mumick, and Yehoshua Sagiv.
- Eager Aggregation And Lazy Aggregation (1995) - Yan, Weipeng P., and Per-Åke Larson.
- Parameterized Queries And Nesting Equivalences (2000) - Galindo-Legaria, C. A.
- Cost-Based Query Transformation In Oracle (2006) - Ahmed, Rafi, et al.
- Using Semi-Joins To Solve Relational Queries (1981) - Bernstein, Philip A., and Dah-Ming W. Chiu.
- On Optimizing An Sql-Like Nested Query (1982) - Kim, Won.
- Optimization Of Nested Queries In A Distributed Relational Database (1984) - Lohman, Guy M., et al.
- Sql-Like And Quel-Like Correlation Queries With Aggregates Revisited (1984) - Kiessling, Werner.
- Translating Sql Into Relational Algebra: Optimization, Semantics, And Equivalence Of Sql Queries (1985) - Ceri, Stefano, and Georg Gottlob.
- Optimization Of Nested Sql Queries Revisited (1987) - Ganski, Richard A., and Harry KT Wong.
- A Unified Approach To Processing Queries That Contain Nested Subqueries, Aggregates, And Quantifiers (1987) - Dayal, Umeshwar.
- Orthogonal Optimization Of Subqueries And Aggregation (2001) - Galindo-Legaria, César, and Milind Joshi.
- Winmagic: Subquery Elimination Using Window Aggregation (2003) - Zuzarte, Calisto, et al.
- Execution Strategies For Sql Subqueries (2007) - Elhemali, Mostafa, et al.
- Enhanced Subquery Optimizations In Oracle (2009) - Bellamkonda, Srikanth, et al.
- Unnesting Arbitrary Queries (2015) - Neumann, Thomas, and Alfons Kemper.
- The Complete Story Of Joins (2017) - Neumann, Thomas, Viktor Leis, and Alfons Kemper.
- Fundamental Techniques For Order Optimization (1996) - Simmen, David, Eugene Shekita, and Timothy Malkemus.
- [Thesis] Exploiting Functional Dependence In Query Optimization (2000) - Paulley, Glenn Norman.
- An Efficient Framework For Order Optimization (2004) - Neumann, Thomas, and Guido Moerkotte.
- Incorporating Partitioning And Parallel Plans Into The Scope Optimizer (2010) - Zhou, Jingren, Per-Åke Larson, and Ronnie Chaiken.
- Accelerating Queries With Groupby And Join By Group Join (2011) - Moerkotte, Guido, and Thomas Neumann.
- Access Paths In The "Abe" Statistical Query Facility (1982) - Klug, Anthony.
- Extending The Algebraic Framework Of Query Processing To Handle Outerjoins (1984) - Rosenthal, A., and D. Reiner.
- Outerjoin Simplification And Reordering For Query Optimization (1993) - Galindo-Legaria C, Rosenthal A.
- Hypergraph Based Reorderings Of Outer Join Queries With Complex Predicates (1995) - Bhargava G, Goel P, Iyer B.
- Rapid Bushy Join-Order Optimization With Cartesian Products (1996) - Vance B, Maier D.
- Using Eels, A Practical Approach To Outerjoin And Antijoin Reordering (2001) - Rao J, Lindsay B, Lohman G, et al.
- Analysis Of Two Existing And One New Dynamic Programming Algorithm For The Generation Of Optimal Bushy Join Trees Without Cross Products (2006) - Moerkotte, Guido, and Thomas Neumann.
- Optimal Top-Down Join Enumeration (2007) - DeHaan D, Tompa F W.
- Dynamic Programming Strikes Back (2008) - Moerkotte, Guido, and Thomas Neumann.
- On The Correct And Complete Enumeration Of The Core Search Space (2013) - Moerkotte, Guido, Pit Fender, and Marius Eich.
- How Good Are Query Optimizers, Really? (2015) - Leis, Viktor, et al.
- Improving Join Reorderability With Compensation Operators (2018) - Wang, TaiNing, and Chee-Yong Chan.
- Adaptive Optimization Of Very Large Join Queries (2018) - Neumann, Thomas, and Bernhard Radke.
- Modelling Costs For A Mm-Dbms (1996) - Listgarten, Sherry, and Marie-Anne Neimat.
- Seeking The Truth About Ad Hoc Join Costs (1997) - Haas, Laura M., et al.
- Approximation Schemes For Many-Objective Query Optimization (2014) - Trummer, Immanuel, and Christoph Koch.
- Multi-Objective Parametric Query Optimization (2015) - Trummer, Immanuel, and Christoph Koch.
- Accurate Estimation Of The Number Of Tuples Satisfying A Condition (1984) - Piatetsky-Shapiro, Gregory, and Charles Connell.
- Optimal Histograms For Limiting Worst-Case Error Propagation In The Size Of Join Results (1993) - Ioannidis, Yannis E., and Stavros Christodoulakis.
- Universality Of Serial Histograms (1993) - Ioannidis, Yannis E.
- Balancing Histogram Optimality And Practicality For Query Result Size Estimation (1995) - Ioannidis, Yannis E., and Viswanath Poosala.
- Improved Histograms For Selectivity Estimation Of Range Predicates (1996) - Poosala, Viswanath, et al.
- The History Of Histograms (2003) - Ioannidis, Yannis.
- Automated Statistics Collection In Db2 Udb (2004) - Aboulnaga, Ashraf, et al.
- Adaptive Query Processing In The Looking Glass (2005) - Babu, Shivnath, and Pedro Bizarro.
- Optimizer Plan Change Management: Improved Stability And Performance In Oracle 11G (2008) - Ziauddin, Mohamed, et al.
- Histograms Reloaded: The Merits Of Bucket Diversity (2010) - Kanne, Carl-Christian, and Guido Moerkotte.
- Synopses For Massive Data: Samples, Histograms, Wavelets, Sketches (2011) - Cormode, Graham, et al.
- Exploiting Ordered Dictionaries To Efficiently Construct Histograms With Q-Error Guarantees In Sap Hana (2014) - Moerkotte, Guido, et al.
- Adaptive Statistics In Oracle 12C (2017) - Chakkappen, Sunil, et al.
- Probabilistic Counting Algorithms For Data Base Applications (1985) - Flajolet, Philippe, and G. Nigel Martin.
- Towards Estimation Error Guarantees For Distinct Values (2000) - Charikar, Moses, et al.
- Distinct Sampling For Highly-Accurate Answers To Distinct Values Queries And Event Reports (2001) - Gibbons, Phillip B.
- Leo – Db2’s Learning Optimizer (2001) - Stillger, Michael, et al.
- An Improved Data Stream Summary: The Count-Min Sketch And Its Applications (2005) - Cormode, Graham, and Shan Muthukrishnan.
- New Estimation Algorithms For Streaming Data: Count-Min Can Do More (2007) - Deng, Fan, and Davood Rafiei.
- Preventing Bad Plans By Bounding The Impact Of Cardinality Estimation Errors (2009) - Moerkotte, Guido, Thomas Neumann, and Gabriele Steidl.
- Pessimistic Cardinality Estimation: Tighter Upper Bounds For Intermediate Join Cardinalities (2019) - Cai, Walter, Magdalena Balazinska, and Dan Suciu.
- Deep Unsupervised Cardinality Estimation (2019) - Yang, Zongheng, et al.
- Neurocard: One Cardinality Estimator For All Tables (2020) - Yang, Zongheng, et al.
- Query Evaluation Techniques For Large Databases (1993) - Graefe G.
- Volcano - An Extensible And Parallel Query Evaluation System (1994) - Graefe, Goetz.
- Monetdb/X100: Hyper-Pipelining Query Execution (2005) - Boncz, Peter A., Marcin Zukowski, and Niels Nes.
- Efficiently Compiling Efficient Query Plans For Modern Hardware (2011) - Neumann, Thomas.
- Multi-Core, Main-Memory Joins: Sort Vs. Hash Revisited (2013) - Balkesen, Cagri, et al.
- Main-Memory Hash Joins On Modern Processor Architectures (2014) - Balkesen Ç, Teubner J, Alonso G, et al.
- Morsel-Driven Parallelism: A Numa-Aware Query Evaluation Framework For The Many-Core Age (2014) - Leis, Viktor, et al.
- Relaxed Operator Fusion For In-Memory Databases: Making Compilation, Vectorization, And Prefetching Work Together At Last (2017) - Menon, Prashanth, Todd C. Mowry, and Andrew Pavlo.
- Looking Ahead Makes Query Plans Robust (2017) - Zhu, Jianqiao, et al.
- Everything You Always Wanted To Know About Compiled And Vectorized Queries But Were Afraid To Ask (2018) - Kersten, Timo, et al.
- Adaptive Execution Of Compiled Queries (2018) - Kohn, André, Viktor Leis, and Thomas Neumann.
- Db2 Parallel Edition (1995) - Baru, Chaitanya K., et al.
- Parallel Sql Execution In Oracle 10G (2004) - Cruanes, Thierry, Benoit Dageville, and Bhaskar Ghosh.
- Query Optimization In Microsoft Sql Server Pdw (2012) - Shankar, Srinath, et al.
- Adaptive And Big Data Scale Parallel Execution In Oracle (2013) - Bellamkonda, Srikanth, et al.
- Optimizing Queries Over Partitioned Tables In Mpp Systems (2014) - Antova, Lyublena, et al.
- The 5 Minute Rule For Trading Memory For Disc Accesses And The 5 Byte Rule For Trading Memory For Cpu Time (1987) - Gray, Jim, and Franco Putzolu.
- The Five-Minute Rule Ten Years Later, And Other Computer Storage Rules Of Thumb (1997) - Gray, Jim, and Goetz Graefe.
- The Five Minute Rule 20 Years Later And How Flash Memory Changes The Rules (2008) - Graefe, Goetz.
- The Five Minute Rule Thirty Years Later And Its Impact On The Storage Hierarchy (2017) - Appuswamy, Raja, et al.
- The Ubiquitous B-Tree (1979) - Comer, Douglas.
- Principles Of Database Buffer Management (1984) - Effelsberg W, Haerder T.
- The Log-Structured Merge-Tree (Lsm-Tree) (1996) - O’Neil, Patrick, et al.
- A Comparison Of Fractal Trees To Log-Structured Merge (Lsm) Trees (2014) - Kuszmaul, Bradley C.
- Design Tradeoffs Of Data Access Methods (2016) - Athanassoulis, Manos, and Stratos Idreos.
- Designing Access Methods: The Rum Conjecture (2016) - Athanassoulis, Manos, et al.
- Wisckey: Separating Keys From Values In Ssd-Conscious Storage (2017) - Lu, Lanyue, et al.
- Managing Non-Volatile Memory In Database Systems (2018) - van Renen, Alexander, et al.
- Leanstore: In-Memory Data Management Beyond Main Memory (2018) - Leis, Viktor, et al.
- The Case For Learned Index Structures (2018) - Kraska, Tim, et al.
- SuRF: Practical Range Query Filtering With Fast Succinct Tries (2018) - Zhang, Huanchen, et al.
- Lsm-Based Storage Techniques: A Survey (2019) - Luo, Chen, and Michael J. Carey.
- Learning Multi-Dimensional Indexes (2019) - Nathan, Vikram, et al.
- Umbra: A Disk-Based System With In-Memory Performance (2020) - Neumann, Thomas, and Michael J. Freitag.
- Xindex: A Scalable Learned Index For Multicore Data Storage (2020) - Tang, Chuzhe, et al.
- The Pgm-Index: A Fully-Dynamic Compressed Learned Index With Provable Worst-Case Bounds (2020) - Ferragina, Paolo, and Giorgio Vinciguerra.
- From Wisckey To Bourbon: A Learned Index For Log-Structured Merge Trees (2020) - Dai, Yifan, et al.
- Caas-Lsm: Compaction-As-A-Service For Lsm-Based Key-Value Stores In Storage Disaggregated Infrastructure (2024) - Yu, Qiaolin, et al.
- The Notions Of Consistency And Predicate Locks In A Database System (1976) - Eswaran, Kapali P., et al.
- Concurrency Control In Distributed Database Systems (1981) - Bernstein, Philip A., and Nathan Goodman.
- On Optimistic Methods For Concurrency Control (1981) - Kung, Hsiang-Tsung, and John T. Robinson.
- Principles Of Transaction-Oriented Database Recovery (1983) - Haerder, Theo, and Andreas Reuter.
- Multiversion Concurrency Control - Theory And Algorithms (1983) - Bernstein, Philip A., and Nathan Goodman.
- Aries: A Transaction Recovery Method Supporting Fine-Granularity Locking And Partial Rollbacks Using Write-Ahead Logging (1992) - Mohan C, Haderle D, Lindsay B, et al.
- A Critique Of Ansi Sql Isolation Levels (1995) - Berenson, Hal, et al.
- Generalized Isolation Level Definitions (2000) - Adya, Atul, Barbara Liskov, and Patrick O'Neil.
- Large-Scale Incremental Processing Using Distributed Transactions And Notifications (2010) - Peng D, Dabek F.
- Serializable Snapshot Isolation In Postgresql (2012) - Ports, Dan RK, and Kevin Grittner.
- Calvin: Fast Distributed Transactions For Partitioned Database Systems (2012) - Thomson, Alexander, et al.
- Maat: Effective And Scalable Coordination Of Distributed Transactions In The Cloud (2014) - Mahmoud, Hatem A., et al.
- Staring Into The Abyss: An Evaluation Of Concurrency Control With One Thousand Cores (2014) - Yu, Xiangyao, et al.
- An Evaluation Of The Advantages And Disadvantages Of Deterministic Database Systems (2014) - Ren, Kun, Alexander Thomson, and Daniel J. Abadi.
- Fast Serializable Multi-Version Concurrency Control For Main-Memory Database Systems (2015) - Neumann, Thomas, Tobias Mühlbauer, and Alfons Kemper.
- An Empirical Evaluation Of In-Memory Multi-Version Concurrency Control (2017) - Wu, Yingjun, et al.
- An Evaluation Of Distributed Concurrency Control (2017) - Harding, Rachael, et al.
- Scalable Garbage Collection For In-Memory Mvcc Systems (2019) - Böttcher, Jan, et al.
- Automated Demand-Driven Resource Scaling In Relational Database-As-A-Service (2016) - Das, Sudipto, et al.
- Autoscaling Tiered Cloud Storage In Anna (2019) - Wu, Chenggang, Vikram Sreekanti, and Joseph M. Hellerstein.
- Adaptive Htap Through Elastic Resource Scheduling (2020) - Raza, Aunn, et al.
- Morphosys: Automatic Physical Design Metamorphosis For Distributed Database Systems (2020) - Abebe, Michael, Brad Glasbergen, and Khuzaima Daudjee.
- Tpc-H Analyzed: Hidden Messages And Lessons Learned From An Influential Benchmark (2013) - Boncz, Peter, Thomas Neumann, and Orri Erling.
- Quantifying Tpch Choke Points And Their Optimizations (2020) - Dreseler, Markus, et al.
- The End Of Slow Networks: It's Time For A Redesign (2015) - Binnig, Carsten, et al.
- Accelerating Relational Databases By Leveraging Remote Memory And Rdma (2016) - Li, Feng, et al.
- Don't Hold My Data Hostage: A Case For Client Protocol Redesign (2017) - Raasveldt, Mark, and Hannes Mühleisen.
- Testing The Accuracy Of Query Optimizers (2012) - Gu, Zhongxian, Mohamed A. Soliman, and Florian M.
- Automatic Sql Tuning In Oracle 10G (2004) - Dageville B, Das D, Dias K, et al.
- Automatic Performance Diagnosis And Tuning In Oracle (2005) - Dias K, Ramacher M, Shaft U, et al.
- A Deep Dive into Common Open Formats for Analytical DBMSs (2023) - Liu C, Pavlenko A, Interlandi M, et al.