Due 11:59 PM Friday, February 22
In this assignment, you will implement persistent B+ trees in Java. In this document, we explain
- how to fetch the release code from GitHub,
- how to test your code using the docker container
- what code you have to implement.
To avoid redownloading our maven dependencies every time we start our docker conatiner, we will start our docker container a little differently than in HW1. The first time you work on this homework run the following command:
docker run --name cs186hw2 -v <pathname-to-directory-on-your-machine>:/cs186 -it cs186/environment bash
The only difference in this command from the one we used in HW1 is that we give the container a name. Make sure you can access your shared drive as in HW1, then exit out of the container (just run exit
). Now startup your container again with the following command:
docker start cs186hw2
The only thing that should happen is the terminal should print cs186hw2. After running those 2 commands once, to open your container in the future the only command you need to run is:
docker exec -it cs186hw2 bash
While inside your container, navigate to the shared directory you created in the hw1 setup:
cd cs186
Clone the HW2 repo:
git clone https://github.com/berkeley-cs186/Sp19HW2.git
If you get an error like: Could not resolve host: github.com
, try restarting your docker machine (run docker-machine restart
after exiting the docker container), and if that doesn't work restart your entire computer.
Now, cd into the folder you just created:
cd Sp19HW2
To test everything downloaded correctly run:
mvn clean compile
To test your project implementation, run:
mvn clean test -D HW=2
If you haven't implemented anything, about 18 tests should error and the bottom of your output should look something like this:
If a few additional tests fail try running the command again. If the problem persists then make a post on piazza.
Before submitting your assignment you must run mvn clean test
and ensure it works in the docker container. We will not accept "the test ran in my IDE" as an excuse. You should be running the maven tests periodically as you work through the project.
You are free to use any text editor or IDE to complete the project, but we will build and test your code in the docker container with maven.
We recommend setting up a more local development environment by installing Java 8 (the version our Docker container runs) and using an IDE such as Eclipse or IntelliJ. Note that if you have another version of Java installed on your computer it is probably fine to run that as long as you don't use any Java 9/10 features, but be sure to run the maven tests in the docker container frequently to ensure your code will work on our setup. Here is a link to Java 8 downloads: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html. If you're having trouble installing Java, CS61B gives some pretty detailed instructions in Section A 'Configure Your Computer' here: https://sp18.datastructur.es/materials/lab/lab1setup/lab1setup (make sure to use Java 8 rather than Java 9). When setting up your IDE, make sure to import the project as a Maven project.
If you can import the code as a maven project in your IDE and run your unit tests successfully, you do not need to install maven on your local computer. Most IDEs should provide this functionality by default, in Eclipse for example, you can do this: File > import > maven > existing maven project. If your IDE does not, there should be a way to install a Maven plugin which will give you this functionality.
It bears repeating that even though you are free to complete the project in Eclipse or IntelliJ, we will build and test your code in the docker container with maven.
We very strongly recommend having access to (and knowing how to use!) a debugger for Java (this is available in IntelliJ, for example, by choosing Debug instead of Run) - it will save you much time in debugging your code, and is a good tool to be familiar with both for this course and for your future endeavors.
If you are using IntelliJ, and wish to run the same tests as mvn clean test -DHW=2
, open the appropriate
link and follow the given instructions:
Navigate to the Fa18HW2/src/main/java/edu/berkeley/cs186/database
directory. You
will find seven directories: common
, databox
, io
, table
, query
, concurrency
and index
.You do not have to deeply understand all of the code, but since all future programming assignments will reuse this code, it's worth becoming a little familiar with it. In this assignment, though, you may only modify files in
the index
directory.
The common
directory contains miscellaneous and generally useful bits of code
that are not particular to this assignment.
Like most DBMSs, the system we are working on in this assignment has its own type system, which is distinct from the type system of the programming language used to implement the DBMS. (Our DBMS doesn't quite provide SQL types either, though it's modeled on a simplified version of SQL types). In this homework, we'll need to write Java code to create and manipulate the DBMS types and any data we store.
The databox
directory contains the classes which represent the values
stored in a database, as well as their types. Specifically, the DataBox
class
represents values and the Type
class represents types. Here's an example:
DataBox x = new IntDataBox(42); // The integer value '42'.
Type t = Type.intType(); // The type 'int'.
Type xsType = x.type(); // Get x's type: Type.intType()
int y = x.getInt(); // Get x's value: 42
String s = x.getString(); // An exception is thrown.
The io
directory contains code that allows you to allocate, read, and write
pages to and from a file. All modifications to the pages of the file are
persisted to the file. The two main classes of this directory are
PageAllocator
which can be used to allocate pages in a file, and Page
which
represents pages in the file. Here's an example of how to persist data into a
file using a PageAllocator
:
// Create a page allocator which stores data in the file "foo.data". Setting
// wipe to true clears out any data that may have previously been in the file.
bool wipe = true;
PageAllocator allocator = new PageAllocator("foo.data", wipe);
// Allocate a page in the file. All pages are assigned a unique page number
// which can be used to fetch the page.
int pageNum = allocator.allocPage(); // The page number of the allocated page.
Page page = allocator.fetchPage(pageNum); // The page we just allocated.
System.out.println(pageNum); // 0. Page numbers are assigned 0, 1, 2, ...
// Write data into the page. All data written to the page is persisted in the
// file automatically.
ByteBuffer buf = page.getByteBuffer();
buf.putInt(42);
buf.putInt(9001);
And here's an example of how to read data that's been persisted to a file:
// Create a page allocator which stores data in the file "foo.data". Setting
// wipe to false means that this page allocator can read any data that was
// previously stored in "foo.data".
bool wipe = false;
PageAllocator allocator = new PageAllocator("foo.data", wipe);
// Fetch the page we previously allocated.
Page page = allocator.fetchPage(0);
// Read the data we previously wrote.
ByteBuffer buf = page.getByteBuffer();
int x = buf.getInt(); // 42
int y = buf.getInt(); // 9001
In future assignments, the table
directory will contain an implementation of
relational tables that store values of type DataBox
. For now, it only
contains a RecordId
class which uniquely identifies a record on a page by its
page number and entry number.
// The jth record on the ith page.
RecordId rid = new RecordId(i, (short) j);
The query
directory contains what are called query operators. These are operators that are applied to one or more tables, or other operators. They carry out their operation on their input operator(s) and return iterators over records that are the result of applying that specific operator. You don't need to worry about this directory for HW2.
The concurrency
directory contains classes to help achieve concurrency of the database. You don't need to worry about this directory for HW2.
We describe the index
directory in the next section.
The index
directory contains a partial implementation of a B+ tree
(BPlusTree
), an implementation that you will complete in this assignment.
Every B+ tree maps keys of type DataBox
to values of type RecordId
. A B+
tree is composed of inner nodes (InnerNode
) and leaf nodes (LeafNode
).
Every B+ tree is persisted to a file, and every inner node and leaf node is
stored on its own page.
In this assignment, do the following:
- Read through all of the code in the
index
directory. Many comments contain critical information on how you must implement certain functions. For example,BPlusNode::put
specifies how to redistribute entries after a split. You are responsible for reading these comments. If you do not obey the comments, you will lose points. Here are a few of the most notable points:- Our implementation of B+ trees does not support duplicate keys. You will throw an exception whenever a duplicate key is inserted.
- Our implementation of B+ trees assumes that inner nodes and leaf nodes can be serialized on a single page. You do not have to support nodes that span multiple pages.
- Our implementation of delete does not rebalance the tree. Thus, the
invariant that all non-root leaf nodes in a B+ tree of order
d
contain betweend
and2d
entries is broken. Note that actual B+ trees do rebalance after deletion, but we will not be implementing rebalancing trees in this project for the sake of simplicity.
- Implement the
LeafNode::fromBytes
function that reads aLeafNode
from a page. For information on how a leaf node is serialized, seeLeafNode::toBytes
. For an example on how to read a node from disk, seeInnerNode::fromBytes
. Note thattestToAndFromBytes
will not pass just from completing this function, you need to completeLeafNode.put
(question 3) as well. - Implement the
get
,getLeftmostLeaf
,put
,remove
, andbulkLoad
methods ofInnerNode
andLeafNode
. For information on what these methods do, refer to the comments inBPlusNode
. Don't forget to callsync
when implementingput
,remove
, andbulkLoad
; it's easy to forget. - Implement the
get
,scanAll
,scanGreaterEqual
,put
,remove
, andbulkLoad
methods ofBPlusTree
. In order to implementscanAll
andscanGreaterEqual
, you will have to complete theBPlusTreeIterator
class. IftestRandomPuts
fails because of a timeout exception raise theglobalTimeout
(line 37 inTestBPlusTree
). Anything under a minute is probably okay.
After this, you should pass all the tests we have provided to you (and any you add yourselves).
Note that you may not modify the signature of any methods or classes that we
provide to you (except BPlusTreeIterator
), but you're free to add helper
methods. Also, you may only modify code in the index
directory.
To submit, make sure your are in your Sp19HW2 folder. Then run:
python3 turn_in.py --student-id <your_studentid_here> --assignment hw2
but replace <your_studentid_here>
with your actual student ID (not your edx username!). This will generate a zip file called hw2.zip
which you should then upload to edx.
All of our public tests for this project are in the edu.berkeley.cs186.database.index
package in the src/test/java
folder. These are the tests that maven clean test
runs. 60% of your grade will be made up of these public tests and 40% will be made up of hidden tests on our autograder.