
Roadmap


Our roadmap is driven by our user community. Below, in prioritized order, is the current plan for Phoenix:

  1. Hash Joins. Provide the ability to join multiple tables through a phased approach (sketched below):
  • Equi-join. Support left, right, inner, and outer equi-joins where one side of the join is small enough to fit into memory. Available in master branch
  • Semi/anti-join. Support correlated subqueries for EXISTS and IN where one side of the join is small enough to fit into memory.
  2. Multi-tenant Tables. Allows the creation of multiple tables from a base table on the same physical HBase table. Available in master branch
  3. Sequences. Support the atomic increment of sequence values through the CREATE SEQUENCE and NEXT VALUE FOR statements (sketched below).
  4. Type Enhancements. Additional work includes support for a DEFAULT declaration when creating a table, and for the ARRAY, STRUCT, and JSON data types (sketched below).
  5. Third Party Integration. There are a number of open source projects for which interoperability with Phoenix could be added or improved:
  • Flume sink. Support a Flume sink that writes Phoenix-compliant HBase data. Available in master branch
  • Hue integration. Add Phoenix as an HBase service layer in Hue.
  • Pentaho Mondrian support. Allow Phoenix to be used as the JDBC driver for Pentaho Mondrian. This effort is pretty far along already, with the Pentaho FoodMart demo running through Phoenix now
  • Clean up Pig support. Consolidate the functions we use across MapReduce and Pig processing. We should also upgrade our pom to reference the 0.12 version of Pig and map our DECIMAL type to their new decimal type.
  • Improve MapReduce integration. It's possible that we could provide a processing model where the map and reduce functions can invoke Phoenix queries (though this needs some more thought).
  6. Derived Tables. Allow a SELECT statement to be used in the FROM clause to define a derived table (sketched below). This would include support for pipelining queries when necessary.
  7. Functional Indexes. Enables an index to contain the evaluation of an expression as opposed to just a column value (sketched below).
  8. Monitoring and Management. Provide visibility into CPU, physical I/O, logical I/O, wait time, blocking time, and transmission time spent for each thread of execution across the HBase cluster, within coprocessors, and within the client-side thread pools for each query. On top of this, we should expose things like active sessions and currently running queries. The EXPLAIN PLAN gives an idea of how a query will be executed, but we need more information to help users debug and tune their queries.
  9. Port to HBase 0.96. Currently Phoenix only works on the 0.94 branch of HBase. The latest branch of HBase is now 0.96, which has many breaking, non-backward-compatible changes (for example, requiring that endpoint coprocessors use protobufs). Ideally, we should create a shim that allows Phoenix to work with both 0.94 and 0.96, but barring that, we should have a branch of Phoenix that works under 0.96. Additional work includes replacing our type system with the new HBase type system in 0.96, but that would be significantly more work.
  10. Security Features. A number of existing HBase security features in 0.94 could be leveraged now, and new security features being added in 0.98 could be leveraged in the future.
  11. Cost-based Optimizer. Once secondary indexing and joins are implemented, we'll need to collect and maintain stats and drive query optimization decisions based on them to produce the most efficient query plan.
  12. Query over Multiple Row Versions. Expose the time dimension of rows through a built-in function to allow aggregation and trending over multiple row versions.
  13. Parent/child Join. Unlike standard relational databases, HBase gives you the flexibility of dynamically creating as many key values in a row as you'd like. Phoenix could leverage this by providing a way to model child rows inside a parent row. The child row would be composed of the set of key values whose column qualifier is prefixed with a known name and appended with the primary key of the child row. Phoenix could hide all this complexity and allow querying over the nested children through joining to the parent row. Essentially, this would be an optimization of the general join case, but it could support cases where both sides of the join are bigger than would fit into memory.
  14. OLAP Extensions. Support window functionality such as WINDOW, OVER (PARTITION BY ...), and RANK (sketched below).
  15. Table Sampling. Support the TABLESAMPLE clause by implementing a filter that uses the guideposts established by stats gathering to return only n rows per region (sketched below).
  16. Nested-loop Join. Support joins where both sides are too big to fit into memory. As projects like Apache Drill progress, the need for this may lessen, since these systems will be able to decompose the query and perform the join efficiently without Phoenix needing to do so, as described here.
  17. Schema Evolution. Phoenix supports adding and removing columns through the [ALTER TABLE](http://forcedotcom.github.com/phoenix/index.html#alter_table) DDL command (sketched below), but changing the data type of, or renaming, an existing column is not yet supported.
  18. Transactions. Support transactions by integrating a system, such as OMID, that controls timestamps. For some ideas on how this might be done, see here.
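
As a rough illustration of the Hash Joins item, here is what an equi-join and a correlated semi-join look like in standard SQL; the customer and orders tables and their columns are made up for the example.

```sql
-- Equi-join: the smaller side is loaded into an in-memory hash table and
-- probed while scanning the larger side.
SELECT c.name, o.total
FROM orders o
INNER JOIN customer c ON o.customer_id = c.id;

-- Semi-join expressed as a correlated EXISTS subquery.
SELECT c.name
FROM customer c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
```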
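
A minimal sketch of the Sequences item, using the CREATE SEQUENCE and NEXT VALUE FOR syntax the item names; the table and sequence names are hypothetical, and the final Phoenix grammar may differ.

```sql
CREATE SEQUENCE order_id_seq START WITH 1 INCREMENT BY 1;

-- Phoenix writes rows with UPSERT; NEXT VALUE FOR would atomically
-- hand out the next value of the sequence.
UPSERT INTO orders (id, customer_id, total)
VALUES (NEXT VALUE FOR order_id_seq, 1001, 99.50);
```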
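
For the Type Enhancements item, a purely hypothetical CREATE TABLE showing where a DEFAULT declaration and an ARRAY column would fit; none of this syntax is final.

```sql
CREATE TABLE event (
    id     BIGINT NOT NULL PRIMARY KEY,
    status VARCHAR DEFAULT 'NEW',  -- DEFAULT declaration (planned)
    tags   VARCHAR ARRAY           -- ARRAY type (planned)
);
```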
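
The Derived Tables item refers to the standard pattern of a SELECT nested inside the FROM clause, as in this sketch with made-up table and column names.

```sql
-- The outer query runs over the result of the inner aggregate.
SELECT t.customer_id, t.order_count
FROM (SELECT customer_id, COUNT(*) AS order_count
      FROM orders
      GROUP BY customer_id) AS t
WHERE t.order_count > 10;
```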
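
A sketch of the Functional Indexes item, indexing an expression rather than a raw column; the syntax is illustrative rather than confirmed Phoenix grammar.

```sql
-- Index on UPPER(name) so case-insensitive lookups can be served from the index.
CREATE INDEX upper_name_idx ON customer (UPPER(name));

SELECT id FROM customer WHERE UPPER(name) = 'ACME';
```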
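
The window functionality in the OLAP Extensions item corresponds to standard SQL of this shape (illustrative table and columns).

```sql
-- Rank each customer's orders by total, largest first.
SELECT customer_id,
       total,
       RANK() OVER (PARTITION BY customer_id ORDER BY total DESC) AS order_rank
FROM orders;
```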
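
The Table Sampling item would accept something along the lines of the SQL TABLESAMPLE clause; the percentage form below is one common variant, not confirmed Phoenix syntax.

```sql
-- Return roughly 10% of rows, implemented as a filter driven by stats guideposts.
SELECT * FROM orders TABLESAMPLE (10);
```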
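
For the Schema Evolution item, the part that already works is adding and removing columns via ALTER TABLE, roughly as below (made-up column name); changing a column's type or name is the piece that remains.

```sql
ALTER TABLE orders ADD ship_date DATE;
ALTER TABLE orders DROP COLUMN ship_date;
```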