SBLAJ

SBLAJ -- Sparse Basic Linear Algebra for the JVM -- is a library that provides standard operations on sparse matrices, with a focus on operations required for Machine Learning. The core goals are:

Extremeley memory efficient for large, sparse matrices
- Optimized storage for boolean data
- Handles > billions of entries
Clean API for common matrix operations
Cache-friendly storage
Fast -- competitive with application specific code
Compact data formats -- small on disk and fast to load

The extended packages also provide:

Implementations of standard machine learning algorithms that are designed for large, sparse data, including
- Naive Bayes
- Stochastic Gradient Descent
- LDA
Utilities to get data into the Sparse Matrix Formats
Compatability with Hadoop and Spark, for distributed computing

Getting Started

SBLAJ can be built using sbt. Simply check out the source and run

sbt package

to create a usable jar. You can try out the code by running the units tests from within sbt, either with test to run all the tests, or eg. ~test-only org.sblaj.BaseSparseBinaryVectorTest to repeatedly run one set of tests as you modify the code

Current Status

SBLAJ is still pre-0.1. Some basic ideas are in place, but the API is still in flux. It is actively developed, and the intention is to be a well-tested, robust library, not simply a toy demonstrating one idea. Contributions and ideas from others would be greatly appreciated!

Roadmap

v0.1

SparseBinaryRowMatrix -- iterators & helper functions
NaiveBayes
Matrix Creation Utils
- Clean api to featurize, creating header, matrix, and dictionary
- transform "hashed" to "enumerated" features

future (rough prioritization)

SGD
Hadoop reader / writer
Matrix Market Import & Export
Spark API
Timing Comparisons
LDA
Integration with Scalala
Std filters & subsets, eg. min occurrence, min correlation, etc.
Additional Matrix Types
- Float, Column
Varint encoded matrices
Views -- api to index sub-portions of matrix, with lazy & force version
Mixed-mode matrices
- dense float + sparse binary (eg., time series)
HMM implementation

Name		Name	Last commit message	Last commit date
Latest commit History 200 Commits
.idea		.idea
boxwood/src		boxwood/src
core/src		core/src
ml/src		ml/src
project		project
spark/src		spark/src
.gitignore		.gitignore
README.md		README.md
sblaj.iml		sblaj.iml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SBLAJ

Getting Started

Current Status

Roadmap

v0.1

future (rough prioritization)

About

Releases

Packages

Contributors 2

Languages

squito/sblaj

Folders and files

Latest commit

History

Repository files navigation

SBLAJ

Getting Started

Current Status

Roadmap

v0.1

future (rough prioritization)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages