Scalable algorithms for data mining. (I am shifting this project to feluca and refactoring it there, so this project is deprecated.)
dami is written in Java. Our goal is to build algorithms that can handle hundreds of millions of records on a PC with limited memory.
Currently we have:
- utility: buffered vector pool for dataset IO; a high-performance, simple text parser (more tests needed)
- classification: SGD for logistic regression
- recommendation: SlopeOne, SVD, RSVD, item-neighborhood SVD (see movielens_converter.py)
- significance testing: swap randomization
- graph: PageRank (a minimal sketch follows this list)
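Since the list above only names the algorithms, here is a minimal power-iteration PageRank in plain Java as a taste of what the graph module computes. This is an illustrative sketch, not dami's implementation, and all names in it are hypothetical:

```java
import java.util.Arrays;

/**
 * Minimal power-iteration PageRank over an adjacency-list graph.
 * Illustrative sketch only; not dami's actual code.
 */
public class PageRankSketch {
    public static double[] pagerank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int iter = 0; iter < iterations; iter++) {
            double[] next = new double[n];
            double danglingMass = 0.0; // rank held by nodes with no out-links
            for (int u = 0; u < n; u++) {
                if (outLinks[u].length == 0) {
                    danglingMass += rank[u];
                } else {
                    double share = rank[u] / outLinks[u].length;
                    for (int v : outLinks[u]) next[v] += share;
                }
            }
            // Teleport term plus an even redistribution of dangling mass.
            for (int v = 0; v < n; v++) {
                next[v] = (1.0 - damping) / n + damping * (next[v] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny 3-node example: 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1.
        int[][] graph = { {1}, {2}, {0, 1} };
        System.out.println(Arrays.toString(pagerank(graph, 0.85, 50)));
    }
}
```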
Future:
- similarity: SimHash
2012/10/22 Release Notes:
- L1- and L2-regularized logistic regression (update rule sketched below)
- memory cost estimation
- simple command-line integration for LR
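Since the release notes only name the feature, here is a minimal sketch of one SGD step for L2-regularized logistic regression on a sparse example. The names below (weights, eta, lambda) are my assumptions, not dami's API, and an L1 variant would instead truncate weights toward zero:

```java
/**
 * One SGD step for L2-regularized logistic regression on a sparse example.
 * Illustrative sketch only; not dami's actual update code.
 */
public class LrSgdSketch {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Updates weights in place given a sparse example (ids, values, label in {0,1}). */
    static void sgdStep(double[] weights, int[] ids, double[] vals, int label,
                        double eta, double lambda) {
        double z = 0.0;
        for (int k = 0; k < ids.length; k++) z += weights[ids[k]] * vals[k];
        double grad = sigmoid(z) - label;  // dLoss/dz for log loss
        for (int k = 0; k < ids.length; k++) {
            int j = ids[k];
            // The lambda term is the L2 penalty; L1 would truncate instead.
            weights[j] -= eta * (grad * vals[k] + lambda * weights[j]);
        }
    }

    public static void main(String[] args) {
        double[] w = new double[8];
        // One positive example with features 1:1.0 and 3:0.5.
        sgdStep(w, new int[]{1, 3}, new double[]{1.0, 0.5}, 1, 0.1, 1e-4);
        System.out.println(java.util.Arrays.toString(w));
    }
}
```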
2012/7/22 Release Notes:
- Asynchronous vector buffer for dataset IO
- High-performance, simple text parser (handles digit-related characters only; see the sketch below)
- Small refactoring
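To show why restricting the parser to digit-related characters makes it cheap, here is a hand-rolled sketch that turns lines like "3:0.5 7:1.25" into parallel arrays without String.split or parseFloat allocations. The input format and all names are my assumptions, not dami's actual parser:

```java
/**
 * Minimal parser for "id:value" pairs built from digits, '.', ':' and ' '.
 * A sketch of the idea, not dami's parser.
 */
public class DigitParserSketch {
    /** Parses pairs from a char array into parallel arrays; returns the pair count. */
    static int parse(char[] line, int len, int[] ids, double[] vals) {
        int count = 0, i = 0;
        while (i < len) {
            int id = 0;
            while (i < len && line[i] != ':') id = id * 10 + (line[i++] - '0');
            i++; // skip ':'
            double value = 0.0, scale = 1.0;
            boolean fraction = false;
            while (i < len && line[i] != ' ') {
                char c = line[i++];
                if (c == '.') { fraction = true; continue; }
                value = value * 10 + (c - '0');
                if (fraction) scale *= 10;
            }
            i++; // skip ' '
            ids[count] = id;
            vals[count++] = value / scale;
        }
        return count;
    }

    public static void main(String[] args) {
        char[] line = "3:0.5 7:1.25".toCharArray();
        int[] ids = new int[16];
        double[] vals = new double[16];
        int n = parse(line, line.length, ids, vals);
        for (int k = 0; k < n; k++) System.out.println(ids[k] + " -> " + vals[k]);
    }
}
```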
2012/7/12 Release Notes:
- Code refactoring for recommendation and IO
- To compute RMSE for recommendation, first see movielens_convert.py for converting and/or splitting the movielens data, then see CFDataConverter and TestSVD. (A reference RMSE computation is sketched below.)
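For reference, RMSE is the standard root-mean-square error between predicted and actual ratings; below is a self-contained Java sketch of the definition, not code taken from TestSVD:

```java
/**
 * Standard RMSE over parallel arrays of predicted and actual ratings.
 * Reference sketch; TestSVD's evaluation may differ in details.
 */
public class RmseSketch {
    static double rmse(double[] predicted, double[] actual) {
        double sumSq = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sumSq += diff * diff;
        }
        return Math.sqrt(sumSq / predicted.length);
    }

    public static void main(String[] args) {
        // Two test ratings: prints sqrt(0.25 / 2) ~ 0.3536.
        System.out.println(rmse(new double[]{3.5, 4.0}, new double[]{4.0, 4.0}));
    }
}
```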
To achieve computational efficiency and good memory utilization, we adopt two techniques:
1. Using the "id" directly as an array index when fetching data.
2. Keeping only the model in memory, while the data itself is converted to raw bytes for IO.
So it is highly recommended that you use contiguous ids with these algorithms :)
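A compressed illustration of both points, with names of my own choosing (nothing below is dami's actual code): the model lives in an id-indexed array, and examples round-trip through a compact binary encoding instead of text.

```java
import java.io.*;

/**
 * Sketch of the two techniques above: the model is a plain array indexed
 * by id, and examples are stored as raw bytes instead of text for fast IO.
 */
public class IdIndexedIoSketch {
    public static void main(String[] args) throws IOException {
        float[] model = new float[1000]; // model[id] = weight; ids must be small and contiguous

        // Write one sparse example (id:value pairs) as raw bytes.
        File file = File.createTempFile("dami-example", ".bin");
        file.deleteOnExit();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(2);               // number of pairs
            out.writeInt(3); out.writeFloat(0.5f);
            out.writeInt(7); out.writeFloat(1.25f);
        }

        // Read it back; each id indexes straight into the model array.
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int pairs = in.readInt();
            for (int k = 0; k < pairs; k++) {
                int id = in.readInt();
                float value = in.readFloat();
                model[id] += 0.1f * value; // e.g. one toy update step
            }
        }
        System.out.println(model[3] + " " + model[7]);
    }
}
```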
My Chinese blog: http://blog.csdn.net/lgnlgn
E-mail: gnliang10 [at] 126.com