- structure it like Nathan Marz: http://nathanmarz.com/blog/cascalog-presentation-at-bay-area-clojure-user-group.html
- collaborative filtering
- sample data?
Hadoop, large data, MapReduce
- Distributed Filesystem
- MapReduce Framework
- Scales (1000s of machines, petabytes)
Simple Example: word count
- http://wiki.apache.org/hadoop/WordCount
- http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
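A minimal in-memory sketch of the word count idea, for readers who have not seen the map/shuffle/reduce split before. This is not the Hadoop API and not Cascalog; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are made up for illustration, and everything runs in a single process instead of across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from any real data set.
    sample = ["hello hadoop", "hello cascalog"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer would run on many machines and the shuffle would move data between them; the division of work into the three steps is what carries over.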
“Cascalog is a fully-featured data processing and querying library for Clojure. The main use cases for Cascalog are processing “Big Data” on top of Hadoop or doing analysis on your local computer from the Clojure REPL. Cascalog is a replacement for tools like Pig, Hive, and Cascading.
Cascalog operates at a significantly higher level of abstraction than a tool like SQL. More importantly, its tight integration with Clojure gives you the power to use abstraction and composition techniques with your data processing code just like you would with any other code. It’s this latter point that sets Cascalog far above any other tool in terms of expressive power.”
- high level abstractions
- tight integration with Clojure
http://feedproxy.google.com/~r/sam-ritchie/~3/GShXHgJNJgk/cascalog-testing-20.html
Thanks!
- https://github.com/nathanmarz/cascalog
- http://nathanmarz.com/blog
- http://nathanmarz.com/blog/introducing-cascalog-a-clojure-based-query-language-for-hado.html
- http://nathanmarz.com/blog/new-cascalog-features-outer-joins-combiners-sorting-and-more.html
- http://nathanmarz.com/blog/cascalog-presentation-at-bay-area-clojure-user-group.html
- http://nathanmarz.com/blog/news-feed-in-38-lines-of-code-using-cascalog.html
- http://sna-projects.com/blog/2010/11/clojure-at-backtype
- http://tech.backtype.com/52456836 (Why Yieldbot chose Cascalog over Pig for Hadoop processing)
- http://tech.backtype.com/videos-from-the-may-hadoop-meet-up
- http://blog.yieldbot.com/using-lucene-and-cascalog-for-fast-text-proce
- http://blog.factual.com/clojure-on-hadoop-a-new-hope
- http://blip.tv/clojure/nathan-marz-cascalog-making-data-processing-fun-again-5970118