Introduction to Cascalog

1 Ideas

Structure it like Nathan Marz's presentation: http://nathanmarz.com/blog/cascalog-presentation-at-bay-area-clojure-user-group.html
collaborative filtering
- sample data?

2 Hadoop

2.1 A Lot of Data!

Hadoop, large data, MapReduce

2.2 What is Hadoop

Distributed Filesystem
MapReduce Framework
Scales (1000s of machines, petabytes of data)

3 Programming Hadoop

Simple Example: word count

3.1 MapReduce

http://wiki.apache.org/hadoop/WordCount
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
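
To make the word count example concrete before comparing tools, here is a minimal single-process sketch of the MapReduce model in plain Python. It is not the Hadoop API and not the code from the links above; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are made up for illustration. It only mimics the three steps a MapReduce job goes through: map each line to (word, 1) pairs, group the pairs by word, then sum each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the talk.
    sample = ["hello hadoop", "hello cascalog"]
    print(word_count(sample))
```

In a real Hadoop job the mapper and reducer run on different machines and the framework handles the grouping; this sketch only shows the shape of the computation that the Pig, Cascading, and Cascalog versions express at higher levels.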

3.2 Pig

http://en.wikipedia.org/wiki/Pig_(programming_language)

3.3 Cascading

https://github.com/cwensel/cascading.samples/tree/master/wordcount/

4 Cascalog

4.1 What’s Cascalog?

“Cascalog is a fully-featured data processing and querying library for Clojure. The main use cases for Cascalog are processing ‘Big Data’ on top of Hadoop or doing analysis on your local computer from the Clojure REPL. Cascalog is a replacement for tools like Pig, Hive, and Cascading.

Cascalog operates at a significantly higher level of abstraction than a tool like SQL. More importantly, its tight integration with Clojure gives you the power to use abstraction and composition techniques with your data processing code just like you would with any other code. It’s this latter point that sets Cascalog far above any other tool in terms of expressive power.”

4.2 Features

high level abstractions
tight integration with Clojure

5 Programming Cascalog

5.1 word count sequel

5.2 Demo time!

5.3 Testing

http://feedproxy.google.com/~r/sam-ritchie/~3/GShXHgJNJgk/cascalog-testing-20.html

6 Finally

6.1 Questions?

Thanks!

7 Appendix

7.1 Resources

https://github.com/nathanmarz/cascalog
http://nathanmarz.com/blog
http://sna-projects.com/blog/2010/11/clojure-at-backtype
http://tech.backtype.com/52456836 (Why Yieldbot chose cascalog over Pig for hadoop processing)
http://tech.backtype.com/videos-from-the-may-hadoop-meet-up
http://blog.yieldbot.com/using-lucene-and-cascalog-for-fast-text-proce
http://blog.factual.com/clojure-on-hadoop-a-new-hope
http://blip.tv/clojure/nathan-marz-cascalog-making-data-processing-fun-again-5970118
