GitHub - joshterrell805/Spark_Count_Unique_Words: Count the unique words in a text document using Spark and Java 8

This repo is an example of how to count the unique words in a text document using Spark and Java 8.

This example differs from most Spark + Java examples in that it uses lambdas exclusively to specify Spark's map and reduce functions, which makes the code much more concise.
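To illustrate the lambda-based style, here is a minimal sketch of the same map-then-reduce word count. It is not the code from Main.java: it uses plain `java.util.stream` instead of Spark so that it runs without the Spark dependency, and the class and method names (`WordCountSketch`, `countWords`) are hypothetical. Splitting on whitespace and lower-casing are also assumptions made for the illustration. In Spark, the analogous steps would typically be expressed with `flatMap`, `mapToPair`, and `reduceByKey` on an RDD.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of this repository.
class WordCountSketch {

    // Counts how often each word occurs in the given text.
    // Assumption: words are separated by whitespace and compared case-insensitively.
    static Map<String, Long> countWords(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())       // drop the empty token produced by blank input
                .map(word -> word.toLowerCase())       // "map" step: normalize each word
                .collect(Collectors.toMap(
                        word -> word,                  // key: the word itself
                        word -> 1L,                    // value: one occurrence
                        (a, b) -> a + b));             // "reduce" step: sum counts per word
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords("the quick fox and the lazy dog");
        counts.forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

Each stage is a one-line lambda, which is the conciseness the paragraph above refers to; the pre-Java-8 equivalent would need an anonymous class per function.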

The run method of Main.java contains almost all of the Spark code (the JavaSparkContext is created in Beans.xml).

Execution

To build this example, use Maven in your IDE or on the command line.

To run, supply the path to the file whose words you want to count. The --limit parameter is optional and lets you specify how many of the most frequent words are printed.
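As a rough sketch of what a limit on the most frequent words means, the snippet below sorts word counts in descending order and keeps the first `limit` entries. It is written with plain Java streams, not Spark, and is not taken from this repository: the class and method names (`TopWordsSketch`, `topWords`) are hypothetical, and breaking ties alphabetically is an assumption made only to keep the output deterministic.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of this repository.
class TopWordsSketch {

    // Returns up to `limit` entries with the highest counts, most frequent first.
    // Assumption: ties are broken alphabetically by word.
    static List<Map.Entry<String, Long>> topWords(Map<String, Long> counts, int limit) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("the", 3L);
        counts.put("dog", 2L);
        counts.put("fox", 1L);

        // Print only the two most frequent words.
        topWords(counts, 2).forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}
```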

License

MIT
