GAE-Data-Flow

Foreword

This is a MapReduce-style script that filters data in a large wine dataset. It is designed primarily to run on Google Cloud, using either the direct runner or the Dataflow runner.

How to run

First, activate the virtual environment.

To run it directly in the terminal:

python main.py --input ${INPUT_PATH} --output ${OUTPUT_PATH} --runner ${RUNNER} --project ${PROJECT_ID} --temp_location ${TEMP} --${MODE} [--variety ${VARIETY}]

INPUT_PATH: path of the input file
OUTPUT_PATH: path of the output file
RUNNER: how to run the code, with the direct runner or the Dataflow runner
PROJECT_ID: the project ID, e.g. chen-zeyu-wine-dataflow
TEMP: path for temporary output files
MODE: the analysis to run: bottles_sold, dollars_sold, etc.
VARIETY: optional, selects the wine variety; not valid with the purchased_together mode
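The flags above could be parsed along the following lines. This is only a sketch, not the actual contents of main.py: the function names `build_parser` and `parse_args` are hypothetical, the `MODES` list contains only the modes named in this README (with `dollars_sold` assumed to be the flag spelling), and the rule that `--variety` is rejected together with `--purchased_together` is inferred from the parameter description.

```python
import argparse

# Assumption: only the modes named in this README; the real script has more.
MODES = ["bottles_sold", "dollars_sold", "purchased_together"]


def build_parser():
    """Build an illustrative parser for the documented command line."""
    parser = argparse.ArgumentParser(description="Wine data pipeline (illustrative)")
    parser.add_argument("--input", required=True, help="path of the input file")
    parser.add_argument("--output", required=True, help="path of the output file")
    parser.add_argument("--runner", help="direct runner or Dataflow runner")
    parser.add_argument("--project", help="project ID")
    parser.add_argument("--temp_location", help="path for temporary output files")
    parser.add_argument("--variety", help="optional wine variety")
    # Exactly one mode flag (--bottles_sold, --dollars_sold, ...) must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    for name in MODES:
        mode.add_argument("--" + name, dest="mode", action="store_const", const=name)
    return parser


def parse_args(argv):
    """Parse argv and reject --variety in purchased_together mode."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.mode == "purchased_together" and args.variety:
        parser.error("--variety cannot be combined with --purchased_together")
    return args
```

Modelling the modes as a mutually exclusive group means a call with no mode, or with two modes at once, fails early with a usage message instead of starting a pipeline.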

To run it with the provided scripts, run the following from the command line.

Local:

chmod +x localrun.sh
./localrun.sh

Remote:

chmod +x remoterun.sh
./remoterun.sh

Each script runs all 9 parts.

To change the output directory or the variety, edit the corresponding variables in the script:

OUTPUT="your preference"
VARIETY="your preference"
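For readers who prefer Python to shell, the run-everything behaviour could be sketched as below. This does not reproduce localrun.sh or run_all_pipelines.py: the helper names `build_commands` and `run_all` are hypothetical, `MODES` lists only the modes named in this README rather than all 9 parts, and `DirectRunner` is assumed as the value for a local run.

```python
import subprocess
import sys

# Mirror the variables the shell scripts expose (values here are placeholders).
OUTPUT = "output"
VARIETY = ""  # empty string means no variety filter

# Assumption: only the modes named in this README, not the full set of 9.
MODES = ["bottles_sold", "dollars_sold", "purchased_together"]


def build_commands(input_path, output_dir=OUTPUT, variety=VARIETY, modes=MODES):
    """Return one main.py command (as an argv list) per mode."""
    commands = []
    for mode in modes:
        cmd = [
            sys.executable, "main.py",
            "--input", input_path,
            "--output", f"{output_dir}/{mode}",
            "--runner", "DirectRunner",  # assumed value for local runs
            "--" + mode,
        ]
        # The variety option does not apply to purchased_together.
        if variety and mode != "purchased_together":
            cmd += ["--variety", variety]
        commands.append(cmd)
    return commands


def run_all(input_path, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in build_commands(input_path):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run_all("test.csv")` only prints the commands; passing `dry_run=False` would execute them one after another and stop at the first failure. A remote run would additionally need the `--project` and `--temp_location` options.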
