The given JSON file contains 20K articles (1 object per line). Each article contains set of attributes with one set value.
Calculate similarity of articles identified by SKU (product identifier) based on their attributes values. The number of matching attributes is the most important metric for defining similarity. In case of a draw, attributes with a name higher in alphabet (a is higher than z) is weighted heavier.
Example 1: {"sku":"sku-1","attributes": {"att-a": "a1", "att-b": "b1", "att-c": "c1"}} is more similar to {"sku":"sku-2","attributes": {"att-a": "a2", "att-b": "b1", "att-c": "c1"}} than to {"sku":"sku-3","attributes": {"att-a": "a1", "att-b": "b3", "att-c": "c3"}}
Example 2: {"sku":"sku-1","attributes":{"att-a": "a1", "att-b": "b1"}} is more similar to {"sku":"sku-2","attributes":{"att-a": "a1", "att-b": "b2"}} than to {"sku":"sku-3","attributes":{"att-a": "a2", "att-b": "b1"}}
Make sure that you have installed at least Python 3.8 on your machine
1. Please go to the task's home directory, and for launching the program, it's best to create a virtual environment as follows
$ python3 -m venv task
$ source task/bin/activate
$ pip3 install -r requirements.txt
$ pytest tests/test_recommendation_engine.py
4. Now run the main.py script by typing the command below and passing necessary keyword parameters (--sku_name, --json_file, -num) for example recommendation request
$ python3 main.py --sku_name="sku-222" --json_file="test-data.json" --num=100
After, when the venv is not needed anymore, the command for deactivating the virtual environment is shown below
$ deactivate