Why example/lstm/lstm_worker does not call init_shared_params? #93

Open
cyxtj opened this issue May 10, 2017 · 1 comment

cyxtj commented May 10, 2017

In README.md, under Usage / Implementing the workers, the third rule says:

After the model has been built and the parameters initialized, initialize the central parameters by calling the Worker's init_shared_params() method. Every worker should call this method.
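
(As I understand that rule, every worker is supposed to do something like the following sketch. This is my own illustration, not code from the repository; the import paths follow the older Platoon examples, and build_model is a placeholder.)

    # Sketch of the param_sync flow described by rule 3 (assumption, not the
    # actual example code; import paths and constructor arguments may differ
    # between Platoon versions).
    from platoon.channel import Worker
    from platoon.param_sync import EASGD

    worker = Worker(control_port=5567)

    # Build the Theano graph; tparams is an OrderedDict of shared variables.
    tparams = build_model()  # placeholder for the model-building code

    # Every worker registers its parameters as the central shared params:
    worker.init_shared_params(list(tparams.values()), param_sync_rule=EASGD(0.5))

    # Later, inside the training loop, each worker periodically synchronizes
    # with the central params, e.g.:
    # worker.sync_params(synchronous=True)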

But in example/lstm/lstm_worker, lines 517-528, the code is:

    list_tparams = list(tparams.values())
    if param_sync_api:
        print("Using param_sync worker's interface!")
        worker.init_shared_params(list_tparams, param_sync_rule=EASGD(0.5))
    else:
        print("Using all_reduce worker's interface!")
        from platoon.training import global_dynamics as gd
        cparams = init_tparams(params)
        list_cparams = list(cparams.values())
        easgd = gd.EASGD(worker)
        easgd.make_rule(list_tparams, list_cparams, 0.5)

It calls init_shared_params() only when param_sync_api is set, but by default param_sync_api=None, and I did not find the EASGD from global_dynamics calling init_shared_params either.
Is the program wrong? Or are there other rules that are not stated?
@tsirif


cshanbo commented May 11, 2017

I chatted briefly with Christos @tsirif about Platoon a couple of days ago; I hope my understanding helps.

In the all_reduce API, the update algorithm lives in global_dynamics. The updates are performed by all_reduce itself, so it does not need init_shared_params the way the param_sync API does. (Look into the synchronous lstm example to see how it works.)
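
(Concretely, continuing the snippet quoted above, my understanding of the difference in the training loop is roughly the sketch below. The easgd() call reflects how I read global_dynamics, where make_rule() compiles the update and the rule object is then invoked directly; the synchronous lstm example is the authoritative reference.)

    # Sketch only (my reading of the two interfaces, reusing the variables
    # from the snippet quoted in the question).
    if param_sync_api:
        # param_sync: central params live in shared memory and were
        # registered by every worker via init_shared_params(), so the worker
        # exchanges with them explicitly during training.
        worker.sync_params(synchronous=True)
    else:
        # all_reduce: the EASGD object from global_dynamics holds the update
        # compiled by make_rule() over (list_tparams, list_cparams); calling
        # it performs the exchange across workers, so init_shared_params()
        # is never needed on this path.
        easgd()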

In my experience, my suggestion is to use the param_sync API to build a single-node, multi-GPU training framework, unless you want to use the all_reduce API in a multi-node scenario.

There might still be some bugs remaining in the multi-node scenario. If you really want to use it, you can try this branch to see if it works for you.
