
Add ELMo modules #298

Closed
wants to merge 3 commits into from

Conversation

gpengzhi (Collaborator)

Add a texar-styled ELMo encoder adapted from allennlp. The corresponding tokenizer will come in a separate PR.
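For context, the core operation an ELMo encoder exposes is a learned scalar mixture of the biLM's layer representations, ELMo = γ · Σ_j softmax(s)_j · h_j (Peters et al., 2018). A minimal pure-Python sketch of that mixing step, for illustration only; this is not the code in the PR:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layer_outputs, scalar_weights, gamma=1.0):
    """ELMo-style mix: gamma * sum_j softmax(s)_j * h_j.

    layer_outputs: one vector (list of floats) per biLM layer, for a
    single token position, to keep the sketch simple.
    scalar_weights: one unnormalized scalar per layer (learned in practice).
    """
    norm = softmax(scalar_weights)
    dim = len(layer_outputs[0])
    mixed = [0.0] * dim
    for weight, layer in zip(norm, layer_outputs):
        for i in range(dim):
            mixed[i] += weight * layer[i]
    return [gamma * x for x in mixed]


# With equal weights the mix reduces to the plain average of the layers.
print(scalar_mix([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))  # [2.0, 3.0]
```

In the real encoder both the per-layer scalars and γ are trainable parameters, so downstream tasks can choose which biLM layers to emphasize.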

gpengzhi (Collaborator, Author)

Resolves #297

codecov bot commented Feb 20, 2020

Codecov Report

Merging #298 into master will increase coverage by 0.29%.
The diff coverage is 86.6%.


@@            Coverage Diff             @@
##           master     #298      +/-   ##
==========================================
+ Coverage   82.63%   82.93%   +0.29%     
==========================================
  Files         207      215       +8     
  Lines       16006    17283    +1277     
==========================================
+ Hits        13226    14333    +1107     
- Misses       2780     2950     +170
Impacted Files Coverage Δ
texar/torch/utils/test.py 93.75% <100%> (+0.41%) ⬆️
texar/torch/modules/pretrained/__init__.py 100% <100%> (ø) ⬆️
texar/torch/modules/encoders/__init__.py 100% <100%> (ø) ⬆️
texar/torch/modules/pretrained/elmo_test.py 47.05% <47.05%> (ø)
texar/torch/modules/pretrained/elmo.py 53.84% <53.84%> (ø)
texar/torch/modules/encoders/elmo_encoder_test.py 73.33% <73.33%> (ø)
texar/torch/modules/encoders/elmo_encoder.py 74.69% <74.69%> (ø)
texar/torch/modules/pretrained/elmo_utils.py 84.46% <84.46%> (ø)
...exar/torch/data/tokenizers/elmo_tokenizer_utils.py 94.23% <94.23%> (ø)
texar/torch/modules/pretrained/elmo_utils_test.py 96.1% <96.1%> (ø)
... and 10 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 931ead9...aa64126.

huzecong (Collaborator) left a comment


I really appreciate the hard work of getting ELMo into our codebase. But I feel that the code should not be merged into master in its current condition; the main reason is to avoid code bloat. I think we can have a temporary branch in the upstream repo (asyml/texar-pytorch) to hold the changes in this PR, so people could still use this if they want; but until we get rid of enough code bloat, I don't think we should merge this.

texar/torch/data/tokenizers/elmo_tokenizer_utils.py (review thread, outdated, resolved)
texar/torch/data/tokenizers/elmo_tokenizer_utils_test.py (review thread, outdated, resolved)
Collaborator

I guess some of these modules were just copied from the AllenNLP repo? Could you check whether this would cause licensing issues?

Collaborator

Also, it seems that they've implemented almost everything from scratch... I think we have some of the functionalities in the library, for instance (correct me if I'm wrong), custom-cell LSTMs w/ highway connections, and more powerful embedding classes. Simply committing all this code into the master branch would be enormous code bloat.
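For reference, the highway connection mentioned here gates each unit between a transformed value and the raw input: y = g · H(x) + (1 − g) · x, with g = σ(Wx + b). A scalar pure-Python sketch of just the gating arithmetic (illustration only; the library classes referred to above are not shown here):

```python
import math


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def highway(x, transform, gate_pre_activation):
    """Highway connection for a single scalar unit.

    y = g * H(x) + (1 - g) * x, where g = sigmoid(gate_pre_activation).
    In a real layer, ``gate_pre_activation`` comes from a learned affine
    map of x; here it is passed in directly to keep the sketch minimal.
    """
    g = sigmoid(gate_pre_activation)
    return g * transform(x) + (1.0 - g) * x


# A strongly positive gate applies the transform; a strongly negative
# gate carries the input through almost unchanged.
print(highway(2.0, lambda v: v * v, 100.0))   # ~4.0 (transform)
print(highway(2.0, lambda v: v * v, -100.0))  # ~2.0 (carry)
```

The carry behavior is what makes highway-connected LSTMs easier to train at depth, which is presumably why the allennlp ELMo implementation uses them.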

texar/torch/modules/pretrained/elmo_utils.py (review thread, resolved)
gpengzhi (Collaborator, Author) commented Feb 21, 2020

For the added functions, most of them are copied directly from allennlp. I could try to use our own functions, or add these functions to our library (if they are not ELMo-specific) where we do not already provide such functionality.

For the added modules, I modified or simplified most of them to remove some internal dependencies on allennlp, and added some new features to make them texar-pytorch compatible. I will investigate the feasibility and the workload of using our own modules, given that the weight-loading logic is tightly coupled with their modules. If using our own modules turns out to be inconvenient, I will further polish and simplify the code.
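One common way to decouple weight loading from a donor library's module hierarchy is to remap checkpoint parameter names onto the local module's names before loading. A minimal sketch; the key names below are hypothetical, not actual allennlp or texar-pytorch parameter names:

```python
def remap_state_dict(state_dict, key_map):
    """Return a copy of ``state_dict`` with keys renamed via ``key_map``.

    Keys absent from ``key_map`` are kept unchanged, so partial maps work.
    """
    return {key_map.get(key, key): value for key, value in state_dict.items()}


# Hypothetical name mapping, for illustration only.
KEY_MAP = {
    "scalar_mix_0.gamma": "encoder.scalar_mix.gamma",
    "_elmo_lstm.weight_hh_l0": "encoder.lstm.weight_hh_l0",
}

checkpoint = {"scalar_mix_0.gamma": 1.0, "other.bias": 0.5}
print(remap_state_dict(checkpoint, KEY_MAP))
# {'encoder.scalar_mix.gamma': 1.0, 'other.bias': 0.5}
```

With PyTorch modules, the remapped dict could then be passed to `load_state_dict`, letting the local module keep its own structure while reusing the donor checkpoint.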

gpengzhi (Collaborator, Author)

FYI, allenai/allennlp is licensed under the Apache License 2.0. I think we can use their code.

@gpengzhi gpengzhi closed this Feb 21, 2020
@gpengzhi gpengzhi mentioned this pull request Feb 24, 2020

2 participants