HF Integration #248

Open · wants to merge 1 commit into main
Conversation

sedrick-keh-tri (Collaborator)
This follows what OLMo does with their HF integration.

It allows us to work with HF without having to create new classes in the upstream transformers repo. Since the integration reads directly from this repo, we also don't need to worry about the OpenLM codebase being updated in the future; changes are picked up automatically.

Usage is exactly the same as standard HF usage, except for the additional import on the first line:

# Importing open_lm_hf registers the OpenLM architecture with the HF Auto classes.
from open_lm_hf import *

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tri-ml/openlm-7b-300b")
model = AutoModelForCausalLM.from_pretrained("tri-ml/openlm-7b-300b")

inputs = tokenizer("Hi, nice to meet you.", return_tensors="pt")
out = model.to("cuda").generate(inputs["input_ids"].to("cuda"))
print(tokenizer.decode(out[0]))
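For context, the wildcard import presumably works by registering the OpenLM architecture with the HF Auto classes at import time, in the style of OLMo's integration. A hedged sketch of what open_lm_hf might do on import (the module and class names below are assumptions, not the actual names in this repo):

from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical module/class names; the actual ones in open_lm_hf may differ.
from .configuration_openlm import OpenLMConfig
from .modeling_openlm import OpenLMForCausalLM

# Registering under the config's model_type lets AutoModelForCausalLM
# resolve checkpoints whose config.json declares this architecture.
AutoConfig.register("openlm", OpenLMConfig)
AutoModelForCausalLM.register(OpenLMConfig, OpenLMForCausalLM)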

Some things are not implemented yet:

  • Some extra HF functions, such as resize_token_embeddings.
  • The HF forward output CausalLMOutputWithPast usually includes the full hidden states, but OpenLM's forward doesn't return them, so hidden_states is left as None for now.
  • There's also an if labels is not None: chunk that I just copied from OLMo and didn't test (see the sketch after this list).
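For concreteness, a minimal sketch of how the wrapper's forward might assemble the HF output, including the untested labels chunk; the OpenLM call signature and return tuple shown here are assumptions, not the actual code in this PR:

import torch.nn.functional as F
from transformers.modeling_outputs import CausalLMOutputWithPast

def forward(self, input_ids, labels=None, past_key_values=None, **kwargs):
    # Assumed OpenLM return convention: (logits, _, past_key_values).
    logits, _, past_key_values = self.model(input_ids, past_key_values=past_key_values)

    loss = None
    if labels is not None:
        # The chunk copied from OLMo: standard causal-LM shift,
        # predicting token t+1 from positions <= t.
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        loss = F.cross_entropy(
            shift_logits.view(-1, shift_logits.size(-1)),
            shift_labels.view(-1),
        )

    return CausalLMOutputWithPast(
        loss=loss,
        logits=logits,
        past_key_values=past_key_values,
        hidden_states=None,  # OpenLM's forward doesn't return full hidden states
    )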

@sedrick-keh-tri (Collaborator, Author)

The HF model repo's config.json should look something like this:

{
  "dim": 4096,
  "n_layers": 32, 
  "n_heads": 32, 
  "vocab_size": 50432,
  "norm_eps": 1e-5,
  "seq_len": 2048,
  "weight_tying": false,
  "apply_qk_norm": true,
  "norm_type": "gain_only_lp_layer_norm",
  "positional_embedding_type": "rotary",
  "ffn_type": "swiglu"
}

These keys correspond to attributes of OpenLM's Params class.
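As a sanity check, a hedged sketch of loading such a config into a Params-like object; the dataclass below mirrors only the keys shown above, and the real Params class likely has more fields and different defaults:

import json
from dataclasses import dataclass

# Hypothetical mirror of OpenLM's Params; only the keys from the
# config.json above are included here.
@dataclass
class Params:
    dim: int
    n_layers: int
    n_heads: int
    vocab_size: int
    norm_eps: float
    seq_len: int
    weight_tying: bool
    apply_qk_norm: bool
    norm_type: str
    positional_embedding_type: str
    ffn_type: str

with open("config.json") as f:
    params = Params(**json.load(f))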
