Image token process malfunction #78

Stevetich · 2024-08-12T08:39:57Z

It seems that `pllava_answer` in model_utils.py passes only one image token to the model.

Lines 139 to 146 in 6f49fd2

    def pllava_answer(conv: Conversation, model, processor, img_list, do_sample=True, max_new_tokens=200, num_beams=1, min_length=1, top_p=0.9,
                      repetition_penalty=1.0, length_penalty=1, temperature=1.0, stop_criteria_keywords=None, print_res=False):
        # torch.cuda.empty_cache()
        prompt = conv.get_prompt()
        inputs = processor(text=prompt, images=img_list, return_tensors="pt")
        if inputs['pixel_values'] is None:
            inputs.pop('pixel_values')
        inputs = inputs.to(model.device)

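However, in eval_utils.py, multiple tokens are passed so that the model can perform video inference: `answer` adds image tokens to the conversation until the prompt contains at least one per entry in `img_list`. Is the missing step in model_utils.py a bug?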

PLLaVA/tasks/eval/eval_utils.py

Lines 402 to 410 in 6f49fd2

    def answer(self, conv: Conversation, img_list, max_new_tokens=200, num_beams=1, min_length=1, top_p=0.9,
               repetition_penalty=1.0, length_penalty=1, temperature=1.0):
        torch.cuda.empty_cache()
        prompt = conv.get_prompt()
        if prompt.count(conv.mm_token) < len(img_list):
            diff_mm_num = len(img_list) - prompt.count(conv.mm_token)
            for i in range(diff_mm_num):
                conv.user_query("", is_mm=True)
            prompt = conv.get_prompt()
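
To make the difference concrete, here is a rough standalone sketch of the padding step that `answer` performs and `pllava_answer` does not. It is not the repository's code: `pad_mm_tokens` is a hypothetical helper, the `"<image>"` default stands in for whatever `conv.mm_token` actually is, and it appends to a plain string instead of calling `conv.user_query("", is_mm=True)`.

```python
def pad_mm_tokens(prompt: str, num_images: int, mm_token: str = "<image>") -> str:
    """Return the prompt with at least one placeholder token per image.

    Hypothetical helper: mirrors the count-and-pad check in eval_utils.answer,
    but works on a plain string rather than a Conversation object.
    """
    # How many placeholders are missing compared to the number of frames/images.
    missing = num_images - prompt.count(mm_token)
    if missing > 0:
        # Assumption: appending at the end is good enough for illustration;
        # the real code adds empty multimodal user turns instead.
        prompt = prompt + mm_token * missing
    return prompt


if __name__ == "__main__":
    # One placeholder in the prompt, four frames in img_list.
    padded = pad_mm_tokens("USER: <image> What happens in the video?", 4)
    print(padded.count("<image>"))
```

If `pllava_answer` is meant to handle multi-frame input the same way, a check of this kind before the `processor(...)` call would be the piece that appears to be missing.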
