Evaluation Error: RuntimeError: rnn: hx is not contiguous #4

Open · LinuxBeginner opened this issue Jun 6, 2020 · 6 comments

LinuxBeginner commented Jun 6, 2020

Training was successful. Data:
vatex_training_v1.0.json
vatex_validation_v1.0.json
vatex_public_test_english_v1.1.json

System: Google Colab GPU

When I try to run `python eval.py`, it shows the following error:

```
Vocab size src/tgt:10523/2907
train/val/test size: 254/30/59
************ Start eval... ************
Use epoch 34 as the best model for testing
Traceback (most recent call last):
  File "eval.py", line 123, in <module>
    main(args)
  File "eval.py", line 63, in main
    eval(test_loader, encoder, decoder, cp_file, tok_tgt, result_path)
  File "eval.py", line 90, in eval
    preds, pred_lengths = decoder.beam_decoding(srccap, init_hidden, src_out, vid_out, args.MAX_INPUT_LENGTH, beam_size=5)
  File "/content/drive/My Drive/MMT/MMTvatex/Video-guided-Machine-Translation/model.py", line 208, in beam_decoding
    output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)
  File "/content/drive/My Drive/MMT/MMTvatex/Video-guided-Machine-Translation/model.py", line 110, in onestep
    output, hidden = self.decoder(rnn_input, last_hidden)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 570, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: rnn: hx is not contiguous
```

Could you please tell me why this is happening?
Thank you.

eric-xw (Owner) commented Jun 8, 2020

Hi, can you try calling contiguous() on the inputs before feeding them into the decoder LSTM?
The code works on our end, so we cannot reproduce the error to debug it.
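For readers hitting the same error: the beam-search setup builds the per-beam tensors with `expand()`, which returns a non-contiguous view, and the cuDNN LSTM rejects non-contiguous hidden states. A minimal sketch (not code from this repository) of the distinction:

```python
import torch

# Minimal sketch, not repository code: expand() broadcasts by creating a view
# with stride 0, so the result is non-contiguous until .contiguous() copies it.
h = torch.zeros(2, 1, 1024)                  # (n_layers, batch=1, hidden_size)
h_beam = h.expand(2, 5, 1024)                # broadcast to beam_size=5
print(h_beam.is_contiguous())                # False -- the kind of tensor the LSTM rejects
print(h_beam.contiguous().is_contiguous())   # True
```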

LinuxBeginner (Author) commented Jun 9, 2020

Hi Eric, contiguous() is already called at lines 169-173 in model.py:


```python
src_out_i = src_out[i].unsqueeze(0).expand(beam_size, src_out.size(1), src_out.size(2)).contiguous()  # (bs, seq_len, N)
vid_out_i = vid_out[i].unsqueeze(0).expand(beam_size, vid_out.size(1), vid_out.size(2)).contiguous()
src_mask_i = src_mask[i].unsqueeze(0).expand(beam_size, src_mask.size(1)).contiguous()
hidden_i = [_[:, i, :].unsqueeze(1).expand(_.size(0), beam_size, _.size(2)).contiguous() for _ in
            hidden]  # (n_layers, bs, 1024)
```

But it is still not working. There was no issue at training time; the error appears only when running eval.py.
Please advise.

eric-xw (Owner) commented Jun 9, 2020

Reading the error log, the issue occurs when calling the LSTM at line 110, so try calling contiguous() on rnn_input and last_hidden.
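A self-contained sketch of what that suggestion amounts to (hypothetical sizes, not the repository's actual code): make both the step input and the `(h, c)` state contiguous before the LSTM call that fails.

```python
import torch
from torch import nn

# Hedged sketch of the suggested fix, with made-up sizes; not repository code.
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, batch_first=True)
rnn_input = torch.randn(5, 1, 8)                    # (beam_size, 1, input_size)
h = torch.zeros(2, 1, 16).expand(2, 5, 16)          # non-contiguous, like hidden_i
c = torch.zeros(2, 1, 16).expand(2, 5, 16)
last_hidden = (h.contiguous(), c.contiguous())      # the suggested .contiguous() calls
output, hidden = lstm(rnn_input.contiguous(), last_hidden)
print(output.shape)                                 # torch.Size([5, 1, 16])
```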

bozhenhhu commented

Before the line `output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)`, I add `.contiguous()` after `output` and `hidden_i` as follows:

```python
output = torch.from_numpy(outputs).cuda().contiguous()

def from_numpy(self, states):
    return [torch.from_numpy(state).cuda().contiguous() for state in states]
```

It works.
Apart from this, I find the code in `beam_decoding` very hard to follow. It is hugely different from the code in `inference`, which I had expected to be similar.
The second `output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)` call may be removable.
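For what it's worth, the same reordering can be done without the NumPy round trip. A hedged sketch (the helper name `reorder_hidden` is made up, and the shapes assume `hidden_i` is a list of `(n_layers, beam_size, hidden_size)` tensors as in `beam_decoding`):

```python
import torch

# Hedged alternative sketch, not repository code: reorder the beam hidden states
# directly on the GPU tensors and keep them contiguous for the next LSTM step.
def reorder_hidden(hidden, indexes):
    # hidden: list of (n_layers, beam_size, hidden_size) tensors (h and c)
    # indexes: 1-D LongTensor of the beam entries chosen at this step
    return [state.index_select(1, indexes).contiguous() for state in hidden]

hidden = [torch.zeros(2, 5, 1024), torch.zeros(2, 5, 1024)]
indexes = torch.tensor([0, 0, 1, 3, 2])
hidden = reorder_hidden(hidden, indexes)
print(hidden[0].shape, hidden[0].is_contiguous())   # torch.Size([2, 5, 1024]) True
```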

hynbjn commented Nov 21, 2022

@bozhenhhu I've tried the method you've suggested, but the code still does not work:(
```python
import math
import torch
import random
import numpy as np
from torch import nn
import torch.nn.functional as F

from utils import sos_idx, eos_idx


class SoftDotAttention(nn.Module):
    def __init__(self, dim_ctx, dim_h):
        '''Initialize layer.'''
        super(SoftDotAttention, self).__init__()
        self.linear_in = nn.Linear(dim_h, dim_ctx, bias=False)
        self.sm = nn.Softmax(dim=1)

    def forward(self, context, h, mask=None):
        '''Propagate h through the network.
        h: batch x dim
        context: batch x seq_len x dim
        mask: batch x seq_len indices to be masked
        '''
        target = self.linear_in(h).unsqueeze(2)  # batch x dim x 1
        # Get attention
        attn = torch.bmm(context, target).squeeze(2)  # batch x seq_len
        if mask is not None:
            # -Inf masking prior to the softmax
            attn.data.masked_fill_(mask, -float('inf'))
        attn = self.sm(attn)
        attn3 = attn.view(attn.size(0), 1, attn.size(1))  # batch x 1 x seq_len
        weighted_ctx = torch.bmm(attn3, context)  # batch x dim
        return weighted_ctx, attn

class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size,
                 n_layers=2, dropout=0.5):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embed_size = embed_size
        self.src_embed = nn.Embedding(vocab_size, embed_size)
        self.src_encoder = nn.LSTM(input_size=embed_size, hidden_size=hidden_size // 2, num_layers=n_layers,
                                   dropout=dropout, batch_first=True, bidirectional=True)

        self.frame_embed = nn.Linear(1024, self.embed_size)
        self.video_encoder = nn.LSTM(input_size=embed_size, hidden_size=hidden_size // 2, num_layers=n_layers,
                                     dropout=dropout, batch_first=True, bidirectional=True)

        self.dropout = nn.Dropout(dropout, inplace=True)

    def forward(self, src, vid, src_hidden=None, vid_hidden=None):
        batch_size = src.size(0)

        src_embedded = self.src_embed(src)
        src_out, src_states = self.src_encoder(src_embedded, src_hidden)
        src_h = src_states[0].permute(1, 0, 2).contiguous().view(
            batch_size, 2, -1).permute(1, 0, 2)
        src_c = src_states[1].permute(1, 0, 2).contiguous().view(
            batch_size, 2, -1).permute(1, 0, 2)

        vid_embedded = self.frame_embed(vid)
        vid_out, vid_states = self.video_encoder(vid_embedded, vid_hidden)

        vid_h = vid_states[0].permute(1, 0, 2).contiguous().view(
            batch_size, 2, -1).permute(1, 0, 2)
        vid_c = vid_states[0].permute(1, 0, 2).contiguous().view(
            batch_size, 2, -1).permute(1, 0, 2)

        init_h = torch.cat((src_h, vid_h), 2)
        init_c = torch.cat((src_c, vid_c), 2)

        return src_out, (init_h, init_c), vid_out

class Decoder(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size,
                 n_layers=2, dropout=0.5):
        super(Decoder, self).__init__()
        self.embed_size = embed_size
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.vocab_size = vocab_size

        self.embed = nn.Embedding(vocab_size, embed_size)
        self.dropout = nn.Dropout(dropout, inplace=True)
        self.src_attention = SoftDotAttention(embed_size, hidden_size)
        self.vid_attention = SoftDotAttention(embed_size, hidden_size)

        self.decoder = nn.LSTM(embed_size*3, hidden_size,
                               n_layers, dropout=dropout, batch_first=True)

        self.fc = nn.Sequential(nn.Linear(self.hidden_size, self.embed_size),
                                nn.Tanh(),
                                nn.Dropout(p=dropout),
                                nn.Linear(embed_size, vocab_size))

    def onestep(self, input, last_hidden, src_out, vid_out, src_mask):
        '''
        input: (B,)
        '''
        # Get the embedding of the current input word (last output word)
        embedded = self.embed(input).unsqueeze(1)  # (B, 1, N)
        embedded = self.dropout(embedded)
        # Calculate attention weights and apply to encoder outputs
        src_ctx, src_attn = self.src_attention(src_out, last_hidden[0][0], mask=src_mask)  # src_ctx: (mb, 1, dim) attn: (mb, 1, seqlen)
        vid_ctx, vid_attn = self.vid_attention(vid_out, last_hidden[0][0])
        # Combine embedded input word and attended context, run through RNN
        rnn_input = torch.cat([embedded, src_ctx, vid_ctx], 2)  # (mb, 1, input_size)

        output, hidden = self.decoder(rnn_input, last_hidden)
        output = output.squeeze(1)  # (B, 1, N) -> (B, N)
        output = self.fc(output)
        return output, hidden, (src_attn, vid_attn)

    def forward(self, src, trg, init_hidden, src_out, vid_out, max_len, teacher_forcing_ratio):
        batch_size = trg.size(0)
        src_mask = (src == 0)  # mask paddings.

        outputs = torch.zeros(batch_size, max_len, self.vocab_size).cuda()

        hidden = (init_hidden[0][:self.n_layers].contiguous(), init_hidden[1][:self.n_layers].contiguous())
        output = trg.data[:, 0]  # <sos>
        for t in range(1, max_len):
            output, hidden, attn_weights = self.onestep(output, hidden, src_out, vid_out, src_mask)  # (mb, vocab) (1, mb, N) (mb, 1, seqlen)
            outputs[:, t, :] = output
            is_teacher = random.random() < teacher_forcing_ratio
            top1 = output.data.max(1)[1]
            output = (trg.data[:, t] if is_teacher else top1).cuda()  # output should be indices to feed into nn.embedding at next step
        return outputs

    def inference(self, src, trg, init_hidden, src_out, vid_out, max_len, teacher_forcing_ratio=0):
        '''
        Greedy decoding
        '''
        batch_size = trg.size(0)
        src_mask = (src == 0)  # mask paddings.

        outputs = torch.zeros(batch_size, max_len, self.vocab_size).cuda()

        hidden = (init_hidden[0][:self.n_layers].contiguous(), init_hidden[1][:self.n_layers].contiguous())
        output = trg.data[:, 0]  # <sos>
        pred_lengths = [0]*batch_size
        for t in range(1, max_len):
            output, hidden, attn_weights = self.onestep(output, hidden, src_out, vid_out, src_mask)  # (mb, vocab) (1, mb, N) (mb, 1, seqlen)
            outputs[:, t, :] = output
            is_teacher = random.random() < teacher_forcing_ratio
            top1 = output.data.max(1)[1]

            output = (trg.data[:, t] if is_teacher else top1).cuda()

            for i in range(batch_size):
                if output[i] == 3 and pred_lengths[i] == 0:
                    pred_lengths[i] = t
        for i in range(batch_size):
            if pred_lengths[i] == 0:
                pred_lengths[i] = max_len
        return outputs, pred_lengths

    def beam_decoding(self, src, init_hidden, src_out, vid_out, max_len, beam_size=5):
        batch_size = src.size(0)
        src_mask = (src == 0)  # mask padding
        hidden = (init_hidden[0][:self.n_layers].contiguous(), init_hidden[1][:self.n_layers].contiguous())

        seq = torch.LongTensor(max_len, batch_size).zero_()
        seq_log_probs = torch.FloatTensor(max_len, batch_size)

        for i in range(batch_size):
            # treat the problem as having a batch size of beam_size
            src_out_i = src_out[i].unsqueeze(0).expand(beam_size, src_out.size(1), src_out.size(2)).contiguous()  # (bs, seq_len, N)
            vid_out_i = vid_out[i].unsqueeze(0).expand(beam_size, vid_out.size(1), vid_out.size(2)).contiguous()
            src_mask_i = src_mask[i].unsqueeze(0).expand(beam_size, src_mask.size(1)).contiguous()
            hidden_i = [_[:, i, :].unsqueeze(1).expand(_.size(0), beam_size, _.size(2)).contiguous() for _ in
                        hidden]  # (n_layers, bs, 1024)

            output = torch.LongTensor([sos_idx] * beam_size).cuda()

            output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)  # (mb, vocab) (1, mb, N) (mb, 1, seqlen)
            log_probs = F.log_softmax(output, dim=1)
            log_probs[:, -1] = log_probs[:, -1] - 1000
            neg_log_probs = -log_probs

            all_outputs = np.ones((1, beam_size), dtype='int32')
            all_masks = np.ones_like(all_outputs, dtype="float32")
            all_costs = np.zeros_like(all_outputs, dtype="float32")

            for j in range(max_len):
                if all_masks[-1].sum() == 0:
                    break

                next_costs = (
                    all_costs[-1, :, None] + neg_log_probs.data.cpu().numpy() * all_masks[-1, :, None])
                (finished,) = np.where(all_masks[-1] == 0)
                next_costs[finished, 1:] = np.inf

                (indexes, outputs), chosen_costs = self._smallest(
                    next_costs, beam_size, only_first_row=j == 0)

                new_state_d = [_.data.cpu().numpy()[:, indexes, :]
                               for _ in hidden_i]

                all_outputs = all_outputs[:, indexes]
                all_masks = all_masks[:, indexes]
                all_costs = all_costs[:, indexes]

                output = torch.from_numpy(outputs).cuda().contiguous()
                hidden_i = self.from_numpy(new_state_d)
                output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)
                log_probs = F.log_softmax(output, dim=1)

                log_probs[:, -1] = log_probs[:, -1] - 1000
                neg_log_probs = -log_probs

                all_outputs = np.vstack([all_outputs, outputs[None, :]])
                all_costs = np.vstack([all_costs, chosen_costs[None, :]])
                mask = outputs != 0
                all_masks = np.vstack([all_masks, mask[None, :]])

            all_outputs = all_outputs[1:]
            all_costs = all_costs[1:] - all_costs[:-1]
            all_masks = all_masks[:-1]
            costs = all_costs.sum(axis=0)
            lengths = all_masks.sum(axis=0)
            normalized_cost = costs / lengths
            best_idx = np.argmin(normalized_cost)
            seq[:all_outputs.shape[0], i] = torch.from_numpy(
                all_outputs[:, best_idx])
            seq_log_probs[:all_costs.shape[0], i] = torch.from_numpy(
                all_costs[:, best_idx])

        seq, seq_log_probs = seq.transpose(0, 1), seq_log_probs.transpose(0, 1)

        pred_lengths = [0]*batch_size
        for i in range(batch_size):
            if sum(seq[i] == eos_idx) == 0:
                pred_lengths[i] = max_len
            else:
                pred_lengths[i] = (seq[i] == eos_idx).nonzero()[0][0]
        # return the samples and their log likelihoods
        return seq, pred_lengths  # seq_log_probs

    def from_numpy(self, states):
        return [torch.from_numpy(state).cuda().contiguous() for state in states]

    @staticmethod
    def _smallest(matrix, k, only_first_row=False):
        if only_first_row:
            flatten = matrix[:1, :].flatten()
        else:
            flatten = matrix.flatten()
        args = np.argpartition(flatten, k)[:k]
        args = args[np.argsort(flatten[args])]
        return np.unravel_index(args, matrix.shape), flatten[args]
```
This is the code I ran. What did I do wrong?

bozhenhhu commented

> @bozhenhhu I've tried the method you've suggested, but the code still does not work:( […] This is the code I ran. What did I do wrong?

Do you have the same environment as this repository, e.g. the listed prerequisites? It has been a long time since this model was published, and many packages have been updated, which may cause incompatibilities. Why not try more up-to-date methods?
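If it helps, a quick (generic, not repo-specific) way to compare your Colab environment against the versions listed in the repository's prerequisites:

```python
# Generic environment check; the versions to compare against are whatever the
# repository's README/prerequisites list, which is not reproduced here.
import sys
import torch

print(sys.version)                # Python version (Colab image)
print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA version PyTorch was built with
print(torch.cuda.is_available())  # whether the GPU runtime is active
```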
