Train seq2seq RNNs to generate syntactically valid inputs to the gold code, then use the gold code as an oracle to get the output for each generated input (if the input has correct syntax). Inside a try/except, the seq2seq receives the gold code as input and has to generate an input for it that doesn't set off an exception. Reward the seq2seq +1 if there is no exception, -1 if there is, and add an entropy term (or minibatch discrimination?) to the loss so the generations stay varied. To help it learn which syntax error it made, have the seq2seq also predict which exception it set off, and lessen the negative reward if the prediction is correct. Maybe SL-pretrain on already provided test cases (if we have any) before the RL stage (the entropy term in the RL loss will keep the RL stage varied). A rough sketch of the reward/oracle part is below.
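A minimal sketch of the oracle-based reward described above. It assumes the gold code is a Python source string exposing a function named `solve` and that the generated input is a Python literal; the names `oracle_reward`, `solve`, and the -0.5 reduced penalty are hypothetical choices for illustration, not part of this proposal.

```python
import ast
from typing import Optional


def oracle_reward(gold_source: str, generated_input: str,
                  predicted_exception: Optional[str] = None) -> float:
    """+1 if the gold code runs on the generated input without raising,
    -1 if it raises; the penalty is lessened when the model correctly
    predicted the name of the exception class it triggered."""
    namespace = {}
    try:
        # Parse the generated input; a syntax error here is penalized too.
        parsed = ast.literal_eval(generated_input)
        # Load the gold code and query it as an oracle.
        exec(gold_source, namespace)
        namespace["solve"](parsed)
        return 1.0
    except Exception as exc:
        if predicted_exception == type(exc).__name__:
            return -0.5  # reduced penalty for correctly self-diagnosing the failure
        return -1.0


if __name__ == "__main__":
    gold = "def solve(x):\n    return 10 / x\n"
    print(oracle_reward(gold, "5"))                       # 1.0  (no exception)
    print(oracle_reward(gold, "0"))                       # -1.0 (ZeroDivisionError, unpredicted)
    print(oracle_reward(gold, "0", "ZeroDivisionError"))  # -0.5 (exception correctly predicted)
```

This scalar would then feed a REINFORCE-style policy-gradient update, with the entropy bonus added to the same loss to keep the sampled inputs diverse.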