log.txt
4859 lines (4859 loc) · 692 KB
2024-10-29 16:13:10,826 - Iteration 0 | Train Loss 11.00112 | Training Time: 0000h 00m 34s | Iteration Time: 34158.717 ms | 2877.9 tokens/sec
2024-10-29 16:13:17,647 - Iteration 1 | Train Loss 10.00988 | Training Time: 0000h 00m 40s | Iteration Time: 6820.844 ms | 14412.3 tokens/sec
2024-10-29 16:13:24,486 - Iteration 2 | Train Loss 9.37065 | Training Time: 0000h 00m 47s | Iteration Time: 6839.377 ms | 14373.2 tokens/sec
2024-10-29 16:13:31,329 - Iteration 3 | Train Loss 8.95712 | Training Time: 0000h 00m 54s | Iteration Time: 6843.273 ms | 14365.1 tokens/sec
2024-10-29 16:13:38,190 - Iteration 4 | Train Loss 8.60862 | Training Time: 0000h 01m 01s | Iteration Time: 6860.199 ms | 14329.6 tokens/sec
2024-10-29 16:13:45,058 - Iteration 5 | Train Loss 8.36450 | Training Time: 0000h 01m 08s | Iteration Time: 6868.828 ms | 14311.6 tokens/sec
2024-10-29 16:13:51,929 - Iteration 6 | Train Loss 8.13959 | Training Time: 0000h 01m 15s | Iteration Time: 6870.996 ms | 14307.1 tokens/sec
2024-10-29 16:13:58,774 - Iteration 7 | Train Loss 8.00657 | Training Time: 0000h 01m 22s | Iteration Time: 6844.800 ms | 14361.9 tokens/sec
2024-10-29 16:14:05,629 - Iteration 8 | Train Loss 7.85076 | Training Time: 0000h 01m 28s | Iteration Time: 6855.226 ms | 14340.0 tokens/sec
2024-10-29 16:14:12,513 - Iteration 9 | Train Loss 7.81880 | Training Time: 0000h 01m 35s | Iteration Time: 6883.607 ms | 14280.9 tokens/sec
2024-10-29 16:14:19,400 - Iteration 10 | Train Loss 7.76733 | Training Time: 0000h 01m 42s | Iteration Time: 6887.081 ms | 14273.7 tokens/sec
2024-10-29 16:15:02,769 - Iteration 19 | Train Loss 7.74691 | Training Time: 0000h 02m 26s | Iteration Time: 4818.776 ms | 20400.2 tokens/sec
2024-10-29 16:15:09,556 - Iteration 20 | Train Loss 7.70321 | Training Time: 0000h 02m 32s | Iteration Time: 6787.206 ms | 14483.7 tokens/sec
2024-10-29 16:15:16,418 - Iteration 21 | Train Loss 7.63504 | Training Time: 0000h 02m 39s | Iteration Time: 6862.082 ms | 14325.7 tokens/sec
2024-10-29 16:17:35,822 - Iteration 51 | Train Loss 7.61611 | Training Time: 0000h 04m 59s | Iteration Time: 4646.773 ms | 21155.3 tokens/sec
2024-10-29 16:18:56,066 - Iteration 68 | Train Loss 7.58043 | Training Time: 0000h 06m 19s | Iteration Time: 4720.273 ms | 20825.9 tokens/sec
2024-10-29 16:20:20,903 - Iteration 86 | Train Loss 7.58026 | Training Time: 0000h 07m 44s | Iteration Time: 4713.153 ms | 20857.4 tokens/sec
2024-10-29 16:21:59,355 - Iteration 107 | Train Loss 7.57781 | Training Time: 0000h 09m 22s | Iteration Time: 4688.178 ms | 20968.5 tokens/sec
2024-10-29 16:24:28,453 - Iteration 139 | Train Loss 7.56642 | Training Time: 0000h 11m 51s | Iteration Time: 4659.318 ms | 21098.4 tokens/sec
2024-10-29 16:25:25,813 - Iteration 151 | Train Loss 7.56150 | Training Time: 0000h 12m 49s | Iteration Time: 4779.981 ms | 20565.8 tokens/sec
2024-10-29 16:27:04,515 - Iteration 172 | Train Loss 7.55573 | Training Time: 0000h 14m 27s | Iteration Time: 4700.106 ms | 20915.3 tokens/sec
2024-10-29 16:31:00,967 - Iteration 223 | Train Loss 7.50704 | Training Time: 0000h 18m 24s | Iteration Time: 4636.318 ms | 21203.0 tokens/sec
2024-10-29 16:32:30,466 - Iteration 242 | Train Loss 7.47868 | Training Time: 0000h 19m 53s | Iteration Time: 4710.448 ms | 20869.4 tokens/sec
2024-10-29 16:39:44,937 - Iteration 336 | Train Loss 7.46720 | Training Time: 0000h 27m 08s | Iteration Time: 4622.041 ms | 21268.5 tokens/sec
2024-10-29 16:40:28,678 - Iteration 345 | Train Loss 7.34915 | Training Time: 0000h 27m 52s | Iteration Time: 4860.066 ms | 20226.9 tokens/sec
2024-10-29 16:44:30,436 - Iteration 397 | Train Loss 7.26250 | Training Time: 0000h 31m 53s | Iteration Time: 4649.193 ms | 21144.3 tokens/sec
2024-10-29 16:45:46,420 - Iteration 413 | Train Loss 7.26176 | Training Time: 0000h 33m 09s | Iteration Time: 4748.991 ms | 20700.0 tokens/sec
2024-10-29 16:46:11,740 - Iteration 418 | Train Loss 7.24613 | Training Time: 0000h 33m 35s | Iteration Time: 5064.014 ms | 19412.3 tokens/sec
2024-10-29 16:46:23,242 - Iteration 420 | Train Loss 7.24161 | Training Time: 0000h 33m 46s | Iteration Time: 5750.850 ms | 17093.8 tokens/sec
2024-10-29 16:47:48,317 - Iteration 438 | Train Loss 7.22038 | Training Time: 0000h 35m 11s | Iteration Time: 4726.386 ms | 20799.0 tokens/sec
2024-10-29 16:47:55,220 - Iteration 439 | Train Loss 7.21591 | Training Time: 0000h 35m 18s | Iteration Time: 6903.124 ms | 14240.5 tokens/sec
2024-10-29 16:48:15,882 - Iteration 443 | Train Loss 7.18354 | Training Time: 0000h 35m 39s | Iteration Time: 5165.652 ms | 19030.3 tokens/sec
2024-10-29 16:49:03,914 - Iteration 453 | Train Loss 7.10614 | Training Time: 0000h 36m 27s | Iteration Time: 4803.161 ms | 20466.5 tokens/sec
2024-10-29 16:51:19,773 - Iteration 482 | Train Loss 7.07366 | Training Time: 0000h 38m 43s | Iteration Time: 4684.783 ms | 20983.7 tokens/sec
2024-10-29 16:51:58,631 - Iteration 490 | Train Loss 6.97924 | Training Time: 0000h 39m 21s | Iteration Time: 4857.255 ms | 20238.6 tokens/sec
2024-10-29 16:53:14,643 - Iteration 506 | Train Loss 6.93444 | Training Time: 0000h 40m 37s | Iteration Time: 4750.790 ms | 20692.1 tokens/sec
2024-10-29 16:58:03,297 - Iteration 526 | Train Loss 7.06733 | Training Time: 0000h 42m 43s | Iteration Time: 238.503 ms | 412171.3 tokens/sec
2024-10-29 16:59:35,113 - Iteration 546 | Train Loss 7.11555 | Training Time: 0000h 44m 15s | Iteration Time: 4590.805 ms | 21413.2 tokens/sec
2024-10-29 17:01:07,183 - Iteration 566 | Train Loss 6.98707 | Training Time: 0000h 45m 47s | Iteration Time: 4603.514 ms | 21354.1 tokens/sec
2024-10-29 17:01:44,032 - Iteration 574 | Train Loss 6.86726 | Training Time: 0000h 46m 24s | Iteration Time: 4606.169 ms | 21341.8 tokens/sec
2024-10-29 17:04:13,948 - Iteration 606 | Train Loss 6.94107 | Training Time: 0000h 48m 54s | Iteration Time: 4684.861 ms | 20983.3 tokens/sec
2024-10-29 17:04:46,253 - Iteration 613 | Train Loss 6.83887 | Training Time: 0000h 49m 26s | Iteration Time: 4615.018 ms | 21300.9 tokens/sec
2024-10-29 17:07:38,542 - Iteration 633 | Train Loss 6.90093 | Training Time: 0000h 51m 32s | Iteration Time: 198.767 ms | 494570.0 tokens/sec
2024-10-29 17:07:43,144 - Iteration 634 | Train Loss 6.82301 | Training Time: 0000h 51m 36s | Iteration Time: 4602.196 ms | 21360.2 tokens/sec
2024-10-29 17:10:03,657 - Iteration 664 | Train Loss 6.80639 | Training Time: 0000h 53m 57s | Iteration Time: 4683.762 ms | 20988.3 tokens/sec
2024-10-29 17:11:24,325 - Iteration 681 | Train Loss 6.72329 | Training Time: 0000h 55m 17s | Iteration Time: 4745.161 ms | 20716.7 tokens/sec
2024-10-29 17:13:54,263 - Iteration 713 | Train Loss 6.83426 | Training Time: 0000h 57m 47s | Iteration Time: 4685.572 ms | 20980.1 tokens/sec
2024-10-29 17:14:26,586 - Iteration 720 | Train Loss 6.69868 | Training Time: 0000h 58m 20s | Iteration Time: 4617.479 ms | 21289.5 tokens/sec
2024-10-29 17:17:01,257 - Iteration 753 | Train Loss 6.86602 | Training Time: 0001h 00m 54s | Iteration Time: 4686.999 ms | 20973.8 tokens/sec
2024-10-29 17:17:33,594 - Iteration 760 | Train Loss 6.68437 | Training Time: 0001h 01m 27s | Iteration Time: 4619.558 ms | 21280.0 tokens/sec
2024-10-29 17:20:08,274 - Iteration 793 | Train Loss 6.84742 | Training Time: 0001h 04m 01s | Iteration Time: 4687.284 ms | 20972.5 tokens/sec
2024-10-29 17:21:08,335 - Iteration 806 | Train Loss 6.65276 | Training Time: 0001h 05m 01s | Iteration Time: 4620.113 ms | 21277.4 tokens/sec
2024-10-29 17:22:10,638 - Iteration 819 | Train Loss 6.64016 | Training Time: 0001h 06m 04s | Iteration Time: 4792.481 ms | 20512.1 tokens/sec
2024-10-29 17:23:31,376 - Iteration 836 | Train Loss 6.60376 | Training Time: 0001h 07m 25s | Iteration Time: 4749.314 ms | 20698.6 tokens/sec
2024-10-29 17:25:01,295 - Iteration 855 | Train Loss 6.54888 | Training Time: 0001h 08m 54s | Iteration Time: 4732.604 ms | 20771.7 tokens/sec
2024-10-29 17:27:59,084 - Iteration 893 | Train Loss 6.79322 | Training Time: 0001h 11m 52s | Iteration Time: 4678.652 ms | 21011.2 tokens/sec
2024-10-29 17:29:31,514 - Iteration 913 | Train Loss 6.88596 | Training Time: 0001h 13m 25s | Iteration Time: 4621.490 ms | 21271.1 tokens/sec
2024-10-29 17:31:03,984 - Iteration 933 | Train Loss 6.82440 | Training Time: 0001h 14m 57s | Iteration Time: 4623.524 ms | 21261.7 tokens/sec
2024-10-29 17:32:36,474 - Iteration 953 | Train Loss 6.73853 | Training Time: 0001h 16m 30s | Iteration Time: 4624.487 ms | 21257.3 tokens/sec
2024-10-29 17:34:08,935 - Iteration 973 | Train Loss 6.74884 | Training Time: 0001h 18m 02s | Iteration Time: 4623.058 ms | 21263.8 tokens/sec
2024-10-29 17:35:41,373 - Iteration 993 | Train Loss 6.81201 | Training Time: 0001h 19m 35s | Iteration Time: 4621.873 ms | 21269.3 tokens/sec
2024-10-29 17:37:13,802 - Iteration 1013 | Train Loss 6.68994 | Training Time: 0001h 21m 07s | Iteration Time: 4621.475 ms | 21271.1 tokens/sec
2024-10-29 17:38:46,206 - Iteration 1033 | Train Loss 6.58318 | Training Time: 0001h 22m 39s | Iteration Time: 4620.184 ms | 21277.1 tokens/sec
2024-10-29 17:40:18,572 - Iteration 1053 | Train Loss 6.65734 | Training Time: 0001h 24m 12s | Iteration Time: 4618.284 ms | 21285.8 tokens/sec
2024-10-29 17:41:18,583 - Iteration 1066 | Train Loss 6.52825 | Training Time: 0001h 25m 12s | Iteration Time: 4616.233 ms | 21295.3 tokens/sec
2024-10-29 17:43:25,481 - Iteration 1093 | Train Loss 6.70282 | Training Time: 0001h 27m 19s | Iteration Time: 4699.948 ms | 20916.0 tokens/sec
2024-10-29 17:44:11,664 - Iteration 1103 | Train Loss 6.51735 | Training Time: 0001h 28m 05s | Iteration Time: 4618.221 ms | 21286.1 tokens/sec
2024-10-29 17:46:32,335 - Iteration 1133 | Train Loss 6.78537 | Training Time: 0001h 30m 25s | Iteration Time: 4689.057 ms | 20964.6 tokens/sec
2024-10-29 17:48:04,597 - Iteration 1153 | Train Loss 6.68342 | Training Time: 0001h 31m 58s | Iteration Time: 4613.096 ms | 21309.8 tokens/sec
2024-10-29 17:49:36,886 - Iteration 1173 | Train Loss 6.65341 | Training Time: 0001h 33m 30s | Iteration Time: 4614.457 ms | 21303.5 tokens/sec
2024-10-29 17:51:09,171 - Iteration 1193 | Train Loss 6.67877 | Training Time: 0001h 35m 02s | Iteration Time: 4614.230 ms | 21304.5 tokens/sec
2024-10-29 17:52:41,566 - Iteration 1213 | Train Loss 6.75044 | Training Time: 0001h 36m 35s | Iteration Time: 4619.739 ms | 21279.1 tokens/sec
2024-10-29 17:53:41,572 - Iteration 1226 | Train Loss 6.34500 | Training Time: 0001h 37m 35s | Iteration Time: 4615.885 ms | 21296.9 tokens/sec
2024-10-29 18:29:03,819 - Iteration 1246 | Train Loss 6.63358 | Training Time: 0001h 39m 40s | Iteration Time: 100.887 ms | 974395.3 tokens/sec
2024-10-29 18:30:35,430 - Iteration 1266 | Train Loss 6.72246 | Training Time: 0001h 41m 12s | Iteration Time: 4580.539 ms | 21461.2 tokens/sec
2024-10-29 18:32:07,188 - Iteration 1286 | Train Loss 6.62608 | Training Time: 0001h 42m 44s | Iteration Time: 4587.897 ms | 21426.8 tokens/sec
2024-10-29 18:33:39,073 - Iteration 1306 | Train Loss 6.62656 | Training Time: 0001h 44m 16s | Iteration Time: 4594.274 ms | 21397.1 tokens/sec
2024-10-29 18:35:11,076 - Iteration 1326 | Train Loss 6.62398 | Training Time: 0001h 45m 48s | Iteration Time: 4600.143 ms | 21369.8 tokens/sec
2024-10-29 18:36:43,139 - Iteration 1346 | Train Loss 6.68842 | Training Time: 0001h 47m 20s | Iteration Time: 4603.127 ms | 21355.9 tokens/sec
2024-10-29 18:38:15,206 - Iteration 1366 | Train Loss 6.67597 | Training Time: 0001h 48m 52s | Iteration Time: 4603.365 ms | 21354.8 tokens/sec
2024-10-29 18:39:47,319 - Iteration 1386 | Train Loss 6.59432 | Training Time: 0001h 50m 24s | Iteration Time: 4605.659 ms | 21344.2 tokens/sec
2024-10-29 18:41:19,438 - Iteration 1406 | Train Loss 6.66197 | Training Time: 0001h 51m 56s | Iteration Time: 4605.956 ms | 21342.8 tokens/sec
2024-10-29 18:42:51,597 - Iteration 1426 | Train Loss 6.79343 | Training Time: 0001h 53m 28s | Iteration Time: 4607.909 ms | 21333.8 tokens/sec
2024-10-29 18:44:23,787 - Iteration 1446 | Train Loss 6.62541 | Training Time: 0001h 55m 00s | Iteration Time: 4609.503 ms | 21326.4 tokens/sec
2024-10-29 18:45:55,969 - Iteration 1466 | Train Loss 6.58125 | Training Time: 0001h 56m 33s | Iteration Time: 4609.140 ms | 21328.1 tokens/sec
2024-10-29 18:47:28,153 - Iteration 1486 | Train Loss 6.73595 | Training Time: 0001h 58m 05s | Iteration Time: 4609.203 ms | 21327.8 tokens/sec
2024-10-29 18:49:00,302 - Iteration 1506 | Train Loss 6.64332 | Training Time: 0001h 59m 37s | Iteration Time: 4607.440 ms | 21335.9 tokens/sec
2024-10-29 18:50:32,447 - Iteration 1526 | Train Loss 6.69289 | Training Time: 0002h 01m 09s | Iteration Time: 4607.213 ms | 21337.0 tokens/sec
2024-10-29 18:52:04,585 - Iteration 1546 | Train Loss 6.64299 | Training Time: 0002h 02m 41s | Iteration Time: 4606.916 ms | 21338.4 tokens/sec
2024-10-29 18:53:36,686 - Iteration 1566 | Train Loss 6.61230 | Training Time: 0002h 04m 13s | Iteration Time: 4605.052 ms | 21347.0 tokens/sec
2024-10-29 18:55:08,802 - Iteration 1586 | Train Loss 6.63622 | Training Time: 0002h 05m 45s | Iteration Time: 4605.784 ms | 21343.6 tokens/sec
2024-10-29 18:56:40,941 - Iteration 1606 | Train Loss 6.72473 | Training Time: 0002h 07m 18s | Iteration Time: 4606.968 ms | 21338.1 tokens/sec
2024-10-29 18:58:13,078 - Iteration 1626 | Train Loss 6.54211 | Training Time: 0002h 08m 50s | Iteration Time: 4606.874 ms | 21338.5 tokens/sec
2024-10-29 18:59:45,245 - Iteration 1646 | Train Loss 6.47445 | Training Time: 0002h 10m 22s | Iteration Time: 4608.327 ms | 21331.8 tokens/sec
2024-10-29 19:01:17,410 - Iteration 1666 | Train Loss 6.54393 | Training Time: 0002h 11m 54s | Iteration Time: 4608.232 ms | 21332.3 tokens/sec
2024-10-29 19:02:49,602 - Iteration 1686 | Train Loss 6.52847 | Training Time: 0002h 13m 26s | Iteration Time: 4609.626 ms | 21325.8 tokens/sec
2024-10-29 19:04:21,794 - Iteration 1706 | Train Loss 6.55984 | Training Time: 0002h 14m 58s | Iteration Time: 4609.601 ms | 21325.9 tokens/sec
2024-10-29 19:05:53,969 - Iteration 1726 | Train Loss 6.57520 | Training Time: 0002h 16m 31s | Iteration Time: 4608.762 ms | 21329.8 tokens/sec
2024-10-29 19:07:26,144 - Iteration 1746 | Train Loss 6.70263 | Training Time: 0002h 18m 03s | Iteration Time: 4608.714 ms | 21330.0 tokens/sec
2024-10-29 19:08:58,383 - Iteration 1766 | Train Loss 6.56261 | Training Time: 0002h 19m 35s | Iteration Time: 4611.959 ms | 21315.0 tokens/sec
2024-10-29 19:10:30,600 - Iteration 1786 | Train Loss 6.52026 | Training Time: 0002h 21m 07s | Iteration Time: 4610.835 ms | 21320.2 tokens/sec
2024-10-29 19:12:02,818 - Iteration 1806 | Train Loss 6.52071 | Training Time: 0002h 22m 39s | Iteration Time: 4610.902 ms | 21319.9 tokens/sec
2024-10-29 19:13:35,048 - Iteration 1826 | Train Loss 6.63556 | Training Time: 0002h 24m 12s | Iteration Time: 4611.510 ms | 21317.1 tokens/sec
2024-10-29 19:14:35,014 - Iteration 1839 | Train Loss 6.23677 | Training Time: 0002h 25m 12s | Iteration Time: 4612.800 ms | 21311.1 tokens/sec
2024-10-29 19:14:41,945 - Iteration 1840 | Train Loss 6.23270 | Training Time: 0002h 25m 19s | Iteration Time: 6930.449 ms | 14184.4 tokens/sec
2024-10-29 19:18:16,601 - Iteration 1886 | Train Loss 6.51976 | Training Time: 0002h 28m 53s | Iteration Time: 4666.446 ms | 21066.1 tokens/sec
2024-10-29 19:19:48,947 - Iteration 1906 | Train Loss 6.58653 | Training Time: 0002h 30m 26s | Iteration Time: 4617.292 ms | 21290.4 tokens/sec
2024-10-29 19:21:21,300 - Iteration 1926 | Train Loss 6.53986 | Training Time: 0002h 31m 58s | Iteration Time: 4617.655 ms | 21288.7 tokens/sec
2024-10-29 19:22:53,652 - Iteration 1946 | Train Loss 6.55223 | Training Time: 0002h 33m 30s | Iteration Time: 4617.618 ms | 21288.9 tokens/sec
2024-10-29 19:24:25,998 - Iteration 1966 | Train Loss 6.63283 | Training Time: 0002h 35m 03s | Iteration Time: 4617.292 ms | 21290.4 tokens/sec
2024-10-29 19:25:58,351 - Iteration 1986 | Train Loss 6.47516 | Training Time: 0002h 36m 35s | Iteration Time: 4617.652 ms | 21288.7 tokens/sec
2024-10-29 19:27:30,693 - Iteration 2006 | Train Loss 6.64502 | Training Time: 0002h 38m 07s | Iteration Time: 4617.094 ms | 21291.3 tokens/sec
2024-10-29 19:29:03,037 - Iteration 2026 | Train Loss 6.55669 | Training Time: 0002h 39m 40s | Iteration Time: 4617.215 ms | 21290.8 tokens/sec
2024-10-29 19:30:35,403 - Iteration 2046 | Train Loss 6.48817 | Training Time: 0002h 41m 12s | Iteration Time: 4618.279 ms | 21285.8 tokens/sec
2024-10-29 19:32:07,754 - Iteration 2066 | Train Loss 6.56943 | Training Time: 0002h 42m 44s | Iteration Time: 4617.566 ms | 21289.1 tokens/sec
2024-10-29 19:33:40,085 - Iteration 2086 | Train Loss 6.57291 | Training Time: 0002h 44m 17s | Iteration Time: 4616.535 ms | 21293.9 tokens/sec
2024-10-29 19:35:12,434 - Iteration 2106 | Train Loss 6.66070 | Training Time: 0002h 45m 49s | Iteration Time: 4617.438 ms | 21289.7 tokens/sec
2024-10-29 19:36:44,803 - Iteration 2126 | Train Loss 6.35027 | Training Time: 0002h 47m 21s | Iteration Time: 4618.438 ms | 21285.1 tokens/sec
2024-10-29 19:38:17,167 - Iteration 2146 | Train Loss 6.52213 | Training Time: 0002h 48m 54s | Iteration Time: 4618.213 ms | 21286.2 tokens/sec
2024-10-29 19:39:49,538 - Iteration 2166 | Train Loss 6.35428 | Training Time: 0002h 50m 26s | Iteration Time: 4618.537 ms | 21284.7 tokens/sec
2024-10-29 19:41:21,899 - Iteration 2186 | Train Loss 6.38004 | Training Time: 0002h 51m 59s | Iteration Time: 4618.057 ms | 21286.9 tokens/sec
2024-10-29 19:42:54,236 - Iteration 2206 | Train Loss 6.37993 | Training Time: 0002h 53m 31s | Iteration Time: 4616.873 ms | 21292.3 tokens/sec
2024-10-29 19:44:26,586 - Iteration 2226 | Train Loss 6.38713 | Training Time: 0002h 55m 03s | Iteration Time: 4617.503 ms | 21289.4 tokens/sec
2024-10-29 19:45:58,959 - Iteration 2246 | Train Loss 6.48723 | Training Time: 0002h 56m 36s | Iteration Time: 4618.662 ms | 21284.1 tokens/sec
2024-10-29 19:47:31,635 - Iteration 2266 | Train Loss 6.46485 | Training Time: 0002h 58m 08s | Iteration Time: 4633.758 ms | 21214.7 tokens/sec
2024-10-29 19:49:04,206 - Iteration 2286 | Train Loss 6.49008 | Training Time: 0002h 59m 41s | Iteration Time: 4628.558 ms | 21238.6 tokens/sec
2024-10-29 19:50:36,577 - Iteration 2306 | Train Loss 6.44579 | Training Time: 0003h 01m 13s | Iteration Time: 4618.572 ms | 21284.5 tokens/sec
2024-10-29 19:52:08,928 - Iteration 2326 | Train Loss 6.47905 | Training Time: 0003h 02m 46s | Iteration Time: 4617.551 ms | 21289.2 tokens/sec
2024-10-29 19:53:41,257 - Iteration 2346 | Train Loss 6.54877 | Training Time: 0003h 04m 18s | Iteration Time: 4616.443 ms | 21294.3 tokens/sec
2024-10-29 19:55:13,575 - Iteration 2366 | Train Loss 6.53691 | Training Time: 0003h 05m 50s | Iteration Time: 4615.918 ms | 21296.7 tokens/sec
2024-10-29 19:56:45,886 - Iteration 2386 | Train Loss 6.39578 | Training Time: 0003h 07m 23s | Iteration Time: 4615.529 ms | 21298.5 tokens/sec
2024-10-29 19:57:59,722 - Iteration 2402 | Train Loss 6.22534 | Training Time: 0003h 08m 36s | Iteration Time: 4614.720 ms | 21302.3 tokens/sec
2024-10-29 22:27:53,073 - Iteration 2422 | Train Loss 6.40497 | Training Time: 0003h 10m 50s | Iteration Time: 55.039 ms | 1786068.4 tokens/sec
2024-10-29 22:29:24,418 - Iteration 2442 | Train Loss 6.43067 | Training Time: 0003h 12m 21s | Iteration Time: 4567.207 ms | 21523.9 tokens/sec
2024-10-29 22:30:56,008 - Iteration 2462 | Train Loss 6.24801 | Training Time: 0003h 13m 53s | Iteration Time: 4579.526 ms | 21466.0 tokens/sec
2024-10-29 22:32:27,851 - Iteration 2482 | Train Loss 6.40076 | Training Time: 0003h 15m 24s | Iteration Time: 4592.164 ms | 21406.9 tokens/sec
2024-10-29 22:33:59,861 - Iteration 2502 | Train Loss 6.82493 | Training Time: 0003h 16m 56s | Iteration Time: 4600.494 ms | 21368.1 tokens/sec
2024-10-29 22:34:13,674 - Iteration 2505 | Train Loss 6.16017 | Training Time: 0003h 17m 10s | Iteration Time: 4604.151 ms | 21351.2 tokens/sec
2024-10-29 22:34:20,544 - Iteration 2506 | Train Loss 6.13792 | Training Time: 0003h 17m 17s | Iteration Time: 6870.702 ms | 14307.7 tokens/sec
2024-10-29 22:38:40,914 - Iteration 2562 | Train Loss 6.41053 | Training Time: 0003h 21m 37s | Iteration Time: 4649.454 ms | 21143.1 tokens/sec
2024-10-29 22:40:13,252 - Iteration 2582 | Train Loss 6.33355 | Training Time: 0003h 23m 10s | Iteration Time: 4616.901 ms | 21292.2 tokens/sec
2024-10-29 22:41:45,621 - Iteration 2602 | Train Loss 6.43878 | Training Time: 0003h 24m 42s | Iteration Time: 4618.471 ms | 21285.0 tokens/sec
2024-10-29 22:43:18,042 - Iteration 2622 | Train Loss 6.46010 | Training Time: 0003h 26m 15s | Iteration Time: 4621.034 ms | 21273.2 tokens/sec
2024-10-29 22:44:50,581 - Iteration 2642 | Train Loss 6.36432 | Training Time: 0003h 27m 47s | Iteration Time: 4626.972 ms | 21245.9 tokens/sec
2024-10-29 22:46:23,109 - Iteration 2662 | Train Loss 6.42224 | Training Time: 0003h 29m 20s | Iteration Time: 4626.361 ms | 21248.7 tokens/sec
2024-10-29 22:47:55,644 - Iteration 2682 | Train Loss 6.45586 | Training Time: 0003h 30m 52s | Iteration Time: 4626.767 ms | 21246.8 tokens/sec
2024-10-29 22:49:28,214 - Iteration 2702 | Train Loss 6.48275 | Training Time: 0003h 32m 25s | Iteration Time: 4628.506 ms | 21238.8 tokens/sec
2024-10-29 22:51:00,756 - Iteration 2722 | Train Loss 6.45245 | Training Time: 0003h 33m 57s | Iteration Time: 4627.099 ms | 21245.3 tokens/sec
2024-10-29 22:52:33,333 - Iteration 2742 | Train Loss 6.25583 | Training Time: 0003h 35m 30s | Iteration Time: 4628.859 ms | 21237.2 tokens/sec
2024-10-29 22:54:05,950 - Iteration 2762 | Train Loss 6.49343 | Training Time: 0003h 37m 03s | Iteration Time: 4630.850 ms | 21228.1 tokens/sec
2024-10-29 22:55:38,707 - Iteration 2782 | Train Loss 6.43171 | Training Time: 0003h 38m 35s | Iteration Time: 4637.842 ms | 21196.1 tokens/sec
2024-10-29 22:57:11,557 - Iteration 2802 | Train Loss 6.25066 | Training Time: 0003h 40m 08s | Iteration Time: 4642.493 ms | 21174.8 tokens/sec
2024-10-29 22:58:44,533 - Iteration 2822 | Train Loss 6.23296 | Training Time: 0003h 41m 41s | Iteration Time: 4648.829 ms | 21146.0 tokens/sec
2024-10-29 23:00:17,586 - Iteration 2842 | Train Loss 6.22901 | Training Time: 0003h 43m 14s | Iteration Time: 4652.621 ms | 21128.7 tokens/sec
2024-10-29 23:01:04,141 - Iteration 2852 | Train Loss 6.12092 | Training Time: 0003h 44m 01s | Iteration Time: 4655.490 ms | 21115.7 tokens/sec
2024-10-29 23:01:48,480 - Iteration 2861 | Train Loss 6.11525 | Training Time: 0003h 44m 45s | Iteration Time: 4926.629 ms | 19953.6 tokens/sec
2024-10-29 23:05:01,732 - Iteration 2902 | Train Loss 6.42520 | Training Time: 0003h 47m 58s | Iteration Time: 4713.451 ms | 20856.1 tokens/sec
2024-10-29 23:06:34,934 - Iteration 2922 | Train Loss 6.34696 | Training Time: 0003h 49m 32s | Iteration Time: 4660.093 ms | 21094.9 tokens/sec
2024-10-29 23:08:08,053 - Iteration 2942 | Train Loss 6.31987 | Training Time: 0003h 51m 05s | Iteration Time: 4655.968 ms | 21113.5 tokens/sec
2024-10-29 23:08:59,185 - Iteration 2953 | Train Loss 5.97243 | Training Time: 0003h 51m 56s | Iteration Time: 4648.389 ms | 21148.0 tokens/sec
2024-10-29 23:11:16,089 - Iteration 2982 | Train Loss 6.19317 | Training Time: 0003h 54m 13s | Iteration Time: 4720.800 ms | 20823.6 tokens/sec
2024-10-29 23:12:48,826 - Iteration 3002 | Train Loss 6.31170 | Training Time: 0003h 55m 45s | Iteration Time: 4636.867 ms | 21200.5 tokens/sec
2024-10-29 23:14:21,499 - Iteration 3022 | Train Loss 6.25060 | Training Time: 0003h 57m 18s | Iteration Time: 4633.650 ms | 21215.2 tokens/sec
2024-10-29 23:15:54,114 - Iteration 3042 | Train Loss 6.18636 | Training Time: 0003h 58m 51s | Iteration Time: 4630.746 ms | 21228.5 tokens/sec
2024-10-29 23:17:26,706 - Iteration 3062 | Train Loss 6.14096 | Training Time: 0004h 00m 23s | Iteration Time: 4629.602 ms | 21233.8 tokens/sec
2024-10-29 23:18:59,288 - Iteration 3082 | Train Loss 6.15651 | Training Time: 0004h 01m 56s | Iteration Time: 4629.097 ms | 21236.1 tokens/sec
2024-10-29 23:20:31,867 - Iteration 3102 | Train Loss 6.26031 | Training Time: 0004h 03m 28s | Iteration Time: 4628.938 ms | 21236.8 tokens/sec
2024-10-29 23:22:04,385 - Iteration 3122 | Train Loss 6.18432 | Training Time: 0004h 05m 01s | Iteration Time: 4625.929 ms | 21250.7 tokens/sec
2024-10-29 23:23:36,912 - Iteration 3142 | Train Loss 6.15904 | Training Time: 0004h 06m 33s | Iteration Time: 4626.335 ms | 21248.8 tokens/sec
2024-10-29 23:25:09,419 - Iteration 3162 | Train Loss 6.19823 | Training Time: 0004h 08m 06s | Iteration Time: 4625.364 ms | 21253.2 tokens/sec
2024-10-29 23:26:41,898 - Iteration 3182 | Train Loss 6.22689 | Training Time: 0004h 09m 38s | Iteration Time: 4623.932 ms | 21259.8 tokens/sec
2024-10-29 23:28:14,372 - Iteration 3202 | Train Loss 6.26973 | Training Time: 0004h 11m 11s | Iteration Time: 4623.698 ms | 21260.9 tokens/sec
2024-10-29 23:29:46,865 - Iteration 3222 | Train Loss 6.22601 | Training Time: 0004h 12m 43s | Iteration Time: 4624.664 ms | 21256.5 tokens/sec
2024-10-29 23:31:19,339 - Iteration 3242 | Train Loss 6.21218 | Training Time: 0004h 14m 16s | Iteration Time: 4623.719 ms | 21260.8 tokens/sec
2024-10-29 23:32:51,797 - Iteration 3262 | Train Loss 6.19659 | Training Time: 0004h 15m 48s | Iteration Time: 4622.854 ms | 21264.8 tokens/sec
2024-10-29 23:34:24,268 - Iteration 3282 | Train Loss 6.21273 | Training Time: 0004h 17m 21s | Iteration Time: 4623.570 ms | 21261.5 tokens/sec
2024-10-29 23:35:05,874 - Iteration 3291 | Train Loss 5.66009 | Training Time: 0004h 18m 02s | Iteration Time: 4622.851 ms | 21264.8 tokens/sec
2024-10-29 23:37:31,495 - Iteration 3322 | Train Loss 6.07075 | Training Time: 0004h 20m 28s | Iteration Time: 4697.463 ms | 20927.0 tokens/sec
2024-10-29 23:39:03,990 - Iteration 3342 | Train Loss 6.12634 | Training Time: 0004h 22m 01s | Iteration Time: 4624.755 ms | 21256.0 tokens/sec
2024-10-29 23:40:36,523 - Iteration 3362 | Train Loss 6.13191 | Training Time: 0004h 23m 33s | Iteration Time: 4626.624 ms | 21247.5 tokens/sec
2024-10-29 23:42:09,042 - Iteration 3382 | Train Loss 6.08641 | Training Time: 0004h 25m 06s | Iteration Time: 4625.958 ms | 21250.5 tokens/sec
2024-10-29 23:43:41,523 - Iteration 3402 | Train Loss 6.30948 | Training Time: 0004h 26m 38s | Iteration Time: 4624.066 ms | 21259.2 tokens/sec
2024-10-29 23:45:14,024 - Iteration 3422 | Train Loss 6.20391 | Training Time: 0004h 28m 11s | Iteration Time: 4625.071 ms | 21254.6 tokens/sec
2024-10-29 23:46:46,524 - Iteration 3442 | Train Loss 6.21834 | Training Time: 0004h 29m 43s | Iteration Time: 4624.970 ms | 21255.1 tokens/sec
2024-10-29 23:48:19,025 - Iteration 3462 | Train Loss 6.10787 | Training Time: 0004h 31m 16s | Iteration Time: 4625.048 ms | 21254.7 tokens/sec
2024-10-29 23:49:51,524 - Iteration 3482 | Train Loss 6.20630 | Training Time: 0004h 32m 48s | Iteration Time: 4624.985 ms | 21255.0 tokens/sec
2024-10-29 23:51:24,025 - Iteration 3502 | Train Loss 6.16344 | Training Time: 0004h 34m 21s | Iteration Time: 4625.035 ms | 21254.8 tokens/sec
2024-10-29 23:52:56,523 - Iteration 3522 | Train Loss 6.10313 | Training Time: 0004h 35m 53s | Iteration Time: 4624.872 ms | 21255.5 tokens/sec
2024-10-29 23:54:29,020 - Iteration 3542 | Train Loss 6.06081 | Training Time: 0004h 37m 26s | Iteration Time: 4624.860 ms | 21255.6 tokens/sec
2024-10-29 23:56:01,448 - Iteration 3562 | Train Loss 6.04392 | Training Time: 0004h 38m 58s | Iteration Time: 4621.398 ms | 21271.5 tokens/sec
2024-10-29 23:57:33,873 - Iteration 3582 | Train Loss 6.04183 | Training Time: 0004h 40m 30s | Iteration Time: 4621.274 ms | 21272.1 tokens/sec
2024-10-29 23:59:06,274 - Iteration 3602 | Train Loss 6.09176 | Training Time: 0004h 42m 03s | Iteration Time: 4620.029 ms | 21277.8 tokens/sec
2024-10-30 00:00:38,653 - Iteration 3622 | Train Loss 6.06121 | Training Time: 0004h 43m 35s | Iteration Time: 4618.968 ms | 21282.7 tokens/sec
2024-10-30 00:02:11,002 - Iteration 3642 | Train Loss 6.05629 | Training Time: 0004h 45m 08s | Iteration Time: 4617.432 ms | 21289.8 tokens/sec
2024-10-30 00:03:43,432 - Iteration 3662 | Train Loss 6.32988 | Training Time: 0004h 46m 40s | Iteration Time: 4621.519 ms | 21270.9 tokens/sec
2024-10-30 00:05:15,868 - Iteration 3682 | Train Loss 6.27831 | Training Time: 0004h 48m 12s | Iteration Time: 4621.773 ms | 21269.8 tokens/sec
2024-10-30 00:06:48,363 - Iteration 3702 | Train Loss 6.29024 | Training Time: 0004h 49m 45s | Iteration Time: 4624.755 ms | 21256.0 tokens/sec
2024-10-30 00:08:20,836 - Iteration 3722 | Train Loss 6.18423 | Training Time: 0004h 51m 17s | Iteration Time: 4623.683 ms | 21261.0 tokens/sec
2024-10-30 00:09:53,318 - Iteration 3742 | Train Loss 6.11158 | Training Time: 0004h 52m 50s | Iteration Time: 4624.080 ms | 21259.1 tokens/sec
2024-10-30 00:11:25,797 - Iteration 3762 | Train Loss 6.06315 | Training Time: 0004h 54m 22s | Iteration Time: 4623.947 ms | 21259.8 tokens/sec
2024-10-30 00:12:58,254 - Iteration 3782 | Train Loss 6.08853 | Training Time: 0004h 55m 55s | Iteration Time: 4622.838 ms | 21264.9 tokens/sec
2024-10-30 00:14:30,705 - Iteration 3802 | Train Loss 6.08650 | Training Time: 0004h 57m 27s | Iteration Time: 4622.572 ms | 21266.1 tokens/sec
2024-10-30 00:16:03,122 - Iteration 3822 | Train Loss 6.11120 | Training Time: 0004h 59m 00s | Iteration Time: 4620.857 ms | 21274.0 tokens/sec
2024-10-30 00:17:35,558 - Iteration 3842 | Train Loss 6.05955 | Training Time: 0005h 00m 32s | Iteration Time: 4621.808 ms | 21269.6 tokens/sec
2024-10-30 00:19:07,985 - Iteration 3862 | Train Loss 6.01552 | Training Time: 0005h 02m 05s | Iteration Time: 4621.328 ms | 21271.8 tokens/sec
2024-10-30 00:20:40,373 - Iteration 3882 | Train Loss 6.00741 | Training Time: 0005h 03m 37s | Iteration Time: 4619.386 ms | 21280.8 tokens/sec
2024-10-30 00:22:12,796 - Iteration 3902 | Train Loss 6.16841 | Training Time: 0005h 05m 09s | Iteration Time: 4621.187 ms | 21272.5 tokens/sec
2024-10-30 00:23:45,244 - Iteration 3922 | Train Loss 6.13562 | Training Time: 0005h 06m 42s | Iteration Time: 4622.372 ms | 21267.0 tokens/sec
2024-10-30 00:25:17,709 - Iteration 3942 | Train Loss 6.04347 | Training Time: 0005h 08m 14s | Iteration Time: 4623.240 ms | 21263.0 tokens/sec
2024-10-30 00:26:50,156 - Iteration 3962 | Train Loss 6.01664 | Training Time: 0005h 09m 47s | Iteration Time: 4622.347 ms | 21267.1 tokens/sec
2024-10-30 00:28:22,589 - Iteration 3982 | Train Loss 6.08454 | Training Time: 0005h 11m 19s | Iteration Time: 4621.675 ms | 21270.2 tokens/sec
2024-10-30 00:29:55,020 - Iteration 4002 | Train Loss 6.13585 | Training Time: 0005h 12m 52s | Iteration Time: 4621.555 ms | 21270.8 tokens/sec
2024-10-30 00:31:27,428 - Iteration 4022 | Train Loss 6.04908 | Training Time: 0005h 14m 24s | Iteration Time: 4620.408 ms | 21276.0 tokens/sec
2024-10-30 00:32:59,857 - Iteration 4042 | Train Loss 6.01750 | Training Time: 0005h 15m 56s | Iteration Time: 4621.419 ms | 21271.4 tokens/sec
2024-10-30 00:34:32,256 - Iteration 4062 | Train Loss 5.97639 | Training Time: 0005h 17m 29s | Iteration Time: 4619.937 ms | 21278.2 tokens/sec
2024-10-30 00:36:04,675 - Iteration 4082 | Train Loss 5.93488 | Training Time: 0005h 19m 01s | Iteration Time: 4620.974 ms | 21273.4 tokens/sec
2024-10-30 00:37:37,066 - Iteration 4102 | Train Loss 5.95953 | Training Time: 0005h 20m 34s | Iteration Time: 4619.567 ms | 21279.9 tokens/sec
2024-10-30 00:39:09,453 - Iteration 4122 | Train Loss 5.81741 | Training Time: 0005h 22m 06s | Iteration Time: 4619.355 ms | 21280.9 tokens/sec
2024-10-30 00:40:41,841 - Iteration 4142 | Train Loss 6.06903 | Training Time: 0005h 23m 38s | Iteration Time: 4619.357 ms | 21280.9 tokens/sec
2024-10-30 00:42:14,208 - Iteration 4162 | Train Loss 6.18575 | Training Time: 0005h 25m 11s | Iteration Time: 4618.387 ms | 21285.4 tokens/sec
2024-10-30 00:43:46,557 - Iteration 4182 | Train Loss 6.03979 | Training Time: 0005h 26m 43s | Iteration Time: 4617.444 ms | 21289.7 tokens/sec
2024-10-30 00:45:18,906 - Iteration 4202 | Train Loss 6.00655 | Training Time: 0005h 28m 15s | Iteration Time: 4617.419 ms | 21289.8 tokens/sec
2024-10-30 00:46:51,234 - Iteration 4222 | Train Loss 6.19167 | Training Time: 0005h 29m 48s | Iteration Time: 4616.424 ms | 21294.4 tokens/sec
2024-10-30 00:48:23,588 - Iteration 4242 | Train Loss 6.07047 | Training Time: 0005h 31m 20s | Iteration Time: 4617.703 ms | 21288.5 tokens/sec
2024-10-30 00:49:55,967 - Iteration 4262 | Train Loss 5.89905 | Training Time: 0005h 32m 53s | Iteration Time: 4618.932 ms | 21282.8 tokens/sec
2024-10-30 00:51:28,322 - Iteration 4282 | Train Loss 5.85041 | Training Time: 0005h 34m 25s | Iteration Time: 4617.753 ms | 21288.3 tokens/sec
2024-10-30 00:53:00,663 - Iteration 4302 | Train Loss 5.89892 | Training Time: 0005h 35m 57s | Iteration Time: 4617.045 ms | 21291.5 tokens/sec
2024-10-30 00:54:32,977 - Iteration 4322 | Train Loss 5.85005 | Training Time: 0005h 37m 30s | Iteration Time: 4615.730 ms | 21297.6 tokens/sec
2024-10-30 00:56:05,283 - Iteration 4342 | Train Loss 5.82253 | Training Time: 0005h 39m 02s | Iteration Time: 4615.263 ms | 21299.8 tokens/sec
2024-10-30 00:57:37,586 - Iteration 4362 | Train Loss 5.78810 | Training Time: 0005h 40m 34s | Iteration Time: 4615.174 ms | 21300.2 tokens/sec
2024-10-30 00:59:09,887 - Iteration 4382 | Train Loss 6.01578 | Training Time: 0005h 42m 06s | Iteration Time: 4615.069 ms | 21300.7 tokens/sec
2024-10-30 01:00:42,218 - Iteration 4402 | Train Loss 5.96479 | Training Time: 0005h 43m 39s | Iteration Time: 4616.533 ms | 21293.9 tokens/sec
2024-10-30 01:02:14,599 - Iteration 4422 | Train Loss 6.01441 | Training Time: 0005h 45m 11s | Iteration Time: 4619.046 ms | 21282.3 tokens/sec
2024-10-30 01:03:46,993 - Iteration 4442 | Train Loss 6.00134 | Training Time: 0005h 46m 44s | Iteration Time: 4619.676 ms | 21279.4 tokens/sec
2024-10-30 01:05:19,293 - Iteration 4462 | Train Loss 5.99100 | Training Time: 0005h 48m 16s | Iteration Time: 4615.019 ms | 21300.9 tokens/sec
2024-10-30 01:06:51,572 - Iteration 4482 | Train Loss 6.13110 | Training Time: 0005h 49m 48s | Iteration Time: 4613.953 ms | 21305.8 tokens/sec
2024-10-30 01:08:23,781 - Iteration 4502 | Train Loss 5.91770 | Training Time: 0005h 51m 20s | Iteration Time: 4610.473 ms | 21321.9 tokens/sec
2024-10-30 01:09:55,966 - Iteration 4522 | Train Loss 5.96916 | Training Time: 0005h 52m 53s | Iteration Time: 4609.228 ms | 21327.6 tokens/sec
2024-10-30 01:11:28,169 - Iteration 4542 | Train Loss 5.83812 | Training Time: 0005h 54m 25s | Iteration Time: 4610.151 ms | 21323.4 tokens/sec
2024-10-30 01:13:00,370 - Iteration 4562 | Train Loss 5.78704 | Training Time: 0005h 55m 57s | Iteration Time: 4610.048 ms | 21323.9 tokens/sec
2024-10-30 01:14:32,542 - Iteration 4582 | Train Loss 5.79600 | Training Time: 0005h 57m 29s | Iteration Time: 4608.614 ms | 21330.5 tokens/sec
2024-10-30 01:16:04,747 - Iteration 4602 | Train Loss 5.75460 | Training Time: 0005h 59m 01s | Iteration Time: 4610.217 ms | 21323.1 tokens/sec
2024-10-30 01:17:36,930 - Iteration 4622 | Train Loss 6.00025 | Training Time: 0006h 00m 34s | Iteration Time: 4609.191 ms | 21327.8 tokens/sec
2024-10-30 01:19:09,106 - Iteration 4642 | Train Loss 5.98690 | Training Time: 0006h 02m 06s | Iteration Time: 4608.769 ms | 21329.8 tokens/sec
2024-10-30 01:20:41,265 - Iteration 4662 | Train Loss 6.03322 | Training Time: 0006h 03m 38s | Iteration Time: 4607.983 ms | 21333.4 tokens/sec
2024-10-30 01:22:13,415 - Iteration 4682 | Train Loss 5.97193 | Training Time: 0006h 05m 10s | Iteration Time: 4607.491 ms | 21335.7 tokens/sec
2024-10-30 01:23:45,557 - Iteration 4702 | Train Loss 5.99387 | Training Time: 0006h 06m 42s | Iteration Time: 4607.086 ms | 21337.6 tokens/sec
2024-10-30 01:25:17,715 - Iteration 4722 | Train Loss 5.90443 | Training Time: 0006h 08m 14s | Iteration Time: 4607.911 ms | 21333.7 tokens/sec
2024-10-30 01:26:49,943 - Iteration 4742 | Train Loss 5.96473 | Training Time: 0006h 09m 47s | Iteration Time: 4611.415 ms | 21317.5 tokens/sec
2024-10-30 01:28:22,158 - Iteration 4762 | Train Loss 5.86822 | Training Time: 0006h 11m 19s | Iteration Time: 4610.723 ms | 21320.7 tokens/sec
2024-10-30 01:29:54,348 - Iteration 4782 | Train Loss 5.86507 | Training Time: 0006h 12m 51s | Iteration Time: 4609.496 ms | 21326.4 tokens/sec
2024-10-30 01:31:26,553 - Iteration 4802 | Train Loss 5.90376 | Training Time: 0006h 14m 23s | Iteration Time: 4610.233 ms | 21323.0 tokens/sec
2024-10-30 01:32:58,732 - Iteration 4822 | Train Loss 5.79209 | Training Time: 0006h 15m 55s | Iteration Time: 4608.959 ms | 21328.9 tokens/sec
2024-10-30 01:34:30,903 - Iteration 4842 | Train Loss 5.66366 | Training Time: 0006h 17m 27s | Iteration Time: 4608.583 ms | 21330.6 tokens/sec
2024-10-30 01:36:03,069 - Iteration 4862 | Train Loss 5.83095 | Training Time: 0006h 19m 00s | Iteration Time: 4608.258 ms | 21332.1 tokens/sec
2024-10-30 01:37:35,212 - Iteration 4882 | Train Loss 5.96545 | Training Time: 0006h 20m 32s | Iteration Time: 4607.162 ms | 21337.2 tokens/sec
2024-10-30 01:39:07,337 - Iteration 4902 | Train Loss 5.87133 | Training Time: 0006h 22m 04s | Iteration Time: 4606.242 ms | 21341.5 tokens/sec
2024-10-30 01:40:39,486 - Iteration 4922 | Train Loss 6.04585 | Training Time: 0006h 23m 36s | Iteration Time: 4607.471 ms | 21335.8 tokens/sec
2024-10-30 01:42:11,730 - Iteration 4942 | Train Loss 5.80967 | Training Time: 0006h 25m 08s | Iteration Time: 4612.179 ms | 21314.0 tokens/sec
2024-10-30 01:43:44,022 - Iteration 4962 | Train Loss 5.85180 | Training Time: 0006h 26m 41s | Iteration Time: 4614.598 ms | 21302.8 tokens/sec
2024-10-30 01:45:16,277 - Iteration 4982 | Train Loss 5.90676 | Training Time: 0006h 28m 13s | Iteration Time: 4612.766 ms | 21311.3 tokens/sec
2024-10-30 01:46:48,467 - Iteration 5002 | Train Loss 6.00574 | Training Time: 0006h 29m 45s | Iteration Time: 4609.502 ms | 21326.4 tokens/sec
2024-10-30 01:48:20,653 - Iteration 5022 | Train Loss 5.85083 | Training Time: 0006h 31m 17s | Iteration Time: 4609.287 ms | 21327.4 tokens/sec
2024-10-30 01:49:52,837 - Iteration 5042 | Train Loss 5.84287 | Training Time: 0006h 32m 49s | Iteration Time: 4609.197 ms | 21327.8 tokens/sec
2024-10-30 01:51:25,006 - Iteration 5062 | Train Loss 5.79267 | Training Time: 0006h 34m 22s | Iteration Time: 4608.482 ms | 21331.1 tokens/sec
2024-10-30 01:52:57,147 - Iteration 5082 | Train Loss 5.72726 | Training Time: 0006h 35m 54s | Iteration Time: 4607.022 ms | 21337.9 tokens/sec
2024-10-30 01:53:29,384 - Iteration 5089 | Train Loss 5.62228 | Training Time: 0006h 36m 26s | Iteration Time: 4605.373 ms | 21345.5 tokens/sec
2024-10-30 01:55:03,832 - Iteration 5109 | Train Loss 5.61131 | Training Time: 0006h 38m 00s | Iteration Time: 4722.412 ms | 20816.5 tokens/sec
2024-11-04 09:32:44,209 - Iteration 5129 | Train Loss 5.82904 | Training Time: 0006h 40m 16s | Iteration Time: 26.384 ms | 3725861.0 tokens/sec
2024-11-04 09:34:15,240 - Iteration 5149 | Train Loss 5.86880 | Training Time: 0006h 41m 47s | Iteration Time: 4551.558 ms | 21597.9 tokens/sec
2024-11-04 09:35:46,370 - Iteration 5169 | Train Loss 5.76386 | Training Time: 0006h 43m 18s | Iteration Time: 4556.489 ms | 21574.5 tokens/sec
2024-11-04 09:37:17,661 - Iteration 5189 | Train Loss 5.73963 | Training Time: 0006h 44m 49s | Iteration Time: 4564.567 ms | 21536.3 tokens/sec
2024-11-04 09:38:49,148 - Iteration 5209 | Train Loss 5.75842 | Training Time: 0006h 46m 21s | Iteration Time: 4574.319 ms | 21490.4 tokens/sec
2024-11-04 09:40:20,763 - Iteration 5229 | Train Loss 5.78024 | Training Time: 0006h 47m 52s | Iteration Time: 4580.776 ms | 21460.1 tokens/sec
2024-11-04 09:41:15,767 - Iteration 5241 | Train Loss 5.60895 | Training Time: 0006h 48m 47s | Iteration Time: 4583.629 ms | 21446.8 tokens/sec
2024-11-04 09:41:36,314 - Iteration 5245 | Train Loss 5.59426 | Training Time: 0006h 49m 08s | Iteration Time: 5136.715 ms | 19137.5 tokens/sec
2024-11-04 09:44:09,847 - Iteration 5278 | Train Loss 5.57856 | Training Time: 0006h 51m 41s | Iteration Time: 4652.535 ms | 21129.1 tokens/sec
2024-11-04 09:46:34,282 - Iteration 5309 | Train Loss 5.98174 | Training Time: 0006h 54m 06s | Iteration Time: 4659.169 ms | 21099.0 tokens/sec
2024-11-04 09:48:06,079 - Iteration 5329 | Train Loss 5.77732 | Training Time: 0006h 55m 38s | Iteration Time: 4589.875 ms | 21417.6 tokens/sec
2024-11-04 09:49:37,925 - Iteration 5349 | Train Loss 5.77669 | Training Time: 0006h 57m 09s | Iteration Time: 4592.311 ms | 21406.2 tokens/sec
2024-11-04 09:51:09,816 - Iteration 5369 | Train Loss 5.73202 | Training Time: 0006h 58m 41s | Iteration Time: 4594.531 ms | 21395.9 tokens/sec
2024-11-04 09:52:41,779 - Iteration 5389 | Train Loss 5.80626 | Training Time: 0007h 00m 13s | Iteration Time: 4598.139 ms | 21379.1 tokens/sec
2024-11-04 09:54:13,720 - Iteration 5409 | Train Loss 5.85458 | Training Time: 0007h 01m 45s | Iteration Time: 4597.081 ms | 21384.0 tokens/sec
2024-11-04 09:55:22,662 - Iteration 5424 | Train Loss 5.51824 | Training Time: 0007h 02m 54s | Iteration Time: 4596.126 ms | 21388.4 tokens/sec
2024-11-04 09:57:19,815 - Iteration 5449 | Train Loss 5.72264 | Training Time: 0007h 04m 51s | Iteration Time: 4686.134 ms | 20977.6 tokens/sec
2024-11-04 09:58:51,702 - Iteration 5469 | Train Loss 5.69047 | Training Time: 0007h 06m 23s | Iteration Time: 4594.310 ms | 21396.9 tokens/sec
2024-11-04 10:00:23,610 - Iteration 5489 | Train Loss 5.68797 | Training Time: 0007h 07m 55s | Iteration Time: 4595.417 ms | 21391.7 tokens/sec
2024-11-04 10:01:55,562 - Iteration 5509 | Train Loss 5.71000 | Training Time: 0007h 09m 27s | Iteration Time: 4597.581 ms | 21381.7 tokens/sec
2024-11-04 10:03:27,604 - Iteration 5529 | Train Loss 5.67435 | Training Time: 0007h 10m 59s | Iteration Time: 4602.106 ms | 21360.7 tokens/sec
2024-11-04 10:04:23,228 - Iteration 5541 | Train Loss 5.49994 | Training Time: 0007h 11m 55s | Iteration Time: 4635.336 ms | 21207.5 tokens/sec
2024-11-04 10:06:34,697 - Iteration 5569 | Train Loss 5.69852 | Training Time: 0007h 14m 06s | Iteration Time: 4695.341 ms | 20936.5 tokens/sec
2024-11-04 10:08:06,843 - Iteration 5589 | Train Loss 5.75027 | Training Time: 0007h 15m 38s | Iteration Time: 4607.288 ms | 21336.6 tokens/sec
2024-11-04 10:09:38,929 - Iteration 5609 | Train Loss 5.73181 | Training Time: 0007h 17m 10s | Iteration Time: 4604.287 ms | 21350.5 tokens/sec
2024-11-04 10:11:10,972 - Iteration 5629 | Train Loss 5.69695 | Training Time: 0007h 18m 42s | Iteration Time: 4602.143 ms | 21360.5 tokens/sec
2024-11-04 10:12:43,021 - Iteration 5649 | Train Loss 5.69728 | Training Time: 0007h 20m 15s | Iteration Time: 4602.466 ms | 21359.0 tokens/sec
2024-11-04 10:14:15,159 - Iteration 5669 | Train Loss 5.59411 | Training Time: 0007h 21m 47s | Iteration Time: 4606.915 ms | 21338.4 tokens/sec
2024-11-04 10:15:47,296 - Iteration 5689 | Train Loss 5.63823 | Training Time: 0007h 23m 19s | Iteration Time: 4606.840 ms | 21338.7 tokens/sec
2024-11-04 10:17:19,449 - Iteration 5709 | Train Loss 5.57643 | Training Time: 0007h 24m 51s | Iteration Time: 4607.652 ms | 21334.9 tokens/sec
2024-11-04 10:18:51,585 - Iteration 5729 | Train Loss 5.60049 | Training Time: 0007h 26m 23s | Iteration Time: 4606.808 ms | 21338.9 tokens/sec
2024-11-04 10:19:28,438 - Iteration 5737 | Train Loss 5.45212 | Training Time: 0007h 27m 00s | Iteration Time: 4606.520 ms | 21340.2 tokens/sec
2024-11-04 10:21:21,349 - Iteration 5761 | Train Loss 5.43986 | Training Time: 0007h 28m 53s | Iteration Time: 4704.650 ms | 20895.1 tokens/sec
2024-11-04 10:22:18,892 - Iteration 5773 | Train Loss 5.41356 | Training Time: 0007h 29m 50s | Iteration Time: 4795.232 ms | 20500.4 tokens/sec
2024-11-04 10:22:58,018 - Iteration 5781 | Train Loss 5.40781 | Training Time: 0007h 30m 30s | Iteration Time: 4890.800 ms | 20099.8 tokens/sec
2024-11-04 10:23:09,522 - Iteration 5783 | Train Loss 5.39181 | Training Time: 0007h 30m 41s | Iteration Time: 5751.893 ms | 17090.7 tokens/sec
2024-11-04 10:28:15,830 - Iteration 5849 | Train Loss 5.80402 | Training Time: 0007h 35m 47s | Iteration Time: 4641.027 ms | 21181.5 tokens/sec
2024-11-04 10:29:48,022 - Iteration 5869 | Train Loss 5.73025 | Training Time: 0007h 37m 20s | Iteration Time: 4609.609 ms | 21325.9 tokens/sec
2024-11-04 10:31:20,265 - Iteration 5889 | Train Loss 5.56323 | Training Time: 0007h 38m 52s | Iteration Time: 4612.143 ms | 21314.2 tokens/sec
2024-11-04 10:32:52,569 - Iteration 5909 | Train Loss 5.60400 | Training Time: 0007h 40m 24s | Iteration Time: 4615.193 ms | 21300.1 tokens/sec
2024-11-04 10:34:24,827 - Iteration 5929 | Train Loss 5.51655 | Training Time: 0007h 41m 56s | Iteration Time: 4612.902 ms | 21310.7 tokens/sec
2024-11-04 10:35:57,108 - Iteration 5949 | Train Loss 5.49409 | Training Time: 0007h 43m 29s | Iteration Time: 4614.043 ms | 21305.4 tokens/sec
2024-11-04 10:37:29,445 - Iteration 5969 | Train Loss 5.54874 | Training Time: 0007h 45m 01s | Iteration Time: 4616.853 ms | 21292.4 tokens/sec
2024-11-04 10:38:52,540 - Iteration 5987 | Train Loss 5.36333 | Training Time: 0007h 46m 24s | Iteration Time: 4616.419 ms | 21294.4 tokens/sec
2024-11-04 10:40:13,233 - Iteration 6004 | Train Loss 5.36152 | Training Time: 0007h 47m 45s | Iteration Time: 4746.654 ms | 20710.2 tokens/sec
2024-11-04 10:41:34,069 - Iteration 6021 | Train Loss 5.31261 | Training Time: 0007h 49m 06s | Iteration Time: 4755.008 ms | 20673.8 tokens/sec
2024-11-04 10:43:45,849 - Iteration 6049 | Train Loss 5.73209 | Training Time: 0007h 51m 17s | Iteration Time: 4706.430 ms | 20887.2 tokens/sec
2024-11-04 10:45:18,619 - Iteration 6069 | Train Loss 5.49814 | Training Time: 0007h 52m 50s | Iteration Time: 4638.521 ms | 21193.0 tokens/sec
2024-11-04 10:46:51,390 - Iteration 6089 | Train Loss 5.53364 | Training Time: 0007h 54m 23s | Iteration Time: 4638.557 ms | 21192.8 tokens/sec
2024-11-04 10:48:24,158 - Iteration 6109 | Train Loss 5.50466 | Training Time: 0007h 55m 56s | Iteration Time: 4638.411 ms | 21193.5 tokens/sec
2024-11-04 10:49:56,984 - Iteration 6129 | Train Loss 5.61601 | Training Time: 0007h 57m 29s | Iteration Time: 4641.265 ms | 21180.4 tokens/sec
2024-11-04 10:51:29,835 - Iteration 6149 | Train Loss 5.53921 | Training Time: 0007h 59m 01s | Iteration Time: 4642.590 ms | 21174.4 tokens/sec
2024-11-04 10:53:02,615 - Iteration 6169 | Train Loss 5.30852 | Training Time: 0008h 00m 34s | Iteration Time: 4638.987 ms | 21190.8 tokens/sec
2024-11-04 10:53:55,738 - Iteration 6180 | Train Loss 5.01856 | Training Time: 0008h 01m 27s | Iteration Time: 4829.315 ms | 20355.7 tokens/sec
2024-11-04 10:56:11,885 - Iteration 6209 | Train Loss 5.54393 | Training Time: 0008h 03m 43s | Iteration Time: 4694.737 ms | 20939.2 tokens/sec
2024-11-04 10:57:44,269 - Iteration 6229 | Train Loss 5.42007 | Training Time: 0008h 05m 16s | Iteration Time: 4619.220 ms | 21281.5 tokens/sec
2024-11-04 10:59:16,653 - Iteration 6249 | Train Loss 5.43774 | Training Time: 0008h 06m 48s | Iteration Time: 4619.157 ms | 21281.8 tokens/sec
2024-11-04 11:00:49,041 - Iteration 6269 | Train Loss 5.32261 | Training Time: 0008h 08m 21s | Iteration Time: 4619.446 ms | 21280.5 tokens/sec
2024-11-04 11:02:21,407 - Iteration 6289 | Train Loss 5.38596 | Training Time: 0008h 09m 53s | Iteration Time: 4618.263 ms | 21285.9 tokens/sec
2024-11-04 11:03:53,766 - Iteration 6309 | Train Loss 5.40726 | Training Time: 0008h 11m 25s | Iteration Time: 4617.984 ms | 21287.2 tokens/sec
2024-11-04 11:05:26,130 - Iteration 6329 | Train Loss 5.44101 | Training Time: 0008h 12m 58s | Iteration Time: 4618.199 ms | 21286.2 tokens/sec
2024-11-04 11:06:58,485 - Iteration 6349 | Train Loss 5.45768 | Training Time: 0008h 14m 30s | Iteration Time: 4617.734 ms | 21288.4 tokens/sec
2024-11-04 11:08:30,827 - Iteration 6369 | Train Loss 5.39837 | Training Time: 0008h 16m 02s | Iteration Time: 4617.085 ms | 21291.4 tokens/sec
2024-11-04 11:10:03,136 - Iteration 6389 | Train Loss 5.39045 | Training Time: 0008h 17m 35s | Iteration Time: 4615.455 ms | 21298.9 tokens/sec
2024-11-04 11:11:35,435 - Iteration 6409 | Train Loss 5.36735 | Training Time: 0008h 19m 07s | Iteration Time: 4614.966 ms | 21301.1 tokens/sec
2024-11-04 11:13:07,776 - Iteration 6429 | Train Loss 5.17083 | Training Time: 0008h 20m 39s | Iteration Time: 4617.062 ms | 21291.5 tokens/sec
2024-11-04 11:14:40,105 - Iteration 6449 | Train Loss 5.38180 | Training Time: 0008h 22m 12s | Iteration Time: 4616.407 ms | 21294.5 tokens/sec
2024-11-04 11:16:12,430 - Iteration 6469 | Train Loss 5.39714 | Training Time: 0008h 23m 44s | Iteration Time: 4616.294 ms | 21295.0 tokens/sec
2024-11-04 11:17:44,795 - Iteration 6489 | Train Loss 5.14122 | Training Time: 0008h 25m 16s | Iteration Time: 4618.221 ms | 21286.1 tokens/sec
2024-11-04 11:19:17,204 - Iteration 6509 | Train Loss 5.25956 | Training Time: 0008h 26m 49s | Iteration Time: 4620.451 ms | 21275.8 tokens/sec
2024-11-04 11:20:49,632 - Iteration 6529 | Train Loss 5.24553 | Training Time: 0008h 28m 21s | Iteration Time: 4621.429 ms | 21271.3 tokens/sec
2024-11-04 11:22:22,131 - Iteration 6549 | Train Loss 5.47845 | Training Time: 0008h 29m 54s | Iteration Time: 4624.916 ms | 21255.3 tokens/sec
2024-11-04 11:23:54,682 - Iteration 6569 | Train Loss 5.46824 | Training Time: 0008h 31m 26s | Iteration Time: 4627.542 ms | 21243.2 tokens/sec
2024-11-04 11:25:27,267 - Iteration 6589 | Train Loss 5.40013 | Training Time: 0008h 32m 59s | Iteration Time: 4629.274 ms | 21235.3 tokens/sec
2024-11-04 11:26:59,861 - Iteration 6609 | Train Loss 5.33486 | Training Time: 0008h 34m 31s | Iteration Time: 4629.703 ms | 21233.3 tokens/sec
2024-11-04 11:28:32,452 - Iteration 6629 | Train Loss 5.47648 | Training Time: 0008h 36m 04s | Iteration Time: 4629.536 ms | 21234.1 tokens/sec
2024-11-04 11:30:05,005 - Iteration 6649 | Train Loss 5.34442 | Training Time: 0008h 37m 37s | Iteration Time: 4627.663 ms | 21242.7 tokens/sec
2024-11-04 11:31:37,545 - Iteration 6669 | Train Loss 5.34302 | Training Time: 0008h 39m 09s | Iteration Time: 4626.978 ms | 21245.8 tokens/sec
2024-11-04 11:33:10,082 - Iteration 6689 | Train Loss 5.34934 | Training Time: 0008h 40m 42s | Iteration Time: 4626.863 ms | 21246.4 tokens/sec
2024-11-04 11:34:42,630 - Iteration 6709 | Train Loss 5.20854 | Training Time: 0008h 42m 14s | Iteration Time: 4627.417 ms | 21243.8 tokens/sec
2024-11-04 11:36:15,195 - Iteration 6729 | Train Loss 5.24401 | Training Time: 0008h 43m 47s | Iteration Time: 4628.256 ms | 21240.0 tokens/sec
2024-11-04 11:37:47,764 - Iteration 6749 | Train Loss 5.20870 | Training Time: 0008h 45m 19s | Iteration Time: 4628.448 ms | 21239.1 tokens/sec
2024-11-04 11:39:20,368 - Iteration 6769 | Train Loss 5.30380 | Training Time: 0008h 46m 52s | Iteration Time: 4630.200 ms | 21231.0 tokens/sec
2024-11-04 11:40:52,915 - Iteration 6789 | Train Loss 5.20719 | Training Time: 0008h 48m 24s | Iteration Time: 4627.319 ms | 21244.3 tokens/sec
2024-11-04 11:42:25,436 - Iteration 6809 | Train Loss 5.37009 | Training Time: 0008h 49m 57s | Iteration Time: 4626.061 ms | 21250.0 tokens/sec
2024-11-04 11:43:57,969 - Iteration 6829 | Train Loss 5.36136 | Training Time: 0008h 51m 29s | Iteration Time: 4626.643 ms | 21247.4 tokens/sec
2024-11-04 11:45:30,483 - Iteration 6849 | Train Loss 5.40302 | Training Time: 0008h 53m 02s | Iteration Time: 4625.728 ms | 21251.6 tokens/sec
2024-11-04 11:47:03,016 - Iteration 6869 | Train Loss 5.40331 | Training Time: 0008h 54m 35s | Iteration Time: 4626.640 ms | 21247.4 tokens/sec
2024-11-04 11:48:35,504 - Iteration 6889 | Train Loss 5.29570 | Training Time: 0008h 56m 07s | Iteration Time: 4624.390 ms | 21257.7 tokens/sec
2024-11-04 11:50:07,997 - Iteration 6909 | Train Loss 5.35242 | Training Time: 0008h 57m 40s | Iteration Time: 4624.639 ms | 21256.6 tokens/sec
2024-11-04 11:51:40,486 - Iteration 6929 | Train Loss 5.26465 | Training Time: 0008h 59m 12s | Iteration Time: 4624.453 ms | 21257.4 tokens/sec
2024-11-04 11:53:12,973 - Iteration 6949 | Train Loss 5.21056 | Training Time: 0009h 00m 44s | Iteration Time: 4624.348 ms | 21257.9 tokens/sec
2024-11-04 11:54:45,479 - Iteration 6969 | Train Loss 5.22475 | Training Time: 0009h 02m 17s | Iteration Time: 4625.327 ms | 21253.4 tokens/sec
2024-11-04 11:56:17,961 - Iteration 6989 | Train Loss 5.11205 | Training Time: 0009h 03m 49s | Iteration Time: 4624.103 ms | 21259.0 tokens/sec
2024-11-04 11:56:54,950 - Iteration 6997 | Train Loss 4.98811 | Training Time: 0009h 04m 26s | Iteration Time: 4623.556 ms | 21261.6 tokens/sec
2024-11-04 11:59:25,237 - Iteration 7029 | Train Loss 5.11885 | Training Time: 0009h 06m 57s | Iteration Time: 4696.461 ms | 20931.5 tokens/sec
2024-11-04 12:00:57,757 - Iteration 7049 | Train Loss 5.14043 | Training Time: 0009h 08m 29s | Iteration Time: 4626.001 ms | 21250.3 tokens/sec
2024-11-04 12:02:30,297 - Iteration 7069 | Train Loss 5.26136 | Training Time: 0009h 10m 02s | Iteration Time: 4627.005 ms | 21245.7 tokens/sec
2024-11-04 12:04:02,797 - Iteration 7089 | Train Loss 5.32255 | Training Time: 0009h 11m 34s | Iteration Time: 4625.028 ms | 21254.8 tokens/sec
2024-11-04 12:05:35,305 - Iteration 7109 | Train Loss 5.29391 | Training Time: 0009h 13m 07s | Iteration Time: 4625.380 ms | 21253.2 tokens/sec
2024-11-04 12:07:07,831 - Iteration 7129 | Train Loss 5.28592 | Training Time: 0009h 14m 39s | Iteration Time: 4626.282 ms | 21249.0 tokens/sec
2024-11-04 12:08:40,400 - Iteration 7149 | Train Loss 5.29163 | Training Time: 0009h 16m 12s | Iteration Time: 4628.494 ms | 21238.9 tokens/sec
2024-11-04 12:10:12,932 - Iteration 7169 | Train Loss 5.27817 | Training Time: 0009h 17m 44s | Iteration Time: 4626.568 ms | 21247.7 tokens/sec
2024-11-04 12:11:45,438 - Iteration 7189 | Train Loss 5.02805 | Training Time: 0009h 19m 17s | Iteration Time: 4625.311 ms | 21253.5 tokens/sec
2024-11-04 12:13:17,934 - Iteration 7209 | Train Loss 5.19487 | Training Time: 0009h 20m 49s | Iteration Time: 4624.772 ms | 21256.0 tokens/sec
2024-11-04 12:14:50,436 - Iteration 7229 | Train Loss 5.12891 | Training Time: 0009h 22m 22s | Iteration Time: 4625.118 ms | 21254.4 tokens/sec
2024-11-04 12:15:22,819 - Iteration 7236 | Train Loss 4.93128 | Training Time: 0009h 22m 54s | Iteration Time: 4626.190 ms | 21249.5 tokens/sec
2024-11-04 12:17:57,730 - Iteration 7269 | Train Loss 5.05816 | Training Time: 0009h 25m 29s | Iteration Time: 4694.274 ms | 20941.3 tokens/sec
2024-11-04 12:19:30,284 - Iteration 7289 | Train Loss 5.32703 | Training Time: 0009h 27m 02s | Iteration Time: 4627.680 ms | 21242.6 tokens/sec
2024-11-04 12:21:02,834 - Iteration 7309 | Train Loss 5.22303 | Training Time: 0009h 28m 34s | Iteration Time: 4627.533 ms | 21243.3 tokens/sec
2024-11-04 12:22:35,363 - Iteration 7329 | Train Loss 5.20430 | Training Time: 0009h 30m 07s | Iteration Time: 4626.444 ms | 21248.3 tokens/sec
2024-11-04 12:24:07,892 - Iteration 7349 | Train Loss 5.04752 | Training Time: 0009h 31m 39s | Iteration Time: 4626.455 ms | 21248.2 tokens/sec
2024-11-04 12:25:40,384 - Iteration 7369 | Train Loss 5.18472 | Training Time: 0009h 33m 12s | Iteration Time: 4624.588 ms | 21256.8 tokens/sec
2024-11-04 12:27:12,913 - Iteration 7389 | Train Loss 5.19552 | Training Time: 0009h 34m 44s | Iteration Time: 4626.414 ms | 21248.4 tokens/sec
2024-11-04 12:28:45,452 - Iteration 7409 | Train Loss 5.13426 | Training Time: 0009h 36m 17s | Iteration Time: 4626.982 ms | 21245.8 tokens/sec
2024-11-04 12:30:18,007 - Iteration 7429 | Train Loss 5.01422 | Training Time: 0009h 37m 50s | Iteration Time: 4627.725 ms | 21242.4 tokens/sec
2024-11-04 12:31:50,549 - Iteration 7449 | Train Loss 5.13491 | Training Time: 0009h 39m 22s | Iteration Time: 4627.110 ms | 21245.2 tokens/sec
2024-11-04 12:33:23,057 - Iteration 7469 | Train Loss 4.96263 | Training Time: 0009h 40m 55s | Iteration Time: 4625.416 ms | 21253.0 tokens/sec
2024-11-04 12:33:32,308 - Iteration 7471 | Train Loss 4.90773 | Training Time: 0009h 41m 04s | Iteration Time: 4625.551 ms | 21252.4 tokens/sec
2024-11-04 12:35:39,379 - Iteration 7498 | Train Loss 4.89858 | Training Time: 0009h 43m 11s | Iteration Time: 4706.338 ms | 20887.6 tokens/sec
2024-11-04 12:36:13,994 - Iteration 7505 | Train Loss 4.85303 | Training Time: 0009h 43m 46s | Iteration Time: 4944.986 ms | 19879.5 tokens/sec
2024-11-04 12:39:39,543 - Iteration 7549 | Train Loss 5.18989 | Training Time: 0009h 47m 11s | Iteration Time: 4671.558 ms | 21043.1 tokens/sec
2024-11-04 12:41:11,983 - Iteration 7569 | Train Loss 5.16724 | Training Time: 0009h 48m 44s | Iteration Time: 4622.016 ms | 21268.6 tokens/sec
2024-11-04 12:42:44,432 - Iteration 7589 | Train Loss 5.11711 | Training Time: 0009h 50m 16s | Iteration Time: 4622.444 ms | 21266.7 tokens/sec
2024-11-04 12:44:16,877 - Iteration 7609 | Train Loss 5.16550 | Training Time: 0009h 51m 48s | Iteration Time: 4622.252 ms | 21267.6 tokens/sec
2024-11-04 12:45:49,386 - Iteration 7629 | Train Loss 5.12838 | Training Time: 0009h 53m 21s | Iteration Time: 4625.433 ms | 21252.9 tokens/sec
2024-11-04 12:47:22,018 - Iteration 7649 | Train Loss 5.25766 | Training Time: 0009h 54m 54s | Iteration Time: 4631.590 ms | 21224.7 tokens/sec
2024-11-04 12:48:54,544 - Iteration 7669 | Train Loss 5.04445 | Training Time: 0009h 56m 26s | Iteration Time: 4626.320 ms | 21248.9 tokens/sec
2024-11-04 12:50:27,075 - Iteration 7689 | Train Loss 5.03862 | Training Time: 0009h 57m 59s | Iteration Time: 4626.574 ms | 21247.7 tokens/sec
2024-11-04 12:51:59,563 - Iteration 7709 | Train Loss 4.96941 | Training Time: 0009h 59m 31s | Iteration Time: 4624.375 ms | 21257.8 tokens/sec
2024-11-04 12:53:32,040 - Iteration 7729 | Train Loss 4.91994 | Training Time: 0010h 01m 04s | Iteration Time: 4623.859 ms | 21260.2 tokens/sec
2024-11-04 12:54:36,773 - Iteration 7743 | Train Loss 4.84610 | Training Time: 0010h 02m 08s | Iteration Time: 4623.809 ms | 21260.4 tokens/sec
2024-11-04 12:55:02,157 - Iteration 7748 | Train Loss 4.84075 | Training Time: 0010h 02m 34s | Iteration Time: 5076.744 ms | 19363.6 tokens/sec
2024-11-04 12:55:13,699 - Iteration 7750 | Train Loss 4.80932 | Training Time: 0010h 02m 45s | Iteration Time: 5770.720 ms | 17035.0 tokens/sec
2024-11-04 12:56:39,096 - Iteration 7768 | Train Loss 4.78632 | Training Time: 0010h 04m 11s | Iteration Time: 4744.313 ms | 20720.4 tokens/sec
2024-11-04 13:01:23,338 - Iteration 7829 | Train Loss 5.08352 | Training Time: 0010h 08m 55s | Iteration Time: 4659.709 ms | 21096.6 tokens/sec
2024-11-04 13:02:55,834 - Iteration 7849 | Train Loss 5.01246 | Training Time: 0010h 10m 27s | Iteration Time: 4624.758 ms | 21256.0 tokens/sec
2024-11-04 13:04:28,286 - Iteration 7869 | Train Loss 5.14130 | Training Time: 0010h 12m 00s | Iteration Time: 4622.616 ms | 21265.9 tokens/sec
2024-11-04 13:06:00,719 - Iteration 7889 | Train Loss 5.10476 | Training Time: 0010h 13m 32s | Iteration Time: 4621.629 ms | 21270.4 tokens/sec
2024-11-04 13:07:33,125 - Iteration 7909 | Train Loss 5.00627 | Training Time: 0010h 15m 05s | Iteration Time: 4620.330 ms | 21276.4 tokens/sec
2024-11-04 13:09:05,556 - Iteration 7929 | Train Loss 5.03775 | Training Time: 0010h 16m 37s | Iteration Time: 4621.535 ms | 21270.9 tokens/sec
2024-11-04 13:10:37,959 - Iteration 7949 | Train Loss 5.04784 | Training Time: 0010h 18m 09s | Iteration Time: 4620.176 ms | 21277.1 tokens/sec
2024-11-04 13:12:10,305 - Iteration 7969 | Train Loss 4.85837 | Training Time: 0010h 19m 42s | Iteration Time: 4617.286 ms | 21290.4 tokens/sec
2024-11-04 13:13:10,340 - Iteration 7982 | Train Loss 4.76916 | Training Time: 0010h 20m 42s | Iteration Time: 4618.039 ms | 21287.0 tokens/sec
2024-11-04 13:15:17,258 - Iteration 8009 | Train Loss 4.94439 | Training Time: 0010h 22m 49s | Iteration Time: 4700.687 ms | 20912.7 tokens/sec
2024-11-04 13:16:49,589 - Iteration 8029 | Train Loss 5.08135 | Training Time: 0010h 24m 21s | Iteration Time: 4616.569 ms | 21293.7 tokens/sec
2024-11-04 13:18:21,907 - Iteration 8049 | Train Loss 5.08029 | Training Time: 0010h 25m 53s | Iteration Time: 4615.860 ms | 21297.0 tokens/sec
2024-11-04 13:19:54,236 - Iteration 8069 | Train Loss 5.05208 | Training Time: 0010h 27m 26s | Iteration Time: 4616.476 ms | 21294.2 tokens/sec
2024-11-04 13:21:26,685 - Iteration 8089 | Train Loss 5.08667 | Training Time: 0010h 28m 58s | Iteration Time: 4622.420 ms | 21266.8 tokens/sec
2024-11-04 13:22:59,447 - Iteration 8109 | Train Loss 5.09066 | Training Time: 0010h 30m 31s | Iteration Time: 4638.103 ms | 21194.9 tokens/sec
2024-11-04 13:24:32,177 - Iteration 8129 | Train Loss 5.05319 | Training Time: 0010h 32m 04s | Iteration Time: 4636.526 ms | 21202.1 tokens/sec
2024-11-04 13:26:04,653 - Iteration 8149 | Train Loss 5.03944 | Training Time: 0010h 33m 36s | Iteration Time: 4623.791 ms | 21260.5 tokens/sec
2024-11-04 13:27:37,042 - Iteration 8169 | Train Loss 5.07868 | Training Time: 0010h 35m 09s | Iteration Time: 4619.437 ms | 21280.5 tokens/sec
2024-11-04 13:29:09,343 - Iteration 8189 | Train Loss 4.93933 | Training Time: 0010h 36m 41s | Iteration Time: 4615.069 ms | 21300.7 tokens/sec
2024-11-04 13:29:46,258 - Iteration 8197 | Train Loss 4.56646 | Training Time: 0010h 37m 18s | Iteration Time: 4614.371 ms | 21303.9 tokens/sec
2024-11-04 13:32:16,111 - Iteration 8229 | Train Loss 4.79928 | Training Time: 0010h 39m 48s | Iteration Time: 4682.905 ms | 20992.1 tokens/sec
2024-11-04 13:33:48,374 - Iteration 8249 | Train Loss 4.95300 | Training Time: 0010h 41m 20s | Iteration Time: 4613.159 ms | 21309.5 tokens/sec
2024-11-04 13:35:20,645 - Iteration 8269 | Train Loss 4.99937 | Training Time: 0010h 42m 52s | Iteration Time: 4613.539 ms | 21307.7 tokens/sec
2024-11-04 13:36:52,940 - Iteration 8289 | Train Loss 5.04778 | Training Time: 0010h 44m 24s | Iteration Time: 4614.762 ms | 21302.1 tokens/sec
2024-11-04 13:38:25,211 - Iteration 8309 | Train Loss 4.96399 | Training Time: 0010h 45m 57s | Iteration Time: 4613.538 ms | 21307.7 tokens/sec
2024-11-04 13:39:57,454 - Iteration 8329 | Train Loss 4.90798 | Training Time: 0010h 47m 29s | Iteration Time: 4612.165 ms | 21314.1 tokens/sec
2024-11-04 13:41:29,703 - Iteration 8349 | Train Loss 5.00775 | Training Time: 0010h 49m 01s | Iteration Time: 4612.440 ms | 21312.8 tokens/sec
2024-11-04 13:43:01,970 - Iteration 8369 | Train Loss 4.97347 | Training Time: 0010h 50m 33s | Iteration Time: 4613.320 ms | 21308.7 tokens/sec
2024-11-04 13:44:34,246 - Iteration 8389 | Train Loss 4.83890 | Training Time: 0010h 52m 06s | Iteration Time: 4613.812 ms | 21306.5 tokens/sec
2024-11-04 13:46:06,511 - Iteration 8409 | Train Loss 4.80516 | Training Time: 0010h 53m 38s | Iteration Time: 4613.276 ms | 21308.9 tokens/sec
2024-11-04 13:47:38,808 - Iteration 8429 | Train Loss 4.97022 | Training Time: 0010h 55m 10s | Iteration Time: 4614.814 ms | 21301.8 tokens/sec
2024-11-04 13:49:11,373 - Iteration 8449 | Train Loss 4.81479 | Training Time: 0010h 56m 43s | Iteration Time: 4628.292 ms | 21239.8 tokens/sec
2024-11-04 13:50:44,066 - Iteration 8469 | Train Loss 4.84822 | Training Time: 0010h 58m 16s | Iteration Time: 4634.627 ms | 21210.8 tokens/sec
2024-11-04 13:51:25,717 - Iteration 8478 | Train Loss 4.53189 | Training Time: 0010h 58m 57s | Iteration Time: 4627.900 ms | 21241.6 tokens/sec
2024-11-04 13:53:51,126 - Iteration 8509 | Train Loss 4.91319 | Training Time: 0011h 01m 23s | Iteration Time: 4690.609 ms | 20957.6 tokens/sec
2024-11-04 13:55:23,380 - Iteration 8529 | Train Loss 4.96785 | Training Time: 0011h 02m 55s | Iteration Time: 4612.712 ms | 21311.5 tokens/sec
2024-11-04 13:56:55,644 - Iteration 8549 | Train Loss 5.08599 | Training Time: 0011h 04m 27s | Iteration Time: 4613.176 ms | 21309.4 tokens/sec
2024-11-04 13:58:27,874 - Iteration 8569 | Train Loss 4.88682 | Training Time: 0011h 05m 59s | Iteration Time: 4611.493 ms | 21317.2 tokens/sec
2024-11-04 14:00:00,053 - Iteration 8589 | Train Loss 4.95486 | Training Time: 0011h 07m 32s | Iteration Time: 4608.994 ms | 21328.7 tokens/sec
2024-11-04 14:01:32,198 - Iteration 8609 | Train Loss 4.88573 | Training Time: 0011h 09m 04s | Iteration Time: 4607.208 ms | 21337.0 tokens/sec
2024-11-04 14:03:04,370 - Iteration 8629 | Train Loss 4.87100 | Training Time: 0011h 10m 36s | Iteration Time: 4608.625 ms | 21330.4 tokens/sec
2024-11-04 14:04:36,531 - Iteration 8649 | Train Loss 4.79040 | Training Time: 0011h 12m 08s | Iteration Time: 4608.035 ms | 21333.2 tokens/sec
2024-11-04 14:06:08,696 - Iteration 8669 | Train Loss 4.88670 | Training Time: 0011h 13m 40s | Iteration Time: 4608.257 ms | 21332.1 tokens/sec
2024-11-04 14:07:40,885 - Iteration 8689 | Train Loss 4.82717 | Training Time: 0011h 15m 12s | Iteration Time: 4609.432 ms | 21326.7 tokens/sec
2024-11-04 14:09:13,053 - Iteration 8709 | Train Loss 4.79666 | Training Time: 0011h 16m 45s | Iteration Time: 4608.436 ms | 21331.3 tokens/sec
2024-11-04 14:10:45,218 - Iteration 8729 | Train Loss 4.83140 | Training Time: 0011h 18m 17s | Iteration Time: 4608.225 ms | 21332.3 tokens/sec
2024-11-04 14:12:17,378 - Iteration 8749 | Train Loss 4.71432 | Training Time: 0011h 19m 49s | Iteration Time: 4608.006 ms | 21333.3 tokens/sec
2024-11-04 14:13:49,566 - Iteration 8769 | Train Loss 4.87524 | Training Time: 0011h 21m 21s | Iteration Time: 4609.387 ms | 21326.9 tokens/sec
2024-11-04 14:15:21,740 - Iteration 8789 | Train Loss 4.95887 | Training Time: 0011h 22m 53s | Iteration Time: 4608.735 ms | 21329.9 tokens/sec
2024-11-04 14:16:53,920 - Iteration 8809 | Train Loss 4.84259 | Training Time: 0011h 24m 25s | Iteration Time: 4608.957 ms | 21328.9 tokens/sec
2024-11-04 14:18:26,107 - Iteration 8829 | Train Loss 4.91186 | Training Time: 0011h 25m 58s | Iteration Time: 4609.384 ms | 21326.9 tokens/sec
2024-11-04 14:19:58,271 - Iteration 8849 | Train Loss 4.84688 | Training Time: 0011h 27m 30s | Iteration Time: 4608.203 ms | 21332.4 tokens/sec
2024-11-04 14:21:30,464 - Iteration 8869 | Train Loss 4.85336 | Training Time: 0011h 29m 02s | Iteration Time: 4609.654 ms | 21325.7 tokens/sec
2024-11-04 14:23:02,641 - Iteration 8889 | Train Loss 4.91769 | Training Time: 0011h 30m 34s | Iteration Time: 4608.824 ms | 21329.5 tokens/sec
2024-11-04 14:24:34,834 - Iteration 8909 | Train Loss 4.85897 | Training Time: 0011h 32m 06s | Iteration Time: 4609.665 ms | 21325.6 tokens/sec
2024-11-04 14:26:07,016 - Iteration 8929 | Train Loss 4.71333 | Training Time: 0011h 33m 39s | Iteration Time: 4609.107 ms | 21328.2 tokens/sec
2024-11-04 14:27:39,177 - Iteration 8949 | Train Loss 4.64374 | Training Time: 0011h 35m 11s | Iteration Time: 4608.019 ms | 21333.2 tokens/sec
2024-11-04 14:29:11,369 - Iteration 8969 | Train Loss 4.72388 | Training Time: 0011h 36m 43s | Iteration Time: 4609.641 ms | 21325.7 tokens/sec
2024-11-04 14:30:43,578 - Iteration 8989 | Train Loss 4.70004 | Training Time: 0011h 38m 15s | Iteration Time: 4610.442 ms | 21322.0 tokens/sec
2024-11-04 14:32:15,746 - Iteration 9009 | Train Loss 4.94921 | Training Time: 0011h 39m 47s | Iteration Time: 4608.390 ms | 21331.5 tokens/sec
2024-11-04 14:33:47,902 - Iteration 9029 | Train Loss 4.78864 | Training Time: 0011h 41m 19s | Iteration Time: 4607.775 ms | 21334.4 tokens/sec
2024-11-04 14:35:20,089 - Iteration 9049 | Train Loss 4.72820 | Training Time: 0011h 42m 52s | Iteration Time: 4609.381 ms | 21326.9 tokens/sec
2024-11-04 14:36:52,281 - Iteration 9069 | Train Loss 4.86910 | Training Time: 0011h 44m 24s | Iteration Time: 4609.597 ms | 21325.9 tokens/sec
2024-11-04 14:38:24,449 - Iteration 9089 | Train Loss 4.81776 | Training Time: 0011h 45m 56s | Iteration Time: 4608.394 ms | 21331.5 tokens/sec
2024-11-04 14:39:56,634 - Iteration 9109 | Train Loss 5.00264 | Training Time: 0011h 47m 28s | Iteration Time: 4609.242 ms | 21327.6 tokens/sec
2024-11-04 14:41:28,787 - Iteration 9129 | Train Loss 4.76874 | Training Time: 0011h 49m 00s | Iteration Time: 4607.659 ms | 21334.9 tokens/sec
2024-11-04 14:43:00,957 - Iteration 9149 | Train Loss 4.80129 | Training Time: 0011h 50m 32s | Iteration Time: 4608.479 ms | 21331.1 tokens/sec
2024-11-04 14:44:33,109 - Iteration 9169 | Train Loss 4.87473 | Training Time: 0011h 52m 05s | Iteration Time: 4607.639 ms | 21335.0 tokens/sec
2024-11-04 14:46:05,270 - Iteration 9189 | Train Loss 4.68973 | Training Time: 0011h 53m 37s | Iteration Time: 4608.016 ms | 21333.3 tokens/sec
2024-11-04 14:47:37,425 - Iteration 9209 | Train Loss 4.61782 | Training Time: 0011h 55m 09s | Iteration Time: 4607.787 ms | 21334.3 tokens/sec
2024-11-04 14:48:28,104 - Iteration 9220 | Train Loss 4.50791 | Training Time: 0011h 56m 00s | Iteration Time: 4607.099 ms | 21337.5 tokens/sec
2024-11-04 14:50:25,506 - Iteration 9245 | Train Loss 4.43546 | Training Time: 0011h 57m 57s | Iteration Time: 4696.111 ms | 20933.1 tokens/sec
2024-11-04 14:52:18,296 - Iteration 9269 | Train Loss 4.76245 | Training Time: 0011h 59m 50s | Iteration Time: 4699.554 ms | 20917.7 tokens/sec
2024-11-04 14:53:50,431 - Iteration 9289 | Train Loss 4.64998 | Training Time: 0012h 01m 22s | Iteration Time: 4606.774 ms | 21339.0 tokens/sec
2024-11-04 14:55:22,574 - Iteration 9309 | Train Loss 4.82187 | Training Time: 0012h 02m 54s | Iteration Time: 4607.134 ms | 21337.3 tokens/sec
2024-11-04 14:56:54,714 - Iteration 9329 | Train Loss 4.77370 | Training Time: 0012h 04m 26s | Iteration Time: 4607.024 ms | 21337.9 tokens/sec
2024-11-04 14:58:26,914 - Iteration 9349 | Train Loss 4.85144 | Training Time: 0012h 05m 58s | Iteration Time: 4609.995 ms | 21324.1 tokens/sec
2024-11-04 14:59:59,127 - Iteration 9369 | Train Loss 4.71072 | Training Time: 0012h 07m 31s | Iteration Time: 4610.642 ms | 21321.1 tokens/sec
2024-11-04 15:01:31,332 - Iteration 9389 | Train Loss 4.71571 | Training Time: 0012h 09m 03s | Iteration Time: 4610.247 ms | 21322.9 tokens/sec
2024-11-04 15:03:03,517 - Iteration 9409 | Train Loss 4.62355 | Training Time: 0012h 10m 35s | Iteration Time: 4609.254 ms | 21327.5 tokens/sec
2024-11-04 15:04:35,699 - Iteration 9429 | Train Loss 4.67471 | Training Time: 0012h 12m 07s | Iteration Time: 4609.120 ms | 21328.1 tokens/sec
2024-11-04 15:05:49,417 - Iteration 9445 | Train Loss 4.42165 | Training Time: 0012h 13m 21s | Iteration Time: 4607.334 ms | 21336.4 tokens/sec
2024-11-04 15:07:42,287 - Iteration 9469 | Train Loss 4.63099 | Training Time: 0012h 15m 14s | Iteration Time: 4702.946 ms | 20902.6 tokens/sec
2024-11-04 15:09:14,450 - Iteration 9489 | Train Loss 4.57863 | Training Time: 0012h 16m 46s | Iteration Time: 4608.107 ms | 21332.8 tokens/sec
2024-11-04 15:10:46,593 - Iteration 9509 | Train Loss 4.72165 | Training Time: 0012h 18m 18s | Iteration Time: 4607.171 ms | 21337.2 tokens/sec
2024-11-04 15:12:18,717 - Iteration 9529 | Train Loss 4.82580 | Training Time: 0012h 19m 50s | Iteration Time: 4606.177 ms | 21341.8 tokens/sec
2024-11-04 15:13:50,852 - Iteration 9549 | Train Loss 4.77276 | Training Time: 0012h 21m 22s | Iteration Time: 4606.781 ms | 21339.0 tokens/sec
2024-11-04 15:15:23,002 - Iteration 9569 | Train Loss 4.85380 | Training Time: 0012h 22m 55s | Iteration Time: 4607.488 ms | 21335.7 tokens/sec
2024-11-04 15:16:55,125 - Iteration 9589 | Train Loss 4.70250 | Training Time: 0012h 24m 27s | Iteration Time: 4606.130 ms | 21342.0 tokens/sec
2024-11-04 15:18:27,264 - Iteration 9609 | Train Loss 4.73815 | Training Time: 0012h 25m 59s | Iteration Time: 4606.969 ms | 21338.1 tokens/sec
2024-11-04 15:19:59,413 - Iteration 9629 | Train Loss 4.70237 | Training Time: 0012h 27m 31s | Iteration Time: 4607.471 ms | 21335.8 tokens/sec
2024-11-04 15:21:31,545 - Iteration 9649 | Train Loss 4.65765 | Training Time: 0012h 29m 03s | Iteration Time: 4606.586 ms | 21339.9 tokens/sec
2024-11-04 15:23:03,683 - Iteration 9669 | Train Loss 4.52187 | Training Time: 0012h 30m 35s | Iteration Time: 4606.877 ms | 21338.5 tokens/sec
2024-11-04 15:24:35,815 - Iteration 9689 | Train Loss 4.55722 | Training Time: 0012h 32m 07s | Iteration Time: 4606.607 ms | 21339.8 tokens/sec
2024-11-04 15:26:07,947 - Iteration 9709 | Train Loss 4.52235 | Training Time: 0012h 33m 39s | Iteration Time: 4606.613 ms | 21339.8 tokens/sec
2024-11-04 15:26:21,779 - Iteration 9712 | Train Loss 4.42140 | Training Time: 0012h 33m 53s | Iteration Time: 4610.782 ms | 21320.5 tokens/sec
2024-11-04 15:29:14,496 - Iteration 9749 | Train Loss 4.54245 | Training Time: 0012h 36m 46s | Iteration Time: 4668.027 ms | 21059.0 tokens/sec
2024-11-04 15:30:46,635 - Iteration 9769 | Train Loss 4.64187 | Training Time: 0012h 38m 18s | Iteration Time: 4606.928 ms | 21338.3 tokens/sec
2024-11-04 15:32:18,785 - Iteration 9789 | Train Loss 4.65753 | Training Time: 0012h 39m 50s | Iteration Time: 4607.525 ms | 21335.5 tokens/sec
2024-11-04 15:33:50,942 - Iteration 9809 | Train Loss 4.85357 | Training Time: 0012h 41m 22s | Iteration Time: 4607.837 ms | 21334.1 tokens/sec
2024-11-04 15:35:23,106 - Iteration 9829 | Train Loss 4.72353 | Training Time: 0012h 42m 55s | Iteration Time: 4608.198 ms | 21332.4 tokens/sec
2024-11-04 15:36:55,279 - Iteration 9849 | Train Loss 4.74056 | Training Time: 0012h 44m 27s | Iteration Time: 4608.636 ms | 21330.4 tokens/sec
2024-11-04 15:38:27,446 - Iteration 9869 | Train Loss 4.64060 | Training Time: 0012h 45m 59s | Iteration Time: 4608.362 ms | 21331.7 tokens/sec
2024-11-04 15:39:59,573 - Iteration 9889 | Train Loss 4.57845 | Training Time: 0012h 47m 31s | Iteration Time: 4606.359 ms | 21340.9 tokens/sec
2024-11-04 15:41:31,703 - Iteration 9909 | Train Loss 4.60568 | Training Time: 0012h 49m 03s | Iteration Time: 4606.490 ms | 21340.3 tokens/sec
2024-11-04 15:43:03,843 - Iteration 9929 | Train Loss 4.62800 | Training Time: 0012h 50m 35s | Iteration Time: 4606.992 ms | 21338.0 tokens/sec
2024-11-04 15:44:35,970 - Iteration 9949 | Train Loss 4.61034 | Training Time: 0012h 52m 07s | Iteration Time: 4606.352 ms | 21341.0 tokens/sec
2024-11-04 15:46:08,126 - Iteration 9969 | Train Loss 4.47868 | Training Time: 0012h 53m 40s | Iteration Time: 4607.807 ms | 21334.2 tokens/sec
2024-11-04 15:46:54,212 - Iteration 9979 | Train Loss 4.36293 | Training Time: 0012h 54m 26s | Iteration Time: 4608.592 ms | 21330.6 tokens/sec
2024-11-04 15:49:14,729 - Iteration 10009 | Train Loss 4.52810 | Training Time: 0012h 56m 46s | Iteration Time: 4683.899 ms | 20987.6 tokens/sec
2024-11-04 15:50:46,937 - Iteration 10029 | Train Loss 4.68328 | Training Time: 0012h 58m 18s | Iteration Time: 4610.420 ms | 21322.1 tokens/sec
2024-11-04 15:52:19,067 - Iteration 10049 | Train Loss 4.69021 | Training Time: 0012h 59m 51s | Iteration Time: 4606.507 ms | 21340.2 tokens/sec
2024-11-04 15:53:51,185 - Iteration 10069 | Train Loss 4.57954 | Training Time: 0013h 01m 23s | Iteration Time: 4605.900 ms | 21343.1 tokens/sec
2024-11-04 15:55:23,305 - Iteration 10089 | Train Loss 4.62399 | Training Time: 0013h 02m 55s | Iteration Time: 4605.954 ms | 21342.8 tokens/sec
2024-11-04 15:56:55,417 - Iteration 10109 | Train Loss 4.60654 | Training Time: 0013h 04m 27s | Iteration Time: 4605.599 ms | 21344.5 tokens/sec
2024-11-04 15:58:13,700 - Iteration 10126 | Train Loss 4.29788 | Training Time: 0013h 05m 45s | Iteration Time: 4604.932 ms | 21347.5 tokens/sec
2024-11-04 16:00:01,886 - Iteration 10149 | Train Loss 4.58412 | Training Time: 0013h 07m 33s | Iteration Time: 4703.730 ms | 20899.2 tokens/sec
2024-11-04 16:01:34,024 - Iteration 10169 | Train Loss 4.58932 | Training Time: 0013h 09m 06s | Iteration Time: 4606.889 ms | 21338.5 tokens/sec
2024-11-04 16:03:06,156 - Iteration 10189 | Train Loss 4.59000 | Training Time: 0013h 10m 38s | Iteration Time: 4606.613 ms | 21339.8 tokens/sec
2024-11-04 16:04:38,304 - Iteration 10209 | Train Loss 4.47641 | Training Time: 0013h 12m 10s | Iteration Time: 4607.406 ms | 21336.1 tokens/sec
2024-11-04 16:06:10,417 - Iteration 10229 | Train Loss 4.51329 | Training Time: 0013h 13m 42s | Iteration Time: 4605.641 ms | 21344.3 tokens/sec
2024-11-04 16:07:42,506 - Iteration 10249 | Train Loss 4.46190 | Training Time: 0013h 15m 14s | Iteration Time: 4604.434 ms | 21349.9 tokens/sec
2024-11-04 16:09:14,570 - Iteration 10269 | Train Loss 4.72474 | Training Time: 0013h 16m 46s | Iteration Time: 4603.232 ms | 21355.4 tokens/sec
2024-11-04 16:10:46,662 - Iteration 10289 | Train Loss 4.75402 | Training Time: 0013h 18m 18s | Iteration Time: 4604.578 ms | 21349.2 tokens/sec
2024-11-04 16:12:18,737 - Iteration 10309 | Train Loss 4.68115 | Training Time: 0013h 19m 50s | Iteration Time: 4603.752 ms | 21353.0 tokens/sec
2024-11-04 16:13:50,831 - Iteration 10329 | Train Loss 4.60293 | Training Time: 0013h 21m 22s | Iteration Time: 4604.714 ms | 21348.6 tokens/sec
2024-11-04 16:15:22,925 - Iteration 10349 | Train Loss 4.59340 | Training Time: 0013h 22m 54s | Iteration Time: 4604.688 ms | 21348.7 tokens/sec
2024-11-04 16:16:55,024 - Iteration 10369 | Train Loss 4.56192 | Training Time: 0013h 24m 27s | Iteration Time: 4604.935 ms | 21347.5 tokens/sec
2024-11-04 16:18:27,122 - Iteration 10389 | Train Loss 4.61738 | Training Time: 0013h 25m 59s | Iteration Time: 4604.923 ms | 21347.6 tokens/sec
2024-11-04 16:19:59,215 - Iteration 10409 | Train Loss 4.64390 | Training Time: 0013h 27m 31s | Iteration Time: 4604.615 ms | 21349.0 tokens/sec
2024-11-04 16:21:31,286 - Iteration 10429 | Train Loss 4.56288 | Training Time: 0013h 29m 03s | Iteration Time: 4603.568 ms | 21353.9 tokens/sec
2024-11-04 16:23:03,362 - Iteration 10449 | Train Loss 4.51223 | Training Time: 0013h 30m 35s | Iteration Time: 4603.784 ms | 21352.9 tokens/sec
2024-11-04 16:24:35,472 - Iteration 10469 | Train Loss 4.45409 | Training Time: 0013h 32m 07s | Iteration Time: 4605.519 ms | 21344.8 tokens/sec
2024-11-04 16:26:07,534 - Iteration 10489 | Train Loss 4.37974 | Training Time: 0013h 33m 39s | Iteration Time: 4603.100 ms | 21356.0 tokens/sec
2024-11-04 16:27:39,614 - Iteration 10509 | Train Loss 4.35848 | Training Time: 0013h 35m 11s | Iteration Time: 4604.023 ms | 21351.8 tokens/sec
2024-11-04 16:29:11,695 - Iteration 10529 | Train Loss 4.66874 | Training Time: 0013h 36m 43s | Iteration Time: 4604.025 ms | 21351.7 tokens/sec
2024-11-04 16:30:43,747 - Iteration 10549 | Train Loss 4.66898 | Training Time: 0013h 38m 15s | Iteration Time: 4602.580 ms | 21358.5 tokens/sec
2024-11-04 16:32:15,815 - Iteration 10569 | Train Loss 4.58306 | Training Time: 0013h 39m 47s | Iteration Time: 4603.438 ms | 21354.5 tokens/sec
2024-11-04 16:33:47,887 - Iteration 10589 | Train Loss 4.51865 | Training Time: 0013h 41m 19s | Iteration Time: 4603.571 ms | 21353.9 tokens/sec
2024-11-04 16:35:19,955 - Iteration 10609 | Train Loss 4.49848 | Training Time: 0013h 42m 51s | Iteration Time: 4603.409 ms | 21354.6 tokens/sec
2024-11-04 16:36:52,052 - Iteration 10629 | Train Loss 4.61497 | Training Time: 0013h 44m 24s | Iteration Time: 4604.882 ms | 21347.8 tokens/sec
2024-11-04 16:38:24,105 - Iteration 10649 | Train Loss 4.48286 | Training Time: 0013h 45m 56s | Iteration Time: 4602.601 ms | 21358.4 tokens/sec
2024-11-04 16:39:56,213 - Iteration 10669 | Train Loss 4.48561 | Training Time: 0013h 47m 28s | Iteration Time: 4605.426 ms | 21345.3 tokens/sec
2024-11-04 16:41:28,325 - Iteration 10689 | Train Loss 4.54641 | Training Time: 0013h 49m 00s | Iteration Time: 4605.603 ms | 21344.4 tokens/sec
2024-11-04 16:43:00,405 - Iteration 10709 | Train Loss 4.45939 | Training Time: 0013h 50m 32s | Iteration Time: 4603.970 ms | 21352.0 tokens/sec
2024-11-04 16:44:32,508 - Iteration 10729 | Train Loss 4.31272 | Training Time: 0013h 52m 04s | Iteration Time: 4605.169 ms | 21346.4 tokens/sec
2024-11-04 16:46:04,622 - Iteration 10749 | Train Loss 4.45488 | Training Time: 0013h 53m 36s | Iteration Time: 4605.704 ms | 21344.0 tokens/sec
2024-11-04 16:46:36,869 - Iteration 10756 | Train Loss 4.25401 | Training Time: 0013h 54m 08s | Iteration Time: 4606.780 ms | 21339.0 tokens/sec
2024-11-04 22:12:16,727 - Iteration 10763 | Train Loss 4.20283 | Training Time: 0013h 55m 25s | Iteration Time: 7.098 ms | 13850269.6 tokens/sec
2024-11-04 22:14:49,450 - Iteration 10796 | Train Loss 4.30272 | Training Time: 0013h 57m 58s | Iteration Time: 4627.943 ms | 21241.4 tokens/sec
2024-11-04 22:16:20,735 - Iteration 10816 | Train Loss 4.33925 | Training Time: 0013h 59m 29s | Iteration Time: 4564.287 ms | 21537.6 tokens/sec
2024-11-04 22:17:52,206 - Iteration 10836 | Train Loss 4.51801 | Training Time: 0014h 01m 00s | Iteration Time: 4573.520 ms | 21494.2 tokens/sec
2024-11-04 22:19:23,801 - Iteration 10856 | Train Loss 4.49533 | Training Time: 0014h 02m 32s | Iteration Time: 4579.777 ms | 21464.8 tokens/sec
2024-11-04 22:20:55,520 - Iteration 10876 | Train Loss 4.53561 | Training Time: 0014h 04m 04s | Iteration Time: 4585.936 ms | 21436.0 tokens/sec
2024-11-04 22:22:27,328 - Iteration 10896 | Train Loss 4.51986 | Training Time: 0014h 05m 35s | Iteration Time: 4590.419 ms | 21415.0 tokens/sec
2024-11-04 22:23:59,201 - Iteration 10916 | Train Loss 4.44675 | Training Time: 0014h 07m 07s | Iteration Time: 4593.616 ms | 21400.1 tokens/sec
2024-11-04 22:25:31,135 - Iteration 10936 | Train Loss 4.31550 | Training Time: 0014h 08m 39s | Iteration Time: 4596.704 ms | 21385.8 tokens/sec
2024-11-04 22:27:03,043 - Iteration 10956 | Train Loss 4.47230 | Training Time: 0014h 10m 11s | Iteration Time: 4595.433 ms | 21391.7 tokens/sec
2024-11-04 22:28:35,009 - Iteration 10976 | Train Loss 4.40511 | Training Time: 0014h 11m 43s | Iteration Time: 4598.279 ms | 21378.4 tokens/sec
2024-11-04 22:30:06,999 - Iteration 10996 | Train Loss 4.37810 | Training Time: 0014h 13m 15s | Iteration Time: 4599.475 ms | 21372.9 tokens/sec
2024-11-04 22:31:39,030 - Iteration 11016 | Train Loss 4.32923 | Training Time: 0014h 14m 47s | Iteration Time: 4601.556 ms | 21363.2 tokens/sec
2024-11-04 22:33:11,052 - Iteration 11036 | Train Loss 4.29771 | Training Time: 0014h 16m 19s | Iteration Time: 4601.109 ms | 21365.3 tokens/sec
2024-11-04 22:34:43,080 - Iteration 11056 | Train Loss 4.45348 | Training Time: 0014h 17m 51s | Iteration Time: 4601.421 ms | 21363.8 tokens/sec
2024-11-04 22:36:15,120 - Iteration 11076 | Train Loss 4.75069 | Training Time: 0014h 19m 23s | Iteration Time: 4601.972 ms | 21361.3 tokens/sec
2024-11-04 22:37:47,204 - Iteration 11096 | Train Loss 4.54403 | Training Time: 0014h 20m 55s | Iteration Time: 4604.215 ms | 21350.9 tokens/sec
2024-11-04 22:39:19,287 - Iteration 11116 | Train Loss 4.52291 | Training Time: 0014h 22m 27s | Iteration Time: 4604.143 ms | 21351.2 tokens/sec
2024-11-04 22:40:51,363 - Iteration 11136 | Train Loss 4.49977 | Training Time: 0014h 23m 59s | Iteration Time: 4603.814 ms | 21352.7 tokens/sec
2024-11-04 22:42:23,474 - Iteration 11156 | Train Loss 4.44929 | Training Time: 0014h 25m 32s | Iteration Time: 4605.533 ms | 21344.8 tokens/sec
2024-11-04 22:43:55,530 - Iteration 11176 | Train Loss 4.35558 | Training Time: 0014h 27m 04s | Iteration Time: 4602.825 ms | 21357.3 tokens/sec
2024-11-04 22:45:27,616 - Iteration 11196 | Train Loss 4.55609 | Training Time: 0014h 28m 36s | Iteration Time: 4604.282 ms | 21350.6 tokens/sec
2024-11-04 22:46:59,706 - Iteration 11216 | Train Loss 4.39028 | Training Time: 0014h 30m 08s | Iteration Time: 4604.482 ms | 21349.6 tokens/sec
2024-11-04 22:48:31,838 - Iteration 11236 | Train Loss 4.39264 | Training Time: 0014h 31m 40s | Iteration Time: 4606.612 ms | 21339.8 tokens/sec
2024-11-04 22:50:03,975 - Iteration 11256 | Train Loss 4.41781 | Training Time: 0014h 33m 12s | Iteration Time: 4606.849 ms | 21338.7 tokens/sec
2024-11-04 22:50:36,221 - Iteration 11263 | Train Loss 4.15886 | Training Time: 0014h 33m 44s | Iteration Time: 4606.560 ms | 21340.0 tokens/sec
2024-11-04 22:53:10,462 - Iteration 11296 | Train Loss 4.32694 | Training Time: 0014h 36m 19s | Iteration Time: 4673.990 ms | 21032.1 tokens/sec
2024-11-04 22:54:42,572 - Iteration 11316 | Train Loss 4.26283 | Training Time: 0014h 37m 51s | Iteration Time: 4605.484 ms | 21345.0 tokens/sec
2024-11-04 22:56:14,732 - Iteration 11336 | Train Loss 4.58927 | Training Time: 0014h 39m 23s | Iteration Time: 4608.006 ms | 21333.3 tokens/sec
2024-11-04 22:57:46,882 - Iteration 11356 | Train Loss 4.47663 | Training Time: 0014h 40m 55s | Iteration Time: 4607.481 ms | 21335.7 tokens/sec
2024-11-04 22:59:19,018 - Iteration 11376 | Train Loss 4.53436 | Training Time: 0014h 42m 27s | Iteration Time: 4606.790 ms | 21338.9 tokens/sec
2024-11-04 23:00:51,140 - Iteration 11396 | Train Loss 4.53501 | Training Time: 0014h 43m 59s | Iteration Time: 4606.120 ms | 21342.0 tokens/sec
2024-11-04 23:02:23,271 - Iteration 11416 | Train Loss 4.43237 | Training Time: 0014h 45m 31s | Iteration Time: 4606.553 ms | 21340.0 tokens/sec
2024-11-04 23:03:55,392 - Iteration 11436 | Train Loss 4.58344 | Training Time: 0014h 47m 03s | Iteration Time: 4606.025 ms | 21342.5 tokens/sec
2024-11-04 23:05:27,518 - Iteration 11456 | Train Loss 4.51591 | Training Time: 0014h 48m 36s | Iteration Time: 4606.314 ms | 21341.1 tokens/sec
2024-11-04 23:06:59,637 - Iteration 11476 | Train Loss 4.31025 | Training Time: 0014h 50m 08s | Iteration Time: 4605.979 ms | 21342.7 tokens/sec
2024-11-04 23:08:31,790 - Iteration 11496 | Train Loss 4.55054 | Training Time: 0014h 51m 40s | Iteration Time: 4607.614 ms | 21335.1 tokens/sec
2024-11-04 23:10:03,945 - Iteration 11516 | Train Loss 4.26982 | Training Time: 0014h 53m 12s | Iteration Time: 4607.761 ms | 21334.4 tokens/sec
2024-11-04 23:11:36,104 - Iteration 11536 | Train Loss 4.53679 | Training Time: 0014h 54m 44s | Iteration Time: 4607.930 ms | 21333.7 tokens/sec
2024-11-04 23:12:40,590 - Iteration 11550 | Train Loss 4.15331 | Training Time: 0014h 55m 49s | Iteration Time: 4606.196 ms | 21341.7 tokens/sec
2024-11-04 23:14:42,644 - Iteration 11576 | Train Loss 4.55870 | Training Time: 0014h 57m 51s | Iteration Time: 4694.381 ms | 20940.8 tokens/sec
2024-11-04 23:16:14,788 - Iteration 11596 | Train Loss 4.41264 | Training Time: 0014h 59m 23s | Iteration Time: 4607.205 ms | 21337.0 tokens/sec
2024-11-04 23:17:46,932 - Iteration 11616 | Train Loss 4.43902 | Training Time: 0015h 00m 55s | Iteration Time: 4607.169 ms | 21337.2 tokens/sec
2024-11-04 23:19:19,072 - Iteration 11636 | Train Loss 4.57285 | Training Time: 0015h 02m 27s | Iteration Time: 4607.039 ms | 21337.8 tokens/sec
2024-11-04 23:20:51,192 - Iteration 11656 | Train Loss 4.51726 | Training Time: 0015h 03m 59s | Iteration Time: 4605.993 ms | 21342.6 tokens/sec
2024-11-04 23:22:23,309 - Iteration 11676 | Train Loss 4.56214 | Training Time: 0015h 05m 31s | Iteration Time: 4605.824 ms | 21343.4 tokens/sec
2024-11-04 23:23:55,403 - Iteration 11696 | Train Loss 4.44166 | Training Time: 0015h 07m 03s | Iteration Time: 4604.726 ms | 21348.5 tokens/sec
2024-11-04 23:25:27,507 - Iteration 11716 | Train Loss 4.34431 | Training Time: 0015h 08m 36s | Iteration Time: 4605.183 ms | 21346.4 tokens/sec
2024-11-04 23:26:59,643 - Iteration 11736 | Train Loss 4.36347 | Training Time: 0015h 10m 08s | Iteration Time: 4606.801 ms | 21338.9 tokens/sec
2024-11-04 23:28:31,769 - Iteration 11756 | Train Loss 4.53830 | Training Time: 0015h 11m 40s | Iteration Time: 4606.281 ms | 21341.3 tokens/sec
2024-11-04 23:30:03,890 - Iteration 11776 | Train Loss 4.27898 | Training Time: 0015h 13m 12s | Iteration Time: 4606.055 ms | 21342.3 tokens/sec
2024-11-04 23:31:12,979 - Iteration 11791 | Train Loss 4.13749 | Training Time: 0015h 14m 21s | Iteration Time: 4605.925 ms | 21342.9 tokens/sec
2024-11-04 23:33:10,547 - Iteration 11816 | Train Loss 4.36530 | Training Time: 0015h 16m 19s | Iteration Time: 4702.729 ms | 20903.6 tokens/sec
2024-11-04 23:34:42,750 - Iteration 11836 | Train Loss 4.36945 | Training Time: 0015h 17m 51s | Iteration Time: 4610.175 ms | 21323.3 tokens/sec
2024-11-04 23:36:14,934 - Iteration 11856 | Train Loss 4.53739 | Training Time: 0015h 19m 23s | Iteration Time: 4609.198 ms | 21327.8 tokens/sec
2024-11-04 23:37:47,123 - Iteration 11876 | Train Loss 4.47617 | Training Time: 0015h 20m 55s | Iteration Time: 4609.447 ms | 21326.6 tokens/sec
2024-11-04 23:39:19,308 - Iteration 11896 | Train Loss 4.47194 | Training Time: 0015h 22m 27s | Iteration Time: 4609.220 ms | 21327.7 tokens/sec
2024-11-04 23:40:51,492 - Iteration 11916 | Train Loss 4.46438 | Training Time: 0015h 24m 00s | Iteration Time: 4609.238 ms | 21327.6 tokens/sec
2024-11-04 23:42:23,695 - Iteration 11936 | Train Loss 4.42583 | Training Time: 0015h 25m 32s | Iteration Time: 4610.134 ms | 21323.5 tokens/sec
2024-11-04 23:43:55,892 - Iteration 11956 | Train Loss 4.34044 | Training Time: 0015h 27m 04s | Iteration Time: 4609.849 ms | 21324.8 tokens/sec
2024-11-04 23:45:28,079 - Iteration 11976 | Train Loss 4.47198 | Training Time: 0015h 28m 36s | Iteration Time: 4609.359 ms | 21327.0 tokens/sec
2024-11-04 23:47:00,267 - Iteration 11996 | Train Loss 4.32371 | Training Time: 0015h 30m 08s | Iteration Time: 4609.385 ms | 21326.9 tokens/sec
2024-11-04 23:47:50,998 - Iteration 12007 | Train Loss 4.12993 | Training Time: 0015h 30m 59s | Iteration Time: 4611.922 ms | 21315.2 tokens/sec
2024-11-04 23:50:06,940 - Iteration 12036 | Train Loss 4.19279 | Training Time: 0015h 33m 15s | Iteration Time: 4687.655 ms | 20970.8 tokens/sec
2024-11-04 23:51:39,127 - Iteration 12056 | Train Loss 4.22387 | Training Time: 0015h 34m 47s | Iteration Time: 4609.350 ms | 21327.1 tokens/sec
2024-11-04 23:53:11,315 - Iteration 12076 | Train Loss 4.50685 | Training Time: 0015h 36m 19s | Iteration Time: 4609.386 ms | 21326.9 tokens/sec
2024-11-04 23:54:43,512 - Iteration 12096 | Train Loss 4.52804 | Training Time: 0015h 37m 52s | Iteration Time: 4609.873 ms | 21324.7 tokens/sec
2024-11-04 23:56:15,731 - Iteration 12116 | Train Loss 4.35232 | Training Time: 0015h 39m 24s | Iteration Time: 4610.962 ms | 21319.6 tokens/sec
2024-11-04 23:57:47,925 - Iteration 12136 | Train Loss 4.42410 | Training Time: 0015h 40m 56s | Iteration Time: 4609.676 ms | 21325.6 tokens/sec
2024-11-04 23:59:20,121 - Iteration 12156 | Train Loss 4.36637 | Training Time: 0015h 42m 28s | Iteration Time: 4609.780 ms | 21325.1 tokens/sec
2024-11-05 00:00:52,314 - Iteration 12176 | Train Loss 4.47137 | Training Time: 0015h 44m 00s | Iteration Time: 4609.679 ms | 21325.6 tokens/sec
2024-11-05 00:02:24,564 - Iteration 12196 | Train Loss 4.32595 | Training Time: 0015h 45m 33s | Iteration Time: 4612.501 ms | 21312.5 tokens/sec
2024-11-05 00:03:56,776 - Iteration 12216 | Train Loss 4.34730 | Training Time: 0015h 47m 05s | Iteration Time: 4610.578 ms | 21321.4 tokens/sec
2024-11-05 00:05:29,000 - Iteration 12236 | Train Loss 4.27731 | Training Time: 0015h 48m 37s | Iteration Time: 4611.223 ms | 21318.4 tokens/sec
2024-11-05 00:07:01,191 - Iteration 12256 | Train Loss 4.17842 | Training Time: 0015h 50m 09s | Iteration Time: 4609.521 ms | 21326.3 tokens/sec
2024-11-05 00:08:33,371 - Iteration 12276 | Train Loss 4.46169 | Training Time: 0015h 51m 41s | Iteration Time: 4609.035 ms | 21328.5 tokens/sec
2024-11-05 00:09:56,318 - Iteration 12294 | Train Loss 4.11237 | Training Time: 0015h 53m 04s | Iteration Time: 4608.146 ms | 21332.7 tokens/sec
2024-11-05 00:11:12,336 - Iteration 12310 | Train Loss 4.06928 | Training Time: 0015h 54m 20s | Iteration Time: 4751.157 ms | 20690.5 tokens/sec
2024-11-05 00:13:14,432 - Iteration 12336 | Train Loss 4.86048 | Training Time: 0015h 56m 22s | Iteration Time: 4695.979 ms | 20933.7 tokens/sec
2024-11-05 00:14:46,586 - Iteration 12356 | Train Loss 4.48738 | Training Time: 0015h 57m 55s | Iteration Time: 4607.715 ms | 21334.7 tokens/sec
2024-11-05 00:16:18,763 - Iteration 12376 | Train Loss 4.41359 | Training Time: 0015h 59m 27s | Iteration Time: 4608.863 ms | 21329.3 tokens/sec
2024-11-05 00:17:50,977 - Iteration 12396 | Train Loss 4.44106 | Training Time: 0016h 00m 59s | Iteration Time: 4610.688 ms | 21320.9 tokens/sec
2024-11-05 00:19:23,178 - Iteration 12416 | Train Loss 4.41448 | Training Time: 0016h 02m 31s | Iteration Time: 4610.040 ms | 21323.9 tokens/sec
2024-11-05 00:20:55,348 - Iteration 12436 | Train Loss 4.35963 | Training Time: 0016h 04m 03s | Iteration Time: 4608.495 ms | 21331.0 tokens/sec
2024-11-05 00:22:27,533 - Iteration 12456 | Train Loss 4.28757 | Training Time: 0016h 05m 36s | Iteration Time: 4609.269 ms | 21327.5 tokens/sec
2024-11-05 00:23:59,736 - Iteration 12476 | Train Loss 4.29285 | Training Time: 0016h 07m 08s | Iteration Time: 4610.118 ms | 21323.5 tokens/sec
2024-11-05 00:25:31,958 - Iteration 12496 | Train Loss 4.28203 | Training Time: 0016h 08m 40s | Iteration Time: 4611.124 ms | 21318.9 tokens/sec
2024-11-05 00:26:04,231 - Iteration 12503 | Train Loss 4.03314 | Training Time: 0016h 09m 12s | Iteration Time: 4610.443 ms | 21322.0 tokens/sec
2024-11-05 00:28:38,626 - Iteration 12536 | Train Loss 4.18874 | Training Time: 0016h 11m 47s | Iteration Time: 4678.630 ms | 21011.3 tokens/sec
2024-11-05 00:30:10,841 - Iteration 12556 | Train Loss 4.27084 | Training Time: 0016h 13m 19s | Iteration Time: 4610.725 ms | 21320.7 tokens/sec
2024-11-05 00:31:43,063 - Iteration 12576 | Train Loss 4.52444 | Training Time: 0016h 14m 51s | Iteration Time: 4611.118 ms | 21318.9 tokens/sec
2024-11-05 00:33:15,326 - Iteration 12596 | Train Loss 4.38154 | Training Time: 0016h 16m 23s | Iteration Time: 4613.176 ms | 21309.4 tokens/sec
2024-11-05 00:34:47,568 - Iteration 12616 | Train Loss 4.33978 | Training Time: 0016h 17m 56s | Iteration Time: 4612.060 ms | 21314.6 tokens/sec
2024-11-05 00:36:19,813 - Iteration 12636 | Train Loss 4.49529 | Training Time: 0016h 19m 28s | Iteration Time: 4612.271 ms | 21313.6 tokens/sec
2024-11-05 00:37:52,050 - Iteration 12656 | Train Loss 4.51847 | Training Time: 0016h 21m 00s | Iteration Time: 4611.858 ms | 21315.5 tokens/sec
2024-11-05 00:39:24,298 - Iteration 12676 | Train Loss 4.41563 | Training Time: 0016h 22m 32s | Iteration Time: 4612.367 ms | 21313.1 tokens/sec
2024-11-05 00:40:56,529 - Iteration 12696 | Train Loss 4.28175 | Training Time: 0016h 24m 05s | Iteration Time: 4611.570 ms | 21316.8 tokens/sec
2024-11-05 00:42:28,781 - Iteration 12716 | Train Loss 4.25523 | Training Time: 0016h 25m 37s | Iteration Time: 4612.605 ms | 21312.0 tokens/sec
2024-11-05 00:44:01,056 - Iteration 12736 | Train Loss 4.36394 | Training Time: 0016h 27m 09s | Iteration Time: 4613.758 ms | 21306.7 tokens/sec
2024-11-05 00:45:33,311 - Iteration 12756 | Train Loss 4.21523 | Training Time: 0016h 28m 41s | Iteration Time: 4612.734 ms | 21311.4 tokens/sec
2024-11-05 00:47:05,604 - Iteration 12776 | Train Loss 4.16696 | Training Time: 0016h 30m 14s | Iteration Time: 4614.650 ms | 21302.6 tokens/sec
2024-11-05 00:48:37,934 - Iteration 12796 | Train Loss 4.17906 | Training Time: 0016h 31m 46s | Iteration Time: 4616.520 ms | 21294.0 tokens/sec
2024-11-05 00:50:10,235 - Iteration 12816 | Train Loss 4.24998 | Training Time: 0016h 33m 18s | Iteration Time: 4615.042 ms | 21300.8 tokens/sec
2024-11-05 00:51:42,523 - Iteration 12836 | Train Loss 4.40146 | Training Time: 0016h 34m 51s | Iteration Time: 4614.394 ms | 21303.8 tokens/sec
2024-11-05 00:53:14,839 - Iteration 12856 | Train Loss 4.41726 | Training Time: 0016h 36m 23s | Iteration Time: 4615.789 ms | 21297.3 tokens/sec
2024-11-05 00:54:47,182 - Iteration 12876 | Train Loss 4.34823 | Training Time: 0016h 37m 55s | Iteration Time: 4617.143 ms | 21291.1 tokens/sec
2024-11-05 00:56:19,499 - Iteration 12896 | Train Loss 4.37306 | Training Time: 0016h 39m 28s | Iteration Time: 4615.848 ms | 21297.1 tokens/sec
2024-11-05 00:57:51,827 - Iteration 12916 | Train Loss 4.22460 | Training Time: 0016h 41m 00s | Iteration Time: 4616.444 ms | 21294.3 tokens/sec
2024-11-05 00:59:24,165 - Iteration 12936 | Train Loss 4.29670 | Training Time: 0016h 42m 32s | Iteration Time: 4616.899 ms | 21292.2 tokens/sec
2024-11-05 01:00:56,504 - Iteration 12956 | Train Loss 4.29678 | Training Time: 0016h 44m 05s | Iteration Time: 4616.918 ms | 21292.1 tokens/sec
2024-11-05 01:02:28,825 - Iteration 12976 | Train Loss 4.26615 | Training Time: 0016h 45m 37s | Iteration Time: 4616.056 ms | 21296.1 tokens/sec
2024-11-05 01:04:01,157 - Iteration 12996 | Train Loss 4.28193 | Training Time: 0016h 47m 09s | Iteration Time: 4616.593 ms | 21293.6 tokens/sec
2024-11-05 01:05:33,446 - Iteration 13016 | Train Loss 4.31924 | Training Time: 0016h 48m 42s | Iteration Time: 4614.442 ms | 21303.6 tokens/sec
2024-11-05 01:07:05,697 - Iteration 13036 | Train Loss 4.23914 | Training Time: 0016h 50m 14s | Iteration Time: 4612.568 ms | 21312.2 tokens/sec
2024-11-05 01:08:37,906 - Iteration 13056 | Train Loss 4.25777 | Training Time: 0016h 51m 46s | Iteration Time: 4610.462 ms | 21321.9 tokens/sec
2024-11-05 01:10:10,117 - Iteration 13076 | Train Loss 4.36171 | Training Time: 0016h 53m 18s | Iteration Time: 4610.561 ms | 21321.5 tokens/sec
2024-11-05 01:11:42,323 - Iteration 13096 | Train Loss 4.34527 | Training Time: 0016h 54m 50s | Iteration Time: 4610.300 ms | 21322.7 tokens/sec
2024-11-05 01:13:14,522 - Iteration 13116 | Train Loss 4.42465 | Training Time: 0016h 56m 23s | Iteration Time: 4609.906 ms | 21324.5 tokens/sec
2024-11-05 01:14:46,741 - Iteration 13136 | Train Loss 4.38261 | Training Time: 0016h 57m 55s | Iteration Time: 4610.978 ms | 21319.6 tokens/sec
2024-11-05 01:16:18,959 - Iteration 13156 | Train Loss 4.35747 | Training Time: 0016h 59m 27s | Iteration Time: 4610.898 ms | 21319.9 tokens/sec
2024-11-05 01:17:51,162 - Iteration 13176 | Train Loss 4.34556 | Training Time: 0017h 00m 59s | Iteration Time: 4610.162 ms | 21323.3 tokens/sec
2024-11-05 01:19:23,362 - Iteration 13196 | Train Loss 4.11889 | Training Time: 0017h 02m 31s | Iteration Time: 4610.002 ms | 21324.1 tokens/sec
2024-11-05 01:20:55,571 - Iteration 13216 | Train Loss 4.30735 | Training Time: 0017h 04m 04s | Iteration Time: 4610.446 ms | 21322.0 tokens/sec
2024-11-05 01:22:27,778 - Iteration 13236 | Train Loss 4.27385 | Training Time: 0017h 05m 36s | Iteration Time: 4610.356 ms | 21322.4 tokens/sec
2024-11-05 01:23:59,940 - Iteration 13256 | Train Loss 4.22059 | Training Time: 0017h 07m 08s | Iteration Time: 4608.072 ms | 21333.0 tokens/sec
2024-11-05 01:25:32,081 - Iteration 13276 | Train Loss 4.18041 | Training Time: 0017h 08m 40s | Iteration Time: 4607.064 ms | 21337.7 tokens/sec
2024-11-05 01:27:04,222 - Iteration 13296 | Train Loss 4.36210 | Training Time: 0017h 10m 12s | Iteration Time: 4607.057 ms | 21337.7 tokens/sec
2024-11-05 01:27:22,659 - Iteration 13300 | Train Loss 3.92465 | Training Time: 0017h 10m 31s | Iteration Time: 4609.098 ms | 21328.3 tokens/sec
2024-11-05 01:30:10,781 - Iteration 13336 | Train Loss 4.54505 | Training Time: 0017h 13m 19s | Iteration Time: 4670.062 ms | 21049.8 tokens/sec
2024-11-05 01:31:42,946 - Iteration 13356 | Train Loss 4.36079 | Training Time: 0017h 14m 51s | Iteration Time: 4608.271 ms | 21332.1 tokens/sec
2024-11-05 01:33:15,094 - Iteration 13376 | Train Loss 4.27666 | Training Time: 0017h 16m 23s | Iteration Time: 4607.372 ms | 21336.2 tokens/sec
2024-11-05 01:34:47,272 - Iteration 13396 | Train Loss 4.30506 | Training Time: 0017h 17m 55s | Iteration Time: 4608.911 ms | 21329.1 tokens/sec
2024-11-05 01:36:19,448 - Iteration 13416 | Train Loss 4.34030 | Training Time: 0017h 19m 28s | Iteration Time: 4608.799 ms | 21329.6 tokens/sec
2024-11-05 01:37:51,638 - Iteration 13436 | Train Loss 4.28694 | Training Time: 0017h 21m 00s | Iteration Time: 4609.512 ms | 21326.3 tokens/sec
2024-11-05 01:39:23,843 - Iteration 13456 | Train Loss 4.26073 | Training Time: 0017h 22m 32s | Iteration Time: 4610.246 ms | 21322.9 tokens/sec
2024-11-05 01:40:56,047 - Iteration 13476 | Train Loss 4.25477 | Training Time: 0017h 24m 04s | Iteration Time: 4610.197 ms | 21323.2 tokens/sec
2024-11-05 01:42:28,242 - Iteration 13496 | Train Loss 4.15616 | Training Time: 0017h 25m 36s | Iteration Time: 4609.740 ms | 21325.3 tokens/sec
2024-11-05 01:44:00,452 - Iteration 13516 | Train Loss 4.16858 | Training Time: 0017h 27m 09s | Iteration Time: 4610.512 ms | 21321.7 tokens/sec
2024-11-05 01:45:32,666 - Iteration 13536 | Train Loss 3.98056 | Training Time: 0017h 28m 41s | Iteration Time: 4610.720 ms | 21320.7 tokens/sec
2024-11-05 01:47:04,886 - Iteration 13556 | Train Loss 4.28475 | Training Time: 0017h 30m 13s | Iteration Time: 4610.978 ms | 21319.6 tokens/sec
2024-11-05 01:48:37,102 - Iteration 13576 | Train Loss 4.26244 | Training Time: 0017h 31m 45s | Iteration Time: 4610.820 ms | 21320.3 tokens/sec
2024-11-05 01:50:09,305 - Iteration 13596 | Train Loss 4.17745 | Training Time: 0017h 33m 17s | Iteration Time: 4610.115 ms | 21323.5 tokens/sec
2024-11-05 01:51:41,498 - Iteration 13616 | Train Loss 4.35060 | Training Time: 0017h 34m 50s | Iteration Time: 4609.675 ms | 21325.6 tokens/sec
2024-11-05 01:53:13,699 - Iteration 13636 | Train Loss 4.29067 | Training Time: 0017h 36m 22s | Iteration Time: 4610.058 ms | 21323.8 tokens/sec
2024-11-05 01:54:45,903 - Iteration 13656 | Train Loss 4.26889 | Training Time: 0017h 37m 54s | Iteration Time: 4610.191 ms | 21323.2 tokens/sec
2024-11-05 01:56:18,075 - Iteration 13676 | Train Loss 4.32385 | Training Time: 0017h 39m 26s | Iteration Time: 4608.611 ms | 21330.5 tokens/sec
2024-11-05 01:57:50,232 - Iteration 13696 | Train Loss 4.24894 | Training Time: 0017h 40m 58s | Iteration Time: 4607.824 ms | 21334.1 tokens/sec
2024-11-05 01:59:22,402 - Iteration 13716 | Train Loss 4.18811 | Training Time: 0017h 42m 30s | Iteration Time: 4608.507 ms | 21331.0 tokens/sec
2024-11-05 02:00:54,566 - Iteration 13736 | Train Loss 4.19649 | Training Time: 0017h 44m 03s | Iteration Time: 4608.220 ms | 21332.3 tokens/sec
2024-11-05 02:02:26,715 - Iteration 13756 | Train Loss 4.84053 | Training Time: 0017h 45m 35s | Iteration Time: 4607.420 ms | 21336.0 tokens/sec
2024-11-05 02:03:58,888 - Iteration 13776 | Train Loss 4.07365 | Training Time: 0017h 47m 07s | Iteration Time: 4608.669 ms | 21330.2 tokens/sec
2024-11-05 02:05:31,057 - Iteration 13796 | Train Loss 4.01112 | Training Time: 0017h 48m 39s | Iteration Time: 4608.425 ms | 21331.4 tokens/sec
2024-11-05 02:07:03,253 - Iteration 13816 | Train Loss 4.34545 | Training Time: 0017h 50m 11s | Iteration Time: 4609.819 ms | 21324.9 tokens/sec
2024-11-05 02:08:35,464 - Iteration 13836 | Train Loss 4.46605 | Training Time: 0017h 51m 44s | Iteration Time: 4610.532 ms | 21321.6 tokens/sec
2024-11-05 02:10:07,668 - Iteration 13856 | Train Loss 4.33339 | Training Time: 0017h 53m 16s | Iteration Time: 4610.219 ms | 21323.1 tokens/sec
2024-11-05 02:11:39,878 - Iteration 13876 | Train Loss 4.29588 | Training Time: 0017h 54m 48s | Iteration Time: 4610.475 ms | 21321.9 tokens/sec
2024-11-05 02:13:12,076 - Iteration 13896 | Train Loss 4.27239 | Training Time: 0017h 56m 20s | Iteration Time: 4609.905 ms | 21324.5 tokens/sec
2024-11-05 02:14:44,272 - Iteration 13916 | Train Loss 4.28428 | Training Time: 0017h 57m 52s | Iteration Time: 4609.813 ms | 21324.9 tokens/sec
2024-11-05 02:16:16,472 - Iteration 13936 | Train Loss 4.21980 | Training Time: 0017h 59m 25s | Iteration Time: 4610.002 ms | 21324.1 tokens/sec
2024-11-05 02:17:48,627 - Iteration 13956 | Train Loss 4.19036 | Training Time: 0018h 00m 57s | Iteration Time: 4607.764 ms | 21334.4 tokens/sec
2024-11-05 02:19:20,806 - Iteration 13976 | Train Loss 4.28293 | Training Time: 0018h 02m 29s | Iteration Time: 4608.932 ms | 21329.0 tokens/sec
2024-11-05 02:20:53,019 - Iteration 13996 | Train Loss 4.09149 | Training Time: 0018h 04m 01s | Iteration Time: 4610.659 ms | 21321.0 tokens/sec
2024-11-05 02:22:25,268 - Iteration 14016 | Train Loss 3.99948 | Training Time: 0018h 05m 33s | Iteration Time: 4612.435 ms | 21312.8 tokens/sec
2024-11-05 02:23:57,508 - Iteration 14036 | Train Loss 4.03316 | Training Time: 0018h 07m 06s | Iteration Time: 4612.011 ms | 21314.8 tokens/sec
2024-11-05 02:24:48,234 - Iteration 14047 | Train Loss 3.92073 | Training Time: 0018h 07m 56s | Iteration Time: 4611.417 ms | 21317.5 tokens/sec
2024-11-05 02:27:04,223 - Iteration 14076 | Train Loss 4.29281 | Training Time: 0018h 10m 12s | Iteration Time: 4689.290 ms | 20963.5 tokens/sec
2024-11-05 02:28:36,451 - Iteration 14096 | Train Loss 4.24679 | Training Time: 0018h 11m 45s | Iteration Time: 4611.377 ms | 21317.7 tokens/sec
2024-11-05 02:30:08,670 - Iteration 14116 | Train Loss 4.27235 | Training Time: 0018h 13m 17s | Iteration Time: 4610.958 ms | 21319.6 tokens/sec
2024-11-05 02:31:40,873 - Iteration 14136 | Train Loss 4.25366 | Training Time: 0018h 14m 49s | Iteration Time: 4610.176 ms | 21323.3 tokens/sec
2024-11-05 02:33:13,114 - Iteration 14156 | Train Loss 4.36181 | Training Time: 0018h 16m 21s | Iteration Time: 4612.030 ms | 21314.7 tokens/sec
2024-11-05 02:34:45,342 - Iteration 14176 | Train Loss 4.22412 | Training Time: 0018h 17m 53s | Iteration Time: 4611.408 ms | 21317.6 tokens/sec
2024-11-05 02:36:17,565 - Iteration 14196 | Train Loss 4.31655 | Training Time: 0018h 19m 26s | Iteration Time: 4611.156 ms | 21318.7 tokens/sec
2024-11-05 02:37:49,783 - Iteration 14216 | Train Loss 4.17944 | Training Time: 0018h 20m 58s | Iteration Time: 4610.887 ms | 21320.0 tokens/sec
2024-11-05 02:39:22,039 - Iteration 14236 | Train Loss 4.24973 | Training Time: 0018h 22m 30s | Iteration Time: 4612.828 ms | 21311.0 tokens/sec
2024-11-05 02:40:54,274 - Iteration 14256 | Train Loss 4.06173 | Training Time: 0018h 24m 02s | Iteration Time: 4611.728 ms | 21316.1 tokens/sec
2024-11-05 02:42:26,513 - Iteration 14276 | Train Loss 4.14815 | Training Time: 0018h 25m 35s | Iteration Time: 4611.936 ms | 21315.1 tokens/sec
2024-11-05 02:43:58,767 - Iteration 14296 | Train Loss 4.05359 | Training Time: 0018h 27m 07s | Iteration Time: 4612.692 ms | 21311.6 tokens/sec
2024-11-05 02:45:31,037 - Iteration 14316 | Train Loss 4.35621 | Training Time: 0018h 28m 39s | Iteration Time: 4613.519 ms | 21307.8 tokens/sec
2024-11-05 02:47:03,265 - Iteration 14336 | Train Loss 4.39502 | Training Time: 0018h 30m 11s | Iteration Time: 4611.420 ms | 21317.5 tokens/sec
2024-11-05 02:48:35,444 - Iteration 14356 | Train Loss 4.29730 | Training Time: 0018h 31m 43s | Iteration Time: 4608.932 ms | 21329.0 tokens/sec
2024-11-05 02:50:07,620 - Iteration 14376 | Train Loss 4.35409 | Training Time: 0018h 33m 16s | Iteration Time: 4608.785 ms | 21329.7 tokens/sec
2024-11-05 02:51:39,819 - Iteration 14396 | Train Loss 4.24229 | Training Time: 0018h 34m 48s | Iteration Time: 4609.975 ms | 21324.2 tokens/sec
2024-11-05 02:53:12,039 - Iteration 14416 | Train Loss 4.18646 | Training Time: 0018h 36m 20s | Iteration Time: 4610.972 ms | 21319.6 tokens/sec
2024-11-05 02:54:44,246 - Iteration 14436 | Train Loss 4.20026 | Training Time: 0018h 37m 52s | Iteration Time: 4610.355 ms | 21322.4 tokens/sec
2024-11-05 02:56:16,432 - Iteration 14456 | Train Loss 4.16279 | Training Time: 0018h 39m 24s | Iteration Time: 4609.311 ms | 21327.3 tokens/sec
2024-11-05 02:57:48,610 - Iteration 14476 | Train Loss 4.18218 | Training Time: 0018h 40m 57s | Iteration Time: 4608.915 ms | 21329.1 tokens/sec
2024-11-05 02:59:20,757 - Iteration 14496 | Train Loss 4.12818 | Training Time: 0018h 42m 29s | Iteration Time: 4607.322 ms | 21336.5 tokens/sec
2024-11-05 03:00:52,884 - Iteration 14516 | Train Loss 4.11679 | Training Time: 0018h 44m 01s | Iteration Time: 4606.391 ms | 21340.8 tokens/sec
2024-11-05 03:02:25,031 - Iteration 14536 | Train Loss 4.05122 | Training Time: 0018h 45m 33s | Iteration Time: 4607.301 ms | 21336.6 tokens/sec
2024-11-05 03:03:57,200 - Iteration 14556 | Train Loss 4.29406 | Training Time: 0018h 47m 05s | Iteration Time: 4608.463 ms | 21331.2 tokens/sec
2024-11-05 03:05:29,410 - Iteration 14576 | Train Loss 4.37448 | Training Time: 0018h 48m 37s | Iteration Time: 4610.511 ms | 21321.7 tokens/sec
2024-11-05 03:07:01,615 - Iteration 14596 | Train Loss 4.30046 | Training Time: 0018h 50m 10s | Iteration Time: 4610.266 ms | 21322.8 tokens/sec
2024-11-05 03:08:33,770 - Iteration 14616 | Train Loss 4.22853 | Training Time: 0018h 51m 42s | Iteration Time: 4607.730 ms | 21334.6 tokens/sec
2024-11-05 03:10:05,917 - Iteration 14636 | Train Loss 4.31234 | Training Time: 0018h 53m 14s | Iteration Time: 4607.374 ms | 21336.2 tokens/sec
2024-11-05 03:11:38,064 - Iteration 14656 | Train Loss 4.17102 | Training Time: 0018h 54m 46s | Iteration Time: 4607.313 ms | 21336.5 tokens/sec
2024-11-05 03:13:10,224 - Iteration 14676 | Train Loss 4.14822 | Training Time: 0018h 56m 18s | Iteration Time: 4608.005 ms | 21333.3 tokens/sec
2024-11-05 03:14:42,384 - Iteration 14696 | Train Loss 4.16804 | Training Time: 0018h 57m 50s | Iteration Time: 4607.994 ms | 21333.4 tokens/sec
2024-11-05 03:16:14,485 - Iteration 14716 | Train Loss 4.10820 | Training Time: 0018h 59m 23s | Iteration Time: 4605.081 ms | 21346.9 tokens/sec
2024-11-05 03:17:46,612 - Iteration 14736 | Train Loss 4.15094 | Training Time: 0019h 00m 55s | Iteration Time: 4606.335 ms | 21341.0 tokens/sec
2024-11-05 03:18:46,512 - Iteration 14749 | Train Loss 3.89518 | Training Time: 0019h 01m 55s | Iteration Time: 4607.710 ms | 21334.7 tokens/sec
2024-11-05 03:20:53,139 - Iteration 14776 | Train Loss 4.06997 | Training Time: 0019h 04m 01s | Iteration Time: 4689.893 ms | 20960.8 tokens/sec
2024-11-05 03:22:25,249 - Iteration 14796 | Train Loss 4.28777 | Training Time: 0019h 05m 33s | Iteration Time: 4605.502 ms | 21344.9 tokens/sec
2024-11-05 03:23:57,356 - Iteration 14816 | Train Loss 4.36321 | Training Time: 0019h 07m 05s | Iteration Time: 4605.347 ms | 21345.6 tokens/sec
2024-11-05 03:25:29,505 - Iteration 14836 | Train Loss 4.40241 | Training Time: 0019h 08m 38s | Iteration Time: 4607.413 ms | 21336.0 tokens/sec
2024-11-05 03:27:01,634 - Iteration 14856 | Train Loss 4.15394 | Training Time: 0019h 10m 10s | Iteration Time: 4606.472 ms | 21340.4 tokens/sec
2024-11-05 03:28:33,773 - Iteration 14876 | Train Loss 4.15801 | Training Time: 0019h 11m 42s | Iteration Time: 4606.964 ms | 21338.1 tokens/sec
2024-11-05 03:29:10,631 - Iteration 14884 | Train Loss 3.83328 | Training Time: 0019h 12m 19s | Iteration Time: 4607.202 ms | 21337.0 tokens/sec
2024-11-05 03:31:40,316 - Iteration 14916 | Train Loss 4.18693 | Training Time: 0019h 14m 48s | Iteration Time: 4677.656 ms | 21015.7 tokens/sec
2024-11-05 03:33:12,461 - Iteration 14936 | Train Loss 4.22470 | Training Time: 0019h 16m 21s | Iteration Time: 4607.246 ms | 21336.8 tokens/sec
2024-11-05 03:34:44,601 - Iteration 14956 | Train Loss 4.13274 | Training Time: 0019h 17m 53s | Iteration Time: 4607.023 ms | 21337.9 tokens/sec
2024-11-05 03:36:16,717 - Iteration 14976 | Train Loss 4.17477 | Training Time: 0019h 19m 25s | Iteration Time: 4605.775 ms | 21343.6 tokens/sec
2024-11-05 03:37:48,803 - Iteration 14996 | Train Loss 4.16291 | Training Time: 0019h 20m 57s | Iteration Time: 4604.309 ms | 21350.4 tokens/sec
2024-11-05 03:39:20,916 - Iteration 15016 | Train Loss 4.02373 | Training Time: 0019h 22m 29s | Iteration Time: 4605.640 ms | 21344.3 tokens/sec
2024-11-05 03:40:53,054 - Iteration 15036 | Train Loss 4.10076 | Training Time: 0019h 24m 01s | Iteration Time: 4606.921 ms | 21338.3 tokens/sec
2024-11-05 03:42:25,182 - Iteration 15056 | Train Loss 4.27576 | Training Time: 0019h 25m 33s | Iteration Time: 4606.407 ms | 21340.7 tokens/sec
2024-11-05 03:43:57,296 - Iteration 15076 | Train Loss 4.27654 | Training Time: 0019h 27m 05s | Iteration Time: 4605.689 ms | 21344.0 tokens/sec
2024-11-05 03:45:29,450 - Iteration 15096 | Train Loss 4.32645 | Training Time: 0019h 28m 38s | Iteration Time: 4607.714 ms | 21334.7 tokens/sec
2024-11-05 03:47:01,581 - Iteration 15116 | Train Loss 4.18520 | Training Time: 0019h 30m 10s | Iteration Time: 4606.514 ms | 21340.2 tokens/sec
2024-11-05 03:48:33,730 - Iteration 15136 | Train Loss 4.33078 | Training Time: 0019h 31m 42s | Iteration Time: 4607.471 ms | 21335.8 tokens/sec
2024-11-05 03:50:05,877 - Iteration 15156 | Train Loss 4.23221 | Training Time: 0019h 33m 14s | Iteration Time: 4607.357 ms | 21336.3 tokens/sec
2024-11-05 03:51:38,027 - Iteration 15176 | Train Loss 4.19379 | Training Time: 0019h 34m 46s | Iteration Time: 4607.477 ms | 21335.8 tokens/sec
2024-11-05 03:53:10,189 - Iteration 15196 | Train Loss 4.23898 | Training Time: 0019h 36m 18s | Iteration Time: 4608.104 ms | 21332.9 tokens/sec
2024-11-05 03:54:42,298 - Iteration 15216 | Train Loss 4.16145 | Training Time: 0019h 37m 50s | Iteration Time: 4605.450 ms | 21345.1 tokens/sec
2024-11-05 03:56:14,413 - Iteration 15236 | Train Loss 4.10297 | Training Time: 0019h 39m 22s | Iteration Time: 4605.746 ms | 21343.8 tokens/sec
2024-11-05 03:57:46,531 - Iteration 15256 | Train Loss 3.99759 | Training Time: 0019h 40m 55s | Iteration Time: 4605.891 ms | 21343.1 tokens/sec
2024-11-05 03:58:04,947 - Iteration 15260 | Train Loss 3.78747 | Training Time: 0019h 41m 13s | Iteration Time: 4604.057 ms | 21351.6 tokens/sec
2024-11-05 04:00:53,002 - Iteration 15296 | Train Loss 4.13550 | Training Time: 0019h 44m 01s | Iteration Time: 4668.207 ms | 21058.2 tokens/sec
2024-11-05 04:02:25,124 - Iteration 15316 | Train Loss 4.21720 | Training Time: 0019h 45m 33s | Iteration Time: 4606.083 ms | 21342.2 tokens/sec
2024-11-05 04:03:57,239 - Iteration 15336 | Train Loss 4.25711 | Training Time: 0019h 47m 05s | Iteration Time: 4605.751 ms | 21343.8 tokens/sec
2024-11-05 04:05:29,330 - Iteration 15356 | Train Loss 4.21112 | Training Time: 0019h 48m 37s | Iteration Time: 4604.573 ms | 21349.2 tokens/sec
2024-11-05 04:07:01,487 - Iteration 15376 | Train Loss 4.12121 | Training Time: 0019h 50m 10s | Iteration Time: 4607.847 ms | 21334.0 tokens/sec
2024-11-05 04:08:33,973 - Iteration 15396 | Train Loss 4.13646 | Training Time: 0019h 51m 42s | Iteration Time: 4624.271 ms | 21258.3 tokens/sec
2024-11-05 04:10:06,634 - Iteration 15416 | Train Loss 4.20360 | Training Time: 0019h 53m 15s | Iteration Time: 4633.042 ms | 21218.0 tokens/sec
2024-11-05 04:11:39,087 - Iteration 15436 | Train Loss 4.18060 | Training Time: 0019h 54m 47s | Iteration Time: 4622.650 ms | 21265.7 tokens/sec
2024-11-05 04:13:11,364 - Iteration 15456 | Train Loss 4.14809 | Training Time: 0019h 56m 19s | Iteration Time: 4613.856 ms | 21306.3 tokens/sec
2024-11-05 04:14:43,578 - Iteration 15476 | Train Loss 4.11586 | Training Time: 0019h 57m 52s | Iteration Time: 4610.721 ms | 21320.7 tokens/sec
2024-11-05 04:16:15,768 - Iteration 15496 | Train Loss 4.06284 | Training Time: 0019h 59m 24s | Iteration Time: 4609.490 ms | 21326.4 tokens/sec
2024-11-05 04:17:47,938 - Iteration 15516 | Train Loss 4.00483 | Training Time: 0020h 00m 56s | Iteration Time: 4608.529 ms | 21330.9 tokens/sec
2024-11-05 04:19:20,092 - Iteration 15536 | Train Loss 3.94866 | Training Time: 0020h 02m 28s | Iteration Time: 4607.658 ms | 21334.9 tokens/sec
2024-11-05 04:20:52,235 - Iteration 15556 | Train Loss 4.10919 | Training Time: 0020h 04m 00s | Iteration Time: 4607.173 ms | 21337.2 tokens/sec
2024-11-05 04:22:24,377 - Iteration 15576 | Train Loss 4.15486 | Training Time: 0020h 05m 32s | Iteration Time: 4607.098 ms | 21337.5 tokens/sec
2024-11-05 04:23:56,526 - Iteration 15596 | Train Loss 4.09335 | Training Time: 0020h 07m 05s | Iteration Time: 4607.459 ms | 21335.8 tokens/sec
2024-11-05 04:25:28,670 - Iteration 15616 | Train Loss 4.15590 | Training Time: 0020h 08m 37s | Iteration Time: 4607.211 ms | 21337.0 tokens/sec
2024-11-05 04:27:00,807 - Iteration 15636 | Train Loss 4.28761 | Training Time: 0020h 10m 09s | Iteration Time: 4606.817 ms | 21338.8 tokens/sec
2024-11-05 04:28:32,941 - Iteration 15656 | Train Loss 4.25158 | Training Time: 0020h 11m 41s | Iteration Time: 4606.713 ms | 21339.3 tokens/sec
2024-11-05 04:30:05,094 - Iteration 15676 | Train Loss 4.13233 | Training Time: 0020h 13m 13s | Iteration Time: 4607.641 ms | 21335.0 tokens/sec
2024-11-05 04:31:37,267 - Iteration 15696 | Train Loss 4.20632 | Training Time: 0020h 14m 45s | Iteration Time: 4608.635 ms | 21330.4 tokens/sec
2024-11-05 04:33:09,411 - Iteration 15716 | Train Loss 4.06068 | Training Time: 0020h 16m 17s | Iteration Time: 4607.223 ms | 21336.9 tokens/sec
2024-11-05 04:34:41,732 - Iteration 15736 | Train Loss 4.01041 | Training Time: 0020h 17m 50s | Iteration Time: 4616.055 ms | 21296.1 tokens/sec
2024-11-05 04:36:14,276 - Iteration 15756 | Train Loss 4.03563 | Training Time: 0020h 19m 22s | Iteration Time: 4627.210 ms | 21244.8 tokens/sec
2024-11-05 04:37:46,611 - Iteration 15776 | Train Loss 4.02449 | Training Time: 0020h 20m 55s | Iteration Time: 4616.718 ms | 21293.0 tokens/sec
2024-11-05 04:39:18,841 - Iteration 15796 | Train Loss 4.18477 | Training Time: 0020h 22m 27s | Iteration Time: 4611.519 ms | 21317.1 tokens/sec
2024-11-05 04:40:51,028 - Iteration 15816 | Train Loss 4.13437 | Training Time: 0020h 23m 59s | Iteration Time: 4609.369 ms | 21327.0 tokens/sec
2024-11-05 04:42:23,200 - Iteration 15836 | Train Loss 4.23764 | Training Time: 0020h 25m 31s | Iteration Time: 4608.571 ms | 21330.7 tokens/sec
2024-11-05 04:43:55,336 - Iteration 15856 | Train Loss 4.19487 | Training Time: 0020h 27m 03s | Iteration Time: 4606.815 ms | 21338.8 tokens/sec
2024-11-05 04:45:27,467 - Iteration 15876 | Train Loss 4.30838 | Training Time: 0020h 28m 36s | Iteration Time: 4606.529 ms | 21340.1 tokens/sec
2024-11-05 04:46:59,624 - Iteration 15896 | Train Loss 4.22612 | Training Time: 0020h 30m 08s | Iteration Time: 4607.854 ms | 21334.0 tokens/sec
2024-11-05 04:48:31,749 - Iteration 15916 | Train Loss 4.21903 | Training Time: 0020h 31m 40s | Iteration Time: 4606.240 ms | 21341.5 tokens/sec
2024-11-05 04:50:03,878 - Iteration 15936 | Train Loss 4.10871 | Training Time: 0020h 33m 12s | Iteration Time: 4606.449 ms | 21340.5 tokens/sec
2024-11-05 04:51:35,997 - Iteration 15956 | Train Loss 4.09572 | Training Time: 0020h 34m 44s | Iteration Time: 4605.981 ms | 21342.7 tokens/sec
2024-11-05 04:53:08,134 - Iteration 15976 | Train Loss 3.92706 | Training Time: 0020h 36m 16s | Iteration Time: 4606.839 ms | 21338.7 tokens/sec
2024-11-05 04:54:40,270 - Iteration 15996 | Train Loss 4.00068 | Training Time: 0020h 37m 48s | Iteration Time: 4606.780 ms | 21339.0 tokens/sec
2024-11-05 04:56:12,428 - Iteration 16016 | Train Loss 4.05845 | Training Time: 0020h 39m 20s | Iteration Time: 4607.932 ms | 21333.6 tokens/sec
2024-11-05 04:57:44,574 - Iteration 16036 | Train Loss 3.94148 | Training Time: 0020h 40m 53s | Iteration Time: 4607.269 ms | 21336.7 tokens/sec
2024-11-05 04:59:16,759 - Iteration 16056 | Train Loss 4.27395 | Training Time: 0020h 42m 25s | Iteration Time: 4609.254 ms | 21327.5 tokens/sec
2024-11-05 05:00:48,938 - Iteration 16076 | Train Loss 4.27592 | Training Time: 0020h 43m 57s | Iteration Time: 4608.958 ms | 21328.9 tokens/sec
2024-11-05 05:02:21,110 - Iteration 16096 | Train Loss 4.22468 | Training Time: 0020h 45m 29s | Iteration Time: 4608.589 ms | 21330.6 tokens/sec
2024-11-05 05:03:53,293 - Iteration 16116 | Train Loss 4.33491 | Training Time: 0020h 47m 01s | Iteration Time: 4609.144 ms | 21328.0 tokens/sec
2024-11-05 05:05:25,480 - Iteration 16136 | Train Loss 4.18263 | Training Time: 0020h 48m 34s | Iteration Time: 4609.359 ms | 21327.0 tokens/sec
2024-11-05 05:06:57,661 - Iteration 16156 | Train Loss 4.10191 | Training Time: 0020h 50m 06s | Iteration Time: 4609.081 ms | 21328.3 tokens/sec
2024-11-05 05:08:29,830 - Iteration 16176 | Train Loss 4.25639 | Training Time: 0020h 51m 38s | Iteration Time: 4608.429 ms | 21331.3 tokens/sec
2024-11-05 05:10:02,005 - Iteration 16196 | Train Loss 4.03819 | Training Time: 0020h 53m 10s | Iteration Time: 4608.767 ms | 21329.8 tokens/sec
2024-11-05 05:11:34,197 - Iteration 16216 | Train Loss 4.10713 | Training Time: 0020h 54m 42s | Iteration Time: 4609.571 ms | 21326.1 tokens/sec
2024-11-05 05:13:06,358 - Iteration 16236 | Train Loss 3.81637 | Training Time: 0020h 56m 14s | Iteration Time: 4608.053 ms | 21333.1 tokens/sec
2024-11-05 05:14:38,537 - Iteration 16256 | Train Loss 3.90393 | Training Time: 0020h 57m 47s | Iteration Time: 4608.976 ms | 21328.8 tokens/sec
2024-11-05 05:16:10,746 - Iteration 16276 | Train Loss 3.95760 | Training Time: 0020h 59m 19s | Iteration Time: 4610.435 ms | 21322.1 tokens/sec
2024-11-05 05:17:42,948 - Iteration 16296 | Train Loss 4.31130 | Training Time: 0021h 00m 51s | Iteration Time: 4610.102 ms | 21323.6 tokens/sec
2024-11-05 05:19:15,242 - Iteration 16316 | Train Loss 4.27415 | Training Time: 0021h 02m 23s | Iteration Time: 4614.707 ms | 21302.3 tokens/sec
2024-11-05 05:20:47,732 - Iteration 16336 | Train Loss 4.13245 | Training Time: 0021h 03m 56s | Iteration Time: 4624.494 ms | 21257.2 tokens/sec
2024-11-05 05:22:20,049 - Iteration 16356 | Train Loss 4.23553 | Training Time: 0021h 05m 28s | Iteration Time: 4615.847 ms | 21297.1 tokens/sec
2024-11-05 05:23:52,306 - Iteration 16376 | Train Loss 4.14559 | Training Time: 0021h 07m 00s | Iteration Time: 4612.827 ms | 21311.0 tokens/sec
2024-11-05 05:25:24,554 - Iteration 16396 | Train Loss 4.20017 | Training Time: 0021h 08m 33s | Iteration Time: 4612.431 ms | 21312.8 tokens/sec
2024-11-05 05:26:56,773 - Iteration 16416 | Train Loss 4.14183 | Training Time: 0021h 10m 05s | Iteration Time: 4610.949 ms | 21319.7 tokens/sec
2024-11-05 05:28:28,996 - Iteration 16436 | Train Loss 4.00033 | Training Time: 0021h 11m 37s | Iteration Time: 4611.152 ms | 21318.8 tokens/sec
2024-11-05 05:30:01,486 - Iteration 16456 | Train Loss 4.09041 | Training Time: 0021h 13m 10s | Iteration Time: 4624.487 ms | 21257.3 tokens/sec
2024-11-05 05:30:57,083 - Iteration 16468 | Train Loss 3.75472 | Training Time: 0021h 14m 05s | Iteration Time: 4633.113 ms | 21217.7 tokens/sec
2024-11-05 05:33:08,337 - Iteration 16496 | Train Loss 3.83077 | Training Time: 0021h 16m 16s | Iteration Time: 4687.622 ms | 20971.0 tokens/sec
2024-11-05 05:34:40,928 - Iteration 16516 | Train Loss 3.84932 | Training Time: 0021h 17m 49s | Iteration Time: 4629.589 ms | 21233.9 tokens/sec
2024-11-05 05:36:13,559 - Iteration 16536 | Train Loss 4.15805 | Training Time: 0021h 19m 22s | Iteration Time: 4631.528 ms | 21225.0 tokens/sec
2024-11-05 05:37:46,162 - Iteration 16556 | Train Loss 4.13169 | Training Time: 0021h 20m 54s | Iteration Time: 4630.138 ms | 21231.3 tokens/sec
2024-11-05 05:39:18,459 - Iteration 16576 | Train Loss 4.14312 | Training Time: 0021h 22m 27s | Iteration Time: 4614.849 ms | 21301.7 tokens/sec
2024-11-05 05:40:50,624 - Iteration 16596 | Train Loss 4.10237 | Training Time: 0021h 23m 59s | Iteration Time: 4608.242 ms | 21332.2 tokens/sec
2024-11-05 05:42:22,728 - Iteration 16616 | Train Loss 4.21951 | Training Time: 0021h 25m 31s | Iteration Time: 4605.228 ms | 21346.2 tokens/sec
2024-11-05 05:43:54,788 - Iteration 16636 | Train Loss 4.21907 | Training Time: 0021h 27m 03s | Iteration Time: 4602.998 ms | 21356.5 tokens/sec
2024-11-05 05:45:26,861 - Iteration 16656 | Train Loss 4.11901 | Training Time: 0021h 28m 35s | Iteration Time: 4603.666 ms | 21353.4 tokens/sec
2024-11-05 05:46:58,953 - Iteration 16676 | Train Loss 3.96477 | Training Time: 0021h 30m 07s | Iteration Time: 4604.589 ms | 21349.1 tokens/sec
2024-11-05 05:48:31,306 - Iteration 16696 | Train Loss 4.09555 | Training Time: 0021h 31m 39s | Iteration Time: 4617.616 ms | 21288.9 tokens/sec
2024-11-05 05:50:03,763 - Iteration 16716 | Train Loss 4.25997 | Training Time: 0021h 33m 12s | Iteration Time: 4622.865 ms | 21264.7 tokens/sec
2024-11-05 05:51:36,290 - Iteration 16736 | Train Loss 3.97293 | Training Time: 0021h 34m 44s | Iteration Time: 4626.355 ms | 21248.7 tokens/sec
2024-11-05 05:52:13,333 - Iteration 16744 | Train Loss 3.70857 | Training Time: 0021h 35m 21s | Iteration Time: 4630.402 ms | 21230.1 tokens/sec
2024-11-05 05:54:43,917 - Iteration 16776 | Train Loss 3.79882 | Training Time: 0021h 37m 52s | Iteration Time: 4705.731 ms | 20890.3 tokens/sec
2024-11-05 05:56:16,847 - Iteration 16796 | Train Loss 4.06512 | Training Time: 0021h 39m 25s | Iteration Time: 4646.510 ms | 21156.5 tokens/sec
2024-11-05 05:57:49,494 - Iteration 16816 | Train Loss 4.27025 | Training Time: 0021h 40m 58s | Iteration Time: 4632.383 ms | 21221.0 tokens/sec
2024-11-05 05:59:22,051 - Iteration 16836 | Train Loss 4.18865 | Training Time: 0021h 42m 30s | Iteration Time: 4627.852 ms | 21241.8 tokens/sec
2024-11-05 06:00:54,586 - Iteration 16856 | Train Loss 4.24084 | Training Time: 0021h 44m 03s | Iteration Time: 4626.731 ms | 21247.0 tokens/sec
2024-11-05 06:02:27,119 - Iteration 16876 | Train Loss 4.16287 | Training Time: 0021h 45m 35s | Iteration Time: 4626.645 ms | 21247.4 tokens/sec
2024-11-05 06:03:59,601 - Iteration 16896 | Train Loss 4.04565 | Training Time: 0021h 47m 08s | Iteration Time: 4624.080 ms | 21259.1 tokens/sec
2024-11-05 06:05:32,080 - Iteration 16916 | Train Loss 4.12473 | Training Time: 0021h 48m 40s | Iteration Time: 4623.966 ms | 21259.7 tokens/sec
2024-11-05 06:07:04,542 - Iteration 16936 | Train Loss 4.00535 | Training Time: 0021h 50m 13s | Iteration Time: 4623.125 ms | 21263.5 tokens/sec
2024-11-05 06:08:36,968 - Iteration 16956 | Train Loss 4.09787 | Training Time: 0021h 51m 45s | Iteration Time: 4621.256 ms | 21272.1 tokens/sec
2024-11-05 06:10:09,388 - Iteration 16976 | Train Loss 3.87384 | Training Time: 0021h 53m 17s | Iteration Time: 4621.035 ms | 21273.2 tokens/sec
2024-11-05 06:11:41,838 - Iteration 16996 | Train Loss 3.98313 | Training Time: 0021h 54m 50s | Iteration Time: 4622.502 ms | 21266.4 tokens/sec
2024-11-05 06:13:14,285 - Iteration 17016 | Train Loss 3.95209 | Training Time: 0021h 56m 22s | Iteration Time: 4622.318 ms | 21267.2 tokens/sec
2024-11-05 06:14:46,691 - Iteration 17036 | Train Loss 4.06933 | Training Time: 0021h 57m 55s | Iteration Time: 4620.313 ms | 21276.5 tokens/sec
2024-11-05 06:16:19,087 - Iteration 17056 | Train Loss 4.14184 | Training Time: 0021h 59m 27s | Iteration Time: 4619.806 ms | 21278.8 tokens/sec
2024-11-05 06:17:51,496 - Iteration 17076 | Train Loss 4.13572 | Training Time: 0022h 01m 00s | Iteration Time: 4620.431 ms | 21275.9 tokens/sec
2024-11-05 06:19:23,893 - Iteration 17096 | Train Loss 4.09828 | Training Time: 0022h 02m 32s | Iteration Time: 4619.891 ms | 21278.4 tokens/sec
2024-11-05 06:20:56,283 - Iteration 17116 | Train Loss 4.16290 | Training Time: 0022h 04m 04s | Iteration Time: 4619.454 ms | 21280.4 tokens/sec
2024-11-05 06:22:28,668 - Iteration 17136 | Train Loss 4.09134 | Training Time: 0022h 05m 37s | Iteration Time: 4619.254 ms | 21281.4 tokens/sec
2024-11-05 06:24:01,088 - Iteration 17156 | Train Loss 4.06790 | Training Time: 0022h 07m 09s | Iteration Time: 4621.001 ms | 21273.3 tokens/sec
2024-11-05 06:25:33,476 - Iteration 17176 | Train Loss 3.99681 | Training Time: 0022h 08m 42s | Iteration Time: 4619.420 ms | 21280.6 tokens/sec
2024-11-05 06:27:05,865 - Iteration 17196 | Train Loss 4.08779 | Training Time: 0022h 10m 14s | Iteration Time: 4619.432 ms | 21280.5 tokens/sec
2024-11-05 06:28:38,243 - Iteration 17216 | Train Loss 3.85394 | Training Time: 0022h 11m 46s | Iteration Time: 4618.897 ms | 21283.0 tokens/sec
2024-11-05 06:30:10,635 - Iteration 17236 | Train Loss 3.87134 | Training Time: 0022h 13m 19s | Iteration Time: 4619.615 ms | 21279.7 tokens/sec
2024-11-05 06:31:43,016 - Iteration 17256 | Train Loss 4.03356 | Training Time: 0022h 14m 51s | Iteration Time: 4619.062 ms | 21282.2 tokens/sec
2024-11-05 06:33:15,403 - Iteration 17276 | Train Loss 4.11391 | Training Time: 0022h 16m 23s | Iteration Time: 4619.348 ms | 21280.9 tokens/sec
2024-11-05 06:34:47,808 - Iteration 17296 | Train Loss 4.11749 | Training Time: 0022h 17m 56s | Iteration Time: 4620.239 ms | 21276.8 tokens/sec
2024-11-05 06:36:20,222 - Iteration 17316 | Train Loss 4.04218 | Training Time: 0022h 19m 28s | Iteration Time: 4620.713 ms | 21274.6 tokens/sec
2024-11-05 06:37:52,605 - Iteration 17336 | Train Loss 4.25511 | Training Time: 0022h 21m 01s | Iteration Time: 4619.145 ms | 21281.9 tokens/sec
2024-11-05 06:39:25,032 - Iteration 17356 | Train Loss 4.15134 | Training Time: 0022h 22m 33s | Iteration Time: 4621.359 ms | 21271.7 tokens/sec
2024-11-05 06:40:57,447 - Iteration 17376 | Train Loss 4.17330 | Training Time: 0022h 24m 06s | Iteration Time: 4620.742 ms | 21274.5 tokens/sec
2024-11-05 06:42:29,833 - Iteration 17396 | Train Loss 4.11392 | Training Time: 0022h 25m 38s | Iteration Time: 4619.293 ms | 21281.2 tokens/sec
2024-11-05 06:44:02,253 - Iteration 17416 | Train Loss 3.98849 | Training Time: 0022h 27m 10s | Iteration Time: 4620.982 ms | 21273.4 tokens/sec
2024-11-05 06:45:34,794 - Iteration 17436 | Train Loss 3.98018 | Training Time: 0022h 28m 43s | Iteration Time: 4627.060 ms | 21245.5 tokens/sec
2024-11-05 06:47:07,286 - Iteration 17456 | Train Loss 4.10755 | Training Time: 0022h 30m 15s | Iteration Time: 4624.622 ms | 21256.7 tokens/sec
2024-11-05 06:48:39,739 - Iteration 17476 | Train Loss 3.86481 | Training Time: 0022h 31m 48s | Iteration Time: 4622.654 ms | 21265.7 tokens/sec
2024-11-05 06:50:12,185 - Iteration 17496 | Train Loss 3.78142 | Training Time: 0022h 33m 20s | Iteration Time: 4622.271 ms | 21267.5 tokens/sec
2024-11-05 06:51:44,601 - Iteration 17516 | Train Loss 3.99621 | Training Time: 0022h 34m 53s | Iteration Time: 4620.798 ms | 21274.2 tokens/sec
2024-11-05 06:53:17,047 - Iteration 17536 | Train Loss 4.06149 | Training Time: 0022h 36m 25s | Iteration Time: 4622.294 ms | 21267.4 tokens/sec
2024-11-05 06:54:49,498 - Iteration 17556 | Train Loss 4.08477 | Training Time: 0022h 37m 58s | Iteration Time: 4622.554 ms | 21266.2 tokens/sec
2024-11-05 06:56:21,941 - Iteration 17576 | Train Loss 4.27962 | Training Time: 0022h 39m 30s | Iteration Time: 4622.167 ms | 21267.9 tokens/sec
2024-11-05 06:57:54,356 - Iteration 17596 | Train Loss 4.09486 | Training Time: 0022h 41m 02s | Iteration Time: 4620.729 ms | 21274.6 tokens/sec
2024-11-05 06:59:26,768 - Iteration 17616 | Train Loss 4.03998 | Training Time: 0022h 42m 35s | Iteration Time: 4620.644 ms | 21275.0 tokens/sec
2024-11-05 07:00:59,182 - Iteration 17636 | Train Loss 4.07648 | Training Time: 0022h 44m 07s | Iteration Time: 4620.670 ms | 21274.8 tokens/sec
2024-11-05 07:02:31,615 - Iteration 17656 | Train Loss 4.05961 | Training Time: 0022h 45m 40s | Iteration Time: 4621.648 ms | 21270.3 tokens/sec
2024-11-05 07:04:04,021 - Iteration 17676 | Train Loss 3.98268 | Training Time: 0022h 47m 12s | Iteration Time: 4620.292 ms | 21276.6 tokens/sec
2024-11-05 07:05:36,424 - Iteration 17696 | Train Loss 4.05312 | Training Time: 0022h 48m 44s | Iteration Time: 4620.180 ms | 21277.1 tokens/sec
2024-11-05 07:07:08,792 - Iteration 17716 | Train Loss 4.02879 | Training Time: 0022h 50m 17s | Iteration Time: 4618.379 ms | 21285.4 tokens/sec
2024-11-05 07:07:45,758 - Iteration 17724 | Train Loss 3.63689 | Training Time: 0022h 50m 54s | Iteration Time: 4620.768 ms | 21274.4 tokens/sec
2024-11-05 07:20:24,761 - Iteration 17744 | Train Loss 4.11181 | Training Time: 0022h 52m 59s | Iteration Time: 7.080 ms | 13884357.0 tokens/sec
2024-11-05 07:21:56,200 - Iteration 17764 | Train Loss 4.17475 | Training Time: 0022h 54m 31s | Iteration Time: 4571.934 ms | 21501.6 tokens/sec
2024-11-05 07:23:27,802 - Iteration 17784 | Train Loss 4.05039 | Training Time: 0022h 56m 02s | Iteration Time: 4580.113 ms | 21463.2 tokens/sec
2024-11-05 07:24:59,499 - Iteration 17804 | Train Loss 3.85771 | Training Time: 0022h 57m 34s | Iteration Time: 4584.827 ms | 21441.2 tokens/sec
2024-11-05 07:26:31,278 - Iteration 17824 | Train Loss 4.05425 | Training Time: 0022h 59m 06s | Iteration Time: 4588.990 ms | 21421.7 tokens/sec
2024-11-05 07:27:35,536 - Iteration 17838 | Train Loss 3.59442 | Training Time: 0023h 00m 10s | Iteration Time: 4589.813 ms | 21417.9 tokens/sec
2024-11-05 07:29:37,298 - Iteration 17864 | Train Loss 3.71731 | Training Time: 0023h 02m 12s | Iteration Time: 4683.172 ms | 20990.9 tokens/sec
2024-11-05 07:31:09,447 - Iteration 17884 | Train Loss 3.82997 | Training Time: 0023h 03m 44s | Iteration Time: 4607.446 ms | 21335.9 tokens/sec
2024-11-05 07:32:41,574 - Iteration 17904 | Train Loss 4.10534 | Training Time: 0023h 05m 16s | Iteration Time: 4606.343 ms | 21341.0 tokens/sec
2024-11-05 07:34:13,650 - Iteration 17924 | Train Loss 3.98514 | Training Time: 0023h 06m 48s | Iteration Time: 4603.790 ms | 21352.8 tokens/sec
2024-11-05 07:35:45,722 - Iteration 17944 | Train Loss 4.11511 | Training Time: 0023h 08m 20s | Iteration Time: 4603.621 ms | 21353.6 tokens/sec
2024-11-05 07:37:17,780 - Iteration 17964 | Train Loss 4.01119 | Training Time: 0023h 09m 52s | Iteration Time: 4602.910 ms | 21356.9 tokens/sec
2024-11-05 07:38:49,840 - Iteration 17984 | Train Loss 4.08821 | Training Time: 0023h 11m 25s | Iteration Time: 4602.976 ms | 21356.6 tokens/sec
2024-11-05 07:40:21,915 - Iteration 18004 | Train Loss 3.93922 | Training Time: 0023h 12m 57s | Iteration Time: 4603.736 ms | 21353.1 tokens/sec
2024-11-05 07:41:54,041 - Iteration 18024 | Train Loss 3.94489 | Training Time: 0023h 14m 29s | Iteration Time: 4606.335 ms | 21341.0 tokens/sec
2024-11-05 07:43:26,199 - Iteration 18044 | Train Loss 3.90495 | Training Time: 0023h 16m 01s | Iteration Time: 4607.884 ms | 21333.9 tokens/sec
2024-11-05 07:44:58,347 - Iteration 18064 | Train Loss 3.95347 | Training Time: 0023h 17m 33s | Iteration Time: 4607.382 ms | 21336.2 tokens/sec
2024-11-05 07:46:30,478 - Iteration 18084 | Train Loss 3.76804 | Training Time: 0023h 19m 05s | Iteration Time: 4606.548 ms | 21340.1 tokens/sec
2024-11-05 07:48:02,594 - Iteration 18104 | Train Loss 3.83696 | Training Time: 0023h 20m 37s | Iteration Time: 4605.823 ms | 21343.4 tokens/sec
2024-11-05 07:49:34,703 - Iteration 18124 | Train Loss 3.86911 | Training Time: 0023h 22m 09s | Iteration Time: 4605.472 ms | 21345.0 tokens/sec
2024-11-05 07:51:06,842 - Iteration 18144 | Train Loss 4.11860 | Training Time: 0023h 23m 42s | Iteration Time: 4606.950 ms | 21338.2 tokens/sec
2024-11-05 07:52:38,994 - Iteration 18164 | Train Loss 4.18225 | Training Time: 0023h 25m 14s | Iteration Time: 4607.572 ms | 21335.3 tokens/sec
2024-11-05 07:54:11,126 - Iteration 18184 | Train Loss 4.00303 | Training Time: 0023h 26m 46s | Iteration Time: 4606.626 ms | 21339.7 tokens/sec
2024-11-05 07:55:43,213 - Iteration 18204 | Train Loss 3.97924 | Training Time: 0023h 28m 18s | Iteration Time: 4604.342 ms | 21350.3 tokens/sec
2024-11-05 07:57:15,310 - Iteration 18224 | Train Loss 3.97760 | Training Time: 0023h 29m 50s | Iteration Time: 4604.818 ms | 21348.1 tokens/sec
2024-11-05 07:58:47,406 - Iteration 18244 | Train Loss 4.07537 | Training Time: 0023h 31m 22s | Iteration Time: 4604.837 ms | 21348.0 tokens/sec
2024-11-05 08:00:19,522 - Iteration 18264 | Train Loss 3.97310 | Training Time: 0023h 32m 54s | Iteration Time: 4605.785 ms | 21343.6 tokens/sec
2024-11-05 08:01:51,608 - Iteration 18284 | Train Loss 3.92966 | Training Time: 0023h 34m 26s | Iteration Time: 4604.310 ms | 21350.4 tokens/sec
2024-11-05 08:03:23,691 - Iteration 18304 | Train Loss 3.88823 | Training Time: 0023h 35m 58s | Iteration Time: 4604.118 ms | 21351.3 tokens/sec
2024-11-05 08:04:55,756 - Iteration 18324 | Train Loss 3.99268 | Training Time: 0023h 37m 30s | Iteration Time: 4603.253 ms | 21355.3 tokens/sec
2024-11-05 08:06:27,825 - Iteration 18344 | Train Loss 3.77134 | Training Time: 0023h 39m 03s | Iteration Time: 4603.456 ms | 21354.4 tokens/sec
2024-11-05 08:07:59,901 - Iteration 18364 | Train Loss 3.74943 | Training Time: 0023h 40m 35s | Iteration Time: 4603.788 ms | 21352.9 tokens/sec
2024-11-05 08:09:31,986 - Iteration 18384 | Train Loss 3.78650 | Training Time: 0023h 42m 07s | Iteration Time: 4604.264 ms | 21350.6 tokens/sec
2024-11-05 08:11:04,055 - Iteration 18404 | Train Loss 3.90416 | Training Time: 0023h 43m 39s | Iteration Time: 4603.442 ms | 21354.5 tokens/sec
2024-11-05 08:12:36,152 - Iteration 18424 | Train Loss 4.02696 | Training Time: 0023h 45m 11s | Iteration Time: 4604.858 ms | 21347.9 tokens/sec
2024-11-05 08:14:08,254 - Iteration 18444 | Train Loss 4.07418 | Training Time: 0023h 46m 43s | Iteration Time: 4605.109 ms | 21346.7 tokens/sec
2024-11-05 08:15:40,355 - Iteration 18464 | Train Loss 4.05785 | Training Time: 0023h 48m 15s | Iteration Time: 4605.028 ms | 21347.1 tokens/sec
2024-11-05 08:17:12,449 - Iteration 18484 | Train Loss 4.03317 | Training Time: 0023h 49m 47s | Iteration Time: 4604.738 ms | 21348.4 tokens/sec
2024-11-05 08:18:44,540 - Iteration 18504 | Train Loss 3.95439 | Training Time: 0023h 51m 19s | Iteration Time: 4604.524 ms | 21349.4 tokens/sec
2024-11-05 08:20:16,645 - Iteration 18524 | Train Loss 3.96452 | Training Time: 0023h 52m 51s | Iteration Time: 4605.273 ms | 21346.0 tokens/sec
2024-11-05 08:21:48,766 - Iteration 18544 | Train Loss 4.12465 | Training Time: 0023h 54m 23s | Iteration Time: 4606.058 ms | 21342.3 tokens/sec
2024-11-05 08:23:20,884 - Iteration 18564 | Train Loss 3.89874 | Training Time: 0023h 55m 56s | Iteration Time: 4605.862 ms | 21343.2 tokens/sec
2024-11-05 08:24:53,004 - Iteration 18584 | Train Loss 3.87629 | Training Time: 0023h 57m 28s | Iteration Time: 4606.020 ms | 21342.5 tokens/sec
2024-11-05 08:25:16,029 - Iteration 18589 | Train Loss 3.32134 | Training Time: 0023h 57m 51s | Iteration Time: 4605.047 ms | 21347.0 tokens/sec
2024-11-05 08:27:59,465 - Iteration 18624 | Train Loss 3.85846 | Training Time: 0024h 00m 34s | Iteration Time: 4669.600 ms | 21051.9 tokens/sec
2024-11-05 08:29:31,578 - Iteration 18644 | Train Loss 3.81684 | Training Time: 0024h 02m 06s | Iteration Time: 4605.638 ms | 21344.3 tokens/sec
2024-11-05 08:31:03,663 - Iteration 18664 | Train Loss 4.05455 | Training Time: 0024h 03m 38s | Iteration Time: 4604.252 ms | 21350.7 tokens/sec
2024-11-05 08:32:35,747 - Iteration 18684 | Train Loss 4.01129 | Training Time: 0024h 05m 10s | Iteration Time: 4604.204 ms | 21350.9 tokens/sec
2024-11-05 08:34:07,871 - Iteration 18704 | Train Loss 4.05370 | Training Time: 0024h 06m 43s | Iteration Time: 4606.182 ms | 21341.8 tokens/sec
2024-11-05 08:35:39,969 - Iteration 18724 | Train Loss 4.11974 | Training Time: 0024h 08m 15s | Iteration Time: 4604.921 ms | 21347.6 tokens/sec
2024-11-05 08:37:12,091 - Iteration 18744 | Train Loss 4.07326 | Training Time: 0024h 09m 47s | Iteration Time: 4606.092 ms | 21342.2 tokens/sec
2024-11-05 08:38:44,228 - Iteration 18764 | Train Loss 4.00503 | Training Time: 0024h 11m 19s | Iteration Time: 4606.830 ms | 21338.8 tokens/sec
2024-11-05 08:40:16,345 - Iteration 18784 | Train Loss 3.99119 | Training Time: 0024h 12m 51s | Iteration Time: 4605.877 ms | 21343.2 tokens/sec
2024-11-05 08:41:48,438 - Iteration 18804 | Train Loss 4.01101 | Training Time: 0024h 14m 23s | Iteration Time: 4604.655 ms | 21348.8 tokens/sec
2024-11-05 08:43:20,544 - Iteration 18824 | Train Loss 4.06077 | Training Time: 0024h 15m 55s | Iteration Time: 4605.289 ms | 21345.9 tokens/sec
2024-11-05 08:44:52,671 - Iteration 18844 | Train Loss 3.84400 | Training Time: 0024h 17m 27s | Iteration Time: 4606.354 ms | 21341.0 tokens/sec
2024-11-05 08:46:24,780 - Iteration 18864 | Train Loss 3.80663 | Training Time: 0024h 18m 59s | Iteration Time: 4605.456 ms | 21345.1 tokens/sec
2024-11-05 08:47:56,860 - Iteration 18884 | Train Loss 3.92446 | Training Time: 0024h 20m 32s | Iteration Time: 4603.993 ms | 21351.9 tokens/sec
2024-11-05 08:49:28,922 - Iteration 18904 | Train Loss 3.86998 | Training Time: 0024h 22m 04s | Iteration Time: 4603.112 ms | 21356.0 tokens/sec
2024-11-05 08:51:01,024 - Iteration 18924 | Train Loss 4.09169 | Training Time: 0024h 23m 36s | Iteration Time: 4605.083 ms | 21346.8 tokens/sec
2024-11-05 08:52:33,112 - Iteration 18944 | Train Loss 4.08322 | Training Time: 0024h 25m 08s | Iteration Time: 4604.401 ms | 21350.0 tokens/sec
2024-11-05 08:54:05,204 - Iteration 18964 | Train Loss 4.10643 | Training Time: 0024h 26m 40s | Iteration Time: 4604.603 ms | 21349.1 tokens/sec
2024-11-05 08:55:37,323 - Iteration 18984 | Train Loss 4.13052 | Training Time: 0024h 28m 12s | Iteration Time: 4605.937 ms | 21342.9 tokens/sec
2024-11-05 08:57:09,400 - Iteration 19004 | Train Loss 4.12418 | Training Time: 0024h 29m 44s | Iteration Time: 4603.861 ms | 21352.5 tokens/sec
2024-11-05 08:58:41,500 - Iteration 19024 | Train Loss 3.96564 | Training Time: 0024h 31m 16s | Iteration Time: 4604.972 ms | 21347.4 tokens/sec
2024-11-05 09:00:13,627 - Iteration 19044 | Train Loss 4.02525 | Training Time: 0024h 32m 48s | Iteration Time: 4606.379 ms | 21340.8 tokens/sec
2024-11-05 09:01:45,730 - Iteration 19064 | Train Loss 4.03526 | Training Time: 0024h 34m 20s | Iteration Time: 4605.122 ms | 21346.7 tokens/sec
2024-11-05 09:03:17,856 - Iteration 19084 | Train Loss 4.00020 | Training Time: 0024h 35m 53s | Iteration Time: 4606.302 ms | 21341.2 tokens/sec
2024-11-05 09:04:49,950 - Iteration 19104 | Train Loss 3.81722 | Training Time: 0024h 37m 25s | Iteration Time: 4604.716 ms | 21348.5 tokens/sec
2024-11-05 09:06:22,034 - Iteration 19124 | Train Loss 3.85388 | Training Time: 0024h 38m 57s | Iteration Time: 4604.198 ms | 21350.9 tokens/sec
2024-11-05 09:07:54,148 - Iteration 19144 | Train Loss 3.81628 | Training Time: 0024h 40m 29s | Iteration Time: 4605.716 ms | 21343.9 tokens/sec
2024-11-05 09:09:26,249 - Iteration 19164 | Train Loss 4.08026 | Training Time: 0024h 42m 01s | Iteration Time: 4605.024 ms | 21347.1 tokens/sec
2024-11-05 09:10:58,335 - Iteration 19184 | Train Loss 4.03060 | Training Time: 0024h 43m 33s | Iteration Time: 4604.314 ms | 21350.4 tokens/sec
2024-11-05 09:12:30,435 - Iteration 19204 | Train Loss 4.12212 | Training Time: 0024h 45m 05s | Iteration Time: 4604.980 ms | 21347.3 tokens/sec
2024-11-05 09:14:02,516 - Iteration 19224 | Train Loss 4.15655 | Training Time: 0024h 46m 37s | Iteration Time: 4604.048 ms | 21351.6 tokens/sec
2024-11-05 09:15:34,598 - Iteration 19244 | Train Loss 4.08875 | Training Time: 0024h 48m 09s | Iteration Time: 4604.120 ms | 21351.3 tokens/sec
2024-11-05 09:17:06,735 - Iteration 19264 | Train Loss 3.99948 | Training Time: 0024h 49m 41s | Iteration Time: 4606.872 ms | 21338.6 tokens/sec
2024-11-05 09:18:38,848 - Iteration 19284 | Train Loss 4.14245 | Training Time: 0024h 51m 14s | Iteration Time: 4605.618 ms | 21344.4 tokens/sec
2024-11-05 09:20:10,954 - Iteration 19304 | Train Loss 3.92762 | Training Time: 0024h 52m 46s | Iteration Time: 4605.304 ms | 21345.8 tokens/sec
2024-11-05 09:21:43,055 - Iteration 19324 | Train Loss 3.99990 | Training Time: 0024h 54m 18s | Iteration Time: 4605.060 ms | 21347.0 tokens/sec
2024-11-05 09:23:15,173 - Iteration 19344 | Train Loss 4.01311 | Training Time: 0024h 55m 50s | Iteration Time: 4605.874 ms | 21343.2 tokens/sec
2024-11-05 09:24:47,285 - Iteration 19364 | Train Loss 3.86886 | Training Time: 0024h 57m 22s | Iteration Time: 4605.631 ms | 21344.3 tokens/sec
2024-11-05 09:26:19,430 - Iteration 19384 | Train Loss 3.69892 | Training Time: 0024h 58m 54s | Iteration Time: 4607.227 ms | 21336.9 tokens/sec
2024-11-05 09:27:51,539 - Iteration 19404 | Train Loss 3.95059 | Training Time: 0025h 00m 26s | Iteration Time: 4605.482 ms | 21345.0 tokens/sec
2024-11-05 09:29:23,656 - Iteration 19424 | Train Loss 3.80684 | Training Time: 0025h 01m 58s | Iteration Time: 4605.820 ms | 21343.4 tokens/sec
2024-11-05 09:30:55,785 - Iteration 19444 | Train Loss 4.04671 | Training Time: 0025h 03m 30s | Iteration Time: 4606.487 ms | 21340.3 tokens/sec
2024-11-05 09:32:27,906 - Iteration 19464 | Train Loss 4.08932 | Training Time: 0025h 05m 03s | Iteration Time: 4606.038 ms | 21342.4 tokens/sec
2024-11-05 09:34:00,026 - Iteration 19484 | Train Loss 3.91540 | Training Time: 0025h 06m 35s | Iteration Time: 4605.964 ms | 21342.8 tokens/sec
2024-11-05 09:35:32,154 - Iteration 19504 | Train Loss 4.06794 | Training Time: 0025h 08m 07s | Iteration Time: 4606.403 ms | 21340.7 tokens/sec
2024-11-05 09:37:04,262 - Iteration 19524 | Train Loss 4.13037 | Training Time: 0025h 09m 39s | Iteration Time: 4605.437 ms | 21345.2 tokens/sec
2024-11-05 09:38:36,359 - Iteration 19544 | Train Loss 4.00044 | Training Time: 0025h 11m 11s | Iteration Time: 4604.855 ms | 21347.9 tokens/sec
2024-11-05 09:40:08,493 - Iteration 19564 | Train Loss 4.00879 | Training Time: 0025h 12m 43s | Iteration Time: 4606.659 ms | 21339.5 tokens/sec
2024-11-05 09:41:40,591 - Iteration 19584 | Train Loss 3.94736 | Training Time: 0025h 14m 15s | Iteration Time: 4604.919 ms | 21347.6 tokens/sec
2024-11-05 09:43:12,677 - Iteration 19604 | Train Loss 3.91909 | Training Time: 0025h 15m 47s | Iteration Time: 4604.294 ms | 21350.5 tokens/sec
2024-11-05 09:44:44,770 - Iteration 19624 | Train Loss 3.80856 | Training Time: 0025h 17m 19s | Iteration Time: 4604.676 ms | 21348.7 tokens/sec
2024-11-05 09:46:16,896 - Iteration 19644 | Train Loss 3.89146 | Training Time: 0025h 18m 52s | Iteration Time: 4606.268 ms | 21341.4 tokens/sec
2024-11-05 09:47:49,039 - Iteration 19664 | Train Loss 3.84442 | Training Time: 0025h 20m 24s | Iteration Time: 4607.159 ms | 21337.2 tokens/sec
2024-11-05 09:49:21,205 - Iteration 19684 | Train Loss 4.01895 | Training Time: 0025h 21m 56s | Iteration Time: 4608.323 ms | 21331.8 tokens/sec
2024-11-05 09:50:53,384 - Iteration 19704 | Train Loss 4.05747 | Training Time: 0025h 23m 28s | Iteration Time: 4608.931 ms | 21329.0 tokens/sec
2024-11-05 09:52:25,515 - Iteration 19724 | Train Loss 4.14750 | Training Time: 0025h 25m 00s | Iteration Time: 4606.538 ms | 21340.1 tokens/sec