Flagging and recovering from parallel rsync failures?
Overview
The approach of using parallel rsync to transfer large datasets from NCI to CSIRO has yielded speeds at least an order of magnitude greater than previous experience.
--results /datastore/d/dcfp/logs/ saves a directory structure of log files according to the GNU parallel docs here: https://www.gnu.org/software/parallel/
cd /datastore/d/dcfp/logs/1
/datastore/d/dcfp/logs/1/f6.WIP.c5-d60-pX-f6-20111101.top_level.20200831_165650.tar> ls
seq stderr stdout
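Detection can be made less manual by scanning that results tree directly. The sketch below (our suggestion, not part of the original workflow) flags any stream whose stderr file is non-empty; a non-empty stderr is a coarse signal, since warnings land there too, but it narrows the search. It assumes the layout shown above, where each stream gets its own seq/stdout/stderr files.

```shell
# Sketch: flag every GNU parallel result directory whose stderr file is
# non-empty. Assumed layout (as shown above):
#   <results-dir>/1/<input-value>/{seq,stdout,stderr}
flag_failed_streams() {
  # $1: the directory that was passed to parallel's --results option
  find "$1" -type f -name stderr -size +0c -exec dirname {} \;
}

# Usage (path from the transfer in this issue):
#   flag_failed_streams /datastore/d/dcfp/logs
```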
How do we know there's been a failure?
We happen to see it in the command line output (this is not robust):
f+++++++++ f6.WIP.c5-d60-pX-f6-20121101.mem079.20200831_153624.tar
120,233,226,240 100% 41.75MB/s 0:45:46 (xfr#1, to-chk=0/1)
Connection closed by 192.43.239.112
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(228) [Receiver=3.2.3]
Connection closed by 192.43.239.112
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(228) [Receiver=3.2.3]
receiving incremental file list
f+++++++++ f6.WIP.c5-d60-pX-f6-20121101.mem082.20200831_153624.tar
120,198,225,920 100% 30.29MB/s 1:03:04 (xfr#1, to-chk=0/1)
receiving incremental file list
After the fact we compare to source filelist and there are differences:
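That after-the-fact comparison can be scripted. A minimal sketch (the helper name is ours; paths are illustrative) that reports files named in the source file list but absent from the destination:

```shell
# Sketch: print expected files that never arrived at the destination.
missing_files() {
  # $1: source file list (one filename per line), $2: destination directory
  sort "$1" > "${TMPDIR:-/tmp}/expected.$$"
  ls "$2" | sort > "${TMPDIR:-/tmp}/received.$$"
  # comm -23: lines only in the first (sorted) input, i.e. missing files
  comm -23 "${TMPDIR:-/tmp}/expected.$$" "${TMPDIR:-/tmp}/received.$$"
  rm -f "${TMPDIR:-/tmp}/expected.$$" "${TMPDIR:-/tmp}/received.$$"
}

# Usage (paths from the transfer command in this issue):
#   missing_files /datastore/d/dcfp/NCI_file_lists/cut_f6_2012_filelist.txt \
#                 /datastore/d/dcfp/CAFE/forecasts/f6/
```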
However, the method relies on numerous streams, each with its own rsync, and any of them can fail. How can we confidently alert the user to these failures and then recover from them, with the ongoing transfer to tape in mind for this use case?
Example: a 97 file, 11TB transfer with failures
command:
time cat /datastore/d/dcfp/NCI_file_lists/cut_f6_2012_filelist.txt | parallel -j 10 --results /datastore/d/dcfp/logs/ 'rsync -ailPW --log-file="/datastore/d/dcfp/logs/f6_2012_rsync.log.$(date +%Y%m%d%H%M%S)" -e "ssh -T -c aes128-ctr" [email protected]:/scratch/v14/$USER/tar_tmp/f6.WIP.c5-d60-pX-f6-20121101.20200831_153624/{} /datastore/d/dcfp/CAFE/forecasts/f6/'
We capture it in one of the many stdout files:
NB: note above that mem052 and mem063 don't appear in the
grep -rnw '/datastore/d/dcfp/logs/' -e 'error'
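A more robust signal than grepping the logs may be GNU parallel's --joblog option (not used in the command above, so this is a suggested addition): it writes one tab-separated row per job, including an Exitval column, and parallel --retry-failed --joblog <file> can then rerun only the jobs that failed. A sketch for pulling the failed commands out of a joblog:

```shell
# Sketch: print the command of every job whose Exitval was non-zero in a
# GNU parallel --joblog file. Tab-separated columns, one header row:
#   Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
failed_jobs() {
  awk -F '\t' 'NR > 1 && $7 != 0 { print $9 }' "$1"
}

# Usage: failed_jobs transfer.joblog
# Then rerun only those jobs with: parallel --retry-failed --joblog transfer.joblog
```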
ToDo:
- … rsync task confidently?
- rsync assesses files that have already been moved to tape