%auto-ignore
\section{\label{sect:online}Online Systems}
The IceCube online systems comprise both the software and hardware at the
detector site responsible for data acquisition, event selection,
monitoring, and data storage and movement. As one of the goals of IceCube
operations is to maximize the fraction of time the detector is sensitive to
neutrino interactions (``uptime''), the online systems are modular so that
failures in one particular component do not necessarily prevent the
continuation of basic data acquisition. Additionally, all systems are
monitored with a combination of custom-designed and industry-standard tools
so that detector operators can be alerted in case of abnormal conditions.
\subsection{\label{sect:online:dataflow}Data Flow Overview}
The online data flow consists of a number of steps of data reduction and
selection in the progression from photon detection in the ice to
candidate physics event selection, along with associated secondary
monitoring and data streams. An overview of the data flow is shown in
figure~\ref{fig:online_dataflow}.
\begin{figure}[!ht]
\centering
\includegraphics[width=0.6\textwidth]{graphics/online/online_dataflow.pdf}
\caption{Data flow in the primary IceCube online systems. See details
on each component in the text.}
\label{fig:online_dataflow}
\end{figure}
DOM hits are mostly due to dark noise. The first step
in data reduction takes place in the DOM, using the Local Coincidence (LC) condition described in
section~\ref{sec:dom_functional}. Hits that meet the LC criteria are flagged
as Hard Local Coincidence (HLC hits) and include a full payload of
digitized waveforms, while isolated non-LC hits are flagged as Soft Local
Coincidence (SLC) hits and are compressed more aggressively, with only a
timestamp and minimal amplitude / charge information transmitted
(section~\ref{sect:online:payloads}).
All DOM hits are read out to dedicated computers on the surface
(section~\ref{sect:sps}) by the data acquisition system (DAQ). The next level
of data selection is the formation of software triggers by the DAQ
system. HLC hits across the detector are examined
for temporal and in some cases spatial patterns that suggest a common
causal relationship. A number of different trigger algorithms run in
parallel, described in section~\ref{sect:online:trigger}. All hits (both HLC
and SLC) within a window around the trigger are combined into events, the
fundamental output of the DAQ. The event rate varies
seasonally with the atmospheric muon flux~\cite{ICECUBE:IceTop} from 2.5
kHz to 2.9 kHz, with a median rate of 2.7 kHz, and the total DAQ data rate is
approximately 1~TB/day.
The DAQ also produces secondary streams that include time calibration,
monitoring, and DOM scaler data. The scaler data report the
hit rate of each DOM and are used in the supernova data
acquisition system (section~\ref{sect:SNDAQ}). The time calibration and
monitoring streams are used to monitor the health and quality of the
data-taking runs.
The DAQ event data are then processed further with approximately 25 filters
in order to select a subset of events (about 15\%) to transfer over
satellite to the Northern Hemisphere (section~\ref{sect:online:filter}). Each
filter, typically designed to select events useful for a particular physics
analysis, is run over all events using a computing cluster in the ICL.
Because of limitations both on total computing power and bounds on the
processing time of each event, only fast directional and energy
reconstructions are used. This Processing and Filtering (PnF) system is
also responsible for applying up-to-date calibration constants to the DAQ
data. All processed events, even those not selected by the online filters,
are archived locally.
A dedicated system for data movement, JADE, handles the local archival storage to
tape or disk and the handoff of satellite data
(section~\ref{sect:online_jade}). This includes not only primary data streams
but also monitoring data, calibration runs, and other data streams.
Low-latency communications for experiment control and real-time monitoring
are provided by the IceCube Messaging System (I3MS).
Experiment control and detector monitoring are handled by the IceCube Live
software system, described in section~\ref{sec:online:icecubelive}.
Data-taking runs are arbitrarily divided into 8-hour periods and assigned
a unique run number; data acquisition need not actually pause during
the run transition. Detector configuration parameters that affect physics
analyses are changed at most once per year (typically in May), indicating
the start of a new ``physics run''.
\subsection{\label{sect:sps}South Pole System and South Pole Test System}
The South Pole System (SPS) comprises 19 racks of computing and network
hardware that run the various online systems described in this
section. The DOM surface cables are connected via passive patch panels to
custom 4U computers, called DOMHubs, one DOMHub per in-ice string and 11 additional
hubs for IceTop. The remaining servers, including those for higher-level
data acquisition, event filtering, detector monitoring, and core
infrastructure, are currently 2U Dell PowerEdge R720 servers running
Scientific Linux (Table \ref{tab:sps_breakdown}). The servers are
typically upgraded every three to four years. The custom hardware components in
the DOMHubs are replaced with spares as failures warrant, and the disks and
single-board computers were upgraded in 2013--14.
\begin{table}[h]
\centering
\caption{Breakdown of computing equipment at SPS, indicating number of
machines used for each task.}
\begin{tabular}{ r c }
\hline
Component & \# \\
\hline
DOMHubs & 97 \\
Other data acquisition & 4 \\
Monitoring & 3 \\
Event filtering & 24 \\
System infrastructure & 8 \\
Other & 6 \\
\hline
\end{tabular}
\label{tab:sps_breakdown}
\end{table}
Each of the DOMHubs is an industrial computer chassis with custom components for DOM
power, timing, and communication. A low-power single-board computer
communicates with 8 custom PCI DOM Readout (DOR) cards via an
industry-standard backplane. An ATX power supply with two
redundant modules powers the DOMHub, while two
48~VDC Acopian power supplies, mounted and connected in series inside the
chassis to double the voltage to 96~V, supply power to the DOMs. The DOM power is switched and
monitored by the DOR cards and is controlled by software. Another PCI
card, the DOMHub Service Board (DSB), is responsible for GPS timing fanout
(section~\ref{sect:online:master_clock}).
The SPS computers are connected via switches in each rack that provide
redundant connections to a 10--20 Gbps network backbone. The DOMHubs are
connected to the rack switches with two bonded 1 Gbps links. Typical network I/O during
data-taking for the DAQ Event Builder (section~\ref{sect:online:evbuilder}) is about 240~Mbps in each direction.
The PnF Central Server sees 200 Mbps in and 640 Mbps out; the output
stream is significantly higher than the input as PnF distributes the
events to filtering clients and generates multiple output streams
(section~\ref{sect:online:filter}).
Redundancy and continuous monitoring of SPS is one of the keys to a high
detector livetime (section~\ref{sec:operational_performance}).
Nagios monitoring software detects and flags problems,
including issues with DOM power and communication on the DOMHubs. Severe
problems impacting data-taking result in a page to the IceCube winterover
personnel via the station's Land Mobile Radio (LMR) system. A dedicated
powered spare server can replace any failed DAQ node, and spare PnF filtering
clients can be started to increase throughput in case of a data filtering
backlog. SPS hardware is also connected to uninterruptible
power supplies (UPS) in order to continue data-taking through station power
outages of up to 15 minutes.
Total power usage of the detector and online systems, including computing servers, is
approximately 53 kW. The majority is consumed by the DOMs, with an
average power consumption of 5.7~W each, including power supply efficiency
and transmission losses, for a total of 30.6 kW. The DOMHubs require 128~W
each, not including the DOM power, for a total of 12.4 kW. The
computing servers consume approximately 200--300~W each, depending on
configuration. Most of the remaining power is used by the PnF
Filter Clients: 20 servers at 300~W each, for a total of 6~kW.
A scaled-down version of SPS, the South Pole Test System (SPTS) located in
Madison, Wisconsin, U.S.A., allows testing and validation of both hardware
and software in the Northern Hemisphere before rollout to SPS. Servers and DOMHubs
identical to those at SPS, along with a small number of DOMs in chest
freezers, are used in the test system. Although the number of DOMs
available is much smaller than in the real detector, recent software
improvements allow the ``replay'' of pre-recorded raw SPS hit data
on SPTS systems, providing a data stream to higher-level DAQ and PnF
components identical to SPS. Another test system includes a full-length
in-ice cable and is used primarily for validation of DOM communications and
timing.
\subsection{Data Readout and Timing}
While the low-level communications and timing systems of the DOM are
described in detail in ref.~\cite{ICECUBE:DAQ}, we review them here in
the broader context of the online systems.
\subsubsection{\label{sect:online:comms}Communications}
Digital communication between the DOR card and DOM occurs via copper
twisted pairs, with two DOMs per pair on the in-ice cable, and one IceTop
DOM per pair for increased bandwidth (IceTop hit rates can exceed 3 kHz).
The physical layer signaling uses on-off keying with bipolar pulses. The
protocol is a custom
packet-based scheme. Each packet is assigned a sequence number, and all
received packets are acknowledged if the sequence number is correct. Each
packet also contains a cyclic redundancy checksum to detect transmission errors.
Out-of-sequence packets received are ignored, and non-acknowledged packets
are retransmitted. The total bandwidth of the communication channel
is 720 kbps per twisted pair.
Messaging is managed from the surface, in that the DOR requests data from
each DOM in turn; only one DOM per pair can transmit at a time. Communication is
paused once per second to perform a timing calibration (RAPCal; section~\ref{sect:dom:rapcal}); this enables translation of the DOM clock to the DOR
clock for every DOM.
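
As a schematic illustration of this request--acknowledge scheme, the following
Python sketch shows the basic bookkeeping with sequence numbers, checksums,
acknowledgments, and retransmission; the \texttt{link} object, the framing, and
the function names are hypothetical, and the actual protocol is implemented in
the DOR firmware and DOM software.
\begin{verbatim}
import zlib

def send_packet(link, seqno, payload, max_retries=3):
    # Frame = 2-byte sequence number + payload + 4-byte CRC (illustrative).
    frame = seqno.to_bytes(2, "big") + payload
    frame += zlib.crc32(frame).to_bytes(4, "big")
    for _ in range(max_retries):
        link.write(frame)
        ack = link.read_ack()        # blocks until an ACK or timeout (None)
        if ack == seqno:
            return True              # acknowledged with the correct number
    return False                     # retransmissions exhausted

def receive_packet(link, expected_seqno):
    frame = link.read_frame()
    data, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(data) != crc:
        return None                  # transmission error: no ACK is sent
    seqno = int.from_bytes(data[:2], "big")
    if seqno != expected_seqno:
        return None                  # out-of-sequence packets are ignored
    link.send_ack(seqno)
    return data[2:]                  # payload delivered to higher layers
\end{verbatim}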
\subsubsection{\label{sect:online:master_clock}Master Clock System}
The DOR clocks themselves are synchronized to UTC via an active fanout
system from a single Symmetricom ET6000 GPS receiver with a
temperature-stabilized 10 MHz oscillator, also known as the Master
Clock. The fanout tree is shown in
figure~\ref{fig:clock_fanout}. The 10 MHz output, a 1 Hz output, and a
serial time string indicating
the UTC date and time are distributed to the DOMHubs via a series of
fanouts, using shielded, delay-matched twisted-pair cables. Within the
DOMHub, the DSB card continues the fanout via short delay-matched patch
cables to each DOR card. The local 20 MHz clocks of each DOR card are
phase-locked to the distributed 10 MHz signal.
To avoid the Master Clock being a single-point failure for the detector, a
hot spare receiver, using its own GPS antenna and powered through a
separate UPS, is continuously active and satellite-locked in case of
problems with the primary.
\begin{figure}[!ht]
\centering
\includegraphics[width=0.8\textwidth]{graphics/online/data_readout/clock_fanout.pdf}
\caption{Master Clock fanout system, from GPS receiver to DOR cards in
each DOMHub.}
\label{fig:clock_fanout}
\end{figure}
\subsubsection{DOR Card and Driver}
Each DOR card is connected to up to 8 DOMs, with 8 DOR cards in a
DOMHub. The DOR card controls the 96~VDC power supply to the DOMs and
modulates the communications signaling on top of this DC level; the DOMs
can accept input voltages from 40 V to 120 V.
Dedicated circuitry monitors the current draw and voltage levels on each
twisted pair. A ``firmware fuse'' can disable power if the current draw
deviates from programmable maximum or minimum levels, and this mechanism is
supplemented with standard physical fuses.
The software interface to the DOR card, and thus to the DOMs, is provided
with a custom Linux device driver. Access to DOR card functions, including
DOM power control, communication statistics, RAPCal, and current / voltage
monitoring, is facilitated using the Linux \texttt{/proc} filesystem
interface. Data transfer from the cards to the single-board computer is
achieved via DMA over the PCI bus. The driver
provides a device file for each DOM for read/write access by higher-level software.
\subsubsection{\label{sect:online:payloads}DOM Hit Payloads}
The content of the DOM hit payloads transmitted to the surface depends on whether local
coincidence was satisfied, i.e. whether the hit was flagged as HLC or SLC.
The DOM Main Board ID and the timestamp of the hit in DOM clock counts are
always transmitted, along with trigger and LC flags.
For HLC hits, the digitized ATWD and fADC waveforms are transmitted.
Waveforms from lower-gain ATWD channels are only included if the signal
amplitude in the higher-gain channel exceeds 75\% of the digitizer range. The waveforms are
compressed losslessly in the DOM using a delta-compression algorithm that
encodes the difference between subsequent samples. The difference values
are packed into words of length 1, 2, 3, 6, or 11 bits depending on
magnitude, and special values in the bitstream are used to transition
between different word lengths.
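
The idea of the encoding is illustrated by the following Python sketch, which
tags each difference with the smallest signed word length that can hold it; the
in-band markers used by the DOM to switch word lengths and the handling of
1-bit words are omitted.
\begin{verbatim}
def delta_encode(samples, lengths=(2, 3, 6, 11)):
    """Encode differences between successive samples, each tagged with the
    smallest signed word length that can represent it (illustrative only)."""
    encoded, previous = [], 0
    for sample in samples:
        diff = sample - previous
        for nbits in lengths:
            if -(1 << (nbits - 1)) <= diff < (1 << (nbits - 1)):
                encoded.append((nbits, diff))
                break
        previous = sample
    return encoded

def delta_decode(encoded):
    """Invert the encoding by accumulating the stored differences."""
    samples, value = [], 0
    for _, diff in encoded:
        value += diff
        samples.append(value)
    return samples
\end{verbatim}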
For both HLC and SLC hits, a chargestamp is included that provides an
estimate of the amplitude/charge even if, as in the SLC case, the full
waveform is not transmitted. For in-ice DOMs, the chargestamp consists of
three samples of the fADC waveform centered around the peak value, along
with the peak sample number. For IceTop DOMs, the chargestamp is the sum
of all samples of the ATWD waveform, after pedestal subtraction.
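
Schematically, and ignoring pedestal handling and digitizer-range details, the
two chargestamp variants reduce to the following sketch:
\begin{verbatim}
def inice_chargestamp(fadc):
    """Three fADC samples centered on the peak, plus the peak sample number."""
    peak = max(range(len(fadc)), key=lambda i: fadc[i])
    lo = min(max(peak - 1, 0), len(fadc) - 3)
    return peak, fadc[lo:lo + 3]

def icetop_chargestamp(atwd, pedestal):
    """Sum of all ATWD samples after pedestal subtraction."""
    return sum(s - p for s, p in zip(atwd, pedestal))
\end{verbatim}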
\subsection{Data Acquisition Software}
IceCube's data acquisition (DAQ) system is a set of software components
running on the DOMHubs and dedicated servers in the ICL. These components are shown in
figure~\ref{fig:online_dataflow} and include StringHub, Trigger, Event
Builder, Secondary Builder, and a Command and Control server. The DAQ is
responsible for detecting patterns of hits in the detector likely to be
caused by particle interactions and storing these collections of hits as
events.
Hits are read continuously from the DOMs by the
StringHub components running on each DOMHub, and a minimal representation of each HLC hit is
forwarded to the Trigger components (either the in-ice or the IceTop Trigger).
The Trigger components apply a
configurable set of algorithms to the hit stream and form windows around interesting temporal
and/or spatial patterns. These time windows are collected by the
Global Trigger and used to form non-overlapping trigger requests by merging
subtriggers as needed, ensuring that the same hit doesn't appear in
multiple events. The merged trigger requests are used by the Event Builder
component as templates
to gather the complete hit data from each StringHub and assemble the final
events.
\subsubsection{StringHub and HitSpool}
\label{sec:domhub_hitspool}
The StringHub software component that runs on each DOMHub is responsible
for reading all available data from each of its connected DOMs each second
and passing that data onto the downstream consumers. It also saves all
hits to a local ``HitSpool'' on-disk cache and queues them in an
in-memory cache to service future requests from the Event Builder for full
waveform data.
The StringHub component is divided into two logical pieces: the front
end is called Omicron, and the back end is the Sender. Omicron controls all
of the connected DOMs, forwarding any
non-physics data (calibration, monitoring) to its downstream consumers and
sorting the hits from all
DOMs into a single time-ordered stream before passing them to the Sender.
Omicron is also responsible for translating DOM hit times into
UTC-compatible ``DAQ time'', which counts the number of 0.1-ns periods
since the UTC start of the year (including leap seconds). The translation
uses the RAPCal procedure as described in section~\ref{sect:dom:rapcal},
performed for each DOM every second.
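
Assuming RAPCal yields a linear relation between the 40~MHz DOM clock and UTC,
the translation can be sketched as follows (the constants and function are
illustrative only):
\begin{verbatim}
# DAQ time counts 0.1-ns periods since the UTC start of the year, so one
# 40 MHz DOM clock tick corresponds to 25 ns = 250 such periods.
TENTHS_NS_PER_DOM_TICK = 250

def dom_to_daq_time(dom_clock, offset_tenths_ns, frequency_ratio):
    """Illustrative linear translation of a DOM clock count to DAQ time;
    the offset and frequency ratio are assumed to come from RAPCal."""
    return int(offset_tenths_ns
               + frequency_ratio * dom_clock * TENTHS_NS_PER_DOM_TICK)
\end{verbatim}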
The Sender caches SLC and HLC hits in memory, then forwards a
condensed version of each HLC hit to the appropriate local Trigger. Each
condensed HLC hit record contains the hit time, a DOM identifier, and the
trigger mode. After the
Trigger components have determined interesting time intervals,
the Event Builder requests each interval from the Sender which returns a list of
all hits within the interval and prunes all older hits from the in-memory hit
cache after each interval.
One core requirement of the DAQ is that each component operates on a
time-ordered stream of data. The DAQ uses its ``Splicer'' to accomplish
this. The Splicer is an object that gathers all input streams
during the setup phase at the beginning of a data-taking run; no inputs can
be added once started. Each stream
pushes new data onto a ``tail'', and the Splicer merges the data from all
streams into a single sorted output stream. When a stream is closed, it
issues an end-of-stream marker that causes the Splicer to
ignore all further data. Details of the Splicer algorithm can be found in
ref.~\cite{vlvnt13_trigger}.
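
The essential behavior, merging several individually time-ordered streams into
one globally ordered stream, is equivalent to a heap-based merge, as in the
following Python sketch; the buffering, back-pressure, and end-of-stream
handling of the real Splicer are omitted:
\begin{verbatim}
import heapq

def splice(*streams):
    """Merge time-ordered streams of (utc_time, hit) tuples into a single
    time-ordered stream (illustrative stand-in for the Splicer)."""
    yield from heapq.merge(*streams, key=lambda item: item[0])
\end{verbatim}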
As hits move from Omicron to the Sender, they are written to the
HitSpool disk cache. These files are
written in a circular order so that the newest hits overwrite the oldest
data. The files are catalogued in a SQLite database to
aid in fast retrieval of raw hit data.
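
A minimal sketch of such a catalogued circular cache is shown below; the file
numbering, table schema, and class interface are illustrative and do not
reproduce the actual HitSpool implementation:
\begin{verbatim}
import sqlite3

class SpoolCatalog:
    """Track which spool file covers which time range, reusing the oldest
    file number once max_files is reached (illustrative only)."""
    def __init__(self, db_path="hitspool.db", max_files=1000):
        self.db = sqlite3.connect(db_path)
        self.db.execute("CREATE TABLE IF NOT EXISTS spool ("
                        "file_num INTEGER PRIMARY KEY, "
                        "t_start INTEGER, t_stop INTEGER)")
        self.max_files = max_files
        self.current = 0

    def register_file(self, t_start, t_stop):
        self.db.execute("INSERT OR REPLACE INTO spool VALUES (?, ?, ?)",
                        (self.current, t_start, t_stop))
        self.db.commit()
        self.current = (self.current + 1) % self.max_files  # circular reuse

    def files_for_interval(self, t0, t1):
        rows = self.db.execute("SELECT file_num FROM spool "
                               "WHERE t_stop >= ? AND t_start <= ?", (t0, t1))
        return [r[0] for r in rows]
\end{verbatim}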
One limitation of the current design is that it only reads data when
the full DAQ system is running, so the detector is essentially ``off''
during certain hardware failures or the periodic full restarts of the
system that occur every 32 hours. A future enhancement
will split the StringHub into several independent pieces to eliminate these
brief pauses. The front end (Omicron) will be moved to a daemon
that continuously writes data (including secondary, non-physics data and
other metadata) to the disk cache. Part of the back end (Sender)
will become a simple HitSpool client that reads data from the disk cache
and sends it to the downstream consumers, while another simple component
will listen for requested hit readout time intervals from the Event Builder
and return lists of hits taken from the HitSpool.
\subsubsection{\label{sect:online:trigger}Triggers}
The DAQ trigger algorithms look for clusters of HLC hits in space and time
that could indicate light due to a particle interaction in the detector, as
opposed to uncorrelated dark noise. An algorithm searches for a given
multiplicity of HLC hits, possibly with an additional geometric
requirement, within a trigger time window. The time scale of the trigger window is
set by the light travel time in ice and the geometry requirement
involved. Longer readout windows are appended before and after the trigger
windows to save early and late hits with the events.
Triggers are generally restricted to a subset of DOMs, such as all in-ice DOMs,
IceTop DOMs, or DeepCore DOMs. The algorithms run in parallel over all
hits in the DOM set, and then overlapping triggers are merged. The various
trigger algorithms are described below, and a summary of the algorithm
parameter settings is found in Table \ref{tab:triggers}. Trigger settings
are changed at most once per year.
The fundamental trigger for IceCube, IceTop, and DeepCore is the Simple
Multiplicity Trigger (SMT). The SMT requires $N$ or more HLC hits within a
sliding time window of several $\mu\mathrm{s}$, without any locality
conditions. Once the multiplicity condition is met, the trigger is
extended until there is a time period of the length of the initial trigger
window without any HLC hits from the relevant DOM set. The
multiplicity value $N$ is tuned to the energy threshold of the sub-detector,
which fundamentally is set by the string or tank spacing.
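
A simplified version of the SMT logic is sketched below; hit times are assumed
to be sorted, and the readout-window padding described later in this section is
not included:
\begin{verbatim}
def simple_multiplicity_trigger(hit_times, n_hits=8, window=5.0):
    """Return (start, stop) trigger windows from sorted HLC hit times
    (times and window in microseconds; illustrative only)."""
    triggers, i, n = [], 0, len(hit_times)
    while i < n:
        j = i
        while j < n and hit_times[j] - hit_times[i] <= window:
            j += 1
        if j - i >= n_hits:                # multiplicity condition satisfied
            stop = hit_times[j - 1]
            # extend until a full window passes without another HLC hit
            while j < n and hit_times[j] - stop <= window:
                stop = hit_times[j]
                j += 1
            triggers.append((hit_times[i], stop))
            i = j
        else:
            i += 1
    return triggers
\end{verbatim}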
Other triggers use a lower multiplicity threshold by adding constraints on
the HLC hit topology. The time windows for these triggers are based upon
the size of the locality volume. The Volume Trigger defines a cylinder of fixed size around
each hit DOM and requires a given multiplicity within this cylinder
(figure~\ref{fig:trig_cylinder}); this allows IceCube to trigger on localized
low-energy events that do not satisfy the SMT condition. The Volume Trigger
has an additional simple multiplicity parameter that fires the trigger when
a certain number of hits is reached, regardless of any spatial
restrictions; this prevents the trigger
algorithm from slowing down when the detector has triggered already from
the primary SMT. The String Trigger requires a certain number of hits
within a span of DOMs along a single string
(figure~\ref{fig:trig_string}); this allows one to trigger on low-energy
muons that pass vertically through the detector.
\begin{figure}[ht]
\centering \subfloat[]{
\includegraphics[scale=0.45]{graphics/online/trigger/trig_cylinder}
\label{fig:trig_cylinder}
}
\quad
\subfloat[]{
\includegraphics[scale=0.5]{graphics/online/trigger/trig_string}
\label{fig:trig_string}
}
\caption{Schematic representation of triggers using spatial coincidences. Shaded circles
represent HLC-hit DOMs. Left: Volume Trigger. Right: String Trigger. }
\end{figure}
IceCube can detect hypothetical subrelativistic heavy
particles such as magnetic monopoles that may catalyze nucleon decays along
their trajectory \cite{Aartsen:2014awd}. However, because these
particles may travel at velocities less than $0.01c$, the time
windows used in the standard triggers are too short. A dedicated Slow
Particle (SLOP) trigger has thus been developed to search for slow
track-like particle signatures.
The SLOP trigger operates in several stages. The HLC hits, which by design
occur at least in pairs along a string, are cleaned by removing pairs that
are proximate in time ($\Delta t < T_{\mathrm{prox}}$); $T_{\mathrm{prox}}$
is tuned to remove most hits from particles traveling near $c$, such as muons.
In the calculations that follow, the trigger algorithm uses the time and
position of the first hit within each HLC pair. Next, triplets of HLC
pairs within a time window $T_{\mathrm{max}}$
are formed. The geometry of each triplet formed (figure~\ref{fig:slop})
must satisfy track-like
conditions: the largest inner angle $\alpha$ of the triangle formed by the
HLC pairs must be greater than $\alpha_{\mathrm{min}}$, and the
``velocities'' along the triangle sides must be consistent. Specifically,
the normalized inverted velocity difference $v_\mathrm{rel}$, defined as
\begin{equation}
v_\mathrm{rel}=\frac{|\Delta
v_\mathrm{inverse}|}{\overline{v}_\mathrm{inverse}} =
3\cdot\frac{|\frac{1}{v_{23}}-\frac{1}{v_{12}}|}
{\frac{1}{v_{12}}+\frac{1}{v_{23}}+\frac{1}{v_{13}}}
\end{equation}
\noindent where $v_{ij} = \Delta x_{ij}/\Delta t_{ij}$, must be less than
or equal to a predefined maximum value
$v_{\mathrm{rel}}^{\mathrm{max}}$. Finally, the total number of track-like triplets
must be greater than or equal to $N_{\mathrm{triplet}}$, set to 5, and all
of these track-like triplets must overlap in time.
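
For a single triplet, the geometric conditions can be written out as in the
following sketch, in which each HLC pair is represented by the time and
position of its first hit; the pair cleaning and the requirement that
$N_{\mathrm{triplet}}$ track-like triplets overlap in time are not shown:
\begin{verbatim}
import math

def is_tracklike(p1, p2, p3, alpha_min=140.0, v_rel_max=0.5):
    """p_i = (t, x, y, z) for the first hit of each HLC pair; returns True
    if the triplet satisfies the SLOP track-like conditions."""
    def dist(a, b):
        return math.dist(a[1:], b[1:])
    d12, d23, d13 = dist(p1, p2), dist(p2, p3), dist(p1, p3)

    def angle(opposite, s1, s2):       # inner angle via the law of cosines
        c = (s1**2 + s2**2 - opposite**2) / (2 * s1 * s2)
        return math.degrees(math.acos(max(-1.0, min(1.0, c))))
    alpha = max(angle(d13, d12, d23), angle(d23, d12, d13),
                angle(d12, d13, d23))
    if alpha < alpha_min:
        return False

    # inverse velocities 1/v_ij = dt_ij / dx_ij along the triangle sides
    inv12 = (p2[0] - p1[0]) / d12
    inv23 = (p3[0] - p2[0]) / d23
    inv13 = (p3[0] - p1[0]) / d13
    v_rel = 3.0 * abs(inv23 - inv12) / (inv12 + inv23 + inv13)
    return v_rel <= v_rel_max
\end{verbatim}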
\begin{figure}[!ht]
\centering
\includegraphics[width=0.6\textwidth]{graphics/online/trigger/slop.pdf}
\caption{Geometry of a SLOP trigger triplet of HLC pairs.}
\label{fig:slop}
\end{figure}
Other special-purpose triggers exist to collect minimum bias data of
various sorts. The Fixed-Rate Trigger (FRT) reads out 10~ms of hit data from
the full detector at fixed intervals. This is especially useful for studies
of DOM noise. The Calibration Trigger selects a particular type of hit
such as special IceTop non-HLC hits that have full waveform readout, and promotes
them to a trigger. The Calibration Trigger can also be configured to
include all events due to LED flashers in cases where flasher
operations require disabling standard triggers. Finally, a Minimum Bias
trigger can select one of every $N$ HLC hits and promote this hit to a trigger, adding
readout windows as usual; currently an IceTop Minimum Bias trigger with a
prescale factor $N$ of 10000 is active.
\begin{table}
\centering \footnotesize
\caption{Trigger parameters (as of May 2016) and typical trigger
rates of each algorithm. Most rates vary seasonally with the atmospheric
muon flux. The merged event rate varies from 2.5 to
2.9 kHz.}
\begin{tabular}{lrrrrr}
\hline Trigger & DOM set & $N$ HLC hits & Window & Topology & Rate\\
& & & ($\mu$s) & & (Hz) \\
\hline
SMT & in-ice & 8 & 5 & --- & 2100\\
SMT & DeepCore & 3 & 2.5 & --- & 250\\
SMT & IceTop & 6 & 5 & --- & 25\\
Volume & in-ice & 4 & 1 & cylinder (r=175~m, h=75~m) & 3700\\
Volume & IceTop infill & 4 & 0.2 & cylinder (r=60~m, h=10~m) & 4\\
String & in-ice & 5 & 1.5 & 7 adjacent vertical DOMs & 2200\\
SLOP & in-ice & $N_{\mathrm{triplet}} = 5$ & $T_{\mathrm{prox}} = 2.5$, &
$\alpha_{\mathrm{min}} = 140^\circ,\ v_{\mathrm{rel}}^{\mathrm{max}}
= 0.5$ & 12\\
& & & $T_{\mathrm{min}} = 0$, & &\\
& & & $T_{\mathrm{max}} = 500$ & &\\
FRT & all & --- & --- & --- & 0.003\\
\hline
\end{tabular}
\label{tab:triggers}
\end{table}
Many events will satisfy more than one of the trigger conditions, sometimes
multiple times. In order to avoid overlapping events, possibly containing
the same DOM hits, the triggers and their associated readout windows are
merged, while retaining information about the separate triggers. The
merged trigger is referred to as the Global Trigger.
Each trigger has defined readout windows around the trigger window; all
hits from the full detector, including those DOM sets not involved in the trigger,
are requested from the StringHub components and built into events. For the
DOM set involved in an in-ice trigger, the readout windows are appended at each end of the trigger
window, while for other DOM sets, the readout windows are centered around
the trigger start time. Readout windows around IceTop triggers are global
and include hits from all other DOM sets before and after the trigger
window. The union of overlapping readout windows defines
an event (figure~\ref{fig:trigger_readout}). Long events such as SLOP or FRT
triggers typically contain several causally independent ``physics'' events;
these are re-split before reconstruction and analysis.
\begin{figure}[!ht]
\centering
\includegraphics[width=0.8\textwidth]{graphics/online/trigger/trigger_readout}
\caption{In-ice, IceTop, and merged readout windows for a long event
satisfying SLOP and SMT8 triggers.}
\label{fig:trigger_readout}
\end{figure}
\subsubsection{\label{sect:online:evbuilder}Event Builder}
The Event Builder receives requests from the Global Trigger, extracts the
individual readout windows, and sends them to the appropriate subset of the
StringHubs. The StringHubs each send back a list of all hits within the
window. When all StringHubs have returned a list of hits, these are bundled with the trigger data into an event.
Events are written to a temporary file. When the temporary file
reaches a preset configurable size, it is renamed to a standard unique name. When the PnF
system sees a new file, it accepts it for processing and filtering
(section~\ref{sect:online:filter}). The total latency from detection of
photons at the DOMs to DAQ events written to disk is approximately five
seconds.
\subsubsection{\label{sect:online:daqdomconfig}DAQ and DOM Configuration}
The configuration of the DAQ is managed by two sets of XML files: a cluster
configuration file and a hierarchical tree of run configuration files.
The cluster configuration file contains system-level settings used to
launch the DAQ, such as component host servers, startup paths, command-line
options, etc. Components (other than StringHub) can easily be moved to
different hosts for troubleshooting, load balancing, and maintenance.
Run configuration files list the trigger and DOM configuration files to be
used for taking data. The trigger configuration file specifies
configuration parameters for all
trigger components (in-ice, IceTop, and global) used in a run. These
include the list of algorithms run by each trigger component, along with
readout window sizes and any other configurable parameters (multiplicity
threshold, trigger period, prescale factor, etc.).
DOM configuration files (one per hub) list all DOMs that contribute to
the data-taking run. All configuration parameters for each DOM are
specified, including PMT high voltages, ATWD operating parameters,
discriminator thresholds, local coincidence settings, baselines and others.
Run configuration files (including trigger and DOM files) are versioned and
frozen once used for data-taking. All relevant configuration parameters
are also stored in a database for use in analysis.
An additional geometry XML file contains the $(x,y,z)$ and (string,
position) coordinates of the DOMs, needed by the Trigger components. The
DOM entries are indexed by their unique Main Board ID. This ensures that cabling changes
on a DOMHub do not result in changes in data-taking or errors in the geometry.
\subsubsection{Component Control}
The DAQ components are managed by a single ``command-and-control'' daemon,
CnCServer, that manages and monitors components and acts as the main
external interface to the DAQ. It uses a standard component interface to query and
control the components, and a separate interface for components to expose
internal data used for monitoring the health of the detector or for
debugging purposes.
CnCServer dynamically discovers the detector components during a launch
phase, and instructs them to connect to each other as needed. Using the
run configuration files, it then distributes each component configuration
appropriately. The components are then started to begin a data-taking run.
When a run is in progress, CnCServer regularly checks that components are
still active and that data are flowing between components.
\subsubsection{\label{sect:SNDAQ}Supernova Data Acquisition System}
The IceCube DAQ has a parallel triggering and analysis pathway designed
specifically for the detection of the many $O(10)$ MeV neutrinos from a
Galactic core-collapse supernova. In the case of such an event, these
neutrinos will produce interactions in
the detector that, individually, are too dim to trigger the standard DAQ,
but because of their high number, can cause a coherent rise in the
individual hit rates of the DOMs~\cite{IC3:supernova}.
Each DOM monitors its hit rate and sends a stream of binned counts, using a
bin width of 1.6384~ms ($2^{16}$ clock cycles at 40 MHz). An artificial
deadtime of $250\ {\mu}\mathrm{s}$ is applied after each hit to reduce the
impact of correlated hits (section~\ref{sect:darknoise}). Each
StringHub collects the rate stream of each DOM, supplies UTC-based timestamps,
and forwards the streams to the Secondary Builder.
The supernova DAQ (SNDAQ) system receives the Secondary Builder stream,
rebins the individual DOM rates, and monitors the sum of rates over several
timescales for a significant rise. This analysis is described in
detail in ref.~\cite{IC3:supernova}. One complication is that light
deposition from cosmic-ray muons distorts the significance
calculation. To correct for this, the trigger rate of the standard DAQ is
continuously sent to SNDAQ, and any significant alert is corrected
\cite{IC3:icrc15_sndaq}. At a high significance threshold, the capture of
all untriggered data around the alert time is initiated using the HitSpool
system (section~\ref{sect:hitspool}), and the Supernova Neutrino Early Warning
System (SNEWS) \cite{SNEWS} is notified. The SNDAQ latency of approximately
7~minutes is dominated by the sliding window algorithm used to determine
average DOM rates.
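
As a greatly simplified illustration of the rate-monitoring idea (the actual
SNDAQ analysis, including its likelihood treatment and the muon correction, is
described in ref.~\cite{IC3:supernova}), a sliding-window excess search over
the summed DOM rates could be sketched as:
\begin{verbatim}
import statistics

def rate_excess(summed_rates, window_bins, background_bins):
    """Yield (bin index, significance) comparing each sliding analysis
    window with the mean and spread of the preceding background bins."""
    for i in range(background_bins, len(summed_rates) - window_bins + 1):
        background = summed_rates[i - background_bins:i]
        mu = statistics.mean(background)
        sigma = statistics.stdev(background)
        signal = statistics.mean(summed_rates[i:i + window_bins])
        if sigma > 0:
            yield i, (signal - mu) / sigma
\end{verbatim}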
\subsubsection{\label{sect:hitspool}HitSpool Request System}
In the event of a significant transient event, subsystems such as SNDAQ can
request all untriggered DOM hits from the detector in
a particular time interval by sending requests to a HitSpool Request daemon. Presently,
the HitSpool Request System has three clients;
their basic characteristics are described in
Table~\ref{tab:hsclients}. The central daemon passes the request on to
every DOMHub, where hits in the requested time
interval are gathered and forwarded to a ``sender'' component. The hits
are then bundled and transferred to the Northern Hemisphere for further analysis.
The time windows of SNDAQ HitSpool data requests are based on the
statistical significance of the alert and are shown in
Table~\ref{tab:hsclients}. The online High Energy Starting Event (HESE)
analysis system requests HitSpool data from a symmetrical time window of
1~s around events with a total deposited charge greater than 1500~PE.
The recently implemented HitSpool client for solar flare analyses is
triggered externally by significant Fermi Large Area Telescope (LAT) events~\cite{fermilat:flare}
and requests HitSpool
data from a symmetrical time window of one hour around the trigger
time. Unlike the other two clients, these data sets are not transferred
over the satellite due to their size but are stored locally on disk, with
transmission over satellite only pursued in extraordinary cases.
\begin{table}
\caption{HitSpool data-requesting services and request characteristics.
The SNDAQ quantity $\xi$ is related to the statistical significance of
the rate fluctuation.}
\centering
\footnotesize
\begin{tabularx}{\textwidth}{lcXXXX}
\toprule Client & Trigger Threshold & Time\newline Window & Request
Length & Raw \newline Data Size & Frequency \\
\midrule
SNDAQ & $7.65 \le \xi < 10$ & $[-30\,\mathrm{s},+60\,\mathrm{s}]$ &
$90 \,\mathrm{s}$& $15 \,\mathrm{GB}$&
$0.5/\mathrm{week}$ \\
& $\xi \ge 10$ & $[\pm250\,\mathrm{s}]$ & $500\,\mathrm{s}$ & $85
\,\mathrm{GB}$ & $0.0006 / \mathrm{week}$ \\
HESE & $1500 \,\mathrm{PE} $ &
$[\pm0.5\,\mathrm{s}]$& $1\,\mathrm{s}$ & $175\,\mathrm{MB}$ &
$4/\mathrm{day}$ \\
& 6000 PE & & & & $1/\,\mathrm{month}$ \\
Solar Flare & Fermi-LAT & $[\pm30\,\mathrm{min}]$ & $1\,\mathrm{h}$&
$\sim 600\,\mathrm{GB}$ & $7 / \mathrm{year}$ \\
& significant event & & & &
\\ \bottomrule
\end{tabularx}
\label{tab:hsclients}
\end{table}
\subsection{\label{sect:online:filter}Online Processing and Filtering}
\subsubsection{Overview}
The online Processing and Filtering (PnF) system handles
all triggered events collected by the DAQ
and reduces the data volume to a level that can be accommodated in our
satellite bandwidth allocation (about 100 GB/day). This treatment
includes application of calibration constants, event
characterization and selection, extraction of data quality monitoring
information, generation of realtime alerts for events of astrophysical
interest, and creation of data files and metadata information for long-term
archiving. The PnF system is a custom software
suite that utilizes about 20 standard, multi-processor servers located in
the SPS computing cluster.
The first step in the analysis of triggered events is the calibration of
the digitized DOM waveforms, as described in section~\ref{sec:waveformcal}.
The geometry, calibration, and detector status (GCD) information needed to
process the data is stored in a database. Next, each DOM's waveform is
deconvolved using the known DOM response to single photons to
extract the light arrival time and amplitude information~\cite{IC3:ereco}.
This series of time and amplitude light arrival information for each DOM is
the basis for event reconstruction and characterization. PnF encodes this
information in a compact data format known as the Super Data
Storage and Transfer format (SuperDST); this format uses only 9\% of the storage
size of the full waveform information. The encoding does introduce a
small level of discretization error to the data, measured to be 1.1~ns in time and
0.04~PE in charge, smaller than the calibration uncertainties on these
values. Any DOM readout whose SuperDST information is found not to be a
good representation of the original waveform, or sees large amounts of
light, also has the full waveform data saved in addition to the
SuperDST record.
Each event is then characterized with a series of reconstruction
algorithms that attempt to match the observed patterns of recorded light in
the SuperDST with known patterns of light from track and shower event
hypotheses~\cite{IC3:ereco}. The reconstructed vertex position, direction,
energy, and the goodness-of-fit are used to select interesting events by various
filter selections. The filter criteria are set by the collaboration
each year and are tuned to select events of interest to specific
analyses. Each year there are about 25 filter selections in
operation; as of 2016, approximately 15\% of all triggered events are
selected by one or more filters. Some of these filters are designed to search for
neutrino events of wide astrophysical interest to the scientific community
and trigger alerts that are distributed to followup observatories
worldwide~\cite{Abbasi:2011ja,Aartsen:2015trq}.
The PnF system also extracts and aggregates data quality and monitoring
information from the data during processing. This information includes
stability and quality information from the DOM waveform and calibration
process, rates of DOM readouts, and rates and
stability information for all detector triggers and filters. This
information is aggregated for each data segment and reported to the IceCube
Live monitoring system (section~\ref{sec:online:icecubelive}).
Finally, the PnF system generates several types of files for satellite
transmission and for long term archival. These
include:
\begin{enumerate}
\item Filtered data files containing events selected by the online filter
selections. These events generally only include the SuperDST version of
the DOM information and results from the online event reconstructions.
The files are queued for satellite transmission to the IceCube data
center in the Northern Hemisphere by the data handling system.
\item SuperDST data files containing the SuperDST version of DOM readout
information for all triggered events as well as summary information from
the online filtering process. This file set is intended as the long-term
archive version of IceCube data.
\item Raw data files containing all uncalibrated waveforms from all DOMs for
every event.
\end{enumerate}
During normal operations, the DAQ produces a raw data output of
$\sim$1 TB per day, resulting in a raw data file
archive of the same size. The SuperDST and filtered data archive, after
data compression, are $\sim$170 GB/day and $\sim$90 GB/day, respectively.
\subsubsection{System Design}
The PnF system uses a modular design based on
a central master server node and a scalable number of data processing client
nodes. The central master server handles data distribution and
aggregation tasks (requesting data blocks from the DAQ, collating event
SuperDST, reconstruction, filter, and monitoring information and writing
data files), while the clients handle the per-event processing
tasks (event calibration, reconstruction, analysis, and filtering).
The system is built upon the IceCube analysis software framework,
IceTray~\cite{DeYoung:2005zz}, allowing standard IceCube algorithms to be
used online without modifications.
The system uses the Common Object Request Broker Architecture
(CORBA) system as a means for controlling, supervising and interconnecting
the modular portions of the system. Specialized CORBA classes allow
data to stream from one component to another using IceTray formats.
Clients can also be easily added and removed as needed to meet the
processing load.
\subsubsection{Components}
The flow of triggered event data in the PnF
system is shown in figure~\ref{fig:online_pnf_internals}. Standard
components include:
\begin{enumerate}
\item DAQ Dispatch, a process to pick up event data from the DAQ data cache
and forward to the Central Server components.
\item Central Servers, which
receive data from the
DAQ Dispatch event source, distribute events to and record results
returning from the PnF client farm, and send events to Writer components.
Typically there are four servers in operation.
\item Filter Clients, where the core calibration, reconstruction and
filtering processes are applied to each triggered event. Up to 500 of
these clients operate in parallel to filter
events in real time.
\item GCD Dispatch, a database caching system to prevent the
Filter Client processes from overwhelming the GCD database at run transitions.
\item File Writers, responsible for creation of files and metadata for
the data archive. There is one writer component for each file type created.
\item Online Writer, responsible for extracting event reconstruction and
filter information from the data for events of astrophysical interest and
sending this information out in real time via the IceCube Live alert
system.
\item Monitoring Writer, responsible for aggregating per-event monitoring
information, creating histograms, and forwarding to the IceCube Live
monitoring system.
\item Filtered Data Dispatch and FollowUp Clients, responsible for
looking for bursts of neutrino events on timescales from 100 seconds up
to 3 weeks in duration. Any significant burst of neutrinos found generates alerts
sent to partner observatories worldwide.
\end{enumerate}
\begin{figure}[!ht]
\centering
\includegraphics[width=0.8\textwidth]{graphics/online/pnf/PnF_Internals.pdf}
\caption{Internal components of the PnF
system. Arrows show the flow of data within the system.}
\label{fig:online_pnf_internals}
\end{figure}
\subsubsection{Performance}
The PnF system is designed to filter triggered events as quickly as
possible after collection by the data acquisition
system. A key performance metric is processing system latency, defined as the duration
of time between the DAQ trigger and the completion of event
processing and filtering. A representative latency history for the system is
shown in figure~\ref{fig:online_pnf_latency}, showing typical system
latencies of about 20 seconds.
\begin{figure}[!ht]
\centering
\includegraphics[width=0.85\textwidth]{graphics/online/pnf/pnf_latency_160627.pdf}
\caption{Typical PnF system latency for a
24-hour period. The latency is defined as the time between DAQ trigger
time and time when the online filtering processing is complete. The
spikes in latency correspond to DAQ run transitions, when
geometry, calibration, and detector status information is updated and
distributed to the filtering clients.}
\label{fig:online_pnf_latency}
\end{figure}
The filter selections used have been relatively stable over several years
of operation of the completed IceCube detector, with most seeing only minor
updates at the
start of each season. The majority of physics analyses derive from a small
set of core filters, including:
\begin{enumerate}
\item A muon track filter that searches for high-quality track events from all
directions. Up-going events for all triggered energies are selected,
while only high-energy
down-going tracks are selected to avoid the large background of
down-going atmospheric muons at lower energies. These selected events
are frequently used as the input to point source and transient neutrino searches.
\item A shower event filter that searches for events producing large energy
depositions in or near the instrumented volume. These selected events are
frequently used as the input to searches for high-energy shower events arising from
atmospheric and astrophysical neutrinos.
\item A high-charge filter that searches for any event depositing a
large amount of energy leading to a recorded charge of $\geq1000$
photoelectrons in the
instrumented volume. While having a large overlap with the muon track
and shower filters at high energies, this filter targets the highest
energy neutrino events of all types. The selected events are used as
inputs to searches for high-energy astrophysical and cosmogenic
neutrinos as well as for relativistic magnetic monopoles.
\item Cosmic ray filters that search for extended air-shower events in
IceTop. The selected events are used as inputs to analyses
targeting the flux, spectrum, and composition of the primary cosmic rays
observed in the Earth's atmosphere.
\item A DeepCore contained event filter that searches for contained, lower-energy
neutrino events (in the range of 10--100 GeV) from atmospheric neutrino interactions
that are contained within the more densely instrumented DeepCore region.
The selected events are used as inputs to analyses that measure
neutrino oscillation effects and search for indirect signatures of dark matter.
\end{enumerate}
\noindent Other filters are employed for more specialized searches, as well as for minimum bias selections.
\subsection{\label{sect:online_jade}Data Handling}
The bulk of South Pole Station data traffic is handled by geosynchronous
satellite links. Due to the location, only
geosynchronous satellites with steeply inclined orbits reach far enough
above the horizon to establish a link. For a given satellite, this link
provides four to six hours of communications once per sidereal day.
Multiple satellites are currently utilized by the
U.S.~Antarctic Program, providing a window of about 12 hours of connectivity with
bandwidth of 250 Mbps for uni-directional data transfer and bandwidth
of 5 Mbps for bi-directional internet connectivity. For the remainder of the day, Iridium
communication satellites allow limited voice and data connectivity and provide up to 2.4
kbps of bandwidth per modem.
IceCube incorporates Iridium modems into two separate systems, the legacy IceCube
Teleport System (ITS) and the IceCube Messaging System (I3MS). ITS uses
the Iridium Short Burst Data mode to send short
messages of 1.8 kB or smaller with a typical latency (transmission time) of 30 seconds.
Messages may either originate or terminate at the ITS Iridium modem at the
South Pole. Messages also contain a recipient ID indicating the intended
host to receive the message, allowing a many-to-many communications
infrastructure between systems running at the South Pole and systems in the
Northern Hemisphere. ITS was retired in 2016.
The newer IceCube Messaging System (I3MS), deployed in 2015, incorporates
multiple Iridium modems and uses the Iridium RUDICS data mode, providing a
2.4 kbps bidirectional serial stream per modem and a minimum latency of
about 1.5 seconds. I3MS runs as a daemon on both ends of the link, accepts
messages via the ZeroMQ distributed messaging protocol, and transports
those messages across the link based on message priority and fair sharing
of bandwidth among all users. I3MS message recipients listen for messages
using ZeroMQ publish-subscribe (PUB-SUB), allowing a given message to be
sent to multiple recipients. I3MS also provides low-bandwidth secure shell
(ssh) connectivity to the South Pole, allowing off-site operators
access to SPS in the case of detector issues.
Data handling is provided by three servers running the Java Archival and
Data Exchange (JADE) software. JADE is a
recent Java-based reimplementation and expansion of earlier software, the
South Pole Archival and Data Exchange (SPADE). JADE has
four primary tasks: data pickup, archiving, satellite transmission, and
real-time transmission. The three servers operate independently of one
another, and each is capable of separately handling the nominal
data volume; thus, data handling can continue seamlessly in case of
hardware failure or maintenance.
JADE is configured with a number of input data streams,
each consisting of a data server, a dropbox directory, and a filename pattern. The
data stream dropbox directories are checked on a regular basis for new
files. File completion is indicated by the producer creating a matching
semaphore file. For each file, a
checksum calculated on the data server is compared to a checksum calculated
on the JADE server. After validation, the original data file is removed
from the pickup location.
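
A minimal sketch of this pickup step follows; the semaphore-file convention
carrying the producer's checksum, the choice of MD5, and the directory handling
are assumptions for illustration rather than the actual JADE configuration:
\begin{verbatim}
import hashlib, os, shutil

def pickup(dropbox, destination, suffix=".dat"):
    """Copy completed files whose checksum matches the producer's, then
    remove the originals from the pickup location (illustrative only)."""
    for name in sorted(os.listdir(dropbox)):
        if not name.endswith(suffix):
            continue
        src = os.path.join(dropbox, name)
        sem = src + ".sem"
        if not os.path.exists(sem):
            continue                     # producer has not finished this file
        with open(sem) as f:
            expected = f.read().strip()  # checksum computed on the data server
        with open(src, "rb") as f:
            actual = hashlib.md5(f.read()).hexdigest()
        if actual != expected:
            continue                     # mismatch: leave file for inspection
        shutil.copy2(src, os.path.join(destination, name))
        os.remove(src)                   # original removed after validation
        os.remove(sem)
\end{verbatim}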
Files are then routed according to the configuration of their
data stream, either transmitted via satellite link or
archived locally. Archival data
were formerly written to Linear Tape Open (LTO) tapes; the tape system was
retired in 2015, and archival data are now written to disks.
All of the archival data are buffered on the server until the storage medium
is full. In case of media failure, the buffered files can be
immediately written to new archival media with a single command.
Two copies of each archival data stream are saved, and the disks are
transported from the South Pole to the IceCube data center each
year. Archival data are not regularly reprocessed but are retained
indefinitely in case of an issue with the filtered data streams or a
significant improvement in the low-level calibration.
Data streams intended for satellite transfer are queued separately.
JADE combines multiple smaller files or splits large files to create $\sim1$
GB bundles, allowing satellite link operators to manage the daily data
transmission. A configurable number of bundles is then transferred to the
satellite relay server. If satellite transmission is temporarily
interrupted, the excess bundles are staged on the JADE server.
Small files ($<$50 KB) with high priority are sent via
the I3MS Iridium link. In cases where the real-time link is not available, I3MS
will queue the messages to be sent when the link becomes available. All
I3MS messages are also sent to JADE for transfer via the geosynchronous
satellite link, ensuring delivery if the Iridium link is unavailable for an
extended period of time.
\subsection{\label{sec:online:icecubelive}IceCube Live and Remote Monitoring}
IceCube operations are controlled and monitored centrally by IceCube Live.
IceCube Live consists of two major components: LiveControl,
responsible for controlling data-taking operations and collecting
monitoring data, and the IceCube Live website, responsible for processing
and storing monitoring data as well as presenting this data in webpages and
plots that characterize the state of the detector.
\subsubsection{LiveControl}
LiveControl is responsible for controlling the state of DAQ and PnF, starting and
stopping data-taking runs, and recording the parameters of these runs.
Human operators typically control the detector and check basic
detector status using a command-line interface to the LiveControl
daemon. Standard operation is to
request a run start, supplying a DAQ run configuration
file. LiveControl then records the run number, configuration, start time,
etc. and sends a request
for DAQ to begin data-taking. After data-taking commences successfully,
LiveControl waits a specified amount of time, generally eight hours, then
stops the current run and automatically starts a new run using the same
configuration.
This cycle continues until stopped by a user request or a
run fails. In case of failure, LiveControl attempts to restart data-taking
by starting a new run. Occasionally a hardware failure occurs, and a new
run cannot be started with the supplied configuration because requested
DOMs are unpowered or temporarily unable to communicate. In this case,
LiveControl cycles through predefined partial-detector
configurations in an attempt to exclude problematic DOMs. This results in
taking data with fewer than the full number of available strings, but it
greatly reduces the chance of a prolonged complete outage where no IceCube
data are recorded.
A secondary function of LiveControl is the collection, processing, and
forwarding of monitoring data from DAQ, PnF, and other
components. The associated JavaScript Object Notation (JSON) data are
forwarded to LiveControl using the ZeroMQ protocol and queued internally
for processing. Certain monitoring quantities indicate serious problems with
the detector, e.g.\ an excessively high PnF latency. LiveControl
maintains a database of critical monitoring quantities and raises an alert
if the value is out of the specified range or
hasn't been received in a specified amount of time. The alert usually
includes an email to parties responsible for the affected subsystem and,
for serious problems, triggers an automated page to winterover operators.
Alerts generated by controlled components such as DAQ or PnF may also
generate emails or pages.
All monitoring data are forwarded to the IceCube Live website for further
processing and display.
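
As an illustration of how a component might report such a quantity, the
following sketch sends a JSON message over ZeroMQ; the endpoint and field names
are hypothetical and do not reproduce the actual IceCube Live message schema:
\begin{verbatim}
import json, time, zmq

context = zmq.Context()
socket = context.socket(zmq.PUSH)
socket.connect("tcp://livecontrol:6668")     # hypothetical endpoint

message = {
    "service": "PnFServer",                  # reporting component
    "varname": "pnf_latency_seconds",        # monitored quantity
    "value": 21.3,
    "time": time.time(),
}
socket.send_string(json.dumps(message))      # queued by LiveControl
\end{verbatim}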
\subsubsection{IceCube Live Website}
Two operational copies of the IceCube Live website exist: one inside the
IceCube network at the South Pole, and one in the Northern Hemisphere.
Monitoring data reach the northern website based on relative priority and using
both geosynchronous and Iridium data transport, summarized in
Table~\ref{i3messages}.
\begin{table}[!ht]
\centering
\caption{Typical data volume and latencies for IceCube Live monitoring
messages.}