=============================================================== Investigating missing Tango events with ska-tango-event-monitor =============================================================== In this tutorial, we will investigate a Tango control system where we observe a "missing event" using the event monitor. In the process we will learn about the internals of the cppTango ZMQ event system and how to interpret the output of the event monitor script. .. contents:: Contents :depth: 1 :local: :backlinks: none Launch the control system ------------------------- Let's begin by downloading :download:`diagnose-missing-events.tar.bz2 <../../build/tutorial/diagnose-missing-events.tar.bz2>`. This archive contains all the necessary scripts and configuration files required to run the tutorial demo. Once the download is complete, extract the contents of the archive and enter the project directory: .. code :: shell $ tar xaf diagnose-missing-events.tar.bz2 $ cd diagnose-missing-events Next, let's launch the demo control system using ``docker compose``: .. code :: shell $ docker compose up --detach --build Once the control system is up and running, we can install ska-tango-event-monitor script into a virtual environment: .. code :: shell python3 -m pip --require-virtualenv install --extra-index-url https://artefact.skao.int/repository/pypi-all/simple ska-tango-event-monitor[tutorial] Finally, from an ``itango`` console, we can verify the control system is running by :func:`ping`'ing some devices: .. code :: python-console In [1]: hw = Device("foo/bar/hw") In [2]: hw.ping() Out[2]: 238 In [3]: top = Device("foo/bar/top") In [4]: top.ping() Out[4]: 243 Demonstrating the problem ------------------------- The control system you have just deployed exhibits a defect which could be described as a missing event. This example is based on a real SKAO subsystem which exhibited a similar defect. Before we describe the defect, let's first introduce our demo control system. The control system we just deployed has two devices of interest as shown in :ref:`Diagram 1 `. .. _tut-diag1: .. uml :: :caption: Diagram 1: Hierarchy of deployed devices. The :class:`!Hardware` is not actually deployed in this demo, but instead entirely simulated in the :class:`!HwDevice`. :align: center @startuml hide empty members entity "TopDevice(foo/bar/top)" as T { <> +Scan() } entity "HwDevice(foo/bar/hw)" as H { <> +Scan() +mode } entity Hardware { <> } T --> H : Tango H --> Hardware : simulates @enduml The user interacts directly with the :class:`!TopDevice` to start a scan with the :func:`!Scan()` command. This command is forwarded to the :class:`!HwDevice`. For the :class:`!HwDevice`, the :func:`!Scan()` command returns as soon as the scan has been started. Clients are expected to monitor the :attr:`!mode` attribute to know when the scan has completed. To provide a nicer experience for the user, :class:`!TopDevice` does the monitoring of the :attr:`!mode` attribute for you and will not return until the :class:`!HwDevice` has finished and the :class:`!TopDevice` has received an ``IDLE`` event. :ref:`Diagram 2 ` shows what we expect to happen when we call the :func:`!Scan()` command: .. _tut-diag2: .. uml :: :caption: Diagram 2: Expected sequence diagram from :func:`!Scan()` command. :align: center @startuml [-> TopDevice : Scan() TopDevice -> HwDevice : Subscribe mode TopDevice -> HwDevice : Scan() HwDevice -> Hardware : <> activate Hardware TopDevice <-- HwDevice : Mode.SCANNING TopDevice <- HwDevice : Scan() return TopDevice -> TopDevice: <> activate TopDevice HwDevice <- Hardware : <> deactivate Hardware TopDevice <-- HwDevice : Mode.IDLE deactivate TopDevice TopDevice -> HwDevice : Unsubscribe mode [<- TopDevice : Scan() return @enduml Let's give it a go and see what happens when we run the :func:`!Scan()` command: .. code :: python-console In [1]: top = DeviceProxy("foo/bar/top") In [2]: top.get_command_config("Scan").in_type_desc Out[2]: 'Run a scan. This command takes at most 30 seconds to complete. You should increase your client timeout before calling this.' In [3]: top.set_timeout_millis(60000) In [4]: top.Scan() PyDs_PythonError: RuntimeError: Timeout waiting for Mode.IDLE (For more detailed information type: tango_error) We hit a timeout! The :class:`!TopDevice` never received the ``Mode.IDLE`` event it was expecting during its 30 second timeout. .. note :: Here, we are increasing the *client side* timeout with the call to :func:`tango.DeviceProxy.set_timeout_millis` as instructed by the command's documentation. What is curious is that the :class:`!HwDevice` has logged that it did push this event in ~13 seconds which is much less than our 30 second timeout: .. code :: shell $ docker compose logs dev-hw dev-hw-1 | Ready to accept request dev-hw-1 | Starting scan -- pushing Mode.SCANNING dev-hw-1 | Scan complete in 12.85 s -- pushed Mode.IDLE Initial investigation with the event monitor -------------------------------------------- Let's monitor the Tango event system while the :func:`!Scan()` command is running to see what is going on. First, in a separate terminal let's start the ``ska-tango-event-monitor`` script running, pointing it at both our Tango devices: .. code :: shell $ ska-tango-event-monitor foo/bar/hw foo/bar/top ===============================================2026-03-13T16:35:33.678998=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ - New publishers foo/bar/hw/mode.change: 2 is the last pushed server counter foo/bar/hw/position.change: 128 is the last pushed server counter ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - New subscriptions foo/bar/hw/mode.idl5_change: 2 events already processed : 2 is the last received server counter : 1 callback(s) registered : 0 missed events detected : 1 events discarded foo/bar/hw/position.idl5_change: 128 events already processed : 128 is the last received server counter : 1 callback(s) registered : 0 missed events detected : 0 events discarded ===============================================2026-03-13T16:35:33.678998=============================================== How do we interpret this output? ******************************** The event monitor uses the :func:`QueryEventSystem()` admin device command to collect snapshots of the event system for each device server simultaneously. As the snapshots contain a lot of information, the ``ska-tango-event-monitor`` script outputs the difference of each snapshot from the previous snapshot. Above shows the initial snapshot captured by the script. For this initial snapshot everything is new and we can see the state of the event system some time after we have run our :func:`!Scan()` command. The script will continue to output snapshot summaries every 10 seconds, however, these will be empty as there are no events being sent at the moment and nothing is changing with the event system: .. code :: shell ===============================================2026-03-13T16:35:43.689001=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- ===============================================2026-03-13T16:35:43.689001=============================================== .. note :: The Tango event system is shared by the entire :term:`device server ` process and it is this shared event system that we are taking snapshots of. The ``ska-tango-event-monitor`` script resolves each Tango device to the corresponding device server. If multiple devices live in the same server, then there will only be a single entry for that entire device server, even if the multiple devices were requested. The information presented in this first snapshot is already interesting. We can see that ``dserver/MyServer/hw`` device server (which is where ``foo/bar/hw`` resides) currently has publishers for pushing change events for two attributes: - ``foo/bar/hw/mode`` - ``foo/bar/hw/position`` These publishers are only activated if there is a subscriber that has registered interest in the :term:`event stream`, so something is currently subscribing to these attributes. But what do the numbers mean after them? There is this curious phrase "2 is the last pushed server counter". Each publisher has an associated :term:`counter ` that is incremented every time the publisher sends an event (after Tango has filtered out events due to e.g. ``abs_change``). This counter is sent along with the event data to the client and the phrase "2 is the last pushed server counter" means that the last event that was sent included a counter with a value of 2. This also means that 2 events have been sent on this event stream since the client subscribed, as the counter starts at 1. You can probably guess that the two :attr:`!mode` events correspond to the ``Mode.SCANNING`` and ``Mode.IDLE`` log lines we saw from the ``foo/bar/hw`` device. We also have 128 :attr:`!position` events that have been pushed. We didn't mention this earlier, but there is another ominously named :class:`!RogueDevice` that is forwarding a :attr:`!position` attribute and a :attr:`!mode` attribute from the :class:`!HwDevice`. It has subscribed to change events for these attributes and forwards the events it receives. It is these event subscriptions that we see here. As there is no client subscribed to this :class:`!RogueDevice`, we do not see any publishers for these event stream. On the client side, we can see that the ``dserver/MyServer/top`` device server (which is where ``foo/bar/top`` resides) is currently subscribing to the same change event streams: - ``foo/bar/hw/mode`` - ``foo/bar/hw/position`` .. note :: On the client side, we have ``idl5_change`` whereas on the server side it is simply ``change``. This is related to backwards compatibility of the Tango event system. On the client side we see a ZMQ topic that is used to send the event data over. However, on the server side we see a "event stream" which comprises multiple ZMQ topics with each topic supporting a slightly different event data format. In most cases, this distinction can be safely ignored. On the client side, we get more numbers to look at for each event stream. The "2 events already processed" and "2 is the last received server counter" phrases are hopefully self explanatory. For this event stream, the client has processed 2 events and also the counter it received with the most recently processed event was 2. These numbers might get out of sync if, for example, the device server pushing the events is restarted or if some other client subscribed to this event stream and received some events before we did. In our case these two counters are in sync with each other and they are also in sync with the publisher pushing the events. In other words, Tango thinks all the events pushed were processed, including the ``Mode.IDLE`` event we seemed to miss. This is reinforced by the "0 missed events detected" phrases. A :term:`missed event` would be detected if the counter received from the server skipped a value and we can see here that Tango never saw this happen. If Tango did detect a missed event like this, it would push an "error event" to the client callback. .. note :: The "1 events discarded" phrase is often not interesting. A :term:`discarded event` means that the Tango event system received the same event counter twice in a row. This can happen if the ZMQ socket is unsure if we received the event, so it sends it again just in case. This is most likely to happen during the initial subscription and is harmless. This is all very interesting. Tango thinks that we received all of the events, but clearly we didn't receive the ``Mode.IDLE`` event. What is going on? Live monitoring of a scan -------------------------- Let's try running another :func:`!Scan()` while the event monitor is running to get some more clues. When you run this, the event monitor will output something like the following: .. code-block :: shell :linenos: ===============================================2026-03-17T11:14:43.725534=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ - Published events foo/bar/hw/mode.change: 2 -> 3 (+1) last pushed server counter foo/bar/hw/position.change: 128 -> 175 (+47) last pushed server counter ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - Received events and callback changes foo/bar/hw/mode.idl5_change: 2 -> 3 (+1) events processed : 2 -> 3 (+1) last received server counter : 1 -> 2 (+1) callbacks foo/bar/hw/position.idl5_change: 128 -> 144 (+16) events processed : 128 -> 144 (+16) last received server counter ===============================================2026-03-17T11:14:43.725534=============================================== ===============================================2026-03-17T11:14:53.733599=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ - Published events foo/bar/hw/mode.change: 3 -> 4 (+1) last pushed server counter foo/bar/hw/position.change: 175 -> 256 (+81) last pushed server counter ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - Received events and callback changes foo/bar/hw/position.idl5_change: 144 -> 177 (+33) events processed : 144 -> 177 (+33) last received server counter ===============================================2026-03-17T11:14:53.733599=============================================== ===============================================2026-03-17T11:15:03.744107=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - Received events and callback changes foo/bar/hw/position.idl5_change: 177 -> 210 (+33) events processed : 177 -> 210 (+33) last received server counter ===============================================2026-03-17T11:15:03.744107=============================================== ===============================================2026-03-17T11:15:13.754068=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - Received events and callback changes foo/bar/hw/mode.idl5_change: 2 -> 1 (-1) callbacks foo/bar/hw/position.idl5_change: 210 -> 244 (+34) events processed : 210 -> 244 (+34) last received server counter ===============================================2026-03-17T11:15:13.754068=============================================== ===============================================2026-03-17T11:15:23.764071=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - Received events and callback changes foo/bar/hw/mode.idl5_change: 3 -> 4 (+1) events processed : 3 -> 4 (+1) last received server counter foo/bar/hw/position.idl5_change: 244 -> 256 (+12) events processed : 244 -> 256 (+12) last received server counter ===============================================2026-03-17T11:15:23.764071=============================================== The output here contains 5 snapshots gathered at a 10-second interval. The first of these (lines 1-13) shows that the hardware device pushed a single :attr:`!mode` event (presumably with ``Mode.SCANNING``) and 47 :attr:`!position` events. We can also see that the top device processed the mode event and added an additional :attr:`!mode` callback (used by the :func:`!Scan` command to wait for the ``Mode.IDLE`` event). However, the top device only processed 16 of the :attr:`!position` events in this period. It turns out, this is the root of our issue. .. tip :: The rate these snapshots are gathered can be controlled with the ``--polling-period`` option. Take care setting this too fast as the command sent to the device server locks out other clients from subscribing to events on this device server. In the second snapshot (lines 17-26), we see that the hardware device finishes sending its events however the top device is still busy processing them for another 3 snapshots and we do not see the ``Mode.IDLE`` event until the final sample (lines 53-62). Notice in the penultimate snapshot (lines 41-49), we see the number of :attr:`!mode` callbacks decrease. This corresponds to the :func:`!Scan` command giving up and :term:`unsubscribing ` from mode change events. The :func:`!Scan` command does not have a long enough timeout given all the events that have to be processed by the device server during the scan. The key point here is that the processing of the :attr:`!position` events is not part of :func:`!Scan` command, in fact, the subscription to the :attr:`!position` is done by the :class:`!RogueDevice`, an entirely unrelated device. Despite being unrelated, the activity of this :class:`!RogueDevice` is influencing the command being executed by our :class:`!TopDevice`. The Tango event system is a shared resource that couples all Tango devices deployed in the same device server. If one device is slow to process its events, this can introduce :term:`latencies ` for events that other Tango devices in the same device server are waiting for. In this simple example, we could probably work out where the problem is with this information and from just looking at the code. However, in more complicated scenarios it perhaps isn't so obvious what is going on. To help gather more details in these scenarios, the event monitor can request each device server turns on event system performance monitoring and print statistical summaries of the performance data. Let's have a look at this next. Performance monitoring of a scan -------------------------------- To enable performance monitoring, we need to restart the ``ska-tango-event-monitor`` script with the ``--monitor`` option: .. code :: shell > ska-tango-event-monitor --monitor foo/bar/hw foo/bar/top ===============================================2026-03-17T12:21:25.321968=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ - New publishers foo/bar/hw/mode.change: 4 is the last pushed server counter foo/bar/hw/position.change: 256 is the last pushed server counter ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - New subscriptions foo/bar/hw/mode.idl5_change: 4 events already processed : 4 is the last received server counter : 1 callback(s) registered : 0 missed events detected : 2 events discarded foo/bar/hw/position.idl5_change: 256 events already processed : 256 is the last received server counter : 1 callback(s) registered : 0 missed events detected : 0 events discarded ===============================================2026-03-17T12:21:25.321968=============================================== Initially, nothing much seems to have changed. However, if we set off another scan, we will now see some performance statistics: .. code-block :: shell :linenos: ===============================================2026-03-17T12:42:21.916829=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ - Published events foo/bar/hw/mode.change: 4 -> 5 (+1) last pushed server counter foo/bar/hw/position.change: 256 -> 315 (+59) last pushed server counter - Server performance mean median count minimum maximum 1% 10% 90% 99% Event Gaps [ms] 764.69±660.26 100.41 60 0.05 40048.41 100.20 101.08 Push Time [μs] 47.17±2.48 40.00 60 4.00 116.00 30.00 65.00 ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - Received events and callback changes foo/bar/hw/mode.idl5_change: 4 -> 5 (+1) events processed : 4 -> 5 (+1) last received server counter : 1 -> 2 (+1) callbacks foo/bar/hw/position.idl5_change: 256 -> 276 (+20) events processed : 256 -> 276 (+20) last received server counter - Client performance mean median count minimum maximum 1% 10% 90% 99% Event Gaps [ms] 991.47±688.67 300.91 20 0.13 14413.12 300.42 301.52 Sleep Time [ms] 720.68±702.40 0.02 20 0.01 14412.92 0.01 0.12 Latency [s] 1.71±0.25 1.70 20 0.00 3.61 0.02 3.39 Callbacks called 1.05±0.05 1.00 20 1.00 2.00 1.00 1.00 Wake Count 1.40±0.39 1.00 20 1.00 9.00 1.00 1.00 mode Processing Time [μs] 127.00±0.00 127.00 1 127.00 127.00 position Processing Time [ms] 300.83±0.08 300.83 19 300.39 301.50 300.45 301.36 ===============================================2026-03-17T12:42:21.916829=============================================== ===============================================2026-03-17T12:42:31.927772=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ - Published events foo/bar/hw/mode.change: 5 -> 6 (+1) last pushed server counter foo/bar/hw/position.change: 315 -> 384 (+69) last pushed server counter - Server performance mean median count minimum maximum 1% 10% 90% 99% Event Gaps [ms] 100.49±0.04 100.31 70 100.19 101.30 100.22 101.01 Push Time [μs] 47.11±2.03 40.00 70 25.00 122.00 31.00 64.90 ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - Received events and callback changes foo/bar/hw/position.idl5_change: 276 -> 309 (+33) events processed : 276 -> 309 (+33) last received server counter - Client performance mean median count minimum maximum 1% 10% 90% 99% Event Gaps [ms] 300.86±0.06 300.71 32 300.39 301.61 300.47 301.41 Sleep Time [μs] 26.22±2.36 20.00 32 16.00 74.00 16.30 46.70 Latency [s] 7.11±0.33 7.11 32 4.01 10.22 4.47 9.76 Callbacks called 1.00±0.00 1.00 32 1.00 1.00 1.00 1.00 Wake Count 1.03±0.03 1.00 32 1.00 2.00 1.00 1.00 position Processing Time [ms] 300.86±0.07 300.78 32 300.30 301.57 300.45 301.50 ===============================================2026-03-17T12:42:31.927772=============================================== ===============================================2026-03-17T12:42:41.937550=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - Received events and callback changes foo/bar/hw/position.idl5_change: 309 -> 343 (+34) events processed : 309 -> 343 (+34) last received server counter - Client performance mean median count minimum maximum 1% 10% 90% 99% Event Gaps [ms] 300.91±0.06 300.89 33 300.34 301.70 300.48 301.36 Sleep Time [μs] 29.21±3.29 25.00 33 14.00 103.00 16.40 57.60 Latency [s] 13.82±0.33 13.82 33 10.62 17.03 11.10 16.55 Callbacks called 1.00±0.00 1.00 33 1.00 1.00 1.00 1.00 Wake Count 1.03±0.03 1.00 33 1.00 2.00 1.00 1.00 position Processing Time [ms] 300.86±0.07 300.80 33 300.31 301.66 300.45 301.33 ===============================================2026-03-17T12:42:41.937550=============================================== ===============================================2026-03-17T12:42:51.947066=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - Received events and callback changes foo/bar/hw/mode.idl5_change: 2 -> 1 (-1) callbacks foo/bar/hw/position.idl5_change: 343 -> 376 (+33) events processed : 343 -> 376 (+33) last received server counter - Client performance mean median count minimum maximum 1% 10% 90% 99% Event Gaps [ms] 300.83±0.08 300.60 32 300.37 302.10 300.50 301.57 Sleep Time [μs] 32.22±5.24 20.00 32 14.00 171.00 16.30 63.60 Latency [s] 20.54±0.33 20.54 32 17.43 23.64 17.89 23.18 Callbacks called 1.00±0.00 1.00 32 1.00 1.00 1.00 1.00 Wake Count 1.06±0.04 1.00 32 1.00 2.00 1.00 1.00 position Processing Time [ms] 300.77±0.08 300.57 32 300.31 302.07 300.43 301.47 ===============================================2026-03-17T12:42:51.947066=============================================== ===============================================2026-03-17T12:43:01.955012=============================================== -----------------------------------------dserver/MyServer/hw (from foo/bar/hw)------------------------------------------ ----------------------------------------dserver/MyServer/top (from foo/bar/top)----------------------------------------- - Received events and callback changes foo/bar/hw/mode.idl5_change: 5 -> 6 (+1) events processed : 5 -> 6 (+1) last received server counter foo/bar/hw/position.idl5_change: 376 -> 384 (+8) events processed : 376 -> 384 (+8) last received server counter - Client performance mean median count minimum maximum 1% 10% 90% 99% Event Gaps [ms] 301.05±0.17 300.92 9 300.52 301.95 Sleep Time [μs] 22.78±1.94 20.00 9 14.00 31.00 Latency [s] 24.84±0.17 24.84 9 24.04 25.65 Callbacks called 1.00±0.00 1.00 9 1.00 1.00 Wake Count 1.00±0.00 1.00 9 1.00 1.00 mode Processing Time [μs] 126.00±0.00 126.00 1 126.00 126.00 position Processing Time [ms] 301.08±0.18 300.98 8 300.50 301.93 ===============================================2026-03-17T12:43:01.955012=============================================== Consider the second snapshot (lines 30-51), we can see that in this 10 second period 69 :attr:`!position` events were sent by the ``dserver/MyServer/hw`` device server with ~100 ms gap between each event (``Event Gaps``). In other words, these events are being sent at a rate of 10 Hz. However, on the client side we can see that the ``dserver/MyServer/top`` device server has a ~300 ms gap between events being processed (``Event Gaps``), reinforcing what we already knew; the client is only managing to process events at 3 Hz and so cannot keep up. We can also see that the issue is caused by the processing of the :attr:`!position` events, which is taking ~300 ms per event (``position Processing Time``). Another interesting statistic to look at here is the :term:`event latency` (``Latency``), this is the delay between the event being sent by the server and the event *starting to be processed* by the client. In our case, this statistic is telling us what we already knew; there is a long delay (up to ~25s in the final snapshot) on events being processed. This is ultimately why our :func:`!Scan` command timed out. It wasn't that the event was missed by the Tango event system, but rather the processing of the event was delayed. .. warning :: The ``Latency`` measurement is potentially comparing timestamps from different clocks, one used by the server process and one used by the client process. If these processes are on different machines, these clocks might not be in sync. Typically, this will result in a systematic bias in the measurement meaning absolute numbers should be taken with a grain of salt, however, this bias should not influence any changes to the latency you see. .. note :: The ``Wake Count`` measurement is not relevant for us here, however, it could be important in other scenarios. This measurement is counting the number of times the event processing thread was woken up while waiting for the event. A value greater than one means that the thread is being woken up to do something other than process events. Typically, if this value is much greater than 1 it means the device server is spending a lot of time subscribing to and unsubscribing from event streams. This can potentially introduce delays as events cannot be processed while the client is (un)subscribing to an event stream. The event processing thread has to be woken during the (un)subscription in order to negotiate disabling event processing for this period. All the evidence we have gathered points to the culprit being the processing of position events, if we look at the code in question the problem is clear: .. literalinclude :: diagnose-missing-events/MyServer.py :pyobject: RogueDevice.handle_position :emphasize-lines: 3 We should not be sleeping in this callback! In this case the fix is easy, however, in more realistic scenarios it perhaps isn't so straight forward. `Writing robust Tango event callbacks `_ has some guidelines about writing Tango event callbacks that avoid the issue we uncovered in this tutorial. Conclusion ---------- We have managed to track down the performance issue with our control system that was causing it to appear as if an event went missing during our scan. If you are experiencing Tango event system issues with your control system, you probably aren't experiencing exactly the same performance issue, however, hopefully during the process of diagnosing the issue with this toy control system you have learnt: - How the ``server counter`` can be used to correlate events pushed by the publisher with events processed by the consumer - How the ``missed event counter`` reports how many missed events Tango has detected - How the ``--monitor`` flag can be used to gather performance statistics from the event system such as the ``Event Gaps`` and ``Latency`` - How to detect if frequent subscriptions are introducing delays by monitoring the ``Wake Count`` - How one "rogue" device can delay events for all other devices in the device server by monopolising the event processing thread