.. _primer:

=======================================
Primer on the cppTango ZMQ event system
=======================================

The Tango event system uses a publisher/subscriber protocol. The `Tango RFCs `_ provide a high-level specification of this protocol, free from implementation details. However, diagnosing issues with the Tango event system often requires understanding the concrete implementation. The ``ska-tango-event-monitor`` script outputs information about the concrete implementation used by cppTango/PyTango. This primer gives an overview of that implementation to aid in understanding the output of ``ska-tango-event-monitor``.

The modern Tango event system implementation is based on `ZeroMQ `_, using its topic-based publisher/subscriber mechanism. There is also an older notifd implementation of the Tango event system, but it is only used with cppTango <=7. Only the ZMQ-based implementation is discussed here.

.. _subscribing:

Event subscription
------------------

To receive events from a :term:`Tango device server`, a :term:`client ` needs to :term:`subscribe`, registering its interest in these events. :ref:`Diagram 1 ` shows the subscription process.

.. _primer-diag1:

.. uml ::
   :caption: Diagram 1: Sequence diagram of the subscription process
   :align: center

   @startuml
   Client -> Client : <>
   activate Client
   Client -> "Device Server" as DS : ZmqEventSubscriptionChange <>
   DS -> DS : <>
   Client <- DS : <>
   Client --> DS : <>
   Client --> DS : <>
   Client -> Client : <>
   Client -> DS : <>
   Client <- DS : <>
   Client -> Client : <>
   Client -> Client : <>
   deactivate Client
   @enduml

A client initiates a subscription to a so-called :term:`event stream`. This event stream consists of a :term:`Tango Resource Locator` (TRL) to some Tango object, such as a device or attribute, and an event type, such as ``intr_change`` or ``archive``.
For a given event type, only a certain type of Tango object is applicable. For example, interface change events (``intr_change``) are only available for devices and archive events (``archive``) are only available for attributes.

Subscribing to an :term:`event stream` consists of two parts:

1. Resolving the event stream to a :term:`ZMQ socket` and a :term:`ZMQ topic`
2. Registering interest with the Tango device server in the event stream

Both of these are achieved by calling the :func:`!ZmqEventSubscriptionChange` command on the :term:`admin device` of the device server where the Tango object in question resides. The admin device replies to this command with the ZMQ socket to connect to as well as the ZMQ topic that events will be published on. Internally, the device server's :term:`event supplier` records that there is a client interested in this particular ZMQ topic.

.. note ::

   A given event stream will only ever resolve to a single event channel; however, there are multiple :term:`ZMQ topics ` it may resolve to, depending on the compatibility version negotiated between the client and the server. For cppTango >=9 this compatibility version will be "idl5", which will appear in the :term:`ZMQ topic` name.

The ZMQ socket is part of a so-called :term:`event channel` and is unique to a particular device server. Each event channel is identified by the TRL of the admin device of the device server. As well as providing a socket for publishing events (known as the ":term:`event socket`"), each event channel has an additional socket for publishing :term:`heartbeat events ` (known as the ":term:`heartbeat socket`"). These heartbeat events are used by Tango clients to monitor the availability of the event channel. This heartbeat mechanism is discussed in :ref:`keep-alive`.

The Tango client's :term:`event consumer` connects to both sockets of the resolved event channel. Then it registers interest in the resolved ZMQ topic with the event socket connection.
If a Tango client process subscribes to multiple event streams that resolve to the same event channel, then only a single connection is maintained to that channel.

After connecting to the event socket, the event consumer stores the callback to be called whenever an event is received and records that it needs to maintain a connection to this particular event channel.

Finally, the Tango client optionally performs a regular read of the attribute or device interface in order to provide an initial event to the callback. Whether this initial read is performed is determined by the ``EventSubMod`` used to subscribe to the event stream.

While the client is subscribing to (or unsubscribing from) the event stream, the event consumer stops processing events so that it can safely modify its internal data structures. Similarly, only a single thread can be in the process of subscribing or unsubscribing at a time. In extreme cases this can increase :term:`event latency`.

.. _sending-data:

Sending event data
------------------

Once the connection to the :term:`event stream` has been established, the :term:`device server ` will begin pushing events for this particular stream. Each event is packaged with a 1-based :term:`counter ` that is incremented for each event sent over the event stream. The :term:`client ` uses this counter to detect :term:`duplicate ` and :term:`missed events `.

If the :term:`client ` receives the same counter value twice in a row for a given event stream, then the second event is discarded. If the received counter value increases by more than 1 between consecutive events, then the :term:`event consumer` will push an ``API_MissedEvents`` :term:`error event` to all callbacks registered with this :term:`event stream`.

If the :term:`Tango device server` is restarted, then the :term:`Tango client` detects this and does not report the resulting jump in the counter value as an ``API_MissedEvents`` error event.

.. _keep-alive:

The keep-alive mechanism
------------------------

While the event data is being sent by the :term:`event supplier`, the Tango :term:`event consumer` is responsible for maintaining the connection to the :term:`event channel`. This is performed by the so-called :term:`keep-alive mechanism`.

There are two aspects to the keep-alive mechanism:

1. The event consumer monitors incoming :term:`heartbeat events ` for each event channel to detect when the event channel is no longer available
2. The event consumer periodically resubscribes to each :term:`event stream` to notify the :term:`device server ` that the :term:`client ` is still interested in this event stream

For the first of these aspects, the Tango device server's event supplier broadcasts a heartbeat event on the :term:`heartbeat socket` every 9 seconds. The client's event consumer records when it receives these heartbeat events for each event channel. Every 10 seconds the event consumer checks whether any event channel has gone more than 10 seconds without a heartbeat event, and pushes an ``API_EventTimeout`` :term:`error event` to each callback associated with the event channels with missing heartbeats.

For the second of these aspects, the Tango client's event consumer will resubscribe to each subscribed event stream every ~3 minutes by calling the :func:`EventConfirmSubscription` command on the admin device of the device server. The Tango device server's event supplier will automatically unsubscribe any ZMQ topics which have not been resubscribed to in the last 10 minutes.

.. _unsubscribing:

Unsubscribing from an event stream
----------------------------------

The :term:`Tango client` does not make contact with the :term:`Tango device server` to :term:`unsubscribe` from an :term:`event stream`. Instead, the :term:`event consumer` simply removes the corresponding callback from its internal data structures.
If there are no more callbacks for the event stream, then the client will stop resubscribing to the event stream in question. Similarly, if there are no more event streams for a given :term:`event channel`, then the event consumer will close the connections to the event channel sockets. The :term:`Tango device server`'s :term:`event supplier` will eventually clean up the subscription information on its side, as the event stream is no longer being resubscribed to.
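
The client-side bookkeeping described in this section can be sketched in plain Python. This is a toy model under stated assumptions, not cppTango code: the ``ConsumerBookkeeping`` class, its method names and the string keys used for channels and streams are all hypothetical, and no network connection or Tango API is involved. It only illustrates that unsubscribing is a purely local operation, and that a channel connection is shared by all streams that resolve to it.

.. code-block:: python

   class ConsumerBookkeeping:
       """Toy model of an event consumer's tables (all names hypothetical)."""

       def __init__(self):
           # channel name -> stream name -> list of registered callbacks
           self._channels = {}

       def subscribe(self, channel, stream, callback):
           """Record a callback; return True if a new channel connection is needed."""
           new_connection = channel not in self._channels
           streams = self._channels.setdefault(channel, {})
           streams.setdefault(stream, []).append(callback)
           return new_connection

       def unsubscribe(self, channel, stream, callback):
           """Remove a callback locally; return True if the channel connection closes.

           Nothing is sent to the device server in this model.
           """
           streams = self._channels[channel]
           streams[stream].remove(callback)
           if not streams[stream]:
               # No callbacks left: this stream is no longer resubscribed to.
               del streams[stream]
           if not streams:
               # No streams left: the connections to the channel sockets close.
               del self._channels[channel]
               return True
           return False

       def streams_to_resubscribe(self):
           """Return the (channel, stream) pairs the keep-alive would still confirm."""
           return sorted(
               (channel, stream)
               for channel, streams in self._channels.items()
               for stream in streams
           )

In this model, subscribing to two streams on the same channel requests only one connection, removing the last callback of a stream drops it from ``streams_to_resubscribe()``, and removing the last stream of a channel reports that the channel connection closes.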