In the coming years, 5G infrastructure will become a ubiquitous, flexible, broadband and programmable network at the core of every social, business, and cultural process, enabling both economic growth and social prosperity. To achieve this goal, the 5G vision poses significant technical challenges that must be met, including agile programmability and management mechanisms for the efficient instantiation of innovative services across heterogeneous network components, virtualized infrastructures and geographically dispersed cloud environments.
One of the important issues to be addressed in this new era of 5G service management is network and service monitoring, which requires collecting and processing data on the network, computation and storage resources involved in the lifecycle management of 5G services. However, the monitoring tools available today do not satisfy the requirements stemming from the services envisioned in the 5G landscape, since they are, in most cases:
In a nutshell, the SONATA monitoring framework collects and processes data from several sources, giving the developer the ability to activate metrics and thresholds in order to capture generic or service-specific behaviour. Moreover, the developer can define rules based on metrics gathered from one or more VNFs deployed in one or more NFVIs in order to receive notifications in real time. The developer can subscribe to a message queue or receive the alert notifications by email and/or SMS. Most importantly, monitoring data and alerts are also accessible through a RESTful API or by directly accessing a websocket URL.
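To illustrate the idea of a rule defined over metrics gathered from several VNFs, the following is a minimal sketch in Python. It is not the SONATA rule format or API: the `Sample` and `Rule` types, the `evaluate` function, and the metric and PoP names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Sample:
    """One metric reading (hypothetical structure, not the SONATA data model)."""
    nfvi: str    # identifier of the NFVI/PoP hosting the VNF
    vnf: str     # identifier of the VNF that produced the reading
    metric: str  # metric name, e.g. "cpu_util"
    value: float


@dataclass(frozen=True)
class Rule:
    """A named condition over all current values of one metric."""
    name: str
    metric: str
    condition: Callable[[List[float]], bool]


def evaluate(rules: List[Rule], samples: List[Sample]) -> List[str]:
    """Return the names of the rules whose condition holds for the samples."""
    # Group values by metric name, regardless of which VNF or NFVI sent them.
    by_metric: Dict[str, List[float]] = {}
    for sample in samples:
        by_metric.setdefault(sample.metric, []).append(sample.value)

    fired: List[str] = []
    for rule in rules:
        values = by_metric.get(rule.metric, [])
        # A rule with no matching samples never fires.
        if values and rule.condition(values):
            fired.append(rule.name)
    return fired
```

For example, a rule such as `Rule("high_avg_cpu", "cpu_util", lambda v: sum(v) / len(v) > 85.0)` would fire when the average CPU utilisation across all reporting VNFs exceeds 85. In a real deployment the names returned by `evaluate` would be handed to whatever notification channel the developer subscribed to.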
One of the cornerstones of the monitoring framework implementation was to deliver a carrier-grade solution that fulfills scalability requirements in a multi-PoP environment. For this reason, several components of the monitoring framework had to be distributed across the SONATA Points of Presence (PoPs). First, each PoP must have its own websocket server to accommodate developers' demands for streaming data, although the management of websockets is handled centrally by the Monitoring Manager instance. Second, the Prometheus monitoring servers follow a distributed (cascaded) architecture. The local Prometheus servers collect and store metric data from the VNFs deployed in their PoP, while only the alerts are sent to the federated Prometheus server for further processing and forwarding to the subscribed users. Moreover, since alerting rules and notifications can be based on monitoring data collected in different PoPs, the decision must be made at the federation level. Another scalability requirement concerns the large flow of data from the monitoring probes to the monitoring server and its database, which might affect service performance in extreme cases. To address this, the monitoring server and its database follow a distributed architecture that works in a cascaded fashion, combined with appropriate modifications at the component level. In particular, the functionality of the monitoring probe will change so that it does not send data to the monitoring server when the value difference is less than a threshold defined by the developer.
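The probe-side filtering described above can be sketched as follows. This is a minimal illustration in Python, not the SONATA probe implementation: the `DeadbandProbe` class and its `send` callback are hypothetical, and the sketch assumes that the "value difference" is measured against the last value actually sent for the same metric (the text does not say whether it is the last sent or the last observed value).

```python
from typing import Callable, Dict


class DeadbandProbe:
    """Forwards a sample only if it differs enough from the last one sent.

    Hypothetical sketch: `send` stands in for whatever transport the real
    probe uses to reach the monitoring server.
    """

    def __init__(self, threshold: float, send: Callable[[str, float], None]) -> None:
        self.threshold = threshold  # developer-defined minimum change
        self.send = send
        self._last_sent: Dict[str, float] = {}  # last forwarded value per metric

    def report(self, metric: str, value: float) -> bool:
        """Report a new reading; return True if it was forwarded."""
        last = self._last_sent.get(metric)
        # Suppress the sample when the change is smaller than the threshold.
        if last is not None and abs(value - last) < self.threshold:
            return False
        # The first reading of a metric is always forwarded.
        self._last_sent[metric] = value
        self.send(metric, value)
        return True
```

Comparing against the last sent value, rather than the last observed one, means that a slow drift is eventually reported once it accumulates past the threshold. The trade-off is that the monitoring server sees a coarser time series, so the threshold should be chosen per metric.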