Building upon the connectivity layer provided by Internet of Things (IoT) platforms, today’s industries are moving towards management and computation solutions that enable Artificial Intelligence (AI) services for data-intensive applications.
To extract actionable insights from these applications, we provide what we call Intelligence Data Pipelines, as well as an architecture in which to deploy them. In short, this is the ability to extract, understand and use knowledge (intelligence) from factual information (data), through a set of connected elements in which the output of one element is the input of the next (data pipeline).
Our framework, Ericsson Research AI Actors (ERAIA), is an actor-based framework that provides a novel basis for building intelligence and data pipelines. In doing so, it addresses two main challenges of Industrial IoT (IIoT) applications:
- the creation of processing pipelines for data employed by the AI algorithms
- the distribution and orchestration of data and AI computation resources supporting these pipelines
As IoT adoption scales up, an explosion of data generated by devices is unavoidable. Hardware developments (such as tensor processing units and graphics processing units) significantly increase systems' capabilities for processing big data and conducting AI computations. Nevertheless, frameworks designed around clear IoT requirements to enhance the computation capabilities of distributed systems are still rare.
This places extra demands on IoT platforms to support intelligence computations while:
- fulfilling latency requirements on AI computations and data processing for critical use-cases
- offering computation capabilities to provide quick insights through distributed data resources
- supporting live configuration updates in dynamic environments
- providing safety functionality that can run at the device edge
- providing resilient systems that recover easily from disturbances
All the while, the IoT platform should keep operational and infrastructure costs optimized. These demands illustrate the need to move away from traditional centralized (data-center based) IoT platforms towards highly distributed platforms.
ERAIA is a novel solution for the computation-related demands of data-intensive IoT applications. It is a reactive system built on an actor model, intended to provide the responsive, resilient and elastic behavior required by scalable and distributed IoT deployments (from edge devices, to gateways, network infrastructure and data centers). By using all the end-to-end nodes for divided computation tasks, AI computation and data processing are spread across components in the IoT landscape. Heavy raw data does not always need to travel from edge devices to the cloud: with edge-centric data processing, latency is reduced, communication resources are conserved, bandwidth is optimized through filtered and pre-processed data, and the computation utilization of the nodes is increased.
Industrial IoT challenges
Before we arrive at replicable IIoT solutions and enhanced Application Enablement Platforms, there are five common horizontal challenges that need to be tackled across the different IIoT domain verticals. These are: data management; latency and throughput; AI computation; distributed computing and orchestration; and live reconfiguration.

Data management: Enabling intelligence in IoT requires processing the data generated by sensors to discover patterns and extract knowledge, which in turn demands effective data management. Among data management topics in heterogeneous IoT systems, data ingestion, serving, preparation and processing become relevant to extract, understand and expose data between different entities and to increase interoperability. Data can be ingested or served via various protocols, message brokers, or by querying databases. Processing data, together with serving it online, improves IoT systems’ functionality via transformation, accumulation, serialization, aggregation and/or compression.
Latency and throughput: For data-intensive systems like IoT, latency affects the processing efficiency of complex algorithms, such as deep neural networks. Reducing latency enables real-time applications and delivers the insights extracted from sensor data to stakeholders more quickly. Regardless of improvements in communication and processing hardware, software architectures designed around clear IoT requirements for efficient orchestration would bring a large benefit in tackling this challenge. Splitting the total latency and throughput cost into individual shares attached to the various IoT solution components provides insight into how each component can be further optimized via resource orchestration.
AI computation: AI methods, such as Machine Learning (ML), are extensively used for IoT data analysis to support various smart service use cases. In IoT, there is a strong need to deploy and run algorithms in a flexible way to immediately retrieve insights from the data streams. AI orchestration in IoT systems has therefore become important, as it optimizes the utilization of the heterogeneous, resource-constrained components that perform the computations. To evaluate AI composition, it is important to examine the footprint requirements, the possibilities to ingest and expose data using different protocols, and the downtime when a model is updated or re-deployed on another node.
Distributed computing and orchestration: To exploit and benefit from the distributed nature of IoT systems, challenges such as weak scalability and weak interoperability caused by heterogeneity need to be dealt with. One way to accomplish this is to employ virtualization techniques such as Docker containers, unikernels and cloud resource containers, or other lightweight options such as Node-RED or actors. These solutions play an essential role in managing and orchestrating computation capabilities among the distributed components.
Live reconfiguration: One of the main characteristics of IoT is the dynamism of the environments and the need to adapt to changes rapidly while minimizing the impact on the application. The live reconfiguration challenge covers changes such as protocol and/or data adaptation, migration or relocation of computation loads, algorithm changes or ML model updates, and addition or removal of volatile IoT resources.
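To make this concrete, here is a minimal sketch (in Scala, matching ERAIA's implementation language) of the kinds of control messages a live reconfiguration mechanism might handle. These message types are our own illustration, not ERAIA's actual control protocol.

```scala
// Illustrative reconfiguration commands covering the change types above.
// These types are hypothetical; ERAIA's real control messages are not
// documented in this post.
sealed trait Reconfigure
// Protocol and/or data adaptation: point a unit at a new endpoint.
final case class SetConnector(unitId: String, endpoint: String) extends Reconfigure
// Migration or relocation of computation loads.
final case class Relocate(unitId: String, targetWorker: String) extends Reconfigure
// Algorithm change or ML model update.
final case class SwapModel(unitId: String, modelUri: String) extends Reconfigure
// Addition or removal of volatile IoT resources.
final case class AttachDevice(deviceUri: String) extends Reconfigure
final case class DetachDevice(deviceUri: String) extends Reconfigure
```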
Data pipeline architecture examples
Our actor-based framework ERAIA tackles these challenges and provides:
- a distributed system that can dynamically expand across multiple nodes ranging from edge to cloud in the IoT landscape.
- a flexible system which can be reused in various scenarios and be integrated into heterogeneous infrastructures.
- seamless online data ingestion and data processing with live reconfiguration.
- high interoperability with state-of-the-art protocols, which facilitates integration with other solutions and technologies
- a well-defined API exposed to other systems that allows for application functionality deployment in a distributed fashion. This also makes ERAIA an enabler for lifecycle management of data pipelines as well as AI/ML.
Another important feature of ERAIA is the separation of the control plane (used to configure, manage and keep the cluster alive) from the data plane (used for the processed and transformed data).
The ERAIA architecture has four main components which enable the intelligence and data composition features:
- external - provides the API to compose/decompose intelligence and data pipelines based on ERAIA, and enables external applications to interact with the framework (a hypothetical composition request is sketched after this list).
- cluster manager - creates and manages a fault-tolerant peer-to-peer cluster. This component is responsible for distributing and placing workloads using a scheduler and/or location-based policies.
- worker - is the component that executes the pipeline elements provided via the external API. It is the component responsible for the data plane, which establishes the connection to the devices and/or the message brokers.
- user interface - facilitates the visualization and management of the system exposed via the API.
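To illustrate what a composition request through the external component might look like, here is a hypothetical pipeline specification expressed as Scala case classes. The names (UnitSpec, PipelineSpec) and their fields are our own illustration; the post does not document ERAIA's actual API schema.

```scala
object PipelineSketch {
  // Hypothetical description of a pipeline submitted via the external API.
  // None of these types are part of ERAIA's documented interface.
  final case class UnitSpec(id: String, worker: String, config: Map[String, String])
  final case class PipelineSpec(units: List[UnitSpec], edges: List[(String, String)])

  val temperaturePipeline = PipelineSpec(
    units = List(
      UnitSpec("ingest",  worker = "edge-gateway-1", config = Map("inlet"  -> "mqtt://broker-y/topic-x")),
      UnitSpec("detect",  worker = "edge-gateway-1", config = Map("model"  -> "anomaly-detector-v2")),
      UnitSpec("publish", worker = "cloud-node-1",   config = Map("outlet" -> "mqtt://broker-y/insights"))
    ),
    edges = List("ingest" -> "detect", "detect" -> "publish")
  )
}
```

Because the data plane endpoints are established on the workers themselves (as discussed in the next section), relocating the detect unit to another worker would only require updating its worker assignment, without touching its neighbors.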
Intelligence Processor Unit

Intelligence and data composition refers to the process of on-loading intelligence computations by creating and configuring a series of computations on the ingested data resources. To make this process flexible and re-configurable, the composition is based on a set of intelligence processor units, organized as a distributed flow of computations forming the intelligence and data pipeline. Given the decoupling of control and data plane, each communication endpoint is initiated or terminated on the processor (worker) units, not on the manager component. This allows each processor unit to be deployed on a different worker node without impacting the communication, facilitating rapid migration and redeployment of individual units and fulfilling the requirements of a dynamic IoT environment. The result is dynamic intelligence and data pipelines: end-to-end data pipelines composed of one or more individual execution units called intelligence processor units.
An intelligence processor unit is the atomic unit used to compose the pipeline. Units can be distributed among workers deployed across the IoT landscape, with each unit conducting the computations for a certain partition of the whole AI task. An intelligence processor unit consists of several blocks (a code sketch follows the list):
- Source is the data ingestion block: inlet encloses the data from the connector library (e.g. an MQTT library) into an object. Subsequently, decode converts the data based on its serialization and exposes it as the expected data type (e.g. data received from topic X at MQTT broker Y as JSON+SenML).
- Map provides complex event processing (CEP) mechanisms in conjunction with defined intervals that control the output flow (e.g. accumulate, sum, minimum, maximum, first, last, standard deviation). For example: accumulate each element received during a 10-second interval and provide the list to the next block.
- Transform executes a generic function, an AI function (e.g. enhancing data semantics via metadata annotation, or rule-based reasoning), or an ML function (e.g. deep neural networks or Bayesian models). This block is compatible with external libraries and/or tools for creating customized ML or AI functions.
- State manages the persistence of data required by some algorithms across consecutive executions. For example, output data from the previous execution can be used as an input to the current execution.
- Sink is the data serving block: encode converts the data to the expected serialization. Subsequently, outlet packages the data to be sent by the specific connector. An interval parameter throttles the outgoing data flow. This can be used not only to serve data to intelligence consumers or other intelligence processor units, but also to expose the output of control loops or actuation commands to devices, providing local decision making.
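To make the block chain concrete, below is a minimal sketch of a single intelligence processor unit expressed with Akka Streams stages. The post names Akka as part of the implementation but does not expose ERAIA's internal APIs, so the stage wiring, the trivial string "decoder" and the moving-average "model" here are purely illustrative assumptions standing in for real connectors and functions.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.duration._

object IpuSketch extends App {
  implicit val system: ActorSystem = ActorSystem("eraia-sketch")

  // Source block: a stand-in inlet emitting raw payloads once per second;
  // a real deployment would wrap a connector such as an MQTT subscription.
  val inlet = Source.tick(0.seconds, 1.second, """{"temp": 21.5}""")

  // decode: parse the raw payload into the expected data type
  // (a trivial character filter here instead of a real JSON+SenML codec).
  val decode = Flow[String].map(s => s.replaceAll("[^0-9.]", "").toDouble)

  // Map block: accumulate the elements received during a 10-second
  // interval and hand the list to the next block, as described above.
  val accumulate = Flow[Double].groupedWithin(1000, 10.seconds)

  // Transform block: a generic function; an ML model invocation could be
  // plugged in here instead of this average.
  val transform = Flow[Seq[Double]].map(xs => xs.sum / xs.size)

  // State block: carry the previous output into the current execution
  // (an exponential moving average as an illustration).
  val state = Flow[Double].statefulMapConcat { () =>
    var previous = 0.0
    value => {
      val smoothed = 0.8 * previous + 0.2 * value
      previous = smoothed
      smoothed :: Nil
    }
  }

  // Sink block: encode to the expected serialization, then hand over to
  // an outlet connector (printing stands in for publishing to a broker).
  val encode = Flow[Double].map(v => s"""{"avgTemp": $v}""")
  val outlet = Sink.foreach[String](println)

  inlet.via(decode).via(accumulate).via(transform).via(state).via(encode).runWith(outlet)
}
```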
An implementation of the ERAIA architecture has been successfully employed in different IoT use cases and scenarios, such as IoT-enabled smart logistics and Industrial IoT, demonstrating its flexibility and replicability. The implementation runs on the Java Virtual Machine (JVM), employing Scala and Akka. The transformation functions can be implemented using either compiled (e.g. Scala or Java) or interpreted (e.g. Python or JavaScript) programming languages. Scala, a functional programming language on the JVM, is the default since ERAIA itself is implemented in Scala. Python is supported for its access to an extensive number of AI libraries and to facilitate deployments by data scientists and data engineers.
Summary
In conclusion, ERAIA is a new framework that supports scaling, handles heterogeneity, and provides intelligence and data composition. The proposed solution targets two of the main challenges of Industrial IoT (IIoT) applications: the orchestration of data and computational resources, and the refinement and preparation of data for intelligence extraction.
We kindly invite you to read the full paper which was presented at the 18th Annual IEEE International Conference on Pervasive Computing and Communications. In the paper, we take a deeper look at how ERAIA, our actor-based framework enabling intelligence and data pipelines, addresses challenges arising from Industrial IoT. We also present a detailed architecture description, an implementation and a performance evaluation, proving its capabilities on a series of hardware platforms covering the IoT ecosystem.
Learn more
Read the paper in full: ERAIA - Enabling Intelligence Data Pipelines for IoT-based Application Systems
Read more about TinyML as-a-Service and the challenges of machine learning at the edge.