SEMIOTIC aims to orchestrate the communication and the data analytics for the Internet of Things (IoT) into a meaningfully connected system. SEMIOTIC will optimize the interplay between data communication and data analytics processing based on the value and significance that a specific portion of data has for its end usage. SEMIOTIC comes at a time of rapid expansion of various IoT systems, which can be segmented into two large sets: massive IoT systems and reliable low-latency IoT systems. In both types of systems there is potential to improve the use of the communication resources and the accuracy of the data analytics. The two overall objectives of SEMIOTIC are: (1) decrease the overhead associated with the various metadata used in the communication protocols; (2) minimize the resources utilized by irrelevant data. This can lead to dramatic efficiency improvements in the overall system, especially visible in IoT use cases that require low latency, such as real-time data processing and augmented reality.
The central element of the approach in SEMIOTIC is to revise the relationship between data communication and data analytics and to utilize co-design of the two layers (modules) in order to improve the overall efficiency, see Figure 1. The white blocks depict a two-way communication model that is based on the classical model of Shannon, which is valid when the protocol information is negligible, the source bits are ideally compressed, and all source bits are relevant for the destination. With the addition of the two modules, data model and analytics, and protocol information, SEMIOTIC generalizes the communication model to account for the cost of the metadata as well as the meaningfulness of the data communicated. The instances of the model from Figure 1 can be vastly different depending on the actual scenario.
The expansion of IoT systems will lead to massive amounts of data produced by a vast variety of connected devices and sensors, thus providing unprecedented knowledge about the state and processes of our physical world. For example, environmental sensor data can give high-resolution information about farming conditions, location/air sensors can provide detailed real-time information about traffic/pollution, while embedded industrial sensors provide the state of an industrial manufacturing process. This can be used to optimize the processes and improve the knowledge/insights about them. However, data communication and data analytics are currently modules that are optimized separately, which may lead to highly inefficient operation. As a representative example, real-time data stream mining puts an upper limit on the number of data arrivals per unit time in order to be able to process all incoming data from a large number of sources. On the other hand, communication protocols aim to maximize the throughput, expressed as the number of data units per second that are delivered to the data analytics module. This can end in a paradox, where a large population of devices occupies excessive communication resources in the radio spectrum to deliver data that is eventually not used.
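
The paradox can be made concrete with a small back-of-the-envelope sketch. The function name, the device count, the reporting rate, and the analytics processing cap below are all hypothetical values chosen for illustration; none of them come from SEMIOTIC itself. The sketch simply assumes that every data unit delivered above the analytics cap is discarded.

```python
def wasted_fraction(num_devices: int,
                    reports_per_device_per_s: float,
                    analytics_cap_per_s: float) -> float:
    """Fraction of delivered data units that the analytics module never uses.

    Assumption (illustrative only): the analytics module processes at most
    `analytics_cap_per_s` data units per second and drops the rest.
    """
    delivered = num_devices * reports_per_device_per_s
    if delivered <= 0:
        return 0.0
    processed = min(delivered, analytics_cap_per_s)
    return 1.0 - processed / delivered


# Hypothetical scenario: 10,000 devices, one report per second each,
# and an analytics module that can handle 2,500 data units per second.
print(wasted_fraction(10_000, 1.0, 2_500.0))  # prints 0.75
```

Under these assumed numbers, three quarters of the radio resources spent on delivery carry data that is never processed, which is the kind of inefficiency that co-design of communication and analytics is meant to address.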