Design decisions

This document captures important design/architectural decisions along with their context and consequences.

[0.1] Delegation of processing and visualization

Tenta follows the Unix philosophy and aims to be lightweight and composable. Tenta collects data from sensors and allows supervising and configuring them remotely. Tenta does not process data and does not visualize anything more elaborate than real-time line charts over raw data. Instead, we design Tenta's interfaces to work well with other tools and provide documentation about, e.g., how to get data out of Tenta to process it.

The open-source ecosystem already has excellent tools for processing and visualization, e.g., polars and Grafana. By not reinventing the wheel here, we can focus on Tenta's core functionality. Additionally, processing and visualization vary immensely between projects, which means that we'd probably only end up with a very complex system that still doesn't work for everyone.

Tenta stores the raw data it receives from the sensors and does not allow it to be modified. This means that Tenta can function as a source of truth. When we export and process data outside of Tenta, bugs or accidental deletions and modifications during processing do not affect the raw data.

Furthermore, it's often not possible to analyze data in real time. We might want to try out and compare different processing methods, or we might want to reprocess all our data when we find a bug. The processing might also depend on metadata or external sources that only become available later, and it can sometimes take hours or days. By decoupling the processing, we can build arbitrarily complex processing pipelines without affecting Tenta's real-time functionality.

[0.1] MQTT for communication

Tenta uses MQTT to communicate with sensors. To configure sensors in real time, communication needs to happen not only from the sensors to the server but also in the other direction, which MQTT's publish/subscribe model supports. MQTT uses little bandwidth, which is important for sensors connected via the cellular network. MQTT is flexible and has a large ecosystem of tools and libraries. Going all in on a single protocol means that we can leverage features like retained messages and last will and testament, and it makes for a simpler and more robust system.

LoRaWAN gateways can translate messages between the sensors and Tenta. Such gateways could also be implemented to translate from other protocols to MQTT and thus allow connecting sensors communicating with different protocols to a single Tenta instance.

[0.1] Structure of measurements

Measurements are stored in a single EAV table in the database. This table contains an attribute column (string) and a value column (double-precision floating-point). Measurements arrive as JSON documents at the server, which stores each key-value pair as a separate row. A single measurement consisting of values for temperature and humidity is stored as two rows in the database.
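The splitting step described above can be sketched in a few lines of Python. This is only an illustration: the `Row` type, its field names, and the `to_rows` helper are hypothetical and do not reflect Tenta's actual column names or server code.

```python
from __future__ import annotations

import json
from typing import NamedTuple


class Row(NamedTuple):
    """One EAV row. Field names are illustrative, not Tenta's real columns."""

    sensor_id: str
    timestamp: float
    attribute: str
    value: float


def to_rows(sensor_id: str, timestamp: float, payload: str) -> list[Row]:
    """Split one JSON measurement into one row per key-value pair."""
    document = json.loads(payload)
    return [
        # float() turns integers and booleans into doubles and raises
        # ValueError for strings that are not numbers.
        Row(sensor_id, timestamp, attribute, float(value))
        for attribute, value in document.items()
    ]


rows = to_rows("sensor-1", 1700000000.0, '{"temperature": 21.5, "humidity": 48}')
for row in rows:
    print(row)
```

The single measurement with temperature and humidity yields two rows that share the same sensor and timestamp.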

This approach is simple and flexible. We do not have to define measurement schemas beforehand, and we can store different kinds of measurements in the same table. Additionally, measurements can evolve over time without downtime or changes to the database schema.

In previous alpha versions, measurements were stored as JSON documents. However, this approach made it difficult to query and aggregate the data, and our processing pipeline had to deal with a multitude of different measurement formats, which added a lot of complexity. The EAV schema allows us to export, query, and aggregate the data in a structured way while keeping most of the flexibility of JSON documents. It also allows us to show real-time charts and statistics on the dashboard.
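To illustrate why the EAV layout is easy to aggregate and export, here is a minimal sketch that works on plain `(timestamp, attribute, value)` tuples. The sample data and the helper names `mean_per_attribute` and `pivot` are made up for this example; in practice such queries would more likely run in SQL or in a tool like polars.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical EAV rows as (timestamp, attribute, value) tuples.
rows = [
    (0.0, "temperature", 21.0),
    (0.0, "humidity", 48.0),
    (60.0, "temperature", 23.0),
    (60.0, "humidity", 50.0),
]


def mean_per_attribute(rows):
    """Aggregate values per attribute without knowing the attributes upfront."""
    grouped = defaultdict(list)
    for _, attribute, value in rows:
        grouped[attribute].append(value)
    return {attribute: mean(values) for attribute, values in grouped.items()}


def pivot(rows):
    """Turn EAV rows back into one wide record per timestamp for export."""
    wide = defaultdict(dict)
    for timestamp, attribute, value in rows:
        wide[timestamp][attribute] = value
    return dict(wide)


averages = mean_per_attribute(rows)
records = pivot(rows)
```

Neither function needs a schema: a new attribute simply shows up as a new key.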

The downside of the EAV approach is that all measurements are stored as floating-point numbers. We can represent integers and booleans with doubles, but we cannot store strings or binary objects. We have not yet found this to be a problem in practice. We might be able to support strings and binary objects in the future with a bytea column in a separate table.
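The following sketch shows how booleans and integers fit into a double, and where that stops working. The `encode` helper is hypothetical and not part of Tenta; it only demonstrates properties of IEEE 754 doubles, which represent every integer exactly up to 2**53 in magnitude.

```python
def encode(value: object) -> float:
    """Encode a boolean, integer, or float as a double, refusing lossy cases."""
    # bool is a subclass of int in Python, so True becomes 1.0 and False 0.0.
    if isinstance(value, (bool, int, float)):
        encoded = float(value)
        # Integers beyond 2**53 may not survive the round trip.
        if isinstance(value, int) and int(encoded) != value:
            raise ValueError(f"{value} cannot be stored exactly as a double")
        return encoded
    raise TypeError(f"cannot store {type(value).__name__} as a double")


print(encode(True), encode(42), encode(2**53))
```

Strings and bytes raise a `TypeError` here, which mirrors the limitation described above.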

[0.1] Configurations as JSON documents

Configurations are JSON documents that are stored and relayed to the sensors without changes. This makes configurations highly flexible and allows them to evolve over time. Tenta does not validate configurations before relaying them to the sensors, except for checking that they are valid JSON documents. Sensors should be implemented as self-sufficient systems that validate and accept/reject configurations themselves.

Validating configurations on the server would not tell us what we actually need to know: whether a sensor has received a configuration and whether it could be implemented successfully. A structurally valid configuration is not guaranteed to be implemented successfully by a sensor: the MQTT broker might be offline, the sensor might have a bug in its firmware that makes the update fail, etc.

Instead of validating on the server, Tenta implements an acknowledgment cycle. When a sensor receives a new configuration, it can reply with an acknowledgment stating whether the update was successful or not. Configurations are thus validated on the sensors, which is the only place where this can be done reliably.
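A sensor-side handler for this cycle could look roughly like the sketch below. The message layout (`revision`, `configuration`, `success`) and the function names are assumptions made for illustration and are not Tenta's actual message format; the revision number is the one described in the section on configuration revisions.

```python
import json


def handle_configuration(message: str, apply) -> str:
    """Try to apply a received configuration and build the acknowledgment.

    `apply` is the sensor's own routine; it should raise ValueError when it
    rejects the configuration or cannot implement it.
    """
    envelope = json.loads(message)
    try:
        apply(envelope["configuration"])
        success = True
    except ValueError:
        success = False
    # Echo the revision so the server knows which configuration is meant.
    return json.dumps({"revision": envelope["revision"], "success": success})


def apply_interval(configuration: dict) -> None:
    """Hypothetical sensor logic: require a positive measurement interval."""
    if configuration.get("interval", 0) <= 0:
        raise ValueError("interval must be positive")


ack_ok = handle_configuration(
    '{"revision": 3, "configuration": {"interval": 10}}', apply_interval
)
ack_failed = handle_configuration(
    '{"revision": 4, "configuration": {"interval": -1}}', apply_interval
)
```

Both configurations are valid JSON, so the server would relay both; only the sensor can tell that the second one cannot be implemented.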

[0.1] Configuration revisions

The server assigns each configuration a monotonically increasing revision number. This allows us to associate measurements and logs with the configuration that was active at the time. This is often important during analysis. Revision numbers are also used during the acknowledgment cycle to detect if a sensor has received a configuration and whether it could be implemented successfully.
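One way to look up the configuration that was active at a given time is a binary search over the revision history, sketched below. This assumes that the time at which each revision became active (for example, the acknowledgment time) is recorded; the `history` data and the `active_revision` helper are hypothetical, and Tenta may associate revisions with measurements differently.

```python
from bisect import bisect_right

# Hypothetical history of (activation timestamp, revision), sorted by time.
history = [(0.0, 0), (100.0, 1), (250.0, 2)]


def active_revision(history, timestamp):
    """Return the revision active at `timestamp`, or None if there was none."""
    activation_times = [activated_at for activated_at, _ in history]
    # Index of the last revision activated at or before the timestamp.
    index = bisect_right(activation_times, timestamp) - 1
    if index < 0:
        return None
    return history[index][1]


revision_for_measurement = active_revision(history, 120.0)
```

Because revision numbers increase monotonically, the history is naturally sorted and the lookup stays logarithmic in the number of revisions.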

[0.1] PostgreSQL+TimescaleDB

Tenta stores all data in a PostgreSQL database with the TimescaleDB extension. Apart from time series data like measurements, we need to store highly relational information, e.g., about users, networks, or session tokens.

TimescaleDB allows us to optimize certain tables for time series data while still being able to use most of the features of PostgreSQL. Unlike with other time-series databases (e.g., ClickHouse, InfluxDB), we can store time series and relational data in a single data store. This makes our setup simple and robust. We have heard of scaling issues with TimescaleDB, but we have not yet encountered any and will only address them if we do.