Working with the data

Working with the data

Measurements make up the overwhelming majority of the data in the system. They must be flexible enough to allow for different sensors and measurement formats and must be able to evolve over time without downtime. At the same time, they must be efficient to store and query and structured enough for easy export and analysis.

Messages, measurements, and data points

Sensors send MQTT messages to the server. Each message contains one or more measurements. Measurements contain one or more data points, which consist of an attribute (e.g., temperature) and a value (e.g., 23.1). Each data point is associated with the time Tenta received it, as well as the time it was measured and, optionally, the revision that was in use at the time.

Values are sent as JSON numbers and stored as 64-bit double-precision floating-points. You cannot send strings or binary objects. This is a compromise between flexibility and efficiency that we've rarely found to be limiting in practice. Note that you can represent integers and booleans as floats.

Evolving measurement formats

You can freely evolve a sensor's measurement format by sending new or dropping old attributes. Changing a sensor's measurement format does not require changes to Tenta. Sensors can also use multiple different measurement formats at the same time. Similarly, sensors can send partial measurements, i.e., measurements that only contain a subset of attributes.

Note that although measurements are very flexible, once they are stored in the database, they are immutable. You cannot rename attributes, but you can always add new attributes. If you need to update values retroactively, your best option is to do this during processing.


Export

The server stores measurements in the measurement table. This table has the following columns:

  • sensor_identifier (UUID): The identifier of the sensor that made the measurement.
  • attribute (TEXT): The name of the attribute that was measured.
  • value (DOUBLE PRECISION): The value of the measurement.
  • revision (INT): The revision of the configuration associated with the measurement.
  • creation_timestamp (TIMESTAMPTZ): The time at which the measurement was made.
  • receipt_timestamp (TIMESTAMPTZ): The time at which the server received the measurement.

The most flexible way to process these measurements further is to download them locally. You can access the database from any programming language and with any PostgreSQL client library.

For Python, the connector-x (opens in a new tab) and pyarrow (opens in a new tab) libraries are a powerful combination:

import connectorx as cx
import pyarrow.parquet
import pathlib
 
 
# Read the data from PostgreSQL (here: our development instance)
table = cx.read_sql(
    conn="postgresql://postgres:12345678@localhost:5432/database",  # PostgreSQL connection string
    query="SELECT * FROM measurement ORDER BY creation_timestamp DESC LIMIT 256",
    return_type="arrow2",
    protocol="binary",
)
# Use the directory of the script as path for the file
path = pathlib.Path(__file__).parent.resolve() / "measurements.parquet"
# Write to parquet file
pyarrow.parquet.write_table(table, where=path)

You can then read and process this parquet file with your preferred tool, e.g., with polars (opens in a new tab):

import pathlib
import polars
 
 
# Use the directory of the script as path for the file
path = pathlib.Path(__file__).parent.resolve() / "measurements.parquet"
# Load the parquet file into polars
dataframe = polars.read_parquet(path)
# (optional) Transform the (attribute, value) columns into one column per attribute
dataframe.pivot(
    values='value',
    index=['sensor_identifier', 'revision', 'creation_timestamp'],
    columns='attribute',
    aggregate_function='first',
)

You can access the other tables in the same way, e.g. to explore configurations and logs. Please refer to the database schema (opens in a new tab) for exact details on the available tables and columns.

💡

If your sensors send their current configuration's revision number with each measurement, you can join the measurement and configuration tables on (sensor_identifier, revision) to match each measurement with the associated configuration.

Database access control

The server should be the only user with write access to the database. If you want to give other people read access to the data, you should create a read-only user:

CREATE ROLE reader WITH LOGIN PASSWORD '12345678';
GRANT CONNECT ON DATABASE database TO reader;
GRANT USAGE ON SCHEMA public TO reader;
-- Grant read-only access to all tables in the public schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO reader;
💡

See the PostgreSQL documentation (opens in a new tab) for more details on managing users and permissions.

To restrict read-only access to certain networks, sensors, or attributes, you can use views. Instead of granting the reader user access to all tables, we can grant access only to measurements from a certain sensor:

CREATE VIEW measurement_single_sensor AS
SELECT *
FROM measurement
WHERE sensor_identifier = '81bf7042-e20f-4a97-ac44-c15853e3618f';
-- Grant read-only access only to the view
GRANT SELECT ON measurement_single_sensor TO reader;