raw vs. combined vs. aggregated vs. live vs. historical measurements, WUT?
If you are that person, this post might help you.
Overview
Measurements are ingested from assets like batteries, chargers and heat pumps continuously - depending on the vendor, asset type and firmware, this can happen at any pace.
Measurements are then uploaded to the cloud in - currently - two-second intervals. For every asset, the latest available measurement since the last upload is used. This results in higher-resolution measurements being discarded and lower-resolution measurements being uploaded at whatever frequency the driver can read them from the asset.
That results in a lot of measurements being available. For the vast majority of use cases, this level of granularity is not required.
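To picture the "latest measurement per upload interval" behaviour described above, here is a minimal, purely illustrative sketch. The real gridBox implementation is not public, so this only models the effect on a single asset's stream of (timestamp, value) pairs:

```python
from datetime import timedelta

UPLOAD_INTERVAL = timedelta(seconds=2)  # current upload cadence, may change

def downsample_latest(measurements, start, end):
    """Keep only the latest measurement per upload interval.

    `measurements` is a list of (timestamp, value) tuples produced by an asset
    at its own pace; everything else within an interval is discarded, and an
    asset that reports less often than every 2s simply yields fewer points.
    """
    kept = []
    window_start = start
    while window_start < end:
        window_end = window_start + UPLOAD_INTERVAL
        in_window = [m for m in measurements if window_start <= m[0] < window_end]
        if in_window:
            kept.append(max(in_window, key=lambda m: m[0]))  # latest wins
        window_start = window_end
    return kept
```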
Each measurement is specific to an asset type, e.g. a meter measurement looks different from a heat pump measurement. This is what we refer to as raw measurements. Raw measurements are collected and retrieved on an asset level, so to get all raw measurements of a system, you’d need to query them for each asset attached to a certain gridBox individually.
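To make the difference concrete, here is a minimal sketch of what retrieving raw vs. live measurements looks like from a client’s perspective. The base URL, paths and payloads below are placeholders for illustration only, not our actual API routes - please refer to the API docs for those:

```python
import requests

API_BASE = "https://api.example.com"  # placeholder, not the real base URL
HEADERS = {"Authorization": "Bearer <token>"}

def raw_measurements_for_system(system_id, asset_ids):
    """Raw measurements are per asset, so one request per asset is needed."""
    results = {}
    for asset_id in asset_ids:
        # Placeholder path - check the API reference for the actual raw measurements route.
        resp = requests.get(
            f"{API_BASE}/systems/{system_id}/assets/{asset_id}/measurements/raw",
            headers=HEADERS,
            timeout=10,
        )
        resp.raise_for_status()
        results[asset_id] = resp.json()
    return results

def live_measurements_for_system(system_id):
    """Live measurements cover the whole system in a single request."""
    resp = requests.get(
        f"{API_BASE}/systems/{system_id}/measurements/live",  # placeholder path
        headers=HEADERS,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```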
Measurements from assets attached to a gridBox are put together within a small time window. These are called live measurements. For details and caveats, please refer to the use case section.
Both live and historical measurements are aggregated measurements. Live measurements are aggregated with a window of at least two seconds, whereas historical measurement aggregations are available for window sizes of 15 minutes or one hour.
And combined measurements? The combined measurements endpoint returns both raw and energy management measurements combined into a single object. You can forget about them: they exist purely for convenience in specific use cases and do not offer additional information compared to the other endpoints.
So, what should I use for my use case?
The most current, high-resolution data we make available comes from the system live measurements endpoint. We use live measurements to show, e.g., the live system view in XENON. Be aware, however, that live measurements do not provide a (near) real-time view of the system’s state: they are the last known set of measurements over a window of five minutes. So even if an asset has not sent any measurement for up to five minutes, its last known measurement will still be included in the response. As an example, only five minutes after an asset stops sending measurements will the response reflect that; until then, the last measurement received will be included.
This implies you should not infer the on-/offline state of assets based on live measurements.
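One practical consequence: always look at the timestamps carried by the live measurements instead of assuming they describe "right now". A small, illustrative sketch (how the timestamp is obtained from the response is up to you; the schema here is an assumption):

```python
from datetime import datetime, timezone

def measurement_age_seconds(measurement_time: datetime) -> float:
    """How old a measurement from the live response actually is.

    `measurement_time` must be timezone-aware. Anything up to ~5 minutes is
    still "normal" for live measurements, so this value must not be used to
    decide whether an asset is online or offline.
    """
    return (datetime.now(timezone.utc) - measurement_time).total_seconds()
```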
You should not preemptively pull live measurements and store them for later consumption. Live measurements are tailored towards providing the most current view of energy usage to end users on demand, not for analytics and recommendation algorithms.
If you want to analyze or visualize longer periods of time, you should use historical measurements. Take care to request only the period you need and to specify a reasonable resolution given the period’s length - e.g. pulling one year of data at a 15-minute resolution would not be accepted.
XENON uses these data to provide the historical view.
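A minimal sketch of such a rule of thumb. The available window sizes (15 minutes and one hour) come from this post; the threshold and the resolution strings are our own illustrative assumptions, not API rules - please check the API docs for the actual request parameters:

```python
from datetime import timedelta

def pick_resolution(period: timedelta) -> str:
    """Pick a sensible historical aggregation resolution for a requested period."""
    if period <= timedelta(days=7):
        return "15m"  # short ranges: fine-grained view
    return "1h"       # longer ranges: coarser resolution keeps responses small

# e.g. a full year should be requested at 1h, not 15m
assert pick_resolution(timedelta(days=365)) == "1h"
```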
We now discourage using raw measurements via the API and might eventually turn off external access, if possible. There are several reasons for this:
- The format of every measurement depends on the appliance type, meaning your app would need to handle different formats when working with them. We have learned that lesson.
- When processing raw measurements, you need to account for measurements arriving late due to edge connectivity issues. The live and historical aggregations account for that.
- The cost of using them is high (both for gridX and your company) as a lot of data has to be retrieved, transferred and stored again. While this is especially true for raw measurements (due to retrieving them per asset instead of per system), retrieving live measurements at a high frequency also incurs high cost and needs to be considered carefully.
How not to use measurements
As you can see from the description above, the measurements we offer through the API are aggregated views; neither the delivery of every single measurement taken nor the delivery latency is guaranteed. Take this into account when designing solutions based on granular measurements.
In particular, if you are planning to implement use cases similar to the ones listed below, please get in touch first so we can find an idiomatic solution together.
Calculating your own aggregations
You want to create custom aggregations based on measurements, e.g. specifically filtered energy consumption or production over a given time period.
Why is it problematic?
As live and historical measurements are already pre-processed on the edge and during ingestion, and may arrive late, special care needs to be taken to prevent calculation errors from creeping in. Getting this right requires deeper insight into the measurement data pipeline. As we continue to optimize, build new features and scale, internals may change and thus invalidate prior assumptions.
Additionally, waiting for measurements to arrive and become available for querying, and then downloading them, takes time. Typically this happens rather quickly, but for live views of the system’s state it might still not be up to date enough, depending on your use case.
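As a purely illustrative example of the late-arrival pitfall (all numbers made up): an aggregation computed too early simply misses measurements that are still in transit, and recomputing it later yields a different result.

```python
# Energy readings for one 15-minute window (illustrative values in Wh).
# The last point was produced inside the window but only arrived ~2h later
# because the gridBox was temporarily offline.
arrived_on_time = [120.0, 118.5, 121.2]
arrived_late = [119.8]

naive_total = sum(arrived_on_time)                    # computed immediately: 359.7
correct_total = sum(arrived_on_time + arrived_late)   # computed after backfill: 479.5

# The two results differ, and nothing in the naive computation tells you that
# data was still missing - which is why we recommend relying on our aggregations.
```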
What to do instead?
Talk to us about the aggregation you need. We might just be able to provide the aggregations you require. What’s more, as we can skip transferring the data to your processing nodes, we can provide aggregations with significantly lower latencies. You also don’t need to concern yourselves with gridX internal updates that might influence the computation of aggregations.
Downloading all measurements continuously
You want to keep a copy of all measurements over time in your own data warehousing solution, probably for future analysis.
Why is it problematic?
Besides the issues with de-duplication, late-arriving measurements and pre-processing mentioned above, downloading all measurements continuously will cause significant data transfer and storage cost. Assuming you have 20k systems in the field and download all their measurements every 2s, you’ll end up loading and ingesting hundreds of millions of data points per day. Setting up infrastructure and architecture that can handle this is non-trivial and expensive, especially when the data is transferred between different cloud computing providers.
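The back-of-the-envelope calculation behind that number (the 20k systems are just the assumed fleet size from the example above):

```python
systems = 20_000
upload_interval_s = 2
uploads_per_day = 24 * 60 * 60 // upload_interval_s  # 43,200 uploads per system per day

data_points_per_day = systems * uploads_per_day
print(data_points_per_day)  # 864,000,000 - and that counts one data point per upload;
                            # with several assets per system it is a multiple of this
```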
Verifying completeness of data can also become an issue - if a gap occurs in the polling job on your end, e.g. due to updates, special care needs to be taken to reconcile what was already downloaded with the missing data.
What to do instead?
Talk to us about the analysis you want to run. There are ongoing efforts to provide deeper insights into data on gridX side, and we’d love to learn about your use cases and consider them when designing analytics features.
If obtaining a copy of measurements still is a hard requirement for you, we may provide a solution to retrieve measurement data in large batches (incurring data transfer and storage cost).
Taking asset controlling decisions directly
You want to control energy assets directly based on measurement data, e.g. charge a battery when PV production is high or feed energy back into the grid.
Why is it problematic?
Controlling energy resources needs to be approached in a holistic fashion, as you interact with a complex system that is steered by various optimization algorithms, both in the cloud and on the edge. As mentioned, the measurements retrieved through the API may not be sufficiently current to base control decisions on. Direct interaction with energy resources needs to be considered carefully so as not to run into ill side effects.
What to do instead?
This is why gridX offers higher-level APIs that integrate with internal optimization strategies and don’t require direct interaction with energy resources. Consider, e.g., our flex Module.
In any case, if you consider this too limiting, please reach out to discuss your requirements.
Details
Measurement Data Flow
When talking about measurements, we feel it’s helpful to have a rough mental model of their origin and dataflow. Assets (like wallboxes, batteries, heat pumps and PV systems) send measurements to the gridBox they are connected to. They do so at their own pace, and it’s vendor and asset type specific. Some assets send multiple measurements per second, others only every few seconds. Some of these measurements are taken into consideration locally (i.e. on the edge/the gridBox) to control assets directly.
In a certain interval, currently every two seconds, the gridBox collects the latest measurement from all assets and uploads it to the gridX cloud systems. If no uplink is available (due to the household experiencing network issues), measurements are cached and uploaded once a connection is re-established.
```mermaid
sequenceDiagram
    box Edge
        participant A1 as Heat Pump
        participant A2 as PV System
        participant A3 as Battery
        participant A4 as ...
        participant GB as gridBox
    end
    box Cloud
        participant I as Ingestion
        participant S as Storage
        participant A as API
        participant BE as App Backends
    end
    box Edge
        participant APP as Apps
        actor U as User
    end
    loop assets continuously send measurements
        A1 ->> GB: send measurement
        A4 ->> GB: send measurement
        A2 ->> GB: send measurement
        A1 ->> GB: send measurement
        A3 ->> GB: send measurement
        GB -->+ GB: cache measurement
    end
    loop measurements are uploaded in regular intervals
        GB ->>- I: Upload measurements
        I --> I: Preprocess<br>measurements
        I ->> S: Store measurement<br>timeseries
        S --> S: Aggregate measurements
    end
    activate U
    U ->>+ APP: View, e.g. statistics
    activate A
    alt Native App
        APP ->> A: Request aggregation
    else Web App
        APP ->> BE: Request aggregation
        BE ->> A: Request aggregation
    else Server side, non-interactive apps
        BE ->> A: Request aggregation
    end
    A ->> S: Load aggregation
    note over BE: ... potentially post-process, return all the way back
    S --> U: 
    deactivate A
    deactivate U
```
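A minimal, purely illustrative sketch of the edge-side part of this flow (the actual gridBox firmware is not public, so the names and structure below are assumptions):

```python
import time

UPLOAD_INTERVAL_S = 2  # current upload cadence mentioned above

latest_per_asset = {}  # asset_id -> most recent measurement received from that asset
upload_cache = []      # batches waiting for an uplink

def on_measurement(asset_id, measurement):
    """Assets push at their own pace; only the latest value per asset is kept."""
    latest_per_asset[asset_id] = measurement

def upload_tick(uplink_available: bool):
    """Called every 2 seconds: snapshot the latest measurements and try to upload.

    If the uplink is down, the batch stays in the cache and is sent once the
    connection is re-established.
    """
    upload_cache.append({"taken_at": time.time(), "measurements": dict(latest_per_asset)})
    if uplink_available:
        while upload_cache:
            send_to_cloud(upload_cache.pop(0))  # hypothetical upload call

def send_to_cloud(batch):
    ...  # placeholder for the actual upload to the ingestion service
```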
Timing and window sizes
This table summarizes the current state of measurement sending/ingestion timing and batching. It is meant to aid your understanding, but please bear in mind that it reflects the current state of internal implementations, is not guaranteed and might change without prior notice. We’ll keep this post updated, but there may be a certain lag. Give us a heads-up if you notice something is off.
| Action/Aggregation | Interval |
|---|---|
| Asset sends measurement to gridBox | Continuously, depending on the asset type and vendor. |
| gridBox sends measurements to the cloud | Every 2s, if there’s a network connection to the cloud. In case of a network partition, the measurements to be uploaded are cached on the gridBox and uploaded once the partition is healed. If the gridBox is about to run out of disk space, the cache is compacted by deleting every other measurement, starting with the oldest ones. This means that, should this happen, older measurements become sparser the longer the network partition persists while the gridBox remains low on disk space. |
| Live measurements | Measurements in >= 2s resolution are collected over a 5min window, returning the latest available measurements within that window. |
| Historical measurements | The period and resolution of historical measurement aggregations can - within certain boundaries - be defined when requesting them. Please refer to the API docs for details. Measurements arriving late due to edge connectivity issues will be included in the historical measurements with a delay of a bit more than two hours. |
| Raw measurements | Raw measurements that arrive at the ts-api with a minimal delay (time between measurement timestamp and time.Now) should be available to query within a few seconds. Raw measurements that arrive with a delay of more than 45 minutes bypass the hot storage, and it can take up to 45 minutes for them to become available. This is an exception, though, not the norm. |
Raw measurements | Raw measurements that arrive at the ts-api with a minimal delay (time between measurement timestamp and time.Now) should be available to query again withing a few seconds. Raw measurements that arrive with a delay of more than 45 minutes bypass the hot storage and it can take up to 45 minutes for them to be available. This is an exception, though, not the norm. |