WindSeer: Real-time volumetric wind prediction over complex terrain aboard a small UAV

Florian Achermann

{}^{1}

, Thomas Stastny

{}^{1,2}

, Bogdan Danciu

{}^{1}

, Andrey Kolobov

{}^{3}

, Jen Jen Chung

{}^{1,4}

Roland Siegwart

{}^{1}

, Nicholas Lawrance

{}^{1,5}

{}^{1}

Autonomous Systems Lab, ETH Zürich, Zürich 8092, Switzerland {acfloria, danciub, rsiegwart}@ethz.ch

{}^{2}

Auterion, Edenstrasse 20, Zürich 8045, Switzerland, thomas.stastny@auterion.com

{}^{3}

Microsoft Research, One Microsoft Way, Redmond 98052, USA, akolobov@microsoft.com

{}^{4}

School of Electrical Engineering and Computer Science, The University of Queensland, Staff House Road, QLD 4072, Australia, jenjen.chung@uq.edu.au

{}^{5}

Robotic Perception and Autonomy, CSIRO Data61, QLD 4069, Australia, nicholas.lawrance@csiro.au

Abstract

Real-time high-resolution wind predictions are beneficial for various applications including safe manned and unmanned aviation. Current weather models require too much compute and lack the necessary predictive capabilities as they are valid only at the scale of multiple kilometers and hours – much lower spatial and temporal resolutions than these applications require. Our work, for the first time, demonstrates the ability to predict low-altitude wind in real-time on limited-compute devices, from only sparse measurement data. We train a neural network, WindSeer, using only synthetic data from computational fluid dynamics simulations and show that it can successfully predict real wind fields over terrain with known topography from just a few noisy and spatially clustered wind measurements. WindSeer can generate accurate predictions at different resolutions and domain sizes on previously unseen topography without retraining. We demonstrate that the model successfully predicts historical wind data collected by weather stations and wind measured onboard drones.

Index Terms:

Wind Field Prediction, Machine Learning, Computational Fluid Dynamics, Uncrewed Aerial Vehicles, Sparse Data, Multi Resolution Prediction, Convolutional Neural Network, Low Altitude Wind

I Introduction

Accurate modelling of the wind is crucial for applications such as wind farm layout optimization (WFLO) or safe manned and unmanned aviation. The energy generated by wind turbines is proportional to the cubic power of the wind speed, thus micrositing turbines relies on accurate flow models [1]. Adverse wind poses a challenge for manned aviation close to the ground at airports with challenging surrounding terrain, such as Madeira International Airport [2]. Finally, winds in mountainous regions can easily exceed \qty10\per [3], a speed comparable to the normal cruise speed of small uncrewed aerial vehicles [4], resulting in poor tracking of the planned flight path [5]. If \@iacisUAV sUAV knew the wind in advance, path planning could avoid areas of unfavorable winds and high turbulence [6].

Chaotic fluid-dynamic effects due to local steep terrain result in large spatial variations of wind around complex terrain [7]. Grid-based models thus require sufficiently high spatial resolution. \AcNWP can accurately model relatively large-scale wind patterns with resolutions on the order of kilometers [8]. \AcfCFD simulations generate high-resolution wind flows around terrain at smaller scales, on the order of meters or less, but require knowing well-defined boundary conditions reflecting the overall weather situation [9, 10]. Both simulation-based methods numerically solve the underlying system of partial differential equations, thus are computationally expensive (compute times in the order of hours) and do not provide real-time wind field estimates. Onsite wind measurements either with a Doppler Lidar [11] or measurement masts [9, 10, 12, 13, 14] provide real-time wind information but with limited resolution and high setup cost.

AI-based methods have been used to accelerate the computation of flow fields by assisting or replacing numerical PDE-based solvers in different settings. Examples include modelling fluid flows for visual rendering [15, 16] and replacing computational fluid dynamics (CFD) simulations in aerodynamic shape optimization [17, 18, 19, 20, 21]. However, these models rely on privileged information, such as boundary conditions and consider much simpler geometries compared to the topography of complex terrain. Super-resolution flow analysis closely aligns with our approach, yet previous studies in this field assume complete, uniform coverage of measurements over the entire region [22, 23, 24, 25]. They either exclusively investigate two-dimensional flows [22, 23, 24] or require dense low-resolution data [22, 24, 25]. Various deep neural network (DNN)-based approaches demonstrated weather prediction at a global scale, essentially replacing numerical weather prediction (NWP) with much faster compute time in the order of seconds [26, 27, 28, 29]. But the resolution of these models, on the order of kilometers, is too low to accurately model the wind around complex terrain.

In this work, we present WindSeer, an approach for predicting the wind and turbulence in real-time at meter-scale based on the topography and sparse, noisy wind observations without needing bulky specialized equipment or assuming access to privileged information. WindSeer’s ability to predict real wind stems from its encoder-decoder convolutional neural network (CNN) architecture trained offline using synthetic flow data generated by computationally expensive steady-state Reynolds-averaged Navier–Stokes (RANS) CFD simulations. The CFD simulations are run offline over real terrain patches that are available from web services [30, 31]. The novelty of our method is in training WindSeer to produce CFD-like predictions from only sparse, in-situ observations – without requiring privileged information such as global boundary conditions. Access to these boundary conditions would be equivalent to measuring the wind along the full boundary of the prediction region, which is simply not available in real-time, nor at the required meter-scale. An overview of our wind prediction pipeline is presented in Fig. 1. We evaluated WindSeer in a series of experiments, whose results show WindSeer’s ability to make accurate dense wind and turbulence predictions based on local noisy wind measurements, across a wide range of spatial resolutions without retraining the model.

Refer to caption — Figure 1: Overview of the wind prediction pipeline. (A) First we generate labelled flows utilizing a CFD simulation. (B) Then *WindSeer* is trained with measurements along randomly sampled piecewise linear trajectories to predict the dense flow. (C) During deployment the wind estimates from the UAV or wind measurement towers together with the known topography serve as the input to *WindSeer*.

II Results

We evaluated WindSeer in a sequence of increasingly challenging experiments to demonstrate its real-time wind prediction capabilities:

1.

We demonstrated on held-out CFD-simulated flows that WindSeer is expressive enough to represent the complex flow patterns around real terrain. We analysed several network training approaches on this dataset as it provides dense labels with a wide array of topographies.
2.

We demonstrated the ability of WindSeer to predict real wind data using measurements gathered from masts as part of large-scale measurement campaigns over different terrains across Europe [9, 10, 12, 13, 14], thereby validating both the complete pipeline and our approach of using CFD as a teacher model. These datasets offer good spatial coverage with measurement from different flow regions.
3.

On data from several multi-sUAV flights over mountainous terrain we illustrate WindSeer’s ability to predict the wind when using noisy onboard wind measurements as input. These input measurements were subject to high noise due to the uncertainty of the estimated sUAV’s pose and errors in the airflow sensing, all of which complicated the prediction problem faced by WindSeer.
4.

Finally, we showed WindSeer’s real-time prediction capability on flight-grade hardware.

II-A WindSeer model

WindSeer is a convolutional neural network (CNN) with four-channel input. The input is composed of a binary mask indicating cells containing input measurements, the terrain model stored as a distance field, and the sparse horizontal wind speed measurements (two channels). Note that vertical wind speed measurements are not an input to the model, since weather stations typically measure only the horizontal wind and the vertical wind is not observable with a standard fixed-wing sUAV sensor set. In Appendix C we show empirically that adding vertical wind as an input, if it were available, has a limited impact on prediction quality. The percentage of observed cells in the input data varies across experiments ranging from \qty3.5e-6 to \qty0.19, thus WindSeer always operates on highly sparse observations.

The four-channel output of WindSeer has the same spatial dimension and resolution as the input. The first three channels contain the three-dimensional wind prediction ( $W_{x},W_{y},W_{z}$ ) and the fourth channel contains the turbulence kinetic energy (TKE) prediction — a metric for the strength of the turbulent velocity fluctuations in the wind field that is proportional to the sum of the variances in each dimension. Thus, the prediction contains properties ( $W_{z}$ , TKE) that are not available as input measurements.

II-B Experiment group 1: Predicting CFD-simulated flows

We evaluated WindSeer on held-out CFD-simulated flows. The dense label data allowed for evaluation over the full domain, thus evaluating the influence of different measurement locations as well as qualitatively characterizing the prediction quality.

Experiment setup

We used CFD-simulated flows over previously unobserved terrains and sampled the input measurements and noise from the same distributions as observed during training.

Prediction performance

Three terrain and input pairs together with the prediction error cloud are shown in Fig. 2 A) and B). While the highest prediction errors occurred either close to the ground or on the lee side of the terrain, this trend is mitigated by the fact that, due to practical considerations such as payload configuration and safety, the operating altitude for sUAV is typically over \qty50m above ground level [32, 33] and the wind turbine hub heights are typically higher than \qty80 above ground [34]. The distribution of the average normalized prediction errors over all non-terrain cells over the full test set is displayed in Fig. 2 C) in blue. When scoring the network output only above an altitude of \qty46m the median relative velocity norm error reduces from \qty14.5 to \qty11.5 and the median relative TKE error from \qty11.2 to \qty8.3. The high correlation values in the voxel-wise comparison of the prediction to the CFD labels shown in Fig. 2 D) show that WindSeer can qualitatively capture the different flow regimes. The bias (average error) is close to zero for all channels and the root mean squared error (RMSE) is also low compared to the overall magnitude. Together, this underlines WindSeer’s prediction quality. The scatter density plots for the terrains presented in Fig. 2 A) (1) and (3) are available in Extended Data Figure 14.

Error analysis

We evaluated three individual terrains in more detail (Fig. 2 A)) to assess the sensitivity of the prediction quality to the sampled input data locations. For each terrain we randomly sampled 2000 trajectories and evaluated the prediction error (Fig. 2 C) green). No noise or bias was added to the input data in this experiment to focus solely on the impact of the trajectory location. Whenever the input data was sampled in regions where the model could not predict the prevailing flow well, e.g. the lee side of the hill (Fig. 2 A) (3b)), WindSeer performed poorly. If such a region is large enough, multi-modal error distributions can be observed as seen in Fig. 2 C) (3) where prediction failed for input wind samples on the right (lee) side of the hill.

The CFD-simulated flows are computed on a finer grid close to terrain to account for the high spatial variation of the flow in these regions. Interpolating the flow to the coarser fixed-size grid of WindSeer, as visible in Fig. 1 A), leads to a loss of information and flow artifacts close to the ground. Such confounding factors make learning these low-altitude flows especially challenging, offering an explanation for the performance difference between predicting the low-altitude and high-altitude winds. However, the experiments with the real wind data show that the wind close to the ground can be accurately modelled if the grid resolution is increased.

II-C Experiment group 2: Evaluation on wind measurement campaign datasets

We evaluated WindSeer on real wind data available from three published measurement campaigns. The long measurement periods allow filtering out short-term effects such as wind gusts and reduces the overall measurement noise, enabling an evaluation of WindSeer with clean input wind measurements over larger-scale domains.

Experiment setup

The in-situ wind and TKE measurements were collected via wind velocity sensor suites (sonic or cup anemometers) mounted on masts providing data from \qty2m to \qty100m altitude above ground level [14]. For each terrain, varying wind flow directions and magnitudes are available from different measurement periods (Fig. 3). The terrain in these campaigns varies in complexity and size – from the \qty11m high Bolund hill (Fig. 3 A)) [9, 10] to the gently-sloped \qty116m high Askervein hill (Fig. 3 C)) [12, 13]. Both Bolund and Askervein have limited vegetation while the Perdigão region in Portugal represents the most complex test case with two lightly-forested parallel ridges roughly \qty300m high (Fig. 3 E)).

The varying geometric extents of the sites together with the low altitude wind measurements required a larger grid size ( $384\times 384\times 192$ instead of $64^{3}$ cells) paired with higher resolution to obtain meaningful predictions. Accordingly, the grid resolutions were increased $2\times$ , $4\times$ , and $30\times$ for the Perdigão, Askervein, and Bolund terrains, respectively. These changes in the prediction grid were enabled by the fully convolutional architecture of WindSeer and the distance field representation of the terrain that indirectly provides the cell size and resulted in around $100\times$ sparser input data compared to the training density.

We compared WindSeer against an averaging baseline (AVG) that assumes the wind and TKE are constant and predicts the average of all measurements over the full domain. This is a widely accepted assumption for sUAV flights [5, 35, 36], moreover, the state of the art in planning large-scale missions relies on NWP forecasts that remain constant at the spatial resolution of the mission [37]. Each method predicted the wind based on the measurements from a single mast, yielding an ensemble of predictions for each wind case, while measurements from the remaining masts were used to validate the predictions.

Prediction performance

For each wind case we averaged the metrics over the ensemble of predictions and report the absolute errors of the wind magnitude, vertical wind, and TKE in Tab. I. In most cases WindSeer outperformed the averaging baseline (AVG). In the other cases, the wind direction usually aligned with the ridge/terrain, causing only small variations across the measurements of the different masts. The average prediction errors of WindSeer are \qty17, \qty43, and \qty39 lower than the baseline for the velocity magnitude, vertical wind and TKE respectively.

To assess whether WindSeer can predict flow trends well, we also report the correlation between the wind predictions and measurements in Tab. I. The correlation for the AVG baseline is undefined, due to the constant, location-invariant wind prediction. Averaged over all cases, WindSeer yielded strong positive correlations for all metrics suggesting that it was able to predict the observed trends well, such as up- and downdrafts, high wind and turbulence, thus providing a valuable contribution to planning safer and more efficient sUAV trajectories or WFLO.

In Fig. 4 A-C we compare the measurements to the WindSeer predictions. Analogous to the previous experiment we use the wind measurements from one mast as input resulting in 32 predictions for the Bolund [8 masts and 4 wind cases], 182 predictions for the Askervein [14 masts and 13 wind cases], and 9120 predictions for the Perdigão campaigns [38 masts and 240 wind cases (one hour averaged data for ten different days¹¹12017-05-09, 2017-05-11, 2017-05-12, 2017-05-16, 2017-05-18, 2017-05-20, 2017-05-26, 2017-06-02, 2017-06-03, 2017-06-08)]. Overall, the low bias and RMSE together with the high correlation demonstrate that WindSeer handles different wind conditions well. The best predictions are obtained for the simpler Askervein terrain and the errors grow with increasing terrain complexity. In general the model under-predicts the downdrafts in the wind fields for the Bolund and Perdigao cases (Fig. 4 A and C) but the Askervein experiment (Fig. 4 B) shows that WindSeer is capable of representing strong downdrafts.

For the Bolund hill a study of different CFD simulation was conducted and the speed up errors for one wind direction (\qty239) are reported [10]. The average absolute speedup prediction error for WindSeer is \qty20.3 while the best performing RANS-CFD models achieve an error of \qty15 but with runtimes in the order of hours. The models with a runtime of less than \qty15min have errors of \qtyrange26.532.4, comparable to our averaging baseline with error \qty33.5.

TABLE I: Real world wind results. Absolute prediction errors (

\epsilon

) and correlation between the measurements and predictions (

\rho

) for the velocity magnitude (S), vertical wind component (W), and turbulence kinetic energy (TKE) on the measurement campaign datasets and flight experiments

		S $[m/s]$			W $[m/s]$			TKE $[m^{2}/s^{2}]$
		AVG	WindSeer		AVG	WindSeer		AVG	WindSeer
Terrain	Case	$\epsilon$	$\epsilon$	$\rho$	$\epsilon$	$\epsilon$	$\rho$	$\epsilon$	$\epsilon$	$\rho$
Bolund	90	1.80	1.58	0.72	0.85	0.58	0.50	1.91	1.33	0.58
	239	2.80	2.50	0.68	0.66	0.34	0.76	2.67	1.68	0.86
	255	3.24	2.47	0.82	0.85	0.44	0.72	3.43	2.12	0.82
	270	3.77	2.79	0.85	0.95	0.51	0.78	5.14	3.43	0.73
Askervein	TU25	2.58	2.39	0.65	1.10	0.37	0.90	0.61	0.41	0.89
	TU30A	1.14	0.98	0.62	0.41	0.26	0.58	1.38	0.72	0.42
	TU30B	1.80	1.46	0.73	0.51	0.41	0.64	2.82	1.40	0.23
	TU01A	3.26	2.83	0.77	1.41	0.52	0.91	1.89	1.06	0.85
	TU01B	3.24	2.74	0.79	1.37	0.48	0.92	1.64	0.98	0.87
	TU01C	3.55	3.08	0.78	1.23	0.45	0.92	1.17	0.72	0.90
	TU01D	4.21	3.71	0.79	1.26	0.47	0.93	1.62	1.17	0.93
	TU03A	5.29	4.70	0.78	1.74	0.64	0.93	2.04	1.31	0.98
	TU03B	4.90	4.41	0.77	1.54	0.54	0.92	1.82	1.21	0.90
	TU05A	1.91	1.89	0.62	0.76	0.31	0.89	1.73	0.99	0.40
	TU05B	1.18	1.00	0.79	0.31	0.26	0.48	1.40	0.63	0.04
	TU05C	0.93	0.93	0.66	0.34	0.25	0.58	1.09	0.43	0.14
	TU07B	3.42	3.27	0.70	1.59	0.49	0.90	2.44	1.85	0.40
Perdigão 2017-05-09	13:30-13:35	2.91	2.27	0.82	0.85	0.57	0.53	-	-	-
	17:10:17:15	4.41	3.33	0.48	1.22	0.88	0.35	-	-	-
	17:00-18:00	3.06	2.37	0.77	0.80	0.56	0.50	-	-	-
Perdigão 2017-05-12	01:00-02:00	2.82	2.31	0.76	0.52	0.42	0.45	-	-	-
	17:00-18:00	2.76	2.23	0.81	0.70	0.57	0.57	-	-	-
	19:45-19:50	1.15	0.90	0.71	0.21	0.16	0.57	-	-	-
Perdigão 2017-05-16	07:00-08:00	1.58	1.16	0.65	0.27	0.27	0.67	-	-	-
	11:40-11:45	0.85	0.77	0.51	0.33	0.27	0.22	-	-	-
	12:40-12:45	0.86	0.75	0.48	0.29	0.22	0.30	-	-	-
	20:00-21:00	1.35	0.97	0.68	0.22	0.31	0.21	-	-	-
Perdigão 2017-05-18	14:35-14:40	2.14	1.82	0.70	0.41	0.36	0.33	-	-	-
	20:00-21:00	1.54	1.08	0.84	0.17	0.19	0.12	-	-	-
	22:00-23:00	1.52	1.13	0.74	0.17	0.17	0.42	-	-	-
Perdigão 2017-05-20	03:15-03:20	3.59	2.63	0.61	0.41	0.55	0.30	-	-	-
	10:00-11:00	2.20	1.84	0.71	0.51	0.42	0.46	-	-	-
	12:20-12:25	1.93	1.71	0.66	0.58	0.43	0.45	-	-	-
Perdigão 2017-06-08	00:00-01:00	2.68	1.87	0.69	0.35	0.38	0.44	-	-	-
	12:40-12:45	1.20	0.90	0.77	0.31	0.22	0.46	-	-	-
	14:00-15:00	2.49	1.77	0.82	0.74	0.42	0.58	-	-	-
All campaigns		2.50	2.07	0.71	0.72	0.41	0.59	2.05	1.26	0.64
Chasseral	Flight 1	0.61	0.85	0.33	0.54	0.38	0.95	-	-	-
	Flight 2	0.57	0.77	0.48	0.50	0.32	0.80	-	-	-
	Flight 3	0.53	0.71	0.37	0.49	0.30	0.84	-	-	-
Oberalp	Flight 1	0.56	0.82	-0.46	0.54	0.33	0.78	-	-	-
Gotthard	Flight 1	0.99	1.13	0.33	0.21	0.18	0.88	-	-	-
All flights		0.65	0.86	0.21	0.46	0.30	0.85	-	-	-

of WindSeer compared to the averaging baseline (AVG). Note that the AVG baseline does not offer correlations as it produces a constant signal.

TABLE I: Real world wind results. Absolute prediction errors (

\epsilon

) and correlation between the measurements and predictions (

\rho

) for the velocity magnitude (S), vertical wind component (W), and turbulence kinetic energy (TKE) on the measurement campaign datasets and flight experiments

Error Analysis

The RANS CFD simulations compute the time-averaged solution of the Navier-Stokes equation, thus WindSeer is trained to operate on these static flows. However, in the real world, wind changes constantly. The Perdigão campaign provides measurements as 5-minute averages allowing us to compare the performance of WindSeer operating with high- and low-frequency data as shown in Fig. 4 D. Averaged over a day, the performance is consistent across the two time windows with slightly better results using the hourly averages. One exception is the large difference in the correlation scores observed between 06:00-10:00, a time period characterized by low wind magnitudes and rapidly changing wind direction as evident from the TSE04 tower measurements. These dynamic conditions do not match the RANS CFD simulation offering an explanation for the poorer prediction performance on the 5-minute averaged data compared to the hourly averages that filter out these unsteady flow features.

In Fig. 3 the masts are colored by the averaged wind magnitude error of the WindSeer prediction when using the measurements from that respective mast as inputs. Lighter colors indicate input measurement mast locations that yielded more accurate predictions. In Fig. 4 D we show the prediction errors and correlation using the TNW11 and V01 tower data in addition to the averaged errors over all tower predictions. Consistent with the findings from our previous experiments, using measurements from the hill top or the upwind side generally resulted in lower-error predictions than measurements from the lee side, nevertheless, the correlation is consistently high.

Wind field trends along straight lines

In each campaign the masts were arranged along multiple straight lines enabling us to qualitatively assess whether the models could capture expected flow trends along these lines. We selected cases where the wind and the line direction are parallel, as the measurements show higher variation in these scenarios, and present the WindSeer and baseline predictions in Fig. 5. Refer to Fig. 3 for the wind direction and the mast locations. For each method and case we show three predictions using the measurements from different masts as the input. The error bars for wind speed measurements report the 1 $\sigma$ uncertainty.

WindSeer successfully predicted the speed changes and up-/downdrafts unless the measurement tower was located on the lee side of a hill, e.g. Bolund: M8, Askervein: ANE40, Perdigão: V01/TNW07. Wherever TKE measurements were available, WindSeer predicted TKE trends well. WindSeer struggles to predict flow patterns not observed during training, such as a lee side rotor which occurs when flow detaches on the downwind side and causes a recirculating pattern (Appendix A). Such a case is shown in Extended Data Figure 15.

II-D Experiment group 3: Predicting the wind along sUAV trajectories

We evaluated WindSeer on noisy, local wind measurements collected by multiple fixed-wing sUAV. This mirrors the targeted use case of WindSeer and challenges it with high input noise levels due to real sensor noise and short term wind effects such as gusts or turbulence.

Experiment setup

We flew multiple fixed-wing sUAV simultaneously in the Swiss Jura (Chasseral) and the Swiss Alps (Oberalppass and Gotthardpass). The flight plans for each sUAV consisted of multiple circular loiter patterns. We generated the sparse input from the point-cloud of measurements to WindSeer by binning the observations along the flight path from one sUAV into the discretized prediction grid and averaging all measurements falling in a single cell. We used the native training grid size and resolution with the grid center at the first observed sUAV position estimate.

Prediction performance

We first evaluated WindSeer on time-averaged data, similar to the previous experiment group with the static masts, to reduce the noise and sensitivity to sensor calibration on the input measurements. We generated this data by averaging the measurements over one loiter pattern to a single wind measurement. The observations from the different sUAV and multiple loiters enabled us to compute the average prediction error and the correlation for each flight experiment. We present the error metrics for all flights in Tab. I. For the wind magnitude, the baseline outperforms WindSeer which also only yields a slightly positive correlation averaged over all flights. Nevertheless, the high correlations and significantly lower errors for the vertical wind compared to the baseline indicate that WindSeer can better predict the locations of dangerous downdrafts, as well as favourable updraft regions, based solely on wind measurements taken from one sUAV.

We further evaluated WindSeer in a sequential time-windowed manner using the wind data from a \qty120 window to predict the wind along the flight trajectory for the next window. The corresponding predictions for the first Chasseral flight are displayed in Fig. 5. Note that the prediction between successive windows can be discontinuous due to the different measurements used by WindSeer. In Extended Data Figures 16 and 17 we provide the predictions for the other flights as well as flight paths and example predictions. The model was able to accurately predict the magnitude difference in the vertical wind between the two sUAV for all Chasseral flights. It slightly under-predicted the downwind on the lee side for the validation sUAV, which can be explained by the generally worse performance of the models on the lee-side wind predictions, as shown in the previous experiments.

Error Analysis

Although the measurements were averaged over time the noise due to wind gusts and measurement errors was comparable to the variation of the measured wind magnitude as outlined in Appendix D. Thus, the averaging already offered a good baseline prediction of the wind magnitude. As the wind on the vertical axis varied much more between the different sUAV and throughout a flight, WindSeer could predict these variations and outperform the baseline.

The Oberalppass and Gotthardpass are especially challenging prediction terrains, as they exhibit large altitude changes (\qty1500) due to valleys and peaks within \qty4\kilo of our flight locations. High surrounding peaks can cause high gust levels, explaining the high variation in the measured wind. Furthermore, terrain outside the prediction area can significantly influence the wind features observed in the valley. In contrast, our CFD simulation setup for generating the WindSeer training data was limited to well-defined inflow conditions and a domain size of \qtyproduct1.5 x 1.5\kilo, which did not allow simulating the flow over multiple large scale mountains or ridges. Thus, during training WindSeer did not observe such complex wind flows, explaining the performance difference between the Chasseral and Oberalppass/Gotthardpass flights.

II-E WindSeer inference time

We evaluated prediction times of WindSeer on an NVIDIA Jetson Orin AGX, a light-weight and low-power single-board computer, to show real-time performance on sUAV flight-grade hardware. The average inference times over 100 runs on a $64^{3}$ and $384\times 384\times 192$ prediction domain were \qty[separate-uncertainty-units = single, separate-uncertainty=true]0.021(2) and \qty[separate-uncertainty-units = single, separate-uncertainty=true]3.577(15) respectively. Mixed-precision inference reduced the inference times to \qty[separate-uncertainty-units = single, separate-uncertainty=true]0.021(5) and \qty[separate-uncertainty-units = single, separate-uncertainty=true]1.700(5). These inference times show that WindSeer is capable of low-latency wind predictions over large domains with limited compute to quickly recalculate predictions in response to new measurements.

III Discussion

In this work, we have proposed an approach to train a CNN, WindSeer, for predicting low-altitude wind and TKE around complex terrain in real-time based on sparse and noisy wind measurements and known topography. We trained WindSeer solely on simulated RANS CFD flows over terrain patches from Switzerland and evaluated it on held-back CFD data and real wind measurements. In the first experiment on previously unobserved CFD solutions we demonstrated that WindSeer is capable of replicating the dense flows based on sparse and noisy observations with high accuracy (median relative error below \qty10).

In the next experiments we demonstrated zero-shot sim-to-real transfer by evaluating WindSeer on real wind measurements without retraining. On the historic measurement campaign datasets we showed that WindSeer was able to reconstruct real wind flows of different scales, at up to 30 times higher resolution than the training data. This corresponds to larger prediction domains containing up to 108 times more cells than those used for network training and to much sparser input data compared to training. The distance field representation enabled this multi-resolution property of WindSeer as it provided a sense of scale to the model.

Finally, the performance of WindSeer on the flight data is comparable to the average baseline assumption, with less accurate predictions of the horizontal wind but superior performance when predicting the vertical wind, which is the key factor when assessing the safety and efficiency of flight plans. The flight data exhibits much higher measurement noise due to the noise of the low-cost sensors used to estimate the sUAV pose and wind. Therefore, until better wind sensing is available on sUAV, we envision the onboard deployment of WindSeer by using the weather data from nearby weather stations.

Previous work has shown the ability of DNN to predict fluid flows for well-defined geometries paired with well-known inflow conditions [17, 18, 19, 38, 39]. We demonstrated the capability of DNN to work with sparse and noisy input data on realistic complex terrain. This enables real-time wind prediction using data that is feasible to obtain aboard \@iacisUAV sUAV. The sparsity of the input data (\qty0.19 down to \qty3.5e-6) exceeds previous research of sparse-to-dense DNN that usually assume denser data around \qty0.75 [40, 41], \qty0.2 [42], or \qty6.5e-3 [43]. In these previous examples the sparse input data was distributed over the whole prediction domain while in our case we showed WindSeer still performs well even if the samples are located within a spatially constrained sub-region.

III-A Limitations and Future Work

Training domain

In our data generation pipeline we restricted the CFD simulation domain to \qtyproduct1.5 x 1.5\kilo based on the initial assumption that the large scale NWP would be used in the network input. This domain size restricted the terrain to mostly contain one single major geographical feature such as a mountain or a ridge. Therefore the current CFD training data does not contain samples that include wind phenomena such as lee-side rotors, which arise in the presence of multiple mountains/ridges. A larger simulation domain in the order of \qtyproduct10 x 10\kilo could allow a better representation of such complex flows and possibly increase the wind prediction performance for complex terrains inside mountain ranges. A biased sampling strategy when composing the input could also help to solve the sample imbalance problem by exposing the network to more examples from the lee-side flow regime during training.

Temporal wind variation

Changing the CFD simulation from a time-averaged RANS solution to a time-varying model such as large eddy simulation (LES) or a mesoscale weather prediction, such as the WRF model [44], could have multiple advantages. First, \@iaciDNN DNN could be trained using the time-varying wind data to construct the input but still predict the time-averaged solution. This could result in the model learning wind gust characteristics and thus increase robustness to noisy wind estimates from the sUAV. Second, a model could be trained to predict the time-varying flow representing wind gusts and short term weather evolution in the predicted wind. However, whether the information from the noisy measurements are sufficient to uniquely determine the flow state still needs to be carefully analyzed. Depending on the sparsity of the data there are likely to be multiple possible flow solutions matching the observations. Further, time varying methods like LES require significantly more computational resources than RANS solvers, further increasing the cost of generating training data [45].

Fluid flow assumptions

Currently the CFD simulations used to train WindSeer model the air as an incompressible fluid with uniform temperature. By including temperature differences in the compressible fluid and terrain the CFD simulation could model complex flow phenomena such as thermals [46], updrafts caused by temperature variations on the ground, or mountain waves [47, 48], which are large scale oscillations of the wind direction and magnitude behind large ridges. However, simulating these phenomena would require far more input data and ultimately we would still need to verify whether these simulations provide realistic flows that reflect the true airflow characteristics.

Wind Estimation

Our current wind sensing setup onboard the sUAV is prone to calibration errors and noise resulting in relatively large wind estimation errors. Alternative sensors, such as a five hole probe [49] paired with an improved calibration procedure or better sensor placement could improve the wind estimates.

IV Methods

IV-A Overview

We developed a pipeline (Fig. 1) to train and deploy \@iaciCNN CNN, WindSeer, that predicts the dense wind and turbulence around complex terrain. The network training consists of two steps: First we generated a dataset of dense flows over terrain patches from Switzerland using \@iaciRANS RANS CFD solver (Fig. 1 A)). We then trained WindSeer using the label flows to simulate local wind measurements along randomly generated piecewise-linear trajectories, robustifying the predictions by adding noise to the measurements along the trajectories (Fig. 1 B)). The trained WindSeer was evaluated on (i) held back CFD-simulated flows on previously unobserved terrains, (ii) real wind data gathered in measurement campaigns [9, 10, 12, 13, 14], and (iii) real wind data measured by multiple sUAV around mountainous terrain.

IV-B CFD wind data

We generated flow data over real terrain patches with a pipeline based on the open source solver OpenFOAM [50, 51] with the steady-state RANS model and the popular $k-\epsilon$ two-equation turbulence closure [52]. The automated pipeline ingests terrain patches and outputs the time-averaged flow solutions for multiple wind speeds [53]. We extracted 563 terrain patches each with an extent of \qtyproduct1.5 x 1.5\kilo from the GeoVite service, which provides access to the swissALTI3D digital elevation map (DEM) for Swiss researchers, with a lateral resolution of \qty0.5m²²2https://geovite.ethz.ch/, recently also available on ArcGIS:
https://elevation.arcgis.com/arcgis/rest/services/WorldElevation/Terrain/ImageServer. The terrain patches exhibit at least one side with near-constant elevation allowing us to simulate a formed boundary layer flow (logarithmic profile) entering into the domain from that face. Some terrains allowed for multiple flow directions leading to 866 terrain/flow direction pairs. The vertical extent of the simulation domain was three times the height difference of the terrain with a lower bound of \qty1100 minimizing the boundary effects on the flow. Each case was simulated with up to 15 different wind speeds if the automatic meshing succeeded, resulting in 7361 executed CFD runs. We initialized the subsequent simulation for the higher wind speed cases with the previous solution to speed up computation. Only solutions that met a required optimization tolerance were accepted as fully converged solutions, which was the case in \qty92.9 of the runs. We enhanced our dataset with one zero-velocity flow for each terrain that had at least one converged CFD simulation, resulting in a total of 7285 flows.

The CFD solutions are computed on an automatically generated irregular mesh with OpenFOAM’s SnappyHexMesh utility. We resampled each case up to a height of \qty1100 to a regular \numproduct91 x 91 x 96 grid resulting in a resolution of \qty16.5 horizontally and \qty11.5 vertically (Fig. 1 A)).

IV-C Data augmentation

Generating CFD flows is a computationally and labor-intensive task. For reference, our 7361 CFD runs required \qty9168 CPU compute time (\qty782 creating the meshes and \qty8386 solving the flow, average compute time: \qty1.25). Unfortunately, deep networks are notoriously data-hungry and, for a complex modeling problem such as wind prediction, would typically require orders of magnitude more training data to achieve good performance. In computer vision, image augmentation methods are widely used when training deep CNN [54]. These methods aim to improve the quality and size of the datasets when only limited data is available to prevent the networks from over-fitting. In this work, we showed, for the first time, that geometric transformations can be applied to CFD flows to augment the WindSeer training data.

We randomize the locations of terrain features and flow directions by generating $64^{3}$ subdomains sampled from each full \numproduct91 x 91 x 96 grid. The subdomains are constructed by sampling from a range of rotations and origin translation offsets inside the full domain. In a first step the horizontal shift and a rotation around the $z$ -axis are sampled from bounded uniform distributions to ensure that the shifted and rotated $64^{2}$ subdomain is fully contained within the full $91^{2}$ domain. Then in a second step the vertical shift is sampled from a triangle distribution with lower limit and mode of 0 and an upper limit of 32. Smaller vertical offsets are favoured to focus on the complex flow regions closer to the terrain. The flow data is linearly interpolated to the coordinates of the subdomain grid, which is the same spatial resolution as the full grid.

IV-D Input and label composition

The input to WindSeer consists of four volumetric channels, one of which corresponds to the terrain encoding $T$ created as a Euclidean distance transform with zeros inside the terrain. That representation propagates the terrain information over the full domain and even allows us to include terrain features outside of the domain if they are accounted for in the distance field calculation. The remaining three channels include the sparse and noisy horizontal wind measurements ( $U_{x,in},U_{y,in}$ ) and a binary mask $B$ indicating cells containing measurements. Previous work has shown the value of providing binary input masks to CNN handling sparse input data [55, 56].

Measurements from weather stations or realistic flight scenarios only cover a small percentage of the prediction volume along a connected path, e.g. a \qty30 flight segment with our sUAV covers approximately 20 cells at the default grid resolution. Consequently, for a practical onboard wind prediction scenario, we expect the available input wind data to be very sparse and thus construct our network input to reflect this sparsity. We create the input based on the augmented dense flow by creating a mask and then selecting the measurements based on the mask. We emulate the characteristics of \@iacisUAV sUAV flight path by filling the mask along sequential randomly-selected piecewise linear segments with a length of 3 to 500 cells.

Noise is added to the sampled wind data in order to account for fluctuations in the wind and sensor errors that are not captured by the RANS CFD simulations. Two types of disturbance are added, white Gaussian noise (sampled i.i.d. at each measurement from $\mathcal{N}\left(0,\sigma_{g}^{2}\right)$ ) and measurement bias (sampled from $\mathcal{U}\left(-0.1,0.1\right)$ and applied to all measurements). The first has the purpose of simulating noise due to sensor measurements [57], while the latter simulates the effects of sensor miscalibration. The standard deviation for the Gaussian noise $\sigma_{g}$ itself is drawn from a uniform distribution: $\sigma_{g}\sim\mathcal{U}\left(0,0.1\right)$ , simulating different noise levels. All the noise values are scaled with the mean wind velocity for each sample in the training set to have coherent noise levels from low to high velocity samples. Note that noise is only added to the training inputs and not to the CFD ground truth labels used to compute the network training losses.

The sparse input implies that for most cells in the input wind velocity channels ( $U_{x,in},U_{y,in}$ ) the values are undefined since they do not contain a measurement. We test and evaluate two approaches to filling the missing information. The first naïve approach simply places zeros in all voxels without a measurement. This results in large gradients of the input for high magnitude wind. The second approach uses the per-channel-average of all measurements as the fill value resulting in a smoother input and propagating the information over the whole domain.

The labels are constructed by stacking the four volumetric channels corresponding to the three-dimensional predicted velocity ( $U_{x,out},U_{y,out},U_{z,out}$ ) as well as the TKE at each cell from the CFD ground truth flows.

IV-E Model training

The wind prediction model is an encoder-decoder CNN with skip connections based on the U-Net architecture [58]. The WindSeer encoder is composed of single 3D convolutions with kernel size 3 and reflection padding to preserve the size. Using skip connections, the information at each depth is relayed to the decoder before utilizing a max-pooling layer with kernel size of 2 to down-sample the feature map. The original domain size is restored by pairing the information from the skip connection with a nearest-neighbor up-sampled feature map followed by two 3D convolutions with kernel size 4. This decoder structure removes checkerboard artifacts sometimes experienced when using a decoder-encoder CNN [59]. Each convolution, except the final one, is followed by the nonlinear ReLu layer with negative slope 0.1 [60].

A majority of the cells in the wind speed input channels contain no measurements. Our first approach was to set the values of all these cells to zero. However, this resulted in a network overfitting to the number of observed cells and did not generalize to larger domains and different resolutions as demonstrated in Appendix C. In the end, we ended up filling the unobserved cells with the average of all measurements per channel which results in a smoother input and helps to propagate the information over the full domain.

A scaled version of the mean squared error (MSE) loss is applied to train the model balancing the loss $L(\cdot)$ between the samples and channels:

L\left(X,Y,N\right)=\frac{1}{N}\sum_{c}\left(\frac{X_{c}-Y_{c}}{\hat{Y}_{c}}% \right)^{2},

(1)

where $X$ is the network prediction, $Y$ the label flow, $\hat{Y}_{c}$ the label average per channel of the non-terrain cells, and $N$ the number of non-terrain cells. Normalizing the error by the average label value balances the loss for flows of different magnitudes. Without accounting for the number of terrain cells in the loss, a sample with a high ratio of terrain cells would not contribute much to the overall loss. Thus, scaling according to $N$ prevents these cases from being underrepresented in the training.

The model is trained using the Adam optimizer [61] for 3000 epochs. The initial learning rate of $1.0\times 10^{-5}$ is quartered every 700 epochs.

IV-F Measurement campaign datasets

Each of the three measurement campaign datasets that we used for evaluation are publicly available but require some preprocessing to enable direct comparison with our wind prediction outputs. We convert the data from the different file formats for each measurement campaign to the same gridded format that we use to store the CFD solutions. Each experiment provides terrain data as well as wind measurements collected using static masts equipped with airflow sensors at various heights. The terrain is discretized by querying the raw data using bilinear interpolation in the center of the respective cell. The location of each measurement is converted into the cell coordinates.

WindSeer predicts the wind using the measurements from one mast. The measurements are filled into the corresponding cell and averaged in case of multiple measurements in one cell. The predictions, which are obtained with trilinear interpolation at the sensing locations, are then compared to the measured wind. CNN allow for variable input sizes, a trait we exploit to predict at a higher spatial resolution for domains with smaller length scales (see Bolund Hill below) at an increased domain size of $384\times 384\times 192$ cells. Since the terrain is represented as a Euclidean distance field, this gives WindSeer a sense of the grid resolution and thus the scale of the flow, enabling us to predict the wind at different scales.

The error bars for Fig. 5 are calculated using error propagation from the standard deviations of each axis if not reported for the magnitude [62].

Bolund hill

The data for the Bolund hill experiment containing time-averaged wind velocities and TKE measurements is publicly available³³3https://www.bolund.vindenergi.dtu.dk/blind_comparison. As Bolund hill exhibits only a small elevation change of \qty11, the default prediction resolution is not sufficient to account for its near-ground measurement locations. As mentioned above, we exploit the multi-scale property of WindSeer and increase the resolution of the prediction grid thirty-fold resulting in a domain $\qty{\sim 211}{}\times\qty{\sim 211}{}$ wide and \qty∼73m tall, giving a corresponding horizontal resolution of \qty0.55 and vertical resolution of \qty0.38.

Askervein hill

While a digitized version of the Askervein hill topography is available⁴⁴4https://zenodo.org/record/4095052 the wind and TKE measurements had to be manually extracted from the field report [12]. We selected 13 runs measuring the turbulent wind, where the data from most towers is provided (in certain runs data is not reported for all towers). The measurements are averaged over one- to four-hour intervals with varying flow magnitudes and directions. The domain size of $\qty{1584}{}\times\qty{1584}{}$ wide and \qty552m tall results in a four-fold resolution increase.

Perdigão

The Perdigão dataset consists of multiple measurement posts of different heights ranging from \qty10m up to \qty100m across the valley or along the ridges⁵⁵5https://perdigao.fe.up.pt/. We used the five minute averages and tilt corrected measurements that were recorded throughout the measurement campaign and we consider data from six different days in our evaluation. The tower positions were not stored with sufficient precision in the dataset requiring us to manually correct the positions. We extracted the topography of the hills from the World Elevation Terrain layer provided by Esri using ArcGIS ⁶⁶6https://www.arcgis.com/home/item.html?id=58a541efc59545e6b7137f961d7de883. Perdigão required the largest prediction domain size, $\qty{3168}{m}\times\qty{3168}{m}$ wide and \qty1104m tall, showcasing the wind prediction performance at double the original resolution.

IV-G Inference time experiments setup

We ran the inference time experiments on an Orin AGX⁷⁷7https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/, a low power, light weight (\qty623 including the carrier board and heatsink) and small scale (\qtyproduct105 x 105 x 60\milli) single-board computer that can be carried by a small scale sUAV. We set up the Orin AGX with the Jetpack 5.1 software kit that includes CUDA 11.4 and cuDNN 8.6.0 and installed PyTorch 2.0. During the evaluation we ran the Orin in the maximum power mode (\qty60) using all 12 CPU cores.

IV-H sUAV flight tests

We used three Multiplex EasyGlider4 airframes equipped with the Pixhawk 4 autopilot [63] using the high quality ADIS16448 inertial measurement unit (IMU) and the u-blox M9N GNSS module for autonomous navigation. We configured the main height source of our modified PX4 autopilot [64] to the GPS height and use the barometric pressure as a fallback. An extension to the guidance law adjusting the airspeed ensured safety during strong wind conditions [5]. We used a custom designed pitot tube with the Sensirion SDP31 differential pressure sensor and Hall sensor airflow vanes to enable measuring the 3D wind vector. Refer to Section D for more details about the airflow sensing setup and calibration procedure. We used a ground station computer with QGroundControl to control and navigate the sUAV. While the default PX4 state estimator could be extended to estimate the 3D wind we opted for an offline flight path reconstruction (FPR) pipeline using an iterated extended Kalman filter (see a similar problem definition in [65]). The offline FPR pipeline allowed us to generate high quality estimates for validating our approach and to adjust the estimation pipeline post flight.

We gathered wind data from flights at three test sites in Switzerland. The first test site at Chasseral is one of the most topographically isolated mountains in Switzerland and is located in the Jura mountains (47° 07’ 38” N, 7° 02’ 47” E, \qty1548 above mean sea level (AMSL)). The other test sites are located on the ridges of the Oberalppass (46° 39’ 24” N, 8° 40’ 21” E, \qty2069 AMSL) and Gotthardpass (46° 34’ 17” N, 8° 33’ 33” E, \qty1960 AMSL) in the Central Swiss Alps. These were chosen to evaluate the prediction performance for domains surrounded by complex terrain. The spatial constraints allowed for two sUAV to simultaneously collect wind data at Oberalppass and Gotthardpass and three sUAVs at Chasseral. The sUAV were flown simultaneously in circular loiter patterns with a radius of \qty100m leading to lateral separation between the planes of up to \qty800m and measurements in different flow regimes. We planned the flights based on NWP forecasts ensuring good flight (no precipitation, fog or clouds) and stable wind conditions (wind magnitude below the cruise speed of \qty10\per, direction and magnitude near-constant over multiple hours).

We use two modes to convert the raw wind estimates to the WindSeer input. In the first mode we generated the input by averaging the measurements over one loiter pattern to generate a single wind estimate. This time-averaged data, similar to the averaged data in the measurement campaigns with static masts, helped to reduce the noise and sensitivity to sensor calibration on the input measurements. The observations from the different sUAV and multiple loiters enabled us to compute the average prediction error and the correlation for each flight experiment to see if the flow trends are well predicted. In the second mode the input is composed of the wind data from a \qty120 window of one sUAV to predict the wind along the flight path within the next \qty120 window for both the input (itself) and the validation sUAV. This sequential time-windowed setup allowed us to qualitatively evaluate the WindSeer performance along the flight paths. A high-resolution elevation map provided by SwissTopo [31] was used to construct the terrain for the WindSeer input.

Acknowledgments

We would like to thank our drone pilots David Rohr, Jaeyoung Lim, Jonas Langenegger, Tizian Steiger, and Yves Allenspach for ensuring the safety of our drones during the experiments. The arduous task of setting up the drones and the respective sensing devices was supported by Himmet Kaplan, Jonas Langenegger, Michael Riner, Thomas Mantel, Tizian Steiger, and Yves Allenspach. Last we would like to thank Alexey Dosovitskiy, Debadeepta Dey, and René Ranftl for their valuable inputs throughout the project.

This project was funded, in part, by Microsoft Swiss Joint Research Center under Contract No. 2019-038 “Project Altair: Infrared Vision and AI Decision-Making for Longer Drone Flights”, the Intel Network on Intelligent Systems, and by the ETH Research Grant AvalMapper ETH-10 20-1.

The model training and evaluations were performed on the ETH Zürich Euler computing cluster.

Declarations

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The synthetic CFD training data, as well as the sUAV flight measurements will be made available through our dataset server: https://projects.asl.ethz.ch/datasets/doku.php?id=nmi_23_windseer.

Code availability

The code is available on github: https://github.com/ethz-asl/WindSeer

References

[1] J. Mattuella, A. Loredo-Souza, M. Oliveira, and A. Petry, “Wind tunnel experimental analysis of a complex terrain micrositing,” Renewable and Sustainable Energy Reviews, vol. 54, pp. 110–119, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1364032115010588
[2] M. Belo-Pereira and J. A. Santos, “Air-traffic restrictions at the madeira international airport due to adverse winds: Links to synoptic-scale patterns and orographic effects,” Atmosphere, vol. 11, no. 11, 2020. [Online]. Available: https://www.mdpi.com/2073-4433/11/11/1257
[3] T. Schlegel, M. Geissmann, M. Hertach, and D. Kröpfli, “Windatlas Schweiz: Jahresmittel der modellierten windgeschwindigkeit und windrichtung,” Federal Department of Environment, Transport, Energy and Communications (UVEK), Tech. Rep. COO.2207.110.2.1073455, 2016.
[4] P. Oettershagen, A. Melzer, T. Mantel, K. Rudin, R. Lotz, D. Siebenmann, S. Leutenegger, K. Alexis, and R. Siegwart, “A solar-powered hand-launchable uav for low-altitude multi-day continuous flight,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 3986–3993.
[5] T. Stastny and R. Siegwart, “On flying backwards: Preventing run-away of small, low-speed, fixed-wing uavs in strong winds,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, pp. 5198–5205.
[6] A. Chakrabarty and J. Langelaan, “Uav flight path planning in time varying complex wind-fields,” in 2013 American Control Conference, 2013, pp. 2568–2574.
[7] R. Buizza, “Chaos and weather prediction,” 2002. [Online]. Available: https://www.ecmwf.int/node/16927
[8] A. Voudouri, P. Khain, I. Carmona, E. Avgoustoglou, P. Kaufmann, F. Grazzini, and J. Bettems, “Optimization of high resolution cosmo model performance over switzerland and northern italy,” Atmospheric Research, vol. 213, pp. 70–85, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0169809517307305
[9] J. Berg, J. Mann, A. Bechmann, M. Courtney, and H. E. Jørgensen, “The Bolund experiment, part I: flow over a steep, three-dimensional hill,” Boundary-layer meteorology, vol. 141, no. 2, p. 219, 2011.
[10] A. Bechmann, N. N. Sørensen, J. Berg, J. Mann, and P.-E. Réthoré, “The Bolund experiment, part II: blind comparison of microscale flow models,” Boundary-Layer Meteorology, vol. 141, no. 2, p. 245, 2011.
[11] N. Vasiljević, G. Lea, M. Courtney, J.-P. Cariou, J. Mann, and T. Mikkelsen, “Long-range windscanner system,” Remote Sensing, vol. 8, no. 11, 2016. [Online]. Available: https://www.mdpi.com/2072-4292/8/11/896
[12] P. Taylor and H. Teunissen, Askervein’82: Report on the September/October 1982 Experiment to Study Boundary Layer Flow over Askervein, South Uist. Meteorological Services Research Branch, Atmospheric Environment Service, 1983.
[13] P. A. Taylor and H. W. Teunissen, “The askervein hill project: Overview and background data,” Boundary-Layer Meteorology, vol. 39, no. 15, 1987.
[14] H. J. S. Fernando, J. Mann, J. M. L. M. Palma, J. K. Lundquist, R. J. Barthelmie, M. Belo-Pereira, W. O. J. Brown, F. K. Chow, T. Gerz, C. M. Hocut, P. M. Klein, L. S. Leo, J. C. Matos, S. P. Oncley, S. C. Pryor, L. Bariteau, T. M. Bell, N. Bodini, M. B. Carney, M. S. Courtney, E. D. Creegan, R. Dimitrova, S. Gomes, M. Hagen, J. O. Hyde, S. Kigle, R. Krishnamurthy, J. C. Lopes, L. Mazzaro, J. M. T. Neher, R. Menke, P. Murphy, L. Oswald, S. Otarola-Bustos, A. K. Pattantyus, C. V. Rodrigues, A. Schady, N. Sirin, S. Spuler, E. Svensson, J. Tomaszewski, D. D. Turner, L. van Veen, N. Vasiljević, D. Vassallo, S. Voss, N. Wildmann, and Y. Wang, “The perdigão: Peering into microscale details of mountain winds,” Bulletin of the American Meteorological Society, vol. 100, no. 5, pp. 799 – 819, 2019.
[15] Y. Xie, E. Franz, M. Chu, and N. Thuerey, “tempoGAN: A temporally coherent, volumetric GAN for super-resolution fluid flow,” ACM Transactions on Graphics (TOG), vol. 37, no. 4, p. 95, 2018.
[16] B. Kim, V. C. Azevedo, N. Thuerey, T. Kim, M. Gross, and B. Solenthaler, “Deep fluids: A generative network for parameterized fluid simulations,” Computer Graphics Forum, vol. 38, no. 2, pp. 59–70, 2019. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/cgf.13619
[17] M. D. Ribeiro, A. Rehman, S. Ahmed, and A. Dengel, “Deepcfd: Efficient steady-state laminar flow approximation with deep convolutional neural networks,” arXiv preprint arXiv:2004.08826, 2020.
[18] S. Bhatnagar, Y. Afshar, S. Pan, K. Duraisamy, and S. Kaushik, “Prediction of aerodynamic flow fields using convolutional neural networks,” Computational Mechanics, vol. 64, no. 2, pp. 525–545, 2019.
[19] N. Umetani and B. Bickel, “Learning three-dimensional flow for interactive aerodynamic design,” ACM Trans. Graph., vol. 37, no. 4, pp. 89:1–89:10, Jul. 2018. [Online]. Available: http://doi.acm.org/10.1145/3197517.3201325
[20] P. Baqué, E. Remelli, F. Fleuret, and P. Fua, “Geodesic convolutional shape optimization,” arXiv preprint arXiv:1802.04016, 2018.
[21] T.-T.-H. Le, H. Kang, and H. Kim, “Towards incompressible laminar flow estimation based on interpolated feature generation and deep learning,” Sustainability, vol. 14, no. 19, p. 11996, 2022.
[22] A. Güemes, C. Sanmiguel Vila, and S. Discetti, “Super-resolution generative adversarial networks of randomly-seeded fields,” Nature Machine Intelligence, vol. 4, no. 12, pp. 1165–1173, 2022.
[23] K. Fukami, R. Maulik, N. Ramachandra, K. Fukagata, and K. Taira, “Global field reconstruction from sparse sensors with voronoi tessellation-assisted deep learning,” Nature Machine Intelligence, vol. 3, no. 11, pp. 945–951, 2021.
[24] K. Fukami, K. Fukagata, and K. Taira, “Super-resolution analysis via machine learning: a survey for fluid flows,” Theoretical and Computational Fluid Dynamics, pp. 1–24, 2023.
[25] Z. Yang, H. Yang, and Z. Yin, “Super-resolution reconstruction for the three-dimensional turbulence flows with a back-projection network,” Physics of Fluids, vol. 35, no. 5, p. 055123, 05 2023.
[26] N. Wandel, M. Weinmann, and R. Klein, “Unsupervised deep learning of incompressible fluid dynamics,” CoRR, vol. abs/2006.08762, 2020. [Online]. Available: https://arxiv.org/abs/2006.08762
[27] J. Pathak, S. Subramanian, P. Harrington, S. Raja, A. Chattopadhyay, M. Mardani, T. Kurth, D. Hall, Z. Li, K. Azizzadenesheli, P. Hassanzadeh, K. Kashinath, and A. Anandkumar, “Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators,” arXiv preprint arXiv:2202.11214, 2022.
[28] K. Bi, L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, “Pangu-weather: A 3d high-resolution model for fast and accurate global weather forecast,” arXiv preprint arXiv:2211.02556, 2022.
[29] R. Lam, A. Sanchez-Gonzalez, M. Willson, P. Wirnsberger, M. Fortunato, A. Pritzel, S. Ravuri, T. Ewalds, F. Alet, Z. Eaton-Rosen et al., “Graphcast: Learning skillful medium-range global weather forecasting,” arXiv preprint arXiv:2212.12794, 2022.
[30] USGS, “3d elevation program,” https://www.usgs.gov/3d-elevation-program, Dec 2021.
[31] SwissTopo, “3d elevation program,” https://www.geo.admin.ch/en/geo-information-switzerland/geodata-index-inspire/surface-representation/elevation.html, Dec 2021.
[32] P. Oettershagen, A. Melzer, T. Mantel, K. Rudin, T. Stastny, B. Wawrzacz, T. Hinzmann, S. Leutenegger, K. Alexis, and R. Siegwart, “Design of small hand-launched solar-powered uavs: From concept study to a multi-day world endurance record flight,” Journal of Field Robotics, vol. 34, no. 7, pp. 1352–1377, 2017. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/rob.21717
[33] S. L. Ellis, M. L. Taylor, M. Schiele, and T. B. Letessier, “Influence of altitude on tropical marine habitat classification using imagery from fixed-wing, water-landing uavs,” Remote Sensing in Ecology and Conservation, vol. 7, no. 1, pp. 50–63, 2021. [Online]. Available: https://zslpublications.onlinelibrary.wiley.com/doi/abs/10.1002/rse2.160
[34] E. J. Lantz, J. O. Roberts, J. Nunemaker, E. DeMeo, K. L. Dykes, and G. N. Scott, “Increasing wind turbine tower heights: Opportunities and challenges,” National Renewable Energy Lab.(NREL), Golden, CO (United States), Tech. Rep., 2019.
[35] H. M. P. C. Jayaweera and S. Hanoun, “Path planning of unmanned aerial vehicles (uavs) in windy environments,” Drones, vol. 6, no. 5, p. 101, Apr 2022.
[36] M. Coombes, W.-H. Chen, and C. Liu, “Flight testing boustrophedon coverage path planning for fixed wing uavs in wind,” in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 711–717.
[37] P. Oettershagen, J. Förster, L. Wirth, G. Hitz, R. Siegwart, and J. Ambühl, “Meteorology-aware multi-goal path planning for large-scale inspection missions with solar-powered aircraft,” Journal of Aerospace Information Systems, vol. 16, no. 10, pp. 390–408, 2019.
[38] A. Kashefi, D. Rempe, and L. J. Guibas, “A point-cloud deep learning framework for prediction of fluid flow fields on irregular geometries,” Physics of Fluids, vol. 33, no. 2, p. 027104, 2021.
[39] J. Zhang and X. Zhao, “Machine-learning-based surrogate modeling of aerodynamic flow around distributed structures,” AIAA Journal, vol. 59, no. 3, pp. 868–879, 2021.
[40] F. Ma and S. Karaman, “Sparse-to-dense: Depth prediction from sparse depth samples and a single image,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 4796–4803.
[41] M. Jaritz, R. de Charette, E. Wirbel, X. Perrotton, and F. Nashashibi, “Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation,” arXiv e-prints, p. arXiv:1808.00769, Aug. 2018.
[42] K. Lu, N. Barnes, S. Anwar, and L. Zheng, “From depth what can you see? depth completion via auxiliary image reconstruction,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11 303–11 312.
[43] Z. Huang, J. Fan, S. Cheng, S. Yi, X. Wang, and H. Li, “HMS-Net: Hierarchical Multi-scale Sparsity-invariant Network for Sparse Depth Completion,” arXiv e-prints, p. arXiv:1808.08685, Aug. 2018.
[44] W. C. Skamarock, J. B. Klemp, J. Dudhia, D. O. Gill, Z. Liu, J. Berner, W. Wang, J. G. Powers, M. G. Duda, D. M. Barker et al., “A description of the advanced research wrf model version 4,” National Center for Atmospheric Research: Boulder, CO, USA, vol. 145, no. 145, p. 550, 2019.
[45] B. Blocken, “50 years of computational wind engineering: Past, present and future,” Journal of Wind Engineering and Industrial Aerodynamics, vol. 129, pp. 69–102, 2014.
[46] Z. Akos, M. Nagy, S. Leven, and T. Vicsek, “Thermal soaring flight of birds and unmanned aerial vehicles,” Bioinspiration & Biomimetics, vol. 5, no. 4, 2010.
[47] D. R. Durran, Mountain Waves. Boston, MA: American Meteorological Society, 1986, pp. 472–492.
[48] C.-Y. Chang, J. Schmidt, M. Dörenkämper, and B. Stoevesandt, “A consistent steady state cfd simulation method for stratified atmospheric boundary layer flows,” Journal of Wind Engineering and Industrial Aerodynamics, vol. 172, pp. 55–67, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167610517305135
[49] L. Sankaralingam and C. Ramprasadh, “Angle of attack measurement using low-cost 3d printed five hole probe for uav applications,” Measurement, vol. 168, p. 108379, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0263224120309155
[50] H. G. Weller, G. Tabor, H. Jasak, and C. Fureby, “A tensorial approach to computational continuum mechanics using object-oriented techniques,” Computers in Physics, vol. 12, no. 6, pp. 620–631, 1998.
[51] H. Jasak, A. Jemcov, v. Tuković et al., “OpenFOAM: A C++ library for complex physics simulations,” in International workshop on coupled methods in numerical dynamics. IUC Dubrovnik, Croatia, 2007, pp. 1–20.
[52] B. E. Launder and B. Sharma, “Application of the energy-dissipation model of turbulence to the calculation of flow near a spinning disc,” Letters in heat and mass transfer, vol. 1, no. 2, pp. 131–137, 1974.
[53] F. Achermann, N. R. J. Lawrance, R. Ranftl, A. Dosovitskiy, J. J. Chung, and R. Siegwart, “Learning to predict the wind for safe aerial vehicle planning,” in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 2311–2317.
[54] C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, 2019. [Online]. Available: https://doi.org/10.1186/s40537-019-0197-0
[55] R. Köhler, C. Schuler, B. Schölkopf, and S. Harmeling, “Mask-specific inpainting with deep neural networks,” 09 2014, pp. 523–534.
[56] J. Uhrig, N. Schneider, L. Schneider, U. Franke, T. Brox, and A. Geiger, “Sparsity invariant cnns,” in 2017 International Conference on 3D Vision (3DV), 2017, pp. 11–20.
[57] K. Nirmal, A. G. Sreejith, J. Mathew, M. Sarpotdar, A. Suresh, A. Prakash, M. Safonova, and J. Murthy, “Noise modeling and analysis of an IMU-based attitude sensor: improvement of performance by filtering and sensor fusion,” arXiv e-prints, p. arXiv:1608.07053, Aug. 2016.
[58] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
[59] A. Odena, V. Dumoulin, and C. Olah, “Deconvolution and checkerboard artifacts,” Distill, 2016. [Online]. Available: http://distill.pub/2016/deconv-checkerboard
[60] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in in ICML Workshop on Deep Learning for Audio, Speech and Language Processing, 2013.
[61] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference for Learning Representations, 2015.
[62] H. Ku, “Notes on the use of propagation of error formulas,” Journal of Research of the National Bureau of Standards. Section C: Engineering and Instrumentation, vol. 70C, no. 4, pp. 263–273, 1966.
[63] L. Meier, P. Tanskanen, F. Fraundorfer, and M. Pollefeys, “Pixhawk: A system for autonomous flight using onboard computer vision,” in 2011 IEEE International Conference on Robotics and Automation. IEEE, 2011, pp. 2992–2997.
[64] L. Meier, D. Honegger, and M. Pollefeys, “Px4: A node-based multithreaded open source robotics framework for deeply embedded platforms,” in 2015 IEEE international conference on robotics and automation (ICRA). IEEE, 2015, pp. 6235–6240.
[65] J. A. Mulder, Q. P. Chu, J. K. Sridhar, J. H. Breeman, and M. Laban, “Non-linear aircraft flight path reconstruction review and new advances,” Progress in Aerospace Sciences, vol. 35, no. 7, pp. 673–726, Oct. 1999.
[66] D. A. Hastings, P. K. Dunbar, G. M. Elphingstone, M. Bootz, H. Murakami, H. Maruyama, H. Masaharu, P. Holland, J. Payne, N. A. Bryant et al., “The global land one-kilometer base elevation (globe) digital elevation model, version 1.0,” National Oceanic and Atmospheric Administration, National Geophysical Data Center, vol. 325, pp. 80 305–3328, 1999.
[67] G.-A. Heinrich, S. Vogt, N. R. J. Lawrance, T. J. Stastny, and R. Y. Siegwart, “In-wing pressure measurements for airspeed and airflow angle estimation and high angle-of-attack flight,” Journal of Guidance, Control, and Dynamics, vol. 0, no. 0, pp. 1–13, 2021. [Online]. Available: https://doi.org/10.2514/1.G006412
[68] C. Ejeh, I. Afgan, R. Shittu, A. Sakirudeen, and P. Anumah, “Investigating the impact of velocity fluctuations and compressibility to aerodynamic efficiency of a fixed-wing aircraft,” Results in Physics, vol. 18, p. 103263, 2020.
[69] S. E. Sayed Ahmed, E. Z. Ibrahiem, O. M. Mesalhy, and M. A. Abdelatief, “Effect of attack and cone angels on air flow characteristics for staggered wing shaped tubes bundle,” Heat and Mass Transfer, vol. 51, pp. 1001–1016, 2015.

Appendix A Wind characteristics terminology

The complex wind around terrain has some typical flow regions and we want to establish common terms for some of these regions as shown in Fig. 6. The side of the terrain where the wind direction points toward the hill is called the upwind side since it typically exhibits mostly rising winds. At the highest point of the terrain (hill top) the wind speeds up and higher wind magnitudes can be measured. The lee side of the terrain/hill describes the region where the wind direction typically points away from the hill top. This region is highly turbulent and can form multiple modes with completely different characteristics. Under certain conditions the flow can follow the terrain downhill resulting in prevailing downwinds. In other conditions a lee side rotor can form, where the wind follows a circular motion close to the terrain behind the hill. At a larger scale of multiple kilometers and under very specific conditions, mountain waves with multiple rotors can form [47].

Appendix B NWP data as WindSeer input

Our initial hypothesis was to train WindSeer based on the known high-resolution terrain and predictions from large scale NWP. The Swiss COSMO 1 model provides predictions with a horizontal resolution of \qty1.1\kilo [8]. The elevation data used in the NWP models, such as GLOBE [66], is an aggregation of available high-resolution terrain sources (usually the mean or median of the high-resolution data within one cell). This smoothed topography representation neglects smaller scale terrain features and therefore only provides meaningful results at a scale of multiple cells/kilometers.

We conducted a test flight to evaluate how well the NWP of one cell matches the wind measured by \@iacisUAV sUAV. We measured the wind at one grid point of the Swiss COSMO-1 model⁸⁸8https://www.meteoschweiz.admin.ch/home/mess-und-prognosesysteme/warn-und-prognosesysteme/cosmo-prognosesysteme.html close to Flüelen (46° 53’ 33” N, 8° 36’ 45” E, \qty436 AMSL). While Flüelen is located within the Swiss Alps, this particular test site is bordered on one side by a lake and surrounded by flat and smooth terrain within a \qty1\kilo radius resulting in a good match between the NWP terrain model and the high-resolution terrain.

As visible in Fig. 7 A), the COSMO-1 NWP poorly represents the sUAV data for both the magnitude and wind direction. Obviously in different conditions the NWP might fit the measurements better. However, this implies that, depending on the case, the NWP may or may not be accurate. Thus, WindSeer needs another, more reliable source for its wind prediction data. In addition, the coarse representation of the terrain can result in large altitude offsets of the NWP compared to the actual terrain in the presence of large elevation changes, see Fig. 7 B).

The NWP data may provide supplemental information to WindSeer if used together with the sparse measurements. However, first the mapping between the NWP data to the actual flow needs to be established. This would be a highly data-driven task, and if that connection is too noisy, WindSeer might learn to ignore the NWP input data altogether.

Appendix C WindSeer ablation study

We evaluated the effect of varying certain hyperparameters in the training pipeline on the model performance on a test set of previously unobserved CFD samples. The baseline model parameters are shown in Tab. II, note that these parameters are different from the finalized WindSeer version. We used the average error norm over all non-terrain cells averaged over all samples in the test set as our metric to compare the models.

TABLE II: Baseline hyperparameter set used in the ablation study.

Hyperparameter	Value
learning rate	$1.0\times 10^{-5}$
learning rate decay	0.25 every 700th epoch
learning epochs	1500
learning batch size	35
max Gaussian noise std	\qty0
max bias magnitude	\qty0
trajectory min length	3 cells
trajectory max length	50 cells
model depth	4
pooling method	strided convolution
input no measurement value	mean
input use $u_{z}$	true

Models trained with different pooling methods (max-pooling (MP), average-pooling (AP), convolution with strides) perform comparably with a slight edge for the pooling methods over the convolution with strides (\qty1.1 error reduction). The model using only the horizontal wind measurement (NUZ) outperforms the baseline (BL) model, which uses the vertical measurements as well, by \qty2.6. We also varied the input trajectory lengths up to a length of 500 cells (LT). Networks trained on longer trajectories perform \qty13.6 better even if they are evaluated exclusively on short trajectories with lengths of up to 50 cells.

Input noise ablation study

Realistic wind measurements are subject to noise. We model the sensor noise with a zero-mean Gaussian distribution and the sensor miscalibration with a constant bias. We evaluated the robustness of the model to different levels of such noisy input. Doing so we trained multiple models (BL architecture) with varying levels of input noise. We varied the standard deviation of the Gaussian noise between \qty0 and \qty80 of the average flow magnitude of the respective sample; we varied the bias between \qty0 and \qty50 of the flow magnitude. We then evaluated the models in two ways: First we evaluated them on the test set with the same noise distribution they observed during training. Since this is not a fair comparison, as predicting with high-noise levels is more difficult than low-noise data, we also evaluated all models against perfect data (no noise added). The results of the experiment are displayed in Fig. 8 A). In general, higher input noise indicates higher prediction errors, but up to a level of \qty10 Gaussian noise and bias we observed similar errors. When evaluating the models on the perfect input we can see that the low-level noise models (up to \qty10 bias and \qty30 Gaussian noise) perform comparably to the baseline model trained without noise (Fig. 8 B)). Thus training models with a too high noise level will also negatively impact the performance when they are provided with perfect input data.

CFD prediction results of different WindSeer variants

In our evaluation we considered six variations of WindSeer [ZD4, ZD6, AD4, AD6, VD4, VD6] by varying the fill value and network depth. The fill indicator (Z, A, V) indicates how the wind speed input channels are filled for the cells with no measurements. We tested fill values of: zero (Z), the average of all measurements per channel (A), and the Voronoi tessellation presented in [23] (V) (essentially the nearest measurement value). The network depth indicates the number of pooling/upsampling layers in the encoder/decoder of WindSeer and we evaluated depths of four (D4) and six (D6) resulting in receptive field sizes of 175 and 703 respectively. The models were trained using the Adam optimizer [61] for 3000 epochs except for AD6, where the model after 1000 epochs was chosen as further training showed increasing validation loss, suggesting over-fitting.

We used the same input noise distribution as observed during training (Gaussian noise and random bias). Fig. 8 C) shows the distribution of the relative velocity magnitude and TKE prediction errors over the full flow domain on the left side (blue) and excluding the lowest four cells above the terrain on the right side (green). These latter results (equivalent to only scoring the network output above an altitude of \qty46m) illustrate the predictive performance for realistic sUAV flight regimes. There, all WindSeer variants produced more accurate wind velocity predictions (median error reduction AD4: \qty11.1 to \qty8.3, AD6: \qty12.0 to \qty8.9, ZD4: \qty12.2 to \qty9.0, ZD6: \qty11.0 to \qty8.1, VD4: \qty12.1 to \qty8.0, VD6: \qty11.9 to \qty7.6). In contrast to the velocity errors, the TKE predictions do not significantly change on the reduced prediction volume since the computed TKE values close to the terrain tend to be smoother than the velocity values, thus suffering less from resolution differences between WindSeer and the CFD simulations. All the WindSeer variants result in a similar median between \qty10.8 to \qty11.9. Depending on the metric different models perform best. The averaging input models score the lowest mean error while the Voronoi variants yield the lowest median error. The zero-fill variants are most consistent with the lowest 75th percentile.

Overall, as evident in Fig. 8 C), there is no significant performance difference between the metrics of the WindSeer variants. However, a qualitative assessment of the predicted wind fields reveals that the Voronoi tessellation models (VD4 and VD6) show strong artifacts along the partition borders in certain cases. In contrast, the other WindSeer variants either do not exhibit such artifacts or are effected at a much smaller scale. These artifacts are a result of the noisy measurements being close to each other resulting in large differences between the Voronoi partitions. In Fig. 9 we show one such case and the resulting predictions for the VD4 and AD4 WindSeer variants. These results indicate that while Voronoi tessellation has been shown to work with sparse input data for flow prediction [23], in our setting with highly noisy measurements from only a small sub-region of the domain, this input modality can result in predictions containing artifacts. Thus, we further evaluate the artifact-free WindSeer variants on the real wind data (AD4, AD6, ZD4, ZD6).

Measurement campaign results of different WindSeer variants

We evaluated the performance of the different WindSeer variants on the data from the Bolund, Askervein, and Perdigão data and report the prediction error and correlation averaged for certain wind cases, as in Tab. I. The changes in the prediction grid as outlined in Sec. IV-F resulted in higher sparsity levels in the range of \qty3.5e-6 to \qty3.2e-5 compared to the training density of \qty1.1e-3 to \qty0.19. The model variants using average-filling (AD4, AD6) could generalize to this much sparser input data in contrast to the zero-fill models (ZD4, ZD6), which severely underpredicted the wind regardless of the measurement values. Thus, we only compared the two performant WindSeer variants against an averaging baseline (AVG) that assumes the wind and TKE are constant and predicts the average of all measurements over the full domain and report the prediction errors and correlations in Tab. III. The AD4 variant resulted in better wind magnitude predictions, while the AD6 predicted the vertical wind better. The TKE is predicted with a lower error with the AD6 variant but also with lower correlation values compared to AD4. Overall, in most cases both WindSeer variants performed similarly to each other, explaining the small difference in the averaged error over all cases for the three metrics. We chose the AD4 variant as our WindSeer model as it did not show the over-fitting during training.

		Error S $[m/s]$			Error W $[m/s]$			Error TKE $[m^{2}/s^{2}]$
Terrain	Case	AVG	AD4	AD6	AVG	AD4	AD6	AVG	AD4	AD6
Bolund	90	1.80	1.58	1.61	0.85	0.58	0.62	1.91	1.33	1.19
	239	2.80	2.50	2.49	0.66	0.34	0.37	2.67	1.68	1.66
	255	3.24	2.47	2.61	0.85	0.44	0.46	3.43	2.12	2.06
	270	3.77	2.79	2.94	0.95	0.51	0.48	5.14	3.43	3.35
Askervein	TU25	2.58	2.39	2.44	1.10	0.37	0.31	0.61	0.41	0.34
	TU30A	1.14	0.98	1.20	0.41	0.26	0.25	1.38	0.72	0.54
	TU30B	1.80	1.46	2.13	0.51	0.41	0.39	2.82	1.40	1.18
	TU01A	3.26	2.83	3.21	1.41	0.52	0.46	1.89	1.06	0.96
	TU01B	3.24	2.74	3.12	1.37	0.48	0.42	1.64	0.98	0.87
	TU01C	3.55	3.08	3.42	1.23	0.45	0.41	1.17	0.72	0.68
	TU01D	4.21	3.71	4.04	1.26	0.47	0.42	1.62	1.17	0.99
	TU03A	5.29	4.70	5.00	1.74	0.64	0.55	2.04	1.31	1.15
	TU03B	4.90	4.41	4.70	1.54	0.54	0.46	1.82	1.21	1.04
	TU05A	1.91	1.89	1.92	0.76	0.31	0.27	1.73	0.99	0.76
	TU05B	1.18	1.00	1.12	0.31	0.26	0.28	1.40	0.63	0.51
	TU05C	0.93	0.93	1.01	0.34	0.25	0.25	1.09	0.43	0.39
	TU07B	3.42	3.27	3.18	1.59	0.49	0.41	2.44	1.85	1.43
Perdigão 2017-05-09	13:30-13:35	2.91	2.27	2.17	0.85	0.57	0.58	-	-	-
	17:10:17:15	4.41	3.33	3.12	1.22	0.88	0.86	-	-	-
	17:00-18:00	3.06	2.37	2.24	0.80	0.56	0.58	-	-	-
Perdigão 2017-05-12	01:00-02:00	2.82	2.31	2.23	0.52	0.42	0.40	-	-
	17:00-18:00	2.76	2.23	2.11	0.70	0.57	0.47	-	-	-
	19:45-19:50	1.15	0.90	0.91	0.21	0.16	0.18	-	-	-
Perdigão 2017-05-16	07:00-08:00	1.58	1.16	1.15	0.27	0.27	0.22	-	-
	11:40-11:45	0.85	0.77	0.76	0.33	0.27	0.27	-	-	-
	12:40-12:45	0.86	0.75	0.74	0.29	0.22	0.22	-	-	-
	20:00-21:00	1.35	0.97	1.04	0.22	0.31	0.29	-	-	-
Perdigão 2017-05-18	14:35-14:40	2.14	1.82	1.75	0.41	0.36	0.33	-	-	-
	20:00-21:00	1.54	1.08	1.11	0.17	0.19	0.20	-	-	-
	22:00-23:00	1.52	1.13	1.18	0.17	0.17	0.16	-	-	-
Perdigão 2017-05-20	03:15-03:20	3.59	2.63	2.77	0.41	0.55	0.46	-	-	-
	10:00-11:00	2.20	1.84	1.75	0.51	0.42	0.38	-	-	-
	12:20-12:25	1.93	1.71	1.61	0.58	0.43	0.41	-	-	-
Perdigão 2017-06-08	00:00-01:00	2.68	1.87	2.03	0.35	0.38	0.33	-	-	-
	12:40-12:45	1.20	0.90	0.87	0.31	0.22	0.24	-	-	-
	14:00-15:00	2.49	1.77	1.64	0.74	0.42	0.43	-	-	-
Total average		2.50	2.07	2.13	0.72	0.41	0.38	2.05	1.26	1.12
		Correlation S			Correlation W			Correlation TKE
Terrain	Case	AVG	AD4	AD6	AVG	AD4	AD6	AVG	AD4	AD6
Bolund	90	-	0.72	0.67	-	0.50	0.64	-	0.58	0.60
	239	-	0.68	0.73	-	0.76	0.75	-	0.86	0.93
	255	-	0.82	0.85	-	0.72	0.76	-	0.82	0.89
	270	-	0.85	0.91	-	0.78	0.84	-	0.73	0.78
Askervein	TU25	-	0.65	0.60	-	0.90	0.96	-	0.89	0.88
	TU30A	-	0.62	0.68	-	0.58	0.74	-	0.42	0.54
	TU30B	-	0.73	0.71	-	0.64	0.63	-	0.23	0.50
	TU01A	-	0.77	0.52	-	0.91	0.92	-	0.85	0.46
	TU01B	-	0.79	0.53	-	0.92	0.93	-	0.87	0.53
	TU01C	-	0.78	0.46	-	0.92	0.93	-	0.90	0.64
	TU01D	-	0.79	0.54	-	0.93	0.94	-	0.93	0.81
	TU03A	-	0.78	0.65	-	0.93	0.95	-	0.98	0.94
	TU03B	-	0.77	0.61	-	0.92	0.95	-	0.90	0.89
	TU05A	-	0.62	0.53	-	0.89	0.91	-	0.40	0.31
	TU05B	-	0.79	0.69	-	0.48	0.53	-	0.04	-0.09
	TU05C	-	0.66	0.53	-	0.58	0.60	-	0.14	-0.05
	TU07B	-	0.70	0.66	-	0.90	0.97	-	0.40	0.41
Perdigão 2017-05-09	13:32:30	-	0.82	0.82	-	0.53	0.57	-	-	-
	17:12:30	-	0.48	0.80	-	0.35	0.48	-	-	-
	17:00-18:00	-	0.77	0.77	-	0.50	0.54	-	-	-
Perdigão 2017-05-12	01:00-02:00	-	0.76	0.74	-	0.45	0.51	-	-	-
	17:00-18:00	-	0.81	0.81	-	0.57	0.61	-	-	-
	19:45-19:50	-	0.71	0.72	-	0.57	0.53	-	-	-
Perdigão 2017-05-16	07:00-08:00	-	0.65	0.68	-	0.67	0.63	-	-	-
	11:40-11:45	-	0.51	0.55	-	0.22	0.18	-	-	-
	12:40-12:45	-	0.48	0.45	-	0.30	0.31	-	-	-
	20:00-21:00	-	0.68	0.64	-	0.21	0.25	-	-	-
Perdigão 2017-05-18	14:35-14:40	-	0.70	0.72	-	0.33	0.39	-	-	-
	20:00-21:00	-	0.84	0.80	-	0.12	0.19	-	-	-
	22:00-23:00	-	0.74	0.69	-	0.42	0.50	-	-	-
Perdigão 2017-05-20	03:15-03:20	-	0.61	0.64	-	0.30	0.30	-	-	-
	10:00-11:00	-	0.71	0.76	-	0.46	0.44	-	-	-
	12:20-12:25	-	0.66	0.71	-	0.45	0.42	-	-	-
Perdigão 2017-06-08	00:00-01:00	-	0.69	0.67	-	0.44	0.42	-	-	-
	12:40-12:45	-	0.77	0.77	-	0.46	0.46	-	-	-
	14:00-15:00	-	0.82	0.82	-	0.58	0.62	-	-	-
Total average		-	0.71	0.68	-	0.59	0.62	-	0.64	0.59

TABLE III: Measurement campaigns error results. Absolute prediction errors and correlations for the velocity magnitude (S), vertical wind component (W), and turbulence kinetic energy (TKE) on the measurement campaign datasets of the AD4 and AD6 models compared to the averaging baseline (AVG).

sUAV results of different WindSeer variants

We present the error metrics for all flights and the different model variants in Tab. IV. All models consistently struggle at predicting the horizontal wind while the vertical wind prediction is much more accurate compared to the baseline. The average-filling models strongly rely on the averaged measurement, thus providing a good representation of the overall flow state. However, in the flight experiments with complex topography, the measured wind does not have this property, therefore the zero-fill models (ZD4, ZD6), that learnt to better encode the measurement locations, outperform the average-filling (AD4, AD6) variants. In this set of experiment we see a slight trend that increasing network depth seems to improve the prediction quality.

		Mean Absolute Error			Correlation
Flight	Model	$W_{hor}$ [m/s]	$\Psi_{hor}$ [ $\deg$ ]	$W_{z}$ [m/s]	$W_{hor}$	$\Psi_{hor}$	$W_{z}$
Chasseral 1	AVG	0.62	7.22	0.53	-	-	-
	AD4	0.77	7.84	0.66	0.54	-0.09	0.94
	AD6	0.61	7.49	0.55	0.77	-0.17	0.95
	ZD4	0.87	9.18	0.41	0.26	-0.13	0.95
	ZD6	0.84	9.19	0.38	0.33	-0.37	0.95
Chasseral 2	AVG	0.57	13.6	0.50	-	-	-
	AD4	0.66	14.7	0.48	0.50	-0.35	0.90
	AD6	0.48	14.1	0.47	0.48	-0.46	0.87
	ZD4	0.73	17.2	0.35	0.49	-0.42	0.78
	ZD6	0.74	16.8	0.31	0.49	-0.60	0.80
Chasseral 3	AVG	0.55	9.1	0.49	-	-	-
	AD4	0.56	9.8	0.39	0.43	-0.21	0.82
	AD6	0.55	9.5	0.40	0.51	-0.33	0.84
	ZD4	0.67	14.0	0.32	0.32	-0.43	0.80
	ZD6	0.65	12.4	0.30	0.37	-0.38	0.84
Oberalppass	AVG	0.55	7.0	0.55	-	-	-
	AD4	1.05	19.7	0.30	0.28	-0.86	0.81
	AD6	3.12	7.5	0.35	0.43	0.85	0.76
	ZD4	0.65	5.8	0.33	-0.24	0.62	0.78
	ZD6	0.77	7.9	0.34	-0.46	-0.91	0.77
Gotthardpass	AVG	1.00	7.5	0.21	-	-	-
	AD4	2.55	64.3	0.57	0.26	0.75	0.21
	AD6	1.41	58.3	1.06	0.71	0.98	0.48
	ZD4	1.40	7.9	0.20	-0.06	0.97	0.48
	ZD6	1.19	7.8	0.17	0.33	0.63	0.88
All Chasseral Flights	AVG	0.58	9.98	0.51	-	-	-
	AD4	0.66	10.78	0.51	0.49	-0.22	0.89
	AD6	0.55	10.36	0.47	0.55	-0.32	0.89
	ZD4	0.76	13.46	0.36	0.36	-0.33	0.84
	ZD6	0.74	10.60	0.34	0.19	-0.11	0.86
All Flights	AVG	0.66	8.88	0.46	-	-	-
	AD4	1.12	23.27	0.48	0.40	-0.15	0.74
	AD6	1.23	19.38	0.57	0.58	0.17	0.78
	ZD4	0.86	10.82	0.32	0.15	0.12	0.76
	ZD6	0.84	9.5	0.30	0.09	-0.13	0.85

TABLE IV: sUAV flight results. The mean absolute error and the correlation for the horizontal wind magnitude

W_{hor}

, wind direction

\Psi_{hor}

, and vertical wind

W_{z}

on the loiter-averaged data. The results are the average over all loiters for all planes for the respective flight.

Appendix D sUAV airflow sensing

In this section we outline the design the airflow vanes used to estimate the full 3D wind aboard the sUAV. We explain and evaluate the calibration procedure to account for the constant mounting offset and the dynamic aerodynamic biases.

Design and calibration of sUAV airflow vanes

Two custom-designed wind vanes together with a pitot tube measure the 3D airflow. One vane measures the angle of attack (AoA) and the other one the angle of sideslip (AoS) of the airspeed vector relative to the sUAV body reference frame. During flight the wings flex due to maneuvers or wind gusts causing measurement error on the vanes that are larger if the vanes are mounted further towards the wingtips. However, the prop wash makes any placement too close to the fuselage invalid since the vanes need to measure the undisturbed free flow. Therefore we place the vanes approximately one quarter of the wing length away from the fuselage (Fig. 10 A)).

The vane is 3D printed and balanced using metal weights at the front and back (Fig. 10 B)). Small ball bearings at the connection axis ensure little friction in the setup and fast response time to changing wind. A diametric radial magnet is mounted at the end of the connection axis resulting in a changing magnetic field (for different angles) that the Hall sensor measures.

The calibration tool, shown in Fig. 10 C), allowed us to accurately set the vanes to angles with \qty2 increments, thus gathering accurate data to determine the mapping from the magnetic flux density $B$ . We calibrated each sensor for angles ranging from \qty-24 to \qty24 using a third-order polynomial function. Fig. 11 shows the measurements and the resulting fit for one wind vane.

sUAV airflow sensing calibration

Raw AoA and AoS measurements are subject to mounting errors as well as aerodynamic effects from the fuselage and the wing. We defined calibration functions based on wind tunnel data provided by Heinrich et al. [67] to estimate the true airflow angles based on the sensor measurements $(\alpha_{raw},\beta_{raw})$ in the relevant range (AoA between $0^{\circ}$ to $15^{\circ}$ , AoS between $-10^{\circ}$ to $10^{\circ}$ ):

$\displaystyle\alpha_{off}$	$\displaystyle=p_{\alpha,0}+p_{\alpha,1}\cdot\alpha_{raw}+p_{\alpha,2}\left(% \alpha_{raw}+p_{\alpha,3}\right)\left(v_{Aspd}+p_{\alpha,4}\right),$	(2)
$\displaystyle\beta_{off}$	$\displaystyle=p_{\beta,0}+p_{\beta,1}\left(v_{Aspd}+p_{\beta,2}\right)\left(1+% \tanh\left(p_{\beta,3}\left(\alpha_{raw}+p_{\beta,4}\right)\right)\right)+$
	$\displaystyle p_{\beta,5}\cdot\beta_{raw}+p_{\beta,6}\cdot\tanh\left(p_{\beta,% 7}\left(\Phi+p_{\beta,8}\right)\right),$	(3)

where the $p$ variables are free parameters. For the wind tunnel validation, the parameters were estimated by minimizing the MSE between the sensor measurements and ground truth airflow angles (orientation of the aircraft using a tunnel-mounted sting, assumed to have very low angular position error). Fitting the wind tunnel data, the base functions result in \@iaciMSE MSE for the AoA of $0.45^{\circ}$ and $0.83^{\circ}$ for the AoS.

However, due to variations in mounts and aircraft, this calibration could not be performed for every sensor installation. Thus, we further defined a calibration routine to estimate the parameters of Eq. 2, 3 based on data gathered during a calibration flight, removing the need to calibrate every sUAV with wind tunnel data. The underlying assumptions that ensure the parameters are observable are that the horizontal wind is piecewise constant and that there is no vertical wind during the calibration flight (calibration flights were performed in as calm flight conditions as possible, usually early morning). We also assume the estimated attitude and global position/velocity are accurate. To cover the different flight regimes our calibration flight consisted of counter-clockwise and clockwise loiter circles of different radii ranging from \qtyrange30100 flown at different airspeeds (\qty10\per to \qty16\per). We then solved for the calibration parameters and the wind ( $W_{x}$ , $W_{y}$ ) by minimizing the error using a nonlinear least-squares solver in the wind triangle over the full flight:

\bm{e}=R\left(\Phi,\Theta,\Psi\right)\begin{bmatrix}v_{Aspd}\\ v_{Aspd}\cdot\tanh\left(\beta-\beta_{off}\right)+l_{x,\beta}\cdot\omega_{z}-l_% {z,\beta}\cdot\omega_{x}\\ v_{Aspd}\cdot\tanh\left(\alpha-\alpha_{off}\right)-l_{x,\alpha}\cdot\omega_{y}% +l_{y,\alpha}\cdot\omega_{x}\end{bmatrix}+\begin{bmatrix}W_{x}\\ W_{y}\\ 0\end{bmatrix}-\bm{v_{Gnd}},

(4)

where $R\left(\Theta,\Phi,\Psi\right)$ is the rotation matrix based on the current attitude, $\bm{v_{Gnd}}$ the estimated ground speed vector, and $\omega_{(.)}$ the rotational speed around the respective axis. The offset from the vanes to the autopilot origin is denoted by $l_{x,\beta}$ , $l_{z,\beta}$ , $l_{x,\alpha}$ , and $l_{y,\alpha}$ .

Using the uncalibrated measurements from the airflow sensors results in strong oscillations of the estimated horizontal wind (strongly correlated to the loiter frequency) and a vertical estimate with a non-zero mean as visible in Fig. 12 A) as the sensors are located within the disturbed flow from the wing and airframe. The piecewise linear horizontal and zero-mean vertical wind fit as a result of the airflow calibration pipeline are shown Fig. 12 B). Although there is no constraint on the difference between the segments in the horizontal wind, the changes are relatively small. This stable, near-constant wind (magnitude and direction) reflects the forecast and observations made from the ground during the flight. The calibration reduces the estimated oscillations in the wind significantly and results in accurately measuring the zero-mean vertical wind (Fig. 12 C)). However, some correlation between the wind estimates and the loiter frequency remain, indicating that the calibration function could still be improved.

Calibration quality

In contrast to the calibration flights, in the actual data collection flights, the wind estimates from the sUAV again show some oscillations strongly correlating with the loiter patterns in flight. However, for these data collection flights we expect the wind to vary across different locations so this could be correctly observed changes in the wind field. In Fig. 13 we display the binned wind observations from two loiters patterns flown at the same altitude next to each other. Especially for the horizontal wind measurements $w_{e}$ and $w_{n}$ we can see the same pattern repeating for both loiters with an amplitude of about \qty1\per in each direction. This pattern indicates that we would expect errors in the horizontal measurements of about $\pm$ \qty1\per, which are comparable to the observed variation of the measurements between the sUAV and within a single flight. For the vertical wind we do not see such repeating patterns, thus we expect a higher quality of these measurements.

The altitude for the calibration flight of \qty540 above mean sea level compared to altitudes of the data collection flights (\qtyrange16002200) results in \qtyrange10.515.1% lower air densities at the higher altitudes. Previous work has shown that density changes result in changing flow fields [68, 69]. This could, in part, explain the difficulty to accurately calibrating the airflow sensing if they are located within the disturbed flow field of the air-frame. Therefore, for future flights, the sensors should be placed further away from the wings and fuselage to minimize the aerodynamic disturbances on the sensors, and calibration flights performed at the same altitude as test flights.