Copyright CNES

From PixC to Virtual Stations

In this tutorial, we will use the output of the pixcdust library (an extract and concatenation, in space, time and variables, of the SWOT L2 HR Pixel Cloud product) to compute “virtual stations”, i.e. time series of water surface height, in user-defined polygons of interest.

This example, located in Panama, originates from a study presented by Gasnier & Zawadzki at EGU 2025 on the potential of SWOT data over narrow rivers. To recreate the study, contact Lionel.Zawadzki@cnes.fr.

[ ]:
import geopandas as gpd
import numpy as np
import fiona
import matplotlib.pyplot as plt
from tqdm import tqdm

Defining the polygons of interest

Here, we need a geopandas.GeoDataFrame object with one or several geometries (features) corresponding to the areas of interest. This can be achieved in many different ways, for instance:

  • Use QGIS or ArcGIS to generate or extract a polygon layer saved as a shapefile, a geopackage, or any other vector format, then load this file with geopandas.read_file.

  • Create polygons on the fly: draw a polygon on https://wktmap.com/ (or a similar website), export it as Well-Known Text (WKT), paste the string into the notebook, build a geopandas.GeoSeries from it and convert it to a geopandas.GeoDataFrame (see the sketch after this list).

  • Use the Prior Lake Database, read it with geopandas.read_file and restrict it to your lakes of interest.

All these methods are used in the various tutorials.
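
As an illustration of the second method, here is a minimal sketch; the WKT polygon below is a made-up placeholder, not the area used in this study:

[ ]:
from shapely import wkt

# Hypothetical polygon exported from https://wktmap.com/ (placeholder coordinates)
wkt_string = "POLYGON ((-80.1 9.0, -80.0 9.0, -80.0 9.1, -80.1 9.1, -80.1 9.0))"
gs = gpd.GeoSeries([wkt.loads(wkt_string)], crs="EPSG:4326")
gdf_zones_wkt = gpd.GeoDataFrame(geometry=gs)  # ready to use as polygons of interest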

In the present tutorial, we use the first method.

[16]:
zone_file = '/home/hysope2/STUDIES/SWOT_Panama/DATA/zones.gpkg'
gdf_zones = gpd.read_file(zone_file)
gdf_zones.explore()
[16]:
(Interactive map of the polygons of interest.)

Read the SWOT file generated by pixcdust

NB: the SWOT geopackage file has been generated with the pixcdust library, cf. the dedicated tutorial “Download and convert to Geopackage” in the Intermediate section. It is essentially an extract (in space, time and variables) of the SWOT L2 HR Pixel Cloud product, each layer within the geopackage being the extract of one SWOT PixC tile. The area of interest used to generate this geopackage is larger than the polygons defined in the present tutorial.

[19]:
swot_file = '/home/hysope2/STUDIES/SWOT_Panama/DATA/gpkg/swot.gpkg'


layers = fiona.listlayers(swot_file)
# (optional) remove the layers ending with _h3, which are statistical aggregations on an H3 DGGS grid
layers = [l for l in layers if not l.endswith('_h3')]
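
The load_layers function defined in the next section extracts the date from each layer name, assuming the fourth underscore-separated token is a YYYYMMDD date (the convention of the geopackage generated for this study). A minimal sketch to check this on your own file:

[ ]:
# Peek at a few layer names and the date token that load_layers will extract
for layer in layers[:5]:
    tokens = layer.split('_')
    print(layer, '->', tokens[3] if len(tokens) > 3 else 'no date token')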

Limiting SWOT data to the defined polygons of interest

A few functions

[21]:
from datetime import datetime as dt
import pandas as pd

def load_layers(swot_file, layers, list_dates):
    """Read the geopackage layers matching the dates of interest and
    restrict their pixels to the polygons of interest (gdf_zones)."""
    joined = None

    for layer in tqdm(layers):
        # get the date (YYYYMMDD) from the layer's name
        try:
            date = layer.split('_')[3]
        except IndexError:
            continue
        # list_dates is a list with the dates of interest (should you want to remove some dates)
        if not any(k in date for k in list_dates):
            continue

        swot = gpd.read_file(
            swot_file,
            engine='pyogrio',  # much faster
            use_arrow=True,    # much faster
            layer=layer,
            )
        # restrict to the polygons of interest
        joined_tmp = gpd.sjoin(left_df=swot, right_df=gdf_zones, how="inner", predicate="within")
        del swot
        # sjoin returns an empty GeoDataFrame (not None) when nothing matches
        if joined_tmp.empty:
            continue

        joined_tmp["date"] = dt.strptime(date, "%Y%m%d")

        # concatenate the results of each layer
        if joined is None:
            joined = joined_tmp
        else:
            joined = gpd.GeoDataFrame(pd.concat([joined, joined_tmp], ignore_index=True))

        del joined_tmp

    return joined


Loading the data and editing

[22]:
dates = [str(k) for k in range(20230501, 20230615)]
# These dates were clear outliers in my study; remove them
for d in ['20230413', '20230517', '20230524', '20230618']:
    if d in dates:
        dates.remove(d)

ddf = load_layers(swot_file, layers, dates)
# editing criteria on sigma0, classification (3: water near land, 4: open water)
# and water surface height range
ddf = ddf[
    (ddf.sig0 >= 20) &
    (ddf.classification >= 3) & (ddf.classification <= 4) &
    (ddf.wse >= -10) & (ddf.wse <= 30)
]

100%|██████████| 80/80 [00:16<00:00,  4.95it/s]
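
Optionally, a quick sanity check before computing statistics (a minimal sketch; lake_id is the polygon identifier column inherited from gdf_zones during the spatial join):

[ ]:
# Number of edited PixC points available per polygon of interest
print(ddf.groupby('lake_id').size())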

Compute statistics (mean, median, std, etc.) for each polygon and date

[24]:
# compute statistics (mean, median, std, etc.) for each polygon and date
df_vs = ddf.groupby(['lake_id', 'date'])['wse'].describe().reset_index()
# sort by date to obtain time series
df_vs.sort_values('date', inplace=True)
# keep only statistics computed from enough pixels (count >= 20) to be robust
df_vs = df_vs[df_vs['count'] >= 20]
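
At this point, df_vs holds one time series per polygon. Should you want to archive the virtual stations, a minimal sketch (the file naming below is a made-up example):

[ ]:
# Save one CSV file per virtual station (hypothetical file naming)
for lake_id, df_lake in df_vs.groupby('lake_id'):
    df_lake.to_csv(f'virtual_station_{lake_id}.csv', index=False)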

Make a nice plot out of it

for some of the polygons of interest

[25]:
zones_id = [10, 20, 30, 40, 50]
var = 'mean'

[ ]:
plt.rcParams["axes.prop_cycle"] = plt.cycler(
    "color",
    plt.cm.turbo(np.linspace(0,1,len(zones_id)+1))
)

fig, ax = plt.subplots(figsize=(16,8))
fig.patch.set_facecolor('#FEFEFE')
fig.patch.set_alpha(0.8)

for zone_id in zones_id:
    df_zone = df_vs[df_vs['lake_id'] == zone_id]
    # filled area under the time series, plus markers on the actual measurements
    ax.fill_between(df_zone.date, df_zone[var], 0., alpha=.4, label=zone_id)
    ax.plot(df_zone.date, df_zone[var], ls='None', marker='o', alpha=.6, label=None)


# Some optional design code
ax.legend(loc=1)
ax.set_ylabel('water surface elevation (m)', color='.3')
ax.patch.set_facecolor('#EFEFEF')
ax.patch.set_alpha(0.)

# Remove the top and right spines and style the remaining ones
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
for spine in ['left', 'bottom']:
    ax.spines[spine].set_linewidth(3)
    ax.spines[spine].set_color('.3')
ax.tick_params(colors='.3')


# fig.savefig('virtual_stations.png', facecolor=fig.get_facecolor(), edgecolor='none', bbox_inches='tight')
(Figure: water surface elevation time series for the selected polygons of interest.)

Enjoy! L.Zawadzki