
Download and convert to Zarr

This notebook downloads SWOT Pixel Cloud products from hydroweb.next (an API key is required) for a given region and period of interest. It then extracts the information contained in the area of interest of your study and stores everything in a Zarr database (built with the zcollection package) for future use. Zarr, together with the way we partitioned the data with zcollection, is very efficient for computation. However, unlike GeoPackage, it is not (yet) compatible with QGIS.
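To give an intuition for why partitioning speeds up time-based queries, here is a minimal, hypothetical sketch: points are grouped into partitions keyed by cycle and pass number, each partition is summarised by its time span, and a query on a period only selects the partitions whose span overlaps it, without touching their points. The key layout (`cycle_number=.../pass_number=...`) and the helper names are assumptions for illustration; the actual partitioning used by pixcdust and zcollection may differ.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical point records: (cycle_number, pass_number, acquisition time, height)
points = [
    (486, 9, datetime(2023, 4, 10, 3, 26, 32), 20.4),
    (486, 9, datetime(2023, 4, 10, 3, 26, 32), 19.6),
    (487, 9, datetime(2023, 4, 11, 3, 17, 16), 2.8),
]


def partition_points(records):
    """Group records into partitions keyed like a directory path (illustrative layout)."""
    partitions = defaultdict(list)
    for cycle, pass_num, time, height in records:
        key = f"cycle_number={cycle}/pass_number={pass_num}"
        partitions[key].append((time, height))
    return dict(partitions)


def partition_time_bounds(partitions):
    """Summarise each partition by its (earliest, latest) time: cheap metadata."""
    return {
        key: (min(t for t, _ in rows), max(t for t, _ in rows))
        for key, rows in partitions.items()
    }


def partitions_overlapping(bounds, start, end):
    """Select partitions whose time span overlaps [start, end], using metadata only."""
    return sorted(
        key for key, (tmin, tmax) in bounds.items() if tmin <= end and tmax >= start
    )


bounds = partition_time_bounds(partition_points(points))
print(partitions_overlapping(bounds, datetime(2023, 4, 11), datetime(2023, 4, 12)))
```

Only the partitions returned by the last call would then need to be read from disk.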

Setting the region and period of interest

We use a GeoPackage layer, created beforehand (e.g. with QGIS), to restrict both the download and the database to the area of interest.

[1]:

from pixcdust.downloaders.hydroweb_next import PixCDownloader
import geopandas as gpd
from datetime import datetime


[2]:

# read the area-of-interest polygon (it could also be defined directly in Python)
gdf_geom = gpd.read_file('/home/hysope2/STUDIES/SWOT_Panama/DATA/aoi.gpkg')

# period of interest: (start, end)
dates = (
    datetime(2023, 4, 6),
    datetime(2023, 4, 15),
)
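If you do not have a GeoPackage at hand, a rectangular area of interest can also be built in Python. This is a minimal sketch assuming that `PixCDownloader` accepts any polygon GeoDataFrame in longitude/latitude (EPSG:4326), like the one read above; the bounding-box values and the name `gdf_bbox` are illustrative assumptions, not part of pixcdust.

```python
import geopandas as gpd
from shapely.geometry import box

# Illustrative bounding box in degrees; replace with your own study area
LON_MIN, LAT_MIN, LON_MAX, LAT_MAX = -79.7, 8.6, -79.0, 9.9

# One-row GeoDataFrame holding the rectangle, in longitude/latitude (WGS 84)
gdf_bbox = gpd.GeoDataFrame(
    geometry=[box(LON_MIN, LAT_MIN, LON_MAX, LAT_MAX)],
    crs="EPSG:4326",
)
print(gdf_bbox.total_bounds)
```

`gdf_bbox` could then be passed wherever `gdf_geom` is used below.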

Download

Unfortunately, this step downloads many large files (they will be removed later). This is currently the only way, but the hydroweb.next team is working on improving it.
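Since these raw files can be large, it may help to check how much disk space they take before and after the conversion. The helper below is a hypothetical sketch using only the standard library; `/tmp/pixc` is the `path_download` used in the next cell.

```python
from pathlib import Path


def directory_size_mib(path):
    """Total size, in MiB, of all regular files below `path` (0.0 if it does not exist)."""
    root = Path(path)
    if not root.exists():
        return 0.0
    total_bytes = sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
    return total_bytes / 2**20


print(f"{directory_size_mib('/tmp/pixc'):.1f} MiB downloaded")
```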

[3]:

pixcdownloader = PixCDownloader(
    gdf_geom,
    dates,
    verbose=0,
    path_download='/tmp/pixc',
    )
pixcdownloader.search_download()

Extraction

Now that we have all the necessary files, let us extract key variables within the area of interest into a Zarr (zcollection) database. This partitioned Zarr format is very efficient for temporal analysis, but it is not currently readable by GIS software such as QGIS. We use the same GeoDataFrame to restrict the data to the area of interest.
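Restricting points to the area of interest boils down to a point-in-polygon test. The sketch below uses the classic even-odd (ray-casting) rule on plain coordinate tuples to illustrate the idea; it is not pixcdust's implementation, which most likely relies on geospatial libraries for this step. The polygon and points are made-up values.

```python
def point_in_polygon(lon, lat, ring):
    """Even-odd rule: count crossings of a ray cast eastwards from (lon, lat).

    `ring` is a list of (lon, lat) vertices; the closing edge is implied.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular AOI and pixel-cloud points (lon, lat)
aoi = [(-79.6, 8.7), (-79.0, 8.7), (-79.0, 9.8), (-79.6, 9.8)]
pixels = [(-79.10466, 8.762601), (-80.0, 9.0), (-79.561188, 9.760922)]
kept = [p for p in pixels if point_in_polygon(*p, aoi)]
print(kept)
```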

[4]:

from pixcdust.converters.zarr import PixCNc2ZarrConverter
from glob import glob

[5]:

pixc = PixCNc2ZarrConverter(
    glob(pixcdownloader.path_download + '/*/*.nc'),  # downloaded NetCDF files
    "/tmp/my_awesome_pixc_zarr",                     # output Zarr database
    variables=['height', 'sig0', 'classification'],  # variables to extract
    area_of_interest=gdf_geom,                       # same AOI as for the download
    mode='o',
)
pixc.database_from_nc()

/home/hysope2/SRC/preprocess_swot_pixc/pixcdust/.venv/lib/python3.10/site-packages/distributed/client.py:3164: UserWarning: Sending large graph of size 611.84 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(

The database has been successfully created, so we can remove the raw files:

[6]:

# import shutil
# shutil.rmtree('/tmp/pixc')
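Because `shutil.rmtree` deletes without asking, a small guard can avoid removing the raw files before the database exists. This is a hypothetical helper, and checking that the database directory exists is only a crude safeguard; the paths are those used above.

```python
import shutil
from pathlib import Path


def remove_raw_files(raw_path, database_path):
    """Delete the downloaded NetCDF directory, but only if the Zarr database exists.

    Returns True if something was deleted, False if there was nothing to delete.
    """
    raw_dir, db_dir = Path(raw_path), Path(database_path)
    if not db_dir.is_dir():
        raise FileNotFoundError(f"database not found at {db_dir}; keeping raw files")
    if not raw_dir.exists():
        return False
    shutil.rmtree(raw_dir)
    return True


# remove_raw_files('/tmp/pixc', '/tmp/my_awesome_pixc_zarr')
```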

Read the database

The previous steps are not necessary if the database already exists.

Now we can open this database as an xarray dataset, a DataFrame, or a GeoDataFrame.

[7]:

from pixcdust.readers.zarr import PixCZarrReader
from datetime import datetime

# open the Zarr database created above
pixc_read = PixCZarrReader(
    "/tmp/my_awesome_pixc_zarr"
)
# select the points acquired within this period
pixc_read.read((datetime(2023, 4, 10), datetime(2023, 4, 12)))
pixc_read.data

[7]:

<zcollection.dataset.Dataset>
  Dimensions: ('points: 18871671',)
Data variables:
    tile_number    (points)  uint16: dask.array<chunksize=(2097152,)>
    classification (points)  float32: dask.array<chunksize=(2097152,)>
    longitude      (points)  float32: dask.array<chunksize=(2097152,)>
    geoid          (points)  float32: dask.array<chunksize=(2097152,)>
    height         (points)  float32: dask.array<chunksize=(2097152,)>
    sig0           (points)  float32: dask.array<chunksize=(2097152,)>
    pass_number    (points)  uint16: dask.array<chunksize=(2097152,)>
    time           (points)  datetime64[ns]: dask.array<chunksize=(2097152,)>
    cycle_number   (points)  uint16: dask.array<chunksize=(2097152,)>
    latitude       (points)  float64: dask.array<chunksize=(2097152,)>
  Attributes:
    azimuth_offset            : 8
    description               : 'cloud of geolocated interferogram pixels'
    interferogram_size_azimuth: 2190
    interferogram_size_range  : 5075
    looks_to_efflooks         : 1.5377120501155137
    num_azimuth_looks         : 7.0
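The variables above are dask arrays, so nothing is loaded into memory yet. A quick back-of-the-envelope estimate, using the dtypes and the number of points listed in the summary, shows what loading everything would cost.

```python
import numpy as np

# dtypes copied from the dataset summary above
dtypes = {
    "tile_number": "uint16",
    "classification": "float32",
    "longitude": "float32",
    "geoid": "float32",
    "height": "float32",
    "sig0": "float32",
    "pass_number": "uint16",
    "time": "datetime64[ns]",
    "cycle_number": "uint16",
    "latitude": "float64",
}
n_points = 18_871_671

# bytes needed to store one point across all variables
bytes_per_point = sum(np.dtype(dt).itemsize for dt in dtypes.values())
total_mib = bytes_per_point * n_points / 2**20
print(bytes_per_point, f"{total_mib:.0f} MiB")
```

This gives 42 bytes per point, roughly 756 MiB for the raw arrays alone; a GeoDataFrame adds one geometry object per point on top of that.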

[8]:

gdf_pixc = pixc_read.to_geodataframe()
gdf_pixc

[8]:

	tile_number	classification	longitude	geoid	height	sig0	pass_number	time	cycle_number	latitude	geometry
points
0	170	1.0	-79.104660	13.444022	20.446413	NaN	9	2023-04-10 03:26:32	486	8.762601	POINT (-79.10466 8.76260)
1	170	1.0	-79.107796	13.425582	19.637384	NaN	9	2023-04-10 03:26:32	486	8.763069	POINT (-79.10780 8.76307)
2	170	1.0	-79.102852	13.454618	19.055544	NaN	9	2023-04-10 03:26:32	486	8.762330	POINT (-79.10285 8.76233)
3	170	1.0	-79.108620	13.420682	18.149841	NaN	9	2023-04-10 03:26:32	486	8.763193	POINT (-79.10862 8.76319)
4	170	1.0	-79.108315	13.422508	17.395298	NaN	9	2023-04-10 03:26:32	486	8.763147	POINT (-79.10831 8.76315)
...	...	...	...	...	...	...	...	...	...	...	...
18871666	171	7.0	-79.561188	5.001038	2.826578	11.713768	9	2023-04-11 03:17:16	487	9.760922	POINT (-79.56119 9.76092)
18871667	171	7.0	-79.561272	5.000404	2.826173	14.126613	9	2023-04-11 03:17:16	487	9.760934	POINT (-79.56127 9.76093)
18871668	171	4.0	-79.561638	4.997420	5.619805	19.553595	9	2023-04-11 03:17:16	487	9.760988	POINT (-79.56164 9.76099)
18871669	171	4.0	-79.561813	4.995993	6.564185	21.987041	9	2023-04-11 03:17:16	487	9.761015	POINT (-79.56181 9.76101)
18871670	171	7.0	-79.561882	4.995450	6.457133	18.216505	9	2023-04-11 03:17:16	487	9.761025	POINT (-79.56188 9.76102)

18871671 rows × 11 columns
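A typical next step is to keep only water pixels and summarise their heights per cycle. The sketch below does this on a small made-up DataFrame with the same column names as `gdf_pixc`. Two assumptions should be checked against the SWOT Pixel Cloud product description: classification codes 3 to 7 denote the water-related classes, and `height - geoid` gives a height relative to the geoid. Since a GeoDataFrame is a pandas DataFrame, the function should also accept `gdf_pixc` directly.

```python
import pandas as pd

# Assumed SWOT PIXC classification codes for water-related classes
WATER_CLASSES = [3.0, 4.0, 5.0, 6.0, 7.0]


def median_water_height_per_cycle(df):
    """Median geoid-referenced height of water pixels, per cycle and pass."""
    water = df[df["classification"].isin(WATER_CLASSES)].copy()
    # Assumption: `height` is ellipsoidal and `geoid` is the geoid height above the ellipsoid
    water["height_above_geoid"] = water["height"] - water["geoid"]
    return water.groupby(["cycle_number", "pass_number"])["height_above_geoid"].median()


# Made-up rows using the same column names as gdf_pixc
sample = pd.DataFrame({
    "classification": [1.0, 4.0, 4.0, 7.0],
    "height": [20.0, 10.0, 12.0, 7.0],
    "geoid": [13.0, 5.0, 5.0, 5.0],
    "cycle_number": [486, 486, 486, 487],
    "pass_number": [9, 9, 9, 9],
})
print(median_water_height_per_cycle(sample))
```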

Enjoy!
