
Dataset Module

Dataset processing classes for water timeseries analysis.

This module provides classes for processing and normalizing satellite-derived land cover and water classification data. It includes specialized handlers for different data sources and processing pipelines.
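The core of the pipeline is per-series normalization: every variable is scaled by the maximum observed total area, so time series from lakes of different sizes become comparable on a 0-1 scale. The sketch below illustrates that idea on plain Python values with made-up numbers; the actual classes operate on xarray Datasets and normalize over the `date` dimension.

```python
# Illustrative values: one lake's total classified area ("area_data")
# over five observation dates (synthetic numbers, not real data).
area_data = [120.0, 118.5, 90.0, 60.2, 45.0]

# Normalize to a 0-1 scale by the per-series maximum, mirroring the
# LakeDataset normalization step (division by the max over dates).
peak = max(area_data)
normalized = [a / peak for a in area_data]

print(normalized[0])             # 1.0
print(round(normalized[-1], 4))  # 0.375
```

The same division, applied to a whole xarray Dataset, broadcasts across every land cover class at once.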

DWDataset

Bases: LakeDataset

Handler for Dynamic World land cover classification data.

Processes Dynamic World land cover classes including water, bare soil, snow/ice, trees, grass, flooded vegetation, crops, shrub/scrub, and built areas.

Attributes:

    water_column (str): Fixed as "water" for DW data.
    data_columns (list): All 9 DW land cover classes.

Example

>>> dw_data = DWDataset(xr.open_dataset("dynamic_world.nc"))
>>> water_time_series = dw_data.ds_normalized["water"]
>>> print(dw_data.data_columns)
['water', 'bare', 'snow_and_ice', 'trees', 'grass', 'flooded_vegetation', 'crops', 'shrub_and_scrub', 'built']
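DWDataset drops dates with any missing-data area or more than 5% snow/ice cover. A standalone sketch of those quality rules on plain Python records (synthetic values; the actual implementation applies the same thresholds with xarray's `where`):

```python
# Illustrative observations for one lake (values are made up).
observations = [
    {"date": "2020-07-01", "area_nodata": 0.0, "snow_and_ice": 0.01},
    {"date": "2020-08-01", "area_nodata": 2.5, "snow_and_ice": 0.00},  # no-data gap
    {"date": "2021-07-01", "area_nodata": 0.0, "snow_and_ice": 0.12},  # too snowy
]

# Keep a date only when it has no missing-data area and at most 5%
# snow/ice cover, mirroring the thresholds in DWDataset._mask_invalid.
valid = [
    o for o in observations
    if o["area_nodata"] <= 0 and o["snow_and_ice"] <= 0.05
]

print([o["date"] for o in valid])  # ['2020-07-01']
```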

Source code in src/water_timeseries/dataset.py
class DWDataset(LakeDataset):
    """Handler for Dynamic World land cover classification data.

    Processes Dynamic World land cover classes including water, bare soil, snow/ice,
    trees, grass, flooded vegetation, crops, shrub/scrub, and built areas.

    Attributes:
        water_column (str): Fixed as "water" for DW data.
        data_columns (list): All 9 DW land cover classes.

    Example:
        >>> dw_data = DWDataset(xr.open_dataset("dynamic_world.nc"))
        >>> water_time_series = dw_data.ds_normalized["water"]
        >>> print(dw_data.data_columns)
        ['water', 'bare', 'snow_and_ice', 'trees', 'grass', 'flooded_vegetation', 'crops', 'shrub_and_scrub', 'built']
    """

    def __init__(self, ds):
        """Initialize DWDataset with Dynamic World data.

        Args:
            ds (xr.Dataset): Input xarray Dataset with at least the 9 DW class variables.
        """
        super().__init__(ds)
        self.water_column = "water"
        self.data_columns = [
            "water",
            "bare",
            "snow_and_ice",
            "trees",
            "grass",
            "flooded_vegetation",
            "crops",
            "shrub_and_scrub",
            "built",
        ]

    def _preprocess(self):
        """Preprocess Dynamic World data.

        Calculates total area as the sum of all land cover classes and computes
        the no-data area as the difference from maximum area across time.
        """
        super()._preprocess()
        ds = self.ds
        ds["area_data"] = (
            ds["bare"]
            + ds["water"]
            + ds["snow_and_ice"]
            + ds["trees"]
            + ds["grass"]
            + ds["flooded_vegetation"]
            + ds["crops"]
            + ds["shrub_and_scrub"]
            + ds["built"]
        )

        max_area = ds["area_data"].max(dim="date", skipna=True)
        ds["area_nodata"] = (max_area - ds["area_data"]).round(4)

        self.preprocessed_ = True
        self.ds = ds

    def _mask_invalid(self):
        """Mask invalid data based on quality criteria.

        Removes observations with any no-data area and observations where snow/ice
        coverage exceeds 5%, which indicates poor classification.
        """
        ds = self.ds_normalized
        # Keep dates with no missing-data area
        mask_nodata = ds["area_nodata"] <= 0
        # Keep dates with at most 5% snow/ice (more indicates poor classification)
        mask_snow = ds["snow_and_ice"] <= 0.05
        # Combine masks
        mask = mask_nodata & mask_snow

        self.ds = self.ds.where(mask)
        self.ds_normalized = self.ds_normalized.where(mask)

        self.ds_ismasked_ = True
        self.ds_normalized_ismasked_ = True

    # create_timelapse is inherited from LakeDataset

    def plot_timeseries(self, id_geohash: str, breakpoints=None, save_path: Optional[str | Path] = None) -> plt.Figure:
        """Plot the time series for a specific geohash.

        Args:
            id_geohash (str): The geohash identifier for the location.
            breakpoints (BreakpointMethod, optional): Breakpoint detection method to use.
            save_path (str | Path, optional): Path to save the plot as an image file.

        Returns:
            plt.Figure: The matplotlib figure object.
        """
        df = self.ds.sel(id_geohash=id_geohash).load().to_dataframe().dropna()
        df_plot = prepare_data_for_plot_dw(df, group_vegetation=True)
        normalization_factor = df["area_data"].max()

        # Default to no breakpoint; only set one when detection returns a result.
        bp = None
        if breakpoints is not None:
            breaks = breakpoints.calculate_break(self, object_id=id_geohash)
            if breaks is not None and len(breaks) > 0:
                bp = breaks["date_break"].iloc[0]

        figure = plot_water_time_series_dw(
            df_plot,
            first_break=bp,
            normalization_factor=normalization_factor,
            lake_id=id_geohash,
            save_path=save_path,
        )

        return figure

    def plot_timeseries_interactive(
        self,
        id_geohash: str,
        breakpoints=None,
        save_path: Optional[str | Path] = None,
    ):
        """Plot the interactive time series for a specific geohash using Plotly.

        Args:
            id_geohash (str): The geohash identifier for the location.
            breakpoints (BreakpointMethod, optional): Breakpoint detection method to use.
            save_path (str | Path, optional): Path to save the plot as HTML file.

        Returns:
            plotly.graph_objects.Figure: Interactive Plotly figure.
        """
        df = self.ds.sel(id_geohash=id_geohash).load().to_dataframe().dropna()
        df_plot = prepare_data_for_plot_dw(df, group_vegetation=True)
        normalization_factor = df["area_data"].max()

        # Default to no breakpoint; only set one when detection returns a result.
        bp = None
        if breakpoints is not None:
            breaks = breakpoints.calculate_break(self, object_id=id_geohash)
            if breaks is not None and len(breaks) > 0:
                bp = breaks["date_break"].iloc[0]

        figure = plot_water_time_series_dw_interactive(
            df_plot,
            first_break=bp,
            normalization_factor=normalization_factor,
            lake_id=id_geohash,
            save_path=save_path,
        )

        return figure

__init__(ds)

Initialize DWDataset with Dynamic World data.

Parameters:

    ds (xr.Dataset, required): Input xarray Dataset with at least the 9 DW class variables.
Source code in src/water_timeseries/dataset.py
def __init__(self, ds):
    """Initialize DWDataset with Dynamic World data.

    Args:
        ds (xr.Dataset): Input xarray Dataset with at least the 9 DW class variables.
    """
    super().__init__(ds)
    self.water_column = "water"
    self.data_columns = [
        "water",
        "bare",
        "snow_and_ice",
        "trees",
        "grass",
        "flooded_vegetation",
        "crops",
        "shrub_and_scrub",
        "built",
    ]

plot_timeseries(id_geohash, breakpoints=None, save_path=None)

Plot the time series for a specific geohash.

Parameters:

    id_geohash (str, required): The geohash identifier for the location.
    breakpoints (BreakpointMethod, default None): Breakpoint detection method to use.
    save_path (str | Path, default None): Path to save the plot as an image file.

Returns:

    plt.Figure: The matplotlib figure object.

Source code in src/water_timeseries/dataset.py
def plot_timeseries(self, id_geohash: str, breakpoints=None, save_path: Optional[str | Path] = None) -> plt.Figure:
    """Plot the time series for a specific geohash.

    Args:
        id_geohash (str): The geohash identifier for the location.
        breakpoints (BreakpointMethod, optional): Breakpoint detection method to use.
        save_path (str | Path, optional): Path to save the plot as an image file.

    Returns:
        plt.Figure: The matplotlib figure object.
    """
    df = self.ds.sel(id_geohash=id_geohash).load().to_dataframe().dropna()
    df_plot = prepare_data_for_plot_dw(df, group_vegetation=True)
    normalization_factor = df["area_data"].max()

    # Default to no breakpoint; only set one when detection returns a result.
    bp = None
    if breakpoints is not None:
        breaks = breakpoints.calculate_break(self, object_id=id_geohash)
        if breaks is not None and len(breaks) > 0:
            bp = breaks["date_break"].iloc[0]

    figure = plot_water_time_series_dw(
        df_plot,
        first_break=bp,
        normalization_factor=normalization_factor,
        lake_id=id_geohash,
        save_path=save_path,
    )

    return figure

plot_timeseries_interactive(id_geohash, breakpoints=None, save_path=None)

Plot the interactive time series for a specific geohash using Plotly.

Parameters:

    id_geohash (str, required): The geohash identifier for the location.
    breakpoints (BreakpointMethod, default None): Breakpoint detection method to use.
    save_path (str | Path, default None): Path to save the plot as an HTML file.

Returns:

    plotly.graph_objects.Figure: Interactive Plotly figure.

Source code in src/water_timeseries/dataset.py
def plot_timeseries_interactive(
    self,
    id_geohash: str,
    breakpoints=None,
    save_path: Optional[str | Path] = None,
):
    """Plot the interactive time series for a specific geohash using Plotly.

    Args:
        id_geohash (str): The geohash identifier for the location.
        breakpoints (BreakpointMethod, optional): Breakpoint detection method to use.
        save_path (str | Path, optional): Path to save the plot as HTML file.

    Returns:
        plotly.graph_objects.Figure: Interactive Plotly figure.
    """
    df = self.ds.sel(id_geohash=id_geohash).load().to_dataframe().dropna()
    df_plot = prepare_data_for_plot_dw(df, group_vegetation=True)
    normalization_factor = df["area_data"].max()

    # Default to no breakpoint; only set one when detection returns a result.
    bp = None
    if breakpoints is not None:
        breaks = breakpoints.calculate_break(self, object_id=id_geohash)
        if breaks is not None and len(breaks) > 0:
            bp = breaks["date_break"].iloc[0]

    figure = plot_water_time_series_dw_interactive(
        df_plot,
        first_break=bp,
        normalization_factor=normalization_factor,
        lake_id=id_geohash,
        save_path=save_path,
    )

    return figure

JRCDataset

Bases: LakeDataset

Handler for JRC (Joint Research Centre) water classification data.

Processes JRC water occurrence data with separate classes for permanent water, seasonal water, and land.

Attributes:

    water_column (str): Fixed as "area_water_permanent" for JRC data.
    data_columns (list): ['area_water_permanent', 'area_water_seasonal', 'area_land'].

Example

>>> jrc_data = JRCDataset(xr.open_dataset("jrc_water.nc"))
>>> permanent_water = jrc_data.ds_normalized["area_water_permanent"]
>>> seasonal_water = jrc_data.ds_normalized["area_water_seasonal"]
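JRC preprocessing derives a per-date no-data area from the class totals: the total classified area is summed per date, and any shortfall from the maximum total observed over time is treated as missing. A standalone sketch of that bookkeeping with synthetic numbers:

```python
# Synthetic per-date class areas for one lake (illustrative values only).
permanent = [50.0, 48.0, 30.0]
seasonal = [10.0, 9.0, 6.0]
land = [40.0, 40.0, 40.0]

# Total classified area per date, as in JRCDataset._preprocess.
area_data = [p + s + l for p, s, l in zip(permanent, seasonal, land)]

# No-data area: shortfall from the maximum total observed over time.
max_area = max(area_data)
area_nodata = [round(max_area - a, 4) for a in area_data]

print(area_data)    # [100.0, 97.0, 76.0]
print(area_nodata)  # [0.0, 3.0, 24.0]
```

Dates with a positive `area_nodata` are subsequently masked out by `_mask_invalid`.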

Source code in src/water_timeseries/dataset.py
class JRCDataset(LakeDataset):
    """Handler for JRC (Joint Research Centre) water classification data.

    Processes JRC water occurrence data with separate classes for permanent water,
    seasonal water, and land.

    Attributes:
        water_column (str): Fixed as "area_water_permanent" for JRC data.
        data_columns (list): ['area_water_permanent', 'area_water_seasonal', 'area_land'].

    Example:
        >>> jrc_data = JRCDataset(xr.open_dataset("jrc_water.nc"))
        >>> permanent_water = jrc_data.ds_normalized["area_water_permanent"]
        >>> seasonal_water = jrc_data.ds_normalized["area_water_seasonal"]
    """

    def __init__(self, ds):
        """Initialize JRCDataset with JRC water classification data.

        Args:
            ds (xr.Dataset): Input xarray Dataset with JRC water classification variables.
        """
        super().__init__(ds)
        self.water_column = "area_water_permanent"
        self.data_columns = ["area_water_permanent", "area_water_seasonal", "area_land"]

    def _preprocess(self):
        """Preprocess JRC water data.

        Calculates total area as the sum of permanent water, seasonal water, and land,
        and computes the no-data area as the shortfall from the maximum area across time.
        """
        ds = self.ds
        ds["area_data"] = ds["area_land"] + ds["area_water_permanent"] + ds["area_water_seasonal"]

        max_area = ds["area_data"].max(dim="date", skipna=True)
        ds["area_nodata"] = (max_area - ds["area_data"]).round(4)

        self.preprocessed_ = True
        self.ds = ds

    def _mask_invalid(self):
        """Mask invalid data based on data quality.

        Removes observations with any no-data area (area_nodata > 0).
        """
        ds = self.ds_normalized
        mask = ds["area_nodata"] <= 0
        self.ds = self.ds.where(mask)
        self.ds_normalized = self.ds_normalized.where(mask)

        self.ds_ismasked_ = True
        self.ds_normalized_ismasked_ = True

    def create_timelapse(
        self,
        lake_gdf: gpd.GeoDataFrame,
        id_geohash: str,
        timelapse_source: str = "landsat",
        gif_outdir: str | Path = "gifs",
        buffer: float = 100,
        start_year: int = 2000,
        end_year: int = 2025,
        start_date: str = "07-01",
        end_date: str = "08-31",
        frames_per_second: int = 1,
        dimensions: int = 512,
        overwrite_exists: bool = False,
    ) -> Path | None:
        """
        Create a timelapse GIF for a specific lake.

        This method generates an animated GIF showing satellite imagery
        over a date range for a lake identified by its geohash. The timelapse captures
        the summer period (July-August) each year to maximize cloud-free observations.

        Default timelapse_source is 'landsat' for JRC data.

        Args:
            lake_gdf: GeoDataFrame containing lake geometries with an 'id_geohash' column.
            id_geohash: The geohash identifier for the specific lake to visualize.
            timelapse_source: Image source for timelapse imagery ('sentinel2' or 'landsat').
            gif_outdir: Output directory for the GIF file (default: 'gifs').
            buffer: Buffer distance in meters to expand the lake bounding box (default: 100).
            start_year: Start year for the timelapse (default: 2000).
            end_year: End year for the timelapse (default: 2025).
            start_date: Start date within each year (MM-DD format, default: '07-01').
            end_date: End date within each year (MM-DD format, default: '08-31').
            frames_per_second: Animation speed (default: 1).
            dimensions: Pixel dimensions for the output GIF (default: 512).
            overwrite_exists: If False (default), skip download if output file already exists.
                              If True, always re-download and overwrite existing file.

        Returns:
            Path | None: Path to the generated GIF file, or None if skipped due to existing file.
        """
        return create_timelapse(
            input_lake_gdf=lake_gdf,
            id_geohash=id_geohash,
            timelapse_source=timelapse_source,
            gif_outdir=gif_outdir,
            buffer=buffer,
            start_year=start_year,
            end_year=end_year,
            start_date=start_date,
            end_date=end_date,
            frames_per_second=frames_per_second,
            dimensions=dimensions,
            overwrite_exists=overwrite_exists,
        )

    def plot_timeseries(self, id_geohash: str, breakpoints=None, save_path: Optional[str | Path] = None) -> plt.Figure:
        """Plot the time series for a specific geohash.

        Args:
            id_geohash (str): The geohash identifier for the location.
            breakpoints (BreakpointMethod, optional): Breakpoint detection method to use.
            save_path (str | Path, optional): Path to save the plot as an image file.

        Returns:
            plt.Figure: The matplotlib figure object.
        """
        df = self.ds.sel(id_geohash=id_geohash).load().to_dataframe().dropna().reset_index(drop=False)
        normalization_factor = df["area_data"].max()

        # TODO: breaks are not visualized correctly
        # Default to no breakpoint; guard against a missing or empty result.
        bp = None
        if breakpoints is not None:
            breaks = breakpoints.calculate_break(self, object_id=id_geohash)
            if breaks is not None and len(breaks) > 0:
                bp = breaks["date_break"].iloc[0]

        fig = plot_water_time_series_jrc(
            df,
            first_break=bp,
            plot_variables=["area_water_permanent", "area_water_seasonal", "area_land"],
            normalization_factor=normalization_factor,
            lake_id=id_geohash,
            save_path=save_path,
        )

        return fig

    def plot_timeseries_interactive(
        self,
        id_geohash: str,
        breakpoints=None,
        save_path: Optional[str | Path] = None,
    ):
        """Plot the interactive time series for a specific geohash using Plotly.

        Args:
            id_geohash (str): The geohash identifier for the location.
            breakpoints (BreakpointMethod, optional): Breakpoint detection method to use (not used currently).
            save_path (str | Path, optional): Path to save the plot as HTML file.

        Returns:
            plotly.graph_objects.Figure: Interactive Plotly figure.
        """
        df = self.ds.sel(id_geohash=id_geohash).load().to_dataframe().dropna().reset_index(drop=False)
        normalization_factor = df["area_data"].max()

        # Breakpoint processing disabled for now
        bp = None

        fig = plot_water_time_series_jrc_interactive(
            df,
            first_break=bp,
            plot_variables=["area_water_permanent", "area_water_seasonal", "area_land"],
            normalization_factor=normalization_factor,
            lake_id=id_geohash,
            save_path=save_path,
        )

        return fig

__init__(ds)

Initialize JRCDataset with JRC water classification data.

Parameters:

    ds (xr.Dataset, required): Input xarray Dataset with JRC water classification variables.
Source code in src/water_timeseries/dataset.py
def __init__(self, ds):
    """Initialize JRCDataset with JRC water classification data.

    Args:
        ds (xr.Dataset): Input xarray Dataset with JRC water classification variables.
    """
    super().__init__(ds)
    self.water_column = "area_water_permanent"
    self.data_columns = ["area_water_permanent", "area_water_seasonal", "area_land"]

create_timelapse(lake_gdf, id_geohash, timelapse_source='landsat', gif_outdir='gifs', buffer=100, start_year=2000, end_year=2025, start_date='07-01', end_date='08-31', frames_per_second=1, dimensions=512, overwrite_exists=False)

Create a timelapse GIF for a specific lake.

This method generates an animated GIF showing satellite imagery over a date range for a lake identified by its geohash. The timelapse captures the summer period (July-August) each year to maximize cloud-free observations.

Default timelapse_source is 'landsat' for JRC data.

Parameters:

    lake_gdf (gpd.GeoDataFrame, required): GeoDataFrame containing lake geometries with an 'id_geohash' column.
    id_geohash (str, required): The geohash identifier for the specific lake to visualize.
    timelapse_source (str, default 'landsat'): Image source for timelapse imagery ('sentinel2' or 'landsat').
    gif_outdir (str | Path, default 'gifs'): Output directory for the GIF file.
    buffer (float, default 100): Buffer distance in meters to expand the lake bounding box.
    start_year (int, default 2000): Start year for the timelapse.
    end_year (int, default 2025): End year for the timelapse.
    start_date (str, default '07-01'): Start date within each year (MM-DD format).
    end_date (str, default '08-31'): End date within each year (MM-DD format).
    frames_per_second (int, default 1): Animation speed.
    dimensions (int, default 512): Pixel dimensions for the output GIF.
    overwrite_exists (bool, default False): If False, skip the download when the output file already exists; if True, always re-download and overwrite.

Returns:

    Path | None: Path to the generated GIF file, or None if skipped because the file already exists.
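The overwrite_exists contract (return the output path after writing, or None when an existing file is kept) can be illustrated with pathlib. This is a standalone sketch of the documented behavior, not the library's internal code; `maybe_write_gif` is a hypothetical helper introduced only for illustration:

```python
from pathlib import Path
import tempfile

def maybe_write_gif(out_path, overwrite_exists=False):
    """Return the output path after 'creating' it, or None when an
    existing file is kept (mirrors the documented return contract)."""
    if out_path.exists() and not overwrite_exists:
        return None  # skip the download, keep the existing GIF
    out_path.write_bytes(b"GIF89a")  # placeholder for the real download
    return out_path

with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "lake.gif"
    first = maybe_write_gif(target)        # creates the file
    second = maybe_write_gif(target)       # file exists: returns None
    third = maybe_write_gif(target, True)  # overwrite: returns the path
    print(first is not None, second, third is not None)  # True None True
```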

Source code in src/water_timeseries/dataset.py
def create_timelapse(
    self,
    lake_gdf: gpd.GeoDataFrame,
    id_geohash: str,
    timelapse_source: str = "landsat",
    gif_outdir: str | Path = "gifs",
    buffer: float = 100,
    start_year: int = 2000,
    end_year: int = 2025,
    start_date: str = "07-01",
    end_date: str = "08-31",
    frames_per_second: int = 1,
    dimensions: int = 512,
    overwrite_exists: bool = False,
) -> Path | None:
    """
    Create a timelapse GIF for a specific lake.

    This method generates an animated GIF showing satellite imagery
    over a date range for a lake identified by its geohash. The timelapse captures
    the summer period (July-August) each year to maximize cloud-free observations.

    Default timelapse_source is 'landsat' for JRC data.

    Args:
        lake_gdf: GeoDataFrame containing lake geometries with an 'id_geohash' column.
        id_geohash: The geohash identifier for the specific lake to visualize.
        timelapse_source: Image source for timelapse imagery ('sentinel2' or 'landsat').
        gif_outdir: Output directory for the GIF file (default: 'gifs').
        buffer: Buffer distance in meters to expand the lake bounding box (default: 100).
        start_year: Start year for the timelapse (default: 2000).
        end_year: End year for the timelapse (default: 2025).
        start_date: Start date within each year (MM-DD format, default: '07-01').
        end_date: End date within each year (MM-DD format, default: '08-31').
        frames_per_second: Animation speed (default: 1).
        dimensions: Pixel dimensions for the output GIF (default: 512).
        overwrite_exists: If False (default), skip download if output file already exists.
                          If True, always re-download and overwrite existing file.

    Returns:
        Path | None: Path to the generated GIF file, or None if skipped due to existing file.
    """
    return create_timelapse(
        input_lake_gdf=lake_gdf,
        id_geohash=id_geohash,
        timelapse_source=timelapse_source,
        gif_outdir=gif_outdir,
        buffer=buffer,
        start_year=start_year,
        end_year=end_year,
        start_date=start_date,
        end_date=end_date,
        frames_per_second=frames_per_second,
        dimensions=dimensions,
        overwrite_exists=overwrite_exists,
    )

plot_timeseries(id_geohash, breakpoints=None, save_path=None)

Plot the time series for a specific geohash.

Parameters:

    id_geohash (str, required): The geohash identifier for the location.
    breakpoints (BreakpointMethod, default None): Breakpoint detection method to use.
    save_path (str | Path, default None): Path to save the plot as an image file.

Returns:

    plt.Figure: The matplotlib figure object.

Source code in src/water_timeseries/dataset.py
def plot_timeseries(self, id_geohash: str, breakpoints=None, save_path: Optional[str | Path] = None) -> plt.Figure:
    """Plot the time series for a specific geohash.

    Args:
        id_geohash (str): The geohash identifier for the location.
        breakpoints (BreakpointMethod, optional): Breakpoint detection method to use.
        save_path (str | Path, optional): Path to save the plot as an image file.

    Returns:
        plt.Figure: The matplotlib figure object.
    """
    df = self.ds.sel(id_geohash=id_geohash).load().to_dataframe().dropna().reset_index(drop=False)
    normalization_factor = df["area_data"].max()

    # TODO: breaks are not visualized correctly
    # Default to no breakpoint; guard against a missing or empty result.
    bp = None
    if breakpoints is not None:
        breaks = breakpoints.calculate_break(self, object_id=id_geohash)
        if breaks is not None and len(breaks) > 0:
            bp = breaks["date_break"].iloc[0]

    fig = plot_water_time_series_jrc(
        df,
        first_break=bp,
        plot_variables=["area_water_permanent", "area_water_seasonal", "area_land"],
        normalization_factor=normalization_factor,
        lake_id=id_geohash,
        save_path=save_path,
    )

    return fig

plot_timeseries_interactive(id_geohash, breakpoints=None, save_path=None)

Plot the interactive time series for a specific geohash using Plotly.

Parameters:

    id_geohash (str, required): The geohash identifier for the location.
    breakpoints (BreakpointMethod, default None): Breakpoint detection method to use (not used currently).
    save_path (str | Path, default None): Path to save the plot as an HTML file.

Returns:

    plotly.graph_objects.Figure: Interactive Plotly figure.

Source code in src/water_timeseries/dataset.py
def plot_timeseries_interactive(
    self,
    id_geohash: str,
    breakpoints=None,
    save_path: Optional[str | Path] = None,
):
    """Plot the interactive time series for a specific geohash using Plotly.

    Args:
        id_geohash (str): The geohash identifier for the location.
        breakpoints (BreakpointMethod, optional): Breakpoint detection method to use (not used currently).
        save_path (str | Path, optional): Path to save the plot as HTML file.

    Returns:
        plotly.graph_objects.Figure: Interactive Plotly figure.
    """
    df = self.ds.sel(id_geohash=id_geohash).load().to_dataframe().dropna().reset_index(drop=False)
    normalization_factor = df["area_data"].max()

    # Breakpoint processing disabled for now
    bp = None

    fig = plot_water_time_series_jrc_interactive(
        df,
        first_break=bp,
        plot_variables=["area_water_permanent", "area_water_seasonal", "area_land"],
        normalization_factor=normalization_factor,
        lake_id=id_geohash,
        save_path=save_path,
    )

    return fig

LakeDataset

Base class for processing lake and water body datasets.

Handles common operations for dataset preprocessing, normalization, and masking. Provides a framework that can be extended for different data sources.

Attributes:

    ds (xr.Dataset): The input xarray Dataset containing raw data.
    ds_normalized (xr.Dataset): Normalized version of the dataset (0-1 scale).
    preprocessed_ (bool): Whether preprocessing has been completed.
    normalized_available_ (bool): Whether normalized data is available.
    water_column (str): Name of the water/water-extent column.
    data_columns (list): Names of all data columns in the dataset.
    ds_ismasked_ (bool): Whether the original dataset has been masked.
    ds_normalized_ismasked_ (bool): Whether the normalized dataset has been masked.

Example

>>> lake_data = LakeDataset(xr.Dataset(...))
>>> normalized = lake_data.ds_normalized
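Subclasses extend LakeDataset by overriding the `_preprocess` and `_mask_invalid` hooks, which the constructor runs in a fixed order (preprocess, normalize, mask). The miniature classes below only mimic that template-method flow on plain Python lists; `MiniLakeDataset` and `MiniDWDataset` are illustrative stand-ins, not the real implementation:

```python
class MiniLakeDataset:
    """Toy stand-in for LakeDataset's template-method flow."""

    def __init__(self, values):
        self.values = values
        self._preprocess()    # hook: data-source-specific derived fields
        self._normalize()     # shared: scale to 0-1 by the series maximum
        self._mask_invalid()  # hook: data-source-specific quality mask

    def _preprocess(self):
        pass  # overridden per data source

    def _normalize(self):
        peak = max(self.values)
        self.normalized = [v / peak for v in self.values]

    def _mask_invalid(self):
        pass  # overridden per data source

class MiniDWDataset(MiniLakeDataset):
    def _preprocess(self):
        # e.g. round derived areas, as a stand-in for computing area_data
        self.values = [round(v, 4) for v in self.values]

    def _mask_invalid(self):
        # e.g. drop normalized values below an arbitrary quality floor
        self.normalized = [v for v in self.normalized if v >= 0.5]

ds = MiniDWDataset([10.0, 4.0, 8.0])
print(ds.normalized)  # [1.0, 0.8]
```

The real subclasses (DWDataset, JRCDataset) follow the same pattern against xarray Datasets.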

Source code in src/water_timeseries/dataset.py
class LakeDataset:
    """Base class for processing lake and water body datasets.

    Handles common operations for dataset preprocessing, normalization, and masking.
    Provides a framework that can be extended for different data sources.

    Attributes:
        ds (xr.Dataset): The input xarray Dataset containing raw data.
        ds_normalized (xr.Dataset): Normalized version of the dataset (0-1 scale).
        preprocessed_ (bool): Whether preprocessing has been completed.
        normalized_available_ (bool): Whether normalized data is available.
        water_column (str): Name of the water/water extent column.
        data_columns (list): Names of all data columns in the dataset.
        ds_ismasked_ (bool): Whether the original dataset has been masked.
        ds_normalized_ismasked_ (bool): Whether the normalized dataset has been masked.

    Example:
        >>> lake_data = LakeDataset(xr.Dataset(...))
        >>> normalized = lake_data.ds_normalized
    """

    def __init__(self, ds, id_field: str = "id_geohash"):
        """Initialize the LakeDataset.

        Args:
            ds (xr.Dataset): Input xarray Dataset with land cover or water classification data.
            id_field (str): Name of the coordinate field that identifies individual time series (default: "id_geohash").
        """
        self.ds = ds
        self.preprocessed_ = False
        self.normalized_available_ = False
        self.water_column = None
        self.data_columns = None
        self.ds_ismasked_ = False
        self.ds_normalized_ismasked_ = False
        self.id_field = id_field
        self._preprocess()
        self._normalize_ds()
        self._mask_invalid()

    @property
    def object_ids_(self) -> list:
        """Get all valid object IDs from the dataset.

        Returns:
            list: List of all object IDs from the id_field coordinate.
        """
        return list(self.ds.coords[self.id_field].values)

    @property
    def dates_(self) -> list:
        """Get all valid dates from the dataset.

        Returns:
            list: List of all dates from the 'date' coordinate.
        """
        return list(self.ds.coords["date"].values)

    def _preprocess(self):
        """Preprocess the dataset.

        This method should be overridden in subclasses to implement data-source-specific
        preprocessing steps such as calculating composite indicators or adding derived fields.
        """
        pass

    def _normalize_ds(self):
        """Normalize the dataset by dividing by maximum values.

        Scales all data to 0-1 range based on the maximum area value per time series.
        This ensures comparability across different spatial extents.
        """
        self.ds_normalized = self.ds / self.ds.max(dim="date")["area_data"]
        self.normalized_available_ = True

    def _mask_invalid(self):
        """Mask invalid data based on quality criteria.

        This method should be overridden in subclasses to implement data-source-specific
        masking logic based on their quality thresholds and constraints.
        """
        pass

    def create_timelapse(
        self,
        lake_gdf: gpd.GeoDataFrame,
        id_geohash: str,
        timelapse_source: str = "sentinel2",
        gif_outdir: str | Path = "gifs",
        buffer: float = 100,
        start_year: int = 2016,
        end_year: int = 2025,
        start_date: str = "07-01",
        end_date: str = "08-31",
        frames_per_second: int = 1,
        dimensions: int = 512,
        overwrite_exists: bool = False,
    ) -> Path | None:
        """
        Create a timelapse GIF for a specific lake.

        This method generates an animated GIF showing satellite imagery
        over a date range for a lake identified by its geohash. The timelapse captures
        the summer period (July-August) each year to maximize cloud-free observations.

        Args:
            lake_gdf: GeoDataFrame containing lake geometries with an 'id_geohash' column.
            id_geohash: The geohash identifier for the specific lake to visualize.
            timelapse_source: Image source for timelapse imagery ('sentinel2' or 'landsat').
            gif_outdir: Output directory for the GIF file (default: 'gifs').
            buffer: Buffer distance in meters to expand the lake bounding box (default: 100).
            start_year: Start year for the timelapse (default: 2016).
            end_year: End year for the timelapse (default: 2025).
            start_date: Start date within each year (MM-DD format, default: '07-01').
            end_date: End date within each year (MM-DD format, default: '08-31').
            frames_per_second: Animation speed (default: 1).
            dimensions: Pixel dimensions for the output GIF (default: 512).
            overwrite_exists: If False (default), skip download if output file already exists.
                              If True, always re-download and overwrite existing file.

        Returns:
            Path | None: Path to the generated GIF file, or None if skipped due to existing file.
        """
        return create_timelapse(
            input_lake_gdf=lake_gdf,
            id_geohash=id_geohash,
            timelapse_source=timelapse_source,
            gif_outdir=gif_outdir,
            buffer=buffer,
            start_year=start_year,
            end_year=end_year,
            start_date=start_date,
            end_date=end_date,
            frames_per_second=frames_per_second,
            dimensions=dimensions,
            overwrite_exists=overwrite_exists,
        )

    def plot_timeseries(self, id_geohash: str, breakpoints=None) -> plt.Figure:
        """Plot the time series for a specific geohash.

        Args:
            id_geohash (str): The geohash identifier for the location.
            breakpoints (BreakpointMethod, optional): Breakpoint detection method to use.

        Returns:
            plt.Figure: Figure with the plotted time series.
        """
        pass

    def calculate_changes(self, break_df: pd.DataFrame, id_geohash: str) -> pd.DataFrame:
        """Calculate changes around detected breakpoints for a given geohash.

        This method should be overridden in subclasses to implement
        data-source-specific change calculations.
        """
        pass

    def merge(
        self,
        other: "LakeDataset",
        how: str = "both",
    ) -> "LakeDataset":
        """Merge this LakeDataset with another LakeDataset.

        Combines the .ds attributes of both datasets. Both datasets must have the same
        variables. The merge strategy is determined by the `how` parameter.
        Both datasets must be of the same type (e.g., both DWDataset or both JRCDataset).

        Args:
            other (LakeDataset): Another LakeDataset instance to merge with.
            how (str): Merge strategy. Options:
                - "both": Merge along both dimensions (date and id_geohash). Combines all
                  data from both datasets, keeping all unique dates and id_geohashes.
                - "date": Merge along the "date" dimension only. Both datasets must have
                  the same id_geohash values, but can have different dates. New dates are
                  appended to the existing time series.
                - "id_geohash": Merge along the "id_geohash" dimension only. Both datasets
                  must have the same dates, but can have different id_geohashes. New
                  id_geohashes (lakes) are added with their time series.

        Returns:
            LakeDataset: A new LakeDataset with merged .ds data.

        Raises:
            TypeError: If the datasets are of different types.
            ValueError: If the merge strategy is invalid or datasets are incompatible.

        Example:
            >>> merged = dataset1.merge(dataset2, how="both")
            >>> merged = dataset1.merge(dataset2, how="date")  # Add new dates
            >>> merged = dataset1.merge(dataset2, how="id_geohash")  # Add new lakes
        """
        self._validate_merge(other, how)

        if how == "both":
            merged_ds = self._merge_both(self.ds, other.ds)
        elif how == "date":
            merged_ds = self._merge_by_date(self.ds, other.ds)
        else:  # how == "id_geohash"
            merged_ds = self._merge_by_id(self.ds, other.ds)

        # Pass id_field through so preprocessing in __init__ uses the right coordinate
        merged = self.__class__(merged_ds, id_field=self.id_field)
        return merged

    def _validate_merge(self, other: "LakeDataset", how: str):
        """Validate datasets before merging."""

        if how not in {"both", "date", "id_geohash"}:
            raise ValueError(f"Invalid merge strategy '{how}'. Must be 'both', 'date', or 'id_geohash'.")

        if type(self) is not type(other):
            raise TypeError(
                f"Cannot merge {type(self).__name__} with {type(other).__name__}. Both datasets must be the same type."
            )

        if set(self.ds.data_vars) != set(other.ds.data_vars):
            raise ValueError("Datasets have different variables.")

    def _merge_both(self, ds1, ds2):
        """Merge along both dimensions."""

        return xr.merge([ds1, ds2])

    def _merge_by_date(self, ds1, ds2):
        """Merge along date dimension (same id_geohash, new dates)."""

        if set(ds1.coords[self.id_field].values) != set(ds2.coords[self.id_field].values):
            raise ValueError(f"For merge how='date', both datasets must have the same {self.id_field} values.")

        # Check for duplicate dates
        dates1 = set(ds1.coords["date"].values)
        dates2 = set(ds2.coords["date"].values)
        duplicate_dates = dates1 & dates2
        if duplicate_dates:
            warnings.warn(
                f"Datasets have {len(duplicate_dates)} overlapping dates. "
                f"Data from the second dataset will overwrite the first for these dates.",
                UserWarning,
            )

        merged = xr.concat([ds1, ds2], dim="date")
        return merged.sortby("date")

    def _merge_by_id(self, ds1, ds2):
        """Merge along id_geohash dimension (same dates, new id_geohashes)."""

        if set(ds1.coords["date"].values) != set(ds2.coords["date"].values):
            raise ValueError("For merge how='id_geohash', both datasets must have the same dates.")

        # Check for duplicate id_geohashes
        ids1 = set(ds1.coords[self.id_field].values)
        ids2 = set(ds2.coords[self.id_field].values)
        duplicate_ids = ids1 & ids2
        if duplicate_ids:
            warnings.warn(
                f"Datasets have {len(duplicate_ids)} overlapping {self.id_field} values. "
                f"Data from the second dataset will overwrite the first for these values.",
                UserWarning,
            )

        return xr.concat([ds1, ds2], dim=self.id_field)
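
The `_normalize_ds` step above scales every variable by the per-series maximum of `area_data`, so each lake's time series tops out at 1.0. A minimal sketch of the same expression on toy data (hypothetical values, plain xarray rather than the LakeDataset API):

```python
import numpy as np
import xarray as xr

# Toy dataset: one lake, three dates, area in arbitrary units
ds = xr.Dataset(
    {"area_data": (("date", "id_geohash"), np.array([[50.0], [100.0], [25.0]]))},
    coords={"date": ["2020-01-01", "2020-02-01", "2020-03-01"], "id_geohash": ["aaa"]},
)

# Same expression used in _normalize_ds: divide by the per-series maximum
ds_normalized = ds / ds.max(dim="date")["area_data"]
print(ds_normalized["area_data"].values.ravel())  # 0.5, 1.0, 0.25
```

Because the divisor is a DataArray over `id_geohash`, each lake is normalized by its own maximum, which keeps time series comparable across lakes of very different sizes.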

dates_ property 🔗

Get all valid dates from the dataset.

Returns:

| Name | Type | Description |
|------|------|-------------|
| `list` | `list` | List of all dates from the 'date' coordinate. |

object_ids_ property 🔗

Get all valid object IDs from the dataset.

Returns:

| Name | Type | Description |
|------|------|-------------|
| `list` | `list` | List of all object IDs from the id_field coordinate. |

__init__(ds, id_field='id_geohash') 🔗

Initialize the LakeDataset.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `ds` | `Dataset` | Input xarray Dataset with land cover or water classification data. | required |
| `id_field` | `str` | Name of the coordinate field that identifies individual time series. | `'id_geohash'` |

Source code in src/water_timeseries/dataset.py
def __init__(self, ds, id_field: str = "id_geohash"):
    """Initialize the LakeDataset.

    Args:
        ds (xr.Dataset): Input xarray Dataset with land cover or water classification data.
        id_field (str): Name of the coordinate field that identifies individual time series (default: "id_geohash").
    """
    self.ds = ds
    self.preprocessed_ = False
    self.normalized_available_ = False
    self.water_column = None
    self.data_columns = None
    self.ds_ismasked_ = False
    self.ds_normalized_ismasked_ = False
    self.id_field = id_field
    self._preprocess()
    self._normalize_ds()
    self._mask_invalid()

create_timelapse(lake_gdf, id_geohash, timelapse_source='sentinel2', gif_outdir='gifs', buffer=100, start_year=2016, end_year=2025, start_date='07-01', end_date='08-31', frames_per_second=1, dimensions=512, overwrite_exists=False) 🔗

Create a timelapse GIF for a specific lake.

This method generates an animated GIF showing satellite imagery over a date range for a lake identified by its geohash. The timelapse captures the summer period (July-August) each year to maximize cloud-free observations.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `lake_gdf` | `GeoDataFrame` | GeoDataFrame containing lake geometries with an 'id_geohash' column. | required |
| `id_geohash` | `str` | The geohash identifier for the specific lake to visualize. | required |
| `timelapse_source` | `str` | Image source for timelapse imagery ('sentinel2' or 'landsat'). | `'sentinel2'` |
| `gif_outdir` | `str \| Path` | Output directory for the GIF file. | `'gifs'` |
| `buffer` | `float` | Buffer distance in meters to expand the lake bounding box. | `100` |
| `start_year` | `int` | Start year for the timelapse. | `2016` |
| `end_year` | `int` | End year for the timelapse. | `2025` |
| `start_date` | `str` | Start date within each year (MM-DD format). | `'07-01'` |
| `end_date` | `str` | End date within each year (MM-DD format). | `'08-31'` |
| `frames_per_second` | `int` | Animation speed. | `1` |
| `dimensions` | `int` | Pixel dimensions for the output GIF. | `512` |
| `overwrite_exists` | `bool` | If False, skip download if the output file already exists. If True, always re-download and overwrite the existing file. | `False` |

Returns:

| Type | Description |
|------|-------------|
| `Path \| None` | Path to the generated GIF file, or None if skipped due to existing file. |

Source code in src/water_timeseries/dataset.py
def create_timelapse(
    self,
    lake_gdf: gpd.GeoDataFrame,
    id_geohash: str,
    timelapse_source: str = "sentinel2",
    gif_outdir: str | Path = "gifs",
    buffer: float = 100,
    start_year: int = 2016,
    end_year: int = 2025,
    start_date: str = "07-01",
    end_date: str = "08-31",
    frames_per_second: int = 1,
    dimensions: int = 512,
    overwrite_exists: bool = False,
) -> Path | None:
    """
    Create a timelapse GIF for a specific lake.

    This method generates an animated GIF showing satellite imagery
    over a date range for a lake identified by its geohash. The timelapse captures
    the summer period (July-August) each year to maximize cloud-free observations.

    Args:
        lake_gdf: GeoDataFrame containing lake geometries with an 'id_geohash' column.
        id_geohash: The geohash identifier for the specific lake to visualize.
        timelapse_source: Image source for timelapse imagery ('sentinel2' or 'landsat').
        gif_outdir: Output directory for the GIF file (default: 'gifs').
        buffer: Buffer distance in meters to expand the lake bounding box (default: 100).
        start_year: Start year for the timelapse (default: 2016).
        end_year: End year for the timelapse (default: 2025).
        start_date: Start date within each year (MM-DD format, default: '07-01').
        end_date: End date within each year (MM-DD format, default: '08-31').
        frames_per_second: Animation speed (default: 1).
        dimensions: Pixel dimensions for the output GIF (default: 512).
        overwrite_exists: If False (default), skip download if output file already exists.
                          If True, always re-download and overwrite existing file.

    Returns:
        Path | None: Path to the generated GIF file, or None if skipped due to existing file.
    """
    return create_timelapse(
        input_lake_gdf=lake_gdf,
        id_geohash=id_geohash,
        timelapse_source=timelapse_source,
        gif_outdir=gif_outdir,
        buffer=buffer,
        start_year=start_year,
        end_year=end_year,
        start_date=start_date,
        end_date=end_date,
        frames_per_second=frames_per_second,
        dimensions=dimensions,
        overwrite_exists=overwrite_exists,
    )

merge(other, how='both') 🔗

Merge this LakeDataset with another LakeDataset.

Combines the .ds attributes of both datasets. Both datasets must have the same variables. The merge strategy is determined by the how parameter. Both datasets must be of the same type (e.g., both DWDataset or both JRCDataset).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `other` | `LakeDataset` | Another LakeDataset instance to merge with. | required |
| `how` | `str` | Merge strategy: `"both"` merges along both dimensions (date and id_geohash), keeping all unique dates and id_geohashes; `"date"` merges along the date dimension only (same id_geohash values required, new dates are appended); `"id_geohash"` merges along the id_geohash dimension only (same dates required, new lakes are added). | `'both'` |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `LakeDataset` | `LakeDataset` | A new LakeDataset with merged `.ds` data. |

Raises:

| Type | Description |
|------|-------------|
| `TypeError` | If the datasets are of different types. |
| `ValueError` | If the merge strategy is invalid or datasets are incompatible. |

Example

merged = dataset1.merge(dataset2, how="both")
merged = dataset1.merge(dataset2, how="date")  # Add new dates
merged = dataset1.merge(dataset2, how="id_geohash")  # Add new lakes

Source code in src/water_timeseries/dataset.py
def merge(
    self,
    other: "LakeDataset",
    how: str = "both",
) -> "LakeDataset":
    """Merge this LakeDataset with another LakeDataset.

    Combines the .ds attributes of both datasets. Both datasets must have the same
    variables. The merge strategy is determined by the `how` parameter.
    Both datasets must be of the same type (e.g., both DWDataset or both JRCDataset).

    Args:
        other (LakeDataset): Another LakeDataset instance to merge with.
        how (str): Merge strategy. Options:
            - "both": Merge along both dimensions (date and id_geohash). Combines all
              data from both datasets, keeping all unique dates and id_geohashes.
            - "date": Merge along the "date" dimension only. Both datasets must have
              the same id_geohash values, but can have different dates. New dates are
              appended to the existing time series.
            - "id_geohash": Merge along the "id_geohash" dimension only. Both datasets
              must have the same dates, but can have different id_geohashes. New
              id_geohashes (lakes) are added with their time series.

    Returns:
        LakeDataset: A new LakeDataset with merged .ds data.

    Raises:
        TypeError: If the datasets are of different types.
        ValueError: If the merge strategy is invalid or datasets are incompatible.

    Example:
        >>> merged = dataset1.merge(dataset2, how="both")
        >>> merged = dataset1.merge(dataset2, how="date")  # Add new dates
        >>> merged = dataset1.merge(dataset2, how="id_geohash")  # Add new lakes
    """
    self._validate_merge(other, how)

    if how == "both":
        merged_ds = self._merge_both(self.ds, other.ds)
    elif how == "date":
        merged_ds = self._merge_by_date(self.ds, other.ds)
    else:  # how == "id_geohash"
        merged_ds = self._merge_by_id(self.ds, other.ds)

    # Pass id_field through so preprocessing in __init__ uses the right coordinate
    merged = self.__class__(merged_ds, id_field=self.id_field)
    return merged

plot_timeseries(id_geohash, breakpoints=None) 🔗

Plot the time series for a specific geohash.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `id_geohash` | `str` | The geohash identifier for the location. | required |
| `breakpoints` | `BreakpointMethod` | Breakpoint detection method to use. | `None` |
Source code in src/water_timeseries/dataset.py
168
169
170
171
172
173
174
175
def plot_timeseries(self, id_geohash: str, breakpoints=None) -> plt.Figure:
    """Plot the time series for a specific geohash.

    Args:
        id_geohash (str): The geohash identifier for the location.
        breakpoints (BreakpointMethod, optional): Breakpoint detection method to use.

    Returns:
        plt.Figure: Figure with the plotted time series.
    """
    pass

Merge Functionality🔗

The LakeDataset class and its subclasses (DWDataset, JRCDataset) provide a merge() method to combine two datasets. This is useful for:

  • Combining datasets from different time periods
  • Adding new lakes to an existing dataset
  • Combining partial datasets into a complete one

Merge Strategies🔗

The merge() method accepts a how parameter with three options:

| Strategy | Description | Requirements |
|----------|-------------|--------------|
| `"both"` | Merge along both dimensions (date and id_geohash). Combines all unique data from both datasets. | Same variables |
| `"date"` | Merge along the date dimension only. Adds new dates for the same lakes. | Same id_geohash values, same variables |
| `"id_geohash"` | Merge along the id_geohash dimension only. Adds new lakes with the same dates. | Same dates, same variables |

Examples🔗

from water_timeseries.dataset import DWDataset
import xarray as xr

# Load two datasets
ds1 = xr.open_dataset("data_2020_2022.zarr")
dataset1 = DWDataset(ds1)

ds2 = xr.open_dataset("data_2023_2024.zarr")
dataset2 = DWDataset(ds2)

# Merge along both dimensions
merged = dataset1.merge(dataset2, how="both")

# Add new dates to existing time series (same lakes)
# Both datasets must have the same id_geohash values
merged = dataset1.merge(dataset2, how="date")

# Add new lakes with the same temporal coverage
# Both datasets must have the same dates
merged = dataset1.merge(dataset2, how="id_geohash")

Warnings🔗

When there are overlapping values, a warning is issued:

  • how="date": Warns if there are duplicate dates between datasets
  • how="id_geohash": Warns if there are duplicate id_geohash values

In both cases, data from the second dataset will overwrite the first for overlapping values.

Requirements🔗

  • Both datasets must be of the same type (both DWDataset or both JRCDataset)
  • Both datasets must have the same variables
  • The specific merge strategy may have additional requirements (see table above)

Return Value🔗

The merge() method returns a new LakeDataset instance (of the same type as the first dataset) with the combined data. The returned dataset is fully preprocessed and normalized.
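
Under the hood, the `how="both"` strategy relies on `xr.merge`, which keeps every unique coordinate value and fills missing date/lake combinations with NaN. A minimal sketch of that behavior with plain xarray (toy data, not the LakeDataset API):

```python
import numpy as np
import xarray as xr

# Two toy datasets with disjoint dates and disjoint lake ids
ds1 = xr.Dataset(
    {"area_data": (("date", "id_geohash"), np.full((2, 1), 10.0))},
    coords={"date": ["2020-01-01", "2020-02-01"], "id_geohash": ["aaa"]},
)
ds2 = xr.Dataset(
    {"area_data": (("date", "id_geohash"), np.full((2, 1), 20.0))},
    coords={"date": ["2021-01-01", "2021-02-01"], "id_geohash": ["bbb"]},
)

merged = xr.merge([ds1, ds2])
# All unique dates and ids are kept (4 dates x 2 lakes),
# with NaN where a lake has no observation for a date
print(merged.sizes["date"], merged.sizes["id_geohash"])  # 4 2
```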


Plot Time Series🔗

Both DWDataset and JRCDataset provide a plot_timeseries() method to visualize water extent over time for a specific lake.

DWDataset.plot_timeseries()🔗

from water_timeseries.dataset import DWDataset
import xarray as xr

# Load data
ds = xr.open_zarr("lakes_dw.zarr")
dataset = DWDataset(ds)

# Plot time series for a specific lake
fig = dataset.plot_timeseries(
    id_geohash="b7uefy0bvcrc",
    breakpoints=None  # Optional: pass BreakpointMethod to overlay detected breaks
)

# Show the plot
fig.show()

JRCDataset.plot_timeseries()🔗

from water_timeseries.dataset import JRCDataset
import xarray as xr

# Load data
ds = xr.open_zarr("lakes_jrc.zarr")
dataset = JRCDataset(ds)

# Plot time series
fig = dataset.plot_timeseries(
    id_geohash="b7uefy0bvcrc",
    breakpoints=None  # Optional: BreakpointMethod to overlay detected breaks
)

fig.show()

Parameters🔗

| Parameter | Type | Description |
|-----------|------|-------------|
| `id_geohash` | `str` | The geohash identifier for the lake to plot |
| `breakpoints` | `BreakpointMethod`, optional | Breakpoint detection result to overlay on the plot (e.g., from SimpleBreakpoint or BeastBreakpoint) |

Return Value🔗

Returns a matplotlib.figure.Figure object that can be displayed or saved.
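
Since the return value is a standard Matplotlib figure, it can be saved like any other. A short sketch (the figure built here is only a stand-in for the one `plot_timeseries()` returns):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted/headless use
import matplotlib.pyplot as plt

# Stand-in for: fig = dataset.plot_timeseries(id_geohash="...")
fig, ax = plt.subplots()
ax.plot([2016, 2020, 2024], [0.8, 0.5, 0.2], label="water")
ax.legend()

fig.savefig("water_timeseries.png", dpi=150)  # persist the plot to disk
plt.close(fig)  # release the figure when done
```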

With Breakpoint Overlay🔗

from water_timeseries.dataset import DWDataset
from water_timeseries.breakpoint import SimpleBreakpoint

# Initialize dataset
dataset = DWDataset(xr.open_zarr("lakes_dw.zarr"))

# Detect breakpoints
bp = SimpleBreakpoint()
breaks = bp.calculate_break(dataset, object_id="b7uefy0bvcrc")

# Plot with breakpoint overlay
fig = dataset.plot_timeseries(
    id_geohash="b7uefy0bvcrc",
    breakpoints=breaks
)

fig.show()

Visual Output🔗

DWDataset Time Series

DW Time Series Example

The DWDataset plot shows land cover class proportions as a stacked area chart:

  • Water (blue): Primary water extent indicator
  • Vegetation classes (trees, grass, crops, shrub): Grouped in green tones
  • Other classes (built, bare, snow): Shown in distinct colors
  • Values are normalized to total area (0-1 scale)

JRCDataset Time Series

JRC Time Series Example

The JRCDataset plot shows permanent vs seasonal water as a line chart:

  • Permanent water (blue): Water present year-round
  • Seasonal water (light blue): Water present seasonally
  • Land: Dry land area (shown in brown/green)
  • Values are percentages (0-100)

With Breakpoint Overlay

When a breakpoint is detected, a vertical dashed line marks when significant water extent changes occurred.