hydro_event
The hydro_event module provides utilities for flood event extraction and analysis from hydrological time series data.
Core Functions
extract_flood_events
Extracts flood events from a DataFrame based on a binary flood event indicator.
Args:
- df: DataFrame with flood_event and time columns
- warmup_length: Number of time steps to include as warmup period
- flood_event_col: Name of flood event indicator column
- time_col: Name of time column
Returns:
- List of flood event dictionaries containing event data and metadata
time_to_ten_digits
Converts time objects to ten-digit format (YYYYMMDDHH).
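The conversion can be illustrated with the standard library alone. This is a minimal sketch, not the library's implementation: `ten_digit_sketch` is a hypothetical name, and the handling of strings assumes ISO 8601 input.

```python
from datetime import datetime


def ten_digit_sketch(time_obj):
    """Format a datetime or ISO 8601 string as YYYYMMDDHH (hypothetical helper)."""
    if isinstance(time_obj, str):
        # Assumption: strings look like '2020-01-01T12:00:00'.
        time_obj = datetime.fromisoformat(time_obj)
    # Minutes and seconds are dropped, not rounded.
    return time_obj.strftime("%Y%m%d%H")


print(ten_digit_sketch(datetime(2020, 1, 1, 12, 0)))  # 2020010112
print(ten_digit_sketch("2020-01-01T12:00:00"))        # 2020010112
```

A fixed-width string of this kind sorts chronologically, which makes it convenient for naming events.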
extract_peaks
Extracts peak values and their indices from time series data.
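The signature and the peak criterion of extract_peaks are not shown on this page. As a rough illustration of the idea only, the sketch below treats a peak as a strict local maximum; `local_peaks_sketch` is a hypothetical helper, and the real function may use a different rule.

```python
def local_peaks_sketch(values):
    """Return (indices, peak_values) for strict local maxima (hypothetical helper)."""
    indices = [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]
    return indices, [values[i] for i in indices]


flow = [100, 200, 300, 250, 150, 180, 120]
print(local_peaks_sketch(flow))  # ([2, 5], [300, 180])
```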
calculate_event_statistics
Calculates statistical summary for extracted flood events.
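Which statistics this function reports is not documented on this page. The sketch below only illustrates the general pattern of summarizing a list of events; it assumes each event is a dictionary with a duration key (as returned by get_event_indices), and both the helper name and the output keys are hypothetical.

```python
from statistics import mean


def event_duration_stats_sketch(events):
    """Summarize event durations; `events` are dicts with a 'duration' key."""
    durations = [event["duration"] for event in events]
    if not durations:
        # No events: report only the count.
        return {"count": 0}
    return {
        "count": len(durations),
        "mean_duration": mean(durations),
        "min_duration": min(durations),
        "max_duration": max(durations),
    }


events = [{"duration": 3}, {"duration": 2}, {"duration": 7}]
print(event_duration_stats_sketch(events))
```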
API Reference
Author: Wenyu Ouyang
Date: 2025-01-17
LastEditTime: 2025-08-17 09:29:42
LastEditors: Wenyu Ouyang
Description: Flood event extraction utilities for hydrological data processing
FilePath: \hydromodeld:\Code\hydroutils\hydroutils\hydro_event.py
Copyright (c) 2023-2026 Wenyu Ouyang. All rights reserved.
extract_event_data_by_columns(df, event_indices, data_columns)
Extract event data for specified columns using event indices.
This function extracts data from specified columns for a flood event using the index information from get_event_indices or extract_flood_events.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Original DataFrame containing all data. | required |
| event_indices | Dict | Event index information dictionary containing warmup_start_idx (int), the start index including warmup period, and end_idx (int), the end index of the event. | required |
| data_columns | List[str] | List of column names to extract. | required |

Returns:
| Type | Description |
|---|---|
| Dict | Dictionary mapping column names to numpy arrays containing the extracted data. If a column is not found, it will contain an array of NaN values. |
Example
```python
>>> import pandas as pd
>>> from hydroutils.hydro_event import extract_event_data_by_columns
>>> df = pd.DataFrame({
...     'time': pd.date_range('2020-01-01', periods=5),
...     'flow': [100, 200, 300, 250, 150]
... })
>>> indices = {'warmup_start_idx': 1, 'end_idx': 4}
>>> data = extract_event_data_by_columns(df, indices, ['flow'])
>>> data['flow']
array([200., 300., 250.])
```
Source code in hydroutils/hydro_event.py, lines 286-327.
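The slicing and NaN-fill behaviour described for this function can be sketched without pandas. This is an illustration under stated assumptions, not the library's code: `slice_event_columns_sketch` is a hypothetical helper, the table is a plain dictionary of lists, and end_idx is treated as exclusive because the documented example returns three values for indices 1 to 4.

```python
import math


def slice_event_columns_sketch(table, event_indices, data_columns):
    """Slice columns over [warmup_start_idx, end_idx); NaN-fill missing ones."""
    start = event_indices["warmup_start_idx"]
    end = event_indices["end_idx"]  # assumed exclusive
    result = {}
    for name in data_columns:
        if name in table:
            result[name] = [float(v) for v in table[name][start:end]]
        else:
            # Missing column: same length as the slice, filled with NaN.
            result[name] = [math.nan] * (end - start)
    return result


table = {"flow": [100, 200, 300, 250, 150]}
indices = {"warmup_start_idx": 1, "end_idx": 4}
data = slice_event_columns_sketch(table, indices, ["flow", "rain"])
print(data["flow"])  # [200.0, 300.0, 250.0]
print(data["rain"])  # [nan, nan, nan]
```

Filling missing columns with NaN keeps every returned array the same length, so callers can stack them without checking which columns were present.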
extract_flood_events(df, warmup_length=0, flood_event_col='flood_event', time_col='time')
Extract flood events from a DataFrame based on a flood event indicator column.
This function extracts flood events based on a binary indicator column (flood_event). The design philosophy is to be agnostic about other columns, letting the caller decide how to handle the data columns. The function only requires the flood_event column to mark events and a time column for event naming.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame containing site data. Must have flood_event and time columns. | required |
| warmup_length | int | Number of time steps to include as warmup period before each event. Defaults to 0. | 0 |
| flood_event_col | str | Name of the flood event indicator column. Defaults to "flood_event". | 'flood_event' |
| time_col | str | Name of the time column. Defaults to "time". | 'time' |

Returns:
| Type | Description |
|---|---|
| List[Dict] | List of flood events, one dictionary per event. |

Each dictionary contains:
- event_name (str): Event name based on start/end times
- start_idx (int): Start index of actual event in original DataFrame
- end_idx (int): End index of actual event in original DataFrame
- warmup_start_idx (int): Start index including warmup period
- data (pd.DataFrame): Event data including warmup period
- is_warmup_mask (np.ndarray): Boolean array marking warmup rows
- actual_start_time: Start time of actual event
- actual_end_time: End time of actual event

Raises:
| Type | Description |
|---|---|
| ValueError | If required columns are missing from DataFrame. |
Example
```python
>>> import pandas as pd
>>> from hydroutils.hydro_event import extract_flood_events
>>> df = pd.DataFrame({
...     'time': pd.date_range('2020-01-01', periods=5),
...     'flood_event': [0, 1, 1, 1, 0],
...     'flow': [100, 200, 300, 250, 150]
... })
>>> events = extract_flood_events(df, warmup_length=1)
>>> len(events)
1
>>> events[0]['data']
        time  flood_event  flow
0 2020-01-01            0   100  # warmup period
1 2020-01-02            1   200  # event start
2 2020-01-03            1   300
3 2020-01-04            1   250  # event end
```
Source code in hydroutils/hydro_event.py, lines 59-149.
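The relation between the index fields and is_warmup_mask can be sketched as follows. This is an assumption-laden illustration: `warmup_mask_sketch` is a hypothetical helper, and end_idx is taken to be exclusive, which is what the four-row data frame in the documented example suggests.

```python
def warmup_mask_sketch(warmup_start_idx, start_idx, end_idx):
    """Return a boolean list: True for warmup rows, False for event rows."""
    n_warmup = start_idx - warmup_start_idx
    n_event = end_idx - start_idx  # end_idx assumed exclusive
    return [True] * n_warmup + [False] * n_event


# Indices matching the documented example: one warmup row, three event rows.
print(warmup_mask_sketch(warmup_start_idx=0, start_idx=1, end_idx=4))
# [True, False, False, False]
```

A mask of this shape lets a caller run a model over the whole slice while scoring only the rows where the mask is False.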
find_flood_event_segments_as_tuples(flood_event_array, warmup_length=0)
Find continuous flood event segments and return them as tuples.
This is a convenience function that returns event segments as tuples instead of dictionaries, for compatibility with existing code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| flood_event_array | ndarray | Binary array where values > 0 indicate flood events. | required |
| warmup_length | int | Number of time steps to include as warmup period before each event. Defaults to 0. | 0 |

Returns:
| Type | Description |
|---|---|
| List[Tuple[int, int, int, int]] | List of tuples, each of the form (extended_start, extended_end, original_start, original_end). |

Tuple fields:
- extended_start: Start index including warmup period
- extended_end: End index of event
- original_start: Start index of actual event
- original_end: End index of actual event
Example
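```python
>>> import numpy as np
>>> from hydroutils.hydro_event import find_flood_event_segments_as_tuples
>>> arr = np.array([0, 0, 1, 1, 1, 0])
>>> segments = find_flood_event_segments_as_tuples(arr, warmup_length=1)
>>> segments[0]  # (warmup_start, event_end, event_start, event_end)
(1, 4, 2, 4)
```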
Source code in hydroutils/hydro_event.py, lines 406-446.
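The tuple form carries the same four indices as the dictionary form returned by find_flood_event_segments_from_array. A minimal sketch of that mapping, assuming the dictionary keys documented on this page (`segments_to_tuples_sketch` is a hypothetical helper, not part of the library):

```python
def segments_to_tuples_sketch(segments):
    """Convert segment dicts to (ext_start, ext_end, orig_start, orig_end)."""
    return [
        (
            seg["extended_start"],
            seg["extended_end"],
            seg["original_start"],
            seg["original_end"],
        )
        for seg in segments
    ]


segment = {
    "extended_start": 1, "extended_end": 4,
    "original_start": 2, "original_end": 4,
    "duration": 3, "total_length": 4,
}
print(segments_to_tuples_sketch([segment]))  # [(1, 4, 2, 4)]
```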
find_flood_event_segments_from_array(flood_event_array, warmup_length=0)
Find continuous flood event segments in a binary indicator array.
This is a low-level function that handles the core logic of segmenting a flood event indicator array into continuous events. It can be reused by different higher-level functions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| flood_event_array | ndarray | Binary array where values > 0 indicate flood events. | required |
| warmup_length | int | Number of time steps to include as warmup period before each event. Defaults to 0. | 0 |

Returns:
| Type | Description |
|---|---|
| List[Dict] | List of event segment information, one dictionary per segment. |

Each dictionary contains:
- extended_start (int): Start index including warmup period
- extended_end (int): End index of event
- original_start (int): Start index of actual event
- original_end (int): End index of actual event
- duration (int): Duration of actual event in time steps
- total_length (int): Total length including warmup period
Example
```python
>>> import numpy as np
>>> from hydroutils.hydro_event import find_flood_event_segments_from_array
>>> arr = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0])
>>> segments = find_flood_event_segments_from_array(arr, warmup_length=1)
>>> len(segments)  # Two events found
2
>>> segments[0]  # First event with one timestep warmup
{
    'extended_start': 1,   # Warmup start
    'extended_end': 4,     # Event end
    'original_start': 2,   # Actual event start
    'original_end': 4,     # Actual event end
    'duration': 3,         # Event duration
    'total_length': 4      # Total length with warmup
}
```
Source code in hydroutils/hydro_event.py, lines 330-403.
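The segmentation logic can be sketched in plain Python. This is one possible reading of the documented behaviour, not the library's implementation: `find_segments_sketch` is a hypothetical helper, end indices are treated as inclusive (as the documented output suggests), and clipping the warmup at index 0 is an assumption.

```python
def find_segments_sketch(flags, warmup_length=0):
    """Find runs of values > 0; end indices are inclusive (assumption)."""
    segments = []
    start = None
    for i, flag in enumerate(flags):
        active = flag > 0
        if active and start is None:
            start = i  # a new event begins
        is_last = i == len(flags) - 1
        if start is not None and (not active or is_last):
            end = i if active else i - 1  # last index still inside the event
            extended_start = max(0, start - warmup_length)  # clip at array start
            segments.append({
                "extended_start": extended_start,
                "extended_end": end,
                "original_start": start,
                "original_end": end,
                "duration": end - start + 1,
                "total_length": end - extended_start + 1,
            })
            start = None
    return segments


segments = find_segments_sketch([0, 0, 1, 1, 1, 0, 0, 1, 1, 0], warmup_length=1)
print(len(segments))  # 2
print(segments[0])
```

With these conventions the first segment reproduces the dictionary shown in the documented example.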
get_event_indices(df, warmup_length=0, flood_event_col='flood_event')
Get index information for flood events without extracting data.
This function identifies flood events in the DataFrame and returns their index information, but does not extract the actual data. This is useful when you only need to know the locations and durations of events.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame containing site data. | required |
| warmup_length | int | Number of time steps to include as warmup period before each event. Defaults to 0. | 0 |
| flood_event_col | str | Name of flood event indicator column. Defaults to "flood_event". | 'flood_event' |

Returns:
| Type | Description |
|---|---|
| List[Dict] | List of event index information, one dictionary per event. |

Each dictionary contains:
- start_idx (int): Start index of actual event
- end_idx (int): End index of actual event
- warmup_start_idx (int): Start index including warmup period
- duration (int): Duration of actual event in time steps
- total_length (int): Total length including warmup period

Raises:
| Type | Description |
|---|---|
| ValueError | If flood_event_col is not found in DataFrame. |
Example
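```python
>>> import pandas as pd
>>> from hydroutils.hydro_event import get_event_indices
>>> df = pd.DataFrame({'flood_event': [0, 1, 1, 1, 0]})
>>> indices = get_event_indices(df, warmup_length=1)
>>> indices[0]
{
    'start_idx': 1,
    'end_idx': 4,
    'warmup_start_idx': 0,
    'duration': 3,
    'total_length': 4
}
```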
Source code in hydroutils/hydro_event.py, lines 223-283.
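In the documented example the event occupies rows 1 to 3 while end_idx is 4, which suggests that end_idx is exclusive here, so that a slice from start_idx to end_idx covers the event. The sketch below follows that reading; it is an assumption drawn from the example, and `event_indices_sketch` is a hypothetical helper operating on a plain list rather than a DataFrame.

```python
def event_indices_sketch(flags, warmup_length=0):
    """Return index dicts with an exclusive end_idx for each run of flags > 0."""
    events = []
    i, n = 0, len(flags)
    while i < n:
        if flags[i] > 0:
            start = i
            while i < n and flags[i] > 0:
                i += 1  # i stops one past the last event row
            warmup_start = max(0, start - warmup_length)  # clip at index 0
            events.append({
                "start_idx": start,
                "end_idx": i,  # exclusive, so flags[start:i] covers the event
                "warmup_start_idx": warmup_start,
                "duration": i - start,
                "total_length": i - warmup_start,
            })
        else:
            i += 1
    return events


print(event_indices_sketch([0, 1, 1, 1, 0], warmup_length=1)[0])
# {'start_idx': 1, 'end_idx': 4, 'warmup_start_idx': 0, 'duration': 3, 'total_length': 4}
```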
time_to_ten_digits(time_obj)
Convert a time object to a ten-digit format YYYYMMDDHH.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| time_obj | Union[datetime, datetime64, str] | Time object to convert. Can be datetime, numpy.datetime64, or string. | required |

Returns:
| Type | Description |
|---|---|
| str | Ten-digit time string in YYYYMMDDHH format. |
Example
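```python
>>> import datetime
>>> import numpy as np
>>> from hydroutils.hydro_event import time_to_ten_digits
>>> time_to_ten_digits(datetime.datetime(2020, 1, 1, 12, 0))
'2020010112'
>>> time_to_ten_digits(np.datetime64('2020-01-01T12'))
'2020010112'
>>> time_to_ten_digits('2020-01-01T12:00:00')
'2020010112'
```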
Source code in hydroutils/hydro_event.py, lines 17-56.