FAQ¶
General Questions¶
What is hydrodataset?¶
hydrodataset is a Python package that provides a unified API for accessing 50+ hydrological datasets. It serves as a data-adapting layer on top of AquaFetch, standardizing diverse datasets into a consistent NetCDF format optimized for deep learning workflows.
How is hydrodataset different from AquaFetch?¶
- AquaFetch: Handles downloading and reading raw data from public hydrological datasets
- hydrodataset: Takes AquaFetch data and standardizes it into a consistent format with unified variable names, NetCDF caching, and ML-ready outputs
Think of AquaFetch as the data fetcher and hydrodataset as the data standardizer.
Which Python versions are supported?¶
hydrodataset requires Python 3.10 or higher.
Installation & Setup¶
Where should I create the hydro_setting.yml file?¶
The hydro_setting.yml file should be placed in your home directory (~/hydro_setting.yml):
- Windows: C:\Users\YourUsername\hydro_setting.yml
- Linux: /home/username/hydro_setting.yml
- macOS: /Users/username/hydro_setting.yml
- On both Linux and macOS, ~/hydro_setting.yml points to the same location
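To confirm where the file is expected on a particular machine, the home directory can be resolved programmatically. The snippet below is a minimal sketch using only the standard library; the helper name `settings_file_path` is illustrative and not part of hydrodataset.

```python
from pathlib import Path


def settings_file_path() -> Path:
    """Return the expected location of hydro_setting.yml in the home directory."""
    # Path.home() resolves the current user's home directory on
    # Windows, Linux, and macOS alike.
    return Path.home() / "hydro_setting.yml"


path = settings_file_path()
print(f"Expected settings file: {path}")
print(f"Exists: {path.exists()}")
```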
What should be in hydro_setting.yml?¶
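The file defines the local directories that hydrodataset uses, including the `datasets-origin` directory for raw downloaded data and the `cache` directory for the generated NetCDF files.
Adjust the paths according to your system and preferences.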
I'm getting an error about missing hydro_setting.yml. What should I do?¶
- Create the file in your home directory (see above)
- Ensure the paths in the file exist and are writable
- Use absolute paths or proper forward slashes on Windows
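The checks above can be sketched as a small helper. This is an illustration under stated assumptions, not part of hydrodataset: the function name `find_path_problems` is hypothetical, and the example entries are placeholders to be replaced with the paths from your own hydro_setting.yml.

```python
from pathlib import Path


def find_path_problems(paths: dict[str, str]) -> dict[str, str]:
    """Map each problematic entry name to a short description of the problem."""
    problems = {}
    for name, raw in paths.items():
        path = Path(raw).expanduser()  # expand a leading "~" to the home directory
        if not path.is_absolute():
            problems[name] = "not an absolute path"
        elif not path.is_dir():
            problems[name] = "directory does not exist"
    return problems


# Hypothetical entries; replace them with the paths from your settings file.
example = {
    "datasets-origin": "~/hydro_data/datasets-origin",
    "cache": "relative/cache",
}
for name, problem in find_path_problems(example).items():
    print(f"{name}: {problem}")
```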
Data Access¶
How do I know which datasets are available?¶
Check the Supported Datasets section in the README or browse the API documentation.
What are standardized variable names?¶
Standardized variable names allow you to request the same type of data across different datasets using a common name:
- streamflow - works for CAMELS-US, CAMELS-AUS, etc.
- precipitation - consistent across all datasets
- temperature_max / temperature_min - temperature extremes
This eliminates the need to learn each dataset's specific naming conventions.
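The idea can be illustrated with a small lookup table. The sketch below is not hydrodataset's implementation: the dataset labels, the raw column names, and the `RAW_TO_STANDARD` table are all hypothetical, and only show how one standardized name can stand in for several dataset-specific ones.

```python
# Hypothetical raw-to-standard name tables for two datasets (illustrative only).
RAW_TO_STANDARD = {
    "dataset_a": {"q_obs": "streamflow", "prcp": "precipitation", "tmax": "temperature_max"},
    "dataset_b": {"discharge": "streamflow", "rain": "precipitation", "t_max": "temperature_max"},
}


def raw_name_for(dataset: str, standard_name: str) -> str:
    """Look up the dataset-specific name behind a standardized variable name."""
    for raw, standard in RAW_TO_STANDARD[dataset].items():
        if standard == standard_name:
            return raw
    raise KeyError(f"{standard_name!r} is not available for {dataset!r}")


# The same request works for both datasets, whatever each one calls the variable.
for dataset in RAW_TO_STANDARD:
    print(dataset, "->", raw_name_for(dataset, "streamflow"))
```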
How do I see what variables are available for a dataset?¶
Caching & Performance¶
Where are the NetCDF cache files stored?¶
Cache files are stored in the directory given by the `cache` path in your hydro_setting.yml.
The first data access is slow. Is this normal?¶
Yes! The first access:
1. Fetches raw data via AquaFetch
2. Standardizes variable names and units
3. Saves the result to NetCDF cache files
Subsequent reads are much faster because they load directly from the .nc cache.
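The read-through caching pattern described here can be sketched as follows. This is a simplified illustration, not hydrodataset's code: JSON stands in for NetCDF so the example needs only the standard library, and `load_with_cache` and its `build` argument are hypothetical names for the slow fetch-and-standardize step.

```python
import json
from pathlib import Path


def load_with_cache(cache_file: Path, build):
    """Return cached data if cache_file exists; otherwise build it and write the cache."""
    if cache_file.exists():
        # Fast path: read the previously saved result.
        return json.loads(cache_file.read_text())
    # Slow path: fetch and standardize (represented by build), then save.
    data = build()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))
    return data
```

Only the first call for a given cache file pays the cost of `build`; later calls skip it entirely, which is why the first access is noticeably slower.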
How do I regenerate the cache?¶
Delete the corresponding .nc files from your cache directory.
The next access regenerates the cache.
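Deleting the files can also be scripted. The helper below is a minimal sketch using only the standard library; `remove_cache_files` is a hypothetical name, and the default `*.nc` pattern assumes nothing about how hydrodataset names its cache files beyond the .nc extension.

```python
from pathlib import Path


def remove_cache_files(cache_dir: str, pattern: str = "*.nc") -> list[str]:
    """Delete files in cache_dir that match pattern and return their names."""
    removed = []
    for nc_file in sorted(Path(cache_dir).expanduser().glob(pattern)):
        nc_file.unlink()
        removed.append(nc_file.name)
    return removed
```

Pass the cache directory from your hydro_setting.yml as `cache_dir`. To regenerate the cache for a single dataset only, narrow `pattern` after checking the actual file names in that directory.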
Usage & Examples¶
How do I read data for specific basins?¶
How do I specify a time range?¶
Can I use this with deep learning frameworks?¶
Yes! The data is returned as xarray.Dataset objects, which can be easily converted to NumPy arrays or PyTorch tensors.
For integration with deep learning workflows, check out torchhydro.
Troubleshooting¶
I'm getting import errors. What should I check?¶
- Ensure hydrodataset is installed: `pip install hydrodataset`
- Check your Python version: `python --version` (must be 3.10+)
- Try reinstalling: `pip install --upgrade hydrodataset`
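The first two checks can also be run from inside Python, which helps when several environments are installed side by side. This is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, not a hydrodataset function.

```python
import sys
from importlib import metadata


def check_environment(package: str = "hydrodataset", min_python=(3, 10)) -> list[str]:
    """Return a list of problems found in the running Python environment."""
    problems = []
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required}+ is required, found {sys.version.split()[0]}")
    try:
        # Looks up the installed distribution without importing it.
        metadata.version(package)
    except metadata.PackageNotFoundError:
        problems.append(f"{package} is not installed in this environment")
    return problems


for problem in check_environment():
    print(problem)
```

Run it with the same interpreter that raises the import error, since `pip` and `python` may point at different environments.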
Data is not being cached. What's wrong?¶
- Check that the `cache` path in `hydro_setting.yml` exists
- Verify write permissions for the cache directory
- Check disk space availability
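These three checks can be combined in one helper. The sketch below uses only the standard library; `cache_dir_status` is a hypothetical name, and the 1 GB default threshold is an arbitrary placeholder rather than a hydrodataset requirement.

```python
import os
import shutil
from pathlib import Path


def cache_dir_status(cache_dir: str, min_free_gb: float = 1.0) -> list[str]:
    """Return problems that would prevent writing cache files to cache_dir."""
    path = Path(cache_dir).expanduser()
    if not path.is_dir():
        return [f"{path} does not exist"]
    problems = []
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable")
    # Free space on the disk that holds the cache directory, in GB.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free on the disk holding {path}")
    return problems
```

An empty list means the directory exists, is writable, and has at least `min_free_gb` of free space.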
I'm getting "FileNotFoundError" when reading data. Help!¶
- Ensure raw data is downloaded to the `datasets-origin` directory
- Some datasets require manual download; check the AquaFetch documentation
- Verify that the paths in `hydro_setting.yml` are correct
Where can I get help?¶
- 📖 Read the Documentation
- 🐛 Check GitHub Issues
- 💬 Open a new issue with your question