Skip to content

Package and testing updates and fixes#435

Open
jmcvey3 wants to merge 11 commits intoMHKiT-Software:developfrom
jmcvey3:package-cleanup
Open

Package and testing updates and fixes#435
jmcvey3 wants to merge 11 commits intoMHKiT-Software:developfrom
jmcvey3:package-cleanup

Conversation

@jmcvey3
Copy link
Contributor

@jmcvey3 jmcvey3 commented Feb 19, 2026

A general PR to cover tests that have been breaking recently to python and dependency updates. Also contains cleanup of the package in general

  1. Makes the caching code in utils/cache.py less opaque. To properly test cache code, one needs to delete the .cache/mhkit folder, wherever that is stored on your local
  2. Cleans up a new cache bug in wave/io/cdip.py
  3. Packaging updates:
    • Part of our CI troubles may be that dependencies are double-installed via the conda environment.yaml file and the pyproject.toml file. Best practice is to use the environment file for conda-specific packages and keep dependencies in the .toml file. Dependencies will come from conda if the mhkit package is installed via conda instead of pip.
    • One possible issue may come from the hdf5-related packages. I'm aware there can an issue with hdf5 C library filepaths if they are installed partly via conda and partly via pip.
    • Updates github actions CI versions

@jmcvey3
Copy link
Contributor Author

jmcvey3 commented Feb 19, 2026

And the wind-hindcast-cache tests have a dtype error. Something was changed and now the data returns as float64 instead of float32

@jmcvey3 jmcvey3 requested review from akeeste and simmsa February 19, 2026 17:33
@simmsa
Copy link
Contributor

simmsa commented Feb 21, 2026

@jmcvey3 thank you for breaking this out into a separate PR.

There is a non obvious "test" here where we are verifying that the conda-forge install works. The original environment.yml is the same as this: https://github.com/conda-forge/mhkit-feedstock/blob/main/recipe/meta.yaml which is what conda-forge uses to build the conda version of mhkit.

Also important is that conda-forge uses this line:

{{ PYTHON }} -m pip install .[all] -vv --no-deps --no-build-isolation

to build the conda forge package.

TBH this is a bit of python packing magic that I don't fully understand. In lieu of understanding, the tests build mhkit using a similar workflow as conda forge and basically "verify" that conda-forge will work, or not work. I think we should maintain this pattern and restore the environment.yml to be the same as the conda forge version.

Having to define the same versions in two different files, pyproject.toml and environment.yml is really annoying, and maintaining a 3rd file in another repo even more annoying. I haven't seen any solutions for this in python, but maybe something better exists.

I think it is possible to autogen environment.yml from pyproject.toml to keep versions the same, so maybe that helps here. Or some other tool altogether.

Or maybe we need some other packaging solution that handles H5/HDF5 binaries better.

I'd be happy to try to pick this up, and try to get the tests working. Let us know what makes sense here.

@jmcvey3
Copy link
Contributor Author

jmcvey3 commented Feb 21, 2026

@simmsa Thank you for the background here, this is starting to ring a bell

So to be clear, the only reason the extra dependencies are in environment.yml is to test and make sure the conda-forge meta.yaml will function properly?

If so, should we move the original environment.yml file into something like "mhkit/tests/conda-forge/enviroment.yml" and leave a barebones environment.yml file in the main repo directory? That way we can refer to the former in the "conda-ubunut-latest" tests and the latter in the "pip-ubuntu-latest" tests and separate them properly?

@@ -54,8 +56,6 @@ river = [
]

dolfyn = [
Copy link
Contributor Author

@jmcvey3 jmcvey3 Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also saw that the h5py libraries are set here, but dolfyn doesn't actually require any of the h5-related libraries. These might belong under the wave module dependencies

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmcvey3 every netcdf file I have worked with is a h5 file: https://docs.unidata.ucar.edu/netcdf-c/4.9.3/interoperability_hdf5.html

Historically for mhkit it has been beneficial to control the Hdf5 library versions because it is typical for different libraries to use different versions, see #351, Issue #303, Issue #177.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simmsa I'm just commenting that they were listed under the dolfyn module and don't need to be. I learned when refactoring dolfyn that netCDF4 could incorporate h5 capabilities if needed, but it doesn't have to. The python h5 packages are bulky, so I wrote dolfyn to write netCDF4 files to only need the python netCDF4 dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support moving them to wave dependencies then. Part of the original motivation of the optional install set-up was to allow for small installs and keep the h5 packages in Dolfyn will force the module to be larger than necessary

@simmsa
Copy link
Contributor

simmsa commented Feb 23, 2026

@simmsa Thank you for the background here, this is starting to ring a bell

So to be clear, the only reason the extra dependencies are in environment.yml is to test and make sure the conda-forge meta.yaml will function properly?

If so, should we move the original environment.yml file into something like "mhkit/tests/conda-forge/enviroment.yml" and leave a barebones environment.yml file in the main repo directory? That way we can refer to the former in the "conda-ubunut-latest" tests and the latter in the "pip-ubuntu-latest" tests and separate them properly?

Def not the only reason. environment.yml in the root allows developers to use conda to install mhkit packages. It seems valuable to support multiple installation methods for developers, which is why we define the dependencies twice.

@jmcvey3
Copy link
Contributor Author

jmcvey3 commented Feb 23, 2026

@simmsa Thank you for the background here, this is starting to ring a bell
So to be clear, the only reason the extra dependencies are in environment.yml is to test and make sure the conda-forge meta.yaml will function properly?
If so, should we move the original environment.yml file into something like "mhkit/tests/conda-forge/enviroment.yml" and leave a barebones environment.yml file in the main repo directory? That way we can refer to the former in the "conda-ubunut-latest" tests and the latter in the "pip-ubuntu-latest" tests and separate them properly?

Def not the only reason. environment.yml in the root allows developers to use conda to install mhkit packages. It seems valuable to support multiple installation methods for developers, which is why we define the dependencies twice.

I don't understand - you can still use conda with a smaller environment.yml file. If you're downloading from conda-forge, then you'll get the packages from conda-forge. There's nothing different between packages downloaded from conda-forge and those downloaded from pip, assuming the versions are the same.

@simmsa
Copy link
Contributor

simmsa commented Feb 23, 2026

@jmcvey3 one other thing that you are may be fighting is the optional dependencies settings:

https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#dependencies-optional-dependencies

We set those to allow pip install "mhkit[]" which allows module level installs, but I don't think any of us realized the implications of module level installs and the complications.

I pushed a couple of fixes: #421, #422, #423 and #424 when we figured out that conda install -c conda-forge mhkit effectively installed nothing, but there was some nuance here that may be causing some of the problems now:

MHKiT-Python/pyproject.toml

Lines 129 to 142 in 6f806c8

[tool.setuptools.packages.find]
# This section controls how this python package is built for pip installations
# Getting this section wrong results in an incomplete package that is missing submodules
# This can be tested by simply running python -m build in the root directory
# and then inspecting the generated .tar.gz and .whl files in the dist/ directory
#
# https://setuptools.pypa.io/en/latest/userguide/package_discovery.html#finding-simple-packages
# What this does: (per the above docs):
# - `where = ["."]`: Search for packages starting from the root directory
# - `include = ["mhkit*"]`: Only include packages that match the pattern `mhkit*`
# - this includes `mhkit`, `mhkit.wave`, `mhkit.river`, etc.
# - Setuptools will automatically discover all directories containing `__init__.py` files
where = ["."]
include = ["mhkit*"]

https://setuptools.pypa.io/en/latest/userguide/package_discovery.html#finding-namespace-packages

Do you think the above failures have anything to do with the module changes?

@jmcvey3
Copy link
Contributor Author

jmcvey3 commented Feb 23, 2026

@simmsa I don't think I'm fighting the optional dependency installs; those seem to be working fine. I do remember having an issue with the package initialization process running into errors trying to import optional packages, but I think that was resolved.

Copy link
Contributor

@akeeste akeeste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple clarifying questions on this PR:

  • why the --no-deps change in the Actions workflows? What problem was this flag causing?
  • numexpr and pyarrow are added to the pyproject.toml but I don't see them imported anywhere. Where are these two packages used?

And to confirm my understanding of the environment.yml files:
So now the all-inclusive tests/conda-forge/environment.yml serves the purpose of testing a pure-conda installation of MHKiT (plus the local GitHub version of MHKiT) used on a given PR. While mhkit/environment.yml is used in the pip installation of MHKiT and only installs python, pip, and hdf5 with conda before setting up the rest of the environment with pip

Lastly, my test_cdip.py is not working locally. I need to evaluate this further to see what is going on.

@@ -54,8 +56,6 @@ river = [
]

dolfyn = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support moving them to wave dependencies then. Part of the original motivation of the optional install set-up was to allow for small installs and keep the h5 packages in Dolfyn will force the module to be larger than necessary

@akeeste akeeste mentioned this pull request Feb 24, 2026
@jmcvey3
Copy link
Contributor Author

jmcvey3 commented Feb 24, 2026

A couple clarifying questions on this PR:

  • why the --no-deps change in the Actions workflows? What problem was this flag causing?
  • numexpr and pyarrow are added to the pyproject.toml but I don't see them imported anywhere. Where are these two packages used?

And to confirm my understanding of the environment.yml files: So now the all-inclusive tests/conda-forge/environment.yml serves the purpose of testing a pure-conda installation of MHKiT (plus the local GitHub version of MHKiT) used on a given PR. While mhkit/environment.yml is used in the pip installation of MHKiT and only installs python, pip, and hdf5 with conda before setting up the rest of the environment with pip

Lastly, my test_cdip.py is not working locally. I need to evaluate this further to see what is going on.

For the "--no-deps" changes, the "pip-ubuntu-latest" tests were installing all dependencies via the conda environment file, rather than letting pip do the package dependency handling. This basically means conda installed all the packages, and we aren't testing if the versions that pip wants to install also function. Sometimes conda-approved versions are behind those on pip.

Good point, we should check if we still need to set "numexpr" and "pyarrow". They're dependencies for other packages, and I think we have them set because of some problematic package with a bad version release.

Yes, your understanding is correct. We can probably also test if we need hdf5 installed via conda if there is a user out there that wants to use venv or some such.

Yeah the "test_cdip.py" issues were what launched this whole escapade. Make sure you clear your mhkit cache on your local first. The first time you run the test, it writes the cache, and every time after you run the test, it reads the cache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants