refactor: reorganize directory structure#946
refactor: reorganize directory structure#946newtontech wants to merge 2 commits intodeepmodeling:masterfrom
Conversation
Move all format directories (abacus, amber, cp2k, deepmd, dftbplus, fhi_aims, gaussian, gromacs, lammps, lmdb, md, openmx, orca, psi4, pwmat, pymatgen, qe, rdkit, siesta, vasp, xyz) into a new formats/ subdirectory. This addresses issue deepmodeling#934. Changes: - Created dpdata/formats/ directory - Moved all format directories to dpdata/formats/ - Updated all import statements throughout the codebase - Updated relative imports in format modules (from .. to from ...) - Updated dpdata/__init__.py to import from new locations - Updated tests/context.py for new import paths The plugins directory remains at the root level as requested.
for more information, see https://pre-commit.ci
📝 WalkthroughWalkthroughThe PR reorganizes dpdata's module structure by moving numerous format-specific modules (abacus, amber, cp2k, gaussian, gromacs, lammps, vasp, and others) into a centralized Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
dpdata/plugins/openmx.py (1)
64-64: Prefix unused variablecswith underscore for consistency.The variable
csis unpacked but never used. For consistency with line 38 (which uses_for the same purpose), consider using_here as well.♻️ Proposed fix
- data, cs = dpdata.formats.openmx.omx.to_system_data(fname, mdname) + data, _ = dpdata.formats.openmx.omx.to_system_data(fname, mdname)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dpdata/plugins/openmx.py` at line 64, The unpacked variable `cs` from the call to dpdata.formats.openmx.omx.to_system_data is unused; change the unpack to use an underscore placeholder (e.g., data, _ = dpdata.formats.openmx.omx.to_system_data(...)) to match the pattern used elsewhere (see the other unpack using `_`) so the unused value is clearly ignored and linter warnings are avoided.dpdata/bond_order_system.py (1)
13-13: Optional cleanup: remove stale commented rdkit import/usage lines.These comment-only remnants add noise and can be dropped in a follow-up for readability.
Also applies to: 164-164
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dpdata/bond_order_system.py` at line 13, Remove the stale commented rdkit import and any related commented-out rdkit usage lines (specifically the commented "import dpdata.formats.rdkit.mol2" and the other commented rdkit remnants around the later block) to clean up noise; locate these comments in dpdata/bond_order_system.py by searching for the literal "dpdata.formats.rdkit.mol2" and for any commented references to "rdkit" or "mol2" and delete those comment-only lines, leaving active imports and code untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dpdata/formats/__init__.py`:
- Line 1: The repo has widespread ruff lint violations under the dpdata package;
run ruff check --fix dpdata/ to automatically apply fixes and then manually
address remaining issues: replace bare "raise" usage with "raise from" or
preserve exception context for TRY003, mark unused method args in class methods
(e.g., in any dpdata.* classes) with a leading underscore or remove them to fix
ARG002, remove or use unused variables to fix F841 and B007 (or prefix loop
control variables with _), add explicit stacklevel arguments to warnings.warn
calls to resolve B028, and eliminate unused imports/variables across the
package; iterate until ruff reports no errors and commit the cleaned dpdata
package (starting point: dpdata.__init__ and other modules under dpdata/).
In `@dpdata/formats/amber/md.py`:
- Line 8: Replace the list construction ["X"] + ELEMENTS with the unpacking form
["X", *ELEMENTS] where the atom/elements list is defined; find the constant that
uses ELEMENTS and update it. For the two exception occurrences, move the literal
messages into the exception constructors instead of passing them outside:
replace patterns that raise exceptions with external messages to raise
RuntimeError("Unsupported cells") and raise ValueError("Please provide one of
mden_file and mdout_file") (or the original exception types used) so the strings
"Unsupported cells" and "Please provide one of mden_file and mdout_file" become
the exception's message. Ensure you update the raise sites in
dpdata.formats.amber.md (the blocks that handle cell parsing and the mden/mdout
argument validation) and run ruff check/format afterwards.
---
Nitpick comments:
In `@dpdata/bond_order_system.py`:
- Line 13: Remove the stale commented rdkit import and any related commented-out
rdkit usage lines (specifically the commented "import dpdata.formats.rdkit.mol2"
and the other commented rdkit remnants around the later block) to clean up
noise; locate these comments in dpdata/bond_order_system.py by searching for the
literal "dpdata.formats.rdkit.mol2" and for any commented references to "rdkit"
or "mol2" and delete those comment-only lines, leaving active imports and code
untouched.
In `@dpdata/plugins/openmx.py`:
- Line 64: The unpacked variable `cs` from the call to
dpdata.formats.openmx.omx.to_system_data is unused; change the unpack to use an
underscore placeholder (e.g., data, _ =
dpdata.formats.openmx.omx.to_system_data(...)) to match the pattern used
elsewhere (see the other unpack using `_`) so the unused value is clearly
ignored and linter warnings are avoided.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (96)
dpdata/__init__.pydpdata/bond_order_system.pydpdata/formats/__init__.pydpdata/formats/abacus/__init__.pydpdata/formats/abacus/md.pydpdata/formats/abacus/relax.pydpdata/formats/abacus/scf.pydpdata/formats/abacus/stru.pydpdata/formats/amber/__init__.pydpdata/formats/amber/mask.pydpdata/formats/amber/md.pydpdata/formats/amber/sqm.pydpdata/formats/cp2k/__init__.pydpdata/formats/cp2k/cell.pydpdata/formats/cp2k/output.pydpdata/formats/deepmd/__init__.pydpdata/formats/deepmd/comp.pydpdata/formats/deepmd/hdf5.pydpdata/formats/deepmd/mixed.pydpdata/formats/deepmd/raw.pydpdata/formats/dftbplus/__init__.pydpdata/formats/dftbplus/output.pydpdata/formats/fhi_aims/__init__.pydpdata/formats/fhi_aims/output.pydpdata/formats/gaussian/__init__.pydpdata/formats/gaussian/fchk.pydpdata/formats/gaussian/gjf.pydpdata/formats/gaussian/log.pydpdata/formats/gromacs/__init__.pydpdata/formats/gromacs/gro.pydpdata/formats/lammps/__init__.pydpdata/formats/lammps/dump.pydpdata/formats/lammps/lmp.pydpdata/formats/lmdb/__init__.pydpdata/formats/lmdb/format.pydpdata/formats/md/__init__.pydpdata/formats/md/msd.pydpdata/formats/md/pbc.pydpdata/formats/md/rdf.pydpdata/formats/md/water.pydpdata/formats/openmx/__init__.pydpdata/formats/openmx/omx.pydpdata/formats/orca/__init__.pydpdata/formats/orca/output.pydpdata/formats/psi4/__init__.pydpdata/formats/psi4/input.pydpdata/formats/psi4/output.pydpdata/formats/pwmat/__init__.pydpdata/formats/pwmat/atomconfig.pydpdata/formats/pwmat/movement.pydpdata/formats/pymatgen/__init__.pydpdata/formats/pymatgen/molecule.pydpdata/formats/pymatgen/structure.pydpdata/formats/qe/__init__.pydpdata/formats/qe/scf.pydpdata/formats/qe/traj.pydpdata/formats/rdkit/__init__.pydpdata/formats/rdkit/sanitize.pydpdata/formats/rdkit/utils.pydpdata/formats/siesta/__init__.pydpdata/formats/siesta/aiMD_output.pydpdata/formats/siesta/output.pydpdata/formats/vasp/__init__.pydpdata/formats/vasp/outcar.pydpdata/formats/vasp/poscar.pydpdata/formats/vasp/xml.pydpdata/formats/xyz/__init__.pydpdata/formats/xyz/quip_gap_xyz.pydpdata/formats/xyz/xyz.pydpdata/plugins/3dmol.pydpdata/plugins/abacus.pydpdata/plugins/amber.pydpdata/plugins/cp2k.pydpdata/plugins/deepmd.pydpdata/plugins/dftbplus.pydpdata/plugins/fhi_aims.pydpdata/plugins/gaussian.pydpdata/plugins/gromacs.pydpdata/plugins/lammps.pydpdata/plugins/lmdb.pydpdata/plugins/openmx.pydpdata/plugins/orca.pydpdata/plugins/psi4.pydpdata/plugins/pwmat.pydpdata/plugins/pymatgen.pydpdata/plugins/qe.pydpdata/plugins/rdkit.pydpdata/plugins/siesta.pydpdata/plugins/vasp.pydpdata/plugins/xyz.pydpdata/system.pytests/context.pytests/test_abacus_stru_dump.pytests/test_lammps_lmp_dump.pytests/test_lammps_spin.pytests/test_lmdb.py
| @@ -0,0 +1 @@ | |||
| # Format modules for dpdata | |||
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check current directory structure and file content
pwd
echo "=== File content ==="
cat -n dpdata/formats/__init__.py 2>/dev/null || echo "File not found"
echo -e "\n=== Checking if ruff is available ==="
which ruff || echo "ruff not found, will attempt to install"
echo -e "\n=== Installing ruff if needed ==="
pip install ruff -q
echo -e "\n=== Running ruff checks ==="
ruff check dpdata/ --statistics
echo -e "\n=== Running ruff format check (dry-run) ==="
ruff format --check dpdata/Repository: deepmodeling/dpdata
Length of output: 2279
Package initializer structure is correct; however, ruff linting shows 386 errors across dpdata/ that must be resolved before merge.
The file itself is fine, but the broader directory has significant lint violations:
- 117 TRY003 (raise-vanilla-args)
- 95 ARG002 (unused-method-argument)
- 24 B007 (unused-loop-control-variable)
- 22 B028 (no-explicit-stacklevel)
- 20 F841 (unused-variable)
- And 131 more violations across multiple rules
Formatting is compliant (107 files already formatted). Address the lint errors with ruff check --fix dpdata/ before merge to comply with the coding guidelines for dpdata/**/*.py.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dpdata/formats/__init__.py` at line 1, The repo has widespread ruff lint
violations under the dpdata package; run ruff check --fix dpdata/ to
automatically apply fixes and then manually address remaining issues: replace
bare "raise" usage with "raise from" or preserve exception context for TRY003,
mark unused method args in class methods (e.g., in any dpdata.* classes) with a
leading underscore or remove them to fix ARG002, remove or use unused variables
to fix F841 and B007 (or prefix loop control variables with _), add explicit
stacklevel arguments to warnings.warn calls to resolve B028, and eliminate
unused imports/variables across the package; iterate until ruff reports no
errors and commit the cleaned dpdata package (starting point: dpdata.__init__
and other modules under dpdata/).
| import numpy as np | ||
|
|
||
| from dpdata.amber.mask import pick_by_amber_mask | ||
| from dpdata.formats.amber.mask import pick_by_amber_mask |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n dpdata/formats/amber/md.py | head -20Repository: deepmodeling/dpdata
Length of output: 598
🏁 Script executed:
ruff check dpdata/formats/amber/md.pyRepository: deepmodeling/dpdata
Length of output: 1706
Fix ruff linting violations before merge.
The file has 3 ruff violations:
- Line 15 (RUF005): Replace
["X"] + ELEMENTSwith["X", *ELEMENTS] - Line 96 (TRY003): Avoid specifying long exception messages outside the exception class; move "Unsupported cells" into the exception
- Line 118 (TRY003): Avoid specifying long exception messages outside the exception class; move "Please provide one of mden_file and mdout_file" into the exception
Run ruff check dpdata/ to verify compliance and ruff format dpdata/ to auto-fix where possible.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dpdata/formats/amber/md.py` at line 8, Replace the list construction ["X"] +
ELEMENTS with the unpacking form ["X", *ELEMENTS] where the atom/elements list
is defined; find the constant that uses ELEMENTS and update it. For the two
exception occurrences, move the literal messages into the exception constructors
instead of passing them outside: replace patterns that raise exceptions with
external messages to raise RuntimeError("Unsupported cells") and raise
ValueError("Please provide one of mden_file and mdout_file") (or the original
exception types used) so the strings "Unsupported cells" and "Please provide one
of mden_file and mdout_file" become the exception's message. Ensure you update
the raise sites in dpdata.formats.amber.md (the blocks that handle cell parsing
and the mden/mdout argument validation) and run ruff check/format afterwards.
Merging this PR will not alter performance
|
Move all format directories (abacus, amber, cp2k, deepmd, dftbplus, fhi_aims, gaussian, gromacs, lammps, lmdb, md, openmx, orca, psi4, pwmat, pymatgen, qe, rdkit, siesta, vasp, xyz) into a new formats/ subdirectory.
This addresses issue #934.
Changes:
The plugins directory remains at the root level as requested.
Summary by CodeRabbit
Refactor