Skip to content

refactor: reorganize directory structure#946

Open
newtontech wants to merge 2 commits intodeepmodeling:masterfrom
newtontech:fix/issue-934
Open

refactor: reorganize directory structure#946
newtontech wants to merge 2 commits intodeepmodeling:masterfrom
newtontech:fix/issue-934

Conversation

@newtontech
Copy link

@newtontech newtontech commented Feb 28, 2026

Move all format directories (abacus, amber, cp2k, deepmd, dftbplus, fhi_aims, gaussian, gromacs, lammps, lmdb, md, openmx, orca, psi4, pwmat, pymatgen, qe, rdkit, siesta, vasp, xyz) into a new formats/ subdirectory.

This addresses issue #934.

Changes:

  • Created dpdata/formats/ directory
  • Moved all format directories to dpdata/formats/
  • Updated all import statements throughout the codebase
  • Updated relative imports in format modules (from .. to from ...)
  • Updated dpdata/init.py to import from new locations
  • Updated tests/context.py for new import paths

The plugins directory remains at the root level as requested.

Summary by CodeRabbit

Refactor

  • Reorganized internal module structure across the package hierarchy. Modules have been relocated and consolidated for improved organization. All public-facing APIs, standard import paths, and user-accessible functionality remain unchanged. Complete backward compatibility is maintained. Tests and internal references were updated to reflect the new module locations.

Move all format directories (abacus, amber, cp2k, deepmd, dftbplus,
fhi_aims, gaussian, gromacs, lammps, lmdb, md, openmx, orca, psi4,
pwmat, pymatgen, qe, rdkit, siesta, vasp, xyz) into a new formats/
subdirectory.

This addresses issue deepmodeling#934.

Changes:
- Created dpdata/formats/ directory
- Moved all format directories to dpdata/formats/
- Updated all import statements throughout the codebase
- Updated relative imports in format modules (from .. to from ...)
- Updated dpdata/__init__.py to import from new locations
- Updated tests/context.py for new import paths

The plugins directory remains at the root level as requested.
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 28, 2026
@dosubot dosubot bot added the dpdata label Feb 28, 2026
@coderabbitai
Copy link

coderabbitai bot commented Feb 28, 2026

📝 Walkthrough

Walkthrough

The PR reorganizes dpdata's module structure by moving numerous format-specific modules (abacus, amber, cp2k, gaussian, gromacs, lammps, vasp, and others) into a centralized dpdata.formats subpackage. All corresponding import statements across plugins, core modules, and tests are updated to reference these new locations while preserving identical public APIs.

Changes

Cohort / File(s) Summary
Root package initialization
dpdata/__init__.py
Updated public submodule imports (lammps, md, vasp) to reference dpdata.formats instead of root package.
Bond order and RDKit utilities
dpdata/bond_order_system.py
Updated all RDKit utility references from dpdata.rdkit.* to dpdata.formats.rdkit.* across initialization, mol conversion, and molecule equivalence checks.
Formats package structure
dpdata/formats/__init__.py
Added new package initializer for formats subpackage.
Format modules - Import path adjustments
dpdata/formats/abacus/scf.py, dpdata/formats/abacus/stru.py, dpdata/formats/amber/md.py, dpdata/formats/cp2k/output.py, dpdata/formats/gaussian/fchk.py, dpdata/formats/gaussian/log.py, dpdata/formats/gromacs/gro.py, dpdata/formats/openmx/omx.py, dpdata/formats/pwmat/atomconfig.py, dpdata/formats/pwmat/movement.py, dpdata/formats/qe/traj.py
Updated relative import paths from parent-level (e.g., ..unit) to grandparent-level (e.g., ...unit) to accommodate new package hierarchy under dpdata.formats.
Plugin integrations
dpdata/plugins/3dmol.py, dpdata/plugins/abacus.py, dpdata/plugins/amber.py, dpdata/plugins/cp2k.py, dpdata/plugins/deepmd.py, dpdata/plugins/dftbplus.py, dpdata/plugins/fhi_aims.py, dpdata/plugins/gaussian.py, dpdata/plugins/gromacs.py, dpdata/plugins/lammps.py, dpdata/plugins/lmdb.py, dpdata/plugins/openmx.py, dpdata/plugins/orca.py, dpdata/plugins/psi4.py, dpdata/plugins/pwmat.py, dpdata/plugins/pymatgen.py, dpdata/plugins/qe.py, dpdata/plugins/rdkit.py, dpdata/plugins/siesta.py, dpdata/plugins/vasp.py, dpdata/plugins/xyz.py
Updated all module imports and function call references from original dpdata.* namespaces to new dpdata.formats.* paths across all format reader/writer implementations.
Core system modules
dpdata/system.py, dpdata/bond_order_system.py
Updated internal imports from dpdata.md.pbc and dpdata.amber.mask to their new locations under dpdata.formats.
Test files
tests/context.py, tests/test_abacus_stru_dump.py, tests/test_lammps_lmp_dump.py, tests/test_lammps_spin.py, tests/test_lmdb.py
Updated test imports to reference format modules from new dpdata.formats package hierarchy.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

  • add spin for lammps #738: Related through shared LAMMPS module refactoring—this PR moves LAMMPS modules into dpdata.formats (affecting dpdata.formats.lammps.dump and related functions) which directly impacts the LAMMPS spin functions and test imports modified in PR #738.

  • Refactor the reading and writing of abacus/stru format #793: Related through ABACUS module reorganization—this PR updates all imports to use refactored ABACUS functions (get_frame_from_stru, make_unlabeled_stru) now located under dpdata.formats.abacus, matching the refactoring introduced in PR #793.

  • docs: fix documentation build #655: Related through pymatgen integration—this PR updates call sites to use dpdata.formats.pymatgen.structure.from_system_data, which directly targets the pymatgen structure conversion functionality added in PR #655.

Suggested reviewers

  • njzjz
  • wanghan-iapcm
  • iProzd
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 47.46% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: reorganizing the directory structure by moving format directories into a new dpdata/formats/ subdirectory, which aligns with the comprehensive refactoring shown across all 40+ modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
dpdata/plugins/openmx.py (1)

64-64: Prefix unused variable cs with underscore for consistency.

The variable cs is unpacked but never used. For consistency with line 38 (which uses _ for the same purpose), consider using _ here as well.

♻️ Proposed fix
-        data, cs = dpdata.formats.openmx.omx.to_system_data(fname, mdname)
+        data, _ = dpdata.formats.openmx.omx.to_system_data(fname, mdname)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dpdata/plugins/openmx.py` at line 64, The unpacked variable `cs` from the
call to dpdata.formats.openmx.omx.to_system_data is unused; change the unpack to
use an underscore placeholder (e.g., data, _ =
dpdata.formats.openmx.omx.to_system_data(...)) to match the pattern used
elsewhere (see the other unpack using `_`) so the unused value is clearly
ignored and linter warnings are avoided.
dpdata/bond_order_system.py (1)

13-13: Optional cleanup: remove stale commented rdkit import/usage lines.

These comment-only remnants add noise and can be dropped in a follow-up for readability.

Also applies to: 164-164

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dpdata/bond_order_system.py` at line 13, Remove the stale commented rdkit
import and any related commented-out rdkit usage lines (specifically the
commented "import dpdata.formats.rdkit.mol2" and the other commented rdkit
remnants around the later block) to clean up noise; locate these comments in
dpdata/bond_order_system.py by searching for the literal
"dpdata.formats.rdkit.mol2" and for any commented references to "rdkit" or
"mol2" and delete those comment-only lines, leaving active imports and code
untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dpdata/formats/__init__.py`:
- Line 1: The repo has widespread ruff lint violations under the dpdata package;
run ruff check --fix dpdata/ to automatically apply fixes and then manually
address remaining issues: replace bare "raise" usage with "raise from" or
preserve exception context for TRY003, mark unused method args in class methods
(e.g., in any dpdata.* classes) with a leading underscore or remove them to fix
ARG002, remove or use unused variables to fix F841 and B007 (or prefix loop
control variables with _), add explicit stacklevel arguments to warnings.warn
calls to resolve B028, and eliminate unused imports/variables across the
package; iterate until ruff reports no errors and commit the cleaned dpdata
package (starting point: dpdata.__init__ and other modules under dpdata/).

In `@dpdata/formats/amber/md.py`:
- Line 8: Replace the list construction ["X"] + ELEMENTS with the unpacking form
["X", *ELEMENTS] where the atom/elements list is defined; find the constant that
uses ELEMENTS and update it. For the two exception occurrences, move the literal
messages into the exception constructors instead of passing them outside:
replace patterns that raise exceptions with external messages to raise
RuntimeError("Unsupported cells") and raise ValueError("Please provide one of
mden_file and mdout_file") (or the original exception types used) so the strings
"Unsupported cells" and "Please provide one of mden_file and mdout_file" become
the exception's message. Ensure you update the raise sites in
dpdata.formats.amber.md (the blocks that handle cell parsing and the mden/mdout
argument validation) and run ruff check/format afterwards.

---

Nitpick comments:
In `@dpdata/bond_order_system.py`:
- Line 13: Remove the stale commented rdkit import and any related commented-out
rdkit usage lines (specifically the commented "import dpdata.formats.rdkit.mol2"
and the other commented rdkit remnants around the later block) to clean up
noise; locate these comments in dpdata/bond_order_system.py by searching for the
literal "dpdata.formats.rdkit.mol2" and for any commented references to "rdkit"
or "mol2" and delete those comment-only lines, leaving active imports and code
untouched.

In `@dpdata/plugins/openmx.py`:
- Line 64: The unpacked variable `cs` from the call to
dpdata.formats.openmx.omx.to_system_data is unused; change the unpack to use an
underscore placeholder (e.g., data, _ =
dpdata.formats.openmx.omx.to_system_data(...)) to match the pattern used
elsewhere (see the other unpack using `_`) so the unused value is clearly
ignored and linter warnings are avoided.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae134fe and 31f6ab8.

📒 Files selected for processing (96)
  • dpdata/__init__.py
  • dpdata/bond_order_system.py
  • dpdata/formats/__init__.py
  • dpdata/formats/abacus/__init__.py
  • dpdata/formats/abacus/md.py
  • dpdata/formats/abacus/relax.py
  • dpdata/formats/abacus/scf.py
  • dpdata/formats/abacus/stru.py
  • dpdata/formats/amber/__init__.py
  • dpdata/formats/amber/mask.py
  • dpdata/formats/amber/md.py
  • dpdata/formats/amber/sqm.py
  • dpdata/formats/cp2k/__init__.py
  • dpdata/formats/cp2k/cell.py
  • dpdata/formats/cp2k/output.py
  • dpdata/formats/deepmd/__init__.py
  • dpdata/formats/deepmd/comp.py
  • dpdata/formats/deepmd/hdf5.py
  • dpdata/formats/deepmd/mixed.py
  • dpdata/formats/deepmd/raw.py
  • dpdata/formats/dftbplus/__init__.py
  • dpdata/formats/dftbplus/output.py
  • dpdata/formats/fhi_aims/__init__.py
  • dpdata/formats/fhi_aims/output.py
  • dpdata/formats/gaussian/__init__.py
  • dpdata/formats/gaussian/fchk.py
  • dpdata/formats/gaussian/gjf.py
  • dpdata/formats/gaussian/log.py
  • dpdata/formats/gromacs/__init__.py
  • dpdata/formats/gromacs/gro.py
  • dpdata/formats/lammps/__init__.py
  • dpdata/formats/lammps/dump.py
  • dpdata/formats/lammps/lmp.py
  • dpdata/formats/lmdb/__init__.py
  • dpdata/formats/lmdb/format.py
  • dpdata/formats/md/__init__.py
  • dpdata/formats/md/msd.py
  • dpdata/formats/md/pbc.py
  • dpdata/formats/md/rdf.py
  • dpdata/formats/md/water.py
  • dpdata/formats/openmx/__init__.py
  • dpdata/formats/openmx/omx.py
  • dpdata/formats/orca/__init__.py
  • dpdata/formats/orca/output.py
  • dpdata/formats/psi4/__init__.py
  • dpdata/formats/psi4/input.py
  • dpdata/formats/psi4/output.py
  • dpdata/formats/pwmat/__init__.py
  • dpdata/formats/pwmat/atomconfig.py
  • dpdata/formats/pwmat/movement.py
  • dpdata/formats/pymatgen/__init__.py
  • dpdata/formats/pymatgen/molecule.py
  • dpdata/formats/pymatgen/structure.py
  • dpdata/formats/qe/__init__.py
  • dpdata/formats/qe/scf.py
  • dpdata/formats/qe/traj.py
  • dpdata/formats/rdkit/__init__.py
  • dpdata/formats/rdkit/sanitize.py
  • dpdata/formats/rdkit/utils.py
  • dpdata/formats/siesta/__init__.py
  • dpdata/formats/siesta/aiMD_output.py
  • dpdata/formats/siesta/output.py
  • dpdata/formats/vasp/__init__.py
  • dpdata/formats/vasp/outcar.py
  • dpdata/formats/vasp/poscar.py
  • dpdata/formats/vasp/xml.py
  • dpdata/formats/xyz/__init__.py
  • dpdata/formats/xyz/quip_gap_xyz.py
  • dpdata/formats/xyz/xyz.py
  • dpdata/plugins/3dmol.py
  • dpdata/plugins/abacus.py
  • dpdata/plugins/amber.py
  • dpdata/plugins/cp2k.py
  • dpdata/plugins/deepmd.py
  • dpdata/plugins/dftbplus.py
  • dpdata/plugins/fhi_aims.py
  • dpdata/plugins/gaussian.py
  • dpdata/plugins/gromacs.py
  • dpdata/plugins/lammps.py
  • dpdata/plugins/lmdb.py
  • dpdata/plugins/openmx.py
  • dpdata/plugins/orca.py
  • dpdata/plugins/psi4.py
  • dpdata/plugins/pwmat.py
  • dpdata/plugins/pymatgen.py
  • dpdata/plugins/qe.py
  • dpdata/plugins/rdkit.py
  • dpdata/plugins/siesta.py
  • dpdata/plugins/vasp.py
  • dpdata/plugins/xyz.py
  • dpdata/system.py
  • tests/context.py
  • tests/test_abacus_stru_dump.py
  • tests/test_lammps_lmp_dump.py
  • tests/test_lammps_spin.py
  • tests/test_lmdb.py

@@ -0,0 +1 @@
# Format modules for dpdata
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check current directory structure and file content
pwd
echo "=== File content ==="
cat -n dpdata/formats/__init__.py 2>/dev/null || echo "File not found"

echo -e "\n=== Checking if ruff is available ==="
which ruff || echo "ruff not found, will attempt to install"

echo -e "\n=== Installing ruff if needed ==="
pip install ruff -q

echo -e "\n=== Running ruff checks ==="
ruff check dpdata/ --statistics

echo -e "\n=== Running ruff format check (dry-run) ==="
ruff format --check dpdata/

Repository: deepmodeling/dpdata

Length of output: 2279


Package initializer structure is correct; however, ruff linting shows 386 errors across dpdata/ that must be resolved before merge.

The file itself is fine, but the broader directory has significant lint violations:

  • 117 TRY003 (raise-vanilla-args)
  • 95 ARG002 (unused-method-argument)
  • 24 B007 (unused-loop-control-variable)
  • 22 B028 (no-explicit-stacklevel)
  • 20 F841 (unused-variable)
  • And 131 more violations across multiple rules

Formatting is compliant (107 files already formatted). Address the lint errors with ruff check --fix dpdata/ before merge to comply with the coding guidelines for dpdata/**/*.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dpdata/formats/__init__.py` at line 1, The repo has widespread ruff lint
violations under the dpdata package; run ruff check --fix dpdata/ to
automatically apply fixes and then manually address remaining issues: replace
bare "raise" usage with "raise from" or preserve exception context for TRY003,
mark unused method args in class methods (e.g., in any dpdata.* classes) with a
leading underscore or remove them to fix ARG002, remove or use unused variables
to fix F841 and B007 (or prefix loop control variables with _), add explicit
stacklevel arguments to warnings.warn calls to resolve B028, and eliminate
unused imports/variables across the package; iterate until ruff reports no
errors and commit the cleaned dpdata package (starting point: dpdata.__init__
and other modules under dpdata/).

import numpy as np

from dpdata.amber.mask import pick_by_amber_mask
from dpdata.formats.amber.mask import pick_by_amber_mask
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n dpdata/formats/amber/md.py | head -20

Repository: deepmodeling/dpdata

Length of output: 598


🏁 Script executed:

ruff check dpdata/formats/amber/md.py

Repository: deepmodeling/dpdata

Length of output: 1706


Fix ruff linting violations before merge.

The file has 3 ruff violations:

  • Line 15 (RUF005): Replace ["X"] + ELEMENTS with ["X", *ELEMENTS]
  • Line 96 (TRY003): Avoid specifying long exception messages outside the exception class; move "Unsupported cells" into the exception
  • Line 118 (TRY003): Avoid specifying long exception messages outside the exception class; move "Please provide one of mden_file and mdout_file" into the exception

Run ruff check dpdata/ to verify compliance and ruff format dpdata/ to auto-fix where possible.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dpdata/formats/amber/md.py` at line 8, Replace the list construction ["X"] +
ELEMENTS with the unpacking form ["X", *ELEMENTS] where the atom/elements list
is defined; find the constant that uses ELEMENTS and update it. For the two
exception occurrences, move the literal messages into the exception constructors
instead of passing them outside: replace patterns that raise exceptions with
external messages to raise RuntimeError("Unsupported cells") and raise
ValueError("Please provide one of mden_file and mdout_file") (or the original
exception types used) so the strings "Unsupported cells" and "Please provide one
of mden_file and mdout_file" become the exception's message. Ensure you update
the raise sites in dpdata.formats.amber.md (the blocks that handle cell parsing
and the mden/mdout argument validation) and run ruff check/format afterwards.

@wanghan-iapcm wanghan-iapcm requested a review from njzjz March 1, 2026 09:17
@codspeed-hq
Copy link

codspeed-hq bot commented Mar 1, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 2 untouched benchmarks


Comparing newtontech:fix/issue-934 (31f6ab8) with master (ae134fe)

Open in CodSpeed

Copy link
Member

@njzjz njzjz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let the CI pass

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dpdata size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants