refactor: reorganize directory structure by newtontech · Pull Request #946 · deepmodeling/dpdata

newtontech · 2026-02-28T03:07:32Z

Move all format directories (abacus, amber, cp2k, deepmd, dftbplus, fhi_aims, gaussian, gromacs, lammps, lmdb, md, openmx, orca, psi4, pwmat, pymatgen, qe, rdkit, siesta, vasp, xyz) into a new formats/ subdirectory.

This addresses issue #934.

Changes:

Created dpdata/formats/ directory
Moved all format directories to dpdata/formats/
Updated all import statements throughout the codebase
Updated relative imports in format modules (from .. to from ...)
Updated dpdata/init.py to import from new locations
Updated tests/context.py for new import paths

The plugins directory remains at the root level as requested.

Summary by CodeRabbit

Refactor

Reorganized internal module structure across the package hierarchy. Modules have been relocated and consolidated for improved organization. All public-facing APIs, standard import paths, and user-accessible functionality remain unchanged. Complete backward compatibility is maintained. Tests and internal references were updated to reflect the new module locations.

Move all format directories (abacus, amber, cp2k, deepmd, dftbplus, fhi_aims, gaussian, gromacs, lammps, lmdb, md, openmx, orca, psi4, pwmat, pymatgen, qe, rdkit, siesta, vasp, xyz) into a new formats/ subdirectory. This addresses issue deepmodeling#934. Changes: - Created dpdata/formats/ directory - Moved all format directories to dpdata/formats/ - Updated all import statements throughout the codebase - Updated relative imports in format modules (from .. to from ...) - Updated dpdata/__init__.py to import from new locations - Updated tests/context.py for new import paths The plugins directory remains at the root level as requested.

for more information, see https://pre-commit.ci

coderabbitai · 2026-02-28T03:09:13Z

📝 Walkthrough

Walkthrough

The PR reorganizes dpdata's module structure by moving numerous format-specific modules (abacus, amber, cp2k, gaussian, gromacs, lammps, vasp, and others) into a centralized dpdata.formats subpackage. All corresponding import statements across plugins, core modules, and tests are updated to reference these new locations while preserving identical public APIs.

Changes

Cohort / File(s)	Summary
Root package initialization `dpdata/__init__.py`	Updated public submodule imports (lammps, md, vasp) to reference `dpdata.formats` instead of root package.
Bond order and RDKit utilities `dpdata/bond_order_system.py`	Updated all RDKit utility references from `dpdata.rdkit.` to `dpdata.formats.rdkit.` across initialization, mol conversion, and molecule equivalence checks.
Formats package structure `dpdata/formats/__init__.py`	Added new package initializer for formats subpackage.
Format modules - Import path adjustments `dpdata/formats/abacus/scf.py`, `dpdata/formats/abacus/stru.py`, `dpdata/formats/amber/md.py`, `dpdata/formats/cp2k/output.py`, `dpdata/formats/gaussian/fchk.py`, `dpdata/formats/gaussian/log.py`, `dpdata/formats/gromacs/gro.py`, `dpdata/formats/openmx/omx.py`, `dpdata/formats/pwmat/atomconfig.py`, `dpdata/formats/pwmat/movement.py`, `dpdata/formats/qe/traj.py`	Updated relative import paths from parent-level (e.g., `..unit`) to grandparent-level (e.g., `...unit`) to accommodate new package hierarchy under `dpdata.formats`.
Plugin integrations `dpdata/plugins/3dmol.py`, `dpdata/plugins/abacus.py`, `dpdata/plugins/amber.py`, `dpdata/plugins/cp2k.py`, `dpdata/plugins/deepmd.py`, `dpdata/plugins/dftbplus.py`, `dpdata/plugins/fhi_aims.py`, `dpdata/plugins/gaussian.py`, `dpdata/plugins/gromacs.py`, `dpdata/plugins/lammps.py`, `dpdata/plugins/lmdb.py`, `dpdata/plugins/openmx.py`, `dpdata/plugins/orca.py`, `dpdata/plugins/psi4.py`, `dpdata/plugins/pwmat.py`, `dpdata/plugins/pymatgen.py`, `dpdata/plugins/qe.py`, `dpdata/plugins/rdkit.py`, `dpdata/plugins/siesta.py`, `dpdata/plugins/vasp.py`, `dpdata/plugins/xyz.py`	Updated all module imports and function call references from original `dpdata.` namespaces to new `dpdata.formats.` paths across all format reader/writer implementations.
Core system modules `dpdata/system.py`, `dpdata/bond_order_system.py`	Updated internal imports from `dpdata.md.pbc` and `dpdata.amber.mask` to their new locations under `dpdata.formats`.
Test files `tests/context.py`, `tests/test_abacus_stru_dump.py`, `tests/test_lammps_lmp_dump.py`, `tests/test_lammps_spin.py`, `tests/test_lmdb.py`	Updated test imports to reference format modules from new `dpdata.formats` package hierarchy.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

[Feature Request] Reorganize the directory structure #934: Main issue proposing the directory reorganization that this PR directly implements by moving format modules under dpdata.formats and updating all corresponding import paths throughout the codebase.

Possibly related PRs

add spin for lammps #738: Related through shared LAMMPS module refactoring—this PR moves LAMMPS modules into dpdata.formats (affecting dpdata.formats.lammps.dump and related functions) which directly impacts the LAMMPS spin functions and test imports modified in PR #738.
Refactor the reading and writing of abacus/stru format #793: Related through ABACUS module reorganization—this PR updates all imports to use refactored ABACUS functions (get_frame_from_stru, make_unlabeled_stru) now located under dpdata.formats.abacus, matching the refactoring introduced in PR #793.
docs: fix documentation build #655: Related through pymatgen integration—this PR updates call sites to use dpdata.formats.pymatgen.structure.from_system_data, which directly targets the pymatgen structure conversion functionality added in PR #655.

Suggested reviewers

njzjz
wanghan-iapcm
iProzd

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 47.46% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: reorganizing the directory structure by moving format directories into a new dpdata/formats/ subdirectory, which aligns with the comprehensive refactoring shown across all 40+ modified files.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

dpdata/plugins/openmx.py (1)

64-64: Prefix unused variable cs with underscore for consistency.

The variable cs is unpacked but never used. For consistency with line 38 (which uses _ for the same purpose), consider using _ here as well.

♻️ Proposed fix

-        data, cs = dpdata.formats.openmx.omx.to_system_data(fname, mdname)
+        data, _ = dpdata.formats.openmx.omx.to_system_data(fname, mdname)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dpdata/plugins/openmx.py` at line 64, The unpacked variable `cs` from the
call to dpdata.formats.openmx.omx.to_system_data is unused; change the unpack to
use an underscore placeholder (e.g., data, _ =
dpdata.formats.openmx.omx.to_system_data(...)) to match the pattern used
elsewhere (see the other unpack using `_`) so the unused value is clearly
ignored and linter warnings are avoided.

dpdata/bond_order_system.py (1)

13-13: Optional cleanup: remove stale commented rdkit import/usage lines.

These comment-only remnants add noise and can be dropped in a follow-up for readability.

Also applies to: 164-164

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dpdata/bond_order_system.py` at line 13, Remove the stale commented rdkit
import and any related commented-out rdkit usage lines (specifically the
commented "import dpdata.formats.rdkit.mol2" and the other commented rdkit
remnants around the later block) to clean up noise; locate these comments in
dpdata/bond_order_system.py by searching for the literal
"dpdata.formats.rdkit.mol2" and for any commented references to "rdkit" or
"mol2" and delete those comment-only lines, leaving active imports and code
untouched.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dpdata/formats/__init__.py`:
- Line 1: The repo has widespread ruff lint violations under the dpdata package;
run ruff check --fix dpdata/ to automatically apply fixes and then manually
address remaining issues: replace bare "raise" usage with "raise from" or
preserve exception context for TRY003, mark unused method args in class methods
(e.g., in any dpdata.* classes) with a leading underscore or remove them to fix
ARG002, remove or use unused variables to fix F841 and B007 (or prefix loop
control variables with _), add explicit stacklevel arguments to warnings.warn
calls to resolve B028, and eliminate unused imports/variables across the
package; iterate until ruff reports no errors and commit the cleaned dpdata
package (starting point: dpdata.__init__ and other modules under dpdata/).

In `@dpdata/formats/amber/md.py`:
- Line 8: Replace the list construction ["X"] + ELEMENTS with the unpacking form
["X", *ELEMENTS] where the atom/elements list is defined; find the constant that
uses ELEMENTS and update it. For the two exception occurrences, move the literal
messages into the exception constructors instead of passing them outside:
replace patterns that raise exceptions with external messages to raise
RuntimeError("Unsupported cells") and raise ValueError("Please provide one of
mden_file and mdout_file") (or the original exception types used) so the strings
"Unsupported cells" and "Please provide one of mden_file and mdout_file" become
the exception's message. Ensure you update the raise sites in
dpdata.formats.amber.md (the blocks that handle cell parsing and the mden/mdout
argument validation) and run ruff check/format afterwards.

---

Nitpick comments:
In `@dpdata/bond_order_system.py`:
- Line 13: Remove the stale commented rdkit import and any related commented-out
rdkit usage lines (specifically the commented "import dpdata.formats.rdkit.mol2"
and the other commented rdkit remnants around the later block) to clean up
noise; locate these comments in dpdata/bond_order_system.py by searching for the
literal "dpdata.formats.rdkit.mol2" and for any commented references to "rdkit"
or "mol2" and delete those comment-only lines, leaving active imports and code
untouched.

In `@dpdata/plugins/openmx.py`:
- Line 64: The unpacked variable `cs` from the call to
dpdata.formats.openmx.omx.to_system_data is unused; change the unpack to use an
underscore placeholder (e.g., data, _ =
dpdata.formats.openmx.omx.to_system_data(...)) to match the pattern used
elsewhere (see the other unpack using `_`) so the unused value is clearly
ignored and linter warnings are avoided.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae134fe and 31f6ab8.

📒 Files selected for processing (96)

dpdata/__init__.py
dpdata/bond_order_system.py
dpdata/formats/__init__.py
dpdata/formats/abacus/__init__.py
dpdata/formats/abacus/md.py
dpdata/formats/abacus/relax.py
dpdata/formats/abacus/scf.py
dpdata/formats/abacus/stru.py
dpdata/formats/amber/__init__.py
dpdata/formats/amber/mask.py
dpdata/formats/amber/md.py
dpdata/formats/amber/sqm.py
dpdata/formats/cp2k/__init__.py
dpdata/formats/cp2k/cell.py
dpdata/formats/cp2k/output.py
dpdata/formats/deepmd/__init__.py
dpdata/formats/deepmd/comp.py
dpdata/formats/deepmd/hdf5.py
dpdata/formats/deepmd/mixed.py
dpdata/formats/deepmd/raw.py
dpdata/formats/dftbplus/__init__.py
dpdata/formats/dftbplus/output.py
dpdata/formats/fhi_aims/__init__.py
dpdata/formats/fhi_aims/output.py
dpdata/formats/gaussian/__init__.py
dpdata/formats/gaussian/fchk.py
dpdata/formats/gaussian/gjf.py
dpdata/formats/gaussian/log.py
dpdata/formats/gromacs/__init__.py
dpdata/formats/gromacs/gro.py
dpdata/formats/lammps/__init__.py
dpdata/formats/lammps/dump.py
dpdata/formats/lammps/lmp.py
dpdata/formats/lmdb/__init__.py
dpdata/formats/lmdb/format.py
dpdata/formats/md/__init__.py
dpdata/formats/md/msd.py
dpdata/formats/md/pbc.py
dpdata/formats/md/rdf.py
dpdata/formats/md/water.py
dpdata/formats/openmx/__init__.py
dpdata/formats/openmx/omx.py
dpdata/formats/orca/__init__.py
dpdata/formats/orca/output.py
dpdata/formats/psi4/__init__.py
dpdata/formats/psi4/input.py
dpdata/formats/psi4/output.py
dpdata/formats/pwmat/__init__.py
dpdata/formats/pwmat/atomconfig.py
dpdata/formats/pwmat/movement.py
dpdata/formats/pymatgen/__init__.py
dpdata/formats/pymatgen/molecule.py
dpdata/formats/pymatgen/structure.py
dpdata/formats/qe/__init__.py
dpdata/formats/qe/scf.py
dpdata/formats/qe/traj.py
dpdata/formats/rdkit/__init__.py
dpdata/formats/rdkit/sanitize.py
dpdata/formats/rdkit/utils.py
dpdata/formats/siesta/__init__.py
dpdata/formats/siesta/aiMD_output.py
dpdata/formats/siesta/output.py
dpdata/formats/vasp/__init__.py
dpdata/formats/vasp/outcar.py
dpdata/formats/vasp/poscar.py
dpdata/formats/vasp/xml.py
dpdata/formats/xyz/__init__.py
dpdata/formats/xyz/quip_gap_xyz.py
dpdata/formats/xyz/xyz.py
dpdata/plugins/3dmol.py
dpdata/plugins/abacus.py
dpdata/plugins/amber.py
dpdata/plugins/cp2k.py
dpdata/plugins/deepmd.py
dpdata/plugins/dftbplus.py
dpdata/plugins/fhi_aims.py
dpdata/plugins/gaussian.py
dpdata/plugins/gromacs.py
dpdata/plugins/lammps.py
dpdata/plugins/lmdb.py
dpdata/plugins/openmx.py
dpdata/plugins/orca.py
dpdata/plugins/psi4.py
dpdata/plugins/pwmat.py
dpdata/plugins/pymatgen.py
dpdata/plugins/qe.py
dpdata/plugins/rdkit.py
dpdata/plugins/siesta.py
dpdata/plugins/vasp.py
dpdata/plugins/xyz.py
dpdata/system.py
tests/context.py
tests/test_abacus_stru_dump.py
tests/test_lammps_lmp_dump.py
tests/test_lammps_spin.py
tests/test_lmdb.py

coderabbitai · 2026-02-28T03:13:28Z

dpdata/formats/__init__.py

@@ -0,0 +1 @@
+# Format modules for dpdata


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check current directory structure and file content pwd echo "=== File content ===" cat -n dpdata/formats/__init__.py 2>/dev/null || echo "File not found" echo -e "\n=== Checking if ruff is available ===" which ruff || echo "ruff not found, will attempt to install" echo -e "\n=== Installing ruff if needed ===" pip install ruff -q echo -e "\n=== Running ruff checks ===" ruff check dpdata/ --statistics echo -e "\n=== Running ruff format check (dry-run) ===" ruff format --check dpdata/

Repository: deepmodeling/dpdata

Length of output: 2279

Package initializer structure is correct; however, ruff linting shows 386 errors across dpdata/ that must be resolved before merge.

The file itself is fine, but the broader directory has significant lint violations:

117 TRY003 (raise-vanilla-args)

95 ARG002 (unused-method-argument)

24 B007 (unused-loop-control-variable)

22 B028 (no-explicit-stacklevel)

20 F841 (unused-variable)

And 131 more violations across multiple rules

Formatting is compliant (107 files already formatted). Address the lint errors with ruff check --fix dpdata/ before merge to comply with the coding guidelines for dpdata/**/*.py.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dpdata/formats/__init__.py` at line 1, The repo has widespread ruff lint violations under the dpdata package; run ruff check --fix dpdata/ to automatically apply fixes and then manually address remaining issues: replace bare "raise" usage with "raise from" or preserve exception context for TRY003, mark unused method args in class methods (e.g., in any dpdata.* classes) with a leading underscore or remove them to fix ARG002, remove or use unused variables to fix F841 and B007 (or prefix loop control variables with _), add explicit stacklevel arguments to warnings.warn calls to resolve B028, and eliminate unused imports/variables across the package; iterate until ruff reports no errors and commit the cleaned dpdata package (starting point: dpdata.__init__ and other modules under dpdata/).

coderabbitai · 2026-02-28T03:13:28Z

dpdata/formats/amber/md.py

 import numpy as np

-from dpdata.amber.mask import pick_by_amber_mask
+from dpdata.formats.amber.mask import pick_by_amber_mask


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n dpdata/formats/amber/md.py | head -20

Repository: deepmodeling/dpdata

Length of output: 598

🏁 Script executed:

ruff check dpdata/formats/amber/md.py

Repository: deepmodeling/dpdata

Length of output: 1706

Fix ruff linting violations before merge.

The file has 3 ruff violations:

Line 15 (RUF005): Replace ["X"] + ELEMENTS with ["X", *ELEMENTS]

Line 96 (TRY003): Avoid specifying long exception messages outside the exception class; move "Unsupported cells" into the exception

Line 118 (TRY003): Avoid specifying long exception messages outside the exception class; move "Please provide one of mden_file and mdout_file" into the exception

Run ruff check dpdata/ to verify compliance and ruff format dpdata/ to auto-fix where possible.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dpdata/formats/amber/md.py` at line 8, Replace the list construction ["X"] + ELEMENTS with the unpacking form ["X", *ELEMENTS] where the atom/elements list is defined; find the constant that uses ELEMENTS and update it. For the two exception occurrences, move the literal messages into the exception constructors instead of passing them outside: replace patterns that raise exceptions with external messages to raise RuntimeError("Unsupported cells") and raise ValueError("Please provide one of mden_file and mdout_file") (or the original exception types used) so the strings "Unsupported cells" and "Please provide one of mden_file and mdout_file" become the exception's message. Ensure you update the raise sites in dpdata.formats.amber.md (the blocks that handle cell parsing and the mden/mdout argument validation) and run ruff check/format afterwards.

codspeed-hq · 2026-03-01T09:27:23Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 2 untouched benchmarks

_{Comparing newtontech:fix/issue-934 (31f6ab8) with master (ae134fe)}

njzjz

Please let the CI pass

dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 28, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

31f6ab8

for more information, see https://pre-commit.ci

dosubot bot added the dpdata label Feb 28, 2026

coderabbitai bot reviewed Feb 28, 2026

View reviewed changes

wanghan-iapcm requested a review from njzjz March 1, 2026 09:17

njzjz requested changes Mar 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: reorganize directory structure#946

refactor: reorganize directory structure#946
newtontech wants to merge 2 commits intodeepmodeling:masterfrom
newtontech:fix/issue-934

newtontech commented Feb 28, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 28, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 28, 2026

Uh oh!

coderabbitai bot Feb 28, 2026

Uh oh!

codspeed-hq bot commented Mar 1, 2026

Uh oh!

njzjz left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

newtontech commented Feb 28, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Refactor

Uh oh!

coderabbitai bot commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 28, 2026

Choose a reason for hiding this comment

Uh oh!

codspeed-hq bot commented Mar 1, 2026

Merging this PR will not alter performance

Uh oh!

njzjz left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

newtontech commented Feb 28, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 28, 2026 •

edited

Loading