Skip to content

fix: Support CP2K 2025 output format for energy and forces (fixes #850)#947

Open
newtontech wants to merge 4 commits intodeepmodeling:masterfrom
newtontech:fix-cp2k-2025-format-v2
Open

fix: Support CP2K 2025 output format for energy and forces (fixes #850)#947
newtontech wants to merge 4 commits intodeepmodeling:masterfrom
newtontech:fix-cp2k-2025-format-v2

Conversation

@newtontech
Copy link

@newtontech newtontech commented Feb 28, 2026

This commit adds support for parsing CP2K 2025 version output files:

Changes in CP2K 2025 format:

  1. Energy line format changed from: 'ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.): -7.997403996236343' to: 'ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725'

  2. Forces output format changed from:
    'ATOMIC FORCES in [a.u.]' table with ' Atom Kind Element X Y Z' header
    to:
    'FORCES| Atomic forces [hartree/bohr]' with 'FORCES| Atom x y z |f|' prefix lines

Implementation:

  • Detect CP2K 2025 format by checking for 'energy [hartree]' in the content
  • Parse energy from new '[hartree]' format
  • Parse forces from new 'FORCES|' prefixed lines
  • Maintain backward compatibility with CP2K 2023 format

Testing:

  • Added test file for CP2K 2025 format (tests/cp2k/cp2k_2025_output/)
  • Added test case TestCp2k2025Output to verify parsing
  • Added regression test TestCp2k2023FormatStillWorks to ensure backward compatibility
  • All existing CP2K tests pass

Summary by CodeRabbit

  • New Features

    • Added support for CP2K 2025 output parsing with automatic format detection and robust energy/force extraction alongside existing CP2K 2023 handling.
  • Tests

    • Added comprehensive tests and fixtures for CP2K 2025 parsing, including edge-case scenarios, plus regression tests ensuring CP2K 2023 behavior remains unchanged.

…pmodeling#850)

This commit adds support for parsing CP2K 2025 version output files:

**Changes in CP2K 2025 format:**
1. Energy line format changed from:
   'ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.): -7.997403996236343'
   to:
   'ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725'

2. Forces output format changed from:
   'ATOMIC FORCES in [a.u.]' table with ' Atom   Kind   Element X Y Z' header
   to:
   'FORCES| Atomic forces [hartree/bohr]' with 'FORCES| Atom x y z |f|' prefix lines

**Implementation:**
- Detect CP2K 2025 format by checking for 'energy [hartree]' in the content
- Parse energy from new '[hartree]' format
- Parse forces from new 'FORCES|' prefixed lines
- Maintain backward compatibility with CP2K 2023 format

**Testing:**
- Added test file for CP2K 2025 format (tests/cp2k/cp2k_2025_output/)
- Added test case TestCp2k2025Output to verify parsing
- Added regression test TestCp2k2023FormatStillWorks to ensure backward compatibility
- All existing CP2K tests pass
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 28, 2026
@dosubot dosubot bot added the cp2k label Feb 28, 2026
@coderabbitai
Copy link

coderabbitai bot commented Feb 28, 2026

📝 Walkthrough

Walkthrough

Adds CP2K 2025 output support to dpdata by detecting a "energy [hartree]" token and branching parsing logic to extract energies and forces from the new CP2K 2025 line formats while preserving CP2K 2023 behavior. New test fixtures and unit tests validate parsing and edge cases.

Changes

Cohort / File(s) Summary
Core implementation
dpdata/cp2k/output.py
Adds CP2K 2025 detection flag (scan for "energy [hartree]") and branching parsing: energy extraction after "[hartree]" token with numeric-token fallback; new forces parsing from lines containing `FORCES
Test fixtures
tests/cp2k/cp2k_2025_output/cp2k_2025_output, tests/cp2k/cp2k_2025_output/deepmd/type.raw, tests/cp2k/cp2k_2025_output/deepmd/type_map.raw
Adds CP2K 2025 sample output capturing program header, SCF, energy, forces, stress, coordinates, and small DeepMD type/map files for the fixture.
Test cases
tests/test_cp2k_2025_output.py
Adds unit tests covering CP2K 2025 energy/force extraction, regression tests for CP2K 2023, and edge-case tests (extra whitespace, header lines, atomic forces line variants).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Suggested reviewers

  • njzjz
  • pxlxingliang
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding support for CP2K 2025 output format for energy and forces parsing, with a reference to the fixed issue.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dpdata/cp2k/output.py`:
- Around line 438-454: In the is_cp2k_2025 branch the fallback loop may leave
energy uninitialized if no numeric token is found; modify the block around
is_cp2k_2025 / parts / hartree_idx so that if neither the "[hartree]" lookup nor
the reversed parts numeric search sets energy you raise a clear exception or log
an error with context (e.g., include the original line string ii and parts)
instead of continuing silently; ensure the raised error (ValueError or custom)
describes that energy could not be parsed from the CP2K 2025 output so the later
assertion yields a helpful message.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae134fe and ac0d5ba.

📒 Files selected for processing (10)
  • dpdata/cp2k/output.py
  • tests/cp2k/cp2k_2025_output/cp2k_2025_output
  • tests/cp2k/cp2k_2025_output/deepmd/set.000/box.npy
  • tests/cp2k/cp2k_2025_output/deepmd/set.000/coord.npy
  • tests/cp2k/cp2k_2025_output/deepmd/set.000/energy.npy
  • tests/cp2k/cp2k_2025_output/deepmd/set.000/force.npy
  • tests/cp2k/cp2k_2025_output/deepmd/set.000/virial.npy
  • tests/cp2k/cp2k_2025_output/deepmd/type.raw
  • tests/cp2k/cp2k_2025_output/deepmd/type_map.raw
  • tests/test_cp2k_2025_output.py

Comment on lines +438 to +454
if is_cp2k_2025:
# Find the energy value after "[hartree]"
parts = ii.split()
try:
hartree_idx = parts.index("[hartree]")
energy = parts[hartree_idx + 1]
except (ValueError, IndexError):
# Fallback: try to find energy value in the line
for part in reversed(parts):
try:
float(part)
energy = part
break
except ValueError:
continue
else:
energy = ii.split()[8]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential uninitialized variable in fallback path.

If "[hartree]" is not found in parts and the fallback loop doesn't find any numeric value, energy will not be set for this line, but the code continues silently. While the assertion on line 491 will eventually catch this, the error message won't be helpful for debugging malformed CP2K 2025 output.

Consider adding explicit error handling:

Proposed fix
                 if is_cp2k_2025:
                     # Find the energy value after "[hartree]"
                     parts = ii.split()
                     try:
                         hartree_idx = parts.index("[hartree]")
                         energy = parts[hartree_idx + 1]
                     except (ValueError, IndexError):
                         # Fallback: try to find energy value in the line
                         for part in reversed(parts):
                             try:
                                 float(part)
                                 energy = part
                                 break
                             except ValueError:
                                 continue
+                        else:
+                            # Loop completed without finding energy
+                            raise RuntimeError(f"Cannot parse energy from CP2K 2025 line: {ii}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dpdata/cp2k/output.py` around lines 438 - 454, In the is_cp2k_2025 branch the
fallback loop may leave energy uninitialized if no numeric token is found;
modify the block around is_cp2k_2025 / parts / hartree_idx so that if neither
the "[hartree]" lookup nor the reversed parts numeric search sets energy you
raise a clear exception or log an error with context (e.g., include the original
line string ii and parts) instead of continuing silently; ensure the raised
error (ValueError or custom) describes that energy could not be parsed from the
CP2K 2025 output so the later assertion yields a helpful message.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Mar 1, 2026
@codspeed-hq
Copy link

codspeed-hq bot commented Mar 1, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 2 untouched benchmarks


Comparing newtontech:fix-cp2k-2025-format-v2 (ac0d5ba) with master (ae134fe)

Open in CodSpeed

@wanghan-iapcm
Copy link
Contributor

please also address the review comments of the AI

@codecov
Copy link

codecov bot commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 71.42857% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.30%. Comparing base (ae134fe) to head (ac0d5ba).

Files with missing lines Patch % Lines
dpdata/cp2k/output.py 71.42% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #947      +/-   ##
==========================================
- Coverage   86.37%   86.30%   -0.07%     
==========================================
  Files          86       86              
  Lines        8086     8105      +19     
==========================================
+ Hits         6984     6995      +11     
- Misses       1102     1110       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

OpenClaw Bot and others added 2 commits March 2, 2026 12:08
- Add tests for energy parsing with extra whitespace
- Add tests for FORCES| header line filtering (Atom x y z, Atomic forces)
- Add integration test for CP2K 2025 format with LabeledSystem
- Improve code coverage for CP2K 2025 format support
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
tests/test_cp2k_2025_output.py (2)

175-203: Strengthen force-header edge tests with value assertions.

Both tests only assert atom count (shape[1] == 2). Adding one component check makes header filtering regressions easier to catch.

Suggested enhancement
     def test_cp2k2025_force_parsing_with_header_lines(self):
@@
         try:
             system = dpdata.LabeledSystem(fname, fmt="cp2k/output")
             self.assertEqual(system.data["forces"].shape[1], 2)
+            self.assertAlmostEqual(system.data["forces"][0][0][0], -2.94874881, places=5)
         finally:
             os.unlink(fname)
@@
     def test_cp2k2025_force_parsing_with_atomic_forces_line(self):
@@
         try:
             system = dpdata.LabeledSystem(fname, fmt="cp2k/output")
             self.assertEqual(system.data["forces"].shape[1], 2)
+            self.assertAlmostEqual(system.data["forces"][0][0][0], -2.94874881, places=5)
         finally:
             os.unlink(fname)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cp2k_2025_output.py` around lines 175 - 203, Update the two tests
test_cp2k2025_force_parsing_with_header_lines and
test_cp2k2025_force_parsing_with_atomic_forces_line to also assert actual force
values (not just atom count) to catch header-filtering regressions: after
creating the LabeledSystem (system = dpdata.LabeledSystem(...)) and before
unlinking, fetch forces = system.data["forces"] and add assertions on specific
components (e.g., forces[0,0] or forces[0,1] / forces[1,0], comparing to the
expected parsed float values from create_cp2k_output_2025's forces_lines) so the
tests validate both shape and numeric content.

1-207: Please run the targeted format test module before merge.

Given this is a format-specific parser/test addition, run the focused test module as a pre-merge gate.

Based on learnings: For format-specific changes, run relevant test modules: cd tests && python -m unittest test_<format>*.py; for core system changes run cd tests && python -m unittest test_system*.py test_multisystems.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cp2k_2025_output.py` around lines 1 - 207, Summary: Reviewer
requests running the format-specific tests for CP2K 2025 before merging. Run the
targeted test module tests/test_cp2k_2025_output.py (which contains
TestCp2k2025Output, TestCp2k2025EdgeCases, and helper create_cp2k_output_2025)
locally with unittest (cd tests && python -m unittest test_cp2k_2025_output.py)
and confirm all cases pass (energy/forces extraction and edge-case variants); if
failures occur, fix the parser code paths referenced by fmt="cp2k/output" and
the energy/force parsing logic, re-run the same command, and add this module to
the CI pre-merge test pattern so format-specific tests are always executed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_cp2k_2025_output.py`:
- Around line 114-117: Replace truthiness checks with explicit None checks:
change "if energy_line:" to "if energy_line is not None:" and similarly change
any checks that currently use truthiness for override inputs to "is not None"
(e.g., the override input variables used in the second block around the f.write
call). Update both the energy_line branch and the override input branch (the
latter referenced in the comment as lines 127-130) so empty strings/empty lists
are treated as valid overrides instead of falling back to defaults.
- Around line 163-172: The test
test_cp2k2025_energy_parsing_with_extra_whitespace currently only checks for
presence of energies; add an explicit numeric assertion that the parsed energy
equals the expected value from the input string (-7.364190264587725) to ensure
correct parsing. Locate the call that constructs the system via
dpdata.LabeledSystem and replace or augment the non-null check with an assertion
using the test framework (e.g., self.assertAlmostEqual or self.assertEqual with
a tolerance) against the energy entry in system.data["energies"] (e.g., the
first energy element), referencing the helper create_cp2k_output_2025 for the
expected value. Ensure you account for array shape (use the correct index into
system.data["energies"]) and use a small tolerance like 1e-12 to avoid
floating-point comparison issues.

---

Nitpick comments:
In `@tests/test_cp2k_2025_output.py`:
- Around line 175-203: Update the two tests
test_cp2k2025_force_parsing_with_header_lines and
test_cp2k2025_force_parsing_with_atomic_forces_line to also assert actual force
values (not just atom count) to catch header-filtering regressions: after
creating the LabeledSystem (system = dpdata.LabeledSystem(...)) and before
unlinking, fetch forces = system.data["forces"] and add assertions on specific
components (e.g., forces[0,0] or forces[0,1] / forces[1,0], comparing to the
expected parsed float values from create_cp2k_output_2025's forces_lines) so the
tests validate both shape and numeric content.
- Around line 1-207: Summary: Reviewer requests running the format-specific
tests for CP2K 2025 before merging. Run the targeted test module
tests/test_cp2k_2025_output.py (which contains TestCp2k2025Output,
TestCp2k2025EdgeCases, and helper create_cp2k_output_2025) locally with unittest
(cd tests && python -m unittest test_cp2k_2025_output.py) and confirm all cases
pass (energy/forces extraction and edge-case variants); if failures occur, fix
the parser code paths referenced by fmt="cp2k/output" and the energy/force
parsing logic, re-run the same command, and add this module to the CI pre-merge
test pattern so format-specific tests are always executed.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac0d5ba and 7478a4c.

📒 Files selected for processing (1)
  • tests/test_cp2k_2025_output.py

Comment on lines +114 to +117
if energy_line:
f.write(energy_line + "\n")
else:
f.write(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use explicit None checks for override inputs.

Line 114 and Line 127 use truthiness checks, so empty overrides ("" / []) cannot be tested and silently fall back to defaults.

Suggested fix
-            if energy_line:
+            if energy_line is not None:
                 f.write(energy_line + "\n")
             else:
                 f.write(
                     " ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725\n"
                 )
@@
-            if forces_lines:
+            if forces_lines is not None:
                 for line in forces_lines:
                     f.write(line + "\n")
             else:
                 f.write(
                     " FORCES| 1 -5.73440344E-02 2.95274914E-02 -1.50988167E-02 6.62433792E-02\n"
                 )

Also applies to: 127-130

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cp2k_2025_output.py` around lines 114 - 117, Replace truthiness
checks with explicit None checks: change "if energy_line:" to "if energy_line is
not None:" and similarly change any checks that currently use truthiness for
override inputs to "is not None" (e.g., the override input variables used in the
second block around the f.write call). Update both the energy_line branch and
the override input branch (the latter referenced in the comment as lines
127-130) so empty strings/empty lists are treated as valid overrides instead of
falling back to defaults.

Comment on lines +163 to +172
def test_cp2k2025_energy_parsing_with_extra_whitespace(self):
"""Test energy parsing with extra whitespace around value (coverage for parsing robustness)."""
fname = self.create_cp2k_output_2025(
energy_line=" ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725 "
)
try:
system = dpdata.LabeledSystem(fname, fmt="cp2k/output")
self.assertIsNotNone(system.data["energies"])
self.assertEqual(system.data["forces"].shape[1], 2)
finally:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The whitespace-energy test does not validate the parsed energy value.

This test currently only checks non-null fields, so incorrect energy parsing could still pass. Add an explicit numeric assertion for Line 166 input.

Suggested fix
     def test_cp2k2025_energy_parsing_with_extra_whitespace(self):
         """Test energy parsing with extra whitespace around value (coverage for parsing robustness)."""
         fname = self.create_cp2k_output_2025(
             energy_line=" ENERGY| Total FORCE_EVAL ( QS ) energy [hartree]   -7.364190264587725  "
         )
         try:
             system = dpdata.LabeledSystem(fname, fmt="cp2k/output")
-            self.assertIsNotNone(system.data["energies"])
+            self.assertAlmostEqual(
+                system.data["energies"][0], -200.3898256786414, places=5
+            )
             self.assertEqual(system.data["forces"].shape[1], 2)
         finally:
             os.unlink(fname)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cp2k_2025_output.py` around lines 163 - 172, The test
test_cp2k2025_energy_parsing_with_extra_whitespace currently only checks for
presence of energies; add an explicit numeric assertion that the parsed energy
equals the expected value from the input string (-7.364190264587725) to ensure
correct parsing. Locate the call that constructs the system via
dpdata.LabeledSystem and replace or augment the non-null check with an assertion
using the test framework (e.g., self.assertAlmostEqual or self.assertEqual with
a tolerance) against the energy entry in system.data["energies"] (e.g., the
first energy element), referencing the helper create_cp2k_output_2025 for the
expected value. Ensure you account for array shape (use the correct index into
system.data["energies"]) and use a small tolerance like 1e-12 to avoid
floating-point comparison issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cp2k lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants