fix: Support CP2K 2025 output format for energy and forces (fixes #850) by newtontech · Pull Request #947 · deepmodeling/dpdata

newtontech · 2026-02-28T03:33:17Z

This commit adds support for parsing CP2K 2025 version output files:

Changes in CP2K 2025 format:

Energy line format changed from: 'ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.): -7.997403996236343' to: 'ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725'
Forces output format changed from:
'ATOMIC FORCES in [a.u.]' table with ' Atom Kind Element X Y Z' header
to:
'FORCES| Atomic forces [hartree/bohr]' with 'FORCES| Atom x y z |f|' prefix lines

Implementation:

Detect CP2K 2025 format by checking for 'energy [hartree]' in the content
Parse energy from new '[hartree]' format
Parse forces from new 'FORCES|' prefixed lines
Maintain backward compatibility with CP2K 2023 format

Testing:

Added test file for CP2K 2025 format (tests/cp2k/cp2k_2025_output/)
Added test case TestCp2k2025Output to verify parsing
Added regression test TestCp2k2023FormatStillWorks to ensure backward compatibility
All existing CP2K tests pass

Summary by CodeRabbit

New Features
- Added support for CP2K 2025 output parsing with automatic format detection and robust energy/force extraction alongside existing CP2K 2023 handling.
Tests
- Added comprehensive tests and fixtures for CP2K 2025 parsing, including edge-case scenarios, plus regression tests ensuring CP2K 2023 behavior remains unchanged.

…pmodeling#850) This commit adds support for parsing CP2K 2025 version output files: **Changes in CP2K 2025 format:** 1. Energy line format changed from: 'ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.): -7.997403996236343' to: 'ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725' 2. Forces output format changed from: 'ATOMIC FORCES in [a.u.]' table with ' Atom Kind Element X Y Z' header to: 'FORCES| Atomic forces [hartree/bohr]' with 'FORCES| Atom x y z |f|' prefix lines **Implementation:** - Detect CP2K 2025 format by checking for 'energy [hartree]' in the content - Parse energy from new '[hartree]' format - Parse forces from new 'FORCES|' prefixed lines - Maintain backward compatibility with CP2K 2023 format **Testing:** - Added test file for CP2K 2025 format (tests/cp2k/cp2k_2025_output/) - Added test case TestCp2k2025Output to verify parsing - Added regression test TestCp2k2023FormatStillWorks to ensure backward compatibility - All existing CP2K tests pass

for more information, see https://pre-commit.ci

coderabbitai · 2026-02-28T03:34:18Z

📝 Walkthrough

Walkthrough

Adds CP2K 2025 output support to dpdata by detecting a "energy [hartree]" token and branching parsing logic to extract energies and forces from the new CP2K 2025 line formats while preserving CP2K 2023 behavior. New test fixtures and unit tests validate parsing and edge cases.

Changes

Cohort / File(s)	Summary
Core implementation `dpdata/cp2k/output.py`	Adds CP2K 2025 detection flag (scan for "energy [hartree]") and branching parsing: energy extraction after "[hartree]" token with numeric-token fallback; new forces parsing from lines containing `FORCES
Test fixtures `tests/cp2k/cp2k_2025_output/cp2k_2025_output`, `tests/cp2k/cp2k_2025_output/deepmd/type.raw`, `tests/cp2k/cp2k_2025_output/deepmd/type_map.raw`	Adds CP2K 2025 sample output capturing program header, SCF, energy, forces, stress, coordinates, and small DeepMD type/map files for the fixture.
Test cases `tests/test_cp2k_2025_output.py`	Adds unit tests covering CP2K 2025 energy/force extraction, regression tests for CP2K 2023, and edge-case tests (extra whitespace, header lines, atomic forces line variants).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

[BUG] dpdata's plugin cp2kdata cannot extract energy and force parameters from the output file of the cp2k.2025 version. #850 — Matches: implements CP2K 2025 detection and energy/force parsing that the issue reported as failing.

Suggested reviewers

njzjz
pxlxingliang

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: adding support for CP2K 2025 output format for energy and forces parsing, with a reference to the fixed issue.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dpdata/cp2k/output.py`:
- Around line 438-454: In the is_cp2k_2025 branch the fallback loop may leave
energy uninitialized if no numeric token is found; modify the block around
is_cp2k_2025 / parts / hartree_idx so that if neither the "[hartree]" lookup nor
the reversed parts numeric search sets energy you raise a clear exception or log
an error with context (e.g., include the original line string ii and parts)
instead of continuing silently; ensure the raised error (ValueError or custom)
describes that energy could not be parsed from the CP2K 2025 output so the later
assertion yields a helpful message.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae134fe and ac0d5ba.

📒 Files selected for processing (10)

dpdata/cp2k/output.py
tests/cp2k/cp2k_2025_output/cp2k_2025_output
tests/cp2k/cp2k_2025_output/deepmd/set.000/box.npy
tests/cp2k/cp2k_2025_output/deepmd/set.000/coord.npy
tests/cp2k/cp2k_2025_output/deepmd/set.000/energy.npy
tests/cp2k/cp2k_2025_output/deepmd/set.000/force.npy
tests/cp2k/cp2k_2025_output/deepmd/set.000/virial.npy
tests/cp2k/cp2k_2025_output/deepmd/type.raw
tests/cp2k/cp2k_2025_output/deepmd/type_map.raw
tests/test_cp2k_2025_output.py

coderabbitai · 2026-02-28T03:36:59Z

dpdata/cp2k/output.py

+                if is_cp2k_2025:
+                    # Find the energy value after "[hartree]"
+                    parts = ii.split()
+                    try:
+                        hartree_idx = parts.index("[hartree]")
+                        energy = parts[hartree_idx + 1]
+                    except (ValueError, IndexError):
+                        # Fallback: try to find energy value in the line
+                        for part in reversed(parts):
+                            try:
+                                float(part)
+                                energy = part
+                                break
+                            except ValueError:
+                                continue
+                else:
+                    energy = ii.split()[8]


⚠️ Potential issue | 🟡 Minor

Potential uninitialized variable in fallback path.

If "[hartree]" is not found in parts and the fallback loop doesn't find any numeric value, energy will not be set for this line, but the code continues silently. While the assertion on line 491 will eventually catch this, the error message won't be helpful for debugging malformed CP2K 2025 output.

Consider adding explicit error handling:

Proposed fix

if is_cp2k_2025: # Find the energy value after "[hartree]" parts = ii.split() try: hartree_idx = parts.index("[hartree]") energy = parts[hartree_idx + 1] except (ValueError, IndexError): # Fallback: try to find energy value in the line for part in reversed(parts): try: float(part) energy = part break except ValueError: continue + else: + # Loop completed without finding energy + raise RuntimeError(f"Cannot parse energy from CP2K 2025 line: {ii}")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dpdata/cp2k/output.py` around lines 438 - 454, In the is_cp2k_2025 branch the fallback loop may leave energy uninitialized if no numeric token is found; modify the block around is_cp2k_2025 / parts / hartree_idx so that if neither the "[hartree]" lookup nor the reversed parts numeric search sets energy you raise a clear exception or log an error with context (e.g., include the original line string ii and parts) instead of continuing silently; ensure the raised error (ValueError or custom) describes that energy could not be parsed from the CP2K 2025 output so the later assertion yields a helpful message.

codspeed-hq · 2026-03-01T09:30:12Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 2 untouched benchmarks

_{Comparing newtontech:fix-cp2k-2025-format-v2 (ac0d5ba) with master (ae134fe)}

wanghan-iapcm · 2026-03-01T09:30:23Z

please also address the review comments of the AI

codecov · 2026-03-01T09:32:53Z

Codecov Report

❌ Patch coverage is 71.42857% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.30%. Comparing base (ae134fe) to head (ac0d5ba).

Files with missing lines	Patch %	Lines
dpdata/cp2k/output.py	71.42%	8 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #947      +/-   ##
==========================================
- Coverage   86.37%   86.30%   -0.07%     
==========================================
  Files          86       86              
  Lines        8086     8105      +19     
==========================================
+ Hits         6984     6995      +11     
- Misses       1102     1110       +8

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- Add tests for energy parsing with extra whitespace - Add tests for FORCES| header line filtering (Atom x y z, Atomic forces) - Add integration test for CP2K 2025 format with LabeledSystem - Improve code coverage for CP2K 2025 format support

for more information, see https://pre-commit.ci

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

tests/test_cp2k_2025_output.py (2)

175-203: Strengthen force-header edge tests with value assertions.

Both tests only assert atom count (shape[1] == 2). Adding one component check makes header filtering regressions easier to catch.

Suggested enhancement

     def test_cp2k2025_force_parsing_with_header_lines(self):
@@
         try:
             system = dpdata.LabeledSystem(fname, fmt="cp2k/output")
             self.assertEqual(system.data["forces"].shape[1], 2)
+            self.assertAlmostEqual(system.data["forces"][0][0][0], -2.94874881, places=5)
         finally:
             os.unlink(fname)
@@
     def test_cp2k2025_force_parsing_with_atomic_forces_line(self):
@@
         try:
             system = dpdata.LabeledSystem(fname, fmt="cp2k/output")
             self.assertEqual(system.data["forces"].shape[1], 2)
+            self.assertAlmostEqual(system.data["forces"][0][0][0], -2.94874881, places=5)
         finally:
             os.unlink(fname)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test_cp2k_2025_output.py` around lines 175 - 203, Update the two tests
test_cp2k2025_force_parsing_with_header_lines and
test_cp2k2025_force_parsing_with_atomic_forces_line to also assert actual force
values (not just atom count) to catch header-filtering regressions: after
creating the LabeledSystem (system = dpdata.LabeledSystem(...)) and before
unlinking, fetch forces = system.data["forces"] and add assertions on specific
components (e.g., forces[0,0] or forces[0,1] / forces[1,0], comparing to the
expected parsed float values from create_cp2k_output_2025's forces_lines) so the
tests validate both shape and numeric content.

1-207: Please run the targeted format test module before merge.

Given this is a format-specific parser/test addition, run the focused test module as a pre-merge gate.

Based on learnings: For format-specific changes, run relevant test modules: cd tests && python -m unittest test_<format>*.py; for core system changes run cd tests && python -m unittest test_system*.py test_multisystems.py.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test_cp2k_2025_output.py` around lines 1 - 207, Summary: Reviewer
requests running the format-specific tests for CP2K 2025 before merging. Run the
targeted test module tests/test_cp2k_2025_output.py (which contains
TestCp2k2025Output, TestCp2k2025EdgeCases, and helper create_cp2k_output_2025)
locally with unittest (cd tests && python -m unittest test_cp2k_2025_output.py)
and confirm all cases pass (energy/forces extraction and edge-case variants); if
failures occur, fix the parser code paths referenced by fmt="cp2k/output" and
the energy/force parsing logic, re-run the same command, and add this module to
the CI pre-merge test pattern so format-specific tests are always executed.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_cp2k_2025_output.py`:
- Around line 114-117: Replace truthiness checks with explicit None checks:
change "if energy_line:" to "if energy_line is not None:" and similarly change
any checks that currently use truthiness for override inputs to "is not None"
(e.g., the override input variables used in the second block around the f.write
call). Update both the energy_line branch and the override input branch (the
latter referenced in the comment as lines 127-130) so empty strings/empty lists
are treated as valid overrides instead of falling back to defaults.
- Around line 163-172: The test
test_cp2k2025_energy_parsing_with_extra_whitespace currently only checks for
presence of energies; add an explicit numeric assertion that the parsed energy
equals the expected value from the input string (-7.364190264587725) to ensure
correct parsing. Locate the call that constructs the system via
dpdata.LabeledSystem and replace or augment the non-null check with an assertion
using the test framework (e.g., self.assertAlmostEqual or self.assertEqual with
a tolerance) against the energy entry in system.data["energies"] (e.g., the
first energy element), referencing the helper create_cp2k_output_2025 for the
expected value. Ensure you account for array shape (use the correct index into
system.data["energies"]) and use a small tolerance like 1e-12 to avoid
floating-point comparison issues.

---

Nitpick comments:
In `@tests/test_cp2k_2025_output.py`:
- Around line 175-203: Update the two tests
test_cp2k2025_force_parsing_with_header_lines and
test_cp2k2025_force_parsing_with_atomic_forces_line to also assert actual force
values (not just atom count) to catch header-filtering regressions: after
creating the LabeledSystem (system = dpdata.LabeledSystem(...)) and before
unlinking, fetch forces = system.data["forces"] and add assertions on specific
components (e.g., forces[0,0] or forces[0,1] / forces[1,0], comparing to the
expected parsed float values from create_cp2k_output_2025's forces_lines) so the
tests validate both shape and numeric content.
- Around line 1-207: Summary: Reviewer requests running the format-specific
tests for CP2K 2025 before merging. Run the targeted test module
tests/test_cp2k_2025_output.py (which contains TestCp2k2025Output,
TestCp2k2025EdgeCases, and helper create_cp2k_output_2025) locally with unittest
(cd tests && python -m unittest test_cp2k_2025_output.py) and confirm all cases
pass (energy/forces extraction and edge-case variants); if failures occur, fix
the parser code paths referenced by fmt="cp2k/output" and the energy/force
parsing logic, re-run the same command, and add this module to the CI pre-merge
test pattern so format-specific tests are always executed.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac0d5ba and 7478a4c.

📒 Files selected for processing (1)

tests/test_cp2k_2025_output.py

coderabbitai · 2026-03-02T04:12:03Z

tests/test_cp2k_2025_output.py

+            if energy_line:
+                f.write(energy_line + "\n")
+            else:
+                f.write(


⚠️ Potential issue | 🟡 Minor

Use explicit None checks for override inputs.

Line 114 and Line 127 use truthiness checks, so empty overrides ("" / []) cannot be tested and silently fall back to defaults.

Suggested fix

- if energy_line: + if energy_line is not None: f.write(energy_line + "\n") else: f.write( " ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725\n" ) @@ - if forces_lines: + if forces_lines is not None: for line in forces_lines: f.write(line + "\n") else: f.write( " FORCES| 1 -5.73440344E-02 2.95274914E-02 -1.50988167E-02 6.62433792E-02\n" )

Also applies to: 127-130

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_cp2k_2025_output.py` around lines 114 - 117, Replace truthiness checks with explicit None checks: change "if energy_line:" to "if energy_line is not None:" and similarly change any checks that currently use truthiness for override inputs to "is not None" (e.g., the override input variables used in the second block around the f.write call). Update both the energy_line branch and the override input branch (the latter referenced in the comment as lines 127-130) so empty strings/empty lists are treated as valid overrides instead of falling back to defaults.

coderabbitai · 2026-03-02T04:12:03Z

tests/test_cp2k_2025_output.py

+    def test_cp2k2025_energy_parsing_with_extra_whitespace(self):
+        """Test energy parsing with extra whitespace around value (coverage for parsing robustness)."""
+        fname = self.create_cp2k_output_2025(
+            energy_line=" ENERGY| Total FORCE_EVAL ( QS ) energy [hartree]   -7.364190264587725  "
+        )
+        try:
+            system = dpdata.LabeledSystem(fname, fmt="cp2k/output")
+            self.assertIsNotNone(system.data["energies"])
+            self.assertEqual(system.data["forces"].shape[1], 2)
+        finally:


⚠️ Potential issue | 🟠 Major

The whitespace-energy test does not validate the parsed energy value.

This test currently only checks non-null fields, so incorrect energy parsing could still pass. Add an explicit numeric assertion for Line 166 input.

Suggested fix

def test_cp2k2025_energy_parsing_with_extra_whitespace(self): """Test energy parsing with extra whitespace around value (coverage for parsing robustness).""" fname = self.create_cp2k_output_2025( energy_line=" ENERGY| Total FORCE_EVAL ( QS ) energy [hartree] -7.364190264587725 " ) try: system = dpdata.LabeledSystem(fname, fmt="cp2k/output") - self.assertIsNotNone(system.data["energies"]) + self.assertAlmostEqual( + system.data["energies"][0], -200.3898256786414, places=5 + ) self.assertEqual(system.data["forces"].shape[1], 2) finally: os.unlink(fname)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_cp2k_2025_output.py` around lines 163 - 172, The test test_cp2k2025_energy_parsing_with_extra_whitespace currently only checks for presence of energies; add an explicit numeric assertion that the parsed energy equals the expected value from the input string (-7.364190264587725) to ensure correct parsing. Locate the call that constructs the system via dpdata.LabeledSystem and replace or augment the non-null check with an assertion using the test framework (e.g., self.assertAlmostEqual or self.assertEqual with a tolerance) against the energy entry in system.data["energies"] (e.g., the first energy element), referencing the helper create_cp2k_output_2025 for the expected value. Ensure you account for array shape (use the correct index into system.data["energies"]) and use a small tolerance like 1e-12 to avoid floating-point comparison issues.

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 28, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

ac0d5ba

for more information, see https://pre-commit.ci

dosubot bot added the cp2k label Feb 28, 2026

coderabbitai bot reviewed Feb 28, 2026

View reviewed changes

wanghan-iapcm approved these changes Mar 1, 2026

View reviewed changes

dosubot bot added the lgtm This PR has been approved by a maintainer label Mar 1, 2026

OpenClaw Bot and others added 2 commits March 2, 2026 12:08

[pre-commit.ci] auto fixes from pre-commit.com hooks

7478a4c

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Mar 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Support CP2K 2025 output format for energy and forces (fixes #850)#947

fix: Support CP2K 2025 output format for energy and forces (fixes #850)#947
newtontech wants to merge 4 commits intodeepmodeling:masterfrom
newtontech:fix-cp2k-2025-format-v2

newtontech commented Feb 28, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 28, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 28, 2026

Uh oh!

codspeed-hq bot commented Mar 1, 2026

Uh oh!

wanghan-iapcm commented Mar 1, 2026

Uh oh!

codecov bot commented Mar 1, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 2, 2026

Uh oh!

coderabbitai bot Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

newtontech commented Feb 28, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 28, 2026

Choose a reason for hiding this comment

Uh oh!

codspeed-hq bot commented Mar 1, 2026

Merging this PR will not alter performance

Uh oh!

wanghan-iapcm commented Mar 1, 2026

Uh oh!

codecov bot commented Mar 1, 2026

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

newtontech commented Feb 28, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 28, 2026 •

edited

Loading