Search NCBI for papers associated with private datasets #606

Open
vagisha wants to merge 27 commits into release25.11-SNAPSHOT from 25.11_fb_pubmed-publication-notification
Collaborator

@vagisha vagisha commented Mar 10, 2026

Rationale

Enhance the Private Data Reminder system to automatically detect publications associated with private datasets by searching PubMed Central and PubMed. When a publication is found, the reminder message includes the citation and encourages the submitter to make their data public.

Related Pull Requests

Changes

  • Added a setting in the admin console (Private Data Reminder Settings) to enable publication search
  • Added NcbiPublicationSearchService that searches for publications in
    • PubMed Central (full text available): searches for the ProteomeXchange Id, the Panorama Public short URL, and the Panorama Public DOI; author and title are then verified.
    • PubMed: fallback if no matches are found in PMC; searches by author and title only, since PubMed does not have full-text articles
    • Preprints (e.g. bioRxiv, medRxiv) are filtered out
    • Results are prioritized by number of matched search terms (e.g. PX ID, Panorama link) as well as paper publication date proximity to data submission date
  • Publication search happens automatically in the PrivateDataReminderJob when enabled in the admin console
  • Publications for a single dataset can also be searched through the Search Publications menu item in the TargetedMS Experiment webpart menu
  • If a publication match is found during the PrivateDataReminderJob, instead of the usual reminder message, a "publication found" message is posted to the submitter
  • The submitter can dismiss a suggested publication by clicking the Dismiss Publication Suggestion link in the message.
  • DatasetStatus caches the publication result to avoid repeated calls to NCBI's EUtils endpoints
  • Moved NCBI citation lookup methods to NcbiPublicationSearchServiceImpl
  • Tests
    • Unit tests added in NcbiPublicationSearchServiceImpl.
    • Added Selenium test PublicationSearchTest.
      • Test uses a mock NCBI service on TeamCity (mocking only the outbound HTTP requests) so all real search and filtering logic is exercised without live API calls.
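
The prioritization described in the Changes list (more matched search terms first, then publication date closest to the data submission date) could be sketched roughly as below. This is only an illustration: the class `MatchRanking`, the `Candidate` fields, and the exact tie-breaking order are assumptions, not the code in this PR.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; names are not taken from the PR.
final class MatchRanking
{
    static final class Candidate
    {
        final String id;            // e.g. a PubMed or PMC identifier
        final int matchedTerms;     // how many search terms (PX ID, short URL, DOI) matched
        final LocalDate published;  // publication date of the paper

        Candidate(String id, int matchedTerms, LocalDate published)
        {
            this.id = id;
            this.matchedTerms = matchedTerms;
            this.published = published;
        }
    }

    // Returns a new list: most matched terms first, then smallest distance
    // (in days) between publication date and data submission date.
    static List<Candidate> rank(List<Candidate> candidates, LocalDate submitted)
    {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator
                .comparingInt((Candidate c) -> c.matchedTerms).reversed()
                .thenComparingLong(c -> Math.abs(ChronoUnit.DAYS.between(submitted, c.published))));
        return sorted;
    }
}
```

Sorting a copy keeps the caller's list untouched, which matters if the unsorted results are also cached or logged.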

private-data-reminders-overview.md
SPEC-SUMMARY.md

…he PubMed Id, PubMed search strategy, and user dismissal status

- Updated panoramapublic.xml to be consistent with changes to DatasetStatus
- Changed schema version from 25.003 to 25.004
- Added checkbox to enable publication check in the private data reminder settings form
- Added "Enable Publication Check" checkbox on sendPrivateDataRemindersForm.jsp that can override the value in the saved settings.
- Added DismissPubMedSuggestionAction
- Added CheckPubMedForDatasetAction to perform publication check on one dataset
- New method added to PanoramaPublicNotification for sending message about published paper match
- Updated PrivateDataReminderJob to check for publications
…matches will have PubMed Ids.

- Post notification to support thread when publication suggestion is dismissed.
- Removed redundant constants
- Removed redundant class PublicationCheckResult
- Parameterized log calls.
…g code in the action class

- Added toJson method to PublicationMatch.
- Display the citation in the notification message as well as the result page when querying publications for a single dataset.
- Mock NCBI service for TeamCity tests
- Fixed notification messages
- Added API action class to register mock publication data with the MockNcbiPublicationSearchService
- Updated PublicationSearchTest - test one more dataset
- Fixed citation retrieval through mock service
- Fixed confirm messages
- Added "tool" as parameter to EUtils requests
- Require at least two keywords in title for keyword match
- Updated unit tests
…rred re-search after dismissal

- Re-search NCBI after deferral expires; clear dismissal if different publication found
- Add publication search frequency (default 3 months) to admin UI and settings form
- Update tests to match new log messages and Date-based dismissal
…exception fetching the citation from NCBI, display the publication Id label instead.
… prevent stale reads

- Fixed comment for PUBMED_ID regex
- In searchPublicationsForDataset.jsp display the publication Id label when citation is not available.
… have to lookup the citation for a saved publicationId.

- Added test: running the pipeline job in "test" mode should not update rows in the DatasetStatus table
- Updated PublicationMatch constructor and fromMatchInfo method
- Fixed comments, renamed variables, class.
…publication" message.

- Improve author match and title match logic
@vagisha vagisha requested a review from labkey-jeckels March 10, 2026 03:33
private List<String> executeSearch(String query, String database, Logger log)
{
String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
String url = ESEARCH_URL +

Contributor

These parameters should be URL-encoded. encodedQuery is already handled but unless database and others are known to be safe, they should be encoded.
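
One way to address this for both the ESearch and ESummary requests is to build the query string from a map and encode every name and value in one place. The sketch below is an assumption about how that could look; `EUtilsUrlBuilder` and `buildUrl` are hypothetical names, not part of this PR.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; not the PR's implementation.
final class EUtilsUrlBuilder
{
    // Encodes every parameter name and value, so callers never concatenate raw strings.
    static String buildUrl(String baseUrl, Map<String, String> params)
    {
        String query = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return baseUrl + "?" + query;
    }

    private static String encode(String value)
    {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Passing a `LinkedHashMap` keeps the parameter order stable, which makes the resulting URLs easier to compare in logs and in the mock service.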

if (ids.isEmpty()) return Collections.emptyMap();

String idString = String.join(",", ids);
String url = ESUMMARY_URL +

Contributor

URL encoding

}

// Search PubMed: "LastName FirstName[Author] AND Title NOT preprint[Publication Type]"
String query = String.format("%s %s[Author] AND %s NOT preprint[Publication Type]",

Contributor

I don't know the PubMed search syntax very well, but any need to do escaping here?

if (title == null) return "";

return stripDiacritics(title.toLowerCase())
.replaceAll("<[^>]+>", " ") // Strip HTML/XML tags

Contributor

If we need to strip tags, do we also need to HTML-decode?
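
If entity decoding does turn out to be needed, a dependency-free sketch might look like the following. It strips tags first and decodes afterwards, so that an encoded `&lt;` in a title such as "p &lt; 0.05" is not mistaken for the start of a tag. The class name, the small named-entity table, and the omission of the PR's stripDiacritics step are all simplifications for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; handles numeric entities and a handful of named ones only.
final class TitleNormalizer
{
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]{1,6}|#[0-9]{1,7}|[a-zA-Z]+);");
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'", "nbsp", " ");

    static String decodeEntities(String s)
    {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find())
        {
            String body = m.group(1);
            String replacement = m.group(); // unknown entities are left as-is
            if (body.charAt(0) == '#')
            {
                boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
                int codePoint = hex ? Integer.parseInt(body.substring(2), 16)
                                    : Integer.parseInt(body.substring(1));
                if (Character.isValidCodePoint(codePoint))
                    replacement = new String(Character.toChars(codePoint));
            }
            else if (NAMED.containsKey(body))
            {
                replacement = NAMED.get(body);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String normalize(String title)
    {
        if (title == null) return "";
        String noTags = TAG.matcher(title).replaceAll(" "); // strip tags first
        return decodeEntities(noTags)
                .toLowerCase(Locale.ROOT)
                .replaceAll("\\s+", " ")
                .trim();
    }
}
```

A library decoder would cover the full named-entity set; the table here is deliberately minimal.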

}
catch (InterruptedException e)
{
Thread.currentThread().interrupt();

Contributor

I think this state will already be set by virtue of the exception, but shouldn't be harmful either.
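
For reference, the JDK documentation for Thread.sleep states that the interrupted status is cleared when InterruptedException is thrown, so the re-interrupt in the catch block is what preserves the flag for callers further up the stack. A small self-contained check (class and method names are hypothetical, not from this PR):

```java
// Hypothetical demo of the interrupted status before and after re-interrupting.
final class InterruptFlagDemo
{
    // Returns { flag seen inside the catch block, flag after calling interrupt() again }.
    static boolean[] observe()
    {
        boolean afterCatch = false;
        boolean afterRestore = false;

        Thread.currentThread().interrupt(); // simulate an interruption request
        try
        {
            Thread.sleep(10); // throws immediately because the flag is set
        }
        catch (InterruptedException e)
        {
            afterCatch = Thread.currentThread().isInterrupted();   // cleared by the throw
            Thread.currentThread().interrupt();                    // restore for callers
            afterRestore = Thread.currentThread().isInterrupted();
        }

        Thread.interrupted(); // clear the flag so the demo does not affect its caller
        return new boolean[] { afterCatch, afterRestore };
    }
}
```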

*/
private static String quote(String str)
{
return "\"" + str + "\"";

Contributor

Should this escape quotes within the string?
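
A conservative option, assuming the search syntax has no escape for a double quote inside a quoted phrase (worth confirming against NCBI's documentation), is to drop embedded quotes before wrapping the value. The class name below is hypothetical and the behavior is a sketch, not the PR's code.

```java
// Hypothetical variant of quote(); replaces embedded double quotes rather than escaping them.
final class SearchTermQuoting
{
    static String quote(String str)
    {
        String cleaned = str.replace('"', ' ')   // embedded quotes would end the phrase early
                .replaceAll("\\s+", " ")         // collapse the gaps left behind
                .trim();
        return "\"" + cleaned + "\"";
    }
}
```

The same cleanup could be applied to the author and title values that are formatted into the PubMed query.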

@Override
public boolean handlePost(NotifySubmitterForm form, BindException errors) throws Exception
{
ExperimentAnnotations exptAnnotations = ExperimentAnnotationsManager.get(form.getId());

Contributor

Should this also check that the container matches?

return new SimpleErrorView(errors);
}

form.setPubmedId(_copiedExperiment.getPubmedId());

Contributor

Was this intentional? If so, great. If not, we're not propagating the ID anymore.

Comment on lines +10991 to +10993
PanoramaPublicNotification.postPrivateDataReminderMessage(
journal, submission, exptAnnotations, submitter, getUser(), notifyUsers,
_announcement, _announcementsContainer, getUser(), selectedMatch);

Contributor

Should this and subsequent updates in this method be transacted?
