Search NCBI for papers associated with private datasets #606
vagisha wants to merge 27 commits into release25.11-SNAPSHOT
…he PubMed Id, PubMed search strategy, and user dismissal status - Updated panoramapublic.xml to be consistent with changes to DatasetStatus - Changed schema version from 25.003 to 25.004
- Added checkbox to enable publication check in the private data reminder settings form - Added "Enable Publication Check" checkbox on sendPrivateDataRemindersForm.jsp that can override the value in the saved settings. - Added DismissPubMedSuggestionAction - Added CheckPubMedForDatasetAction to perform publication check on one dataset - New method added to PanoramaPublicNotification for sending message about published paper match - Updated PrivateDataReminderJob to check for publications
…matches will have PubMed Ids. - Post notification to support thread when publication suggestion is dismissed.
- Removed redundant constants - Removed redundant class PublicationCheckResult
…create date in the sorting logic
- Parameterized log calls.
…icationsForDatasetApiAction
…g code in the action class - Added toJson method to PublicationMatch.
- Display the citation in the notification message as well as the result page when querying publications for a single dataset.
- Mock NCBI service for TeamCity tests - Fixed notification messages
- Added API action class to register mock publication data with the MockNcbiPublicationSearchService - Updated PublicationSearchTest - test one more dataset
- Fixed citation retrieval through mock service - Fixed confirm messages
- Added "tool" as parameter to EUtils requests - Require at least two keywords in title for keyword match - Updated unit tests
…rred re-search after dismissal - Re-search NCBI after deferral expires; clear dismissal if different publication found - Add publication search frequency (default 3 months) to admin UI and settings form - Update tests to match new log messages and Date-based dismissal
…exception fetching the citation from NCBI, display the publication Id label instead.
… prevent stale reads - Fixed comment for PUBMED_ID regex - In searchPublicationsForDataset.jsp display the publication Id label when citation is not available.
… have to lookup the citation for a saved publicationId. - Added test: running the pipeline job in "test" mode should not update rows in the DatasetStatus table - Updated PublicationMatch constructor and fromMatchInfo method - Fixed comments, renamed variables, class.
…publication" message. - Improve author match and title match logic
    private List<String> executeSearch(String query, String database, Logger log)
    {
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        String url = ESEARCH_URL +
These parameters should be URL-encoded. encodedQuery is already handled, but unless database and the others are known to be safe, they should be encoded too.
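One way to address this is to build the query string from a parameter map and encode every key and value in one place. The following is only a sketch under assumptions: the class name `EUtilsUrlSketch`, the `buildUrl` helper, and the parameter values are hypothetical and not taken from the PR; only `URLEncoder.encode(String, Charset)` is the real JDK call already used in the snippet above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not the PR's code: encodes every query parameter uniformly.
final class EUtilsUrlSketch
{
    // Public NCBI ESearch endpoint; the PR's ESEARCH_URL constant is assumed to be similar.
    static final String ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    // Joins the parameters as key=value pairs, URL-encoding both keys and values.
    static String buildUrl(String baseUrl, Map<String, String> params)
    {
        String queryString = params.entrySet().stream()
            .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
            .collect(Collectors.joining("&"));
        return baseUrl + "?" + queryString;
    }

    private static String encode(String value)
    {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args)
    {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("db", "pubmed");
        params.put("term", "Smith J[Author] AND proteomics");
        params.put("retmode", "json");
        System.out.println(buildUrl(ESEARCH_URL, params));
    }
}
```

With this shape, a value such as the database name cannot break the URL even if it later comes from configuration or user input.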
    if (ids.isEmpty()) return Collections.emptyMap();

    String idString = String.join(",", ids);
    String url = ESUMMARY_URL +

    }

    // Search PubMed: "LastName FirstName[Author] AND Title NOT preprint[Publication Type]"
    String query = String.format("%s %s[Author] AND %s NOT preprint[Publication Type]",
I don't know the PubMed search syntax very well, but is there any need to do escaping here?
    if (title == null) return "";

    return stripDiacritics(title.toLowerCase())
        .replaceAll("<[^>]+>", " ") // Strip HTML/XML tags
If we need to strip tags, do we also need to HTML-decode?
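If decoding turns out to be needed, one possible approach is to strip tags first and decode entities afterwards, so that a decoded `<` in text such as "p &lt; 0.05" is not mistaken for the start of a tag. The sketch below is an assumption-laden illustration: `TitleNormalizeSketch` and `stripTagsAndDecode` are hypothetical names, and only a handful of common entities are handled. A library routine such as Apache Commons Text's `StringEscapeUtils.unescapeHtml4` would cover the full entity set.

```java
// Hypothetical sketch, not the PR's normalization code.
final class TitleNormalizeSketch
{
    // Strips HTML/XML tags, then decodes a few common HTML entities.
    static String stripTagsAndDecode(String title)
    {
        if (title == null) return "";

        // Same tag-stripping regex as in the reviewed snippet.
        String result = title.replaceAll("<[^>]+>", " ");

        // Decode "&amp;" last so that "&amp;lt;" becomes "&lt;" rather than "<".
        result = result
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&#39;", "'")
            .replace("&nbsp;", " ")
            .replace("&amp;", "&");

        // Collapse the whitespace left behind by removed tags.
        return result.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args)
    {
        System.out.println(stripTagsAndDecode("Role of <i>E. coli</i> &amp; p &lt; 0.05"));
    }
}
```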
    }
    catch (InterruptedException e)
    {
        Thread.currentThread().interrupt();
I think this state will already be set by virtue of the exception, but shouldn't be harmful either.
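For reference, the JDK documentation for blocking methods such as `Thread.sleep` states that the thread's interrupted status is cleared when `InterruptedException` is thrown, so the call in the catch block restores the flag rather than repeating it. The small sketch below (the class and method names are hypothetical) can be used to observe this behavior.

```java
// Hypothetical demo class; it only illustrates standard JDK interrupt semantics.
final class InterruptStatusSketch
{
    // Returns { flag seen inside the catch block, flag after re-interrupting }.
    static boolean[] observeInterruptStatus()
    {
        boolean[] observed = new boolean[2];

        // Set the flag up front so that sleep() throws immediately.
        Thread.currentThread().interrupt();
        try
        {
            Thread.sleep(1000);
        }
        catch (InterruptedException e)
        {
            // The flag was cleared when the exception was thrown.
            observed[0] = Thread.currentThread().isInterrupted();

            // Restore it so callers further up the stack can still see the interrupt.
            Thread.currentThread().interrupt();
            observed[1] = Thread.currentThread().isInterrupted();
        }

        // Clear the flag again so this demo leaves the calling thread clean.
        Thread.interrupted();
        return observed;
    }

    public static void main(String[] args)
    {
        boolean[] result = observeInterruptStatus();
        System.out.println("in catch: " + result[0] + ", after restore: " + result[1]);
    }
}
```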
     */
    private static String quote(String str)
    {
        return "\"" + str + "\"";
Should this escape quotes within the string?
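A conservative option, sketched below, is to remove embedded double quotes before wrapping the phrase, so a title containing a quote cannot end the phrase early. This assumes that PubMed's query syntax offers no escape sequence for a quote inside a quoted phrase, which is an assumption rather than something confirmed in this PR. `QuoteSketch` is a hypothetical stand-in for the reviewed `quote` method.

```java
// Hypothetical variant of quote(); assumes embedded quotes cannot be escaped in a PubMed phrase.
final class QuoteSketch
{
    static String quote(String str)
    {
        if (str == null) return "\"\"";

        // Replace embedded double quotes with spaces, then collapse repeated whitespace.
        String cleaned = str.replace('"', ' ').replaceAll("\\s+", " ").trim();
        return "\"" + cleaned + "\"";
    }

    public static void main(String[] args)
    {
        System.out.println(quote("The \"omics\" era"));
    }
}
```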
    @Override
    public boolean handlePost(NotifySubmitterForm form, BindException errors) throws Exception
    {
        ExperimentAnnotations exptAnnotations = ExperimentAnnotationsManager.get(form.getId());
Should this also check that the container matches?
        return new SimpleErrorView(errors);
    }

    form.setPubmedId(_copiedExperiment.getPubmedId());
Was this intentional? If so, great. If not, we're not propagating the ID anymore.
    PanoramaPublicNotification.postPrivateDataReminderMessage(
        journal, submission, exptAnnotations, submitter, getUser(), notifyUsers,
        _announcement, _announcementsContainer, getUser(), selectedMatch);
Should this and subsequent updates in this method be transacted?
Rationale
Enhance the Private Data Reminder system to automatically detect publications associated with private datasets by searching PubMed Central and PubMed. When a publication is found, the reminder message includes the citation and encourages the submitter to make their data public.
Related Pull Requests
Changes
- NcbiPublicationSearchService that searches for publications in PrivateDataReminderJob when enabled in the admin console
- PrivateDataReminderJob, instead of the usual reminder message, a "publication found" message is posted to the submitter
- DatasetStatus caches publication result to avoid repeated calls to NCBI's EUtils endpoints
- NcbiPublicationSearchServiceImpl
- NcbiPublicationSearchServiceImpl.
- PublicationSearchTest.
- private-data-reminders-overview.md
SPEC-SUMMARY.md