
Add Smithsonian process and report script #278

Open
oree-xx wants to merge 8 commits into main from process_report


oree-xx (Contributor) commented Jan 29, 2026

Fixes

Description

  • Added a stacked bar plot in plotly.py
  • Completed the process and report stages for Smithsonian
  • Added analysis of the top 10 and bottom 10 units
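Ranking units to get the top 10 and bottom 10 can be done with two sorts over the per-unit totals. This is a minimal sketch, assuming the totals are available as a plain dict mapping a unit code to its object count; the function name top_and_bottom is hypothetical and not taken from the PR.

```python
from __future__ import annotations


def top_and_bottom(
    counts: dict[str, int], n: int = 10
) -> tuple[list[tuple[str, int]], list[tuple[str, int]]]:
    """Return the n units with the most objects and the n with the fewest.

    Hypothetical helper: counts maps a unit code to its total objects.
    """
    # Largest first; ties are broken alphabetically by unit code
    top = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]
    # Smallest first; same tie-breaking rule
    bottom = sorted(counts.items(), key=lambda item: (item[1], item[0]))[:n]
    return top, bottom


# Made-up counts for illustration only
top, bottom = top_and_bottom({"A": 5, "B": 50, "C": 1, "D": 20}, n=2)
print(top)  # [('B', 50), ('D', 20)]
print(bottom)  # [('C', 1), ('A', 5)]
```

If there are fewer than 2 × n units, the two lists overlap, which may be worth stating in the report text.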

Checklist

  • I have read and understood the Developer Certificate of Origin (DCO), below, which covers the contents of this pull request (PR).
  • My pull request doesn't include code or content generated with AI (also see Avoiding generative AI development tools — Creative Commons Open Source).
  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main or master).
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated unit tests and/or test scripts for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

For the purposes of this DCO, "license" is equivalent to "license or public domain dedication," and "open source license" is equivalent to "open content license or public domain dedication."

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

oree-xx requested review from a team as code owners January 29, 2026 09:39
oree-xx requested review from TimidRobot and possumbilities and removed request for a team January 29, 2026 09:39
TimidRobot (Member) left a comment

Good to see this coming together! Acknowledging this is a WIP, you'll need to update labels and descriptions to provide context and make the data more meaningful.

It might also be worth checking whether there's any way to automatically get the full names of the units. I have no idea what the abbreviations mean.

oree-xx changed the title from "[WIP] Smithsonian and stacked bar plot" to "Add Smithsonian process and report script" on Feb 6, 2026
TimidRobot self-assigned this Feb 9, 2026
TimidRobot (Member) left a comment

This is coming along nicely, keep up the good work!

]
QUARTER = os.path.basename(PATHS["data_quarter"])

unit_map = {
Member

Please add a comment indicating how this data was compiled, for example: "Manually compiled from information on URL".
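A minimal sketch of what such a provenance comment could look like. The URL is left as a placeholder, only two entries are shown, and the unit_name lookup helper is hypothetical; the real map in the PR has more units.

```python
# Manually compiled from information on <URL> (placeholder: link the page
# the unit names were copied from, and the date it was checked)
unit_map = {
    "NASM": "National Air and Space Museum",
    "NMNH": "National Museum of Natural History",
}


def unit_name(code: str) -> str:
    """Return the full unit name, or the code itself if it is not mapped."""
    return unit_map.get(code, code)
```

Falling back to the code keeps the report readable when the API returns a unit that has not been added to the map yet.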

Member

Also, do the source(s) have information on these unit codes:

2026-02-09 16:31:34,002 - WARNING - smithsonian_fetch - New unit code(s) not in unit_map: ['ACAH', 'CHSDM', 'EEPA', 'FSA', 'NAA', 'NASMAC', 'NMAIA', 'NPMA', 'SAAMPAIK', 'SI']
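The warning above can come from a set difference between the unit codes returned by the API and the keys of unit_map. The following is a sketch of such a check under that assumption, not the PR's actual code; find_new_units is a hypothetical name, and the logger name simply mirrors the one in the log line.

```python
import logging

LOGGER = logging.getLogger("smithsonian_fetch")


def find_new_units(codes, unit_map):
    """Return sorted unit codes that have no entry in unit_map."""
    # Sets drop duplicate codes; sorting makes the warning deterministic
    new_codes = sorted(set(codes) - set(unit_map))
    if new_codes:
        LOGGER.warning("New unit code(s) not in unit_map: %s", new_codes)
    return new_codes
```

Returning the list, and not only logging it, lets a test or a later step decide whether unknown codes should fail the run.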

"TOTAL_OBJECTS",
]
HEADER_2_UNITS = [
"UNIT",
Member

It would be better to store both the unit code and the unit name, since the unit names are not available from the API.
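A sketch of writing both values to the units CSV, assuming the per-unit totals are in a dict and the names come from unit_map. The column names UNIT_CODE and UNIT_NAME and the write_units_csv function are hypothetical; the header quoted above starts with a single "UNIT" column.

```python
import csv
import io

# Hypothetical column names; the PR's header currently starts with "UNIT"
HEADER_2_UNITS = ["UNIT_CODE", "UNIT_NAME", "TOTAL_OBJECTS"]


def write_units_csv(counts, unit_map, stream):
    """Write one row per unit with both its code and its full name."""
    writer = csv.DictWriter(
        stream, fieldnames=HEADER_2_UNITS, lineterminator="\n"
    )
    writer.writeheader()
    for code, total in sorted(counts.items()):
        writer.writerow(
            {
                "UNIT_CODE": code,
                # Fall back to the code when no name has been compiled yet
                "UNIT_NAME": unit_map.get(code, code),
                "TOTAL_OBJECTS": total,
            }
        )


# Made-up counts for illustration only
buffer = io.StringIO()
write_units_csv(
    {"NMNH": 10, "SI": 3},
    {"NMNH": "National Museum of Natural History"},
    buffer,
)
print(buffer.getvalue())
```

Keeping the code in its own column means plots can use either the short code or the full name without re-deriving one from the other.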

"Overview",
None,
None,
"The Smithsonian data returns the overall "
Member

Please remove the space at the end of the string.

Comment on lines +126 to +128
f"The results indicate a total record of {total_objects} objects,"
f" with a breakdown of {CC0_records} objects without CC0 Media and"
f" {CC0_records_with_media} objects with CC0 Media, taking a"
Member

Please format numbers with commas to improve readability.
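Python's format specification supports a thousands separator through the "," option, so the fix can stay inside the existing f-strings. A minimal sketch using the variable names from the quoted lines (lowercased, with the sentence cut short after the CC0 Media clause); report_totals is a hypothetical wrapper.

```python
def report_totals(total_objects, cc0_records, cc0_records_with_media):
    """Build the overview sentence with comma-separated numbers."""
    # The "," format option inserts a thousands separator: 1234567 -> 1,234,567
    return (
        f"The results indicate a total record of {total_objects:,} objects,"
        f" with a breakdown of {cc0_records:,} objects without CC0 Media and"
        f" {cc0_records_with_media:,} objects with CC0 Media."
    )
```

For floats such as an average, the separator combines with a precision, e.g. f"{1234.5:,.1f}" yields "1,234.5".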

Comment on lines +131 to +132
" representing museums, libraries, zoos and many other"
f" with a minimum of {min_unit} objects.",
Member

  • min_unit seems to be misleading. Shouldn't it be something like min_objects?
    • How does "a minimum of 147 objects" relate to the 3rd plot, which shows Anacostia Community Museum Archives with 57 works?
  • Please update wording: and many other ➡️ and other institutions
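One way to keep the numbers in the text consistent with the plots is to name each statistic after what it measures and to compute it over the same units the plot displays. The sketch below assumes per-unit totals in a dict; the names summarize_units, min_objects, min_objects_top_n, and average_objects_top_n are hypothetical, and it does not claim to explain where the 147 in the current text comes from.

```python
def summarize_units(counts, n=10):
    """Return object-count statistics for all units and for the top n."""
    totals = sorted(counts.values(), reverse=True)
    top_n = totals[:n]
    return {
        # Fewest objects held by any single unit
        "min_objects": min(totals),
        # Fewest objects among the n units shown in a "top n" plot
        "min_objects_top_n": min(top_n),
        # Mean object count across those same n units
        "average_objects_top_n": sum(top_n) / len(top_n),
    }
```

With separate names, the report can say which population a minimum or average refers to, and a reader can check it against the matching plot.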

Comment on lines +178 to +182
"Plots showing totals by units.",
"This shows the distribution of top 10"
" units/ sub providers across smithsonian"
f" with an average of {average_unit} objects"
" across the top 10 sub providers.",
Member

  • Please hard wrap at 79 characters instead of 53
  • Please define "NMNH"
  • Please format numbers with commas to improve readability

Member

  1. The first use of "Smithsonian" in the report should be "Smithsonian Institution"
  2. I think the information presented to the user should replace "units" and "sub providers" with "institution members"

Member

The institution member (units) plots are dominated by the member names:

  1. [Image: institution member (units) plot]
  2. [Image: institution member (units) plot]

I propose a function be added to this shared library that will insert a newline roughly halfway through the name (replacing a space).

Alternatively, the abbreviations could be used and the definitions of those abbreviations could be added as text below the plot.

I'm open to other ideas, as well.
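A minimal sketch of the proposed helper: it replaces the space closest to the middle of the name with a line break. It assumes the labels are rendered by Plotly, which uses the HTML tag <br> for line breaks in label text; the name split_label and the sep parameter are hypothetical.

```python
def split_label(name: str, sep: str = "<br>") -> str:
    """Replace the space closest to the middle of name with sep."""
    spaces = [i for i, ch in enumerate(name) if ch == " "]
    if not spaces:
        # Single-word names (e.g. abbreviations) are returned unchanged
        return name
    middle = len(name) / 2
    # Pick the space whose position is closest to the midpoint
    split_at = min(spaces, key=lambda i: abs(i - middle))
    return name[:split_at] + sep + name[split_at + 1:]


print(split_label("Anacostia Community Museum Archives"))
# Anacostia Community<br>Museum Archives
```

Splitting at a single space keeps each name to at most two lines; very long names may still need the abbreviation-plus-definitions approach.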

)


def plot_totals_by_records(args):
Member

The new plot is interesting!

Please:

  • populate "bar x scale" value (I expect it should be linear instead of None)
  • see note about label/name length
  • indicate why these 10 records are shown (e.g., the 10 with the highest CC0 media percentage)
  • the key shouldn't obscure the data
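If the plot is meant to show the 10 records with the highest CC0 media percentage, selecting them explicitly makes that explanation easy to state in the caption. A sketch, assuming each unit has a count of objects with CC0 media and a total object count; the function name and the input shape are hypothetical.

```python
def top_by_cc0_media_share(records, n=10):
    """Return the n units with the highest percentage of CC0 media objects.

    records maps a unit code to (objects_with_cc0_media, total_objects).
    """
    shares = {}
    for code, (with_media, total) in records.items():
        # Skip units with no objects to avoid dividing by zero
        if total > 0:
            shares[code] = 100 * with_media / total
    # Highest percentage first; ties broken alphabetically by unit code
    ranked = sorted(shares.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```

Because percentages fall between 0 and 100, a linear axis scale fits this kind of plot.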


Labels

None yet

Projects

Status: In review


2 participants