Migrate Alpine importer to advisory V2 #2111
ziadhany wants to merge 6 commits into aboutcode-org:main
@TG1999 @pombredanne I have a question about the Alpine migration. We fetch one URL and process its data without grouping by CVE. The problem is that each URL reports package versions together with the CVEs fixed in each version. How can we obtain a unique identifier for the advisories from this importer? Would it be a good idea to restructure the data into one large mapping, using the CVE as the unique identifier? Proposed structure: Example: Sources:
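To make the question concrete, here is a minimal sketch of the regrouping idea. It assumes the secdb JSON layout quoted later in this thread (`packages` → `pkg` → `secfixes`); the function name `group_secfixes_by_cve` and the sample data are hypothetical and are not taken from the PR.

```python
from collections import defaultdict


def group_secfixes_by_cve(secdb: dict) -> dict:
    """Return {cve: [(package_name, fixed_version), ...]} for one secdb document."""
    by_cve = defaultdict(list)
    for package in secdb.get("packages", []):
        pkg = package.get("pkg", {})
        name = pkg.get("name")
        # "secfixes" maps a fixed package version to the CVEs it fixes.
        for fixed_version, cves in pkg.get("secfixes", {}).items():
            for cve in cves or []:
                by_cve[cve].append((name, fixed_version))
    return dict(by_cve)


# Abbreviated, illustrative document in the assumed layout.
sample = {
    "packages": [
        {
            "pkg": {
                "name": "ansible",
                "secfixes": {
                    "2.6.3-r0": ["CVE-2018-10875"],
                    "2.8.11-r0": ["CVE-2019-3828", "CVE-2020-1733"],
                },
            }
        }
    ]
}
grouped = group_secfixes_by_cve(sample)
```

With this shape the CVE alone is the key, so one CVE fixed in several packages or branches collapses into a single entry. Whether that is desirable is exactly the open question above.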
)

for cve in aliases:
    advisory_id = f"{pkg_infos['name']}/{qualifiers['distroversion']}/{cve}"
ex:
apache2/v3.20/2.4.26-r0/CVE-2017-7668
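The example ID above carries four segments (package, distro version, fixed version, CVE), while the quoted diff line joins only three. As a sketch only, the four-segment form could be built as below. The helper `build_advisory_id` is hypothetical, and which format the PR finally settles on is not shown in this thread.

```python
def build_advisory_id(name: str, distroversion: str, fixed_version: str, cve: str) -> str:
    """Join the four parts into a slash-separated identifier (illustrative helper)."""
    return "/".join([name, distroversion, fixed_version, cve])


example_id = build_advisory_id("apache2", "v3.20", "2.4.26-r0", "CVE-2017-7668")

# Without the fixed version, the same CVE listed under two versions of one
# package would map to the same ID; with it, the IDs stay distinct.
id_a = build_advisory_id("apache2", "v3.20", "2.4.26-r0", "CVE-2017-7668")
id_b = build_advisory_id("apache2", "v3.20", "2.4.27-r0", "CVE-2017-7668")  # hypothetical second version
```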
vulnerabilities/tests/pipelines/v2_importers/test_alpine_linux_importer_pipeline.py
The logs in debug mode:
keshav-space left a comment
Thanks @ziadhany, see comments below.
def fetch_advisory_directory_links(
    page_response_content: str,
    base_url: str,
    logger: callable = None,
) -> List[str]:
    """
    Return a list of advisory directory links present in `page_response_content` html string
    """
    index_page = BeautifulSoup(page_response_content, features="lxml")
    alpine_versions = [
        link.text
        for link in index_page.find_all("a")
        if link.text.startswith("v") or link.text.startswith("edge")
    ]

    if not alpine_versions:
        if logger:
            logger(
                f"No versions found in {base_url!r}",
                level=logging.DEBUG,
            )
        return []

    advisory_directory_links = [urljoin(base_url, version) for version in alpine_versions]

    return advisory_directory_links


def fetch_advisory_links(
    advisory_directory_page: str,
    advisory_directory_link: str,
    logger: callable = None,
) -> Iterable[str]:
    """
    Yield json file urls present in `advisory_directory_page`
    """
    advisory_directory_page = BeautifulSoup(advisory_directory_page, features="lxml")
    anchor_tags = advisory_directory_page.find_all("a")
    if not anchor_tags:
        if logger:
            logger(
                f"No anchor tags found in {advisory_directory_link!r}",
                level=logging.DEBUG,
            )
        return iter([])
    for anchor_tag in anchor_tags:
        if anchor_tag.text.endswith("json"):
            yield urljoin(advisory_directory_link, anchor_tag.text)
@ziadhany this is a bit brittle. I've created a mirror of the Alpine secdb here: https://github.com/aboutcode-org/aboutcode-mirror-alpine-secdb. Let's use that instead.
OK, I'll update the code. I didn't notice we have a mirror.
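For illustration, reading advisory files from a local checkout avoids parsing HTML index pages altogether. The sketch below assumes the mirror has already been cloned to a directory and that the advisory data lives in `*.json` files somewhere under it. The mirror's actual layout is not shown in this thread, and `iter_secdb_documents` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


def iter_secdb_documents(checkout_dir: Path) -> Iterator[Tuple[Path, dict]]:
    """Yield (path, parsed JSON) for every .json file under a local checkout."""
    for json_file in sorted(checkout_dir.rglob("*.json")):
        with json_file.open(encoding="utf-8") as f:
            yield json_file, json.load(f)


# Small self-contained demo using a temporary directory in place of a real clone.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "v3.20").mkdir()
    (root / "v3.20" / "main.json").write_text('{"packages": []}', encoding="utf-8")
    found = [
        (path.relative_to(root).as_posix(), doc)
        for path, doc in iter_secdb_documents(root)
    ]
```

Walking the file tree means there are no anchor texts or directory listings to match against, which is presumably the brittleness the review comment refers to.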
        return (cls.collect_and_store_advisories,)

    def advisories_count(self) -> int:
        return 0
Let's return the count based on the `packages` key.
Are you sure about this? The problem is that we create an AdvisoryData entry for every CVE.
For example (not related): CVE-2019-3828, CVE-2020-1733.
https://nvd.nist.gov/vuln/detail/CVE-2019-3828
https://nvd.nist.gov/vuln/detail/CVE-2020-1733
"packages": [
  {
    "pkg": {
      "name": "ansible",
      "secfixes": {
        "2.6.3-r0": [
          "CVE-2018-10875"
        ],
        "2.7.9-r0": [
          "CVE-2018-16876"
        ],
        "2.8.11-r0": [
          "CVE-2019-3828",
          "CVE-2020-1733",
          "CVE-2020-1740"
        ],
Getting the correct count means we would have to loop over every alias of every package.
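As a rough sketch of that loop, the count can be computed from a parsed secdb document by summing the CVE lists under each `secfixes` entry. `count_advisories` is a hypothetical helper, and the sample completes the truncated snippet above only as far as needed to make it valid.

```python
def count_advisories(secdb: dict) -> int:
    """Count one advisory per (package, fixed version, CVE) entry in a secdb document."""
    return sum(
        len(cves or [])
        for package in secdb.get("packages", [])
        for cves in package.get("pkg", {}).get("secfixes", {}).values()
    )


# The ansible entries quoted above: 1 + 1 + 3 CVEs.
sample = {
    "packages": [
        {
            "pkg": {
                "name": "ansible",
                "secfixes": {
                    "2.6.3-r0": ["CVE-2018-10875"],
                    "2.7.9-r0": ["CVE-2018-16876"],
                    "2.8.11-r0": ["CVE-2019-3828", "CVE-2020-1733", "CVE-2020-1740"],
                },
            }
        }
    ]
}
total = count_advisories(sample)
```

Counting the `packages` list alone would give 1 here, while the importer would create five `AdvisoryData` entries.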
@ziadhany since we already have all the advisory files locally, we can instead return the count of CVEs from these files.
Perhaps we can return something like this?
sum(len(re.findall(r'\bCVE-\d{4}-\d+\b', a.read_text())) for a in secdb.rglob("*.json"))
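Expanded into a standalone function, that one-liner might look like the sketch below. It assumes `secdb` is a `pathlib.Path` pointing at the local advisory files. The name `count_cve_mentions` and the demo files are hypothetical.

```python
import re
import tempfile
from pathlib import Path

CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d+\b")


def count_cve_mentions(secdb_dir: Path) -> int:
    """Sum the CVE identifiers found in every .json file under secdb_dir."""
    return sum(
        len(CVE_PATTERN.findall(json_file.read_text(encoding="utf-8")))
        for json_file in secdb_dir.rglob("*.json")
    )


# Demo with two small files standing in for real secdb data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.json").write_text(
        '{"secfixes": {"1.0-r0": ["CVE-2019-3828", "CVE-2020-1733"]}}', encoding="utf-8"
    )
    (root / "b.json").write_text(
        '{"secfixes": {"2.0-r0": ["CVE-2019-3828"]}}', encoding="utf-8"
    )
    total = count_cve_mentions(root)
```

Note that this counts every textual occurrence, so a CVE that appears in several files or under several versions is counted each time. That lines up with per-package, per-version advisory IDs, but it would overcount if advisories were keyed by CVE alone.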
Signed-off-by: ziad hany <ziadhany2016@gmail.com>
…aseImporterPipelineV2 Signed-off-by: ziad hany <ziadhany2016@gmail.com>
Signed-off-by: ziad hany <ziadhany2016@gmail.com>
Fix duplication on advisory_id Signed-off-by: ziad hany <ziadhany2016@gmail.com>
Signed-off-by: ziad hany <ziadhany2016@gmail.com>
26f912d to 0bb7b03
Signed-off-by: ziad hany <ziadhany2016@gmail.com>
Issue: