Analysing 25 Years of CVEs

If you are reading this blog post via a 3rd party source it is very likely that many parts of it will not render correctly (usually, the interactive graphs). Please view the post on dogesec.com for the full interactive viewing experience.

tl;dr

The CVE List was launched in September 1999, listing 321 CVE records. 25 years later there are 265,767 CVE records.

Key Findings

2024 already holds the records for the most vulnerabilities published in a single year
RedHat (11,565), Microsoft (11,030), and Oracle (8,042) are the CNAs that have submitted the most vulnerabilities for their products.
The mean CVSS base score is 6.5
Most CVEs are published on a Wednesday (54,862, 21.3%)
Linux operating systems have the most published vulnerabilities

Why I decided to write this post

It is a homage to the CVE program.

We all owe a lot to the people behind it.

Follow along

For this post I am going to use the STIX Objects created by cve2stix over this period.

If you would like to follow along, you can download the same dataset I will be using with the help of stix2arango.

Once you have installed stix2arango, all you need to run is;

python3 utilities/arango_cve_processor/insert_archive_cve.py \
  --database blog_demo

This command will import all CVEs published by the NVD into the graph database (including those since this post was written), ready for querying in the ArangoDB web interface (usually http://127.0.0.1:8529/).

The STIX Vulnerability object

In this post we will be querying the STIX Vulnerability objects created by cve2stix. For reference, here is an example of one of those Vulnerability STIX 2.1 SDO…

        {
            "type": "vulnerability",
            "spec_version": "2.1",
            "id": "vulnerability--4d2cad44-0a5a-5890-925c-29d535c3f49e",
            "created_by_ref": "identity--562918ee-d5da-5579-b6a1-fae50cc6bad3",
            "created": "2024-08-01T16:15:06.750Z",
            "modified": "2024-08-01T16:45:25.400Z",
            "name": "CVE-2024-6040",
            "description": "In parisneo/lollms-webui version v9.8, the lollms_binding_infos is missing the client_id parameter, which leads to multiple security vulnerabilities. Specifically, the endpoints /reload_binding, /install_binding, /reinstall_binding, /unInstall_binding, /set_active_binding_settings, and /update_binding_settings are susceptible to CSRF attacks and local attacks. An attacker can exploit this vulnerability to perform unauthorized actions on the victim's machine.",
            "external_references": [
                {
                    "source_name": "cve",
                    "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-6040",
                    "external_id": "CVE-2024-6040"
                },
                {
                    "source_name": "cwe",
                    "url": "https://cwe.mitre.org/data/definitions/CWE-304.html",
                    "external_id": "CWE-304"
                },
                {
                    "source_name": "[email protected]",
                    "description": "",
                    "url": "https://huntr.com/bounties/ac0bbb1d-89aa-42ba-bc48-1b59bd16acc7"
                },
                {
                    "source_name": "vulnStatus",
                    "description": "Awaiting Analysis"
                },
                {
                    "source_name": "sourceIdentifier",
                    "description": "[email protected]"
                }
            ],
            "object_marking_refs": [
                "marking-definition--94868c89-83c2-464b-929b-a1a8aa3c8487",
                "marking-definition--562918ee-d5da-5579-b6a1-fae50cc6bad3"
            ],
            "extensions": {
                "extension-definition--2c5c13af-ee92-5246-9ba7-0b958f8cd34a": {
                    "extension_type": "toplevel-property-extension"
                }
            },
            "x_cvss": {
                "v3_0": {
                    "base_score": 4.4,
                    "base_severity": "MEDIUM",
                    "exploitability_score": 1.8,
                    "impact_score": 2.5,
                    "source": "[email protected]",
                    "type": "Secondary",
                    "vector_string": "CVSS:3.0/AV:L/AC:L/PR:N/UI:R/S:U/C:N/I:L/A:L"
                }
            }
        }

Note, cve2stix creates other STIX 2.1 objects for the CVE, but that’s for another post.

Analysis

The first ever CVE published

Despite the name CVE-1999-0095, which uses the year 1999 and the ID 0095 (not the lowest CVE ID, e.g CVE-1999-0082 exists but has a later published date), is the CVE with the earliest published date; 1988-10-01.

Count of CVE records

This query counts all Vulnerability objects

RETURN LENGTH(
  FOR doc IN nvd_cve_vertex_collection
  FILTER doc.type == "vulnerability"
    RETURN doc
)

[
  265767
]

FOR doc IN nvd_cve_vertex_collection
  FILTER doc.type == "vulnerability"
  COLLECT year = SUBSTRING(doc.created, 0, 4) WITH COUNT INTO count
  SORT year
  RETURN { year: year, count: count }

Vulnerabilities by publish year

With 3 months left of the year, this year already holds the records for the most vulnerabilities published (31,139).

Most active CNAs

This search groups all Vulnerabilities by the CNA that submitted them

FOR doc IN nvd_cve_vertex_collection
  FILTER doc.type == "vulnerability"
  FOR ref IN doc.external_references
    FILTER ref.source_name == "sourceIdentifier"
    COLLECT description = ref.description WITH COUNT INTO count
    SORT count DESC, description
    RETURN { description: description, count: count }

Vulnerabilities by CNA

MITRE, the Root CNA responsible for managing the CVE program, top the list, which is to be expected. As is the fact the rest of the top 10 is made up of major software vendors, RedHat, Microsoft, Oracle, etc.

There has been a lot of complaints in the community about poorly written submissions, or put more bluntly, submissions that should never had been made in the first place.

This search is similar to the previous one, however, this time counts the number of CVEs rejected by the NVD by CNA;

FOR doc IN nvd_cve_vertex_collection
  FILTER doc.type == "vulnerability"
  AND (
            LENGTH(
                FOR ref IN doc.external_references
                    FILTER ref.source_name == "vulnStatus"
                    AND ref.description == "Rejected"
                    RETURN ref
            ) > 0
        )
  FOR ref IN doc.external_references
    FILTER ref.source_name == "sourceIdentifier"
    COLLECT description = ref.description WITH COUNT INTO count
    SORT count DESC, description
    RETURN { description: description, count: count }

Rejected Vulnerabilities by CNA

I’m looking at you, Intel!

CVEs by CVSS base score

The CVSS base score reflects the inherent severity of a vulnerability before any mitigation efforts are considered, providing a consistent way to compare vulnerabilities.

Here is a search that shows the distribution of results by CVSS base score;

LET filtered_docs = (
    FOR doc IN nvd_cve_vertex_collection
        FILTER doc.type == "vulnerability"
        AND NOT (
            LENGTH(
                FOR ref IN doc.external_references
                    FILTER ref.source_name == "vulnStatus"
                    AND ref.description == "Rejected"
                    RETURN ref
            ) > 0
        )
        RETURN doc
)

LET base_score_groups = (
    FOR doc IN filtered_docs
        // Iterate over each cvss version
        FOR cvss IN VALUES(doc.x_cvss)
        LET base_score = cvss.base_score
        FILTER base_score != null
        // Classify the base_score into different ranges using ternary operator
        LET range = 
            base_score >= 0 AND base_score < 1 ? "0-1" :
            base_score >= 1 AND base_score < 2 ? "1-2" :
            base_score >= 2 AND base_score < 3 ? "2-3" :
            base_score >= 3 AND base_score < 4 ? "3-4" :
            base_score >= 4 AND base_score < 5 ? "4-5" :
            base_score >= 5 AND base_score < 6 ? "5-6" :
            base_score >= 6 AND base_score < 7 ? "6-7" :
            base_score >= 7 AND base_score < 8 ? "7-8" :
            base_score >= 8 AND base_score < 9 ? "8-9" :
            base_score >= 9 AND base_score <= 10 ? "9-10" : "unknown"
        COLLECT score_range = range INTO group
        RETURN {
            score_range: score_range,
            count: LENGTH(group)
        }
)

// Sort by the score range
FOR entry IN base_score_groups
    SORT entry.score_range
    RETURN entry

Vulnerabilities by CVSS base score

The mean and median scores can be calculated using this search;

LET filtered_docs = (
    FOR doc IN nvd_cve_vertex_collection
        FILTER doc.type == "vulnerability"
        AND NOT (
            LENGTH(
                FOR ref IN doc.external_references
                    FILTER ref.source_name == "vulnStatus"
                    AND ref.description == "Rejected"
                    RETURN ref
            ) > 0
        )
        RETURN doc
)

LET base_scores = (
    FOR doc IN filtered_docs
        // Iterate over each cvss version
        FOR cvss IN VALUES(doc.x_cvss)
        LET base_score = cvss.base_score
        FILTER base_score != null
        RETURN base_score
)

LET mean_base_score = AVERAGE(base_scores)
LET median_base_score = MEDIAN(base_scores)

RETURN {
    mean_base_score: mean_base_score,
    median_base_score: median_base_score
}

[
  {
    "mean_base_score": 6.51316089491444,
    "median_base_score": 6.5
  }
]

How many vulnerabilities are published on a Friday?

Is it me, or are most vulnerabilities published on a Friday?

LET monday_count = (
    FOR doc IN nvd_cve_vertex_collection
        FILTER doc.type == "vulnerability"
        AND DATE_DAYOFWEEK(doc.created) == 1
        COLLECT WITH COUNT INTO count
        RETURN count
)[0]

LET tuesday_count = (
    FOR doc IN nvd_cve_vertex_collection
        FILTER doc.type == "vulnerability"
        AND DATE_DAYOFWEEK(doc.created) == 2
        COLLECT WITH COUNT INTO count
        RETURN count
)[0]

LET wednesday_count = (
    FOR doc IN nvd_cve_vertex_collection
        FILTER doc.type == "vulnerability"
        AND DATE_DAYOFWEEK(doc.created) == 3
        COLLECT WITH COUNT INTO count
        RETURN count
)[0]

LET thursday_count = (
    FOR doc IN nvd_cve_vertex_collection
        FILTER doc.type == "vulnerability"
        AND DATE_DAYOFWEEK(doc.created) == 4
        COLLECT WITH COUNT INTO count
        RETURN count
)[0]

LET friday_count = (
    FOR doc IN nvd_cve_vertex_collection
        FILTER doc.type == "vulnerability"
        AND DATE_DAYOFWEEK(doc.created) == 5
        COLLECT WITH COUNT INTO count
        RETURN count
)[0]

LET saturday_count = (
    FOR doc IN nvd_cve_vertex_collection
        FILTER doc.type == "vulnerability"
        AND DATE_DAYOFWEEK(doc.created) == 6
        COLLECT WITH COUNT INTO count
        RETURN count
)[0]

LET sunday_count = (
    FOR doc IN nvd_cve_vertex_collection
        FILTER doc.type == "vulnerability"
        AND DATE_DAYOFWEEK(doc.created) == 7
        COLLECT WITH COUNT INTO count
        RETURN count
)[0]

Vulnerabilities by day published

How wrong I was!

Most vulnerable products

Finally, I wanted to see the products with the most vulnerabilities discovered in them;

LET unique_vulnerable_criteria = (
    FOR doc IN nvd_cve_vertex_collection
        FILTER doc.type == "indicator"
        // Extract the unique vulnerable criteria for each document
        LET criteria = (
            FOR cpe IN doc.x_cpes.vulnerable
                LET parts = SPLIT(cpe.criteria, ":")
                RETURN DISTINCT CONCAT(parts[3], ":", parts[4])
        )
        // Return the unique criteria for the document
        RETURN DISTINCT FLATTEN(criteria)
)

FOR criteria_list IN unique_vulnerable_criteria
    FOR criteria IN criteria_list
    COLLECT vulnerable_criteria = criteria WITH COUNT INTO count
    SORT count DESC
    RETURN {
        vulnerable_criteria: vulnerable_criteria,
        count: count
    }

Products with most vulnerabilities

UNIX OS’s Debian, Fedora, Ubuntu, OpenSUSE, and Redhat make up the top 5 products with most vulnerabilities published.

Keep in mind my search only considers vendor and product, it does not split the results by product version. Thus products that have multiple versions, and have likely been around for longer, are considered as one. Even so, given these are open-source projects, it is not surprising.

For me, the biggest takeaway from this data is the amount of firmware listed, specifically from Qualcomm, highlighting the ever-increasing risk of vulnerabilities at the edge (aka Internet of Things).