Message ID | 20231208050439.461257-1-dnagodra@cisco.com |
---|---|
State | Accepted, archived |
Commit | 7101d654635b707e56b0dbae8c2146b312d211ea |
Series | [master] cve-update-nvd2-native: increase the delay between subsequent request failures |
Hi,

On 2023/12/08 14:04, Dhairya Nagodra via lists.openembedded.org wrote:
> Sometimes NVD servers are unstable and return too many errors.
> There is an option to have higher fetch attempts to increase the chances
> of successfully fetching the CVE data.
>
> Additionally, it also makes sense to progressively increase the delay
> after a failed request to an already unstable or busy server.
> The increase in delay is reset after every successful request and
> the maximum delay is limited to 30 seconds.
>
> Also, the logs are improved to give more clarity.
>
> Signed-off-by: Dhairya Nagodra <dnagodra@cisco.com>

I was just working on a similar issue.
As a specific example, multiple cve-update-nvd2-native:do_fetch runs in
parallel can easily reach the rate limit. It can be assumed that this situation
will occur if several people run bitbake in one office. (often unaware of each
other...)

I have observed the do_fetch logs and found that HTTP 403 errors are returned
if the request is blocked, probably due to rate limitation.

NOTE: Requesting https://services.nvd.nist.gov/rest/json/cves/2.0?startIndex=6000
NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
WARNING: CVE database update failed
DEBUG: Python function do_fetch finished

Other times a request may fail with IncompleteRead, but this is clearly
distinguishable from an HTTP error.

Here, we can think of the following ideas.
If an HTTP error occurs, assume that the rate limit has already been reached
and wait 30 seconds to ensure that the next window starts. The patch will be
something like this.
---
 meta/recipes-core/meta/cve-update-nvd2-native.bb | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/meta/recipes-core/meta/cve-update-nvd2-native.bb b/meta/recipes-core/meta/cve-update-nvd2-native.bb
index 9ab8dc6050..34fcc0317e 100644
--- a/meta/recipes-core/meta/cve-update-nvd2-native.bb
+++ b/meta/recipes-core/meta/cve-update-nvd2-native.bb
@@ -121,6 +121,7 @@ def nvd_request_next(url, attempts, api_key, args):
     import urllib.request
     import urllib.parse
+    import urllib.error
     import gzip
     import http
     import time
@@ -142,10 +143,12 @@ def nvd_request_next(url, attempts, api_key, args):
             r.close()
 
+        except urllib.error.HTTPError as e:
+            bb.note("CVE database: received error (%s), wait until the next window" % (e))
+            time.sleep(30)
         except Exception as e:
             bb.note("CVE database: received error (%s), retrying" % (e))
             time.sleep(6)
-            pass
         else:
             return raw_data
     else:
--

The time taken to fetch is likely to increase further, but the probability of
failure due to error is expected to decrease greatly. Unfortunately, if the
number of parallel executions is too large, this is still not good enough...

I will consider what to do with this patch after your patches are merged.
Since it may be enough to just extend the delay each time.

Regards,

Yuta Hayama
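
The idea in this reply, waiting longer after an HTTP error than after any other failure, can be tried outside BitBake. The following is only a sketch under assumptions: `fetch_with_retry`, `opener` and `sleep` are hypothetical names that do not exist in cve-update-nvd2-native.bb, and `print` stands in for `bb.note`. The 30 and 6 second waits are the ones used in the proposed patch.

```python
import urllib.error


def fetch_with_retry(url, attempts, opener, sleep):
    """Hypothetical helper: call opener(url) up to `attempts` times.

    `opener` and `sleep` are passed in so the sketch can be exercised
    without network access; real code would pass a function wrapping
    urllib.request.urlopen, and time.sleep.
    """
    for attempt in range(attempts):
        try:
            return opener(url)
        except urllib.error.HTTPError as e:
            # Assumption from the thread: an HTTP error (403 was observed)
            # means the rate limit was hit, so wait for the next window.
            print("received error (%s), wait until the next window" % e)
            sleep(30)
        except Exception as e:
            # Other failures (for example IncompleteRead) keep the short delay.
            print("received error (%s), retrying" % e)
            sleep(6)
    # Every attempt failed.
    return None
```

The HTTPError clause has to come before the generic Exception clause, because HTTPError is itself an Exception subclass and would otherwise never reach its own handler.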
Hello,

On 11/12/2023 at 08:51, Yuta Hayama wrote:
> Hi,
>
> On 2023/12/08 14:04, Dhairya Nagodra via lists.openembedded.org wrote:
>> Sometimes NVD servers are unstable and return too many errors.
>> There is an option to have higher fetch attempts to increase the chances
>> of successfully fetching the CVE data.
>>
>> Additionally, it also makes sense to progressively increase the delay
>> after a failed request to an already unstable or busy server.
>> The increase in delay is reset after every successful request and
>> the maximum delay is limited to 30 seconds.
>>
>> Also, the logs are improved to give more clarity.
>>
>> Signed-off-by: Dhairya Nagodra <dnagodra@cisco.com>
>
> I was just working on a similar issue.
> As a specific example, multiple cve-update-nvd2-native:do_fetch runs in
> parallel can easily reach the rate limit. It can be assumed that this situation
> will occur if several people run bitbake in one office. (often unaware of each
> other...)
>
> I have observed the do_fetch logs and found that HTTP 403 errors are returned
> if the request is blocked, probably due to rate limitation.

Shouldn't we ask the NVD to return "429 Too Many Requests" instead?
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429

> NOTE: Requesting https://services.nvd.nist.gov/rest/json/cves/2.0?startIndex=6000
> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
> WARNING: CVE database update failed
> DEBUG: Python function do_fetch finished
>
> Other times a request may fail with IncompleteRead, but this is clearly
> distinguishable from an HTTP error.
>
> Here, we can think of the following ideas.
> If an HTTP error occurs, assume that the rate limit has already been reached
> and wait 30 seconds to ensure that the next window starts. The patch will be
> something like this.
>
> ---
> meta/recipes-core/meta/cve-update-nvd2-native.bb | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)

Regards,
Hi,

On 2023/12/11 10:02, Yoann Congal wrote:
>Hello,
>
>On 11/12/2023 at 08:51, Yuta Hayama wrote:
>> Hi,
>>
>> On 2023/12/08 14:04, Dhairya Nagodra via lists.openembedded.org wrote:
>>> Sometimes NVD servers are unstable and return too many errors.
>>> There is an option to have higher fetch attempts to increase the
>>> chances of successfully fetching the CVE data.
>>>
>>> Additionally, it also makes sense to progressively increase the delay
>>> after a failed request to an already unstable or busy server.
>>> The increase in delay is reset after every successful request and the
>>> maximum delay is limited to 30 seconds.
>>>
>>> Also, the logs are improved to give more clarity.
>>>
>>> Signed-off-by: Dhairya Nagodra <dnagodra@cisco.com>
>>
>> I was just working on a similar issue.
>> As a specific example, multiple cve-update-nvd2-native:do_fetch runs
>> in parallel can easily reach the rate limit. It can be assumed that
>> this situation will occur if several people run bitbake in one office.
>> (often unaware of each
>> other...)
>>
>> I have observed the do_fetch logs and found that HTTP 403 errors are
>> returned if the request is blocked, probably due to rate limitation.

As per my knowledge, HTTP 403 is related to a permission issue rather than a
rate limitation.
I have not seen an HTTP 403 error at any time. Can you please help clarify how
it was generated? Is it reproducible?
I tried removing both sleep delays altogether (and without API keys) to try
and generate an error. With that, I couldn't generate any errors; instead, I
got the response with a 15-20 sec delay. So, I guess it stayed within the rate
limit.

>
>Shouldn't we ask the NVD to return "429 Too Many Requests" instead?
>https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429
>
>> NOTE: Requesting
>> https://services.nvd.nist.gov/rest/json/cves/2.0?startIndex=6000
>> NOTE: CVE database: received error (HTTP Error 403: Forbidden),
>> retrying
>> NOTE: CVE database: received error (HTTP Error 403: Forbidden),
>> retrying
>> NOTE: CVE database: received error (HTTP Error 403: Forbidden),
>> retrying
>> NOTE: CVE database: received error (HTTP Error 403: Forbidden),
>> retrying
>> NOTE: CVE database: received error (HTTP Error 403: Forbidden),
>> retrying
>> WARNING: CVE database update failed
>> DEBUG: Python function do_fetch finished
>>
>> Other times a request may fail with IncompleteRead, but this is
>> clearly distinguishable from an HTTP error.
>>
>> Here, we can think of the following ideas.
>> If an HTTP error occurs, assume that the rate limit has already been
>> reached and wait 30 seconds to ensure that the next window starts. The
>> patch will be something like this.
>>
>> ---
>> meta/recipes-core/meta/cve-update-nvd2-native.bb | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>
>Regards,
>
>--
>Yoann Congal
>Smile ECS - Tech Expert

Regards,
Dhairya
Hi,

On 2023/12/11 19:28, Dhairya Nagodra via lists.openembedded.org wrote:
>>> I have observed the do_fetch logs and found that HTTP 403 errors are
>>> returned if the request is blocked, probably due to rate limitation.
>
> As per my knowledge, HTTP 403 is related to a permission issue rather than a rate limitation.
> I have not seen an HTTP 403 error, anytime. Can you please help clarify how was it generated? Is it reproducible?
> I tried removing both sleep delays altogether (and without API keys) to try and generate an error. In that, I couldn't generate any errors instead, got the response with a 15-20 sec delay. So, I guess it stayed within the rate limit.
>

Yesterday I also tried to fetch with sleep removed, the result was the same,
no error of any kind occurred. Perhaps someone has not read the documentation
about rate limiting, so the server is putting a delay before returning a
response.
That is, I think a single bitbake will not cause the issue.

The HTTP 403 error should be reproducible by running multiple
cve-update-nvd2-native:do_fetch at the same time on a single PC.
Here, I noticed that yesterday I could reproduce the error by executing two
tasks in parallel, but today I had to execute three tasks in parallel to
reproduce the error. Somehow, the delay that may have been provided by the
server may have become longer.

I think you are right that HTTP 403 does not look like reaching the rate limit
(As Yoann noted, 429 would feel right). The HTTP 403 error returned may be
because the request is being sent from a single PC, so I will try
cve-update-nvd2-native:do_fetch on three different PCs. This would be closer
to the actual use case I have indicated.

>>> As a specific example, multiple cve-update-nvd2-native:do_fetch runs
>>> in parallel can easily reach the rate limit. It can be assumed that
>>> this situation will occur if several people run bitbake in one office.
>>> (often unaware of each
>>> other...)

Regards,

Yuta Hayama
On 2023/12/12 11:54, Yuta Hayama via lists.openembedded.org wrote:
> I think you are right that HTTP 403 does not look like reaching the rate limit
> (As Yoann noted, 429 would feel right). The HTTP 403 error returned may be
> because the request is being sent from a single PC, so I will try
> cve-update-nvd2-native:do_fetch on three different PCs. This would be closer
> to the actual use case I have indicated.

I have been trying to test running cve-update-nvd2-native:do_fetch on three PCs
at the same time. The result was that only one machine continued to make
successful requests until do_fetch completed, while the other two failed with
<urlopen error [Errno 101] Network is unreachable>.
It seems that only one person can fetch in my assumed situation, no matter how
much we try to adjust the delay time...

>>>> As a specific example, multiple cve-update-nvd2-native:do_fetch runs
>>>> in parallel can easily reach the rate limit. It can be assumed that
>>>> this situation will occur if several people run bitbake in one office.
>>>> (often unaware of each
>>>> other...)

Also, the following was my misunderstanding.

> Perhaps someone has not read the documentation
> about rate limiting, so the server is putting a delay before returning a
> response.
> That is, I think a single bitbake will not cause the issue.

The reason it takes a few seconds for the server to respond is probably that
the response is too long and the server is struggling. In fact, if we send
requests repeatedly that shorten the response, the response will come back in
a relatively short time, eventually reaching the rate limit. And the HTTP error
code at that time seems to be 403 for some reason. This is the operation that
clearly causes the rate limit to be reached, but it still does not look like it
will be 429.
Below is an example of test code:

-------------------------------------------------------------------------------
import urllib.request
import time

cves = \
['CVE-2019-14899', 'CVE-2021-3714', 'CVE-2021-3864', 'CVE-2022-0400', 'CVE-2022-1247',
 'CVE-2022-3219', 'CVE-2022-36402', 'CVE-2022-38096', 'CVE-2022-4543', 'CVE-2022-46456',
 'CVE-2023-0687', 'CVE-2023-1386', 'CVE-2023-25584', 'CVE-2023-3019', 'CVE-2023-3397',
 'CVE-2023-3640', 'CVE-2023-38559', 'CVE-2023-40030', 'CVE-2023-4010', 'CVE-2023-4039',
 'CVE-2023-42363', 'CVE-2023-42364', 'CVE-2023-42365', 'CVE-2023-42366', 'CVE-2023-46407',
 'CVE-2023-47100', 'CVE-2023-49292', 'CVE-2023-5088', 'CVE-2023-5156', 'CVE-2023-6238',]

for cve in cves:
    url = 'https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=' + cve
    print('URL:', url)
    res = urllib.request.urlopen(url)
    print('.urlopen() done.')
    #time.sleep(6)
-------------------------------------------------------------------------------

Anyway, the question of HTTP error codes still remains, but unfortunately the
situation of multiple people fetching at the same time seems to be difficult to
deal with.

Regards,

Yuta Hayama
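
To record which status code comes back once the limit is hit, a test loop like the one above could catch the exception instead of letting it end the script. A minimal sketch, assuming a hypothetical helper named `describe_failure` (not part of the recipe or of urllib); the 403 wording reflects only what was observed in this thread, not documented NVD behaviour.

```python
import urllib.error


def describe_failure(exc):
    """Hypothetical helper: summarise a failed request for a test log."""
    if isinstance(exc, urllib.error.HTTPError):
        # HTTPError carries the status code returned by the server.
        if exc.code == 429:
            return "HTTP 429: server says too many requests"
        if exc.code == 403:
            return "HTTP 403: forbidden (seen in this thread when the limit seems to be hit)"
        return "HTTP %d: %s" % (exc.code, exc.reason)
    if isinstance(exc, urllib.error.URLError):
        # No HTTP response at all, e.g. "Network is unreachable".
        return "no HTTP response: %s" % exc.reason
    return "other failure: %r" % (exc,)
```

In the loop, the call to urllib.request.urlopen(url) would be wrapped in try/except Exception as e, printing describe_failure(e) in the except branch. HTTPError is checked first because it is a subclass of URLError.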
diff --git a/meta/recipes-core/meta/cve-update-nvd2-native.bb b/meta/recipes-core/meta/cve-update-nvd2-native.bb
index 941fca34c6..bfe48b27e7 100644
--- a/meta/recipes-core/meta/cve-update-nvd2-native.bb
+++ b/meta/recipes-core/meta/cve-update-nvd2-native.bb
@@ -114,7 +114,10 @@ def cleanup_db_download(db_file, db_tmp_file):
     if os.path.exists(db_tmp_file):
         os.remove(db_tmp_file)
 
-def nvd_request_next(url, attempts, api_key, args):
+def nvd_request_wait(attempt, min_wait):
+    return min ( ( (2 * attempt) + min_wait ) , 30)
+
+def nvd_request_next(url, attempts, api_key, args, min_wait):
     """
     Request next part of the NVD dabase
     """
@@ -143,8 +146,10 @@ def nvd_request_next(url, attempts, api_key, args):
             r.close()
 
         except Exception as e:
-            bb.note("CVE database: received error (%s), retrying" % (e))
-            time.sleep(6)
+            wait_time = nvd_request_wait(attempt, min_wait)
+            bb.note("CVE database: received error (%s)" % (e))
+            bb.note("CVE database: retrying download after %d seconds. attempted (%d/%d)" % (wait_time, attempt+1, attempts))
+            time.sleep(wait_time)
             pass
         else:
             return raw_data
@@ -195,7 +200,7 @@ def update_db_file(db_tmp_file, d, database_time):
         while True:
             req_args['startIndex'] = index
-            raw_data = nvd_request_next(url, attempts, api_key, req_args)
+            raw_data = nvd_request_next(url, attempts, api_key, req_args, wait_time)
             if raw_data is None:
                 # We haven't managed to download data
                 return False
Sometimes NVD servers are unstable and return too many errors.
There is an option to have higher fetch attempts to increase the chances
of successfully fetching the CVE data.

Additionally, it also makes sense to progressively increase the delay
after a failed request to an already unstable or busy server.
The increase in delay is reset after every successful request and
the maximum delay is limited to 30 seconds.

Also, the logs are improved to give more clarity.

Signed-off-by: Dhairya Nagodra <dnagodra@cisco.com>
---
 meta/recipes-core/meta/cve-update-nvd2-native.bb | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)
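
To see what the progressive delay looks like in practice, the formula from the diff, min(2 * attempt + min_wait, 30), can be evaluated on its own. The sketch below repeats nvd_request_wait from the patch; `wait_schedule` is a hypothetical helper added only for illustration, and the min_wait of 6 is just the previous fixed delay used as an example input. It assumes `attempt` counts from 0 within each call, as the (attempt+1, attempts) log message suggests.

```python
def nvd_request_wait(attempt, min_wait):
    # Same formula as in the patch: grow by 2 seconds per failed attempt,
    # starting from min_wait, and never wait longer than 30 seconds.
    return min((2 * attempt) + min_wait, 30)


def wait_schedule(attempts, min_wait):
    """Hypothetical helper: list the wait applied after each failed attempt."""
    return [nvd_request_wait(attempt, min_wait) for attempt in range(attempts)]


print(wait_schedule(5, 6))       # [6, 8, 10, 12, 14]
print(wait_schedule(14, 6)[-1])  # 30, the cap is first reached at attempt 12
```

Under that assumption, the delay returns to min_wait for the next request as soon as one request succeeds, which matches the reset described in the commit message.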