[master] cve-update-nvd2-native: increase the delay between subsequent request failures

Message ID 20231208050439.461257-1-dnagodra@cisco.com
State Accepted, archived
Commit 7101d654635b707e56b0dbae8c2146b312d211ea

Commit Message

Sometimes NVD servers are unstable and return too many errors.
There is an option to have higher fetch attempts to increase the chances
of successfully fetching the CVE data.

Additionally, it also makes sense to progressively increase the delay
after a failed request to an already unstable or busy server.
The increase in delay is reset after every successful request and
the maximum delay is limited to 30 seconds.

Also, the logs are improved to give more clarity.

Signed-off-by: Dhairya Nagodra <dnagodra@cisco.com>
---
 meta/recipes-core/meta/cve-update-nvd2-native.bb | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)
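
To make the delay schedule described above concrete, here is a small standalone sketch of the `nvd_request_wait` helper that this patch adds, evaluated outside BitBake. Only the formula comes from the patch. The `wait_schedule` helper and the example base delay of 6 seconds (the fixed sleep the recipe used before this change) are illustrative assumptions.

```python
def nvd_request_wait(attempt, min_wait):
    # Same formula as the patch: add 2 s per failed attempt, capped at 30 s.
    return min((2 * attempt) + min_wait, 30)


def wait_schedule(attempts, min_wait):
    # Hypothetical helper: the delay applied after each failed attempt (0-based).
    return [nvd_request_wait(attempt, min_wait) for attempt in range(attempts)]


# Assumed example values: 5 attempts with a 6 s base delay.
print(wait_schedule(5, 6))        # [6, 8, 10, 12, 14]
# With a 6 s base delay the 30 s cap is first reached at attempt 12.
print(wait_schedule(14, 6)[-2:])  # [30, 30]
```

Because `attempt` restarts at 0 on every call to `nvd_request_next`, the delay returns to `min_wait` after each successful request.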

Comments

Yuta Hayama Dec. 11, 2023, 7:51 a.m. UTC | #1
Hi,

On 2023/12/08 14:04, Dhairya Nagodra via lists.openembedded.org wrote:
> Sometimes NVD servers are unstable and return too many errors.
> There is an option to have higher fetch attempts to increase the chances
> of successfully fetching the CVE data.
> 
> Additionally, it also makes sense to progressively increase the delay
> after a failed request to an already unstable or busy server.
> The increase in delay is reset after every successful request and
> the maximum delay is limited to 30 seconds.
> 
> Also, the logs are improved to give more clarity.
> 
> Signed-off-by: Dhairya Nagodra <dnagodra@cisco.com>

I was just working on a similar issue.
For example, several cve-update-nvd2-native:do_fetch tasks running in
parallel can easily hit the rate limit. This is likely to happen when several
people run bitbake in the same office (often unaware of each other...).
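
The effect of parallel fetches can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the limit of 5 requests per rolling 30-second window is the figure I believe NVD documents for clients without an API key, all clients are assumed to count against the same limit, and rejected requests are assumed not to count against the window. `count_rejections` is a hypothetical helper, not part of the recipe.

```python
from collections import deque


def count_rejections(clients, interval, duration, limit=5, window=30):
    """Simulate `clients` fetchers that share one rolling-window rate limit."""
    # Worst case: every client sends at t = 0, interval, 2 * interval, ...
    times = sorted(t for _ in range(clients) for t in range(0, duration, interval))
    accepted = deque()  # timestamps of accepted requests still inside the window
    rejected = 0
    for t in times:
        # Drop accepted requests that have left the rolling window.
        while accepted and accepted[0] <= t - window:
            accepted.popleft()
        if len(accepted) < limit:
            accepted.append(t)
        else:
            rejected += 1
    return rejected


# One client sleeping 6 s between requests stays within the assumed limit;
# two or more clients with the same delay do not.
for n in (1, 2, 3):
    print(n, "client(s):", count_rejections(n, interval=6, duration=120), "rejected")
```

Under these assumptions a single task with a 6-second sleep is never rejected, while every additional task pushes the shared request rate above the limit.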

I have observed the do_fetch logs and found that HTTP 403 errors are returned
if the request is blocked, probably due to rate limitation.

NOTE: Requesting https://services.nvd.nist.gov/rest/json/cves/2.0?startIndex=6000
NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
WARNING: CVE database update failed
DEBUG: Python function do_fetch finished

Other times a request may fail with IncompleteRead, but this is clearly
distinguishable from an HTTP error.

One possible approach is the following.
If an HTTP error occurs, assume that the rate limit has already been reached
and wait 30 seconds to make sure that the next window has started. The patch
would look something like this.

---
 meta/recipes-core/meta/cve-update-nvd2-native.bb | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/meta/recipes-core/meta/cve-update-nvd2-native.bb b/meta/recipes-core/meta/cve-update-nvd2-native.bb
index 9ab8dc6050..34fcc0317e 100644
--- a/meta/recipes-core/meta/cve-update-nvd2-native.bb
+++ b/meta/recipes-core/meta/cve-update-nvd2-native.bb
@@ -121,6 +121,7 @@ def nvd_request_next(url, attempts, api_key, args):

     import urllib.request
     import urllib.parse
+    import urllib.error
     import gzip
     import http
     import time
@@ -142,10 +143,12 @@ def nvd_request_next(url, attempts, api_key, args):

             r.close()

+        except urllib.error.HTTPError as e:
+            bb.note("CVE database: received error (%s), wait until the next window" % (e))
+            time.sleep(30)
         except Exception as e:
             bb.note("CVE database: received error (%s), retrying" % (e))
             time.sleep(6)
-            pass
         else:
             return raw_data
     else:
--

Fetching is likely to take even longer, but the probability of failing
because of errors should decrease greatly. Unfortunately, if too many tasks
run in parallel, this is still not good enough...

I will decide what to do with this patch after your patch is merged,
since simply extending the delay each time may be enough.
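
The idea in the proposed patch can also be tried outside BitBake. The following is a minimal sketch, not the recipe code: `fetch_with_retry` is a hypothetical helper, the `fetch` and `sleep` callables are injected so the logic can be exercised without network access, and `print` stands in for `bb.note`. The 30-second and 6-second values are the ones used in the patch above.

```python
import urllib.error

HTTP_ERROR_WAIT = 30   # wait for the next rate-limit window (value from the proposal)
OTHER_ERROR_WAIT = 6   # retry delay the recipe already uses


def fetch_with_retry(fetch, attempts, sleep):
    """Call fetch() up to `attempts` times; return its result, or None on failure."""
    for _ in range(attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as e:
            # Assumed to mean "rate limit reached": back off for a full window.
            # This clause must come first, because HTTPError is also an Exception.
            print("CVE database: received error (%s), wait until the next window" % e)
            sleep(HTTP_ERROR_WAIT)
        except Exception as e:
            # For example http.client.IncompleteRead: retry after the short delay.
            print("CVE database: received error (%s), retrying" % e)
            sleep(OTHER_ERROR_WAIT)
    return None
```

In real use `sleep` would be `time.sleep` and `fetch` a function that opens the NVD URL and returns the decoded response.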


Regards,

Yuta Hayama
Yoann Congal Dec. 11, 2023, 8:02 a.m. UTC | #2
Hello,

Le 11/12/2023 à 08:51, Yuta Hayama a écrit :
> Hi,
> 
> On 2023/12/08 14:04, Dhairya Nagodra via lists.openembedded.org wrote:
>> Sometimes NVD servers are unstable and return too many errors.
>> There is an option to have higher fetch attempts to increase the chances
>> of successfully fetching the CVE data.
>>
>> Additionally, it also makes sense to progressively increase the delay
>> after a failed request to an already unstable or busy server.
>> The increase in delay is reset after every successful request and
>> the maximum delay is limited to 30 seconds.
>>
>> Also, the logs are improved to give more clarity.
>>
>> Signed-off-by: Dhairya Nagodra <dnagodra@cisco.com>
> 
> I was just working on a similar issue.
> As a specific example, multiple cve-update-nvd2-native:do_fetch runs in
> parallel can easily reach the rate limit. It can be assumed that this situation
> will occur if several people run bitbake in one office. (often unaware of each
> other...)
> 
> I have observed the do_fetch logs and found that HTTP 403 errors are returned
> if the request is blocked, probably due to rate limitation.

Shouldn't we ask the NVD to return "429 Too Many Requests" instead?
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429

> NOTE: Requesting https://services.nvd.nist.gov/rest/json/cves/2.0?startIndex=6000
> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
> WARNING: CVE database update failed
> DEBUG: Python function do_fetch finished
> 
> Other times a request may fail with IncompleteRead, but this is clearly
> distinguishable from an HTTP error.
> 
> Here, we can think of the following ideas.
> If an HTTP error occurs, assume that the rate limit has already been reached
> and wait 30 seconds to ensure that the next window starts. The patch will be
> something like this.
> 
> ---
>  meta/recipes-core/meta/cve-update-nvd2-native.bb | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)

Regards,
Dhairya Nagodra Dec. 11, 2023, 10:28 a.m. UTC | #3
Hi,

On 2023/12/11 10:02, Yoann Congal wrote:
>Hello,
>
>Le 11/12/2023 à 08:51, Yuta Hayama a écrit :
>> Hi,
>>
>> On 2023/12/08 14:04, Dhairya Nagodra via lists.openembedded.org wrote:
>>> Sometimes NVD servers are unstable and return too many errors.
>>> There is an option to have higher fetch attempts to increase the
>>> chances of successfully fetching the CVE data.
>>>
>>> Additionally, it also makes sense to progressively increase the delay
>>> after a failed request to an already unstable or busy server.
>>> The increase in delay is reset after every successful request and the
>>> maximum delay is limited to 30 seconds.
>>>
>>> Also, the logs are improved to give more clarity.
>>>
>>> Signed-off-by: Dhairya Nagodra <dnagodra@cisco.com>
>>
>> I was just working on a similar issue.
>> As a specific example, multiple cve-update-nvd2-native:do_fetch runs
>> in parallel can easily reach the rate limit. It can be assumed that
>> this situation will occur if several people run bitbake in one office.
>> (often unaware of each other...)
>>
>> I have observed the do_fetch logs and found that HTTP 403 errors are
>> returned if the request is blocked, probably due to rate limitation.

As far as I know, HTTP 403 indicates a permission issue rather than rate limiting.
I have never seen an HTTP 403 error here. Could you clarify how it was generated? Is it reproducible?
I tried removing both sleep delays altogether (and without an API key) to provoke an error. I could not generate any errors; instead, each response arrived after a 15-20 second delay. So I guess it stayed within the rate limit.

>
>Shouldn't we ask the NVD to return "429 Too Many Requests" instead?
>https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429
>
>> NOTE: Requesting https://services.nvd.nist.gov/rest/json/cves/2.0?startIndex=6000
>> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
>> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
>> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
>> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
>> NOTE: CVE database: received error (HTTP Error 403: Forbidden), retrying
>> WARNING: CVE database update failed
>> DEBUG: Python function do_fetch finished
>>
>> Other times a request may fail with IncompleteRead, but this is
>> clearly distinguishable from an HTTP error.
>>
>> Here, we can think of the following ideas.
>> If an HTTP error occurs, assume that the rate limit has already been
>> reached and wait 30 seconds to ensure that the next window starts. The
>> patch will be something like this.
>>
>> ---
>>  meta/recipes-core/meta/cve-update-nvd2-native.bb | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>
>Regards,
>
>--
>Yoann Congal
>Smile ECS - Tech Expert

Regards,
Dhairya
Yuta Hayama Dec. 12, 2023, 2:54 a.m. UTC | #4
Hi,

On 2023/12/11 19:28, Dhairya Nagodra via lists.openembedded.org wrote:
>>> I have observed the do_fetch logs and found that HTTP 403 errors are
>>> returned if the request is blocked, probably due to rate limitation.
> 
> As per my knowledge, HTTP 403 is related to a permission issue rather than a rate limitation.
> I have not seen an HTTP 403 error, anytime. Can you please help clarify how was it generated? Is it reproducible?
> I tried removing both sleep delays altogether (and without API keys) to try and generate an error. In that, I couldn't generate any errors instead, got the response with a 15-20 sec delay. So, I guess it stayed within the rate limit.
> 
Yesterday I also tried fetching with the sleep removed, and the result was the
same: no error of any kind occurred. Perhaps, because some clients ignore the
documentation about rate limiting, the server deliberately delays its
responses.

That is, I think a single bitbake will not cause the issue.
The HTTP 403 error should be reproducible by running multiple
cve-update-nvd2-native:do_fetch tasks at the same time on a single PC.
Note that yesterday I could reproduce the error with two tasks in parallel,
but today I needed three. The delay that the server seems to add may have
become longer.

I think you are right that HTTP 403 does not look like reaching the rate limit
(As Yoann noted, 429 would feel right). The HTTP 403 error returned may be
because the request is being sent from a single PC, so I will try
cve-update-nvd2-native:do_fetch on three different PCs. This would be closer
to the actual use case I have indicated.
>>> As a specific example, multiple cve-update-nvd2-native:do_fetch runs
>>> in parallel can easily reach the rate limit. It can be assumed that
>>> this situation will occur if several people run bitbake in one office.
>>> (often unaware of each other...)


Regards,

Yuta Hayama
Yuta Hayama Dec. 13, 2023, 12:28 a.m. UTC | #5
On 2023/12/12 11:54, Yuta Hayama via lists.openembedded.org wrote:
> I think you are right that HTTP 403 does not look like reaching the rate limit
> (As Yoann noted, 429 would feel right). The HTTP 403 error returned may be
> because the request is being sent from a single PC, so I will try
> cve-update-nvd2-native:do_fetch on three different PCs. This would be closer
> to the actual use case I have indicated.


I have been trying to test running cve-update-nvd2-native:do_fetch on three
PCs at the same time. The result was that only one machine continued to make
successful requests until do_fetch completed, while the other two failed with
<urlopen error [Errno 101] Network is unreachable>.

It seems that only one person can fetch in my assumed situation, no matter how
much we try to adjust the delay time...
>>>> As a specific example, multiple cve-update-nvd2-native:do_fetch runs
>>>> in parallel can easily reach the rate limit. It can be assumed that
>>>> this situation will occur if several people run bitbake in one office.
>>>> (often unaware of each other...)

Also, the following was my misunderstanding.
> Perhaps someone has not read the documentation
> about rate limiting, so the server is putting a delay before returning a
> response.
> That is, I think a single bitbake will not cause the issue.

The reason it takes a few seconds for the server to respond is probably that
the response is too long and the server is struggling to produce it.

In fact, if we repeatedly send requests that yield short responses, each
response comes back relatively quickly and we eventually reach the rate limit.
The HTTP error code at that point is 403, for some reason. These requests
clearly hit the rate limit, yet the server still does not return 429.
Below is an example of test code:
-------------------------------------------------------------------------------
import urllib.request
import time

cves = \
['CVE-2019-14899',
'CVE-2021-3714',
'CVE-2021-3864',
'CVE-2022-0400',
'CVE-2022-1247',
'CVE-2022-3219',
'CVE-2022-36402',
'CVE-2022-38096',
'CVE-2022-4543',
'CVE-2022-46456',
'CVE-2023-0687',
'CVE-2023-1386',
'CVE-2023-25584',
'CVE-2023-3019',
'CVE-2023-3397',
'CVE-2023-3640',
'CVE-2023-38559',
'CVE-2023-40030',
'CVE-2023-4010',
'CVE-2023-4039',
'CVE-2023-42363',
'CVE-2023-42364',
'CVE-2023-42365',
'CVE-2023-42366',
'CVE-2023-46407',
'CVE-2023-47100',
'CVE-2023-49292',
'CVE-2023-5088',
'CVE-2023-5156',
'CVE-2023-6238',]

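# Each single-CVE query returns a short response, so the requests complete
# quickly and the rate limit is reached after a handful of iterations.
for cve in cves:
    url = 'https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=' + cve
    print('URL:', url)
    # urlopen() raises urllib.error.HTTPError once the server rejects a request,
    # which stops the script at that point.
    with urllib.request.urlopen(url) as res:
        print('.urlopen() done.')
    #time.sleep(6)  # re-enable to space requests 6 seconds apart, as the recipe does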
-------------------------------------------------------------------------------
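
A variant of the test code above can report how many requests succeed before the first HTTP error, and which status code is returned. This is a sketch: `requests_until_error` is a hypothetical helper, and the `opener` parameter exists only so the logic can be checked without contacting the NVD servers.

```python
import urllib.error
import urllib.request


def requests_until_error(urls, opener=urllib.request.urlopen):
    """Return (successful_requests, http_status); http_status is None if nothing failed."""
    done = 0
    for url in urls:
        try:
            # Open and immediately close the response; only the status matters here.
            with opener(url):
                pass
        except urllib.error.HTTPError as e:
            # Stop at the first HTTP error and report its status code.
            return done, e.code
        done += 1
    return done, None
```

Called with the URLs built from the `cves` list above, this would send the same unthrottled requests to the live service, so the result depends on the server's current behaviour.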

Anyway, the question of HTTP error codes still remains, but unfortunately the
situation of multiple people fetching at the same time seems to be difficult
to deal with.


Regards,

Yuta Hayama

Patch

diff --git a/meta/recipes-core/meta/cve-update-nvd2-native.bb b/meta/recipes-core/meta/cve-update-nvd2-native.bb
index 941fca34c6..bfe48b27e7 100644
--- a/meta/recipes-core/meta/cve-update-nvd2-native.bb
+++ b/meta/recipes-core/meta/cve-update-nvd2-native.bb
@@ -114,7 +114,10 @@  def cleanup_db_download(db_file, db_tmp_file):
     if os.path.exists(db_tmp_file):
         os.remove(db_tmp_file)
 
-def nvd_request_next(url, attempts, api_key, args):
+def nvd_request_wait(attempt, min_wait):
+    return min ( ( (2 * attempt) + min_wait ) , 30)
+
+def nvd_request_next(url, attempts, api_key, args, min_wait):
     """
     Request next part of the NVD dabase
     """
@@ -143,8 +146,10 @@  def nvd_request_next(url, attempts, api_key, args):
             r.close()
 
         except Exception as e:
-            bb.note("CVE database: received error (%s), retrying" % (e))
-            time.sleep(6)
+            wait_time = nvd_request_wait(attempt, min_wait)
+            bb.note("CVE database: received error (%s)" % (e))
+            bb.note("CVE database: retrying download after %d seconds. attempted (%d/%d)" % (wait_time, attempt+1, attempts))
+            time.sleep(wait_time)
             pass
         else:
             return raw_data
@@ -195,7 +200,7 @@  def update_db_file(db_tmp_file, d, database_time):
 
         while True:
             req_args['startIndex'] = index
-            raw_data = nvd_request_next(url, attempts, api_key, req_args)
+            raw_data = nvd_request_next(url, attempts, api_key, req_args, wait_time)
             if raw_data is None:
                 # We haven't managed to download data
                 return False
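
The retry behaviour introduced by this patch can be exercised in isolation with the sketch below. It is not the recipe code: `request_with_backoff` is a hypothetical stand-in for the retry loop in `nvd_request_next`, `fetch` and `sleep` are injected so no network access or real waiting is needed, and `print` replaces `bb.note`. Only `nvd_request_wait` and the log wording follow the diff.

```python
def nvd_request_wait(attempt, min_wait):
    # Formula from the patch: add 2 s per failed attempt, capped at 30 s.
    return min((2 * attempt) + min_wait, 30)


def request_with_backoff(fetch, attempts, min_wait, sleep):
    """Try fetch() up to `attempts` times, waiting longer after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as e:
            wait_time = nvd_request_wait(attempt, min_wait)
            print("CVE database: received error (%s)" % e)
            print("CVE database: retrying download after %d seconds. attempted (%d/%d)"
                  % (wait_time, attempt + 1, attempts))
            sleep(wait_time)
    # All attempts failed; the caller treats None as "download failed".
    return None
```

Each call starts again at attempt 0, which is how the delay is reset after a successful request.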