diff mbox series

oe.data: allow to mask out secret variables

Message ID 20230726110946.129932-1-enrico.scholz@sigma-chemnitz.de
State New
Headers show
Series oe.data: allow to mask out secret variables | expand

Commit Message

Enrico Scholz July 26, 2023, 11:09 a.m. UTC
From: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>

Some integrations require that passwords or secret tokens are
assigned to bitbake variables.  E.g. the meta-dependencytrack
layer has a 'DEPENDENCYTRACK_API_KEY' or my sstate-server requires
a 'SSTATE_SERVER_SESSION' token.

These secrets will appear in testdata.json which can easily leak them
when the deploy directory is published publicly.

Patch adds a special 'secrets' flag for variables.  When a variable is
marked with it, its content will be replaced by '**masked**'.

E.g. formerly

|    "SSTATE_SERVER_PATH": "HKBOZ8C279S4iwBA",
|    "SSTATE_MIRRORS": "    ... https://sstate..../api/v1/download/HKBOZ8C279S4iwBA/sstate/...

and now

|    "SSTATE_SERVER_PATH": "**masked**",
|    "SSTATE_MIRRORS": "    ... https://sstate..../api/v1/download/**masked**/sstate

Corresponding bbclass contains

| SSTATE_SERVER_PATH ??= "-"
| SSTATE_SERVER_PATH[secret] = "true"

Signed-off-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
---
 meta/lib/oe/data.py | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

Comments

Alexander Kanavin July 26, 2023, 11:19 a.m. UTC | #1
Please no. These things can leak out in a million other ways (e.g. if
you publish logs), it's better to just scrub them prior to publishing
with a post-script. Having secrets in bitbake variables is a bad idea
to begin with.

Alex

On Wed, 26 Jul 2023 at 13:10, Enrico Scholz via lists.openembedded.org
<enrico.scholz=sigma-chemnitz.de@lists.openembedded.org> wrote:
>
> From: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
>
> Some integrations require that passwords or secret tokens are
> assigned to bitbake variables.  E.g. the meta-dependencytrack
> layer has a 'DEPENDENCYTRACK_API_KEY' or my sstate-server requires
> a 'SSTATE_SERVER_SESSION' token.
>
> These secrets will appear in testdata.json which can easily leak them
> when the deploy directory is published publicly.
>
> Patch adds a special 'secrets' flag for variables.  When a variable is
> marked with it, its content will be replaced by '**masked**'.
>
> E.g. formerly
>
> |    "SSTATE_SERVER_PATH": "HKBOZ8C279S4iwBA",
> |    "SSTATE_MIRRORS": "    ... https://sstate..../api/v1/download/HKBOZ8C279S4iwBA/sstate/...
>
> and now
>
> |    "SSTATE_SERVER_PATH": "**masked**",
> |    "SSTATE_MIRRORS": "    ... https://sstate..../api/v1/download/**masked**/sstate
>
> Corresponding bbclass contains
>
> | SSTATE_SERVER_PATH ??= "-"
> | SSTATE_SERVER_PATH[secret] = "true"
>
> Signed-off-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
> ---
>  meta/lib/oe/data.py | 29 +++++++++++++++++++++++++++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/meta/lib/oe/data.py b/meta/lib/oe/data.py
> index 37121cfad2b7..57a8e5b5e049 100644
> --- a/meta/lib/oe/data.py
> +++ b/meta/lib/oe/data.py
> @@ -5,7 +5,9 @@
>  #
>
>  import json
> +import re
>  import oe.maketype
> +import oe.types
>
>  def typed_value(key, d):
>      """Construct a value for the specified metadata variable, using its flags
> @@ -23,9 +25,10 @@ def typed_value(key, d):
>      except (TypeError, ValueError) as exc:
>          bb.msg.fatal("Data", "%s: %s" % (key, str(exc)))
>
> -def export2json(d, json_file, expand=True, searchString="",replaceString=""):
> +def export2json(d, json_file, expand=True, searchString="",replaceString="", mask_secrets=True):
>      data2export = {}
>      keys2export = []
> +    secrets = []
>
>      for key in d.keys():
>          if key.startswith("_"):
> @@ -38,12 +41,34 @@ def export2json(d, json_file, expand=True, searchString="",replaceString=""):
>              continue
>          elif d.getVarFlag(key, "func"):
>              continue
> +        elif mask_secrets and oe.types.boolean(d.getVarFlag(key, "secret") or "false"):
> +            var = d.getVar(key)
> +
> +            ## When secret variable contains a placeholder (is empty
> +            ## or single character), show it.  When it is too short
> +            ## emit a warning and exclude it completely from output
> +            ## but do not mask out its value in other places.
> +            if var is None or len(var) < 2:
> +                bb.debug(1, "variable '%s' is marked as secret but seems to contain some placeholder; showing it" % key)
> +            elif len(var) < 8:
> +                bb.warn("variable '%s' is marked as secret but content is too short; skipping it" % key)
> +                continue
> +            else:
> +                secrets.append(re.escape(var))
>
>          keys2export.append(key)
>
> +    if len(secrets) == 0:
> +        secrets = None
> +    else:
> +        secrets = re.compile('|'.join(secrets))
> +
>      for key in keys2export:
>          try:
> -            data2export[key] = d.getVar(key, expand).replace(searchString,replaceString)
> +            var = d.getVar(key, expand).replace(searchString,replaceString)
> +            if secrets:
> +                var = secrets.sub("**masked**", var)
> +            data2export[key] = var
>          except bb.data_smart.ExpansionError:
>              data2export[key] = ''
>          except AttributeError:
> --
> 2.41.0
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#184876): https://lists.openembedded.org/g/openembedded-core/message/184876
> Mute This Topic: https://lists.openembedded.org/mt/100368202/1686489
> Group Owner: openembedded-core+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [alex.kanavin@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Enrico Scholz July 26, 2023, 11:41 a.m. UTC | #2
Alexander Kanavin <alex.kanavin@gmail.com> writes:

> Please no. These things can leak out in a million other ways

no; that is very unlikely.  The parts which are dealing with secrets
usually take care about not leaking them.

All major CI systems have the same problem (need secret variables) and
at least gitlab solves it in the same way (mark it as to be masked and
replace it in logs).


> (e.g. if you publish logs),

Secrets do not appear in the usual 'bitbake ...' output only in the deep
.../temp/log.do_* files.

I do not think that people are really publishing these files.


> it's better to just scrub them prior to publishing with a post-script.

Sounds unergonomic; you have to know which variables are secret.  You
have to read and interpret the testdata.json file, substitute values and
write it back.

It is much better to do it in the first place.  The classes which are
dealing with secrets can mark them as such.


> Having secrets in bitbake variables is a bad idea to begin with.

Yes; because they are exported in testdata.json ;)

Else, there are sometimes not many ways to work without them.
E.g. SSTATE_MIRRORS has contain the secret token because it is used
directly by bitbake; perhaps I could use a wget wrapper and write a
custom curl python class...



Enrico
Alexander Kanavin July 26, 2023, 11:50 a.m. UTC | #3
On Wed, 26 Jul 2023 at 13:42, Enrico Scholz
<enrico.scholz@sigma-chemnitz.de> wrote:> > it's better to just scrub
them prior to publishing with a post-script.>
> Else, there are sometimes not many ways to work without them.
> E.g. SSTATE_MIRRORS has contain the secret token because it is used
> directly by bitbake; perhaps I could use a wget wrapper and write a
> custom curl python class...

Yes, the secret needs to be in a file (or other access-controlled
facility), and read from it by the process that needs it, and only
directly prior to using it. Having it in a bitbake variable which gets
passed through a million tasks and components is a terrible idea, and
I do not want to validate it by having a 'secret' flag. Sorry, still
no.

Alex
Enrico Scholz July 26, 2023, 12:02 p.m. UTC | #4
Alexander Kanavin <alex.kanavin@gmail.com> writes:

>> Else, there are sometimes not many ways to work without them.
>> E.g. SSTATE_MIRRORS has contain the secret token because it is
>> used directly by bitbake; perhaps I could use a wget wrapper and
>> write a custom curl python class...
>
> Yes, the secret needs to be in a file (or other access-controlled
> facility), and read from it by the process that needs it, and only
> directly prior to using it. Having it in a bitbake variable which gets
> passed through a million tasks and components

Where is the problem?  I known only one component
(rootfs-postcommands.bbclass) which dumps the whole environment and
leaks it.

Else, when there is a malicious component that wants to steal secrets
from a bitbake variable, what would stop it from reading the secret from
a file?

Your suggestion (write secrets in files instead of bitbake variables)
does not improve security but causes only extra work.



Enrico
Richard Purdie July 26, 2023, 12:38 p.m. UTC | #5
On Wed, 2023-07-26 at 14:02 +0200, Enrico Scholz via
lists.openembedded.org wrote:
> Alexander Kanavin <alex.kanavin@gmail.com> writes:
> 
> > > Else, there are sometimes not many ways to work without them.
> > > E.g. SSTATE_MIRRORS has contain the secret token because it is
> > > used directly by bitbake; perhaps I could use a wget wrapper and
> > > write a custom curl python class...
> > 
> > Yes, the secret needs to be in a file (or other access-controlled
> > facility), and read from it by the process that needs it, and only
> > directly prior to using it. Having it in a bitbake variable which gets
> > passed through a million tasks and components
> 
> Where is the problem?  I known only one component
> (rootfs-postcommands.bbclass) which dumps the whole environment and
> leaks it.
> 
> Else, when there is a malicious component that wants to steal secrets
> from a bitbake variable, what would stop it from reading the secret from
> a file?
> 
> Your suggestion (write secrets in files instead of bitbake variables)
> does not improve security but causes only extra work.

It does improve security since there is an extra step to get the data
and you can more easily audit when that data is accessed or present.

I'd also note that there are patches under review to change rootfs-
postcommands to only export a known list of variables for other reasons
so this problem should go away when we get that patch merged.

Cheers,

Richard
diff mbox series

Patch

diff --git a/meta/lib/oe/data.py b/meta/lib/oe/data.py
index 37121cfad2b7..57a8e5b5e049 100644
--- a/meta/lib/oe/data.py
+++ b/meta/lib/oe/data.py
@@ -5,7 +5,9 @@ 
 #
 
 import json
+import re
 import oe.maketype
+import oe.types
 
 def typed_value(key, d):
     """Construct a value for the specified metadata variable, using its flags
@@ -23,9 +25,10 @@  def typed_value(key, d):
     except (TypeError, ValueError) as exc:
         bb.msg.fatal("Data", "%s: %s" % (key, str(exc)))
 
-def export2json(d, json_file, expand=True, searchString="",replaceString=""):
+def export2json(d, json_file, expand=True, searchString="",replaceString="", mask_secrets=True):
     data2export = {}
     keys2export = []
+    secrets = []
 
     for key in d.keys():
         if key.startswith("_"):
@@ -38,12 +41,34 @@  def export2json(d, json_file, expand=True, searchString="",replaceString=""):
             continue
         elif d.getVarFlag(key, "func"):
             continue
+        elif mask_secrets and oe.types.boolean(d.getVarFlag(key, "secret") or "false"):
+            var = d.getVar(key)
+
+            ## When secret variable contains a placeholder (is empty
+            ## or single character), show it.  When it is too short
+            ## emit a warning and exclude it completely from output
+            ## but do not mask out its value in other places.
+            if var is None or len(var) < 2:
+                bb.debug(1, "variable '%s' is marked as secret but seems to contain some placeholder; showing it" % key)
+            elif len(var) < 8:
+                bb.warn("variable '%s' is marked as secret but content is too short; skipping it" % key)
+                continue
+            else:
+                secrets.append(re.escape(var))
 
         keys2export.append(key)
 
+    if len(secrets) == 0:
+        secrets = None
+    else:
+        secrets = re.compile('|'.join(secrets))
+
     for key in keys2export:
         try:
-            data2export[key] = d.getVar(key, expand).replace(searchString,replaceString)
+            var = d.getVar(key, expand).replace(searchString,replaceString)
+            if secrets:
+                var = secrets.sub("**masked**", var)
+            data2export[key] = var
         except bb.data_smart.ExpansionError:
             data2export[key] = ''
         except AttributeError: