diff mbox series

[RFC] bitbake.conf/pseudo: Switch from exclusion list to inclusion list

Message ID 20231211173553.2421931-1-richard.purdie@linuxfoundation.org
State New
Headers show
Series [RFC] bitbake.conf/pseudo: Switch from exclusion list to inclusion list | expand

Commit Message

Richard Purdie Dec. 11, 2023, 5:35 p.m. UTC
Currently, pseudo tracks all files referenced within it's presence unless
they're listed in an exclusion list. The exclusion list has grown to be
fairly unwieldy.

This patch swaps PSEUDO_IGNORE_PATHS for PSEUDO_INCLUDE_PATHS which in
theory should be easier and more explicit to maintain.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 meta/conf/bitbake.conf                     | 11 ++++-------
 meta/recipes-core/glibc/glibc-locale.inc   |  2 ++
 meta/recipes-devtools/pseudo/pseudo_git.bb |  4 ++--
 3 files changed, 8 insertions(+), 9 deletions(-)

For now I'm sharing this as an experiment to see if this is a change we want to
make. I'd be interested if anyone wanted to compare performance in particular. We
do need to run it through oe-selftest as I suspect there are issues to be identified
and tracked down. This patch was tested with basic image generation.

Comments

Ross Burton Dec. 11, 2023, 8:35 p.m. UTC | #1
On 11 Dec 2023, at 17:35, Richard Purdie via lists.openembedded.org <richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:
> 
> Currently, pseudo tracks all files referenced within it's presence unless
> they're listed in an exclusion list. The exclusion list has grown to be
> fairly unwieldy.
> 
> This patch swaps PSEUDO_IGNORE_PATHS for PSEUDO_INCLUDE_PATHS which in
> theory should be easier and more explicit to maintain.

I’ve never understood the rationale behind exclude over include (then again there’s lots I don’t understand), so I’m very much for this in principle.

Ross
Richard Purdie Dec. 11, 2023, 11:31 p.m. UTC | #2
On Mon, 2023-12-11 at 20:35 +0000, Ross Burton wrote:
> On 11 Dec 2023, at 17:35, Richard Purdie via lists.openembedded.org <richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:
> > 
> > Currently, pseudo tracks all files referenced within it's presence unless
> > they're listed in an exclusion list. The exclusion list has grown to be
> > fairly unwieldy.
> > 
> > This patch swaps PSEUDO_IGNORE_PATHS for PSEUDO_INCLUDE_PATHS which in
> > theory should be easier and more explicit to maintain.
> 
> I’ve never understood the rationale behind exclude over include (then again there’s lots I don’t understand), so I’m very much for this in principle.

If you create a file and then move it into pseudo's "sight",
problematic things can happen. That sounds like an odd thing to do
until you see tmp files created in /tmp, or a "cp /dev/null newfile" or
the fun games you can play with /proc.

Another example would be create something in cwd (not included), then
move it so becomes something which is included and the permissions are
lost.

The argument was that exclusions were therefore safer than inclusions
and in some ways that is true. The question is which is going to work
the best and be the most maintainable too. This patch at least allows
us to experiment with the other choice.

Cheers,

Richard
Mark Hatle Dec. 12, 2023, 6:44 p.m. UTC | #3
On 12/11/23 2:35 PM, Ross Burton wrote:
> On 11 Dec 2023, at 17:35, Richard Purdie via lists.openembedded.org <richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:
>>
>> Currently, pseudo tracks all files referenced within it's presence unless
>> they're listed in an exclusion list. The exclusion list has grown to be
>> fairly unwieldy.
>>
>> This patch swaps PSEUDO_IGNORE_PATHS for PSEUDO_INCLUDE_PATHS which in
>> theory should be easier and more explicit to maintain.
> 
> I’ve never understood the rationale behind exclude over include (then again there’s lots I don’t understand), so I’m very much for this in principle.

History:

It was originally everything, because files were referenced, linked and copied 
around random parts of the system and/or build and needed preservation of 
components.  This was very early Yocto Project and before.  (Psuedo existed for 
many years before OE/YP used it!)

Then the build/work directories became more standardized, and the components 
'copying in' from elsewhere was greatly reduced... but there were points where 
reference files were still used (think .m4 files or tex templates for instance 
for the kinds of things...)  Since we didn't know what all of the reference 
points (to the host) were, we decided to be conservative to add exclusion.

So excluding paths where we could count on owners/groups (and more importantly 
permissions) being standard became the norm.  That's why the primary exclusion was:

/usr/,/etc/,/lib,/dev/,/run/

Over time though we've gotten our dependencies understood and really severely 
limited the external components we use in the build.  (including .m4, autoconf, 
tex, etc etc etc..)   So moving to an INCLUDE seems to make more sense to me. 
Since we have a clear understanding of the path structure we use....

/tmp - used by the compiler (and tons of other tooling) for temporary files, we 
need this for sure.

/proc - there are referenced entries that MIGHT be needed, especially links to 
/proc/mounts (where the link inherits perms)

/dev - I'm not sure if this is needed, but things like /dev/null and /dev/shm, 
another tmpfs, mean we probably need to track this.

/run - often contains other short-term temp files, especially for things like 
trying to process pre/post package scripting on rootfs configuration.


Then the items in WORKDIR of course.  IN THE PAST, we needed source code to be 
covered both to handle symlink references, as well as specific copy functions. 
But I suspect this is not needed any longer.  The actual build directory also is 
likely not needed anymore (but had been in the past!)  Leaving us the image, 
package, rootfs, etc directories.

So I think this maps well to "how did we get here".


The one question I ask, is why we're using ',' as a separator and not ':' like 
standard Unix style paths.  But I suspect the answer is Windows level 
support....  but I'd be very tempted to move it back to ':' to make this more 
standard like other path separators.  (But that can be done at another time!)


--Mark


> Ross
> 
> 
> 
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#192185): https://lists.openembedded.org/g/openembedded-core/message/192185
> Mute This Topic: https://lists.openembedded.org/mt/103113368/3616948
> Group Owner: openembedded-core+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [mark.hatle@kernel.crashing.org]
> -=-=-=-=-=-=-=-=-=-=-=-
>
Seebs Dec. 12, 2023, 6:51 p.m. UTC | #4
On Tue, 12 Dec 2023 12:44:47 -0600
"Mark Hatle" <mark.hatle@kernel.crashing.org> wrote:

> /tmp - used by the compiler (and tons of other tooling) for temporary
> files, we need this for sure.

If memory serves, this is basically why it was an IGNORE path list
originally. If we ever accessed anything outside of pseudo, we'd get
leakage. The basic pattern looks like:

* file gets created outside of our workspace
* we then copy it in using something that tries to preserve ownership
* since file wasn't being tracked through pseudo, it has real UID on it
* now we've copied something into our workspace using that UID

The very early design, back in our pre-Yocto build system, imagined a
single unified database being used for the entire build process,
persistently. That was maybe not the best design idea, but it was
how we'd been using fakeroot and I didn't really revisit it at the
time. Also, we had a *lot* of cross-pollination between components, so
things would end up copying in or referring to files that were part
of another package, without using the intermediate archived form, so...

Wild times. Definitely one of those things where, with the wisdom of
hindsight, I know enough about the problem that if you asked me to do
it today I'd probably confidently tell you that it's not possible. :)

-s
diff mbox series

Patch

diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index e7826e7af96..bf639fe4f2a 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -745,18 +745,15 @@  SRC_URI = ""
 PSEUDO_LOCALSTATEDIR ?= "${WORKDIR}/pseudo/"
 PSEUDO_PASSWD ?= "${STAGING_DIR_TARGET}:${PSEUDO_SYSROOT}"
 PSEUDO_SYSROOT = "${COMPONENTS_DIR}/${BUILD_ARCH}/pseudo-native"
-PSEUDO_IGNORE_PATHS = "/usr/,/etc/,/lib,/dev/,/run/,${T},${WORKDIR}/recipe-sysroot,${SSTATE_DIR},${STAMPS_DIR}"
-PSEUDO_IGNORE_PATHS .= ",${TMPDIR}/sstate-control,${TMPDIR}/buildstats,${TMPDIR}/sysroots-components,${TMPDIR}/pkgdata"
-PSEUDO_IGNORE_PATHS .= ",${WORKDIR}/deploy-,${WORKDIR}/sstate-build-package_,${WORKDIR}/sstate-install-package_,${WORKDIR}/pkgdata-sysroot"
-PSEUDO_IGNORE_PATHS .= ",${DEPLOY_DIR},${BUILDHISTORY_DIR},${TOPDIR}/cache,${COREBASE}/scripts,${CCACHE_DIR}"
+PSEUDO_INCLUDE_PATHS = "/tmp,/proc,/dev,${WORKDIR}/image,${WORKDIR}/package,${WORKDIR}/rootfs,${WORKDIR}/sstate-build-package/,${WORKDIR}/sstate-install-package/,${WORKDIR}/pkgdata,${WORKDIR}/minidebuginfo"
 
 export PSEUDO_DISABLED = "1"
 #export PSEUDO_PREFIX = "${STAGING_DIR_NATIVE}${prefix_native}"
 #export PSEUDO_BINDIR = "${STAGING_DIR_NATIVE}${bindir_native}"
 #export PSEUDO_LIBDIR = "${STAGING_DIR_NATIVE}$PSEUDOBINDIR/../lib/pseudo/lib
-FAKEROOTBASEENV = "PSEUDO_BINDIR=${PSEUDO_SYSROOT}${bindir_native} PSEUDO_LIBDIR=${PSEUDO_SYSROOT}${prefix_native}/lib/pseudo/lib PSEUDO_PREFIX=${PSEUDO_SYSROOT}${prefix_native} PSEUDO_IGNORE_PATHS=${@oe.path.canonicalize(d.getVar('PSEUDO_IGNORE_PATHS'))} PSEUDO_DISABLED=1 PYTHONDONTWRITEBYTECODE=1"
+FAKEROOTBASEENV = "PSEUDO_BINDIR=${PSEUDO_SYSROOT}${bindir_native} PSEUDO_LIBDIR=${PSEUDO_SYSROOT}${prefix_native}/lib/pseudo/lib PSEUDO_PREFIX=${PSEUDO_SYSROOT}${prefix_native} PSEUDO_INCLUDE_PATHS=${@oe.path.canonicalize(d.getVar('PSEUDO_INCLUDE_PATHS'))} PSEUDO_DISABLED=1 PYTHONDONTWRITEBYTECODE=1"
 FAKEROOTCMD = "${PSEUDO_SYSROOT}${bindir_native}/pseudo"
-FAKEROOTENV = "PSEUDO_PREFIX=${PSEUDO_SYSROOT}${prefix_native} PSEUDO_LOCALSTATEDIR=${PSEUDO_LOCALSTATEDIR} PSEUDO_PASSWD=${PSEUDO_PASSWD} PSEUDO_NOSYMLINKEXP=1 PSEUDO_IGNORE_PATHS=${@oe.path.canonicalize(d.getVar('PSEUDO_IGNORE_PATHS'))} PSEUDO_DISABLED=0"
+FAKEROOTENV = "PSEUDO_PREFIX=${PSEUDO_SYSROOT}${prefix_native} PSEUDO_LOCALSTATEDIR=${PSEUDO_LOCALSTATEDIR} PSEUDO_PASSWD=${PSEUDO_PASSWD} PSEUDO_NOSYMLINKEXP=1 PSEUDO_INCLUDE_PATHS=${@oe.path.canonicalize(d.getVar('PSEUDO_INCLUDE_PATHS'))} PSEUDO_DISABLED=0"
 FAKEROOTNOENV = "PSEUDO_UNLOAD=1"
 FAKEROOTDIRS = "${PSEUDO_LOCALSTATEDIR}"
 FAKEROOTLOGS = "${WORKDIR}/pseudo/pseudo.log"
@@ -951,7 +948,7 @@  BB_HASHEXCLUDE_COMMON ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH BBSERVER DL_DI
     SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_REPORT_TASKDATA \
     SSTATE_HASHEQUIV_OWNER CCACHE_TOP_DIR BB_HASHSERVE GIT_CEILING_DIRECTORIES \
     OMP_NUM_THREADS BB_CURRENTTASK"
-BB_BASEHASH_IGNORE_VARS ?= "${BB_HASHEXCLUDE_COMMON} PSEUDO_IGNORE_PATHS BUILDHISTORY_DIR \
+BB_BASEHASH_IGNORE_VARS ?= "${BB_HASHEXCLUDE_COMMON} PSEUDO_INCLUDE_PATHS BUILDHISTORY_DIR \
     SSTATE_DIR SOURCE_DATE_EPOCH RUST_BUILD_SYS RUST_HOST_SYS RUST_TARGET_SYS"
 BB_HASHCONFIG_IGNORE_VARS ?= "${BB_HASHEXCLUDE_COMMON} DATE TIME SSH_AGENT_PID \
     SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_PASSTHROUGH_ADDITIONS DISABLE_SANITY_CHECKS \
diff --git a/meta/recipes-core/glibc/glibc-locale.inc b/meta/recipes-core/glibc/glibc-locale.inc
index 289f58d4df9..d946b939ec3 100644
--- a/meta/recipes-core/glibc/glibc-locale.inc
+++ b/meta/recipes-core/glibc/glibc-locale.inc
@@ -64,6 +64,8 @@  FILES:localedef = "${bindir}/localedef"
 
 LOCALETREESRC = "${COMPONENTS_DIR}/${PACKAGE_ARCH}/glibc-stash-locale"
 
+PSEUDO_INCLUDE_PATHS .= ",${WORKDIR}/locale-tree"
+
 copy_locale_files() {
 	local dir=$1 mode=$2
 
diff --git a/meta/recipes-devtools/pseudo/pseudo_git.bb b/meta/recipes-devtools/pseudo/pseudo_git.bb
index 699cab11c66..daecb1554f5 100644
--- a/meta/recipes-devtools/pseudo/pseudo_git.bb
+++ b/meta/recipes-devtools/pseudo/pseudo_git.bb
@@ -1,6 +1,6 @@ 
 require pseudo.inc
 
-SRC_URI = "git://git.yoctoproject.org/pseudo;branch=master;protocol=https \
+SRC_URI = "git://git.yoctoproject.org/pseudo;branch=rpurdie/include;protocol=https \
            file://0001-configure-Prune-PIE-flags.patch \
            file://glibc238.patch \
            file://fallback-passwd \
@@ -14,7 +14,7 @@  SRC_URI:append:class-nativesdk = " \
     file://older-glibc-symbols.patch"
 SRC_URI[prebuilt.sha256sum] = "ed9f456856e9d86359f169f46a70ad7be4190d6040282b84c8d97b99072485aa"
 
-SRCREV = "a8453eea4d902bbb0e01c786f1cb4a178c3bbee3"
+SRCREV = "97f343ddacdd308c54d298a9dde7019968aa959d"
 S = "${WORKDIR}/git"
 PV = "1.9.0+git"