Message ID | 20220217003248.581903-1-saul.wold@windriver.com |
---|---|
State | New |
Headers | show |
Series | [v2] convert-variables: Script for Inclusive Language variable renames | expand |
On Wed, 2022-02-16 at 16:32 -0800, Saul Wold wrote: > From: Saul Wold <Saul.Wold@windriver.com> > > This script searches for a list of variable that have been renamed > and converts them to their more descriptive names. It also searches > for a list of variables that have been removed or deprecated and > prints a message. > > It will print a message to inform the user that there are terms that > need to be updated in their files. Many of these changes are context > sensitive and may not be modified as they might be existing calls to > other libraries. This message is informational only. > > I have tested this on poky and meta-openembedded so far. > > (From OE-Core rev: 50fe7ba8dba05a9681c9095506f798796cfc2750) > > Signed-off-by: Saul Wold <saul.wold@windriver.com> > --- > v2: renamed script, removed bitbake internal vars, added WHITELIST_ option > > scripts/contrib/convert-variables.py | 110 +++++++++++++++++++++++++++ > 1 file changed, 110 insertions(+) > create mode 100755 scripts/contrib/convert-variables.py > > diff --git a/scripts/contrib/convert-variables.py b/scripts/contrib/convert-variables.py > new file mode 100755 > index 0000000000..a632fd4d5c > --- /dev/null > +++ b/scripts/contrib/convert-variables.py > @@ -0,0 +1,110 @@ > +#!/usr/bin/env python3 > +# > +# Conversion script to rename variables with more descriptive terms > +# > +# > +# SPDX-License-Identifier: GPL-2.0-only > +# > + > +import re > +import os > +import sys > +import tempfile > +import shutil > +import mimetypes > + > +if len(sys.argv) < 2: > + print("Please specify a directory to run the conversion script against.") > + sys.exit(1) > + > +renames = { > +"BB_ENV_WHITELIST":"BB_ENV_PASSTHROUGH", > +"BB_ENV_EXTRAWHITE":"BB_ENV_PASSTHROUGH_ADDITIONS", > +"BB_HASHCONFIG_WHITELIST":"BB_HASHCONFIG_IGNORE_VARS", > +"BB_SETSCENE_ENFORCE_WHITELIST":"BB_SETSCENE_ENFORCE_IGNORE_TASKS", > +"BB_HASHBASE_WHITELIST":"BB_BASEHASH_IGNORE_VARS", > +"BB_HASHTASK_WHITELIST":"BB_TASKHASH_IGNORE_TASKS", > +"CVE_CHECK_PN_WHITELIST":"CVE_CHECK_SKIP_RECIPE", > +"CVE_CHECK_WHITELIST":"CVE_CHECK_IGNORE", > +"MULTI_PROVIDER_WHITELIST":"BB_MULTI_PROVIDER_ALLOWED", > +"PNBLACKLIST":"SKIP_RECIPE", > +"SDK_LOCAL_CONF_BLACKLIST":"ESDK_LOCAL_CONF_ALLOW", > +"SDK_LOCAL_CONF_WHITELIST":"ESDK_LOCAL_CONF_REMOVE", > +"SDK_INHERIT_BLACKLIST":"ESDK_CLASS_INHERIT_DISABLE", > +"SSTATE_DUPWHITELIST":"SSTATE_ALLOW_OVERLAP_FILES", > +"SYSROOT_DIRS_BLACKLIST":"SYSROOT_DIRS_IGNORE", > +"UNKNOWN_CONFIGURE_WHITELIST":"UNKNOWN_CONFIGURE_OPT_IGNORE", > +"WHITELIST_":"INCOMPATIBLE_LICENSE_ALLOWED_RECIPE_", > +} > + > +removed_list = [ > +"BB_STAMP_WHITELIST", > +"BB_STAMP_POLICY", > +"ICECC_USER_CLASS_BL", > +"ICECC_USER_PACKAGE_BL", > +"ICECC_USER_PACKAGE_WL", > +"INHERIT_BLACKLIST", > +"TUNEABI_WHITELIST", > +] > + Thanks Saul. The key to making this a success is in the detail and I know you're doing your best with it but there are some things we need to get right. a) the script name or description at the top of it still doesn't sit right with me. "convert-variables" doesn't really tell me what it is for in a years time. I've tweaked the patch to "convert-variable-renames.py" locally which mentions the fact it is renaming things and I tweaked the description too. b) ESDK_LOCAL_CONF_ALLOW and ESDK_LOCAL_CONF_REMOVE are backwards above. The wiki also used LOCALCONF in one case and LOCAL_CONF in the other. I prefer LOCALCONF and that was what the original discussion said so I've fixed everything to match that. I've updated things in master-next. c) The WHITELIST_ change isn't what was discussed. That variable needs removing somehow, at least the license addition bit to it as it is horrible. A straight conversion is not appropriate. We can patch this into the conversion script when it is ready, IMO it isn't there yet. d) The subtleties of the ICECC_USER* translations have gotten lost in the system. The idea was to merge several variables together. The final piece of all this is communication. We need regular and clear communication with the community about what is happening, when things are merging, what work remains, the current work items and so on. I guess my next best thing to work on is going to be to communicate where things sit as I see them and what remains. Cheers, Richard
On Wed, 2022-02-16 at 16:32 -0800, Saul Wold wrote: > +def processfile(fn): > + > + print(f"processing file '{fn}'") The fact that someone took the original conversion scripts, dropped the copyright header and then added f strings to it, which the project doesn't yet officially support and are an issue of contention, *really* could annoy me. Cheers, Richard
"Saul Wold" <Saul.Wold@windriver.com> writes: > From: Saul Wold <Saul.Wold@windriver.com> > > This script searches for a list of variable that have been renamed > and converts them to their more descriptive names. again: most of these renamings make identifiers much less descriptive because they now sound like boolean flags instead of lists > +"BB_SETSCENE_ENFORCE_WHITELIST":"BB_SETSCENE_ENFORCE_IGNORE_TASKS", > +"BB_HASHBASE_WHITELIST":"BB_BASEHASH_IGNORE_VARS", > +"BB_HASHTASK_WHITELIST":"BB_TASKHASH_IGNORE_TASKS", > +"CVE_CHECK_PN_WHITELIST":"CVE_CHECK_SKIP_RECIPE", > +"CVE_CHECK_WHITELIST":"CVE_CHECK_IGNORE", > +"MULTI_PROVIDER_WHITELIST":"BB_MULTI_PROVIDER_ALLOWED", > +"PNBLACKLIST":"SKIP_RECIPE", > +"SDK_LOCAL_CONF_BLACKLIST":"ESDK_LOCAL_CONF_ALLOW", > +"SDK_LOCAL_CONF_WHITELIST":"ESDK_LOCAL_CONF_REMOVE", > +"SDK_INHERIT_BLACKLIST":"ESDK_CLASS_INHERIT_DISABLE", > +"SSTATE_DUPWHITELIST":"SSTATE_ALLOW_OVERLAP_FILES", > +"SYSROOT_DIRS_BLACKLIST":"SYSROOT_DIRS_IGNORE", > +"UNKNOWN_CONFIGURE_WHITELIST":"UNKNOWN_CONFIGURE_OPT_IGNORE", Enrico
On Fri, 2022-02-18 at 14:40 +0100, Enrico Scholz via lists.openembedded.org wrote: > "Saul Wold" <Saul.Wold@windriver.com> writes: > > > From: Saul Wold <Saul.Wold@windriver.com> > > > > This script searches for a list of variable that have been renamed > > and converts them to their more descriptive names. > > again: most of these renamings make identifiers much less descriptive > because they now sound like boolean flags instead of lists Whilst you could say they sound like booleans, they clearly can't be as that wouldn't make sense, e.g. "Ignoring variables in basehash" makes no sense. Secondly, most metadata variables are lists of things. Adding "LIST" to every variable that is a list of items would just make the variable names much longer with little end resulting readability gain. Unfortunately there likely isn't a perfect rename and these were the best we were able to come up with. Cheers, Richard
Richard Purdie <richard.purdie@linuxfoundation.org> writes: >> > This script searches for a list of variable that have been renamed >> > and converts them to their more descriptive names. s/descriptive/politically correct/ >> again: most of these renamings make identifiers much less descriptive >> because they now sound like boolean flags instead of lists > > Whilst you could say they sound like booleans, they clearly can't be > as that wouldn't make sense, e.g. "Ignoring variables in basehash" > makes no sense. True; because it does not make sense, these variables should not be named in this way. > Secondly, most metadata variables are lists of things. The renamed ones; yes. But when I look in bitbake.conf, I can not say that most variables are lists. > Adding "LIST" to every variable that is a list of items would just make > the variable names much longer with little end resulting readability > gain. True; explicit type annotations like in the hungarian notation are nonsense. But "whitelist" and "blacklist" are words with a meaning; the "list" is not used as a type annotation there. Variable names were perfectly readable. A change should make things better, not worse. > Unfortunately there likely isn't a perfect rename and these were the > best we were able to come up with. Why is the rename done at all? It makes the product just worse due to a less usable api and because it causes a lot of work for the users of the api (I shudder when I think about updating my CI to work with new and old branches). Enrico
On Fri, 18 Feb 2022 at 15:27, Enrico Scholz via lists.openembedded.org <enrico.scholz=sigma-chemnitz.de@lists.openembedded.org> wrote: > > Unfortunately there likely isn't a perfect rename and these were the > > best we were able to come up with. > > Why is the rename done at all? It makes the product just worse due to a > less usable api and because it causes a lot of work for the users of the > api (I shudder when I think about updating my CI to work with new and > old branches). Enrico, cut this out please. Every time you start with a specific, kind of valid, question, then turn it into a rant about how the whole effort is useless or worse. People simply will stop answering you if this carries on. Alex
On Fri, 2022-02-18 at 15:27 +0100, Enrico Scholz wrote: > Richard Purdie <richard.purdie@linuxfoundation.org> writes: > > > > > This script searches for a list of variable that have been renamed > > > > and converts them to their more descriptive names. > > s/descriptive/politically correct/ We did try and make some of the names better describe what the variables do and make the variables more consistent. For example, a reference to HASHBASE was changed to BASEHASH which better matches every other reference to that thing in the code. The BB_ENV variables were also traditionally very confusing. Once you know, you know but for new users they weren't great. Also see the discussion about the license variable issues. I'm very consciously blocking the "simple" change as I want to see things improve. > Variable names were perfectly readable. A change should make things > better, not worse. See above, there are at least some changes which do. I'm sad you don't agree. > > Unfortunately there likely isn't a perfect rename and these were the > > best we were able to come up with. > > Why is the rename done at all? It makes the product just worse due to a > less usable api and because it causes a lot of work for the users of the > api (I shudder when I think about updating my CI to work with new and > old branches). For better or worse some of the terminology we have is offensive to some people. Obviously (and sadly) there will probably always be some element of something which someone could be offended by but these issues have come under a particular spotlight. People with far more experience of this than us have produced guidelines on the key issues, the priority of dealing with certain language and so on. We do have people wanting to try and improve things. Projects do need to be open to and able to change. If we don't try and take this input and help those people make such changes where appropriate, we'll just alienate and frustrate more users even if those users were not personally offended by the language, a kind of "rot" can set in to our community. I do not want to see that. I have advised caution all the way through this. Should the project get this wrong, the potential for negative social media feedback against the project is huge. For me personally, there is also a huge risk should I "misstep" in handling this. The potential to break things for users is also really high, I realise that. I have done a lot of work to try and make sure issues are at least clearly reported to users with some idea of how to resolve them so the transition is less painful. I do have some experience of making changes to the project and am trying my best to use that to our benefit. At this stage, rightly or wrongly (I make no judgement) the project cannot afford not to make these changes. The best thing we can do is to try and reduce the impact on users as best we can. In doing so, we also make the project's values very clear. We do not mean any offence by the language we've had, we just didn't realise the issues. Now we are aware, we're doing our best to correct it. Cheers, Richard
Richard Purdie <richard.purdie@linuxfoundation.org> writes: >> > > > This script searches for a list of variable that have been renamed >> > > > and converts them to their more descriptive names. >> >> s/descriptive/politically correct/ > > We did try and make some of the names better describe what the variables > do and make the variables more consistent. > > For example, a reference to HASHBASE was changed to BASEHASH which > better matches every other reference to that thing in the code. When you really feel that this Yoda style naming must be fixed; then ok. But for me, the gain would not be large enough to break the api. > The BB_ENV variables were also traditionally very confusing. Once you > know, you know but for new users they weren't great. The new naming is much more confusing. While this kind of operation (exclude or include something) was previously named by "whitelist" + "blacklist" more or less consistently, it is scattered around across multiple variants now. Especially for the hash related lists, searching for "whitelist" or "blacklist" in the sources revealed often the right way how to do things. > Also see the discussion about the license variable issues. I'm very > consciously blocking the "simple" change as I want to see things > improve. > >> Variable names were perfectly readable. A change should make things >> better, not worse. > > See above, there are at least some changes which do. I'm sad you don't > agree. ok; some changes are ok. E.g. | "PNBLACKLIST":"SKIP_RECIPE", because former name is misleading (it is not really a list/set but has some specially addressing by varflags). >> Why is the rename done at all? It makes the product just worse due to a >> less usable api and because it causes a lot of work for the users of the >> api (I shudder when I think about updating my CI to work with new and >> old branches). > > For better or worse some of the terminology we have is offensive to > some people. Is there any real-world evidence that this is really the case? Especially the embedded sector is filled with blacklisted terms (i2c master/slave, spi mosi/miso) so I do not think that somebody is really offended. In the opposite, I feel offended when such changes are labeled as (technical) improvements ("more descriptive names"). > Obviously (and sadly) there will probably always be some element of > something which someone could be offended by but these issues have > come under a particular spotlight. People with far more experience of > this than us have produced guidelines on the key issues, In my experience, such people have too much spare time and invented a problem only to come up with a solution later. > We do have people wanting to try and improve things. Projects do > need to be open to and able to change. If we don't try and take this > input and help those people make such changes where appropriate, > we'll just alienate and frustrate more users even if those users > were not personally offended by the language, a kind of "rot" can > set in to our community. I do not want to see that I see a much higher risk when significant changes are done due to political reasons rather than technical ones. > I have advised caution all the way through this. Should the project > get this wrong, the potential for negative social media feedback > against the project is huge. Does such negative social media feedback really exist resp. does it exist in channels that are relevant to us? Linux kernel has not "cleaned" its api either and just said that future development should avoid certain terms. I can not remember any negative feedback. Just do the same here: avoid some terms in future development and when there are really technical reasons to touch existing variables, then rename them. > For me personally, there is also a huge risk should I "misstep" in > handling this. The potential to break things for users is also really > high, I realise that. I have done a lot of work to try and make sure > issues are at least clearly reported to users with some idea of how > to resolve them so the transition is less painful. I do have some > experience of making changes to the project and am trying my best to > use that to our benefit. You are doing a great job, but the project evolved beautiful over the last >15 years with the old identifier and I can not remember any complaints about them. Enrico
diff --git a/scripts/contrib/convert-variables.py b/scripts/contrib/convert-variables.py new file mode 100755 index 0000000000..a632fd4d5c --- /dev/null +++ b/scripts/contrib/convert-variables.py @@ -0,0 +1,110 @@ +#!/usr/bin/env python3 +# +# Conversion script to rename variables with more descriptive terms +# +# +# SPDX-License-Identifier: GPL-2.0-only +# + +import re +import os +import sys +import tempfile +import shutil +import mimetypes + +if len(sys.argv) < 2: + print("Please specify a directory to run the conversion script against.") + sys.exit(1) + +renames = { +"BB_ENV_WHITELIST":"BB_ENV_PASSTHROUGH", +"BB_ENV_EXTRAWHITE":"BB_ENV_PASSTHROUGH_ADDITIONS", +"BB_HASHCONFIG_WHITELIST":"BB_HASHCONFIG_IGNORE_VARS", +"BB_SETSCENE_ENFORCE_WHITELIST":"BB_SETSCENE_ENFORCE_IGNORE_TASKS", +"BB_HASHBASE_WHITELIST":"BB_BASEHASH_IGNORE_VARS", +"BB_HASHTASK_WHITELIST":"BB_TASKHASH_IGNORE_TASKS", +"CVE_CHECK_PN_WHITELIST":"CVE_CHECK_SKIP_RECIPE", +"CVE_CHECK_WHITELIST":"CVE_CHECK_IGNORE", +"MULTI_PROVIDER_WHITELIST":"BB_MULTI_PROVIDER_ALLOWED", +"PNBLACKLIST":"SKIP_RECIPE", +"SDK_LOCAL_CONF_BLACKLIST":"ESDK_LOCAL_CONF_ALLOW", +"SDK_LOCAL_CONF_WHITELIST":"ESDK_LOCAL_CONF_REMOVE", +"SDK_INHERIT_BLACKLIST":"ESDK_CLASS_INHERIT_DISABLE", +"SSTATE_DUPWHITELIST":"SSTATE_ALLOW_OVERLAP_FILES", +"SYSROOT_DIRS_BLACKLIST":"SYSROOT_DIRS_IGNORE", +"UNKNOWN_CONFIGURE_WHITELIST":"UNKNOWN_CONFIGURE_OPT_IGNORE", +"WHITELIST_":"INCOMPATIBLE_LICENSE_ALLOWED_RECIPE_", +} + +removed_list = [ +"BB_STAMP_WHITELIST", +"BB_STAMP_POLICY", +"ICECC_USER_CLASS_BL", +"ICECC_USER_PACKAGE_BL", +"ICECC_USER_PACKAGE_WL", +"INHERIT_BLACKLIST", +"TUNEABI_WHITELIST", +] + +context_check_list = [ +"blacklist", +"whitelist", +"abort", +] + +def processfile(fn): + + print(f"processing file '{fn}'") + try: + fh, abs_path = tempfile.mkstemp() + modified = False + with os.fdopen(fh, 'w') as new_file: + with open(fn, "r") as old_file: + lineno = 0 + for line in old_file: + lineno += 1 + if not line or "BB_RENAMED_VARIABLE" in line: + continue + # Do the renames + for old_name, new_name in renames.items(): + if old_name in line: + line = line.replace(old_name, new_name) + modified = True + # Find removed names + for removed_name in removed_list: + if removed_name in line: + print(f"{fn} needs further work at line {lineno} because {removed_name} has been deprecated") + for check_word in context_check_list: + if re.search(check_word, line, re.IGNORECASE): + print(f"{fn} needs further work at line {lineno} since it contains {check_word}") + new_file.write(line) + if modified: + print(f"*** Modified file '{fn}'") + shutil.copymode(fn, abs_path) + os.remove(fn) + shutil.move(abs_path, fn) + except UnicodeDecodeError: + pass + +ourname = os.path.basename(sys.argv[0]) +ourversion = "0.1" + +if os.path.isfile(sys.argv[1]): + processfile(sys.argv[1]) + sys.exit(0) + +for targetdir in sys.argv[1:]: + print("processing directory '%s'" % targetdir) + for root, dirs, files in os.walk(targetdir): + for name in files: + if name == ourname: + continue + fn = os.path.join(root, name) + if os.path.islink(fn): + continue + if "ChangeLog" in fn or "/.git/" in fn or fn.endswith(".html") or fn.endswith(".patch") or fn.endswith(".m4") or fn.endswith(".diff") or fn.endswith(".orig"): + continue + processfile(fn) + +print("All files processed with version %s" % ourversion)