Message ID | 20220203170724.1319808-1-saul.wold@windriver.com |
---|---|
State | New |
Series | recipetool/create: Scan for SDPX-License-Identifier |
On Thu, 2022-02-03 at 09:07 -0800, Saul Wold wrote:
> When a file cannot be identified by checksum and it contains an
> SPDX-License-Identifier tag, use it as the found license.
>
> [YOCTO #14529]
>
> Tested with LICENSE files that contain 1 or more SPDX-License-Identifier tags
>
> Signed-off-by: Saul Wold <saul.wold@windriver.com>

I think to close this bug the code may need to go one step further and
effectively grep over the source tree.

We'd probably want to list the value of any SPDX-License-Identifier: header
found in any of the source files for the user to then decide upon?

Or am I misunderstanding?

Cheers,

Richard

On 2/3/22 13:24, Richard Purdie wrote:
> I think to close this bug the code may need to go one step further and
> effectively grep over the source tree.
>
> We'd probably want to list the value of any SPDX-License-Identifier: header
> found in any of the source files for the user to then decide upon?

That's moving into create-spdx.bbclass territory, I think. The change would
need to be much larger, and I will likely have to shelve it for a while.

> Or am I misunderstanding?

Maybe it's my misunderstanding; Tim mentioned the LICENSE-related files in
the bug report.

Sau!

On Thu, 2022-02-03 at 13:58 -0800, Saul Wold wrote:
> On 2/3/22 13:24, Richard Purdie wrote:
> > I think to close this bug the code may need to go one step further and
> > effectively grep over the source tree.
> >
> > We'd probably want to list the value of any SPDX-License-Identifier: header
> > found in any of the source files for the user to then decide upon?
>
> That's moving into create-spdx.bbclass territory, I think. The change would
> need to be much larger, and I will likely have to shelve it for a while.

This isn't related to create-spdx.

> > Or am I misunderstanding?
>
> Maybe it's my misunderstanding; Tim mentioned the LICENSE-related files in
> the bug report.

Right, we want to "guess" what the right LICENSE is for the new recipe. To do
that, wouldn't we scan all the source for SPDX-License-Identifier: lines in the
headers, add those all together and suggest that as the LICENSE field?

Cheers,

Richard

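Richard's suggestion above (scan every source file header for SPDX-License-Identifier: lines and combine what is found into a proposed LICENSE value) could look roughly like the following sketch. It is not part of the patch or of recipetool: the names `collect_spdx_ids`, `suggest_license` and `SPDX_LINE_RE`, the 20-line header limit, and the choice to join the results with `&` are all assumptions made for illustration.

```python
import os
import re

# Hypothetical pattern: accepts the characters that occur in SPDX license
# expressions, including '+' and parentheses.
SPDX_LINE_RE = re.compile(r'SPDX-License-Identifier:\s*([-A-Za-z0-9.+() ]+)')


def collect_spdx_ids(srctree, max_lines=20):
    """Map each SPDX expression found in a file header to the files using it."""
    found = {}
    for root, dirs, files in os.walk(srctree):
        # Skip version-control metadata.
        dirs[:] = [d for d in dirs if d != '.git']
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, 'r', errors='ignore') as f:
                    # Only look at the first max_lines lines of each file.
                    for _, line in zip(range(max_lines), f):
                        m = SPDX_LINE_RE.search(line)
                        if m:
                            relpath = os.path.relpath(path, srctree)
                            found.setdefault(m.group(1).strip(), []).append(relpath)
                            break
            except OSError:
                # Unreadable file (for example a dangling symlink): ignore it.
                continue
    return found


def suggest_license(found):
    """Join the distinct expressions with '&' as a starting point for LICENSE."""
    return ' & '.join(sorted(found))
```

The result is only a suggestion for the user to review: an expression such as `GPL-2.0-only OR MIT` would still have to be translated by hand into the recipe's LICENSE syntax, and the sketch says nothing about whether the matching license text is present in the tree.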
Hi Saul,

On 03.02.2022 at 18:07, Saul Wold via lists.openembedded.org wrote:
> When a file cannot be identified by checksum and it contains an
> SPDX-License-Identifier tag, use it as the found license.
>
> [YOCTO #14529]
>
> Tested with LICENSE files that contain 1 or more SPDX-License-Identifier tags

Can you please give an example of a project which uses an
SPDX-License-Identifier inside a license file?

> @@ -1221,14 +1221,20 @@ def guess_license(srctree, d):
>      for licfile in sorted(licfiles):
>          md5value = bb.utils.md5_file(licfile)
>          license = md5sums.get(md5value, None)
> +        license_list = []

Could you please use another name? We already have licenses, and it is hard to
tell licenses and license_list apart.

Regards
Stefan

Hi Richard,

On 03.02.2022 at 22:24, Richard Purdie via lists.openembedded.org wrote:
> I think to close this bug the code may need to go one step further and
> effectively grep over the source tree.

Please keep in mind that we need the full license text, not only the license
name, for license compliance. The current function only searches for license
files that contain license text.

> We'd probably want to list the value of any SPDX-License-Identifier: header
> found in any of the source files for the user to then decide upon?

I think this is a separate feature, like a license checker, because if you
have an SPDX-License-Identifier without a license text you have a license
violation.

This brings us to the problem that this code will interpret a file containing
only an SPDX-License-Identifier as a license file with license text.

Regards
Stefan

On Fri, 2022-02-04 at 10:05 +0100, Stefan Herbrechtsmeier wrote:
> On 03.02.2022 at 22:24, Richard Purdie via lists.openembedded.org wrote:
> > I think to close this bug the code may need to go one step further and
> > effectively grep over the source tree.
>
> Please keep in mind that we need the full license text, not only the license
> name, for license compliance. The current function only searches for license
> files that contain license text.
>
> > We'd probably want to list the value of any SPDX-License-Identifier: header
> > found in any of the source files for the user to then decide upon?
>
> I think this is a separate feature, like a license checker, because if you
> have an SPDX-License-Identifier without a license text you have a license
> violation.
>
> This brings us to the problem that this code will interpret a file containing
> only an SPDX-License-Identifier as a license file with license text.

As I understand it the tool is there to help write a recipe, so filling out
LICENSE and highlighting a missing full license text would be a valid approach
for the tool and helpful to the user?

It certainly isn't intended as full validation, just intended to assist the
creation of a recipe.

Cheers,

Richard

On 04.02.2022 at 14:41, Richard Purdie wrote:
> As I understand it the tool is there to help write a recipe so filling out
> LICENSE and highlighting a missing full license text would be a valid approach
> for the tool and helpful to the user?

Yes, but we should distinguish between license files, which are guessed via a
hash of their content, and the SPDX-License-Identifier, which labels the source
code's license. In this case the SPDX-License-Identifier is non-material text
in a license file and should be filtered out inside the crunch_license
function.

The collection of all used licenses via SPDX-License-Identifier is an
additional feature, and we need a warning if an SPDX-License-Identifier exists
without a license file.

> It certainly isn't intended as full validation, just intended to assist the
> creation of a recipe.

But this patch is a regression because it doesn't distinguish between a
license file with a known hash and a mostly empty file with an
SPDX-License-Identifier.

Regards
Stefan

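Stefan's two points, filtering the SPDX tag out before the license text is hashed and telling a real license file apart from a file that holds little more than a tag, can be illustrated with a small sketch. It does not reproduce recipetool's actual crunch_license; the names `split_spdx_tags`, `classify` and `SPDX_TAG_RE`, the 200-character threshold and the use of a plain MD5 over the remaining text are assumptions chosen for illustration.

```python
import hashlib
import re

# Hypothetical pattern: an SPDX tag at the start of a line, optionally
# preceded by comment characters such as '#', '//' or '/*'.
SPDX_TAG_RE = re.compile(r'^\W*SPDX-License-Identifier:\s*([-A-Za-z0-9.+() ]+)')


def split_spdx_tags(lines):
    """Separate SPDX tag lines from the remaining (material) text."""
    tags, body = [], []
    for line in lines:
        m = SPDX_TAG_RE.match(line)
        if m:
            tags.append(m.group(1).strip())
        else:
            body.append(line)
    return tags, body


def classify(lines, min_body_chars=200):
    """Return (kind, tags, digest) for the lines of a candidate license file.

    kind is 'tag-only' when the file has SPDX tags but almost no other text,
    otherwise 'text'. digest is the MD5 of the text without the tag lines, so
    adding or removing a tag does not change it.
    """
    tags, body = split_spdx_tags(lines)
    text = ''.join(body).strip()
    digest = hashlib.md5(text.encode('utf-8')).hexdigest() if text else None
    kind = 'tag-only' if tags and len(text) < min_body_chars else 'text'
    return kind, tags, digest
```

With this split, a caller could look the digest up in a table of known license checksums for `'text'` files and emit a warning for `'tag-only'` files, since those name a license without shipping its text.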
diff --git a/scripts/lib/recipetool/create.py b/scripts/lib/recipetool/create.py
index 507a230511..9149c2d94f 100644
--- a/scripts/lib/recipetool/create.py
+++ b/scripts/lib/recipetool/create.py
@@ -1221,14 +1221,20 @@ def guess_license(srctree, d):
     for licfile in sorted(licfiles):
         md5value = bb.utils.md5_file(licfile)
         license = md5sums.get(md5value, None)
+        license_list = []
         if not license:
             license, crunched_md5, lictext = crunch_license(licfile)
             if lictext and not license:
-                license = 'Unknown'
-                logger.info("Please add the following line for '%s' to a 'lib/recipetool/licenses.csv' " \
-                    "and replace `Unknown` with the license:\n" \
-                    "%s,Unknown" % (os.path.relpath(licfile, srctree), md5value))
-        if license:
+                spdx_re = re.compile('SPDX-License-Identifier:\s+([-A-Za-z\d. ]+)[ |\n|\r\n]*?')
+                license_list = re.findall(spdx_re, "\n".join(lictext))
+                if not license_list:
+                    license_list.append('Unknown')
+                    logger.info("Please add the following line for '%s' to a 'lib/recipetool/licenses.csv' " \
+                        "and replace `Unknown` with the license:\n" \
+                        "%s,Unknown" % (os.path.relpath(licfile, srctree), md5value))
+        else:
+            license_list.append(license)
+        for license in license_list:
             licenses.append((license, os.path.relpath(licfile, srctree), md5value))
 
     # FIXME should we grab at least one source file with a license header and add that too?
When a file cannot be identified by checksum and it contains an
SPDX-License-Identifier tag, use it as the found license.

[YOCTO #14529]

Tested with LICENSE files that contain 1 or more SPDX-License-Identifier tags

Signed-off-by: Saul Wold <saul.wold@windriver.com>
---
 scripts/lib/recipetool/create.py | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)
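
The regular expression used in the patch can be tried in isolation. The sketch below keeps the same pattern but writes it as a raw string, so that `\s` and `\d` reach the regex engine instead of being treated as string escapes; the helper name `find_spdx_ids` and the sample text are made up for this illustration.

```python
import re

# Same pattern as in the patch, written as a raw string.
SPDX_RE = re.compile(r'SPDX-License-Identifier:\s+([-A-Za-z\d. ]+)[ |\n|\r\n]*?')


def find_spdx_ids(text):
    """Return every identifier string captured by the pattern, stripped."""
    return [match.strip() for match in SPDX_RE.findall(text)]


sample = (
    "SPDX-License-Identifier: MIT\n"
    "Some other text\n"
    "SPDX-License-Identifier: GPL-2.0-only OR BSD-3-Clause  \n"
)
print(find_spdx_ids(sample))  # ['MIT', 'GPL-2.0-only OR BSD-3-Clause']
```

Two properties of the pattern are worth noting. The trailing `[ |\n|\r\n]*?` is a lazy quantifier at the very end of the pattern, so it always matches the empty string and has no effect; inside the brackets `|` is a literal character, not alternation. The capture class also omits `+` and parentheses, so `GPL-2.0+` is captured as `GPL-2.0` and a parenthesised expression such as `(MIT OR Apache-2.0)` is not matched at all.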