Patchwork [bitbake-devel,RFC] bitbake: Rewrite fetch2.decodeurl() to use urlparse.urlsplit()

Submitter Phil Blundell
Date Jan. 10, 2014, 4:28 p.m.
Message ID <1389371323.9182.169.camel@phil-desktop.brightsign>
Permalink /patch/64551/
State New

Comments

Phil Blundell - Jan. 10, 2014, 4:28 p.m.
This means that it now understands "standard" URI syntax as well as
the slightly odd legacy bitbake variant.

There are other places in bitbake (e.g. Local.urldata_init) that also 
need fixing, but this is a start.

Signed-off-by: Phil Blundell <pb@pbcl.net>
---
 lib/bb/fetch2/__init__.py | 60 ++++++++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 27 deletions(-)
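
The "file://" fix-up mentioned above can be illustrated with a minimal sketch. It assumes Python 3, where urlsplit lives in urllib.parse (the patch itself targets Python 2's urlparse module), and the file name "defconfig" is only an illustrative value. Without the fix-up, a standard parser reads the text after "//" as a network location rather than a path:

```python
from urllib.parse import urlsplit  # Python 3 home of Python 2's urlparse.urlsplit

# Legacy bitbake form: the author means "defconfig" to be a path,
# but standard URI syntax reads it as the network location.
legacy = urlsplit("file://defconfig")
print(repr(legacy.netloc), repr(legacy.path))   # 'defconfig' ''

# The fix-up from the patch: drop the "//" so nothing is parsed as a host.
fixed_url = "file:" + "file://defconfig"[7:]
fixed = urlsplit(fixed_url)
print(repr(fixed.netloc), repr(fixed.path))     # '' 'defconfig'
```

After the rewrite the path is non-empty, so the later "if not path" check in the patch no longer rejects legacy file URLs.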
Martin Jansa - Jan. 16, 2014, 2:21 p.m.
On Fri, Jan 10, 2014 at 04:28:43PM +0000, Phil Blundell wrote:
> This means that it now understands "standard" URI syntax as well as
> the slightly odd legacy bitbake variant.
> 
> There are other places in bitbake (e.g. Local.urldata_init) that also 
> need fixing, but this is a start.

I agree it's a good start. I was trying to test this together with
http://lists.openembedded.org/pipermail/bitbake-devel/2014-January/004327.html

and bitbake-selftest shows a failure on a different URL. Did it pass for you?
- ('http', 'www.google.com', '/index.html', None, None, {})
+ ('http', 'www.google.com', '/index.html', '', '', {})

There were also a few errors before that, such as:
  File "/usr/lib64/python2.7/re.py", line 238, in _compile
    raise TypeError, "first argument must be string or compiled pattern"
TypeError: first argument must be string or compiled pattern
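
A likely cause of the None-versus-'' mismatch is that optional named groups which do not take part in a match are reported as None by the re module, and the patch assigns them directly. The sketch below reuses the netloc pattern from the patch; the "or ''" normalisation is a suggestion, not something the patch contains. Whether the TypeError above stems from the same None values reaching a later re call is an assumption that the trace shown does not confirm.

```python
import re

# Same netloc pattern as in the patch under review.
NETLOC_RE = re.compile(r'((?P<user>[^:@]+)(:(?P<pswd>[^@]+))?@)?(?P<host>.+)')

m = NETLOC_RE.match("www.google.com")
raw_user = m.group('user')      # None: the optional group did not participate
user = m.group('user') or ''    # normalised to the empty string
pswd = m.group('pswd') or ''
host = m.group('host')
print((raw_user, user, pswd, host))   # (None, '', '', 'www.google.com')

# With credentials present, the groups are filled in as expected.
m2 = NETLOC_RE.match("user:pw@host.example")
print(m2.group('user'), m2.group('pswd'), m2.group('host'))   # user pw host.example
```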

> Signed-off-by: Phil Blundell <pb@pbcl.net>
> ---
>  lib/bb/fetch2/__init__.py | 60 ++++++++++++++++++++++++++---------------------
>  1 file changed, 33 insertions(+), 27 deletions(-)
> 
> diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
> index 260fb37..4886dae 100644
> --- a/lib/bb/fetch2/__init__.py
> +++ b/lib/bb/fetch2/__init__.py
> @@ -329,40 +329,46 @@ def decodeurl(url):
>      user, password, parameters).
>      """
>  
> -    m = re.compile('(?P<type>[^:]*)://((?P<user>.+)@)?(?P<location>[^;]+)(;(?P<parm>.*))?').match(url)
> -    if not m:
> +    if url.startswith("file://"):
> +        # This is an old-style bitbake URL.  Fix it up.
> +        url = "file:" + url[7:]
> +
> +    import urlparse
> +    d = urlparse.urlsplit(url)
> +    if not d.scheme:
>          raise MalformedUrl(url)
>  
> -    type = m.group('type')
> -    location = m.group('location')
> -    if not location:
> +    netloc = d.netloc
> +    path = d.path
> +
> +    if not path:
>          raise MalformedUrl(url)
> -    user = m.group('user')
> -    parm = m.group('parm')
>  
> -    locidx = location.find('/')
> -    if locidx != -1 and type.lower() != 'file':
> -        host = location[:locidx]
> -        path = location[locidx:]
> -    else:
> -        host = ""
> -        path = location
> -    if user:
> -        m = re.compile('(?P<user>[^:]+)(:?(?P<pswd>.*))').match(user)
> -        if m:
> -            user = m.group('user')
> -            pswd = m.group('pswd')
> -    else:
> -        user = ''
> -        pswd = ''
> +    user = ''
> +    pswd = ''
> +    host = ''
> +
> +    if netloc:
> +        m = re.compile('((?P<user>[^:@]+)(:(?P<pswd>[^@]+))?@)?(?P<host>.+)').match(netloc)
> +        if not m:
> +            raise MalformedUrl(url)
> +
> +        user = m.group('user')
> +        pswd = m.group('pswd')
> +        host = m.group('host')
>  
>      p = {}
> -    if parm:
> -        for s in parm.split(';'):
> -            s1, s2 = s.split('=')
> -            p[s1] = s2
> +    sep = path.find(";")
> +    if sep != -1:
> +        for s in path[sep+1:].split(';'):
> +            try:
> +                s1, s2 = s.split('=')
> +                p[s1] = s2
> +            except ValueError:
> +                raise MalformedUrl(url)
> +        path = path[:sep]
>  
> -    return type, host, urllib.unquote(path), user, pswd, p
> +    return d.scheme, host, urllib.unquote(path), user, pswd, p
>  
>  def encodeurl(decoded):
>      """Encodes a URL from tokens (scheme, network location, path,
> -- 
> 1.8.5
Olof Johansson - Jan. 16, 2014, 2:45 p.m.
On 14-01-10 17:28 +0100, Phil Blundell wrote:
> This means that it now understands "standard" URI syntax as well as
> the slightly odd legacy bitbake variant.
> 
> There are other places in bitbake (e.g. Local.urldata_init) that also 
> need fixing, but this is a start.

I wrote a URI class last year that got integrated to bitbake's
fetch2, but the commit that actually made decode/encodeurl a
wrapper around it was reverted because I missed adding support
for query params (oops :-)). The class itself is still intact
though (it's just above decodeurl).

I did send fixes for that (adding support for query params), but
they haven't been merged. Perhaps I should resend?
Richard Purdie - Jan. 17, 2014, 12:33 p.m.
On Thu, 2014-01-16 at 15:45 +0100, Olof Johansson wrote:
> On 14-01-10 17:28 +0100, Phil Blundell wrote:
> > This means that it now understands "standard" URI syntax as well as
> > the slightly odd legacy bitbake variant.
> > 
> > There are other places in bitbake (e.g. Local.urldata_init) that also 
> > need fixing, but this is a start.
> 
> I wrote a URI class last year that got integrated to bitbake's
> fetch2, but the commit that actually made decode/encodeurl a
> wrapper around it was reverted because I missed adding support
> for query params (oops :-)). The class itself is still intact
> though (it's just above decodeurl).
> 
> I did send fixes for that (adding support for query params), but
> they haven't been merged. Perhaps I should resend?

Please do, they've fallen off the radar...

Cheers,

Richard

Patch

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index 260fb37..4886dae 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -329,40 +329,46 @@  def decodeurl(url):
     user, password, parameters).
     """
 
-    m = re.compile('(?P<type>[^:]*)://((?P<user>.+)@)?(?P<location>[^;]+)(;(?P<parm>.*))?').match(url)
-    if not m:
+    if url.startswith("file://"):
+        # This is an old-style bitbake URL.  Fix it up.
+        url = "file:" + url[7:]
+
+    import urlparse
+    d = urlparse.urlsplit(url)
+    if not d.scheme:
         raise MalformedUrl(url)
 
-    type = m.group('type')
-    location = m.group('location')
-    if not location:
+    netloc = d.netloc
+    path = d.path
+
+    if not path:
         raise MalformedUrl(url)
-    user = m.group('user')
-    parm = m.group('parm')
 
-    locidx = location.find('/')
-    if locidx != -1 and type.lower() != 'file':
-        host = location[:locidx]
-        path = location[locidx:]
-    else:
-        host = ""
-        path = location
-    if user:
-        m = re.compile('(?P<user>[^:]+)(:?(?P<pswd>.*))').match(user)
-        if m:
-            user = m.group('user')
-            pswd = m.group('pswd')
-    else:
-        user = ''
-        pswd = ''
+    user = ''
+    pswd = ''
+    host = ''
+
+    if netloc:
+        m = re.compile('((?P<user>[^:@]+)(:(?P<pswd>[^@]+))?@)?(?P<host>.+)').match(netloc)
+        if not m:
+            raise MalformedUrl(url)
+
+        user = m.group('user')
+        pswd = m.group('pswd')
+        host = m.group('host')
 
     p = {}
-    if parm:
-        for s in parm.split(';'):
-            s1, s2 = s.split('=')
-            p[s1] = s2
+    sep = path.find(";")
+    if sep != -1:
+        for s in path[sep+1:].split(';'):
+            try:
+                s1, s2 = s.split('=')
+                p[s1] = s2
+            except ValueError:
+                raise MalformedUrl(url)
+        path = path[:sep]
 
-    return type, host, urllib.unquote(path), user, pswd, p
+    return d.scheme, host, urllib.unquote(path), user, pswd, p
 
 def encodeurl(decoded):
     """Encodes a URL from tokens (scheme, network location, path,
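
For readers who want to experiment with the approach outside bitbake, here is a self-contained sketch of the same logic. It is not the code that was merged: it is ported to Python 3 (urllib.parse instead of urlparse and urllib.unquote), the name decodeurl_sketch and the stand-in MalformedUrl class are hypothetical, and it normalises unmatched user/password groups to '' so that the selftest tuple quoted in the comments above would compare equal. The example URLs are illustrative only.

```python
import re
from urllib.parse import unquote, urlsplit


class MalformedUrl(Exception):
    """Stand-in for bitbake's exception of the same name (assumed here)."""


# Netloc pattern taken from the patch: optional user[:password]@, then host.
_NETLOC_RE = re.compile(r'((?P<user>[^:@]+)(:(?P<pswd>[^@]+))?@)?(?P<host>.+)')


def decodeurl_sketch(url):
    """Return (scheme, host, path, user, password, parameters)."""
    if url.startswith("file://"):
        # Old-style bitbake URL: treat everything after "file://" as the path.
        url = "file:" + url[7:]

    d = urlsplit(url)
    if not d.scheme or not d.path:
        raise MalformedUrl(url)

    user = pswd = host = ''
    if d.netloc:
        m = _NETLOC_RE.match(d.netloc)
        if not m:
            raise MalformedUrl(url)
        # Unmatched optional groups are None; normalise them to ''.
        user = m.group('user') or ''
        pswd = m.group('pswd') or ''
        host = m.group('host')

    # bitbake-style ";key=value" parameters trail the path.
    params = {}
    path, sep, rest = d.path.partition(';')
    if sep:
        for item in rest.split(';'):
            try:
                key, value = item.split('=')
            except ValueError:
                raise MalformedUrl(url)
            params[key] = value

    return d.scheme, host, unquote(path), user, pswd, params


print(decodeurl_sketch("http://www.google.com/index.html"))
print(decodeurl_sketch("file://defconfig;apply=yes"))
```

As in the patch, any query string or fragment that urlsplit separates out is ignored here, and a URL with no path at all is rejected as malformed.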