URL parsing functions in Python (urlparse, urlunparse, urlsplit, urlunsplit, urljoin)

IT之一小佬, published 2022-05-05 17:00:46

urlparse()

urlparse() splits a URL into six parts and returns a named tuple (scheme, netloc, path, params, query, fragment). The parts can then be reassembled into a URL with urlunparse().

def urlparse(url, scheme='', allow_fragments=True):
    """Parse a URL into 6 components:
    <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
    Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes."""
    url, scheme, _coerce_result = _coerce_args(url, scheme)
    splitresult = urlsplit(url, scheme, allow_fragments)
    scheme, netloc, url, query, fragment = splitresult
    if scheme in uses_params and ';' in url:
        url, params = _splitparams(url)
    else:
        params = ''
    result = ParseResult(scheme, netloc, url, params, query, fragment)
    return _coerce_result(result)

Note: the tuple returned by urlparse() can be used to determine the network protocol (HTTP, FTP, etc.), the server address, the file path, and so on.

Example code:

from urllib.parse import urlparse


url = urlparse('http://www.baidu.com/index.php?username=dgw')
print(url)
print(url.netloc)
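
As a rough illustration of the six components, the sketch below parses a URL that also carries a params part and a fragment. The `;type=a` and `#top` pieces are made up for this example and do not come from the original post. The fields are read through the named attributes of the returned ParseResult.

```python
from urllib.parse import urlparse

# Hypothetical URL, chosen only so that all six fields are non-empty.
result = urlparse('http://www.baidu.com/index.php;type=a?username=dgw#top')

print(result.scheme)    # http
print(result.netloc)    # www.baidu.com
print(result.path)      # /index.php
print(result.params)    # type=a
print(result.query)     # username=dgw
print(result.fragment)  # top

# The result is also an ordinary 6-tuple, so it can be unpacked by position.
scheme, netloc, path, params, query, fragment = result
```

Attribute access tends to be easier to read than indexing, while positional unpacking is handy when all six values are needed at once.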

urlunparse()

urlunparse() assembles a tuple (scheme, netloc, path, params, query, fragment) into a correctly formatted URL.

def urlunparse(components):
    """Put a parsed URL back together again.  This may result in a
    slightly different, but equivalent URL, if the URL that was parsed
    originally had redundant delimiters, e.g. a ? with an empty query
    (the draft states that these are equivalent)."""
    scheme, netloc, url, params, query, fragment, _coerce_result = (
                                                  _coerce_args(*components))
    if params:
        url = "%s;%s" % (url, params)
    return _coerce_result(urlunsplit((scheme, netloc, url, query, fragment)))

Example code:

from urllib.parse import urlparse, urlunparse


url = urlparse('http://www.baidu.com/index.php?username=dgw')
print(url)
url_join1 = urlunparse(url)
print(url_join1)

url_tuple = ("http", "www.baidu.com", "index.php", "", "username=dgw", "")
url_join2 = urlunparse(url_tuple)
print(url_join2)
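
One common use of the urlparse()/urlunparse() pair is to change a single component and rebuild the URL. The sketch below is one possible way to do that, relying on the fact that ParseResult is a named tuple and therefore has `_replace()`. The replacement query value `username=admin` is an assumption made for illustration.

```python
from urllib.parse import urlparse, urlunparse

parts = urlparse('http://www.baidu.com/index.php?username=dgw')

# _replace() returns a modified copy; the original ParseResult is unchanged.
# 'username=admin' is an illustrative value, not taken from the original post.
new_parts = parts._replace(query='username=admin')

new_url = urlunparse(new_parts)
print(new_url)  # http://www.baidu.com/index.php?username=admin
```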

urlsplit()

urlsplit() is mainly used to parse a URL string and returns a 5-tuple (scheme, netloc, path, query, fragment). It is similar to urlparse(), except that it does not split the params out of the path.

def urlsplit(url, scheme='', allow_fragments=True):
    """Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>
    Return a 5-tuple: (scheme, netloc, path, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes."""
    url, scheme, _coerce_result = _coerce_args(url, scheme)
    allow_fragments = bool(allow_fragments)
    key = url, scheme, allow_fragments, type(url), type(scheme)
    cached = _parse_cache.get(key, None)
    ......

Example code:

from urllib.parse import urlparse, urlsplit


url = urlparse('http://www.baidu.com/index.php?username=dgw')
print(url)

url2 = urlsplit('http://www.baidu.com/index.php?username=dgw')
print(url2)
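
The difference between the two functions only shows up when the path contains a `;params` part. The sketch below parses the same URL both ways. The `;type=a` suffix is an assumption added for this example.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL with a ';params' part appended to the path.
sample = 'http://www.baidu.com/index.php;type=a?username=dgw'

parsed = urlparse(sample)
split = urlsplit(sample)

# urlparse() moves the params into their own field.
print(parsed.path, parsed.params)  # /index.php type=a

# urlsplit() leaves them inside the path.
print(split.path)                  # /index.php;type=a

print(len(parsed), len(split))     # 6 5
```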

urlunsplit()

def urlunsplit(components):
    """Combine the elements of a tuple as returned by urlsplit() into a
    complete URL as a string. The data argument can be any five-item iterable.
    This may result in a slightly different, but equivalent URL, if the URL that
    was parsed originally had unnecessary delimiters (for example, a ? with an
    empty query; the RFC states that these are equivalent)."""
    scheme, netloc, url, query, fragment, _coerce_result = (
                                          _coerce_args(*components))
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/': url = '/' + url
    ......

Example code:

from urllib.parse import urlparse, urlsplit, urlunsplit


url = urlparse('http://www.baidu.com/index.php?username=dgw')
print(url)

url2 = urlsplit('http://www.baidu.com/index.php?username=dgw')
print(url2)

url3 = urlunsplit(url2)
print(url3)
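
urlunsplit() does not require the result of urlsplit(); as the docstring says, any five-item iterable works. The sketch below builds a URL from a plain list. The component values, including the `https` scheme and the `top` fragment, are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Five items in the order: scheme, netloc, path, query, fragment.
# The values are illustrative and not taken from the original post.
components = ['https', 'www.baidu.com', 'index.php', 'username=dgw', 'top']

built = urlunsplit(components)
print(built)  # https://www.baidu.com/index.php?username=dgw#top

# Because a netloc is present, the missing leading '/' was added to the path.
print(urlsplit(built).path)  # /index.php
```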

urljoin()

urljoin() joins a base URL with a possibly relative URL to form the absolute form of the latter.

Note: if the base URL does not end with a / character, the rightmost segment of the base URL's path is replaced by the relative path.

def urljoin(base, url, allow_fragments=True):
    """Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter."""
    if not base:
        return url
    if not url:
        return base

    base, url, _coerce_result = _coerce_args(base, url)
    ......

Example code:

from urllib.parse import urljoin


url = urljoin('http://www.baidu.com/test/', 'index.php?username=dgw')
print(url)

url2 = urljoin('http://www.baidu.com/test', 'index.php?username=dgw')
print(url2)
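
To make the trailing-slash rule concrete, the sketch below compares a few joins side by side. The first two calls repeat the example above; the `/a/b/` base and the `example.com` URL are assumptions added for illustration.

```python
from urllib.parse import urljoin

# Base ends with '/': the relative path is appended under /test/.
with_slash = urljoin('http://www.baidu.com/test/', 'index.php?username=dgw')
print(with_slash)     # http://www.baidu.com/test/index.php?username=dgw

# Base does not end with '/': the last segment 'test' is replaced.
without_slash = urljoin('http://www.baidu.com/test', 'index.php?username=dgw')
print(without_slash)  # http://www.baidu.com/index.php?username=dgw

# '..' climbs one directory level relative to the base path.
parent = urljoin('http://www.baidu.com/a/b/', '../index.php')
print(parent)         # http://www.baidu.com/a/index.php

# A leading '/' replaces the whole base path.
rooted = urljoin('http://www.baidu.com/a/b/', '/index.php')
print(rooted)         # http://www.baidu.com/index.php

# An absolute URL as the second argument wins over the base entirely.
absolute = urljoin('http://www.baidu.com/a/b/', 'https://example.com/x')
print(absolute)       # https://example.com/x
```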
