URL parsing functions in Python (urlparse, urlunparse, urlsplit, urlunsplit, urljoin)

IT之一小佬, published 2022-05-05 17:00:46

urlparse()

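urlparse() splits a URL into six components and returns a 6-item named tuple: (scheme, netloc, path, params, query, fragment). The result can be turned back into a URL string with urlunparse(); the 5-item result of urlsplit() is reassembled with urlunsplit().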

def urlparse(url, scheme='', allow_fragments=True):
    """Parse a URL into 6 components:
    <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
    Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes."""
    url, scheme, _coerce_result = _coerce_args(url, scheme)
    splitresult = urlsplit(url, scheme, allow_fragments)
    scheme, netloc, url, query, fragment = splitresult
    if scheme in uses_params and ';' in url:
        url, params = _splitparams(url)
    else:
        params = ''
    result = ParseResult(scheme, netloc, url, params, query, fragment)
    return _coerce_result(result)

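Note: the tuple returned by urlparse() can be used to determine the network protocol (HTTP, FTP, etc.), the server address, the file path, and so on.

Example code: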

from urllib.parse import urlparse


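url = urlparse('http://www.baidu.com/index.php?username=dgw')
# ParseResult(scheme='http', netloc='www.baidu.com', path='/index.php', params='', query='username=dgw', fragment='')
print(url)
# www.baidu.com
print(url.netloc)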
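
As a small sketch of reading the individual fields, the snippet below parses a URL that also carries params and a fragment. The URL itself (including ";type=a" and "#top") is made up for illustration and does not come from the example above.

```python
from urllib.parse import urlparse

# Hypothetical URL with every component filled in.
parts = urlparse('http://www.baidu.com/index.php;type=a?username=dgw#top')

print(parts.scheme)    # http
print(parts.netloc)    # www.baidu.com
print(parts.path)      # /index.php
print(parts.params)    # type=a
print(parts.query)     # username=dgw
print(parts.fragment)  # top

# The result is a named tuple, so it also supports indexing and unpacking.
scheme, netloc, path, params, query, fragment = parts
print(parts[1] == netloc)  # True
```

Attribute access is usually easier to read than positional indexing, since the order of the six fields does not have to be remembered.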

urlunparse()

urlunparse() assembles a 6-item iterable (scheme, netloc, path, params, query, fragment) into a correctly formatted URL string.

def urlunparse(components):
    """Put a parsed URL back together again.  This may result in a
    slightly different, but equivalent URL, if the URL that was parsed
    originally had redundant delimiters, e.g. a ? with an empty query
    (the draft states that these are equivalent)."""
    scheme, netloc, url, params, query, fragment, _coerce_result = (
                                                  _coerce_args(*components))
    if params:
        url = "%s;%s" % (url, params)
    return _coerce_result(urlunsplit((scheme, netloc, url, query, fragment)))

Example code:

from urllib.parse import urlparse, urlunparse


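url = urlparse('http://www.baidu.com/index.php?username=dgw')
print(url)
# Reassemble the parsed result: http://www.baidu.com/index.php?username=dgw
url_join1 = urlunparse(url)
print(url_join1)

# Build a URL from a plain 6-item tuple. A leading "/" is added to the path
# because a netloc is present, so the result is the same URL as above.
url_tuple = ("http", "www.baidu.com", "index.php", "", "username=dgw", "")
url_join2 = urlunparse(url_tuple)
print(url_join2)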
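
A common reason to parse and then reassemble a URL is to change a single component. The following is a minimal sketch of that pattern, relying on the fact that the parsed result is a named tuple and therefore offers `_replace()`; the replacement values ("https", "username=other") are illustrative assumptions.

```python
from urllib.parse import urlparse, urlunparse

parts = urlparse('http://www.baidu.com/index.php?username=dgw')

# _replace() returns a modified copy; the original result is left unchanged.
changed = parts._replace(scheme='https', query='username=other')

new_url = urlunparse(changed)
print(new_url)  # https://www.baidu.com/index.php?username=other
```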

urlsplit()

urlsplit() is mainly used to parse a URL string and returns a 5-item tuple (scheme, netloc, path, query, fragment). It is similar to urlparse(), except that it does not split the params out of the path.

def urlsplit(url, scheme='', allow_fragments=True):
    """Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>
    Return a 5-tuple: (scheme, netloc, path, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes."""
    url, scheme, _coerce_result = _coerce_args(url, scheme)
    allow_fragments = bool(allow_fragments)
    key = url, scheme, allow_fragments, type(url), type(scheme)
    cached = _parse_cache.get(key, None)
    ......

Example code:

from urllib.parse import urlparse, urlsplit


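url = urlparse('http://www.baidu.com/index.php?username=dgw')
# ParseResult(scheme='http', netloc='www.baidu.com', path='/index.php', params='', query='username=dgw', fragment='')
print(url)

url2 = urlsplit('http://www.baidu.com/index.php?username=dgw')
# SplitResult(scheme='http', netloc='www.baidu.com', path='/index.php', query='username=dgw', fragment='')
print(url2)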
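
The difference between the two functions only becomes visible when the path contains ";params". The sketch below uses a made-up URL with ";type=a" appended to the path (an assumption for illustration) to show where each function puts it.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL whose path carries a ";params" part.
target = 'http://www.baidu.com/index.php;type=a?username=dgw'

parsed = urlparse(target)
split = urlsplit(target)

print(len(parsed), len(split))  # 6 5
print(parsed.path)    # /index.php
print(parsed.params)  # type=a
print(split.path)     # /index.php;type=a
```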

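urlunsplit()

urlunsplit() combines a 5-item iterable (scheme, netloc, path, query, fragment), such as the result of urlsplit(), into a complete URL string.
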
def urlunsplit(components):
    """Combine the elements of a tuple as returned by urlsplit() into a
    complete URL as a string. The data argument can be any five-item iterable.
    This may result in a slightly different, but equivalent URL, if the URL that
    was parsed originally had unnecessary delimiters (for example, a ? with an
    empty query; the RFC states that these are equivalent)."""
    scheme, netloc, url, query, fragment, _coerce_result = (
                                          _coerce_args(*components))
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/': url = '/' + url
    ......

Example code:

from urllib.parse import urlparse, urlsplit, urlunsplit


url = urlparse('http://www.baidu.com/index.php?username=dgw')
print(url)

url2 = urlsplit('http://www.baidu.com/index.php?username=dgw')
print(url2)

# Reassemble the 5-item result: http://www.baidu.com/index.php?username=dgw
url3 = urlunsplit(url2)
print(url3)
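
urlunsplit() also accepts a plain 5-item tuple, not only the result of urlsplit(). The sketch below builds URLs from hand-written tuples; the component values are illustrative. The second call shows the behaviour visible in the source excerpt above: when a netloc is present and the path lacks a leading "/", one is added.

```python
from urllib.parse import urlunsplit

# (scheme, netloc, path, query, fragment)
full = urlunsplit(('https', 'www.baidu.com', '/index.php', 'username=dgw', 'top'))
print(full)  # https://www.baidu.com/index.php?username=dgw#top

# The path has no leading slash here, so urlunsplit() inserts one.
fixed = urlunsplit(('http', 'www.baidu.com', 'index.php', '', ''))
print(fixed)  # http://www.baidu.com/index.php
```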

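urljoin()

urljoin() joins a base URL with a possibly relative URL to form the absolute form of the latter.

Note: if the path of the base URL does not end with /, its rightmost segment is replaced by the relative path.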

def urljoin(base, url, allow_fragments=True):
    """Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter."""
    if not base:
        return url
    if not url:
        return base

    base, url, _coerce_result = _coerce_args(base, url)
    ......

Example code:

from urllib.parse import urljoin


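# Base ends with "/": the relative URL is appended.
# http://www.baidu.com/test/index.php?username=dgw
url = urljoin('http://www.baidu.com/test/', 'index.php?username=dgw')
print(url)

# Base does not end with "/": the last segment "test" is replaced.
# http://www.baidu.com/index.php?username=dgw
url2 = urljoin('http://www.baidu.com/test', 'index.php?username=dgw')
print(url2)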
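
A few more cases help illustrate how urljoin() resolves different kinds of relative URLs. This is a small sketch only; the base path "/test/a/" and the example.com address are assumptions chosen for illustration.

```python
from urllib.parse import urljoin

base = 'http://www.baidu.com/test/a/'  # hypothetical base URL

# A relative URL starting with "/" replaces the whole path of the base.
rooted = urljoin(base, '/index.php')
print(rooted)  # http://www.baidu.com/index.php

# ".." moves up one directory level before appending.
parent = urljoin(base, '../index.php')
print(parent)  # http://www.baidu.com/test/index.php

# An absolute URL ignores the base entirely.
absolute = urljoin(base, 'https://example.com/x')
print(absolute)  # https://example.com/x
```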
