webutil
Reference documentation.
flask_util
Utilities for Flask. View classes, decorators, URL route converters, etc.
- exception oauth_dropins.webutil.flask_util.Created(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.Accepted(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.NoContent(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.Redirect(*args, location=None, **kwargs)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.MovedPermanently(*args, location=None, **kwargs)[source]
Bases:
Redirect
- exception oauth_dropins.webutil.flask_util.Found(*args, location=None, **kwargs)[source]
Bases:
Redirect
- exception oauth_dropins.webutil.flask_util.NotModified(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.PaymentRequired(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.ProxyAuthenticationRequired(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.MisdirectedRequest(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.UpgradeRequired(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.PreconditionRequired(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.ClientClosedRequest(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.VariantAlsoNegotiates(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.InsufficientStorage(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.LoopDetected(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.NotExtended(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.NetworkAuthenticationRequired(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- exception oauth_dropins.webutil.flask_util.NetworkConnectTimeoutError(description: str | None = None, response: Response | None = None)[source]
Bases:
HTTPException
- class oauth_dropins.webutil.flask_util.RegexConverter(url_map, *items)[source]
Bases:
BaseConverter
Regexp URL route for Werkzeug/Flask.
Based on https://github.com/rhyselsmore/flask-reggie.
Usage:
@app.route('/<regex("abc|def"):letters>')
Install with:
app.url_map.converters['regex'] = RegexConverter
- oauth_dropins.webutil.flask_util.get_required_param(name)[source]
Returns the given request parameter.
If it’s not in a query parameter or POST field, the current HTTP request aborts with status 400.
- oauth_dropins.webutil.flask_util.ndb_context_middleware(app, client=None, **kwargs)[source]
WSGI middleware to add an NDB context per request.
Follows the WSGI standard. Details: http://www.python.org/dev/peps/pep-0333/
Install with eg:
ndb_client = ndb.Client()
app = Flask('my-app')
app.wsgi_app = flask_util.ndb_context_middleware(app.wsgi_app, ndb_client)
- Parameters:
client –
google.cloud.ndb.Client
kwargs – passed through to
google.cloud.ndb.Client.context()
- oauth_dropins.webutil.flask_util.handle_exception(e)[source]
Flask error handler that propagates HTTP exceptions into the response.
Install with:
app.register_error_handler(Exception, handle_exception)
- oauth_dropins.webutil.flask_util.error(msg, status=400, exc_info=False, **kwargs)[source]
Logs and returns an HTTP error via werkzeug.exceptions.HTTPException.
- Parameters:
msg (str) –
status (int) –
exc_info – Python exception info three-tuple, eg from sys.exc_info()
kwargs – passed through to flask.abort()
- oauth_dropins.webutil.flask_util.flash(msg, **kwargs)[source]
Wrapper for flask.flash() that also logs the message.
- oauth_dropins.webutil.flask_util.default_modern_headers(resp)[source]
Include modern HTTP headers by default, but let the response override them.
Install with:
app.after_request(default_modern_headers)
- oauth_dropins.webutil.flask_util.cached(cache, timeout, headers=(), http_5xx=False)[source]
Thin flask-cache wrapper that supports timedelta and cache query param.
If the cache URL query parameter is false, skips the cache. Also, does not store the response in the cache if it’s an HTTP 5xx or if there are any flashed messages.
- Parameters:
cache (flask_caching.Cache) –
timeout (datetime.timedelta) –
headers – sequence of str, optional headers to include in the cache key
http_5xx (bool) – optional, whether to cache HTTP 5xx (server error) responses
- oauth_dropins.webutil.flask_util.canonicalize_domain(from_domains, to_domain)[source]
Returns a callable that redirects one or more domains to a canonical domain.
Preserves scheme, path, and query.
Install with eg:
app = flask.Flask(...)
app.before_request(canonicalize_domain(('old1.com', 'old2.org'), 'new.com'))
- Parameters:
from_domains – str or sequence of str
to_domain – str
- class oauth_dropins.webutil.flask_util.XrdOrJrd[source]
Bases:
View
Renders and serves an XRD or JRD file.
JRD is served if the request path ends in .jrd or .json, or the format query parameter is jrd or json, or the request’s Accept header includes jrd or json.
XRD is served if the request path ends in .xrd or .xml, or the format query parameter is xml or xrd, or the request’s Accept header includes xml or xrd.
Otherwise, defaults to DEFAULT_TYPE.
Subclasses must override template_prefix() and template_vars(). URL route variables are passed through to template_vars() as keyword args.
- DEFAULT_TYPE = 'jrd'
Either JRD or XRD, the type to return by default if the request doesn’t ask for one explicitly with the Accept header.
- oauth_dropins.webutil.flask_util.cls
alias of
NetworkConnectTimeoutError
instance_info
Renders vital stats about a single App Engine instance.
Intended for developers, not users. To turn on concurrent request recording, add the middleware and InfoHandler to your WSGI application, eg:
from oauth_dropins.webutil.instance_info import concurrent_requests_wsgi_middleware, info
application = concurrent_requests_wsgi_middleware(WSGIApplication([
...
('/_info', info),
]))
- class oauth_dropins.webutil.instance_info.Concurrent(count, when)
Bases:
tuple
- count
Alias for field number 0
- when
Alias for field number 1
- oauth_dropins.webutil.instance_info.info()[source]
Flask handler that renders current instance info.
- oauth_dropins.webutil.instance_info.concurrent_requests_wsgi_middleware(app)[source]
WSGI middleware for per request instance info instrumentation.
Follows the WSGI standard. Details: http://www.python.org/dev/peps/pep-0333/
logs
A handler that serves all app logs for an App Engine HTTP request.
StackDriver Logging API: https://cloud.google.com/logging/docs/apis
- oauth_dropins.webutil.logs.sanitize(msg)[source]
Sanitizes access tokens and Authorization headers.
- oauth_dropins.webutil.logs.url(when, key, **params)[source]
Returns the relative URL (no scheme or host) to a log page.
- Parameters:
when (datetime) –
key (ndb.Key or str) –
params – included as query params, eg module, path
- oauth_dropins.webutil.logs.maybe_link(when, key, time_class='dt-updated', link_class='', **params)[source]
Returns an HTML snippet with a timestamp and maybe a log page link.
Example:
<a href="/log?start_time=1513904267&key=aglz..." class="u-bridgy-log">
  <time class="dt-updated" datetime="2017-12-22T00:57:47.222060" title="Fri Dec 22 00:57:47 2017">
    3 days ago
  </time>
</a>
The <a> tag is only included if the timestamp is 30 days old or less, since Stackdriver’s basic tier doesn’t store logs older than that:
* https://cloud.google.com/monitoring/accounts/tiers#logs_ingestion
* https://github.com/snarfed/bridgy/issues/767
- Parameters:
when (datetime) –
key (ndb.Key or str) –
time_class (str) – optional class value for the <time> tag
link_class (str) – optional class value for the <a> tag (if generated)
params (dict) – query params to include in the link URL, eg module, path
- Returns:
string HTML
- oauth_dropins.webutil.logs.linkify_datastore_keys(msg)[source]
Converts string datastore keys to links to the admin console viewer.
- oauth_dropins.webutil.logs.log(module=None, path=None)[source]
Flask view that searches for and renders app logs for an HTTP request.
URL parameters:
start_time (float): seconds since the epoch
key (str): token to find in the first app log of the request
Install with:
app.add_url_rule('/log', view_func=logs.log)
Or:
@app.get('/log')
@cache.cached(600)
def log():
    return logs.log()
models
App Engine datastore model base classes, properties, and utilities.
- class oauth_dropins.webutil.models.StringIdModel(**kwargs)[source]
Bases:
Model
An
ndb.Model
class that requires a string id.
- class oauth_dropins.webutil.models.JsonProperty(*args, **kwargs)[source]
Bases:
TextProperty
Fork of ndb’s JsonProperty that subclasses ndb.TextProperty instead of ndb.BlobProperty.
This makes values show up as normal, human-readable, serialized JSON in the web console. https://github.com/googleapis/python-ndb/issues/874#issuecomment-1442753255
Duplicated in arroba: https://github.com/snarfed/arroba/blob/main/arroba/ndb_storage.py
- class oauth_dropins.webutil.models.ComputedJsonProperty(*args, **kwargs)[source]
Bases:
JsonProperty, ComputedProperty
Custom ndb.ComputedProperty for JSON values that stores them as strings.
…instead of like ndb.StructuredProperty, with “entity” type, which bloats them unnecessarily in the datastore.
testutil
Unit test utilities.
- oauth_dropins.webutil.testutil.requests_response(body='', url=None, status=200, content_type=None, redirected_url=None, headers=None, allow_redirects=None, encoding=None)[source]
- Parameters:
redirected_url (str or sequence of str) – URL(s) for multiple redirects
- oauth_dropins.webutil.testutil.enable_flask_caching(app, cache)[source]
Test case decorator that enables a flask_caching cache.
Usage:
from app import app, cache

class FooTest(TestCase):
    @enable_flask_caching(app, cache)
    def test_foo(self):
        ...
- Parameters:
app (flask.Flask) –
cache (flask_caching.Cache) –
- class oauth_dropins.webutil.testutil.UrlopenResult(status_code, content, url=None, headers={})[source]
Bases:
object
A fake urllib.request.urlopen() or urlfetch.fetch() result object.
- class oauth_dropins.webutil.testutil.Asserts[source]
Bases:
object
Test case mixin class with extra assert helpers.
- assert_entities_equal(a, b, ignore=frozenset({}), keys_only=False, in_order=False)[source]
Asserts that a and b are equivalent entities or lists of entities.
…specifically, that they have the same property values, and if they both have populated keys, that their keys are equal too.
- assert_equals(expected, actual, msg=None, in_order=False, ignore=())[source]
Pinpoints individual element differences in lists and dicts.
If
in_order
is False, ignores order in lists and tuples.
- assert_multiline_equals(expected, actual, ignore_blanks=False)[source]
Compares two multi-line strings and reports a diff style output.
Ignores leading and trailing whitespace on each line, and squeezes repeated blank lines down to just one.
- Parameters:
ignore_blanks (boolean) – whether to ignore blank lines altogether
- assert_multiline_in(expected, actual, ignore_blanks=False)[source]
Checks that a multi-line string is in another and reports a diff output.
Ignores leading and trailing whitespace on each line, and squeezes repeated blank lines down to just one.
- Parameters:
ignore_blanks (boolean) – whether to ignore blank lines altogether
- class oauth_dropins.webutil.testutil.TestCase(methodName='runTest')[source]
Bases:
MoxTestBase, Asserts
Test case class with lots of extra helpers.
- maxDiff = None
- expect_urlopen(url, response=None, status=200, data=None, headers=None, response_headers={}, **kwargs)[source]
Stubs out urllib.request.urlopen() and sets up an expected call.
If status isn’t 2xx, makes the expected call raise a urllib.error.HTTPError instead of returning the response.
If data is set, url must be a urllib.request.Request.
If response is unset, returns the expected call.
- Parameters:
url (str, re.RegexObject, urllib.request.Request, or webob.request.Request) –
response (str) –
status (int) – HTTP response code
data (str) – optional POST body
headers (dict) – optional expected request headers
response_headers (dict) – optional response headers
kwargs – other keyword args, e.g. timeout
util
Misc web-related utilities.
- oauth_dropins.webutil.util.user_agent = 'webutil (https://github.com/snarfed/webutil)'
Set with
set_user_agent()
.
- oauth_dropins.webutil.util.HTTP_TIMEOUT = 15
Default HTTP request timeout, used in
requests_get()
etc.
- oauth_dropins.webutil.util.MAX_HTTP_RESPONSE_SIZE = 2000000
Average HTML size as of 2015-10-15 is 56K, so this is generous and conservative. Raised from 1MB to 2MB on 2023-07-07.
- oauth_dropins.webutil.util.now(tz=datetime.timezone.utc, **kwargs)
Alias, allows unit tests to mock the function.
- oauth_dropins.webutil.util.beautifulsoup_parser = None
Global config, string parser for BeautifulSoup to use, e.g. ‘lxml’. May be set at runtime. https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
- oauth_dropins.webutil.util.LINK_RE = re.compile('\\b(?:[a-z]{3,9}:/{1,3})?(?:[^\\s.!"#$%&\'()*+,/:;<=>?@[\\]^_`{|}~]+\\.)+[a-z]{2,}(?::\\d{2,6})?(?:(?:/[\\w/.\\-_~.;:%?@$#&()=+]*)|\\b)', re.IGNORECASE)
Regexps for domains, hostnames, and URLs.
Based on kylewm’s from redwind:
https://github.com/snarfed/bridgy/issues/209#issuecomment-47583528
https://github.com/kylewm/redwind/blob/863989d48b97a85a1c1a92c6d79753d2fbb70775/redwind/util.py#L39
I used to use a more complicated regexp based on https://github.com/silas/huck/blob/master/huck/utils.py#L59 , but i kept finding new input strings that would make it hang the regexp engine.
More complicated alternatives:
http://stackoverflow.com/questions/720113#comment23297770_2102648
https://daringfireball.net/2010/07/improved_regex_for_matching_urls
List of TLDs: https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains#ICANN-era_generic_top-level_domains
Allows emoji and other unicode chars in all domain labels except TLDs. TODO: support IDN TLDs:
TODO: fix bug in LINK_RE that makes it miss emoji domain links without scheme, eg ☕⊙.ws. The bug is that the \b at the beginning of SCHEME_RE doesn’t apply to emoji, since they’re not word-constituent characters, and that the ? added in LINK_RE only applies to the parenthesized group in SCHEME_RE, not the \b. I tried changing \b to '(?:^|[\s%s])' % PUNCT, but that broke other things.
- class oauth_dropins.webutil.util.Struct(**kwargs)[source]
Bases:
object
A generic class that initializes its attributes from constructor kwargs.
- class oauth_dropins.webutil.util.CacheDict[source]
Bases:
dict
A dict that also implements memcache’s get_multi and set_multi methods.
Useful as a simple in memory replacement for App Engine’s memcache API for e.g. granary.Source.get_activities_response().
- oauth_dropins.webutil.util.to_xml(value)[source]
Renders a dict (usually from JSON) as an XML snippet.
- oauth_dropins.webutil.util.trim_nulls(value, ignore=())[source]
Recursively removes dict and list elements with None or empty values.
- oauth_dropins.webutil.util.uniquify(input)[source]
Returns a list with duplicate items removed.
Like list(set(...)), but preserves order.
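The order-preserving behavior described above can be illustrated with the standard library alone. This is a minimal sketch, not the library's implementation; the name `uniquify_sketch` is hypothetical, and it assumes the items are hashable.

```python
def uniquify_sketch(items):
    """Return a list of items with duplicates removed, keeping first-seen order."""
    # dicts preserve insertion order (Python 3.7+), so the keys are the
    # unique items in the order they first appeared.
    return list(dict.fromkeys(items))


result = uniquify_sketch(['b', 'a', 'b', 'c', 'a'])
print(result)  # ['b', 'a', 'c']
```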
- oauth_dropins.webutil.util.get_list(obj, key)[source]
Returns a value from a dict as a list.
If the value is a list or tuple, it’s converted to a list. If it’s something else, it’s returned as a single-element list. If the key doesn’t exist, returns [].
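The three cases described above can be sketched as follows. The function name `get_list_sketch` is hypothetical, and the real function may treat edge cases such as None values differently.

```python
def get_list_sketch(obj, key):
    """Return obj[key] as a list, following the three documented cases."""
    if key not in obj:
        return []              # missing key
    val = obj[key]
    if isinstance(val, (list, tuple)):
        return list(val)       # list or tuple: convert to a list
    return [val]               # anything else: wrap in a single-element list


post = {'tags': ('a', 'b'), 'author': 'alice'}
tags = get_list_sketch(post, 'tags')
author = get_list_sketch(post, 'author')
missing = get_list_sketch(post, 'nope')
```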
- oauth_dropins.webutil.util.pop_list(obj, key)[source]
Like get_list(), but also removes the item.
- oauth_dropins.webutil.util.encode(obj, encoding='utf-8')[source]
Character encodes all unicode strings in a collection, recursively.
- oauth_dropins.webutil.util.get_first(obj, key, default=None)[source]
Returns the first element of a dict value.
If the value is a list or tuple, returns the first value. If it’s something else, returns the value itself. If the key doesn’t exist, returns None.
- oauth_dropins.webutil.util.get_url(val, key=None)[source]
Returns val['url'] if val is a dict, otherwise val.
If key is not None, looks in val[key] instead of val.
- oauth_dropins.webutil.util.get_urls(obj, key, inner_key=None)[source]
Returns elem['url'] if dict, otherwise elem, for each elem in obj[key].
If inner_key is provided, the returned values are elem[inner_key]['url'].
- oauth_dropins.webutil.util.tag_uri(domain, name, year=None)[source]
Returns a tag URI string for the given domain and name.
Example return value:
tag:twitter.com,2012:snarfed_org/172417043893731329
Background on tag URIs: http://taguri.org/
- oauth_dropins.webutil.util.parse_tag_uri(uri)[source]
Returns the domain and name in a tag URI string.
Inverse of tag_uri().
- Returns:
(str domain, str name) tuple, or None if the tag URI couldn’t be parsed
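To make the round trip between tag_uri() and parse_tag_uri() concrete, here is a small sketch of both directions. The function names and the regular expression are hypothetical, and the sketch assumes the year is simply omitted from the URI when it is not given.

```python
import re

# Hypothetical pattern: "tag:", a domain, an optional ",YEAR", then ":NAME".
_TAG_RE = re.compile(r'^tag:([^,:]+)(?:,\d+)?:(.+)$')


def tag_uri_sketch(domain, name, year=None):
    """Build a tag URI; the year is left out if not provided (an assumption)."""
    date = f',{year}' if year is not None else ''
    return f'tag:{domain}{date}:{name}'


def parse_tag_uri_sketch(uri):
    """Return a (domain, name) tuple, or None if the URI does not match."""
    match = _TAG_RE.match(uri)
    return match.groups() if match else None


uri = tag_uri_sketch('twitter.com', 'snarfed_org/172417043893731329', year=2012)
parsed = parse_tag_uri_sketch(uri)
```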
- oauth_dropins.webutil.util.parse_acct_uri(uri, hosts=None)[source]
Parses acct: URIs of the form acct:user@example.com .
Background: http://hueniverse.com/2009/08/making-the-case-for-a-new-acct-uri-scheme/
- Parameters:
- Returns:
(username, host)
- Return type:
Raises: ValueError if the uri is invalid or the host isn’t allowed.
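A rough sketch of the parsing and validation described above, using only string operations. The name `parse_acct_uri_sketch` is hypothetical, and the real function may accept or reject inputs that this sketch handles differently.

```python
def parse_acct_uri_sketch(uri, hosts=None):
    """Split acct:user@host into (user, host), raising ValueError on bad input."""
    scheme, sep, rest = uri.partition(':')
    if not sep or scheme != 'acct':
        raise ValueError(f'not an acct: URI: {uri!r}')
    user, at, host = rest.rpartition('@')
    if not at or not user or not host:
        raise ValueError(f'expected user@host in {uri!r}')
    # hosts, if given, is treated as an allowlist of acceptable hosts.
    if hosts is not None and host not in hosts:
        raise ValueError(f'host {host!r} is not allowed')
    return user, host


user, host = parse_acct_uri_sketch('acct:user@example.com')
```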
- oauth_dropins.webutil.util.domain_from_link(url, minimize=True)[source]
Extracts and returns the meaningful domain from a URL.
- oauth_dropins.webutil.util.domain_or_parent_in(input, domains)[source]
Returns True if an input domain or its parent is in a set of domains.
Examples:
foo, [] => False
foo, [foo] => True
foo.bar.com, [bar.com] => True
foobar.com, [bar.com] => False
foo.bar.com, [.bar.com] => True
foo.bar.com, [fux.bar.com] => False
bar.com, [fux.bar.com] => False
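One way to get the results listed in the examples above is to compare whole domain labels rather than raw string suffixes. This is a sketch only; `domain_or_parent_in_sketch` is a hypothetical name and the library's actual logic may differ.

```python
def domain_or_parent_in_sketch(domain, domains):
    """Return True if domain, or one of its parent domains, is in domains."""
    for candidate in domains:
        candidate = candidate.lstrip('.')  # treat ".bar.com" like "bar.com"
        # Requiring a "." before the suffix keeps foobar.com from matching bar.com.
        if domain == candidate or domain.endswith('.' + candidate):
            return True
    return False


examples = [
    ('foo', [], False),
    ('foo', ['foo'], True),
    ('foo.bar.com', ['bar.com'], True),
    ('foobar.com', ['bar.com'], False),
    ('foo.bar.com', ['.bar.com'], True),
    ('foo.bar.com', ['fux.bar.com'], False),
    ('bar.com', ['fux.bar.com'], False),
]
results = [domain_or_parent_in_sketch(d, ds) for d, ds, _ in examples]
```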
- oauth_dropins.webutil.util.update_scheme(url, request)[source]
Returns a modified URL with the scheme upgraded to https if the request uses https.
Useful for converting URLs to https if and only if the current request itself is being served over https.
- Parameters:
url (str) –
request (flask.Request or webob.Request) –
- Returns:
URL
- Return type:
- oauth_dropins.webutil.util.schemeless(url, slashes=True)[source]
Strips the scheme (e.g. https:) from a URL.
- oauth_dropins.webutil.util.clean_url(url)[source]
Removes transient query params (e.g. utm_*) from a URL.
The utm_* (Urchin Tracking Module) params come from Google Analytics. https://support.google.com/analytics/answer/1033867
The source=rss-... params are on all links in Medium’s RSS feeds.
- oauth_dropins.webutil.util.quote_path(url)[source]
Quotes (URL-encodes) just the path part of a URL.
- oauth_dropins.webutil.util.base_url(url)[source]
Returns the base of a given URL.
For example, returns http://site/posts/ for http://site/posts/123.
- Parameters:
url (str) –
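The example above can be reproduced with urllib.parse. This is a sketch under the assumption that the query string and fragment are dropped; the name `base_url_sketch` is hypothetical and the library may behave differently in those cases.

```python
from urllib.parse import urlsplit, urlunsplit


def base_url_sketch(url):
    """Return the URL up to and including the last slash in its path."""
    parts = urlsplit(url)
    directory = parts.path.rsplit('/', 1)[0] + '/'
    # Assumption: query and fragment are not part of the base.
    return urlunsplit((parts.scheme, parts.netloc, directory, '', ''))


base = base_url_sketch('http://site/posts/123')
```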
- oauth_dropins.webutil.util.is_web(url)[source]
Returns True if the argument is an http or https URL, False otherwise.
- oauth_dropins.webutil.util.extract_links(text)[source]
Returns a list of unique string URLs in the given text.
URLs in the returned list are in the order they first appear in the text.
- oauth_dropins.webutil.util.tokenize_links(text, skip_bare_cc_tlds=False, skip_html_links=True, require_scheme=False)[source]
Splits text into link and non-link text.
- Parameters:
- Returns:
list of links and list of non-link text. Roughly equivalent to the output of re.findall() and re.split(), with some post-processing.
- Return type:
- oauth_dropins.webutil.util.linkify(text, pretty=False, skip_bare_cc_tlds=False, **kwargs)[source]
Adds HTML links to URLs in the given plain text.
For example:
linkify('Hello http://tornadoweb.org!')
would return Hello <a href="http://tornadoweb.org">http://tornadoweb.org</a>!
Ignores URLs that are inside HTML links, ie anchor tags that look like <a href="...">.
- Parameters:
text (str) – input
pretty (bool) – if True, uses pretty_link() for link text
skip_bare_cc_tlds (bool) – whether to skip links of the form [domain].[2-letter TLD] with no scheme and no path
- Returns:
linkified input
- Return type:
- oauth_dropins.webutil.util.pretty_link(url, text=None, text_prefix=None, keep_host=True, glyphicon=None, attrs=None, new_tab=False, max_length=None)[source]
Renders a pretty, short HTML link to a URL.
If text is not provided, the link text is the URL without the leading http(s)://[www.], ellipsized at the end if necessary. URL escape characters and UTF-8 are decoded.
The default maximum length follows Twitter’s rules: full domain plus 15 characters of path (including leading slash).
- Parameters:
url (str) –
text (str) – optional
text_prefix (str) – optional, added to beginning of text
keep_host (bool) – if False, remove the host from the link text
glyphicon (str) – glyphicon to render after the link text, if provided. Details: http://glyphicons.com/
attrs (dict) – attributes => values to include in the a tag. optional
new_tab (bool) – include target="_blank" if True
max_length (int) – max link text length in characters. ellipsized beyond this.
- Returns:
HTML snippet with <a> tag
- Return type:
- oauth_dropins.webutil.util.parse_iso8601(val)[source]
Parses an ISO 8601 or RFC 3339 date/time string and returns a datetime.
Time zone designator is optional. If present, the returned datetime will be time zone aware.
- Parameters:
val (str) – ISO 8601 or RFC 3339, e.g.
2012-07-23T05:54:49+00:00
- Returns:
- oauth_dropins.webutil.util.parse_iso8601_duration(input)[source]
Parses an ISO 8601 duration.
Note: converts months to 30 days each. ISO 8601 doesn’t seem to define the number of days in a month. Background: https://stackoverflow.com/a/29458514/186123
- Parameters:
input (str) – ISO 8601 duration, e.g.
P3Y6M4DT12H30M5S
https://en.wikipedia.org/wiki/ISO_8601#Durations
- Returns:
…or None if input cannot be parsed as an ISO 8601 duration
- Return type:
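The month-to-30-days conversion noted above can be seen in a short regex-based sketch. The name `parse_duration_sketch` is hypothetical; counting a year as 365 days and supporting only whole-number fields are assumptions of this sketch, not documented behavior.

```python
import re
from datetime import timedelta

_DURATION_RE = re.compile(
    r'^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?'
    r'(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$')


def parse_duration_sketch(text):
    """Return a timedelta for an ISO 8601 duration, or None if unparseable."""
    match = _DURATION_RE.match(text)
    if not match or not any(match.groups()):
        return None
    years, months, weeks, days, hours, minutes, seconds = (
        int(group) if group else 0 for group in match.groups())
    # Months count as 30 days each; years as 365 days (assumption).
    return timedelta(days=years * 365 + months * 30 + weeks * 7 + days,
                     hours=hours, minutes=minutes, seconds=seconds)


duration = parse_duration_sketch('P3Y6M4DT12H30M5S')
```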
- oauth_dropins.webutil.util.to_iso8601_duration(input)[source]
Converts a timedelta to an ISO 8601 duration.
Returns a fairly strict format: PnDTnS. Fractional seconds are silently dropped.
- Parameters:
input (datetime.timedelta) –
https://en.wikipedia.org/wiki/ISO_8601#Durations
- Returns:
ISO 8601 duration, e.g. P3DT4S
- Return type:
Raises: TypeError if input is not a datetime.timedelta
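A timedelta already stores whole days and leftover seconds separately, so a days-plus-seconds format like the P3DT4S example can be sketched in a few lines. The name `to_duration_sketch` is hypothetical, and negative timedeltas are not handled here.

```python
from datetime import timedelta


def to_duration_sketch(delta):
    """Format a timedelta as days and seconds, dropping fractional seconds."""
    if not isinstance(delta, timedelta):
        raise TypeError(f'expected timedelta, got {type(delta).__name__}')
    # delta.seconds excludes microseconds, so fractions are dropped here.
    return f'P{delta.days}DT{delta.seconds}S'


formatted = to_duration_sketch(timedelta(days=3, seconds=4))
```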
- oauth_dropins.webutil.util.maybe_iso8601_to_rfc3339(input)[source]
Tries to convert an ISO 8601 date/time string to RFC 3339.
The formats are similar, but not identical, eg. RFC 3339 includes a colon in the timezone offset at the end (+00:00 instead of +0000), but ISO 8601 doesn’t require one.
If the input can’t be parsed as ISO 8601, it’s silently returned, unchanged!
- oauth_dropins.webutil.util.maybe_timestamp_to_rfc3339(input)[source]
Tries to convert a string or int UNIX timestamp to RFC 3339.
Assumes UNIX timestamps are always UTC. (They’re generally supposed to be.)
- oauth_dropins.webutil.util.maybe_timestamp_to_iso8601(input)[source]
Tries to convert a string or int UNIX timestamp to ISO 8601.
Assumes UNIX timestamps are always UTC. (They’re generally supposed to be.)
- oauth_dropins.webutil.util.to_utc_timestamp(input)[source]
Converts a datetime to a float POSIX timestamp (seconds since epoch).
- oauth_dropins.webutil.util.as_utc(input)[source]
Converts a timezone-aware datetime to a naive UTC datetime.
If input is timezone-naive, it’s returned as is.
Doesn’t support DST!
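The aware-to-naive conversion described above can be sketched with the standard datetime module. The name `as_utc_sketch` is hypothetical, and this version relies on astimezone(), which may differ from how the library computes the offset.

```python
from datetime import datetime, timedelta, timezone


def as_utc_sketch(dt):
    """Convert an aware datetime to naive UTC; return naive input unchanged."""
    if dt.tzinfo is None:
        return dt
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


pacific = timezone(timedelta(hours=-8))
aware = datetime(2020, 1, 1, 12, 0, tzinfo=pacific)
naive_utc = as_utc_sketch(aware)
```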
- oauth_dropins.webutil.util.naturaltime(val, when=None, **kwargs)[source]
Wrapper for humanize.naturaltime that handles timezone-aware datetimes.
…since humanize currently doesn’t. :( https://github.com/python-humanize/humanize/issues/17
- oauth_dropins.webutil.util.ellipsize(str, words=14, chars=140)[source]
Truncates and ellipsizes str if it’s longer than words or chars.
Words are simply tokenized on whitespace, nothing smart.
- oauth_dropins.webutil.util.add_query_params(url, params)[source]
Adds new query parameters to a URL. Encodes as UTF-8 and URL-safe.
- Parameters:
url (str) – URL or urllib.request.Request. May already have query parameters.
params (dict or list of (str key, str value) tuples) – Keys may repeat.
- Returns:
URL
- Return type:
- oauth_dropins.webutil.util.remove_query_param(url, param)[source]
Removes query parameter(s) from a URL. Decodes URL escapes and UTF-8.
If the query parameter is not present in the URL, the URL is returned unchanged, and the returned value is None.
If the query parameter is present multiple times, only the last value is returned.
- oauth_dropins.webutil.util.dedupe_urls(urls, key=None)[source]
Normalizes and de-dupes http(s) URLs.
Converts domain to lower case, adds trailing slash when path is empty, and ignores scheme (http vs https), preferring https. Preserves order. Removes Nones and blank strings.
Domains are case insensitive, even modern domains with Unicode/punycode characters:
As examples, http://foo/ and https://FOO are considered duplicates, but http://foo/bar and http://foo/bar/ aren’t.
Background: https://en.wikipedia.org/wiki/URL_normalization
- oauth_dropins.webutil.util.encode_oauth_state(obj)[source]
The state parameter is passed to various source authorization endpoints and returned in a callback. This encodes a JSON object so that it can be safely included as a query string parameter.
- Parameters:
obj (dict) – JSON-serializable
- Returns:
str
- oauth_dropins.webutil.util.decode_oauth_state(state)[source]
Decodes a state parameter encoded by encode_oauth_state().
- Parameters:
state (str) – JSON-serialized dict, or None
- Returns:
dict
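One plausible way to make a JSON object safe for a query string is to serialize it and percent-encode the result. The sketch below does that; the function names are hypothetical, and both the exact encoding and the empty dict returned for None are assumptions rather than the library's documented behavior.

```python
import json
from urllib.parse import quote, unquote


def encode_state_sketch(obj):
    """Serialize obj to JSON and percent-encode it for use in a query string."""
    return quote(json.dumps(obj, sort_keys=True))


def decode_state_sketch(state):
    """Reverse encode_state_sketch(); None or '' yields {} (assumption)."""
    return json.loads(unquote(state)) if state else {}


original = {'operation': 'add', 'id': 123}
encoded = encode_state_sketch(original)
decoded = decode_state_sketch(encoded)
```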
- oauth_dropins.webutil.util.if_changed(cache, updates, key, value)[source]
Returns a value if it’s different from the cached value, otherwise None.
Values that evaluate to False are considered equivalent to None, in order to save cache space.
If the values differ, updates[key] is set to value. You can use this to collect changes that should be made to the cache in batch. None values in updates mean that the corresponding key should be deleted.
- Parameters:
cache – any object with a get(key) method
updates (dict) –
key – anything supported by cache
value – anything supported by cache
- Returns:
value or None
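The batching pattern described above can be illustrated with a plain dict standing in for the cache. This is a sketch of the documented behavior under the name `if_changed_sketch`, which is hypothetical.

```python
def if_changed_sketch(cache, updates, key, value):
    """Return value if it differs from cache.get(key), else None."""
    # Falsy values are treated as None, so '' and a missing key compare equal.
    if (cache.get(key) or None) == (value or None):
        return None
    updates[key] = value
    return value


cache = {'name': 'Alice', 'bio': 'hi'}
updates = {}
unchanged = if_changed_sketch(cache, updates, 'name', 'Alice')
changed = if_changed_sketch(cache, updates, 'bio', 'hello')
cleared = if_changed_sketch(cache, updates, 'name', '')
still_empty = if_changed_sketch(cache, updates, 'url', '')
```

After these calls, `updates` holds only the keys whose values changed, ready to be written to the cache in one batch.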
- oauth_dropins.webutil.util.generate_secret()[source]
Generates a URL-safe random secret string.
Uses App Engine’s os.urandom(), which is designed to be cryptographically secure: http://code.google.com/p/googleappengine/issues/detail?id=1055
- Parameters:
bytes (int) – length of string to generate
- Returns:
str
- oauth_dropins.webutil.util.is_int(arg)[source]
Returns True if arg can be converted to an integer, False otherwise.
- oauth_dropins.webutil.util.is_float(arg)[source]
Returns True if arg can be converted to a float, False otherwise.
- oauth_dropins.webutil.util.is_base64(arg)[source]
Returns True if arg is a base64 encoded string, False otherwise.
- oauth_dropins.webutil.util.sniff_json_or_form_encoded(value)[source]
Detects whether value is JSON or form-encoded, parses and returns it.
- oauth_dropins.webutil.util.interpret_http_exception(exception)[source]
Extracts the status code and response from different HTTP exception types.
- Parameters:
exception (Exception) – an HTTP request exception. Supported types:
apiclient.errors.HttpError
gdata.client.RequestError
- Returns:
(str status code or None, str response body or None)
- oauth_dropins.webutil.util.is_connection_failure(exception)[source]
Returns True if the given exception is a network connection failure.
…False otherwise.
- class oauth_dropins.webutil.util.FileLimiter(file_obj, read_limit)[source]
Bases:
object
A file object wrapper that reads up to a limit and then reports EOF.
From http://stackoverflow.com/a/29838711/186123 . Thanks SO!
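The read-up-to-a-limit idea can be sketched as a small wrapper class. This is an illustrative version under the hypothetical name `FileLimiterSketch`, not the library's code, and it implements only read().

```python
import io


class FileLimiterSketch:
    """Wraps a file object and reports EOF after read_limit bytes or chars."""

    def __init__(self, file_obj, read_limit):
        self.file_obj = file_obj
        self.remaining = read_limit

    def read(self, amount=-1):
        if self.remaining <= 0:
            # read(0) returns an empty value of the right type (bytes or str).
            return self.file_obj.read(0)
        if amount < 0 or amount > self.remaining:
            amount = self.remaining
        data = self.file_obj.read(amount)
        self.remaining -= len(data)
        return data


limited = FileLimiterSketch(io.BytesIO(b'abcdefgh'), 5)
first = limited.read(3)
rest = limited.read()
at_eof = limited.read()
```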
- oauth_dropins.webutil.util.read(filename)[source]
Returns the contents of filename, or None if it doesn’t exist.
- oauth_dropins.webutil.util.load_file_lines(file)[source]
Reads lines from a file and returns them as a set.
Leading and trailing whitespace is trimmed. Blank lines and lines beginning with # (ie comments) are ignored.
- Parameters:
file – a file object or other iterable that returns lines
- Returns:
set of str
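The trimming and comment-skipping rules above can be sketched as follows. The name `load_file_lines_sketch` is hypothetical, and the sketch assumes a line is checked for a leading # after its whitespace is trimmed.

```python
import io


def load_file_lines_sketch(file):
    """Return the set of non-blank, non-comment lines, stripped of whitespace."""
    lines = set()
    for line in file:
        line = line.strip()
        if line and not line.startswith('#'):
            lines.add(line)
    return lines


blocklist = io.StringIO('# domains to skip\n\n  example.com  \nfoo.org\nexample.com\n')
loaded = load_file_lines_sketch(blocklist)
```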
- oauth_dropins.webutil.util.json_loads(*args, **kwargs)[source]
Wrapper around
json.loads()
that centralizes our JSON handling.
- oauth_dropins.webutil.util.json_dumps(*args, **kwargs)[source]
Wrapper around
json.dumps()
that centralizes our JSON handling.
- oauth_dropins.webutil.util.set_user_agent(val)[source]
Sets the user agent to be sent in urlopen() and requests_fn().
- Parameters:
val (str) –
- oauth_dropins.webutil.util.urlopen(url_or_req, *args, **kwargs)[source]
Wraps urllib.request.urlopen() and logs the HTTP method and URL.
Use set_user_agent() to change the User-Agent header to be sent.
- oauth_dropins.webutil.util.requests_fn(fn)[source]
Wraps requests.* and logs the HTTP method and URL.
Use set_user_agent() to change the User-Agent header to be sent.
- Parameters:
fn (callable) – requests.get(), requests.head(), or requests.post()
- Returns:
drop-in replacement for requests.get() etc
The gateway kwarg is a bool for whether this is in a HTTP gateway request handler context. If True, errors will be raised as appropriate Flask HTTP exceptions. Malformed URLs result in werkzeug.exceptions.BadRequest (HTTP 400), connection failures and HTTP 4xx and 5xx result in werkzeug.exceptions.BadGateway (HTTP 502).
- Return type:
callable (str url, gateway=None, **kwargs) => requests.Response
- oauth_dropins.webutil.util.requests_post_with_redirects(url, *args, **kwargs)[source]
Make an HTTP POST, and follow redirects with POST instead of GET.
Violates the HTTP spec’s rule to follow POST redirects with GET. Yolo!
- Parameters:
url (str) –
- Returns:
- Raises:
- oauth_dropins.webutil.util.follow_redirects(url, **kwargs)[source]
Fetches a URL with HEAD, repeating if necessary to follow redirects.
Caches results for 1 day by default. To bypass the cache, use follow_redirects.__wrapped__(…).
Does not raise an exception if any of the HTTP requests fail, just returns the failed response. If you care, be sure to check the returned response’s status code!
- Parameters:
url (str) –
kwargs – passed to
requests.head()
- Returns:
the response from the final request. The url attribute has the final URL.
- Return type:
- class oauth_dropins.webutil.util.UrlCanonicalizer(scheme='https', domain=None, subdomain=None, approve=None, reject=None, query=False, fragment=False, trailing_slash=False, redirects=True, headers=None)[source]
Bases:
object
Converts URLs to their canonical form.
If an input URL matches approve or reject, it’s automatically approved as is without following redirects.
If we HEAD the URL to follow redirects and it returns 4xx or 5xx, we return None.
- class oauth_dropins.webutil.util.WideUnicode(*args, **kwargs)[source]
Bases:
str
String class with consistent indexing and len() on narrow and wide Python.
PEP 261 describes that Python 2 builds come in “narrow” and “wide” flavors. Wide is configured with --enable-unicode=ucs4, which represents Unicode high code points above the 16-bit Basic Multilingual Plane in unicode strings as single characters. This means that len(), indexing, and slices of unicode strings use Unicode code points consistently.
Narrow, on the other hand, represents high code points as “surrogate pairs” of 16-bit characters. This means that len(), indexing, and slicing unicode strings does not always correspond to Unicode code points.
Mac OS X, Windows, and older Linux distributions have narrow Python 2 builds, while many modern Linux distributions have wide builds, so this can cause platform-specific bugs, e.g. with many commonly used emoji.
Docs:
Inspired by: http://stackoverflow.com/a/9934913
Related work:
On StackOverflow:
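The difference between the two counting schemes can be reproduced on Python 3, which always indexes strings by code point. The sketch below does not use WideUnicode itself; it only contrasts the code point count with the number of UTF-16 code units, which is what a narrow build would have reported.

```python
s = 'a\U0001F600b'  # 'a', one emoji above the Basic Multilingual Plane, 'b'

# Python 3 counts Unicode code points, as a wide build did.
code_points = len(s)

# Each UTF-16 code unit is 2 bytes; the emoji needs a surrogate pair,
# so this is the length a narrow build would have reported.
utf16_units = len(s.encode('utf-16-le')) // 2

print(code_points, utf16_units)  # prints "3 4"
```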
- oauth_dropins.webutil.util.parse_html(input, **kwargs)[source]
Parses an HTML string with BeautifulSoup.
Uses the HTML parser currently set in the beautifulsoup_parser global. http://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use
We generally try to use the same parser and version in prod and locally, since we’ve been bitten by at least one meaningful difference between lxml and e.g. html5lib: lxml includes the contents of <noscript> tags, html5lib omits them: https://github.com/snarfed/bridgy/issues/798#issuecomment-370508015
Also lxml is noticeably faster than the others.
Specifically, projects like oauth-dropins, granary, and bridgy all use lxml explicitly.
- Parameters:
input – (str or requests.Response): input HTML
kwargs – passed through to bs4.BeautifulSoup constructor
- Returns:
bs4.BeautifulSoup
- oauth_dropins.webutil.util.parse_mf2(input, url=None, id=None)[source]
Parses microformats2 out of HTML.
Currently uses mf2py.
- Parameters:
input – (str, bs4.BeautifulSoup, or requests.Response)
url (str) – optional, URL of the input page, used as the base for relative URLs
id (str) – optional id of specific element to extract and parse. Defaults to the whole page.
- Returns:
parsed mf2 data, or None if id is provided and not found in the input HTML
- Return type:
- oauth_dropins.webutil.util.parse_http_equiv(content)[source]
Parses the value in the http_equiv meta field and returns the url.
- Parameters:
content (str) – http_equiv content str: https://www.w3.org/TR/WCAG20-TECHS/H76.html#procedure
- Returns:
empty if content format is incorrect
- Return type:
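A meta refresh content value typically looks like `0; url=http://example.com/`. The sketch below shows one way such a value could be parsed with a regular expression. It is an assumption about the general approach, not the library's code, and `parse_refresh_content` is a hypothetical name.

```python
import re

# A delay in seconds, a separator, then "url=" followed by the target.
REFRESH_RE = re.compile(r'\s*\d+(?:\.\d+)?\s*[;,]\s*url\s*=\s*(.*)', re.IGNORECASE)


def parse_refresh_content(content):
    """Return the URL in a meta refresh content value, or '' if there is none."""
    match = REFRESH_RE.match(content)
    if not match:
        return ''
    # Strip surrounding whitespace and optional quotes around the URL.
    return match.group(1).strip().strip('\'"')
```

A value with only a delay, such as `30`, has no URL part and yields an empty string here.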
- oauth_dropins.webutil.util.fetch_http_equiv(input, **kwargs)[source]
Fetches http_equiv meta tag, if available.
- Parameters:
input (str, bs4.BeautifulSoup, or requests.Response) –
- Returns:
empty if not available or a url if available
- Return type:
- oauth_dropins.webutil.util.fetch_mf2(url, get_fn=<function requests_fn.<locals>.call>, gateway=False, require_backlink=None, **kwargs)[source]
Fetches an HTML page over HTTP, parses it, and returns its microformats2.
If url includes a fragment, or redirects to a URL with a fragment, only that element of the HTML will be parsed and returned.
- Parameters:
url (str) –
get_fn (callable) – matching requests.get()’s signature, for the HTTP fetch
gateway (bool) – see requests_fn()
require_backlink (str or sequence of strs) – If provided, one of these must be in the response body, in any form. Generally used for webmention validation.
kwargs – passed through to requests.get()
- Returns:
parsed mf2 data. Includes the final URL of the parsed document (after redirects) in the top-level url field.
- Return type:
- Raises:
ValueError – if a backlink in
require_backlink
is not found
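The fragment and backlink behavior described above can be approximated with the standard library. This is a simplified sketch under stated assumptions: it only does a plain substring check, whereas the documentation says a backlink may appear "in any form", and `check_fetched_page` is a hypothetical helper that takes an already-fetched body rather than performing the HTTP request.

```python
from urllib.parse import urldefrag


def check_fetched_page(final_url, body, require_backlink=None):
    """Validate backlinks, then return the element id named by the fragment.

    Returns None when the final URL has no fragment, meaning the whole
    page would be parsed.
    """
    if isinstance(require_backlink, str):
        require_backlink = [require_backlink]  # accept a single string
    if require_backlink and not any(link in body for link in require_backlink):
        raise ValueError(f'none of {require_backlink} found in {final_url}')
    _, fragment = urldefrag(final_url)
    return fragment or None
```

In a webmention receiver, `require_backlink` would usually be the target URL, so that a source page which does not actually link to the target is rejected before any parsing happens.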
webmention
Webmention endpoint discovery and sending.
Spec: https://webmention.net/draft/
- class oauth_dropins.webutil.webmention.Endpoint(endpoint, response)
Bases:
tuple
Returned by discover.
- response
- Type:
- endpoint
Alias for field number 0
- response
Alias for field number 1
- oauth_dropins.webutil.webmention.discover(url, follow_meta_refresh=False, **requests_kwargs)[source]
Discovers a URL’s webmention endpoint.
Follows up to 30 HTTP 3xx redirects, and at most one client-side HTML meta http-equiv=refresh redirect.
- Parameters:
- Returns:
If no endpoint is discovered, the endpoint attribute will be None.
- Return type:
- Raises:
ValueError – on bad URL
requests.HTTPError – on failure
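Under the Webmention spec, one place an endpoint can be advertised is an HTTP Link header with rel="webmention". The sketch below covers only that case and ignores HTML link and a elements. It is not the library's implementation: `endpoint_from_link_header` is a hypothetical helper, and its regular expressions are deliberately simple (they do not handle commas inside quoted parameter values).

```python
import re
from urllib.parse import urljoin

# One link-value: <target> followed by zero or more ;param segments.
LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')
REL_RE = re.compile(r'\brel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def endpoint_from_link_header(page_url, link_header):
    """Return the absolute webmention endpoint URL, or None if not advertised."""
    for target, params in LINK_RE.findall(link_header):
        rel = REL_RE.search(params)
        # rel may hold several space-separated values
        if rel and 'webmention' in rel.group(1).lower().split():
            return urljoin(page_url, target)  # endpoint may be relative
    return None
```

Returning None when nothing matches mirrors the documented behavior that the endpoint attribute is None when no endpoint is discovered.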
- oauth_dropins.webutil.webmention.send(endpoint, source, target, **requests_kwargs)[source]
Sends a webmention.
- Parameters:
- Returns:
on success
- Return type:
- Raises:
ValueError – on bad URL
requests.HTTPError – on failure
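Per the Webmention spec, the notification itself is a form-encoded POST to the endpoint with source and target parameters. The sketch below builds such a request with the standard library and does not send it. The library presumably uses requests, given the documented requests.HTTPError, so treat this as an illustration of the wire format only; `build_webmention_request` is a hypothetical name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint, source, target):
    """Build, but do not send, the POST that notifies endpoint of a mention."""
    body = urlencode({'source': source, 'target': target}).encode('ascii')
    return Request(
        endpoint,
        data=body,
        headers={'Content-Type': 'application/x-www-form-urlencoded'},
        method='POST',
    )
```

Passing the result to `urllib.request.urlopen()` would actually deliver it; a receiver that accepts the webmention answers with a 2xx status.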