oauth_dropins.webutil¶
Reference documentation.
util¶
Misc utilities.
-
class
oauth_dropins.webutil.util.
Struct
(**kwargs)[source]¶ Bases:
object
A generic class that initializes its attributes from constructor kwargs.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
class
oauth_dropins.webutil.util.
CacheDict
[source]¶ Bases:
dict
A dict that also implements memcache’s get_multi() and set_multi() methods.
Useful as a simple in memory replacement for App Engine’s memcache API for e.g. get_activities_response() in snarfed/activitystreams-unofficial.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
oauth_dropins.webutil.util.
to_xml
(value)[source]¶ Renders a dict (usually from JSON) as an XML snippet.
-
oauth_dropins.webutil.util.
trim_nulls
(value, ignore=())[source]¶ Recursively removes dict and list elements with None or empty values.
Parameters: - value – dict or list
- ignore – optional sequence of keys to allow to have None/empty values
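The behavior described above can be sketched roughly as follows. This is an illustration, not the library's implementation: trim_nulls_sketch is a hypothetical name, and the exact set of values treated as "empty" is an assumption.

```python
def trim_nulls_sketch(value, ignore=()):
    """Recursively drops None and empty values from dicts and lists.

    Assumption: "empty" means None, '', and empty dicts/lists/tuples.
    0 and False are kept.
    """
    nulls = (None, {}, [], (), '')

    if isinstance(value, dict):
        trimmed = {k: trim_nulls_sketch(v, ignore) for k, v in value.items()}
        # Keys listed in ignore are allowed to keep None/empty values.
        return {k: v for k, v in trimmed.items()
                if k in ignore or v not in nulls}
    if isinstance(value, (list, tuple)):
        trimmed = [trim_nulls_sketch(v, ignore) for v in value]
        return [v for v in trimmed if v not in nulls]
    return value
```

Containers that become empty after trimming are removed too, which is why the recursion runs before the filter.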
-
oauth_dropins.webutil.util.
uniquify
(input)[source]¶ Returns a list with duplicate items removed.
Like list(set(…)), but preserves order.
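A minimal sketch of order-preserving de-duplication, assuming the items are hashable. uniquify_sketch is a hypothetical name; the library's own code may differ.

```python
def uniquify_sketch(items):
    """Returns a list of items with duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```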
-
oauth_dropins.webutil.util.
get_list
(obj, key)[source]¶ Returns a value from a dict as a list.
If the value is a list or tuple, it’s converted to a list. If it’s something else, it’s returned as a single-element list. If the key doesn’t exist, returns [].
-
oauth_dropins.webutil.util.
get_first
(obj, key, default=None)[source]¶ Returns the first element of a dict value.
If the value is a list or tuple, returns the first value. If it’s something else, returns the value itself. If the key doesn’t exist, returns default.
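The two accessors above, get_list() and get_first(), might look something like this. The names get_list_sketch and get_first_sketch are hypothetical, and the handling of empty lists and explicit None values is an assumption.

```python
def get_list_sketch(obj, key):
    """Returns obj[key] as a list, or [] if the key is missing."""
    val = obj.get(key, [])
    return list(val) if isinstance(val, (list, tuple)) else [val]


def get_first_sketch(obj, key, default=None):
    """Returns the first element of obj[key], or default if there is none."""
    val = obj.get(key)
    if val is None:
        return default
    if isinstance(val, (list, tuple)):
        # Assumption: an empty list or tuple also falls back to default.
        return val[0] if val else default
    return val
```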
-
oauth_dropins.webutil.util.
get_url
(val, key=None)[source]¶ Returns val[‘url’] if val is a dict, otherwise val.
If key is not None, looks in val[key] instead of val.
-
oauth_dropins.webutil.util.
get_urls
(obj, key, inner_key=None)[source]¶ Returns elem[‘url’] if dict, otherwise elem, for each elem in obj[key].
If inner_key is provided, the returned values are elem[inner_key][‘url’].
-
oauth_dropins.webutil.util.
tag_uri
(domain, name, year=None)[source]¶ Returns a tag URI string for the given domain and name.
Example return value: ‘tag:twitter.com,2012:snarfed_org/172417043893731329’
Background on tag URIs: http://taguri.org/
-
oauth_dropins.webutil.util.
parse_tag_uri
(uri)[source]¶ Returns the domain and name in a tag URI string.
Inverse of tag_uri().
Returns: (string domain, string name) tuple, or None if the tag URI couldn’t be parsed
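To illustrate the round trip between building and parsing a tag URI, here is a rough sketch. The function names and the regular expression are hypothetical, and omitting the year when it is None is an assumption about the real behavior.

```python
import re


def tag_uri_sketch(domain, name, year=None):
    """Builds a tag URI such as 'tag:twitter.com,2012:snarfed_org/123'."""
    # Assumption: the year is simply left out when it isn't provided.
    year_part = ',%s' % year if year is not None else ''
    return 'tag:%s%s:%s' % (domain, year_part, name)


# domain, then an optional ',year', then ':' and the name
_TAG_URI_RE = re.compile(r'^tag:([^,:]+)(?:,\d+)?:(.+)$')


def parse_tag_uri_sketch(uri):
    """Returns a (domain, name) tuple, or None if uri isn't a tag URI."""
    match = _TAG_URI_RE.match(uri)
    return match.groups() if match else None
```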
-
oauth_dropins.webutil.util.
parse_acct_uri
(uri, hosts=None)[source]¶ Parses acct: URIs of the form acct:user@example.com .
Background: http://hueniverse.com/2009/08/making-the-case-for-a-new-acct-uri-scheme/
Parameters: - uri – string
- hosts – sequence of allowed hosts (usually domains). None means allow all.
Returns: (username, host) tuple
Raises: ValueError if the uri is invalid or the host isn’t allowed.
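A small sketch of the parsing and host check described above, written for Python 3 with the standard library only. parse_acct_uri_sketch is a hypothetical name and the error messages are made up for illustration.

```python
from urllib.parse import urlparse


def parse_acct_uri_sketch(uri, hosts=None):
    """Parses 'acct:user@host' into a (username, host) tuple."""
    parsed = urlparse(uri)
    if parsed.scheme != 'acct':
        raise ValueError('Not an acct: URI: %r' % uri)

    user, sep, host = parsed.path.partition('@')
    if not (user and sep and host):
        raise ValueError('Expected acct:user@host, got %r' % uri)

    # hosts=None means every host is allowed.
    if hosts is not None and host not in hosts:
        raise ValueError('Host %r is not allowed' % host)
    return user, host
```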
-
oauth_dropins.webutil.util.
domain_from_link
(url)[source]¶ Extracts and returns the meaningful domain from a URL.
Strips www., mobile., and m. from the beginning of the domain.
Parameters: url – string Returns: string
-
oauth_dropins.webutil.util.
domain_or_parent_in
(input, domains)[source]¶ Returns True if an input domain or its parent is in a set of domains.
Examples:
- foo, [] => False
- foo, [foo] => True
- foo.bar.com, [bar.com] => True
- foo.bar.com, [.bar.com] => True
- foo.bar.com, [fux.bar.com] => False
- bar.com, [fux.bar.com] => False
Parameters: - input – string domain
- domains – sequence of string domains
Returns: boolean
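The examples above are consistent with a check like the following sketch. domain_or_parent_in_sketch is a hypothetical name; lowercasing and stripping a leading dot are assumptions made to reproduce the listed cases.

```python
def domain_or_parent_in_sketch(domain, domains):
    """Returns True if domain or any of its parent domains is in domains."""
    if not domain or not domains:
        return False

    # Treat '.bar.com' the same as 'bar.com'.
    normalized = {d.lstrip('.').lower() for d in domains}
    parts = domain.lower().split('.')

    # Check the domain itself, then each successively shorter parent:
    # foo.bar.com, bar.com, com.
    return any('.'.join(parts[i:]) in normalized for i in range(len(parts)))
```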
-
oauth_dropins.webutil.util.
update_scheme
(url, handler)[source]¶ Returns a modified string url with the current request’s scheme.
Useful for converting URLs to https if and only if the current request itself is being served over https.
-
oauth_dropins.webutil.util.
schemeless
(url, slashes=True)[source]¶ Strips the scheme (e.g. ‘https:’) from a URL.
Parameters: - url – string
- slashes – if False, also strips leading slashes and trailing slash, e.g. ‘http://example.com/’ becomes ‘example.com’
Returns: string URL
-
oauth_dropins.webutil.util.
fragmentless
(url)[source]¶ Strips the fragment (e.g. ‘#foo’) from a URL.
Parameters: url – string Returns: string URL
-
oauth_dropins.webutil.util.
clean_url
(url)[source]¶ Removes transient query params (e.g. utm_*) from a URL.
The utm_* (Urchin Tracking Module) params come from Google Analytics. https://support.google.com/analytics/answer/1033867
The source=rss-… params are on all links in Medium’s RSS feeds.
Parameters: url – string Returns: string, the cleaned url, or None if it can’t be parsed
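One way to strip such parameters with the Python 3 standard library is sketched below. clean_url_sketch is a hypothetical name, and the exact matching rules (the utm_ prefix, and source values starting with rss-) are inferred from the description above rather than taken from the library.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def clean_url_sketch(url):
    """Removes utm_* and source=rss-... query params from a URL."""
    try:
        parts = urlparse(url)
    except ValueError:
        # The URL couldn't be parsed.
        return None

    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith('utm_')
            and not (k == 'source' and v.startswith('rss-'))]
    return urlunparse(parts._replace(query=urlencode(kept)))
```

Parameters that don't match either rule are kept in their original order.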
-
oauth_dropins.webutil.util.
base_url
(url)[source]¶ Returns the base of a given URL.
For example, returns ‘http://site/posts/’ for ‘http://site/posts/123’.
Parameters: url – string
-
oauth_dropins.webutil.util.
extract_links
(text)[source]¶ Returns a list of unique string URLs in the given text.
URLs in the returned list are in the order they first appear in the text.
-
oauth_dropins.webutil.util.
tokenize_links
(text, skip_bare_cc_tlds=False)[source]¶ Splits text into link and non-link text.
Parameters: - text – string to linkify
- skip_bare_cc_tlds – boolean, whether to skip links of the form [domain].[2-letter TLD] with no scheme and no path
Returns: a tuple containing two lists of strings, a list of links and list of non-link text. Roughly equivalent to the output of re.findall and re.split, with some post-processing.
-
oauth_dropins.webutil.util.
linkify
(text, pretty=False, skip_bare_cc_tlds=False, **kwargs)[source]¶ Adds HTML links to URLs in the given plain text.
For example:
linkify('Hello http://tornadoweb.org!')
would return ‘Hello <a href=”http://tornadoweb.org”>http://tornadoweb.org</a>!’
Ignores URLs that are inside HTML links, i.e. anchor tags that look like <a href=”…”> .
Parameters: - text – string, input
- pretty – if True, uses pretty_link() for link text
Returns: string, linkified input
-
oauth_dropins.webutil.util.
pretty_link
(url, text=None, keep_host=True, glyphicon=None, attrs=None, new_tab=False, max_length=None)[source]¶ Renders a pretty, short HTML link to a URL.
If text is not provided, the link text is the URL without the leading http(s)://[www.], ellipsized at the end if necessary. URL escape characters and UTF-8 are decoded.
The default maximum length follows Twitter’s rules: full domain plus 15 characters of path (including leading slash).
Parameters: - url – string
- text – string, optional
- keep_host – if False, remove the host from the link text
- glyphicon – string glyphicon to render after the link text, if provided. Details: http://glyphicons.com/
- attrs – dict of attributes => values to include in the a tag. optional
- new_tab – boolean, include target=”_blank” if True
- max_length – int, max link text length in characters. ellipsized beyond this.
Returns: unicode string HTML snippet with <a> tag
-
class
oauth_dropins.webutil.util.
SimpleTzinfo
[source]¶ Bases:
datetime.tzinfo
A simple, DST-unaware tzinfo subclass.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
oauth_dropins.webutil.util.
parse_iso8601
(str)[source]¶ Parses an ISO 8601 or RFC 3339 date/time string and returns a datetime.
Time zone designator is optional. If present, the returned datetime will be time zone aware.
Parameters: str – string ISO 8601 or RFC 3339, e.g. ‘2012-07-23T05:54:49+00:00’ Returns: datetime
-
oauth_dropins.webutil.util.
maybe_iso8601_to_rfc3339
(input)[source]¶ Tries to convert an ISO 8601 date/time string to RFC 3339.
The formats are similar, but not identical, e.g. RFC 3339 includes a colon in the timezone offset at the end (+00:00 instead of +0000), but ISO 8601 doesn’t require it.
If the input can’t be parsed as ISO 8601, it’s silently returned, unchanged!
-
oauth_dropins.webutil.util.
maybe_timestamp_to_rfc3339
(input)[source]¶ Tries to convert a string or int UNIX timestamp to RFC 3339.
-
oauth_dropins.webutil.util.
to_utc_timestamp
(input)[source]¶ Converts a datetime to a float POSIX timestamp (seconds since epoch).
-
oauth_dropins.webutil.util.
as_utc
(input)[source]¶ Converts a timezone-aware datetime to a naive UTC datetime.
If input is timezone-naive, it’s returned as is.
Doesn’t support DST!
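The conversion can be sketched with the standard datetime module as below. as_utc_sketch is a hypothetical name; it subtracts the fixed UTC offset and drops the tzinfo, which is one plausible reading of the description rather than the library's code.

```python
import datetime


def as_utc_sketch(dt):
    """Converts an aware datetime to a naive UTC datetime."""
    offset = dt.utcoffset()
    if offset is None:
        # Naive datetimes are returned unchanged.
        return dt
    return (dt - offset).replace(tzinfo=None)
```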
-
oauth_dropins.webutil.util.
ellipsize
(str, words=14, chars=140)[source]¶ Truncates and ellipsizes str if it’s longer than words or chars.
Words are simply tokenized on whitespace, nothing smart.
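A rough sketch of this truncation, assuming a three-character ASCII ellipsis that counts toward the chars limit. ellipsize_sketch is a hypothetical name.

```python
def ellipsize_sketch(text, words=14, chars=140):
    """Truncates text to at most `words` words and `chars` characters."""
    tokens = text.split()  # plain whitespace tokenization
    if len(tokens) <= words and len(text) <= chars:
        return text
    # Leave room for the ellipsis itself within the character limit.
    return ' '.join(tokens[:words])[:chars - 3] + '...'
```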
-
oauth_dropins.webutil.util.
add_query_params
(url, params)[source]¶ Adds new query parameters to a URL. Encodes as UTF-8 and URL-safe.
Parameters: - url – string URL or urllib2.Request. May already have query parameters.
- params – dict or list of (string key, string value) tuples. Keys may repeat.
Returns: string URL
-
oauth_dropins.webutil.util.
dedupe_urls
(urls)[source]¶ Normalizes and de-dupes http(s) URLs.
Converts domain to lower case, adds trailing slash when path is empty, and ignores scheme (http vs https), preferring https. Preserves order. Removes Nones and blank strings.
Domains are case insensitive, even modern domains with Unicode/punycode characters:
http://unicode.org/faq/idn.html#6 https://tools.ietf.org/html/rfc4343#section-5
As examples, http://foo/ and https://FOO are considered duplicates, but http://foo/bar and http://foo/bar/ aren’t.
Background: https://en.wikipedia.org/wiki/URL_normalization
TODO: port to https://pypi.python.org/pypi/urlnorm
Parameters: urls – sequence of string URLs Returns: sequence of string URLs
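The normalization rules above could be implemented along these lines. This is a sketch for Python 3, not the library's code: dedupe_urls_sketch is a hypothetical name, and it does not special-case non-http(s) schemes.

```python
from urllib.parse import urlparse, urlunparse


def dedupe_urls_sketch(urls):
    """Normalizes and de-dupes URLs, preferring https and preserving order."""
    result = []  # normalized URLs in first-seen order
    index = {}   # everything except the scheme -> position in result

    for url in urls:
        if not url or not url.strip():
            continue  # drop Nones and blank strings

        parts = urlparse(url.strip())
        parts = parts._replace(netloc=parts.netloc.lower(),
                               path=parts.path or '/')
        key = tuple(parts)[1:]  # ignore the scheme when comparing
        normalized = urlunparse(parts)

        if key not in index:
            index[key] = len(result)
            result.append(normalized)
        elif parts.scheme == 'https':
            # Same URL seen earlier over http: upgrade it in place.
            result[index[key]] = normalized

    return result
```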
-
oauth_dropins.webutil.util.
encode_oauth_state
(obj)[source]¶ The state parameter is passed to various source authorization endpoints and returned in a callback. This encodes a JSON object so that it can be safely included as a query string parameter.
Parameters: obj – a JSON-serializable dict Returns: a string
-
oauth_dropins.webutil.util.
decode_oauth_state
(state)[source]¶ Decodes a state parameter encoded by encode_oauth_state().
Parameters: state – a string (JSON-serialized dict), or None Returns: dict
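As an illustration of the encode/decode round trip, here is a sketch that serializes to JSON and URL-quotes the result. The function names are hypothetical, and both the quoting scheme and returning an empty dict for None are assumptions.

```python
import json
from urllib.parse import quote, unquote


def encode_state_sketch(obj):
    """Encodes a JSON-serializable dict as a query-string-safe string."""
    return quote(json.dumps(obj, sort_keys=True))


def decode_state_sketch(state):
    """Decodes a string produced by encode_state_sketch() back into a dict."""
    # Assumption: None or '' decodes to an empty dict.
    return json.loads(unquote(state)) if state else {}
```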
-
oauth_dropins.webutil.util.
if_changed
(cache, updates, key, value)[source]¶ Returns a value if it’s different from the cached value, otherwise None.
Values that evaluate to False are considered equivalent to None, in order to save cache space.
If the values differ, updates[key] is set to value. You can use this to collect changes that should be made to the cache in batch. None values in updates mean that the corresponding key should be deleted.
Parameters: - cache – any object with a get(key) method
- updates – mapping (e.g. dict)
- key – anything supported by cache
- value – anything supported by cache
Returns: value or None
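The compare-and-collect pattern described above might be sketched like this. if_changed_sketch is a hypothetical name; the cache here is a plain dict, which satisfies the get(key) requirement.

```python
def if_changed_sketch(cache, updates, key, value):
    """Returns value if it differs from cache.get(key), otherwise None."""
    cached = cache.get(key)
    # Falsy values ('', 0, [], None) are all treated as equivalent to None.
    if value == cached or (not value and not cached):
        return None
    updates[key] = value
    return value
```

After a batch of calls, updates holds only the keys whose values changed, so the cache can be written once at the end.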
-
oauth_dropins.webutil.util.
generate_secret
()[source]¶ Generates a URL-safe random secret string.
Uses App Engine’s os.urandom(), which is designed to be cryptographically secure: http://code.google.com/p/googleappengine/issues/detail?id=1055
Parameters: bytes – integer, length of string to generate Returns: random string
-
oauth_dropins.webutil.util.
is_int
(arg)[source]¶ Returns True if arg can be converted to an integer, False otherwise.
-
oauth_dropins.webutil.util.
is_float
(arg)[source]¶ Returns True if arg can be converted to a float, False otherwise.
-
oauth_dropins.webutil.util.
is_base64
(arg)[source]¶ Returns True if arg is a base64 encoded string, False otherwise.
-
oauth_dropins.webutil.util.
interpret_http_exception
(exception)[source]¶ Extracts the status code and response from different HTTP exception types.
Parameters: exception – an HTTP request exception. Supported types:
apiclient.errors.HttpError
webob.exc.WSGIHTTPException
gdata.client.RequestError
oauth2client.client.AccessTokenRefreshError
requests.HTTPError
urllib2.HTTPError
urllib2.URLError
Returns: (string status code or None, string response body or None)
-
oauth_dropins.webutil.util.
is_connection_failure
(exception)[source]¶ Returns True if the given exception is a network connection failure.
…False otherwise.
-
class
oauth_dropins.webutil.util.
FileLimiter
(file_obj, read_limit)[source]¶ Bases:
object
A file object wrapper that reads up to a limit and then reports EOF.
From http://stackoverflow.com/a/29838711/186123 . Thanks SO!
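A minimal sketch of such a wrapper follows. FileLimiterSketch is a hypothetical class that only implements read(); the real class may expose more of the file interface.

```python
class FileLimiterSketch(object):
    """Wraps a file object and reports EOF after read_limit bytes."""

    def __init__(self, file_obj, read_limit):
        self.file_obj = file_obj
        self.remaining = read_limit

    def read(self, amount=-1):
        if self.remaining <= 0:
            # Reading zero bytes yields an empty value of the right type.
            return self.file_obj.read(0)
        if amount < 0 or amount > self.remaining:
            amount = self.remaining
        data = self.file_obj.read(amount)
        self.remaining -= len(data)
        return data
```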
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
oauth_dropins.webutil.util.
load_file_lines
(file)[source]¶ Reads lines from a file and returns them as a set.
Leading and trailing whitespace is trimmed. Blank lines and lines beginning with # (ie comments) are ignored.
Parameters: file – a file object or other iterable that returns lines Returns: set of strings
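The filtering rules above amount to something like the following sketch. load_file_lines_sketch is a hypothetical name.

```python
def load_file_lines_sketch(file):
    """Returns the set of non-blank, non-comment lines in file."""
    lines = set()
    for line in file:
        line = line.strip()
        if line and not line.startswith('#'):
            lines.add(line)
    return lines
```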
-
oauth_dropins.webutil.util.
urlopen
(url_or_req, *args, **kwargs)[source]¶ Wraps urllib2.urlopen and logs the HTTP method and URL.
-
oauth_dropins.webutil.util.
requests_fn
(fn)[source]¶ Wraps requests.* and logs the HTTP method and URL.
-
oauth_dropins.webutil.util.
follow_redirects
(url, cache=None, fail_cache_time_secs=86400, **kwargs)[source]¶ Fetches a URL with HEAD, repeating if necessary to follow redirects.
Does not raise an exception if any of the HTTP requests fail, just returns the failed response. If you care, be sure to check the returned response’s status code!
Parameters: - url – string
- cache – optional, a cache object to read and write resolved URLs to. Must have get(key) and set(key, value, time=…) methods. Stores ‘R [original URL]’ in key, final URL in value.
- kwargs – passed to requests.head()
Returns: - the requests.Response for the final request. The url attribute has the
final URL.
-
class
oauth_dropins.webutil.util.
UrlCanonicalizer
(scheme='https', domain=None, subdomain=None, approve=None, reject=None, query=False, fragment=False, trailing_slash=False, redirects=True, headers=None)[source]¶ Bases:
object
Converts URLs to their canonical form.
If an input URL matches approve or reject, it’s automatically approved or rejected, respectively, without following redirects.
If we HEAD the URL to follow redirects and it returns 4xx or 5xx, we return None.
-
__init__
(scheme='https', domain=None, subdomain=None, approve=None, reject=None, query=False, fragment=False, trailing_slash=False, redirects=True, headers=None)[source]¶ Constructor.
Parameters: - scheme – string canonical scheme for this source (default ‘https’)
- domain – string canonical domain for this source (default None). If set, links on other domains will be rejected without following redirects.
- subdomain – string canonical subdomain, e.g. ‘www’ (default none, ie root domain). only added if there’s not already a subdomain.
- approve – string regexp matching URLs that are automatically considered canonical
- reject – string regexp matching URLs that are automatically rejected
- query – boolean, whether to keep query params, if any (default False)
- fragment – boolean, whether to keep fragment, if any (default False)
- trailing_slash – boolean, whether the path should end in / (default False)
- redirects – boolean, whether to make HTTP HEAD requests to follow redirects (default True)
- headers – passed through to the requests.head call for following redirects
-
__call__
(url, redirects=None)[source]¶ Canonicalizes a string URL.
Returns the canonical form of a string URL, or None if it can’t be canonicalized, ie it’s in the blacklist or its domain doesn’t match.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
class
oauth_dropins.webutil.util.
WideUnicode
(*args, **kwargs)[source]¶ Bases:
unicode
String class with consistent indexing and len() on narrow and wide Python.
PEP 261 describes that Python 2 builds come in “narrow” and “wide” flavors. Wide is configured with –enable-unicode=ucs4, which represents Unicode high code points above the 16-bit Basic Multilingual Plane in unicode strings as single characters. This means that len(), indexing, and slices of unicode strings use Unicode code points consistently.
Narrow, on the other hand, represents high code points as “surrogate pairs” of 16-bit characters. This means that len(), indexing, and slicing unicode strings does not always correspond to Unicode code points.
Mac OS X, Windows, and older Linux distributions have narrow Python 2 builds, while many modern Linux distributions have wide builds, so this can cause platform-specific bugs, e.g. with many commonly used emoji.
Docs: https://www.python.org/dev/peps/pep-0261/ https://docs.python.org/2.7/library/codecs.html?highlight=ucs2#encodings-and-unicode http://www.unicode.org/glossary/#high_surrogate_code_point
Inspired by: http://stackoverflow.com/a/9934913
Related work: https://uniseg-python.readthedocs.io/ https://pypi.python.org/pypi/pytextseg https://github.com/LuminosoInsight/python-ftfy/ https://github.com/PythonCharmers/python-future/issues/116 https://dev.twitter.com/basics/counting-characters
On StackOverflow: http://stackoverflow.com/questions/1446347/how-to-find-out-if-python-is-compiled-with-ucs-2-or-ucs-4 http://stackoverflow.com/questions/12907022/python-getting-correct-string-length-when-it-contains-surrogate-pairs http://stackoverflow.com/questions/35404144/correctly-extract-emojis-from-a-unicode-string
-
__weakref__
¶ list of weak references to the object (if defined)
-
handlers¶
Request handler utility classes.
Includes classes for serving templates with common variables and XRD[S] and JRD files like host-meta and friends.
-
oauth_dropins.webutil.handlers.
handle_exception
(self, e, debug)[source]¶ A webapp2 exception handler that propagates HTTP exceptions into the response.
Use this as a webapp2.RequestHandler.handle_exception() method by adding this line to your handler class definition:
handle_exception = handlers.handle_exception
I originally tried to put this in a webapp2.RequestHandler subclass, but it gave me this exception:
File ".../webapp2-2.5.1/webapp2_extras/local.py", line 136, in _get_current_object
raise RuntimeError('no object bound to %s' % self.__name__)
RuntimeError: no object bound to app
These are probably related:
-
oauth_dropins.webutil.handlers.
redirect
(from_domain, to_domain)[source]¶ webapp2.RequestHandler decorator that 301 redirects to a new domain.
Preserves scheme, path, and query.
Parameters: - from_domain – string or sequence of strings
- to_domain – string
-
oauth_dropins.webutil.handlers.
memcache_response
(expiration)[source]¶ webapp2.RequestHandler decorator that memcaches the response.
Parameters: expiration – datetime.timedelta
-
class
oauth_dropins.webutil.handlers.
ModernHandler
(*args, **kwargs)[source]¶ Bases:
webapp2.RequestHandler
Base handler that adds modern open/secure headers like CORS, HSTS, etc.
-
class
oauth_dropins.webutil.handlers.
TemplateHandler
(*args, **kwargs)[source]¶ Bases:
oauth_dropins.webutil.handlers.ModernHandler
Renders and serves a template based on class attributes.
Subclasses must override template_file() and may also override template_vars() and content_type().
-
class
oauth_dropins.webutil.handlers.
XrdOrJrdHandler
(*args, **kwargs)[source]¶ Bases:
oauth_dropins.webutil.handlers.TemplateHandler
Renders and serves an XRD or JRD file.
JRD is served if the request path ends in .json, or the query parameters include ‘format=json’, or the request headers include ‘Accept: application/json’.
Subclasses must override template_prefix().
Class members:
- JRD_TEMPLATE: boolean, renders JRD with a template if True, otherwise renders it as JSON directly.
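The three conditions under which JRD is served can be expressed as a single predicate. The sketch below is hypothetical: wants_jrd_sketch is not part of the library, it takes plain strings and dicts rather than a webapp2 request, and matching the Accept header by substring is an assumption.

```python
def wants_jrd_sketch(path, query_params, headers):
    """Returns True if the request should be answered with JRD (JSON)."""
    return (path.endswith('.json')
            or query_params.get('format') == 'json'
            or 'application/json' in headers.get('Accept', ''))
```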
-
class
oauth_dropins.webutil.handlers.
HostMetaHandler
(*args, **kwargs)[source]¶ Bases:
oauth_dropins.webutil.handlers.XrdOrJrdHandler
Renders and serves the /.well-known/host-meta file.
-
class
oauth_dropins.webutil.handlers.
HostMetaXrdsHandler
(*args, **kwargs)[source]¶ Bases:
oauth_dropins.webutil.handlers.TemplateHandler
Renders and serves the /.well-known/host-meta.xrds XRDS-Simple file.
models¶
App Engine datastore model base classes and utilities.
-
class
oauth_dropins.webutil.models.
StringIdModel
(*args, **kwds)[source]¶ Bases:
google.appengine.ext.ndb.model.Model
An ndb model class that requires a string id.
-
class
oauth_dropins.webutil.models.
KeyNameModel
(*args, **kwargs)[source]¶ Bases:
google.appengine.ext.db.Model
A db model class that requires a key name.
-
class
oauth_dropins.webutil.models.
SingleEGModel
(self_or_cls, *args, **kwargs)[source]¶ Bases:
google.appengine.ext.db.Model
A model class that stores all entities in a single entity group.
All entities use the same parent key (below), and all() automatically adds it as an ancestor. That allows, among other things, fetching all entities of this kind with strong consistency.
-
enforce_parent
(fn)[source]¶ Sets the parent keyword arg. If it’s already set, checks that it’s correct.
Returns the shared parent key for this class.
It’s not actually an entity, just a placeholder key.
-
testutil¶
Unit test utilities.
-
oauth_dropins.webutil.testutil.
get_task_params
(task)[source]¶ Parses a task’s POST body and returns the query params in a dict.
-
oauth_dropins.webutil.testutil.
requests_response
(body='', url=None, status=200, content_type=None, redirected_url=None, headers=None, allow_redirects=None)[source]¶ Parameters: redirected_url – string URL or sequence of string URLs for multiple redirects
-
class
oauth_dropins.webutil.testutil.
UrlopenResult
(status_code, content, url=None, headers={})[source]¶ Bases:
object
A fake urllib2.urlopen() or urlfetch.fetch() result object.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
class
oauth_dropins.webutil.testutil.
Asserts
[source]¶ Bases:
object
Test case mixin class with extra assert helpers.
-
assert_entities_equal
(a, b, ignore=frozenset([]), keys_only=False, in_order=False)[source]¶ Asserts that a and b are equivalent entities or lists of entities.
…specifically, that they have the same property values, and if they both have populated keys, that their keys are equal too.
Parameters: - a – db.Model or ndb.Model instances or lists of instances
- b – same
- ignore – sequence of strings, property names not to compare
- keys_only – boolean, if True only compare keys
- in_order – boolean. If False, all entities must have keys.
-
assert_equals
(expected, actual, msg=None, in_order=False)[source]¶ Pinpoints individual element differences in lists and dicts.
If in_order is False, ignores order in lists and tuples.
-
assert_multiline_equals
(expected, actual, ignore_blanks=False)[source]¶ Compares two multi-line strings and reports a diff style output.
Ignores leading and trailing whitespace on each line, and squeezes repeated blank lines down to just one.
Parameters: ignore_blanks – boolean, whether to ignore blank lines altogether
-
assert_multiline_in
(expected, actual, ignore_blanks=False)[source]¶ Checks that a multi-line string is in another and reports a diff output.
Ignores leading and trailing whitespace on each line, and squeezes repeated blank lines down to just one.
Parameters: ignore_blanks – boolean, whether to ignore blank lines altogether
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
class
oauth_dropins.webutil.testutil.
TestCase
(methodName='runTest')[source]¶ Bases:
mox.MoxTestBase, oauth_dropins.webutil.testutil.Asserts
Test case class with lots of extra helpers.
-
expect_urlopen
(url, response=None, status=200, data=None, headers=None, response_headers={}, **kwargs)[source]¶ Stubs out urllib2.urlopen() and sets up an expected call.
If status isn’t 2xx, makes the expected call raise a urllib2.HTTPError instead of returning the response.
If data is set, url must be a urllib2.Request.
If response is unset, returns the expected call.
Parameters: - url – string, re.RegexObject or urllib2.Request or webob.request.Request
- response – string
- status – int, HTTP response code
- data – optional string POST body
- headers – optional expected request header dict
- response_headers – optional response header dict
- kwargs – other keyword args, e.g. timeout
-
-
class
oauth_dropins.webutil.testutil.
HandlerTest
(methodName='runTest')[source]¶ Bases:
oauth_dropins.webutil.testutil.TestCase
Base test class for webapp2 request handlers.
Uses App Engine’s testbed to set up API stubs: http://code.google.com/appengine/docs/python/tools/localunittesting.html
-
application
¶
-
handler
¶
-