Coverage for pygments.regexopt: 95%

# -*- coding: utf-8 -*-
"""
    pygments.regexopt
    ~~~~~~~~~~~~~~~~~

    An algorithm that generates optimized regexes for matching long lists of
    literal strings.

    :copyright: Copyright 2006-2014 by the Pygments team, see AUTHORS.
    :license: BSD, see LICENSE for details.
"""
"""Return a regex that matches any string in the sorted list of strings."""
# print strings, repr(open_paren)
# print '-> nothing left'
return ''
# print '-> only 1 string'
# print '-> first string empty'
+ '?' + close_paren
# multiple one-char strings? make a charset
else:
# print '-> 1-character + rest'
return open_paren + regex_opt_inner(rest, '') + '|' \
    + make_charset(oneletter) + close_paren
# print '-> only 1-character'
# we have a prefix for all strings
# print '-> prefix:', prefix
+ regex_opt_inner([s[plen:] for s in strings], '(?:') \
+ close_paren
# is there a suffix?
# print '-> suffix:', suffix[::-1]
+ regex_opt_inner(sorted(s[:-slen] for s in strings), '(?:') \
+ escape(suffix[::-1]) + close_paren
# recurse on common 1-string prefixes
# print '-> last resort'
'|'.join(regex_opt_inner(list(group[1]), '')
         for group in groupby(strings, lambda s: s[0] == first[0])) \
+ close_paren
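
The fragments above mention two of the optimizations: turning a set of one-character strings into a charset, and factoring out a prefix shared by all strings. The following is a minimal, self-contained sketch of just those two ideas. It is not the module's code: `literal_alternation` and `charset_sketch` are hypothetical names, the suffix and first-letter grouping steps are left out, and anything that cannot be factored falls back to a plain alternation.

```python
import re
from os.path import commonprefix


def charset_sketch(letters):
    """Build a character class such as [abc] from one-character strings."""
    return '[' + ''.join(re.escape(c) for c in sorted(letters)) + ']'


def literal_alternation(strings):
    """Return a regex source string matching any of the literal strings.

    Hypothetical helper: handles only the charset and common-prefix cases.
    """
    strings = sorted(set(strings))
    if not strings:
        return ''                      # nothing left
    if len(strings) == 1:
        return re.escape(strings[0])   # only one string
    if all(len(s) == 1 for s in strings):
        return charset_sketch(strings)  # only one-character strings
    prefix = commonprefix(strings)
    if prefix:
        # Emit the shared prefix once and recurse on the remainders.
        rest = [s[len(prefix):] for s in strings]
        return re.escape(prefix) + '(?:' + literal_alternation(rest) + ')'
    # Fallback (an assumption of this sketch): plain escaped alternation.
    return '|'.join(re.escape(s) for s in strings)


# Wrap in a group so a top-level alternation stays self-contained.
keywords = re.compile('(?:' + literal_alternation(['for', 'foreach', 'format']) + ')')
```

With the three keywords above, the shared `for` is emitted once and only the differing tails are alternated, which is the effect the prefix branch is after.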
"""Return a compiled regex that matches any string in the given list.

The strings to match must be literal strings, not regexes.  They will be
regex-escaped.

*prefix* and *suffix* are pre- and appended to the final regex.
"""
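
To make the contract in this docstring concrete, here is an unoptimized baseline that shows only the two documented behaviours: the input strings are regex-escaped, and *prefix* and *suffix* are placed around the result. The name `naive_regex_opt` is hypothetical, the function returns a pattern string rather than a compiled object, and sorting by length is an assumption of this sketch, not something the source states.

```python
import re


def naive_regex_opt(strings, prefix='', suffix=''):
    """Return a regex source string matching any of the literal strings.

    Hypothetical baseline: no prefix, suffix or charset optimization.
    """
    # Longest first, so a short alternative cannot shadow a longer one.
    ordered = sorted(strings, key=len, reverse=True)
    body = '|'.join(re.escape(s) for s in ordered)
    return prefix + '(' + body + ')' + suffix


pattern = re.compile(
    naive_regex_opt(['if', 'elif', 'else', 'a.b'], prefix=r'\b', suffix=r'\b'))
```

Because every string goes through `re.escape`, the `.` in `a.b` matches only a literal dot, while the `\b` anchors passed as *prefix* and *suffix* are used as regex syntax unchanged.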