openmedialibrary_platform/Shared/lib/python2.7/site-packages/html5lib-0.999.egg-info/PKG-INFO

Metadata-Version: 1.1
Name: html5lib
Version: 0.999
Summary: HTML parser based on the WHATWG HTML specification
Home-page: https://github.com/html5lib/html5lib-python
Author: James Graham
Author-email: james@hoppipolla.co.uk
License: MIT License
Description: html5lib
        ========
        
        .. image:: https://travis-ci.org/html5lib/html5lib-python.png?branch=master
          :target: https://travis-ci.org/html5lib/html5lib-python
        
        html5lib is a pure-python library for parsing HTML. It is designed to
        conform to the WHATWG HTML specification, as implemented by all major
        web browsers.
        
        
        Usage
        -----
        
        Simple usage follows this pattern:
        
        .. code-block:: python
        
          import html5lib
          with open("mydocument.html", "rb") as f:
              document = html5lib.parse(f)
        
        or:
        
        .. code-block:: python
        
          import html5lib
          document = html5lib.parse("<p>Hello World!")
        
        By default, the ``document`` will be an ``xml.etree`` element instance.
        Whenever possible, html5lib chooses the accelerated ``ElementTree``
        implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x).
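        
        The returned element can be explored with the ordinary ``xml.etree``
        API. The sketch below is illustrative only: it shows the implied
        ``<html>``, ``<head>`` and ``<body>`` elements, the browser-style
        implicit closing of ``<p>``, and the ``{namespace}localname`` tag names
        that result from HTML elements being placed in the XHTML namespace. It
        assumes ``html5lib.parse`` accepts a ``namespaceHTMLElements`` keyword,
        as in this release.
        
        .. code-block:: python
        
          import html5lib
        
          XHTML = "{http://www.w3.org/1999/xhtml}"
        
          # The second <p> implicitly closes the first, as in a browser, and the
          # missing <html>, <head> and <body> elements are inserted automatically.
          document = html5lib.parse("<p>One<p>Two")
        
          # HTML elements are namespaced by default, so tag names use the
          # ElementTree "{namespace}localname" form.
          print(document.tag)  # {http://www.w3.org/1999/xhtml}html
          body = document.find(XHTML + "body")
          paragraphs = [p.text for p in body.findall(XHTML + "p")]
        
          # namespaceHTMLElements=False produces plain, un-namespaced tag names.
          plain = html5lib.parse("<p>One<p>Two", namespaceHTMLElements=False)
          plain_paragraphs = [p.text for p in plain.findall("body/p")]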
        
        Two other tree types are supported: ``xml.dom.minidom`` and
        ``lxml.etree``. To use an alternative format, specify the name of
        a treebuilder:
        
        .. code-block:: python
        
          import html5lib
          with open("mydocument.html", "rb") as f:
              lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
        
        When using ``urllib2`` (Python 2), pass the charset from the HTTP
        headers into html5lib as follows:
        
        .. code-block:: python
        
          from contextlib import closing
          from urllib2 import urlopen
          import html5lib
        
          with closing(urlopen("http://example.com/")) as f:
              document = html5lib.parse(f, encoding=f.info().getparam("charset"))
        
        When using ``urllib.request`` (Python 3), pass the charset from the HTTP
        headers into html5lib as follows:
        
        .. code-block:: python
        
          from urllib.request import urlopen
          import html5lib
        
          with urlopen("http://example.com/") as f:
              document = html5lib.parse(f, encoding=f.info().get_content_charset())
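        
        To see what ends up in ``encoding`` without making a network request,
        the sketch below feeds a ``Content-Type`` value into
        ``email.message.Message``, the standard-library class whose
        ``get_content_charset()`` method the Python 3 snippet above relies on.
        The helper ``charset_from_content_type`` is hypothetical.
        
        .. code-block:: python
        
          from email.message import Message
        
          def charset_from_content_type(content_type):
              """Return the charset parameter of a Content-Type value, or None."""
              headers = Message()
              headers["Content-Type"] = content_type
              # get_content_charset() lower-cases the charset and returns None
              # when the header carries no charset parameter.
              return headers.get_content_charset()
        
          print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
          print(charset_from_content_type("text/html"))  # None
        
        When no charset is present the helper returns ``None``; passed on as
        ``encoding=None``, that leaves html5lib to detect the encoding itself.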
        
        To have more control over the parser, create a parser object explicitly.
        For instance, to make the parser raise exceptions on parse errors, use:
        
        .. code-block:: python
        
          import html5lib
          with open("mydocument.html", "rb") as f:
              parser = html5lib.HTMLParser(strict=True)
              document = parser.parse(f)
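        
        With ``strict=True`` the first parse error is raised as
        ``html5lib.html5parser.ParseError``. In the sketch below (the
        ``parse_strictly`` helper is hypothetical) a missing doctype already
        counts as a parse error, so only the second input succeeds; a parser
        created without ``strict=True`` keeps going and records the problems in
        its ``errors`` list instead.
        
        .. code-block:: python
        
          import html5lib
          from html5lib.html5parser import ParseError
        
          def parse_strictly(markup):
              """Return (document, None), or (None, message) on the first error."""
              parser = html5lib.HTMLParser(strict=True)
              try:
                  return parser.parse(markup), None
              except ParseError as error:
                  return None, str(error)
        
          # No doctype: the leading <p> start tag is reported as a parse error.
          failed_document, failure = parse_strictly("<p>Hello World!")
        
          # With a doctype, the same content parses without errors.
          ok_document, ok_message = parse_strictly("<!DOCTYPE html><p>Hello World!")
        
          # The default, lenient parser carries on and collects the errors.
          lenient = html5lib.HTMLParser()
          lenient.parse("<p>Hello World!")
          print(len(lenient.errors))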
        
        When you're instantiating parser objects explicitly, pass a treebuilder
        class as the ``tree`` keyword argument to use an alternative document
        format:
        
        .. code-block:: python
        
          import html5lib
          parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
          minidom_document = parser.parse("<p>Hello World!")
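        
        The result is an ordinary ``xml.dom.minidom`` document, so the standard
        DOM methods apply. A minimal sketch; the ``text_of`` helper is
        hypothetical and joins the element's direct text nodes in case the text
        has been split across several of them:
        
        .. code-block:: python
        
          import html5lib
          from xml.dom import Node
        
          parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
          minidom_document = parser.parse("<p>Hello World!")
        
          def text_of(element):
              """Concatenate the data of an element's direct text-node children."""
              return "".join(child.data for child in element.childNodes
                             if child.nodeType == Node.TEXT_NODE)
        
          root = minidom_document.documentElement
          print(root.tagName)  # html
          paragraph = minidom_document.getElementsByTagName("p")[0]
          print(text_of(paragraph))  # Hello World!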
        
        More documentation is available at http://html5lib.readthedocs.org/.
        
        
        Installation
        ------------
        
        html5lib works on CPython 2.6+, CPython 3.2+ and PyPy.  To install it,
        use:
        
        .. code-block:: bash
        
            $ pip install html5lib
        
        
        Optional Dependencies
        ---------------------
        
        The following third-party libraries may be used for additional
        functionality:
        
        - ``datrie`` can be used to improve parsing performance (though in
          almost all cases the improvement is marginal);
        
        - ``lxml`` is supported as a tree format (for both building and
          walking) under CPython (but *not* PyPy where it is known to cause
          segfaults);
        
        - ``genshi`` has a treewalker (but not a builder);
        
        - ``charade`` can be used as a fallback when character encoding cannot
          be determined; ``chardet``, from which it was forked, can also be used
          on Python 2; and
        
        - ``ordereddict`` can be used under Python 2.6
          (``collections.OrderedDict`` is used instead on later versions) to
          serialize attributes in alphabetical order.
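        
        Attribute order only becomes visible when a tree is serialized back to
        HTML. The sketch below passes the ``alphabetical_attributes`` serializer
        option (see the 1.0b2 change log entry) to ``html5lib.serialize``,
        assuming that function forwards serializer options as keyword arguments.
        The exact output also depends on other serializer defaults, such as
        attribute quoting and omission of optional tags, so it is not shown.
        
        .. code-block:: python
        
          import html5lib
        
          # The attributes are deliberately written out of alphabetical order.
          document = html5lib.parse('<a title="Example" href="/docs">docs</a>')
        
          # alphabetical_attributes=True sorts each element's attributes by name
          # when the tree is written back out as HTML.
          markup = html5lib.serialize(document, alphabetical_attributes=True)
          print(markup)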
        
        
        Bugs
        ----
        
        Please report any bugs on the `issue tracker
        <https://github.com/html5lib/html5lib-python/issues>`_.
        
        
        Tests
        -----
        
        Unit tests require the ``nose`` library and can be run using the
        ``nosetests`` command in the root directory; ``ordereddict`` is
        required under Python 2.6. All should pass.
        
        Test data are contained in a separate `html5lib-tests
        <https://github.com/html5lib/html5lib-tests>`_ repository and included
        as a submodule, so in git checkouts the submodule must be initialized::
        
          $ git submodule init
          $ git submodule update
        
        If you have all compatible Python implementations available on your
        system, you can run tests on all of them using the ``tox`` utility,
        which can be found on PyPI.
        
        
        Questions?
        ----------
        
        There's a mailing list available for support on Google Groups,
        `html5lib-discuss <http://groups.google.com/group/html5lib-discuss>`_,
        though you may get a quicker response asking on IRC in `#whatwg on
        irc.freenode.net <http://wiki.whatwg.org/wiki/IRC>`_.
        
        Change Log
        ----------
        
        0.999
        ~~~~~
        
        Released on December 23, 2013
        
        * Fix #127: add work-around for CPython issue #20007: .read(0) on
          http.client.HTTPResponse drops the rest of the content.
        
        * Fix #115: lxml treewalker can now deal with fragments containing, at
          their root level, text nodes with non-ASCII characters on Python 2.
        
        
        0.99
        ~~~~
        
        Released on September 10, 2013
        
        * No library changes from 1.0b3; released as 0.99 because pip 1.4
          changed its behaviour to avoid installing pre-release versions, per
          PEP 440.
        
        
        1.0b3
        ~~~~~
        
        Released on July 24, 2013
        
        * Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any
          implementation using it should be ported to
          ``NonRecursiveTreeWalker``, which everything bundled with html5lib
          has used for years.
        
        * Fix #67 so that ``BufferedStream`` correctly returns a bytes
          object, thereby fixing any case where html5lib is passed a
          non-seekable RawIOBase-like object.
        
        
        1.0b2
        ~~~~~
        
        Released on June 27, 2013
        
        * Removed reordering of attributes within the serializer. There is now
          an ``alphabetical_attributes`` option which preserves the previous
          behaviour through a new filter. This allows attribute order to be
          preserved through html5lib if the tree builder preserves order.
        
        * Removed ``dom2sax`` from DOM treebuilders. It has been replaced by
          ``treeadapters.sax.to_sax`` which is generic and supports any
          treewalker; it also resolves all known bugs with ``dom2sax``.
        
        * Fix treewalker assertions on hitting byte strings on
          Python 2. Prior to 1.0b1, treewalkers coped with mixed
          bytes/unicode data on Python 2; this restores that behaviour
          on Python 2. Behaviour is unchanged on Python 3.
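        
        As a sketch of the replacement, ``to_sax`` can drive a standard SAX
        ``ContentHandler`` from any treewalker. This assumes the
        ``to_sax(walker, handler)`` signature and that element names are
        reported through ``startElementNS`` as ``(namespace, localname)``
        pairs; the ``ElementCollector`` handler is hypothetical:
        
        .. code-block:: python
        
          import html5lib
          from html5lib.treeadapters.sax import to_sax
          from xml.sax.handler import ContentHandler
        
          class ElementCollector(ContentHandler):
              """Record element local names and character data seen via SAX."""
        
              def __init__(self):
                  ContentHandler.__init__(self)
                  self.local_names = []
                  self.text = []
        
              def startElementNS(self, name, qname, attrs):
                  namespace, local_name = name
                  self.local_names.append(local_name)
        
              def characters(self, content):
                  self.text.append(content)
        
          document = html5lib.parse("<p>Hello World!")
          # The etree walker matches the default etree tree builder.
          walker = html5lib.getTreeWalker("etree")(document)
          collector = ElementCollector()
          to_sax(walker, collector)
          print(collector.local_names)
          print("".join(collector.text))  # Hello World!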
        
        
        1.0b1
        ~~~~~
        
        Released on May 17, 2013
        
        * Implementation updated to implement the `HTML specification
          <http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May
          2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867).
        
        * Python 3.2+ supported in a single codebase using the ``six`` library.
        
        * Removed support for Python 2.5 and older.
        
        * Removed the deprecated Beautiful Soup 3 treebuilder.
          ``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that
          since it doesn't support namespaces, foreign content like SVG and
          MathML is parsed incorrectly.
        
        * Removed ``simpletree`` from the package. The default tree builder is
          now ``etree`` (using the ``xml.etree.cElementTree`` implementation if
          available, and ``xml.etree.ElementTree`` otherwise).
        
        * Removed the ``XHTMLSerializer`` as it never actually guaranteed its
          output was well-formed XML, and hence provided little of use.
        
        * Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no
          longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will
          return the default DOM treebuilder, which uses ``xml.dom.minidom``.
        
        * Optional heuristic character encoding detection now based on
          ``charade`` for Python 2.6 - 3.3 compatibility.
        
        * Optional ``Genshi`` treewalker support fixed.
        
        * Many bugfixes, including:
        
          * #33: null in attribute value breaks XML AttValue;
        
          * #4: nested, indirect descendant, <button> causes infinite loop;
        
          * `Google Code 215
            <http://code.google.com/p/html5lib/issues/detail?id=215>`_: Properly
            detect seekable streams;
        
          * `Google Code 206
            <http://code.google.com/p/html5lib/issues/detail?id=206>`_: add
            support for <video preload=...>, <audio preload=...>;
        
          * `Google Code 205
            <http://code.google.com/p/html5lib/issues/detail?id=205>`_: add
            support for <video poster=...>;
        
          * `Google Code 202
            <http://code.google.com/p/html5lib/issues/detail?id=202>`_: Unicode
            file breaks InputStream.
        
        * Source code is now mostly PEP 8 compliant.
        
        * Test harness has been improved and now depends on ``nose``.
        
        * Documentation updated and moved to http://html5lib.readthedocs.org/.
        
        
        0.95
        ~~~~
        
        Released on February 11, 2012
        
        
        0.90
        ~~~~
        
        Released on January 17, 2010
        
        
        0.11.1
        ~~~~~~
        
        Released on June 12, 2008
        
        
        0.11
        ~~~~
        
        Released on June 10, 2008
        
        
        0.10
        ~~~~
        
        Released on October 7, 2007
        
        
        0.9
        ~~~
        
        Released on March 11, 2007
        
        
        0.2
        ~~~
        
        Released on January 8, 2007
        
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup :: HTML