update shared dependencies
This commit is contained in:
parent
d4d3d82be3
commit
736cd598a8
521 changed files with 45146 additions and 22574 deletions
|
|
@ -0,0 +1,378 @@
|
|||
html5lib
|
||||
========
|
||||
|
||||
.. image:: https://travis-ci.org/html5lib/html5lib-python.png?branch=master
|
||||
:target: https://travis-ci.org/html5lib/html5lib-python
|
||||
|
||||
html5lib is a pure-python library for parsing HTML. It is designed to
|
||||
conform to the WHATWG HTML specification, as is implemented by all major
|
||||
web browsers.
|
||||
|
||||
|
||||
Usage
|
||||
-----
|
||||
|
||||
Simple usage follows this pattern:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
with open("mydocument.html", "rb") as f:
|
||||
document = html5lib.parse(f)
|
||||
|
||||
or:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
document = html5lib.parse("<p>Hello World!")
|
||||
|
||||
By default, the ``document`` will be an ``xml.etree`` element instance.
|
||||
Whenever possible, html5lib chooses the accelerated ``ElementTree``
|
||||
implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x).
|
||||
|
||||
Two other tree types are supported: ``xml.dom.minidom`` and
|
||||
``lxml.etree``. To use an alternative format, specify the name of
|
||||
a treebuilder:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
with open("mydocument.html", "rb") as f:
|
||||
lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
|
||||
|
||||
When using with ``urllib2`` (Python 2), the charset from HTTP should be
|
||||
pass into html5lib as follows:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from contextlib import closing
|
||||
from urllib2 import urlopen
|
||||
import html5lib
|
||||
|
||||
with closing(urlopen("http://example.com/")) as f:
|
||||
document = html5lib.parse(f, encoding=f.info().getparam("charset"))
|
||||
|
||||
When using with ``urllib.request`` (Python 3), the charset from HTTP
|
||||
should be pass into html5lib as follows:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from urllib.request import urlopen
|
||||
import html5lib
|
||||
|
||||
with urlopen("http://example.com/") as f:
|
||||
document = html5lib.parse(f, encoding=f.info().get_content_charset())
|
||||
|
||||
To have more control over the parser, create a parser object explicitly.
|
||||
For instance, to make the parser raise exceptions on parse errors, use:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
with open("mydocument.html", "rb") as f:
|
||||
parser = html5lib.HTMLParser(strict=True)
|
||||
document = parser.parse(f)
|
||||
|
||||
When you're instantiating parser objects explicitly, pass a treebuilder
|
||||
class as the ``tree`` keyword argument to use an alternative document
|
||||
format:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
|
||||
minidom_document = parser.parse("<p>Hello World!")
|
||||
|
||||
More documentation is available at http://html5lib.readthedocs.org/.
|
||||
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
html5lib works on CPython 2.6+, CPython 3.2+ and PyPy. To install it,
|
||||
use:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ pip install html5lib
|
||||
|
||||
|
||||
Optional Dependencies
|
||||
---------------------
|
||||
|
||||
The following third-party libraries may be used for additional
|
||||
functionality:
|
||||
|
||||
- ``datrie`` can be used to improve parsing performance (though in
|
||||
almost all cases the improvement is marginal);
|
||||
|
||||
- ``lxml`` is supported as a tree format (for both building and
|
||||
walking) under CPython (but *not* PyPy where it is known to cause
|
||||
segfaults);
|
||||
|
||||
- ``genshi`` has a treewalker (but not builder); and
|
||||
|
||||
- ``charade`` can be used as a fallback when character encoding cannot
|
||||
be determined; ``chardet``, from which it was forked, can also be used
|
||||
on Python 2.
|
||||
|
||||
- ``ordereddict`` can be used under Python 2.6
|
||||
(``collections.OrderedDict`` is used instead on later versions) to
|
||||
serialize attributes in alphabetical order.
|
||||
|
||||
|
||||
Bugs
|
||||
----
|
||||
|
||||
Please report any bugs on the `issue tracker
|
||||
<https://github.com/html5lib/html5lib-python/issues>`_.
|
||||
|
||||
|
||||
Tests
|
||||
-----
|
||||
|
||||
Unit tests require the ``nose`` library and can be run using the
|
||||
``nosetests`` command in the root directory; ``ordereddict`` is
|
||||
required under Python 2.6. All should pass.
|
||||
|
||||
Test data are contained in a separate `html5lib-tests
|
||||
<https://github.com/html5lib/html5lib-tests>`_ repository and included
|
||||
as a submodule, thus for git checkouts they must be initialized::
|
||||
|
||||
$ git submodule init
|
||||
$ git submodule update
|
||||
|
||||
If you have all compatible Python implementations available on your
|
||||
system, you can run tests on all of them using the ``tox`` utility,
|
||||
which can be found on PyPI.
|
||||
|
||||
|
||||
Questions?
|
||||
----------
|
||||
|
||||
There's a mailing list available for support on Google Groups,
|
||||
`html5lib-discuss <http://groups.google.com/group/html5lib-discuss>`_,
|
||||
though you may get a quicker response asking on IRC in `#whatwg on
|
||||
irc.freenode.net <http://wiki.whatwg.org/wiki/IRC>`_.
|
||||
|
||||
Change Log
|
||||
----------
|
||||
|
||||
0.9999999/1.0b8
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
Released on September 10, 2015
|
||||
|
||||
* Fix #195: fix the sanitizer to drop broken URLs (it threw an
|
||||
exception between 0.9999 and 0.999999).
|
||||
|
||||
|
||||
0.999999/1.0b7
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
Released on July 7, 2015
|
||||
|
||||
* Fix #189: fix the sanitizer to allow relative URLs again (as it did
|
||||
prior to 0.9999/1.0b5).
|
||||
|
||||
|
||||
0.99999/1.0b6
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
Released on April 30, 2015
|
||||
|
||||
* Fix #188: fix the sanitizer to not throw an exception when sanitizing
|
||||
bogus data URLs.
|
||||
|
||||
|
||||
0.9999/1.0b5
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Released on April 29, 2015
|
||||
|
||||
* Fix #153: Sanitizer fails to treat some attributes as URLs. Despite how
|
||||
this sounds, this has no known security implications. No known version
|
||||
of IE (5.5 to current), Firefox (3 to current), Safari (6 to current),
|
||||
Chrome (1 to current), or Opera (12 to current) will run any script
|
||||
provided in these attributes.
|
||||
|
||||
* Pass error message to the ParseError exception in strict parsing mode.
|
||||
|
||||
* Allow data URIs in the sanitizer, with a whitelist of content-types.
|
||||
|
||||
* Add support for Python implementations that don't support lone
|
||||
surrogates (read: Jython). Fixes #2.
|
||||
|
||||
* Remove localization of error messages. This functionality was totally
|
||||
unused (and untested that everything was localizable), so we may as
|
||||
well follow numerous browsers in not supporting translating technical
|
||||
strings.
|
||||
|
||||
* Expose treewalkers.pprint as a public API.
|
||||
|
||||
* Add a documentEncoding property to HTML5Parser, fix #121.
|
||||
|
||||
|
||||
0.999
|
||||
~~~~~
|
||||
|
||||
Released on December 23, 2013
|
||||
|
||||
* Fix #127: add work-around for CPython issue #20007: .read(0) on
|
||||
http.client.HTTPResponse drops the rest of the content.
|
||||
|
||||
* Fix #115: lxml treewalker can now deal with fragments containing, at
|
||||
their root level, text nodes with non-ASCII characters on Python 2.
|
||||
|
||||
|
||||
0.99
|
||||
~~~~
|
||||
|
||||
Released on September 10, 2013
|
||||
|
||||
* No library changes from 1.0b3; released as 0.99 as pip has changed
|
||||
behaviour from 1.4 to avoid installing pre-release versions per
|
||||
PEP 440.
|
||||
|
||||
|
||||
1.0b3
|
||||
~~~~~
|
||||
|
||||
Released on July 24, 2013
|
||||
|
||||
* Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any
|
||||
implementation using it should be moved to
|
||||
``NonRecursiveTreeWalker``, as everything bundled with html5lib has
|
||||
for years.
|
||||
|
||||
* Fix #67 so that ``BufferedStream`` to correctly returns a bytes
|
||||
object, thereby fixing any case where html5lib is passed a
|
||||
non-seekable RawIOBase-like object.
|
||||
|
||||
|
||||
1.0b2
|
||||
~~~~~
|
||||
|
||||
Released on June 27, 2013
|
||||
|
||||
* Removed reordering of attributes within the serializer. There is now
|
||||
an ``alphabetical_attributes`` option which preserves the previous
|
||||
behaviour through a new filter. This allows attribute order to be
|
||||
preserved through html5lib if the tree builder preserves order.
|
||||
|
||||
* Removed ``dom2sax`` from DOM treebuilders. It has been replaced by
|
||||
``treeadapters.sax.to_sax`` which is generic and supports any
|
||||
treewalker; it also resolves all known bugs with ``dom2sax``.
|
||||
|
||||
* Fix treewalker assertions on hitting bytes strings on
|
||||
Python 2. Previous to 1.0b1, treewalkers coped with mixed
|
||||
bytes/unicode data on Python 2; this reintroduces this prior
|
||||
behaviour on Python 2. Behaviour is unchanged on Python 3.
|
||||
|
||||
|
||||
1.0b1
|
||||
~~~~~
|
||||
|
||||
Released on May 17, 2013
|
||||
|
||||
* Implementation updated to implement the `HTML specification
|
||||
<http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May
|
||||
2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867).
|
||||
|
||||
* Python 3.2+ supported in a single codebase using the ``six`` library.
|
||||
|
||||
* Removed support for Python 2.5 and older.
|
||||
|
||||
* Removed the deprecated Beautiful Soup 3 treebuilder.
|
||||
``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that
|
||||
since it doesn't support namespaces, foreign content like SVG and
|
||||
MathML is parsed incorrectly.
|
||||
|
||||
* Removed ``simpletree`` from the package. The default tree builder is
|
||||
now ``etree`` (using the ``xml.etree.cElementTree`` implementation if
|
||||
available, and ``xml.etree.ElementTree`` otherwise).
|
||||
|
||||
* Removed the ``XHTMLSerializer`` as it never actually guaranteed its
|
||||
output was well-formed XML, and hence provided little of use.
|
||||
|
||||
* Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no
|
||||
longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will
|
||||
return the default DOM treebuilder, which uses ``xml.dom.minidom``.
|
||||
|
||||
* Optional heuristic character encoding detection now based on
|
||||
``charade`` for Python 2.6 - 3.3 compatibility.
|
||||
|
||||
* Optional ``Genshi`` treewalker support fixed.
|
||||
|
||||
* Many bugfixes, including:
|
||||
|
||||
* #33: null in attribute value breaks XML AttValue;
|
||||
|
||||
* #4: nested, indirect descendant, <button> causes infinite loop;
|
||||
|
||||
* `Google Code 215
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=215>`_: Properly
|
||||
detect seekable streams;
|
||||
|
||||
* `Google Code 206
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=206>`_: add
|
||||
support for <video preload=...>, <audio preload=...>;
|
||||
|
||||
* `Google Code 205
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=205>`_: add
|
||||
support for <video poster=...>;
|
||||
|
||||
* `Google Code 202
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=202>`_: Unicode
|
||||
file breaks InputStream.
|
||||
|
||||
* Source code is now mostly PEP 8 compliant.
|
||||
|
||||
* Test harness has been improved and now depends on ``nose``.
|
||||
|
||||
* Documentation updated and moved to http://html5lib.readthedocs.org/.
|
||||
|
||||
|
||||
0.95
|
||||
~~~~
|
||||
|
||||
Released on February 11, 2012
|
||||
|
||||
|
||||
0.90
|
||||
~~~~
|
||||
|
||||
Released on January 17, 2010
|
||||
|
||||
|
||||
0.11.1
|
||||
~~~~~~
|
||||
|
||||
Released on June 12, 2008
|
||||
|
||||
|
||||
0.11
|
||||
~~~~
|
||||
|
||||
Released on June 10, 2008
|
||||
|
||||
|
||||
0.10
|
||||
~~~~
|
||||
|
||||
Released on October 7, 2007
|
||||
|
||||
|
||||
0.9
|
||||
~~~
|
||||
|
||||
Released on March 11, 2007
|
||||
|
||||
|
||||
0.2
|
||||
~~~
|
||||
|
||||
Released on January 8, 2007
|
||||
|
||||
|
||||
|
|
@ -0,0 +1 @@
|
|||
pip
|
||||
|
|
@ -0,0 +1,403 @@
|
|||
Metadata-Version: 2.0
|
||||
Name: html5lib
|
||||
Version: 0.9999999
|
||||
Summary: HTML parser based on the WHATWG HTML specification
|
||||
Home-page: https://github.com/html5lib/html5lib-python
|
||||
Author: James Graham
|
||||
Author-email: james@hoppipolla.co.uk
|
||||
License: MIT License
|
||||
Platform: UNKNOWN
|
||||
Classifier: Development Status :: 5 - Production/Stable
|
||||
Classifier: Intended Audience :: Developers
|
||||
Classifier: License :: OSI Approved :: MIT License
|
||||
Classifier: Operating System :: OS Independent
|
||||
Classifier: Programming Language :: Python
|
||||
Classifier: Programming Language :: Python :: 2
|
||||
Classifier: Programming Language :: Python :: 2.6
|
||||
Classifier: Programming Language :: Python :: 2.7
|
||||
Classifier: Programming Language :: Python :: 3
|
||||
Classifier: Programming Language :: Python :: 3.2
|
||||
Classifier: Programming Language :: Python :: 3.3
|
||||
Classifier: Programming Language :: Python :: 3.4
|
||||
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
||||
Classifier: Topic :: Text Processing :: Markup :: HTML
|
||||
Requires-Dist: six
|
||||
|
||||
html5lib
|
||||
========
|
||||
|
||||
.. image:: https://travis-ci.org/html5lib/html5lib-python.png?branch=master
|
||||
:target: https://travis-ci.org/html5lib/html5lib-python
|
||||
|
||||
html5lib is a pure-python library for parsing HTML. It is designed to
|
||||
conform to the WHATWG HTML specification, as is implemented by all major
|
||||
web browsers.
|
||||
|
||||
|
||||
Usage
|
||||
-----
|
||||
|
||||
Simple usage follows this pattern:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
with open("mydocument.html", "rb") as f:
|
||||
document = html5lib.parse(f)
|
||||
|
||||
or:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
document = html5lib.parse("<p>Hello World!")
|
||||
|
||||
By default, the ``document`` will be an ``xml.etree`` element instance.
|
||||
Whenever possible, html5lib chooses the accelerated ``ElementTree``
|
||||
implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x).
|
||||
|
||||
Two other tree types are supported: ``xml.dom.minidom`` and
|
||||
``lxml.etree``. To use an alternative format, specify the name of
|
||||
a treebuilder:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
with open("mydocument.html", "rb") as f:
|
||||
lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
|
||||
|
||||
When using with ``urllib2`` (Python 2), the charset from HTTP should be
|
||||
pass into html5lib as follows:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from contextlib import closing
|
||||
from urllib2 import urlopen
|
||||
import html5lib
|
||||
|
||||
with closing(urlopen("http://example.com/")) as f:
|
||||
document = html5lib.parse(f, encoding=f.info().getparam("charset"))
|
||||
|
||||
When using with ``urllib.request`` (Python 3), the charset from HTTP
|
||||
should be pass into html5lib as follows:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from urllib.request import urlopen
|
||||
import html5lib
|
||||
|
||||
with urlopen("http://example.com/") as f:
|
||||
document = html5lib.parse(f, encoding=f.info().get_content_charset())
|
||||
|
||||
To have more control over the parser, create a parser object explicitly.
|
||||
For instance, to make the parser raise exceptions on parse errors, use:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
with open("mydocument.html", "rb") as f:
|
||||
parser = html5lib.HTMLParser(strict=True)
|
||||
document = parser.parse(f)
|
||||
|
||||
When you're instantiating parser objects explicitly, pass a treebuilder
|
||||
class as the ``tree`` keyword argument to use an alternative document
|
||||
format:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
|
||||
minidom_document = parser.parse("<p>Hello World!")
|
||||
|
||||
More documentation is available at http://html5lib.readthedocs.org/.
|
||||
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
html5lib works on CPython 2.6+, CPython 3.2+ and PyPy. To install it,
|
||||
use:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ pip install html5lib
|
||||
|
||||
|
||||
Optional Dependencies
|
||||
---------------------
|
||||
|
||||
The following third-party libraries may be used for additional
|
||||
functionality:
|
||||
|
||||
- ``datrie`` can be used to improve parsing performance (though in
|
||||
almost all cases the improvement is marginal);
|
||||
|
||||
- ``lxml`` is supported as a tree format (for both building and
|
||||
walking) under CPython (but *not* PyPy where it is known to cause
|
||||
segfaults);
|
||||
|
||||
- ``genshi`` has a treewalker (but not builder); and
|
||||
|
||||
- ``charade`` can be used as a fallback when character encoding cannot
|
||||
be determined; ``chardet``, from which it was forked, can also be used
|
||||
on Python 2.
|
||||
|
||||
- ``ordereddict`` can be used under Python 2.6
|
||||
(``collections.OrderedDict`` is used instead on later versions) to
|
||||
serialize attributes in alphabetical order.
|
||||
|
||||
|
||||
Bugs
|
||||
----
|
||||
|
||||
Please report any bugs on the `issue tracker
|
||||
<https://github.com/html5lib/html5lib-python/issues>`_.
|
||||
|
||||
|
||||
Tests
|
||||
-----
|
||||
|
||||
Unit tests require the ``nose`` library and can be run using the
|
||||
``nosetests`` command in the root directory; ``ordereddict`` is
|
||||
required under Python 2.6. All should pass.
|
||||
|
||||
Test data are contained in a separate `html5lib-tests
|
||||
<https://github.com/html5lib/html5lib-tests>`_ repository and included
|
||||
as a submodule, thus for git checkouts they must be initialized::
|
||||
|
||||
$ git submodule init
|
||||
$ git submodule update
|
||||
|
||||
If you have all compatible Python implementations available on your
|
||||
system, you can run tests on all of them using the ``tox`` utility,
|
||||
which can be found on PyPI.
|
||||
|
||||
|
||||
Questions?
|
||||
----------
|
||||
|
||||
There's a mailing list available for support on Google Groups,
|
||||
`html5lib-discuss <http://groups.google.com/group/html5lib-discuss>`_,
|
||||
though you may get a quicker response asking on IRC in `#whatwg on
|
||||
irc.freenode.net <http://wiki.whatwg.org/wiki/IRC>`_.
|
||||
|
||||
Change Log
|
||||
----------
|
||||
|
||||
0.9999999/1.0b8
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
Released on September 10, 2015
|
||||
|
||||
* Fix #195: fix the sanitizer to drop broken URLs (it threw an
|
||||
exception between 0.9999 and 0.999999).
|
||||
|
||||
|
||||
0.999999/1.0b7
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
Released on July 7, 2015
|
||||
|
||||
* Fix #189: fix the sanitizer to allow relative URLs again (as it did
|
||||
prior to 0.9999/1.0b5).
|
||||
|
||||
|
||||
0.99999/1.0b6
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
Released on April 30, 2015
|
||||
|
||||
* Fix #188: fix the sanitizer to not throw an exception when sanitizing
|
||||
bogus data URLs.
|
||||
|
||||
|
||||
0.9999/1.0b5
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Released on April 29, 2015
|
||||
|
||||
* Fix #153: Sanitizer fails to treat some attributes as URLs. Despite how
|
||||
this sounds, this has no known security implications. No known version
|
||||
of IE (5.5 to current), Firefox (3 to current), Safari (6 to current),
|
||||
Chrome (1 to current), or Opera (12 to current) will run any script
|
||||
provided in these attributes.
|
||||
|
||||
* Pass error message to the ParseError exception in strict parsing mode.
|
||||
|
||||
* Allow data URIs in the sanitizer, with a whitelist of content-types.
|
||||
|
||||
* Add support for Python implementations that don't support lone
|
||||
surrogates (read: Jython). Fixes #2.
|
||||
|
||||
* Remove localization of error messages. This functionality was totally
|
||||
unused (and untested that everything was localizable), so we may as
|
||||
well follow numerous browsers in not supporting translating technical
|
||||
strings.
|
||||
|
||||
* Expose treewalkers.pprint as a public API.
|
||||
|
||||
* Add a documentEncoding property to HTML5Parser, fix #121.
|
||||
|
||||
|
||||
0.999
|
||||
~~~~~
|
||||
|
||||
Released on December 23, 2013
|
||||
|
||||
* Fix #127: add work-around for CPython issue #20007: .read(0) on
|
||||
http.client.HTTPResponse drops the rest of the content.
|
||||
|
||||
* Fix #115: lxml treewalker can now deal with fragments containing, at
|
||||
their root level, text nodes with non-ASCII characters on Python 2.
|
||||
|
||||
|
||||
0.99
|
||||
~~~~
|
||||
|
||||
Released on September 10, 2013
|
||||
|
||||
* No library changes from 1.0b3; released as 0.99 as pip has changed
|
||||
behaviour from 1.4 to avoid installing pre-release versions per
|
||||
PEP 440.
|
||||
|
||||
|
||||
1.0b3
|
||||
~~~~~
|
||||
|
||||
Released on July 24, 2013
|
||||
|
||||
* Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any
|
||||
implementation using it should be moved to
|
||||
``NonRecursiveTreeWalker``, as everything bundled with html5lib has
|
||||
for years.
|
||||
|
||||
* Fix #67 so that ``BufferedStream`` to correctly returns a bytes
|
||||
object, thereby fixing any case where html5lib is passed a
|
||||
non-seekable RawIOBase-like object.
|
||||
|
||||
|
||||
1.0b2
|
||||
~~~~~
|
||||
|
||||
Released on June 27, 2013
|
||||
|
||||
* Removed reordering of attributes within the serializer. There is now
|
||||
an ``alphabetical_attributes`` option which preserves the previous
|
||||
behaviour through a new filter. This allows attribute order to be
|
||||
preserved through html5lib if the tree builder preserves order.
|
||||
|
||||
* Removed ``dom2sax`` from DOM treebuilders. It has been replaced by
|
||||
``treeadapters.sax.to_sax`` which is generic and supports any
|
||||
treewalker; it also resolves all known bugs with ``dom2sax``.
|
||||
|
||||
* Fix treewalker assertions on hitting bytes strings on
|
||||
Python 2. Previous to 1.0b1, treewalkers coped with mixed
|
||||
bytes/unicode data on Python 2; this reintroduces this prior
|
||||
behaviour on Python 2. Behaviour is unchanged on Python 3.
|
||||
|
||||
|
||||
1.0b1
|
||||
~~~~~
|
||||
|
||||
Released on May 17, 2013
|
||||
|
||||
* Implementation updated to implement the `HTML specification
|
||||
<http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May
|
||||
2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867).
|
||||
|
||||
* Python 3.2+ supported in a single codebase using the ``six`` library.
|
||||
|
||||
* Removed support for Python 2.5 and older.
|
||||
|
||||
* Removed the deprecated Beautiful Soup 3 treebuilder.
|
||||
``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that
|
||||
since it doesn't support namespaces, foreign content like SVG and
|
||||
MathML is parsed incorrectly.
|
||||
|
||||
* Removed ``simpletree`` from the package. The default tree builder is
|
||||
now ``etree`` (using the ``xml.etree.cElementTree`` implementation if
|
||||
available, and ``xml.etree.ElementTree`` otherwise).
|
||||
|
||||
* Removed the ``XHTMLSerializer`` as it never actually guaranteed its
|
||||
output was well-formed XML, and hence provided little of use.
|
||||
|
||||
* Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no
|
||||
longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will
|
||||
return the default DOM treebuilder, which uses ``xml.dom.minidom``.
|
||||
|
||||
* Optional heuristic character encoding detection now based on
|
||||
``charade`` for Python 2.6 - 3.3 compatibility.
|
||||
|
||||
* Optional ``Genshi`` treewalker support fixed.
|
||||
|
||||
* Many bugfixes, including:
|
||||
|
||||
* #33: null in attribute value breaks XML AttValue;
|
||||
|
||||
* #4: nested, indirect descendant, <button> causes infinite loop;
|
||||
|
||||
* `Google Code 215
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=215>`_: Properly
|
||||
detect seekable streams;
|
||||
|
||||
* `Google Code 206
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=206>`_: add
|
||||
support for <video preload=...>, <audio preload=...>;
|
||||
|
||||
* `Google Code 205
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=205>`_: add
|
||||
support for <video poster=...>;
|
||||
|
||||
* `Google Code 202
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=202>`_: Unicode
|
||||
file breaks InputStream.
|
||||
|
||||
* Source code is now mostly PEP 8 compliant.
|
||||
|
||||
* Test harness has been improved and now depends on ``nose``.
|
||||
|
||||
* Documentation updated and moved to http://html5lib.readthedocs.org/.
|
||||
|
||||
|
||||
0.95
|
||||
~~~~
|
||||
|
||||
Released on February 11, 2012
|
||||
|
||||
|
||||
0.90
|
||||
~~~~
|
||||
|
||||
Released on January 17, 2010
|
||||
|
||||
|
||||
0.11.1
|
||||
~~~~~~
|
||||
|
||||
Released on June 12, 2008
|
||||
|
||||
|
||||
0.11
|
||||
~~~~
|
||||
|
||||
Released on June 10, 2008
|
||||
|
||||
|
||||
0.10
|
||||
~~~~
|
||||
|
||||
Released on October 7, 2007
|
||||
|
||||
|
||||
0.9
|
||||
~~~
|
||||
|
||||
Released on March 11, 2007
|
||||
|
||||
|
||||
0.2
|
||||
~~~
|
||||
|
||||
Released on January 8, 2007
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,79 @@
|
|||
html5lib/__init__.py,sha256=OeEYU2bhPKOq9YE4ueQUsKrdSN2Ly_FdIp8GjNL5AeQ,783
|
||||
html5lib/constants.py,sha256=B5LN2DMP-6lEp9wpON4ecX3Kx01n_cbMjuGd6AteixE,86873
|
||||
html5lib/html5parser.py,sha256=P1fmBDiTFMZgTxwiNAuPA8P0VR96QSnlp7i-xLNHYnc,117335
|
||||
html5lib/ihatexml.py,sha256=MT12cVXAKaW-ALUkUeN175HpUP73xK8wAIpPzQ8cgfI,16581
|
||||
html5lib/inputstream.py,sha256=MmlG5JLwn2MvxPIEeSjJWkJZOzzg4aWvMyKhMk5k4qs,31641
|
||||
html5lib/sanitizer.py,sha256=sbyGySzFzCD_v0JYYSr6sLYVLpO6bpVmRiDMKbFRcCw,17804
|
||||
html5lib/tokenizer.py,sha256=6Uf8sDUkvNn661bcBSBYUCTfXzSs9EyCTiPcj5PAjYI,76929
|
||||
html5lib/utils.py,sha256=-kEjo4p5eVd2szBNnIuLExc6MOvKF0F01jQBP_9E0Oc,3255
|
||||
html5lib/filters/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
||||
html5lib/filters/_base.py,sha256=z-IU9ZAYjpsVsqmVt7kuWC63jR11hDMr6CVrvuao8W0,286
|
||||
html5lib/filters/alphabeticalattributes.py,sha256=fpRLbz6TCe5yXEkGmyMlJ80FekWsTR-sHk3Ano0U9LQ,624
|
||||
html5lib/filters/inject_meta_charset.py,sha256=xllv1I7unxhcyZTf3LTsv30wh2mAkT7wmTZx7zIhpuY,2746
|
||||
html5lib/filters/lint.py,sha256=8eJo0SXDcY40OhsNd0Cft36kUXCZ5t-30mNFSUf4LnE,4208
|
||||
html5lib/filters/optionaltags.py,sha256=4ozLwBgMRaxe7iqxefLQpDhp3irK7YHo9LgSGsvZYMw,10500
|
||||
html5lib/filters/sanitizer.py,sha256=MvGUs_v2taWPgGhjxswRSUiHfxrqMUhsNPz-eSeUYUQ,352
|
||||
html5lib/filters/whitespace.py,sha256=LbOUcC0zQ9z703KNZrArOr0kVBO7OMXjKjucDW32LU4,1142
|
||||
html5lib/serializer/__init__.py,sha256=xFXFP-inaTNlbnau5c5DGrH_O8yPm-C6HWbJxpiSqFE,490
|
||||
html5lib/serializer/htmlserializer.py,sha256=TeR9CNlbKV6QPhkJhBHyuy8hgaAPZc0JsV0mnMv1-oM,12843
|
||||
html5lib/treeadapters/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
|
||||
html5lib/treeadapters/sax.py,sha256=3of4vvaUYIAic7pngebwJV24hpOS7Zg9ggJa_WQegy4,1661
|
||||
html5lib/treebuilders/__init__.py,sha256=Xz4X6B5DA1R-5GyRa44j0sJwfl6dUNyb0NBu9-7sK3U,3405
|
||||
html5lib/treebuilders/_base.py,sha256=rzLhQqEJsI1qgh8__6kJ7So4Z4qUl4vSJYFtqzHBw6E,13699
|
||||
html5lib/treebuilders/dom.py,sha256=jvmtvnERtpxXpHvBgiq1FpzAUYAAzoolOTx_DoXwGEI,8469
|
||||
html5lib/treebuilders/etree.py,sha256=1HKcq5Np0PgUNSjhcsdjZVWbgouxp5KeL_8MjUuUuKM,12609
|
||||
html5lib/treebuilders/etree_lxml.py,sha256=z3Bnfm2MstEEb_lbaAeicl5l-ab6MSQa5Q1ZZreK7Pc,14031
|
||||
html5lib/treewalkers/__init__.py,sha256=m2-4a5P4dMNlQb26MNIhgj69p6ms1i-JD2HPDr7iTfw,5766
|
||||
html5lib/treewalkers/_base.py,sha256=Ms4kXXrxceHBnCsVBZ04LFIR7V3e2AiSL41QIaBTWT4,7002
|
||||
html5lib/treewalkers/dom.py,sha256=Lb63Nuz8HtgvkuuvSmU5LOyUkEtstH5saPPAg5xN4r8,1421
|
||||
html5lib/treewalkers/etree.py,sha256=qJklMXBbBYqHxQfAeyAxW57x2xGt-nqOe80KDbv4MM0,4588
|
||||
html5lib/treewalkers/genshistream.py,sha256=IbBFrlgi-59-K7P1zm0d7ZFIknBN4c5E57PHJDkx39s,2278
|
||||
html5lib/treewalkers/lxmletree.py,sha256=1vMqSFN6IhwFtH3eqAXSNm-55I-5NwSb_-oBOCAvLv8,5980
|
||||
html5lib/treewalkers/pulldom.py,sha256=9W6i8yWtUzayV6EwX-okVacttHaqpQZwdBCc2S3XeQ4,2302
|
||||
html5lib/trie/__init__.py,sha256=mec5zyJ5wIKRM8819gIcIsYQwncg91rEmPwGH1dG3Ho,212
|
||||
html5lib/trie/_base.py,sha256=WGY8SGptFmx4O0aKLJ54zrIQOoyuvhS0ngA36vAcIcc,927
|
||||
html5lib/trie/datrie.py,sha256=rGMj61020CBiR97e4kyMyqn_FSIJzgDcYT2uj7PZkoo,1166
|
||||
html5lib/trie/py.py,sha256=zg7RZSHxJ8mLmuI_7VEIV8AomISrgkvqCP477AgXaG0,1763
|
||||
html5lib-0.9999999.dist-info/DESCRIPTION.rst,sha256=4symxPlAGFfTe4DH5texohsZDD1gYeIxBpeE-9Uh0CY,9899
|
||||
html5lib-0.9999999.dist-info/METADATA,sha256=KM-AHI5RI5O7BSTrqF8Acd8jX_7yodP5PmZ1V8tREck,10902
|
||||
html5lib-0.9999999.dist-info/RECORD,,
|
||||
html5lib-0.9999999.dist-info/WHEEL,sha256=lCqt3ViRAf9c8mCs6o7ffkwROUdYSy8_YHn5f_rulB4,93
|
||||
html5lib-0.9999999.dist-info/metadata.json,sha256=ptJ1V8e-ESI9H736RKJ5AJZ3I0T31YwhyIB0Zu2ru4o,1116
|
||||
html5lib-0.9999999.dist-info/top_level.txt,sha256=XEX6CHpskSmvjJB4tP6m4Q5NYXhIf_0ceMc0PNbzJPQ,9
|
||||
/var/lib/lxc/openmedialibrary/rootfs/srv/client/platform/Shared/p34/lib/python3.4/site-packages/html5lib-0.9999999.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
|
||||
html5lib/treeadapters/__pycache__/__init__.cpython-34.pyc,,
|
||||
html5lib/serializer/__pycache__/__init__.cpython-34.pyc,,
|
||||
html5lib/serializer/__pycache__/htmlserializer.cpython-34.pyc,,
|
||||
html5lib/__pycache__/tokenizer.cpython-34.pyc,,
|
||||
html5lib/treewalkers/__pycache__/_base.cpython-34.pyc,,
|
||||
html5lib/trie/__pycache__/py.cpython-34.pyc,,
|
||||
html5lib/filters/__pycache__/lint.cpython-34.pyc,,
|
||||
html5lib/treewalkers/__pycache__/pulldom.cpython-34.pyc,,
|
||||
html5lib/treebuilders/__pycache__/etree_lxml.cpython-34.pyc,,
|
||||
html5lib/__pycache__/sanitizer.cpython-34.pyc,,
|
||||
html5lib/treebuilders/__pycache__/__init__.cpython-34.pyc,,
|
||||
html5lib/treebuilders/__pycache__/dom.cpython-34.pyc,,
|
||||
html5lib/__pycache__/utils.cpython-34.pyc,,
|
||||
html5lib/filters/__pycache__/__init__.cpython-34.pyc,,
|
||||
html5lib/treebuilders/__pycache__/etree.cpython-34.pyc,,
|
||||
html5lib/filters/__pycache__/sanitizer.cpython-34.pyc,,
|
||||
html5lib/treebuilders/__pycache__/_base.cpython-34.pyc,,
|
||||
html5lib/__pycache__/constants.cpython-34.pyc,,
|
||||
html5lib/filters/__pycache__/whitespace.cpython-34.pyc,,
|
||||
html5lib/trie/__pycache__/__init__.cpython-34.pyc,,
|
||||
html5lib/filters/__pycache__/optionaltags.cpython-34.pyc,,
|
||||
html5lib/treewalkers/__pycache__/dom.cpython-34.pyc,,
|
||||
html5lib/trie/__pycache__/_base.cpython-34.pyc,,
|
||||
html5lib/__pycache__/inputstream.cpython-34.pyc,,
|
||||
html5lib/__pycache__/__init__.cpython-34.pyc,,
|
||||
html5lib/treewalkers/__pycache__/etree.cpython-34.pyc,,
|
||||
html5lib/treewalkers/__pycache__/lxmletree.cpython-34.pyc,,
|
||||
html5lib/filters/__pycache__/_base.cpython-34.pyc,,
|
||||
html5lib/treeadapters/__pycache__/sax.cpython-34.pyc,,
|
||||
html5lib/treewalkers/__pycache__/genshistream.cpython-34.pyc,,
|
||||
html5lib/filters/__pycache__/alphabeticalattributes.cpython-34.pyc,,
|
||||
html5lib/treewalkers/__pycache__/__init__.cpython-34.pyc,,
|
||||
html5lib/__pycache__/html5parser.cpython-34.pyc,,
|
||||
html5lib/__pycache__/ihatexml.cpython-34.pyc,,
|
||||
html5lib/filters/__pycache__/inject_meta_charset.cpython-34.pyc,,
|
||||
html5lib/trie/__pycache__/datrie.cpython-34.pyc,,
|
||||
|
|
@ -0,0 +1,5 @@
|
|||
Wheel-Version: 1.0
|
||||
Generator: bdist_wheel (0.29.0)
|
||||
Root-Is-Purelib: true
|
||||
Tag: cp34-none-any
|
||||
|
||||
|
|
@ -0,0 +1 @@
|
|||
{"classifiers": ["Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Markup :: HTML"], "extensions": {"python.details": {"contacts": [{"email": "james@hoppipolla.co.uk", "name": "James Graham", "role": "author"}], "document_names": {"description": "DESCRIPTION.rst"}, "project_urls": {"Home": "https://github.com/html5lib/html5lib-python"}}}, "extras": [], "generator": "bdist_wheel (0.29.0)", "license": "MIT License", "metadata_version": "2.0", "name": "html5lib", "run_requires": [{"requires": ["six"]}], "summary": "HTML parser based on the WHATWG HTML specification", "version": "0.9999999"}
|
||||
|
|
@ -0,0 +1 @@
|
|||
html5lib
|
||||
Loading…
Add table
Add a link
Reference in a new issue