0x2620/python-ox

Will Thompson cbcef39ec0 ox.html: fix sanitizing whitespace-only strings	2015-11-24 18:17:48 +00:00
  lxml raises "ParserError: Document is empty" if you ask it to parse a string with no non-whitespace characters. The existing truthiness test squashed the commonest case (empty string) but not the general case.
ox	ox.html: fix sanitizing whitespace-only strings	2015-11-24 18:17:48 +00:00
README	add python3 to README	2014-10-11 20:15:00 +02:00
requirements.txt	add six>=1.5.2 to requirements.txt	2014-10-05 21:12:27 +02:00
setup.py	update url	2015-11-15 15:31:16 +01:00
test.sh	fix test.sh	2010-07-08 01:28:04 +02:00

README

python-ox - the web in a dict

Depends:
 python >= 2.7 or python3 >= 3.4
 python-chardet (http://chardet.feedparser.org/)
 python-feedparser (http://www.feedparser.org/)
 python-lxml (http://codespeak.net/lxml/)          [optional]
 django (otherwise dates < 1900 are not supported) [optional]

Usage:
 import ox
 
 data = ox.cache.read_url('http:/...')
 text = ox.strip_tags(data)
 ox.normalize_newlines(text)
 ox.format_bytes(len(data))

 ox.format_bytes(1234567890)
 '1.15 GB'

 import ox.web.imdb
 imdbId = ox.web.imdb.guess('The Matrix')
 info = ox.web.imdb.Imdb(imdbId)
 info['year']
 1999
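
The '1.15 GB' result above suggests that sizes are scaled by powers of 1024 rather than 1000 (1234567890 / 1024**3 is roughly 1.15). The following is a minimal sketch of that arithmetic only. It is not ox's implementation: the name format_bytes_sketch, the unit list, and the two-decimal rounding are assumptions made for illustration, and ox.format_bytes may behave differently for other inputs.

```python
def format_bytes_sketch(num_bytes):
    """Hypothetical helper: render a byte count using 1024-based units."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        # Stop once the value fits the current unit, or we run out of units.
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    return "%.2f %s" % (value, unit)


print(format_bytes_sketch(1234567890))
```

With decimal (1000-based) units the same input would come out as about 1.23 GB, so the README output is consistent with the binary interpretation.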

For information on ox.django see https://wiki.0x2620.org/wiki/ox.django

Install:
  python setup.py install

Cookies:
  Some ox.web modules require user account information or cookies to work.
  These are saved in ~/.ox/auth.json; the most basic form looks like this:
  {
    "key": "value"
  }
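
As a rough illustration of the file format above, the sketch below reads and writes a flat JSON object using only the Python 3 standard library. The helper names load_auth and save_auth are hypothetical and not part of ox, and the "key"/"value" pair is the README's own placeholder. Which keys a given ox.web module actually expects is not stated here.

```python
import json
import os

# Default location named in the README.
AUTH_PATH = os.path.expanduser("~/.ox/auth.json")


def load_auth(path=AUTH_PATH):
    """Hypothetical helper: return the saved settings, or {} if the file is missing."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_auth(data, path=AUTH_PATH):
    """Hypothetical helper: write settings as JSON, creating the directory if needed."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
```

Calling save_auth({"key": "value"}) would produce a file with the same shape as the example above. Since the file may hold account details, it is worth restricting its permissions to the current user.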

Tests:
 nosetests --with-doctest ox
 nosetests3 --with-doctest ox