Commit graph

785 commits

Author SHA1 Message Date
j
7695a9c015 fix some tests and urls 2016-05-21 15:19:25 +02:00
5355dbf821 Add WebVTT output support
This subset of the format is almost identical to SRT, but I think it's
cleaner to have a separate module (at the cost of a little bit of
copy-pasta).
2016-03-11 12:14:50 +00:00
b75a0f9bb8 srt: neater docstrings, some cleanup 2016-03-11 12:10:32 +00:00
j
959931114b add rmvb as video extension 2016-03-01 14:44:47 +05:30
j
ec1e5459f6 remove ox.django 2016-02-20 17:51:46 +05:30
j
8055e1dd54 update to django 1.9 2016-02-20 17:51:46 +05:30
j
43783b00a1 rewrite ox.django.fields 2016-02-19 19:00:53 +05:30
j
2681536b08 use PY2 2016-01-14 17:09:10 +05:30
j
c49f663d54 py3 does not have string.letters 2015-12-25 20:43:15 +05:30
j
1db297169b basestring->six.string_types 2015-12-25 20:38:55 +05:30
j
fa29557a6f ignore audio language if stream length does not match 2015-12-18 15:46:01 +01:00
j
85c1e789ba cleanup cache, fix delete 2015-12-11 20:00:05 +01:00
j
d938091b26 add delete option to cache 2015-12-11 19:27:54 +01:00
cbcef39ec0 ox.html: fix sanitizing whitespace-only strings
lxml raises:

    ParserError: Document is empty

if you ask it to parse a string with no non-whitespace characters. The
existing truthiness test squashed the commonest case (empty string) but
not the general case.
2015-11-24 18:17:48 +00:00
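
The gap described in the commit above (a plain truthiness test catches '' but not ' \n') can be illustrated with a minimal sketch. The names `is_blank` and `sanitize_fragment_sketch` and the `parse` argument are hypothetical stand-ins, not the actual ox.html code, and returning blank input unchanged is an assumption made for illustration.

```python
def is_blank(html):
    """Return True for None, '' and whitespace-only strings."""
    # `not html` alone misses strings such as ' \n\t', which are truthy.
    return not html or not html.strip()


def sanitize_fragment_sketch(html, parse):
    """Skip the parsing step for blank input.

    `parse` is a hypothetical stand-in for the lxml-based step that
    fails on input with no non-whitespace characters.
    """
    if is_blank(html):
        # Assumption: blank input is handed back unchanged.
        return html
    return parse(html)
```

With this guard the parser is never called for whitespace-only input, so the "Document is empty" error cannot be triggered from that path.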
533a1a627e ox.html.sanitize_fragment: documentation, tests 2015-11-24 18:05:48 +00:00
5448aec902 ox.html.sanitize_html: fix existing tests
The backslashes need to be escaped to come out as literal backslashes in
the Python source code run by doctest.
2015-11-24 17:58:39 +00:00
j
98d83192ce jsonc: handle parse errors from 'json' gracefully (fixes #2858)
- JSONDecodeError is only available in simplejson, use ValueError
- improve error context output
2015-11-12 13:01:29 +01:00
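
As a rough illustration of the point about ValueError: in the standard library, json.JSONDecodeError (Python 3.5+) is a subclass of ValueError, and older versions raise a plain ValueError, so catching ValueError covers both. The function name `loads_with_context` and the message format below are assumptions for this sketch, and it does not strip comments the way a jsonc loader would.

```python
import json


def loads_with_context(source):
    """Parse JSON and re-raise parse errors with the offending line."""
    try:
        return json.loads(source)
    except ValueError as e:
        # Only newer decoders attach a line number to the exception.
        lineno = getattr(e, 'lineno', None)
        if lineno is None:
            raise
        lines = source.splitlines()
        context = lines[lineno - 1] if 0 < lineno <= len(lines) else ''
        raise ValueError('%s\n  line %d: %s' % (e, lineno, context))
```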
j
3ed213d6d7 update crawler 2015-11-03 23:16:34 +01:00
j
4a8717ee76 update user agent 2015-11-03 23:15:57 +01:00
j
4b3af0cbaf update imdb.movieconnections 2015-10-12 13:56:25 +02:00
7c9887410c Allow definition lists in sanitized HTML 2015-09-14 22:47:21 +02:00
j
5230d59d44 UA strings: Edge+El Capitan 2015-08-04 19:23:47 +02:00
j
77f34143f5 criterion: decode some html 2015-08-02 15:58:59 +02:00
86bffd67b3 API: raise if caller supplies both dict and kwargs
I (incorrectly) wrote something like the following:

    api.find({'query': {...}}, keys=['id'], range=[0, n])

and the query was silently ignored, giving totally different
results to what I wanted. fixes #2822
2015-08-02 15:57:48 +02:00
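
The check described in the commit above could look roughly like the following sketch. `build_request` is a hypothetical helper, not the actual ox.API method, and the choice of ValueError is an assumption; the commit only says that the call should raise.

```python
def build_request(data=None, **kwargs):
    """Accept request data as a dict or as keyword arguments, never both."""
    if data is not None and kwargs:
        # Refuse the ambiguous call instead of silently dropping one side.
        raise ValueError('pass either a dict or keyword arguments, not both')
    return data if data is not None else kwargs
```

Under this sketch, a call shaped like the one quoted in the commit message (a dict plus `keys=` and `range=`) fails loudly rather than ignoring the query.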
j
586dbaa932 fix akas 2015-06-01 14:51:09 +02:00
j
4a3fecab19 force cache update 2015-05-23 21:44:37 +02:00
j
5bf53ba463 titles without countries 2015-05-04 10:53:17 +02:00
j
b147c61f5c ubu cleanup 2015-04-26 15:29:32 +02:00
j
5c883e19e6 better ubu parser 2015-04-24 19:02:25 +02:00
j
47bdf3c897 include size for unknown formats 2015-04-24 16:09:31 +02:00
j
72f34f2a60 fix net.oshash for small files 2015-03-19 18:57:50 +05:30
j
ed465c527f better title 2015-03-15 02:38:38 +05:30
j
36c1754725 use video link if its mp4(ubu) 2015-03-15 02:25:31 +05:30
j
9dd0c2416e better description 2015-03-15 02:23:52 +05:30
j
16a955f310 better title 2015-03-15 02:21:56 +05:30
j
c4c0c40825 only ignore title 2015-03-15 02:17:33 +05:30
j
cdea161d2f ignore empty parts 2015-03-15 02:12:40 +05:30
j
60ad26d201 update ubu/archive 2015-03-15 01:07:34 +05:30
j
7f7b0c3ee8 filter/map return generators in py3, wrap in list 2015-03-07 23:46:59 +05:30
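
A small illustration of the behaviour this commit refers to: in Python 3, filter and map return lazy iterators rather than lists, so code that indexes or reuses the result needs an explicit list(). The variable names are made up for this example.

```python
words = ['a', '', 'bc']

# Wrapping in list() restores the list behaviour Python 2 code relied on:
# the results can be indexed, measured with len(), and iterated repeatedly.
non_empty = list(filter(None, words))
lengths = list(map(len, non_empty))
```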
j
dc6f25aac1 dont fail if files dont have all format keys 2015-01-22 15:31:36 +05:30
j
e4c51f0598 use ffprobe in avinfo if installed 2015-01-03 10:58:21 +01:00
j
f02d42712d dont throw exception for invalid files 2014-12-24 23:18:29 +01:00
rolux
75e0ec06f9 cosmetic changes 2014-12-21 14:14:02 +00:00
rolux
154a3a5c69 update documentation for api.error 2014-12-19 14:43:09 +00:00
j
fd0c35fa14 fix ox.ffprobe output to match ox.avinfo 2014-12-19 11:57:38 +00:00
rolux
edf876c119 update documentation of api.api 2014-12-18 20:37:47 +00:00
rolux
27c701b97a when sorting names, handle trailing (...) and [...] 2014-12-16 18:11:14 +00:00
j
abaae5e059 add startpage.find 2014-12-09 13:11:36 +01:00
j
79df151729 dont create current dir 2014-12-09 13:11:09 +01:00
rolux
34a48e6e68 update UA_VERSIONS.system 2014-11-21 09:46:12 +00:00
j
a871ecb3c5 also fix return value of drawText 2014-11-20 12:50:33 +00:00
j
440c7ad49b add font offset to getTextSize if PIL is > 2.1 < 2.6.1 2014-11-20 10:50:09 +00:00
rolux
645cc0ff04 fix format_timecode 2014-11-17 16:33:46 +00:00
j
8e696b1da3 alias fromAZ/decode_base26 toAZ/encode_base26. add parse_timecode/format_timecode 2014-11-16 16:41:00 +00:00
j
7addf13c90 not all filesystems use NFD, normalize to NFD, fixes #2553 2014-11-11 12:01:24 +01:00
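
Normalizing to NFD can be sketched with the standard library as below. `to_nfd` is a hypothetical helper name, and where the real code applies the normalization is not shown in this log.

```python
import unicodedata


def to_nfd(name):
    """Return the canonically decomposed (NFD) form of a unicode string."""
    return unicodedata.normalize('NFD', name)
```

Composed and decomposed spellings of the same name compare equal once both pass through `to_nfd`, regardless of which form the filesystem returned.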
j
f5770f12d1 fix fixunicode 2014-11-11 12:00:22 +01:00
j
cd9f49b771 dont decode utf-8, use unicode literal 2014-10-31 16:30:35 +01:00
j
c2e0129438 encode filename before opening 2014-10-29 01:56:26 +01:00
j
03e2ac76bb prepare for 2.3.x release 2014-10-11 20:10:10 +02:00
j
316e985eca support direct json POST and from action/data in api, pass data to api functions 2014-10-06 08:29:36 +00:00
j
2d467ea6c6 fix utf-8 urls 2014-10-06 08:22:25 +02:00
j
dcc23ba2a4 get rid of all urllib2 calls 2014-10-05 20:06:22 +02:00
j
1f14f6db55 more urlencode 2014-10-05 19:54:13 +02:00
j
865e94da22 add ox.cache.get_json/ox.net.get_json, fixes #2451 2014-10-05 13:24:14 +02:00
j
9b860d0d33 urlencode 2014-10-05 10:23:56 +02:00
j
f630877098 fix GET 2014-10-04 21:07:18 +02:00
j
a3c470847d fix POST in py3 2014-10-04 21:04:55 +02:00
j
f50b02dd64 fix ox.api in python3 2014-10-04 16:05:00 +02:00
j
b70dfecccc fix ox.api 2014-10-04 13:37:33 +02:00
j
83cf8eea53 i really likes movies, s/six.movies/six.moves/ 2014-10-02 20:17:31 +02:00
j
970f37c38c more file open py2/3 cleanups 2014-10-02 10:34:04 +02:00
j
37dfed3143 more python3 cleanups 2014-10-02 10:28:22 +02:00
j
4b8aad5b38 2+3 ox.django 2014-10-02 08:34:58 +02:00
j
53fbc2e1fb make ox.torrent in python 2 and 3 2014-10-01 11:21:11 +02:00
j
8bfbaef598 keep version in release 2014-10-01 11:03:39 +02:00
j
6dfa80b646 fix ox.image in python3 2014-10-01 10:48:06 +02:00
j
c2de06d9d8 better performance of ox.js.minify 2014-09-30 23:19:19 +02:00
j
46278349e3 fix ox.file 2014-09-30 21:30:25 +02:00
j
ec252440d9 from __future__ import print_function 2014-09-30 21:27:26 +02:00
j
a9002374b1 fix ox.text in python 3 2014-09-30 21:17:15 +02:00
j
d4d09b56b6 use six to support python 2 and 3 2014-09-30 21:04:46 +02:00
j
1b1dcf1c58 add ts to video extensions 2014-09-29 18:03:56 +02:00
j
ff0d776b09 fix ox.web.youtube 2014-09-28 21:57:45 +02:00
j
14ea6a0f7d fix ox.django 2014-09-28 21:57:31 +02:00
j
954312e0d6 support more kwargs to __init__ 2014-09-05 13:04:18 +02:00
rolux
a0666acf89 parse_useragent: add Mac OS X 10.10 Yosemite 2014-09-04 18:50:34 +02:00
j
9edf30085e fix ox.iso language lookup 2014-09-03 13:48:11 +02:00
j
3f15161bed fix ox.iso 2014-07-22 17:32:34 +02:00
j
25c203e981 use metadata from ffmpeg2theora if available 2014-07-20 12:54:13 +02:00
j
2f129c4766 parse language from audio track if video has multiple audio tracks 2014-07-20 11:35:55 +02:00
j
bc9c3c8944 map track language to track 2014-07-20 11:20:43 +02:00
j
2bd1c7d657 handle empty subs 2014-07-20 11:20:31 +02:00
j
5e2b3cf448 fix imdb poster 2014-07-10 09:38:36 +02:00
j
5488920d07 add more video extensions 2014-06-04 14:04:25 +03:00
j
2ee2087b1d add aiff 2014-05-17 22:24:17 +02:00
j
f3295c0eec dont fail if running outside of django env 2014-05-17 18:30:15 +02:00
j
92d7c210ca work around thread issues with ox.cache 2014-05-17 11:25:19 +02:00
j
07cd885b0a cleanup 2014-05-09 12:20:55 +02:00
j
73a60e73d7 add abebooks 2014-05-06 00:24:13 +02:00
j
8212c28ac7 handle broken headers 2014-04-23 15:38:38 +02:00
j
94ca01a041 string.letters changes uppercase position between python versions, use string.ascii_uppercase 2014-04-22 19:03:32 +02:00
j
d2a6511a95 add timeout argument to ox.web.youtube.info 2014-04-22 16:15:20 +02:00
j
cdc56bc63f add lookupbyisbn 2014-04-03 12:15:30 +02:00
j
9c844d0ce7 fix amazon parser 2014-04-03 01:34:15 +02:00
rolux
cc72dc96d3 ox.image: don't create array of identical arrays 2014-03-25 12:44:50 +01:00
j
87a89f0594 update user-agent string 2014-03-19 10:47:15 +01:00
j
7383bf08c4 fix content-disposition 2014-03-01 14:17:23 +01:00
j
075e735cd1 update ox.web.youtube 2014-02-19 14:09:54 +05:30
j
1c871f4d31 add method to add Access-Control-Allow-Origin to HttpFileResponses 2014-02-05 06:37:37 +00:00
j
34691832eb revert change, fragment_fromstring only parses single element 2014-02-04 10:44:51 +00:00
j
8bda86c17d use fragment_fromstring instead of document_fromstring 2014-02-04 10:40:01 +00:00
j
7577b319ce dont take random number if film has no year 2014-01-17 23:09:45 +05:30
j
d1a5613f3f more summary fixes 2014-01-16 13:56:07 +05:30
j
5a61dea925 fix imdb plotsummary parser 2014-01-16 13:49:30 +05:30
j
5179a4fcf9 add yt 4k format 2014-01-15 22:03:39 +05:30
j
2456ec2d5a ox.web.youtube: use in/out/value like ox.srt, decode html value 2014-01-15 20:12:14 +05:30
j
575549ae33 only add api methods to API instance 2014-01-03 00:54:49 +05:30
j
2abe99c89f fix wikipedia movie parser 2013-12-22 13:38:43 +05:30
j
5c1ab13749 no need to load json string into ram 2013-11-15 16:16:21 +01:00
j
37cd92dfba fix html cleanup of empty string 2013-12-01 12:35:38 +00:00
rolux
6f68729b6f API: don't fail on missing 'doc' property 2013-11-23 15:17:59 +01:00
j
d664d99f89 rewrite sanitize_html to support global attributes 2013-11-10 22:00:24 +00:00
j
d8bb547e25 workaround for python2.6 2013-11-06 10:33:45 +01:00
j
828223ad82 dont break ox.API subclasses 2013-11-03 16:39:57 +01:00
j
714729fee7 return new class for each ox.API call 2013-11-02 17:40:01 +01:00
j
d38da54a17 strip <p> 2013-10-31 13:49:55 +01:00
j
5dcd8b3552 allow iframes in sanitize_html 2013-10-24 16:40:04 +00:00
j
38853b1f4b detect IE11, part of ticket #1917 2013-10-24 00:24:13 +02:00
j
e3ee66fe08 trivia 2013-10-21 17:33:00 +02:00
j
0effb090a3 move EXTENSIONS to ox.file and add image type 2013-10-14 20:07:05 +02:00
j
5c6ff50027 use iter to read file with a multiple of hash block_size(sha1sum) 2013-10-14 12:35:07 +02:00
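
The technique named in this commit (two-argument iter() reading chunks sized as a multiple of the hash block size) can be sketched as follows. The signature and the default of 1024 blocks are assumptions for illustration, not the actual ox.file code.

```python
import hashlib


def sha1sum(path, blocks=1024):
    """Hash a file in chunks that are a multiple of the SHA-1 block size."""
    h = hashlib.sha1()
    chunk_size = blocks * h.block_size  # block_size is 64 bytes for SHA-1
    with open(path, 'rb') as f:
        # iter(callable, sentinel) keeps calling f.read() until it returns
        # b'', so the file is never loaded into memory as a whole.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()
```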
j
7d712445bf utf-8 filenames 2013-10-11 20:38:35 +02:00
j
413848638b remove debug 2013-10-11 20:13:10 +02:00
j
68b0e525ca fixes for django 1.5.x 2013-10-11 20:12:37 +02:00
j
36c7e95788 support nulls_last in sqlite 2013-10-11 20:12:23 +02:00
j
74a9b812b0 update user agent, fixes #1894 2013-09-27 18:14:34 +02:00
j
98ab0e29db support returning more than 10 results 2013-09-08 15:56:57 +02:00
rolux
cb45a25a7c geo.get_country: allow name as arg, not just code 2013-08-28 12:06:56 +02:00
j
22eecc22e4 allow more html5 tags 2013-08-27 08:51:18 +00:00
j
a8e76893d3 only use most common title per type, fixes #1826 2013-08-24 17:30:37 +02:00
rolux
f429ed8b07 add geo.split_geoname 2013-08-18 11:56:48 +02:00
j
3cc5659310 add option to get tweets from one user 2013-08-01 15:14:06 +02:00
rolux
3bf45b9d33 update UA parser 2013-07-30 19:06:01 +02:00
rolux
68a324d8fa update UA parser 2013-07-30 18:33:33 +02:00
j
611db3ed7b fix typos 2013-07-30 15:22:23 +02:00
rolux
893a70791c update ua parser 2013-07-29 19:03:46 +02:00
rolux
996344c689 update ua parser 2013-07-29 18:22:22 +02:00
rolux
b7f98ffecd cosmetic changes 2013-07-25 09:29:24 +02:00
j
ba6ee2e62e make sound unique 2013-07-23 14:54:32 +02:00
j
f3d26879fd one more 2013-07-16 13:42:58 +02:00
j
aa8641f22f more titles to ignore, closes #1532 2013-07-16 13:41:49 +02:00
j
7acbc72305 return utf-8 encoded json 2013-07-16 11:10:47 +00:00
j
02afccc253 normalize alternative title country names 2013-07-16 11:41:16 +02:00
j
07e1a36ba9 filter working titles, one more World-wide/International 2013-07-16 11:35:55 +02:00
j
5b9cb279ba world-wide title 2013-07-16 11:02:43 +02:00
j
4c41db9460 add script to update ox.geo.COUNTRIES, normalize_country_name takes and returns a unicode string 2013-07-13 16:14:25 +02:00
j
adfe642547 use geo.normalize_country_name for normalize imdb names 2013-07-13 15:48:26 +02:00
j
ad7e21e7a8 fix lxml unicode handling 2013-07-04 20:32:54 +02:00
j
b1d248c4df add timeout as option to twitter.find, also return html 2013-07-04 12:22:56 +02:00
rolux
0d9bba8865 in wrapText, when testing smaller line widths, don't hyphenate words that were previously not hyphenated 2013-07-03 13:08:41 +02:00
j
330cc5ff3b fix missing space i.e. for:
ox.image.wrapText("VERTEIDIGUNG DER ZEIT",  608, 2, "./data/MontserratBold.ttf", 48)
2013-07-02 20:48:37 +02:00
j
6996f9c422 fix typos 2013-07-02 13:00:27 +00:00
j
1d429b6d33 fix date time serialization 2013-07-02 14:44:18 +02:00
j
78986c671e alternative titles are flipped now 2013-06-29 18:50:10 +02:00
j
deaa2bb988 dont return x as release date 2013-06-29 18:21:58 +02:00
j
1e65f3e478 fix dates < 1900 2013-06-28 14:57:31 +00:00
j
f7e9605828 fix release date parser 2013-06-28 16:53:25 +02:00
j
d7bd98d63a add www. 2013-06-17 22:43:44 +02:00
j
27df553ffb move 2013-06-14 12:19:36 +02:00
j
15ba89bac2 use ddg 2013-06-14 12:17:18 +02:00
j
e2bffccd36 add youtube playlist parser 2013-06-12 18:28:19 +02:00
rolux
f3d91b78d6 update criterion.py 2013-06-09 16:48:58 +02:00
rolux
339b7026f5 ox.file: add ensure_ascii parameter to write_json; add write_image method (write_path + image.save) 2013-06-09 16:45:26 +02:00
j
3951c67623 only look at function closures 2013-06-07 09:44:00 +00:00
j
223ac3c534 pass settings to api template 2013-06-07 11:25:28 +02:00
j
008654ad5d fix api documentation for double decorators 2013-06-07 11:13:01 +02:00
rolux
d79a3c0b95 fix #1574 (wrong id for non-imdb series episode 0 item) 2013-06-06 20:37:46 +02:00
rolux
cca251bc32 ox.image: add getTextSize method 2013-06-06 17:29:52 +02:00
j
f086c64e51 parse arsenal 2013-06-06 11:20:43 +02:00
j
f535b82e7b faster and more reliable encoding detection of html content 2013-06-01 13:29:24 +02:00
j
3165e3a8b1 fix unicode detection 2013-06-01 13:21:13 +02:00
j
7d7c7c9407 fix ddg 2013-06-01 12:25:20 +02:00
j
96f7975747 episodes without season/episode are season 1, add season besides having it in the title, fixes #1548 2013-05-31 22:34:06 +02:00
j
8038b0d13f fix google parser, and by that imdb id lookup, fixes #1545 2013-05-31 22:05:25 +02:00
j
cb3701d3e2 more unwanted akas, fixes #1532 2013-05-31 21:45:25 +02:00
rolux
2daae6a4c3 less error-prone version 2013-05-31 16:01:55 +02:00
rolux
14137c30c8 can't access set via index 2013-05-31 15:57:42 +02:00
rolux
762fab3519 typo 2013-05-31 15:56:49 +02:00
rolux
c33edd3ff3 fix '.en' stripping (movies can have parts, so don't check for '1 srt is en', but 'all srts are en') 2013-05-31 15:52:09 +02:00
j
8bd76ed27f normalize_paths not needed as extra function 2013-05-31 15:23:38 +02:00
rolux
913c8f4c1b in parse_item_files, strip unneeded '.en' (if, per version and per subtitle extension, there is only one language=='en' file) 2013-05-31 15:18:33 +02:00
j
e1508f4068 add movie.normalize_paths and call in parse_item_files 2013-05-31 13:06:42 +00:00
j
986a788bc7 fix #1546 2013-05-31 11:54:57 +02:00
rolux
647f027e8a criterion.py: fix title and synopsis detection 2013-05-31 11:03:09 +02:00
j
4f0654db68 fix keywords, fixes #1541 2013-05-30 21:12:28 +02:00
rolux
540f0bc4bd fix imdb keywords parser 2013-05-30 21:10:06 +02:00
j
100a93296f dont keep originalTitle for episodes, fixes #1535 2013-05-30 13:55:54 +02:00
j
cb9a791a97 ignore more alternative titles, fixes #1532 2013-05-30 11:59:40 +02:00
j
a93dc6e37b fix filmingLocations 2013-05-15 00:23:00 +02:00
j
8563ea8239 add column, line to javascript tokenizer tokens 2013-05-10 13:00:32 +00:00