Commit graph

825 commits

Author SHA1 Message Date
j
fa29557a6f ignore audio language if stream length does not match 2015-12-18 15:46:01 +01:00
j
85c1e789ba cleanup cache, fix delete 2015-12-11 20:00:05 +01:00
j
d938091b26 add delete option to cache 2015-12-11 19:27:54 +01:00
cbcef39ec0 ox.html: fix sanitizing whitespace-only strings
lxml raises:

    ParserError: Document is empty

if you ask it to parse a string with no non-whitespace characters. The
existing truthiness test squashed the commonest case (empty string) but
not the general case.
2015-11-24 18:17:48 +00:00
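The gap described in this commit can be sketched without lxml: a truthiness test treats a whitespace-only string as non-empty, while testing the stripped string covers the general case. The helper name `is_blank` is hypothetical and is not part of ox.html.

```python
def is_blank(html):
    """Return True if html has no non-whitespace characters.

    Hypothetical helper for illustration; not part of ox.html.
    """
    return not html.strip()

# A plain truthiness test only catches the empty string:
assert not ""      # the empty string is falsy
assert " \n\t"     # a whitespace-only string is truthy, so it slips through

# Stripping first covers both cases:
assert is_blank("")
assert is_blank(" \n\t")
assert not is_blank(" <p>text</p> ")
```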
533a1a627e ox.html.sanitize_fragment: documentation, tests 2015-11-24 18:05:48 +00:00
5448aec902 ox.html.sanitize_html: fix existing tests
The backslashes need to be escaped to come out as literal backslashes in
the Python source code run by doctest.
2015-11-24 17:58:39 +00:00
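A minimal sketch of the escaping issue, using only the standard `doctest` module. The function `escaped_example` is hypothetical and is not one of the ox.html tests. Because the docstring is an ordinary (non-raw) string literal, every backslash meant for the doctest has to be doubled in the source file.

```python
import doctest

def escaped_example():
    """Hypothetical doctest showing escaped backslashes.

    >>> print('a\\\\b')
    a\\b
    """

# After Python parses this file, the docstring holds print('a\\b') and the
# expected output a\b, which is what doctest actually runs and compares.
parser = doctest.DocTestParser()
test = parser.get_doctest(escaped_example.__doc__, {}, "escaped_example", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
```

With single backslashes in the source, the docstring would contain a different string from the one intended and the comparison would no longer test what the author meant.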
j
98d83192ce jsonc: handle parse errors from 'json' gracefully (fixes #2858)
- JSONDecodeError is only available in simplejson, use ValueError
- improve error context output
2015-11-12 13:01:29 +01:00
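The portability point in this commit can be sketched as follows. `load_json` is a hypothetical helper, not the ox.jsonc implementation; it relies only on the fact that parse errors from the standard `json` module are `ValueError` instances.

```python
import json

def load_json(text):
    """Parse JSON text and return (data, error_message).

    Hypothetical helper for illustration; not the ox.jsonc code.
    """
    try:
        return json.loads(text), None
    except ValueError as e:
        # Python 2's json raises a plain ValueError; json.JSONDecodeError
        # (Python 3.5+) is a subclass of ValueError, so this catches both.
        return None, str(e)

data, error = load_json('{"a": 1}')
bad_data, bad_error = load_json('{"a": ')
```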
j
3ed213d6d7 update crawler 2015-11-03 23:16:34 +01:00
j
4a8717ee76 update user agent 2015-11-03 23:15:57 +01:00
j
4b3af0cbaf update imdb.movieconnections 2015-10-12 13:56:25 +02:00
7c9887410c Allow definition lists in sanitized HTML 2015-09-14 22:47:21 +02:00
j
5230d59d44 UA strings: Edge+El Capitan 2015-08-04 19:23:47 +02:00
j
77f34143f5 criterion: decode some html 2015-08-02 15:58:59 +02:00
86bffd67b3 API: raise if caller supplies both dict and kwargs
I (incorrectly) wrote something like the following:

    api.find({'query': {...}}, keys=['id'], range=[0, n])

and the query was silently ignored, giving totally different
results to what I wanted. fixes #2822
2015-08-02 15:57:48 +02:00
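A sketch of the kind of guard this commit describes, assuming a call signature that accepts either a dict or keyword arguments. `build_request` is a hypothetical name and the real ox API code may differ.

```python
def build_request(data=None, **kwargs):
    """Return the request payload from either a dict or keyword arguments.

    Hypothetical helper for illustration; not the actual ox API code.
    """
    if data is not None and kwargs:
        # Refuse ambiguous calls instead of silently dropping one source.
        raise ValueError("pass either a dict or keyword arguments, not both")
    return data if data is not None else kwargs

# The mistaken call from the commit message now fails loudly:
try:
    build_request({'query': {}}, keys=['id'], range=[0, 10])
    raised = False
except ValueError:
    raised = True
```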
j
586dbaa932 fix akas 2015-06-01 14:51:09 +02:00
j
4a3fecab19 force cache update 2015-05-23 21:44:37 +02:00
j
5bf53ba463 titles without countries 2015-05-04 10:53:17 +02:00
j
b147c61f5c ubu cleanup 2015-04-26 15:29:32 +02:00
j
5c883e19e6 better ubu parser 2015-04-24 19:02:25 +02:00
j
47bdf3c897 include size for unknown formats 2015-04-24 16:09:31 +02:00
j
72f34f2a60 fix net.oshash for small files 2015-03-19 18:57:50 +05:30
j
ed465c527f better title 2015-03-15 02:38:38 +05:30
j
36c1754725 use video link if it's mp4 (ubu) 2015-03-15 02:25:31 +05:30
j
9dd0c2416e better description 2015-03-15 02:23:52 +05:30
j
16a955f310 better title 2015-03-15 02:21:56 +05:30
j
c4c0c40825 only ignore title 2015-03-15 02:17:33 +05:30
j
cdea161d2f ignore empty parts 2015-03-15 02:12:40 +05:30
j
60ad26d201 update ubu/archive 2015-03-15 01:07:34 +05:30
j
7f7b0c3ee8 filter/map return iterators in py3, wrap in list 2015-03-07 23:46:59 +05:30
j
dc6f25aac1 dont fail if files dont have all format keys 2015-01-22 15:31:36 +05:30
j
e4c51f0598 use ffprobe in avinfo if installed 2015-01-03 10:58:21 +01:00
j
f02d42712d dont throw exception for invalid files 2014-12-24 23:18:29 +01:00
rolux
75e0ec06f9 cosmetic changes 2014-12-21 14:14:02 +00:00
rolux
154a3a5c69 update documentation for api.error 2014-12-19 14:43:09 +00:00
j
fd0c35fa14 fix ox.ffprobe output to match ox.avinfo 2014-12-19 11:57:38 +00:00
rolux
edf876c119 update documentation of api.api 2014-12-18 20:37:47 +00:00
rolux
27c701b97a when sorting names, handle trailing (...) and [...] 2014-12-16 18:11:14 +00:00
j
abaae5e059 add startpage.find 2014-12-09 13:11:36 +01:00
j
79df151729 dont create current dir 2014-12-09 13:11:09 +01:00
rolux
34a48e6e68 update UA_VERSIONS.system 2014-11-21 09:46:12 +00:00
j
a871ecb3c5 also fix return value of drawText 2014-11-20 12:50:33 +00:00
j
440c7ad49b add font offset to getTextSize if PIL is > 2.1 < 2.6.1 2014-11-20 10:50:09 +00:00
rolux
645cc0ff04 fix format_timecode 2014-11-17 16:33:46 +00:00
j
8e696b1da3 alias fromAZ/decode_base26 toAZ/encode_base26. add parse_timecode/format_timecode 2014-11-16 16:41:00 +00:00
j
7addf13c90 not all filesystems use NFD, normalize to NFD, fixes #2553 2014-11-11 12:01:24 +01:00
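The normalization mentioned in this commit can be illustrated with the standard `unicodedata` module. `normalize_name` is a hypothetical helper, not a function from ox.

```python
import unicodedata

def normalize_name(name):
    """Return name in NFD form (hypothetical helper, not from ox)."""
    return unicodedata.normalize('NFD', name)

composed = '\u00e9'      # "é" as a single code point (NFC)
decomposed = 'e\u0301'   # "e" followed by a combining acute accent (NFD)

# The two spellings differ as raw strings but agree once normalized.
assert composed != decomposed
assert normalize_name(composed) == normalize_name(decomposed) == decomposed
```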
j
f5770f12d1 fix fixunicode 2014-11-11 12:00:22 +01:00
j
cd9f49b771 dont decode utf-8, use unicode literal 2014-10-31 16:30:35 +01:00
j
c2e0129438 encode filename before opening 2014-10-29 01:56:26 +01:00
j
03e2ac76bb prepare for 2.3.x release 2014-10-11 20:10:10 +02:00
j
316e985eca support direct json POST and from action/data in api, pass data to api functions 2014-10-06 08:29:36 +00:00
j
2d467ea6c6 fix utf-8 urls 2014-10-06 08:22:25 +02:00
j
dcc23ba2a4 get rid of all urllib2 calls 2014-10-05 20:06:22 +02:00
j
1f14f6db55 more urlencode 2014-10-05 19:54:13 +02:00
j
865e94da22 add ox.cache.get_json/ox.net.get_json, fixes #2451 2014-10-05 13:24:14 +02:00
j
9b860d0d33 urlencode 2014-10-05 10:23:56 +02:00
j
f630877098 fix GET 2014-10-04 21:07:18 +02:00
j
a3c470847d fix POST in py3 2014-10-04 21:04:55 +02:00
j
f50b02dd64 fix ox.api in python3 2014-10-04 16:05:00 +02:00
j
b70dfecccc fix ox.api 2014-10-04 13:37:33 +02:00
j
83cf8eea53 i really likes movies, s/six.movies/six.moves/ 2014-10-02 20:17:31 +02:00
j
970f37c38c more file open py2/3 cleanups 2014-10-02 10:34:04 +02:00
j
37dfed3143 more python3 cleanups 2014-10-02 10:28:22 +02:00
j
4b8aad5b38 2+3 ox.django 2014-10-02 08:34:58 +02:00
j
53fbc2e1fb make ox.torrent in python 2 and 3 2014-10-01 11:21:11 +02:00
j
8bfbaef598 keep version in release 2014-10-01 11:03:39 +02:00
j
6dfa80b646 fix ox.image in python3 2014-10-01 10:48:06 +02:00
j
c2de06d9d8 better performance of ox.js.minify 2014-09-30 23:19:19 +02:00
j
46278349e3 fix ox.file 2014-09-30 21:30:25 +02:00
j
ec252440d9 from __future__ import print_function 2014-09-30 21:27:26 +02:00
j
a9002374b1 fix ox.text in python 3 2014-09-30 21:17:15 +02:00
j
d4d09b56b6 use six to support python 2 and 3 2014-09-30 21:04:46 +02:00
j
1b1dcf1c58 add ts to video extensions 2014-09-29 18:03:56 +02:00
j
ff0d776b09 fix ox.web.youtube 2014-09-28 21:57:45 +02:00
j
14ea6a0f7d fix ox.django 2014-09-28 21:57:31 +02:00
j
954312e0d6 support more kwargs to __init__ 2014-09-05 13:04:18 +02:00
rolux
a0666acf89 parse_useragent: add Mac OS X 10.10 Yosemite 2014-09-04 18:50:34 +02:00
j
9edf30085e fix ox.iso language lookup 2014-09-03 13:48:11 +02:00
j
3f15161bed fix ox.iso 2014-07-22 17:32:34 +02:00
j
25c203e981 use metadata from ffmpeg2theora if available 2014-07-20 12:54:13 +02:00
j
2f129c4766 parse language from audio track if video has multiple audio tracks 2014-07-20 11:35:55 +02:00
j
bc9c3c8944 map track language to track 2014-07-20 11:20:43 +02:00
j
2bd1c7d657 handle empty subs 2014-07-20 11:20:31 +02:00
j
5e2b3cf448 fix imdb poster 2014-07-10 09:38:36 +02:00
j
5488920d07 add more video extensions 2014-06-04 14:04:25 +03:00
j
2ee2087b1d add aiff 2014-05-17 22:24:17 +02:00
j
f3295c0eec dont fail if running outside of django env 2014-05-17 18:30:15 +02:00
j
92d7c210ca work around thread issues with ox.cache 2014-05-17 11:25:19 +02:00
j
07cd885b0a cleanup 2014-05-09 12:20:55 +02:00
j
73a60e73d7 add abebooks 2014-05-06 00:24:13 +02:00
j
8212c28ac7 handle broken headers 2014-04-23 15:38:38 +02:00
j
94ca01a041 string.letters changes uppercase position between python versions, use string.ascii_uppercase 2014-04-22 19:03:32 +02:00
j
d2a6511a95 add timeout argument to ox.web.youtube.info 2014-04-22 16:15:20 +02:00
j
cdc56bc63f add lookupbyisbn 2014-04-03 12:15:30 +02:00
j
9c844d0ce7 fix amazon parser 2014-04-03 01:34:15 +02:00
rolux
cc72dc96d3 ox.image: don't create array of identical arrays 2014-03-25 12:44:50 +01:00
j
87a89f0594 update user-agent string 2014-03-19 10:47:15 +01:00
j
7383bf08c4 fix content-disposition 2014-03-01 14:17:23 +01:00
j
075e735cd1 update ox.web.youtube 2014-02-19 14:09:54 +05:30
j
1c871f4d31 add method to add Access-Control-Allow-Origin to HttpFileResponses 2014-02-05 06:37:37 +00:00
j
34691832eb revert change, fragment_fromstring only parses single element 2014-02-04 10:44:51 +00:00
j
8bda86c17d use fragment_fromstring instead of document_fromstring 2014-02-04 10:40:01 +00:00
j
7577b319ce dont take random number if film has no year 2014-01-17 23:09:45 +05:30
j
d1a5613f3f more summary fixes 2014-01-16 13:56:07 +05:30
j
5a61dea925 fix imdb plotsummary parser 2014-01-16 13:49:30 +05:30
j
5179a4fcf9 add yt 4k format 2014-01-15 22:03:39 +05:30
j
2456ec2d5a ox.web.youtube: use in/out/value like ox.srt, decode html value 2014-01-15 20:12:14 +05:30
j
575549ae33 only add api methods to API instance 2014-01-03 00:54:49 +05:30
j
2abe99c89f fix wikipedia movie parser 2013-12-22 13:38:43 +05:30
j
5c1ab13749 no need to load json string into ram 2013-11-15 16:16:21 +01:00
j
37cd92dfba fix html cleanup of empty string 2013-12-01 12:35:38 +00:00
rolux
6f68729b6f API: don't fail on missing 'doc' property 2013-11-23 15:17:59 +01:00
j
d664d99f89 rewrite sanitize_html to support global attributes 2013-11-10 22:00:24 +00:00
j
d8bb547e25 workaround for python2.6 2013-11-06 10:33:45 +01:00
j
828223ad82 dont break ox.API subclasses 2013-11-03 16:39:57 +01:00
j
714729fee7 return new class for each ox.API call 2013-11-02 17:40:01 +01:00
j
d38da54a17 strip <p> 2013-10-31 13:49:55 +01:00
j
5dcd8b3552 allow iframes in sanitize_html 2013-10-24 16:40:04 +00:00
j
38853b1f4b detect IE11, part of ticket #1917 2013-10-24 00:24:13 +02:00
j
e3ee66fe08 trivia 2013-10-21 17:33:00 +02:00
j
0effb090a3 move EXTENSIONS to ox.file and add image type 2013-10-14 20:07:05 +02:00
j
5c6ff50027 use iter to read file with a multiple of hash block_size(sha1sum) 2013-10-14 12:35:07 +02:00
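A sketch of the chunked-read pattern this commit names, assuming the two-argument form of `iter` with a sentinel. The function below is illustrative only: its name matches the commit message, but the body and the `blocks` parameter are assumptions, not the ox.file code.

```python
import hashlib
import os
import tempfile

def sha1sum(path, blocks=128):
    """Return the SHA-1 hex digest of a file, read in fixed-size chunks."""
    sha1 = hashlib.sha1()
    # Read a multiple of the hash's internal block size (64 bytes for SHA-1).
    chunk_size = blocks * sha1.block_size
    with open(path, 'rb') as f:
        # iter(callable, sentinel) calls f.read until it returns b'' at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            sha1.update(chunk)
    return sha1.hexdigest()

# Usage: hash a small temporary file and compare with hashing the bytes directly.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello world')
digest = sha1sum(tmp.name)
os.unlink(tmp.name)
assert digest == hashlib.sha1(b'hello world').hexdigest()
```

Reading in chunks keeps memory use bounded regardless of file size.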
j
7d712445bf utf-8 filenames 2013-10-11 20:38:35 +02:00
j
413848638b remove debug 2013-10-11 20:13:10 +02:00
j
68b0e525ca fixes for django 1.5.x 2013-10-11 20:12:37 +02:00
j
36c7e95788 support nulls_last in sqlite 2013-10-11 20:12:23 +02:00
j
74a9b812b0 update user agent, fixes #1894 2013-09-27 18:14:34 +02:00
j
98ab0e29db support returning more than 10 results 2013-09-08 15:56:57 +02:00
rolux
cb45a25a7c geo.get_country: allow name as arg, not just code 2013-08-28 12:06:56 +02:00
j
22eecc22e4 allow more html5 tags 2013-08-27 08:51:18 +00:00
j
a8e76893d3 only use most common title per type, fixes #1826 2013-08-24 17:30:37 +02:00
rolux
f429ed8b07 add geo.split_geoname 2013-08-18 11:56:48 +02:00
j
3cc5659310 add option to get tweets from one user 2013-08-01 15:14:06 +02:00
rolux
3bf45b9d33 update UA parser 2013-07-30 19:06:01 +02:00
rolux
68a324d8fa update UA parser 2013-07-30 18:33:33 +02:00
j
611db3ed7b fix typos 2013-07-30 15:22:23 +02:00
rolux
893a70791c update ua parser 2013-07-29 19:03:46 +02:00
rolux
996344c689 update ua parser 2013-07-29 18:22:22 +02:00
rolux
b7f98ffecd cosmetic changes 2013-07-25 09:29:24 +02:00
j
ba6ee2e62e make sound unique 2013-07-23 14:54:32 +02:00
j
f3d26879fd one more 2013-07-16 13:42:58 +02:00
j
aa8641f22f more titles to ignore, closes #1532 2013-07-16 13:41:49 +02:00
j
7acbc72305 return utf-8 encoded json 2013-07-16 11:10:47 +00:00
j
02afccc253 normalize alternative title country names 2013-07-16 11:41:16 +02:00
j
07e1a36ba9 filter working titles, one more World-wide/International 2013-07-16 11:35:55 +02:00
j
5b9cb279ba world-wide title 2013-07-16 11:02:43 +02:00
j
4c41db9460 add script to update ox.geo.COUNTRIES, normalize_country_name takes and returns a unicode string 2013-07-13 16:14:25 +02:00
j
adfe642547 use geo.normalize_country_name for normalize imdb names 2013-07-13 15:48:26 +02:00
j
ad7e21e7a8 fix lxml unicode handling 2013-07-04 20:32:54 +02:00
j
b1d248c4df add timeout as option to twitter.find, also return html 2013-07-04 12:22:56 +02:00
rolux
0d9bba8865 in wrapText, when testing smaller line widths, don't hyphenate words that were previously not hyphenated 2013-07-03 13:08:41 +02:00
j
330cc5ff3b fix missing space i.e. for:
ox.image.wrapText("VERTEIDIGUNG DER ZEIT",  608, 2, "./data/MontserratBold.ttf", 48)
2013-07-02 20:48:37 +02:00
j
6996f9c422 fix typos 2013-07-02 13:00:27 +00:00
j
1d429b6d33 fix date time serialization 2013-07-02 14:44:18 +02:00
j
78986c671e alternative titles are flipped now 2013-06-29 18:50:10 +02:00
j
deaa2bb988 dont return x as release date 2013-06-29 18:21:58 +02:00
j
1e65f3e478 fix dates < 1900 2013-06-28 14:57:31 +00:00
j
f7e9605828 fix release date parser 2013-06-28 16:53:25 +02:00
j
d7bd98d63a add www. 2013-06-17 22:43:44 +02:00
j
27df553ffb move 2013-06-14 12:19:36 +02:00
j
15ba89bac2 use ddg 2013-06-14 12:17:18 +02:00
j
e2bffccd36 add youtube playlist parser 2013-06-12 18:28:19 +02:00
rolux
f3d91b78d6 update criterion.py 2013-06-09 16:48:58 +02:00
rolux
339b7026f5 ox.file: add ensure_ascii parameter to write_json; add write_image method (write_path + image.save) 2013-06-09 16:45:26 +02:00
j
3951c67623 only look at function closures 2013-06-07 09:44:00 +00:00
j
223ac3c534 pass settings to api template 2013-06-07 11:25:28 +02:00
j
008654ad5d fix api documentation for double decorators 2013-06-07 11:13:01 +02:00
rolux
d79a3c0b95 fix #1574 (wrong id for non-imdb series episode 0 item) 2013-06-06 20:37:46 +02:00
rolux
cca251bc32 ox.image: add getTextSize method 2013-06-06 17:29:52 +02:00
j
f086c64e51 parse arsenal 2013-06-06 11:20:43 +02:00
j
f535b82e7b faster and more reliable encoding detection of html content 2013-06-01 13:29:24 +02:00
j
3165e3a8b1 fix unicode detection 2013-06-01 13:21:13 +02:00
j
7d7c7c9407 fix ddg 2013-06-01 12:25:20 +02:00
j
96f7975747 episodes without season/episode are season 1, add season besides having it in the title, fixes #1548 2013-05-31 22:34:06 +02:00
j
8038b0d13f fix google parser, and by that imdb id lookup, fixes #1545 2013-05-31 22:05:25 +02:00
j
cb3701d3e2 more unwanted akas, fixes #1532 2013-05-31 21:45:25 +02:00
rolux
2daae6a4c3 less error-prone version 2013-05-31 16:01:55 +02:00
rolux
14137c30c8 can't access set via index 2013-05-31 15:57:42 +02:00
rolux
762fab3519 typo 2013-05-31 15:56:49 +02:00
rolux
c33edd3ff3 fix '.en' stripping (movies can have parts, so don't check for '1 srt is en', but 'all srts are en') 2013-05-31 15:52:09 +02:00
j
8bd76ed27f normalize_paths not needed as extra function 2013-05-31 15:23:38 +02:00
rolux
913c8f4c1b in parse_item_files, strip unneeded '.en' (if, per version and per subtitle extension, there is only one language=='en' file) 2013-05-31 15:18:33 +02:00
j
e1508f4068 add movie.normalize_paths and call in parse_item_files 2013-05-31 13:06:42 +00:00
j
986a788bc7 fix #1546 2013-05-31 11:54:57 +02:00
rolux
647f027e8a criterion.py: fix title and synopsis detection 2013-05-31 11:03:09 +02:00
j
4f0654db68 fix keywords, fixes #1541 2013-05-30 21:12:28 +02:00
rolux
540f0bc4bd fix imdb keywords parser 2013-05-30 21:10:06 +02:00
j
100a93296f dont keep originalTitle for episodes, fixes #1535 2013-05-30 13:55:54 +02:00
j
cb9a791a97 ignore more alternative titles, fixes #1532 2013-05-30 11:59:40 +02:00
j
a93dc6e37b fix filmingLocations 2013-05-15 00:23:00 +02:00
j
8563ea8239 add column, line to javascript tokenizer tokens 2013-05-10 13:00:32 +00:00
j
be9424036f add 3gp to video types 2013-05-09 21:17:27 +02:00
j
642e50b721 remove debug 2013-05-07 18:41:44 +02:00
j
0ad7a088bf fix wikipedia parser 2013-05-06 10:25:56 +02:00
j
7e4c2bdaff youtube 2013-03-23 22:28:47 +05:30
j
89f868fc39 composer lists of lists 2013-03-14 14:17:10 +05:30
j
f7186b936c fix wikipedia parser 2013-03-02 05:15:57 +05:30
j
24410c458a mini-series too 2013-03-01 16:01:35 +05:30
j
02b20b0042 expose isSeries flag 2013-03-01 15:15:18 +05:30
j
79ecde337c parse composer 2013-02-26 13:20:06 +05:30
j
8b3230df05 parse color and sound better 2013-02-25 19:23:44 +05:30
j
2d65dcd16a parse color and sound 2013-02-25 19:18:14 +05:30
j
e896dac0de camel case 2013-02-25 13:57:48 +05:30
j
8609e6e9f4 make language unique, fixes #1280 2013-02-20 09:46:34 +05:30
j
1b13c7cb00 parse production companies 2013-02-18 19:20:30 +05:30
j
6e3390d65a fix default values from south migration 2013-02-16 00:35:34 +00:00
j
5e989bcb19 fix movie.parse_path, set language 2013-02-17 20:09:48 +05:30
rolux
340c7fb924 add normalize_country_name method 2013-02-09 10:07:38 +05:30
rolux
f7d72335ef update format_path / parse_path 2013-02-08 23:28:21 +05:30
j
284cb97f1c fix #1216 ignore thousand separators 2013-02-01 16:13:40 +05:30
j
4ae8783d27 ignore cache if not able to load json file 2013-01-31 20:29:21 +05:30
j
e2936705c4 fix import 2013-01-31 20:07:11 +05:30
j
bb78574bc0 add ox.web.twitter 2013-01-31 19:48:07 +05:30
j
a7307ec08a also load releaseinfo for international titles, fixes #1128 2013-01-31 14:55:23 +05:30
j
f0c2b888d7 fix poster lookup 2013-01-29 08:05:31 +05:30
j
2bfd63ff0f add ox.fix_bad_unicode 2012-12-30 16:16:23 +01:00
j
8021672cfd add ox.web.ubu 2012-12-29 13:04:24 +01:00
rolux
5b5cb0f2c4 typo 2012-12-27 15:05:58 +01:00
rolux
e4e6a2ff3a image module: update getImageHash method 2012-12-27 14:47:15 +01:00
rolux
0ddb32858b image module: add getImageHash method 2012-12-27 14:45:20 +01:00
rolux
5c377a5d3a saner copyfile method (that works for files > max python str len) 2012-12-26 21:26:43 +01:00
j
7884e8671a dont use Japan (imdb display title) (English title) 2012-11-19 12:04:06 +01:00
j
9dfcb77be8 international english before often original uk 2012-11-11 17:23:34 +01:00
j
6529e5f1c1 some more special cases 2012-11-11 17:15:40 +01:00
j
43a54740bd restored version 2012-11-11 03:08:54 +01:00
j
8229a651f7 USA/UK 2012-11-11 02:31:51 +01:00
rolux
b8b7e666bc ua parser: detect chrome on ios 2012-11-09 23:07:31 +01:00
j
11a00db3bb update series detection, fixes #1153 2012-11-09 14:28:34 +01:00
j
c7a2ef21c7 titles 2012-11-08 20:34:26 +01:00
j
91ebecccc3 alternative spelling 2012-11-07 20:07:24 +01:00
j
6eb18c3077 use title from series entry as series title 2012-11-07 00:05:18 +01:00
j
15e78b8cb1 , 2012-11-04 16:12:28 +01:00
j
8c6f211f25 add more video extensions 2012-11-04 16:08:00 +01:00
j
dd316160be fix youtube download urls 2012-11-01 13:14:59 +01:00
rolux
0001859ba8 improve parse_useragent 2012-10-27 21:59:40 +02:00
rolux
d73833f467 in parse_useragent, add chromium 2012-10-27 21:35:46 +02:00
rolux
48240cee7f update parse_useragent 2012-10-27 18:51:39 +02:00
j
71b99c8ce9 better unicode support for download names 2012-10-24 17:18:23 +02:00
j
3b7b87a89f match charset in lower case 2012-10-19 14:22:39 +02:00
j
639b74eac3 reduce choice of alternative titles for english films, fixes #1084 2012-10-10 13:27:19 +02:00
j
13a55763d3 take first title only, fixes #1088 2012-10-10 13:08:37 +02:00
j
eaa8b9995f use title/originalTitle not title/internationalTitle 2012-10-09 14:55:29 +02:00
j
073aaa1b32 make directors unique but keep order, fixes #1076 2012-10-08 13:44:02 +02:00
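One common way to deduplicate while keeping first-seen order is sketched below. `unique_in_order` is a hypothetical helper and may not match how the commit implements it.

```python
def unique_in_order(items):
    """Return items with duplicates removed, preserving first-seen order.

    Hypothetical helper for illustration; not taken from the ox source.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

directors = ['Ann', 'Bob', 'Ann', 'Cy', 'Bob']
assert unique_in_order(directors) == ['Ann', 'Bob', 'Cy']
```

Converting to a plain `set` would also remove duplicates but would not guarantee the original order.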
rolux
104e9f1c5c format_path: don't use more than 10 director names 2012-10-08 11:27:28 +02:00
j
7fe62b5ce3 regex first, fixes #1058 2012-10-01 23:29:57 +02:00
j
102365eb8e ignore - descriptions, fixes #1047 2012-09-30 12:17:45 +02:00
j
1935b76b46 also strip single quotes from titles. fixes #1050 2012-09-30 12:14:33 +02:00
j
e0dd4d53b1 fix only one connection, parse connection description too 2012-09-29 18:13:58 +02:00
j
da09714910 parse new movieconnections page, fixes #1045 2012-09-29 17:55:37 +02:00
rolux
52932bccec fix opera version detection 2012-09-29 14:48:19 +02:00
j
c690f811b6 minor change 2012-09-25 13:54:54 +02:00