38 commits

Author SHA1 Message Date
j 9f396acd48 dont extract text if extract_text is false 2016-02-20 20:24:23 +05:30
j a96b55e006 dont take pdf metadata if title starts with Microsoft Word 2016-02-14 20:26:30 +05:30
j 9747f27d31 run pdftotext only once 2016-02-07 17:11:00 +05:30
j 0e3794e6a3 hide window, open file not folder 2016-02-01 00:49:25 +05:30
j 24d4c4dc70 pdftotext also need short names 2016-01-31 23:01:52 +05:30
j 5dead44107 windows pathnames 2016-01-31 22:58:53 +05:30
j 7380c9aab7 dont cloes_fds if stdout/stderr is piped 2016-01-31 18:55:12 +05:30
j 95af9f4f4a pdf with html description 2016-01-29 22:17:39 +05:30
j c03f72b47c dont fail parsing parts of the pdf 2016-01-25 15:51:54 +05:30
j d70bd8797a s/exc_info=1/exc_info=True/g 2016-01-24 14:43:03 +05:30
j f43fc6a172 add meta.extract_text 2016-01-19 21:34:32 +05:30
j b9a8c91868 some attributes don't work 2016-01-13 11:33:47 +05:30
j de984a344e extract tableofcontents from pdf 2016-01-12 14:57:33 +05:30
j 02e040d9f5 store metadata per user. remove primaryid. only store isbn13 2016-01-11 19:17:12 +05:30
j 71d8825783 normalize names 2016-01-08 16:15:10 +05:30
j 619a2fbd37 split pdf author 2015-12-25 20:23:22 +05:30
j f8c09226de normalize language 2015-12-25 19:40:49 +05:30
j c5afc46af1 cleanup pdf 2015-12-25 13:33:32 +05:30
j ebc0b95022 better pdf parsing 2015-12-24 20:30:14 +05:30
j d497e89b2b use logging.getLogger(__name__) 2015-11-29 15:56:38 +01:00
j 7a76e21e99 only strip strings 2015-02-22 16:37:42 +05:30
j d722ae004b handle utf-16 pdf info 2014-11-15 00:57:49 +00:00
j c6c8e0dc8a try to decrypt pdf with empty password if its encrypted 2014-10-31 16:13:02 +01:00
j c961aa5c64 fix text extraction on osx 2014-09-30 22:30:09 +02:00
j 8c6164e0c4 use PyPDF2 2014-09-08 20:46:09 +02:00
j de68f4c4c4 more py3 porting 2014-09-03 01:09:42 +02:00
j 8e27b9f76e port to python3 2014-09-03 00:38:34 +02:00
j 2cd77e07a2 close_fds=True by default 2014-08-22 18:49:11 +02:00
j 7e7478be30 fix pdf info 2014-05-27 11:09:06 +02:00
j 21d6324eb6 performance 2014-05-27 01:45:29 +02:00
j b3caaf335a use poppler pdftocairo for preview 2014-05-25 14:44:07 +02:00
j feddea0ccd lots of stuff 2014-05-21 02:02:21 +02:00
j 326a8f75c6 postupdate, pdf osx fixes 2014-05-20 02:08:38 +02:00
j 9aef3616ba extract textsize, take timestamp for changelog entries update peers on peering events 2014-05-19 11:38:41 +02:00
j d6f350e5a1 import/lists/autocompleteFolder 2014-05-19 01:24:16 +02:00
j e4ca454c41 queue peering requests and send again 2014-05-18 05:01:24 +02:00
j c58a8a5bcb osx fixes 2014-05-16 19:08:33 +02:00
j 2ee2bc178a Open Media Library 2014-05-12 04:09:31 +02:00