| 7b826468d8 | remove crop debugging | 2024-06-10 16:26:27 +01:00 |
| 5bd561e64f | extract detail from pdf | 2024-06-09 14:47:36 +01:00 |
| 9d01259d66 | space | 2019-06-18 09:18:23 +02:00 |
| c38d3a8b35 | use pdftotext if available | 2019-02-01 17:38:31 +05:30 |
| b4c6f2b4ac | use .editorconfig | 2019-01-24 18:36:20 +05:30 |
| 6c7d6bb6b0 | for update | 2019-01-15 13:20:11 +05:30 |
| e877371a0b | avoid space titles | 2016-03-18 18:35:41 +01:00 |
| 9f396acd48 | dont extract text if extract_text is false | 2016-02-20 20:24:23 +05:30 |
| a96b55e006 | dont take pdf metadata if title starts with Microsoft Word | 2016-02-14 20:26:30 +05:30 |
| 9747f27d31 | run pdftotext only once | 2016-02-07 17:11:00 +05:30 |
| 0e3794e6a3 | hide window, open file not folder | 2016-02-01 00:49:25 +05:30 |
| 24d4c4dc70 | pdftotext also need short names | 2016-01-31 23:01:52 +05:30 |
| 5dead44107 | windows pathnames | 2016-01-31 22:58:53 +05:30 |
| 7380c9aab7 | dont cloes_fds if stdout/stderr is piped | 2016-01-31 18:55:12 +05:30 |
| 95af9f4f4a | pdf with html description | 2016-01-29 22:17:39 +05:30 |
| c03f72b47c | dont fail parsing parts of the pdf | 2016-01-25 15:51:54 +05:30 |
| d70bd8797a | s/exc_info=1/exc_info=True/g | 2016-01-24 14:43:03 +05:30 |
| f43fc6a172 | add meta.extract_text | 2016-01-19 21:34:32 +05:30 |
| b9a8c91868 | some attributes don't work | 2016-01-13 11:33:47 +05:30 |
| de984a344e | extract tableofcontents from pdf | 2016-01-12 14:57:33 +05:30 |
| 02e040d9f5 | store metadata per user. remove primaryid. only store isbn13 | 2016-01-11 19:17:12 +05:30 |
| 71d8825783 | normalize names | 2016-01-08 16:15:10 +05:30 |
| 619a2fbd37 | split pdf author | 2015-12-25 20:23:22 +05:30 |
| f8c09226de | normalize language | 2015-12-25 19:40:49 +05:30 |
| c5afc46af1 | cleanup pdf | 2015-12-25 13:33:32 +05:30 |
| ebc0b95022 | better pdf parsing | 2015-12-24 20:30:14 +05:30 |
| d497e89b2b | use logging.getLogger(__name__) | 2015-11-29 15:56:38 +01:00 |
| 7a76e21e99 | only strip strings | 2015-02-22 16:37:42 +05:30 |
| d722ae004b | handle utf-16 pdf info | 2014-11-15 00:57:49 +00:00 |
| c6c8e0dc8a | try to decrypt pdf with empty password if its encrypted | 2014-10-31 16:13:02 +01:00 |
| c961aa5c64 | fix text extraction on osx | 2014-09-30 22:30:09 +02:00 |
| 8c6164e0c4 | use PyPDF2 | 2014-09-08 20:46:09 +02:00 |
| de68f4c4c4 | more py3 porting | 2014-09-03 01:09:42 +02:00 |
| 8e27b9f76e | port to python3 | 2014-09-03 00:38:34 +02:00 |
| 2cd77e07a2 | close_fds=True by default | 2014-08-22 18:49:11 +02:00 |
| 7e7478be30 | fix pdf info | 2014-05-27 11:09:06 +02:00 |
| 21d6324eb6 | performance | 2014-05-27 01:45:29 +02:00 |
| b3caaf335a | use poppler pdftocairo for preview | 2014-05-25 14:44:07 +02:00 |
| feddea0ccd | lots of stuff | 2014-05-21 02:02:21 +02:00 |
| 326a8f75c6 | postupdate, pdf osx fixes | 2014-05-20 02:08:38 +02:00 |
| 9aef3616ba | extract textsize, take timestamp for changelog entries update peers on peering events | 2014-05-19 11:38:41 +02:00 |
| d6f350e5a1 | import/lists/autocompleteFolder | 2014-05-19 01:24:16 +02:00 |
| e4ca454c41 | queue peering requests and send again | 2014-05-18 05:01:24 +02:00 |
| c58a8a5bcb | osx fixes | 2014-05-16 19:08:33 +02:00 |
| 2ee2bc178a | Open Media Library | 2014-05-12 04:09:31 +02:00 |