python
Tags: computers
Context sensitivity: https://groups.google.com/g/comp.theory/c/nJzOyRnAP3k/m/AInZxo48FNIJ?pli=1
Parsing PDFs in Python
- You can use pypdf or pypdf4
- pdfminer
- also pypdfparser
- also textract -> https://textract.readthedocs.io/en/stable/python_package.html
- poppler -> PDF rendering engine
- OCR -> tesseract with pytesseract, preprocess with opencv
- tika, with the Python tika interface
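A minimal sketch of text extraction with pypdf, assuming the package is installed (`pip install pypdf`). The function names `join_page_text` and `extract_pdf_text` are made up for illustration; only `PdfReader`, `reader.pages` and `page.extract_text()` come from pypdf. This only recovers embedded text; scanned PDFs would still need the OCR route above.

```python
def join_page_text(pages):
    """Join the text of page-like objects, one page per line block.

    Pages that yield no text (e.g. image-only pages) contribute an
    empty string instead of raising.
    """
    return "\n".join((page.extract_text() or "") for page in pages)


def extract_pdf_text(path):
    """Return the embedded text of every page in the PDF at `path`."""
    # Imported lazily so join_page_text stays usable without pypdf.
    from pypdf import PdfReader  # third-party dependency (assumption)

    reader = PdfReader(path)
    return join_page_text(reader.pages)
```

Usage would look like `print(extract_pdf_text("some_file.pdf"))`, where the file name is a placeholder.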
Difflib
- https://docs.python.org/3/library/difflib.html
- generates diffs
- has several diff utilities that allow flexible comparison of sequences, including:
  - html
  - pure strings (get_close_matches)
  - number type diffs
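A short sketch of the difflib features noted above, using only the standard library. The sample data is invented, and reading "number type diffs" as `SequenceMatcher` applied to lists of numbers is an assumption, not something the notes confirm.

```python
import difflib

old = ["alpha", "beta", "gamma"]
new = ["alpha", "beta2", "gamma", "delta"]

# Line-based unified diff; lineterm="" avoids trailing newlines on headers.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm="")
)
print("\n".join(diff_lines))

# Fuzzy matching of a single string against a list of candidates.
matches = difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
print(matches)

# SequenceMatcher works on any sequence of hashable items, numbers included.
ratio = difflib.SequenceMatcher(None, [1, 2, 3, 4], [1, 2, 4, 5]).ratio()
print(ratio)

# HTML side-by-side comparison table (a fragment, not a full page).
html_table = difflib.HtmlDiff().make_table(old, new)
```

`HtmlDiff().make_file(...)` produces a complete HTML page instead of a bare table, if that is more convenient.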
OS
- CPU Count:
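  - os.cpu_count (https://docs.python.org/3/library/os.html#os.cpu_count) returns the number of CPUs in the system
  - os.sched_getaffinity (https://docs.python.org/3/library/os.html#os.sched_getaffinity) returns the set of CPUs a process is restricted to, but is not available on all platforms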
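One way to combine the two calls is sketched below: prefer the affinity mask where the platform offers it, and fall back to `os.cpu_count()` otherwise. The helper name `usable_cpu_count` is hypothetical.

```python
import os


def usable_cpu_count():
    """Best-effort count of CPUs the current process can run on."""
    if hasattr(os, "sched_getaffinity"):
        # Available on some Unix platforms; pid 0 means the current process.
        return len(os.sched_getaffinity(0))
    # os.cpu_count() may return None if the count cannot be determined.
    return os.cpu_count() or 1


print(usable_cpu_count())
```

The affinity-based number can be smaller than `os.cpu_count()`, for example when the process has been pinned to a subset of CPUs.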