Redact certain words from PDF-file in Python
As the title suggests, I am looking for a way to read a PDF, redact certain words (black them out), and save the PDF file. I think it's possible; I just don't know how. Any help or tips are highly appreciated!
312 views. Asked by Dion.
Disclaimer: I am the author of `borb`, the library used in this answer.

The general idea for this solution is to first determine which rectangular areas need to be redacted and then, in a second step, apply those redactions.
The first step can be achieved using `RegularExpressionTextExtraction`. This class looks through an entire PDF, one page at a time, matching a regular expression. It then returns a list of matches, each containing the rectangular area it matched.

Here's an example of that particular code.
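
As a library-independent illustration of this first step, here is a minimal sketch in plain Python. It does not use borb's API: `Rect`, `Word`, and `find_redaction_boxes` are hypothetical names, and the sketch assumes the text has already been extracted as words with bounding boxes, one list of words per page.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical bounding box: lower-left corner plus size, in page units."""
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Word:
    """Hypothetical extracted word together with the area it occupies."""
    text: str
    box: Rect


def find_redaction_boxes(pages, pattern):
    """Go through the pages one at a time and collect the boxes of matching words.

    `pages` is a list of pages, each a list of Word objects.
    Returns a dict mapping page index -> list of Rect to redact.
    Pages without any match are left out of the result.
    """
    regex = re.compile(pattern)
    hits = {}
    for page_index, words in enumerate(pages):
        boxes = [word.box for word in words if regex.search(word.text)]
        if boxes:
            hits[page_index] = boxes
    return hits


# Example input: two pages, only the first contains the word to redact.
pages = [
    [Word("Hello", Rect(0, 0, 30, 10)), Word("secret", Rect(35, 0, 40, 10))],
    [Word("public", Rect(0, 0, 40, 10))],
]
boxes_to_redact = find_redaction_boxes(pages, r"(?i)^secret$")
```

The result only says where to redact; nothing on the page has changed yet. Matching per word keeps the sketch short. A phrase that spans several words would need the match mapped back to several boxes.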
Next up is adding a `RedactionAnnotation` to each `Page`.

Now, if you want to actually remove the contents, you can simply apply the annotations.
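
The difference between marking an area and applying the redaction matters: a mark on its own leaves the underlying text in the file. The following plain-Python sketch illustrates that distinction. It is not borb's API: `Rect`, `Page`, `add_redaction`, and `apply_redactions` are hypothetical names, and a page is modeled simply as a list of `(text, Rect)` pairs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rect:
    """Hypothetical bounding box: lower-left corner plus size, in page units."""
    x: float
    y: float
    width: float
    height: float

    def overlaps(self, other):
        """True if the two rectangles share any interior area."""
        return (
            self.x < other.x + other.width
            and other.x < self.x + self.width
            and self.y < other.y + other.height
            and other.y < self.y + self.height
        )


@dataclass
class Page:
    """Hypothetical page model: words with boxes, plus redaction state."""
    words: list                                       # list of (text, Rect)
    pending: list = field(default_factory=list)       # marked, not yet applied
    black_boxes: list = field(default_factory=list)   # areas painted black

    def add_redaction(self, box):
        """Mark an area for redaction; the page content is left untouched."""
        self.pending.append(box)

    def apply_redactions(self):
        """Remove every word touching a marked area, then black the area out."""
        self.words = [
            (text, box)
            for text, box in self.words
            if not any(box.overlaps(mark) for mark in self.pending)
        ]
        self.black_boxes.extend(self.pending)
        self.pending.clear()


page = Page(words=[("Hello", Rect(0, 0, 30, 10)), ("secret", Rect(35, 0, 40, 10))])
page.add_redaction(Rect(35, 0, 40, 10))
words_before = [text for text, _ in page.words]   # still contains "secret"
page.apply_redactions()
words_after = [text for text, _ in page.words]    # "secret" is gone
```

Only after the apply step is the word actually absent from the page content, which is what you want when the redacted file will be shared.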