Rendering and Highlighting PDFs in React with react-pdf
Rendering and Highlighting PDFs in React with react-pdf
Overview
There are two fundamentally different PDF use cases in web applications:
- Generate PDFs from structured data (e.g., JSON → PDF).
- Render and interact with existing PDF documents (e.g., viewing, pagination, text highlighting, OCR overlays).
While exploring json-render.dev React PDF renderer, it became clear that it relies on @react-pdf/renderer (from react-pdf) for PDF generation.
However, for rendering and interacting with existing PDFs in a web app, the correct tool is react-pdf, which is a React wrapper around PDF.js.
This article documents the distinction and shows how to render a PDF with:
- Pagination
- Text-layer highlighting
- Coordinate-based OCR overlays
The Discovery
Initial exploration:
json-render.dev→ generates PDFs from JSON specs.- Under the hood:
@react-pdf/renderer. - Good for document creation.
- Not designed for rendering existing PDFs interactively in the browser.
Then discovered:
wojtekmaj/react-pdf- Wraps PDF.js
- Designed for displaying and interacting with existing PDFs
-
Supports:
- Text layer
- Annotation layer
- Page rendering
- Pagination
- Custom text rendering
This enables highlighting both:
- Native PDF text (via
customTextRenderer) - Arbitrary regions (e.g., OCR bounding boxes)
How It Works
Mental Model
react-pdf (wojtekmaj) wraps PDF.js and exposes:
<Document />→ Loads PDF<Page />→ Renders a page- Text layer (optional)
- Annotation layer (optional)
Rendering layers:
Canvas layer → actual PDF page raster
Text layer → selectable/searchable text
Annotation layer → links, annotations
Custom overlays → your absolute-positioned divs
Text Highlighting
PDF.js extracts text into spans in the text layer.
You can override rendering with:
customTextRenderer={({ str }) => highlightText(str)}
This allows injecting HTML like <mark>.
OCR-Based Highlighting
OCR systems often return coordinates in:
- PDF points
- 1 point = 1/72 inch
- Origin: bottom-left (PDF coordinate system)
In this example, coordinates are directly mapped assuming:
- Page scale = 1
- Top-left origin adjustment already handled (inferred — verify if Y-axis inversion is required depending on PDF)
Implementation
Installation
npm install react-pdf
Worker Setup
Required for PDF.js:
pdfjs.GlobalWorkerOptions.workerSrc = new URL(
"pdfjs-dist/build/pdf.worker.min.mjs",
import.meta.url
).toString();
Without this, rendering fails.
Full Example
import "./App.css";
import { useState } from "react";
import { Document, Page, Thumbnail, pdfjs } from "react-pdf";
import "react-pdf/dist/Page/AnnotationLayer.css";
import "react-pdf/dist/Page/TextLayer.css";
pdfjs.GlobalWorkerOptions.workerSrc = new URL(
"pdfjs-dist/build/pdf.worker.min.mjs",
import.meta.url
).toString();
function App() {
const [numPages, setNumPages] = useState<number>();
const [pageNumber, setPageNumber] = useState<number>(3);
const scale = 1;
function onDocumentLoadSuccess({ numPages }: { numPages: number }): void {
setNumPages(numPages);
}
const highlightText = (text: string) => {
return text.replace(/Kernel GroupChat/g, "<mark>$&</mark>");
};
// OCR data: coordinates are in PDF points (1/72 inch)
const ocrHighlights = [
{ page: 3, x: 100, y: 200, width: 150, height: 20, text: "Sample Text" },
{ page: 3, x: 100, y: 250, width: 200, height: 20, text: "Another highlight" },
];
return (
<div style={{ display: "flex", gap: "20px" }}>
<Document file="/MCP.pdf" onLoadSuccess={onDocumentLoadSuccess}>
<div style={{ position: "relative" }}>
<Page
pageNumber={pageNumber}
scale={scale}
customTextRenderer={({ str }) => highlightText(str)}
/>
{ocrHighlights
.filter((h) => h.page === pageNumber)
.map((highlight, i) => (
<div
key={i}
style={{
position: "absolute",
left: `${highlight.x * scale}px`,
top: `${highlight.y * scale}px`,
width: `${highlight.width * scale}px`,
height: `${highlight.height * scale}px`,
backgroundColor: "yellow",
opacity: 0.4,
pointerEvents: "none",
}}
title={highlight.text}
/>
))}
</div>
<p>
<button
onClick={() => setPageNumber(pageNumber - 1)}
disabled={pageNumber <= 1}
>
Previous
</button>
<span>
Page {pageNumber} of {numPages}
</span>
<button
onClick={() => setPageNumber(pageNumber + 1)}
disabled={pageNumber >= (numPages || 1)}
>
Next
</button>
</p>
</Document>
</div>
);
}
export default App;
Gotchas & Observations
1. Worker Configuration Is Mandatory
If workerSrc is not configured, rendering silently fails.
2. Text Highlighting Is String-Based
customTextRenderer operates on text fragments.
Implications:
- Regex must match fragment boundaries
- Long phrases may be split across spans
- Highlighting multi-line text is non-trivial
3. OCR Coordinates Require Careful Mapping
OCR systems usually return:
- Bottom-left origin
- PDF point units
You may need to:
-
Invert Y-axis:
top = pageHeight - y - height -
Apply scaling correctly
The current example assumes alignment is already correct. This must be validated per document.
4. Scale Synchronization
When changing scale:
- All overlay coordinates must be multiplied by the same scale.
- Otherwise, overlays drift.
5. Performance
Rendering large PDFs:
- Each page renders to canvas
- Text layer adds DOM nodes
- Multiple pages increase memory pressure
For heavy documents:
- Render one page at a time
- Avoid rendering thumbnails unless needed
Conclusion
Use cases break down clearly:
| Use Case | Tool |
|---|---|
| Generate PDFs from React components | @react-pdf/renderer |
| Render existing PDFs in browser | wojtekmaj/react-pdf |
If you need:
- Text selection
- Highlighting
- OCR overlays
- Pagination
- Interactive viewing
→ Use react-pdf (PDF.js wrapper).
If you need:
- Programmatic document generation
- Declarative PDF layouts
→ Use @react-pdf/renderer.
Next Steps
- Implement coordinate normalization layer for OCR alignment.
- Add search indexing across pages.
- Explore virtualized multi-page rendering for large documents.
- Evaluate text-layer extraction for semantic indexing.