Chrome extension: how to show UI for PDF file?
I am trying to write a Google Chrome extension to display PDF files. Once I detect that the browser is navigating to a URL that points to a PDF file, I want it to stop loading the default PDF viewer and show my extension's interface instead. The UI will use PDF.js to render the PDF and jQuery UI to show some other stuff.
The question is: how should I do this? It's very important to block the original PDF viewer, because I don't want to double my memory consumption by showing two copies of the document. So I have to redirect the tab to my own view somehow.
As the main author of the PDF.js Chrome extension, I can share some insights into the logic behind the PDF Viewer Chrome extension.
How do I identify a PDF file?
In an ideal world, every website would serve PDF files with the standard application/pdf MIME type. Unfortunately, the real world is not perfect, and in practice many sites use the wrong MIME type. You will catch most cases by matching requests that satisfy any of the following conditions:
- The resource is served with the response header Content-Type: application/pdf.
- The resource is served with the response header Content-Type: application/octet-stream and its URL contains ".pdf" (case-insensitive).
In addition, you also need to determine if the user wants to view the PDF file or download the PDF file. If you're not interested in the difference, it's simple: just intercept the request if it matches any of the previous conditions.
Otherwise (and this is the approach I took), you need to check whether a Content-Disposition response header exists and its value starts with "attachment", in which case the PDF is meant to be downloaded rather than viewed.
If you want to support PDF downloads (e.g. through your UI), you need to add the response header Content-Disposition: attachment. If the header already exists, you need to replace the existing disposition type (for example, inline) with "attachment". Don't bother trying to parse the full header value; just strip the first part up to the first semicolon and put "attachment" in front of what remains. (If you really want to parse the header, read RFC 2616 (section 19.5.1) and RFC 6266.)
What Chrome APIs (Extension) should I use to intercept PDF files?
The chrome.webRequest API can be used to intercept and redirect requests. With the following logic, you can intercept PDF requests and redirect them to your custom viewer, which then requests the PDF file from the given URL.
chrome.webRequest.onHeadersReceived.addListener(function(details) {
    // TODO: Replace this placeholder with real PDF detection
    // (see the conditions at the top of this answer).
    var isPdf = false;
    if (!isPdf)
        return; // Nope, not a PDF file. Ignore this request.
    // chrome.runtime.getURL replaces the deprecated chrome.extension.getURL.
    var viewerUrl = chrome.runtime.getURL('viewer.html') +
        '?file=' + encodeURIComponent(details.url);
    return { redirectUrl: viewerUrl };
}, {
    urls: ["<all_urls>"],
    types: ["main_frame", "sub_frame"]
}, ["responseHeaders", "blocking"]);
(See https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler.js for an actual implementation of PDF detection using the logic at the top of this answer.)
With the above code, you can intercept any PDF file on http and https URLs. If you want to view PDF files from the local file system and/or FTP, you need to use the chrome.webRequest.onBeforeRequest event instead of onHeadersReceived. Fortunately, you can assume that if the URL ends with ".pdf", the resource is most likely a PDF file. Users who want to use the extension to view local PDF files must explicitly allow this on the extension's settings page.
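For local and FTP URLs there are no response headers to inspect, so the check has to rely on the URL alone. Here is a minimal sketch of that variant, assuming the same viewer.html?file= convention as above. The helper name isLocalPdfUrl is hypothetical, the URL patterns in the filter are my own guess at a reasonable choice, and the registration is guarded so that the snippet can also be loaded outside an extension.

```javascript
// Hypothetical helper: true for file:// and ftp:// URLs whose path ends in ".pdf".
function isLocalPdfUrl(url) {
  var withoutQueryOrHash = url.split(/[?#]/)[0];
  return /^(file|ftp):\/\/.*\.pdf$/i.test(withoutQueryOrHash);
}

// Only register the listener when running inside an extension.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(function(details) {
    if (!isLocalPdfUrl(details.url)) {
      return; // Not a local PDF; let the request continue.
    }
    var viewerUrl = chrome.runtime.getURL('viewer.html') +
      '?file=' + encodeURIComponent(details.url);
    return { redirectUrl: viewerUrl };
  }, {
    // Assumed filter: all file and ftp URLs; the helper above narrows it down.
    urls: ['file://*/*', 'ftp://*/*'],
    types: ['main_frame', 'sub_frame']
  }, ['blocking']);
}
```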
On Chrome OS, use the chrome.fileBrowserHandler API to register your extension as a PDF viewer (https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler-vcros.js).
Methods based on the webRequest API only work for PDFs in top-level documents and frames, not for PDFs embedded via <object> and <embed>. While these are rare, I wanted to support them anyway, so I came up with an unconventional method for detecting them and loading the PDF viewer in these contexts. The implementation can be viewed at https://github.com/mozilla/pdf.js/pull/4549/files. This technique relies on the fact that when an element is inserted into a document, it must eventually be rendered. When it is rendered, CSS styles are applied. When an animation is declared for embed/object elements in CSS, animation events fire. These events bubble up in the document, so I can add a listener for the event and replace the content of the object/embed element with an iframe that loads my PDF Viewer.
There are several ways to replace an element or content, but I used Shadow DOM to change the displayed content without affecting the DOM on the page.
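The animation-based detection can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the code from the pull request: the animation name pdf-embed-detected and the function watchForPdfElements are hypothetical, the check only looks at the element's type attribute, and the sketch stops at reporting the element to a callback rather than replacing its content.

```javascript
// Hypothetical animation name; any name that is unlikely to clash works.
var ANIMATION_NAME = 'pdf-embed-detected';

// Declare a no-op animation on <object>/<embed> and report every such
// element that declares a PDF type once the browser starts the animation.
function watchForPdfElements(doc, onPdfElement) {
  var style = doc.createElement('style');
  style.textContent =
    '@keyframes ' + ANIMATION_NAME +
    ' { from { opacity: 1; } to { opacity: 1; } }\n' +
    'object, embed { animation: ' + ANIMATION_NAME + ' 1ms; }';
  doc.documentElement.appendChild(style);

  // animationstart bubbles, so a single listener on the document is enough.
  doc.addEventListener('animationstart', function(event) {
    if (event.animationName !== ANIMATION_NAME) {
      return; // Some unrelated animation on the page.
    }
    var element = event.target;
    var type = (element.getAttribute('type') || '').toLowerCase();
    if (type === 'application/pdf') {
      onPdfElement(element);
    }
  }, true);
}

// In a content script: log PDF elements instead of replacing them.
if (typeof document !== 'undefined') {
  watchForPdfElements(document, function(element) {
    console.log('Found embedded PDF element:', element);
  });
}
```

Passing the document in as a parameter keeps the detection logic separate from the page it runs on, which makes it easier to exercise with a stub.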
Limitations and Notes
The method described here has several limitations:
- The PDF file is requested at least twice from the server: first a regular request to get the headers, which is aborted when the extension redirects to the PDF viewer, and then another request for the actual data. Hence, if the URL is valid only once, the PDF cannot be displayed (the first request invalidates the URL, and the second request fails).
- This method only works for GET requests. There is no public API to directly get response bodies from a request in a Chrome extension (crbug.com/104058).
- The method for getting PDFs to work in <object> and <embed> elements requires a script to run on every page. I profiled the code and found that the performance impact is negligible, but you still need to be careful if you want to change the logic. (I first tried using Mutation Observers for detection, which slowed down page loads by 3-20% on huge documents and caused an additional 1.5 GB peak in memory usage in a DOM benchmark.)
- The detection method for <object>/<embed> tags may still cause any NPAPI/PPAPI-based PDF plugins to be loaded, because it only replaces the <embed>/<object> tag after it has been inserted and rendered. When the tab is inactive, animations are not scheduled, so the dispatch of the animation event is significantly delayed.
Afterword
PDF.js is open source; you can view the code for the Chrome extension at https://github.com/mozilla/pdf.js/tree/master/extensions/chromium. If you go through the source, you will notice that the code is a little more complex than I explained here. This is because extensions could not redirect requests in the onHeadersReceived event until I implemented it a few months ago (crbug.com/280464, Chrome 35).
And there is also some logic to make the url in the omnibox look a little better.
The PDF.js extension continues to evolve, so unless you want to significantly change the PDF Viewer UI, I suggest asking users to install the official PDF.js PDF Viewer from the Chrome Web Store and/or opening issues in the PDF.js issue tracker for reasonable feature requests.
Custom PDF viewer
Basically, to accomplish what you want to do, you need to:
- Intercept the PDF URL when it is being loaded;
- Stop the PDF from loading;
- Start your own PDF viewer and load the PDF inside it.
How
- Using the chrome.webRequest API, you can easily listen to the web requests made by Chrome, and more specifically to those that load .pdf files. Using the chrome.webRequest.onBeforeRequest event, you can listen for all requests ending in ".pdf" and get the URL of the requested resource.
- Create a page, for example display_pdf.html, where you will show the PDF files and do whatever you want with them.
- In the chrome.webRequest.onBeforeRequest listener, prevent the resource from loading by returning {redirectUrl: ...} to redirect to your display_pdf.html page.
- Pass the PDF URL to your page. This can be done in several ways, but for me the easiest is to append the encoded PDF URL to your page URL as a query string, something like display_pdf.html?url=http%3A%2F%2Fwww.example.com%2Fexample.pdf.
- Inside the page, retrieve the URL with JavaScript, then process and render the PDF with any library you want, for example PDF.js.
Code
Following the steps above, your extension will look like this:
<root>/
/background.js
/display_pdf.html
/display_pdf.js
/manifest.json
So, first of all, let's take a look at the manifest.json file: you will need to declare permissions for webRequest and webRequestBlocking, so it should look like this:
{
    "manifest_version": 2,
    "name": "PDF Test",
    "version": "0.0.1",
    "background": {
        "scripts": ["/background.js"]
    },
    "permissions": ["webRequest", "webRequestBlocking", "<all_urls>"]
}
Then, in background.js, you listen for the chrome.webRequest.onBeforeRequest event and update the tab that loads the PDF with the URL of your custom display_pdf.html page, like so:
chrome.webRequest.onBeforeRequest.addListener(function(details) {
    var displayURL;
    if (/\.pdf$/i.test(details.url)) { // the URL ends with ".pdf", so treat the resource as a PDF file
        displayURL = chrome.runtime.getURL('/display_pdf.html') + '?url=' + encodeURIComponent(details.url);
        // stop the original request and go to the custom display page instead
        return {redirectUrl: displayURL};
    }
}, {urls: ['*://*/*.pdf']}, ['blocking']);
And finally, in your display_pdf.js file, you extract the PDF URL from the query string and use it to do whatever you want:
var PDF_URL = decodeURIComponent(location.href.split('?url=')[1]);
// this will be something like http://www.somesite.com/path/to/example.pdf
alert('The PDF url is: ' + PDF_URL);
// do something with the pdf... like processing it with PDF.js
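As a slightly more defensive variant of the snippet above, the query string can be read with the standard URL API, which also decodes the value. The sketch below is an illustration only: getPdfUrl and renderFirstPage are hypothetical helpers, and the rendering part assumes that PDF.js has been loaded and configured (including its worker) as the global pdfjsLib and that display_pdf.html contains a canvas element with the id pdf-canvas. Neither assumption is part of the original example.

```javascript
// Hypothetical helper: extract the "url" query parameter from the page URL.
// searchParams.get() decodes the value, so decodeURIComponent is not needed.
function getPdfUrl(pageUrl) {
  return new URL(pageUrl).searchParams.get('url');
}

// Render the first page of the PDF. Assumes a global pdfjsLib and a canvas
// with the id "pdf-canvas"; both are assumptions of this sketch.
function renderFirstPage(pdfUrl) {
  return pdfjsLib.getDocument(pdfUrl).promise.then(function(pdf) {
    return pdf.getPage(1);
  }).then(function(page) {
    var viewport = page.getViewport({ scale: 1.5 });
    var canvas = document.getElementById('pdf-canvas');
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    return page.render({
      canvasContext: canvas.getContext('2d'),
      viewport: viewport
    }).promise;
  });
}

// Only run inside the viewer page, where both globals exist.
if (typeof location !== 'undefined' && typeof pdfjsLib !== 'undefined') {
  var pdfUrl = getPdfUrl(location.href);
  if (pdfUrl) {
    renderFirstPage(pdfUrl);
  }
}
```

If the url parameter is missing, getPdfUrl returns null and nothing is rendered.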
Working example
A working example of what I said above can be found HERE.
Links to documentation
I recommend that you check out the official documentation of the above APIs which you can find at the following links: