Chrome extension: how to show UI for PDF file?

I am trying to write a Google Chrome extension to display PDF files. Once I detect that the browser is navigating to a URL that points to a PDF file, I want to stop the default PDF viewer from loading and show my own interface instead. The UI will use PDF.js to render the PDF and jQuery UI to show some other stuff.

The question is: how should I do this? It's very important to block the original PDF viewer, because I don't want to double my memory consumption by showing two copies of the document. So I have to redirect the tab to my own view somehow.


2 answers


As the main author of the PDF.js Chrome extension, I can share some insight into the logic behind the PDF Viewer Chrome extension.

How do I identify a PDF file?

In an ideal world, every website would serve PDF files with the standard application/pdf MIME type. Unfortunately, the real world is not perfect, and in practice many sites use the wrong MIME type. You will catch most cases by selecting requests that satisfy either of the following conditions:

  • The resource is served with the response header Content-Type: application/pdf.
  • The resource is served with the response header Content-Type: application/octet-stream and its URL contains ".pdf" (case insensitive).
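The two conditions above could be checked roughly as follows. This is a minimal sketch, not the code used by PDF.js: the helper names getHeader and isPdfFile are made up for illustration, and the function assumes it receives the details object that chrome.webRequest passes to an onHeadersReceived listener registered with "responseHeaders".

```javascript
// Sketch only: getHeader and isPdfFile are hypothetical helper names.

// Return the value of a response header (name compared case-insensitively),
// or undefined if the header is absent.
function getHeader(responseHeaders, name) {
  var headers = responseHeaders || [];
  var wanted = name.toLowerCase();
  for (var i = 0; i < headers.length; i++) {
    if (headers[i].name.toLowerCase() === wanted) {
      return headers[i].value;
    }
  }
  return undefined;
}

// True if the response matches either of the two conditions listed above.
function isPdfFile(details) {
  var contentType = getHeader(details.responseHeaders, 'content-type');
  if (!contentType) {
    return false;
  }
  // Drop parameters such as "; charset=..." before comparing.
  var mimeType = contentType.split(';')[0].trim().toLowerCase();
  if (mimeType === 'application/pdf') {
    return true;
  }
  return mimeType === 'application/octet-stream' &&
    details.url.toLowerCase().indexOf('.pdf') !== -1;
}
```

A listener could then ignore the request whenever isPdfFile(details) returns false.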

In addition, you need to determine whether the user wants to view the PDF file or download it. If you don't care about the difference, it's simple: just intercept the request if it matches either of the previous conditions.
Otherwise (and this is the approach I took), you need to check whether a Content-Disposition response header exists and its value starts with "attachment". If so, the response is meant to be downloaded and should not be intercepted.

If you want to support PDF downloads (e.g. through your UI), you need to add the response header Content-Disposition: attachment. If the header already exists, you need to replace the existing disposition type (for example inline) with "attachment". Don't bother trying to parse the full header value: just strip the first part up to the first semicolon, then put "attachment" in front of what remains. (If you really want to parse the header, read RFC 2616 (section 19.5.1) and RFC 6266.)
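The Content-Disposition handling described above might look like the sketch below. The function names isAttachment and forceAttachment are hypothetical; both work on the responseHeaders array (objects with name and value) that chrome.webRequest hands to an onHeadersReceived listener.

```javascript
// Sketch only: isAttachment and forceAttachment are hypothetical helper names.

// True if a Content-Disposition header exists and starts with "attachment".
function isAttachment(responseHeaders) {
  return (responseHeaders || []).some(function(header) {
    return header.name.toLowerCase() === 'content-disposition' &&
      /^\s*attachment/i.test(header.value || '');
  });
}

// Return a copy of the headers in which the disposition type is "attachment".
// The existing value is not parsed: everything before the first semicolon is
// dropped and "attachment" is put in front of the rest.
function forceAttachment(responseHeaders) {
  var headers = (responseHeaders || []).slice();
  for (var i = 0; i < headers.length; i++) {
    if (headers[i].name.toLowerCase() === 'content-disposition') {
      var value = headers[i].value || '';
      var semicolon = value.indexOf(';');
      var rest = semicolon === -1 ? '' : value.slice(semicolon);
      headers[i] = { name: headers[i].name, value: 'attachment' + rest };
      return headers;
    }
  }
  // No Content-Disposition header yet: add one.
  headers.push({ name: 'Content-Disposition', value: 'attachment' });
  return headers;
}
```

In a blocking onHeadersReceived listener, returning { responseHeaders: forceAttachment(details.responseHeaders) } should make Chrome treat the response as a download.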

What Chrome APIs (Extension) should I use to intercept PDF files?

The chrome.webRequest API can be used to intercept and redirect requests. With the following logic, you can intercept PDF responses and redirect them to your custom viewer, which then requests the PDF file from the given URL.

chrome.webRequest.onHeadersReceived.addListener(function(details) {
    if (!isPdfFile(details)) // isPdfFile is a placeholder name: implement PDF detection here
        return; // Nope, not a PDF file. Ignore this request.

    var viewerUrl = chrome.extension.getURL('viewer.html') +
      '?file=' + encodeURIComponent(details.url);
    return { redirectUrl: viewerUrl };
}, {
    urls: ["<all_urls>"],
    types: ["main_frame", "sub_frame"]
}, ["responseHeaders", "blocking"]);

      

(See https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler.js for an actual implementation of PDF detection using the logic at the top of this answer.)



With the above code, you can intercept any PDF file served over http and https. If you want to view PDF files from the local file system and/or ftp, you need to use chrome.webRequest.onBeforeRequest instead of onHeadersReceived. Fortunately, you can assume that if the URL ends with ".pdf", the resource is most likely a PDF file. Users who want to use the extension to view local PDF files must explicitly allow this on the extension's settings page.
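For local and ftp URLs there are no response headers to inspect, so the decision has to be made from the URL alone. A possible sketch, assuming a viewer page named viewer.html as in the snippet above and a made-up helper isLocalPdfUrl:

```javascript
// Sketch only: isLocalPdfUrl is a hypothetical helper name.

// True for file:// and ftp:// URLs whose path ends with ".pdf"
// (case-insensitive), ignoring any query string or fragment.
function isLocalPdfUrl(url) {
  return /^(file|ftp):\/\/[^?#]*\.pdf([?#]|$)/i.test(url);
}

// Register the listener only when running inside an extension.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(function(details) {
    if (!isLocalPdfUrl(details.url)) {
      return; // Not a local PDF. Ignore this request.
    }
    var viewerUrl = chrome.extension.getURL('viewer.html') +
      '?file=' + encodeURIComponent(details.url);
    return { redirectUrl: viewerUrl };
  }, {
    // Broad match patterns; the precise check happens in isLocalPdfUrl.
    urls: ['file:///*', 'ftp://*/*'],
    types: ['main_frame', 'sub_frame']
  }, ['blocking']);
}
```

The URL filter is kept deliberately broad here so that the ".pdf" test lives in one place; narrowing the match patterns would reduce how often the listener runs.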

On Chrome OS, use the chrome.fileBrowserHandler API to register your extension as a PDF viewer (https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler-vcros.js).

Methods based on the webRequest API only work for PDFs in top-level documents and frames, not for PDFs embedded via <object> and <embed>. Although these are rare, I wanted to support them anyway, so I came up with an unconventional method for detecting them and loading the PDF viewer in these contexts. The implementation can be viewed at https://github.com/mozilla/pdf.js/pull/4549/files. This technique relies on the fact that when an element is inserted into a document, it must eventually be rendered. When it is rendered, CSS styles are applied. If I declare a CSS animation for embed/object elements, animation events are fired. These events bubble up in the document, so I can add a listener for this event and replace the content of the object/embed element with an iframe that loads my PDF Viewer.
There are several ways to replace an element or its content, but I used Shadow DOM to change the displayed content without affecting the DOM in the page.
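The animation trick could be sketched as a content script like the one below. This is a simplified illustration and not the code from the pull request: it swaps the element for an iframe directly with replaceWith instead of using Shadow DOM, so it does change the page's DOM. The helper getPdfSource, the animation name pdf-embed-seen and the page viewer.html are assumptions, and viewer.html would have to be listed under web_accessible_resources for the iframe to load.

```javascript
// Sketch only: getPdfSource and the animation name are hypothetical.

// Return the PDF URL of an <embed>/<object> element, or null if the
// element does not look like an embedded PDF.
function getPdfSource(element) {
  var tag = (element.tagName || '').toLowerCase();
  var src = tag === 'embed' ? element.src :
    tag === 'object' ? element.data : '';
  if (!src) {
    return null;
  }
  var type = (element.type || '').toLowerCase();
  if (type === 'application/pdf' || /\.pdf([?#]|$)/i.test(src)) {
    return src;
  }
  return null;
}

// Only run the DOM part inside a page with the extension APIs available.
if (typeof document !== 'undefined' && typeof chrome !== 'undefined') {
  // A near-invisible animation that starts once the element is rendered.
  var style = document.createElement('style');
  style.textContent =
    '@keyframes pdf-embed-seen { from { opacity: 0.99; } to { opacity: 1; } }' +
    'embed, object { animation: pdf-embed-seen 1ms; }';
  document.documentElement.appendChild(style);

  document.addEventListener('animationstart', function(event) {
    if (event.animationName !== 'pdf-embed-seen') {
      return; // Some other animation on the page.
    }
    var pdfUrl = getPdfSource(event.target);
    if (!pdfUrl) {
      return;
    }
    var iframe = document.createElement('iframe');
    iframe.src = chrome.runtime.getURL('viewer.html') +
      '?file=' + encodeURIComponent(pdfUrl);
    iframe.style.cssText = 'width:100%;height:100%;border:0';
    event.target.replaceWith(iframe);
  }, true);
}
```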

Limitations and Notes

The method described here has several limitations:

  • The PDF file is requested at least twice from the server: first a normal request to get the headers, which is aborted when the extension redirects to the PDF viewer, then another request for the actual data.
    Hence, if the URL is only valid once, the PDF cannot be displayed (the first request invalidates the URL, so the second request fails).

  • This method only works for GET requests. There is no public API to directly get the response body of a request in a Chrome extension (crbug.com/104058).

  • The method for handling PDFs in <object> and <embed> elements requires a script to run on every page. I profiled the code and found that the performance impact is negligible, but you still need to be careful if you want to change the logic.
    (I first tried using Mutation Observers for detection, which slowed down page loads by 3-20% on huge documents and caused an additional 1.5 GB spike in memory usage in a DOM benchmark test.)

  • The detection method for <object> / <embed> tags may still cause any NPAPI/PPAPI-based PDF plugin to be loaded, because it only replaces the <embed> / <object> tag after it has already been inserted and rendered. When the tab is inactive, animations are not scheduled, so dispatch of the animation event will be significantly delayed.

Afterword

PDF.js is open source; you can view the code for the Chrome extension at https://github.com/mozilla/pdf.js/tree/master/extensions/chromium . If you go through the source, you will notice that the code is a little more complex than I explained here. This is because extensions could not redirect requests in the onHeadersReceived event until I implemented that a few months ago (crbug.com/280464, Chrome 35).

And there is also some logic to make the url in the omnibox look a little better.

The PDF.js extension continues to evolve, so unless you want to significantly change the PDF Viewer UI, I suggest asking users to install the official PDF.js PDF Viewer from the Chrome Web Store and/or opening issues in the PDF.js issue tracker for reasonable feature requests.



Custom PDF viewer

Basically, to accomplish what you want, you need to:

  • Intercept the PDF URL when it is being loaded;
  • Stop the PDF from loading;
  • Start your own PDF viewer and load the PDF inside it.

How

  • Using the chrome.webRequest API, you can easily listen to web requests made by Chrome, and more specifically those that load .pdf files. Using the chrome.webRequest.onBeforeRequest event, you can listen for all requests ending in ".pdf" and get the URL of the requested resource.

  • Create a page, for example display_pdf.html, where you will display the PDF files and do whatever you want with them.

  • In the chrome.webRequest.onBeforeRequest listener, prevent the resource from loading by returning {redirectUrl: ...} to redirect to your display_pdf.html page.

  • Pass the PDF URL to your page. This can be done in several ways, but for me the easiest is to append the encoded PDF URL to your page URL as a query string, something like display_pdf.html?url=http%3A%2F%2Fwww.example.com%2Fexample.pdf.

  • Inside the page, get the URL with JavaScript, then process and render the PDF with any library you want, for example pdf.js.

Code

Following the steps above, your extension will look like this:

<root>/
    /background.js
    /display_pdf.html
    /display_pdf.js
    /manifest.json

      

So, first of all, let's take a look at the manifest.json file: you will need to declare permissions for webRequest and webRequestBlocking, so it should look like this:

{
    "manifest_version": 2,

    "name": "PDF Test",
    "version": "0.0.1",

    "background": {
        "scripts": ["/background.js"]
    },

    "permissions": ["webRequest", "webRequestBlocking", "<all_urls>"]
}

      



Then, in background.js, you listen for the chrome.webRequest.onBeforeRequest event and redirect the tab that is loading the PDF to the URL of your custom display_pdf.html page, like so:

chrome.webRequest.onBeforeRequest.addListener(function(details) {
    var displayURL;

    if (/\.pdf$/i.test(details.url)) { // the URL ends with ".pdf", so treat the resource as a PDF file
        displayURL = chrome.runtime.getURL('/display_pdf.html') + '?url=' + encodeURIComponent(details.url);

        // stop the request and redirect to your custom display page
        return {redirectUrl: displayURL};
    }
}, {urls: ['*://*/*.pdf']}, ['blocking']);

      

And finally, in your display_pdf.js file, you extract the PDF URL from the query string and use it to do whatever you want:

var PDF_URL = decodeURIComponent(location.href.split('?url=')[1]);
// this will be something like http://www.somesite.com/path/to/example.pdf

alert('The PDF url is: ' + PDF_URL);
// do something with the pdf... like processing it with PDF.js
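As an alternative to splitting location.href by hand, the query string can be read with the standard URLSearchParams API, which also takes care of percent-decoding. A small sketch, where getPdfUrl is a made-up helper name and the parameter is assumed to be called url as in the example above:

```javascript
// Sketch only: getPdfUrl is a hypothetical helper name.

// Read the "url" parameter from a query string such as location.search.
// URLSearchParams ignores a leading "?" and decodes the value itself.
// Returns null when the parameter is missing.
function getPdfUrl(search) {
  return new URLSearchParams(search).get('url');
}
```

In display_pdf.js this would be called as getPdfUrl(location.search), and a null result can be used to show an error instead of trying to load a PDF.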

      

Working example

A working example of what I said above can be found HERE .

Links to documentation

I recommend that you check out the official documentation of the above APIs which you can find at the following links:
