How do I read the HTML content of a specific URL using a Firefox addon?

I want to create an addon that loads the HTML content of a specific URL, saves a specific line of that page, and then navigates to that URL. I've read a lot on mozilla.org about web page content, but I don't understand how to read the HTML content.



3 answers


Here is a simple snippet that makes an XHR request without sending cookies. Cross-origin restrictions are not a concern, because the code runs in the addon's privileged scope rather than on a website.

// Components exposes `utils`, `classes` and `interfaces`; alias them to the usual short names.
var {utils: Cu, classes: Cc, interfaces: Ci} = Components;
Cu.import('resource://gre/modules/Services.jsm');
function xhr(url, cb) {
    let xhr = Cc["@mozilla.org/xmlextras/xmlhttprequest;1"].createInstance(Ci.nsIXMLHttpRequest);

    let handler = ev => {
        // Detach all listeners once any of the events has fired.
        evf(m => xhr.removeEventListener(m, handler, false));
        switch (ev.type) {
            case 'load':
                if (xhr.status == 200) {
                    cb(xhr.response);
                    break;
                }
                // Any other status intentionally falls through to the error case.
            default:
                Services.prompt.alert(null, 'XHR Error', 'Error Fetching Package: ' + xhr.statusText + ' [' + ev.type + ':' + xhr.status + ']');
                break;
        }
    };

    let evf = f => ['load', 'error', 'abort'].forEach(f);
    evf(m => xhr.addEventListener(m, handler, false));

    xhr.mozBackgroundRequest = true;
    xhr.open('GET', url, true);
    xhr.channel.loadFlags |= Ci.nsIRequest.LOAD_ANONYMOUS | Ci.nsIRequest.LOAD_BYPASS_CACHE | Ci.nsIRequest.INHIBIT_PERSISTENT_CACHING;
    // xhr.responseType = "arraybuffer"; // Leave unset so the response is a string. Use "arraybuffer" only for binary downloads, e.g. a zip file you want to wrap in an nsIArrayBufferInputStream.
    xhr.send(null);
}

      



An example using this snippet:

var href = 'http://www.bing.com/';
xhr(href, data => {
    Services.prompt.alert(null, 'XHR Success', data);
});
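
The question also asks to save one specific line of the page. Here is a minimal sketch of that step, assuming the text passed to the callback can be split into lines and that a known marker string identifies the line you want. The helper name `firstLineContaining` and the sample markup are hypothetical and not part of the snippet above:

```javascript
// Hypothetical helper: return the first line of `text` that contains `marker`
// (trimmed), or null when no line matches.
function firstLineContaining(text, marker) {
    // Split on both Unix and Windows line endings.
    const lines = text.split(/\r?\n/);
    for (const line of lines) {
        if (line.includes(marker)) {
            return line.trim();
        }
    }
    return null;
}

// Made-up page body, standing in for the `data` string from the callback.
const samplePage = '<html>\n  <head><title>Demo</title></head>\n  <body>Hello</body>\n</html>';
const titleLine = firstLineContaining(samplePage, '<title>');
```

Inside the callback you would call it as `firstLineContaining(data, marker)` and store the result wherever your addon keeps its state.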

      



Without knowing the page and the URL to find on it, I can't give a complete solution, but here is a Greasemonkey script I wrote that does something similar.

This script is for Java articles on DZone. When an article has a link to its source, the script redirects the browser to that source page:

// ==UserScript==
// @name        DZone source
// @namespace   com.kwebble
// @description Directly go to the source of a DZone article.
// @include     http://java.dzone.com/*
// @version     1
// @grant       none
// ==/UserScript==

var node = document.querySelector('a[target="_blank"]');

if (node !== null) {
    document.location = node.getAttribute('href');
}

      

Usage:

  • Install Greasemonkey if you haven't already.
  • Create a script similar to mine. Set the value of @include to the page that contains the URL you are looking for.
  • Determine what identifies the part of the page that holds the target URL, and change the script to find it. In my script, this is a link with the target "_blank".


After saving the script, navigate to the page with the link. Greasemonkey should execute your script and redirect the browser.

[edit] This version searches the page's script tags for the text as described and redirects to the URL it finds.

// ==UserScript==
// @name        Test
// @namespace   com.kwebble
// @include     your_page
// @version     1
// @grant       none
// ==/UserScript==

var nodes = document.getElementsByTagName('script'),
    i, matches;

for (i = 0; i < nodes.length; i++) {
    if (nodes.item(i).innerHTML !== '') {
        // The dot before "php" is escaped so it only matches a literal ".".
        matches = nodes.item(i).innerHTML.match(/windows\.location = "(.*?)\.php";/);

        if (matches !== null){
            document.location = matches[1];
            break; // Stop at the first match.
        }
    }
}

      

The regex to search for a URL may need some tweaking to match the exact content of the page.
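
As an illustration of that kind of tweaking, here is a more tolerant pattern, sketched under the assumption that the page's inline script contains a statement of the form `window.location = "something.php";`. The names `REDIRECT_PATTERN` and `findRedirectTarget` are hypothetical, and unlike the script above, this version keeps the `.php` extension in the captured URL. Match whatever text your page actually contains:

```javascript
// Hypothetical pattern: allows optional whitespace around "=", single or
// double quotes, and an optional trailing semicolon.
const REDIRECT_PATTERN = /window\.location\s*=\s*(["'])(.*?\.php)\1\s*;?/;

// Return the captured URL (including ".php"), or null when nothing matches.
function findRedirectTarget(scriptText) {
    const matches = scriptText.match(REDIRECT_PATTERN);
    return matches === null ? null : matches[2];
}
```

In the loop above, you could pass `nodes.item(i).innerHTML` to `findRedirectTarget` and assign a non-null result to `document.location`.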



An addon and a Greasemonkey script take a similar approach, but an addon can also use Firefox's native APIs (at the cost of being much more complicated than a script).

Basically, the process is as follows (without knowing your exact requirements):

  • Get the content of the remote URL with XMLHttpRequest

  • Extract the data you need with a regular expression or DOMParser

  • Change the current URL to the target with location.replace()
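
The middle step can be sketched as follows, assuming the data you need is the first link on the fetched page and that the page's text has already been retrieved. The helper name `firstLinkTarget` and the example URLs are hypothetical. A regular expression is used here for brevity, and DOMParser would be the more robust choice for real-world markup:

```javascript
// Hypothetical helper: find the first <a href="..."> in `html` and resolve it
// against `baseUrl`. Returns an absolute URL string, or null if no link is found.
function firstLinkTarget(html, baseUrl) {
    const match = html.match(/<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1/i);
    if (match === null) {
        return null;
    }
    // The URL constructor resolves relative references against the base URL.
    return new URL(match[2], baseUrl).href;
}

// Made-up markup standing in for the fetched page.
const fetchedHtml = '<p><a class="x" href="/docs/page.html">Docs</a></p>';
const target = firstLinkTarget(fetchedHtml, 'http://example.com/start/');
```

With a non-null `target`, the last step is a call to `location.replace(target)` in the context of the page you want to navigate.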
