Java - How to Download Complete HTML Site Source

I am trying to load a site's complete HTML source code into a String in Java. I've tried several approaches, but each one returns almost all of the source code, not all of it. To make it worse, one of the main parts I don't get is the part I need the most!

+3




2 answers


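import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

URL url = new URL("http://www.website.com");
URLConnection spoof = url.openConnection();

// Spoof the connection so we look like a web browser
spoof.setRequestProperty( "User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0;    H010818)" );
StringBuilder finalHTML = new StringBuilder();
// Assumes the page is served as UTF-8; try-with-resources closes the reader when done
try (BufferedReader in = new BufferedReader(new InputStreamReader(spoof.getInputStream(), StandardCharsets.UTF_8))) {
    String strLine;
    // Loop through every line in the source, restoring the line break that readLine() strips
    while ((strLine = in.readLine()) != null) {
        finalHTML.append(strLine).append('\n');
    }
}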

      



+5




The content you are looking for is probably loaded dynamically via AJAX/JavaScript, so it is not part of the HTML the server sends back.

For example, a website might contain an empty DIV tag that is filled in only after the page has loaded (via an AJAX call elsewhere).
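
One way to test this theory is to look at the raw HTML you already downloaded and check whether the container you care about arrives empty. The following is only a minimal sketch: the class name EmptyDivCheck, the method hasEmptyDiv, and the id "results" are hypothetical names chosen for illustration, and the regular expression is a rough heuristic, not a real HTML parser.

```java
import java.util.regex.Pattern;

final class EmptyDivCheck {

    // Returns true if the raw HTML contains a <div> with the given id and
    // nothing but whitespace between its opening and closing tags.
    // Heuristic only: it does not handle unquoted attributes or comments.
    static boolean hasEmptyDiv(String html, String id) {
        Pattern p = Pattern.compile(
                "<div\\b[^>]*\\bid\\s*=\\s*[\"']" + Pattern.quote(id) + "[\"'][^>]*>\\s*</div>",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(html).find();
    }

    public static void main(String[] args) {
        // Stand-in for the String built by the download loop (hypothetical page).
        String raw = "<html><body><div id=\"results\"></div>"
                + "<script src=\"app.js\"></script></body></html>";
        System.out.println("Container is empty in raw source: " + hasEmptyDiv(raw, "results"));
    }
}
```

If the container is empty in the raw source but populated in the browser, the missing part is most likely fetched by a separate request, which usually shows up in the browser's network tab and can often be requested directly.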

+5








