No cookie generated by PHP and cURL Login from remote site
I am not getting a successful login at $loginUrl with the code below (cookie.txt is not created in the same directory as the script), and therefore I am unable to load the HTML from $url (nothing is downloaded). When I inspect the result of the curl_exec call for $loginUrl (stored in $store = curl_exec($ch)), the login form is displayed instead of a successful login, so it looks like the username and password are not being submitted to the form.
function parseDOM($data)
{
global $projectID, $sRedirect, $database;
libxml_use_internal_errors(true);
$dom = new DOMDocument();
if(!$dom->loadHTML($data))
{
echo "did not load";
}
}
$ch = @curl_init();
if($ch)
{
$username = 'username';
$password = 'password';
//$url = 'https://global-factiva-com.libproxy.lib.unc.edu/ha/default.aspx#./!?&_suid=14977301633480007720669669887936';
//trying different URL
$url = 'https://global.factiva.com.libproxy.lib.unc.edu/redir/default.aspx?P=sa&NS=16&AID=9UNI011500&f=g&an=j000000020010807dw8b00lc2&cat=a';
//loginUrl is the same as the URL for the form post action
$loginUrl = 'https://sso.unc.edu/idp/profile/SAML2/POST/SSO;jsessionid=A2C0B6480084BED37E1104E903B07AA9?execution=e1s1';
//Set the URL to work with
curl_setopt($ch, CURLOPT_URL, $loginUrl);
// ENABLE HTTP POST
curl_setopt($ch, CURLOPT_POST, 1);
//Set the post parameters
curl_setopt($ch, CURLOPT_POSTFIELDS, 'j_username='.$username.'&j_password='.$password);
//Handle cookies for the login
$cookie=dirname(__FILE__)."\\cookie.txt";
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
//execute the request (the login)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$store = curl_exec($ch);
//now access the URL that requires login
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$content=curl_exec($ch);
$headers = curl_getinfo($ch);
curl_close($ch);
parseDOM($content);
}
This is the approach I would use. First, open the network inspector in Google Chrome. If you log in manually, you will be able to see every request that is submitted, including its headers, form fields, etc.
Armed with this information, you can build a cURL request and specify all the custom headers. I have worked with systems before that reject requests without a valid referer or user agent.
So, for example:
<?php
$username = 'hello';
$password = 'letmein';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"https://sso.unc.edu/idp/profile/SAML2/POST/SSO?execution=e1s1");
curl_setopt($ch, CURLOPT_POST, true);
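// Field names must not carry the ":" shown in the inspector's form-data view; http_build_query also URL-encodes the values.
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(['j_username' => $username, 'j_password' => $password, '_eventId_proceed' => '']));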
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$headers = [
'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
// Accept-Encoding is deliberately not sent: without CURLOPT_ENCODING, cURL would not decompress a gzip/br response.
'Accept-Language:en-US,en;q=0.8,es;q=0.6',
'Cache-Control:max-age=0',
'Connection:keep-alive',
// Content-Length is omitted; cURL computes it from CURLOPT_POSTFIELDS.
'Content-Type:application/x-www-form-urlencoded',
'Host:sso.unc.edu',
'Origin:https://sso.unc.edu',
'Referer:https://sso.unc.edu/idp/profile/SAML2/POST/SSO?execution=e1s1',
'Upgrade-Insecure-Requests:1',
'User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
];
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$output = curl_exec ($ch);
curl_close ($ch);
echo $output;
?>
Once you run this, you will hopefully be logged in and the cookie will be set. You can then make a second request to the second URL with a new curl_init(), passing the same CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR options.
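As a minimal sketch of that second request: the helper names cookieRequestOptions and fetchWithCookies below are hypothetical, and the sketch assumes the login request has already written its cookies to the same file. Passing an absolute path avoids surprises when the script's working directory is not the script's own directory.

```php
<?php
// Hypothetical helper: option set for a follow-up request that reuses
// the cookie file written by the login request.
function cookieRequestOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from this file
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is closed
    ];
}

// Hypothetical helper: fetches $url with the stored cookies and
// throws instead of silently returning false on failure.
function fetchWithCookies(string $url, string $cookieFile): string
{
    $ch = curl_init();
    curl_setopt_array($ch, cookieRequestOptions($url, $cookieFile));
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    curl_close($ch);
    return $body;
}
```

It could then be called as $html = fetchWithCookies($url, __DIR__ . '/cookie.txt'); where $url is the page that requires the login.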
Hope this gives you something to work with. Good luck.
You didn't tell us where you want to log in, but in a comment you posted this link: https://auth.lib.unc.edu/ezproxy_auth.php?url=https://global.factiva.com/ha/default.aspx, which itself links to 4 different login pages. However, the CURLOPT_VERBOSE log you posted in "File Generated PHP cookie and cURL Login from a remote site" suggests that you are trying to log in to a site called Onyen. After some research, it turns out that they have a very strange login system.
It starts at https://auth.lib.unc.edu/ezproxy_auth.php?url=https://global.factiva.com/ha/default.aspx. Make a GET request to that URL; this creates the cookie session you need for all subsequent requests and gives you the HTML you need to parse. In that HTML, find the <form> element that contains an <input> element with Onyen in its value attribute (the input to look for is <input name="submit" value="Onyen Sign In" accesskey="o" type="submit">). This form contains 3 <input> elements whose names and values you need to add to the next GET request, whose URL you get from the form's action attribute. I suspect all the values are constant except for the one called auth, which is probably unique per cookie session or IP address or something. The URL generated in my browser (and later in PHP) turned out to be https://auth.lib.unc.edu/authentication.php?url=https://global.factiva.com/ha/default.aspx&auth=shibboleth&submit=Onyen+Sign+In
If you did everything right (built the correct URL and sent it with the cookies received in the previous request), it should respond with a 302 Found HTTP redirect, which you must follow. After this redirect you get another HTML page with a single <form> tag. Extract its action URL, then parse the names and values of its <input> elements and add them to your next POST request, which goes to https://sso.unc.edu/idp/profile/SAML2/POST/SSO
That POST returns a 302 Found HTTP redirect, which you should also follow. After following it, you finally land at https://sso.unc.edu/idp/profile/SAML2/POST/SSO?execution=e2s1, where the HTML has a single <form> tag with <input> elements whose names and values you have to parse. Fill in the j_username and j_password inputs and add everything to the next POST request, which is sent to https://sso.unc.edu/idp/profile/SAML2/POST/SSO?execution=e2s1
If you send this POST request with a valid username/password and the session cookie, you will probably be logged in. Here's an implementation using DOMDocument / DOMXPath for HTML parsing, and hhb_curl from https://github.com/divinity76/hhb_.inc.php/blob/master/hhb_.inc.php for HTTP / cookies (it's a libcurl wrapper). Just replace username_here with the real username and password_here with the real password on lines 72 and 73.
<?php
declare(strict_types = 1);
require_once ('hhb_.inc.php');
function getFormUrl(\hhb_curl $hc, \DOMNode $form): string {
$url = $form->getAttribute ( "action" );
if (empty ( $url )) {
$url = '';
}
if (! parse_url ( $url, PHP_URL_HOST )) {
$url = 'https://' . rtrim ( parse_url ( $hc->getinfo ( CURLINFO_EFFECTIVE_URL ), PHP_URL_HOST ), '/' ) . '/' . ltrim ( $url, '/' );
}
if (false === strpos ( $url, '?' )) {
$url .= '?';
}
return $url;
}
$hc = new hhb_curl ( 'https://auth.lib.unc.edu/ezproxy_auth.php?url=https://global.factiva.com/ha/default.aspx', true );
$hc->exec ();
// hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut () );
// loadHTML is an instance method; calling it statically is an error on PHP 8
$domd = new DOMDocument ();
@$domd->loadHTML ( $hc->getResponseBody () );
$form = (new DOMXPath ( $domd ))->query ( '//input[contains(@value,\'Onyen Sign In\')]/parent::form' )->item ( 0 );
$url = getFormUrl ( $hc, $form );
// probably looks like $url = 'https://auth.lib.unc.edu/authentication.php?';
$queryparms = array ();
foreach ( $form->getElementsByTagName ( "input" ) as $input ) {
$url .= urlencode ( $input->getAttribute ( "name" ) ) . '=' . urlencode ( $input->getAttribute ( "value" ) ) . '&';
}
$url = substr ( $url, 0, - 1 );
// hhb_var_dump ( $url );
$hc->exec ( $url );
// hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut () );
$domd = new DOMDocument ();
@$domd->loadHTML ( $hc->getResponseBody () );
$form = $domd->getElementsByTagName ( "form" )->item ( 0 );
$url = getFormUrl ( $hc, $form );
$posts = array ();
foreach ( $form->getElementsByTagName ( "input" ) as $input ) {
$name = $input->getAttribute ( "name" );
if (empty ( $name )) {
continue;
}
$posts [$name] = $input->getAttribute ( "value" );
}
// hhb_var_dump ( $posts );
$hc->setopt_array ( array (
CURLOPT_POST => true,
CURLOPT_POSTFIELDS => http_build_query ( $posts ),
CURLOPT_URL => $url
) );
$hc->exec ();
// hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut () );
$domd = new DOMDocument ();
@$domd->loadHTML ( $hc->getResponseBody () );
$form = $domd->getElementsByTagName ( "form" )->item ( 0 );
$url = getFormUrl ( $hc, $form );
$posts = array ();
foreach ( $form->getElementsByTagName ( "input" ) as $input ) {
$name = $input->getAttribute ( "name" );
if (empty ( $name )) {
continue;
}
$posts [$name] = $input->getAttribute ( "value" );
}
foreach ( $form->getElementsByTagName ( "button" ) as $button ) {
$name = $button->getAttribute ( "name" );
if (empty ( $name )) {
continue;
}
$posts [$name] = $button->getAttribute ( "value" );
}
assert ( isset ( $posts ['j_username'] ), 'failed to find the username input!' );
assert ( isset ( $posts ['j_password'] ), 'failed to find the password input!' );
$posts ['j_username'] = 'username_here';
$posts ['j_password'] = 'password_here';
hhb_var_dump ( $posts );
$hc->setopt_array ( array (
CURLOPT_POST => true,
CURLOPT_POSTFIELDS => http_build_query ( $posts ),
CURLOPT_URL => $url
) );
$hc->exec ();
hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut () );
Edit: fixed a bug where the name of the <button> element was not added to the POST data.
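The input and button loops in the script above could be folded into one helper. The following is only a sketch: the function name collectFormFields and the sample HTML are made up for illustration (the field names are the ones mentioned in this thread), and it uses plain DOMDocument rather than hhb_curl so it runs on its own.

```php
<?php
// Hypothetical helper: gathers name => value pairs from every named
// <input> and <button> inside a form, inputs first, then buttons.
function collectFormFields(DOMElement $form): array
{
    $fields = [];
    foreach (['input', 'button'] as $tag) {
        foreach ($form->getElementsByTagName($tag) as $element) {
            $name = $element->getAttribute('name');
            if ($name === '') {
                continue; // unnamed controls are not submitted
            }
            $fields[$name] = $element->getAttribute('value');
        }
    }
    return $fields;
}

// Made-up login form, standing in for the page returned by the server.
$html = '<form action="/login" method="post">'
      . '<input type="hidden" name="token" value="abc">'
      . '<input type="text" name="j_username" value="">'
      . '<input type="password" name="j_password" value="">'
      . '<button type="submit" name="_eventId_proceed" value="">Login</button>'
      . '</form>';

libxml_use_internal_errors(true); // keep HTML parser warnings quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$form = $dom->getElementsByTagName('form')->item(0);

$fields = collectFormFields($form);
$fields['j_username'] = 'username_here';
$fields['j_password'] = 'password_here';

// The encoded string is what would be passed as CURLOPT_POSTFIELDS.
$postBody = http_build_query($fields);
echo $postBody, "\n";
```

Collecting the button as well matters because some login forms only accept the submission when the submit button's name is present in the POST data.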