No cookie generated by PHP and cURL Login from remote site

With the code below, the login request to $loginUrl does not succeed (cookie.txt is never created in the same directory as the script), so I am unable to download the HTML from $url. When I inspect the result of curl_exec for $loginUrl ($store = curl_exec($ch)), the login form is displayed instead of a logged-in page, so it looks like the username and password are not being submitted to the form.

function parseDOM($data)
{
  global $projectID, $sRedirect, $database;
  libxml_use_internal_errors(true);
  $dom = new DOMDocument();
  if(!$dom->loadHTML($data))
  {
    echo "did not load";
  }
}

$ch = @curl_init();
if($ch)
{
  $username = 'username';
  $password = 'password';
  //$url = 'https://global-factiva-com.libproxy.lib.unc.edu/ha/default.aspx#./!?&_suid=14977301633480007720669669887936';
  //trying different URL
  $url = 'https://global.factiva.com.libproxy.lib.unc.edu/redir/default.aspx?P=sa&NS=16&AID=9UNI011500&f=g&an=j000000020010807dw8b00lc2&cat=a';
  //loginUrl is the same as the URL for the form post action
  $loginUrl = 'https://sso.unc.edu/idp/profile/SAML2/POST/SSO;jsessionid=A2C0B6480084BED37E1104E903B07AA9?execution=e1s1';

  //Set the URL to work with
  curl_setopt($ch, CURLOPT_URL, $loginUrl);
  // ENABLE HTTP POST
  curl_setopt($ch, CURLOPT_POST, 1);
  //Set the post parameters
  curl_setopt($ch, CURLOPT_POSTFIELDS, 'j_username='.$username.'&j_password='.$password);
  //Handle cookies for the login
  $cookie=dirname(__FILE__)."\\cookie.txt";
  curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
  curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
  //execute the request (the login)
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
  $store = curl_exec($ch);

  //now access the URL that requires login
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
  $content=curl_exec($ch);
  $headers = curl_getinfo($ch);

  curl_close($ch);
  parseDOM($content);

}

      



2 answers


This is the approach I would use. First, open the network inspector in Google Chrome. If you log in manually, you will be able to see all of the submitted request headers, form fields, etc.

Armed with this information, you can create a cURL request and specify all of the custom headers. I have worked with systems before that reject requests without a valid referer or user agent.

So, for example:



<?php

$username = 'hello';
$password = 'letmein';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"https://sso.unc.edu/idp/profile/SAML2/POST/SSO?execution=e1s1");
curl_setopt($ch, CURLOPT_POST, true);
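// Field names must not carry a trailing ':'; http_build_query() also URL-encodes the values.
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(['j_username' => $username, 'j_password' => $password, '_eventId_proceed' => '']));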
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_TIMEOUT, 10);

// Request headers mirroring what the browser sends. Content-Length and Host are
// omitted because cURL sets them itself; Accept-Encoding is handled by
// CURLOPT_ENCODING so that cURL also decompresses the response.
curl_setopt($ch, CURLOPT_ENCODING, '');
$headers = [
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language: en-US,en;q=0.8,es;q=0.6',
    'Cache-Control: max-age=0',
    'Connection: keep-alive',
    'Content-Type: application/x-www-form-urlencoded',
    'Origin: https://sso.unc.edu',
    'Referer: https://sso.unc.edu/idp/profile/SAML2/POST/SSO?execution=e1s1',
    'Upgrade-Insecure-Requests: 1',
    'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
];

curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$output = curl_exec ($ch);

curl_close ($ch);

echo $output;

?>

      

Once you run this, you should hopefully be logged in with the cookie set. You can then make a second request, using a new curl_init(), to the second URL, and include the CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR options.

Hope this gives you something to work with. Good luck.
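The second request mentioned above might look like the following minimal sketch. The helper names cookieRequestOptions and fetchWithCookies are hypothetical (they are not part of cURL), and the sketch assumes the login request has already saved its session cookies to the given cookie file.

```php
<?php
// Hypothetical helper: builds the cURL options for a request that reuses
// the cookie file written by an earlier login request.
function cookieRequestOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow 302 redirects
        CURLOPT_COOKIEFILE     => $cookieFile, // read the cookies saved by the login
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies back when the handle is released
        CURLOPT_TIMEOUT        => 10,
    ];
}

// Hypothetical helper: performs the request and returns the response body,
// or null if the transfer failed.
function fetchWithCookies(string $url, string $cookieFile): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, cookieRequestOptions($url, $cookieFile));
    $body = curl_exec($ch);
    unset($ch); // releasing the handle lets cURL write the cookie jar
    return $body === false ? null : $body;
}
```

It could be called as fetchWithCookies($url, __DIR__ . '/cookie.txt'). Passing an absolute path is a deliberate choice here: a relative 'cookie.txt' is resolved against the current working directory, which is not necessarily the script's directory.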



You didn't tell us where you want to log in, but in a comment you posted this link: https://auth.lib.unc.edu/ezproxy_auth.php?url=https://global.factiva.com/ha/default.aspx, which itself links to 4 different login pages. However, the CURLOPT_VERBOSE log you posted in "File Generated PHP cookie and cURL Login from a remote site" suggests that you are trying to log in through the one called Onyen. After some research, it turns out that they have a very strange login system.

It starts at https://auth.lib.unc.edu/ezproxy_auth.php?url=https://global.factiva.com/ha/default.aspx. Make a GET request to that URL; this creates the cookie session you need for all subsequent requests and gives you the information you need in the HTML. Parse the HTML and find the <form> element containing the <input> element that has "Onyen" in its value attribute (the input to look for looks like <input name="submit" value="Onyen Sign In" accesskey="o" type="submit">). This form gives you 3 <input> parameters, which you need to add to the next GET request, whose URL you get from the form's action attribute. I suspect all the values are constant except for the one called auth, which is probably unique per cookie session or IP address or something. The URL generated in my browser (and later in PHP) turned out to be https://auth.lib.unc.edu/authentication.php?url=https://global.factiva.com/ha/default.aspx&auth=shibboleth&submit=Onyen+Sign+In

If you did everything right, by building the correct URL and requesting it with the cookies received in the previous request, it should respond with a 302 Found HTTP redirect, which you must follow. After this redirect you get another HTML page with a single <form> tag, whose URL you should retrieve, and <input> elements whose names and values you should parse and add to your next POST request, which goes to https://sso.unc.edu/idp/profile/SAML2/POST/SSO

This POST gives another 302 Found HTTP redirect, which you should also follow. After that redirect you finally arrive at https://sso.unc.edu/idp/profile/SAML2/POST/SSO?execution=e2s1, where there is HTML with a single <form> tag containing <input> elements. Parse their names and values, fill in the j_username and j_password inputs, and add them all to the next POST request, which is sent to https://sso.unc.edu/idp/profile/SAML2/POST/SSO?execution=e2s1

If you send this POST request with a valid username/password and the session cookie, you will probably be logged in. Here's an implementation using DOMDocument / DOMXPath for HTML parsing, and hhb_curl from https://github.com/divinity76/hhb_.inc.php/blob/master/hhb_.inc.php for HTTP / cookies (it is a libcurl wrapper). Just replace username_here with the real username and password_here with the real password on lines 72 and 73.

<?php
declare(strict_types = 1);
require_once ('hhb_.inc.php');
function getFormUrl(\hhb_curl $hc, \DOMNode $form): string {
    $url = $form->getAttribute ( "action" );
    if (empty ( $url )) {
        $url = '';
    }
    if (! parse_url ( $url, PHP_URL_HOST )) {
        $url = 'https://' . rtrim ( parse_url ( $hc->getinfo ( CURLINFO_EFFECTIVE_URL ), PHP_URL_HOST ), '/' ) . '/' . ltrim ( $url, '/' );
    }
    if (false === strpos ( $url, '?' )) {
        $url .= '?';
    }
    return $url;
}
$hc = new hhb_curl ( 'https://auth.lib.unc.edu/ezproxy_auth.php?url=https://global.factiva.com/ha/default.aspx', true );
$hc->exec ();
// hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut () );
$domd = new DOMDocument (); @$domd->loadHTML ( $hc->getResponseBody () ); // a static DOMDocument::loadHTML() call fails on PHP 8
$form = (new DOMXPath ( $domd ))->query ( '//input[contains(@value,\'Onyen Sign In\')]/parent::form' )->item ( 0 );
$url = getFormUrl ( $hc, $form );
// probably looks like $url = 'https://auth.lib.unc.edu/authentication.php?';
$queryparms = array ();
foreach ( $form->getElementsByTagName ( "input" ) as $input ) {
    $url .= urlencode ( $input->getAttribute ( "name" ) ) . '=' . urlencode ( $input->getAttribute ( "value" ) ) . '&';
}
$url = substr ( $url, 0, - 1 );
// hhb_var_dump ( $url );
$hc->exec ( $url );
// hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut () );
$domd = new DOMDocument (); @$domd->loadHTML ( $hc->getResponseBody () ); // a static DOMDocument::loadHTML() call fails on PHP 8
$form = $domd->getElementsByTagName ( "form" )->item ( 0 );
$url = getFormUrl ( $hc, $form );
$posts = array ();
foreach ( $form->getElementsByTagName ( "input" ) as $input ) {
    $name = $input->getAttribute ( "name" );
    if (empty ( $name )) {
        continue;
    }
    $posts [$name] = $input->getAttribute ( "value" );
}
// hhb_var_dump ( $posts );
$hc->setopt_array ( array (
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => http_build_query ( $posts ),
        CURLOPT_URL => $url 
) );
$hc->exec ();
// hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut () );
$domd = new DOMDocument (); @$domd->loadHTML ( $hc->getResponseBody () ); // a static DOMDocument::loadHTML() call fails on PHP 8
$form = $domd->getElementsByTagName ( "form" )->item ( 0 );
$url = getFormUrl ( $hc, $form );
$posts = array ();
foreach ( $form->getElementsByTagName ( "input" ) as $input ) {
    $name = $input->getAttribute ( "name" );
    if (empty ( $name )) {
        continue;
    }
    $posts [$name] = $input->getAttribute ( "value" );
}
foreach ( $form->getElementsByTagName ( "button" ) as $button ) {
    $name = $button->getAttribute ( "name" );
    if (empty ( $name )) {
        continue;
    }
    $posts [$name] = $button->getAttribute ( "value" );
}

assert ( isset ( $posts ['j_username'] ), 'failed to find the username input!' );
assert ( isset ( $posts ['j_password'] ), 'failed to find the password input!' );
$posts ['j_username'] = 'username_here';
$posts ['j_password'] = 'password_here';
hhb_var_dump ( $posts );
$hc->setopt_array ( array (
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => http_build_query ( $posts ),
        CURLOPT_URL => $url 
) );
$hc->exec ();
hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut () );

      



Edit: fixed a bug where the name of the <button> element was not appended to the POST data.
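The script above repeats the same loop over <input> (and later <button>) elements for each form it encounters. As a rough sketch, that step could be factored into one function. The name extractFirstForm and its return shape are hypothetical, not part of hhb_curl or DOMDocument, and the sketch assumes the page contains the form of interest as its first <form>.

```php
<?php
// Hypothetical helper: returns the action URL and the name => value pairs of
// all named <input> and <button> elements in the first <form> of an HTML page,
// or null if no form can be found.
function extractFirstForm(string $html): ?array
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // tolerate malformed real-world HTML
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $form = $dom->getElementsByTagName('form')->item(0);
    if (!$form instanceof DOMElement) {
        return null;
    }
    $fields = [];
    foreach (['input', 'button'] as $tag) {
        foreach ($form->getElementsByTagName($tag) as $element) {
            $name = $element->getAttribute('name');
            if ($name === '') {
                continue; // unnamed controls are not submitted
            }
            $fields[$name] = $element->getAttribute('value');
        }
    }
    return ['action' => $form->getAttribute('action'), 'fields' => $fields];
}
```

With this in place, the final step would amount to overwriting $form['fields']['j_username'] and $form['fields']['j_password'] and passing http_build_query($form['fields']) as the POST body. A relative action URL would still need to be resolved against the current page, as getFormUrl does in the script above.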
