Get root DNS record from PHP server; get domain name without www, etc.
For this project: http://drupal.org/project/parallel
Usage:
echo parallel_get_domain("www.robknight.org.uk") . "<br>";
echo parallel_get_domain("www.google.com") . "<br>";
echo parallel_get_domain("www.yahoo.com") . "<br>";
Functions:
/**
 * Given a host name, returns the top domain.
 *
 * @param $host
 *   String containing the host name: www.example.com
 *
 * @return string
 *   Top domain: example.com, or FALSE if none was found.
 */
function parallel_get_domain($host) {
  if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN' && strnatcmp(phpversion(), '5.3.0') < 0) {
    // This works only about half the time: CNAME lookups don't work with nslookup.
    for ($end_pieces = substr_count($host, '.'); $end_pieces > 0; $end_pieces--) {
      // end() takes its argument by reference, so store the array first.
      $pieces = explode('.', $host, $end_pieces);
      $test_domain = end($pieces);
      if (checkdnsrr($test_domain)) {
        $domain = $test_domain;
        break;
      }
    }
    return isset($domain) ? $domain : FALSE;
  }
  else {
    // This always works
    $sections = explode('.', $host);
    array_unshift($sections, '');
    $parts = array();
    foreach ($sections as $key => $value) {
      // Drop one more leading label on each pass and test what remains.
      $parts[$key] = $value;
      $test_domain = implode('.', parallel_array_xor($parts, $sections));
      if (checkdnsrr($test_domain, 'NS') && !checkdnsrr($test_domain, 'CNAME')) {
        $domain = $test_domain;
        break;
      }
    }
    return isset($domain) ? $domain : FALSE;
  }
}
/**
 * Opposite of array_intersect().
 *
 * @param $array_a
 *   First array
 * @param $array_b
 *   Second array
 *
 * @return array
 */
function parallel_array_xor($array_a, $array_b) {
  $union_array = array_merge($array_a, $array_b);
  $intersect_array = array_intersect($array_a, $array_b);
  return array_diff($union_array, $intersect_array);
}
/**
 * Win compatible version of checkdnsrr.
 *
 * checkdnsrr() support for Windows by HM2K <php [spat] hm2k.org>
 * http://us2.php.net/manual/en/function.checkdnsrr.php#88301
 *
 * @param $host
 *   String containing host name
 * @param $type
 *   String containing the DNS record type
 *
 * @return bool
 */
function parallel_win_checkdnsrr($host, $type = 'MX') {
  if (strtoupper(substr(PHP_OS, 0, 3)) != 'WIN') { return FALSE; }
  if (empty($host)) { return FALSE; }
  $types = array('A', 'MX', 'NS', 'SOA', 'PTR', 'CNAME', 'AAAA', 'A6', 'SRV', 'NAPTR', 'TXT', 'ANY');
  if (!in_array($type, $types)) {
    user_error("checkdnsrr() Type '$type' not supported", E_USER_WARNING);
    return FALSE;
  }
  $output = array();
  @exec('nslookup -type=' . $type . ' ' . escapeshellcmd($host), $output);
  foreach ($output as $line) {
    // Quote the host so its dots are matched literally, not as regex wildcards.
    if (preg_match('/^' . preg_quote($host, '/') . '/', $line)) { return TRUE; }
  }
  // No matching line in the nslookup output.
  return FALSE;
}
// Define checkdnsrr() if it doesn't exist
if (!function_exists('checkdnsrr')) {
  function checkdnsrr($host, $type = 'MX') {
    return parallel_win_checkdnsrr($host, $type);
  }
}
Output - Windows:
org.uk
google.com
yahoo.com
Output - Linux:
robknight.org.uk
google.com
yahoo.com
As you discovered, some countries use only the top-level domain itself (e.g. .tv, .us), while others subdivide their country TLD (e.g. .uk).
Ideally, you would need a lookup list (it won't be long) of approved TLDs and, where a TLD is subdivided, each of its subdivisions (e.g. ".co.uk" instead of ".uk"). This tells you which dots, counting from the right, to keep. Then move one dot further to the left (if there is one) and cut off everything before it.
Without a lookup list, you can use the fact that subdivisions (.co, etc.) exist only under country TLDs (which have two letters) and, AFAIK, are no longer than 3 characters and always consist of letters, so you can probably recognize them with a regular expression.
Edit: Never mind, the actual list of public suffixes is much more complicated. You will need a lookup table to figure out what the suffix is, then go back to the previous dot and trim everything to its left. A regex is a bad solution. Instead, keep the list of suffixes in a dictionary, then test your domain name by stripping one dotted part at a time from the left until the remainder matches a suffix, then add back the part you just stripped.
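The lookup-table approach described above can be sketched roughly as follows. The function name `registrable_domain` and the tiny `$suffixes` table are made up for illustration; a real table would be much larger and would need to be kept up to date.

```php
<?php
/**
 * Strip labels from the left until the remainder is a known suffix,
 * then add back the label that was stripped last.
 *
 * $suffixes is a hypothetical lookup table keyed by suffix.
 * Returns FALSE if no suffix matches or nothing precedes the suffix.
 */
function registrable_domain($host, array $suffixes) {
    $labels = explode('.', strtolower(trim($host, '.')));
    for ($i = 1; $i < count($labels); $i++) {
        // Remainder after removing the first $i labels.
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($suffixes[$candidate])) {
            // Longest suffix is hit first because we strip from the left.
            return $labels[$i - 1] . '.' . $candidate;
        }
    }
    return FALSE;
}

// Hand-made sample table, for illustration only.
$suffixes = array_flip(array('com', 'net', 'uk', 'co.uk', 'org.uk', 'au', 'com.au'));

echo registrable_domain('www.robknight.org.uk', $suffixes), "\n"; // robknight.org.uk
echo registrable_domain('www.google.com', $suffixes), "\n";       // google.com
```

Because the table is keyed by suffix, each test is a single array lookup and no DNS query is needed.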
Note: as pointed out in the comments, this method doesn't actually work in all cases. The reason is that some top-level domains do resolve to IP addresses, even though most of them don't. Therefore, it is not possible to determine whether a given name is a top-level or pseudo-top-level domain name simply by checking whether it has an IP address. Unfortunately, this probably means a lookup list is the only solution, given how inconsistently top-level domains are handled in practice.
Again, don't rely on the code below to work for you. I am leaving it here for educational purposes only.
There is a way to do this without a lookup list. The list may be unreliable or incomplete, whereas this method is guaranteed to work:
<?php
function get_domain($url) {
    $dots = substr_count($url, '.');
    $domain = '';
    for ($end_pieces = $dots; $end_pieces > 0; $end_pieces--) {
        // end() takes its argument by reference, so store the array first.
        $pieces = explode('.', $url, $end_pieces);
        $test_domain = end($pieces);
        // The first candidate with an A record is taken as the domain.
        if (dns_check_record($test_domain, 'A')) {
            $domain = $test_domain;
            break;
        }
    }
    return $domain;
}
$my_domain = get_domain('www.robknight.org.uk');
echo $my_domain;
?>
In this case, it will output 'robknight.org.uk'. It will work equally well for .com, .edu, .com.au, .ly, or whatever top-level domain you are working with.
It works by starting from the right and doing a DNS check on the first thing that looks like it might be a viable domain name. In the example above, it starts with "org.uk" but finds that this is not an actual domain name, just a ccTLD subdivision. It then checks "robknight.org.uk", which is valid, and returns that. If the domain name were, say, "www.php.net", it would start by checking "php.net", which is a valid domain name, and would return it immediately without any further iterations. I should also point out that if no valid domain name is found, an empty string ('') is returned.
This code may not be suitable for processing a large number of domain names in a short amount of time, because each DNS lookup takes time, but it is fine for single lookups or code that is not time critical.
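To make the order of the checks concrete, here is a small sketch that only lists the candidates the loop would test, without doing any DNS lookups. The helper name `candidate_domains` is made up for this illustration; it reuses the same `explode()` limit trick as the function above.

```php
<?php
/**
 * List the names the right-to-left loop would test, in order.
 * No DNS queries are made; this only shows the candidate sequence.
 */
function candidate_domains($host) {
    $candidates = array();
    for ($end_pieces = substr_count($host, '.'); $end_pieces > 0; $end_pieces--) {
        // With a limit of N, the last element holds everything after the first N-1 labels.
        $pieces = explode('.', $host, $end_pieces);
        $candidates[] = end($pieces);
    }
    return $candidates;
}

print_r(candidate_domains('www.robknight.org.uk'));
// Candidates in order: org.uk, robknight.org.uk, www.robknight.org.uk
```

Note that the bare TLD ("uk" or "net") is never tested, because the loop starts with the last two labels.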