How do I get a list of TLDs using bash to create a regex?

When searching for email addresses and hostnames, we would like to improve our existing regular expression so that it only matches existing public TLDs.

We need a single bash command whose output we can copy and paste into our regex.

We already tried a version with (co | com), but it only matches "co" and never the full "com" for .com domains, so the TLD list needs to be sorted with the longest TLDs first.
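The ordering problem can be reproduced with a small sketch. It assumes GNU grep built with PCRE support (the -P option), because Perl-style backtracking engines try alternatives from left to right; first_match is a hypothetical helper used only for this illustration.

```shell
# Hypothetical helper: print the first match of a Perl-style pattern
# against the fixed sample hostname "example.com".
# Assumes GNU grep with PCRE support (-P); -o prints only the matched part.
first_match() {
  printf 'example.com\n' | grep -oP "$1"
}

first_match '\.(co|com)'   # "co" is tried first and wins: prints ".co"
first_match '\.(com|co)'   # longest alternative first: prints ".com"
```

Putting the longer alternative first is what lets the engine match the whole TLD.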

Can anyone supply a single-line "copy and paste" bash command that outputs the most recent list of TLDs, sorted and formatted?


@Alex_Volkov's answer to "Regular Expression to Match DNS Hostname or IP Address?" pointed us to the source for the TLD list.

With the help of @thiton's answer in "Sorting rows from longest to shortest", the output can be sorted so that the longest TLDs come first.

The result is this one-liner:

$ curl -s | sed '1d; s/^ *//; s/ *$//; /^$/d' | awk '{print length" "$0}' | sort -rn | cut -d' ' -f2- | tr '\n' '|' | tr '[:upper:]' '[:lower:]' | sed 's/\(.*\)./\1/'

which outputs the desired part of the TLD regex: the TLDs in lower case, longest first, separated by |.
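To check the sorting and formatting stages without network access, the pipeline can be wrapped in a function and fed a small sample. This is only a sketch: tld_alternation is a hypothetical name, the sample input (one header line followed by upper-case TLDs) is an assumed shape of the downloaded list, and the final tr/sed pair is swapped for paste -sd'|', which joins the lines without leaving a trailing separator.

```shell
# Hypothetical wrapper around the sorting/formatting stages of the one-liner.
# Reads a TLD list on stdin: first line is a header, then one TLD per line.
tld_alternation() {
  sed '1d; s/^ *//; s/ *$//; /^$/d' |   # drop header, trim spaces, skip blank lines
    awk '{ print length($0) " " $0 }' | # prefix each TLD with its length
    sort -rn |                          # longest first
    cut -d' ' -f2- |                    # strip the length prefix again
    tr '[:upper:]' '[:lower:]' |        # lower-case everything
    paste -sd'|' -                      # join lines with "|" (no trailing "|")
}

# Assumed sample input, not real downloaded data.
printf '# sample header\nCOM\nCO\nMUSEUM\nINFO\n' | tld_alternation
# prints: museum|info|com|co
```

With the real list, the curl output from the one-liner would be piped into tld_alternation in place of the printf sample.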




