Regex match for Google+ profile URLs
I am trying to match only the user ID or vanity name in Google+ profile URLs. I am using GAS (Google Apps Script), into which I have loaded XRegExp so that I can match Unicode characters.
So far I have this: ((https?://)?(plus\.)?google\.com/)?(.*/)?([a-zA-Z0-9._]*)($|\?.*)
which you can see in a regex tester (external site), but it still doesn't match just the right-hand part.
I tried using \p{L}
inside [a-zA-Z0-9._]
but had no luck with that. Also, when it does match, I end up with an extra slash at the end of the profile name.
UPDATE #1: I am trying to clean up the G+ URLs in a spreadsheet copied from a Google Form. The links are not all in the same format; the simplest profile link is https://plus.google.com/ followed by the user ID or vanity name.
UPDATE #2: So far I have ([+]\w+|[0-9]{21})(?:\/)?(?:\w+)?$
using @demrks's simplified version of @guest271314's answer. However, there are two problems:
1) Google vanity URLs can contain Unicode characters. Example: https://plus.google.com/u/0/+JoseManuelGarcía_ertatto
fails. I tried using \p{L} but couldn't figure it out.
2) GAS doesn't seem to like this pattern, even though it works in the regex tester. =(
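To illustrate problem 1: without Unicode support, \w only covers [A-Za-z0-9_], so the pattern cannot get past the accented letter. The following is a minimal sketch, assuming an engine that supports ES2018 Unicode property escapes via the `u` flag (this may not hold for the GAS runtime, which is why XRegExp is loaded here). The helper name `extractProfileId` and the extended pattern are illustrative assumptions, not part of the original pattern.

```javascript
// Hypothetical helper (not from the original post): returns the "+vanity"
// name or the 21-digit id from a Google+ profile URL, or null if none found.
function extractProfileId(url) {
  // \p{L} = any letter, \p{N} = any digit; both require the `u` flag.
  const pattern = /(\+[\p{L}\p{N}_-]+|\d{21})(?:\/[\w-]*)?(?:\?.*)?$/u;
  const match = pattern.exec(url);
  return match ? match[1] : null;
}

// "\u00ed" is the accented i in "García".
const vanityUrl = "https://plus.google.com/u/0/+JoseManuelGarc\u00eda_ertatto";

// The ASCII-only pattern from UPDATE #2, for comparison.
const asciiOnly = /([+]\w+|[0-9]{21})(?:\/)?(?:\w+)?$/;

console.log(asciiOnly.exec(vanityUrl));   // null
console.log(extractProfileId(vanityUrl)); // the full vanity name, accent included
```

The optional trailing groups allow a section such as /posts and a query string after the name; adjust them if the spreadsheet contains other URL shapes.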
UPDATE #3: It seems GAS just hates \w
, so I had to expand it. This is what I have so far:
/([+][A-Za-z0-9-_]+|[0-9]{21})(?:\/)?(?:[A-Za-z0-9-_]+)?$/
It even handles "/about" or "/posts" at the end of the URL. However, it is still not Unicode compliant. =( I am still working on this.
UPDATE #4: This seems to work: /([+][\\w-_\\p{L}]+|[\\d]{21})(?:\/)?(?:[\\w-_]+)?$/
It looks like I needed to double the backslashes in the character classes. So it works for now. I'm not sure if there is a shorter way to write this.
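A likely reason for the doubled backslashes, assuming the pattern is handed to the regex constructor as a string rather than written as a regex literal: the string literal consumes one level of backslashes before the regex engine ever sees the pattern. The sketch below uses the built-in RegExp only, with a sample 21-digit id; the variable names are illustrative.

```javascript
const id = "104645458102703754878"; // sample 21-digit id

// Regex literal: one backslash is enough.
const fromLiteral = /^\d{21}$/;

// String pattern: "\\d" in the source becomes \d in the pattern.
const fromString = new RegExp("^\\d{21}$");

// String pattern with a single backslash: "\d" collapses to a plain "d".
const tooFewSlashes = new RegExp("^\d{21}$");

console.log(fromLiteral.test(id));   // true
console.log(fromString.test(id));    // true
console.log(tooFewSlashes.test(id)); // false
console.log(tooFewSlashes.source);   // ^d{21}$
```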
Edit, updated
Try (v4)
document.URL.match(/\++\w+.*|\d+\d|\/+\w+$/).toString()
.replace(/\/+|posts|about|photos|videos|plusones|reviews/g, "")
e.g.
// Sample URLs: profile sections, a community id, a numeric id, a Unicode vanity name
var urls = ["https://plus.google.com/+google/posts"
  , "https://plus.google.com/+google/about"
  , "https://plus.google.com/+google/photos"
  , "https://plus.google.com/+google/videos"
  , "https://plus.google.com/+google/plusones"
  , "https://plus.google.com/+google/reviews"
  , "https://plus.google.com/communities/104645458102703754878"
  , "https://plus.google.com/u/0/LONGIDHERE"
  , "https://plus.google.com/u/0/+JoseManuelGarcía_ertatto"];
var _urls = [];
// Match the profile part, then strip slashes and known section names
urls.forEach(function(item) {
  _urls.push(item.match(/\++\w+.*|\d+\d|\/+\w+$/).toString()
    .replace(/\/+|posts|about|photos|videos|plusones|reviews/g, ""));
});
// Show each extracted id on the page (browser only)
_urls.forEach(function(id) {
  var _id = document.createElement("div");
  _id.innerHTML = id;
  document.body.appendChild(_id);
});
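Since a spreadsheet script typically has no `document` object, the same match-and-strip step can be sketched as a plain function that returns the ids instead of writing them to the page. The function name `extractIds` and the null result for non-matching input are assumptions added for illustration; the two regexes are the ones used above.

```javascript
// Hypothetical DOM-free variant: returns one id (or null) per input URL.
function extractIds(urls) {
  return urls.map(function (url) {
    var match = url.match(/\++\w+.*|\d+\d|\/+\w+$/);
    if (match === null) {
      return null; // nothing that looks like a profile part
    }
    // Strip slashes and the known section names from the matched part.
    return match[0].replace(/\/+|posts|about|photos|videos|plusones|reviews/g, "");
  });
}

var sample = [
  "https://plus.google.com/+google/posts",
  "https://plus.google.com/communities/104645458102703754878",
  "https://plus.google.com/u/0/LONGIDHERE"
];

console.log(extractIds(sample));
// [ '+google', '104645458102703754878', 'LONGIDHERE' ]
```

One caveat of this approach: a vanity name that itself contains one of the stripped words (for example "about") would lose that part as well.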
Here is a possible solution:
(?:\+)(\w+)|(?:\/)(\w+)$
Explanation:
- 1st alternative: (?:\+)(\w+)
  - (?:\+) Non-capturing group. \+ matches the character + literally.
  - (\w+) Capturing group. \w+ matches any word character [a-zA-Z0-9_]. Quantifier: between one and unlimited times.
- 2nd alternative: (?:\/)(\w+)$
  - (?:\/) Non-capturing group. \/ matches the character / literally.
  - (\w+) Capturing group. \w+ matches any word character [a-zA-Z0-9_]. Quantifier: between one and unlimited times.
  - $ asserts the position at the end of the line.
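As a rough illustration of how the two capturing groups are read back in JavaScript: only one alternative participates in a given match, so the result is in group 1 or group 2. The helper name `captureName` is an assumption for this sketch.

```javascript
// Hypothetical helper: returns whichever capturing group participated.
function captureName(url) {
  var match = /(?:\+)(\w+)|(?:\/)(\w+)$/.exec(url);
  if (match === null) {
    return null;
  }
  // Group 1 is set by the "+name" alternative, group 2 by the "/name$" one.
  return match[1] !== undefined ? match[1] : match[2];
}

console.log(captureName("https://plus.google.com/+google/posts")); // google
console.log(captureName("https://plus.google.com/u/0/LONGIDHERE")); // LONGIDHERE
```

Note that the leading + is not part of the capture, that a numeric id followed by a section (such as /about) yields the section name instead of the id, and that \w stops at accented letters, so Unicode vanity names are cut short.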
Hope this is helpful!