Regex remove company type from name

I am new to Regex and am trying to learn.

I am creating a merge tool and want to use RegExp in it to give me more flexibility and control. One of the placeholders I am replacing is company_name.

I have a list of companies. Many of them have a company type in the name (for example, "My Company, Inc." or "My Company LLC"). I would like to use a regular expression to standardize the results. However, I'm not sure how to write it other than by manually listing each option. For example, each of these names should produce the same result:

  • My company LLC
  • My company, LLC
  • My company, Inc.
  • My company, Inc
  • MY Company Inc.
  • My company Inc
  • My company Co
  • My company

And on and on ...

I believe I can use this to achieve the results I want:

var companyName = lead.company_name;
companyName = companyName.replace(/(, Inc.)|( Inc.)|(, LLC)/gi, '');

      

However, I was hoping there was a more efficient way:

  • Capture the variations
  • Make sure the company type is only matched at the end
  • Handle commas and periods if they exist, without having to list every variant with and without them

CAVEAT: I have to consider the possibility that the company-type characters appear in the actual name (e.g. "My Company Co") and only remove the organization type at the end.

Can this be done easily?

+3




3 answers


If each company name is a string on its own, you can try the following regex:

/,?\s*(llc|inc|co)\.?$/i

Explanation:

  • An optional comma
  • Optional whitespace
  • One of LLC / Inc / Co (case insensitive)
  • An optional period
  • All of the above anchored at the end of the string

const companyNames = [
'My Company LLC',
'My Company, LLC',
'My Company, Inc.',
'My Company, Inc',
'MY Company Inc.',
'My Company Inc',
'My Company Co',
'My Company',
];

console.log(companyNames.map(name => name.replace(/,?\s*(llc|inc|co)\.?$/i, '')));
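
One caveat worth checking against your data: because the pattern allows zero whitespace before the suffix, a name that merely ends in those letters, such as "Costco", would be cut to "Cost". A minimal sketch of one way to guard against that, assuming the suffix is always separated from the name by a comma and/or whitespace (the helper name stripCompanySuffix is hypothetical, not part of the answer above):

```javascript
// Hypothetical helper (not from the answer above): the suffix is removed only
// when a comma and/or whitespace separates it from the rest of the name.
function stripCompanySuffix(name) {
  return name.replace(/(?:,\s*|\s+)(?:llc|inc|co)\.?$/i, '');
}

// "Costco" is left alone; only the final suffix of "My Company Co LLC" goes.
console.log(['Costco', 'My Company, Inc.', 'My Company Co LLC'].map(stripCompanySuffix));
```

Only the last suffix is stripped, so a name like "My Company Co LLC" keeps its "Co". Whether that is the desired outcome depends on how your company list is written.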
      



+3




Yes, there is a more efficient way (if by efficient we mean shorter), although patterns that cover many conventions like this often trade readability for conciseness.

The key is to use groups to avoid repetition.

var rgx = /(, ?)?(LLC|Inc|Co)\.?$/i;

      



Let's break it down.

  • The first part, (, ?)?, says that the suffix may optionally be preceded by a comma and an optional space. So it accepts no comma, a comma with no space after it, or a comma with a space after it.

  • The second part, (LLC|Inc|Co), is a simple group listing the suffix types as alternatives.

  • The last part, \.?, allows for an optional period at the end (we escape the period because in most regex implementations an unescaped period has a special meaning, matching any character except a newline).

Note that you do not need the g flag, as (presumably) no company name will end with more than one suffix type. Also, the $ anchor is useful here, as it ensures that our match must be at the end of the company name, and not just somewhere inside it.
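
As a quick check of the behaviour described above, here is a small sketch using the same pattern. Since (, ?)? only consumes a space when it follows a comma, a name like "My Company LLC" is left with a trailing space after the replacement, so this sketch adds a trim() call. The helper name cleanName is hypothetical.

```javascript
const rgx = /(, ?)?(LLC|Inc|Co)\.?$/i;

// Hypothetical helper: trim() removes the space left behind when the suffix
// was not preceded by a comma (e.g. "My Company LLC").
const cleanName = (name) => name.replace(rgx, '').trim();

console.log(cleanName('My Company LLC'));
console.log(cleanName('My Company, Inc.'));
// The $ anchor means a "Co" in the middle of the name is not touched.
console.log(cleanName('My Co Holdings'));
```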

+5




I would do:

companyName = companyName.replace(/,?[ \t]*(?:\b(?:inc|LLC|co)\b\.?)?$/i, "");

      

Explanation:

/                       : delimiter
    ,?                  : optional comma
    [ \t]*              : optional horizontal whitespace (JavaScript has no \h shorthand)
    (?:                 : non capture group
        \b              : word boundary
        (?:inc|LLC|co)  : non capture group, one of the alternatives
        \b              : word boundary
        \.?             : a dot, optional
    )?                  : end group, optional
    $                   : end of string
/i                      : delimiter, case insensitive
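
If the list of company types is likely to grow, the alternation can be built from an array instead of being typed into the pattern by hand. The following is only a sketch under that assumption: the SUFFIXES list (including "Ltd" and "Corp") and the names escapeRegExp and normalizeCompanyName are hypothetical. It uses [ \t] for horizontal whitespace, since JavaScript has no \h shorthand, and keeps the word boundaries so that a name such as "Costco" is not truncated.

```javascript
// Hypothetical list of suffixes; extend it to match your own data.
const SUFFIXES = ['LLC', 'Inc', 'Co', 'Ltd', 'Corp'];

// Escape regex metacharacters so each entry is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Optional comma, optional spaces/tabs, one whole-word suffix,
// optional period, anchored at the end of the string.
const suffixPattern = new RegExp(
  ',?[ \\t]*\\b(?:' + SUFFIXES.map(escapeRegExp).join('|') + ')\\b\\.?$',
  'i'
);

function normalizeCompanyName(name) {
  return name.replace(suffixPattern, '');
}

console.log(['My Company, Ltd.', 'MY Company Inc.', 'Costco'].map(normalizeCompanyName));
```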

      

+2








