Using String methods instead of Regex

Since I'm not very familiar with regex, is it possible (difficult or not) to extract specific text between characters using string methods? For example:

<meta name="description" content="THIS IS THE TEXT I WANT TO EXTRACT" />

      



4 answers


Since the example you give is XML, just use an XML parser:

string s = (string) XElement.Parse(xml).Attribute("content");

      

XML is not a simple text format, and Regex is not very suitable for it; using an appropriate tool will protect you from a lot of evils. For example, the following is identical as XML:

<meta
    name="description"
    content=
        'THIS IS THE TEXT I WANT TO EXTRACT'
/>

      



This also means that when the requirement changes, you have a simple change to make to the code, instead of trying to pick the regex apart and put it back together again (which can be tricky if you are accessing a non-trivial node). Equally, XPath can be an option; for your data, the XPath:

/meta/@content

      

is all you need.

If you don't have .NET 3.5:

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
string s = doc.DocumentElement.GetAttribute("content");
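As a rough sketch of the formatting point above, the following reads the attribute from both spellings of the element, once with `XElement` and once with the XPath `/meta/@content` through `XmlDocument.SelectSingleNode`. The class name `MetaContentDemo`, the method names, and the variables `compact` and `spread` are made up for illustration and are not part of the question.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class MetaContentDemo
{
    // Reads the content attribute of the root element with LINQ to XML (.NET 3.5+).
    public static string ViaXElement(string xml)
    {
        return (string)XElement.Parse(xml).Attribute("content");
    }

    // Reads the same attribute by evaluating the XPath /meta/@content.
    public static string ViaXPath(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("/meta/@content");
        return node == null ? null : node.Value;
    }

    public static void Main()
    {
        // Hypothetical inputs: the one-line form and the multi-line, single-quoted form.
        string compact = "<meta name=\"description\" content=\"THIS IS THE TEXT I WANT TO EXTRACT\" />";
        string spread = "<meta\n    name=\"description\"\n    content=\n        'THIS IS THE TEXT I WANT TO EXTRACT'\n/>";

        Console.WriteLine(ViaXElement(compact));
        Console.WriteLine(ViaXElement(spread));
        Console.WriteLine(ViaXPath(spread));
    }
}
```

All three calls should print the same text, because the parser normalizes the quoting and whitespace differences that a regex or `IndexOf` search would have to handle by hand.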

      



Of course, you can find the start and end of your desired substring with string methods such as IndexOf, and then get the desired Substring! In your example, you want to locate (with IndexOf) the content=" and then the first " that follows it, right? And once you have those indices into the string, Substring will work fine. (I'm not posting C# code because I'm not really sure exactly what you want beyond IndexOf and Substring...!-)

If so, then:

int first = str.IndexOf("content=\"");      // start of the marker (-1 if it is absent)
int last = str.IndexOf("\"", first + 9);    // closing quote, searched for after the 9-character marker
return str.Substring(first + 9, last - first - 9);

should more or less do what you want (apologies again if those hardcoded 9s are off by one; they should be the length of the first substring you are looking for, here content=", so adjust them slightly up or down until you get exactly the result you want!-), but this is the general concept. Find the beginning with the one-argument IndexOf, find the end with the two-argument IndexOf, and cut out the desired chunk with Substring...!
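One way to avoid tuning a hardcoded offset by hand is to take it from the marker's own length and to check for the -1 that `IndexOf` returns when nothing is found. The sketch below assumes that approach; the helper name `ExtractBetween`, its parameters, and the class `SubstringDemo` are hypothetical names, not anything from the question.

```csharp
using System;

static class SubstringDemo
{
    // Returns the text between startMarker and the next endMarker,
    // or null if either marker is missing.
    public static string ExtractBetween(string text, string startMarker, string endMarker)
    {
        int markerAt = text.IndexOf(startMarker, StringComparison.Ordinal);
        if (markerAt == -1) return null;

        // The wanted text begins right after the marker, so no magic number is needed.
        int start = markerAt + startMarker.Length;
        int end = text.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end == -1) return null;

        return text.Substring(start, end - start);
    }

    public static void Main()
    {
        string html = "<meta name=\"description\" content=\"THIS IS THE TEXT I WANT TO EXTRACT\" />";
        Console.WriteLine(ExtractBetween(html, "content=\"", "\""));
    }
}
```

This still treats the markup as plain text, so it only works when the attribute is written exactly as `content="..."` with double quotes and no whitespace around the equals sign.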



If the input is: text1/text2/text3/ (the patterns below require the trailing slash)

The regex below captures the third segment, text3, in group 2:

^([^/]*/){2}([^/]*)/$


If you always need the last segment, use the following instead (it is captured in group 1):

^.*/([^/]*)/$
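For completeness, here is a small sketch of how these two patterns might be applied from C# with `Regex.Match`. It assumes the input has no spaces around the slashes and ends with a slash, since both patterns end in `/$`; the class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

static class SegmentDemo
{
    // Third slash-terminated segment: group 2 of the first pattern.
    public static string ThirdSegment(string input)
    {
        Match m = Regex.Match(input, "^([^/]*/){2}([^/]*)/$");
        return m.Success ? m.Groups[2].Value : null;
    }

    // Last slash-terminated segment: group 1 of the second pattern.
    public static string LastSegment(string input)
    {
        Match m = Regex.Match(input, "^.*/([^/]*)/$");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ThirdSegment("text1/text2/text3/"));
        Console.WriteLine(LastSegment("a/b/c/d/"));
    }
}
```

Both methods return null when the pattern does not match, for example when the trailing slash is missing.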

      



You can of course do this without Regex. Let's say you want to get the text between < and > ...

string GetTextBetween(string content)
{
  int start = content.IndexOf("<");
  if(start == -1) return null; // Not found.
  int end = content.IndexOf(">", start + 1); // Only look for ">" after the "<".
  if(end == -1) return null;  // end not found
  return content.Substring(start + 1, end - start - 1); // Exclude both delimiters.
}

      
