Parsing URLs using C strings in C++

I am learning C++ for one of my CS classes, and for our first project I need to parse some URLs using C strings (i.e. I cannot use the C++ string class).

The only approach I can think of is to iterate over the string (since it's a char[]) and use some switch statements. For someone more experienced in C++: is there a better approach? Could you point me to a good online resource? I haven't found one yet.

+2




6 answers


It's strange that you are not allowed to use the features of the C++ language, that is, C++ strings!

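The C standard library has some C string functions.

e.g.

strdup - duplicate a string into newly allocated memory (POSIX; part of standard C only since C23)
strtok - break a string into tokens. Beware: this modifies the original string.
strcpy - copy a string
strstr - find a string within a string
strncpy - copy up to n bytes of a string. Beware: the result is not null-terminated if the source is n bytes or longer.
etc.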
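The strncpy caveat is easy to trip over, so here is a minimal sketch of a safe truncating copy. The helper name copy_truncated is made up for illustration; only std::strncpy comes from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a library function): copies at most
// out_size - 1 characters of src into out and always null-terminates.
void copy_truncated(char* out, std::size_t out_size, const char* src) {
    assert(out != nullptr && src != nullptr && out_size > 0);
    // strncpy stops after out_size - 1 bytes and does NOT add '\0'
    // when src is that long or longer, so terminate explicitly.
    std::strncpy(out, src, out_size - 1);
    out[out_size - 1] = '\0';
}
```

For example, copying "http://stackoverflow.com" into a 5-byte buffer with this helper leaves "http" in the buffer.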

      

There is a good online reference here with a complete list of the available C string functions for searching and finding things.



http://www.cplusplus.com/reference/clibrary/cstring/

You can step through the strings by accessing them like an array if you need to.

eg.

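#include <cstring>
#include <iostream>

int main() {
  // String literals are const in C++, so point to them with const char*
  const char* url = "http://stackoverflow.com/questions/1370870/c-strings-in-c";
  std::size_t len = std::strlen(url);
  for (std::size_t i = 0; i < len; ++i) {
    std::cout << url[i];  // index the C string like an array
  }
  std::cout << std::endl;
}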

      

As for how to actually do the parsing, you will have to work that out on your own. It is the assignment, after all.

+6




There are a number of standard C library functions that can help you.

First, let's look at the standard C library function strtok. This allows you to extract the parts of a C string that are separated by specific delimiters. For example, you can tokenize on the / delimiter to get the protocol, the domain, and then the file path. You can then tokenize the domain on the . delimiter to get the subdomain(s), the second-level domain and the top-level domain. And so on.

It's not nearly as powerful as the regex-based parser you would really want for parsing URLs, but it works on C strings, is part of the C standard library, and is probably adequate for your intended use.
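As a rough sketch of the / tokenization described above (the helper name split_on_slash and its parameters are invented for illustration, not part of any library):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: splits `url` in place on '/' and stores pointers to
// at most `max_parts` tokens in `parts`. Returns the number of tokens stored.
// std::strtok overwrites delimiters with '\0', so `url` must be a writable
// buffer (a char array), never a string literal.
std::size_t split_on_slash(char* url, char* parts[], std::size_t max_parts) {
    assert(url != nullptr && parts != nullptr);
    std::size_t count = 0;
    for (char* tok = std::strtok(url, "/");
         tok != nullptr && count < max_parts;
         tok = std::strtok(nullptr, "/")) {  // nullptr means "continue the same string"
        parts[count++] = tok;
    }
    return count;
}
```

Called on a writable copy of "http://stackoverflow.com/questions/1370870/c-strings-in-c", this yields five tokens: "http:", "stackoverflow.com", "questions", "1370870" and "c-strings-in-c". Note that strtok skips runs of delimiters (so the empty token between the two slashes disappears) and that the first token still carries the trailing colon.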



Other C library functions that might help:

  • strstr() Finds a substring within a string, similar to std::string::find()
  • strspn(), strchr(), and strpbrk() Find a character or set of characters in a string, similar to std::string::find_first_not_of(), std::string::find() and std::string::find_first_of() respectively
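A minimal sketch of how these search functions can pull out one URL component without modifying the input. The helper copy_host is hypothetical; it uses std::strstr plus std::strcspn (the complement of strspn), both from <cstring>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies the host part of `url` into `out`.
// Returns false if there is no "://" or if `out` is too small.
// Assumption: the host is whatever sits between "://" and the next
// '/', '?' or '#', so a port such as ":8080" is kept as part of it.
bool copy_host(const char* url, char* out, std::size_t out_size) {
    assert(url != nullptr && out != nullptr);
    const char* sep = std::strstr(url, "://");     // locate end of the scheme
    if (sep == nullptr) return false;
    const char* host = sep + 3;                    // skip over "://"
    std::size_t len = std::strcspn(host, "/?#");   // length up to first delimiter
    if (len + 1 > out_size) return false;          // need room for '\0'
    std::memcpy(out, host, len);
    out[len] = '\0';
    return true;
}
```

Unlike strtok, this leaves the original string untouched, so it also works on string literals.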

Edit: A reminder that the correct way to use these functions in C++ is to include <cstring> and use them in the std:: namespace, e.g. std::strtok().

+5




You might want to refer to an open source library that can parse URLs, such as curl or wget (as a reference for how others have done it, not to copy and paste from!). The links point directly to their URL parsing files.

+2




I don't know what the requirements are for parsing the URLs, but if this is for a CS class it would make sense to use a (very simple) BNF grammar and a (very simple) recursive descent parser.

This will give a more robust solution than direct iteration, e.g. for malformed URLs.

Very few string functions from the C standard library are needed.
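To make the idea concrete, here is a sketch in recursive-descent style, with one function per grammar rule. The grammar is a deliberately simplified assumption for illustration (it is not RFC 3986), and all function names are made up. Because this toy grammar has no self-referencing rules, no function actually calls itself.

```cpp
#include <cassert>
#include <cctype>

// Assumed toy grammar:
//   url    = scheme "://" host path END
//   scheme = ALPHA { ALPHA | DIGIT | "+" | "-" | "." }
//   host   = label { "." label }
//   label  = ALNUM { ALNUM | "-" }
//   path   = { "/" { any char except "/", "?", "#" } }
// Each parse_* function advances `p` past what it matched.

static bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
static bool is_alnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }

static bool parse_scheme(const char*& p) {
    if (!is_alpha(*p)) return false;
    ++p;
    while (is_alnum(*p) || *p == '+' || *p == '-' || *p == '.') ++p;
    return true;
}

static bool parse_label(const char*& p) {
    if (!is_alnum(*p)) return false;
    ++p;
    while (is_alnum(*p) || *p == '-') ++p;
    return true;
}

static bool parse_host(const char*& p) {
    if (!parse_label(p)) return false;
    while (*p == '.') {
        ++p;
        if (!parse_label(p)) return false;  // rejects "a..b" and a trailing '.'
    }
    return true;
}

static void parse_path(const char*& p) {
    while (*p == '/') {
        ++p;
        while (*p != '\0' && *p != '/' && *p != '?' && *p != '#') ++p;
    }
}

// Returns true if the whole string matches the toy grammar above.
bool is_valid_url(const char* url) {
    assert(url != nullptr);
    const char* p = url;
    if (!parse_scheme(p)) return false;
    // Short-circuit evaluation keeps these reads inside the string.
    if (p[0] != ':' || p[1] != '/' || p[2] != '/') return false;
    p += 3;
    if (!parse_host(p)) return false;
    parse_path(p);
    return *p == '\0';  // anything left over (e.g. a query) is rejected
}
```

Only <cctype> character tests are needed here, which matches the point that very few string functions are required. Extending the grammar (ports, queries, fragments) means adding one rule and one function at a time.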

+1




You can use C functions like strtok, strchr, strstr, etc.

0




Many of the runtime library functions that have been mentioned work reasonably well, either in conjunction with or apart from the approach of iterating through the string that you mentioned (which, in my opinion, is time-honored).
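For reference, the iterate-and-switch approach can be sketched as a small state machine. The UrlParts struct, the State enum and scan_url are all hypothetical names, and the sketch assumes URLs of the form scheme://host/path only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: positions within the original string.
struct UrlParts {
    std::size_t scheme_len;  // number of characters before ':'
    std::size_t host_begin;  // index of the first host character
    std::size_t host_len;    // number of host characters
    std::size_t path_begin;  // index of the path's first '/', or end of string
};

enum class State { Scheme, Slash1, Slash2, Host, Path };

// Walks the string once, switching on the current state.
// Returns false if the input does not look like scheme://host[/path].
bool scan_url(const char* url, UrlParts& out) {
    assert(url != nullptr);
    out = UrlParts{0, 0, 0, 0};
    State state = State::Scheme;
    std::size_t i = 0;
    for (; url[i] != '\0'; ++i) {
        const char c = url[i];
        switch (state) {
        case State::Scheme:
            if (c == ':') { out.scheme_len = i; state = State::Slash1; }
            break;
        case State::Slash1:
            if (c != '/') return false;
            state = State::Slash2;
            break;
        case State::Slash2:
            if (c != '/') return false;
            out.host_begin = i + 1;
            state = State::Host;
            break;
        case State::Host:
            if (c == '/') {
                out.host_len = i - out.host_begin;
                out.path_begin = i;
                state = State::Path;
            }
            break;
        case State::Path:
            break;  // everything that remains belongs to the path
        }
    }
    if (state == State::Host) {  // the URL ended without a path
        out.host_len = i - out.host_begin;
        out.path_begin = i;
        state = State::Path;
    }
    return state == State::Path && out.scheme_len > 0 && out.host_len > 0;
}
```

The function only records offsets, so the components can afterwards be copied out with strncpy or memcpy as needed.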

0

