Parsing URLs using C strings in C++
I am learning C++ for one of my CS classes, and for our first project I need to parse some URLs using C strings (i.e. I cannot use the C++ string class).
The only way I can think of approaching this is to just iterate (since it's a char[]) and use some switch statements. To someone more experienced in C++: is there a better approach? Could you point me to a good online resource? I haven't found one yet.
It's strange that you are not allowed to use the features of the C++ language, that is, C++ strings!
The C standard library has some C string functions.
e.g.
strdup - duplicates a string (a POSIX function, not part of standard C++)
strtok - breaks a string into tokens. Beware: this modifies the original string.
strcpy - copies a string
strstr - finds a string within a string
strncpy - copies up to n bytes of a string
etc.
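As a small illustration of two of the functions listed above, here is a sketch that uses strstr to locate "://" and strncpy to copy out whatever comes before it. The name copy_scheme and the idea of returning false on failure are made up for this example, not anything the assignment prescribes. Note that strncpy does not write a terminating '\0' when it stops early, so the code adds one itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies the text before "://" into out (capacity out_size).
// Returns false if url has no "://" or if out is too small.
bool copy_scheme(const char* url, char* out, std::size_t out_size) {
    assert(url != nullptr && out != nullptr);
    const char* sep = std::strstr(url, "://");   // first "://", or nullptr if absent
    if (sep == nullptr) return false;
    std::size_t n = static_cast<std::size_t>(sep - url);
    if (n + 1 > out_size) return false;          // need room for the terminator
    std::strncpy(out, url, n);                   // copies n chars, adds no '\0' here
    out[n] = '\0';                               // terminate by hand
    return true;
}
```

Called with "http://stackoverflow.com/questions" and a 16-byte buffer, this should leave "http" in the buffer.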
There is a good online reference here with a complete list of the available C string functions for searching and finding things.
http://www.cplusplus.com/reference/clibrary/cstring/
You can step through the strings by accessing them like an array if you need to.
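e.g.
// needs #include <cstring> and #include <iostream>
const char* url = "http://stackoverflow.com/questions/1370870/c-strings-in-c";
std::size_t len = std::strlen(url);
for (std::size_t i = 0; i < len; ++i) {
    std::cout << url[i];   // visit one character at a time
}
std::cout << std::endl;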
As for how to actually do the parsing, you will have to work that out on your own. That is the assignment, after all.
There are a number of standard C library functions that can help you.
First, let's look at the standard C library function strtok. This allows you to extract parts of a C string separated by specific delimiters. For example, you can tokenize on / to get the protocol, the domain, and then the file path. You can then tokenize the domain on . to get the subdomain(s), second-level domain, and top-level domain, and so on.
It's not nearly as powerful as the regex parser you really need to parse URLs, but it works on C strings, is part of the C standard library, and is probably suitable for your intended use.
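A minimal sketch of the strtok approach described above. The helper name split and its token-array interface are invented for illustration. The input must be a writable char array, because strtok overwrites each delimiter it finds with '\0'.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: splits text in place on any character in delims.
// Stores pointers to the pieces in tokens[] and returns how many were stored.
// text must be writable, because strtok overwrites delimiters with '\0'.
int split(char* text, const char* delims, char* tokens[], int max_tokens) {
    assert(text != nullptr && delims != nullptr && tokens != nullptr);
    int count = 0;
    for (char* tok = std::strtok(text, delims);
         tok != nullptr && count < max_tokens;
         tok = std::strtok(nullptr, delims)) {   // nullptr means "continue with the same string"
        tokens[count++] = tok;
    }
    return count;
}
```

Called on a writable copy such as char url[] = "http://stackoverflow.com/questions/1370870/c-strings-in-c" with "/" as the delimiter, this yields the pieces http:, stackoverflow.com, questions, 1370870 and c-strings-in-c. strtok treats the two slashes after http: as one separator, so no empty token appears. Because strtok keeps a single internal position, finish splitting on / before calling split again on the domain piece with "." as the delimiter.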
Other C library functions that might help:
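- strstr() finds a substring within a string, similar to std::string::find() (it returns a pointer to the match rather than extracting a copy)
- strspn(), strchr(), and strpbrk() find a character or set of characters in a string, similar to std::string::find_first_not_of(), std::string::find(), and std::string::find_first_of() respectively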
Edit: A reminder that the correct way to use these functions in C++ is to include <cstring> and use them in the std:: namespace, for example std::strtok().
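Unlike strtok, the search functions above leave the input untouched, so they also work on string literals. The following is a rough sketch only: Piece and split_url are hypothetical names, and the rule "the host ends at the first /, ? or #" is a simplifying assumption that leaves any port or user info inside the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical view type: a piece of the URL as pointer + length (not null-terminated).
struct Piece {
    const char* begin;
    std::size_t length;
};

// Hypothetical helper: locates scheme, host and the remainder without modifying url.
// Returns false if url contains no "://".
bool split_url(const char* url, Piece& scheme, Piece& host, Piece& rest) {
    assert(url != nullptr);
    const char* sep = std::strstr(url, "://");               // end of the scheme
    if (sep == nullptr) return false;
    scheme = Piece{url, static_cast<std::size_t>(sep - url)};

    const char* host_begin = sep + 3;                         // skip "://"
    const char* host_end = std::strpbrk(host_begin, "/?#");   // first char that ends the host
    if (host_end == nullptr) host_end = host_begin + std::strlen(host_begin);
    host = Piece{host_begin, static_cast<std::size_t>(host_end - host_begin)};

    rest = Piece{host_end, std::strlen(host_end)};            // path and beyond, may be empty
    return true;
}
```

Since the pieces are not null-terminated, compare or copy them with the length-limited functions (std::strncmp, std::strncpy, std::memcpy) rather than std::strcmp or std::strcpy.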
You might want to look at an open source project that parses URLs, such as curl or wget, as a reference for how others have done it (not to copy and paste from them!). The links go directly to their URL-parsing files.
I don't know what your requirements are for parsing URLs, but since this is for a CS class it would make sense to use a (very simple) BNF grammar and a (very simple) recursive descent parser.
This will give you a more robust solution than direct iteration, for example when handling malformed URLs.
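To make the BNF suggestion concrete, here is a sketch under a deliberately tiny grammar that I made up for illustration; real URL syntax is far richer. There is one function per grammar rule, and each advances the pointer p past what it consumed. This grammar has no nested rules, so nothing actually recurses, but the structure is the same as in a full recursive descent parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Grammar assumed for this sketch (not the real URL grammar):
//   url    ::= scheme "://" host [ path ]
//   scheme ::= alpha { alpha }
//   host   ::= label { "." label }
//   label  ::= alnum { alnum | "-" }
//   path   ::= "/" { any character }

static bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
static bool is_alnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }

static bool parse_scheme(const char*& p) {
    if (!is_alpha(*p)) return false;
    while (is_alpha(*p)) ++p;
    return true;
}

static bool parse_label(const char*& p) {
    if (!is_alnum(*p)) return false;
    while (is_alnum(*p) || *p == '-') ++p;
    return true;
}

static bool parse_host(const char*& p) {
    if (!parse_label(p)) return false;
    while (*p == '.') {
        ++p;                                 // consume the dot
        if (!parse_label(p)) return false;   // a dot must be followed by a label
    }
    return true;
}

static bool parse_path(const char*& p) {
    if (*p != '/') return false;
    p += std::strlen(p);                     // the path runs to the end of the string
    return true;
}

// Returns true only if the whole string matches the url rule.
bool is_valid_url(const char* url) {
    assert(url != nullptr);
    const char* p = url;
    if (!parse_scheme(p)) return false;
    if (std::strncmp(p, "://", 3) != 0) return false;
    p += 3;
    if (!parse_host(p)) return false;
    if (*p == '/' && !parse_path(p)) return false;
    return *p == '\0';                       // reject anything left over
}
```

With this grammar, malformed inputs such as "http//stackoverflow.com" or "http://stack..com" are rejected at the rule where they go wrong. That also makes it a natural place to report an error or record where each piece starts and ends.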
Very few string functions from the C standard library are needed.
You can use C functions like strtok, strchr, strstr, etc.
Many of the runtime library functions that have been mentioned work reasonably well, either in conjunction with or instead of the approach of iterating through the string that you mentioned (which, in my opinion, is time-honored).