Convert words from camelCase to snake_case in C
What I am trying to code is this: if I enter camelcase, it should just print camelcase, but if the word contains any uppercase letters, for example if I enter camelCase, it should print camel_case.
Below is my attempt. The problem is that if I enter camelCase, it outputs camel_ase.
Can someone tell me the reason and how to fix it?
#include <stdio.h>
#include <ctype.h>
int main() {
    char ch;
    char input[100];
    int i = 0;
    while ((ch = getchar()) != EOF) {
        input[i] = ch;
        if (isupper(input[i])) {
            input[i] = '_';
            //input[i+1] = tolower(ch);
        } else {
            input[i] = ch;
        }
        printf("%c", input[i]);
        i++;
    }
}
Look at your code first and think about what happens when someone enters a word longer than 100 characters: undefined behavior, because you write past the end of input. If you are using a buffer for input, you always need to add checks so that you don't overflow it.
But since you are printing the characters directly anyway, why do you need a buffer at all? It is completely unnecessary with the approach you are showing. Try the following:
#include <stdio.h>
#include <ctype.h>
int main()
{
    int ch;
    int firstChar = 1; // needed to also accept PascalCase
    while ((ch = getchar()) != EOF)
    {
        if (isupper(ch))
        {
            if (!firstChar) putchar('_');
            putchar(tolower(ch));
        }
        else
        {
            putchar(ch);
        }
        firstChar = 0;
    }
    return 0;
}
Side note: I changed the type of ch to int. This is because getchar() returns int, and putchar(), isupper() and tolower() take an int; they all work with values in the range of unsigned char, or EOF. Since char is allowed to be signed, on a platform where char is signed you will get undefined behavior when calling isupper() or tolower() with a negative char. I know this is a little tricky. Another way to work around the problem is to always cast a char to unsigned char when calling a function that expects an unsigned char value passed as an int.
Since you are using a buffer (which is useless in your current code), you may be wondering whether there is a sensible solution that does use one: reading and writing a whole line at a time. This is slightly more efficient than calling a function for every single character. Here's an example:
#include <stdio.h>
static size_t toSnakeCase(char *out, size_t outSize, const char *in)
{
    const char *inp = in;
    size_t n = 0;
    while (n < outSize - 1 && *inp)
    {
        if (*inp >= 'A' && *inp <= 'Z')
        {
            if (n > outSize - 3)
            {
                out[n++] = 0;
                return n;
            }
            out[n++] = '_';
            out[n++] = *inp + ('a' - 'A');
        }
        else
        {
            out[n++] = *inp;
        }
        ++inp;
    }
    out[n++] = 0;
    return n;
}
int main(void)
{
    char inbuf[512];
    char outbuf[1024]; // twice the length of the input is an upper bound
    while (fgets(inbuf, sizeof inbuf, stdin))
    {
        toSnakeCase(outbuf, sizeof outbuf, inbuf);
        fputs(outbuf, stdout);
    }
    return 0;
}
This version also avoids isupper() and tolower(), but sacrifices portability. It only works if the character encoding stores the letters A to Z in one contiguous run and a to z in another, so that both the range check and the fixed offset 'a' - 'A' are valid. ASCII meets these assumptions; EBCDIC, for example, does not, because its uppercase letters are not contiguous. Keep in mind that what counts as an (uppercase) letter may also vary by language. The program above only handles the letters A to Z, as used in English.
There are two problems in the code:
- You insert one character in each branch of the if, while one of them should insert two characters, and
- You print characters as you go, but the first branch should print both _ and ch.
You can fix this by incrementing i with i++ after each insertion and printing the whole word at the end:
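#include <stdio.h>
#include <ctype.h>
int main(void) {
    int ch; // <<== Has to be int, not char
    char input[100];
    size_t i = 0;
    // Leave room for up to two characters ('_' plus the letter) and the '\0'
    while (i + 2 < sizeof(input) && (ch = getchar()) != EOF) {
        if (isupper(ch)) {
            if (i != 0) {
                input[i++] = '_';
            }
            ch = tolower(ch);
        }
        input[i++] = ch;
    }
    input[i] = '\0'; // Null-terminate the string
    printf("%s\n", input);
    return 0;
}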
There are several problems with the code:
- ch is defined as char: you cannot test for end of file correctly if ch is not defined as int. getchar(), like getc(), can return all values of the type unsigned char plus the special value EOF, which is negative. Define ch as int.
- You store bytes in the array input and use isupper(input[i]). isupper() is defined only for the values returned by getc(), not for potentially negative values of type char if that type is signed on the target system. Use isupper(ch) or isupper((unsigned char)input[i]).
- You don't check whether i is small enough before storing bytes in input[i], causing a potential buffer overflow. Note that there is no need to store characters in an array for your problem.
- You must insert both '_' and the character converted to lowercase into the array. This is your main problem.
- Whether Main should be converted to _main or main, or left as Main, is a matter of specification.
Here's a simpler version:
#include <ctype.h>
#include <stdio.h>
int main(void) {
    int c;
    while ((c = getchar()) != EOF) {
        if (isupper(c)) {
            putchar('_');
            putchar(tolower(c));
        } else {
            putchar(c);
        }
    }
    return 0;
}
You don't need an array to output the entered characters in the form you showed. The program could look like this:
#include <stdio.h>
#include <ctype.h>
int main( void )
{
    int c;
    while ((c = getchar()) != EOF && c != '\n')
    {
        if (isupper(c))
        {
            putchar('_');
            c = tolower(c);
        }
        putchar(c);
    }
    putchar('\n');
    return 0;
}
If you want to use a character array and have it hold a string, you must reserve one of its elements for the terminating null character.
In that case, the program might look like this:
#include <stdio.h>
#include <ctype.h>
int main( void )
{
    char input[100];
    const size_t N = sizeof(input) / sizeof(*input);
    int c;
    size_t i = 0;
    while ( i + 1 < N && (c = getchar()) != EOF && c != '\n')
    {
        if (isupper(c))
        {
            input[i++] = '_';
            c = tolower(c);
        }
        if ( i + 1 != N ) input[i++] = c;
    }
    input[i] = '\0';
    puts(input);
    return 0;
}