I'm trying to write a simple split function in C, where you supply a string and a char to split on, and it returns a list of split strings:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    char ** split(char * tosplit, char delim){
        int amount = 0;
        for (int i=0; i<strlen(tosplit); i++) {
            if (tosplit[i] == delim) {
                amount++;
            }
        }
        char ** split_ar = malloc(0);
        int counter = 0;
        char *token = strtok(tosplit, &delim);
        while (token){
            split_ar[counter] = malloc(0);
            split_ar[counter] = token;
            token = strtok(NULL, &delim);
            counter++;
        }
        split_ar[counter] = 0;
        return split_ar;
    }

    int main(int argc, char *argv[]){
        if (argc == 2){
            char *tosplit = argv[1];
            char delim = *argv[2];
            char ** split_ar = split(tosplit, delim);
            while (*split_ar){
                printf("%s\n", *split_ar);
                split_ar++;
            }
        } else {
            puts("Please enter words and a delimiter.");
        }
    }
I use malloc twice: once to allocate space for the pointers to strings, and once to allocate space for the actual strings themselves. The strange thing is: during testing I found that the code still worked when I malloc'ed no space at all.
When I removed the malloc lines I got segfaults or malloc assertion errors, so the lines do seem to be necessary, even though they don't seem to do anything. Can someone please explain why?
I expect it has something to do with strtok: the string being tokenized is initialized outside the function scope, and strtok returns pointers into the original string, so maybe malloc isn't even necessary. I have looked at many old SO threads but could find nothing similar enough to answer my question.
Calling `malloc(0)` is OK. Using that pointer later, as in `split_ar[counter] = malloc(0);`, is undefined behavior (UB), because even `split_ar[0]` attempts to access memory outside the allocation.

When code incurs undefined behavior, nothing says it should produce an error. It is undefined behavior: no particular behavior is defined. It might "work", it might not. It is UB.
C certainly does not add safeguards against weak programming.
If you need a language to add extra checks for such mistakes, C is not the best answer.
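To make the distinction concrete, here is a minimal sketch of what is and is not allowed with a zero-size allocation. The function name `malloc_zero_demo` is hypothetical and not part of OP's code; the undefined access is left commented out on purpose.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical demo, not from the original post.
   Returns 0 on success, -1 if the one-element allocation failed. */
int malloc_zero_demo(void) {
    /* Requesting zero bytes is allowed: the result is either NULL or a
       unique pointer that must not be used to access memory. */
    char **none = malloc(0);
    /* none[0] = NULL;    <- undefined behavior: there is no element 0 */
    free(none);           /* fine in both cases; free(NULL) does nothing */

    /* Storing even one pointer needs room for one element. */
    char **one = malloc(1 * sizeof *one);
    if (one == NULL) {
        return -1;
    }
    one[0] = NULL;        /* defined: element 0 exists */
    free(one);
    return 0;
}
```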
Instead, allocate the correct amount. In OP's case I think it is, at most, `amount + 2` pointers. (Consider the case when `tosplit` does not contain any delimiters.)

Further

Code is only attempting to copy the pointer and not the string.
Instead, allocate for the string and copy the string. Research `strdup()`.

Advanced

Use `strspn()` and `strcspn()` to walk down a string and parse it. This has the nice benefit of operating on a `const` string and readily knowing the size of the token, which is useful in allocating.

Use the same technique twice to pre-calculate the token count as well as to parse. This avoids the differences that exist between OP's two methods (counting delimiter characters versus tokenizing with `strtok()`, which treats a run of adjacent delimiters as one separator).
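The points above can be combined into one rough sketch. It is not OP's code: the names `count_tokens`, `split_copy`, and `free_split` are hypothetical, and the delimiter is passed as a string because that is what `strspn()` and `strcspn()` expect. Each token is copied with `malloc()` plus `memcpy()` rather than `strdup()`, since the token length is already known and the token is not null-terminated inside the source string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count tokens by skipping delimiters with strspn()
   and measuring each token with strcspn(). */
static size_t count_tokens(const char *s, const char *delims) {
    size_t count = 0;
    s += strspn(s, delims);              /* skip leading delimiters */
    while (*s != '\0') {
        count++;
        s += strcspn(s, delims);         /* step over the token */
        s += strspn(s, delims);          /* step over the delimiters after it */
    }
    return count;
}

/* Hypothetical helper: release a NULL-terminated array from split_copy(). */
static void free_split(char **parts) {
    if (parts == NULL) {
        return;
    }
    for (char **p = parts; *p != NULL; p++) {
        free(*p);
    }
    free(parts);
}

/* Returns a NULL-terminated array of newly allocated token copies,
   or NULL if an allocation fails. The input string is not modified. */
static char **split_copy(const char *s, const char *delims) {
    size_t n = count_tokens(s, delims);
    char **parts = malloc((n + 1) * sizeof *parts);  /* +1 for the NULL terminator */
    if (parts == NULL) {
        return NULL;
    }
    size_t i = 0;
    s += strspn(s, delims);
    while (*s != '\0') {
        size_t len = strcspn(s, delims);             /* token length */
        parts[i] = malloc(len + 1);
        if (parts[i] == NULL) {
            free_split(parts);                       /* parts[i] is NULL, so the array is terminated */
            return NULL;
        }
        memcpy(parts[i], s, len);
        parts[i][len] = '\0';
        i++;
        s += len;
        s += strspn(s, delims);
    }
    parts[i] = NULL;
    return parts;
}
```

Because counting and parsing use the same walk, the array size always matches the number of tokens stored. A caller would loop over the result until it reaches the `NULL` entry and then pass the array to `free_split()`.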