[Why does my code work] I have created a parser in C and if I do not allocate and immediately free an array it breaks.

9    16 Jun 2016 01:16 by u/vorsamp

[the code in question](https://gist.github.com/acidmath/b2c653757017793500c9dfaa4c252c94)

I have created a program that is loaded onto a microcontroller, [it's so old it's not in their inventory any more](https://www.parallax.com/catalog/microcontrollers), it takes in a string from my C# program, parses it, and tells my machine to spit out coins. A sample input in ASCII would be 48 31 49 31 49 48 31 30 -> 0 US 1 US 10 US RS -> turn on motor 0 until 10 coins exit -> gimme 10 pennies. Assume the input is always valid, [here is my C# interface](https://gist.github.com/acidmath/4d0dee2462984ba78b5a85747e517a5f).

Everything works great until I ask for more than 9 of a coin, when the amount requested has more than one digit. I have tried for 3 days to get rid of this problem to no avail, until today. I created a char pointer, printed out some diagnostics through the serial port, freed the char pointer when I was done with it and lo and behold, it fucking works. I was able to narrow it down to the program allocating and freeing an array is enough to get it to work for any amount requested. However, I know this is bad. I know I have bad code and it needs to be fixed but I do not know where to start.

I am not a very good C programmer. If you see some code you don't like, even if it has nothing to do with the problem I have posed, tell me I'm an idiot, and if you're feeling up to it, tell me why so I can fix it.

29 comments

8

The first thing to look for when you get this kind of "I change some random other thing and it starts/stops working" behaviour is some kind of memory corruption. Check the limits on all your loops, double check memory allocation/deallocation (if you're using it, which on a microcontroller you probably don't want to), make sure every pointer is initialized properly. Chances are, your allocating and freeing the array changes the layout of the rest of your data in memory in just the right way so that whatever bad memory access is happening now happens to some less important memory.

Onto the code... You probably want to change allocCharBuffer() so that it allocates and initializes (numberOfChars+1) bytes, not (numberOfChars) bytes, to make room for the terminating NUL character at the end of the string. My guess is your code is crashing when you call atoi() on an unterminated string.

When you allocate your string, this causes your next allocCharBuffer() to land in some other chunk of memory which, chances are, is already initialized with zeroes, which is why it works in this case (for a while).

1

Of course he's got some kind of memory corruption.

1

This comment is useless, but as per my post, thank you for telling me I am an idiot.

1

Nope. Sorry about that gentlemen, totally didn't see the 'if' there. My fault. I would still do a check to make sure your allocation routine isn't getting called with a zero.

1

This comment is useful, thank you for telling me I am an idiot.

2

Thank you so much for your reply.

change allocCharBuffer() so that it allocates (numberOfChars+1) bytes

I have read about allocating strings, which in C is a character array, and allowing for \0 to take up the end, this did not help me, but I will double check that tomorrow.

check memory allocation/deallocation (if you're using it, which on a microcontroller you probably don't want to)

I was considering a refactor that would use arrays instead of pointers, this piece of your comment seems to affirm that. Can I get a little more information about this suggestion?

1

Sure. :) First up, I'm a bit of a dinosaur in terms of embedded systems and so what I'm saying applies to old, tiny micros. Even on a modern 'small' chip like an AVR or similar you can probably get away with treating it like a little desktop computer. That said...

Generally in embedded software development, static memory allocation is favoured over dynamic allocation. Embedded systems often have very limited processing power and memory, and are expected to run reliably for long periods of time. They're also sometimes hard to debug due to the difficulty of getting reliable debug information out of them.

Static allocation is simpler, eliminates memory leaks and a whole bunch of other potential pointer issues, and generally is a better option in embedded development.

In this case you could do something like this (whoops, I was just going to write some pseudocode... :P )

// Parses an integer, terminated by a US character, from 'str' starting at the
// given position. Updates the position if found, otherwise returns false.
bool parseInt(char *str, int& pos, int& val) {
    int end = pos;
    // Find the first non-numeric character
    while ('0' <= str[end] && str[end] <= '9')
        end++;
    // If the next char isn't an US, then fail
    if (str[end] != US)
        return false;
    // Parse the value
    str[end] = 0;
    val = atoi(str+pos);
    str[end] = US; // NOTE: Don't need this if we don't use the string again
    pos = end+1;
    return true;
}
bool parseDispenseRequest(char *input, int& position, int& currency, int& amount) {
    int curpos = 0;
    return parseInt(input, curpos, position)
        && parseInt(input, curpos, currency)
        && parseInt(input, curpos, amount)
        && (input[curpos] == RS);
}

Source with test cases.

0

This is very valuable and I will be sure to return and update you on how I've implemented parts of your solution.

Juuuuuuuust for fun, I don't have any type definitions in my C so there is no such thing as bool.

1

So I have made everything a static array and it works brilliantly. I have also rewritten atoi, without changing how my parser behaves, it still tracks start and length of the values between unit seperators. I am sure the following code is trivial, but I just wanted to share how I am parsing my integers.

/* reads integer in the range inputArray[start, end) */
int inputRangeToInteger(int start, int end){
  int value = 0;
  for(int i=start; i<end; i++){
    value = value + (inputArray[i]-48)*pow(10, end-i-1); 
  } 
  return value; 
} 
0

Awesome, good work!

The new 'atoi' looks good for what you want, the only thing I'd suggest is avoiding the call to pow() since it's generally slow and you can easily live without it. Also it might be more readable to write 48 as '0', it does exactly the same thing but the next person to read the code might not know that 48 = ASCII '0' off the top of their head:

value = 10 * value + (inputArray[i] - '0');

:)

2

It looks like inputBufferLength is starting out at zero. And then allocCharBuffer(inputBufferLength) gets called which calls malloc with a length of zero. I guess malloc returns the pointer anyway. Yep I just tested it and it does on my system too.

Here's some junk about it: https://bytes.com/topic/c/answers/578467-what-malloc-0-should-returns

2

For the purpose of my problem, you can assume input is always valid, there is always at least one integer before c==US.

0

Your problem is here:

char* inputBufferSubset = allocCharBuffer(inputBufferLength);

And inputBufferLength starts out at 0. Anything else that happens in the program is up for grabs because you might be overwriting some data structure used by the allocator or something.

1

I have updated my post to indicate the assumption about input. There will always be an integer before a unit separator (US) is encountered. In C#, int and uint cannot be null.

my code that makes the request

0

Dude, that stuff doesn't matter. You can't do this:

char* inputBufferSubset = allocCharBuffer(inputBufferLength);
memcpy(inputBufferSubset, &input[i-inputBufferLength], inputBufferLength);

inputBufferLength starts out at 0. You're overwriting memory you're not supposed to.

1

Yes, inputBufferLength starts at 0. Given valid input, which I have indicated you can assume, at least one integer appears before c==US, so inputBufferLenght++ is called at least once, so you can assume inputBufferLength is at least 1 when this section of the switch statement is called.

2

No, it loops, while incrementing inputBufferLength, until it finds US and then allocate the memory. US is unit seperator, ASCII 31.

1

Doesn't matter what happens later. He's modifying memory that's not been allocated properly.

char* inputBufferSubset = allocCharBuffer(inputBufferLength);
memcpy(inputBufferSubset, &input[i-inputBufferLength], inputBufferLength);
2

(Based on the value he gave in its post) First time c is assigned, its value is 48. Since US is 31 and RS is 30, it will go down to the else, line 43, which increments inputBufferLength. It will not call malloc with a size of 0.

2

Yeah, didn't see that 'if' there. My fault.

2

You have given very honest feedback. Do you think a switch statement would make my code more readable?

2

Nah, it's fine. I just can't see!

1

You're making assumptions about memory layout and allocation that are not valid. The result of malloc on a size of 0 is undefined. Don't do it. Exactly which region of memory malloc will allocate is also undefined. Don't make assumptions. You're doing both of these things. Allocate a global buffer and reuse that buffer obviously first nulling and ensuring that the buffer is large enough to handle your data. Don't repeatedly allocate and free. Don't make assumptions about what memory is allocated.

Assuming you want to add in more complex stuff, you also might want to check out LEX/YACC. Great tools for lexical analysis and execution in C.

edit Important note to add. Just because your code functions in one environment in C does not mean your code works. Your code in this project does not work. It assumes the result of a malloc of 0 which means the code is broken. Whether or not it gives you the right output for one specific scenario in one specific environment is immaterial to the question.

0

Some things I noticed

1) You should be able to use calloc to allocate memory and zero it, so you don't have to loop and set it all to zero.

2) I'm not quite sure what you're trying to do here, but I don't think atoi(inputBufferSubset) does what you think it does. If you want the value at inputBufferSubset[0] then you need to do *inputBufferSubset. In which case, you could just do char position = *inputBufferSubset; If you need the value of multiple chars, you'll need to combine the bytes into an integer. (int)*inputBufferSubet would work, but it would include the junk after the value. In this case you would need the length and mask it or bitshift. Alternatively, put the value into a zeroed char[] coinCount; and do (int)*coinCount. EDIT: I may be wrong with with this point. I tested it real quickly and it always returned 0..but in retrospect it should have worked.

3) positionCountArray = allocIntBuffer(8); requestArray = allocIntBuffer(8); There's probably more code than what is here, but hopefully you are freeing these at some point.

I would write the code totally differently:

#define US 31
void getToken(char* input, unsigned int inputLength, char* result, unsigned int* resultLength) 
{   
    int i = 0;
    for(; i < inputLength; i++) {
        if(input[i] == US) {
               break;
        }
    }
    *resultLength = i;
    memcpy(result, input, i);
}
void main(...) {
    char input[8] = {48, 31, 49, 31, 49, 48, 31, 30};
    int inputLength = 8; // Not going to use strlen because no \0
    char token[8] = {}; // results can never be longer than the input
    int resultLength = 0;
    int index = 0;
   // Get motor
   getToken(&input[index], inputLength, token, &resultLength);
   int motor = token[0];
   index += resultLength + 1;
   // Get second unit (ASCII 49)
   getToken(&input[index], inputLength-index, token, &resultLength);
   int secondUnit = token[0];
  index += resultLength + 1;
  // Get quantity
  getToken(&input[index], inputLength-index, token, &resultLength);
  int coinCount = (int)*token & 0x0000FFFF; // Don't forget about endianness!
     and bitshift to get the integer value. I can't imagine someone would want > 99 coins, but who knows. */
 // etc...
}

It's really rough, incomplete, and probably broken but I think you get the idea. Your really should try to avoid calling malloc & free so many times. Eventually the memory will become fragmented and memory allocation will fail. Hopefully this helps.