C/C++ and large file sizes (over 2GB)
1 09 Jul 2015 23:59 by u/RiderRockon
Hey everyone, this is something I've been wrestling with for the last few days. I'm writing an application that reads files and writes them out again. There's some data manipulation happening, but that's not relevant for this topic.
The big issue I keep running into is that I can't find a reliable way to get the file's size without having to read every byte of it, which is really slow. There seem to be two primary methods to get the file size, according to what I've found through Google on places like StackOverflow.
Method 1:
int getFileSize(const std::string &fileName)
{
    ifstream file(fileName.c_str(), ifstream::binary);
    file.seekg(0, ios::end);
    int fileSize = file.tellg();
    file.close();
    return fileSize;
}
Method 2:
struct stat st;
if(stat(filename.c_str(), &st) != 0) {
    return 0;
}
return st.st_size;
Both of these have problems handling larger files. With the first method I found that, even when using a datatype large enough to hold the size (like a long long), it still wraps around near the 2GB mark. The second method does not have this ailment, but becomes oddly inaccurate at larger file sizes.
I've read that stat64 is able to work with large files, but aside from a few references I couldn't find anything concrete on how that is supposed to work.
Now, provided you don't want to delve into platform-specific functions, what would be the best way to accurately get a file's size for small and large files?
6 comments
0 u/sulami 10 Jul 2015 00:11
So, according to this you need to use off_t, and compile with -D_FILE_OFFSET_BITS=64 if you are on a 32-bit system (using method 2). That is supposed to work on any platform.
0 u/RiderRockon [OP] 10 Jul 2015 09:32
I gave that a shot. However, when I run this on a large file it returns -1. I'm using MinGW on Windows for this, but that shouldn't matter, should it?
0 u/sulami 10 Jul 2015 09:43
"I'm using MinGW on Windows for this, but that shouldn't matter, should it?"
I don't think so. If you paste the whole section of whatever you wrote, I could throw it into a compiler here and check if I can find something odd.
0 u/RiderRockon [OP] 10 Jul 2015 10:34
After a bit of digging I think I might be doing something wrong with the compiler flag, as sizeof( off_t ) still returns 4. Here's what I use to compile the project.
0 u/sulami 10 Jul 2015 10:55
So on 64-bit Linux this gives me 8 bytes, so 64 bits, regardless of whether I use -D_FILE_OFFSET_BITS=64, as expected.
I did some more quick searching and found this, where OP found a MinGW-specific solution that involves a 64-bit stat struct, afaict.
1 u/RiderRockon [OP] 10 Jul 2015 13:47
Yes! That seems to work for me as well! Thank you!
That seems a rather oddly specific way to do it though.