C++: working with bytes

My problem is that I need to load a binary file and work with single bits from it. After that I need to save it back out as bytes, of course.

My main question is which data type I should work with: char or long int? Can I work with chars somehow?

+3




6 answers


Before you get started, make sure you understand endianness, C++ type sizes, and how weird they can be.

unsigned char is the only type with a fixed size (the machine's natural byte, usually 8 bits), so if you are designing for portability it is a safe bet. But it isn't hard to use unsigned int or even long long to speed things up, with sizeof (times CHAR_BIT) telling you how many bits you get in each read, although the code becomes more complex this way.
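As a rough illustration of working with single bits stored in unsigned char, here is a minimal sketch. The helper names get_bit and set_bit are hypothetical, and the bit numbering (bit 0 is the most significant bit of the first byte) and the 8-bit byte are assumptions, not something the question specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers. Assumptions: 8-bit bytes, and bit 0 is the
// most significant bit of data[0].
bool get_bit(const std::vector<unsigned char>& data, std::size_t bit_index)
{
    assert(bit_index / 8 < data.size());
    return ((data[bit_index / 8] >> (7 - bit_index % 8)) & 1u) != 0;
}

void set_bit(std::vector<unsigned char>& data, std::size_t bit_index, bool value)
{
    assert(bit_index / 8 < data.size());
    // Mask with a single 1 at the requested position inside the byte
    const unsigned char mask =
        static_cast<unsigned char>(1u << (7 - bit_index % 8));
    if (value)
        data[bit_index / 8] |= mask;                              // turn the bit on
    else
        data[bit_index / 8] &= static_cast<unsigned char>(~mask); // turn the bit off
}
```

With these, calling set_bit(data, 0, true) on a buffer whose first byte is 0x00 turns that byte into 0x80. If your file format numbers bits from the least significant end instead, replace 7 - bit_index % 8 with bit_index % 8.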

You should be aware that, for true portability, none of the built-in C++ types has a fixed width. An unsigned char can be 9 bits, and an int can be as small as 16 bits (an unsigned int might only cover the range 0 to 65535), as stated in this and this answer
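If your code depends on a particular width, one option is to state that assumption explicitly so that an unusual platform fails at compile time instead of misbehaving. This is only a sketch; storage_bits is a hypothetical helper name.

```cpp
#include <climits>
#include <cstddef>

// Refuse to compile on a platform where a byte is not 8 bits wide.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// Hypothetical helper: number of bits a type occupies in memory
// (sizeof counts bytes, CHAR_BIT is the number of bits per byte).
template <typename T>
constexpr std::size_t storage_bits()
{
    return sizeof(T) * CHAR_BIT;
}

// The standard only guarantees minimum widths for the built-in types.
static_assert(storage_bits<unsigned char>() == CHAR_BIT, "char is one byte by definition");
static_assert(storage_bits<unsigned int>() >= 16, "int has at least 16 bits");
static_assert(storage_bits<unsigned long long>() >= 64, "long long has at least 64 bits");
```

Note that storage_bits reports storage size; on exotic platforms some of those bits could be padding.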



Another alternative, as suggested by user1200129, is to use the Boost Integer library to reduce all these uncertainties, provided it is supported on your platform. Then again, if you are going to use external libraries, there are many serialization libraries to choose from.

But above all, before you even start optimizing, write something simple that works. Start profiling only once you actually run into timing problems.

+5




If performance isn't critical, use whatever makes your code easier to understand and maintain.



+5




It really depends on what you want to do, but I would say that, in general, you get the best speed from the integer size your program is compiled for. So if you have a 32-bit program, choose 32-bit integers, and if you have a 64-bit program, choose 64-bit ones.

It might be different if you have multiple bytes or integers in your file. Without knowing the exact structure of your file, it is difficult to determine the optimal value.
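To make the word-size idea concrete, here is a sketch that flips every bit in a buffer one machine word at a time. The function name invert_bits is hypothetical, and using std::size_t as a stand-in for the native word size is an assumption that holds on common platforms but is not guaranteed.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical example: flip every bit of the buffer, one word at a time.
void invert_bits(std::vector<unsigned char>& data)
{
    using word = std::size_t;  // assumption: matches the native word size
    std::size_t i = 0;

    // Whole words: memcpy in and out avoids alignment and aliasing problems.
    for (; i + sizeof(word) <= data.size(); i += sizeof(word)) {
        word w;
        std::memcpy(&w, &data[i], sizeof w);
        w = ~w;
        std::memcpy(&data[i], &w, sizeof w);
    }

    // Leftover bytes that do not fill a whole word.
    for (; i < data.size(); ++i)
        data[i] = static_cast<unsigned char>(~data[i]);
}
```

Inversion gives the same result whatever the byte order, so it is safe to do on whole words. Operations that care about bit positions across byte boundaries need more care, and whether the word-sized loop is actually faster is something to measure rather than assume.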

+3




Your suggestions are not actually correct, but as far as I can interpret the question, you can use the unsigned char type (which is a byte) to be able to change each byte separately.

Edit: modified as per comment.

+1




If you are dealing with bytes, the best way to do this is to use a fixed-size type.

#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

int main()
{
     std::ifstream file("file_name", std::ios::binary);

     // read: istreambuf_iterator copies the raw bytes;
     // istream_iterator does formatted input and would skip whitespace bytes
     std::vector<std::int8_t> file_data((std::istreambuf_iterator<char>(file)),
                                        std::istreambuf_iterator<char>());

     // write: open in binary mode so no newline translation takes place
     std::ofstream out("outfile", std::ios::binary);
     std::copy(file_data.begin(), file_data.end(),
               std::ostreambuf_iterator<char>(out));
}

      

EDIT: fixed a bug.

+1




If you need to guarantee the number of bits in an integer type, use the <stdint.h> header. It is available in both C and C++. It defines types such as uint8_t (unsigned 8-bit integer) that are guaranteed to resolve to the appropriate type on the platform. It also tells other programmers reading your code that the number of bits matters.

If you're worried about performance, you can use types wider than 8 bits, for example uint32_t. However, when reading and writing files you need to pay attention to the endianness of your system. Notably, on a little-endian system (for example x86 and most ARM), the 32-bit value 0x12345678 is written to the file as the four bytes 0x78 0x56 0x34 0x12, while on a big-endian system (for example SPARC, PowerPC, Cell, some ARM, and network byte order) it is written as 0x12 0x34 0x56 0x78. (The same applies when reading.) Of course, you can work with 8-bit types and avoid this problem entirely.
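One common way to avoid depending on the host byte order is to assemble wider values from individual bytes with shifts. The sketch below picks little-endian as the file's byte order purely for illustration; the function names put_u32_le and get_u32_le are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends value to out as four bytes, least significant byte first,
// regardless of the byte order of the machine running the code.
void put_u32_le(std::vector<std::uint8_t>& out, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>(value >> shift));
}

// Reads four bytes starting at pos and reassembles the 32-bit value.
// The caller must ensure pos + 4 <= in.size().
std::uint32_t get_u32_le(const std::vector<std::uint8_t>& in, std::size_t pos)
{
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    return value;
}
```

Because the shifts operate on values rather than on memory layout, put_u32_le(buf, 0x12345678) appends 0x78 0x56 0x34 0x12 on both little-endian and big-endian machines, and get_u32_le reads it back unchanged.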

+1








