What is the max array size you can declare (big data challenge)

With the popularity of the BIG DATA concept, have you thought about the largest data structure your program can actually hold? See: http://stackoverflow.com/questions/216259/is-there-a-max-array-length-limit-in-c


  • is your OS 32-bit or 64-bit?
  • how much memory do you have?
  • where do you allocate the array?

how memory is allocated

  • constant variables are compiled into the constant DATA region of the program
  • new Object is allocated from the heap, which is large in the normal sense, hundreds of MB
  • variables declared in a function or other block come from the stack, which is tiny, depending on the OS and EXE model
  • an allocator usually hands out memory at contiguous addresses
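
The three regions above can be sketched in C++ as follows. This is a minimal illustration, not a measurement: the names kLookup, sum_small_on_stack and make_on_heap are made up for this example, and the actual stack limit (often about 1 to 8 MB by default) depends on your OS and linker settings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Static storage: the constant table lives in the program's data region.
static const std::int32_t kLookup[4] = {1, 2, 3, 4};

// Stack storage: fine for small arrays only, because the stack is tiny.
std::int64_t sum_small_on_stack() {
    std::int32_t local[4] = {1, 2, 3, 4};
    std::int64_t total = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        total += local[i] + kLookup[i];
    }
    return total;
}

// Heap storage: std::vector asks the allocator for one contiguous block
// and throws std::bad_alloc if that block cannot be provided.
std::vector<std::int32_t> make_on_heap(std::size_t count) {
    return std::vector<std::int32_t>(count, 0);
}
```

A large array should go through something like make_on_heap(); declaring it as a local array like the one in sum_small_on_stack() would overflow the stack long before the heap runs out.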


  • always use a 64-bit OS, a 64-bit compiler and a 64-bit Python version
  • estimate the memory requirement of your big data: 1e9 int32_t values will use 4GB (and 1e10 will use 40GB)
  • Break the large 1D array into a container of arrays, for example a std::vector of smaller chunks, so that no single contiguous block of memory is needed; if you see your program exit abnormally, a failed contiguous allocation could be the reason
  • Feed data batch by batch: instead of filling in all the data in one go, use a stream API, or in Python a function that yields items one at a time, such as permute()
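
The estimate and the "container of arrays" idea above could look roughly like this. It is a sketch under assumptions: ChunkedArray and bytes_needed are hypothetical names invented here, the element type is fixed to int32_t, and the chunk size is left for you to tune.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a logical 1D array stored as separate chunks,
// so no single allocation has to cover the whole data set.
class ChunkedArray {
public:
    ChunkedArray(std::size_t size, std::size_t chunk_size)
        : size_(size), chunk_size_(chunk_size) {
        assert(chunk_size_ > 0);
        const std::size_t full = size / chunk_size;
        const std::size_t rest = size % chunk_size;
        chunks_.reserve(full + (rest ? 1 : 0));
        // Each chunk is its own zero-initialized allocation.
        for (std::size_t c = 0; c < full; ++c) chunks_.emplace_back(chunk_size);
        if (rest) chunks_.emplace_back(rest);
    }

    // Map the logical index to (chunk, offset inside the chunk).
    std::int32_t& operator[](std::size_t i) {
        assert(i < size_);
        return chunks_[i / chunk_size_][i % chunk_size_];
    }

    std::size_t size() const { return size_; }
    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::size_t size_;
    std::size_t chunk_size_;
    std::vector<std::vector<std::int32_t>> chunks_;
};

// Bytes needed for `count` int32_t values, computed in 64-bit arithmetic
// so the estimate itself does not overflow.
std::uint64_t bytes_needed(std::uint64_t count) {
    return count * sizeof(std::int32_t);
}
```

For example, ChunkedArray a(10, 4) holds 10 elements in 3 chunks, and bytes_needed(1000000000) gives 4000000000 bytes. Each chunk can also be filled and processed as one batch.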

indexing container by size_t

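size_t is the return type of sizeof() and the size/index type of arrays and STL containers in C and C++, so it bounds the maximum array or container size. (Always use the fixed-width integer types, from <cstdint> in C++ or <stdint.h> in C, when the width matters.) size_t is 32-bit unsigned, like uint32_t, on a 32-bit OS, and 64-bit unsigned on a 64-bit OS!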

  • 64-bit Windows uses the LLP64 model: only long long and pointers are 64-bit, long stays 32-bit
  • 64-bit Linux uses the LP64 model: long and pointers are 64-bit
  • in both models, int is 32-bit and long long is 64-bit!
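
To see which model your own build uses, a small check like the one below may help. It is only a sketch: bits_of and data_model are names invented here, and the label is guessed from the widths of long and pointers alone.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Width of a type in bits on the current platform.
template <typename T>
constexpr std::size_t bits_of() { return sizeof(T) * CHAR_BIT; }

// Best-effort label for the data model, based only on type widths.
std::string data_model() {
    if (bits_of<void*>() == 64 && bits_of<long>() == 64) return "LP64";
    if (bits_of<void*>() == 64 && bits_of<long>() == 32) return "LLP64";
    if (bits_of<void*>() == 32 && bits_of<long>() == 32) return "ILP32";
    return "other";
}
```

Printing data_model() together with bits_of<int>(), bits_of<long>() and bits_of<std::size_t>() shows at a glance whether long is safe to use as a 64-bit index on your platform.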

In the big data era, code like the following will fail silently once N no longer fits in an int: for(int i=0;i<N;i++)

unsigned Index = 0;   // 32-bit index, even in a 64-bit program
while (MyBigNumberField[Index] != id)
    Index++;          // wraps to 0 after UINT_MAX

The given code won’t process an array containing more than UINT_MAX items in a 64-bit program. After the access to the item with index UINT_MAX, the Index variable overflows and wraps back to 0, so if the id is not among the earlier items we get an infinite loop.
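
One possible fix is to index with size_t and to check the length explicitly. The sketch below assumes the array holds int32_t values; next_index and find_id are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Unsigned arithmetic wraps around: next_index(UINT_MAX) yields 0.
unsigned next_index(unsigned index) { return index + 1u; }

// Bounded search with a size_t index, which can address every element
// of any array the platform can hold. Returns `count` when `id` is absent.
std::size_t find_id(const std::int32_t* data, std::size_t count,
                    std::int32_t id) {
    for (std::size_t i = 0; i < count; ++i) {
        if (data[i] == id) return i;
    }
    return count;
}
```

Because the loop stops at count, a missing id ends the search instead of running past the end of the array or wrapping the index.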

For now, a typical PC will not let you declare an array larger than 2G, so you are safe today, but not for long!

