What is the max array size you can declare (big data challenge)

copyright (C) 2011-2017 iesensor.com

by Qingfeng Xia @ iesensor.com 2015

With the popularity of the big data concept, have you thought about the largest data structure your program can handle? See http://stackoverflow.com/questions/216259/is-there-a-max-array-length-limit-in-c

constraints?

  • is your OS 32-bit or 64-bit?
  • how much memory do you have?
  • where do you allocate the array?

how memory is allocated

  • constant variables are compiled into the constant DATA region of the program
  • objects created with new are allocated from the heap, which is normally large (hundreds of MB or more)
  • arrays declared inside a function or other block come from the stack, which is tiny; its size depends on the OS and the executable model
  • an allocator usually hands out memory at contiguous addresses, so one large array needs one contiguous block
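The three places an array can live can be sketched in C++ as below. This is only an illustration: the names kLookup, g_static_buffer, lookup, touch_static_buffer, sum_on_stack and try_heap_alloc are made up for this example, and the sizes are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Constant data: compiled into the read-only data region of the executable.
static const std::int32_t kLookup[4] = {10, 20, 30, 40};

// Static storage: reserved when the program is loaded, not taken from the stack.
static std::int32_t g_static_buffer[1 << 20];  // 4 MiB of int32_t

// Reads one entry of the constant table.
std::int32_t lookup(std::size_t i) {
    assert(i < 4);
    return kLookup[i];
}

// Writes to the static buffer and reports its size in bytes.
std::size_t touch_static_buffer() {
    g_static_buffer[0] = 1;
    return sizeof(g_static_buffer);
}

// Stack: fine for small arrays only; an array of several MB declared here
// could overflow the stack.
std::int64_t sum_on_stack() {
    std::int32_t small[256];
    for (std::size_t i = 0; i < 256; ++i) small[i] = static_cast<std::int32_t>(i);
    std::int64_t total = 0;
    for (std::size_t i = 0; i < 256; ++i) total += small[i];
    return total;
}

// Heap: try to allocate n elements; returns false instead of throwing
// when the allocation fails.
bool try_heap_alloc(std::size_t n) {
    std::unique_ptr<std::int32_t[]> p(new (std::nothrow) std::int32_t[n]);
    return p != nullptr;
}
```

Calling try_heap_alloc with growing values of n is a simple way to probe how large a single contiguous heap block your system will hand out. Note that some systems overcommit memory, so a successful allocation does not always mean the memory is really available.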

Solutions

  • always use a 64-bit OS with a 64-bit compiler or a 64-bit Python version
  • estimate the memory requirement of your big data: 1e9 int32_t values will use 4 GB
  • break the large 1D array into a container of smaller arrays (for example a std::vector of chunks), so that no single contiguous block of memory is needed; if your program exits abnormally when it allocates one huge array, lack of contiguous memory could be the reason
  • feed the data batch by batch instead of filling it all in one go, for example through a stream API or, in Python, a generator such as permute()
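A minimal sketch of the estimate and the chunking idea from the list above, assuming a fixed chunk size chosen by the caller. bytes_needed and ChunkedArray are hypothetical names for this illustration, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed for n elements of type T (allocator overhead is ignored).
template <typename T>
std::uint64_t bytes_needed(std::uint64_t n) {
    return n * static_cast<std::uint64_t>(sizeof(T));
}

// A 1D array of int32_t stored as many small chunks, so no single
// contiguous block of n elements is ever requested.
class ChunkedArray {
public:
    ChunkedArray(std::size_t n, std::size_t chunk_size)
        : size_(n), chunk_size_(chunk_size) {
        assert(chunk_size > 0);
        const std::size_t chunks = (n + chunk_size - 1) / chunk_size;
        chunks_.reserve(chunks);
        for (std::size_t c = 0; c < chunks; ++c) {
            const std::size_t begin = c * chunk_size;
            const std::size_t len =
                (n - begin < chunk_size) ? (n - begin) : chunk_size;
            chunks_.emplace_back(len, 0);  // one zero-filled chunk
        }
    }

    // Element access: locate the chunk, then the offset inside it.
    std::int32_t& operator[](std::size_t i) {
        assert(i < size_);
        return chunks_[i / chunk_size_][i % chunk_size_];
    }

    std::size_t size() const { return size_; }
    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::size_t size_;
    std::size_t chunk_size_;
    std::vector<std::vector<std::int32_t>> chunks_;
};
```

For example, bytes_needed<std::int32_t>(1000000000) gives 4000000000 bytes, the 4 GB estimate, and ChunkedArray a(10, 4) stores its 10 elements in 3 chunks. The same layout also suits batch processing, since each chunk can be filled and consumed on its own.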

indexing container by size_t

size_t is the type returned by sizeof, and it is the type that holds the maximum array/STL container size in C and C++ (when the width matters, always use fixed-width integer types such as int32_t). It is 32 bits wide on a 32-bit OS and 64 bits wide on a 64-bit OS!

  • Windows uses the LLP64 model: only long long and pointers are 64-bit
  • Linux uses the LP64 model: long and pointers are 64-bit
  • in both models int is 32-bit and long long is 64-bit!
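To see which model your own compiler uses, the widths can be checked directly. This is a small sketch; bits_of, data_model and max_elements are hypothetical helper names introduced here.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <string>

// Bit width of a type on the platform this code is compiled for.
template <typename T>
constexpr std::size_t bits_of() { return sizeof(T) * CHAR_BIT; }

// Names the data model from the widths of int, long and pointers.
std::string data_model() {
    const std::size_t i = bits_of<int>();
    const std::size_t l = bits_of<long>();
    const std::size_t p = bits_of<void*>();
    if (i == 32 && l == 64 && p == 64) return "LP64";   // 64-bit Linux
    if (i == 32 && l == 32 && p == 64) return "LLP64";  // 64-bit Windows
    if (i == 32 && l == 32 && p == 32) return "ILP32";  // typical 32-bit
    return "other";
}

// Upper bound on the element count that size_t can express for elements
// of elem_size bytes; the real limit (memory, OS) is usually far lower.
std::size_t max_elements(std::size_t elem_size) {
    assert(elem_size > 0);
    return SIZE_MAX / elem_size;
}
```

Printing data_model() together with bits_of<std::size_t>() on each target platform is a quick check before assuming that long or size_t is 64-bit.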

In the big data era, your program will fail silently later on code like this: for(int i=0;i<N;i++)

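The given code won't process an array containing more than INT_MAX items in a 64-bit program, because the index is an int. After the access to the item with index INT_MAX, the index variable overflows. For a signed int this is undefined behavior; in practice it often wraps to a negative value, and we may get an infinite loop or an out-of-range access. An unsigned 32-bit index has the same limit at UINT_MAX, where it wraps back to 0.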
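A hedged sketch of the safer pattern: index with size_t, and check explicitly before narrowing to int. The helper names fits_in_int, to_int_index, next_index_32 and sum_all are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// True when the value can be stored in an int without overflow.
bool fits_in_int(std::size_t value) {
    return value <= static_cast<std::size_t>(std::numeric_limits<int>::max());
}

// Checked narrowing: asserts instead of silently truncating.
int to_int_index(std::size_t i) {
    assert(fits_in_int(i));
    return static_cast<int>(i);
}

// Unsigned 32-bit arithmetic wraps around (well-defined, unlike signed
// overflow): the index after UINT32_MAX is 0 again.
std::uint32_t next_index_32(std::uint32_t i) {
    return static_cast<std::uint32_t>(i + 1u);
}

// Loop with a size_t index, which can reach every element of the container.
std::uint64_t sum_all(const std::vector<std::uint8_t>& data) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < data.size(); ++i) total += data[i];
    return total;
}
```

The wrap in next_index_32 is why a 32-bit unsigned index can loop forever against a larger size_t bound: it returns to 0 without ever reaching N.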

For now, a current PC will rarely let you declare an array with more than 2G elements, so you are safe, but not for long!

BSD documentation licensed Free for non-commercial usage only
Author: Qingfeng XIA
copyright (C) 2011-2017
http://www.iesensor.com
please keep the original link in your reference.
http://www.iesensor.com/blog/2015/09/18/what-is-the-max-array-size-you-can-declare-big-data-challenge/