Jump to content

Talk:Data structure alignment

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Cppljevans (talk | contribs) at 13:05, 17 February 2011 (calculation of whole structure alignment and paddings). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
WikiProject iconComputing Start‑class
WikiProject iconThis article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
StartThis article has been rated as Start-class on Wikipedia's content assessment scale.
???This article has not yet received a rating on the project's importance scale.
WikiProject iconComputer science Start‑class Mid‑importance
WikiProject iconThis article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
StartThis article has been rated as Start-class on Wikipedia's content assessment scale.
MidThis article has been rated as Mid-importance on the project's importance scale.
Things you can help WikiProject Computer science with:

The page gives no reason why this practise is used. --Anonymous

I replaced this, and left out the specific programs mentioned because I could not find any mention of trouble porting those specific programs: "A common problem in computer programming is called word alignement. One way to increase performance, especially in RISC processors which are designed to maximize raw performance is to require data to loaded or stored on a word boundary. So though memory is commonly addressed by 8 bit bytes, loading a 32 bit integer or 64 bit floating point number would be required to be start at every 64 bits on a 64 bit machine. The processor could flag a fault if it were asked to load a number which was not on such a boundary, or call a routine which would effectively figure out which word or words contained the data and extract the equivalent value. This caused difficulty when the team from Mosaic Software ported their Twin Spreadsheet to the 68000 based Atari ST. The Intel 8086 architecture had no such restrictions. It would also cause difficulties in porting Microsoft Office to Windows NT on MIPS and PowerPC for NEC and IBM. Since the software was not written with such restrictions in mind, designers had to set a bit in the O/S to enable non-aligned data. However since this bit was masked with other flags, it was impossible to keep the O/S from faulting on non-aligned data when other modules used the other flags. This may have been a major factor in abandoning Windows NT on non-Intel processors as they failed as platforms for hosting common Windows applications and one more reason for the baffling dominance of the x86 architecture over technologically elegant rivals. "

I hope to have provided sufficient background, but I think the article still needs some work. (but I need to sleep)

Aij 07:03, 29 May 2006 (UTC)[reply]

Packed array

What is a packed array? It redirects here, but I could not find it. 85.145.103.163 (talk) 15:52, 30 January 2008 (UTC)[reply]

In the section Data_Structure_Padding, the article talks about that. 0x6adb015 (talk) 14:07, 3 March 2008 (UTC)[reply]


Too verbose/jargon filled

I'm a 4th year Comp Sci student sitting a PhD next year...and don't think the description here is clear enough. It really doesn't explain it in any useful way. I think a diagram, an analogy, or something similar would help. In its current state it's no use to anyone who isn't a computer expert-I come pretty close, and I can't follow it. It needs to be much, much, simpler at first, instead of jumping right into details. —Preceding unsigned comment added by 130.209.241.193 (talk) 11:53, 27 April 2008 (UTC)[reply]

I agree, one thing which struck is why is the word datum used instead of data? I've seen the word used before but needed to look it up, only to see that it's apparantly just a synonym to data, which is significantly more well understod. —Preceding unsigned comment added by 195.84.66.198 (talk) 13:31, 7 June 2008 (UTC)[reply]

I rewrote the introduction, added a short example and removed the {{context}} tag. Virtlinktc 22:08, 28 July 2008 (UTC)[reply]

...due to how the CPU handles memory

I've read the article and, well I agree with the opinions above, but, in the introduction there's a part that says '...due to how the CPU handles memory'. Well if you Google that you're going to come up with thousands of hits... how 'bout a link or something that could explain the part of "how the CPU handles memory" that it's really important on the "Structure padding"? thanks. (I would do it myself but my knowledge doesn't extent that long :S) —Preceding unsigned comment added by 190.201.223.72 (talk) 01:31, 20 January 2009 (UTC)[reply]

Default packing and #pragma pack

The cited source for this section doesn't support the information presented. The linked MSDN article just talks about "default alignment" derived from member type, e.g. 4 bytes for an int. This is very different from /Zp packsize which basically wraps whole source file in #pragma pack(push, N) ... #pragma pack(pop). —Preceding unsigned comment added by 195.146.153.234 (talk) 16:14, 4 October 2010 (UTC)[reply]

least common multiple requires at least 2 args

The page contains the sentence:

 It is important to note that the last member is padded with the
 number of bytes required that the total size of the structure should
 be a least common multiple of the size of a largest structure
 member.

However, from

 [1]

The least common multiple operator (lcm) requires 2 arguments; hence, the cited sentence is confusing. It's also probably wrong because it makes no mention of the alignment of the largest structure member, it only mentions the size of the largest structure member. Hence, if the largest structure member is:

 char[1000] carray;

and the complete structure were:

 struct A{ char[1000] carray, char last};

then this sentence would require the A::last to have 999 padding characters after it. That's wrong, as shown by the program:

 struct A
 { char carray[1000];
   char last;
 };
 
 #include <iostream>
 
 int main(void)
 {
     std::cout<<"sizeof(A)="<<sizeof(A)<<"\n";
     return 0;
 }    

whose output is:

 sizeof(A)=1001

— Preceding unsigned comment added by Cppljevans (talkcontribs) 13:27, 16 February 2011 (UTC)[reply]

calculation of whole structure alignment and paddings

Given the following pseudo-c-code data structure:

 struct TheStructure
 {
     Type[1] member[1];
     Type[2] member[2];
     ...
     Type[N] member[N];
 };


and the following definitions:

 size[i] = offset[i]+sizeof(Type[i])
   /*size of TheStructure truncated to just i members*/
   for i=1 to N;
   
 align[i] = alignof(Type[i])/*alignment of member[i]*/
   for i=1 to N;
 
 remainder[i] = size[i] % align[i+1]
   for i=1 to N;
 
 padding[i] = (remainder[i]==0)? 0 : align[i+1] - remainder[i]
   /*padding required after member[i] to achieve proper
     alignment of member[i+1]
    */
   for i=1 to N;
   
 offset[1] = 0
   /*offset from start of TheStructure to start of member[0]*/
   ;
   
 offset[i+1] = size[i]+padding[i]
   /*offset from start of TheStructure to start of member[i+1]
    */
   for i=1 to N-1;
                   

where:

 alignof(SomeType) = the alignment of type, SomeType
 
 sizeof(SomeType) = the size of type, SomeType
 

then the problem is to find:

   padding[i], for i=1 to N //padding following member[i]
   align[0] //alignment for TheStructure
 
 constrained_by:
 
   (m*align[0]+offset[i]) % align[i] == 0 //member[i] is properly aligned
     , for i=1 to N
     , for any unsigned m>=0

For the particular case where i=1, the constrained_by is:

 (m*align[0]+0)%align[1] == 0
     , for any unsigned m>=0
     

Any solution has the form:

 align[0] == j*align[1]
   , for any unsigned j>=1.

IOW, the set of solutions is the set:

 soln[1] = {j*align[1]|unsigned j>= 1}
   

For the particular case where i=2, the constrained_by is:

   (m*align[0]+offset[2]) % align[2] == 0
     , for any unsigned m>=0
     

After substituting in the definition of offset[2], this becomes:

   (m*align[0]+size[1]+padding[1]) % align[2] == 0
     , for any unsigned m>=0
     

After substituting in the definition of padding[1], this becomes:

   ( m*align[0]
   + size[1]
   + (remainder[1]==0)? 0 : align[2] - remainder[1]
   ) % align[2] == 0
     , for any unsigned m>=0
   

After substituting in the definition of remainder[1], this becomes:

   ( m*align[0]
   + size[1]
   + (size[1]%align[2])==0)? 0 : align[i+1] - (size[1]%align[2])
   ) % align[2] == 0
     , for any unsigned m>=0
     

There are two cases in the above:

 case_1) size[1]%align[2]==0
   In this case, padding[1]==0 and, from the definition of offset:
     offset[2] = size[1].
   so, the constrained by becomes:
     ( m*align[0]
     + size[1]
     ) % align[2] == 0
       , for any unsigned m>=0
   but because size[1]%align[2] == 0:
     size[1]==k*align[2]
       , for some k>=0
   substituting this definition of size[1] into the above
   constrained_by gives:
     ( m*align[0]
     + k*align[2]
     ) % align[2] == 0
       , for any unsigned m>=0
       
  and any solution for align[0] takes the form:
   
     align[0] = j*align[2]
       , for any unsigned j>=1
     
  because then the contrained_by gives:
   
     ( m*j*align[2]
     + k*align[2]
     ) % align[2] == 0
       , for any unsigned m>=0
       
   or:
   
     ( (m*j+k)*align[2]
     ) % align[2] == 0
       , for any unsigned m>=0
       
   which is obviously true.
       
 case_2) size[1]%align[2]!=0
   In this case:
     padding[1]==align[2]-remainder[i]
               ==align[2]-size[1]%align[2]
     offset[2] = size[1]+align[2]-size[1]%align[2]

Now, since % is the c modulus function, and the modulus function is described here:

 modulus description

with the assertion:

  r = a - n * q

where r = a%n, then size[1]%align[2] can be replaced with size[1]-align[2]*q to give:

     offset[2] = size[1]+align[2]-(size[1]-align[2]*q)
     

which simplifies to:

     offset[2] = align[2]*(1+q)
     

and substituting into the constrained_by:

   (m*align[0]+align[2]*(1+q)) % align[2] == 0
     , for any unsigned m>=0

where, again, any solution has the form:

 align[0] == j*align[2]
   , for any unsigned j>=1.
   

IOW, the set of solutions is:

 soln[2] = {j*align[2]|unsigned j>= 1}
   

A analysis similar to the above for i==2 can be done for i=3 to N, resulting in an array of partial solutions:

 soln[1], soln[2], ... soln[N]

To satisfy all of soln[1]...soln[N], align[0] has to be in the intersection of soln[1]...soln[N]. IOW, align[0] has to be a multiple of all of align[1]...align[N]. The least solution satisfying this constraint is the least common multiple of align[1],align[2],...,align[N].

Cppljevans (talk) 13:05, 17 February 2011 (UTC)[reply]