Tuesday, August 21, 2007

Probability, Memory, Satellite Imagery, and A Gentlemen's Duel

You guys have sent me some interesting e-mail on recent topics, so here is a more detailed exploration of several posts.

The "oldest/tallest woman in the same county" post generated quite a few e-mails about how to calculate the probability of such an event occurring. Here's how I calculated the odds of 90 billion to 1:
1. The world's population is estimated at 6.6 billion, so I took half of that (3.3 billion women) and divided it by half the population of the county (44,000 people, so I used 22,000). That put the odds of any one particular woman living in that county at 150,000 to 1.

2. So if it's 150,000 to 1 that the oldest woman in the world lives in that county, it should be the same odds that the tallest woman lives there. Treating the two as independent, the odds that both of them live in that county should just be 150,000*150,000, which is 22.5 billion to 1.
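If you want to check the arithmetic yourself, here's the calculation from steps 1 and 2 as a few lines of C. It uses the same simplifying assumptions as above (women are half of each population, and the two events are independent); the function and macro names are just for illustration.

```c
/* Figures from the post. */
#define WORLD_POPULATION  6600000000LL
#define COUNTY_POPULATION 44000LL

/* Roughly "N to 1" odds (strictly, a 1-in-N chance) that one particular
   woman lives in the county, assuming women are half of both populations. */
static long long odds_one_woman(long long world, long long county)
{
    long long women_worldwide = world / 2;  /* 3.3 billion */
    long long women_in_county = county / 2; /* 22,000 */
    return women_worldwide / women_in_county;
}

/* Odds that two different women both live there: multiply the
   individual odds, which is only valid if the events are independent. */
static long long odds_both(long long single_odds)
{
    return single_odds * single_odds;
}
```

odds_both(odds_one_woman(WORLD_POPULATION, COUNTY_POPULATION)) comes out to 22,500,000,000, i.e. 22.5 billion to 1. Halving both populations cancels out, so dividing 6.6 billion by 44,000 gives the same 150,000.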

Which is NOWHERE NEAR the 90 billion to 1 I posted, so WTF was I doing? I have no idea. But using the method I thought I was using, and using it properly, produces 22.5 billion to 1.

Ron Watkins was the one who e-mailed me and carefully went through his method, which I realized was the exact method I used, except that he can count and do advanced things like that, and apparently I can't.

I mentioned memory addressing and 32-bit operating systems last week (here at the bottom, then here), and that generated some interesting e-mail. First, from Chris Nahr:
As a fan of both your blog and Loyd Case's articles I still have to point out that Loyd's corrections on this subject are partly wrong...

First, a typo: I/O addresses are assigned to the lower 1 _GB_ of memory space, not 1 _MB_. Actually, the I/O region may be more or less than 1 GB, because all of your graphics card's on-board RAM is mapped into the 4 GB of total address space and takes out a correspondingly big chunk.

(I also thought it was the upper 1 GB, not the lower one, but I'm not sure on that point...)

Loyd also claims: "Some newer motherboards allow I/O remapping, so even a 32-bit OS can get a full 4GB." This is incorrect.

What these motherboards do is not remap I/O -- that's impossible, as memory-mapped I/O requires devices to reside at fixed addresses within the 4 GB address space of the 32-bit CPU. Rather, these motherboards remap RAM around the "hole" caused by I/O.

That causes part of the RAM that was previously obscured by I/O addresses to appear above the 4 GB limit. And that, by definition, still leaves this RAM inaccessible to 32-bit operating systems -- at least without those address extension tricks that Loyd mentioned earlier. The remapped RAM is now accessible to 64-bit operating systems, however, which is the point of this feature.
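To make the "hole" concrete, here's a simplified model in C of how much RAM an OS can actually use. It assumes the I/O hole sits directly below the 4 GB line and that RAM is otherwise mapped linearly from address 0; real chipsets differ, the 1 GB hole in the example below is a hypothetical size rather than a measured one, and the function and parameter names are made up for illustration.

```c
#include <stdint.h>

#define GIB      (1ULL << 30)
#define FOUR_GIB (4ULL * GIB) /* top of the 32-bit physical address space */

/* Simplified model (an assumption): the I/O hole ends at 4 GiB, RAM is mapped
   linearly from 0, and io_hole <= 4 GiB. "os_addresses_above_4g" stands for a
   64-bit OS (or a 32-bit one using address extensions). */
static uint64_t usable_ram(uint64_t installed, uint64_t io_hole,
                           int remap_enabled, int os_addresses_above_4g)
{
    uint64_t hole_start = FOUR_GIB - io_hole;
    /* RAM that sits below the hole is always visible. */
    uint64_t below_hole = installed < hole_start ? installed : hole_start;
    /* RAM whose addresses would fall inside the hole is hidden by devices. */
    uint64_t up_to_4g = installed < FOUR_GIB ? installed : FOUR_GIB;
    uint64_t shadowed = up_to_4g - below_hole;
    /* RAM beyond 4 GiB already lives above the 32-bit limit. */
    uint64_t above_4g = installed > FOUR_GIB ? installed - FOUR_GIB : 0;
    /* Remapping moves the shadowed RAM above 4 GiB instead of losing it. */
    uint64_t high = above_4g + (remap_enabled ? shadowed : 0);
    return below_hole + (os_addresses_above_4g ? high : 0);
}
```

With 4 GB installed and a 1 GB hole, this gives 3 GB for a 32-bit OS whether or not remapping is on, and the full 4 GB only when remapping is on and the OS can address memory above 4 GB, which is Chris's point.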

Head exploding? Well, you're not out of the woods yet. From Skip Key:
FWIW, what you wrote is close to true, but not exactly true. By default, the 4G of address space is split into 2 equal parts, with the OS getting the upper 2G of the address space and the application getting the lower 2G. But you can change this so that the split is 3G for the app and 1G for the OS. In order to do this you have to do 2 things:
2. Mark a bit in the executable header that says it's ok to use the extra gig.
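For the application half of the opt-in, the bit in question is the IMAGE_FILE_LARGE_ADDRESS_AWARE flag (0x0020) in the Characteristics field of the PE file header; the MSVC linker sets it with /LARGEADDRESSAWARE. Here's a minimal sketch in C that checks for it in an executable image already read into memory. The function names are hypothetical, it does only basic bounds checking, and it says nothing about the OS-side setting.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* IMAGE_FILE_LARGE_ADDRESS_AWARE from the PE/COFF format. */
#define LARGE_ADDRESS_AWARE 0x0020u

static uint16_t read_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the flag is set, 0 if it's clear, -1 if this isn't a PE image. */
static int is_large_address_aware(const uint8_t *image, size_t len)
{
    if (len < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;                                 /* no DOS "MZ" header */
    uint32_t pe = read_le32(image + 0x3C);         /* e_lfanew: offset of "PE\0\0" */
    if (pe > len || len - pe < 24)                 /* signature + 20-byte COFF header */
        return -1;
    if (memcmp(image + pe, "PE\0\0", 4) != 0)
        return -1;
    /* Characteristics is the last field of the COFF header: 4 + 18 bytes in. */
    uint16_t characteristics = read_le16(image + pe + 22);
    return (characteristics & LARGE_ADDRESS_AWARE) ? 1 : 0;
}
```

Read an .exe into a buffer and pass it in: 1 means the app has opted in, 0 means it hasn't, and -1 means the buffer doesn't look like a PE image.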

The reason that it takes both the OS and the app to OK getting the extra gig is that, by and large, programs break otherwise if they haven't specifically been tested in this scenario. The reason is that there's a whole ton of code out there that assumes that a pointer is equivalent to an integer. And since they're the same size, that mostly works. But consider code like this:
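// Broken on purpose: the pointer is tested as a signed int, so any
// allocation at or above 2G looks negative and is treated as a failure.
if ((int)(pMem = HeapAlloc(GetProcessHeap(), dFlags, dSize)) > 0)
{
    // Do something interesting
}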

Code like this exists in thousands of applications, and as soon as you open up to more than 2G it breaks, because addresses above 2G become negative numbers when they're converted to signed integers. You also get code that breaks when the memory it has allocated straddles the 2G boundary. What's frustrating about this is that the program will appear to work fine about 99.99% of the time, and will only fail out in the field, usually cryptically. Those of us that started coding on 8/16 bit processors in the dark ages didn't usually suffer from this, because on a 16 bit processor pointer values greater than 32k are negative, and those were common even on machines with only a meg of memory. So we learned not to do that. But kids today that have never had to program a segmented architecture haven't had to learn those lessons.
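Here's a small C sketch of both failure modes Skip describes, modeling 32-bit addresses as uint32_t values so it behaves the same on any machine. The helper names are hypothetical. It shows that a signed view of an address at or above 2G is negative, and that a block straddling the 2G line makes a signed range check fail while the unsigned one works.

```c
#include <stdint.h>

/* Treat a 32-bit address as a signed 32-bit int, the way the buggy code does.
   Done with explicit arithmetic to avoid implementation-defined conversion. */
static int32_t as_signed32(uint32_t addr)
{
    if (addr <= (uint32_t)INT32_MAX)
        return (int32_t)addr;
    return (int32_t)(addr - 0x80000000u) + INT32_MIN;
}

/* The broken success test: rejects every address at or above 2G. */
static int buggy_alloc_ok(uint32_t addr) { return as_signed32(addr) > 0; }

/* The right test: an allocator like HeapAlloc reports failure with NULL. */
static int correct_alloc_ok(uint32_t addr) { return addr != 0; }

/* Signed range check: breaks when [start, start + size) crosses 2G,
   because the end address turns negative. */
static int buggy_in_block(uint32_t start, uint32_t size, uint32_t p)
{
    return as_signed32(start) <= as_signed32(p) &&
           as_signed32(p) < as_signed32(start + size);
}

/* Unsigned range check: one subtraction handles the 2G boundary fine. */
static int correct_in_block(uint32_t start, uint32_t size, uint32_t p)
{
    return p - start < size;
}
```

For example, buggy_alloc_ok(0x80010000) returns 0 for a perfectly good address while correct_alloc_ok returns 1, and for a block at 0x7FFFF000 of size 0x2000, buggy_in_block claims 0x7FFFF800 is outside it.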

Oh, and as a historical note, the 2g/2g split wasn't due to Microsoft making a decision like they did in deciding that 640k was enough memory for DOS apps. It came from Windows NT's heritage as a portable OS. The MIPS processor only allowed system code to run in the upper 2G of address space. So they used the same model for all processors for portability reasons.

I think my brain just exploded.