Tuesday, August 10, 2010

Information statics

Information statics is the study of information in storage. At its center lies the most basic numerical scale: data size. This scale is vast, pragmatically logarithmic, and ranges from bits to yottabytes (2^80 bytes). Though the approximately 7 items held in human short-term memory can consist of relatively long strings, they still amount to only a few tens of bytes. Human DNA contains 3 billion base pairs; since each base pair takes one of four values, it encodes two bits, for about 6 billion bits (roughly 750 megabytes) of memory. The human brain consists of approximately 100 billion neurons, which would give a capacity of roughly 10 gigabytes if we assume one bit of storage per neuron. Of course, storing a single bit may require multiple neurons, much as storing a single bit of static RAM requires multiple transistors, or a single neuron may be able to store many bits of information.
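
These estimates are easy to check with a short Python sketch. The 10 bytes per short-term-memory item is an illustrative assumption; the other constants are the rough figures above.

def human_units(n_bytes):
    # Format a byte count with binary prefixes (1 KiB = 2**10 bytes).
    for unit in ("bytes", "KiB", "MiB", "GiB", "TiB"):
        if n_bytes < 1024:
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1024.0
    return f"{n_bytes:.1f} PiB"

# Short-term memory: ~7 items at an assumed ~10 characters (bytes) each.
print("short-term memory:", human_units(7 * 10))      # 70.0 bytes

# DNA: 3e9 base pairs, 4 possible bases, so 2 bits per pair.
print("human DNA:", human_units(3e9 * 2 / 8))         # ~715 MiB

# Brain: 1e11 neurons at the assumed 1 bit per neuron.
print("brain:", human_units(1e11 / 8))                # ~11.6 GiB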

The area of statics itself is organized along the following conceptual scale. At the bottom is the physical encoding of bits, as pits in optical media or nanoscopic regions of magnetic field in magnetic media; at the top is the design of information architectures that humans can use to organize and search for the information they need. The database scale, in particular, has substantial implications for the privacy and security of information.

Information architecture
Databases
Data structures
Data encoding
Physical encoding

The computer memory hierarchy is a crucial scale in the design of computing hardware and software. Its levels range from directly accessible solid-state memory holding hundreds of bytes to petabytes of off-line tape or optical storage; rough figures for each level are sketched after the list below.

Off-line storage (tape, optical)
On-line storage (hard drives, flash drives, network area storage)
Main memory (DRAM)
Cache (three levels are typical today)
Directly accessible (processor registers)
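
To put rough numbers on the levels above, the Python sketch below lists order-of-magnitude capacities and access latencies. These are illustrative circa-2010 figures, not specifications for any particular hardware.

# Rough capacities and access latencies for the hierarchy above.
hierarchy = [
    # (level,              capacity in bytes, latency in seconds)
    ("registers",          512,               0.3e-9),  # sub-nanosecond
    ("L1 cache",           32 * 2**10,        1e-9),
    ("L2/L3 cache",        8 * 2**20,         10e-9),
    ("main memory (DRAM)", 8 * 2**30,         100e-9),
    ("hard drive",         2**40,             5e-3),
    ("off-line tape",      2**50,             60.0),    # minutes to mount
]

for level, capacity, latency in hierarchy:
    print(f"{level:20s} ~2**{capacity.bit_length() - 1:2d} bytes, "
          f"{latency:.0e} s per access")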

Points near the top of the memory hierarchy scale accommodate larger amounts of data but require longer access times, ranging from milliseconds for hard drives to minutes or hours for off-line tape storage. At the other end of the hierarchy, where access has virtually no delay, there is room for only hundreds of bytes in the processor's registers and a few kilobytes in the level 1 cache. Since a modern processor executes multiple instructions per nanosecond, a delay of a few milliseconds to reach a hard drive forces a computation to wait the equivalent of millions of instructions for its data. A delay of minutes means waiting the equivalent of hundreds of billions of instructions before the data needed to complete the computation is available.
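
A quick calculation reproduces these figures. The rate of 4 instructions per nanosecond is an assumed round number.

# Instructions the processor could have executed while one access is
# pending. 4e9 per second is "multiple instructions per nanosecond".
INSTRUCTIONS_PER_SECOND = 4e9

for device, latency_s in (("hard drive", 5e-3), ("off-line tape", 60.0)):
    stalled = latency_s * INSTRUCTIONS_PER_SECOND
    print(f"{device}: ~{stalled:.0e} instructions per access")

# hard drive:    ~2e+07  (tens of millions)
# off-line tape: ~2e+11  (hundreds of billions per minute of delay)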

The speed of light is a fundamental limit on the speed of memory access, preventing an unlimited increase in the number of registers or the size of the level 1 cache. This is a major reason for the development of level 2 and level 3 caches, which are much larger than level 1 caches but sit further from the processor and are slower to access.
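
A back-of-the-envelope calculation makes the limit concrete. The 3 GHz clock rate is an assumed figure, and real signals propagate slower than light in vacuum, so the true bound is even tighter.

# A signal must make a round trip to memory and back within an access
# time; at one cycle per access this caps how far away memory can be.
C = 3e8          # speed of light in m/s (upper bound on signal speed)
CLOCK_HZ = 3e9   # assumed processor clock rate

cycle = 1.0 / CLOCK_HZ
one_way = C * cycle / 2   # round trip must fit inside one cycle
print(f"cycle time {cycle * 1e9:.2f} ns, "
      f"memory must sit within {one_way * 100:.0f} cm of the core")
# cycle time 0.33 ns, memory must sit within 5 cm of the core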

While programs that use small amounts of data can typically ignore the memory hierarchy, large-scale systems such as the Google search database, Facebook, and Flickr, which use petabytes of storage, require intense attention to memory hierarchy scales in order to return results to users quickly.
