Tuesday, August 17, 2010

Information dynamics

Information dynamics is the study of the transformation of information through the process of computation. Information processing happens along a very rich scale:

Social
Desktop
Applications
Frameworks
Libraries
System Calls
Kernel
Drivers
Assembly
Macro-ops
Micro-ops
Gates
Elementary particles

There are a dozen conceptual levels of abstraction between the user at the keyboard and the physical process of computation proceeding inside a modern computer. This set of scales originated through a gradual process of invention, from the earliest electronic computers, which were programmed by manually manipulating switches and wires and used by a single person at a time, to modern personal computers, which do not require users to write programs and which interact with millions of people through a global network.

This expansion has been enabled by the exponential growth in computational capacity described by Moore's Law, because the addition of each scale of abstraction has a cost in terms of performance. Our computations would proceed more rapidly if we represented them as an optimized set of micro-operations that the microprocessor could directly process without any intermediaries. However, it would take us tremendously more time and effort to represent our computations in such a manner. Instead we perform our computations through applications with usable human interfaces that are built on lower scales of software that can be re-used for a wide variety of tasks.

The social scale is concerned with the design and specification of such user interfaces, the field of human computer interaction, while the desktop scale addresses the combination of applications, such as the window manager, file browser, and a variety of hardware management tools, that interact to form the user interface. Applications, such as web browsers, mail clients, and instant messengers, run within the desktop environment at the application scale, dependent on the frameworks below them.

The framework scale contains both desktop frameworks such as Gnome and web frameworks such as Ruby on Rails. Software frameworks promote a standard structure for applications and manage the flow of control between the parts of an application. Frameworks are often built on top of a wide variety of libraries at the library scale. Libraries are collections of subroutines that perform a common task, such as data compression or image manipulation, required by many applications.

Libraries request service from the operating system kernel through system calls. System calls provide the interface between the application and the operating system. They are needed for any task requiring access to the hardware or communication with other processes or machines. The kernel scale is responsible for managing system resources, including the processor, memory, and input/output resources such as disk, graphics, and network, and providing a layer of abstraction between applications and the hardware that actually performs computations.
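
As a small illustration (a sketch, assuming a Unix-like system with an /etc/hostname file), the Python fragment below reads a file first through the ordinary library interface and then through the thin wrappers that Python's os module provides around the kernel's open, read, and close system calls:

    import os

    # High-level library call: the file object buffers data and handles
    # text decoding, but underneath it still relies on system calls.
    with open("/etc/hostname") as f:
        text = f.read()

    # Roughly the same work expressed at the system-call boundary:
    # os.open, os.read, and os.close are thin wrappers around the
    # kernel's open, read, and close system calls.
    fd = os.open("/etc/hostname", os.O_RDONLY)
    data = os.read(fd, 4096)
    os.close(fd)
    print(text.strip(), data.strip())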

Kernels are written in assembly language or in a combination of assembly language and another low-level programming language such as C. Assembly languages provide a symbolic representation of the machine code that is used by a particular microprocessor.

Microprocessors are typically programmed in a machine code that is backwards compatible with many older generations of microprocessors. To enable speed improvements while retaining backwards compatibility, the microprocessor performs arithmetic and logical operations on micro-ops, a lower level machine code that is generated by translation hardware on the microprocessor from macro-ops, the machine code that assembly language is translated into. This translation uses programmable memory on the processor, allowing the processor's machine language to be modified by the kernel to fix bugs or even add features.
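
A toy sketch can make this translation concrete. The macro-ops and micro-ops below belong to an invented instruction set; real decoders and micro-op encodings are proprietary and far more intricate:

    # Hypothetical macro-op to micro-op translation, for illustration only.
    def translate(macro_op):
        """Expand one macro-op into a list of simpler micro-ops."""
        op, dest, src = macro_op
        if op == "ADD_MEM":                 # add a value in memory to a register
            return [("LOAD", "tmp", src),   # micro-op: fetch the operand from memory
                    ("ADD", dest, "tmp")]   # micro-op: register-to-register add
        if op == "ADD_REG":                 # register operands need no expansion
            return [("ADD", dest, src)]
        raise ValueError("unknown macro-op: %r" % (op,))

    print(translate(("ADD_MEM", "r1", "0x1000")))
    # [('LOAD', 'tmp', '0x1000'), ('ADD', 'r1', 'tmp')]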

The computation described by a micro-op, such as the addition of two numbers, is performed by a set of logic gates, combining sets of binary numbers and producing a binary number as output. At the bottom of the scale, ones are represented by a high voltage level and zeros are represented by a low voltage level. At this level, computers are analog devices, having to deal with noise and timing issues in determining whether a particular voltage is a low or high level representing a zero or a one respectively.
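
To make the gate scale concrete, here is a small Python sketch of a ripple-carry adder built from nothing but AND, OR, and XOR, the kind of circuit a micro-op for addition ultimately resolves into (the bit lists are just a convenient stand-in for wires):

    # Addition built from logic gates: a one-bit full adder using only
    # AND, OR, and XOR, chained into a ripple-carry adder.
    def full_adder(a, b, carry_in):
        s = (a ^ b) ^ carry_in                       # sum bit
        carry_out = (a & b) | (carry_in & (a ^ b))   # carry bit
        return s, carry_out

    def add_bits(x_bits, y_bits):
        """Add two equal-length bit lists, least significant bit first."""
        result, carry = [], 0
        for a, b in zip(x_bits, y_bits):
            s, carry = full_adder(a, b, carry)
            result.append(s)
        return result + [carry]

    # 6 + 3 = 9: bits are listed least significant first.
    print(add_bits([0, 1, 1, 0], [1, 1, 0, 0]))  # [1, 0, 0, 1, 0] -> 9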

The survey of information dynamics above has led us through the zone of discrete artificial computation. As such, its foundation rests on the notion of a universal computer, arguably the simplest model of which is the Turing machine. Universality appears at many scales in information dynamics. The language of the microprocessor is every bit as Turing universal as, many layers up, a scripting language for a 3D animation program would be. Indeed both have access to similar scale-invariant features that have evolved to suit the needs of humans: decision statements, loop statements, behavior encapsulation in terms of procedures, and so on, precisely the features that make them universal.
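
The Turing machine is simple enough to sketch in a few lines of Python. The simulator below is generic; the sample rule table merely flips every bit on the tape and halts, a deliberately trivial machine chosen to keep the example short:

    # A minimal Turing machine simulator.
    def run(tape, rules, state="start", pos=0):
        tape = dict(enumerate(tape))                 # sparse tape
        while state != "halt":
            symbol = tape.get(pos, "_")              # "_" is the blank symbol
            state, write, move = rules[(state, symbol)]
            tape[pos] = write
            pos += 1 if move == "R" else -1
        return [tape[i] for i in sorted(tape) if tape[i] != "_"]

    flip_rules = {
        ("start", "0"): ("start", "1", "R"),
        ("start", "1"): ("start", "0", "R"),
        ("start", "_"): ("halt", "_", "R"),
    }
    print(run(list("10110"), flip_rules))  # ['0', '1', '0', '0', '1']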

Still, as we go up the scale we make use of more powerful abstractions. Abstractions in computer science are more dynamic and ad hoc than in mathematics (Colburn and Shute 2007), and are more subtle than the simple collapsing of distinctions into broader equivalence classes. Indeed each lower layer presents a public picture of itself to the higher layers, often through an Application Program Interface (API). These can be very elegant object-oriented interfaces, or very “leaky”, allowing layer-violating access to lower levels.

There is heterogeneity of design here, and human programmer communities struggle to build complex systems and come up with a variety of solution schemes. What software engineers are trying to manage here is a scale of complexity, gauged for example by the integral numerical scale of lines of code (LOC). From scripts a few tens of lines long, to operating systems a few tens of millions of lines long, this is a great height one must scale. (It is interesting that English can use the verb to scale to denote the act of traversing a scale, as well as to establish a scale.) The deep idea here is that to handle the vast scale of LOC, software engineers have turned to another type of scale, a scale of increasing abstraction. This subverts the LOC gauge by rescaling it: a ten-line script in Python for a game may translate into hundreds of thousands of lines of code that execute when it runs. Programmers have often seen the humor in this (Munroe 2009).
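
The LOC gauge itself takes only a few lines to implement. Here is a rough sketch in Python, counting non-blank lines of Python source under a directory; a serious counter would also account for comments and generated code:

    import os

    def count_loc(root):
        """Count non-blank lines in the .py files under a directory."""
        total = 0
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(".py"):
                    path = os.path.join(dirpath, name)
                    with open(path, errors="ignore") as f:
                        total += sum(1 for line in f if line.strip())
        return total

    print(count_loc("."))   # LOC of whatever source tree it is run in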

The field of artificial computation is of course pluralistic. Beyond the traditions sketched above, there is also the tradition coming out of the lambda calculus, and the tradition coming out of nondeterministic and stochastic programming. The Church programming language (Goodman et al. 2008) is a new example of work that combines these traditions.

Underlying artificial computation is the idea of a text that is both dead (data) and alive (executable). The power and limits of Turing computation arise from the consequences of treating code as text and conversely. The performative nature of code (Austin 1975) gives it its power as it acts on the world. (The simple proof of the unsolvability of the halting problem is only a few lines long, a meditation on the notion of executable text.)
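
That meditation can even be written out in Python. The halts function below is only a placeholder, since the whole point of the argument is that no such total decider can exist:

    def halts(f, x):
        """Pretend decider for 'does f(x) eventually halt?' (cannot exist)."""
        raise NotImplementedError("no such total decider can be written")

    def paradox(f):
        if halts(f, f):        # if f(f) would halt...
            while True:        # ...then loop forever
                pass
        return "done"          # ...otherwise halt immediately

    # Does paradox(paradox) halt? If halts(paradox, paradox) answers yes,
    # then paradox loops forever; if it answers no, then paradox halts.
    # Either answer contradicts the decider, so halts cannot exist.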

It is the notion of text that gives us the LOC scale; it is the notion of execution that gives us the time scales used in asymptotic complexity analysis. These are complementary. Across program text we have the notion of scope, as in the range of text over which a variable has a given meaning. This is a special case of locality of reference (Denning 2005) along a scale. There are many scales of abstract time complexity, the coarsest of which is (P, NP, non-NP).
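
Scope as a textual range can be seen in a few lines of Python; each use of the name x below resolves to the binding in the nearest enclosing block of text:

    x = "module"            # visible throughout this file

    def outer():
        x = "outer"         # shadows the module-level x within outer()
        def inner():
            return x        # lexical scoping: resolves to the x in outer()
        return inner()

    print(outer())          # outer
    print(x)                # module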

Natural computation also has its own set of scales, the most obvious being the spatiotemporal scale. Quantum computing and biological computing are two areas of interest here, and insights there have the promise of shaking up both computer science and information theory (Kirby 1998, 2002). For example, the so-called quantum-to-classical transition, and the very qubit/bit distinction, are scale issues. The emergence of self-replicating macromolecules, and then the emergence of information processing out of uninterpreted dynamics, occur across a complexity scale.

The pervasiveness of scale in computing, taken together with the pervasiveness of computing itself, actually pulls us into a world picture that is quite different from the mechanical world view (Dodig-Crnkovic and Müller 2010). It is this we will explore in the next section.

Sunday, August 15, 2010

Information kinematics

Information kinematics is the study of information in transit from one location to another. The Internet at all levels, from social networks to the transmission of electromagnetic waves through wires or the air, is the domain of kinematics. Information kinematics covers communication phenomena over a wide variety of conceptual scales, including:

Social
Content
Application
Presentation
Session
Transport
Network
Data Link
Physical

These conceptual scales are often called layers in the field of data communication. The third through ninth scales are the layers of the OSI (Open Systems Interconnection) model, which is the most widely used conceptual model for discussing data communications. Each layer depends on the layer directly below it and supports the layer above it.

Content is transmitted through application layer protocols, such as Hypertext Transfer Protocol (HTTP), which is used by web browsers and servers to communicate, or Simple Mail Transfer Protocol (SMTP), which is used for e-mail. The presentation scale deals with the representation of content transmitted through the application scale, including issues such as byte order and character sets for different languages.
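
As a small illustration of the application scale, the Python fragment below issues an HTTP request using the standard library (example.com stands in for any web server):

    import http.client

    # HTTP is the application-layer protocol; everything below it is
    # handled by the lower scales.
    conn = http.client.HTTPConnection("example.com", 80)
    conn.request("GET", "/")
    response = conn.getresponse()
    print(response.status, response.reason)   # e.g. 200 OK
    print(response.read()[:80])               # first bytes of the page
    conn.close()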

The session scale manages user sessions with applications, including maintaining a consistent user identity across connections handled by a farm of servers and synchronizing the audio and video streams used for network video transmission.

Transport scale protocols create long term dialogues between devices, breaking up application data into units called segments for transmission across the network. These protocols ensure that data segments are reliably transmitted across the network. Transmission Control Protocol (TCP) is the best known transport scale protocol and is one of the two core protocols of the Internet.
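
The same exchange can be written one scale down, opening the TCP connection directly through the socket interface and spelling out the HTTP request by hand; TCP takes care of breaking these bytes into segments and delivering them reliably (again a sketch, with example.com as a placeholder host):

    import socket

    # Transport scale: a TCP connection carrying hand-written HTTP text.
    with socket.create_connection(("example.com", 80)) as sock:
        sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        reply = b""
        while True:
            chunk = sock.recv(4096)    # data arrives as a reliable byte stream
            if not chunk:
                break
            reply += chunk
    print(reply.split(b"\r\n")[0])     # status line, e.g. b'HTTP/1.1 200 OK'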

Network scale protocols address end-to-end delivery of data packets, including routing through intermediate network nodes, while data link protocols address communication between adjacent nodes. The Internet Protocol (IP) is the best known network scale protocol and the second of the two core protocols of the Internet. This protocol also defines the network addresses (IP addresses) that identify each network node.
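
The network-scale view of addressing can be sketched with Python's ipaddress module; 192.0.2.0/24 is a documentation-only range used here as a placeholder:

    import ipaddress

    net = ipaddress.ip_network("192.0.2.0/24")   # a network prefix
    host = ipaddress.ip_address("192.0.2.17")    # one address within it
    print(host in net)        # True: routing decisions hinge on such prefixes
    print(net.num_addresses)  # 256 addresses in this /24 network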

Data link protocols deal with the encoding of data packets into different electronic media protocols, including Ethernet and IEEE 802.11 wireless.

The physical scale addresses the encoding of bits on a wire or radio spectrum and signaling techniques. Individual Ethernet protocols such as 100BASE-TX or 1000BASE-T (gigabit) are physical scale protocols.

The layer model becomes even more profound when it is transgressed. In tunneling, data that is delivered can belong to a layer lower down (e.g. the data link layer) but is wrapped so that it is delivered as if it were at a higher level (e.g. the transport layer). It is through tunneling, for example, that firewalls that attempt to block certain services can be subverted. This freedom to “wrap” data to transgress the layers is the information kinematic analog to universality and software emulation in information dynamics, which we turn to next.
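
The wrap/unwrap pattern behind tunneling can be sketched in a few lines of Python. The framing below is invented purely for illustration; real tunnels (VPNs, SSH tunnels, and the like) apply the same pattern with real protocols:

    import base64

    def wrap(lower_layer_frame):
        """Encode a raw frame so it can ride inside a higher-layer message."""
        return b"X-Tunneled-Frame: " + base64.b64encode(lower_layer_frame)

    def unwrap(message):
        """Recover the original frame at the far end of the tunnel."""
        return base64.b64decode(message.split(b": ", 1)[1])

    frame = b"\x00\x1a\x2b ethernet-like payload"
    assert unwrap(wrap(frame)) == frame   # the tunnel endpoint recovers the frame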

Tuesday, August 10, 2010

Information statics

Information statics is the study of information in storage. At its center we find the most basic numerical scale: data size. This scale is vast, pragmatically logarithmic, and ranges from bits to yottabytes (2^80 bytes). Though the approximately 7 items stored in human short term memory can consist of relatively long strings, it still only amounts to a few tens of bytes, while human DNA contains 3 billion base pairs, each encoding two bits, for about 6 billion bits of memory. The human brain consists of approximately 100 billion neurons, which would give a capacity on the order of 10 gigabytes if we assume one bit of storage per neuron. Of course, it may require multiple neurons to store a single bit, in a manner similar to how it requires multiple transistors to store a single bit of static RAM, or it may be that a single neuron can store many bits of information.
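
The back-of-envelope arithmetic behind these figures, written out for reference (the genome and neuron counts are the rough round numbers used above):

    yottabyte = 2 ** 80                      # bytes in a yottabyte
    print(yottabyte)                         # 1208925819614629174706176

    base_pairs = 3 * 10 ** 9                 # approximate human genome length
    dna_bits = 2 * base_pairs                # each base pair encodes two bits
    print(dna_bits / 8 / 10 ** 6, "MB")      # ~750 megabytes

    neurons = 10 ** 11                       # ~100 billion neurons
    print(neurons / 8 / 10 ** 9, "GB")       # ~12.5 GB at one bit per neuron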

The area of statics itself is organized along the following conceptual scale, starting from the bottom scale of the physical encoding of bits, as pits in optical media or nanoscopic regions of magnetic field in magnetic media, and going to the top scale of designing an information architecture that humans can use to organize and search for the information they need. The database scale, in between, has substantial implications for the privacy and security of information.

Information architecture
Databases
Data structures
Data encoding
Physical encoding

The computer memory hierarchy is a crucial scale in the design of computing hardware and software. Its scales range from directly accessible solid state memory containing hundreds of bytes to petabytes of off-line tape or optical storage.

Off-line storage (tape, optical)
On-line storage (hard drives, flash drives, network area storage)
Main memory (DRAM)
Cache (three levels are typical today)
Directly accessible (processor registers)

Points on the memory hierarchy scale near the top support larger amounts of data but require longer access times, ranging from milliseconds for hard drives to minutes or hours for off-line tape storage. At the other end of the hierarchy, where access has virtually no delay, there is room for only hundreds of bytes in the processor's registers and a few kilobytes in the level 1 cache. Since a modern processor executes multiple instructions per nanosecond, a delay of a few milliseconds to access a hard drive causes a computation to wait the equivalent of millions of instructions for the data. A delay of minutes means waiting the equivalent of hundreds of billions of instructions before the data needed to complete the computation is available.
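
The arithmetic behind these figures, as a rough sketch; the latencies and the assumed rate of two billion instructions per second are round numbers, and real values vary widely by machine and workload:

    instructions_per_second = 2 * 10 ** 9    # assumed: ~2 instructions per nanosecond

    delays = {
        "level 1 cache (~1 ns)": 1e-9,
        "main memory (~100 ns)": 100e-9,
        "hard drive (~5 ms)": 5e-3,
        "off-line tape (~2 min)": 120.0,
    }
    for name, seconds in delays.items():
        lost = int(seconds * instructions_per_second)
        print(name, "->", format(lost, ","), "instructions")
    # hard drive: ~10 million instructions; tape: ~240 billion instructions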

The speed of light is a fundamental limit for the speed of memory access, preventing the unlimited increase of the number of registers or size of the level 1 cache. This is a major reason for the development of level 2 and 3 caches that are much larger than level 1 caches but further away from the processor and slower to access.

While programs using small amounts of data can typically ignore the memory hierarchy, large scale systems that use petabytes of storage, such as the Google search database, Facebook, and Flickr, require intense focus on memory hierarchy scales to return results quickly to users.

Sunday, August 8, 2010

Organizing informatics: statics, kinematics, and dynamics

The name computer science implies a field of study limited to examining the properties of certain machines. However, computer science is a broader field than its name suggests, consisting of elements of mathematics, engineering, and science. The name of the field is so limiting compared to its broad nature that Dijkstra compared it to using the name "telescope science" for astronomy.

Recently, the term informatics has been used in the U.S. to name both computer science and a wide range of inter- and trans-disciplinary fields, some of which have informatics in their names, like bioinformatics and health informatics, and others of which do not, such as information security, library science, and human computer interaction. Some of these fields have been housed in departments or schools of computer science, while others have been housed elsewhere in the university until the rise of colleges of informatics or information studies.

The breadth of meanings for informatics leads us to examine how to identify and organize the various fields of informatics. In this entry, I look at classical mechanics as a source of inspiration. Classical mechanics consists of three major branches: statics, kinematics, and dynamics. Statics is the study of systems in static equilibrium, where the positions of objects do not vary with time. It is of critical importance to architects and structural engineers. Kinematics describes systems in motion without considering the forces that lead to that motion. This branch focuses on the concepts of position, distance, and velocity. Dynamics is the study of the causes of motion and changes in motion. Newton's laws of motion describe the dynamics of classical mechanics.

The study of information can be divided into three analogous branches of information statics, information kinematics, and information dynamics, dealing respectively with information in storage, information in communication, and information in processing. Information statics includes fields such as information architecture, storage management, and the solid state physics of magnetic media, while information kinematics includes subjects such as social informatics, web science, and data communication. Information dynamics focuses on computation, including software engineering, theory of computation, and the computer engineering of microprocessors. In future blog posts, I'll look at each of these fields in more detail.

Friday, August 6, 2010

Reception Pictures




Thursday's reception for Anthony Moore attracted a good number of people from across the university.



Thursday, August 5, 2010

Call for Local Informaticists

Seeking: NKU faculty members from Arts and Sciences, Business, Education, Health Professions, and Law whose teaching and scholarly interests connect to some aspect of informatics, and who are available to be bought out of one course during one semester between now and Fall 2012 to participate in a “transdisciplinary” project.


NKU has been awarded a $300K three-year grant from the National Science Foundation CPATH program. One goal of this NSF program is to revitalize and transform computing curricula to engage with computational thinking across all disciplines. Our grant project, entitled Informatics at Multiple Scales, uses the theme of scales (small-large, slow-fast, concrete-abstract, local-global, to name a few) to frame a transdisciplinary vision of information and translate it into curriculum. One particular focus is on a team-taught experimental course, INF 128, which is being piloted this fall with sixty students.

We have funding for six NKU faculty members – who must be outside the College of Informatics – to join us to work on this project. The first such “local informaticist in residence” is Rudy Garns, a philosopher from Arts and Sciences. With five more units of reassigned time available between now and Fall 2012, we are putting out a call across campus. These faculty members would (1) develop modules for the INF 128 course that arise from their discipline; (2) teach a few days of this course; (3) work with other project members on scholarly publications relating to this transdisciplinary vision of information; and (4) help integrate this work into a book and set of resources to be finalized in 2012.

If you are interested in participating in this project, please send a one-page statement of interest to Kevin Kirby, the Principal Investigator on this grant (kirby@nku.edu), by September 30. The project committee (PI and co-PIs) will review these statements to try to achieve a balance across many disciplines and then follow up with you promptly. Naturally, this will ultimately involve agreements between you and your chair on whether you would be available to accept this reassigned time.

For more information on the project, see scalinginformatics.nku.edu and feel free to contact PI Kevin Kirby (x6544, kirby@nku.edu), or Rudy Garns (garns@nku.edu).

Anthony Moore Reception

Faculty and students are invited to meet Professor Anthony Moore of the Academy of Media Arts in Cologne at a reception on Thursday August 5, 3:30-5:00 pm in room 108 of the Student Union. He will be working with the project team on modules related to art and computing, including the sonification of data, for use in the transdisciplinary Principles of Informatics course (INF 128) and the resulting book. A brief biography is here. This is Anthony's second visit to the U.S. this year; in April he performed some of his early work at Le Poisson Rouge in Manhattan.

Anthony will be at NKU through August 18 as the first in a series of "Informaticists in Residence" supported by this NSF grant. This is part of the project's effort to bring in artists, scientists, and humanities scholars from around the world whose work touches on themes of information and computation.