Stardate 20020821.1402 (Captain's log): The last job I held was in the cell phone industry, working on an advanced form of communications called Code Division Multiple Access, or CDMA. It has all sorts of interesting and cool characteristics, and by far the most interesting aspect of it is the fact that many cell phones in the same area will transmit on the same frequency at the same time. Unlike TDMA systems, which share a frequency but time-share so that only one is transmitting at once, in CDMA they truly are all transmitting simultaneously. And yet, the cell can take the output of a single receiver and divide it into the individual messages that each phone was sending.
It works that way going out, too. The cell figures out what it wants to send to all those phones, and mixes them all together to be broadcast by a single transmitter. Each individual phone receives the exact same thing, but can separate out the part intended for it, discarding all the information intended for other phones. If you're curious about the details of how this miracle works, I wrote about it here in April. The overall principle, however, leverages two very powerful concepts from information theory: signal and noise. CDMA is designed to encode all the information in such a way that any given phone can treat the information intended for it as signal, while treating what's being sent to all other phones as noise. In most cases there's much more noise than signal, but as long as there's enough signal, then a sophisticated noise rejection algorithm is capable of discarding all the noise, leaving only a clean signal. For it to work, the amount of signal that must exist is an absolute, not a proportion. (Technically, in CDMA the measurement of signal availability is known as "EC/I0" and it's expressed in dB.) As long as it exceeds a certain threshold (with current generation hardware, somewhere in the range of 18 dB) then the total amount of noise doesn't matter.
In fact, for esoteric reasons having to do with what "bandwidth" means in systems like this (which is non-intuitive), if there is too much signal relative to the noise then the system is wasting capacity. If a phone detects that the cell is sending too much signal, it will send a message to request that the cell use less.
CDMA incorporates extraordinarily large amounts of signal processing in order to work properly. At a level above the transmission of chips and reproduction of bits, the system uses forward error correction. This is a cool technique for transmitting data over unreliable links which can get the data through without retransmission as long as the unreliability of the link doesn't exceed a certain level.
The forward error correction encoder takes a packet of bits to be transmitted, say 256 of them, and runs it through an algorithm which is actually similar to the way that some kinds of encryption work, and out the back end you get a somewhat larger packet, say 270 bits. The resulting bit pattern doesn't resemble the incoming one in any way.
And it has odd characteristics: if you were to flip one of those incoming 256 bits, and then examine the output packet, about half of them would flip. Flip a different bit, and about half of the output bits would flip but not all the same ones. What it means is that every one of the output bits contains information about every one of the input bits.
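That avalanche property is easy to see with any function designed along the same lines. The real FEC encoder isn't available here, so as a stand-in (an assumption, not the CDMA algorithm) a cryptographic hash like SHA-256 shows the same behavior: flip one input bit and roughly half of the 256 output bits flip.

```python
import hashlib

def bits(data: bytes) -> str:
    # Render a byte string as a string of '0'/'1' characters.
    return "".join(f"{byte:08b}" for byte in data)

original = b"a 32-byte payload to be encoded!"           # 256 input bits
flipped_input = bytes([original[0] ^ 0x01]) + original[1:]  # flip one bit

out_a = bits(hashlib.sha256(original).digest())
out_b = bits(hashlib.sha256(flipped_input).digest())

# Count how many of the 256 output bits changed; it comes out near half.
changed = sum(a != b for a, b in zip(out_a, out_b))
```

Flip a different input bit instead, and a different (but again roughly half-sized) set of output bits changes, which is exactly the "every output bit knows about every input bit" property.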
The longer packet can be transmitted over an unreliable link which munges a few of the bits in it. At the receiving end, the forward error correction decoder can then reproduce the original 256 bits as long as no more than (say) six of the transmitted bits were munged, no matter which ones they were. What it's doing is taking the information each of the transmitted bits carries about each payload bit and tallying it all up, while assuming that some of the bits will be lying. As long as enough signal gets through, the noise can be completely rejected.
More interestingly, if it's constructed properly, then if there was too much munging it will know. It either provides the right answer or tells you that it cannot provide any answer at all, permitting you to recover from the transmission failure at a higher level in the stack. The particular FEC algorithm used in CDMA was invented by Dr. Viterbi, and bears his name: Viterbi encoding. (And it's covered by a major patent.)
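Viterbi decoding itself is far more than a blog post's worth of code, but the "tally the bits, assume some are lying" idea can be sketched with the simplest possible code: a repetition code decoded by majority vote. This is a hypothetical stand-in, not the CDMA algorithm; it pays far more overhead for far less cleverness, but the principle of outvoting channel noise is the same.

```python
REPEAT = 5  # each payload bit is transmitted five times

def encode(payload):
    # Expand each payload bit into REPEAT copies.
    return [bit for bit in payload for _ in range(REPEAT)]

def decode(received):
    # Majority vote within each group of REPEAT bits: up to two
    # "lying" bits per group get outvoted and rejected as noise.
    out = []
    for i in range(0, len(received), REPEAT):
        group = received[i:i + REPEAT]
        out.append(1 if sum(group) * 2 > REPEAT else 0)
    return out

payload = [1, 0, 1, 1]
tx = encode(payload)
tx[0] ^= 1   # the unreliable link munges a couple of bits...
tx[7] ^= 1
recovered = decode(tx)  # ...and the decoder still reproduces the payload
```

A real convolutional code gets a comparable effect with only a fraction of the overhead, which is why Viterbi's approach is the one actually used.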
Remember those cheap walkie-talkies you can buy in toy stores for kids, which the package claims are good for "up to a quarter of a mile"? My memory of those from when I was a kid was that in practice they didn't carry any further than you could shout. A quarter mile required absolutely ideal conditions, which would be damned rare in the real world.
Those are using simple AM transmissions on one of the CB frequencies, and they transmit at 50 milliwatts (above which power the FCC requires a license). AM is just about as bad as it gets in radio transmission for reliability, and it depends entirely on power to get its signal through. Simple FM is slightly better, but not much. (That's what AMPS cell phones use.)
CDMA cell phones are restricted by the FCC to no more than 200 milliwatts, and usually operate at much lower levels than that, yet their signal can carry more than four miles to a cell. It's all that noise rejection and encoding which makes it work. Technically, it's referred to as "coding gain" and it means that sophisticated signal processing can substitute for raw power. It's all an application of Papa Shannon's work (may his name be blessed).
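The arithmetic behind "signal processing substitutes for raw power" is forgiving, because dB is a logarithmic ratio. A minimal sketch (the 6 dB gain figure below is illustrative, not a CDMA specification):

```python
def db_to_power_ratio(db):
    # Convert a gain expressed in decibels into a linear power ratio.
    return 10 ** (db / 10)

# A hypothetical 6 dB of coding gain means the same link quality can be
# achieved at roughly one quarter of the transmit power (~3.98x ratio).
ratio = db_to_power_ratio(6)
```

Every 3 dB of coding gain halves the transmit power needed for the same link quality, which is how a 200 milliwatt phone can outreach a brute-force 50 milliwatt AM toy many times over.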
This is an example of a more general concept in engineering called fault tolerance. The idea is to design sufficient redundancy into the system so that variations in use and a small chance of component failure won't necessarily cause the system overall to die. In some cases when a system dies, people do too. If a wing falls off an aircraft in flight, everyone in it is screwed.
But even when you're talking about information processing, failures can be anywhere from mildly annoying to seriously inconveniencing to life threatening. And though such a failure might not cost anyone their lives, it can mean commercial death for a company. So a lot of people have been paying attention to this in computing for many years now, on many levels. Of course, engineers overall have been concerning themselves with this kind of thing since before the Romans.
One of the biggest users of fault tolerant computing is telecommunications, such as the phone company. You call Directory Assistance and you talk to a person, who listens to what you say and taps it into a computer terminal, and then switches you to a computer voice which reads back to you the information you wanted. For the moment, the only reason that human operator is in there is because computer speech recognition for arbitrary voices isn't yet reliable, so humans do that part.
But if the computer system went down, a hell of a lot of people would be mad. That's why they don't go down.
Or rather, why they don't fail when they do go down. The computer which is handling that problem is a big multiprocessor system, and it is designed so that if one of its component processors fails then something else will recognize that fact and shut it down. The system is installed with overcapacity, more processors than are really needed for the job, and as long as the number of failed ones is below a certain number then the system overall will continue to operate, and it doesn't matter which ones die. (In a sense, it's analogous to how FEC works.) When a processor fails, an alarm goes off on a console, and an operator will go hot-swap a working spare into that position, and send the failed one away to be repaired, after which it goes into the pile of spares ready to take over for the next one which dies.
Of course, sometimes there's no repairman around when you want one, and errors would be catastrophic. The computer system which runs the Shuttle must work properly at all times during launches and landings, or the shuttle will be lost. The whole system is fly-by-wire; the pilot can't directly control the steering rockets and control surfaces. He uses a stick and other controls which feed to the computer, which actually activates the ship's mechanisms.
If the computers fail during a landing, the stick will go dead, and a few seconds later the crew will be too as the ship tumbles and breaks up from the stresses. There's no room for error here.
The computer system on the shuttle is quite old now and uses extremely primitive technology. It uses core for memory, for example. Genuine core; little ferrite tori. That was state of the art in the mid 1970's when these computers were originally designed. The reason they stay with this is one you'll recognize immediately: there's a lot of software written for it, and if they updated the computers all of the software would have to be rewritten. And considering the kind of care which was involved in crafting that code, it's just not worth it.
That may well be the most carefully crafted code in existence. The process used to build that software is rigorous beyond anything used anywhere else I've ever heard of. But the overall system designers for the Shuttle didn't want to rely on just that, and that was wise.
There are five computers on the shuttle which all run in parallel doing exactly the same thing. Any one of them is sufficient to let the ship land. They are all constantly checking each other, and if any of them starts acting aberrantly, the others will alert the pilot, who can manually disable the one which has failed.
That will protect against component failures, but not against design bugs. If all five computers have the same bug, they'll all get the wrong answer. So four of them are one design, and the fifth one was designed completely independently. It came from a different supplier, and runs different code. If there's a design problem on one side, it's highly unlikely to be in the other.
That actually happened. The recovery from this varies depending on what's going on. When you're already in flight, you can't give up. But if there's a computer problem reported before the engines light up during a launch sequence, then they scrub the launch. And the very first attempt to launch the Shuttle was indeed scrubbed, with just a few seconds to go before engine start, because the four computers and the one computer disagreed. (They all detected the disagreement, and were programmed to automatically freeze the launch sequence as a result.)
So the launch was scrubbed, and the engineers dug in to see what had happened. It turns out that they all had gotten the right answer, but not at the same time. The computers don't just check each other's answers, they also make sure everyone's getting done on time. In a system like the shuttle, getting the answer late is no better than getting it wrong. If the control surfaces respond slowly, you can lose the ship.
It turns out that the one computer was on time; the four computers were late. It was the one computer which raised the alarm. Post analysis showed that it was correct.
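That cross-check can be sketched as a voter which treats a late answer exactly like a wrong one. The structure below is my own illustration of the idea, not NASA's actual logic; the names and the deadline figure are assumptions.

```python
from collections import Counter
from dataclasses import dataclass

DEADLINE_MS = 40.0  # hypothetical control-cycle deadline

@dataclass
class Answer:
    value: int          # the computed control output (quantized)
    ready_at_ms: float  # when this computer delivered it

def cross_check(answers):
    """Accept a result only if a majority of ALL the computers agree
    AND were on time. A late answer counts as a failure, just like a
    wrong one; returning None means 'freeze the sequence, raise the
    alarm'."""
    on_time = [a.value for a in answers if a.ready_at_ms <= DEADLINE_MS]
    if not on_time:
        return None
    value, votes = Counter(on_time).most_common(1)[0]
    if votes * 2 > len(answers):  # majority of all, not just the punctual
        return value
    return None

# The scrubbed-launch scenario: the lone computer was on time, the
# other four were late, so no timely majority exists -> freeze.
scrub = cross_check([Answer(7, 12.0)] + [Answer(7, 55.0)] * 4)
normal = cross_check([Answer(7, 12.0)] * 5)
```

Note that in the scrub scenario every computer got the same answer; it's the timing check alone that trips the alarm, which matches what the post-analysis found.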
In a sense, fault tolerance and noise rejection are strongly related concepts. But the kind of deliberate cross checking which these kinds of computer systems use can't scale. It works when you have no more than a few hundred processors. What if there are tens of thousands?
One of the biggest problems facing computer science now is that we're rapidly approaching physical limits of the granularity of the universe and fundamental physical constants. A recent news item I read said that one of the electronics companies was working on a 90 nanometer IC process, and the insulating layer in the FETs would only be five atoms thick. How small is "small"? Eventually we're going to run out of "small", and we won't be able to use brute force to make our computers faster and faster.
In the long run, the only way to keep increasing the effective power of computing systems is parallelism, using large numbers of smaller, slower, cheaper processors to do the job instead of one single humongously fast processor.
We've actually been building distributed systems for a long time, but they're highly specialized. Individual power stations in the electrical system have computers in them, custom designed to run power houses. Every cell in a cellular phone system has a bank of computers in it, and your CDMA phone contains at least two computers, and in future will probably have three.
But they're all customized, specialized. It's distributed, but it isn't simple.
One interesting research effort going on now is to try to evaluate how colonial insects can implement such complex behavior patterns. It's computer scientists doing this, not entomologists; they're trying to understand the "hive mind". The best examples of that, by far, are ants and termites.
It's certainly arguable from a biological point of view that a colony of ants is actually a single evolutionary being, implemented in multiple bodies. All the ant workers are sterile and come from eggs laid by the hive queen and are strongly genetically related to each other, and especially to the queen herself. In a sense, they're biological machines produced by the queen for purposes of making it possible in the long run for the hive to produce its yearly crop of young queens and drones to go out and mate and establish new colonies, thus spreading the species around. The job of the workers is ultimately to collect food for the queen so she can make babies without risking herself in the environment. Loss of a worker is not important, but loss of the queen is death of the hive, without a chance to reproduce.
And ant colonies can do some pretty amazing things, which can actually seem well considered in terms of collective behavior. It almost does seem as if there's a mind there, but of course there isn't. What you're actually seeing is an emergent behavior which is the sum of a whole lot of individual small decisions being made by independent ants, reacting to their surroundings and relatively primitive signals from other ants in the colony. As such, it seems like an interesting natural model for the idea of using quite simple computers with relatively primitive intercommunication to nonetheless create large and complex systems with useful emergent behavior.
I don't know that this study has yielded any practical results yet, but I think it will. One of the things they've been experimenting with is simulations, to see what kinds of large emergent systems can be created from contributions of small and simple systems working together. (There's a guy out there who thinks he's cracked this problem and convinced himself that he's actually generalized the concept to a new fundamental theory of everything in physics from the smallest to the largest; I remain to be convinced.)
Even though it hasn't yielded practical results yet, the idea itself is an interesting one, and it actually fits well with the other things I've been talking about.
Let's take these ideas and mix them together. Let's take a huge number of independent biological processors, all of which were designed independently and which are running different programs, and link them together loosely through low-bandwidth communications into a huge computing matrix. Let's combine all their outputs into a single information stream and make the assumption that a significant number of them will actually get the wrong answer because the fault rate among those computers will be high. Let's treat the right answers as signal and the wrong ones as noise and apply signal processing to the information stream to implement strong noise rejection, as a practical way of implementing extremely strong fault tolerance. Put all of that together, and what do you get?
Elections.
From the point of view of a system engineer, that's how a democracy makes important decisions. Each individual voter makes an independent decision on an issue, using their unique brain which doesn't work exactly the same as anyone else's brain, using a different knowledge base than anyone else has based on their culture and their education and their environment, and comes to a decision on the issue. All the voters consider the same issue, but because of the incredible variety of different points of view being applied, it's inevitable that there will be different results. But usually there's a consensus, and that's treated as signal, while the dissenting voices are noise. The noise is rejected, leaving only the clean signal in the form of an electoral result, and the system acts on that.
Whichever candidate gets the most votes gets the office. The other candidates don't.
The system isn't perfect. In fact, it's impossible for it to be perfect. Kenneth Arrow proved mathematically that all election systems are flawed in one way or another. (It's an elegant and pernicious result, and he won one of the first Nobel prizes awarded in Economics for it.)
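One concrete flavor of that impossibility is the Condorcet paradox, part of the machinery behind Arrow's theorem: with as few as three voters and three candidates, majority preference can be cyclic, so there is no stable collective choice at all. A minimal demonstration:

```python
# Three voters, three candidates: the classic Condorcet paradox.
ballots = [
    ["A", "B", "C"],  # voter 1 ranks A over B over C
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def prefers(ballot, x, y):
    # A voter prefers x to y if x appears earlier in their ranking.
    return ballot.index(x) < ballot.index(y)

def pairwise_winner(x, y):
    # Head-to-head majority vote between two candidates.
    x_votes = sum(prefers(b, x, y) for b in ballots)
    return x if x_votes > len(ballots) - x_votes else y

# A beats B, B beats C, and yet C beats A -- majority preference
# runs in a circle, so whichever candidate wins, a majority would
# have preferred someone else.
```

Every majority here is genuine, two voters to one; the failure is structural, not a matter of miscounting.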
So in this case it's not just a question of whether a given system will eventually fail; it's guaranteed to do so. The only debate is when, and how, and how serious the consequences of the failure will be, and whether the system will be robust enough to recover from it.
Not that this is anything to intimidate an engineer, because no competent engineer ever designs a system which cannot fail. What you try to do is to design so that failure is acceptably rare, and that the consequences of failure are acceptably benign.
Any design which is perfect will also be perfectly unbuildable. A perfect aircraft will weigh too much to get into the air. A perfect anything will cost too much and take too long.
And the process of creating a perfect design is itself unacceptable, because in practice it never ends. You don't ever get to implementation because you'll be stuck in an endless redesign loop, chasing ever more obscure and improbable failure modes well beyond the point of diminishing returns.
That's why most engineers roll their eyes in exasperation when laymen ask for perfect safety from anything, such as nuclear power or the use of genetic modifications on food. Asked if he can guarantee that a nuclear power plant won't fail in such a way as to release significant radioactivity into the area around it, the engineer will answer, "Of course I can't. Why would I even want to try?" That makes the layman think the engineer is a reckless fool who doesn't give a damn about that layman's life and health.
But to that engineer, the idea of a system guaranteed not to fail is idiocy. There ain't no such beast. The system in question isn't guaranteed to never fail, but it has an adequate margin of safety along with a benign failure mode, and that's all the better anyone can do.
It's a cultural thing; laypeople don't understand that all engineering is a tradeoff. In some cases the effort applied to reliability will be very extensive, as the work involved with the Shuttle's computers demonstrates. In other cases it will not make commercial sense to do that; it will massively raise development costs without any commensurate increase in sales to pay for it.
The same thing goes for electoral systems. No matter which you consider, it is always possible to construct a scenario where it fails and clearly gets the wrong answer. Arrow proved it. What you do, instead, is to try to make sure that failures are rare, and the consequences of them are not serious. Ideally you'd like to engineer the electoral system in such a way that structurally it is resistant to critical failures, but that may not be possible and it may be necessary to get above the level of signal processing and start applying semantic filtration.
In the case of the American electoral system, the founders decided that there were certain specific issues which were much too important to risk. The consequences of failure in these areas was perceived to be so great that even if the chance of failure was low, it had to be prevented. So they passed the Bill of Rights.
What it does is to place certain issues beyond the scope of the normal electoral process. They can still be changed, but a different and much more elaborate process is required, and it's instructive to compare them.
Normal elections aren't permitted to infringe freedom of speech. If our elected representatives pass such a law, or if we as voters directly pass an initiative which tries to do so, the courts will nullify it. In order to alter freedom of speech, it is necessary to amend the Constitution, which requires action by the House and Senate and consent of three quarters of the State legislatures. That process represents a much higher degree of filtration which requires a far higher signal-to-noise ratio, so that the signal must be particularly strong to get through. That's what you implement in cases where a false positive is much worse than a false negative. Another example of such a system is the one used to transmit firing orders to submarines carrying ballistic missiles. The one thing you want to avoid at all costs is for them to think they've received such an order when one actually hasn't been sent, because such a mistake will start World War III. Equally, the amendment process requires a much greater degree of certainty because the results of a mistake could be so catastrophic to the system.
Not that it hasn't happened anyway. They passed the 18th amendment, for example, only to repeal it fourteen years later when it became clear that Prohibition was a total failure. And this demonstrates that the system isn't fault-free but it is very robust, because the mistake was corrected, and the nation survived.
The process of election includes a number of steps, all of which are to some degree controversial because they ultimately affect the outcome.
You have to choose which people in your nation are permitted to contribute to the collective decision. No nation in history has ever let everyone vote.
You have to decide on the process by which the voters communicate their opinion. That means to design an encoding for the decision, and deciding just how much the voter is permitted to say, about what.
You have to design the filtration processing and noise rejection algorithm at the regulatory authority (the "election commission") to decide how much information to let through and how much to reject. You are trying to decide what is signal and what is noise, and how to evaluate the signal after noise rejection.
None of these are straightforward. The first is a decision about who gets the franchise, and the US has changed that several times, each time as an expansion. In the beginning, pretty much only economically established white male citizens over the age of 21 were permitted to vote. Over time the franchise was spread to all white men even if they weren't land holders, and then to all men even if they weren't white, and then to women as well. The most recent change, the 26th amendment, gave the vote to 18 year olds.
The encoding is also subject to considerable research and debate. A more complex encoding provides more information (and also potentially more noise) to the processing step, and in some cases proposals for alternative approaches to processing may require a change in encoding. One reason a more complicated encoding has the potential for increasing noise is that the voters may become confused, or they may deliberately manipulate the information to try to subvert the system result.
In our system right now, each voter transmits the briefest possible answer. In a choice for an initiative, it's a single bit: "Yes" or "No". When there are a number of candidates, it is a number saying which candidate is desired, so with four candidates it's only two bits.
Some have proposed alternatives where the voters rank the choices, which involves a much richer data stream to the final stage: information processing and noise rejection.
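To see how much the richer encoding matters, here is a toy comparison (the ballot counts are invented for illustration) between plurality, which reads only each voter's first choice, and instant-runoff, which uses the full ranking. Same ballots, different winner:

```python
from collections import Counter

# Hypothetical electorate: 8 voters rank A first, 7 rank B first,
# 6 rank C first, but the A voters are nobody else's second choice.
ballots = (
    [["A", "C", "B"]] * 8 +
    [["B", "C", "A"]] * 7 +
    [["C", "B", "A"]] * 6
)

def plurality(ballots):
    # Only the first choice on each ballot is ever read.
    return Counter(b[0] for b in ballots).most_common(1)[0][0]

def instant_runoff(ballots):
    # Repeatedly eliminate the weakest candidate and transfer those
    # ballots to their next surviving choice, until someone has a
    # majority. (Ties in elimination are ignored in this sketch.)
    remaining = {c for b in ballots for c in b}
    while True:
        tally = Counter(next(c for c in b if c in remaining)
                        for b in ballots)
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > len(ballots):
            return leader
        remaining.discard(tally.most_common()[-1][0])
```

With these ballots plurality elects A on 8 first-place votes, while instant-runoff eliminates C and elects B 13 to 8; the extra bits per ballot changed the outcome.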
That's where you get the real controversy, and the widest variety of actual systems in implementation. This is where the biggest difference lies between the winner-take-all system in the US and the coalition approach generally used in Parliamentary systems.
On the level of system analysis, the fundamental difference between those two is the degree of noise rejection applied. The Parliamentary system, by its nature, has a lower threshold for noise rejection; it gives minority viewpoints greater opportunity for representation and influence than the US system. In a nation where coalition governments are the norm, and where coalitions usually include one or two splinter groups, then they gain influence for their platform. In the American system, on the other hand, splinter viewpoints tend to have much less influence.
You won't find any agreement about which is better, and indeed I'm not sure that there's any objective way to decide between them, except for one: try 'em all and see which ones work better. Maybe someday it will become clear that one is better, or maybe it will turn out that their faults are approximately equal and there's no real reason to choose. (And they are guaranteed to have faults; Arrow will see to that.)
I have my own opinion on it, however. I agree with the wisdom of the Founders to put certain decisions beyond the reach of the normal election process, and I also prefer a system with a high degree of noise rejection.
I am a populist, in the sense that I absolutely reject any concept of a small enlightened group, no matter how designated or chosen, making decisions for the rest of us without our input. I believe my fellow citizens have the fundamental wisdom to see and understand issues and to make collective decisions correctly.
But that doesn't mean I think each individual will make the right decision; I know full well that they won't. With as much diversity as our nation has, and with the unprecedented access to information of all kinds available to us all, a substantial number of people will get confused and get bizarre answers. It's simply inevitable; humans are fallible, and some of the information they'll process will be false or deliberately misleading. The strength of our system is the collective voice, not the cacophony of the individuals. The strength of our system is the basic wisdom of the signal which emerges from all the filtration, having rejected all the noise.
No single citizen is physically capable of fully evaluating all the information available. There just isn't enough time. But the collective distributed computing system has enough input bandwidth and computing capability so that it can, even though no single component of it can. Each distributed computer in the system will evaluate some part of the data available, and if it happens to luck into a skewed sample of the data it will probably get the wrong result. However, when you apply millions of them to the problem, the errors will "smear" themselves while the signal will generate a strong spike. More of them will get the right answer than any of the wrong ones, and our system rejects the wrong ones and discards them completely as noise. And that is good.
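That "smearing" of errors can be simulated directly; it's essentially Condorcet's jury theorem. In the sketch below each simulated voter is independently right only 55% of the time, yet a simple majority of a thousand of them is right almost always. The numbers are assumptions chosen for illustration.

```python
import random

random.seed(42)  # reproducible illustration

def majority_correct(n_voters, p_correct, trials=1000):
    """Fraction of trials in which a simple majority of independent,
    individually unreliable voters lands on the right answer."""
    wins = 0
    for _ in range(trials):
        right = sum(random.random() < p_correct for _ in range(n_voters))
        if right * 2 > n_voters:
            wins += 1
    return wins / trials

lone_voter = majority_correct(1, 0.55)         # barely beats a coin flip
big_electorate = majority_correct(1001, 0.55)  # errors smear, signal spikes
```

The crucial assumption is independence: the individual errors have to be uncorrelated for them to cancel, which is one reason diversity of viewpoints in the electorate matters to the engineering and not just to the politics.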
There are too many cases where small groups, or sometimes not-so-small groups, can come up with answers that I don't like and with which I profoundly disagree and would consider terribly dangerous to implement. But I do not feel any need to use any kind of force (physical or legal) to shut them up, because I feel extremely confident that the strong noise rejection implicit in our system will make sure that their ideas don't ultimately affect the policies of our nation. As a result, I believe in the full right of White Supremacists to make their case publicly, or for blacks to parade in Washington demanding reparations for slavery.
They can do those things because they can't cause any important harm that way; their contribution has no chance of ever rising high enough to penetrate above the noise rejection threshold. And thus we can structurally permit the widest possible range of expression because it might help and it has virtually no chance of causing harm.
Every new idea has to begin with one mind, and spread. Every idea which becomes majority was minority once. Our system permits such ideas to germinate, and gives them the chance to be spread, and if they can indeed convince a sufficiently large number of voters that they are right, they can eventually become signal. Thus was the 19th amendment passed, giving women the vote.
In a system with a lower noise threshold, which permits minority viewpoints a greater chance of rising above the noise rejection threshold, you'll find a distinct nervousness. There will be a fear of the common voice, the worry that extremists may somehow manage to influence the system all out of proportion to their numbers. I think you'll find that in such nations there are always going to be more restrictions on free expression in at least some areas. That means that they're being forced to use explicit semantic filters to eliminate things which their election process permits through structurally. We do some semantic filtering too, but we haven't had to modify ours in response to transient political movements. We don't ban Nazi symbols, or suppress hate speech, because we don't have to. Our filtration is implemented as court challenges to election results, not as suppression of freedom of expression.
Our system controls those things on the structural level by content-neutral means of a higher noise-rejection threshold, which means we don't have to use semantic filters (i.e. censorship) and we don't have to be as nervous about expression of extreme viewpoints.
One of the canons of the Transnational Progressivism ideology (you knew this was coming, didn't you?) is the idea of a guarantee of the inclusion of minority groups (especially those officially designated as "victims") in the decision making process in government and industry. I strongly disagree with that because that represents a deliberate attempt to drop the noise-rejection threshold to the point of uselessness, where nearly everything is designated as signal, and the most loony of loonies will still get some say in how things are run. In such a system, those who have been demanding reparations for slavery might actually have a chance of prevailing, which would be a disaster for the nation. And if it were actually done evenly across the political spectrum, then the substantial number of White Supremacists out there might well be able to get a voice to resist our efforts at reducing discrimination.
Jeeze; if the US were run that way, we might even have ratified Kyoto and the ICC treaty.
Update: Not that it actually affects the point of this article, but several people have written in to point out that my information about the Shuttle's computers is a bit out of date. Scott sends this article which describes the process by which its software is written. Michael points out that they did an upgrade on the hardware in the 1990's to use RAM and more modern electronics. (Tom sent the same link.) Michael also pointed to this page which describes in more detail how they've split the software up. During launch, there are two separate programs running which were developed independently, and the pilots can switch from one to the other rapidly if they think they need to in order to save the ship, if something bogus starts happening.
Richard points out that I botched my description of the minimum EC/I0 needed for CDMA to work, since dB is indeed a ratio by definition. Rats. What do you expect from a software guy? I was about as far from the RF as anyone on the design team could be except maybe the case designers. I studied the hardware, and I understand how the rake receivers work, but when I was taking that class, as soon as they started to talk about transmitting the chips in pairs in some kind of quadrature analog encoding, I got totally lost. I don't have the math.
In any case, just to clear things up: There are two different values, expressed either in dB or dBm; I was never clear on that and never really understood what dBm meant. One is the signal strength, the other is the noise floor, which represents the amount of the total signal which was being used to send data to phones other than us (plus a small contribution by naturally-occurring interference). The difference between those two was EC/I0, and if that exceeded about 18 then there was enough usable signal for the rake receivers and all the rest of the digital electronics to use coding gain to reproduce the original packets.
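For what it's worth, the arithmetic is forgiving here: because dB is logarithmic, the ratio of two quantities expressed in dB against the same reference is just their difference. A sketch, with made-up signal levels:

```python
def ec_over_i0(signal_db, noise_floor_db):
    # Both arguments in dB relative to the same reference; their
    # difference is the signal-to-interference ratio, itself in dB.
    return signal_db - noise_floor_db

# Hypothetical numbers: a -85 dB signal over a -103 dB noise floor
# leaves 18 dB of usable margin for the rake receivers to work with.
margin = ec_over_i0(-85.0, -103.0)
```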
Jason writes to say that I may have mischaracterized the capabilities of the Viterbi decoder. I don't really know, but I don't think that Jason's description of it in his mail was really right. The Viterbi decoder actually does speculative decoding, and if it finds itself in a trap it will back up and try a different path from some earlier branch point, or so it was explained to me. It's a lot more sophisticated than most FEC; I believe it's actually implemented with a state machine. And I do know that it reports a quality level with each packet, because we used that to generate feedback to the cell to control transmit power. If the Viterbi decoder told us it wasn't making any corrections at all, we would tell the cell to stop shouting, because it meant we were wasting some of our coding gain. We also knew if the decode failed, but I may be wrong about the decoder telling us that. (Now that I think about it, the clear data packet had a checksum and that may be how we knew it had failed.)
Like the analog stuff, however, the Viterbi Decoder was far enough away from my part of the system so that I only needed high level theoretical knowledge of how it worked. We in the software group had to directly control the rake receiver, but the Viterbi Decoder was self contained in the ASIC, so we mainly needed to know its function and how to interpret its output (including its reported quality level).
If analog is beyond me, so's this stuff. I'm just not in Dr. Viterbi's league. But for whatever it's worth, here's a page which goes into how it works.
Update 20020822: The guy with the new theory is Stephen Wolfram. (Thanks to Patrick for the pointer.)
Aaron Haspel comments.