[Cryptology] [The Cryptology Independent Study Project] The basic
spoken-word cipher
The Cryptology Independent Study Project
chris at dod.net
Thu May 18 06:24:08 UTC 2006
http://cryptology.dod.net/index.php?/archives/40-The-basic-spoken-word-cipher.html
While documenting human rights abuses, Alice and Bob are taken captive by the secret police of some unnamed country. While the secret police suspect Alice and Bob are involved in a project to document these abuses, they are not quite sure how much the two know. As a result Alice and Bob are placed in two locked rooms connected by an air duct where Eve, a member of the secret police, has planted a listening device. The secret police hope to get a better idea of the type and nature of the information that Alice and Bob know by listening in on their supposedly secret conversations. How can Alice and Bob communicate using the robust characteristics of their language while still keeping their conversation secure? This scenario is intended to lay the groundwork for an interesting problem in secure communication, and our protagonists are here to help us examine the possible solutions to this dilemma. While almost every child in America learns Pig Latin or some other such system of pseudo-secret communication, this examination is intended to be a brief introduction to the actual problems, and possible solutions, of such systems. It is hoped that with a brief overview of cryptography and linguistics we can begin to find a possible solution to Alice and Bob's situation.
One must start by examining the principles of good cryptography. That is, since we are aware of what makes cryptography successful it would appear logical to apply those same principles to Alice and Bob's communication problem. A good cipher depends on a couple of things, but we will examine one here. Experience has shown that long and pseudo-random keys are often best. Indeed the only unbreakable, though not practical, cipher is the one-time pad, where every element of plaintext is encrypted and decrypted with its own random key value that is never reused. If the algorithm is good, most of the security will reside in a good key. One must ask why this is the case. It stems from the attacker being unable to apply frequency analysis to find the unknown element of the cipher. That is, the attacker's trick is to use mathematics to deduce the key, and ultimately the plaintext. While this is certainly an oversimplification of the problem it serves one important purpose: namely, it shows that good cryptography is mathematically complex. This simple fact leads us to the initial and disappointing result that, while humans are geniuses when it comes to language, they are not equipped to perform the types of mathematical operations needed for good cryptography at the speed of language. This unfortunate reality does not mean that one should give up on spoken-word ciphers, but that we must figure out a satisfactory way to achieve our goal using speech and the innate genius humans have to both speak and perceive language.
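To make the one-time pad concrete, here is a minimal sketch in Python. It assumes the plaintext is a byte string and that XOR is the combining operation; the function names are hypothetical and chosen only for illustration. The point is that the key is as long as the message, which is exactly what makes the scheme impractical to perform in one's head at the speed of speech.

```python
import secrets


def otp_encrypt(plaintext):
    """Return (key, ciphertext). The key is random, as long as the
    message, and must never be used for a second message."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def otp_decrypt(key, ciphertext):
    """XOR with the same key undoes the encryption."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


if __name__ == "__main__":
    key, ciphertext = otp_encrypt(b"meet at dawn")
    print(otp_decrypt(key, ciphertext))
```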
What we must do is appeal to human language in hopes that we can find a system well suited to our needs. That is, what lessons can Alice and Bob learn from the field of linguistics and the unique nature of humans' ability with language? There are at least three linguistic categories that immediately stand out as interesting: lexicon, morphology, and phonetics. At first Alice and Bob might recognize that what they are trying to do, obscure the meaning of their language, has been done by militaries before. That is, military codes have been used to conceal language meaning by troops in the field when their communication channel is not secure. While this may appear to solve the problem Alice and Bob find themselves faced with, it falls far short of their needs. Codes, while applicable in some situations, are not robust enough to handle diverse and meaningful conversation. Furthermore, they suffer from a major problem: namely, they are based on a language's lexicon. Codes usually map one-to-one, or one-to-many depending on context, between a word in the lexicon of the speaker's primary language and some other word in that language. With codes one needs a codebook, or mapping, which must be changed frequently to confuse the eavesdropper. If Alice were attempting to convey a complex idea to Bob she would need to look each word up in this codebook to construct her sentence, and Bob would need to look each word up to deconstruct it. This means they would most likely need to write things down, and that will not solve Alice and Bob's problem. It should immediately strike the reader that a code is very similar to the popular notion of a language. Alice and Bob would essentially need to learn a new language. Additionally a language's lexicon changes frequently and would require something akin to an English to Spanish translation dictionary to fully describe. In short, codes are not cryptography.
Cryptography intends to change some plaintext, or in our case language, into some meaningless ciphertext, or cipherlanguage. This means that Alice and Bob can immediately dismiss lexicon as a useful solution to their problem. What they need is a system to quickly speak and perceive their language. What they need is a way to morph their existing language into something else.
This brings us to the next area of investigation: morphology. Morphology is most certainly the language analog to what one finds in other mediums of cryptography. Cryptography in computers morphs at the bit or byte level, in writing systems cryptography morphs at the single character level, and in spoken language we would like morphology to work at the word or syllabic level. Such operations would appear to be the most efficient and effective way for humans to perform reversible changes on a language. Let us first examine morphemes and their traditional meaning in linguistics. An affix is a word element that is usually attached to the root of a word at the beginning, middle, or end. These are usually referred to as prefixes, infixes, and suffixes respectively, and most commonly show up as suffixes in the English language. The other important distinction is that affixes are either derivational or inflectional. For Alice and Bob's purpose these traditional categories of affixes fail to fully describe what a spoken-word cipher does. What we want is an operation that we can perform on the existing root with all of its affixes as required by the language's grammar. This requirement is almost something of a super-affix that wraps one's native language. Perhaps it is possible to make an argument that this type of affix should be classified as a phonological rule, but for now it is enough of an anomaly that we will leave the question open. Let us first examine the applicability of prefixes and suffixes in a spoken-word cipher. Both affixes are appended to one or the other end of the root, which means the entire word, and therefore meaning, can be easily deduced. Because this is such a superficial change to the root the eavesdropper is much more likely to understand the context of the conversation based on the usage of the root. The infix, however, shows much more promise.
The traditional usage of an infix is to insert the affix word segment into the middle of a word root. This could help obscure the meaning of single or even double syllable words, but when Alice attempts to say the word constitutionality to Bob the remaining six syllables of the word will contain enough familiarity to the eavesdropper that the word may be obvious. Because of this we will break with the more traditional notion of an infix word segment and propose a syllable-based solution. In this alternate usage we will break a word, both its root and its affixes, apart into syllables and insert the infix at the vowel of each syllable. Traditionally affixes have word formation rules and, although we are not quite in that paradigm, it may help to think of this requirement as our word formation rule (WFR). This method should help resolve the problem above while maintaining our requirement that the rule be quick and natural to the speaker. But what should our infix word segment look like?
The next logical question is: how can the construction of our affixes maximize our cryptological objectives? First it is helpful to understand how these affixes are analogous to classical cryptography and its usage of keys. Because modern cryptography is so mathematically complex, Alice and Bob are better suited to start where classical cryptography did. That is, if one cannot perform operations similar to modern cryptography at the speed of language, can one perform operations similar to classical cryptography at that same speed? Please keep in mind that while classical methods may be better suited to our needs they fail spectacularly against frequency analysis and techniques such as computing the index of coincidence. This drawback does not stop us from proceeding, but we should always remain wary of the result: while we may find an adequate solution for the casual listener, it is much more difficult to fool Eve or any party that is willing to record and dissect the language. It is best to think of classical cryptography as a function machine. One usually has two inputs: P[i], which represents one element of the plaintext, and K[j], which represents one element of the key. These two pieces are thrown into a function that produces C[k], which is a single element of ciphertext. On the other end C[k] and K[j] are fed into an inverse function that produces P[i], the original element of plaintext. With our syllabic rule we can say that we will pass each syllable of our native language and the key operation, which is our infix, into the function machine, and on the other end we get our cipherlanguage. Similarly the listener, aware of the infix, must listen to that cipherlanguage and deconstruct it into its original meaning. If this works and one can learn the affix rule relatively quickly, this is a true testament to innate human genius in language. Let us now examine our first affix.
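The function machine can be sketched with a classical shift cipher. The example below is only an illustration, not the spoken-word scheme itself: it assumes lowercase letters a to z as the elements, modular addition as the function, and a key that repeats when it runs out (the Vigenère arrangement). All function names are hypothetical.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def encrypt_element(p, k):
    """The function machine: plaintext element P[i] and key element K[j] in,
    ciphertext element C[k] out."""
    return ALPHABET[(ALPHABET.index(p) + ALPHABET.index(k)) % len(ALPHABET)]


def decrypt_element(c, k):
    """The inverse machine: C[k] and K[j] in, P[i] out."""
    return ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % len(ALPHABET)]


def encrypt(plaintext, key):
    # The key is reused cyclically, which is what frequency analysis
    # and the index of coincidence exploit.
    return "".join(
        encrypt_element(p, key[i % len(key)]) for i, p in enumerate(plaintext)
    )


def decrypt(ciphertext, key):
    return "".join(
        decrypt_element(c, key[i % len(key)]) for i, c in enumerate(ciphertext)
    )


if __name__ == "__main__":
    print(encrypt("attackatdawn", "lemon"))   # lxfopvefrnhr
    print(decrypt("lxfopvefrnhr", "lemon"))   # attackatdawn
```

Even this simple function requires a table lookup or arithmetic for every element, which gives some sense of why it is hard to perform while speaking.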
Perhaps the most compelling way for one to see how this process works is to walk through the process that Eve must go through when she hears Alice speak with Bob for the first time. To this end please download and listen to the following 60-second audio clip of someone speaking American English with an infix that will be described later. Feel free to listen to the clip as much as is necessary to take a few guesses at its content.
spoken word cipher sample
In cryptography the hardest break is a ciphertext-only break. This break, or its spoken-word cipher analog, is what Eve must attempt to perform. Please note any interesting results and leave them in the comments. The next type of attack in cryptography is a known-plaintext attack. In this attack the plaintext is known and one attempts to find statistical or other anomalies in the ciphertext that may lead to a solution. Now listen to the audio file again, but this time do it knowing that it is the two opening sentences of this essay. Because humans are so linguistically capable it should not be surprising if you can hear and learn the rule this way. Because the purpose of this exercise is to assess the effectiveness of the infix operation, please make a mention if you have already learned the rule. If you have not already learned the rule, which we hope you have not, it is simple. In each syllable, insert the sound [zlf] after the vowel, then repeat the vowel and finish the syllable. So cat becomes ca[zlf]at, dog becomes do[zlf]og, and butterfly becomes bu[zlf]utter[zlf]erfly[zlf]ly. Notice that butterfly does not break down as perfectly with the rule as cat or dog. This anomaly is because the infix operation does not pander to the written language, but rather seeks to preserve some semblance of the original sound for the listener. The hard and fast written rule for butterfly would look like: bu[zlf]utte[zlf]erfly[zlf]y. The reason for the difference is that when speaking the language Alice needs to preserve enough of the original sound so that Bob can remove the infix and still solve for the word. This requirement is so natural that when this author originally made the rule I violated it unconsciously and almost immediately. Listen to the audio clip two more times. First attempt to understand what is being said without reading along, and then read along knowing the rule. If you have any observations please leave them in the comments.
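The hard and fast written form of the rule can be sketched in a few lines of Python. This is an approximation under stated assumptions: it works on spelling rather than sound, it expects each word to arrive already split into syllables, it treats a run of the letters a, e, i, o, u as the vowel, and it falls back to y only when a syllable has no other vowel (as in fly). The function names are hypothetical.

```python
import re


def encode_syllable(syllable, infix="[zlf]"):
    """Insert the infix after the vowel, then repeat the vowel and
    finish the syllable: cat -> ca[zlf]at."""
    match = re.search(r"[aeiou]+", syllable) or re.search(r"y", syllable)
    if match is None:
        return syllable  # no vowel found, leave the syllable alone
    return syllable[:match.end()] + infix + syllable[match.start():]


def encode_word(syllables, infix="[zlf]"):
    """Encode a word given as a list of written syllables."""
    return "".join(encode_syllable(s, infix) for s in syllables)


def decode_word(cipherword, infix="[zlf]"):
    """Bob's side: drop the infix together with the repeated vowel."""
    pattern = r"([aeiouy]+)" + re.escape(infix) + r"\1"
    return re.sub(pattern, r"\1", cipherword)


if __name__ == "__main__":
    print(encode_word(["cat"]))                # ca[zlf]at
    print(encode_word(["but", "ter", "fly"]))  # bu[zlf]utte[zlf]erfly[zlf]y
    print(decode_word("bu[zlf]utte[zlf]erfly[zlf]y"))  # butterfly
```

The spoken form, bu[zlf]utter[zlf]erfly[zlf]ly, carries extra sound across syllable boundaries, so this written decoder does not recover it exactly. A human listener handles that variation without effort.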
So now we have one infix element that Alice and Bob can use to communicate somewhat securely. Nevertheless, Eve is aware of frequency analysis and, since she read this paper, she knows that she is looking for single syllable words that occur frequently in American English. Once she collects enough audio data she can make a frequency count of the sounds she hears and assume that the most common is the word "the". Is there an improvement that Alice and Bob can make to further aggravate Eve's attempts at a solution?
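Eve's frequency count can be sketched as follows. The sketch assumes she has already segmented her recording into word-sized units and written each one down as a string, which is the hard part in practice; the sample transcript and the function name are made up for illustration.

```python
from collections import Counter


def most_common_token(tokens):
    """Return the cipher token Eve hears most often and its count."""
    return Counter(tokens).most_common(1)[0]


if __name__ == "__main__":
    # "the cat and the dog saw the bird" spoken with the single [zlf] infix.
    heard = [
        "the[zlf]e", "ca[zlf]at", "a[zlf]and", "the[zlf]e",
        "do[zlf]og", "sa[zlf]aw", "the[zlf]e", "bi[zlf]ird",
    ]
    token, count = most_common_token(heard)
    # Eve's working assumption: the most frequent token stands for "the".
    print(token, count)  # the[zlf]e 3
```

Because the key never changes, every occurrence of a word produces the same cipher token, so the frequencies of the underlying language show through directly.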
Let us turn again to our discussion of cryptographic functions. Remember the general rule that longer and more random keys are better. What Alice and Bob have so far is a key of size one. That is, every time there is a new syllable they use the same infix word segment. Let us now discuss a more complex word formation rule, and thus extend our key length. What if we chose three other infix word segments to go with our existing [zlf] word segment? This would make the key length four, which appears manageable for Alice and Bob, but should aggravate Eve. The question is: how should we use these new word segments? The word formation rule for [zlf] appears to serve us well, but wouldn't it be difficult for both Alice and Bob to change infix word segments after every syllable? It most certainly appears to be difficult, but there is a more compelling reason not to use the infix word segments this way. If there is a rhythmic steady repetition of the new key of size four, then Eve may have an easier time breaking the cipher. Imagine two other possibilities: changing the infix at word boundaries or at sentence boundaries. The former would add a bit of entropy into the cipherlanguage because of the pseudo-random occurrence of multi-syllable words, but it may still be too difficult to construct. One benefit would be that the listener could hear word boundaries tagged clearly. The latter would most certainly be easier to construct and would have the same source of entropy, but it may lose out through the long repetition of the same sound. Clearly there are many possibilities here, and all would most certainly help to confuse cryptanalysis. Another approach would be to match our infix word segments with common endings or beginnings of words in the English language. Or one can do the opposite and find the most nonsensical segments like [zlf] above. Perhaps one could even attempt to add segments that would give the appearance of another language altogether.
For example, someone speaking Spanish may wish to construct their infix word segments to give a high occurrence of Italian sounds. There are definitely many possible morphemes worth exploring, and perhaps some other tricks not thought of here.
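The word-boundary variant with a key of size four might look like the sketch below. Only [zlf] comes from the discussion above; the other three segments are placeholders invented for the example, as are the function names. As before, the code works on spelling and expects each word as a list of syllables.

```python
import re

# Only "[zlf]" is from the essay; the other three infixes are hypothetical.
INFIXES = ["[zlf]", "[grk]", "[shm]", "[bv]"]


def encode_syllable(syllable, infix):
    """Insert the infix after the vowel, then repeat the vowel and coda."""
    match = re.search(r"[aeiou]+", syllable) or re.search(r"y", syllable)
    if match is None:
        return syllable
    return syllable[:match.end()] + infix + syllable[match.start():]


def encode_sentence(words, infixes=INFIXES):
    """Encode a sentence given as a list of words, each a list of syllables.
    The infix changes at every word boundary and stays fixed within a word."""
    encoded = []
    for i, syllables in enumerate(words):
        infix = infixes[i % len(infixes)]
        encoded.append("".join(encode_syllable(s, infix) for s in syllables))
    return " ".join(encoded)


if __name__ == "__main__":
    sentence = [["cat"], ["but", "ter", "fly"], ["dog"]]
    print(encode_sentence(sentence))
    # ca[zlf]at bu[grk]utte[grk]erfly[grk]y do[shm]og
```

Since words have varying numbers of syllables, the infix no longer repeats at a fixed rhythm in the stream of syllables, although the same word in the same position of the cycle still encodes identically.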
The last area of investigation, phonetics, includes speech perception and could be a very useful addition to what we have discussed so far. In fact, there may be an argument that much of what we have discussed above should be classified as phonetics. The ability to introduce sounds that are not part of normal speech may help to obscure the meaning of what is said; however, such sounds may also be difficult for the listener to understand. The real question is: would such sounds introduce something so unnatural to the human ear that even a priori knowledge of the key and process would not help the listener? Since this is a more involved area of linguistics we can leave it open for future discussion. For now we have the groundwork for a viable, although weak, spoken-word cipher.
So it would appear that Alice and Bob have a semi-secure means of communication. The cipher is certainly weak when the attacker can record and listen to the conversation later, but it would appear that without exploiting some interesting characteristic of the innate human ability for language one is restricted to this weaker cipher form. The fundamental problem is that cipher function mentioned above. While humans are amazing linguistic machines, the function is too restrictive. When the classical cryptographic ciphers were found lacking they were replaced with more complex and diverse functions. Our innate ability with language, although quite amazing, may be too restrictive for robust cryptographic operations. Ideally one must strive for a seemingly random occurrence of sounds in the resulting cipherlanguage of a spoken-word cipher. Perhaps speech perception and phonetics could make this type of cipher more robust. Furthermore, it should be noted that part of this innate ability humans have with language makes for great error correction. When teaching the basic infix operation above to my wife and friends I found they were less than perfect at speaking the cipher. Nevertheless, I was able to understand everything they said to me because of my ability to match sounds and context. If you, the reader, would like to open a discussion about spoken-word ciphers please do so. There may already be information in some obscure literature that you think would add to the topic. As for now, the spoken-word cipher is little more than something interesting to teach your friends and family.