Tuesday, 10 June 2008

Voynich linguistic disproof!

Possibly as a byproduct of all the philosophy of science lectures I once endured, I've got a bit of a soft spot for Karl Popper. Basically, a Popperian approach to science involves constructing cunning weapons of disproof to chop down falsifiable hypotheses, where the "last man standing" is your current best bet at the truth. This is not unlike a somewhat formalized version of Conan Doyle's "When you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth".

To be honest, Conan Doyle's version is a tad sucky, as it assumes (to allow Sherlock Holmes to ever solve anything) that you are able to generate all possible explanations, in order that your process of elimination-by-disproof can ultimately iterate to the One True Truth. In the real world, however, an imaginative scientist should be able to conjure up candidate explanations at a faster rate than they could ever practically be tested.

Another very significant problem is the economic cost of constructing cunning weapons of disproof that will demonstrate that hypothesis X cannot be true. Doing this for even a single case can be very hard, let alone for situations where there are hundreds of possibilities.

Yet in practice the scientific method typically works to an abysmally lower standard of proof, looking merely for a persuasive mental model and some correlative statistics to back it up. Basically, the scientific method makes Bad Science easy to do, because you haven't got Karl Popper peering over your shoulder saying "there are no proofs, only disproofs, and you haven't disproved anything."

All of which is simply to help paint a picture of the lamentable situation that studies of the Voynich Manuscript have been in for so long, where there are not only countless imaginative hypotheses to deal with, but also few if any Popperian tools of disproof. This has meant that people can (and do) make pretty much any pseudo-scientific assertion about the VMs they like, and nobody can (without invoking particularly arcane statistical arguments which only a tiny minority can easily understand) tell them they're definitively wrong.

Until now.

Voynich researcher Marke Fincher has long been fascinated by Voynichese words' strange behaviour, and how it differs from the behaviour of words in real languages (such as Latin, French, Swahili, etc). Yet nobody had devised a way of making this difference visible.

But recently Marke developed a programme called WPPA which allows a lot of this structure to be made visible. In particular, Marke showed that real languages have an implicit word association structure, whereby recurring pairs of words can be found not only next to each other, but also at a certain distance from each other. Word pairs also largely prefer a particular order: Marke points out that "and the" is very much more frequent in English than "the and".
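Marke's WPPA program itself isn't reproduced here, so what follows is only a rough sketch of the kind of measurement being described, using a made-up pair-order score of my own. The function names `pair_counts` and `order_asymmetry` and the `min_count` threshold are all hypothetical, not WPPA's. It counts how often word b turns up exactly d words after word a, and then asks how strongly each recurring pair prefers one order over the other: 0.0 means no preference at all, 1.0 means every recurring pair only ever appears one way round.

```python
from collections import Counter

def pair_counts(words, d):
    """Count ordered pairs (a, b) where word b occurs exactly d words after word a."""
    return Counter(zip(words, words[d:]))

def order_asymmetry(words, d=1, min_count=2):
    """Hypothetical pair-order score: weighted mean of |n(a,b) - n(b,a)| / (n(a,b) + n(b,a)).

    Only unordered pairs seen at least min_count times (in either order) are scored.
    0.0: recurring pairs appear equally often both ways round.
    1.0: every recurring pair only ever appears in one order.
    """
    counts = pair_counts(words, d)
    num = den = 0
    # Visit each unordered pair {a, b} once, skipping pairs of identical words.
    for a, b in {tuple(sorted(p)) for p in counts if p[0] != p[1]}:
        n_ab, n_ba = counts[(a, b)], counts[(b, a)]
        if n_ab + n_ba >= min_count:
            num += abs(n_ab - n_ba)
            den += n_ab + n_ba
    return num / den if den else 0.0

words = "and the cat and the dog and the bird".split()
print(order_asymmetry(words, 1))  # 1.0: "and the" occurs 3 times, "the and" never

# One score per distance d gives a crude per-distance profile.
profile = [order_asymmetry(words, d) for d in range(1, 4)]
```

Running the score over a range of distances gives a crude curve, though I'm only assuming that this is broadly comparable to what WPPA actually plots.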

His paper shows plots taken from a number of languages, which (when taken together) show what you might call a meta-linguistic curve, a statistical behaviour shape that is followed by basically all the real languages he has tried: an expression of languageness, in terms of the patterns of behaviour you'd fully expect to see in texts written in real-world languages.

But Voynichese does not display these curves: and so isn't a simple language.

Any, errrm, cunning linguist who thinks they have a sample of a little-known language which somehow bucks this trend is free to email Marke Fincher for a copy of his WPPA program (or you can just send him a copy of the text). But you know, I think he's not going to be dreadfully surprised by his inbox any day soon.

And not only is Voynichese not a simple language, it is also not a simple language written right-to-left, nor a simple substitution cipher of any sort (including simple verbose ciphers), nor a consistent intra-word transposition cipher (like a reverse anagram cipher), because none of these would alter the underlying language's basic linguistic curve.
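To see why these transformations can't help, here's a toy demonstration using the same sort of made-up pair-order score (again my own hypothetical measure, not WPPA; `substitute_letters` and `reverse_each_word` are illustrative names too). A letter-for-letter substitution, spelling every word backwards, and reading the whole text right-to-left all leave the score exactly unchanged. The first two merely relabel each word as a unique new "word", while the third simply swaps every pair's forward and reverse counts, which the score treats symmetrically.

```python
import random
import string
from collections import Counter

def order_asymmetry(words, d=1, min_count=2):
    """Hypothetical score: weighted mean of |n(a,b) - n(b,a)| / (n(a,b) + n(b,a))."""
    counts = Counter(zip(words, words[d:]))
    num = den = 0
    # Visit each unordered pair {a, b} once (pairs of identical words are skipped).
    for a, b in {tuple(sorted(p)) for p in counts if p[0] != p[1]}:
        n_ab, n_ba = counts[(a, b)], counts[(b, a)]
        if n_ab + n_ba >= min_count:
            num += abs(n_ab - n_ba)
            den += n_ab + n_ba
    return num / den if den else 0.0

def substitute_letters(words, seed=0):
    """Simple monoalphabetic substitution cipher with a random (seeded) key."""
    letters = string.ascii_lowercase
    key = list(letters)
    random.Random(seed).shuffle(key)
    table = str.maketrans(letters, "".join(key))
    return [w.translate(table) for w in words]

def reverse_each_word(words):
    """A consistent intra-word transposition: spell every word backwards."""
    return [w[::-1] for w in words]

text = "the cat sat on the mat and the dog sat on the cat".split()
base = order_asymmetry(text)
print(base == order_asymmetry(substitute_letters(text)))  # True
print(base == order_asymmetry(reverse_each_word(text)))   # True
print(base == order_asymmetry(text[::-1]))                # True: read right-to-left
```

A verbose cipher that can be decoded unambiguously (say, one with fixed-length codes) is also just a one-to-one relabelling of words, so it falls in exactly the same way.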

For years, people have endlessly debated whether the nature of Voynichese is that of a cipher or that of an unknown language: cryptology vs linguistics. Well, Marke Fincher has now given us all a cunning piece of Popperian machinery of disproof with which to rule out basically all simple language conjectures, and a lot of simple cipher theories too.

This is great, because if someone now tries to convince you (for whatever reason) that the VMs is in Middle High German, Hebrew, Celtic, Shelta Thari or whatever but written in a funny way, you can wholeheartedly say: sorry, but no. Voynichese words don't work like any known language in several key ways, and that's that.

Moving ever forward, there is one thing I suspect that Marke should perhaps now consider: whether the fact that Voynichese word pairs appear pretty much as often forward as reversed (which isn't true of languages at all) is part of the "specification" (as it were) of Voynichese, or whether some lines (say, even-numbered lines within paragraphs?) might be word-ordered from right-to-left (i.e. some kind of boustrophedon word-ordering). That is, whether Voynichese's symmetrical reversibility might actually have a word-transpositional explanation.
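A boustrophedon explanation would be straightforward to test, at least in principle. Here's a minimal sketch, assuming the text has already been split into lines of words and that it's the even-numbered lines (2nd, 4th, ...) of each paragraph that run right-to-left; the function names `line_asymmetry` and `flip_even_lines` are purely hypothetical. If the hypothesis were right, undoing the reversal should restore the strong order preference that real languages show. The toy example fakes this by "writing" an English-like paragraph boustrophedon-style and then undoing it.

```python
from collections import Counter

def line_asymmetry(lines, d=1, min_count=2):
    """Hypothetical pair-order score, counting pairs within each line only."""
    counts = Counter()
    for line in lines:
        counts.update(zip(line, line[d:]))  # no pair straddles a line break
    num = den = 0
    # Visit each unordered pair {a, b} once, skipping pairs of identical words.
    for a, b in {tuple(sorted(p)) for p in counts if p[0] != p[1]}:
        n_ab, n_ba = counts[(a, b)], counts[(b, a)]
        if n_ab + n_ba >= min_count:
            num += abs(n_ab - n_ba)
            den += n_ab + n_ba
    return num / den if den else 0.0

def flip_even_lines(lines):
    """Reverse the word order of the 2nd, 4th, 6th... lines.

    Applying this twice gives back the original, so the same function both
    simulates boustrophedon writing and undoes it.
    """
    return [line[::-1] if i % 2 == 1 else line for i, line in enumerate(lines)]

paragraph = [s.split() for s in [
    "and the cat sat",
    "and the dog ran",
    "and the cow ate",
    "and the pig slept",
]]
written = flip_even_lines(paragraph)  # as a boustrophedon scribe might write it
print(line_asymmetry(written))                   # 0.0: "and the" / "the and" balance out
print(line_asymmetry(flip_even_lines(written)))  # 1.0: order preference restored
```

With real Voynichese you'd of course need a large transcription and some care over what counts as a paragraph, so treat this as a sketch of the test rather than a result.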

Some people may think that being able to disprove things is no big deal: but I think it's actually a very big deal indeed. Karl Popper would be proud!
