Language Is Sneakier Than You Might Think.

As promised, I’m going to talk a little bit about the other side of that arts/science divide that I keep seeing. Notice, please, that I’m not going into any particular depth here. I’m just trying to illustrate what I see as a genuine problem: that the people who handle language well aren’t communicating with the people who handle science well, and vice versa.

In the last post, I pointed out that a little attention to mathematicians and the definition of ‘information’ might have saved a few decades of academic tooth-gnashing and grief if it had been applied to the question of “the source of meaning in a text”. Simply enough, a problem which built and razed careers in literary academia over a sizable chunk of the 20th century becomes trivial when observed through the lens of basic information theory.

On the other hand, there’s no reason for scientists to be smug.

One of the problems facing the peer-review process in science today is the need for competent peer reviewers — that is, people who can handle the high-end technical jargon which is more or less necessary to describe some of the cutting-edge work going on today. In fact, the tendency towards impenetrable jargon has become so problematic that a bunch of wags at MIT decided they’d do something which would highlight the issue.

Enter SCIgen, and its twin sister Mathgen.

These two programs generate research papers. SCIgen handles computer science research, and Mathgen handles mathematics. The papers contain plenty of information, but there’s bugger-all meaning to them from the perspective of, you know, actual science or mathematics. (Hey. Information is just information, after all. It doesn’t have to be valuable, useful, or even in any fashion sensible.)

Now, you might think that computer-generated science papers wouldn’t fool anybody. According to what I’ve read, the basic method used by the computer is simply to take nouns from the appropriate field of science, and hook them up more or less grammatically with verbs and descriptors from the same field of science. Gibberish, in other words.
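To make that concrete, here is a minimal sketch of the noun-and-verb trick in Python. It is a toy illustration only, not SCIgen's or Mathgen's actual code: the word lists and the names fake_sentence and fake_abstract are invented for the example, and the real programs are presumably a good deal more elaborate.

```python
import random

# Hypothetical vocabulary, invented for this sketch. A real generator
# would draw on a much larger, field-specific word list.
NOUNS = ["manifold", "functor", "subgroup", "isomorphism", "topos"]
VERBS = ["induces", "extends", "characterizes", "bounds"]
ADJECTIVES = ["canonical", "negative", "independent", "compact"]


def fake_sentence(rng):
    """Build one grammatical but meaningless sentence from the word lists."""
    subject = f"{rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}"
    obj = f"{rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}"
    return f"Every {subject} {rng.choice(VERBS)} the {obj}."


def fake_abstract(n_sentences=3, seed=None):
    """Join several unrelated sentences into something abstract-shaped."""
    rng = random.Random(seed)
    return " ".join(fake_sentence(rng) for _ in range(n_sentences))


if __name__ == "__main__":
    print(fake_abstract(3, seed=42))
```

Every sentence this produces parses perfectly well as English, and no sentence has anything to do with the one before it. Hard to imagine that sort of thing getting past a reviewer.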

Unfortunately, you’d be wrong. Here’s an article from Nature discussing the discovery and subsequent withdrawal from publication of 120 such “papers”. Yep – they’d all been through the peer review process, apparently… Admittedly, most of ’em appeared at conferences in China, which apparently has even more publish-or-perish pressures than we do, but still. A hundred and twenty?

Worse: the papers were detected not by alert reviewers or readers, but by another computer program which essentially did nothing more than check the papers for similarities to existing SCIgen articles, much the way that anti-plagiarism routines operate.
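A similarity check of that general kind can be sketched in a few lines of Python. To be clear, this is not the detector that caught the 120 papers; it is a generic plagiarism-style comparison, assuming overlapping three-word "shingles" and a Jaccard score, and the function names and the 0.3 threshold are made up for illustration.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_generated(candidate, known_fakes, threshold=0.3):
    """Flag a text that overlaps heavily with any known generated sample.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    cand = shingles(candidate)
    return any(jaccard(cand, shingles(fake)) >= threshold for fake in known_fakes)


if __name__ == "__main__":
    known = ["the canonical manifold induces a compact functor in every case"]
    print(looks_generated(
        "the canonical manifold induces a compact functor in most cases", known))
    print(looks_generated(
        "we measured the boiling point of water at sea level", known))
```

Note what a check like this never does: it never asks whether the candidate text makes sense. It only asks whether the text resembles something already known to be fake.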

Now, here’s the thing: this is a quote from the London Review of Books on a Mathgen-created paper which sneaked into Advances In Pure Mathematics. (The title of the bogus paper, by the way, is “Independent, Negative, Canonically Turing Arrows of Equations and Problems in Applied Formal PDE”, which really ought to be a giveaway in itself.) The quote:

“Each of these sentences [of the paper] contains mathematical nouns linked by the verbs mathematicians use, but the sentences scarcely connect with each other.”


What brought all this to my attention was another article — which I have unfortunately lost in the great Sea of Webbiness — which discussed the matter in worried tones, and provided fifteen samples from a collection of papers which included both legit work, and SCIgen masterpieces. According to the author of the article, discerning between them was supposed to be challenging even for those with expertise in the appropriate jargon.

That could be possible. Not myself being a master of cutting-edge mathematical or computer science language, I just, you know… read the samples. As language. I looked at nouns, verbs, adjectives and the bits that joined them together. And in a very short space of time, I’d successfully sorted all the ring-ins from the real works — even if I still didn’t understand what the real works were talking about.

It was actually pretty easy.

My friend Sharon B, far higher on the literary and academic tree than ever I shall be, remarked that she helped her husband edit his thesis (in organic chemistry) in the same manner. She just followed the rules of English, and ensured that his sentences actually made some kind of grammatical sense. (This is not easy. Her husband has a bit of a love-hate relationship with grammar at the best of times. Combine that with high-end chemistry, and it’s easy to imagine that the experience must have placed something of a strain on relations in the household…)

Now, consider what has taken place here. On one hand, someone like me (or Sharon) can sit down and make a pretty fair stab at sorting the crap from the crunchy stuff without really putting a lot of effort into it. On the other hand, the computer science world has decided that the right approach to the Menace of SCIgen is to create a computer program that looks for SCIgen stuff.

And there you go. That’s precisely the point I’ve been trying to make. The science folks don’t seem to be aware that the language people can bring insight and value to the process of communication. Meanwhile, the language folks don’t seem to have much interest in considering the input from the science and math folks.

I do get it, to an extent. There are times when the utter dipstickery of self-indulgent post-modernism makes me want to lace up a pair of steel-capped boots and practice a tap-dance routine on an extensive array of highly respected academic dental work. And there are times when I come across papers in science so abstruse and so trivial that I cannot conceive of any sane person undertaking the research in the first place. How much more difficult must it be for someone who has built a career on one side of the divide to find the time to take an interest in the other?

Nevertheless, I’m thinking maybe it’s about time somebody did…



One comment

  1. I like working in the divide, and I think it is essential. A strong command of language is invaluable in the marketing of ideas and information; research shows us that stories are far more effective than facts at communicating important points and actually triggering a change in response to the new data. Unfortunately the current response of science to communication problems is to throw more facts into the argument instead of investigating ways to convince the broader community of the ideas supported by said facts.

    It’s one of the big reasons why I write my blog, which is designed to use personal anecdote and pretty photos to communicate ideas about sustainability and environmental science to a non-scientific audience, with aspirations of triggering behavioural changes at the personal level (i.e. people stopping to think about how they do daily things, and hopefully deciding to do them in more thoughtful ways).

    Good use of language is a joy, and a powerful story will always resonate further than powerful facts.
