On romanization

Vendors, read this.

Languages in East Asia are tough, at least for foreigners. They are some of the most difficult languages in the world to learn, and for tea drinkers who don’t speak or read them, they can be a bit of a pain to navigate. Tea names are already a problem, with vendors naming their own teas and with general confusion and a lack of oversight in tea nomenclature. It doesn’t help when romanization is itself an issue. This is more of a problem for Chinese and less so for Japanese, where romanization is pretty standard. Korean romanization can be a little weird too, with competing systems (Jeolla in Revised Romanization vs Cholla in McCune–Reischauer, for example), but since Korean teas are, let’s face it, a relatively small universe with generally better sourcing information, I’ll ignore its issues for now.

For those of you who know Chinese, you probably know that there are two main romanization systems, Wade-Giles and Pinyin. Up until the early 90s, pretty much everyone used Wade-Giles except those in Mainland China, who used Pinyin. Then things flipped: everyone started using Pinyin, and Wade-Giles was increasingly dropped, with the exception of Taiwan, which finally adopted Pinyin two years ago. The switch happened partly for political reasons, and partly because, well, a billion people can’t be wrong, I suppose. I personally reserve a special hatred for simplified characters, because much of the meaning of the proper characters was lost in the simplification process, but I realize that many people now simply cannot read proper characters, unfortunately.

Anyway, with two romanization systems and the relatively recent date of conversion, you can imagine there are issues with their usage. The problem is further complicated by two things: 1) conventions from the past, and 2) the fact that many Chinese people, especially those from Taiwan and Hong Kong, never actually learned any romanization system at all. Chinese, as you probably know, consists of characters that do not have phonetic indicators, meaning that by looking at the characters, you can’t tell how to pronounce them. It’s awful for people trying to learn the language, but it’s great for letting lots of people who speak different dialects share the same written language. All the romanizations discussed here concern standard Mandarin only.

So we’ve got two main romanization systems, a fair number of people who don’t really know either, a lot of vendors who probably know little or no Chinese, and older customary romanizations that persist. One of the most obvious and common old conventional spellings still in use today, as related to tea, is puerh instead of the Wade-Giles p’u-erh or the Pinyin pu’er. I use puerh rather than either of those. Another common one is tikwanyin, which in Wade-Giles should be t’ieh-kuan-yin and in Pinyin tieguanyin. One reason people have dropped Wade-Giles in favour of Pinyin is that Wade-Giles has finicky rules for apostrophes, which are essential for accuracy, and also for hyphens. Leaving those out, or getting them wrong, renders Wade-Giles rather useless. Pinyin’s only comparable pitfall is the apostrophe, which is easier to deal with, and errors there are often not fatal (although they are still frequent).

Pinyin also has strict rules on how to separate words. Since Chinese is character-based, it is very tempting to write every character as a separate syllable and just be done with it. Using the cake from the last post as an example:

[Image: the cake from the last post]

The two big characters are “yesheng”, or “wild”. Then, above the 2005 is “xianliangban”, or “limited edition”. After the 2005 comes “Menghai laoshu yesheng tedingcha”, or “Menghai (region) old wild tree special ordered tea”. At the bottom is “Chenguanghe tang chaye yanjiu zhongxin rongyu dingzhi”, or “Proudly ordered by the Chenguanghe Tang Tea Research Centre”. Note, of course, the nonsensical “Chen kang ho tang Pu-erh Tea”.

Now imagine the bottom row with every syllable separated (and capitalized, as is often done for reasons unknown): “Chen Guang He Tang Cha Ye Yan Jiu Zhong Xin Rong Yu Ding Zhi”. Once everything is separated, it becomes very difficult to tell where one word ends and the next begins. Part of the romanizer’s job is splitting the text into sensible word units, following the rules I linked to above. If I see a row of romanized characters all separated into individual syllables, I often need to see the Chinese original to know what I’m looking at. Properly romanized, however, it is usually quite easy to figure out what we’re dealing with.

One of the worst offenders for romanization confusion is Hou De. For example, the puerh brand Xizihao is routinely romanized as Xi-zhi Hao (finally fixed in some new 2011 listings, but the error persists in older ones). There are no hyphens in Pinyin, and no X in Wade-Giles, so this is really neither. Hou De routinely does this sort of mixing, but Guang’s certainly not the only one. In his defense, he probably never learned Pinyin, having grown up on zhuyin fuhao instead. Other vendors use capital letters where there should be none, separate words randomly, mix in Wade-Giles from time to time, or simply spell things wrong. Babelcarp has a truckload of such misspellings, each helpfully linked to the most widely used spelling.

Another issue is simpler: some vendors give you the name of the tea in translation, while others give you the name in transliteration. Biluochun and green snail spring are the same thing, but you wouldn’t know it unless you’d learned that somehow. Likewise, you will often see Keemun, the old conventional name for Qimen, on websites, teas and whatnot. Qihong is Keemun black, but again, you wouldn’t know it unless you somehow already knew.

While most people can figure out that puerh and pu’er are the same thing, it’s harder when the difference is between tikwanyin and tieguanyin, or even oolong and wulong. Ideally, we would all use the same spelling, so there would be no problems, but I choose, for example, to use oolong instead of the proper wulong romanization for accessibility, the same reason puerh is used instead of pu’er on this blog. People will very often find oolong on vendor pages, not wulong, and might wonder what wulong is when in fact it’s the same thing they’ve always had. I also thought about switching wholesale to Pinyin, but the thought of having to go back and fix past listings stops me. I suppose the only way for a consumer to wade through all this is to arm him/herself with some knowledge of Chinese, so as not to be too reliant on vendors’ proclivities. Vendors can also help by using more Chinese on their websites; it never hurts, and nowadays it’s easy to do. Unless, of course, they decide to rename their low-grade Yunnan black tea Golden Peanuts, or something.

Addendum: Sometimes I forget the original impetus for writing these things. Jakub helpfully reminded me with his comment. Proper names should be strung together into one word, with generic terms such as province, county or mountain written as a separate word. So, Yunnan province is Yunnan Sheng, Menghai county should be Menghai Xian, and Yiwu mountain should be Yiwu Shan, not Yi Wu shan or Yi Wu Shan. Gaoshan Zhai, Guafeng Zhai, or any other village is a little more ambiguous. Zhai, in this case, really means “village” or “hamlet”, or literally “stockade”, so it should be treated the same way as sheng (province) and shan (mountain) and separated from the name.
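For vendors who would rather let a script handle the mechanics, the rules above (run a name’s syllables together into one word, give it one leading capital, and keep generic terms like Shan or Xian as a separate word) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not an implementation of the full Pinyin orthography standard. It takes toneless syllables that you have already grouped into words yourself, and that grouping is exactly the part that needs some knowledge of Chinese. It also uses a simplified apostrophe rule, adding one before any non-initial syllable that starts with a, o or e, as in pu’er or Xi’an. The function names join_pinyin and romanize_name are made up for illustration.

```python
# Hypothetical helpers for illustration; not part of any library.
# Simplified rule (an assumption): an apostrophe goes before any
# non-initial syllable that begins with a, o or e.
VOWEL_INITIALS = ("a", "o", "e")


def join_pinyin(syllables, proper=False):
    """Join the toneless Pinyin syllables of ONE word into a single string."""
    parts = []
    for i, syllable in enumerate(syllables):
        syllable = syllable.lower()
        # Mark the syllable boundary, e.g. pu + er -> pu'er
        if i > 0 and syllable.startswith(VOWEL_INITIALS):
            parts.append("'")
        parts.append(syllable)
    word = "".join(parts)
    # Proper names get a single leading capital: Menghai, not MengHai
    return word.capitalize() if proper else word


def romanize_name(words):
    """Render a name given as a list of words, each a list of syllables.

    The word grouping must come from someone who can read the Chinese;
    the script cannot tell where one word ends and the next begins.
    """
    return " ".join(join_pinyin(word, proper=True) for word in words)


if __name__ == "__main__":
    print(join_pinyin(["pu", "er"]))                   # pu'er
    print(join_pinyin(["tie", "guan", "yin"]))         # tieguanyin
    print(romanize_name([["yi", "wu"], ["shan"]]))     # Yiwu Shan
    print(romanize_name([["meng", "hai"], ["xian"]]))  # Menghai Xian
```

Note that the script only handles the mechanical part: deciding that Yiwu is one word and Shan another is still a human job, which is the whole point of learning a little Chinese.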


Comments

On romanization — 25 Comments

  1. Golden peanuts sounds great :) But after visiting the website of a wine shop named “Fat bastard” yesterday, it does not sound as weird as it could.

    How should one split words of villages/regions? E.g., Yiwu/Yi Wu; Gao Shan Zhai/Gaoshan Zhai/Gaoshanzhai;… I am constantly fighting chaos when writing these and the chaos still keeps winning.
    Jakub

    • If you don’t know some Chinese then I don’t know how you will learn the intuition for using a system of writing Chinese phonetically; I think it’s better not to worry about it, or else learn some basic Chinese.
      Really, the biggest problem is incomplete translation (I agree with Marshall). Actually, a lot of people know Tieguanyin is the iron goddess of compassion; biluochun is not quite so famous, so you wouldn’t know the translated name. As for mundane names like “mountain,” if you don’t know what Yiwu means, then I don’t understand why you need to keep the word for mountain. Why not just say Yiwu Mountain, or Li mountain (in fact, I don’t even know why the latter has persisted as this in English, rather than simply Pear Mountain, a very easily translated tea-mountain!)

      • Yes, it’s indeed difficult if you don’t know any Chinese. When you become a vendor, however, and especially when you become a vendor selling name sensitive things like puerh, then learning a modicum of Chinese is pretty important, because learning how to parse Chinese in a sentence makes all the difference in how you translate the name of the tea, for example. When minor differences in names mean major differences in prices, then I think getting the language right is crucial.

        Yiwu mountain is pretty acceptable, and so is Li mountain (or Mount Li, as some will write it). Pear Mountain might be a bit of a stretch, partly because you often have multiple words in Chinese meaning the same thing, so translating the meaning of names is usually not a good idea, especially if the origin is not very clear. You also can’t do it for some names and not others. Yiwu would end up as something silly like “Shifting Martial”, which is entirely nonsensical and was never supposed to be the meaning to begin with (Yiwu itself is a transliteration of the sound of what locals called the place). You don’t really want Menghai County to become “Brave Sea County” and Shanghai to become “Up Sea”. So stick with the sounds.

  2. People who sell Chinese teas should at least take a few lessons so they know a little of what they are talking about, and then choose to use Pinyin or Wade-Giles… because too many of them know absolutely nothing about this. They should also be able to pronounce Chinese tea names properly. Even if not everything is perfect, making the effort to be informed should be a priority.

    And yes, certainly some people think Keemun and Qimen, or oolong and wulong are different kinds of tea. Quite a mess!

    Very interesting post, thanks.
    Charlotte

  3. Your mention of how speakers of mutually unintelligible dialects can communicate in writing using Chinese characters put a strange thought in my head. The Chinese government is trying pretty hard to get everyone it rules to speak Beijing dialect, more or less. If and when it succeeds, and Cantonese and all the Fujian dialects are moribund, Chinese characters will no longer be needed to bridge dialects. Then can we (or our descendants!) expect Chinese language reformers to revive the idea of using a phonetic, alphabetic writing system? (Pinyin, I would guess?)

    • No, when your language only has something like 400 sounds (plus tones, so make that about 1300-1400 sounds since not all sounds exist in all tones) writing in romanized form creates all kinds of problems. Japanese can’t even ditch kanji without causing comprehension issues.

    • While speech and writing resemble each other, they’re not the same. For starters, you have to sound out everything. Have you ever tried reading a Japanese text heavy with katakana loan-words? It’s hell.

  4. Sorry, I can’t comment on Japanese – I have only the dimmest bird’s eye view of that language. So could we please talk about Mandarin?

    Let’s say you’re a competent speaker of Mandarin and you have learned Pinyin well. Reading well-rendered Pinyin, you get all the sounds of the individual characters minus the distortions that come from sloppy speaking and background noise. You get the prosody of perfectly spoken sentences as well, because there are spaces between multisyllabic words, and there’s punctuation, too. You can even go back to the beginning of the sentence you’re trying to parse without straining your memory. So how is this harder to parse than speech?

    • Because it’s slow, and there’s ambiguity. I think it’s probably fine with very simple sentences and if you’re trying to render speech, but if you get into the territory of newspaper writing (or even more advanced stuff) then all of a sudden you have a lot of possibilities for what the words might be, and you’ll be none the wiser. Throw in some foreign words, names, etc, and you’ll be completely lost.

  5. “If you’re trying to render speech”: I take it you agree that Pinyin should be as easy to parse as speech.

    It’s true that there’s lots of written discourse that’s never destined to vibrate inside people’s ears. But even there, the advantages that Chinese characters have over Pinyin don’t seem as imposing to me as they do to you.

    While a monosyllabic pinyin word is ambiguous, most Mandarin words are multisyllabic, which drastically diminishes the ambiguity.

    In Pinyin, the work of delineating word boundaries has been done already, so the person reading the text needn’t worry about that. Consider how many mistakes about word boundaries get made by Google Translate et al.

    There are lots of ambiguous Chinese characters, too. Parsing text in Chinese characters or in Pinyin, you need context, of course.

    Regarding foreign names, I would think that could actually be an advantage for Pinyin: you could just “spell them right” in their language of origin, couldn’t you?

    • I think you vastly underestimate the ease with which native readers of Chinese can read a text.

      Let’s use this example we had above: “Menghai laoshu yesheng tedingcha”

      Let’s say you saw this without any reference to anything else. Just that line on a piece of paper. What is it?

      Without tone markers, it’s almost impossible to figure that out. If you don’t know that a place called Menghai exists, you wouldn’t know what it is. Laoshu is most likely mice. Yesheng is most likely wild, so wild mice? But the order is wrong. Tedingcha…. no idea. Special top tea? Special order tea? Something else entirely? How can you be sure?

      I know you’re going to say “but there are tone markers, and it’ll solve all problems”. Well, ok, but then you basically have to read out every sentence to figure out what it is. Is that really how you read when you read English? You sound out everything? I don’t think so. Even then, there are words that are entirely congruent in sound but differ in meaning. In cases when they are nouns, how do you know which one it is? Context? What if the context isn’t clear?

      Besides, regional differences are never going to go away, so it’s all a moot point.

      I brought up Japanese because it’s the one language where both kanji and a phonetic system exist together, in the same language and used together. Put simply, when Japanese is rendered purely in the phonetic hiragana, it can be almost nonsensical. It’s not impossible to read, but it’s much slower and much harder to read.

      I didn’t say pinyin should be as easy to parse as speech. I said you can probably figure out what’s being written if you’re trying to render speech – I said nothing of how long it might take or how much difficulty you might have to go through.

      Why don’t we just use English, or Esperanto? So much easier. Or we can just accept that certain languages are the way they are. Why is fish spelled fish and not ghoti? Does it really need a reason?

  6. At this point in history, I think it would be very hard to determine how fluent a native Chinese speaker could be at reading Pinyin, for how would you find native speakers who habitually read Pinyin? You’d have to spend a lot of time and effort training them, wouldn’t you?

    To say “we can just accept that certain languages are the way they are” is true, I guess, but it’s a partial truth (like most truths!) For languages do change. What prompted our interchange here was me thinking about the Chinese government’s attempts to stomp out the “regional differences” you think “are never going to go away”. They may succeed or fail, but don’t you think they’ve already had *some* success? Aren’t there lots more people in regions far from Beijing now who speak Mandarin well than, say, 30 years ago?

    And over the centuries and millennia, pronunciation drifts. This is why alphabetic languages like English have words like “laugh” that can hardly be called phonetic. But it’s also why, in lots of Chinese characters that are thought of as having a phonetic component, that component was a hell of a lot more suggestive thousands of years ago when the character was codified than it is now. So in that respect, Chinese characters keep getting harder to learn as time goes by, and that goes for little kids in school in Wuhan as much as it does for superannuated laowai like me.

    I guess what I’m saying is, for both English and Chinese, I can’t imagine living languages persisting for, say, ten thousand years without some kind of spelling reform – they would just become too difficult for a normal kid to learn to read and write.

    Thanks very much for engaging me with these increasingly off-topic thoughts!

  7. That isn’t *exactly* what I’m suggesting, especially by now as you’ve prodded me to develop my thoughts during this exchange.

    A good writing system for a language should be stable over time, so you could read texts from hundreds or thousands of years ago without being a philologist.

    A good writing system should be easy enough to learn that literacy would be achievable by the great majority of people.

    Trouble is, those two desiderata become increasingly incompatible over time as pronunciation drifts. And this, I think, applies to *all* languages, unless someone can think of a living language whose writing system is devoid of all hints to pronunciation.

    • “A good writing system for a language should be stable over time, so you could read texts from hundreds or thousands of years ago without being a philologist.”

      Check: my students in the US who’ve taken a few years of Chinese can make out characters from a Qin dynasty inscription that I use in class. That’s 2200 years ago. My first and second year students in Hong Kong can make out quite a few of those words and, when presented with a transcription of the text in more legible form (not a modern translation, just the same words in a modern typeface), can understand its meaning. These are people from all kinds of majors, not students specializing in history or literature (in fact, my school doesn’t even have those programs). Their comprehension is not perfect, but this shows that native speakers with a high school education can read 2,200-year-old Chinese texts somewhat competently, and certainly with the aid of a dictionary.

      “A good writing system should be easy enough to learn that literacy would be achievable by the great majority of people.”

      Literacy rates in China, Taiwan, and Hong Kong are all over 90%. Granted, this is not as high as in many other places, but you also need to factor in the significant portion of the population that grew up in political and economic turmoil and was thus deprived of a chance to get properly educated. Literacy among youngsters nowadays is almost universal. I don’t think Chinese is any more difficult than other languages for native speakers; for foreigners, sure, but I don’t think that’s an important consideration.

      “unless someone can think of a living language whose writing system is devoid of all hints to pronunciation.”

      I was under the impression you are studying Chinese, which means that you should be aware that trying to guess the reading of characters is a losing proposition. Chinese is basically devoid of pronunciation hints. You can, indeed, try guessing by looking at one side of the character, but it’s not accurate by any stretch of the imagination, and sometimes you can get it spectacularly wrong. It’s not a language you can learn to pronounce without also learning the words, unlike, say, Italian, which you can read out loud without having any clue what you’re reading.

  8. Sure, the phonetic component that occurs in 80% of all Chinese characters is not an infallible guide to how those characters are pronounced. But achieving literacy in Chinese is hard, and learners use whatever helps. Do you really think that Chinese schoolchildren ignore the fact that “湖”, “糊”, “蝴”, “葫”, “瑚”, “猢” all sound the same? There’s a lot of scientific evidence for the fact that native speakers use phonetic components of characters to learn to read and write. See

    http://psychbrain.bnu.edu.cn/teachcms/res_base/teachcms/upload/channel/file/2010_4/12_23/6vaogi1k6dqo.pdf

    for a summary by a researcher from Beijing Normal University.

    So, if you accept that help from the phonetic components of Chinese characters is important in how native speakers learn to read and write, if you take a long view of Chinese (centuries, millennia), you have to wonder if Chinese literacy will become a harder skill to attain as pronunciations of words change. (I am aware that this goes double for pure phonetic systems, whether Pinyin for Chinese or current orthography for English, in the absence of periodic spelling reforms!)

    • But how do you know what 胡 sounds like to begin with? Can you tell what it sounds like by looking at the character? Gu? Yue? You can’t; you just have to know. Reading the side only gets you so far. Yes, of course students learn using these clues, but they also become aware, rather quickly, that these clues are just that, clues, and are by no means foolproof. The paper you linked to shows that students develop this awareness, but your logic in your last part

      “So, if you accept that help from the phonetic components of Chinese characters is important in how native speakers learn to read and write, if you take a long view of Chinese (centuries, millennia), you have to wonder if Chinese literacy will become a harder skill to attain as pronunciations of words change. (I am aware that this goes double for pure phonetic systems, whether Pinyin for Chinese or current orthography for English, in the absence of periodic spelling reforms!)”

      is quite flawed. Your A (that phonetic components help Chinese learning) doesn’t automatically lead to your B (that Chinese becomes harder to learn over time as pronunciation shifts). There are a few reasons: it is quite possible, for example, that the sounds evolve together, so the set of sounds for a family of characters, such as the one you cited, evolves in the same way and thus retains the same sound, even though the sound itself shifted. Clearly, the characters you include here are all at least centuries old, most of them dating back probably 2-3000 years. If any tonal shift has happened (and we know lots of tonal shift has happened since these characters were created) it hasn’t impacted this particular family of words, which then seems to invalidate your concern. After all, if a thousand years is not enough to cause problems, and human written history is only 3-4000 years old, it’s really not something worth worrying about.

      Also, there are plenty of families of characters that share the same phonetic clue but do not share the same sound. For example: 趁 診 珍 參. All have different sounds (the last one having two), even though they share the same phonetic clue. So your example is merely one that works; there are also quite a few that don’t. They may very well have been products of tonal shift, but as I’ve already mentioned, your assertion that this is making Chinese more difficult to learn over time is hard to prove and is just an assertion. That hasn’t stopped these characters from being some of the most basic words you need to learn in order to be functional in Chinese, and millions of native speakers and foreigners learn them every year without apparent ill-effects. The written form of these characters actually allows for tonal shifts without causing real problems: you just learn that these words are pronounced a certain way, and you have to remember that. It’s just like how you don’t try to pronounce 胡 as gu or yue; you just know. The language inherently allows for changes and variations in pronunciation, which is why someone in Hong Kong and someone in Beijing can both read the same text and not understand each other’s speech, but both know exactly what the text said. But you knew that already.

      The problem you suggested does exist, and I’ve personally encountered it when learning Korean. Hangul is phonetic, and it’s a very good phonetic system that catches a lot of sounds. What it also does, however, is ossify some sounds from centuries ago in the written language, so that finals that have been dropped in speech still appear in writing. It does make the language harder to learn, and I often forget to add those consonants because you don’t actually say them. Chinese doesn’t have that problem, because the characters do not tell you specifically how to say them: what the phonetic clues tell you is which other words might sound similar, not how to pronounce them. That’s very different.

      It’s quite obvious that with more resources, Chinese literacy has become easier, not harder, to acquire. The big reform that took place was in the early 20th century, switching from literary Chinese to modern baihua, which is neither here nor there in terms of approximation of speech, but in any case is closer to how Chinese is used in daily life. That has certainly helped literacy as well. Chinese is not a written phonetic language. It has phonetic components, but is a far cry from many others that use alphabets or syllabaries.

      • “Your A (that phonetic components help Chinese learning) doesn’t automatically lead to your B (that Chinese become harder to learn over time as pronunciation shifts). There are a few reasons: it is quite possible, for example, that the sounds evolve together, so a set of sounds for a family of characters, such as the one you cited, evolve in the same way and thus retain the same sound, even though the sound itself shifted.”

        Yes, that’s true: phonetic groups that change en bloc don’t make learning the language harder over time.

        “[…]Also, there are plenty of families of characters that share the same phonetic clue but do not share the same sound. For example: 趁 診 珍 參. All have different sounds (the last word having two), even though they share the same phonetic clue. So your example is merely one that works, there are also quite a few that don’t work. They may very well have been products of tonal shift,”

        Yes, and once the pronunciations for members of a group diverge from one another, they’re unlikely to get reunified, I would think. That’s why I surmise the characters get harder to learn over time as long as they don’t change.

        “but as I’ve already mentioned, your assertion that this is making Chinese more difficult to learn over time is hard to prove and is just an assertion.”

        True.

        “That hasn’t stopped these characters from being some of the most basic words you need to learn in order to be functional in Chinese, and millions of native speakers and foreigners learn it every year without apparent ill-effects.”

        Depends on how you define “ill-effects.” The study I cited above shows kids have a harder time learning characters like the 趁 診 珍 參 group.

        It occurs to me that, due to the complicated history of pronunciation shifts that led to the dialects that exist today, some of them may be less regular with respect to character phonetic components than others. According to my surmise, this would make it harder to learn to read and write in them. With a wide enough range of variation in character phonetic regularity, that would make for a natural experiment. I wonder if anyone has studied Chinese literacy acquisition in different dialects.

        • I think you’re focusing too narrowly on the link between pronunciation and orthography in language acquisition, and ignoring the other issues that make Chinese a difficult language to learn: heavy reliance on context, relatively loose grammar rules, orthography, regional variations, etc. These are all related issues, and I doubt anyone can actually separate them and study one in isolation. Cantonese, for example, has some grammar rules that differ from Mandarin’s. Cantonese speakers are also learning a written language that is a bit further from their speech in terms of word choice and grammar, so it’s going to be harder, but for reasons unrelated to what you’re concerned about.

  9. Pingback: On names and translations | Peonyts' blog
