## 251 - Unicode fun

While hunting for a quincunx [ⵘ] to use in explaining the use of fractions in ancient Rome, I found there is a very large number of symbols I might be expected to know about, but don’t.  So I thought I’d share some. The result might belong more in the Maths than the Writing, but I’ll accept opinions on that after publication.

0. The four ancient elements in the U+1F7nn series, 🜁🜂🜃🜄, are followed by further alchemical symbols, so I challenge you to supply the chemical formulae for these (and that’s as much help as you get):

1.    🜔  🜛  🜺  🝁  🜚   and ‘harder’ :   🜹    🜦    🜿

2.  Here is a different set of fundamental elements to identify, the “Five Phases”: 木 火 土 金 水

There are some lovely little pictures, which may or may not reproduce on your screen: from 1F9nn 🦆🦕🦐🥑🦇 to 1F6nn 🚴🚼 🛒 🛠 🚂, and from 1F3nn and 1F4nn 🏃🐄 🍳🏐🐖🌏.

3. You might guess these:       الرياح النار الأرض والماء            What is the name of the language?

What is the name of the language here?        زمین آگ ہوا اور پانی

4. Here is a different set of four things in different character sets. I suggest you decipher one and then conclude the remainder. Identify the language or character set.

उत्तर  दक्षिण  पूर्व/पूरब पश्चिम             উত্তর  দক্ষিণ  পূর্ব্ব  পশ্চিম         الشمال والجنوب والشرق والغرب

There is a private use area in Unicode, in the zones F0nnn and 10nnnn (‘planes’ 15 and 16). The same document explains that there are exactly 66 non-characters, which seems an oxymoron. You might wonder about those, and about sentinels; the link explains enough to set you going.
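The 66 can be checked by arithmetic: a run of 32 code points at U+FDD0 to U+FDEF, plus the last two code points of each of the 17 planes, which makes 34 more. A minimal sketch, assuming Python 3 and its standard `unicodedata` module; the function name `noncharacters` is made up for this illustration:

```python
import unicodedata

def noncharacters():
    """Return the code points that Unicode reserves as non-characters."""
    points = list(range(0xFDD0, 0xFDF0))          # U+FDD0..U+FDEF: 32 of them
    for plane in range(17):                       # planes 0 to 16
        base = plane << 16                        # first code point of the plane
        points += [base | 0xFFFE, base | 0xFFFF]  # last two of each plane
    return points

nonchars = noncharacters()
print(len(nonchars))  # 32 + 17 * 2

# General category: "Cn" means not assigned, "Co" means private use.
print(unicodedata.category(chr(0xFDD0)))    # a non-character
print(unicodedata.category(chr(0xF0000)))   # first code point of plane 15
print(unicodedata.category(chr(0x100000)))  # first code point of plane 16
```

So a non-character is still a code point; it is simply one that will never be given a character, which makes the oxymoron a little less odd.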

#### Punctuation marks

¡ !   ⸘ ‽ “ ” ‘  ‛ ‟  .  ‚ „ ‘     ˝ ^ ° ¸ ˛ ¨ ` ˙ ˚ ª º … : ; & _ ¯ – —  # ⁊ ¶  ‡ @ % ‰ ‱ ¦ | /  \ˉ ˆ ˘  - ‒ ~ * ‼︎ ⁇  ⁉︎ ❛ ❜ ❝ ❞ ❢ ❣    ❡ ⸎ ⸐ ⸑ ⸓ ⸔ ⸕ ⸖ ⸗ ⸘⸚ ⸛ ⸜ ⸝ ⸞ ⸟ ⸠ ⸡ ⸢ ⸣ ⸤ ⸥ ⸦ ⸧ ⸨ ⸩ ⸪ ⸫ ⸬ ⸭ ⸮ ⸰ ⸋ ⸊ ⸉ ⸈ ⸇ ⸆ ⸅ ⸄
※＊⁕⁑ ⁂ ⁁  ‵‶‷‴⁗𒑲𒑳

different and not spaced   ⁏;︔⁏

similarly   〝〞〟‶⸗״᱿

In 2018, checking out punctuation, I discovered the interrobang, ‽, which replaces ⁈ and/or ⁉︎. I can see how to use that; what about the dagger and double dagger? Oh, they’re for footnotes, where instead of ¹ ² ³ one would use * † ‡ (asterisk, obelus, diesis). A triple dagger exists at U+2E4B, but I can’t persuade that to print. See Wikipedia on this.

I discovered here a further weird collection of punctuation marks, where weird means not previously seen and of unknown usage or intention; do read it as an insight into medieval punctuation. I wondered if it might be useful for playwrights, since it offers multiple versions of a spoken pause (well, non-spoken, but you know what I mean…). Some of these might actually be useful: for example, maybe we could adopt one of the elevated commas to indicate a plural, microscopically different from the apostrophe? Would you prefer CDs or CD⸃s or CDⸯ (different from CD’s)? I can find reference to Unicode 2E30 to 2E80, but I cannot see the characters; what I get is a load of ⸱⸲⸳⸴⹹.

I am bothered that, having discovered a way to write an index without having to find superscript, the tiny digits are not at the same elevation: ⁹⁸⁷⁶⁵⁴³²¹⁰⁻ⁿ. The subscript equivalent works, ₀₁₂₃₄₅₆₇₈₉, but seeing the adjacent ⁵⁴³² and ²¹⁰⁻¹⁻² gives difficulties with properly representing 10⁻¹⁰, for example. I keep all of these in character 'favourites', so reducing the work involved in input, particularly following some larger scale change, which tends to undo the sub or super script effort. In that sense, I have abandoned the use of superscript and subscript. I discover this is a failing of the Arial font I prefer, repeated in Monaco, Helvetica, Lucida and Times (both). Verdana does this instead: ⁹⁸⁷⁶⁵⁴³²¹⁰⁻ⁿ. One solution is to change to Arial Unicode: ⁹⁸⁷⁶⁵⁴³²¹⁰⁻ⁿ. Hence 10⁻¹⁰. Yippee‼︎ That doesn’t cure x⁻¹/³, but perhaps I can use the right raised omission bracket, U+2E0D, as in x⁻¹⸍³.

In much the same way, I write about CO₂, accepting that the digit is smaller than I'd like, where the subscript gives (will it print?) CO2, marginally preferable, but oh so easily defaulting back to ordinary text.
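A likely reason for the uneven elevation (an assumption on my part, not checked font by font) is that the tiny digits come from two different places: ¹ ² ³ are old Latin-1 characters at U+00B9, U+00B2 and U+00B3, while ⁰ and ⁴ to ⁹ sit in a separate block starting at U+2070, so a font may well draw the two groups differently. A minimal Python sketch that does the typing for you; `to_super` and `to_sub` are names made up here:

```python
import unicodedata

# Superscript 1, 2, 3 live in Latin-1; the others are at U+2070 onwards.
SUPER = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u207b\u207f"
SUB = "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"

TO_SUPER = str.maketrans("0123456789-n", SUPER)  # digits, minus sign, n
TO_SUB = str.maketrans("0123456789", SUB)        # digits only

def to_super(text):
    """Replace digits, '-' and 'n' with their superscript characters."""
    return text.translate(TO_SUPER)

def to_sub(text):
    """Replace digits with their subscript characters."""
    return text.translate(TO_SUB)

print("10" + to_super("-10"))
print("CO" + to_sub("2"))

# Show where each superscript digit actually lives.
for ch in to_super("0123456789"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Because the result is ordinary text rather than formatting, it survives the larger scale changes that undo a word processor's sub or super script.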

Here’s a character I’d like to use, the Tironian et, ⁊, which would be more correct to use than the ampersand, &, whose name is a slurring of “and per se and”, that is, & by itself means and. I read the history of both characters and see that we use & because we have always used it more; for example, it used to come at the end of every alphabet as a 27th character. Thus we now use & to pair items without using the word and as separator. Yet in its original use I’d write that last sentence as: "use ⁊ to pair items without using the & as separator".

Similarly, the reversed question mark ⸮ is suggested to indicate irony, replacing the combination (!) and (?). It is properly called a percontation point or rhetorical question mark, and was proposed around Caxton’s time for a question that does not require an answer. Which, surely, would occur in a lot of written prose⸮ That is U+2E2E if you can’t see it. Questions that require an answer use the regular ? symbol. In the same way the ¡ symbol was suggested to indicate ironic statements. More recently Hervé Bazin, in his 1966 essay Plumons l'Oiseau (“Let's pluck the bird”), came up with a longer list, shown here.

Ƨ 𐐢𐝈

Quiz

Name these characters        * ⁊ ‽ ^ | … † &

Identify each of these by writing the symbol:  caret, pilcrow, silcrow, guillemet, obelus, solidus

Longer challenge: write a story with characters from either of the previous two questions

Difficulty: x bar (mean in statistics) and p-hat (same)

1.   🜔 is NaCl, common salt; 🜛 is Ag, silver (I thought ammonia, NH₃, more appropriate); 🜺 is As, arsenic, where I thought it ought to be CO₂; 🝁 is CaO, quicklime; 🜚 is Au, gold.

Harder: 🜹 sal ammoniac, NH₄Cl; 🜦 Cu(SbO₃)₂, copper antimoniate, copper antimony oxide; 🜿 tartar, KC₄H₅O₆, potassium bitartrate.

I couldn’t choose: urine  🝕, a mixture of lots of things including urea;    iron ore 🜜, a mixture, often of oxides and salts of iron;  Aqua regia, 🜆, a 1:3 mixture of nitric and hydrochloric acids. I didn’t know these words well enough to use them: regulus (a partially purified form, perhaps up to 1% of impurity) as in regulus of antimony,   🜰 and 🜱; marcasite, a semiprecious stone with iron pyrites (the disulphide)
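The meanings above can be looked up rather than guessed, because every assigned character carries an official name. A minimal sketch, assuming Python 3, whose `unicodedata` module ships a copy of those names (how many symbols it knows depends on the Python version); the helper `find_by_name` is invented for this example, and the range is the Alchemical Symbols block, U+1F700 to U+1F77F:

```python
import unicodedata

ALCHEMICAL_BLOCK = range(0x1F700, 0x1F780)  # the "Alchemical Symbols" block

def find_by_name(fragment, block=ALCHEMICAL_BLOCK):
    """List (code point, character, name) where the name contains fragment."""
    fragment = fragment.upper()
    hits = []
    for cp in block:
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if fragment in name:
            hits.append((f"U+{cp:04X}", chr(cp), name))
    return hits

# Try the words from the paragraph above.
for word in ("regulus", "aqua", "urine"):
    for code, char, name in find_by_name(word):
        print(code, char, name)
```

Searching for a word you half know, such as regulus, is then a one-liner, and the name that comes back is a good starting point for finding the chemistry.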

2.  The “Five Phases” are Wood (木 mù), Fire (火 huǒ), Earth (土 tǔ), Metal (金 jīn), and Water (水 shuǐ). Also, Jupiter-木, Saturn-土, Mercury-水, Venus-金, Mars-火; see here. The Japanese is the same. Wind is feng.

To find the characters, what I found worked was to put up the Unicode set on my screen and search for the character I knew, such as shui: searching for shui produces a list of characters all pronounced ‘shui’, ⽔谁睡税瞓说說誰稅 … (24 of them). This is a similar process to that used to send a Mandarin text message.

Hunting through the unicode characters to find one I recognise is useful only in showing the basis for how characters are assembled and ordered — itself useful. Coupled with a frequency table (to know which characters are used the most, say the top 200) so that one worked with subsets to learn — this I deem useful. I cannot persuade my other half to see that this is desirable and she says, perhaps correctly, that I should do that myself. I continue to say that there is a need to teach these characters to westerners in a western style, not a far eastern one. We continue to disagree about process and my Mandarin makes no progress at all.

3. الرياح النار الأرض والماء is alriyah alnaar al’ard walma’: wind, fire, earth & water again. I cannot even identify for sure where the word breaks are. I typed this into my search engine and the sounds produced do not fit well with the roman character form. This, الرياح النار الأرض والماء, is Arabic, while زمین آگ ہوا اور پانی is Urdu. To me, they look to be the same general character set. Perhaps there is a generic name for such script? I also found much disagreement as to why these two might be similar: Urdu classes as an Indo-European language on the Western Hindi branch of the language tree [source], and one might classify Hindi-Urdu as a single language with, I see, four dialects (the other two are Dakhini and Rekhta). Quite a few loan words shift with the Muslim population between Urdu (etc.) and Farsi, Turkish and Arabic. Hindi uses the Devanagari script (entirely new word to me). Many will insist that Hindi and Urdu are very different languages, but the academics differ. From the same source: “I’ve talked with dozens of Pakistanis about Urdu and Hindi, and many insist that Urdu has more in common with Persian and Arabic than it does with Hindi.”

Also, over 60 languages are spoken throughout Pakistan, and over 400 languages are spoken in India. Many of these languages form what linguists call a dialect continuum, a group of dialects or languages that gradually fade from one to the next across geographic areas. Arabic is also technically a continuum of several languages and sub dialects that differ progressively from each other. While a Jordanian person and a Lebanese person may understand each other just fine, an Egyptian will have much more trouble understanding a Moroccan because these “dialects” of Arabic are not mutually intelligible and are so different from each other they are classified as different languages.

4. Hindi: North उत्तर, South दक्षिण, East पूर्व/पूरब, West पश्चिम; in a pinyin-style equivalent: Uttar, Dakshin, Poorva, Paschim.

North উত্তর (uttōr), South দক্ষিণ (dōkkhin), East পূর্ব্ব (purbbō), West পশ্চিম (pōscim). Bengali, so not surprisingly similar.

الشمال al shamal, الجنوب al ganoob, الشرق al sharq, الغرب al gharb. Arabic.

வடக்கு தெற்கு, கிழக்கு மேற்கு    [Their order would be ENSW] NSEW is vadaku therku kizaku mearku. This is Tamil.

ทิศเหนือ ตอนใต้  ทิศตะวันออก  ทิศตะวันตก   Thai. Not what I expected and I didn’t discover the pronunciation.
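Identifying the character set (though not the language) can be left to the machine, since a letter's official Unicode name normally begins with its script. That also answers the earlier puzzle: Unicode files the Urdu letters under ARABIC, so Arabic and Urdu share one script even though they are different languages. A minimal sketch, assuming Python 3; `scripts_in` is a name made up here, and taking the first word of the name is a rough heuristic rather than the official Script property, which the standard library does not expose. The sample words are written as escapes and spelt out in the comments:

```python
import unicodedata

def scripts_in(text):
    """Collect the first word of each character's Unicode name."""
    found = set()
    for ch in text:
        if ch.isspace():
            continue
        name = unicodedata.name(ch, "UNKNOWN")  # fallback for unnamed characters
        found.add(name.split()[0])              # e.g. ARABIC, DEVANAGARI, THAI
    return found

SAMPLES = {
    "north, first set": "\u0909\u0924\u094d\u0924\u0930",        # uttar
    "north, second set": "\u0989\u09a4\u09cd\u09a4\u09b0",       # uttor
    "north, third set": "\u0627\u0644\u0634\u0645\u0627\u0644",  # al shamal
    "north, fourth set": "\u0bb5\u0b9f\u0b95\u0bcd\u0b95\u0bc1", # vadaku
    "first syllable, fifth set": "\u0e17\u0e34\u0e28",           # Thai word start
    "water, Urdu": "\u067e\u0627\u0646\u06cc",                   # pani
}

for label, word in SAMPLES.items():
    print(label, sorted(scripts_in(word)))
```

Telling Arabic from Urdu, or Hindi from Marathi, needs more than this: the script narrows the field but the vocabulary decides.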

5. * asterisk, ⁊ Tironian et, ‽ interrobang, ^ caret, | virgule, … ellipsis, † obelus, & ampersand

^ caret, ¶ pilcrow, § silcrow, «» guillemet, † or ÷ obelus, / or ⁄ solidus

You may argue that solidus and virgule are synonyms, that a caret is a circumflex and so on. I agree and sympathise.
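Unicode has its own opinion on these names, and it is not always the traditional one: it lists ^ as CIRCUMFLEX ACCENT, | as VERTICAL LINE and / as SOLIDUS. A minimal sketch to print the official name of every quiz character, assuming Python 3; the string `QUIZ` and the dictionary `official` are made up for this example, and the characters are written as escapes so that nothing depends on the font:

```python
import unicodedata

# asterisk, Tironian et, interrobang, caret, bar, ellipsis, dagger, ampersand,
# pilcrow, section sign, both guillemets, division sign, slash, fraction slash
QUIZ = "*\u204a\u203d^|\u2026\u2020&\u00b6\u00a7\u00ab\u00bb\u00f7/\u2044"

# Map each character to the name recorded in the Unicode database.
official = {ch: unicodedata.name(ch) for ch in QUIZ}

for ch, name in official.items():
    print(f"U+{ord(ch):04X}", ch, name)
```

None of the old printers' words (obelus, diesis, silcrow, virgule) appears among the official names for these characters, which is some comfort to those of us who did not know them.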

DJS,  perhaps 20180720

