  • Twitter was set up to support 140 characters. And in the English alphabet, that's easy to

  • understand: a character is a letter, number, space or punctuation mark. People more or

  • less agree with computers there. And if it was twenty years ago, that's exactly how the

  • system would work. That far, no further.

  • But now, we have Unicode.

  • Mind you, it's still fairly straightforward in some languages. East Asian languages, for

  • example -- Chinese, Japanese, Korean -- "one character" is a glyph, a number, a space,

  • or a punctuation mark. But since the language is denser -- each of these characters encodes

  • more information than an English character -- you can fit almost twice as much information

  • into each tweet.

  • And then, it gets complicated.

  • Take Arabic, for example. What counts as an Arabic letter? First of all, the shapes of

  • Arabic letters change significantly depending on where they are in a word. Watch what happens

  • as I take the Arabic for "Arabic alphabet", and hit backspace. Arabic's right to left,

  • remember. The characters change in order to be consistent with the rules of the written

  • language, and the diacritics disappear separately to the letters they're next to.

  • In Vietnamese, on the other hand? Each of those counts as one character.

  • Backspace, and away they go.

  • It's at this point that most British programmers, myself included, throw up their hands in defeat

  • and just use existing code by some other generous soul who's already worked the problem out.

  • Or if they're lazy, they just say, well, no-one's going to use this who doesn't speak English,

  • so we don't need to worry about it.

  • (MOUTHS) Yes you do.

  • Hmm. Unicode has a single character for some English ligatures, like "ffi" - notice how

  • the letters there are smushed together to make them look better to the eye. Some programs

  • will automatically add those in for you. So you copy and paste your text from that into

  • Twitter, and suddenly you're saving characters.

  • People would count that as three characters. Unicode, and therefore Twitter, and pretty

  • much every computer program? Just one. The greatest example of this I could find is the

  • Arabic for "peace be upon him". Unicode has a single character for this, and Twitter will

  • treat it as counting for just 1 of your 140. Which is handy, if you're a devout Muslim

  • and want to talk about the prophets on Twitter.

  • So. What counts as a character? Well, it's complicated. Computers see things differently

  • to people. And let's be honest: unless you have a professor who's setting their essays

  • by character count instead of word count, the only time it'll really matter for most people...

  • is when they're trying to tweet.
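
The gap between what people count and what a computer counts can be seen by counting Unicode code points directly. What follows is a minimal Python sketch, not Twitter's actual counting code. The function names `count_code_points` and `describe` and the sample strings are assumptions made for this illustration; only the standard-library `unicodedata` module is used.

```python
import unicodedata


def count_code_points(text: str) -> int:
    """Return the number of Unicode code points in text.

    Python's len() on a str counts code points, not bytes and not
    what a reader would perceive as separate letters.
    """
    return len(text)


def describe(text: str) -> list:
    """List each code point in text as (U+XXXX, official Unicode name)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# Illustrative samples chosen for this sketch (assumptions, not taken from the video).
FFI_LIGATURE = "\uFB03"            # the single-character "ffi" ligature
PEACE_BE_UPON_HIM = "\uFDFA"       # Arabic ligature for the whole phrase
VIET_PRECOMPOSED = "\u1EC7"        # Vietnamese e with circumflex and dot below
ARABIC_WITH_MARK = "\u0628\u064E"  # Arabic letter beh followed by the diacritic fatha

# Canonical decomposition (NFD) splits the Vietnamese letter into base + two marks.
VIET_DECOMPOSED = unicodedata.normalize("NFD", VIET_PRECOMPOSED)

# Compatibility normalization (NFKC) expands the ligature into plain letters.
FFI_EXPANDED = unicodedata.normalize("NFKC", FFI_LIGATURE)

if __name__ == "__main__":
    samples = [
        ("ffi ligature", FFI_LIGATURE),
        ("ffi expanded", FFI_EXPANDED),
        ("peace be upon him", PEACE_BE_UPON_HIM),
        ("Vietnamese, precomposed", VIET_PRECOMPOSED),
        ("Vietnamese, decomposed", VIET_DECOMPOSED),
        ("Arabic letter + diacritic", ARABIC_WITH_MARK),
    ]
    for label, sample in samples:
        # Only ASCII names are printed, so this works on any terminal encoding.
        print(label, count_code_points(sample), describe(sample))
```

The ligatures each count as one code point even though a reader sees several letters. The Arabic letter with its diacritic counts as two, since the mark is a separate combining character. The Vietnamese letter counts as one or three depending on whether the text is stored precomposed or decomposed.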

Why You Can Tweet More In Japanese: What Counts As A Character?

Posted by Samuel on 14 January 2021