Basic Latin (Unicode block)
Unicode character block
Basic Latin (or C0 Controls and Basic Latin) | |
---|---|
Range | U+0000..U+007F (128 code points) |
Plane | BMP |
Scripts | Latin (52 characters), Common (76 characters) |
Major alphabets | English, French, German, Spanish, Vietnamese |
Symbol sets | Arabic numerals, Punctuation |
Assigned | 128 code points; 33 Control or Format |
Unused | 0 reserved code points |
Source standards | ISO/IEC 8859, ISO 646 |
Unicode version history | |
1.0.0 (1991) | 128 (+128) |
Unicode documentation | Code chart, Web page |
Note | [1] [2] |
The Basic Latin Unicode block, [3] sometimes informally called C0 Controls and Basic Latin, [4] is the first block of the Unicode standard and the only block whose characters are each encoded in one byte in UTF-8. The block contains all the characters and control codes of the ASCII encoding. It ranges from U+0000 to U+007F, contains 128 characters, and includes the C0 controls, ASCII punctuation and symbols, ASCII digits, the uppercase and lowercase letters of the English alphabet, and one further control character (Delete).
The Basic Latin block has been included in its present form since version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire. [5] Its block name in Unicode 1.0 was ASCII. [6]
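The one-byte property described above can be checked directly. The following is a minimal sketch in Python; the helper name `utf8_length` and the constants are illustrative assumptions, not part of any standard API. It scans code points up to U+07FF and confirms that the ones encoded in a single UTF-8 byte are exactly U+0000 to U+007F, and that ASCII and UTF-8 produce identical bytes for them.

```python
# Range of the Basic Latin block as stated in the article.
BLOCK_START, BLOCK_END = 0x0000, 0x007F


def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses to encode one code point."""
    return len(chr(code_point).encode("utf-8"))


# Collect every code point below U+0800 that fits in a single UTF-8 byte.
one_byte = [cp for cp in range(0x0000, 0x0800) if utf8_length(cp) == 1]

block_size = BLOCK_END - BLOCK_START + 1

# ASCII and UTF-8 agree byte for byte on the whole block.
same_as_ascii = all(
    chr(cp).encode("ascii") == chr(cp).encode("utf-8")
    for cp in range(BLOCK_START, BLOCK_END + 1)
)

print(block_size)
print(f"U+{one_byte[0]:04X}..U+{one_byte[-1]:04X}")
print(same_as_ascii)
```

The first code point after the block, U+0080, already needs two bytes, which is why no other block shares this property.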
Table of characters
Subheadings
The C0 Controls and Basic Latin block contains six subheadings. [8]
C0 controls
The C0 controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the ISO/IEC 6429:1992 standard. [8]
ASCII punctuation and symbols
This subheading covers the standard punctuation characters, simple mathematical operators, and symbols such as the dollar sign, percent sign, ampersand, underscore, and pipe. [8]
ASCII digits
The ASCII digits subheading contains the standard European digits 0–9. [8]
Uppercase Latin alphabet
The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in majuscule (uppercase) form. [8]
Lowercase Latin alphabet
The Lowercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in minuscule (lowercase) form. [8]
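Because the uppercase letters occupy U+0041 to U+005A and the lowercase letters U+0061 to U+007A, each lowercase letter sits exactly 0x20 above its uppercase counterpart. The sketch below illustrates that arithmetic; the function name `to_lower_basic_latin` is a hypothetical helper introduced here for illustration, and real code would normally just call `str.lower()`.

```python
# Letter ranges of the two alphabet subheadings.
UPPER_START, UPPER_END = 0x41, 0x5A
LOWER_START, LOWER_END = 0x61, 0x7A

# Distance between a letter and its other-case counterpart (0x20).
CASE_OFFSET = LOWER_START - UPPER_START


def to_lower_basic_latin(ch: str) -> str:
    """Lowercase a single Basic Latin letter by adding the fixed offset.

    Characters outside U+0041..U+005A are returned unchanged.
    """
    cp = ord(ch)
    if UPPER_START <= cp <= UPPER_END:
        return chr(cp + CASE_OFFSET)
    return ch


print(hex(CASE_OFFSET))
print(to_lower_basic_latin("A"), to_lower_basic_latin("Z"), to_lower_basic_latin("["))
```

Within this block the helper agrees with Python's built-in `str.lower()`; outside it, case mapping is not a fixed offset and the built-in should be used.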
Control character
The Control character subheading contains the "Delete" character. [8]
Number of symbols, letters and control codes
The table below shows the number of letters, symbols and control codes in each of the subheadings in the C0 Controls and Basic Latin block.
Subheading | Number of characters | Code point range |
---|---|---|
C0 controls | 32 control codes | U+0000 to U+001F |
ASCII punctuation and symbols | 33 punctuation marks and symbols | U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060 and U+007B to U+007E |
ASCII digits | 10 digits | U+0030 to U+0039 |
Uppercase Latin alphabet | 26 unaccented uppercase Latin letters | U+0041 to U+005A |
Lowercase Latin alphabet | 26 unaccented lowercase Latin letters | U+0061 to U+007A |
Control character | 1 control code ("Delete") | U+007F |
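The counts in the table follow from the ranges themselves. As a rough check, the sketch below recomputes them in Python and confirms that the six subheadings together cover all 128 code points with no gaps or overlaps. The dictionary `SUBHEADINGS` and the helper `count_code_points` are illustrative names, not taken from the Unicode data files.

```python
# Inclusive code point ranges for each subheading, copied from the table.
SUBHEADINGS = {
    "C0 controls": [(0x00, 0x1F)],
    "ASCII punctuation and symbols": [
        (0x20, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E),
    ],
    "ASCII digits": [(0x30, 0x39)],
    "Uppercase Latin alphabet": [(0x41, 0x5A)],
    "Lowercase Latin alphabet": [(0x61, 0x7A)],
    "Control character": [(0x7F, 0x7F)],
}


def count_code_points(ranges):
    """Count the code points in a list of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


counts = {name: count_code_points(r) for name, r in SUBHEADINGS.items()}

# Flatten every range into a sorted list to check coverage of the block.
covered = sorted(
    cp
    for ranges in SUBHEADINGS.values()
    for start, end in ranges
    for cp in range(start, end + 1)
)

for name, n in counts.items():
    print(f"{name}: {n}")
print("total:", sum(counts.values()))
```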
Chart
C0 Controls and Basic Latin
Official Unicode Consortium code chart (PDF)
Code point | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+000x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
U+001x | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
U+002x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
U+003x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
U+004x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
U+005x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
U+006x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
U+007x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
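The chart is laid out so that the row label gives the upper hexadecimal digits of a code point and the column header gives the last digit. A minimal sketch of that mapping, assuming a hypothetical helper named `chart_position`, is:

```python
from typing import Tuple


def chart_position(code_point: int) -> Tuple[str, str]:
    """Return the (row label, column label) of a code point in the chart.

    The row is the code point with its last hex digit replaced by 'x';
    the column is that last hex digit.
    """
    if not 0x00 <= code_point <= 0x7F:
        raise ValueError("code point is outside the Basic Latin block")
    row = f"U+{code_point >> 4:03X}x"     # upper digits select the row
    column = f"{code_point & 0xF:X}"      # low four bits select the column
    return row, column


print(chart_position(ord("A")))   # row U+004x, column 1
print(chart_position(0x7F))       # row U+007x, column F (DEL)
```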
Variants
Several of the characters are defined to render as a standardized variant when followed by a variation selector.
A variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0 ︀ ). [9] [10]
Twelve characters (#, *, and the digits) can be followed by U+FE0E VS15 or U+FE0F VS16 to create emoji variants. [11] [12] [13] [14] They are keycap base characters, for example #️⃣ (U+0023 NUMBER SIGN U+FE0F VS16 U+20E3 COMBINING ENCLOSING KEYCAP). The VS15 version is "text presentation" while the VS16 version is "emoji-style". [10]
U+ | 0023 | 002A | 0030 | 0031 | 0032 | 0033 | 0034 | 0035 | 0036 | 0037 | 0038 | 0039 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
base | # | * | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
base+VS15+keycap | # ︎ ⃣ | * ︎ ⃣ | 0 ︎ ⃣ | 1 ︎ ⃣ | 2 ︎ ⃣ | 3 ︎ ⃣ | 4 ︎ ⃣ | 5 ︎ ⃣ | 6 ︎ ⃣ | 7 ︎ ⃣ | 8 ︎ ⃣ | 9 ︎ ⃣ |
base+VS16+keycap | # ️ ⃣ | * ️ ⃣ | 0 ️ ⃣ | 1 ️ ⃣ | 2 ️ ⃣ | 3 ️ ⃣ | 4 ️ ⃣ | 5 ️ ⃣ | 6 ️ ⃣ | 7 ️ ⃣ | 8 ️ ⃣ | 9 ️ ⃣ |
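The sequences in the table are plain concatenations of code points. The sketch below builds them in Python; the function names `keycap_sequence` and `code_points` are hypothetical helpers for illustration. Whether a sequence actually renders as a keycap or a slashed zero depends on the font and platform, which the code does not check.

```python
VS1 = "\uFE00"      # VARIATION SELECTOR-1, used for the slashed zero
VS15 = "\uFE0E"     # VARIATION SELECTOR-15, text presentation
VS16 = "\uFE0F"     # VARIATION SELECTOR-16, emoji presentation
KEYCAP = "\u20E3"   # COMBINING ENCLOSING KEYCAP

# The twelve keycap base characters named in the article.
KEYCAP_BASES = "#*0123456789"


def keycap_sequence(base: str, emoji_style: bool = True) -> str:
    """Build base + VS16 (or VS15) + combining keycap for a valid base."""
    if len(base) != 1 or base not in KEYCAP_BASES:
        raise ValueError(f"{base!r} is not a keycap base character")
    selector = VS16 if emoji_style else VS15
    return base + selector + KEYCAP


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


slashed_zero = "0" + VS1

print(code_points(keycap_sequence("#")))          # U+0023 U+FE0F U+20E3
print(code_points(keycap_sequence("7", False)))   # U+0037 U+FE0E U+20E3
print(code_points(slashed_zero))                  # U+0030 U+FE00
```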
History
The following Unicode-related documents record the purpose and process of defining specific characters in the Basic Latin block:
Version | Final code points | Count | UTC ID | L2 ID | WG2 ID | Document |
---|---|---|---|---|---|---|
1.0.0 | U+0000..007F | 128 | | | | (to be determined) |
1.0.0 | U+0000..007F | 128 | UTC/1999-013 | | | Karlsson, Kent (1999-05-27), Tildes and micro sign decompositions |
1.0.0 | U+0000..007F | 128 | | L2/99-176R | | Moore, Lisa (1999-11-04), "Micro Sign Case Mappings", Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999 |
1.0.0 | U+0000..007F | 128 | | L2/04-145 | | Starner, David (2004-04-30), C with stroke character examples from BAE report 1884 (Dorsey) |
1.0.0 | U+0000..007F | 128 | | L2/04-202 | | Anderson, Deborah (2004-06-07), Slashed C Feedback |
1.0.0 | U+0000..007F | 128 | | | N3046 | Suignard, Michel (2006-02-22), Improving formal definition for control characters |
1.0.0 | U+0000..007F | 128 | | | N3103 (pdf, doc) | Umamaheswaran, V. S. (2006-08-25), "M48.33", Unconfirmed minutes of WG 2 meeting 48, Mountain View, CA, USA; 2006-04-24/27 |
1.0.0 | U+0000..007F | 128 | | L2/11-043 | | Freytag, Asmus; Karlsson, Kent (2011-02-02), Proposal to correct mistakes and inconsistencies in certain property assignments for super and subscripted letters |
1.0.0 | U+0000..007F | 128 | | L2/11-160 | | PRI #181 Changing General Category of Twelve Characters, 2011-05-02 |
1.0.0 | U+0000..007F | 128 | | L2/11-261R2 | | Moore, Lisa (2011-08-16), "Consensus 128-C3", UTC #128 / L2 #225 Minutes, Accept Ken Whistler's recommendations in L2/11-281 on name aliases for control characters with the addition of the abbreviations BEL and NUL. |
1.0.0 | U+0000..007F | 128 | | L2/11-438 | N4182 | Edberg, Peter (2011-12-22), Emoji Variation Sequences (Revision of L2/11-429) |
1.0.0 | U+0000..007F | 128 | | L2/15-107 | | Moore, Lisa (2015-05-12), "Consensus 143-C5", UTC #143 Minutes, Add the 12 keycap sequences in emoji-data.txt as provisional named sequences in Unicode 8.0. |
1.0.0 | U+0000..007F | 128 | | L2/15-268 | | Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30), Proposal to Represent the Slashed Zero Variant of Empty Set |
1.0.0 | U+0000..007F | 128 | | L2/15-301 | | Pournader, Roozbeh (2015-11-01), A proposal for 278 standardized variation sequences for emoji |
1.0.0 | U+0000..007F | 128 | | L2/15-254 | | Moore, Lisa (2015-11-16), "B.12.1.2 Proposal to Represent the Slashed Zero Variant of Empty Set", UTC #145 Minutes |
1.0.0 | U+0000..007F | 128 | | L2/17-294 | N4914 | Lunde, Ken (2017-08-14), Proposal to add standardized variation sequence for U+FF10 FULLWIDTH DIGIT ZERO |
1.0.0 | U+0000..007F | 128 | | L2/22-019 | | Scherer, Markus; et al. (2022-01-19), "F.2 F4: U+0019 in ISO vs. NameAliases.txt vs. chart/NamesList.txt", UTC #170 properties feedback & recommendations |
1.0.0 | U+0000..007F | 128 | | L2/22-016 | | Constable, Peter (2022-04-21), "Consensus 170-C24", UTC #170 Minutes, For U+0019, add a Name alias "EM" of type abbreviation, for Unicode version 15.0. |
References
- [1] "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
- [2] "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
- [3] "block.txt". The Unicode Consortium. https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt. Retrieved 2023-03-23.
- [4] "C0 Controls and Basic Latin" (PDF). The Unicode Standard, Version 15.0. Unicode, Inc. 2022. Retrieved 2023-03-22.
- [5] The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
- [6] "3.8: Block-by-Block Charts" (PDF). The Unicode Standard. Version 1.0. Unicode Consortium.
- [7] Michael S. Kaplan (2005-09-17). "When is a backslash not a backslash?". Sorting it all Out. Microsoft. Archived from the original on 2010-06-12. Also available at: http://archives.miloush.net/michkap/archive/2005/09/17/469941.html
- [8] "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 2013-04-01.
- [9] Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30). "L2/15-268: Proposal to Represent the Slashed Zero Variant of Empty Set" (PDF).
- [10] "UTS #51 Emoji Variation Sequences". The Unicode Consortium.
- [11] Edberg, Peter (2011-12-22). "L2/11-438: Emoji Variation Sequences (Revision of L2/11-429)" (PDF).
- [12] Pournader, Roozbeh (2015-11-01). "L2/15-301: A proposal for 278 standardized variation sequences for emoji" (PDF).
- [13] "UTR #51: Unicode Emoji". Unicode Consortium. 2020-02-11.
- [14] "UCD: Emoji Data for UTR #51". Unicode Consortium. 2021-08-26.