Tamil Script Code for Information Interchange
Tamil Script Code for Information Interchange (TSCII) is a coding scheme for representing the Tamil script. The lower 128 codepoints are plain ASCII; the upper 128 codepoints are TSCII-specific. After years of use on the Internet by private agreement only, it was registered with the IANA in 2007.[1]
TSCII encodes characters in visual (written) order, paralleling the use of the Tamil typewriter.
Unicode uses logical-order encoding for Tamil, following ISCII. For Thai, by contrast, Unicode adopted the visual-order encoding inherited from TIS-620.
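The difference between visual and logical order can be illustrated with a minimal Python sketch. It assumes a tiny sample table copied from a few cells of the TSCII codepage, and the names `TSCII_TO_UNICODE`, `PREFIX_SIGNS` and `visual_to_logical` are hypothetical, chosen for this illustration only. The vowel signs ெ, ே and ை are written to the left of their consonant, so TSCII stores them first, whereas Unicode stores them after the consonant.

```python
# Partial mapping copied from a few cells of the TSCII codepage (not the full table).
TSCII_TO_UNICODE = {
    0xA1: "\u0BBE",  # vowel sign AA, written after the consonant
    0xA6: "\u0BC6",  # vowel sign E, written before the consonant
    0xA7: "\u0BC7",  # vowel sign EE, written before the consonant
    0xA8: "\u0BC8",  # vowel sign AI, written before the consonant
    0xB8: "\u0B95",  # letter KA
    0xC1: "\u0BAE",  # letter MA
}

# Vowel signs that TSCII stores before the consonant they belong to.
PREFIX_SIGNS = {0xA6, 0xA7, 0xA8}


def visual_to_logical(data: bytes) -> str:
    """Convert visual-order TSCII bytes to logical-order Unicode (sketch only)."""
    out = []
    pending = ""  # prefix vowel sign waiting for the character that follows it
    for byte in data:
        if byte in PREFIX_SIGNS:
            out.append(pending)  # flush a stray earlier sign unchanged
            pending = TSCII_TO_UNICODE[byte]
        else:
            # ASCII passes through; bytes missing from the sample table become U+FFFD.
            char = chr(byte) if byte < 0x80 else TSCII_TO_UNICODE.get(byte, "\uFFFD")
            out.append(char + pending)  # move the sign behind its consonant
            pending = ""
    out.append(pending)
    return "".join(out)
```

For example, the TSCII bytes A6 B8 (ெ followed by க) become U+0B95 U+0BC6 (கெ). The sketch leaves two-part vowels as two separate signs and does not validate that a prefix sign is actually followed by a consonant.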
The government of Tamil Nadu endorses its own TAB/TAM standards for 8-bit encoding, and other, older encoding schemes can still be found on the Web.
The free etext collection at Project Madurai uses the TSCII encoding, but has started to provide Unicode versions as well.
History
The need for a common encoding for Tamil was felt by members of various mailing-list-based forums in the mid-1990s, as multiple custom-coded fonts were prevalent in those forums. While some of the commercial encodings were more popular than others, they were not accepted by the wider community because of conflicting commercial interests. While Unicode was accepted by most as the future standard, most desktop systems at that time were still not capable of handling Unicode for the Tamil language, and an interim 8-bit encoding was required.
A separate mailing list for discussion of such encodings (webmasters@tamil.net) was created in 1997, starting with an email written by Dr. K. Kalyanasundaram to the popular Tamil author Sujatha, who headed the committee for standardization of the Tamil keyboard.[2] This forum quickly attracted enthusiastic participants from across the globe, including several prominent Tamil scholars. Archives of these discussions are maintained by INFITT.[3]
After TSCII was published, most members of the webmasters@tamil.net mailing list joined INFITT, a wider initiative to promote standardization and continued development in various areas of Tamil computing.
Codepage layout
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8_ | ௦ 0BE6 128 | ௧ 0BE7 129 | ஸ்ரீ 0BB8 0BCD 0BB0 0BC0 130 | ஜ 0B9C 131 | ஷ 0BB7 132 | ஸ 0BB8 133 | ஹ 0BB9 134 | க்ஷ 0B95 0BCD 0BB7 135 | ஜ் 0B9C 0BCD 136 | ஷ் 0BB7 0BCD 137 | ஸ் 0BB8 0BCD 138 | ஹ் 0BB9 0BCD 139 | க்ஷ் 0B95 0BCD 0BB7 0BCD 140 | ௨ 0BE8 141 | ௩ 0BE9 142 | ௪ 0BEA 143 |
| 9_ | ௫ 0BEB 144 | ‘ 2018 145 | ’ 2019 146 | “ 201C 147 | ” 201D 148 | ௬ 0BEC 149 | ௭ 0BED 150 | ௮ 0BEE 151 | ௯ 0BEF 152 | ஙு 0B99 0BC1 153 | ஞு 0B9E 0BC1 154 | ஙூ 0B99 0BC2 155 | ஞூ 0B9E 0BC2 156 | ௰ 0BF0 157 | ௱ 0BF1 158 | ௲ 0BF2 159 |
| A_ | NBSP 00A0 160 | ா 0BBE 161 | ி 0BBF 162 | ீ 0BC0 163 | ு 0BC1 164 | ூ 0BC2 165 | ெ 0BC6 166 | ே 0BC7 167 | ை 0BC8 168 | © 00A9 169 | ௗ 0BD7 170 | அ 0B85 171 | ஆ 0B86 172 | | ஈ 0B88 174 | உ 0B89 175 |
| B_ | ஊ 0B8A 176 | எ 0B8E 177 | ஏ 0B8F 178 | ஐ 0B90 179 | ஒ 0B92 180 | ஓ 0B93 181 | ஔ 0B94 182 | ஃ 0B83 183 | க 0B95 184 | ங 0B99 185 | ச 0B9A 186 | ஞ 0B9E 187 | ட 0B9F 188 | ண 0BA3 189 | த 0BA4 190 | ந 0BA8 191 |
| C_ | ப 0BAA 192 | ம 0BAE 193 | ய 0BAF 194 | ர 0BB0 195 | ல 0BB2 196 | வ 0BB5 197 | ழ 0BB4 198 | ள 0BB3 199 | ற 0BB1 200 | ன 0BA9 201 | டி 0B9F 0BBF 202 | டீ 0B9F 0BC0 203 | கு 0B95 0BC1 204 | சு 0B9A 0BC1 205 | டு 0B9F 0BC1 206 | ணு 0BA3 0BC1 207 |
| D_ | து 0BA4 0BC1 208 | நு 0BA8 0BC1 209 | பு 0BAA 0BC1 210 | மு 0BAE 0BC1 211 | யு 0BAF 0BC1 212 | ரு 0BB0 0BC1 213 | லு 0BB2 0BC1 214 | வு 0BB5 0BC1 215 | ழு 0BB4 0BC1 216 | ளு 0BB3 0BC1 217 | று 0BB1 0BC1 218 | னு 0BA9 0BC1 219 | கூ 0B95 0BC2 220 | சூ 0B9A 0BC2 221 | டூ 0B9F 0BC2 222 | ணூ 0BA3 0BC2 223 |
| E_ | தூ 0BA4 0BC2 224 | நூ 0BA8 0BC2 225 | பூ 0BAA 0BC2 226 | மூ 0BAE 0BC2 227 | யூ 0BAF 0BC2 228 | ரூ 0BB0 0BC2 229 | லூ 0BB2 0BC2 230 | வூ 0BB5 0BC2 231 | ழூ 0BB4 0BC2 232 | ளூ 0BB3 0BC2 233 | றூ 0BB1 0BC2 234 | னூ 0BA9 0BC2 235 | க் 0B95 0BCD 236 | ங் 0B99 0BCD 237 | ச் 0B9A 0BCD 238 | ஞ் 0B9E 0BCD 239 |
| F_ | ட் 0B9F 0BCD 240 | ண் 0BA3 0BCD 241 | த் 0BA4 0BCD 242 | ந் 0BA8 0BCD 243 | ப் 0BAA 0BCD 244 | ம் 0BAE 0BCD 245 | ய் 0BAF 0BCD 246 | ர் 0BB0 0BCD 247 | ல் 0BB2 0BCD 248 | வ் 0BB5 0BCD 249 | ழ் 0BB4 0BCD 250 | ள் 0BB3 0BCD 251 | ற் 0BB1 0BCD 252 | ன் 0BA9 0BCD 253 | இ 0B87 254 | |
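In the table above, each cell gives the character, its Unicode code point or points in hexadecimal, and the TSCII code in decimal. Code 80 is U+0BE6 TAMIL DIGIT ZERO, which was accepted in Unicode version 4.1. A0 is the NO-BREAK SPACE. The codes AD and FF are unassigned.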
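As the table shows, a single TSCII byte can stand for a sequence of several Unicode code points, and two codes are unassigned. A minimal Python sketch of a per-byte lookup follows. It assumes only a handful of cells copied from the table, and `SAMPLE_CELLS`, `UNASSIGNED` and `decode_cell` are hypothetical names used for illustration, not part of any TSCII library.

```python
# A few cells copied from the codepage table; not the full 128-entry mapping.
SAMPLE_CELLS = {
    0x80: "\u0BE6",                    # Tamil digit zero
    0x82: "\u0BB8\u0BCD\u0BB0\u0BC0",  # one byte, four code points
    0x87: "\u0B95\u0BCD\u0BB7",        # one byte, three code points
    0xA0: "\u00A0",                    # no-break space
    0xCC: "\u0B95\u0BC1",              # consonant plus vowel sign U
    0xEC: "\u0B95\u0BCD",              # consonant plus pulli
    0xFE: "\u0B87",                    # letter I
}

# Codes the article lists as unassigned.
UNASSIGNED = {0xAD, 0xFF}


def decode_cell(byte: int) -> str:
    """Return the Unicode string for one TSCII byte (sample cells only)."""
    if byte < 0x80:
        return chr(byte)  # lower half is plain ASCII
    if byte in UNASSIGNED:
        raise ValueError(f"0x{byte:02X} is unassigned in TSCII")
    # KeyError here means the cell was simply not copied into this sample.
    return SAMPLE_CELLS[byte]
```

Because one byte may expand to up to four code points, the length of a decoded string generally differs from the length of the TSCII byte string.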
Conversion Tools
UTF-8 encoded documents can be converted to TSCII using the GNU iconv tool as follows:
$ iconv -f utf-8 -t tscii hello.utf8 > hello.tscii
Conversion from TSCII to UTF-8 is done by interchanging the values of the -f and -t flags.
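The same conversion can be scripted. The following Python sketch wraps the iconv command line shown above. It assumes an iconv binary with TSCII support is available on the PATH, and the helper names `iconv_command` and `convert` are hypothetical.

```python
import subprocess


def iconv_command(path: str, from_enc: str, to_enc: str) -> list[str]:
    """Build the argument list for one iconv conversion."""
    return ["iconv", "-f", from_enc, "-t", to_enc, path]


def convert(path: str, from_enc: str, to_enc: str) -> bytes:
    """Run iconv and return the converted bytes; raises if iconv fails."""
    result = subprocess.run(
        iconv_command(path, from_enc, to_enc),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # collect stdout instead of redirecting to a file
    )
    return result.stdout
```

Calling `convert("hello.tscii", "tscii", "utf-8")` would perform the reverse conversion, since only the two encoding arguments are swapped.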
References
1. http://www.iana.org/assignments/charset-reg/TSCII
2. http://www.infitt.org/tscii/archives/msg00001.html
3. http://www.infitt.org/tscii/archives/maillist.html