Simple UTF-8 convertor




What is it?

This service works as a simple convertor. Clicking the Encode button converts ordinary text to the binary representation of its UTF-8 encoding; the Decode button performs the reverse conversion. Because UTF-8 is backward compatible with ASCII, this service converts ASCII characters correctly too.
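
The two conversions can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind this service; the function names `text_to_binary` and `binary_to_text` are made up for the example, and the input of the decoder is assumed to be 8-bit groups separated by spaces.

```python
def text_to_binary(text: str) -> str:
    """Encode text as UTF-8 and return space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def binary_to_text(bits: str) -> str:
    """Parse space-separated 8-bit groups and decode the bytes as UTF-8."""
    data = bytes(int(group, 2) for group in bits.split())
    return data.decode("utf-8")


encoded = text_to_binary("Hi")
print(encoded)                  # 01001000 01101001
print(binary_to_text(encoded))  # Hi
```

Decoding raises `UnicodeDecodeError` when the bytes are not valid UTF-8, which is a reasonable way to reject malformed input in a sketch like this.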

Something about UTF-8 encoding

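UTF-8 is a variable-length encoding of Unicode characters and is backward compatible with ASCII. The original specification allowed sequences of up to six bytes, but RFC 3629 later restricted it to four. The bits of a Unicode code point are distributed into the low-order bit positions of the UTF-8 bytes, with the lowest bit going into the last bit of the last byte. The high-order bits of each byte mark it as a single-byte character (0), the start of a multi-byte sequence (110, 1110 or 11110) or a continuation byte (10).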
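
To make the bit distribution concrete, here is a minimal hand-written encoder for a single code point, limited to four bytes. The function name `encode_code_point` is an assumption for this example; in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (at most four bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:
        # 0xxxxxxx: plain ASCII, one byte
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


print(encode_code_point(0x017D).hex(" "))  # c5 bd
```

The lowest six bits of the code point always end up in the last byte, which matches the description above.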

UTF Byte Order Mark

The character code U+FEFF at the beginning of a data stream serves as the Byte Order Mark (BOM). It is sometimes used as a signature that defines the byte order of plain-text files. There are in fact five valid byte forms of the BOM, depending on the Unicode encoding used: one for UTF-8 and a big-endian and a little-endian form each for UTF-16 and UTF-32. Some protocols prohibit the BOM and others require it. Because of this, some applications do not always handle Unicode text correctly.
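
As a rough illustration, a BOM at the start of some data can be recognised by comparing its first bytes with the five known forms (UTF-8, UTF-16 BE/LE, UTF-32 BE/LE). The sketch below uses the BOM constants from Python's standard `codecs` module; the helper name `detect_bom` is an assumption for this example.

```python
import codecs

# UTF-32 forms are checked first: the UTF-32 LE mark (FF FE 00 00)
# starts with the same two bytes as the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
]


def detect_bom(data: bytes):
    """Return the name of the encoding whose BOM starts `data`, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbfHello"))  # UTF-8
print(detect_bom(b"Hello"))              # None
```

Note that a missing BOM says nothing about the encoding, and data that starts with FF FE 00 00 is ambiguous in principle, so this check is only a hint.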

Creation of this service

It was created for study purposes. A few of my friends were looking for a similar service, and I found it useful as well.

Sentence example

Text:        Hello world!
Unicode:     U+0048 U+0065 U+006C U+006C U+006F U+0020 U+0077 U+006F U+0072 U+006C U+0064 U+0021
Hexadecimal: 0x48 0x65 0x6C 0x6C 0x6F 0x20 0x77 0x6F 0x72 0x6C 0x64 0x21
Binary:      01001000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100 00100001

New line example

A new line is a character, too. On Windows machines, however, a carriage return character precedes the line feed character. The line feed alone is used as the new line character on UNIX and other operating systems.

Text:                   B
                        y
                        e
Unicode (Windows):      U+0042 U+000D U+000A U+0079 U+000D U+000A U+0065
Unicode (Unix):         U+0042 U+000A U+0079 U+000A U+0065
Hexadecimal (Windows):  0x42 0x0D 0x0A 0x79 0x0D 0x0A 0x65
Hexadecimal (Unix):     0x42 0x0A 0x79 0x0A 0x65
Binary (Windows):       01000010 00001101 00001010 01111001 00001101 00001010 01100101
Binary (Unix):          01000010 00001010 01111001 00001010 01100101
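
The hexadecimal rows above can be reproduced with a short Python sketch. The helper name `hex_bytes` is made up for this example; the Windows variant is built simply by replacing each line feed with a carriage return followed by a line feed.

```python
def hex_bytes(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 0xNN values."""
    return " ".join(f"0x{byte:02X}" for byte in text.encode("utf-8"))


unix_text = "B\ny\ne"                           # LF only
windows_text = unix_text.replace("\n", "\r\n")  # CR + LF

print(hex_bytes(windows_text))  # 0x42 0x0D 0x0A 0x79 0x0D 0x0A 0x65
print(hex_bytes(unix_text))     # 0x42 0x0A 0x79 0x0A 0x65
```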

Special characters example

It's longer, isn't it? :)

Text:        Žluťoučký kůň
Unicode:     U+017D U+006C U+0075 U+0165 U+006F U+0075 U+010D U+006B U+00FD U+0020 U+006B U+016F U+0148
Hexadecimal: 0xC5 0xBD 0x6C 0x75 0xC5 0xA5 0x6F 0x75 0xC4 0x8D 0x6B 0xC3 0xBD 0x20 0x6B 0xC5 0xAF 0xC5 0x88
Binary:      11000101 10111101 01101100 01110101 11000101 10100101 01101111 01110101 11000100 10001101 01101011 11000011 10111101 00100000 01101011 11000101 10101111 11000101 10001000
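
The extra length comes from the accented letters, each of which takes two bytes in UTF-8 while the plain ASCII letters take one. A small sketch, with the hypothetical helper `utf8_lengths`, shows the byte count per character:

```python
def utf8_lengths(text: str) -> list[tuple[str, str, int]]:
    """Return (character, code point, UTF-8 byte count) for each character."""
    return [(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


# Escapes are used so the sample is exactly the precomposed text shown above.
sample = "\u017dlu\u0165ou\u010dk\u00fd k\u016f\u0148"

rows = utf8_lengths(sample)
for ch, code_point, size in rows:
    print(f"{ch!r:6} {code_point}  {size} byte(s)")

print(len(sample))                       # 13 characters
print(sum(size for _, _, size in rows))  # 19 bytes
```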
