HTML Charsets and Encoding

1. Understanding Character Encoding

A character encoding maps each character to a number and defines how that number is stored as bytes, allowing computers to store and transmit text accurately. When a browser decodes a page with the same encoding that was used to save it, the content is readable in any browser or on any device.
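To make the character-to-number pairing concrete, here is a minimal Python sketch (Python is used only for illustration, and the variable names are our own). It looks up the Unicode code point of a few characters and turns a number back into a character.

```python
import unicodedata

# Each character is paired with a number, called its code point.
samples = ["A", "é", "€"]
code_points = {ch: ord(ch) for ch in samples}

for ch in samples:
    # Print the code point in U+XXXX notation with the official Unicode name.
    print(f"U+{code_points[ch]:04X} {unicodedata.name(ch)}")

# chr() goes the other way: from a number back to its character.
euro = chr(0x20AC)
```

The code point is only the number assigned to a character. How that number is written out as bytes depends on the encoding, which is what the encodings below differ in.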


Common Character Encodings:

  • UTF-8: The most widely used encoding on the web. It can represent every Unicode character and is backward compatible with ASCII.
  • ISO-8859-1: Also known as Latin-1. A single-byte encoding that covers most Western European languages.
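The difference between the two encodings shows up in the bytes they produce. The following Python sketch (illustrative only; the sample word is our own choice) encodes the same text both ways and checks the ASCII compatibility mentioned above.

```python
text = "café"

utf8_bytes = text.encode("utf-8")         # é is stored as two bytes: C3 A9
latin1_bytes = text.encode("iso-8859-1")  # é is stored as one byte: E9

print(utf8_bytes.hex(" "))    # 63 61 66 c3 a9
print(latin1_bytes.hex(" "))  # 63 61 66 e9

# ASCII-only text produces identical bytes in ASCII and in UTF-8,
# which is what "backward compatible with ASCII" means in practice.
ascii_compatible = "Hello".encode("ascii") == "Hello".encode("utf-8")
print(ascii_compatible)       # True
```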

2. Declaring the Charset in HTML

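To tell the browser which character encoding to use, declare the charset with a <meta> tag in the <head> section of your HTML document. Place it before the <title> and any other text, because browsers look for the declaration only within the first 1024 bytes of the file.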

Syntax:

<meta charset="UTF-8">

Example:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Character Encoding Example</title>
</head>
<body>
  <p>Welcome to the world of web development! 🌍</p>
</body>
</html>

3. Why Use UTF-8?

  • Universal Compatibility: Covers every character in Unicode, so a single page can mix languages, symbols, and emoji.
  • Efficient Storage: Uses one to four bytes per character. ASCII characters take a single byte, so text that is mostly English stays compact.
  • Standard for the Web: Recommended by the W3C, required by the HTML standard, and used by the large majority of websites.
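The "one to four bytes" figure can be checked directly. This small Python sketch (illustrative; the sample characters are our own choice) measures how many bytes UTF-8 needs for characters from different ranges.

```python
# One character from each UTF-8 length class.
samples = ["A", "é", "€", "🌍"]

# Number of bytes UTF-8 uses for each character.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    print(f"U+{ord(ch):04X}: {byte_lengths[ch]} byte(s)")
# U+0041: 1 byte(s)
# U+00E9: 2 byte(s)
# U+20AC: 3 byte(s)
# U+1F30D: 4 byte(s)
```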

4. Handling Encoding Issues

If characters display incorrectly (e.g., � or a sequence like Ã© instead of the correct symbol), it’s usually due to a mismatch between the encoding the file was saved in and the encoding the browser used to read it. To fix this:

  • Ensure the charset is set to UTF-8 in the HTML document.
  • Save the file with UTF-8 encoding in your text editor.
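Both kinds of garbled output can be reproduced outside the browser. The Python sketch below (illustrative only) decodes bytes with the wrong encoding on purpose, then with the matching one.

```python
original = "café"
utf8_bytes = original.encode("utf-8")
latin1_bytes = original.encode("iso-8859-1")

# Mismatch 1: UTF-8 bytes read as ISO-8859-1.
# Every byte is valid Latin-1, so é silently turns into two wrong characters.
mojibake = utf8_bytes.decode("iso-8859-1")  # "cafÃ©"

# Mismatch 2: ISO-8859-1 bytes read as UTF-8.
# The lone byte E9 is not valid UTF-8; errors="replace" substitutes U+FFFD (�).
replaced = latin1_bytes.decode("utf-8", errors="replace")  # "caf\ufffd"

# Match: saved as UTF-8 and read as UTF-8 round-trips cleanly.
round_trip = utf8_bytes.decode("utf-8")
```

This is why both steps in the list matter: the declared charset and the encoding the file is actually saved in have to agree.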

5. Verifying Charset in the Browser

You can check the character encoding of a webpage in most browsers:

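  • Google Chrome: Right-click > “View Page Source” and look for <meta charset="UTF-8">. This shows the declared charset; to see the encoding the browser actually applied, open the DevTools Console and evaluate document.characterSet.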
  • Firefox: Open “Page Info” (Ctrl+I, or Cmd+I on macOS) and check “Text Encoding” on the “General” tab.
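The same check can be scripted against a saved HTML file. The sketch below is a simplified approximation, not the full detection procedure browsers follow: it only handles the <meta charset="..."> form and ignores HTTP headers and byte order marks. The function name declared_charset and the sample page are hypothetical.

```python
import re

# Simplified pattern for <meta charset="...">, with or without quotes.
META_CHARSET = re.compile(
    rb"""<meta\s+charset\s*=\s*["']?([\w.:-]+)""",
    re.IGNORECASE,
)

def declared_charset(html_bytes):
    """Return the charset named in a <meta charset> tag, or None if absent."""
    # Browsers scan only the start of the file, so limit the search
    # to the first 1024 bytes.
    match = META_CHARSET.search(html_bytes[:1024])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()

page = (
    b'<!DOCTYPE html><html lang="en"><head>'
    b'<meta charset="UTF-8"><title>Example</title></head></html>'
)
print(declared_charset(page))  # utf-8
```

Reading the file as raw bytes is deliberate here: the declaration has to be found before the encoding needed to decode the rest of the file is known.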

Example of Different Charsets

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>UTF-8 Example</title>
</head>
<body>
  <p>This is a UTF-8 encoded page with symbols: € ¥ ©.</p>
</body>
</html>

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="ISO-8859-1">
  <title>ISO-8859-1 Example</title>
</head>
<body>
  <p>This is an ISO-8859-1 encoded page with symbols: &euro; &yen; &copy;.</p>
</body>
</html>
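The second page writes its symbols as character references rather than literal characters. The Python sketch below (illustrative; the helper fits_latin1 is our own) suggests why: the euro sign has no byte in ISO-8859-1, while the yen and copyright signs do. Note that Python's codec is strict ISO-8859-1; browsers treat the ISO-8859-1 label as the closely related windows-1252, so results in a browser can differ for a few characters.

```python
import html

symbols = {"euro": "€", "yen": "¥", "copy": "©"}

def fits_latin1(ch):
    """Return True if the character can be stored as an ISO-8859-1 byte."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

representable = {name: fits_latin1(ch) for name, ch in symbols.items()}
print(representable)  # {'euro': False, 'yen': True, 'copy': True}

# Character references are plain ASCII in the file, so they survive any
# ASCII-compatible encoding and are resolved when the HTML is parsed.
decoded = html.unescape("&euro; &yen; &copy;")
```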

Conclusion

Setting the correct charset in your HTML document is essential for ensuring text is displayed correctly across all browsers and devices. UTF-8 is the recommended charset because it covers every Unicode character and is supported by all modern browsers.