HTML TAGS AND ELEMENTS

You can use a simple HTML page to get a grasp on several important principles.

<html>
  <head>
    <title>A really simple page</title>
  </head>
  <body>
    <p>This is the body text in a paragraph</p>
    <!-- this is a comment -->
  </body>
</html>

The parts enclosed in < and > pairs are HTML tags, which tell a browser how to treat the text between them. The extra indentation in the example shows how lines are enclosed in pairs of tags; it is not an HTML requirement. Browsers interpret HTML tags correctly in either uppercase or lowercase; however, XML-based versions of HTML such as XHTML require lowercase tag names, so lowercase is the safer habit.

The entire document is enclosed in the <html> . . . </html> tag pair, which forms the root element of the document. In any pair, the first tag is called the opening tag and the second is the closing tag. The <head> . . . </head> and <body> . . . </body> tag pairs form the head and body elements respectively; they are nested inside the root element and are the only elements that can appear directly inside it.

Note the special markup starting with <!-- and ending with --> character sequences. This is a comment, not an element, and won't be displayed by a browser.
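
The nesting described above can be made visible with a short program. What follows is only a sketch, assuming Python and its standard-library html.parser module; the names OutlineParser and outline_of are hypothetical and chosen for illustration. It records each element of the sample page with its nesting depth and collects comments separately, since a comment is not an element.

```python
from html.parser import HTMLParser

# The sample page from this lesson.
PAGE = """<html>
  <head>
    <title>A really simple page</title>
  </head>
  <body>
    <p>This is the body text in a paragraph</p>
    <!-- this is a comment -->
  </body>
</html>"""


class OutlineParser(HTMLParser):
    """Records each element with its nesting depth, plus any comments."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []   # (depth, tag name) pairs in document order
        self.comments = []  # comment text, kept apart from the elements

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_comment(self, data):
        self.comments.append(data.strip())


def outline_of(markup):
    """Hypothetical helper: return (outline, comments) for the markup."""
    parser = OutlineParser()
    parser.feed(markup)
    parser.close()
    return parser.outline, parser.comments


if __name__ == "__main__":
    outline, comments = outline_of(PAGE)
    for depth, tag in outline:
        print("  " * depth + tag)
    print("comments:", comments)
```

Run against the sample page, the outline places html at depth 0, head and body at depth 1, and title and p at depth 2. The comment appears only in the separate comments list.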

The HTML specification defines which elements can appear inside which other elements. This strict control over what can appear where in an HTML document is what makes it possible for Web pages to be displayed in different Web browsers without too much variation. In contrast, XML is extensible: authors define their own elements, so XML itself does not fix a standard set of elements or where they may appear. This is discussed later in the lesson.

Tag Pairs and Empty Tags

When using HTML tag pairs to define elements, any tags nested inside the pair must be closed before the element's closing tag appears. In the example, the <p> . . . </p> tag pair must be completely inside the <body> . . . </body> tag pair. An HTML document that adheres to this rule is said to be well formed.
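
The nesting rule can be checked mechanically by keeping a stack of open tags. The following is a minimal sketch, again assuming Python's standard-library html.parser; NestingChecker and is_properly_nested are hypothetical names. It deliberately treats every opening tag as needing a matching closing tag, so a singleton tag written without a closing slash would be flagged. It is an illustration of the rule, not a full validator.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and notes closing tags that arrive out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags opened but not yet closed
        self.errors = []     # closing tags that did not match the top of the stack

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(tag)


def is_properly_nested(markup):
    """Hypothetical helper: True if every tag pair closes in the right order."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return not checker.errors and not checker.open_tags


if __name__ == "__main__":
    print(is_properly_nested("<body><p>text</p></body>"))  # inner pair closed first
    print(is_properly_nested("<body><p>text</body></p>"))  # pairs overlap
```

The first call reports proper nesting because the p element closes before the body element does. The second does not, because the body closing tag arrives while p is still open.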

HTML tags can appear in two ways: paired, as in the previous example, and as a singleton, such as the <br> tag that causes a break in the text. Singleton tags are also referred to as empty tags because they don't enclose anything. Unfortunately, HTML adopts a convention for empty tags that is contrary to the one used in XML. This convention is covered later in this lesson.
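
To see how a parser tells the two kinds of tag apart, here is a small sketch, assuming Python's standard-library html.parser; TagForms and classify are hypothetical names. It lists tags that were opened but never closed, and separately lists tags written in the slash-terminated form such as <br/>, which this parser reports through its own callback.

```python
from html.parser import HTMLParser


class TagForms(HTMLParser):
    """Sorts tags by how they were written in the markup."""

    def __init__(self):
        super().__init__()
        self.opened = []       # tags seen as plain opening tags
        self.closed = []       # tags seen as closing tags
        self.self_closed = []  # tags written with a trailing slash, e.g. <br/>

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.self_closed.append(tag)


def classify(markup):
    """Hypothetical helper: return (unclosed singleton tags, slash-terminated tags)."""
    forms = TagForms()
    forms.feed(markup)
    forms.close()
    singletons = [tag for tag in forms.opened if tag not in forms.closed]
    return singletons, forms.self_closed


if __name__ == "__main__":
    print(classify("<p>line one<br>line two</p>"))
    print(classify("<p>line one<br/>line two</p>"))
```

In the first call br shows up as an opened tag with no closing partner, while in the second it is reported as a slash-terminated tag. The p pair is matched in both cases.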

It would benefit programmers if HTML and XML followed exactly the same rules for forming tags, and there's a determined effort to create a version of HTML that follows the XML rules. The W3C (World Wide Web Consortium) has a standard called XHTML that accomplishes this and establishes a clear path for future HTML extensions.

Tag Attributes

An opening tag can carry additional information called attributes, which take the form name="value". You already saw these in action in Lesson 3, where an input element in a form was defined with the following empty tag:

<input name="msg" type="text" size="40" value=""/>

Convention dictates that the value of an attribute always be enclosed in quotes, although Web browsers aren't strict about this. Typical uses for attributes are to set colors, fonts, and element locations.
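
A parser hands attributes to a program as name and value pairs. The sketch below assumes Python's standard-library html.parser; AttributeReader and read_attributes are hypothetical names. It reads the attributes of the input tag shown above, then reads an unquoted variant to show that a lenient parser, like a browser, accepts both.

```python
from html.parser import HTMLParser


class AttributeReader(HTMLParser):
    """Collects the attributes of each tag as a dictionary keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples with quotes removed.
        self.found[tag] = dict(attrs)


def read_attributes(markup):
    """Hypothetical helper: return {tag name: {attribute name: value}}."""
    reader = AttributeReader()
    reader.feed(markup)
    reader.close()
    return reader.found


if __name__ == "__main__":
    quoted = read_attributes('<input name="msg" type="text" size="40" value=""/>')
    unquoted = read_attributes("<input name=msg type=text size=40>")
    print(quoted["input"])
    print(unquoted["input"])
```

Both calls yield the same name, type, and size values. Quoting remains the safer habit, because an unquoted value cannot contain spaces.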

HTML Standardization and the DOM

The W3C is the organization responsible for defining the specifications for HTML and creating many other Web-related standards. Many W3C projects are related to standardizing and improving the information carried by markup languages. One of the most fascinating projects is the Semantic Web, an attempt to make the resources presented on the Web more directly usable by programs.

Many W3C standards are in a nearly constant state of revision, reflecting the incredibly rapid rate of innovation in the World Wide Web as thousands of ideas jostle for acceptance.

The DOM (Document Object Model) is a W3C standard defining how scripting languages and programs can address and modify the various elements of HTML and XML documents. You saw an example in Lesson 3, and the following section revisits it.
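
As a rough illustration of addressing and modifying an element through the DOM, the sketch below uses Python's standard-library xml.dom.minidom, which is an assumption standing in for whichever scripting language the lesson uses. It works here only because the sample page also happens to be well-formed XML. The function name retitle is hypothetical.

```python
from xml.dom import minidom

# The sample page from this lesson; it is also well-formed XML.
PAGE = """<html>
  <head>
    <title>A really simple page</title>
  </head>
  <body>
    <p>This is the body text in a paragraph</p>
    <!-- this is a comment -->
  </body>
</html>"""


def retitle(markup, new_title):
    """Hypothetical helper: replace the title text and return (old title, new markup)."""
    document = minidom.parseString(markup)
    # Address the element by its tag name.
    title = document.getElementsByTagName("title")[0]
    # The text between the tags is a child text node of the title element.
    old_title = title.firstChild.data
    title.firstChild.data = new_title
    return old_title, document.documentElement.toxml()


if __name__ == "__main__":
    old_title, updated = retitle(PAGE, "A renamed page")
    print(old_title)
    print(updated)
```

The program locates the title element by name, reads its text node, and replaces it. The rest of the document, including the comment, is left unchanged.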