HTML TAGS AND ELEMENTS
You can use a simple HTML page to get a grasp on several important principles.
<html>
  <head>
    <title>A really simple page</title>
  </head>
  <body>
    <p>This is the body text in a paragraph</p>
    <!-- this is a comment -->
  </body>
</html>
The parts enclosed in < and > pairs are HTML tags that tell a browser how to
treat the text between the tags. Extra indenting spaces are added to the
example to show how lines are enclosed in pairs of tags, but this indenting is
not an HTML requirement. Browsers interpret HTML tags correctly in either
uppercase or lowercase; however, versions of HTML that follow the XML rules
require lowercase tags, because XML is case sensitive.
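The case-insensitive treatment of tags can be seen in a short Python sketch that uses the standard library's html.parser module as a stand-in for a browser's parser. The TagCollector class and opening_tags function are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every opening tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes the tag name already converted to lowercase.
        self.tags.append(tag)


def opening_tags(markup):
    """Return the opening tag names found in markup, in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


lower = opening_tags("<html><body><p>Hello</p></body></html>")
upper = opening_tags("<HTML><BODY><P>Hello</P></BODY></HTML>")
print(lower == upper)  # True
```

Both spellings produce the same list of tag names, which is roughly what a browser does when it reads uppercase or lowercase HTML.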
The entire document is enclosed in the <html> . . . </html> tag pair. The
first tag of a pair is called the opening tag and the second is the closing
tag. The <html> . . . </html> tag pair forms the root element of the document.
The <head> . . . </head> and <body> . . . </body> tag pairs form the head and
body elements respectively, which are the only elements that can appear
directly inside the root element. In other words, the head and body elements
are nested inside the html, or root, element.
Note the special markup that starts with the <!-- character sequence and ends
with -->. This is a comment, not an element, and won't be displayed by a
browser.
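The nesting described above, and the difference between an element and a comment, can be inspected with a short Python sketch. It assumes the simple page happens to be acceptable to an XML parser (it is, because every tag is paired and properly nested), so the standard library's xml.dom.minidom module can read it. The helper functions child_elements and child_comments are hypothetical names used only here.

```python
from xml.dom import minidom

PAGE = """<html>
  <head>
    <title>A really simple page</title>
  </head>
  <body>
    <p>This is the body text in a paragraph</p>
    <!-- this is a comment -->
  </body>
</html>"""


def child_elements(node):
    """Return the tag names of the elements directly inside node."""
    return [child.tagName for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]


def child_comments(node):
    """Return the text of the comments directly inside node."""
    return [child.data for child in node.childNodes
            if child.nodeType == child.COMMENT_NODE]


root = minidom.parseString(PAGE).documentElement
body = root.getElementsByTagName("body")[0]

print(root.tagName)          # html
print(child_elements(root))  # ['head', 'body']
print(child_elements(body))  # ['p']
print(child_comments(body))  # [' this is a comment ']
```

The comment shows up as its own kind of node rather than as an element, which matches the distinction drawn in the text.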
The HTML specification clearly defines what sort of elements can appear inside other elements. It's this strict control over what can appear where in an HTML document that makes it possible for Web pages to be displayed in different Web browsers without too much variation. In contrast, XML, being extensible, doesn't have this sort of standardization. This is discussed later in the lesson.
Tag Pairs and Empty Tags
When using HTML tag pairs to define elements, any nested tags inside the pair
must be closed before the element closing tag appears. In the example, the
<p> . . . </p> tag pair must be completely inside
the <body> . . . </body> tag pair. An HTML document
that adheres to this rule is said to be well formed.
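The nesting rule can be checked mechanically by keeping a stack of open tags. The following Python sketch uses the standard library's html.parser module. The NestingChecker class and is_properly_nested function are hypothetical names introduced for illustration, and the sketch assumes every element is written as a tag pair, so an unclosed singleton tag would be reported as an error.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and records the first nesting error."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.error = None

    def handle_starttag(self, tag, attrs):
        # Every opening tag is pushed until its closing tag arrives.
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.error is not None:
            return  # keep only the first problem found
        if not self.open_tags:
            self.error = f"</{tag}> has no opening tag"
        elif self.open_tags[-1] != tag:
            self.error = f"</{tag}> appears before </{self.open_tags[-1]}>"
        else:
            self.open_tags.pop()


def is_properly_nested(markup):
    """Return True if every tag pair closes inside its enclosing pair."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.error is None and not checker.open_tags


print(is_properly_nested("<body><p>text</p></body>"))  # True
print(is_properly_nested("<body><p>text</body></p>"))  # False
```

In the second call the </body> tag arrives while <p> is still open, which is exactly the situation the rule forbids.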
HTML tags can appear in two ways: paired, as in the previous example, and as a
singleton, such as the <br> tag that causes a break in the text. Singleton
tags are also referred to as empty tags because they don't enclose anything.
Unfortunately, the HTML convention for empty tags differs from the one used in
XML. This convention is covered later in this lesson.
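The mismatch between the two conventions can be seen by handing both spellings of an empty tag to an XML parser. This Python sketch uses the standard library's xml.dom.minidom module; parses_as_xml is a hypothetical helper name, and the two sample strings are made up for the illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parses_as_xml(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        minidom.parseString(markup)
        return True
    except ExpatError:
        return False


html_style = "<p>first line<br>second line</p>"   # singleton tag, HTML style
xml_style = "<p>first line<br/>second line</p>"   # empty tag closed XML style

print(parses_as_xml(html_style))  # False
print(parses_as_xml(xml_style))   # True
```

The XML parser treats the bare <br> as an opening tag that is never closed, so it rejects markup that a Web browser would display without complaint.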
It would be advantageous to programmers if HTML and XML followed exactly the same rules for forming tags, and there's a determined effort to create a version of HTML that follows the XML rules. The W3C (World Wide Web Consortium) has a standard called XHTML that accomplishes this and establishes a clear path for future HTML extensions.
Tag Attributes
An opening tag can have additional information called attributes that take the
form of name="value". You already saw these in action in Lesson 3, where an
input element in a form was defined with the following empty tag:
<input name="msg" type="text" size="40" value=""/>
Convention dictates that the value of an attribute always be enclosed in quotes, although Web browsers aren't strict about this. Typical uses for attributes are to set colors, fonts, and element locations.
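A program reading the tag sees the attributes as name and value pairs. The Python sketch below uses the standard library's html.parser module to pull them out of the input tag shown above, and then out of an unquoted variant to show the lenient handling of quotes. AttributeReader and read_attributes are hypothetical names, and the unquoted tag is an example made up for this illustration.

```python
from html.parser import HTMLParser


class AttributeReader(HTMLParser):
    """Collects the attributes of the first tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.attributes = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with any quotes removed.
        if self.attributes is None:
            self.attributes = dict(attrs)


def read_attributes(markup):
    """Return the attributes of the first tag in markup as a dict."""
    reader = AttributeReader()
    reader.feed(markup)
    reader.close()
    return reader.attributes


quoted = read_attributes('<input name="msg" type="text" size="40" value=""/>')
unquoted = read_attributes('<input name=msg type=text size=40>')

print(quoted["name"], quoted["size"])      # msg 40
print(unquoted["name"], unquoted["size"])  # msg 40
```

Quoting remains the safer habit: an unquoted value ends at the first space, so a value containing spaces cannot be written without quotes.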
HTML Standardization and the DOM
The W3C is the organization responsible for defining the specifications for HTML and creating many other Web-related standards. Many W3C projects are related to standardizing and improving the information carried by markup languages. One of the most fascinating projects is the Semantic Web -- an attempt to make the resources presented on the Web more directly usable by programs.
Many W3C standards are in a nearly constant state of revision, reflecting the incredibly rapid rate of innovation in the World Wide Web as thousands of ideas jostle for acceptance.
The DOM (Document Object Model) is a W3C standard defining how scripting languages and programs can address and modify the various elements of HTML and XML documents. You saw an example in Lesson 3, but the following section takes a look at that again.
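As a rough preview of what addressing and modifying an element looks like, the Python sketch below uses xml.dom.minidom, a standard library module that implements a subset of the W3C DOM interface. It is not the Lesson 3 example; the page string is a compact copy of the simple page from earlier in this lesson, and replace_paragraph_text is a hypothetical helper name.

```python
from xml.dom import minidom

PAGE = (
    "<html><head><title>A really simple page</title></head>"
    "<body><p>This is the body text in a paragraph</p></body></html>"
)


def replace_paragraph_text(markup, new_text):
    """Replace the text of the first p element and return the new markup."""
    document = minidom.parseString(markup)
    # Address the element by its tag name, as a script would in a browser.
    paragraph = document.getElementsByTagName("p")[0]
    # The paragraph's text lives in a child text node; change its data.
    paragraph.firstChild.data = new_text
    return document.documentElement.toxml()


updated = replace_paragraph_text(PAGE, "Changed through the DOM")
print(updated)
```

The same pattern of finding a node and then changing it applies when a scripting language works on a page inside a browser, although the exact method names available there may differ.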
