Topic #02 · Foundational · 15 min read

HTML (Semantic Structure & Forms)

Semantic markup and accessible forms end to end: the document skeleton, landmark & sectioning elements, the heading outline, then native form controls, label association, input types, native validation, fieldset/legend grouping, and ARIA error messaging — meaning that machines and assistive tech get for free.

#html #semantics #forms #accessibility #a11y #landmarks #aria #validation #seo

Why 'semantic' matters at all. HTML has two kinds of elements: semantic ones that describe what the content is (<nav>, <article>, <button>, <label>) and generic ones that mean nothing (<div>, <span>). The browser, search engines, and — most importantly — assistive technology like screen readers all build their understanding of your page from those meanings. Choose a <div> where a <button> belongs and you throw away keyboard support, focus behavior, and the announcement 'button' — then spend a day reimplementing it badly with JavaScript and ARIA. Semantics is the cheapest accessibility and SEO you will ever get.

The document skeleton. Every page starts from a fixed frame: <!DOCTYPE html> (opts into standards mode), <html lang="en"> (the lang attribute drives screen-reader pronunciation and translation), a <head> for metadata the user never sees, and a <body> for everything they do. In the <head> the non-negotiables are <meta charset="UTF-8"> (so text decodes correctly), <meta name="viewport" content="width=device-width, initial-scale=1"> (so mobile browsers don't zoom out), and a descriptive <title> (the tab label and the primary SEO/social signal).

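Landmarks: the page's regions. Landmark elements let assistive tech jump straight to a region: <header> (intro/branding), <nav> (major navigation blocks), <main> (the primary content, exactly one visible per page), <aside> (tangential content like a sidebar), and <footer> (metadata, links, copyright). Note that <header> and <footer> act as page-level landmarks only when they are not nested inside sectioning content such as <article>, <section>, or <main>. Screen-reader users navigate by landmark the way sighted users scan the page, so getting these right gives them most of what 'skip to content' links were invented for (a skip link is still worth keeping for sighted keyboard users). A <div class="main"> gives them none of this.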
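
To make the last point concrete, the lookup below lists the ARIA landmark role each of these elements exposes by default. It is a simplified sketch: LANDMARK_ROLES and landmarkRoleOf are hypothetical names for this example, not a browser API, and the table assumes the page-level case described above.

```typescript
// Default ARIA landmark role exposed by each element (simplified sketch).
// Assumption: <header> and <footer> sit at page level; nested inside
// <article>, <section>, or <main> they expose no landmark role.
const LANDMARK_ROLES = new Map<string, string>([
  ["header", "banner"],
  ["nav", "navigation"],
  ["main", "main"],
  ["aside", "complementary"],
  ["footer", "contentinfo"],
]);

// Hypothetical helper: the landmark role for a tag name, or null when the
// element is generic and exposes no landmark (e.g. <div>, <span>).
function landmarkRoleOf(tagName: string): string | null {
  return LANDMARK_ROLES.get(tagName.toLowerCase()) ?? null;
}

console.log(landmarkRoleOf("MAIN")); // "main"
console.log(landmarkRoleOf("div"));  // null
```

A class name such as "main" on a <div> never enters this mapping, which is why landmark navigation cannot see it.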

Sectioning & headings: the document outline. <section> groups related content that usually has a heading; <article> is a self-contained, independently distributable unit (a blog post, a product card, a comment). Inside them, headings <h1> through <h6> form a strict outline: one <h1> describing the page, then <h2> for major sections, <h3> nested under those, and so on. The cardinal rule: never skip levels for visual size (don't jump from <h1> to <h3> because it 'looks right'), because that breaks the outline screen readers rely on. Size with CSS; choose the level by meaning.
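
The no-skipping rule is mechanical enough to check in code. The sketch below takes the heading levels in document order and reports any heading that goes more than one level deeper than the one before it. The function name findOutlineProblems and the OutlineProblem shape are hypothetical, chosen for this example.

```typescript
// Sketch: report headings that skip a level (e.g. an h1 followed by an h3).
// Input is the sequence of heading levels in document order, e.g. [1, 2, 3, 2].
interface OutlineProblem {
  index: number;    // position in the input array
  level: number;    // the level that was found
  expected: string; // description of what would have been valid
}

function findOutlineProblems(levels: number[]): OutlineProblem[] {
  const problems: OutlineProblem[] = [];
  let previous = 0; // 0 means "no heading seen yet", so the first must be h1
  levels.forEach((level, index) => {
    // Going deeper is valid only one step at a time; going back up is always fine.
    if (level > previous + 1) {
      problems.push({ index, level, expected: `h${previous + 1} or shallower` });
    }
    previous = level;
  });
  return problems;
}

console.log(findOutlineProblems([1, 2, 3, 2, 3])); // []
console.log(findOutlineProblems([1, 3]));          // one problem, at index 1
```

In a page, the levels could be collected from document.querySelectorAll("h1, h2, h3, h4, h5, h6") by reading the digit in each element's tag name.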

Text-level semantics. Prefer meaning over appearance: <strong> (importance) and <em> (emphasis) over <b>/<i>; <time datetime="2026-07-01"> for machine-readable dates; <abbr>, <code>, <blockquote>/<cite> where they fit. <figure> + <figcaption> ties an image to its caption. And every content image needs an alt attribute: descriptive when it conveys information, or empty (alt="") when it's purely decorative so screen readers skip it.

Forms: use native controls first. The single biggest accessibility win is reaching for real form elements — <form>, <input>, <select>, <textarea>, <button> — before building custom widgets. Native controls come with keyboard support, focus management, form submission, and screen-reader roles built in and for free. A <div> styled to look like a checkbox is invisible to assistive tech and un-tabbable; the native <input type="checkbox"> just works.

Labels — associate every input. Every input needs a programmatic label, not just visible text nearby. The explicit form is <label for="email">Email</label> paired with <input id="email"> — the for must match the input's id. This does two things: screen readers announce the label when the field is focused, and clicking the label focuses/activates the control (a bigger hit target, great on mobile). You can also wrap the input inside the <label> (implicit association). When a visible label isn't possible, use aria-label or aria-labelledby — but a real <label> is always preferred.
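
A mismatched for/id pair fails silently, so it can be worth checking. The sketch below works on plain objects standing in for parsed markup; the LabelInfo and InputInfo shapes and the findUnlabelledInputs helper are assumptions made for this example rather than any real API.

```typescript
// Sketch: find inputs that have no programmatic label.
// Plain objects stand in for parsed markup (hypothetical shapes).
interface LabelInfo {
  htmlFor: string; // value of the label's for attribute ("" if absent)
}
interface InputInfo {
  id: string;
  wrappedInLabel?: boolean; // true when the input sits inside a <label>
  ariaLabel?: string;       // value of aria-label, if any
}

function findUnlabelledInputs(labels: LabelInfo[], inputs: InputInfo[]): string[] {
  const targets = new Set(labels.map((l) => l.htmlFor).filter((f) => f !== ""));
  return inputs
    .filter(
      (input) =>
        !targets.has(input.id) && // no explicit <label for> match
        !input.wrappedInLabel &&  // no implicit wrapping <label>
        !input.ariaLabel,         // no aria-label fallback
    )
    .map((input) => input.id);
}

const missing = findUnlabelledInputs(
  [{ htmlFor: "email" }, { htmlFor: "amuont" }], // typo: does not match "amount"
  [{ id: "email" }, { id: "amount" }, { id: "agree", wrappedInLabel: true }],
);
console.log(missing); // ["amount"]
```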

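Input types do real work. The type attribute is not just cosmetic: email, url, tel, number, date, search, password, color, and range each bring some mix of built-in validation, a purpose-built picker, and the correct on-screen keyboard on mobile (an @ key for email, a numeric pad for number). Not every type validates: email, url, number, and date check the value's format, while tel and search accept any text and only change the keyboard and presentation. Pick the most specific type and the platform hands you validation and UX you'd otherwise script. Add inputmode to fine-tune the on-screen keyboard and autocomplete (e.g. autocomplete="email") so browsers can autofill correctly, which is another accessibility and conversion win.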
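
One way to keep these choices consistent across a codebase is a small table of presets. The sketch below is illustrative only: the FieldKind keys, FIELD_PRESETS, and renderAttrs are hypothetical names, while the attribute values are standard HTML type, inputmode, and autocomplete tokens.

```typescript
// Sketch: attribute presets for common fields (keys are hypothetical).
interface InputAttrs {
  type: string;
  inputmode?: string;
  autocomplete?: string;
}

type FieldKind = "email" | "phone" | "newPassword" | "otp" | "amount";

const FIELD_PRESETS: Record<FieldKind, InputAttrs> = {
  email: { type: "email", autocomplete: "email" },
  phone: { type: "tel", autocomplete: "tel" },
  newPassword: { type: "password", autocomplete: "new-password" },
  // A one-time code is a string of digits, not a quantity, so text plus a
  // numeric keypad avoids the spinner controls of type="number".
  otp: { type: "text", inputmode: "numeric", autocomplete: "one-time-code" },
  amount: { type: "number", inputmode: "decimal" },
};

// Render a preset as an attribute string for an <input> tag.
function renderAttrs(kind: FieldKind): string {
  const attrs = FIELD_PRESETS[kind];
  const parts = [`type="${attrs.type}"`];
  if (attrs.inputmode) parts.push(`inputmode="${attrs.inputmode}"`);
  if (attrs.autocomplete) parts.push(`autocomplete="${attrs.autocomplete}"`);
  return parts.join(" ");
}

console.log(renderAttrs("otp"));
// type="text" inputmode="numeric" autocomplete="one-time-code"
```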

Native validation before JavaScript. HTML gives you a validation layer with zero script: required, min/max/step (numbers, dates), minlength/maxlength, pattern (a regex), and type-based checks. The browser blocks submission and shows a message automatically. You can style validity with the :valid, :invalid, :required, and :user-invalid pseudo-classes. Only reach for the Constraint Validation API (checkValidity(), setCustomValidity(), the ValidityState object) when you need custom rules or custom messaging — build on top of native, don't replace it.

Grouping with fieldset & legend. Related controls — especially radio buttons and checkboxes — belong in a <fieldset> with a <legend> describing the group. A screen reader then announces 'Shipping speed, Standard, radio button 1 of 3', giving the individual option its group context. Without it, users hear 'Standard, radio button' with no idea what question they're answering. This is the correct structure for any set of choices.
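
As an illustration of that structure, the sketch below builds the markup for a radio group as a string. The helpers renderRadioGroup and escapeHtml are hypothetical and written for this example; preselecting the first option is an assumption, not a requirement.

```typescript
// Sketch: build a <fieldset>/<legend> radio group as an HTML string.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderRadioGroup(legend: string, name: string, options: string[]): string {
  const radios = options.map((option, i) => {
    const checked = i === 0 ? " checked" : ""; // assumption: preselect the first option
    const value = escapeHtml(option.toLowerCase());
    // Each radio is wrapped in its own <label> (implicit association).
    return `  <label><input type="radio" name="${escapeHtml(name)}" value="${value}"${checked} /> ${escapeHtml(option)}</label>`;
  });
  // The <legend> is what supplies the group's question to assistive tech.
  return ["<fieldset>", `  <legend>${escapeHtml(legend)}</legend>`]
    .concat(radios, ["</fieldset>"])
    .join("\n");
}

const groupHtml = renderRadioGroup("Shipping speed", "speed", ["Standard", "Express", "Overnight"]);
console.log(groupHtml);
```

All radios share one name, which is what makes them mutually exclusive; the legend is what gives them a question to answer.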

Accessible error messaging. When validation fails, three things make it accessible: put the error text in an element referenced by the input's aria-describedby (so it's announced with the field), set aria-invalid="true" on the failing input, and move focus to the first invalid field on submit. For errors that appear after the fact, an aria-live="polite" (or role="alert") region announces them without the user hunting. Never signal an error with color alone — pair it with text and an icon for colorblind users.
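
The attribute wiring described above fits in one small function. The sketch below is written against minimal interfaces so it can run outside a browser; AttrTarget, ErrorTarget, and setFieldError are hypothetical names for this example, and a real page would pass the actual input and its error element. Moving focus is left to the caller.

```typescript
// Minimal shapes covering the parts of a DOM element this sketch touches.
interface AttrTarget {
  setAttribute(name: string, value: string): void;
  removeAttribute(name: string): void;
}
interface ErrorTarget {
  id: string;
  textContent: string | null;
}

// Sketch: mark a field invalid and link it to its message, or clear both.
function setFieldError(field: AttrTarget, errorEl: ErrorTarget, message: string): void {
  if (message === "") {
    field.removeAttribute("aria-invalid");
    errorEl.textContent = "";
    return;
  }
  field.setAttribute("aria-invalid", "true");
  field.setAttribute("aria-describedby", errorEl.id); // read out along with the field
  errorEl.textContent = message; // announced on change if errorEl is a live region
}

// Fake field for running outside a browser.
const fieldAttrs = new Map<string, string>();
const fakeField: AttrTarget = {
  setAttribute: (name, value) => { fieldAttrs.set(name, value); },
  removeAttribute: (name) => { fieldAttrs.delete(name); },
};
const fakeError: ErrorTarget = { id: "email-err", textContent: "" };

setFieldError(fakeField, fakeError, "Enter a valid work email.");
console.log(fieldAttrs.get("aria-invalid"), fieldAttrs.get("aria-describedby"), fakeError.textContent);
```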

The <button> vs <div onclick> trap. A <button> is focusable, fires on Enter/Space, and is announced as 'button' — all automatic. A clickable <div> gets none of that; you'd have to add tabindex="0", role="button", and keydown handlers just to limp toward parity. Inside a form, remember <button> defaults to type="submit" — set type="button" explicitly for buttons that shouldn't submit, and always call event.preventDefault() in a JS-handled submit to stop the full-page reload.
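
To show how much a clickable <div> has to reimplement, here is a rough sketch of the attributes and handlers it needs. The buttonBehaviorFor helper and the KeyLikeEvent shape are hypothetical; the sketch also simplifies by activating on keydown for both keys, whereas a native button activates Space on keyup.

```typescript
// Minimal event shape so the sketch runs outside a browser.
interface KeyLikeEvent {
  key: string;
  preventDefault(): void;
}

// Sketch: everything a <div> needs before it behaves roughly like a <button>.
function buttonBehaviorFor(onActivate: () => void) {
  return {
    // Attributes to set on the element: announce it as a button, make it tabbable.
    attributes: { role: "button", tabindex: "0" },
    onClick: () => onActivate(),
    onKeyDown: (e: KeyLikeEvent) => {
      if (e.key === "Enter" || e.key === " ") {
        e.preventDefault(); // stop Space from scrolling the page
        onActivate();
      }
    },
  };
}

let activations = 0;
const behavior = buttonBehaviorFor(() => { activations += 1; });
behavior.onKeyDown({ key: "Enter", preventDefault: () => {} });
behavior.onKeyDown({ key: "a", preventDefault: () => {} }); // ignored
console.log(activations); // 1
```

Even with all of this in place the element still lacks the disabled state and form participation, which a <button> provides with no code at all.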

SEO falls out of good structure. Search crawlers read the same semantics assistive tech does. A single meaningful <h1>, a logical heading outline, landmark regions, descriptive alt text, and clean <a href> links let crawlers understand and rank your content. Semantic HTML plus fast rendering is technical SEO — no extra tags required.

The mental model (memorise this). Pick the element that means what the content is, never what it looks like — CSS handles looks. Landmarks (header/nav/main/footer) frame the page, one <h1> and an unbroken heading outline structure it, native form controls with matched <label for>/id, correct type, and native validation handle input, and <fieldset>/<legend> + aria-describedby/aria-invalid/aria-live handle grouping and errors. Do this and accessibility, keyboard support, and SEO come essentially for free.

Backend Analogy

Semantic HTML is like using a strongly-typed domain model instead of passing everything around as a `Map<String, Object>`. A `<button>` or `<input type="email">` is a typed field with built-in validation and behavior, the way a `LocalDate` or a `@Email`-annotated field carries meaning and constraints; a `<div>` is the untyped blob you have to validate and interpret by hand. Native form validation (`required`, `pattern`, `min`) is your Bean Validation / JSR-380 annotations enforced at the edge before anything reaches your logic, and the Constraint Validation API is the programmatic validator you drop to only for custom rules. `<label for>`/`id` and `aria-*` are the metadata/contract (like OpenAPI annotations) that let other consumers — screen readers, crawlers — understand your fields without reading the implementation.

Key Insights
  • Choose elements by meaning, not appearance - CSS controls looks. A <button> beats a clickable <div> because focus, keyboard, and the 'button' role come for free.
  • Landmarks (header, nav, main, aside, footer) let screen-reader users jump between regions; use exactly one <main> per page.
  • Headings h1-h6 form a document outline - never skip a level for visual size, style with CSS instead.
  • Every input needs a programmatic label: <label for="x"> matched to <input id="x">, or an implicit wrapping label; aria-label only when no visible label is possible.
  • The input type attribute drives built-in validation and the correct mobile keyboard - always pick the most specific type.
  • Prefer native validation (required, type, min/max, pattern, minlength) before JavaScript; drop to the Constraint Validation API only for custom rules.
  • Group related radios/checkboxes in a <fieldset> with a <legend> so the group's question is announced.
  • Accessible errors: aria-describedby links the message, aria-invalid marks the field, aria-live/role=alert announces it, and focus moves to the first error.
  • Set the html lang attribute, meta charset UTF-8, and the responsive viewport meta - the baseline every page needs.
  • Never convey errors or state with color alone; pair color with text and icons for colorblind users.

Worked Code

Semantic page skeleton with landmarks & heading outline
HTML
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <title>Expense Dashboard — Acme</title>
</head>
<body>
  <header>
    <nav aria-label="Primary">
      <a href="/">Home</a> <a href="/reports">Reports</a>
    </nav>
  </header>

  <main>                     <!-- exactly ONE main per page -->
    <h1>My Expenses</h1>     <!-- the single page-level heading -->
    <section aria-labelledby="recent-h">
      <h2 id="recent-h">Recent</h2>   <!-- h2 under the h1 -->
      <article>
        <h3>Team lunch</h3>            <!-- h3 under the h2, no skipping -->
        <p><time datetime="2026-06-30">Jun 30</time> — $84.20</p>
      </article>
    </section>
  </main>

  <aside aria-label="Tips"><p>Log receipts within 48h.</p></aside>
  <footer><small>&copy; 2026 Acme Inc.</small></footer>
</body>
</html>

Accessible form: labels, input types, grouping & native validation
HTML
<form novalidate>            <!-- novalidate to control messaging in JS; still validate! -->
  <!-- explicit label: for MUST match the input id -->
  <label for="email">Work email</label>
  <input id="email" name="email" type="email"
         autocomplete="email" required
         aria-describedby="email-err" />
  <span id="email-err" role="alert"></span>

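  <label for="amount">Amount (USD)</label>
  <input id="amount" name="amount" type="number"
         min="0" step="0.01" inputmode="decimal" required
         aria-describedby="amount-err" />
  <span id="amount-err" role="alert"></span>   <!-- each field gets its own error slot -->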

  <!-- group related radios so the legend is announced as the question -->
  <fieldset>
    <legend>Reimbursement speed</legend>
    <label><input type="radio" name="speed" value="standard" checked /> Standard</label>
    <label><input type="radio" name="speed" value="express" /> Express</label>
  </fieldset>

  <!-- type=submit is the default inside a form; be explicit for clarity -->
  <button type="submit">Submit expense</button>
</form>

Progressive enhancement with the Constraint Validation API
TypeScript
const form = document.querySelector<HTMLFormElement>("form")!;
const email = document.querySelector<HTMLInputElement>("#email")!;

// Look up the error element a field points at via aria-describedby (if any).
const errorElFor = (field: HTMLElement): HTMLElement | null => {
  const id = field.getAttribute("aria-describedby");
  return id ? document.getElementById(id) : null;
};

// Custom rule layered ON TOP of native validation.
email.addEventListener("input", () => {
  // clear any custom error so the field can pass again
  email.setCustomValidity("");
  if (email.value.endsWith("@example.com")) {
    email.setCustomValidity("Personal example.com addresses aren't allowed.");
  }
});

form.addEventListener("submit", (e) => {
  e.preventDefault();                   // JS-handled submit: stop the full-page reload

  // reset state left over from a previous failed attempt
  form.querySelectorAll<HTMLElement>("[aria-invalid]").forEach((field) => {
    field.removeAttribute("aria-invalid");
    const staleErr = errorElFor(field);
    if (staleErr) staleErr.textContent = "";
  });

  // checkValidity() runs BOTH native constraints and our custom one
  if (!form.checkValidity()) {
    const firstInvalid = form.querySelector<HTMLInputElement>("input:invalid");
    if (firstInvalid) {
      firstInvalid.setAttribute("aria-invalid", "true");
      // write the message next to the field that failed, not a fixed element
      const errEl = errorElFor(firstInvalid);
      if (errEl) errEl.textContent = firstInvalid.validationMessage; // announced via role=alert
      firstInvalid.focus();             // move focus to the first error
    }
    return;
  }
  console.log("valid -> would submit", new FormData(form).get("email"));
});

Styling validity states with pseudo-classes
CSS
/* Native validity pseudo-classes — no JS needed for the visuals */
input:required        { border-left: 3px solid #6366f1; } /* mark required */
input:focus-visible   { outline: 2px solid #4f46e5; outline-offset: 2px; }

/* :user-invalid only styles AFTER the user has interacted — avoids
   yelling 'invalid' at an empty field the moment the page loads. */
input:user-invalid    { border-color: #dc2626; }
input:user-invalid + [role="alert"] { color: #dc2626; }

/* Never rely on color alone: pair it with an icon/marker for colorblind users */
input:user-invalid { background-image: url("data:image/svg+xml,%3Csvg/%3E"); }

Interview-Ready Q&A

Semantic elements describe the role of content, so browsers, assistive tech, and crawlers understand the page for free. Screen-reader users navigate by landmarks like main and nav; a logical heading outline gives them structure; search engines rank content they can understand. A page of nested divs conveys no meaning, so you'd have to reimplement focus, keyboard, and roles by hand with ARIA — badly. Semantics is the cheapest accessibility and SEO you can get.

Things to Remember
  1. Pick elements by meaning, not looks; CSS controls appearance. <button> beats a clickable <div>.
  2. Landmarks: header, nav, main (exactly one per page), aside, footer - screen readers navigate by them.
  3. Headings h1-h6 are an outline; never skip levels for visual size, style with CSS.
  4. label[for] must match input[id]; clicking the label then focuses the input.
  5. input type drives native validation AND the right mobile keyboard - pick the most specific.
  6. Prefer native validation (required, type, min/max, pattern) before JS; use the Constraint Validation API only for custom rules.
  7. Group related radios/checkboxes in <fieldset> with a <legend>.
  8. Accessible errors: aria-describedby + aria-invalid + aria-live/role=alert + focus the first error.
  9. Baseline head: lang on <html>, <meta charset="UTF-8">, responsive viewport meta, descriptive <title>.
  10. Never convey state with color alone; every content <img> needs alt (empty alt="" if decorative).
