D.2 Hypertext Transfer Protocol

As discussed in Chapter 1, HTTP is the standard that allows documents to be communicated and shared over the Web. From a network perspective, HTTP is an application-layer protocol that is built on top of TCP/IP. Since the original version, HTTP/0.9, there have only been two revisions of the HTTP standard. HTTP/1.0 was released as RFC-1945[1] in May 1996 and HTTP/1.1 as RFC-2616 in June 1999.

[1] Requests for Comments (RFCs) are submitted to the RFC editor (http://www.rfc-editor.org), usually by authors attached to organizations such as the Internet Engineering Task Force (IETF at http://www.ietf.org). RFCs date back to the early ARPAnet days and are used to present networking protocols, procedures, programs, and concepts. They also include meeting notes, opinions, bad poems, and other humor: RFC-2324 describes the Hypertext Coffee Pot Control Protocol.

In Chapter 1, we told you that HTTP is very simple: a client—most conspicuously a web browser—sends a request for some resource to a web (HTTP) server, and the server sends back a response. The HTTP response carries the resource—the HTML document or image or whatever—as its payload back to the client.

Continuing our analogy from the previous section, HTTP is a kind of cover letter—like a fax cover sheet—that is stored in an envelope and tells the receiver what language the document is in, instructions on how to read the letter, and how to reply.

D.2.1 Uniform Resource Locators

Uniform resource locators, more commonly known as URLs, are the primary naming and addressing method of the Web. URLs belong to the larger class of uniform resource identifiers (URIs); both identify resources, but URLs include specific host details that allow connection to a server that holds the resource.

A URL can be broken into three basic parts: first, the protocol identifier; second, the host and service identifier; and, last, a resource identifier that contains a path with optional parameters and an optional query that identifies the resource. The following example shows a URL that identifies an HTTP resource:

http://host_domain_name:8080/absolute_path?query

The HTTP standard doesn't place any limit on the length of a URL, but some older browsers and proxy servers do. The structure of a URL is formally described by RFC-2396: Uniform Resource Identifiers (URI): Generic Syntax.
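As a rough illustration of these parts, the following sketch uses PHP's parse_url( ) function to split the example URL into its components. The variable names are ours, and the labels simply follow the terms used in this section:

```php
<?php
// Split the example URL from this section into its components.
$url = 'http://host_domain_name:8080/absolute_path?query';
$parts = parse_url($url);

printf("protocol: %s\n", $parts['scheme']);  // http
printf("host:     %s\n", $parts['host']);    // host_domain_name
printf("port:     %d\n", $parts['port']);    // 8080
printf("path:     %s\n", $parts['path']);    // /absolute_path
printf("query:    %s\n", $parts['query']);   // query
```

Note that parse_url( ) only splits the string; it doesn't check that the host exists or that the URL is valid.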

D.2.1.1 Protocol

The first part of the URL identifies the application protocol. HTTP URLs start with the familiar http://. Other applications that use URLs to locate resources identify different protocols; for example, URLs used with the File Transfer Protocol (FTP) begin with ftp://. URLs that identify HTTP resources served over connections that are encrypted using the Secure Sockets Layer start with https://. We discuss the use of the Secure Sockets Layer to protect data transmitted over the Internet in Chapter 11.

D.2.1.2 Host and service identification

The next part of the HTTP URL identifies the host on which the web server is running, and the port on which the server listens for HTTP requests. The domain name or the IP address can identify the host component. Using the domain name allows user-friendly web addresses such as:

http://www.w3.org/Protocols/

The equivalent URL using the IP address is:

http://18.29.1.35/Protocols/

Domain names are not case sensitive.

D.2.1.3 Nonstandard TCP ports

By default, a HTTP server listens for requests on port 80. So, for example, requests for the URL http://www.oreilly.com are made to the host machine www.oreilly.com on port 80. When a nonstandard port is used, the URL must include the port number so the browser can successfully connect to the service. For example, the URL http://example.com:8080 connects to the web server running on port 8080 on the host example.com.

D.2.1.4 Resource identification

The remaining URL components help locate a specific resource. The path, with optional parameters, and an optional query are processed by the web server to locate or compute a response.

The path often corresponds to an actual file path on the host's filesystem. For example, an Apache web server running on a Unix machine that hosts example.com may store all the web content under the directory /usr/local/apache2/htdocs and be configured to use the path component of the URL relative to that directory. In this case, the HTTP response to the URL http://example.com/marketing/home.html contains the file /usr/local/apache2/htdocs/marketing/home.html.

In contrast to domain names, the resource identification component is usually case sensitive. This is because it usually refers to a directory or file on the web server, and the filesystems of Unix servers (which host the majority of web sites) are case sensitive.

D.2.1.5 Parameters and queries

The resource identifier of a URL can include parameters and a query that are used by the web server. A common example is a query that supplies search terms to a search script. The following example shows the string q=red as a query that the script search.php can use:

http://example.com/search.php?q=red

Multiple query terms can be encoded using the & character as a separator:

http://example.com/search.php?q=red&r=victoria

Parameters allow other information not related to a query to be encoded. For example, consider the parameter lines=10 in the URL:

http://example.com/search.php;lines=10?q=red

This can be used by the search.php script to modify the number of lines to display in a result screen.

HTTP provides the distinction between parameters and queries, but parameters are more complex than described here and are not commonly used in practice. We discussed how PHP can use query variables encoded into URLs in Chapter 6.
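The following sketch shows one way a script might pull the query terms and the parameter out of the example URLs by hand, using the parse_url( ) and parse_str( ) functions. The variable names are our own, and in a real web script PHP already decodes the query for you:

```php
<?php
// Extract the query from one of the example URLs and decode its terms.
$query = parse_url('http://example.com/search.php?q=red&r=victoria', PHP_URL_QUERY);
parse_str($query, $terms);
// $terms is now ['q' => 'red', 'r' => 'victoria']

// PHP treats a parameter as part of the path, so split it off by hand.
$path = parse_url('http://example.com/search.php;lines=10?q=red', PHP_URL_PATH);
[$script, $parameter] = explode(';', $path, 2);

echo "$script takes the parameter $parameter\n";
// prints "/search.php takes the parameter lines=10"
```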

D.2.1.6 Fragment identifiers

A URL can include a fragment identifier that is interpreted by the client once a requested resource has been received. A fragment identifier is included at the end of a URL separated from the path by the # character. The meaning of the fragment identifier depends on the type of the resource. For example, the following URL includes the fragment identifier tannin for a HTML document:

http://example.com/documents/glossary.html#tannin

When a web browser receives the HTML resource, it then positions the rendered document in the display to start at the anchor element <a name="tannin"> if the named anchor exists.

D.2.1.7 Absolute and relative URLs

The URL general syntax allows a resource to be specified as an absolute or a relative URL. Absolute URLs identify the protocol http://, the host, and the path of the resource, and can be used alone to locate a resource. Here's an example absolute URL:

http://example.com/documents/glossary.html

Relative URLs don't contain all the components and are always considered with respect to a base URL. A relative URL is resolved to an absolute URL, with respect to the base URL. Typically, a relative URL contains the path components of a resource and allows related sets of resources to reference each other in a relative way. This allows path hierarchies to be readily changed without the need to change every URL embedded in a set of documents.

A web browser has two ways to set base URLs when resolving relative URLs. The first method allows a base URL to be encoded into the HTML using the <base> element. The second method sets the base URL to that of the current document; this is done in the absence of a <base> element. For example, the following HTML document contains three relative URLs embedded into <a> elements:

  <p>Read my <a href="cv.html">Curriculum Vitae</a>

  <p>Read my <a href="work/emp.html">employment history</a>

  <p>Visit <a href="/admin/fred.html">Fred's home page</a>

Consider what happens if the page that contains the example is requested with the following URL:

http://example.com/development/dave/home.html

The three relative URLs are resolved to the following absolute URLs by the browser:

http://example.com/development/dave/cv.html

http://example.com/development/dave/work/emp.html

http://example.com/admin/fred.html

Table D-1 shows several relative URLs and how they are resolved to the corresponding absolute URLs given the base URL http://example.com/a/b/c.html?foo=bar.

Table D-1. Example relative URLs resolved to absolute URLs

Relative URL    Absolute URL with respect to http://example.com/a/b/c.html?foo=bar
d.html          http://example.com/a/b/d.html
e/d.html        http://example.com/a/b/e/d.html
/d.html         http://example.com/d.html
../d.html       http://example.com/a/d.html
#xyz            http://example.com/a/b/c.html?foo=bar#xyz
./              http://example.com/a/b/
../             http://example.com/a/
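PHP has no built-in function for resolving relative URLs, so the following is only a simplified sketch of the process, written to handle relative URLs like those in Table D-1. The function name resolveUrl( ) is hypothetical, and the sketch ignores several cases the URL syntax allows, such as an empty relative URL or a relative URL that carries its own query:

```php
<?php
// Hypothetical helper: resolve a relative URL against an absolute base URL.
// Simplified sketch; it ignores empty references and relative URLs with queries.
function resolveUrl(string $base, string $relative): string
{
    // A reference that starts with a protocol identifier is already absolute.
    if (preg_match('/^[a-z][a-z0-9+.\-]*:/i', $relative)) {
        return $relative;
    }

    $b = parse_url($base);
    $root = $b['scheme'] . '://' . $b['host']
          . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';

    // A fragment-only reference keeps the base path and query.
    if ($relative !== '' && $relative[0] === '#') {
        $query = isset($b['query']) ? '?' . $b['query'] : '';
        return $root . $basePath . $query . $relative;
    }

    if ($relative !== '' && $relative[0] === '/') {
        // A leading slash replaces the whole base path.
        $path = $relative;
    } else {
        // Otherwise append to the directory part of the base path.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $relative;
    }

    // Remove "." segments and let each ".." cancel the segment before it.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    return $root . implode('/', $out);
}

$base = 'http://example.com/a/b/c.html?foo=bar';
foreach (['d.html', 'e/d.html', '/d.html', '../d.html', '#xyz', '../'] as $relative) {
    printf("%-10s %s\n", $relative, resolveUrl($base, $relative));
}
```

The query of the base URL is dropped in every case except the fragment-only one, because a relative path names a different resource.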


D.2.1.8 URL encoding

The characters used in resource names, query strings, and parameters must not conflict with the characters that have special meanings or aren't allowed in a URL. For example, a question mark character identifies the beginning of a query, and an ampersand (&) character separates multiple terms in a query.

The meanings of these characters can be escaped using a hexadecimal encoding consisting of the percent character (%) followed by the two hexadecimal digits representing the ASCII code of the character. For example, an ampersand (&) character is encoded as %26.

The characters that need to be escape-encoded are the control, space, and reserved characters:

; / ? : @ & = + $ ,

Delimiter characters must also be encoded:

< > # % "

The following characters can cause problems with gateways and network agents, and should also be encoded:

{ } | \ ^ [ ] `

PHP provides the rawurlencode( ) function to encode special characters. For example, rawurlencode( ) can build the href attribute of an embedded link:

echo '<a href="search.php?q=' . rawurlencode("100% + more") . '">';

The result is an <a> element with an embedded URL correctly encoded:

<a href="search.php?q=100%25%20%2B%20more">
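PHP also provides the urlencode( ) function, which differs from rawurlencode( ) mainly in that it encodes a space as a plus sign, the convention used for HTML form data. The following sketch compares the two on the same string and uses http_build_query( ) to encode a whole set of query terms at once. The variable names and the sample terms are our own:

```php
<?php
$term = '100% + more';

$raw  = rawurlencode($term);  // "100%25%20%2B%20more"
$form = urlencode($term);     // "100%25+%2B+more"

// Each encoding is reversed by its matching decode function.
$sameRaw  = rawurldecode($raw) === $term;   // true
$sameForm = urldecode($form) === $term;     // true

// Encode several query terms in one step, separated by & characters.
$query = http_build_query(['q' => 'red & white', 'r' => 'victoria'], '', '&');
echo "search.php?$query\n";
// prints "search.php?q=red+%26+white&r=victoria"
```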

D.2.2 HTTP Requests

The model used for HTTP requests is to apply methods to identified resources. A HTTP request message contains a method name, a URL to which the method is to be applied, and header fields. Some requests can include a body—for example, the data collected in a form—that is referred to in the HTTP standard as the entity-body.

The following is the example HTTP request we showed you in Chapter 1:

GET /~hugh/index.html HTTP/1.1

Host: goanna.cs.rmit.edu.au

From: hugh@hughwilliams.com (Hugh Williams)

User-agent: Hugh-fake-browser/version-1.0

Accept: text/plain, text/html

The request applies the GET method to the /~hugh/index.html resource. The action is to retrieve the HTML document stored in the file index.html.

The first line of the message is the request line, which contains the method name GET, the request URL /~hugh/index.html, and the HTTP version HTTP/1.1, each separated by a space character. The request line is followed by a list of header fields. Each field is represented as a name and value pair separated with a colon character, and each field is on a separate line.

The header fields are followed by a blank line and then by the optional body of the message. A POST method request usually contains a body of text, as we discuss in the next section.
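To make the layout of a request message concrete, the following sketch assembles one as a string. The buildRequest( ) function is a hypothetical helper of ours, not part of PHP. It ends each line with a carriage return and line feed, as HTTP requires, and it doesn't send the request anywhere:

```php
<?php
// Hypothetical helper: assemble the text of an HTTP/1.1 request message.
function buildRequest(string $method, string $url, array $headers, string $body = ''): string
{
    // Request line: method, request URL, and HTTP version, separated by spaces.
    $lines = ["$method $url HTTP/1.1"];

    // One header field per line, as a name and value separated by a colon.
    foreach ($headers as $name => $value) {
        $lines[] = "$name: $value";
    }

    // A blank line ends the header fields; the optional body follows it.
    return implode("\r\n", $lines) . "\r\n\r\n" . $body;
}

$request = buildRequest('GET', '/~hugh/index.html', [
    'Host'   => 'goanna.cs.rmit.edu.au',
    'Accept' => 'text/plain, text/html',
]);
echo $request;
```

For a POST request, the form data would be passed as the fourth argument and would appear after the blank line.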

D.2.2.1 Request methods

HTTP/1.1 defines eight request methods. The six described here are the most relevant to web applications, and only the first three are common in practice:


GET

Retrieves a resource. A query can be used to add extra information to the GET request and, as we discussed in our introduction to URLs, it is appended to the URL itself. A database search is a good example of an application of the GET request: the resource is likely to be a web script, and the query component of the URL carries the search conditions.


POST

Sends data to a server. Rather than appending data to the URL, the data is sent in the body of the HTTP request.


HEAD

Requests only the header fields as a response, not the resource itself. This can be used for lightweight retrieval, so that the modification date of a resource can be checked before the full resource is retrieved with GET.


DELETE

Allows a resource identified by the URL to be deleted from a server. This is the counterpart to the PUT method discussed next and it allows an author to remove a resource from the specified URL. It's usually not implemented by web servers.


PUT

Similar to the POST method, this method is designed to put a resource onto a server. Some HTML editors and web servers support the PUT method, allowing authors to put resources onto a web site at the specified URL. However, it's usually not implemented by web servers.


TRACE

Produces diagnostic information by asking the server to send the request it received back to the client.

The HTTP standard divides these methods into those that are safe and those that aren't. The safe methods—GET and HEAD—don't have any persistent side effects on the server. The unsafe methods—POST, PUT, and DELETE—are designed to have persistent effects on the server. The standard allows for clients to warn users that a request may be unsafe and, for example, most browsers won't resend a request with the POST method without user confirmation.

The HTTP standard further classifies methods as idempotent when a request can be repeated many times and have the same effect as if the method was called once. The GET, HEAD, PUT, and DELETE methods are classified as idempotent. The POST method isn't.
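The two classifications can be written down as simple lookups. The following sketch mirrors only the methods named in this section; the constant and function names are our own:

```php
<?php
// Classification of the request methods as described in this section.
const SAFE_METHODS       = ['GET', 'HEAD'];
const IDEMPOTENT_METHODS = ['GET', 'HEAD', 'PUT', 'DELETE'];

// True if the method has no persistent side effects on the server.
function isSafe(string $method): bool
{
    return in_array(strtoupper($method), SAFE_METHODS, true);
}

// True if repeating the request has the same effect as making it once.
function isIdempotent(string $method): bool
{
    return in_array(strtoupper($method), IDEMPOTENT_METHODS, true);
}

foreach (['GET', 'POST', 'PUT'] as $method) {
    printf("%-5s safe: %-3s idempotent: %s\n",
        $method,
        isSafe($method) ? 'yes' : 'no',
        isIdempotent($method) ? 'yes' : 'no');
}
```

A client could use a test like isIdempotent( ) to decide whether a request can be repeated without asking the user first.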

D.2.2.2 GET versus POST

Both the GET and POST methods send data to the server, but which method should you use?

The HTTP standard includes the two methods to achieve different goals. The POST method was intended to create a resource. The contents of the resource would be encoded into the body of the HTTP request. For example, an order form might be processed and a new row in a database created.

The GET method is used when a request has no side effects (such as performing a search) and the POST method is used when a request has side effects (such as adding a new row to a database). A more practical issue is that the GET method may result in long URLs, and may even exceed some browser and server limits on URL length.

Use the POST method if any of the following are true:

  • The result of the request has persistent side effects such as adding a new database row.

  • The data collected on the form is likely to result in a long URL if you used the GET method.

  • The data to be sent is in any encoding other than seven-bit ASCII.

Use the GET method if all the following are true:

  • The request is to find a resource, and HTML form data is used to help that search.

  • The result of the request has no persistent side effects.

  • The data collected and the input field names in a HTML form together total fewer than 1,024 characters.
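The guidelines above can be sketched as a small decision function. This is only an illustration of the rules as listed: chooseFormMethod( ) is a hypothetical name, the caller must say whether the request has side effects, and the search-purpose rule is assumed to hold whenever there are none:

```php
<?php
// Hypothetical helper: pick a form method using the guidelines in this section.
// $fields maps input field names to the data collected for each field.
function chooseFormMethod(array $fields, bool $hasSideEffects): string
{
    // Requests with persistent side effects should always use POST.
    if ($hasSideEffects) {
        return 'POST';
    }

    $length = 0;
    foreach ($fields as $name => $value) {
        $text = (string) $name . (string) $value;

        // Any byte outside seven-bit ASCII rules out GET.
        if (preg_match('/[^\x00-\x7F]/', $text)) {
            return 'POST';
        }
        $length += strlen($text);
    }

    // GET is suitable only while names and data stay under 1,024 characters.
    return $length < 1024 ? 'GET' : 'POST';
}

echo chooseFormMethod(['q' => 'red', 'r' => 'victoria'], false), "\n";  // GET
echo chooseFormMethod(['surname' => 'Williams'], true), "\n";          // POST
```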

D.2.3 HTTP Responses

When a web server processes a request from a browser, it attempts to apply the method to the identified resource and create a response. The action of the request may succeed or fail, but the web server always sends a response message back to the browser.

A HTTP response message contains a status line, header fields, and (usually) the requested entity as the body of the message. For example, the following is the result of a GET method request for a small HTML file:

HTTP/1.1 200 OK

Date: Sun, 19 Dec 2004 02:54:37 GMT

Server: Apache/2.0.48

Last-Modified: Fri, 19 Dec 2003 02:53:08 GMT

ETag: "4445f-bf-39f4f994"

Content-Length: 321

Accept-Ranges: bytes

Connection: close

Content-Type: text/html

 

<!DOCTYPE HTML PUBLIC 

   "-//W3C//DTD HTML 4.0 Transitional//EN"

   "http://www.w3.org/TR/html4/loose.dtd" >

<html>

<head><title>Grapes and Glass</title></head>

<body>

<img src="http://example.com/grapes.gif">

<p>Welcome to my simple page 

<p><img src="http://example.com/glass.gif">

</body>

</html>

The first line, the status line, begins with the protocol version of the message, followed by a status code and a reason phrase, each separated by a space character. The status code is a number and the reason phrase describes its meaning; these are discussed in the next section. The status line is then followed by the header fields. As with the request, each field is represented as a name and value pair separated with a colon character. A blank line separates the header fields from the body of the response, in this case an HTML document.
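The structure of a response can be illustrated by taking one apart. The following sketch assumes the whole message is already held in a string with lines ending in a carriage return and line feed. The parseResponse( ) function is a hypothetical helper of ours, and the sample message is a shortened version of the response above:

```php
<?php
// Hypothetical helper: split an HTTP response message into its parts.
function parseResponse(string $message): array
{
    // A blank line separates the header fields from the body.
    [$head, $body] = explode("\r\n\r\n", $message, 2) + [1 => ''];
    $lines = explode("\r\n", $head);

    // Status line: protocol version, status code, and reason phrase.
    [$version, $code, $reason] = explode(' ', array_shift($lines), 3);

    // Header fields: split at the first colon only, because values such
    // as dates contain colons. Field names are not case sensitive.
    $headers = [];
    foreach ($lines as $line) {
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }

    return [
        'version' => $version,
        'code'    => (int) $code,
        'reason'  => $reason,
        'headers' => $headers,
        'body'    => $body,
    ];
}

$response = parseResponse(
    "HTTP/1.1 200 OK\r\n" .
    "Date: Sun, 19 Dec 2004 02:54:37 GMT\r\n" .
    "Content-Type: text/html\r\n" .
    "\r\n" .
    "<p>Welcome to my simple page"
);

echo $response['code'], ' ', $response['reason'], "\n";  // 200 OK
echo $response['headers']['content-type'], "\n";         // text/html
```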

D.2.3.1 Status codes

HTTP status codes are used to classify responses to requests. The HTTP status code system is extensible, with a set of codes described in the standard that are "generally recognized in current practice". HTTP defines a status code as a three-digit number, where the first digit is the class of response. The following list shows the five classes of codes defined by HTTP:


1xx

Informational. HTTP/1.1 uses codes in this class to indicate the request has been received by the server and that processing is continuing.


2xx

Success. The request was successfully received, and the action successfully performed.


3xx

Redirection. When a response has a redirection code, the client needs to make a further request to get the specified resource. The URL of the actual resource is included in the response header field Location. When the status code is 301 (or another redirection code such as 302), the browser automatically makes the request for the URL specified in the Location header field. The use of the Location header field is discussed further in Chapter 6, and used in many examples throughout this book.


4xx

Client error. The request can't be processed because of bad syntax of the message, the sender is unauthorized or forbidden to access the resource, or the resource can't be found.


5xx

Server error. The server failed to fulfill a valid request.
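Because the class is carried by the first digit, a script can classify a status code with integer division. The following is a minimal sketch; statusClass( ) is a hypothetical name and the labels are those used in the list above:

```php
<?php
// Hypothetical helper: map a status code to its class by its first digit.
function statusClass(int $code): string
{
    $classes = [
        1 => 'Informational',
        2 => 'Success',
        3 => 'Redirection',
        4 => 'Client error',
        5 => 'Server error',
    ];

    // intdiv(404, 100) is 4, the class digit of the code.
    return $classes[intdiv($code, 100)] ?? 'Unknown';
}

echo statusClass(200), "\n";  // Success
echo statusClass(301), "\n";  // Redirection
echo statusClass(404), "\n";  // Client error
```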

D.2.4 Caching

Most user agents, such as web browsers, allow HTTP responses to be cached. A response is cached by storing it locally, in memory or on disk, along with the request that produced it. When a browser considers a request, it first looks to its local cache to see if it has an up-to-date copy of the response before sending the request to the web server. This can significantly reduce the number of requests sent to a web server, improving the performance of the web application and responsiveness to users.

Consider a web site that includes a company logo on the top of each HTML page:

<img src="/images/logo.gif">

When the browser requests a page that contains the image, a separate request is sent to retrieve the image /images/logo.gif. If the image resource is cacheable, and browser caching is enabled, the browser saves the response. A subsequent request for the image is recognized, and the local copy from the cache is used rather than sending another request to the web server.

A browser uses a cached response until the response becomes stale, or the cache becomes full and the response is displaced by the resources from other requests. The primary mechanism for determining if a response is stale is comparing the date and time set in the Expires header field with the date and time of the machine running the browser. If the date and time are incorrectly set on the machine, a cached response may expire immediately or be cached longer than intended.
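The staleness test described above comes down to comparing two times. The following sketch shows the comparison on its own. The isStale( ) function is a hypothetical helper, the clock value is passed in so the effect of a wrong clock is easy to see, and real browsers weigh other header fields as well:

```php
<?php
// Hypothetical helper: decide whether a cached response has expired.
// $expires is the value of the Expires header field; $now is the local
// clock as a Unix timestamp.
function isStale(string $expires, int $now): bool
{
    $expiryTime = strtotime($expires);

    // Treat a date that can't be parsed as already expired.
    return $expiryTime === false || $expiryTime <= $now;
}

$expires = 'Sun, 19 Dec 2004 02:54:37 GMT';

// One minute before and one minute after the expiry time.
$before = gmmktime(2, 53, 37, 12, 19, 2004);
$after  = gmmktime(2, 55, 37, 12, 19, 2004);

var_dump(isStale($expires, $before));  // bool(false)
var_dump(isStale($expires, $after));   // bool(true)
```

If the local clock runs ahead of the real time, the second case applies sooner than the server intended.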

HTTP describes the conditions that allow a user agent to cache a response. However, there are many situations in which an application may wish to prevent a page from being cached, particularly when the content of a response is dynamically generated, such as in a web database application.

HTTP/1.1 uses the Cache-Control header field as its basic caching control mechanism. For example, setting the Cache-Control header field to no-cache in a HTTP response prevents a HTTP/1.1 user agent from reusing the cached response without first checking with the server that it is still valid. The header can be used in requests and responses, but we consider only responses here.

Some HTTP/1.1 Cache-Control settings are directed to user agents that maintain caches for more than one user, such as proxy servers. Proxy servers are used to achieve several goals, the most important of which is to provide caching of responses for a group of users. A local network, such as that found in a university department, can be configured to send all HTTP requests to a proxy server. The proxy server forwards requests to the destination web server and passes back the responses to the originating client.

Proxy servers can cache responses and thus reduce requests sent outside the local network. Setting the Cache-Control header field to public allows a user agent to make the cached response available to any request. Setting the Cache-Control header field to private allows a user agent to make the cached response available only to the client who made the initial request.

Setting the Cache-Control header to no-store prevents a user agent from storing the response on disk. This prevents sensitive information from being inadvertently saved beyond the life of a browser session. HTTP/1.1 defines several other Cache-Control directives that are not described here.
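In a PHP script, header fields such as these are sent with the header( ) function before any output is produced. The following is a minimal sketch for a dynamically generated page. The noCacheHeaders( ) function is a hypothetical helper, and the Expires date in the past is an assumption of ours, added for older caches that don't understand Cache-Control:

```php
<?php
// Hypothetical helper: header lines that ask caches not to keep a response.
function noCacheHeaders(): array
{
    return [
        // Settings described in this section, for HTTP/1.1 caches.
        'Cache-Control: no-store, no-cache',
        // A date in the past marks the response as already stale.
        'Expires: Thu, 01 Jan 1970 00:00:00 GMT',
    ];
}

// Header fields must be sent before any part of the response body.
if (!headers_sent()) {
    foreach (noCacheHeaders() as $line) {
        header($line);
    }
}
```

A response that every user may share, such as a logo image, could use Cache-Control: public instead.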
