HTTP Common Sense

Host header

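Every HTTP/1.1 request must include a Host header. This is mandatory: a server is expected to reject an HTTP/1.1 request that lacks one with 400 Bad Request.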

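  • The Host header is just the domain name or IP address of the web server plus the port. It is “extracted” from the URL, and the port is normally left out when it is the default for the scheme (80 for http, 443 for https).
  • For example, when accessing https://example.com:8080, the request has the header Host: example.com:8080.
  • A web server can use the Host header to determine which website to serve, so many sites can share one IP address. An example is a Content Delivery Network (CDN), which serves many different sites from the same servers.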
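The extraction described above can be sketched in Python. This is only an illustration, not how any particular client implements it: the function name `host_header` and the `DEFAULT_PORTS` table are made up for this example, and it assumes the port is dropped when it matches the scheme's default.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only http and https are considered.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header(url: str) -> str:
    """Build the value of the Host header for a given URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals are written in brackets, e.g. [::1]
        host = f"[{host}]"
    port = parts.port
    # Leave the port out when it is absent or is the scheme's default.
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"
```

Calling `host_header("https://example.com:8080")` gives the value from the example above, while a URL without an explicit port yields just the domain name.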

Cookies

Cookies come in many shapes and forms. A cookie could be a random string, like 1f4773fb4bba097d36502c797b0cfef6, or it could be a meaningful string, like a JWT.

  • A session cookie usually takes the form of a random string (a session ID). The server uses it to look up information about your session, called session variables, which is stored on the server side. You have basically no direct control over that data, because the cookie holds only the identifier. More on this at Session Puzzling.
  • A cookie like a JWT stores your session information right in the cookie string. The cookie is an encoded string that we can read, and could potentially modify. Most cookies of this type are protected by a signature or MAC such as HMAC-SHA256 so that any modification by the user is detected. The server holds the key, uses it to sign the cookie, and checks the signature again whenever the cookie comes back.

CDNs

CDNs are essentially web caches deployed all around the world so that they are closer to users.

  • When a user accesses a page, like /blog/1, the CDN fetches it from the real web server (the origin), then stores the response for a while.
  • Other users who go to /blog/1 are served the stored response from the CDN, so the real web server doesn’t have to work as hard.
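The fetch-then-store behaviour above can be sketched as a tiny in-memory cache. Real CDNs honour cache headers, vary cache keys, and much more; the class name `EdgeCache`, the `fetch_origin` callback, and the 60-second default lifetime are assumptions made for this illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy cache keyed by path, with a fixed time-to-live per entry."""

    def __init__(self, fetch_origin: Callable[[str], str], ttl_seconds: float = 60.0):
        self.fetch_origin = fetch_origin  # stands in for the real web server
        self.ttl_seconds = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (expiry, body)

    def get(self, path: str) -> str:
        now = time.monotonic()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin is not contacted
        body = self.fetch_origin(path)  # cache miss or expired entry
        self._store[path] = (now + self.ttl_seconds, body)
        return body
```

With this sketch, the first `get("/blog/1")` calls the origin and later calls within the lifetime are answered from the stored copy.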

HTTP is Stateless

HTTP is a stateless protocol

  • This means each request is viewed in isolation, with no connection to the previous request
  • To work around that, web servers use cookies (see Cookies) to “remember” context between requests, making HTTP behave as if it were stateful
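As a rough sketch of how a cookie lets a server “remember” a client across otherwise unrelated requests, the handler below keeps a visit counter per session ID. The names `SESSIONS` and `handle_request` are hypothetical, and the request is reduced to just the session ID the client sent (or `None` if it sent no cookie).

```python
import secrets
from typing import Dict, Optional, Tuple

# Server-side session store: session ID -> session variables.
SESSIONS: Dict[str, Dict[str, int]] = {}


def handle_request(cookie_sid: Optional[str]) -> Tuple[str, str]:
    """Return (session ID to set as a cookie, response body)."""
    if cookie_sid is None or cookie_sid not in SESSIONS:
        # Unknown client: issue a fresh random session ID.
        cookie_sid = secrets.token_hex(16)
        SESSIONS[cookie_sid] = {"visits": 0}
    session = SESSIONS[cookie_sid]
    session["visits"] += 1
    return cookie_sid, f"visit number {session['visits']}"
```

Each call is independent, just like an HTTP request; the only thing linking two calls together is the session ID the client sends back.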

TCP Socket Reuse

In addition, HTTP/1.1 can reuse a TCP connection to send multiple requests (persistent connections, also known as keep-alive).

  • When a TCP socket is reused, multiple requests are sent over the same socket: fewer handshakes, less overhead, more data
  • However, TCP is stream-oriented, so the packets are combined into one continuous stream of bytes with no message boundaries
  • HTTP, on the application layer, only receives the raw data from the TCP stream, so the web server needs a way to tell where each HTTP request ends and the next one begins
  • Web servers use the Content-Length header or Transfer-Encoding: chunked to know the length of each request’s body
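The splitting problem above can be sketched with a small parser that cuts a raw byte stream into requests using Content-Length only. It is a simplification under stated assumptions: chunked Transfer-Encoding is ignored, headers are assumed to be well formed, and `split_requests` is a name made up for this example.

```python
from typing import List, Tuple


def split_requests(stream: bytes) -> List[Tuple[bytes, bytes]]:
    """Split a TCP byte stream into (head, body) pairs via Content-Length."""
    requests = []
    pos = 0
    while pos < len(stream):
        head_end = stream.find(b"\r\n\r\n", pos)
        if head_end == -1:
            break  # headers not complete yet; wait for more bytes
        head = stream[pos:head_end]
        body_start = head_end + 4  # skip the blank line after the headers
        length = 0  # no Content-Length means no body in this sketch
        for line in head.split(b"\r\n")[1:]:  # skip the request line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        body_end = body_start + length
        if body_end > len(stream):
            break  # body not fully received yet
        requests.append((head, stream[body_start:body_end]))
        pos = body_end  # the next request starts right after this body
    return requests
```

If a front-end and a back-end server disagree about where `body_end` falls, they disagree about where the next request starts, which is the root of request smuggling.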

HTTP 1.1 vs HTTP 2

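HTTP/1.1 is a text-based protocol, while HTTP/2 is binary and frame-based. HTTP/2 is generally more efficient, thanks to features such as multiplexing several requests over one connection and header compression.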

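  • HTTP/2 uses a built-in mechanism to specify the length of the request’s body: every frame carries an explicit length field, and a flag marks the last frame of the message.
  • When HTTP/2 is used end to end, this unambiguous framing leaves little room for the length disagreements that request smuggling attacks rely on.
  • However, in some deployments, HTTP/2 requests are rewritten to HTTP/1.1 by an intermediary system (for example a reverse proxy) before being forwarded to the web server. After this downgrade, Content-Length and Transfer-Encoding matter again, so request smuggling can become possible once more.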
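To illustrate the built-in length mechanism, the sketch below decodes the fixed 9-byte header that precedes every HTTP/2 frame: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a reserved bit followed by a 31-bit stream identifier. The function name `parse_frame_header` and the constant names are chosen for this example.

```python
from typing import Tuple

FRAME_TYPE_DATA = 0x0   # DATA frames carry the request or response body
FLAG_END_STREAM = 0x1   # set on the last frame of a message


def parse_frame_header(header: bytes) -> Tuple[int, int, int, int]:
    """Decode a 9-byte HTTP/2 frame header into (length, type, flags, stream_id)."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")  # 24-bit payload length
    frame_type = header[3]
    flags = header[4]
    # The top bit of the last four bytes is reserved; mask it off.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, frame_type, flags, stream_id
```

Because the receiver reads exactly `length` bytes after each header, there is no need to search the payload for delimiters or to trust a Content-Length header to find the end of the body.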