HTTP Common Sense#
Host header#
Every HTTP/1.1 request must have a `Host` header. This is mandatory.
The `Host` header is just the domain name or IP of the web server plus the port; it is "extracted" from the URL.
- For example, when accessing `https://example.com:8080`, the request has the header `Host: example.com:8080`.
- A web server can use the `Host` header to determine which website to serve. An example is a Content Delivery Network (CDN).
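A minimal sketch of both ideas, in Python: deriving the `Host` value from a URL, and using it to pick a site. The `SITES` table and the function names are hypothetical, and the sketch assumes plain hostnames (IPv6 literals are not handled).

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def host_header(url: str) -> str:
    """Return the Host header value a client would typically send for this URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Clients normally leave the port out when it is the default for the scheme.
    if parts.port is None or parts.port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{parts.port}"

# Hypothetical virtual-host table: one server process, several websites.
SITES = {
    "example.com:8080": "main site",
    "blog.example.com": "blog",
}

def pick_site(host: str) -> str:
    """Choose which website to serve based on the Host header."""
    return SITES.get(host.lower(), "default site")
```

Here `host_header("https://example.com:8080")` gives `example.com:8080`, which `pick_site` then maps to the main site.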
Cookies#
Cookies come in many shapes and forms. A cookie could be a random string, like `1f4773fb4bba097d36502c797b0cfef6`, or a meaningful string, like a JWT.
- A session cookie usually takes the form of a random string. The server uses it to look up information about your session, called session variables, which is stored on the server side. You have basically no control over what is stored behind a session cookie. More on this at Session Puzzling.
- A cookie like a JWT stores your session information right in the cookie string. The cookie is a string that we can read and could potentially modify. Most cookies of this type are protected by a cryptographic algorithm like `HMAC-SHA256` to prevent the user from modifying them. The server holds the key and uses it to sign the cookie.
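To make the signing idea concrete, here is a simplified sketch of an HMAC-SHA256 signed cookie. It is not a real JWT (there is no header segment and no expiry), and `SECRET_KEY`, `sign_cookie` and `verify_cookie` are names made up for illustration.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET_KEY = b"server-side-secret"  # hypothetical key, known only to the server

def b64url(data: bytes) -> str:
    """URL-safe base64 without padding, as JWT-style tokens use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def _signature(body: str) -> str:
    digest = hmac.new(SECRET_KEY, body.encode("utf-8"), hashlib.sha256).digest()
    return b64url(digest)

def sign_cookie(payload: dict) -> str:
    """Encode the payload and append its HMAC-SHA256 signature."""
    body = b64url(json.dumps(payload, sort_keys=True).encode("utf-8"))
    return f"{body}.{_signature(body)}"

def verify_cookie(cookie: str) -> Optional[dict]:
    """Return the payload if the signature matches, otherwise None."""
    body, sep, sig = cookie.rpartition(".")
    if not sep:
        return None
    # Constant-time comparison, so the check does not leak timing information.
    if not hmac.compare_digest(sig.encode("utf-8"), _signature(body).encode("utf-8")):
        return None
    padded = body + "=" * (-len(body) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Anyone can decode the payload, but changing it without the key makes the signature check fail, so `verify_cookie` returns `None`.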
CDNs#
CDNs are just web caches deployed all over the world to be closer to users.
- When a user accesses a page, like `/blog/1`, the CDN fetches it from the real web server, then stores it for a while.
- Other users who go to `/blog/1` will be served the stored response from the CDN, so the real web server doesn't have to work as hard.
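The fetch-then-store behaviour can be sketched as a small time-limited cache. This is only an illustration: `EdgeCache`, `fetch_origin` and the 60-second lifetime are assumptions, and real CDNs also honour cache headers, vary by `Host`, and much more.

```python
import time
from typing import Callable, Dict, Tuple

class EdgeCache:
    """Toy CDN node: serve stored responses until they expire."""

    def __init__(self, fetch_origin: Callable[[str], str],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch_origin = fetch_origin  # asks the real web server
        self.ttl = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[float, str]] = {}  # path -> (expiry, body)

    def get(self, path: str) -> str:
        now = self.clock()
        cached = self.store.get(path)
        if cached is not None and cached[0] > now:
            return cached[1]  # cache hit: the origin is not contacted
        body = self.fetch_origin(path)  # cache miss: go to the origin
        self.store[path] = (now + self.ttl, body)
        return body
```

With this sketch, the first `get("/blog/1")` calls the origin, and later calls within the lifetime are answered from `store`.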
HTTP is Stateless#
HTTP is a stateless protocol
- This means each request is viewed in isolation, with no connection to the previous request
- To work around that, web servers use cookies (see Cookies) to "remember" context, making HTTP pseudo-stateful
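A rough sketch of how a server can "remember" a client across isolated requests using a session cookie. The `session_id` cookie name, the in-memory `SESSIONS` dictionary and `handle_request` are hypothetical; a real server would also expire sessions and set cookie attributes.

```python
import secrets
from typing import Dict, Tuple

# Server-side session variables, keyed by the random session ID.
SESSIONS: Dict[str, dict] = {}

def handle_request(cookies: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Handle one request; return (response body, cookies to set)."""
    set_cookies: Dict[str, str] = {}
    sid = cookies.get("session_id")
    if sid not in SESSIONS:
        # Unknown or missing cookie: start a new session with a random ID.
        sid = secrets.token_hex(16)
        SESSIONS[sid] = {"visits": 0}
        set_cookies["session_id"] = sid
    session = SESSIONS[sid]
    session["visits"] += 1
    return f"visit number {session['visits']}", set_cookies
```

Each call is independent, yet a client that sends back the cookie it was given sees its visit count grow, while a client without the cookie starts from one again.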
TCP Socket Reuse#
In addition, HTTP/1.1 can reuse a TCP socket to send multiple requests.
- When a TCP socket is reused, many TCP packets are sent over the same socket: fewer handshakes, less overhead, more data
- However, TCP is stream-oriented, so those packets are combined into a single byte stream
- HTTP, at the application layer, only receives the raw data from the TCP stream, so the web server needs a way to separate each HTTP request
- Web servers use `Content-Length` or `Transfer-Encoding` to know the length of each request's body
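The separation step can be sketched as follows. This minimal example only understands `Content-Length`; `Transfer-Encoding: chunked` is deliberately left out, and `split_requests` is a name chosen for illustration.

```python
from typing import List

def split_requests(stream: bytes) -> List[bytes]:
    """Split a raw byte stream into individual HTTP/1.1 requests."""
    requests = []
    pos = 0
    while pos < len(stream):
        head_end = stream.find(b"\r\n\r\n", pos)
        if head_end == -1:
            break  # headers incomplete; a real server would wait for more data
        head = stream[pos:head_end].decode("iso-8859-1")
        body_len = 0  # no Content-Length means no body in this sketch
        for line in head.split("\r\n")[1:]:  # skip the request line
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                body_len = int(value.strip())
        body_end = head_end + 4 + body_len
        if body_end > len(stream):
            break  # body incomplete
        requests.append(stream[pos:body_end])
        pos = body_end  # the next request starts right after this body
    return requests
```

Note that the only thing telling the parser where one request ends and the next begins is the declared body length, which is why a server and an intermediary that disagree on that length can be tricked.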
HTTP 1.1 vs HTTP 2#
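HTTP/1.1 is a text-based protocol, while HTTP/2 is binary. HTTP/2 is more efficient.
- HTTP/2 has a built-in mechanism to specify the length of the request's body: messages are sent as frames, and each frame carries its own length field.
- When HTTP/2 is used end to end, this framing effectively prevents classic request smuggling attacks, because there is no ambiguity about where a request ends.
- In some deployments, however, HTTP/2 requests are rewritten to HTTP/1.1 by an intermediary system before being forwarded to the web server. The rewritten request relies on `Content-Length` or `Transfer-Encoding` again, so smuggling can become possible once more.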
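As an illustration of the built-in length mechanism, every HTTP/2 frame starts with a 9-byte header: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. The sketch below parses that header; the function name is an assumption.

```python
from typing import Tuple

FRAME_HEADER_SIZE = 9

def parse_frame_header(header: bytes) -> Tuple[int, int, int, int]:
    """Return (payload length, frame type, flags, stream id) of an HTTP/2 frame."""
    if len(header) != FRAME_HEADER_SIZE:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")  # 24-bit payload length
    frame_type = header[3]                       # e.g. 0x0 is a DATA frame
    flags = header[4]                            # e.g. 0x1 is END_STREAM on DATA
    # The top bit is reserved, so mask it off to get the 31-bit stream id.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, frame_type, flags, stream_id
```

Because the receiver reads the length from the frame itself, it does not have to infer message boundaries from headers in the way the HTTP/1.1 parser does.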