Multipart Forms and Boundary Parameters

Request Headers

Take a form that accepts a file upload as an example. Submitting it sends an HTTP POST request to some URL, and we would see a header along the lines of:

Content-Type: multipart/form-data; boundary=----MyBoundary

The boundary parameter of the Content-Type header identifies the boundary string that separates the parts of the request body.
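As a minimal sketch of how a receiver might read that value, Python's standard email package can parse the boundary parameter. The header string is the example above; the variable names are illustrative only.

```python
from email.message import Message

# Load the example header into a generic MIME message object.
msg = Message()
msg["Content-Type"] = "multipart/form-data; boundary=----MyBoundary"

boundary = msg.get_boundary()   # value of the boundary parameter
delimiter = "--" + boundary     # the line that separates parts in the body

print(boundary)
print(delimiter)
```

Note that the delimiter line in the body is two hyphens longer than the value carried in the header.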

Request Body

Form Fields and Values

If you inspect a web form, you'll see that it has particular field names and inputs where the user can submit a value. So, the form acts as a series of key:value pairs that are submitted to the web application.

The web application can have hidden form fields, or it can process the request and add fields to the user input once the request is submitted. You can use a proxy such as Burp to inspect the HTTP POST request after submitting the form to understand the application's form logic.

"multipart/form-data" contains a series of parts. Each part is expected to contain a content-disposition header [RFC 2183] where the disposition type is "form-data", and where the disposition contains an (additional) parameter of "name", where the value of that parameter is the original field name in the form.

https://www.ietf.org/rfc/rfc2388.txt

As with all multipart MIME types, each part has an optional  "Content-Type", which defaults to text/plain. If the contents of a file are returned via filling out a form, then the file input is identified as the appropriate media type, if known, or "application/octet-stream".

https://www.ietf.org/rfc/rfc2388.txt

What this means is that every field in the form gets its own part, introduced by a boundary line and carrying a Content-Disposition header that indicates the field name. An optional Content-Type header indicates the type of input that was passed to the form.
The Content-Type header defaults to text/plain if not provided.

------MyBoundary
Content-Disposition: form-data; name="upload_id"

4e1aea6e-abd7-471c-8dd1-b1ea6c3aee8c
------MyBoundary
Content-Disposition: form-data; name="uploadfile"; filename="app.exe"
Content-Type: application/octet-stream

Raw bytes from file...
------MyBoundary--
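A body like this can be assembled by hand. The following is a rough sketch, assuming the boundary value ----MyBoundary from the Content-Type header, so that each delimiter line is two hyphens plus that value. The function name build_multipart and the sample file bytes are hypothetical, and the sketch does not escape quotes inside field names or filenames.

```python
def build_multipart(fields, boundary):
    """Encode (name, filename, content_type, data) tuples as multipart/form-data bytes."""
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    out = []
    for name, filename, content_type, data in fields:
        disposition = f'form-data; name="{name}"'
        if filename is not None:
            disposition += f'; filename="{filename}"'
        out.append(delimiter + crlf)
        out.append(b"Content-Disposition: " + disposition.encode("utf-8") + crlf)
        if content_type is not None:
            out.append(b"Content-Type: " + content_type.encode("ascii") + crlf)
        out.append(crlf)          # empty line ends the part headers
        out.append(data + crlf)   # this CRLF belongs to the next delimiter
    out.append(delimiter + b"--" + crlf)  # closing delimiter
    return b"".join(out)


body = build_multipart(
    [
        ("upload_id", None, None, b"4e1aea6e-abd7-471c-8dd1-b1ea6c3aee8c"),
        ("uploadfile", "app.exe", "application/octet-stream", b"MZ\x90\x00"),
    ],
    boundary="----MyBoundary",
)
print(body.decode("latin-1"))
```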

Boundaries

Purpose of Boundaries

As with other multipart types, a boundary is selected that does not occur in any of the data. Each field of the form is sent, in the order defined by the sending application and form, as a part of the multipart stream.  Each part identifies the INPUT name within the original form. Each part should be labelled with an appropriate content-type if the media type is known (e.g., inferred from the file extension or operating system typing information) or as "application/octet-stream".

https://www.ietf.org/rfc/rfc2388.txt

To summarize:

  • There should be one boundary-delimited part for each field of the web form
  • The parts are ordered based on the order of the fields of the web form
  • Each part identifies the field name from the form
  • The Content-Type header should be included in the part if the media type is known
    • Can be inferred from the file extension
    • Or, send as a byte stream --- application/octet-stream
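The extension-based inference with a byte-stream fallback could look roughly like this in Python, using the standard mimetypes module. The helper name part_content_type and the sample filenames are made up for illustration.

```python
import mimetypes


def part_content_type(filename):
    """Guess a media type from the file extension, falling back to a byte stream."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


print(part_content_type("index.html"))
print(part_content_type("upload.zzunknown"))
```

The guess depends only on the file name, never on the file contents, so a server should not treat it as proof of what was uploaded.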

Defining Boundaries

The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the boundary parameter value from the Content-Type header field.

https://datatracker.ietf.org/doc/html/rfc1341

To summarize:

  • As stated before, each boundary-delimited part identifies a field name in a web form
  • Each boundary line should start with two (2) hyphens --
  • All of the boundary lines should use the boundary value specified in the Content-Type request header


Note that the encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF, and that that initial CRLF is considered to be part of the encapsulation boundary rather than part of the preceding part. The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).

https://datatracker.ietf.org/doc/html/rfc1341

To summarize:

  • We designate the boundary with --BoundaryID followed by a CRLF (carriage return line feed, a.k.a. new line)
  • On the new line just below the boundary ID, we should provide the Content-Disposition header
  • On the line just below that, we can optionally provide a Content-Type header
    • If the Content-Type header is not provided, it is assumed to be Content-Type: text/plain
  • We should provide another CRLF (an empty line) and then pass in the form content
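Seen from the receiving side, the same rules can be sketched as a small parser for a single part, i.e. the bytes that follow one boundary line. The function name parse_part is hypothetical, and the sketch ignores folded (multi-line) headers.

```python
def parse_part(raw):
    """Split the bytes following a boundary line into (headers, content)."""
    if raw.startswith(b"\r\n"):
        # The boundary was followed by two CRLFs: the part has no headers.
        head, content = b"", raw[2:]
    else:
        head, _, content = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        headers[name.strip().lower().decode("ascii")] = value.strip().decode("utf-8")
    # A missing Content-Type is assumed to be text/plain.
    headers.setdefault("content-type", "text/plain")
    return headers, content


raw = b'Content-Disposition: form-data; name="upload_id"\r\n\r\n4e1aea6e-abd7-471c-8dd1-b1ea6c3aee8c'
headers, content = parse_part(raw)
print(headers)
print(content)
```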

Structure of the Web Request

METHOD /resource HTTP/version
Header: Value
Header: Value
Header: Value
Content-Type: multipart/form-data; boundary=EncapsulationBoundary
<<<<< Empty line: with no preamble, the headers are followed by 2 CRLFs, and the second CRLF is part of the first EncapsulationBoundary
--EncapsulationBoundary <<<<< Start of multipart data, first boundary
Boundary Headers

Encapsulated content
--EncapsulationBoundary <<<<< Add as many boundaries as needed for form submission
Boundary Headers

Encapsulated content
--EncapsulationBoundary-- <<<<< Final boundary ending notation
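To make the layout concrete, here is a rough Python sketch that splits a request body back into its parts. It assumes the boundary parameter value is EncapsulationBoundary, so each delimiter line is -- plus that value. The function name split_parts and the sample field names are hypothetical.

```python
def split_parts(body, boundary):
    """Return the raw bytes of each part (headers, empty line, content)."""
    delimiter = b"\r\n--" + boundary.encode("ascii")
    # Prepend a CRLF so the first delimiter matches like all the others.
    chunks = (b"\r\n" + body).split(delimiter)
    parts = []
    for chunk in chunks[1:]:          # chunks[0] is the preamble (usually empty)
        if chunk.startswith(b"--"):   # closing delimiter: no further parts
            break
        # Drop the CRLF that ends the boundary line itself.
        parts.append(chunk[2:] if chunk.startswith(b"\r\n") else chunk)
    return parts


body = (
    b"--EncapsulationBoundary\r\n"
    b'Content-Disposition: form-data; name="a"\r\n'
    b"\r\n"
    b"one\r\n"
    b"--EncapsulationBoundary\r\n"
    b'Content-Disposition: form-data; name="b"\r\n'
    b"\r\n"
    b"two\r\n"
    b"--EncapsulationBoundary--\r\n"
)
parts = split_parts(body, "EncapsulationBoundary")
print(len(parts))
```

Because the CRLF in front of each delimiter is treated as part of the delimiter, the content of each part comes back without a trailing line break.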

The requirement that the encapsulation boundary begins with a CRLF implies that the body of a multipart entity must itself begin with a CRLF before the first encapsulation line -- that is, if the "preamble" area is not used, the entity headers must be followed by TWO CRLFs.

https://datatracker.ietf.org/doc/html/rfc1341

Encapsulation boundaries must not appear within the encapsulations, and must be no longer than 70 characters, not counting the two leading hyphens.

The encapsulation boundary following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line.

To summarize:

  • The last request header line ends with a CRLF (carriage return line feed, a.k.a. new line)
    • With no preamble, a second CRLF (an empty line) follows before the first boundary
  • The CRLF preceding the initial -- is considered to be part of the boundary definition
  • Following --EncapsulationBoundary, we must provide a CRLF
    • 2 CRLF if no headers are provided
  • We must not use --BoundaryName syntax in the encapsulated content
  • Boundary names should be no longer than 70 characters, not counting the leading --
    • Boundary names are limited to a subset of ASCII: letters, digits, and a small set of punctuation that includes the hyphen
    • Thus, you may see boundary names with multiple leading hyphens ------
      • This is compliant, so long as the name is no longer than 70 characters
  • The end of the form data should be denoted by a boundary ending with --
    • --EncapsulationBoundary--
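These constraints can be checked before a request is sent. The sketch below assumes the boundary grammar from RFC 2046 (1 to 70 characters drawn from letters, digits, a few punctuation characters and space, with no trailing space), which is a later revision of the RFC quoted above. The function names and the ----Boundary prefix are illustrative only.

```python
import re
import secrets

# RFC 2046 grammar: 1 to 70 characters, the last of which must not be a space.
_BCHARS_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
_BOUNDARY_RE = re.compile(rf"[{_BCHARS_NOSPACE} ]{{0,69}}[{_BCHARS_NOSPACE}]")


def is_valid_boundary(boundary, payloads=()):
    """Check length and characters, and that no payload contains the delimiter."""
    if not _BOUNDARY_RE.fullmatch(boundary):
        return False
    delimiter = b"--" + boundary.encode("ascii")
    return not any(delimiter in data for data in payloads)


def new_boundary():
    """Generate a random boundary that is very unlikely to occur in the data."""
    return "----Boundary" + secrets.token_hex(16)


print(is_valid_boundary("----MyBoundary"))
print(is_valid_boundary("x" * 71))
```

Checking for the delimiter anywhere in the payload is stricter than required, since only a delimiter at the start of a line is significant, but it keeps the check simple.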