Multipart Forms and Boundary Parameters
Request Headers
Take a form that accepts a file upload. Submitting it produces an HTTP POST
request to some URL, and that request carries a header along the lines of:
Content-Type: multipart/form-data; boundary=----MyBoundary
The boundary parameter of the Content-Type header identifies the name of the form boundary used in the request body.
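As a rough illustration of how a receiver might read that parameter, the sketch below uses Python's standard-library `email.message.Message` class to parse the header value. The helper name `boundary_from_content_type` is made up for this example; it is not part of any framework.

```python
from email.message import Message


def boundary_from_content_type(header_value):
    """Return the boundary parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_param() looks up a parameter on the Content-Type header by default
    return msg.get_param("boundary")


boundary = boundary_from_content_type(
    "multipart/form-data; boundary=----MyBoundary"
)
print(boundary)
```

Note that the leading hyphens are part of the boundary value itself; they are separate from the two hyphens that prefix every delimiter line in the body.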
Request Body
Form Fields and Values
If you inspect a web form, you'll see that a form has particular field names and inputs where the user can submit a value. So, the form acts as a series of key:value pairs that are submitted to the web application.
The web application can have hidden form fields, or it can process the request and add fields to the user input once the request is submitted. You can use a proxy such as Burp to inspect the HTTP POST request after submitting to understand the application form logic.
"multipart/form-data" contains a series of parts. Each part is expected to contain a content-disposition header [RFC 2183] where the disposition type is "form-data", and where the disposition contains an (additional) parameter of "name", where the value of that parameter is the original field name in the form.
As with all multipart MIME types, each part has an optional "Content-Type", which defaults to text/plain. If the contents of a file are returned via filling out a form, then the file input is identified as the appropriate media type, if known, or "application/octet-stream".
What this means is that for every field in the form, there should be a boundary followed by a Content-Disposition header that indicates the field name. There is an optional Content-Type header that indicates the type of input that was passed to the form. The Content-Type header defaults to text/plain if not provided.
------MyBoundary
Content-Disposition: form-data; name="upload_id"

4e1aea6e-abd7-471c-8dd1-b1ea6c3aee8c
------MyBoundary
Content-Disposition: form-data; name="uploadfile"; filename="app.exe"
Content-Type: application/octet-stream

Raw bytes from file...
------MyBoundary--
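To make the layout of the body concrete, here is a minimal sketch that serializes form fields into a multipart body. Each delimiter line is two hyphens plus the boundary value from the Content-Type header, so a header value of `----MyBoundary` gives delimiter lines with six hyphens. The function name `build_multipart` and the dictionary keys are assumptions made for this example, and the sketch does not escape quotes in field names or filenames.

```python
CRLF = b"\r\n"


def build_multipart(parts, boundary):
    """Serialize parts into a multipart/form-data body.

    parts: list of dicts with "name" and "content" (bytes), plus optional
    "filename" and "content_type" keys (hypothetical layout for this sketch).
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for part in parts:
        disposition = 'form-data; name="{}"'.format(part["name"])
        if part.get("filename"):
            disposition += '; filename="{}"'.format(part["filename"])
        lines.append(delimiter)
        lines.append(b"Content-Disposition: " + disposition.encode("utf-8"))
        if part.get("content_type"):
            lines.append(b"Content-Type: " + part["content_type"].encode("ascii"))
        lines.append(b"")  # blank line separates part headers from content
        lines.append(part["content"])
    lines.append(delimiter + b"--")  # closing delimiter
    return CRLF.join(lines) + CRLF


body = build_multipart(
    [
        {"name": "upload_id", "content": b"4e1aea6e-abd7-471c-8dd1-b1ea6c3aee8c"},
        {
            "name": "uploadfile",
            "filename": "app.exe",
            "content_type": "application/octet-stream",
            "content": b"Raw bytes from file...",
        },
    ],
    "----MyBoundary",
)
print(body.decode("ascii"))
```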
Boundaries
Purpose of Boundaries
As with other multipart types, a boundary is selected that does not occur in any of the data. Each field of the form is sent, in the order defined by the sending application and form, as a part of the multipart stream. Each part identifies the INPUT name within the original form. Each part should be labelled with an appropriate content-type if the media type is known (e.g., inferred from the file extension or operating system typing information) or as "application/octet-stream".
To summarize:
- There should be one boundary for each field of the web form
- The boundaries are ordered based on the order of the fields of the web form
- Each boundary identifies the field name from the form
- The Content-Type header should be used in the boundary if known
  - It can be inferred from the file extension
  - Otherwise, send the data as a byte stream: application/octet-stream
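The content-type rule in the last bullet can be sketched with Python's standard-library `mimetypes` module, which guesses a media type from the file extension. The helper name `part_content_type` is hypothetical, and the guess for a given extension can vary between systems.

```python
import mimetypes


def part_content_type(filename):
    """Guess a media type from the extension, else fall back to a byte stream."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


print(part_content_type("photo.png"))
print(part_content_type("blob.zzunknown"))
```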
Defining Boundaries
The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the boundary parameter value from the Content-Type header field.
To summarize:
- As stated before, each boundary identifies a field name in a web form
- Each boundary should start with two (2) hyphens: --
- All of the boundaries should use the boundary identity specified in the Content-Type request header
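As a small sketch of that rule, the delimiter lines can be derived from the header's boundary value by prefixing two hyphens, with two more appended for the final delimiter. The function name `delimiter_lines` is an assumption for this example.

```python
def delimiter_lines(boundary):
    """Return the (opening, closing) delimiter lines for a boundary value."""
    opening = "--" + boundary
    closing = opening + "--"
    return opening, closing


opening, closing = delimiter_lines("----MyBoundary")
print(opening)
print(closing)
```

With the header value `----MyBoundary`, the opening line therefore carries six hyphens in total: two from the rule and four from the value.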
Note that the encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF, and that that initial CRLF is considered to be part of the encapsulation boundary rather than part of the preceding part. The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).
To summarize:
- We designate the boundary with --BoundaryID and then provide a CRLF (carriage return line feed, a.k.a. new line)
- On the new line just below the boundary ID, we should provide the Content-Disposition header
- On the line just below that, we should optionally provide a Content-Type header
  - If the Content-Type header is not provided, it is assumed to be Content-Type: text/plain
- We should provide another CRLF and pass in the form content
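The steps above can be sketched from the receiving side: given the bytes that follow one boundary line, split the headers from the content at the blank line and apply the text/plain default. The function name `parse_part` is hypothetical, and the sketch ignores folded header lines and other edge cases.

```python
def parse_part(raw_part):
    """Split one part (the bytes after the boundary line) into headers and content."""
    if raw_part.startswith(b"\r\n"):
        # Two CRLFs followed the boundary: the part has no headers
        header_block, content = b"", raw_part[2:]
    else:
        header_block, _, content = raw_part.partition(b"\r\n\r\n")
    headers = {}
    for line in header_block.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip().lower()] = value.strip()
    # A part without a Content-Type header is assumed to be text/plain
    headers.setdefault("content-type", "text/plain")
    return headers, content


headers, content = parse_part(
    b'Content-Disposition: form-data; name="upload_id"\r\n\r\nabc'
)
print(headers, content)
```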
Structure of the Web Request
METHOD /resource HTTP/version
Header: Value
Header: Value
Header: Value
Content-Type: multipart/form-data; boundary=EncapsulationBoundary
<<=============================[+] CRLF is part of the start of the EncapsulationBoundary
2 CRLF if no preamble is used
--EncapsulationBoundary <<<<< Start of multipart data, first boundary
Boundary Headers
Encapsulated content
--EncapsulationBoundary <<<<< Add as many boundaries as needed for form submission
Boundary Headers
Encapsulated content
--EncapsulationBoundary-- <<<<< Final boundary ending notation
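The layout above can also be read in reverse. The following sketch splits a request body into its parts by searching for a CRLF followed by two hyphens and the boundary value, and stops at the closing delimiter. The function name `split_parts` and the sample body are assumptions for illustration; the sketch ignores optional whitespace after a delimiter and is not a hardened parser.

```python
def split_parts(body, boundary):
    """Return the raw parts (headers plus content) of a multipart body."""
    delimiter = b"\r\n--" + boundary.encode("ascii")
    # Prefix a CRLF so the first delimiter matches even without a preamble
    sections = (b"\r\n" + body).split(delimiter)
    parts = []
    for section in sections[1:]:  # sections[0] is the preamble, if any
        if section.startswith(b"--"):
            break  # closing delimiter: no further parts follow
        if section.startswith(b"\r\n"):
            section = section[2:]  # drop the CRLF that ends the delimiter line
        parts.append(section)
    return parts


sample = (
    b"--EncapsulationBoundary\r\n"
    b'Content-Disposition: form-data; name="a"\r\n'
    b"\r\n"
    b"1\r\n"
    b"--EncapsulationBoundary\r\n"
    b'Content-Disposition: form-data; name="b"\r\n'
    b"\r\n"
    b"2\r\n"
    b"--EncapsulationBoundary--\r\n"
)
raw_parts = split_parts(sample, "EncapsulationBoundary")
print(len(raw_parts))
```

Because the CRLF before each delimiter belongs to the delimiter, the content of each part comes back without a trailing line break.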
The requirement that the encapsulation boundary begins with a CRLF implies that the body of a multipart entity must itself begin with a CRLF before the first encapsulation line -- that is, if the "preamble" area is not used, the entity headers must be followed by TWO CRLFs.
Encapsulation boundaries must not appear within the encapsulations, and must be no longer than 70 characters, not counting the two leading hyphens.
The encapsulation boundary following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line.
To summarize:
- At the end of the request headers, we must provide a CRLF (carriage return line feed, a.k.a. new line)
  - 2 CRLF if no preamble
- The CRLF preceding the initial -- is considered to be part of the boundary definition
- Following --EncapsulationBoundary, we must provide a CRLF
  - 2 CRLF if no headers are provided
- We must not use --BoundaryName syntax in the encapsulated content
- Boundary names should be no longer than 70 characters, not counting the two leading hyphens
- Boundary names can consist of letters, digits, and a limited set of ASCII punctuation, which includes the hyphen
  - Thus, you may see boundary names with multiple leading hyphens: ------
  - This is compliant, so long as the name is no longer than 70 characters
- The end of the form data should be denoted by a boundary ending with --: --EncapsulationBoundary--
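The length and collision rules above can be checked before a boundary is used. The sketch below is a minimal validator plus a random generator; the names `is_valid_boundary` and `new_boundary` are hypothetical, and the permitted character set is taken from the boundary grammar in RFC 2046 (letters, digits, a handful of punctuation characters, and spaces that may not come last), which is an assumption beyond the passages quoted here.

```python
import re
import secrets

# 1 to 70 characters from the RFC 2046 boundary character set; no trailing space
_BOUNDARY_RE = re.compile(r"[0-9A-Za-z'()+_,\-./:=? ]{0,69}[0-9A-Za-z'()+_,\-./:=?]")


def is_valid_boundary(boundary, payloads=()):
    """Check the length, characters, and absence of the delimiter in the payloads."""
    if not _BOUNDARY_RE.fullmatch(boundary):
        return False
    delimiter = b"--" + boundary.encode("ascii")
    return all(delimiter not in payload for payload in payloads)


def new_boundary():
    """Generate a random boundary that is unlikely to occur in any payload."""
    return "----" + secrets.token_hex(16)


print(is_valid_boundary("----MyBoundary"))
print(is_valid_boundary("MyBoundary", [b"data\r\n--MyBoundary\r\n"]))
```

A random suffix makes a collision with the payload improbable, but checking the actual payloads, as the second call does, is the only way to be sure.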