
ASP.Net File Upload Revisited - Part 2, RFC 1867 Parser

This is the second of four posts about the new version of the ASP.Net file upload module. This post concentrates on the new RFC 1867 parser implementation. In this version the parser is more efficient and, in tests so far, consumes less memory during large file uploads. I thought it would be useful to outline how it works, partly for documentation and partly in case it’s of use to anyone implementing anything similar.

RFC 1867 is a protocol designed to handle one or more file uploads within an HTTP post. The RFC introduces the input type="file" HTML element and defines how the file data and other fields are combined in the post in a way that allows them to be split back into their original parts on the server.

The first thing to say is that when a file upload is taking place the enctype of the form containing the file inputs must be set to multipart/form-data. This tells the server that the post will conform to the RFC 1867 format.

The fields and file uploads in the post are separated by a delimiter. This usually looks something like the following example:


-----------------------------7d8ed156a0e72

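The delimiter contains a randomly generated number, chosen by the browser so that it does not appear in the file data. In the body, each delimiter line is the boundary value prefixed with two extra hyphens; the boundary value itself is given in the content type of the request: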


multipart/form-data; boundary=---------------------------7d8ed156a0e72
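As a rough illustration of that lookup, the sketch below pulls the boundary out of a Content-Type value and builds the delimiter bytes to search for. The class and method names (BoundaryHelper, GetBoundary, GetDelimiterBytes) are made up for this example and are not part of the upload module. It also assumes the boundary value contains no semicolons.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of the upload module.
public static class BoundaryHelper
{
    // Returns the boundary parameter from a Content-Type header value, or
    // null if the value is not multipart/form-data or has no boundary.
    public static string GetBoundary(string contentType)
    {
        if (string.IsNullOrEmpty(contentType) ||
            !contentType.TrimStart().StartsWith("multipart/form-data", StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        foreach (string part in contentType.Split(';'))
        {
            string trimmed = part.Trim();
            if (trimmed.StartsWith("boundary=", StringComparison.OrdinalIgnoreCase))
            {
                // The value may optionally be wrapped in double quotes.
                return trimmed.Substring("boundary=".Length).Trim('"');
            }
        }

        return null;
    }

    // Each delimiter line in the body is the boundary prefixed with "--".
    public static byte[] GetDelimiterBytes(string boundary)
    {
        return Encoding.ASCII.GetBytes("--" + boundary);
    }
}
```

In an ASP.Net module the input would typically be the value of Request.ContentType.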

Each field or file upload in the request is separated by the delimiter, whilst the end of the request is marked by the delimiter followed by two extra hyphens. Following is an example with a single file upload and a text box. I’ve left the CRLF pairs in (\r\n) to clearly show the line breaks, which are important for parsing. Note also that the view state field is passed as part of the request.


-----------------------------7d8ed156a0e72\r\n
Content-Disposition: form-data; name="__VIEWSTATE"\r\n\r\n
/wEPDwUJLTcwNzkzOTQzZGRnjzt7XiZHfNdaX1fRGzJSNEO9EQ==\r\n
-----------------------------7d8ed156a0e72\r\n
Content-Disposition: form-data; name="fuOne"; filename="C:\\Users\\DarrenJohnstone\\Desktop\\SampleUpload.txt"\r\n
Content-Type: text/plain\r\n\r\n
This is the contents of my sample file.\r\n
-----------------------------7d8ed156a0e72\r\n
Content-Disposition: form-data; name="test1"\r\n\r\n
test1\r\n
-----------------------------7d8ed156a0e72--\r\n

Each field has a header which comes after the boundary. This contains the field name and, for uploads, the content type and file name. The end of the header is marked by two CRLF pairs. After the header comes the content of the field or file, which is encoded according to the request encoding; this is ended by another boundary (either for the next field/upload or for the end of the request).
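To make the header layout concrete, here is a minimal sketch that splits one header block into its parts. It works on a string for readability, whereas the module’s parser works on bytes. The PartHeaderParser name and the choice of dictionary keys are assumptions made for this example, and it does not handle parameter values that contain semicolons.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the upload module.
public static class PartHeaderParser
{
    // Parses the header block of one part (the text between the boundary
    // line and the blank line) into a dictionary. Content-Disposition
    // parameters are stored under their own names ("name", "filename");
    // any other header is stored under its header name.
    public static Dictionary<string, string> Parse(string headerBlock)
    {
        var items = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string[] lines = headerBlock.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon < 0) continue;

            string headerName = line.Substring(0, colon).Trim();
            string headerValue = line.Substring(colon + 1).Trim();

            if (headerName.Equals("Content-Disposition", StringComparison.OrdinalIgnoreCase))
            {
                // e.g. form-data; name="fuOne"; filename="SampleUpload.txt"
                foreach (string param in headerValue.Split(';'))
                {
                    int eq = param.IndexOf('=');
                    if (eq < 0) continue; // skips the bare "form-data" token

                    string key = param.Substring(0, eq).Trim();
                    string value = param.Substring(eq + 1).Trim().Trim('"');
                    items[key] = value;
                }
            }
            else
            {
                items[headerName] = headerValue;
            }
        }

        return items;
    }
}
```

With this layout, a part can be treated as a file upload when the dictionary contains a "filename" key and as an ordinary field otherwise.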

As you can see, the RFC 1867 format is very simple. Parsing is also pretty simple. The only real complexity comes from the fact that the entire request isn’t available at the beginning; instead, the stream is read in small chunks. That makes it a little more difficult to detect boundaries, as a chunk may end with a partial boundary.

In the file upload module the parser is implemented as a System.IO.Stream class which consumes the request as it is read by the ASP.Net worker process. The parser reads each element in the input until the end-of-request marker is reached. As it processes the stream, it builds up a smaller version of the input data with the contents of uploaded files stripped out; this is what eventually becomes the request that is passed to the ASP.Net application once the module has finished its work. Each time the parser finds a file, it passes it to a processor which implements the IFileProcessor interface and is designed to stream the file data to disk, a database, or another storage medium. As a reminder, the IFileProcessor interface looks like the following:

using System;
using System.Collections.Generic;

/// <summary>
/// The IFileProcessor interface defines classes which are used to
/// process an individual file coming from a form stream.
///
/// The interface defines methods to start the file processing (with a file
/// name and content type), write data, and end the upload process.
///
/// IFileProcessor implementations are used to write uploaded data to
/// persistent storage such as the file system or a database.
/// </summary>
public interface IFileProcessor : IDisposable
{
    /// <summary>
    /// Starts a new file.
    /// </summary>
    /// <param name="fileName">File name.</param>
    /// <param name="contentType">The content type of the file.</param>
    /// <param name="headerItems">A dictionary of items pulled from the header of the field.</param>
    void StartNewFile(string fileName, string contentType, Dictionary<string, string> headerItems);

    /// <summary>
    /// Writes to the output file.
    /// </summary>
    /// <param name="buffer">Buffer to write from.</param>
    /// <param name="offset">Offset in the buffer to write from.</param>
    /// <param name="count">Count of bytes to write.</param>
    void Write(byte[] buffer, int offset, int count);

    /// <summary>
    /// Ends current file processing.
    /// </summary>
    void EndFile();

    /// <summary>
    /// Returns the name of the file that is currently being processed.
    /// Null if there is no file.
    /// </summary>
    /// <returns>The file name.</returns>
    string GetFileName();
}

This process is the key to the operation of the upload module and is what helps to keep memory requirements low even for very large uploads.
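For illustration, a minimal processor that streams each file straight to disk might look like the sketch below. SimpleDiskProcessor is a hypothetical class written for this example, not the module’s own FileSystemProcessor, and the interface is repeated in compact form only so that the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Repeated from the listing above so that this sketch compiles on its own.
public interface IFileProcessor : IDisposable
{
    void StartNewFile(string fileName, string contentType, Dictionary<string, string> headerItems);
    void Write(byte[] buffer, int offset, int count);
    void EndFile();
    string GetFileName();
}

// Hypothetical example processor; not the module's FileSystemProcessor.
public class SimpleDiskProcessor : IFileProcessor
{
    private readonly string outputPath;
    private FileStream stream;
    private string fileName;

    public SimpleDiskProcessor(string outputPath)
    {
        this.outputPath = outputPath;
    }

    public void StartNewFile(string fileName, string contentType, Dictionary<string, string> headerItems)
    {
        // Close any file that is still open before starting the next one.
        EndFile();

        // Some browsers send the full client path; keep only the file name
        // (on Windows this strips a leading "C:\...\" style path).
        this.fileName = Path.GetFileName(fileName);
        stream = new FileStream(Path.Combine(outputPath, this.fileName), FileMode.Create, FileAccess.Write);
    }

    public void Write(byte[] buffer, int offset, int count)
    {
        // Each chunk goes straight to disk, so only one buffer is in memory.
        if (stream != null)
        {
            stream.Write(buffer, offset, count);
        }
    }

    public void EndFile()
    {
        if (stream != null)
        {
            stream.Dispose();
            stream = null;
        }
        fileName = null;
    }

    public string GetFileName()
    {
        return fileName;
    }

    public void Dispose()
    {
        EndFile();
    }
}
```

Because Write is called once per chunk, memory use stays roughly constant regardless of the size of the uploaded file.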

Looking at the format of the incoming request, it would seem natural for the parser to deal primarily with strings. However, the request is passed to the parser as a byte stream. Whilst this is easily converted to a string and back, the parser is designed to work with the native byte stream. In testing I found this to be a great deal faster than repeatedly converting between byte arrays and strings.
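As an example of what working directly on bytes can look like, the sketch below searches a byte buffer for a delimiter without creating any strings. It is a plain linear scan written for this post, with a made-up ByteSearch class name; the module’s actual search routine may differ.

```csharp
// Hypothetical helper for illustration only; not part of the upload module.
public static class ByteSearch
{
    // Returns the index of the first occurrence of pattern within
    // buffer[start .. start + count), or -1 if it does not occur.
    public static int IndexOf(byte[] buffer, int start, int count, byte[] pattern)
    {
        // Last index at which a full pattern could still fit.
        int last = start + count - pattern.Length;

        for (int i = start; i <= last; i++)
        {
            int j = 0;
            while (j < pattern.Length && buffer[i + j] == pattern[j])
            {
                j++;
            }

            if (j == pattern.Length)
            {
                return i;
            }
        }

        return -1;
    }
}
```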

The parser works by looking for a boundary and then parsing the header block to determine if it’s dealing with a file or a field. All boundaries and headers are written to the formContent MemoryStream which is the means of building up the replacement request data for the module. If a file is found then the data is sent to the appropriate processor, otherwise it is written to formContent. In this way the parser strips the file data from the request stream.

[Image: the upload process]

If a chunk of data does not end with a boundary then a number of bytes equivalent to a boundary is held back and prepended to the input buffer on the next iteration of the parser. This ensures that a boundary split across two chunks is not missed; otherwise two files could easily be merged into one by accident.
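The hold-back idea can be sketched as follows. This is a simplified stand-in for the real parser: the ChunkedSplitter class is hypothetical, it only splits the input on a delimiter and ignores headers and CRLF handling, and it holds back one byte fewer than the delimiter length, which is the minimum needed to catch a split delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical class for illustration only; not the module's parser.
public class ChunkedSplitter
{
    private readonly byte[] delimiter;
    private byte[] held = new byte[0];          // tail kept back from the last chunk
    private MemoryStream current = new MemoryStream();

    public readonly List<byte[]> Segments = new List<byte[]>();

    public ChunkedSplitter(byte[] delimiter)
    {
        this.delimiter = delimiter;
    }

    public void Process(byte[] chunk, int count)
    {
        // Prepend the bytes held back from the previous chunk.
        byte[] data = new byte[held.Length + count];
        Buffer.BlockCopy(held, 0, data, 0, held.Length);
        Buffer.BlockCopy(chunk, 0, data, held.Length, count);

        int pos = 0;
        int found;
        while ((found = IndexOf(data, pos, delimiter)) >= 0)
        {
            // Everything before the delimiter completes the current segment.
            current.Write(data, pos, found - pos);
            Segments.Add(current.ToArray());
            current = new MemoryStream();
            pos = found + delimiter.Length;
        }

        // The tail might be the start of a delimiter, so keep it for next time.
        int keep = Math.Min(delimiter.Length - 1, data.Length - pos);
        current.Write(data, pos, data.Length - pos - keep);
        held = new byte[keep];
        Buffer.BlockCopy(data, data.Length - keep, held, 0, keep);
    }

    // Call once after the last chunk to flush the held-back bytes.
    public void Finish()
    {
        current.Write(held, 0, held.Length);
        Segments.Add(current.ToArray());
        current = new MemoryStream();
        held = new byte[0];
    }

    private static int IndexOf(byte[] data, int start, byte[] pattern)
    {
        for (int i = start; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }
}
```

Feeding the chunks "ab-", "-Xcd--" and "Xef" with the delimiter "--X" produces the segments "ab", "cd" and "ef", even though both delimiters straddle a chunk edge.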

Once the process is finished the stripped-down request stream is passed to ASP.Net so that ViewState and other fields on the form are correctly processed. The shell of each file upload (i.e. just the boundaries and header) is also passed to prevent ViewState errors after postbacks. The UploadManager singleton contains a list of correctly uploaded files and a second list of all files which encountered errors during the upload. None of the file contents are held in memory, however. Retrieval of the file data is up to the processors.

The next obstacle is going to be configuration of the processor and the module through the new upload control. At the moment configuration is done in global.asax:

void Application_Start(object sender, EventArgs e)
{
	// Set up the file processor for the upload manager
	UploadManager.Instance.ProcessorType = typeof(FileSystemProcessor);
	UploadManager.Instance.ProcessorInit += new FileProcessorInitEventHandler(Processor_Init);
}

void Processor_Init(object sender, FileProcessorInitEventArgs args)
{
	FileSystemProcessor processor;

	processor = args.Processor as FileSystemProcessor;

	if (processor != null)
	{
		// Set the output path for uploaded files here (defaults to the root of the web application)
		processor.OutputPath = @"c:\uploads";
	}
}

The module needs to be configured here because global.asax is processed before it in the ASP.Net execution pipeline. The problem, of course, is that this makes the control less dynamic. I’d really like these configuration settings to be properties on the upload control and leave global.asax out of the picture. I haven’t solved that one yet, though!

Responses

  1. Hi, Thanks for great article !
    I want to ask about other fields in the form. What if a form has fields like last name, first name etc. with the upload on the same form? Where do I process those fields?

    TIA

  2. One more thing: what if, instead of using an identity column when inserting the row (CreateInitialInsertCommand), I want to insert a row with a value from the form?

    How do I pass this value to the CreateInitialInsertCommand function?

    TIA

  3. Hi Ronen,

    The other fields you pass in (i.e. not file uploads) will be available in the request as usual once the page has loaded. You can theoretically get at them whilst the parser is running but you need to be careful of the order they appear in the page, and you have no guarantees that they will have been processed by the time your file upload comes round and you have to call CreateInsertCommand.

    I think the best bet for you is to do the uploads and then update the row using the ID value passed back in the status. You can then get the fields from the request after the page has loaded and call a SQL update statement to match the values to the corresponding upload row.

    Hope this helps,
    Darren

  4. Thanks for your quick response.

    OK, to make things clearer for me:
    1) First I do the upload to the db (it creates a row).
    2) Instead of inserting my row, I’ll do an update to the row created.

    If I understand correctly, if I want to pass to CreateInsertCommand an ID for the primary key which resides on my page, can I get it in the parser?

    Maybe you have a couple of lines to explain how to call upload with a non-identity field.

    TIA

  5. Hi Ronen,

    Yes. You let the module create the database record for you. If I were you I’d just leave it with an identity column for the primary key. Then I’d add extra columns onto the schema for the extra information which is on the form. Once the form has loaded I’d iterate through the status.UploadedFiles collection and get the ID of each uploaded file. For each one I’d call a SQL update and set the column values on the appropriate rows (selecting by the ID) and getting the form values from the Page.Request object.

    You could get the values from the parser as it pulls them out but I wouldn’t recommend this as you’d have to do the parsing yourself and modify the FormStream class.

    Hope this helps,
    Darren
