In this next installment of the Django export series, I will demonstrate how to create a zip file containing multiple CSV files. Along the way, we will look at a few options you can consider for your own project.

Models example

Let’s consider a basic library system in which a book can be associated with multiple libraries.

The corresponding models are structured as follows:

from django.db import models


class Library(models.Model):
    name = models.TextField()


class Book(models.Model):
    title = models.TextField()
    # null=True has no effect on ManyToManyField, so it is omitted here
    libraries = models.ManyToManyField(
        to='Library',
        blank=True,
        related_name='books'
    )

Our objective is to export a zip file that contains several CSV files, each one representing a library and displaying a list of books available in that library.
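For example, with two hypothetical libraries named "Central" and "East", the downloaded archive would contain one CSV per library:

Libraries.zip
├── library_Central.csv
└── library_East.csv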

Export view

As is typical when creating a download API, we will create a view that only allows the GET method.

import csv
import io
import zipfile
from wsgiref.util import FileWrapper

from django.http import StreamingHttpResponse
from rest_framework.views import APIView


class ExportLibraries(APIView):
    def get(self, request):
        csv_datas = self.build_multiple_csv_files()

        temp_file = io.BytesIO()
        with zipfile.ZipFile(
             temp_file, "w", zipfile.ZIP_DEFLATED
        ) as temp_file_opened:
            # add one CSV file per library to the archive
            for data in csv_datas:
                data["csv_file"].seek(0)
                temp_file_opened.writestr(
                    f"library_{data['library_name']}.csv",
                    data["csv_file"].getvalue()
                )

        temp_file.seek(0)

        # stream the archive back with the zip content type
        response = StreamingHttpResponse(
            FileWrapper(temp_file),
            content_type="application/zip",
        )

        response['Content-Disposition'] = 'attachment; filename="Libraries.zip"'
        return response

    def build_multiple_csv_files(self):
        csv_files = []
        return csv_files

In the view above, we use the zipfile module from the Python standard library to compress and archive the data.

The zipfile.ZipFile class lets us open a zip archive for writing. Here, the underlying file is an in-memory binary buffer (temp_file, an io.BytesIO object), and temp_file_opened is the ZipFile object that writes into it.

class zipfile.ZipFile(
    file, mode='r', compression=ZIP_STORED,
    allowZip64=True, compresslevel=None,
    *, strict_timestamps=True
)

We use a context manager via the with statement to guarantee that the zip file is closed once the suite inside the with block has finished executing, even if an exception is raised.

temp_file = io.BytesIO()
with zipfile.ZipFile(
    temp_file, "w", zipfile.ZIP_DEFLATED
) as temp_file_opened:
    # write to zip file

Within the context manager, we write each CSV file's content into the archive (temp_file_opened) using the writestr method.

ZipFile.writestr(
    zinfo_or_arcname, data,
    compress_type=None, compresslevel=None
)

Here, we pass the two mandatory parameters: zinfo_or_arcname (the name of the file inside the archive) and data (its contents).

temp_file_opened.writestr(
    f"library_{data['library_name']}.csv",
    data["csv_file"].getvalue()
)

Once we have finished writing the CSV files, we rewind the in-memory buffer to the beginning with seek(0). We then wrap the file-like object in FileWrapper, which converts it into an iterator, before returning it in the StreamingHttpResponse.

At this stage, we can download an empty file named “Libraries.zip”.
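As a quick sanity check, we can confirm that the response body really is a valid (still empty) archive. Here is a minimal sketch using the Django test client in a test, assuming the view is already routed at /export_libraries/:

import io
import zipfile

response = self.client.get("/export_libraries/")
# streaming_content yields chunks of bytes; join them back together
archive = zipfile.ZipFile(io.BytesIO(b"".join(response.streaming_content)))
assert archive.namelist() == []  # no CSV files yet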

Build CSV files

As shown, we have defined a method named "build_multiple_csv_files" that currently returns an empty list. In the following step, we will add the code to this function to generate a list of CSV files.

class ExportLibraries(APIView):
    header_data = {
        "name": "Book Name",
        "library": "Library Name"
    }

    def get(self, request):
        # fetch the querysets and pass them along, e.g.:
        # libraries = Library.objects.all()
        # books = Book.objects.all()
        ...
        return response

    def build_multiple_csv_files(self, libraries, books):
        csv_files = []

        for library in libraries.iterator():
            mem_file = io.StringIO()
            writer = csv.DictWriter(
                mem_file, fieldnames=self.header_data.keys()
            )
            # header row with human-readable labels
            writer.writerow(self.header_data)

            books_in_library = books.filter(libraries__in=[library.id])
            for book in books_in_library:
                book_row = self.build_book_row(book, library)
                writer.writerow(book_row)

            mem_file.seek(0)

            csv_files.append({
                "library_name": library.name,
                "csv_file": mem_file
            })

        return csv_files

    def build_book_row(self, book, library):
        row = self.header_data.copy()

        # the Book model defines title, not name
        row["name"] = book.title
        row["library"] = library.name

        return row

Reviewing the code above, we iterate over all libraries and construct a CSV file for each one. This is achieved by initializing a writer object using csv.DictWriter() and the keys from header_data.

Here is an example of what header_data might look like:

header_data = {
    "name": "Book Name",
    "library": "Library Name"
}

Next, we write the header to the writer, then loop over the books and add each one row by row using the .writerow() method.
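To make this concrete, here is a minimal, self-contained sketch of how csv.DictWriter builds one of these in-memory files (the book and library names are made up for the example):

import csv
import io

header_data = {"name": "Book Name", "library": "Library Name"}

mem_file = io.StringIO()
writer = csv.DictWriter(mem_file, fieldnames=header_data.keys())
writer.writerow(header_data)  # header row with human-readable labels
writer.writerow({"name": "Dune", "library": "Central"})  # one book row

print(mem_file.getvalue())
# Book Name,Library Name
# Dune,Central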

Once the writing is complete, we append a dictionary for each library containing the library's name (used later to build the CSV filename) and the in-memory file holding its contents.

To expose the export endpoint, add the view to your URL configuration.

from django.urls import path

urlpatterns = [
    path(
        'export_libraries/',
        ExportLibraries.as_view(),
        name="export_libraries"
    )
]
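With the route in place, the endpoint can be exercised from any HTTP client. A quick sketch using the requests library against a local dev server (the host and port are assumptions):

import requests

resp = requests.get("http://localhost:8000/export_libraries/")
with open("Libraries.zip", "wb") as f:
    f.write(resp.content)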

Unit Test

The next question is: how do we verify the output?

It is recommended to carry out this step before implementing the logic, as per the TDD (Test Driven Development) approach. However, I will demonstrate how the logic works first, as it may help you better understand which components require testing.

I intend to create two unit tests for this:

- One for the API: upon calling the API, a zip file should be exported.
- One for the CSV files and their contents: calling build_multiple_csv_files() on the view should return a list containing data for each library. At this stage, the content of each CSV file can also be verified row by row.

NOTE: Please keep in mind that the sample unit tests provided below are solely intended to provide an idea of what they may look like, and you should adapt them according to your specific feature.

Test content response

This unit test is straightforward; we only need to verify that the API call returns a 200 status code and that the exported file is a zip file with the expected name.

def test_export_libraries(self):
    response = self.client.get(reverse('export_libraries'))
    assert response.status_code == status.HTTP_200_OK
    assert response.get('Content-Disposition') == 'attachment; filename="Libraries.zip"'

Test view function to get multiple CSV files

def test_build_csvs_files(self):
    # assume we mock 2 libraries:
    # library_1, library_2
    # the querysets passed in are books and libraries
    view = ExportLibraries()
    view.request = drf_request_for_context(self.user)
    csv_files = view.build_multiple_csv_files(
        libraries, books
    )
    # check number of csv files
    assert len(csv_files) == 2
    # first csv file
    assert csv_files[0]["library_name"] == library_1.name
    assert csv_files[0]["csv_file"]
    # check the csv content of the first file here

    # second csv file
    assert csv_files[1]["library_name"] == library_2.name
    assert csv_files[1]["csv_file"]
    # check the csv content of the second file here

I have added some comments in the code to indicate the data we will use for testing. Our approach is to call the view function using a mock DRF request, created using the drf_request_for_context utility function.

In addition to checking the number of CSV files returned, we can also verify the content of each CSV file based on its header.
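For the row-by-row check, the in-memory file can simply be re-parsed with csv.reader. A sketch (the expected book values depend on your fixtures):

import csv

rows = list(csv.reader(csv_files[0]["csv_file"].getvalue().splitlines()))
assert rows[0] == ["Book Name", "Library Name"]  # header row
# each following row holds one book, e.g.:
# assert rows[1] == [book_1.title, library_1.name]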

Optimizing performance

To improve performance, we can optimize the current solution by preparing all the book data in a single loop, regardless of which library each book belongs to, and then looping through the libraries with this pre-grouped data. This avoids one database query per library and can significantly reduce processing time.
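A minimal sketch of this idea: group the books by library id in a single pass (prefetch_related avoids one query per book), then let build_multiple_csv_files read from the pre-built mapping instead of filtering per library:

from collections import defaultdict

def group_books_by_library(books):
    # one pass over all books; a book lands in the bucket of
    # every library it belongs to
    books_by_library = defaultdict(list)
    for book in books.prefetch_related("libraries"):
        for library in book.libraries.all():
            books_by_library[library.id].append(book)
    return books_by_library

# inside build_multiple_csv_files, the per-library query becomes:
# books_in_library = books_by_library[library.id]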

Instead of solely relying on Django queries, one could use Pandas to flatten the data and make the export process even easier by working with dataframes.
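A rough sketch of the pandas variant, assuming the same models (values() produces one row per book/library pair thanks to the join):

import io

import pandas as pd

rows = Book.objects.values("title", "libraries__name")
df = pd.DataFrame(list(rows)).rename(
    columns={"title": "Book Name", "libraries__name": "Library Name"}
)

for library_name, group in df.groupby("Library Name"):
    mem_file = io.StringIO()
    group.to_csv(mem_file, index=False)  # ready to be written into the zip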

Another improvement could be to handle the export process as a background task, such as a Celery task. Instead of using StreamingHttpResponse to download the file from the browser, we can upload the zip file to a service like S3 and provide the user with a URL or other means of accessing the file. This approach can improve user experience and prevent timeout errors when handling large amounts of data.
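Here is a sketch of what the background-task variant might look like; the helper functions (build_zip_for_all_libraries, upload_to_s3, notify_user) are hypothetical placeholders, not a prescribed API:

from celery import shared_task

@shared_task
def export_libraries_zip(user_id):
    # build the zip in memory exactly as in the view above
    temp_file = build_zip_for_all_libraries()  # hypothetical helper
    # upload instead of streaming it back, e.g. via boto3/django-storages
    url = upload_to_s3(temp_file, "Libraries.zip")  # hypothetical helper
    # hand the user a download link (email, notification, etc.)
    notify_user(user_id, url)  # hypothetical helper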

(If you have any other ideas for improving performance, please share them with me.)

Final word

To summarize, we have explored a method (referred to as “use Django queries”) for exporting a zip file containing multiple CSV files within a Django application. While there is an alternative approach using pandas to export data from a Django app, we may discuss it in the future.