Converting MNIST Data to CSV or Individual Files using Python

To convert MNIST data to CSV or image files with Python, you need a working Python environment on your computer. The steps below cover both the CSV conversion and the individual image files:

1. Install Python

If you haven't already installed Python, you can download it from python.org. During installation, add Python to your system path so that you can run Python commands from the command line.

2. Set Up an Environment

It’s a good idea to create a virtual environment for your Python projects. This keeps your dependencies organized and separate from other projects.

  • Open your command line interface (CLI).

  • Navigate to your project directory or where you want to keep your files.

  • Run the following commands:

# Install the virtual environment package globally
pip install virtualenv

# Create a new virtual environment
virtualenv myenv

# Activate the virtual environment
# On Windows
myenv\Scripts\activate
# On macOS or Linux
source myenv/bin/activate

These commands create a virtual environment (a folder named myenv) in the directory you run them from.
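To confirm that the environment is actually active before installing anything, one quick check is to compare `sys.prefix` with `sys.base_prefix` from inside Python. This is only a minimal sketch; the helper name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.prefix differ from sys.base_prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter in use:", sys.executable)
```

If this prints False after activation, the activate command was probably run in a different terminal session than the one you are using now.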

3. Install Required Libraries

With your environment set up, install the following Python libraries that we will be using in our project:

pip install numpy idx2numpy pandas Pillow

4. Prepare Your Script (CSV)

Create a Python script (convert_mnist_csv.py) with the code below. You can use a simple text editor like Notepad, or a code editor like Visual Studio Code.

import idx2numpy
import numpy as np
import pandas as pd

# Paths to the MNIST dataset files
file_images = 'train-images.idx3-ubyte'
file_labels = 'train-labels.idx1-ubyte'

# Read the image and label files
images = idx2numpy.convert_from_file(file_images)
labels = idx2numpy.convert_from_file(file_labels)

# Flatten the image array and normalize pixel values
images_flattened = images.reshape(images.shape[0], -1) / 255.0

# Combine labels and images
data = np.column_stack((labels, images_flattened))

# Convert to DataFrame and save as CSV
df = pd.DataFrame(data)
df.to_csv('mnist.csv', index=False, header=False)

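This script produces a CSV file in which each row starts with the label, followed by 784 pixel values (each 28x28 image is flattened into a single row). Because the pixels are divided by 255.0, they are written as floats between 0 and 1, and the label is written as a float as well (for example, 5.0).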
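As a quick sanity check on the row layout described above, the sketch below reads a CSV back with the standard csv module and confirms that every row has 785 columns (1 label plus 784 pixels). The function name `check_mnist_csv` is hypothetical and not part of the original scripts.

```python
import csv


def check_mnist_csv(path, expected_columns=785):
    """Return the row count after verifying every row has the expected width."""
    row_count = 0
    with open(path, newline="") as handle:
        for row_number, row in enumerate(csv.reader(handle), start=1):
            if len(row) != expected_columns:
                raise ValueError(
                    f"row {row_number} has {len(row)} columns, "
                    f"expected {expected_columns}"
                )
            row_count += 1
    return row_count


# Example call (assumes mnist.csv is in the current directory):
#     check_mnist_csv("mnist.csv")
```

For the standard MNIST training set, the returned row count should be 60000.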

Alternatively, you can create single image files from your MNIST data.

4a. Prepare Your Script (Image Files)

If you prefer to extract/save each image individually, you can do so by creating a Python script (convert_mnist_img.py) with the code below.

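import os

import idx2numpy
from PIL import Image

# Path to the MNIST image file
file_images = 'train-images.idx3-ubyte'
images = idx2numpy.convert_from_file(file_images)

# Create the output folder if it does not exist yet
output_dir = 'img'
os.makedirs(output_dir, exist_ok=True)

# Convert each image to a PNG file (os.path.join works on any OS)
for index, img in enumerate(images):
    image = Image.fromarray(img)
    image.save(os.path.join(output_dir, f'image_{index}.png'))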

This script saves each MNIST image as a separate PNG file in an img folder inside your working directory.

5. Run Your Script

Before you run your script(s), make sure that:

  1. You have downloaded the compressed train-images and train-labels files and extracted them. Remember that your virtual environment folder is created in whichever directory you ran the virtualenv command from.

  2. Your script file is in the same folder as the extracted data files.
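If you would rather do the extraction and a basic file check from Python, the sketch below uses only the standard library. It decompresses a .gz file and then reads the IDX header, which starts with two zero bytes, a data type byte, and a dimension count, followed by each dimension as a big-endian 32-bit integer. The helper names `extract_gzip` and `read_idx_header` are hypothetical, and the .gz file name in the comment is an assumption, so adjust it to match your download.

```python
import gzip
import shutil
import struct


def extract_gzip(src_path, dest_path):
    """Decompress a .gz file and write the result to dest_path."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)


def read_idx_header(path):
    """Return (magic_number, dimensions) from an uncompressed IDX file."""
    with open(path, "rb") as handle:
        magic = handle.read(4)
        # A valid IDX file begins with two zero bytes.
        if len(magic) != 4 or magic[0] != 0 or magic[1] != 0:
            raise ValueError(f"{path} does not look like an IDX file")
        ndim = magic[3]  # fourth byte holds the number of dimensions
        dims = struct.unpack(f">{ndim}I", handle.read(4 * ndim))
    return struct.unpack(">I", magic)[0], dims


# Example calls (file names are assumptions):
#     extract_gzip("train-images-idx3-ubyte.gz", "train-images.idx3-ubyte")
#     read_idx_header("train-images.idx3-ubyte")
```

For the standard training files, the image file should report magic number 2051 with dimensions (60000, 28, 28), and the label file 2049 with dimensions (60000,).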

Then run your script.

python convert_mnist_csv.py

# Or for individual image creation

python convert_mnist_img.py

After running the script, check the output (mnist.csv or the image files) to confirm it was created as expected.

The Python script files can be found here: https://github.com/tjgokcen/MNISTConverter-Python