A neural network architecture that automatically generates captions from images.
The project is structured as a series of Jupyter notebooks that are designed to be completed in sequential order:
0_Dataset:
  Step 1: Initialize the COCO API
  Step 2: Plot a sample image
1_Preliminaries:
  Step 1: Explore the Data Loader
  Step 2: Use the Data Loader to Obtain Batches
  Step 3: Experiment with the CNN Encoder
  Step 4: Implement the RNN Decoder
2_Training:
  Step 1: Training Setup
  Step 2: Train your Model
3_Inference:
  Step 1: Get Data Loader for Test Dataset
  Step 2: Load Trained Models
  Step 3: Finish the Sampler
  Step 4: Clean up Captions
  Step 5: Generate Predictions!
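To give a feel for the "Clean up Captions" step in 3_Inference, here is a minimal sketch of how a list of predicted token ids might be turned into a readable sentence. The function name clean_caption, the idx2word mapping, and the <start>/<end> marker tokens are assumptions made for illustration; the notebooks may use different names and conventions.

```python
def clean_caption(token_ids, idx2word, start_word="<start>", end_word="<end>"):
    """Convert predicted token ids into a readable sentence (illustrative sketch)."""
    words = []
    for idx in token_ids:
        word = idx2word[idx]
        if word == start_word:
            continue  # drop the start marker
        if word == end_word:
            break  # ignore everything after the end marker
        words.append(word)
    sentence = " ".join(words)
    # Attach punctuation tokens to the preceding word.
    sentence = sentence.replace(" .", ".").replace(" ,", ",")
    # Capitalize only the first character, leaving the rest untouched.
    return sentence[:1].upper() + sentence[1:]


# Hypothetical toy vocabulary, not the project's real one.
sample_idx2word = {0: "<start>", 1: "<end>", 2: "a", 3: "dog", 4: "runs", 5: "."}
sample_caption = clean_caption([0, 2, 3, 4, 5, 1, 2], sample_idx2word)
print(sample_caption)
```

Stopping at the first end marker matters because a sampler that always emits a fixed number of tokens will usually produce filler after the sentence has finished.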
You MUST enable GPU mode for this project.
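Before starting a long run, it can help to confirm that a GPU is actually visible. The sketch below assumes the notebooks use PyTorch, which this README does not state; pick_device is a hypothetical helper name, and the function falls back to "cpu" when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" if a CUDA GPU is visible to PyTorch, otherwise "cpu"."""
    try:
        import torch  # assumption: the project is built on PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device_name = pick_device()
print("Running on:", device_name)
```

If this prints "cpu", GPU mode is most likely not enabled in the notebook environment.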
Please note that training the model to a good standard is expected to take between 5 and 12 hours on a GPU.