This article was produced as part of the final project for Harvard’s AC215 Fall 2023 course.
Authors: Amelia Li, Rebecca Qiu, Peter Wu
Project GitHub Repo — Link
Video — Link
We extend our deepest gratitude to our project mentor Connor Capitolo for his steadfast dedication and support, which were instrumental throughout the project. Our appreciation also goes out to Pavlos Protopapas and Shivas Jayaram for delivering an inspiring and comprehensive course, and we are thankful to the entire course staff whose efforts were vital in bringing this project to fruition.
Introduction

Welcome to the world of PlatePals, an innovative adventure into the realm of nutritional technology. In an age where our health and dietary habits are increasingly becoming focal points of our daily lives, PlatePals emerges as a groundbreaking application designed to bridge the gap between technology and nutrition. It combines the power of artificial intelligence (AI) with the simplicity of a user-friendly interface, offering a novel approach to dietary management and nutritional awareness.
At the heart of PlatePals lies a simple yet powerful concept: empowering individuals with instant, AI-driven insights into the food they eat. Imagine having the ability to snap a picture of your meal and receiving in-depth nutritional information and personalized dietary advice in return.
As we go through the features and capabilities of PlatePals in this blog, we invite readers to engage with this innovative tool. PlatePals is not just for health enthusiasts or those with a keen interest in dietetics; it is a tool for anyone who wants a deeper understanding of their dietary habits. While we won’t delve into intricate technical details in this article, curious readers can explore our codebase on our GitHub repository. Through its advanced functionalities and user-centric design, PlatePals is set to revolutionize the field of nutritional technology, offering a new perspective on how we view and manage our dietary intake.
Our application offers a streamlined and engaging user experience, beginning with the simple act of uploading a food image. Here’s how it unfolds:

Goals
The inception of PlatePals is grounded in a vision to transform the landscape of dietary management and nutritional awareness through the power of machine learning. Our project targets several ambitious goals that combine technological innovation with practical applications in health and nutrition.
1. Advanced Image Recognition and Nutritional Insight: At the core of PlatePals is the objective to develop a sophisticated machine learning application capable of accurately identifying various types of food in user-uploaded images. Utilizing TensorFlow’s Food-101 dataset, we aim to process these images with high accuracy and correlate them with relevant nutritional information, thereby providing users with immediate and valuable dietary insights.
2. Personalized Dietary Recommendations: Beyond mere identification, PlatePals aspires to personalize the user experience. By analyzing the food items recognized in the images, our application is designed to offer tailored dietary recommendations, aligning with individual user preferences, dietary specifications, or health goals.
3. Interactive Dietary Chatbot Development: A key feature and goal of PlatePals is the integration of an advanced chatbot, designed to interact with users conversationally. This chatbot is not just a passive information provider but a dynamic tool that enhances user engagement. Its capabilities include Food Item Clarification, Customized User Interactions, and Tailored Dietary Recommendations.
Architecture
Our design documents, which outline the application’s architecture, user interface, and code organization principles, are shown in the following images.
Here is our Solution Architecture:

Here is our Technical Architecture:

Data
In the development of PlatePals, our primary data resource is TensorFlow’s Food-101 dataset, which comprises 101,000 annotated food images spanning 101 categories. This comprehensive dataset, amounting to 4.65GB in size and securely hosted in a private Google Cloud Storage (GCS) bucket, undergoes meticulous preprocessing.

We have implemented a containerized process that extracts data from the Food-101 dataset, applies a 75/25 train-test split, and uploads the resulting files to our remote GCS bucket in zip format. During preprocessing, image dimensions are standardized to 224x224 pixels, and we apply transformations such as horizontal flips, rotations, and zooms so the model generalizes better, ensuring the images are well prepared for the training that follows.
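To make this concrete, below is a minimal sketch of what the resizing and augmentation step could look like in TensorFlow. The exact augmentation parameters shown here are illustrative assumptions rather than our production values.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)

# Augmentation pipeline; the factor values here are illustrative, not our exact settings.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

def preprocess(image, label, training=True):
    """Standardize an image to 224x224 and optionally apply augmentations."""
    image = tf.image.resize(image, IMG_SIZE)
    if training:
        image = augment(image, training=True)
    return image, label
```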
Model
Now let’s talk about the heart of our project, the EfficientNetV2B0 model.
EfficientNetV2B0 is an evolution of the original EfficientNet architecture, designed for both improved accuracy and computational efficiency in image classification tasks. According to the EfficientNetV2 paper (Tan & Le, 2021), EfficientNetV2 models train faster while being up to 6.8x smaller than previous state-of-the-art models. These enhancements make EfficientNetV2B0 an attractive choice for our project: it lets us pursue state-of-the-art accuracy while maintaining computational efficiency, a key factor given the resource constraints often present in real-world machine learning projects.
Our model
- We use a pre-trained version of the EfficientNetV2B0 model from TensorFlow as our base model.
- We add a few layers on top of this base model (see the code sketch after this list):
- Global Average Pooling to reduce spatial dimensions.
- A dense layer with 101 units (corresponding to the 101 food categories in our dataset).
- An activation layer using softmax to get probability distributions over classes.
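Putting the pieces together, here is a minimal Keras sketch of the architecture described above; variable names are ours, and any settings beyond those listed are assumptions:

```python
import tensorflow as tf

# Pre-trained EfficientNetV2B0 backbone without its ImageNet classification head.
base_model = tf.keras.applications.EfficientNetV2B0(include_top=False, weights="imagenet")
base_model.trainable = False  # frozen during the feature-extraction phase

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)      # reduce spatial dimensions
x = tf.keras.layers.Dense(101)(x)                    # one unit per Food-101 category
outputs = tf.keras.layers.Activation("softmax")(x)   # probability distribution over classes
model = tf.keras.Model(inputs, outputs)
```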
Training
Our model training process consists of two phases, sketched in code after this list:
- Feature extraction
- In this first phase, we froze the layers of the base model (EfficientNetV2B0) to perform feature extraction.
- We compile the model with suitable loss and optimizer settings.
- The model is trained for 5 epochs using the training data.
- Fine-tuning
- In this second phase, we unfreeze some of the layers of the base model for fine-tuning.
- We compile the model again with a lower learning rate.
- Finally, the model is fine-tuned for 100 epochs, incorporating early stopping and model checkpointing mechanisms for optimization.
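Continuing the model sketch above, the two phases could look like the following. The optimizer choice, learning rates, and callback settings are illustrative assumptions, and train_ds and val_ds stand in for our training and validation datasets:

```python
import tensorflow as tf

# Phase 1: feature extraction with the frozen base model.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=5)

# Phase 2: unfreeze the base model and fine-tune at a lower learning rate.
base_model.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True),
]
model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)
```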
Distillation
We attempted model distillation, an optimization technique that transfers knowledge from a larger, more complex model (the teacher) to a smaller, more efficient one (the student) while preserving most of the teacher’s predictive capability. However, we decided not to move forward with this approach due to its poor performance, as demonstrated in the next section.
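For reference, here is a simplified sketch of a distillation training step like the one we experimented with. It assumes the teacher and student both output raw logits, and the temperature and alpha values are typical defaults rather than our exact settings.

```python
import tensorflow as tf

temperature = 3.0  # softens the teacher's probability distribution
alpha = 0.1        # weight on the hard-label loss vs. the distillation loss

kld = tf.keras.losses.KLDivergence()
scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

def distill_step(images, labels, teacher, student):
    """One gradient step that blends ground-truth and teacher-derived losses."""
    teacher_logits = teacher(images, training=False)
    with tf.GradientTape() as tape:
        student_logits = student(images, training=True)
        hard_loss = scce(labels, student_logits)
        soft_loss = kld(
            tf.nn.softmax(teacher_logits / temperature),
            tf.nn.softmax(student_logits / temperature),
        ) * temperature**2
        loss = alpha * hard_loss + (1.0 - alpha) * soft_loss
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss
```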
Results
Our final experiment compared three models:
- Fine-tuned EfficientNetV2B0 model (red)
- EfficientNetV2B0 model (blue)
- Distilled EfficientNetV2B0 model (green)
The accompanying chart outlines their training and validation performance. From the data, the fine-tuned EfficientNetV2B0 model (red) clearly leads in performance. Based on these findings and our dedication to optimal accuracy, we will continue our development with the fine-tuned EfficientNetV2B0 model rather than the distilled variant.

Backend API
For the backend of our application, we built an API service using FastAPI that exposes the model’s functionality to the frontend, connecting our classifier to the user interface for seamless integration.
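As a rough illustration, the prediction endpoint might look like the sketch below; the model path and label list are placeholders, not our actual configuration.

```python
import numpy as np
import tensorflow as tf
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
model = tf.keras.models.load_model("best_model.h5")  # placeholder path
# Placeholder labels; the real service maps indices to the 101 Food-101 names.
class_names = [f"class_{i}" for i in range(101)]

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    # Decode the uploaded image and resize it to the model's 224x224 input size.
    contents = await file.read()
    image = tf.io.decode_image(contents, channels=3, expand_animations=False)
    image = tf.image.resize(image, (224, 224))
    probs = model.predict(tf.expand_dims(image, axis=0))[0]
    top = int(np.argmax(probs))
    return {"label": class_names[top], "confidence": float(probs[top])}
```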

Frontend
In our pursuit of enhancing user experience, we’ve developed an intuitive React application that leverages the power of Convolutional Neural Networks (CNNs) from our backend to identify food items in images. This application is designed for simplicity: users can snap a photo of their meal and upload it. The app then seamlessly communicates with the backend API, which analyzes the image and returns a classification label, identifying the type of food.
We’ve integrated an AI Chatbot, powered by OpenAI, to engage users further. Depending on the food identified by the model, the chatbot will provide tailored suggestions for prompts. For instance, if the app recognizes a hamburger, the chatbot might suggest asking, ‘Can you provide a detailed nutritional analysis of hamburgers, including its calorie content and key nutrients?’
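To give a flavor of this integration, below is a hedged sketch of how a backend could ask OpenAI’s chat API for a suggested prompt given the predicted label; the model name and prompt wording are illustrative, not necessarily what PlatePals uses.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_prompt(food_label: str) -> str:
    """Return a suggested nutrition question about the identified food."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[
            {"role": "system", "content": "You are a friendly nutrition assistant."},
            {"role": "user",
             "content": f"Suggest one question a user could ask about the "
                        f"nutritional content of {food_label}."},
        ],
    )
    return response.choices[0].message.content
```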
Below are some screenshots of our application, showcasing its user-friendly interface and some of its features.



Deployment
We’ve launched our frontend and backend on a Kubernetes cluster, enhancing our system’s reliability through efficient load balancing and failover mechanisms. To streamline the creation and updating of this cluster, we’ve employed Ansible scripts. Using Ansible not only maintains a clear record of our application’s infrastructure on GitHub but also significantly simplifies and automates our deployment processes. Our deployment strategy is encapsulated within a container responsible for building and deploying the application’s containers, ensuring a smooth and consistent deployment process. We’ve used Google Cloud Platform (GCP) for hosting, with all Docker images stored in Google Container Registry (GCR). This setup provides a robust, scalable, and efficient infrastructure, well suited to the evolving demands of our application.
Future Work
As PlatePals continues its journey of innovation and enhancement, we have identified three pivotal areas for development to further elevate the user experience and technological sophistication of our platform:
1. Dataset Diversification: Recognizing the diverse culinary preferences of our global user base, we are committed to expanding our dataset to encompass a more comprehensive range of global cuisines. This expansion is not just about quantity; it’s about quality and diversity. By integrating datasets that represent a vast array of cultural and regional food varieties, we aim to make PlatePals universally applicable and more accurate in identifying a broader spectrum of dishes. This inclusivity will not only improve the accuracy of our image recognition capabilities but also make our app more relevant and personalized for users from different cultural backgrounds.
2. Advanced AI Techniques: We would like to further integrate cutting-edge AI methodologies to advance our image classification and nutritional analysis features. A key focus is the implementation of reinforcement learning and semi-supervised learning models. These advanced techniques will enable our models to learn more effectively with less labeled data, adapting and improving through user interactions. The use of reinforcement learning, in particular, will allow our system to dynamically adjust and optimize its performance based on real-time feedback, leading to more accurate and user-specific results. This will significantly enhance the precision of our food identification and nutritional analysis, providing users with more reliable and detailed information.
3. Chatbot Enhancement: The third pillar of our development strategy revolves around upgrading the natural language processing (NLP) capabilities of our AI-powered chatbot. We envision a chatbot that not only understands and responds accurately to user queries but also engages in more natural, intuitive, and human-like conversations. By harnessing advanced NLP techniques and algorithms, we aim to create a chatbot that can comprehend a wider range of user inputs, discern context, and provide more personalized and relevant responses. This enhancement will make the interaction with the chatbot more fluid and enjoyable, transforming it into a truly interactive and helpful companion for our users in their culinary explorations and nutritional inquiries.