Hey there!
In this codelab, you'll build a Python Flask application that uses Gemini Pro Vision's multimodal capabilities for tasks such as image classification, object detection, and text understanding. By the end, you'll have a practical understanding of how to integrate Gemini Pro Vision into your backend applications.
To follow this codelab, you'll need the following:
An API key
pip install Flask
pip install marko google-generativeai
import os
from flask import Flask, request, Response, g, render_template, jsonify
import marko
import google.generativeai as genai
genai.configure(api_key=os.getenv("API_KEY"))
app = Flask(__name__)
app.debug = True
config = {
    'temperature': 0,
    'top_k': 20,
    'top_p': 0.9,
    'max_output_tokens': 500
}
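The call to os.getenv("API_KEY") above returns None when the variable is not set, so a missing key only shows up later as a failed request. Here is a minimal sketch of a fail-fast check; require_api_key is a hypothetical helper for illustration, not part of Flask or the Gemini SDK.

```python
import os


def require_api_key(env=os.environ, name="API_KEY"):
    """Return the API key from `env`, or raise a clear error if it is missing."""
    # `env` is any mapping; it defaults to the process environment.
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before starting the app."
        )
    return key
```

If you want to use it, one option is to pass its result to the configure call: genai.configure(api_key=require_api_key()).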
We also need to configure the safety settings for the model output:
safety_settings = [
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
]
model = genai.GenerativeModel(model_name="gemini-pro-vision",
                              generation_config=config,
                              safety_settings=safety_settings)
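With these thresholds in place, the model may decline to answer for some images. The sketch below assumes that the SDK raises ValueError when response.text is read on a response without a usable text part; safe_text and its fallback message are hypothetical, so verify the behavior against the SDK version you have installed.

```python
def safe_text(response, fallback="Sorry, I can't comment on that image."):
    """Return response.text, or `fallback` if the response has no usable text."""
    try:
        return response.text
    except ValueError:
        # Assumption: reading `.text` raises ValueError when the response
        # carries no text part, for example because it was blocked.
        return fallback
```

In a request handler, safe_text(response) could then stand in for a direct read of response.text.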
@app.route('/', methods=['GET'])
def hello_world():
    return render_template("chat.html")

@app.route('/chat', methods=['POST'])
def chat():
    # Reject requests without a file, using a 400 status so the client can detect the error
    if 'user_image' not in request.files:
        return jsonify({"error": "No file part"}), 400
    file = request.files['user_image']
    if file.filename == '':
        return jsonify({"error": "No selected file"}), 400
    if file:
        image_data = file.read()
        image_parts = [
            {
                "mime_type": file.content_type,
                "data": image_data
            },
        ]
        prompt_parts = [
            "You are Sheldon Cooper. User will upload an image. Based on the image, you have to come up with a Sheldon Cooper style fun fact. Also give a funny, sarcastic note about the image. \n\nUser's image:\n\n",
            image_parts[0],
            "\n\nFun fact:\n",
        ]
        response = model.generate_content(prompt_parts)
        # The model replies in Markdown; convert it to HTML for the chat UI
        return jsonify({
            "response": marko.convert(response.text)
        })
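The handler above forwards whatever the client uploads straight to the model. Here is a minimal sketch of a check we could run first. The allow-list and size limit are illustrative assumptions rather than limits documented for Gemini, and validate_upload is a hypothetical helper.

```python
# Assumed allow-list and size cap; adjust both to what your app should accept.
ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "image/webp"}
MAX_IMAGE_BYTES = 4 * 1024 * 1024


def validate_upload(content_type, data):
    """Return an error message for a bad upload, or None if it looks fine."""
    if content_type not in ALLOWED_MIME_TYPES:
        return f"Unsupported image type: {content_type}"
    if not data:
        return "Uploaded file is empty"
    if len(data) > MAX_IMAGE_BYTES:
        return "Image is too large"
    return None
```

Inside chat(), this could be called as validate_upload(file.content_type, image_data), returning a JSON error to the client whenever the result is not None.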
Finally, we'll add the entrypoint for the file which runs the Flask development server.
if __name__ == '__main__':
    app.run(debug=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
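Once the server is running, the /chat endpoint can also be exercised without a browser. The sketch below uses only the Python standard library to build the multipart/form-data body by hand and post it. build_multipart and post_image are hypothetical helper names, and the URL assumes the default port 8080 used above.

```python
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(field, filename, content_type, data):
    """Encode one file as a multipart/form-data body; return (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_image(path, url="http://localhost:8080/chat"):
    """POST the image at `path` as the 'user_image' field; return the raw reply bytes."""
    with open(path, "rb") as fh:
        data = fh.read()
    # Guess the MIME type from the file extension, with a generic fallback.
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    body, header = build_multipart(
        "user_image", os.path.basename(path), content_type, data
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server running, post_image("cat.png") would return the JSON reply as bytes; the file name here is only a placeholder.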
With the backend done, we're free to build the UI for the API however we like.
Here's some sample HTML/JS for a quick chatbot UI. Save it as templates/chat.html, which is where Flask's render_template looks for it. The script uses jQuery, so the page must also load that library.
<div>
    <div id="chat-box" class="bg-light p-3 mb-3 rounded">
        <blockquote style="border-left: 4px solid #43a047;">Hi, I am Sheldon Cooper. Upload an image and I will tell you a fun fact.</blockquote>
    </div>
    <progress id="progress-bar" style="display: none"></progress>
    <!-- Updated form to include file upload -->
    <form id="chat-form">
        <div class="input-group mb-3">
            <input id="image-input" type="file" class="form-control">
        </div>
        <div class="input-group">
            <button type="submit" id="send-button" class="btn btn-primary">Upload</button>
        </div>
    </form>
</div>
Then, let's add some JavaScript to the page for interactivity:
<script>
function appendImageToChat(file) {
    var reader = new FileReader();
    reader.onloadend = function () {
        var img = $('<img>').attr('src', reader.result).css({'max-width': '100%', 'height': 'auto'});
        $('#chat-box').append($('<blockquote>').css({'border-left': '4px solid dodgerblue'}).append(img));
    }
    if (file) {
        reader.readAsDataURL(file);
    }
}
$(function() {
    $('#chat-form').submit(function(e) {
        e.preventDefault(); // Prevent the default form submission
        var formData = new FormData(this);
        var fileInput = $('#image-input')[0].files[0];
        formData.append('user_image', fileInput);
        if (fileInput) {
            $('#chat-box').append('<blockquote style="border-left: 4px solid #1288ff;">User: </blockquote>');
            appendImageToChat(fileInput);
            $('#image-input').val('');
            $('#progress-bar').show();
            // Use AJAX to send the formData to the server
            $.ajax({
                url: '/chat',
                type: 'POST',
                data: formData,
                processData: false, // Prevent jQuery from converting the data into a query string
                contentType: false, // Stop jQuery from setting Content-Type so the browser sends multipart/form-data with its boundary
                success: function(data) {
                    $('#chat-box').append('<blockquote style="border-left: 4px solid #43a047;">Sheldon: ' + data.response + '</blockquote>');
                    $('#progress-bar').hide();
                },
                error: function() {
                    $('#chat-box').append('<blockquote style="border-left: 4px solid red;">Sheldon: Sorry, I am not able to respond at the moment.</blockquote>');
                    $('#progress-bar').hide();
                }
            });
        }
    });
});
</script>
To run the app, set API_KEY as an environment variable in the terminal:
export API_KEY=your_api_key
Then start the server:
python main.py
Open http://localhost:8080 in your browser. You should see your chatbot interface.
Congratulations! You've just built and run an application powered by Google's Gemini AI! The bot acts like Sheldon Cooper and offers fun facts about the images users upload.