Enhancing scene metadata

Axis products produce scene metadata that includes object snapshots: cropped images of the objects detected in a scene. This example shows how to consume these snapshots from the com.axis.scene.object_track.v1 producer over MQTT, and how to use them as input to AI models to build richer descriptions on top of the existing metadata.

Result

As objects move through the scene, the camera tracks them in the feed and selects the best image of each one. When a tracked object leaves the scene, the product publishes its consolidated metadata, which includes the single best cropped snapshot captured while the object was visible.

The example below shows one such snapshot and the description an AI model generated from it:

Snapshot	AI response
	`A person wearing a blue jacket and a white helmet is riding a motorcycle.`

How it works

1. Prerequisites

Access to an MQTT broker
A device that supports AXIS Scene Metadata
An on-device MQTT client that is set up and connected to the MQTT broker, through the device's web interface or the MQTT client API

2. Connect and subscribe

The client connects to the same MQTT broker that the device publishes to and subscribes to the com.axis.scene.object_track.v1#1 topic. Connection details are configured as constants.

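import paho.mqtt.client as mqtt

# Connection details, configured as constants
MQTT_BROKER_HOST = "<MQTT broker IP>"
MQTT_BROKER_PORT = 1883  # default unencrypted MQTT port; replace with your broker's port
MQTT_TOPIC = "<MQTT topic>"

# With paho-mqtt 2.x, pass mqtt.CallbackAPIVersion.VERSION2 as the first argument
client = mqtt.Client()
client.on_message = on_message  # callback defined in step 3
client.connect(MQTT_BROKER_HOST, MQTT_BROKER_PORT)
client.subscribe(MQTT_TOPIC)
client.loop_forever()  # blocks and dispatches incoming messages to on_message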

3. Extract and save the snapshot

Each message payload is a JSON object. When a message contains a snapshot, the base64-encoded image is stored under the data key of the image object.

import base64
import json
from io import BytesIO

from PIL import Image  # provided by the Pillow package


def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)

    # Skip payloads that do not contain an image
    if "image" not in payload:
        return

    # Decode the base64 string and save the image locally (optional)
    base64_string = payload["image"]["data"]
    image_data = base64.b64decode(base64_string)
    image = Image.open(BytesIO(image_data))
    image.save("snapshot.jpg")
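
The extraction step can also be written as a small helper that can be tested without a broker. The sketch below is a minimal, standard-library-only variant, assuming the payload shape used above (an image object with a base64-encoded data key). The helper name extract_snapshot and the sample payload are hypothetical and not part of the Axis API.

```python
import base64
import binascii
import json
from typing import Optional, Union


def extract_snapshot(raw_payload: Union[bytes, str]) -> Optional[bytes]:
    """Return the decoded snapshot bytes, or None if the payload has no usable image.

    Hypothetical helper: assumes the image sits at payload["image"]["data"]
    as a base64 string, as in the on_message callback above.
    """
    try:
        payload = json.loads(raw_payload)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None  # not valid JSON

    if not isinstance(payload, dict):
        return None
    image = payload.get("image")
    if not isinstance(image, dict) or not isinstance(image.get("data"), str):
        return None  # message without a snapshot

    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(image["data"], validate=True)
    except binascii.Error:
        return None  # corrupt base64 data


# Synthetic messages for illustration only; real payloads carry more fields.
sample = json.dumps(
    {"image": {"data": base64.b64encode(b"fake-jpeg-bytes").decode("ascii")}}
)
snapshot = extract_snapshot(sample)
no_snapshot = extract_snapshot(json.dumps({"classes": []}))
```

Returning None for every unusable message keeps the callback short: on_message can call the helper once and return early when there is nothing to process.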

4. Describe the scene with AI

Send the base64-encoded image to a model, which returns a text description of what the snapshot shows.

    description = get_ai_response(base64_string)
    print("AI response:", description)

Output
AI response: A person wearing a blue jacket and a white helmet is riding a motorcycle.

5. Tailor the analysis with metadata

You can build on the existing metadata to make the analysis more targeted. For example, read the object's class from the payload and select a prompt that matches its type, so the AI model receives more relevant instructions for each object.

    PROMPTS = {
        "human": "Describe the human in the image. (Gender, age, clothing, etc.)",
        "bike": "Describe the bike in the image. (Type, color, etc.)",
        "car": "Describe the car in the image. (Brand, model, color, etc.)",
    }

    # Use the type of the first class in the payload, lowercased to match the PROMPTS keys
    object_type = next((elem.get("type", "").lower() for elem in payload.get("classes", [])), None)
    if object_type in PROMPTS:
        description = get_ai_response(base64_string, ai_prompt=PROMPTS[object_type])
        print("AI response:", description)
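
The prompt lookup can be pulled out into a function so it can be checked against sample payloads. The following is a sketch under the same assumption as the snippet above, namely that the payload carries a classes list whose elements have a type key. Unlike the snippet above, which only looks at the first class, this variant walks all classes and returns the first one that has a matching prompt. The name select_prompt and the sample payloads are hypothetical.

```python
from typing import Optional

# Prompts keyed by lowercase object class, as in the snippet above.
PROMPTS = {
    "human": "Describe the human in the image. (Gender, age, clothing, etc.)",
    "bike": "Describe the bike in the image. (Type, color, etc.)",
    "car": "Describe the car in the image. (Brand, model, color, etc.)",
}


def select_prompt(payload: dict) -> Optional[str]:
    """Return the prompt for the first class in the payload that has one, else None."""
    for elem in payload.get("classes", []):
        prompt = PROMPTS.get(str(elem.get("type", "")).lower())
        if prompt is not None:
            return prompt
    return None


# Hypothetical payload fragments for illustration only.
car_prompt = select_prompt({"classes": [{"type": "Car"}]})
unknown_prompt = select_prompt({"classes": [{"type": "Animal"}]})
missing_prompt = select_prompt({})
```

When select_prompt returns None, the caller can either skip the object or fall back to a generic prompt, depending on how much model traffic is acceptable.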

About the AI model

get_ai_response is a placeholder for your own model call. It takes the base64-encoded image and, optionally, a prompt passed as ai_prompt, and returns a short text description. You can use any image-to-text model, such as a small local vision model or a hosted service.
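
One possible shape for the placeholder is sketched below. It assumes that the actual model call is supplied as a function taking the base64 image and a prompt. The backend parameter, the ModelBackend type, DEFAULT_PROMPT, and fake_backend are all hypothetical names introduced for illustration. No real model or service API is called, so you would swap fake_backend for a function that talks to your chosen model.

```python
from typing import Callable, Optional

# Assumed default prompt; not taken from the original example.
DEFAULT_PROMPT = "Describe the image in one short sentence."

# Hypothetical backend signature: (base64_image, prompt) -> description text.
ModelBackend = Callable[[str, str], str]


def fake_backend(base64_image: str, prompt: str) -> str:
    """Stand-in backend that makes no model call; it only reports what it received."""
    return f"[stub] prompt={prompt!r}, image_chars={len(base64_image)}"


def get_ai_response(
    base64_image: str,
    ai_prompt: Optional[str] = None,
    backend: ModelBackend = fake_backend,
) -> str:
    """Pass the image and prompt to the configured backend and return its text."""
    if not base64_image:
        raise ValueError("base64_image must not be empty")
    prompt = ai_prompt or DEFAULT_PROMPT
    return backend(base64_image, prompt).strip()


# Usage with the stand-in backend and a dummy base64 string.
default_reply = get_ai_response("QUJD")
custom_reply = get_ai_response("QUJD", ai_prompt="Describe the car in the image.")
```

Keeping the model call behind a single function argument means the MQTT callback stays unchanged when you move between a local model and a hosted one.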
