Frequently asked questions about rich image descriptions

Note: Rich image descriptions are available with Windows Narrator and in preview through the Windows Insider Program with Click to Do.

What are rich image descriptions?

Rich image descriptions provide detailed descriptions of visual content such as images, charts, graphs, diagrams, unlabeled buttons, and more. Rich image descriptions enable users who are blind or have low vision to understand visual content through detailed context. This feature is currently available on Snapdragon-powered Copilot+ PCs in the Windows Insider Program. Other Windows devices will continue to use the standard image description experience, which relies solely on online services.

How do rich image descriptions work?

The rich image descriptions feature uses AI models to provide detailed textual descriptions of images, charts, and graphs.

For example, the description generated for an image of a plant nursery might read:

The image depicts a large organized arrangement of small green leafy plants which are likely sprouts or seedlings arranged in a neat dense grid pattern. Each plant is contained within a small shallow black container suggesting a nursery or a planting setup. The plants are evenly spaced creating a uniform and orderly appearance which may symbolize growth organization or a collection. The black containers provide a stark contrast to the green sprouts highlighting the focus on the plants.

To generate rich image descriptions in Narrator:

When Narrator is turned on, you can press Narrator key+Ctrl+D to get a description of the image or item you are focused on.

Note: To learn more about using Narrator, go to Complete guide to Narrator.

To generate rich image descriptions in Click to Do:

To enter Click to Do, use Windows key+mouse click or press Windows key+Q. Then, with an image or item in focus, select the Describe image action. You can also enter Click to Do through a right swipe on touch-enabled PCs, or from the Snipping Tool menu if Snipping Tool is installed.

Note: To learn more about Click to Do, go to Click to Do: do more with what’s on your screen.

What can rich image descriptions be used for?

Rich image descriptions are designed to provide text descriptions of visual content for individuals who are blind or have low vision. The descriptions are intended to improve your understanding of images, charts, and graphs, and support accessibility. You can regenerate the image description and copy the description for future reference.

How was the rich image descriptions feature evaluated? What metrics are used to measure performance?

To assess the quality of the generated descriptions, a data set covering various types of images was created. These images included natural photographs, charts, graphs, screenshots, and app user interfaces. The generated descriptions were evaluated for accuracy, completeness, relevance, and usefulness. Several evaluation methods, including human expert judgments and LLM-assisted scoring, were used to find areas where the quality of generated descriptions could be improved.

What are the limitations of rich image descriptions, and how can users minimize the impact of these limitations when using the system?

Microsoft is committed to creating responsible AI by design. Our work is guided by a core set of principles: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. This feature may provide inaccurate image descriptions, inaccurate data from charts or graphs, or inaccurate emotional inferences. Based on the generated description, this may lead to incorrect assumptions about an image or about the intent of visual content. We continue to work on the underlying models to improve the quality of image descriptions. You can submit feedback using any of the methods discussed in How do I provide feedback on generated rich image descriptions?

This feature should not be used to:

Generate descriptions for medical or health-related images that could be misinterpreted as medical advice. Incorrect descriptions could lead to misinformation and potentially harmful decisions by users.
Generate descriptions for images in legal or financial documents where accuracy is critical. Misinterpretation of such images could lead to legal disputes or financial losses.
Generate descriptions for images containing cultural or religious symbols without proper context. Misinterpretation could lead to cultural insensitivity or offense.
Generate descriptions for images containing maps, flags, or globes. Misinterpretation of these images could lead to misinformation and involvement in international affairs.

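What operational factors and settings allow for effective and responsible use of rich image descriptions?

To generate rich image descriptions in Narrator: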

To get an image description when Narrator is on, press Narrator key+Ctrl+D while focusing on visual content. To turn off image descriptions in Narrator, go to Settings > Accessibility > Narrator > Get image descriptions, page titles, and popular links and select the toggle switch.

Note: To learn more about using Narrator, go to Complete guide to Narrator.

To generate rich image descriptions in Click to Do:

Use Windows key+mouse click or press Windows key+Q to enter Click to Do, select or focus on an image, then select the Describe image action to start generating the image description.

Note: To learn more about Click to Do, go to Click to Do: do more with what’s on your screen.

How do I provide feedback on generated rich image descriptions?

There could be inaccuracies in the descriptions provided by this feature. To improve the quality of descriptions, you can provide feedback by:

Selecting the thumbs-up or thumbs-down icon on an image description in the Narrator user interface.
Responding to occasional prompts from Windows asking you to rate or provide written feedback about the product or services you use.
Opening Feedback Hub to find similar feedback to upvote or give new feedback by filling out the form.

Microsoft’s commitment to responsible AI and Privacy

Microsoft has been working to advance AI responsibly since 2017, when we first defined our AI principles and later operationalized our approach through our Responsible AI Standard. Privacy and security are core principles as we develop and deploy AI systems. We work to help our customers use our AI products responsibly by sharing our learnings and building trust-based partnerships. For more about our responsible AI efforts, the principles that guide us, and the tools and capabilities we've created to ensure that we develop AI technology responsibly, see Responsible AI.

The rich image descriptions feature is designed to improve accessibility for blind and low-vision users and is not intended for a wider audience. The AI models for this feature use contextual cues in the entire image, including people or entities in the background, which is how the models can still associate the image with an individual, or describe emotions. Rich image descriptions allow for emotional inferences, but do not use biometric data. Any processing that returns results that identify an individual or infer an individual's emotion is not the result of processing of the face, such as facial recognition or the generation and comparison of facial templates. For example, if an image contains a photo of a popular athlete wearing their team's jersey and their specific number, the models may still return a result that might identify the individual based on those contextual cues.

This feature should not be used to infer or deduce the emotions of natural persons in the workplace or in educational institutions (e.g., employees or students). Rich image descriptions can provide detailed text descriptions related to perceived emotions of people in images. The processes underlying human emotion are complex, and there are cultural, geographical, and individual differences that influence how we may perceive, experience, and express emotions. Responses related to the emotions of people in images are based on how they appear and may not necessarily accurately indicate the internal state of individual people.

To provide clarity on how each AI feature works, it’s important for you to understand its capabilities and limitations. You should understand the choices available to you in an AI feature and the responsibility associated with those choices.

Click to Do suggests actions that you can take, and you can choose the apps that will be the provider (if applicable) for those actions. Once you choose the action and provider for the action, the results from that action are the responsibility of the provider. For example, from Click to Do you can choose the action Remove background with Paint, which means you’ve chosen Paint as the provider for the action. Once you have selected the action from the Click to Do context menu, it launches the Paint app, and the selected image is processed by Paint.

Click to Do’s models have undergone fairness assessments, alongside comprehensive responsible AI, security and privacy assessments, to make sure the technology is effective and equitable while adhering to Microsoft’s Responsible AI best practices. 

For more information about privacy for Recall, see Privacy and control over your Recall experience.

Published: February 11, 2025

Last updated: June 23, 2025


