
Initial tests suggest that ChatGPT Health’s analysis of your fitness data could cause unnecessary panic

Earlier this month, OpenAI introduced a new health-focused section within ChatGPT, pitching it as a safer way for users to ask questions about sensitive topics such as medical data, illnesses and fitness. One of the key features highlighted at launch was ChatGPT Health’s ability to analyze data from apps like Apple Health, MyFitnessPal and Peloton to uncover long-term trends and deliver personalized results. However, a new report suggests that OpenAI may have overstated how effective the feature is at extracting reliable insights from this data.

According to initial tests conducted by Geoffrey A. Fowler of The Washington Post, the chatbot gave the reporter’s heart health a grade of “F” when ChatGPT Health was given access to a decade’s worth of Apple Health data. However, after reviewing the assessment, a cardiologist called it “unfounded” and said the reporter’s actual risk of heart disease was extremely low.

Dr. Eric Topol of the Scripps Research Institute was blunt about ChatGPT Health’s capabilities, saying the tool was all too willing to hand down medical verdicts while relying too heavily on unreliable smartwatch metrics. ChatGPT’s evaluation leaned heavily on the Apple Watch’s estimates of VO2 max and heart rate variability, both of which have known limitations and can vary significantly depending on the device and software build. Independent research has found that Apple Watch VO2 max estimates often skew low, yet ChatGPT treated them as clear indicators of poor health.

ChatGPT Health gave different grades for the same data

But the problems didn’t end there. When the reporter asked ChatGPT Health to repeat the same scoring exercise, the grade fluctuated between an “F” and a “B” across conversations, with the chatbot sometimes ignoring recent blood test results it had access to and occasionally forgetting basic details like the reporter’s age and gender. Anthropic’s Claude for Healthcare, which also debuted earlier this month, showed similar inconsistencies, awarding grades that varied between a C and a B-minus.

Both OpenAI and Anthropic have emphasized that their tools are not intended to replace doctors, but only to provide general context. Still, both chatbots delivered confident, highly personalized cardiovascular health assessments. That combination of authority and inconsistency could needlessly alarm healthy users or falsely reassure unhealthy ones. While AI may eventually be able to glean valuable insights from long-term health data, early testing suggests that feeding years of fitness-tracking data into these tools currently creates more confusion than clarity.
