A recent study investigated ChatGPT as a resource for patients seeking online medical information.
As technology and artificial intelligence (AI) continue to evolve, so does their use in medicine. A recent study examined a widely accessible AI tool, Chat Generative Pre-Trained Transformer (ChatGPT), as a diagnostic tool and information source in clinical dermatology. The report focused on the latest version of ChatGPT, GPT-4, and its capability to analyze clinical images.1
Despite advancements in technology, the study found that the chatbot showed “significant limitations” in providing reliable and clinically useful responses to the images. Although recent studies have shown the model is capable of passing medical exams, including one from the American Board of Dermatology,2 the program provided study participants with responses “irrelevant to the condition, superficial or with potentially harmful inaccuracies.”
Background
It is common for patients to turn to the internet and social media for medical advice before scheduling an appointment. One study found that one of the most common unmet needs in online dermatology, as reported by patients, was a lack of telemedicine chat opportunities.3 Because ChatGPT is an easily accessible online chat tool with free and paid options, the researchers behind this study chose to investigate its potential as a resource for clinical dermatology, specifically through its photo submission function.
Study Methods
Two senior consultant dermatologists selected 15 clinical images from the Danish web atlas, Danderm,4 depicting several common and rare skin conditions. The images included porphyria cutanea tarda, palmoplantar pustulosis, hidradenitis suppurativa, perioral dermatitis, rosacea, alopecia areata, bullous pemphigoid, erythema multiforme, chronic hand eczema, poikiloderma of Civatte, atopic dermatitis, psoriasis vulgaris, mycosis fungoides (tumour stage), malignant melanoma and granuloma annulare. The images were then uploaded to ChatGPT with the prompt: ‘Please provide a description, a potential diagnosis, and treatment options for the following dermatological condition.’ All images and prompts were run on separate sessions, never combining different conditions.
The responses were then assessed by senior registrars in dermatology and consultant dermatologists to determine accuracy, relevance, and depth, each on a scale from 1 (worst) to 5 (best). The clinical images were also rated on a scale from 1 (worst) to 10 (best). Categorical variables were presented as frequencies and percentages; continuous variables as medians and interquartile ranges (IQR).
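As a quick illustration of how such ordinal ratings are typically summarized, the sketch below computes a median and interquartile range in Python's standard library. The rating values are hypothetical, chosen only to show the calculation; they are not data from the study.

```python
from statistics import median, quantiles

# Hypothetical 1-5 ratings from several assessors for one condition
# (illustrative values only, not taken from the study).
ratings = [1, 2, 2, 3, 4, 2, 1, 3]

med = median(ratings)
# quantiles() with n=4 returns the three quartile cut points;
# the IQR is the span from the first to the third quartile.
q1, _, q3 = quantiles(ratings, n=4, method="inclusive")

print(f"median = {med}, IQR = {q1} to {q3}")
```

Note that different quartile conventions (here, the "inclusive" method) can shift the IQR bounds slightly for small samples, which is worth keeping in mind when comparing IQRs across reports.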
Findings
A total of 23 physicians participated in this study, the majority being consultant dermatologists (83%), with 79% being employed at a university clinic and 21% in dermatology private practice. The remainder of participants consisted of senior registrars (17%). Most of the respondents had over 5 years of clinical experience in dermatology (87%), with 11 respondents (48%) having over 10 years of training.
The clinical images were judged to illustrate their diseases well, with median scores per image ranging from 8 to 10 and an overall median of 10 (IQR: 9 to 10). The overall median rating of the ChatGPT-generated responses was 2 (IQR: 1 to 4). Median subratings were 2 (IQR: 1 to 4) for relevance, 3 (IQR: 2 to 4) for accuracy, and 2 (IQR: 1 to 3) for depth. Researchers found that the highest overall median ratings were observed for psoriasis vulgaris (IQR: 3 to 5), malignant melanoma (IQR: 3 to 5), pustulosis palmoplantaris (IQR: 3 to 4) and alopecia areata (IQR: 3 to 4), each with a median of 4. They noted the lowest ratings, all with a median of 1, were observed for hidradenitis suppurativa (IQR: 1 to 2), rosacea (IQR: 1 to 2), erythema multiforme (IQR: 1 to 2), poikiloderma of Civatte (IQR: 1 to 2), granuloma annulare (IQR: 1 to 2) and mycosis fungoides (tumour stage) (IQR: 1 to 1).
Participants were also given the option to provide free-text comments on the generated responses.
Conclusion
The technology received a low score (2 out of 5) for relevance, accuracy, and depth of the responses it generated to 15 illustrative images depicting various dermatologic diseases. Researchers noted that over half of the conditions (53%) received a median score of 2 or less, meaning the responses were irrelevant to the condition, superficial, or contained potentially harmful inaccuracies. Comments provided by participants were mostly negative, with some highlighting the risk of misdiagnosis and inappropriate treatment recommendations.
To the knowledge of the researchers, this is the first study investigating the use of ChatGPT in diagnosing and providing information on several dermatological conditions based on clinical images. In the field of dermatology, the researchers noted that few explorative studies have been published on large language models such as ChatGPT. They also voiced concern about the lack of governmental oversight of AI, which leaves a gap in overall validation. To remedy this, the researchers advocated for a collaborative and regulated approach to the future development of AI in healthcare.
References
1. Nielsen JPS, Grønhøj C, Skov L, et al. Usefulness of the large language model ChatGPT (GPT-4) as a diagnostic tool and information source in dermatology. JEADV Clin Pract. 2024;1-6. doi:10.1002/jvc2.459
2. Mirza FN, Lim RK, Yumeen S, et al. Performance of three large language models on dermatology board examinations. J Invest Dermatol. 2024;144(2):398-400. doi:10.1016/j.jid.2023.06.208
3. Gantenbein L, Navarini AA, Maul LV, et al. Internet and social media use in dermatology patients: Search behavior and impact on patient-physician relationship. Dermatol Ther. 2020;33(6):e14098. doi:10.1111/dth.14098
4. Veien NK. An atlas of clinical dermatology. https://danderm-pdv.is.kkh.dk/atlas/index.html