A recent study investigated ChatGPT as a resource for patients seeking online medical information.
As technology and artificial intelligence (AI) continue to evolve, so does their use in medicine. A recent study examined a widely accessible AI tool, Chat Generative Pre-Trained Transformer (ChatGPT), as a diagnostic tool and information source in clinical dermatology. The report focused on the latest version of ChatGPT, GPT-4, and its capability to analyze clinical images.1
Despite advancements in technology, the study found that the chatbot showed “significant limitations” in providing reliable and clinically useful responses to the images. Although recent studies have shown the model is capable of passing medical exams, including one from the American Board of Dermatology,2 the program provided study participants with responses “irrelevant to the condition, superficial or with potentially harmful inaccuracies.”
Background
It is common for patients to turn to the internet and social media for medical advice before scheduling an appointment. One study found that one of the most common unmet needs in online dermatology, as reported by patients, was a lack of telemedicine chat opportunities.3 Because ChatGPT is an easily accessible online chat tool with free and paid options, the researchers behind this study chose to investigate its potential as a resource for clinical dermatology, specifically through its photo submission function.
Study Methods
Two senior consultant dermatologists selected 15 clinical images from the Danish web atlas, Danderm,4 depicting several common and rare skin conditions. The images included porphyria cutanea tarda, palmoplantar pustulosis, hidradenitis suppurativa, perioral dermatitis, rosacea, alopecia areata, bullous pemphigoid, erythema multiforme, chronic hand eczema, poikiloderma of Civatte, atopic dermatitis, psoriasis vulgaris, mycosis fungoides (tumour stage), malignant melanoma and granuloma annulare. The images were then uploaded to ChatGPT with the prompt: ‘Please provide a description, a potential diagnosis, and treatment options for the following dermatological condition.’ All images and prompts were run on separate sessions, never combining different conditions.
The responses were then assessed by senior registrars in dermatology and consultant dermatologists to determine accuracy, relevance, and depth, each on a scale from 1 (worst) to 5 (best). The clinical images were also rated on a scale from 1 (worst) to 10 (best). Categorical variables were presented as frequencies and percentages; continuous variables as medians and interquartile ranges (IQR).
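As a quick illustration of how such ordinal ratings are typically summarized, the sketch below computes a median and interquartile range in Python's standard library. The rating values are hypothetical, chosen only to show the calculation; they are not data from the study.

```python
from statistics import median, quantiles

# Hypothetical 1-5 ratings from several assessors for one condition
# (illustrative values only, not taken from the study).
ratings = [1, 2, 2, 3, 4, 2, 1, 3]

med = median(ratings)
# quantiles() with n=4 returns the three quartile cut points;
# the IQR is the span from the first to the third quartile.
q1, _, q3 = quantiles(ratings, n=4, method="inclusive")

print(f"median = {med}, IQR = {q1} to {q3}")
```

Note that different quartile conventions (here, the "inclusive" method) can shift the IQR bounds slightly for small samples, which is worth keeping in mind when comparing IQRs across reports.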
Findings
A total of 23 physicians participated in this study, the majority being consultant dermatologists (83%), with 79% being employed at a university clinic and 21% in dermatology private practice. The remainder of participants consisted of senior registrars (17%). Most of the respondents had over 5 years of clinical experience in dermatology (87%), with 11 respondents (48%) having over 10 years of training.
The clinical images were judged to illustrate their diseases well, with median scores per image ranging from 8 to 10 and an overall median of 10 (IQR: 9 to 10). The overall median rating of the ChatGPT-generated responses was 2 (IQR: 1 to 4). Median subratings were 2 (IQR: 1 to 4) for relevance, 3 (IQR: 2 to 4) for accuracy, and 2 (IQR: 1 to 3) for depth. Researchers found that the highest overall median ratings were observed for psoriasis vulgaris (IQR: 3 to 5), malignant melanoma (IQR: 3 to 5), pustulosis palmoplantaris (IQR: 3 to 4) and alopecia areata (IQR: 3 to 4), each with a median of 4. They noted the lowest ratings, all with a median of 1, were observed for hidradenitis suppurativa (IQR: 1 to 2), rosacea (IQR: 1 to 2), erythema multiforme (IQR: 1 to 2), poikiloderma of Civatte (IQR: 1 to 2), granuloma annulare (IQR: 1 to 2) and mycosis fungoides (tumour stage) (IQR: 1 to 1).
Participants were also given the option to provide free-text comments on the generated responses.
Conclusion
The technology received a low score (2 out of 5) for relevance, accuracy, and depth of the responses it generated to 15 illustrative images depicting various dermatologic diseases. Researchers noted that over half of the conditions (53%) received a median score of 2 or less, meaning the responses were irrelevant to the condition, superficial, or contained potentially harmful inaccuracies. Comments provided by participants were mostly negative, with some highlighting the risk of misdiagnosis and inappropriate treatment recommendations.
To the knowledge of the researchers, this is the first study investigating the use of ChatGPT in diagnosing and providing information on several dermatological conditions based on clinical images. In the field of dermatology, the researchers noted that few explorative studies have been published on large language models such as ChatGPT. They also voiced concern about the lack of governmental oversight of AI, which leaves a gap in overall validation. To remedy this, the researchers advocated for a collaborative and regulated approach to the future development of AI in healthcare.
References
1. Nielsen JPS, Grønhøj C, Skov L, et al. Usefulness of the large language model ChatGPT (GPT-4) as a diagnostic tool and information source in dermatology. JEADV Clin Pract. 2024;1-6. doi:10.1002/jvc2.459
2. Mirza FN, Lim RK, Yumeen S, et al. Performance of three large language models on dermatology board examinations. J Invest Dermatol. 2024;144(2):398-400. doi:10.1016/j.jid.2023.06.208
3. Gantenbein L, Navarini AA, Maul LV, et al. Internet and social media use in dermatology patients: Search behavior and impact on patient-physician relationship. Dermatol Ther. 2020;33(6):e14098. doi:10.1111/dth.14098
4. Veien NK. An atlas of clinical dermatology. https://danderm-pdv.is.kkh.dk/atlas/index.html