Abstract
Background: Artificial Intelligence (AI), particularly publicly available models such as ChatGPT, has transformed the generation of human-like text and reasoning. Across healthcare, AI is increasingly integrated into decision-making processes. However, its application to ethical decision-making remains relatively unexplored.
Methods: Ethics consultation notes from a tertiary academic medical center were de-identified. We trained ChatGPT in three separate ‘chats’ containing one, two, or five unique notes and asked it to produce an ethical analysis/discussion and recommendations for a test case. We then repeated this series, giving ChatGPT only the ethical analysis/discussion and recommendation sections of the training notes to learn from. Two independent raters scored ChatGPT’s ethics consultation documentation using the validated Ethics Consult Quality Assessment Tool (ECQAT).
Results: When trained with full notes, ChatGPT’s overall holistic ECQAT rating for each ‘chat’ was 2.5 for one note, 1.5 for two, and 2.5 for five. When trained using only the ethical analysis/discussion and recommendation sections, ChatGPT scored 3 for one note, 2 for two, and 1 for five.
Conclusion: ChatGPT's variable performance, influenced by its training data, highlights its poor baseline ability and the need for targeted training. While initial improvement was observed with example consultations, performance degraded with greater complexity and scale. These findings underscore the importance of human oversight, as ChatGPT alone cannot match human expertise. ChatGPT does show potential for substantial improvement with better training, and further research is needed to make effective use of this powerful and widely accessible tool.