Landmark-of-medical-agent

🚀 The Landscape of Medical Agents: A Survey

Overall Landscape

🌟 Overview

This is the official repository for the survey paper: The Landscape of Medical Agents. It is a comprehensive, systematically organized resource library for medical agents, dedicated to tracking the latest research progress, application practices, and technological developments of AI agents in medicine and healthcare. The survey covers the entire ecosystem, from basic technical capabilities to real-world clinical deployment, and provides a research map for medical AI researchers, clinical practitioners, and system developers.

🤝 Thanks

If you find this project useful or inspiring, we would greatly appreciate a Star to show your support! It means a lot to us and encourages us to keep improving and developing this project.

📖 Keywords

Medical Agents, Clinical Workflows, Safety, Governance and Evaluation

🔥 News

[2025/11/30] We released the initial GitHub repo!

🌟 Contributing

We will try to keep this list up to date. If you find any errors or missing papers, please don't hesitate to open an issue or a pull request. Please follow the instructions in CONTRIBUTING.md if you want to make one. For any other questions, please join our WeChat group.

🤝 Main Contacts

Xiaobin Hu - ben0xiaobin0hu1@nus.edu.sg

🌟 Table of Contents

Latest Papers
- Year-2025
- Year-2024
- Year-2023
- Earlier
Papers by Category

✨ Latest Papers

🚀 Year-2025

Title	GitHub	Sections
Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning	GitHub	evaluation
3mdbench: Medical multimodal multi-agent dialogue benchmark	GitHub	evaluation
A co-evolving agentic AI system for medical imaging analysis	GitHub	other
A dual-agent collaboration framework based on llms for nursing robots to perform bimanual coordination tasks	Not Available	capability
A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation	Not Available	safety, evaluation
A hybrid reinforcement learning and knowledge graph framework for financial risk optimization in healthcare systems	Not Available	application
A Multi-Agent Approach to Neurological Clinical Reasoning	Not Available	capability, other
A Multimodal Multi-Agent Framework for Radiology Report Generation	Not Available	task, evaluation, other
A Proposed LLM-Based Supported Treatment Framework for Intracerebral Hemorrhage	Not Available	intro, capability, application
A Self-Evolving Framework for Multi-Agent Medical Consultation Based on Large Language Models	Not Available	capability, application
A two-stage proactive dialogue generator for efficient clinical information collection using large language model	Not Available	task, application
Actions speak louder than words: Agent decisions reveal implicit biases in language models	Not Available	safety
Adagent: Llm agent for alzheimer’s disease analysis with collaborative coordinator	Not Available	capability, other
Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model	Not Available	capability, other
Agentic AI for Clinical Decision Support: Real-Time Diagnosis, Triage, and Treatment Planning	Not Available	intro
Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge	GitHub	capability, task, application
Agentic Workflows in Healthcare: Advancing Clinical Efficiency through AI Integration	Not Available	intro
Agentic-AI Healthcare: Multilingual, Privacy-First Framework with MCP Agents	GitHub	capability, application, safety
AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots	GitHub	task
Agentmd: Empowering language agents for risk prediction with large-scale clinical tool learning	GitHub	capability, task, application
AI Agents in Clinical Medicine: A Systematic Review	Not Available	evaluation
AI chatbots as professional service agents: developing a professional identity	Not Available	capability, application
AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering	GitHub	capability, task, other
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare	GitHub	evaluation
An active inference strategy for prompting reliable responses from large language models in medical practice	Not Available	capability
An Adaptive Multi-Agent LLM-Based Clinical Decision Support System Integrating Biomedical RAG and Web Intelligence	GitHub	application
An Agentic Model Context Protocol Framework for Medical Concept Standardization	GitHub	task
An agentic system for rare disease diagnosis with traceable reasoning	Not Available	task, application, other
Architecting Clinical Collaboration: Multi-Agent Reasoning Systems for Multimodal Medical VQA	GitHub	task, other
ASTRID–An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems	Not Available	safety
At-cxr: Uncertainty-aware agentic triage for chest x-rays	GitHub	intro, other
Audited Reasoning Refinement: Fine-Tuning Language Models via LLM-Guided Step-Wise Evaluation and Correction	Not Available	evaluation
AURA: A Multi-modal Medical Agent for Understanding, Reasoning and Annotation	GitHub	other
Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery	GitHub	capability, task, application
Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent	Not Available	safety, other
Balancing Fairness and Performance in Healthcare AI: A Gradient Reconciliation Approach	Not Available	safety
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics	Not Available	application
Beyond Benchmarks: Dynamic, Automatic and Systematic Red-Teaming Agents for Trustworthy Medical Language Models	GitHub	evaluation
Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics	Not Available	evaluation
Bridging Clinical Narratives and ACR Appropriateness Guidelines: A Multi-Agent RAG System for Medical Imaging Decisions	GitHub	task
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions	GitHub	other
CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation	GitHub	intro, capability, application, other
CARE-AD: a multi-agent large language model framework for Alzheimer’s disease prediction using longitudinal clinical notes	Not Available	capability, task, other
CataractSurg-80K: Knowledge-Driven Benchmarking for Structured Reasoning in Ophthalmic Surgery Planning	Not Available	capability, application, other
Chatbot To Help Patients Understand Their Health	GitHub	application
ChatMyopia: An AI Agent for Pre-consultation Education in Primary Eye Care Settings	Not Available	capability, application, other
Cod, towards an interpretable medical agent using chain of diagnosis	GitHub	capability, application, safety, evaluation, other
Code Like Humans: A Multi-Agent Solution for Medical Coding	GitHub	application
Conversational health agents: a personalized large language model-powered agent framework	GitHub	safety
CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering	Not Available	application
Data Overdose? Time for a Quadruple Shot: Knowledge Graph Construction Using Enhanced Triple Extraction	Not Available	task
Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis	Not Available	safety
Developing an Artificial Intelligence Tool for Personalized Breast Cancer Treatment Plans based on the NCCN Guidelines	Not Available	intro, capability, other
Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology	Not Available	application, other
Differential privacy for medical deep learning: methods, tradeoffs, and deployment implications	Not Available	safety
Discovering Pathology Rationale and Token Allocation for Efficient Multimodal Pathology Reasoning	Not Available	evaluation
DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services	Not Available	task, other
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning	GitHub	capability, application
Doctoragent-rl: A multi-agent collaborative reinforcement learning system for multi-turn clinical dialogue	GitHub	task
Dr. Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian	Not Available	capability, task
Drugagent: Multi-agent large language model-based reasoning for drug-target interaction prediction	Not Available	task
EH-Benchmark: Ophthalmic hallucination benchmark and agent-driven top-down traceable reasoning workflow	GitHub	other
Ehr-mcp: Real-world evaluation of clinical information retrieval by large language models via model context protocol	Not Available	evaluation
Emerging cyber attack risks of medical ai agents	Not Available	safety
Enhancing diagnostic capability with multi-agents conversational large language models	GitHub	capability, task, application, other
Enhancing Medical Lung X-Ray Diagnosis Through Multi-Agent Vision-Language Model Collaboration	Not Available	capability, application
Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine	Not Available	evaluation
Evaluating the accuracy of a state-of-the-art large language model for prediction of admissions from the emergency room	Not Available	evaluation
Evaluating transparency in AI/ML model characteristics for FDA-reviewed medical devices	Not Available	safety
Explainable AI for medical data: Current methods, limitations, and future directions	Not Available	safety
Eyecaregpt: Boosting comprehensive ophthalmology understanding with tailored dataset, benchmark and model	GitHub	application, other
Feat: A multi-agent forensic ai system with domain-adapted large language model for automated cause-of-death analysis	GitHub	capability
Fine-tuning vision language models with graph-based knowledge for explainable medical image analysis	Not Available	capability
FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights	GitHub	other
GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation	GitHub	other
Geometry-preserving encoder/decoder in latent generative models	GitHub	evaluation
GMAT: Grounded Multi-agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification	Not Available	task
Haibu Mathematical-Medical Intelligent Agent: Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains	Not Available	capability
Demo: Healthcare Agent Orchestrator (HAO) for Patient Summarization in Molecular Tumor Boards	Not Available	task, other
Healthcare Agent: Eliciting the Power of Large Language Models for Medical Consultation	Not Available	capability, application
Image Segmentation Using Only "Better or Worse" Expert Feedback	Not Available	task
Improving Interactive Diagnostic Ability of a Large Language Model Agent Through Clinical Experience Learning	GitHub	capability, application
In-Basket Message Volume in Primary Care: A Cross-sectional Analysis by Gender and Specialty	Not Available	intro
KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs	GitHub	capability
Large language models in real-world clinical workflows: a systematic review of applications and implementation	Not Available	evaluation
Learning to be a doctor: Searching for effective medical agent architectures	Not Available	capability, application, other
Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation	GitHub	task, other
LINS: A general medical Q&A framework for enhancing the quality and credibility of LLM-generated responses	GitHub	capability
Llms can simulate standardized patients via agent coevolution	GitHub	task, application
M3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging	GitHub	task
Magnetic Milli-Spinner for Robotic Endovascular Surgery	Not Available	task
MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration	GitHub	application, other
Mdteamgpt: A self-evolving llm-based multi-agent framework for multi-disciplinary team medical consultation	GitHub	task, application
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation	Not Available	evaluation
Med-TAMARA: Trust-Aware Multi-Agent Risk Assessment in Medical AI Dialogue	Not Available	capability, application
Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents	GitHub	capability, application
MedAgent-Pro: Towards Evidence-Based Multi-Modal Medical Diagnosis via Reasoning Agentic Workflow	Not Available	safety, evaluation, other
MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems	GitHub	capability, safety, evaluation
MedAgentBench: Dataset for Benchmarking LLMs as Agents	GitHub	evaluation
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks	GitHub	evaluation
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning	GitHub	evaluation
MedAgentSim: Self-evolving Multi-agent Simulations for Realistic Clinical Interactions	GitHub	task, other
MedBrowseComp: Benchmarking Medical Deep Research and Computer Use	GitHub	evaluation
Medchat: A multi-agent framework for multimodal diagnosis with large language models	GitHub	other
MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision	Not Available	capability, application
Meddxagent: A unified modular agent framework for explainable automatic differential diagnosis	GitHub	other
MedFact: A Large-scale Chinese Dataset for Evidence-based Medical Fact-Checking of LLM Responses	GitHub	evaluation
Medhallu: A comprehensive benchmark for detecting medical hallucinations in large language models	GitHub	safety
Mediator-guided multi-agent collaboration among open-source models for medical decision-making	Not Available	task, other
Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation	Not Available	capability, task
Medical hallucinations in foundation models and their impact on healthcare	GitHub	safety
Position: Medical large language model benchmarks should prioritize construct validity	Not Available	evaluation
MedicalOS: An LLM Agent based Operating System for Digital Healthcare	Not Available	capability, application
MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph	GitHub	task
MedKGEval: A Knowledge Graph-Based Multi-Turn Evaluation Framework for Open-Ended Patient Interactions with Clinical LLMs	Not Available	task
MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models	GitHub	application
Medmmv: A controllable multimodal multi-agent framework for reliable and verifiable clinical reasoning	Not Available	task, safety, evaluation, other
MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility	Not Available	task, other
MedPAO: A Protocol-Driven Agent for Structuring Medical Reports	GitHub	capability, task, application, other
Medrax: Medical reasoning agent for chest x-ray	GitHub	capability, application, other
MedRepBench: A Comprehensive Benchmark for Medical Report Interpretation	Not Available	capability, evaluation
Medresearcher-r1: Expert-level medical deep researcher via a knowledge-informed trajectory synthesis framework	GitHub	evaluation
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering	Not Available	capability, task
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding	GitHub	evaluation
MEGA-RAG: a retrieval-augmented generation framework with multi-evidence guided answer refinement for mitigating hallucinations of LLMs in public health	Not Available	safety
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning	Not Available	task, other
MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling	Not Available	capability, task, other
MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs	Not Available	task, other
Multi agent based medical assistant for edge devices	GitHub	safety
Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation	Not Available	evaluation
Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICs	Not Available	capability, application
Multimodal Models in Healthcare: Methods, Challenges, and Future Directions for Enhanced Clinical Decision Support	Not Available	intro
NurseLLM: The First Specialized Language Model for Nursing	Not Available	capability
OAAgent: Multimodal LLM Agent for Predicting Knee Osteoarthritis Progression	Not Available	capability
OpenLens AI: Fully Autonomous Research Agent for Health Infomatics	GitHub	capability, application
PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning	GitHub	other
Pathfinder: A multi-modal multi-agent system for medical diagnostic decision-making applied to histopathology	GitHub	task
Patient-Zero: A Unified Framework for Real-Record-Free Patient Agent Generation	Not Available	capability, application, safety
Performance of Retrieval-Augmented Generation Large Language Models in Guideline-Concordant Prostate-Specific Antigen Testing: Comparative Study With Junior Clinicians	Not Available	evaluation
Privacy in action: Towards realistic privacy mitigation and evaluation for llm-powered agents	GitHub	safety
Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models	GitHub	capability
Program Synthesis Dialog Agents for Interactive Decision-Making	Not Available	safety
Proof-of-TBI–Fine-Tuned Vision Language Model Consortium and OpenAI-o3 Reasoning LLM-Based Medical Diagnosis Support System for Mild Traumatic Brain Injury (TBI) Prediction	Not Available	capability, application
Rapidly benchmarking large language models for diagnosing comorbid patients: comparative study leveraging the LLM-as-a-judge method	Not Available	evaluation
Real-World Evaluation of Large Language Models in Healthcare (RWE-LLM): A New Realm of AI Safety & Validation	Not Available	evaluation
Red-teaming llm multi-agent systems via communication attacks	Not Available	safety
Reducing Hallucinations and Trade-Offs in Responses in Generative AI Chatbots for Cancer Information: Development and Evaluation Study	Not Available	safety
ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents	GitHub	capability, application
Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination	Not Available	task
RxLens: Multi-Agent LLM-powered Scan and Order for Pharmacy	Not Available	capability, application, other
SCOPE: Speech-Guided COllaborative PErception Framework for Surgical Scene Segmentation	Not Available	task, other
Self-Assessment of Content, Pedagogy, and Technology Knowledge among Higher Education Academics in Bahrain	Not Available	capability
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making	Not Available	capability
SmartState: An Automated Research Protocol Adherence System	GitHub	capability, application
SOLVE-Med: Specialized Orchestration for Leading Vertical Experts across Medical Specialties	GitHub	task
Standard Applicability Judgment and Cross-jurisdictional Reasoning: A RAG-based Framework for Medical Device Compliance	Not Available	capability
Surgraw: Multi-agent workflow with chain-of-thought reasoning for surgical intelligence	GitHub	task
Survey and improvement strategies for gene prioritization with large language models	Not Available	task, other
Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA	GitHub	capability
TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems	Not Available	safety
The evaluation illusion of large language models in medicine	Not Available	evaluation
Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare	GitHub	capability, safety
Tool learning with large language models: A survey	GitHub	intro
Towards conversational diagnostic artificial intelligence	Not Available	task
Towards interpretable radiology report generation via concept bottlenecks using a multi-agentic rag	GitHub	task
Towards safe ai clinicians: A comprehensive study on large language model jailbreaking in healthcare	Not Available	safety
Transforming healthcare delivery with conversational AI platforms	Not Available	safety
Tree-based RAG-Agent Recommendation System: A Case Study in Medical Test Data	Not Available	capability, task, application
Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree	GitHub	safety
Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes	Not Available	evaluation
TxAgent: An AI agent for therapeutic reasoning across a universe of tools	GitHub	intro, task, other
Using large language models for enhanced fraud analysis and detection in blockchain based health insurance claims	GitHub	capability, application
Vision-language model for report generation and outcome prediction in CT pulmonary angiogram	GitHub	capability, application
Visual-Conversational Interface for Evidence-Based Explanation of Diabetes Risk Prediction	GitHub	intro
When Avatars Have Personality: Effects on Engagement and Communication in Immersive Medical Training	Not Available	application
World Model for AI Autonomous Navigation in Mechanical Thrombectomy	GitHub	capability, task, application
Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning	Not Available	capability, application, other
Bias-Aware Agent: Enhancing Fairness in AI-Driven Knowledge Retrieval	GitHub	safety
EMR-AGENT: Automating Cohort and Feature Extraction from EMR Databases	GitHub	capability, task, application
Colacare: Enhancing electronic health record modeling through large language model-driven multi-agent collaboration	GitHub	capability, application
A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making	GitHub	capability, application
MeNTi: Bridging medical calculator and LLM agent with nested tool calling	GitHub	capability, application
Med-GRIM: Enhanced Zero-Shot Medical VQA using prompt-embedded Multimodal Graph RAG	GitHub	application, other
How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making?	Not Available	evaluation

🚀 Year-2024

Title	GitHub	Sections
A demonstration of adaptive collaboration of large language models for medical decision-making	GitHub	capability, application
A survey on large language model based autonomous agents	Not Available	intro
Achieving health equity through conversational AI: A roadmap for design and implementation of inclusive chatbots in healthcare	Not Available	safety
Adaptive Reasoning and Acting in Medical Language Agents	Not Available	capability, application
Adversarial attacks on large language models in medicine	Not Available	safety
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents	GitHub	capability, task, application
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments	GitHub	intro, capability, application, evaluation, other
Agentic llm workflows for generating patient-friendly medical reports	GitHub	capability, task, application, other
Agentigraph: An interactive knowledge graph platform for llm-based chatbots utilizing private data	GitHub	capability, application
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator	GitHub	capability, task, application, evaluation
Aligning Medical LLMs for Counterfactual Fairness	GitHub	safety
ArgMed-Agents: explainable clinical decision reasoning with LLM disscusion via argumentation schemes	Not Available	capability, application
Autohealth: Advanced llm-empowered wearable personalized medical butler for parkinson’s disease management	Not Available	other
Autonomous artificial intelligence agents for clinical decision making in oncology	Not Available	capability, other
Benchmarking Large Language Models on Communicative Medical Coaching: A Dataset and a Novel System	GitHub	capability, application
Beyond direct diagnosis: LLM-based multi-specialist agent consultation for automatic diagnosis	Not Available	capability
Chatdev: Communicative agents for software development	GitHub	task
ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning	GitHub	intro, capability, application, other
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World	GitHub	capability, evaluation
Cxr-agent: Vision-language models for chest x-ray interpretation with uncertainty aware radiology reporting	Not Available	capability, application, other
Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments	GitHub	capability, task, other
Ehragent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records	GitHub	capability, task, application
Enhancing diagnostic accuracy through multi-agent conversations: using large language models to mitigate cognitive bias	Not Available	application
Enhancing llms for impression generation in radiology reports through a multi-agent system	GitHub	capability, task, application, other
Ethical and regulatory challenges of large language models in medicine	Not Available	safety
Evaluating Large Language Models as Agents in the Clinic	Not Available	capability, evaluation
Exploring llm multi-agents for icd coding	Not Available	application
Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment	Not Available	capability
Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI	Not Available	safety
GuidelineGuard: An Agentic Framework for Medical Note Evaluation with Guideline Adherence	Not Available	capability, application
Imas: A comprehensive agentic approach to rural healthcare delivery	GitHub	capability, application
Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini	Not Available	capability, task, application
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning	GitHub	capability, application
Integration of multi-source medical data for medical diagnosis question answering	GitHub	capability, application
Iryonlp at mediqa-corr 2024: Tackling the medical error detection & correction task on the shoulders of medical agents	Not Available	capability, task, application
KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis	Not Available	capability, application
Knowledge-infused llm-powered conversational health agent: A case study for diabetes patients	Not Available	capability
Large Language Model-Enhanced Interactive Agent for Public Education on Newborn Auricular Deformities	Not Available	capability, application
Llm-based framework for administrative task automation in healthcare	Not Available	capability, application
Llm-medqa: Enhancing medical question answering through case studies in large language models	Not Available	capability, application
MAGDA: Multi-agent guideline-driven diagnostic assistance	Not Available	capability, task, application
MALADE: orchestration of LLM-powered agents with retrieval augmented generation for pharmacovigilance	GitHub	intro, capability, application, other
Mdagents: An adaptive collaboration of llms for medical decision-making	GitHub	capability, task, application
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning	GitHub	capability, task, other
MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration	Not Available	capability
Medco: Medical education copilots based on a multi-agent framework	Not Available	capability, task, application
MedChain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking	GitHub	capability, application, evaluation
MedGen: An Explainable Multi-Agent Architecture for Clinical Decision Support through Multisource Knowledge Fusion	Not Available	capability, application
Medhalu: Hallucinations in responses to healthcare queries by large language models	Not Available	safety
MedQA-CS: Benchmarking Large Language Models’ Clinical Skills Using an AI-SCE Framework	GitHub	evaluation
Mitigating cognitive biases in clinical decision-making through multi-agent conversations using large language models: simulation study	Not Available	capability, application
Mitigating hallucinations in large language models via self-refinement-enhanced knowledge retrieval	Not Available	safety
Mmedagent: Learning to use medical tools with multi-modal agent	GitHub	capability, application, other
MMLU-Pro: A More Robust Benchmark for Multi-Task Language Understanding	GitHub	evaluation
Natural Language Programming in Medicine: Administering Evidence Based Clinical Workflows with Autonomous Agents Powered by Generative Large Language Models	Not Available	capability, application
On protecting the data privacy of large language models (llms): A survey	Not Available	safety
On the resilience of llm-based multi-agent collaboration with faulty agents	Not Available	safety
Piors: Personalized intelligent outpatient reception based on large language model with multi-agents medical scenario simulation	GitHub	capability, task, application
Polaris: A safety-focused llm constellation architecture for healthcare	Not Available	capability, application, evaluation
Privacy-Preserving Large Language Models: Mechanisms	Not Available	safety
Advancing healthcare automation: Multi-agent system for medical necessity justification	Not Available	capability, task, application
RareAgents: Advancing Rare Disease Care through LLM-Empowered Multi-disciplinary Team	Not Available	capability, application
RareAgents: Autonomous Multi-disciplinary Team for Rare Disease Diagnosis and Treatment	Not Available	task, other
Regulator-manufacturer AI agents modeling: Mathematical feedback-driven multi-agent LLM framework	Not available	capability, task, application
Remoni: An autonomous system integrating wearables and multimodal large language models for enhanced remote health monitoring	Not available	capability, application
Rx strategist: Prescription verification using llm agents system	Not available	capability, task, application, other
Simulated patient systems are intelligent when powered by large language model-based AI agents	Not available	capability, application
Smile: Single-turn to multi-turn inclusive language expansion via chatgpt for mental health support	Github	task
Society of medical simplifiers	GitHub	capability, task, application
Surgbox: Agent-driven operating room sandbox with surgery copilot	Not Available	capability, task, application
T-agent: A term-aware agent for medical dialogue generation	Not Available	capability, application
The role of explainability in AI-supported medical decision-making	Not Available	safety
Towards anatomy education with generative AI-based virtual assistants in immersive virtual reality environments	Not Available	application
Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm	Not Available	capability, application, evaluation
Towards next-generation medical agent: How o1 is reshaping decision-making in medical scenarios	Not Available	capability, other
A trustworthy AI reality-check: the lack of transparency of artificial intelligence products in healthcare	Not Available	safety
TWIN-GPT: digital twins for clinical trials via large language model	Not Available	other
UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt–A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis	GitHub	capability, other
Zodiac: A cardiologist-level llm framework for multi-agent diagnostics	Not Available	capability
OpenAI o1 System Card	GitHub	capability
FedAgentBench: Towards Automating Real-World Federated Medical Image Analysis with Server–Client LLM Agents	Not Available	evaluation
Evaluating large language models as agents in the clinic	Not Available	evaluation

🚀 Year-2023

Title	GitHub	Sections
A reinforcement learning approach for VQA validation: An application to diabetic macular edema grading	Not available	capability
Adaptive multi-agent deep reinforcement learning for timely healthcare interventions	Not available	capability, application
Asynchronous decentralized federated lifelong learning for landmark localization in medical imaging	Not available	capability
Beyond memorization: Violating privacy via inference with large language models	Not available	safety
Camel: Communicative agents for “mind” exploration of large language model society	Not available	task
Clinically-inspired multi-agent transformers for disease trajectory forecasting from multimodal data	GitHub	capability, application
Cognitive architectures for language agents	Not available	intro
Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum	Not available	evaluation
Deep Imitation Learning for Automated Drop-In Gamma Probe Manipulation	Not available	task
Diaggpt: An llm-based and multi-agent dialogue system with automatic topic management for flexible task-oriented dialogue	GitHub	application
Dspy: Compiling declarative language model calls into self-improving pipelines	GitHub	task
Federated machine learning, privacy-enhancing technologies, and data protection laws in medical research: scoping review	Not available	safety
Generative agents: Interactive simulacra of human behavior	GitHub	intro
Interactive medical image segmentation with self-adaptive confidence calibration	GitHub	task
Large language models as agents in the clinic	Not available	application
MetaGPT: Meta programming for a multi-agent collaborative framework	GitHub	task
Navigation Through Endoluminal Channels Using Q-Learning	Not available	capability, task
ProFSA: Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment	GitHub	capability
Reflexion: Language agents with verbal reinforcement learning	GitHub	capability
Td-mpc2: Scalable, robust world models for continuous control	GitHub	task
Temporally-extended prompts optimization for sam in interactive medical image segmentation	Not available	capability, task
The NCI Imaging Data Commons as a platform for reproducible research in computational pathology	Not available	intro
Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis	Not available	capability, application

🚀 Earlier

Title	GitHub	Sections
“My Nose is Running.” “Are You Also Coughing?”: Building a Medical Diagnosis Agent with Interpretable Inquiry Logics	GitHub	capability, application
A Flexible Schema-Guided Dialogue Management Framework: From Friendly Peer to Virtual Standardized Cancer Patient	GitHub	capability, application
Building an ASR Error Robust Spoken Virtual Patient System in a Highly Class-Imbalanced Scenario Without Speech Data	Not available	capability, application
Constitutional ai: Harmlessness from ai feedback	GitHub	capability
MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation	GitHub	evaluation
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning	Not available	intro
Multi-agent searching system for medical information	Not available	capability
React: Synergizing reasoning and acting in language models	GitHub	intro
Scalable Online Disease Diagnosis via Multi-Model-Fused Actor-Critic Reinforcement Learning	Not available	capability, application
MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering	GitHub	capability, evaluation
A grounded well-being conversational agent with multiple interaction modes: Preliminary results	Not available	capability, application
Adaptable image quality assessment using meta-reinforcement learning of task amenability	GitHub	capability
An Edge Based Multi-Agent Model for Improving Hospital Bed Management	Not available	task
Cross Modality 3D Navigation Using Reinforcement Learning and Neural Style Transfer	Not available	capability
Extracting Training Data from Large Language Models	GitHub	safety
Human-AI collaboration in healthcare: A review and research agenda	Not available	task
Levels of autonomy and safety assurance for AI-Based clinical decision systems	Not available	safety
Measuring Massive Multitask Language Understanding	GitHub	evaluation
Autonomous systems and artificial intelligence in healthcare transformation to 5P medicine–ethical challenges	Not available	intro
Boundary-aware supervoxel-level iteratively refined interactive 3d image segmentation with multi-agent reinforcement learning	Not available	task
MedDialog: A Large-scale Medical Dialogue Dataset	Not available	evaluation
Medical visual question answering via conditional reasoning	GitHub	task
PathVQA: 30000+ Questions for Medical Visual Question Answering	GitHub	evaluation
PubMedQA: A Dataset for Biomedical Research Question Answering	GitHub	evaluation
A Dataset of Clinically Generated Visual Questions and Answers About Radiology Images	Not available	evaluation
Modeling irregularly sampled clinical time series	GitHub	intro
The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care	GitHub	task
Medical robotics—Regulatory, ethical, and legal considerations for increasing levels of autonomy	Not available	task
Tethered to the EHR: primary care physician workload assessment using EHR event log data and time-motion observations	Not available	intro
Allocation of physician time in ambulatory practice: a time and motion study in 4 specialties	Not available	intro
Assessing electronic note quality using the physician documentation quality instrument (PDQI-9)	Not available	task
Privacy by design: The 7 foundational principles	Not available	safety
Upper processing stages of the perception–action cycle	Not available	intro
What Disease Does This Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams	GitHub	evaluation

✨ Papers by Category

🚀 1. Capability

1.1 Planning

Title	GitHub	Year
Evaluating Large Language Models as Agents in the Clinic	Not Available	2024
MedPAO: A Protocol-Driven Agent for Structuring Medical Reports	GitHub	2025
Adaptable image quality assessment using meta-reinforcement learning of task amenability	GitHub	2023
World Model for AI Autonomous Navigation in Mechanical Thrombectomy	GitHub	2025
Polaris: A safety-focused llm constellation architecture for healthcare	Not Available	2024
MedChain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking	GitHub	2024
Rx strategist: Prescription verification using llm agents system	Not Available	2024
A Flexible Schema-Guided Dialogue Management Framework: From Friendly Peer to Virtual Standardized Cancer Patient	GitHub	2023
Surgbox: Agent-driven operating room sandbox with surgery copilot	Not Available	2024
MedicalOS: An LLM Agent based Operating System for Digital Healthcare	Not Available	2025
“My Nose is Running.” “Are You Also Coughing?”: Building a Medical Diagnosis Agent with Interpretable Inquiry Logics	GitHub	2023
Cross Modality 3D Navigation Using Reinforcement Learning and Neural Style Transfer	Not Available	2023
Scalable Online Disease Diagnosis via Multi-Model-Fused Actor-Critic Reinforcement Learning	Not Available	2023
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator	GitHub	2024
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning	GitHub	2024
Llm-medqa: Enhancing medical question answering through case studies in large language models	Not Available	2024
Medco: Medical education copilots based on a multi-agent framework	Not Available	2024
Mitigating cognitive biases in clinical decision-making through multi-agent conversations using large language models: simulation study	Not Available	2024
Regulator-manufacturer AI agents modeling: Mathematical feedback-driven multi-agent LLM framework	Not Available	2024
Society of medical simplifiers	GitHub	2024
Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm	Not Available	2024
Towards next-generation medical agent: How o1 is reshaping decision-making in medical scenarios	Not Available	2024
A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making	GitHub	2025
A dual-agent collaboration framework based on llms for nursing robots to perform bimanual coordination tasks	Not Available	2025
Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge	GitHub	2025
Agentic-AI Healthcare: Multilingual, Privacy-First Framework with MCP Agents	GitHub	2025
Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery	GitHub	2025
Colacare: Enhancing electronic health record modeling through large language model-driven multi-agent collaboration	GitHub	2025
Haibu Mathematical-Medical Intelligent Agent: Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains	Not Available	2025
Improving Interactive Diagnostic Ability of a Large Language Model Agent Through Clinical Experience Learning	GitHub	2025
Learning to be a doctor: Searching for effective medical agent architectures	Not Available	2025
MeNTi: Bridging medical calculator and LLM agent with nested tool calling	GitHub	2025
OpenLens AI: Fully Autonomous Research Agent for Health Infomatics	GitHub	2025
Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA	GitHub	2025
Tree-based RAG-Agent Recommendation System: A Case Study in Medical Test Data	Not Available	2025
AI chatbots as professional service agents: developing a professional identity	Not Available	2025
A reinforcement learning approach for VQA validation: An application to diabetic macular edema grading	Not Available	2023
Building an ASR Error Robust Spoken Virtual Patient System in a Highly Class-Imbalanced Scenario Without Speech Data	Not Available	2023
MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering	GitHub	2023
Multi-agent searching system for medical information	Not Available	2023
A demonstration of adaptive collaboration of large language models for medical decision-making	GitHub	2024
Agentic llm workflows for generating patient-friendly medical reports	GitHub	2024
Agentigraph: An interactive knowledge graph platform for llm-based chatbots utilizing private data	GitHub	2024
Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini	Not Available	2024
Piors: Personalized intelligent outpatient reception based on large language model with multi-agents medical scenario simulation	GitHub	2024
SmartState: An Automated Research Protocol Adherence System	GitHub	2025

1.2 Tool Use

Title	GitHub	Year
ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning	GitHub	2024
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning	GitHub	2025
KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs	GitHub	2025
Iryonlp at mediqa-corr 2024: Tackling the medical error detection & correction task on the shoulders of medical agents	Not Available	2024
Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models	GitHub	2025
Cxr-agent: Vision-language models for chest x-ray interpretation with uncertainty aware radiology reporting	Not Available	2024
Enhancing llms for impression generation in radiology reports through a multi-agent system	GitHub	2024
GuidelineGuard: An Agentic Framework for Medical Note Evaluation with Guideline Adherence	Not Available	2024
Medrax: Medical reasoning agent for chest x-ray	GitHub	2025
Mmedagent: Learning to use medical tools with multi-modal agent	GitHub	2024
Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents	GitHub	2025
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning	GitHub	2024
AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering	GitHub	2025
NurseLLM: The First Specialized Language Model for Nursing	Not Available	2025
Autonomous artificial intelligence agents for clinical decision making in oncology	Not Available	2024
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World	GitHub	2024
Llm-based framework for administrative task automation in healthcare	Not Available	2024
A Multi-Agent Approach to Neurological Clinical Reasoning	Not Available	2025
Adagent: Llm agent for alzheimer’s disease analysis with collaborative coordinator	Not Available	2025
ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents	GitHub	2025
Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments	GitHub	2024
Large Language Model-Enhanced Interactive Agent for Public Education on Newborn Auricular Deformities	Not Available	2024
Natural Language Programming in Medicine: Administering Evidence Based Clinical Workflows with Autonomous Agents Powered by Generative Large Language Models	Not Available	2024
Agentmd: Empowering language agents for risk prediction with large-scale clinical tool learning	GitHub	2025
MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling	Not Available	2025
Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICs	Not Available	2025

1.3 Memory

Title	GitHub	Year
Patient-Zero: A Unified Framework for Real-Record-Free Patient Agent Generation	Not Available	2025
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents	GitHub	2024
MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration	Not Available	2024
Ehragent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records	GitHub	2024
Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis	Not Available	2023
EMR-AGENT: Automating Cohort and Feature Extraction from EMR Databases	GitHub	2025
Mdagents: An adaptive collaboration of llms for medical decision-making	GitHub	2024
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments	GitHub	2024
Knowledge-infused llm-powered conversational health agent: A case study for diabetes patients	Not Available	2024
A grounded well-being conversational agent with multiple interaction modes: Preliminary results	Not Available	2023
RareAgents: Advancing Rare Disease Care through LLM-Empowered Multi-disciplinary Team	Not Available	2024
Healthcare Agent: Eliciting the Power of Large Language Models for Medical Consultation	Not Available	2025

1.4 Self-Improvement

Title	GitHub	Year
Self-Assessment of Content, Pedagogy, and Technology Knowledge among Higher Education Academics in Bahrain	Not Available	2025
Constitutional ai: Harmlessness from ai feedback	GitHub	2023
A Self-Evolving Framework for Multi-Agent Medical Consultation Based on Large Language Models	Not Available	2025
Adaptive Reasoning and Acting in Medical Language Agents	Not Available	2024
Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment	Not Available	2024
Reflexion: Language agents with verbal reinforcement learning	GitHub	2023
MALADE: orchestration of LLM-powered agents with retrieval augmented generation for pharmacovigilance	GitHub	2024
Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation	Not Available	2025
ProFSA: Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment	GitHub	2023

1.5 Reasoning

Title	GitHub	Year
Proof-of-TBI–Fine-Tuned Vision Language Model Consortium and OpenAI-o3 Reasoning LLM-Based Medical Diagnosis Support System for Mild Traumatic Brain Injury (TBI) Prediction	Not Available	2025
OpenAI o1 System Card	GitHub	2024
Beyond direct diagnosis: LLM-based multi-specialist agent consultation for automatic diagnosis	Not Available	2024
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering	Not Available	2025
KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis	Not Available	2024
MAGDA: Multi-agent guideline-driven diagnostic assistance	Not Available	2024
Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning	Not Available	2025
Cod, towards an interpretable medical agent using chain of diagnosis	GitHub	2025
Asynchronous decentralized federated lifelong learning for landmark localization in medical imaging	Not Available	2023
Temporally-extended prompts optimization for sam in interactive medical image segmentation	Not Available	2023
Integration of multi-source medical data for medical diagnosis question answering	GitHub	2024
MedGen: An Explainable Multi-Agent Architecture for Clinical Decision Support through Multisource Knowledge Fusion	Not Available	2024
Zodiac: A cardiologist-level llm framework for multi-agent diagnostics	Not Available	2024
An active inference strategy for prompting reliable responses from large language models in medical practice	Not Available	2025
CARE-AD: a multi-agent large language model framework for Alzheimer’s disease prediction using longitudinal clinical notes	Not Available	2025
Feat: A multi-agent forensic ai system with domain-adapted large language model for automated cause-of-death analysis	GitHub	2025
Fine-tuning vision language models with graph-based knowledge for explainable medical image analysis	Not Available	2025
Advancing healthcare automation: Multi-agent system for medical necessity justification	Not Available	2024
ArgMed-Agents: explainable clinical decision reasoning with LLM disscusion via argumentation schemes	Not Available	2024
Imas: A comprehensive agentic approach to rural healthcare delivery	GitHub	2024
Simulated patient systems are intelligent when powered by large language model-based AI agents	Not Available	2024
UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt–A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis	GitHub	2024
CataractSurg-80K: Knowledge-Driven Benchmarking for Structured Reasoning in Ophthalmic Surgery Planning	Not Available	2025
Dr. Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian	Not Available	2025
Enhancing Medical Lung X-Ray Diagnosis Through Multi-Agent Vision-Language Model Collaboration	Not Available	2025
Enhancing diagnostic capability with multi-agents conversational large language models	GitHub	2025
LINS: A general medical Q&A framework for enhancing the quality and credibility of LLM-generated responses	GitHub	2025
OAAgent: Multimodal LLM Agent for Predicting Knee Osteoarthritis Progression	Not Available	2025
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making	Not Available	2025
Standard Applicability Judgment and Cross-jurisdictional Reasoning: A RAG-based Framework for Medical Device Compliance	Not Available	2025

1.6 Perception

Title	GitHub	Year
Navigation Through Endoluminal Channels Using Q-Learning	Not Available	2023
Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model	Not Available	2025
ChatMyopia: An AI Agent for Pre-consultation Education in Primary Eye Care Settings	Not Available	2025
Vision-language model for report generation and outcome prediction in CT pulmonary angiogram	GitHub	2025
MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision	Not Available	2025
MedRepBench: A Comprehensive Benchmark for Medical Report Interpretation	Not Available	2025
T-agent: A term-aware agent for medical dialogue generation	Not Available	2024

1.7 Others (continual learning, uncertainty)

Title	GitHub	Year
CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation	GitHub	2025
Using large language models for enhanced fraud analysis and detection in blockchain based health insurance claims	GitHub	2025
MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems	GitHub	2025
Benchmarking Large Language Models on Communicative Medical Coaching: A Dataset and a Novel System	GitHub	2024
Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare	GitHub	2025

🚀 2. Atomic Function

2.1 Basic Technology Empowerment

Title	GitHub	Year
AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots	GitHub	2025
Data Overdose? Time for a Quadruple Shot: Knowledge Graph Construction Using Enhanced Triple Extraction	Not Available	2025
MedKGEval: A Knowledge Graph-Based Multi-Turn Evaluation Framework for Open-Ended Patient Interactions with Clinical LLMs	Not Available	2025
MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph	GitHub	2025
GMAT: Grounded Multi-agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification	Not Available	2025
Pathfinder: A multi-modal multi-agent system for medical diagnostic decision-making applied to histopathology	GitHub	2025
Boundary-aware supervoxel-level iteratively refined interactive 3d image segmentation with multi-agent reinforcement learning	Not Available	2023
Interactive medical image segmentation with self-adaptive confidence calibration	GitHub	2023
Image Segmentation Using Only “Better or Worse” Expert Feedback	Not Available	2025
SCOPE: Speech-Guided COllaborative PErception Framework for Surgical Scene Segmentation	Not Available	2025
Assessing electronic note quality using the physician documentation quality instrument (PDQI-9)	Not Available	2023

2.2 Core Diagnostic & Therapeutic Assistance

Title	GitHub	Year
Dspy: Compiling declarative language model calls into self-improving pipelines	GitHub	2023
A two-stage proactive dialogue generator for efficient clinical information collection using large language model	Not Available	2025
Doctoragent-rl: A multi-agent collaborative reinforcement learning system for multi-turn clinical dialogue	GitHub	2025
Llms can simulate standardized patients via agent coevolution	GitHub	2025
MedAgentSim: Self-evolving Multi-agent Simulations for Realistic Clinical Interactions	GitHub	2025
Medical visual question answering via conditional reasoning	GitHub	2023
Architecting Clinical Collaboration: Multi-Agent Reasoning Systems for Multimodal Medical VQA	GitHub	2025
SOLVE-Med: Specialized Orchestration for Leading Vertical Experts across Medical Specialties	GitHub	2025
Deep Imitation Learning for Automated Drop-In Gamma Probe Manipulation	Not Available	2023
Td-mpc2: Scalable, robust world models for continuous control	GitHub	2023
Magnetic Milli-Spinner for Robotic Endovascular Surgery	Not Available	2025
Surgraw: Multi-agent workflow with chain-of-thought reasoning for surgical intelligence	GitHub	2025

2.3 Workflow & Documentation Optimization

Title	GitHub	Year
A Multimodal Multi-Agent Framework for Radiology Report Generation	Not Available	2025
MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs	Not Available	2025
Towards interpretable radiology report generation via concept bottlenecks using a multi-agentic rag	GitHub	2025
An Agentic Model Context Protocol Framework for Medical Concept Standardization	GitHub	2025

🚀 3. Application

3.1 Intake & Clinical Dialogue

Title	GitHub	Year
Chatbot To Help Patients Understand Their Health	GitHub	2025
Diaggpt: An llm-based and multi-agent dialogue system with automatic topic management for flexible task-oriented dialogue	GitHub	2023

3.2 Virtual MDT Teams & Multimodal Reasoning

Title	GitHub	Year
An agentic system for rare disease diagnosis with traceable reasoning	Not Available	2025
CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering	Not Available	2025
Eyecaregpt: Boosting comprehensive ophthalmology understanding with tailored dataset, benchmark and model	GitHub	2025
Med-GRIM: Enhanced Zero-Shot Medical VQA using prompt-embedded Multimodal Graph RAG	GitHub	2025
MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration	GitHub	2025
Mdteamgpt: A self-evolving llm-based multi-agent framework for multi-disciplinary team medical consultation	GitHub	2025
MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models	GitHub	2025

3.3 Treatment Procedures

Title	GitHub	Year
An Adaptive Multi-Agent LLM-Based Clinical Decision Support System Integrating Biomedical RAG and Web Intelligence	GitHub	2025
Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology	Not Available	2025

3.4 Chronic Disease Management & Prescription Safety

Title	GitHub	Year
RxLens: Multi-Agent LLM-powered Scan and Order for Pharmacy	Not Available	2025
Adaptive multi-agent deep reinforcement learning for timely healthcare interventions	Not Available	2023
Remoni: An autonomous system integrating wearables and multimodal large language models for enhanced remote health monitoring	Not Available	2024
Clinically-inspired multi-agent transformers for disease trajectory forecasting from multimodal data	GitHub	2023
Med-TAMARA: Trust-Aware Multi-Agent Risk Assessment in Medical AI Dialogue	Not Available	2025

3.5 Documentation, Coding & Knowledge Infrastructure

Title	GitHub	Year
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics	Not Available	2025
Exploring llm multi-agents for icd coding	Not Available	2024
Code Like Humans: A Multi-Agent Solution for Medical Coding	GitHub	2025

3.6 Simulation & Support Systems

Title	GitHub	Year
Towards anatomy education with generative AI-based virtual assistants in immersive virtual reality environments	Not Available	2024
When Avatars Have Personality: Effects on Engagement and Communication in Immersive Medical Training	Not Available	2025

3.7 Regulation, Payer Workflows & Administrative Automation

Title	GitHub	Year
Large language models as agents in the clinic	Not Available	2023
Enhancing diagnostic accuracy through multi-agent conversations: using large language models to mitigate cognitive bias	Not Available	2024
A hybrid reinforcement learning and knowledge graph framework for financial risk optimization in healthcare systems	Not Available	2025

🚀 4. Safety

4.1 Medical Hallucination

Title	GitHub	Year
Ethical and regulatory challenges of large language models in medicine	Not Available	2024
Medhalu: Hallucinations in responses to healthcare queries by large language models	Not Available	2024
Mitigating hallucinations in large language models via self-refinement-enhanced knowledge retrieval	Not Available	2024
A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation	Not Available	2025
Evaluating transparency in AI/ML model characteristics for FDA-reviewed medical devices	Not Available	2025
MEGA-RAG: a retrieval-augmented generation framework with multi-evidence guided answer refinement for mitigating hallucinations of LLMs in public health	Not Available	2025
Medhallu: A comprehensive benchmark for detecting medical hallucinations in large language models	GitHub	2025
Medical hallucinations in foundation models and their impact on healthcare	GitHub	2025
Medmmv: A controllable multimodal multi-agent framework for reliable and verifiable clinical reasoning	Not Available	2025
Reducing Hallucinations and Trade-Offs in Responses in Generative AI Chatbots for Cancer Information: Development and Evaluation Study	Not Available	2025

4.2 Privacy & Data-Security

Title	GitHub	Year
Beyond memorization: Violating privacy via inference with large language models	Not Available	2023
Extracting Training Data from Large Language Models	GitHub	2023
Federated machine learning, privacy-enhancing technologies, and data protection laws in medical research: scoping review	Not Available	2023
Privacy by design: The 7 foundational principles	Not Available	2023
On protecting the data privacy of large language models (llms): A survey	Not Available	2024
Privacy-Preserving Large Language Models: Mechanisms	Not Available	2024
Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent	Not Available	2025
Differential privacy for medical deep learning: methods, tradeoffs, and deployment implications	Not Available	2025
Multi agent based medical assistant for edge devices	GitHub	2025
Privacy in action: Towards realistic privacy mitigation and evaluation for llm-powered agents	GitHub	2025

4.3 Explainability & Transparency

Title	GitHub	Year
A trustworthy AI reality-check: the lack of transparency of artificial intelligence products in healthcare	Not Available	2024
The role of explainability in AI-supported medical decision-making	Not Available	2024
ASTRID–An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems	Not Available	2025
Explainable AI for medical data: Current methods, limitations, and future directions	Not Available	2025
MedAgent-Pro: Towards Evidence-Based Multi-Modal Medical Diagnosis via Reasoning Agentic Workflow	Not Available	2025
Program Synthesis Dialog Agents for Interactive Decision-Making	Not Available	2025
Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree	GitHub	2025

4.4 Adversarial Security & Threat Modeling

Title	GitHub	Year
Adversarial attacks on large language models in medicine	Not Available	2024
On the resilience of llm-based multi-agent collaboration with faulty agents	Not Available	2024
Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis	Not Available	2025
Emerging cyber attack risks of medical ai agents	Not Available	2025
Red-teaming llm multi-agent systems via communication attacks	Not Available	2025
TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems	Not Available	2025
Towards safe ai clinicians: A comprehensive study on large language model jailbreaking in healthcare	Not Available	2025

4.5 AI Governance & Systemic Safety

Title	GitHub	Year
Levels of autonomy and safety assurance for AI-Based clinical decision systems	Not Available	2023

4.6 Bias, Fairness & Accessibility

Title	GitHub	Year
Achieving health equity through conversational AI: A roadmap for design and implementation of inclusive chatbots in healthcare	Not Available	2024
Aligning Medical LLMs for Counterfactual Fairness	GitHub	2024
Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI	Not Available	2024
Actions speak louder than words: Agent decisions reveal implicit biases in language models	Not Available	2025
Balancing Fairness and Performance in Healthcare AI: A Gradient Reconciliation Approach	Not Available	2025
Bias-Aware Agent: Enhancing Fairness in AI-Driven Knowledge Retrieval	GitHub	2025
Conversational health agents: a personalized large language model-powered agent framework	GitHub	2025
Transforming healthcare delivery with conversational AI platforms	Not Available	2025

🚀 5. Evaluation

5.1 Benchmarks

Title	GitHub	Year
Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning	GitHub	2025
A Dataset of Clinically Generated Visual Questions and Answers About Radiology Images	Not Available	2023
Measuring Massive Multitask Language Understanding	GitHub	2023
MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation	GitHub	2023
MedDialog: A Large-scale Medical Dialogue Dataset	Not Available	2023
PathVQA: 30000+ Questions for Medical Visual Question Answering	GitHub	2023
PubMedQA: A Dataset for Biomedical Research Question Answering	GitHub	2023
FedAgentBench: Towards Automating Real-World Federated Medical Image Analysis with Server–Client LLM Agents	Not Available	2024
MMLU-Pro: A More Robust Benchmark for Multi-Task Language Understanding	GitHub	2024
MedQA-CS: Benchmarking Large Language Models’ Clinical Skills Using an AI-SCE Framework	GitHub	2024
3mdbench: Medical multimodal multi-agent dialogue benchmark	GitHub	2025
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare	GitHub	2025
Beyond Benchmarks: Dynamic, Automatic and Systematic Red-Teaming Agents for Trustworthy Medical Language Models	GitHub	2025
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks	GitHub	2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning	GitHub	2025
MedBrowseComp: Benchmarking Medical Deep Research and Computer Use	GitHub	2025
MedFact: A Large-scale Chinese Dataset for Evidence-based Medical Fact-Checking of LLM Responses	GitHub	2025
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding	GitHub	2025
Medresearcher-r1: Expert-level medical deep researcher via a knowledge-informed trajectory synthesis framework	GitHub	2025

5.2 Metrics

Title	GitHub	Year
Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum	Not Available	2023
What Disease Does This Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams	GitHub	2023
AI Agents in Clinical Medicine: A Systematic Review	Not Available	2025
Audited Reasoning Refinement: Fine-Tuning Language Models via LLM-Guided Step-Wise Evaluation and Correction	Not Available	2025
Discovering Pathology Rationale and Token Allocation for Efficient Multimodal Pathology Reasoning	Not Available	2025
Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine	Not Available	2025
Evaluating the accuracy of a state-of-the-art large language model for prediction of admissions from the emergency room	Not Available	2025
Geometry-preserving encoder/decoder in latent generative models	GitHub	2025
How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making?	Not Available	2025
Large language models in real-world clinical workflows: a systematic review of applications and implementation	Not Available	2025
MedAgentBench: Dataset for Benchmarking LLMs as Agents	GitHub	2025
Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation	Not Available	2025
Performance of Retrieval-Augmented Generation Large Language Models in Guideline-Concordant Prostate-Specific Antigen Testing: Comparative Study With Junior Clinicians	Not Available	2025
Real-World Evaluation of Large Language Models in Healthcare (RWE-LLM): A New Realm of AI Safety & Validation	Not Available	2025
Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes	Not Available	2025

5.3 Challenge & Discussion

Title	GitHub	Year
Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics	Not Available	2025
EHR-MCP: Real-world evaluation of clinical information retrieval by large language models via Model Context Protocol	Not Available	2025
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation	Not Available	2025
Position: Medical large language model benchmarks should prioritize construct validity	Not Available	2025
Rapidly benchmarking large language models for diagnosing comorbid patients: comparative study leveraging the LLM-as-a-judge method	Not Available	2025
The evaluation illusion of large language models in medicine	Not Available	2025

🚀 6. Communication & Collaboration Mechanisms

Title	GitHub	Year
CAMEL: Communicative agents for "mind" exploration of large language model society	Not Available	2023
MetaGPT: Meta programming for a multi-agent collaborative framework	GitHub	2023
RareAgents: Autonomous Multi-disciplinary Team for Rare Disease Diagnosis and Treatment	Not Available	2024
Bridging Clinical Narratives and ACR Appropriateness Guidelines: A Multi-Agent RAG System for Medical Imaging Decisions	GitHub	2025
Human-AI collaboration in healthcare: A review and research agenda	Not Available	2023
Medical robotics—Regulatory, ethical, and legal considerations for increasing levels of autonomy	Not Available	2023
The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care	GitHub	2023
ChatDev: Communicative agents for software development	GitHub	2024
Towards conversational diagnostic artificial intelligence	Not Available	2025
SMILE: Single-turn to multi-turn inclusive language expansion via ChatGPT for mental health support	Not Available	2024

🚀 7. Others

Title	GitHub	Year
Drugagent: Multi-agent large language model-based reasoning for drug-target interaction prediction	Not Available	2025
An Edge Based Multi-Agent Model for Improving Hospital Bed Management	Not Available	2023
DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services	Not Available	2025
Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation	GitHub	2025
M3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging	GitHub	2025
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning	Not Available	2025
Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination	Not Available	2025
Survey and improvement strategies for gene prioritization with large language models	Not Available	2025
MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility	Not Available	2025
Demo: Healthcare Agent Orchestrator (HAO) for Patient Summarization in Molecular Tumor Boards	Not Available	2025
Mediator-guided multi-agent collaboration among open-source models for medical decision-making	Not Available	2025
A co-evolving agentic AI system for medical imaging analysis	GitHub	2025
FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights	GitHub	2025
Meddxagent: A unified modular agent framework for explainable automatic differential diagnosis	GitHub	2025
AURA: A Multi-modal Medical Agent for Understanding, Reasoning and Annotation	GitHub	2025
GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation	GitHub	2025
PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning	GitHub	2025
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions	GitHub	2025
EH-Benchmark: Ophthalmic hallucination benchmark and agent-driven top-down traceable reasoning workflow	GitHub	2025
Medchat: A multi-agent framework for multimodal diagnosis with large language models	GitHub	2025
Autohealth: Advanced LLM-empowered wearable personalized medical butler for Parkinson’s disease management	Not Available	2024
TWIN-GPT: digital twins for clinical trials via large language model	Not Available	2024
A Proposed LLM-Based Supported Treatment Framework for Intracerebral Hemorrhage	Not Available	2025
Agentic AI for Clinical Decision Support: Real-Time Diagnosis, Triage, and Treatment Planning	Not Available	2025
Agentic Workflows in Healthcare: Advancing Clinical Efficiency through AI Integration	Not Available	2025
Developing an Artificial Intelligence Tool for Personalized Breast Cancer Treatment Plans based on the NCCN Guidelines	Not Available	2025
TxAgent: An AI agent for therapeutic reasoning across a universe of tools	GitHub	2025
Allocation of physician time in ambulatory practice: a time and motion study in 4 specialties	Not Available	2023
Tethered to the EHR: primary care physician workload assessment using EHR event log data and time-motion observations	Not Available	2023
The NCI Imaging Data Commons as a platform for reproducible research in computational pathology	Not Available	2023
At-cxr: Uncertainty-aware agentic triage for chest X-rays	GitHub	2025
In-Basket Message Volume in Primary Care: A Cross-sectional Analysis by Gender and Specialty	Not Available	2025
Modeling irregularly sampled clinical time series	GitHub	2023
Multimodal Models in Healthcare: Methods, Challenges, and Future Directions for Enhanced Clinical Decision Support	Not Available	2025
Visual-Conversational Interface for Evidence-Based Explanation of Diabetes Risk Prediction	GitHub	2025
Autonomous systems and artificial intelligence in healthcare transformation to 5P medicine–ethical challenges	Not Available	2023
Cognitive architectures for language agents	Not Available	2023
Generative agents: Interactive simulacra of human behavior	GitHub	2023
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning	Not Available	2023
ReAct: Synergizing reasoning and acting in language models	GitHub	2023
Upper processing stages of the perception–action cycle	Not Available	2023
A survey on large language model based autonomous agents	Not Available	2024
Tool learning with large language models: A survey	GitHub	2025

Citation

```bibtex
@misc{,
title={The Landscape of Medical Agents: A Survey},

}