Landmark-of-medical-agent

🚀 The Landscape of Medical Agents: A Survey


Overall Landscape


🌟 Overview

This is the official repository for the survey paper: The Landscape of Medical Agents. It is a comprehensive, systematically organized resource for medical agent research, tracking the latest research progress, applications, and technical developments of AI agents in medicine and healthcare. The collection spans the whole ecosystem, from foundational technical capabilities to real-world clinical deployment, and is intended as a research map for medical AI researchers, clinical practitioners, and system developers.

🤝 Thanks

If you find this project useful and inspiring, we would greatly appreciate it if you could give us a Star to show your support! Your support means a lot to us and encourages us to keep improving and developing this project.

📖 Keywords

Medical Agents, Clinical Workflows, Safety, Governance and Evaluation

🔥 News

[2025/11/30] We released the initial GitHub repo!

🌟 Contributing

We will try to keep this list updated. If you find any errors or missing papers, please don’t hesitate to open an issue or pull request. Please follow the instructions in CONTRIBUTING.md if you want to submit one. For any other questions, please join our WeChat group.

🤝 Main Contacts

🌟 Table of Contents

✨ Latest Papers

🚀 Year-2025

Title GitHub Sections
Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning GitHub evaluation
3MDBench: Medical multimodal multi-agent dialogue benchmark GitHub evaluation
A co-evolving agentic AI system for medical imaging analysis GitHub other
A dual-agent collaboration framework based on llms for nursing robots to perform bimanual coordination tasks Not Available capability
A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation Not Available safety, evaluation
A hybrid reinforcement learning and knowledge graph framework for financial risk optimization in healthcare systems Not Available application
A Multi-Agent Approach to Neurological Clinical Reasoning Not Available capability, other
A Multimodal Multi-Agent Framework for Radiology Report Generation Not Available task, evaluation, other
A Proposed LLM-Based Supported Treatment Framework for Intracerebral Hemorrhage Not Available intro, capability, application
A Self-Evolving Framework for Multi-Agent Medical Consultation Based on Large Language Models Not Available capability, application
A two-stage proactive dialogue generator for efficient clinical information collection using large language model Not Available task, application
Actions speak louder than words: Agent decisions reveal implicit biases in language models Not Available safety
ADAgent: LLM agent for Alzheimer’s disease analysis with collaborative coordinator Not Available capability, other
Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model Not Available capability, other
Agentic AI for Clinical Decision Support: Real-Time Diagnosis, Triage, and Treatment Planning Not Available intro
Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge GitHub capability, task, application
Agentic Workflows in Healthcare: Advancing Clinical Efficiency through AI Integration Not Available intro
Agentic-AI Healthcare: Multilingual, Privacy-First Framework with MCP Agents GitHub capability, application, safety
AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots GitHub task
AgentMD: Empowering language agents for risk prediction with large-scale clinical tool learning GitHub capability, task, application
AI Agents in Clinical Medicine: A Systematic Review Not Available evaluation
AI chatbots as professional service agents: developing a professional identity Not Available capability, application
AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering GitHub capability, task, other
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare GitHub evaluation
An active inference strategy for prompting reliable responses from large language models in medical practice Not Available capability
An Adaptive Multi-Agent LLM-Based Clinical Decision Support System Integrating Biomedical RAG and Web Intelligence GitHub application
An Agentic Model Context Protocol Framework for Medical Concept Standardization GitHub task
An agentic system for rare disease diagnosis with traceable reasoning Not Available task, application, other
Architecting Clinical Collaboration: Multi-Agent Reasoning Systems for Multimodal Medical VQA GitHub task, other
ASTRID–An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems Not Available safety
AT-CXR: Uncertainty-aware agentic triage for chest X-rays GitHub intro, other
Audited Reasoning Refinement: Fine-Tuning Language Models via LLM-Guided Step-Wise Evaluation and Correction Not Available evaluation
AURA: A Multi-modal Medical Agent for Understanding, Reasoning and Annotation GitHub other
Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery GitHub capability, task, application
Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent Not Available safety, other
Balancing Fairness and Performance in Healthcare AI: A Gradient Reconciliation Approach Not Available safety
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics Not Available application
Beyond Benchmarks: Dynamic, Automatic and Systematic Red-Teaming Agents for Trustworthy Medical Language Models GitHub evaluation
Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics Not Available evaluation
Bridging Clinical Narratives and ACR Appropriateness Guidelines: A Multi-Agent RAG System for Medical Imaging Decisions GitHub task
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions GitHub other
CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation GitHub intro, capability, application, other
CARE-AD: a multi-agent large language model framework for Alzheimer’s disease prediction using longitudinal clinical notes Not Available capability, task, other
CataractSurg-80K: Knowledge-Driven Benchmarking for Structured Reasoning in Ophthalmic Surgery Planning Not Available capability, application, other
Chatbot To Help Patients Understand Their Health GitHub application
ChatMyopia: An AI Agent for Pre-consultation Education in Primary Eye Care Settings Not Available capability, application, other
CoD, towards an interpretable medical agent using chain of diagnosis GitHub capability, application, safety, evaluation, other
Code Like Humans: A Multi-Agent Solution for Medical Coding GitHub application
Conversational health agents: a personalized large language model-powered agent framework GitHub safety
CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering Not Available application
Data Overdose? Time for a Quadruple Shot: Knowledge Graph Construction Using Enhanced Triple Extraction Not Available task
Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis Not Available safety
Developing an Artificial Intelligence Tool for Personalized Breast Cancer Treatment Plans based on the NCCN Guidelines Not Available intro, capability, other
Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology Not Available application, other
Differential privacy for medical deep learning: methods, tradeoffs, and deployment implications Not Available safety
Discovering Pathology Rationale and Token Allocation for Efficient Multimodal Pathology Reasoning Not Available evaluation
DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services Not Available task, other
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning GitHub capability, application
DoctorAgent-RL: A multi-agent collaborative reinforcement learning system for multi-turn clinical dialogue GitHub task
Dr. Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian Not Available capability, task
DrugAgent: Multi-agent large language model-based reasoning for drug-target interaction prediction Not Available task
EH-Benchmark: Ophthalmic hallucination benchmark and agent-driven top-down traceable reasoning workflow GitHub other
EHR-MCP: Real-world evaluation of clinical information retrieval by large language models via model context protocol Not Available evaluation
Emerging cyber attack risks of medical AI agents Not Available safety
Enhancing diagnostic capability with multi-agents conversational large language models GitHub capability, task, application, other
Enhancing Medical Lung X-Ray Diagnosis Through Multi-Agent Vision-Language Model Collaboration Not Available capability, application
Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine Not Available evaluation
Evaluating the accuracy of a state-of-the-art large language model for prediction of admissions from the emergency room Not Available evaluation
Evaluating transparency in AI/ML model characteristics for FDA-reviewed medical devices Not Available safety
Explainable AI for medical data: Current methods, limitations, and future directions Not Available safety
EyecareGPT: Boosting comprehensive ophthalmology understanding with tailored dataset, benchmark and model GitHub application, other
FEAT: A multi-agent forensic AI system with domain-adapted large language model for automated cause-of-death analysis GitHub capability
Fine-tuning vision language models with graph-based knowledge for explainable medical image analysis Not Available capability
FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights GitHub other
GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation GitHub other
Geometry-preserving encoder/decoder in latent generative models GitHub evaluation
GMAT: Grounded Multi-agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification Not Available task
Haibu Mathematical-Medical Intelligent Agent: Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains Not Available capability
Demo: Healthcare Agent Orchestrator (HAO) for Patient Summarization in Molecular Tumor Boards Not Available task, other
Healthcare Agent: Eliciting the Power of Large Language Models for Medical Consultation Not Available capability, application
Image Segmentation Using Only “Better or Worse” Expert Feedback Not Available task
Improving Interactive Diagnostic Ability of a Large Language Model Agent Through Clinical Experience Learning GitHub capability, application
In-Basket Message Volume in Primary Care: A Cross-sectional Analysis by Gender and Specialty Not Available intro
KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs GitHub capability
Large language models in real-world clinical workflows: a systematic review of applications and implementation Not Available evaluation
Learning to be a doctor: Searching for effective medical agent architectures Not Available capability, application, other
Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation GitHub task, other
LINS: A general medical Q&A framework for enhancing the quality and credibility of LLM-generated responses GitHub capability
LLMs can simulate standardized patients via agent coevolution GitHub task, application
M3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging GitHub task
Magnetic Milli-Spinner for Robotic Endovascular Surgery Not Available task
MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration GitHub application, other
MDTeamGPT: A self-evolving LLM-based multi-agent framework for multi-disciplinary team medical consultation GitHub task, application
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation Not Available evaluation
Med-TAMARA: Trust-Aware Multi-Agent Risk Assessment in Medical AI Dialogue Not Available capability, application
Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents GitHub capability, application
MedAgent-Pro: Towards Evidence-Based Multi-Modal Medical Diagnosis via Reasoning Agentic Workflow Not Available safety, evaluation, other
MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems GitHub capability, safety, evaluation
MedAgentBench: Dataset for Benchmarking LLMs as Agents GitHub evaluation
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks GitHub evaluation
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning GitHub evaluation
MedAgentSim: Self-evolving Multi-agent Simulations for Realistic Clinical Interactions GitHub task, other
MedBrowseComp: Benchmarking Medical Deep Research and Computer Use GitHub evaluation
MedChat: A multi-agent framework for multimodal diagnosis with large language models GitHub other
MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision Not Available capability, application
MedDxAgent: A unified modular agent framework for explainable automatic differential diagnosis GitHub other
MedFact: A Large-scale Chinese Dataset for Evidence-based Medical Fact-Checking of LLM Responses GitHub evaluation
MedHallu: A comprehensive benchmark for detecting medical hallucinations in large language models GitHub safety
Mediator-guided multi-agent collaboration among open-source models for medical decision-making Not Available task, other
Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation Not Available capability, task
Medical hallucinations in foundation models and their impact on healthcare GitHub safety
Position: Medical large language model benchmarks should prioritize construct validity Not Available evaluation
MedicalOS: An LLM Agent based Operating System for Digital Healthcare Not Available capability, application
MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph GitHub task
MedKGEval: A Knowledge Graph-Based Multi-Turn Evaluation Framework for Open-Ended Patient Interactions with Clinical LLMs Not Available task
MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models GitHub application
MedMMV: A controllable multimodal multi-agent framework for reliable and verifiable clinical reasoning Not Available task, safety, evaluation, other
MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility Not Available task, other
MedPAO: A Protocol-Driven Agent for Structuring Medical Reports GitHub capability, task, application, other
MedRAX: Medical reasoning agent for chest X-ray GitHub capability, application, other
MedRepBench: A Comprehensive Benchmark for Medical Report Interpretation Not Available capability, evaluation
MedResearcher-R1: Expert-level medical deep researcher via a knowledge-informed trajectory synthesis framework GitHub evaluation
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering Not Available capability, task
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding GitHub evaluation
MEGA-RAG: a retrieval-augmented generation framework with multi-evidence guided answer refinement for mitigating hallucinations of LLMs in public health Not Available safety
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning Not Available task, other
MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling Not Available capability, task, other
MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs Not Available task, other
Multi agent based medical assistant for edge devices GitHub safety
Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation Not Available evaluation
Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICs Not Available capability, application
Multimodal Models in Healthcare: Methods, Challenges, and Future Directions for Enhanced Clinical Decision Support Not Available intro
NurseLLM: The First Specialized Language Model for Nursing Not Available capability
OAAgent: Multimodal LLM Agent for Predicting Knee Osteoarthritis Progression Not Available capability
OpenLens AI: Fully Autonomous Research Agent for Health Infomatics GitHub capability, application
PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning GitHub other
Pathfinder: A multi-modal multi-agent system for medical diagnostic decision-making applied to histopathology GitHub task
Patient-Zero: A Unified Framework for Real-Record-Free Patient Agent Generation Not Available capability, application, safety
Performance of Retrieval-Augmented Generation Large Language Models in Guideline-Concordant Prostate-Specific Antigen Testing: Comparative Study With Junior Clinicians Not Available evaluation
Privacy in action: Towards realistic privacy mitigation and evaluation for llm-powered agents GitHub safety
Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models GitHub capability
Program Synthesis Dialog Agents for Interactive Decision-Making Not Available safety
Proof-of-TBI–Fine-Tuned Vision Language Model Consortium and OpenAI-o3 Reasoning LLM-Based Medical Diagnosis Support System for Mild Traumatic Brain Injury (TBI) Prediction Not Available capability, application
Rapidly benchmarking large language models for diagnosing comorbid patients: comparative study leveraging the LLM-as-a-judge method Not Available evaluation
Real-World Evaluation of Large Language Models in Healthcare (RWE-LLM): A New Realm of AI Safety & Validation Not Available evaluation
Red-teaming LLM multi-agent systems via communication attacks Not Available safety
Reducing Hallucinations and Trade-Offs in Responses in Generative AI Chatbots for Cancer Information: Development and Evaluation Study Not Available safety
ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents GitHub capability, application
Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination Not Available task
RxLens: Multi-Agent LLM-powered Scan and Order for Pharmacy Not Available capability, application, other
SCOPE: Speech-Guided COllaborative PErception Framework for Surgical Scene Segmentation Not Available task, other
Self-Assessment of Content, Pedagogy, and Technology Knowledge among Higher Education Academics in Bahrain Not Available capability
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making Not Available capability
SmartState: An Automated Research Protocol Adherence System GitHub capability, application
SOLVE-Med: Specialized Orchestration for Leading Vertical Experts across Medical Specialties GitHub task
Standard Applicability Judgment and Cross-jurisdictional Reasoning: A RAG-based Framework for Medical Device Compliance Not Available capability
SurgRAW: Multi-agent workflow with chain-of-thought reasoning for surgical intelligence GitHub task
Survey and improvement strategies for gene prioritization with large language models Not Available task, other
Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA GitHub capability
TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems Not Available safety
The evaluation illusion of large language models in medicine Not Available evaluation
Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare GitHub capability, safety
Tool learning with large language models: A survey GitHub intro
Towards conversational diagnostic artificial intelligence Not Available task
Towards interpretable radiology report generation via concept bottlenecks using a multi-agentic RAG GitHub task
Towards safe AI clinicians: A comprehensive study on large language model jailbreaking in healthcare Not Available safety
Transforming healthcare delivery with conversational AI platforms Not Available safety
Tree-based RAG-Agent Recommendation System: A Case Study in Medical Test Data Not Available capability, task, application
Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree GitHub safety
Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes Not Available evaluation
TxAgent: An AI agent for therapeutic reasoning across a universe of tools GitHub intro, task, other
Using large language models for enhanced fraud analysis and detection in blockchain based health insurance claims GitHub capability, application
Vision-language model for report generation and outcome prediction in CT pulmonary angiogram GitHub capability, application
Visual-Conversational Interface for Evidence-Based Explanation of Diabetes Risk Prediction GitHub intro
When Avatars Have Personality: Effects on Engagement and Communication in Immersive Medical Training Not Available application
World Model for AI Autonomous Navigation in Mechanical Thrombectomy GitHub capability, task, application
Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning Not Available capability, application, other
Bias-Aware Agent: Enhancing Fairness in AI-Driven Knowledge Retrieval GitHub safety
EMR-AGENT: Automating Cohort and Feature Extraction from EMR Databases GitHub capability, task, application
ColaCare: Enhancing electronic health record modeling through large language model-driven multi-agent collaboration GitHub capability, application
A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making GitHub capability, application
MeNTi: Bridging medical calculator and LLM agent with nested tool calling GitHub capability, application
Med-GRIM: Enhanced Zero-Shot Medical VQA using prompt-embedded Multimodal Graph RAG GitHub application, other
How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? Not Available evaluation

🚀 Year-2024

Title GitHub Sections  
A demonstration of adaptive collaboration of large language models for medical decision-making GitHub capability, application  
A survey on large language model based autonomous agents Not Available intro  
Achieving health equity through conversational AI: A roadmap for design and implementation of inclusive chatbots in healthcare Not Available safety  
Adaptive Reasoning and Acting in Medical Language Agents Not Available capability, application  
Adversarial attacks on large language models in medicine Not Available safety  
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents GitHub capability, task, application  
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments GitHub intro, capability, application, evaluation, other  
Agentic LLM workflows for generating patient-friendly medical reports GitHub capability, task, application, other
AgentiGraph: An interactive knowledge graph platform for LLM-based chatbots utilizing private data GitHub capability, application
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator GitHub capability, task, application, evaluation  
Aligning Medical LLMs for Counterfactual Fairness GitHub safety  
ArgMed-Agents: explainable clinical decision reasoning with LLM disscusion via argumentation schemes Not Available capability, application  
AutoHealth: Advanced LLM-empowered wearable personalized medical butler for Parkinson’s disease management Not Available other
Autonomous artificial intelligence agents for clinical decision making in oncology Not Available capability, other  
Benchmarking Large Language Models on Communicative Medical Coaching: A Dataset and a Novel System GitHub capability, application  
Beyond direct diagnosis: LLM-based multi-specialist agent consultation for automatic diagnosis Not Available capability  
ChatDev: Communicative agents for software development GitHub task
ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning GitHub intro, capability, application, other  
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World GitHub capability, evaluation  
CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting Not Available capability, application, other
Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments GitHub capability, task, other  
EHRAgent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records GitHub capability, task, application
Enhancing diagnostic accuracy through multi-agent conversations: using large language models to mitigate cognitive bias Not Available application  
Enhancing LLMs for impression generation in radiology reports through a multi-agent system GitHub capability, task, application, other
Ethical and regulatory challenges of large language models in medicine Not Available safety  
Evaluating Large Language Models as Agents in the Clinic Not Available capability, evaluation  
Exploring LLM multi-agents for ICD coding Not Available application
Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment Not Available capability  
Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI Not Available safety  
GuidelineGuard: An Agentic Framework for Medical Note Evaluation with Guideline Adherence Not Available capability, application  
IMAS: A comprehensive agentic approach to rural healthcare delivery GitHub capability, application
Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini Not Available capability, task, application  
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning GitHub capability, application  
Integration of multi-source medical data for medical diagnosis question answering GitHub capability, application  
IryoNLP at MEDIQA-CORR 2024: Tackling the medical error detection & correction task on the shoulders of medical agents Not Available capability, task, application
KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis Not Available capability, application  
Knowledge-infused LLM-powered conversational health agent: A case study for diabetes patients Not Available capability
Large Language Model-Enhanced Interactive Agent for Public Education on Newborn Auricular Deformities Not Available capability, application  
LLM-based framework for administrative task automation in healthcare Not Available capability, application
LLM-MedQA: Enhancing medical question answering through case studies in large language models Not Available capability, application
MAGDA: Multi-agent guideline-driven diagnostic assistance Not Available capability, task, application  
MALADE: orchestration of LLM-powered agents with retrieval augmented generation for pharmacovigilance GitHub intro, capability, application, other  
MDAgents: An adaptive collaboration of LLMs for medical decision-making GitHub capability, task, application
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning GitHub capability, task, other  
MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration Not Available capability
MEDCO: Medical education copilots based on a multi-agent framework Not Available capability, task, application
MedChain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking GitHub capability, application, evaluation  
MedGen: An Explainable Multi-Agent Architecture for Clinical Decision Support through Multisource Knowledge Fusion Not Available capability, application  
MedHalu: Hallucinations in responses to healthcare queries by large language models Not Available safety
MedQA-CS: Benchmarking Large Language Models’ Clinical Skills Using an AI-SCE Framework GitHub evaluation  
Mitigating cognitive biases in clinical decision-making through multi-agent conversations using large language models: simulation study Not Available capability, application  
Mitigating hallucinations in large language models via self-refinement-enhanced knowledge retrieval Not Available safety  
MMedAgent: Learning to use medical tools with multi-modal agent GitHub capability, application, other
MMLU-Pro: A More Robust Benchmark for Multi-Task Language Understanding GitHub evaluation  
Natural Language Programming in Medicine: Administering Evidence Based Clinical Workflows with Autonomous Agents Powered by Generative Large Language Models Not Available capability, application  
On protecting the data privacy of large language models (LLMs): A survey Not Available safety
On the resilience of LLM-based multi-agent collaboration with faulty agents Not Available safety
PIORS: Personalized intelligent outpatient reception based on large language model with multi-agents medical scenario simulation GitHub capability, task, application
Polaris: A safety-focused LLM constellation architecture for healthcare Not Available capability, application, evaluation
Privacy-Preserving Large Language Models: Mechanisms Not Available safety  
Advancing healthcare automation: Multi-agent system for medical necessity justification Not Available capability, task, application  
RareAgents: Advancing Rare Disease Care through LLM-Empowered Multi-disciplinary Team Not Available capability, application  
RareAgents: Autonomous Multi-disciplinary Team for Rare Disease Diagnosis and Treatment Not Available task, other  
Regulator-manufacturer AI agents modeling: Mathematical feedback-driven multi-agent LLM framework Not Available capability, task, application
REMONI: An autonomous system integrating wearables and multimodal large language models for enhanced remote health monitoring Not Available capability, application
Rx Strategist: Prescription verification using LLM agents system Not Available capability, task, application, other
Simulated patient systems are intelligent when powered by large language model-based AI agents Not Available capability, application
SMILE: Single-turn to multi-turn inclusive language expansion via ChatGPT for mental health support GitHub task
Society of medical simplifiers GitHub capability, task, application
SurgBox: Agent-driven operating room sandbox with surgery copilot Not Available capability, task, application
T-Agent: A term-aware agent for medical dialogue generation Not Available capability, application
The role of explainability in AI-supported medical decision-making Not Available safety
Towards anatomy education with generative AI-based virtual assistants in immersive virtual reality environments Not Available application
Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm Not Available capability, application, evaluation
Towards next-generation medical agent: How o1 is reshaping decision-making in medical scenarios Not Available capability, other
A trustworthy AI reality-check: the lack of transparency of artificial intelligence products in healthcare Not Available safety
TWIN-GPT: digital twins for clinical trials via large language model Not Available other
UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt–A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis GitHub capability, other
Zodiac: A cardiologist-level LLM framework for multi-agent diagnostics Not Available capability
OpenAI o1 System Card GitHub capability
FedAgentBench: Towards Automating Real-World Federated Medical Image Analysis with Server–Client LLM Agents Not Available evaluation
Evaluating large language models as agents in the clinic   Not available evaluation

🚀 Year-2023

Title GitHub Sections
A reinforcement learning approach for VQA validation: An application to diabetic macular edema grading Not Available capability
Adaptive multi-agent deep reinforcement learning for timely healthcare interventions Not Available capability, application
Asynchronous decentralized federated lifelong learning for landmark localization in medical imaging Not Available capability
Beyond memorization: Violating privacy via inference with large language models Not Available safety
Camel: Communicative agents for “mind” exploration of large language model society Not Available task
Clinically-inspired multi-agent transformers for disease trajectory forecasting from multimodal data GitHub capability, application
Cognitive architectures for language agents Not Available intro
Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum Not Available evaluation
Deep Imitation Learning for Automated Drop-In Gamma Probe Manipulation Not Available task
Diaggpt: An llm-based and multi-agent dialogue system with automatic topic management for flexible task-oriented dialogue GitHub application
Dspy: Compiling declarative language model calls into self-improving pipelines GitHub task
Federated machine learning, privacy-enhancing technologies, and data protection laws in medical research: scoping review Not Available safety
Generative agents: Interactive simulacra of human behavior GitHub intro
Interactive medical image segmentation with self-adaptive confidence calibration GitHub task
Large language models as agents in the clinic Not Available application
MetaGPT: Meta programming for a multi-agent collaborative framework GitHub task
Navigation Through Endoluminal Channels Using Q-Learning Not Available capability, task
PROFSA: SELF-SUPERVISED POCKET PRETRAINING VIA PROTEIN FRAGMENT-SURROUNDINGS ALIGN GitHub capability
Reflexion: Language agents with verbal reinforcement learning GitHub capability
Td-mpc2: Scalable, robust world models for continuous control GitHub task
Temporally-extended prompts optimization for sam in interactive medical image segmentation Not Available capability, task
The NCI Imaging Data Commons as a platform for reproducible research in computational pathology Not Available intro
Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis Not Available capability, application

🚀 Earlier

Title GitHub Sections
“My Nose is Running.” “Are You Also Coughing?”: Building a Medical Diagnosis Agent with Interpretable Inquiry Logics GitHub capability, application
A Flexible Schema-Guided Dialogue Management Framework: From Friendly Peer to Virtual Standardized Cancer Patient GitHub capability, application
Building an ASR Error Robust Spoken Virtual Patient System in a Highly Class-Imbalanced Scenario Without Speech Data Not Available capability, application
Constitutional ai: Harmlessness from ai feedback GitHub capability
MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation GitHub evaluation
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning Not Available intro
Multi-agent searching system for medical information Not Available capability
React: Synergizing reasoning and acting in language models GitHub intro
Scalable Online Disease Diagnosis via Multi-Model-Fused Actor-Critic Reinforcement Learning Not Available capability, application
MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering GitHub capability, evaluation
A grounded well-being conversational agent with multiple interaction modes: Preliminary results Not Available capability, application
Adaptable image quality assessment using meta-reinforcement learning of task amenability GitHub capability
An Edge Based Multi-Agent Model for Improving Hospital Bed Management Not Available task
Cross Modality 3D Navigation Using Reinforcement Learning and Neural Style Transfer Not Available capability
Extracting Training Data from Large Language Models GitHub safety
Human-AI collaboration in healthcare: A review and research agenda Not Available task
Levels of autonomy and safety assurance for AI-Based clinical decision systems Not Available safety
Measuring Massive Multitask Language Understanding GitHub evaluation
Autonomous systems and artificial intelligence in healthcare transformation to 5P medicine–ethical challenges Not Available intro
Boundary-aware supervoxel-level iteratively refined interactive 3d image segmentation with multi-agent reinforcement learning Not Available task
MedDialog: A Large-scale Medical Dialogue Dataset Not Available evaluation
Medical visual question answering via conditional reasoning GitHub task
PathVQA: 30000+ Questions for Medical Visual Question Answering GitHub evaluation
PubMedQA: A Dataset for Biomedical Research Question Answering GitHub evaluation
A Dataset of Clinically Generated Visual Questions and Answers About Radiology Images Not Available evaluation
Modeling irregularly sampled clinical time series GitHub intro
The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care GitHub task
Medical robotics—Regulatory, ethical, and legal considerations for increasing levels of autonomy Not Available task
Tethered to the EHR: primary care physician workload assessment using EHR event log data and time-motion observations Not Available intro
Allocation of physician time in ambulatory practice: a time and motion study in 4 specialties Not Available intro
Assessing electronic note quality using the physician documentation quality instrument (PDQI-9) Not Available task
Privacy by design: The 7 foundational principles Not Available safety
Upper processing stages of the perception–action cycle Not Available intro
What Disease Does This Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams GitHub evaluation

✨ Papers by Category

🚀 1. Capability

1.1 Planning

Title GitHub Year
Evaluating Large Language Models as Agents in the Clinic Not Available 2024
MedPAO: A Protocol-Driven Agent for Structuring Medical Reports GitHub 2025
Adaptable image quality assessment using meta-reinforcement learning of task amenability GitHub 2023
World Model for AI Autonomous Navigation in Mechanical Thrombectomy GitHub 2025
Polaris: A safety-focused llm constellation architecture for healthcare Not Available 2024
MedChain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking GitHub 2024
Rx strategist: Prescription verification using llm agents system Not Available 2024
A Flexible Schema-Guided Dialogue Management Framework: From Friendly Peer to Virtual Standardized Cancer Patient GitHub 2023
Surgbox: Agent-driven operating room sandbox with surgery copilot Not Available 2024
MedicalOS: An LLM Agent based Operating System for Digital Healthcare Not Available 2025
“My Nose is Running.” “Are You Also Coughing?”: Building a Medical Diagnosis Agent with Interpretable Inquiry Logics GitHub 2023
Cross Modality 3D Navigation Using Reinforcement Learning and Neural Style Transfer Not Available 2023
Scalable Online Disease Diagnosis via Multi-Model-Fused Actor-Critic Reinforcement Learning Not Available 2023
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator GitHub 2024
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning GitHub 2024
Llm-medqa: Enhancing medical question answering through case studies in large language models Not Available 2024
Medco: Medical education copilots based on a multi-agent framework Not Available 2024
Mitigating cognitive biases in clinical decision-making through multi-agent conversations using large language models: simulation study Not Available 2024
Regulator-manufacturer AI agents modeling: Mathematical feedback-driven multi-agent LLM framework Not Available 2024
Society of medical simplifiers GitHub 2024
Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm Not Available 2024
Towards next-generation medical agent: How o1 is reshaping decision-making in medical scenarios Not Available 2024
A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making GitHub 2025
A dual-agent collaboration framework based on llms for nursing robots to perform bimanual coordination tasks Not Available 2025
Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge GitHub 2025
Agentic-AI Healthcare: Multilingual, Privacy-First Framework with MCP Agents GitHub 2025
Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery GitHub 2025
Colacare: Enhancing electronic health record modeling through large language model-driven multi-agent collaboration GitHub 2025
Haibu Mathematical-Medical Intelligent Agent: Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains Not Available 2025
Improving Interactive Diagnostic Ability of a Large Language Model Agent Through Clinical Experience Learning GitHub 2025
Learning to be a doctor: Searching for effective medical agent architectures Not Available 2025
MeNTi: Bridging medical calculator and LLM agent with nested tool calling GitHub 2025
OpenLens AI: Fully Autonomous Research Agent for Health Infomatics GitHub 2025
Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA GitHub 2025
Tree-based RAG-Agent Recommendation System: A Case Study in Medical Test Data Not Available 2025
AI chatbots as professional service agents: developing a professional identity Not Available 2025
A reinforcement learning approach for VQA validation: An application to diabetic macular edema grading Not Available 2023
Building an ASR Error Robust Spoken Virtual Patient System in a Highly Class-Imbalanced Scenario Without Speech Data Not Available 2023
MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering GitHub 2023
Multi-agent searching system for medical information Not Available 2023
A demonstration of adaptive collaboration of large language models for medical decision-making GitHub 2024
Agentic llm workflows for generating patient-friendly medical reports GitHub 2024
Agentigraph: An interactive knowledge graph platform for llm-based chatbots utilizing private data GitHub 2024
Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini Not Available 2024
Piors: Personalized intelligent outpatient reception based on large language model with multi-agents medical scenario simulation GitHub 2024
SmartState: An Automated Research Protocol Adherence System GitHub 2025

1.2 Tool Use

Title GitHub Year
ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning GitHub 2024
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning GitHub 2025
KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs GitHub 2025
Iryonlp at mediqa-corr 2024: Tackling the medical error detection & correction task on the shoulders of medical agents Not Available 2024
Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models GitHub 2025
Cxr-agent: Vision-language models for chest x-ray interpretation with uncertainty aware radiology reporting Not Available 2024
Enhancing llms for impression generation in radiology reports through a multi-agent system GitHub 2024
GuidelineGuard: An Agentic Framework for Medical Note Evaluation with Guideline Adherence Not Available 2024
Medrax: Medical reasoning agent for chest x-ray GitHub 2025
Mmedagent: Learning to use medical tools with multi-modal agent GitHub 2024
Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents GitHub 2025
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning GitHub 2024
AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering GitHub 2025
NurseLLM: The First Specialized Language Model for Nursing Not Available 2025
Autonomous artificial intelligence agents for clinical decision making in oncology Not Available 2024
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World GitHub 2024
Llm-based framework for administrative task automation in healthcare Not Available 2024
A Multi-Agent Approach to Neurological Clinical Reasoning Not Available 2025
Adagent: Llm agent for alzheimer’s disease analysis with collaborative coordinator Not Available 2025
ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents GitHub 2025
Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments GitHub 2024
Large Language Model-Enhanced Interactive Agent for Public Education on Newborn Auricular Deformities Not Available 2024
Natural Language Programming in Medicine: Administering Evidence Based Clinical Workflows with Autonomous Agents Powered by Generative Large Language Models Not Available 2024
Agentmd: Empowering language agents for risk prediction with large-scale clinical tool learning GitHub 2025
MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling Not Available 2025
Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICs Not Available 2025

1.3 Memory

Title GitHub Year
Patient-Zero: A Unified Framework for Real-Record-Free Patient Agent Generation Not Available 2025
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents GitHub 2024
MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration Not Available 2024
Ehragent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records GitHub 2024
Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis Not Available 2023
EMR-AGENT: Automating Cohort and Feature Extraction from EMR Databases GitHub 2025
Mdagents: An adaptive collaboration of llms for medical decision-making GitHub 2024
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments GitHub 2024
Knowledge-infused llm-powered conversational health agent: A case study for diabetes patients Not Available 2024
A grounded well-being conversational agent with multiple interaction modes: Preliminary results Not Available 2023
RareAgents: Advancing Rare Disease Care through LLM-Empowered Multi-disciplinary Team Not Available 2024
Healthcare Agent: Eliciting the Power of Large Language Models for Medical Consultation Not Available 2025

1.4 Self-Improvement

Title GitHub Year
Self-Assessment of Content, Pedagogy, and Technology Knowledge among Higher Education Academics in Bahrain Not Available 2025
Constitutional ai: Harmlessness from ai feedback GitHub 2023
A Self-Evolving Framework for Multi-Agent Medical Consultation Based on Large Language Models Not Available 2025
Adaptive Reasoning and Acting in Medical Language Agents Not Available 2024
Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment Not Available 2024
Reflexion: Language agents with verbal reinforcement learning GitHub 2023
MALADE: orchestration of LLM-powered agents with retrieval augmented generation for pharmacovigilance GitHub 2024
Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation Not Available 2025
PROFSA: SELF-SUPERVISED POCKET PRETRAINING VIA PROTEIN FRAGMENT-SURROUNDINGS ALIGN GitHub 2023

1.5 Reasoning

Title GitHub Year
Proof-of-TBI–Fine-Tuned Vision Language Model Consortium and OpenAI-o3 Reasoning LLM-Based Medical Diagnosis Support System for Mild Traumatic Brain Injury (TBI) Prediction Not Available 2025
OpenAI o1 System Card GitHub 2024
Beyond direct diagnosis: LLM-based multi-specialist agent consultation for automatic diagnosis Not Available 2024
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering Not Available 2025
KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis Not Available 2024
MAGDA: Multi-agent guideline-driven diagnostic assistance Not Available 2024
Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning Not Available 2025
Cod, towards an interpretable medical agent using chain of diagnosis GitHub 2025
Asynchronous decentralized federated lifelong learning for landmark localization in medical imaging Not Available 2023
Temporally-extended prompts optimization for sam in interactive medical image segmentation Not Available 2023
Integration of multi-source medical data for medical diagnosis question answering GitHub 2024
MedGen: An Explainable Multi-Agent Architecture for Clinical Decision Support through Multisource Knowledge Fusion Not Available 2024
Zodiac: A cardiologist-level llm framework for multi-agent diagnostics Not Available 2024
An active inference strategy for prompting reliable responses from large language models in medical practice Not Available 2025
CARE-AD: a multi-agent large language model framework for Alzheimer’s disease prediction using longitudinal clinical notes Not Available 2025
Feat: A multi-agent forensic ai system with domain-adapted large language model for automated cause-of-death analysis GitHub 2025
Fine-tuning vision language models with graph-based knowledge for explainable medical image analysis Not Available 2025
Advancing healthcare automation: Multi-agent system for medical necessity justification Not Available 2024
ArgMed-Agents: explainable clinical decision reasoning with LLM disscusion via argumentation schemes Not Available 2024
Imas: A comprehensive agentic approach to rural healthcare delivery GitHub 2024
Simulated patient systems are intelligent when powered by large language model-based AI agents Not Available 2024
UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt–A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis GitHub 2024
CataractSurg-80K: Knowledge-Driven Benchmarking for Structured Reasoning in Ophthalmic Surgery Planning Not Available 2025
Dr. Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian Not Available 2025
Enhancing Medical Lung X-Ray Diagnosis Through Multi-Agent Vision-Language Model Collaboration Not Available 2025
Enhancing diagnostic capability with multi-agents conversational large language models GitHub 2025
LINS: A general medical Q&A framework for enhancing the quality and credibility of LLM-generated responses GitHub 2025
OAAgent: Multimodal LLM Agent for Predicting Knee Osteoarthritis Progression Not Available 2025
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making Not Available 2025
Standard Applicability Judgment and Cross-jurisdictional Reasoning: A RAG-based Framework for Medical Device Compliance Not Available 2025

1.6 Perception

Title GitHub Year
Navigation Through Endoluminal Channels Using Q-Learning Not Available 2023
Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model Not Available 2025
ChatMyopia: An AI Agent for Pre-consultation Education in Primary Eye Care Settings Not Available 2025
Vision-language model for report generation and outcome prediction in CT pulmonary angiogram GitHub 2025
MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision Not Available 2025
MedRepBench: A Comprehensive Benchmark for Medical Report Interpretation Not Available 2025
T-agent: A term-aware agent for medical dialogue generation Not Available 2024

1.7 Others (continual learning, uncertainty)

Title GitHub Year
CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation GitHub 2025
Using large language models for enhanced fraud analysis and detection in blockchain based health insurance claims GitHub 2025
MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems GitHub 2025
Benchmarking Large Language Models on Communicative Medical Coaching: A Dataset and a Novel System GitHub 2024
Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare GitHub 2025

🚀 2. Atomic Function

2.1 Basic Technology Empowerment

Title GitHub Year
AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots GitHub 2025
Data Overdose? Time for a Quadruple Shot: Knowledge Graph Construction Using Enhanced Triple Extraction Not Available 2025
MedKGEval: A Knowledge Graph-Based Multi-Turn Evaluation Framework for Open-Ended Patient Interactions with Clinical LLMs Not Available 2025
MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph GitHub 2025
GMAT: Grounded Multi-agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification Not Available 2025
Pathfinder: A multi-modal multi-agent system for medical diagnostic decision-making applied to histopathology GitHub 2025
Boundary-aware supervoxel-level iteratively refined interactive 3d image segmentation with multi-agent reinforcement learning Not Available 2023
Interactive medical image segmentation with self-adaptive confidence calibration GitHub 2023
Image Segmentation Using Only “Better or Worse” Expert Feedback Not Available 2025
SCOPE: Speech-Guided COllaborative PErception Framework for Surgical Scene Segmentation Not Available 2025
Assessing electronic note quality using the physician documentation quality instrument (PDQI-9) Not Available 2023

2.2 Core Diagnostic & Therapeutic Assistance

Title GitHub Year
Dspy: Compiling declarative language model calls into self-improving pipelines GitHub 2023
A two-stage proactive dialogue generator for efficient clinical information collection using large language model Not Available 2025
Doctoragent-rl: A multi-agent collaborative reinforcement learning system for multi-turn clinical dialogue GitHub 2025
Llms can simulate standardized patients via agent coevolution GitHub 2025
MedAgentSim: Self-evolving Multi-agent Simulations for Realistic Clinical Interactions GitHub 2025
Medical visual question answering via conditional reasoning GitHub 2023
Architecting Clinical Collaboration: Multi-Agent Reasoning Systems for Multimodal Medical VQA GitHub 2025
SOLVE-Med: Specialized Orchestration for Leading Vertical Experts across Medical Specialties GitHub 2025
Deep Imitation Learning for Automated Drop-In Gamma Probe Manipulation Not Available 2023
Td-mpc2: Scalable, robust world models for continuous control GitHub 2023
Magnetic Milli-Spinner for Robotic Endovascular Surgery Not Available 2025
Surgraw: Multi-agent workflow with chain-of-thought reasoning for surgical intelligence GitHub 2025

2.3 Workflow & Documentation Optimization

Title GitHub Year
A Multimodal Multi-Agent Framework for Radiology Report Generation Not Available 2025
MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs Not Available 2025
Towards interpretable radiology report generation via concept bottlenecks using a multi-agentic rag GitHub 2025
An Agentic Model Context Protocol Framework for Medical Concept Standardization GitHub 2025

🚀 3. Application

3.1 Intake & Clinical Dialogue

Title GitHub Year
Chatbot To Help Patients Understand Their Health GitHub 2025
Diaggpt: An llm-based and multi-agent dialogue system with automatic topic management for flexible task-oriented dialogue GitHub 2023

3.2 Virtual MDT Teams & Multimodal Reasoning

Title GitHub Year
An agentic system for rare disease diagnosis with traceable reasoning Not Available 2025
CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering Not Available 2025
Eyecaregpt: Boosting comprehensive ophthalmology understanding with tailored dataset, benchmark and model GitHub 2025
Med-GRIM: Enhanced Zero-Shot Medical VQA using prompt-embedded Multimodal Graph RAG GitHub 2025
MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration GitHub 2025
Mdteamgpt: A self-evolving llm-based multi-agent framework for multi-disciplinary team medical consultation GitHub 2025
MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models GitHub 2025

3.3 Treatment Procedures

Title GitHub Year
An Adaptive Multi-Agent LLM-Based Clinical Decision Support System Integrating Biomedical RAG and Web Intelligence GitHub 2025
Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology Not Available 2025

3.4 Chronic Disease Management & Prescription Safety

Title GitHub Year
RxLens: Multi-Agent LLM-powered Scan and Order for Pharmacy Not Available 2025
Adaptive multi-agent deep reinforcement learning for timely healthcare interventions Not Available 2023
Remoni: An autonomous system integrating wearables and multimodal large language models for enhanced remote health monitoring Not Available 2024
Clinically-inspired multi-agent transformers for disease trajectory forecasting from multimodal data GitHub 2023
Med-TAMARA: Trust-Aware Multi-Agent Risk Assessment in Medical AI Dialogue Not Available 2025

3.5 Documentation, Coding & Knowledge Infrastructure

Title GitHub Year
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics Not Available 2025
Exploring llm multi-agents for icd coding Not Available 2024
Code Like Humans: A Multi-Agent Solution for Medical Coding GitHub 2025

3.6 Simulation & Support Systems

Title GitHub Year
Towards anatomy education with generative AI-based virtual assistants in immersive virtual reality environments Not Available 2024
When Avatars Have Personality: Effects on Engagement and Communication in Immersive Medical Training Not Available 2025

3.7 Regulation, Payer Workflows & Administrative Automation

Title GitHub Year
Large language models as agents in the clinic Not Available 2023
Enhancing diagnostic accuracy through multi-agent conversations: using large language models to mitigate cognitive bias Not Available 2024
A hybrid reinforcement learning and knowledge graph framework for financial risk optimization in healthcare systems Not Available 2025

🚀 4. Safety

4.1 Medical Hallucination

Title GitHub Year
Ethical and regulatory challenges of large language models in medicine Not Available 2024
Medhalu: Hallucinations in responses to healthcare queries by large language models Not Available 2024
Mitigating hallucinations in large language models via self-refinement-enhanced knowledge retrieval Not Available 2024
A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation Not Available 2025
Evaluating transparency in AI/ML model characteristics for FDA-reviewed medical devices Not Available 2025
MEGA-RAG: a retrieval-augmented generation framework with multi-evidence guided answer refinement for mitigating hallucinations of LLMs in public health Not Available 2025
Medhallu: A comprehensive benchmark for detecting medical hallucinations in large language models GitHub 2025
Medical hallucinations in foundation models and their impact on healthcare GitHub 2025
Medmmv: A controllable multimodal multi-agent framework for reliable and verifiable clinical reasoning Not Available 2025
Reducing Hallucinations and Trade-Offs in Responses in Generative AI Chatbots for Cancer Information: Development and Evaluation Study Not Available 2025

4.2 Privacy & Data-Security

Title GitHub Year
Beyond memorization: Violating privacy via inference with large language models Not Available 2023
Extracting Training Data from Large Language Models GitHub 2023
Federated machine learning, privacy-enhancing technologies, and data protection laws in medical research: scoping review Not Available 2023
Privacy by design: The 7 foundational principles Not Available 2023
On protecting the data privacy of large language models (llms): A survey Not Available 2024
Privacy-Preserving Large Language Models: Mechanisms Not Available 2024
Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent Not Available 2025
Differential privacy for medical deep learning: methods, tradeoffs, and deployment implications Not Available 2025
Multi agent based medical assistant for edge devices GitHub 2025
Privacy in action: Towards realistic privacy mitigation and evaluation for llm-powered agents GitHub 2025

4.3 Explainability & Transparency

Title GitHub Year
A trustworthy AI reality-check: the lack of transparency of artificial intelligence products in healthcare Not Available 2024
The role of explainability in AI-supported medical decision-making Not Available 2024
ASTRID–An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems Not Available 2025
Explainable AI for medical data: Current methods, limitations, and future directions Not Available 2025
MedAgent-Pro: Towards Evidence-Based Multi-Modal Medical Diagnosis via Reasoning Agentic Workflow Not Available 2025
Program Synthesis Dialog Agents for Interactive Decision-Making Not Available 2025
Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree GitHub 2025

4.4 Adversarial Security & Threat Modeling

Title GitHub Year
Adversarial attacks on large language models in medicine Not Available 2024
On the resilience of llm-based multi-agent collaboration with faulty agents Not Available 2024
Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis Not Available 2025
Emerging cyber attack risks of medical ai agents Not Available 2025
Red-teaming llm multi-agent systems via communication attacks Not Available 2025
TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems Not Available 2025
Towards safe ai clinicians: A comprehensive study on large language model jailbreaking in healthcare Not Available 2025

4.5 AI Governance & Systemic Safety

Title GitHub Year
Levels of autonomy and safety assurance for AI-Based clinical decision systems Not Available 2023

4.6 Bias, Fairness & Accessibility

Title GitHub Year
Achieving health equity through conversational AI: A roadmap for design and implementation of inclusive chatbots in healthcare Not Available 2024
Aligning Medical LLMs for Counterfactual Fairness GitHub 2024
Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI Not Available 2024
Actions speak louder than words: Agent decisions reveal implicit biases in language models Not Available 2025
Balancing Fairness and Performance in Healthcare AI: A Gradient Reconciliation Approach Not Available 2025
Bias-Aware Agent: Enhancing Fairness in AI-Driven Knowledge Retrieval GitHub 2025
Conversational health agents: a personalized large language model-powered agent framework GitHub 2025
Transforming healthcare delivery with conversational AI platforms Not Available 2025

🚀 5. Evaluation

5.1 Benchmarks

Title GitHub Year
Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning GitHub 2025
A Dataset of Clinically Generated Visual Questions and Answers About Radiology Images Not Available 2023
Measuring Massive Multitask Language Understanding GitHub 2023
MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation GitHub 2023
MedDialog: A Large-scale Medical Dialogue Dataset Not Available 2023
PathVQA: 30000+ Questions for Medical Visual Question Answering GitHub 2023
PubMedQA: A Dataset for Biomedical Research Question Answering GitHub 2023
FedAgentBench: Towards Automating Real-World Federated Medical Image Analysis with Server–Client LLM Agents Not Available 2024
MMLU-Pro: A More Robust Benchmark for Multi-Task Language Understanding GitHub 2024
MedQA-CS: Benchmarking Large Language Models’ Clinical Skills Using an AI-SCE Framework GitHub 2024
3MDBench: Medical multimodal multi-agent dialogue benchmark GitHub 2025
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare GitHub 2025
Beyond Benchmarks: Dynamic, Automatic and Systematic Red-Teaming Agents for Trustworthy Medical Language Models GitHub 2025
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks GitHub 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning GitHub 2025
MedBrowseComp: Benchmarking Medical Deep Research and Computer Use GitHub 2025
MedFact: A Large-scale Chinese Dataset for Evidence-based Medical Fact-Checking of LLM Responses GitHub 2025
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding GitHub 2025
MedResearcher-R1: Expert-level medical deep researcher via a knowledge-informed trajectory synthesis framework GitHub 2025

5.2 Metrics

Title GitHub Year
Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum Not Available 2023
What Disease Does This Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams GitHub 2023
AI Agents in Clinical Medicine: A Systematic Review Not Available 2025
Audited Reasoning Refinement: Fine-Tuning Language Models via LLM-Guided Step-Wise Evaluation and Correction Not Available 2025
Discovering Pathology Rationale and Token Allocation for Efficient Multimodal Pathology Reasoning Not Available 2025
Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine Not Available 2025
Evaluating the accuracy of a state-of-the-art large language model for prediction of admissions from the emergency room Not Available 2025
Geometry-preserving encoder/decoder in latent generative models GitHub 2025
How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? Not Available 2025
Large language models in real-world clinical workflows: a systematic review of applications and implementation Not Available 2025
MedAgentBench: Dataset for Benchmarking LLMs as Agents GitHub 2025
Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation Not Available 2025
Performance of Retrieval-Augmented Generation Large Language Models in Guideline-Concordant Prostate-Specific Antigen Testing: Comparative Study With Junior Clinicians Not Available 2025
Real-World Evaluation of Large Language Models in Healthcare (RWE-LLM): A New Realm of AI Safety & Validation Not Available 2025
Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes Not Available 2025

5.3 Challenge & Discussion

Title GitHub Year
Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics Not Available 2025
EHR-MCP: Real-world evaluation of clinical information retrieval by large language models via model context protocol Not Available 2025
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation Not Available 2025
Position: Medical large language model benchmarks should prioritize construct validity Not Available 2025
Rapidly benchmarking large language models for diagnosing comorbid patients: comparative study leveraging the LLM-as-a-judge method Not Available 2025
The evaluation illusion of large language models in medicine Not Available 2025

🚀 6. Communication & Collaboration Mechanisms

Title GitHub Year
CAMEL: Communicative agents for "mind" exploration of large language model society Not Available 2023
MetaGPT: Meta programming for a multi-agent collaborative framework GitHub 2023
RareAgents: Autonomous Multi-disciplinary Team for Rare Disease Diagnosis and Treatment Not Available 2024
Bridging Clinical Narratives and ACR Appropriateness Guidelines: A Multi-Agent RAG System for Medical Imaging Decisions GitHub 2025
Human-AI collaboration in healthcare: A review and research agenda Not Available 2023
Medical robotics—Regulatory, ethical, and legal considerations for increasing levels of autonomy Not Available 2023
The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care GitHub 2023
ChatDev: Communicative agents for software development GitHub 2024
Towards conversational diagnostic artificial intelligence Not Available 2025
SMILE: Single-turn to multi-turn inclusive language expansion via ChatGPT for mental health support Not Available 2024

🚀 7. Others

Title GitHub Year
DrugAgent: Multi-agent large language model-based reasoning for drug-target interaction prediction Not Available 2025
An Edge Based Multi-Agent Model for Improving Hospital Bed Management Not Available 2023
DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services Not Available 2025
Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation GitHub 2025
M3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging GitHub 2025
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning Not Available 2025
Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination Not Available 2025
Survey and improvement strategies for gene prioritization with large language models Not Available 2025
MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility Not Available 2025
Demo: Healthcare Agent Orchestrator (HAO) for Patient Summarization in Molecular Tumor Boards Not Available 2025
Mediator-guided multi-agent collaboration among open-source models for medical decision-making Not Available 2025
A co-evolving agentic AI system for medical imaging analysis GitHub 2025
FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights GitHub 2025
MedDxAgent: A unified modular agent framework for explainable automatic differential diagnosis GitHub 2025
AURA: A Multi-modal Medical Agent for Understanding, Reasoning and Annotation GitHub 2025
GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation GitHub 2025
PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning GitHub 2025
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions GitHub 2025
EH-Benchmark: Ophthalmic hallucination benchmark and agent-driven top-down traceable reasoning workflow GitHub 2025
MedChat: A multi-agent framework for multimodal diagnosis with large language models GitHub 2025
AutoHealth: Advanced LLM-empowered wearable personalized medical butler for Parkinson’s disease management Not Available 2024
TWIN-GPT: digital twins for clinical trials via large language model Not Available 2024
A Proposed LLM-Based Supported Treatment Framework for Intracerebral Hemorrhage Not Available 2025
Agentic AI for Clinical Decision Support: Real-Time Diagnosis, Triage, and Treatment Planning Not Available 2025
Agentic Workflows in Healthcare: Advancing Clinical Efficiency through AI Integration Not Available 2025
Developing an Artificial Intelligence Tool for Personalized Breast Cancer Treatment Plans based on the NCCN Guidelines Not Available 2025
TxAgent: An AI agent for therapeutic reasoning across a universe of tools GitHub 2025
Allocation of physician time in ambulatory practice: a time and motion study in 4 specialties Not Available 2023
Tethered to the EHR: primary care physician workload assessment using EHR event log data and time-motion observations Not Available 2023
The NCI Imaging Data Commons as a platform for reproducible research in computational pathology Not Available 2023
AT-CXR: Uncertainty-aware agentic triage for chest X-rays GitHub 2025
In-Basket Message Volume in Primary Care: A Cross-sectional Analysis by Gender and Specialty Not Available 2025
Modeling irregularly sampled clinical time series GitHub 2023
Multimodal Models in Healthcare: Methods, Challenges, and Future Directions for Enhanced Clinical Decision Support Not Available 2025
Visual-Conversational Interface for Evidence-Based Explanation of Diabetes Risk Prediction GitHub 2025
Autonomous systems and artificial intelligence in healthcare transformation to 5P medicine–ethical challenges Not Available 2023
Cognitive architectures for language agents Not Available 2023
Generative agents: Interactive simulacra of human behavior GitHub 2023
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning Not Available 2023
ReAct: Synergizing reasoning and acting in language models GitHub 2023
Upper processing stages of the perception–action cycle Not Available 2023
A survey on large language model based autonomous agents Not Available 2024
Tool learning with large language models: A survey GitHub 2025

Citation

```bibtex
@misc{,
title={The Landscape of Medical Agents: A Survey},

}