| Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning |
GitHub |
evaluation |
| 3MDBench: Medical multimodal multi-agent dialogue benchmark |
GitHub |
evaluation |
| A co-evolving agentic AI system for medical imaging analysis |
GitHub |
other |
| A dual-agent collaboration framework based on LLMs for nursing robots to perform bimanual coordination tasks |
Not Available |
capability |
| A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation |
Not Available |
safety, evaluation |
| A hybrid reinforcement learning and knowledge graph framework for financial risk optimization in healthcare systems |
Not Available |
application |
| A Multi-Agent Approach to Neurological Clinical Reasoning |
Not Available |
capability, other |
| A Multimodal Multi-Agent Framework for Radiology Report Generation |
Not Available |
task, evaluation, other |
| A Proposed LLM-Based Supported Treatment Framework for Intracerebral Hemorrhage |
Not Available |
intro, capability, application |
| A Self-Evolving Framework for Multi-Agent Medical Consultation Based on Large Language Models |
Not Available |
capability, application |
| A two-stage proactive dialogue generator for efficient clinical information collection using large language model |
Not Available |
task, application |
| Actions speak louder than words: Agent decisions reveal implicit biases in language models |
Not Available |
safety |
| ADAgent: LLM agent for Alzheimer’s disease analysis with collaborative coordinator |
Not Available |
capability, other |
| Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model |
Not Available |
capability, other |
| Agentic AI for Clinical Decision Support: Real-Time Diagnosis, Triage, and Treatment Planning |
Not Available |
intro |
| Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge |
GitHub |
capability, task, application |
| Agentic Workflows in Healthcare: Advancing Clinical Efficiency through AI Integration |
Not Available |
intro |
| Agentic-AI Healthcare: Multilingual, Privacy-First Framework with MCP Agents |
GitHub |
capability, application, safety |
| AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots |
GitHub |
task |
| AgentMD: Empowering language agents for risk prediction with large-scale clinical tool learning |
GitHub |
capability, task, application |
| AI Agents in Clinical Medicine: A Systematic Review |
Not Available |
evaluation |
| AI chatbots as professional service agents: developing a professional identity |
Not Available |
capability, application |
| AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering |
GitHub |
capability, task, other |
| AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare |
GitHub |
evaluation |
| An active inference strategy for prompting reliable responses from large language models in medical practice |
Not Available |
capability |
| An Adaptive Multi-Agent LLM-Based Clinical Decision Support System Integrating Biomedical RAG and Web Intelligence |
GitHub |
application |
| An Agentic Model Context Protocol Framework for Medical Concept Standardization |
GitHub |
task |
| An agentic system for rare disease diagnosis with traceable reasoning |
Not Available |
task, application, other |
| Architecting Clinical Collaboration: Multi-Agent Reasoning Systems for Multimodal Medical VQA |
GitHub |
task, other |
| ASTRID–An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems |
Not Available |
safety |
| AT-CXR: Uncertainty-aware agentic triage for chest X-rays |
GitHub |
intro, other |
| Audited Reasoning Refinement: Fine-Tuning Language Models via LLM-Guided Step-Wise Evaluation and Correction |
Not Available |
evaluation |
| AURA: A Multi-modal Medical Agent for Understanding, Reasoning and Annotation |
GitHub |
other |
| Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery |
GitHub |
capability, task, application |
| Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent |
Not Available |
safety, other |
| Balancing Fairness and Performance in Healthcare AI: A Gradient Reconciliation Approach |
Not Available |
safety |
| Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics |
Not Available |
application |
| Beyond Benchmarks: Dynamic, Automatic and Systematic Red-Teaming Agents for Trustworthy Medical Language Models |
GitHub |
evaluation |
| Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics |
Not Available |
evaluation |
| Bridging Clinical Narratives and ACR Appropriateness Guidelines: A Multi-Agent RAG System for Medical Imaging Decisions |
GitHub |
task |
| Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions |
GitHub |
other |
| CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation |
GitHub |
intro, capability, application, other |
| CARE-AD: a multi-agent large language model framework for Alzheimer’s disease prediction using longitudinal clinical notes |
Not Available |
capability, task, other |
| CataractSurg-80K: Knowledge-Driven Benchmarking for Structured Reasoning in Ophthalmic Surgery Planning |
Not Available |
capability, application, other |
| Chatbot To Help Patients Understand Their Health |
GitHub |
application |
| ChatMyopia: An AI Agent for Pre-consultation Education in Primary Eye Care Settings |
Not Available |
capability, application, other |
| CoD, towards an interpretable medical agent using chain of diagnosis |
GitHub |
capability, application, safety, evaluation, other |
| Code Like Humans: A Multi-Agent Solution for Medical Coding |
GitHub |
application |
| Conversational health agents: a personalized large language model-powered agent framework |
GitHub |
safety |
| CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering |
Not Available |
application |
| Data Overdose? Time for a Quadruple Shot: Knowledge Graph Construction Using Enhanced Triple Extraction |
Not Available |
task |
| Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis |
Not Available |
safety |
| Developing an Artificial Intelligence Tool for Personalized Breast Cancer Treatment Plans based on the NCCN Guidelines |
Not Available |
intro, capability, other |
| Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology |
Not Available |
application, other |
| Differential privacy for medical deep learning: methods, tradeoffs, and deployment implications |
Not Available |
safety |
| Discovering Pathology Rationale and Token Allocation for Efficient Multimodal Pathology Reasoning |
Not Available |
evaluation |
| DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services |
Not Available |
task, other |
| Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning |
GitHub |
capability, application |
| DoctorAgent-RL: A multi-agent collaborative reinforcement learning system for multi-turn clinical dialogue |
GitHub |
task |
| Dr. Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian |
Not Available |
capability, task |
| DrugAgent: Multi-agent large language model-based reasoning for drug-target interaction prediction |
Not Available |
task |
| EH-Benchmark: Ophthalmic hallucination benchmark and agent-driven top-down traceable reasoning workflow |
GitHub |
other |
| EHR-MCP: Real-world evaluation of clinical information retrieval by large language models via model context protocol |
Not Available |
evaluation |
| Emerging cyber attack risks of medical AI agents |
Not Available |
safety |
| Enhancing diagnostic capability with multi-agents conversational large language models |
GitHub |
capability, task, application, other |
| Enhancing Medical Lung X-Ray Diagnosis Through Multi-Agent Vision-Language Model Collaboration |
Not Available |
capability, application |
| Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine |
Not Available |
evaluation |
| Evaluating the accuracy of a state-of-the-art large language model for prediction of admissions from the emergency room |
Not Available |
evaluation |
| Evaluating transparency in AI/ML model characteristics for FDA-reviewed medical devices |
Not Available |
safety |
| Explainable AI for medical data: Current methods, limitations, and future directions |
Not Available |
safety |
| EyecareGPT: Boosting comprehensive ophthalmology understanding with tailored dataset, benchmark and model |
GitHub |
application, other |
| FEAT: A multi-agent forensic AI system with domain-adapted large language model for automated cause-of-death analysis |
GitHub |
capability |
| Fine-tuning vision language models with graph-based knowledge for explainable medical image analysis |
Not Available |
capability |
| FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights |
GitHub |
other |
| GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation |
GitHub |
other |
| Geometry-preserving encoder/decoder in latent generative models |
GitHub |
evaluation |
| GMAT: Grounded Multi-agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification |
Not Available |
task |
| Haibu Mathematical-Medical Intelligent Agent: Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains |
Not Available |
capability |
| Demo: Healthcare Agent Orchestrator (HAO) for Patient Summarization in Molecular Tumor Boards |
Not Available |
task, other |
| Healthcare Agent: Eliciting the Power of Large Language Models for Medical Consultation |
Not Available |
capability, application |
| Image Segmentation Using Only “Better or Worse” Expert Feedback |
Not Available |
task |
| Improving Interactive Diagnostic Ability of a Large Language Model Agent Through Clinical Experience Learning |
GitHub |
capability, application |
| In-Basket Message Volume in Primary Care: A Cross-sectional Analysis by Gender and Specialty |
Not Available |
intro |
| KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs |
GitHub |
capability |
| Large language models in real-world clinical workflows: a systematic review of applications and implementation |
Not Available |
evaluation |
| Learning to be a doctor: Searching for effective medical agent architectures |
Not Available |
capability, application, other |
| Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation |
GitHub |
task, other |
| LINS: A general medical Q&A framework for enhancing the quality and credibility of LLM-generated responses |
GitHub |
capability |
| LLMs can simulate standardized patients via agent coevolution |
GitHub |
task, application |
| M3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging |
GitHub |
task |
| Magnetic Milli-Spinner for Robotic Endovascular Surgery |
Not Available |
task |
| MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration |
GitHub |
application, other |
| MDTeamGPT: A self-evolving LLM-based multi-agent framework for multi-disciplinary team medical consultation |
GitHub |
task, application |
| Measurement to Meaning: A Validity-Centered Framework for AI Evaluation |
Not Available |
evaluation |
| Med-TAMARA: Trust-Aware Multi-Agent Risk Assessment in Medical AI Dialogue |
Not Available |
capability, application |
| Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents |
GitHub |
capability, application |
| MedAgent-Pro: Towards Evidence-Based Multi-Modal Medical Diagnosis via Reasoning Agentic Workflow |
Not Available |
safety, evaluation, other |
| MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems |
GitHub |
capability, safety, evaluation |
| MedAgentBench: Dataset for Benchmarking LLMs as Agents |
GitHub |
evaluation |
| MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks |
GitHub |
evaluation |
| MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning |
GitHub |
evaluation |
| MedAgentSim: Self-evolving Multi-agent Simulations for Realistic Clinical Interactions |
GitHub |
task, other |
| MedBrowseComp: Benchmarking Medical Deep Research and Computer Use |
GitHub |
evaluation |
| MedChat: A multi-agent framework for multimodal diagnosis with large language models |
GitHub |
other |
| MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision |
Not Available |
capability, application |
| MedDxAgent: A unified modular agent framework for explainable automatic differential diagnosis |
GitHub |
other |
| MedFact: A Large-scale Chinese Dataset for Evidence-based Medical Fact-Checking of LLM Responses |
GitHub |
evaluation |
| MedHallu: A comprehensive benchmark for detecting medical hallucinations in large language models |
GitHub |
safety |
| Mediator-guided multi-agent collaboration among open-source models for medical decision-making |
Not Available |
task, other |
| Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation |
Not Available |
capability, task |
| Medical hallucinations in foundation models and their impact on healthcare |
GitHub |
safety |
| Position: Medical large language model benchmarks should prioritize construct validity |
Not Available |
evaluation |
| MedicalOS: An LLM Agent based Operating System for Digital Healthcare |
Not Available |
capability, application |
| MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph |
GitHub |
task |
| MedKGEval: A Knowledge Graph-Based Multi-Turn Evaluation Framework for Open-Ended Patient Interactions with Clinical LLMs |
Not Available |
task |
| MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models |
GitHub |
application |
| MedMMV: A controllable multimodal multi-agent framework for reliable and verifiable clinical reasoning |
Not Available |
task, safety, evaluation, other |
| MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility |
Not Available |
task, other |
| MedPAO: A Protocol-Driven Agent for Structuring Medical Reports |
GitHub |
capability, task, application, other |
| MedRAX: Medical reasoning agent for chest X-ray |
GitHub |
capability, application, other |
| MedRepBench: A Comprehensive Benchmark for Medical Report Interpretation |
Not Available |
capability, evaluation |
| MedResearcher-R1: Expert-level medical deep researcher via a knowledge-informed trajectory synthesis framework |
GitHub |
evaluation |
| MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering |
Not Available |
capability, task |
| MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding |
GitHub |
evaluation |
| MEGA-RAG: a retrieval-augmented generation framework with multi-evidence guided answer refinement for mitigating hallucinations of LLMs in public health |
Not Available |
safety |
| MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning |
Not Available |
task, other |
| MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling |
Not Available |
capability, task, other |
| MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs |
Not Available |
task, other |
| Multi agent based medical assistant for edge devices |
GitHub |
safety |
| Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation |
Not Available |
evaluation |
| Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICs |
Not Available |
capability, application |
| Multimodal Models in Healthcare: Methods, Challenges, and Future Directions for Enhanced Clinical Decision Support |
Not Available |
intro |
| NurseLLM: The First Specialized Language Model for Nursing |
Not Available |
capability |
| OAAgent: Multimodal LLM Agent for Predicting Knee Osteoarthritis Progression |
Not Available |
capability |
| OpenLens AI: Fully Autonomous Research Agent for Health Infomatics |
GitHub |
capability, application |
| PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning |
GitHub |
other |
| Pathfinder: A multi-modal multi-agent system for medical diagnostic decision-making applied to histopathology |
GitHub |
task |
| Patient-Zero: A Unified Framework for Real-Record-Free Patient Agent Generation |
Not Available |
capability, application, safety |
| Performance of Retrieval-Augmented Generation Large Language Models in Guideline-Concordant Prostate-Specific Antigen Testing: Comparative Study With Junior Clinicians |
Not Available |
evaluation |
| Privacy in action: Towards realistic privacy mitigation and evaluation for LLM-powered agents |
GitHub |
safety |
| Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models |
GitHub |
capability |
| Program Synthesis Dialog Agents for Interactive Decision-Making |
Not Available |
safety |
| Proof-of-TBI–Fine-Tuned Vision Language Model Consortium and OpenAI-o3 Reasoning LLM-Based Medical Diagnosis Support System for Mild Traumatic Brain Injury (TBI) Prediction |
Not Available |
capability, application |
| Rapidly benchmarking large language models for diagnosing comorbid patients: comparative study leveraging the LLM-as-a-judge method |
Not Available |
evaluation |
| Real-World Evaluation of Large Language Models in Healthcare (RWE-LLM): A New Realm of AI Safety & Validation |
Not Available |
evaluation |
| Red-teaming LLM multi-agent systems via communication attacks |
Not Available |
safety |
| Reducing Hallucinations and Trade-Offs in Responses in Generative AI Chatbots for Cancer Information: Development and Evaluation Study |
Not Available |
safety |
| ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents |
GitHub |
capability, application |
| Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination |
Not Available |
task |
| RxLens: Multi-Agent LLM-powered Scan and Order for Pharmacy |
Not Available |
capability, application, other |
| SCOPE: Speech-Guided COllaborative PErception Framework for Surgical Scene Segmentation |
Not Available |
task, other |
| Self-Assessment of Content, Pedagogy, and Technology Knowledge among Higher Education Academics in Bahrain |
Not Available |
capability |
| Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making |
Not Available |
capability |
| SmartState: An Automated Research Protocol Adherence System |
GitHub |
capability, application |
| SOLVE-Med: Specialized Orchestration for Leading Vertical Experts across Medical Specialties |
GitHub |
task |
| Standard Applicability Judgment and Cross-jurisdictional Reasoning: A RAG-based Framework for Medical Device Compliance |
Not Available |
capability |
| SurgRAW: Multi-agent workflow with chain-of-thought reasoning for surgical intelligence |
GitHub |
task |
| Survey and improvement strategies for gene prioritization with large language models |
Not Available |
task, other |
| Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA |
GitHub |
capability |
| TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems |
Not Available |
safety |
| The evaluation illusion of large language models in medicine |
Not Available |
evaluation |
| Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare |
GitHub |
capability, safety |
| Tool learning with large language models: A survey |
GitHub |
intro |
| Towards conversational diagnostic artificial intelligence |
Not Available |
task |
| Towards interpretable radiology report generation via concept bottlenecks using a multi-agentic RAG |
GitHub |
task |
| Towards safe AI clinicians: A comprehensive study on large language model jailbreaking in healthcare |
Not Available |
safety |
| Transforming healthcare delivery with conversational AI platforms |
Not Available |
safety |
| Tree-based RAG-Agent Recommendation System: A Case Study in Medical Test Data |
Not Available |
capability, task, application |
| Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree |
GitHub |
safety |
| Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes |
Not Available |
evaluation |
| TxAgent: An AI agent for therapeutic reasoning across a universe of tools |
GitHub |
intro, task, other |
| Using large language models for enhanced fraud analysis and detection in blockchain based health insurance claims |
GitHub |
capability, application |
| Vision-language model for report generation and outcome prediction in CT pulmonary angiogram |
GitHub |
capability, application |
| Visual-Conversational Interface for Evidence-Based Explanation of Diabetes Risk Prediction |
GitHub |
intro |
| When Avatars Have Personality: Effects on Engagement and Communication in Immersive Medical Training |
Not Available |
application |
| World Model for AI Autonomous Navigation in Mechanical Thrombectomy |
GitHub |
capability, task, application |
| Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning |
Not Available |
capability, application, other |
| Bias-Aware Agent: Enhancing Fairness in AI-Driven Knowledge Retrieval |
GitHub |
safety |
| EMR-AGENT: Automating Cohort and Feature Extraction from EMR Databases |
GitHub |
capability, task, application |
| ColaCare: Enhancing electronic health record modeling through large language model-driven multi-agent collaboration |
GitHub |
capability, application |
| A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making |
GitHub |
capability, application |
| MeNTi: Bridging medical calculator and LLM agent with nested tool calling |
GitHub |
capability, application |
| Med-GRIM: Enhanced Zero-Shot Medical VQA using prompt-embedded Multimodal Graph RAG |
GitHub |
application, other |
| How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? |
Not Available |
evaluation |