AI_for_biomedical_students

Intro to Applied AI for Medics & Biomedical Scientists — Course Syllabus

A short, practical course to familiarise medics and biomedical scientists (advanced undergrads & MSc students) with contemporary AI models, prompt engineering, and simple prototype building using no-code and low-code tools.

Course Overview

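This course introduces biomedical students to applied AI: what contemporary models do and where they fail, how to prompt LLMs productively, and how to build and evaluate small prototypes.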

Level: Undergraduate/Postgraduate
Prerequisites: Basic programming knowledge (any language)
Duration: 12 weeks (3 hours/week lecture + 2 hours/week lab)

Course Instructor: Soumya Banerjee

Course Website: https://neelsoumya.github.io/teaching_web_development/

Course Materials

Course content and materials can be found in the following files:


Course summary (target: MSc / advanced undergrads)

Length: 6–8 weeks (one 2-hour session per week plus 2–3 hours of practical work), or stretch to a 10-week term with extra labs.
Prerequisites: basic statistics (mean/SD, sensitivity/specificity) is helpful; no programming required.

Overall learning goals

  1. Understand what contemporary AI models (language models, classifiers, simple vision models) do and their limitations.
  2. Be able to interact productively with LLMs (prompt engineering) and no-code AI tools.
  3. Build tiny prototypes: (a) a text prompt / summariser; (b) a simple classifier (using a public, de-identified dataset) and a browser demo (Gradio/Streamlit).
  4. Learn basic evaluation and ethical checks (bias, data leakage, privacy).

Weekly syllabus (compact 8-week version)

Week 0 — Intro & motivation

Week 1 — “What is an AI model?” (no code)

Week 2 — Prompt engineering & evaluation (practical, no code)

Week 3 — No-code model demos & sharing

Week 4 — Lightweight Python: Colab + scikit-learn (first model)

Week 5 — Small app: wrap model in a Gradio/Streamlit demo

Week 6 — Evaluation, pitfalls, and reproducibility

Week 7 — Mini project sprint presentations

Week 8 — Reflection, ethics deep-dive, next steps and resources



Datasets (use only publicly available / de-identified / synthetic data)

Suggested safe starters (small, well-documented, easy to load):

Rule: avoid any dataset containing identifiable patient info unless the students/team have IRB/ethics approval and you follow local governance.
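
Where no suitable public dataset is at hand, synthetic data is one way to stay within the rule above. The following is a minimal sketch using scikit-learn's make_classification; the sample size, feature count, and seed are arbitrary choices for illustration, and the generated values carry no clinical meaning.

```python
from sklearn.datasets import make_classification

# Generate a small, fully synthetic binary classification dataset.
# All numbers here are arbitrary illustrative choices, not course requirements.
X, y = make_classification(
    n_samples=200,     # number of synthetic "patients"
    n_features=5,      # number of numeric features per row
    n_informative=3,   # features that actually relate to the label
    n_redundant=0,     # no linear combinations of informative features
    random_state=0,    # fixed seed so every student gets the same data
)

print("Feature matrix shape:", X.shape)
print("Labels present:", sorted(set(y.tolist())))
```

Because the seed is fixed, the same dataset is regenerated on every run, which also helps with reproducibility.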


Project ideas (easy → ambitious)

  1. Prompt & critique (easy, solo)

    • Task: craft prompts to summarise a short clinical note in plain English + produce bullet-point management suggestions.
    • Deliverable: 3 different prompts, comparison table of accuracy, one paragraph reflection on hallucinations/limits.
    • Tools: OpenAI Playground or similar. (Learn Prompting)
  2. Diagnostic checklist assistant (easy, group)

    • Task: build a Gradio demo that takes a short patient vignette and returns likely differential diagnoses (from a simple ruleset / LLM).
    • Focus: prompt design and evaluation, not clinical decision-making.
    • Deliverable: live demo on Hugging Face Spaces + short accuracy note. (Hugging Face)
  3. Basic classifier + demo (medium)

    • Task: using scikit-learn and the breast cancer dataset, train a logistic regression and expose it via a Gradio app that lets users input numeric features and see predicted class + probability.
    • Deliverable: Colab notebook + Gradio demo link; short report covering performance (accuracy, sensitivity, specificity). (scikit-learn load_breast_cancer)
  4. Text mining for literature (medium)

    • Task: given a small corpus of abstracts (public domain), build an LLM prompt pipeline to extract PICO elements (Population, Intervention, Comparison, Outcome). Evaluate precision/recall via manual labels.
    • Deliverable: Colab notebook + evaluation table.
  5. Signal classification (ambitious)

    • Task: use an open PhysioNet dataset (e.g., arrhythmia subset if accessible) to make a simple time-series classifier (feature extraction → RF/LogReg). Emphasise preprocessing + reproducibility. (PhysioNet)
  6. Clinical language assistant (advanced, ethics focus)

    • Task: build a prompt-based assistant that turns clinical notes into patient-facing explanations. Evaluate for accuracy, privacy leakage, and readability. Deliver a risk-log and mitigation plan.
  7. Reproducible demo + README (all levels)

    • Requirement: every project must include a README with: how to run, what data was used (and license), evaluation, and ethical considerations. Deploy to a Space or provide a Colab link.
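
For project 4, precision and recall against manual labels can be computed with plain Python sets. This is a minimal sketch, assuming each extracted PICO element is stored as an (abstract id, element type, text) tuple and counted as correct only on an exact match. The function name precision_recall and the example tuples are made up for illustration; real projects may prefer a softer matching rule.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall between two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical manual labels for one abstract (illustrative only).
gold = {
    ("abs1", "Population", "adults with type 2 diabetes"),
    ("abs1", "Intervention", "metformin"),
    ("abs1", "Comparison", "placebo"),
    ("abs1", "Outcome", "hba1c at 12 weeks"),
}

# Hypothetical LLM output for the same abstract (illustrative only).
predicted = {
    ("abs1", "Population", "adults with type 2 diabetes"),
    ("abs1", "Intervention", "metformin"),
    ("abs1", "Outcome", "weight loss"),
}

precision, recall = precision_recall(predicted, gold)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

Here two of the three predicted items match the manual labels, and two of the four labelled items were found.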

Tiny Colab-ready Python starter (copy/paste into Colab)

A safe, short demo using scikit-learn’s breast cancer dataset (no protected health information, PHI). Students can run it and see the training result and accuracy within a few seconds.

# Run in Google Colab
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load data
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("\nFull report:\n", classification_report(y_test, y_pred, target_names=data.target_names))
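
Project 3 asks for sensitivity and specificity, which the starter above does not print directly. The sketch below derives both from the confusion matrix. It assumes malignant is treated as the positive class; in this dataset malignant is label 0 and benign is label 1, so the label order is passed explicitly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Same data, split, and model as the starter above.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42, stratify=data.target
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Assumption: malignant (label 0) is the "positive" class, benign (label 1) the "negative".
# labels=[1, 0] orders rows and columns as (negative, positive),
# so ravel() returns tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[1, 0]).ravel()

sensitivity = tp / (tp + fn)  # share of malignant cases correctly flagged
specificity = tn / (tn + fp)  # share of benign cases correctly cleared

print(f"Sensitivity (malignant detected): {sensitivity:.3f}")
print(f"Specificity (benign cleared):     {specificity:.3f}")
```

Swapping the label order would swap the two numbers, so the report should state which class was treated as positive.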

Tip: wrap model.predict_proba into a tiny Gradio app in ~5–10 lines to make a browser UI (see Gradio docs). (Gradio intro)
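
One possible shape for that wrapper is sketched below. It is a simplification rather than a reference implementation: the model is trained on only the first four features so the form stays small, and the names FEATURES, predict, and build_demo are illustrative. It assumes Gradio is installed (for example via pip install gradio in Colab) and uses gr.Interface, gr.Number, and gr.Label.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()

# Simplification (not in the starter above): use the first four features only,
# so the form has four inputs rather than thirty.
FEATURES = [str(name) for name in data.feature_names[:4]]
X_small = data.data[:, :4]
model = LogisticRegression(max_iter=1000)
model.fit(X_small, data.target)


def predict(mean_radius, mean_texture, mean_perimeter, mean_area):
    """Return {class name: probability} for one set of feature values."""
    row = [[mean_radius, mean_texture, mean_perimeter, mean_area]]
    probs = model.predict_proba(row)[0]
    return {str(name): float(p) for name, p in zip(data.target_names, probs)}


def build_demo():
    # Imported here so that predict() can be tried without Gradio installed.
    import gradio as gr

    # One numeric input per feature, pre-filled with the dataset mean.
    inputs = [
        gr.Number(label=name, value=round(float(X_small[:, i].mean()), 2))
        for i, name in enumerate(FEATURES)
    ]
    return gr.Interface(
        fn=predict,
        inputs=inputs,
        outputs=gr.Label(num_top_classes=2),
        title="Breast cancer classifier (teaching demo)",
    )


# Quick check of the prediction function without starting a UI.
print(predict(14.0, 19.0, 92.0, 655.0))

# To start the browser UI in Colab, run: build_demo().launch()
```

Keeping predict separate from the UI code means the same function can be checked in a notebook cell before anything is deployed.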


Assessment & grading suggestions


Ethics, safety & teaching notes (very important)



References