This document describes the complete data flow for Google Calendar events and Notability PDF notes through the Medallion Architecture.
Both sources are ingested into Bronze, enriched in Silver, and synced into PostgreSQL, where each Note can reference its Lesson through `lessonId`.
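The diagram below names several object-key templates, for example `calendar/events/{year}/{month}/{eventId}.json` and `notability/Priveles/{subject}/{student}/{file}.pdf`. The following is a minimal TypeScript sketch of how such keys could be built and parsed. The function names, the use of UTC, and the zero-padded month are assumptions made for illustration; only the templates themselves come from this document.

```typescript
// Hypothetical helpers: names and padding rules are illustrative assumptions.
// Only the key templates are taken from the diagram.

interface NotabilityPath {
  subject: string;
  student: string;
  file: string; // file name without the .pdf extension
}

/** Bronze key for an event: calendar/events/{year}/{month}/{eventId}.json */
function calendarBronzeKey(eventId: string, start: Date): string {
  const year = start.getUTCFullYear();
  const monthNumber = start.getUTCMonth() + 1; // getUTCMonth() is zero-based
  // Assumption: months are zero-padded ("03"); the diagram does not specify this.
  const month = (monthNumber < 10 ? "0" : "") + monthNumber;
  return `calendar/events/${year}/${month}/${eventId}.json`;
}

/** Silver key: same layout under calendar/processed/ instead of calendar/events/ */
function calendarSilverKey(bronzeKey: string): string {
  return bronzeKey.replace(/^calendar\/events\//, "calendar/processed/");
}

/** Parses notability/Priveles/{subject}/{student}/{file}.pdf, or returns null. */
function parseNotabilityKey(key: string): NotabilityPath | null {
  const match = /^notability\/Priveles\/([^/]+)\/([^/]+)\/([^/]+)\.pdf$/.exec(key);
  if (match === null) {
    return null;
  }
  return {
    subject: String(match[1]),
    student: String(match[2]),
    file: String(match[3]),
  };
}

/** Bronze metadata key: notability/metadata/{subject}/{student}/{file}.json */
function notabilityMetadataKey(path: NotabilityPath): string {
  return `notability/metadata/${path.subject}/${path.student}/${path.file}.json`;
}
```

Under these assumptions, an event `abc123` starting on 5 March 2024 (UTC) maps to `calendar/events/2024/03/abc123.json` in Bronze and `calendar/processed/2024/03/abc123.json` in Silver. Keys that do not match the Notability template return `null` so the caller can decide how to handle them.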
flowchart TB
subgraph Sources["📁 Source Layer"]
GoogleCal["Google Calendar API<br/>[email protected]"]
NotabilityPDFs["Notability PDFs<br/>MinIO: bronze-education/<br/>notability/Priveles/{subject}/{student}/{file}.pdf"]
end
subgraph BronzeCal["🥉 Bronze: Calendar Events"]
CalConnector["GoogleCalendarConnector<br/>Fetches events via API<br/>Extracts: id, summary, start,<br/>end, location, attendees"]
CalIngester["GoogleCalendarIngester<br/>Stores per event<br/>Path: calendar/events/<br/>{year}/{month}/{eventId}.json"]
CalBronzeStorage["Bronze Calendar Storage<br/>- Individual event JSON files<br/>- Raw Google Calendar data"]
end
subgraph BronzeNotes["🥉 Bronze: Notability PDFs"]
NotesConnector["NotabilityConnector<br/>Scans MinIO for PDFs<br/>Extracts: filename, path,<br/>studentName, subject, date"]
NotesIngester["NotabilityIngester<br/>Stores metadata JSON<br/>Path: notability/metadata/<br/>{subject}/{student}/{file}.json"]
NotesBronzeStorage["Bronze Notes Storage<br/>- PDF files (original)<br/>- Metadata JSON files"]
end
subgraph SilverCal["🥈 Silver: Calendar Enrichment"]
CalSilverProcessor["CalendarSilverProcessor<br/>(Auto-triggered after Bronze)"]
CalEnrichment["Enrichment Logic<br/>- Student matching<br/>- Duration calculation<br/>- Time categorization<br/>- Topic extraction"]
CalSilverStorage["Silver Calendar Storage<br/>calendar/processed/<br/>{year}/{month}/{eventId}.json<br/>+ enrichments metadata"]
end
subgraph SilverNotes["🥈 Silver: Notes Enrichment"]
NotesSilverProcessor["NotabilitySilverProcessor<br/>(Auto-triggered after Bronze)"]
AIAnalysis["AI Analysis Service<br/>OpenAI GPT-4<br/>- Extract subject/topic<br/>- Generate summary<br/>- Extract keywords<br/>- Classify level/schoolYear"]
TaxonomyService["TaxonomyService<br/>- Resolve subject IDs<br/>- Resolve topic groups<br/>- Validate taxonomy"]
ThumbnailGen["Thumbnail Generator<br/>(Separate Script)<br/>process-thumbnails.mjs<br/>- Generate small/medium/large<br/>- Store in Silver"]
NotesSilverStorage["Silver Notes Storage<br/>- {path}.metadata.json<br/> (AI analysis results)<br/>- thumbnails/{path}/{size}.png"]
end
subgraph Database["🗄️ PostgreSQL Database"]
CalDBSync["CalendarDatabaseSync<br/>Reads Bronze calendar files"]
NotesDBSync["NotabilityDatabaseSync<br/>Reads Bronze + Silver"]
StudentMatch["Student Matching<br/>findBestStudentMatch()<br/>Fuzzy name matching<br/>Used by both flows"]
LessonLink["Lesson Linking<br/>Enhanced matching:<br/>- Date proximity (±14 days)<br/>- Time of day<br/>- Subject matching<br/>- Confidence scoring"]
LessonTable["Lesson Table<br/>- studentId (FK)<br/>- start, end<br/>- googleCalendarEventId<br/>- Linked from Calendar"]
NoteTable["Note Table<br/>- studentId (FK)<br/>- lessonId (FK) → Lesson<br/>- subject, topicGroup, topic<br/>- keywords, summary<br/>- datalakePath<br/>- level, schoolYear<br/>- Linked from Notes"]
end
subgraph Apps["📱 Applications"]
Dashboard["privelessen-dashboard<br/>- Reads Lessons from PostgreSQL<br/>- Reads Notes from PostgreSQL<br/>- Displays calendar + notes<br/>- Links lessons to notes"]
Aantekeningen["aantekeningen-app<br/>- Reads from datalake<br/>- Reads Notes from PostgreSQL<br/>- Displays PDFs + metadata<br/>- Shows thumbnails<br/>- Links to lessons"]
end
subgraph External["🔌 External Services"]
OpenAI["OpenAI API<br/>GPT-4 Analysis"]
TaxonomyDB["Taxonomy Database<br/>Subjects, Topics,<br/>Topic Groups"]
end
%% Calendar Flow: Source to Bronze
GoogleCal -->|"Cron: Every 15min<br/>or Manual Trigger"| CalConnector
CalConnector -->|"CalendarRecord[]"| CalIngester
CalIngester --> CalBronzeStorage
%% Calendar Flow: Bronze to Silver (auto)
CalBronzeStorage -->|"Auto-triggered<br/>after Bronze ingestion"| CalSilverProcessor
CalSilverProcessor -->|"Enrich events"| CalEnrichment
CalEnrichment -->|"Student matching"| StudentMatch
CalEnrichment --> CalSilverStorage
%% Calendar Flow: Bronze to Database
CalBronzeStorage -->|"Read event JSON files"| CalDBSync
CalDBSync -->|"Extract student name<br/>from event summary"| StudentMatch
StudentMatch -->|"studentId"| CalDBSync
CalDBSync -->|"Create/Update"| LessonTable
%% Notes Flow: Source to Bronze
NotabilityPDFs -->|"Cron: Every 6h<br/>or Manual Trigger"| NotesConnector
NotesConnector -->|"NotabilityRecord[]"| NotesIngester
NotesIngester --> NotesBronzeStorage
%% Notes Flow: Bronze to Silver (auto)
NotesBronzeStorage -->|"Auto-triggered<br/>after Bronze ingestion"| NotesSilverProcessor
NotesSilverProcessor -->|"Download PDF"| AIAnalysis
AIAnalysis -->|"API Call"| OpenAI
AIAnalysis -->|"Load Taxonomy"| TaxonomyService
TaxonomyService -->|"Query"| TaxonomyDB
AIAnalysis -->|"Enriched Metadata"| NotesSilverStorage
NotesBronzeStorage -->|"Separate Script<br/>process-thumbnails.mjs"| ThumbnailGen
ThumbnailGen --> NotesSilverStorage
%% Notes Flow: Bronze/Silver to Database
NotesBronzeStorage -->|"Read metadata JSON"| NotesDBSync
NotesSilverStorage -->|"Read AI metadata"| NotesDBSync
NotesDBSync -->|"Match student name"| StudentMatch
StudentMatch -->|"studentId"| NotesDBSync
NotesDBSync -->|"Enhanced matching:<br/>date + time + subject"| LessonLink
LessonLink -->|"lessonId (if confidence ≥50%)"| NotesDBSync
NotesDBSync -->|"Create/Update"| NoteTable
%% Cross-References
LessonTable -->|"FK: lessonId"| NoteTable
NoteTable -.->|"Query by lessonId"| LessonTable
%% Database to Apps
LessonTable -->|"Query via Prisma"| Dashboard
NoteTable -->|"Query via Prisma"| Dashboard
NoteTable -->|"Query via Prisma"| Aantekeningen
NotesSilverStorage -->|"Presigned URLs"| Aantekeningen
NotesBronzeStorage -->|"Presigned URLs"| Aantekeningen
%% Styling
classDef bronze fill:#cd7f32,stroke:#8b4513,color:#fff
classDef silver fill:#c0c0c0,stroke:#808080,color:#000
classDef database fill:#4a90e2,stroke:#2c5aa0,color:#fff
classDef app fill:#50c878,stroke:#2d8659,color:#fff
classDef external fill:#ff6b6b,stroke:#c92a2a,color:#fff
class CalBronzeStorage,CalConnector,CalIngester,NotesBronzeStorage,NotesConnector,NotesIngester bronze
class CalSilverStorage,CalSilverProcessor,CalEnrichment,NotesSilverStorage,NotesSilverProcessor,AIAnalysis,ThumbnailGen silver
class LessonTable,NoteTable,CalDBSync,NotesDBSync,StudentMatch,LessonLink database
class Dashboard,Aantekeningen app
class OpenAI,TaxonomyDB external
This allows tracking which records have been processed to Silver and identifying failures.
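The Lesson Linking node in the diagram lists date proximity (±14 days), time of day, subject matching, and a confidence score, with a `lessonId` assigned only when confidence is at least 50%. The sketch below shows one way such scoring could work. The interfaces, field names, weights (0.5 / 0.2 / 0.3), and the two-hour time-of-day window are assumptions for illustration and are not taken from the project's code.

```typescript
// Illustrative sketch only: field names, weights, and the 2-hour window are
// assumptions. The 14-day window and 50% threshold come from the diagram.

interface LessonCandidate {
  id: string;
  start: Date;
  subject?: string;
}

interface NoteInfo {
  createdAt: Date;
  subject?: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_DAYS = 14; // date proximity window (±14 days)
const MIN_CONFIDENCE = 0.5; // link only if confidence >= 50%

/** Returns a confidence in [0, 1] that the note belongs to the lesson. */
function scoreLesson(note: NoteInfo, lesson: LessonCandidate): number {
  const dayDiff =
    Math.abs(note.createdAt.getTime() - lesson.start.getTime()) / DAY_MS;
  if (dayDiff > MAX_DAYS) {
    return 0; // outside the date window: never a match
  }

  // Closer dates score higher, falling linearly to 0 at 14 days.
  const dateScore = 1 - dayDiff / MAX_DAYS;

  // Time of day: compare clock hours, wrapping around midnight.
  const rawHourDiff = Math.abs(
    note.createdAt.getUTCHours() - lesson.start.getUTCHours()
  );
  const hourDiff = Math.min(rawHourDiff, 24 - rawHourDiff);
  const timeScore = hourDiff <= 2 ? 1 : 0;

  // Subject: case-insensitive exact match when both sides have a subject.
  const subjectScore =
    note.subject !== undefined &&
    lesson.subject !== undefined &&
    note.subject.toLowerCase() === lesson.subject.toLowerCase()
      ? 1
      : 0;

  return 0.5 * dateScore + 0.2 * timeScore + 0.3 * subjectScore;
}

/** Returns the id of the best-scoring lesson, or null below the threshold. */
function linkLesson(note: NoteInfo, lessons: LessonCandidate[]): string | null {
  let bestId: string | null = null;
  let bestScore = 0;
  for (const lesson of lessons) {
    const score = scoreLesson(note, lesson);
    if (score > bestScore) {
      bestScore = score;
      bestId = lesson.id;
    }
  }
  return bestScore >= MIN_CONFIDENCE ? bestId : null;
}
```

Returning `null` below the threshold matches the diagram's behaviour of leaving `lessonId` unset when no candidate is convincing, so a Note is stored without a Lesson link instead of being attached to a weak match.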