Java client for MinIO data lake operations. This project provides a simple interface to upload data to your MinIO-based data lake.
Run the setup script to start MinIO:
cd ~
./setup-minio.sh
This will:
~/minio-data
Important: Save the generated password shown at the end of the script!
Copy the example config and update with your MinIO credentials:
cd ~/projects/ingestion-service
cp config.properties config.properties.local
# Edit config.properties.local with your actual credentials
Or edit config.properties directly with the password from the setup script.
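A minimal sketch of how such a properties file could be loaded and checked before any upload is attempted. The key names (minio.endpoint, minio.accessKey, minio.secretKey), the class name ConfigSketch, and the localhost:9000 endpoint are assumptions for illustration; check config.properties for the keys the project actually uses.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative only: loads MinIO settings and fails fast when one is missing. */
class ConfigSketch {

    // Hypothetical key names; the real config.properties may use different ones.
    static final String[] REQUIRED = {"minio.endpoint", "minio.accessKey", "minio.secretKey"};

    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read configuration", e);
        }
        // Reject missing or empty values so a bad config surfaces before the first upload.
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalStateException("Missing required setting: " + key);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // In real use the Reader would wrap config.properties.local on disk.
        String example = String.join("\n",
                "minio.endpoint=http://localhost:9000",
                "minio.accessKey=admin",
                "minio.secretKey=change-me");
        Properties props = load(new StringReader(example));
        System.out.println("Endpoint: " + props.getProperty("minio.endpoint"));
    }
}
```

Failing on a missing key at startup tends to give a clearer error than a rejected request later on.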
# Compile the project
mvn clean compile
# Run the upload example
mvn exec:java
datalake-java/
├── pom.xml # Maven configuration
├── config.properties # MinIO credentials (DO NOT COMMIT)
├── src/main/java/
│ └── nl/stephen/datalake/
│ ├── MinIOClient.java # Core MinIO client wrapper
│ └── MinIOUploader.java # Example uploader
└── README.md
The data lake is organized into three main buckets that align with your different roles:
bronze-education
Purpose: All teaching and educational materials
wiskunde-oefeningen/ - Math exercises
latex-slides/ - LaTeX presentation slides
voorbeeldopgaven/ - Example problems
lesplannen/ - Lesson plans
toetsen/ - Tests/exams

creatieve-projecten
Purpose: Creative and artistic work
fotos/ - Photos
muziek/ - Music files
dj-mixen/ - DJ mixes
video/ - Video files
grafisch-ontwerp/ - Graphic design

communicatie
Purpose: Communication logs and backups
whatsapp/ - WhatsApp chat exports
email/ - Email backups
chatwoot/ - Chatwoot CRM data
website-forms/ - Website form submissions
metadata/ - Communication metadata

raw-whatsapp - WhatsApp chat exports (legacy)
raw-email - Email data (legacy)
raw-notability - Notability PDF exports
raw-website - Website form submissions (legacy)
raw-chatwoot - Chatwoot CRM data (legacy)
processed-text - Cleaned and processed text data
embeddings - Vector embeddings for AI/ML

To create the three main buckets with their structure:
cd ~/projects/ingestion-service
mvn compile exec:java -Dexec.mainClass="nl.stephen.datalake.BucketSetup"
To list buckets:
mvn compile exec:java -Dexec.mainClass="nl.stephen.datalake.BucketSetup" -Dexec.args="--list"
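The three-bucket layout described above could be held in code as a simple lookup table, which also makes it possible to validate a target folder before uploading. This is a sketch only: BucketLayoutSketch and its methods are hypothetical names and are not the project's BucketSetup class. Object stores such as MinIO have no real directories, so each "folder" is just a key prefix.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: the bucket layout as data, plus a key-building helper. */
class BucketLayoutSketch {

    // Bucket name -> folder prefixes, copied from the layout listed in this README.
    static Map<String, List<String>> layout() {
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        buckets.put("bronze-education", List.of(
                "wiskunde-oefeningen/", "latex-slides/", "voorbeeldopgaven/",
                "lesplannen/", "toetsen/"));
        buckets.put("creatieve-projecten", List.of(
                "fotos/", "muziek/", "dj-mixen/", "video/", "grafisch-ontwerp/"));
        buckets.put("communicatie", List.of(
                "whatsapp/", "email/", "chatwoot/", "website-forms/", "metadata/"));
        return buckets;
    }

    // Builds an object key such as "whatsapp/chat.txt", rejecting unknown buckets or folders.
    static String objectKey(String bucket, String prefix, String fileName) {
        List<String> prefixes = layout().get(bucket);
        if (prefixes == null || !prefixes.contains(prefix)) {
            throw new IllegalArgumentException("Unknown location: " + bucket + "/" + prefix);
        }
        return prefix + fileName;
    }

    public static void main(String[] args) {
        layout().forEach((bucket, prefixes) ->
                System.out.println(bucket + " -> " + prefixes));
        System.out.println(objectKey("communicatie", "whatsapp/", "chat.txt"));
    }
}
```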
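import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Upload a file from disk
MinIOClient client = new MinIOClient();
client.uploadFile("raw-whatsapp", "myfile.txt", "/path/to/file.txt");
// Upload from an in-memory stream; the size argument is assumed to be a byte count
byte[] bytes = "Hello, MinIO!".getBytes(StandardCharsets.UTF_8);
ByteArrayInputStream stream = new ByteArrayInputStream(bytes);
client.uploadStream("raw-whatsapp", "stream.txt", stream,
        bytes.length, "text/plain");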
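The size passed to uploadStream is presumably the object's length in bytes, which can differ from a string's character count once the text contains non-ASCII characters. The sketch below illustrates the difference with the standard library only; StreamSizeSketch is a hypothetical name and is not part of the project.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: character count versus UTF-8 byte count. */
class StreamSizeSketch {

    // Number of bytes the text occupies once encoded as UTF-8.
    static int byteLength(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "Hello, MinIO!";
        // \u00e9 is "e" with an acute accent, which takes two bytes in UTF-8.
        String accented = "Priv\u00e9-les";
        System.out.println(ascii.length() + " chars, " + byteLength(ascii) + " bytes");
        System.out.println(accented.length() + " chars, " + byteLength(accented) + " bytes");
    }
}
```

For pure ASCII text the two counts match, so a mismatch may only show up later with accented Dutch text or emoji in chat exports.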
Do not commit config.properties with real credentials to version control.
Use --run-id and --tutoring in sync scripts for traceability; paths follow tutoring/bronze/raw_uploads/<ingest_date>/run_id=<run_id>/ and _SUCCESS.json is written.
raw_uploads, raw_message_exports, raw_booking_requests. See docs/projects/TUTORING_DATALAKE_CONTRACT.md.
projects/platform-infrastructure/terraform/tutoring-phase1/. Set aws_region in terraform.tfvars before apply.
docker ps
config.properties matches your setup
config.properties
docker logs minio
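The run-scoped path convention mentioned above (tutoring/bronze/raw_uploads/<ingest_date>/run_id=<run_id>/) can be sketched as a small helper. Several details here are assumptions: that <ingest_date> is an ISO date (yyyy-MM-dd), that the other datasets follow the same pattern as raw_uploads, and that _SUCCESS.json sits directly inside the run folder. RunPathSketch is a hypothetical name; docs/projects/TUTORING_DATALAKE_CONTRACT.md remains the authoritative description.

```java
import java.time.LocalDate;

/** Illustrative only: builds run-scoped object key prefixes. */
class RunPathSketch {

    // Returns e.g. tutoring/bronze/raw_uploads/<ingest_date>/run_id=<run_id>/
    static String runPrefix(String dataset, LocalDate ingestDate, String runId) {
        requireSegment(dataset, "dataset");
        requireSegment(runId, "runId");
        // LocalDate.toString() yields ISO-8601 (yyyy-MM-dd); the real date format is an assumption.
        return "tutoring/bronze/" + dataset + "/" + ingestDate + "/run_id=" + runId + "/";
    }

    // Assumed location of the completion marker inside the run folder.
    static String successMarker(String dataset, LocalDate ingestDate, String runId) {
        return runPrefix(dataset, ingestDate, runId) + "_SUCCESS.json";
    }

    // A segment must be non-empty and must not contain a path separator.
    private static void requireSegment(String value, String name) {
        if (value == null || value.isEmpty() || value.contains("/")) {
            throw new IllegalArgumentException("Invalid " + name + ": " + value);
        }
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        System.out.println(runPrefix("raw_uploads", today, "example-run"));
        System.out.println(successMarker("raw_uploads", today, "example-run"));
    }
}
```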