This document provides a detailed technical overview of the AI pipeline used in the Hospital Recommendation System.
The system provides personalized hospital recommendations based on user-described symptoms. It takes a hybrid approach, combining a Large Language Model (LLM) for semantic understanding with sentence embeddings for efficient retrieval.
When a user enters symptoms, the system uses Google Gemini to bridge the gap between "layman's terms" and "medical terminology" by extracting the relevant medical department from the description. "성북구" (Seongbuk-gu) is then appended to the extracted department to prioritize local results during the similarity search.

Unlike simple text search, this system uses a Weighted Multi-Field Embedding approach to represent hospitals in a high-dimensional vector space.
The embedding model is jhgan/ko-sroberta-sts. For each hospital in the dataset (hospital_info.csv), a single representative vector is created by embedding four fields separately and combining them using specific weights:

| Field | Weight | Description |
|---|---|---|
| medical_subject | 0.4 | The primary factor for matching symptoms. |
| address | 0.4 | Ensures geographic relevance. |
| hospital_name | 0.1 | Provides identity context. |
| opening_hours | 0.1 | Adds temporal context. |
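The weighted combination in the table can be illustrated with a minimal sketch. It assumes the combination is a plain weighted sum of the four per-field vectors; whether the actual pipeline normalizes the vectors before or after combining is not stated here. Plain Python lists stand in for the 768-dimensional model outputs, and `combine_field_embeddings` is a hypothetical helper name.

```python
from typing import Dict, List

# Field weights taken from the table above.
FIELD_WEIGHTS: Dict[str, float] = {
    "medical_subject": 0.4,
    "address": 0.4,
    "hospital_name": 0.1,
    "opening_hours": 0.1,
}


def combine_field_embeddings(field_vectors: Dict[str, List[float]]) -> List[float]:
    """Return the weighted sum of the per-field embeddings (assumed combination rule)."""
    dim = len(field_vectors["medical_subject"])
    combined = [0.0] * dim
    for field, weight in FIELD_WEIGHTS.items():
        vector = field_vectors[field]
        if len(vector) != dim:
            raise ValueError(f"{field} has dimension {len(vector)}, expected {dim}")
        for i, value in enumerate(vector):
            combined[i] += weight * value
    return combined


# Toy 3-dimensional vectors in place of real model outputs.
toy_fields = {
    "medical_subject": [1.0, 0.0, 0.0],
    "address": [0.0, 1.0, 0.0],
    "hospital_name": [0.0, 0.0, 1.0],
    "opening_hours": [1.0, 1.0, 1.0],
}
hospital_vector = combine_field_embeddings(toy_fields)
```

Because medical_subject and address together carry 0.8 of the weight, a query that names a department and a district aligns mostly with those two fields.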
The resulting hospital vectors are cached as .pt (PyTorch) files in the cache/ directory.

The retrieval process finds the "mathematical nearest neighbors" to the user's query.
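The query is encoded with the ko-sroberta-sts model and compared against the weighted hospital vectors using cosine similarity. The top k hospitals (default: 10) with the highest similarity scores are selected.

| Component | Specification |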
|---|---|
| LLM Model | Google Gemini |
| Embedding Model | jhgan/ko-sroberta-sts (HuggingFace) |
| Embedding Dimension | 768 dimensions |
| Similarity Metric | Cosine Similarity |
| Data Format | CSV (Pandas) |
| Vector Storage | PyTorch Serialization (.pt) |
User input symptom → Gemini (Subject Extraction) → SBERT (Query Encoding) → Cosine Similarity vs. Weighted Hospital Matrix → Top-10 Ranking → UI Display
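
The flow above can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's actual code: the Gemini call and the SBERT encoder are passed in as plain callables (`llm` and `encode` are hypothetical parameters), the district is assumed to be joined to the department with a single space, and plain Python lists stand in for tensors.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_query(department: str, district: str = "성북구") -> str:
    # Assumption: the district is appended after a single space.
    return f"{department.strip()} {district}"


def recommend(
    symptoms: str,
    llm: Callable[[str], str],             # stand-in for the Gemini subject extraction
    encode: Callable[[str], List[float]],  # stand-in for the SBERT query encoder
    hospital_matrix: Sequence[Vector],     # one weighted vector per hospital
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Return (hospital index, similarity) pairs for the k best matches, best first."""
    department = llm(symptoms)
    query_vector = encode(build_query(department))
    scored = [
        (index, cosine_similarity(query_vector, row))
        for index, row in enumerate(hospital_matrix)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins so the sketch runs without any model or API key.
def fake_llm(symptoms: str) -> str:
    return "이비인후과"  # hypothetical extracted department (otolaryngology)


def fake_encode(text: str) -> List[float]:
    return [1.0, 0.1]


toy_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = build_query("이비인후과")
ranking = recommend("sore throat and blocked ears", fake_llm, fake_encode, toy_matrix, k=2)
```

In this toy run the first hospital vector points almost exactly along the query, so it ranks first, followed by the third. With real embeddings, the same ranking step would typically be done as one batched tensor operation rather than a Python loop.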