Core Components & Architecture
Before diving into the features, let's establish a few core components you'll need in your Laravel app.

- Gemini Service Wrapper: Create a service class in Laravel (`app/Services/GeminiService.php`) to abstract the Gemini API calls. This will keep your code clean. It should have methods like the following (a sketch follows after this list):
  - `extractTextFromImage(string $base64Image): string` (uses Gemini Vision)
  - `extractStructuredData(string $text, string $schemaPrompt): array` (uses Gemini for JSON extraction)
  - `classifyTopic(string $text, array $availableTopics): string` (uses Gemini for classification)
  - `generateCypherQuery(string $naturalLanguageQuery): string` (the "RAG" part, for retrieval)
- Neo4j Service: A service to handle all Cypher queries, using a package like `laudis/neo4j-php-client`.
- Call Code Generator Service: A dedicated service (`app/Services/CallCodeGenerator.php`) responsible for creating your unique identifiers. This is crucial because it needs to query existing data to determine the next sequence number.
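To make the wrapper concrete, here is a minimal sketch of `app/Services/GeminiService.php` covering the first three methods, assuming you call the Gemini REST API directly through Laravel's `Http` facade. The model name, endpoint version, request field casing, response field paths, and the `services.gemini.key` config entry are assumptions to verify against the current Gemini documentation.

```php
<?php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class GeminiService
{
    // Assumed endpoint and model; adjust to the Gemini version you target.
    private string $baseUrl = 'https://generativelanguage.googleapis.com/v1beta/models';
    private string $model = 'gemini-1.5-flash';

    public function extractTextFromImage(string $base64Image, string $mimeType = 'image/jpeg'): string
    {
        // Multimodal request: one text part (the instruction) and one inline image part.
        return $this->generate([
            ['text' => 'Extract all readable text from this receipt image. Return plain text only.'],
            ['inline_data' => ['mime_type' => $mimeType, 'data' => $base64Image]],
        ]);
    }

    public function extractStructuredData(string $text, string $schemaPrompt): array
    {
        $raw = $this->generate([['text' => $schemaPrompt . "\n\nDocument text:\n" . $text]]);

        // Strip Markdown fences the model sometimes adds, then decode.
        $json = trim(preg_replace('/^```(json)?|```$/m', '', $raw));

        return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    }

    public function classifyTopic(string $text, array $availableTopics): string
    {
        $prompt = 'Classify the document below into exactly one of these topic codes: '
            . implode(', ', $availableTopics)
            . ". Respond with the code only.\n\n" . $text;

        return trim($this->generate([['text' => $prompt]]));
    }

    private function generate(array $parts): string
    {
        $response = Http::post(
            "{$this->baseUrl}/{$this->model}:generateContent?key=" . config('services.gemini.key'),
            ['contents' => [['parts' => $parts]]]
        )->throw()->json();

        // Path of the first candidate's text in the generateContent response.
        return $response['candidates'][0]['content']['parts'][0]['text'] ?? '';
    }
}
```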
Feature 1: Expense Reporting from Image Upload
This is a classic "intelligent document processing" workflow.

Architectural Blueprint:

- Laravel Controller (Upload):
  - Create an API endpoint, e.g., `POST /api/documents/upload-receipt`.
  - The controller receives the uploaded file (`$request->file('receipt')`).
  - It performs basic validation (is it an image? size limit?).
  - Crucially, it dispatches a Job. This is an async process and shouldn't block the user's request.
  - Return an immediate response to the user like `{"message": "Receipt received and is being processed."}`.
- Laravel Job (`ProcessReceiptJob`): This is where the magic happens (a condensed sketch follows after this list).
  - Step A: OCR with Gemini Vision
    - Read the image file and base64-encode it.
    - Call your `GeminiService->extractTextFromImage()`. Gemini Vision is excellent at this.
  - Step B: Structured Data Extraction with Gemini
    - Take the raw text from Step A.
    - Create a detailed prompt telling Gemini to act as a data entry specialist and extract the information into a specific JSON format. Prompt engineering is key here.
    - Call `GeminiService->extractStructuredData()` with this prompt and parse the JSON response.
  - Step C: Store in MongoDB
    - Use Laravel's MongoDB model to create a new document in your `receipts` collection with the structured data from Step B.
  - Step D: Create Graph in Neo4j
    - Connect to Neo4j via your service.
    - Execute a Cypher query to create the nodes and relationships. This builds your knowledge graph.
    - Nodes: `Document`, `Vendor`, `User`, `Department`.
    - Relationships: `[:UPLOADED]`, `[:BELONGS_TO]`, `[:INVOICED_BY]`.
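Putting Steps A through D together, here is a condensed, hedged sketch of `ProcessReceiptJob`. The controller is assumed to only validate the upload, store the file, and dispatch this job with the storage path. The `Receipt` MongoDB model, the `Neo4jService` wrapper around `laudis/neo4j-php-client`, and the prompt wording are all assumptions; the Cypher uses `MERGE` so reprocessing the same vendor or user does not create duplicate nodes.

```php
<?php

namespace App\Jobs;

use App\Models\Receipt;        // MongoDB model (assumed)
use App\Services\GeminiService;
use App\Services\Neo4jService; // thin wrapper around laudis/neo4j-php-client (assumed)
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Storage;

class ProcessReceiptJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public string $path, public int $userId) {}

    public function handle(GeminiService $gemini, Neo4jService $neo4j): void
    {
        // Step A: OCR with Gemini Vision.
        $base64  = base64_encode(Storage::get($this->path));
        $rawText = $gemini->extractTextFromImage($base64);

        // Step B: structured extraction into a fixed JSON shape.
        $data = $gemini->extractStructuredData($rawText, <<<'PROMPT'
            You are a data entry specialist. Extract the receipt below into JSON with
            exactly these keys: vendor_name, total_amount, currency, transaction_date,
            line_items (an array of {description, amount}). Return JSON only.
            PROMPT);

        // Step C: store the full structured document in MongoDB.
        $receipt = Receipt::create($data + ['user_id' => $this->userId, 'raw_text' => $rawText]);

        // Step D: mirror the key entities and relationships into Neo4j.
        $neo4j->run(<<<'CYPHER'
            MERGE (u:User {id: $userId})
            MERGE (v:Vendor {name: $vendor})
            CREATE (d:Document {mongo_id: $mongoId, type: 'receipt', total: $total, date: $date})
            MERGE (u)-[:UPLOADED]->(d)
            MERGE (d)-[:INVOICED_BY]->(v)
            CYPHER, [
            'userId'  => $this->userId,
            'vendor'  => $data['vendor_name'],
            'mongoId' => (string) $receipt->_id,
            'total'   => $data['total_amount'],
            'date'    => $data['transaction_date'],
        ]);
    }
}
```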
Feature 2: Financial Data Ingestion from Chat
This is a Natural Language Understanding (NLU) task.

Architectural Blueprint:

- Ingestion Endpoint:
  - Create a Laravel controller/endpoint (`POST /api/chat-ingest`) that can be called by your app's chatbox, or by a webhook for Telegram/WhatsApp.
  - It receives the user's text message, e.g., `{"message": "paid $45.50 to Shell for gas today", "user_id": 123}`.
  - Like before, dispatch a job: `ProcessChatMessageJob::dispatch($message, $user)`.
- Laravel Job (`ProcessChatMessageJob`):
  - Step A: Entity & Intent Recognition with Gemini
    - This is very similar to Feature 1, Step B.
    - You'll send the chat message to your `GeminiService` with a prompt designed to extract entities (an example prompt is sketched after this list).
  - Step B: Data Validation & Enrichment
    - The returned JSON might be incomplete. Your job should validate it.
    - If a `payee` is found, you can look it up in your `Vendors` table/node to see if it's a known entity.
    - Enrich the data with user info (department, ownership).
  - Step C: Store in MongoDB & Neo4j
    - Follow the same logic as Feature 1 (Steps C & D) to store the structured data and create the graph nodes/relationships.
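As an illustration of the extraction prompt mentioned in Step A, the job could send something like the following; the field list, defaults, and wording are assumptions to tune against your real messages.

```php
// Inside ProcessChatMessageJob::handle() - an assumed extraction prompt.
$prompt = <<<'PROMPT'
You are a bookkeeping assistant. Extract the financial transaction from the user's
message into JSON with exactly these keys:
  intent           ("expense", "income" or "unknown")
  amount           (number, without any currency symbol)
  currency         (ISO 4217 code; default "USD")
  payee            (string or null)
  category         (string or null, e.g. "fuel", "meals")
  transaction_date (ISO 8601 date; resolve relative dates such as "today")
Use null for anything you cannot find. Return JSON only.
PROMPT;

// Reuses the same structured-extraction helper as the receipt flow.
$entities = $gemini->extractStructuredData($this->message, $prompt);
```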
Feature 3: Document Retrieval Interface
This is the core "Retrieval" part of RAG: you're using the graph to answer questions.

Architectural Blueprint:

- Frontend (Vue/React/Livewire):
  - A chat interface where the user types their query.
  - A results area that can render a list/table of documents. Each row should be expandable to show metadata.
- Backend API Endpoint (`GET /api/documents/search`):
  - Receives the natural language query, e.g., `?q=show me all invoices from ACME Corp in Q4 2023`.
- Controller/Service Logic (The RAG Core):
  - Method A (Recommended for Reliability): Entity Extraction to Query
    - Send the user's query to Gemini with a prompt to extract search parameters as JSON.
    - In your Laravel code, take this clean JSON and programmatically build a precise Cypher query. This is safer than letting the LLM generate the full query (a sketch follows after this list).
  - Method B (More Advanced): LLM-Generated Cypher
    - Give Gemini your graph schema and ask it to convert the user's question directly into a Cypher query. This is more powerful but can be less reliable.
    - You must validate the generated Cypher to prevent injection or errors before executing it.
- Format Response:
  - The Neo4j results will be a collection of nodes.
  - For each `Document` node, fetch its full metadata from MongoDB using the `mongo_id` stored in the node.
  - Format this into a clean JSON array and send it to the frontend to be rendered.
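A hedged sketch of Method A. It assumes the `GeminiService` and `Neo4jService` introduced earlier (with the Neo4j wrapper returning rows as associative arrays) plus a `Receipt` MongoDB model; the important property is that LLM output only ever becomes bound Cypher parameters, never query text.

```php
<?php

namespace App\Services;

use App\Models\Receipt; // MongoDB model (assumed)

class DocumentSearchService
{
    public function __construct(
        private GeminiService $gemini,
        private Neo4jService $neo4j,
    ) {}

    /** Resolve a natural-language query into hydrated document metadata. */
    public function search(string $query): array
    {
        // 1. Ask the LLM only for parameters, never for Cypher.
        $params = $this->gemini->extractStructuredData($query, <<<'PROMPT'
            Extract search filters from the user's request as JSON with keys:
            vendor (string|null), doc_type (string|null),
            date_from (ISO date|null), date_to (ISO date|null).
            Return JSON only.
            PROMPT);

        // 2. Build the Cypher from known-safe fragments with bound parameters.
        $where = [];
        $bind  = [];

        if (!empty($params['vendor'])) {
            $where[]        = 'v.name = $vendor';
            $bind['vendor'] = $params['vendor'];
        }
        if (!empty($params['doc_type'])) {
            $where[]         = 'd.type = $docType';
            $bind['docType'] = $params['doc_type'];
        }
        if (!empty($params['date_from'])) {
            $where[]          = 'd.date >= $dateFrom';
            $bind['dateFrom'] = $params['date_from'];
        }
        if (!empty($params['date_to'])) {
            $where[]        = 'd.date <= $dateTo';
            $bind['dateTo'] = $params['date_to'];
        }

        $cypher = 'MATCH (d:Document)-[:INVOICED_BY]->(v:Vendor)'
            . ($where ? ' WHERE ' . implode(' AND ', $where) : '')
            . ' RETURN d.mongo_id AS mongoId LIMIT 50';

        // 3. Hydrate full metadata from MongoDB using the mongo_id stored on each node.
        $ids = array_map(fn ($row) => $row['mongoId'], $this->neo4j->run($cypher, $bind));

        return Receipt::whereIn('_id', $ids)->get()->toArray();
    }
}
```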
Feature 4: Automated Categorization and Call Code Generation
This integrates directly into the upload workflow (Feature 1).

Architectural Blueprint (modifying `ProcessReceiptJob`):

In your `ProcessReceiptJob`, or a similar job for general documents:

- After OCR (Step A):
  - Take the full document text.
- New Step: Topic Classification:
  - Call your `GeminiService->classifyTopic()`.
  - The prompt must include the list of predefined topics and their codes.
  - This will return a `topic_code` like `FIN-01`.
- New Step: Feature Extraction:
  - Similar to the classification step, create a prompt to find the specific value for the `feature` part of your Call Code.
  - You can even make this a multi-step call: first classify, then, based on the classification, use a different prompt to extract the feature.
    - If `topic_code` is `FIN-01`, prompt for "Invoice Number".
    - If it's an "SO", prompt for "Sales Order No.".
  - This will return the `feature_value`, e.g., `"INV-98765"`.
- Final Step: Call Code Generation:
  - Now you have all the pieces:
    - `ownership` & `department` (from `auth()->user()`)
    - `topic_code` (from the classification step)
    - `feature_value` (from the feature extraction step)
  - Call your `CallCodeGeneratorService` (a sketch follows after this list).
  - This service will query Neo4j/MongoDB to find the highest sequence number for the `ownership-department-topic` combination and calculate the next one.
  - The service constructs the full Call Code: `USER1-FINANCE-FIN01-INV98765-0012`.
  - Save this Call Code to both your MongoDB document and the Neo4j node properties.
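A minimal sketch of `CallCodeGeneratorService`, under two assumptions: the sequence is derived from a `max()` lookup on `Document` nodes in Neo4j, and concurrent uploads for the same combination are rare (for strict uniqueness you would add a lock or a unique constraint).

```php
<?php

namespace App\Services;

class CallCodeGeneratorService
{
    public function __construct(private Neo4jService $neo4j) {}

    public function generate(
        string $ownership,    // e.g. "USER1"    (from auth()->user())
        string $department,   // e.g. "FINANCE"
        string $topicCode,    // e.g. "FIN-01"   (from the classification step)
        string $featureValue, // e.g. "INV-98765" (from the feature extraction step)
    ): string {
        // Highest existing sequence for this ownership-department-topic combination.
        $rows = $this->neo4j->run(
            'MATCH (d:Document {ownership: $o, department: $dep, topic_code: $t})
             RETURN coalesce(max(d.sequence), 0) AS maxSeq',
            ['o' => $ownership, 'dep' => $department, 't' => $topicCode]
        );

        $next = ($rows[0]['maxSeq'] ?? 0) + 1;

        // e.g. USER1-FINANCE-FIN01-INV98765-0012
        return sprintf(
            '%s-%s-%s-%s-%04d',
            strtoupper($ownership),
            strtoupper($department),
            str_replace('-', '', strtoupper($topicCode)),
            str_replace('-', '', strtoupper($featureValue)),
            $next
        );
    }
}
```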
Streaming Result
Excellent question. This shows you're thinking deeply about user experience. A long-running AI generation process can feel slow, and streaming the result token-by-token (like ChatGPT does) dramatically improves the perceived performance and engagement.

First, let's clarify the library name. Laravel Prism is typically a wrapper for the Prism.js syntax highlighter. It's possible you're thinking of a different library, but the concept you're describing, streaming a response to the browser, is most commonly and efficiently achieved in Laravel using Server-Sent Events (SSE) with Laravel's built-in `StreamedResponse`.
This is a perfect fit for your Feature 3 (Document Retrieval Interface). Let’s design the architecture for that.
Why Server-Sent Events (SSE)?
- Simple: It’s a standard web technology that works over regular HTTP. No complex WebSocket servers are needed.
- Efficient: It’s designed for one-way communication (server to client), which is exactly what you need for an AI response.
- Laravel Support: Laravel has a first-class `Symfony\Component\HttpFoundation\StreamedResponse` object that makes implementing SSE straightforward.
Architectural Blueprint for Streamed Results (Feature 3)
We'll modify the architecture for Feature 3 to incorporate streaming.

1. Modify Your `GeminiService`
Your service needs to be able to handle a streaming request to the Gemini API. Most modern AI SDKs support this. Instead of returning a single string, the method will now yield chunks of the response as they arrive.
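Here is one way to express that in the `GeminiService` from earlier, as a PHP generator. It assumes Gemini's `streamGenerateContent` endpoint with `alt=sse` (which emits `data: {...}` lines), so verify the exact request and response framing against the current API docs before relying on it.

```php
// In App\Services\GeminiService - a streaming variant that yields text chunks.
public function streamAnswer(string $prompt): \Generator
{
    // streamGenerateContent with alt=sse returns "data: {json}" lines (check current docs).
    $url = "{$this->baseUrl}/{$this->model}:streamGenerateContent?alt=sse&key="
        . config('services.gemini.key');

    $response = \Illuminate\Support\Facades\Http::withOptions(['stream' => true])
        ->post($url, ['contents' => [['parts' => [['text' => $prompt]]]]]);

    $body   = $response->toPsrResponse()->getBody();
    $buffer = '';

    while (!$body->eof()) {
        $buffer .= $body->read(1024);

        // Consume complete lines from the buffer as they arrive.
        while (($pos = strpos($buffer, "\n")) !== false) {
            $line   = trim(substr($buffer, 0, $pos));
            $buffer = substr($buffer, $pos + 1);

            if (str_starts_with($line, 'data: ')) {
                $payload = json_decode(substr($line, 6), true);
                $chunk   = $payload['candidates'][0]['content']['parts'][0]['text'] ?? '';

                if ($chunk !== '') {
                    yield $chunk; // hand each fragment to the caller as it arrives
                }
            }
        }
    }
}
```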
2. Create the Streaming Controller Endpoint
This controller will return a `StreamedResponse`. This response type keeps the HTTP connection open and allows you to send data chunks (a controller sketch follows the list below).
- Stream the natural language answer.
- Once the stream is complete, send a final event containing the structured JSON for the table, or have the frontend make a second, non-streamed API call to fetch just the structured data.
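A hedged sketch of that endpoint. It reuses the `streamAnswer()` generator from the previous sketch, sends each chunk as an SSE `data:` event, and finishes with a `done` event the frontend can listen for; the header names and the nginx buffering hint are standard but worth checking for your stack.

```php
<?php

namespace App\Http\Controllers;

use App\Services\GeminiService;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\StreamedResponse;

class DocumentSearchController extends Controller
{
    // GET /api/documents/search/stream?q=...
    public function stream(Request $request, GeminiService $gemini): StreamedResponse
    {
        $query = (string) $request->query('q', '');

        return response()->stream(function () use ($gemini, $query) {
            foreach ($gemini->streamAnswer($query) as $chunk) {
                // One SSE event per chunk; the blank line terminates the event.
                echo 'data: ' . json_encode(['text' => $chunk]) . "\n\n";

                if (ob_get_level() > 0) {
                    ob_flush();
                }
                flush();
            }

            // Signal completion so the frontend can close the EventSource.
            echo "event: done\ndata: {}\n\n";
            flush();
        }, 200, [
            'Content-Type'      => 'text/event-stream',
            'Cache-Control'     => 'no-cache',
            'X-Accel-Buffering' => 'no', // disable response buffering on nginx
        ]);
    }
}
```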
3. Frontend Implementation (JavaScript with EventSource)
Your frontend will use the native EventSource API to listen to the stream.
Applying Streaming to Your Other Features
- Feature 4 (Document Upload): While you could use SSE, this is a better fit for Laravel Echo and WebSockets/Pusher, because the process is asynchronous (a queued job).
  - The user uploads a file.
  - The backend dispatches the job and returns an immediate response with a unique `upload_id`.
  - The frontend subscribes to a private channel, e.g., `upload-status.${upload_id}`.
  - Your `ProcessReceiptJob` broadcasts events at each stage (the event class is sketched after this list):
    - `broadcast(new UploadStatusUpdated($uploadId, 'Extracting text from image...'))`
    - `broadcast(new UploadStatusUpdated($uploadId, 'Classifying topic...'))`
    - `broadcast(new UploadStatusUpdated($uploadId, 'Generating Call Code...'))`
    - `broadcast(new UploadStatusUpdated($uploadId, 'Complete!', $finalData))`
  - The frontend listens for these events and updates a progress indicator in the UI.
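A sketch of that broadcast event; the channel name and payload shape mirror the bullets above, and the private channel would need a matching authorization callback in `routes/channels.php`.

```php
<?php

namespace App\Events;

use Illuminate\Broadcasting\InteractsWithSockets;
use Illuminate\Broadcasting\PrivateChannel;
use Illuminate\Contracts\Broadcasting\ShouldBroadcast;
use Illuminate\Foundation\Events\Dispatchable;
use Illuminate\Queue\SerializesModels;

class UploadStatusUpdated implements ShouldBroadcast
{
    use Dispatchable, InteractsWithSockets, SerializesModels;

    public function __construct(
        public string $uploadId,
        public string $status,
        public ?array $data = null, // final structured payload on completion
    ) {}

    public function broadcastOn(): PrivateChannel
    {
        // Matches the channel the frontend subscribes to: upload-status.{uploadId}
        return new PrivateChannel("upload-status.{$this->uploadId}");
    }
}
```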
Summary
- Use Server-Sent Events (SSE) with `StreamedResponse` for Feature 3. This is ideal for streaming the live token-by-token output of an AI model in a request-response cycle.
- Use Laravel Echo (with Pusher or a self-hosted WebSocket server) for Feature 4. This is the standard, robust way to provide real-time updates on the status of an asynchronous background job.
Key Advantages of Using Prism for Your Project
- Unified API: You’re using Gemini now, but what if you want to test a model from Anthropic or a local Ollama instance later? Prism lets you switch providers with a single line change, without rewriting your business logic.
- Robust Structured Data: Prism's `structured()` method is a game-changer. Instead of manually crafting complex JSON prompts and parsing the output, you can map the LLM's response directly to a PHP Data Transfer Object (DTO). This is more reliable, type-safe, and self-documenting.
- Simplified Streaming: As you suspected, Prism has a first-class streaming API. It handles the underlying generator logic, so your controller code becomes incredibly simple and readable.
- Built-in Multi-modality: For your receipt upload feature, Prism’s ability to handle images and text in a single prompt is exactly what you need.
Revised Architectural Blueprint with Laravel Prism
Here's how our plan for each feature evolves when you use Prism.

Feature 1 & 4 Combined: Expense Reporting, Categorization & Call Code
The upload process becomes much more elegant because Prism can handle the multi-modal input and structured output in a single, fluent chain.

1. Define Your Data Structure (DTO)

First, create a simple PHP class (a DTO) to hold the extracted data. This is what you'll ask Prism to populate (a possible shape is sketched below).
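The field names here mirror the receipt data extracted earlier and are assumptions to adjust to your schema (and to whatever schema-definition mechanism Prism expects).

```php
<?php

namespace App\Data;

// Plain DTO describing the structure you want the LLM to populate.
class ReceiptData
{
    public function __construct(
        public readonly string $vendorName,
        public readonly float $totalAmount,
        public readonly string $currency,
        public readonly string $transactionDate, // ISO 8601
        /** @var array<int, array{description: string, amount: float}> */
        public readonly array $lineItems = [],
    ) {}

    /** Build from a decoded JSON / structured LLM response. */
    public static function fromArray(array $data): self
    {
        return new self(
            vendorName: $data['vendor_name'],
            totalAmount: (float) $data['total_amount'],
            currency: $data['currency'] ?? 'USD',
            transactionDate: $data['transaction_date'],
            lineItems: $data['line_items'] ?? [],
        );
    }
}
```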
2. Update `ProcessReceiptJob`

Your job now becomes incredibly clean.
Feature 3: Streamed Document Retrieval Interface
This is where you'll see a huge improvement in developer experience, exactly as you intended.

1. Update the Controller (`streamSearch`)
The controller logic becomes a simple loop over the stream provided by Prism.
Your frontend's `EventSource` implementation would remain exactly the same as in my previous answer. The beauty is that the backend complexity is now hidden behind Prism's fluent `->stream()` method.
Conclusion: You’re on the Right Track
Using a dedicated LLM library like Laravel Prism is the correct, modern approach. It aligns perfectly with Laravel's philosophy of providing an elegant and expressive syntax to handle complex tasks. By adopting it, you will:

- Reduce Boilerplate: You won't need a custom `GeminiService` class.
- Increase Reliability: Prism's structured data features are more robust than manual JSON prompting.
- Improve Code Readability: The fluent, chainable syntax makes your intent clear.
- Future-Proof Your App: Easily swap out LLM providers as new and better models become available.
Integrating GraphRAG with Prism
Excellent question. This is the crucial step that transforms your application from a simple LLM-powered chatbot into a true, context-aware Reasoning and Retrieval Engine. Integrating the Neo4j knowledge graph is what makes GraphRAG work. Here is the strategic shift in thinking:

- Without RAG: `User Query -> Prism -> LLM -> Answer`
- With GraphRAG: `User Query -> RAG Pipeline (you build this) -> Augmented Prompt -> Prism -> LLM -> Answer`
Step 1: The Foundation - Ingestion with Embeddings
Before you can retrieve, you must ingest data correctly. When you ingest a document (as in Feature 1), you need to add one more step: generating and storing vector embeddings.

In your `ProcessReceiptJob` (see the sketch after this list):

- Extract text from the document.
- Create a "chunk" of text that represents the document's content (e.g., a summary, or the full text if it's short).
- Generate Embedding: Use Prism to create a vector embedding for that text chunk.
- Store Embedding: Store this embedding as a property on the corresponding `Document` node in Neo4j.
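A hedged sketch of this extra ingestion step. Rather than Prism's embeddings API (whose exact methods you should take from the Prism docs), it calls Gemini's `text-embedding-004` model via the `embedContent` REST endpoint directly, and stores the vector as a plain list property that a Neo4j 5.x vector index can then cover; the endpoint, request body, and response path are assumptions to verify against current documentation.

```php
// Extra step inside ProcessReceiptJob::handle(), once $rawText, $data and $receipt exist.

// 1. Build a compact chunk that represents the document.
$chunk = "Receipt from {$data['vendor_name']} on {$data['transaction_date']}: {$rawText}";

// 2. Generate the embedding (assumed model/endpoint; verify against the Gemini docs).
$embedding = \Illuminate\Support\Facades\Http::post(
    'https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent?key='
        . config('services.gemini.key'),
    ['content' => ['parts' => [['text' => $chunk]]]]
)->throw()->json('embedding.values');

// 3. Store the vector on the Document node so a vector index can search it later.
$neo4j->run(
    'MATCH (d:Document {mongo_id: $mongoId}) SET d.embedding = $embedding',
    ['mongoId' => (string) $receipt->_id, 'embedding' => $embedding]
);
```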
Step 2: The RAG Pipeline - From Query to Context
This is the core logic. We'll create a dedicated service for this, e.g., `app/Services/GraphRagService.php`. This service has one primary job: take a user's query and return a string of relevant context from the graph.
Method A: Structured Retrieval (for “who”, “what”, “when” questions)
This method uses an LLM to parse the user's query into structured search parameters, then builds a precise Cypher query.

1. Create a DTO for Search Parameters (a possible shape is sketched below):
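Everything in this DTO is an assumed shape to align with your graph schema; the fields are what the LLM would be asked to fill.

```php
<?php

namespace App\Data;

// Parameters the LLM extracts from the user's question (Method A).
class SearchParameters
{
    public function __construct(
        public readonly ?string $vendor = null,
        public readonly ?string $docType = null,   // e.g. "invoice", "receipt"
        public readonly ?string $dateFrom = null,  // ISO 8601
        public readonly ?string $dateTo = null,
        public readonly ?string $topicCode = null, // e.g. "FIN-01"
    ) {}

    public static function fromArray(array $data): self
    {
        return new self(
            vendor: $data['vendor'] ?? null,
            docType: $data['doc_type'] ?? null,
            dateFrom: $data['date_from'] ?? null,
            dateTo: $data['date_to'] ?? null,
            topicCode: $data['topic_code'] ?? null,
        );
    }
}
```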
Method B: Semantic/Vector Retrieval (for "about" questions)

For questions like "find me documents about compliance issues," a vector search is better (a sketch follows).
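A hedged sketch of the semantic branch of `GraphRagService`. It assumes a Neo4j 5.x vector index named `document_embeddings` over `Document.embedding`, and a hypothetical `embed()` helper that wraps the same `embedContent` call used at ingestion time.

```php
// In App\Services\GraphRagService - semantic retrieval (Method B).
public function semanticContext(string $question, int $limit = 5): string
{
    // 1. Embed the question with the same model used at ingestion time.
    $queryVector = $this->embed($question); // hypothetical helper wrapping embedContent

    // 2. Nearest-neighbour search via the Neo4j vector index (Neo4j 5.x procedure).
    $rows = $this->neo4j->run(
        'CALL db.index.vector.queryNodes("document_embeddings", $k, $vector)
         YIELD node, score
         RETURN node.mongo_id AS mongoId, node.total AS total,
                node.date AS date, score
         ORDER BY score DESC',
        ['k' => $limit, 'vector' => $queryVector]
    );

    // 3. Flatten the hits into a context string for the augmented prompt.
    return collect($rows)
        ->map(fn ($r) => "Document {$r['mongoId']} (date {$r['date']}, total {$r['total']}, score "
            . round($r['score'], 3) . ')')
        ->implode("\n");
}
```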
Step 3: The Final Generation - Putting It All Together

Now, your `DocumentController` will use this `GraphRagService` before calling Prism for the final, streamed answer (a sketch of the combined flow follows).
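Tying it together, a hedged sketch of the final flow: retrieve context from the graph, build the augmented prompt, and stream the grounded answer with the SSE pattern from earlier. The `GraphRagService::contextFor()` method name is an assumption, and the sketch reuses the `GeminiService::streamAnswer()` generator rather than Prism's fluent API (whose exact method names you should take from the Prism docs) so the example stays self-contained.

```php
// In App\Http\Controllers\DocumentController (imports for Request, GraphRagService,
// GeminiService and StreamedResponse assumed).

public function ask(Request $request, GraphRagService $rag, GeminiService $gemini): StreamedResponse
{
    $question = (string) $request->query('q', '');

    // 1. Retrieval: pull relevant facts out of Neo4j (structured and/or vector search).
    $context = $rag->contextFor($question);

    // 2. Augmentation: ground the model in the retrieved context.
    $prompt = <<<PROMPT
        Answer the user's question using ONLY the context below.
        If the context is not sufficient, say so.

        Context:
        {$context}

        Question: {$question}
        PROMPT;

    // 3. Generation: stream the grounded answer back over SSE.
    return response()->stream(function () use ($gemini, $prompt) {
        foreach ($gemini->streamAnswer($prompt) as $chunk) {
            echo 'data: ' . json_encode(['text' => $chunk]) . "\n\n";
            if (ob_get_level() > 0) {
                ob_flush();
            }
            flush();
        }
        echo "event: done\ndata: {}\n\n";
        flush();
    }, 200, [
        'Content-Type'  => 'text/event-stream',
        'Cache-Control' => 'no-cache',
    ]);
}
```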

