Task: Design and develop a document analysis tool that automates the extraction and classification of key data in lease documents, improving the efficiency and accuracy of real property management tasks. Additionally, implement a chatbot that retrieves information from the documents to answer user questions.
Method: Used Python (Flask) for the internal API and Amazon Textract for PDF/OCR processing, focusing on extracting relevant data from contracts. Applied regular expressions for pattern matching and LLMs for document classification, summarization, and extraction of critical values and legal clauses. Integrated machine learning models to generate embeddings of the assembled data, enabling semantic search for retrieval.
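As an illustration of the regular-expression step, the following is a minimal sketch that assumes the OCR output is already available as plain text. The patterns, the field names (`monthly_rent`, `commencement_date`), and the function `extract_lease_fields` are hypothetical and not taken from the project; real lease wording varies and would need broader patterns or an LLM fallback.

```python
import re

# Hypothetical patterns for illustration only; actual leases phrase these terms in many ways.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# "Base Rent of $4,250.00" or "monthly rent ... $1,200"
RENT_RE = re.compile(
    r"(?:monthly|base)\s+rent[^$]{0,40}\$\s?([\d,]+(?:\.\d{2})?)",
    re.IGNORECASE,
)

# "Commencement Date shall be March 1, 2024"
DATE_RE = re.compile(
    r"commenc\w*(?:\s+date)?\D{0,30}?((?:" + MONTHS + r")\s+\d{1,2},\s+\d{4})",
    re.IGNORECASE,
)


def extract_lease_fields(text: str) -> dict:
    """Return whichever of the two fields can be found in the OCR text."""
    fields = {}
    rent = RENT_RE.search(text)
    if rent:
        # Strip thousands separators before converting to a number.
        fields["monthly_rent"] = float(rent.group(1).replace(",", ""))
    start = DATE_RE.search(text)
    if start:
        fields["commencement_date"] = start.group(1)
    return fields


sample = (
    "Tenant shall pay Base Rent of $4,250.00 per month. "
    "The Commencement Date shall be March 1, 2024."
)
fields = extract_lease_fields(sample)
```

Fields that do not match are simply omitted from the result, so a later stage (for example, LLM-based extraction) could fill the gaps.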
Using HNSW indexing, cosine similarity, and re-ranking, the OpenAI GPT bot with function calling quickly searches for and retrieves relevant information based on the conversation context and the user's question. LlamaIndex indexing techniques support searching across many documents while grounding answers in the retrieved text to reduce hallucination.
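The two-stage retrieval described above (cosine-similarity search followed by re-ranking) can be sketched as follows. This is a simplified stand-in, not the project's implementation: it scans all vectors exhaustively where an HNSW index would do an approximate nearest-neighbor lookup, it uses hand-written toy vectors instead of model embeddings, and it re-ranks by plain term overlap where a production system might use a dedicated re-ranking model. All names (`cosine`, `search`, the chunk dictionaries) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec, query_text, chunks, k=2):
    """Stage 1: keep the k chunks closest to the query by cosine similarity.
    Stage 2: re-rank those candidates by how many query terms they contain."""
    candidates = sorted(
        chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True
    )[:k]
    terms = set(query_text.lower().split())

    def overlap(chunk):
        return len(terms & set(chunk["text"].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)


# Toy data: the vectors are made up for illustration, not real embeddings.
chunks = [
    {"text": "Tenant may renew the lease for five years", "vec": [0.9, 0.1, 0.0]},
    {"text": "Renewal option notice must be given in writing", "vec": [0.8, 0.3, 0.1]},
    {"text": "Landlord maintains the roof and structure", "vec": [0.0, 0.2, 0.9]},
]
results = search([1.0, 0.2, 0.0], "renewal option notice", chunks, k=2)
```

In this toy data the first stage discards the unrelated maintenance clause, and the second stage promotes the chunk that shares the most terms with the question, even though it was not the nearest vector.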
Result: Developed a streamlined, web-based interface for property managers and real estate professionals to upload lease and amendment documents, automatically extract pertinent information for abstraction, and store it in a structured PostgreSQL database.
The system's embedding-powered search lets users quickly find specific lease terms or clauses across multiple documents and generate useful reports. Together with the assistance of the chat module, this significantly reduces manual review time and improves decision-making, which in turn lowers the administrative and related personnel costs.