{"id":10918,"date":"2025-07-06T05:11:05","date_gmt":"2025-07-06T05:11:05","guid":{"rendered":"https:\/\/www.aihello.com\/resources\/?p=10918---a3fb58e7-6bca-42dd-b336-5c05857961ce"},"modified":"2025-09-04T13:46:57","modified_gmt":"2025-09-04T13:46:57","slug":"creating-a-knowledge-graph-without-llms","status":"publish","type":"post","link":"https:\/\/www.aihello.com\/resources\/blog\/creating-a-knowledge-graph-without-llms\/","title":{"rendered":"Creating a Knowledge Graph Without LLMs"},"content":{"rendered":"<h2 class=\"wp-block-heading\">Knowledge Graph<\/h2><p>Knowledge graphs transform complex information into interconnected networks of entities and relationships, mirroring how human brains connect concepts. Their growing adoption is driven by the way they enhance AI systems, particularly LLMs and <a href=\"https:\/\/www.aihello.com\/resources\/blog\/building-a-rag-pipeline-with-fastapi-haystack-and-chromadb-for-urls-in-python\/\">RAG<\/a> pipelines.<\/p><figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"632\" src=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/image-33-1024x632.png\" alt=\"Knowledge graph pipeline\" class=\"wp-image-10919\" srcset=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/image-33-1024x632.png 1024w, https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/image-33-300x185.png 300w, https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/image-33-768x474.png 768w, https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/image-33-1250x771.png 1250w, https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/image-33.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><p id=\"a176\">In RAG applications, knowledge graphs surpass traditional document retrieval by providing granular, relationship-rich information. 
Their ability to map cross-document connections creates a unified semantic layer, enabling more contextually aware retrieval and improved AI responses.<\/p><p id=\"f497\">Typical knowledge graph construction involves document chunking, followed by LLM-based entity and relationship extraction from each chunk. This process identifies key entities and establishes connections between them, creating a structured knowledge representation.<\/p><p id=\"858f\">While LLM-based knowledge graphs offer high accuracy, there are scenarios where a simpler, more cost-effective approach is needed. Using libraries like&nbsp;<strong>spaCy<\/strong>, we can create knowledge graphs without relying on LLMs, making the technique accessible for projects with limited <a href=\"https:\/\/www.aihello.com\/resources\/blog\/quartile-alternatives\/\">resources<\/a> or for quick prototyping. Though the accuracy may be lower, this approach offers a practical option for organizations that want to experiment with knowledge graphs without the computational and financial overhead of an <a href=\"https:\/\/www.aihello.com\/resources\/blog\/tokenization-and-its-application\/\">LLM<\/a> implementation.<\/p><h2 class=\"wp-block-heading\">Pipeline<\/h2><p><strong>Initial Setup and SpaCy Loading<\/strong><\/p><p>First, we\u2019ll import the necessary libraries and load SpaCy\u2019s large English model, which includes word vectors for better language understanding. 
This sets up our foundation for natural language processing tasks.<\/p><pre class=\"wp-block-code\"><code>import spacy
from typing import List, Dict, Tuple, Set
from rapidfuzz import fuzz

# Load the large English model, which includes word vectors
nlp = spacy.load(\"en_core_web_lg\")

# Example usage
text = \"\"\"Bilbo Baggins celebrates his birthday and leaves the Ring to Frodo, his heir.
Gandalf suspects it is a Ring of Power and counsels Frodo to take it away.\"\"\"

# Process the text with SpaCy
doc = nlp(text)<\/code><\/pre><p><strong>Entity Extraction<\/strong><\/p><p id=\"6565\">Here we\u2019ll identify and extract named entities from the text, ensuring we capture unique entities while preserving their original context and type information.<\/p><pre class=\"wp-block-code\"><code>def extract_entities(doc) -&gt; Tuple&#91;List&#91;Dict], Set&#91;str]]:
    \"\"\"Extract and categorize named entities from a SpaCy doc.\"\"\"
    entities = &#91;]
    unique_entities = set()

    for ent in doc.ents:
        if ent.text not in unique_entities:
            entity_dict = {
                'text': ent.text,
                'type': ent.label_,
                'start': ent.start_char,
                'end': ent.end_char
            }
            entities.append(entity_dict)
            unique_entities.add(ent.text)

    return entities, unique_entities

# Example usage
entities, unique_entities = extract_entities(doc)<\/code><\/pre><p><strong>Basic Relation Extraction<\/strong><\/p><p>We\u2019ll extract subject-verb-object relationships from the text, capturing how entities interact with each other through actions and connections.<\/p><pre class=\"wp-block-code\"><code>def extract_basic_relations(doc) -&gt; List&#91;Dict]:
    \"\"\"Extract subject-verb-object relations from a SpaCy doc.\"\"\"
    relations = &#91;]

    for token in doc:
        if token.pos_ == \"VERB\":
            # Find subjects
            subjects = &#91;]
            for subj in token.lefts:
                if subj.dep_ in (\"nsubj\", \"nsubjpass\"):
                    subjects.append(subj)
                    # Include conjunctions (e.g., \"Sam and Frodo\")
                    subjects.extend(&#91;child for child in subj.children
                                    if child.dep_ == \"conj\"])

            # Find objects
            objects = &#91;]
            for obj in token.rights:
                if obj.dep_ == \"dobj\":
                    objects.append(obj)
                    # Include conjunctions
                    objects.extend(&#91;child for child in obj.children
                                    if child.dep_ == \"conj\"])
                elif obj.dep_ == \"prep\":  # Handle prepositional objects
                    objects.extend(&#91;child for child in obj.children
                                    if child.dep_ == \"pobj\"])

            # Create relations for each subject-object pair
            for subj in subjects:
                for obj in objects:
                    relation = {
                        'subject': subj.text,
                        'relation': token.lemma_,
                        'object': obj.text,
                        'sentence': token.sent.text
                    }
                    relations.append(relation)

    return relations

# Example usage
basic_relations = extract_basic_relations(doc)<\/code><\/pre><p><strong>Pronoun Resolution<\/strong><\/p><p>To improve the quality of our knowledge graph, we\u2019ll implement a simple pronoun resolution system that maps pronouns to their most likely antecedent entities.<\/p><pre class=\"wp-block-code\"><code>def resolve_pronouns(doc, entity_positions: List&#91;Tuple]) -&gt; Dict&#91;str, str]:
    \"\"\"Create a mapping of pronouns to their likely antecedents.\"\"\"
    pronouns = {\"them\", \"it\", \"they\", \"who\", \"he\", \"she\", \"him\", \"his\",
                \"her\", \"their\"}
    pronoun_map = {}

    for token in doc:
        if token.text.lower() in pronouns:
            # Find the nearest preceding entity
            previous_entities = &#91;ent for ent in entity_positions
                                 if ent&#91;1] &lt; token.idx]
            if previous_entities:
                # Use the most recent entity as the antecedent; key on the
                # lowercased pronoun so later lookups are case-insensitive
                pronoun_map&#91;token.text.lower()] = previous_entities&#91;-1]&#91;0]

    return pronoun_map

# Example usage
entity_positions = &#91;(ent.text, ent.start_char, ent.end_char)
                    for ent in doc.ents]
pronoun_map = resolve_pronouns(doc, entity_positions)<\/code><\/pre><p><strong>Fuzzy Entity Matching<\/strong><\/p><p>To handle variations in entity names, we\u2019ll implement fuzzy matching that can identify similar entities and standardize their references.<\/p><pre class=\"wp-block-code\"><code>def fuzzy_match_entity(text: str, entities: set, threshold: float = 0.9) -&gt; str:
    \"\"\"Match text to the most similar entity using fuzzy matching.\"\"\"
    best_match = None
    best_ratio = 0
    text_lower = text.lower()

    for entity in entities:
        entity_lower = entity.lower()

        # Try exact word matching first
        if text_lower in entity_lower.split() or entity_lower in text_lower.split():
            if len(text) &gt; 3:  # Avoid matching very short words
                return entity

        # Try partial string matching
        if text_lower in entity_lower or entity_lower in text_lower:
            if len(text) &gt; 3:
                return entity

        # Calculate token sort ratio for word order differences
        ratio = fuzz.token_sort_ratio(text_lower, entity_lower) \/ 100.0

        # If needed, try partial ratio for substring matches
        if ratio &lt; threshold:
            ratio = fuzz.partial_ratio(text_lower, entity_lower) \/ 100.0

        if ratio &gt; threshold and ratio &gt; best_ratio:
            best_match = entity
            best_ratio = ratio

    return best_match if best_match else text

# Example usage
test_text = \"Baggin\"  # Misspelled name
matched_entity = fuzzy_match_entity(test_text, unique_entities)<\/code><\/pre><p><strong>Extracting Relations From Text<\/strong><\/p><p>Finally, we\u2019ll combine all our components into a single pipeline that processes text and generates a complete knowledge graph representation.<\/p><pre class=\"wp-block-code\"><code>def process_text_pipeline(text: str) -&gt; Dict:
    \"\"\"Process text through the complete pipeline.\"\"\"
    # Step 1: Process with SpaCy
    doc = nlp(text)

    # Step 2: Extract entities
    entities, unique_entities = extract_entities(doc)

    # Step 3: Get entity positions for pronoun resolution
    entity_positions = &#91;(ent&#91;'text'], ent&#91;'start'], ent&#91;'end'])
                        for ent in entities]

    # Step 4: Create pronoun mapping
    pronoun_map = resolve_pronouns(doc, entity_positions)

    # Step 5: Extract basic relations
    relations = extract_basic_relations(doc)

    # Step 6: Resolve pronouns and fuzzy match entities in relations
    resolved_relations = &#91;]
    for rel in relations:
        resolved_rel = rel.copy()

        # Resolve subject
        if rel&#91;'subject'].lower() in pronoun_map:
            resolved_rel&#91;'subject'] = pronoun_map&#91;rel&#91;'subject'].lower()]
        else:
            resolved_rel&#91;'subject'] = fuzzy_match_entity(
                rel&#91;'subject'], unique_entities
            )

        # Resolve object
        if rel&#91;'object'].lower() in pronoun_map:
            resolved_rel&#91;'object'] = pronoun_map&#91;rel&#91;'object'].lower()]
        else:
            resolved_rel&#91;'object'] = fuzzy_match_entity(
                rel&#91;'object'], unique_entities
            )

        resolved_relations.append(resolved_rel)

    return {
        'entities': entities,
        'relations': resolved_relations,
        'pronoun_map': pronoun_map
    }

# Example usage of the complete pipeline
text = \"\"\"Bilbo Baggins celebrates his birthday and leaves the Ring to Frodo, his heir.
Gandalf suspects it is a Ring of Power and counsels Frodo to take it away.\"\"\"
result = process_text_pipeline(text)<\/code><\/pre><p><strong>Creating the Knowledge Graph<\/strong><\/p><p>First, we\u2019ll create a class to handle our knowledge graph operations using NetworkX:<\/p><pre class=\"wp-block-code\"><code>import networkx as nx
import matplotlib.pyplot as plt
from typing import List, Dict
import logging

class LORKnowledgeGraph:
    def __init__(self):
        \"\"\"Initialize the NetworkX graph with multi-directed edges.\"\"\"
        self.G = nx.MultiDiGraph()
        logging.basicConfig(level=logging.INFO)
        self.logger = logging.getLogger(__name__)

    def create_entity_node(self, entity: str):
        \"\"\"Create a node with a sanitized label.\"\"\"
        label = entity.replace(' ', '_').replace('-', '_').replace(\"'\", '')
        if not label&#91;0].isalpha():
            label = 'N_' + label
        self.G.add_node(label, name=entity)

    def create_relationship(self, subject: str, relation: str, obj: str):
        \"\"\"Create a relationship (edge) between two nodes.\"\"\"
        subject_label = subject.replace(' ', '_').replace('-', '_').replace(\"'\", '')
        object_label = obj.replace(' ', '_').replace('-', '_').replace(\"'\", '')

        if not subject_label&#91;0].isalpha():
            subject_label = 'N_' + subject_label
        if not object_label&#91;0].isalpha():
            object_label = 'N_' + object_label

        relation_type = relation.upper().replace(' ', '_')
        self.G.add_edge(subject_label, object_label, relation=relation_type)

    def build_from_relations(self, relations: List&#91;Dict]):
        \"\"\"Build the graph from extracted relations.\"\"\"
        for relation in relations:
            self.create_entity_node(relation&#91;'subject'])
            self.create_entity_node(relation&#91;'object'])
            self.create_relationship(
                relation&#91;'subject'],
                relation&#91;'relation'],
                relation&#91;'object']
            )<\/code><\/pre><p id=\"ed84\">Now, let\u2019s extend our pipeline to process multiple text segments and build a knowledge graph:<\/p><pre class=\"wp-block-code\"><code>def process_multiple_texts(texts: List&#91;str]) -&gt; nx.MultiDiGraph:
    \"\"\"Process multiple text segments and create a knowledge graph.\"\"\"
    # Initialize the knowledge graph
    kg = LORKnowledgeGraph()

    # Process each text segment
    for text in texts:
        # Process text through our existing pipeline
        result = process_text_pipeline(text)

        # Add extracted relations to the knowledge graph
        kg.build_from_relations(result&#91;'relations'])

    return kg.G

def visualize_knowledge_graph(G: nx.MultiDiGraph, figsize=(16, 12)):
    \"\"\"Visualize the knowledge graph.\"\"\"
    plt.figure(figsize=figsize)

    # Scale node sizes by degree
    degrees = dict(G.degree())
    node_sizes = &#91;1500 + (degrees&#91;node] * 100) for node in G.nodes()]

    # Use Graphviz layout for better visualization (requires pygraphviz)
    pos = nx.nx_agraph.graphviz_layout(G, prog='neato')

    # Draw the graph with curved edges
    nx.draw_networkx_edges(G, pos,
                           edge_color='gray',
                           alpha=0.4,
                           arrows=True,
                           connectionstyle='arc3,rad=0.2')

    nx.draw_networkx_nodes(G, pos,
                           node_color='lightblue',
                           node_size=node_sizes,
                           alpha=0.7)

    # Add labels; flatten multigraph edge data to (u, v) keys, which is
    # what draw_networkx_edge_labels expects
    node_labels = nx.get_node_attributes(G, 'name')
    edge_labels = {(u, v): d&#91;'relation'] for u, v, d in G.edges(data=True)}

    nx.draw_networkx_labels(G, pos, node_labels, font_size=8)
    nx.draw_networkx_edge_labels(G, pos, edge_labels, font_size=6)

    plt.title(\"Knowledge Graph Visualization\")
    plt.axis('off')
    return plt<\/code><\/pre><p><strong>Final Result<\/strong><\/p><p>Here\u2019s what the resulting graph looks like when built from multiple chunks of text from The Lord of the Rings:<br><\/p><figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"615\" src=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/creating-a-knowledge-graph-without-llms-1024x615.png\" alt=\"Knowledge graph built from Lord of the Rings text\" class=\"wp-image-10920\" srcset=\"https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/creating-a-knowledge-graph-without-llms-1024x615.png 1024w, https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/creating-a-knowledge-graph-without-llms-300x180.png 300w, https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/creating-a-knowledge-graph-without-llms-768x461.png 768w, https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/creating-a-knowledge-graph-without-llms-1250x751.png 1250w, https:\/\/www.aihello.com\/resources\/wp-content\/uploads\/2024\/12\/creating-a-knowledge-graph-without-llms.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><h2 class=\"wp-block-heading\">Conclusion<\/h2><p>This lightweight approach to knowledge graph construction demonstrates that valuable semantic networks can be created without relying on large language models. While LLM-based approaches may offer higher accuracy and more nuanced relationship extraction, the combination of SpaCy, fuzzy matching, and basic NLP techniques provides a robust foundation for organizations looking to experiment with knowledge graphs. This method not only offers significant cost savings but also provides greater transparency and control over the extraction process. 
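To make the "lightweight" claim concrete, the core graph-building step reduces to a few lines of NetworkX. The sketch below is self-contained and illustrative only: the hard-coded relations list is a hypothetical stand-in for the output of the extraction pipeline, and the sanitize helper mirrors the label cleanup shown earlier.

```python
import networkx as nx

def sanitize(name: str) -> str:
    """Turn an entity name into a graph-safe node label."""
    label = name.replace(' ', '_').replace('-', '_').replace("'", '')
    return label if label[0].isalpha() else 'N_' + label

# Hypothetical stand-in for relations produced by the extraction pipeline
relations = [
    {'subject': 'Bilbo Baggins', 'relation': 'leave', 'object': 'Ring'},
    {'subject': 'Gandalf', 'relation': 'counsel', 'object': 'Frodo'},
]

# Build a directed multigraph: one node per entity, one edge per relation
G = nx.MultiDiGraph()
for rel in relations:
    for name in (rel['subject'], rel['object']):
        G.add_node(sanitize(name), name=name)
    G.add_edge(sanitize(rel['subject']), sanitize(rel['object']),
               relation=rel['relation'].upper())

print(sorted(G.nodes()))  # ['Bilbo_Baggins', 'Frodo', 'Gandalf', 'Ring']
```

A MultiDiGraph is used (as in the class above) so that two entities can share several differently labeled relations without overwriting each other.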
As knowledge graphs continue to evolve as critical components of modern AI systems, having accessible, LLM-free alternatives ensures that organizations of all sizes can harness the power of structured knowledge representation for their specific use cases.<\/p>","protected":false},"excerpt":{"rendered":"<p>Unlock the power of knowledge graphs for AI, LLMs &#038; RAG. Discover how to build efficient, cost-effective KGs with SpaCy: entity and relation extraction made easy.<\/p>\n","protected":false},"author":30,"featured_media":10919,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[103,163,85],"tags":[1182,1179,1149,1181,1180],"class_list":["post-10918","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","category-resources","category-tips","tag-ai-systems","tag-knowledge-graph","tag-llm","tag-networks","tag-pipeline"],"_links":{"self":[{"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/posts\/10918","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/users\/30"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/comments?post=10918"}],"version-history":[{"count":2,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/posts\/10918\/revisions"}],"predecessor-version":[{"id":12432,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/posts\/10918\/revisions\/12432"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/media\/10919"}],"wp:attachment":[{"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/media?parent=10918"}],"wp:term":[{"taxonomy":"ca
tegory","embeddable":true,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/categories?post=10918"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aihello.com\/resources\/wp-json\/wp\/v2\/tags?post=10918"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}