LangExtract: Gemini-powered Information Extraction Library
LangExtract is a Python library that uses LLMs to extract structured information from unstructured text, with precise source grounding and interactive visualization.
LangExtract maps every extraction to its exact location in the source text, enabling visual highlighting for easy traceability and verification.
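To make source grounding concrete, here is a minimal, stdlib-only sketch that stores an extraction together with the character offsets where it occurs in the source. `GroundedExtraction` and `ground` are hypothetical illustrations, not LangExtract's API. The sketch also assumes exact verbatim matching; a real system would need to handle repeated or approximate matches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GroundedExtraction:
    """Hypothetical record tying an extracted string to its source offsets."""
    extraction_class: str
    extraction_text: str
    start: int  # inclusive character offset into the source text
    end: int    # exclusive character offset into the source text


def ground(source: str, extraction_class: str, extraction_text: str,
           search_from: int = 0) -> Optional[GroundedExtraction]:
    """Find the first verbatim occurrence of extraction_text at or after search_from.

    Returns None when the text does not appear in the source, which flags
    an extraction that cannot be traced back (e.g. a hallucination).
    """
    start = source.find(extraction_text, search_from)
    if start == -1:
        return None
    return GroundedExtraction(extraction_class, extraction_text,
                              start, start + len(extraction_text))


source = "Juliet gazed at the stars, thinking of Romeo."
hit = ground(source, "character", "Romeo")
print(hit.start, hit.end)                      # 39 44
print(source[hit.start:hit.end])               # Romeo
print(ground(source, "character", "Tybalt"))   # None
```

Because the offsets index directly into the original string, any viewer can slice the source to highlight or verify each extraction.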
LangExtract enforces a consistent output schema based on your few-shot examples, leveraging controlled generation in supported models like Gemini.
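LangExtract enforces the schema during generation. For contrast, the sketch below shows a simpler, post-hoc technique that is not controlled generation. It derives the allowed extraction classes and attribute keys from few-shot examples, then rejects output that strays from them. The helper names and the plain-dict layout are assumptions for illustration only.

```python
from typing import Any, Dict, List, Set


def schema_from_examples(examples: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Collect each extraction class seen in the examples and its attribute keys."""
    schema: Dict[str, Set[str]] = {}
    for example in examples:
        for item in example["extractions"]:
            keys = schema.setdefault(item["extraction_class"], set())
            keys.update(item.get("attributes", {}))  # adds the attribute dict's keys
    return schema


def conforms(item: Dict[str, Any], schema: Dict[str, Set[str]]) -> bool:
    """True if item uses a known class and only attribute keys seen for that class."""
    allowed = schema.get(item.get("extraction_class", ""))
    if allowed is None:
        return False
    return set(item.get("attributes", {})) <= allowed


examples = [{
    "text": "ROMEO: But soft!",
    "extractions": [
        {"extraction_class": "character", "extraction_text": "ROMEO"},
        {"extraction_class": "emotion", "extraction_text": "But soft!",
         "attributes": {"feeling": "gentle awe"}},
    ],
}]
schema = schema_from_examples(examples)
print(conforms({"extraction_class": "emotion", "attributes": {"feeling": "joy"}}, schema))  # True
print(conforms({"extraction_class": "location"}, schema))                                   # False
print(conforms({"extraction_class": "character", "attributes": {"age": "16"}}, schema))     # False
```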
LangExtract tackles the 'needle-in-a-haystack' problem in long documents with optimized text chunking, parallel processing, and multiple extraction passes that raise recall.
LangExtract lets you define extraction tasks for any domain with just a few examples, and it adapts to your needs without model fine-tuning.
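The chunk-and-merge pattern behind long-document extraction can be sketched with the standard library alone. This toy version is not LangExtract's implementation. It splits text into overlapping windows, runs a stand-in extractor on each window in a thread pool, repeats for several passes, and merges the results by absolute character offset. `chunk`, `extract_all`, and `find_names` are hypothetical names, and the regex extractor merely substitutes for a model call.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, extraction_class)


def chunk(text: str, size: int, overlap: int) -> List[Tuple[int, str]]:
    """Split text into overlapping windows, keeping each window's start offset."""
    if not 0 <= overlap < size:
        raise ValueError("expected 0 <= overlap < size")
    step = size - overlap
    return [(i, text[i:i + size]) for i in range(0, max(len(text) - overlap, 1), step)]


def extract_all(text: str,
                extract_chunk: Callable[[str], Iterable[Span]],
                size: int = 200, overlap: int = 20,
                passes: int = 2, workers: int = 4) -> Set[Span]:
    """Run extract_chunk on every window, `passes` times, in a thread pool.

    extract_chunk returns spans relative to its chunk; they are shifted to
    absolute offsets, and the set union drops duplicates found in
    overlapping windows or in repeated passes.
    """
    windows = chunk(text, size, overlap)

    def run(window: Tuple[int, str]) -> List[Span]:
        offset, piece = window
        return [(offset + s, offset + e, cls) for s, e, cls in extract_chunk(piece)]

    found: Set[Span] = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(passes):
            for spans in pool.map(run, windows):
                found.update(spans)
    return found


def find_names(piece: str) -> List[Span]:
    # Toy extractor: every full occurrence of "Romeo" or "Juliet" is a character.
    return [(m.start(), m.end(), "character") for m in re.finditer(r"Romeo|Juliet", piece)]


text = "Romeo met Juliet. " * 30
spans = extract_all(text, find_names, size=50, overlap=10)
print(len(spans))  # 60
```

The overlap is the key design choice. If it is at least as long as the longest entity, every entity fits whole inside some window. Here the names are at most 6 characters and the overlap is 10, so all 60 occurrences are found. With a deterministic toy extractor, extra passes find nothing new; they pay off with a sampling LLM whose output varies between runs.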
LangExtract generates interactive HTML visualizations to review extracted entities in context with intuitive highlighting.
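The core of a highlighted view is turning character offsets into markup. Below is a minimal, static sketch of that step; it has no interactivity and is not how LangExtract renders its own HTML. `highlight` and `to_page` are hypothetical helpers, and they assume the spans do not overlap.

```python
import html
from typing import List, Tuple


def highlight(source: str, spans: List[Tuple[int, int, str]]) -> str:
    """Wrap each (start, end, class) span in <mark>, HTML-escaping all text.

    Assumes spans do not overlap; overlapping spans would need nesting logic.
    """
    parts: List[str] = []
    cursor = 0
    for start, end, cls in sorted(spans):
        parts.append(html.escape(source[cursor:start]))  # unhighlighted gap
        parts.append(f'<mark class="{html.escape(cls)}">{html.escape(source[start:end])}</mark>')
        cursor = end
    parts.append(html.escape(source[cursor:]))  # text after the last span
    return "".join(parts)


def to_page(body: str) -> str:
    """Wrap highlighted text in a minimal standalone HTML document."""
    return ("<!DOCTYPE html><html><head><meta charset=\"utf-8\">"
            "<style>mark.character{background:#ffe08a}</style></head>"
            f"<body><p>{body}</p></body></html>")


fragment = highlight("Romeo & Juliet", [(0, 5, "character"), (8, 14, "character")])
print(fragment)
# <mark class="character">Romeo</mark> &amp; <mark class="character">Juliet</mark>
```

Escaping the unhighlighted text as well as the spans keeps characters such as `&` and `<` in the source from breaking the page.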
LangExtract is free to use under the Apache 2.0 License, and its development is community-driven and fully transparent.
Use LangExtract to extract key information from clinical notes and medical reports while maintaining source traceability.
LangExtract helps extract clauses, dates, parties and other information from contracts and legal documents.
LangExtract enables you to analyze literature, extract entities from academic papers, and structure unorganized text data.
Transform business documents into structured data using LangExtract for analysis and decision-making.
The developer community has responded to LangExtract with enthusiasm across a wide range of use cases.
Transforming language into structured gold with unprecedented AI transparency and traceability potential.
A huge step forward for the future of data science, enabling structured data extraction from complex documents.
Extremely useful across all development projects, significantly improving workflow efficiency.
Receiving attention worldwide, including strong interest from international developer communities.
LangExtract is excellent for processing medical reports and clinical documentation.
LangExtract is perfect for financial document analysis and data extraction.
LangExtract is a powerful tool for academic research and literature analysis.
LangExtract handles both cloud and local models with exceptional performance.
Get started with LangExtract in 3 simple steps
Create a prompt and provide examples to guide extraction
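```python
import langextract as lx

# Define extraction rules
prompt = "Extract characters and emotions from text"

# Provide a high-quality example that covers every class named in the prompt
examples = [
    lx.data.ExampleData(
        text="ROMEO: But soft! What light...",
        extractions=[
            lx.data.Extraction(
                extraction_class="character",
                extraction_text="ROMEO",
            ),
            lx.data.Extraction(
                extraction_class="emotion",
                extraction_text="But soft!",
                attributes={"feeling": "gentle awe"},
            ),
        ],
    )
]
```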
Process your text with the defined task
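```python
# Text to process (replace with your own document)
input_text = "Lady Juliet gazed longingly at the stars, her heart aching for Romeo"

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
)
```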
Generate interactive HTML visualization
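```python
# Save results to a JSONL file in the current directory
lx.io.save_annotated_documents(
    [result], output_name="results.jsonl", output_dir="."
)

# Generate the visualization from the saved file
html_content = lx.visualize("results.jsonl")
with open("visualization.html", "w", encoding="utf-8") as f:
    # In Jupyter/Colab, visualize() may return an HTML display object
    if hasattr(html_content, "data"):
        f.write(html_content.data)
    else:
        f.write(html_content)
```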
See LangExtract in action across different domains
Extract characters, emotions, and relationships from Romeo and Juliet
Structure clinical notes and extract medications, dosages, and patient information
Live demo for structuring radiology reports with real-time processing
For cloud models like Gemini, set up your API key, for example by storing it in the LANGEXTRACT_API_KEY environment variable.
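Keeping the key in an environment variable keeps it out of source code. The sketch below assumes the key lives in `LANGEXTRACT_API_KEY`. `resolve_api_key` is a hypothetical helper, not part of LangExtract. Its precedence (explicit argument first, then the environment) is an assumption, chosen to fail fast with a clear message before any model call is made.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env_var: str = "LANGEXTRACT_API_KEY") -> str:
    """Return an explicitly supplied key, else the value of env_var.

    Raises early with a clear message instead of failing on the first request.
    """
    key = explicit or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key found: pass one explicitly or set {env_var}.")
    return key


try:
    key = resolve_api_key()  # never print the key itself
except RuntimeError as err:
    print(err)
```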
Install LangExtract with pip (`pip install langextract`) and start extracting structured information from your text data in minutes.