
GitHub All-Stars #3: LangExtract – How Google Turns Chaos into Data with Gemini


Artur Skowroński

Head of Java/Kotlin Space
Sep 10, 2025 | 12 min read

```python
# Step 1: Define what we want to extract from the medical documentation
import langextract as lx

prompt_description = """\
Extract all medical conditions, medications, and lifestyle risk factors from the patient's medical summary.

- For medications, always include the dosage and frequency if mentioned.
- For conditions, note if they are described as 'controlled', 'stable', or 'chronic'.
- For lifestyle risks, extract the specific habit, e.g., 'smoking' or 'drinking'.
- Accurately capture the exact text spans for each extracted entity.
"""
```
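The last instruction in Step 1 asks for exact text spans. The reason is that a verbatim span can be mapped back to character offsets in the source, which is what lets a human check each extraction in context. The minimal sketch below illustrates the idea with the standard library only; `locate_span` is a hypothetical helper, not part of LangExtract, which performs its own alignment of extractions to the source.

```python
from typing import Optional, Tuple


def locate_span(source: str, span: str, start: int = 0) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the first verbatim occurrence of `span`
    in `source` at or after index `start`, or None if it does not occur."""
    idx = source.find(span, start)
    if idx == -1:
        return None
    return idx, idx + len(span)


summary = (
    "Patient has a history of controlled type 2 diabetes, "
    "treated with Metformin 500mg twice daily."
)

# An exact span maps back to offsets, so a reviewer can check it in context.
print(locate_span(summary, "Metformin"))
# A paraphrased span is not verbatim, so it cannot be located in the source.
print(locate_span(summary, "metformin 500 mg"))
```

A paraphrase such as "metformin 500 mg" may be medically equivalent, but it has no position in the document, which is why the prompt insists on exact spans.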
```python
# Step 2: Provide examples to teach the model patterns from the medical domain
examples = [
    lx.data.ExampleData(
        # Metformin is a diabetes drug, so the example pairs it with diabetes
        text="Patient has a history of controlled type 2 diabetes, treated with Metformin 500mg twice daily.",
        extractions=[
            lx.data.Extraction(
                extraction_class="medical_condition",
                extraction_text="type 2 diabetes",  # must appear verbatim in `text`
                attributes={"status": "controlled"},
            ),
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="Metformin",
                attributes={"dosage": "500mg", "frequency": "twice daily"},
            ),
        ],
    ),
    lx.data.ExampleData(
        text="He is a non-smoker but admits to occasional social drinking. BP is stable at 120/80 mmHg.",
        extractions=[
            lx.data.Extraction(
                extraction_class="lifestyle_risk",
                extraction_text="non-smoker",
                attributes={"habit": "smoking", "status": "absent"},
            ),
            lx.data.Extraction(
                extraction_class="lifestyle_risk",
                extraction_text="social drinking",
                attributes={"habit": "drinking", "frequency": "occasional"},
            ),
            lx.data.Extraction(
                extraction_class="measurement",
                extraction_text="BP is stable at 120/80 mmHg",
                attributes={"type": "blood_pressure", "value": "120/80"},
            ),
        ],
    ),
]
```
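Few-shot extraction works by placing worked examples in front of the new input so the model imitates their output format. The sketch below shows that idea with plain dataclasses and JSON. The `Extraction` and `Example` classes, the `build_prompt` helper, and the "Input:/Output:" layout are all hypothetical illustrations; they are not LangExtract's internal prompt template.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Extraction:
    """Hypothetical stand-in for one labeled span in a worked example."""
    extraction_class: str
    extraction_text: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Example:
    """Hypothetical stand-in for a worked example: source text plus its answers."""
    text: str
    extractions: List[Extraction]


def build_prompt(description: str, examples: List[Example], source_text: str) -> str:
    """Assemble a few-shot prompt: instructions, worked examples, then the new input."""
    parts = [description.strip(), ""]
    for ex in examples:
        answer = [
            {"class": e.extraction_class, "text": e.extraction_text, "attributes": e.attributes}
            for e in ex.extractions
        ]
        parts.append(f"Input: {ex.text}")
        parts.append(f"Output: {json.dumps(answer)}")
        parts.append("")
    # The new document goes last, with an open "Output:" for the model to complete.
    parts.append(f"Input: {source_text}")
    parts.append("Output:")
    return "\n".join(parts)


demo = [
    Example(
        text="He is a non-smoker but admits to occasional social drinking.",
        extractions=[
            Extraction("lifestyle_risk", "social drinking",
                       {"habit": "drinking", "frequency": "occasional"}),
        ],
    )
]
prompt = build_prompt("Extract lifestyle risk factors.", demo, "She smokes ten cigarettes a day.")
print(prompt)
```

Because the examples carry both the span and its attributes, the model sees the exact output shape it is expected to reproduce for the new input.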
```python
# Step 3: Run extraction on the full document
from pathlib import Path

# Assumption: the client's full medical report is stored in a local text file (hypothetical path)
underwriting_document_text = Path("underwriting_report.txt").read_text(encoding="utf-8")

# A single call launches the entire extraction pipeline.
# Gemini models read the API key from the LANGEXTRACT_API_KEY environment variable;
# a local model served by Ollama can be used instead by passing its model_id and model_url.
result = lx.extract(
    text_or_documents=underwriting_document_text,
    prompt_description=prompt_description,
    examples=examples,
    model_id="gemini-2.5-flash",
)

# Save the annotated document (source text plus grounded extractions) as JSONL
lx.io.save_annotated_documents(
    [result], output_name="underwriting_extractions.jsonl", output_dir="."
)
```
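The JSONL file saved in Step 3 can also be post-processed outside the visualizer, for example to group findings by category for an underwriter. The sketch below assumes each line is a JSON document with an `extractions` list whose items carry `extraction_class`, `extraction_text` and `attributes` keys; verify these field names against the file your LangExtract version actually writes. The `group_by_class` helper and the inline sample record are made up for illustration.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_class(jsonl_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect extraction texts per class from JSONL lines (assumed schema)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        for ext in doc.get("extractions") or []:
            grouped[ext["extraction_class"]].append(ext["extraction_text"])
    return dict(grouped)


# Hypothetical sample record; a real file may contain additional fields.
sample = json.dumps({
    "text": "Type 2 diabetes, on Metformin 500mg twice daily. Smokes daily.",
    "extractions": [
        {"extraction_class": "medical_condition", "extraction_text": "Type 2 diabetes",
         "attributes": {}},
        {"extraction_class": "medication", "extraction_text": "Metformin",
         "attributes": {"dosage": "500mg", "frequency": "twice daily"}},
        {"extraction_class": "lifestyle_risk", "extraction_text": "Smokes",
         "attributes": {"habit": "smoking", "frequency": "daily"}},
    ],
})
print(group_by_class([sample]))

# With a real output file (assumed schema):
# with open("underwriting_extractions.jsonl", encoding="utf-8") as f:
#     print(group_by_class(f))
```

Reading line by line keeps memory flat even when a batch run produces one record per report.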
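```python
# Step 4: Create an interactive HTML report for verification
html_content = lx.visualize("underwriting_extractions.jsonl")
with open("underwriting_visualization.html", "w", encoding="utf-8") as f:
    # In Jupyter/Colab, visualize() returns a display object whose HTML is in .data
    f.write(html_content.data if hasattr(html_content, "data") else html_content)
```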
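The HTML report from Step 4 is useful because every extraction points back to its position in the source text. A toy version of that idea: given character offsets, wrap each span in a `<mark>` tag and HTML-escape everything else. The `highlight` helper is hypothetical, assumes non-overlapping spans, and is not how LangExtract renders its report.

```python
import html
from typing import List, Tuple


def highlight(source: str, spans: List[Tuple[int, int, str]]) -> str:
    """Wrap each (start, end, label) span of `source` in a <mark> tag.

    Spans are assumed not to overlap; all text is HTML-escaped.
    """
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        out.append(html.escape(source[cursor:start]))  # plain text before the span
        out.append(
            f'<mark title="{html.escape(label)}">{html.escape(source[start:end])}</mark>'
        )
        cursor = end
    out.append(html.escape(source[cursor:]))  # trailing plain text
    return "".join(out)


text = "BP is stable at 120/80 mmHg & no smoking."
print(highlight(text, [(0, 27, "measurement")]))
```

Escaping the untouched text matters here: medical reports routinely contain characters such as `&` and `<` that would otherwise corrupt the generated HTML.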