"""Thames Water incident scraper. Scrapes https://www.thameswater.co.uk/network-latest for current incidents. The page is Next.js server-rendered with incident data embedded as JSON. Designed to run on a cron schedule (e.g. every 6 hours) to build up a historical incident database for InSAR validation. """ import csv import json import re from datetime import datetime from pathlib import Path import requests NETWORK_LATEST_URL = "https://www.thameswater.co.uk/network-latest" DB_PATH = Path("data/incidents/thames_water_scraped.csv") # Fields we want to capture from each incident CSV_FIELDS = [ "scraped_at", "incident_id", "title", "description", "status", "category", "location", "postcode", "latitude", "longitude", "start_date", "estimated_end_date", "raw_json", ] def fetch_incidents() -> list[dict]: """Fetch current incidents from Thames Water network-latest page. The page embeds incident data as JSON within the Next.js payload. Returns a list of incident dicts (may be empty if no active incidents). """ headers = { "User-Agent": "Exostrata-InSAR-Research/1.0 (incident-monitoring)", } resp = requests.get(NETWORK_LATEST_URL, headers=headers, timeout=30) resp.raise_for_status() html = resp.text incidents = [] # Strategy 1: Look for JSON incidents array in Next.js data payload # Pattern: "incidents":[{...},{...}] match = re.search(r'"incidents"\s*:\s*(\[.*?\])\s*[,}]', html) if match: try: incidents = json.loads(match.group(1)) except json.JSONDecodeError: pass # Strategy 2: Look for __NEXT_DATA__ script tag (common Next.js pattern) if not incidents: match = re.search( r']*id="__NEXT_DATA__"[^>]*>(.*?)', html, re.DOTALL, ) if match: try: next_data = json.loads(match.group(1)) # Navigate the Next.js data structure to find incidents props = next_data.get("props", {}).get("pageProps", {}) incidents = props.get("incidents", []) except (json.JSONDecodeError, AttributeError): pass return incidents def extract_fields(incident: dict, scraped_at: str) -> dict: """Extract standardised fields from a raw incident dict. The exact field names in Thames Water's data may vary — this maps common patterns to our CSV schema. 
""" # Try various field name patterns Thames Water might use row = { "scraped_at": scraped_at, "incident_id": ( incident.get("id") or incident.get("incidentId") or incident.get("reference", "") ), "title": incident.get("title", incident.get("name", "")), "description": incident.get("description", incident.get("summary", "")), "status": incident.get("status", incident.get("state", "")), "category": incident.get("category", incident.get("type", "")), "location": incident.get("location", incident.get("area", "")), "postcode": incident.get("postcode", incident.get("postalCode", "")), "latitude": incident.get("latitude", incident.get("lat", "")), "longitude": incident.get("longitude", incident.get("lng", "")), "start_date": incident.get("startDate", incident.get("createdAt", "")), "estimated_end_date": incident.get("estimatedEndDate", incident.get("eta", "")), "raw_json": json.dumps(incident, default=str), } return row def append_to_db(incidents: list[dict], db_path: Path = DB_PATH): """Append new incidents to the CSV database, skipping duplicates.""" db_path.parent.mkdir(parents=True, exist_ok=True) scraped_at = datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S") # Load existing incident IDs to avoid duplicates existing_ids = set() if db_path.exists(): with open(db_path, "r", encoding="utf-8") as f: reader = csv.DictReader(f) for row in reader: key = f"{row.get('incident_id', '')}_{row.get('scraped_at', '')[:10]}" existing_ids.add(key) # Write header if file doesn't exist write_header = not db_path.exists() new_count = 0 with open(db_path, "a", newline="", encoding="utf-8") as f: writer = csv.DictWriter(f, fieldnames=CSV_FIELDS, extrasaction="ignore") if write_header: writer.writeheader() for incident in incidents: row = extract_fields(incident, scraped_at) key = f"{row['incident_id']}_{scraped_at[:10]}" if key not in existing_ids: writer.writerow(row) new_count += 1 return new_count def scrape_and_save() -> tuple[int, int]: """Run a full scrape cycle. Returns (total_found, new_saved).""" incidents = fetch_incidents() new_saved = 0 if incidents: new_saved = append_to_db(incidents) return len(incidents), new_saved if __name__ == "__main__": print(f"Scraping Thames Water incidents from {NETWORK_LATEST_URL}...") total, new = scrape_and_save() if total == 0: print("No active incidents currently listed.") else: print(f"Found {total} incidents, saved {new} new records to {DB_PATH}")