SVG/DWG Parsing Workflows for Indoor Mapping Pipelines

Pipeline Architecture & Data Ingestion Strategy

Indoor mapping pipelines routinely ingest heterogeneous blueprint formats, requiring a deterministic normalization layer before topology generation or wayfinding graph construction. The core objective of this workflow is to transform native CAD drawings (.dwg) and web-optimized vector graphics (.svg) into a unified, coordinate-aligned, and semantically tagged geometry schema. This ingestion stage serves as the foundational layer within the broader Automated Floor Plan Parsing & Vectorization architecture, where raw vector primitives are converted into routable, attribute-rich spatial datasets.

DWG files retain native CAD metadata, layer hierarchies, and block references, but lack standardized web coordinate systems. SVG files are inherently web-ready but frequently suffer from flattened transforms, inconsistent viewBox scaling, and loss of semantic layering during export. A production parser must implement format-specific extraction routines, apply rigorous coordinate space alignment, and output a consistent intermediate representation (typically GeoJSON or TopoJSON) ready for downstream graph generation.

DWG Entity Extraction & Block Resolution

DWG parsing requires direct interaction with the DXF/DWG entity tree. The ezdxf library reads DXF natively; binary DWG files must first be converted to DXF, for example through ezdxf's odafc add-on, which wraps the external ODA File Converter. Once a document is loaded, ezdxf provides a robust interface for traversing block definitions and extracting geometric primitives (LINE, LWPOLYLINE, ARC, CIRCLE, SPLINE). Production implementations must resolve nested block references, apply transformation matrices, and filter entities by layer or color index before vectorization. Detailed entity traversal patterns, including block instantiation and coordinate extraction, are documented in Parsing DWG files with Python ezdxf.

Key extraction considerations:

  • Unit Normalization: DWG stores units in drawing units (often millimeters or inches). Explicitly read $INSUNITS and convert to meters consistently.
  • Block Resolution: Walk the INSERT entities in modelspace, look up each referenced definition in doc.blocks, and apply the scale/rotation/translation matrix to the child geometry, recursing into nested INSERTs.
  • Polyline Simplification: Convert LWPOLYLINE and POLYLINE to closed shapely.Polygon objects where applicable, discarding open polylines unless designated as routing corridors.
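
The unit and block-resolution steps above reduce to a scale, a rotation, and a translation applied to every child vertex, followed by a unit conversion. The following is a minimal sketch in plain Python rather than ezdxf's own API: InsertRef and resolve_insert_points are hypothetical names, the unit table covers only a few common $INSUNITS codes, and the block is assumed to be an untilted 2D insert whose base point sits at the block origin.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

# Subset of $INSUNITS codes mapped to meters (1=inches, 2=feet, 4=mm, 5=cm, 6=m).
INSUNITS_TO_METERS = {1: 0.0254, 2: 0.3048, 4: 0.001, 5: 0.01, 6: 1.0}


@dataclass
class InsertRef:
    """Hypothetical stand-in for the attributes of a DXF INSERT entity."""
    insert: Point              # insertion point in drawing units
    xscale: float = 1.0
    yscale: float = 1.0
    rotation_deg: float = 0.0


def resolve_insert_points(block_pts: List[Point], ref: InsertRef, insunits: int) -> List[Point]:
    """Scale, rotate, then translate block-local points; return meters."""
    # Assumption: unknown or unitless codes are treated as meters.
    to_m = INSUNITS_TO_METERS.get(insunits, 1.0)
    cos_a = math.cos(math.radians(ref.rotation_deg))
    sin_a = math.sin(math.radians(ref.rotation_deg))
    out = []
    for x, y in block_pts:
        sx, sy = x * ref.xscale, y * ref.yscale          # scale in block space
        rx = sx * cos_a - sy * sin_a                      # rotate about the block origin
        ry = sx * sin_a + sy * cos_a
        out.append(((rx + ref.insert[0]) * to_m, (ry + ref.insert[1]) * to_m))
    return out


# A 900 mm door leaf defined in block space, inserted rotated by 90 degrees.
door = [(0.0, 0.0), (900.0, 0.0)]
ref = InsertRef(insert=(5000.0, 2000.0), rotation_deg=90.0)
placed = resolve_insert_points(door, ref, insunits=4)
print(placed)
```

For nested blocks the same function would be applied once per nesting level, innermost first. When working with ezdxf directly, Insert.virtual_entities() yields copies of the block content with the insert transform already applied, which may remove the need for hand-written matrix code.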

SVG Coordinate Normalization & Matrix Flattening

SVG parsing demands transform composition and path flattening. Unlike DWG, SVG coordinates are expressed in user units (typically pixels) with the Y-axis pointing down, the opposite of GIS convention. The viewBox attribute defines the root coordinate system, but nested <g> elements frequently apply cumulative transform="matrix(...)" operations that must be resolved before geometry extraction. Path data (the d attribute) must be tokenized into absolute coordinate sequences, after which the composed affine transform is applied.
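
To make the transform resolution concrete, the sketch below composes the transform attributes of an element's ancestors into a single matrix and applies it to a point. It is an illustration under stated assumptions, not a full SVG implementation: the helper names (multiply, parse_transform, flatten, apply_matrix) are hypothetical, and only matrix(), translate(), and scale() are parsed, so rotate() and skew functions would need to be added before real use.

```python
import re
from typing import List, Tuple

# SVG affine matrix (a, b, c, d, e, f) representing [[a, c, e], [b, d, f], [0, 0, 1]].
Matrix = Tuple[float, float, float, float, float, float]
IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)

_TRANSFORM_RE = re.compile(r"(matrix|translate|scale)\s*\(([^)]*)\)")


def multiply(m1: Matrix, m2: Matrix) -> Matrix:
    """Return m1 * m2: m2 is applied to a point first, then m1."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + c1 * b2,
        b1 * a2 + d1 * b2,
        a1 * c2 + c1 * d2,
        b1 * c2 + d1 * d2,
        a1 * e2 + c1 * f2 + e1,
        b1 * e2 + d1 * f2 + f1,
    )


def parse_transform(attr: str) -> Matrix:
    """Parse one transform attribute (matrix/translate/scale only) into a matrix."""
    result = IDENTITY
    for name, args in _TRANSFORM_RE.findall(attr or ""):
        v = [float(t) for t in args.replace(",", " ").split()]
        if name == "matrix":
            a, b, c, d, e, f = v
            m = (a, b, c, d, e, f)
        elif name == "translate":
            m = (1.0, 0.0, 0.0, 1.0, v[0], v[1] if len(v) > 1 else 0.0)
        else:  # scale, with sy defaulting to sx
            m = (v[0], 0.0, 0.0, v[1] if len(v) > 1 else v[0], 0.0, 0.0)
        # Functions in a list behave like nested groups: the leftmost is outermost.
        result = multiply(result, m)
    return result


def flatten(ancestor_transforms: List[str]) -> Matrix:
    """Compose transform attributes, listed outermost first, into one matrix."""
    ctm = IDENTITY
    for attr in ancestor_transforms:
        ctm = multiply(ctm, parse_transform(attr))
    return ctm


def apply_matrix(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


# Outer <g translate>, inner <g scale>, and the path's own matrix.
ctm = flatten(["translate(100, 50)", "scale(2)", "matrix(1 0 0 1 5 5)"])
print(apply_matrix(ctm, 1.0, 1.0))  # (112.0, 62.0)
```

Composing once and applying the result to every vertex avoids the rounding drift that comes from transforming coordinates level by level.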

For production-grade SVG ingestion, rely on lxml for DOM traversal and shapely for geometric validation. The W3C SVG 2 specification defines precise rules for transform matrix multiplication and coordinate space inheritance, which must be strictly followed to prevent geometric drift. Once flattened, extracted geometries are snapped to a common grid and mapped to indoor spatial classes, enabling seamless integration with Wall & Door Detection Algorithms for structural validation.

Unified Geometry Schema & Topology Preparation

After format-specific extraction, both DWG and SVG outputs converge into a unified schema. This stage enforces:

  1. Coordinate System Alignment: Translate all geometries to a local Cartesian grid (origin at bottom-left, Y-up). Apply a per-source scale factor so that CAD drawing units and SVG pixels both resolve to meters.
  2. Topology Validation: Snap vertices within a configurable tolerance (e.g., 0.05 m), remove self-intersections, and ensure polygon closure for room boundaries.
  3. Semantic Enrichment: Attach metadata extracted from layer names, SVG classes, or CAD extended entity data (XDATA). This structured tagging feeds directly into Attribute Mapping from Blueprints pipelines, ensuring downstream routing engines receive accurate accessibility, occupancy, and material attributes.
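
The alignment and snapping steps in the list above can be sketched in plain Python as follows. This is an illustrative simplification rather than the pipeline's actual code: align_to_local_grid and snap_ring are hypothetical names, and snapping to a fixed grid only approximates true tolerance-based snapping, since two vertices on either side of a grid cell boundary will not merge.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def align_to_local_grid(rings: List[List[Point]], scale: float, flip_y: bool) -> List[List[Point]]:
    """Scale to meters, optionally flip Y (SVG input), and move the bottom-left corner to (0, 0)."""
    scaled = [
        [(x * scale, (-y if flip_y else y) * scale) for x, y in ring]
        for ring in rings
    ]
    min_x = min(x for ring in scaled for x, _ in ring)
    min_y = min(y for ring in scaled for _, y in ring)
    return [[(x - min_x, y - min_y) for x, y in ring] for ring in scaled]


def snap_ring(ring: List[Point], tol: float = 0.05) -> List[Point]:
    """Snap vertices to a tol-sized grid, drop consecutive duplicates, and close the ring."""
    snapped: List[Point] = []
    for x, y in ring:
        p = (round(round(x / tol) * tol, 6), round(round(y / tol) * tol, 6))
        if not snapped or snapped[-1] != p:
            snapped.append(p)
    if snapped and snapped[0] != snapped[-1]:
        snapped.append(snapped[0])
    return snapped


# A slightly skewed 4 m x 3 m room exported from SVG at 1 px = 1 mm (assumed scale).
svg_room = [[(100.0, 100.0), (4100.0, 100.0), (4100.4, 3100.0), (100.0, 3099.7)]]
aligned = align_to_local_grid(svg_room, scale=0.001, flip_y=True)
room = snap_ring(aligned[0])
print(room)
```

The bounding box is computed across all rings of a floor so that every room shares the same origin; aligning rings one at a time would collapse them onto each other.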

The final output adheres to the GeoJSON specification, with FeatureCollection objects containing Polygon and LineString geometries, each carrying a properties dictionary with layer_type, source_format, area_sqm, and is_routable flags.
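
As an illustration of that target schema, the sketch below builds one Feature carrying the four named properties. The helper names (ring_area, room_feature) and the sample values are assumptions made for the example. Note also that the coordinates are local meters, whereas GeoJSON as specified in RFC 7946 assumes WGS 84 longitude and latitude, so consumers need to be told about the local grid.

```python
import json
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float]


def ring_area(ring: List[Point]) -> float:
    """Unsigned shoelace area of a closed ring, in squared coordinate units."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def room_feature(ring: List[Point], layer_type: str, source_format: str,
                 is_routable: bool) -> Dict[str, Any]:
    """Wrap a ring (in meters) as a GeoJSON Feature with the schema's properties."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # GeoJSON linear rings repeat the first position
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [[list(p) for p in ring]]},
        "properties": {
            "layer_type": layer_type,
            "source_format": source_format,
            "area_sqm": round(ring_area(ring), 3),
            "is_routable": is_routable,
        },
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        room_feature([(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)], "room", "dwg", True),
    ],
}
print(json.dumps(collection, indent=2))
```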

Production Implementation

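The following module sketches the ingestion pipeline end to end. It accepts DWG and SVG inputs, normalizes coordinates to meters, and writes validated GeoJSON. Several steps are deliberately simplified: block references are not expanded, the SVG branch reads only absolute M, L, and Z path commands, nested transform attributes are not yet composed (apply_svg_transform is provided for that step), and features carry only layer and type properties.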

import json
import logging
from pathlib import Path
from typing import List, Dict, Any, Tuple
from dataclasses import dataclass

import ezdxf
from shapely.geometry import Polygon, shape, mapping
from shapely.validation import make_valid
from lxml import etree
import numpy as np

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

@dataclass
class UnifiedGeometry:
    geom_type: str
    coordinates: List[Any]
    properties: Dict[str, Any]
    source_file: str

def dwg_unit_to_meters(dxf_units: int) -> float:
    """Map DXF unit codes to meters. Based on AutoCAD standard unit codes."""
    unit_map = {
        0: 1.0,   # Unitless
        1: 0.0254, # Inches
        2: 0.3048, # Feet
        3: 1609.344, # Miles
        4: 0.001, # Millimeters
        5: 0.01,  # Centimeters
        6: 1.0,   # Meters
        7: 1000.0, # Kilometers
    }
    return unit_map.get(dxf_units, 1.0)

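def parse_dwg_entities(filepath: str) -> List[UnifiedGeometry]:
    # ezdxf reads DXF natively. DWG input goes through the odafc add-on,
    # which needs the external ODA File Converter installed on the host.
    if filepath.lower().endswith('.dwg'):
        from ezdxf.addons import odafc  # optional dependency, imported lazily
        doc = odafc.readfile(filepath)
    else:
        doc = ezdxf.readfile(filepath)
    msp = doc.modelspace()
    unit_scale = dwg_unit_to_meters(doc.header.get('$INSUNITS', 0))

    geometries = []
    for entity in msp:
        if entity.dxftype() in ('LWPOLYLINE', 'POLYLINE'):
            # LWPOLYLINE exposes get_points(); POLYLINE exposes points().
            if entity.dxftype() == 'LWPOLYLINE':
                raw_points = entity.get_points('xy')
            else:
                raw_points = entity.points()
            points = [(p[0] * unit_scale, p[1] * unit_scale) for p in raw_points]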
            if entity.is_closed and len(points) >= 3:
                poly = Polygon(points)
                if poly.is_valid and poly.area > 0.1:
                    geometries.append(UnifiedGeometry(
                        geom_type="Polygon",
                        coordinates=list(poly.exterior.coords),
                        properties={"layer": entity.dxf.layer, "type": "room_or_corridor"},
                        source_file=filepath
                    ))
        elif entity.dxftype() == 'LINE':
            p1 = (entity.dxf.start.x * unit_scale, entity.dxf.start.y * unit_scale)
            p2 = (entity.dxf.end.x * unit_scale, entity.dxf.end.y * unit_scale)
            geometries.append(UnifiedGeometry(
                geom_type="LineString",
                coordinates=[p1, p2],
                properties={"layer": entity.dxf.layer, "type": "wall_or_edge"},
                source_file=filepath
            ))
    return geometries

def apply_svg_transform(points: List[Tuple[float, float]], matrix: np.ndarray) -> List[Tuple[float, float]]:
    """Apply 3x3 affine transform matrix to SVG coordinate points."""
    pts = np.array([(x, y, 1) for x, y in points])
    transformed = (matrix @ pts.T).T
    return [(x, y) for x, y in transformed[:, :2]]

def parse_svg_path(filepath: str, pixel_to_meter: float = 0.001) -> List[UnifiedGeometry]:
    parser = etree.XMLParser(resolve_entities=False)
    tree = etree.parse(filepath, parser)
    root = tree.getroot()
    
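    # Read the viewBox (values may be space- or comma-separated).
    vb = root.get('viewBox', '0 0 1000 1000').replace(',', ' ').split()
    vb_x, vb_y, vb_w, vb_h = map(float, vb)

    # NOTE: transform attributes on <path> and ancestor <g> elements are not
    # resolved here; compose them and call apply_svg_transform() first.
    geometries = []
    for path in root.iter('{http://www.w3.org/2000/svg}path'):
        d = path.get('d', '')
        if not d:
            continue

        # Simplified parser: one subpath of absolute M/L/Z commands only.
        # Skip anything else (curves, arcs, relative moves) rather than misread it.
        if d.count('M') != 1 or any(ch.isalpha() and ch not in 'MLZz' for ch in d):
            continue
        # Turn command letters and commas into separators so compact data such
        # as "M10,20L30,40Z" tokenizes, and split "10-20" into "10 -20".
        for sep in 'MLZz,':
            d = d.replace(sep, ' ')
        try:
            coords = [float(tok) for tok in d.replace('-', ' -').split()]
        except ValueError:
            continue  # number syntax this simplified tokenizer cannot split

        if len(coords) >= 4:
            raw_pts = [(coords[i], coords[i+1]) for i in range(0, len(coords)-1, 2)]
            # Shift to the viewBox origin, flip Y so it points up, scale to meters.
            transformed = [
                ((x - vb_x) * pixel_to_meter, (vb_y + vb_h - y) * pixel_to_meter)
                for x, y in raw_pts
            ]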
            
            if len(transformed) >= 3:
                poly = Polygon(transformed)
                if poly.is_valid and poly.area > 0.1:
                    geometries.append(UnifiedGeometry(
                        geom_type="Polygon",
                        coordinates=list(poly.exterior.coords),
                        properties={"layer": path.get('class', 'default'), "type": "svg_room"},
                        source_file=filepath
                    ))
    return geometries

def build_geojson(geometries: List[UnifiedGeometry]) -> Dict[str, Any]:
    features = []
    for g in geometries:
        if g.geom_type == "Polygon":
            raw_geom = {"type": "Polygon", "coordinates": [g.coordinates]}
        else:
            raw_geom = {"type": "LineString", "coordinates": g.coordinates}

        # Repair invalid geometries via shapely, then emit GeoJSON-compatible dict.
        repaired = make_valid(shape(raw_geom))
        features.append({
            "type": "Feature",
            "geometry": mapping(repaired),
            "properties": g.properties
        })
    return {"type": "FeatureCollection", "features": features}

if __name__ == "__main__":
    # Example pipeline execution
    input_files = ["floorplan.dwg", "floorplan_export.svg"]
    all_geoms = []
    
    for f in input_files:
        if not Path(f).exists(): continue
        if f.endswith('.dwg'):
            all_geoms.extend(parse_dwg_entities(f))
        elif f.endswith('.svg'):
            all_geoms.extend(parse_svg_path(f))
            
    output = build_geojson(all_geoms)
    Path("normalized_indoor_map.geojson").write_text(json.dumps(output, indent=2))
    logging.info(f"Exported {len(output['features'])} validated features to GeoJSON.")

Troubleshooting & Edge Case Handling

Symptom: Geometries appear mirrored or inverted
  • Root cause: SVG Y-axis inversion not applied, or DWG coordinate origin mismatch.
  • Resolution: Flip Y during SVG parsing (for example scale_y = -pixel_scale, or subtract y from the top of the viewBox). For DWG, verify $INSBASE and translate coordinates to (0,0) before scaling.

Symptom: Open polylines fail topology validation
  • Root cause: CAD drafters left room boundaries unclosed, or SVG paths lack a closing Z command.
  • Resolution: Snap endpoints that lie within a 0.05 m tolerance, then rebuild polygons with shapely.ops.polygonize(). Flag anything still open as is_routable=False for manual review.

Symptom: Transform matrix accumulation causes drift
  • Root cause: Nested <g> tags in SVG apply cumulative transforms.
  • Resolution: Multiply the matrices in order from the outermost ancestor to the element (for example with numpy.dot()) and apply the composed result once. Validate against the W3C SVG Coordinate Systems specification before flattening.

Symptom: Memory spikes during batch processing
  • Root cause: Large drawings load the entire entity tree into RAM.
  • Resolution: Stream entities with the ezdxf.addons.iterdxf module (DXF input only), or chunk SVG DOM traversal. Offload heavy geometry validation to worker processes via concurrent.futures.ProcessPoolExecutor.

Symptom: Duplicate vertices cause shapely TopologyException
  • Root cause: CAD export artifacts or SVG anti-aliasing sub-pixel offsets.
  • Resolution: Apply shapely.ops.transform(lambda x, y: (round(x, 3), round(y, 3)), geom) before validation to snap to millimeter precision.
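
The open-polyline case can be handled before any geometry library is involved. The following is a minimal sketch, assuming coordinates are already in meters; close_small_gap is a hypothetical helper, and it only covers the simple case where a single boundary's end vertex lands near its start vertex, not gaps between separate line segments.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def close_small_gap(points: List[Point], tol: float = 0.05) -> Tuple[List[Point], bool]:
    """Close a polyline whose end lies within tol of its start; report whether it closed."""
    # A closed ring needs at least three distinct vertices plus the closing one.
    if len(points) >= 4 and math.dist(points[0], points[-1]) <= tol:
        # Replace the near-miss end vertex with an exact copy of the start vertex.
        return points[:-1] + [points[0]], True
    return points, False


nearly_closed = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.02, 0.01)]
corridor = [(0.0, 0.0), (10.0, 0.0), (10.0, 2.0)]

ring, closed = close_small_gap(nearly_closed)
line, corridor_closed = close_small_gap(corridor)
print(closed, corridor_closed)  # True False
```

Polylines that come back with closed set to False are the ones to flag with is_routable=False for manual review.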

For high-throughput environments, integrate async batch processing pipelines to parallelize format extraction, topology validation, and graph serialization. Once normalized, the unified geometry feeds directly into routing engines, where real-time topology updates maintain graph consistency during facility modifications.