Segmenting images from a dataset

This notebook demonstrates how pretrained models can be used to perform semantic segmentation on a set of images and compute some statistics on the identified objects.

In the future, Streetscapes may include bespoke models trained on specific tasks, such as detecting materials in images and video.

Setup

Import all the packages that we will need below and set up some convenience variables.

In [2]:

# --------------------------------------
import warnings

warnings.filterwarnings("ignore")

# --------------------------------------
from PIL import Image

# --------------------------------------
import numpy as np

# --------------------------------------
import matplotlib.pyplot as plt

# --------------------------------------
import streetscapes as scs
from streetscapes.enums import Stat
from streetscapes.enums import Attr

Load or generate a subset of the streetscapes dataset.

In [3]:

# Define the criteria for creating the subset
criteria = {
    "city": "Amsterdam",  # Equivalent to "city": (operator.eq, "Amsterdam")
    "view_direction": "side",
    "lighting_condition": "day",
}

# Define the columns to keep in the subset
columns = ["uuid", "source", "orig_id", "lat", "lon"]

# Create or load the subset
amsterdam_side = scs.load_subset(
    "amsterdam_side",
    criteria=criteria,
    columns=columns,
    recreate=True,
    save=False,
)

Streetscapes | 2025-03-24@16:30:16 | Creating subset 'amsterdam_side'...
Streetscapes | 2025-03-24@16:30:16 | Done
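
The comment in the criteria dictionary suggests that a plain value is shorthand for an equality test, while an (operator, value) tuple spells out the comparison. The following is a minimal sketch of how such criteria could be evaluated against a single record. The matches helper and the sample records are hypothetical and are not part of the Streetscapes API.

```python
import operator


def matches(record: dict, criteria: dict) -> bool:
    """Return True if the record satisfies every criterion."""
    for key, condition in criteria.items():
        # A bare value is treated as shorthand for (operator.eq, value).
        if isinstance(condition, tuple):
            op, expected = condition
        else:
            op, expected = operator.eq, condition
        if key not in record or not op(record[key], expected):
            return False
    return True


# Hypothetical records, for illustration only.
records = [
    {"city": "Amsterdam", "view_direction": "side", "lat": 52.37},
    {"city": "Amsterdam", "view_direction": "front", "lat": 52.36},
    {"city": "Utrecht", "view_direction": "side", "lat": 52.09},
]

shorthand = {"city": "Amsterdam", "view_direction": "side"}
explicit = {"city": (operator.eq, "Amsterdam"), "lat": (operator.gt, 52.3)}

print([matches(r, shorthand) for r in records])  # [True, False, False]
print([matches(r, explicit) for r in records])  # [True, True, False]
```

The tuple form makes it possible to express comparisons other than equality (for example operator.gt) without changing the structure of the dictionary.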

We will first process one image (chosen at random from the dataset we loaded above) in order to understand how the pipeline works.

Note: please make sure that you download some images first (cf. download_city_images.ipynb).

In [36]:

sample_id = amsterdam_side.to_pandas().sample(1)["orig_id"].values[0]
sample_image = scs.conf.IMAGE_DIR / f"{sample_id}.jpeg"
img_data = np.array(Image.open(sample_image))
plt.imshow(img_data)

Out[36]:

<matplotlib.image.AxesImage at 0x749e51a85dc0>

We define some categories of objects that we would like to extract. The categories can be defined hierarchically, so that categories deeper in the hierarchy are 'subtracted' from their parents. For instance, this can be used to instruct the model to extract 'building' objects without 'window' or 'door' objects, which is the example used below.

In [37]:

labels = {
    "sky": None,
    "building": {
        "window": None,
        "door": None,
    },
    "tree": None,
    "car": None,
    "truck": None,
    "road": None,
}
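
To make the idea of 'subtracting' child categories concrete, here is a small sketch that operates on boolean masks. It assumes one boolean mask per label; the helpers child_labels and subtract_children are hypothetical and only illustrate the concept, not how the Streetscapes models implement it.

```python
import numpy as np


def child_labels(tree: dict) -> dict:
    """Map each label that has children to the list of its direct children."""
    mapping = {}
    for label, children in tree.items():
        if children:
            mapping[label] = list(children)
            mapping.update(child_labels(children))
    return mapping


def subtract_children(masks: dict, tree: dict) -> dict:
    """Remove the pixels of child labels from their parents' boolean masks."""
    result = {label: mask.copy() for label, mask in masks.items()}
    for parent, children in child_labels(tree).items():
        for child in children:
            if parent in result and child in masks:
                result[parent] &= ~masks[child]
    return result


labels = {"sky": None, "building": {"window": None, "door": None}}

# Toy 1x4 masks: the building covers three pixels, one of which is a window.
masks = {
    "building": np.array([[True, True, True, False]]),
    "window": np.array([[False, True, False, False]]),
}

clean = subtract_children(masks, labels)
print(clean["building"])  # [[ True False  True False]]
```

The input masks are copied rather than modified, so the original 'building' mask remains available if the unsubtracted area is needed later.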

Furthermore, we define the attributes and the types of statistics for those attributes that we would like to extract from the segmented images.

In [38]:

attrs = {Attr.H, Attr.Area}
stats = {Stat.Mean}
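
As a rough illustration of what such per-instance statistics might look like, the sketch below computes a mean hue and an area fraction for one instance in an integer mask. It assumes that Attr.H refers to the hue channel of the HSV colour space and that Attr.Area is the fraction of image pixels covered by the instance; both are assumptions based on the values shown later in this notebook, and instance_stats is a hypothetical helper, not the library's implementation.

```python
import colorsys

import numpy as np


def instance_stats(image: np.ndarray, mask: np.ndarray, instance_id: int):
    """Return (mean hue, area fraction) for one instance.

    image: H x W x 3 array of uint8 RGB values.
    mask:  H x W integer array holding one instance ID per pixel.
    """
    selected = mask == instance_id
    pixels = image[selected] / 255.0  # N x 3 array of floats in [0, 1]
    if len(pixels) == 0:
        return 0.0, 0.0
    hues = [colorsys.rgb_to_hsv(r, g, b)[0] for r, g, b in pixels]
    # Plain arithmetic mean; hue is circular, so this is only a rough summary.
    mean_hue = float(np.mean(hues))
    area = float(selected.sum() / mask.size)
    return mean_hue, area


# Toy 2x2 image: two pure-green pixels (instance 1), two pure-blue pixels (instance 2).
image = np.array(
    [[[0, 255, 0], [0, 255, 0]],
     [[0, 0, 255], [0, 0, 255]]],
    dtype=np.uint8,
)
mask = np.array([[1, 1], [2, 2]])

print(instance_stats(image, mask, 1))  # hue of green is 1/3, area is 0.5
print(instance_stats(image, mask, 2))  # hue of blue is 2/3, area is 0.5
```

The per-pixel loop keeps the sketch dependent on the standard library and NumPy only; for full-size images a vectorised colour conversion would be considerably faster.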

MaskFormer

First, we will use the MaskFormer model for segmentation. This model is built on the pretrained Mask2FormerForUniversalSegmentation model, which is an implementation of the Mask2Former model from FAIR.

In [39]:

mf_model = scs.models.MaskFormer()

Segment the image and extract some metadata (object instances, colour statistics, latitude and longitude, and so on).

In [40]:

mf_images, mf_masks, mf_instances = mf_model.segment(sample_image, labels)

Streetscapes | 2025-03-24@16:54:18 | Segmenting images...
Streetscapes | 2025-03-24@16:54:18 | Detecting objects...
Streetscapes | 2025-03-24@16:54:22 | [ 173710811305308.jpeg ] Extracted 1 instances for 9 labels.

Extract the statistics for the segmented image.

In [41]:

mf_stats = mf_model.extract_stats(
    mf_images,
    mf_masks,
    mf_instances,
    attrs=attrs,
    stats=stats,
)

Streetscapes | 2025-03-24@16:54:22 | Extracting metadata...

100%|██████████| 18/18 [00:00<00:00, 101.45it/s]

Show the extracted instance IDs with their labels.

In [42]:

mf_instances

Out[42]:

{173710811305308: {1: 'person',
  2: 'person',
  3: 'building',
  4: 'person',
  5: 'person',
  6: 'sky',
  7: 'pedestrian-area',
  8: 'person',
  9: 'pole',
  10: 'manhole',
  11: 'person',
  12: 'person',
  13: 'person',
  14: 'sidewalk',
  15: 'car',
  16: 'person',
  17: 'billboard',
  18: 'car'}}

Visualisation

Select a few labels of interest. Only these labels will be highlighted in the visualisation.

In [43]:

highlight = {"building", "sky", "road", "window"}

Select one of the segmented images.

In [44]:

mf_orig_id, mf_image = next(iter(mf_images.items()))

Visualise the image and the highlighted instances.

In [45]:

(fig, ax) = mf_model.visualise_segmentation(
    mf_image,
    mf_masks[mf_orig_id],
    mf_instances[mf_orig_id],
    highlight,
    title=sample_image.name,
)
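
For readers curious about how highlighting by label can work on an integer instance mask, here is a minimal sketch. It selects the pixels whose instance label is in the highlight set and dims everything else. The helpers highlight_mask and dim_background are hypothetical; the actual visualise_segmentation method may render the result differently.

```python
import numpy as np


def highlight_mask(mask: np.ndarray, instances: dict, highlight: set) -> np.ndarray:
    """Boolean mask of the pixels whose instance label is in `highlight`."""
    wanted_ids = [i for i, label in instances.items() if label in highlight]
    return np.isin(mask, wanted_ids)


def dim_background(image: np.ndarray, selected: np.ndarray, factor: float = 0.5) -> np.ndarray:
    """Darken every pixel outside `selected`, leaving highlighted pixels untouched."""
    out = image.astype(float)
    out[~selected] *= factor
    return out.astype(np.uint8)


# Toy data in the same shape as the notebook's instance dictionary and mask.
instances = {1: "person", 3: "building", 6: "sky"}
mask = np.array([[6, 6], [3, 1]])
image = np.full((2, 2, 3), 200, dtype=np.uint8)

selected = highlight_mask(mask, instances, {"building", "sky", "road", "window"})
print(selected)
print(dim_background(image, selected)[1, 1])  # the 'person' pixel is dimmed
```

Labels in the highlight set that do not occur in the image (here 'road' and 'window') are simply ignored.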

Process multiple entries from the dataset at once

The segmentation model also provides a convenience method for processing several images from a subset of the streetscapes dataset at once. We use the sample argument to draw a few images from the dataset at random. Those images are segmented, and the corresponding metadata (colour statistics for the detected instances, instance masks, the latitude and longitude of the place where the image was taken, and so forth) is stored in a separate Parquet file for each image. Each file has the same name as the image (with the extension .parquet) and is placed in the default directory for Parquet files. This allows us to segment a large dataset once and load the results later.

In [46]:

mf_image_paths, mf_mask_paths, mf_stat_paths = mf_model.segment_from_dataset(
    amsterdam_side,
    labels,
    sample=3,
    attrs=attrs,
    stats=stats,
)

  0%|          | 0/1 [00:00<?, ?it/s]

Streetscapes | 2025-03-24@16:54:24 | Segmenting images...
Streetscapes | 2025-03-24@16:54:24 | Detecting objects...
Streetscapes | 2025-03-24@16:54:33 | [ 271687091342629.jpeg ] Extracted 1 instances for 17 labels.
Streetscapes | 2025-03-24@16:54:33 | [ 847773372476766.jpeg ] Extracted 2 instances for 7 labels.
Streetscapes | 2025-03-24@16:54:33 | [ 425357539392385.jpeg ] Extracted 3 instances for 13 labels.
Streetscapes | 2025-03-24@16:54:33 | Extracting metadata...

100%|██████████| 35/35 [00:00<00:00, 97.36it/s] 
100%|██████████| 9/9 [00:00<00:00, 75.02it/s]
100%|██████████| 15/15 [00:00<00:00, 92.54it/s] 
100%|██████████| 1/1 [00:11<00:00, 11.88s/it]
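
The 'segment once, load later' workflow relies on being able to find the Parquet file that belongs to an image. The sketch below shows one way to derive such a path and to list the images that still need processing. The directory names and the helpers stats_path_for and pending are hypothetical; the library resolves its own default directories.

```python
from pathlib import Path


def stats_path_for(image_path: Path, parquet_dir: Path) -> Path:
    """Return the Parquet path that shares the image's file stem."""
    return parquet_dir / image_path.with_suffix(".parquet").name


def pending(image_paths, parquet_dir: Path) -> list:
    """Images that do not yet have a Parquet file and still need segmenting."""
    return [p for p in image_paths if not stats_path_for(p, parquet_dir).exists()]


# Hypothetical directories, for illustration only.
image_dir = Path("data/images")
parquet_dir = Path("data/parquet")

image_path = image_dir / "271687091342629.jpeg"
print(stats_path_for(image_path, parquet_dir).as_posix())
# data/parquet/271687091342629.parquet
```

A check like pending() makes it cheap to rerun a notebook on a large dataset, because only images without stored results are passed to the model again.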

Load the segmentation masks and the statistics from the respective files on disk.

In [47]:

mf_masks = mf_model.load_masks(mf_mask_paths)
mf_masks

Out[47]:

{271687091342629: array([[ 3,  3,  3, ..., 11, 11, 11],
        [ 3,  3,  3, ..., 11, 11, 11],
        [ 3,  3,  3, ..., 11, 11, 11],
        ...,
        [33, 33, 33, ..., 33, 33, 33],
        [33, 33, 33, ..., 33, 33, 33],
        [33, 33, 33, ..., 33, 33, 33]], shape=(1536, 2048), dtype=int32),
 847773372476766: array([[1, 1, 1, ..., 1, 1, 1],
        [1, 1, 1, ..., 1, 1, 1],
        [1, 1, 1, ..., 1, 1, 1],
        ...,
        [3, 3, 3, ..., 1, 1, 1],
        [3, 3, 3, ..., 1, 1, 1],
        [3, 3, 3, ..., 1, 1, 1]], shape=(1536, 2048), dtype=int32),
 425357539392385: array([[14, 14, 14, ...,  4,  4,  4],
        [14, 14, 14, ...,  4,  4,  4],
        [14, 14, 14, ...,  4,  4,  4],
        ...,
        [15, 15, 15, ..., 11, 11, 11],
        [15, 15, 15, ..., 11, 11, 11],
        [15, 15, 15, ..., 11, 11, 11]], shape=(1536, 2048), dtype=int32)}

In [48]:

mf_stats = mf_model.load_stats(mf_stat_paths)
mf_stats

Out[48]:

{271687091342629: {'instance': [1,
   2,
   3,
   4,
   5,
   6,
   7,
   8,
   9,
   10,
   11,
   12,
   13,
   14,
   15,
   16,
   17,
   18,
   19,
   20,
   21,
   22,
   23,
   24,
   25,
   26,
   27,
   28,
   29,
   30,
   31,
   32,
   33,
   34,
   35],
  'label': ['car',
   'billboard',
   'building',
   'fence',
   'car',
   'traffic-sign-front',
   'pole',
   'car',
   'person',
   'curb',
   'sky',
   'pole',
   'rail-track',
   'person',
   'car',
   'billboard',
   'billboard',
   'person',
   'lane-marking-general',
   'billboard',
   'sidewalk',
   'pole',
   'pole',
   'curb-cut',
   'billboard',
   'person',
   'billboard',
   'terrain',
   'billboard',
   'bike-lane',
   'billboard',
   'vegetation',
   'road',
   'traffic-sign-front',
   'car'],
  <Attr.H: 'h'>: {<Stat.Mean: 'mean'>: [0.2823790023750773,
    0.5912259465282802,
    0.13504297116381395,
    0.3525755914359096,
    0.511013961752236,
    0.7773113699067796,
    0.22844708819299545,
    0.40711473847008145,
    0.5218501405541671,
    0.10621882951952231,
    0.5638665736432154,
    0.15756551394378024,
    0.12188134405403411,
    0.36784870943734616,
    0.31897104448485847,
    0.35976192594477463,
    0.2138299841675105,
    0.3173155027384608,
    0.10761744871881532,
    0.7116212523547456,
    0.13389144345684345,
    0.32318192209742364,
    0.3947622400231295,
    0.0971824490235465,
    0.391136501738752,
    0.2772823411274336,
    0.33531114276895685,
    0.17352801001745893,
    0.4694826773121544,
    0.05938501998751345,
    0.6332033093294557,
    0.26958763078692716,
    0.1098876774345231,
    0.6125185627479978,
    0.3954640427416042]},
  <Attr.Area: 'area'>: [0.0006605784098307291,
   0.0004730224609375,
   0.36186567942301434,
   0.014179229736328123,
   0.0006882349650065104,
   0.0010506312052408857,
   0.0004660288492838542,
   0.0017220179239908857,
   0.0026524861653645835,
   0.005263010660807292,
   0.07838217417399089,
   0.0003134409586588542,
   0.0861819585164388,
   0.0013303756713867188,
   0.0023330052693684897,
   0.0021807352701822915,
   0.0056031545003255205,
   0.0011650721232096357,
   0.023432095845540363,
   0.0017754236857096357,
   0.05001354217529297,
   0.0008424123128255209,
   0.001827557881673177,
   0.0018596649169921875,
   0.004569053649902344,
   0.0025571187337239585,
   0.0016237894694010415,
   0.0023142496744791665,
   0.006777127583821614,
   0.0068670908610026045,
   0.002688725789388021,
   0.008017857869466146,
   0.30573590596516925,
   0.001254717508951823,
   0.008481025695800781],
  <Attr.Coords: 'coords'>: [52.36664146208025, 4.879517555236816]},
 847773372476766: {'instance': [1, 2, 3, 4, 5, 6, 7, 8, 9],
  'label': ['building',
   'sky',
   'sidewalk',
   'bicycle',
   'bicycle',
   'bicycle',
   'vegetation',
   'person',
   'bike-rack'],
  <Attr.H: 'h'>: {<Stat.Mean: 'mean'>: [0.3939870418125802,
    0.003909500022581415,
    0.2038093385825278,
    0.41610579618865073,
    0.2984978996424802,
    0.5044464940022992,
    0.1546436022441162,
    0.5360679329596698,
    0.4480237391550743]},
  <Attr.Area: 'area'>: [0.7558644612630209,
   0.0416256586710612,
   0.14755121866861978,
   0.0015691121419270833,
   0.006850878397623698,
   0.00924396514892578,
   0.0055999755859375,
   0.015059153238932293,
   0.012799580891927084],
  <Attr.Coords: 'coords'>: [52.37057194443602, 4.899693131446838]},
 425357539392385: {'instance': [1,
   2,
   3,
   4,
   5,
   6,
   7,
   8,
   9,
   10,
   11,
   12,
   13,
   14,
   15],
  'label': ['building',
   'pole',
   'car',
   'sky',
   'parking',
   'car',
   'car',
   'trash-can',
   'bicycle',
   'lane-marking-general',
   'motorcycle',
   'street-light',
   'traffic-sign-front',
   'vegetation',
   'road'],
  <Attr.H: 'h'>: {<Stat.Mean: 'mean'>: [0.5921773524930944,
    0.6327191999142582,
    0.7990763976404834,
    0.5765254011317875,
    0.44804747739674394,
    0.6421426069626092,
    0.6187918256926949,
    0.6028228274924631,
    0.5382034164927705,
    0.46371110511759134,
    0.6046834721929005,
    0.6135545846320734,
    0.5901979022884541,
    0.5871314561421965,
    0.25113390941734787]},
  <Attr.Area: 'area'>: [0.09644381205240886,
   0.0015160242716471357,
   0.014342308044433594,
   0.20641040802001953,
   0.04845619201660156,
   0.08424981435139973,
   0.02399444580078125,
   0.0017824172973632812,
   0.006670951843261719,
   0.007302284240722656,
   0.0447533925374349,
   0.0012861887613932292,
   0.000415802001953125,
   0.2875099182128906,
   0.1719366709391276],
  <Attr.Coords: 'coords'>: [52.36905875013216, 4.884721040725708]}}

Pick an image from the sample list. We are going to visualise its segmentation.

In [49]:

mf_orig_id, mf_image = mf_model.load_image(mf_image_paths[0])
list(zip(mf_stats[mf_orig_id]["label"], mf_stats[mf_orig_id][Attr.H][Stat.Mean]))

Out[49]:

[('car', 0.2823790023750773),
 ('billboard', 0.5912259465282802),
 ('building', 0.13504297116381395),
 ('fence', 0.3525755914359096),
 ('car', 0.511013961752236),
 ('traffic-sign-front', 0.7773113699067796),
 ('pole', 0.22844708819299545),
 ('car', 0.40711473847008145),
 ('person', 0.5218501405541671),
 ('curb', 0.10621882951952231),
 ('sky', 0.5638665736432154),
 ('pole', 0.15756551394378024),
 ('rail-track', 0.12188134405403411),
 ('person', 0.36784870943734616),
 ('car', 0.31897104448485847),
 ('billboard', 0.35976192594477463),
 ('billboard', 0.2138299841675105),
 ('person', 0.3173155027384608),
 ('lane-marking-general', 0.10761744871881532),
 ('billboard', 0.7116212523547456),
 ('sidewalk', 0.13389144345684345),
 ('pole', 0.32318192209742364),
 ('pole', 0.3947622400231295),
 ('curb-cut', 0.0971824490235465),
 ('billboard', 0.391136501738752),
 ('person', 0.2772823411274336),
 ('billboard', 0.33531114276895685),
 ('terrain', 0.17352801001745893),
 ('billboard', 0.4694826773121544),
 ('bike-lane', 0.05938501998751345),
 ('billboard', 0.6332033093294557),
 ('vegetation', 0.26958763078692716),
 ('road', 0.1098876774345231),
 ('traffic-sign-front', 0.6125185627479978),
 ('car', 0.3954640427416042)]
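
Since a label such as 'car' or 'billboard' can occur many times in one image, it is often useful to aggregate a statistic per label. The following sketch groups per-instance values by label and averages them; mean_by_label is a hypothetical helper, and the sample pairs are rounded values in the shape of the output above.

```python
from collections import defaultdict


def mean_by_label(labels, values) -> dict:
    """Average a per-instance statistic over all instances that share a label."""
    grouped = defaultdict(list)
    for label, value in zip(labels, values):
        grouped[label].append(value)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}


# A few (label, mean hue) pairs, rounded for readability.
pairs = [("car", 0.28), ("billboard", 0.59), ("car", 0.51), ("sky", 0.56)]
labels_seen, hues = zip(*pairs)

print(mean_by_label(labels_seen, hues))
```

With the loaded statistics, the same helper could be called as mean_by_label(mf_stats[mf_orig_id]["label"], mf_stats[mf_orig_id][Attr.H][Stat.Mean]).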

The statistics are stored in Parquet files with the same name and path as the image, with the extension .stat.parquet. They can be loaded back into a dictionary with the model.load_stats() method. Keep in mind that although we use Attr and Stat members (cf. streetscapes.enums) as dictionary keys, the members can also be looked up by name (e.g., Attr['Area'] is equivalent to Attr.Area).
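
The lookup by name is standard behaviour of Python enumerations. The sketch below uses stand-in enums that mirror the member names and values visible in the output above; the real definitions live in streetscapes.enums and may differ in detail.

```python
from enum import Enum


# Stand-ins for illustration; not the library's actual definitions.
class Attr(Enum):
    H = "h"
    Area = "area"
    Coords = "coords"


class Stat(Enum):
    Mean = "mean"


stats_example = {Attr.H: {Stat.Mean: [0.28, 0.59]}, Attr.Area: [0.1, 0.2]}

# Lookup by member name (square brackets) and by value (call syntax)
# both return the same member, so either can be used to build a key.
assert Attr["Area"] is Attr.Area
assert Attr("area") is Attr.Area

print(stats_example[Attr["H"]][Stat["Mean"]])  # [0.28, 0.59]
```

Looking members up by name is convenient when attribute names arrive as strings, for example from a configuration file.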

We will visualise the segmentation with the loaded metadata to ensure that it works.

In [50]:

# Rebuild the instance-to-label mapping for the selected image from the loaded statistics
mf_instances = {
    inst_id: label
    for inst_id, label in zip(
        mf_stats[mf_orig_id]["instance"], mf_stats[mf_orig_id]["label"]
    )
}

In [51]:

(fig, ax) = mf_model.visualise_segmentation(
    mf_image,
    mf_masks[mf_orig_id],
    mf_instances,
    highlight,
    title=f"Segmentation for {mf_orig_id} | MaskFormer | from dataset",
)

DinoSAM

Next, we will apply the same pipeline to another model: DinoSAM. The implementation of DinoSAM was heavily inspired by two existing projects: Language Segment-Anything (LangSAM) and SamGeo.

In a nutshell, DinoSAM uses GroundingDINO for object detection and labelling and SAM2 for semantic segmentation. The result is a model that can segment images into instances of objects requested via a textual prompt.

We will use the same dataset as for the MaskFormer model.

In [65]:

ds_model = scs.models.DinoSAM()

In [66]:

ds_images, ds_masks, ds_instances = ds_model.segment(sample_image, labels)

Streetscapes | 2025-03-24@17:01:12 | [ 173710811305308 ] Detecting objects...
Streetscapes | 2025-03-24@17:01:20 | [ 173710811305308 ] Performing segmentation...
Streetscapes | 2025-03-24@17:01:35 | [ 173710811305308 ] Removing overlaps...
Streetscapes | 2025-03-24@17:01:35 | [ 173710811305308 ] Extracted 101 instances for 6 labels.
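
The log above lists a 'Removing overlaps' stage after detection and segmentation. One plausible way to turn a set of possibly overlapping boolean masks into a single integer instance map is sketched below. The rule that smaller masks take priority is an assumption of this sketch, and merge_masks is a hypothetical helper rather than the DinoSAM implementation.

```python
import numpy as np


def merge_masks(masks: list) -> np.ndarray:
    """Combine boolean instance masks into one integer map (0 = background).

    Instance IDs are the 1-based positions in `masks`. Where masks overlap,
    the smaller mask wins, so that e.g. a window is not swallowed by the
    building behind it.
    """
    merged = np.zeros(masks[0].shape, dtype=np.uint32)
    # Paint from largest to smallest so smaller instances overwrite larger ones.
    order = sorted(range(len(masks)), key=lambda i: masks[i].sum(), reverse=True)
    for i in order:
        merged[masks[i]] = i + 1
    return merged


building = np.array([[True, True, True], [True, True, True]])
window = np.array([[False, True, False], [False, False, False]])

print(merge_masks([building, window]))
# [[1 2 1]
#  [1 1 1]]
```

After merging, every pixel belongs to at most one instance, which is what makes per-instance area fractions add up to at most one.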

Extract the statistics for the sample image, as in the case of the MaskFormer model.

In [54]:

ds_image_stats = ds_model.extract_stats(
    ds_images,
    ds_masks,
    ds_instances,
    attrs=attrs,
    stats=stats,
)

Streetscapes | 2025-03-24@16:55:15 | Extracting metadata...

100%|██████████| 101/101 [00:00<00:00, 129.90it/s]

Show the instances and statistics.

In [4]:

ds_image_stats

Visualise the segmentation. DinoSAM is more detailed, and it can identify objects such as windows as separate instances.

In [57]:

ds_orig_id, ds_image = next(iter(ds_images.items()))

In [58]:

(fig, ax) = ds_model.visualise_segmentation(
    ds_image,
    ds_masks[ds_orig_id],
    ds_instances[ds_orig_id],
    highlight,
    title=sample_image.name,
)

We can also segment a subset of our dataset with DinoSAM, with the same API as the MaskFormer model.

In [59]:

ds_image_paths, ds_mask_paths, ds_stat_paths = ds_model.segment_from_dataset(
    amsterdam_side, labels, sample=3, attrs=attrs, stats=stats
)

  0%|          | 0/1 [00:00<?, ?it/s]

Streetscapes | 2025-03-24@16:55:20 | [ 345472247473918 ] Detecting objects...

Unused or unrecognized kwargs: device.

Streetscapes | 2025-03-24@16:55:32 | [ 345472247473918 ] Performing segmentation...
Streetscapes | 2025-03-24@16:55:47 | [ 345472247473918 ] Removing overlaps...
Streetscapes | 2025-03-24@16:55:47 | [ 345472247473918 ] Extracted 30 instances for 6 labels.
Streetscapes | 2025-03-24@16:55:47 | [ 202880278139360 ] Detecting objects...

Unused or unrecognized kwargs: device.

Streetscapes | 2025-03-24@16:55:59 | [ 202880278139360 ] Performing segmentation...
Streetscapes | 2025-03-24@16:56:13 | [ 202880278139360 ] Removing overlaps...
Streetscapes | 2025-03-24@16:56:13 | [ 202880278139360 ] Extracted 46 instances for 6 labels.
Streetscapes | 2025-03-24@16:56:13 | [ 1391290131237978 ] Detecting objects...

Unused or unrecognized kwargs: device.

Streetscapes | 2025-03-24@16:56:23 | [ 1391290131237978 ] Performing segmentation...
Streetscapes | 2025-03-24@16:56:34 | [ 1391290131237978 ] Removing overlaps...
Streetscapes | 2025-03-24@16:56:34 | [ 1391290131237978 ] Extracted 27 instances for 6 labels.
Streetscapes | 2025-03-24@16:56:34 | Extracting metadata...

100%|██████████| 30/30 [00:00<00:00, 111.86it/s]
100%|██████████| 46/46 [00:00<00:00, 121.11it/s]
100%|██████████| 27/27 [00:00<00:00, 123.11it/s]
100%|██████████| 1/1 [01:17<00:00, 77.40s/it]

Load the statistics and the masks.

In [60]:

ds_stats = ds_model.load_stats(ds_stat_paths)
ds_stats

Out[60]:

{345472247473918: {'instance': [1,
   2,
   3,
   4,
   5,
   6,
   7,
   8,
   9,
   10,
   11,
   12,
   13,
   14,
   15,
   16,
   17,
   18,
   19,
   20,
   21,
   22,
   23,
   24,
   25,
   26,
   27,
   28,
   29,
   30],
  'label': ['car',
   'tree',
   'car',
   'window',
   'window',
   'window',
   'window',
   'window',
   'tree',
   'window',
   'window',
   'window',
   'window',
   'window',
   'window',
   'window',
   'building',
   'window',
   'building',
   'tree',
   'road',
   'window',
   'window',
   'sky',
   'window',
   'road',
   'building',
   'window',
   'window',
   'window'],
  <Attr.H: 'h'>: {<Stat.Mean: 'mean'>: [0.5440265436563434,
    0.299645584356783,
    0.6933892035551608,
    0.3216061790695245,
    0.45917576705818813,
    0.4383226569942364,
    0.3940039199784987,
    0.49330337847989303,
    0.22419251082028535,
    0.3286475221083643,
    0.2978296178806794,
    0.45916312416370464,
    0.2547333297705487,
    0.13604200535624875,
    0.5006509116604714,
    0.5644015010774754,
    0.18430284590763377,
    0.4738245457069165,
    0.266670274045325,
    0.2838462872187538,
    0.11280132932006535,
    0.2235651260826933,
    0.29292402298608067,
    0.01864432024140357,
    0.24586623060204277,
    0.6318614224621412,
    0.17667520087581332,
    0.4536816382073086,
    0.4661024691602208,
    0.10265648123036657]},
  <Attr.Area: 'area'>: [0.015429178873697916,
   0.0927441914876302,
   0.0008525848388671875,
   0.007111549377441406,
   0.0018418629964192708,
   0.008541107177734375,
   0.0020615259806315103,
   0.0019483566284179688,
   0.035114288330078125,
   0.0019426345825195312,
   0.0005467732747395834,
   3.1789143880208336e-05,
   0.000537872314453125,
   0.0004113515218098958,
   0.0004138946533203125,
   0.0018266042073567708,
   0.024576187133789062,
   0.0011396408081054688,
   0.015427907307942707,
   0.10629653930664062,
   0.15296808878580728,
   0.0001713434855143229,
   0.0006764729817708334,
   0.05753644307454427,
   0.00048732757568359375,
   0.1777181625366211,
   0.08738422393798828,
   0.005718549092610677,
   0.007917086283365885,
   0.002731959025065104],
  <Attr.Coords: 'coords'>: [52.36734569921879, 4.879598021507263]},
 202880278139360: {'instance': [1,
   2,
   3,
   4,
   5,
   6,
   7,
   8,
   9,
   10,
   11,
   12,
   13,
   14,
   15,
   16,
   17,
   18,
   19,
   20,
   21,
   22,
   23,
   24,
   25,
   26,
   27,
   28,
   29,
   30,
   31,
   32,
   33,
   34,
   35,
   36,
   37,
   38,
   39,
   40,
   41,
   42,
   43,
   44,
   45,
   46],
  'label': ['sky',
   'road',
   'window',
   'window',
   'tree',
   'window',
   'window',
   'window',
   'window',
   'window',
   'window',
   'window',
   'window',
   'building',
   'window',
   'window',
   'window',
   'building',
   'tree',
   'window',
   'window',
   'door',
   'window',
   'window',
   'window',
   'window',
   'window',
   'tree',
   'window',
   'window',
   'tree',
   'window',
   'window',
   'window',
   'tree',
   'window',
   'window',
   'window',
   'window',
   'window',
   'window',
   'tree',
   'building',
   'window',
   'door',
   'tree'],
  <Attr.H: 'h'>: {<Stat.Mean: 'mean'>: [0.5526069702912184,
    0.311671050186608,
    0.580407792922542,
    0.4971119124914875,
    0.5307695378586181,
    0.5752876226240844,
    0.4552306211478809,
    0.542796334331158,
    0.5195239590737165,
    0.0,
    0.34183764364459124,
    0.5986441291303233,
    0.6053183779802584,
    0.2941016355076827,
    0.34980247835192996,
    0.48450223964342615,
    0.5711933516158374,
    0.6035297700770833,
    0.5596746541860416,
    0.5814197053234283,
    0.5804810557041687,
    0.16683223162535382,
    0.3849372527869407,
    0.5436016602109246,
    0.16666666666666666,
    0.35950820505191494,
    0.34901598032570524,
    0.0,
    0.5661478729709389,
    0.5288650706356013,
    0.5501506815700138,
    0.44693560794641624,
    0.39539702972785384,
    0.29801401125531507,
    0.4462246570297521,
    0.5732057666381131,
    0.5283421917025508,
    0.2135333476814539,
    0.4772548349140121,
    0.40239729267411506,
    0.5397642886334371,
    0.43400389536290984,
    0.1701228558060301,
    0.38166533363075056,
    0.2796872820423074,
    0.57394067823276]},
  <Attr.Area: 'area'>: [0.3571656545003255,
   0.21267318725585935,
   0.0013306935628255208,
   0.0013227462768554688,
   0.0006421407063802084,
   0.00128936767578125,
   0.0008090337117513021,
   0.0006001790364583334,
   0.0009438196818033854,
   0.0,
   0.003550847371419271,
   0.0014508565266927085,
   0.0014524459838867188,
   0.275882085164388,
   0.0008904139200846354,
   0.0009558995564778646,
   0.0009969075520833333,
   0.0034049352010091147,
   0.011565208435058594,
   0.0004266103108723958,
   0.000823974609375,
   0.010253270467122396,
   0.0005887349446614584,
   0.0008217493693033854,
   2.86102294921875e-06,
   0.000759124755859375,
   0.0005855560302734375,
   0.0,
   0.0004285176595052083,
   0.0007155736287434896,
   7.025400797526042e-05,
   0.0007829666137695312,
   0.0008306503295898438,
   0.00167083740234375,
   0.0020704269409179688,
   0.00015576680501302084,
   0.00042947133382161457,
   0.0004965464274088541,
   0.0005782445271809896,
   0.00045744578043619793,
   0.0004857381184895833,
   0.013928731282552084,
   0.0073757171630859375,
   0.00043900807698567707,
   0.018599510192871097,
   0.007810910542805989],
  <Attr.Coords: 'coords'>: [52.363722852688966, 4.882736206054688]},
 1391290131237978: {'instance': [1,
   2,
   3,
   4,
   5,
   6,
   7,
   8,
   9,
   10,
   11,
   12,
   13,
   14,
   15,
   16,
   17,
   18,
   19,
   20,
   21,
   22,
   23,
   24,
   25,
   26,
   27],
  'label': ['car',
   'window',
   'window',
   'car',
   'car',
   'car',
   'window',
   'window',
   'car',
   'window',
   'car',
   'window',
   'window',
   'tree',
   'window',
   'tree',
   'building',
   'window',
   'door',
   'door',
   'tree',
   'car',
   'window',
   'truck',
   'window',
   'window',
   'tree'],
  <Attr.H: 'h'>: {<Stat.Mean: 'mean'>: [0.2023538761421148,
    0.15777592802591112,
    0.1815622771033444,
    0.13333333333333333,
    0.28968633404765526,
    0.10979828694163891,
    0.29526556405790155,
    0.20475289106667424,
    0.1947874983971176,
    0.18738878518599072,
    0.2571044372546789,
    0.203185216563746,
    0.0,
    0.1473711840243764,
    0.12852492371416965,
    0.1477539315534903,
    0.12732246527111116,
    0.12535885720354456,
    0.2004476977742619,
    0.2239433904289841,
    0.1343011847500694,
    0.269779158916826,
    0.11194459641036893,
    0.2632897813733734,
    0.13241819594615076,
    0.13062906011662348,
    0.16246990627013605]},
  <Attr.Area: 'area'>: [0.015926678975423176,
   0.007676124572753906,
   0.008685111999511719,
   3.178914388020833e-07,
   0.0005029042561848959,
   9.791056315104168e-05,
   0.0076618194580078125,
   0.0016622543334960938,
   0.00027434031168619793,
   0.002018610636393229,
   0.0023361841837565103,
   0.001953125,
   0.0,
   0.016866683959960938,
   0.0012734731038411458,
   0.0019954045613606772,
   0.15228589375813803,
   0.0024350484212239585,
   1.239776611328125e-05,
   0.004676500956217448,
   0.002945582071940104,
   0.02462482452392578,
   0.0006472269694010416,
   0.04558086395263672,
   0.0058752695719401045,
   0.0010484059651692708,
   0.08102067311604817],
  <Attr.Coords: 'coords'>: [52.3655736206299, 4.899687767028809]}}

In [61]:

ds_masks = ds_model.load_masks(ds_mask_paths)
ds_masks

Out[61]:

{345472247473918: array([[20, 20, 20, ..., 24, 24, 24],
        [20, 20, 20, ..., 24, 24, 24],
        [20, 20, 20, ..., 24, 24, 24],
        ...,
        [ 0,  0,  0, ...,  0,  0,  0],
        [ 0,  0,  0, ...,  0,  0,  0],
        [ 0,  0,  0, ...,  0,  0,  0]], shape=(1536, 2048), dtype=uint32),
 202880278139360: array([[1, 1, 1, ..., 1, 1, 1],
        [1, 1, 1, ..., 1, 1, 1],
        [1, 1, 1, ..., 1, 1, 1],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]], shape=(1536, 2048), dtype=uint32),
 1391290131237978: array([[17, 17, 17, ..., 27, 27, 27],
        [17, 17, 17, ..., 27, 27, 27],
        [17, 17, 17, ..., 27, 27, 27],
        ...,
        [ 0,  0,  0, ...,  0,  0,  0],
        [ 0,  0,  0, ...,  0,  0,  0],
        [ 0,  0,  0, ...,  0,  0,  0]], shape=(1536, 2048), dtype=uint32)}

Pick an image from the sampled pool.

In [62]:

ds_orig_id, ds_image = ds_model.load_image(ds_image_paths[0])
list(zip(ds_stats[ds_orig_id]["label"], ds_stats[ds_orig_id][Attr.H][Stat.Mean]))

Out[62]:

[('car', 0.5440265436563434),
 ('tree', 0.299645584356783),
 ('car', 0.6933892035551608),
 ('window', 0.3216061790695245),
 ('window', 0.45917576705818813),
 ('window', 0.4383226569942364),
 ('window', 0.3940039199784987),
 ('window', 0.49330337847989303),
 ('tree', 0.22419251082028535),
 ('window', 0.3286475221083643),
 ('window', 0.2978296178806794),
 ('window', 0.45916312416370464),
 ('window', 0.2547333297705487),
 ('window', 0.13604200535624875),
 ('window', 0.5006509116604714),
 ('window', 0.5644015010774754),
 ('building', 0.18430284590763377),
 ('window', 0.4738245457069165),
 ('building', 0.266670274045325),
 ('tree', 0.2838462872187538),
 ('road', 0.11280132932006535),
 ('window', 0.2235651260826933),
 ('window', 0.29292402298608067),
 ('sky', 0.01864432024140357),
 ('window', 0.24586623060204277),
 ('road', 0.6318614224621412),
 ('building', 0.17667520087581332),
 ('window', 0.4536816382073086),
 ('window', 0.4661024691602208),
 ('window', 0.10265648123036657)]

Create a dictionary of instances and labels for the selected image. We can use that to visualise the segmentation.

In [63]:

ds_instances = {
    inst_id: label
    for inst_id, label in zip(
        ds_stats[ds_orig_id]["instance"], ds_stats[ds_orig_id]["label"]
    )
}

In [64]:

# The image shown here comes from the sampled dataset, so label it with its own ID
(fig, ax) = ds_model.visualise_segmentation(
    ds_image,
    ds_masks[ds_orig_id],
    ds_instances,
    highlight,
    title=f"Segmentation for {ds_orig_id} | DinoSAM | from dataset",
)