cassetta.backbones
Overview
Backbones are deep architectures composed of many layers.
In our philosophy, backbones are task-independent: they do not depend on the number of input and output channels of the problem at hand. Instead, they typically map an input feature space to an output feature space.
Models wrap a backbone between a feature-extraction layer (a single convolution without activation, possibly with a somewhat large kernel size) and a feature-mapping layer (a single 1x1x1 convolution, possibly followed by a task-specific activation such as a SoftMax).
| MODULE | DESCRIPTION |
|---|---|
| fcn | Fully convolutional encoders and decoders |
| unet | U-Nets: autoencoders with skip connections |
| atrous | Networks that use dilated convolutions |
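The wrapping described in the overview can be sketched as a plain channel plan. This is only an illustration in standard-library Python: `Stage`, `wrap_backbone`, `backbone_in`, `backbone_out` and `extract_kernel` are hypothetical names, not part of cassetta, and the kernel size of 5 is an arbitrary stand-in for "somewhat large".

```python
from typing import List, NamedTuple, Optional


class Stage(NamedTuple):
    name: str
    in_channels: int
    out_channels: int
    kernel_size: Optional[int]   # None: decided inside the backbone
    activation: Optional[str]    # None: no activation at this stage


def wrap_backbone(in_channels: int, out_channels: int,
                  backbone_in: int, backbone_out: int,
                  extract_kernel: int = 5,
                  final_activation: Optional[str] = "SoftMax") -> List[Stage]:
    """Hypothetical helper: list the three stages of a wrapped model."""
    return [
        # single convolution, no activation, possibly a large kernel
        Stage("feature-extraction", in_channels, backbone_in, extract_kernel, None),
        # the backbone only sees feature spaces, never task channels
        Stage("backbone", backbone_in, backbone_out, None, None),
        # single 1x1x1 convolution, optionally a task-specific activation
        Stage("feature-mapping", backbone_out, out_channels, 1, final_activation),
    ]


plan = wrap_backbone(in_channels=1, out_channels=4, backbone_in=16, backbone_out=16)
for stage in plan:
    print(stage)
```

Only the first and last stages depend on the task; the backbone can be swapped without touching them.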
cassetta.backbones.fcn
ConvEncoder
ConvEncoder(ndim, nb_features=16, mul_features=2, nb_levels=3, nb_conv_per_level=2, kernel_size=3, residual=False, activation='ReLU', norm=None, dropout=None, attention=None, order='cndax', pool_factor=2, pool_mode='interpolate')
Bases: Sequential
A fully convolutional encoder
Diagram
flowchart LR
1["`[F0, W]`"] ---2("ConvGroup"):::w-->
3["`[F0, W]`"] ---4("Down"):::w-->
5["`[F1, W//2]`"] ---6("ConvGroup"):::w-->
7["`[F1, W//2]`"] --- 8("Down"):::w-->
9["`[F2, W//4]`"] ---10("ConvGroup"):::w-->
11["`[F2, W//4]`"]
classDef w fill:papayawhip,stroke:peachpuff;
| PARAMETER | DESCRIPTION |
|---|---|
| ndim | Number of spatial dimensions |
| nb_features | Number of features at the finest level. If a list, number of features at each level of the encoder. |
| mul_features | Multiply the number of features by this number each time we go down one level. |
| nb_levels | Number of levels in the encoder |
| nb_conv_per_level | Number of convolutional layers at each level. |
| kernel_size | Kernel size |
| residual | Use residual connections between convolutional blocks |
| activation | Type of activation |
| norm | Normalization |
| dropout | Channel dropout probability |
| attention | Attention |
| order | Modules order (permutation of 'ncdax') |
| pool_factor | Downsampling factor (per dimension). |
| pool_mode | Method used to go down one level. |
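As a rough illustration of how `nb_features`, `mul_features`, `nb_levels` and `pool_factor` interact, the sketch below lists the feature count and spatial width at each level, mirroring the `[F0, W]`, `[F1, W//2]`, `[F2, W//4]` labels of the diagram. It is plain Python and does not call cassetta; `encoder_levels` is a hypothetical helper, and the floor division is an assumption taken from the diagram.

```python
def encoder_levels(width, nb_features=16, mul_features=2,
                   nb_levels=3, pool_factor=2):
    """Return [(features, width)] for each level, finest level first."""
    if isinstance(nb_features, (list, tuple)):
        # explicit list: one entry per level
        features = list(nb_features)
    else:
        # scalar: multiply by mul_features at each level going down
        features = [nb_features * mul_features ** i for i in range(nb_levels)]
    # each Down step divides the spatial width by pool_factor
    return [(f, width // pool_factor ** i) for i, f in enumerate(features)]


levels = encoder_levels(64)
print(levels)  # [(16, 64), (32, 32), (64, 16)]
```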
Source code in cassetta/backbones/fcn.py
forward
| PARAMETER | DESCRIPTION |
|---|---|
| inp | Input tensor |
| return_all | Return all intermediate output tensors (at each level) |
| RETURNS | DESCRIPTION |
|---|---|
| out | Output tensor(s). If `return_all` is set, all intermediate output tensors (one per level) are returned. |
Source code in cassetta/backbones/fcn.py
ConvDecoder
ConvDecoder(ndim, nb_features=16, div_features=2, nb_levels=3, nb_conv_per_level=2, skip=False, kernel_size=3, residual=False, activation='ReLU', norm=None, dropout=None, attention=None, order='cndax', unpool_factor=2, unpool_mode='interpolate')
Bases: Sequential
A fully convolutional decoder
Diagram: pure decoder
flowchart LR
1["`[F0, W]`"] ---2("Up"):::w-->
3["`[F1, W*2]`"] ---4("ConvGroup"):::w-->
5["`[F1, W*2]`"] ---6("Up"):::w-->
7["`[F2, W*4]`"] ---8("ConvGroup"):::w-->
9["`[F2, W*4]`"]
classDef w fill:papayawhip,stroke:peachpuff;
Diagram: concatenated skip connections
flowchart LR
S1["`[S1, W*2]`"]
S2["`[S2, W*4]`"]
1["`[F0, W]`"] ---2("Up"):::w-->
3["`[F1, W*2]`"] ---4(("c")):::d-->
5["`[F1+S1, W*2]`"] ---6("ConvGroup"):::w-->
7["`[F1, W*2]`"] ---8("Up"):::w-->
9["`[F2, W*4]`"] ---10(("c")):::d-->
11["`[F2+S2, W*4]`"] ---12("ConvGroup"):::w-->
13["`[F2, W*4]`"]
S1 --- 4
S2 --- 10
classDef w fill:papayawhip,stroke:peachpuff;
classDef d fill:lightcyan,stroke:lightblue;
Diagram: additive skip connections
flowchart LR
S1["`[F1, W*2]`"]
S2["`[F2, W*4]`"]
1["`[F0, W]`"] ---2("Up"):::w-->
3["`[F1, W*2]`"] ---4(("+")):::d-->
5["`[F1, W*2]`"] ---6("ConvGroup"):::w-->
7["`[F1, W*2]`"] ---8("Up"):::w-->
9["`[F2, W*4]`"] ---10(("+")):::d-->
11["`[F2, W*4]`"] ---12("ConvGroup"):::w-->
13["`[F2, W*4]`"]
S1 --- 4
S2 --- 10
classDef w fill:papayawhip,stroke:peachpuff;
classDef d fill:lightcyan,stroke:lightblue;
| PARAMETER | DESCRIPTION |
|---|---|
| ndim | Number of spatial dimensions |
| nb_features | Number of features at the finest level. If a list, number of features at each level of the encoder. |
| div_features | Divide the number of features by this number each time we go up one level. |
| nb_levels | Number of levels in the encoder |
| nb_conv_per_level | Number of convolutional layers at each level. |
| skip | Number of channels to concatenate in the skip connection. If 0 (or False) and skip tensors are provided, will try to add them instead of cat. If True, the number of skipped channels and the number of features are identical. |
| kernel_size | Kernel size |
| residual | Use residual connections between convolutional blocks |
| activation | Type of activation |
| norm | Normalization |
| dropout | Channel dropout probability |
| attention | Attention |
| order | Modules order (permutation of 'ncdax') |
| unpool_factor | Upsampling factor (per dimension). |
| unpool_mode | Method used to go up one level. |
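The three cases of `skip` (an integer, `True`, or `0`/`False`) determine how many channels enter each `ConvGroup` after upsampling. The sketch below works through that bookkeeping in plain Python. `decoder_conv_inputs` is a hypothetical helper, and it assumes that `nb_features` is the channel count of the decoder input (`F0` in the diagrams) and that features are integer-divided by `div_features` at each level; neither assumption is confirmed by the text above.

```python
def decoder_conv_inputs(nb_features=16, div_features=2, nb_levels=3, skip=False):
    """Return (up_features, skip_channels, conv_in_channels, mode) per level,
    ordered from coarsest to finest, for every level that follows an Up step."""
    # assumed feature schedule: F0, F0 // div, F0 // div**2, ...
    feats = [nb_features // div_features ** i for i in range(nb_levels)]
    plan = []
    for f in feats[1:]:
        if skip is True:
            # as many skipped channels as features, concatenated
            nb_skip, mode = f, "cat"
        elif skip:
            # explicit number of skipped channels, concatenated
            nb_skip, mode = int(skip), "cat"
        else:
            # 0 or False: skip tensors, if provided, are added instead
            nb_skip, mode = 0, "add"
        plan.append((f, nb_skip, f + nb_skip, mode))
    return plan


print(decoder_conv_inputs(skip=True))   # [(8, 8, 16, 'cat'), (4, 4, 8, 'cat')]
print(decoder_conv_inputs(skip=3))      # [(8, 3, 11, 'cat'), (4, 3, 7, 'cat')]
print(decoder_conv_inputs(skip=False))  # [(8, 0, 8, 'add'), (4, 0, 4, 'add')]
```

Note that `True == 1` in Python, so the sketch tests `skip is True` before treating `skip` as a channel count.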
Source code in cassetta/backbones/fcn.py
forward
| PARAMETER | DESCRIPTION |
|---|---|
| *inp | Input tensor(s), optionally including skip connections. Ordered from coarsest to finest. |
| return_all | Return all intermediate output tensors (at each level). |
| RETURNS | DESCRIPTION |
|---|---|
| out | Output tensor(s). If `return_all` is set, all intermediate output tensors (one per level) are returned. |
Source code in cassetta/backbones/fcn.py
cassetta.backbones.unet
UNet
UNet(ndim, nb_features=64, mul_features=2, nb_levels=5, nb_levels_decoder=None, nb_conv_per_level=2, kernel_size=3, residual=False, activation='ReLU', norm=None, dropout=None, attention=None, order='cndax', pool_factor=2, pool_mode='pool', unpool_mode='conv', skip=True)
Bases: Module
A UNet
Diagram: concatenated skip connections
flowchart LR
II0["`[F0, W]`"] ---CI0("`ConvGroup`"):::w-->
IO0["`[F0, W]`"] ---D1("`Down`"):::w-->
II1["`[F1, W//2]`"] ---CI1("`ConvGroup`"):::w-->
IO1["`[F1, W//2]`"] ---D2("`Down`"):::w-->
II2["`[F2, W//4]`"] ---CI2("`ConvGroup`"):::w-->
OO2["`[F2, W//4]`"]:::o ---U2("`Up`"):::w-->
OI1["`[F1, W//2]`"] ---Z1(("c")):::d-->
OZ1["`[F1*2, W//2]`"] ---CO1("`ConvGroup`"):::w-->
OO1["`[F1, W//2]`"]:::o ---U1("`Up`"):::w-->
OI0["`[F0, W]`"] ---Z0(("c")):::d-->
OZ0["`[F0*2, W]`"] ---CO0("`ConvGroup`"):::w-->
OO0["`[F0, W]`"]:::o
IO0 --- Z0
IO1 --- Z1
classDef w fill:papayawhip,stroke:peachpuff;
classDef d fill:lightcyan,stroke:lightblue;
classDef o fill:mistyrose,stroke:lightpink;
Diagram: additive skip connections
flowchart LR
II0["`[F0, W]`"] ---CI0("`ConvGroup`"):::w-->
IO0["`[F0, W]`"] ---D1("`Down`"):::w-->
II1["`[F1, W//2]`"] ---CI1("`ConvGroup`"):::w-->
IO1["`[F1, W//2]`"] ---D2("`Down`"):::w-->
II2["`[F2, W//4]`"] ---CI2("`ConvGroup`"):::w-->
OO2["`[F2, W//4]`"]:::o ---U2("`Up`"):::w-->
OI1["`[F1, W//2]`"] ---Z1(("+")):::d-->
OZ1["`[F1, W//2]`"] ---CO1("`ConvGroup`"):::w-->
OO1["`[F1, W//2]`"]:::o ---U1("`Up`"):::w-->
OI0["`[F0, W]`"] ---Z0(("+")):::d-->
OZ0["`[F0, W]`"] ---CO0("`ConvGroup`"):::w-->
OO0["`[F0, W]`"]:::o
IO0 --- Z0
IO1 --- Z1
classDef w fill:papayawhip,stroke:peachpuff;
classDef d fill:lightcyan,stroke:lightblue;
classDef o fill:mistyrose,stroke:lightpink;
Diagram: no skip connections
flowchart LR
II0["`[F0, W]`"] ---CI0("`ConvGroup`"):::w-->
IO0["`[F0, W]`"] ---D1("`Down`"):::w-->
II1["`[F1, W//2]`"] ---CI1("`ConvGroup`"):::w-->
IO1["`[F1, W//2]`"] ---D2("`Down`"):::w-->
II2["`[F2, W//4]`"] ---CI2("`ConvGroup`"):::w-->
OO2["`[F2, W//4]`"]:::o ---U2("`Up`"):::w-->
OI1["`[F1, W//2]`"] ---CO1("`ConvGroup`"):::w-->
OO1["`[F1, W//2]`"]:::o ---U1("`Up`"):::w-->
OI0["`[F0, W]`"] ---CO0("`ConvGroup`"):::w-->
OO0["`[F0, W]`"]:::o
classDef w fill:papayawhip,stroke:peachpuff;
classDef d fill:lightcyan,stroke:lightblue;
classDef o fill:mistyrose,stroke:lightpink;
Difference with Ronneberger et al.
- Default parameters are from Ronneberger et al.
- However, instead of performing a 3x3 channel-expanding convolution + ReLU in the encoder, we first perform a 1x1 channel-expanding convolution without ReLU, followed by a 3x3 channel-preserving convolution + ReLU.
- Both implementations have the same representation power, although ours adds unneeded free parameters.
- The benefit of our approach is added flexibility: max-pooling can easily be replaced with other downsampling operators (e.g., linear downsampling or strided convolution) using `pool_mode="interpolate"` or `pool_mode="conv"`.
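To make the "unneeded free parameters" remark concrete, the following sketch counts convolution weights for the first convolution of an encoder level in both variants. It is a back-of-the-envelope calculation in plain Python, not cassetta code: biases are ignored, and the helper names are hypothetical.

```python
def conv_weights(in_channels, out_channels, kernel_size, ndim):
    """Number of weights in a dense convolution (bias ignored)."""
    return in_channels * out_channels * kernel_size ** ndim


def first_conv_weights(in_channels, out_channels, kernel_size=3, ndim=2):
    """Compare the two ways of expanding channels at an encoder level."""
    # Ronneberger et al.: a single channel-expanding k x k convolution
    original = conv_weights(in_channels, out_channels, kernel_size, ndim)
    # variant described above: 1x1 expansion, then channel-preserving k x k
    ours = (conv_weights(in_channels, out_channels, 1, ndim)
            + conv_weights(out_channels, out_channels, kernel_size, ndim))
    return original, ours


original, ours = first_conv_weights(64, 128)
print(original, ours)  # 73728 155648
```

For a 64-to-128 expansion in 2D, the two-step variant carries roughly twice as many weights as the single 3x3 convolution.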
Reference
Ronneberger, Fischer & Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation." MICCAI (2015). arxiv:1505.04597
| PARAMETER | DESCRIPTION |
|---|---|
| ndim | Number of spatial dimensions |
| nb_features | Number of features at the finest level. If a list, number of features at each level of the encoder. |
| mul_features | Multiply the number of features by this number each time we go down one level. |
| nb_levels | Number of levels in the encoder |
| nb_levels_decoder | Number of levels in the decoder |
| nb_conv_per_level | Number of convolutional layers at each level. |
| kernel_size | Kernel size |
| residual | Use residual connections between convolutional blocks |
| activation | Type of activation |
| norm | Normalization |
| dropout | Channel dropout probability |
| attention | Attention |
| order | Modules order (permutation of 'ncdax') |
| pool_factor | Down/Upsampling factor (per dimension). |
| pool_mode | Method used to go down one level. |
| unpool_mode | Method used to go up one level. |
| skip | Type of skip connections: |
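The shape labels in the diagrams can be reproduced with a short trace of channel counts and spatial widths through a symmetric U-Net. This is a plain-Python sketch under stated assumptions, not cassetta code: `unet_trace` is a hypothetical helper, the `mode` labels `"cat"`, `"add"` and `"none"` are illustrative names rather than accepted values of `skip`, and the decoder is assumed to mirror the encoder.

```python
def unet_trace(width, nb_features=64, mul_features=2, nb_levels=5,
               pool_factor=2, mode="cat"):
    """Return (stage, conv_in_channels, conv_out_channels, width) per ConvGroup."""
    feats = [nb_features * mul_features ** i for i in range(nb_levels)]
    widths = [width // pool_factor ** i for i in range(nb_levels)]
    trace = []
    # encoder: each ConvGroup preserves the channel count of its level
    for f, w in zip(feats, widths):
        trace.append(("enc", f, f, w))
    # decoder: walk back up, skipping the bottleneck level
    for f, w in zip(reversed(feats[:-1]), reversed(widths[:-1])):
        # concatenation doubles the channels entering the ConvGroup;
        # addition or no skip leaves them unchanged
        conv_in = 2 * f if mode == "cat" else f
        trace.append(("dec", conv_in, f, w))
    return trace


for row in unet_trace(64, nb_features=8, nb_levels=3):
    print(row)
```

With `nb_levels=3`, the decoder rows correspond to the `[F1*2, W//2]` and `[F0*2, W]` nodes of the concatenation diagram.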
Source code in cassetta/backbones/unet.py
forward
| PARAMETER | DESCRIPTION |
|---|---|
| inp | Input tensor |
| return_all | Return all intermediate output tensors (at each level). |

| RETURNS | DESCRIPTION |
|---|---|
| out | Output tensor(s). |
Source code in cassetta/backbones/unet.py
cassetta.backbones.atrous
MeshNet
MeshNet(ndim, nb_features=21, nb_layers=6, nb_conv_per_layer=2, dilation=1, mul_dilation=2, kernel_size=3, residual=False, activation='ReLU', norm='batch', dropout=None, attention=None, order='caxnd')
Bases: ModuleGroup
A stack of dilated convolutions
Diagram
flowchart LR
II0["`[F, W]`"] ---CI0("`ConvGroup(dilation=1)`"):::w-->
IO0["`[F, W]`"] ---CI1("`ConvGroup(dilation=2)`"):::w-->
IO1["`[F, W]`"] ---CI2("`ConvGroup(dilation=4)`"):::w-->
OO2["`[F, W]`"] ---CO1("`ConvGroup(dilation=8)`"):::w-->
OO1["`[F, W]`"] ---CO0("`ConvGroup(dilation=16)`"):::w-->
OO0["`[F, W]`"]:::o
classDef w fill:papayawhip,stroke:peachpuff;
classDef d fill:lightcyan,stroke:lightblue;
classDef o fill:mistyrose,stroke:lightpink;
Difference with Fedorov et al.
- Default parameters are from Fedorov et al.
- However, Fedorov et al. end with a final convolution block with `dilation=1`, which our default network discards.
- To recover their behavior, explicitly set the dilation list: `dilation=[1, 2, 4, 8, 16, 1]`.
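The effect of the dilation list on context size can be estimated with the usual receptive-field formula for stacked stride-1 convolutions, where each convolution adds `(kernel_size - 1) * dilation` voxels per dimension. The sketch below is plain Python and assumes, without confirmation from the source, that every `ConvGroup` holds `nb_conv_per_layer` convolutions that all share the layer's dilation; `receptive_field` is a hypothetical helper.

```python
def receptive_field(dilations, nb_conv_per_layer=2, kernel_size=3):
    """Receptive field (per dimension) of a stack of dilated convolutions."""
    rf = 1
    for d in dilations:
        # each convolution in the layer widens the field by (k - 1) * d
        rf += nb_conv_per_layer * (kernel_size - 1) * d
    return rf


diagram = [1, 2, 4, 8, 16]      # schedule shown in the diagram
fedorov = [1, 2, 4, 8, 16, 1]   # schedule with the final dilation=1 block
print(receptive_field(diagram))  # 125
print(receptive_field(fedorov))  # 129
```

Under these assumptions, the extra `dilation=1` block adds little context; it mostly adds a final local refinement.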
References
-
Yu & Koltun, "Multi-Scale Context Aggregation by Dilated Convolutions." ICLR (2016). arxiv:1511.07122
-
Fedorov, Johnson, Damaraju, Ozerin, Calhoun & Plis, "End-to-end learning of brain tissue segmentation from imperfect labeling." IJCNN (2017). arxiv:1612.00940
| PARAMETER | DESCRIPTION |
|---|---|
| ndim | Number of spatial dimensions |
| nb_features | Number of features at the finest level. If a list, number of features at each level of the encoder. |
| nb_layers | Number of levels in the network. |
| nb_conv_per_layer | Number of convolutional blocks in each layer. |
| dilation | Dilation factor in the first layer. If a list, dilation factor in each layer. |
| mul_dilation | Multiply the dilation by this number each time we go down one level. |
| kernel_size | Kernel size |
| residual | Use residual connections between convolutional blocks and between layers. |
| activation | Type of activation |
| norm | Normalization |
| dropout | Channel dropout probability |
| attention | Attention |
| order | Modules order (permutation of 'ncdax') |
Source code in cassetta/backbones/atrous.py
ATrousNet
ATrousNet(ndim, nb_features=21, nb_levels=5, nb_conv_per_level=2, dilation=1, mul_dilation=2, kernel_size=3, residual=False, activation='ReLU', norm='batch', dropout=None, attention=None, order='caxnd')
Bases: ModuleGroup
Parallel dilated convolutions
Diagram
flowchart LR
1["`[F, W]`"] ---C11("`ConvGroup(dilation=1)`"):::w-->
2["`[F, W]`"] ---C21("`ConvGroup(dilation=1)`"):::w--> 3["`[F, W]`"]
2 ---C22("`ConvGroup(dilation=2)`"):::w--> 4["`[F, W]`"]
3 & 4 ---Z2(("+")):::d-->
5["`[F, W]`"] ---C31("`ConvGroup(dilation=1)`"):::w--> 6["`[F, W]`"]
5 ---C32("`ConvGroup(dilation=2)`"):::w--> 7["`[F, W]`"]
5 ---C34("`ConvGroup(dilation=4)`"):::w--> 8["`[F, W]`"]
6 & 7 & 8 ---Z3(("+")):::d-->
9["`[F, W]`"] ---C41("`ConvGroup(dilation=1)`"):::w-->10["`[F, W]`"]
9 ---C42("`ConvGroup(dilation=2)`"):::w-->11["`[F, W]`"]
10 & 11 ---Z4(("+")):::d-->
12["`[F, W]`"]---C51("`ConvGroup(dilation=1)`"):::w-->13["`[F, W]`"]:::o
classDef w fill:papayawhip,stroke:peachpuff;
classDef d fill:lightcyan,stroke:lightblue;
classDef o fill:mistyrose,stroke:lightpink;
Reference
Chen, Papandreou, Kokkinos, Murphy & Yuille, "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs." TPAMI (2017). arxiv:1606.00915
| PARAMETER | DESCRIPTION |
|---|---|
| ndim | Number of spatial dimensions |
| nb_features | Number of features at the finest level. If a list, number of features at each level of the encoder. |
| nb_levels | Number of levels in the network. |
| nb_conv_per_level | Number of convolutional blocks in each layer. |
| dilation | Dilation factor in the first layer. If a list, dilation factor in each layer. |
| mul_dilation | Multiply the dilation by this number each time we go down one level. |
| kernel_size | Kernel size |
| residual | Use residual connections between convolutional blocks and between layers. |
| activation | Type of activation |
| norm | Normalization |
| dropout | Channel dropout probability |
| attention | Attention |
| order | Modules order (permutation of 'ncdax') |
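The diagram suggests a pyramid of parallel branches: one branch at the first and last levels, and one more branch (with the dilation multiplied again) for each step toward the middle. The sketch below reproduces that pattern in plain Python. It is inferred from the diagram alone, so the rule for the number of branches is an assumption that may not match the implementation for other settings; `branch_dilations` is a hypothetical helper.

```python
def branch_dilations(nb_levels=5, dilation=1, mul_dilation=2):
    """Return the dilation of each parallel branch, level by level."""
    levels = []
    for i in range(nb_levels):
        # assumed rule: branches grow toward the middle level, then shrink
        nb_branches = min(i, nb_levels - 1 - i) + 1
        levels.append([dilation * mul_dilation ** j for j in range(nb_branches)])
    return levels


pyramid = branch_dilations()
print(pyramid)  # [[1], [1, 2], [1, 2, 4], [1, 2], [1]]
```

The outputs of the branches within a level are summed, as shown by the `+` nodes in the diagram.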
Source code in cassetta/backbones/atrous.py