update precommit hooks

Signed-off-by: Vladimir Mandic <mandic00@live.com>
Vladimir Mandic
2025-07-08 16:16:08 -04:00
parent d522982bdb
commit 71a18dcf74
4 changed files with 52 additions and 43 deletions


@@ -25,15 +25,24 @@ repos:
      - id: check-builtin-literals
      - id: check-case-conflict
      - id: check-illegal-windows-names
      - id: check-json
      - id: check-merge-conflict
      - id: check-symlinks
      - id: check-toml
      - id: check-xml
      - id: check-yaml
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: mixed-line-ending
      - id: check-executables-have-shebangs
        exclude: |
          (?x)^(
            .*\.bat|
            .*\.ps1
          )$
      - id: trailing-whitespace
        exclude: |
          (?x)^(
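For context, hook ids like the ones above live under a `repos:` entry of `.pre-commit-config.yaml`; a minimal sketch of such a config (the `rev` value below is illustrative only, not taken from this commit) might look like:

```yaml
# Minimal .pre-commit-config.yaml sketch; rev is illustrative, pin a real release
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0  # illustrative revision
    hooks:
      - id: check-yaml
      - id: check-json
      - id: trailing-whitespace
```

Hooks are installed into the local clone with `pre-commit install`, run over the whole tree with `pre-commit run --all-files`, and the pinned `rev` values are bumped with `pre-commit autoupdate`.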


@@ -29,8 +29,8 @@ Any code commit is validated before merge
The `SD.Next` library can establish external connections *only* for the following purposes and *only* when explicitly configured by the user:
- Download extension and theme indexes from automatically updated indexes
- Download required packages and repositories from GitHub during installation/upgrade
- Download installed/enabled extensions
- Download models from CivitAI and/or Huggingface when instructed by the user
- Submit benchmark info upon user interaction


@@ -1,6 +1,6 @@
# (Generic) EfficientNets for PyTorch
A 'generic' implementation of EfficientNet, MixNet, MobileNetV3, etc. that covers most of the compute/parameter efficient architectures derived from the MobileNet V1/V2 block sequence, including those found via automated neural architecture search.
All models are implemented by GenEfficientNet or MobileNetV3 classes, with string based architecture definitions to configure the block layouts (idea from [here](https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet/mnasnet_models.py))
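The string-based definitions mentioned above follow the mnasnet convention of underscore-separated tokens per block. As a hedged illustration (this is not the repository's actual parser; the key meanings shown are the common convention, assumed here), decoding one block string might look like:

```python
def decode_block(block_str):
    """Parse a mnasnet-style block string like 'ir_r2_k3_s2_e6_c24' into a dict.

    Assumed key meanings (common convention, not verified against this repo):
    r=repeat count, k=kernel size, s=stride, e=expansion ratio, c=output channels.
    The first token is the block type (e.g. 'ir' for inverted residual).
    """
    parts = block_str.split('_')
    spec = {'type': parts[0]}
    for p in parts[1:]:
        spec[p[0]] = int(p[1:])  # key is the leading letter, value the digits
    return spec

print(decode_block('ir_r2_k3_s2_e6_c24'))
# → {'type': 'ir', 'r': 2, 'k': 3, 's': 2, 'e': 6, 'c': 24}
```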
@@ -20,7 +20,7 @@ All models are implemented by GenEfficientNet or MobileNetV3 classes, with strin
* 4.5M param MobileNet-V2 110d @ 75%
* 6.1M param MobileNet-V2 140 @ 76.5%
* 5.8M param MobileNet-V2 120d @ 77.3%
### March 23, 2020
* Add EfficientNet-Lite models w/ weights ported from [Tensorflow TPU](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite)
* Add PyTorch trained MobileNet-V3 Large weights with 75.77% top-1
@@ -39,7 +39,7 @@ All models are implemented by GenEfficientNet or MobileNetV3 classes, with strin
### Nov 22, 2019
* New top-1 high! Ported official TF EfficientNet AdvProp (https://arxiv.org/abs/1911.09665) weights and B8 model spec. Created a new set of `ap` models since they use a different
preprocessing (Inception mean/std) from the original EfficientNet base/AA/RA weights.
### Nov 15, 2019
* Ported official TF MobileNet-V3 float32 large/small/minimalistic weights
* Modifications to MobileNet-V3 model and components to support some additional config needed for differences between TF MobileNet-V3 and mine
@@ -50,7 +50,7 @@ All models are implemented by GenEfficientNet or MobileNetV3 classes, with strin
* Add JIT optimized mem-efficient Swish/Mish autograd.fn in addition to memory-efficient autograd.fn
* Activation factory to select best version of activation by name or override one globally
* Add pretrained checkpoint load helper that handles input conv and classifier changes
### Oct 27, 2019
* Add CondConv EfficientNet variants ported from https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/condconv
* Add RandAug weights for TF EfficientNet B5 and B7 from https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
@@ -75,8 +75,8 @@ Implemented models include:
* MobileNet-V3 (https://arxiv.org/abs/1905.02244)
* FBNet-C (https://arxiv.org/abs/1812.03443)
* Single-Path NAS (https://arxiv.org/abs/1904.02877)
I originally implemented and trained some of these models with code [here](https://github.com/rwightman/pytorch-image-models); this repository contains just the GenEfficientNet models, validation, and associated ONNX/Caffe2 export code.
## Pretrained
@@ -117,7 +117,7 @@ More pretrained models to come...
The weights ported from Tensorflow checkpoints for the EfficientNet models do pretty much match accuracy in Tensorflow once a SAME convolution padding equivalent is added, and the same crop factors, image scaling, etc (see table) are used via cmd line args.
**IMPORTANT:**
* Tensorflow ported weights for EfficientNet AdvProp (AP), EfficientNet EdgeTPU, EfficientNet-CondConv, EfficientNet-Lite, and MobileNet-V3 models use Inception style (0.5, 0.5, 0.5) for mean and std.
* Enabling the Tensorflow preprocessing pipeline with `--tf-preprocessing` at validation time will improve scores by 0.1-0.5%, very close to original TF impl.
@@ -130,7 +130,7 @@ To run validation w/ TF preprocessing for tf_efficientnet_b5:
To run validation for a model with Inception preprocessing, ie EfficientNet-B8 AdvProp:
`python validate.py /path/to/imagenet/validation/ --model tf_efficientnet_b8_ap -b 48 --num-gpu 2 --img-size 672 --crop-pct 0.954 --mean 0.5 --std 0.5`
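The `--mean 0.5 --std 0.5` flags above select the Inception-style convention. The distinction amounts to the following per-channel normalization step; this is a plain-Python sketch for illustration, not the repo's actual transform code (the ImageNet constants are the widely used defaults, assumed here):

```python
# Two normalization conventions for pixel channel values in [0, 1]
INCEPTION_MEAN, INCEPTION_STD = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # standard ImageNet constants (assumed)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize(pixel, mean, std):
    """Per-channel (x - mean) / std normalization."""
    return tuple((x - m) / s for x, m, s in zip(pixel, mean, std))

# Inception-style normalization maps [0, 1] to [-1, 1]:
print(normalize((0.0, 0.5, 1.0), INCEPTION_MEAN, INCEPTION_STD))  # → (-1.0, 0.0, 1.0)
```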
|Model | Prec@1 (Err) | Prec@5 (Err) | Param # | Image Scaling | Image Size | Crop |
|---|---|---|---|---|---|---|
| tf_efficientnet_l2_ns *tfp | 88.352 (11.648) | 98.652 (1.348) | 480 | bicubic | 800 | N/A |
| tf_efficientnet_l2_ns | TBD | TBD | 480 | bicubic | 800 | 0.961 |
@@ -308,7 +308,7 @@ Scripts are included to
As an example, to export the MobileNet-V3 pretrained model and then run an Imagenet validation:
```
python onnx_export.py --model mobilenetv3_large_100 ./mobilenetv3_100.onnx
python onnx_validate.py /imagenet/validation/ --onnx-input ./mobilenetv3_100.onnx
```
These scripts were tested to be working as of PyTorch 1.6 and ONNX 1.7 w/ ONNX runtime 1.4. Caffe2 compatible


@@ -2,24 +2,24 @@
This repository contains code to compute depth from a single image. It accompanies our [paper](https://arxiv.org/abs/1907.01341v3):
>Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun
and our [preprint](https://arxiv.org/abs/2103.13413):
> Vision Transformers for Dense Prediction
> René Ranftl, Alexey Bochkovskiy, Vladlen Koltun
MiDaS was trained on up to 12 datasets (ReDWeb, DIML, Movies, MegaDepth, WSVD, TartanAir, HRWSI, ApolloScape, BlendedMVS, IRS, KITTI, NYU Depth V2) with
multi-objective optimization.
The original model that was trained on 5 datasets (`MIX 5` in the paper) can be found [here](https://github.com/isl-org/MiDaS/releases/tag/v2).
The figure below shows an overview of the different MiDaS models; the bubble size scales with number of parameters.
![](figures/Improvement_vs_FPS.png)
### Setup
1) Pick one or more models and download the corresponding weights to the `weights` folder:
@@ -31,9 +31,9 @@ MiDaS 3.1
MiDaS 3.0: Legacy transformer models [dpt_large_384](https://github.com/isl-org/MiDaS/releases/download/v3/dpt_large_384.pt) and [dpt_hybrid_384](https://github.com/isl-org/MiDaS/releases/download/v3/dpt_hybrid_384.pt)
MiDaS 2.1: Legacy convolutional models [midas_v21_384](https://github.com/isl-org/MiDaS/releases/download/v2_1/midas_v21_384.pt) and [midas_v21_small_256](https://github.com/isl-org/MiDaS/releases/download/v2_1/midas_v21_small_256.pt)
1) Set up dependencies:
```shell
conda env create -f environment.yaml
@@ -53,7 +53,7 @@ For the OpenVINO model, install
```shell
pip install openvino
```
### Usage
1) Place one or more input images in the folder `input`.
@@ -68,19 +68,19 @@ pip install openvino
[dpt_swin2_tiny_256](#model_type), [dpt_swin_large_384](#model_type), [dpt_next_vit_large_384](#model_type),
[dpt_levit_224](#model_type), [dpt_large_384](#model_type), [dpt_hybrid_384](#model_type),
[midas_v21_384](#model_type), [midas_v21_small_256](#model_type), [openvino_midas_v21_small_256](#model_type).
3) The resulting depth maps are written to the `output` folder.
#### optional
1) By default, the inference resizes the height of input images to the size of a model to fit into the encoder. This
size is given by the numbers in the model names of the [accuracy table](#accuracy). Some models do not only support a single
inference height but a range of different heights. Feel free to explore different heights by appending the extra
command line argument `--height`. Unsupported height values will throw an error. Note that using this argument may
decrease the model accuracy.
2) By default, the inference keeps the aspect ratio of input images when feeding them into the encoder if this is
supported by a model (all models except for Swin, Swin2, LeViT). In order to resize to a square resolution,
disregarding the aspect ratio while preserving the height, use the command line argument `--square`.
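The resizing behavior described in the two points above can be sketched as follows. This is an illustrative reimplementation, not the repository's code, and the rounding of dimensions to a multiple of 32 is an assumption about the encoder's input constraint:

```python
def target_size(width, height, target_height, keep_aspect=True, multiple_of=32):
    """Compute the encoder input size for an image of (width, height).

    keep_aspect=True mimics the default behavior; keep_aspect=False mimics --square.
    Dimensions are rounded to a multiple of `multiple_of` (assumed constraint).
    """
    def round_to(x):
        return max(multiple_of, int(round(x / multiple_of)) * multiple_of)

    if keep_aspect:
        scale = target_height / height
        return round_to(width * scale), round_to(target_height)
    # --square: ignore the aspect ratio, resize to target_height x target_height
    return round_to(target_height), round_to(target_height)

print(target_size(640, 480, 384))                     # → (512, 384)
print(target_size(640, 480, 384, keep_aspect=False))  # → (384, 384)
```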
#### via Camera
@@ -91,7 +91,7 @@ pip install openvino
python run.py --model_type <model_type> --side
```
The argument `--side` is optional and causes both the input RGB image and the output depth map to be shown
side-by-side for comparison.
#### via Docker
@@ -122,7 +122,7 @@ The pretrained model is also available on [PyTorch Hub](https://pytorch.org/hub/
See [README](https://github.com/isl-org/MiDaS/tree/master/tf) in the `tf` subdirectory.
Currently only supports MiDaS v2.1.
#### via Mobile (iOS / Android)
@@ -133,16 +133,16 @@ See [README](https://github.com/isl-org/MiDaS/tree/master/mobile) in the `mobile
See [README](https://github.com/isl-org/MiDaS/tree/master/ros) in the `ros` subdirectory.
Currently only supports MiDaS v2.1. DPT-based models to be added.
### Accuracy
We provide a **zero-shot error** $\epsilon_d$ which is evaluated for 6 different datasets
(see [paper](https://arxiv.org/abs/1907.01341v3)). **Lower error values are better**.
$\color{green}{\textsf{Overall model quality is represented by the improvement}}$ ([Imp.](#improvement)) with respect to
MiDaS 3.0 DPT<sub>L-384</sub>. The models are grouped by the height used for inference, whereas the square training resolution is given by
the numbers in the model names. The table also shows the **number of parameters** (in millions) and the
**frames per second** for inference at the training resolution (for GPU RTX 3090):
| MiDaS Model | DIW </br><sup>WHDR</sup> | Eth3d </br><sup>AbsRel</sup> | Sintel </br><sup>AbsRel</sup> | TUM </br><sup>δ1</sup> | KITTI </br><sup>δ1</sup> | NYUv2 </br><sup>δ1</sup> | $\color{green}{\textsf{Imp.}}$ </br><sup>%</sup> | Par.</br><sup>M</sup> | FPS</br><sup>&nbsp;</sup> |
@@ -171,16 +171,16 @@ the numbers in the model names. The table also shows the **number of parameters*
| [v3.1 LeViT<sub>224</sub>](https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_levit_224.pt)$\tiny{\square}$ | **0.1314** | **0.1206** | **0.3148** | **18.21** | **15.27*** | **8.64*** | $\color{green}{\textsf{-40}}$ | **51** | **73** |
&ast; No zero-shot error, because models are also trained on KITTI and NYU Depth V2\
$\square$ Validation performed at **square resolution**, either because the transformer encoder backbone of a model
does not support non-square resolutions (Swin, Swin2, LeViT) or for comparison with these models. All other
validations keep the aspect ratio. A difference in resolution limits the comparability of the zero-shot error and the
improvement, because these quantities are averages over the pixels of an image and do not take into account the
advantage of more details due to a higher resolution.\
Best values per column and same validation height in bold
#### Improvement
The improvement in the above table is defined as the relative zero-shot error with respect to MiDaS v3.0
DPT<sub>L-384</sub> and averaging over the datasets. So, if $\epsilon_d$ is the zero-shot error for dataset $d$, then
the $\color{green}{\textsf{improvement}}$ is given by $100(1-(1/6)\sum_d\epsilon_d/\epsilon_{d,\rm{DPT_{L-384}}})$%.
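As a worked example, the improvement formula can be computed directly. The error values below are made up for illustration and are not taken from the table above:

```python
def improvement(errors, reference_errors):
    """100 * (1 - mean(eps_d / eps_ref_d)) over the datasets, per the formula above."""
    ratios = [e / r for e, r in zip(errors, reference_errors)]
    return 100 * (1 - sum(ratios) / len(ratios))

# Hypothetical zero-shot errors for 6 datasets vs. the DPT_L-384 reference:
model_err = [0.10, 0.09, 0.30, 9.0, 8.0, 4.0]
ref_err = [0.12, 0.12, 0.33, 10.0, 10.0, 5.0]
print(round(improvement(model_err, ref_err), 1))  # → 16.8
```

A positive value means the model's average error ratio is below 1, i.e. it beats the reference; the table's negative "Imp." entries follow the same sign convention with errors above the reference.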
@@ -193,14 +193,14 @@ and v2.0 Large<sub>384</sub> respectively instead of v3.0 DPT<sub>L-384</sub>.
Zoom in for better visibility
![](figures/Comparison.png)
### Speed on Camera Feed
Test configuration
- Windows 10
- 11th Gen Intel Core i7-1185G7 3.00GHz
- 16GB RAM
- Camera resolution 640x480
- openvino_midas_v21_small_256
Speed: 22 FPS
@@ -251,9 +251,9 @@ If you use a DPT-based model, please also cite:
### Acknowledgements
Our work builds on and uses code from [timm](https://github.com/rwightman/pytorch-image-models) and [Next-ViT](https://github.com/bytedance/Next-ViT).
We'd like to thank the authors for making these libraries available.
### License
MIT License