mirror of
https://github.com/vladmandic/automatic
synced 2026-04-09 10:11:53 +02:00
update precommit hooks
Signed-off-by: Vladimir Mandic <mandic00@live.com>
@@ -25,15 +25,24 @@ repos:
      - id: check-illegal-windows-names
      - id: check-merge-conflict
      - id: detect-private-key
      - id: check-builtin-literals
      - id: check-case-conflict
      - id: check-json
      - id: check-symlinks
      - id: check-yaml
      - id: check-toml
      - id: check-xml
      - id: end-of-file-fixer
      - id: mixed-line-ending
      - id: check-executables-have-shebangs
        exclude: |
          (?x)^(
            .*.bat|
            .*.ps1
          )$
      - id: trailing-whitespace
        exclude: |
          (?x)^(
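Once the `(?x)` verbose-regex whitespace is stripped, the exclude for the shebang check collapses to the single pattern `^(.*.bat|.*.ps1)$`. A quick way to sanity-check which names it skips (the file names below are hypothetical examples, not taken from the repo):

```shell
# Exclude pattern from the check-executables-have-shebangs hook,
# with the (?x) verbose whitespace removed.
pattern='^(.*.bat|.*.ps1)$'

# .bat/.ps1 scripts are skipped; everything else still gets the shebang check.
for f in webui.bat webui.ps1 launch.py; do
  if printf '%s\n' "$f" | grep -Eq "$pattern"; then
    echo "$f: excluded from shebang check"
  else
    echo "$f: checked"
  fi
done
```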
||||
@@ -29,8 +29,8 @@ Any code commit is validated before merge

`SD.Next` library can establish external connections *only* for the following purposes and *only* when explicitly configured by the user:

- Download extension and theme indexes from automatically updated indexes
- Download required packages and repositories from GitHub during installation/upgrade
- Download installed/enabled extensions
- Download models from CivitAI and/or Huggingface when instructed by the user
- Submit benchmark info upon user interaction

@@ -1,6 +1,6 @@
# (Generic) EfficientNets for PyTorch

A 'generic' implementation of EfficientNet, MixNet, MobileNetV3, etc. that covers most of the compute/parameter efficient architectures derived from the MobileNet V1/V2 block sequence, including those found via automated neural architecture search.

All models are implemented by the GenEfficientNet or MobileNetV3 classes, with string-based architecture definitions to configure the block layouts (idea from [here](https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet/mnasnet_models.py))

@@ -20,7 +20,7 @@ All models are implemented by GenEfficientNet or MobileNetV3 classes, with strin
* 4.5M param MobileNet-V2 110d @ 75%
* 6.1M param MobileNet-V2 140 @ 76.5%
* 5.8M param MobileNet-V2 120d @ 77.3%

### March 23, 2020
* Add EfficientNet-Lite models w/ weights ported from [Tensorflow TPU](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite)
* Add PyTorch trained MobileNet-V3 Large weights with 75.77% top-1
@@ -39,7 +39,7 @@ All models are implemented by GenEfficientNet or MobileNetV3 classes, with strin

### Nov 22, 2019
* New top-1 high! Ported official TF EfficientNet AdvProp (https://arxiv.org/abs/1911.09665) weights and B8 model spec. Created a new set of `ap` models since they use a different preprocessing (Inception mean/std) from the original EfficientNet base/AA/RA weights.

### Nov 15, 2019
* Ported official TF MobileNet-V3 float32 large/small/minimalistic weights
* Modifications to the MobileNet-V3 model and components to support some additional config needed for differences between TF MobileNet-V3 and mine
@@ -50,7 +50,7 @@ All models are implemented by GenEfficientNet or MobileNetV3 classes, with strin

* Add JIT optimized mem-efficient Swish/Mish autograd.fn in addition to the memory-efficient autograd.fn
* Activation factory to select the best version of an activation by name or override one globally
* Add pretrained checkpoint load helper that handles input conv and classifier changes

### Oct 27, 2019
* Add CondConv EfficientNet variants ported from https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/condconv
* Add RandAug weights for TF EfficientNet B5 and B7 from https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
@@ -75,8 +75,8 @@ Implemented models include:
* MobileNet-V3 (https://arxiv.org/abs/1905.02244)
* FBNet-C (https://arxiv.org/abs/1812.03443)
* Single-Path NAS (https://arxiv.org/abs/1904.02877)

I originally implemented and trained some of these models with code [here](https://github.com/rwightman/pytorch-image-models); this repository contains just the GenEfficientNet models, validation, and associated ONNX/Caffe2 export code.

## Pretrained

@@ -117,7 +117,7 @@ More pretrained models to come...

The weights ported from Tensorflow checkpoints for the EfficientNet models pretty much match the accuracy in Tensorflow once a SAME convolution padding equivalent is added and the same crop factors, image scaling, etc. (see table) are used via cmd line args.

**IMPORTANT:**
* Tensorflow ported weights for EfficientNet AdvProp (AP), EfficientNet EdgeTPU, EfficientNet-CondConv, EfficientNet-Lite, and MobileNet-V3 models use Inception style (0.5, 0.5, 0.5) for mean and std.
* Enabling the Tensorflow preprocessing pipeline with `--tf-preprocessing` at validation time will improve scores by 0.1-0.5%, very close to the original TF impl.

@@ -130,7 +130,7 @@ To run validation w/ TF preprocessing for tf_efficientnet_b5:

To run validation for a model with Inception preprocessing, i.e. EfficientNet-B8 AdvProp:

`python validate.py /path/to/imagenet/validation/ --model tf_efficientnet_b8_ap -b 48 --num-gpu 2 --img-size 672 --crop-pct 0.954 --mean 0.5 --std 0.5`

|Model | Prec@1 (Err) | Prec@5 (Err) | Param # | Image Scaling | Image Size | Crop |
|---|---|---|---|---|---|---|
| tf_efficientnet_l2_ns *tfp | 88.352 (11.648) | 98.652 (1.348) | 480 | bicubic | 800 | N/A |
| tf_efficientnet_l2_ns | TBD | TBD | 480 | bicubic | 800 | 0.961 |
@@ -308,7 +308,7 @@ Scripts are included to

As an example, to export the MobileNet-V3 pretrained model and then run an Imagenet validation:

```
python onnx_export.py --model mobilenetv3_large_100 ./mobilenetv3_100.onnx
python onnx_validate.py /imagenet/validation/ --onnx-input ./mobilenetv3_100.onnx
```

These scripts were tested to be working as of PyTorch 1.6 and ONNX 1.7 w/ ONNX runtime 1.4. Caffe2 compatible
@@ -2,24 +2,24 @@

This repository contains code to compute depth from a single image. It accompanies our [paper](https://arxiv.org/abs/1907.01341v3):

> Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
> René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun

and our [preprint](https://arxiv.org/abs/2103.13413):

> Vision Transformers for Dense Prediction
> René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

MiDaS was trained on up to 12 datasets (ReDWeb, DIML, Movies, MegaDepth, WSVD, TartanAir, HRWSI, ApolloScape, BlendedMVS, IRS, KITTI, NYU Depth V2) with multi-objective optimization.
The original model that was trained on 5 datasets (`MIX 5` in the paper) can be found [here](https://github.com/isl-org/MiDaS/releases/tag/v2).
The figure below shows an overview of the different MiDaS models; the bubble size scales with the number of parameters.



### Setup

1) Pick one or more models and download the corresponding weights to the `weights` folder:

@@ -31,9 +31,9 @@ MiDaS 3.1

MiDaS 3.0: Legacy transformer models [dpt_large_384](https://github.com/isl-org/MiDaS/releases/download/v3/dpt_large_384.pt) and [dpt_hybrid_384](https://github.com/isl-org/MiDaS/releases/download/v3/dpt_hybrid_384.pt)

MiDaS 2.1: Legacy convolutional models [midas_v21_384](https://github.com/isl-org/MiDaS/releases/download/v2_1/midas_v21_384.pt) and [midas_v21_small_256](https://github.com/isl-org/MiDaS/releases/download/v2_1/midas_v21_small_256.pt)

1) Set up dependencies:

```shell
conda env create -f environment.yaml
```
@@ -53,7 +53,7 @@ For the OpenVINO model, install

```shell
pip install openvino
```

### Usage

1) Place one or more input images in the folder `input`.
@@ -68,19 +68,19 @@ pip install openvino

[dpt_swin2_tiny_256](#model_type), [dpt_swin_large_384](#model_type), [dpt_next_vit_large_384](#model_type),
[dpt_levit_224](#model_type), [dpt_large_384](#model_type), [dpt_hybrid_384](#model_type),
[midas_v21_384](#model_type), [midas_v21_small_256](#model_type), [openvino_midas_v21_small_256](#model_type).

3) The resulting depth maps are written to the `output` folder.

#### optional

1) By default, the inference resizes the height of input images to the size of a model to fit into the encoder. This
size is given by the numbers in the model names of the [accuracy table](#accuracy). Some models support not only a single
inference height but a range of different heights. Feel free to explore different heights by appending the extra
command line argument `--height`. Unsupported height values will throw an error. Note that using this argument may
decrease the model accuracy.
2) By default, the inference keeps the aspect ratio of input images when feeding them into the encoder if this is
supported by a model (all models except Swin, Swin2, LeViT). To resize to a square resolution,
disregarding the aspect ratio while preserving the height, use the command line argument `--square`.

#### via Camera

@@ -91,7 +91,7 @@ pip install openvino

```shell
python run.py --model_type <model_type> --side
```

The argument `--side` is optional and causes both the input RGB image and the output depth map to be shown
side-by-side for comparison.

#### via Docker
@@ -122,7 +122,7 @@ The pretrained model is also available on [PyTorch Hub](https://pytorch.org/hub/

See [README](https://github.com/isl-org/MiDaS/tree/master/tf) in the `tf` subdirectory.

Currently only supports MiDaS v2.1.

#### via Mobile (iOS / Android)
@@ -133,16 +133,16 @@ See [README](https://github.com/isl-org/MiDaS/tree/master/mobile) in the `mobile

See [README](https://github.com/isl-org/MiDaS/tree/master/ros) in the `ros` subdirectory.

Currently only supports MiDaS v2.1. DPT-based models to be added.

### Accuracy

We provide a **zero-shot error** $\epsilon_d$ which is evaluated for 6 different datasets
(see [paper](https://arxiv.org/abs/1907.01341v3)). **Lower error values are better**.
$\color{green}{\textsf{Overall model quality is represented by the improvement}}$ ([Imp.](#improvement)) with respect to
MiDaS 3.0 DPT<sub>L-384</sub>. The models are grouped by the height used for inference, whereas the square training resolution is given by
the numbers in the model names. The table also shows the **number of parameters** (in millions) and the
**frames per second** for inference at the training resolution (for GPU RTX 3090):

| MiDaS Model | DIW </br><sup>WHDR</sup> | Eth3d </br><sup>AbsRel</sup> | Sintel </br><sup>AbsRel</sup> | TUM </br><sup>δ1</sup> | KITTI </br><sup>δ1</sup> | NYUv2 </br><sup>δ1</sup> | $\color{green}{\textsf{Imp.}}$ </br><sup>%</sup> | Par.</br><sup>M</sup> | FPS</br><sup> </sup> |
@@ -171,16 +171,16 @@ the numbers in the model names. The table also shows the **number of parameters*

| [v3.1 LeViT<sub>224</sub>](https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_levit_224.pt)$\tiny{\square}$ | **0.1314** | **0.1206** | **0.3148** | **18.21** | **15.27*** | **8.64*** | $\color{green}{\textsf{-40}}$ | **51** | **73** |

* No zero-shot error, because models are also trained on KITTI and NYU Depth V2\
$\square$ Validation performed at **square resolution**, either because the transformer encoder backbone of a model
does not support non-square resolutions (Swin, Swin2, LeViT) or for comparison with these models. All other
validations keep the aspect ratio. A difference in resolution limits the comparability of the zero-shot error and the
improvement, because these quantities are averages over the pixels of an image and do not take into account the
advantage of more details due to a higher resolution.\
Best values per column and same validation height in bold

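The per-dataset columns above use standard monocular-depth error measures. A minimal sketch of two of them, AbsRel and a δ1-based error, with toy depth values rather than benchmark data; since the table reports errors where lower is better, the δ1 columns are assumed here to be the percentage of pixels *outside* the 1.25 inlier threshold:

```python
# Sketch of two depth-error metrics: mean absolute relative error (AbsRel)
# and a delta1-based error. delta1 is the standard inlier ratio, i.e. the
# fraction of pixels with max(pred/gt, gt/pred) < 1.25; the error form
# assumed here is 100 * (1 - delta1). Toy values, not from any dataset.

def abs_rel(pred, gt):
    """Mean absolute relative error; lower is better."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)

def delta1_error(pred, gt, threshold=1.25):
    """100 * (1 - inlier ratio) under the delta1 threshold; lower is better."""
    inliers = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return 100.0 * (1.0 - inliers / len(gt))

pred = [1.0, 2.0, 3.0, 4.0]  # toy predicted depths
gt   = [1.0, 2.2, 4.0, 4.1]  # toy ground-truth depths

print(round(abs_rel(pred, gt), 3))  # 0.091
print(delta1_error(pred, gt))       # 25.0 (one of four pixels is an outlier)
```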
#### Improvement

The improvement in the above table is defined as the relative zero-shot error with respect to MiDaS v3.0
DPT<sub>L-384</sub>, averaged over the datasets. So, if $\epsilon_d$ is the zero-shot error for dataset $d$, then
the $\color{green}{\textsf{improvement}}$ is given by $100(1-(1/6)\sum_d\epsilon_d/\epsilon_{d,\rm{DPT_{L-384}}})$%.

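The formula above can be sketched directly. The six per-dataset error values below are hypothetical placeholders, not numbers from the accuracy table:

```python
# Sketch of the improvement metric defined above:
# 100 * (1 - mean over datasets of error / baseline_error).
# The error lists are hypothetical placeholders, not table values.

def improvement(errors, baseline_errors):
    """Relative zero-shot improvement vs. the DPT_L-384 baseline, in percent."""
    assert len(errors) == len(baseline_errors)
    ratios = [e / b for e, b in zip(errors, baseline_errors)]
    return 100.0 * (1.0 - sum(ratios) / len(ratios))

model_errors    = [0.10, 0.10, 0.30, 9.0, 8.0, 4.0]   # hypothetical
baseline_errors = [0.12, 0.11, 0.33, 10.0, 8.5, 4.5]  # hypothetical DPT_L-384

print(round(improvement(model_errors, baseline_errors), 1))  # prints 10.3
```

A positive value means the model beats the baseline on average across the six datasets; the baseline itself scores exactly 0.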
||||
@@ -193,14 +193,14 @@ and v2.0 Large<sub>384</sub> respectively instead of v3.0 DPT<sub>L-384</sub>.

Zoom in for better visibility


### Speed on Camera Feed

Test configuration
- Windows 10
- 11th Gen Intel Core i7-1185G7 3.00GHz
- 16GB RAM
- Camera resolution 640x480
- openvino_midas_v21_small_256

Speed: 22 FPS

@@ -251,9 +251,9 @@ If you use a DPT-based model, please also cite:

### Acknowledgements

Our work builds on and uses code from [timm](https://github.com/rwightman/pytorch-image-models) and [Next-ViT](https://github.com/bytedance/Next-ViT).
We'd like to thank the authors for making these libraries available.

### License

MIT License