python demo.py
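- Model script migration: 100%
- Inference script migration: 40%
  - text
  - image
  - video preprocessing (pytorchvideo components)
  - audio preprocessing (torchaudio components)
- Cross-modal encoding precision comparison: 15%
  - text x image
  - text x audio
  - image x audio
- Performance comparison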
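A small helper like the one below could put numbers on the precision-comparison items by running it on embeddings exported from both frameworks. This is a minimal sketch. The metric choice (max/mean absolute error, max relative error, cosine similarity) and the name `compare_outputs` are assumptions and are not part of the project.

```python
import numpy as np


def compare_outputs(ref, out, eps=1e-12):
    """Summarize how closely a ported output `out` matches a reference `ref`.

    Both inputs are array-likes of the same shape, for example embeddings
    exported from the PyTorch and MindSpore runs (hypothetical workflow).
    """
    ref = np.asarray(ref, dtype=np.float64)
    out = np.asarray(out, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {out.shape}")
    diff = np.abs(ref - out)
    # Cosine similarity over the flattened tensors; 1.0 means same direction.
    cos = float(np.dot(ref.ravel(), out.ravel())
                / (np.linalg.norm(ref) * np.linalg.norm(out) + eps))
    return {
        "max_abs": float(diff.max()),
        "mean_abs": float(diff.mean()),
        "max_rel": float((diff / (np.abs(ref) + eps)).max()),
        "cosine": cos,
    }
```

For example, `compare_outputs(np.load("torch_text.npy"), np.load("ms_text.npy"))`. Both file names are hypothetical.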
(mindcv.models.vit)
1) Added DropPath (from mindcv.models.vit)
1) Replaced torch.linspace with np.linspace (should a MindSpore op be used instead?)
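The drop-path rates are plain Python floats that are used while the blocks are built, so a host-side NumPy computation should be enough and no MindSpore op is needed. The sketch below assumes either a progressive schedule (linear from 0 to the rate) or a uniform one. `drop_path_rates` is a hypothetical helper name. np.linspace computes in float64, whereas torch.linspace defaults to float32, so the last digits may differ slightly.

```python
import numpy as np


def drop_path_rates(drop_path_rate, num_blocks, drop_path_type="progressive"):
    """Per-block stochastic-depth rates computed on the host with NumPy."""
    if drop_path_type == "progressive":
        # Linearly increasing from 0 to drop_path_rate (endpoint included).
        return [float(x) for x in np.linspace(0.0, drop_path_rate, num_blocks)]
    if drop_path_type == "uniform":
        # Same rate for every block.
        return [float(drop_path_rate)] * num_blocks
    raise ValueError(f"unknown drop_path_type: {drop_path_type}")
```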
_init_weights
1) Only the weight_init_style=pytorch branch was migrated
2) Added trunc_normal_ and constant_ (from mindone.models.dit)
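The following NumPy reference can be used to check the migrated initializers. This sketch draws a truncated normal by redrawing any sample that falls outside the bounds. That is a substitute technique: timm's `trunc_normal_` uses inverse-CDF sampling, and the mindone version may differ. As a result, only the distribution should match, not the exact values. As in timm, `a` and `b` are absolute bounds rather than multiples of `std`. The names `trunc_normal` and `constant` are hypothetical.

```python
import numpy as np


def trunc_normal(shape, mean=0.0, std=1.0, a=-2.0, b=2.0, rng=None):
    """Sample N(mean, std^2) restricted to [a, b] by redrawing rejected values.

    a and b are absolute bounds, not multiples of std.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = rng.normal(mean, std, size=shape)
    bad = (out < a) | (out > b)
    while bad.any():
        # Redraw only the out-of-range entries until all lie inside [a, b].
        out[bad] = rng.normal(mean, std, size=int(bad.sum()))
        bad = (out < a) | (out > b)
    return out.astype(np.float32)


def constant(shape, val):
    """Constant fill, the counterpart of a constant_ initializer."""
    return np.full(shape, val, dtype=np.float32)
```

The rejection loop finishes quickly when [a, b] covers most of the probability mass, which is the usual case (for example std=0.02 with a=-2, b=2). It would be slow for very narrow intervals.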
The postprocessing module norm needs to be migrated separately
adapted from mindone https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_xl/gm/modules/util.py
1) Whether to migrate this is still to be confirmed
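This sketch assumes that the `norm` postprocessor is an L2 normalization of each embedding along its last dimension, which is a common step before computing cross-modal similarity. Under that assumption, a NumPy reference for checking the migrated module could look like the following. `l2_normalize` is a hypothetical name.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """NumPy reference for an L2-normalization postprocessor.

    Mirrors torch.nn.functional.normalize(x, p=2, dim=axis, eps=eps):
    x / max(||x||_2, eps) along `axis`.
    """
    x = np.asarray(x, dtype=np.float32)
    norm = np.linalg.norm(x, ord=2, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)
```

The `eps` floor mirrors `torch.nn.functional.normalize`, which divides by `max(||x||, eps)`. Because of this floor, an all-zero vector stays zero instead of producing NaN.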
Adapted from the mindone DropPath module
To activate it, self.dropout.training must be True
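Below is a NumPy sketch of the stochastic-depth behaviour that the DropPath port should reproduce. During training, each sample of the residual branch is kept with probability `1 - drop_prob` and rescaled by `1 / (1 - drop_prob)`. Outside training the cell is the identity, which is why the training flag must be set for any dropping to happen. `drop_path` and its `training` argument are hypothetical stand-ins for the cell and its flag.

```python
import numpy as np


def drop_path(x, drop_prob=0.0, training=False, rng=None):
    """Stochastic depth: zero whole samples of a residual branch in training.

    Kept samples are rescaled by 1 / (1 - drop_prob) so the expectation
    of the output equals the input.
    """
    if drop_prob == 0.0 or not training:
        return x  # identity at inference time or when dropping is disabled
    keep_prob = 1.0 - drop_prob
    rng = np.random.default_rng() if rng is None else rng
    # One Bernoulli draw per sample, broadcast over all remaining dims.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = rng.random(mask_shape) < keep_prob
    return x * mask / keep_prob
```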
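How to replace the returned torch.FloatTensor is still to be confirmed
ops.interpolate(mode=bicubic) may introduce numerical error
The original implementation casts fp16 -> fp32 -> bf16; whether to migrate this is still to be confirmed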
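pos_embed uses Parameter(requires_grad=False) in place of register_buffer
Removed the no_grad decorator from init_parameters()
Replaced nn.init with mindone's normal_
Replaced torch.empty with ops.zeros, since the values are overwritten during init anyway
Replaced torch.triu with mint.triu
causal_masking uses Parameter(requires_grad=False) in place of register_buffer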
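The following NumPy reference for the causal mask can be used to check the `mint.triu` version element by element. It assumes the original fills a square matrix with `-inf` and keeps only the strict upper triangle (diagonal offset 1). The result is an additive mask that blocks attention to later tokens. The function name is illustrative.

```python
import numpy as np


def build_causal_attention_mask(context_length):
    """Additive causal mask: 0 on and below the diagonal, -inf above it.

    Equivalent to filling with -inf and then applying triu with diagonal=1:
    np.triu keeps entries from the first superdiagonal upward and zeroes the rest.
    """
    mask = np.full((context_length, context_length), -np.inf, dtype=np.float32)
    return np.triu(mask, k=1)
```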
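Replaced torch.empty with ops.zeros, since the values are overwritten during init anyway
Removed the no_grad decorator from init_parameters()
Replaced with ops.pad; confirm that the args are correct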
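One way to confirm the pad arguments is to compare the result against NumPy. `torch.nn.functional.pad` lists its padding pairs starting from the last dimension. Assuming `ops.pad` uses the same ordering (which is exactly what needs confirming), this sketch converts such a tuple into the per-dimension `pad_width` that `np.pad` expects and builds a reference result. Negative (cropping) pads are not handled. The helper names are hypothetical.

```python
import numpy as np


def torch_pad_to_np(pad, ndim):
    """Convert a torch.nn.functional.pad-style tuple into np.pad's pad_width.

    torch pairs start from the LAST dimension: (last_left, last_right,
    second_last_left, second_last_right, ...). Unlisted leading dims get (0, 0).
    """
    if len(pad) % 2 != 0 or len(pad) // 2 > ndim:
        raise ValueError("pad must have an even length covering at most ndim dims")
    pairs = [(pad[i], pad[i + 1]) for i in range(0, len(pad), 2)]
    pairs.reverse()  # now ordered from the first padded dim to the last dim
    return [(0, 0)] * (ndim - len(pairs)) + pairs


def reference_pad(x, pad, value=0.0):
    """Constant-pad `x` as a torch-style `pad` tuple would, using NumPy."""
    x = np.asarray(x)
    return np.pad(x, torch_pad_to_np(pad, x.ndim), mode="constant", constant_values=value)
```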
Replaced torch.empty with ops.zeros, since the values are overwritten during init anyway
Removed the no_grad decorator from init_parameters()
waveform2melspec
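Needs a replacement for torchaudio.compliance.kaldi.fbank
ms.dataset.audio.melscale_fbanks corresponds to a different torchaudio interface, torchaudio.functional.melscale_fbanks
The two interfaces differ somewhat, but a torchaudio issue has attempted to align them; still to be confirmed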
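To measure how far apart the two interfaces are, this sketch rebuilds both filterbank matrices in NumPy, based on a reading of torchaudio's source. `torchaudio.functional.melscale_fbanks` (with `norm=None`, `mel_scale="htk"`) places triangles that are linear in Hz over `n_freqs` bins, including the Nyquist bin. The Kaldi-compatible path instead places triangles that are linear in mel over `window_length_padded // 2` FFT bins (VTLN is ignored here). Both function names are hypothetical. The filterbank is only one stage: `kaldi.fbank` also frames, windows, pre-emphasizes and takes the log, so matching the matrix alone would not make the features identical.

```python
import numpy as np


def hz_to_mel_htk(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz_htk(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def hz_to_mel_kaldi(f):
    return 1127.0 * np.log(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def melscale_fbanks_htk(n_freqs, f_min, f_max, n_mels, sample_rate):
    """torchaudio-style filterbank: triangles linear in Hz, shape (n_freqs, n_mels)."""
    all_freqs = np.linspace(0, sample_rate // 2, n_freqs)  # includes the Nyquist bin
    m_pts = np.linspace(hz_to_mel_htk(f_min), hz_to_mel_htk(f_max), n_mels + 2)
    f_pts = mel_to_hz_htk(m_pts)  # filter edges and centers, back in Hz
    f_diff = f_pts[1:] - f_pts[:-1]
    slopes = f_pts[None, :] - all_freqs[:, None]
    rising = -slopes[:, :-2] / f_diff[:-1]
    falling = slopes[:, 2:] / f_diff[1:]
    return np.maximum(0.0, np.minimum(rising, falling))


def kaldi_mel_banks(num_bins, window_length_padded, sample_freq, low_freq, high_freq):
    """Kaldi-style filterbank (no VTLN): triangles linear in mel,
    shape (num_bins, window_length_padded // 2), Nyquist bin excluded."""
    if high_freq <= 0.0:
        high_freq += 0.5 * sample_freq  # non-positive means an offset from Nyquist
    mel_low, mel_high = hz_to_mel_kaldi(low_freq), hz_to_mel_kaldi(high_freq)
    delta = (mel_high - mel_low) / (num_bins + 1)
    left = mel_low + np.arange(num_bins)[:, None] * delta
    center, right = left + delta, left + 2.0 * delta
    fft_freqs = sample_freq / window_length_padded * np.arange(window_length_padded // 2)
    fft_mel = hz_to_mel_kaldi(fft_freqs)[None, :]
    rising = (fft_mel - left) / (center - left)
    falling = (right - fft_mel) / (right - center)
    return np.maximum(0.0, np.minimum(rising, falling))
```

Take illustrative 16 kHz settings: 400-sample frames padded to 512, 128 bins, and a 20 Hz low cutoff. Padding the Kaldi matrix with a zero Nyquist column and transposing it gives the same (257, 128) shape as `melscale_fbanks_htk(257, 20.0, 8000.0, 128, 16000)`, so the two matrices can be diffed directly.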
get_clip_timepoints
load_and_transform_vision_data
Needs replacements for two pytorchvideo APIs:
pytorchvideo.transforms.ShortSideScale
pytorchvideo.data.clip_sampling.ConstantClipsPerVideoSampler
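The sketch below shows what the two pytorchvideo pieces compute, as a reference while replacing them. `ShortSideScale` picks an output size whose shorter side equals `size`; the resize itself would then go through an interpolate op. `ConstantClipsPerVideoSampler`, with one augmentation per clip, spreads `clips_per_video` clip starts evenly over the video, and a `get_clip_timepoints`-style loop collects them until the last clip. This follows a reading of the pytorchvideo source. The helper names are hypothetical, and the behaviour should be checked against the installed version.

```python
import math
from fractions import Fraction


def short_side_scale_size(h, w, size):
    """(new_h, new_w) such that the shorter side equals `size`, aspect ratio
    kept (floor on the longer side)."""
    if w < h:
        return int(math.floor(h / w * size)), size
    return size, int(math.floor(w / h * size))


def constant_clip_timepoints(clip_duration, clips_per_video, video_duration):
    """(start, end) pairs for `clips_per_video` evenly spaced clips.

    Starts are spread uniformly over [0, video_duration - clip_duration],
    with one augmentation per clip. Fractions avoid float drift.
    """
    max_start = Fraction(max(video_duration - clip_duration, 0))
    step = max_start / max(clips_per_video - 1, 1)
    points, i = [], 0
    while True:
        start = step * i
        points.append((start, start + clip_duration))
        i += 1
        # Stop after the requested count, or once the next start would overrun.
        if i >= clips_per_video or step * i > max_start:
            return points
```

For a 10 s video with 2 s clips and 5 clips per video, the starts are 0, 2, 4, 6 and 8 s. A video shorter than one clip yields repeated clips that all start at 0.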
load_and_transform_thermal_data
load_and_transform_text
load_and_transform_audio_data
get_clip_timepoints
crop_boxes
uniform_crop
ops.interpolate(mode=bilinear) may introduce numerical error
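The following NumPy restatement of the crop-offset rules assumes the port keeps the original logic. The short side is center-cropped with a ceil-rounded offset, and `spatial_idx` 0/1/2 selects the start, center or end of the long side. The optional rescale step, where `ops.interpolate(mode=bilinear)` comes in, is left out. The function names follow the notes, but the signatures here are simplified and hypothetical.

```python
import math
import numpy as np


def uniform_crop(images, size, spatial_idx):
    """Square crop of side `size` from images shaped (..., H, W).

    spatial_idx 0/1/2 picks the start/center/end along the longer side; the
    shorter side is center-cropped. Returns (crop, x_offset, y_offset).
    """
    assert spatial_idx in (0, 1, 2)
    height, width = images.shape[-2], images.shape[-1]
    y_offset = int(math.ceil((height - size) / 2))
    x_offset = int(math.ceil((width - size) / 2))
    if height > width:
        if spatial_idx == 0:
            y_offset = 0
        elif spatial_idx == 2:
            y_offset = height - size
    else:
        if spatial_idx == 0:
            x_offset = 0
        elif spatial_idx == 2:
            x_offset = width - size
    crop = images[..., y_offset:y_offset + size, x_offset:x_offset + size]
    return crop, x_offset, y_offset


def crop_boxes(boxes, x_offset, y_offset):
    """Shift (x1, y1, x2, y2) boxes into the crop's coordinate frame."""
    out = np.array(boxes, dtype=np.float32, copy=True)
    out[:, [0, 2]] -= x_offset
    out[:, [1, 3]] -= y_offset
    return out
```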
SpatialCrop
load_and_transform_video_data
_create_modality_trunks
_create_modality_heads
_create_modality_postprocessors
nn.CellList does not support storing SequentialCell lists, so the preprocessor/head/postprocessor modules have been decoupled