A Dissertation on Deep Generative Models for Domain Translation [Spanish]

1 minute read

Over the last decade Deep Learning has revolutionized almost every ICT area, permeating companies and social spheres so quickly and deeply that it has enabled capabilities never seen before. Some of these are already commonplace, such as the controversial fake photorealistic portraits known as deepfakes.

To this effect, Deep Generative Models have experienced intense and steady improvement in their capabilities, including several paradigm shifts. Some of their variants enable impressive techniques for Multimodal Domain Translation (MDT), by which, for instance, an image can be generated from a simple textual caption, or a soundless video excerpt can be populated with a synthetic yet plausible audio track. State-of-the-art Conditional Generative Adversarial Networks (cGANs) offer clear advantages over Variational Autoencoders (VAEs) and deliver outstanding results, in a competitive race that Transformers have recently entered amid high expectations.

In this video we walk through various creative MDT applications and examples of image-to-image, text-to-sound, and video-to-sound translation. Along the way, multiple techniques for improving and stabilizing cGAN training are examined, such as conditioning augmentation, spectral normalization, adaptive data augmentation, gradient penalty, self-attention, class-conditional normalization, and auxiliary and conditional projection classifiers. A concluding visit to attentional architectures, which form the core of the Transformer paradigm, shows their unique capabilities as translators (beyond Natural Language Processing, e.g. GPT-3) and as generators in general.
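To give a flavour of one of the techniques mentioned above, the sketch below shows how spectral normalization, together with a projection-style conditional term, might look in a PyTorch cGAN discriminator. This is a minimal illustrative assumption on my part, not code from the talk; the class name, layer sizes, and image resolution are all hypothetical.

```python
# Minimal sketch (illustrative, not from the talk): spectral normalization
# and projection-style conditioning in a cGAN discriminator.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class ConditionalDiscriminator(nn.Module):
    def __init__(self, num_classes: int, img_channels: int = 3):
        super().__init__()
        # Spectral normalization constrains each layer's spectral norm,
        # bounding the Lipschitz constant and stabilizing adversarial training.
        self.features = nn.Sequential(
            spectral_norm(nn.Conv2d(img_channels, 64, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2),
            spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.real_fake = spectral_norm(nn.Linear(128, 1))
        # Projection-style conditioning: a class embedding projected onto
        # the image features (the "conditional projection" idea).
        self.embed = spectral_norm(nn.Embedding(num_classes, 128))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        h = self.features(x)                                   # (B, 128)
        score = self.real_fake(h)                               # (B, 1)
        projection = (self.embed(y) * h).sum(dim=1, keepdim=True)
        return score + projection

# Usage (hypothetical): logits = ConditionalDiscriminator(10)(images, labels)
```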

For non-Spanish speakers, the slides in English can be downloaded directly here.

Thanks for watching!