Abstract

We introduce a two-stream model for dynamic texture synthesis. Our model is based on pre-trained convolutional networks (ConvNets) that target two independent tasks: (i) object recognition, and (ii) optical flow prediction. Given an input dynamic texture, statistics of filter responses from the object recognition ConvNet encapsulate the per-frame appearance of the input texture, while statistics of filter responses from the optical flow ConvNet model its dynamics. To generate a novel texture, a randomly initialized input sequence is optimized to match the feature statistics from each stream of an example texture. Inspired by recent work on image style transfer and enabled by the two-stream model, we also apply the synthesis approach to combine the texture appearance from one texture with the dynamics of another to generate entirely novel dynamic textures. We show that our approach generates novel, high-quality samples that match both the frame-wise appearance and temporal evolution of the input texture. Finally, we quantitatively evaluate our texture synthesis approach with a thorough user study.
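
For concreteness, the synthesis-by-optimization procedure can be sketched in a few lines of PyTorch. This is an illustrative sketch, not our implementation: TinyFeatures is a random-weight stand-in for the two pre-trained ConvNets, and the time-averaged Gram targets are a simplification.

import torch

def gram(feats):
    # (B, C, H, W) features -> (B, C, C) Gram matrices,
    # normalized by the number of spatial locations.
    b, c, h, w = feats.shape
    f = feats.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (h * w)

class TinyFeatures(torch.nn.Module):
    # Stand-in multi-layer feature extractor (random weights). The actual
    # model uses a pre-trained object recognition ConvNet (appearance) and
    # a pre-trained optical flow ConvNet (dynamics).
    def __init__(self, in_ch):
        super().__init__()
        self.c1 = torch.nn.Conv2d(in_ch, 16, 3, padding=1)
        self.c2 = torch.nn.Conv2d(16, 32, 3, stride=2, padding=1)

    def forward(self, x):
        h1 = torch.relu(self.c1(x))
        return [h1, torch.relu(self.c2(h1))]

appearance_net = TinyFeatures(3)  # operates on single frames
flow_net = TinyFeatures(6)        # operates on concatenated frame pairs

exemplar = torch.rand(12, 3, 128, 128)  # target dynamic texture (T, 3, H, W)

# Target statistics: Gram matrices of the exemplar, averaged over time.
with torch.no_grad():
    app_targets = [gram(f).mean(0, keepdim=True) for f in appearance_net(exemplar)]
    pairs = torch.cat([exemplar[:-1], exemplar[1:]], dim=1)
    dyn_targets = [gram(f).mean(0, keepdim=True) for f in flow_net(pairs)]

def two_stream_loss(frames):
    loss = frames.new_zeros(())
    # Appearance stream: match per-frame Gram statistics.
    for f, g in zip(appearance_net(frames), app_targets):
        loss = loss + ((gram(f) - g) ** 2).mean()
    # Dynamics stream: match Gram statistics over consecutive frame pairs.
    p = torch.cat([frames[:-1], frames[1:]], dim=1)
    for f, g in zip(flow_net(p), dyn_targets):
        loss = loss + ((gram(f) - g) ** 2).mean()
    return loss

# Synthesis: optimize a randomly initialized sequence to match both streams.
frames = torch.randn_like(exemplar).requires_grad_(True)
opt = torch.optim.LBFGS([frames], max_iter=200)

def closure():
    opt.zero_grad()
    loss = two_stream_loss(frames)
    loss.backward()
    return loss

opt.step(closure)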

Dynamic Texture Synthesis

We applied our dynamic texture synthesis process to a wide range of textures selected from the DynTex database, as well as others collected in the wild. Here we provide synthesized results for nearly 60 different textures that encapsulate a range of phenomena, such as flowing water, waves, clouds, fire, rippling flags, waving plants, and schools of fish. Unless otherwise stated, both target and synthesized dynamic textures consist of 12 frames.

The top row consists of the target dynamic textures while the bottom row consists of the synthesized dynamic textures.

Inputs that satisfy the underlying assumption of a dynamic texture, i.e., whose appearance and/or dynamics are more-or-less spatially and temporally homogeneous, allow for better synthesized results. Inputs that violate this assumption may yield perceptually implausible synthesized dynamic textures.


ants, bamboo, birds, boiling_water_1, boiling_water_2, calm_water, calm_water_2, calm_water_3, calm_water_4, calm_water_5, calm_water_6, candle_flame, candy_1, candy_2, coral, cranberries, escalator, fireplace_1, fireplace_2, fish, flag, flag_2, flames, flushing_water, fountain_1, fountain_2, fur, grass_1, grass_2, grass_3, ink, lava, plants, sea_1, sea_2, sea_3, sea_4, shiny_circles, shower_water_1, sky_clouds_1, sky_clouds_2, smoke_1, smoke_2, smoke_3, smoke_plume_1, snake_1, snake_2, snake_3, snake_4, snake_5, tv_static, underwater_vegetation_1, water_1, water_2, water_3, water_4, water_5, waterfall, waterfall_2

Synthesis without the Dynamics Stream

To verify that generating multiple frames does not by itself induce dynamics consistent with the input, we synthesized frames starting from random noise using only the appearance statistics and the corresponding loss. As expected, this produced frames that were each valid textures, but with no coherent dynamics between them.
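
In terms of the sketch above, this ablation amounts to dropping the dynamics term (again illustrative, reusing gram, appearance_net, and app_targets from that sketch):

def appearance_only_loss(frames):
    # Per-frame appearance matching only. Nothing relates frames in time,
    # so each frame converges to an independent static texture sample.
    loss = frames.new_zeros(())
    for f, g in zip(appearance_net(frames), app_targets):
        loss = loss + ((gram(f) - g) ** 2).mean()
    return loss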

The top row consists of the target dynamic textures while the bottom row consists of the synthesized textures.


fish

Incremental Synthesis

Dynamic textures generated by incremental synthesis. Long sequences can be generated incrementally by splitting the sequence into subsequences and optimizing them sequentially. This is realized by initializing the first frame of a subsequence with the last frame of the previous subsequence and keeping it fixed throughout the optimization. The remaining frames of the subsequence are initialized randomly and optimized as usual.
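
A minimal sketch of this loop, where optimize(free, fixed_first) is a hypothetical helper that runs the usual two-stream optimization over the concatenation of the fixed seam frame and the free frames (only the free frames receive gradients):

import torch

def incremental_synthesis(optimize, n_subseq, sub_len, shape=(3, 128, 128)):
    # First subsequence: all frames are free.
    video = torch.randn(sub_len, *shape, requires_grad=True)
    optimize(video, None)
    video = video.detach()
    for _ in range(n_subseq - 1):
        seam = video[-1:]  # last frame of the previous subsequence, kept fixed
        free = torch.randn(sub_len - 1, *shape, requires_grad=True)
        optimize(free, seam)
        video = torch.cat([video, free.detach()])
    # Total length: sub_len + (n_subseq - 1) * (sub_len - 1) frames.
    return video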

The top row consists of the target dynamic textures while the bottom row consists of the synthesized dynamic textures.


calm_water_4, calm_water_5, fish, flag, sky_clouds_1, smoke_1, water_5

Temporally-endless Textures

An interesting extension that we briefly explored is textures with no discernible temporal seam between the last and first frames. Played in a loop, these textures appear temporally endless. This is trivially achieved by adding an additional loss to the dynamics stream that ties the last frame to the first.
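
Concretely, the extra term treats the (last, first) pair as one more consecutive frame pair in the dynamics stream (a sketch reusing gram, flow_net, and dyn_targets from the first code block):

import torch

def loop_closure_loss(frames):
    # Dynamics term on the wrap-around (last, first) pair so that the flow
    # statistics also match across the loop point.
    wrap = torch.cat([frames[-1:], frames[:1]], dim=1)
    loss = frames.new_zeros(())
    for f, g in zip(flow_net(wrap), dyn_targets):
        loss = loss + ((gram(f) - g) ** 2).mean()
    return loss

# Total objective: two_stream_loss(frames) + loop_closure_loss(frames)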

The top row consists of the target dynamic texture while the bottom row consists of the synthesized "infinite" dynamic texture.


smoke_plume_1

Dynamics Style Transfer

The underlying assumption of our model is that the appearance and dynamics of a dynamic texture can be factorized. As such, it should allow for the transfer of the dynamics of one texture onto the appearance of another. This has been explored previously for artistic style transfer with static imagery. We accomplish this with our model by performing the same optimization as usual, but with the target Gram matrices for appearance and dynamics computed from different textures. In effect, the static texture is animated and hence brought to life. To the best of our knowledge, we are the first to demonstrate this form of style transfer.
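
In code, the only change from the first sketch is where the target statistics come from (illustrative; appearance_tex and dynamics_tex stand in for two different inputs):

import torch

appearance_tex = torch.rand(1, 3, 128, 128)  # e.g., a static painting or photo
dynamics_tex = torch.rand(12, 3, 128, 128)   # video supplying the motion

with torch.no_grad():
    # Appearance targets from one input, dynamics targets from the other.
    app_targets = [gram(f).mean(0, keepdim=True)
                   for f in appearance_net(appearance_tex)]
    pairs = torch.cat([dynamics_tex[:-1], dynamics_tex[1:]], dim=1)
    dyn_targets = [gram(f).mean(0, keepdim=True) for f in flow_net(pairs)]

# two_stream_loss and the optimization loop are unchanged.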

The top row consists of the appearance target, the second row the dynamics target, and the third row the synthesized result.


fireplace_1_to_fire_paint, flag_2_to_flag_cropped_1, flag_2_to_flag_cropped_2, water_4_to_water_img

Portions of images can be animated.


waterfall_to_waterfall_paint, water_4_to_water_paint

Citation

Mattie Tesfaldet, Marcus A. Brubaker, and Konstantinos G. Derpanis. Two-stream convolutional networks for dynamic texture synthesis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

BibTeX format:

@inproceedings{tesfaldet2018,
  author = {Mattie Tesfaldet and Marcus A. Brubaker and Konstantinos G. Derpanis},
  title = {Two-Stream Convolutional Networks for Dynamic Texture Synthesis},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2018}
}