Abstract

We introduce a two-stream model for dynamic texture synthesis. Our model is based on pre-trained convolutional networks (ConvNets) that target two independent tasks: (i) object recognition, and (ii) optical flow prediction. Given an input dynamic texture, statistics of filter responses from the object recognition ConvNet encapsulate the per-frame appearance of the input texture, while statistics of filter responses from the optical flow ConvNet model its dynamics. To generate a novel texture, a noise input sequence is optimized to simultaneously match the feature statistics from each stream of the example texture. Inspired by recent work on image style transfer and enabled by the two-stream model, we also apply the synthesis approach to combine the texture appearance from one texture with the dynamics of another to generate entirely novel dynamic textures. We show that our approach generates novel, high quality samples that match both the framewise appearance and temporal evolution of input imagery.

Dynamic Texture Synthesis

We applied our dynamic texture synthesis process to a wide range of textures selected from the DynTex database as well as others collected in the wild. Here we provide synthesized results for over 50 different textures that encapsulate a range of phenomena, such as flowing water, waves, clouds, fire, rippling flags, waving plants, and schools of fish. Unless otherwise stated, both target and synthesized dynamic textures consist of 12 frames.
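The two-stream matching objective described above can be sketched as follows. This is a minimal numpy illustration, not the paper's exact implementation: the feature tensors stand in for ConvNet filter responses, and the Gram-matrix statistic and the relative weighting `beta` between the streams are illustrative assumptions.

```python
import numpy as np

def gram_matrix(features):
    # features: (channels, positions) filter responses from one layer.
    c, n = features.shape
    return features @ features.T / n

def two_stream_loss(app_feats, app_targets, dyn_feats, dyn_targets, beta=1.0):
    # Sum of normalized squared Gram-matrix differences over the layers of
    # the appearance stream plus (weighted by beta) the dynamics stream.
    # During synthesis, a noise sequence would be optimized to drive this
    # loss down; here we only evaluate it for given features.
    loss = 0.0
    for f, t in zip(app_feats, app_targets):
        d = gram_matrix(f) - gram_matrix(t)
        loss += np.sum(d ** 2) / d.size
    for f, t in zip(dyn_feats, dyn_targets):
        d = gram_matrix(f) - gram_matrix(t)
        loss += beta * np.sum(d ** 2) / d.size
    return loss
```

The loss is zero when the synthesized sequence's appearance and dynamics statistics both match the example's, and the example frames themselves are never matched pixelwise, which is what allows novel samples.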

The top row consists of the target dynamic textures while the bottom row consists of the synthesized dynamic textures. Scroll horizontally to view more.

Click on a thumbnail to play the dynamic texture.

Success cases

Inputs that satisfy the underlying assumption of a dynamic texture, i.e., that the appearance and/or dynamics are more-or-less spatially and temporally homogeneous, yield better synthesized results.


ants

birds

boiling_water_1

boiling_water_2

calm_water

calm_water_2

calm_water_4

calm_water_5

fireplace_1

fish

fountain_1

grass_1

ink

lava

sea_2

sky_clouds_1

smoke_1

smoke_2

smoke_plume_1

underwater_vegetation

water_1

water_2

water_3

water_4

water_5

Failure cases

Inputs that violate the underlying assumption of a dynamic texture may yield perceptually implausible synthesized results.


calm_water_3

candle_flame

candy_1

coral

cranberries

escalator

fireplace_2

flag

flag_2

flames

flushing_water

fountain_2

fur

grass_2

grass_3

plants

sea_1

shower_water_1

sky_clouds_2

smoke_3

snake_1

snake_2

snake_3

snake_4

snake_5

waterfall

Synthesis without the Dynamics Stream

To verify that generating multiple frames would not by itself induce dynamics consistent with the input, we generated frames starting from random noise using only the appearance statistics and the corresponding loss. As expected, this produced frames that were individually valid textures but exhibited no coherent dynamics.

The top row consists of the target dynamic textures while the bottom row consists of the synthesized textures.


fish

sea_2

Extended Dynamic Texture Synthesis

Dynamic textures generated by incremental synthesis. We start by synthesizing a dynamic texture (12 frames in our experiments), then take its final frame and use it to initialize the first frame of the next sequence to be synthesized. This process is repeated as many times as desired. Finally, the synthesized sequences are (manually) stitched together, resulting in an extended dynamic texture.
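The incremental procedure above can be sketched as a simple loop. The `synthesize_sequence` function here is a hypothetical stand-in for the optimization-based synthesizer: it just returns noise frames, with the first frame pinned to the seed (in the real method the seed only initializes the optimization), so only the chaining logic is meaningful.

```python
import numpy as np

def synthesize_sequence(first_frame=None, n_frames=12, shape=(64, 64, 3), rng=None):
    # Hypothetical stand-in for the optimizer: noise frames, with the
    # first frame optionally pinned to a seed frame.
    rng = rng if rng is not None else np.random.default_rng(0)
    frames = rng.random((n_frames, *shape))
    if first_frame is not None:
        frames[0] = first_frame
    return frames

def extended_synthesis(n_blocks=10, n_frames=12, shape=(64, 64, 3), seed=0):
    # Chain blocks: the last frame of each block seeds the first frame
    # of the next; the blocks are then concatenated ("stitched").
    rng = np.random.default_rng(seed)
    blocks, seed_frame = [], None
    for _ in range(n_blocks):
        seq = synthesize_sequence(first_frame=seed_frame, n_frames=n_frames,
                                  shape=shape, rng=rng)
        blocks.append(seq)
        seed_frame = seq[-1]
    return np.concatenate(blocks, axis=0)
```

Ten 12-frame blocks chained this way give the 120-frame sequences shown below.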

The top row consists of the target dynamic textures while the bottom row consists of the synthesized dynamic textures, which are 120 frames long.


calm_water_4

calm_water_5

candy_2

flag

sky_clouds_1

smoke_1

water_5

"Infinite Textures"

An interesting extension that we briefly explored is textures with no discernible temporal seam between the last and first frames. Played in a loop, these textures appear temporally endless. This is trivially achieved by adding an additional loss to the dynamics stream that ties the last frame to the first.
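The extra looping term can be sketched as follows. This is a minimal illustration: `dynamics_loss` here is a hypothetical placeholder (a plain squared difference) for the actual dynamics-stream loss computed on optical-flow features of a frame pair.

```python
import numpy as np

def dynamics_loss(frame_a, frame_b):
    # Placeholder for the dynamics-stream loss on a pair of consecutive
    # frames; the real model compares optical-flow feature statistics.
    return np.mean((frame_a - frame_b) ** 2)

def sequence_loss(frames, loop=False):
    # Sum the pairwise dynamics losses; with loop=True, an extra term
    # ties the last frame back to the first so playback can cycle
    # without a visible temporal seam.
    pairs = list(zip(frames[:-1], frames[1:]))
    if loop:
        pairs.append((frames[-1], frames[0]))
    return sum(dynamics_loss(a, b) for a, b in pairs)
```

With `loop=True`, the optimizer is penalized for any seam between the end and the start of the sequence, so the converged result plays as an endless loop.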

The top row consists of the target dynamic texture while the bottom row consists of the synthesized "infinite" dynamic texture.


smoke_plume_1

Dynamics Style Transfer

The underlying assumption of our model is that the appearance and dynamics of a dynamic texture can be factorized. As such, it should allow the dynamics of one texture to be transferred onto the appearance of another. This has been explored previously for artistic style transfer with static imagery. We accomplish it with our model by performing the same optimization as usual, but with the target Gram matrices for appearance and dynamics computed from different textures. In effect, the static texture is animated and hence brought to life. To the best of our knowledge, we are the first to demonstrate this form of style transfer.
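Concretely, only the construction of the optimization targets changes: the appearance Gram matrices are computed from one texture's frames and the dynamics Gram matrices from another's. A minimal sketch, with feature tensors standing in for ConvNet filter responses:

```python
import numpy as np

def gram(features):
    # Gram matrix of (channels, positions) filter responses.
    c, n = features.shape
    return features @ features.T / n

def style_transfer_targets(appearance_feats, dynamics_feats):
    # Appearance Gram targets come from one texture, dynamics Gram
    # targets from another; the synthesis optimization is otherwise
    # identical to ordinary dynamic texture synthesis.
    return ([gram(f) for f in appearance_feats],
            [gram(f) for f in dynamics_feats])
```

Feeding these mixed targets to the usual two-stream loss animates the appearance source with the motion of the dynamics source.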

The top row consists of the appearance target, the second row the dynamics target, and the third row the synthesized result.


calm_water_3_to_water_paint_1


fireplace_1_to_fire_paint


flag_2_to_flag_cropped_1


flag_2_to_flag_cropped_2


water_4_to_water_img


Portions of images can be animated.


waterfall_to_waterfall_paint


water_4_to_water_paint_2


sky_clouds_2_to_sky_clouds_paint


Citation

Matthew Tesfaldet, Marcus A. Brubaker, and Konstantinos G. Derpanis. Two-stream convolutional networks for dynamic texture synthesis. arXiv:1706.06982, 2017.

BibTeX format:

@article{tesfaldet2017,
  author = {Matthew Tesfaldet and Marcus A. Brubaker and Konstantinos G. Derpanis},
  title = {Two-Stream Convolutional Networks for Dynamic Texture Synthesis},
  journal = {arXiv:1706.06982},
  year = {2017}
}