This assists my prediction of how this will ultimately unfold. I hold no delusions about the time frame— very little of this is going to be on your computer within five years. You can use DeepDream and DeepArt and various deep-learning voice synthesis programs, but it's all still very early in development. There will still be voice actors and animators in 2025. Those will still be fields you can get into and make a career out of.

Comic and manga creators also won't be replaced anytime soon. If anything, it might take a bit longer for them precisely because of the nature of cartooning. Neural networks today are fantastic at repainting a pre-existing image or using images they've seen before to create something new. But so far, they lack the ability to actually stylize an image. There's no way to exaggerate features like you'd see in a cartoon. We know networks understand anime eyes, but they don't seem able to create an actual anime character based on images they've seen— if you fed a computer 1,000 anime stills and then inputted your own portrait, it wouldn't give you huge eyes or unrealistically sharpened/cutened features— it'd just recolor your portrait to make it toon-shaded. Likewise, I can't make my friend look like a character from The Simpsons with any algorithm that currently exists. He'd just have crayon-yellow skin and a flesh-colored snout, but his skeletal and muscular structure wouldn't actually be altered to fit the Simpsons' distinctive style.
No network today can do that. It might be possible within a couple years to at least get a GAN to approximate it, but it won't be until the mid-2020s at the earliest that we'll see "filters" that could change my portrait into an actual cartoon. As of right now, making an algorithm "cartoonify" a person simply means adding vector graphics or cel-shading.
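To make that limitation concrete, here's a toy sketch (plain Python, standard library only, with made-up function names) of what today's "cel-shading" filters essentially amount to: quantizing smooth tones into flat bands. Notice that nothing about the image's geometry changes— it only recolors.

```python
def posterize(pixels, levels=4):
    """Collapse 0-255 grayscale values into `levels` flat tones.

    This is roughly what a cel-shading / "cartoonify" filter does:
    it recolors pixels into flat bands, but the shapes in the
    image (eye size, jaw line, proportions) are left untouched.
    """
    step = 256 / levels  # width of each tonal band
    out = []
    for row in pixels:
        new_row = []
        for p in row:
            band = min(int(p / step), levels - 1)  # which band this pixel falls in
            tone = int(band * 255 / (levels - 1))  # the band's flat output tone
            new_row.append(tone)
        out.append(new_row)
    return out

# A smooth gradient becomes two flat tones: "toon-shaded," not stylized.
print(posterize([[30, 90, 160, 220]], levels=2))  # [[0, 0, 255, 255]]
```

Actual stylization would have to move pixels around, not just remap their values— which is exactly what these filters don't do.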
Now, that wouldn't be a problem if you used text-to-image synthesis. You could cut out the middleman and go straight to generating new characters from scratch. And in 2018, I bet we might see the first inklings of this in a very basic way. In a lab, we'll get a comic created entirely by algorithm.
Input text describing a character— if I had to come up with something, I'd make it simple and just go with "round head with stick figure body".
Do the same thing for others. Describe the ways their limbs bend. If they have mouths, describe whether or not they're open. If there are speech bubbles, what do they look like and how big are they? Etc. etc.
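As a toy illustration of what that kind of text conditioning means— not a real GAN, just a standard-library Python sketch with a made-up keyword table— imagine description phrases selecting drawing primitives:

```python
# Hypothetical mapping from description phrases to ASCII "drawing primitives".
# A real text-to-image system would condition a generator network on an
# embedding of the text; this toy only shows the core idea of words in a
# description selecting which visual elements get produced.
PARTS = {
    "round head":        ["  O  "],
    "stick figure body": [" /|\\ ", " / \\ "],
}

def draw_character(description):
    """Emit ASCII art rows for every known phrase found in the description."""
    lines = []
    for phrase, rows in PARTS.items():
        if phrase in description:
            lines.extend(rows)
    return "\n".join(lines)

print(draw_character("round head with stick figure body"))
```

The real version replaces the lookup table with a learned model, but the interface is the same: text in, picture out.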
Perhaps you could be more daring and feed a network thousands of images from a pre-chosen art style, but I'm being conservative.
Right now, a neural network that can actually make narrative sense is a damn-near impossible thing to create. So if you want to achieve causality and progression in such a story, you'll still need a human to make sense of it. Thus, this comic will likely be organized by a human even if the images are entirely AI-generated.
What happens by 2019 and 2020 then?
Singularity! ...No, no, let's be real. It's not going to fundamentally improve by that much by 2020— AI will still lack narrative understanding. We're only just now getting AI that can understand sentences and paragraphs, so a whole narrative sequence is still way too much. But I can see people generating short cartoons. And when I say that, I don't mean video synthesis— that'll definitely happen too— but instead creating animations entirely through image synthesis.

It'll be painstaking work, but it's still far less work than actual animation. There'll be this new breed of animator who can't draw for shit but can write detailed descriptions into a GAN textbox over and over again, slightly changing the pose and posture in each image. The more minute stuff will require manual editing, but the larger strokes can be generated. Again, this will require a lot of manual effort, so even by the early 2020s, you won't be able to fuck about in your bedroom creating Pixar-quality movies just by dribbling misspelled words into a text box. It would be shocking if you could even put together something longer than 10 minutes.
So in essence, it's human creativity augmented with a new tool that greatly democratizes the medium you're working in, a tool that possesses a very, very fleeting amount of creativity in and of itself.
As for voices, you'll be able to generate near-perfect-sounding voices within a couple of years— most likely by next year. No more monotone Microsoft Sam or the classic Stephen Hawking voice. There'll be natural-sounding voices with natural intonations, inflections, and timbres.
You just likely won't be able to use it yourself. Oh sure, you could play around with it, but it'll be on GitHub if anywhere at all. A wide commercial release likely won't come for years. Siri and Cortana still sound pretty robotic. Oh sure, there are a few more idiosyncrasies to their speaking patterns, but you still know when you're listening to a real person and when you're listening to Siri. What's just over the horizon and fast approaching is a voice synthesizer that sounds so natural that, listening to it next to a real person, you wouldn't be able to tell the difference unless you were very highly trained and the program talked for more than a minute nonstop. Right now, we still need real vocal talent to provide all the sounds that neural networks divvy up and reorganize into words, but genuine voice synthesis— creating a human voice from nothing but altered sound waves and doing it so well that it sounds indistinguishable from a real person— is likely not far behind. There isn't a world of difference between generating an artificial human voice and generating an artificial instrumental tone. Being able to get a TTS voice to say the same word in different tones will be a gamechanger in and of itself, as would getting it to understand emotional cues.
But again, in the early 2020s, such technology likely won't be in the hands of the common indie creator. Even TTS programs today cost a fair amount of money, and none of them sound natural.
And even if they did sound natural, there's something else to consider. Have you ever used a TTS program and, in the middle of listening to it read to you, heard it lurch past a line break and keep speaking as if there were no pause? Or maybe it didn't understand that you don't say 'dot dot dot' when there's an ellipsis? That's still something that could pose a problem without manual editing. I don't see AI overcoming that within three years.
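That kind of hiccup is really a text-normalization problem, and the stopgap is the manual editing just mentioned, done programmatically. A minimal sketch in Python (the `<pause>` token here is a made-up convention; real engines each have their own markup for this sort of thing):

```python
import re

def normalize_for_tts(text, pause_token=" <pause> "):
    """Pre-process text before handing it to a (hypothetical) TTS engine,
    turning punctuation the engine would either read aloud literally
    ("dot dot dot") or blow straight past (paragraph breaks) into
    explicit pause markers instead."""
    text = re.sub(r"\.{3,}|…", pause_token, text)  # ellipses -> pause
    text = re.sub(r"\n\s*\n", pause_token, text)   # paragraph breaks -> pause
    text = re.sub(r"\s+", " ", text).strip()       # tidy leftover whitespace
    return text

print(normalize_for_tts("Wait... what?\n\nNo way."))  # Wait <pause> what? <pause> No way.
```

Until the models learn this themselves, a preprocessing pass like this is about the best you can do.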
Translate these advances from voice into music and you have the same benefits and issues.
The general point being made here is that we are so close to a new age in entertainment and media in general that we're already teasing our fingers across its nose. Like, we're stupid close. And we're technically already within the space of it as we've seen with DeepDream and DeepArt and image colorizers (/r/ColorizedHistory being one of the finest subreddits out there).
But if you want to know the day when you could go to Amazon and buy a disc that holds a "cartoon generator" where you could basically recreate the entirety of, say, Avatar: The Last Airbender without losing a single aspect of the show's design, then I'm still forced to say "definitely not the 2020s. Possibly not even the 2030s (but I won't be so bold as to say it's impossible before then)."
In the 2020s, a person like me— stupendously bad drawing skills and a seeming inability to grasp depth— will be able to use AI to generate very high quality art and even some animation. I could use it to master Photoshop, getting the network to generate just about any image I want and make it look real rather than mostly real with obvious tampering and fake elements that are half-assedly covered up. I could use it to perfectly copy my mother's signature if I ever needed it for homework (just to use a crazy example). I could use it to add new voiceovers to existing properties without hiring actors. But the more complex stuff— creating full animation, creating high-end video games, creating long videos, creating high-quality novels and novellas— is beyond me if I'm not willing to put in the effort.
The AI I can use will still allow me to create these things, but unlike the lower-hanging fruit, they'll require genuine effort on my part.
For example: I can use AI to create character and item designs in video games, perhaps even drawing up concept art and backgrounds and perhaps even usable assets. It could create and animate pixel art. It could even generate the music. But the process of actually creating said video game— coding it, putting all the pieces together, giving it narrative— is all on me. I can see there being AI that can partially code some aspects of a video game, perhaps even streamlining the process. And towards the end of the decade, there may even be a sort of "autocomplete" for coding. And maybe even English-to-Code translation. As long as I write what needs to happen and what needs to work, the AI could translate that into game code. Yet it's still on me to create the thing itself. I tried learning to code twice about half a decade ago with the intention of trying my hand at game design, and I failed both times because I just couldn't get into it, even though I did understand it after a while. By the end of next decade, I could probably bring those old ideas to life in some limited form.
I could go on and on about this subject, so I will.
I'm not a comp-sci major (as just mentioned). But if I had to make an uneducated guess, I'd say that by 2027 we'll start seeing major disruption in the entertainment industry. Mostly in the fields of comics/manga, modeling, graphic design, and content/business writing— the easiest stuff to automate, since they involve static images or mostly fact-based writing. And the keyword there is "start", because it's not like every mangaka in Japan is going to be on the dole, or every cover model eating cheap ramen at a homeless shelter, glaring at the completely computer-generated physical gods and goddesses on the covers of magazines, come January 1st, 2027. It'll still take a lot of time, and plenty of these types will still get by purely on human stubbornness, tradition, and an increasing demand for the authentic.
Voice synthesis will likely be ironing out the few tiny imperfections that still exist, and the only real drawback will remain emotional variability. It's actually very hard to express emotion through writing— which is why so many stories lean on adverbs and overly flowery or overly simple emotional states— so getting a computer to understand what emotion to express, when to express it, and how to express it will be extremely difficult. Human emotional coaches might be needed for quite a while, since a subjective desire to get things right will likely mean multiple run-throughs rather than a simple post-and-done situation. But I doubt that's a career you should be looking into— even setting aside that AI will eventually figure out the emotional value of a scene (probably in the post-AGI days), it's likely going to devolve into one of two situations:
1: the mainstream works, where emotions are basically given pre-sets with tiny variations to satisfy the largest possible audience.
2: the auteur, where the creators want everything to fit their vision perfectly, even if it might be seen as goofy, unrealistic, inhuman, or chewing the scenery by the hoi polloi.
The in-between— where finding the right emotion is something that can be figured out by salaried or commissioned experts— probably won't be too common.
That being said, the big studios will likely have sounded the alarm on this use of AI to enhance/alter entertainment, but not in the way some might think. They're out for money, so anything that reduces cost while increasing profits is welcome— in other words, that "alarm" is more like a celebratory airhorn because now movie and gaming studios can spend as little as possible creating a product, leaving almost all of the budget on advertising.
Comic artists might still be going strong— people have a natural affinity towards what's canon, after all— but their years or decades of hard work refining their craft will have been devalued, since computers can perfectly match their art styles. I can see some artists embracing this and the inevitable explosion of fanon, but I can also see just as many artists— if not more— threatening to take legal action against those using their style via neural networks, or maybe even taking it up against the creators of these neural networks. People will try copyrighting styles rather than just IP (perhaps they'll make styles a part of their IP). Which isn't going to fare well against, say, blockchain-based neural networks that simply can't be stopped, or creators in other countries who don't care about copyright violations.
These days, fan works tend to be of variable quality because of both artist skills and writing skills— you could be a fantastic artist who completely nails the style of Jack Kirby or Akira Toriyama, but if your writing skills are no better than the average 14-year-old fanfiction writer who just discovered nu metal and swearing or that blood exists, people won't be coming back to you. Likewise, you could have really amazing writing skills, perhaps on par with David Foster Wallace or Vladimir Nabokov, but if your drawings look like mine, people won't subject themselves to your visual torture.
Heaven help ye if you venture into the magical and masochistic world of fangames. I've seen some shit. Like Vietnam-tier shit. So I know that this sort of AI can help people out. But in a world where everyone who wants to can create their own media franchises, you can understand that it could get overwhelming after a while. So overwhelming that it could spur many to just not bother at all. I've always wanted to create comics, but I think it's been beaten into your head by now that I can't draw. More than that, I've always wanted to be behind a TV show, a video game, or even a movie. Again, that's just not happening. Now, I could write some fantastic stories that are eventually adapted into such things, but that's not what I'm talking about. So for me, waiting for the world to change is the only way. And it'll start with the easiest of the lot, which is comics.
This, I feel, will be a reality by 2029. Much sooner than many are comfortable with accepting.
I'm focusing so much on comic artists because it's the first thing that came to mind and because I wanted to focus on pure entertainment in regards to what will be possible in the very near future with media synthesis technology. We're not likely to generate whole shows, movies, and triple-A games with the tech anytime soon. I'm well aware of the potential to use media synthesis to craft false realities, fake news, and untrue videos— that's gonna be a post for another day.