I know lots of people are working on different parts of AI, but let's look at the ones below: they are just a few AIs, but they pack a lot of power. As you may know, I have tried Jukebox and GPT-3, which feel very close to human-level audio completion (ex. techno) and text completion: you can give them any context and they will react and predict what comes next like we would. I have not tried NUWA yet, which does text-to-video and more. Below I try multi-modal AI, and it too is close to human level. It is amazing that these AIs train on a ton of data, like what a human adult brain has "seen" in its lifetime, and they are using multiple senses! They also use things like sparse attention, Byte Pair Encoding, relational embeddings, and more. DeepMind is also looking into a method that looks things up in a corpus to copy matches, like a painter who keeps looking at the model as a guide to make sure they paint the truth onto the canvas. We might get AGI by 2029!
Note: Download any image below if you want to see its full resolution. Most display close to full size, but any image that has ex. 16 images stuck together left-to-right is badly shrunken, so do download those if you are interested in them.
Note: the small GLIDE below, made by OpenAI, was only trained on roughly 67-147 million text-image pairs, not 250M like the real GLIDE, and it has about 10x fewer parameters, only 300 million.
Here is one with no text prompt; all I fed in was half an image:
https://ibb.co/dQHhc0F
Using text prompts and choosing which completion I liked, I made this by stitching the completions together (it could only be fed a square image, but it still came out good!):
original image
https://ibb.co/88XLykd
elongated (scroll down page)
https://ibb.co/Rz9L03X
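The stitching idea can be sketched roughly like this. It is only a guess at the workflow, not the exact steps used for the image above: it assumes the model only accepts a square, so each step feeds it the rightmost half of the picture as known context and keeps the newly filled right half. `fake_complete` is a hypothetical stand-in for the real model (it just mirrors the left half), and `elongate` is a made-up helper name.

```python
import numpy as np

def fake_complete(square):
    """Hypothetical stand-in for the model: fills the right half by mirroring the left half."""
    out = square.copy()
    half = square.shape[1] // 2
    out[:, half:] = square[:, :half][:, ::-1]
    return out

def elongate(image, steps, complete=fake_complete):
    """Grow a square (H, H, C) image to the right, half a square per step."""
    half = image.shape[0] // 2
    canvas = image.copy()
    for _ in range(steps):
        window = np.zeros_like(image)
        window[:, :half] = canvas[:, -half:]   # known context goes on the left
        filled = complete(window)              # the "model" fills in the right half
        canvas = np.concatenate([canvas, filled[:, half:]], axis=1)
    return canvas

start = np.random.rand(64, 64, 3).astype(np.float32)
wide = elongate(start, steps=2)
print(wide.shape)  # (64, 128, 3)
```

With a real model you would swap `fake_complete` for the in-painting call and pick the completion you like at each step before moving on.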
More:
"tiger and tree" + https://ibb.co/txsWY9h
= https://ibb.co/SnqWYr4
"tigers and forest" + https://ibb.co/XLGbHdw
= https://ibb.co/9GZ2s6p
"tigers in river and forest" + above
= https://ibb.co/zGw6kQY
"circuit board" + https://ibb.co/P6vnpwK
= https://ibb.co/61ySX7H
"wildlife giraffe" + https://ibb.co/d4C3cH1
= https://ibb.co/zXSTF3N
"bathroom" + https://ibb.co/KzGqtFz
= https://ibb.co/9H1YqWz
"laboratory machine" + https://ibb.co/cTyXzTG
= https://ibb.co/6NjsJDK
"pikachu" + image
= https://ibb.co/3zJgWPw
"humanoid robot body android" + https://ibb.co/XWbN42K
= https://ibb.co/pQWZ6Vd
"bedroom" + https://ibb.co/41y0Q4q
= https://ibb.co/2Y0wSPd
"sci fi alien laboratory" + https://ibb.co/7JnH6wB
= https://ibb.co/kBtDjQc
"factory pipes lava" + https://ibb.co/88ZqdX9
= https://ibb.co/B2X1bn3
"factory pipes lava" + https://ibb.co/hcxmHN0
= https://ibb.co/wwSxtVM
"toy store aisle" + https://ibb.co/h9PdRQQ
= https://ibb.co/DwGz4zx
"fancy complex detailed royal wall gold gold gold gold"
https://ibb.co/BGGT9Zx
(:p hehe) "gold gates on clouds shining laboratory"
https://ibb.co/qjdcPcR
"gold dragons"
https://ibb.co/L5qkmFS
It generates the rest of an image based on the upper half, or the left side, or whatever surrounds the hole you made (in-painting), and based on the text prompt provided. You can also use only a text prompt, or only an image prompt.
The use cases are many: you can generate the rest of an artwork, a diagram, a blueprint, or a short animation (4 pages stuck together as 1 image), or figure out what a criminal or a loved one looks like.
You can tell it to generate a penguin's head for the missing top, but with a cigar in its mouth, lol, and so on. You can also give it just a text request and it'll pop out a matching image.
With these AIs you can easily get more of your favorite content, and if it had a slider, you could easily elongate the image or song and backtrack on whatever you didn't like (choosing which completion to keep, ex. for the next 5 seconds).
GLIDE also works with no text prompt. It does fine, just maybe 2x worse:
--no text prompts--
https://ibb.co/TqPC18x
https://ibb.co/XynhTWS
https://ibb.co/ScFLrpk
https://ibb.co/s9jhrvb
https://ibb.co/Bcr3WXr
https://ibb.co/chJBkTJ
https://ibb.co/PFJkKFw
https://ibb.co/GF2HwXP
To use GLIDE, search Google for "github glide openai". I use it in Kaggle, as it's faster than Colab for sure. You must make an account, verify your phone number, and then open the notebook; only then can you see the settings panel on the right side, and in there you need to turn on GPU and internet. Upload your images with the Upload button at the top right, and then in the part of the code that loads the image, which says ex. grass.png, put your own path instead. For example, I have:
# Source image we are inpainting
source_image_256 = read_image('../input/123456/tiger2.png', size=256)
source_image_64 = read_image('../input/123456/tiger2.png', size=64)
To control the mask, change the 40: thingy to ex. 30 or 44. To control the mask sideways, add another slice at the end, ex. [:, :, :30, :30] or something like that (I may have the exact form wrong, but you just add one more to the end). Apparently you can add more than 1 mask (grey box) by repeating the line, ex:
mask[.....]
mask[.....]
mask[.....]
.....
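Here is a small sketch of how that kind of mask slicing behaves, written with NumPy so it runs on its own. It assumes the mask is laid out as [batch, channel, height, width] with 1 meaning "keep this pixel" and 0 meaning "let the model fill it in"; check the notebook you are using, since that convention is an assumption here. `make_mask` is a hypothetical helper, not something from the GLIDE code.

```python
import numpy as np

def make_mask(size=64, holes=()):
    """Return a (1, 1, size, size) mask: 1 = keep the pixel, 0 = hole to fill in."""
    mask = np.ones((1, 1, size, size), dtype=np.float32)
    for rows, cols in holes:
        mask[:, :, rows, cols] = 0.0   # same idea as mask[:, :, 40:] = 0
    return mask

# Hide everything from row 40 down (the bottom part of a 64x64 image).
bottom = make_mask(holes=[(slice(40, None), slice(None))])

# Hide only a box: rows 10-29 and columns 10-29.
box = make_mask(holes=[(slice(10, 30), slice(10, 30))])

# Two separate holes (two grey boxes) in one mask.
two = make_mask(holes=[(slice(0, 10), slice(0, 10)),
                       (slice(50, 64), slice(50, 64))])

print(int(bottom.sum()), int(box.sum()), int(two.sum()))
```

The third index picks rows (up and down) and the fourth picks columns (sideways), which is why adding one more slice at the end moves the mask sideways.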
Batch size sets the number of images to generate.
Once it's done, click the console to get the image and right-click it to save it.
Here are mine for the open-source minDALL-E (this one allows no image prompt, so just text).
minDALL-E was only trained on 14 million text-image pairs; OpenAI's DALL-E was trained on 250M. And the model is only about 1.3 billion parameters, ~10x smaller. This means we almost certainly HAVE IT! Anyway, look how good these are compared to the ones in OpenAI.com's post!!
"a white robot standing on a red carpet, in a white room. the robot is glowing. an orange robotic arm near the robot is injecting the robot's brain with red fuel rods. a robot arm is placing red rods into the robot brain."
https://ibb.co/0VL8Rvx
"3 pikachu standng on red blocks lined up on the road under the sun, holding umbrellas, surrounded by electric towers"
https://ibb.co/xDQT3f6
"box cover art for the video game mario adventures 15. mario is jumping into a tall black pipe next to a system of pipes. the game case is red."
https://ibb.co/VBXVWsn
an illustration of a baby capybara in a christmas sweater staring at its reflection in a mirror
https://ibb.co/5WZWLT4
an armchair in the shape of an avocado. an armchair imitating an avocado.
https://ibb.co/nwwf1v4
an illustration of an avocado in a suit walking a dog
https://ibb.co/bvfPkxf
pikachu riding a wave under clouds inside of a large jar on a table
https://ibb.co/jHjV7mf
a living room with 2 white armchairs and a painting of a mushroom. the painting of a mushroom is mounted above a modern fireplace.
https://ibb.co/VmKqbHk
a living room with 2 white armchairs and a painting of the collosseum. the painting is mounted above a modern fireplace.
https://ibb.co/K5fPkvj
pikachu sitting on an armchair in the shape of an avocado. pikachu sitting on an armchair imitating an avocado.
https://ibb.co/XLJV4Hb
an illustration of pikachu in a suit staring at its reflection in a mirror
https://ibb.co/nMQRccf
"a cute pikachu shaped armchair in a living room. a cute armchair imitating pikachu. a cute armchair in the shape of pikachu"
https://ibb.co/dbJ1Ks6
To use it, go to the link below, make a Kaggle account, verify your phone number, then click Edit on the notebook, go to the settings panel on the right, and turn on GPU and internet. Then replace the plotting code with the code below; it's nearly the same but makes it print more images. If you don't, it doesn't seem to work well.
https://www.kaggle.c...82362/mindall-e
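import math
import matplotlib.pyplot as plt

images = images[rank]  # reorder the images by rank (rank is defined earlier in the notebook)
n = num_candidates     # should be a perfect square, ex. 16, so the grid fits
side = int(math.sqrt(n))
fig = plt.figure(figsize=(6*side, 6*side))
for i in range(n):
    ax = fig.add_subplot(side, side, i+1)
    ax.imshow(images[i])  # show the i-th image, not the whole batch
    ax.set_axis_off()
plt.tight_layout()
plt.show()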
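One thing to watch: the plotting code above uses int(math.sqrt(n)) for both the rows and the columns, so it only fits when n is a perfect square (ex. 10 images would get a 3x3 grid and the 10th subplot fails). A minimal sketch of a grid size that works for any n, where `grid_shape` is a hypothetical helper name:

```python
import math

def grid_shape(n):
    """Smallest near-square (rows, cols) grid with at least n cells."""
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    return rows, cols

for n in (1, 5, 10, 16):
    print(n, grid_shape(n))
```

You would then pass rows and cols to fig.add_subplot(rows, cols, i+1) instead of the two int(math.sqrt(n)) values.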
NUWA - just wow
Edited by Dream Big, 03 January 2022 - 09:33 AM.