The tech giant has announced plans to teach AI to ‘understand and interact with the world like we do’ in first person. It hopes to do this by using video and audio from augmented reality (AR) glasses like its new high-tech Ray-Bans.
“AI typically learns from photos and videos captured in third-person, but next-generation AI will need to learn from videos that show the world from the center of action,” the company said.
It went on: “AI that understands the world from this point of view could unlock a new era of immersive experiences.”
For the Ego4D project, Facebook gathered 2,200 hours of first-person video from 700 people going about their daily lives in order to begin training its AI assistants. It says it wants to teach AI to:
- remember things, so we can ask it ‘what happened when’
- predict human actions and try to anticipate our needs
- manipulate hands and objects in order to learn new skills
- keep a video ‘diary’ of everyday life and recall specific moments
- learn and understand social interaction
None of these tasks can be performed by today’s AI systems, but they could play a central role in Facebook’s plans to build the ‘metaverse’: a digital 3D overlay of reality built using VR and AR.
It has already begun work on this with its Oculus VR headsets, as well as with the release of the new Facebook x Ray-Ban smart glasses.
AI can learn to see and hear from vast amounts of real-world data, and it seems Facebook wants to use its devices to gather that data in order to build more intelligent systems in the future.
However, the launch of AR glasses such as Facebook’s Ray-Bans has prompted privacy concerns, as the glasses would allow people to film others without their knowledge.