Over the past two years, Facebook AI Research (FAIR) has partnered with 13 universities around the world to collect the largest first-person video dataset in history, intended for training deep-learning image-recognition models. Artificial intelligence trained on the dataset could be better at controlling robots that interact with people, or at interpreting images from smart glasses. “It’s only when machines really understand the world through our eyes that they can help us in our daily lives,” said Kristen Grauman of FAIR, who leads the project.
The technology could support people who need assistance at home, or guide them through tasks they are learning to complete. “The videos in this dataset are much closer to how humans observe the world,” said Michael Ryoo, a computer vision researcher at Google Brain and Stony Brook University in New York, who was not involved in Ego4D.
But the potential for abuse is obvious and worrying. The research was funded by Facebook, the social media giant recently accused in the U.S. Senate of putting profits above people’s well-being, a claim that MIT Technology Review’s own reporting corroborates.
The business model of Facebook and other large technology companies is to extract as much data as possible from people’s online behavior and sell it to advertisers. The kind of artificial intelligence outlined in the project could extend that reach to people’s everyday offline behavior, revealing what objects are around your home, what activities you enjoy, whom you spend time with, and even where your gaze lingers. This is an unprecedented level of personal information.
“There’s privacy work to be done as you take this from the world of exploratory research toward something that’s a product,” Grauman said. “That work could even be inspired by this project.”
The largest previous first-person video dataset comprised 100 hours of footage of people in kitchens. The Ego4D dataset consists of 3,025 hours of video recorded by 855 people at 73 locations across nine countries: the United States, the United Kingdom, India, Japan, Italy, Singapore, Saudi Arabia, Colombia, and Rwanda.
Participants spanned a range of ages and backgrounds; some were recruited for their visually interesting occupations, such as bakers, mechanics, carpenters, and gardeners.
Previous datasets typically consisted of semi-scripted video clips only a few seconds long. For Ego4D, participants wore a head-mounted camera for up to 10 hours at a time and captured first-person video of unscripted daily activities, including walking along the street, reading, doing laundry, shopping, playing with pets, playing board games, and interacting with other people. Some footage also includes audio, data about where the participant’s gaze was focused, and multiple perspectives on the same scene. It is the first dataset of its kind, Ryoo said.