From 2013 to 2014 I worked as a Data Analysis Consultant for the Center for Vision Technology at SRI International. My primary responsibility was developing survey research methods to code video interactions. The coding scheme was used to develop machine learning algorithms that automatically detect certain types of social interaction. The technology developed from the project was incorporated into a virtual reality device to train police officers and military personnel to improve their social interactions with civilians.
My responsibilities during the various stages of the project are outlined below.
1. Grant Writing
The project began when I collaborated with Ajay Divakaran on a research proposal to quantify and automate ethnographic methods for analyzing social interaction. We received $1,000,000 in funding from the Defense Advanced Research Projects Agency (DARPA).
2. Developing a Coding Scheme
We needed to develop a scheme to code approximately 150 three-minute videos. At the time, two types of coding schemes dominated academic research. The first was to play each video in its entirety and have the coder fill out a survey measuring what they observed. The second, more detailed approach captured interaction through micro-social annotation of body gestures. The first method did not capture enough interaction detail to train the algorithms; the second, while feature-rich, was too labor intensive.
Given these constraints, I collaborated with SRI engineers and Brian Lande of Polis Solutions to develop an innovative coding scheme that was rich in social interaction detail but not labor intensive. Drawing on the social psychology literature, I developed a survey that coded social interaction at every ten-second increment of the video. The survey coded for two types of interaction: joint attention and entrainment. For joint attention, the survey measured eye gaze, shared topic, and body language. For entrainment, it measured simultaneous movement, tempo similarity, coordination, and imitation.
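As a rough sketch of the scheme's structure (the field names and the 1-7 rating scale below are my own illustration, not the project's actual schema), each ten-second segment of a video can be represented as one record of the survey's measures:

```python
from dataclasses import dataclass

@dataclass
class SegmentCoding:
    """One ten-second segment of a video, coded on the survey's measures.

    Ratings here are illustrative 1-7 scores; the scale actually used
    in the project may have differed.
    """
    video_id: str
    segment_start_sec: int  # 0, 10, 20, ... within a three-minute video
    # Joint attention measures
    eye_gaze: int
    shared_topic: int
    body_language: int
    # Entrainment measures
    simultaneous_movement: int
    tempo_similarity: int
    coordination: int
    imitation: int

# A three-minute (180-second) video yields 18 ten-second segments per coder.
segments = [
    SegmentCoding("officer_012", start, 5, 6, 4, 3, 4, 5, 2)
    for start in range(0, 180, 10)
]
print(len(segments))  # 18 segments
```

One record per segment per coder keeps the scheme far lighter than gesture-level annotation while still giving the learning algorithms a time series of interaction ratings.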
3. Recording the Videos
We recorded two types of video for the analysis: simulated officer-inmate interactions at the Washington State Criminal Justice Training Commission, and a series of tower-building videos created in-house at SRI International, Princeton. To capture the interaction, participants were fitted with GoPro cameras that recorded multiple angles while a Microsoft Kinect captured the x, y, z coordinates of their body movements.
4. Project Management: Inter-Coder Reliability Tests
Once the scheme was developed and the videos created, I hired a group of UC Berkeley undergraduate students to code the videos. This stage of the project was particularly challenging because some of the survey questions were vague and thus open to multiple interpretations. For instance, the question for tempo similarity read:
Assume that all people have built in speeds to which their behavior is set (much like the tempo an orchestra follows at a concert). Rate the degree to which the two people in the clip seem to be ‘marching to the beat of the same drummer’.
I tested each question by having all of the students code the same video. While the joint attention questions were, on average, highly reliable, the entrainment questions were not. I improved reliability through debriefing sessions and focus groups. After several iterations, I created a coding book with guidelines and rules for coding the videos accurately and reliably.
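The reliability checks described above can be illustrated with a standard chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two coders rating the same video; the specific statistic used in the project is not stated here, so kappa is an assumption, and the ratings are invented for illustration:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: fraction of segments where the coders match
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement if the coders rated independently
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two coders rating the 18 ten-second segments of one video
# on a single survey question (e.g., tempo similarity)
coder1 = [5, 5, 4, 6, 5, 3, 4, 4, 5, 6, 6, 5, 4, 3, 5, 5, 6, 4]
coder2 = [5, 4, 4, 6, 5, 3, 4, 5, 5, 6, 5, 5, 4, 3, 5, 5, 6, 4]
print(round(cohens_kappa(coder1, coder2), 3))
```

A kappa near 1 indicates strong agreement, while a value near 0 means the coders agree no more often than chance would predict; questions scoring low on such a statistic were the ones flagged for debriefing and clearer guidelines.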
The results of the research were published in two papers:
Amer, Mohammed R., Behjat Siddiquie, Amir Tamrakar, David A. Salter, Brian Lande, Darius Mehri, and Ajay Divakaran. "Human Social Interaction Modeling using Temporal Deep Networks." ACM Multimedia, 2015.