Tell Me About Antarctica: Guidelines for In Situ Capture and Viewing of 360-degree Video Highlighting Antarctic Science

Introduction

The goal of this project was to identify ways in which Antarctic scientists could effectively convey their research to the general public through personal digital media, focusing on creating first-person learning experiences with 360-degree videos and immersive VR viewers.

The motivation for this work came from my own experience of guest lectures as a PCAS student. The bulk of the material presented during the PCAS Programme involved subject matter experts speaking to the class about their areas of research. Sometimes this involved classroom lectures, other times short field trips. On several of these occasions, I was struck by the passion the speaker showed for their topic, and by how contagious this passion was. I started to wonder if there was a way to capture that passion and present it to those who could not take part in the PCAS Programme. Later, I pondered whether there might be some way to expand this beyond PCAS material to a more general framework for presenting scientific wonder to the general public.

As a long-time virtual reality (VR) researcher, I understood the powerful effect that VR can have on people, and wondered if there was a way to marry these two topics. I had only viewed a small number of 360-degree videos prior to this course, and had always seen this medium as a poor stepchild of fully interactive VR. The “interaction” was limited to moving the head around, while “real” VR allowed the user to use the hands, feet, etc. to interact. The more I thought about it, however, the more I began to think that immersive 360-degree video might be able to bridge the gap between passive traditional video and fully interactive VR, and provide deeper engagement for learners. This led me to the idea of exploring how this technology might be used as an accessible way to convey scientific concepts in an engaging manner.

Method

I used 360-degree video and audio to record subject matter experts (SMEs) individually as they talked about their respective fields of expertise. Viewers can watch the content on various devices, such as a mobile phone, a standard desktop monitor, or a VR headset.

After experimentation, the main challenges in using 360-degree video were found to be selecting the correct point of view for capturing the content, and deciding whether to capture using a hand-held (selfie-stick) or fixed (tripod) camera support. In addition, the question of whether and how to move the camera during capture proved problematic. This report addresses these three issues and provides a set of guidelines for effective capture of 360-degree video for conveying information from Antarctic SMEs speaking in the field.

We first conducted several capture sessions with SMEs (Paul Broady and Colin Monteath) in Christchurch, in order to explore some of the issues in a less extreme environment and to allow several takes in which some of the parameters could be varied. We then captured SMEs (Chris Long, Emma Beech, Jonathan Tyler) in Antarctica, discussing topics in the field.

Related Work

I conducted a literature search on the main ideas behind the proposed research, namely that immersive technologies could improve engagement and learning, and that the way content is captured affects how effectively presenters convey their message.

360-Degree Video

The term 360-degree video (also called “immersive video” or “spherical video”) refers to the capture of video images from a set of cameras (or camera lenses) rigidly connected to a single point in space, and facing in enough directions to capture the entire scene simultaneously. The captured images are then computationally processed, taking into account the locations, orientations and properties of the cameras, to create a spherical image. The resulting spherical image can then be viewed using either standard display hardware (e.g., a phone, tablet or desktop) or a special stereoscope-type device, such as a Google Cardboard[1] (Figure 1) or Samsung GearVR[2], so-called “VR viewers”. On traditional displays, the viewer interacts with the spherical image (i.e., pans and tilts) using either a mouse or finger swipes, and can look in any direction from the captured location. With VR viewers, the same interaction is accomplished by simply turning or tilting the head, which is a more natural way of interacting.

While both methods of viewing have their strengths and weaknesses, VR viewers are considered more immersive, in that the captured content becomes the primary focus of the viewer, and the head-movement interaction removes the need to handle the device through swiping or a mouse.
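The panning and tilting described above can be illustrated with a short sketch. It assumes the spherical image is stored as an equirectangular frame (a common layout for 360-degree video, though the format is not stated in this report), and maps a viewing direction to the pixel at the centre of the view. The function name, angle conventions and frame size are illustrative assumptions only.

```python
def view_direction_to_pixel(yaw_deg, pitch_deg, width, height):
    """Return (column, row) of the pixel at the centre of the view.

    Assumes an equirectangular frame (an assumption, not stated in the report):
      yaw_deg   - pan angle; 0 is the centre of the frame, positive is to the right
      pitch_deg - tilt angle; 0 is the horizon, +90 is straight up
    """
    # Wrap the pan angle into [-180, 180) so that turning all the way
    # around returns to the same column.
    yaw = (yaw_deg + 180.0) % 360.0 - 180.0
    # Clamp the tilt angle: the viewer cannot look past straight up or down.
    pitch = max(-90.0, min(90.0, pitch_deg))

    # Longitude maps linearly to columns, latitude linearly to rows.
    x = (yaw + 180.0) / 360.0 * width
    y = (90.0 - pitch) / 180.0 * height

    # Keep the result inside the frame at the extreme edges.
    return min(int(x), width - 1), min(int(y), height - 1)


if __name__ == "__main__":
    # Looking straight ahead lands in the middle of a 3840 x 1920 frame.
    print(view_direction_to_pixel(0, 0, 3840, 1920))
```

Whether the direction comes from a mouse drag, a finger swipe or a sensed head rotation, the same mapping applies; only the source of the yaw and pitch values differs.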

Figure 1. User with Google Cardboard (Image: Creative Commons)

VR and Immersion

There are several ways to categorise the technologies we will be using here. Slater, Usoh and Steed (1994) make a differentiation between immersion and presence. Immersion, they argue, is an objective description of the technology used to convey the content. By this definition, a standard desktop monitor would be less immersive than a surround projection system, since the former has a lower field of view. By contrast, presence is defined as “a psychological emergent property of an immersive system, and refers to the participant’s sense of ‘being there’ in the world created by the [virtual reality] system.” They go on to claim that “immersion is a necessary rather than a sufficient condition for presence – immersion describes a kind of technology, and presence describes an associated state of consciousness.” Slater et al. also promote the use of “body-centred interaction,” which means mapping physical user movements to changes in the virtual world in a natural (expected) way. Most virtual reality researchers of immersion and presence since this seminal paper was published have adopted similar definitions of immersion and presence.

From the learning sciences, Chris Dede (2009) uses the term immersion in a slightly different way. He describes the notion of immersion as “the subjective impression that one is participating in a comprehensive, realistic experience”, which involves “the willing suspension of disbelief.” This definition seems to blur the line between immersion and presence, and underscores the difficulty in classifying immersive experiences.

Dede (2009) goes on to state that the “design of immersive learning experiences that induce this disbelief draws on sensory, actional, and symbolic factors.” One could map Dede’s “sensory” immersion factors to Slater et al.’s (1994) overall definition of immersion. Dede’s “actional” factors refer to the idea that a person feels more present in an experience when she can exhibit greater control over the experience, such as altering the point of view based on head rotation (looking left), which maps well to Slater et al.’s body-centred interaction idea. In terms of “symbolic” factors, Dede promotes the use of content that is grounded in the familiar world of the participant. He claims that “[i]nvoking digital versions of archetypical situations from one’s culture deepens the immersive experience by drawing on the participant’s beliefs, emotions, and values about the real world.”

Based on this previous work, we think that a virtual reality headset should be the preferred method for delivering the content of 360-degree educational videos, since it will be more immersive, and will support a natural way of controlling what the person sees.

Getting the Right Shot

In filmmaking for traditional cinema, there is a relatively established “lexicon” of camera shots or movements. These are used in various ways to achieve the desired effects. Some filmmakers are well known for their preference for specific types of shots, and some genres (e.g., Westerns) are also known to employ certain shot types more than others (e.g., “Cowboy” for Westerns). Table 1 lists 30 of the most popular shot types used in traditional cinema [3].

This work provided me with a starting point from which to design my approach to capturing scientists in ways that might be effective.

Summary

Based on this related work, it was decided that the project should focus on viewing with an immersive VR viewer, and should use traditional camera shot descriptions as a starting point for capturing content using 360-degree video. The next step was to prepare for capturing the first footage for the project.

Preparation for Shooting

A comparison was made between traditional video recording and 360-degree video recording, in order to tease out the essence of each. Then, decisions were made about which traditional shot types might be most appropriate, and which ones least.

Camera Hardware

The camera used in capturing all of the footage for this report was the Ricoh Theta V[4]. The camera has the advantage of ease of use, as it captures two fish-eye images and blends them into a single spherical image right on the camera. It also provides fairly advanced image capture options, such as settings for different lighting conditions and exposure times. In addition, it can capture both still and video footage, and be controlled remotely using a standard mobile phone. It also worked well in the cold. The negative side is that the memory is built in, with no option for expansion. This limited the video I could capture in the field, and forced me to trade off image resolution for the amount of footage.
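The trade-off between image resolution and amount of footage can be made concrete with a rough calculation. The sketch below is only an illustration: the storage size and bitrates are hypothetical placeholder values, not the specifications of the camera used here, and the function name is my own.

```python
def recording_minutes(free_storage_gb, bitrate_mbps):
    """Estimate how many minutes of video fit in the free storage.

    free_storage_gb - free storage in decimal gigabytes (hypothetical value)
    bitrate_mbps    - video bitrate in megabits per second (hypothetical value)
    """
    if free_storage_gb < 0 or bitrate_mbps <= 0:
        raise ValueError("storage must be non-negative and bitrate positive")
    total_bits = free_storage_gb * 1e9 * 8        # gigabytes -> bits
    seconds = total_bits / (bitrate_mbps * 1e6)   # bits / (bits per second)
    return seconds / 60.0


if __name__ == "__main__":
    # Placeholder figures: halving the bitrate (e.g., by recording at a
    # lower resolution) doubles the footage that fits in the same storage.
    for label, mbps in (("higher resolution", 64), ("lower resolution", 32)):
        print(label, round(recording_minutes(16, mbps), 1), "minutes")
```

With fixed built-in memory, this kind of estimate can help decide before going into the field how many takes can be afforded at a given quality setting.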

360-degree Video Content vs. Traditional Video Content

There are several challenges in creating effective 360-degree videos compared to traditional videos. One is that the viewer's posture while watching the content is not known in advance. For traditional video, the assumption is that the person is seated, but this matters little, since the content does not change with the posture of the viewer. For 360-degree video viewed with a VR viewer, the viewer may be seated in a stationary or swivel chair, standing, lying down, etc., and posture does matter, since head orientation is sensed to control the viewing direction.

A larger problem is how to guide the viewer to see the action taking place in the video sequence. Because the viewer is free to look around, she may miss important content because she is looking in a different direction. This problem has yet to be solved in the general case, and is beyond the scope of the current work. Since our target content is educational video presented by a single presenter, we assume the viewer is looking at the presenter and follows normal social conversational cues during the experience, such as looking where the presenter is indicating, focusing on things the presenter is talking about, and otherwise paying attention to the presenter. While these assumptions cannot be guaranteed, they are realistic for the target audience (i.e., people interested in the subject).

The list of shots for traditional film shown in Table 1, though not exhaustive, supports the artistic creativity necessary for film-based expression across multiple genres and story elements. When deciding how to capture the SMEs for this new medium of 360-degree video, we first thought about how to use traditional shot techniques. The ability to look completely around in a scene, however, makes some of the shots (e.g., Pan, Two-Shot, Locked-Down Shot) unnecessary. Also, since the video will be viewed from a first-person viewpoint, techniques such as the Low Angle or High Angle shots might artificially influence the viewer into feeling subservient or superior to the presenter. Additionally, large or sudden movements in VR have been shown to induce motion sickness in users, so camera movement techniques that involve sudden involuntary camera rotations needed to be avoided. Finally, the personal nature of the genre (first-person educational videos) seemed to suggest that the presenter needed to employ some common social communication gestures while talking to help lead the viewer along.

It was decided to attempt to bring together some of these thoughts into an approach that might best convey the messages the presenters were trying to get across, while maintaining some intimacy with the viewer. By exploring some variations on different shots, the hope was to develop some new shots to assist in guiding people making such videos, and to provide some examples of these.

Preliminary Work in Christchurch

The videos I captured in Christchurch are summarised in Table 2, which also contains the lessons learned from viewing them in a VR viewer, and links to the individual videos. Each of these videos can be viewed in a standard Web browser by following the link provided. The mouse can be used to move the viewpoint around while a video is playing, simulating the head movement used with a VR viewer.

The YouTube videos in Table 2 can be found through the following links

Follow-on Work in Antarctica

The lessons learned from the test shoots were then applied when deciding how to capture content in Antarctica. I decided to take any opportunity I could to capture content, and then to work through the footage once back in Christchurch in preparation for this report. Table 3 lists the content that was captured in Antarctica, along with the further lessons learned or confirmed from it. As before, links are provided to the captured content.

The YouTube videos in Table 3 can be found through the following links

Variables to Consider when Planning 360-degree Video Content

From these two sets of captured videos, we can identify several aspects to consider when planning content for capture and consumption using 360-degree video. They involve the nature of the presenter, the content to be presented, and the way the content will be viewed.

Content Capture

There are several variables to consider before capturing SMEs discussing their work. These include:

Posture: Is the SME standing, seated in a chair, or walking around?
Eye Level: Is the SME looking up at, down on, or even with the camera (and hence the future viewer)?
Viewpoint Mobility: Is the camera attached to a fixed tripod, or is it under the control of the SME or a third party (e.g., on a selfie stick)?
Subject-matter Proximity: Will the SME be referring to content that is close by (e.g., lichen on a rock) or far away (e.g., a distant volcano, penguins)?

Content Consumption

In addition, there are several variables to consider when choosing possible options for viewers of the content. These include:

Posture: Is the viewer standing, seated in a chair, or seated on the floor?
Available Technology: What technology is available to the viewer (e.g., phone/tablet, desktop, VR headset)?
Group/Solo Viewing: Is the viewer alone, or is the content to be viewed by a group?

Intuitively, many of these variables will have some interdependency. For example, if the SME is standing during capture, it may be better for the viewer to also be standing, and if group viewing is needed, a VR headset may not be the best option.
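To illustrate how such variables and their interdependencies might be tracked when planning a shoot, the sketch below models a subset of them (SME posture, viewer posture, available technology, and group/solo viewing) and encodes only the two example interdependencies given above. The class and function names are my own illustrative choices, and the remaining variables (eye level, viewpoint mobility, subject-matter proximity) are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class SmePosture(Enum):
    STANDING = "standing"
    SEATED = "seated in a chair"
    WALKING = "walking around"


class ViewerPosture(Enum):
    STANDING = "standing"
    SEATED_CHAIR = "seated in a chair"
    SEATED_FLOOR = "seated on the floor"


class Technology(Enum):
    PHONE_TABLET = "phone/tablet"
    DESKTOP = "desktop"
    VR_HEADSET = "VR headset"


@dataclass(frozen=True)
class Scenario:
    sme_posture: SmePosture
    viewer_posture: ViewerPosture
    technology: Technology
    group_viewing: bool


def planning_notes(scenario):
    """Return advisory notes for a scenario (only the two examples from the text)."""
    notes = []
    if (scenario.sme_posture is SmePosture.STANDING
            and scenario.viewer_posture is not ViewerPosture.STANDING):
        notes.append("SME is standing; it may be better for the viewer to stand too.")
    if scenario.group_viewing and scenario.technology is Technology.VR_HEADSET:
        notes.append("Group viewing is needed; a VR headset may not be the best option.")
    return notes


def all_scenarios():
    """Enumerate every combination of the modelled variables."""
    combos = product(SmePosture, ViewerPosture, Technology, (False, True))
    return [Scenario(*combo) for combo in combos]


if __name__ == "__main__":
    flagged = [s for s in all_scenarios() if planning_notes(s)]
    print(len(all_scenarios()), "combinations,", len(flagged), "with planning notes")
```

The notes are advisory rather than prohibitive, reflecting that these interdependencies are intuitions to be weighed and not fixed rules.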

Possible Capture Scenarios

These aspects can be combined in numerous ways, and Table 4 lists the combinations that were explored in this work.

Recommendations

The exploration of 360-degree video capture options in this exercise has led me to define several new shot types (Table 5), and a set of guidelines. As with the traditional shot types listed in Table 1, these are designed to support the creative process of effective storytelling given the strengths and limitations of the given medium.

The following list is a set of guidelines to help 360-degree filmmakers better prepare for capturing content:

Presenters should
- be trained on how to present using this medium,
- speak to the camera as if it were a live listener, and
- use common communication gestures (pointing, showing things) to the camera while presenting.
If the presenter needs to show something like a hand-held object to the viewer, then this should be done in the same way (i.e., at the same distance) as if the camera were a real person.
If the camera is held by the presenter (e.g., on a selfie stick), then
- it can be moved along with the presenter, as long as it is moved at a similar pace to walking along with a person, and
- it should not be rotated around the presenter, as this creates large rotational movement of the view and can cause motion sickness in the viewer.
The camera should be kept at eye level with the presenter.

It is hoped that these shots and guidelines can be used as a starting point for developing a lexicon of how to describe effective content creation using 360-degree video.

Footnotes

[1] https://vr.google.com/cardboard/

[2] http://www.samsung.com/global/galaxy/gear-vr/

[3] https://www.empireonline.com/movies/features/film-studies-101-camera-shots-styles/

[4] https://theta360.com/en/about/theta/v.html

References

Dede, C. (2009). Immersive interfaces for engagement and learning. Science, 323(5910), pp. 66–69. doi: 10.1126/science.1167311

Slater, M., Usoh, M. and Steed, A. (1994). Depth of presence in virtual environments. Presence: Teleoperators and Virtual Environments, 3(2), pp. 130–144.