How we really watch a movie

Whenever research confirms something we feel we already knew intuitively, or from our own experience, there are always people who’ll scoff and say, “Well, I could have told you that!” And maybe they could have, but that’s not the point. Science is a discipline involving systematic observation and empirical evidence, not unverified hunches. Movies, of course, are optical illusions — photographic, electronic and/or mechanical phenomena that exploit the peculiarities of our eyes and brains… and elicit all manner of feelings. They are science and they are sometimes art, and the methods of studying one or the other can be complementary.

Take one of my favorite David Bordwell posts (“Hands (and faces) across the table”), which has recently been revived (resurrected! It’s alive!) through the eyes of science, thanks to DB’s guest-blogger, Tim Smith (“Watching you watch ‘There Will Be Blood’”), of Continuity Boy, the Department of Psychological Sciences at Birkbeck College, University of London, and the DIEM (Dynamic Images and Eye Movements) Project.

In 2008, DB wrote about the map scene in Paul Thomas Anderson’s “There Will Be Blood,” in which the camera remained fixed during a long take while the looks and gestures of the actors “directed” the viewer’s gaze. He wrote:

Without any close-ups or cutting, Anderson has skillfully steered us to the main points of the scene, which are carried by the performers. The drama builds through small changes of position, shifts of weight, and facial expressions that accompany the dialogue. (The somber, plaintive music adds an uneasy edge.) Daniel seems more threatening when we don’t see his reaction, and Anderson’s camera forces us to scrutinize Paul’s expressions and body language for signs that this is a scam. It takes confidence to make a raised hand the climax of a scene, but the gesture gains its force by being the most aggressive moment in an arc of quietly accumulating tension.

All the principles involved here–frontality, spacing of figures, slight shifts of compositional focus, actors’ body language–are simple in themselves, but they gain a strong impact by cooperating with one another. The scene’s quiet obliqueness is characteristic of the film, which, at least until the last few minutes, carries us along with hints about where the action might go and what drives its characters.

Talk about movie style always boils down to the old arguments about montage versus mise-en-scene — “editing” versus “composition,” to grossly oversimplify — but the “versus” only becomes obvious when a director employs a style that shifts toward one end of the spectrum or the other.¹ As DB wrote:

In books and blogs, I’ve expressed the wish that today’s American filmmakers would widen their range of creative choices. From the 1910s to the 1960s (and sometimes beyond), US filmmakers cultivated a range of expressive options–not only cutting and camera movement but other possibilities too. Studio directors were particularly adept at ensemble staging, shifting the actors around the set as the scene develops.

You can still find this technique in movies from Europe and Asia, as I try to show in Figures Traced in Light and elsewhere on this site. But it’s rare to find an American ready to keep the camera still and steady and to let the actors sculpt the action in continuous time, saving the cuts to underscore a pivot or heightening of the drama. Now nearly every American filmmaker is inclined to frame close, cut fast, and track that camera endlessly. I’ve called this stylistic paradigm intensified continuity.

As Los Angeles agent and former editor Larry Mirisch once put it in conversation with me: “They used to move their actors; now they move the camera.” Most of today’s prominent directors prefer kinetic camerawork and machine-gun cutting. This tends to make their staging rather simple and static: we get stand-and-deliver or walk-and-talk (subject of a blog entry here).

The result is a split in contemporary American style. Action scenes are often gracefully and forcefully choreographed (though sometimes the editing fuzzes up character position and overall geography). By contrast, conversation scenes, which could be choreographed as well, are handled either as a Steadicam walk-and-talk or simply as seated actors talking to one another, with cuts breaking up the lines and the camera on the prowl….

Don’t get me wrong. Like all styles, intensified continuity isn’t a bankrupt option; many fine directors, from the Coens to Michael Mann, have worked vigorous variants on it. What I’m arguing for is more plurality, more tones in the director’s palette. […]

The crucial fact is that in ensemble staging all these cues, and more, are at work at the same time. The director’s skill is orchestrating them so that they support one another, guiding us to see this or that…. For a long time, filmmakers knew intuitively how to coordinate these cues to create rich and intricate shots; I fear that they no longer know how.

This has long been a favorite subject of mine, ever since (for me) the flying scenes in “Top Gun” anatomized action and momentum into abstract swatches of random “movement” back in 1986. Forget 3D as a gimmick: I have always loved the ability of cinema (in the right hands, and eyes) to express the complex relationships of multiple dimensions, the integrity of space within the frame, through composition and (when judicious) cutting. In my post and video essay “Deep Focus: Freedom of (eye-)movement in eight of the greatest long takes ever,” I examined extraordinary feats of non-static direction in a variety of settings, from crowd scenes to intimate conversations, directed by Martin Scorsese, Eric Rohmer, Jacques Tati, Preston Sturges, Michael Haneke and others. (I wish I’d included the cafe dance scene from Jean-Luc Godard’s “Bande à part” for those who labor under the false impression that Godard, that old classicist, has always been about neo-Brechtian discontinuity.)

I’ve also written about how so many contemporary movies tend to shift into auto-pilot (“Dogme 09.8 Manifesto: Ten limitations for better movies”) during the action scenes (“The Dark Knight” came to my mind), but DB is right: today, simple conversation scenes are even less imaginatively directed than action set-pieces, and (as Rohmer demonstrated again and again) there’s no good reason for that. (And that’s why David Fincher’s “The Social Network,” which consists of virtually nothing but conversations, is the most impressively directed mainstream American film of 2010.)

What Tim Smith and colleagues have done is to actually study eye movements to see what we look at when we’re watching movies, and how effectively our attention can be directed. Smith went back to the aforementioned map scene from “There Will Be Blood” for something, DB writes, that “is almost unprecedented in film studies, I think: an effort to test a critic’s analysis against measurable effects of a movie. What follows may well change the way you think about visual storytelling. Tim’s colorful findings also suggest how research into art can benefit from merging humanistic and social-scientific inquiry.”

“The most striking feature” the researchers found when studying gaze behavior, Smith writes,

is the very fast pace at which we shift our eyes around the screen. On average, each fixation is about 300 milliseconds in duration. (A millisecond is a thousandth of a second.) Amazingly, that means that each fixation of the fovea lasts only about 1/3 of a second. These fixations are separated by even briefer saccadic eye movements, taking between 15 and 30 milliseconds!

Looking at these patterns, our gaze may appear unusually busy and erratic, but we’re moving our eyes like this every moment of our waking lives. We are not aware of the frenetic pace of our attention because we are effectively blind every time we saccade between locations. This process is known as saccadic suppression. Our visual system automatically stitches together the information encoded during each fixation to effortlessly create the perception of a constant, stable scene.
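The “about 1/3 of a second” figure translates directly into a rate of eye movement. Here is a back-of-the-envelope sketch in Python using the 300 ms fixation and 15–30 ms saccade figures quoted above. It assumes, as a simplification not made in Smith’s post, that every fixation is followed by exactly one saccade; the function name is hypothetical.

```python
# Figures quoted from Tim Smith's post: mean fixation ~300 ms, saccades 15-30 ms.
MEAN_FIXATION_MS = 300
SACCADE_RANGE_MS = (15, 30)


def fixations_per_second(fixation_ms: float, saccade_ms: float) -> float:
    """Approximate fixation rate, assuming one saccade follows each fixation."""
    return 1000.0 / (fixation_ms + saccade_ms)


for saccade in SACCADE_RANGE_MS:
    rate = fixations_per_second(MEAN_FIXATION_MS, saccade)
    print(f"saccade {saccade} ms -> {rate:.2f} fixations per second")
```

Either way the result lands between roughly 3.0 and 3.2 fixations per second; the saccades are so brief that they barely change the count.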

Another notable finding is that, at any given time, most viewers were looking at a similar area of the screen:

The main factors influencing gaze can be divided into bottom-up involuntary control by the visual scene and top-down voluntary control by the viewer’s intentions, desires, and prior experience. As part of the DIEM project we were able to identify the influence of bottom-up factors on gaze during film viewing using computer vision techniques. These techniques allowed us to dissect a sequence of film into its visual constituents such as colour, brightness, edges, and motion. We found that moments of attentional synchrony can be predicted by points of motion within an otherwise static scene (i.e. motion contrast).

You can see this for yourself when you watch the gaze video. Viewers’ gazes are attracted by the sudden appearance of objects, moving hands, heads, and bodies. The greater the motion contrast between the point of motion and the static background, the more likely viewers will look at it. If there is only one point of motion at a particular moment, then all viewers will look at the motion, creating attentional synchrony.
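To make “motion contrast” a little more concrete, here is a toy sketch in Python. It is not the DIEM project’s actual method. It simply treats motion as the absolute difference between two consecutive grayscale frames, defines contrast as each pixel’s motion minus the frame’s average motion, and guesses that viewers will look wherever that contrast peaks. The function names and the tiny 4×4 “frame” are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale intensities, rows x columns


def motion_map(prev: Frame, curr: Frame) -> Frame:
    """Per-pixel motion as absolute frame difference (a crude stand-in for optical flow)."""
    return [[abs(c - p) for p, c in zip(prow, crow)] for prow, crow in zip(prev, curr)]


def motion_contrast(motion: Frame) -> Frame:
    """Each pixel's motion minus the frame's mean motion.

    Values are high where one spot moves against an otherwise still background.
    """
    values = [v for row in motion for v in row]
    mean = sum(values) / len(values)
    return [[v - mean for v in row] for row in motion]


def predicted_gaze(prev: Frame, curr: Frame) -> Tuple[int, int]:
    """Hypothetical predictor: the (row, col) with the highest motion contrast."""
    contrast = motion_contrast(motion_map(prev, curr))
    cells = ((r, c) for r in range(len(contrast)) for c in range(len(contrast[0])))
    return max(cells, key=lambda rc: contrast[rc[0]][rc[1]])


# A static 4x4 "scene" in which a single point changes between frames.
still = [[0.5] * 4 for _ in range(4)]
moved = [row[:] for row in still]
moved[1][2] = 0.9  # one point of motion, e.g. a raised hand
print(predicted_gaze(still, moved))  # (1, 2)
```

With only one point of motion, every “viewer” modeled this way lands on the same spot, which is the simplest possible picture of attentional synchrony. Real footage, with several competing movements, is exactly where such a crude predictor would start to break down.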

This, I think, relates directly to my observations about the use of rack-focus techniques in a 3D movie like “Avatar.” (See “Avatar 3D headaches: Look at this! Don’t look at this!” and “Avatar, the French New Wave and the morality of deep-focus (in 3D).”) Because our eyes are almost constantly in motion, at the movies as in life, a director who shoots in 3D with a shallow depth of field, leaving moving figures (“hands, heads, bodies”) out of focus, is guaranteeing eyestrain for viewers who look around at what’s going on in the frame. The 3D gives us the illusion that we can focus on what’s blurred in the background, but we can’t. In the latter post, I quoted a New York Times column that discussed how the technology decouples “convergence and accommodation” (long before Walter Murch put his two cents in), a mismatch that can strain the eyes. The column concluded: “There is no proven way to prevent this. But film buffs who have sat through multiple screenings of ‘Avatar’ say one trick is to avoid looking at unfocused parts of the scenes, which sounds a lot easier than it is.”

The DIEM research demonstrates how and why this is. Our eyes don’t usually look at just one thing for more than a fraction of a second at a time. I love how Smith describes the way we see: “Our visual system automatically stitches together the information encoded during each fixation to effortlessly create the perception of a constant, stable scene.” In other words, we’re cutting, editing and moving the “camera” in our heads all the time, at an average of about three fixations per second.

This also helps explain (to me) why I resist the authoritarian style of non-stop rapid cutting. There’s not enough opportunity to look around, to observe the relationships between people and objects in the frame, to simultaneously notice (a la Rohmer) how somebody is listening and reacting while someone else is talking…. (Add the subtitles in there and your eyeballs are flying all over the place!) The technique DB calls “intensified continuity” (close framing, fast cutting, constant camera movement) dictates that you glimpse little more than one thing at a time: whatever it is the director wants you to see at that moment.

Nothing inherently wrong with that, but for me it can quickly become tiresome, mainly because it’s so inefficient (see “Locating the difference between a good movie and a not-so-good one”). A film that is just trying too hard soon becomes a bore/boor. As I never tire of saying, I tend to like a director who trusts the audience (and the actors) a little more and gives us some freedom to roam, while more subtly guiding our attention through space and time. (Or lets us wander all over the place, like Jacques Tati in “Playtime.”) The research indicates that a director who knows how to create an “illusion of volition” (letting us feel we’re choosing where to look while quietly steering our gaze) can direct our attention through a scene (or shot) just as effectively as a director using a more dictatorial, intensified continuity approach.

You’ll find lots more, including video and fascinating explanations of what these DIEM images mean, over at Tim Smith’s Observations on film art post.

– – – –

¹ Which reminds me of something DB quoted/paraphrased from André Bazin on the subject of deep focus in the work of Orson Welles and William Wyler with Gregg Toland, which Bazin claimed

respected the integrity of physical space and time. According to Bazin, traditional cutting breaks the world into bits, a series of close-ups and long shots. But Welles and Wyler give us the world as a seamless whole. The scene unfolds in all its actual duration and depth. Moreover, their style captured the way we see the world; given deep compositions, we must choose what to look at, foreground or background, just as we must choose in reality. […]

[Bazin wrote that deep-focus] “forces the spectator to participate in the meaning of the film by distinguishing the implicit relations” and creates “a psychological realism which brings the spectator back to the real conditions of perception.”

In addition, Bazin pointed out, this sort of composition was artistically efficient. The deep shot could supply both a close-up and a long-shot in the same framing–a synthesis of what traditional editing had given in separate shots.

Jim Emerson
