Stereoscopy - How We See In 3D Less Technical Version
Stereoscopy or “Dual-Images” Two Views for One Person
Anyone who is interested in 3D TV needs to understand stereoscopy and how we see in 3D. These less technical and more technical articles explore stereoscopy in an easy-to-understand way that breaks down the mysteries of how we actually see 3D images.
The word “Stereo” typically means “more than one place,” and so “stereoscopy” can mean “vision from more than one place.” Since humans each have two eyes, we have two different views of every situation. Each view is usually pretty similar to the other, but sometimes they can be completely different. For example, if you put your right hand up against your nose with your palm facing left, then alternately close each eye, you’ll see that you get two completely different images of your hand. The left eye shows your palm on the far right and the right eye shows your knuckles on the far left. And if you try to open both eyes, you won’t see either image very well.
The human brain tries to take any two images provided by your eyes and stitch them together as a single image with an added sense of depth. Looking at any single image never quite provides that extra ‘map’ that almost seems to sit on top of everything you see. This is your brain’s way of telling you that it has more accurate information than before so you can feel more confident about the relative position of objects around you. On the other hand, if your brain gets two incongruous images, it will rebel and reduce the information it gives you about both.
So our eyes can see any two sets of images, but our brains only give us a bonus sense of depth if those images make sense together. This means that we should strive to find the proper combination when making dual-image 3D technology. The general relationship of any two images meant for our eyes is as follows:
·The images should be similar but not exactly the same.
·Each eye should only see one image.
·The objects that are focused in one image should be focused in the other.
·Objects that are further away in the images should look smaller and shouldn’t change as much between the images. The opposite is true for closer objects.
·The viewer of the images should be in a predetermined viewing position.
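To put rough numbers on the rule about distant objects changing less between the two images, here is a minimal Python sketch. It assumes an eye separation of 63 mm (a commonly quoted average, not a figure from this article) and an object straight ahead of the viewer; the function name is made up for illustration.

```python
import math

# Assumed typical adult eye separation in metres (not taken from the article).
EYE_SEPARATION_M = 0.063


def vergence_angle_deg(distance_m, eye_separation_m=EYE_SEPARATION_M):
    """Angle between the two eyes' lines of sight to a point straight ahead."""
    return math.degrees(2 * math.atan(eye_separation_m / (2 * distance_m)))


# The angle (and so the difference between the two views) shrinks with distance.
for distance in (0.25, 1.0, 10.0, 100.0):
    print(f"{distance:7.2f} m -> {vergence_angle_deg(distance):.3f} degrees")
```

The angle falls off quickly as the object moves away, which is why far-off objects look nearly identical to both eyes while a hand held against the nose does not.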
You could skip all of the above statements and just say “the images need to look exactly like photographs taken by a viewer’s eyes.” This is true of course, but not everything is created as a photograph, and even artificial images that follow these rules can create a real sense of 3D depth. The only concept above that can put us in a real quandary is the need to know the viewer’s position.
Let’s say we know that a viewer is going to be looking at a set of dual-images really close, like with their face pressed against them. In this case, we can have two small pictures side by side because when they are that close to the face they will completely cover each eye’s range of vision. The opposite extreme would be to have two mountain sized pictures at a great distance. If you scaled up the two little pictures to make the big ones, they would quickly and completely overlap (if they get bigger and stay centered in front of your pupils, there is no way they cannot overlap). If the viewer in either case moves relative to the images, the images would no longer show the correct thing (though in the case of the mountain sized pictures you’d probably have to go miles to notice a change). This means that any set of dual-images can only be set up properly when the viewing position is known.
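The scaling argument can be checked with a little geometry. The sketch below is only an illustration under simple assumptions: each image is flat, centered in front of one pupil, and wide enough to fill a chosen horizontal field of view. The 63 mm eye separation and the function names are assumptions, not values from this article.

```python
import math


def image_width_m(distance_m, field_of_view_deg):
    """Width a flat image needs in order to span the given field of view."""
    return 2 * distance_m * math.tan(math.radians(field_of_view_deg) / 2)


def images_overlap(distance_m, field_of_view_deg, eye_separation_m=0.063):
    """True if two such images, one centered on each pupil, would overlap.

    The image centers are one eye separation apart, so the images collide
    as soon as each one is wider than that separation.
    """
    return image_width_m(distance_m, field_of_view_deg) > eye_separation_m


print(images_overlap(0.02, 90))  # pictures 2 cm from the face
print(images_overlap(0.50, 90))  # pictures at arm's length
```

With these assumptions the images stop fitting side by side once they are more than a few centimeters away, which matches the idea that only displays worn right at the eyes can avoid overlap.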
Let’s say we’ve got the dual-images we want (e.g. photographs taken at the location of someone’s eyes) and we can scale and move them however we want. To figure out the best way to present them, we still have to answer a series of questions:
·Where will the viewer be?
·How big should the images be at that distance?
·Will the images overlap?
·If the images overlap, how will you make it so each eye only sees one image?
·If the images don’t overlap, do you need a barrier to block their relative visibility?
·If the images are placed really close to the viewer, how will the viewer be able to focus on them?
·How far can the viewer move outside of the intended position?
·Did the answer to any of the other questions create new issues? If so, how will you resolve them?
Now that we’ve listed the questions that any creator of dual-image 3D technology should ask themselves, let’s consider some of the real life approaches that have been used. Typically the function of a 3D device determines how the images are created, and that function tends to vary with its viewing position. Let’s start with devices that are viewed close and work our way up to more distant ones.
The closest that a display can be is right up against your eyes. This is typically the field of glasses-like displays, such as virtual reality headsets. Making dual-images in these kinds of displays is relatively easy, because they can be small enough that they won’t overlap. The images can each be made with a miniature screen and then separated by a barrier. The only real problem with this approach is that it’s hard to focus on things that are too close (try reading a book with your eye against the page; it’s impossible). This limit on how close our eyes can focus can be readily resolved with lenses and other optical components.
The next distance we normally view things from is ‘book distance’ or ‘handheld’ distance. Since we typically don’t read books in 3D, dual-image 3D display technology in this field is reserved for phones, PDAs, portable games, and other small electronic devices that process images. Creating a 3D effect at this distance can no longer rely on side by side dual-images because they are too big to not overlap. The next best techniques are ones that send out images at slightly different angles to a viewer who’s looking head-on at the device.
The two most common glasses-free, position-dependent 3D technologies are lenticular lenses and parallax barriers. They both work off of the same type of dual-image display, where the screen takes each image, cuts it into tiny vertical strips, and then puts them together on one flat surface by alternating the strips. Lenticular lenses work by magnifying every other strip in one direction and the rest of the strips in the other direction, giving each eye one complete image that’s hidden from the other eye. Parallax barriers are created as a raised overlay of dark strips that cover up the alternating strips of one image when viewed from one eye, and cover up the opposite strips when viewed from the other eye.
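The strip-alternating step is easy to picture in code. This is a minimal Python sketch that treats an image as a list of pixel rows and uses one-pixel-wide strips; the function name and the choice of which eye gets the even columns are assumptions made for illustration.

```python
def interleave_columns(left, right):
    """Combine two equally sized images (lists of pixel rows) into one frame.

    Even-numbered columns are taken from the left-eye image and odd-numbered
    columns from the right-eye image; which eye gets the even columns is an
    arbitrary choice made for this sketch.
    """
    if len(left) != len(right):
        raise ValueError("images must have the same number of rows")
    frame = []
    for left_row, right_row in zip(left, right):
        if len(left_row) != len(right_row):
            raise ValueError("rows must have the same number of columns")
        frame.append([
            left_row[x] if x % 2 == 0 else right_row[x]
            for x in range(len(left_row))
        ])
    return frame


left_image = [["L"] * 6 for _ in range(2)]
right_image = [["R"] * 6 for _ in range(2)]
for row in interleave_columns(left_image, right_image):
    print("".join(row))
```

Notice that each eye’s image keeps only half of its columns in the combined frame. The lenses or barriers then have the job of steering each set of columns toward the correct eye.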
The next distance common for dual-image 3D displays is for computers and other stable devices that are just out of reach. Lenticular lenses and parallax barriers can be used for these devices when the viewer is right in front of them, but one of the problems with this approach is that there is often more than one viewer. There are two main ways to deal with additional viewers: either create a more complex lenticular lens or parallax barrier arrangement or have the viewers each wear 3D glasses (with the appropriate screen for those glasses).
Lenticular lenses and parallax barriers are concepts that can be combined, altered, or layered to make more advanced dual-image displays. The end result of these variations is added angles where the dual-images can be observed together (the basic techniques only allow for one dual-image view straight on). These added angles are like hotspots and any viewer who is within the range of one will see the 3D effect clearly. If there aren’t very many hotspots, it could be difficult to find a good position, but more hotspots are also more expensive.
3D glasses are another technique that can work well with computers. The idea is to send out dual-images together and then filter each unwanted image at the location of each eye. The great advantage to 3D glasses is that viewers can move wherever they want to because they are carrying the means to view the 3D with them (lenses and barriers on a dual-image screen cannot account for every position, because if they did, the locations where each image was sent would overlap, and that doesn’t make sense). There are two main categories of 3D glasses: active and passive. Passive means they just sit there and active means they need battery power.
Currently, there are two types of passive 3D TV glasses: color-splitting and light-splitting. Color splitting glasses use special dual-images that are colored differently and then laid on top of each other. Each image hits a colored filter on each lens of the glasses (red and cyan, typically), which only lets through the correct image for each eye. Light splitting glasses work similarly, except that instead of sending out two images of differing color, the light within each image is ‘polarized’ or aligned in one direction or the other. The light that is polarized one way goes through the 3D glasses lens that has the same polarization, and the opposite works for the other lens. The nice thing about this technique is that the colors of the dual-images remain intact.
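Color splitting is simple enough to sketch for a single pixel. The Python below assumes the common red-cyan arrangement with the red filter over the left eye, ideal filters, and pixels stored as (red, green, blue) tuples; the function names are hypothetical.

```python
def anaglyph_pixel(left_rgb, right_rgb):
    """Merge one left-eye pixel and one right-eye pixel into one screen pixel.

    The red channel carries the left image; green and blue (together, cyan)
    carry the right image.
    """
    return (left_rgb[0], right_rgb[1], right_rgb[2])


def through_red_filter(rgb):
    """What an ideal red lens lets through: only the red channel."""
    return (rgb[0], 0, 0)


def through_cyan_filter(rgb):
    """What an ideal cyan lens lets through: only green and blue."""
    return (0, rgb[1], rgb[2])


screen_pixel = anaglyph_pixel((200, 10, 20), (30, 150, 90))
print(through_red_filter(screen_pixel))   # only left-image data survives
print(through_cyan_filter(screen_pixel))  # only right-image data survives
```

Each eye ends up with only part of the original color information, which is the price this technique pays and the reason polarized glasses keep colors intact where color-splitting glasses cannot.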
Active 3D glasses only come in one major variety: shutter-splitting. Each lens can be set to clear or black by an electric current (it works the same as the parts of the numbers on a small calculator). The change between the two allows every other frame of a synced up TV to be visible to one eye or the other. One set of alternating frames shows the correct images to the left eye and the other set of frames shows the correct images to the right eye. It is possible to get the images mixed up if the sync is not right, in which case everything will appear to be inverted in depth (this also happens if you put polarized glasses on upside down). Because this technique does not send out two different ‘types’ of images, but rather just two sets alternating in time, the only real requirement for the display is that it has a high frame rate.
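The alternating-frame idea, and what goes wrong when the sync is off by one frame, can be sketched as follows. This is an illustrative Python model only: it assumes even-numbered frames are meant for the left eye, and the function names and the `glasses_phase` parameter are invented for the example.

```python
def intended_eye(frame_index):
    """Eye the display means this frame for (even frames: left, by assumption)."""
    return "left" if frame_index % 2 == 0 else "right"


def open_shutter(frame_index, glasses_phase=0):
    """Eye whose lens is clear during this frame.

    glasses_phase is 0 when the glasses are in step with the display and 1
    when they are off by one frame.
    """
    return "left" if (frame_index + glasses_phase) % 2 == 0 else "right"


def depth_inverted(glasses_phase, frames_to_check=4):
    """True if any frame reaches the eye it was not meant for."""
    return any(
        open_shutter(i, glasses_phase) != intended_eye(i)
        for i in range(frames_to_check)
    )


def per_eye_frame_rate(display_hz):
    """Each eye sees every other frame, so it gets half the display's rate."""
    return display_hz / 2


print(depth_inverted(0), depth_inverted(1), per_eye_frame_rate(120))
```

Being off by a single frame swaps the images for both eyes at once, which is why a bad sync inverts the depth rather than just blurring it.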
The next distance from which 3D technology is regularly viewed is from across a room, such as with a television. Televisions mostly use 3D glasses because it’s hard to predict exactly where the viewers will be. The glasses free hotspot techniques which use lenses and barriers can still work, but most of the hotspots have to be within a small angular range straight ahead. If they were too far to the side, then people sitting on a couch wouldn’t be able to move far enough to reach the next closest one. Televisions that are larger and further from the viewers become increasingly problematic for hotspot techniques. At the ultimate limit of something like a movie theater, 3D glasses are the only real option.
There is another technique that can work for any position. It is similar to the glasses/goggle technique where the dual-images are placed next to each other. The difference is that the general ‘side-by-side’ setup is used even when the dual-images are at a great distance and would normally overlap. This works because the images don’t need to overlap initially; they are made to overlap when the viewer crosses or uncrosses their eyes (depending on whether the left-right images are on the wrong side or not, respectively). It’s not very fun, but there are prismatic glasses that can create the overlap for you. Any dual-image display could technically use the side-by-side technique, but the disadvantages tend to outweigh the benefits.
Stereoscopy has a few other forms, such as the use of head-tracking technology; but the modern techniques all have one thing in common: they use two images to create the sense of 3D depth. Dual-images can also appear in more complex displays that have more than two images, but the added information those displays require makes them all but incompatible with the dual-image versions. It is therefore probably better (and simpler) to refer to 3D displays as being either dual-image displays or multi-image displays (see the article The Ultimate Future of 3D to learn more about multi-image displays), rather than the more general category of “stereoscopic displays” that is in common use today.
4/24/11
Change is Silver
Author of “How to Make a Holodeck” (5Deck.com) ~A uniquely illustrated book that derives several ways to create dense multi-image displays. Creator of Unili arT (UniliarT.com) ~Various creative designs on a number of gifts and products, including ‘mirrored’ stickers.
Stereoscopy - How We See In 3D More Technical Version
Stereoscopy or “Dual-Images” Two Perspectives for Everyone
Stereoscopy effectively means “stereo vision” or vision from more than one place. When you are standing in a lit room in real life, light hits every part of your body from every angle.
The only place light has any real effect though (besides heating you up, like infrared light from a fire on a cold night) is when it hits your two eyes. Each part of your body would ‘see’ something different if there were an eye there, and it’s no different for the two eyes that are actually present. They each have their own perspective on every situation.
In its simplest form, a perspective can just be considered an image (although a perspective can vary with focus, which is not possible with a flat image). This means that each person receives dual-images at all times. Since our brains are already used to receiving all this information, they can piece the images together into a relatively accurate depth map. The strange thing is that two images alone cannot convey this depth because they are just “two flat images.” So we need to make sure to present dual-images just the way we see them in real life. If we don’t do it just right, our brains will not be able to overlay an accurate depth map and will probably just get annoyed and confused.
Here are some generalized rules that any set of flat dual-images needs to follow to be properly mapped into 3D by our brains:
·The images have to be different.
·At least some part of each of the images has to be similar.
·The images have to be isolated from the opposing eyes.
·The images have to have the same focal distance.
·The apparent relative location of each similar part should vary less with objects at a greater distance.
·The apparent relative geometry of each similar part should vary less with objects at a greater distance.
·The position and relative size of the dual-images must account for intended viewer position.
Of course, the exact size and shape of everything on each image has only one correct configuration for any given viewing position. But even images that ‘kind of’ follow the above rules will appear to have some depth, including drawings and cartoons. The only rule that can cause some problems with real displays is that they have to account for the intended viewer’s position.
Let’s say you put two pictures right up in front of each of your eyeballs. Neither picture will overlap the other but they will still each cover all of your vision (like virtual reality goggles). But what about two gigantic pictures at a great distance? They would have to overlap because if they didn’t there would be no way either image could cover all of your vision. The end result is that any dual-image setup is really only made to be viewed from one position and everything is arranged around it, including the size of the dual-images, the degree to which they overlap, and of course the content of the images (perspective images change with distance and position, but flat dual-images do not).
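One way to see why flat dual-images only suit one viewing position is to work out where a point has to be drawn in each image on a shared screen. The Python sketch below uses similar triangles for a point straight ahead of the viewer. The 63 mm eye separation and the function names are assumptions for illustration, and the model ignores everything except simple geometry.

```python
def screen_parallax_m(object_distance_m, screen_distance_m, eye_separation_m=0.063):
    """Horizontal gap on the screen between the right-eye and left-eye drawings
    of one point. Zero means the point sits at the screen; positive means it
    appears behind the screen; negative means in front of it."""
    return eye_separation_m * (object_distance_m - screen_distance_m) / object_distance_m


def perceived_distance_m(parallax_m, viewer_distance_m, eye_separation_m=0.063):
    """Distance at which a viewer perceives a point drawn with a given parallax.

    Only meaningful while the parallax is smaller than the eye separation.
    """
    return eye_separation_m * viewer_distance_m / (eye_separation_m - parallax_m)


# Images prepared for a viewer 2 m from the screen, showing a point 4 m away.
parallax = screen_parallax_m(4.0, 2.0)
print(perceived_distance_m(parallax, 2.0))  # viewer at the intended position
print(perceived_distance_m(parallax, 3.0))  # viewer who has stepped back
```

In this simple model the same pair of flat images reads as a different depth as soon as the viewer moves closer or farther, since the images themselves cannot change to match.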
Now that we kind of know what our dual-images should look like, how do we go about presenting them? Even if we create the images with cameras positioned at our eyeballs (and thus know that the images are correct), we still have to decide the best way to present them. The list of rules above will give us a good guide to finding a working configuration.
Assuming that we took two photographs as our dual-images and we know that everything in them has the proper relative perspective, here are some questions we still need to answer for their presentation:
·At what nominal distance will the dual-images be viewed?
·For that distance, how big should the images be?
·Will each image cover part or all of each eye?
·Will there be any overlap between the images in this configuration?
·If there is any overlap, how will each image be filtered from the opposing eye?
·If there is no overlap, how will you block any opposing visible parts (if any) from the opposing eyes?
·If the images are close to the eyes, how will you account for the small focal distance?
·How sensitive is the relative quality of the final configuration to viewer position?
·If the final configuration generates new issues, how will you account for them?
With all of these questions in mind, let’s look at some of the real approaches used to answer them. There are five primary viewing distances used to create dual-image displays. Let’s call them glasses, handheld, computer, television, and theater, each corresponding to the device it is named after. Because most of the questions above rely on the viewing distance, the technology used for each distance tends to use similar approaches.
For the distance of glasses there is really only one technique used. The general idea is to send two small images to each of the eyes and separate them by a barrier. The dual-images can be generated by little screens, actual photographs, or something along those lines. This technique works because the distance to the eyes is so small that no overlap is required for the parts of the images shown (whether overlap would be required for complete peripheral vision is another question). The main problem with this technique is that our eyes can’t resolve images closer than a certain distance. The way to correct this is to set the images a little further away and then use strong lenses to make the light parallel. This way the images appear to be ‘at infinity’ and our eyes can relax when viewing them.
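The lens trick can be illustrated with the standard thin-lens equation, 1/f = 1/d_o + 1/d_i. The Python sketch below is an idealized model (a single thin lens and made-up example numbers), not a description of any particular headset, and the function name is hypothetical.

```python
import math


def apparent_image_distance_m(screen_distance_m, focal_length_m):
    """How far away a screen appears when viewed through a thin lens.

    Valid for a screen at or inside the focal length, where the lens acts as
    a magnifier. A screen exactly at the focal length gives parallel light,
    so the image appears infinitely far away.
    """
    if math.isclose(screen_distance_m, focal_length_m):
        return math.inf
    if screen_distance_m > focal_length_m:
        raise ValueError("screen must be at or inside the focal length")
    return screen_distance_m * focal_length_m / (focal_length_m - screen_distance_m)


print(apparent_image_distance_m(0.04, 0.05))  # screen just inside the focal length
print(apparent_image_distance_m(0.05, 0.05))  # screen exactly at the focal length
```

Moving the screen toward the focal plane pushes the apparent image farther and farther away, until at the focal plane the light leaves the lens parallel and the eye can relax as if looking at the horizon.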
The second distance is for handheld devices. This distance is already too far away to have non-overlapping images in most situations, which means that we need to be clever in our approach. There is an additional limitation at this distance: it should not use 3D glasses. It’s an artificial limitation, but it makes sense because most handheld devices are smaller than a pair of 3D glasses so it would be inconvenient to carry around both. This limitation is offset by a significant advantage: most handhelds will be viewed from a specific position and distance, which means we can create dual-images for just one viewing position (or at least a relatively small range of positions).
Currently there are two well known means to create images for viewers who are at a known position and angle: lenticular lenses and parallax barriers. The idea for both is the same: first, take dual images and place them on top of each other by cutting them into little strips that alternate; second, optically split the opposing images between the viewer’s eyes. For lenticular lenses, a bumpy row of column-like lenses magnifies each little strip at different angles so that the correct image strips all grow together into two divergent images. For parallax barriers, a second layer of regularly spaced opaque strips is placed over the image strips so that the opaque strips uniquely cover each opposing eye’s image strips.
Both lenticular lenses and parallax barriers are built on top of a single normal screen. The screen can show two images with alternating columns of pixels for each image. If there were no lenses or barriers over this image it would just look like a double-exposed photograph or a 3D movie without glasses. The lenses and barriers each effectively take half the light of the image and send it to a certain angular range of viewing positions. This means that the total light of a screen is halved for each eye and the images appear somewhat darker (unless the total screen brightness is ramped up to compensate). The Nintendo 3DS (which uses parallax barriers) was one of the first commercially available handheld devices to use either technique for changing images, but many other 3D handheld devices are on their way.
The next distance is for computers. Although I call it a separate distance from handheld, it is relatively similar. The difference is that things at computer distance are not conveniently movable by hands, either because they are larger or because they are just out of reach. Parallax barriers and lenticular lenses are still a good idea at this distance because the viewer is usually facing the screen head-on and at a known distance. One new challenge can occur when there is more than one person viewing the screen. This is not as much of a problem with handheld devices because they can be easily turned and passed around. 3D glasses are also more plausible at this distance because the devices it encompasses are larger and less mobile.
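For a feel of how a parallax barrier is tied to one viewing position, here is a rough Python sketch of the textbook similar-triangles geometry. It assumes an idealized barrier with very thin slits, ignores refraction in any glass layers, and uses a 63 mm eye separation; the function name and the example numbers are illustrative and are not the specification of any real device.

```python
def barrier_geometry(pixel_pitch_m, viewing_distance_m, eye_separation_m=0.063):
    """Return (gap, slit_pitch) for an idealized two-view parallax barrier.

    gap:        spacing between the pixel layer and the barrier, chosen so the
                two eyes look through one slit at neighboring pixel columns.
    slit_pitch: spacing between slits, chosen so one eye sees every second
                pixel column through successive slits.
    viewing_distance_m is measured from the barrier to the eyes.
    """
    gap = pixel_pitch_m * viewing_distance_m / eye_separation_m
    slit_pitch = 2 * pixel_pitch_m * viewing_distance_m / (viewing_distance_m + gap)
    return gap, slit_pitch


gap, slit_pitch = barrier_geometry(pixel_pitch_m=0.0001, viewing_distance_m=0.3)
print(f"gap: {gap * 1000:.3f} mm, slit pitch: {slit_pitch * 1000:.4f} mm")
```

In this model both numbers depend directly on the viewing distance, so a barrier built for one distance only lines up properly for viewers near that distance. The slit pitch also comes out slightly smaller than two pixel columns, which is what keeps the strips aligned across the whole width of the screen.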
The first thing to consider with the computer distance is how to apply lenticular lenses and parallax barriers to multiple viewers. There is no one answer; but different geometric configurations (e.g. multiple layers, angled components) can at least create multiple hotspots. This means that viewers whose eyes are centered at certain angles can properly see the dual-images. The more hotspots, the better, because the resulting display can then accommodate more viewing positions (though more hotspots also drives up cost). Another factor to consider is that the limits of viewing a hotspot can only be as big as the distance between a person’s eyes (you have to get your eyes to straddle the angle where the change in image occurs), which isn’t very much leeway.
Now that we are using a larger device, having 3D glasses as an accessory seems more reasonable. Although most people probably think of 3D glasses as a hindrance, they actually have the distinct advantage of allowing viewers to sit in any position. Because 3D glasses split the images at the glasses rather than at the screen (like barriers and lenses) and because they move with the viewer, they create a mobile means of viewing dual-images.
There are three main varieties of 3D glasses in two broad categories. The first category is passive, with glasses that use color-splitting (anaglyph) and polarization-splitting techniques. The second category is active, which means that the glasses require power to operate, and it contains only shutter-splitting glasses.
Color-splitting overlaps two images of opposing colors (opposite on the color wheel, like red and cyan or yellow and blue) on a single screen. Color filters on each lens of the 3D glasses block out everything except the colorized image meant for the given eye. There are a number of advancements beyond the basic concept that make the dual-images progressively better, but because the technique is founded in separating colors, it can never completely recombine them for both eyes. The resulting 3D depth field comes at the cost of inaccurate image coloring.
Polarization-splitting is actually a little more complex than either of the other 3D glasses techniques, because it requires part of the splitting to be done at the screen. A special screen is required to send out two slightly modified images simultaneously. The first image goes through a polarized film to make the light line up in one direction, and the second image goes through a different polarized film to make the light line up in a perpendicular direction (or in the opposite rotation for circular polarization). Although our eyes can’t distinguish light based on its relative orientation, 3D polarized glasses can. Each lens filters out the other eye’s image to create an effective dual-image setup. Because the lenses each filter out half the light of the screen, it appears to be half as bright. Polarized lenses only work completely when the glasses are parallel to the screen, but the impact is lessened by the fact that the dual-images are only accurate for the parallel orientation.
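For linearly polarized glasses, the effect of tilting your head can be estimated with Malus’s law, which says an ideal polarizer passes a fraction cos²(θ) of light polarized at angle θ to it. The Python sketch below assumes ideal linear polarizers and two images polarized at right angles to each other; it does not apply to circular polarization, and the function name is made up for illustration.

```python
import math


def linear_polarizer_split(tilt_deg):
    """Return (intended, leaked) light fractions for one lens at a given tilt.

    intended: fraction of the correct eye's image that still gets through.
    leaked:   fraction of the other eye's image that sneaks through.
    The other image is polarized at 90 degrees to the intended one, so its
    share follows sin squared instead of cos squared.
    """
    theta = math.radians(tilt_deg)
    return math.cos(theta) ** 2, math.sin(theta) ** 2


for tilt in (0, 10, 45, 90):
    intended, leaked = linear_polarizer_split(tilt)
    print(f"tilt {tilt:2d} degrees: intended {intended:.2f}, leaked {leaked:.2f}")
```

At zero tilt each lens passes only its own image. As the tilt grows, more of the wrong image leaks through, and at 90 degrees the two images have effectively swapped eyes.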
The third main type of 3D glasses uses active shutter-splitting. The lenses on such glasses are actually like little calculator screens. The parts of each number of a simple calculator or digital clock (without backlighting) show either clear or black. The same idea is used for each lens on 3D shutter glasses. An electric impulse alternates each lens to show through at the same speed each frame changes on a TV (which can be a normal TV as long as the frame rate is known, it is high enough to accommodate being halved for each image, and the glasses can be manually adjusted into alignment). Each frame of the TV is simply set to show the image intended for alternating eyes. This technique halves the total light and framerate perceived by each eye, but the shutter-splitting glasses do not have to be parallel to the screen (like polarization splitting ones), and they can theoretically work with any television.
The final two viewing distances are ‘television’ and ‘theater’. Television can use any of the lens, barrier, or glasses techniques described up to this point. The glasses techniques tend to work better for viewers at this distance because there are often multiple viewers, and they can’t all view the screen head-on. However, televisions with multiple hotspots (all lumped together rather than spread out across 180 degrees) can work if the viewers know they have to move a little to find one. The theater distance only works with glasses because lenses and barriers cannot provide as many hotspots as potential viewers.
There is one final type of 3D viewing technique that could theoretically work for any of the distances. It is side-by-side images. The glasses distance already uses side-by-side images because they don’t need to overlap to produce the correct results, but all of the other distances do require overlap for pictures of non-negligible size. The side-by-side technique keeps the pictures separate knowing that they can be made to overlap. If a viewer crosses (or uncrosses) their eyes when looking at neighboring dual-images, they can align them to create a third image hybrid of the two (crossing your eyes forces each eye to look at the ‘wrong’ image, uncrossing works if the images are on the correct sides for each eye). It’s a rather uncomfortable technique, but it is effective. It also works if the two images are overlapping, but the resulting image becomes more confusing because the original images overlap the hybrid image. The technique can be accomplished without eye strain by using prismatic glasses, but their use is not very common compared to the other types of 3D glasses.
There are some other ways to create dual-image 3D displays, notably with the use of motion tracking technology to adjust the lenses, barriers, and hotspots of the glasses-free (autostereoscopic) techniques. But some of the other techniques involve more than two different images, so they could not really be considered dual-image displays. Not because they don’t produce two images for each viewer or because they wouldn’t employ similar techniques, but because the need for more than two images inherently requires a different data-set than dual-image displays.
The concept of stereoscopy is technically generic enough to cover dual-image displays and multi-image displays (see “The Ultimate Future of 3D” to read about progressive advancements in multi-image displays), but there will always be a major gap between the two in terms of what they need to function. Dual-images are always meant to line up with your eyes, but when there are three or more images, where do the extra ones go? No generic number of views will ever have the same nicely defined locations as dual-image displays. This means it will always be more accurate to describe a 3D display as either dual-image or multi-image rather than lumping both together as stereoscopic.
Note: I use the term “3D” in this article for three “spatial dimensions.” In “How to Make a Holodeck” the term has time as a possible dimension, so I prefer the term 4D for most instances where people use 3D (i.e. changing 3D images), but I still use 3D for clarity.
4/24/11
Change is Silver