Imagine that a video camera (or webcam) has been taped to your forehead and that you walk around a room or go outside. The camera records everything that you see and hear. Believe it or not, this is essentially how a 3D game works! In an FPS game, you cannot see the player apart from their hands and weapon: the game is shown from a first-person perspective. Even in a game with a third-person perspective (i.e. you can see the player, as in Max Payne), there is still a camera floating behind or in front of the character.
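What we are trying to do in 3D graphics is to display 3D objects on a 2D computer screen. You may have read that our eyes work in much the same way as a camera: each eye forms a flat 2D image on the retina, and the brain works out depth from cues such as perspective and the slight difference between the two eyes' images. So, in a sense, each eye really sees everything in 2D!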
To see how the computer processes and displays 3D graphics on a 2D screen, imagine that we have a car that we wish to put into a 3D game world. We then want the player to be inside the world and view the car in this game world (as in an FPS game). To do this, the computer runs through some mathematics to convert, or transform, the 3D graphics into 2D screen graphics.
The steps to do this are as follows:
Step 1. The 3D object has its own coordinate space called object space or model space. It has its own (x,y,z) coordinate system, measured from the object's own origin. The car, including its shape, colour and texture (the metallic finish), is created using 3D modeling software.
Step 2. We convert model space to world space by placing the car into the game world. Moving the car to its position is called translation. We may also need to scale the car so that it is the correct size, and possibly rotate it so that it faces the right way in the game world. The car’s position in the world is given in world coordinates (x,y,z).
The game world (game universe or game level) will have an origin (or centre point) so that you can put the car at the exact position (x,y,z) that you want. The game world itself would also be created using 3D modeling software and would be of a certain defined size.
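The scale, rotate and translate operations of Step 2 can be sketched in a few lines of Python. This is only an illustration, not how any particular engine does it: the function name model_to_world, the car vertex and the world position are all made up for the example, and the rotation is assumed to be about the vertical (y) axis only.

```python
import math

def model_to_world(vertex, scale, yaw_degrees, position):
    """Scale a model-space vertex, rotate it about the y axis, then translate it."""
    # 1. Scale: make the model the right size for the world.
    x, y, z = (c * scale for c in vertex)
    # 2. Rotate about the vertical (y) axis so the model faces the right way.
    a = math.radians(yaw_degrees)
    xr = x * math.cos(a) + z * math.sin(a)
    zr = -x * math.sin(a) + z * math.cos(a)
    # 3. Translate: move the model to its position in the world.
    px, py, pz = position
    return (xr + px, y + py, zr + pz)

# Hypothetical corner of the car, in the car's own model space.
car_vertex = (1.0, 0.5, 2.0)

# Double the size, turn the car 90 degrees, and park it at (10, 0, 5).
world_vertex = model_to_world(car_vertex, scale=2.0, yaw_degrees=90.0,
                              position=(10.0, 0.0, 5.0))
print(world_vertex)  # approximately (14.0, 1.0, 3.0)
```

The order matters: scaling and rotating are done while the car is still centred on its own origin, and only then is it moved into place.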
Step 3. In order to view the car and other objects in the world, we use a camera, which represents the player in an FPS game. Imagine that you are always looking through a video camera when you are moving around. When you look at the game world and the objects in it via the viewfinder (or digital screen) of the camera, you are looking at view space or camera space, where every position is measured relative to the camera rather than the world origin. The camera itself is positioned at a point (x,y,z) in world space.
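One way to picture the move from world space to camera space is to undo whatever was done to place the camera: subtract the camera's position, then rotate by the opposite of the camera's angle. The sketch below assumes a simplified camera that can only turn left and right (yaw) and that looks along +z in its own space, as in DirectX's left-handed convention. The function name world_to_view and the positions are hypothetical.

```python
import math

def world_to_view(point, cam_pos, cam_yaw_degrees):
    """Express a world-space point relative to a camera that can only yaw."""
    # 1. Undo the camera's translation: measure from the camera, not the origin.
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    # 2. Undo the camera's rotation by rotating the opposite way about y.
    a = math.radians(-cam_yaw_degrees)
    xv = x * math.cos(a) + z * math.sin(a)
    zv = -x * math.sin(a) + z * math.cos(a)
    return (xv, y, zv)

# Hypothetical setup: camera at (2, 1, 3) turned 90 degrees to look along +x,
# with the car 5 units further along the x axis at the same height.
car_in_view = world_to_view((7.0, 1.0, 3.0), cam_pos=(2.0, 1.0, 3.0),
                            cam_yaw_degrees=90.0)
print(car_in_view)  # approximately (0.0, 0.0, 5.0): straight ahead, 5 units away
```

In view space the camera always sits at the origin, so the z value of a point is simply how far in front of the camera it is.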
Step 4. We convert the 3D view space or camera space into a 2D image, i.e. we develop the film in the camera and get a 2D photo (or movie) of the object. But before this happens, the lens of the camera determines how the part that you are viewing will look. This is still 3D space, but it defines what you see in the camera. This is called projection space. In the diagram below, the projection space is the small car and picture at the bottom of the pyramid defined by the camera.
We also need to represent depth or perspective in projection space. This means that the object will appear smaller as it gets further away and appear bigger as it gets closer. The computer will draw what we see inside the lens of the camera – in our case, it is the car and part of the scenery.
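The "smaller when further away" effect comes from dividing by distance. A minimal sketch, assuming the point is already in camera space with z as the distance in front of the camera, and using a made-up focal_length parameter to stand in for the lens:

```python
def project(point, focal_length=1.0):
    """Perspective-project a camera-space point onto a flat image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is at or behind the camera")
    # Dividing by z is what makes distant objects shrink.
    return (focal_length * x / z, focal_length * y / z)

# The same corner of the car, seen 2 units away and then 10 units away.
near = project((1.0, 1.0, 2.0))
far = project((1.0, 1.0, 10.0))
print(near)  # (0.5, 0.5)
print(far)   # (0.1, 0.1)
```

The corner is the same size in the world in both cases, but it lands five times closer to the centre of the image when it is five times further away.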
Step 5. Finally, the things we see in projection space get put onto a 2D screen. This is called screen space. As you can see in the diagram below, what is captured in the camera lens is converted to a 2D screen. As the camera was positioned in front of the car, the player sees the car as well as the background behind it. Note that screen space is 2D space (i.e. just (x,y) coordinates).
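The last step is mostly a matter of stretching the projected picture to fit the screen. The sketch below assumes that projected coordinates run from -1 to +1 in both directions (a common convention, often called normalized device coordinates, which this page does not otherwise cover) and that pixel (0, 0) is the top-left corner of the screen. The function name and the 800 by 600 screen are illustrative.

```python
def to_screen(proj_x, proj_y, width, height):
    """Map projected coordinates in [-1, 1] to pixel coordinates."""
    sx = (proj_x + 1.0) * 0.5 * width
    # Screen y grows downwards, so the vertical axis is flipped.
    sy = (1.0 - proj_y) * 0.5 * height
    return (sx, sy)

centre = to_screen(0.0, 0.0, 800, 600)
top_left = to_screen(-1.0, 1.0, 800, 600)
print(centre)    # (400.0, 300.0)
print(top_left)  # (0.0, 0.0)
```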
A 2D image can look 3D because of perspective. As objects get closer to us, they appear bigger, and as they get further away, they appear smaller. This is how an artist can paint a landscape on a 2D canvas and have it look 3D.
To do all the conversions from model space -> world space -> camera space -> projection space -> screen space, we use matrices (grids of numbers, typically 4x4 in 3D graphics, that move a point from one space to the next when multiplied with its coordinates). What we are converting are the 3D coordinates (x,y,z) of all the vertices of the objects, step by step, until they finally give us 2D screen coordinates and a picture can be drawn on the 2D computer screen as the player sees it.
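A useful property of matrices is that several steps can be multiplied together into one matrix, which is then applied to every vertex. The sketch below is a simplified illustration in plain Python, under several assumptions: vertices are treated as column vectors (DirectX documentation usually uses row vectors, which reverses the multiplication order), the camera has no rotation, and the projection matrix is a toy one that only copies z into the fourth coordinate w so that the final divide gives the perspective effect. All names and numbers are made up for the example.

```python
def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def translation(tx, ty, tz):
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m

def scaling(s):
    m = identity()
    m[0][0] = m[1][1] = m[2][2] = s
    return m

def mat_mul(a, b):
    """4x4 matrix product a * b (b is applied to the vertex first)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, vertex):
    """Apply a 4x4 matrix to an (x, y, z) vertex, then divide by w."""
    col = (vertex[0], vertex[1], vertex[2], 1.0)
    x, y, z, w = (sum(m[i][k] * col[k] for k in range(4)) for i in range(4))
    return (x / w, y / w, z / w)

# Model -> world: double the car's size, then place it at (10, 0, 5).
world = mat_mul(translation(10.0, 0.0, 5.0), scaling(2.0))
# World -> camera: camera at (0, 1, -1) with no rotation, so just shift back.
view = translation(0.0, -1.0, 1.0)
# Camera -> projection: toy matrix whose last row copies z into w.
projection = identity()
projection[3] = [0.0, 0.0, 1.0, 0.0]

model_to_view = mat_mul(view, world)
model_to_projection = mat_mul(projection, model_to_view)

vertex = (1.0, 0.5, 2.0)
in_view = transform(model_to_view, vertex)
projected = transform(model_to_projection, vertex)
print(in_view)    # (12.0, 0.0, 10.0)
print(projected)  # approximately (1.2, 0.0, 1.0)
```

A real projection matrix also encodes the field of view and the clipping planes, but the idea of chaining the spaces by multiplying matrices is the same.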
The camera is an important part of viewing 3D graphics. The following are extracts from the DirectX help files. The camera that represents the player (you could think of the camera as the player’s eyes) looks at the world around it. Whatever is seen by the camera (and the player) is in the space between the Front Clipping Plane and the Back Clipping Plane as shown below. This space is called the Viewing Frustum and represents the 3D space that the player sees in the world; anything outside this space is irrelevant to the player. To this space, the mathematics and physics of lens optics are applied (a camera and our eyes both have lenses), and the 3D space is converted to 2D space, or screen space, which is what the player sees on the computer screen.
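To make the idea of the Viewing Frustum concrete, here is a small sketch of a test that decides whether a point is inside it. It assumes the point is already in camera space with the camera looking along +z, and it describes the frustum with a vertical field of view, an aspect ratio (width divided by height) and the distances to the two clipping planes. The function name in_frustum and all the numbers are illustrative and not taken from the DirectX help files.

```python
import math

def in_frustum(point, fov_y_degrees, aspect, near, far):
    """Return True if a camera-space point lies inside the viewing frustum."""
    x, y, z = point
    # Reject anything in front of the front plane or beyond the back plane.
    if z < near or z > far:
        return False
    # The visible area widens with distance, like a pyramid with its tip cut off.
    half_height = z * math.tan(math.radians(fov_y_degrees) / 2.0)
    half_width = half_height * aspect
    return abs(x) <= half_width and abs(y) <= half_height

settings = dict(fov_y_degrees=90.0, aspect=1.0, near=1.0, far=100.0)
print(in_frustum((0.0, 0.0, 10.0), **settings))   # True: straight ahead
print(in_frustum((0.0, 0.0, 0.5), **settings))    # False: closer than the front plane
print(in_frustum((0.0, 0.0, 200.0), **settings))  # False: beyond the back plane
print(in_frustum((20.0, 0.0, 10.0), **settings))  # False: too far off to the side
```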
The bottom 2D diagram shows the triangular area seen by the camera, called the Field of View (FOV). When we program the camera, we can adjust what the camera is looking at in the world: whether it looks up, down, left, right, etc. We can also adjust the lens of the camera, for example by zooming in or out. Usually, we program the camera lens to a default setting so that objects appear at the size a player’s eye would normally see them, which makes things look real.
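Zooming in a 3D camera usually just means narrowing the field of view. A rough sketch of the effect, assuming a simple perspective camera and a made-up helper called screen_fraction that estimates how much of the screen's height an object fills:

```python
import math

def screen_fraction(object_height, distance, fov_y_degrees):
    """Approximate fraction of the screen height covered by an object."""
    # Height of the visible area at that distance from the camera.
    visible_height = 2.0 * distance * math.tan(math.radians(fov_y_degrees) / 2.0)
    return object_height / visible_height

# A hypothetical 2-unit-tall car, 10 units in front of the camera.
wide = screen_fraction(2.0, 10.0, fov_y_degrees=90.0)
zoomed = screen_fraction(2.0, 10.0, fov_y_degrees=45.0)
print(wide)    # about 0.1 of the screen height
print(zoomed)  # about 0.24: a narrower field of view makes the car look bigger
```

Neither the car nor the camera has moved between the two calls; only the lens setting changed.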
Acknowledgement: some of the images on this page were taken and manipulated from the Microsoft DirectX help file and Microsoft DirectX SDK sample media from their tutorials.