Introduction to programming in C

A Brief History of C:

C originated from a growing need to write programs that could easily be ported to machines with different architectures while still making the most of each architecture. Dennis Ritchie and Ken Thompson had originally written UNIX in assembly language for DEC's PDP-7. When DEC later released its very popular PDP-11, Ritchie and Thompson wanted to bring UNIX to that machine.

Developers back then had a limited set of tools with which to develop software. The popular ones were assembly language and a few high-level languages such as B (a reduced form of another language called BCPL). The problem with languages like B was that they were “typeless”, i.e. they had only one data type, the machine word. On a byte-addressed machine like the PDP-11, for example, handling characters in B was awkward: it required calls to library procedures to unpack strings into individual cells and then repack them. Floating-point arithmetic and pointers caused similar problems, making B significantly slower than equivalent assembly code for such operations.

Dennis Ritchie then started adding data types to B and designed a compiler capable of producing programs fast and small enough to compete with assembly language. Many more features followed, and once a preprocessor had been added, C was born in 1973. Further changes made the language portable, and new C compilers were written to bring C to new platforms. By the 1980s several dialects of C were in use, so a standard specification became necessary; the version standardized by ANSI in 1989 came to be known as ANSI C. (An excellent account of the entire history of C is given by Dennis Ritchie himself at http://cm.bell-labs.com/who/dmr/chist.html )

What happens to a C program?


A C program is written in a file with the extension “.c”. Other files can be brought into it: header files, saved with the extension “.h”, usually contain the declarations of the library functions the program uses. The compiler is essentially a translator that converts a program in one language into a target language (which raises the question of what the compiler itself is written in).

The compiler starts by checking that the program is well formed. The lexical analyzer breaks the source text into tokens; for example, it accepts a variable name only if it begins with a letter or an underscore. Each matched piece of input is called a token. The next stage is the parser, which performs syntactic validation: it checks whether the tokens the lexical analyzer collected “make sense”, i.e. whether they form a valid construct of the language. For example, the lexical analyzer finds nothing wrong with the sequence “a+/b” (it is just the tokens a, +, / and b), but the parser rejects it because it does not fit the grammar of the language. Meanwhile, the compiler records the names it encounters in a “symbol table”, which it uses to check, among other things, that every variable and function used has been declared.

Using these structures, the compiler translates the source code into an intermediate representation, optimizes it, and finally generates code in the target language, usually machine code in an object file such as .o or .obj. (Code generation is a crucial part of the compiler and largely determines its performance; several advanced data structures are involved. A good starting point is http://dragonbook.stanford.edu/lecture-notes/Stanford-CS143/16-Intermediate-Rep.pdf )

But what about functions we did not define ourselves? For those, the compiler needs only their declarations, which are provided by header files. Header files contain just the declarations of functions and structures, so the compiler can check that they are used properly (for example, that arguments have the correct types or that a structure member actually exists). The linker then connects each call to the corresponding compiled code in the library. This split also leaves room for optimization: if the compiler can prove that a call can never execute (as in if (0) { foo(); }), it may drop the call, and the linker then has no reason to pull foo()'s code out of a static library, producing a smaller executable. The declarations from header files are inserted into the source file by the preprocessor, which handles #include directives before compilation proper begins. The final output is an architecture-dependent executable (by default a.out on UNIX/Linux, or a .exe file on Windows).

Why use C?

There is a plethora of programming languages at our disposal now. Languages such as Python and Java make development much simpler and reduce the length of code significantly. They spare us from worrying about memory, data types, pointers and dynamic memory, which can be difficult to handle. Yet more than 40 years after its creation, C remains one of the most widely used languages. There are several reasons for this: it has been around for decades and a huge body of source code is available (Linux being one of the most significant examples); the language is close to the machine and can be used to build efficient system-level software such as OS kernels and device drivers; and C compilers typically produce very fast, compact code. C is also great for learning, because it gives a clearer picture of how advanced topics such as networking actually work.

Data Types and syntax of a C program

Suppose a problem says “design a simple authentication mechanism wherein someone above the age of 18 should be allowed access while someone below 18 should not”. The basic idea is to accept information about a person as input and check whether his/her age is above 18. If so, execute a certain sequence of steps; otherwise print an error message and exit. The information about the person is held in variables, while the sequence of steps executed constitutes a function. Every C program has variables (which represent information about some entity) and functions (which work on the information held by the variables to produce some output).
So a program may contain multiple functions. But where does execution start? After all, the program is executed step by step. The entry point from which the program starts executing is a function called main() (the name is case sensitive). Execution begins in main, and the other functions are called from there. The variables we are going to use are usually declared here.

C inherited from BCPL the concept of using curly braces (‘{’, ‘}’) to mark the beginning and end of a block of statements. The braces also delimit the scope of variables declared inside the block. Statements inside the block can access both the variables declared in the block and those declared outside it, but statements outside the block cannot see the variables declared inside it. So a typical C program looks something like this:

int main()
{
    //initial declarations and definitions
    //some statements working on those variables
    {
        //declarations local to the block
        //some more statements
    }
}

[We will see what the int before main means later]

No programming language can become popular unless it allows code reuse. Nobody would use C if every program had to reimplement basic functionality such as displaying a result or reading user input. C provides a mechanism for using libraries whose variables and functions can be called from your program. A library has two parts: the binary file containing the compiled code of the library functions (it is kept in binary form so that it need not be compiled again and again) and the header file containing the declarations of the functions the library provides. The header file is included in the source code using the #include directive as follows:

#include <some_file_name.h>
#include "some_file_name2.h"

int main()
{
    //statements that may use functions and variables declared in the header files
}

#include is a preprocessor directive, not a function call. Before compilation, a program called the preprocessor acts on all such directives. In this case, it finds the named file and inserts its contents (the declarations of the functions) into the source file. #include <file.h> tells the preprocessor to search the system's standard include directories (which can be configured, for example with compiler options or, for some compilers, environment variables), while #include "file.h" asks it to look first in the same folder as the source file and then fall back to the standard directories. There are other directives as well, such as #define macro_name value, which makes the preprocessor replace every subsequent occurrence of macro_name with value.
Statements and variables.

A statement performs a certain operation. Variables usually appear in statements, either because their values are used in a computation or because their values are updated. In C, each statement is terminated by a ‘;’ character. C differs from its predecessors by offering different data types. The main built-in data types are int, char, float, double and void. int stores integers, which can be written in decimal, hexadecimal or octal notation. char stores the numeric code of a character (typically its ASCII value), while float and double represent floating-point numbers with different levels of precision. void represents the notion of “nothingness”, as in a function that returns nothing or a function that accepts no arguments. void is particularly useful with pointers: a void * is a generic pointer that can hold the address of data of any type, which the program can then interpret as the appropriate type.

Each data type has a definite size, which determines the range of values it can hold. These sizes depend on the underlying architecture and the compiler. A variable's value can be used in operations, including arithmetic with the traditional operators ‘+’, ‘-’, ‘*’, ‘/’ and ‘%’. C also includes the unary increment (++) and decrement (--) operators, each in a prefix form (pre-increment, pre-decrement) and a postfix form (post-increment, post-decrement). The operator precedence, from highest to lowest, is shown below:

Category        Operator                           Associativity
Postfix         () [] -> . ++ --                   Left to right
Unary           + - ! ~ ++ -- (type) * & sizeof    Right to left
Multiplicative  * / %                              Left to right
Additive        + -                                Left to right
Shift           << >>                              Left to right
Relational      < <= > >=                          Left to right
Equality        == !=                              Left to right
Bitwise AND     &                                  Left to right
Bitwise XOR     ^                                  Left to right
Bitwise OR      |                                  Left to right
Logical AND     &&                                 Left to right
Logical OR      ||                                 Left to right
Conditional     ?:                                 Right to left
Assignment      = += -= *= /= %= >>= <<= &= ^= |=  Right to left
Comma           ,                                  Left to right

An expression can contain several operators with equal precedence. When several such operators appear at the same level in an expression, they are grouped according to their associativity, either from right to left or from left to right. (Note that associativity does not define the order in which the operands of a single operator are evaluated.) The grouping does not affect the result of expressions that contain more than one multiplication (*), addition (+), or binary bitwise (& | ^) operator at the same level. The order in which operands are evaluated is, in general, not specified by the language: the compiler is free to evaluate them in any order, as long as it can guarantee a consistent result.
Only the sequential-evaluation (,), logical-AND (&&), logical-OR (||), conditional-expression (? :), and function-call operators constitute sequence points and therefore guarantee a particular order of evaluation for their operands. The function-call operator is the pair of parentheses following the function identifier; its sequence point comes after the function designator and all arguments have been evaluated, just before the call. The sequential-evaluation operator (,) is guaranteed to evaluate its operands from left to right. Note that the commas separating the arguments in a function call are not the sequential-evaluation operator and provide no such guarantee. More on sequence points is given at http://msdn.microsoft.com/en-us/library/azk8zbxd.aspx .

Conditional and iterative statements treat the value 0 as false and any nonzero value as true. A switch statement is often clearer than an if-else ladder when an integer (including char) expression is compared for equality against several constant values. A do-while loop executes its body at least once, since the condition is checked at the end of each iteration. The initialization, condition and update parts of a for loop are all optional. The choice between while and for is arbitrary, based on which seems clearer. The for is usually appropriate for loops in which the initialization and increment are single statements and logically related, since it is more compact than while and it keeps the loop control statements together in one place.

Associated with loops are the break and continue statements. A break statement ends the innermost enclosing loop (or switch) and transfers control to the first statement after it. A continue statement skips the rest of the current iteration: in a while or do-while loop control passes to the condition check, and in a for loop it passes to the update expression and then the condition check. C also supports goto and labels, but their use is generally discouraged.

Integers and characters are stored in binary. Floating-point numbers are stored as a sign, a mantissa and an exponent, with the exponent applied to base 2 rather than 10 (value = ±mantissa × 2^exponent; the widely used IEEE 754 format works this way). Negative integers are almost always represented in two's complement, while the most significant bit (MSB) of a floating-point number indicates its sign. Type modifiers adjust the basic data types in various ways; the main ones are given below:

Group               Modifiers
Size modifiers      short, long
Sign modifiers      signed, unsigned
Constant modifier   const
Volatile modifier   volatile
Storage modifiers   auto, register, static, extern

The size modifiers decrease or increase the size of a data type and hence its range. short and long cannot be applied to float, but long can be applied to double (giving long double). An integer constant without a suffix has type int if its value fits in an int, and otherwise the first of long int or long long int that can hold it; a floating-point constant without a suffix has type double. The const modifier declares variables whose value cannot be changed after initialization. The volatile modifier tells the compiler that the variable's value may change outside the normal flow of the program (for example, a memory-mapped hardware register or a variable updated by an interrupt handler), so every access must actually read or write memory. The register modifier asks the compiler to keep a frequently accessed variable in a processor register; it is only a hint, but the address of a register variable cannot be taken. auto is the default storage class for variables declared inside a block; such variables exist only while the block is executing. static gives a variable a lifetime equal to that of the whole program (at file scope it also limits the name's visibility to that file), and extern declares a variable that is defined elsewhere. Finally, sizeof is an operator that yields the size of its operand in bytes.

C also supports arrays. An array can be declared and initialized as follows:
Data_type arr[] = {a1, a2, ..., an}; or Data_type arr[number];
In C89 the size must be a constant expression, so an array could not be sized by a variable unless it was allocated dynamically with functions like malloc(); C99 added variable-length arrays (made optional in C11). C also supports multidimensional arrays. They are initialized much like 1-D arrays, except that only the first dimension may be left empty:
Data_type arr[][COLS] = {{a1, a2, a3, ..., ai}, {aj, ...}, ...};
They can also be initialized with a flat, 1-D style list, whose values are grouped automatically according to the number of columns per row, so the above is equivalent to
Data_type arr[][COLS] = {a1, a2, ..., an};

Memory Layout of a C program

Once compiled, the program's memory is laid out roughly as in the above diagram. It is divided into several segments. The text segment holds the machine code of the program, and the processor executes instructions from it one after another. Statements usually refer to variables. Global and static variables that are explicitly initialized are stored in the initialized data segment, while global and static variables without an initializer are stored in the bss segment and set to zero (automatic variables are stored in neither). The sizes of all these variables are known at compile time, so adequate space can be reserved for them in advance. (The space actually allocated may be larger than needed, since the OS does not manage memory in individual words but in larger units, called pages or segments depending on the memory management scheme the OS uses.)

The stack and heap segments are crucial for function calls and dynamic memory allocation. When a function is called, control is transferred to a new address (the start of the function's code). The stack stores the automatic variables declared inside the function, and once the function returns, the memory for these variables is released. The heap is used when memory must be allocated dynamically, for example to create an array whose size the user specifies at run time. Here “heap” means a pool of free memory rather than the heap data structure; many allocators keep track of the free blocks using linked lists. The heap should be used carefully: repeatedly allocating memory without freeing it causes memory leaks, which can eventually exhaust the available memory and make further allocations fail. On UNIX-like systems, the size command reports how much space an executable's text, data and bss segments occupy.

Question.1

Consider the expression: p == 0 ? p+=1 : p+=2, where the initial value of p is 10. What will be the result?

1. 11
2. 12
3. Compilation Error
4. Garbage Value

Answer:
(3). This expression gives a compilation error. In C's grammar, the third operand of the conditional operator must be a conditional-expression, so it cannot contain an unparenthesized assignment (the middle operand, between ? and :, can be any expression). The compiler therefore takes the condition p==0, the middle operand p+=1, and only p as the third operand, giving the grouping ((p==0)?(p+=1):(p))+=2. In C the result of the conditional operator is not an lvalue, so it cannot appear on the left of +=, and the compiler reports an error such as “lvalue required as left operand of assignment”. (C++ has different grammar rules here and accepts the expression.)

Question.2

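What will be the output of the following code? [Assume sizeof(long) = sizeof(double) = 8 and sizeof(int) = sizeof(float) = 4; sizeof(char) is always 1.]

#include <stdio.h>

int main(){
printf("%zu\t",sizeof 6.5);
printf("%zu\t",sizeof(90000));
printf("%zu",sizeof('A'));
return 0;
}

1. 8 4 4
2. 16 8 4
3. Compilation Error
4. 8 4 1

Answer:
(1). An unsuffixed floating-point constant such as 6.5 has type double, and an unsuffixed integer constant such as 90000 has type int as long as it fits in the range of int. A character constant such as 'A' also has type int in C (unlike C++, where it has type char, so sizeof('A') would be 1). The %zu conversion is the correct one for printing a size_t value such as the result of sizeof. Note that the parentheses after sizeof are required only when the operand is a type name (like int or float).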

Question.3

Consider the order of the modifiers in the following declarations (written inside a function):

(i)char volatile register unsigned c;
(ii)volatile register unsigned char c;
(iii)register volatile unsigned char c;
(iv)unsigned char volatile register c;

1. All are incorrect
2. Only (i) is correct
3. Only (ii) is correct
4. All are correct

Answer:

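(4). The order of the specifiers in a declaration does not matter in C, so all four are valid and declare the same thing: a volatile register unsigned char. (The C standard does, however, mark placing a storage-class specifier such as register anywhere other than first as an obsolescent feature, so writing it first is the recommended style.)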

Question:4

What will be the output when you execute the following C code?

#include <stdio.h>

int main(){
int a=-5;
unsigned int b=-5u;
if(a==b)
printf("Avatar");
else
printf("Alien");
return 0;
}

1. Output will be Avatar
2. Output will be Alien
3. Compile Time Error: Illegal assignment
4. Compile Time Error: Cannot compare signed and unsigned numbers.

Answer:

(1). int a=-5;
Here variable a is a (signed) int.
unsigned int b=-5u;
The constant -5u is computed in unsigned arithmetic: 5u is unsigned, and negating it wraps around modulo UINT_MAX + 1. With a 32-bit unsigned int, b = 4294967296 - 5 = 4294967291 (with a 16-bit unsigned int it would be 65536 - 5 = 65531).
In a binary operation on operands of different types, such as a == b, the usual arithmetic conversions first bring both operands to a common type. When an int meets an unsigned int, the int is converted to unsigned int, not the other way round.
So a is converted to unsigned int, and -5 becomes the same value, 4294967291.
Hence a==b is true and the program prints Avatar.

Question:5

What will be the output of the following C program?

#include <stdio.h>
int main(){
int _=5;
int __=10;
int ___;
___=_+__;
printf("%i",___);
return 0;
}

1. 5
2. 10
3. 15
4. Compilation error

Answer:

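(3). The program works like a normal C program, since an identifier may consist entirely of underscores: _, __ and ___ are three distinct variables, and ___ = 5 + 10 = 15. (Strictly speaking, identifiers beginning with two underscores are reserved for the implementation, so names like __ and ___ should be avoided in real code, even though compilers accept them here.)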

Question 6

What will be the output of the following C program?

#include <stdio.h>

int main(){
int abcdefghijklmnopqrstuvwxyz123456789=10;
int abcdefghijklmnopqrstuvwxyz123456=40;
printf("%d",abcdefghijklmnopqrstuvwxyz123456);
return 0;
}

1. 40
2. 10
3. Compiler error
4. Garbage value

Answer:

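(1) on any modern compiler. The C standard only requires compilers to treat a minimum number of leading characters of an internal identifier as significant: 31 in C89 and 63 since C99. Here the two names first differ at the 33rd character, and compilers such as GCC and Clang treat all characters as significant, so the names are distinct and the program prints 40. On an older compiler that examined only the first 32 characters, the two names would be treated as the same identifier and the second declaration would be reported as a redeclaration, which is why answer (3) is sometimes given.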

Question 7 

What will be the output of the following program

int main()
{
int printf=12;
int a=10;
int b=a+printf;
}

1. Compile time error: Could not find ‘printf’
2. Run time error: Failed to load printf
3. No output
4. 22

Answer
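(3). The program compiles and runs. printf is not a keyword but an ordinary identifier, so it can be used as a variable name. Here stdio.h is not included anyway, and even if it were, the local variable would simply hide the library function inside main. Since the program never prints anything, there is no output.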

Question 8

#include <stdio.h>
int main()
{
int x = 10, y = 20, z = 5, i;
i = x < y < z;
printf("%d\n", i);
return 0;
}

1. 0
2. 1
3. Compiler Error
4. Garbage value

Answer

(2). The relational operator ‘<’ has higher precedence than the assignment ‘=’, so it is evaluated first. Since there are two ‘<’ operators and their associativity is left to right, the expression is grouped as (x < y) < z. x < y yields 1, since 10 < 20. Then 1 < z, i.e. 1 < 5, also yields 1, so i is assigned 1 and the program prints 1.

Question 9

What will be the output of the following program

#include <stdio.h>
int main()
{
int a[5] = {2, 3};
printf("%d, %d, %d\n", a[2], a[3], a[4]);
return 0;
}

(1). 0 0 0
(2). Garbage Values
(3). 3 followed by Garbage values
(4). 2 2 2

Answer:

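(1). When an array is initialized with fewer values than it has elements, C initializes the remaining elements to 0, so a[2], a[3] and a[4] are all 0 (the program prints 0, 0, 0).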

Question 10

#include <stdio.h>
int main()
{
float a=0.7;
if(a < 0.7)
printf("C\n");
else
printf("C++\n");
return 0;
}

(1). C
(2). C++
(3). Garbage value
(4). Compilation Error

Answer:

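(1). In if (a < 0.7), a is a float variable and 0.7 is a double constant. 0.7 cannot be represented exactly in binary, and storing it in a float rounds it to the nearest float value, which (with IEEE 754 types) is slightly less than 0.7: about 0.699999988. For the comparison, a is converted back to double, and the double constant 0.7 (about 0.69999999999999996) is the more precise value, so a < 0.7 is true and the program prints ‘C’. The result depends on the value: with 0.5, which is exactly representable, the program would print ‘C++’.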

Question 11

#include <stdio.h>
int main(){
int i=5;
int a=++i + ++i + ++i;
printf("%d",a);
return 0;
}
(1). 22
(2). 21
(3). 24
(4). Depends on architecture

Answer:

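(4). The behavior of this expression is undefined: i is modified three times without an intervening sequence point, so the C standard places no requirements on the result. Different compilers, and even different optimization settings of the same compiler, can print different values. Such expressions should never be used in practice.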

Question. 12

#include <stdio.h>
int main(){
int i,j;
i=j=2,3;
while(--i&&j++)
printf("%d %d",i,j);
return 0;
}

(1). 1 3
(2). 2 3
(3). Compiler Error
(4). 2 31 4

Answer:
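(1). Since assignment has higher precedence than the comma operator, i=j=2,3; is grouped as (i=j=2),3; and sets both i and j to 2. In the while condition, --i decrements i to 1, and j++ yields 2 (after which j becomes 3); 1 && 2 is 1, which the loop treats as true, so the program prints 1 3. In the next iteration --i makes i 0, so the && short-circuits (j++ is not evaluated) and the loop terminates.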