
Mar 27, 2012

Java 5 Executor Framework - why use thread pools?

Q. What is a thread pool, and how will you create them in Java? Why do you need an Executor framework?
A. A thread pool consists of a collection of worker threads. A work queue is optional, though most of the advanced implementations have a configurable work queue. The threads in the pool constantly check the work queue for new work, and when a new task arrives, one of the pooled threads executes it.

In Java 5, the Executor framework was introduced with the java.util.concurrent.Executor interface. It was introduced to address some of the shortcomings discussed below.

1. The Executor framework is a framework for standardizing invocation, scheduling, execution, and control of asynchronous tasks according to a set of execution policies.


2. Even though creating a thread is more lightweight than creating a process, creating threads still consumes a lot of resources. Creating a new thread for each task consumes more stack memory, as each thread has its own stack, and the CPU spends more time on context switching. Creating many threads with no bound on the maximum can cause an application to run out of memory. So, a thread pool is a better solution, as a finite number of threads can be pooled and reused. The Runnable or Callable tasks are placed in a queue, and the finite number of threads in the pool take turns processing the tasks in the queue.


Here is the sample code:



import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Sum  implements Runnable {
 
    private static final int NO_OF_THREADS= 3;
 
    int maxNumber;
 
    public Sum(int maxNumber) {
       this.maxNumber = maxNumber;
    }
 
    /** method where the thread execution will start **/
    public void run(){
        int sum = 0;
        for (int i = 0; i <= maxNumber; i++) {
           sum += maxNumber;
        } 
        
        System.out.println("Thread " + Thread.currentThread().getName() + " count is " + sum);
    }
    
    
    /** main thread. Always there by default. **/
    public static void main(String[] args) {
       ExecutorService executor = Executors.newFixedThreadPool(NO_OF_THREADS);   // create a pool of 3 threads
       for (int i = 10000; i < 10100; i++) {
          Runnable worker = new Sum(i);               // create the runnable tasks
          executor.execute(worker);                   // add runnables to the work queue 
       }
  
       // This will make the executor accept no new threads
       // and finish all existing threads in the queue
       executor.shutdown();
  
       // Wait until all tasks have completed (busy-waiting; executor.awaitTermination(..) is generally preferable)
       while (!executor.isTerminated()) {

       }
  
       System.out.println("Finished all threads");
    }

}

3. The Runnable interface's void run( ) method has no way of returning any result back to the main thread. The executor framework introduced the Callable interface that returns a value from its call( ) method. This means the asynchronous task will be able to return a value once it is done executing.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;


public class Sum  implements Callable<String> {
 
 private static final int NO_OF_THREADS = 3;
 
 int maxNumber;
 
 public Sum(int maxNumber) {
    this.maxNumber = maxNumber;
 }
 
  /** method where the thread execution will start
    *  this can return a value
    */
    public String call(){
        int sum = 0;
        for (int i = 0; i <= maxNumber; i++) {
            sum += maxNumber;
        } 
        
        return Thread.currentThread().getName() + " count is " + sum;
    }
    
    
    /** main thread. Always there by default. **/
    public static void main(String[] args) {
      ExecutorService executor = Executors.newFixedThreadPool(NO_OF_THREADS);                       // create a pool of 3 threads
      List<Future<String>> list = new ArrayList<Future<String>>(10);  // provides facility to return results asynchronously
     
      for (int i = 10000; i < 10100; i++) {
        Callable<String> worker = new Sum(i);                 // create worker threads 
        Future<String> submit = executor.submit(worker);      // add callables to the work queue
        list.add(submit);                                            // provides facility to return results asynchronously
      }
  
      //process the results asynchronously when each thread completes its task
      for (Future<String> future : list) {
        try {
            System.out.println("Thread " + future.get());
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
           e.printStackTrace();
        }
      }
  
  
      executor.shutdown();
  
      System.out.println("Finished all threads");
   }

}

The output is something like

Thread pool-1-thread-1 count is 100010000
Thread pool-1-thread-2 count is 100030002
Thread pool-1-thread-3 count is 100050006
Thread pool-1-thread-1 count is 100070012
Thread pool-1-thread-1 count is 100090020
...

4. The various Executor implementations provide different execution policies to be set while executing the tasks. For example, the thread pools created via the Executors factory methods support the following policies:

  • newFixedThreadPool: Creates threads as tasks are submitted, up to the maximum pool size, and then attempts to keep the pool size constant.
  • newCachedThreadPool: Can add new threads when demand increases, no bounds on the size of the pool.
  • newSingleThreadExecutor: Single worker thread to process tasks; guarantees order of execution based on the queue policy (FIFO, LIFO, priority order).
  • newScheduledThreadPool: Fixed-size, supports delayed and periodic task execution.
5. The ExecutorService provides facilities to shut down an executor gracefully, abruptly, or somewhere in between.
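
For example, a minimal sketch of the graceful versus abrupt shutdown calls (the class and the submitted task are made up for illustration):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownDemo {

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(2);

        executor.execute(new Runnable() {
            public void run() {
                System.out.println("task running in " + Thread.currentThread().getName());
            }
        });

        executor.shutdown();                                  // graceful: no new tasks accepted, queued tasks still run
        if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
            executor.shutdownNow();                           // abrupt: interrupt running tasks and discard the queue
        }
    }
}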

Q. What design pattern does the executor framework use?
A. The Executor framework is based on the producer-consumer design pattern, where the threads that submit tasks are the producers and the threads that execute tasks are the consumers. In the above examples, the main thread is the producer, as it loops through and submits tasks, and the pooled worker threads executing the "Sum" tasks are the consumers. See a dedicated write-up on the producer-consumer design pattern for a more detailed explanation.
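
For illustration, here is a minimal, hand-rolled producer-consumer sketch using a BlockingQueue (the queue size, task, and class name are made up); conceptually this is what the executor's work queue and worker threads do internally:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {

    public static void main(String[] args) {
        final BlockingQueue<Runnable> workQueue = new ArrayBlockingQueue<Runnable>(10);

        // consumer: takes tasks off the queue and runs them (like a pool worker thread)
        // note: this demo worker loops forever; a real pool interrupts its workers on shutdown
        new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        workQueue.take().run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "worker").start();

        // producer: the main thread submits tasks onto the queue
        for (int i = 0; i < 5; i++) {
            final int taskId = i;
            workQueue.offer(new Runnable() {
                public void run() {
                    System.out.println("executed task " + taskId);
                }
            });
        }
    }
}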



Mar 26, 2012

Java multi-threading interview questions and answers: atomic operations

Q. Can you give some examples of thread race conditions you have experienced?
A.

1. Declaring variables in JSP pages is not thread-safe, because the declared variables end up as instance variables in the generated servlet.

<%! Calendar c = Calendar.getInstance(); %>

2. Declaring instance variables in servlets is not thread-safe, as servlets are inherently multi-threaded and get accessed by multiple threads. The same is true for the Action classes in the Struts framework.

3. Some of the Java standard library classes like SimpleDateFormat are not thread-safe. Always check the API to see whether a particular class is thread-safe. If a particular class or library is not thread-safe, you could do one of three things.


  • Provide your own wrapper class that decorates the third-party library with proper synchronization. This is a typical use of the decorator design pattern.
  • Use an alternative library, which is thread-safe if available. For example, Joda Time Library. 
  • Use it in a thread-safe manner. For example, you could use the SimpleDateFormat class as shown below within a ThreadLocal class. Each thread will have its own instance of the SimpleDateFormat object.


import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateFormatTest {

  //anonymous inner class. Each thread will have its own copy
  private final static ThreadLocal<SimpleDateFormat> shortDateFormat =  new ThreadLocal<SimpleDateFormat>() {
            protected SimpleDateFormat initialValue() {
                 return new SimpleDateFormat("dd/MM/yyyy");
             }
  };
   
      
 public Date convert(String strDate)
                     throws ParseException {
 
    //get the SimpleDateFormat instance for this thread and parse the date string  
    Date d = shortDateFormat.get().parse(strDate);
    return d;
  }

}

4. The one that is very popular with the interviewers is writing the singleton classes that are not thread-safe.

Q. Can you have a true singleton class in Java? How would you write a thread-safe singleton class?
A. A singleton class is something for which only one instance exists per class loader; a single instance for a whole application cannot be guaranteed. That is simply the definition of what a singleton is. The question that is popular with interviewers is writing a thread-safe singleton class. For example, the following singleton class is not thread-safe because, before one thread finishes creating the Singleton instance, another thread can reach the instantiation part of the code -- instance = new Object(); -- and create more than one instance. Even though instance = new Object(); appears to be a single line, the JVM has to execute a number of internal steps, such as allocating memory, creating the new object, and assigning it to the reference variable. Only after these steps complete will the condition instance == null return false.

//final so that cannot be subclassed
public final class Singleton {
 
    private static Object instance = null;
 
    //private so that it cannot be instantiated from outside this class
    private Singleton() {}
 
    public static Object getInstance() {
        if (instance == null) {
            instance = new Object(); 
        }
 
        return instance;
    }
}


So, you can make the above code thread-safe in a number of ways.


Option 1: Synchronize the whole method or the block of code. This approach is not efficient as the use of synchronized keyword in a singleton class means that only one thread will be executing the synchronized block at a time and all other threads would be waiting.
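
For example, a minimal sketch of Option 1 (the class name is just for illustration), following the same style as the singleton above:

//final so that cannot be subclassed
public final class SynchronizedSingleton {

    private static Object instance = null;

    //private so that it cannot be instantiated from outside this class
    private SynchronizedSingleton() {}

    //only one thread at a time can enter this method, so only one instance is ever created
    public static synchronized Object getInstance() {
        if (instance == null) {
            instance = new Object();
        }
        return instance;
    }
}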


Option 2: Eagerly initialize the singleton instance when the class is loaded, as opposed to initializing it lazily at run time only when it is accessed.


//final so that cannot be subclassed
public final class ThreadSafeSingleton {
 
    //eager initialization: instantiated as soon as the class is loaded by a classloader in the JVM 
    private static Object instance = new Object();
 
    //private so that it cannot be instantiated from outside this class
    private ThreadSafeSingleton() {}
 
    public static Object getInstance() { 
        return instance;
    }
} 

Option 3: You can use the "Initialization-on-Demand Holder Class" idiom (often credited to Bill Pugh, and described in Brian Goetz's Java Concurrency in Practice) to create a thread-safe, lazily-initialized singleton by using a static inner holder class, as shown below.


public final class ThreadSafeSingleton {
 
    //private so that it cannot be instantiated from outside this class
    private ThreadSafeSingleton() {}
    
    //static inner class, invoked only when ThreadSafeSingleton.getInstance() is called
    private static class ThreadSafeSingletonHolder {
        private static ThreadSafeSingleton instance = new ThreadSafeSingleton();
    }
 
    public static Object getInstance() { 
        return ThreadSafeSingletonHolder.instance;
    }
}


Option 4: Create a per-thread singleton, as discussed earlier with the ThreadLocal class for the SimpleDateFormat.
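
A minimal sketch of Option 4, along the lines of the earlier SimpleDateFormat example (the class name is illustrative); note that this gives one instance per thread rather than one per class loader:

public final class PerThreadSingleton {

    //each thread gets its own lazily created instance
    private static final ThreadLocal<PerThreadSingleton> instance = new ThreadLocal<PerThreadSingleton>() {
        protected PerThreadSingleton initialValue() {
            return new PerThreadSingleton();
        }
    };

    //private so that it cannot be instantiated from outside this class
    private PerThreadSingleton() {}

    public static PerThreadSingleton getInstance() {
        return instance.get();
    }
}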



Q. Explain how you would get thread-safety issues due to non-atomic operations with a code example?
A. The code snippets below demonstrate non-atomic operations producing incorrect results. The program uses a shared Counter object that is shared between three concurrent users (i.e. three threads). The Counter object is responsible for incrementing the counter.


Firstly, the Counter class. The counted values are stored in a HashMap, keyed by name (i.e. the thread name), for later retrieval.


import java.util.HashMap;
import java.util.Map;

public class Counter {

 //shared variable or resource
 private Integer count = Integer.valueOf(0); 
 
 private Map<String, Integer> userToNumber = new HashMap<String, Integer>(10);

 public void  increment() {
  try {
   count = count + 1;   //increment the counter
   Thread.sleep(50);    // to imitate other operations and to make the race condition occur more often for the demo
   Thread thread = Thread.currentThread();
   userToNumber.put(thread.getName(), count);
  } catch (InterruptedException e) {
   e.printStackTrace();
  }

 }
 
 
 public Integer getCount(String name) {
  return userToNumber.get(name);
 }
 
}




Next, the Runnable task where each thread will be entering and executing concurrently.


public class CountingTask implements Runnable {
 
 
 private Counter counter;

 public CountingTask(Counter counter) {
  super();
  this.counter = counter;
 }

 @Override
 public void run() {
  counter.increment();
  Thread thread = Thread.currentThread();
  System.out.println(thread.getName() + " value is " + counter.getCount(thread.getName()));
  
 }

}



Finally, the Manager class that creates 3 new threads from the main thread.


public class CountingManager {
 
 public static void main(String[] args) throws InterruptedException {
  
  Counter counter = new Counter(); // create an instance of the Counter
  CountingTask task = new CountingTask(counter); // pass the counter to the runnable CountingTask

  
  //Create 3 user threads (non-daemon) from the main thread that share the counter object  
  Thread thread1 = new Thread(task, "User-1");
  Thread thread2 = new Thread(task, "User-2");
  Thread thread3 = new Thread(task, "User-3");
  
  
  //start the threads
  thread1.start();
  thread2.start();
  thread3.start();
  
  
  //observe the race conditions in the output
  
 }

}


To see the race condition, inspect the output of the above code:


User-3 value is 3
User-1 value is 3
User-2 value is 3


All three threads or users get assigned the same value of 3 due to the race condition; we expect to see three different count values assigned, from 1 to 3. What happened here is that while the first thread incremented the count from 0 to 1 and entered the sleep(50) block, the second and third threads incremented the count from 1 to 2 and from 2 to 3 respectively. This shows that the two operations -- the operation that increments the count and the operation that stores the incremented value in the HashMap -- are not atomic, and they produce incorrect results due to the race condition.


Q. How will you fix the above racing issue?
A. This can be fixed in a number of ways.


Option 1: Method level synchronization. This is the simplest approach. As you can see, the increment() method is synchronized, so other threads must wait for the thread that holds the lock to finish executing the method.

import java.util.HashMap;
import java.util.Map;

public class Counter {

 //shared variable or resource
 private Integer count = Integer.valueOf(0); 
 
 private Map<String, Integer> userToNumber = new HashMap<String, Integer>(10);

 public synchronized void  increment() {
  try {
   count = count + 1;
   Thread.sleep(50);
   Thread thread = Thread.currentThread();
   userToNumber.put(thread.getName(), count);
  } catch (InterruptedException e) {
   e.printStackTrace();
  }

 }
 
 
 public Integer getCount(String name) {
  return userToNumber.get(name);
 }
 
}


Option 2: Even though Option 1 is simple, it locks the entire method and can adversely impact performance for long-running methods, as each thread has to execute the entire method one at a time. So, Option 1 can be improved by using a block-level lock: lock only those operations that act on the shared resource and make it non-atomic.


The code below synchronizes on a private Object, which has its own intrinsic lock, to ensure that two threads cannot execute operations 1 and 2 at the same time because there is only one lock.

import java.util.HashMap;
import java.util.Map;

public class Counter {

 //shared variable or resource
 private Integer count = Integer.valueOf(0); 
 
 private Map<String, Integer> userToNumber = new HashMap<String, Integer>(10);
 
 private Object mutex = new Object();   // a lock

 public void  increment() {
  try {
   synchronized(mutex) {
     count = count + 1;                         //operation 1
     Thread.sleep(50); 
     Thread thread = Thread.currentThread();
     userToNumber.put(thread.getName(), count); //operation 2
   }
   // there could be other operations here that uses the shared resource as read only
   
  } catch (InterruptedException e) {
   e.printStackTrace();
  }

 }
 
 public Integer getCount(String name) {
  return userToNumber.get(name);
 }
 
}


Option 3: This is a trivial but practical example. Java 5 introduced explicit locks (java.util.concurrent.locks), which can be used in place of synchronized blocks for more flexible locking scenarios. Locks offer more flexibility in that a thread can release multiple locks it holds in a different order than the order in which they were obtained. Here is the code that replaces synchronized with a reentrant lock. Note that synchronized blocks in Java are also reentrant: if a thread enters a synchronized block, thereby taking the lock on the object the block is synchronized on, it can enter other blocks synchronized on the same lock object.

For example, here is a demo of reentrancy with synchronized methods.

public class Reentrant {

  public synchronized void method1() {
    method2();    //calls another synchronized method on the same object
  }

  public synchronized void method2() {
    //do something
  }
}



Here is the Option 3 example using a ReentrantLock.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class Counter {

 // shared variable or resource
 private Integer count = Integer.valueOf(0);

 private Map<String, Integer> userToNumber = new HashMap<String, Integer>(10);

 private Lock mutex = new ReentrantLock(); // a lock

 public void increment() {
  try {
   mutex.lock();
   try {
    count = count + 1;
    Thread.sleep(50); 
    Thread thread = Thread.currentThread();
    userToNumber.put(thread.getName(), count);
   } finally {
    mutex.unlock(); // the finally block is executed even if an exception is thrown
   }
  } catch (InterruptedException e) {
   e.printStackTrace();
  }

 }

 public Integer getCount(String name) {
  return userToNumber.get(name);
 }

}



Note that the locks are unlocked in a finally block as it is executed even if an exception is thrown.

The output for the above 3 options will be something like shown below. The order cannot be guaranteed. But you will get unique numbers assigned for each user.

User-1 value is 1
User-3 value is 2
User-2 value is 3



Q. The following code snippet changes the Counter class to maintain individual counts, so that each user's counter is incremented starting from 1 and the count itself is no longer a shared resource. The CountingTask class is also modified so that each user loops twice, as shown below. Is there anything wrong with the code shown below?



The Counter class with individual counts

import java.util.HashMap;
import java.util.Map;

public class Counter {

 private Map<String, Integer> userToNumber = new HashMap<String, Integer>(10);

 public void increment() {
  Thread thread = Thread.currentThread();
  if (!userToNumber.containsKey(thread.getName())) {
   userToNumber.put(thread.getName(), Integer.valueOf(1));  //op1
  } else {
   Integer count = userToNumber.get(thread.getName());
   if (count != null) {
    ++count; // op2: increment it
    userToNumber.put(thread.getName(), count); //op3
   }
  }

 }

 public Integer getCount(String name) {
  return userToNumber.get(name);
 }
}


The counting task that repeats twice for each user

public class CountingTask implements Runnable {

 private Counter counter;

 public CountingTask(Counter counter) {
  super();
  this.counter = counter;
 }

 @Override
 public void run() {

  for (int i = 0; i < 2; i++) {
   counter.increment();
   Thread thread = Thread.currentThread();
   System.out.println(thread.getName() + " value is "
     + counter.getCount(thread.getName()));
  }
 }

}


A. If each user is accessed by only one thread, then the above code is thread-safe because each user operates on his/her own map entry; only one thread will access the entry for User-1, and so on. But what happens if User-3 has two threads, as shown below?

Threads 3 and 4 both run as "User-3". In this scenario, the above code is not thread-safe, and it needs to be made atomic with one of the three options discussed above. It can be quite dangerous to assume that one user will be accessed by only one thread. What if, in the future, additional threads are added to improve performance per user?

public class CountingManager {
 
 public static void main(String[] args) throws InterruptedException {
  
  Counter counter = new Counter(); // create an instance of the Counter
  CountingTask task = new CountingTask(counter); // pass the counter to the runnable CountingTask

  
  //Create 4 user threads (non-daemon) from the main thread that share the counter object; two of them run as "User-3"  
  Thread thread1 = new Thread(task, "User-1");
  Thread thread2 = new Thread(task, "User-2");
  Thread thread3 = new Thread(task, "User-3"); //user 3
  Thread thread4 = new Thread(task, "User-3"); //User 3
  
  
  //start the threads
  thread1.start();
  thread2.start();
  thread3.start();
  thread4.start();
  
  
  //observe the race conditions in the output
  
 }

}


If you don't perform operations 1 to 3 atomically (i.e. as a unit), you will get an output like:

User-1 value is 1
User-1 value is 2
User-3 value is 2
User-3 value is 3
User-3 value is 2
User-3 value is 4
User-2 value is 1
User-2 value is 2



As you can see, User-3 has the value 2 repeated twice and the value 1 is missing. If you apply one of the options outlined above, you will get an output like:

User-1 value is 1
User-1 value is 2
User-3 value is 1
User-3 value is 2
User-2 value is 1
User-2 value is 2
User-3 value is 3
User-3 value is 4


Hence, operations 1-3 need to be made atomic if accessed concurrently by multiple threads (a sketch of one fix follows the list below). Those three operations are

1. storing the initial value
2. incrementing the counter
3. storing the incremented value
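
For example, applying Option 1 (method level synchronization) to this version of the Counter is a minimal sketch of one fix:

import java.util.HashMap;
import java.util.Map;

public class Counter {

    private Map<String, Integer> userToNumber = new HashMap<String, Integer>(10);

    //the whole check-then-act sequence (operations 1-3) now executes atomically per Counter instance
    public synchronized void increment() {
        Thread thread = Thread.currentThread();
        if (!userToNumber.containsKey(thread.getName())) {
            userToNumber.put(thread.getName(), Integer.valueOf(1));   //op1: store the initial value
        } else {
            Integer count = userToNumber.get(thread.getName());
            ++count;                                                  //op2: increment it
            userToNumber.put(thread.getName(), count);                //op3: store the incremented value
        }
    }

    //reads are also synchronized so that each thread sees the latest values
    public synchronized Integer getCount(String name) {
        return userToNumber.get(name);
    }
}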


Mar 21, 2012

Java Interview Questions and Answers - performance testing your Java application

I have yet to work on a project or in an organization that has not had performance or scalability issues. It is a safe bet to talk up your achievements in fixing performance or resource leak issues in job interviews. It is a pet subject of many interviewers, and if you are interviewing with an organization that is facing performance issues, you will be quizzed on it.


Q. How would you go about performance testing your Java application?
A.

1. Use a profiler on your running code. It should help you identify the bottlenecks; for example, JProfiler or the NetBeans profiler. A profiler is a program that examines your application as it runs and provides useful run-time information such as time spent in particular code blocks, memory/heap usage, etc. You could also use JConsole, which ships with the JDK.


2. You also need to set up performance test scripts with JMeter to simulate load. Most issues relating to performance bottlenecks, memory leaks, thread-safety, and deadlocks only surface under certain load. The performance testing scripts can be recorded while the application is being used and then manually refined; tools like the JMeter HTTP Proxy recorder or Badboy can be used to record the scripts.

3. You could provide a custom solution with the help of AOP (aspect oriented programming) or dynamic proxies to intercept your method calls. Dynamic proxies allow you to intercept method calls so you can interpose additional behavior between a class caller and its "callee".

For example, without AOP or a dynamic proxy:


public interface Customer {
 
 public abstract void getDetails(String customerId) ;

}

public class CustomerImpl implements Customer{

 public void getDetails(String customerId) {
  long startTime = System.currentTimeMillis();
  try {
   // Actual method body...
   Thread.sleep(2000);
  } catch (InterruptedException e) {
   e.printStackTrace();
  } finally {
   long endTime = System.currentTimeMillis();
   System.out.println("method took: " + (endTime - startTime) + "ms");
  }
 }

}


The above approach is very intrusive. Imagine if you had to add this to 30 or 40 other methods. How do you turn this metrics monitoring on and off without commenting and uncommenting your actual methods? With a dynamic proxy or Spring-based AOP, you can alleviate this problem. Here is an example of the dynamic proxy class.


import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;

//uses Java  reflection
public class PerformanceLogger implements InvocationHandler {

 protected Object delegate; //actual object that performs a function

 public PerformanceLogger(Object delegate) {
  this.delegate = delegate;
 }

 @Override
 public Object invoke(Object proxy, Method method, Object[] args)
   throws Throwable {

  long startTime = System.currentTimeMillis();
  
  try {
   Object result = method.invoke(delegate, args); // invoke the delegate
   return result;
  }

  finally {
   long endTime = System.currentTimeMillis();
   System.out.println("method took: " + (endTime - startTime) + "ms");
  }
 }

}


Note: If you are using Java 5 or higher, use System.nanoTime() instead of System.currentTimeMillis(). For example,


long start = System.nanoTime(); // requires java 1.5
//some process
double elapsedTimeInSec = (System.nanoTime() - start) * 1.0e-9;

The test class that tests the above code

import java.lang.reflect.Proxy;


public class PerformanceTest {
 
 
 public static void main(String[] args) {
    Customer cust = new CustomerImpl();  
    PerformanceLogger pl = new PerformanceLogger(cust); 
    cust = (Customer)Proxy.newProxyInstance(CustomerImpl.class.getClassLoader(), new Class[] {Customer.class}, pl);
    cust.getDetails("8");
      
 }

}


With this approach, you don't need to put the System.currentTimeMillis( ) in your delegate classes. It only needs to be in PerformanceLogger, and it will intercept actual calls to the delegates to print performance metrics. If you want to make changes as to how the metrics are collected or want to turn off System.currentTimeMillis( ), you will only have to do it in one class, that is the dynamic proxy class PerformanceLogger.

If you are using Spring, you could use AOP-based interceptors. An AOP-based deadlock retry with Spring follows a very similar approach.
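
If Spring with AspectJ-style annotations is available, the same timing logic could be expressed as an aspect along these lines (the pointcut package com.myapp.service is a made-up example, and the aspect still has to be registered with the Spring context):

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class PerformanceAspect {

    //intercept every method in the (hypothetical) service package
    @Around("execution(* com.myapp.service..*(..))")
    public Object logTiming(ProceedingJoinPoint pjp) throws Throwable {
        long start = System.nanoTime();
        try {
            return pjp.proceed();   // invoke the actual method
        } finally {
            double elapsedMs = (System.nanoTime() - start) / 1.0e6;
            System.out.println(pjp.getSignature() + " took: " + elapsedMs + "ms");
        }
    }
}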

Note: I have been in interviews where the interviewer was more interested in the custom solution, asking:

Q. If you did not have a profiling tool, how would you go about gathering performance metrics?
A. AOP or dynamic proxy based solution as discussed above. You could also mention the built-in command-line and graphical tools that come with Java  like JConsole, jstack, jmap, hprof, vmstat, etc. But the main focus must be via interceptors.

Design pattern: If you are asked to describe or talk about a design pattern, you could mention this dynamic proxy class as an example of the proxy design pattern. Many candidates pick either the singleton or the factory pattern; it is nicer to pick something other than these two common patterns, and some interviewers specifically ask you to pick anything except the factory and singleton patterns.

Q. What tips do you give someone regarding Java performance?
A.



1. Don't compromise on design: You should not compromise on architectural principles just for performance. Make the effort to write architecturally sound programs as opposed to merely fast programs. If your architecture is sound, it will allow your program not only to scale better but also to be optimized for performance if it is not fast enough. If you write an application with a poor architecture that performs well for the current requirements, what happens when the requirements grow? An architecture that is not flexible enough to extend creates a maintenance nightmare where fixing code in one area breaks code in another, eventually forcing a rewrite. So think about extendibility (i.e. the ability to evolve with additional requirements), maintainability, ease of use, performance, and scalability (i.e. the ability to run on multiple servers or machines) during the design phase. List all possible design alternatives and pick the one that is architecturally sound (i.e. scalable, easy to use, maintain, and extend) and can be optimized later if it is not fast enough. Once you get the design right, measure with a profiler and then optimize.

2. Be aware of death by a thousand cuts: Having said not to compromise on the design, you need to be mindful of performance inefficiencies that can creep in throughout development. For example, an inefficient method being called 50-100 times can adversely impact performance. A real-life example is a JSF application that invokes its life-cycle methods many times per request, so a long-running operation inside a life-cycle method may end up being executed more than once. So, know your best practices and potential pitfalls. Here are a few things to keep in mind.

  • Use immutable objects where applicable. Immutable objects are inherently thread-safe and can also be reused, which makes them good candidates for the flyweight design pattern (a simple immutable class is sketched after this list). The following Java method is an example of a flyweight.

Integer.valueOf(String s)

        Integer.valueOf caches Integer instances for small values (by default -128 to 127), so when you pass
        a String representing one of these values, it returns an existing instance rather than creating a new object.


  • Check your regexes and SQL queries for backtracking and Cartesian joins respectively.


  • Define your objects and variables with the right scope. Variables should be defined at the narrowest scope possible (i.e. prefer local to instance to static).
  • Use proven libraries, frameworks, built-in algorithms, and data structures as opposed to creating your own. For example, when handling concurrency, use the java.util.concurrent package.
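
To illustrate the first bullet, here is a minimal sketch of an immutable value class (the class and fields are made up):

//immutable: final class, final fields, no setters; safe to share between threads and to cache/reuse
public final class Money {

    private final String currency;
    private final long amountInCents;

    public Money(String currency, long amountInCents) {
        this.currency = currency;
        this.amountInCents = amountInCents;
    }

    public String getCurrency() {
        return currency;
    }

    public long getAmountInCents() {
        return amountInCents;
    }

    //"modifying" operations return a new instance instead of mutating this one
    public Money add(long cents) {
        return new Money(currency, amountInCents + cents);
    }
}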

3. Always have a performance focus by developing proper load testing scripts and benchmarks. The non-functional requirements should cover the relevant SLAs (i.e. Service Level Agreements). Tune your application server, database server, application, etc. where required, with proper benchmarking and load testing scripts. Mission-critical applications should have runtime metrics gathering in place, often via commercial tools. There are tools that can be used in a production environment, like YourKit for Java and JProfiler for Java, and for larger distributed and clustered systems with a large number of nodes there are profilers like CA Wily Introscope for Java, ClearStone for Java, and HP Performance Manager. These tools are handy for proactive detection and isolation of server/application bottlenecks, historical performance trend tracking for capacity planning, and real-time monitoring of system performance.

Q. When designing your new code, what level of importance would you give to each of the following attributes, and how would you rank them in order of importance?

-- Performance
-- Maintainability
-- Extendibility
-- Ease of use
-- Scalability


A. This is an open-ended question with no single correct order. The order can vary from application to application, but typically if you write code that is 1 - extendable, 2 - maintainable, and 3 - easy to use, with some high-level performance considerations, it should allow you to optimize/tune later for 4 - performance and 5 - scalability. But if you write code that only performs fast and is not flexible enough to grow with additional requirements, you may end up rewriting it or carrying out a major revamp.

Note: These types of questions have no right or wrong answers, but they can reveal a lot about your experience, passion, and attitude towards building quality software. Good software developers and architects are opinionated based on their past experiences.

For writing load/performance/stress testing scripts, JMeter is a very useful free tool. Learn more about Java performance testing with JMeter.


Mar 7, 2012

Enterprise Java Interview Questions and Answers: What is new in JEE 6?

It is imperative to keep track of the key enhancements and improvements to JEE. If you are interested in JEE basics, then try the Enterprise Java frequently asked questions and answers.


Q. What is new in Java EE 6 (JEE 6) compared to Java EE 5 (JEE 5)?
A.

  • JEE 6 provides more extensibility points and service provider interfaces for providers to plug into. For example, the "Java Authentication Service Provider Interface for Containers" provides a mechanism for authentication providers to integrate with containers.
  • JEE 5 favored convention over configuration by replacing XML with annotations and POJOs. JEE 6 extends POJOs with dependency injection (i.e. JSR 299 -- Contexts and Dependency Injection - CDI). This enables a JSF managed bean component to interact with an enterprise Java bean (i.e. EJB) component model to simplify development, and it finally unifies the various types of managed beans. In Java EE 6, CDI builds on a new concept called "managed beans", which are managed by the enterprise edition container. In CDI, a managed bean is a Java EE component that can be injected into other components. The specification also provides a set of services like resource injection, lifecycle callbacks, and interceptors.
  • Any CDI managed component may produce and consume events. This allows beans to interact in a completely decoupled fashion. Beans consume events by registering for a particular event type and qualifier (a sketch follows after this list).
  • The JAX-WS stack was overhauled to be an integrated stack with JAX-WS 2.x, JAXB 2.x, SAAJ 1.x, JAX-RS 1.x (new RESTful) and JAXR 1.x. (new pull-parsing API). 
  • The Servlet 3.0 spec introduced the async servlet, file upload functionality, and annotation-based configuration. The web.xml file is now optional.
  • Singleton EJBs (i.e. one EJB instance per container) and asynchronous session beans were introduced. EJB was streamlined with fewer classes and interfaces, and simpler object-to-relational mapping by taking advantage of JPA.
  • A bean-based validation framework was introduced to avoid duplication of validation logic across layers.
  • Enterprise Java Beans (i.e EJBs) can be packaged directly into a WAR file. No longer required to be packaged as JAR and then included into an EAR.
  • JSF 2.0 simplifies the development of UI components. It has integrated Ajax and CDI support.
Note: "Convention over configuration" is a design paradigm, which seeks to standardize, simplify, and decrease the number of decisions that developers need to make without compromising on flexibility. For example, ANT build tool allowed you to come up with any project structure, whereas Maven is strict on convention as to how the project is structured.
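
To illustrate the CDI event bullet above, here is a hedged sketch (the OrderPlaced event and bean names are made up, and each class would normally live in its own file):

import javax.enterprise.event.Event;
import javax.enterprise.event.Observes;
import javax.inject.Inject;

//the event payload: a plain (preferably immutable) object
class OrderPlaced {
    private final String orderId;
    OrderPlaced(String orderId) { this.orderId = orderId; }
    String getOrderId() { return orderId; }
}

//producer bean: fires the event without knowing who (if anyone) is listening
class OrderService {
    @Inject
    private Event<OrderPlaced> orderPlacedEvent;

    public void placeOrder(String orderId) {
        // ... persist the order ...
        orderPlacedEvent.fire(new OrderPlaced(orderId));
    }
}

//consumer bean: the container calls this method whenever an OrderPlaced event is fired
class OrderAuditor {
    public void onOrderPlaced(@Observes OrderPlaced event) {
        System.out.println("audited order " + event.getOrderId());
    }
}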

Q. How does the new bean validation framework avoid duplication?
A. Developers often code the same validation logic in multiple layers of an application, which is time consuming and error-prone. At times they put the validation logic in their data model, cluttering it with what is essentially metadata. JEE 6 improves this with a much improved annotation-based bean validation framework that reduces duplication. Bean Validation offers a framework for validating Java classes written according to JavaBeans conventions: you use annotations to specify constraints on a JavaBean. For example,

The JavaBean is defined below

public class Contact {

    @NotEmpty @Size(max=100)
    private String firstName;

    @NotEmpty @Size(max=100)
    private String surname;
    
    @NotEmpty @Pattern("[a-zA-Z]+")
    private String category;
    
    @ShortName
    private String shortName; //custom validation 

    ...

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    ...
}

Custom validators can be defined by declaring a constraint annotation (using @Constraint to point to the validator implementation) and the corresponding implementation class:

@Constraint(validatedBy = ShortNameValidator.class)
@Documented
@Target({ElementType.METHOD, ElementType.FIELD, ElementType.ANNOTATION_TYPE})
@Retention(RUNTIME)
public @interface ShortName {
       String message() default "Wrong name";
       Class<?>[] groups() default {};
       Class<? extends Payload>[] payload() default {};
}



Next the validation implementation class

public class ShortNameValidator implements ConstraintValidator<ShortName, String> {

    private final static Pattern SHORTNAME_PATTERN = Pattern.compile("[a-zA-Z]{5,30}");

    public void initialize(ShortName constraintAnnotation) {
        // nothing to initialize
    }
 
 
 public boolean isValid(String value, ConstraintValidatorContext context) {
        return SHORTNAME_PATTERN.matcher(value).matches();
    }

} 


You could use the validator as shown below

Contact contact = new Contact();
contact.setFirstName("Peter");
//.... set other values

ValidatorFactory validatorFactory = Validation.buildDefaultValidatorFactory();
Validator validator = validatorFactory.getValidator();
 
Set<ConstraintViolation<Contact>> violations = validator.validate(contact);



Q. What are the benefits of the asynchronous processing support that was introduced in Servlet 3.0 in JEE 6?
A.

1. If you are building an online chess game or a chat application, the client browser needs to be periodically refreshed to reflect the changes. This used to be achieved via a technique known as server polling (aka client pull or client refresh). You can use the HTML <META> tag for polling the server; it tells the client it must refresh the page after a number of seconds.

<meta http-equiv="refresh" content="10; url=newPage.html" />

The browser will load newPage.html after 10 seconds. This approach has the downside of wasting network bandwidth and server resources. With the introduction of asynchronous support, data can be sent via a mechanism known as server push as opposed to server polling: the client waits for the server to push updates instead of frequently polling the server.



2. Ajax calls are an integral part of any web development, as they provide a richer user experience. This also means that with Ajax the clients (i.e. browsers) interact with the server more frequently than in a page-by-page request model. If an Ajax request needs to invoke a very time-consuming server-side call (e.g. report generation), synchronous processing of these requests can degrade the overall performance of the application, because servers generally service concurrent requests with a thread pool containing a finite number of threads, and those threads will be blocked. Asynchronous processing allows these time-consuming requests to be throttled via a queue and the same thread(s) to be recycled to process queued requests without tying up other threads from the server thread pool. This approach can be used for non-Ajax requests as well.
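
A hedged sketch of what a Servlet 3.0 asynchronous servlet might look like (the servlet name, URL pattern, and "report generation" are made up):

import java.io.IOException;

import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/report", asyncSupported = true)
public class ReportServlet extends HttpServlet {

    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        final AsyncContext asyncContext = req.startAsync();  // frees the container thread immediately
        asyncContext.start(new Runnable() {                  // the slow work runs on another thread
            public void run() {
                try {
                    // ... long-running report generation ...
                    asyncContext.getResponse().getWriter().write("report ready");
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    asyncContext.complete();                 // signal that async processing is done
                }
            }
        });
    }
}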


Note: In JEE 6, The EJB 3.1 can also specify a Session Bean to be asynchronous.



Q. What are the benefits of web fragments introduced in the Servlet 3.0 spec?
A. Web applications use frameworks like JSF, Struts, Spring MVC, Tapestry, etc. These frameworks normally bootstrap (i.e. register) themselves via the web.xml file using the <servlet> and <listener> tags. For example:


The web.xml file

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="2.5" 
    xmlns="http://java.sun.com/xml/ns/javaee" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://java.sun.com/xml/ns/javaee 
    http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">

    <servlet>
        <servlet-name>My JSFServlet</servlet-name>
        <servlet-class>javax.faces.webapp.FacesServlet</servlet-class>
        <load-on-startup>1</load-on-startup>
    </servlet>

    <servlet-mapping>
        <servlet-name>My JSFServlet</servlet-name>
        <url-pattern>/faces/*</url-pattern>
    </servlet-mapping>

</web-app>

If a particular application uses more than one framework, the above approach is not modular, as you will have to bootstrap all the frameworks within the same web.xml file, making it large and difficult to isolate framework-specific descriptors. The Servlet 3.0 specification addresses this issue by introducing web fragments. A web fragment can be considered a segment of the whole web.xml, so that one or more web fragments constitute a single logical web.xml file. The fragment files are stored as /META-INF/web-fragment.xml inside the framework's JAR, and it is the responsibility of the container to scan the fragment files during server start-up.

The web-fragment.xml file

<web-fragment>
    <servlet>
        <servlet-name>myFrameworkSpecificServlet</servlet-name>
        <servlet-class>myFramework.myFrameworkServlet </servlet-class>
    </servlet>

    <listener>
        <listener-class>myFramework.myFrameworkListener</listener-class>
    </listener>
</web-fragment>


Note: The Servlet 3.0 specification also provides enhanced pluggability by providing an option to add servlets, filters, servlet mappings and filter mappings during the run time.


Q. What do you understand by the term "Web profile" in Java EE 6?
A. Java EE 6 introduces the concept of profiles as a way of slimming the footprint of the Java EE platform and better targeting it at specific audiences. Profiles are configurations of the Java EE platform designed for specific classes of applications. For example, the required elements for the Web Profile are:

  • Java EE 6 (JSR-316)
  • Common Annotations for Java Platform 1.1 (JSR-250)
  • Dependency Injection for Java 1.0 (JSR-330)
  • Contexts and Dependency Injection for Java EE platform 1.0 (JSR-299)
  • Servlet 3.0 (JSR-315)
  • JavaServer Pages (JSP) 2.2 (JSR-245)
  • Expression Language (EL) 2.2 (JSR-245)
  • Debugging Support for Other Languages 1.0 (JSR-45)
  • Standard Tag Library for JavaServer Pages (JSTL) 1.2 (JSR-52)
  • JavaServer Faces (JSF) 2.0 (JSR-314)
  • Enterprise JavaBeans (EJB) 3.1 Lite (JSR-318)
  • Java Transaction API (JTA) 1.1 (JSR-907)
  • Java Persistence API (JPA) 2.0 (JSR-317)
  • Bean Validation 1.0 (JSR-303)
  • Managed Beans 1.0 (JSR-316)
  • Interceptors 1.1 (JSR-318)


The JEE 6 has also introduced the concept known as "pruning" to manage complexities. This is similar to the concept introduced in Java SE 6. Pruning is performed as a multistep process where a candidate is declared in one release but may be relegated to an optional component in the next release, depending on community reaction. For example, JAX-RPC will be pruned and replaced by JAX-WS. However, if Java EE application server vendors do include a pruned technology, they must do so in a compatible way, such that existing applications will keep running. The profiles and pruning are debatable topics and only time will tell if they work or not.



Mar 1, 2012

Database interview questions and answers



Q. What do you understand by the terms clustered index and non-clustered index?
A. When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key, so there can be only one clustered index per table. Non-clustered indexes have their own storage space, separate from the table data storage. Clustered and non-clustered indexes are stored as B-tree (balanced tree) structures that keep data sorted and give average O(log n) performance for deletes, inserts, and searches. The leaf-level nodes of a non-clustered index hold the index key and its row locator for faster retrieval, while the leaf level of a clustered index is the data itself.



Q. What is the difference between primary key and unique key?
A. Both primary key and unique key enforce uniqueness of the column on which they are defined. But by default, a primary key creates a clustered index on the column, whereas a unique key creates a non-clustered index. Another major difference is that a primary key doesn't allow NULL values, whereas a unique key allows a single NULL (in SQL Server, for example).

Q. What are the pros and cons of an index?
A.

PROS

  • If an index does not exist on a table, a table scan must be performed for each table referenced in a database query. The larger the table, the longer a table scan takes because a table scan requires each table row to be accessed sequentially. So, indexes can improve search performance, especially for the reporting requirements.


CONS

  • Excessive non-clustered indexes can consume additional storage space.
  • Excessive non-clustered indexes can adversely impact the performance of INSERT, UPDATE, and DELETE statements, as the indexes need to be maintained (updated) after each of these operations.
So, it is essential to have a right balance based on the usage pattern.


    Q. What are the pros and cons of stored procedures?
    A.

    PROS

    • pre-compiled and fewer network trips for faster performance
    • less susceptible to SQL injection attacks
    • more precise control over transactions and locking
    • can abstract complex data processing from application by acting as a facade layer.

    CONS

    • There is a risk of larger chunks of business logic and duplication creeping into stored procedures, causing maintenance issues. Writing and maintaining stored procedures is most often a specialized skill set that not all developers possess, which may introduce bottlenecks in the project development schedule.
    • Less portable.The stored procedures are specific to a particular database.
    • Scaling a database is much harder than scaling an application.
    • The performance benefit of fewer network trips can often be achieved in the application tier instead, by caching the relevant data.


    So, when should stored procedures be used ?

    Stored procedures are ideal when there is a complex piece of business logic that needs complex data processing involving a lot of database operations. If this logic is required in many different places, then a stored procedure makes even more sense. For example, batch jobs and complex report generation that perform lots of database operations.


    So, when shouldn't stored procedures be used ?

    When you are performing basic CRUD (Create, Read, Update, and Delete) operations. For example, in a web application a user creates some data, reads the created data, and then updates or deletes some of it.



    Q. How would you go about writing a stored procedure that needs to loop through a number of selected rows?
    A. You need to use a cursor. A cursor is basically a pointer used for row-by-row operations. For example, you can create a cursor by selecting a number of records into it. Then you can fetch one row at a time and perform operations such as invoking another stored procedure with the selected row value as an argument. Once you have looped through all the records, you need to close and deallocate the cursor. For example, the stored procedure below, written in Sybase, demonstrates the use of a cursor.

    Apply to the database "mydatabase"

    use mydatabase
    go
    
    


    Drop the stored procedure if it already exists

    IF OBJECT_ID('dbo.temp_sp') IS NOT NULL
    BEGIN 
    
        DROP PROCEDURE dbo.temp_sp
        IF OBJECT_ID('dbo.temp_sp') IS NOT NULL
            PRINT '<<< FAILED DROPPING PROCEDURE dbo.temp_sp >>>'
        ELSE
            PRINT '<<< DROPPED PROCEDURE dbo.temp_sp >>>'
    END
    go
    
    

    Create the stored procedure that uses cursor


    create proc temp_sp 
    
    as
       DECLARE @ADVISERID char(10)
       DECLARE advisers_cur cursor 
         for select adviser_id  FROM  tbl_advisers where adviser_id like 'Z%'  -- select adviser_ids starting with 'Z'
         for read only
        
       
       open  advisers_cur                     -- open the cursor
       FETCH advisers_cur INTO @ADVISERID     -- store value(s) from the cursor into declared variables
       
       --@@sqlstatus is a Sybase implicit variable that returns the success/failure status of the previous statement execution
       WHILE (@@sqlstatus = 0)
       BEGIN
          SELECT @ADVISERID                   -- select the adviser_id stored into  @ADVISERID
          FETCH advisers_cur INTO @ADVISERID  --store value(s) from the cursor into declared variables
       END
    
       close advisers_cur
       deallocate cursor advisers_cur
    
    go
    
    

    Execute the stored procedure that uses a cursor


    exec mydatabase..temp_sp
    


    Q. Why should you deallocate the cursors?
    A. You need to deallocate the cursor to clear the memory space occupied by it. This frees the space for other use.


    Q. How would you go about copying bulk data in and out of a database?
    A. The process is known as bulk copy, and the tools used for it are database specific. For example, Sybase and SQL Server provide a utility called "bcp", which allows you to export bulk data into comma-delimited (or other delimited) files and then import that data back into a different database or table. In Oracle, you achieve this via SQL*Loader. DB2 has the IMPORT and LOAD commands to achieve the same.


    Q. What are triggers? what are the different types of triggers?
    A. Triggers are named database programs (similar to stored procedures) that are stored in the database and implicitly run, or fire, when an event like INSERT, UPDATE, or DELETE happens to a table. There are three types of DML triggers, which fire before or after INSERT, UPDATE, or DELETE events. There could be other database-specific triggers as well.

    Q. When to not use a trigger, and when is it appropriate to use a trigger?
    A.

    When to not use a trigger?


    Database triggers need to be used very judiciously, as they are executed every time an event like an insert, update, or delete occurs. Don't use a trigger where:
    • database constraints like unique constraint, not null, primary key, check constraints, etc can be used to check for data validity.
    • triggers are recursive.

    Where to use a trigger?

    • Maintaining complex integrity constraints (referential integrity) or business rules where other types of constraints cannot be used. Because triggers are executed as part of the SQL statement (and its containing transaction) causing the row change event, and because the trigger code has direct access to the changed row, you could in theory use them to correct or reject invalid data.
    • Auditing information in a table by recording the changes. Some tables are required to be audited as part of the non-functional requirement for changes.
    • Automatically signaling other programs that action needs to take place when changes are made to a table.
    • Collecting and maintaining aggregate or statistical data. 


    Q. If one of your goals is to reduce network loads, how will you about achieving it?
    A.
    • You can use materialized views to distribute load from a master site to other regional sites. Instead of the entire company accessing a single database server, user load is distributed across multiple database servers with the help of multi-tier materialized views. This lets you distribute the load to materialized view sites instead of master sites. To decrease the amount of data that is replicated, a materialized view can be a subset of a master table or master materialized view.

    • Write stored procedures to minimize network round trips.

    • Carefully craft your SQL to return only the required data. For example, don't do select * from tbl_mytable; instead, specify the columns you are interested in: select firstname, surname from tbl_mytable.

    • You can set the fetch size to an appropriate value to get the right balance between data size and the number of network trips made (a JDBC sketch follows below).
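
    For example, from Java the fetch size can be hinted via JDBC (the connection URL and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class FetchSizeExample {

        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection("jdbc:yourdb://host:port/db", "user", "password");
            try {
                PreparedStatement ps = con.prepareStatement("select firstname, surname from tbl_mytable");
                ps.setFetchSize(500);          // hint: fetch up to 500 rows per network round trip
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    System.out.println(rs.getString("firstname") + " " + rs.getString("surname"));
                }
            } finally {
                con.close();
            }
        }
    }
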
    Q. What are the other uses of materialized views?
    A.
    • Materialized view is one of the key SQL tuning approaches to improve performance by allowing you to pre-join complex views and pre-compute summaries for super-fast response time.

    • Materialized views are schema objects that can be used to summarize, precompute, replicate, and distribute data. E.g. to construct a data warehouse, reporting, etc. A materialized view can be either read-only, updatable, or writable. Users cannot perform data manipulation language (DML) statements on read-only materialized views, but they can perform DML on updatable and writable materialized views.

    • A materialized view provides indirect access to table data by storing the results of a query in a separate schema object, unlike an ordinary view, which does not take up any storage space or contain any data. You can define a materialized view on a base table, partitioned table, or view, and you can define indexes on a materialized view.

    Q. If you are working with a legacy application, and some of the database tables are not properly designed with the appropriate constraints, how will you go about rectifying the situation?
    A. One possible solution is to write triggers to perform the appropriate validation. Here is an example of an insert trigger.

    CREATE TRIGGER TableA_itrig
      ON TableA  FOR INSERT
    AS
    BEGIN
     
     IF @@rowcount = 0
      RETURN
    
     IF NOT EXISTS
     (
      SELECT  *
      FROM  inserted ins, TableB ol
      WHERE  ins.code = ol.code
     )
     
     BEGIN
      RAISERROR 20001, "The associated object is not found"
      ROLLBACK TRAN
      RETURN
     END
     
    END
    


    Q. If you are working on a new application that requires stringent auditing requirements, how would you go about achieving it?
    A. Since it is a new application, there are a number of options as listed below.
     
    • The application is designed from the beginning so that all changes are logged either synchronously or asynchronously. Asynchronously means publishing the auditing messages to a queue or topic, with a separate process receiving these messages and writing them to a database or flat file. All data changes go through a data access layer of the application, which logs all changes.
          
    • The database is constructed in such a way that logging information is included in each table, perhaps set via a trigger. This approach may adversely impact performance when inserts and updates are very frequent.

    Q. What if you have to work with an existing legacy application?
    A.  Use triggers.


    SQL Interview Questions and Answers
