Researchers have proved that you never need more than three layers (input, hidden, and output); this is essentially the universal approximation theorem, which says a single hidden layer with enough nodes can approximate any continuous function. In a very few cases it may be more convenient to have four layers, but you should understand why you need a fourth layer before you add it. So just stick with three.
Increasing the number of nodes does not necessarily improve accuracy and may even decrease it; it also forces you to gather a correspondingly larger training sample.
Here's an analogy: Generate a set of data points from the equation y = ax + b (choose whatever a and b you want, but keep them constant once you have chosen them). To each data point add some random noise. Now fit a curve to your noisy data. The obvious choice is the line y = ax + b itself, which should fit the data set very well despite the noise you have added.
But if you wanted to, you could choose a high-degree polynomial, say y = ax^10 + bx^9 + cx^8 + dx^7 + ex^6 + fx^5 + gx^4 + hx^3 + ix^2 + jx + k. You could find a set of coefficients, a through k, that fits your data set better than the simple y = ax + b relationship. Should the 10th-degree polynomial therefore be used to model your data? No. Why? Because if you wanted to predict additional data values, the relationship y = ax + b would do a far more reliable job of prediction than the 10th-degree polynomial. It produces very stable results, while the high-degree polynomial may predict wildly wrong answers, particularly when you have to extrapolate rather than interpolate.
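Here is a minimal sketch of that experiment in Python using NumPy. The constants a = 2 and b = 1, the noise level, and the sample counts are illustrative assumptions, not values prescribed above.

```python
import numpy as np

# Fit a line and a 10th-degree polynomial to noisy linear data,
# then compare how each one extrapolates past the training range.
rng = np.random.default_rng(0)

a, b = 2.0, 1.0                        # chosen constants for y = ax + b
x_train = np.linspace(0.0, 1.0, 20)    # training region
y_train = a * x_train + b + rng.normal(scale=0.2, size=x_train.size)

linear = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)
degree10 = np.polynomial.Polynomial.fit(x_train, y_train, deg=10)

x_test = np.linspace(1.2, 2.0, 5)      # extrapolation region
y_true = a * x_test + b
print("   x     true   linear  degree-10")
for x, t, y_lin, y_deg10 in zip(x_test, y_true, linear(x_test), degree10(x_test)):
    print(f"{x:5.2f}  {t:6.2f}  {y_lin:6.2f}  {y_deg10:10.1f}")
```

On a typical run the linear column stays close to the true values, while the degree-10 column wanders off badly as x moves away from the training interval.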
The ultimate purpose of your neural net is to predict, and in that sense it is very much like a curve-fitting process. A net with fewer nodes may fit your training data more poorly than a net with more nodes, but when it comes time to predict, the smaller net will tend to be more stable and produce more realistic answers than the larger one.
When you go beyond the bounds of the training set (analogous to extrapolating the curve fit), a neural net with a large number of nodes is likely to be extremely unstable, whereas a much simpler net may produce far more reasonable answers.
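As a rough illustration (not a rigorous experiment), here is a sketch using scikit-learn's MLPRegressor to train a small net and an oversized one on the same noisy linear data. The layer sizes, noise level, iteration count, and test points are assumptions chosen only to make the contrast visible; exact numbers will vary with initialization.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X_train = rng.uniform(0.0, 1.0, size=(200, 1))    # training region
y_train = 2.0 * X_train.ravel() + 1.0 + rng.normal(scale=0.1, size=200)

small = MLPRegressor(hidden_layer_sizes=(3,), max_iter=5000, random_state=0)
large = MLPRegressor(hidden_layer_sizes=(300,), max_iter=5000, random_state=0)
small.fit(X_train, y_train)
large.fit(X_train, y_train)

X_test = np.array([[1.5], [2.0], [3.0]])          # beyond the training range
print("true: ", 2.0 * X_test.ravel() + 1.0)
print("small:", small.predict(X_test))
print("large:", large.predict(X_test))
```

Runs vary, and on a target this simple both nets can look reasonable; the point is that the oversized net has far more freedom to wander once it leaves the region covered by the training data.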
If I recall correctly, the minimum number of training samples should be the product of the number of input nodes and the number of hidden nodes. You can always use more training samples, but never fewer. So as you increase the number of nodes, you dramatically increase the number of training samples required before you can start to get reasonable answers.
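Treating that recalled rule as a heuristic (it is a rule of thumb, not a theorem), it is a one-liner to encode; the node counts below are made-up examples:

```python
def min_training_samples(n_inputs: int, n_hidden: int) -> int:
    # Heuristic floor on sample count: inputs times hidden nodes.
    return n_inputs * n_hidden

# 10 inputs and 5 hidden nodes -> at least 10 * 5 = 50 samples;
# doubling the hidden layer to 10 nodes doubles the floor to 100.
print(min_training_samples(10, 5))    # 50
print(min_training_samples(10, 10))   # 100
```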