Non-convergence in Non-linear Estimation

Remember that convergence is rarely guaranteed for non-linear estimates. If your problem never converges, it may not be a bug in Stata and it may not be worthwhile letting it run forever. It may be that you are not presenting Stata with the data you think you are. But before giving up, you should try the following strategies.

Can you estimate a linear regression with the same variable set? Maybe one of your variables has no variation, or is otherwise defective. If -reg- fails, a non-linear estimator is unlikely to succeed.
Insignificant variables create flat spots in the likelihood function. What happens if you remove most of the independent variables and keep only a few very significant ones? Does it converge then? If so, add the other variables back slowly and identify the problematic ones for disposal. This is especially important for high-dimensionality problems.
Are your data items of wildly different magnitudes? Year and year cubed are less of a strain on the optimizer if rescaled to (year-1980) and (year-1980)^3 (assuming 1980 is the mean year). As a bonus, the coefficients are easier to understand. Scale the data so that all variables fall into similar ranges.
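The three checks above might look something like the following do-file. This is only a sketch: it uses the auto dataset that ships with Stata, and the outcome (foreign) and regressors (weight, mpg, price) are stand-ins I picked for illustration, not anything from your problem. The centering of weight plays the same role as (year-1980) in the text.

```stata
* Illustrative only: auto.dta and these variables are assumptions, not your model
sysuse auto, clear

* Check 1: look for variables with no variation (Std. dev. of 0) or odd ranges
summarize foreign weight mpg price

* Check 2: confirm that a linear regression runs on the same variable set
regress foreign weight mpg price

* Check 3: center and rescale so the regressors fall into similar ranges
summarize weight
generate weight_c = (weight - r(mean)) / 1000   // centered, thousands of pounds
generate price_k  = price / 1000                // thousands of dollars

* Start with a small model, then add variables back one at a time
logit foreign weight_c
logit foreign weight_c mpg
logit foreign weight_c mpg price_k
```

If one of the later -logit- calls is the first to fail, the variable added in that step is the one to look at more closely.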
There are several -maximize_options- that can be useful. See
http://www.stata.com/manuals13/rmaximize.pdf
for the Stata documentation, but here are my suggestions:
-trace- displays the parameter values at each iteration. Are there pairs of coefficients that are shooting off to plus or minus infinity? That indicates a perfect predictor. If the likelihood continues to improve, with stabilizing parameter values, that is progress.
-tolerance- allows you to loosen the convergence criterion. If the optimizer seems to be stuck at a particular place, perhaps that is the optimum?
-from- allows you to specify starting values. This is especially appropriate for the situation where "It used to converge, but doesn't now".
-technique- allows you to specify which of several hill-climbing techniques should be used, or which combination. Where one fails, another may succeed. Furthermore, what is appropriate at the initial values may not be optimal near convergence.
-difficult- is said to be useful when the optimizer complains "not concave".
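Here is a rough sketch of how each of these options is attached to an estimation command. Again the auto dataset and the -logit- model are just placeholders of my choosing, and the particular numbers (50 iterations, a tolerance of 1e-4, 10 iterations per technique) are arbitrary values for illustration rather than recommendations.

```stata
* Illustrative only: dataset, model, and option values are assumptions
sysuse auto, clear

* Show the parameter values at each iteration
logit foreign weight mpg, trace

* Cap the number of iterations and loosen the coefficient tolerance
logit foreign weight mpg, iterate(50) tolerance(1e-4)

* Save estimates from a smaller model and use them as starting values
logit foreign weight mpg
matrix b0 = e(b)
logit foreign weight mpg price, from(b0)

* Alternate techniques: 10 iterations of BFGS, then 10 of Newton-Raphson
logit foreign weight mpg, technique(bfgs 10 nr 10)

* Use a different stepping algorithm in non-concave regions
logit foreign weight mpg, difficult
```

The options can be combined on a single command, so it is reasonable to leave -trace- on while experimenting with the others.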

If your problem converges in Stata N, but fails in Stata N+1, that may just mean that Stata N is better at your problem. Compare the log likelihoods from the two versions to confirm or refute this. If N+1 reaches a higher likelihood, then perhaps there is a problem: Stata N may have stopped short of the true maximum.
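To make that comparison, you could run the same few lines in each version and set the output side by side. This is a sketch with a placeholder model; substitute your own estimation command. The stored results used here (e(ll), e(converged), e(ic)) are what -logit- leaves behind, and other estimators may store a different set, so check -ereturn list- for yours.

```stata
* Illustrative only: replace the logit call with your own estimation command
sysuse auto, clear
logit foreign weight mpg

* Record what this version of Stata achieved
display "Stata version:  " c(stata_version)
display "Converged:      " e(converged)
display "Iterations:     " e(ic)
display "Log likelihood: " %12.6f e(ll)
```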