I've had a go at assessing the value of adding cities later in the game.
I'm taking startno's formula from post 11: total production = a.x.(100-3x), where x is the number of equal-sized cities and each city has a production of a.
The balance point at which the total production for the x cities is equal to that of (x+1) cities is
x.(100-3x) = (x+1)(100-3(x+1))
which resolves to x = 97/6, about 16.2, the same answer startno reached by differentiating, i.e. about 16 cities is best.
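As a quick sanity check on that balance point, here is a minimal Python sketch that simply tabulates x.(100-3x) for whole numbers of cities, taking a = 1. The function name total_production is my own invention, and the 3% penalty per city is just the figure from the formula as quoted.

```python
def total_production(x, a=1.0):
    """Total output of x equal-sized cities, each producing a,
    with every city losing 3% per city owned (formula from the thread)."""
    return a * x * (100 - 3 * x)

# Tabulate the totals either side of the expected peak.
for x in range(14, 20):
    print(x, total_production(x))
```

The curve is very flat near the top, so 16 and 17 cities come out within a whisker of each other.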
However, as various people have commented, later cities will be smaller and less productive than earlier cities for most or all of their lives, so the penalty on those earlier cities is more damaging.
If we guess that an extra city started mid-game is worth only half of an early-game city, then the formula becomes:
x.(100-3x) = (x+1/2)(100-3(x+1))
which resolves to a peak at 11 cities.
Later still, a city added in the late game might be worth, say, 1/4 of an early city, so:
x.(100-3x) = (x+1/4)(100-3(x+1))
which resolves to just 6.
So once you have 6 or 7 big cities in the late game, adding another new city will gain you next to nothing or even reduce your total output.
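The three cases can be rolled into one formula. This is only a sketch under my own assumptions: w is the guessed worth of the new city as a fraction of an early city, the penalty for the enlarged empire is read as 100-3(x+1) (the reading that reproduces the 16, 11 and 6 figures), and the names break_even_cities and gain_from_new_city are made up for illustration. Solving x.(100-3x) = (x+w)(100-3(x+1)) for x gives x = 97w / (3(1+w)).

```python
def break_even_cities(w, cap=100, penalty=3):
    """Number of existing cities x at which adding a city worth w
    (as a fraction of a full city) leaves total production unchanged.
    Solves x*(cap - penalty*x) = (x + w)*(cap - penalty*(x + 1))."""
    return w * (cap - penalty) / (penalty * (1 + w))

def gain_from_new_city(x, w, cap=100, penalty=3):
    """Change in total production (in units of a) when an empire of x
    full cities adds one city worth w of a full city."""
    before = x * (cap - penalty * x)
    after = (x + w) * (cap - penalty * (x + 1))
    return after - before

# Early, mid and late game guesses for the worth of a new city.
for w in (1, 0.5, 0.25):
    print(w, round(break_even_cities(w), 2))
```

This prints break-even points of 16.17, 10.78 and 6.47 cities for w = 1, 1/2 and 1/4. Below the break-even point gain_from_new_city is positive, above it negative, so the smaller your guess for w, the sooner a new city stops paying for itself.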
By contrast, captured cities that are size 3 or 4 already and come with a set of completed buildings are much nearer to being early game cities.
Of course there are loads of other strategic factors, such as the resources captured by a new city or its location at a choke point, not to mention access to bless spells, level 4 fortresses etc.