faq 108204966

Population generation - theory and adjusting

by Karsten Hager on 2017-03-02 15:19:22

I've been checking the South African tutorial and also the MATSim book a few times. To set up the population for my model of Stuttgart, I have the following data (and I'm trying to use the terminology of the South African tutorial):

Community profile data: inhabitants for each city district as well as their detailed age categories and household sizes, but only average income values for each district (unlike the South Africa tutorial, which has income groups and detailed numbers for each group).

Reference sample: I have a detailed survey covering roughly 2 weeks with detailed plans (start coordinates, mode, end coordinates and the coordinates of the activities they are travelling to, along with other variables like age, sex and citizenship) for household members. However, as far as I understand the theory behind population generation, this is not a full survey that I can translate one-to-one into MATSim, because it covers only part of the population and a limited period of time.

I compared the data needed for the South Africa population generation tutorial with the data I have to build my model. I am missing the exact income values from the community profile data, the role of the person in the household, housing and main dwelling type. I suppose the missing income values are the biggest problem. The survey (the reference sample) however has a variable for income groups.

I have a few questions:
1) Can I follow the MLIPF workflow from the South Africa tutorial even though I don't have household income groups from the community profile data, or would I need to assume some missing variables (e.g. the role in the household could be derived from age, sex and education)?
2) I have additional data from the survey, e.g. possession of vehicles, public transport tickets, car availability. Is there, as of today, any chance to include them as additional variables in the model so that they have a significant impact on the population generation?
3) Would it cause any problems to just use the coordinates of the reference sample for the activities instead of any kind of traffic zones? I already transformed the coordinates into Cartesian ones (Gauß-Krüger).
4) Just for the right usage of terms: events and plans are the output of the generation, but is there any difference between demand and population in MATSim?

I will keep a few percent of the survey for validation / calibration of the model. I won't be doing any kind of freight modelling / commercial vehicles.

Thanks for any kind of help!

Comments: 1

Re: Population generation - theory and adjusting

by Johan W. Joubert on 2017-03-04 08:08:17

Hi Karsten,

personally, I don't think the income needs to be such an inhibiting constraint, especially in Germany, where one can at least argue that economic inequality is lower (certainly much lower than in South Africa). I am sure one would be able to think of a creative way to match your community profile with the activity chains from the reference sample.

1) In the South African case income is (unfortunately) a proxy for (highly correlated with) race... and the travel diary in our case did not have race as a variable, but we did have income. The purpose of the attributes is just to find a signature so that you can match activity chains (from your reference sample) to a similar individual in the community profile. What makes up that signature is up to you: age, gender, race, income, number of cars in the household, level of education, etc.

For example, if I have an individual with signature (age, gender, income) of earlyCareer_male_120000 in the community profile, then I would look for the twenty (or whatever your threshold is) geographically closest individuals in the reference sample with the same earlyCareer_male_120000 signature. From those twenty I may then choose to randomly pick one and assign its plan to the community profile individual. If I do not find anyone with the same signature, I may have to tweak the signature (move one element in the signature one level up/down) and search again. Each such signature perturbation takes you one unit of Hamming distance away from the original signature. In South Africa we were able to assign all plans with a Hamming distance of at most one.
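The matching described above can be sketched roughly as follows. This is only an illustration in Python, not the code used for the South African population: the attribute levels in `LEVELS`, the dictionary layout of a person, and the function names `neighbours` and `pick_donor` are all assumptions made up for this example.

```python
import math
import random

# Hypothetical ordered levels per signature element (assumed for illustration).
LEVELS = {
    "age": ["child", "student", "earlyCareer", "lateCareer", "retired"],
    "gender": ["female", "male"],
    "income": [60000, 120000, 240000],
}
ORDER = ("age", "gender", "income")


def neighbours(signature):
    """Signatures with exactly one element moved one level up or down,
    i.e. at Hamming distance 1 from the given signature."""
    result = []
    for i, key in enumerate(ORDER):
        levels = LEVELS[key]
        idx = levels.index(signature[i])
        for j in (idx - 1, idx + 1):
            if 0 <= j < len(levels):
                result.append(signature[:i] + (levels[j],) + signature[i + 1:])
    return result


def pick_donor(person, sample, k=20, rng=random):
    """Return (donor, hamming_distance), or (None, None) if nothing matches.

    First tries the exact signature, then all one-step perturbations.
    Among matching sample members, the k closest to the person's home
    are kept and one of them is drawn at random.
    """
    for steps in (0, 1):
        if steps == 0:
            wanted = [person["signature"]]
        else:
            wanted = neighbours(person["signature"])
        pool = [s for s in sample if s["signature"] in wanted]
        if pool:
            pool.sort(key=lambda s: math.dist(s["home"], person["home"]))
            return rng.choice(pool[:k]), steps
    return None, None
```

The returned donor's plan would then be copied to the community profile individual. Extending the search beyond one perturbation would mean applying `neighbours` repeatedly, which this sketch does not do.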

2) See (1)... indeed, you can use those variables as part of the signature.

3) Again, building on (1)... you cannot simply keep the coordinates the same. Because the individual in the community profile would/should keep their home location, the coordinates of the other activities in the plan may be quite far away. This may result in ridiculous walking distances of 20 km, for example. So, what we do is relocate the activities based on the original inter-activity distances. But yes, you need not work with zones... it depends on what you have available.
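One possible way to relocate activities while keeping the original inter-activity distances is sketched below. It is a greedy illustration under several assumptions: the plan starts at home, every "home" activity is moved to the new home location, and candidate locations per activity type are available in a hypothetical `facilities` dictionary. The function name `relocate_plan` is invented for this example.

```python
import math


def relocate_plan(donor_coords, donor_types, new_home, facilities):
    """Relocate a donor plan around a new home location.

    donor_coords: list of (x, y) activity coordinates from the reference sample
    donor_types:  list of activity types, e.g. ["home", "work", "home"]
    new_home:     (x, y) home of the community profile individual
    facilities:   dict mapping activity type -> list of candidate (x, y)

    For each non-home activity, the candidate whose distance from the
    previous relocated activity is closest to the original leg distance
    is chosen.
    """
    relocated = []
    prev_old = prev_new = None
    for old, act_type in zip(donor_coords, donor_types):
        if act_type == "home" or prev_new is None:
            new = new_home
        else:
            target = math.dist(prev_old, old)  # original leg distance
            new = min(
                facilities[act_type],
                key=lambda f: abs(math.dist(prev_new, f) - target),
            )
        relocated.append(new)
        prev_old, prev_new = old, new
    return relocated
```

Because the choice is made leg by leg, the distance of the final trip back home is not controlled; a more careful approach would also check that leg.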

4) Events are not the output of population generation. Instead, events are created after a full MATSim run (at least one iteration). We often use the term population generation quite indiscriminately... getting a population ready for a MATSim run requires two separate steps. First you generate a synthetic population, using for example Kirill Müller's Iterative Proportional Fitting approach, or Yu Chen's Bayesian network approach. But now you only have people (and possibly households), yet they do nothing. So the second step is to assign travel demand, and this is the part where you link a plan to each individual.

I hope this clarifies some of it.
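To give a feel for the first step, here is a minimal two-dimensional Iterative Proportional Fitting sketch in Python. It is the textbook procedure only, not the multilevel variant and not any particular author's implementation; the function name `ipf` and the example dimensions (age group by household size) are assumptions for illustration.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Scale a seed table until its row and column sums match the targets.

    seed:        2D list of counts from the reference sample
                 (e.g. rows = age groups, columns = household sizes)
    row_targets: known totals per row from the community profile
    col_targets: known totals per column from the community profile
    The two target lists are assumed to add up to the same total.
    """
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Fit rows: scale each row to its target total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Fit columns: scale each column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Stop once the rows still match after the column step.
        if all(abs(sum(r) - t) < tol for r, t in zip(table, row_targets)):
            break
    return table
```

The fitted cell values are weights; turning them into a list of synthetic persons (for example by sampling from the reference sample with these weights) is a further step not shown here.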

You are viewing an archive of the previous MATSim Q&A site. The real site is now at https://matsim.org/faq
