diff --git a/_data/events.yaml b/_data/events.yaml index d4c0e1c..090e7c8 100644 --- a/_data/events.yaml +++ b/_data/events.yaml @@ -1,4 +1,11 @@ +- time: 2024-11-11 17:00:00 + title: "Ubuntu for AI/ML projects" + speakers: [andreeamunteanu] + type: podcast + link: https://lu.ma/c5qa10su + + - time: 2024-09-26 17:00:00 title: "DataTalks.Club Anniversary Podcast" speakers: [alexeygrigorev] diff --git a/_podcast/s19e01-using-data-to-create-liveable-cities.md b/_podcast/s19e01-using-data-to-create-liveable-cities.md new file mode 100644 index 0000000..b3832b0 --- /dev/null +++ b/_podcast/s19e01-using-data-to-create-liveable-cities.md @@ -0,0 +1,783 @@ +--- +episode: 1 +guests: +- rachellim +ids: + anchor: atatalksclub/episodes/Using-Data-to-Create-Liveable-Cities---Rachel-Lim-e2qecup + youtube: VXQIGHUWeL0 +image: images/podcast/s19e01-using-data-to-create-liveable-cities.jpg +links: + anchor: https://podcasters.spotify.com/pod/show/datatalksclub/episodes/Using-Data-to-Create-Liveable-Cities---Rachel-Lim-e2qecup + apple: https://podcasts.apple.com/us/podcast/using-data-to-create-liveable-cities-rachel-lim/id1541710331?i=1000675373908 + spotify: https://open.spotify.com/episode/1z7jdogto8i4Zk6Zh1vDxE?si=KCg2Iq1US0SKwFCKasGqUg + youtube: https://www.youtube.com/watch?v=VXQIGHUWeL0 +season: 19 +short: Using Data to Create Liveable Cities +title: Using Data to Create Liveable Cities +transcript: +- header: Using data to create livable cities +- line: This week, we'll talk about using data to create livable cities. We have a + special guest today, Rachel Lim. Rachel is an urban data scientist dedicated to + creating livable cities through the innovative use of data. Welcome, Rachel! + sec: 116 + time: '1:56' + who: Alexey +- line: Thank you! I'm happy to be here today. I've benefited greatly from the DataTalks.Club + courses, and I'm excited to share my experiences. + sec: 151 + time: '2:31' + who: Rachel +- line: We're happy to have you. 
Before diving into our main topic, could you tell + us about your career journey so far? + sec: 161 + time: '2:41' + who: Alexey +- header: 'Rachel''s career journey: from geography to urban data science' +- line: Yes, I'm currently working as a data engineer in Singapore, focusing on creating + livable cities using data. My background is in geography — I have a bachelor's + degree in geography and a master's in urban data science. I blend qualitative + and quantitative analysis to tackle urban challenges. + sec: 172 + time: '2:52' + who: Rachel +- line: I began my career in data science, applying analytics and machine learning + to various transportation projects, such as bike-sharing analytics to address + indiscriminate parking and road defect detection using computer vision. These + projects allowed me to make a tangible impact in cities. Seeing my work lead to + real-world solutions motivated me to become a transport scientist. I focus on + analyzing travel patterns to support long-term planning in Singapore. Recently, + I transitioned to data engineering after completing the DataTalks.Club data engineering + course. By diving into data foundations and building data platforms, I aim to + optimize AI applications for creating livable cities. + sec: 172 + time: '2:52' + who: Rachel +- line: That sounds amazing! How was the course? + sec: 242 + time: '4:02' + who: Alexey +- line: It was really good. It covered a lot about building data pipelines and using + tools like Apache Kafka, which was quite an eye-opener. The course was very relevant + and helpful in my transition to my current role. + sec: 245 + time: '4:05' + who: Rachel +- header: What does a transport scientist do? +- line: You mentioned you were a transport scientist. That’s an interesting title. + What exactly does a transport scientist do, and what types of organizations need + this role? 
+ sec: 260 + time: '4:20' + who: Alexey +- line: Transport scientists are usually needed in the public sector, especially in + government agencies involved in transportation planning. The role involves applying + data science in a practical way to public transport and transportation planning. + Another sector where transport scientists are valuable is transport consultancy, + such as firms like Sam Schwartz. It's essentially about applying data science + within an urban context to improve transportation systems. + sec: 287 + time: '4:47' + who: Rachel +- line: So, is it about planning where to put bus stops, how often buses should run, + and similar things? + sec: 323 + time: '5:23' + who: Alexey +- header: Short-term and long-term transportation planning +- line: Yes, that’s part of it. We separate our work into short-term and long-term + planning. In the short term, we look at bus routes, service frequencies, travel + patterns, and how well services are meeting users' needs. In the long term, we + use travel pattern data to make projections and plan for future infrastructure + needs, such as additional roads or rail lines and their alignment. + sec: 334 + time: '5:34' + who: Rachel +- header: Data sources for transportation planning in Singapore +- line: I guess each bus in Singapore has sensors to track its location and passenger + load, right? This data helps you see if certain routes are overcrowded and need + more frequent service? + sec: 374 + time: '6:14' + who: Alexey +- line: Exactly. We use a combination of data sources. Buses are equipped with GPS + transponders, allowing us to track their locations and times at each bus stop. + This helps us identify issues like bus bunching, where multiple buses arrive at + the same stop simultaneously. Ideally, buses should be spaced out to optimize + service. 
+ sec: 407 + time: '6:47' + who: Rachel +- line: On the demand side, we look at fare card data to understand where passengers + are tapping in and out, giving us a clearer picture of travel demand. + sec: 407 + time: '6:47' + who: Rachel +- line: What do you mean by "tapping in and out"? + sec: 456 + time: '7:36' + who: Alexey +- line: In Singapore, we use a fare card system similar to London's Oyster card or + New York's MetroCard. Passengers tap their card when they enter and exit public + transportation, like trains and buses. + sec: 460 + time: '7:40' + who: Rachel +- line: So there's a card and a reader? That makes sense. In Berlin, people can just + hop on a bus without any interaction. Sometimes there's a fare check, but it's + not consistent. + sec: 479 + time: '7:59' + who: Alexey +- line: Yes, that approach makes it more challenging to collect travel data. You’d + need to rely on video surveillance and computer vision to analyze passenger flow, + which is more complex than just processing fare card events. + sec: 500 + time: '8:20' + who: Rachel +- header: Rachel's motivation for combining geography and data science +- line: Definitely. So, what motivated you to work at the intersection of geography + and data science? + sec: 518 + time: '8:38' + who: Alexey +- line: Growing up in Singapore, I was fascinated by the systems shaping our cities. + I witnessed firsthand how the city rapidly expanded its MRT (Mass Rapid Transit) + network and developed new housing estates. This sparked my interest in urban planning + and geography. + sec: 535 + time: '8:55' + who: Rachel +- line: I did an internship at the Centre for Liveable Cities, which deepened my understanding + of sustainable urban environments. During this time, I attended the World Cities + Summit, where city leaders from around the world shared ideas on improving urban + spaces. 
+ sec: 535 + time: '8:55' + who: Rachel +- line: This inspired me to study geography, focusing on urban design, geocomputation, + and geospatial analytics. Eventually, this led me to pursue a master's in urban + informatics at New York University, specializing in applying data science in urban + contexts. This combination of experiences has shaped my current role. + sec: 535 + time: '8:55' + who: Rachel +- header: Urban design and its connection to geography +- line: That's interesting. My knowledge of geography is mostly from school, where + we learned things like capital cities and natural features. I didn't realize geography + could involve urban design. Is urban design about planning new districts, including + schools and parks? + sec: 619 + time: '10:19' + who: Alexey +- line: Yes, but it goes beyond just planning where things go. It’s about designing + environments that are livable. This involves making streets walkable, deciding + on the width of streets, placing sidewalks, and using elements like planter boxes + to separate pedestrians from traffic. It's also about creating safe, welcoming + spaces where people want to linger and interact, fostering a sense of community. + sec: 686 + time: '11:26' + who: Rachel +- line: I see. Why is this field still called geography when it covers urban design + and other aspects? + sec: 726 + time: '12:06' + who: Alexey +- line: Geography isn't just about physical features like mountains or rivers. It + also includes human geography, which focuses on people, migration, and population + changes. It’s about how these factors interact with physical spaces. Geography + is fascinating because it’s connected to the real world — what we study directly + influences how we live and interact with our environments. + sec: 740 + time: '12:20' + who: Rachel +- header: Defining a livable city +- line: So far, we’ve talked about what makes a city livable — parks, pedestrian zones, + traffic management, and fostering community. 
How would you define a livable city? + sec: 792 + time: '13:12' + who: Alexey +- line: A livable city is one where people feel connected to their community and have + opportunities to thrive. In terms of the built environment, this can include efficient + public transport with a well-connected network of buses, trains, and bike-sharing + programs. It also means safe, pedestrian-friendly streets with dedicated bike + lanes and walkways. + sec: 829 + time: '13:49' + who: Rachel +- line: Other aspects are affordable housing, proximity to essential services, and + green spaces. Beyond physical infrastructure, digital infrastructure plays a role + too. This includes online access to government services, digital safety, and platforms + that connect residents, recognizing that people spend a lot of time in digital + spaces now. + sec: 829 + time: '13:49' + who: Rachel +- header: Livability of Singapore and urban planning +- line: How livable is Singapore, in your opinion? I've never been there, but it's + on my list. + sec: 930 + time: '15:30' + who: Alexey +- line: Singapore has made significant progress. Initially, we focused on developing + housing estates, but now there's a greater emphasis on placemaking — creating + spaces where people can gather and enjoy. We're also converting certain streets + into car-free zones, improving walkability and cycling infrastructure. It's a + journey; some parts of the city are more livable than others, but we're working + to expand these spaces. + sec: 948 + time: '15:48' + who: Rachel +- line: Singapore is geographically small and densely populated, so I imagine land + use has to be very efficient. + sec: 995 + time: '16:35' + who: Alexey +- line: Absolutely. Singapore practices a process called Master Planning, which involves + planning 15 years ahead. This ensures that amenities and infrastructure are effectively + integrated over time. + sec: 1015 + time: '16:55' + who: Rachel +- line: Interesting. 
I live in Berlin, and I think the city is fairly livable. It + has good public transport, though cycling infrastructure could be improved. In + Moscow, it's harder for people with disabilities to get around, whereas Berlin + is more accommodating. It’s fascinating to see the practical application of geography + and data science. Could you share more about what you do as a transport scientist + and now as a data engineer? + sec: 1036 + time: '17:16' + who: Alexey +- header: Role of data science in urban and transportation planning +- line: Planning a city requires deliberate effort, and data science plays a critical + role in improving livability by offering insights and supporting data-driven decision-making. + By analyzing data collected throughout the city, we can optimize services, enhance + public safety, and promote sustainability. + sec: 1104 + time: '18:24' + who: Rachel +- line: In Singapore, we collect a lot of public data, which we share on open data + platforms. This enables collaboration with citizen developers, students, and research + institutions. Our data sources include transportation data like fare card usage, + as well as census and survey data. This helps us understand travel patterns and + conduct transport modeling to plan future infrastructure, such as rail lines. + sec: 1104 + time: '18:24' + who: Rachel +- line: We're also increasingly using data from the private sector, like mobility + data from ride-sharing apps. This gives us additional insights into how people + move around the city, beyond just public transportation. + sec: 1104 + time: '18:24' + who: Rachel +- header: Predicting travel patterns for future transportation needs +- line: That's interesting because you mentioned predicting where people will move. + If I understood correctly, for instance, if more people start moving from one + part of Singapore to another, you can anticipate this trend. Maybe a district + is becoming more popular, so it's gradually getting more populated. 
You want to + predict these patterns to plan accordingly, right? Like adding a new bus line + or increasing the frequency of existing buses? + sec: 1231 + time: '20:31' + who: Alexey +- line: Yes, that's right. Singapore is quite small, so we actively plan how housing + estates will develop. This could involve building new housing estates or renewing + and rejuvenating existing areas. By doing so, we can estimate how many people + will live in a particular district. From there, we use past travel patterns and + data on new modes of transport to predict future movements. This helps us plan + ahead for necessary transportation services, such as adding new bus lines or stops + to ensure these areas are well-connected to the rest of Singapore. + sec: 1269 + time: '21:09' + who: Rachel +- header: Data collection and processing in transportation systems +- line: I see. As a data scientist, you can't do much without data. You mentioned + various data types, like sensor data, movement patterns, and ride-hailing information. + All of this needs to be collected, processed, and analyzed. This must involve + having sensors on buses and other physical means of data collection, right? Then, + this data needs to be sent to a platform, aggregated, processed, and perhaps stored + in a data warehouse for data scientists to use. As a data engineer, you're probably + involved in these steps. Can you tell us more about what happens behind the scenes? + sec: 1322 + time: '22:02' + who: Alexey +- line: Yes, we work with a combination of data sources. We gather GPS data from ride-hailing + companies and public transport, along with fare card information about when and + where people are tapping in and out. In our data pipelines, we have an end-to-end + system that aggregates this information, stores it in a data warehouse, and processes + it so that it's suitable for downstream data analysis. We don't just need real-time + data; we also require historical data to do projections. 
Long-term data allows + us to track patterns over time, which is crucial for providing stable and reliable + insights. + sec: 1381 + time: '23:01' + who: Rachel +- header: Use of real-time data for traffic management +- line: Are there situations where you actually need real-time data as well? + sec: 1442 + time: '24:02' + who: Alexey +- line: Yes, real-time data is often needed for managing operations. For example, + to monitor the reliability of services, real-time data is essential. Another use + case is tracking traffic flow during specific events. In Singapore, we host the + F1 night race, which takes place on a street circuit, so parts of the roads are + closed. We use taxi data as a proxy to understand how traffic is flowing around + the closed areas. By monitoring how quickly the GPS coordinates from taxis are + moving, we can detect congestion and adjust traffic management strategies accordingly. + sec: 1449 + time: '24:09' + who: Rachel +- line: What actions are taken if there's a traffic jam? + sec: 1507 + time: '25:07' + who: Alexey +- line: It depends on the location. If a traffic jam occurs on an expressway, we have + cameras that monitor these areas, pinpointing where the issue is. Recovery services, + like tow trucks or other assistance, are dispatched to manage the situation and + clear the blockage. + sec: 1510 + time: '25:10' + who: Rachel +- line: So it could involve police officers managing traffic or using other traffic + marshals to control the situation? + sec: 1529 + time: '25:29' + who: Alexey +- line: Yes, a combination of traffic marshals and recovery services is deployed + to clear any blockages and ensure that traffic can flow smoothly again. + sec: 1535 + time: '25:35' + who: Rachel +- line: In Berlin, we often have events like half marathons or marathons that require + large streets to be closed for half a day. This can be quite inconvenient for + drivers who need to find alternative routes. 
These events cover large distances, + so multiple roads must be blocked. Do you have similar events in Singapore, and + how do you handle them? + sec: 1545 + time: '25:45' + who: Alexey +- line: Yes, we host marathons and similar events in Singapore. These events often + use a mix of roads and our network of park connectors, which are pathways that + connect different parks and are free of vehicles. This allows most of the marathon + routes to take place without significantly disrupting traffic on main roads. The + impact on traffic is minimized because only a few roads need to be closed, thanks + to these park connectors. + sec: 1590 + time: '26:30' + who: Rachel +- header: Incorporating generative AI into data engineering +- line: That makes sense. Having connected parks where people can run without needing + to wait for traffic lights is an excellent way to make a city more livable. I'm + into running myself, and finding a route without having to wait for traffic lights + is always a challenge. It's great that Singapore has planned these aspects so + well. Do you participate in organizing these events, or are you more focused on + transportation in your role? + sec: 1626 + time: '27:06' + who: Alexey +- line: My role is more focused on transportation. While I don't directly manage these + events, I work on data preparation and building data pipelines, ensuring that + transportation systems run smoothly. Increasingly, I'm also looking at ways to + incorporate generative AI into data engineering, such as enabling users to query + databases without needing SQL knowledge. + sec: 1679 + time: '27:59' + who: Rachel +- line: How often do people need to make these types of queries? + sec: 1723 + time: '28:43' + who: Alexey +- line: Quite often. As data engineers or scientists, we frequently receive requests + to extract data or perform specific analyses. Building tools that allow users + to access insights on their own would free up our time for more innovative work. 
+ I believe that subject matter experts should drive data science and analysis because + they best understand their needs and the problems they want to solve. When problems + and requests are passed through multiple people, the original intent can sometimes + get diluted. Clear, well-defined problem statements are essential for effective + analysis. + sec: 1728 + time: '28:48' + who: Rachel +- line: Who are the people making these requests, usually? I assume they are subject + matter experts who aren't necessarily technical, right? + sec: 1793 + time: '29:53' + who: Alexey +- header: Data analysis for transportation policies +- line: Yes, they could be subject matter experts, like those working in policy-making. + They need data to develop data-driven policies. + sec: 1809 + time: '30:09' + who: Rachel +- line: Can you give an example of a policy that might need data analysis? + sec: 1821 + time: '30:21' + who: Alexey +- line: Sure, one example is analyzing fare card data to determine appropriate transportation + pricing. We might want to know how many people use concession cards, like those + for senior citizens or students, to evaluate the effectiveness of a new policy + or fare adjustment. + sec: 1827 + time: '30:27' + who: Rachel +- line: What is a concession card? + sec: 1871 + time: '31:11' + who: Alexey +- line: A concession card provides discounted fares. In Singapore, for instance, senior + citizens and students can use these cards to pay reduced rates on public transportation. + sec: 1860 + time: '31:00' + who: Rachel +- line: So, policy questions might include things like what happens if we increase + fares or introduce a new ticket option? For example, in Berlin, there are discounted + bundles of tickets. You might need data to predict the impact of similar changes. + sec: 1871 + time: '31:11' + who: Alexey +- line: Exactly. In Singapore, we have monthly passes offering unlimited travel for + a fixed price, with specific options for students. 
We might analyze how many students + use these passes and how effective they are in encouraging public transport use. + sec: 1897 + time: '31:37' + who: Rachel +- line: And since people need to tap their cards to use transportation, you have all + this data. But policy specialists might not know how to query it. Are you working + on making this easier for them, perhaps through a chat interface that translates + their questions into SQL queries? + sec: 1925 + time: '32:05' + who: Alexey +- line: Yes, that's something we're looking to build. It's still in development, but + we're working on tools that can help extract information and perform analysis + more intuitively, potentially using plain language queries. + sec: 1955 + time: '32:35' + who: Rachel +- line: Is this a project you're actively working on now? + sec: 1973 + time: '32:53' + who: Alexey +- line: Yes, it's one of the projects we're currently focusing on. + sec: 1979 + time: '32:59' + who: Rachel +- header: Technologies used in text-to-SQL projects +- line: What technologies or approaches are you using for this project? It sounds + like a fascinating application of large language models (LLMs), and it might interest + our LLM Zoomcamp students to learn about its practical uses. + sec: 1999 + time: '33:19' + who: Alexey +- line: In the text-to-SQL space, we're looking at using metadata from data warehouses + and data catalogs. We chunk this information and create a vector database, then + apply a large language model (LLM) on top of it. The LLM takes user queries in + plain English, translates them into SQL statements, and returns outputs to the + users. + sec: 2028 + time: '33:48' + who: Rachel +- line: So, it's like a retrieval-augmented generation (RAG) setup. A user provides + a plain English query, and you use context from metadata to generate the SQL query. + The LLM then executes the query and returns the results. Is that how it works? 
+ sec: 2040 + time: '34:00' + who: Alexey +- line: Yes, exactly. We use RAG techniques. The metadata helps the LLM understand + which tables to refer to for the correct information extraction. + sec: 2083 + time: '34:43' + who: Rachel +- line: How often do the queries generated by this process fail or need correction? + sec: 2109 + time: '35:09' + who: Alexey +- line: I don't have exact numbers, but it does happen. Success largely depends on + effective prompt engineering. Providing sufficient examples of text-to-SQL conversions + helps guide the LLM. It’s also important to restrict certain types of SQL commands, + like insert, update, or delete statements, to prevent unintended database modifications. + sec: 2118 + time: '35:18' + who: Rachel +- header: Handling large datasets and transportation data in Singapore +- line: That's a fascinating project. I'm always keen to learn about new LLM use cases + because there are so many possibilities. It's great that you're exploring this + space too. This project is in an early development phase, right? + sec: 2172 + time: '36:12' + who: Alexey +- line: Yes, it’s still in early development. + sec: 2190 + time: '36:30' + who: Rachel +- line: How large are the datasets you're working with for these projects? Besides + the text-to-SQL project, are there other more established projects you're involved + in? You mentioned transportation data — like fare card data. How large are these + datasets? + sec: 2192 + time: '36:32' + who: Alexey +- line: We collect a significant amount of data. For fare card data alone, we gather + millions of passenger flow records daily in Singapore. This provides numerous + data points for analyzing passenger movements, identifying peak travel times, + and assessing route popularity. This data is crucial for optimizing routes and + fare structures. 
+ sec: 2231 + time: '37:11' + who: Rachel +- line: For example, in Singapore, the Public Transport Council implemented a "morning + pre-peak fare" policy, offering savings for commuters who travel before 7:45 AM. + This aims to shift demand away from peak hours, balancing train load and encouraging + earlier travel. By analyzing fare card data, we can evaluate the effectiveness + of such policies. + sec: 2231 + time: '37:11' + who: Rachel +- line: Regarding these millions of data points you collect, how are they processed? + Do you use standard tools like Kafka, data lakes, and data warehouses? Also, how + does data from the buses get collected? Are there transmitters on the buses that + send data to Kafka or another system? + sec: 2314 + time: '38:34' + who: Alexey +- line: Yes, we have sensors on the buses that collect location information. We also + gather data from fare card systems at the entry and exit points. + sec: 2344 + time: '39:04' + who: Rachel +- line: Is this data collected in real-time? For example, the moment I tap my card, + does an event get recorded immediately, or is the data only gathered at the end + of the day? + sec: 2354 + time: '39:14' + who: Alexey +- line: The data is collected in real-time, but aggregation happens afterward. In + Singapore, we use a system that defines a ride and a journey. Our fare structure + allows commuters to make multiple transfers within a 45-minute period, which is + considered one journey. We calculate fares based on the total distance traveled. + So, while data is collected in real-time, processing and aggregation occur after + a time lag. This allows us to combine all this information before storing it in + a data warehouse. We use tools like Kafka and Apache Spark for processing. + sec: 2367 + time: '39:27' + who: Rachel +- line: So, the events from the bus are sent immediately to your system. You don't + need to wait until the bus completes its shift to connect and download the data, + right? 
+ sec: 2417 + time: '40:17' + who: Alexey +- line: Yes, that's correct. + sec: 2436 + time: '40:36' + who: Rachel +- line: That's great. You use Kafka, Apache Spark, and other common tools in data + engineering, right? + sec: 2440 + time: '40:40' + who: Alexey +- line: Yes, that's right. + sec: 2448 + time: '40:48' + who: Rachel +- line: Have you ever had to go on a bus to fix a sensor? + sec: 2460 + time: '41:00' + who: Alexey +- line: No, I haven't needed to do that. + sec: 2464 + time: '41:04' + who: Rachel +- line: So, the sensors are quite reliable, right? + sec: 2468 + time: '41:08' + who: Alexey +- line: Yes, generally they are reliable. However, part of data engineering involves + detecting data quality issues or anomalies. For instance, if a transponder on + a bus isn’t sending data correctly, we need to identify that issue. While I haven't + personally gone to a bus to fix a sensor, maintaining data quality is a critical + aspect of data engineering. + sec: 2468 + time: '41:08' + who: Rachel +- line: I'm considering whether to dive deeper into traditional data engineering topics + or explore generative AI applications. What would you like to focus on next? + sec: 2506 + time: '41:46' + who: Alexey +- line: We could go more into how AI is being used. + sec: 2531 + time: '42:11' + who: Rachel +- header: Generative AI applications beyond text-to-SQL +- line: Do you have other AI applications besides the text-to-SQL tool? + sec: 2537 + time: '42:17' + who: Alexey +- line: The text-to-SQL tool is my main focus right now. However, generative AI has + other potential applications, such as creating synthetic data. In projects where + we don't have a full data set, generative AI could help generate synthetic data. + Additionally, I believe generative AI could redefine user interfaces, making information + retrieval more conversational and intuitive, moving away from traditional keyword + searches to semantic searches. 
This approach could change how people search for + things, not just in e-commerce but in various domains, like planning what to buy + for a gift. + sec: 2548 + time: '42:28' + who: Rachel +- line: I'm becoming more accustomed to using tools like ChatGPT with voice recognition. + It's very convenient. Regarding synthetic data, how effective is generative AI + at creating data, especially numerical data or time series data? + sec: 2635 + time: '43:55' + who: Alexey +- line: I was speaking generally about the potential of generative AI for creating + synthetic data, especially when dealing with complex or sensitive data sets. Generative + AI could help create synthetic versions that mask confidential information while + retaining essential characteristics. + sec: 2682 + time: '44:42' + who: Rachel +- header: Publishing public data and maintaining privacy +- line: Since Singapore releases a lot of public data on open platforms, sometimes + you might need to edit this data before publishing it, right? + sec: 2726 + time: '45:26' + who: Alexey +- line: The current publicly shared data is collected from various systems, and we + mask sensitive information, such as fare card numbers, before publishing. + sec: 2740 + time: '45:40' + who: Rachel +- header: Recommended datasets and projects for data engineering beginners +- line: Where can people find this public data? + sec: 2752 + time: '45:52' + who: Alexey +- line: 'Two main platforms provide public data in Singapore: data.gov.sg and DataMall. + Data.gov.sg aggregates data from various government bodies, covering areas like + rainfall, air pollution, transportation, and more.' + sec: 2760 + time: '46:00' + who: Rachel +- line: There are categories like arts, education, economy, environment, housing, + health, social, transport, and real-time APIs. I assume I should look under the + transport category to find relevant data, right? 
+ sec: 2780 + time: '46:20' + who: Alexey +- line: Yes, you’ll find transportation data, such as air travel and geospatial information, + under the transport category. + sec: 2801 + time: '46:41' + who: Rachel +- line: If someone is starting the next cohort of a data engineering course in January, + what kinds of projects would you recommend they try using these data sets? + sec: 2810 + time: '46:50' + who: Alexey +- line: One useful data set could be car parking data, as we collect real-time parking + transaction data. It's a large and dynamic data set, ideal for real-time data + ingestion, storage in a data warehouse or data lake, and subsequent analysis. + sec: 2825 + time: '47:05' + who: Rachel +- line: How do I find the car parking data? + sec: 2856 + time: '47:36' + who: Alexey +- line: Car parking data is typically part of our dynamic data sets. + sec: 2867 + time: '47:47' + who: Rachel +- line: There are real-time APIs. Maybe that’s what I need to look for. Could you + send me the link later? + sec: 2871 + time: '47:51' + who: Alexey +- line: Sure, I can send it to you. Another valuable data set is real-time taxi data, + which can be useful for examining data engineering processes. In previous data + engineering courses, we used the New York taxi data set, which is aggregated. + Real-time data offers another layer of complexity and learning opportunities. + sec: 2888 + time: '48:08' + who: Rachel +- line: I see public transport capacity, taxi population data, and more, some of which + are updated regularly. You've sent me another link, right? + sec: 2915 + time: '48:35' + who: Alexey +- line: Yes, I've sent another link, specifically for transportation data under the + dynamic data sets, including real-time taxi availability and parking information. + sec: 2943 + time: '49:03' + who: Rachel +- header: Recommended resources for learning urban data science +- line: Great, I'll add this link to our description. Thanks. 
I don't see any questions + from the audience at the moment. I'm wondering if someone wants to study what + you do — urban data science and transport planning — what resources, books, or + courses would you recommend? + sec: 2956 + time: '49:16' + who: Alexey +- line: 'The courses by DataTalks.Club are a good primer. For a deeper understanding + of urban data science, the book "The Death and Life of Great American Cities" + by Jane Jacobs is a classic. It critiques traditional urban planning and advocates + for vibrant, livable cities through community-based approaches and human-scale + design. Another book is "Happy City: Transforming Our Lives Through Urban Design" + by Charles Montgomery. This book blends urban planning with psychology and sociology, + exploring how thoughtful urban design can improve happiness and well-being. Both + books offer insights into why we study urban data and plan cities thoughtfully.' + sec: 3022 + time: '50:22' + who: Rachel +- line: Those sound interesting. The first one is "The Death and Life of Great American + Cities," and the second is "Happy City," right? + sec: 3102 + time: '51:42' + who: Alexey +- line: Yes, that's correct. + sec: 3120 + time: '52:00' + who: Rachel +- line: I remember visiting the United States, where getting around without a car + is challenging, except in cities like New York or Boston. It would be great if + some of the practices from Europe or Singapore were implemented there. I recall + once being picked up by the police while walking down a road without sidewalks + because I wasn't used to the idea that you need a car to move around. + sec: 3125 + time: '52:05' + who: Alexey +- line: That's quite amusing! + sec: 3160 + time: '52:40' + who: Rachel +- line: Anyway, Rachel, thank you so much for joining us today and sharing your experiences. + I’ve learned a lot. These topics were new to me, and it was enlightening to hear + about your work. 
Thanks for being here, and thanks to everyone else who joined + us today. This was our first episode after a break, and it was a great start. + sec: 3162 + time: '52:42' + who: Alexey +- line: Thank you for having me. + sec: 3189 + time: '53:09' + who: Rachel +- line: I'm happy to hear about your transition to data engineering and that our course + helped. It means a lot to see success stories like yours. Thanks again! + sec: 3191 + time: '53:11' + who: Alexey +- line: Thank you. + sec: 3206 + time: '53:26' + who: Rachel +- line: Have a wonderful rest of your day, Rachel, and to everyone else — see you + around! + sec: 3208 + time: '53:28' + who: Alexey +--- + +Links: + +* [Dynamic Datasets](https://datamall.lta.gov.sg/content/datamall/en/dynamic-data.html){:target="_blank"} \ No newline at end of file diff --git a/images/podcast/s19e01-using-data-to-create-liveable-cities.jpg b/images/podcast/s19e01-using-data-to-create-liveable-cities.jpg new file mode 100644 index 0000000..2970768 Binary files /dev/null and b/images/podcast/s19e01-using-data-to-create-liveable-cities.jpg differ