EPITalk: Behind the Paper
This stimulating podcast series from the Annals of Epidemiology takes you behind the scenes of groundbreaking articles recently published in the journal. Join Editor-in-Chief Patrick Sullivan and journal authors for thought-provoking conversations on the latest findings and developments in epidemiologic and methodologic research.
Exploring Neural-Network Models in Pediatric Chronic Kidney Disease
Dr. Derek Ng joins EPITalk host and co-author Dr. Patrick Sullivan to explore whether a neural-network model using two blood biomarkers can more accurately estimate kidney function in pediatric chronic kidney disease than current standard equations. “A comparison of neural networks and regression-based approaches for estimating kidney function in pediatric chronic kidney disease: Practical predictive epidemiology for clinical management of a progressive disease” is published in the May 2025 issue (Vol. 105) of Annals of Epidemiology.
Read the full article here:
https://www.sciencedirect.com/science/article/abs/pii/S1047279725000717
Episode Credits:
- Executive Producer: Sabrina Debas
- Technical Producer: Paula Burrows
- Annals of Epidemiology is published by Elsevier.
Patrick Sullivan:Hello, you're listening to EPITalk: Behind the Paper, a monthly podcast from the Annals of Epidemiology. I'm Patrick Sullivan, editor-in-chief of the journal, and in this series we take you behind the scenes of some of the latest epidemiologic research featured in our journal. Today we're here with Dr. Derek Ng to discuss his article, "A Comparison of Neural Networks and Regression-Based Approaches for Estimating Kidney Function in Pediatric Chronic Kidney Disease: Practical Predictive Epidemiology for Clinical Management of a Progressive Disease." You can read the full article online in the May 2025 issue of the journal at www.annalsofepidemiology.org. Dr. Derek Ng is an associate professor of epidemiology at the Johns Hopkins Bloomberg School of Public Health and is the principal investigator of the Data Coordinating Center for the Chronic Kidney Disease in Children Cohort Study. His research focuses on pediatric epidemiology, with a particular focus on pediatric nephrology and pediatric critical care. His methodologic interests are in multi-center cohort study designs, prediction modeling, and translational epidemiologic tools to help clinicians, patients, and families. Dr. Ng, thanks so much for joining us today.
Derek Ng:Thank you for having me.
Patrick Sullivan:I'd like to start out by just asking if you can share with us some background on the problem described in your paper, both pediatric kidney disease and then the modeling question that your study addressed. Why are these issues important?
Derek Ng:Sure. So children with chronic kidney disease represent a very important clinical population. This is a rare disease, but it's not necessarily uncommon. We actually don't know exactly how many children in the U.S. are afflicted with chronic kidney disease, mainly because some are asymptomatic and others are very severe. And the main clinical concern is those who have serious disease. What we aim to do in epidemiology is to help clinicians, patients, and their families try to make sense of this disease, try to sort through what the best treatment options are, and also to understand how significant the disease is in terms of kidney function. One of the clinical concerns of pediatric nephrology is how well the kidney is functioning. There are ways to measure this directly, but these procedures are long, between five and six hours, and require the injection of a contrast agent. There are also ways to predict or estimate kidney function by linking biomarkers that we know are associated with kidney function to the directly measured quantities, so that we can use mathematical and statistical models to estimate or predict what the kidney function is without having to measure it directly. The main benefit of this is that biomarkers measured in a simple blood draw can help clinicians understand how serious the kidney disease is in this patient population.
Patrick Sullivan:And it sounds like that spares some procedures for the patients and the children as well.
Derek Ng:Yeah, that's a big benefit. It's much easier on the patients. It's quicker, it's also less expensive. So these blood draws are a much more efficient and cost-effective way to estimate kidney function as long as the models that we use to estimate kidney function are good.
Patrick Sullivan:So I think that leads us to the main purpose of the study. Can you say a little bit more about the different approaches that you were evaluating, and why you felt it was important to compare these two approaches to estimation?
Derek Ng:Right. So we should say first that the first estimating equations for kidney function were published in 1976, but there was a real breakthrough in adult clinical nephrology in 1999, when an equation was developed, based on classical statistical regression methods, to estimate kidney function from a biomarker called serum creatinine. This was later updated and refined through different equations, but the structure remained the same in terms of using a regression-based approach to estimate kidney function. Since that time, a second biomarker has been found to be very useful for estimating kidney function, and it is also convenient to measure clinically. This second biomarker is called cystatin C, and it's typically used as a sequential test. If there's something going on with the serum creatinine that looks suspicious, or the clinician would like a second opinion from these estimating equations, they can use serum cystatin C to get another estimate of kidney function and try to understand the severity of kidney disease. Now, when these equations were developed, neural networks and machine learning methods were not as well developed or accessible. I think most listeners understand that artificial intelligence and machine learning have become more accessible and common over time. So what we were really interested in was whether a neural network actually offered a better estimate under the clinical constraint of having two biomarkers measured, which reflects the conditions that physicians operate under. If we found that the neural network offered something new, this would be an important finding that we could use to estimate GFR and think about ways to translate it. However, if it didn't perform as well, that would also give clinicians confidence in the tools they currently have. But amid all the talk about the promise of artificial intelligence and machine learning, this represented an opportunity to explore whether there was a great benefit in the context of clinical nephrology.
Patrick Sullivan:Yeah, and a really specific application of that question of what these kinds of new tools can do better, and where the existing systems may already be robust. There may be listeners who are less familiar with neural networks. So just at a high level, could you give us an idea of what a neural network is, in an epidemiologist's terms, and why that technology might be helpful?
Derek Ng:Sure. So the main purpose of these neural networks is ultimately to predict some outcome. Now that outcome could be about classification. In this particular case, glomerular filtration rate or GFR, which is a measure of kidney function, is a continuous variable. And so the neural network is really well suited to try to predict this continuous variable of glomerular filtration rate based on predictors. Now, in a classical regression setting, the same goal can be achieved. I think the distinction is that the neural network integrates information in a nonlinear way, unlike that of regression, and does this using what are called hidden layers and hidden units in order to come up with a final prediction that is close to what we conceptualize as the truth.
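For listeners who want a more concrete picture, here is a minimal sketch, not the authors' code, of fitting a classical regression model and a small neural network to predict a continuous outcome like GFR from two biomarkers. It assumes Python with scikit-learn, and all biomarker and GFR values are simulated purely for illustration.

```python
# A minimal, illustrative sketch (not the code from the paper): comparing a
# regression-based estimate with a small neural network for predicting a
# continuous outcome such as GFR from two biomarkers. All values below are
# simulated; a real analysis would use measured serum creatinine, cystatin C,
# and directly measured GFR.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 500
creatinine = rng.lognormal(mean=0.0, sigma=0.4, size=n)   # hypothetical biomarker 1
cystatin_c = rng.lognormal(mean=0.1, sigma=0.3, size=n)   # hypothetical biomarker 2
# Hypothetical "measured" GFR: higher biomarker levels imply lower kidney function
gfr = 120 / (creatinine * cystatin_c) * np.exp(rng.normal(0, 0.1, size=n))

X = np.column_stack([np.log(creatinine), np.log(cystatin_c)])
y = np.log(gfr)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classical regression: a linear combination of the (log) biomarkers
reg = LinearRegression().fit(X_train, y_train)

# Neural network: the same two inputs, passed through hidden layers that
# combine them nonlinearly before producing a single predicted value
nn = MLPRegressor(hidden_layer_sizes=(8, 8), max_iter=5000,
                  random_state=0).fit(X_train, y_train)

print("regression R^2 on test set:", round(reg.score(X_test, y_test), 3))
print("neural net R^2 on test set:", round(nn.score(X_test, y_test), 3))
```

The point of the sketch is only the structural contrast Dr. Ng describes: the regression model combines the two inputs linearly (here on the log scale), while the neural network passes them through hidden layers that combine them nonlinearly.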
Patrick Sullivan:Great. So given a better understanding of neural networks, just walk us through the methods that you used to make this comparison.
Derek Ng:This work capitalized on a paper that we published in 2021 that offered a new system of equations for children and adolescents and young adults under the age of 25 to estimate their GFR. This was based on a classical regression structure. And we found that it offered, on average, unbiased, reliable, and highly correlated predictions of GFR compared to the directly measured GFR data that we had in this very large study of children with chronic kidney disease. What we intended to do in this paper was to use a neural network approach to estimate the same thing. And I think what was very unique about this paper was that we used the exact same training and testing data sets that were presented in the original 2021 paper in this new exploration of prediction methodology. And this way we could compare directly how good the agreement was between these two different approaches.
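As a rough illustration of what that comparison looks like in practice, here is a minimal sketch, again an assumption rather than the paper's code, of summarizing agreement between each model's estimates and directly measured GFR on the same held-out test set. The bias and correlation summaries follow the language of the conversation; P30, the share of estimates within 30% of the measured value, is a common benchmark in GFR estimation, though the paper's exact metrics may differ.

```python
# A minimal sketch (assumption, not the paper's code): agreement summaries for
# two estimators evaluated against directly measured GFR on the same test set.
import numpy as np

def agreement(measured, estimated):
    """Summarize agreement between estimated and directly measured GFR."""
    bias = float(np.mean(estimated - measured))                            # average over/under-estimation
    corr = float(np.corrcoef(measured, estimated)[0, 1])                   # correlation with measured GFR
    p30 = float(np.mean(np.abs(estimated - measured) / measured <= 0.30))  # share within 30% of measured
    return {"bias": bias, "correlation": corr, "P30": p30}

# Continuing the earlier simulated example (hypothetical names), back on the GFR scale:
# print(agreement(np.exp(y_test), np.exp(reg.predict(X_test))))
# print(agreement(np.exp(y_test), np.exp(nn.predict(X_test))))
```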
Patrick Sullivan:And now you start to get into the language of epidemiologists around agreement. Even if, to many of us, myself included, the functioning of the neural network might be a little bit of a black box, the premise is that you have a gold standard method and a method that may offer benefits, and thinking about agreement really comes back into the epi wheelhouse. So having said that, what were some of the key findings?
Derek Ng:So what we found was that the neural network did not offer better predictions of GFR compared to the classical approach. Now, having said that, it also didn't do much worse. In fact, they were very comparable. It's also fair to say that the neural network methodology was more complex. While we were able to predict a single value of GFR, there was a lot of architecture and complexity in the model that wasn't present in the classical regression-based setting. So, in the scenario where you have two approaches that yield similar agreement and are both doing quite well, if you have your choice between a simpler model and a more complex one, it's better to go with the simple model. We thought carefully about the translational implications, considering that they both performed equally well, and we hit upon some of the benefits of the simple model compared to the more complex neural network.
Patrick Sullivan:So your recommendation is that, for now, the simpler, more traditional model is probably preferable.
Derek Ng:Yeah, it certainly seems to be sufficient. And I think that's also because the number of predictors we had was quite limited. We were limited to two biomarkers that are commonly measured clinically. I think the neural network probably has a better opportunity to shine and to do better if we had a lot more clinical predictors available. However, that's just not the case right now clinically, and hopefully that will change. It could be a while, though. Right now, there's a lot of discovery work to try to identify new biomarkers that can be used to help estimate GFR. But right now, given the millions of blood tests that go on every day, there's only one, serum creatinine, that's measured routinely, and within a small fraction of those, cystatin C. So the idea that we will be getting dozens more clinical predictors specifically for kidney function in a blood test, I think it could be a while before that happens.
Patrick Sullivan:So is it too broad a conclusion to say that these kinds of tools might be more useful where there are more inputs? I think I'm translating a little bit, but if you only have a couple of inputs, the neural networks may not have much room to improve or to combine those inputs in more complex ways, compared to where there's a richer set of predictors.
Derek Ng:I think that's a great way to put it. And I think that's what we're up against. As I mentioned before, one of the motivations was speaking to clinicians, physicians, and nurses and hearing, "Well, hey, have you tried a neural network? Have you tried machine learning to try to predict GFR? Maybe there's something there that we're missing if you haven't done it." And we certainly kept an open mind about it and tried to do the best that we could. But I think, as you mentioned, the limiting factor is the fact that we only have two biomarkers. Now, these two biomarkers are very, very good, but they are only two, and these neural networks are really designed for many, many predictors. We did try to give the neural network more of a chance to deal with more predictors by using transformations of the original predictors, including polynomials of serum creatinine and interactions with age as a predictor. And even with that, we found that we couldn't do any better than the original under-25 equations.
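To make that transformation step concrete, here is a minimal sketch, again an assumption rather than the authors' code, of expanding two biomarkers and age with polynomial terms and interactions before handing them to a neural network. The variable names are hypothetical and the values simulated.

```python
# A minimal sketch (assumption, not the authors' code): giving the network more
# inputs by adding polynomial terms and interactions, e.g., creatinine-by-age.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
n = 500
log_creatinine = rng.normal(0.0, 0.4, size=n)   # hypothetical log serum creatinine
log_cystatin_c = rng.normal(0.1, 0.3, size=n)   # hypothetical log cystatin C
age = rng.uniform(1, 25, size=n)                # ages in the under-25 range

X = np.column_stack([log_creatinine, log_cystatin_c, age])
# degree=2 adds squared terms and all pairwise interactions
X_expanded = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
print(X.shape, "->", X_expanded.shape)   # (500, 3) -> (500, 9)
```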
Patrick Sullivan:So I want to turn now to a segment of the podcast that gets a little bit more into how you came to this work and your career. I think it's always an opportunity to learn from how people got to where they are, to doing really interesting work. And I'll speak for myself as a later-career researcher: it's a chance to share things that you may have learned along the way. So maybe just start with the inspiration behind the paper, or maybe more broadly, your larger focus in pediatric nephrology. How did that grow as a passion or a research area for you?
Derek Ng:It's a great question. So I trained at Hopkins and started with a Master of Science in Epidemiology. And I was fortunate to make a nice connection with one of the faculty, Dr. Alvaro Muñoz, who has been a great mentor of mine for many, many years. At the time, when I had just met him, we discussed one of the papers I wrote for his class, and he was interested in that. I was looking for a job at the time; I wasn't sure what I was going to do. And he kindly said, "Look, if you're interested in a job, I think we have a job opening coming up." The job was to do programming and epidemiologic analysis in this cohort of children with chronic kidney disease. This was back in 2009, and the study had started in 2003. So it was accumulating data, and there was a lot of opportunity, and I was really grateful for it. Since then, I've spent my whole career working largely, but not exclusively, in the domain of this observational cohort study of chronic kidney disease in children. I can't say enough about this study. It's a beautiful national treasure that's a testament to the work of the participants, the families, the clinical coordinators, the people who measure samples at our central biochemistry laboratory, and just a beautiful team effort to try to come up with answers for this population that needs help. And it's hard not to understand the importance and imperative of helping children who are sick. That's really the primary mission of this cohort study. So I view my role as coming up with answers to questions that are formulated in a collaborative way and working toward offering knowledge so that physicians can treat children, help them and their families, and give them the best information so that the children can live their best lives.
Patrick Sullivan:Yeah, thanks for that. And I think it's an especially important time. What you share is really important because it reminds us that, wherever we are in the mentorship lineage, you go from having a mentor give you a step up to becoming a mentor and giving other people a step up. But also that the arc of science is long, and that it's hard to predict, when you entered into this, that this would be what you were doing. To me, that's a really timely topic right now, because there are threats to these pipelines of research. And one of the things I think you only come to fully appreciate by living in this world is that things are nonlinear, and that these kinds of cohort studies, while they're observational, are such a rich source for derivative science, or second-generation science, that moves things along. You just don't know where the discovery is going to come from. So this is a great example of NIH, presumably as a funder, making an investment in something that both supports the careers of earlier-career people, as you were then, and gives rise to observations that we can't always anticipate. We're living in a time where there's a fair amount of transactionality, I think, in terms of research interests and investments. And this is just a useful example of how, when you plant good seeds in good ground and give them good water, things are going to grow, and it may go beyond what you planned.
Derek Ng:Yeah, I think that's a great way to put it. And I think often, in my field in particular, about defending our research portfolio and the value of this sort of research, especially because something like estimating kidney function is not something you might consider should be very profitable, at least monetarily. I don't see that measuring kidney function should be monetarily profitable. Instead, this is simply providing information to clinicians so that they can treat children with kidney disease properly, make the best decisions, and understand more about it. And of course, this cohort study, which has been generously funded by NIH since its inception, has yielded a lot of important information, not just about kidney function, but about the natural history of kidney disease in children, about which not much had been known until the study started. The amount of information that's been generated from this cohort study has really been remarkable. I think there have been over 200 papers and a lot of investigators working on it, and it really is a national resource. People who are interested in using this type of data are able to get it from the NIDDK repository online.
Patrick Sullivan:That's great. And I'll get a URL from you, and we'll post that in the show notes when we announce the podcast, so people can access it. So we've been talking a little bit about this, and I appreciate you sharing about the leg up you got from a later-career colleague. Now that you're at a different place in your career, what advice would you give to young researchers or clinicians who are interested in this field? I imagine a lot of our earlier-career colleagues see these kinds of tools as an important part of the future of science. So, what advice do you have for earlier-career people who are thinking about wading into this new pool of epidemiology and data science?
Derek Ng:I think the main thing is having an openness to learn and explore new ideas. And I have to confess, learning about neural networks was actually very difficult. There were two great resources available to help make sense of this and understand it. One is a book called An Introduction to Statistical Learning, the second edition. It was invaluable. My team and I read it as a kind of book club in order to understand new methods, and it walked through everything quite well. What was nice was that, even though we weren't enrolled in a class or trying to get a degree or anything like that, we did this on our own and learned together as a team, a group of about six faculty, biostatisticians, and epidemiologists. We could learn together and talk about it. And I think that's something very special about an academic environment. So don't be afraid to learn new things. The other thing I would say is that it's really important for epidemiologists to listen to clinicians and physicians and to understand what they need. I don't think that epidemiology just for the sake of epi is always the best. Ultimately, what we're trying to do is help people. So if we can learn from the clinicians and physicians, what frustrates them? What are they uncertain about? What would they like to know more about in order to treat their patients better? These are the sorts of questions that can help improve our own research as epidemiologists, because we don't want it simply to sit in the ivory tower and not translate; we want to share it with the people who will use it.
Patrick Sullivan:That's great advice. And I think the collaboration between epidemiologists and clinicians, and I put myself in the epidemiologist camp, helps with the relevance of the questions that we're asking and with the impact. And in fairness to our part of that partnership, we really have no way to know what those questions are unless we ask. So I think it's a really smart approach, and it increases the significance of what we do. All right, Dr. Ng, do you have any last thoughts you'd like to share with our listeners?
Derek Ng:What I'd like to implore of any listeners and students, and I do teach a lot of students here, is to try to answer big questions. Think big, go out and collect data, collect new data, things that are not easily found, because that's where the discovery happens. I think that in general, it can be very easy to find convenient sources of data, and that can be very useful as a starting point. But as epidemiologists, we're often concerned about selection bias. We might have to get out of our comfort zone and pursue avenues that will yield new data, data that hasn't been measured yet. That tends to be more valuable data. And that's something I try to impart to my students, and hopefully listeners can also benefit from it.
Patrick Sullivan:Yeah, it is interesting that, sometimes in training, during master's and PhD programs, there is a little bit of a bias to look for data sources that are well understood, because we want the predictability, for our colleagues who are in training, of not diving into a pool where you can't see the bottom: knowing that it's an answerable question and that the data are likely to be able to answer it. But the highest-yield science is, in some regards, the higher-risk science. So we're trying to strike the right balance with predictability, because the goal is to get the skills and graduate. But I think at every stage of career, there's some incentive to take the safer route. So your message is really important: how far can we push into doing something that may be higher risk, higher reward, and when is the right time in one's career to do that? Probably every time in your career is the right time to do that. But then there are all these other structures and incentives toward predictability. So it's a complicated call, as an academic researcher and as someone who's in training as well.
Derek Ng:Absolutely. And I think there's certainly a time and place, particularly in training, to make the best use of time and still have impactful science. And there are certainly wonderful resources. I guess my point is that, in order to defend our space well, we have to bring something new to the table. And as you said, sometimes the higher-risk data collection can be higher reward, and hopefully it is. But at the very least, we shouldn't be scared of it. I mean, if there are appropriate opportunities to use existing data, that's fantastic, and appropriate use is best. But we should be open about what we don't know and maybe consider taking a difficult route to collect new things.
Patrick Sullivan:And one of the things I really like about your work is that it links up a new technology that can feel a little black-boxy, or a little intimidating, with some more traditional methods. And I think those are places where epidemiologists should be leading, as people who like to hang out with data and make friends with data. This is the right kind of question for our field, I think. So, you know, I appreciate your work, and I hope we see more of this coming into the journal, using those epi methods alongside some newer techniques.
Derek Ng:I really appreciate that. And I have to say, my mentor deplored black boxes and insisted that we demystify everything: if we can't explain it and can't write it out, then we don't understand it. And our whole point of being in this field is understanding and knowledge. So I really appreciate that. I'm so thankful for the opportunity to talk more about this work and for the opportunity to have this work published in Annals of Epidemiology.
Patrick Sullivan:Well, thank you for sending the work to us. Thank you for taking the time to be on this episode. Really appreciate you joining us today. It was such a pleasure to have a chance to talk with you. Thanks so much. I'm your host, Patrick Sullivan. Thanks for tuning in to this episode, and see you next time on EPITalk, brought to you by Annals of Epidemiology, the official journal of the American College of Epidemiology. For a transcript of this podcast, or to read the article featured on this episode and more from the journal, visit us online at www.annalsofepidemiology.org.