To be clear: I’ve not once bashed Java. If you’re programming an app or an IDE, it’s clearly one of the best choices.
However, I’m laughing at your “don’t know C++” comment, as it’s clear from your reply that you’ve not used the ecosystem:
99% of organisations aren’t coding their REST APIs in C++, which is why you don’t use it for everything. Even Netflix uses Python Flask APIs all over the place (and Java too).
You also don’t want a small feature taking three weeks to develop in C++ instead of one week in Python if you don’t need to save those 50 ms.
When needed, write a library and integrate it with Pybind. That’s been the accepted pattern for years. That’s where all my heavy pricing logic lives.
As a use case: large-scale vectorised columnar operations on tabular data. Using Polars in Python (Rust-based, but still) will usually beat Java performance-wise for large operations on a dataframe. It’s open source and Rust-based, so it’s optimised for these workloads. Reinventing it in Java would also take a long time. You get this performance gain from the Python ecosystem out of the box.
The arguments you used are a strawman and ad hominem, as you don’t understand the use cases or trade-offs.
Regardless of all the above, most enterprise data engineering teams predominantly use Python. The premise that Python can’t build high-performance distributed systems is observably false.
At its core, the lack of wider ecosystem knowledge this sub has shown makes me think they’re siloed in their roles, abstracted from the business and likely low impact. Also, the inability to comprehend and process the words and arguments (whilst they repeat the same ones) shows it’s not worth continuing the dialogue.
“To be clear”: you have changed your arguments, yet you keep going. I will leave you with this, but clearly you love to argue and this will go nowhere.
First, you are stating a statistic with no validation. No, most are not using C++, but in production for video they certainly are not using Flask either, lol. You are conflating involved production systems with random internal tools.
Second, it depends on the scale of the feature. And you clearly still don’t know C++. It really doesn’t take that long.
Third, again, if it works for you, great. The issue is that things eventually have to leave the binding at some point. And if you have written most of your code in another language, then what’s the point?
Fourth, great, but the issue is that these things still need to change hands between libraries. On frequent requests this is a huge overhead. And no, it’s not the same performance. That is a measurable fact.
I fully understand use cases and trade-offs, but I also know C and C++, so a cheap API won’t take me months to develop. The point I made still stands: if you can use those lower-level languages, Python is rarely worth the effort.
Again, you are conflating: data engineering is not software engineering. But cool. I’m not going to bother explaining the difference; you can go Google it or something. As someone else said, best of luck in your software endeavors.
u/Willing_Parsley_2182 17d ago