Data Pipelines with Python and PostgreSQL
Code : https://github.com/Sean-Bradley/Stream-Data-From-Flask-To-Postgres
I show how to use streaming techniques to build a data pipeline that pulls massive amounts of data from an external API and inserts it straight into a PostgreSQL database as it arrives.
The Python process that reads the data handles it in chunks before inserting into Postgres, using the stream=True option of the Requests module and the iter_content method of the response. The mock third-party API, which hosts the potentially petabytes of data, uses Flask's stream_with_context method.
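The client-side chunk handling can be sketched like this. Note this is an illustrative sketch, not the repo's actual code: `records_from_chunks` and `fake_chunks` are made-up names, and `fake_chunks` stands in for the byte chunks that `requests.get(url, stream=True)` would yield via `r.iter_content(chunk_size=...)`. The key idea is that chunks arrive at arbitrary byte boundaries, so the reader keeps a buffer and only emits a record once it sees a full newline-delimited line.

```python
# Sketch of the buffering pattern used when consuming a streamed response.
# In the real pipeline the chunks would come from requests'
# r.iter_content(...) on a stream=True response; here a plain list of
# byte strings simulates them so the logic is self-contained.

def records_from_chunks(chunks):
    """Reassemble newline-delimited records from arbitrary byte chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk.decode("utf-8")
        # A single chunk may contain zero, one, or several complete records.
        while "\n" in buffer:
            record, buffer = buffer.split("\n", 1)
            if record:
                yield record
    if buffer:  # trailing record with no final newline
        yield buffer

# Simulated stream: the second record is split across two chunks,
# exactly as iter_content can split it.
fake_chunks = [b'{"id": 1}\n{"id', b'": 2}\n', b'{"id": 3}\n']
rows = list(records_from_chunks(fake_chunks))
print(rows)  # ['{"id": 1}', '{"id": 2}', '{"id": 3}']
```

In the video's pipeline, each reassembled record would then be inserted into PostgreSQL (e.g. via a psycopg2 cursor) as soon as it is complete, so the process never needs to hold the whole response in memory.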
I tested it on the Twitter API. It works.
I found one mistake: print(t).
That "t" accumulates like t = t + t, so it prints a huge amount of data.
Cool, oppa!
Awesome video. You get to the point.
Excellent tutorial, thank you so much for your work!
Thank you man, you helped me a lot ❣️❣️ Keep growing, brother.
Short and to the point, was really helpful. Thanks man
What if you want to do it the other way round: streaming from a live-updating database to app.py or the web application?
Great video. This helped me on a job interview. Thank you.
Lol Jason Statham??? JK. Nice video and clear real-world explanation. It really helped me understand this topic more clearly. Thank you!
Awesome
Great video, nice and clear.
I don't understand how the buffer is calculated.
You're awesome, detailed and short... love it!
Cool video! Would love to see more DE vids 🙂 great content btw
Can you upload more data pipeline videos? I'm interested in transitioning into a Data Engineering career.
Thanks for uploading this tutorial! Very good explanation of the process.
Hi Sean, I am getting the error below. I tried adjusting my firewall settings, but it doesn't work:
HTTPConnectionPool(host='127.233.225.166', port=1234): Max retries exceeded with url: http://127.0.0.1:5000/very_large_request/1000 (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<
urllib3.connection.HTTPConnection object at 0x03C4EE50>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it')))
As a junior with knowledge of Java/Python/SQL wanting to break into the data engineering field, this was actually my first demo project and it's really well explained! Thank you!
Btw you have awesome voice, you could be voice actor on shows like Castlevania 😀 .
Thank you so much for this video.
Thanks Sir.
Crystal-clear explanation! Thanks man.
Hello Sean, thank you so much for the amazing tutorials. I got an error following your video; could you help? For some reason I cannot connect to my database, even though I've got mine running the same as yours in Postgres:
C:\Users\petra\Desktop>python ingest.py
Traceback (most recent call last):
File "ingest.py", line 8, in <module>
password="purun2005")
File "C:\Users\petra\Anaconda3\lib\site-packages\psycopg2\__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL: database "stream_test" does not exist
Trying to learn ETL and this is SO good. Thanks!
It's an amazing demonstration of pipelines, hats off to you
Hey Sean!
I am able to implement the first piece of code, but ingest.py is throwing this error! Any ideas on how to get around this?
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/Data Science/#ddf.py", line 4, in <module>
with requests.get("http://127.0.0.1:5000/very_large_request/10000", stream=True) as r:
AttributeError: __enter__
Thank you, sir, for such a cool video!
Nice! In ingest.py, I don't understand why the code t = eval(buffer) can get a row. Could you please explain?
Great job with the video, Sean! I just love it when people explain concepts in really simple ways in a short time, rather than throwing bombastic words around for hours while you learn nothing by the end of it.
Hello, I want to reproduce this example, but where should I pull data from? Which external API? Please help.
Very good tutorial! Thanks!
Hey Sean thanks for the tutorial!