Thinking of using CouchDB, but it uses HTTP as its protocol and that worries me.
2 24 Jul 2015 13:14 by u/404_SLEEP_NOT_FOUND
Concern: HTTP and TCP are pretty expensive protocols. I will be bulk inserting data from a sensor, but that sensor will certainly buffer the data into a large chunk (maybe 1-50 MB) before making an HTTP connection to CouchDB. I am thinking of choosing MongoDB/Cassandra over CouchDB because of its choice of HTTP. For me, write performance is important (after all, I'm choosing NoSQL just to store/warehouse data). It concerns me that they would pick HTTP (ease of use) over a UDP protocol with an async callback.
Am I being pedantic here? Or is the HTTP overhead going to slow down my write performance?
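A rough back-of-the-envelope check of the overhead question above, as a sketch rather than a measurement. It builds the request line and headers of a minimal HTTP/1.1 POST and compares their size to a 1 MB and a 50 MB body. The database name `sensors`, the host `localhost:5984`, and the exact header set are assumptions for illustration; a real client would likely send a few more headers.

```python
def http_request_head(path: str, host: str, body_len: int) -> bytes:
    """Build the request line and headers of a minimal HTTP/1.1 POST."""
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/json",
        f"Content-Length: {body_len}",
        "",  # blank line that terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def header_overhead_fraction(body_len: int) -> float:
    """Fraction of the bytes in one request that are HTTP head, not body."""
    # Path and host are hypothetical: a CouchDB-style bulk insert into a
    # database assumed to be called "sensors" on the default local port.
    head = http_request_head("/sensors/_bulk_docs", "localhost:5984", body_len)
    return len(head) / (len(head) + body_len)


if __name__ == "__main__":
    for size_mb in (1, 50):
        body_len = size_mb * 1024 * 1024
        fraction = header_overhead_fraction(body_len)
        print(f"{size_mb} MB body: {fraction:.6%} of the request is HTTP head")
```

This counts HTTP framing only. TCP/IP headers, JSON encoding, and server-side parsing are separate costs that only a real benchmark would show.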
10 comments
0 u/Craftkorb 24 Jul 2015 16:04
Okay, let's use UDP. How do you make sure that your data ends up in the database? Write a custom protocol for it? Ok, now we have a transmission which we control .. let's call it .. TCP!
In all seriousness, have you benchmarked it? I don't see a real reason for not using TCP for a database. And what's stopping you from having asynchronous callbacks with TCP?
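For the benchmarking question above, one possible starting point is to time a raw TCP transfer on the loopback interface before involving any database. This is a minimal sketch: the helper names are made up for illustration, it measures a single connection on one machine, and it says nothing about CouchDB or MongoDB themselves.

```python
import socket
import threading
import time


def _drain(server: socket.socket, received: list) -> None:
    """Accept one connection and count bytes until the peer closes."""
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # empty read means the sender closed the connection
                break
            total += len(chunk)
    received.append(total)


def time_loopback_tcp_send(payload: bytes) -> tuple:
    """Send payload over one loopback TCP connection.

    Returns (bytes_received, elapsed_seconds), where the elapsed time covers
    connect, send, and the receiver draining everything.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    received = []
    worker = threading.Thread(target=_drain, args=(server, received))
    worker.start()

    start = time.perf_counter()
    with socket.create_connection(server.getsockname()) as client:
        client.sendall(payload)
    worker.join()
    elapsed = time.perf_counter() - start

    server.close()
    return received[0], elapsed


if __name__ == "__main__":
    size = 50 * 1024 * 1024  # upper end of the buffer size mentioned in the post
    got, seconds = time_loopback_tcp_send(b"x" * size)
    print(f"received {got} bytes in {seconds:.3f} s")
```

Comparing this number against the time the database needs to accept the same payload would show how much of the cost is transport and how much is everything else.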
0 u/404_SLEEP_NOT_FOUND [OP] 24 Jul 2015 18:48
http://nosql.mypopescu.com/post/697070985/redis-udp-protocol
My concern (opinion) thus far with CouchDB is that it is a hipster database that chooses sleekness and JavaScript/web-all-the-things over obvious performance choices. That concerns me superficially.
Edit: In the past, I have written similar software and the benchmarks kind of shocked me. A lot of CPU was eaten up not by parsing/regexes... but by the socket layer. It turns out TCP/IP eats up a lot of CPU. So I am now wary of sockets when it comes to processing large amounts of data as quickly as I can. It actually surprises me that so many drivers use sockets; I guess enterprise architecture and web scaling are the big players.
I used to have an app that persisted all of its data to a MySQL database but also scraped the web at the same time in a different thread pool. All the concurrent TCP/IP connections ate up a lot of CPU.
1 u/Craftkorb 24 Jul 2015 19:02
I'm not a fan of MongoDB either.
What troubles me more than their choice of transport protocol is what seems to be a pretty horrific default security configuration. Just days ago someone searched for MongoDB instances on the Internet at large and found thousands of open instances with no permissions needed, some even for write access.
I think that HTTP is interesting as an option, because I can then query my DB through cURL if needed. Makes for easy scripting, even in bash scripts. Using it as the sole protocol, well, not my cup of tea either. The issue I see though is the statelessness of HTTP: even if they use keep-alive (no idea!), they shouldn't rely on having that session open forever.
I do think though that TCP is the right choice as a reasonable default. If people want to use UDP, with all its pros and cons for this environment, then they may have it as well. You can always have a wrapper tool which listens on a UDP port and then just pumps whatever it receives into the connected MongoDB instance. If both are on the same host, chances are good that the kernel doesn't do the whole TCP SYN/ACK stuff for a local connection (though I'd like to know if Linux actually does it).
I mean come on, MongoDB isn't ACID compliant, so UDP just makes it worse in the realm of data persistence ;)
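The wrapper-tool idea in the comment above (listen on a UDP port, forward whatever arrives into the database) could be sketched roughly like this. The function name `relay_datagrams` and the `sink` callable are hypothetical; `sink` stands in for whatever actually writes to the database, which is not shown. Datagrams that never arrive are simply lost, which is exactly the trade-off being discussed.

```python
import socket
from typing import Callable


def relay_datagrams(
    sock: socket.socket,
    sink: Callable[[bytes], None],
    max_datagrams: int,
) -> int:
    """Forward up to max_datagrams UDP payloads from sock into sink.

    Stops early if the socket times out. Returns the number forwarded.
    """
    forwarded = 0
    while forwarded < max_datagrams:
        try:
            payload, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break  # nothing more arrived within the socket's timeout
        sink(payload)  # placeholder for the real database write
        forwarded += 1
    return forwarded


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.settimeout(1.0)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for msg in (b"tweet-1", b"tweet-2"):
        sender.sendto(msg, listener.getsockname())

    stored = []  # stand-in for the database
    count = relay_datagrams(listener, stored.append, max_datagrams=2)
    print(count, stored)

    sender.close()
    listener.close()
```

There is no acknowledgement or retry here, so the sender never learns whether a write made it. Adding those back is the point at which the design starts to resemble TCP.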
0 u/404_SLEEP_NOT_FOUND [OP] 24 Jul 2015 19:31
Good points on TCP over UDP. I can't really see how to justify writes on UDP overall as a database design. I'm just storing tweets from the unwashed masses so I guess that's why I don't care about losing a few writes.
What I found odd before on Linux was that I had my database on the same host as the app and it still spent a lot of CPU on TCP/IP... but that may have been from the scraping, which was clearly opening connections to many new web servers...
0 u/Doxin 24 Jul 2015 18:30
Don't use CouchDB. In general, don't use document stores. Most data is relational, even if you don't realize it at first. As soon as you have user accounts, it's relational.