I have to choose a database, such as MongoDB or Cassandra or others, but I don't know how they perform at writing these logs to the database and retrieving them. My requirement is that the write path be very fast, even more so than reading/searching. I want to use indexes to make the search faster, and I can use a cluster to split the data, like sharding in MongoDB.

My hardware is not very powerful: I'm using Docker, and each container can have at most 8 GB of RAM and 500 GB of SSD storage. The containers may not be on the same machine, and they communicate over a LAN. I would like to know which database is the fastest at writing my logs.

I have to store 100 million records every day, composed in this way.

The main table (named table A) is formed of:

- **id**: a string of max 30 chars; it is the primary key.
- **name**: a string of 30 to 60 chars; it has an index to support the search and is a unique key.
- **time**: a string of 30 to 60 chars; it also has an index to support the search and is a unique key.

The unique key is id, but I have to do the searching on name and time. After one year this table will hold 365 × 100 million tuples, and twice that after two years. The input is very long files (10 million records each), and these input files arrive every day.

The second table (named table B) is formed of:

- **field**: a string of 30 to 60 chars.

This table is updated (tuples added or removed) every two or more hours.

**Query**

This request is run every time table B is updated or, if that is not possible, once a day. For me the most important point is that the database is very fast when it has to import the files into table A. The duration of the search cannot exceed 1 hour, and I can accept stopping inserts into table A and updates to table B while the search runs. But I cannot be slow when putting the new records into table A from the files: I must be as fast as possible when inserting the new records (or importing the files). The rate of rows inserted per second must be as high as possible. On the SSD there is journaling to avoid any data loss, and one day I would also like to add replication, like RAID 1, so as to be sure not to lose any data.
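To make the schema concrete, here is a minimal sketch of how table A could be modeled in MongoDB (one of the candidates named above) using pymongo. The connection string, database and collection names, and the sample values are illustrative assumptions, not taken from the question.

```python
from pymongo import MongoClient, ASCENDING

# Connection string and database/collection names are assumptions.
client = MongoClient("mongodb://localhost:27017")
table_a = client["logs"]["table_a"]

# One document per record: the 30-char id becomes _id, i.e. the primary
# key; name and time each get a unique secondary index so that searches
# on them can use an index instead of a collection scan.
table_a.create_index([("name", ASCENDING)], unique=True)
table_a.create_index([("time", ASCENDING)], unique=True)

# A sample record with made-up values matching the stated lengths.
table_a.insert_one({
    "_id": "log-0000000000000000000000001",               # <= 30 chars, PK
    "name": "dept7-app3-run0001-4f1a9c2d8e77",            # 30-60 chars, unique
    "time": "2017-06-01T12:00:00.000+02:00 office-fmt3",  # 30-60 chars, unique
})
```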
In the comments that followed:

> 10M rows per day -> 500GB per year. 120 rows INSERTed per second.

This is the most important critical issue of my problem.

> Do they arrive one row at a time? Or a whole day's worth at a time? Or something else?

The records arrive in files, and each file contains many records; the count can be anything from 1 upward, with no bound. The files arrive at no specific time slot. But I can wait some hours before inserting a new file, in order to process it first (for example converting it from CSV to JSON, or running some checks on the format), or because the database is still importing a previous file.

> Is it a "log" file? Or CSV? Or something else? Please provide a sample.

The input file can be a CSV, a JSON file, or another format, and I can change this before the import. The size of a file can be 10 GB, more or less; there is no specific rule. I can also wait some hours in order to merge several files and import one big file.

> What kind of "time" is provided as a 36-60 character string? Please provide a sample.

The name is a special "hash" of the log that the department/app/user has given to the database. The time is in reality a general string, because each application/department/office has its own format; but also in this case it could be considered a special "hash".
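The figures quoted in the first comment can be reproduced with back-of-envelope arithmetic, as in the sketch below. The ~140 bytes per row (at most 150 bytes of raw fields, so roughly that once storage overhead is included) is an assumption, and at the 100 million rows per day stated in the question, rather than 10 million, both results scale up by a factor of ten.

```python
# Back-of-envelope check of the commenter's figures, assuming 10M rows/day.
rows_per_day = 10_000_000

# Sustained rate if inserts were spread evenly across the day.
rows_per_second = rows_per_day / 86_400
print(round(rows_per_second))       # ~116, i.e. roughly "120 rows per second"

# id (30) + name (60) + time (60) is at most 150 bytes of raw payload;
# ~140 bytes per row on disk, indexes included, is an assumed average.
bytes_per_row = 140
gb_per_year = rows_per_day * 365 * bytes_per_row / 1e9
print(round(gb_per_year))           # ~511, i.e. roughly "500GB per year"
```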
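Since the stated priority is importing files into table A as fast as possible, here is a sketch of a batched bulk load with pymongo, assuming the CSV input mentioned in the comments. The column layout, file name, and batch size are assumptions to be tuned; `ordered=False` asks the server to attempt every document in a batch even if some fail, rather than aborting at the first duplicate key.

```python
import csv
from itertools import islice

from pymongo import MongoClient
from pymongo.errors import BulkWriteError

client = MongoClient("mongodb://localhost:27017")  # assumed deployment
table_a = client["logs"]["table_a"]

BATCH_SIZE = 10_000  # tuning assumption; measure on the real hardware

def records(path):
    """Yield one document per CSV row; the column order is assumed."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f, fieldnames=["id", "name", "time"]):
            yield {"_id": row["id"], "name": row["name"], "time": row["time"]}

def bulk_import(path):
    it = records(path)
    while batch := list(islice(it, BATCH_SIZE)):
        try:
            table_a.insert_many(batch, ordered=False)
        except BulkWriteError:
            pass  # duplicate keys were skipped; the rest were still inserted

bulk_import("logs-2017-06-01.csv")  # hypothetical file name
```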
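The query itself is not shown in the post, so its exact shape is unknown; the sketch below assumes it means "find the records of table A whose name matches one of the values currently in table B", which fits the surrounding description. Chunking the `$in` list is a precaution in case table B grows large.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed deployment
db = client["logs"]
table_a, table_b = db["table_a"], db["table_b"]

# Pull the current search terms out of table B ("field" is its only column).
terms = [doc["field"] for doc in table_b.find({}, {"field": 1, "_id": 0})]

# Look the terms up via the index on table A's "name", in chunks so a
# large table B does not turn into one enormous $in list.
CHUNK = 1_000
for i in range(0, len(terms), CHUNK):
    for record in table_a.find({"name": {"$in": terms[i:i + CHUNK]}}):
        print(record["_id"], record["time"])  # placeholder processing
```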