-
HBase Insert Performance
Gaurav Vashishth (vashgaurav@...) 2010-01-18, 10:13
I need to store live data arriving at about 40-50K records/sec. I evaluated MySQL and am now trying HBase.

I just read on docstoc that HBase insert performance, for a few thousand rows and 10 columns with 1 MB values, is 68 ms/row. My scenario is similar: we need under 10K rows and 10-20 columns, which can have thousands of versions, with values no larger than 300 bytes. Initially I thought HBase could serve the purpose, but the docstoc article has put doubt in my mind.

Can we get an insertion speed of 40-50K records/sec in HBase? Also, there would be thousands of users reading the database at the same time; can HBase sustain that speed?

Thanks
Gaurav
--
View this message in context: http://old.nabble.com/HBase-Insert-Performance-tp27208387p27208387.html
Sent from the HBase User mailing list archive at Nabble.com.
-
Re: HBase Insert Performance
Ryan Rawson (ryanobjc@...) 2010-01-18, 10:19
How many machines do you have? I'd try at least 20+ late model boxes.
-
Re: HBase Insert Performance
Gaurav Vashishth (vashgaurav@...) 2010-01-18, 10:55
I am using 6 machines, each with 8 cores and 4 GB RAM, to set up the scenario for now:

2 region servers
1 ZooKeeper
1 Data Node
2 Name Nodes
-
Re: HBase Insert Performance
Ryan Rawson (ryanobjc@...) 2010-01-18, 11:07
Hey,

So there are 2 major problems here:
- The setup is way off. There is no actual data duplication, for example: every write goes to 1 machine, and when it fails, so goes your data.
- These machines don't have enough RAM. They must have at least 1 GB/core, ideally 2 GB/core or more. This means they should have 8 GB RAM. (crucial.com)

A better setup would be:
- 1 "master" node, running: HMaster, 1 x ZooKeeper, NameNode
- 5 datanode/regionserver nodes

The key to performance here is to spread your workload over more machines. This is how clustered software works, in a nutshell. Using only 1/3 of your machines for "regionservers" and 1/6 for data storage (datanode) is non-ideal.

You really need to up the RAM. I run:
- dual quad-core i7s with hyper-threading, which gives 16 cores to the OS
- 24 GB RAM
- 4 x 1 TB disk

My small-end machines are:
- dual quad-core Xeons, 8 cores to the OS
- 16 GB RAM
- 2 x 1 TB disk

For performance you really don't want less than 1-2 GB RAM per core. Without a lot of RAM you don't get effective disk caching, you can't run map-reduces on the same nodes, you may run into swap issues, etc. 4 GB of DDR3 RAM is about $150 USD.

But given a reasonable set of machines, doing 50K inserts/sec sustained over long periods of time is totally doable. You will need more than 6 machines though! Don't forget your spares, since you really want to be able to operate on N-{1,2} machines so failures don't cripple you.
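The RAM rule of thumb in the message above (at least 1 GB per core, ideally 2 GB) comes down to simple arithmetic. Here is a minimal Python sketch of that check; the function name and its parameters are illustrative only and are not part of any HBase tooling.

```python
def ram_check(cores, ram_gb, min_gb_per_core=1.0, ideal_gb_per_core=2.0):
    """Compare a machine's RAM against the 1-2 GB per core rule of thumb."""
    return {
        "gb_per_core": ram_gb / cores,
        "minimum_gb": cores * min_gb_per_core,
        "ideal_gb": cores * ideal_gb_per_core,
        "meets_minimum": ram_gb >= cores * min_gb_per_core,
    }

# Machines described by the original poster: 8 cores, 4 GB RAM.
current = ram_check(cores=8, ram_gb=4)

# The "small-end" machines from the reply: 8 cores, 16 GB RAM.
small_end = ram_check(cores=8, ram_gb=16)

print(current)
print(small_end)
```

With these inputs the 8-core, 4 GB machines come out at 0.5 GB per core, half of the stated minimum, while the 16 GB machines sit at the 2 GB per core ideal.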
-
Re: HBase Insert Performance
Gaurav Vashishth (vashgaurav@...) 2010-01-18, 11:29
Thanks a lot. Your words have encouraged me that it is doable; I will upgrade the systems and rerun the test case.

I have one more query, though. When I insert records into HBase with a Put, I send the row id as a long value such as "80760057", but when I scan the table from the HBase shell the row id is always displayed in this format:

\000\000\000\000\000\n\005+

Also, I can't get the value using this row id, even though the column qualifier has values.
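One possible reading of the shell output above, offered as an assumption and not as a diagnosis: if the row id is handed to the Java client as a long, `Bytes.toBytes(long)` stores it as 8 big-endian bytes, whereas the text "80760057" is stored as 8 ASCII characters. These are different byte sequences, so a lookup using the text form would not match a row written with the binary form. The Python sketch below mimics both encodings with the standard library; the helper names are hypothetical.

```python
import struct

def long_row_key(value):
    """8-byte big-endian encoding, the layout Bytes.toBytes(long) produces."""
    return struct.pack(">q", value)

def string_row_key(text):
    """Encoding of the decimal text form of the same id."""
    return text.encode("utf-8")

as_long = long_row_key(80760057)
as_text = string_row_key("80760057")

print(as_long)             # binary key
print(as_text)             # text key
print(as_long == as_text)  # the two keys are not interchangeable

# The row id as displayed by the shell in the message above, as Python bytes
# (\000 is a zero byte, \n is 0x0A, \005 is 0x05, '+' is 0x2B).
shell_key = b"\x00\x00\x00\x00\x00\n\x05+"
decoded = struct.unpack(">q", shell_key)[0]
print(len(shell_key), decoded)
```

The displayed key is exactly 8 bytes long and decodes to 656683 as a big-endian long, which is at least consistent with the table holding binary long keys and not text keys.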
-
Re: HBase Insert Performance
Jean-Daniel Cryans (jdcryans@...) 2010-01-18, 17:57
I think this is https://issues.apache.org/jira/browse/HBASE-2035, fixed in the upcoming 0.20.3. If you want to try it out, get the RC2 here: http://people.apache.org/~jdcryans/hbase-0.20.3-candidate-2/

J-D
-
Re: HBase Insert Performance
Gaurav Vashishth (vashgaurav@...) 2010-01-18, 18:51
Thanks, I will try this new version.

-Gaurav
-
Re: HBase Insert Performance
Paul Ambrose (pambrose@...) 2010-01-21, 15:47
When I run my test suite, I am seeing incorrect results from HBaseAdmin.tableExists() in both candidate 1 and candidate 2. It sometimes returns false when it should return true. If I revert to 0.20.2, the tests run cleanly.

Paul
-
Re: HBase Insert Performance
Gaurav Vashishth (vashgaurav@...) 2010-02-12, 12:25
Ryan,

I have set up the cluster as you suggested. I now have the Master, NameNode, and ZooKeeper on the same machine, and 8 region servers that also run as data nodes. With this configuration I was able to get an insertion speed of around 18K records/sec.

I'm still using 4 GB RAM, though; I will upgrade that as well, and I hope adding more region servers will increase the insertion speed.

Thanks,
Gaurav
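As a back-of-the-envelope sketch of the hope expressed above: if insert throughput scaled linearly with the number of region servers (an assumption made here only for illustration, since real clusters rarely scale perfectly and the RAM upgrade would change the per-server rate), the measured 18K records/sec on 8 servers would translate into the server counts computed below. The function name is illustrative.

```python
import math

def servers_needed(target_rate, measured_rate, measured_servers):
    """Servers required for target_rate, assuming linear scaling per server."""
    per_server = measured_rate / measured_servers
    return math.ceil(target_rate / per_server)

per_server_rate = 18_000 / 8                   # records/sec per region server
low = servers_needed(40_000, 18_000, 8)        # lower end of the 40-50K target
high = servers_needed(50_000, 18_000, 8)       # upper end of the 40-50K target

print(per_server_rate, low, high)
```

Under that assumption each server handles about 2,250 records/sec, and the 40-50K target needs roughly 18 to 23 region servers, which is in the same range as the "20+ boxes" suggested earlier in the thread.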
-
Re: HBase Insert Performance
Michał Podsiadłowski (podsiadlowski@...) 2010-02-12, 12:46
Hey all,

I was asking about the minimum number of ZooKeeper servers, and usually everybody said an odd number >= 3. Are there any reasons for this? Have you encountered any problems with a single ZooKeeper? As far as I know, HBase currently does very few operations through ZooKeeper, so the load on it is insignificant. If I have only one master and one namenode, I already have 2 SPOFs, so another one is not a big deal. Currently we have 3 ZooKeepers running on Xen, with the datanode/hregion on a physical machine.

Can someone advise?

Thanks,
Michal
-
Re: HBase Insert Performance
Jean-Daniel Cryans (jdcryans@...) 2010-02-12, 16:21
If you have 1 cluster and it's very small then, as you point out, HBase isn't intense on ZK (yet), so using only 1 ZK is OK.

Another setup, like the one we have here at StumbleUpon, is multiple clusters using the same quorum. In this case it makes sense to get 3 or 5 nodes, and in our case the hardware is beefy enough that they coexist with some slave processes.

J-D
-
Re: HBase Insert Performance
Patrick Hunt (phunt@...) 2010-02-12, 17:57
In general, when determining the number of ZooKeeper serving nodes to deploy (the size of an ensemble), you need to think in terms of reliability, not performance.

Reliability:

A single ZooKeeper server (standalone) is essentially a coordinator with no reliability (a single serving node failure brings down the ZK service).

A 3 server ensemble (you need to jump to 3 and not 2 because ZK works based on simple majority voting) allows for a single server to fail and the service will still be available.

So if you want reliability, go with at least 3. We typically recommend having 5 servers in "online" production serving environments. This allows you to take 1 server out of service (say, for planned maintenance) and still be able to sustain an unexpected outage of one of the remaining servers w/o interruption of the service.

Performance:

Write performance actually _decreases_ as you add ZK servers, while read performance increases modestly: http://bit.ly/9JEUju

See this page for a recent survey I did looking at operational latency with both a standalone server and an ensemble of size 3: http://bit.ly/4ekN8G

You'll notice that a single core machine running a standalone ZK ensemble (1 server) is still able to process 15k requests per second. This is orders of magnitude greater than what HBase currently uses ZK for (may change in future).

(background: http://bit.ly/csQLQ5)

Patrick
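The majority-voting argument above can be made concrete with a few lines of arithmetic. This is a minimal Python sketch, with function names chosen for illustration only: a simple majority of n servers is n // 2 + 1, and the ensemble stays available as long as that many servers are still up.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a simple majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    """Servers that can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)

# ensemble size -> (quorum size, failures tolerated)
table = {n: (quorum_size(n), tolerated_failures(n)) for n in (1, 2, 3, 4, 5)}

for n, (quorum, failures) in table.items():
    print(f"{n} servers: quorum {quorum}, tolerates {failures} failure(s)")
```

Going from 1 to 2 servers, or from 3 to 4, buys no extra fault tolerance, which is why the jump is to 3 and why odd sizes are usually recommended. An ensemble of 5 tolerates 2 failures, matching the case of one server down for planned maintenance plus one unexpected outage.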