This is a question @joshuapaling asked on twitter:
What's a good tool to take a not-too-big sample of recent prod data from your database to use in dev? #rails #mysql
— Joss Paling (@joshuapaling) July 19, 2016
I've done something similar. I struggled to explain in 140 characters. So, impromptu blog post!
Ignoring large tables
In the past I have wanted to dump production database without data from large tables. I did that by using mysqldump
in two steps.
Dump an empty schema into a new dump.sql
file.
mysqldump --no-data mydatabase > dump.sql
Dump data, ignoring large tables, appending it to dump.sql
mysqldump --no-create-info
--ignore-table mydatabase.big_table
--ignore-table mydatabase.another_big_table
mydatabase >> dump.sql
What about just certain rows?
I haven't tried this but I think it should work using the same idea.
Same as before, dump an empty schema.
mysqldump --no-data mydatabase > dump.sql
Dump the data, ignoring tables you only want some data from.
mysqldump --no-create-info
--ignore-table mydatabase.table_i_only_want_some_data_from
mydatabase >> dump.sql
Dump the data from the table, using an SQL query to only dump certain rows.
mysqldump --no-create-info
--where="updated_at > '2016-06-19'"
mydatabase table_i_only_want_some_data_from >> dump.sql
Restoring
dump.sql
can be restored with
mysql mydatabasename < dump.sql
What about invalid data?
Assuming that you have key constraints in your database, the restore will fail to insert any records where the parent doesn't exist.
You can make use of this. If you use --where to dump the data for blog_posts
that were updated today. You can try to restore all comments, and only comments from blog posts updated today will be restored.
That also mean that the order matters. Don't try to restore the comments
table before blog_posts
.
Will this work?
I've only tried this with ignoring whole tables, but I think it should work for a subset of rows if you have key constraints in the database which prevent you from inserting invalid data.
What else?
I have a few extra tips up my sleeve for large database dumps.
If you want to know more ask me about pv
and gzip
.