Need help with querying a table.
2 12 Aug 2015 20:30 by u/Vitruvia
Guys, I'm not a programmer (although I'm trying to learn it slowly in my free time). However, there is one small idea that I had that could help my friends where I study organize a little bit. I just want to get a table from a website and look for people with certain values (the table of people who attempted to join the school and those who actually got in, if you are curious). The table is open for browising and for download at the college's website. An example of what I'm intending to do is, for example, check the laste column "mensagem", where it says if the person was approved or not. If it is true, then add this person's name to a list. That would help us organize a reception for the freshmen with a lot of antecedence. (the list of "approved only" isn't released until much later).
Anyways, I don't know if this even fits here. I know it's a very basic programming thing but it's much more than what I could do right now. I appreciate any help you may give!
12 comments
3 u/NotMatt 12 Aug 2015 20:36
Hey! I'm bored at work and would up to helping. The link you posted doesn't work though. Is the table a table in a database, or excel spreadsheet? Or is it just an HTML table?
0 u/Vitruvia [OP] 12 Aug 2015 20:44
Not sure... It opens in a PDF Here's a working link
1 u/NotMatt 12 Aug 2015 20:54
Oof! That's basically a brick wall. Parsing a PDF is awful at best. I found this with Google, but can't test it because it is blocked on our network. Check it out: www.zamzar.com/convert/pdf-to-xls/
If we can get it into a spreadsheet we can easily use VBA to remove the unwanted rows.
EDIT:
Some code I shamelessly stole from the MrExcel forums:
Source
We can modify this to remove all rows that contain empty cells in column P, skipping the first as I assume there will be a header row.
This code can be injected as a macro by pressing Alt+F11 to open the VBA editor. Copy and paste the above modified code into the "This Workbook" module in project explorer on the left hand side of the window. Save the workbook, and then run the macro from the developer toolbar.
File > Options > Customize Ribbon > Tick the box next to "Developer".
Open the "Developer" tab and click "Macros", then select the "DeleteRow" Macro and click "Run".
0 u/Vitruvia [OP] 12 Aug 2015 20:59
I have converted it into a xls file, but there is no storage so I had to download it. Where can I upload that is not blocked on your network? By the way thanks a lot for helping!
0 u/NotMatt 12 Aug 2015 21:14
I edited my earlier post, please check it or some code snippets. I have yet to find a storage/backup site that isn't blocked, so I don't think I'm going to actually be able to look at the spreadsheet.
EDIT: I have gone back and commented the modified code. In VBA a ' denotes a comment, so all text appearing after the ' character is a comment. Modify the code as necessary to fit the spreadsheet, as I am unable to view it.
0 u/Vitruvia [OP] 12 Aug 2015 21:15
I'll try it right away.. brb
0 u/Vitruvia [OP] 12 Aug 2015 21:20
It worked! It deleted some rows (not sure which). The list became a document with 27 pages so I'm doing it on every page. Where do we go from here?
0 u/NotMatt 12 Aug 2015 21:24
Yay! Can you verify that it deleted the correct rows? The ones that are left should NOT have empty cells in the "mensagem" column.
The next step would be to copy the data into one master sheet, and run Excels "sort" function (found under the Data" tab) if necessary. That should really be it. Is there anything else you wanted to do with the data?
0 u/Vitruvia [OP] 12 Aug 2015 21:38
Well.. it seems to have worked only partially... it left some empty rows in the end (but it maked kinda easy to delete them since they are in the end xD). That's about it... After pasting everything in a master sheet I should be able to sort by major and find the people who were accepted into the same as me.
Edit: I'm starting to think the problem was in the conversion. I'll try converting it in another website or program to see what happens
Edit2: Ok, so every time I convert the file it messes up a few rows. Like, one name goes missing for a line, in the other line, there is another empty row. 80-90% of the rows on each page are normal, but the others get some issues that end up messing the whole table. Any ideas on how to solve this? Would a different format, instead of xml, work?
0 u/NotMatt 12 Aug 2015 22:02
Yeah, as I said PDF is a clusterfuck for parsing. Keep trying to find a converter that works is all I can suggest. If there are only a few rows with incomplete data it might honestly be easier to manually repair those rows.
0 u/Vitruvia [OP] 12 Aug 2015 22:04
Alright I'll do that. Thanks a lot for all your help. It has already save me a lot of time =]
0 u/NotMatt 12 Aug 2015 22:07
Happy to! Let me know if there is anything else I can help with :)